1<html><head><meta http-equiv="Content-Type" content="text/html; charset=ANSI_X3.4-1968"><title>Chapter&#160;9.&#160;Locking Speed</title><meta name="generator" content="DocBook XSL Stylesheets V1.78.1"><link rel="home" href="index.html" title="Unreliable Guide To Locking"><link rel="up" href="index.html" title="Unreliable Guide To Locking"><link rel="prev" href="racing-timers.html" title="Racing Timers: A Kernel Pastime"><link rel="next" href="efficiency-read-copy-update.html" title="Avoiding Locks: Read Copy Update"></head><body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF"><div class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="3" align="center">Chapter&#160;9.&#160;Locking Speed</th></tr><tr><td width="20%" align="left"><a accesskey="p" href="racing-timers.html">Prev</a>&#160;</td><th width="60%" align="center">&#160;</th><td width="20%" align="right">&#160;<a accesskey="n" href="efficiency-read-copy-update.html">Next</a></td></tr></table><hr></div><div class="chapter"><div class="titlepage"><div><div><h1 class="title"><a name="Efficiency"></a>Chapter&#160;9.&#160;Locking Speed</h1></div></div></div><div class="toc"><p><b>Table of Contents</b></p><dl class="toc"><dt><span class="sect1"><a href="Efficiency.html#efficiency-rwlocks">Read/Write Lock Variants</a></span></dt><dt><span class="sect1"><a href="efficiency-read-copy-update.html">Avoiding Locks: Read Copy Update</a></span></dt><dt><span class="sect1"><a href="per-cpu.html">Per-CPU Data</a></span></dt><dt><span class="sect1"><a href="mostly-hardirq.html">Data Which Mostly Used By An IRQ Handler</a></span></dt></dl></div><p>
2There are three main things to worry about when considering speed of
3some code which does locking.  First is concurrency: how many things
4are going to be waiting while someone else is holding a lock.  Second
5is the time taken to actually acquire and release an uncontended lock.
6Third is using fewer, or smarter locks.  I'm assuming that the lock is
7used fairly often: otherwise, you wouldn't be concerned about
8efficiency.
9</p><p>
10Concurrency depends on how long the lock is usually held: you should
11hold the lock for as long as needed, but no longer.  In the cache
12example, we always create the object without the lock held, and then
13grab the lock only when we are ready to insert it in the list.
14</p><p>
15Acquisition times depend on how much damage the lock operations do to
16the pipeline (pipeline stalls) and how likely it is that this CPU was
17the last one to grab the lock (ie. is the lock cache-hot for this
18CPU): on a machine with more CPUs, this likelihood drops fast.
19Consider a 700MHz Intel Pentium III: an instruction takes about 0.7ns,
20an atomic increment takes about 58ns, a lock which is cache-hot on
21this CPU takes 160ns, and a cacheline transfer from another CPU takes
22an additional 170 to 360ns.  (These figures from Paul McKenney's
23<a class="ulink" href="http://www.linuxjournal.com/article.php?sid=6993" target="_top"> Linux
24Journal RCU article</a>).
25</p><p>
26These two aims conflict: holding a lock for a short time might be done
27by splitting locks into parts (such as in our final per-object-lock
28example), but this increases the number of lock acquisitions, and the
29results are often slower than having a single lock.  This is another
30reason to advocate locking simplicity.
31</p><p>
32The third concern is addressed below: there are some methods to reduce
33the amount of locking which needs to be done.
34</p><div class="sect1"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="efficiency-rwlocks"></a>Read/Write Lock Variants</h2></div></div></div><p>
35      Both spinlocks and mutexes have read/write variants:
36      <span class="type">rwlock_t</span> and <span class="structname">struct rw_semaphore</span>.
37      These divide users into two classes: the readers and the writers.  If
38      you are only reading the data, you can get a read lock, but to write to
39      the data you need the write lock.  Many people can hold a read lock,
40      but a writer must be sole holder.
41    </p><p>
42      If your code divides neatly along reader/writer lines (as our
43      cache code does), and the lock is held by readers for
44      significant lengths of time, using these locks can help.  They
45      are slightly slower than the normal locks though, so in practice
46      <span class="type">rwlock_t</span> is not usually worthwhile.
47    </p></div></div><div class="navfooter"><hr><table width="100%" summary="Navigation footer"><tr><td width="40%" align="left"><a accesskey="p" href="racing-timers.html">Prev</a>&#160;</td><td width="20%" align="center">&#160;</td><td width="40%" align="right">&#160;<a accesskey="n" href="efficiency-read-copy-update.html">Next</a></td></tr><tr><td width="40%" align="left" valign="top">Racing Timers: A Kernel Pastime&#160;</td><td width="20%" align="center"><a accesskey="h" href="index.html">Home</a></td><td width="40%" align="right" valign="top">&#160;Avoiding Locks: Read Copy Update</td></tr></table></div></body></html>
48