<html><head><meta http-equiv="Content-Type" content="text/html; charset=ANSI_X3.4-1968"><title>Avoiding Locks: Read Copy Update</title><meta name="generator" content="DocBook XSL Stylesheets V1.78.1"><link rel="home" href="index.html" title="Unreliable Guide To Locking"><link rel="up" href="Efficiency.html" title="Chapter&#160;9.&#160;Locking Speed"><link rel="prev" href="Efficiency.html" title="Chapter&#160;9.&#160;Locking Speed"><link rel="next" href="per-cpu.html" title="Per-CPU Data"></head><body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF"><div class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="3" align="center">Avoiding Locks: Read Copy Update</th></tr><tr><td width="20%" align="left"><a accesskey="p" href="Efficiency.html">Prev</a>&#160;</td><th width="60%" align="center">Chapter&#160;9.&#160;Locking Speed</th><td width="20%" align="right">&#160;<a accesskey="n" href="per-cpu.html">Next</a></td></tr></table><hr></div><div class="sect1"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="efficiency-read-copy-update"></a>Avoiding Locks: Read Copy Update</h2></div></div></div><p>
      There is a special method of read/write locking called Read Copy
      Update.  Using RCU, the readers can avoid taking a lock
      altogether: as we expect our cache to be read more often than
      updated (otherwise the cache is a waste of time), it is a
      candidate for this optimization.
    </p><p>
      How do we get rid of read locks?  Getting rid of read locks
      means that writers may be changing the list underneath the
      readers.  That is actually quite simple: we can read a linked
      list while an element is being added if the writer adds the
      element very carefully.  For example, adding
      <span class="symbol">new</span> to a singly-linked list called
      <span class="symbol">list</span>:
    </p><pre class="programlisting">
        new-&gt;next = list-&gt;next;
        wmb();
        list-&gt;next = new;
    </pre><p>
      The <code class="function">wmb()</code> is a write memory barrier.  It
      ensures that the first operation (setting the new element's
      <span class="symbol">next</span> pointer) is complete and will be seen by
      all CPUs before the second operation (putting the new element
      into the list) takes place.  This is important, since modern
      compilers and modern CPUs can both reorder instructions unless
      told otherwise: we want a reader to either not see the new
      element at all, or see the new element with the
      <span class="symbol">next</span> pointer correctly pointing at the rest of
      the list.
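    </p><p>
      To make the ordering concrete, here is a minimal sketch of the
      insertion wrapped in a helper.  The <span class="symbol">node</span>
      structure and the <code class="function">insert_after()</code> name are
      invented for illustration; newer kernels usually express the
      publishing store with <code class="function">rcu_assign_pointer()</code>,
      which supplies the barrier for you:
    </p><pre class="programlisting">
struct node {
        struct node *next;
        int data;
};

/* Publish 'new' after 'head': readers see either the old list or the
 * fully-initialized new element, never a half-built one. */
static void insert_after(struct node *head, struct node *new)
{
        new-&gt;next = head-&gt;next;
        wmb();                          /* order the two stores */
        head-&gt;next = new;

        /* Equivalent, with the barrier implied:
         *      rcu_assign_pointer(head-&gt;next, new);
         */
}
    </pre><p>
      Note that <span class="symbol">new</span> must be fully initialized
      before the final store; that is exactly what the barrier enforces.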
    </p><p>
      Fortunately, there is a function to do this for standard
      <span class="structname">struct list_head</span> lists:
      <code class="function">list_add_rcu()</code>
      (<code class="filename">include/linux/list.h</code>).
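    </p><p>
      A minimal sketch of the writer side follows.  The
      <span class="structname">struct foo</span> list, its lock and the
      function names are invented for illustration, not taken from the
      cache example; newer kernels keep the RCU list helpers in
      <code class="filename">include/linux/rculist.h</code>:
    </p><pre class="programlisting">
#include &lt;linux/list.h&gt;
#include &lt;linux/rcupdate.h&gt;
#include &lt;linux/spinlock.h&gt;

struct foo {
        struct list_head list;
        struct rcu_head rcu;            /* used later, when freeing elements */
        int key;
};

static LIST_HEAD(foo_list);
static DEFINE_SPINLOCK(foo_lock);

/* Writer: the spinlock only excludes other writers; readers take no lock. */
static void foo_add(struct foo *new)
{
        spin_lock(&amp;foo_lock);
        list_add_rcu(&amp;new-&gt;list, &amp;foo_list);
        spin_unlock(&amp;foo_lock);
}
    </pre><p>
      The cache patch at the end of this section does the same thing inside
      <code class="function">__cache_add()</code>.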
    </p><p>
      Removing an element from the list is even simpler: we replace
      the pointer to the old element with a pointer to its successor,
      and readers will either see it, or skip over it.
    </p><pre class="programlisting">
        list-&gt;next = old-&gt;next;
    </pre><p>
      There is <code class="function">list_del_rcu()</code>
      (<code class="filename">include/linux/list.h</code>) which does this (the
      normal version poisons the old object, which we don't want).
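    </p><p>
      In the illustrative sketch above, the writer side of removal looks
      much the same; the one thing it must not do yet is free the element:
    </p><pre class="programlisting">
/* Writer: unlink 'victim'; other writers are excluded by foo_lock. */
static void foo_unlink(struct foo *victim)
{
        spin_lock(&amp;foo_lock);
        list_del_rcu(&amp;victim-&gt;list);
        spin_unlock(&amp;foo_lock);

        /* Do NOT kfree(victim) here: a reader may still be using it.
         * See call_rcu()/synchronize_rcu() below. */
}
    </pre><p>
      When the element may finally be freed is the subject of the rest of
      this section.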
    </p><p>
      The reader must also be careful: some CPUs can look through the
      <span class="symbol">next</span> pointer to start reading the contents of
      the next element early, but don't realize that the pre-fetched
      contents are wrong when the <span class="symbol">next</span> pointer changes
      underneath them.  Once again, there is a
      <code class="function">list_for_each_entry_rcu()</code>
      (<code class="filename">include/linux/list.h</code>) to help you.  Of
      course, writers can just use
      <code class="function">list_for_each_entry()</code>, since there cannot
      be two simultaneous writers.
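    </p><p>
      Continuing the sketch, a reader might look like this (the
      <code class="function">rcu_read_lock()</code>/<code class="function">rcu_read_unlock()</code>
      pair is explained below):
    </p><pre class="programlisting">
/* Reader: no lock, just an RCU read-side critical section. */
static int foo_present(int key)
{
        struct foo *f;
        int found = 0;

        rcu_read_lock();
        list_for_each_entry_rcu(f, &amp;foo_list, list) {
                if (f-&gt;key == key) {
                        found = 1;
                        break;
                }
        }
        rcu_read_unlock();
        return found;
}
    </pre><p>
      Note that <span class="symbol">f</span> may only be dereferenced inside
      the read-side critical section; to keep using an element after
      <code class="function">rcu_read_unlock()</code> you need a reference
      count, which is exactly what the cache example below does.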
    </p><p>
      Our final dilemma is this: when can we actually destroy the
      removed element?  Remember, a reader might be stepping through
      this element in the list right now: if we free this element and
      the <span class="symbol">next</span> pointer changes, the reader will jump
      off into garbage and crash.  We need to wait until we know that
      all the readers who were traversing the list when we deleted the
      element are finished.  We use <code class="function">call_rcu()</code> to
      register a callback which will actually destroy the object once
      all pre-existing readers are finished.  Alternatively,
      <code class="function">synchronize_rcu()</code> may be used to block until
      all pre-existing readers are finished.
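    </p><p>
      Completing the sketch above, the removal path can use either
      primitive.  This uses the current <code class="function">call_rcu()</code>
      convention, where the callback is handed the
      <span class="structname">struct rcu_head</span> and uses
      <code class="function">container_of()</code> to get back to the object:
    </p><pre class="programlisting">
#include &lt;linux/slab.h&gt;

static void foo_free_rcu(struct rcu_head *head)
{
        kfree(container_of(head, struct foo, rcu));
}

static void foo_del(struct foo *victim)
{
        foo_unlink(victim);             /* list_del_rcu() under foo_lock */

        /* Asynchronous: free once all pre-existing readers have finished. */
        call_rcu(&amp;victim-&gt;rcu, foo_free_rcu);

        /* Or, synchronously (this can sleep, so not from atomic context):
         *      synchronize_rcu();
         *      kfree(victim);
         */
}
    </pre><p>
      The cache patch below takes the <code class="function">call_rcu()</code>
      route, dropping the final reference from the callback.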
    </p><p>
      But how does Read Copy Update know when the readers are
      finished?  The method is this: firstly, the readers always
      traverse the list inside
      <code class="function">rcu_read_lock()</code>/<code class="function">rcu_read_unlock()</code>
      pairs: these simply disable preemption so the reader won't go to
      sleep while reading the list.
    </p><p>
      RCU then waits until every other CPU has slept at least once:
      since readers cannot sleep, we know that any readers which were
      traversing the list during the deletion are finished, and the
      callback is triggered.  The real Read Copy Update code is a
      little more optimized than this, but this is the fundamental
      idea.
    </p><pre class="programlisting">
--- cache.c.perobjectlock	2003-12-11 17:15:03.000000000 +1100
+++ cache.c.rcupdate	2003-12-11 17:55:14.000000000 +1100
@@ -1,15 +1,18 @@
 #include &lt;linux/list.h&gt;
 #include &lt;linux/slab.h&gt;
 #include &lt;linux/string.h&gt;
+#include &lt;linux/rcupdate.h&gt;
 #include &lt;linux/mutex.h&gt;
 #include &lt;asm/errno.h&gt;

 struct object
 {
-        /* These two protected by cache_lock. */
+        /* This is protected by RCU */
         struct list_head list;
         int popularity;

+        struct rcu_head rcu;
+
         atomic_t refcnt;

         /* Doesn't change once created. */
@@ -40,7 +43,7 @@
 {
         struct object *i;

-        list_for_each_entry(i, &amp;cache, list) {
+        list_for_each_entry_rcu(i, &amp;cache, list) {
                 if (i-&gt;id == id) {
                         i-&gt;popularity++;
                         return i;
@@ -49,19 +52,25 @@
         return NULL;
 }

+/* Final discard done once we know no readers are looking. */
+static void cache_delete_rcu(void *arg)
+{
+        object_put(arg);
+}
+
 /* Must be holding cache_lock */
 static void __cache_delete(struct object *obj)
 {
         BUG_ON(!obj);
-        list_del(&amp;obj-&gt;list);
-        object_put(obj);
+        list_del_rcu(&amp;obj-&gt;list);
         cache_num--;
+        call_rcu(&amp;obj-&gt;rcu, cache_delete_rcu);
 }

 /* Must be holding cache_lock */
 static void __cache_add(struct object *obj)
 {
-        list_add(&amp;obj-&gt;list, &amp;cache);
+        list_add_rcu(&amp;obj-&gt;list, &amp;cache);
         if (++cache_num &gt; MAX_CACHE_SIZE) {
                 struct object *i, *outcast = NULL;
                 list_for_each_entry(i, &amp;cache, list) {
@@ -104,12 +114,11 @@
 struct object *cache_find(int id)
 {
         struct object *obj;
-        unsigned long flags;

-        spin_lock_irqsave(&amp;cache_lock, flags);
+        rcu_read_lock();
         obj = __cache_find(id);
         if (obj)
                 object_get(obj);
-        spin_unlock_irqrestore(&amp;cache_lock, flags);
+        rcu_read_unlock();
         return obj;
 }
</pre><p>
Note that the reader will alter the
<em class="structfield"><code>popularity</code></em> member in
<code class="function">__cache_find()</code>, and now it doesn't hold a lock.
One solution would be to make it an <span class="type">atomic_t</span>, but for
this usage, we don't really care about races: an approximate result is
good enough, so I didn't change it.
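</p><p>
If the count did need to be exact, the change would be small.  As a sketch
(not part of the original cache code; only the relevant members are shown):
</p><pre class="programlisting">
#include &lt;linux/atomic.h&gt;

struct object {
        struct list_head list;
        atomic_t popularity;            /* was a plain int */
        /* ...other members as before... */
};

/* Safe to call with no lock held: the increment is atomic. */
static void object_bump_popularity(struct object *obj)
{
        atomic_inc(&amp;obj-&gt;popularity);
}
</pre><p>
The increment in <code class="function">__cache_find()</code> would then become
<code class="function">atomic_inc()</code>, at the cost of a locked operation on
every lookup.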
</p><p>
The result is that <code class="function">cache_find()</code> requires no
synchronization with any other functions, so it is almost as fast on SMP
as it would be on UP.
</p><p>
There is a further optimization possible here: remember our original
cache code, where there were no reference counts and the caller simply
held the lock whenever using the object?  This is still possible: if
you hold the lock, no one can delete the object, so you don't need to
get and put the reference count.
</p><p>
Now, because the 'read lock' in RCU is simply disabling preemption, a
caller which always has preemption disabled between calling
<code class="function">cache_find()</code> and
<code class="function">object_put()</code> does not need to actually get and
put the reference count: we could expose
<code class="function">__cache_find()</code> by making it non-static, and
such callers could simply call that.
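</p><p>
A sketch of such a caller follows.  It assumes
<code class="function">__cache_find()</code> has been made non-static as
suggested; the function itself and its purpose are invented for
illustration:
</p><pre class="programlisting">
/* The caller stays inside the RCU read-side section (which, here, just
 * disables preemption) for as long as it uses the object, so the object
 * cannot be freed under it and no reference count is needed. */
int cache_copy_name(int id, char *buf, size_t len)
{
        struct object *obj;
        int ret = -ENOENT;

        rcu_read_lock();
        obj = __cache_find(id);
        if (obj) {
                strlcpy(buf, obj-&gt;name, len);
                ret = 0;
        }
        rcu_read_unlock();
        return ret;
}
</pre><p>
No reference count is taken or dropped anywhere in this path.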
</p><p>
The benefit here is that the reference count is not written to: the
object is not altered in any way, which is much faster on SMP
machines due to caching.
</p></div><div class="navfooter"><hr><table width="100%" summary="Navigation footer"><tr><td width="40%" align="left"><a accesskey="p" href="Efficiency.html">Prev</a>&#160;</td><td width="20%" align="center"><a accesskey="u" href="Efficiency.html">Up</a></td><td width="40%" align="right">&#160;<a accesskey="n" href="per-cpu.html">Next</a></td></tr><tr><td width="40%" align="left" valign="top">Chapter&#160;9.&#160;Locking Speed&#160;</td><td width="20%" align="center"><a accesskey="h" href="index.html">Home</a></td><td width="40%" align="right" valign="top">&#160;Per-CPU Data</td></tr></table></div></body></html>