memory-barriers.txt - OpenGrok cross reference for /linux-4.4.14/Documentation/memory-barriers.txt

Lines Matching refs:CPU
29      - CPU memory barriers.
39  (*) Inter-CPU locking barrier effects.
84 		| CPU 1 |<----->| Memory |<----->| CPU 2 |
101 Each CPU executes a program that generates memory access operations.  In the
102 abstract CPU, memory operation ordering is very relaxed, and a CPU may actually
109 CPU are perceived by the rest of the system as the operations cross the
110 interface between the CPU and rest of the system (the dotted lines).
115 	CPU 1		CPU 2
142 Furthermore, the stores committed by a CPU to the memory system may not be
143 perceived by the loads made by another CPU in the same order as the stores were
149 	CPU 1		CPU 2
156 the address retrieved from P by CPU 2.  At the end of the sequence, any of the
163 Note that CPU 2 will never try and load C into D because the CPU will load P
192 There are some minimal guarantees that may be expected of a CPU:
194  (*) On any given CPU, dependent memory accesses will be issued in order, with
199      the CPU will issue the following memory operations:
209  (*) Overlapping loads and stores within a particular CPU will appear to be
210      ordered within that CPU.  This means that for:
214      the CPU will only issue the following sequence of memory operations:
222      the CPU will only issue:
322 in random order, but this can be a problem for CPU-CPU interaction and for I/O.
324 CPU to restrict the order.
352      A CPU can be viewed as committing a sequence of store operations to the
374      committing sequences of stores to the memory system that the CPU being
375      considered can then perceive.  A data dependency barrier issued by the CPU
377      load touches one of a sequence of stores from another CPU, then by the
469 between two CPUs or between a CPU and a device.  If it can be guaranteed that
486      instruction; the barrier can be considered to draw a line in that CPU's
489  (*) There is no guarantee that issuing a memory barrier on one CPU will have
490      any direct effect on another CPU or any other hardware in the system.  The
491      indirect effect will be the order in which the second CPU sees the effects
492      of the first CPU's accesses occur, but see the next point:
494  (*) There is no guarantee that a CPU will see the correct order of effects
495      from a second CPU's accesses, even _if_ the second CPU uses a memory
496      barrier, unless the first CPU _also_ uses a matching memory barrier (see
499  (*) There is no guarantee that some intervening piece of off-the-CPU
500      hardware[*] will not reorder the memory accesses.  CPU cache coherency
518 	CPU 1		      CPU 2
533 But!  CPU 2's perception of P may be updated _before_ its perception of B, thus
545 	CPU 1		      CPU 2
563 even-numbered bank of the reading CPU's cache is extremely busy while the
572 	CPU 1		      CPU 2
606 dependency, but rather a control dependency that the CPU may short-circuit
637 	b = p;  /* BUG: Compiler and CPU can both reorder!!! */
670 'b', which means that the CPU is within its rights to reorder them:
721 Given this transformation, the CPU is not required to respect the ordering
765 	CPU 0                     CPU 1
773 The above two-CPU example will never trigger the assert().  However,
775 then adding the following CPU would guarantee a related assertion:
777 	CPU 2
784 assertion can fail after the combined three-CPU example completes.  If you
785 need the three-CPU example to provide ordering, you will need smp_mb()
786 between the loads and stores in the CPU 0 and CPU 1 code fragments,
788 the original two-CPU example is very fragile and should be avoided.
828 When dealing with CPU-CPU interactions, certain types of memory barrier should
840 	CPU 1		      CPU 2
850 	CPU 1		      CPU 2
860 	CPU 1		      CPU 2
878 	CPU 1                               CPU 2
893 	CPU 1
913 	| CPU 1 |  :    | B=2  |     }
924 	                   | memory system by CPU 1
931 	CPU 1			CPU 2
941 Without intervention, CPU 2 may perceive the events on CPU 1 in some
942 effectively random order, despite the write barrier issued by CPU 1:
947 	|       |  :    +------+     \          +-------+  | CPU 2
948 	| CPU 1 |  :    | A=1  |      \     --->| C->&Y |  V
958 	                               |        :       :       | CPU 2 |
971 In the above example, CPU 2 perceives that B is 7, despite the load of *C
975 and the load of *C (ie: B) on CPU 2:
977 	CPU 1			CPU 2
994 	| CPU 1 |  :    | A=1  |      \     --->| C->&Y |
1004 	                               |        :       :       | CPU 2 |
1018 	CPU 1			CPU 2
1027 Without intervention, CPU 2 may then choose to perceive the events on CPU 1 in
1028 some effectively random order, despite the write barrier issued by CPU 1:
1034 	| CPU 1 |   wwwwwwwwwwwwwwww   \    --->| B->9  |
1040 	                                |       +-------+       | CPU 2 |
1052 load of A on CPU 2:
1054 	CPU 1			CPU 2
1064 then the partial ordering imposed by CPU 1 will be perceived correctly by CPU
1071 	| CPU 1 |   wwwwwwwwwwwwwwww   \    --->| B->9  |
1077 	                                |       +-------+       | CPU 2 |
1083 	  to be perceptible to CPU 2            +-------+       |       |
1090 	CPU 1			CPU 2
1108 	| CPU 1 |   wwwwwwwwwwwwwwww   \    --->| B->9  |
1114 	                                |       +-------+       | CPU 2 |
1123 	  to be perceptible to CPU 2            +-------+       |       |
1127 But it may be that the update to A from CPU 1 becomes perceptible to CPU 2
1134 	| CPU 1 |   wwwwwwwwwwwwwwww   \    --->| B->9  |
1140 	                                |       +-------+       | CPU 2 |
1165 actual load instruction to potentially complete immediately because the CPU
1168 It may turn out that the CPU didn't actually need the value - perhaps because a
1174 	CPU 1			CPU 2
1186 	                                        +-------+       | CPU 2 |
1189 	The CPU being busy doing a --->     --->| A->0  |~~~~   |       |
1195 	the CPU can then perform the            :       :       |       |
1202 	CPU 1			CPU 2
1217 	                                        +-------+       | CPU 2 |
1220 	The CPU being busy doing a --->     --->| A->0  |~~~~   |       |
1233 but if there was an update or an invalidation from another CPU pending, then
1239 	                                        +-------+       | CPU 2 |
1242 	The CPU being busy doing a --->     --->| A->0  |~~~~   |       |
1262 	CPU 1			CPU 2			CPU 3
1269 Suppose that CPU 2's load from X returns 1 and its load from Y returns 0.
1270 This indicates that CPU 2's load from X in some sense follows CPU 1's
1271 store to X and that CPU 2's load from Y in some sense preceded CPU 3's
1272 store to Y.  The question is then "Can CPU 3's load from X return 0?"
1274 Because CPU 2's load from X in some sense came after CPU 1's store, it
1275 is natural to expect that CPU 3's load from X must therefore return 1.
1277 CPU A follows a load from the same variable executing on CPU B, then
1278 CPU A's load must either return the same value that CPU B's load did,
1282 transitivity.  Therefore, in the above example, if CPU 2's load from X
1283 returns 1 and its load from Y returns 0, then CPU 3's load from X must
1287 For example, suppose that CPU 2's general barrier in the above example
1290 	CPU 1			CPU 2			CPU 3
1298 legal for CPU 2's load from X to return 1, its load from Y to return 0,
1299 and CPU 3's load from X to return 0.
1301 The key point is that although CPU 2's read barrier orders its pair
1302 of loads, it does not guarantee to order CPU 1's store.  Therefore, if
1304 or a level of cache, CPU 2 might have early access to CPU 1's writes.
1306 on the combined order of CPU 1's and CPU 2's accesses.
1321   (*) CPU memory barriers.
1355      to the same variable, and in some cases, the CPU is within its
1363      Prevent both the compiler and the CPU from doing this as follows:
1407      a was modified by some other CPU between the "while" statement and
1434      will carry out its proof assuming that the current CPU is the only
1456      Again, the compiler assumes that the current CPU is the only one
1467      surprise if some other CPU might have stored to variable 'a' in the
1540      though the CPU of course need not do so.
1558      could cause some other CPU to see a spurious value of 42 -- even
1625 Please note that these compiler barriers have no direct effect on the CPU,
1629 CPU MEMORY BARRIERS
1632 The Linux kernel has eight basic CPU memory barriers:
1656 systems because it is assumed that a CPU will appear to be self-consistent,
1668 CPU from reordering them.
1719      of writes or reads of shared memory accessible to both the CPU and a
1724      to the device or the CPU, and a doorbell to notify it when new
1767 CPU->Hardware interface and actually affect the hardware at some level.
1855 another CPU not holding that lock.  In short, an ACQUIRE followed by an
1859 not imply a full memory barrier.  Therefore, the CPU's execution of the
1878 	One key point is that we are only talking about the CPU doing
1883 	But suppose the CPU reordered the operations.  In this case,
1884 	the unlock precedes the lock in the assembly code.  The CPU
1887 	try to sleep, but more on that later).	The CPU will eventually
1904 See also the section on "Inter-CPU locking barrier effects".
1964 	CPU 1
2005 	CPU 1				CPU 2
2017 	CPU 1				CPU 2
2025 In contrast, if a wakeup does occur, CPU 2's load from X would be guaranteed
2092 INTER-CPU ACQUIRING BARRIER EFFECTS
2106 	CPU 1				CPU 2
2115 Then there is no guarantee as to what order CPU 3 will see the accesses to *A
2141 	CPU 1				CPU 2
2162 	CPU 1				CPU 2
2175 this will ensure that the two stores issued on CPU 1 appear at the PCI bridge
2176 before either of the stores issued on CPU 2.
2183 	CPU 1				CPU 2
2219 When there's a system with more than one processor, more than one CPU in the
2270 another CPU might start processing the waiter and might clobber the waiter's
2275 	CPU 1				CPU 2
2313 right order without actually intervening in the CPU.  Since there's only one
2314 CPU, that CPU's dependency ordering logic will take care of everything else.
2412 Many devices can be memory mapped, and so appear to the CPU as if they're just
2416 However, having a clever CPU or a clever compiler creates a potential problem
2418 device in the requisite order if the CPU or the compiler thinks it is more
2449 routine is executing, the driver's core may not run on the same CPU, and its
2498      that's primarily a CPU-specific concept. The i386 and x86_64 processors do
2503      CPUs as i386 and x86_64 - readily maps to the CPU's concept of I/O
2504      space.  However, it may also be mapped as a virtual I/O space in the CPU's
2520      respect to each other on the issuing CPU depends on the characteristics
2563 It has to be assumed that the conceptual CPU is weakly-ordered but that it will
2569 This means that it must be considered that the CPU will execute its instruction
2580 A CPU may also discard any instruction sequence that winds up having no
2591 THE EFFECTS OF THE CPU CACHE
2598 As far as the way a CPU interacts with another part of the system through the
2599 caches goes, the memory system has to include the CPU's caches, and memory
2600 barriers for the most part act at the interface between the CPU and its cache
2603 	    <--- CPU --->         :       <----------- Memory ----------->
2607 	|  CPU   |    | Memory |  :   | CPU    |    |           |    |        |
2617 	|  CPU   |    | Memory |  :   | CPU    |    |           |--->| Device |
2626 CPU that issued it since it may have been satisfied within the CPU's own cache,
2629 cacheline over to the accessing CPU and propagate the effects upon conflict.
2631 The CPU core may execute instructions in any order it deems fit, provided the
2639 accesses cross from the CPU side of things to the memory side of things, and
2643 [!] Memory barriers are _not_ needed within a given CPU, as CPUs always see
2648 the use of any special device communication instructions the CPU may have.
2656 will be ordered.  This means that whilst changes made on one CPU will
2662 has a pair of parallel data caches (CPU 1 has A/B, and CPU 2 has C/D):
2669 	|  CPU 1 |<---+                        |        |
2677 	|  CPU 2 |<---+                        |        |
2692  (*) whilst the CPU core is interrogating one cache, the other cache may be
2703 Imagine, then, that two writes are made on the first CPU, with a write barrier
2704 between them to guarantee that they will appear to reach that CPU's caches in
2707 	CPU 1		CPU 2		COMMENT
2718 the local CPU's caches have apparently been updated in the correct order.  But
2719 now imagine that the second CPU wants to read those values:
2721 	CPU 1		CPU 2		COMMENT
2728 cacheline holding p may get updated in one of the second CPU's caches whilst
2730 CPU's caches by some other cache event:
2732 	CPU 1		CPU 2		COMMENT
2748 Basically, whilst both cachelines will be updated on CPU 2 eventually, there's
2750 as that committed on CPU 1.
2757 	CPU 1		CPU 2		COMMENT
2792 the kernel must flush the overlapping bits of cache on each CPU (and maybe
2796 cache lines being written back to RAM from a CPU's cache after the device has
2797 installed its own data, or cache lines present in the CPU's cache may simply
2799 is discarded from the CPU's cache and reloaded.  To deal with this, the
2801 cache on each CPU.
2810 a window in the CPU's memory space that has different properties assigned than
2825 A programmer might take it for granted that the CPU will perform memory
2826 operations in exactly the order specified, so that if the CPU is, for example,
2835 they would then expect that the CPU will complete the memory operation for each
2856      of the CPU buses and caches;
2863  (*) the CPU's data cache may affect the ordering, and whilst cache-coherency
2868 So what another CPU, say, might actually observe from the above piece of code
2876 However, it is guaranteed that a CPU will be self-consistent: it will see its
2895 The code above may cause the CPU to generate the full sequence of memory
2904 where a given CPU might reorder successive loads to the same location.
2911 the CPU even sees them.
2934 and the LOAD operation never appear outside of the CPU.
2940 The DEC Alpha CPU is one of the most relaxed CPUs there is.  Not only that,
2941 some versions of the Alpha CPU have a split data cache, permitting them to have