memory-barriers.txt - OpenGrok cross reference for /linux-4.1.27/Documentation/memory-barriers.txt

Lines Matching refs:that
101 Each CPU executes a program that generates memory access operations.  In the
163 Note that CPU 2 will never try and load C into D because the CPU will load P
173 registers that are accessed through an address port register (A) and a data
192 There are some minimal guarantees that may be expected of a CPU:
195      respect to itself.  This means that for:
203      and always in that order.  On most systems, smp_read_barrier_depends()
205      is required to prevent compiler mischief.  Please note that you
210      ordered within that CPU.  This means that for:
229 And there are a number of things that _must_ or _must_not_ be assumed:
231  (*) It _must_not_ be assumed that the compiler will do what you want with
232      memory references that are not protected by ACCESS_ONCE().  Without
237  (*) It _must_not_ be assumed that independent loads and stores will be issued
238      in the order given.  This means that for:
251  (*) It _must_ be assumed that overlapping memory accesses may be merged or
252      discarded.  This means that for:
286      variables.  "Properly sized" currently means variables that are
291      on 32-bit and 64-bit systems, respectively.  Note that these
344      A write memory barrier gives a guarantee that all the STORE operations
356      [!] Note that write barriers should normally be paired with read or data
363      where two loads are performed such that the second depends on the result
366      make sure that the target of the second load is updated before the address
374      committing sequences of stores to the memory system that the CPU being
376      under consideration guarantees that for any load preceding it, if that
378      time the barrier completes, the effects of all the stores prior to that
385      [!] Note that the first load really has to have a _data_ dependency and
392      [!] Note that data dependency barriers should normally be paired with
398      A read barrier is a data dependency barrier plus a guarantee that all the
409      [!] Note that read barriers should normally be paired with write barriers;
415      A general memory barrier gives a guarantee that all the LOAD and STORE
430      This acts as a one-way permeable barrier.  It guarantees that all memory
436      Memory operations that occur before an ACQUIRE operation may appear to
445      This also acts as a one-way permeable barrier.  It guarantees that all
451      Memory operations that occur after a RELEASE operation may appear to
459      RELEASE on that same variable are guaranteed to be visible.  In other
461      previous critical sections for that variable are guaranteed to have
464      This means that ACQUIRE acts as a minimal "acquire" operation and
469 between two CPUs or between a CPU and a device.  If it can be guaranteed that
471 memory barriers are unnecessary in that piece of code.
474 Note that these are the _minimum_ guarantees.  Different architectures may give
482 There are certain things that the Linux kernel memory barriers do not guarantee:
484  (*) There is no guarantee that any of the memory accesses specified before a
486      instruction; the barrier can be considered to draw a line in that CPU's
487      access queue that accesses of the appropriate type may not cross.
489  (*) There is no guarantee that issuing a memory barrier on one CPU will have
494  (*) There is no guarantee that a CPU will see the correct order of effects
499  (*) There is no guarantee that some intervening piece of off-the-CPU
515 it's not always obvious that they're needed.  To illustrate, consider the
527 There's a clear data dependency here, and it would seem that by the end of the
528 sequence, Q must be either &A or &B, and that:
558 [!] Note that this extremely counterintuitive situation arises most easily on
559 machines with split caches, so that, for example, one cache bank processes
606 dependency, but rather a control dependency that the CPU may short-circuit
607 by attempting to predict the outcome in advance, so that other CPUs see
617 However, stores are not speculated.  This means that ordering -is- provided
626 That said, please note that ACCESS_ONCE() is not optional!  Without the
631 Worse yet, if the compiler is able to prove (say) that the value of
670 'b', which means that the CPU is within its rights to reorder them:
713 If MAX is defined to be 1, then the compiler knows that (q % MAX) is
725 relying on this ordering, you should make sure that MAX is greater than
738 Please note once again that the stores to 'b' differ.  If they were
755 This example underscores the need to ensure that the compiler cannot
786 that is, just before or just after the "if" statements.
812   (*) Control dependencies require that the compiler avoid reordering the
872 [!] Note that the stores before the write barrier would normally be expected to
901 that the rest of the system might perceive as the unordered set of { STORE A,
969 In the above example, CPU 2 perceives that B is 7, despite the load of *C
1125 But it may be that the update to A from CPU 1 becomes perceptible to CPU 2
1151 The guarantee is that the second load will always come up with A == 1 if the
1153 A; that may come up with either A == 0 or A == 1.
1159 Many CPUs speculate with loads: that is they see that they will need to load an
1162 got to that point in the instruction execution flow yet.  This permits the
1166 It may turn out that the CPU didn't actually need the value - perhaps because a
1256 Transitivity is a deeply intuitive notion about ordering that is not
1267 Suppose that CPU 2's load from X returns 1 and its load from Y returns 0.
1268 This indicates that CPU 2's load from X in some sense follows CPU 1's
1269 store to X and that CPU 2's load from Y in some sense preceded CPU 3's
1273 is natural to expect that CPU 3's load from X must therefore return 1.
1276 CPU A's load must either return the same value that CPU B's load did,
1285 For example, suppose that CPU 2's general barrier in the above example
1299 The key point is that although CPU 2's read barrier orders its pair
1303 General barriers are therefore required to ensure that all CPUs agree
1314 The Linux kernel has a variety of different barriers that act at different
1327 The Linux kernel has an explicit compiler barrier function that prevents the
1334 for barrier() that affects only the specific accesses flagged by the
1342      interrupt-handler code and the code that was interrupted.
1345      in that loop's conditional on each pass through that loop.
1347 The ACCESS_ONCE() function can prevent any number of optimizations that,
1353      rights to reorder loads to the same variable.  This means that
1412      Note that if the compiler runs short of registers, it might save
1419      what the value will be.  For example, if the compiler can prove that
1430      rid of a load and a branch.  The problem is that the compiler will
1431      carry out its proof assuming that the current CPU is the only one
1434      that it doesn't know as much as it thinks it does:
1439      But please note that the compiler is also closely watching what you
1446      Then the compiler knows that the result of the "%" operator applied
1452      if it knows that the variable already has the value being stored.
1453      Again, the compiler assumes that the current CPU is the only one
1459 	/* Code that does not store to variable a. */
1462      The compiler sees that the value of variable 'a' is already zero, so
1471 	/* Code that does not store to variable a. */
1516      Note that the ACCESS_ONCE() wrappers in interrupt_handler()
1518      by something that also accesses 'flag' and 'msg', for example,
1521      (Note also that nested interrupts do not typically occur in modern
1525      You should assume that the compiler can move ACCESS_ONCE() past
1531      the compiler must discard the value of all memory locations that
1576      Please note that GCC really does use this sort of optimization,
1577      which is not surprising given that it would likely take more
1612 All that aside, it is never necessary to use ACCESS_ONCE() on a variable
1613 that has been marked volatile.  For example, because 'jiffies' is marked
1615 for this is that ACCESS_ONCE() is implemented as a volatile cast, which
1618 Please note that these compiler barriers have no direct effect on the CPU,
1641 that the compiler may not speculate the value of b (eg. is equal to 1) and load
1648 systems because it is assumed that a CPU will appear to be self-consistent,
1651 [!] Note that SMP memory barriers _must_ be used to control the ordering of
1676      decrement) functions that don't return a value, especially when used for
1679      These are also used for atomic bitop functions that do not return a
1682      As an example, consider a piece of code that marks an object as being dead
1689      This makes sure that the death mark on the object is perceived to be set
1703      For example, consider a device driver that shares memory with a device
1732      can see it now has ownership.  The wmb() is needed to guarantee that the
1746 This is a variation on the mandatory write barrier that causes writes to weakly
1789      subsequent stores.  Note that this is weaker than smp_mb()!  The
1803      completed before that ACQUIRE operation.
1818 one-way barriers is that the effects of instructions outside of a critical
1838 another CPU not holding that lock.  In short, a ACQUIRE followed by an
1849 sections corresponding to the RELEASE and the ACQUIRE can cross, so that:
1860 It might appear that this reordering could introduce a deadlock.
1866 	One key point is that we are only talking about the CPU doing
1868 	that matter, the developer) switched the operations, deadlock
1875 	try to sleep, but more on that later).	The CPU will eventually
1880 	But what if the lock is a sleeplock?  In that case, the code will
1907 ensures that the store to *A will always be seen as happening before
1933 	[+] Note that {*F,*A} indicates a combined access.
1947 Functions that disable interrupts (ACQUIRE equivalent) and enable interrupts
1958 the event and the global data used to indicate the event.  To make sure that
2002 Secondly, code that performs a wake up normally follows something like this:
2058 [!] Note that the memory barriers implied by the sleeper and the waker do _not_
2075 there's no guarantee that the change to event_indicated will be perceived by
2097 Other functions that imply barriers:
2107 that does affect memory access ordering on other CPUs, within the context of
2169 Note that the smp_mb__after_unlock_lock() is critically important
2220 this will ensure that the two stores issued on CPU 1 appear at the PCI bridge
2269 operations that affect both CPUs may have to be carefully ordered to prevent
2313 before proceeding.  Since the record is on the waiter's stack, this means that
2350 In this case, the barrier makes a guarantee that all memory accesses before the
2352 with respect to the other CPUs on the system.  It does _not_ guarantee that all
2359 CPU, that CPU's dependency ordering logic will take care of everything else.
2370 Any atomic operation that modifies some state in memory and returns information
2446 [!] Note that special memory barrier primitives are available for these
2462 in that the carefully sequenced accesses in the driver code won't reach the
2464 efficient to reorder, combine or merge accesses - something that would cause
2492 form of locking), such that the critical operations are all contained within
2496 handled, thus the interrupt handler does not need to lock against that.
2498 However, consider a driver that was talking to an ethernet card that sports an
2499 address register and a data register.  If that driver's core talks to the card
2517 If ordering rules are relaxed, it must be assumed that accesses done inside an
2524 registers that form implicit I/O barriers. If this isn't sufficient then an
2529 running on separate CPUs that communicate with each other. If such a case is
2543      that's primarily a CPU-specific concept. The i386 and x86_64 processors do
2550      memory map, particularly on those CPUs that don't support alternate I/O
2555      that.
2594      required, an mmiowb() barrier can be used. Note that relaxed accesses to
2608 It has to be assumed that the conceptual CPU is weakly-ordered but that it will
2614 This means that it must be considered that the CPU will execute its instruction
2615 stream in any order it feels like - or even in parallel - provided that if an
2616 instruction in the stream depends on an earlier instruction, then that
2618 instruction may proceed; in other words: provided that the appearance of
2625 A CPU may also discard any instruction sequence that winds up having no
2630 Similarly, it has to be assumed that compiler might reorder the instruction
2640 a certain extent by the caches that lie between CPUs and memory, and by the
2641 memory coherence system that maintains the consistency of state in the system.
2671 CPU that issued it since it may have been satisfied within the CPU's own cache,
2700 caches are expected to be coherent, there's no guarantee that that coherency
2701 will be ordered.  This means that whilst changes made on one CPU will
2702 eventually become visible on all CPUs, there's no guarantee that they will
2706 Consider dealing with a system that has a pair of CPUs (1 & 2), each of which
2741  (*) each cache has a queue of operations that need to be applied to that cache
2748 Imagine, then, that two writes are made on the first CPU, with a write barrier
2749 between them to guarantee that they will appear to reach that CPU's caches in
2762 The write memory barrier forces the other CPUs in the system to perceive that
2764 now imagine that the second CPU wants to read those values:
2794 no guarantee that, without intervention, the order of update will be the same
2795 as that committed on CPU 1.
2821 split cache that improves performance by making better use of the data bus.
2843 obscure the fact that RAM has been updated, until at such time as the cacheline
2854 Memory mapped I/O usually takes place through memory locations that are part of
2855 a window in the CPU's memory space that has different properties assigned than
2858 Amongst these properties is usually the fact that such accesses bypass the
2860 may, in effect, overtake accesses to cached memory that were emitted earlier.
2870 A programmer might take it for granted that the CPU will perform memory
2871 operations in exactly the order specified, so that if the CPU is, for example,
2880 they would then expect that the CPU will complete the memory operation for each
2904      memory or I/O hardware that can do batched accesses of adjacent locations,
2910      - there's no guarantee that the coherency management will be propagated in
2921 However, it is guaranteed that a CPU will be self-consistent: it will see its
2932 and assuming no intervention by an external influence, it can be assumed that
2945 in that order, but, without intervention, the sequence may have almost any
2947 the world remains consistent.  Note that ACCESS_ONCE() is -not- optional
2952 special ld.acq and st.rel instructions that prevent such reordering.
2967 assumed that the effect of the storage of V to *A is lost.  Similarly:
2983 The DEC Alpha CPU is one of the most relaxed CPUs there is.  Not only that,