Split page table lock
=====================

Originally, the mm->page_table_lock spinlock protected all page tables of
the mm_struct. But this approach leads to poor page fault scalability of
multi-threaded applications due to high contention on the lock. To improve
scalability, the split page table lock was introduced.

With the split page table lock we have a separate per-table lock to
serialize access to the table. At the moment we use the split lock for PTE
and PMD tables. Access to higher-level tables is protected by
mm->page_table_lock.

There are helpers to lock/unlock a table and other accessor functions:
 - pte_offset_map_lock()
	maps the PTE and takes the PTE table lock; returns a pointer to the
	PTE and passes back a pointer to the taken lock;
 - pte_unmap_unlock()
	unlocks and unmaps the PTE table;
 - pte_alloc_map_lock()
	allocates the PTE table if needed and takes the lock; passes back a
	pointer to the taken lock, or returns NULL if allocation failed;
 - pte_lockptr()
	returns a pointer to the PTE table lock;
 - pmd_lock()
	takes the PMD table lock, returns a pointer to the taken lock;
 - pmd_lockptr()
	returns a pointer to the PMD table lock.
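
The lock/unlock pattern behind these helpers can be sketched in user space.
This is only a model under stated assumptions, not kernel code: the toy
spinlock, struct model_table, MODEL_ENTRIES and all model_* functions are
hypothetical names invented for the illustration. It shows a helper that
locates an entry, takes the lock of the table holding it and passes the
taken lock back to the caller, as pte_offset_map_lock() is described above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define MODEL_ENTRIES 8		/* hypothetical table size for the model */

/* Toy spinlock standing in for spinlock_t. */
struct model_lock {
	atomic_flag held;
};

/* Hypothetical stand-in for one PTE table with its own split lock. */
struct model_table {
	struct model_lock ptl;
	unsigned long entry[MODEL_ENTRIES];
};

static void model_table_init(struct model_table *t)
{
	size_t i;

	atomic_flag_clear(&t->ptl.held);
	for (i = 0; i < MODEL_ENTRIES; i++)
		t->entry[i] = 0;
}

/* Model of pte_offset_map_lock(): locate the entry, take the lock of
 * its table and pass the taken lock back through *ptlp. */
static unsigned long *model_offset_lock(struct model_table *t, size_t idx,
					struct model_lock **ptlp)
{
	assert(idx < MODEL_ENTRIES);
	*ptlp = &t->ptl;
	while (atomic_flag_test_and_set_explicit(&t->ptl.held,
						 memory_order_acquire))
		;	/* spin until the lock is free */
	return &t->entry[idx];
}

/* Model of pte_unmap_unlock(): release the lock handed back above. */
static void model_unlock(struct model_lock *ptl)
{
	atomic_flag_clear_explicit(&ptl->held, memory_order_release);
}

/* Typical caller: lock, modify the entry, unlock. */
static void model_set_entry(struct model_table *t, size_t idx,
			    unsigned long val)
{
	struct model_lock *ptl;
	unsigned long *e = model_offset_lock(t, idx, &ptl);

	*e = val;
	model_unlock(ptl);
}
```

Because every table carries its own lock, a caller holding the lock of one
table does not block updates to another table, which is the point of
splitting the lock.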

The split page table lock for PTE tables is enabled at compile time if
CONFIG_SPLIT_PTLOCK_CPUS (usually 4) is less than or equal to NR_CPUS.
If the split lock is disabled, all tables are guarded by mm->page_table_lock.

The split page table lock for PMD tables is enabled if it is enabled for
PTE tables and the architecture supports it (see below).

Hugetlb and split page table lock
---------------------------------

Hugetlb can support several page sizes. We use split lock only for PMD
level, but not for PUD.

Hugetlb-specific helpers:
 - huge_pte_lock()
	takes the PMD split lock for a PMD_SIZE page, mm->page_table_lock
	otherwise;
 - huge_pte_lockptr()
	returns a pointer to the table lock.
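
The lock selection described for the hugetlb helpers can be modelled as
follows. This is a user-space sketch, not the kernel implementation: the
struct names, the MODEL_PMD_SIZE / MODEL_PUD_SIZE values (2 MiB and 1 GiB,
assumed here for illustration) and model_huge_lockptr() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sizes for the model: 2 MiB PMD, 1 GiB PUD. */
#define MODEL_PMD_SIZE (1UL << 21)
#define MODEL_PUD_SIZE (1UL << 30)

struct model_lock {
	int unused;			/* stand-in for spinlock_t */
};

struct model_mm {
	struct model_lock page_table_lock;	/* mm-wide lock */
};

struct model_pmd_table {
	struct model_lock ptl;		/* split lock of one PMD table */
};

/* Model of huge_pte_lockptr(): PMD-sized huge pages use the split lock
 * of the PMD table; every other huge page size uses the mm-wide lock. */
static struct model_lock *model_huge_lockptr(unsigned long huge_page_size,
					     struct model_mm *mm,
					     struct model_pmd_table *pmd)
{
	if (huge_page_size == MODEL_PMD_SIZE)
		return &pmd->ptl;
	return &mm->page_table_lock;
}
```

A model of huge_pte_lock() would call this function and then take the
returned lock, so callers never need to know which of the two locks is in
use.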

Support of split page table lock by an architecture
---------------------------------------------------

No special enabling is needed for the PTE split page table lock:
everything required is done by pgtable_page_ctor() and pgtable_page_dtor(),
which must be called on PTE table allocation / freeing.

Make sure the architecture doesn't use the slab allocator for page table
allocation: slab uses page->slab_cache and page->first_page for its pages.
These fields share storage with page->ptl.

The PMD split lock only makes sense if you have more than two page table
levels.

Enabling the PMD split lock requires a pgtable_pmd_page_ctor() call on PMD
table allocation and pgtable_pmd_page_dtor() on freeing.

Allocation usually happens in pmd_alloc_one(), freeing in pmd_free() and
pmd_free_tlb(), but make sure you cover all PMD table allocation / freeing
paths: e.g. X86_PAE preallocates a few PMDs in pgd_alloc().

With everything in place you can set CONFIG_ARCH_ENABLE_SPLIT_PMD_PTLOCK.

NOTE: pgtable_page_ctor() and pgtable_pmd_page_ctor() can fail -- this must
be handled properly.
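
One way to handle a failing constructor is to treat it like a failed
allocation and release the page. The user-space sketch below illustrates
that pattern only; it assumes the constructor reports failure by returning
false, and struct model_page, model_ctor_should_fail and all model_*
functions are hypothetical names, not kernel interfaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the page holding a PMD table. */
struct model_page {
	void *lock;	/* state set up by the constructor */
};

/* Hook for this sketch only: force the constructor to fail. */
static bool model_ctor_should_fail;

/* Model of pgtable_pmd_page_ctor(): returns false on failure. */
static bool model_ctor(struct model_page *page)
{
	if (model_ctor_should_fail)
		return false;
	page->lock = malloc(sizeof(long));
	return page->lock != NULL;
}

/* Model of pgtable_pmd_page_dtor(). */
static void model_dtor(struct model_page *page)
{
	free(page->lock);
	page->lock = NULL;
}

/* Model of an allocation path such as pmd_alloc_one(): a constructor
 * failure is handled like an allocation failure and the page is freed. */
static struct model_page *model_table_alloc(void)
{
	struct model_page *page = calloc(1, sizeof(*page));

	if (!page)
		return NULL;
	if (!model_ctor(page)) {
		free(page);
		return NULL;
	}
	return page;
}

/* Model of a freeing path such as pmd_free(): run the destructor
 * before the page itself is released. */
static void model_table_free(struct model_page *page)
{
	model_dtor(page);
	free(page);
}
```

The same pairing has to hold on every allocation and freeing path, which is
why paths outside the usual allocation function need the same treatment.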

page->ptl
---------

page->ptl is used to access the split page table lock, where 'page' is the
struct page of the page containing the table. It shares storage with
page->private (and a few other fields in the union).

To avoid increasing the size of struct page and to get the best
performance, we use a trick:
 - if spinlock_t fits into a long, we use page->ptl as the spinlock, so we
   can avoid indirect access and save a cache line;
 - if the size of spinlock_t is bigger than the size of long, we use
   page->ptl as a pointer to spinlock_t and allocate it dynamically. This
   allows the split lock to be used with DEBUG_SPINLOCK or DEBUG_LOCK_ALLOC
   enabled, but costs one more cache line for indirect access.

The spinlock_t is allocated in pgtable_page_ctor() for PTE tables and in
pgtable_pmd_page_ctor() for PMD tables.
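
The embedded-or-allocated choice can be illustrated with a small user-space
model. It is a sketch under assumptions, not the kernel's struct page: the
lock types, the MODEL_DEBUG_LOCKS switch, MODEL_ALLOC_PTLOCKS and the
model_* functions are all hypothetical. Building with -DMODEL_DEBUG_LOCKS
selects the large lock and therefore the dynamically allocated variant.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical lock types: one fits into a long, one does not. */
struct small_lock { int val; };
struct debug_lock { int val; void *owner; long debug_info[4]; };

#ifdef MODEL_DEBUG_LOCKS
typedef struct debug_lock model_lock_t;
#define MODEL_ALLOC_PTLOCKS 1
#else
typedef struct small_lock model_lock_t;
#define MODEL_ALLOC_PTLOCKS 0
#endif

/* The build-time switch must agree with the real size of the lock. */
_Static_assert(MODEL_ALLOC_PTLOCKS == (sizeof(model_lock_t) > sizeof(long)),
	       "lock layout does not match lock size");

struct model_page {
	union {
		unsigned long private;	/* shares storage with ptl */
#if MODEL_ALLOC_PTLOCKS
		model_lock_t *ptl;	/* lock too big: keep a pointer */
#else
		model_lock_t ptl;	/* lock fits: embed it directly */
#endif
	};
};

/* Model of the lock setup done by the constructors; can fail. */
static bool model_ptlock_init(struct model_page *page)
{
#if MODEL_ALLOC_PTLOCKS
	page->ptl = calloc(1, sizeof(*page->ptl));
	return page->ptl != NULL;
#else
	page->ptl.val = 0;
	return true;
#endif
}

/* Model of a helper such as pte_lockptr(): callers go through it and
 * therefore work with either layout. */
static model_lock_t *model_ptlock_ptr(struct model_page *page)
{
#if MODEL_ALLOC_PTLOCKS
	return page->ptl;
#else
	return &page->ptl;
#endif
}

/* Model of the lock teardown done by the destructors. */
static void model_ptlock_free(struct model_page *page)
{
#if MODEL_ALLOC_PTLOCKS
	free(page->ptl);
#else
	(void)page;	/* nothing to free for an embedded lock */
#endif
}
```

Only the three functions know which layout is in use, which is the reason
for going through a helper instead of touching the field directly.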

Please, never access page->ptl directly -- use the appropriate helper.