<html><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><title>Memory Management and Command Submission</title><meta name="generator" content="DocBook XSL Stylesheets V1.78.1"><link rel="home" href="index.html" title="Linux GPU Driver Developer's Guide"><link rel="up" href="drmI915.html" title="Chapter 4. drm/i915 Intel GFX Driver"><link rel="prev" href="API-intel-csr-ucode-fini.html" title="intel_csr_ucode_fini"><link rel="next" href="API-i915-cmd-parser-init-ring.html" title="i915_cmd_parser_init_ring"></head><body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF"><div class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="3" align="center">Memory Management and Command Submission</th></tr><tr><td width="20%" align="left"><a accesskey="p" href="API-intel-csr-ucode-fini.html">Prev</a> </td><th width="60%" align="center">Chapter 4. drm/i915 Intel GFX Driver</th><td width="20%" align="right"> <a accesskey="n" href="API-i915-cmd-parser-init-ring.html">Next</a></td></tr></table><hr></div><div class="sect1"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="id-1.4.3.5"></a>Memory Management and Command Submission</h2></div></div></div><div class="toc"><dl class="toc"><dt><span class="sect2"><a href="ch04s03.html#id-1.4.3.5.3">Batchbuffer Parsing</a></span></dt><dt><span class="sect2"><a href="ch04s03.html#id-1.4.3.5.4">Batchbuffer Pools</a></span></dt><dt><span class="sect2"><a href="ch04s03.html#id-1.4.3.5.5">Logical Rings, Logical Ring Contexts and Execlists</a></span></dt><dt><span class="sect2"><a href="ch04s03.html#id-1.4.3.5.6">Global GTT views</a></span></dt><dt><span class="sect2"><a href="ch04s03.html#id-1.4.3.5.7">GTT Fences and Swizzling</a></span></dt><dt><span class="sect2"><a href="ch04s03.html#id-1.4.3.5.8">Object Tiling IOCTLs</a></span></dt><dt><span class="sect2"><a href="ch04s03.html#id-1.4.3.5.9">Buffer Object Eviction</a></span></dt><dt><span 
class="sect2"><a href="ch04s03.html#id-1.4.3.5.10">Buffer Object Memory Shrinking</a></span></dt></dl></div><p> This section covers all things related to the GEM implementation in the i915 driver. </p><div class="sect2"><div class="titlepage"><div><div><h3 class="title"><a name="id-1.4.3.5.3"></a>Batchbuffer Parsing</h3></div></div></div><div class="toc"><dl class="toc"><dt><span class="refentrytitle"><a href="API-i915-cmd-parser-init-ring.html"><span class="phrase">i915_cmd_parser_init_ring</span></a></span><span class="refpurpose"> — set cmd parser related fields for a ringbuffer </span></dt><dt><span class="refentrytitle"><a href="API-i915-cmd-parser-fini-ring.html"><span class="phrase">i915_cmd_parser_fini_ring</span></a></span><span class="refpurpose"> — clean up cmd parser related fields </span></dt><dt><span class="refentrytitle"><a href="API-i915-needs-cmd-parser.html"><span class="phrase">i915_needs_cmd_parser</span></a></span><span class="refpurpose"> — should a given ring use software command parsing? </span></dt><dt><span class="refentrytitle"><a href="API-i915-parse-cmds.html"><span class="phrase">i915_parse_cmds</span></a></span><span class="refpurpose"> — parse a submitted batch buffer for privilege violations </span></dt><dt><span class="refentrytitle"><a href="API-i915-cmd-parser-get-version.html"><span class="phrase">i915_cmd_parser_get_version</span></a></span><span class="refpurpose"> — get the cmd parser version number </span></dt></dl></div><p> </p><p> Motivation: Certain OpenGL features (e.g. transform feedback, performance monitoring) require userspace code to submit batches containing commands such as MI_LOAD_REGISTER_IMM to access various registers. 
Unfortunately, some generations of the hardware will noop these commands in <span class="quote">“<span class="quote">unsecure</span>”</span> batches (which include all userspace batches submitted via i915) even though the commands may be safe and represent the intended programming model of the device. </p><p> The software command parser is similar in operation to the command parsing done in hardware for unsecure batches. However, the software parser allows some operations that would be noop'd by hardware, if the parser determines the operation is safe, and submits the batch as <span class="quote">“<span class="quote">secure</span>”</span> to prevent hardware parsing. </p><p> Threats: At a high level, the hardware (and software) checks attempt to prevent granting userspace undue privileges. There are three categories of privilege. </p><p> First, commands which are explicitly defined as privileged or which should only be used by the kernel driver. The parser generally rejects such commands, though it may allow some from the drm master process. </p><p> Second, commands which access registers. To support correct/enhanced userspace functionality, particularly certain OpenGL extensions, the parser provides a whitelist of registers which userspace may safely access (for both normal and drm master processes). </p><p> Third, commands which access privileged memory (e.g. GGTT, HWS page, etc.). The parser always rejects such commands. </p><p> The majority of the problematic commands fall in the MI_* range, with only a few specific commands on each ring (e.g. PIPE_CONTROL and MI_FLUSH_DW). </p><p> Implementation: Each ring maintains tables of commands and registers which the parser uses in scanning batch buffers submitted to that ring. 
</p><p> Since the set of commands that the parser must check for is significantly smaller than the number of commands supported, the parser tables contain only those commands required by the parser. This generally works because command opcode ranges have standard command length encodings, so the parser can easily skip commands that it does not need to check. This is implemented via a per-ring length decoding vfunc. </p><p> Unfortunately, there are a number of commands that do not follow the standard length encoding for their opcode range, primarily amongst the MI_* commands. To handle this, the parser provides a way to define explicit <span class="quote">“<span class="quote">skip</span>”</span> entries in the per-ring command tables. </p><p> Other command table entries map fairly directly to the high-level categories mentioned above: rejected, master-only, register whitelist. The parser implements a number of checks, including the privileged memory checks, via a general bitmasking mechanism. 
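</p><p> As a rough illustration of the table-driven scan described above, the following user-space C sketch walks a batch one command at a time. The header layout (opcode in bits 31:24, length minus two in bits 7:0), the <code class="function">struct cmd_desc</code> type and the <code class="function">scan_batch</code> function are assumptions made for this example; they do not reflect the driver's actual tables or the hardware's command encodings. </p><pre class="programlisting">
#include &lt;stdbool.h&gt;
#include &lt;stddef.h&gt;
#include &lt;stdint.h&gt;

/* Hypothetical table entry; the real per-ring tables carry more fields. */
struct cmd_desc {
    uint32_t opcode;    /* toy encoding: bits 31:24 of the header dword */
    bool reject;        /* privileged command: fail the whole batch */
    uint32_t fixed_len; /* explicit "skip" length in dwords, 0 = use default */
};

/* Stand-in for the per-ring length decoding vfunc (toy encoding). */
static uint32_t default_len(uint32_t header)
{
    return (header &amp; 0xff) + 2;
}

static const struct cmd_desc *find_cmd(const struct cmd_desc *table, size_t n,
                                       uint32_t header)
{
    for (size_t i = 0; i &lt; n; i++)
        if (table[i].opcode == header &gt;&gt; 24)
            return &amp;table[i];
    return NULL; /* not in the table: nothing to check */
}

/* Returns 0 if the batch passes, -1 if it is rejected or malformed. */
int scan_batch(const uint32_t *batch, size_t count,
               const struct cmd_desc *table, size_t n)
{
    size_t pos = 0;

    while (pos &lt; count) {
        uint32_t header = batch[pos];
        const struct cmd_desc *desc = find_cmd(table, n, header);
        uint32_t len;

        if (desc &amp;&amp; desc-&gt;reject)
            return -1;
        len = (desc &amp;&amp; desc-&gt;fixed_len) ? desc-&gt;fixed_len
                                        : default_len(header);
        if (len &gt; count - pos)
            return -1; /* command runs past the end of the batch */
        pos += len;
    }
    return 0;
}
</pre><p> Commands absent from the table fall through to the default length decoder, which is what keeps the table small in this sketch. 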
</p></div><div class="sect2"><div class="titlepage"><div><div><h3 class="title"><a name="id-1.4.3.5.4"></a>Batchbuffer Pools</h3></div></div></div><div class="toc"><dl class="toc"><dt><span class="refentrytitle"><a href="API-i915-gem-batch-pool-init.html"><span class="phrase">i915_gem_batch_pool_init</span></a></span><span class="refpurpose"> — initialize a batch buffer pool </span></dt><dt><span class="refentrytitle"><a href="API-i915-gem-batch-pool-fini.html"><span class="phrase">i915_gem_batch_pool_fini</span></a></span><span class="refpurpose"> — clean up a batch buffer pool </span></dt><dt><span class="refentrytitle"><a href="API-i915-gem-batch-pool-get.html"><span class="phrase">i915_gem_batch_pool_get</span></a></span><span class="refpurpose"> — allocate a buffer from the pool </span></dt></dl></div><p> </p><p> In order to submit batch buffers as 'secure', the software command parser must ensure that a batch buffer cannot be modified after parsing. It does this by copying the user-provided batch buffer contents to a kernel-owned buffer from which the hardware will actually execute, and by carefully managing the address space bindings for such buffers. </p><p> The batch pool framework provides a mechanism for the driver to manage a set of scratch buffers to use for this purpose. The framework can be extended to support other use cases should they arise. 
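</p><p> The copy-then-execute idea can be sketched in user-space C as follows. This is only an illustration under stated assumptions: <code class="function">struct batch_pool</code>, <code class="function">pool_get</code>, <code class="function">pool_put</code> and <code class="function">shadow_copy</code> are hypothetical names, and plain heap memory stands in for the GEM objects the driver actually uses. </p><pre class="programlisting">
#include &lt;stdbool.h&gt;
#include &lt;stdlib.h&gt;
#include &lt;string.h&gt;

/* Hypothetical scratch buffer; the driver uses GEM objects instead. */
struct pool_buf {
    struct pool_buf *next;
    size_t size;
    bool busy;
    unsigned char *data;
};

struct batch_pool {
    struct pool_buf *head;
};

/* Return an idle buffer of at least size bytes, allocating if needed. */
struct pool_buf *pool_get(struct batch_pool *pool, size_t size)
{
    struct pool_buf *buf;

    for (buf = pool-&gt;head; buf; buf = buf-&gt;next) {
        if (!buf-&gt;busy &amp;&amp; buf-&gt;size &gt;= size) {
            buf-&gt;busy = true;
            return buf;
        }
    }

    buf = malloc(sizeof(*buf));
    if (!buf)
        return NULL;
    buf-&gt;data = malloc(size ? size : 1);
    if (!buf-&gt;data) {
        free(buf);
        return NULL;
    }
    buf-&gt;size = size;
    buf-&gt;busy = true;
    buf-&gt;next = pool-&gt;head;
    pool-&gt;head = buf;
    return buf;
}

/* Hand the buffer back once the hardware is done with it. */
void pool_put(struct pool_buf *buf)
{
    buf-&gt;busy = false;
}

/* Copy the user batch so later user writes cannot change what was parsed. */
struct pool_buf *shadow_copy(struct batch_pool *pool,
                             const void *user_batch, size_t len)
{
    struct pool_buf *buf = pool_get(pool, len);

    if (buf)
        memcpy(buf-&gt;data, user_batch, len);
    return buf;
}
</pre><p> In this sketch the parser would inspect and submit the copy returned by <code class="function">shadow_copy</code>, never the user's original buffer. 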
</p></div><div class="sect2"><div class="titlepage"><div><div><h3 class="title"><a name="id-1.4.3.5.5"></a>Logical Rings, Logical Ring Contexts and Execlists</h3></div></div></div><div class="toc"><dl class="toc"><dt><span class="refentrytitle"><a href="API-intel-sanitize-enable-execlists.html"><span class="phrase">intel_sanitize_enable_execlists</span></a></span><span class="refpurpose"> — sanitize i915.enable_execlists </span></dt><dt><span class="refentrytitle"><a href="API-intel-execlists-ctx-id.html"><span class="phrase">intel_execlists_ctx_id</span></a></span><span class="refpurpose"> — get the Execlists Context ID </span></dt><dt><span class="refentrytitle"><a href="API-intel-lrc-irq-handler.html"><span class="phrase">intel_lrc_irq_handler</span></a></span><span class="refpurpose"> — handle Context Switch interrupts </span></dt><dt><span class="refentrytitle"><a href="API-intel-logical-ring-begin.html"><span class="phrase">intel_logical_ring_begin</span></a></span><span class="refpurpose"> — prepare the logical ringbuffer to accept some commands </span></dt><dt><span class="refentrytitle"><a href="API-intel-execlists-submission.html"><span class="phrase">intel_execlists_submission</span></a></span><span class="refpurpose"> — submit a batchbuffer for execution, Execlists style </span></dt><dt><span class="refentrytitle"><a href="API-gen8-init-indirectctx-bb.html"><span class="phrase">gen8_init_indirectctx_bb</span></a></span><span class="refpurpose"> — initialize indirect ctx batch with WA </span></dt><dt><span class="refentrytitle"><a href="API-gen8-init-perctx-bb.html"><span class="phrase">gen8_init_perctx_bb</span></a></span><span class="refpurpose"> — initialize per ctx batch with WA </span></dt><dt><span class="refentrytitle"><a href="API-intel-logical-ring-cleanup.html"><span class="phrase">intel_logical_ring_cleanup</span></a></span><span class="refpurpose"> — deallocate the Engine Command Streamer 
</span></dt><dt><span class="refentrytitle"><a href="API-intel-logical-rings-init.html"><span class="phrase">intel_logical_rings_init</span></a></span><span class="refpurpose"> — allocate, populate and init the Engine Command Streamers </span></dt><dt><span class="refentrytitle"><a href="API-intel-lr-context-free.html"><span class="phrase">intel_lr_context_free</span></a></span><span class="refpurpose"> — free the LRC specific bits of a context </span></dt><dt><span class="refentrytitle"><a href="API-intel-lr-context-deferred-alloc.html"><span class="phrase">intel_lr_context_deferred_alloc</span></a></span><span class="refpurpose"> — create the LRC specific bits of a context </span></dt></dl></div><p> </p><p> Motivation: GEN8 brings an expansion of the HW contexts: <span class="quote">“<span class="quote">Logical Ring Contexts</span>”</span>. These expanded contexts enable a number of new abilities, especially <span class="quote">“<span class="quote">Execlists</span>”</span> (also implemented in this file). </p><p> One of the main differences with the legacy HW contexts is that logical ring contexts incorporate many more things into the context's state, like PDPs or ringbuffer control registers: </p><p> The reason why PDPs are included in the context is straightforward: as PPGTTs (per-process GTTs) are actually per-context, having the PDPs contained there means you don't need to do a ppgtt->switch_mm yourself; instead, the GPU will do it for you on the context switch. </p><p> But what about the ringbuffer control registers (head, tail, etc.)? Shouldn't we just need a set of those per engine command streamer? 
This is where the name <span class="quote">“<span class="quote">Logical Rings</span>”</span> starts to make sense: by virtualizing the rings, the engine cs shifts to a new <span class="quote">“<span class="quote">ring buffer</span>”</span> with every context switch. When you want to submit a workload to the GPU you: A) choose your context, B) find its appropriate virtualized ring, C) write commands to it and then, finally, D) tell the GPU to switch to that context. </p><p> Instead of the legacy MI_SET_CONTEXT, the way you tell the GPU to switch to a context is via a context execution list, ergo <span class="quote">“<span class="quote">Execlists</span>”</span>. </p><p> LRC implementation: Regarding the creation of contexts, we have: </p><p> - One global default context. - One local default context for each opened fd. - One local extra context for each context create ioctl call. </p><p> Now that ringbuffers belong per-context (and not per-engine, like before) and that contexts are uniquely tied to a given engine (and not reusable, like before) we need: </p><p> - One ringbuffer per-engine inside each context. - One backing object per-engine inside each context. </p><p> The global default context starts its life with these new objects fully allocated and populated. The local default context for each opened fd is more complex, because we don't know at creation time which engine is going to use it. To handle this, we have implemented a deferred creation of LR contexts: </p><p> The local context starts its life as a hollow or blank holder, that only gets populated for a given engine once we receive an execbuffer. If later on we receive another execbuffer ioctl for the same context but a different engine, we allocate/populate a new ringbuffer and context backing object and so on. 
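</p><p> The deferred creation described above can be sketched in plain C as follows. The names (<code class="function">struct lr_context</code>, <code class="function">lr_context_prepare</code>), the engine count and the buffer sizes are assumptions chosen for illustration, and heap allocations stand in for the real ringbuffer and backing objects. </p><pre class="programlisting">
#include &lt;stdlib.h&gt;

/* Assumed values for this sketch only. */
enum { NUM_ENGINES = 4, RING_BYTES = 4096, STATE_BYTES = 4096 };

struct engine_state {
    void *ringbuf; /* per-context ringbuffer for this engine */
    void *backing; /* context backing object for this engine */
};

struct lr_context {
    /* All pointers NULL at creation time: a "hollow" context. */
    struct engine_state engine[NUM_ENGINES];
};

/*
 * Called on the execbuffer path. Returns 1 if the per-engine state was
 * allocated by this call, 0 if it already existed, -1 on error.
 */
int lr_context_prepare(struct lr_context *ctx, unsigned int engine_id)
{
    struct engine_state *es;
    void *ring, *backing;

    if (engine_id &gt;= NUM_ENGINES)
        return -1;
    es = &amp;ctx-&gt;engine[engine_id];
    if (es-&gt;ringbuf)
        return 0; /* already populated for this engine */

    ring = malloc(RING_BYTES);
    backing = calloc(1, STATE_BYTES);
    if (!ring || !backing) {
        free(ring);
        free(backing);
        return -1;
    }
    es-&gt;ringbuf = ring;
    es-&gt;backing = backing;
    return 1;
}
</pre><p> Engines that a context never submits to cost nothing in this scheme, since their slots stay empty. 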
</p><p> Finally, regarding local contexts created using the ioctl call: as they are only allowed with the render ring, we can allocate and populate them right away (no need to defer anything, at least for now). </p><p> Execlists implementation: Execlists are the new method by which, on gen8+ hardware, workloads are submitted for execution (as opposed to the legacy, ringbuffer-based, method). This method works as follows: </p><p> When a request is committed, its commands (the BB start and any leading or trailing commands, like the seqno breadcrumbs) are placed in the ringbuffer for the appropriate context. The tail pointer in the hardware context is not updated at this time, but instead, kept by the driver in the ringbuffer structure. A structure representing this request is added to a request queue for the appropriate engine: this structure contains a copy of the context's tail after the request was written to the ring buffer and a pointer to the context itself. </p><p> If the engine's request queue was empty before the request was added, the queue is processed immediately. Otherwise the queue will be processed during a context switch interrupt. In any case, elements on the queue will get sent (in pairs) to the GPU's ExecLists Submit Port (ELSP, for short) with a globally unique 20-bit submission ID. </p><p> When execution of a request completes, the GPU updates the context status buffer with a context complete event and generates a context switch interrupt. During the interrupt handling, the driver examines the events in the buffer: for each context complete event, if the announced ID matches that on the head of the request queue, then that request is retired and removed from the queue. </p><p> After processing, if any requests were retired and the queue is not empty then a new execution list can be submitted. 
The two requests at the front of the queue are next to be submitted but since a context may not occur twice in an execution list, if subsequent requests have the same ID as the first then the two requests must be combined. This is done simply by discarding requests at the head of the queue until either only one request is left (in which case we use a NULL second context) or the first two requests have unique IDs. </p><p> By always executing the first two requests in the queue the driver ensures that the GPU is kept as busy as possible. In the case where a single context completes but a second context is still executing, the request for this second context will be at the head of the queue when we remove the first one. This request will then be resubmitted along with a new request for a different context, which will cause the hardware to continue executing the second request and queue the new request (the GPU detects the condition of a context getting preempted with the same context and optimizes the context switch flow by not doing preemption, but just sampling the new tail pointer). </p><p> </p></div><div class="sect2"><div class="titlepage"><div><div><h3 class="title"><a name="id-1.4.3.5.6"></a>Global GTT views</h3></div></div></div><div class="toc"><dl class="toc"><dt><span class="refentrytitle"><a href="API-gen8-ppgtt-alloc-pagetabs.html"><span class="phrase">gen8_ppgtt_alloc_pagetabs</span></a></span><span class="refpurpose"> — Allocate page tables for VA range. </span></dt><dt><span class="refentrytitle"><a href="API-gen8-ppgtt-alloc-page-directories.html"><span class="phrase">gen8_ppgtt_alloc_page_directories</span></a></span><span class="refpurpose"> — Allocate page directories for VA range. 
</span></dt><dt><span class="refentrytitle"><a href="API-gen8-ppgtt-alloc-page-dirpointers.html"><span class="phrase">gen8_ppgtt_alloc_page_dirpointers</span></a></span><span class="refpurpose"> — Allocate pdps for VA range. </span></dt><dt><span class="refentrytitle"><a href="API-i915-vma-bind.html"><span class="phrase">i915_vma_bind</span></a></span><span class="refpurpose"> — Sets up PTEs for a VMA in its corresponding address space. </span></dt><dt><span class="refentrytitle"><a href="API-i915-ggtt-view-size.html"><span class="phrase">i915_ggtt_view_size</span></a></span><span class="refpurpose"> — Get the size of a GGTT view. </span></dt></dl></div><p> </p><p> Background and previous state </p><p> Historically objects could exist (be bound) in global GTT space only as singular instances with a view representing all of the object's backing pages in a linear fashion. This view will be called a normal view. </p><p> To support multiple views of the same object, where the number of mapped pages is not equal to the backing store, or where the layout of the pages is not linear, the concept of a GGTT view was added. </p><p> One example of an alternative view is a stereo display driven by a single image. In this case we would have a framebuffer looking like this (2x2 pages): </p><p> 12<br>34 </p><p> Above would represent a normal GGTT view as normally mapped for GPU or CPU rendering. In contrast, fed to the display engine would be an alternative view which could look something like this: </p><p> 1212<br>3434 </p><p> In this example both the size and layout of pages in the alternative view are different from the normal view. </p><p> Implementation and usage </p><p> GGTT views are implemented using VMAs and are distinguished via enum i915_ggtt_view_type and struct i915_ggtt_view. 
</p><p> A new flavour of core GEM functions which work with GGTT bound objects was added with the _ggtt_ infix, and sometimes with the _view postfix, to avoid renaming in large amounts of code. They take the struct i915_ggtt_view parameter encapsulating all metadata required to implement a view. </p><p> As a helper for callers which are only interested in the normal view, a globally const i915_ggtt_view_normal singleton instance exists. All old core GEM API functions, the ones not taking the view parameter, operate on, or with, the normal GGTT view. </p><p> Code wanting to add or use a new GGTT view needs to: </p><p> 1. Add a new enum with a suitable name. 2. Extend the metadata in the i915_ggtt_view structure if required. 3. Add support to <code class="function">i915_get_vma_pages</code>. </p><p> New views are required to build a scatter-gather table from within the i915_get_vma_pages function. This table is stored in the vma.ggtt_view and exists for the lifetime of a VMA. </p><p> The core API is designed to have copy semantics, which means that the passed-in struct i915_ggtt_view does not need to be persistent (left around after calling the core API functions). 
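</p><p> To make the stereo example above concrete, the sketch below builds the page order of a view from the pages of the backing store. It is a simplified stand-in for the scatter-gather table construction: the <code class="function">enum view_type</code> values and the <code class="function">build_view_pages</code> function are hypothetical, and page numbers are plain integers rather than real page references. </p><pre class="programlisting">
#include &lt;stddef.h&gt;

/* Hypothetical view types for this sketch. */
enum view_type { VIEW_NORMAL, VIEW_STEREO };

/* Number of pages the view maps for a w x h page backing store. */
size_t view_size(enum view_type type, size_t w, size_t h)
{
    return (type == VIEW_STEREO ? 2 : 1) * w * h;
}

/*
 * Fill dst with the backing pages in the order the view presents them.
 * dst must have room for view_size(type, w, h) entries.
 * Returns the number of entries written.
 */
size_t build_view_pages(enum view_type type, const unsigned int *src,
                        size_t w, size_t h, unsigned int *dst)
{
    size_t repeat = (type == VIEW_STEREO) ? 2 : 1;
    size_t n = 0;

    for (size_t row = 0; row &lt; h; row++)
        for (size_t r = 0; r &lt; repeat; r++)
            for (size_t col = 0; col &lt; w; col++)
                dst[n++] = src[row * w + col];
    return n;
}
</pre><p> With a 2x2 backing store holding pages 1, 2, 3, 4, the normal view yields 1, 2, 3, 4 and the stereo view yields 1, 2, 1, 2, 3, 4, 3, 4, matching the two layouts shown earlier. 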
</p><p> </p></div><div class="sect2"><div class="titlepage"><div><div><h3 class="title"><a name="id-1.4.3.5.7"></a>GTT Fences and Swizzling</h3></div></div></div><div class="toc"><dl class="toc"><dt><span class="refentrytitle"><a href="API-i915-gem-object-put-fence.html"><span class="phrase">i915_gem_object_put_fence</span></a></span><span class="refpurpose"> — force-remove fence for an object </span></dt><dt><span class="refentrytitle"><a href="API-i915-gem-object-get-fence.html"><span class="phrase">i915_gem_object_get_fence</span></a></span><span class="refpurpose"> — set up fencing for an object </span></dt><dt><span class="refentrytitle"><a href="API-i915-gem-object-pin-fence.html"><span class="phrase">i915_gem_object_pin_fence</span></a></span><span class="refpurpose"> — pin fencing state </span></dt><dt><span class="refentrytitle"><a href="API-i915-gem-object-unpin-fence.html"><span class="phrase">i915_gem_object_unpin_fence</span></a></span><span class="refpurpose"> — unpin fencing state </span></dt><dt><span class="refentrytitle"><a href="API-i915-gem-restore-fences.html"><span class="phrase">i915_gem_restore_fences</span></a></span><span class="refpurpose"> — restore fence state </span></dt><dt><span class="refentrytitle"><a href="API-i915-gem-detect-bit-6-swizzle.html"><span class="phrase">i915_gem_detect_bit_6_swizzle</span></a></span><span class="refpurpose"> — detect bit 6 swizzling pattern </span></dt><dt><span class="refentrytitle"><a href="API-i915-gem-object-do-bit-17-swizzle.html"><span class="phrase">i915_gem_object_do_bit_17_swizzle</span></a></span><span class="refpurpose"> — fixup bit 17 swizzling </span></dt><dt><span class="refentrytitle"><a href="API-i915-gem-object-save-bit-17-swizzle.html"><span class="phrase">i915_gem_object_save_bit_17_swizzle</span></a></span><span class="refpurpose"> — save bit 17 swizzling </span></dt><dt><span class="sect3"><a 
href="ch04s03.html#id-1.4.3.5.7.10">Global GTT Fence Handling</a></span></dt><dt><span class="sect3"><a href="ch04s03.html#id-1.4.3.5.7.11">Hardware Tiling and Swizzling Details</a></span></dt></dl></div><div class="sect3"><div class="titlepage"><div><div><h4 class="title"><a name="id-1.4.3.5.7.10"></a>Global GTT Fence Handling</h4></div></div></div><p> </p><p> Important to avoid confusion: <span class="quote">“<span class="quote">fences</span>”</span> in the i915 driver are not execution fences used to track command completion but hardware detiler objects which wrap a given range of the global GTT. Each platform has only a fairly limited set of these objects. </p><p> Fences are used to detile GTT memory mappings. They're also connected to the hardware frontbuffer render tracking and hence interact with frontbuffer compression. Furthermore, on older platforms fences are required for tiled objects used by the display engine. They can also be used by the render engine - they're required for blitter commands and are optional for render commands. But on gen4+ both display (with the exception of fbc) and rendering have their own tiling state bits and don't need fences. </p><p> Also note that fences only support X and Y tiling and hence can't be used for the fancier new tiling formats like W, Ys and Yf. </p><p> Finally note that because fences are such a restricted resource they're dynamically associated with objects. Furthermore, fence state is committed to the hardware lazily to avoid unnecessary stalls on gen2/3. Therefore code must explicitly call <code class="function"><a class="link" href="API-i915-gem-object-get-fence.html" title="i915_gem_object_get_fence">i915_gem_object_get_fence</a></code> to synchronize fencing status for cpu access. 
Also note that some code wants an unfenced view; for those cases the fence can be removed forcefully with <code class="function"><a class="link" href="API-i915-gem-object-put-fence.html" title="i915_gem_object_put_fence">i915_gem_object_put_fence</a></code>. </p><p> Internally these functions will synchronize with userspace access by removing CPU ptes into GTT mmaps (not the GTT ptes themselves) as needed. </p></div><div class="sect3"><div class="titlepage"><div><div><h4 class="title"><a name="id-1.4.3.5.7.11"></a>Hardware Tiling and Swizzling Details</h4></div></div></div><p> </p><p> The idea behind tiling is to increase cache hit rates by rearranging pixel data so that a group of pixel accesses are in the same cacheline. Performance improvements from doing this on the back/depth buffer are on the order of 30%. </p><p> Intel architectures make this somewhat more complicated, though, by adjustments made to addressing of data when the memory is in interleaved mode (matched pairs of DIMMs) to improve memory bandwidth. For interleaved memory, the CPU sends every sequential 64 bytes to an alternate memory channel so it can get the bandwidth from both. </p><p> The GPU also rearranges its accesses for increased bandwidth to interleaved memory, and it matches what the CPU does for non-tiled. However, when tiled it does it a little differently, since one walks addresses not just in the X direction but also Y. So, along with alternating channels when bit 6 of the address flips, it also alternates when other bits flip: bits 9 (every 512 bytes, an X tile scanline) and 10 (every two X tile scanlines) are common to both the 915 and 965-class hardware. </p><p> The CPU also sometimes XORs in higher bits as well, to improve bandwidth doing strided access like we do so frequently in graphics. 
This is called <span class="quote">“<span class="quote">Channel XOR Randomization</span>”</span> in the MCH documentation. The result is that the CPU is XORing in either bit 11 or bit 17 to bit 6 of its address decode. </p><p> All of this bit 6 XORing has an effect on our memory management, as we need to make sure that the 3d driver can correctly address object contents. </p><p> If we don't have interleaved memory, all tiling is safe and no swizzling is required. </p><p> When bit 17 is XORed in, we simply refuse to tile at all. Bit 17 is not just a page offset, so as we page an object out and back in, individual pages in it will have different bit 17 addresses, resulting in each 64 bytes being swapped with its neighbor! </p><p> Otherwise, if interleaved, we have to tell the 3d driver what address swizzling it needs to do, since it's writing with the CPU to the pages (bit 6 and potentially bit 11 XORed in), and the GPU is reading from the pages (bit 6, 9, and 10 XORed in), resulting in a cumulative bit swizzling required by the CPU of XORing in bit 6, 9, 10, and potentially 11, in order to match what the GPU expects. 
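</p><p> As a small worked example of the last case, the helper below computes the offset a CPU writer would use so that bit 6 ends up XORed with bits 9 and 10, and optionally bit 11. The function name <code class="function">swizzle_offset</code> and its <code class="function">bit11</code> parameter are assumptions for this sketch; it is not taken from the driver or from any userspace library. </p><pre class="programlisting">
#include &lt;stdbool.h&gt;
#include &lt;stdint.h&gt;

/*
 * XOR bits 9 and 10 (and bit 11 if requested) of the offset into bit 6.
 * Only bit 6 can change, so applying the function twice returns the
 * original offset.
 */
uint64_t swizzle_offset(uint64_t offset, bool bit11)
{
    uint64_t flip = ((offset &gt;&gt; 9) ^ (offset &gt;&gt; 10)) &amp; 1;

    if (bit11)
        flip ^= (offset &gt;&gt; 11) &amp; 1;
    return offset ^ (flip &lt;&lt; 6);
}
</pre><p> For instance, offset 0x200 (only bit 9 set) maps to 0x240, while 0x600 (bits 9 and 10 set) is left unchanged because the two XOR contributions cancel. 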
</p></div></div><div class="sect2"><div class="titlepage"><div><div><h3 class="title"><a name="id-1.4.3.5.8"></a>Object Tiling IOCTLs</h3></div></div></div><div class="toc"><dl class="toc"><dt><span class="refentrytitle"><a href="API-i915-gem-set-tiling.html"><span class="phrase">i915_gem_set_tiling</span></a></span><span class="refpurpose"> — IOCTL handler to set tiling mode </span></dt><dt><span class="refentrytitle"><a href="API-i915-gem-get-tiling.html"><span class="phrase">i915_gem_get_tiling</span></a></span><span class="refpurpose"> — IOCTL handler to get tiling mode </span></dt></dl></div><p> </p><p> <code class="function"><a class="link" href="API-i915-gem-set-tiling.html" title="i915_gem_set_tiling">i915_gem_set_tiling</a></code> and <code class="function"><a class="link" href="API-i915-gem-get-tiling.html" title="i915_gem_get_tiling">i915_gem_get_tiling</a></code> are the userspace interface to declare fence register requirements. </p><p> In principle GEM doesn't care at all about the internal data layout of an object, and hence it also doesn't care about tiling or swizzling. There are two exceptions: </p><p> - For X and Y tiling the hardware provides detilers for CPU access, so called fences. Since there's only a limited number of them the kernel must manage these, and therefore userspace must tell the kernel the object tiling if it wants to use fences for detiling. - Gen3 and gen4 platforms have a swizzling pattern for tiled objects which depends upon the physical page frame number. When swapping such objects the page frame number might change and the kernel must be able to fix this up and hence know the tiling. Note that on a subset of platforms with asymmetric memory channel population the swizzling pattern changes in an unknown way, and for those the kernel simply forbids swapping completely. 
</p><p> Since neither of these applies to new tiling layouts on modern platforms, like W, Ys and Yf tiling, GEM only allows object tiling to be set to X or Y tiled. Anything else can be handled in userspace entirely without the kernel's involvement. </p></div><div class="sect2"><div class="titlepage"><div><div><h3 class="title"><a name="id-1.4.3.5.9"></a>Buffer Object Eviction</h3></div></div></div><div class="toc"><dl class="toc"><dt><span class="refentrytitle"><a href="API-i915-gem-evict-something.html"><span class="phrase">i915_gem_evict_something</span></a></span><span class="refpurpose"> — Evict vmas to make room for binding a new one </span></dt><dt><span class="refentrytitle"><a href="API-i915-gem-evict-vm.html"><span class="phrase">i915_gem_evict_vm</span></a></span><span class="refpurpose"> — Evict all idle vmas from a vm </span></dt></dl></div><p> This section documents the interface functions for evicting buffer objects to make space available in the virtual GPU address spaces. Note that this is mostly orthogonal to shrinking buffer object caches, which has the goal of making main memory (shared with the GPU through the unified memory architecture) available. 
</p></div><div class="sect2"><div class="titlepage"><div><div><h3 class="title"><a name="id-1.4.3.5.10"></a>Buffer Object Memory Shrinking</h3></div></div></div><div class="toc"><dl class="toc"><dt><span class="refentrytitle"><a href="API-i915-gem-shrink.html"><span class="phrase">i915_gem_shrink</span></a></span><span class="refpurpose"> — Shrink buffer object caches </span></dt><dt><span class="refentrytitle"><a href="API-i915-gem-shrink-all.html"><span class="phrase">i915_gem_shrink_all</span></a></span><span class="refpurpose"> — Shrink buffer object caches completely </span></dt><dt><span class="refentrytitle"><a href="API-i915-gem-shrinker-init.html"><span class="phrase">i915_gem_shrinker_init</span></a></span><span class="refpurpose"> — Initialize i915 shrinker </span></dt></dl></div><p> This section documents the interface functions for shrinking memory usage of buffer object caches. Shrinking is used to make main memory available. Note that this is mostly orthogonal to evicting buffer objects, which has the goal of making space in GPU virtual address spaces. </p></div></div><div class="navfooter"><hr><table width="100%" summary="Navigation footer"><tr><td width="40%" align="left"><a accesskey="p" href="API-intel-csr-ucode-fini.html">Prev</a> </td><td width="20%" align="center"><a accesskey="u" href="drmI915.html">Up</a></td><td width="40%" align="right"> <a accesskey="n" href="API-i915-cmd-parser-init-ring.html">Next</a></td></tr><tr><td width="40%" align="left" valign="top"><span class="phrase">intel_csr_ucode_fini</span> </td><td width="20%" align="center"><a accesskey="h" href="index.html">Home</a></td><td width="40%" align="right" valign="top"> <span class="phrase">i915_cmd_parser_init_ring</span></td></tr></table></div></body></html>