<html><head><meta http-equiv="Content-Type" content="text/html; charset=ANSI_X3.4-1968"><title>Memory Management and Command Submission</title><meta name="generator" content="DocBook XSL Stylesheets V1.78.1"><link rel="home" href="index.html" title="Linux DRM Developer's Guide"><link rel="up" href="drmI915.html" title="Chapter 4. drm/i915 Intel GFX Driver"><link rel="prev" href="API-intel-dp-drrs-init.html" title="intel_dp_drrs_init"><link rel="next" href="API-i915-cmd-parser-init-ring.html" title="i915_cmd_parser_init_ring"></head><body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF"><div class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="3" align="center">Memory Management and Command Submission</th></tr><tr><td width="20%" align="left"><a accesskey="p" href="API-intel-dp-drrs-init.html">Prev</a> </td><th width="60%" align="center">Chapter 4. drm/i915 Intel GFX Driver</th><td width="20%" align="right"> <a accesskey="n" href="API-i915-cmd-parser-init-ring.html">Next</a></td></tr></table><hr></div><div class="sect1"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="idp1128257548"></a>Memory Management and Command Submission</h2></div></div></div><div class="toc"><dl class="toc"><dt><span class="sect2"><a href="ch04s03.html#idp1128258212">Batchbuffer Parsing</a></span></dt><dt><span class="sect2"><a href="ch04s03.html#idp1128310204">Batchbuffer Pools</a></span></dt><dt><span class="sect2"><a href="ch04s03.html#idp1128337188">Logical Rings, Logical Ring Contexts and Execlists</a></span></dt><dt><span class="sect2"><a href="ch04s03.html#idp1128435628">Global GTT views</a></span></dt><dt><span class="sect2"><a href="ch04s03.html#idp1128473508">Buffer Object Eviction</a></span></dt><dt><span class="sect2"><a href="ch04s03.html#idp1128510940">Buffer Object Memory Shrinking</a></span></dt></dl></div><p>
This section covers all things related to the GEM implementation in the
i915 driver.
</p><div class="sect2"><div class="titlepage"><div><div><h3 class="title"><a name="idp1128258212"></a>Batchbuffer Parsing</h3></div></div></div><div class="toc"><dl class="toc"><dt><span class="refentrytitle"><a href="API-i915-cmd-parser-init-ring.html"><span class="phrase">i915_cmd_parser_init_ring</span></a></span><span class="refpurpose"> — set cmd parser related fields for a ringbuffer</span></dt><dt><span class="refentrytitle"><a href="API-i915-cmd-parser-fini-ring.html"><span class="phrase">i915_cmd_parser_fini_ring</span></a></span><span class="refpurpose"> — clean up cmd parser related fields</span></dt><dt><span class="refentrytitle"><a href="API-i915-needs-cmd-parser.html"><span class="phrase">i915_needs_cmd_parser</span></a></span><span class="refpurpose"> — should a given ring use software command parsing?</span></dt><dt><span class="refentrytitle"><a href="API-i915-parse-cmds.html"><span class="phrase">i915_parse_cmds</span></a></span><span class="refpurpose"> — parse a submitted batch buffer for privilege violations</span></dt><dt><span class="refentrytitle"><a href="API-i915-cmd-parser-get-version.html"><span class="phrase">i915_cmd_parser_get_version</span></a></span><span class="refpurpose"> — get the cmd parser version number</span></dt></dl></div><p></p><p>
Motivation:
Certain OpenGL features (e.g. transform feedback, performance monitoring)
require userspace code to submit batches containing commands such as
MI_LOAD_REGISTER_IMM to access various registers. Unfortunately, some
generations of the hardware will noop these commands in <span class="quote">“<span class="quote">unsecure</span>”</span> batches
(which include all userspace batches submitted via i915) even though the
commands may be safe and represent the intended programming model of the
device.
</p><p>
The software command parser is similar in operation to the command parsing
done in hardware for unsecure batches. However, the software parser allows
some operations that would be noop'd by hardware, if the parser determines
the operation is safe, and submits the batch as <span class="quote">“<span class="quote">secure</span>”</span> to prevent hardware
parsing.
</p><p>
Threats:
At a high level, the hardware (and software) checks attempt to prevent
granting userspace undue privileges. There are three categories of privilege.
</p><p>
First, commands which are explicitly defined as privileged or which should
only be used by the kernel driver. The parser generally rejects such
commands, though it may allow some from the drm master process.
</p><p>
Second, commands which access registers. To support correct/enhanced
userspace functionality, particularly certain OpenGL extensions, the parser
provides a whitelist of registers which userspace may safely access (for both
normal and drm master processes).
</p><p>
Third, commands which access privileged memory (e.g. GGTT, HWS page).
The parser always rejects such commands.
</p><p>
The majority of the problematic commands fall in the MI_* range, with only a
few specific commands on each ring (e.g. PIPE_CONTROL and MI_FLUSH_DW).
</p><p>
Implementation:
Each ring maintains tables of commands and registers which the parser uses in
scanning batch buffers submitted to that ring.
</p><p>
Since the set of commands that the parser must check for is significantly
smaller than the number of commands supported, the parser tables contain only
those commands required by the parser. This generally works because command
opcode ranges have standard command length encodings, so the parser can
easily skip commands that it does not need to check. This is
implemented via a per-ring length decoding vfunc.
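</p><p>
The skip-by-length scan described above can be sketched in plain C. This is a
minimal illustration under stated assumptions, not the driver's code: the
descriptor layout, the opcode values, the 9-bit opcode field, the length mask
and the names <code class="function">decode_len</code>, <code class="function">find_cmd</code> and <code class="function">scan_batch</code> are all
hypothetical.
</p>
<pre class="programlisting">
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor: one entry per command the parser must check. */
struct cmd_desc {
	uint32_t opcode;	/* value of the header's top 9 bits (made up) */
	int reject;		/* non-zero: a batch using this command is refused */
};

/* Hypothetical table: only the commands of interest are listed. */
static const struct cmd_desc cmd_table[] = {
	{ 0x022, 0 },	/* stand-in for a command the parser inspects */
	{ 0x018, 1 },	/* stand-in for a privileged command */
};

/* Assumed layout: the opcode lives in bits 31:23 of the header dword. */
static uint32_t opcode_of(uint32_t header)
{
	return header >> 23;
}

/* Assumed standard encoding: low 8 bits hold (length in dwords - 2). */
static uint32_t decode_len(uint32_t header)
{
	return (header & 0xff) + 2;
}

/* Linear lookup; returns NULL for commands the parser does not care about. */
static const struct cmd_desc *find_cmd(uint32_t header)
{
	for (size_t i = 0; i < sizeof(cmd_table) / sizeof(cmd_table[0]); i++)
		if (cmd_table[i].opcode == opcode_of(header))
			return &cmd_table[i];
	return NULL;
}

/* Returns 0 if the batch is acceptable, -1 otherwise. */
static int scan_batch(const uint32_t *batch, size_t ndwords)
{
	size_t pos = 0;

	assert(batch != NULL);
	while (pos < ndwords) {
		uint32_t header = batch[pos];
		const struct cmd_desc *desc = find_cmd(header);
		uint32_t len = decode_len(header);

		if (desc && desc->reject)
			return -1;	/* listed as forbidden */
		if (len > ndwords - pos)
			return -1;	/* command runs past the end of the batch */
		pos += len;		/* unlisted commands are skipped by length alone */
	}
	return 0;
}
```
</pre>
<p>
Because the table lists only the commands of interest, every other command
costs one failed lookup and one length decode. A real parser would use a
per-ring table and a per-ring length decoder in place of the single
hard-coded pair assumed here.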
</p><p>
Unfortunately, there are a number of commands that do not follow the standard
length encoding for their opcode range, primarily amongst the MI_* commands.
To handle this, the parser provides a way to define explicit <span class="quote">“<span class="quote">skip</span>”</span> entries
in the per-ring command tables.
</p><p>
Other command table entries map fairly directly to the high-level categories
mentioned above: rejected, master-only, register whitelist. The parser
implements a number of checks, including the privileged memory checks, via a
general bitmasking mechanism.
</p></div><div class="sect2"><div class="titlepage"><div><div><h3 class="title"><a name="idp1128310204"></a>Batchbuffer Pools</h3></div></div></div><div class="toc"><dl class="toc"><dt><span class="refentrytitle"><a href="API-i915-gem-batch-pool-init.html"><span class="phrase">i915_gem_batch_pool_init</span></a></span><span class="refpurpose"> — initialize a batch buffer pool</span></dt><dt><span class="refentrytitle"><a href="API-i915-gem-batch-pool-fini.html"><span class="phrase">i915_gem_batch_pool_fini</span></a></span><span class="refpurpose"> — clean up a batch buffer pool</span></dt><dt><span class="refentrytitle"><a href="API-i915-gem-batch-pool-get.html"><span class="phrase">i915_gem_batch_pool_get</span></a></span><span class="refpurpose"> — select a buffer from the pool</span></dt></dl></div><p></p><p>
In order to submit batch buffers as 'secure', the software command parser
must ensure that a batch buffer cannot be modified after parsing. It does
this by copying the user-provided batch buffer contents to a kernel-owned
buffer from which the hardware will actually execute, and by carefully
managing the address space bindings for such buffers.
</p><p>
The batch pool framework provides a mechanism for the driver to manage a
set of scratch buffers to use for this purpose.
The framework can be
extended to support other use cases should they arise.
</p></div><div class="sect2"><div class="titlepage"><div><div><h3 class="title"><a name="idp1128337188"></a>Logical Rings, Logical Ring Contexts and Execlists</h3></div></div></div><div class="toc"><dl class="toc"><dt><span class="refentrytitle"><a href="API-intel-sanitize-enable-execlists.html"><span class="phrase">intel_sanitize_enable_execlists</span></a></span><span class="refpurpose"> — sanitize i915.enable_execlists</span></dt><dt><span class="refentrytitle"><a href="API-intel-execlists-ctx-id.html"><span class="phrase">intel_execlists_ctx_id</span></a></span><span class="refpurpose"> — get the Execlists Context ID</span></dt><dt><span class="refentrytitle"><a href="API-intel-lrc-irq-handler.html"><span class="phrase">intel_lrc_irq_handler</span></a></span><span class="refpurpose"> — handle Context Switch interrupts</span></dt><dt><span class="refentrytitle"><a href="API-intel-execlists-submission.html"><span class="phrase">intel_execlists_submission</span></a></span><span class="refpurpose"> — submit a batchbuffer for execution, Execlists style</span></dt><dt><span class="refentrytitle"><a href="API-intel-logical-ring-begin.html"><span class="phrase">intel_logical_ring_begin</span></a></span><span class="refpurpose"> — prepare the logical ringbuffer to accept some commands</span></dt><dt><span class="refentrytitle"><a href="API-intel-logical-ring-cleanup.html"><span class="phrase">intel_logical_ring_cleanup</span></a></span><span class="refpurpose"> — deallocate the Engine Command Streamer</span></dt><dt><span class="refentrytitle"><a href="API-intel-logical-rings-init.html"><span class="phrase">intel_logical_rings_init</span></a></span><span class="refpurpose"> — allocate, populate and init the Engine Command Streamers</span></dt><dt><span class="refentrytitle"><a href="API-intel-lr-context-free.html"><span 
class="phrase">intel_lr_context_free</span></a></span><span class="refpurpose"> — free the LRC specific bits of a context</span></dt><dt><span class="refentrytitle"><a href="API-intel-lr-context-deferred-create.html"><span class="phrase">intel_lr_context_deferred_create</span></a></span><span class="refpurpose"> — create the LRC specific bits of a context</span></dt></dl></div><p></p><p>
Motivation:
GEN8 brings an expansion of the HW contexts: <span class="quote">“<span class="quote">Logical Ring Contexts</span>”</span>.
These expanded contexts enable a number of new abilities, especially
<span class="quote">“<span class="quote">Execlists</span>”</span> (also implemented in this file).
</p><p>
One of the main differences with the legacy HW contexts is that logical
ring contexts incorporate many more things into the context's state, like
PDPs or ringbuffer control registers:
</p><p>
The reason why PDPs are included in the context is straightforward: as
PPGTTs (per-process GTTs) are actually per-context, having the PDPs
contained there means you don't need to do a ppgtt->switch_mm yourself;
instead, the GPU will do it for you on the context switch.
</p><p>
But what about the ringbuffer control registers (head, tail, etc.)?
Shouldn't we just need a set of those per engine command streamer? This is
where the name <span class="quote">“<span class="quote">Logical Rings</span>”</span> starts to make sense: by virtualizing the
rings, the engine cs shifts to a new <span class="quote">“<span class="quote">ring buffer</span>”</span> with every context
switch. When you want to submit a workload to the GPU you: A) choose your
context, B) find its appropriate virtualized ring, C) write commands to it
and then, finally, D) tell the GPU to switch to that context.
</p><p>
Instead of the legacy MI_SET_CONTEXT, the way you tell the GPU to switch
to a context is via a context execution list, ergo <span class="quote">“<span class="quote">Execlists</span>”</span>.
</p><p>
LRC implementation:
Regarding the creation of contexts, we have:
</p><p>
- One global default context.
- One local default context for each opened fd.
- One local extra context for each context create ioctl call.
</p><p>
Now that ringbuffers are per-context (and not per-engine, like before)
and that contexts are uniquely tied to a given engine (and not reusable,
like before) we need:
</p><p>
- One ringbuffer per-engine inside each context.
- One backing object per-engine inside each context.
</p><p>
The global default context starts its life with these new objects fully
allocated and populated. The local default context for each opened fd is
more complex, because we don't know at creation time which engine is going
to use them. To handle this, we have implemented a deferred creation of LR
contexts:
</p><p>
The local context starts its life as a hollow or blank holder, that only
gets populated for a given engine once we receive an execbuffer. If later
on we receive another execbuffer ioctl for the same context but a different
engine, we allocate/populate a new ringbuffer and context backing object and
so on.
</p><p>
Finally, regarding local contexts created using the ioctl call: as they are
only allowed with the render ring, we can allocate and populate them right
away (no need to defer anything, at least for now).
</p><p>
Execlists implementation:
Execlists are the new method by which, on gen8+ hardware, workloads are
submitted for execution (as opposed to the legacy, ringbuffer-based, method).
This method works as follows:
</p><p>
When a request is committed, its commands (the BB start and any leading or
trailing commands, like the seqno breadcrumbs) are placed in the ringbuffer
for the appropriate context. The tail pointer in the hardware context is not
updated at this time, but instead, kept by the driver in the ringbuffer
structure. A structure representing this request is added to a request queue
for the appropriate engine: this structure contains a copy of the context's
tail after the request was written to the ring buffer and a pointer to the
context itself.
</p><p>
If the engine's request queue was empty before the request was added, the
queue is processed immediately. Otherwise the queue will be processed during
a context switch interrupt. In any case, elements on the queue will get sent
(in pairs) to the GPU's ExecLists Submit Port (ELSP, for short) with a
globally unique 20-bit submission ID.
</p><p>
When execution of a request completes, the GPU updates the context status
buffer with a context complete event and generates a context switch interrupt.
During the interrupt handling, the driver examines the events in the buffer:
for each context complete event, if the announced ID matches that on the head
of the request queue, then that request is retired and removed from the queue.
</p><p>
After processing, if any requests were retired and the queue is not empty
then a new execution list can be submitted. The two requests at the front of
the queue are next to be submitted but since a context may not occur twice in
an execution list, if subsequent requests have the same ID as the first then
the two requests must be combined. This is done simply by discarding requests
at the head of the queue until either only one request is left (in which case
we use a NULL second context) or the first two requests have unique IDs.
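</p><p>
The selection rule in the previous paragraph can be sketched as follows. This
is a simplified model, not the driver's implementation: <code class="function">pick_pair</code>, the
array-based queue and both structures are hypothetical, and the sketch assumes
that a later request's tail covers the commands of earlier requests from the
same context, which is what makes discarding the earlier ones safe.
</p>
<pre class="programlisting">
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical queued request: context ID plus the ring tail recorded
 * after the request's commands were written. */
struct request {
	uint32_t ctx_id;
	uint32_t tail;
};

/* Hypothetical pair handed to the submit port; second may be NULL. */
struct elsp_pair {
	const struct request *first;
	const struct request *second;
};

/*
 * Choose the next execution list from queue[*head .. count).
 * While the request behind the head belongs to the same context, the head
 * is discarded (*head advances). The loop stops when only one request is
 * left or the first two requests have different context IDs.
 */
static struct elsp_pair pick_pair(const struct request *queue, size_t count,
				  size_t *head)
{
	struct elsp_pair pair = { NULL, NULL };

	assert(queue != NULL && head != NULL && *head <= count);
	if (*head == count)
		return pair;		/* nothing queued */

	while (*head + 1 < count &&
	       queue[*head].ctx_id == queue[*head + 1].ctx_id)
		(*head)++;		/* superseded by the next request */

	pair.first = &queue[*head];
	if (*head + 1 < count)
		pair.second = &queue[*head + 1];	/* else NULL second context */
	return pair;
}
```
</pre>
<p>
With a queue holding three requests for context 1 followed by two for
context 2, the sketch submits the third request of context 1 together with
the first request of context 2. With only same-context requests queued, it
submits the last one alone and leaves the second slot NULL.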
</p><p>
By always executing the first two requests in the queue the driver ensures
that the GPU is kept as busy as possible. In the case where a single context
completes but a second context is still executing, the request for this second
context will be at the head of the queue when we remove the first one. This
request will then be resubmitted along with a new request for a different context,
which will cause the hardware to continue executing the second request and queue
the new request (the GPU detects the condition of a context getting preempted
with the same context and optimizes the context switch flow by not doing
preemption, but just sampling the new tail pointer).
</p><p></p></div><div class="sect2"><div class="titlepage"><div><div><h3 class="title"><a name="idp1128435628"></a>Global GTT views</h3></div></div></div><div class="toc"><dl class="toc"><dt><span class="refentrytitle"><a href="API-i915-dma-map-single.html"><span class="phrase">i915_dma_map_single</span></a></span><span class="refpurpose"> — Create a dma mapping for a page table/dir/etc.</span></dt><dt><span class="refentrytitle"><a href="API-alloc-pt-range.html"><span class="phrase">alloc_pt_range</span></a></span><span class="refpurpose"> — Allocate multiple page tables</span></dt><dt><span class="refentrytitle"><a href="API-i915-vma-bind.html"><span class="phrase">i915_vma_bind</span></a></span><span class="refpurpose"> — Sets up PTEs for a VMA in its corresponding address space.</span></dt></dl></div><p></p><p>
Background and previous state
</p><p>
Historically objects could exist (be bound) in global GTT space only as
singular instances with a view representing all of the object's backing pages
in a linear fashion. This view will be called a normal view.
</p><p>
To support multiple views of the same object, where the number of mapped
pages is not equal to the backing store, or where the layout of the pages
is not linear, the concept of a GGTT view was added.
</p><p>
One example of an alternative view is a stereo display driven by a single
image. In this case we would have a framebuffer looking like this
(2x2 pages):
</p><p>
12
34
</p><p>
The above represents a normal GGTT view as normally mapped for GPU or CPU
rendering. In contrast, the display engine would be fed an alternative
view which could look something like this:
</p><p>
1212
3434
</p><p>
In this example both the size and layout of pages in the alternative view are
different from the normal view.
</p><p>
Implementation and usage
</p><p>
GGTT views are implemented using VMAs and are distinguished via enum
i915_ggtt_view_type and struct i915_ggtt_view.
</p><p>
A new flavour of core GEM functions which work with GGTT bound objects was
added with the _ggtt_ infix, and sometimes with the _view postfix, to avoid
renaming in large amounts of code. They take the struct i915_ggtt_view
parameter encapsulating all metadata required to implement a view.
</p><p>
As a helper for callers which are only interested in the normal view, a
globally const i915_ggtt_view_normal singleton instance exists. All old core
GEM API functions, the ones not taking the view parameter, operate on,
or with, the normal GGTT view.
</p><p>
Code wanting to add or use a new GGTT view needs to:
</p><p>
1. Add a new enum with a suitable name.
2. Extend the metadata in the i915_ggtt_view structure if required.
3. Add support to <code class="function">i915_get_vma_pages</code>.
</p><p>
New views are required to build a scatter-gather table from within the
i915_get_vma_pages function.
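</p><p>
To make the stereo example above concrete, the following sketch computes which
backing page each slot of such an alternative view refers to. It is a plain C
illustration only: <code class="function">build_repeated_view</code> is a hypothetical helper that fills an
array of zero-based page indices (pages 1 to 4 in the diagrams become 0 to 3),
whereas a real view builds a scatter-gather table.
</p>
<pre class="programlisting">
```c
#include <assert.h>
#include <stddef.h>

/*
 * Fill view[] with backing-page indices for a view that shows each row of a
 * width x height page grid `repeat` times side by side.
 * Returns the number of entries written, or 0 if view[] is too small.
 */
static size_t build_repeated_view(size_t width, size_t height, size_t repeat,
				  size_t *view, size_t view_len)
{
	size_t n = 0;

	assert(view != NULL);
	if (width * height * repeat > view_len)
		return 0;

	for (size_t row = 0; row < height; row++)
		for (size_t r = 0; r < repeat; r++)
			for (size_t col = 0; col < width; col++)
				view[n++] = row * width + col;	/* index into the normal view */
	return n;
}
```
</pre>
<p>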
This table is stored in the vma.ggtt_view and
exists for the lifetime of a VMA.
</p><p>
The core API is designed to have copy semantics, which means that the passed-in
struct i915_ggtt_view does not need to be persistent (left around after
calling the core API functions).
</p><p></p></div><div class="sect2"><div class="titlepage"><div><div><h3 class="title"><a name="idp1128473508"></a>Buffer Object Eviction</h3></div></div></div><div class="toc"><dl class="toc"><dt><span class="refentrytitle"><a href="API-i915-gem-evict-something.html"><span class="phrase">i915_gem_evict_something</span></a></span><span class="refpurpose"> — Evict vmas to make room for binding a new one</span></dt><dt><span class="refentrytitle"><a href="API-i915-gem-evict-vm.html"><span class="phrase">i915_gem_evict_vm</span></a></span><span class="refpurpose"> — Evict all idle vmas from a vm</span></dt><dt><span class="refentrytitle"><a href="API-i915-gem-evict-everything.html"><span class="phrase">i915_gem_evict_everything</span></a></span><span class="refpurpose"> — Try to evict all objects</span></dt></dl></div><p>
This section documents the interface functions for evicting buffer
objects to make space available in the virtual gpu address spaces.
Note that this is mostly orthogonal to shrinking buffer object
caches, which has the goal of making main memory (shared with the gpu
through the unified memory architecture) available.
</p></div><div class="sect2"><div class="titlepage"><div><div><h3 class="title"><a name="idp1128510940"></a>Buffer Object Memory Shrinking</h3></div></div></div><div class="toc"><dl class="toc"><dt><span class="refentrytitle"><a href="API-i915-gem-shrink.html"><span class="phrase">i915_gem_shrink</span></a></span><span class="refpurpose"> — Shrink buffer object caches</span></dt><dt><span class="refentrytitle"><a href="API-i915-gem-shrink-all.html"><span class="phrase">i915_gem_shrink_all</span></a></span><span class="refpurpose"> — Shrink buffer object caches completely</span></dt><dt><span class="refentrytitle"><a href="API-i915-gem-shrinker-init.html"><span class="phrase">i915_gem_shrinker_init</span></a></span><span class="refpurpose"> — Initialize i915 shrinker</span></dt></dl></div><p>
This section documents the interface functions for shrinking memory
usage of buffer object caches. Shrinking is used to make main memory
available. Note that this is mostly orthogonal to evicting buffer
objects, which has the goal of making space in gpu virtual address
spaces.
</p></div></div><div class="navfooter"><hr><table width="100%" summary="Navigation footer"><tr><td width="40%" align="left"><a accesskey="p" href="API-intel-dp-drrs-init.html">Prev</a> </td><td width="20%" align="center"><a accesskey="u" href="drmI915.html">Up</a></td><td width="40%" align="right"> <a accesskey="n" href="API-i915-cmd-parser-init-ring.html">Next</a></td></tr><tr><td width="40%" align="left" valign="top"><span class="phrase">intel_dp_drrs_init</span> </td><td width="20%" align="center"><a accesskey="h" href="index.html">Home</a></td><td width="40%" align="right" valign="top"> <span class="phrase">i915_cmd_parser_init_ring</span></td></tr></table></div></body></html>