1<html><head><meta http-equiv="Content-Type" content="text/html; charset=ANSI_X3.4-1968"><title>Memory Management and Command Submission</title><meta name="generator" content="DocBook XSL Stylesheets V1.78.1"><link rel="home" href="index.html" title="Linux DRM Developer's Guide"><link rel="up" href="drmI915.html" title="Chapter&#160;4.&#160;drm/i915 Intel GFX Driver"><link rel="prev" href="API-intel-dp-drrs-init.html" title="intel_dp_drrs_init"><link rel="next" href="API-i915-cmd-parser-init-ring.html" title="i915_cmd_parser_init_ring"></head><body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF"><div class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="3" align="center">Memory Management and Command Submission</th></tr><tr><td width="20%" align="left"><a accesskey="p" href="API-intel-dp-drrs-init.html">Prev</a>&#160;</td><th width="60%" align="center">Chapter&#160;4.&#160;drm/i915 Intel GFX Driver</th><td width="20%" align="right">&#160;<a accesskey="n" href="API-i915-cmd-parser-init-ring.html">Next</a></td></tr></table><hr></div><div class="sect1"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="idp1128257548"></a>Memory Management and Command Submission</h2></div></div></div><div class="toc"><dl class="toc"><dt><span class="sect2"><a href="ch04s03.html#idp1128258212">Batchbuffer Parsing</a></span></dt><dt><span class="sect2"><a href="ch04s03.html#idp1128310204">Batchbuffer Pools</a></span></dt><dt><span class="sect2"><a href="ch04s03.html#idp1128337188">Logical Rings, Logical Ring Contexts and Execlists</a></span></dt><dt><span class="sect2"><a href="ch04s03.html#idp1128435628">Global GTT views</a></span></dt><dt><span class="sect2"><a href="ch04s03.html#idp1128473508">Buffer Object Eviction</a></span></dt><dt><span class="sect2"><a href="ch04s03.html#idp1128510940">Buffer Object Memory Shrinking</a></span></dt></dl></div><p>
2	This section covers all things related to the GEM implementation in the
3	i915 driver.
4      </p><div class="sect2"><div class="titlepage"><div><div><h3 class="title"><a name="idp1128258212"></a>Batchbuffer Parsing</h3></div></div></div><div class="toc"><dl class="toc"><dt><span class="refentrytitle"><a href="API-i915-cmd-parser-init-ring.html"><span class="phrase">i915_cmd_parser_init_ring</span></a></span><span class="refpurpose"> &#8212; 
5  set cmd parser related fields for a ringbuffer
6 </span></dt><dt><span class="refentrytitle"><a href="API-i915-cmd-parser-fini-ring.html"><span class="phrase">i915_cmd_parser_fini_ring</span></a></span><span class="refpurpose"> &#8212; 
7     clean up cmd parser related fields
8 </span></dt><dt><span class="refentrytitle"><a href="API-i915-needs-cmd-parser.html"><span class="phrase">i915_needs_cmd_parser</span></a></span><span class="refpurpose"> &#8212; 
9     should a given ring use software command parsing?
10 </span></dt><dt><span class="refentrytitle"><a href="API-i915-parse-cmds.html"><span class="phrase">i915_parse_cmds</span></a></span><span class="refpurpose"> &#8212; 
11     parse a submitted batch buffer for privilege violations
12 </span></dt><dt><span class="refentrytitle"><a href="API-i915-cmd-parser-get-version.html"><span class="phrase">i915_cmd_parser_get_version</span></a></span><span class="refpurpose"> &#8212; 
13     get the cmd parser version number
14 </span></dt></dl></div><p>
15   </p><p>
16   Motivation:
17   Certain OpenGL features (e.g. transform feedback, performance monitoring)
18   require userspace code to submit batches containing commands such as
19   MI_LOAD_REGISTER_IMM to access various registers. Unfortunately, some
20   generations of the hardware will noop these commands in <span class="quote">&#8220;<span class="quote">unsecure</span>&#8221;</span> batches
21   (which includes all userspace batches submitted via i915) even though the
22   commands may be safe and represent the intended programming model of the
23   device.
24   </p><p>
25   The software command parser is similar in operation to the command parsing
26   done in hardware for unsecure batches. However, the software parser allows
27   some operations that would be noop'd by hardware if it determines that the
28   operation is safe, and it submits the batch as <span class="quote">&#8220;<span class="quote">secure</span>&#8221;</span> to prevent hardware
29   parsing.
30   </p><p>
31   Threats:
32   At a high level, the hardware (and software) checks attempt to prevent
33   granting userspace undue privileges. There are three categories of privilege.
34   </p><p>
35   First, commands which are explicitly defined as privileged or which should
36   only be used by the kernel driver. The parser generally rejects such
37   commands, though it may allow some from the drm master process.
38   </p><p>
39   Second, commands which access registers. To support correct/enhanced
40   userspace functionality, particularly certain OpenGL extensions, the parser
41   provides a whitelist of registers which userspace may safely access (for both
42   normal and drm master processes).
43   </p><p>
44   Third, commands which access privileged memory (e.g. GGTT, HWS page, etc.).
45   The parser always rejects such commands.
46   </p><p>
47   The majority of the problematic commands fall in the MI_* range, with only a
48   few specific commands on each ring (e.g. PIPE_CONTROL and MI_FLUSH_DW).
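  </p><p>
  As an illustration of the second category, a register whitelist check might
  look like the following sketch (all structure names, offsets, and table
  contents here are hypothetical, not the actual i915 tables):
  </p><p>

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical whitelist entry: a register offset userspace may access. */
struct reg_whitelist_entry {
	uint32_t offset;
	bool     master_only; /* allowed only for the drm master process */
};

static const struct reg_whitelist_entry whitelist[] = {
	{ 0x2358, false }, /* example offsets only, not real i915 registers */
	{ 0x7300, true  },
};

/* Return true if a register access from a batch should be allowed. */
static bool reg_access_allowed(uint32_t offset, bool is_master)
{
	for (size_t i = 0; i < sizeof(whitelist) / sizeof(whitelist[0]); i++) {
		if (whitelist[i].offset == offset)
			return is_master || !whitelist[i].master_only;
	}
	return false; /* unknown registers are rejected */
}
```

  Registers absent from the whitelist default to rejection, mirroring the
  deny-by-default posture described above.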
49   </p><p>
50   Implementation:
51   Each ring maintains tables of commands and registers which the parser uses in
52   scanning batch buffers submitted to that ring.
53   </p><p>
54   Since the set of commands that the parser must check for is significantly
55   smaller than the number of commands supported, the parser tables contain only
56   those commands required by the parser. This generally works because command
57   opcode ranges have standard command length encodings. So for commands that
58   the parser does not need to check, it can easily skip them. This is
59   implemented via a per-ring length decoding vfunc.
60   </p><p>
61   Unfortunately, there are a number of commands that do not follow the standard
62   length encoding for their opcode range, primarily amongst the MI_* commands.
63   To handle this, the parser provides a way to define explicit <span class="quote">&#8220;<span class="quote">skip</span>&#8221;</span> entries
64   in the per-ring command tables.
65   </p><p>
66   Other command table entries map fairly directly to high level categories
67   mentioned above: rejected, master-only, register whitelist. The parser
68   implements a number of checks, including the privileged memory checks, via a
69   general bitmasking mechanism.
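  </p><p>
  The table lookup, the explicit skip entries, and the fallback length decode
  described above can be sketched as follows (a simplification with made-up
  opcodes, masks, and a stand-in length encoding; the real tables live in
  i915_cmd_parser.c):
  </p><p>

```c
#include <stddef.h>
#include <stdint.h>

enum cmd_action { CMD_SKIP, CMD_REJECT, CMD_MASTER_ONLY };

/* Hypothetical per-ring command descriptor. */
struct cmd_descriptor {
	uint32_t opcode_mask;   /* bits that identify the command */
	uint32_t opcode_value;
	enum cmd_action action;
	uint32_t length;        /* fixed length for "skip" entries, in dwords */
};

static const struct cmd_descriptor ring_cmds[] = {
	/* illustrative values only */
	{ 0xff800000, 0x11000000, CMD_REJECT,      0 },
	{ 0xff800000, 0x13000000, CMD_MASTER_ONLY, 0 },
	{ 0xff800000, 0x02000000, CMD_SKIP,        1 },
};

static const struct cmd_descriptor *find_cmd(uint32_t header)
{
	for (size_t i = 0; i < sizeof(ring_cmds) / sizeof(ring_cmds[0]); i++) {
		if ((header & ring_cmds[i].opcode_mask) == ring_cmds[i].opcode_value)
			return &ring_cmds[i];
	}
	return NULL; /* not in the table: fall back to the length-decode vfunc */
}

/* Stand-in for the per-ring length decoding vfunc: commands not in the
 * table are skipped using a standard length encoding (low bits + 2). */
static uint32_t default_cmd_length(uint32_t header)
{
	return (header & 0xff) + 2;
}
```

  Explicit skip entries cover commands whose length does not follow the
  standard encoding for their opcode range; everything absent from the table
  falls through to the per-ring length-decode vfunc.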
70</p></div><div class="sect2"><div class="titlepage"><div><div><h3 class="title"><a name="idp1128310204"></a>Batchbuffer Pools</h3></div></div></div><div class="toc"><dl class="toc"><dt><span class="refentrytitle"><a href="API-i915-gem-batch-pool-init.html"><span class="phrase">i915_gem_batch_pool_init</span></a></span><span class="refpurpose"> &#8212; 
71  initialize a batch buffer pool
72 </span></dt><dt><span class="refentrytitle"><a href="API-i915-gem-batch-pool-fini.html"><span class="phrase">i915_gem_batch_pool_fini</span></a></span><span class="refpurpose"> &#8212; 
73     clean up a batch buffer pool
74 </span></dt><dt><span class="refentrytitle"><a href="API-i915-gem-batch-pool-get.html"><span class="phrase">i915_gem_batch_pool_get</span></a></span><span class="refpurpose"> &#8212; 
75     select a buffer from the pool
76 </span></dt></dl></div><p>
77   </p><p>
78   In order to submit batch buffers as 'secure', the software command parser
79   must ensure that a batch buffer cannot be modified after parsing. It does
80   this by copying the user provided batch buffer contents to a kernel owned
81   buffer from which the hardware will actually execute, and by carefully
82   managing the address space bindings for such buffers.
83   </p><p>
84   The batch pool framework provides a mechanism for the driver to manage a
85   set of scratch buffers to use for this purpose. The framework can be
86   extended to support other use cases should they arise.
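  </p><p>
  The reuse behaviour of such a pool can be modelled with a minimal sketch
  (hypothetical structures; the driver's version tracks buffer activity
  through GEM, not a plain flag):
  </p><p>

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical pool entry: a scratch buffer plus a busy flag standing in
 * for the real driver's activity tracking. */
struct pool_buf {
	size_t size;
	bool   busy;
};

#define POOL_CAP 8

struct batch_pool {
	struct pool_buf bufs[POOL_CAP];
	size_t count;
};

/* Return an idle buffer at least @size bytes large, allocating a new slot
 * when none is available for reuse. Returns NULL when the pool is full. */
static struct pool_buf *batch_pool_get(struct batch_pool *pool, size_t size)
{
	for (size_t i = 0; i < pool->count; i++) {
		if (!pool->bufs[i].busy && pool->bufs[i].size >= size) {
			pool->bufs[i].busy = true;
			return &pool->bufs[i];
		}
	}
	if (pool->count == POOL_CAP)
		return NULL;
	struct pool_buf *buf = &pool->bufs[pool->count++];
	buf->size = size;
	buf->busy = true;
	return buf;
}
```

  A buffer is only handed out again once it has gone idle, which is what lets
  the parser trust that a copied batch cannot be modified behind its back.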
87</p></div><div class="sect2"><div class="titlepage"><div><div><h3 class="title"><a name="idp1128337188"></a>Logical Rings, Logical Ring Contexts and Execlists</h3></div></div></div><div class="toc"><dl class="toc"><dt><span class="refentrytitle"><a href="API-intel-sanitize-enable-execlists.html"><span class="phrase">intel_sanitize_enable_execlists</span></a></span><span class="refpurpose"> &#8212; 
88  sanitize i915.enable_execlists
89 </span></dt><dt><span class="refentrytitle"><a href="API-intel-execlists-ctx-id.html"><span class="phrase">intel_execlists_ctx_id</span></a></span><span class="refpurpose"> &#8212; 
90     get the Execlists Context ID
91 </span></dt><dt><span class="refentrytitle"><a href="API-intel-lrc-irq-handler.html"><span class="phrase">intel_lrc_irq_handler</span></a></span><span class="refpurpose"> &#8212; 
92     handle Context Switch interrupts
93 </span></dt><dt><span class="refentrytitle"><a href="API-intel-execlists-submission.html"><span class="phrase">intel_execlists_submission</span></a></span><span class="refpurpose"> &#8212; 
94     submit a batchbuffer for execution, Execlists style
95 </span></dt><dt><span class="refentrytitle"><a href="API-intel-logical-ring-begin.html"><span class="phrase">intel_logical_ring_begin</span></a></span><span class="refpurpose"> &#8212; 
96     prepare the logical ringbuffer to accept some commands
97 </span></dt><dt><span class="refentrytitle"><a href="API-intel-logical-ring-cleanup.html"><span class="phrase">intel_logical_ring_cleanup</span></a></span><span class="refpurpose"> &#8212; 
98     deallocate the Engine Command Streamer
99 </span></dt><dt><span class="refentrytitle"><a href="API-intel-logical-rings-init.html"><span class="phrase">intel_logical_rings_init</span></a></span><span class="refpurpose"> &#8212; 
100     allocate, populate and init the Engine Command Streamers
101 </span></dt><dt><span class="refentrytitle"><a href="API-intel-lr-context-free.html"><span class="phrase">intel_lr_context_free</span></a></span><span class="refpurpose"> &#8212; 
102     free the LRC specific bits of a context
103 </span></dt><dt><span class="refentrytitle"><a href="API-intel-lr-context-deferred-create.html"><span class="phrase">intel_lr_context_deferred_create</span></a></span><span class="refpurpose"> &#8212; 
104     create the LRC specific bits of a context
105 </span></dt></dl></div><p>
106   </p><p>
107   Motivation:
108   GEN8 brings an expansion of the HW contexts: <span class="quote">&#8220;<span class="quote">Logical Ring Contexts</span>&#8221;</span>.
109   These expanded contexts enable a number of new abilities, especially
110   <span class="quote">&#8220;<span class="quote">Execlists</span>&#8221;</span> (also implemented in this file).
111   </p><p>
112   One of the main differences with the legacy HW contexts is that logical
113   ring contexts incorporate many more things into the context's state, like
114   PDPs or ringbuffer control registers:
115   </p><p>
116   The reason why PDPs are included in the context is straightforward: as
117   PPGTTs (per-process GTTs) are actually per-context, having the PDPs
118   contained there means you don't need to do a ppgtt-&gt;switch_mm yourself;
119   instead, the GPU will do it for you on the context switch.
120   </p><p>
121   But what about the ringbuffer control registers (head, tail, etc.)?
122   Shouldn't a single set of those per engine command streamer suffice? This is
123   where the name <span class="quote">&#8220;<span class="quote">Logical Rings</span>&#8221;</span> starts to make sense: by virtualizing the
124   rings, the engine cs shifts to a new <span class="quote">&#8220;<span class="quote">ring buffer</span>&#8221;</span> with every context
125   switch. When you want to submit a workload to the GPU you: A) choose your
126   context, B) find its appropriate virtualized ring, C) write commands to it
127   and then, finally, D) tell the GPU to switch to that context.
128   </p><p>
129   Instead of the legacy MI_SET_CONTEXT, the way you tell the GPU to switch
130   to a context is via a context execution list, ergo <span class="quote">&#8220;<span class="quote">Execlists</span>&#8221;</span>.
131   </p><p>
132   LRC implementation:
133   Regarding the creation of contexts, we have:
134   </p><p>
135   - One global default context.
136   - One local default context for each opened fd.
137   - One local extra context for each context create ioctl call.
138   </p><p>
139   Now that ringbuffers belong per-context (and not per-engine, like before)
140   and that contexts are uniquely tied to a given engine (and not reusable,
141   like before) we need:
142   </p><p>
143   - One ringbuffer per-engine inside each context.
144   - One backing object per-engine inside each context.
145   </p><p>
146   The global default context starts its life with these new objects fully
147   allocated and populated. The local default context for each opened fd is
148   more complex, because we don't know at creation time which engine is going
149   to use it. To handle this, we have implemented a deferred creation of LR
150   contexts:
151   </p><p>
152   The local context starts its life as a hollow or blank holder, that only
153   gets populated for a given engine once we receive an execbuffer. If later
154   on we receive another execbuffer ioctl for the same context but a different
155   engine, we allocate/populate a new ringbuffer and context backing object and
156   so on.
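  </p><p>
  The deferred creation rule can be modelled as a lazy per-engine lookup
  (hypothetical types; the driver allocates a real ringbuffer and context
  backing object at this point):
  </p><p>

```c
#include <stdbool.h>

#define NUM_ENGINES 4

/* Hypothetical per-engine state held in a context; in the driver this is
 * a ringbuffer plus a context backing object. */
struct engine_state {
	bool populated;
};

struct lr_context {
	struct engine_state engine[NUM_ENGINES];
	int num_populated;
};

/* Model of deferred creation: the per-engine bits are only allocated and
 * populated on the first execbuffer targeting this context/engine pair. */
static struct engine_state *lr_context_get(struct lr_context *ctx, int engine_id)
{
	struct engine_state *state = &ctx->engine[engine_id];
	if (!state->populated) {
		state->populated = true; /* stands in for allocate + populate */
		ctx->num_populated++;
	}
	return state;
}
```

  A context thus only ever pays for the engines it actually submits to.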
157   </p><p>
158   Finally, regarding local contexts created using the ioctl call: as they are
159   only allowed with the render ring, we can allocate &amp; populate them right
160   away (no need to defer anything, at least for now).
161   </p><p>
162   Execlists implementation:
163   Execlists are the new method by which, on gen8+ hardware, workloads are
164   submitted for execution (as opposed to the legacy, ringbuffer-based, method).
165   This method works as follows:
166   </p><p>
167   When a request is committed, its commands (the BB start and any leading or
168   trailing commands, like the seqno breadcrumbs) are placed in the ringbuffer
169   for the appropriate context. The tail pointer in the hardware context is not
170   updated at this time but is instead kept by the driver in the ringbuffer
171   structure. A structure representing this request is added to a request queue
172   for the appropriate engine: this structure contains a copy of the context's
173   tail after the request was written to the ring buffer and a pointer to the
174   context itself.
175   </p><p>
176   If the engine's request queue was empty before the request was added, the
177   queue is processed immediately. Otherwise the queue will be processed during
178   a context switch interrupt. In any case, elements on the queue will get sent
179   (in pairs) to the GPU's ExecLists Submit Port (ELSP, for short) with a
180   globally unique 20-bit submission ID.
181   </p><p>
182   When execution of a request completes, the GPU updates the context status
183   buffer with a context complete event and generates a context switch interrupt.
184   During the interrupt handling, the driver examines the events in the buffer:
185   for each context complete event, if the announced ID matches that on the head
186   of the request queue, then that request is retired and removed from the queue.
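  </p><p>
  The retire-on-match rule can be sketched like this (a toy queue of
  submission IDs; the driver operates on request structures and the hardware
  context status buffer):
  </p><p>

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical queue of in-flight submission IDs, head at index 0. */
struct req_queue {
	unsigned int ids[16];
	size_t len;
};

/* Handle one "context complete" event from the status buffer: retire the
 * head request only when the announced ID matches it. Returns true if a
 * request was retired. */
static bool handle_complete_event(struct req_queue *q, unsigned int announced_id)
{
	if (q->len == 0 || q->ids[0] != announced_id)
		return false;
	for (size_t i = 1; i < q->len; i++)
		q->ids[i - 1] = q->ids[i];
	q->len--;
	return true;
}
```

  Events whose ID does not match the head are simply ignored, which keeps the
  queue in step with actual hardware completion order.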
187   </p><p>
188   After processing, if any requests were retired and the queue is not empty
189   then a new execution list can be submitted. The two requests at the front of
190   the queue are next to be submitted but since a context may not occur twice in
191   an execution list, if subsequent requests have the same ID as the first then
192   the two requests must be combined. This is done simply by discarding requests
193   at the head of the queue until either only one request is left (in which case
194   we use a NULL second context) or the first two requests have unique IDs.
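  </p><p>
  The coalescing rule can be sketched as follows (hypothetical request
  structure; the driver performs the equivalent dance on its request queue):
  </p><p>

```c
#include <stddef.h>

/* Hypothetical request: just a context ID and the ring tail it recorded. */
struct exec_request {
	int ctx_id;
	int tail;
};

/* Pick the (up to) two requests to submit to the ELSP from the queue head.
 * A leading run of same-context requests is coalesced: only the last of the
 * run survives, since its tail covers the earlier ones. Returns the number
 * of queue entries consumed. */
static size_t pick_execlist(const struct exec_request *queue, size_t len,
			    const struct exec_request *out[2])
{
	size_t i = 0;
	out[0] = out[1] = NULL;
	if (len == 0)
		return 0;
	/* coalesce the leading run of same-context requests */
	while (i + 1 < len && queue[i + 1].ctx_id == queue[0].ctx_id)
		i++;
	out[0] = &queue[i];
	if (i + 1 < len)
		out[1] = &queue[i + 1];
	return out[1] ? i + 2 : i + 1;
}
```

  When the queue holds a single context, out[1] stays NULL, matching the
  NULL-second-context case described above.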
195   </p><p>
196   By always executing the first two requests in the queue the driver ensures
197   that the GPU is kept as busy as possible. In the case where a single context
198   completes but a second context is still executing, the request for this second
199   context will be at the head of the queue when we remove the first one. This
200   request will then be resubmitted along with a new request for a different context,
201   which will cause the hardware to continue executing the second request and queue
202   the new request (the GPU detects the condition of a context getting preempted
203   with the same context and optimizes the context switch flow by not doing
204   preemption, but just sampling the new tail pointer).
205   </p><p>
206</p></div><div class="sect2"><div class="titlepage"><div><div><h3 class="title"><a name="idp1128435628"></a>Global GTT views</h3></div></div></div><div class="toc"><dl class="toc"><dt><span class="refentrytitle"><a href="API-i915-dma-map-single.html"><span class="phrase">i915_dma_map_single</span></a></span><span class="refpurpose"> &#8212; 
207  Create a dma mapping for a page table/dir/etc.
208 </span></dt><dt><span class="refentrytitle"><a href="API-alloc-pt-range.html"><span class="phrase">alloc_pt_range</span></a></span><span class="refpurpose"> &#8212; 
209     Allocate multiple page tables
210 </span></dt><dt><span class="refentrytitle"><a href="API-i915-vma-bind.html"><span class="phrase">i915_vma_bind</span></a></span><span class="refpurpose"> &#8212; 
211     Sets up PTEs for a VMA in its corresponding address space.
212 </span></dt></dl></div><p>
213   </p><p>
214   Background and previous state
215   </p><p>
216   Historically objects could exist (be bound) in global GTT space only as
217   singular instances with a view representing all of the object's backing pages
218   in a linear fashion. This view will be called a normal view.
219   </p><p>
220   To support multiple views of the same object, where the number of mapped
221   pages is not equal to the backing store, or where the layout of the pages
222   is not linear, the concept of a GGTT view was added.
223   </p><p>
224   One example of an alternative view is a stereo display driven by a single
225   image. In this case we would have a framebuffer looking like this
226   (2x2 pages):
227   </p><p>
228   12
229   34
230   </p><p>
231   Above would represent a normal GGTT view as normally mapped for GPU or CPU
232   rendering. In contrast, fed to the display engine would be an alternative
233   view which could look something like this:
234   </p><p>
235   1212
236   3434
237   </p><p>
238   In this example both the size and layout of pages in the alternative view
239   differ from the normal view.
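  </p><p>
  The alternative view in this example can be described as a remapped
  page-index list; the following hypothetical helper builds it by repeating
  each row of the normal view twice:
  </p><p>

```c
#include <stddef.h>

/* Build the page-index list for the doubled stereo view in the example:
 * each row of a width x height page grid is repeated twice side by side.
 * @out must hold width * 2 * height entries. Hypothetical helper, not a
 * driver function. */
static void build_stereo_view(size_t width, size_t height, size_t *out)
{
	size_t n = 0;
	for (size_t row = 0; row < height; row++)
		for (size_t rep = 0; rep < 2; rep++)
			for (size_t col = 0; col < width; col++)
				out[n++] = row * width + col; /* index into the normal view */
}
```

  For the 2x2 framebuffer above this yields pages 1,2,1,2,3,4,3,4, i.e. the
  <span class="quote">&#8220;<span class="quote">1212 / 3434</span>&#8221;</span> layout fed to the display engine.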
240   </p><p>
241   Implementation and usage
242   </p><p>
243   GGTT views are implemented using VMAs and are distinguished via enum
244   i915_ggtt_view_type and struct i915_ggtt_view.
245   </p><p>
246   A new flavour of core GEM functions which work with GGTT bound objects was
247   added with the _ggtt_ infix, and sometimes with a _view postfix, to avoid
248   renaming large amounts of code. They take the struct i915_ggtt_view
249   parameter encapsulating all metadata required to implement a view.
250   </p><p>
251   As a helper for callers which are only interested in the normal view, a
252   globally const i915_ggtt_view_normal singleton instance exists. All old core
253   GEM API functions, the ones not taking the view parameter, operate on, or
254   with, the normal GGTT view.
255   </p><p>
256   Code wanting to add or use a new GGTT view needs to:
257   </p><p>
258   1. Add a new enum with a suitable name.
259   2. Extend the metadata in the i915_ggtt_view structure if required.
260   3. Add support to <code class="function">i915_get_vma_pages</code>.
261   </p><p>
262   New views are required to build a scatter-gather table from within the
263   i915_get_vma_pages function. This table is stored in the vma.ggtt_view and
264   exists for the lifetime of a VMA.
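  </p><p>
  Step 3 of the checklist above amounts to a per-view-type dispatch; a minimal
  model of it (hypothetical names, with the stereo case sized as in the
  earlier example):
  </p><p>

```c
#include <stddef.h>

/* Hypothetical mirror of the view dispatch in i915_get_vma_pages: each
 * view type derives its own page list from the object's backing pages. */
enum ggtt_view_type { VIEW_NORMAL, VIEW_STEREO };

struct ggtt_view {
	enum ggtt_view_type type;
};

/* Returns the number of entries the view's page list will hold, or 0 for
 * an unknown view type. */
static size_t view_num_pages(const struct ggtt_view *view, size_t obj_pages)
{
	switch (view->type) {
	case VIEW_NORMAL:
		return obj_pages;      /* all backing pages, linearly */
	case VIEW_STEREO:
		return obj_pages * 2;  /* doubled rows, as in the example above */
	default:
		return 0;
	}
}
```

  A real view implementation would go on to fill a scatter-gather table of
  that size, which then lives in vma.ggtt_view for the lifetime of the VMA.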
265   </p><p>
266   The core API is designed to have copy semantics, which means that the
267   passed-in struct i915_ggtt_view does not need to be persistent (left around
268   after calling the core API functions).
269   </p><p>
270</p></div><div class="sect2"><div class="titlepage"><div><div><h3 class="title"><a name="idp1128473508"></a>Buffer Object Eviction</h3></div></div></div><div class="toc"><dl class="toc"><dt><span class="refentrytitle"><a href="API-i915-gem-evict-something.html"><span class="phrase">i915_gem_evict_something</span></a></span><span class="refpurpose"> &#8212; 
271  Evict vmas to make room for binding a new one
272 </span></dt><dt><span class="refentrytitle"><a href="API-i915-gem-evict-vm.html"><span class="phrase">i915_gem_evict_vm</span></a></span><span class="refpurpose"> &#8212; 
273     Evict all idle vmas from a vm
274 </span></dt><dt><span class="refentrytitle"><a href="API-i915-gem-evict-everything.html"><span class="phrase">i915_gem_evict_everything</span></a></span><span class="refpurpose"> &#8212; 
275     Try to evict all objects
276 </span></dt></dl></div><p>
277	  This section documents the interface functions for evicting buffer
278	  objects to make space available in the virtual gpu address spaces.
279	  Note that this is mostly orthogonal to shrinking buffer object
280	  caches, whose goal is to make main memory (shared with the gpu
281	  through the unified memory architecture) available.
282	</p></div><div class="sect2"><div class="titlepage"><div><div><h3 class="title"><a name="idp1128510940"></a>Buffer Object Memory Shrinking</h3></div></div></div><div class="toc"><dl class="toc"><dt><span class="refentrytitle"><a href="API-i915-gem-shrink.html"><span class="phrase">i915_gem_shrink</span></a></span><span class="refpurpose"> &#8212; 
283  Shrink buffer object caches
284 </span></dt><dt><span class="refentrytitle"><a href="API-i915-gem-shrink-all.html"><span class="phrase">i915_gem_shrink_all</span></a></span><span class="refpurpose"> &#8212; 
285     Shrink buffer object caches completely
286 </span></dt><dt><span class="refentrytitle"><a href="API-i915-gem-shrinker-init.html"><span class="phrase">i915_gem_shrinker_init</span></a></span><span class="refpurpose"> &#8212; 
287     Initialize i915 shrinker
288 </span></dt></dl></div><p>
289	  This section documents the interface functions for shrinking memory
290	  usage of buffer object caches. Shrinking is used to make main memory
291	  available. Note that this is mostly orthogonal to evicting buffer
292	  objects, whose goal is to make space available in gpu virtual
293	  address spaces.
294	</p></div></div><div class="navfooter"><hr><table width="100%" summary="Navigation footer"><tr><td width="40%" align="left"><a accesskey="p" href="API-intel-dp-drrs-init.html">Prev</a>&#160;</td><td width="20%" align="center"><a accesskey="u" href="drmI915.html">Up</a></td><td width="40%" align="right">&#160;<a accesskey="n" href="API-i915-cmd-parser-init-ring.html">Next</a></td></tr><tr><td width="40%" align="left" valign="top"><span class="phrase">intel_dp_drrs_init</span>&#160;</td><td width="20%" align="center"><a accesskey="h" href="index.html">Home</a></td><td width="40%" align="right" valign="top">&#160;<span class="phrase">i915_cmd_parser_init_ring</span></td></tr></table></div></body></html>
295