<html><head><meta http-equiv="Content-Type" content="text/html; charset=ANSI_X3.4-1968"><title>Memory Management and Command Submission</title><meta name="generator" content="DocBook XSL Stylesheets V1.78.1"><link rel="home" href="index.html" title="Linux DRM Developer's Guide"><link rel="up" href="drmI915.html" title="Chapter&#160;4.&#160;drm/i915 Intel GFX Driver"><link rel="prev" href="API-intel-dp-drrs-init.html" title="intel_dp_drrs_init"><link rel="next" href="API-i915-cmd-parser-init-ring.html" title="i915_cmd_parser_init_ring"></head><body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF"><div class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="3" align="center">Memory Management and Command Submission</th></tr><tr><td width="20%" align="left"><a accesskey="p" href="API-intel-dp-drrs-init.html">Prev</a>&#160;</td><th width="60%" align="center">Chapter&#160;4.&#160;drm/i915 Intel GFX Driver</th><td width="20%" align="right">&#160;<a accesskey="n" href="API-i915-cmd-parser-init-ring.html">Next</a></td></tr></table><hr></div><div class="sect1"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="idp1128257548"></a>Memory Management and Command Submission</h2></div></div></div><div class="toc"><dl class="toc"><dt><span class="sect2"><a href="ch04s03.html#idp1128258212">Batchbuffer Parsing</a></span></dt><dt><span class="sect2"><a href="ch04s03.html#idp1128310204">Batchbuffer Pools</a></span></dt><dt><span class="sect2"><a href="ch04s03.html#idp1128337188">Logical Rings, Logical Ring Contexts and Execlists</a></span></dt><dt><span class="sect2"><a href="ch04s03.html#idp1128435628">Global GTT views</a></span></dt><dt><span class="sect2"><a href="ch04s03.html#idp1128473508">Buffer Object Eviction</a></span></dt><dt><span class="sect2"><a href="ch04s03.html#idp1128510940">Buffer Object Memory Shrinking</a></span></dt></dl></div><p>
	This section covers all things related to the GEM implementation in the
	i915 driver.
      </p><div class="sect2"><div class="titlepage"><div><div><h3 class="title"><a name="idp1128258212"></a>Batchbuffer Parsing</h3></div></div></div><div class="toc"><dl class="toc"><dt><span class="refentrytitle"><a href="API-i915-cmd-parser-init-ring.html"><span class="phrase">i915_cmd_parser_init_ring</span></a></span><span class="refpurpose"> &#8212;
  set cmd parser related fields for a ringbuffer
 </span></dt><dt><span class="refentrytitle"><a href="API-i915-cmd-parser-fini-ring.html"><span class="phrase">i915_cmd_parser_fini_ring</span></a></span><span class="refpurpose"> &#8212;
     clean up cmd parser related fields
 </span></dt><dt><span class="refentrytitle"><a href="API-i915-needs-cmd-parser.html"><span class="phrase">i915_needs_cmd_parser</span></a></span><span class="refpurpose"> &#8212;
     should a given ring use software command parsing?
 </span></dt><dt><span class="refentrytitle"><a href="API-i915-parse-cmds.html"><span class="phrase">i915_parse_cmds</span></a></span><span class="refpurpose"> &#8212;
     parse a submitted batch buffer for privilege violations
 </span></dt><dt><span class="refentrytitle"><a href="API-i915-cmd-parser-get-version.html"><span class="phrase">i915_cmd_parser_get_version</span></a></span><span class="refpurpose"> &#8212;
     get the cmd parser version number
 </span></dt></dl></div><p>
   </p><p>
   Motivation:
   Certain OpenGL features (e.g. transform feedback, performance monitoring)
   require userspace code to submit batches containing commands such as
   MI_LOAD_REGISTER_IMM to access various registers. Unfortunately, some
   generations of the hardware will noop these commands in <span class="quote">&#8220;<span class="quote">unsecure</span>&#8221;</span> batches
   (which includes all userspace batches submitted via i915) even though the
   commands may be safe and represent the intended programming model of the
   device.
   </p><p>
   The software command parser is similar in operation to the command parsing
   done in hardware for unsecure batches. However, the software parser allows
   some operations that the hardware would noop, provided it determines that
   they are safe, and then submits the batch as <span class="quote">&#8220;<span class="quote">secure</span>&#8221;</span> to prevent hardware
   parsing.
   </p><p>
   Threats:
   At a high level, the hardware (and software) checks attempt to prevent
   granting userspace undue privileges. There are three categories of privilege.
   </p><p>
   First, commands which are explicitly defined as privileged or which should
   only be used by the kernel driver. The parser generally rejects such
   commands, though it may allow some from the drm master process.
   </p><p>
   Second, commands which access registers. To support correct/enhanced
   userspace functionality, particularly certain OpenGL extensions, the parser
   provides a whitelist of registers which userspace may safely access (for both
   normal and drm master processes).
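   </p><p>
   A rough sketch of a register whitelist check follows, written as standalone
   userspace C. It assumes an MI_LOAD_REGISTER_IMM-style payload made of
   (register offset, value) dword pairs and a sorted table searched with
   bsearch(); the offsets and function names are hypothetical placeholders, not
   the driver's actual tables.
   </p>
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical whitelist of register offsets userspace may write.
 * Placeholder values; must be sorted ascending for bsearch(). */
static const uint32_t reg_whitelist[] = {
	0x1100,
	0x2200,
	0x3300,
};

static int cmp_u32(const void *key, const void *elem)
{
	uint32_t a = *(const uint32_t *)key;
	uint32_t b = *(const uint32_t *)elem;

	return (a > b) - (a < b);
}

static bool reg_is_whitelisted(uint32_t offset)
{
	return bsearch(&offset, reg_whitelist,
		       sizeof(reg_whitelist) / sizeof(reg_whitelist[0]),
		       sizeof(reg_whitelist[0]), cmp_u32) != NULL;
}

/* Check an LRI-style payload (the dwords after the header), which is
 * assumed to be (offset, value) pairs. Fails on a malformed payload or
 * on the first offset that is not whitelisted. */
static bool check_lri_payload(const uint32_t *payload, size_t ndwords)
{
	if (ndwords % 2 != 0)
		return false;
	for (size_t i = 0; i < ndwords; i += 2)
		if (!reg_is_whitelisted(payload[i]))
			return false;
	return true;
}
```
<p>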
   </p><p>
   Third, commands which access privileged memory (e.g. the GGTT, the HWS page, etc.).
   The parser always rejects such commands.
   </p><p>
   The majority of the problematic commands fall in the MI_* range, with only a
   few specific commands on each ring (e.g. PIPE_CONTROL and MI_FLUSH_DW).
   </p><p>
   Implementation:
   Each ring maintains tables of commands and registers which the parser uses in
   scanning batch buffers submitted to that ring.
   </p><p>
   Since the set of commands that the parser must check for is significantly
   smaller than the number of commands supported, the parser tables contain only
   those commands required by the parser. This generally works because each
   command opcode range has a standard length encoding, so the parser can easily
   skip the commands it does not need to check. This is implemented via a
   per-ring length decoding vfunc.
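   </p><p>
   Skipping unchecked commands by their length field might look like the sketch
   below. It assumes, for illustration, that the client type sits in bits 31:29
   of the command header and that standard commands store their length in
   dwords, minus a bias of 2, in the low header bits, using a 0x3F mask for MI
   commands and 0xFF for render commands. The helper names are hypothetical and
   the masks are not taken from the driver.
   </p>
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LENGTH_BIAS	2	/* encoded length excludes 2 dwords (assumed) */
#define CLIENT_SHIFT	29
#define CLIENT_MASK	0x7
#define CLIENT_MI	0x0
#define CLIENT_RENDER	0x3

/* Illustrative per-ring length decoding function: returns the mask of
 * the length field for this header, or 0 if the client is unknown. */
static uint32_t render_length_mask(uint32_t header)
{
	uint32_t client = (header >> CLIENT_SHIFT) & CLIENT_MASK;

	if (client == CLIENT_MI)
		return 0x3F;
	if (client == CLIENT_RENDER)
		return 0xFF;
	return 0;
}

/* Walk a batch, skipping each command by its decoded length.
 * Returns the number of commands seen, or -1 if a header cannot be
 * decoded or a command runs past the end of the batch. */
static int count_cmds(const uint32_t *batch, size_t ndwords)
{
	size_t i = 0;
	int count = 0;

	while (i < ndwords) {
		uint32_t mask = render_length_mask(batch[i]);

		if (mask == 0)
			return -1;
		size_t len = (batch[i] & mask) + LENGTH_BIAS;
		if (len > ndwords - i)
			return -1;
		i += len;
		count++;
	}
	return count;
}
```
<p>
   With this rule alone, an all-zero MI_NOOP header would decode as two dwords
   even though MI_NOOP is a single dword, which is exactly the kind of exception
   that needs an explicit skip entry.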
   </p><p>
   Unfortunately, there are a number of commands that do not follow the standard
   length encoding for their opcode range, primarily amongst the MI_* commands.
   To handle this, the parser provides a way to define explicit <span class="quote">&#8220;<span class="quote">skip</span>&#8221;</span> entries
   in the per-ring command tables.
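   </p><p>
   An explicit skip entry can be modelled as a table descriptor that matches a
   header by mask and value and overrides the decoded length. The sketch assumes
   the MI opcode occupies bits 28:23, so that MI_NOOP is header 0x0 and
   MI_ARB_CHECK is opcode 0x05 shifted left by 23, both being single-dword
   commands. The descriptor layout and function names are hypothetical.
   </p>
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MI_INSTR(op, flags)	(((uint32_t)(op) << 23) | (flags))
#define MI_OPCODE_MASK		0xFF800000u	/* client + MI opcode bits */

struct cmd_desc {
	uint32_t value;		/* header bits that identify the command */
	uint32_t mask;		/* which header bits to compare */
	uint32_t fixed_len;	/* length in dwords, 0 = use length field */
};

/* Hypothetical "skip" entries: MI commands that do not follow the
 * standard length encoding and are always one dword long. */
static const struct cmd_desc skip_table[] = {
	{ MI_INSTR(0x00, 0), MI_OPCODE_MASK, 1 },	/* MI_NOOP */
	{ MI_INSTR(0x05, 0), MI_OPCODE_MASK, 1 },	/* MI_ARB_CHECK */
};

static const struct cmd_desc *find_desc(uint32_t header)
{
	for (size_t i = 0; i < sizeof(skip_table) / sizeof(skip_table[0]); i++)
		if ((header & skip_table[i].mask) == skip_table[i].value)
			return &skip_table[i];
	return NULL;
}

/* Length in dwords: the table's fixed length if an entry matches,
 * otherwise the standard MI encoding (low 6 bits plus a bias of 2,
 * an assumption of this sketch). */
static uint32_t mi_cmd_length(uint32_t header)
{
	const struct cmd_desc *desc = find_desc(header);

	if (desc && desc->fixed_len)
		return desc->fixed_len;
	return (header & 0x3F) + 2;
}
```
<p>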
   </p><p>
   Other command table entries map fairly directly to the high-level categories
   mentioned above: rejected, master-only, register whitelist. The parser
   implements a number of checks, including the privileged memory checks, via a
   general bitmasking mechanism.
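   </p><p>
   One way to picture these categories is a descriptor carrying reject and
   master-only flags plus a generic mask-and-compare rule applied to one dword of
   the command. In this hypothetical sketch, EXAMPLE_GGTT_BIT stands in for a
   header bit that selects the global GTT as the target address space; none of
   the names, flags or bit positions are taken from the driver.
   </p>
```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum {
	DESC_REJECT  = 1u << 0,	/* never allowed from userspace */
	DESC_MASTER  = 1u << 1,	/* allowed only for the drm master */
	DESC_BITMASK = 1u << 2,	/* apply the bitmask rule below */
};

struct bits_rule {
	uint32_t offset;	/* which dword of the command to test */
	uint32_t mask;		/* bits to extract */
	uint32_t expected;	/* required value of the masked bits */
};

struct desc {
	uint32_t flags;
	struct bits_rule bits;
};

/* Hypothetical "address the global GTT" bit in a command header. */
#define EXAMPLE_GGTT_BIT (1u << 22)

static bool cmd_allowed(const struct desc *d, const uint32_t *cmd,
			uint32_t len, bool is_master)
{
	if (d->flags & DESC_REJECT)
		return false;
	if ((d->flags & DESC_MASTER) && !is_master)
		return false;
	if (d->flags & DESC_BITMASK) {
		if (d->bits.offset >= len)
			return false;	/* rule points past the command */
		if ((cmd[d->bits.offset] & d->bits.mask) != d->bits.expected)
			return false;
	}
	return true;
}
```
<p>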
</p></div><div class="sect2"><div class="titlepage"><div><div><h3 class="title"><a name="idp1128310204"></a>Batchbuffer Pools</h3></div></div></div><div class="toc"><dl class="toc"><dt><span class="refentrytitle"><a href="API-i915-gem-batch-pool-init.html"><span class="phrase">i915_gem_batch_pool_init</span></a></span><span class="refpurpose"> &#8212;
  initialize a batch buffer pool
 </span></dt><dt><span class="refentrytitle"><a href="API-i915-gem-batch-pool-fini.html"><span class="phrase">i915_gem_batch_pool_fini</span></a></span><span class="refpurpose"> &#8212;
     clean up a batch buffer pool
 </span></dt><dt><span class="refentrytitle"><a href="API-i915-gem-batch-pool-get.html"><span class="phrase">i915_gem_batch_pool_get</span></a></span><span class="refpurpose"> &#8212;
     select a buffer from the pool
 </span></dt></dl></div><p>
   </p><p>
   In order to submit batch buffers as 'secure', the software command parser
   must ensure that a batch buffer cannot be modified after parsing. It does
   this by copying the user provided batch buffer contents to a kernel owned
   buffer from which the hardware will actually execute, and by carefully
   managing the address space bindings for such buffers.
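   </p><p>
   The copy closes a time-of-check/time-of-use gap: if the hardware executed the
   user's buffer directly, userspace could rewrite it after it had been parsed.
   The userspace sketch below parses only a private copy. parse_ok() is a
   hypothetical stand-in for the real checks (it merely rejects a made-up
   forbidden dword), and malloc()/memcpy() stand in for the kernel-owned buffer
   object.
   </p>
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define FORBIDDEN_HEADER 0xDEAD0000u	/* hypothetical privileged command */

/* Stand-in for the command parser: rejects the made-up forbidden value. */
static bool parse_ok(const uint32_t *cmds, size_t ndwords)
{
	for (size_t i = 0; i < ndwords; i++)
		if (cmds[i] == FORBIDDEN_HEADER)
			return false;
	return true;
}

/* Copy the user batch into a private shadow buffer first, then parse
 * the copy. Later writes to user_batch cannot change what was checked,
 * because only the shadow would be handed to the hardware.
 * Returns the shadow (caller frees) or NULL on failure. */
static uint32_t *copy_and_parse(const uint32_t *user_batch, size_t ndwords)
{
	uint32_t *shadow = malloc(ndwords * sizeof(*shadow));

	if (!shadow)
		return NULL;
	memcpy(shadow, user_batch, ndwords * sizeof(*shadow));
	if (!parse_ok(shadow, ndwords)) {
		free(shadow);
		return NULL;
	}
	return shadow;
}
```
<p>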
   </p><p>
   The batch pool framework provides a mechanism for the driver to manage a
   set of scratch buffers to use for this purpose. The framework can be
   extended to support other use cases should they arise.
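   </p><p>
   A batch pool can be thought of as a set of scratch buffers that are reused
   once the GPU is done with them. This sketch returns the first idle buffer that
   is large enough and allocates a new one otherwise; the fixed capacity, the
   busy flag and all names are simplifying assumptions rather than the driver's
   data structures.
   </p>
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define POOL_MAX 8

struct scratch_buf {
	void *data;
	size_t size;
	bool busy;	/* stands in for "still in use by the GPU" */
};

struct batch_pool {
	struct scratch_buf bufs[POOL_MAX];
	size_t count;
};

/* Return an idle buffer of at least 'size' bytes and mark it busy.
 * Allocates a new buffer if none fits; returns NULL when the pool is
 * full or allocation fails. */
static struct scratch_buf *pool_get(struct batch_pool *pool, size_t size)
{
	for (size_t i = 0; i < pool->count; i++) {
		struct scratch_buf *b = &pool->bufs[i];

		if (!b->busy && b->size >= size) {
			b->busy = true;
			return b;
		}
	}
	if (pool->count == POOL_MAX)
		return NULL;

	struct scratch_buf *b = &pool->bufs[pool->count];

	b->data = malloc(size);
	if (!b->data)
		return NULL;
	b->size = size;
	b->busy = true;
	pool->count++;
	return b;
}

/* Called once the GPU has finished with the buffer. */
static void pool_put(struct scratch_buf *b)
{
	b->busy = false;
}

static void pool_fini(struct batch_pool *pool)
{
	for (size_t i = 0; i < pool->count; i++)
		free(pool->bufs[i].data);
	pool->count = 0;
}
```
<p>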
</p></div><div class="sect2"><div class="titlepage"><div><div><h3 class="title"><a name="idp1128337188"></a>Logical Rings, Logical Ring Contexts and Execlists</h3></div></div></div><div class="toc"><dl class="toc"><dt><span class="refentrytitle"><a href="API-intel-sanitize-enable-execlists.html"><span class="phrase">intel_sanitize_enable_execlists</span></a></span><span class="refpurpose"> &#8212;
  sanitize i915.enable_execlists
 </span></dt><dt><span class="refentrytitle"><a href="API-intel-execlists-ctx-id.html"><span class="phrase">intel_execlists_ctx_id</span></a></span><span class="refpurpose"> &#8212;
     get the Execlists Context ID
 </span></dt><dt><span class="refentrytitle"><a href="API-intel-lrc-irq-handler.html"><span class="phrase">intel_lrc_irq_handler</span></a></span><span class="refpurpose"> &#8212;
     handle Context Switch interrupts
 </span></dt><dt><span class="refentrytitle"><a href="API-intel-execlists-submission.html"><span class="phrase">intel_execlists_submission</span></a></span><span class="refpurpose"> &#8212;
     submit a batchbuffer for execution, Execlists style
 </span></dt><dt><span class="refentrytitle"><a href="API-intel-logical-ring-begin.html"><span class="phrase">intel_logical_ring_begin</span></a></span><span class="refpurpose"> &#8212;
     prepare the logical ringbuffer to accept some commands
 </span></dt><dt><span class="refentrytitle"><a href="API-intel-logical-ring-cleanup.html"><span class="phrase">intel_logical_ring_cleanup</span></a></span><span class="refpurpose"> &#8212;
     deallocate the Engine Command Streamer
 </span></dt><dt><span class="refentrytitle"><a href="API-intel-logical-rings-init.html"><span class="phrase">intel_logical_rings_init</span></a></span><span class="refpurpose"> &#8212;
     allocate, populate and init the Engine Command Streamers
 </span></dt><dt><span class="refentrytitle"><a href="API-intel-lr-context-free.html"><span class="phrase">intel_lr_context_free</span></a></span><span class="refpurpose"> &#8212;
     free the LRC specific bits of a context
 </span></dt><dt><span class="refentrytitle"><a href="API-intel-lr-context-deferred-create.html"><span class="phrase">intel_lr_context_deferred_create</span></a></span><span class="refpurpose"> &#8212;
     create the LRC specific bits of a context
 </span></dt></dl></div><p>
   </p><p>
   Motivation:
   GEN8 brings an expansion of the HW contexts: <span class="quote">&#8220;<span class="quote">Logical Ring Contexts</span>&#8221;</span>.
   These expanded contexts enable a number of new abilities, especially
   <span class="quote">&#8220;<span class="quote">Execlists</span>&#8221;</span> (also implemented in this file).
   </p><p>
   One of the main differences with the legacy HW contexts is that logical
   ring contexts incorporate many more things into the context's state, such as
   PDPs or ringbuffer control registers.
   </p><p>
   The reason why PDPs are included in the context is straightforward: as
   PPGTTs (per-process GTTs) are actually per-context, having the PDPs
   contained there means you don't need to do a ppgtt-&gt;switch_mm yourself;
   instead, the GPU will do it for you on the context switch.
   </p><p>
   But what about the ringbuffer control registers (head, tail, etc.)?
   Shouldn't we need only one set of those per engine command streamer? This is
   where the name <span class="quote">&#8220;<span class="quote">Logical Rings</span>&#8221;</span> starts to make sense: by virtualizing the
   rings, the engine command streamer (CS) shifts to a new <span class="quote">&#8220;<span class="quote">ring buffer</span>&#8221;</span> with every context
   switch. When you want to submit a workload to the GPU you: A) choose your
   context, B) find its appropriate virtualized ring, C) write commands to it
   and then, finally, D) tell the GPU to switch to that context.
   </p><p>
   Instead of the legacy MI_SET_CONTEXT, the way you tell the GPU to switch
   to a context is via a context execution list, ergo <span class="quote">&#8220;<span class="quote">Execlists</span>&#8221;</span>.
   </p><p>
   LRC implementation:
   Regarding the creation of contexts, we have:
   </p><p>
   - One global default context.
   - One local default context for each opened fd.
   - One local extra context for each context create ioctl call.
   </p><p>
   Now that ringbuffers are per-context (and not per-engine, as before)
   and contexts are uniquely tied to a given engine (and not reusable,
   as before), we need:
   </p><p>
   - One ringbuffer per-engine inside each context.
   - One backing object per-engine inside each context.
   </p><p>
   The global default context starts its life with these new objects fully
   allocated and populated. The local default context for each opened fd is
   more complex, because we don't know at creation time which engine is going
   to use it. To handle this, we have implemented a deferred creation of LR
   contexts:
   </p><p>
   The local context starts its life as a hollow or blank holder that only
   gets populated for a given engine once we receive an execbuffer. If later
   on we receive another execbuffer ioctl for the same context but a different
   engine, we allocate and populate a new ringbuffer and context backing object,
   and so on.
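   </p><p>
   The deferred scheme can be modelled as a context whose per-engine slots start
   out empty and are filled on first use, for example on the first execbuffer
   targeting that engine. The engine list, buffer sizes and function names in
   this sketch are hypothetical, and plain heap allocations stand in for the
   ringbuffer and the context backing object.
   </p>
```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

enum { ENGINE_RENDER, ENGINE_BSD, ENGINE_BLT, ENGINE_VEBOX, NUM_ENGINES };

#define RING_SIZE	(32 * 4096)	/* hypothetical sizes */
#define STATE_SIZE	(20 * 4096)

struct engine_state {
	void *ringbuf;		/* per-context, per-engine ringbuffer */
	void *backing_object;	/* per-context, per-engine state image */
};

struct context {
	struct engine_state *engine[NUM_ENGINES];	/* NULL until used */
};

/* Return the per-engine state of ctx, creating it on first use. */
static struct engine_state *ctx_get_engine_state(struct context *ctx,
						  int engine)
{
	if (engine < 0 || engine >= NUM_ENGINES)
		return NULL;

	struct engine_state *st = ctx->engine[engine];

	if (st)
		return st;	/* already populated for this engine */

	st = calloc(1, sizeof(*st));
	if (!st)
		return NULL;
	st->ringbuf = malloc(RING_SIZE);
	st->backing_object = malloc(STATE_SIZE);
	if (!st->ringbuf || !st->backing_object) {
		free(st->ringbuf);
		free(st->backing_object);
		free(st);
		return NULL;
	}
	ctx->engine[engine] = st;
	return st;
}

static void ctx_free(struct context *ctx)
{
	for (int i = 0; i < NUM_ENGINES; i++) {
		if (!ctx->engine[i])
			continue;
		free(ctx->engine[i]->ringbuf);
		free(ctx->engine[i]->backing_object);
		free(ctx->engine[i]);
		ctx->engine[i] = NULL;
	}
}
```
<p>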
   </p><p>
   Finally, regarding local contexts created using the ioctl call: as they are
   only allowed with the render ring, we can allocate &amp; populate them right
   away (no need to defer anything, at least for now).
   </p><p>
   Execlists implementation:
   Execlists are the new method by which, on gen8+ hardware, workloads are
   submitted for execution (as opposed to the legacy, ringbuffer-based, method).
   This method works as follows:
   </p><p>
   When a request is committed, its commands (the BB start and any leading or
   trailing commands, like the seqno breadcrumbs) are placed in the ringbuffer
   for the appropriate context. The tail pointer in the hardware context is not
   updated at this time, but instead, kept by the driver in the ringbuffer
   structure. A structure representing this request is added to a request queue
   for the appropriate engine: this structure contains a copy of the context's
   tail after the request was written to the ring buffer and a pointer to the
   context itself.
   </p><p>
   If the engine's request queue was empty before the request was added, the
   queue is processed immediately. Otherwise the queue will be processed during
   a context switch interrupt. In any case, elements on the queue will get sent
   (in pairs) to the GPU's ExecLists Submit Port (ELSP, for short) with a
   globally unique 20-bit submission ID.
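   </p><p>
   The queueing rule can be sketched as follows: each request records its
   context and the tail captured after it was written, and the hardware is only
   kicked directly when the queue was empty. The circular array, elsp_submit()
   and the submission counter are hypothetical stand-ins for the driver's
   request list and its ELSP write.
   </p>
```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define QUEUE_MAX 16

struct request {
	int ctx_id;	/* context the request belongs to */
	uint32_t tail;	/* ring tail after this request was written */
};

struct engine {
	struct request queue[QUEUE_MAX];
	int head, count;	/* simple circular queue */
	int submissions;	/* ELSP writes, counted for illustration */
};

/* Hypothetical stand-in for writing up to two requests to the ELSP. */
static void elsp_submit(struct engine *e)
{
	e->submissions++;
}

/* Queue a committed request. The hardware is kicked only if the queue
 * was empty; otherwise the context switch interrupt picks it up. */
static bool execlists_enqueue(struct engine *e, int ctx_id, uint32_t tail)
{
	bool was_empty = (e->count == 0);

	if (e->count == QUEUE_MAX)
		return false;
	e->queue[(e->head + e->count) % QUEUE_MAX] =
		(struct request){ .ctx_id = ctx_id, .tail = tail };
	e->count++;
	if (was_empty)
		elsp_submit(e);
	return true;
}
```
<p>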
   </p><p>
   When execution of a request completes, the GPU updates the context status
   buffer with a context complete event and generates a context switch interrupt.
   During the interrupt handling, the driver examines the events in the buffer:
   for each context complete event, if the announced ID matches that on the head
   of the request queue, then that request is retired and removed from the queue.
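   </p><p>
   The retire step in the interrupt handler might be sketched like this,
   treating each context complete event as a bare submission ID. That is a
   simplification assumed for illustration: real context status buffer entries
   carry more than an ID, and the queue layout here is hypothetical.
   </p>
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define QUEUE_MAX 16

struct request_queue {
	uint32_t ctx_id[QUEUE_MAX];	/* submission IDs, oldest at head */
	int head, count;
};

/* Process context complete events: an event whose ID matches the
 * request at the head of the queue retires that request.
 * Returns the number of requests retired. */
static int handle_complete_events(struct request_queue *q,
				  const uint32_t *events, size_t nevents)
{
	int retired = 0;

	for (size_t i = 0; i < nevents; i++) {
		if (q->count > 0 && q->ctx_id[q->head] == events[i]) {
			q->head = (q->head + 1) % QUEUE_MAX;
			q->count--;
			retired++;
		}
	}
	return retired;
}
```
<p>
   A caller would then submit a new execution list only if the returned count is
   nonzero and the queue is still non-empty.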
   </p><p>
   After processing, if any requests were retired and the queue is not empty
   then a new execution list can be submitted. The two requests at the front of
   the queue are next to be submitted but since a context may not occur twice in
   an execution list, if subsequent requests have the same ID as the first then
   the two requests must be combined. This is done simply by discarding requests
   at the head of the queue until either only one request is left (in which case
   we use a NULL second context) or the first two requests have unique IDs.
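   </p><p>
   The coalescing rule can be sketched as below. Requests at the head are
   discarded while the next request belongs to the same context, since
   submitting the later request's tail also covers the earlier work in that
   context's ring; the first two remaining requests then form the pair, with a
   NULL second slot when only one is left. The types and function name are
   hypothetical.
   </p>
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct request {
	int ctx_id;
	uint32_t tail;
};

/* Pick the ELSP pair from queue[0..*count-1] (oldest first). Head
 * requests whose successor has the same context are dropped, and the
 * queue is compacted so the surviving head is at index 0.
 * *second is set to NULL if only one request remains.
 * Returns the first request, or NULL if the queue is empty. */
static struct request *pick_elsp_pair(struct request *queue, size_t *count,
				      struct request **second)
{
	size_t start = 0;

	*second = NULL;
	if (*count == 0)
		return NULL;
	while (start + 1 < *count &&
	       queue[start].ctx_id == queue[start + 1].ctx_id)
		start++;

	/* Forward element-wise copy is safe because i - start <= i. */
	for (size_t i = start; i < *count; i++)
		queue[i - start] = queue[i];
	*count -= start;

	if (*count > 1)
		*second = &queue[1];
	return &queue[0];
}
```
<p>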
   </p><p>
   By always executing the first two requests in the queue the driver ensures
   that the GPU is kept as busy as possible. In the case where a single context
   completes but a second context is still executing, the request for this second
   context will be at the head of the queue when we remove the first one. This
   request will then be resubmitted along with a new request for a different context,
   which will cause the hardware to continue executing the second request and queue
   the new request (the GPU detects the condition of a context getting preempted
   with the same context and optimizes the context switch flow by not doing
   preemption, but just sampling the new tail pointer).
   </p><p>
</p></div><div class="sect2"><div class="titlepage"><div><div><h3 class="title"><a name="idp1128435628"></a>Global GTT views</h3></div></div></div><div class="toc"><dl class="toc"><dt><span class="refentrytitle"><a href="API-i915-dma-map-single.html"><span class="phrase">i915_dma_map_single</span></a></span><span class="refpurpose"> &#8212;
  Create a dma mapping for a page table/dir/etc.
 </span></dt><dt><span class="refentrytitle"><a href="API-alloc-pt-range.html"><span class="phrase">alloc_pt_range</span></a></span><span class="refpurpose"> &#8212;
     Allocate multiple page tables
 </span></dt><dt><span class="refentrytitle"><a href="API-i915-vma-bind.html"><span class="phrase">i915_vma_bind</span></a></span><span class="refpurpose"> &#8212;
     Sets up PTEs for a VMA in its corresponding address space.
 </span></dt></dl></div><p>
   </p><p>
   Background and previous state
   </p><p>
   Historically, objects could exist (be bound) in global GTT space only as
   singular instances with a view representing all of the object's backing pages
   in a linear fashion. This view will be called a normal view.
   </p><p>
   To support multiple views of the same object, where the number of mapped
   pages is not equal to the number of pages in the backing store, or where the
   layout of the pages is not linear, the concept of a GGTT view was added.
   </p><p>
   One example of an alternative view is a stereo display driven by a single
   image. In this case we would have a framebuffer looking like this
   (2x2 pages):
   </p><pre class="programlisting">
12
34
</pre><p>
   The above represents a normal GGTT view, as normally mapped for GPU or CPU
   rendering. In contrast, the display engine would be fed an alternative
   view, which could look something like this:
   </p><pre class="programlisting">
1212
3434
</pre><p>
   In this example both the size and the layout of pages in the alternative view
   are different from the normal view.
   </p><p>
   Implementation and usage
   </p><p>
   GGTT views are implemented using VMAs and are distinguished via enum
   i915_ggtt_view_type and struct i915_ggtt_view.
   </p><p>
   A new flavour of core GEM functions that work with GGTT-bound objects was
   added with the _ggtt_ infix, sometimes also with a _view postfix to avoid
   renaming large amounts of code. They take a struct i915_ggtt_view
   parameter encapsulating all metadata required to implement a view.
   </p><p>
   As a helper for callers which are only interested in the normal view, a
   global const i915_ggtt_view_normal singleton instance exists. All old core
   GEM API functions, the ones not taking the view parameter, operate on, or
   with, the normal GGTT view.
   </p><p>
   Code wanting to add or use a new GGTT view needs to:
   </p><p>
   1. Add a new enum with a suitable name.
   2. Extend the metadata in the i915_ggtt_view structure if required.
   3. Add support to <code class="function">i915_get_vma_pages</code>.
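   </p><p>
   The three steps might look roughly like this for a hypothetical view type
   that repeats each row of pages twice, as in the stereo example above (page
   indices here are 0-based, so 1212/3434 becomes 0,1,0,1,2,3,2,3). All names are
   illustrative, and the sketch fills a plain index array, whereas the driver
   builds a scatter-gather table in i915_get_vma_pages.
   </p>
```c
#include <assert.h>
#include <stddef.h>

/* Step 1: a new enum value (GGTT_VIEW_STEREO is hypothetical). */
enum ggtt_view_type {
	GGTT_VIEW_NORMAL = 0,
	GGTT_VIEW_STEREO,
};

/* Step 2: extra metadata the new view needs. */
struct ggtt_view {
	enum ggtt_view_type type;
	size_t width, height;	/* framebuffer size in pages (STEREO only) */
};

/* Step 3: teach the page-list builder about the new view. Writes the
 * backing page indices in the order the view maps them and returns
 * how many were written, or 0 for an unsupported configuration. */
static size_t get_view_pages(const struct ggtt_view *view, size_t npages,
			     size_t *out)
{
	size_t n = 0;

	switch (view->type) {
	case GGTT_VIEW_NORMAL:
		for (size_t i = 0; i < npages; i++)
			out[n++] = i;	/* linear: every page, in order */
		return n;
	case GGTT_VIEW_STEREO:
		if (view->width * view->height != npages)
			return 0;
		for (size_t row = 0; row < view->height; row++)
			for (size_t copy = 0; copy < 2; copy++)
				for (size_t col = 0; col < view->width; col++)
					out[n++] = row * view->width + col;
		return n;
	}
	return 0;
}
```
<p>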
   </p><p>
   New views are required to build a scatter-gather table from within the
   i915_get_vma_pages function. This table is stored in the vma.ggtt_view and
   exists for the lifetime of a VMA.
   </p><p>
   The core API is designed to have copy semantics, which means that the
   passed-in struct i915_ggtt_view does not need to be persistent (kept
   around after calling the core API functions).
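   </p><p>
   Copy semantics can be illustrated by a VMA that stores the view by value, so
   a view built on the caller's stack can safely go out of scope after the call.
   The sketch also shows a shared const normal-view instance in the spirit of
   i915_ggtt_view_normal; the struct layouts, GGTT_VIEW_ALT and the function
   names are hypothetical.
   </p>
```c
#include <assert.h>
#include <stdbool.h>

enum ggtt_view_type { GGTT_VIEW_NORMAL = 0, GGTT_VIEW_ALT };

struct ggtt_view {
	enum ggtt_view_type type;
};

/* Shared read-only instance for callers that only want the normal view. */
static const struct ggtt_view ggtt_view_normal = { GGTT_VIEW_NORMAL };

struct vma {
	struct ggtt_view view;	/* stored by value: a copy, not a pointer */
	bool bound;
};

/* Bind with copy semantics: the caller's view may live on the stack
 * and disappear once this returns. */
static void vma_bind_view(struct vma *vma, const struct ggtt_view *view)
{
	vma->view = *view;
	vma->bound = true;
}

static bool vma_matches(const struct vma *vma, const struct ggtt_view *view)
{
	return vma->bound && vma->view.type == view->type;
}
```
<p>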
   </p><p>
</p></div><div class="sect2"><div class="titlepage"><div><div><h3 class="title"><a name="idp1128473508"></a>Buffer Object Eviction</h3></div></div></div><div class="toc"><dl class="toc"><dt><span class="refentrytitle"><a href="API-i915-gem-evict-something.html"><span class="phrase">i915_gem_evict_something</span></a></span><span class="refpurpose"> &#8212;
  Evict vmas to make room for binding a new one
 </span></dt><dt><span class="refentrytitle"><a href="API-i915-gem-evict-vm.html"><span class="phrase">i915_gem_evict_vm</span></a></span><span class="refpurpose"> &#8212;
     Evict all idle vmas from a vm
 </span></dt><dt><span class="refentrytitle"><a href="API-i915-gem-evict-everything.html"><span class="phrase">i915_gem_evict_everything</span></a></span><span class="refpurpose"> &#8212;
     Try to evict all objects
 </span></dt></dl></div><p>
	  This section documents the interface functions for evicting buffer
	  objects to make space available in the virtual GPU address spaces.
	  Note that this is mostly orthogonal to shrinking buffer object
	  caches, which has the goal of making main memory (shared with the GPU
	  through the unified memory architecture) available.
	</p></div><div class="sect2"><div class="titlepage"><div><div><h3 class="title"><a name="idp1128510940"></a>Buffer Object Memory Shrinking</h3></div></div></div><div class="toc"><dl class="toc"><dt><span class="refentrytitle"><a href="API-i915-gem-shrink.html"><span class="phrase">i915_gem_shrink</span></a></span><span class="refpurpose"> &#8212;
  Shrink buffer object caches
 </span></dt><dt><span class="refentrytitle"><a href="API-i915-gem-shrink-all.html"><span class="phrase">i915_gem_shrink_all</span></a></span><span class="refpurpose"> &#8212;
     Shrink buffer object caches completely
 </span></dt><dt><span class="refentrytitle"><a href="API-i915-gem-shrinker-init.html"><span class="phrase">i915_gem_shrinker_init</span></a></span><span class="refpurpose"> &#8212;
     Initialize i915 shrinker
 </span></dt></dl></div><p>
	  This section documents the interface functions for shrinking memory
	  usage of buffer object caches. Shrinking is used to make main memory
	  available. Note that this is mostly orthogonal to evicting buffer
	  objects, which has the goal of making space in GPU virtual address
	  spaces.
	</p></div></div><div class="navfooter"><hr><table width="100%" summary="Navigation footer"><tr><td width="40%" align="left"><a accesskey="p" href="API-intel-dp-drrs-init.html">Prev</a>&#160;</td><td width="20%" align="center"><a accesskey="u" href="drmI915.html">Up</a></td><td width="40%" align="right">&#160;<a accesskey="n" href="API-i915-cmd-parser-init-ring.html">Next</a></td></tr><tr><td width="40%" align="left" valign="top"><span class="phrase">intel_dp_drrs_init</span>&#160;</td><td width="20%" align="center"><a accesskey="h" href="index.html">Home</a></td><td width="40%" align="right" valign="top">&#160;<span class="phrase">i915_cmd_parser_init_ring</span></td></tr></table></div></body></html>