1<html><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><title>Memory Management and Command Submission</title><meta name="generator" content="DocBook XSL Stylesheets V1.78.1"><link rel="home" href="index.html" title="Linux GPU Driver Developer's Guide"><link rel="up" href="drmI915.html" title="Chapter 4. drm/i915 Intel GFX Driver"><link rel="prev" href="API-intel-csr-ucode-fini.html" title="intel_csr_ucode_fini"><link rel="next" href="API-i915-cmd-parser-init-ring.html" title="i915_cmd_parser_init_ring"></head><body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF"><div class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="3" align="center">Memory Management and Command Submission</th></tr><tr><td width="20%" align="left"><a accesskey="p" href="API-intel-csr-ucode-fini.html">Prev</a> </td><th width="60%" align="center">Chapter 4. drm/i915 Intel GFX Driver</th><td width="20%" align="right"> <a accesskey="n" href="API-i915-cmd-parser-init-ring.html">Next</a></td></tr></table><hr></div><div class="sect1"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="id-1.4.3.5"></a>Memory Management and Command Submission</h2></div></div></div><div class="toc"><dl class="toc"><dt><span class="sect2"><a href="ch04s03.html#id-1.4.3.5.3">Batchbuffer Parsing</a></span></dt><dt><span class="sect2"><a href="ch04s03.html#id-1.4.3.5.4">Batchbuffer Pools</a></span></dt><dt><span class="sect2"><a href="ch04s03.html#id-1.4.3.5.5">Logical Rings, Logical Ring Contexts and Execlists</a></span></dt><dt><span class="sect2"><a href="ch04s03.html#id-1.4.3.5.6">Global GTT views</a></span></dt><dt><span class="sect2"><a href="ch04s03.html#id-1.4.3.5.7">GTT Fences and Swizzling</a></span></dt><dt><span class="sect2"><a href="ch04s03.html#id-1.4.3.5.8">Object Tiling IOCTLs</a></span></dt><dt><span class="sect2"><a href="ch04s03.html#id-1.4.3.5.9">Buffer Object Eviction</a></span></dt><dt><span class="sect2"><a href="ch04s03.html#id-1.4.3.5.10">Buffer Object Memory Shrinking</a></span></dt></dl></div><p>
	This section covers all things related to the GEM implementation in the
3	i915 driver.
4      </p><div class="sect2"><div class="titlepage"><div><div><h3 class="title"><a name="id-1.4.3.5.3"></a>Batchbuffer Parsing</h3></div></div></div><div class="toc"><dl class="toc"><dt><span class="refentrytitle"><a href="API-i915-cmd-parser-init-ring.html"><span class="phrase">i915_cmd_parser_init_ring</span></a></span><span class="refpurpose"> — 
5  set cmd parser related fields for a ringbuffer
6 </span></dt><dt><span class="refentrytitle"><a href="API-i915-cmd-parser-fini-ring.html"><span class="phrase">i915_cmd_parser_fini_ring</span></a></span><span class="refpurpose"> — 
7     clean up cmd parser related fields
8 </span></dt><dt><span class="refentrytitle"><a href="API-i915-needs-cmd-parser.html"><span class="phrase">i915_needs_cmd_parser</span></a></span><span class="refpurpose"> — 
9     should a given ring use software command parsing?
10 </span></dt><dt><span class="refentrytitle"><a href="API-i915-parse-cmds.html"><span class="phrase">i915_parse_cmds</span></a></span><span class="refpurpose"> — 
11     parse a submitted batch buffer for privilege violations
12 </span></dt><dt><span class="refentrytitle"><a href="API-i915-cmd-parser-get-version.html"><span class="phrase">i915_cmd_parser_get_version</span></a></span><span class="refpurpose"> — 
13     get the cmd parser version number
14 </span></dt></dl></div><p>
15   </p><p>
16   Motivation:
17   Certain OpenGL features (e.g. transform feedback, performance monitoring)
18   require userspace code to submit batches containing commands such as
19   MI_LOAD_REGISTER_IMM to access various registers. Unfortunately, some
20   generations of the hardware will noop these commands in <span class="quote">“<span class="quote">unsecure</span>”</span> batches
21   (which includes all userspace batches submitted via i915) even though the
22   commands may be safe and represent the intended programming model of the
23   device.
24   </p><p>
25   The software command parser is similar in operation to the command parsing
26   done in hardware for unsecure batches. However, the software parser allows
27   some operations that would be noop'd by hardware, if the parser determines
28   the operation is safe, and submits the batch as <span class="quote">“<span class="quote">secure</span>”</span> to prevent hardware
29   parsing.
30   </p><p>
31   Threats:
32   At a high level, the hardware (and software) checks attempt to prevent
33   granting userspace undue privileges. There are three categories of privilege.
34   </p><p>
35   First, commands which are explicitly defined as privileged or which should
36   only be used by the kernel driver. The parser generally rejects such
37   commands, though it may allow some from the drm master process.
38   </p><p>
39   Second, commands which access registers. To support correct/enhanced
40   userspace functionality, particularly certain OpenGL extensions, the parser
41   provides a whitelist of registers which userspace may safely access (for both
42   normal and drm master processes).
43   </p><p>
44   Third, commands which access privileged memory (i.e. GGTT, HWS page, etc).
45   The parser always rejects such commands.
46   </p><p>
47   The majority of the problematic commands fall in the MI_* range, with only a
48   few specific commands on each ring (e.g. PIPE_CONTROL and MI_FLUSH_DW).
49   </p><p>
50   Implementation:
51   Each ring maintains tables of commands and registers which the parser uses in
52   scanning batch buffers submitted to that ring.
53   </p><p>
54   Since the set of commands that the parser must check for is significantly
55   smaller than the number of commands supported, the parser tables contain only
56   those commands required by the parser. This generally works because command
57   opcode ranges have standard command length encodings. So for commands that
58   the parser does not need to check, it can easily skip them. This is
59   implemented via a per-ring length decoding vfunc.
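   </p><p>
   As a rough illustration of that skip path, consider the self-contained C
   sketch below. It is not the driver's actual vfunc or field layout; the
   client and length fields shown are hypothetical:
   </p><pre class="programlisting">
#include &lt;stdint.h&gt;

/*
 * Hypothetical length decoding: for most opcode ranges the command length
 * is encoded in the low bits of the first dword, so commands the parser
 * has no table entry for can simply be stepped over.
 */
static uint32_t decode_cmd_length(uint32_t header)
{
        uint32_t client = header &gt;&gt; 29;         /* assumed client field */

        if (client == 0x0)                      /* MI-style: bits 5:0 */
                return (header &amp; 0x3f) + 2;
        return (header &amp; 0xff) + 2;             /* 3D/media-style: bits 7:0 */
}

/* Skip a command the parser does not need to check. */
static const uint32_t *skip_cmd(const uint32_t *cmd)
{
        return cmd + decode_cmd_length(*cmd);
}
</pre><p>
   A real ring's encodings differ per command client; the point is only that
   length decoding lets the parser walk past commands it does not care about.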
60   </p><p>
61   Unfortunately, there are a number of commands that do not follow the standard
62   length encoding for their opcode range, primarily amongst the MI_* commands.
63   To handle this, the parser provides a way to define explicit <span class="quote">“<span class="quote">skip</span>”</span> entries
64   in the per-ring command tables.
65   </p><p>
   Other command table entries map fairly directly to the high level categories
67   mentioned above: rejected, master-only, register whitelist. The parser
68   implements a number of checks, including the privileged memory checks, via a
69   general bitmasking mechanism.
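   </p><p>
   A much simplified sketch of what such a table entry and the bitmask check
   could look like is given below; the struct and field names are invented for
   illustration and are not the driver's real descriptors:
   </p><pre class="programlisting">
#include &lt;stdbool.h&gt;
#include &lt;stdint.h&gt;

/* Hypothetical, heavily trimmed command descriptor. */
struct cmd_desc {
        uint32_t opcode;        /* canonical opcode bits */
        uint32_t opcode_mask;   /* header bits that select the opcode */
        bool     rejected;      /* never allowed from userspace */
        bool     master_only;   /* allowed only from the drm master */
        bool     check_regs;    /* operands are registers: apply whitelist */
        /* generic bitmask check, e.g. for privileged memory flags */
        uint32_t bit_word;      /* which dword of the command to test */
        uint32_t bit_mask;
        uint32_t bit_expected;
};

static bool cmd_passes_bitmask_check(const struct cmd_desc *desc,
                                     const uint32_t *cmd)
{
        if (!desc-&gt;bit_mask)
                return true;    /* no bitmask check for this command */
        return (cmd[desc-&gt;bit_word] &amp; desc-&gt;bit_mask) == desc-&gt;bit_expected;
}
</pre><p>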
70</p></div><div class="sect2"><div class="titlepage"><div><div><h3 class="title"><a name="id-1.4.3.5.4"></a>Batchbuffer Pools</h3></div></div></div><div class="toc"><dl class="toc"><dt><span class="refentrytitle"><a href="API-i915-gem-batch-pool-init.html"><span class="phrase">i915_gem_batch_pool_init</span></a></span><span class="refpurpose"> — 
71  initialize a batch buffer pool
72 </span></dt><dt><span class="refentrytitle"><a href="API-i915-gem-batch-pool-fini.html"><span class="phrase">i915_gem_batch_pool_fini</span></a></span><span class="refpurpose"> — 
73     clean up a batch buffer pool
74 </span></dt><dt><span class="refentrytitle"><a href="API-i915-gem-batch-pool-get.html"><span class="phrase">i915_gem_batch_pool_get</span></a></span><span class="refpurpose"> — 
75     allocate a buffer from the pool
76 </span></dt></dl></div><p>
77   </p><p>
78   In order to submit batch buffers as 'secure', the software command parser
79   must ensure that a batch buffer cannot be modified after parsing. It does
80   this by copying the user provided batch buffer contents to a kernel owned
81   buffer from which the hardware will actually execute, and by carefully
82   managing the address space bindings for such buffers.
83   </p><p>
84   The batch pool framework provides a mechanism for the driver to manage a
85   set of scratch buffers to use for this purpose. The framework can be
   extended to support other use cases should they arise.
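   </p><p>
   The underlying idea can be sketched in a few lines of plain C. The names
   below (scratch_pool, scratch_pool_get) are made up for illustration and are
   not the i915_gem_batch_pool API itself:
   </p><pre class="programlisting">
#include &lt;stddef.h&gt;
#include &lt;stdlib.h&gt;

/* Hypothetical pool of reusable kernel-owned scratch buffers. */
struct scratch_buf {
        struct scratch_buf *next;
        size_t size;
        void *data;
};

struct scratch_pool {
        struct scratch_buf *free_list;
};

/* Grab a buffer of at least @len bytes, reusing a free one if possible. */
static struct scratch_buf *scratch_pool_get(struct scratch_pool *pool,
                                            size_t len)
{
        struct scratch_buf **p, *buf;

        for (p = &amp;pool-&gt;free_list; (buf = *p) != NULL; p = &amp;buf-&gt;next) {
                if (buf-&gt;size &gt;= len) {
                        *p = buf-&gt;next;         /* unlink and reuse */
                        return buf;
                }
        }

        buf = calloc(1, sizeof(*buf));          /* otherwise allocate fresh */
        if (!buf)
                return NULL;
        buf-&gt;data = malloc(len);
        if (!buf-&gt;data) {
                free(buf);
                return NULL;
        }
        buf-&gt;size = len;
        return buf;
}
</pre><p>
   The parser then copies the userspace batch into such a scratch buffer and
   hands that copy, not the original object, to the hardware.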
87</p></div><div class="sect2"><div class="titlepage"><div><div><h3 class="title"><a name="id-1.4.3.5.5"></a>Logical Rings, Logical Ring Contexts and Execlists</h3></div></div></div><div class="toc"><dl class="toc"><dt><span class="refentrytitle"><a href="API-intel-sanitize-enable-execlists.html"><span class="phrase">intel_sanitize_enable_execlists</span></a></span><span class="refpurpose"> — 
88  sanitize i915.enable_execlists
89 </span></dt><dt><span class="refentrytitle"><a href="API-intel-execlists-ctx-id.html"><span class="phrase">intel_execlists_ctx_id</span></a></span><span class="refpurpose"> — 
90     get the Execlists Context ID
91 </span></dt><dt><span class="refentrytitle"><a href="API-intel-lrc-irq-handler.html"><span class="phrase">intel_lrc_irq_handler</span></a></span><span class="refpurpose"> — 
92     handle Context Switch interrupts
93 </span></dt><dt><span class="refentrytitle"><a href="API-intel-logical-ring-begin.html"><span class="phrase">intel_logical_ring_begin</span></a></span><span class="refpurpose"> — 
94     prepare the logical ringbuffer to accept some commands
95 </span></dt><dt><span class="refentrytitle"><a href="API-intel-execlists-submission.html"><span class="phrase">intel_execlists_submission</span></a></span><span class="refpurpose"> — 
96     submit a batchbuffer for execution, Execlists style
97 </span></dt><dt><span class="refentrytitle"><a href="API-gen8-init-indirectctx-bb.html"><span class="phrase">gen8_init_indirectctx_bb</span></a></span><span class="refpurpose"> — 
98     initialize indirect ctx batch with WA
99 </span></dt><dt><span class="refentrytitle"><a href="API-gen8-init-perctx-bb.html"><span class="phrase">gen8_init_perctx_bb</span></a></span><span class="refpurpose"> — 
100     initialize per ctx batch with WA
101 </span></dt><dt><span class="refentrytitle"><a href="API-intel-logical-ring-cleanup.html"><span class="phrase">intel_logical_ring_cleanup</span></a></span><span class="refpurpose"> — 
102     deallocate the Engine Command Streamer
103 </span></dt><dt><span class="refentrytitle"><a href="API-intel-logical-rings-init.html"><span class="phrase">intel_logical_rings_init</span></a></span><span class="refpurpose"> — 
104     allocate, populate and init the Engine Command Streamers
105 </span></dt><dt><span class="refentrytitle"><a href="API-intel-lr-context-free.html"><span class="phrase">intel_lr_context_free</span></a></span><span class="refpurpose"> — 
106     free the LRC specific bits of a context
107 </span></dt><dt><span class="refentrytitle"><a href="API-intel-lr-context-deferred-alloc.html"><span class="phrase">intel_lr_context_deferred_alloc</span></a></span><span class="refpurpose"> — 
108     create the LRC specific bits of a context
109 </span></dt></dl></div><p>
110   </p><p>
111   Motivation:
112   GEN8 brings an expansion of the HW contexts: <span class="quote">“<span class="quote">Logical Ring Contexts</span>”</span>.
113   These expanded contexts enable a number of new abilities, especially
114   <span class="quote">“<span class="quote">Execlists</span>”</span> (also implemented in this file).
115   </p><p>
116   One of the main differences with the legacy HW contexts is that logical
117   ring contexts incorporate many more things to the context's state, like
118   PDPs or ringbuffer control registers:
119   </p><p>
120   The reason why PDPs are included in the context is straightforward: as
121   PPGTTs (per-process GTTs) are actually per-context, having the PDPs
   contained there means you don't need to do a ppgtt-&gt;switch_mm yourself;
   instead, the GPU will do it for you on the context switch.
124   </p><p>
   But what about the ringbuffer control registers (head, tail, etc.)?
   Shouldn't we just need one set of those per engine command streamer? This is
127   where the name <span class="quote">“<span class="quote">Logical Rings</span>”</span> starts to make sense: by virtualizing the
128   rings, the engine cs shifts to a new <span class="quote">“<span class="quote">ring buffer</span>”</span> with every context
129   switch. When you want to submit a workload to the GPU you: A) choose your
130   context, B) find its appropriate virtualized ring, C) write commands to it
131   and then, finally, D) tell the GPU to switch to that context.
132   </p><p>
133   Instead of the legacy MI_SET_CONTEXT, the way you tell the GPU to switch
   to a context is via a context execution list, ergo <span class="quote">“<span class="quote">Execlists</span>”</span>.
135   </p><p>
136   LRC implementation:
137   Regarding the creation of contexts, we have:
138   </p><p>
139   - One global default context.
140   - One local default context for each opened fd.
141   - One local extra context for each context create ioctl call.
142   </p><p>
143   Now that ringbuffers belong per-context (and not per-engine, like before)
144   and that contexts are uniquely tied to a given engine (and not reusable,
145   like before) we need:
146   </p><p>
147   - One ringbuffer per-engine inside each context.
148   - One backing object per-engine inside each context.
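   </p><p>
   The per-engine items just listed can be pictured, very roughly, like this
   (the struct, field and constant names here are invented for illustration and
   do not match the driver's struct intel_context exactly):
   </p><pre class="programlisting">
struct backing_object;  /* stand-in for the per-engine GEM backing object */
struct ringbuffer;      /* stand-in for the logical ringbuffer state */

#define NUM_ENGINES 5   /* render, blitter, video, etc. (assumed count) */

struct lr_context {
        /* ... generic per-context state (ID, file, PPGTT, ...) ... */
        struct {
                struct backing_object *state;   /* backing object, per engine */
                struct ringbuffer *ringbuf;     /* ringbuffer, per engine */
        } engine[NUM_ENGINES];
};
</pre><p>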
149   </p><p>
150   The global default context starts its life with these new objects fully
151   allocated and populated. The local default context for each opened fd is
152   more complex, because we don't know at creation time which engine is going
153   to use them. To handle this, we have implemented a deferred creation of LR
154   contexts:
155   </p><p>
   The local context starts its life as a hollow or blank holder that only
157   gets populated for a given engine once we receive an execbuffer. If later
158   on we receive another execbuffer ioctl for the same context but a different
159   engine, we allocate/populate a new ringbuffer and context backing object and
160   so on.
161   </p><p>
162   Finally, regarding local contexts created using the ioctl call: as they are
163   only allowed with the render ring, we can allocate &amp; populate them right
164   away (no need to defer anything, at least for now).
165   </p><p>
166   Execlists implementation:
167   Execlists are the new method by which, on gen8+ hardware, workloads are
168   submitted for execution (as opposed to the legacy, ringbuffer-based, method).
169   This method works as follows:
170   </p><p>
171   When a request is committed, its commands (the BB start and any leading or
172   trailing commands, like the seqno breadcrumbs) are placed in the ringbuffer
173   for the appropriate context. The tail pointer in the hardware context is not
174   updated at this time, but instead, kept by the driver in the ringbuffer
175   structure. A structure representing this request is added to a request queue
176   for the appropriate engine: this structure contains a copy of the context's
177   tail after the request was written to the ring buffer and a pointer to the
178   context itself.
179   </p><p>
180   If the engine's request queue was empty before the request was added, the
181   queue is processed immediately. Otherwise the queue will be processed during
182   a context switch interrupt. In any case, elements on the queue will get sent
183   (in pairs) to the GPU's ExecLists Submit Port (ELSP, for short) with a
   globally unique 20-bit submission ID.
185   </p><p>
186   When execution of a request completes, the GPU updates the context status
187   buffer with a context complete event and generates a context switch interrupt.
188   During the interrupt handling, the driver examines the events in the buffer:
189   for each context complete event, if the announced ID matches that on the head
190   of the request queue, then that request is retired and removed from the queue.
191   </p><p>
192   After processing, if any requests were retired and the queue is not empty
193   then a new execution list can be submitted. The two requests at the front of
194   the queue are next to be submitted but since a context may not occur twice in
195   an execution list, if subsequent requests have the same ID as the first then
196   the two requests must be combined. This is done simply by discarding requests
   at the head of the queue until either only one request is left (in which case
198   we use a NULL second context) or the first two requests have unique IDs.
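   </p><p>
   A hedged sketch of that coalescing step, in illustrative C rather than the
   driver's actual execlist code (all names here are invented):
   </p><pre class="programlisting">
#include &lt;stddef.h&gt;
#include &lt;stdint.h&gt;

struct lr_context;                      /* opaque context handle */

/* Hypothetical queued request: the ring tail after it was written, plus a
 * pointer to its context. */
struct pending_request {
        struct pending_request *next;
        struct lr_context *ctx;
        uint32_t tail;
};

/* Stub: the real driver would write both context descriptors and tails,
 * tagged with globally unique submission IDs, to the ELSP here. */
static void elsp_submit_pair(struct lr_context *ctx0, uint32_t tail0,
                             struct lr_context *ctx1, uint32_t tail1)
{
        (void)ctx0; (void)tail0; (void)ctx1; (void)tail1;
}

/*
 * Pick the next execution list. Consecutive head requests for the same
 * context are coalesced (only the newest tail matters) because a context
 * may not appear twice in one execution list.
 */
static void submit_next_pair(struct pending_request **queue)
{
        struct pending_request *first = *queue, *second;

        if (!first)
                return;

        while (first-&gt;next &amp;&amp; first-&gt;next-&gt;ctx == first-&gt;ctx) {
                first-&gt;tail = first-&gt;next-&gt;tail;
                /* duplicate dropped; the real driver retires and frees it */
                first-&gt;next = first-&gt;next-&gt;next;
        }

        second = first-&gt;next;
        elsp_submit_pair(first-&gt;ctx, first-&gt;tail,
                         second ? second-&gt;ctx : NULL,
                         second ? second-&gt;tail : 0);
}
</pre><p>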
199   </p><p>
200   By always executing the first two requests in the queue the driver ensures
201   that the GPU is kept as busy as possible. In the case where a single context
202   completes but a second context is still executing, the request for this second
203   context will be at the head of the queue when we remove the first one. This
204   request will then be resubmitted along with a new request for a different context,
205   which will cause the hardware to continue executing the second request and queue
206   the new request (the GPU detects the condition of a context getting preempted
207   with the same context and optimizes the context switch flow by not doing
208   preemption, but just sampling the new tail pointer).
209   </p><p>
210</p></div><div class="sect2"><div class="titlepage"><div><div><h3 class="title"><a name="id-1.4.3.5.6"></a>Global GTT views</h3></div></div></div><div class="toc"><dl class="toc"><dt><span class="refentrytitle"><a href="API-gen8-ppgtt-alloc-pagetabs.html"><span class="phrase">gen8_ppgtt_alloc_pagetabs</span></a></span><span class="refpurpose"> — 
211  Allocate page tables for VA range.
212 </span></dt><dt><span class="refentrytitle"><a href="API-gen8-ppgtt-alloc-page-directories.html"><span class="phrase">gen8_ppgtt_alloc_page_directories</span></a></span><span class="refpurpose"> — 
213     Allocate page directories for VA range.
214 </span></dt><dt><span class="refentrytitle"><a href="API-gen8-ppgtt-alloc-page-dirpointers.html"><span class="phrase">gen8_ppgtt_alloc_page_dirpointers</span></a></span><span class="refpurpose"> — 
215     Allocate pdps for VA range.
216 </span></dt><dt><span class="refentrytitle"><a href="API-i915-vma-bind.html"><span class="phrase">i915_vma_bind</span></a></span><span class="refpurpose"> — 
     Sets up PTEs for a VMA in its corresponding address space.
218 </span></dt><dt><span class="refentrytitle"><a href="API-i915-ggtt-view-size.html"><span class="phrase">i915_ggtt_view_size</span></a></span><span class="refpurpose"> — 
219     Get the size of a GGTT view.
220 </span></dt></dl></div><p>
221   </p><p>
222   Background and previous state
223   </p><p>
   Historically objects could exist (be bound) in global GTT space only as
225   singular instances with a view representing all of the object's backing pages
226   in a linear fashion. This view will be called a normal view.
227   </p><p>
228   To support multiple views of the same object, where the number of mapped
229   pages is not equal to the backing store, or where the layout of the pages
   is not linear, the concept of a GGTT view was added.
231   </p><p>
232   One example of an alternative view is a stereo display driven by a single
233   image. In this case we would have a framebuffer looking like this
234   (2x2 pages):
   </p><pre class="literallayout">
   12
   34
   </pre><p>
239   Above would represent a normal GGTT view as normally mapped for GPU or CPU
240   rendering. In contrast, fed to the display engine would be an alternative
241   view which could look something like this:
   </p><pre class="literallayout">
   1212
   3434
   </pre><p>
   In this example both the size and layout of pages in the alternative view are
247   different from the normal view.
248   </p><p>
249   Implementation and usage
250   </p><p>
251   GGTT views are implemented using VMAs and are distinguished via enum
252   i915_ggtt_view_type and struct i915_ggtt_view.
253   </p><p>
   A new flavour of core GEM functions which work with GGTT bound objects was
   added with the _ggtt_ infix, and sometimes with a _view postfix, to avoid
   renaming large amounts of code. They take a struct i915_ggtt_view
257   parameter encapsulating all metadata required to implement a view.
258   </p><p>
259   As a helper for callers which are only interested in the normal view,
   a globally const i915_ggtt_view_normal singleton instance exists. All old core
   GEM API functions, the ones not taking the view parameter, operate on,
   or with, the normal GGTT view.
263   </p><p>
264   Code wanting to add or use a new GGTT view needs to:
265   </p><p>
266   1. Add a new enum with a suitable name.
267   2. Extend the metadata in the i915_ggtt_view structure if required.
268   3. Add support to <code class="function">i915_get_vma_pages</code>.
269   </p><p>
270   New views are required to build a scatter-gather table from within the
271   i915_get_vma_pages function. This table is stored in the vma.ggtt_view and
   exists for the lifetime of a VMA.
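   </p><p>
   As a hedged example, a helper for a new view type, building the remapped
   scatter-gather table for something like the stereo layout shown earlier,
   might look as follows (hypothetical helper, error handling trimmed; it is
   not i915_get_vma_pages itself):
   </p><pre class="programlisting">
#include &lt;linux/err.h&gt;
#include &lt;linux/kernel.h&gt;
#include &lt;linux/mm.h&gt;
#include &lt;linux/scatterlist.h&gt;
#include &lt;linux/slab.h&gt;

/*
 * Build an sg_table mapping the object's four backing pages in the
 * remapped 1212/3434 order of the stereo example above.  @pages are the
 * object's backing pages in their normal (1234) order.
 */
static struct sg_table *build_stereo_view_pages(struct page **pages)
{
        static const unsigned int order[] = { 0, 1, 0, 1, 2, 3, 2, 3 };
        struct sg_table *st;
        struct scatterlist *sg;
        unsigned int i;

        st = kmalloc(sizeof(*st), GFP_KERNEL);
        if (!st)
                return ERR_PTR(-ENOMEM);

        if (sg_alloc_table(st, ARRAY_SIZE(order), GFP_KERNEL)) {
                kfree(st);
                return ERR_PTR(-ENOMEM);
        }

        for_each_sg(st-&gt;sgl, sg, ARRAY_SIZE(order), i)
                sg_set_page(sg, pages[order[i]], PAGE_SIZE, 0);

        return st;
}
</pre><p>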
273   </p><p>
   The core API is designed to have copy semantics, which means that the passed-in
   struct i915_ggtt_view does not need to be persistent (left around after
276   calling the core API functions).
277   </p><p>
278</p></div><div class="sect2"><div class="titlepage"><div><div><h3 class="title"><a name="id-1.4.3.5.7"></a>GTT Fences and Swizzling</h3></div></div></div><div class="toc"><dl class="toc"><dt><span class="refentrytitle"><a href="API-i915-gem-object-put-fence.html"><span class="phrase">i915_gem_object_put_fence</span></a></span><span class="refpurpose"> — 
279  force-remove fence for an object
280 </span></dt><dt><span class="refentrytitle"><a href="API-i915-gem-object-get-fence.html"><span class="phrase">i915_gem_object_get_fence</span></a></span><span class="refpurpose"> — 
281     set up fencing for an object
282 </span></dt><dt><span class="refentrytitle"><a href="API-i915-gem-object-pin-fence.html"><span class="phrase">i915_gem_object_pin_fence</span></a></span><span class="refpurpose"> — 
283     pin fencing state
284 </span></dt><dt><span class="refentrytitle"><a href="API-i915-gem-object-unpin-fence.html"><span class="phrase">i915_gem_object_unpin_fence</span></a></span><span class="refpurpose"> — 
285     unpin fencing state
286 </span></dt><dt><span class="refentrytitle"><a href="API-i915-gem-restore-fences.html"><span class="phrase">i915_gem_restore_fences</span></a></span><span class="refpurpose"> — 
287     restore fence state
288 </span></dt><dt><span class="refentrytitle"><a href="API-i915-gem-detect-bit-6-swizzle.html"><span class="phrase">i915_gem_detect_bit_6_swizzle</span></a></span><span class="refpurpose"> — 
289     detect bit 6 swizzling pattern
290 </span></dt><dt><span class="refentrytitle"><a href="API-i915-gem-object-do-bit-17-swizzle.html"><span class="phrase">i915_gem_object_do_bit_17_swizzle</span></a></span><span class="refpurpose"> — 
291     fixup bit 17 swizzling
292 </span></dt><dt><span class="refentrytitle"><a href="API-i915-gem-object-save-bit-17-swizzle.html"><span class="phrase">i915_gem_object_save_bit_17_swizzle</span></a></span><span class="refpurpose"> — 
293     save bit 17 swizzling
294 </span></dt><dt><span class="sect3"><a href="ch04s03.html#id-1.4.3.5.7.10">Global GTT Fence Handling</a></span></dt><dt><span class="sect3"><a href="ch04s03.html#id-1.4.3.5.7.11">Hardware Tiling and Swizzling Details</a></span></dt></dl></div><div class="sect3"><div class="titlepage"><div><div><h4 class="title"><a name="id-1.4.3.5.7.10"></a>Global GTT Fence Handling</h4></div></div></div><p>
295   </p><p>
   Important to avoid confusion: <span class="quote">“<span class="quote">fences</span>”</span> in the i915 driver are not execution
297   fences used to track command completion but hardware detiler objects which
298   wrap a given range of the global GTT. Each platform has only a fairly limited
299   set of these objects.
300   </p><p>
301   Fences are used to detile GTT memory mappings. They're also connected to the
   hardware frontbuffer render tracking and hence interact with frontbuffer
   compression. Furthermore, on older platforms fences are required for tiled
304   objects used by the display engine. They can also be used by the render
305   engine - they're required for blitter commands and are optional for render
306   commands. But on gen4+ both display (with the exception of fbc) and rendering
307   have their own tiling state bits and don't need fences.
308   </p><p>
309   Also note that fences only support X and Y tiling and hence can't be used for
310   the fancier new tiling formats like W, Ys and Yf.
311   </p><p>
312   Finally note that because fences are such a restricted resource they're
313   dynamically associated with objects. Furthermore fence state is committed to
   the hardware lazily to avoid unnecessary stalls on gen2/3. Therefore code must
   explicitly call <code class="function"><a class="link" href="API-i915-gem-object-get-fence.html" title="i915_gem_object_get_fence">i915_gem_object_get_fence</a></code> to synchronize fencing status
   for cpu access. Also note that some code wants an unfenced view; for those
317   cases the fence can be removed forcefully with <code class="function"><a class="link" href="API-i915-gem-object-put-fence.html" title="i915_gem_object_put_fence">i915_gem_object_put_fence</a></code>.
318   </p><p>
319   Internally these functions will synchronize with userspace access by removing
320   CPU ptes into GTT mmaps (not the GTT ptes themselves) as needed.
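   </p><p>
   In rough form the expected calling pattern looks like the sketch below. The
   wrapper function itself is assumed for illustration, not taken from the
   driver, and struct_mutex is assumed to be held by the caller:
   </p><pre class="programlisting">
#include "i915_drv.h"   /* driver-internal header, in-tree code only */

/*
 * Make sure CPU access through a GTT mmap of a tiled object goes through
 * a fence (detiler), or force-drop the fence when an unfenced view of the
 * raw pages is wanted instead.
 */
static int prepare_gtt_cpu_access(struct drm_i915_gem_object *obj,
                                  bool want_fenced_view)
{
        if (want_fenced_view)
                return i915_gem_object_get_fence(obj);

        return i915_gem_object_put_fence(obj);
}
</pre><p>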
321</p></div><div class="sect3"><div class="titlepage"><div><div><h4 class="title"><a name="id-1.4.3.5.7.11"></a>Hardware Tiling and Swizzling Details</h4></div></div></div><p>
322   </p><p>
323   The idea behind tiling is to increase cache hit rates by rearranging
324   pixel data so that a group of pixel accesses are in the same cacheline.
325   Performance improvement from doing this on the back/depth buffer are on
326   the order of 30%.
327   </p><p>
328   Intel architectures make this somewhat more complicated, though, by
329   adjustments made to addressing of data when the memory is in interleaved
330   mode (matched pairs of DIMMS) to improve memory bandwidth.
331   For interleaved memory, the CPU sends every sequential 64 bytes
332   to an alternate memory channel so it can get the bandwidth from both.
333   </p><p>
334   The GPU also rearranges its accesses for increased bandwidth to interleaved
335   memory, and it matches what the CPU does for non-tiled.  However, when tiled
336   it does it a little differently, since one walks addresses not just in the
337   X direction but also Y.  So, along with alternating channels when bit
338   6 of the address flips, it also alternates when other bits flip --  Bits 9
339   (every 512 bytes, an X tile scanline) and 10 (every two X tile scanlines)
340   are common to both the 915 and 965-class hardware.
341   </p><p>
342   The CPU also sometimes XORs in higher bits as well, to improve
343   bandwidth doing strided access like we do so frequently in graphics.  This
344   is called <span class="quote">“<span class="quote">Channel XOR Randomization</span>”</span> in the MCH documentation.  The result
345   is that the CPU is XORing in either bit 11 or bit 17 to bit 6 of its address
346   decode.
347   </p><p>
348   All of this bit 6 XORing has an effect on our memory management,
349   as we need to make sure that the 3d driver can correctly address object
350   contents.
351   </p><p>
352   If we don't have interleaved memory, all tiling is safe and no swizzling is
353   required.
354   </p><p>
355   When bit 17 is XORed in, we simply refuse to tile at all.  Bit
   17 is not just a page offset, so as we page an object out and back in,
357   individual pages in it will have different bit 17 addresses, resulting in
358   each 64 bytes being swapped with its neighbor!
359   </p><p>
   Otherwise, if interleaved, we have to tell the 3d driver what address
   swizzling it needs to do, since it's writing with the CPU to the pages
362   (bit 6 and potentially bit 11 XORed in), and the GPU is reading from the
363   pages (bit 6, 9, and 10 XORed in), resulting in a cumulative bit swizzling
364   required by the CPU of XORing in bit 6, 9, 10, and potentially 11, in order
365   to match what the GPU expects.
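   </p><p>
   Concretely, for the common bit 9/10 case the CPU-side fixup boils down to
   XORing those bits into bit 6 of the linear byte offset before touching the
   mapping. A small userspace-style sketch (the helper name is made up):
   </p><pre class="programlisting">
#include &lt;stdint.h&gt;

/*
 * Swizzle a linear byte offset for CPU access when the kernel reports
 * bit 6 being XORed with bits 9 and 10 (and, on some parts, bit 11).
 */
static uint64_t swizzle_offset_9_10(uint64_t offset, int also_bit11)
{
        uint64_t xor_bit = (offset &gt;&gt; 9) ^ (offset &gt;&gt; 10);

        if (also_bit11)
                xor_bit ^= offset &gt;&gt; 11;

        return offset ^ ((xor_bit &amp; 1) &lt;&lt; 6);
}
</pre><p>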
366</p></div></div><div class="sect2"><div class="titlepage"><div><div><h3 class="title"><a name="id-1.4.3.5.8"></a>Object Tiling IOCTLs</h3></div></div></div><div class="toc"><dl class="toc"><dt><span class="refentrytitle"><a href="API-i915-gem-set-tiling.html"><span class="phrase">i915_gem_set_tiling</span></a></span><span class="refpurpose"> — 
367  IOCTL handler to set tiling mode
368 </span></dt><dt><span class="refentrytitle"><a href="API-i915-gem-get-tiling.html"><span class="phrase">i915_gem_get_tiling</span></a></span><span class="refpurpose"> — 
369     IOCTL handler to get tiling mode
370 </span></dt></dl></div><p>
371   </p><p>
   <code class="function"><a class="link" href="API-i915-gem-set-tiling.html" title="i915_gem_set_tiling">i915_gem_set_tiling</a></code> and <code class="function"><a class="link" href="API-i915-gem-get-tiling.html" title="i915_gem_get_tiling">i915_gem_get_tiling</a></code> are the userspace interface to
373   declare fence register requirements.
374   </p><p>
375   In principle GEM doesn't care at all about the internal data layout of an
   object, and hence it also doesn't care about tiling or swizzling. There are two
377   exceptions:
378   </p><p>
379   - For X and Y tiling the hardware provides detilers for CPU access, so called
   fences. Since there's only a limited number of them the kernel must manage
381   these, and therefore userspace must tell the kernel the object tiling if it
382   wants to use fences for detiling.
   - Gen3 and gen4 platforms have a swizzling pattern for tiled objects which
   depends upon the physical page frame number. When swapping such objects the
   page frame number might change and the kernel must be able to fix this up,
   and hence also the tiling. Note that on a subset of platforms with
387   asymmetric memory channel population the swizzling pattern changes in an
388   unknown way, and for those the kernel simply forbids swapping completely.
389   </p><p>
   Since neither of these applies to the new tiling layouts on modern platforms,
   like W, Ys and Yf tiling, GEM only allows object tiling to be set to X or Y tiled.
   Anything else can be handled in userspace entirely without the kernel's
   involvement.
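   </p><p>
   From userspace the declaration itself is a plain ioctl. A minimal example
   using libdrm (include paths and the 4096 byte stride are illustrative; error
   handling is kept terse):
   </p><pre class="programlisting">
#include &lt;errno.h&gt;
#include &lt;stdint.h&gt;
#include &lt;string.h&gt;
#include &lt;xf86drm.h&gt;
#include &lt;drm/i915_drm.h&gt;

/* Declare an existing GEM object as X-tiled with a 4096 byte stride so
 * the kernel can manage fence registers for it. */
static int declare_x_tiling(int drm_fd, uint32_t gem_handle)
{
        struct drm_i915_gem_set_tiling arg;

        memset(&amp;arg, 0, sizeof(arg));
        arg.handle = gem_handle;
        arg.tiling_mode = I915_TILING_X;
        arg.stride = 4096;

        if (drmIoctl(drm_fd, DRM_IOCTL_I915_GEM_SET_TILING, &amp;arg))
                return -errno;

        /* arg.swizzle_mode now tells userspace which bit 6 swizzling the
         * CPU must apply when accessing the object through a mapping. */
        return 0;
}
</pre><p>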
394</p></div><div class="sect2"><div class="titlepage"><div><div><h3 class="title"><a name="id-1.4.3.5.9"></a>Buffer Object Eviction</h3></div></div></div><div class="toc"><dl class="toc"><dt><span class="refentrytitle"><a href="API-i915-gem-evict-something.html"><span class="phrase">i915_gem_evict_something</span></a></span><span class="refpurpose"> — 
395  Evict vmas to make room for binding a new one
396 </span></dt><dt><span class="refentrytitle"><a href="API-i915-gem-evict-vm.html"><span class="phrase">i915_gem_evict_vm</span></a></span><span class="refpurpose"> — 
397     Evict all idle vmas from a vm
398 </span></dt></dl></div><p>
399	  This section documents the interface functions for evicting buffer
400	  objects to make space available in the virtual gpu address spaces.
	  Note that this is mostly orthogonal to shrinking buffer object
	  caches, which has the goal of making main memory (shared with the gpu
	  through the unified memory architecture) available.
404	</p></div><div class="sect2"><div class="titlepage"><div><div><h3 class="title"><a name="id-1.4.3.5.10"></a>Buffer Object Memory Shrinking</h3></div></div></div><div class="toc"><dl class="toc"><dt><span class="refentrytitle"><a href="API-i915-gem-shrink.html"><span class="phrase">i915_gem_shrink</span></a></span><span class="refpurpose"> — 
405  Shrink buffer object caches
406 </span></dt><dt><span class="refentrytitle"><a href="API-i915-gem-shrink-all.html"><span class="phrase">i915_gem_shrink_all</span></a></span><span class="refpurpose"> — 
407     Shrink buffer object caches completely
408 </span></dt><dt><span class="refentrytitle"><a href="API-i915-gem-shrinker-init.html"><span class="phrase">i915_gem_shrinker_init</span></a></span><span class="refpurpose"> — 
409     Initialize i915 shrinker
410 </span></dt></dl></div><p>
411	  This section documents the interface function for shrinking memory
412	  usage of buffer object caches. Shrinking is used to make main memory
	  available.  Note that this is mostly orthogonal to evicting buffer
	  objects, which has the goal of making space available in gpu virtual
	  address spaces.
416	</p></div></div><div class="navfooter"><hr><table width="100%" summary="Navigation footer"><tr><td width="40%" align="left"><a accesskey="p" href="API-intel-csr-ucode-fini.html">Prev</a> </td><td width="20%" align="center"><a accesskey="u" href="drmI915.html">Up</a></td><td width="40%" align="right"> <a accesskey="n" href="API-i915-cmd-parser-init-ring.html">Next</a></td></tr><tr><td width="40%" align="left" valign="top"><span class="phrase">intel_csr_ucode_fini</span> </td><td width="20%" align="center"><a accesskey="h" href="index.html">Home</a></td><td width="40%" align="right" valign="top"> <span class="phrase">i915_cmd_parser_init_ring</span></td></tr></table></div></body></html>