1<html><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><title>Memory Management and Command Submission</title><meta name="generator" content="DocBook XSL Stylesheets V1.78.1"><link rel="home" href="index.html" title="Linux GPU Driver Developer's Guide"><link rel="up" href="drmI915.html" title="Chapter 4. drm/i915 Intel GFX Driver"><link rel="prev" href="API-intel-csr-ucode-fini.html" title="intel_csr_ucode_fini"><link rel="next" href="API-i915-cmd-parser-init-ring.html" title="i915_cmd_parser_init_ring"></head><body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF"><div class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="3" align="center">Memory Management and Command Submission</th></tr><tr><td width="20%" align="left"><a accesskey="p" href="API-intel-csr-ucode-fini.html">Prev</a> </td><th width="60%" align="center">Chapter 4. drm/i915 Intel GFX Driver</th><td width="20%" align="right"> <a accesskey="n" href="API-i915-cmd-parser-init-ring.html">Next</a></td></tr></table><hr></div><div class="sect1"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="id-1.4.3.5"></a>Memory Management and Command Submission</h2></div></div></div><div class="toc"><dl class="toc"><dt><span class="sect2"><a href="ch04s03.html#id-1.4.3.5.3">Batchbuffer Parsing</a></span></dt><dt><span class="sect2"><a href="ch04s03.html#id-1.4.3.5.4">Batchbuffer Pools</a></span></dt><dt><span class="sect2"><a href="ch04s03.html#id-1.4.3.5.5">Logical Rings, Logical Ring Contexts and Execlists</a></span></dt><dt><span class="sect2"><a href="ch04s03.html#id-1.4.3.5.6">Global GTT views</a></span></dt><dt><span class="sect2"><a href="ch04s03.html#id-1.4.3.5.7">GTT Fences and Swizzling</a></span></dt><dt><span class="sect2"><a href="ch04s03.html#id-1.4.3.5.8">Object Tiling IOCTLs</a></span></dt><dt><span class="sect2"><a href="ch04s03.html#id-1.4.3.5.9">Buffer Object Eviction</a></span></dt><dt><span class="sect2"><a href="ch04s03.html#id-1.4.3.5.10">Buffer Object Memory Shrinking</a></span></dt></dl></div><p>
	This section covers all things related to the GEM implementation in the
3	i915 driver.
4      </p><div class="sect2"><div class="titlepage"><div><div><h3 class="title"><a name="id-1.4.3.5.3"></a>Batchbuffer Parsing</h3></div></div></div><div class="toc"><dl class="toc"><dt><span class="refentrytitle"><a href="API-i915-cmd-parser-init-ring.html"><span class="phrase">i915_cmd_parser_init_ring</span></a></span><span class="refpurpose"> — 
5  set cmd parser related fields for a ringbuffer
6 </span></dt><dt><span class="refentrytitle"><a href="API-i915-cmd-parser-fini-ring.html"><span class="phrase">i915_cmd_parser_fini_ring</span></a></span><span class="refpurpose"> — 
7     clean up cmd parser related fields
8 </span></dt><dt><span class="refentrytitle"><a href="API-i915-needs-cmd-parser.html"><span class="phrase">i915_needs_cmd_parser</span></a></span><span class="refpurpose"> — 
9     should a given ring use software command parsing?
10 </span></dt><dt><span class="refentrytitle"><a href="API-i915-parse-cmds.html"><span class="phrase">i915_parse_cmds</span></a></span><span class="refpurpose"> — 
11     parse a submitted batch buffer for privilege violations
12 </span></dt><dt><span class="refentrytitle"><a href="API-i915-cmd-parser-get-version.html"><span class="phrase">i915_cmd_parser_get_version</span></a></span><span class="refpurpose"> — 
13     get the cmd parser version number
14 </span></dt></dl></div><p>
15   </p><p>
16   Motivation:
17   Certain OpenGL features (e.g. transform feedback, performance monitoring)
18   require userspace code to submit batches containing commands such as
19   MI_LOAD_REGISTER_IMM to access various registers. Unfortunately, some
20   generations of the hardware will noop these commands in <span class="quote">“<span class="quote">unsecure</span>”</span> batches
21   (which includes all userspace batches submitted via i915) even though the
22   commands may be safe and represent the intended programming model of the
23   device.
24   </p><p>
25   The software command parser is similar in operation to the command parsing
26   done in hardware for unsecure batches. However, the software parser allows
27   some operations that would be noop'd by hardware, if the parser determines
28   the operation is safe, and submits the batch as <span class="quote">“<span class="quote">secure</span>”</span> to prevent hardware
29   parsing.
30   </p><p>
31   Threats:
32   At a high level, the hardware (and software) checks attempt to prevent
33   granting userspace undue privileges. There are three categories of privilege.
34   </p><p>
35   First, commands which are explicitly defined as privileged or which should
36   only be used by the kernel driver. The parser generally rejects such
37   commands, though it may allow some from the drm master process.
38   </p><p>
39   Second, commands which access registers. To support correct/enhanced
40   userspace functionality, particularly certain OpenGL extensions, the parser
41   provides a whitelist of registers which userspace may safely access (for both
42   normal and drm master processes).
43   </p><p>
44   Third, commands which access privileged memory (i.e. GGTT, HWS page, etc).
45   The parser always rejects such commands.
46   </p><p>
47   The majority of the problematic commands fall in the MI_* range, with only a
48   few specific commands on each ring (e.g. PIPE_CONTROL and MI_FLUSH_DW).
49   </p><p>
50   Implementation:
51   Each ring maintains tables of commands and registers which the parser uses in
52   scanning batch buffers submitted to that ring.
53   </p><p>
54   Since the set of commands that the parser must check for is significantly
55   smaller than the number of commands supported, the parser tables contain only
56   those commands required by the parser. This generally works because command
57   opcode ranges have standard command length encodings. So for commands that
58   the parser does not need to check, it can easily skip them. This is
59   implemented via a per-ring length decoding vfunc.
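   </p><p>
   As a rough illustration of that skip path, consider the self-contained C
   sketch below. It is not the driver's actual vfunc or field layout; the
   client and length fields shown are hypothetical:
   </p><pre class="programlisting">
#include &lt;stdint.h&gt;

/*
 * Hypothetical length decoding: for most opcode ranges the command length
 * is encoded in the low bits of the first dword, so commands the parser
 * has no table entry for can simply be stepped over.
 */
static uint32_t decode_cmd_length(uint32_t header)
{
        uint32_t client = header &gt;&gt; 29;         /* assumed client field */

        if (client == 0x0)                      /* MI-style: bits 5:0 */
                return (header &amp; 0x3f) + 2;
        return (header &amp; 0xff) + 2;             /* 3D/media-style: bits 7:0 */
}

/* Skip a command the parser does not need to check. */
static const uint32_t *skip_cmd(const uint32_t *cmd)
{
        return cmd + decode_cmd_length(*cmd);
}
</pre><p>
   A real ring's encodings differ per command client; the point is only that
   length decoding lets the parser walk past commands it does not care about.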
60   </p><p>
61   Unfortunately, there are a number of commands that do not follow the standard
62   length encoding for their opcode range, primarily amongst the MI_* commands.
63   To handle this, the parser provides a way to define explicit <span class="quote">“<span class="quote">skip</span>”</span> entries
64   in the per-ring command tables.
65   </p><p>
   Other command table entries map fairly directly to the high level categories
67   mentioned above: rejected, master-only, register whitelist. The parser
68   implements a number of checks, including the privileged memory checks, via a
69   general bitmasking mechanism.
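   </p><p>
   A much simplified sketch of what such a table entry and the bitmask check
   could look like is given below; the struct and field names are invented for
   illustration and are not the driver's real descriptors:
   </p><pre class="programlisting">
#include &lt;stdbool.h&gt;
#include &lt;stdint.h&gt;

/* Hypothetical, heavily trimmed command descriptor. */
struct cmd_desc {
        uint32_t opcode;        /* canonical opcode bits */
        uint32_t opcode_mask;   /* header bits that select the opcode */
        bool     rejected;      /* never allowed from userspace */
        bool     master_only;   /* allowed only from the drm master */
        bool     check_regs;    /* operands are registers: apply whitelist */
        /* generic bitmask check, e.g. for privileged memory flags */
        uint32_t bit_word;      /* which dword of the command to test */
        uint32_t bit_mask;
        uint32_t bit_expected;
};

static bool cmd_passes_bitmask_check(const struct cmd_desc *desc,
                                     const uint32_t *cmd)
{
        if (!desc-&gt;bit_mask)
                return true;    /* no bitmask check for this command */
        return (cmd[desc-&gt;bit_word] &amp; desc-&gt;bit_mask) == desc-&gt;bit_expected;
}
</pre><p>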
70</p></div><div class="sect2"><div class="titlepage"><div><div><h3 class="title"><a name="id-1.4.3.5.4"></a>Batchbuffer Pools</h3></div></div></div><div class="toc"><dl class="toc"><dt><span class="refentrytitle"><a href="API-i915-gem-batch-pool-init.html"><span class="phrase">i915_gem_batch_pool_init</span></a></span><span class="refpurpose"> — 
71  initialize a batch buffer pool
72 </span></dt><dt><span class="refentrytitle"><a href="API-i915-gem-batch-pool-fini.html"><span class="phrase">i915_gem_batch_pool_fini</span></a></span><span class="refpurpose"> — 
73     clean up a batch buffer pool
74 </span></dt><dt><span class="refentrytitle"><a href="API-i915-gem-batch-pool-get.html"><span class="phrase">i915_gem_batch_pool_get</span></a></span><span class="refpurpose"> — 
75     allocate a buffer from the pool
76 </span></dt></dl></div><p>
77   </p><p>
78   In order to submit batch buffers as 'secure', the software command parser
79   must ensure that a batch buffer cannot be modified after parsing. It does
80   this by copying the user provided batch buffer contents to a kernel owned
81   buffer from which the hardware will actually execute, and by carefully
82   managing the address space bindings for such buffers.
83   </p><p>
84   The batch pool framework provides a mechanism for the driver to manage a
85   set of scratch buffers to use for this purpose. The framework can be
   extended to support other use cases should they arise.
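   </p><p>
   The underlying idea can be sketched in a few lines of plain C. The names
   below (scratch_pool, scratch_pool_get) are made up for illustration and are
   not the i915_gem_batch_pool API itself:
   </p><pre class="programlisting">
#include &lt;stddef.h&gt;
#include &lt;stdlib.h&gt;

/* Hypothetical pool of reusable kernel-owned scratch buffers. */
struct scratch_buf {
        struct scratch_buf *next;
        size_t size;
        void *data;
};

struct scratch_pool {
        struct scratch_buf *free_list;
};

/* Grab a buffer of at least @len bytes, reusing a free one if possible. */
static struct scratch_buf *scratch_pool_get(struct scratch_pool *pool,
                                            size_t len)
{
        struct scratch_buf **p, *buf;

        for (p = &amp;pool-&gt;free_list; (buf = *p) != NULL; p = &amp;buf-&gt;next) {
                if (buf-&gt;size &gt;= len) {
                        *p = buf-&gt;next;         /* unlink and reuse */
                        return buf;
                }
        }

        buf = calloc(1, sizeof(*buf));          /* otherwise allocate fresh */
        if (!buf)
                return NULL;
        buf-&gt;data = malloc(len);
        if (!buf-&gt;data) {
                free(buf);
                return NULL;
        }
        buf-&gt;size = len;
        return buf;
}
</pre><p>
   The parser then copies the userspace batch into such a scratch buffer and
   hands that copy, not the original object, to the hardware.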
87</p></div><div class="sect2"><div class="titlepage"><div><div><h3 class="title"><a name="id-1.4.3.5.5"></a>Logical Rings, Logical Ring Contexts and Execlists</h3></div></div></div><div class="toc"><dl class="toc"><dt><span class="refentrytitle"><a href="API-intel-sanitize-enable-execlists.html"><span class="phrase">intel_sanitize_enable_execlists</span></a></span><span class="refpurpose"> — 
88  sanitize i915.enable_execlists
89 </span></dt><dt><span class="refentrytitle"><a href="API-intel-execlists-ctx-id.html"><span class="phrase">intel_execlists_ctx_id</span></a></span><span class="refpurpose"> — 
90     get the Execlists Context ID
91 </span></dt><dt><span class="refentrytitle"><a href="API-intel-lrc-irq-handler.html"><span class="phrase">intel_lrc_irq_handler</span></a></span><span class="refpurpose"> — 
92     handle Context Switch interrupts
93 </span></dt><dt><span class="refentrytitle"><a href="API-intel-logical-ring-begin.html"><span class="phrase">intel_logical_ring_begin</span></a></span><span class="refpurpose"> — 
94     prepare the logical ringbuffer to accept some commands
95 </span></dt><dt><span class="refentrytitle"><a href="API-intel-execlists-submission.html"><span class="phrase">intel_execlists_submission</span></a></span><span class="refpurpose"> — 
96     submit a batchbuffer for execution, Execlists style
97 </span></dt><dt><span class="refentrytitle"><a href="API-gen8-init-indirectctx-bb.html"><span class="phrase">gen8_init_indirectctx_bb</span></a></span><span class="refpurpose"> — 
98     initialize indirect ctx batch with WA
99 </span></dt><dt><span class="refentrytitle"><a href="API-gen8-init-perctx-bb.html"><span class="phrase">gen8_init_perctx_bb</span></a></span><span class="refpurpose"> — 
100     initialize per ctx batch with WA
101 </span></dt><dt><span class="refentrytitle"><a href="API-intel-logical-ring-cleanup.html"><span class="phrase">intel_logical_ring_cleanup</span></a></span><span class="refpurpose"> — 
102     deallocate the Engine Command Streamer
103 </span></dt><dt><span class="refentrytitle"><a href="API-intel-logical-rings-init.html"><span class="phrase">intel_logical_rings_init</span></a></span><span class="refpurpose"> — 
104     allocate, populate and init the Engine Command Streamers
105 </span></dt><dt><span class="refentrytitle"><a href="API-intel-lr-context-free.html"><span class="phrase">intel_lr_context_free</span></a></span><span class="refpurpose"> — 
106     free the LRC specific bits of a context
107 </span></dt><dt><span class="refentrytitle"><a href="API-intel-lr-context-deferred-alloc.html"><span class="phrase">intel_lr_context_deferred_alloc</span></a></span><span class="refpurpose"> — 
108     create the LRC specific bits of a context
109 </span></dt></dl></div><p>
110   </p><p>
111   Motivation:
112   GEN8 brings an expansion of the HW contexts: <span class="quote">“<span class="quote">Logical Ring Contexts</span>”</span>.
113   These expanded contexts enable a number of new abilities, especially
114   <span class="quote">“<span class="quote">Execlists</span>”</span> (also implemented in this file).
115   </p><p>
116   One of the main differences with the legacy HW contexts is that logical
117   ring contexts incorporate many more things to the context's state, like
118   PDPs or ringbuffer control registers:
119   </p><p>
120   The reason why PDPs are included in the context is straightforward: as
121   PPGTTs (per-process GTTs) are actually per-context, having the PDPs
   contained there means you don't need to do a ppgtt-&gt;switch_mm yourself;
   instead, the GPU will do it for you on the context switch.
124   </p><p>
   But what about the ringbuffer control registers (head, tail, etc.)?
   Shouldn't we just need one set of those per engine command streamer? This is
127   where the name <span class="quote">“<span class="quote">Logical Rings</span>”</span> starts to make sense: by virtualizing the
128   rings, the engine cs shifts to a new <span class="quote">“<span class="quote">ring buffer</span>”</span> with every context
129   switch. When you want to submit a workload to the GPU you: A) choose your
130   context, B) find its appropriate virtualized ring, C) write commands to it
131   and then, finally, D) tell the GPU to switch to that context.
132   </p><p>
133   Instead of the legacy MI_SET_CONTEXT, the way you tell the GPU to switch
   to a context is via a context execution list, ergo <span class="quote">“<span class="quote">Execlists</span>”</span>.
135   </p><p>
136   LRC implementation:
137   Regarding the creation of contexts, we have:
138   </p><p>
139   - One global default context.
140   - One local default context for each opened fd.
141   - One local extra context for each context create ioctl call.
142   </p><p>
143   Now that ringbuffers belong per-context (and not per-engine, like before)
144   and that contexts are uniquely tied to a given engine (and not reusable,
145   like before) we need:
146   </p><p>
147   - One ringbuffer per-engine inside each context.
148   - One backing object per-engine inside each context.
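   </p><p>
   The per-engine items just listed can be pictured, very roughly, like this
   (the struct, field and constant names here are invented for illustration and
   do not match the driver's struct intel_context exactly):
   </p><pre class="programlisting">
struct backing_object;  /* stand-in for the per-engine GEM backing object */
struct ringbuffer;      /* stand-in for the logical ringbuffer state */

#define NUM_ENGINES 5   /* render, blitter, video, etc. (assumed count) */

struct lr_context {
        /* ... generic per-context state (ID, file, PPGTT, ...) ... */
        struct {
                struct backing_object *state;   /* backing object, per engine */
                struct ringbuffer *ringbuf;     /* ringbuffer, per engine */
        } engine[NUM_ENGINES];
};
</pre><p>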
149   </p><p>
150   The global default context starts its life with these new objects fully
151   allocated and populated. The local default context for each opened fd is
152   more complex, because we don't know at creation time which engine is going
153   to use them. To handle this, we have implemented a deferred creation of LR
154   contexts:
155   </p><p>
   The local context starts its life as a hollow or blank holder that only
157   gets populated for a given engine once we receive an execbuffer. If later
158   on we receive another execbuffer ioctl for the same context but a different
159   engine, we allocate/populate a new ringbuffer and context backing object and
160   so on.
161   </p><p>
162   Finally, regarding local contexts created using the ioctl call: as they are
163   only allowed with the render ring, we can allocate &amp; populate them right
164   away (no need to defer anything, at least for now).
165   </p><p>
166   Execlists implementation:
167   Execlists are the new method by which, on gen8+ hardware, workloads are
168   submitted for execution (as opposed to the legacy, ringbuffer-based, method).
169   This method works as follows:
170   </p><p>
171   When a request is committed, its commands (the BB start and any leading or
172   trailing commands, like the seqno breadcrumbs) are placed in the ringbuffer
173   for the appropriate context. The tail pointer in the hardware context is not
174   updated at this time, but instead, kept by the driver in the ringbuffer
175   structure. A structure representing this request is added to a request queue
176   for the appropriate engine: this structure contains a copy of the context's
177   tail after the request was written to the ring buffer and a pointer to the
178   context itself.
179   </p><p>
180   If the engine's request queue was empty before the request was added, the
181   queue is processed immediately. Otherwise the queue will be processed during
182   a context switch interrupt. In any case, elements on the queue will get sent
183   (in pairs) to the GPU's ExecLists Submit Port (ELSP, for short) with a
   globally unique 20-bit submission ID.
185   </p><p>
186   When execution of a request completes, the GPU updates the context status
187   buffer with a context complete event and generates a context switch interrupt.
188   During the interrupt handling, the driver examines the events in the buffer:
189   for each context complete event, if the announced ID matches that on the head
190   of the request queue, then that request is retired and removed from the queue.
191   </p><p>
192   After processing, if any requests were retired and the queue is not empty
193   then a new execution list can be submitted. The two requests at the front of
194   the queue are next to be submitted but since a context may not occur twice in
195   an execution list, if subsequent requests have the same ID as the first then
196   the two requests must be combined. This is done simply by discarding requests
   at the head of the queue until either only one request is left (in which case
198   we use a NULL second context) or the first two requests have unique IDs.
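   </p><p>
   A hedged sketch of that coalescing step, in illustrative C rather than the
   driver's actual execlist code (all names here are invented):
   </p><pre class="programlisting">
#include &lt;stddef.h&gt;
#include &lt;stdint.h&gt;

struct lr_context;                      /* opaque context handle */

/* Hypothetical queued request: the ring tail after it was written, plus a
 * pointer to its context. */
struct pending_request {
        struct pending_request *next;
        struct lr_context *ctx;
        uint32_t tail;
};

/* Stub: the real driver would write both context descriptors and tails,
 * tagged with globally unique submission IDs, to the ELSP here. */
static void elsp_submit_pair(struct lr_context *ctx0, uint32_t tail0,
                             struct lr_context *ctx1, uint32_t tail1)
{
        (void)ctx0; (void)tail0; (void)ctx1; (void)tail1;
}

/*
 * Pick the next execution list. Consecutive head requests for the same
 * context are coalesced (only the newest tail matters) because a context
 * may not appear twice in one execution list.
 */
static void submit_next_pair(struct pending_request **queue)
{
        struct pending_request *first = *queue, *second;

        if (!first)
                return;

        while (first-&gt;next &amp;&amp; first-&gt;next-&gt;ctx == first-&gt;ctx) {
                first-&gt;tail = first-&gt;next-&gt;tail;
                /* duplicate dropped; the real driver retires and frees it */
                first-&gt;next = first-&gt;next-&gt;next;
        }

        second = first-&gt;next;
        elsp_submit_pair(first-&gt;ctx, first-&gt;tail,
                         second ? second-&gt;ctx : NULL,
                         second ? second-&gt;tail : 0);
}
</pre><p>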
199   </p><p>
200   By always executing the first two requests in the queue the driver ensures
201   that the GPU is kept as busy as possible. In the case where a single context
202   completes but a second context is still executing, the request for this second
203   context will be at the head of the queue when we remove the first one. This
204   request will then be resubmitted along with a new request for a different context,
205   which will cause the hardware to continue executing the second request and queue
206   the new request (the GPU detects the condition of a context getting preempted
207   with the same context and optimizes the context switch flow by not doing
208   preemption, but just sampling the new tail pointer).
209   </p><p>
210</p></div><div class="sect2"><div class="titlepage"><div><div><h3 class="title"><a name="id-1.4.3.5.6"></a>Global GTT views</h3></div></div></div><div class="toc"><dl class="toc"><dt><span class="refentrytitle"><a href="API-gen8-ppgtt-alloc-pagetabs.html"><span class="phrase">gen8_ppgtt_alloc_pagetabs</span></a></span><span class="refpurpose"> — 
211  Allocate page tables for VA range.
212 </span></dt><dt><span class="refentrytitle"><a href="API-gen8-ppgtt-alloc-page-directories.html"><span class="phrase">gen8_ppgtt_alloc_page_directories</span></a></span><span class="refpurpose"> — 
213     Allocate page directories for VA range.
214 </span></dt><dt><span class="refentrytitle"><a href="API-gen8-ppgtt-alloc-page-dirpointers.html"><span class="phrase">gen8_ppgtt_alloc_page_dirpointers</span></a></span><span class="refpurpose"> — 
215     Allocate pdps for VA range.
216 </span></dt><dt><span class="refentrytitle"><a href="API-i915-vma-bind.html"><span class="phrase">i915_vma_bind</span></a></span><span class="refpurpose"> — 
     Sets up PTEs for a VMA in its corresponding address space.
218 </span></dt><dt><span class="refentrytitle"><a href="API-i915-ggtt-view-size.html"><span class="phrase">i915_ggtt_view_size</span></a></span><span class="refpurpose"> — 
219     Get the size of a GGTT view.
220 </span></dt></dl></div><p>
221   </p><p>
222   Background and previous state
223   </p><p>
   Historically objects could exist (be bound) in global GTT space only as
225   singular instances with a view representing all of the object's backing pages
226   in a linear fashion. This view will be called a normal view.
227   </p><p>
228   To support multiple views of the same object, where the number of mapped
229   pages is not equal to the backing store, or where the layout of the pages
   is not linear, the concept of a GGTT view was added.
231   </p><p>
232   One example of an alternative view is a stereo display driven by a single
233   image. In this case we would have a framebuffer looking like this
234   (2x2 pages):
   </p><pre class="literallayout">
   12
   34
   </pre><p>
239   Above would represent a normal GGTT view as normally mapped for GPU or CPU
240   rendering. In contrast, fed to the display engine would be an alternative
241   view which could look something like this:
   </p><pre class="literallayout">
   1212
   3434
   </pre><p>
   In this example both the size and layout of pages in the alternative view are
247   different from the normal view.
248   </p><p>
249   Implementation and usage
250   </p><p>
251   GGTT views are implemented using VMAs and are distinguished via enum
252   i915_ggtt_view_type and struct i915_ggtt_view.
253   </p><p>
   A new flavour of core GEM functions which work with GGTT bound objects was
   added with the _ggtt_ infix, and sometimes with a _view postfix, to avoid
   renaming large amounts of code. They take a struct i915_ggtt_view
257   parameter encapsulating all metadata required to implement a view.
258   </p><p>
259   As a helper for callers which are only interested in the normal view,
   a globally const i915_ggtt_view_normal singleton instance exists. All old core
   GEM API functions, the ones not taking the view parameter, operate on,
   or with, the normal GGTT view.
263   </p><p>
264   Code wanting to add or use a new GGTT view needs to:
265   </p><p>
266   1. Add a new enum with a suitable name.
267   2. Extend the metadata in the i915_ggtt_view structure if required.
268   3. Add support to <code class="function">i915_get_vma_pages</code>.
269   </p><p>
270   New views are required to build a scatter-gather table from within the
271   i915_get_vma_pages function. This table is stored in the vma.ggtt_view and
   exists for the lifetime of a VMA.
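   </p><p>
   As a hedged example, a helper for a new view type, building the remapped
   scatter-gather table for something like the stereo layout shown earlier,
   might look as follows (hypothetical helper, error handling trimmed; it is
   not i915_get_vma_pages itself):
   </p><pre class="programlisting">
#include &lt;linux/err.h&gt;
#include &lt;linux/kernel.h&gt;
#include &lt;linux/mm.h&gt;
#include &lt;linux/scatterlist.h&gt;
#include &lt;linux/slab.h&gt;

/*
 * Build an sg_table mapping the object's four backing pages in the
 * remapped 1212/3434 order of the stereo example above.  @pages are the
 * object's backing pages in their normal (1234) order.
 */
static struct sg_table *build_stereo_view_pages(struct page **pages)
{
        static const unsigned int order[] = { 0, 1, 0, 1, 2, 3, 2, 3 };
        struct sg_table *st;
        struct scatterlist *sg;
        unsigned int i;

        st = kmalloc(sizeof(*st), GFP_KERNEL);
        if (!st)
                return ERR_PTR(-ENOMEM);

        if (sg_alloc_table(st, ARRAY_SIZE(order), GFP_KERNEL)) {
                kfree(st);
                return ERR_PTR(-ENOMEM);
        }

        for_each_sg(st-&gt;sgl, sg, ARRAY_SIZE(order), i)
                sg_set_page(sg, pages[order[i]], PAGE_SIZE, 0);

        return st;
}
</pre><p>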
273   </p><p>
   The core API is designed to have copy semantics, which means that the passed-in
   struct i915_ggtt_view does not need to be persistent (left around after
276   calling the core API functions).
277   </p><p>
278</p></div><div class="sect2"><div class="titlepage"><div><div><h3 class="title"><a name="id-1.4.3.5.7"></a>GTT Fences and Swizzling</h3></div></div></div><div class="toc"><dl class="toc"><dt><span class="refentrytitle"><a href="API-i915-gem-object-put-fence.html"><span class="phrase">i915_gem_object_put_fence</span></a></span><span class="refpurpose"> — 
279  force-remove fence for an object
280 </span></dt><dt><span class="refentrytitle"><a href="API-i915-gem-object-get-fence.html"><span class="phrase">i915_gem_object_get_fence</span></a></span><span class="refpurpose"> — 
281     set up fencing for an object
282 </span></dt><dt><span class="refentrytitle"><a href="API-i915-gem-object-pin-fence.html"><span class="phrase">i915_gem_object_pin_fence</span></a></span><span class="refpurpose"> — 
283     pin fencing state
284 </span></dt><dt><span class="refentrytitle"><a href="API-i915-gem-object-unpin-fence.html"><span class="phrase">i915_gem_object_unpin_fence</span></a></span><span class="refpurpose"> — 
285     unpin fencing state
286 </span></dt><dt><span class="refentrytitle"><a href="API-i915-gem-restore-fences.html"><span class="phrase">i915_gem_restore_fences</span></a></span><span class="refpurpose"> — 
287     restore fence state
288 </span></dt><dt><span class="refentrytitle"><a href="API-i915-gem-detect-bit-6-swizzle.html"><span class="phrase">i915_gem_detect_bit_6_swizzle</span></a></span><span class="refpurpose"> — 
289     detect bit 6 swizzling pattern
290 </span></dt><dt><span class="refentrytitle"><a href="API-i915-gem-object-do-bit-17-swizzle.html"><span class="phrase">i915_gem_object_do_bit_17_swizzle</span></a></span><span class="refpurpose"> — 
291     fixup bit 17 swizzling
292 </span></dt><dt><span class="refentrytitle"><a href="API-i915-gem-object-save-bit-17-swizzle.html"><span class="phrase">i915_gem_object_save_bit_17_swizzle</span></a></span><span class="refpurpose"> — 
293     save bit 17 swizzling
294 </span></dt><dt><span class="sect3"><a href="ch04s03.html#id-1.4.3.5.7.10">Global GTT Fence Handling</a></span></dt><dt><span class="sect3"><a href="ch04s03.html#id-1.4.3.5.7.11">Hardware Tiling and Swizzling Details</a></span></dt></dl></div><div class="sect3"><div class="titlepage"><div><div><h4 class="title"><a name="id-1.4.3.5.7.10"></a>Global GTT Fence Handling</h4></div></div></div><p>
295   </p><p>
   Important to avoid confusion: <span class="quote">“<span class="quote">fences</span>”</span> in the i915 driver are not execution
297   fences used to track command completion but hardware detiler objects which
298   wrap a given range of the global GTT. Each platform has only a fairly limited
299   set of these objects.
300   </p><p>
301   Fences are used to detile GTT memory mappings. They're also connected to the
   hardware frontbuffer render tracking and hence interact with frontbuffer
   compression. Furthermore, on older platforms fences are required for tiled
304   objects used by the display engine. They can also be used by the render
305   engine - they're required for blitter commands and are optional for render
306   commands. But on gen4+ both display (with the exception of fbc) and rendering
307   have their own tiling state bits and don't need fences.
308   </p><p>
309   Also note that fences only support X and Y tiling and hence can't be used for
310   the fancier new tiling formats like W, Ys and Yf.
311   </p><p>
312   Finally note that because fences are such a restricted resource they're
313   dynamically associated with objects. Furthermore fence state is committed to
   the hardware lazily to avoid unnecessary stalls on gen2/3. Therefore code must
   explicitly call <code class="function"><a class="link" href="API-i915-gem-object-get-fence.html" title="i915_gem_object_get_fence">i915_gem_object_get_fence</a></code> to synchronize fencing status
   for cpu access. Also note that some code wants an unfenced view; for those
317   cases the fence can be removed forcefully with <code class="function"><a class="link" href="API-i915-gem-object-put-fence.html" title="i915_gem_object_put_fence">i915_gem_object_put_fence</a></code>.
318   </p><p>
319   Internally these functions will synchronize with userspace access by removing
320   CPU ptes into GTT mmaps (not the GTT ptes themselves) as needed.
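   </p><p>
   In rough form the expected calling pattern looks like the sketch below. The
   wrapper function itself is assumed for illustration, not taken from the
   driver, and struct_mutex is assumed to be held by the caller:
   </p><pre class="programlisting">
#include "i915_drv.h"   /* driver-internal header, in-tree code only */

/*
 * Make sure CPU access through a GTT mmap of a tiled object goes through
 * a fence (detiler), or force-drop the fence when an unfenced view of the
 * raw pages is wanted instead.
 */
static int prepare_gtt_cpu_access(struct drm_i915_gem_object *obj,
                                  bool want_fenced_view)
{
        if (want_fenced_view)
                return i915_gem_object_get_fence(obj);

        return i915_gem_object_put_fence(obj);
}
</pre><p>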
321</p></div><div class="sect3"><div class="titlepage"><div><div><h4 class="title"><a name="id-1.4.3.5.7.11"></a>Hardware Tiling and Swizzling Details</h4></div></div></div><p>
322   </p><p>
323   The idea behind tiling is to increase cache hit rates by rearranging
324   pixel data so that a group of pixel accesses are in the same cacheline.
325   Performance improvement from doing this on the back/depth buffer are on
326   the order of 30%.
327   </p><p>
328   Intel architectures make this somewhat more complicated, though, by
329   adjustments made to addressing of data when the memory is in interleaved
330   mode (matched pairs of DIMMS) to improve memory bandwidth.
331   For interleaved memory, the CPU sends every sequential 64 bytes
332   to an alternate memory channel so it can get the bandwidth from both.
333   </p><p>
334   The GPU also rearranges its accesses for increased bandwidth to interleaved
335   memory, and it matches what the CPU does for non-tiled.  However, when tiled
336   it does it a little differently, since one walks addresses not just in the
337   X direction but also Y.  So, along with alternating channels when bit
338   6 of the address flips, it also alternates when other bits flip --  Bits 9
339   (every 512 bytes, an X tile scanline) and 10 (every two X tile scanlines)
340   are common to both the 915 and 965-class hardware.
341   </p><p>
342   The CPU also sometimes XORs in higher bits as well, to improve
343   bandwidth doing strided access like we do so frequently in graphics.  This
344   is called <span class="quote">“<span class="quote">Channel XOR Randomization</span>”</span> in the MCH documentation.  The result
345   is that the CPU is XORing in either bit 11 or bit 17 to bit 6 of its address
346   decode.
347   </p><p>
348   All of this bit 6 XORing has an effect on our memory management,
349   as we need to make sure that the 3d driver can correctly address object
350   contents.
351   </p><p>
352   If we don't have interleaved memory, all tiling is safe and no swizzling is
353   required.
354   </p><p>
355   When bit 17 is XORed in, we simply refuse to tile at all.  Bit
   17 is not just a page offset, so as we page an object out and back in,
357   individual pages in it will have different bit 17 addresses, resulting in
358   each 64 bytes being swapped with its neighbor!
359   </p><p>
   Otherwise, if interleaved, we have to tell the 3d driver what address
   swizzling it needs to do, since it's writing with the CPU to the pages
362   (bit 6 and potentially bit 11 XORed in), and the GPU is reading from the
363   pages (bit 6, 9, and 10 XORed in), resulting in a cumulative bit swizzling
364   required by the CPU of XORing in bit 6, 9, 10, and potentially 11, in order
365   to match what the GPU expects.
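   </p><p>
   Concretely, for the common bit 9/10 case the CPU-side fixup boils down to
   XORing those bits into bit 6 of the linear byte offset before touching the
   mapping. A small userspace-style sketch (the helper name is made up):
   </p><pre class="programlisting">
#include &lt;stdint.h&gt;

/*
 * Swizzle a linear byte offset for CPU access when the kernel reports
 * bit 6 being XORed with bits 9 and 10 (and, on some parts, bit 11).
 */
static uint64_t swizzle_offset_9_10(uint64_t offset, int also_bit11)
{
        uint64_t xor_bit = (offset &gt;&gt; 9) ^ (offset &gt;&gt; 10);

        if (also_bit11)
                xor_bit ^= offset &gt;&gt; 11;

        return offset ^ ((xor_bit &amp; 1) &lt;&lt; 6);
}
</pre><p>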
366</p></div></div><div class="sect2"><div class="titlepage"><div><div><h3 class="title"><a name="id-1.4.3.5.8"></a>Object Tiling IOCTLs</h3></div></div></div><div class="toc"><dl class="toc"><dt><span class="refentrytitle"><a href="API-i915-gem-set-tiling.html"><span class="phrase">i915_gem_set_tiling</span></a></span><span class="refpurpose"> — 
367  IOCTL handler to set tiling mode
368 </span></dt><dt><span class="refentrytitle"><a href="API-i915-gem-get-tiling.html"><span class="phrase">i915_gem_get_tiling</span></a></span><span class="refpurpose"> — 
369     IOCTL handler to get tiling mode
370 </span></dt></dl></div><p>
371   </p><p>
   <code class="function"><a class="link" href="API-i915-gem-set-tiling.html" title="i915_gem_set_tiling">i915_gem_set_tiling</a></code> and <code class="function"><a class="link" href="API-i915-gem-get-tiling.html" title="i915_gem_get_tiling">i915_gem_get_tiling</a></code> are the userspace interface to
373   declare fence register requirements.
374   </p><p>
375   In principle GEM doesn't care at all about the internal data layout of an
   object, and hence it also doesn't care about tiling or swizzling. There are two
377   exceptions:
378   </p><p>
379   - For X and Y tiling the hardware provides detilers for CPU access, so called
   fences. Since there's only a limited number of them the kernel must manage
381   these, and therefore userspace must tell the kernel the object tiling if it
382   wants to use fences for detiling.
   - Gen3 and gen4 platforms have a swizzling pattern for tiled objects which
   depends upon the physical page frame number. When swapping such objects the
   page frame number might change and the kernel must be able to fix this up,
   and hence also the tiling. Note that on a subset of platforms with
387   asymmetric memory channel population the swizzling pattern changes in an
388   unknown way, and for those the kernel simply forbids swapping completely.
389   </p><p>
   Since neither of these applies to the new tiling layouts on modern platforms,
   like W, Ys and Yf tiling, GEM only allows object tiling to be set to X or Y tiled.
   Anything else can be handled in userspace entirely without the kernel's
   involvement.
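   </p><p>
   From userspace the declaration itself is a plain ioctl. A minimal example
   using libdrm (include paths and the 4096 byte stride are illustrative; error
   handling is kept terse):
   </p><pre class="programlisting">
#include &lt;errno.h&gt;
#include &lt;stdint.h&gt;
#include &lt;string.h&gt;
#include &lt;xf86drm.h&gt;
#include &lt;drm/i915_drm.h&gt;

/* Declare an existing GEM object as X-tiled with a 4096 byte stride so
 * the kernel can manage fence registers for it. */
static int declare_x_tiling(int drm_fd, uint32_t gem_handle)
{
        struct drm_i915_gem_set_tiling arg;

        memset(&amp;arg, 0, sizeof(arg));
        arg.handle = gem_handle;
        arg.tiling_mode = I915_TILING_X;
        arg.stride = 4096;

        if (drmIoctl(drm_fd, DRM_IOCTL_I915_GEM_SET_TILING, &amp;arg))
                return -errno;

        /* arg.swizzle_mode now tells userspace which bit 6 swizzling the
         * CPU must apply when accessing the object through a mapping. */
        return 0;
}
</pre><p>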
394</p></div><div class="sect2"><div class="titlepage"><div><div><h3 class="title"><a name="id-1.4.3.5.9"></a>Buffer Object Eviction</h3></div></div></div><div class="toc"><dl class="toc"><dt><span class="refentrytitle"><a href="API-i915-gem-evict-something.html"><span class="phrase">i915_gem_evict_something</span></a></span><span class="refpurpose"> — 
395  Evict vmas to make room for binding a new one
396 </span></dt><dt><span class="refentrytitle"><a href="API-i915-gem-evict-vm.html"><span class="phrase">i915_gem_evict_vm</span></a></span><span class="refpurpose"> — 
397     Evict all idle vmas from a vm
398 </span></dt></dl></div><p>
399	  This section documents the interface functions for evicting buffer
400	  objects to make space available in the virtual gpu address spaces.
	  Note that this is mostly orthogonal to shrinking buffer object
	  caches, which has the goal of making main memory (shared with the gpu
	  through the unified memory architecture) available.
404	</p></div><div class="sect2"><div class="titlepage"><div><div><h3 class="title"><a name="id-1.4.3.5.10"></a>Buffer Object Memory Shrinking</h3></div></div></div><div class="toc"><dl class="toc"><dt><span class="refentrytitle"><a href="API-i915-gem-shrink.html"><span class="phrase">i915_gem_shrink</span></a></span><span class="refpurpose"> — 
405  Shrink buffer object caches
406 </span></dt><dt><span class="refentrytitle"><a href="API-i915-gem-shrink-all.html"><span class="phrase">i915_gem_shrink_all</span></a></span><span class="refpurpose"> — 
407     Shrink buffer object caches completely
408 </span></dt><dt><span class="refentrytitle"><a href="API-i915-gem-shrinker-init.html"><span class="phrase">i915_gem_shrinker_init</span></a></span><span class="refpurpose"> — 
409     Initialize i915 shrinker
410 </span></dt></dl></div><p>
411	  This section documents the interface function for shrinking memory
412	  usage of buffer object caches. Shrinking is used to make main memory
	  available.  Note that this is mostly orthogonal to evicting buffer
	  objects, which has the goal of making space available in gpu virtual
	  address spaces.
416	</p></div></div><div class="navfooter"><hr><table width="100%" summary="Navigation footer"><tr><td width="40%" align="left"><a accesskey="p" href="API-intel-csr-ucode-fini.html">Prev</a> </td><td width="20%" align="center"><a accesskey="u" href="drmI915.html">Up</a></td><td width="40%" align="right"> <a accesskey="n" href="API-i915-cmd-parser-init-ring.html">Next</a></td></tr><tr><td width="40%" align="left" valign="top"><span class="phrase">intel_csr_ucode_fini</span> </td><td width="20%" align="center"><a accesskey="h" href="index.html">Home</a></td><td width="40%" align="right" valign="top"> <span class="phrase">i915_cmd_parser_init_ring</span></td></tr></table></div></body></html>