Lines Matching refs:the

6 This document describes the requirement from hardware for PCI MMIO resource
8 requirement. The first two sections describe the concepts of Partitionable
9 Endpoints and the implementation on P8 (IODA2). The next two sections talk
14 A Partitionable Endpoint (PE) is a way to group the various resources
17 to freeze a device that is causing errors in order to limit the possibility
26 captures things like the details of the error that caused the freeze etc., but
29 The interesting part is how the various PCIe transactions (MMIO, DMA, ...)
34 is a completely separate HW entity that replicates the entire logic, so has
44 memory but accessed in HW by the chip) that provides a direct
46 We call this the RTT.
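A minimal sketch of the RTT as a flat table. It assumes one entry per 16-bit requester ID (bus << 8 | devfn) holding a PE#. The names `rtt`, `make_rid`, `rtt_lookup_pe` and `rtt_assign_device` and the entry width are hypothetical, not the hardware layout or the kernel's code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical RTT model: one PE# per 16-bit requester ID.
 * Entry count, width and names are assumptions for illustration. */
#define RTT_ENTRIES 65536u

static uint16_t rtt[RTT_ENTRIES];

/* Build a requester ID from bus/device/function. */
static uint16_t make_rid(uint8_t bus, uint8_t dev, uint8_t fn)
{
	assert(dev < 32 && fn < 8);
	return (uint16_t)((bus << 8) | (dev << 3) | fn);
}

/* Inbound lookup: the PHB finds the PE# of a transaction from the
 * requester ID the transaction carries. */
static uint16_t rtt_lookup_pe(uint16_t rid)
{
	return rtt[rid];
}

/* Put every function of one device into the same PE. */
static void rtt_assign_device(uint8_t bus, uint8_t dev, uint16_t pe)
{
	for (uint8_t fn = 0; fn < 8; fn++)
		rtt[make_rid(bus, dev, fn)] = pe;
}
```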
49 contain two "windows", depending on the value of PCI address bit 59.
54 - For MSIs, we have two windows in the address space (one at the top of
55 the 32-bit space and one much higher) which, via a combination of the
56 address and MSI value, will result in one of the 2048 interrupts per
57 bridge being triggered. There's a PE# in the interrupt controller
58 descriptor table as well which is compared with the PE# obtained from
59 the RTT to "authorize" the device to emit that specific interrupt.
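The MSI authorization step can be sketched as a comparison between two PE#s. This assumes the MSI address and data have already been decoded into an interrupt number below 2048; that decoding is not modelled. `struct irq_desc`, `irq_table` and `msi_authorized` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define IRQS_PER_PHB 2048u

/* Hypothetical interrupt descriptor: only the PE# field is modelled. */
struct irq_desc {
	uint16_t pe;
};

static struct irq_desc irq_table[IRQS_PER_PHB];

/* The device may emit interrupt 'irq' only if the PE# that the RTT
 * gave its requester ID matches the PE# in the descriptor. */
static bool msi_authorized(uint32_t irq, uint16_t rtt_pe)
{
	assert(irq < IRQS_PER_PHB);
	return irq_table[irq].pe == rtt_pe;
}
```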
61 - Error messages just use the RTT.
63 * Outbound. That's where the tricky part is.
65 Like other PCI host bridges, the Power8 IODA2 PHB supports "windows"
66 from the CPU address space to the PCI address space. There is one M32
69 the CPU address space to the PCIe bus and must be naturally aligned
76 * Drops the top bits of the address (above the size) and replaces
80 portion of address space from the CPU to PCIe
83 need to ensure Linux doesn't assign anything there, the M32 logic
86 * It is divided into 256 segments of equal size. A table in the chip
87 maps each segment to a PE#. That allows portions of the MMIO space
89 the segment granularity is 2GB/256 = 8MB.
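The M32 segment arithmetic works out as follows for a 2GB window split into 256 segments. The lookup table here is a hypothetical software copy of the chip's segment-to-PE# table, indexed by the offset from the start of the window.

```c
#include <assert.h>
#include <stdint.h>

#define M32_SIZE     (2ULL << 30)               /* 2GB window */
#define M32_SEGMENTS 256u
#define M32_SEG_SIZE (M32_SIZE / M32_SEGMENTS)  /* 8MB per segment */

/* Hypothetical copy of the chip's segment -> PE# table. */
static uint16_t m32_seg_to_pe[M32_SEGMENTS];

/* PE# owning a given offset inside the M32 window. */
static uint16_t m32_offset_to_pe(uint64_t offset)
{
	assert(offset < M32_SIZE);
	return m32_seg_to_pe[offset / M32_SEG_SIZE];
}
```

Because each segment has its own table entry, any 8MB-aligned chunk of the window can be handed to any PE.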
91 Now, this is the "main" window we use in Linux today (excluding
92 SR-IOV). We basically use the trick of forcing the bridge MMIO windows
93 onto a segment alignment/granularity so that the space behind a bridge
105 * Do not translate addresses (the address on PCIe is the same as the
106 address on the PowerBus). There is a way to also set the top 14
110 specify the PE# for the entire window. When segmented, a window
112 to a PE#. The segment number *is* the PE#.
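In contrast to M32, an M64 window has no per-segment table. This sketch models one M64 window with a hypothetical `struct m64_window`. It assumes 256 equal segments and a power-of-two window size, and shows the two modes: one PE# for the whole window, or PE# equal to the segment number.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define M64_SEGMENTS 256u

/* Hypothetical description of one M64 window. */
struct m64_window {
	uint64_t base;      /* naturally aligned base address */
	uint64_t size;      /* power of two */
	bool segmented;
	uint16_t single_pe; /* used only when !segmented */
};

static uint16_t m64_addr_to_pe(const struct m64_window *w, uint64_t addr)
{
	assert(addr >= w->base && addr - w->base < w->size);
	if (!w->segmented)
		return w->single_pe; /* one PE# for the entire window */
	/* Segmented: no lookup table, the segment number is the PE#. */
	return (uint16_t)((addr - w->base) / (w->size / M64_SEGMENTS));
}
```

With a 64GB window, each segment is 256MB, so the PE# of an address depends only on which 256MB slice it falls in.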
117 We have code (fairly new compared to the M32 stuff) that exploits that
120 We configure an M64 window to cover the entire region of address space
121 that has been assigned by FW for the PHB (about 64GB; ignore the space
122 for the M32, which comes out of a different "reserve"). We configure it
125 Then we do the same thing as with M32, using the bridge alignment
130 - We do the PE# allocation *after* the 64-bit space has been assigned
131 because the addresses we use directly determine the PE#. We then
132 update the M32 PE# for the devices that use both 32-bit and 64-bit
133 spaces or assign the remaining PE# to 32-bit only devices.
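The allocation order can be sketched like this. Devices with 64-bit BARs have their PE# dictated by the M64 segment their address landed in. 32-bit-only devices then take whatever PE#s remain. The `pe_used` bitmap and both helpers are hypothetical bookkeeping, and the step that writes these PE#s into the M32 table is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_PES 256u

/* Hypothetical record of which PE#s are already taken. */
static bool pe_used[NUM_PES];

/* 64-bit users: the PE# is fixed by the assigned address, so only
 * record it. */
static unsigned int claim_pe_from_m64(uint64_t m64_base, uint64_t seg_size,
				      uint64_t bar_addr)
{
	unsigned int pe = (unsigned int)((bar_addr - m64_base) / seg_size);

	assert(pe < NUM_PES);
	pe_used[pe] = true;
	return pe;
}

/* 32-bit-only users: take any PE# not claimed by a 64-bit user.
 * Returns -1 if none is left. */
static int alloc_free_pe(void)
{
	for (unsigned int pe = 0; pe < NUM_PES; pe++) {
		if (!pe_used[pe]) {
			pe_used[pe] = true;
			return (int)pe;
		}
	}
	return -1;
}
```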
137 mechanism to make the freeze state cascade to "companion" PEs but
141 the best we found. So when any of the PEs freezes, we freeze the
142 other ones for that "domain". We thus introduce the concept of
143 "master PE" which is the one used for DMA, MSIs, etc., and "secondary
144 PEs" that are used for the remaining M64 segments.
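The master/secondary scheme can be modelled in software by letting each PE point at its master PE and freezing everything that shares that master. This is a sketch under that assumption only. `struct pe`, `pes` and `freeze_domain` are hypothetical names, not the kernel's data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_PES 256u

/* A master PE points at itself; a secondary PE points at its master. */
struct pe {
	unsigned int master;
	bool frozen;
};

static struct pe pes[MAX_PES];

/* When any PE of a "domain" freezes, freeze every PE sharing its master. */
static void freeze_domain(unsigned int pe_num)
{
	assert(pe_num < MAX_PES);
	unsigned int master = pes[pe_num].master;

	for (size_t i = 0; i < MAX_PES; i++)
		if (pes[i].master == master)
			pes[i].frozen = true;
}
```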
156 support several Virtual Functions (VFs). Registers in the PF's SR-IOV
157 Capability control the number of VFs and whether they are enabled.
160 PCI devices, but the BARs in VF config space headers are unusual. For
161 a non-VF device, software uses BARs in the config space header to
162 discover the BAR sizes and assign addresses for them. For VF devices,
163 software uses VF BAR registers in the *PF* SR-IOV Capability to
164 discover sizes and assign addresses. The BARs in the VF's config space
167 When a VF BAR in the PF SR-IOV Capability is programmed, it sets the
168 base address for all the corresponding VF(n) BARs. For example, if the
170 1MB VF BAR0, the address in that VF BAR sets the base of an 8MB region.
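The VF BAR arithmetic in the 8-VF, 1MB example works like this. A single base address programmed in the PF's SR-IOV Capability determines every VF's copy of that BAR. The helper names are hypothetical, and BAR sizes are assumed to be powers of two, as PCI requires.

```c
#include <assert.h>
#include <stdint.h>

/* Address of VF n's copy of a BAR, given the base programmed into the
 * corresponding VF BAR of the PF SR-IOV Capability. */
static uint64_t vf_bar_addr(uint64_t vf_bar_base, uint64_t vf_bar_size,
			    unsigned int vf)
{
	/* Alignment is only required to the size of one VF's BAR. */
	assert((vf_bar_base & (vf_bar_size - 1)) == 0);
	return vf_bar_base + (uint64_t)vf * vf_bar_size;
}

/* Total span of the VF(n) BAR space described by one VF BAR. */
static uint64_t vf_bar_span(uint64_t vf_bar_size, unsigned int num_vfs)
{
	return vf_bar_size * num_vfs;
}
```

For a base of 0x80000000, VF3's BAR0 lands at 0x80300000, and the whole region spans 8MB.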
172 is a BAR0 for one of the VFs. Note that even though the VF BAR
173 describes an 8MB region, the alignment requirement is for a single VF,
182 individually mapped to a PE via the lookup table, so this is quite
183 flexible, but it works best when all the VF BARs are the same size. If
184 they are different sizes, the entire window has to be small enough that
185 the segment size matches the smallest VF BAR, which means larger VF
192 like the M32 window, but the segments can't be individually mapped to
193 PEs (the segment number is the PE#), so there isn't as much
198 equally-sized segments, and the segment number is the PE#. But if we
204 Finally, there is the plan to use M64 windows for SR-IOV, which is described
205 in more detail in the next two sections. For a given VF BAR, we need to
206 effectively reserve the entire 256 segments (256 * VF BAR size) and
207 position the VF BAR to start at the beginning of a free range of
217 SR-IOV VF BARs are all the same size.
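The reservation described for a given VF BAR (all 256 segments, i.e. 256 * VF BAR size) can be summarized in a small sketch. It assumes the segment size equals the VF BAR size. `struct sriov_layout` and `sriov_plan` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define M64_SEGMENTS 256u

/* Hypothetical summary of one VF BAR laid out in its own M64 window. */
struct sriov_layout {
	uint64_t reserve_size;        /* 256 * VF BAR size */
	unsigned int first_pe;        /* segment (= PE#) holding VF0 */
	unsigned int unused_segments; /* reserved but backing no VF */
};

static struct sriov_layout sriov_plan(uint64_t vf_bar_size,
				      unsigned int num_vfs,
				      unsigned int first_pe)
{
	/* The VF(n) BAR space must fit entirely inside the 256 segments. */
	assert(num_vfs >= 1 && first_pe + num_vfs <= M64_SEGMENTS);

	struct sriov_layout l = {
		.reserve_size = vf_bar_size * M64_SEGMENTS,
		.first_pe = first_pe,
		.unused_segments = M64_SEGMENTS - num_vfs,
	};
	return l;
}
```

With 16 VFs of 1MB each, 256MB is reserved and 240 segments stay idle. Those idle segments are what keep other devices from landing in a PE that the window assigns by segment number.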
220 than the number of M64 window segments, so if we map one VF BAR directly
221 to one M64 window, some part of the M64 window will map to another
225 total_VFs is less than 256, we have the situation in Figure 1.0, where
226 segments [total_VFs, 255] of the M64 window may map to some MMIO range on
245 Our current solution is to allocate 256 segments even if the VF(n) BAR
264 Allocating the extra space ensures that the entire M64 window will be
265 assigned to this one SR-IOV device and none of the space will be
266 available for other devices. Note that this only expands the space
271 4. Implications for the Generic PCI Code
273 The PCIe SR-IOV spec requires that the base of the VF(n) BAR space be
274 aligned to the size of an individual VF BAR.
276 In IODA2, the MMIO address determines the PE#. If the address is in an M32
277 window, we can set the PE# by updating the table that translates segments
278 to PE#s. Similarly, if the address is in an unsegmented M64 window, we can
279 set the PE# for the window. But if it's in a segmented M64 window, the
280 segment number is the PE#.
282 Therefore, the only way to control the PE# for a VF is to change the base
283 of the VF(n) BAR space in the VF BAR. If the PCI core allocates the exact
284 amount of space required for the VF(n) BAR space, the VF BAR value is fixed
287 On the other hand, if the PCI core allocates additional space, the VF BAR
288 value can be changed as long as the entire VF(n) BAR space remains inside
289 the space allocated by the core.
291 Ideally the segment size will be the same as an individual VF BAR size.
292 Then each VF will be in its own PE. The VF BARs (and therefore the PE#s)
294 allocate 256 segments, there are (256 - numVFs) choices for the PE# of VF0.
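With segment size equal to VF BAR size, choosing VF0's PE# amounts to choosing the VF BAR base. This sketch assumes the reserved space begins at the start of the M64 window, so the segment number counts from `reserve_base`. Both helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define M64_SEGMENTS 256u

/* VF BAR base that puts VF0 in segment (= PE#) pe0. */
static uint64_t vf_bar_base_for_pe(uint64_t reserve_base, uint64_t vf_bar_size,
				   unsigned int num_vfs, unsigned int pe0)
{
	/* The whole VF(n) BAR space must stay inside the reservation. */
	assert(pe0 + num_vfs <= M64_SEGMENTS);
	return reserve_base + (uint64_t)pe0 * vf_bar_size;
}

/* VF n then sits in the next segments, so its PE# follows directly. */
static unsigned int vf_pe(unsigned int pe0, unsigned int vf)
{
	return pe0 + vf;
}
```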
296 If the segment size is smaller than the VF BAR size, it will take several
298 possible, but the isolation isn't as good, and it reduces the number of PE#
299 choices because instead of consuming only numVFs segments, the VF(n) BAR
301 available segments for adjusting the base of the VF(n) BAR space.
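When the segment is smaller than the VF BAR, each VF spans k = VF BAR size / segment size segments. It therefore belongs to k PEs, and the VF(n) BAR space consumes numVFs * k segments. The sketch assumes power-of-two sizes. It also assumes VF0 starts on a multiple of k so the base stays aligned to one VF BAR. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Segments needed to cover one VF BAR. */
static unsigned int segs_per_vf(uint64_t vf_bar_size, uint64_t seg_size)
{
	assert(seg_size <= vf_bar_size && vf_bar_size % seg_size == 0);
	return (unsigned int)(vf_bar_size / seg_size);
}

/* PE#s [*lo, *hi] covered by VF 'vf' when VF0 starts at 'first_seg'. */
static void vf_pe_range(uint64_t vf_bar_size, uint64_t seg_size,
			unsigned int first_seg, unsigned int vf,
			unsigned int *lo, unsigned int *hi)
{
	unsigned int k = segs_per_vf(vf_bar_size, seg_size);

	*lo = first_seg + vf * k;
	*hi = *lo + k - 1;
}

/* Segments consumed by the whole VF(n) BAR space. */
static unsigned int vf_space_segments(uint64_t vf_bar_size, uint64_t seg_size,
				      unsigned int num_vfs)
{
	return num_vfs * segs_per_vf(vf_bar_size, seg_size);
}
```

With 4MB VF BARs and 1MB segments, 16 VFs use 64 of the 256 segments instead of 16. That leaves fewer positions for the base.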