Coherent Accelerator Interface (CXL)
====================================

Introduction
============

    The coherent accelerator interface is designed to allow the
    coherent connection of accelerators (FPGAs and other devices) to a
    POWER system. These devices need to adhere to the Coherent
    Accelerator Interface Architecture (CAIA).

    IBM refers to this as the Coherent Accelerator Processor Interface
    or CAPI. In the kernel it's referred to by the name CXL to avoid
    confusion with the ISDN CAPI subsystem.

    Coherent in this context means that the accelerator and CPUs can
    both access system memory directly and with the same effective
    addresses.


Hardware overview
=================

         POWER8               FPGA
       +----------+        +---------+
       |          |        |         |
       |   CPU    |        |   AFU   |
       |          |        |         |
       |          |        |         |
       |          |        |         |
       +----------+        +---------+
       |   PHB    |        |         |
       |   +------+        |   PSL   |
       |   | CAPP |<------>|         |
       +---+------+  PCIE  +---------+

    The POWER8 chip has a Coherently Attached Processor Proxy (CAPP)
    unit which is part of the PCIe Host Bridge (PHB). This is managed
    by Linux by calls into OPAL. Linux doesn't directly program the
    CAPP.

    The FPGA (or coherently attached device) consists of two parts:
    the POWER Service Layer (PSL) and the Accelerator Function Unit
    (AFU). The AFU is used to implement specific functionality behind
    the PSL. The PSL, among other things, provides memory address
    translation services to allow each AFU direct access to userspace
    memory.

    The AFU is the core part of the accelerator (e.g. the compression
    or crypto function). The kernel has no knowledge of the function
    of the AFU. Only userspace interacts directly with the AFU.

    The PSL provides the translation and interrupt services that the
    AFU needs. This is what the kernel interacts with.
    For example, if the AFU needs to read a particular effective
    address, it sends that address to the PSL; the PSL then translates
    it, fetches the data from memory and returns it to the AFU. If the
    PSL has a translation miss, it interrupts the kernel and the kernel
    services the fault. The context in which this fault is serviced
    depends on who owns that acceleration function.


AFU Modes
=========

    There are two programming modes supported by the AFU: dedicated
    and AFU directed. An AFU may support one or both modes.

    When using dedicated mode, only one MMU context is supported. In
    this mode, only one userspace process can use the accelerator at
    a time.

    When using AFU directed mode, up to 16K simultaneous contexts can
    be supported. This means up to 16K simultaneous userspace
    applications may use the accelerator (although specific AFUs may
    support fewer). In this mode, the AFU sends a 16-bit context ID
    with each of its requests. This tells the PSL which context is
    associated with each operation. If the PSL can't translate an
    operation, the ID can also be accessed by the kernel so it can
    determine the userspace context associated with an operation.


MMIO space
==========

    A portion of the accelerator MMIO space can be directly mapped
    from the AFU to userspace. Either the whole space can be mapped or
    just a per context portion. The hardware is self describing, hence
    the kernel can determine the offset and size of the per context
    portion.


Interrupts
==========

    AFUs may generate interrupts that are destined for userspace. These
    are received by the kernel as hardware interrupts and passed on to
    userspace via the read syscall documented below.

    Data storage faults and error interrupts are handled by the kernel
    driver.


Work Element Descriptor (WED)
=============================

    The WED is a 64-bit parameter passed to the AFU when a context is
    started. Its format is up to the AFU, hence the kernel has no
    knowledge of what it represents. Typically it will be the
    effective address of a work queue or status block where the AFU
    and userspace can share control and status information.




User API
========

    For AFUs operating in AFU directed mode, two character device
    files will be created. /dev/cxl/afu0.0m will correspond to a
    master context and /dev/cxl/afu0.0s will correspond to a slave
    context. Master contexts have access to the full MMIO space an
    AFU provides. Slave contexts have access to only the per process
    MMIO space an AFU provides.

    For AFUs operating in dedicated process mode, the driver will
    only create a single character device per AFU called
    /dev/cxl/afu0.0d. This will have access to the entire MMIO space
    that the AFU provides (like master contexts in AFU directed mode).

    The types described below are defined in include/uapi/misc/cxl.h.

    The following file operations are supported on both slave and
    master devices.

    A userspace library, libcxl, is available here:
        https://github.com/ibm-capi/libcxl
    This provides a C interface to this kernel API.

open
----

    Opens the device and allocates a file descriptor to be used with
    the rest of the API.

    A dedicated mode AFU only has one context and only allows the
    device to be opened once.

    An AFU directed mode AFU can have many contexts; the device can be
    opened once for each context that is available.

    When all available contexts are allocated the open call will fail
    and return -ENOSPC.
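
    As a minimal sketch of the open semantics above, the helper below
    opens an AFU device node and separates the "no free context" case
    from other failures. The name open_afu_context is a hypothetical
    helper for illustration; it is not part of the kernel API or of
    libcxl. From userspace, the kernel's -ENOSPC is seen as open()
    returning -1 with errno set to ENOSPC.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of the kernel API or libcxl).
 * Returns a file descriptor on success, or -1 with errno preserved.
 */
static int open_afu_context(const char *path)
{
    int fd = open(path, O_RDWR);

    if (fd < 0) {
        int saved = errno;   /* fprintf() may clobber errno */

        if (saved == ENOSPC)
            fprintf(stderr, "%s: all AFU contexts are in use\n", path);
        else
            fprintf(stderr, "%s: open failed: %s\n", path, strerror(saved));
        errno = saved;
        return -1;
    }
    return fd;
}
```

    A caller might pass /dev/cxl/afu0.0s for a slave context or
    /dev/cxl/afu0.0d for a dedicated mode AFU, and close() the
    descriptor once the context is no longer needed.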

    Note: IRQs need to be allocated for each context, which may limit
          the number of contexts that can be created, and therefore
          how many times the device can be opened. The POWER8 CAPP
          supports 2040 IRQs and 3 are used by the kernel, so 2037 are
          left. If 1 IRQ is needed per context, then only 2037
          contexts can be allocated. If 4 IRQs are needed per context,
          then only 2037/4 = 509 contexts can be allocated.


ioctl
-----

    CXL_IOCTL_START_WORK:
        Starts the AFU context and associates it with the current
        process. Once this ioctl is successfully executed, all memory
        mapped into this process is accessible to this AFU context
        using the same effective addresses. No additional calls are
        required to map/unmap memory. The AFU memory context will be
        updated as userspace allocates and frees memory. This ioctl
        returns once the AFU context is started.

        Takes a pointer to a struct cxl_ioctl_start_work:

                struct cxl_ioctl_start_work {
                        __u64 flags;
                        __u64 work_element_descriptor;
                        __u64 amr;
                        __s16 num_interrupts;
                        __s16 reserved1;
                        __s32 reserved2;
                        __u64 reserved3;
                        __u64 reserved4;
                        __u64 reserved5;
                        __u64 reserved6;
                };

            flags:
                Indicates which optional fields in the structure are
                valid.

            work_element_descriptor:
                The Work Element Descriptor (WED) is a 64-bit argument
                defined by the AFU. Typically this is an effective
                address pointing to an AFU specific structure
                describing what work to perform.

            amr:
                Authority Mask Register (AMR), same as the powerpc
                AMR. This field is only used by the kernel when the
                corresponding CXL_START_WORK_AMR value is specified in
                flags. If not specified the kernel will use a default
                value of 0.

            num_interrupts:
                Number of userspace interrupts to request.
                This field is only used by the kernel when the
                corresponding CXL_START_WORK_NUM_IRQS value is
                specified in flags. If not specified the minimum
                number required by the AFU will be allocated. The
                minimum and maximum number can be obtained from sysfs.

            reserved fields:
                For ABI padding and future extensions.

    CXL_IOCTL_GET_PROCESS_ELEMENT:
        Get the current context ID, also known as the process element.
        The value is returned from the kernel as a __u32.


mmap
----

    An AFU may have an MMIO space to facilitate communication with the
    AFU. If it does, the MMIO space can be accessed via mmap. The size
    and contents of this area are specific to the particular AFU. The
    size can be discovered via sysfs.

    In AFU directed mode, master contexts are allowed to map all of
    the MMIO space and slave contexts are allowed to only map the per
    process MMIO space associated with the context. In dedicated
    process mode the entire MMIO space can always be mapped.

    This mmap call must be done after the START_WORK ioctl.

    Care should be taken when accessing MMIO space. Only 32-bit and
    64-bit accesses are supported by POWER8. Also, the AFU will be
    designed with a specific endianness, so all MMIO accesses should
    take endianness into account (the endian(3) functions such as
    le64toh() and be64toh() are recommended). The same endianness
    concerns apply to any shared memory queues the WED may describe.


read
----

    Reads events from the AFU. Blocks if no events are pending
    (unless O_NONBLOCK is supplied). Returns -EIO in the case of an
    unrecoverable error or if the card is removed.

    read() will always return an integral number of events.

    The buffer passed to read() must be at least 4K bytes.

    The result of the read will be a buffer of one or more events,
    each of type struct cxl_event and of varying size:

            struct cxl_event {
                    struct cxl_event_header header;
                    union {
                            struct cxl_event_afu_interrupt irq;
                            struct cxl_event_data_storage fault;
                            struct cxl_event_afu_error afu_error;
                    };
            };

    The struct cxl_event_header is defined as:

            struct cxl_event_header {
                    __u16 type;
                    __u16 size;
                    __u16 process_element;
                    __u16 reserved1;
            };

        type:
            This defines the type of event. The type determines how
            the rest of the event is structured. These types are
            described below and defined by enum cxl_event_type.

        size:
            This is the size of the event in bytes including the
            struct cxl_event_header. The start of the next event can
            be found at this offset from the start of the current
            event.

        process_element:
            Context ID of the event.

        reserved field:
            For future extensions and padding.

    If the event type is CXL_EVENT_AFU_INTERRUPT then the event
    structure is defined as:

            struct cxl_event_afu_interrupt {
                    __u16 flags;
                    __u16 irq; /* Raised AFU interrupt number */
                    __u32 reserved1;
            };

        flags:
            These flags indicate which optional fields are present
            in this struct. Currently all fields are mandatory.

        irq:
            The IRQ number sent by the AFU.

        reserved field:
            For future extensions and padding.

    If the event type is CXL_EVENT_DATA_STORAGE then the event
    structure is defined as:

            struct cxl_event_data_storage {
                    __u16 flags;
                    __u16 reserved1;
                    __u32 reserved2;
                    __u64 addr;
                    __u64 dsisr;
                    __u64 reserved3;
            };

        flags:
            These flags indicate which optional fields are present in
            this struct. Currently all fields are mandatory.

        addr:
            The address that the AFU unsuccessfully attempted to
            access. Valid accesses will be handled transparently by the
            kernel but invalid accesses will generate this event.

        dsisr:
            This field gives information on the type of fault. It is a
            copy of the DSISR from the PSL hardware when the address
            fault occurred. The form of the DSISR is as defined in the
            CAIA.

        reserved fields:
            For future extensions.

    If the event type is CXL_EVENT_AFU_ERROR then the event structure
    is defined as:

            struct cxl_event_afu_error {
                    __u16 flags;
                    __u16 reserved1;
                    __u32 reserved2;
                    __u64 error;
            };

        flags:
            These flags indicate which optional fields are present in
            this struct. Currently all fields are mandatory.

        error:
            Error status from the AFU. Defined by the AFU.

        reserved fields:
            For future extensions and padding.

Sysfs Class
===========

    A cxl sysfs class is added under /sys/class/cxl to facilitate
    enumeration and tuning of the accelerators. Its layout is
    described in Documentation/ABI/testing/sysfs-class-cxl.


Udev rules
==========

    The following udev rules could be used to create a symlink to the
    most logical chardev to use in any programming mode (afuX.Yd for
    dedicated, afuX.Ys for AFU directed), since the API is virtually
    identical for each:

    SUBSYSTEM=="cxl", ATTRS{mode}=="dedicated_process", SYMLINK="cxl/%b"
    SUBSYSTEM=="cxl", ATTRS{mode}=="afu_directed", \
              KERNEL=="afu[0-9]*.[0-9]*s", SYMLINK="cxl/%b"
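
    Returning to the read interface in the User API section: because
    read() returns an integral number of variable-size events, a
    consumer can step through the buffer using the size field of each
    header. The sketch below illustrates that walk under stated
    assumptions. struct ev_header is a local mirror of the documented
    struct cxl_event_header (real code would include the uapi header
    instead), and walk_events and ev_handler are hypothetical names
    that are not part of the kernel API or libcxl.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of the documented struct cxl_event_header (assumption:
 * real code would use the definition from include/uapi/misc/cxl.h). */
struct ev_header {
    uint16_t type;
    uint16_t size;             /* bytes, including this header */
    uint16_t process_element;
    uint16_t reserved1;
};

/* Hypothetical callback type, invoked once per event. */
typedef void (*ev_handler)(const struct ev_header *hdr,
                           const unsigned char *payload,
                           size_t payload_len, void *arg);

/*
 * Walk a buffer of len bytes as returned by read().
 * Returns the number of events visited, or -1 if the buffer is
 * malformed (truncated header, or a size that is too small or
 * overruns the buffer).
 */
static int walk_events(const unsigned char *buf, size_t len,
                       ev_handler fn, void *arg)
{
    size_t off = 0;
    int count = 0;

    while (off < len) {
        struct ev_header hdr;

        if (len - off < sizeof(hdr))
            return -1;
        memcpy(&hdr, buf + off, sizeof(hdr)); /* no alignment assumed */
        if (hdr.size < sizeof(hdr) || hdr.size > len - off)
            return -1;
        if (fn)
            fn(&hdr, buf + off + sizeof(hdr), hdr.size - sizeof(hdr), arg);
        off += hdr.size;       /* next event starts at this offset */
        count++;
    }
    return count;
}
```

    A caller would typically read() into a buffer of at least 4K
    bytes, as required above, pass the returned byte count as len, and
    dispatch on hdr->type inside the callback.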