Introduction
============

        The IBM Power architecture provides support for CAPI (Coherent
        Accelerator Power Interface), which is available to certain PCIe slots
        on Power 8 systems. CAPI can be thought of as a special tunneling
        protocol through PCIe that allows PCIe adapters to look like special
        purpose co-processors which can read or write an application's
        memory and generate page faults. As a result, the host interface to
        an adapter running in CAPI mode does not require the data buffers to
        be mapped to the device's memory (IOMMU bypass) nor does it require
        memory to be pinned.

        On Linux, Coherent Accelerator (CXL) kernel services present CAPI
        devices as PCI devices by implementing a virtual PCI host bridge.
        This abstraction simplifies the infrastructure and programming
        model, allowing for drivers to look similar to other native PCI
        device drivers.

        CXL provides a mechanism by which user space applications can
        directly talk to a device (network or storage) bypassing the typical
        kernel/device driver stack. The CXL Flash Adapter Driver enables a
        user space application direct access to Flash storage.

        The CXL Flash Adapter Driver is a kernel module that sits in the
        SCSI stack as a low level device driver (below the SCSI disk and
        protocol drivers) for the IBM CXL Flash Adapter. This driver is
        responsible for the initialization of the adapter, setting up the
        special path for user space access, and performing error recovery. It
        communicates directly with the Flash Accelerator Functional Unit (AFU)
        as described in Documentation/powerpc/cxl.txt.

        The cxlflash driver supports two mutually exclusive modes of
        operation at the device (LUN) level:

            - Any flash device (LUN) can be configured to be accessed as a
              regular disk device (e.g. /dev/sdc). This is the default mode.

            - Any flash device (LUN) can be configured to be accessed from
              user space with a special block library.
              This mode further specifies the means of accessing the device
              and provides for either raw access to the entire LUN (referred
              to as direct or physical LUN access) or access to a
              kernel/AFU-mediated partition of the LUN (referred to as
              virtual LUN access). The segmentation of a disk device into
              virtual LUNs is assisted by special translation services
              provided by the Flash AFU.

Overview
========

        The Coherent Accelerator Interface Architecture (CAIA) introduces a
        concept of a master context. A master typically has special privileges
        granted to it by the kernel or hypervisor allowing it to perform AFU
        wide management and control. The master may or may not be involved
        directly in each user I/O, but at the minimum is involved in the
        initial setup before the user application is allowed to send requests
        directly to the AFU.

        The CXL Flash Adapter Driver establishes a master context with the
        AFU. It uses memory mapped I/O (MMIO) for this control and setup. The
        Adapter Problem Space Memory Map looks like this:

                     +-------------------------------+
                     |    512 * 64 KB User MMIO      |
                     |        (per context)          |
                     |       User Accessible         |
                     +-------------------------------+
                     |    512 * 128 B per context    |
                     |    Provisioning and Control   |
                     |   Trusted Process accessible  |
                     +-------------------------------+
                     |         64 KB Global          |
                     |   Trusted Process accessible  |
                     +-------------------------------+

        This driver configures itself into the SCSI software stack as an
        adapter driver. The driver is the only entity that is considered a
        Trusted Process to program the Provisioning and Control and Global
        areas in the MMIO Space shown above. The master context driver
        discovers all LUNs attached to the CXL Flash adapter and instantiates
        SCSI block devices (/dev/sdb, /dev/sdc etc.) for each unique LUN
        seen from each path.
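
        As a rough illustration of the memory map above, the sketch below
        computes where each area would start. It assumes that the three
        regions are contiguous, appear in the order drawn, and begin at
        offset 0; the text does not state this, and the function and macro
        names are hypothetical rather than taken from the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Sizes taken from the memory map diagram above. */
#define MAX_CONTEXT     512u
#define USER_MMIO_SIZE  (64u * 1024u)  /* user MMIO area, per context */
#define CTRL_AREA_SIZE  128u           /* provisioning and control, per context */
#define GLOBAL_SIZE     (64u * 1024u)  /* single global area */

/*
 * ASSUMPTION (not stated in the text): the three regions are contiguous,
 * appear in the order drawn, and the first one starts at offset 0.
 */

/* Offset of the user-accessible MMIO area belonging to context 'ctx'. */
uint64_t user_mmio_offset(unsigned int ctx)
{
	assert(ctx < MAX_CONTEXT);
	return (uint64_t)ctx * USER_MMIO_SIZE;
}

/* Offset of the provisioning and control area for context 'ctx'. */
uint64_t ctrl_offset(unsigned int ctx)
{
	assert(ctx < MAX_CONTEXT);
	return (uint64_t)MAX_CONTEXT * USER_MMIO_SIZE +
	       (uint64_t)ctx * CTRL_AREA_SIZE;
}

/* Offset of the single global area. */
uint64_t global_offset(void)
{
	return (uint64_t)MAX_CONTEXT * USER_MMIO_SIZE +
	       (uint64_t)MAX_CONTEXT * CTRL_AREA_SIZE;
}

/* Total size of the problem space under the same assumption. */
uint64_t problem_space_size(void)
{
	return global_offset() + GLOBAL_SIZE;
}
```

        Under these assumptions the user MMIO area of context 1 would start
        at 0x10000, the per-context control areas at 0x2000000, and the
        global area at 0x2010000.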

        Once these SCSI block devices are instantiated, an application
        written to a specification provided by the block library may get
        access to the Flash from user space (without requiring a system call).

        This master context driver also provides a series of ioctls for this
        block library to enable this user space access. The driver supports
        two modes for accessing the block device.

        The first mode is called a virtual mode. In this mode a single SCSI
        block device (/dev/sdb) may be carved up into any number of distinct
        virtual LUNs. The virtual LUNs may be resized as long as the sum of
        the sizes of all the virtual LUNs, along with the meta-data associated
        with them, does not exceed the physical capacity.

        The second mode is called the physical mode. In this mode a single
        block device (/dev/sdb) may be opened directly by the block library
        and the entire space for the LUN is available to the application.

        Only the physical mode provides persistence of the data, i.e. the
        data written to the block device will survive application exit and
        restart as well as a reboot. The virtual LUNs do not persist (i.e.
        they do not survive after the application terminates or the system
        reboots).


Block library API
=================

        Applications intending to get access to the CXL Flash from user
        space should use the block library, as it abstracts the details of
        interfacing directly with the cxlflash driver that are necessary for
        performing administrative actions (e.g. setup, tear down, resize).
        The block library can be thought of as a 'user' of services,
        implemented as IOCTLs, that are provided by the cxlflash driver
        specifically for devices (LUNs) operating in user space access
        mode.
        While it is not a requirement that applications understand
        the interface between the block library and the cxlflash driver,
        a high-level overview of each supported service (IOCTL) is provided
        below.

        The block library can be found on GitHub:
        http://www.github.com/mikehollinger/ibmcapikv


CXL Flash Driver IOCTLs
=======================

        Users, such as the block library, that wish to interface with a flash
        device (LUN) via user space access need to use the services provided
        by the cxlflash driver. As these services are implemented as ioctls,
        a file descriptor handle must first be obtained in order to establish
        the communication channel between a user and the kernel. This file
        descriptor is obtained by opening the device special file associated
        with the SCSI disk device (/dev/sdb) that was created during LUN
        discovery. Because of where the cxlflash driver sits within the
        SCSI protocol stack, this open is not actually seen by the cxlflash
        driver. Upon successful open, the user receives a file descriptor
        (herein referred to as fd1) that should be used for issuing the
        subsequent ioctls listed below.

        The structure definitions for these IOCTLs are available in:
        uapi/scsi/cxlflash_ioctl.h

DK_CXLFLASH_ATTACH
------------------

        This ioctl obtains, initializes, and starts a context using the CXL
        kernel services. These services specify a context id (u16) by which
        to uniquely identify the context and its allocated resources. The
        services additionally provide a second file descriptor (herein
        referred to as fd2) that is used by the block library to initiate
        memory mapped I/O (via mmap()) to the CXL flash device and poll for
        completion events.
        This file descriptor is intentionally installed by
        this driver and not the CXL kernel services to allow for intermediary
        notification and access in the event of a non-user-initiated close(),
        such as a killed process. This design point is described in further
        detail in the description for the DK_CXLFLASH_DETACH ioctl.

        There are a few important aspects regarding the "tokens" (context id
        and fd2) that are provided back to the user:

            - These tokens are only valid for the process under which they
              were created. The child of a forked process cannot continue
              to use the context id or file descriptor created by its parent
              (see DK_CXLFLASH_VLUN_CLONE for further details).

            - These tokens are only valid for the lifetime of the context and
              the process under which they were created. Once either is
              destroyed, the tokens are to be considered stale and subsequent
              usage will result in errors.

            - When a context is no longer needed, the user shall detach from
              the context via the DK_CXLFLASH_DETACH ioctl.

            - A close on fd2 will invalidate the tokens. This operation is not
              required by the user.

DK_CXLFLASH_USER_DIRECT
-----------------------
        This ioctl is responsible for transitioning the LUN to direct
        (physical) mode access and configuring the AFU for direct access from
        user space on a per-context basis. Additionally, the block size and
        last logical block address (LBA) are returned to the user.

        As mentioned previously, when operating in user space access mode,
        LUNs may be accessed in whole or in part. Only one mode is allowed
        at a time and if one mode is active (outstanding references exist),
        requests to use the LUN in a different mode are denied.

        The AFU is configured for direct access from user space by adding an
        entry to the AFU's resource handle table.
        The index of the entry is
        treated as a resource handle that is returned to the user. The user
        is then able to use the handle to reference the LUN during I/O.

DK_CXLFLASH_USER_VIRTUAL
------------------------
        This ioctl is responsible for transitioning the LUN to virtual mode
        of access and configuring the AFU for virtual access from user space
        on a per-context basis. Additionally, the block size and last logical
        block address (LBA) are returned to the user.

        As mentioned previously, when operating in user space access mode,
        LUNs may be accessed in whole or in part. Only one mode is allowed
        at a time and if one mode is active (outstanding references exist),
        requests to use the LUN in a different mode are denied.

        The AFU is configured for virtual access from user space by adding
        an entry to the AFU's resource handle table. The index of the entry
        is treated as a resource handle that is returned to the user. The
        user is then able to use the handle to reference the LUN during I/O.

        By default, the virtual LUN is created with a size of 0. The user
        would need to use the DK_CXLFLASH_VLUN_RESIZE ioctl to grow the
        virtual LUN to a desired size. To avoid having to perform this
        resize for the initial creation of the virtual LUN, the user has the
        option of specifying a size as part of the DK_CXLFLASH_USER_VIRTUAL
        ioctl, such that when success is returned to the user, the
        resource handle that is provided is already referencing provisioned
        storage. This is reflected by the last LBA being a non-zero value.

DK_CXLFLASH_VLUN_RESIZE
-----------------------
        This ioctl is responsible for resizing a previously created virtual
        LUN and will fail if invoked upon a LUN that is not in virtual
        mode. Upon success, an updated last LBA is returned to the user
        indicating the new size of the virtual LUN associated with the
        resource handle.
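
        The resize rules can be pictured with a small user space model.
        This is only a sketch and not the driver's implementation: the
        struct, the function, the fixed meta-data reservation, the limit of
        four virtual LUNs, and the convention that a size of 0 reports a
        last LBA of 0 are all assumptions made for illustration. It rejects
        a resize when the LUN is not in virtual mode or when the virtual
        LUNs plus meta-data would exceed the physical capacity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_VLUNS 4   /* hypothetical limit for this sketch */

enum lun_mode { MODE_NONE, MODE_VIRTUAL, MODE_PHYSICAL };

/* Hypothetical bookkeeping for one physical LUN; all sizes are in blocks. */
struct lun_model {
	enum lun_mode mode;
	uint64_t capacity;         /* physical capacity */
	uint64_t metadata;         /* blocks set aside for meta-data (assumed fixed) */
	uint64_t vlun[MAX_VLUNS];  /* current size of each virtual LUN */
};

/*
 * Resize virtual LUN 'idx' to 'new_size' blocks.
 * Returns 0 and stores the new last LBA on success. Returns -1 if the LUN
 * is not in virtual mode, 'idx' is out of range, or the request would push
 * the sum of all virtual LUNs plus meta-data past the physical capacity.
 */
int vlun_resize(struct lun_model *lun, unsigned int idx,
		uint64_t new_size, uint64_t *last_lba)
{
	uint64_t others;
	unsigned int i;

	assert(lun != NULL && last_lba != NULL);

	if (lun->mode != MODE_VIRTUAL || idx >= MAX_VLUNS)
		return -1;

	/* Space already claimed by meta-data and the other virtual LUNs. */
	others = lun->metadata;
	for (i = 0; i < MAX_VLUNS; i++)
		if (i != idx)
			others += lun->vlun[i];

	if (others > lun->capacity || new_size > lun->capacity - others)
		return -1;

	lun->vlun[idx] = new_size;
	/* ASSUMPTION: an empty virtual LUN reports a last LBA of 0. */
	*last_lba = new_size ? new_size - 1 : 0;
	return 0;
}
```

        In this model, growing one virtual LUN succeeds only while the
        remaining capacity covers the request, and shrinking a virtual LUN
        frees space that a later resize of another virtual LUN can claim.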

        The partitioning of virtual LUNs is jointly mediated by the cxlflash
        driver and the AFU. An allocation table is kept for each LUN that is
        operating in the virtual mode and used to program a LUN translation
        table that the AFU references when provided with a resource handle.

DK_CXLFLASH_RELEASE
-------------------
        This ioctl is responsible for releasing a previously obtained
        reference to either a physical or virtual LUN. This can be
        thought of as the inverse of the DK_CXLFLASH_USER_DIRECT or
        DK_CXLFLASH_USER_VIRTUAL ioctls. Upon success, the resource handle
        is no longer valid and the entry in the resource handle table is
        made available to be used again.

        As part of the release process for virtual LUNs, the virtual LUN
        is first resized to 0 to clear out and free the translation tables
        associated with the virtual LUN reference.

DK_CXLFLASH_DETACH
------------------
        This ioctl is responsible for unregistering a context with the
        cxlflash driver and releasing outstanding resources that were
        not explicitly released via the DK_CXLFLASH_RELEASE ioctl. Upon
        success, all "tokens" which had been provided to the user from the
        DK_CXLFLASH_ATTACH onward are no longer valid.

DK_CXLFLASH_VLUN_CLONE
----------------------
        This ioctl is responsible for cloning a previously created
        context to a more recently created context. It exists solely to
        support maintaining user space access to storage after a process
        forks. Upon success, the child process (which invoked the ioctl)
        will have access to the same LUNs via the same resource handle(s)
        and fd2 as the parent, but under a different context.

        Context sharing across processes is not supported with CXL and
        therefore each fork must be met with establishing a new context
        for the child process. This ioctl simplifies the state management
        and playback required by a user in such a scenario.
        When a process
        forks, the child process can clone the parent's context by first
        creating a context (via DK_CXLFLASH_ATTACH) and then using this
        ioctl to perform the clone from the parent to the child.

        The clone itself is fairly simple. The resource handle and LUN
        translation tables are copied from the parent context to the child's
        and then synced with the AFU.

DK_CXLFLASH_VERIFY
------------------
        This ioctl is used to detect various changes such as the capacity of
        the disk changing, the number of LUNs visible changing, etc. In cases
        where the changes affect the application (such as a LUN resize), the
        cxlflash driver will report the changed state to the application.

        The user calls in when they want to validate that a LUN hasn't been
        changed in response to a check condition. As the user is operating out
        of band from the kernel, they will see these types of events without
        the kernel's knowledge. When encountered, the user's architected
        behavior is to call in to this ioctl, indicating what they want to
        verify and passing along any appropriate information. For now, only
        verifying a LUN change (i.e. a change in size) with sense data is
        supported.

DK_CXLFLASH_RECOVER_AFU
-----------------------
        This ioctl is used to drive recovery (if such an action is warranted)
        of a specified user context. Any state associated with the user context
        is re-established upon successful recovery.

        User contexts are put into an error condition when the device needs to
        be reset or is terminating. Users are notified of this error condition
        by seeing all 0xF's on an MMIO read. Upon encountering this, the
        architected behavior for a user is to call into this ioctl to recover
        their context. A user may also call into this ioctl at any time to
        check if the device is operating normally.
        If a failure is returned
        from this ioctl, the user is expected to gracefully clean up their
        context via release/detach ioctls. Until they do, the context they
        hold is not relinquished. The user may also optionally exit the process
        at which time the context/resources they held will be freed as part of
        the release fop.

DK_CXLFLASH_MANAGE_LUN
----------------------
        This ioctl is used to switch a LUN from a mode where it is available
        for file-system access (legacy), to a mode where it is set aside for
        exclusive user space access (superpipe). In case a LUN is visible
        across multiple ports and adapters, this ioctl is used to uniquely
        identify each LUN by its World Wide Node Name (WWNN).