1(How to avoid) Botching up ioctls 2================================= 3 4From: http://blog.ffwll.ch/2013/11/botching-up-ioctls.html 5 6By: Daniel Vetter, Copyright © 2013 Intel Corporation 7 8One clear insight kernel graphics hackers gained in the past few years is that 9trying to come up with a unified interface to manage the execution units and 10memory on completely different GPUs is a futile effort. So nowadays every 11driver has its own set of ioctls to allocate memory and submit work to the GPU. 12Which is nice, since there's no more insanity in the form of fake-generic, but 13actually only used once interfaces. But the clear downside is that there's much 14more potential to screw things up. 15 16To avoid repeating all the same mistakes again I've written up some of the 17lessons learned while botching the job for the drm/i915 driver. Most of these 18only cover technicalities and not the big-picture issues like what the command 19submission ioctl exactly should look like. Learning these lessons is probably 20something every GPU driver has to do on its own. 21 22 23Prerequisites 24------------- 25 26First the prerequisites. Without these you have already failed, because you 27will need to add a a 32-bit compat layer: 28 29 * Only use fixed sized integers. To avoid conflicts with typedefs in userspace 30 the kernel has special types like __u32, __s64. Use them. 31 32 * Align everything to the natural size and use explicit padding. 32-bit 33 platforms don't necessarily align 64-bit values to 64-bit boundaries, but 34 64-bit platforms do. So we always need padding to the natural size to get 35 this right. 36 37 * Pad the entire struct to a multiple of 64-bits - the structure size will 38 otherwise differ on 32-bit versus 64-bit. Having a different structure size 39 hurts when passing arrays of structures to the kernel, or if the kernel 40 checks the structure size, which e.g. the drm core does. 41 42 * Pointers are __u64, cast from/to a uintprt_t on the userspace side and 43 from/to a void __user * in the kernel. Try really hard not to delay this 44 conversion or worse, fiddle the raw __u64 through your code since that 45 diminishes the checking tools like sparse can provide. 46 47 48Basics 49------ 50 51With the joys of writing a compat layer avoided we can take a look at the basic 52fumbles. Neglecting these will make backward and forward compatibility a real 53pain. And since getting things wrong on the first attempt is guaranteed you 54will have a second iteration or at least an extension for any given interface. 55 56 * Have a clear way for userspace to figure out whether your new ioctl or ioctl 57 extension is supported on a given kernel. If you can't rely on old kernels 58 rejecting the new flags/modes or ioctls (since doing that was botched in the 59 past) then you need a driver feature flag or revision number somewhere. 60 61 * Have a plan for extending ioctls with new flags or new fields at the end of 62 the structure. The drm core checks the passed-in size for each ioctl call 63 and zero-extends any mismatches between kernel and userspace. That helps, 64 but isn't a complete solution since newer userspace on older kernels won't 65 notice that the newly added fields at the end get ignored. So this still 66 needs a new driver feature flags. 67 68 * Check all unused fields and flags and all the padding for whether it's 0, 69 and reject the ioctl if that's not the case. Otherwise your nice plan for 70 future extensions is going right down the gutters since someone will submit 71 an ioctl struct with random stack garbage in the yet unused parts. Which 72 then bakes in the ABI that those fields can never be used for anything else 73 but garbage. 74 75 * Have simple testcases for all of the above. 76 77 78Fun with Error Paths 79-------------------- 80 81Nowadays we don't have any excuse left any more for drm drivers being neat 82little root exploits. This means we both need full input validation and solid 83error handling paths - GPUs will die eventually in the oddmost corner cases 84anyway: 85 86 * The ioctl must check for array overflows. Also it needs to check for 87 over/underflows and clamping issues of integer values in general. The usual 88 example is sprite positioning values fed directly into the hardware with the 89 hardware just having 12 bits or so. Works nicely until some odd display 90 server doesn't bother with clamping itself and the cursor wraps around the 91 screen. 92 93 * Have simple testcases for every input validation failure case in your ioctl. 94 Check that the error code matches your expectations. And finally make sure 95 that you only test for one single error path in each subtest by submitting 96 otherwise perfectly valid data. Without this an earlier check might reject 97 the ioctl already and shadow the codepath you actually want to test, hiding 98 bugs and regressions. 99 100 * Make all your ioctls restartable. First X really loves signals and second 101 this will allow you to test 90% of all error handling paths by just 102 interrupting your main test suite constantly with signals. Thanks to X's 103 love for signal you'll get an excellent base coverage of all your error 104 paths pretty much for free for graphics drivers. Also, be consistent with 105 how you handle ioctl restarting - e.g. drm has a tiny drmIoctl helper in its 106 userspace library. The i915 driver botched this with the set_tiling ioctl, 107 now we're stuck forever with some arcane semantics in both the kernel and 108 userspace. 109 110 * If you can't make a given codepath restartable make a stuck task at least 111 killable. GPUs just die and your users won't like you more if you hang their 112 entire box (by means of an unkillable X process). If the state recovery is 113 still too tricky have a timeout or hangcheck safety net as a last-ditch 114 effort in case the hardware has gone bananas. 115 116 * Have testcases for the really tricky corner cases in your error recovery code 117 - it's way too easy to create a deadlock between your hangcheck code and 118 waiters. 119 120 121Time, Waiting and Missing it 122---------------------------- 123 124GPUs do most everything asynchronously, so we have a need to time operations and 125wait for oustanding ones. This is really tricky business; at the moment none of 126the ioctls supported by the drm/i915 get this fully right, which means there's 127still tons more lessons to learn here. 128 129 * Use CLOCK_MONOTONIC as your reference time, always. It's what alsa, drm and 130 v4l use by default nowadays. But let userspace know which timestamps are 131 derived from different clock domains like your main system clock (provided 132 by the kernel) or some independent hardware counter somewhere else. Clocks 133 will mismatch if you look close enough, but if performance measuring tools 134 have this information they can at least compensate. If your userspace can 135 get at the raw values of some clocks (e.g. through in-command-stream 136 performance counter sampling instructions) consider exposing those also. 137 138 * Use __s64 seconds plus __u64 nanoseconds to specify time. It's not the most 139 convenient time specification, but it's mostly the standard. 140 141 * Check that input time values are normalized and reject them if not. Note 142 that the kernel native struct ktime has a signed integer for both seconds 143 and nanoseconds, so beware here. 144 145 * For timeouts, use absolute times. If you're a good fellow and made your 146 ioctl restartable relative timeouts tend to be too coarse and can 147 indefinitely extend your wait time due to rounding on each restart. 148 Especially if your reference clock is something really slow like the display 149 frame counter. With a spec laywer hat on this isn't a bug since timeouts can 150 always be extended - but users will surely hate you if their neat animations 151 starts to stutter due to this. 152 153 * Consider ditching any synchronous wait ioctls with timeouts and just deliver 154 an asynchronous event on a pollable file descriptor. It fits much better 155 into event driven applications' main loop. 156 157 * Have testcases for corner-cases, especially whether the return values for 158 already-completed events, successful waits and timed-out waits are all sane 159 and suiting to your needs. 160 161 162Leaking Resources, Not 163---------------------- 164 165A full-blown drm driver essentially implements a little OS, but specialized to 166the given GPU platforms. This means a driver needs to expose tons of handles 167for different objects and other resources to userspace. Doing that right 168entails its own little set of pitfalls: 169 170 * Always attach the lifetime of your dynamically created resources to the 171 lifetime of a file descriptor. Consider using a 1:1 mapping if your resource 172 needs to be shared across processes - fd-passing over unix domain sockets 173 also simplifies lifetime management for userspace. 174 175 * Always have O_CLOEXEC support. 176 177 * Ensure that you have sufficient insulation between different clients. By 178 default pick a private per-fd namespace which forces any sharing to be done 179 explictly. Only go with a more global per-device namespace if the objects 180 are truly device-unique. One counterexample in the drm modeset interfaces is 181 that the per-device modeset objects like connectors share a namespace with 182 framebuffer objects, which mostly are not shared at all. A separate 183 namespace, private by default, for framebuffers would have been more 184 suitable. 185 186 * Think about uniqueness requirements for userspace handles. E.g. for most drm 187 drivers it's a userspace bug to submit the same object twice in the same 188 command submission ioctl. But then if objects are shareable userspace needs 189 to know whether it has seen an imported object from a different process 190 already or not. I haven't tried this myself yet due to lack of a new class 191 of objects, but consider using inode numbers on your shared file descriptors 192 as unique identifiers - it's how real files are told apart, too. 193 Unfortunately this requires a full-blown virtual filesystem in the kernel. 194 195 196Last, but not Least 197------------------- 198 199Not every problem needs a new ioctl: 200 201 * Think hard whether you really want a driver-private interface. Of course 202 it's much quicker to push a driver-private interface than engaging in 203 lengthy discussions for a more generic solution. And occasionally doing a 204 private interface to spearhead a new concept is what's required. But in the 205 end, once the generic interface comes around you'll end up maintainer two 206 interfaces. Indefinitely. 207 208 * Consider other interfaces than ioctls. A sysfs attribute is much better for 209 per-device settings, or for child objects with fairly static lifetimes (like 210 output connectors in drm with all the detection override attributes). Or 211 maybe only your testsuite needs this interface, and then debugfs with its 212 disclaimer of not having a stable ABI would be better. 213 214Finally, the name of the game is to get it right on the first attempt, since if 215your driver proves popular and your hardware platforms long-lived then you'll 216be stuck with a given ioctl essentially forever. You can try to deprecate 217horrible ioctls on newer iterations of your hardware, but generally it takes 218years to accomplish this. And then again years until the last user able to 219complain about regressions disappears, too. 220