1Kernel mode NEON 2================ 3 4TL;DR summary 5------------- 6* Use only NEON instructions, or VFP instructions that don't rely on support 7 code 8* Isolate your NEON code in a separate compilation unit, and compile it with 9 '-mfpu=neon -mfloat-abi=softfp' 10* Put kernel_neon_begin() and kernel_neon_end() calls around the calls into your 11 NEON code 12* Don't sleep in your NEON code, and be aware that it will be executed with 13 preemption disabled 14 15 16Introduction 17------------ 18It is possible to use NEON instructions (and in some cases, VFP instructions) in 19code that runs in kernel mode. However, for performance reasons, the NEON/VFP 20register file is not preserved and restored at every context switch or taken 21exception like the normal register file is, so some manual intervention is 22required. Furthermore, special care is required for code that may sleep [i.e., 23may call schedule()], as NEON or VFP instructions will be executed in a 24non-preemptible section for reasons outlined below. 25 26 27Lazy preserve and restore 28------------------------- 29The NEON/VFP register file is managed using lazy preserve (on UP systems) and 30lazy restore (on both SMP and UP systems). This means that the register file is 31kept 'live', and is only preserved and restored when multiple tasks are 32contending for the NEON/VFP unit (or, in the SMP case, when a task migrates to 33another core). Lazy restore is implemented by disabling the NEON/VFP unit after 34every context switch, resulting in a trap when subsequently a NEON/VFP 35instruction is issued, allowing the kernel to step in and perform the restore if 36necessary. 37 38Any use of the NEON/VFP unit in kernel mode should not interfere with this, so 39it is required to do an 'eager' preserve of the NEON/VFP register file, and 40enable the NEON/VFP unit explicitly so no exceptions are generated on first 41subsequent use. This is handled by the function kernel_neon_begin(), which 42should be called before any kernel mode NEON or VFP instructions are issued. 43Likewise, the NEON/VFP unit should be disabled again after use to make sure user 44mode will hit the lazy restore trap upon next use. This is handled by the 45function kernel_neon_end(). 46 47 48Interruptions in kernel mode 49---------------------------- 50For reasons of performance and simplicity, it was decided that there shall be no 51preserve/restore mechanism for the kernel mode NEON/VFP register contents. This 52implies that interruptions of a kernel mode NEON section can only be allowed if 53they are guaranteed not to touch the NEON/VFP registers. For this reason, the 54following rules and restrictions apply in the kernel: 55* NEON/VFP code is not allowed in interrupt context; 56* NEON/VFP code is not allowed to sleep; 57* NEON/VFP code is executed with preemption disabled. 58 59If latency is a concern, it is possible to put back to back calls to 60kernel_neon_end() and kernel_neon_begin() in places in your code where none of 61the NEON registers are live. (Additional calls to kernel_neon_begin() should be 62reasonably cheap if no context switch occurred in the meantime) 63 64 65VFP and support code 66-------------------- 67Earlier versions of VFP (prior to version 3) rely on software support for things 68like IEEE-754 compliant underflow handling etc. When the VFP unit needs such 69software assistance, it signals the kernel by raising an undefined instruction 70exception. The kernel responds by inspecting the VFP control registers and the 71current instruction and arguments, and emulates the instruction in software. 72 73Such software assistance is currently not implemented for VFP instructions 74executed in kernel mode. If such a condition is encountered, the kernel will 75fail and generate an OOPS. 76 77 78Separating NEON code from ordinary code 79--------------------------------------- 80The compiler is not aware of the special significance of kernel_neon_begin() and 81kernel_neon_end(), i.e., that it is only allowed to issue NEON/VFP instructions 82between calls to these respective functions. Furthermore, GCC may generate NEON 83instructions of its own at -O3 level if -mfpu=neon is selected, and even if the 84kernel is currently compiled at -O2, future changes may result in NEON/VFP 85instructions appearing in unexpected places if no special care is taken. 86 87Therefore, the recommended and only supported way of using NEON/VFP in the 88kernel is by adhering to the following rules: 89* isolate the NEON code in a separate compilation unit and compile it with 90 '-mfpu=neon -mfloat-abi=softfp'; 91* issue the calls to kernel_neon_begin(), kernel_neon_end() as well as the calls 92 into the unit containing the NEON code from a compilation unit which is *not* 93 built with the GCC flag '-mfpu=neon' set. 94 95As the kernel is compiled with '-msoft-float', the above will guarantee that 96both NEON and VFP instructions will only ever appear in designated compilation 97units at any optimization level. 98 99 100NEON assembler 101-------------- 102NEON assembler is supported with no additional caveats as long as the rules 103above are followed. 104 105 106NEON code generated by GCC 107-------------------------- 108The GCC option -ftree-vectorize (implied by -O3) tries to exploit implicit 109parallelism, and generates NEON code from ordinary C source code. This is fully 110supported as long as the rules above are followed. 111 112 113NEON intrinsics 114--------------- 115NEON intrinsics are also supported. However, as code using NEON intrinsics 116relies on the GCC header <arm_neon.h>, (which #includes <stdint.h>), you should 117observe the following in addition to the rules above: 118* Compile the unit containing the NEON intrinsics with '-ffreestanding' so GCC 119 uses its builtin version of <stdint.h> (this is a C99 header which the kernel 120 does not supply); 121* Include <arm_neon.h> last, or at least after <linux/types.h> 122