KVM/ARM VGIC Forwarded Physical Interrupts
==========================================

The KVM/ARM code implements software support for the ARM Generic
Interrupt Controller's (GIC's) hardware support for virtualization by
allowing software to inject virtual interrupts to a VM, which the guest
OS sees as regular interrupts. The code is famously known as the VGIC.

Some of these virtual interrupts, however, correspond to physical
interrupts from real physical devices. One example could be the
architected timer, which itself supports virtualization, and therefore
lets a guest OS program the hardware device directly to raise an
interrupt at some point in time. When such an interrupt is raised, the
host OS initially handles the interrupt and must somehow signal this
event as a virtual interrupt to the guest. Another example could be a
passthrough device, where the physical interrupts are initially handled
by the host, but the device driver for the device lives in the guest OS
and KVM must therefore somehow inject a virtual interrupt on behalf of
the physical one to the guest OS.

These virtual interrupts corresponding to a physical interrupt on the
host are called forwarded physical interrupts, but are also sometimes
referred to as 'virtualized physical interrupts' and 'mapped interrupts'.

Forwarded physical interrupts are handled slightly differently compared
to virtual interrupts generated purely by a software emulated device.


The HW bit
----------
Virtual interrupts are signalled to the guest by programming the List
Registers (LRs) on the GIC before running a VCPU. The LR is programmed
with the virtual IRQ number and the state of the interrupt (Pending,
Active, or Pending+Active). When the guest ACKs and EOIs a virtual
interrupt, the LR state moves from Pending to Active, and finally to
inactive.

The LRs include an extra bit, called the HW bit.
When this bit is set, KVM must also program an additional field in the
LR, the physical IRQ number, to link the virtual with the physical IRQ.

When the HW bit is set, KVM must EITHER set the Pending OR the Active
bit, never both at the same time.

Setting the HW bit causes the hardware to deactivate the physical
interrupt on the physical distributor when the guest deactivates the
corresponding virtual interrupt.


Forwarded Physical Interrupts Life Cycle
----------------------------------------

The state of forwarded physical interrupts is managed in the following way:

  - The physical interrupt is acked by the host, and becomes active on
    the physical distributor (*).
  - KVM sets the LR.Pending bit, because this is the only way the GICV
    interface is going to present it to the guest.
  - LR.Pending will stay set as long as the guest has not acked the interrupt.
  - LR.Pending transitions to LR.Active on the guest read of the IAR, as
    expected.
  - On guest EOI, the *physical distributor* active bit gets cleared,
    but the LR.Active is left untouched (set).
  - KVM clears the LR on VM exits when the physical distributor
    active state has been cleared.

(*): The host handling is slightly more complicated. For some forwarded
interrupts (shared), KVM directly sets the active state on the physical
distributor before entering the guest, because the interrupt is never actually
handled on the host (see details on the timer as an example below). For other
forwarded interrupts (non-shared) the host does not deactivate the interrupt
when the host ISR completes, but leaves the interrupt active until the guest
deactivates it. Leaving the interrupt active is allowed, because Linux
configures the physical GIC with EOIMode=1, which causes EOI operations to
perform a priority drop allowing the GIC to receive other interrupts of the
default priority.
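
The HW bit rules and the life cycle above can be sketched in C as a toy
model. This is only an illustration under stated assumptions, not the KVM
implementation: the bit positions assume the GICv2 list register (GICH_LRn)
layout, and every macro and function name below is hypothetical. The
physical distributor's active state is reduced to a single boolean.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed GICv2 GICH_LRn field positions; the names are illustrative. */
#define LR_VIRT_ID_MASK   0x3ffu                      /* virtual IRQ number */
#define LR_PHYS_ID_SHIFT  10
#define LR_PHYS_ID_MASK   (0x3ffu << LR_PHYS_ID_SHIFT) /* physical IRQ number */
#define LR_PENDING        (1u << 28)
#define LR_ACTIVE         (1u << 29)
#define LR_HW             (1u << 31)

/* Hypothetical helper: build an LR value for a forwarded physical interrupt. */
static uint32_t lr_make_forwarded(uint32_t virq, uint32_t pirq, uint32_t state)
{
    /* With the HW bit set, exactly one of Pending or Active may be set. */
    assert(state == LR_PENDING || state == LR_ACTIVE);
    return (virq & LR_VIRT_ID_MASK) |
           ((pirq << LR_PHYS_ID_SHIFT) & LR_PHYS_ID_MASK) |
           LR_HW | state;
}

/* Toy model of the guest reading the IAR: Pending becomes Active. */
static uint32_t lr_guest_ack(uint32_t lr)
{
    if (lr & LR_PENDING)
        lr = (lr & ~LR_PENDING) | LR_ACTIVE;
    return lr;
}

/*
 * Toy model of the VM-exit step: the LR is cleared once the physical
 * distributor no longer reports the interrupt as active (the guest EOI
 * cleared it). Otherwise the LR is kept as it is.
 */
static uint32_t lr_sync_on_exit(uint32_t lr, bool phys_active)
{
    if ((lr & LR_HW) && !phys_active)
        return 0;
    return lr;
}
```

In this model, an LR built with lr_make_forwarded(virq, pirq, LR_PENDING)
and passed through lr_guest_ack() stays Active across VM exits for as long
as phys_active is true, and is cleared by lr_sync_on_exit() on the first
exit after the guest EOI has cleared the physical active state.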


Forwarded Edge and Level Triggered PPIs and SPIs
------------------------------------------------
Forwarded physical interrupts should always be active on the physical
distributor when injected to a guest.

Level-triggered interrupts will keep the interrupt line to the GIC
asserted, typically until the guest programs the device to deassert the
line. This means that the interrupt will remain pending on the physical
distributor until the guest has reprogrammed the device. Since we
always run the VM with interrupts enabled on the CPU, a pending
interrupt will exit the guest as soon as we switch into the guest,
preventing the guest from ever making progress as the process repeats
over and over. Therefore, the active state on the physical distributor
must be set when entering the guest, preventing the GIC from forwarding
the pending interrupt to the CPU. As soon as the guest deactivates the
interrupt, the physical line is sampled by the hardware again and the host
takes a new interrupt if and only if the physical line is still asserted.

Edge-triggered interrupts do not exhibit the same problem with
preventing guest execution that level-triggered interrupts do. One
option is to not use the HW bit at all, and inject edge-triggered
interrupts from a physical device as pure virtual interrupts. But that
would potentially slow down handling of the interrupt in the guest,
because a physical interrupt occurring in the middle of the guest ISR
would preempt the guest for the host to handle the interrupt. Additionally,
if you configure the system to handle interrupts on a separate physical
core from that running your VCPU, you still have to interrupt the VCPU
to queue the pending state onto the LR, even though the guest won't use
this information until the guest ISR completes. Therefore, the HW
bit should always be set for forwarded edge-triggered interrupts.
With the HW bit set, the virtual interrupt is injected and additional
physical interrupts occurring before the guest deactivates the interrupt
simply mark the state on the physical distributor as Pending+Active. As
soon as the guest deactivates the interrupt, the host takes another
interrupt if and only if there was a physical interrupt between injecting
the forwarded interrupt to the guest and the guest deactivating the
interrupt.

Consequently, whenever we schedule a VCPU with one or more LRs with the
HW bit set, the interrupt must also be active on the physical
distributor.


Forwarded LPIs
--------------
LPIs, introduced in GICv3, are always edge-triggered and do not have an
active state. They become pending when a device signals them, and as
soon as they are acked by the CPU, they are inactive again.

It therefore doesn't make sense, and is not supported, to set the HW bit
for physical LPIs that are forwarded to a VM as virtual interrupts,
typically virtual SPIs.

For LPIs, there is no other choice than to preempt the VCPU thread if
necessary, and queue the pending state onto the LR.


Putting It Together: The Architected Timer
------------------------------------------
The architected timer is a device that signals interrupts with level
triggered semantics. The timer hardware is directly accessed by VCPUs
which program the timer to fire at some point in time. Each VCPU on a
system programs the timer to fire at different times, and therefore the
hardware is multiplexed between multiple VCPUs. This is implemented by
context-switching the timer state along with each VCPU thread.

However, this means that a scenario like the following is entirely
possible, and in fact, typical:

1.  KVM runs the VCPU
2.  The guest programs the timer to fire at T+100
3.  The guest is idle and calls WFI (wait-for-interrupts)
4.  The hardware traps to the host
5.  KVM stores the timer state to memory and disables the hardware timer
6.  KVM schedules a soft timer to fire in T+(100 - time since step 2)
7.  KVM puts the VCPU thread to sleep (on a waitqueue)
8.  The soft timer fires, waking up the VCPU thread
9.  KVM reprograms the timer hardware with the VCPU's values
10. KVM marks the timer interrupt as active on the physical distributor
11. KVM injects a forwarded physical interrupt to the guest
12. KVM runs the VCPU

Notice that KVM injects a forwarded physical interrupt in step 11 without
the corresponding interrupt having actually fired on the host. That is
exactly why we mark the timer interrupt as active in step 10, because
the active state on the physical distributor is part of the state
belonging to the timer hardware, which is context-switched along with
the VCPU thread.

If the guest does not idle because it is busy, the flow looks like this
instead:

1.  KVM runs the VCPU
2.  The guest programs the timer to fire at T+100
3.  At T+100 the timer fires and a physical IRQ causes the VM to exit
    (note that this initially only traps to EL2 and does not run the host ISR
    until KVM has returned to the host).
4.  With interrupts still disabled on the CPU coming back from the guest, KVM
    stores the virtual timer state to memory and disables the virtual hw timer.
5.  KVM looks at the timer state (in memory) and injects a forwarded physical
    interrupt because it concludes the timer has expired.
6.  KVM marks the timer interrupt as active on the physical distributor
7.  KVM enables the timer, enables interrupts, and runs the VCPU

Notice that again the forwarded physical interrupt is injected to the
guest without having actually been handled on the host.
In this case, the physical interrupt is never actually seen by the host
because the timer is disabled upon guest return, and the virtual
forwarded interrupt is injected on the KVM guest entry path.
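
The two timer flows share one decision on the guest entry path: look at the
timer state saved in memory and, if the timer has expired, inject the
forwarded interrupt and mark it active on the physical distributor. A
minimal C sketch of that decision follows, together with the remaining
delay used for the soft timer in the idle flow. All type, field, and
function names are hypothetical, time is an abstract tick count, and none
of this is taken from the KVM source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical snapshot of a VCPU's timer state as saved to memory. */
struct vtimer_state {
    bool     enabled;   /* the guest has enabled the timer */
    uint64_t deadline;  /* absolute time at which the timer fires */
};

/* Hypothetical result of the guest entry decision. */
struct entry_plan {
    bool inject_forwarded_irq;  /* queue an LR with the HW bit set */
    bool set_phys_active;       /* mark active on the physical distributor */
};

/* The saved timer has expired if it is enabled and its deadline has passed. */
static bool vtimer_expired(const struct vtimer_state *t, uint64_t now)
{
    return t->enabled && now >= t->deadline;
}

/* Idle flow: ticks left until the deadline, 0 if it has already passed. */
static uint64_t soft_timer_delay(const struct vtimer_state *t, uint64_t now)
{
    return now >= t->deadline ? 0 : t->deadline - now;
}

/* Entry path: decide what to do before running the VCPU. */
static struct entry_plan plan_guest_entry(const struct vtimer_state *t,
                                          uint64_t now)
{
    struct entry_plan p = { false, false };

    if (vtimer_expired(t, now)) {
        p.inject_forwarded_irq = true;
        p.set_phys_active = true;
    }
    /* An LR with the HW bit set implies active on the distributor. */
    assert(!p.inject_forwarded_irq || p.set_phys_active);
    return p;
}
```

With a deadline of 100 ticks and the guest trapping at tick 30, the soft
timer delay is 70 ticks. Entering the guest before tick 100 injects
nothing, while entering at or after tick 100 both injects the forwarded
interrupt and sets the physical active state.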