The Definitive KVM (Kernel-based Virtual Machine) API Documentation
===================================================================

1. General description
----------------------
6
The kvm API is a set of ioctls that are issued to control various aspects
of a virtual machine.  The ioctls belong to three classes:

 - System ioctls: These query and set global attributes which affect the
   whole kvm subsystem.  In addition a system ioctl is used to create
   virtual machines.
13
14 - VM ioctls: These query and set attributes that affect an entire virtual
15   machine, for example memory layout.  In addition a VM ioctl is used to
16   create virtual cpus (vcpus).
17
18   Only run VM ioctls from the same process (address space) that was used
19   to create the VM.
20
21 - vcpu ioctls: These query and set attributes that control the operation
22   of a single virtual cpu.
23
24   Only run vcpu ioctls from the same thread that was used to create the
25   vcpu.
26
27
2. File descriptors
-------------------
30
31The kvm API is centered around file descriptors.  An initial
32open("/dev/kvm") obtains a handle to the kvm subsystem; this handle
33can be used to issue system ioctls.  A KVM_CREATE_VM ioctl on this
34handle will create a VM file descriptor which can be used to issue VM
35ioctls.  A KVM_CREATE_VCPU ioctl on a VM fd will create a virtual cpu
36and return a file descriptor pointing to it.  Finally, ioctls on a vcpu
37fd can be used to control the vcpu, including the important task of
38actually running guest code.
39
In general file descriptors can be migrated among processes by means
of fork() and the SCM_RIGHTS facility of unix domain sockets.  These
kinds of tricks are explicitly not supported by kvm.  While they will
not cause harm to the host, their actual behavior is not guaranteed by
the API.  The only supported use is one virtual machine per process,
and one vcpu per thread.
46
47
3. Extensions
-------------

As of Linux 2.6.22, the KVM ABI has been stabilized: no backward
incompatible changes are allowed.  However, there is an extension
facility that allows backward-compatible extensions to the API to be
queried and used.
55
56The extension mechanism is not based on the Linux version number.
57Instead, kvm defines extension identifiers and a facility to query
58whether a particular extension identifier is available.  If it is, a
59set of ioctls is available for application use.
60
61
4. API description
------------------

This section describes ioctls that can be used to control kvm guests.
For each ioctl, the following information is provided along with a
description:

  Capability: which KVM extension provides this ioctl.  Can be 'basic',
      which means that it will be provided by any kernel that supports
      API version 12 (see section 4.1), a KVM_CAP_xyz constant, which
      means availability needs to be checked with KVM_CHECK_EXTENSION
      (see section 4.4), or 'none' which means that while not all kernels
      support this ioctl, there's no capability bit to check its
      availability: for kernels that don't support the ioctl,
      the ioctl returns -ENOTTY.
77
78  Architectures: which instruction set architectures provide this ioctl.
79      x86 includes both i386 and x86_64.
80
81  Type: system, vm, or vcpu.
82
83  Parameters: what parameters are accepted by the ioctl.
84
85  Returns: the return value.  General error numbers (EBADF, ENOMEM, EINVAL)
86      are not detailed, but errors with specific meanings are.
87
88
894.1 KVM_GET_API_VERSION
90
91Capability: basic
92Architectures: all
93Type: system ioctl
94Parameters: none
95Returns: the constant KVM_API_VERSION (=12)
96
97This identifies the API version as the stable kvm API. It is not
98expected that this number will change.  However, Linux 2.6.20 and
992.6.21 report earlier versions; these are not documented and not
100supported.  Applications should refuse to run if KVM_GET_API_VERSION
101returns a value other than 12.  If this check passes, all ioctls
102described as 'basic' will be available.
103
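The version check described above might look like the following in C.
This is a minimal sketch, not a reference implementation: the helper
name kvm_open_checked() and its path argument are illustrative
assumptions; only open(), ioctl(), KVM_GET_API_VERSION and
KVM_API_VERSION come from the system and kernel headers.

```c
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/kvm.h>

/*
 * Open a kvm device node (normally "/dev/kvm") and verify that it
 * reports the stable API version.  Returns a system fd on success and
 * -1 on any failure.  Hypothetical helper, not part of the KVM API.
 */
int kvm_open_checked(const char *path)
{
	int fd = open(path, O_RDWR);

	if (fd < 0)
		return -1;

	/* Refuse to continue on anything other than API version 12. */
	if (ioctl(fd, KVM_GET_API_VERSION, 0) != KVM_API_VERSION) {
		close(fd);
		return -1;
	}
	return fd;
}
```

A caller would typically use kvm_open_checked("/dev/kvm") once at
startup and issue all further system ioctls on the returned fd.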
104
1054.2 KVM_CREATE_VM
106
107Capability: basic
108Architectures: all
109Type: system ioctl
110Parameters: machine type identifier (KVM_VM_*)
111Returns: a VM fd that can be used to control the new virtual machine.
112
113The new VM has no virtual cpus and no memory.  An mmap() of a VM fd
114will access the virtual machine's physical address space; offset zero
115corresponds to guest physical address zero.  Use of mmap() on a VM fd
116is discouraged if userspace memory allocation (KVM_CAP_USER_MEMORY) is
117available.
118You most certainly want to use 0 as machine type.
119
120In order to create user controlled virtual machines on S390, check
121KVM_CAP_S390_UCONTROL and use the flag KVM_VM_S390_UCONTROL as
122privileged user (CAP_SYS_ADMIN).
123
124
4.3 KVM_GET_MSR_INDEX_LIST

Capability: basic
Architectures: x86
Type: system ioctl
Parameters: struct kvm_msr_list (in/out)
Returns: 0 on success; -1 on error
Errors:
  E2BIG:     the msr index list is too big to fit in the array specified by
             the user.

struct kvm_msr_list {
	__u32 nmsrs; /* number of msrs in entries */
	__u32 indices[0];
};
140
141This ioctl returns the guest msrs that are supported.  The list varies
142by kvm version and host processor, but does not change otherwise.  The
143user fills in the size of the indices array in nmsrs, and in return
144kvm adjusts nmsrs to reflect the actual number of msrs and fills in
145the indices array with their numbers.
146
Note: if kvm indicates support for MCE (KVM_CAP_MCE), then the MCE bank MSRs
are not returned in the MSR list, as different vcpus can have a different
number of banks, as set via the KVM_X86_SETUP_MCE ioctl.
150
151
1524.4 KVM_CHECK_EXTENSION
153
154Capability: basic, KVM_CAP_CHECK_EXTENSION_VM for vm ioctl
155Architectures: all
156Type: system ioctl, vm ioctl
157Parameters: extension identifier (KVM_CAP_*)
158Returns: 0 if unsupported; 1 (or some other positive integer) if supported
159
160The API allows the application to query about extensions to the core
161kvm API.  Userspace passes an extension identifier (an integer) and
162receives an integer that describes the extension availability.
163Generally 0 means no and 1 means yes, but some extensions may report
164additional information in the integer return value.
165
Based on their initialization different VMs may have different capabilities.
It is thus encouraged to use the vm ioctl to query for capabilities (available
with KVM_CAP_CHECK_EXTENSION_VM on the vm fd).
169
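A capability query as described above can be wrapped in a few lines.
The sketch below is only an illustration: kvm_cap_value() is a
hypothetical helper name, and folding ioctl errors into "unsupported"
is a design assumption made here, not something the API mandates.

```c
#include <sys/ioctl.h>
#include <linux/kvm.h>

/*
 * Query an extension on a system fd (or on a vm fd when
 * KVM_CAP_CHECK_EXTENSION_VM is available).  Returns 0 when the
 * extension is unsupported or the ioctl fails, otherwise the positive
 * value reported by kvm.  Hypothetical helper.
 */
int kvm_cap_value(int fd, long cap)
{
	int ret = ioctl(fd, KVM_CHECK_EXTENSION, cap);

	return ret > 0 ? ret : 0;
}
```

Because some extensions report additional information in the return
value, callers should test for "non-zero" rather than "equal to 1".
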
1704.5 KVM_GET_VCPU_MMAP_SIZE
171
172Capability: basic
173Architectures: all
174Type: system ioctl
175Parameters: none
176Returns: size of vcpu mmap area, in bytes
177
The KVM_RUN ioctl (see section 4.10) communicates with userspace via a
shared memory region.  This ioctl returns the size of that region.  See
the KVM_RUN documentation for details.
181
182
1834.6 KVM_SET_MEMORY_REGION
184
185Capability: basic
186Architectures: all
187Type: vm ioctl
188Parameters: struct kvm_memory_region (in)
189Returns: 0 on success, -1 on error
190
191This ioctl is obsolete and has been removed.
192
193
1944.7 KVM_CREATE_VCPU
195
196Capability: basic
197Architectures: all
198Type: vm ioctl
199Parameters: vcpu id (apic id on x86)
200Returns: vcpu fd on success, -1 on error
201
202This API adds a vcpu to a virtual machine.  The vcpu id is a small integer
203in the range [0, max_vcpus).
204
205The recommended max_vcpus value can be retrieved using the KVM_CAP_NR_VCPUS of
206the KVM_CHECK_EXTENSION ioctl() at run-time.
207The maximum possible value for max_vcpus can be retrieved using the
208KVM_CAP_MAX_VCPUS of the KVM_CHECK_EXTENSION ioctl() at run-time.
209
If KVM_CAP_NR_VCPUS does not exist, you should assume that max_vcpus is
at most 4.
If KVM_CAP_MAX_VCPUS does not exist, you should assume that max_vcpus is
the same as the value returned from KVM_CAP_NR_VCPUS.
214
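The fallback rules above can be expressed as two small functions.  This
is a sketch under stated assumptions: the function names are
hypothetical, and each argument is expected to be the value returned by
KVM_CHECK_EXTENSION for the named capability, with 0 meaning that the
capability does not exist.

```c
#include <assert.h>

/* cap_nr: result of KVM_CHECK_EXTENSION(KVM_CAP_NR_VCPUS), 0 if absent. */
int recommended_vcpus(int cap_nr)
{
	assert(cap_nr >= 0);
	return cap_nr > 0 ? cap_nr : 4;
}

/* cap_max: result of KVM_CHECK_EXTENSION(KVM_CAP_MAX_VCPUS), 0 if absent. */
int maximum_vcpus(int cap_nr, int cap_max)
{
	assert(cap_max >= 0);
	return cap_max > 0 ? cap_max : recommended_vcpus(cap_nr);
}
```
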
215On powerpc using book3s_hv mode, the vcpus are mapped onto virtual
216threads in one or more virtual CPU cores.  (This is because the
217hardware requires all the hardware threads in a CPU core to be in the
218same partition.)  The KVM_CAP_PPC_SMT capability indicates the number
219of vcpus per virtual core (vcore).  The vcore id is obtained by
220dividing the vcpu id by the number of vcpus per vcore.  The vcpus in a
221given vcore will always be in the same physical core as each other
222(though that might be a different physical core from time to time).
223Userspace can control the threading (SMT) mode of the guest by its
224allocation of vcpu ids.  For example, if userspace wants
225single-threaded guest vcpus, it should make all vcpu ids be a multiple
226of the number of vcpus per vcore.
227
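The vcpu id arithmetic for book3s_hv guests can be sketched as follows.
The function names are hypothetical; threads_per_vcore stands for the
value reported by KVM_CAP_PPC_SMT.

```c
#include <assert.h>

/* The vcore id is the vcpu id divided by the number of vcpus per vcore. */
unsigned int vcore_id(unsigned int vcpu_id, unsigned int threads_per_vcore)
{
	assert(threads_per_vcore > 0);
	return vcpu_id / threads_per_vcore;
}

/*
 * For a single-threaded guest, the n-th vcpu gets an id that is a
 * multiple of the number of vcpus per vcore, so that every vcpu lands
 * in a vcore of its own.
 */
unsigned int single_threaded_vcpu_id(unsigned int n,
				     unsigned int threads_per_vcore)
{
	assert(threads_per_vcore > 0);
	return n * threads_per_vcore;
}
```
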
228For virtual cpus that have been created with S390 user controlled virtual
229machines, the resulting vcpu fd can be memory mapped at page offset
230KVM_S390_SIE_PAGE_OFFSET in order to obtain a memory map of the virtual
231cpu's hardware control block.
232
233
2344.8 KVM_GET_DIRTY_LOG (vm ioctl)
235
236Capability: basic
237Architectures: x86
238Type: vm ioctl
239Parameters: struct kvm_dirty_log (in/out)
240Returns: 0 on success, -1 on error
241
/* for KVM_GET_DIRTY_LOG */
struct kvm_dirty_log {
	__u32 slot;
	__u32 padding1;
	union {
		void __user *dirty_bitmap; /* one bit per page */
		__u64 padding2;
	};
};
251
252Given a memory slot, return a bitmap containing any pages dirtied
253since the last call to this ioctl.  Bit 0 is the first page in the
254memory slot.  Ensure the entire structure is cleared to avoid padding
255issues.
256
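Once the bitmap has been filled in, userspace has to walk it.  The
sketch below shows one way to do that; the helper names are
hypothetical, and it assumes that page n is stored in byte n / 8 at bit
position n % 8 (little-endian bit order), which the text above does not
spell out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if the given page of the slot is marked dirty, else 0. */
int page_is_dirty(const uint8_t *bitmap, size_t page)
{
	assert(bitmap != NULL);
	return (bitmap[page / 8] >> (page % 8)) & 1;
}

/* Counts the dirty pages among the first npages pages of the slot. */
size_t count_dirty_pages(const uint8_t *bitmap, size_t npages)
{
	size_t page, count = 0;

	for (page = 0; page < npages; page++)
		count += (size_t)page_is_dirty(bitmap, page);
	return count;
}
```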
257
2584.9 KVM_SET_MEMORY_ALIAS
259
260Capability: basic
261Architectures: x86
262Type: vm ioctl
263Parameters: struct kvm_memory_alias (in)
264Returns: 0 (success), -1 (error)
265
266This ioctl is obsolete and has been removed.
267
268
2694.10 KVM_RUN
270
271Capability: basic
272Architectures: all
273Type: vcpu ioctl
274Parameters: none
275Returns: 0 on success, -1 on error
276Errors:
277  EINTR:     an unmasked signal is pending
278
279This ioctl is used to run a guest virtual cpu.  While there are no
280explicit parameters, there is an implicit parameter block that can be
281obtained by mmap()ing the vcpu fd at offset 0, with the size given by
282KVM_GET_VCPU_MMAP_SIZE.  The parameter block is formatted as a 'struct
283kvm_run' (see below).
284
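Mapping the implicit parameter block could look like the sketch below.
The helper name map_kvm_run() is hypothetical; kvm_fd is assumed to be
the system fd and vcpu_fd a fd returned by KVM_CREATE_VCPU.

```c
#include <stddef.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <linux/kvm.h>

/*
 * Map the shared 'struct kvm_run' area of a vcpu.  Returns NULL on any
 * failure.  Hypothetical helper, not part of the KVM API.
 */
struct kvm_run *map_kvm_run(int kvm_fd, int vcpu_fd)
{
	void *area;
	int size = ioctl(kvm_fd, KVM_GET_VCPU_MMAP_SIZE, 0);

	if (size <= 0)
		return NULL;

	/* The block lives at offset 0 of the vcpu fd. */
	area = mmap(NULL, (size_t)size, PROT_READ | PROT_WRITE,
		    MAP_SHARED, vcpu_fd, 0);
	return area == MAP_FAILED ? NULL : (struct kvm_run *)area;
}
```

After a successful mapping, a typical run loop calls
ioctl(vcpu_fd, KVM_RUN, 0) and treats a failure with errno set to EINTR
as a request to retry.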
285
2864.11 KVM_GET_REGS
287
288Capability: basic
289Architectures: all except ARM, arm64
290Type: vcpu ioctl
291Parameters: struct kvm_regs (out)
292Returns: 0 on success, -1 on error
293
294Reads the general purpose registers from the vcpu.
295
296/* x86 */
297struct kvm_regs {
298	/* out (KVM_GET_REGS) / in (KVM_SET_REGS) */
299	__u64 rax, rbx, rcx, rdx;
300	__u64 rsi, rdi, rsp, rbp;
301	__u64 r8,  r9,  r10, r11;
302	__u64 r12, r13, r14, r15;
303	__u64 rip, rflags;
304};
305
306/* mips */
307struct kvm_regs {
308	/* out (KVM_GET_REGS) / in (KVM_SET_REGS) */
309	__u64 gpr[32];
310	__u64 hi;
311	__u64 lo;
312	__u64 pc;
313};
314
315
3164.12 KVM_SET_REGS
317
318Capability: basic
319Architectures: all except ARM, arm64
320Type: vcpu ioctl
321Parameters: struct kvm_regs (in)
322Returns: 0 on success, -1 on error
323
324Writes the general purpose registers into the vcpu.
325
326See KVM_GET_REGS for the data structure.
327
328
3294.13 KVM_GET_SREGS
330
331Capability: basic
332Architectures: x86, ppc
333Type: vcpu ioctl
334Parameters: struct kvm_sregs (out)
335Returns: 0 on success, -1 on error
336
337Reads special registers from the vcpu.
338
339/* x86 */
340struct kvm_sregs {
341	struct kvm_segment cs, ds, es, fs, gs, ss;
342	struct kvm_segment tr, ldt;
343	struct kvm_dtable gdt, idt;
344	__u64 cr0, cr2, cr3, cr4, cr8;
345	__u64 efer;
346	__u64 apic_base;
347	__u64 interrupt_bitmap[(KVM_NR_INTERRUPTS + 63) / 64];
348};
349
350/* ppc -- see arch/powerpc/include/uapi/asm/kvm.h */
351
352interrupt_bitmap is a bitmap of pending external interrupts.  At most
353one bit may be set.  This interrupt has been acknowledged by the APIC
354but not yet injected into the cpu core.
355
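Since at most one bit may be set, finding the pending vector is a simple
scan.  The sketch below uses a hypothetical helper name and assumes that
vector n is stored in 64-bit word n / 64 at bit position n % 64.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Returns the pending external interrupt vector recorded in an
 * interrupt_bitmap of nwords 64-bit words, or -1 if no bit is set.
 */
int pending_vector(const uint64_t *bitmap, size_t nwords)
{
	size_t word;
	int bit;

	assert(bitmap != NULL);
	for (word = 0; word < nwords; word++) {
		if (bitmap[word] == 0)
			continue;
		for (bit = 0; bit < 64; bit++)
			if (bitmap[word] & (UINT64_C(1) << bit))
				return (int)(word * 64) + bit;
	}
	return -1;
}
```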
356
3574.14 KVM_SET_SREGS
358
359Capability: basic
360Architectures: x86, ppc
361Type: vcpu ioctl
362Parameters: struct kvm_sregs (in)
363Returns: 0 on success, -1 on error
364
365Writes special registers into the vcpu.  See KVM_GET_SREGS for the
366data structures.
367
368
3694.15 KVM_TRANSLATE
370
371Capability: basic
372Architectures: x86
373Type: vcpu ioctl
374Parameters: struct kvm_translation (in/out)
375Returns: 0 on success, -1 on error
376
377Translates a virtual address according to the vcpu's current address
378translation mode.
379
380struct kvm_translation {
381	/* in */
382	__u64 linear_address;
383
384	/* out */
385	__u64 physical_address;
386	__u8  valid;
387	__u8  writeable;
388	__u8  usermode;
389	__u8  pad[5];
390};
391
392
3934.16 KVM_INTERRUPT
394
395Capability: basic
396Architectures: x86, ppc, mips
397Type: vcpu ioctl
398Parameters: struct kvm_interrupt (in)
399Returns: 0 on success, -1 on error
400
401Queues a hardware interrupt vector to be injected.  This is only
402useful if in-kernel local APIC or equivalent is not used.
403
404/* for KVM_INTERRUPT */
405struct kvm_interrupt {
406	/* in */
407	__u32 irq;
408};
409
410X86:
411
412Note 'irq' is an interrupt vector, not an interrupt pin or line.
413
414PPC:
415
Queues an external interrupt to be injected. This ioctl is overloaded
with 3 different irq values:
418
419a) KVM_INTERRUPT_SET
420
421  This injects an edge type external interrupt into the guest once it's ready
422  to receive interrupts. When injected, the interrupt is done.
423
424b) KVM_INTERRUPT_UNSET
425
426  This unsets any pending interrupt.
427
428  Only available with KVM_CAP_PPC_UNSET_IRQ.
429
430c) KVM_INTERRUPT_SET_LEVEL
431
432  This injects a level type external interrupt into the guest context. The
433  interrupt stays pending until a specific ioctl with KVM_INTERRUPT_UNSET
434  is triggered.
435
436  Only available with KVM_CAP_PPC_IRQ_LEVEL.
437
438Note that any value for 'irq' other than the ones stated above is invalid
439and incurs unexpected behavior.
440
441MIPS:
442
443Queues an external interrupt to be injected into the virtual CPU. A negative
444interrupt number dequeues the interrupt.
445
446
4.17 KVM_DEBUG_GUEST

Capability: basic
Architectures: none
Type: vcpu ioctl
Parameters: none
Returns: -1 on error

Support for this has been removed.  Use KVM_SET_GUEST_DEBUG instead.
456
457
4584.18 KVM_GET_MSRS
459
460Capability: basic
461Architectures: x86
462Type: vcpu ioctl
463Parameters: struct kvm_msrs (in/out)
464Returns: 0 on success, -1 on error
465
466Reads model-specific registers from the vcpu.  Supported msr indices can
467be obtained using KVM_GET_MSR_INDEX_LIST.
468
469struct kvm_msrs {
470	__u32 nmsrs; /* number of msrs in entries */
471	__u32 pad;
472
473	struct kvm_msr_entry entries[0];
474};
475
476struct kvm_msr_entry {
477	__u32 index;
478	__u32 reserved;
479	__u64 data;
480};
481
482Application code should set the 'nmsrs' member (which indicates the
483size of the entries array) and the 'index' member of each array entry.
484kvm will fill in the 'data' member.
485
486
4874.19 KVM_SET_MSRS
488
489Capability: basic
490Architectures: x86
491Type: vcpu ioctl
492Parameters: struct kvm_msrs (in)
493Returns: 0 on success, -1 on error
494
495Writes model-specific registers to the vcpu.  See KVM_GET_MSRS for the
496data structures.
497
498Application code should set the 'nmsrs' member (which indicates the
499size of the entries array), and the 'index' and 'data' members of each
500array entry.
501
502
5034.20 KVM_SET_CPUID
504
505Capability: basic
506Architectures: x86
507Type: vcpu ioctl
508Parameters: struct kvm_cpuid (in)
509Returns: 0 on success, -1 on error
510
511Defines the vcpu responses to the cpuid instruction.  Applications
512should use the KVM_SET_CPUID2 ioctl if available.
513
514
515struct kvm_cpuid_entry {
516	__u32 function;
517	__u32 eax;
518	__u32 ebx;
519	__u32 ecx;
520	__u32 edx;
521	__u32 padding;
522};
523
524/* for KVM_SET_CPUID */
525struct kvm_cpuid {
526	__u32 nent;
527	__u32 padding;
528	struct kvm_cpuid_entry entries[0];
529};
530
531
5324.21 KVM_SET_SIGNAL_MASK
533
534Capability: basic
535Architectures: all
536Type: vcpu ioctl
537Parameters: struct kvm_signal_mask (in)
538Returns: 0 on success, -1 on error
539
Defines which signals are blocked during execution of KVM_RUN.  This
signal mask temporarily overrides the thread's signal mask.  Any
unblocked signal received (except SIGKILL and SIGSTOP, which retain
their traditional behaviour) will cause KVM_RUN to return with -EINTR.
544
545Note the signal will only be delivered if not blocked by the original
546signal mask.
547
548/* for KVM_SET_SIGNAL_MASK */
549struct kvm_signal_mask {
550	__u32 len;
551	__u8  sigset[0];
552};
553
554
5554.22 KVM_GET_FPU
556
557Capability: basic
558Architectures: x86
559Type: vcpu ioctl
560Parameters: struct kvm_fpu (out)
561Returns: 0 on success, -1 on error
562
563Reads the floating point state from the vcpu.
564
565/* for KVM_GET_FPU and KVM_SET_FPU */
566struct kvm_fpu {
567	__u8  fpr[8][16];
568	__u16 fcw;
569	__u16 fsw;
570	__u8  ftwx;  /* in fxsave format */
571	__u8  pad1;
572	__u16 last_opcode;
573	__u64 last_ip;
574	__u64 last_dp;
575	__u8  xmm[16][16];
576	__u32 mxcsr;
577	__u32 pad2;
578};
579
580
5814.23 KVM_SET_FPU
582
583Capability: basic
584Architectures: x86
585Type: vcpu ioctl
586Parameters: struct kvm_fpu (in)
587Returns: 0 on success, -1 on error
588
589Writes the floating point state to the vcpu.
590
591/* for KVM_GET_FPU and KVM_SET_FPU */
592struct kvm_fpu {
593	__u8  fpr[8][16];
594	__u16 fcw;
595	__u16 fsw;
596	__u8  ftwx;  /* in fxsave format */
597	__u8  pad1;
598	__u16 last_opcode;
599	__u64 last_ip;
600	__u64 last_dp;
601	__u8  xmm[16][16];
602	__u32 mxcsr;
603	__u32 pad2;
604};
605
606
6074.24 KVM_CREATE_IRQCHIP
608
609Capability: KVM_CAP_IRQCHIP, KVM_CAP_S390_IRQCHIP (s390)
610Architectures: x86, ARM, arm64, s390
611Type: vm ioctl
612Parameters: none
613Returns: 0 on success, -1 on error
614
615Creates an interrupt controller model in the kernel.
616On x86, creates a virtual ioapic, a virtual PIC (two PICs, nested), and sets up
617future vcpus to have a local APIC.  IRQ routing for GSIs 0-15 is set to both
618PIC and IOAPIC; GSI 16-23 only go to the IOAPIC.
619On ARM/arm64, a GICv2 is created. Any other GIC versions require the usage of
620KVM_CREATE_DEVICE, which also supports creating a GICv2.  Using
621KVM_CREATE_DEVICE is preferred over KVM_CREATE_IRQCHIP for GICv2.
622On s390, a dummy irq routing table is created.
623
624Note that on s390 the KVM_CAP_S390_IRQCHIP vm capability needs to be enabled
625before KVM_CREATE_IRQCHIP can be used.
626
627
6284.25 KVM_IRQ_LINE
629
630Capability: KVM_CAP_IRQCHIP
631Architectures: x86, arm, arm64
632Type: vm ioctl
633Parameters: struct kvm_irq_level
634Returns: 0 on success, -1 on error
635
636Sets the level of a GSI input to the interrupt controller model in the kernel.
637On some architectures it is required that an interrupt controller model has
638been previously created with KVM_CREATE_IRQCHIP.  Note that edge-triggered
639interrupts require the level to be set to 1 and then back to 0.
640
641On real hardware, interrupt pins can be active-low or active-high.  This
642does not matter for the level field of struct kvm_irq_level: 1 always
643means active (asserted), 0 means inactive (deasserted).
644
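The "set to 1 and then back to 0" rule for edge-triggered interrupts
can be sketched as below.  Both helper names are hypothetical; vm_fd is
assumed to be a VM fd with an in-kernel interrupt controller where the
architecture requires one.

```c
#include <assert.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Sets one GSI to the given level (0 or 1).  Returns 0 or -1. */
int kvm_set_irq_level(int vm_fd, unsigned int gsi, unsigned int level)
{
	struct kvm_irq_level irq;

	assert(level <= 1);
	memset(&irq, 0, sizeof(irq));
	irq.irq = gsi;
	irq.level = level;
	return ioctl(vm_fd, KVM_IRQ_LINE, &irq);
}

/* Edge-triggered interrupt: assert the line, then deassert it again. */
int kvm_pulse_irq(int vm_fd, unsigned int gsi)
{
	if (kvm_set_irq_level(vm_fd, gsi, 1) < 0)
		return -1;
	return kvm_set_irq_level(vm_fd, gsi, 0);
}
```
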
645x86 allows the operating system to program the interrupt polarity
646(active-low/active-high) for level-triggered interrupts, and KVM used
647to consider the polarity.  However, due to bitrot in the handling of
648active-low interrupts, the above convention is now valid on x86 too.
649This is signaled by KVM_CAP_X86_IOAPIC_POLARITY_IGNORED.  Userspace
650should not present interrupts to the guest as active-low unless this
651capability is present (or unless it is not using the in-kernel irqchip,
652of course).
653
654
655ARM/arm64 can signal an interrupt either at the CPU level, or at the
656in-kernel irqchip (GIC), and for in-kernel irqchip can tell the GIC to
657use PPIs designated for specific cpus.  The irq field is interpreted
658like this:
659
660  bits:  | 31 ... 24 | 23  ... 16 | 15    ...    0 |
661  field: | irq_type  | vcpu_index |     irq_id     |
662
663The irq_type field has the following values:
664- irq_type[0]: out-of-kernel GIC: irq_id 0 is IRQ, irq_id 1 is FIQ
665- irq_type[1]: in-kernel GIC: SPI, irq_id between 32 and 1019 (incl.)
666               (the vcpu_index field is ignored)
667- irq_type[2]: in-kernel GIC: PPI, irq_id between 16 and 31 (incl.)
668
669(The irq_id field thus corresponds nicely to the IRQ ID in the ARM GIC specs)
670
671In both cases, level is used to assert/deassert the line.
672
struct kvm_irq_level {
	union {
		__u32 irq;     /* GSI */
		__s32 status;  /* not used for KVM_IRQ_LINE */
	};
	__u32 level;           /* 0 or 1 */
};
680
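On ARM/arm64 the irq field has to be packed by hand according to the
layout shown above.  A minimal sketch, with a hypothetical helper name:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Packs the ARM/arm64 irq field:
 *   bits 31..24 irq_type, bits 23..16 vcpu_index, bits 15..0 irq_id.
 * irq_type is 0 (out-of-kernel GIC), 1 (in-kernel SPI) or 2 (in-kernel PPI).
 */
uint32_t arm_irq_field(uint32_t irq_type, uint32_t vcpu_index,
		       uint32_t irq_id)
{
	assert(irq_type <= 2);
	assert(vcpu_index <= 0xff);
	assert(irq_id <= 0xffff);
	return (irq_type << 24) | (vcpu_index << 16) | irq_id;
}
```

The result goes into the irq member of struct kvm_irq_level; for
irq_type 1 (SPI) the vcpu_index part is ignored by kvm.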
681
6824.26 KVM_GET_IRQCHIP
683
684Capability: KVM_CAP_IRQCHIP
685Architectures: x86
686Type: vm ioctl
687Parameters: struct kvm_irqchip (in/out)
688Returns: 0 on success, -1 on error
689
690Reads the state of a kernel interrupt controller created with
691KVM_CREATE_IRQCHIP into a buffer provided by the caller.
692
693struct kvm_irqchip {
694	__u32 chip_id;  /* 0 = PIC1, 1 = PIC2, 2 = IOAPIC */
695	__u32 pad;
696        union {
697		char dummy[512];  /* reserving space */
698		struct kvm_pic_state pic;
699		struct kvm_ioapic_state ioapic;
700	} chip;
701};
702
703
7044.27 KVM_SET_IRQCHIP
705
706Capability: KVM_CAP_IRQCHIP
707Architectures: x86
708Type: vm ioctl
709Parameters: struct kvm_irqchip (in)
710Returns: 0 on success, -1 on error
711
712Sets the state of a kernel interrupt controller created with
713KVM_CREATE_IRQCHIP from a buffer provided by the caller.
714
715struct kvm_irqchip {
716	__u32 chip_id;  /* 0 = PIC1, 1 = PIC2, 2 = IOAPIC */
717	__u32 pad;
718        union {
719		char dummy[512];  /* reserving space */
720		struct kvm_pic_state pic;
721		struct kvm_ioapic_state ioapic;
722	} chip;
723};
724
725
7264.28 KVM_XEN_HVM_CONFIG
727
728Capability: KVM_CAP_XEN_HVM
729Architectures: x86
730Type: vm ioctl
731Parameters: struct kvm_xen_hvm_config (in)
732Returns: 0 on success, -1 on error
733
734Sets the MSR that the Xen HVM guest uses to initialize its hypercall
735page, and provides the starting address and size of the hypercall
736blobs in userspace.  When the guest writes the MSR, kvm copies one
737page of a blob (32- or 64-bit, depending on the vcpu mode) to guest
738memory.
739
740struct kvm_xen_hvm_config {
741	__u32 flags;
742	__u32 msr;
743	__u64 blob_addr_32;
744	__u64 blob_addr_64;
745	__u8 blob_size_32;
746	__u8 blob_size_64;
747	__u8 pad2[30];
748};
749
750
7514.29 KVM_GET_CLOCK
752
753Capability: KVM_CAP_ADJUST_CLOCK
754Architectures: x86
755Type: vm ioctl
756Parameters: struct kvm_clock_data (out)
757Returns: 0 on success, -1 on error
758
759Gets the current timestamp of kvmclock as seen by the current guest. In
760conjunction with KVM_SET_CLOCK, it is used to ensure monotonicity on scenarios
761such as migration.
762
763struct kvm_clock_data {
764	__u64 clock;  /* kvmclock current value */
765	__u32 flags;
766	__u32 pad[9];
767};
768
769
7704.30 KVM_SET_CLOCK
771
772Capability: KVM_CAP_ADJUST_CLOCK
773Architectures: x86
774Type: vm ioctl
775Parameters: struct kvm_clock_data (in)
776Returns: 0 on success, -1 on error
777
778Sets the current timestamp of kvmclock to the value specified in its parameter.
779In conjunction with KVM_GET_CLOCK, it is used to ensure monotonicity on scenarios
780such as migration.
781
782struct kvm_clock_data {
783	__u64 clock;  /* kvmclock current value */
784	__u32 flags;
785	__u32 pad[9];
786};
787
788
4.31 KVM_GET_VCPU_EVENTS

Capability: KVM_CAP_VCPU_EVENTS
Extended by: KVM_CAP_INTR_SHADOW
Architectures: x86
Type: vcpu ioctl
Parameters: struct kvm_vcpu_events (out)
Returns: 0 on success, -1 on error
797
798Gets currently pending exceptions, interrupts, and NMIs as well as related
799states of the vcpu.
800
801struct kvm_vcpu_events {
802	struct {
803		__u8 injected;
804		__u8 nr;
805		__u8 has_error_code;
806		__u8 pad;
807		__u32 error_code;
808	} exception;
809	struct {
810		__u8 injected;
811		__u8 nr;
812		__u8 soft;
813		__u8 shadow;
814	} interrupt;
815	struct {
816		__u8 injected;
817		__u8 pending;
818		__u8 masked;
819		__u8 pad;
820	} nmi;
821	__u32 sipi_vector;
822	__u32 flags;
823};
824
825KVM_VCPUEVENT_VALID_SHADOW may be set in the flags field to signal that
826interrupt.shadow contains a valid state. Otherwise, this field is undefined.
827
828
4.32 KVM_SET_VCPU_EVENTS

Capability: KVM_CAP_VCPU_EVENTS
Extended by: KVM_CAP_INTR_SHADOW
Architectures: x86
Type: vcpu ioctl
Parameters: struct kvm_vcpu_events (in)
Returns: 0 on success, -1 on error
837
838Set pending exceptions, interrupts, and NMIs as well as related states of the
839vcpu.
840
841See KVM_GET_VCPU_EVENTS for the data structure.
842
843Fields that may be modified asynchronously by running VCPUs can be excluded
844from the update. These fields are nmi.pending and sipi_vector. Keep the
845corresponding bits in the flags field cleared to suppress overwriting the
846current in-kernel state. The bits are:
847
848KVM_VCPUEVENT_VALID_NMI_PENDING - transfer nmi.pending to the kernel
849KVM_VCPUEVENT_VALID_SIPI_VECTOR - transfer sipi_vector
850
851If KVM_CAP_INTR_SHADOW is available, KVM_VCPUEVENT_VALID_SHADOW can be set in
852the flags field to signal that interrupt.shadow contains a valid state and
853shall be written into the VCPU.
854
855
4.33 KVM_GET_DEBUGREGS

Capability: KVM_CAP_DEBUGREGS
Architectures: x86
Type: vcpu ioctl
Parameters: struct kvm_debugregs (out)
Returns: 0 on success, -1 on error
863
864Reads debug registers from the vcpu.
865
866struct kvm_debugregs {
867	__u64 db[4];
868	__u64 dr6;
869	__u64 dr7;
870	__u64 flags;
871	__u64 reserved[9];
872};
873
874
4.34 KVM_SET_DEBUGREGS

Capability: KVM_CAP_DEBUGREGS
Architectures: x86
Type: vcpu ioctl
Parameters: struct kvm_debugregs (in)
Returns: 0 on success, -1 on error

Writes debug registers into the vcpu.

See KVM_GET_DEBUGREGS for the data structure. The flags field is not yet
used and must be cleared on entry.
887
888
4.35 KVM_SET_USER_MEMORY_REGION

Capability: KVM_CAP_USER_MEMORY
Architectures: all
Type: vm ioctl
Parameters: struct kvm_userspace_memory_region (in)
Returns: 0 on success, -1 on error
896
897struct kvm_userspace_memory_region {
898	__u32 slot;
899	__u32 flags;
900	__u64 guest_phys_addr;
901	__u64 memory_size; /* bytes */
902	__u64 userspace_addr; /* start of the userspace allocated memory */
903};
904
/* for kvm_userspace_memory_region::flags */
#define KVM_MEM_LOG_DIRTY_PAGES	(1UL << 0)
#define KVM_MEM_READONLY	(1UL << 1)
908
909This ioctl allows the user to create or modify a guest physical memory
910slot.  When changing an existing slot, it may be moved in the guest
911physical memory space, or its flags may be modified.  It may not be
912resized.  Slots may not overlap in guest physical address space.
913
914Memory for the region is taken starting at the address denoted by the
915field userspace_addr, which must point at user addressable memory for
916the entire memory slot size.  Any object may back this memory, including
917anonymous memory, ordinary files, and hugetlbfs.
918
919It is recommended that the lower 21 bits of guest_phys_addr and userspace_addr
920be identical.  This allows large pages in the guest to be backed by large
921pages in the host.
922
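The alignment recommendation above amounts to a simple bit test; the
lower 21 bits cover one 2 MiB large page.  A sketch, with a
hypothetical helper name:

```c
#include <stdint.h>

/*
 * Returns 1 if guest_phys_addr and userspace_addr agree in their lower
 * 21 bits, so that a 2 MiB large page in the guest can be backed by a
 * large page in the host, and 0 otherwise.
 */
int large_page_congruent(uint64_t guest_phys_addr, uint64_t userspace_addr)
{
	const uint64_t mask = (UINT64_C(1) << 21) - 1;

	return ((guest_phys_addr ^ userspace_addr) & mask) == 0;
}
```
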
923The flags field supports two flags: KVM_MEM_LOG_DIRTY_PAGES and
924KVM_MEM_READONLY.  The former can be set to instruct KVM to keep track of
925writes to memory within the slot.  See KVM_GET_DIRTY_LOG ioctl to know how to
926use it.  The latter can be set, if KVM_CAP_READONLY_MEM capability allows it,
927to make a new slot read-only.  In this case, writes to this memory will be
928posted to userspace as KVM_EXIT_MMIO exits.
929
When the KVM_CAP_SYNC_MMU capability is available, changes in the backing of
the memory region are automatically reflected into the guest.  For example, an
mmap() that affects the region will be made visible immediately.  Another
example is madvise(MADV_DONTNEED).
934
This API replaces the KVM_SET_MEMORY_REGION ioctl, which did not allow
fine grained control over memory allocation and has been removed (see
section 4.6).
938
939
9404.36 KVM_SET_TSS_ADDR
941
942Capability: KVM_CAP_SET_TSS_ADDR
943Architectures: x86
944Type: vm ioctl
945Parameters: unsigned long tss_address (in)
946Returns: 0 on success, -1 on error
947
948This ioctl defines the physical address of a three-page region in the guest
949physical address space.  The region must be within the first 4GB of the
950guest physical address space and must not conflict with any memory slot
951or any mmio address.  The guest may malfunction if it accesses this memory
952region.
953
954This ioctl is required on Intel-based hosts.  This is needed on Intel hardware
955because of a quirk in the virtualization implementation (see the internals
956documentation when it pops into existence).
957
958
9594.37 KVM_ENABLE_CAP
960
961Capability: KVM_CAP_ENABLE_CAP, KVM_CAP_ENABLE_CAP_VM
962Architectures: ppc, s390
963Type: vcpu ioctl, vm ioctl (with KVM_CAP_ENABLE_CAP_VM)
964Parameters: struct kvm_enable_cap (in)
965Returns: 0 on success; -1 on error
966
Not all extensions are enabled by default. Using this ioctl the application
can enable an extension, making it available to the guest.
969
970On systems that do not support this ioctl, it always fails. On systems that
971do support it, it only works for extensions that are supported for enablement.
972
973To check if a capability can be enabled, the KVM_CHECK_EXTENSION ioctl should
974be used.
975
976struct kvm_enable_cap {
977       /* in */
978       __u32 cap;
979
980The capability that is supposed to get enabled.
981
982       __u32 flags;
983
984A bitfield indicating future enhancements. Has to be 0 for now.
985
986       __u64 args[4];
987
988Arguments for enabling a feature. If a feature needs initial values to
989function properly, this is the place to put them.
990
991       __u8  pad[64];
992};
993
994The vcpu ioctl should be used for vcpu-specific capabilities, the vm ioctl
995for vm-wide capabilities.
996
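Filling in struct kvm_enable_cap for a capability that takes no
arguments could be sketched as follows.  The helper name is
hypothetical; fd is a vcpu fd, or a vm fd when KVM_CAP_ENABLE_CAP_VM is
available.

```c
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/*
 * Enables capability 'cap_id' with flags and all args left at zero.
 * Returns 0 on success and -1 on error, like the ioctl itself.
 */
int kvm_enable_cap_simple(int fd, uint32_t cap_id)
{
	struct kvm_enable_cap cap;

	/* flags has to be 0 for now; clearing also zeroes args and pad. */
	memset(&cap, 0, sizeof(cap));
	cap.cap = cap_id;
	return ioctl(fd, KVM_ENABLE_CAP, &cap);
}
```
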
9974.38 KVM_GET_MP_STATE
998
999Capability: KVM_CAP_MP_STATE
1000Architectures: x86, s390, arm, arm64
1001Type: vcpu ioctl
1002Parameters: struct kvm_mp_state (out)
1003Returns: 0 on success; -1 on error
1004
1005struct kvm_mp_state {
1006	__u32 mp_state;
1007};
1008
1009Returns the vcpu's current "multiprocessing state" (though also valid on
1010uniprocessor guests).
1011
1012Possible values are:
1013
1014 - KVM_MP_STATE_RUNNABLE:        the vcpu is currently running [x86,arm/arm64]
1015 - KVM_MP_STATE_UNINITIALIZED:   the vcpu is an application processor (AP)
1016                                 which has not yet received an INIT signal [x86]
1017 - KVM_MP_STATE_INIT_RECEIVED:   the vcpu has received an INIT signal, and is
1018                                 now ready for a SIPI [x86]
1019 - KVM_MP_STATE_HALTED:          the vcpu has executed a HLT instruction and
1020                                 is waiting for an interrupt [x86]
1021 - KVM_MP_STATE_SIPI_RECEIVED:   the vcpu has just received a SIPI (vector
1022                                 accessible via KVM_GET_VCPU_EVENTS) [x86]
1023 - KVM_MP_STATE_STOPPED:         the vcpu is stopped [s390,arm/arm64]
1024 - KVM_MP_STATE_CHECK_STOP:      the vcpu is in a special error state [s390]
1025 - KVM_MP_STATE_OPERATING:       the vcpu is operating (running or halted)
1026                                 [s390]
1027 - KVM_MP_STATE_LOAD:            the vcpu is in a special load/startup state
1028                                 [s390]
1029
On x86, this ioctl is only useful after KVM_CREATE_IRQCHIP. Without an
in-kernel irqchip, the multiprocessing state must be maintained by userspace
on this architecture.
1033
1034For arm/arm64:
1035
The only states that are valid are KVM_MP_STATE_STOPPED and
KVM_MP_STATE_RUNNABLE, which reflect whether the vcpu is paused or not.
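
The following sketch shows one way to read and interpret the state, assuming
<linux/kvm.h> from a kernel recent enough to define all of the values listed
above.  The helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Printable name for an mp_state value, or NULL if it is not listed above. */
const char *mp_state_name(__u32 mp_state)
{
	switch (mp_state) {
	case KVM_MP_STATE_RUNNABLE:      return "RUNNABLE";
	case KVM_MP_STATE_UNINITIALIZED: return "UNINITIALIZED";
	case KVM_MP_STATE_INIT_RECEIVED: return "INIT_RECEIVED";
	case KVM_MP_STATE_HALTED:        return "HALTED";
	case KVM_MP_STATE_SIPI_RECEIVED: return "SIPI_RECEIVED";
	case KVM_MP_STATE_STOPPED:       return "STOPPED";
	case KVM_MP_STATE_CHECK_STOP:    return "CHECK_STOP";
	case KVM_MP_STATE_OPERATING:     return "OPERATING";
	case KVM_MP_STATE_LOAD:          return "LOAD";
	default:                         return NULL;
	}
}

/* Read the vcpu's state into *out; 0 on success, -1 with errno set on error. */
int get_mp_state(int vcpu_fd, struct kvm_mp_state *out)
{
	assert(out != NULL);
	return ioctl(vcpu_fd, KVM_GET_MP_STATE, out);
}
```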
1038
10394.39 KVM_SET_MP_STATE
1040
1041Capability: KVM_CAP_MP_STATE
1042Architectures: x86, s390, arm, arm64
1043Type: vcpu ioctl
1044Parameters: struct kvm_mp_state (in)
1045Returns: 0 on success; -1 on error
1046
1047Sets the vcpu's current "multiprocessing state"; see KVM_GET_MP_STATE for
1048arguments.
1049
On x86, this ioctl is only useful after KVM_CREATE_IRQCHIP. Without an
in-kernel irqchip, the multiprocessing state must be maintained by userspace
on this architecture.
1053
1054For arm/arm64:
1055
The only states that are valid are KVM_MP_STATE_STOPPED and
KVM_MP_STATE_RUNNABLE, which reflect whether the vcpu should be paused or not.
1058
10594.40 KVM_SET_IDENTITY_MAP_ADDR
1060
1061Capability: KVM_CAP_SET_IDENTITY_MAP_ADDR
1062Architectures: x86
1063Type: vm ioctl
1064Parameters: unsigned long identity (in)
1065Returns: 0 on success, -1 on error
1066
1067This ioctl defines the physical address of a one-page region in the guest
1068physical address space.  The region must be within the first 4GB of the
1069guest physical address space and must not conflict with any memory slot
1070or any mmio address.  The guest may malfunction if it accesses this memory
1071region.
1072
1073This ioctl is required on Intel-based hosts.  This is needed on Intel hardware
1074because of a quirk in the virtualization implementation (see the internals
1075documentation when it pops into existence).
1076
1077
10784.41 KVM_SET_BOOT_CPU_ID
1079
1080Capability: KVM_CAP_SET_BOOT_CPU_ID
1081Architectures: x86
1082Type: vm ioctl
1083Parameters: unsigned long vcpu_id
1084Returns: 0 on success, -1 on error
1085
1086Define which vcpu is the Bootstrap Processor (BSP).  Values are the same
1087as the vcpu id in KVM_CREATE_VCPU.  If this ioctl is not called, the default
1088is vcpu 0.
1089
1090
10914.42 KVM_GET_XSAVE
1092
1093Capability: KVM_CAP_XSAVE
1094Architectures: x86
1095Type: vcpu ioctl
1096Parameters: struct kvm_xsave (out)
1097Returns: 0 on success, -1 on error
1098
1099struct kvm_xsave {
1100	__u32 region[1024];
1101};
1102
This ioctl copies the current vcpu's xsave struct to userspace.
1104
1105
11064.43 KVM_SET_XSAVE
1107
1108Capability: KVM_CAP_XSAVE
1109Architectures: x86
1110Type: vcpu ioctl
1111Parameters: struct kvm_xsave (in)
1112Returns: 0 on success, -1 on error
1113
1114struct kvm_xsave {
1115	__u32 region[1024];
1116};
1117
This ioctl copies userspace's xsave struct to the kernel.
1119
1120
11214.44 KVM_GET_XCRS
1122
1123Capability: KVM_CAP_XCRS
1124Architectures: x86
1125Type: vcpu ioctl
1126Parameters: struct kvm_xcrs (out)
1127Returns: 0 on success, -1 on error
1128
1129struct kvm_xcr {
1130	__u32 xcr;
1131	__u32 reserved;
1132	__u64 value;
1133};
1134
1135struct kvm_xcrs {
1136	__u32 nr_xcrs;
1137	__u32 flags;
1138	struct kvm_xcr xcrs[KVM_MAX_XCRS];
1139	__u64 padding[16];
1140};
1141
This ioctl copies the current vcpu's xcrs to userspace.
1143
1144
11454.45 KVM_SET_XCRS
1146
1147Capability: KVM_CAP_XCRS
1148Architectures: x86
1149Type: vcpu ioctl
1150Parameters: struct kvm_xcrs (in)
1151Returns: 0 on success, -1 on error
1152
1153struct kvm_xcr {
1154	__u32 xcr;
1155	__u32 reserved;
1156	__u64 value;
1157};
1158
1159struct kvm_xcrs {
1160	__u32 nr_xcrs;
1161	__u32 flags;
1162	struct kvm_xcr xcrs[KVM_MAX_XCRS];
1163	__u64 padding[16];
1164};
1165
This ioctl sets the vcpu's xcrs to the values specified by userspace.
1167
1168
11694.46 KVM_GET_SUPPORTED_CPUID
1170
1171Capability: KVM_CAP_EXT_CPUID
1172Architectures: x86
1173Type: system ioctl
1174Parameters: struct kvm_cpuid2 (in/out)
1175Returns: 0 on success, -1 on error
1176
1177struct kvm_cpuid2 {
1178	__u32 nent;
1179	__u32 padding;
1180	struct kvm_cpuid_entry2 entries[0];
1181};
1182
1183#define KVM_CPUID_FLAG_SIGNIFCANT_INDEX		BIT(0)
1184#define KVM_CPUID_FLAG_STATEFUL_FUNC		BIT(1)
1185#define KVM_CPUID_FLAG_STATE_READ_NEXT		BIT(2)
1186
1187struct kvm_cpuid_entry2 {
1188	__u32 function;
1189	__u32 index;
1190	__u32 flags;
1191	__u32 eax;
1192	__u32 ebx;
1193	__u32 ecx;
1194	__u32 edx;
1195	__u32 padding[3];
1196};
1197
1198This ioctl returns x86 cpuid features which are supported by both the hardware
1199and kvm.  Userspace can use the information returned by this ioctl to
1200construct cpuid information (for KVM_SET_CPUID2) that is consistent with
1201hardware, kernel, and userspace capabilities, and with user requirements (for
1202example, the user may wish to constrain cpuid to emulate older hardware,
1203or for feature consistency across a cluster).
1204
1205Userspace invokes KVM_GET_SUPPORTED_CPUID by passing a kvm_cpuid2 structure
1206with the 'nent' field indicating the number of entries in the variable-size
1207array 'entries'.  If the number of entries is too low to describe the cpu
1208capabilities, an error (E2BIG) is returned.  If the number is too high,
1209the 'nent' field is adjusted and an error (ENOMEM) is returned.  If the
1210number is just right, the 'nent' field is adjusted to the number of valid
1211entries in the 'entries' array, which is then filled.
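
The 'nent' handshake described above can be sketched as a retry loop.  This
assumes an x86 Linux host with <linux/kvm.h>; the starting guess of 16
entries, the upper bound of 4096 and the helper name are arbitrary choices
made for this example, and every error other than E2BIG is simply passed
back to the caller.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/*
 * Query the supported cpuid entries on `kvm_fd` (the /dev/kvm handle),
 * doubling 'nent' while the kernel answers E2BIG.  Returns a malloc()ed
 * structure the caller must free(), or NULL with errno set.
 */
struct kvm_cpuid2 *get_supported_cpuid(int kvm_fd)
{
	__u32 nent = 16;	/* arbitrary starting guess */

	for (;;) {
		struct kvm_cpuid2 *cpuid;
		size_t size = sizeof(*cpuid) +
			      (size_t)nent * sizeof(struct kvm_cpuid_entry2);
		int saved;

		cpuid = calloc(1, size);
		if (!cpuid)
			return NULL;
		cpuid->nent = nent;

		if (ioctl(kvm_fd, KVM_GET_SUPPORTED_CPUID, cpuid) == 0) {
			/* nent now holds the number of valid entries */
			assert(cpuid->nent <= nent);
			return cpuid;
		}

		saved = errno;
		free(cpuid);
		if (saved != E2BIG || nent > 4096) {
			errno = saved;	/* not a "too few entries" error */
			return NULL;
		}
		nent *= 2;
	}
}
```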
1212
The entries returned are the host cpuid as returned by the cpuid instruction,
with unknown or unsupported features masked out.  Some features (for example,
x2apic) may not be present in the host cpu, but are exposed by kvm if it can
emulate them efficiently. The fields in each entry are defined as follows:
1217
1218  function: the eax value used to obtain the entry
1219  index: the ecx value used to obtain the entry (for entries that are
1220         affected by ecx)
1221  flags: an OR of zero or more of the following:
1222        KVM_CPUID_FLAG_SIGNIFCANT_INDEX:
1223           if the index field is valid
1224        KVM_CPUID_FLAG_STATEFUL_FUNC:
1225           if cpuid for this function returns different values for successive
1226           invocations; there will be several entries with the same function,
1227           all with this flag set
1228        KVM_CPUID_FLAG_STATE_READ_NEXT:
1229           for KVM_CPUID_FLAG_STATEFUL_FUNC entries, set if this entry is
1230           the first entry to be read by a cpu
1231   eax, ebx, ecx, edx: the values returned by the cpuid instruction for
1232         this function/index combination
1233
1234The TSC deadline timer feature (CPUID leaf 1, ecx[24]) is always returned
1235as false, since the feature depends on KVM_CREATE_IRQCHIP for local APIC
1236support.  Instead it is reported via
1237
1238  ioctl(KVM_CHECK_EXTENSION, KVM_CAP_TSC_DEADLINE_TIMER)
1239
If that returns true and you use KVM_CREATE_IRQCHIP, or if you emulate the
1241feature in userspace, then you can enable the feature for KVM_SET_CPUID2.
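
A minimal sketch of the last step, patching the leaf 1 entry before it is
handed to KVM_SET_CPUID2.  The structure is a local mirror of
struct kvm_cpuid_entry2 so that the example stands alone; real code would
use the definition from <linux/kvm.h>.  The helper name and the `cap_ok`
parameter are hypothetical, and the caller is assumed to have already
decided that either KVM_CREATE_IRQCHIP or userspace emulation is in use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirror of struct kvm_cpuid_entry2 as shown above. */
struct cpuid_entry2 {
	uint32_t function;
	uint32_t index;
	uint32_t flags;
	uint32_t eax, ebx, ecx, edx;
	uint32_t padding[3];
};

#define CPUID_1_ECX_TSC_DEADLINE (UINT32_C(1) << 24)	/* leaf 1, ecx[24] */

/*
 * Set the TSC deadline timer bit in the leaf 1 entry.  `cap_ok` is the
 * result of KVM_CHECK_EXTENSION for KVM_CAP_TSC_DEADLINE_TIMER.
 * Returns 1 if the bit was set, 0 otherwise.
 */
int enable_tsc_deadline(struct cpuid_entry2 *entries, size_t nent, int cap_ok)
{
	size_t i;

	assert(entries != NULL || nent == 0);
	if (cap_ok <= 0)
		return 0;

	for (i = 0; i < nent; i++) {
		if (entries[i].function == 1) {
			entries[i].ecx |= CPUID_1_ECX_TSC_DEADLINE;
			return 1;
		}
	}
	return 0;	/* no leaf 1 entry present */
}
```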
1242
1243
12444.47 KVM_PPC_GET_PVINFO
1245
1246Capability: KVM_CAP_PPC_GET_PVINFO
1247Architectures: ppc
1248Type: vm ioctl
1249Parameters: struct kvm_ppc_pvinfo (out)
1250Returns: 0 on success, !0 on error
1251
1252struct kvm_ppc_pvinfo {
1253	__u32 flags;
1254	__u32 hcall[4];
1255	__u8  pad[108];
1256};
1257
This ioctl fetches PV specific information that needs to be passed to the guest
using the device tree or other means from vm context.
1260
1261The hcall array defines 4 instructions that make up a hypercall.
1262
1263If any additional field gets added to this structure later on, a bit for that
1264additional piece of information will be set in the flags bitmap.
1265
1266The flags bitmap is defined as:
1267
   /* the host supports the ePAPR idle hcall */
   #define KVM_PPC_PVINFO_FLAGS_EV_IDLE   (1<<0)
1270
12714.48 KVM_ASSIGN_PCI_DEVICE
1272
1273Capability: none
1274Architectures: x86
1275Type: vm ioctl
1276Parameters: struct kvm_assigned_pci_dev (in)
1277Returns: 0 on success, -1 on error
1278
1279Assigns a host PCI device to the VM.
1280
1281struct kvm_assigned_pci_dev {
1282	__u32 assigned_dev_id;
1283	__u32 busnr;
1284	__u32 devfn;
1285	__u32 flags;
1286	__u32 segnr;
1287	union {
1288		__u32 reserved[11];
1289	};
1290};
1291
1292The PCI device is specified by the triple segnr, busnr, and devfn.
1293Identification in succeeding service requests is done via assigned_dev_id. The
1294following flags are specified:
1295
1296/* Depends on KVM_CAP_IOMMU */
1297#define KVM_DEV_ASSIGN_ENABLE_IOMMU	(1 << 0)
1298/* The following two depend on KVM_CAP_PCI_2_3 */
1299#define KVM_DEV_ASSIGN_PCI_2_3		(1 << 1)
1300#define KVM_DEV_ASSIGN_MASK_INTX	(1 << 2)
1301
If KVM_DEV_ASSIGN_PCI_2_3 is set, the kernel will manage legacy INTx interrupts
via the PCI-2.3-compliant device-level mask, thus enabling IRQ sharing with
other assigned devices or host devices. KVM_DEV_ASSIGN_MASK_INTX specifies the
guest's view on the INTx mask; see KVM_ASSIGN_SET_INTX_MASK for details.
1306
1307The KVM_DEV_ASSIGN_ENABLE_IOMMU flag is a mandatory option to ensure
1308isolation of the device.  Usages not specifying this flag are deprecated.
1309
1310Only PCI header type 0 devices with PCI BAR resources are supported by
1311device assignment.  The user requesting this ioctl must have read/write
1312access to the PCI sysfs resource files associated with the device.
1313
1314Errors:
1315  ENOTTY: kernel does not support this ioctl
1316
1317  Other error conditions may be defined by individual device types or
1318  have their standard meanings.
1319
1320
13214.49 KVM_DEASSIGN_PCI_DEVICE
1322
1323Capability: none
1324Architectures: x86
1325Type: vm ioctl
1326Parameters: struct kvm_assigned_pci_dev (in)
1327Returns: 0 on success, -1 on error
1328
1329Ends PCI device assignment, releasing all associated resources.
1330
1331See KVM_ASSIGN_PCI_DEVICE for the data structure. Only assigned_dev_id is
1332used in kvm_assigned_pci_dev to identify the device.
1333
1334Errors:
1335  ENOTTY: kernel does not support this ioctl
1336
1337  Other error conditions may be defined by individual device types or
1338  have their standard meanings.
1339
13404.50 KVM_ASSIGN_DEV_IRQ
1341
1342Capability: KVM_CAP_ASSIGN_DEV_IRQ
1343Architectures: x86
1344Type: vm ioctl
1345Parameters: struct kvm_assigned_irq (in)
1346Returns: 0 on success, -1 on error
1347
1348Assigns an IRQ to a passed-through device.
1349
1350struct kvm_assigned_irq {
1351	__u32 assigned_dev_id;
1352	__u32 host_irq; /* ignored (legacy field) */
1353	__u32 guest_irq;
1354	__u32 flags;
1355	union {
1356		__u32 reserved[12];
1357	};
1358};
1359
1360The following flags are defined:
1361
1362#define KVM_DEV_IRQ_HOST_INTX    (1 << 0)
1363#define KVM_DEV_IRQ_HOST_MSI     (1 << 1)
1364#define KVM_DEV_IRQ_HOST_MSIX    (1 << 2)
1365
1366#define KVM_DEV_IRQ_GUEST_INTX   (1 << 8)
1367#define KVM_DEV_IRQ_GUEST_MSI    (1 << 9)
1368#define KVM_DEV_IRQ_GUEST_MSIX   (1 << 10)
1369
1370It is not valid to specify multiple types per host or guest IRQ. However, the
1371IRQ type of host and guest can differ or can even be null.
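
The rule above (at most one host type and at most one guest type, either of
which may be absent) can be checked before issuing the ioctl.  A sketch, with
the flag values copied from the definitions above and a hypothetical helper
name:

```c
#include <assert.h>
#include <stdint.h>

#define KVM_DEV_IRQ_HOST_INTX    (1 << 0)
#define KVM_DEV_IRQ_HOST_MSI     (1 << 1)
#define KVM_DEV_IRQ_HOST_MSIX    (1 << 2)

#define KVM_DEV_IRQ_GUEST_INTX   (1 << 8)
#define KVM_DEV_IRQ_GUEST_MSI    (1 << 9)
#define KVM_DEV_IRQ_GUEST_MSIX   (1 << 10)

#define DEV_IRQ_HOST_MASK  (KVM_DEV_IRQ_HOST_INTX | KVM_DEV_IRQ_HOST_MSI | \
			    KVM_DEV_IRQ_HOST_MSIX)
#define DEV_IRQ_GUEST_MASK (KVM_DEV_IRQ_GUEST_INTX | KVM_DEV_IRQ_GUEST_MSI | \
			    KVM_DEV_IRQ_GUEST_MSIX)

static_assert((DEV_IRQ_HOST_MASK & DEV_IRQ_GUEST_MASK) == 0,
	      "host and guest flag bits must not overlap");

/* True if at most one bit of `v` is set. */
static int at_most_one_bit(uint32_t v)
{
	return (v & (v - 1)) == 0;
}

/* 1 if `flags` names at most one host type and at most one guest type. */
int dev_irq_flags_valid(uint32_t flags)
{
	return at_most_one_bit(flags & DEV_IRQ_HOST_MASK) &&
	       at_most_one_bit(flags & DEV_IRQ_GUEST_MASK);
}
```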
1372
1373Errors:
1374  ENOTTY: kernel does not support this ioctl
1375
1376  Other error conditions may be defined by individual device types or
1377  have their standard meanings.
1378
1379
13804.51 KVM_DEASSIGN_DEV_IRQ
1381
1382Capability: KVM_CAP_ASSIGN_DEV_IRQ
1383Architectures: x86
1384Type: vm ioctl
1385Parameters: struct kvm_assigned_irq (in)
1386Returns: 0 on success, -1 on error
1387
1388Ends an IRQ assignment to a passed-through device.
1389
1390See KVM_ASSIGN_DEV_IRQ for the data structure. The target device is specified
1391by assigned_dev_id, flags must correspond to the IRQ type specified on
1392KVM_ASSIGN_DEV_IRQ. Partial deassignment of host or guest IRQ is allowed.
1393
1394
13954.52 KVM_SET_GSI_ROUTING
1396
1397Capability: KVM_CAP_IRQ_ROUTING
Architectures: x86, s390
1399Type: vm ioctl
1400Parameters: struct kvm_irq_routing (in)
1401Returns: 0 on success, -1 on error
1402
1403Sets the GSI routing table entries, overwriting any previously set entries.
1404
1405struct kvm_irq_routing {
1406	__u32 nr;
1407	__u32 flags;
1408	struct kvm_irq_routing_entry entries[0];
1409};
1410
No flags are specified so far; the corresponding field must be set to zero.
1412
1413struct kvm_irq_routing_entry {
1414	__u32 gsi;
1415	__u32 type;
1416	__u32 flags;
1417	__u32 pad;
1418	union {
1419		struct kvm_irq_routing_irqchip irqchip;
1420		struct kvm_irq_routing_msi msi;
1421		struct kvm_irq_routing_s390_adapter adapter;
1422		__u32 pad[8];
1423	} u;
1424};
1425
1426/* gsi routing entry types */
1427#define KVM_IRQ_ROUTING_IRQCHIP 1
1428#define KVM_IRQ_ROUTING_MSI 2
1429#define KVM_IRQ_ROUTING_S390_ADAPTER 3
1430
No flags are specified so far; the corresponding field must be set to zero.
1432
1433struct kvm_irq_routing_irqchip {
1434	__u32 irqchip;
1435	__u32 pin;
1436};
1437
1438struct kvm_irq_routing_msi {
1439	__u32 address_lo;
1440	__u32 address_hi;
1441	__u32 data;
1442	__u32 pad;
1443};
1444
1445struct kvm_irq_routing_s390_adapter {
1446	__u64 ind_addr;
1447	__u64 summary_addr;
1448	__u64 ind_offset;
1449	__u32 summary_offset;
1450	__u32 adapter_id;
1451};
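
Because struct kvm_irq_routing ends in a variable-size array, the table has
to be allocated with room for its entries.  The sketch below assumes
<linux/kvm.h> is available; all helper names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Allocate a zeroed routing table with room for `nr` entries. */
struct kvm_irq_routing *routing_alloc(__u32 nr)
{
	struct kvm_irq_routing *r;

	r = calloc(1, sizeof(*r) +
		      (size_t)nr * sizeof(struct kvm_irq_routing_entry));
	if (r)
		r->nr = nr;	/* flags stay zero, as required */
	return r;
}

/* Route `gsi` to input `pin` of irqchip number `irqchip`. */
void routing_set_irqchip(struct kvm_irq_routing *r, __u32 slot,
			 __u32 gsi, __u32 irqchip, __u32 pin)
{
	struct kvm_irq_routing_entry *e;

	assert(slot < r->nr);
	e = &r->entries[slot];
	e->gsi = gsi;
	e->type = KVM_IRQ_ROUTING_IRQCHIP;
	e->u.irqchip.irqchip = irqchip;
	e->u.irqchip.pin = pin;
}

/* Route `gsi` to an MSI message with the given address and data. */
void routing_set_msi(struct kvm_irq_routing *r, __u32 slot,
		     __u32 gsi, __u64 addr, __u32 data)
{
	struct kvm_irq_routing_entry *e;

	assert(slot < r->nr);
	e = &r->entries[slot];
	e->gsi = gsi;
	e->type = KVM_IRQ_ROUTING_MSI;
	e->u.msi.address_lo = (__u32)addr;
	e->u.msi.address_hi = (__u32)(addr >> 32);
	e->u.msi.data = data;
}

/* Install the table, overwriting any previously set entries. */
int routing_commit(int vm_fd, struct kvm_irq_routing *r)
{
	return ioctl(vm_fd, KVM_SET_GSI_ROUTING, r);
}
```

Since the ioctl overwrites the previous table, a caller that wants to add a
single route has to resubmit all existing entries along with the new one.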
1452
1453
14544.53 KVM_ASSIGN_SET_MSIX_NR
1455
1456Capability: none
1457Architectures: x86
1458Type: vm ioctl
1459Parameters: struct kvm_assigned_msix_nr (in)
1460Returns: 0 on success, -1 on error
1461
1462Set the number of MSI-X interrupts for an assigned device. The number is
1463reset again by terminating the MSI-X assignment of the device via
1464KVM_DEASSIGN_DEV_IRQ. Calling this service more than once at any earlier
1465point will fail.
1466
1467struct kvm_assigned_msix_nr {
1468	__u32 assigned_dev_id;
1469	__u16 entry_nr;
1470	__u16 padding;
1471};
1472
1473#define KVM_MAX_MSIX_PER_DEV		256
1474
1475
14764.54 KVM_ASSIGN_SET_MSIX_ENTRY
1477
1478Capability: none
1479Architectures: x86
1480Type: vm ioctl
1481Parameters: struct kvm_assigned_msix_entry (in)
1482Returns: 0 on success, -1 on error
1483
1484Specifies the routing of an MSI-X assigned device interrupt to a GSI. Setting
1485the GSI vector to zero means disabling the interrupt.
1486
1487struct kvm_assigned_msix_entry {
1488	__u32 assigned_dev_id;
1489	__u32 gsi;
1490	__u16 entry; /* The index of entry in the MSI-X table */
1491	__u16 padding[3];
1492};
1493
1494Errors:
1495  ENOTTY: kernel does not support this ioctl
1496
1497  Other error conditions may be defined by individual device types or
1498  have their standard meanings.
1499
1500
15014.55 KVM_SET_TSC_KHZ
1502
1503Capability: KVM_CAP_TSC_CONTROL
1504Architectures: x86
1505Type: vcpu ioctl
1506Parameters: virtual tsc_khz
1507Returns: 0 on success, -1 on error
1508
1509Specifies the tsc frequency for the virtual machine. The unit of the
1510frequency is KHz.
1511
1512
15134.56 KVM_GET_TSC_KHZ
1514
1515Capability: KVM_CAP_GET_TSC_KHZ
1516Architectures: x86
1517Type: vcpu ioctl
1518Parameters: none
1519Returns: virtual tsc-khz on success, negative value on error
1520
1521Returns the tsc frequency of the guest. The unit of the return value is
1522KHz. If the host has unstable tsc this ioctl returns -EIO instead as an
1523error.
1524
1525
15264.57 KVM_GET_LAPIC
1527
1528Capability: KVM_CAP_IRQCHIP
1529Architectures: x86
1530Type: vcpu ioctl
1531Parameters: struct kvm_lapic_state (out)
1532Returns: 0 on success, -1 on error
1533
1534#define KVM_APIC_REG_SIZE 0x400
1535struct kvm_lapic_state {
1536	char regs[KVM_APIC_REG_SIZE];
1537};
1538
1539Reads the Local APIC registers and copies them into the input argument.  The
1540data format and layout are the same as documented in the architecture manual.
1541
1542
15434.58 KVM_SET_LAPIC
1544
1545Capability: KVM_CAP_IRQCHIP
1546Architectures: x86
1547Type: vcpu ioctl
1548Parameters: struct kvm_lapic_state (in)
1549Returns: 0 on success, -1 on error
1550
1551#define KVM_APIC_REG_SIZE 0x400
1552struct kvm_lapic_state {
1553	char regs[KVM_APIC_REG_SIZE];
1554};
1555
1556Copies the input argument into the Local APIC registers.  The data format
1557and layout are the same as documented in the architecture manual.
1558
1559
15604.59 KVM_IOEVENTFD
1561
1562Capability: KVM_CAP_IOEVENTFD
1563Architectures: all
1564Type: vm ioctl
1565Parameters: struct kvm_ioeventfd (in)
1566Returns: 0 on success, !0 on error
1567
This ioctl attaches or detaches an ioeventfd to a legal pio/mmio address
within the guest.  A guest write to the registered address will signal the
provided event instead of triggering an exit.
1571
1572struct kvm_ioeventfd {
1573	__u64 datamatch;
1574	__u64 addr;        /* legal pio/mmio address */
1575	__u32 len;         /* 1, 2, 4, or 8 bytes    */
1576	__s32 fd;
1577	__u32 flags;
1578	__u8  pad[36];
1579};
1580
1581For the special case of virtio-ccw devices on s390, the ioevent is matched
1582to a subchannel/virtqueue tuple instead.
1583
1584The following flags are defined:
1585
1586#define KVM_IOEVENTFD_FLAG_DATAMATCH (1 << kvm_ioeventfd_flag_nr_datamatch)
1587#define KVM_IOEVENTFD_FLAG_PIO       (1 << kvm_ioeventfd_flag_nr_pio)
1588#define KVM_IOEVENTFD_FLAG_DEASSIGN  (1 << kvm_ioeventfd_flag_nr_deassign)
1589#define KVM_IOEVENTFD_FLAG_VIRTIO_CCW_NOTIFY \
1590	(1 << kvm_ioeventfd_flag_nr_virtio_ccw_notify)
1591
If the datamatch flag is set, the event will be signaled only if the value
written to the registered address is equal to datamatch in struct kvm_ioeventfd.
1594
1595For virtio-ccw devices, addr contains the subchannel id and datamatch the
1596virtqueue index.
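
A sketch of how the structure and flags fit together for the pio/mmio case,
assuming <linux/kvm.h> is available.  The helper names are hypothetical, and
the event fd is expected to come from eventfd(2) or a similar source.

```c
#include <assert.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/*
 * Describe a registration: guest writes of `len` bytes to `addr` signal
 * `event_fd`.  If `match` is non-NULL, only writes of the value *match do.
 * A non-zero `pio` selects a port I/O address instead of an mmio address.
 */
void ioeventfd_fill(struct kvm_ioeventfd *e, __u64 addr, __u32 len,
		    int event_fd, int pio, const __u64 *match)
{
	assert(len == 1 || len == 2 || len == 4 || len == 8);

	memset(e, 0, sizeof(*e));	/* clears flags and padding */
	e->addr = addr;
	e->len = len;
	e->fd = event_fd;
	if (pio)
		e->flags |= KVM_IOEVENTFD_FLAG_PIO;
	if (match) {
		e->datamatch = *match;
		e->flags |= KVM_IOEVENTFD_FLAG_DATAMATCH;
	}
}

/* Attach the ioeventfd described by `e` to the VM. */
int ioeventfd_assign(int vm_fd, struct kvm_ioeventfd *e)
{
	e->flags &= ~(__u32)KVM_IOEVENTFD_FLAG_DEASSIGN;
	return ioctl(vm_fd, KVM_IOEVENTFD, e);
}

/* Detach it again, reusing the same description with DEASSIGN set. */
int ioeventfd_deassign(int vm_fd, struct kvm_ioeventfd *e)
{
	e->flags |= KVM_IOEVENTFD_FLAG_DEASSIGN;
	return ioctl(vm_fd, KVM_IOEVENTFD, e);
}
```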
1597
1598
15994.60 KVM_DIRTY_TLB
1600
1601Capability: KVM_CAP_SW_TLB
1602Architectures: ppc
1603Type: vcpu ioctl
1604Parameters: struct kvm_dirty_tlb (in)
1605Returns: 0 on success, -1 on error
1606
1607struct kvm_dirty_tlb {
1608	__u64 bitmap;
1609	__u32 num_dirty;
1610};
1611
1612This must be called whenever userspace has changed an entry in the shared
1613TLB, prior to calling KVM_RUN on the associated vcpu.
1614
1615The "bitmap" field is the userspace address of an array.  This array
1616consists of a number of bits, equal to the total number of TLB entries as
1617determined by the last successful call to KVM_CONFIG_TLB, rounded up to the
1618nearest multiple of 64.
1619
1620Each bit corresponds to one TLB entry, ordered the same as in the shared TLB
1621array.
1622
The array is little-endian: bit 0 is the least significant bit of the
first byte, bit 8 is the least significant bit of the second byte, etc.
This avoids any complications with differing word sizes.
1626
1627The "num_dirty" field is a performance hint for KVM to determine whether it
1628should skip processing the bitmap and just invalidate everything.  It must
1629be set to the number of set bits in the bitmap.
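
The bitmap layout described above can be sketched with three small helpers.
The names are hypothetical; `entries` stands for the total number of TLB
entries from the last successful KVM_CONFIG_TLB call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes needed for the bitmap: one bit per entry, rounded up to 64 bits. */
size_t tlb_bitmap_bytes(size_t entries)
{
	return ((entries + 63) / 64) * 8;
}

/* Mark TLB entry `n` dirty: bit n%8 of byte n/8, least significant first. */
void tlb_bitmap_set(uint8_t *bitmap, size_t entries, size_t n)
{
	assert(n < entries);
	bitmap[n / 8] |= (uint8_t)(1u << (n % 8));
}

/* Count the set bits; this is the value expected in num_dirty. */
uint32_t tlb_bitmap_count(const uint8_t *bitmap, size_t entries)
{
	size_t i, bytes = tlb_bitmap_bytes(entries);
	uint32_t count = 0;

	for (i = 0; i < bytes; i++) {
		uint8_t b = bitmap[i];

		while (b) {
			b &= (uint8_t)(b - 1);	/* clear lowest set bit */
			count++;
		}
	}
	return count;
}
```

The address of the byte array, converted to a 64-bit integer, would then be
stored in the "bitmap" field and the count in "num_dirty".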
1630
1631
16324.61 KVM_ASSIGN_SET_INTX_MASK
1633
1634Capability: KVM_CAP_PCI_2_3
1635Architectures: x86
1636Type: vm ioctl
1637Parameters: struct kvm_assigned_pci_dev (in)
1638Returns: 0 on success, -1 on error
1639
Allows userspace to mask PCI INTx interrupts from the assigned device.  The
kernel will not deliver INTx interrupts to the guest between setting and
clearing of the KVM_DEV_ASSIGN_MASK_INTX flag via this interface.  This
enables use of and emulation of PCI 2.3 INTx disable command register behavior.
1644
1645This may be used for both PCI 2.3 devices supporting INTx disable natively and
1646older devices lacking this support. Userspace is responsible for emulating the
1647read value of the INTx disable bit in the guest visible PCI command register.
1648When modifying the INTx disable state, userspace should precede updating the
1649physical device command register by calling this ioctl to inform the kernel of
1650the new intended INTx mask state.
1651
1652Note that the kernel uses the device INTx disable bit to internally manage the
1653device interrupt state for PCI 2.3 devices.  Reads of this register may
1654therefore not match the expected value.  Writes should always use the guest
1655intended INTx disable value rather than attempting to read-copy-update the
1656current physical device state.  Races between user and kernel updates to the
1657INTx disable bit are handled lazily in the kernel.  It's possible the device
1658may generate unintended interrupts, but they will not be injected into the
1659guest.
1660
See KVM_ASSIGN_PCI_DEVICE for the data structure.  The target device is
specified by assigned_dev_id.  In the flags field, only
KVM_DEV_ASSIGN_MASK_INTX is evaluated.
1664
1665
16664.62 KVM_CREATE_SPAPR_TCE
1667
1668Capability: KVM_CAP_SPAPR_TCE
1669Architectures: powerpc
1670Type: vm ioctl
1671Parameters: struct kvm_create_spapr_tce (in)
1672Returns: file descriptor for manipulating the created TCE table
1673
1674This creates a virtual TCE (translation control entry) table, which
1675is an IOMMU for PAPR-style virtual I/O.  It is used to translate
1676logical addresses used in virtual I/O into guest physical addresses,
1677and provides a scatter/gather capability for PAPR virtual I/O.
1678
1679/* for KVM_CAP_SPAPR_TCE */
1680struct kvm_create_spapr_tce {
1681	__u64 liobn;
1682	__u32 window_size;
1683};
1684
The liobn field gives the logical IO bus number for which to create a
TCE table.  The window_size field specifies the size of the DMA window
which this TCE table will translate - the table will contain one 64-bit
TCE entry for every 4 KiB of the DMA window.
1689
1690When the guest issues an H_PUT_TCE hcall on a liobn for which a TCE
1691table has been created using this ioctl(), the kernel will handle it
1692in real mode, updating the TCE table.  H_PUT_TCE calls for other
1693liobns will cause a vm exit and must be handled by userspace.
1694
1695The return value is a file descriptor which can be passed to mmap(2)
1696to map the created TCE table into userspace.  This lets userspace read
1697the entries written by kernel-handled H_PUT_TCE calls, and also lets
1698userspace update the TCE table directly which is useful in some
1699circumstances.
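
The table size follows directly from the window size: one 64-bit entry per
4 KiB.  A worked sketch, with hypothetical helper names and assuming the
window size is a multiple of 4 KiB:

```c
#include <assert.h>
#include <stdint.h>

#define TCE_PAGE_SIZE   4096u	/* one TCE per 4 KiB of DMA window */
#define TCE_ENTRY_BYTES 8u	/* each TCE is 64 bits wide */

/* Number of TCE entries covering a DMA window of `window_size` bytes. */
uint64_t tce_entries(uint32_t window_size)
{
	/* assumption of this sketch: whole 4 KiB pages only */
	assert(window_size % TCE_PAGE_SIZE == 0);
	return window_size / TCE_PAGE_SIZE;
}

/* Size in bytes of the table holding those entries. */
uint64_t tce_table_bytes(uint32_t window_size)
{
	return tce_entries(window_size) * TCE_ENTRY_BYTES;
}
```

For a 256 MiB window this gives 65536 entries, or 512 KiB of table, which is
a plausible length to pass to mmap(2) on the returned file descriptor.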
1700
1701
17024.63 KVM_ALLOCATE_RMA
1703
1704Capability: KVM_CAP_PPC_RMA
1705Architectures: powerpc
1706Type: vm ioctl
1707Parameters: struct kvm_allocate_rma (out)
1708Returns: file descriptor for mapping the allocated RMA
1709
1710This allocates a Real Mode Area (RMA) from the pool allocated at boot
1711time by the kernel.  An RMA is a physically-contiguous, aligned region
1712of memory used on older POWER processors to provide the memory which
1713will be accessed by real-mode (MMU off) accesses in a KVM guest.
1714POWER processors support a set of sizes for the RMA that usually
1715includes 64MB, 128MB, 256MB and some larger powers of two.
1716
1717/* for KVM_ALLOCATE_RMA */
1718struct kvm_allocate_rma {
1719	__u64 rma_size;
1720};
1721
1722The return value is a file descriptor which can be passed to mmap(2)
1723to map the allocated RMA into userspace.  The mapped area can then be
1724passed to the KVM_SET_USER_MEMORY_REGION ioctl to establish it as the
1725RMA for a virtual machine.  The size of the RMA in bytes (which is
1726fixed at host kernel boot time) is returned in the rma_size field of
1727the argument structure.
1728
1729The KVM_CAP_PPC_RMA capability is 1 or 2 if the KVM_ALLOCATE_RMA ioctl
1730is supported; 2 if the processor requires all virtual machines to have
1731an RMA, or 1 if the processor can use an RMA but doesn't require it,
1732because it supports the Virtual RMA (VRMA) facility.
1733
1734
17354.64 KVM_NMI
1736
1737Capability: KVM_CAP_USER_NMI
1738Architectures: x86
1739Type: vcpu ioctl
1740Parameters: none
1741Returns: 0 on success, -1 on error
1742
1743Queues an NMI on the thread's vcpu.  Note this is well defined only
1744when KVM_CREATE_IRQCHIP has not been called, since this is an interface
1745between the virtual cpu core and virtual local APIC.  After KVM_CREATE_IRQCHIP
1746has been called, this interface is completely emulated within the kernel.
1747
1748To use this to emulate the LINT1 input with KVM_CREATE_IRQCHIP, use the
1749following algorithm:
1750
  - pause the vcpu
1752  - read the local APIC's state (KVM_GET_LAPIC)
1753  - check whether changing LINT1 will queue an NMI (see the LVT entry for LINT1)
1754  - if so, issue KVM_NMI
1755  - resume the vcpu
1756
1757Some guests configure the LINT1 NMI input to cause a panic, aiding in
1758debugging.
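
The "check whether changing LINT1 will queue an NMI" step of the algorithm
above can be sketched as follows.  The register offset and bit positions
(LVT LINT1 at 0x360, mask in bit 16, delivery mode in bits 10:8 with 100b
meaning NMI) are taken from the architecture manual and are not defined by
this document; the helper names are hypothetical.  Pausing the vcpu and
issuing KVM_GET_LAPIC and KVM_NMI are left to the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define APIC_REG_SIZE   0x400	/* KVM_APIC_REG_SIZE */
#define APIC_LVT_LINT1  0x360	/* LVT LINT1 register offset */
#define APIC_LVT_MASKED (UINT32_C(1) << 16)
#define APIC_MODE_MASK  (UINT32_C(7) << 8)
#define APIC_MODE_NMI   (UINT32_C(4) << 8)

/* Read a 32-bit little-endian register from a kvm_lapic_state image. */
static uint32_t apic_reg(const char *regs, size_t offset)
{
	const unsigned char *p = (const unsigned char *)regs + offset;

	assert(offset + 4 <= APIC_REG_SIZE);
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* 1 if LINT1 is unmasked and programmed for NMI delivery, else 0. */
int lint1_delivers_nmi(const char *regs)
{
	uint32_t lvt = apic_reg(regs, APIC_LVT_LINT1);

	if (lvt & APIC_LVT_MASKED)
		return 0;
	return (lvt & APIC_MODE_MASK) == APIC_MODE_NMI;
}
```

A caller would pass the regs array filled in by KVM_GET_LAPIC and issue
KVM_NMI only when the function returns 1.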
1759
1760
17614.65 KVM_S390_UCAS_MAP
1762
1763Capability: KVM_CAP_S390_UCONTROL
1764Architectures: s390
1765Type: vcpu ioctl
1766Parameters: struct kvm_s390_ucas_mapping (in)
1767Returns: 0 in case of success
1768
1769The parameter is defined like this:
1770	struct kvm_s390_ucas_mapping {
1771		__u64 user_addr;
1772		__u64 vcpu_addr;
1773		__u64 length;
1774	};
1775
1776This ioctl maps the memory at "user_addr" with the length "length" to
1777the vcpu's address space starting at "vcpu_addr". All parameters need to
1778be aligned by 1 megabyte.
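
A small sketch of the alignment requirement, using a local mirror of the
structure above so that the example stands alone (real code would use
struct kvm_s390_ucas_mapping from <linux/kvm.h>).  The helper name is
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define UCAS_ALIGN (UINT64_C(1) << 20)	/* 1 megabyte */

/* Local mirror of struct kvm_s390_ucas_mapping. */
struct ucas_mapping {
	uint64_t user_addr;
	uint64_t vcpu_addr;
	uint64_t length;
};

/* 1 if all three fields are multiples of 1 megabyte, else 0. */
int ucas_mapping_aligned(const struct ucas_mapping *m)
{
	assert(m != NULL);
	return ((m->user_addr | m->vcpu_addr | m->length) &
		(UCAS_ALIGN - 1)) == 0;
}
```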
1779
1780
17814.66 KVM_S390_UCAS_UNMAP
1782
1783Capability: KVM_CAP_S390_UCONTROL
1784Architectures: s390
1785Type: vcpu ioctl
1786Parameters: struct kvm_s390_ucas_mapping (in)
1787Returns: 0 in case of success
1788
1789The parameter is defined like this:
1790	struct kvm_s390_ucas_mapping {
1791		__u64 user_addr;
1792		__u64 vcpu_addr;
1793		__u64 length;
1794	};
1795
1796This ioctl unmaps the memory in the vcpu's address space starting at
1797"vcpu_addr" with the length "length". The field "user_addr" is ignored.
1798All parameters need to be aligned by 1 megabyte.
1799
1800
18014.67 KVM_S390_VCPU_FAULT
1802
1803Capability: KVM_CAP_S390_UCONTROL
1804Architectures: s390
1805Type: vcpu ioctl
1806Parameters: vcpu absolute address (in)
1807Returns: 0 in case of success
1808
This call creates a page table entry on the virtual cpu's address space
(for user controlled virtual machines) or the virtual machine's address
space (for regular virtual machines). This only works for minor faults,
so it is recommended to access the memory page in question via the user
page table beforehand. This is useful to handle validity intercepts for
user controlled virtual machines to fault in the virtual cpu's lowcore
pages prior to calling the KVM_RUN ioctl.
1816
1817
18184.68 KVM_SET_ONE_REG
1819
1820Capability: KVM_CAP_ONE_REG
1821Architectures: all
1822Type: vcpu ioctl
1823Parameters: struct kvm_one_reg (in)
1824Returns: 0 on success, negative value on failure
1825
1826struct kvm_one_reg {
1827       __u64 id;
1828       __u64 addr;
1829};
1830
Using this ioctl, a single vcpu register can be set to a specific value
defined by user space with the passed in struct kvm_one_reg, where id
refers to the register identifier as described below and addr is a pointer
to a variable with the respective size. There can be architecture agnostic
and architecture specific registers. Each has its own range of operation
and its own constants and width. To keep track of the implemented
registers, find a list below:
1838
1839  Arch  |           Register            | Width (bits)
1840        |                               |
1841  PPC   | KVM_REG_PPC_HIOR              | 64
1842  PPC   | KVM_REG_PPC_IAC1              | 64
1843  PPC   | KVM_REG_PPC_IAC2              | 64
1844  PPC   | KVM_REG_PPC_IAC3              | 64
1845  PPC   | KVM_REG_PPC_IAC4              | 64
1846  PPC   | KVM_REG_PPC_DAC1              | 64
1847  PPC   | KVM_REG_PPC_DAC2              | 64
1848  PPC   | KVM_REG_PPC_DABR              | 64
1849  PPC   | KVM_REG_PPC_DSCR              | 64
1850  PPC   | KVM_REG_PPC_PURR              | 64
1851  PPC   | KVM_REG_PPC_SPURR             | 64
1852  PPC   | KVM_REG_PPC_DAR               | 64
1853  PPC   | KVM_REG_PPC_DSISR             | 32
1854  PPC   | KVM_REG_PPC_AMR               | 64
1855  PPC   | KVM_REG_PPC_UAMOR             | 64
1856  PPC   | KVM_REG_PPC_MMCR0             | 64
1857  PPC   | KVM_REG_PPC_MMCR1             | 64
1858  PPC   | KVM_REG_PPC_MMCRA             | 64
1859  PPC   | KVM_REG_PPC_MMCR2             | 64
1860  PPC   | KVM_REG_PPC_MMCRS             | 64
1861  PPC   | KVM_REG_PPC_SIAR              | 64
1862  PPC   | KVM_REG_PPC_SDAR              | 64
1863  PPC   | KVM_REG_PPC_SIER              | 64
1864  PPC   | KVM_REG_PPC_PMC1              | 32
1865  PPC   | KVM_REG_PPC_PMC2              | 32
1866  PPC   | KVM_REG_PPC_PMC3              | 32
1867  PPC   | KVM_REG_PPC_PMC4              | 32
1868  PPC   | KVM_REG_PPC_PMC5              | 32
1869  PPC   | KVM_REG_PPC_PMC6              | 32
1870  PPC   | KVM_REG_PPC_PMC7              | 32
1871  PPC   | KVM_REG_PPC_PMC8              | 32
1872  PPC   | KVM_REG_PPC_FPR0              | 64
1873          ...
1874  PPC   | KVM_REG_PPC_FPR31             | 64
1875  PPC   | KVM_REG_PPC_VR0               | 128
1876          ...
1877  PPC   | KVM_REG_PPC_VR31              | 128
1878  PPC   | KVM_REG_PPC_VSR0              | 128
1879          ...
1880  PPC   | KVM_REG_PPC_VSR31             | 128
1881  PPC   | KVM_REG_PPC_FPSCR             | 64
1882  PPC   | KVM_REG_PPC_VSCR              | 32
1883  PPC   | KVM_REG_PPC_VPA_ADDR          | 64
1884  PPC   | KVM_REG_PPC_VPA_SLB           | 128
1885  PPC   | KVM_REG_PPC_VPA_DTL           | 128
1886  PPC   | KVM_REG_PPC_EPCR              | 32
1887  PPC   | KVM_REG_PPC_EPR               | 32
1888  PPC   | KVM_REG_PPC_TCR               | 32
1889  PPC   | KVM_REG_PPC_TSR               | 32
1890  PPC   | KVM_REG_PPC_OR_TSR            | 32
1891  PPC   | KVM_REG_PPC_CLEAR_TSR         | 32
1892  PPC   | KVM_REG_PPC_MAS0              | 32
1893  PPC   | KVM_REG_PPC_MAS1              | 32
1894  PPC   | KVM_REG_PPC_MAS2              | 64
1895  PPC   | KVM_REG_PPC_MAS7_3            | 64
1896  PPC   | KVM_REG_PPC_MAS4              | 32
1897  PPC   | KVM_REG_PPC_MAS6              | 32
1898  PPC   | KVM_REG_PPC_MMUCFG            | 32
1899  PPC   | KVM_REG_PPC_TLB0CFG           | 32
1900  PPC   | KVM_REG_PPC_TLB1CFG           | 32
1901  PPC   | KVM_REG_PPC_TLB2CFG           | 32
1902  PPC   | KVM_REG_PPC_TLB3CFG           | 32
1903  PPC   | KVM_REG_PPC_TLB0PS            | 32
1904  PPC   | KVM_REG_PPC_TLB1PS            | 32
1905  PPC   | KVM_REG_PPC_TLB2PS            | 32
1906  PPC   | KVM_REG_PPC_TLB3PS            | 32
1907  PPC   | KVM_REG_PPC_EPTCFG            | 32
1908  PPC   | KVM_REG_PPC_ICP_STATE         | 64
1909  PPC   | KVM_REG_PPC_TB_OFFSET         | 64
1910  PPC   | KVM_REG_PPC_SPMC1             | 32
1911  PPC   | KVM_REG_PPC_SPMC2             | 32
1912  PPC   | KVM_REG_PPC_IAMR              | 64
1913  PPC   | KVM_REG_PPC_TFHAR             | 64
1914  PPC   | KVM_REG_PPC_TFIAR             | 64
1915  PPC   | KVM_REG_PPC_TEXASR            | 64
1916  PPC   | KVM_REG_PPC_FSCR              | 64
1917  PPC   | KVM_REG_PPC_PSPB              | 32
1918  PPC   | KVM_REG_PPC_EBBHR             | 64
1919  PPC   | KVM_REG_PPC_EBBRR             | 64
1920  PPC   | KVM_REG_PPC_BESCR             | 64
1921  PPC   | KVM_REG_PPC_TAR               | 64
1922  PPC   | KVM_REG_PPC_DPDES             | 64
1923  PPC   | KVM_REG_PPC_DAWR              | 64
1924  PPC   | KVM_REG_PPC_DAWRX             | 64
1925  PPC   | KVM_REG_PPC_CIABR             | 64
1926  PPC   | KVM_REG_PPC_IC                | 64
1927  PPC   | KVM_REG_PPC_VTB               | 64
1928  PPC   | KVM_REG_PPC_CSIGR             | 64
1929  PPC   | KVM_REG_PPC_TACR              | 64
1930  PPC   | KVM_REG_PPC_TCSCR             | 64
1931  PPC   | KVM_REG_PPC_PID               | 64
1932  PPC   | KVM_REG_PPC_ACOP              | 64
1933  PPC   | KVM_REG_PPC_VRSAVE            | 32
1934  PPC   | KVM_REG_PPC_LPCR              | 32
1935  PPC   | KVM_REG_PPC_LPCR_64           | 64
1936  PPC   | KVM_REG_PPC_PPR               | 64
1937  PPC   | KVM_REG_PPC_ARCH_COMPAT       | 32
1938  PPC   | KVM_REG_PPC_DABRX             | 32
1939  PPC   | KVM_REG_PPC_WORT              | 64
  PPC   | KVM_REG_PPC_SPRG9             | 64
  PPC   | KVM_REG_PPC_DBSR              | 32
1942  PPC   | KVM_REG_PPC_TM_GPR0           | 64
1943          ...
1944  PPC   | KVM_REG_PPC_TM_GPR31          | 64
1945  PPC   | KVM_REG_PPC_TM_VSR0           | 128
1946          ...
1947  PPC   | KVM_REG_PPC_TM_VSR63          | 128
1948  PPC   | KVM_REG_PPC_TM_CR             | 64
1949  PPC   | KVM_REG_PPC_TM_LR             | 64
1950  PPC   | KVM_REG_PPC_TM_CTR            | 64
1951  PPC   | KVM_REG_PPC_TM_FPSCR          | 64
1952  PPC   | KVM_REG_PPC_TM_AMR            | 64
1953  PPC   | KVM_REG_PPC_TM_PPR            | 64
1954  PPC   | KVM_REG_PPC_TM_VRSAVE         | 64
1955  PPC   | KVM_REG_PPC_TM_VSCR           | 32
1956  PPC   | KVM_REG_PPC_TM_DSCR           | 64
1957  PPC   | KVM_REG_PPC_TM_TAR            | 64
1958        |                               |
1959  MIPS  | KVM_REG_MIPS_R0               | 64
1960          ...
1961  MIPS  | KVM_REG_MIPS_R31              | 64
1962  MIPS  | KVM_REG_MIPS_HI               | 64
1963  MIPS  | KVM_REG_MIPS_LO               | 64
1964  MIPS  | KVM_REG_MIPS_PC               | 64
1965  MIPS  | KVM_REG_MIPS_CP0_INDEX        | 32
1966  MIPS  | KVM_REG_MIPS_CP0_CONTEXT      | 64
1967  MIPS  | KVM_REG_MIPS_CP0_USERLOCAL    | 64
1968  MIPS  | KVM_REG_MIPS_CP0_PAGEMASK     | 32
1969  MIPS  | KVM_REG_MIPS_CP0_WIRED        | 32
1970  MIPS  | KVM_REG_MIPS_CP0_HWRENA       | 32
1971  MIPS  | KVM_REG_MIPS_CP0_BADVADDR     | 64
1972  MIPS  | KVM_REG_MIPS_CP0_COUNT        | 32
1973  MIPS  | KVM_REG_MIPS_CP0_ENTRYHI      | 64
1974  MIPS  | KVM_REG_MIPS_CP0_COMPARE      | 32
1975  MIPS  | KVM_REG_MIPS_CP0_STATUS       | 32
1976  MIPS  | KVM_REG_MIPS_CP0_CAUSE        | 32
1977  MIPS  | KVM_REG_MIPS_CP0_EPC          | 64
1978  MIPS  | KVM_REG_MIPS_CP0_PRID         | 32
1979  MIPS  | KVM_REG_MIPS_CP0_CONFIG       | 32
1980  MIPS  | KVM_REG_MIPS_CP0_CONFIG1      | 32
1981  MIPS  | KVM_REG_MIPS_CP0_CONFIG2      | 32
1982  MIPS  | KVM_REG_MIPS_CP0_CONFIG3      | 32
1983  MIPS  | KVM_REG_MIPS_CP0_CONFIG4      | 32
1984  MIPS  | KVM_REG_MIPS_CP0_CONFIG5      | 32
1985  MIPS  | KVM_REG_MIPS_CP0_CONFIG7      | 32
1986  MIPS  | KVM_REG_MIPS_CP0_ERROREPC     | 64
1987  MIPS  | KVM_REG_MIPS_COUNT_CTL        | 64
1988  MIPS  | KVM_REG_MIPS_COUNT_RESUME     | 64
1989  MIPS  | KVM_REG_MIPS_COUNT_HZ         | 64
1990  MIPS  | KVM_REG_MIPS_FPR_32(0..31)    | 32
1991  MIPS  | KVM_REG_MIPS_FPR_64(0..31)    | 64
1992  MIPS  | KVM_REG_MIPS_VEC_128(0..31)   | 128
1993  MIPS  | KVM_REG_MIPS_FCR_IR           | 32
1994  MIPS  | KVM_REG_MIPS_FCR_CSR          | 32
1995  MIPS  | KVM_REG_MIPS_MSA_IR           | 32
1996  MIPS  | KVM_REG_MIPS_MSA_CSR          | 32
1997
1998ARM registers are mapped using the lower 32 bits.  The upper 16 of that
1999is the register group type, or coprocessor number:
2000
2001ARM core registers have the following id bit patterns:
2002  0x4020 0000 0010 <index into the kvm_regs struct:16>
2003
2004ARM 32-bit CP15 registers have the following id bit patterns:
2005  0x4020 0000 000F <zero:1> <crn:4> <crm:4> <opc1:4> <opc2:3>
2006
2007ARM 64-bit CP15 registers have the following id bit patterns:
2008  0x4030 0000 000F <zero:1> <zero:4> <crm:4> <opc1:4> <zero:3>
2009
2010ARM CCSIDR registers are demultiplexed by CSSELR value:
2011  0x4020 0000 0011 00 <csselr:8>
2012
2013ARM 32-bit VFP control registers have the following id bit patterns:
2014  0x4020 0000 0012 1 <regno:12>
2015
2016ARM 64-bit FP registers have the following id bit patterns:
2017  0x4030 0000 0012 0 <regno:12>
2018
2019
2020arm64 registers are mapped using the lower 32 bits. The upper 16 of
2021that is the register group type, or coprocessor number:
2022
2023arm64 core/FP-SIMD registers have the following id bit patterns. Note
2024that the size of the access is variable, as the kvm_regs structure
2025contains elements ranging from 32 to 128 bits. The index is a 32bit
2026value in the kvm_regs structure seen as a 32bit array.
2027  0x60x0 0000 0010 <index into the kvm_regs struct:16>
2028
2029arm64 CCSIDR registers are demultiplexed by CSSELR value:
2030  0x6020 0000 0011 00 <csselr:8>
2031
2032arm64 system registers have the following id bit patterns:
2033  0x6030 0000 0013 <op0:2> <op1:3> <crn:4> <crm:4> <op2:3>
2034
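As a minimal sketch (not taken from the kernel headers), the arm64 system
register pattern above can be composed as follows.  The helper and macro
names are invented for illustration; only the base value 0x6030 0000 0013
and the field widths come from the pattern above.

```c
#include <stdint.h>

/* Base of the arm64 system register pattern: 0x6030 0000 0013 <16 bits>.
 * The macro name is local to this sketch. */
#define SKETCH_ARM64_SYSREG_BASE 0x6030000000130000ULL

/* Pack <op0:2> <op1:3> <crn:4> <crm:4> <op2:3> into the low 16 bits. */
uint64_t arm64_sysreg_id(unsigned op0, unsigned op1, unsigned crn,
			 unsigned crm, unsigned op2)
{
	return SKETCH_ARM64_SYSREG_BASE |
	       ((uint64_t)(op0 & 0x3) << 14) |
	       ((uint64_t)(op1 & 0x7) << 11) |
	       ((uint64_t)(crn & 0xf) << 7) |
	       ((uint64_t)(crm & 0xf) << 3) |
	       (uint64_t)(op2 & 0x7);
}
```

For instance, arm64_sysreg_id(3, 0, 0, 0, 5) yields 0x603000000013c005,
which is the id for the register encoded as op0=3, op1=0, crn=0, crm=0,
op2=5 (MPIDR_EL1 in the ARMv8 architecture).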
2035
2036MIPS registers are mapped using the lower 32 bits.  The upper 16 of that is
2037the register group type:
2038
2039MIPS core registers (see above) have the following id bit patterns:
2040  0x7030 0000 0000 <reg:16>
2041
2042MIPS CP0 registers (see KVM_REG_MIPS_CP0_* above) have the following id bit
2043patterns depending on whether they're 32-bit or 64-bit registers:
2044  0x7020 0000 0001 00 <reg:5> <sel:3>   (32-bit)
2045  0x7030 0000 0001 00 <reg:5> <sel:3>   (64-bit)
2046
2047MIPS KVM control registers (see above) have the following id bit patterns:
2048  0x7030 0000 0002 <reg:16>
2049
2050MIPS FPU registers (see KVM_REG_MIPS_FPR_{32,64}() above) have the following
2051id bit patterns depending on the size of the register being accessed. They are
2052always accessed according to the current guest FPU mode (Status.FR and
2053Config5.FRE), i.e. as the guest would see them, and they become unpredictable
2054if the guest FPU mode is changed. MIPS SIMD Architecture (MSA) vector
2055registers (see KVM_REG_MIPS_VEC_128() above) have similar patterns as they
2056overlap the FPU registers:
2057  0x7020 0000 0003 00 <0:3> <reg:5> (32-bit FPU registers)
2058  0x7030 0000 0003 00 <0:3> <reg:5> (64-bit FPU registers)
2059  0x7040 0000 0003 00 <0:3> <reg:5> (128-bit MSA vector registers)
2060
2061MIPS FPU control registers (see KVM_REG_MIPS_FCR_{IR,CSR} above) have the
2062following id bit patterns:
2063  0x7020 0000 0003 01 <0:3> <reg:5>
2064
2065MIPS MSA control registers (see KVM_REG_MIPS_MSA_{IR,CSR} above) have the
2066following id bit patterns:
2067  0x7020 0000 0003 02 <0:3> <reg:5>
2068
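The MIPS CP0 patterns above can be sketched in the same way.  This is an
illustration only: the helper and macro names are hypothetical, and the
caller is assumed to know from the register table whether a given CP0
register is 32-bit or 64-bit.

```c
#include <stdint.h>

/* Bases of the two CP0 patterns; macro names are local to this sketch. */
#define SKETCH_MIPS_CP0_32 0x7020000000010000ULL
#define SKETCH_MIPS_CP0_64 0x7030000000010000ULL

/* Low 16 bits are: 00 <reg:5> <sel:3>, i.e. 8 * reg + sel. */
uint64_t mips_cp0_id(unsigned reg, unsigned sel, int is_64bit)
{
	uint64_t base = is_64bit ? SKETCH_MIPS_CP0_64 : SKETCH_MIPS_CP0_32;

	return base | ((uint64_t)(reg & 0x1f) << 3) | (uint64_t)(sel & 0x7);
}
```

With this helper, the 32-bit CP0 Status register (register 12, select 0)
maps to 0x7020000000010060.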
2069
20704.69 KVM_GET_ONE_REG
2071
2072Capability: KVM_CAP_ONE_REG
2073Architectures: all
2074Type: vcpu ioctl
2075Parameters: struct kvm_one_reg (in and out)
2076Returns: 0 on success, negative value on failure
2077
This ioctl reads the value of a single register implemented in a vcpu.
The register to read is indicated by the "id" field of the kvm_one_reg
struct passed in. On success, the register value can be found at the
memory location pointed to by "addr".
2082
2083The list of registers accessible using this interface is identical to the
2084list in 4.68.
2085
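Because the value is written to the buffer at "addr", userspace has to
provide a buffer of the right size for the register.  The sketch below
derives that size from the register id.  It assumes the size is encoded
as a power-of-two exponent in bits 52-55 of the id, which is what the
0x..20 (32-bit), 0x..30 (64-bit) and 0x..40 (128-bit) prefixes in the id
bit patterns suggest; the names are local to this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed position of the size field within a register id. */
#define SKETCH_REG_SIZE_SHIFT 52
#define SKETCH_REG_SIZE_MASK  0x00f0000000000000ULL

/* Number of bytes that a read of register "id" stores at "addr". */
size_t one_reg_size_bytes(uint64_t id)
{
	return (size_t)1 << ((id & SKETCH_REG_SIZE_MASK) >> SKETCH_REG_SIZE_SHIFT);
}
```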
2086
20874.70 KVM_KVMCLOCK_CTRL
2088
2089Capability: KVM_CAP_KVMCLOCK_CTRL
2090Architectures: Any that implement pvclocks (currently x86 only)
2091Type: vcpu ioctl
2092Parameters: None
2093Returns: 0 on success, -1 on error
2094
2095This signals to the host kernel that the specified guest is being paused by
2096userspace.  The host will set a flag in the pvclock structure that is checked
2097from the soft lockup watchdog.  The flag is part of the pvclock structure that
2098is shared between guest and host, specifically the second bit of the flags
2099field of the pvclock_vcpu_time_info structure.  It will be set exclusively by
the host and read/cleared exclusively by the guest.  The guest operation of
checking and clearing the flag must be an atomic operation, so
load-link/store-conditional or an equivalent must be used.  There are two cases
2103where the guest will clear the flag: when the soft lockup watchdog timer resets
2104itself or when a soft lockup is detected.  This ioctl can be called any time
2105after pausing the vcpu, but before it is resumed.
2106
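The guest-side check-and-clear described above could look roughly like the
following.  This is only a sketch using C11 atomics; a real guest kernel
would use its own atomic primitives.  The function and macro names are
hypothetical, and the flags field is assumed to be 8 bits wide.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* "Second bit of the flags field", i.e. bit 1.  Name local to this sketch. */
#define SKETCH_GUEST_STOPPED (1u << 1)

/* Atomically clear the flag and report whether it had been set. */
bool test_and_clear_guest_stopped(_Atomic uint8_t *flags)
{
	uint8_t old = atomic_fetch_and(flags, (uint8_t)~SKETCH_GUEST_STOPPED);

	return (old & SKETCH_GUEST_STOPPED) != 0;
}
```

Doing the test and the clear in one atomic step means a flag set by the
host between the two cannot be lost.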
2107
21084.71 KVM_SIGNAL_MSI
2109
2110Capability: KVM_CAP_SIGNAL_MSI
2111Architectures: x86
2112Type: vm ioctl
2113Parameters: struct kvm_msi (in)
2114Returns: >0 on delivery, 0 if guest blocked the MSI, and -1 on error
2115
Directly inject an MSI message. Only valid with an in-kernel irqchip that
handles MSI messages.
2118
2119struct kvm_msi {
2120	__u32 address_lo;
2121	__u32 address_hi;
2122	__u32 data;
2123	__u32 flags;
2124	__u8  pad[16];
2125};
2126
2127No flags are defined so far. The corresponding field must be 0.
2128
2129
21304.71 KVM_CREATE_PIT2
2131
2132Capability: KVM_CAP_PIT2
2133Architectures: x86
2134Type: vm ioctl
2135Parameters: struct kvm_pit_config (in)
2136Returns: 0 on success, -1 on error
2137
2138Creates an in-kernel device model for the i8254 PIT. This call is only valid
2139after enabling in-kernel irqchip support via KVM_CREATE_IRQCHIP. The following
2140parameters have to be passed:
2141
2142struct kvm_pit_config {
2143	__u32 flags;
2144	__u32 pad[15];
2145};
2146
2147Valid flags are:
2148
2149#define KVM_PIT_SPEAKER_DUMMY     1 /* emulate speaker port stub */
2150
2151PIT timer interrupts may use a per-VM kernel thread for injection. If it
2152exists, this thread will have a name of the following pattern:
2153
2154kvm-pit/<owner-process-pid>
2155
2156When running a guest with elevated priorities, the scheduling parameters of
2157this thread may have to be adjusted accordingly.
2158
2159This IOCTL replaces the obsolete KVM_CREATE_PIT.
2160
2161
21624.72 KVM_GET_PIT2
2163
2164Capability: KVM_CAP_PIT_STATE2
2165Architectures: x86
2166Type: vm ioctl
2167Parameters: struct kvm_pit_state2 (out)
2168Returns: 0 on success, -1 on error
2169
2170Retrieves the state of the in-kernel PIT model. Only valid after
2171KVM_CREATE_PIT2. The state is returned in the following structure:
2172
2173struct kvm_pit_state2 {
2174	struct kvm_pit_channel_state channels[3];
2175	__u32 flags;
2176	__u32 reserved[9];
2177};
2178
2179Valid flags are:
2180
2181/* disable PIT in HPET legacy mode */
2182#define KVM_PIT_FLAGS_HPET_LEGACY  0x00000001
2183
2184This IOCTL replaces the obsolete KVM_GET_PIT.
2185
2186
21874.73 KVM_SET_PIT2
2188
2189Capability: KVM_CAP_PIT_STATE2
2190Architectures: x86
2191Type: vm ioctl
2192Parameters: struct kvm_pit_state2 (in)
2193Returns: 0 on success, -1 on error
2194
2195Sets the state of the in-kernel PIT model. Only valid after KVM_CREATE_PIT2.
2196See KVM_GET_PIT2 for details on struct kvm_pit_state2.
2197
2198This IOCTL replaces the obsolete KVM_SET_PIT.
2199
2200
22014.74 KVM_PPC_GET_SMMU_INFO
2202
2203Capability: KVM_CAP_PPC_GET_SMMU_INFO
2204Architectures: powerpc
2205Type: vm ioctl
2206Parameters: None
2207Returns: 0 on success, -1 on error
2208
2209This populates and returns a structure describing the features of
2210the "Server" class MMU emulation supported by KVM.
2211This can in turn be used by userspace to generate the appropriate
2212device-tree properties for the guest operating system.
2213
2214The structure contains some global information, followed by an
2215array of supported segment page sizes:
2216
2217      struct kvm_ppc_smmu_info {
2218	     __u64 flags;
2219	     __u32 slb_size;
2220	     __u32 pad;
2221	     struct kvm_ppc_one_seg_page_size sps[KVM_PPC_PAGE_SIZES_MAX_SZ];
2222      };
2223
2224The supported flags are:
2225
2226    - KVM_PPC_PAGE_SIZES_REAL:
2227        When that flag is set, guest page sizes must "fit" the backing
2228        store page sizes. When not set, any page size in the list can
2229        be used regardless of how they are backed by userspace.
2230
2231    - KVM_PPC_1T_SEGMENTS
2232        The emulated MMU supports 1T segments in addition to the
2233        standard 256M ones.
2234
The "slb_size" field indicates how many SLB entries are supported.
2236
2237The "sps" array contains 8 entries indicating the supported base
2238page sizes for a segment in increasing order. Each entry is defined
as follows:
2240
2241   struct kvm_ppc_one_seg_page_size {
2242	__u32 page_shift;	/* Base page shift of segment (or 0) */
2243	__u32 slb_enc;		/* SLB encoding for BookS */
2244	struct kvm_ppc_one_page_size enc[KVM_PPC_PAGE_SIZES_MAX_SZ];
2245   };
2246
An entry with a "page_shift" of 0 is unused. Because the array is
organized in increasing order, a lookup can stop when encountering
such an entry.
2250
The "slb_enc" field provides the encoding to use in the SLB for the
page size. The bits are in positions such that the value can directly
be OR'ed into the "vsid" argument of the slbmte instruction.
2254
The "enc" array is a list which, for each of those segment base page
sizes, provides the list of supported actual page sizes (which can only
be larger than or equal to the base page size), along with the
corresponding encoding in the hash PTE. Similarly, the array has
8 entries sorted by increasing size, and an entry with a "0" shift
is an empty entry and a terminator:
2261
2262   struct kvm_ppc_one_page_size {
2263	__u32 page_shift;	/* Page shift (or 0) */
2264	__u32 pte_enc;		/* Encoding in the HPTE (>>12) */
2265   };
2266
The "pte_enc" field provides a value that can be OR'ed into the hash
PTE's RPN field (i.e., it needs to be shifted left by 12 to OR it
into the hash PTE second double word).
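
The lookup rules above (arrays in increasing order, a zero shift acting as
terminator, "pte_enc" shifted left by 12) can be sketched as follows.  The
struct and function names are local mirrors invented for this example, and
the array size of 8 is taken from the "8 entries" mentioned in the text.

```c
#include <stdint.h>

#define SKETCH_PAGE_SIZES_MAX 8	/* "8 entries" per the text above */

struct one_page_size {
	uint32_t page_shift;	/* Page shift (or 0) */
	uint32_t pte_enc;	/* Encoding in the HPTE (>>12) */
};

struct one_seg_page_size {
	uint32_t page_shift;	/* Base page shift of segment (or 0) */
	uint32_t slb_enc;	/* SLB encoding */
	struct one_page_size enc[SKETCH_PAGE_SIZES_MAX];
};

struct smmu_info {
	uint64_t flags;
	uint32_t slb_size;
	uint32_t pad;
	struct one_seg_page_size sps[SKETCH_PAGE_SIZES_MAX];
};

/* Find the encodings for an actual page size within a segment base page
 * size.  Returns 0 and fills *slb_enc and *hpte_bits, or -1 if the
 * combination is not listed. */
int smmu_lookup(const struct smmu_info *info, uint32_t base_shift,
		uint32_t actual_shift, uint32_t *slb_enc, uint64_t *hpte_bits)
{
	for (int i = 0; i < SKETCH_PAGE_SIZES_MAX; i++) {
		const struct one_seg_page_size *sps = &info->sps[i];

		if (sps->page_shift == 0)
			break;		/* terminator: the rest is unused */
		if (sps->page_shift != base_shift)
			continue;
		for (int j = 0; j < SKETCH_PAGE_SIZES_MAX; j++) {
			if (sps->enc[j].page_shift == 0)
				break;	/* terminator of the enc list */
			if (sps->enc[j].page_shift == actual_shift) {
				*slb_enc = sps->slb_enc;
				/* shift left by 12 before OR'ing into the HPTE */
				*hpte_bits = (uint64_t)sps->enc[j].pte_enc << 12;
				return 0;
			}
		}
		return -1;
	}
	return -1;
}
```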
2270
22714.75 KVM_IRQFD
2272
2273Capability: KVM_CAP_IRQFD
2274Architectures: x86 s390 arm arm64
2275Type: vm ioctl
2276Parameters: struct kvm_irqfd (in)
2277Returns: 0 on success, -1 on error
2278
2279Allows setting an eventfd to directly trigger a guest interrupt.
2280kvm_irqfd.fd specifies the file descriptor to use as the eventfd and
2281kvm_irqfd.gsi specifies the irqchip pin toggled by this event.  When
2282an event is triggered on the eventfd, an interrupt is injected into
2283the guest using the specified gsi pin.  The irqfd is removed using
2284the KVM_IRQFD_FLAG_DEASSIGN flag, specifying both kvm_irqfd.fd
2285and kvm_irqfd.gsi.
2286
2287With KVM_CAP_IRQFD_RESAMPLE, KVM_IRQFD supports a de-assert and notify
2288mechanism allowing emulation of level-triggered, irqfd-based
2289interrupts.  When KVM_IRQFD_FLAG_RESAMPLE is set the user must pass an
2290additional eventfd in the kvm_irqfd.resamplefd field.  When operating
in resample mode, posting of an interrupt through kvm_irqfd.fd asserts
2292the specified gsi in the irqchip.  When the irqchip is resampled, such
2293as from an EOI, the gsi is de-asserted and the user is notified via
2294kvm_irqfd.resamplefd.  It is the user's responsibility to re-queue
2295the interrupt if the device making use of it still requires service.
2296Note that closing the resamplefd is not sufficient to disable the
2297irqfd.  The KVM_IRQFD_FLAG_RESAMPLE is only necessary on assignment
2298and need not be specified with KVM_IRQFD_FLAG_DEASSIGN.
2299
2300On ARM/ARM64, the gsi field in the kvm_irqfd struct specifies the Shared
2301Peripheral Interrupt (SPI) index, such that the GIC interrupt ID is
2302given by gsi + 32.
2303
23044.76 KVM_PPC_ALLOCATE_HTAB
2305
2306Capability: KVM_CAP_PPC_ALLOC_HTAB
2307Architectures: powerpc
2308Type: vm ioctl
2309Parameters: Pointer to u32 containing hash table order (in/out)
2310Returns: 0 on success, -1 on error
2311
2312This requests the host kernel to allocate an MMU hash table for a
2313guest using the PAPR paravirtualization interface.  This only does
2314anything if the kernel is configured to use the Book 3S HV style of
2315virtualization.  Otherwise the capability doesn't exist and the ioctl
2316returns an ENOTTY error.  The rest of this description assumes Book 3S
2317HV.
2318
2319There must be no vcpus running when this ioctl is called; if there
2320are, it will do nothing and return an EBUSY error.
2321
2322The parameter is a pointer to a 32-bit unsigned integer variable
2323containing the order (log base 2) of the desired size of the hash
2324table, which must be between 18 and 46.  On successful return from the
2325ioctl, it will have been updated with the order of the hash table that
2326was allocated.
2327
2328If no hash table has been allocated when any vcpu is asked to run
2329(with the KVM_RUN ioctl), the host kernel will allocate a
2330default-sized hash table (16 MB).
2331
2332If this ioctl is called when a hash table has already been allocated,
2333the kernel will clear out the existing hash table (zero all HPTEs) and
2334return the hash table order in the parameter.  (If the guest is using
2335the virtualized real-mode area (VRMA) facility, the kernel will
re-create the VRMA HPTEs on the next KVM_RUN of any vcpu.)
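
Since the parameter is an order rather than a byte count, userspace may
want a small conversion helper before issuing the ioctl.  The following is
a sketch with hypothetical names; it rounds the requested size up to a
power of two and clamps the order to the 18..46 range stated above.

```c
#include <stdint.h>

#define SKETCH_HTAB_ORDER_MIN 18
#define SKETCH_HTAB_ORDER_MAX 46

/* Smallest valid order whose table size (1 << order bytes) covers "bytes",
 * clamped to the supported range. */
uint32_t htab_order_for_bytes(uint64_t bytes)
{
	uint32_t order = SKETCH_HTAB_ORDER_MIN;

	while (order < SKETCH_HTAB_ORDER_MAX && ((uint64_t)1 << order) < bytes)
		order++;
	return order;
}
```

The default 16 MB table mentioned above corresponds to order 24.  The
value actually used should be read back from the parameter after the
ioctl returns, since the kernel updates it.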
2337
23384.77 KVM_S390_INTERRUPT
2339
2340Capability: basic
2341Architectures: s390
2342Type: vm ioctl, vcpu ioctl
2343Parameters: struct kvm_s390_interrupt (in)
2344Returns: 0 on success, -1 on error
2345
Allows injecting an interrupt into the guest. Interrupts can be floating
2347(vm ioctl) or per cpu (vcpu ioctl), depending on the interrupt type.
2348
2349Interrupt parameters are passed via kvm_s390_interrupt:
2350
2351struct kvm_s390_interrupt {
2352	__u32 type;
2353	__u32 parm;
2354	__u64 parm64;
2355};
2356
2357type can be one of the following:
2358
2359KVM_S390_SIGP_STOP (vcpu) - sigp stop; optional flags in parm
2360KVM_S390_PROGRAM_INT (vcpu) - program check; code in parm
2361KVM_S390_SIGP_SET_PREFIX (vcpu) - sigp set prefix; prefix address in parm
2362KVM_S390_RESTART (vcpu) - restart
2363KVM_S390_INT_CLOCK_COMP (vcpu) - clock comparator interrupt
2364KVM_S390_INT_CPU_TIMER (vcpu) - CPU timer interrupt
2365KVM_S390_INT_VIRTIO (vm) - virtio external interrupt; external interrupt
2366			   parameters in parm and parm64
2367KVM_S390_INT_SERVICE (vm) - sclp external interrupt; sclp parameter in parm
2368KVM_S390_INT_EMERGENCY (vcpu) - sigp emergency; source cpu in parm
2369KVM_S390_INT_EXTERNAL_CALL (vcpu) - sigp external call; source cpu in parm
2370KVM_S390_INT_IO(ai,cssid,ssid,schid) (vm) - compound value to indicate an
2371    I/O interrupt (ai - adapter interrupt; cssid,ssid,schid - subchannel);
2372    I/O interruption parameters in parm (subchannel) and parm64 (intparm,
2373    interruption subclass)
2374KVM_S390_MCHK (vm, vcpu) - machine check interrupt; cr 14 bits in parm,
2375                           machine check interrupt code in parm64 (note that
2376                           machine checks needing further payload are not
2377                           supported by this ioctl)
2378
2379Note that the vcpu ioctl is asynchronous to vcpu execution.
2380
23814.78 KVM_PPC_GET_HTAB_FD
2382
2383Capability: KVM_CAP_PPC_HTAB_FD
2384Architectures: powerpc
2385Type: vm ioctl
2386Parameters: Pointer to struct kvm_get_htab_fd (in)
2387Returns: file descriptor number (>= 0) on success, -1 on error
2388
2389This returns a file descriptor that can be used either to read out the
2390entries in the guest's hashed page table (HPT), or to write entries to
2391initialize the HPT.  The returned fd can only be written to if the
2392KVM_GET_HTAB_WRITE bit is set in the flags field of the argument, and
2393can only be read if that bit is clear.  The argument struct looks like
2394this:
2395
2396/* For KVM_PPC_GET_HTAB_FD */
2397struct kvm_get_htab_fd {
2398	__u64	flags;
2399	__u64	start_index;
2400	__u64	reserved[2];
2401};
2402
2403/* Values for kvm_get_htab_fd.flags */
2404#define KVM_GET_HTAB_BOLTED_ONLY	((__u64)0x1)
2405#define KVM_GET_HTAB_WRITE		((__u64)0x2)
2406
2407The `start_index' field gives the index in the HPT of the entry at
2408which to start reading.  It is ignored when writing.
2409
2410Reads on the fd will initially supply information about all
2411"interesting" HPT entries.  Interesting entries are those with the
2412bolted bit set, if the KVM_GET_HTAB_BOLTED_ONLY bit is set, otherwise
2413all entries.  When the end of the HPT is reached, the read() will
2414return.  If read() is called again on the fd, it will start again from
2415the beginning of the HPT, but will only return HPT entries that have
2416changed since they were last read.
2417
2418Data read or written is structured as a header (8 bytes) followed by a
series of valid HPT entries (16 bytes each).  The header indicates how
2420many valid HPT entries there are and how many invalid entries follow
2421the valid entries.  The invalid entries are not represented explicitly
2422in the stream.  The header format is:
2423
2424struct kvm_get_htab_header {
2425	__u32	index;
2426	__u16	n_valid;
2427	__u16	n_invalid;
2428};
2429
2430Writes to the fd create HPT entries starting at the index given in the
2431header; first `n_valid' valid entries with contents from the data
2432written, then `n_invalid' invalid entries, invalidating any previously
2433valid entries found.
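
To illustrate the stream layout, the sketch below walks a buffer that was
filled by read() on the fd and counts the entries it describes.  The
function name and the local header struct are illustrative; the 8-byte
header and 16-byte entry sizes are the ones given above, and the data is
assumed to be in host byte order.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct htab_header {		/* local mirror of kvm_get_htab_header */
	uint32_t index;
	uint16_t n_valid;
	uint16_t n_invalid;
};

#define SKETCH_HPTE_SIZE 16	/* bytes per valid HPT entry */

/* Count valid and invalid entries described by the stream in buf.
 * Returns 0 on success, -1 if the buffer ends in the middle of a record. */
int htab_stream_count(const unsigned char *buf, size_t len,
		      uint64_t *valid, uint64_t *invalid)
{
	size_t off = 0;

	*valid = 0;
	*invalid = 0;
	while (off < len) {
		struct htab_header hdr;
		size_t payload;

		if (len - off < sizeof(hdr))
			return -1;
		memcpy(&hdr, buf + off, sizeof(hdr));
		off += sizeof(hdr);

		/* Only the valid entries carry data in the stream. */
		payload = (size_t)hdr.n_valid * SKETCH_HPTE_SIZE;
		if (len - off < payload)
			return -1;
		off += payload;

		*valid += hdr.n_valid;
		*invalid += hdr.n_invalid;
	}
	return 0;
}
```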
2434
24354.79 KVM_CREATE_DEVICE
2436
2437Capability: KVM_CAP_DEVICE_CTRL
2438Type: vm ioctl
2439Parameters: struct kvm_create_device (in/out)
2440Returns: 0 on success, -1 on error
2441Errors:
2442  ENODEV: The device type is unknown or unsupported
2443  EEXIST: Device already created, and this type of device may not
2444          be instantiated multiple times
2445
2446  Other error conditions may be defined by individual device types or
2447  have their standard meanings.
2448
2449Creates an emulated device in the kernel.  The file descriptor returned
2450in fd can be used with KVM_SET/GET/HAS_DEVICE_ATTR.
2451
2452If the KVM_CREATE_DEVICE_TEST flag is set, only test whether the
2453device type is supported (not necessarily whether it can be created
2454in the current vm).
2455
2456Individual devices should not define flags.  Attributes should be used
2457for specifying any behavior that is not implied by the device type
2458number.
2459
2460struct kvm_create_device {
2461	__u32	type;	/* in: KVM_DEV_TYPE_xxx */
2462	__u32	fd;	/* out: device handle */
2463	__u32	flags;	/* in: KVM_CREATE_DEVICE_xxx */
2464};
2465
24664.80 KVM_SET_DEVICE_ATTR/KVM_GET_DEVICE_ATTR
2467
2468Capability: KVM_CAP_DEVICE_CTRL, KVM_CAP_VM_ATTRIBUTES for vm device
2469Type: device ioctl, vm ioctl
2470Parameters: struct kvm_device_attr
2471Returns: 0 on success, -1 on error
2472Errors:
2473  ENXIO:  The group or attribute is unknown/unsupported for this device
2474  EPERM:  The attribute cannot (currently) be accessed this way
2475          (e.g. read-only attribute, or attribute that only makes
2476          sense when the device is in a different state)
2477
2478  Other error conditions may be defined by individual device types.
2479
2480Gets/sets a specified piece of device configuration and/or state.  The
2481semantics are device-specific.  See individual device documentation in
2482the "devices" directory.  As with ONE_REG, the size of the data
2483transferred is defined by the particular attribute.
2484
2485struct kvm_device_attr {
2486	__u32	flags;		/* no flags currently defined */
2487	__u32	group;		/* device-defined */
2488	__u64	attr;		/* group-defined */
2489	__u64	addr;		/* userspace address of attr data */
2490};
2491
24924.81 KVM_HAS_DEVICE_ATTR
2493
2494Capability: KVM_CAP_DEVICE_CTRL, KVM_CAP_VM_ATTRIBUTES for vm device
2495Type: device ioctl, vm ioctl
2496Parameters: struct kvm_device_attr
2497Returns: 0 on success, -1 on error
2498Errors:
2499  ENXIO:  The group or attribute is unknown/unsupported for this device
2500
2501Tests whether a device supports a particular attribute.  A successful
2502return indicates the attribute is implemented.  It does not necessarily
2503indicate that the attribute can be read or written in the device's
2504current state.  "addr" is ignored.
2505
25064.82 KVM_ARM_VCPU_INIT
2507
2508Capability: basic
2509Architectures: arm, arm64
2510Type: vcpu ioctl
2511Parameters: struct kvm_vcpu_init (in)
2512Returns: 0 on success; -1 on error
2513Errors:
2514  EINVAL:    the target is unknown, or the combination of features is invalid.
2515  ENOENT:    a features bit specified is unknown.
2516
2517This tells KVM what type of CPU to present to the guest, and what
2518optional features it should have.  This will cause a reset of the cpu
2519registers to their initial values.  If this is not called, KVM_RUN will
2520return ENOEXEC for that vcpu.
2521
2522Note that because some registers reflect machine topology, all vcpus
2523should be created before this ioctl is invoked.
2524
2525Userspace can call this function multiple times for a given vcpu, including
2526after the vcpu has been run. This will reset the vcpu to its initial
2527state. All calls to this function after the initial call must use the same
2528target and same set of feature flags, otherwise EINVAL will be returned.
2529
2530Possible features:
2531	- KVM_ARM_VCPU_POWER_OFF: Starts the CPU in a power-off state.
2532	  Depends on KVM_CAP_ARM_PSCI.  If not set, the CPU will be powered on
2533	  and execute guest code when KVM_RUN is called.
2534	- KVM_ARM_VCPU_EL1_32BIT: Starts the CPU in a 32bit mode.
2535	  Depends on KVM_CAP_ARM_EL1_32BIT (arm64 only).
2536	- KVM_ARM_VCPU_PSCI_0_2: Emulate PSCI v0.2 for the CPU.
2537	  Depends on KVM_CAP_ARM_PSCI_0_2.
2538
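The features are passed as bits in the "features" array of struct
kvm_vcpu_init.  As a sketch only: the struct layout (a 32-bit target
followed by seven 32-bit feature words) and the bit numbers below are
assumptions mirrored from the arm/arm64 uapi headers, and the helper name
is invented.

```c
#include <stdint.h>

#define SKETCH_FEATURE_WORDS 7	/* assumed size of features[] */

struct vcpu_init {		/* local mirror of struct kvm_vcpu_init */
	uint32_t target;
	uint32_t features[SKETCH_FEATURE_WORDS];
};

/* Assumed bit numbers of the features listed above. */
#define SKETCH_VCPU_POWER_OFF	0
#define SKETCH_VCPU_EL1_32BIT	1
#define SKETCH_VCPU_PSCI_0_2	2

/* Set one feature bit; returns -1 if the bit does not fit in the array. */
int vcpu_init_set_feature(struct vcpu_init *init, unsigned bit)
{
	if (bit >= 32 * SKETCH_FEATURE_WORDS)
		return -1;
	init->features[bit / 32] |= (uint32_t)1 << (bit % 32);
	return 0;
}
```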
2539
25404.83 KVM_ARM_PREFERRED_TARGET
2541
2542Capability: basic
2543Architectures: arm, arm64
2544Type: vm ioctl
Parameters: struct kvm_vcpu_init (out)
2546Returns: 0 on success; -1 on error
2547Errors:
2548  ENODEV:    no preferred target available for the host
2549
This queries KVM for the preferred CPU target type that can be emulated
by KVM on the underlying host.
2552
The ioctl returns a struct kvm_vcpu_init instance containing information
about the preferred CPU target type and recommended features for it.  The
2555kvm_vcpu_init->features bitmap returned will have feature bits set if
2556the preferred target recommends setting these features, but this is
2557not mandatory.
2558
The information returned by this ioctl can be used to prepare an instance
of struct kvm_vcpu_init for the KVM_ARM_VCPU_INIT ioctl, which will result
in a VCPU matching the underlying host.
2562
2563
25644.84 KVM_GET_REG_LIST
2565
2566Capability: basic
2567Architectures: arm, arm64, mips
2568Type: vcpu ioctl
2569Parameters: struct kvm_reg_list (in/out)
2570Returns: 0 on success; -1 on error
2571Errors:
2572  E2BIG:     the reg index list is too big to fit in the array specified by
2573             the user (the number required will be written into n).
2574
2575struct kvm_reg_list {
2576	__u64 n; /* number of registers in reg[] */
2577	__u64 reg[0];
2578};
2579
2580This ioctl returns the guest registers that are supported for the
2581KVM_GET_ONE_REG/KVM_SET_ONE_REG calls.
2582
2583
25844.85 KVM_ARM_SET_DEVICE_ADDR (deprecated)
2585
2586Capability: KVM_CAP_ARM_SET_DEVICE_ADDR
2587Architectures: arm, arm64
2588Type: vm ioctl
Parameters: struct kvm_arm_device_addr (in)
2590Returns: 0 on success, -1 on error
2591Errors:
2592  ENODEV: The device id is unknown
2593  ENXIO:  Device not supported on current system
2594  EEXIST: Address already set
2595  E2BIG:  Address outside guest physical address space
2596  EBUSY:  Address overlaps with other device range
2597
2598struct kvm_arm_device_addr {
2599	__u64 id;
2600	__u64 addr;
2601};
2602
2603Specify a device address in the guest's physical address space where guests
2604can access emulated or directly exposed devices, which the host kernel needs
2605to know about. The id field is an architecture specific identifier for a
2606specific device.
2607
2608ARM/arm64 divides the id field into two parts, a device id and an
2609address type id specific to the individual device.
2610
2611  bits:  | 63        ...       32 | 31    ...    16 | 15    ...    0 |
2612  field: |        0x00000000      |     device id   |  addr type id  |
2613
2614ARM/arm64 currently only require this when using the in-kernel GIC
2615support for the hardware VGIC features, using KVM_ARM_DEVICE_VGIC_V2
2616as the device id.  When setting the base address for the guest's
2617mapping of the VGIC virtual CPU and distributor interface, the ioctl
2618must be called after calling KVM_CREATE_IRQCHIP, but before calling
2619KVM_RUN on any of the VCPUs.  Calling this ioctl twice for any of the
2620base addresses will return -EEXIST.
2621
2622Note, this IOCTL is deprecated and the more flexible SET/GET_DEVICE_ATTR API
2623should be used instead.
2624
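The id layout shown above can be composed with a helper along these
lines.  This is a sketch; the function and macro names are hypothetical,
and only the 16-bit field positions are taken from the layout above.

```c
#include <stdint.h>

#define SKETCH_DEVICE_ID_SHIFT 16	/* device id sits in bits 31..16 */

/* Build the 64-bit id: bits 63..32 zero, device id, then addr type id. */
uint64_t arm_device_addr_id(uint16_t device_id, uint16_t addr_type)
{
	return ((uint64_t)device_id << SKETCH_DEVICE_ID_SHIFT) |
	       (uint64_t)addr_type;
}
```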
2625
26264.86 KVM_PPC_RTAS_DEFINE_TOKEN
2627
2628Capability: KVM_CAP_PPC_RTAS
2629Architectures: ppc
2630Type: vm ioctl
2631Parameters: struct kvm_rtas_token_args
2632Returns: 0 on success, -1 on error
2633
2634Defines a token value for a RTAS (Run Time Abstraction Services)
2635service in order to allow it to be handled in the kernel.  The
2636argument struct gives the name of the service, which must be the name
2637of a service that has a kernel-side implementation.  If the token
2638value is non-zero, it will be associated with that service, and
2639subsequent RTAS calls by the guest specifying that token will be
2640handled by the kernel.  If the token value is 0, then any token
2641associated with the service will be forgotten, and subsequent RTAS
2642calls by the guest for that service will be passed to userspace to be
2643handled.
2644
26454.87 KVM_SET_GUEST_DEBUG
2646
2647Capability: KVM_CAP_SET_GUEST_DEBUG
2648Architectures: x86, s390, ppc
2649Type: vcpu ioctl
2650Parameters: struct kvm_guest_debug (in)
2651Returns: 0 on success; -1 on error
2652
2653struct kvm_guest_debug {
2654       __u32 control;
2655       __u32 pad;
2656       struct kvm_guest_debug_arch arch;
2657};
2658
Set up the processor specific debug registers and configure the vcpu for
handling guest debug events. The structure has two parts. The first is a
control bitfield that indicates the types of debug events to handle
when running. Common control bits are:
2663
2664  - KVM_GUESTDBG_ENABLE:        guest debugging is enabled
2665  - KVM_GUESTDBG_SINGLESTEP:    the next run should single-step
2666
2667The top 16 bits of the control field are architecture specific control
2668flags which can include the following:
2669
2670  - KVM_GUESTDBG_USE_SW_BP:     using software breakpoints [x86]
2671  - KVM_GUESTDBG_USE_HW_BP:     using hardware breakpoints [x86, s390]
2672  - KVM_GUESTDBG_INJECT_DB:     inject DB type exception [x86]
2673  - KVM_GUESTDBG_INJECT_BP:     inject BP type exception [x86]
2674  - KVM_GUESTDBG_EXIT_PENDING:  trigger an immediate guest exit [s390]
2675
For example KVM_GUESTDBG_USE_SW_BP indicates that software breakpoints
are enabled in memory so we need to ensure breakpoint exceptions are
correctly trapped and the KVM run loop exits at the breakpoint rather
than running off into the normal guest vector. For KVM_GUESTDBG_USE_HW_BP
we need to ensure the guest vCPU's architecture specific registers are
updated to the correct (supplied) values.
2682
2683The second part of the structure is architecture specific and
2684typically contains a set of debug registers.
2685
When debug events exit the main run loop, they do so with the reason
KVM_EXIT_DEBUG, and the kvm_debug_exit_arch part of the kvm_run
structure contains architecture specific debug information.
2689
26904.88 KVM_GET_EMULATED_CPUID
2691
2692Capability: KVM_CAP_EXT_EMUL_CPUID
2693Architectures: x86
2694Type: system ioctl
2695Parameters: struct kvm_cpuid2 (in/out)
2696Returns: 0 on success, -1 on error
2697
2698struct kvm_cpuid2 {
2699	__u32 nent;
2700	__u32 flags;
2701	struct kvm_cpuid_entry2 entries[0];
2702};
2703
2704The member 'flags' is used for passing flags from userspace.
2705
2706#define KVM_CPUID_FLAG_SIGNIFCANT_INDEX		BIT(0)
2707#define KVM_CPUID_FLAG_STATEFUL_FUNC		BIT(1)
2708#define KVM_CPUID_FLAG_STATE_READ_NEXT		BIT(2)
2709
2710struct kvm_cpuid_entry2 {
2711	__u32 function;
2712	__u32 index;
2713	__u32 flags;
2714	__u32 eax;
2715	__u32 ebx;
2716	__u32 ecx;
2717	__u32 edx;
2718	__u32 padding[3];
2719};
2720
This ioctl returns x86 cpuid features which are emulated by
kvm. Userspace can use the information returned by this ioctl to query
which features are emulated by kvm instead of being present natively.
2724
2725Userspace invokes KVM_GET_EMULATED_CPUID by passing a kvm_cpuid2
2726structure with the 'nent' field indicating the number of entries in
2727the variable-size array 'entries'. If the number of entries is too low
2728to describe the cpu capabilities, an error (E2BIG) is returned. If the
2729number is too high, the 'nent' field is adjusted and an error (ENOMEM)
2730is returned. If the number is just right, the 'nent' field is adjusted
2731to the number of valid entries in the 'entries' array, which is then
2732filled.
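
The sizing of the variable-size 'entries' array can be sketched as follows.
This is only an illustration: the struct names are local mirrors of the
definitions above (real code would use the ones from <linux/kvm.h>), and the
helper names cpuid2_size() and cpuid2_alloc() are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

/* Local mirrors of the structures shown above (assumption: same layout). */
struct cpuid_entry2 {
	uint32_t function, index, flags;
	uint32_t eax, ebx, ecx, edx;
	uint32_t padding[3];
};

struct cpuid2 {
	uint32_t nent;
	uint32_t flags;
	struct cpuid_entry2 entries[];
};

/* Bytes needed for the header followed by 'nent' entries. */
size_t cpuid2_size(uint32_t nent)
{
	return sizeof(struct cpuid2) + (size_t)nent * sizeof(struct cpuid_entry2);
}

/*
 * Allocate a zeroed buffer and record its capacity in 'nent'.
 * Returns NULL on allocation failure; the caller frees the result.
 */
struct cpuid2 *cpuid2_alloc(uint32_t nent)
{
	struct cpuid2 *c = calloc(1, cpuid2_size(nent));

	if (c)
		c->nent = nent;
	return c;
}
```

A caller would pass the returned buffer to the ioctl and, on E2BIG, free it
and retry with a larger 'nent'.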
2733
2734The entries returned are the set CPUID bits of the respective features
2735which kvm emulates, as returned by the CPUID instruction, with unknown
2736or unsupported feature bits cleared.
2737
Features like x2apic, for example, may not be present in the host cpu
but are exposed by kvm in KVM_GET_SUPPORTED_CPUID because they can be
emulated efficiently and are thus not included here.
2741
2742The fields in each entry are defined as follows:
2743
2744  function: the eax value used to obtain the entry
2745  index: the ecx value used to obtain the entry (for entries that are
2746         affected by ecx)
2747  flags: an OR of zero or more of the following:
2748        KVM_CPUID_FLAG_SIGNIFCANT_INDEX:
2749           if the index field is valid
2750        KVM_CPUID_FLAG_STATEFUL_FUNC:
2751           if cpuid for this function returns different values for successive
2752           invocations; there will be several entries with the same function,
2753           all with this flag set
2754        KVM_CPUID_FLAG_STATE_READ_NEXT:
2755           for KVM_CPUID_FLAG_STATEFUL_FUNC entries, set if this entry is
2756           the first entry to be read by a cpu
2757   eax, ebx, ecx, edx: the values returned by the cpuid instruction for
2758         this function/index combination
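
As a rough sketch of how the 'function', 'index' and 'flags' fields work
together, the following lookup treats 'index' as significant only when the
entry carries the index flag. The struct is a local mirror of the definition
above, the flag macro copies the BIT(0) value shown above, and cpuid_find()
is a hypothetical helper. Stateful entries are not handled here.

```c
#include <stddef.h>
#include <stdint.h>

/* Same bit as KVM_CPUID_FLAG_SIGNIFCANT_INDEX above. */
#define CPUID_FLAG_SIGNIFCANT_INDEX (1u << 0)

/* Local mirror of struct kvm_cpuid_entry2 (assumption: same layout). */
struct cpuid_entry2 {
	uint32_t function, index, flags;
	uint32_t eax, ebx, ecx, edx;
	uint32_t padding[3];
};

/* Return the first entry matching function/index, or NULL if none does. */
const struct cpuid_entry2 *cpuid_find(const struct cpuid_entry2 *e,
				      uint32_t nent, uint32_t function,
				      uint32_t index)
{
	for (uint32_t i = 0; i < nent; i++) {
		if (e[i].function != function)
			continue;
		/* The index is only compared when the entry marks it valid. */
		if ((e[i].flags & CPUID_FLAG_SIGNIFCANT_INDEX) &&
		    e[i].index != index)
			continue;
		return &e[i];
	}
	return NULL;
}
```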
2759
27604.89 KVM_S390_MEM_OP
2761
2762Capability: KVM_CAP_S390_MEM_OP
2763Architectures: s390
2764Type: vcpu ioctl
2765Parameters: struct kvm_s390_mem_op (in)
2766Returns: = 0 on success,
2767         < 0 on generic error (e.g. -EFAULT or -ENOMEM),
2768         > 0 if an exception occurred while walking the page tables
2769
Read or write data from/to the logical (virtual) memory of a VCPU.
2771
2772Parameters are specified via the following structure:
2773
2774struct kvm_s390_mem_op {
2775	__u64 gaddr;		/* the guest address */
2776	__u64 flags;		/* flags */
2777	__u32 size;		/* amount of bytes */
2778	__u32 op;		/* type of operation */
2779	__u64 buf;		/* buffer in userspace */
2780	__u8 ar;		/* the access register number */
2781	__u8 reserved[31];	/* should be set to 0 */
2782};
2783
2784The type of operation is specified in the "op" field. It is either
2785KVM_S390_MEMOP_LOGICAL_READ for reading from logical memory space or
2786KVM_S390_MEMOP_LOGICAL_WRITE for writing to logical memory space. The
2787KVM_S390_MEMOP_F_CHECK_ONLY flag can be set in the "flags" field to check
2788whether the corresponding memory access would create an access exception
2789(without touching the data in the memory at the destination). In case an
2790access exception occurred while walking the MMU tables of the guest, the
2791ioctl returns a positive error number to indicate the type of exception.
2792This exception is also raised directly at the corresponding VCPU if the
2793flag KVM_S390_MEMOP_F_INJECT_EXCEPTION is set in the "flags" field.
2794
2795The start address of the memory region has to be specified in the "gaddr"
2796field, and the length of the region in the "size" field. "buf" is the buffer
2797supplied by the userspace application where the read data should be written
2798to for KVM_S390_MEMOP_LOGICAL_READ, or where the data that should be written
2799is stored for a KVM_S390_MEMOP_LOGICAL_WRITE. "buf" is unused and can be NULL
2800when KVM_S390_MEMOP_F_CHECK_ONLY is specified. "ar" designates the access
2801register number to be used.
2802
2803The "reserved" field is meant for future extensions. It is not used by
2804KVM with the currently defined set of flags.
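
A minimal sketch of preparing a request and interpreting the three classes
of return value described above. The struct is a local mirror of
kvm_s390_mem_op, and memop_init(), memop_classify() and the enum are
hypothetical names introduced for illustration only.

```c
#include <stdint.h>
#include <string.h>

/* Local mirror of struct kvm_s390_mem_op (assumption: same layout). */
struct s390_mem_op {
	uint64_t gaddr;
	uint64_t flags;
	uint32_t size;
	uint32_t op;
	uint64_t buf;
	uint8_t ar;
	uint8_t reserved[31];
};

enum memop_status { MEMOP_OK, MEMOP_GENERIC_ERROR, MEMOP_GUEST_EXCEPTION };

/* Fill a request; the memset() also clears the "reserved" bytes. */
void memop_init(struct s390_mem_op *m, uint64_t gaddr, void *buf,
		uint32_t size, uint32_t op, uint64_t flags, uint8_t ar)
{
	memset(m, 0, sizeof(*m));
	m->gaddr = gaddr;
	m->flags = flags;
	m->size = size;
	m->op = op;
	m->buf = (uint64_t)(uintptr_t)buf;
	m->ar = ar;
}

/* Map the ioctl return value onto the three documented outcomes. */
enum memop_status memop_classify(int ret)
{
	if (ret == 0)
		return MEMOP_OK;
	return ret > 0 ? MEMOP_GUEST_EXCEPTION : MEMOP_GENERIC_ERROR;
}
```

Here 'op' would be KVM_S390_MEMOP_LOGICAL_READ or
KVM_S390_MEMOP_LOGICAL_WRITE, and 'buf' may be NULL when
KVM_S390_MEMOP_F_CHECK_ONLY is set in 'flags'.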
2805
28064.90 KVM_S390_GET_SKEYS
2807
2808Capability: KVM_CAP_S390_SKEYS
2809Architectures: s390
2810Type: vm ioctl
2811Parameters: struct kvm_s390_skeys
Returns: 0 on success, KVM_S390_GET_SKEYS_NONE if guest is not using storage
         keys, negative value on error
2814
2815This ioctl is used to get guest storage key values on the s390
2816architecture. The ioctl takes parameters via the kvm_s390_skeys struct.
2817
2818struct kvm_s390_skeys {
2819	__u64 start_gfn;
2820	__u64 count;
2821	__u64 skeydata_addr;
2822	__u32 flags;
2823	__u32 reserved[9];
2824};
2825
2826The start_gfn field is the number of the first guest frame whose storage keys
2827you want to get.
2828
2829The count field is the number of consecutive frames (starting from start_gfn)
2830whose storage keys to get. The count field must be at least 1 and the maximum
2831allowed value is defined as KVM_S390_SKEYS_ALLOC_MAX. Values outside this range
2832will cause the ioctl to return -EINVAL.
2833
2834The skeydata_addr field is the address to a buffer large enough to hold count
2835bytes. This buffer will be filled with storage key data by the ioctl.
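
Because 'count' is capped per call, a range larger than the maximum has to
be fetched in several calls. The arithmetic can be sketched as below. Both
helper names are hypothetical, and 'max' stands in for
KVM_S390_SKEYS_ALLOC_MAX, whose value is not given here. 'max' must be at
least 1.

```c
#include <stdint.h>

/* Frames to request in the next call: never more than 'max'. */
uint64_t skeys_chunk(uint64_t remaining, uint64_t max)
{
	return remaining < max ? remaining : max;
}

/* Number of calls needed to cover 'total' frames (0 when total is 0). */
uint64_t skeys_calls_needed(uint64_t total, uint64_t max)
{
	return total / max + (total % max != 0);
}
```

Each call needs a buffer of exactly 'count' bytes, one byte per frame, and
the next call starts at start_gfn plus the count just processed.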
2836
28374.91 KVM_S390_SET_SKEYS
2838
2839Capability: KVM_CAP_S390_SKEYS
2840Architectures: s390
2841Type: vm ioctl
2842Parameters: struct kvm_s390_skeys
2843Returns: 0 on success, negative value on error
2844
2845This ioctl is used to set guest storage key values on the s390
2846architecture. The ioctl takes parameters via the kvm_s390_skeys struct.
2847See section on KVM_S390_GET_SKEYS for struct definition.
2848
2849The start_gfn field is the number of the first guest frame whose storage keys
2850you want to set.
2851
2852The count field is the number of consecutive frames (starting from start_gfn)
whose storage keys to set. The count field must be at least 1 and the maximum
2854allowed value is defined as KVM_S390_SKEYS_ALLOC_MAX. Values outside this range
2855will cause the ioctl to return -EINVAL.
2856
2857The skeydata_addr field is the address to a buffer containing count bytes of
2858storage keys. Each byte in the buffer will be set as the storage key for a
2859single frame starting at start_gfn for count frames.
2860
2861Note: If any architecturally invalid key value is found in the given data then
2862the ioctl will return -EINVAL.
2863
28644.92 KVM_S390_IRQ
2865
2866Capability: KVM_CAP_S390_INJECT_IRQ
2867Architectures: s390
2868Type: vcpu ioctl
2869Parameters: struct kvm_s390_irq (in)
2870Returns: 0 on success, -1 on error
2871Errors:
2872  EINVAL: interrupt type is invalid
2873          type is KVM_S390_SIGP_STOP and flag parameter is invalid value
2874          type is KVM_S390_INT_EXTERNAL_CALL and code is bigger
2875            than the maximum of VCPUs
2876  EBUSY:  type is KVM_S390_SIGP_SET_PREFIX and vcpu is not stopped
2877          type is KVM_S390_SIGP_STOP and a stop irq is already pending
2878          type is KVM_S390_INT_EXTERNAL_CALL and an external call interrupt
2879            is already pending
2880
Allows userspace to inject an interrupt into the guest.

Using struct kvm_s390_irq as a parameter allows injecting an additional
payload, which is not possible via KVM_S390_INTERRUPT.
2886
2887Interrupt parameters are passed via kvm_s390_irq:
2888
2889struct kvm_s390_irq {
2890	__u64 type;
2891	union {
2892		struct kvm_s390_io_info io;
2893		struct kvm_s390_ext_info ext;
2894		struct kvm_s390_pgm_info pgm;
2895		struct kvm_s390_emerg_info emerg;
2896		struct kvm_s390_extcall_info extcall;
2897		struct kvm_s390_prefix_info prefix;
2898		struct kvm_s390_stop_info stop;
2899		struct kvm_s390_mchk_info mchk;
2900		char reserved[64];
2901	} u;
2902};
2903
2904type can be one of the following:
2905
2906KVM_S390_SIGP_STOP - sigp stop; parameter in .stop
2907KVM_S390_PROGRAM_INT - program check; parameters in .pgm
2908KVM_S390_SIGP_SET_PREFIX - sigp set prefix; parameters in .prefix
2909KVM_S390_RESTART - restart; no parameters
2910KVM_S390_INT_CLOCK_COMP - clock comparator interrupt; no parameters
2911KVM_S390_INT_CPU_TIMER - CPU timer interrupt; no parameters
2912KVM_S390_INT_EMERGENCY - sigp emergency; parameters in .emerg
2913KVM_S390_INT_EXTERNAL_CALL - sigp external call; parameters in .extcall
2914KVM_S390_MCHK - machine check interrupt; parameters in .mchk
2915
2916
2917Note that the vcpu ioctl is asynchronous to vcpu execution.
2918
29194.94 KVM_S390_GET_IRQ_STATE
2920
2921Capability: KVM_CAP_S390_IRQ_STATE
2922Architectures: s390
2923Type: vcpu ioctl
2924Parameters: struct kvm_s390_irq_state (out)
Returns: >= 0: number of bytes copied into buffer,
2926         -EINVAL if buffer size is 0,
2927         -ENOBUFS if buffer size is too small to fit all pending interrupts,
2928         -EFAULT if the buffer address was invalid
2929
2930This ioctl allows userspace to retrieve the complete state of all currently
2931pending interrupts in a single buffer. Use cases include migration
2932and introspection. The parameter structure contains the address of a
2933userspace buffer and its length:
2934
2935struct kvm_s390_irq_state {
2936	__u64 buf;
2937	__u32 flags;
2938	__u32 len;
2939	__u32 reserved[4];
2940};
2941
2942Userspace passes in the above struct and for each pending interrupt a
2943struct kvm_s390_irq is copied to the provided buffer.
2944
2945If -ENOBUFS is returned the buffer provided was too small and userspace
2946may retry with a bigger buffer.
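
Since the buffer is filled with whole struct kvm_s390_irq entries, the
number of pending interrupts follows from the returned byte count. A small
sketch, using a local stand-in struct that assumes the 64-byte union shown
in section 4.92; irq_state_entries() is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Stand-in with the size of struct kvm_s390_irq: a 64-bit type followed
 * by a 64-byte union (assumption based on the definition in 4.92).
 */
struct s390_irq {
	uint64_t type;
	union {
		char reserved[64];
	} u;
};

/*
 * Convert a KVM_S390_GET_IRQ_STATE return value into an entry count.
 * Returns -1 for a negative value or one that is not a whole number
 * of entries.
 */
long irq_state_entries(long ret)
{
	if (ret < 0 || (size_t)ret % sizeof(struct s390_irq) != 0)
		return -1;
	return ret / (long)sizeof(struct s390_irq);
}
```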
2947
29484.95 KVM_S390_SET_IRQ_STATE
2949
2950Capability: KVM_CAP_S390_IRQ_STATE
2951Architectures: s390
2952Type: vcpu ioctl
2953Parameters: struct kvm_s390_irq_state (in)
2954Returns: 0 on success,
2955         -EFAULT if the buffer address was invalid,
2956         -EINVAL for an invalid buffer length (see below),
2957         -EBUSY if there were already interrupts pending,
2958         errors occurring when actually injecting the
2959          interrupt. See KVM_S390_IRQ.
2960
2961This ioctl allows userspace to set the complete state of all cpu-local
2962interrupts currently pending for the vcpu. It is intended for restoring
2963interrupt state after a migration. The input parameter is a userspace buffer
2964containing a struct kvm_s390_irq_state:
2965
struct kvm_s390_irq_state {
	__u64 buf;
	__u32 flags;
	__u32 len;
	__u32 reserved[4];
};
2971
2972The userspace memory referenced by buf contains a struct kvm_s390_irq
2973for each interrupt to be injected into the guest.
2974If one of the interrupts could not be injected for some reason the
2975ioctl aborts.
2976
2977len must be a multiple of sizeof(struct kvm_s390_irq). It must be > 0
2978and it must not exceed (max_vcpus + 32) * sizeof(struct kvm_s390_irq),
2979which is the maximum number of possibly pending cpu-local interrupts.
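
The three constraints on len can be checked in userspace before issuing the
ioctl. The following is a sketch only: the stand-in struct assumes the
64-byte union shown in section 4.92, irq_state_len_valid() is a
hypothetical helper, and 'max_vcpus' is whatever limit the caller has
obtained for the VM.

```c
#include <stdint.h>

/* Stand-in with the size of struct kvm_s390_irq (assumption, see 4.92). */
struct s390_irq {
	uint64_t type;
	union {
		char reserved[64];
	} u;
};

/* Non-zero if 'len' is > 0, a whole number of entries, and within the cap. */
int irq_state_len_valid(uint32_t len, uint32_t max_vcpus)
{
	const uint64_t entry = sizeof(struct s390_irq);
	const uint64_t limit = ((uint64_t)max_vcpus + 32) * entry;

	return len > 0 && len % entry == 0 && len <= limit;
}
```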
2980
29815. The kvm_run structure
2982------------------------
2983
2984Application code obtains a pointer to the kvm_run structure by
2985mmap()ing a vcpu fd.  From that point, application code can control
2986execution by changing fields in kvm_run prior to calling the KVM_RUN
2987ioctl, and obtain information about the reason KVM_RUN returned by
2988looking up structure members.
2989
2990struct kvm_run {
2991	/* in */
2992	__u8 request_interrupt_window;
2993
2994Request that KVM_RUN return when it becomes possible to inject external
2995interrupts into the guest.  Useful in conjunction with KVM_INTERRUPT.
2996
2997	__u8 padding1[7];
2998
2999	/* out */
3000	__u32 exit_reason;
3001
3002When KVM_RUN has returned successfully (return value 0), this informs
3003application code why KVM_RUN has returned.  Allowable values for this
3004field are detailed below.
3005
3006	__u8 ready_for_interrupt_injection;
3007
3008If request_interrupt_window has been specified, this field indicates
3009an interrupt can be injected now with KVM_INTERRUPT.
3010
3011	__u8 if_flag;
3012
3013The value of the current interrupt flag.  Only valid if in-kernel
3014local APIC is not used.
3015
3016	__u8 padding2[2];
3017
3018	/* in (pre_kvm_run), out (post_kvm_run) */
3019	__u64 cr8;
3020
3021The value of the cr8 register.  Only valid if in-kernel local APIC is
3022not used.  Both input and output.
3023
3024	__u64 apic_base;
3025
3026The value of the APIC BASE msr.  Only valid if in-kernel local
3027APIC is not used.  Both input and output.
3028
3029	union {
3030		/* KVM_EXIT_UNKNOWN */
3031		struct {
3032			__u64 hardware_exit_reason;
3033		} hw;
3034
3035If exit_reason is KVM_EXIT_UNKNOWN, the vcpu has exited due to unknown
3036reasons.  Further architecture-specific information is available in
3037hardware_exit_reason.
3038
3039		/* KVM_EXIT_FAIL_ENTRY */
3040		struct {
3041			__u64 hardware_entry_failure_reason;
3042		} fail_entry;
3043
3044If exit_reason is KVM_EXIT_FAIL_ENTRY, the vcpu could not be run due
3045to unknown reasons.  Further architecture-specific information is
3046available in hardware_entry_failure_reason.
3047
3048		/* KVM_EXIT_EXCEPTION */
3049		struct {
3050			__u32 exception;
3051			__u32 error_code;
3052		} ex;
3053
3054Unused.
3055
3056		/* KVM_EXIT_IO */
3057		struct {
3058#define KVM_EXIT_IO_IN  0
3059#define KVM_EXIT_IO_OUT 1
3060			__u8 direction;
3061			__u8 size; /* bytes */
3062			__u16 port;
3063			__u32 count;
3064			__u64 data_offset; /* relative to kvm_run start */
3065		} io;
3066
3067If exit_reason is KVM_EXIT_IO, then the vcpu has
3068executed a port I/O instruction which could not be satisfied by kvm.
3069data_offset describes where the data is located (KVM_EXIT_IO_OUT) or
3070where kvm expects application code to place the data for the next
3071KVM_RUN invocation (KVM_EXIT_IO_IN).  Data format is a packed array.
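
Locating an item in the packed array can be sketched as below. The struct
mirrors the 'io' member above, 'run' is assumed to be the start of the
mmap()ed kvm_run area, and io_item() is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>

/* Local mirror of the 'io' member of kvm_run shown above. */
struct exit_io {
	uint8_t direction;
	uint8_t size;		/* bytes per item */
	uint16_t port;
	uint32_t count;		/* number of items */
	uint64_t data_offset;	/* relative to kvm_run start */
};

/* Address of item 'i' in the packed array, or NULL if 'i' is out of range. */
void *io_item(void *run, const struct exit_io *io, uint32_t i)
{
	if (i >= io->count)
		return NULL;
	return (uint8_t *)run + io->data_offset + (size_t)i * io->size;
}
```

For KVM_EXIT_IO_OUT the items are read from these addresses; for
KVM_EXIT_IO_IN they are written there before the next KVM_RUN.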
3072
3073		struct {
3074			struct kvm_debug_exit_arch arch;
3075		} debug;
3076
3077Unused.
3078
3079		/* KVM_EXIT_MMIO */
3080		struct {
3081			__u64 phys_addr;
3082			__u8  data[8];
3083			__u32 len;
3084			__u8  is_write;
3085		} mmio;
3086
3087If exit_reason is KVM_EXIT_MMIO, then the vcpu has
3088executed a memory-mapped I/O instruction which could not be satisfied
3089by kvm.  The 'data' member contains the written data if 'is_write' is
3090true, and should be filled by application code otherwise.
3091
3092The 'data' member contains, in its first 'len' bytes, the value as it would
3093appear if the VCPU performed a load or store of the appropriate width directly
3094to the byte array.
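
One way to read this rule is that userspace accesses 'data' with a plain
load or store of 'len' bytes. The sketch below does that with memcpy(). It
assumes the guest uses the same byte order as the host and supports only
widths of 1, 2, 4 and 8 bytes; mmio_store() and mmio_load() are
hypothetical helpers.

```c
#include <stdint.h>
#include <string.h>

/* Store 'val' as a 'len'-byte access. Returns 0, or -1 for a bad width. */
int mmio_store(uint8_t data[8], uint32_t len, uint64_t val)
{
	uint8_t b = (uint8_t)val;
	uint16_t h = (uint16_t)val;
	uint32_t w = (uint32_t)val;

	switch (len) {
	case 1: memcpy(data, &b, 1); return 0;
	case 2: memcpy(data, &h, 2); return 0;
	case 4: memcpy(data, &w, 4); return 0;
	case 8: memcpy(data, &val, 8); return 0;
	default: return -1;
	}
}

/* Load a 'len'-byte value into *val. Returns 0, or -1 for a bad width. */
int mmio_load(const uint8_t data[8], uint32_t len, uint64_t *val)
{
	uint8_t b;
	uint16_t h;
	uint32_t w;

	switch (len) {
	case 1: memcpy(&b, data, 1); *val = b; return 0;
	case 2: memcpy(&h, data, 2); *val = h; return 0;
	case 4: memcpy(&w, data, 4); *val = w; return 0;
	case 8: memcpy(val, data, 8); return 0;
	default: return -1;
	}
}
```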
3095
NOTE: For KVM_EXIT_IO, KVM_EXIT_MMIO, KVM_EXIT_OSI, KVM_EXIT_PAPR_HCALL and
KVM_EXIT_EPR the corresponding operations are complete (and guest state is
consistent) only after userspace has re-entered the kernel with KVM_RUN.  The
kernel side will first finish incomplete operations and then check for pending
signals.  Userspace can re-enter the guest with an unmasked signal pending to
complete pending operations.
3103
3104		/* KVM_EXIT_HYPERCALL */
3105		struct {
3106			__u64 nr;
3107			__u64 args[6];
3108			__u64 ret;
3109			__u32 longmode;
3110			__u32 pad;
3111		} hypercall;
3112
3113Unused.  This was once used for 'hypercall to userspace'.  To implement
3114such functionality, use KVM_EXIT_IO (x86) or KVM_EXIT_MMIO (all except s390).
3115Note KVM_EXIT_IO is significantly faster than KVM_EXIT_MMIO.
3116
3117		/* KVM_EXIT_TPR_ACCESS */
3118		struct {
3119			__u64 rip;
3120			__u32 is_write;
3121			__u32 pad;
3122		} tpr_access;
3123
3124To be documented (KVM_TPR_ACCESS_REPORTING).
3125
3126		/* KVM_EXIT_S390_SIEIC */
3127		struct {
3128			__u8 icptcode;
3129			__u64 mask; /* psw upper half */
3130			__u64 addr; /* psw lower half */
3131			__u16 ipa;
3132			__u32 ipb;
3133		} s390_sieic;
3134
3135s390 specific.
3136
3137		/* KVM_EXIT_S390_RESET */
3138#define KVM_S390_RESET_POR       1
3139#define KVM_S390_RESET_CLEAR     2
3140#define KVM_S390_RESET_SUBSYSTEM 4
3141#define KVM_S390_RESET_CPU_INIT  8
3142#define KVM_S390_RESET_IPL       16
3143		__u64 s390_reset_flags;
3144
3145s390 specific.
3146
3147		/* KVM_EXIT_S390_UCONTROL */
3148		struct {
3149			__u64 trans_exc_code;
3150			__u32 pgm_code;
3151		} s390_ucontrol;
3152
s390 specific. A page fault has occurred for a user controlled virtual
machine (KVM_VM_S390_UCONTROL) on its host page table that cannot be
resolved by the kernel.
The program code and the translation exception code that were placed
in the cpu's lowcore are presented here as defined by the z Architecture
Principles of Operation Book in the Chapter for Dynamic Address Translation
(DAT).
3160
3161		/* KVM_EXIT_DCR */
3162		struct {
3163			__u32 dcrn;
3164			__u32 data;
3165			__u8  is_write;
3166		} dcr;
3167
3168Deprecated - was used for 440 KVM.
3169
3170		/* KVM_EXIT_OSI */
3171		struct {
3172			__u64 gprs[32];
3173		} osi;
3174
3175MOL uses a special hypercall interface it calls 'OSI'. To enable it, we catch
3176hypercalls and exit with this exit struct that contains all the guest gprs.
3177
3178If exit_reason is KVM_EXIT_OSI, then the vcpu has triggered such a hypercall.
3179Userspace can now handle the hypercall and when it's done modify the gprs as
3180necessary. Upon guest entry all guest GPRs will then be replaced by the values
3181in this struct.
3182
3183		/* KVM_EXIT_PAPR_HCALL */
3184		struct {
3185			__u64 nr;
3186			__u64 ret;
3187			__u64 args[9];
3188		} papr_hcall;
3189
3190This is used on 64-bit PowerPC when emulating a pSeries partition,
3191e.g. with the 'pseries' machine type in qemu.  It occurs when the
3192guest does a hypercall using the 'sc 1' instruction.  The 'nr' field
3193contains the hypercall number (from the guest R3), and 'args' contains
3194the arguments (from the guest R4 - R12).  Userspace should put the
3195return code in 'ret' and any extra returned values in args[].
3196The possible hypercalls are defined in the Power Architecture Platform
3197Requirements (PAPR) document available from www.power.org (free
3198developer registration required to access it).
3199
3200		/* KVM_EXIT_S390_TSCH */
3201		struct {
3202			__u16 subchannel_id;
3203			__u16 subchannel_nr;
3204			__u32 io_int_parm;
3205			__u32 io_int_word;
3206			__u32 ipb;
3207			__u8 dequeued;
3208		} s390_tsch;
3209
3210s390 specific. This exit occurs when KVM_CAP_S390_CSS_SUPPORT has been enabled
3211and TEST SUBCHANNEL was intercepted. If dequeued is set, a pending I/O
3212interrupt for the target subchannel has been dequeued and subchannel_id,
3213subchannel_nr, io_int_parm and io_int_word contain the parameters for that
3214interrupt. ipb is needed for instruction parameter decoding.
3215
3216		/* KVM_EXIT_EPR */
3217		struct {
3218			__u32 epr;
3219		} epr;
3220
On FSL BookE PowerPC chips, the interrupt controller has a fast
3222interrupt acknowledge path to the core. When the core successfully
3223delivers an interrupt, it automatically populates the EPR register with
3224the interrupt vector number and acknowledges the interrupt inside
3225the interrupt controller.
3226
3227In case the interrupt controller lives in user space, we need to do
3228the interrupt acknowledge cycle through it to fetch the next to be
3229delivered interrupt vector using this exit.
3230
It gets triggered whenever KVM_CAP_PPC_EPR is enabled and an external
interrupt has just been delivered into the guest. User space should put
the acknowledged interrupt vector into the 'epr' field.
3234
3235		/* KVM_EXIT_SYSTEM_EVENT */
3236		struct {
3237#define KVM_SYSTEM_EVENT_SHUTDOWN       1
3238#define KVM_SYSTEM_EVENT_RESET          2
3239			__u32 type;
3240			__u64 flags;
3241		} system_event;
3242
3243If exit_reason is KVM_EXIT_SYSTEM_EVENT then the vcpu has triggered
3244a system-level event using some architecture specific mechanism (hypercall
3245or some special instruction). In case of ARM/ARM64, this is triggered using
3246HVC instruction based PSCI call from the vcpu. The 'type' field describes
3247the system-level event type. The 'flags' field describes architecture
3248specific flags for the system-level event.
3249
3250Valid values for 'type' are:
3251  KVM_SYSTEM_EVENT_SHUTDOWN -- the guest has requested a shutdown of the
3252   VM. Userspace is not obliged to honour this, and if it does honour
   this does not need to destroy the VM synchronously (i.e. it may call
3254   KVM_RUN again before shutdown finally occurs).
3255  KVM_SYSTEM_EVENT_RESET -- the guest has requested a reset of the VM.
3256   As with SHUTDOWN, userspace can choose to ignore the request, or
3257   to schedule the reset to occur in the future and may call KVM_RUN again.
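
Since userspace may defer or ignore either request, a handler typically
just records what to do and lets the run loop continue. A minimal sketch:
the two macros copy the values shown above, while the enum and
on_system_event() are hypothetical names.

```c
#include <stdint.h>

/* Same values as KVM_SYSTEM_EVENT_SHUTDOWN and KVM_SYSTEM_EVENT_RESET. */
#define SYSTEM_EVENT_SHUTDOWN 1
#define SYSTEM_EVENT_RESET    2

enum vm_action { VM_CONTINUE, VM_SCHEDULE_SHUTDOWN, VM_SCHEDULE_RESET };

/* Decide what to schedule for a given system_event.type. */
enum vm_action on_system_event(uint32_t type)
{
	switch (type) {
	case SYSTEM_EVENT_SHUTDOWN:
		return VM_SCHEDULE_SHUTDOWN;
	case SYSTEM_EVENT_RESET:
		return VM_SCHEDULE_RESET;
	default:
		/* Unknown types are ignored in this sketch. */
		return VM_CONTINUE;
	}
}
```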
3258
3259		/* Fix the size of the union. */
3260		char padding[256];
3261	};
3262
3263	/*
3264	 * shared registers between kvm and userspace.
3265	 * kvm_valid_regs specifies the register classes set by the host
	 * kvm_dirty_regs specifies the register classes dirtied by userspace
3267	 * struct kvm_sync_regs is architecture specific, as well as the
3268	 * bits for kvm_valid_regs and kvm_dirty_regs
3269	 */
3270	__u64 kvm_valid_regs;
3271	__u64 kvm_dirty_regs;
3272	union {
3273		struct kvm_sync_regs regs;
3274		char padding[1024];
3275	} s;
3276
3277If KVM_CAP_SYNC_REGS is defined, these fields allow userspace to access
3278certain guest registers without having to call SET/GET_*REGS. Thus we can
3279avoid some system call overhead if userspace has to handle the exit.
3280Userspace can query the validity of the structure by checking
3281kvm_valid_regs for specific bits. These bits are architecture specific
and usually define the validity of a group of registers (e.g. one bit
for general purpose registers).
3284
3285Please note that the kernel is allowed to use the kvm_run structure as the
3286primary storage for certain register types. Therefore, the kernel may use the
3287values in kvm_run even if the corresponding bit in kvm_dirty_regs is not set.
3288
3289};
3290
3291
3292
32936. Capabilities that can be enabled on vCPUs
3294--------------------------------------------
3295
3296There are certain capabilities that change the behavior of the virtual CPU or
3297the virtual machine when enabled. To enable them, please see section 4.37.
3298Below you can find a list of capabilities and what their effect on the vCPU or
3299the virtual machine is when enabling them.
3300
3301The following information is provided along with the description:
3302
  Architectures: which instruction set architectures provide this capability.
3304      x86 includes both i386 and x86_64.
3305
3306  Target: whether this is a per-vcpu or per-vm capability.
3307
3308  Parameters: what parameters are accepted by the capability.
3309
3310  Returns: the return value.  General error numbers (EBADF, ENOMEM, EINVAL)
3311      are not detailed, but errors with specific meanings are.
3312
3313
33146.1 KVM_CAP_PPC_OSI
3315
3316Architectures: ppc
3317Target: vcpu
3318Parameters: none
3319Returns: 0 on success; -1 on error
3320
3321This capability enables interception of OSI hypercalls that otherwise would
3322be treated as normal system calls to be injected into the guest. OSI hypercalls
3323were invented by Mac-on-Linux to have a standardized communication mechanism
3324between the guest and the host.
3325
3326When this capability is enabled, KVM_EXIT_OSI can occur.
3327
3328
33296.2 KVM_CAP_PPC_PAPR
3330
3331Architectures: ppc
3332Target: vcpu
3333Parameters: none
3334Returns: 0 on success; -1 on error
3335
3336This capability enables interception of PAPR hypercalls. PAPR hypercalls are
3337done using the hypercall instruction "sc 1".
3338
3339It also sets the guest privilege level to "supervisor" mode. Usually the guest
3340runs in "hypervisor" privilege mode with a few missing features.
3341
3342In addition to the above, it changes the semantics of SDR1. In this mode, the
3343HTAB address part of SDR1 contains an HVA instead of a GPA, as PAPR keeps the
3344HTAB invisible to the guest.
3345
3346When this capability is enabled, KVM_EXIT_PAPR_HCALL can occur.
3347
3348
33496.3 KVM_CAP_SW_TLB
3350
3351Architectures: ppc
3352Target: vcpu
3353Parameters: args[0] is the address of a struct kvm_config_tlb
3354Returns: 0 on success; -1 on error
3355
3356struct kvm_config_tlb {
3357	__u64 params;
3358	__u64 array;
3359	__u32 mmu_type;
3360	__u32 array_len;
3361};
3362
3363Configures the virtual CPU's TLB array, establishing a shared memory area
3364between userspace and KVM.  The "params" and "array" fields are userspace
addresses of mmu-type-specific data structures.  The "array_len" field is a
safety mechanism, and should be set to the size in bytes of the memory that
3367userspace has reserved for the array.  It must be at least the size dictated
3368by "mmu_type" and "params".
3369
3370While KVM_RUN is active, the shared region is under control of KVM.  Its
3371contents are undefined, and any modification by userspace results in
3372boundedly undefined behavior.
3373
3374On return from KVM_RUN, the shared region will reflect the current state of
3375the guest's TLB.  If userspace makes any changes, it must call KVM_DIRTY_TLB
3376to tell KVM which entries have been changed, prior to calling KVM_RUN again
3377on this vcpu.
3378
3379For mmu types KVM_MMU_FSL_BOOKE_NOHV and KVM_MMU_FSL_BOOKE_HV:
3380 - The "params" field is of type "struct kvm_book3e_206_tlb_params".
3381 - The "array" field points to an array of type "struct
3382   kvm_book3e_206_tlb_entry".
3383 - The array consists of all entries in the first TLB, followed by all
3384   entries in the second TLB.
3385 - Within a TLB, entries are ordered first by increasing set number.  Within a
3386   set, entries are ordered by way (increasing ESEL).
3387 - The hash for determining set number in TLB0 is: (MAS2 >> 12) & (num_sets - 1)
3388   where "num_sets" is the tlb_sizes[] value divided by the tlb_ways[] value.
3389 - The tsize field of mas1 shall be set to 4K on TLB0, even though the
3390   hardware ignores this value for TLB0.
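
The TLB0 ordering rules above can be turned into an array index as
sketched below. The helper names are hypothetical, 'tlb_size' and
'tlb_ways' stand for the tlb_sizes[] and tlb_ways[] values of TLB0, and
the number of sets is assumed to be a power of two so that the mask works.

```c
#include <stdint.h>

/* Set number in TLB0: (MAS2 >> 12) & (num_sets - 1). */
uint32_t tlb0_set(uint64_t mas2, uint32_t tlb_size, uint32_t tlb_ways)
{
	uint32_t num_sets = tlb_size / tlb_ways;

	return (uint32_t)((mas2 >> 12) & (num_sets - 1));
}

/*
 * Index into the shared array for the TLB0 entry holding 'mas2' in the
 * given way: entries are ordered by set first, then by way in the set.
 */
uint32_t tlb0_index(uint64_t mas2, uint32_t way, uint32_t tlb_size,
		    uint32_t tlb_ways)
{
	return tlb0_set(mas2, tlb_size, tlb_ways) * tlb_ways + way;
}
```

For example, with 512 entries and 4 ways there are 128 sets, so a MAS2
value of 0x12345000 falls into set 0x45.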
3391
33926.4 KVM_CAP_S390_CSS_SUPPORT
3393
3394Architectures: s390
3395Target: vcpu
3396Parameters: none
3397Returns: 0 on success; -1 on error
3398
3399This capability enables support for handling of channel I/O instructions.
3400
3401TEST PENDING INTERRUPTION and the interrupt portion of TEST SUBCHANNEL are
3402handled in-kernel, while the other I/O instructions are passed to userspace.
3403
3404When this capability is enabled, KVM_EXIT_S390_TSCH will occur on TEST
3405SUBCHANNEL intercepts.
3406
3407Note that even though this capability is enabled per-vcpu, the complete
3408virtual machine is affected.
3409
34106.5 KVM_CAP_PPC_EPR
3411
3412Architectures: ppc
3413Target: vcpu
3414Parameters: args[0] defines whether the proxy facility is active
3415Returns: 0 on success; -1 on error
3416
3417This capability enables or disables the delivery of interrupts through the
3418external proxy facility.
3419
3420When enabled (args[0] != 0), every time the guest gets an external interrupt
3421delivered, it automatically exits into user space with a KVM_EXIT_EPR exit
3422to receive the topmost interrupt vector.
3423
3424When disabled (args[0] == 0), behavior is as if this facility is unsupported.
3425
3426When this capability is enabled, KVM_EXIT_EPR can occur.
3427
34286.6 KVM_CAP_IRQ_MPIC
3429
3430Architectures: ppc
3431Parameters: args[0] is the MPIC device fd
3432            args[1] is the MPIC CPU number for this vcpu
3433
3434This capability connects the vcpu to an in-kernel MPIC device.
3435
34366.7 KVM_CAP_IRQ_XICS
3437
3438Architectures: ppc
3439Target: vcpu
3440Parameters: args[0] is the XICS device fd
3441            args[1] is the XICS CPU number (server ID) for this vcpu
3442
3443This capability connects the vcpu to an in-kernel XICS device.
3444
34456.8 KVM_CAP_S390_IRQCHIP
3446
3447Architectures: s390
3448Target: vm
3449Parameters: none
3450
3451This capability enables the in-kernel irqchip for s390. Please refer to
3452"4.24 KVM_CREATE_IRQCHIP" for details.
3453
34546.9 KVM_CAP_MIPS_FPU
3455
3456Architectures: mips
3457Target: vcpu
3458Parameters: args[0] is reserved for future use (should be 0).
3459
3460This capability allows the use of the host Floating Point Unit by the guest. It
3461allows the Config1.FP bit to be set to enable the FPU in the guest. Once this is
3462done the KVM_REG_MIPS_FPR_* and KVM_REG_MIPS_FCR_* registers can be accessed
3463(depending on the current guest FPU register mode), and the Status.FR,
3464Config5.FRE bits are accessible via the KVM API and also from the guest,
3465depending on them being supported by the FPU.
3466
34676.10 KVM_CAP_MIPS_MSA
3468
3469Architectures: mips
3470Target: vcpu
3471Parameters: args[0] is reserved for future use (should be 0).
3472
3473This capability allows the use of the MIPS SIMD Architecture (MSA) by the guest.
3474It allows the Config3.MSAP bit to be set to enable the use of MSA by the guest.
3475Once this is done the KVM_REG_MIPS_VEC_* and KVM_REG_MIPS_MSA_* registers can be
3476accessed, and the Config5.MSAEn bit is accessible via the KVM API and also from
3477the guest.
3478
34797. Capabilities that can be enabled on VMs
3480------------------------------------------
3481
3482There are certain capabilities that change the behavior of the virtual
3483machine when enabled. To enable them, please see section 4.37. Below
3484you can find a list of capabilities and what their effect on the VM
3485is when enabling them.
3486
3487The following information is provided along with the description:
3488
  Architectures: which instruction set architectures provide this capability.
3490      x86 includes both i386 and x86_64.
3491
3492  Parameters: what parameters are accepted by the capability.
3493
3494  Returns: the return value.  General error numbers (EBADF, ENOMEM, EINVAL)
3495      are not detailed, but errors with specific meanings are.
3496
3497
34987.1 KVM_CAP_PPC_ENABLE_HCALL
3499
3500Architectures: ppc
3501Parameters: args[0] is the sPAPR hcall number
3502	    args[1] is 0 to disable, 1 to enable in-kernel handling
3503
3504This capability controls whether individual sPAPR hypercalls (hcalls)
3505get handled by the kernel or not.  Enabling or disabling in-kernel
3506handling of an hcall is effective across the VM.  On creation, an
3507initial set of hcalls are enabled for in-kernel handling, which
3508consists of those hcalls for which in-kernel handlers were implemented
before this capability was implemented.  If disabled, the kernel will
not attempt to handle the hcall, but will always exit to userspace
3511to handle it.  Note that it may not make sense to enable some and
3512disable others of a group of related hcalls, but KVM does not prevent
3513userspace from doing that.
3514
3515If the hcall number specified is not one that has an in-kernel
3516implementation, the KVM_ENABLE_CAP ioctl will fail with an EINVAL
3517error.
3518
35197.2 KVM_CAP_S390_USER_SIGP
3520
3521Architectures: s390
3522Parameters: none
3523
3524This capability controls which SIGP orders will be handled completely in user
3525space. With this capability enabled, all fast orders will be handled completely
3526in the kernel:
3527- SENSE
3528- SENSE RUNNING
3529- EXTERNAL CALL
3530- EMERGENCY SIGNAL
3531- CONDITIONAL EMERGENCY SIGNAL
3532
3533All other orders will be handled completely in user space.
3534
3535Only privileged operation exceptions will be checked for in the kernel (or even
3536in the hardware prior to interception). If this capability is not enabled, the
3537old way of handling SIGP orders is used (partially in kernel and user space).
3538
35397.3 KVM_CAP_S390_VECTOR_REGISTERS
3540
3541Architectures: s390
3542Parameters: none
3543Returns: 0 on success, negative value on error
3544
Allows use of the vector registers introduced with the z13 processor, and
3546provides for the synchronization between host and user space.  Will
3547return -EINVAL if the machine does not support vectors.
3548
35497.4 KVM_CAP_S390_USER_STSI
3550
3551Architectures: s390
3552Parameters: none
3553
3554This capability allows post-handlers for the STSI instruction. After
3555initial handling in the kernel, KVM exits to user space with
3556KVM_EXIT_S390_STSI to allow user space to insert further data.
3557
3558Before exiting to userspace, kvm handlers should fill in s390_stsi field of
3559vcpu->run:
3560struct {
3561	__u64 addr;
3562	__u8 ar;
3563	__u8 reserved;
3564	__u8 fc;
3565	__u8 sel1;
3566	__u16 sel2;
3567} s390_stsi;
3568
3569@addr - guest address of STSI SYSIB
3570@fc   - function code
3571@sel1 - selector 1
3572@sel2 - selector 2
3573@ar   - access register number
3574
3575KVM handlers should exit to userspace with rc = -EREMOTE.
3576
3577
35788. Other capabilities.
3579----------------------
3580
3581This section lists capabilities that give information about other
3582features of the KVM implementation.
3583
35848.1 KVM_CAP_PPC_HWRNG
3585
3586Architectures: ppc
3587
3588This capability, if KVM_CHECK_EXTENSION indicates that it is
available, means that the kernel has an implementation of the
3590H_RANDOM hypercall backed by a hardware random-number generator.
3591If present, the kernel H_RANDOM handler can be enabled for guest use
3592with the KVM_CAP_PPC_ENABLE_HCALL capability.
3593