Lines Matching refs:a

1 Adding a New System Call
4 This document describes what's involved in adding a new system call to the
12 The first thing to consider when adding a new system call is whether one of
18 - If the operations involved can be made to look like a filesystem-like
19 object, it may make more sense to create a new filesystem or device. This
20 also makes it easier to encapsulate the new functionality in a kernel module
23 userspace that something has happened, then returning a new file
27 have to be implemented as ioctl(2) requests, which can lead to a
29 - If you're just exposing runtime system information, a new node in sysfs
33 in a namespaced/sandboxed/chrooted environment). Avoid adding any API to
34 debugfs, as this is not considered a 'production' interface to userspace.
35 - If the operation is specific to a particular file or file descriptor, then
37 fcntl(2) is a multiplexing system call that hides a lot of complexity, so
40 (for example, getting/setting a simple flag related to a file descriptor).
41 - If the operation is specific to a particular task or process, then an
43 fcntl(2), this system call is a complicated multiplexor so is best reserved
44 for near-analogs of existing prctl() commands or getting/setting a simple
45 flag related to a process.
52 indefinitely. As such, it's a very good idea to explicitly discuss the
61 For simpler system calls that only take a couple of arguments, the preferred
62 way to allow for future extensibility is to include a flags argument to the
72 For more sophisticated system calls that involve a larger number of arguments,
73 it's preferred to encapsulate the majority of the arguments into a structure
74 that is passed in by pointer. Such a structure can cope with future extension
75 by including a size argument in the structure:
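A minimal userspace sketch of such a size-carrying structure follows. The struct layout, the helper name xyzzy_copy_params() and the chosen error codes are illustrative assumptions, not the kernel's implementation. The helper shows one plausible policy: a shorter structure from an older caller is zero-extended, and a longer structure from a newer caller is accepted only if the bytes this code does not know about are all zero. In-kernel code would read the structure from user memory instead; recent kernels provide copy_struct_from_user() for that purpose.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical parameter block; field names are illustrative only. */
struct xyzzy_params {
	uint32_t size;     /* caller sets this to sizeof() of its own version */
	uint32_t param_1;
	uint64_t param_2;
	uint64_t param_3;
};

/* The size field must stay first so every version can locate it. */
static_assert(offsetof(struct xyzzy_params, size) == 0, "size must come first");

/*
 * Copy a caller-supplied structure of caller_size bytes into *dst.
 *  - smaller than ours: the missing trailing fields read as zero
 *  - larger than ours:  accepted only if the extra bytes are all zero
 * Returns 0 on success or a negative errno-style value.
 */
static int xyzzy_copy_params(struct xyzzy_params *dst, const void *src,
			     size_t caller_size)
{
	const unsigned char *bytes = src;
	size_t known = sizeof(*dst);
	size_t i;

	if (caller_size < sizeof(dst->size))
		return -EINVAL;

	/* Reject unknown trailing data that actually carries a value. */
	for (i = known; i < caller_size; i++)
		if (bytes[i] != 0)
			return -E2BIG;

	memset(dst, 0, known);
	memcpy(dst, src, caller_size < known ? caller_size : known);
	return 0;
}
```

With this policy a new field can be appended later without breaking either direction of version skew, provided a zero value in that field means "behave as before".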
84 As long as any subsequently added field, say param_4, is designed so that a
88 - To cope with a later userspace program calling an older kernel, the kernel
91 - To cope with an older userspace program calling a newer kernel, the kernel
92 code can zero-extend a smaller instance of the structure (effectively
102 If your new system call allows userspace to refer to a kernel object, it
103 should use a file descriptor as the handle for that object -- don't invent a
107 If your new xyzzy(2) system call does return a new file descriptor, then the
108 flags argument should include a value that is equivalent to setting O_CLOEXEC
111 unexpected fork() and execve() in another thread could leak a descriptor to
113 of the O_CLOEXEC constant, as it is architecture-specific and is part of a
116 If your system call returns a new file descriptor, you should also consider
118 descriptor. Making a file descriptor ready for reading or writing is the
122 If your new xyzzy(2) system call involves a filename argument:
141 If your new xyzzy(2) system call involves a parameter describing an offset
142 within a file, make its type loff_t so that 64-bit offsets can be supported
146 to be governed by the appropriate Linux capability bit (checked with a call to
154 If your new xyzzy(2) system call manipulates a process other than the calling
155 process, it should be restricted (using a call to ptrace_may_access()) so that
156 only a calling process with the same permissions as the target process, or
162 registers. (This concern does not apply if the arguments are part of a
177 - A demonstration of the use of the new system call in userspace via a
180 cover letter, or as a patch to the (separate) man-pages repository.
197 The new entry point also needs a corresponding function prototype, in
204 tables, but several other architectures share a generic syscall table. Add your
215 The file kernel/sys_ni.c provides a fallback stub implementation of each system
221 normally be optional, so add a CONFIG option (typically to init/Kconfig) for
224 - Include a description of the new functionality and system call controlled
232 To summarize, you need a commit that includes:
246 way (see below), this involves a "common" entry (for x86_64 and x32) in
266 However, there are a couple of situations where a compatibility layer is
271 64-bit values. In particular, this is needed whenever a system call argument
274 - a pointer to a pointer
275 - a pointer to a struct containing a pointer (e.g. struct iovec __user *)
276 - a pointer to a varying sized integral type (time_t, off_t, long, ...)
277 - a pointer to a struct containing a varying sized integral type.
279 The second situation that requires a compatibility layer is if one of the
280 system call's arguments has a type that is explicitly 64-bit even on a 32-bit
281 architecture, for example loff_t or __u64. In this case, a value that arrives
282 at a 64-bit kernel from a 32-bit application will be split into two 32-bit
285 (Note that a system call argument that's a pointer to an explicit 64-bit type
286 does *not* need a compatibility layer; for example, splice(2)'s arguments of
287 type loff_t __user * do not trigger the need for a compat_ system call.)
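Both situations can be illustrated with a small userspace sketch. The demo_* types and functions below are hypothetical stand-ins that use fixed-width integers to model what 32-bit and 64-bit callers see; they are not kernel types. The first part shows why a structure containing a pointer and a size needs translating, and the second shows a 64-bit value being rebuilt from two 32-bit halves. Which half arrives in which argument slot depends on the architecture, so the lo/hi ordering here is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Layout seen by a 64-bit caller: pointer and size are 8 bytes each. */
struct demo_iovec64 {
	uint64_t base;
	uint64_t len;
};

/* Layout seen by a 32-bit caller: the same members are 4 bytes each. */
struct demo_iovec32 {
	uint32_t base;
	uint32_t len;
};

/* The two layouts differ in size, so one cannot be read as the other. */
static_assert(sizeof(struct demo_iovec64) == 16, "64-bit layout");
static_assert(sizeof(struct demo_iovec32) == 8, "32-bit layout");

/* Widen the 32-bit layout member by member, zero-extending each value. */
static struct demo_iovec64 demo_iovec_from_compat(struct demo_iovec32 in)
{
	struct demo_iovec64 out;

	out.base = in.base;
	out.len = in.len;
	return out;
}

/* Rebuild a 64-bit argument from the two halves passed by a 32-bit caller. */
static uint64_t demo_join_halves(uint32_t lo, uint32_t hi)
{
	return ((uint64_t)hi << 32) | lo;
}
```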
291 SYSCALL_DEFINEn. This version of the implementation runs as part of a 64-bit
295 them call a common inner implementation function.)
297 The compat entry point also needs a corresponding function prototype, in
303 If the system call involves a structure that is laid out differently on 32-bit
305 header file should also include a compat version of the structure (struct
309 arguments from a 32-bit invocation.
338 - a COMPAT_SYSCALL_DEFINEn(xyzzy, ...) for the compat entry point
347 To wire up the x86 architecture of a system call with a compatibility version,
351 column to indicate that a 32-bit userspace program running on a 64-bit kernel
357 the new system call. There's a choice here: the layout of the arguments
360 If there's a pointer-to-a-pointer involved, the decision is easy: x32 is
386 However, a few system calls do things differently. They might return to a
398 For x86_64, this is implemented as a stub_xyzzy entry point in
404 The equivalent for 32-bit programs running on a 64-bit kernel is normally
411 If the system call needs a compatibility layer (as in the previous section)
415 table will also need to invoke a stub that calls on to the compat_sys_
418 For completeness, it's also nice to set up a mapping so that user-mode Linux
421 simulates registers etc). Fixing this is as simple as adding a #define to
430 Most of the kernel treats system calls in a generic way, but there is the
440 new system call, it's worth doing a kernel-wide grep for the existing system
448 reviewers with a demonstration of how user space programs will use the system
449 call. A good way to combine these aims is to include a simple self-test
450 program in a new directory under tools/testing/selftests/.
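As a rough sketch of what the core of such a self-test might look like, the fragment below invokes a system call by number through syscall(2). Because xyzzy(2) is hypothetical and has no number, getpid is used here as a stand-in; the demo_* function names are illustrative assumptions, and a real self-test would use the number assigned to the new system call.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Invoke a system call by number, bypassing any libc wrapper. */
static long demo_raw_getpid(void)
{
	return syscall(SYS_getpid);
}

/* Check the raw invocation against the value libc reports. */
static void demo_selftest(void)
{
	long raw = demo_raw_getpid();

	assert(raw == (long)getpid());
}
```

On failure syscall(2) returns -1 and sets errno, so a test for a new system call should check both the return value and errno on its error paths.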
452 For a new system call, there will obviously be no libc wrapper function and so
454 involves a new userspace-visible structure, the corresponding header will need
471 All new system calls should come with a complete man page, ideally using groff
472 markup, but plain text will do. If groff is used, it's helpful to include a
484 - LWN article from Michael Kerrisk on how to handle unknown flags in a system
501 - Recommendation from Andrew Morton that all related information for a new
504 - Recommendation from Michael Kerrisk that a new system call should come with
505 a man page: https://lkml.org/lkml/2014/6/13/309
506 - Suggestion from Thomas Gleixner that x86 wire-up should be in a separate
509 come with a man-page & selftest: https://lkml.org/lkml/2014/3/19/710
513 arguments should encapsulate those arguments in a struct, which includes a