                         ====================
                         CREDENTIALS IN LINUX
                         ====================

By: David Howells <dhowells@redhat.com>

Contents:

 (*) Overview.

 (*) Types of credentials.

 (*) File markings.

 (*) Task credentials.

     - Immutable credentials.
     - Accessing task credentials.
     - Accessing another task's credentials.
     - Altering credentials.
     - Managing credentials.

 (*) Open file credentials.

 (*) Overriding the VFS's use of credentials.


========
OVERVIEW
========

There are several parts to the security check performed by Linux when one
object acts upon another:

 (1) Objects.

     Objects are things in the system that may be acted upon directly by
     userspace programs. Linux has a variety of actionable objects, including:

        - Tasks
        - Files/inodes
        - Sockets
        - Message queues
        - Shared memory segments
        - Semaphores
        - Keys

     As a part of the description of all these objects there is a set of
     credentials. What's in the set depends on the type of object.

 (2) Object ownership.

     Amongst the credentials of most objects, there will be a subset that
     indicates the ownership of that object. This is used for resource
     accounting and limitation (disk quotas and task rlimits for example).

     In a standard UNIX filesystem, for instance, this will be defined by the
     UID marked on the inode.

 (3) The objective context.

     Also amongst the credentials of those objects, there will be a subset that
     indicates the 'objective context' of that object. This may or may not be
     the same set as in (2) - in standard UNIX files, for instance, this is
     defined by the UID and the GID marked on the inode.

     The objective context is used as part of the security calculation that is
     carried out when an object is acted upon.

 (4) Subjects.

     A subject is an object that is acting upon another object.

     Most of the objects in the system are inactive: they don't act on other
     objects within the system. Processes/tasks are the obvious exception:
     they do stuff; they access and manipulate things.

     Objects other than tasks may under some circumstances also be subjects.
     For instance an open file may send SIGIO to a task using the UID and EUID
     given to it by a task that called fcntl(F_SETOWN) upon it. In this case,
     the file struct will have a subjective context too.

 (5) The subjective context.

     A subject has an additional interpretation of its credentials. A subset
     of its credentials forms the 'subjective context'. The subjective context
     is used as part of the security calculation that is carried out when a
     subject acts.

     A Linux task, for example, has the FSUID, FSGID and the supplementary
     group list for when it is acting upon a file - which are quite separate
     from the real UID and GID that normally form the objective context of the
     task.

 (6) Actions.

     Linux has a number of actions available that a subject may perform upon an
     object. The set of actions available depends on the nature of the subject
     and the object.

     Actions include reading, writing, creating and deleting files; forking or
     signalling and tracing tasks.

 (7) Rules, access control lists and security calculations.

     When a subject acts upon an object, a security calculation is made. This
     involves taking the subjective context, the objective context and the
     action, and searching one or more sets of rules to see whether the subject
     is granted or denied permission to act in the desired manner on the
     object, given those contexts.

     There are two main sources of rules:

     (a) Discretionary access control (DAC):

         Sometimes the object will include sets of rules as part of its
         description. This is an 'Access Control List' or 'ACL'. A Linux
         file may supply more than one ACL.

         A traditional UNIX file, for example, includes a permissions mask that
         is an abbreviated ACL with three fixed classes of subject ('user',
         'group' and 'other'), each of which may be granted certain privileges
         ('read', 'write' and 'execute' - whatever those map to for the object
         in question). UNIX file permissions do not allow the arbitrary
         specification of subjects, however, and so are of limited use.

         A Linux file might also sport a POSIX ACL. This is a list of rules
         that grants various permissions to arbitrary subjects.

     (b) Mandatory access control (MAC):

         The system as a whole may have one or more sets of rules that get
         applied to all subjects and objects, regardless of their source.
         SELinux and Smack are examples of this.

         In the case of SELinux and Smack, each object is given a label as part
         of its credentials. When an action is requested, they take the
         subject label, the object label and the action and look for a rule
         that says that this action is either granted or denied.


====================
TYPES OF CREDENTIALS
====================

The Linux kernel supports the following types of credentials:

 (1) Traditional UNIX credentials.

        Real User ID
        Real Group ID

     The UID and GID are carried by most, if not all, Linux objects, even if in
     some cases it has to be invented (FAT or CIFS files for example, which are
     derived from Windows). These (mostly) define the objective context of
     that object, with tasks being slightly different in some cases.

        Effective, Saved and FS User ID
        Effective, Saved and FS Group ID
        Supplementary groups

     These are additional credentials used by tasks only. Usually, an
     EUID/EGID/GROUPS will be used as the subjective context, and real UID/GID
     will be used as the objective. For tasks, it should be noted that this is
     not always true.

 (2) Capabilities.

        Set of permitted capabilities
        Set of inheritable capabilities
        Set of effective capabilities
        Capability bounding set

     These are only carried by tasks. They indicate superior capabilities
     granted piecemeal to a task that an ordinary task wouldn't otherwise have.
     These are manipulated implicitly by changes to the traditional UNIX
     credentials, but can also be manipulated directly by the capset() system
     call.

     The permitted capabilities are those caps that the process might grant
     itself to its effective or permitted sets through capset(). The
     inheritable set might also be so constrained.

     The effective capabilities are the ones that a task is actually allowed to
     make use of itself.

     The inheritable capabilities are the ones that may get passed across
     execve().

     The bounding set limits the capabilities that may be inherited across
     execve(), especially when a binary is executed that will execute as UID 0.

 (3) Secure management flags (securebits).

     These are only carried by tasks. These govern the way the above
     credentials are manipulated and inherited over certain operations such as
     execve(). They aren't used directly as objective or subjective
     credentials.

 (4) Keys and keyrings.

     These are only carried by tasks. They carry and cache security tokens
     that don't fit into the other standard UNIX credentials. They are for
     making such things as network filesystem keys available to the file
     accesses performed by processes, without the necessity of ordinary
     programs having to know about security details involved.

     Keyrings are a special type of key. They carry sets of other keys and can
     be searched for the desired key.
     Each process may subscribe to a number of keyrings:

        Per-thread keyring
        Per-process keyring
        Per-session keyring

     When a process accesses a key, if not already present, it will normally be
     cached on one of these keyrings for future accesses to find.

     For more information on using keys, see Documentation/security/keys.txt.

 (5) LSM

     The Linux Security Module allows extra controls to be placed over the
     operations that a task may do. Currently Linux supports several LSM
     options.

     Some work by labelling the objects in a system and then applying sets of
     rules (policies) that say what operations a task with one label may do to
     an object with another label.

 (6) AF_KEY

     This is a socket-based approach to credential management for networking
     stacks [RFC 2367]. It isn't discussed by this document as it doesn't
     interact directly with task and file credentials; rather it keeps system
     level credentials.


When a file is opened, part of the opening task's subjective context is
recorded in the file struct created. This allows operations using that file
struct to use those credentials instead of the subjective context of the task
that issued the operation. An example of this would be a file opened on a
network filesystem where the credentials of the opened file should be presented
to the server, regardless of who is actually doing a read or a write upon it.


=============
FILE MARKINGS
=============

Files on disk or obtained over the network may have annotations that form the
objective security context of that file.
Depending on the type of filesystem, this may include one or more of the
following:

 (*) UNIX UID, GID, mode;

 (*) Windows user ID;

 (*) Access control list;

 (*) LSM security label;

 (*) UNIX exec privilege escalation bits (SUID/SGID);

 (*) File capabilities exec privilege escalation bits.

These are compared to the task's subjective security context, and certain
operations allowed or disallowed as a result. In the case of execve(), the
privilege escalation bits come into play, and may allow the resulting process
extra privileges, based on the annotations on the executable file.


================
TASK CREDENTIALS
================

In Linux, all of a task's credentials are held in (uid, gid) or through
(groups, keys, LSM security) a refcounted structure of type 'struct cred'.
Each task points to its credentials by a pointer called 'cred' in its
task_struct.

Once a set of credentials has been prepared and committed, it may not be
changed, barring the following exceptions:

 (1) its reference count may be changed;

 (2) the reference count on the group_info struct it points to may be changed;

 (3) the reference count on the security data it points to may be changed;

 (4) the reference count on any keyrings it points to may be changed;

 (5) any keyrings it points to may be revoked, expired or have their security
     attributes changed; and

 (6) the contents of any keyrings to which it points may be changed (the whole
     point of keyrings being a shared set of credentials, modifiable by anyone
     with appropriate access).

To alter anything in the cred struct, the copy-and-replace principle must be
adhered to. First take a copy, then alter the copy and then use RCU to change
the task pointer to make it point to the new copy. There are wrappers to aid
with this (see below).
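
The copy-and-replace principle can be illustrated with a small userspace
model. This is only a sketch under stated assumptions, not kernel code:
'struct model_cred', 'struct model_task' and model_replace_suid() are
hypothetical names invented for the illustration, and a C11 release store
stands in for the RCU pointer update. The old copy is handed back to the
caller instead of being freed, because in the kernel its destruction would be
deferred until no reader can still be using it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for struct cred: two fields, no refcount. */
struct model_cred {
	unsigned int uid;
	unsigned int suid;
};

/* Hypothetical stand-in for task_struct: one published, read-only pointer. */
struct model_task {
	_Atomic(const struct model_cred *) cred;
};

/*
 * Copy-and-replace: duplicate the published credentials, alter the private
 * copy, then publish it with a release store (standing in for the RCU
 * pointer update).  The old copy is returned rather than freed.
 * Returns NULL and leaves the task untouched if allocation fails.
 */
static const struct model_cred *model_replace_suid(struct model_task *task,
						   unsigned int suid)
{
	const struct model_cred *old =
		atomic_load_explicit(&task->cred, memory_order_acquire);
	struct model_cred *new = malloc(sizeof(*new));

	if (!new)
		return NULL;

	*new = *old;		/* 1. take a copy */
	new->suid = suid;	/* 2. alter the copy only */

	/* 3. make the task pointer point to the new copy */
	atomic_store_explicit(&task->cred, new, memory_order_release);
	return old;
}
```

A reader that loaded the pointer before the store keeps seeing a consistent,
unmodified old set; a reader that loads it afterwards sees the complete new
set. Neither ever observes a half-updated structure.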

A task may only alter its _own_ credentials; it is no longer permitted for a
task to alter another's credentials. This means the capset() system call is no
longer permitted to take any PID other than the one of the current process.
Also keyctl_instantiate() and keyctl_negate() functions no longer permit
attachment to process-specific keyrings in the requesting process as the
instantiating process may need to create them.


IMMUTABLE CREDENTIALS
---------------------

Once a set of credentials has been made public (by calling commit_creds() for
example), it must be considered immutable, barring two exceptions:

 (1) The reference count may be altered.

 (2) Whilst the keyring subscriptions of a set of credentials may not be
     changed, the keyrings subscribed to may have their contents altered.

To catch accidental credential alteration at compile time, struct task_struct
has _const_ pointers to its credential sets, as does struct file. Furthermore,
certain functions such as get_cred() and put_cred() operate on const pointers,
thus rendering casts unnecessary in their callers, but they have to cast away
the const qualification temporarily to be able to alter the reference count.


ACCESSING TASK CREDENTIALS
--------------------------

A task being able to alter only its own credentials permits the current process
to read or replace its own credentials without the need for any form of locking
- which simplifies things greatly. It can just call:

        const struct cred *current_cred()

to get a pointer to its credentials structure, and it doesn't have to release
it afterwards.

There are convenience wrappers for retrieving specific aspects of a task's
credentials (the value is simply returned in each case):

        uid_t current_uid(void)                 Current's real UID
        gid_t current_gid(void)                 Current's real GID
        uid_t current_euid(void)                Current's effective UID
        gid_t current_egid(void)                Current's effective GID
        uid_t current_fsuid(void)               Current's file access UID
        gid_t current_fsgid(void)               Current's file access GID
        kernel_cap_t current_cap(void)          Current's effective capabilities
        void *current_security(void)            Current's LSM security pointer
        struct user_struct *current_user(void)  Current's user account

There are also convenience wrappers for retrieving specific associated pairs of
a task's credentials:

        void current_uid_gid(uid_t *, gid_t *);
        void current_euid_egid(uid_t *, gid_t *);
        void current_fsuid_fsgid(uid_t *, gid_t *);

which return these pairs of values through their arguments after retrieving
them from the current task's credentials.


In addition, there is a function for obtaining a reference on the current
process's current set of credentials:

        const struct cred *get_current_cred(void);

and functions for getting references to one of the credentials that don't
actually live in struct cred:

        struct user_struct *get_current_user(void);
        struct group_info *get_current_groups(void);

which get references to the current process's user accounting structure and
supplementary groups list respectively.

Once a reference has been obtained, it must be released with put_cred(),
free_uid() or put_group_info() as appropriate.


ACCESSING ANOTHER TASK'S CREDENTIALS
------------------------------------

Whilst a task may access its own credentials without the need for locking, the
same is not true of a task wanting to access another task's credentials. It
must use the RCU read lock and rcu_dereference().

The rcu_dereference() is wrapped by:

        const struct cred *__task_cred(struct task_struct *task);

This should be used inside the RCU read lock, as in the following example:

        void foo(struct task_struct *t, struct foo_data *f)
        {
                const struct cred *tcred;
                ...
                rcu_read_lock();
                tcred = __task_cred(t);
                f->uid = tcred->uid;
                f->gid = tcred->gid;
                f->groups = get_group_info(tcred->groups);
                rcu_read_unlock();
                ...
        }

Should it be necessary to hold another task's credentials for a long period of
time, and possibly to sleep whilst doing so, then the caller should get a
reference on them using:

        const struct cred *get_task_cred(struct task_struct *task);

This does all the RCU magic inside of it. The caller must call put_cred() on
the credentials so obtained when they're finished with.

 [*] Note: The result of __task_cred() should not be passed directly to
     get_cred() as this may race with commit_creds().

There are a couple of convenience functions to access bits of another task's
credentials, hiding the RCU magic from the caller:

        uid_t task_uid(task)            Task's real UID
        uid_t task_euid(task)           Task's effective UID

If the caller is holding the RCU read lock at the time anyway, then:

        __task_cred(task)->uid
        __task_cred(task)->euid

should be used instead. Similarly, if multiple aspects of a task's credentials
need to be accessed, the RCU read lock should be taken, __task_cred() called,
the result stored in a temporary pointer and then the credential aspects read
through that pointer before dropping the lock. This prevents the potentially
expensive RCU magic from being invoked multiple times.

Should some other single aspect of another task's credentials need to be
accessed, then this can be used:

        task_cred_xxx(task, member)

where 'member' is a non-pointer member of the cred struct.
For instance:

        uid_t task_cred_xxx(task, suid);

will retrieve 'struct cred::suid' from the task, doing the appropriate RCU
magic. This may not be used for pointer members as what they point to may
disappear the moment the RCU read lock is dropped.


ALTERING CREDENTIALS
--------------------

As previously mentioned, a task may only alter its own credentials, and may not
alter those of another task. This means that it doesn't need to use any
locking to alter its own credentials.

To alter the current process's credentials, a function should first prepare a
new set of credentials by calling:

        struct cred *prepare_creds(void);

this locks current->cred_replace_mutex and then allocates and constructs a
duplicate of the current process's credentials, returning with the mutex still
held if successful. It returns NULL if not successful (out of memory).

The mutex prevents ptrace() from altering the ptrace state of a process whilst
security checks on credentials construction and changing are taking place, as
the ptrace state may alter the outcome, particularly in the case of execve().

The new credentials set should be altered appropriately, and any security
checks and hooks done. Both the current and the proposed sets of credentials
are available for this purpose as current_cred() will return the current set
still at this point.


When the credential set is ready, it should be committed to the current process
by calling:

        int commit_creds(struct cred *new);

This will alter various aspects of the credentials and the process, giving the
LSM a chance to do likewise. It will then use rcu_assign_pointer() to actually
commit the new credentials to current->cred, release
current->cred_replace_mutex to allow ptrace() to take place, and notify
the scheduler and others of the changes.

This function is guaranteed to return 0, so that it can be tail-called at the
end of such functions as sys_setresuid().

Note that this function consumes the caller's reference to the new credentials.
The caller should _not_ call put_cred() on the new credentials afterwards.

Furthermore, once this function has been called on a new set of credentials,
those credentials may _not_ be changed further.


Should the security checks fail or some other error occur after prepare_creds()
has been called, then the following function should be invoked:

        void abort_creds(struct cred *new);

This releases the lock on current->cred_replace_mutex that prepare_creds() got
and then releases the new credentials.


A typical credentials alteration function would look something like this:

        int alter_suid(uid_t suid)
        {
                struct cred *new;
                int ret;

                new = prepare_creds();
                if (!new)
                        return -ENOMEM;

                new->suid = suid;
                ret = security_alter_suid(new);
                if (ret < 0) {
                        abort_creds(new);
                        return ret;
                }

                return commit_creds(new);
        }


MANAGING CREDENTIALS
--------------------

There are some functions to help manage credentials:

 (*) void put_cred(const struct cred *cred);

     This releases a reference to the given set of credentials. If the
     reference count reaches zero, the credentials will be scheduled for
     destruction by the RCU system.

 (*) const struct cred *get_cred(const struct cred *cred);

     This gets a reference on a live set of credentials, returning a pointer to
     that set of credentials.

 (*) struct cred *get_new_cred(struct cred *cred);

     This gets a reference on a set of credentials that is under construction
     and is thus still mutable, returning a pointer to that set of credentials.
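
The reference-counting behaviour of the functions above can be modelled in
userspace as follows. This is a sketch under stated assumptions rather than
the kernel implementation: the model_* names and the 'model_destroyed'
counter are hypothetical, the count is a plain int rather than an atomic, and
the final free() happens immediately where the kernel would defer destruction
through RCU.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for struct cred with a plain usage count. */
struct model_cred {
	int usage;
	unsigned int uid;
};

/* Counts destructions; exists only so the sketch can be observed. */
static int model_destroyed;

/* Allocate a credential set holding one reference for the caller. */
static struct model_cred *model_alloc_cred(unsigned int uid)
{
	struct model_cred *cred = malloc(sizeof(*cred));

	if (cred) {
		cred->usage = 1;
		cred->uid = uid;
	}
	return cred;
}

/*
 * Take a reference through a const pointer.  The const is cast away only to
 * touch the count; everything else in the set stays read-only.
 */
static const struct model_cred *model_get_cred(const struct model_cred *cred)
{
	((struct model_cred *)cred)->usage++;
	return cred;
}

/* Drop a reference; the set is destroyed when the last one goes away. */
static void model_put_cred(const struct model_cred *cred)
{
	struct model_cred *c = (struct model_cred *)cred;

	assert(c->usage > 0);
	if (--c->usage == 0) {
		model_destroyed++;
		free(c);	/* immediate here; deferred via RCU in the kernel */
	}
}
```

Every successful model_get_cred() must be balanced by exactly one
model_put_cred(); the set stays alive as long as any holder has not yet
dropped its reference.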


=====================
OPEN FILE CREDENTIALS
=====================

When a new file is opened, a reference is obtained on the opening task's
credentials and this is attached to the file struct as 'f_cred' in place of
'f_uid' and 'f_gid'. Code that used to access file->f_uid and file->f_gid
should now access file->f_cred->fsuid and file->f_cred->fsgid.

It is safe to access f_cred without the use of RCU or locking because the
pointer will not change over the lifetime of the file struct, and nor will the
contents of the cred struct pointed to, barring the exceptions listed above
(see the Task Credentials section).


=======================================
OVERRIDING THE VFS'S USE OF CREDENTIALS
=======================================

Under some circumstances it is desirable to override the credentials used by
the VFS, and that can be done by calling into functions such as vfs_mkdir()
with a different set of credentials. This is done in the following places:

 (*) sys_faccessat().

 (*) do_coredump().

 (*) nfs4recover.c.
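
The general shape of such an override is save, install, operate, restore. The
userspace model below sketches only that shape, under stated assumptions: all
model_* names are hypothetical, model_mkdir_owner() is an invented stand-in
for a VFS operation, and no locking or reference counting is shown. In the
kernel this pattern is normally expressed with override_creds() and
revert_creds(), which this text does not otherwise describe.

```c
#include <assert.h>

/* Hypothetical stand-ins for struct cred and task_struct. */
struct model_cred {
	unsigned int fsuid;
	unsigned int fsgid;
};

struct model_task {
	const struct model_cred *cred;
};

/* Install 'new' as the task's active credentials; return the previous set. */
static const struct model_cred *model_override(struct model_task *task,
					       const struct model_cred *new)
{
	const struct model_cred *old = task->cred;

	task->cred = new;
	return old;
}

/* Put back the credentials saved by model_override(). */
static void model_revert(struct model_task *task, const struct model_cred *old)
{
	task->cred = old;
}

/* Invented operation: reports the fsuid it would create an object under. */
static unsigned int model_mkdir_owner(const struct model_task *task)
{
	return task->cred->fsuid;
}

/* Run the operation under temporary credentials, then restore the originals. */
static unsigned int model_mkdir_as(struct model_task *task,
				   const struct model_cred *tmp)
{
	const struct model_cred *saved = model_override(task, tmp);
	unsigned int owner = model_mkdir_owner(task);

	model_revert(task, saved);
	return owner;
}
```

The important property is that the restore step always runs, so the task's
own credentials are back in place as soon as the overridden operation
returns.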