This document explains the thinking about the revamped and streamlined
nice-levels implementation in the new Linux scheduler.

Nice levels were always pretty weak under Linux and people continuously
pestered us to make nice +19 tasks use up much less CPU time.

Unfortunately that was not that easy to implement under the old
scheduler (otherwise we'd have done it long ago), because nice level
support was historically coupled to timeslice length, and timeslice
units were driven by the HZ tick, so the smallest timeslice was 1/HZ.

In the O(1) scheduler (in 2003) we changed negative nice levels to be
much stronger than they were before in 2.4 (and people were happy about
that change), and we also intentionally calibrated the linear timeslice
rule so that nice +19 level would be _exactly_ 1 jiffy. To better
understand it, the timeslice graph went like this (cheesy ASCII art
alert!):


                   A
             \     | [timeslice length]
              \    |
               \   |
                \  |
                 \ |
                  \|___100msecs
                   |^ . _
                   |      ^ . _
                   |            ^ . _
 -*----------------------------------*-----> [nice level]
 -20               |                +19
                   |
                   |

So that if someone wanted to really renice tasks, +19 would give a much
bigger hit than the normal linear rule would do. (The solution of
changing the ABI to extend priorities was discarded early on.)

This approach worked to some degree for some time, but later on with
HZ=1000 it caused 1 jiffy to be 1 msec, which meant 0.1% CPU usage which
we felt to be a bit excessive. Excessive _not_ because it's too small a
CPU utilization, but because it causes too frequent (once per millisec)
rescheduling (and would thus thrash the cache, etc. Remember, this was
long ago when hardware was weaker and caches were smaller, and people
were running number crunching apps at nice +19.)

So for HZ=1000 we changed nice +19 to 5msecs, because that felt like the
right minimal granularity - and this translates to 5% CPU utilization.
But the fundamental HZ-sensitive property for nice +19 still remained,
and we never got a single complaint about nice +19 being too _weak_ in
terms of CPU utilization, we only got complaints about it (still) being
too _strong_ :-)

To sum it up: we always wanted to make nice levels more consistent, but
within the constraints of HZ and jiffies and their nasty design level
coupling to timeslices and granularity it was not really viable.

The second (less frequent but still periodically occurring) complaint
about Linux's nice level support was its asymmetry around the origin
(which you can see demonstrated in the picture above), or more
accurately: the fact that nice level behavior depended on the _absolute_
nice level as well, while the nice API itself is fundamentally
"relative":

   int nice(int inc);

   asmlinkage long sys_nice(int increment)

(the first one is the glibc API, the second one is the syscall API.)
Note that the 'inc' is relative to the current nice level. Tools like
bash's "nice" command mirror this relative API.

With the old scheduler, if you for example started a niced task with +1
and another task with +2, the CPU split between the two tasks would
depend on the nice level of the parent shell - if it was at nice -10 the
CPU split was different than if it was at +5 or +10.

A third complaint against Linux's nice level support was that negative
nice levels were not 'punchy enough', so lots of people had to resort to
running audio (and other multimedia) apps under RT priorities such as
SCHED_FIFO. But this caused other problems: SCHED_FIFO is not
starvation-proof, and a buggy SCHED_FIFO app can also lock up the system
for good.
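
To make the second complaint above concrete, here is a minimal sketch of
why a linear timeslice rule makes the +1/+2 split depend on the parent's
absolute nice level. It is an illustration under stated assumptions, not
the old scheduler's code: the function names are hypothetical, only nice
0..+19 is modelled, the timeslice is assumed to fall in a straight line
from the 100 msecs shown in the graph at nice 0 to the 5 msecs used for
nice +19 with HZ=1000, and a task's CPU share is assumed to be
proportional to its timeslice.

```python
# Hypothetical model of a linear timeslice rule for nice 0..+19.
# The two endpoints come from the text; the straight line between
# them and the "share is proportional to timeslice" rule are
# assumptions made for this illustration.
NICE0_SLICE_MS = 100.0   # nice 0 timeslice, from the graph
NICE19_SLICE_MS = 5.0    # nice +19 timeslice after the HZ=1000 change


def old_timeslice_ms(nice):
    """Linearly interpolate the timeslice between nice 0 and nice +19."""
    if not 0 <= nice <= 19:
        raise ValueError("this sketch only models nice 0..+19")
    step = (NICE0_SLICE_MS - NICE19_SLICE_MS) / 19
    return NICE0_SLICE_MS - step * nice


def old_split(nice_a, nice_b):
    """CPU share of task A when competing with task B."""
    a = old_timeslice_ms(nice_a)
    b = old_timeslice_ms(nice_b)
    return a / (a + b)


if __name__ == "__main__":
    # Start children with relative increments +1 and +2 from parent
    # shells running at different absolute nice levels.
    for parent in (0, 5, 10, 17):
        share = old_split(parent + 1, parent + 2)
        print(f"parent nice {parent:+d}: {share:.1%} vs {1 - share:.1%}")
```

In this model the same relative increments give a nearly even split when
the parent is at nice 0 and a much more lopsided one when the parent is
close to +19, which is the inconsistency described above.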

The new scheduler in v2.6.23 addresses all three types of complaints:

To address the first complaint (of positive nice levels not being
"punchy" enough), the scheduler was decoupled from 'time slice' and HZ
concepts (and granularity was made a separate concept from nice levels)
and thus it was possible to implement better and more consistent nice
+19 support: with the new scheduler nice +19 tasks get a HZ-independent
1.5%, instead of the variable 3%-5%-9% range they got in the old
scheduler.

To address the second complaint (of nice levels not being consistent),
the new scheduler makes nice(1) have the same CPU utilization effect on
tasks, regardless of their absolute nice levels. So on the new
scheduler, running a nice +10 and a nice +11 task has the same CPU
utilization "split" between them as running a nice -5 and a nice -4
task. (one will get 55% of the CPU, the other 45%.) That is why nice
levels were changed to be "multiplicative" (or exponential) - that way
it does not matter which nice level you start out from, the 'relative
result' will always be the same.

The third complaint (of negative nice levels not being "punchy" enough
and forcing audio apps to run under the more dangerous SCHED_FIFO
scheduling policy) is addressed by the new scheduler almost
automatically: stronger negative nice levels are an automatic
side-effect of the recalibrated dynamic range of nice levels.
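
The "multiplicative" rule described above can be sketched in a few lines.
This is an illustration, not the scheduler's implementation: the names
are hypothetical, and the per-level factor of 1.25 is an assumption
chosen because it reproduces the roughly 55%/45% split (and the roughly
1.5% share for nice +19 against a nice 0 task) quoted in the text.

```python
# Hypothetical sketch of multiplicative nice levels.
# Assumption: each nice step scales a task's weight by a constant
# factor. The text only states the resulting ~55%/45% split; the
# value 1.25 is picked here because it reproduces that split.
STEP = 1.25


def weight(nice):
    """Relative weight of a task; nice 0 has weight 1.0."""
    if not -20 <= nice <= 19:
        raise ValueError("nice level must be in -20..+19")
    return STEP ** (-nice)


def cpu_shares(*nice_levels):
    """Fraction of CPU each always-runnable task gets on one CPU."""
    weights = [weight(n) for n in nice_levels]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    # The split depends only on the difference between nice levels,
    # not on where on the scale the two tasks sit.
    for pair in ((10, 11), (-5, -4), (0, 1)):
        first, second = cpu_shares(*pair)
        print(f"nice {pair[0]:+d} vs {pair[1]:+d}: "
              f"{first:.1%} / {second:.1%}")

    # A nice +19 task competing with a single nice 0 task.
    print(f"nice +19 vs nice 0: {cpu_shares(19, 0)[0]:.1%}")
```

Because the weights form a geometric series, the ratio between two tasks
is STEP raised to the difference of their nice levels, so the absolute
starting level cancels out of the split.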