Lines Matching refs:CPU

54 for each CPU if the device supports enough queues, or otherwise at least
69 this to notify a CPU when new packets arrive on the given queue. The
71 that can route each interrupt to a particular CPU. The active mapping
73 an IRQ may be handled on any CPU. Because a non-negligible part of packet
88 receive queue overflows due to a saturated CPU, because in default
94 a separate CPU. For interrupt handling, HT has shown no benefit in
95 initial tests, so limit the number of queues to the number of CPU cores
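The matches above describe routing each receive queue's interrupt to a particular CPU core. A minimal sketch of how that mapping is typically set through the procfs IRQ affinity interface; the IRQ number (30) and CPU mask are placeholders for illustration:

```shell
# Pin a receive queue's IRQ to CPU 2. IRQ 30 is a placeholder; look up
# the device's actual queue IRQs in /proc/interrupts.
# smp_affinity takes a hex bitmask of CPUs: bit 2 set -> 0x4.
echo 4 > /proc/irq/30/smp_affinity

# Show the active mapping for that interrupt.
cat /proc/irq/30/smp_affinity
```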
104 Whereas RSS selects the queue and hence CPU that will run the hardware
105 interrupt handler, RPS selects the CPU to perform protocol processing
107 on the desired CPU's backlog queue and waking up the CPU for processing.
118 The first step in determining the target CPU for RPS is to calculate a
131 of the list. The indexed CPU is the target for processing the packet,
132 and the packet is queued to the tail of that CPU's backlog queue. At
135 processing on the remote CPU, and any queued packets are then processed
149 CPU. Documentation/IRQ-affinity.txt explains how CPUs are assigned to
156 CPU. If NUMA locality is not an issue, this could also be all CPUs in
158 interrupting CPU from the map since that already performs much work.
161 receive queue is mapped to each CPU, then RPS is probably redundant
164 share the same memory domain as the interrupting CPU for that queue.
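The RPS matches above refer to the per-receive-queue CPU map. A hedged example of setting it via sysfs, assuming a device named eth0 (a placeholder) whose queue interrupt lands on CPU 0:

```shell
# Let RPS steer protocol processing for eth0's first receive queue onto
# CPUs 1-3 (hex bitmask 0xe), excluding interrupting CPU 0 as the text
# above suggests. Device name and mask are illustrative.
echo e > /sys/class/net/eth0/queues/rx-0/rps_cpus
```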
170 to the same CPU is CPU load imbalance if flows vary in packet rate.
177 during CPU contention by dropping packets from large flows slightly
179 destination CPU approaches saturation. Once a CPU's input packet
193 turned on. It is implemented for each CPU independently (to avoid lock
194 and cache contention) and toggled per CPU by setting the relevant bit
195 in sysctl net.core.flow_limit_cpu_bitmap. It exposes the same CPU
202 the same that selects a CPU in RPS, but as the number of buckets can
215 where a single connection taking up 50% of a CPU indicates a problem.
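The flow limit matches mention the per-CPU toggle in net.core.flow_limit_cpu_bitmap. A sketch of enabling it, with illustrative values (the table length and the CPU bitmask are assumptions, not recommendations):

```shell
# Size the per-CPU flow hash table, then enable flow limit on CPUs 0-3
# (hex bitmask 0xf). Both values here are placeholders.
sysctl -w net.core.flow_limit_table_len=8192
sysctl -w net.core.flow_limit_cpu_bitmap=f
```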
232 kernel processing of packets to the CPU where the application thread
234 to enqueue packets onto the backlog of another CPU and to wake up that
235 CPU.
241 The CPU recorded in each entry is the one which last processed the flow.
242 If an entry does not hold a valid CPU, then packets mapped to that entry
244 same CPU. Indeed, with many flows and few CPUs, it is very likely that
247 rps_sock_flow_table is a global flow table that contains the *desired* CPU
248 for flows: the CPU that is currently processing the flow in userspace.
249 Each table value is a CPU index that is updated during calls to recvmsg
253 When the scheduler moves a thread to a new CPU while it has outstanding
254 receive packets on the old CPU, packets may arrive out of order. To
257 receive queue of each device. Each table value stores a CPU index and a
258 counter. The CPU index represents the *current* CPU onto which packets
260 and userspace processing occur on the same CPU, and hence the CPU index
263 enqueued for kernel processing on the old CPU.
266 CPU's backlog when a packet in this flow was last enqueued. Each backlog
270 been enqueued onto the currently designated CPU for flow i (of course,
275 CPU for packet processing (from get_rps_cpu()) the rps_sock_flow table
277 are compared. If the desired CPU for the flow (found in the
278 rps_sock_flow table) matches the current CPU (found in the rps_dev_flow
279 table), the packet is enqueued onto that CPU's backlog. If they differ,
280 the current CPU is updated to match the desired CPU if one of the
283 - The current CPU's queue head counter >= the recorded tail counter
285 - The current CPU is unset (>= nr_cpu_ids)
286 - The current CPU is offline
289 CPU. These rules aim to ensure that a flow only moves to a new CPU when
290 there are no packets outstanding on the old CPU, as the outstanding
292 CPU.
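The RFS matches above name the global rps_sock_flow table and the per-queue rps_dev_flow tables. A minimal configuration sketch, assuming a single-queue device called eth0 (a placeholder) so the per-queue count can equal the global size:

```shell
# Size the global socket flow table (the kernel rounds this up to a
# power of two); 32768 is an illustrative value.
sysctl -w net.core.rps_sock_flow_entries=32768

# Size the per-queue device flow table. With multiple queues, divide
# the global count among them instead.
echo 32768 > /sys/class/net/eth0/queues/rx-0/rps_flow_cnt
```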
331 directly to a CPU local to the thread consuming the data. The target CPU
332 will either be the same CPU where the application runs, or at least a CPU
333 which is local to the application thread's CPU in the cache hierarchy.
342 The hardware queue for a flow is derived from the CPU recorded in
343 rps_dev_flow_table. The stack consults a CPU to hardware queue map which
346 functions in the cpu_rmap ("CPU affinity reverse map") kernel library
347 to populate the map. For each CPU, the corresponding queue in the map is
348 set to be one whose processing CPU is closest in cache locality.
355 of CPU to queues is automatically deduced from the IRQ affinities
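The matches above concern accelerated RFS, where the NIC steers flows in hardware using the cpu_rmap-derived queue map. Drivers expose this through the device's programmable ("ntuple") filters; a hedged sketch, with eth0 as a placeholder device whose driver must actually support this:

```shell
# Accelerated RFS requires the NIC's ntuple filtering to be enabled and
# a driver that implements the hardware flow-steering hook.
ethtool -K eth0 ntuple on

# Confirm the feature took effect.
ethtool -k eth0 | grep ntuple
```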
369 device. To accomplish this, a mapping from CPU to hardware queue(s) is
372 these queues are processed on a CPU within this set. This choice
375 (contention can be eliminated completely if each CPU has its own
385 of the running CPU as a key into the CPU-to-queue lookup table. If the
416 system, XPS is preferably configured so that each CPU maps onto one queue.
418 queue can also map onto one CPU, resulting in exclusive pairings that
421 with the CPU that processes transmit completions for that queue
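The XPS matches above recommend a one-to-one pairing of CPUs and transmit queues. A sketch of such a mapping via the xps_cpus sysfs files, assuming a two-queue device named eth0 (names and masks are placeholders):

```shell
# Dedicate tx-0 to CPU 0 (mask 0x1) and tx-1 to CPU 1 (mask 0x2),
# giving each CPU its own transmit queue as described above.
echo 1 > /sys/class/net/eth0/queues/tx-0/xps_cpus
echo 2 > /sys/class/net/eth0/queues/tx-1/xps_cpus
```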