1 Notes on the Generic Block Layer Rewrite in Linux 2.5
6 Suparna Bhattacharya <suparna@in.ibm.com>
14 These are some notes describing some aspects of the 2.5 block layer in the
18 Please mail corrections & suggestions to suparna@in.ibm.com.
38 while it was still work-in-progress:
58 2.2 The bio struct in detail (multi-page io unit)
59 2.3 Changes in the request structure
63 3.2.1 Traversing segments and completion units in a request
85 Let us discuss the changes in the context of how some overall goals for the
96 important especially in the light of ever improving hardware capabilities
102 Sophisticated devices with large built-in caches, intelligent i/o scheduling
117 a per-queue level (e.g maximum request size, maximum number of segments in
122 move into the block device structure in the future. Some characteristics
124 in themselves. There are blk_queue_xxx functions to set the parameters,
136 - The request queue's max_sectors, which is a soft size in
142 in units of 512 byte sectors.
148 Maximum physical segments you can handle in a request. 128
152 Maximum dma segments the hardware can handle in a request. 128
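The per-queue restrictions above are set through the blk_queue_xxx helpers. As an illustrative sketch only (not compilable on its own; the values shown are made up for the example), a 2.6-era driver's init path might look like:

```c
/* Sketch: setting per-queue restrictions in a driver's init path.
 * q is the driver's request_queue; all values here are hypothetical.
 */
blk_queue_max_sectors(q, 256);          /* soft cap, in 512-byte sectors */
blk_queue_max_phys_segments(q, 128);    /* physical segments per request */
blk_queue_max_hw_segments(q, 128);      /* dma segments the hw can handle */
blk_queue_max_segment_size(q, 65536);   /* max bytes in one segment */
blk_queue_hardsect_size(q, 512);        /* device hardware sector size */
```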
171 The generic bounce buffer logic, present in 2.4, where the block layer would
174 changed in 2.5. The bounce logic is now applied only for memory ranges
196 the type of the operation. For example, in case of a read operation, the
200 operation. Since an original buffer may be in a high memory area that's not
201 mapped in kernel virtual addr, a kmap operation may be required for
202 performing the copy, and special care may be needed in the completion path
203 as it may not be in irq context. Special care is also required (by way of
208 area that's not mapped in kernel virtual addr, but within the range that the
210 copy operations. [Note: This does not hold in the current implementation,
215 may need to abort DMA operations and revert to PIO for the transfer, in
217 done in some scenarios where the low level driver cannot be trusted to
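A driver opts in to this bounce behaviour by declaring how high its device can DMA; the block layer then bounces only pages above that limit. A sketch, assuming a queue pointer q:

```c
/* Sketch: declare the device's DMA reach so bounce buffers are
 * used only for memory the device genuinely cannot address.
 */
blk_queue_bounce_limit(q, BLK_BOUNCE_HIGH);  /* bounce only highmem pages */

/* or, for an ISA-style 24-bit DMA engine: */
blk_queue_bounce_limit(q, BLK_BOUNCE_ISA);
```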
226 As in 2.4, it is possible to plug in a brand new i/o scheduler for a particular
256 The flags and rw fields in the bio structure can be used for some tuning
267 requests in the queue. For example it allows reads for bringing in an
270 could even be exposed to applications in some manner, providing higher level
272 requests. Some bits in the bi_rw flags field in the bio structure are
292 can instead be used to directly insert such requests in the queue or preferably
308 addresses passed in this way and ignores bio entries for the request type
316 [TBD: end_that_request_last should be usable even in this case;
331 _last works OK in this case, and is not a problem, as I mentioned earlier
337 to the device. The cmd block in the request structure has room for filling
338 in the command bytes. (i.e rq->cmd is now 16 bytes in size, and meant for
343 in such cases (REQ_PC: direct packet command passed to driver, REQ_BLOCK_PC:
346 It can help to pre-build device commands for requests in advance.
356 request on the queue, rather than construct the command on the fly in the
357 driver while servicing the request queue when it may affect latencies in
358 interrupt context or responsiveness in general. One way to add early
360 Now REQ_NOMERGE is set in the request flags to skip this one in the future,
375 when the underlying device was capable of handling the i/o in one shot.
380 The following were some of the goals and expectations considered in the
381 redesign of the block i/o data structure in 2.5.
384 avoid cache related fields which are irrelevant in the direct/page i/o path,
388 address mapping in kernel address space).
390 greater than PAGE_SIZE chunks in one shot)
401 themselves in the process
409 bh structure for buffered i/o, and in the case of raw/direct i/o kiobufs are
440 unsigned int bi_size; /* total size in bytes */
453 - Large i/os can be sent down in one go using a bio_vec list consisting
455 are represented in the zero-copy network code)
456 - Splitting of an i/o request across multiple devices (as in the case of
464 - Drivers which can't process a large bio in one shot can use the bi_iter
466 (e.g a 1MB bio_vec needs to be handled in max 128kB chunks for IDE)
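Drivers walk a bio's segments with the bio_for_each_segment() iterator rather than touching the vector directly. A sketch, assuming a kernel with the bi_iter member; transfer_chunk() is a hypothetical driver routine:

```c
/* Sketch: iterating the <page, offset, len> tuples of a bio. */
struct bio_vec bvec;
struct bvec_iter iter;

bio_for_each_segment(bvec, bio, iter) {
        /* each bvec describes one segment of the i/o */
        transfer_chunk(page_address(bvec.bv_page) + bvec.bv_offset,
                       bvec.bv_len);   /* transfer_chunk() is hypothetical */
}
```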
476 The scatter gather list is in the form of an array of <page, offset, len>
477 entries with their corresponding dma address mappings filled in at the
485 which in turn means that only raw I/O uses it (direct i/o may not work
488 bios, but that is currently not included in the stock development kernels.
489 The same is true of Andrew Morton's work-in-progress multipage bio writeout
492 2.3 Changes in the Request Structure
499 invoke underlying driver entry points passing in a specially constructed
503 to in some of the discussion here) are listed below, not necessarily in
504 the order in which they occur in the structure (see include/linux/blkdev.h)
541 unsigned int current_nr_sectors; /* no. of sectors left in the
561 to the numbers of sectors in the current segment being processed which could
562 be one of the many segments in the current bio (i.e i/o completion unit).
563 The nr_sectors value refers to the total number of sectors in the whole
576 of the i/o buffer in cases where the buffer resides in low-memory. For high
593 subsystem makes use of the block layer to writeout dirty pages in order to be
595 allocation logic draws from the preallocated emergency reserve in situations
598 replenish the pool (without deadlocking) and wait for availability in the pool.
599 If it is in IRQ context, and hence not in a position to do this, allocation
605 the current availability in the pool. The mempool interface lets the
613 in the bounce bio allocation that happens in the current code, since
618 amount of time (in the case of bio, that would be after the i/o is completed).
619 This ensures that if part of the pool has been used up, some work (in this
620 case i/o) must already be in progress and memory would be available when it
621 is over. If allocating from multiple pools in the same code path, the order
632 i/o is issued (since the bio may otherwise get freed in case i/o completion
633 happens in the meantime).
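The reserve-pool idea behind mempool can be shown in a small user-space sketch. This is NOT the kernel mempool code, just an emulation of its semantics: keep a preallocated reserve, fall back to it when the underlying allocator fails, and refill the reserve on free (a real mempool_alloc would sleep for availability rather than return NULL):

```c
#include <assert.h>
#include <stdlib.h>

/* User-space sketch of the mempool reserve idea (names hypothetical). */
#define POOL_MIN 4
#define ELEM_SZ  64

struct pool {
        void *reserve[POOL_MIN];
        int   curr;          /* elements currently held in reserve */
        int   alloc_fails;   /* test knob: simulate allocator failure */
};

static void pool_init(struct pool *p)
{
        p->curr = 0;
        p->alloc_fails = 0;
        for (int i = 0; i < POOL_MIN; i++)
                p->reserve[p->curr++] = malloc(ELEM_SZ);
}

static void *pool_alloc(struct pool *p)
{
        if (!p->alloc_fails) {
                void *e = malloc(ELEM_SZ);
                if (e)
                        return e;
        }
        /* allocator failed (or was told to): draw from the reserve */
        if (p->curr > 0)
                return p->reserve[--p->curr];
        return NULL;    /* the kernel's mempool_alloc would sleep here */
}

static void pool_free(struct pool *p, void *e)
{
        if (p->curr < POOL_MIN)
                p->reserve[p->curr++] = e;   /* replenish the reserve first */
        else
                free(e);
}
```

Because frees replenish the reserve before returning memory to the system, a pool that has been partially drained is guaranteed to recover as in-flight i/o completes.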
638 in lvm or md.
642 3.2.1 Traversing segments and completion units in a request
645 in the request list (drivers should avoid directly trying to do it
647 with block changes in the future.
654 that traverse bio chains on completion need to keep that in mind. Drivers
681 hw data segments in a request (i.e. the maximum number of address/length
685 of physical data segments in a request (i.e. the largest sized scatter list
704 size of remaining data in the current segment (that is the maximum it can
705 transfer in one go unless it interprets segments), and rely on the block layer
771 tag number to the associated request. These are, in no particular order:
796 Internally, block manages tags in the blk_queue_tag structure:
808 but in the event of any barrier requests in the tag queue we need to ensure
809 that requests are restarted in the order they were queued. This may happen
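As a rough sketch of how a 2.6-era driver used these helpers (the surrounding driver logic is hypothetical):

```c
/* Sketch: tagged command queuing via the block layer helpers. */
blk_queue_init_tags(q, depth, NULL);     /* set up the queue's tag map */

/* in the request_fn, before handing rq to the hardware: */
if (blk_queue_start_tag(q, rq)) {
        /* no free tag: leave rq queued until a command completes */
}

/* in the completion path, before ending the request: */
blk_queue_end_tag(q, rq);
```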
826 The embedded bh array in the kiobuf structure has been removed and no
828 blocks array as well, but it's currently in there to kludge around direct i/o.]
834 of data, so brw_kiovec() invokes ll_rw_kio for each kiobuf in a kiovec.
849 heads. This work is still in progress.
852 bh). This isn't included in bio as yet. Christoph was also working on a
860 as discussed earlier in section 1.3.
873 cloning, in this case rather than PRE_BUILT bio_vecs, we set the bi_io_vec
874 array pointer to point to the veclet array in kvecs.
876 TBD: In order for this to work, some changes are needed in the way multi-page
877 bios are handled today. The values of the tuples in such a vector passed in
878 from higher level code should not be modified by the block layer in the course
881 all such transient state should be maintained in the request structure,
882 and passed on in some way to the endio completion routine.
886 The I/O scheduler, a.k.a. elevator, is implemented in two layers. Generic dispatch
890 The block layer implements the generic dispatch queue in block/*.c.
902 calls elevator_xxx_fn in the elevator switch (block/elevator.c). Oh, xxx
917 elevator_merged_fn called when a request in the scheduler has been
918 involved in a merge. It is used in the deadline
926 results in some sort of conflict internally,
944 one specified in disk sort order. Used by the
996 almost always dispatched in disk sort order, so a cache is kept of the next
997 request in sort order to prevent binary tree lookups.
1009 Front merges are handled by the binary trees in AS and deadline schedulers.
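Putting the callbacks together, a scheduler registers an elevator_type with the block layer. A sketch only; the handler names are hypothetical and the exact set of elevator_ops fields varies across kernel versions:

```c
/* Sketch: registering a 2.6-style i/o scheduler (names hypothetical). */
static struct elevator_type my_elv = {
        .ops = {
                .elevator_merge_fn    = my_merge,
                .elevator_merged_fn   = my_merged,
                .elevator_dispatch_fn = my_dispatch,
                .elevator_add_req_fn  = my_add_request,
        },
        .elevator_name = "myelv",
};

static int __init my_elv_init(void)
{
        elv_register(&my_elv);   /* return type varies by version */
        return 0;
}
```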
1011 iii. Plugging the queue to batch requests in anticipation of opportunities for
1015 that it collects up enough requests in the queue to be able to take
1016 advantage of the sorting/merging logic in the elevator. If the
1017 queue is empty when a request comes in, then it plugs the request queue
1025 can do it explicitly through blk_unplug(bdev). So in the read case,
1033 and allowing a big queue to build up in software, while letting the device be
1036 multi-page bios being queued in one shot, we may not need to wait to merge
1041 be used in I/O schedulers, and in the block layer (could be used for IO statistics,
1042 priorities for example). See *io_context in block/ll_rw_blk.c, and as-iosched.c
1043 for an example of usage in an i/o scheduler.
1056 corresponding adapter lock, which results in a per host locking
1071 The sector number used in the bio structure has been changed to sector_t,
1072 which could be defined as 64 bit in preparation for 64 bit sector support.
1081 having to take partition number into account in order to arrive at the true
1099 The following are some points to keep in mind when converting old drivers
1107 It used to always handle just the first buffer_head in a request; now
1117 As described in Sec 1.1, drivers can set max sector size, max segment size
1138 PIO drivers (or drivers that need to revert to PIO transfer once in a
1148 - orig kiobuf & raw i/o patches (now in 2.4 tree)
1164 8.11. Block device in page cache patch (Andrea Arcangeli) - now in 2.4.10+
1178 brought up in this discussion thread)