
These notes describe some aspects of the 2.5 block layer in the
context of the bio rewrite. The idea is to bring out some of the key
changes and the rationale behind them.

Please mail corrections & suggestions to suparna@in.ibm.com.

A number of people helped with fixes/contributions to the bio patches
while they were still work in progress.
Description of Contents:

1. Scope for tuning of logic to various needs
  1.3 Direct access/bypass to lower layers for diagnostics and special
      device operations
5.2 Prepare for transition to 64 bit sector_t
1. Scope for tuning the generic logic to satisfy various requirements
The block layer design supports adaptable abstractions to handle common
processing with the ability to tune the logic to an appropriate extent
depending on the nature of the device and the requirements of the caller.
One of the objectives of the rewrite was to increase the degree of tunability
and to enable higher level code to utilize underlying device/driver
capabilities to the maximum extent for better i/o performance. This is
important especially in the light of ever improving hardware capabilities
and application/middleware software designed to take advantage of these
capabilities.
1.1 Tuning based on low level device / driver capabilities

Knowledge of some of the capabilities or parameters of the device should be
used at the generic block layer to take the right decisions on
behalf of the driver.

i. Per-queue limits/values exported to the generic layer by the driver

Various parameters that the generic i/o scheduler logic uses are set at
a per-queue level (e.g maximum request size, maximum number of segments in
a scatter-gather list, hardware sector size). Some characteristics have been
incorporated into a queue flags field rather than separate fields
in themselves. There are blk_queue_xxx functions to set the parameters,
rather than update the fields directly.

Some new queue property settings:

	blk_queue_bounce_limit(q, u64 dma_address)
		Enable I/O to highmem pages, dma_address being the
		limit. No highmem default.
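To make the per-queue tuning concrete, here is a minimal sketch (not from
the original document) of a hypothetical driver "mydev" setting its limits
at initialization time, assuming the 2.6-era blk_queue_* helpers; the exact
helper set has changed in later kernels:

    #include <linux/blkdev.h>

    static void mydev_set_queue_limits(struct request_queue *q)
    {
            /* device can DMA to/from any address: no bouncing needed */
            blk_queue_bounce_limit(q, BLK_BOUNCE_ANY);

            blk_queue_max_sectors(q, 256);          /* at most 128kB per request */
            blk_queue_max_phys_segments(q, 128);    /* sg entries before DMA mapping */
            blk_queue_max_hw_segments(q, 128);      /* sg entries after DMA mapping */
            blk_queue_hardsect_size(q, 512);        /* device's hard sector size */
    }

The generic i/o scheduler then makes sure it never composes a request that
would exceed these limits.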
The generic bounce buffer logic, which in 2.4 would
by default copyin/out i/o requests on high-memory buffers to low-memory buffers
assuming that the driver wouldn't be able to handle it directly, has been
changed in 2.5. Bouncing is now applied only to memory ranges which the
device cannot address; a driver specifies this limit via
blk_queue_bounce_limit().

In order to enable high-memory i/o where the device is capable of supporting
it, the pci dma mapping routines and associated data structures have been
modified to accomplish a direct page -> bus translation, without requiring
a virtual address mapping of the page.

Note: Please refer to Documentation/DMA-API-HOWTO.txt for a discussion
on PCI high mem DMA aspects and mapping of scatter gather lists, and support
for 64 bit PCI.
Special handling is required only for cases where i/o needs to happen on
pages at physical memory addresses beyond what the device can support. In
these cases, a bounce bio representing a buffer from the supported memory
range is used for performing the i/o with copyin/copyout as needed. For a
read, the
data read has to be copied to the original buffer on i/o completion, so a
callback routine is set up to do this, while for write, the data is copied
from the original buffer to the bounce buffer prior to issuing the
i/o.

A mempool-backed reserve is used (with suitably constrained
GFP flags) when allocating bounce buffers, to avoid certain highmem
deadlock possibilities. Bounce pages may themselves come from high memory
(within the range the
device can use directly; so the bounce page may need to be kmapped during
the copy operations).
There are some situations when pages from high memory may need to
be kmapped even if bounce buffers are not necessary. For example a device
may need to abort DMA operations and revert to PIO for the transfer, in
which case a virtual mapping of the page is required. For SCSI it is also
done in some scenarios where the low level driver cannot be trusted to
handle a single sg entry correctly. The driver is expected to perform the
kmaps as needed on such occasions. A driver can also use the blk_queue_bounce()
routine on its own to bounce highmem i/o to low memory for specific requests
if so desired.
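As an illustration (not from the original text), a driver that wants to
force such bouncing itself might do so in its make_request function; this
sketch assumes the 2.6-era blk_queue_bounce() interface:

    #include <linux/blkdev.h>
    #include <linux/bio.h>

    static int mydev_make_request(struct request_queue *q, struct bio *bio)
    {
            /* Replace 'bio' with a bounce bio backed by low memory pages if
             * any segment lies above the queue's bounce limit; the bounce
             * code handles copyin/copyout and completing the original bio. */
            blk_queue_bounce(q, &bio);

            /* ... process 'bio', which is now addressable by the device ... */
            return 0;
    }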
As in 2.4, it is possible to plug in a brand new i/o scheduler for a particular
queue. The generic i/o scheduler logic is abstracted behind a few operations
like merge,
add request, extract request, which makes it possible to abstract specific
i/o scheduling algorithm aspects and details outside of the generic loop.
It also makes it possible to completely hide the implementation details of
the i/o scheduler from block drivers.

I/O scheduler wrappers are to be used instead of accessing the queue directly.
1.2 Tuning based on high level requirements/capabilities

This comes from some of the high-performance database/middleware
requirements where an application prefers to make its own i/o scheduling
decisions based on an understanding of the access patterns and i/o
characteristics. This needs some upper level mechanism to communicate such
settings to block.

1.2.1 Request Priority/Latency

Todo/Under discussion: Arjan's proposed request priority scheme allows
higher levels some broad control over the priority of an i/o request versus
other pending requests in the queue. For example it allows reads for bringing
in an executable page on demand to be given a higher priority over pending
write requests which haven't aged too much on the queue. Potentially this
priority
could even be exposed to applications in some manner, providing higher level
tunability. Time based aging avoids starvation of lower priority requests.
Some bits in the bi_rw flags field in the bio structure are
intended to be used for this priority information.
1.3 Direct Access to Low level Device/Driver Capabilities (Bypass mode)

There are situations where high-level code needs to have direct access to
the low level device capabilities or requires the ability to issue commands
to the device bypassing some of the intermediate i/o layers.
Having direct interfaces at
multiple levels without having to pass through upper layers makes
it possible to perform bottom up validation of the i/o path, layer by
layer, starting from the media.
The normal i/o submission interfaces (e.g submit_bio) can be bypassed for
such specially crafted requests; the elevator add_request routine
can instead be used to directly insert such requests in the queue or preferably
the blk_do_rq routine can be used to place the request on the queue and
wait for completion. Alternatively one could directly use the request->buffer
field to describe the data involved instead of setting up a bio.
Perhaps an end_that_direct_request_first routine could be implemented to make
it easier to handle completion accounting for such direct requests [i.e.
requests that don't have a bio
corresponding to a data buffer].
(The block layer relies on the sector count fields in the request for
accounting on completion of partial transfers. The driver has to modify
these fields itself when it completes such requests outside the normal
helpers.)
1.3.1 Pre-built Commands

A request can be created with a pre-built custom command to be sent directly
to the device. The cmd block in the request structure has room for filling
in the command bytes (i.e rq->cmd is now 16 bytes in size, and meant for
storing the actual command).

The request structure flags can be set up to indicate the type of request
in such cases (REQ_PC: direct packet command passed to driver, REQ_BLOCK_PC:
packet command issued via blk_do_rq, REQ_SPECIAL: special request).
It can help to pre-build device commands for requests in advance.
Drivers can now specify a request prepare function (q->prep_rq_fn) that the
block layer would invoke to pre-build device commands for a given request,
or perform other preparation as needed, before the request is handed to the
driver, without hurting latencies in
interrupt context or responsiveness in general. One way to add early
pre-building would be to do it whenever we fail to merge on a request.
Now REQ_NOMERGE is set in the request flags to skip this one in the future,
which means that it will not change before we feed it to the device. So
the command can be pre-built safely at that point.
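As a sketch of how such a prepare hook might look (assuming the 2.6-era
prep_rq_fn interface and the rq->flags/REQ_CMD naming of that time, which
changed in later kernels; mydev_build_cmd() is hypothetical):

    #include <linux/blkdev.h>

    static void mydev_build_cmd(struct request *rq);    /* hypothetical */

    static int mydev_prep_rq(struct request_queue *q, struct request *rq)
    {
            if (!(rq->flags & REQ_CMD))     /* only pre-build normal fs requests */
                    return BLKPREP_OK;

            mydev_build_cmd(rq);            /* fill rq->cmd[] for this request */
            return BLKPREP_OK;              /* or BLKPREP_DEFER / BLKPREP_KILL */
    }

    /* registered at init time with: blk_queue_prep_rq(q, mydev_prep_rq); */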
2. New flexible and generic but minimalist i/o structure or descriptor (bio)

Prior to 2.5, buffer heads were used as the unit of i/o at the generic block
layer, and a driver had to deal with a linked list of small fixed-size
buffer heads for a contiguous i/o request. This led to certain inefficiencies
when it came to large i/o requests and readv/writev style operations, as it
forced such requests to be broken up into small chunks before being passed
on to the generic block layer, only to be merged by the i/o scheduler
again. Also, the various fields carried over
from the buffer cache unnecessarily added to the weight of the descriptors
used for raw i/o and other pure data transfer paths.
Some of the requirements and goals addressed in the design of the new i/o
descriptor:

ii. Ability to represent high-memory buffers (which do not have a virtual
    address mapping in kernel address space).
iii.Ability to represent large i/os w/o unnecessarily breaking them up (i.e
    greater than PAGE_SIZE chunks in one shot)
iv. At the same time, ability to retain independent identity of i/os from
    different sources or i/o units requiring individual completion (e.g. bh's)
v.  Ability to represent an i/o involving multiple physical memory segments
    (including non-page aligned page fragments, as specified via readv/writev)
    without unnecessarily copying the data
vii.Ability to handle the possibility of splits/merges as the structure passes
    through layered drivers (lvm, md, evms), with minimal overhead.

The solution was to define a new structure (bio) for the block layer,
instead of using the buffer head structure (bh) directly. The bh structure
is retained for buffer cache use at higher levels, while higher level i/o
units are now
mapped to bio structures.
2.2 The bio struct

The bio structure uses a vector representation pointing to an array of tuples
of <page, offset, len> to describe the i/o buffer, and has various other
fields describing i/o parameters and state that needs to be maintained for
performing the i/o.
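For orientation, this is roughly what the descriptors look like in the
2.6-era headers (abbreviated; see include/linux/bio.h for the authoritative
definitions):

    struct bio_vec {
            struct page     *bv_page;       /* page holding the data */
            unsigned int    bv_len;         /* length of the segment in bytes */
            unsigned int    bv_offset;      /* byte offset within the page */
    };

    struct bio {
            sector_t                bi_sector;      /* device sector of the i/o */
            struct bio              *bi_next;       /* chained bios in a request */
            struct block_device     *bi_bdev;
            unsigned long           bi_rw;          /* read/write and priority bits */
            unsigned short          bi_vcnt;        /* number of bio_vec entries */
            unsigned short          bi_idx;         /* current index into bi_io_vec */
            unsigned int            bi_size;        /* residual i/o count in bytes */
            struct bio_vec          *bi_io_vec;     /* the <page, offset, len> vector */
            bio_end_io_t            *bi_end_io;     /* completion callback */
            void                    *bi_private;    /* owner-private data */
            /* remaining fields (flags, segment counts, refcount, ...) omitted */
    };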
With the multi-page bio design, large i/os can be sent down in one shot
using a bio_vec list consisting
of an array of <page, offset, len> fragments (similar to the way fragments
are represented in network buffers/skbs). Splitting an i/o request across
multiple devices (as required by
lvm or raid) is achieved by cloning the bio (where the clone points to
the same bi_io_vec array, with its own index and size). A linked list of
bios is still used for unrelated merges of i/o units; this
avoids reallocs and makes independent completions easier to handle.
A driver that cannot handle an entire vector in one go can use the bi_idx
field to keep track of the next bio_vec entry to process.
(e.g a 1MB bio_vec needs to be handled in max 128kB chunks for IDE)
[TBD: Should preferably also have a bi_voffset and bi_vlen to avoid modifying
the bio_vec entries themselves during partial processing.]
At a lower level, drivers build a scatter gather list from the merged bios.
As an optimization, physically contiguous pages can be
covered by a single entry where <page> refers to the first page and <len>
covers the range of pages (up to 16 contiguous pages could be covered this
way). There is a helper routine (blk_rq_map_sg) which drivers can use to build
the sg list; not every i/o path generates multi-page bios
right now. The intent however is to enable clustering of pages etc to
become common over time.
2.3 Changes in the Request Structure

The request structure is the structure that gets passed down to low level
drivers. The driver should make
use of block layer helper routine elv_next_request to pull the next request
off the queue, rather than walking the queue directly. The fields relevant
to the block layer/driver interface (and referred
to in some of the discussion here) are listed below, not necessarily in
the order in which they occur in the structure.
Refer to Documentation/block/request.txt for details about all the request
structure fields and a quick reference about the layers which are
supposed to use or modify those fields.
	struct request {
		struct list_head queuelist;  /* Not meant to be directly
						accessed by the driver. */
		.
		.
		unsigned short nr_hw_segments; /* Number of scatter-gather
						* segments the driver
						* will actually have to deal
						* with after DMA mapping is
						* done.
						*/
		.
		.
		char *buffer;	/* valid only for low memory buffers up to
				   current_nr_sectors */
		.
		.
	};
The current_nr_sectors field refers
to the numbers of sectors in the current segment being processed, which could
be smaller than the sectors remaining in the request as a whole.
The nr_sectors value refers to the total number of sectors in the whole
request that remain to be transferred (no change). The purpose of the
hard_xxx values is for block to remember these counts every time it hands
over the request to the driver. These values are updated by block on
end_that_request_first, i.e. every time the driver completes part of the
transfer and invokes block end*request helpers to mark this. The
driver should not modify the hard_xxx values itself.
Code that sets up its own request structures and passes them down to
a driver needs to be careful about interoperation with the block layer helper
routines which the driver uses for processing and completing requests.
The mempool interface makes it possible for
subsystems like bio to maintain their own reserve memory pools for guaranteed
deadlock-free allocation. For example, in a low memory situation the VM
subsystem makes use of the block layer to writeout dirty pages in order to be
able to free up memory space, a case which needs careful handling. The
bio subsystem maintains such a reserve pool using mempool.
When allocating from a mempool the caller indicates whether it can wait.
If it is in IRQ context, and hence not in a position to do this, allocation
could fail if the pool is empty. In general mempool always first tries to
perform allocation without having to wait, even if it means digging into the
reserve pool, and only blocks when that fails and waiting is allowed.
On a free, memory is released to the pool or directly freed depending on
the current availability of the reserve. The mempool interface lets the
subsystem specify the routines to be used for normal alloc and free. In the
case of bio, these routines make use of the standard slab allocator.
The caller of bio_alloc is expected to take certain steps to avoid
deadlocks, e.g. avoid trying to allocate more memory from the pool while
already holding memory obtained from the pool. In general, the ordering
or hierarchy of allocation needs to be consistent, just the way one deals
with multiple lock acquisition in a consistent order to avoid deadlock.

The bio_alloc routine also needs to allocate the bio_vec_list (bvec_alloc())
for a non-clone bio.
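A minimal sketch of the mempool pattern described above (not the actual bio
pool code; 'my_obj' and the pool size are hypothetical, and the kmalloc-backed
helpers are used here for brevity):

    #include <linux/mempool.h>
    #include <linux/slab.h>

    struct my_obj {
            char data[64];          /* per-i/o state the subsystem needs */
    };

    #define MY_RESERVE 16           /* objects kept in reserve for writeout
                                       under memory pressure */

    static mempool_t *my_pool;

    static int __init my_pool_init(void)
    {
            my_pool = mempool_create(MY_RESERVE, mempool_kmalloc, mempool_kfree,
                                     (void *)(unsigned long)sizeof(struct my_obj));
            return my_pool ? 0 : -ENOMEM;
    }

    /* callers then use mempool_alloc(my_pool, GFP_NOIO) on the writeout path
     * and mempool_free(obj, my_pool) on release, which refills the reserve. */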
The bio_get() routine may be used to hold an extra reference on a bio prior
to i/o submission, if the bio fields are likely to be accessed after the
i/o is issued (since the bio may otherwise get freed once the i/o completes).
The bio_clone() routine may be used to duplicate a bio, where the clone
shares the bio_vec_list with the original bio (i.e. both point to the
same bio_vec_list). This would typically be used for splitting i/o requests
in lvm or md.
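A sketch of how a stacking driver might use bio_clone() for redirection (the
lower device, the sector remap and the completion wiring are hypothetical;
the bi_end_io signature differs across 2.6 versions, so that part is only
indicated in a comment):

    #include <linux/bio.h>
    #include <linux/blkdev.h>

    static struct block_device *my_lower_bdev;      /* hypothetical lower device */

    static int my_stack_make_request(struct request_queue *q, struct bio *bio)
    {
            struct bio *clone = bio_clone(bio, GFP_NOIO);   /* shares bi_io_vec */

            clone->bi_bdev   = my_lower_bdev;               /* redirect downwards */
            clone->bi_sector = bio->bi_sector + 2048;       /* hypothetical remap */
            /* clone->bi_end_io / bi_private would point at a routine that ends
             * the original bio once the clone completes */

            generic_make_request(clone);
            return 0;
    }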
The generic completion helpers (end_that_request_first/last) should be used
to update the state of a request and of the bios
in the request list (drivers should avoid directly trying to do it
themselves). Using these helpers should also make it easier to cope
with future block layer changes. Note that the completion of one bio does
not mean the whole request is done, so code
that traverses bio chains on completion needs to keep that in mind. Drivers
that previously handled one buffer head at a time may
need to be reorganized to support multi-segment bios.
Drivers should use the blk_rq_map_sg helper to build scatterlists from a
request rather than rolling their own, since this makes it possible
to modify the internals of request to scatterlist conversion down the line
without breaking drivers. The helper takes care of segment clustering (when
the QUEUE_FLAG_CLUSTER flag
is set) and correct segment accounting to avoid exceeding the limits which
the driver or device can handle.
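A sketch of the intended usage (assuming the 2.6-era blk_rq_map_sg() and
dma_map_sg() signatures; the caller-provided sg array and DMA device are
placeholders):

    #include <linux/blkdev.h>
    #include <linux/dma-mapping.h>

    static int mydev_map_request(struct request_queue *q, struct request *rq,
                                 struct device *dmadev, struct scatterlist *sg)
    {
            int nseg = blk_rq_map_sg(q, rq, sg);    /* fills sg, honours
                                                       clustering and limits */

            /* hand the hardware a DMA-mapped view of the same list */
            return dma_map_sg(dmadev, sg, nseg,
                              rq_data_dir(rq) == READ ? DMA_FROM_DEVICE
                                                      : DMA_TO_DEVICE);
    }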
Routines which the low level driver can use to set up the segment limits:

blk_queue_max_hw_segments() : Sets an upper limit of the maximum number of
hw data segments in a request (i.e. the maximum number of address/length
pairs the host adapter can actually hand to the device at once).
As before the driver should use current_nr_sectors to determine the
size of the remaining data in the current segment, and should invoke one of
end_request, or end_that_request_first/last to take care of all accounting
and transparent handling of partial completions when a transfer finishes.
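For illustration, a partial-completion path in a driver might look like this
sketch (assuming the 2.6-era helpers; later kernels replaced them with
blk_end_request and friends). It is called with the queue lock held, as from
a request_fn:

    #include <linux/blkdev.h>

    static void mydev_end_sectors(struct request *rq, int uptodate, int nsect)
    {
            /* returns non-zero while the request still has sectors pending */
            if (!end_that_request_first(rq, uptodate, nsect)) {
                    blkdev_dequeue_request(rq);  /* if not already off the queue */
                    end_that_request_last(rq);   /* all done: complete the request */
            }
    }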
Block now offers some simple generic functionality to help support command
queueing (typically known as tagged command queueing), i.e. managing more
than one outstanding command on a queue at any given time.
blk_queue_start_tag(struct request_queue *q, struct request *rq)

	Start tagged operation for this request. A free tag number between
	0 and 'depth' is assigned to the request (rq->tag holds this number),
	and 'rq' is added to the internal tag management. If the maximum depth
	for this queue is already achieved (or if the tag wasn't started for
	some other reason), 1 is returned. Otherwise 0 is returned.
To minimize struct request and queue overhead, the tag helpers reuse some of
the same request members used for normal request queue management; a request
cannot both be an active tag and sit on the queue list at the same time.
blk_queue_start_tag() will remove the request from the queue, but
the driver must remember to call blk_queue_end_tag() before signalling
completion of the request to the block layer. This means ending tag
operations before calling end_that_request_last().
Certain hardware conditions may dictate a need to invalidate the block tag
queue. For instance, on IDE any tagged request error needs to clear both
the hardware and software block queue and enable the driver to sanely restart
all the outstanding requests. There's a third helper to do that:

	blk_queue_invalidate_tags(struct request_queue *q)

	Clear the internal block tag queue and re-add all the pending requests
	to the request queue. The driver will receive them again on the
	next call to elv_next_request().
Some block functions exist to query current tag status or to go from a
tag number to the associated request. These are, in no particular order:

	blk_queue_tagged(q)

		Returns true if the queue has tagging enabled.

	blk_queue_find_tag(q, tag)

		Returns a pointer to the request associated with tag 'tag'.
Internally, block keeps the tag state in a structure associated with the
queue:

	struct blk_queue_tag {
		struct request **tag_index;	/* array of pointers to rq */
		.
		.
	};
Most of this is straightforward bookkeeping,
but in the event of any barrier requests in the tag queue we need to ensure
that the pending requests are restarted in the order they were queued
if the driver needs to use blk_queue_invalidate_tags().
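Putting the tag helpers together, a request function might look roughly like
this sketch (tagging is assumed to have been enabled at init time with
blk_queue_init_tags(); mydev_issue() is a hypothetical hardware submit):

    #include <linux/blkdev.h>

    static void mydev_issue(struct request *rq);        /* hypothetical */

    static void mydev_request_fn(struct request_queue *q)
    {
            struct request *rq;

            while ((rq = elv_next_request(q)) != NULL) {
                    if (blk_queue_start_tag(q, rq))
                            break;          /* tag depth exceeded, retry later */

                    /* rq->tag holds the tag and rq is off the queue list now */
                    mydev_issue(rq);
            }
    }

On completion, still under the queue lock, the driver would call
blk_queue_end_tag(q, rq) before end_that_request_last(rq), as described
above.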
3.3 I/O Submission

The routine submit_bio() is used to submit a single io. Higher level i/o
routines make use of this:

(a) Buffered i/o:
The routine submit_bh() invokes submit_bio() on a bio corresponding to the
bh, allocating the bio if required.

(b) Kiobuf i/o (for raw/direct i/o):
The ll_rw_kio() routine breaks up the kiobuf into page sized chunks and
maps the array to one or more multi-page bios, issuing submit_bio() to
perform the i/o on each of these.
The embedded bh array in the kiobuf structure has been removed and no
preallocation of bios is done for kiobufs. [The intent is to remove the
blocks array as well, but it's currently in there to kludge around direct i/o.]
Thus kiobuf allocation has switched back to using kmalloc rather than vmalloc.

A single kiobuf structure is assumed to correspond to a contiguous range
of data, so brw_kiovec() invokes ll_rw_kio for each kiobuf in a kiovec.
This is to be resolved. The eventual direction is to replace kiobuf
by kvec's.

Badari Pulavarty has a patch to implement direct i/o correctly using
bio and kvec.
Andrew Morton's multi-page bio patches attempt to issue multi-page
writeouts (and reads) from the page cache, by directly building up large
bios for submission, completely bypassing buffer heads. There has also been
work on modifying
some of the address space ops interfaces to utilize this abstraction rather
than buffer_heads. (This is somewhat along the lines of the pagebuf
abstraction, but intended to be as lightweight as possible.)
Kvec i/o:

Ben LaHaise's aio code uses a slightly different structure instead of
kiobufs, describing the i/o buffer as an array of <page, offset, len>
veclets, which is passed
to brw_kvec_async().

Now it should be possible to directly map these kvecs to a bio. Just as with
cloning, the copy of the vector can be avoided by simply setting the bio_vec
array pointer to point to the veclet array in kvecs.

TBD: In order for this to work, some changes are needed in the way multi-page
bios are handled today. One issue is that the current completion handling
does not let the submitter
continue to use the vector descriptor (kvec) after i/o completes. Instead,
the state needed at completion time would have to be captured up front
and passed on in some way to the endio completion routine.
4. The I/O scheduler

I/O scheduler, a.k.a. elevator, is implemented in two layers: the generic
dispatch queue and specific I/O schedulers. Unless stated otherwise, elevator
is used
to refer to both parts and I/O scheduler to specific I/O schedulers.

The block layer implements the generic dispatch queue, which is responsible
for requeueing, handling non-fs requests and all other subtleties. Specific
I/O schedulers are responsible for ordering normal filesystem
requests. They can also choose to delay certain requests to improve
throughput or for whatever other purpose. There are multiple I/O schedulers;
they can be built as modules, but at least one must be built into the kernel.
Each queue can choose a different one and can
change to another one dynamically.

A block layer call to the i/o scheduler follows the convention elv_xxx(). This
calls elevator_xxx_fn in the elevator switch (elevator.c). Oh, xxx and xxx
might not match exactly, but use your imagination. If an elevator doesn't
implement a function, the switch does nothing or some minimal house keeping
work.
The functions an elevator may implement are: (* are mandatory)

elevator_merge_fn		called to query requests for merge with a bio

elevator_merged_fn		called when a request in the scheduler has been
				involved in a merge. It is used in the deadline
				scheduler for example, to reposition the request
				if its sorting order has changed.

elevator_allow_merge_fn		called when the block layer has determined that
				a bio could safely be merged into an existing
				request. The io scheduler may still
				want to stop a merge at this point if it
				would create some sort of internal conflict;
				this hook allows it to do that. Note however
				that two *requests* can still be merged at a
				later
				time. Currently the io scheduler has no way to
				prevent that; it can only learn about it from
				the elevator_merge_req_fn callback.

elevator_dispatch_fn*		fills the dispatch queue with ready requests.
				I/O schedulers are free to postpone requests by
				not filling the dispatch queue unless @force
				is non-zero. Once dispatched, I/O schedulers
				are not allowed to manipulate the requests -
				they belong to generic dispatch queue.

elevator_add_req_fn*		called to add a new request into the scheduler

elevator_former_req_fn
elevator_latter_req_fn		These return the request before or after the
				one specified, in disk sort order. Used by the
				block layer to find merge possibilities.

elevator_may_queue_fn		returns true if the scheduler wants to allow the
				current context to queue a new request even if
				it is over the queue limit. This must be used
				very carefully!!

elevator_set_req_fn
elevator_put_req_fn		Must be used to allocate and free any elevator
				specific storage for a request.

elevator_activate_req_fn	Called when a device driver first sees a request.
				I/O schedulers can use this callback to
				determine when actual execution of a request
				starts.

elevator_deactivate_req_fn	Called when device driver decides to delay
				a request by requeueing it.
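To make the mandatory hooks concrete, here is a sketch of a trivial FIFO
scheduler's add and dispatch functions, loosely modelled on the in-tree noop
elevator (initialization of the private data and registration via
elv_register() are omitted):

    #include <linux/blkdev.h>
    #include <linux/elevator.h>
    #include <linux/list.h>

    struct fifo_data {
            struct list_head queue;         /* set up in the elevator's init hook */
    };

    static void fifo_add_request(struct request_queue *q, struct request *rq)
    {
            struct fifo_data *fd = q->elevator->elevator_data;

            list_add_tail(&rq->queuelist, &fd->queue);      /* plain FIFO order */
    }

    static int fifo_dispatch(struct request_queue *q, int force)
    {
            struct fifo_data *fd = q->elevator->elevator_data;
            struct request *rq;

            if (list_empty(&fd->queue))
                    return 0;

            rq = list_entry(fd->queue.next, struct request, queuelist);
            list_del_init(&rq->queuelist);
            elv_dispatch_sort(q, rq);   /* hand over to the generic dispatch queue */
            return 1;
    }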
The generic i/o scheduler algorithm attempts to sort/merge/batch requests for
optimal disk scan and request servicing performance. Some of the mechanisms
involved:

i. Sorted data structures
The AS and deadline i/o schedulers, for example, keep requests sorted by
start sector, along with a cached pointer to the next
request in sort order to prevent binary tree lookups in the common case.

ii. Hash table for back merge lookups
A hash indexed by the end sector of pending requests
enables merging code to quickly look up "back merge" candidates, even when
the requests are spread across the scheduler's internal queues. "Front merges"
are far less common than "back merges" due to the nature of most I/O
patterns, and are handled by the sorted structures above.
iii. Plugging the queue to batch requests in anticipation of opportunities for
     merge/sort optimizations

Plugging is an approach that the current i/o scheduling algorithm resorts to so
that it collects up enough requests in the queue to be able to take
advantage of the sorting/merging logic in the elevator. If the queue is empty
when a request comes in, then it plugs the request queue
(sort of like plugging the bath tub of a vessel to get fluid to build up)
till it fills up with a few more requests, before starting to service
the requests. This provides an opportunity to merge/sort the requests before
passing them down to the device. There are various conditions when the queue is
unplugged (to open up the flow again), either through a scheduled task or
on demand.
Aside:
  This is kind of controversial territory, as plugging is not
  always the right thing to do. Devices typically have their own queues,
  and allowing a big queue to build up in software, while letting the device be
  idle for a while may not always make sense. The trick is to handle the fine
  balance between when to plug and when to open up. Also now that we have
  multi-page bios being queued in one shot, we may not need to wait to merge
  as much as before.
5. Scalability related changes

5.1 Granular locking to replace io_request_lock

The global io_request_lock has been removed as of 2.5, to avoid
the scalability bottleneck it was causing, in favour of more
granular locking. The request queue structure has a pointer to the
lock to be used for that queue. As a result, locking can now be
done on a per-queue basis. A driver can share a lock across queues if
necessary (e.g the scsi layer sets the queue lock pointers to the
corresponding adapter lock, which results in per-host locking granularity).
With this scheme the generic layer
should still be SMP safe. Drivers are free to drop the queue
lock themselves, if required. Drivers that explicitly used the
io_request_lock for serialization need to be modified accordingly.
Usually it's as simple as adding a global lock

	static DEFINE_SPINLOCK(mydev_lock);

and passing the address to that lock to blk_init_queue().
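In code, that typically looks like the following sketch (assuming the
2.6-era blk_init_queue() signature; older code spelled the lock initializer
as SPIN_LOCK_UNLOCKED):

    #include <linux/blkdev.h>
    #include <linux/spinlock.h>

    static void mydev_request_fn(struct request_queue *q);  /* driver strategy */

    static DEFINE_SPINLOCK(mydev_lock);
    static struct request_queue *mydev_queue;

    static int __init mydev_init(void)
    {
            mydev_queue = blk_init_queue(mydev_request_fn, &mydev_lock);
            if (!mydev_queue)
                    return -ENOMEM;

            /* mydev_request_fn() is now always entered with mydev_lock held */
            return 0;
    }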
5.2 Prepare for transition to 64 bit sector_t

The sector number used in the bio structure has been changed to sector_t,
which could be defined as 64 bit in preparation for 64 bit sector support.
6. Other Changes/Implications

6.1 Partition re-mapping handled by the generic block layer

In 2.5 some of the gendisk/partition related code has been reorganized.
Now the generic block layer performs partition-remapping early and thus
provides drivers with a sector number relative to whole device, rather than
having to take partition number into account in order to arrive at the true
sector number. The remapping is done before the queue's make_request_fn is
invoked,
so the i/o scheduler also gets to operate on whole disk sector numbers. This
should typically not require changes to block drivers; a driver just never gets
to invoke its own partition sector offset calculations since all bios
sent to it are offset from the beginning of the device.
7. A Few Tips on Migration of older drivers

The following are some points to keep in mind when converting old drivers
to bio.

Drivers should use elv_next_request to pick up requests and are no longer
supposed to handle looping directly over the request list.
(struct request->queue has been removed)

Now end_that_request_first takes an additional number_of_sectors argument.
It used to handle always just the first buffer_head in a request, now
it will loop and handle as many sectors (on a bio-segment granularity) as
specified.

Completion is signalled on the bio rather than via bh->b_end_io; most of the
time the
right thing to use is bio_endio(bio, uptodate) instead.

If a driver previously used the io_request_lock for its own serialization,
then it just needs to replace that with q->queue_lock instead.

As described in Sec 1.1, drivers can set max sector size, max segment size
etc per queue now. Drivers that used to define their own merge functions in
order
to handle things like this can now just use the blk_queue_* functions at
blk_init_queue time.

Drivers no longer have to map a {partition, sector offset} into the
correct absolute location; this is done by the block layer.
To handle a request with multiple data segments, a driver should build a
scatterlist with blk_rq_map_sg (and then
use dma_map_sg for scatter gather) to be able to ship it to the driver. For
PIO drivers (or drivers that need to revert to PIO transfer once in a
while (IDE for example)), where the CPU is doing the actual data transfer, a
virtual mapping is needed. If the driver supports highmem i/o
(Sec 1.1, (ii) ) it needs to use __bio_kmap_atomic and bio_kmap_irq to
temporarily map a bio into the virtual address space.
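For the PIO case, a sketch of transferring the current segment using the
kmap helpers mentioned above (assuming the 2.6-era bio_kmap_irq() and
bio_kunmap_irq() signatures; mydev_pio_out() is a hypothetical programmed-i/o
routine):

    #include <linux/bio.h>
    #include <linux/blkdev.h>

    static void mydev_pio_out(char *buf, unsigned int len);    /* hypothetical */

    static void mydev_pio_current_segment(struct request *rq)
    {
            unsigned long flags;
            char *buf;

            /* temporarily map the current bio segment (works for highmem pages) */
            buf = bio_kmap_irq(rq->bio, &flags);
            mydev_pio_out(buf, rq->current_nr_sectors << 9);
            bio_kunmap_irq(buf, &flags);
    }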
8. Prior/Related/Impacted patches

Among the prior patches and the capabilities they provided:
  - direct kiobuf based i/o to devices (no intermediate bh's)
9. Other References

Discussions on lkml, Feb-March 2001 (many of the initial thoughts that led
to bio were brought up in this discussion thread).