BUFFERIO(9) | Kernel Developer's Manual | BUFFERIO(9) |
BUFFERIO, biodone, biowait, getiobuf, putiobuf, nestiobuf_setup, nestiobuf_done - block I/O buffer transfers
#include <sys/buf.h>

void
biodone(struct buf *bp);

int
biowait(struct buf *bp);

struct buf *
getiobuf(struct vnode *vp, bool waitok);

void
putiobuf(struct buf *bp);

void
nestiobuf_setup(struct buf *mbp, struct buf *bp, int offset, size_t size);

void
nestiobuf_done(struct buf *mbp, int donebytes, int error);
The BUFFERIO subsystem manages block I/O buffer transfers. Each transfer is described by a struct buf, a structure shared among users of BUFFERIO, users of buffercache(9), and the block device drivers that execute transfers to physical disks.
Users of BUFFERIO wishing to submit a buffer for block I/O transfer must obtain a struct buf, e.g. via getiobuf(), fill in its parameters, and submit it to a block device with bdev_strategy(9), usually via VOP_STRATEGY(9).
The parameters to an I/O transfer described by bp are specified by the following struct buf fields:
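bp->b_flags
        Flags specifying the type of transfer.
        B_READ   The transfer is a read from the device. If not set, the transfer is a write to the device.
        B_ASYNC  The transfer is asynchronous. The caller must not provide bp->b_iodone and must not call biowait(bp).
        For legibility, callers should indicate writes with the pseudo-flag B_WRITE, which is zero.
bp->b_data
        Kernel virtual address of the source or target of the transfer.
bp->b_bcount
        Number of bytes requested for transfer.
bp->b_blkno
        Block number at which to do the transfer.
bp->b_iodone
        I/O completion callback. If it is set, B_ASYNC must not be set in bp->b_flags.

Additionally, if the I/O transfer is a write associated with a vnode(9) vp, then before the user submits it to a block device, the user must increment vp->v_numoutput. The user must not acquire vp's vnode lock between incrementing vp->v_numoutput and submitting bp to a block device, because doing so is likely to deadlock with the syncer.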
Block I/O transfer completion may be notified by the bp->b_iodone callback, by signalling biowait() waiters, or not at all in the B_ASYNC case.
- If the user sets the bp->b_iodone callback to a non-NULL function pointer, it will be called in soft interrupt context when the I/O transfer is complete. The user must not call biowait(bp) in this case.
- If B_ASYNC is set, then the I/O transfer is asynchronous and the user will not be notified when it is completed. The user must not call biowait(bp) in this case.
- Otherwise, if bp->b_iodone is NULL and B_ASYNC is not specified, the user may wait for the I/O transfer to complete with biowait(bp).

Once an I/O transfer has completed, its struct buf may be reused, but the user must first clear the BO_DONE flag in bp->b_oflags.
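The three completion cases above can be summarised as a small decision function. The following is a userland sketch only: struct cm_buf, CM_B_ASYNC, and the function names are hypothetical stand-ins invented for illustration, not the kernel's struct buf or flag values.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for B_ASYNC; the value is illustrative only. */
#define CM_B_ASYNC 0x1

/* Hypothetical model of the two struct buf fields that matter here. */
struct cm_buf {
	int b_flags;
	void (*b_iodone)(struct cm_buf *);
};

enum cm_mode {
	CM_INVALID,	/* b_iodone and B_ASYNC both set: not allowed */
	CM_CALLBACK,	/* b_iodone is called on completion; no biowait() */
	CM_ASYNC,	/* no notification at all; no biowait() */
	CM_WAIT		/* caller may wait with biowait() */
};

/* Classify how completion of the modelled transfer would be notified. */
static enum cm_mode
cm_completion_mode(const struct cm_buf *bp)
{
	int has_cb = (bp->b_iodone != NULL);
	int async = (bp->b_flags & CM_B_ASYNC) != 0;

	if (has_cb && async)
		return CM_INVALID;
	if (has_cb)
		return CM_CALLBACK;
	if (async)
		return CM_ASYNC;
	return CM_WAIT;
}

/* biowait() is permitted only when neither a callback nor B_ASYNC is used. */
static int
cm_may_biowait(const struct cm_buf *bp)
{
	return cm_completion_mode(bp) == CM_WAIT;
}

/* Dummy callback used only to make b_iodone non-NULL in examples. */
static void
cm_noop_iodone(struct cm_buf *bp)
{
	(void)bp;
}
```

A caller could use such a check as a debugging assertion before submitting a buffer; the kernel itself is not claimed to perform this exact test.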
After initializing the b_flags, b_data, and b_bcount parameters of an I/O transfer for the buffer, called the master buffer, the user can issue smaller transfers for segments of the buffer using nestiobuf_setup(). When nested I/O transfers complete, in any order, they debit from the amount of work left to be done in the master buffer. If any segments of the buffer were skipped, the user can report this with nestiobuf_done() to debit the skipped part of the work.
The master buffer's I/O transfer is completed once all nested buffers' I/O transfers have completed and, if any segments were skipped, nestiobuf_done() has been called to account for them.
For writes associated with a vnode vp, nestiobuf_setup() accounts for vp->v_numoutput, so the caller must not acquire vp's vnode lock before submitting the nested I/O transfer to a block device. However, the caller is responsible for accounting the master buffer in vp->v_numoutput. This must be done very carefully because, after incrementing vp->v_numoutput, the caller must not acquire vp's vnode lock before either calling nestiobuf_done() or submitting the last nested I/O transfer to a block device.
For example:
    struct buf *mbp, *bp;
    struct vnode *devvp;
    size_t skipped = 0;
    unsigned i;
    int error = 0;

    mbp = getiobuf(vp, true);
    mbp->b_data = data;
    mbp->b_resid = mbp->b_bcount = datalen;
    mbp->b_flags = B_WRITE;

    KASSERT(0 < nsegs);
    KASSERT(datalen == nsegs*segsz);
    for (i = 0; i < nsegs; i++) {
        daddr_t blkno;

        vn_lock(vp, LK_EXCLUSIVE | LK_RETRY);
        error = VOP_BMAP(vp, i*segsz, &devvp, &blkno, NULL);
        VOP_UNLOCK(vp);
        if (error == 0 && blkno == -1)
            error = EIO;
        if (error) {
            /* Give up early, don't try to handle holes. */
            skipped += datalen - i*segsz;
            break;
        }

        bp = getiobuf(vp, true);
        /* Master buffer first, then the nested buffer. */
        nestiobuf_setup(mbp, bp, i*segsz, segsz);
        bp->b_blkno = blkno;
        if (i == nsegs - 1)     /* Last segment. */
            break;
        VOP_STRATEGY(devvp, bp);
    }

    /*
     * Account v_numoutput for master write.
     * (Must not vn_lock before last VOP_STRATEGY!)
     */
    mutex_enter(vp->v_interlock);
    vp->v_numoutput++;
    mutex_exit(vp->v_interlock);

    if (skipped)
        nestiobuf_done(mbp, skipped, error);
    else
        VOP_STRATEGY(devvp, bp);
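The debit accounting described for nested transfers can be illustrated with a small userland model. This is a sketch under stated assumptions: struct nm_master and the nm_* functions are hypothetical names invented here, and keeping only the first reported error is an illustrative choice, not a statement about what the kernel does.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a master buffer; not the kernel's struct buf. */
struct nm_master {
	size_t b_bcount;	/* total bytes in the master transfer */
	size_t b_resid;		/* bytes not yet accounted for */
	int b_error;		/* first error reported (model's choice) */
	int done;		/* nonzero once the master has completed */
};

/* Start with the whole transfer outstanding, as in the example above. */
static void
nm_master_init(struct nm_master *mbp, size_t len)
{
	mbp->b_bcount = len;
	mbp->b_resid = len;
	mbp->b_error = 0;
	mbp->done = 0;
}

/*
 * Debit donebytes from the master.  In this model the same routine
 * stands for both a nested transfer completing and an explicit
 * nestiobuf_done() call for skipped segments.
 */
static void
nm_debit(struct nm_master *mbp, size_t donebytes, int error)
{
	assert(donebytes <= mbp->b_resid);
	if (error != 0 && mbp->b_error == 0)
		mbp->b_error = error;
	mbp->b_resid -= donebytes;
	if (mbp->b_resid == 0)
		mbp->done = 1;	/* all work accounted for */
}
```

In this model the order of debits does not matter; the master completes only when the debits sum to b_bcount, which is why skipped segments must be reported.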
Block device drivers implement a strategy method, in the d_strategy member of struct bdevsw (driver(9)), to queue a buffer for disk I/O. The inputs to the strategy method are:
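bp->b_flags
        Flags specifying the type of transfer.
        B_READ   The transfer is a read from the device. If not set, the transfer is a write to the device.
bp->b_data
        Kernel virtual address of the source or target of the transfer.
bp->b_bcount
        Number of bytes requested for transfer.
bp->b_blkno
        Block number at which to do the transfer, relative to the start of the partition.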
If the strategy method uses bufq(9), it must additionally initialize the following fields before queueing bp with bufq_put(9):
bp->b_rawblkno
        Block number relative to the start of the volume.
When the I/O transfer is complete, whether it succeeded or failed, the strategy method must:
- Set bp->b_error to zero on success, or to an errno(2) error code on failure.
- Set bp->b_resid to the number of bytes remaining to transfer, whether on success or on failure. If no bytes were transferred, this must be set to bp->b_bcount.
- Call biodone(bp).

biodone(bp)
        Notify that the I/O transfer described by bp has completed.
        To be called by a block device driver. The caller must first set bp->b_error to zero or an error code and bp->b_resid to the number of bytes remaining to transfer.
biowait(bp)
        Wait for the I/O transfer described by bp to complete. Returns the value of bp->b_error.
        To be called by a user requesting the I/O transfer.
        Must not be called if bp has a callback or is asynchronous, that is, if bp->b_iodone is set or if B_ASYNC is set in bp->b_flags.
getiobuf(vp, waitok)
        Allocate a struct buf for an I/O transfer. If vp is non-NULL, the transfer is associated with it. If waitok is false, returns NULL if none can be allocated immediately.
        The resulting struct buf pointer must eventually be passed to putiobuf() to release it. Do not use brelse(9).
        The buffer must not be used for an asynchronous I/O transfer, because there is no way to know when the transfer has completed and the buffer may safely be passed to putiobuf(). Asynchronous I/O transfers are allowed only for buffers in the buffercache(9).
        May sleep if waitok is true.
putiobuf(bp)
        Free bp, which must have been allocated by getiobuf(). Either bp must never have been submitted to a block device, or the I/O transfer must have completed.

The BUFFERIO subsystem is implemented in sys/kern/vfs_bio.c.
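The completion contract between a strategy method and biodone()/biowait() described above can be modelled in userland as follows. This is only a sketch: struct drv_buf and the drv_* functions are hypothetical stand-ins, and the modelled wait simply asserts completion where the real biowait() would sleep.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical model of the completion-related struct buf fields. */
struct drv_buf {
	size_t b_bcount;	/* bytes requested */
	size_t b_resid;		/* bytes remaining after the transfer */
	int b_error;		/* zero on success, errno code on failure */
	int done;		/* set by the modelled biodone() */
};

/* Stand-in for biodone(): mark the transfer as completed. */
static void
drv_biodone(struct drv_buf *bp)
{
	bp->done = 1;
}

/*
 * What a strategy method does when the transfer finishes, whether it
 * succeeded or failed: set b_error, set b_resid, then signal completion.
 */
static void
drv_finish(struct drv_buf *bp, size_t transferred, int error)
{
	assert(transferred <= bp->b_bcount);
	bp->b_error = error;
	bp->b_resid = bp->b_bcount - transferred;	/* b_bcount if nothing moved */
	drv_biodone(bp);
}

/* Stand-in for biowait(): report the error recorded at completion. */
static int
drv_biowait(const struct drv_buf *bp)
{
	assert(bp->done);	/* the real biowait() would sleep until done */
	return bp->b_error;
}
```

Note that b_resid is set on both paths; a failed transfer that moved no data leaves b_resid equal to b_bcount.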
The BUFFERIO abstraction provides no way to cancel an I/O transfer once it has been submitted to a block device.

The BUFFERIO abstraction provides no way to do I/O transfers with non-kernel pages, e.g. directly to buffers in userland without copying into the kernel first.
The struct buf type is all mixed up with the buffercache(9).
The BUFFERIO abstraction is a totally idiotic API design.

The v_numoutput accounting required of BUFFERIO callers is asinine.
March 29, 2015 | NetBSD 9.2 |