dm-log-writes
=============

This target takes two devices: one that all IO is passed to normally, and one
that all of the write operations are logged to. It is intended for file system
developers who want to verify the integrity of metadata or data as the file
system is written to. A log_write_entry is written for every WRITE request, and
the target can take arbitrary data from userspace to insert into the log. The
data in the WRITE requests is copied into the log so that the replay happens
exactly as it happened originally.

Log Ordering
============

We log things in order of completion once we are sure the write is no longer in
cache. This means that normal WRITE requests are not actually logged until the
next REQ_FLUSH request. This makes it easier for userspace to replay the log in
a way that correlates to what is on disk and not what is in cache, which in turn
makes it easier to detect improper waiting/flushing.

This works by attaching all WRITE requests to a list once the write completes.
Once we see a REQ_FLUSH request we splice this list onto the request, and once
the FLUSH request completes we log all of the WRITEs and then the FLUSH. Only
WRITEs that have completed at the time the REQ_FLUSH is issued are added, in
order to simulate the worst case scenario with regard to power failures.
Consider the following example (W means write, C means complete):

    W1,W2,W3,C3,C2,Wflush,C1,Cflush

The log would show the following:

    W3,W2,flush,W1....

Again, this is to simulate what is actually on disk. It allows us to detect
cases where a power failure at a particular point in time would create an
inconsistent file system.

Any REQ_FUA requests bypass this flushing mechanism and are logged as soon as
they complete, as those requests will obviously bypass the device cache.

Any REQ_DISCARD requests are treated like WRITE requests.
Otherwise we would have all the DISCARD requests, and then the WRITE requests
and then the FLUSH request. Consider the following example:

    WRITE block 1, DISCARD block 1, FLUSH

If we logged DISCARD when it completed, the replay would look like this

    DISCARD 1, WRITE 1, FLUSH

which isn't quite what happened and wouldn't be caught during the log replay.

Target interface
================

i) Constructor

    log-writes <dev_path> <log_dev_path>

    dev_path     : Device that all of the IO will go to normally.
    log_dev_path : Device where the log entries are written to.

ii) Status

    <#logged entries> <highest allocated sector>

    #logged entries          : Number of logged entries
    highest allocated sector : Highest allocated sector

iii) Messages

    mark <description>

        You can use a dmsetup message to set an arbitrary mark in a log.
        For example say you want to fsck a file system after every
        write, but first you need to replay up to the mkfs to make sure
        we're fsck'ing something reasonable, you would do something like
        this:

          mkfs.btrfs -f /dev/mapper/log
          dmsetup message log 0 mark mkfs
          <run test>

        This would allow you to replay the log up to the mkfs mark and
        then replay from that point on doing the fsck check in the
        interval that you want.

        Every log has a mark at the end labeled "dm-log-writes-end".

Userspace component
===================

There is a userspace tool that will replay the log for you in various ways.
It can be found here: https://github.com/josefbacik/log-writes

Example usage
=============

Say you want to test fsync on your file system.
You would do something like this:

# /dev/sdb is the device under test, /dev/sdc receives the log
TABLE="0 $(blockdev --getsz /dev/sdb) log-writes /dev/sdb /dev/sdc"
dmsetup create log --table "$TABLE"
mkfs.btrfs -f /dev/mapper/log
dmsetup message log 0 mark mkfs

mount /dev/mapper/log /mnt/btrfs-test
<some test that does fsync at the end>
# mark the point in the log right after the fsync
dmsetup message log 0 mark fsync
md5sum /mnt/btrfs-test/foo
umount /mnt/btrfs-test

dmsetup remove log
# replay the log onto the original device up to the fsync mark
replay-log --log /dev/sdc --replay /dev/sdb --end-mark fsync
mount /dev/sdb /mnt/btrfs-test
md5sum /mnt/btrfs-test/foo
<verify md5sums are correct>

Another option is to do a complicated file system operation and verify the file
system is consistent during the entire operation. You could do this with:

TABLE="0 $(blockdev --getsz /dev/sdb) log-writes /dev/sdb /dev/sdc"
dmsetup create log --table "$TABLE"
mkfs.btrfs -f /dev/mapper/log
dmsetup message log 0 mark mkfs

mount /dev/mapper/log /mnt/btrfs-test
<fsstress to dirty the fs>
btrfs filesystem balance /mnt/btrfs-test
umount /mnt/btrfs-test
dmsetup remove log

# replay up to the mkfs mark and check that the starting point is sane
replay-log --log /dev/sdc --replay /dev/sdb --end-mark mkfs
btrfsck /dev/sdb
# replay the rest, running the fsck command at every FUA request
replay-log --log /dev/sdc --replay /dev/sdb --start-mark mkfs \
	--fsck "btrfsck /dev/sdb" --check fua

This will replay the log until it sees a FUA request, run the fsck command, and
if the fsck passes, replay to the next FUA, until the replay is complete or the
fsck command exits abnormally.
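
The ordering rule from the "Log Ordering" section can also be explored without
any block devices. The following is only a toy model written for illustration,
not code from the target or from replay-log. The function name simulate_log and
the event tokens (W<n>, C<n>, Wflush, Cflush) are hypothetical and simply mirror
the notation of the example in that section. It assumes at most one flush is in
flight at a time and does not model REQ_FUA requests or marks.

```shell
#!/usr/bin/env bash
# Toy model of the "Log Ordering" rule (illustrative sketch only).
# Event tokens (hypothetical notation):
#   W<n>    write n is issued
#   C<n>    write n completes
#   Wflush  a flush is issued
#   Cflush  the flush completes
# Assumption: at most one flush is in flight at a time.
simulate_log() {
    local completed=()   # writes that completed but are not yet logged
    local attached=()    # writes spliced onto the in-flight flush
    local log=()         # resulting order of log entries
    local ev
    for ev in "$@"; do
        case "$ev" in
            Wflush)
                # Splice every write completed so far onto the flush.
                attached=("${completed[@]}")
                completed=()
                ;;
            Cflush)
                # Flush completed: log the attached writes, then the flush.
                log+=("${attached[@]}" "flush")
                attached=()
                ;;
            C*)
                # Write completed: hold it until the next flush is issued.
                completed+=("W${ev#C}")
                ;;
            W*)
                # Issuing a write does not log anything by itself.
                ;;
        esac
    done
    # Join the log entries with commas.
    local IFS=,
    echo "${log[*]}"
}

# The example from "Log Ordering", followed by a second flush so that W1,
# which completed after the first flush was issued, is logged as well.
simulate_log W1 W2 W3 C3 C2 Wflush C1 Cflush Wflush Cflush
```

Run as written, this prints W3,W2,flush,W1,flush. Without the trailing
"Wflush Cflush" the model prints only W3,W2,flush, because W1 completed after
the first flush was issued and is therefore held back until the next one.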