From cecb7487fc8eea3508c3b67810ba5f0e2a265ba1 Mon Sep 17 00:00:00 2001
From: Richard Yao
Date: Thu, 27 Feb 2014 14:03:39 -0500
Subject: [PATCH] Invalidate Linux buffer cache on vdevs upon each flush

Userland tools such as blkid, grub2-probe and zdb will go through the
buffer cache. However, ZFS uses submit_bio() to bypass the buffer cache
when performing IO operations on vdevs for efficiency purposes. This
permits the on-disk state and buffer cache to fall out of
synchronization. That causes seemingly random failures when tools
reading stale metadata from the buffer cache try to access references
to data that is no longer there.

A particularly bad failure this causes involves grub2-probe, which is
used by grub2-mkconfig. Ordinarily, a rootfs might be called
rpool/ROOT/gentoo. However, when a failure occurs in grub2-probe,
grub2-mkconfig will generate a configuration file containing
/ROOT/gentoo, which omits the pool name and causes a boot failure.

This is avoidable by calling invalidate_bdev() on each flush, which is
a simple way to ensure that all non-dirty pages are wiped. Since
userland tools rarely access vdevs directly, this should be a fancy
noop >99.999% of the time and have little impact on IO.

We could have tried a finer grained approach for the rare instances in
which the vdevs are accessed frequently by userland. However, that
would require consideration of corner cases and it is not worth the
effort. Memory-wise, it would have been better to use a Linux kernel
API hook to disable the buffer cache on such devices, but the kernel
provides us no way of doing that, so we opt for this approach instead.
We should revisit that idea in the future when higher priority issues
have been tackled.

Signed-off-by: Richard Yao
Signed-off-by: Brian Behlendorf
Closes #2150
---
 module/zfs/vdev_disk.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/module/zfs/vdev_disk.c b/module/zfs/vdev_disk.c
index 1d8bf3f8c..5bd38e983 100644
--- a/module/zfs/vdev_disk.c
+++ b/module/zfs/vdev_disk.c
@@ -643,6 +643,7 @@ vdev_disk_io_flush(struct block_device *bdev, zio_t *zio)
 	bio->bi_bdev = bdev;
 	zio->io_delay = jiffies_64;
 	submit_bio(VDEV_WRITE_FLUSH_FUA, bio);
+	invalidate_bdev(bdev);
 
 	return (0);
 }