mirror_zfs

mirror of https://git.proxmox.com/git/mirror_zfs.git synced 2024-11-17 18:11:00 +03:00

Author	SHA1	Message	Date
Brian Behlendorf	fdcd952b4d	Fix set block scheduler warnings There were two cases when attempting to set the vdev block device scheduler which would causes console warnings. The first case was when the vdev used a loop, ram, dm, or other such device which doesn't support a configurable scheduler. In these cases attempting to set a scheduler is pointless and can be safely skipped. The secord case is slightly more troubling. We were seeing transient cases where setting the elevator would return -EFAULT. On retry everything is fine so there appears to be a small window where this is possible. To handle that case we silently retry up to three times before reporting the warning. In all of the above cases the warning is harmless and at worse you may see slightly different performance characteristics from one or more of your vdevs.	2011-02-25 11:37:11 -08:00
Brian Behlendorf	6839eed23e	Use 'noop' IO Scheduler Initial testing has shown the the right IO scheduler to use under Linux is noop. This strikes the ideal balance by allowing the zfs elevator to do all request ordering and prioritization. While allowing the Linux elevator to do the maximum front/back merging allowed by the physical device. This yields the largest possible requests for the device with the lowest total overhead. While 'noop' should be right for your system you can choose a different IO scheduler with the 'zfs_vdev_scheduler' option. You may set this value to any of the standard Linux schedulers: noop, cfq, deadline, anticipatory. In addition, if you choose 'none' zfs will not attempt to change the IO scheduler for the block device.	2011-02-10 09:27:22 -08:00
Ned Bass	e06be58641	Fix for access beyond end of device error This commit fixes a sign extension bug affecting l2arc devices. Extremely large offsets may be passed down to the low level block device driver on reads, generating errors similar to attempt to access beyond end of device sdbi1: rw=14, want=36028797014862705, limit=125026959 The unwanted sign extension occurrs because the function arc_read_nolock() stores the offset as a daddr_t, a 32-bit signed int type in the Linux kernel. This offset is then passed to zio_read_phys() as a uint64_t argument, causing sign extension for values of 0x80000000 or greater. To avoid this, we store the offset in a uint64_t. This change also changes a few daddr_t struct members to uint64_t in the libspl headers to avoid similar bugs cropping up in the future. We also add an ASSERT to __vdev_disk_physio() to check for invalid offsets. Closes #66 Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2010-11-10 21:29:07 -08:00
Brian Behlendorf	675de5aa37	Linux 2.6.36 compat, synchronous bio flag The name of the flag used to mark a bio as synchronous has changed again in the 2.6.36 kernel due to the unification of the BIO_RW_* and REQ_* flags. The new flag is called REQ_SYNC. To simplify checking this flag I have introduced the vdev_disk_dio_is_sync() helper function. Based on the results of several new autoconf tests it uses the correct mask to check for a synchronous bio. Preferred interface for flagging a synchronous bio: 2.6.12-2.6.29: BIO_RW_SYNC 2.6.30-2.6.35: BIO_RW_SYNCIO 2.6.36-2.6.xx: REQ_SYNC	2010-11-10 17:00:33 -08:00
Brian Behlendorf	a69052be7f	Initial zio delay timing While there is no right maximum timeout for a disk IO we can start laying the ground work to measure how long they do take in practice. This change simply measures the IO time and if it exceeds 30s an event is posted for 'zpool events'. This value was carefully selected because for sd devices it implies that at least one timeout (SD_TIMEOUT) has occured. Unfortunately, even with FAILFAST set we may retry and request and not get an error. This behavior is strongly dependant on the device driver and how it is hooked in to the scsi error handling stack. However by setting the limit at 30s we can log the event even if no error was returned. Slightly longer term we can start recording these delays perhaps as a simple power-of-two histrogram. This histogram can then be reported as part of the 'zpool status' command when given an command line option. None of this code changes the internal behavior of ZFS. Currently it is simply for reporting excessively long delays.	2010-10-12 14:55:02 -07:00
Brian Behlendorf	2959d94a0a	Add FAILFAST support ZFS works best when it is notified as soon as possible when a device failure occurs. This allows it to immediately start any recovery actions which may be needed. In theory Linux supports a flag which can be set on bio's called FAILFAST which provides this quick notification by disabling the retry logic in the lower scsi layers. That's the theory at least. In practice is turns out that while the flag exists you oddly have to set it with the BIO_RW_AHEAD flag. And even when it's set it you may get retries in the low level drivers decides that's the right behavior, or if you don't get the right error codes reported to the scsi midlayer. Unfortunately, without additional kernels patchs there's not much which can be done to improve this. Basically, this just means that it may take 2-3 minutes before a ZFS is notified properly that a device has failed. This can be improved and I suspect I'll be submitting patches upstream to handle this.	2010-10-12 14:55:02 -07:00
Brian Behlendorf	d148e95156	Fix negative zio->io_error which must be positive. All the upper layers of zfs expect zio->io_error to be positive. I was careful but I missed one instance in vdev_disk_physio_completion() which could return a negative error. To ensure all cases are always caught I had additionally added an ASSERT() to check this before zio_interpret(). Finally, as a debugging aid when zfs is build with --enable-debug all errors from the backing block devices will be reported to the console with an error message like this: ZFS: zio error=5 type=1 offset=4217856 size=8192 flags=60440	2010-10-12 14:55:00 -07:00
Ned Bass	3a7381e531	Use stored whole_disk property when opening a vdev This commit fixes a bug in vdev_disk_open() in which the whole_disk property was getting set to 0 for disk devices, even when it was stored as a 1 when the zpool was created. The whole_disk property lets us detect when the partition suffix should be stripped from the device name in CLI output. It is also used to determine how writeback cache should be set for a device. When an existing zpool is imported its configuration is read from the vdev label by user space in zpool_read_label(). The whole_disk property is saved in the nvlist which gets passed into the kernel, where it in turn gets saved in the vdev struct in vdev_alloc(). Therefore, this value is available in vdev_disk_open() and should not be overridden by checking the provided device path, since that path will likely point to a partition and the check will return the wrong result. We also add an ASSERT that the whole_disk property is set. We are not aware of any cases where vdev_disk_open() should be called with a config that doesn't have this property set. The ASSERT is there so that when debugging is enabled we can identify any legitimate cases that we are missing. If we never hit the ASSERT, we can at some point remove it along with the conditional whole_disk check. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2010-10-04 13:53:18 -07:00
Brian Behlendorf	60101509ee	Add linux kernel disk support Native Linux vdev disk interfaces Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2010-08-31 13:41:57 -07:00

9 Commits