vdev_id parses the file /etc/zfs/vdev_id.conf to map a physical path
in a storage topology to a channel name. The channel name is combined
with a disk enclosure slot number to create an alias that reflects the
physical location of the drive. This is particularly helpful when it
comes to tasks like replacing failed drives. Slot numbers may also be
re-mapped in case the default numbering is unsatisfactory. The drive
aliases will be created as symbolic links in /dev/disk/by-vdev.
The only currently supported topologies are sas_direct and sas_switch:
o sas_direct - a channel is uniquely identified by a PCI slot and a
HBA port
o sas_switch - a channel is uniquely identified by a SAS switch port
A multipath mode is supported in which dm-mpath devices are handled by
examining the first running component disk, as reported by 'multipath
-l'. In multipath mode the configuration file should contain a
channel definition with the same name for each path to a given
enclosure.
vdev_id can replace the existing zpool_id script on systems where the
storage topology conforms to sas_direct or sas_switch. The script
could be extended to support other topologies as well. The advantage
of vdev_id is that it is driven by a single static input file that can
be shared across multiple nodes having a common storage toplogy.
zpool_id, on the other hand, requires a unique /etc/zfs/zdev.conf per
node and a separate slot-mapping file. However, zpool_id provides the
flexibility of using any device names that show up in
/dev/disk/by-path, so it may still be needed on some systems.
vdev_id's functionality subsumes that of the sas_switch_id script, and
it is unlikely that anyone is using it, so sas_switch_id is removed.
Finally, /dev/disk/by-vdev is added to the list of directories that
'zpool import' will scan.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#713
When creating a new pool, make_root_vdev() calls check_in_use() to
ensure that none of the consituent disks are in use. If the disk
contains a valid vdev label it is read to retrieve the list of its
child vdevs and these are checked recursively. However, the
partitions stored in the vdev label my no longer exist, for example
if the partition table has since been altered. In any such case we
would want the pool creation to proceed, so this change removes the
check from check_slice() that returns an error if the device doesn't
exist. As an added assurance, the Solaris implementation also
returns sucess on ENOENT.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Some disks with internal sectors larger than 512 bytes (e.g., 4k) can
suffer from bad write performance when ashift is not configured
correctly. This is caused by the disk not reporting its actual sector
size, but a sector size of 512 bytes. The drive may behave this way
for compatibility reasons. For example, the WDC WD20EARS disks are
known to exhibit this behavior.
When creating a zpool, ZFS takes that wrong sector size and sets the
"ashift" property accordingly (to 9: 1<<9=512), whereas it should be
set to 12 for 4k sectors (1<<12=4096).
This patch allows an adminstrator to manual specify the known correct
ashift size at 'zpool create' time. This can significantly improve
performance in certain cases. However, it will have an impact on your
total pool capacity. See the updated ashift property description
in the zpool.8 man page for additional details.
Valid values for the ashift property range from 9 to 17 (512B-128KB).
Additionally, you may set the ashift to 0 if you wish to auto-detect
the sector size based on what the disk reports, this is the default
behavior. The most common ashift values are 9 and 12.
Example:
zpool create -o ashift=12 tank raidz2 sda sdb sdc sdd
Closes#280
Original-patch-by: Richard Laager <rlaager@wiktel.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Creating whole-disk vdevs can intermittently fail if a udev-managed symlink to
the disk partition is already in place. To avoid this, we now remove any such
symlink before partitioning the disk. This makes zpool_label_disk_wait() truly
wait for the new link to show up instead of returning if it finds an old link
still in place. Otherwise there is a window between when udev deletes and
recreates the link during which access attempts will fail with ENOENT.
Also, clean up a comment about waiting for udev to create symlinks. It no
longer needs to describe the special cases for the link names, since that is
now handled in a separate helper function.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Portability between Solaris and Linux isn't really an issue for us anymore, and
removing sections like this one helps simplify the code.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
This change adds two helper functions for working with vdev names and paths.
zfs_resolve_shortname() resolves a shorthand vdev name to an absolute path
of a file in /dev, /dev/disk/by-id, /dev/disk/by-label, /dev/disk/by-path,
/dev/disk/by-uuid, /dev/disk/zpool. This was previously done only in the
function is_shorthand_path(), but we need a general helper function to
implement shorthand names for additional zpool subcommands like remove.
is_shorthand_path() is accordingly updated to call the helper function.
There is a minor change in the way zfs_resolve_shortname() tests if a file
exists. is_shorthand_path() effectively used open() and stat64() to test for
file existence, since its scope includes testing if a device is a whole disk
and collecting file status information. zfs_resolve_shortname(), on the other
hand, only uses access() to test for existence and leaves it to the caller to
perform any additional file operations. This seemed like the most general and
lightweight approach, and still preserves the semantics of is_shorthand_path().
zfs_append_partition() appends a partition suffix to a device path. This
should be used to generate the name of a whole disk as it is stored in the vdev
label. The user-visible names of whole disks do not contain the partition
information, while the name in the vdev label does. The code was lifted from
the function make_disks(), which now just calls the helper function. Again,
having a helper function to do this supports general handling of shorthand
names in the user interface.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
The string array 'char dirs[5][8]' was too small to accomodate the terminating
NUL character in "by-label". This change adds the needed additional byte.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
This topic branch contains all the changes needed to integrate the user
side zfs tools with Linux style devices. Primarily this includes fixing
up the Solaris libefi library to be Linux friendly, and integrating with
the libblkid library which is provided by e2fsprogs.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>