mirror_zfs

mirror of https://git.proxmox.com/git/mirror_zfs.git synced 2024-11-18 02:20:59 +03:00

Author	SHA1	Message	Date
Don Brady	39fc0cb557	Add support for devid and phys_path keys in vdev disk labels This is foundational work for ZED. Updates a leaf vdev's persistent device strings on the Linux platform * only applies for a dedicated leaf vdev (aka whole disk) * updated during pool create|add|attach|import * used for device matching during auto-{online,expand,replace} * stored in a leaf disk config label (i.e. alongside 'path' NVP) * can opt-out using env var ZFS_VDEV_DEVID_OPT_OUT=YES Some examples: path: '/dev/sdb1' devid: 'scsi-350000394a8ca4fbc-part1' phys_path: 'pci-0000:04:00.0-sas-0x50000394a8ca4fbf-lun-0' path: '/dev/mapper/mpatha' devid: 'dm-uuid-mpath-35000c5006304de3f' Signed-off-by: Don Brady <don.brady@intel.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #2856 Closes #3978 Closes #4416	2016-03-31 13:45:53 -07:00
Brian Behlendorf	a9977b37ca	Relax MBR partition scanning requirement When checking a whole disk to see if it can be safely added to the pool a variety of checks are done. One of those checks is to attempt to determine the partition information and scan all the partitions for existing filesystems. Since ZoL contains an EFI library this partition scanning is easy to do for GPT partitioned disks. However, for non-GPT partitioned disks (MBR/EBR) things are a bit harder. The lack of a convenient library means non-GPT partitioned disks will not have all their partitions checked. For this reason, the default behavior was to require the force option. For example: invalid vdev specification use '-f' to override the following errors: /dev/vdb does not contain an GPT label but it may contain partition information in the MBR. However in practice requiring the force option for this case is counter-intuitively less safe. The reason is that only the first error is returned. Passing the force option suppresses this first warning and potentially others you were not aware of. Therefore this patch inverts the default behavior for non-GPT formatted disks (unformatted, MBR/EBR, etc). If no GPT table is detected and no file system is detected on the provided block device, then it will be assumed that the block device is safe to use. Longer term it would be nice to see MBR/EBR scanning added to the utilities. This should be fairly straightforward to do. However these days it's somewhat less critical because Linux defaults to GPT partition tables for devices 2TB or larger. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #2660 Closes #2274	2016-03-10 14:04:58 -08:00
Brian Behlendorf	7d11e37e55	Require libblkid Historically libblkid support was detected as part of configure and optionally enabled. This was done because at the time support for detecting ZFS pool vdevs had just been added to libblkid and those updated packages were not yet part of many distributions. This is no longer the case and any reasonably current distribution will ship a version of libblkid which can detect ZFS pool vdevs. This patch makes libblkid mandatory at build time and libblkid the preferred method of scanning for ZFS pools. For distributions which include a modern version of libblkid there is no change in behavior. Explicitly scanning the default search paths is still supported and can be enabled with the '-s' command line option. Additionally making libblkid mandatory means that the 'zpool create' command can reliably detect if a specified device has an existing non-ZFS filesystem (ext4, xfs) and print a warning. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #2448	2016-03-09 10:39:22 -08:00
Basil Crow	de0a9d7630	Illumos 5118 - When verifying or creating a storage pool, error messages only show one device 5118 When verifying or creating a storage pool, error messages only show one device Reviewed by: Adam Leventhal <adam.leventhal@delphix.com> Reviewed by: Dan Kimmel <dan.kimmel@delphix.com> Reviewed by: Matt Ahrens <mahrens@delphix.com> Reviewed by: Boris Protopopov <boris.protopopov@me.com> Approved by: Dan McDonald <danmcd@omniti.com> References: https://github.com/illumos/illumos-gate/commit/75fbdf9 https://www.illumos.org/issues/5118 Ported-by: kernelOfTruth kerneloftruth@gmail.com Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #3567	2015-07-10 12:07:13 -07:00
Isaac Huang	c5656c4cfc	Memory leak in make_root_vdev() The newroot nvlist should be freed before returning. Signed-off-by: Isaac Huang <he.huang@intel.com> Signed-off-by: Chunwei Chen <tuxoko@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #3264	2015-04-27 09:18:02 -07:00
Brian Behlendorf	7d90f569b3	Check all vdev labels in 'zpool import' When using 'zpool import' to scan for available pools prefer vdev names which reference vdevs with more valid labels. There should be two labels at the start of the device and two labels at the end of the device. If labels are missing then the device has been damaged or is in some other way incomplete. Preferring names with fully intact labels helps weed out bad paths and improves the likelihood of being able to import the pool. This behavior only applies when scanning /dev/ for valid pools. If a cache file exists the pools described by the cache file will be used. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Chris Dunlap <cdunlap@llnl.gov> Closes #3145 Closes #2844 Closes #3107	2015-03-25 14:52:52 -07:00
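The label preference described in the commit above can be illustrated with a minimal sketch. It assumes the caller has already counted the valid labels (0 to 4) found through each candidate path; the function name and the input mapping are hypothetical and are not taken from the ZFS source.

```python
def pick_preferred_path(label_counts):
    """Return the candidate path whose device showed the most valid labels.

    label_counts maps a device path to the number of valid vdev labels
    (0..4) read through that path. Ties keep the first path listed.
    Hypothetical helper for illustration only.
    """
    if not label_counts:
        raise ValueError("no candidate paths given")
    return max(label_counts, key=label_counts.get)


candidates = {
    "/dev/sdb": 2,    # only the two leading labels were readable
    "/dev/sdb1": 4,   # all four labels intact
}
print(pick_preferred_path(candidates))
```

A path that exposes all four labels is preferred over one that exposes only some of them, which is the "weed out bad paths" behavior the message describes.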
Richard Yao	2fe5011008	Drive database update The Intel DC S3500 and Intel DC S3700 are optimized to handle 4KB sectors well despite their 8KB page sizes, so we move them to a new category for enterprise drives where they will receive ashift=12. They are joined by the Intel 730 series, which uses the same disk controller, as well as a San Disk enterprise drive. The drive IDs for these two were obtained by myself with the drive_id utility. The drive ID for the 240GB Intel 730 model was extrapolated from the drive ID for the 480GB model. Lastly, we also add some Western Digital mobile drives. ryuo in #zfsonlinux on freenode obtained "ATA WDC WD2500BEVT-0" from running drive_id on his own hardware. The additional drives in that family were extrapolated from that identifier. Signed-off-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #2601	2014-08-18 10:09:03 -07:00
Matthew Ahrens	9bd274ddd8	Illumos #4374 4374 dn_free_ranges should use range_tree_t Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Max Grossman <max.grossman@delphix.com> Reviewed by: Christopher Siden <christopher.siden@delphix.com> Reviewed by: Garrett D'Amore <garrett@damore.org> Reviewed by: Dan McDonald <danmcd@omniti.com> Approved by: Dan McDonald <danmcd@omniti.com> References: https://www.illumos.org/issues/4374 https://github.com/illumos/illumos-gate/commit/bf16b11 Ported by: Tim Chase <tim@chase2k.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #2531	2014-07-30 09:20:35 -07:00
John Albietz	5f3c101b8f	Added INTEL SSD 530 Series INTEL SSD 530 Series... SSDSC2BW24 Signed-off-by: John Albietz <inthecloud247@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #2184	2014-05-19 16:57:14 -07:00
Richard Yao	c6e924fea8	Fix libblkid ZFS detection when making new pools zfsonlinux/zfs@1db7b9be75 should have fixed this, but this particular string was overlooked. Signed-off-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #2288	2014-05-01 13:26:33 -07:00
Michael Kjorling	d1d7e2689d	cstyle: Resolve C style issues The vast majority of these changes are in Linux specific code. They are the result of not having an automated style checker to validate the code when it was originally written. Others were caused when the common code was slightly adjusted for Linux. This patch contains no functional changes. It only refreshes the code to conform to the style guide. Everyone submitting patches for inclusion upstream should now run 'make checkstyle' and resolve any warnings prior to opening a pull request. The automated builders have been updated to fail a build when 'make checkstyle' detects an issue. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1821	2013-12-18 16:46:35 -08:00
Richard Yao	c8c8d1e7e5	Drive database update Added: Adata S396 (obtained from drive_id) Apple MacBookAir3,1 SSD (obtained from drive_id) Apple MacBookPro10,1 SSD (obtained from drive_id) Intel 510 (obtained from drive_id) Intel 710 (obtained from drive_id) Intel DC S3500 (obtained from drive_id) Netapp LUN (obtained from illumos user's sd.conf) OCZ Agility 3 (obtained from drive_id) OCZ Vertex (obtained from drive_id) Samsung PM800 (obtained from drive_id) Sandisk U100 (obtained from drive_id) Sun Comstar (obtained from illumos user's sd.conf) Notes: 1. The entries for the Intel DC S3500 were extrapolated from the 800GB model's entry, which is "ATA INTEL SSDSC2BB80". 2. The entries for the Intel 710 were extrapolated from the 120GB model's entry, which is "ATA INTEL SSDSA2BZ12". 3. The entries for the Intel 510 were extrapolated from the 250GB model's entry, which is "ATA INTEL SSDSC2MH25". 4. The entries for the Apple MacBookPro10,1 SSD were extrapolated from the 512GB model's entry, which is "ATA APPLE SSD SM512E". Google searches suggest that this is a rebadged Samsung 830. 5. The entries for the Apple MacBookAir3,1 SSD were extrapolated from the 128GB model's entry, which is "ATA APPLE SSD TS128C". Google searches suggest that this is a rebadged Kingston SSDNow V+ 100 (based on Toshiba). 6. Sun Comstar is an iSCSI Target, so we cannot tell what the correct sector size is through this method. We list it only for reference purposes, but it is commented out. Similarly, it is not clear what the right thing to do for Netapp is, so we comment it out. Signed-off-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1907	2013-12-02 14:07:26 -08:00
Brian Behlendorf	11cb9d773f	Increase default udev wait time When creating a new pool, or adding/replacing a disk in an existing pool, partition tables will be automatically created on the devices. Under normal circumstances it will take less than a second for udev to create the expected device files under /dev/. However, it has been observed that if the system is doing heavy IO concurrently udev may take far longer. If you also throw in some cheap dodgy hardware it may take even longer. To prevent zpool commands from failing due to this the default wait time for udev is being increased to 30 seconds. This will have no impact on normal usage; the increased timeout should only be noticed if your udev rules are incorrectly configured. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1646	2013-10-22 10:25:51 -07:00
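The waiting behavior described above amounts to polling for a device node until a deadline passes. The following is only a sketch of that pattern: the function name, the polling interval, and the use of `os.path.exists` are assumptions for illustration, and only the 30 second default comes from the commit message.

```python
import os
import time


def wait_for_path(path, timeout_s=30.0, interval_s=0.01):
    """Poll until `path` exists or `timeout_s` seconds have elapsed.

    Returns True if the path appeared in time, False otherwise.
    Hypothetical helper; the real zpool code is written in C.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if os.path.exists(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

On a healthy system the loop returns almost immediately, so a long timeout only costs time when the device node never shows up.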
Richard Yao	3549721c9e	Update drive database Add Corsair Force GS drive (obtained from drive_id) Add Kingston HyperX 3K (obtained from drive_id) Add OCZ Vertex 4 drive (obtained from drive_id) Add Samsung SM843T enterprise drive (obtained from drive_id) Add entries for additional sizes of Intel 320/330/335/520 series Add Crucial C400 (obtained from Illumos user's sd.conf) Add Toshiba SSD (obtained from Illumos user's sd.conf) Add Samsung's first SLC SSD (obtained from drive_id) Add OCZ Core Series (obtained from drive_id) Add Intel DC S3700 (obtained from drive_id) Notes: 1. The drive identifier obtained for the Samsung SM843T was MZ7WD480. The rest were extrapolated. The additional entries were checked with Google to verify that such drives exist in the wild. 2. The additional entries for Intel drives were extrapolated from existing entries. The additional entries were checked with Google to verify that such drives exist in the wild. 3. The "ATA C400-MTFDDAC512M" and "ATA TOSHIBA THNSNH51" entries are from the sd.conf of gcbirzan on freenode. Additional entries were extrapolated from them and checked with Google. 4. I obtained the Samsung MCCOE64G entry from an actual drive. The Samsung MCCOE32G entry was extrapolated from it and checked with Google. 5. I obtained the SSDSC2BA10 from a 100GB Intel DC S3700 drive and extrapolated the entries for the additional models. Signed-off-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1752	2013-10-09 09:16:23 -07:00
Richard Yao	bff32e0972	Implement database to workaround misreported physical sector sizes This implements vdev_bdev_database_check(). It alters the detected sector size of any device listed in a database of drives known to lie about their physical sector sizes. This is based on "6931570 Add flash devices' VID/PID to disk table to advertising 4K physical sector size" from Open Solaris and on sg_simple4.c from sg3_utils. About two dozen lines are taken from sg_simple4.c, which is GPLv2 licensed. However, sg_simple4.c is analogous to a Hello World program and is safe for us to use. We requested that Douglas Gilbert, the author of sg_simple4.c, confirm that this is the case. A cut-down version of his response is as follows: ``` I would consider a SCSI INQUIRY example using the Linux sg driver interface (also written by me) as the equivalent of an "hello world" program in C. ``` The database was created with the help of the freenode and ZFSOnLinux communities. Some notes: 1. The following drives were both confirmed to lie via reports in IRC and they contain capacity information in their identifiers: INTEL SSDSA2M080 INTEL SSDSA2M160 M4-CT256M4SSD2 WDC WD15EARS-00S WDC WD15EARS-00Z WDC WD20EARS-00M The identifiers for different capacity models were extrapolated and added under the assumption that those models also lie. Google was used to verify that the extrapolated drive identifiers existed prior to their inclusion. 2. The OCZ-VERTEX2 3.5 identifier applies to two drives that differ solely in page size (and slightly in capacity). One uses 4096-byte pages and the other uses 8192-byte pages. Both are set to use 8192-byte pages. We could detect the page size by checking the capacity, but that would unnecessarily complicate the code. 3. It is possible for updated drive firmware to correctly report the sector size. There were reports of a few advanced format drives doing that. One report stated that the vendor changed the identification string while another was unclear on this. Both reports involved WDC models. 4. Google was used to determine the size of pages in the listed flash devices. Reports of 8192-byte pages took precedence over reports of 4096-byte pages. 5. Devices behind USB adapters can have their identification strings altered. Identification strings obtained across USB adapters are omitted and no attempt is made to correct for alterations made by USB adapters when doing comparisons against the database. Two entries in the Open Solaris database that appear to have been altered by a USB adapter were omitted. Signed-off-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1652	2013-08-22 11:24:06 -07:00
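The idea behind the sector size database can be sketched as a lookup table keyed by the drive's identification string. Everything in this sketch is illustrative: the table name, the function name, and the exact-match comparison are assumptions, and the 4096 value for the WD20EARS entry is inferred from the "ashift" commit further down this log rather than copied from the real database. It is not the logic of vdev_bdev_database_check() itself.

```python
# Hypothetical excerpt of a "drives that misreport sector size" table.
SECTOR_SIZE_OVERRIDES = {
    "WDC WD20EARS-00M": 4096,   # assumed value: advanced format drive reporting 512
    "OCZ-VERTEX2 3.5": 8192,    # per the commit: both variants are set to 8192-byte pages
}


def effective_sector_size(identifier, reported_size, overrides=SECTOR_SIZE_OVERRIDES):
    """Return the overridden sector size for a known drive, else the reported one.

    Assumes an exact match on the whitespace-trimmed identifier string.
    """
    return overrides.get(identifier.strip(), reported_size)


print(effective_sector_size("WDC WD20EARS-00M", 512))
print(effective_sector_size("SOME UNKNOWN DRIVE", 512))
```

Drives that are not listed keep whatever sector size they report, so the table can only correct known offenders.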
Brian Behlendorf	8128bd89fb	Fix hot spares The issue with hot spares in ZoL is that it opens all leaf vdevs exclusively (O_EXCL). On Linux, exclusive opens cause subsequent exclusive opens to fail with EBUSY. This could be resolved by not opening any of the devices exclusively, which is what Illumos does, but the additional protection offered by exclusive opens is desirable. It cleanly prevents you from accidentally adding an in-use non-ZFS device to your pool. To fix this we very slightly relaxed the usage of O_EXCL in the following ways. 1) Functions which open the device but only read had the O_EXCL flag removed and were updated to use O_RDONLY. 2) A common holder was added to the vdev disk code. This allows the ZFS code to internally open the device multiple times but non-ZFS callers may not. 3) An exception was added to make_disks() for hot spares when creating partition tables. For hot spare devices which are already opened exclusively we skip creating the partition table because this must already have been done when the disk was originally added as a hot spare. Additional minor changes include fixing check_in_use() to use a partition instead of a slice suffix. Also, is_spare() was moved above make_disks() to avoid adding a forward reference. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #250	2013-03-01 13:31:02 -08:00
Brian Behlendorf	eac4720465	Allow 'zpool replace' to use short device names The 'zpool replace' command would fail when given a short name because unlike on other platforms the short name cannot be deterministically expanded to a single path. Multiple path prefixes must be checked and in addition the partition suffix for whole disks is determined by the prefix. To handle this complexity a zfs_strcmp_pathname() function was added which takes either a short or fully qualified device name. Short names will be expanded using the prefixes in the default import search path, or the ZPOOL_IMPORT_PATH environment variable if it's defined. All possible expansions are then compared against the comparison path. Care is taken to strip redundant slashes to ensure legitimate matches are not missed. In the context of this work the existing zfs_resolve_shortname() function was extended to consider the ZPOOL_IMPORT_PATH when set. The zfs_append_partition() interface was also simplified to take only a single buffer. The vast majority of these changes rework existing Linux specific code which was originally written to accommodate udev. However, there is some minimal cleanup which removes Illumos specific code. This was done to improve readability but the basic flow and intent of the upstream code was maintained. These changes are the logical conclusion of the previous work to adjust the 'zpool import' search behavior, see commit 44867b6a. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #544 Closes #976	2012-10-22 08:45:58 -07:00
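The comparison described in the commit above (expand a short name with each search prefix, collapse redundant slashes, then compare) could look roughly like the sketch below. It is not the zfs_strcmp_pathname() implementation: the function names and the default prefix list are hypothetical, the colon-separated format of ZPOOL_IMPORT_PATH is an assumption, and the partition-suffix handling mentioned in the message is left out.

```python
import os
import re

# Hypothetical subset of the default import search path.
DEFAULT_PREFIXES = ["/dev", "/dev/disk/by-id"]


def normalize(path):
    """Collapse runs of '/' so that '/dev//sdb' compares equal to '/dev/sdb'."""
    return re.sub(r"/+", "/", path)


def path_matches(name, cmp_path, prefixes=None):
    """Return True if `name` (short or fully qualified) refers to `cmp_path`."""
    if prefixes is None:
        env = os.environ.get("ZPOOL_IMPORT_PATH")  # assumed colon-separated
        prefixes = env.split(":") if env else DEFAULT_PREFIXES
    target = normalize(cmp_path)
    if name.startswith("/"):
        # Fully qualified name: compare directly after normalization.
        return normalize(name) == target
    # Short name: try every prefix expansion.
    return any(normalize(prefix + "/" + name) == target for prefix in prefixes)
```

For example, `path_matches("sdb", "/dev//sdb", prefixes=["/dev/"])` is true because both sides normalize to `/dev/sdb`.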
Ned A. Bass	821b683436	Add vdev_id for JBOD-friendly udev aliases vdev_id parses the file /etc/zfs/vdev_id.conf to map a physical path in a storage topology to a channel name. The channel name is combined with a disk enclosure slot number to create an alias that reflects the physical location of the drive. This is particularly helpful when it comes to tasks like replacing failed drives. Slot numbers may also be re-mapped in case the default numbering is unsatisfactory. The drive aliases will be created as symbolic links in /dev/disk/by-vdev. The only currently supported topologies are sas_direct and sas_switch: o sas_direct - a channel is uniquely identified by a PCI slot and a HBA port o sas_switch - a channel is uniquely identified by a SAS switch port A multipath mode is supported in which dm-mpath devices are handled by examining the first running component disk, as reported by 'multipath -l'. In multipath mode the configuration file should contain a channel definition with the same name for each path to a given enclosure. vdev_id can replace the existing zpool_id script on systems where the storage topology conforms to sas_direct or sas_switch. The script could be extended to support other topologies as well. The advantage of vdev_id is that it is driven by a single static input file that can be shared across multiple nodes having a common storage topology. zpool_id, on the other hand, requires a unique /etc/zfs/zdev.conf per node and a separate slot-mapping file. However, zpool_id provides the flexibility of using any device names that show up in /dev/disk/by-path, so it may still be needed on some systems. vdev_id's functionality subsumes that of the sas_switch_id script, and it is unlikely that anyone is using it, so sas_switch_id is removed. Finally, /dev/disk/by-vdev is added to the list of directories that 'zpool import' will scan. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #713	2012-06-01 08:55:14 -07:00
Ned Bass	3a4f6caf08	Return success from check_slice() if device doesn't exist When creating a new pool, make_root_vdev() calls check_in_use() to ensure that none of the constituent disks are in use. If the disk contains a valid vdev label it is read to retrieve the list of its child vdevs and these are checked recursively. However, the partitions stored in the vdev label may no longer exist, for example if the partition table has since been altered. In any such case we would want the pool creation to proceed, so this change removes the check from check_slice() that returns an error if the device doesn't exist. As an added assurance, the Solaris implementation also returns success on ENOENT. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2012-02-27 08:52:38 -08:00
Christian Kohlschütter	df30f56639	Add "ashift" property to zpool create Some disks with internal sectors larger than 512 bytes (e.g., 4k) can suffer from bad write performance when ashift is not configured correctly. This is caused by the disk not reporting its actual sector size, but a sector size of 512 bytes. The drive may behave this way for compatibility reasons. For example, the WDC WD20EARS disks are known to exhibit this behavior. When creating a zpool, ZFS takes that wrong sector size and sets the "ashift" property accordingly (to 9: 1<<9=512), whereas it should be set to 12 for 4k sectors (1<<12=4096). This patch allows an administrator to manually specify the known correct ashift size at 'zpool create' time. This can significantly improve performance in certain cases. However, it will have an impact on your total pool capacity. See the updated ashift property description in the zpool.8 man page for additional details. Valid values for the ashift property range from 9 to 17 (512B-128KB). Additionally, you may set the ashift to 0 if you wish to auto-detect the sector size based on what the disk reports; this is the default behavior. The most common ashift values are 9 and 12. Example: zpool create -o ashift=12 tank raidz2 sda sdb sdc sdd Closes #280 Original-patch-by: Richard Laager <rlaager@wiktel.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2011-06-17 16:35:49 -07:00
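The relationship between sector size and ashift in the commit above is a base-2 logarithm: a sector of 1<<ashift bytes. As a small worked sketch (the function name is hypothetical; the 9 to 17 range is the one stated in the message):

```python
def ashift_for_sector_size(sector_size):
    """Return the ashift (log2 of the sector size in bytes).

    Raises ValueError unless the size is a power of two whose ashift
    falls in the 9..17 range given in the commit message.
    """
    if sector_size <= 0 or sector_size & (sector_size - 1):
        raise ValueError("sector size must be a positive power of two")
    ashift = sector_size.bit_length() - 1
    if not 9 <= ashift <= 17:
        raise ValueError("ashift must be between 9 and 17")
    return ashift


for size in (512, 4096, 131072):
    print(size, ashift_for_sector_size(size))
```

A drive with 4096-byte sectors that reports 512 bytes would therefore be auto-detected as ashift 9 instead of 12, which is the mismatch the `-o ashift=12` option lets an administrator correct.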
Ned Bass	d877ac6bfe	Fix intermittent 'zpool add' failures Creating whole-disk vdevs can intermittently fail if a udev-managed symlink to the disk partition is already in place. To avoid this, we now remove any such symlink before partitioning the disk. This makes zpool_label_disk_wait() truly wait for the new link to show up instead of returning if it finds an old link still in place. Otherwise there is a window between when udev deletes and recreates the link during which access attempts will fail with ENOENT. Also, clean up a comment about waiting for udev to create symlinks. It no longer needs to describe the special cases for the link names, since that is now handled in a separate helper function. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2010-10-22 12:38:58 -07:00
Ned Bass	4682b8c14e	Remove solaris-specific code from make_leaf_vdev() Portability between Solaris and Linux isn't really an issue for us anymore, and removing sections like this one helps simplify the code. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2010-10-22 12:25:58 -07:00
Ned Bass	79e7242a91	Add helper functions for manipulating device names This change adds two helper functions for working with vdev names and paths. zfs_resolve_shortname() resolves a shorthand vdev name to an absolute path of a file in /dev, /dev/disk/by-id, /dev/disk/by-label, /dev/disk/by-path, /dev/disk/by-uuid, /dev/disk/zpool. This was previously done only in the function is_shorthand_path(), but we need a general helper function to implement shorthand names for additional zpool subcommands like remove. is_shorthand_path() is accordingly updated to call the helper function. There is a minor change in the way zfs_resolve_shortname() tests if a file exists. is_shorthand_path() effectively used open() and stat64() to test for file existence, since its scope includes testing if a device is a whole disk and collecting file status information. zfs_resolve_shortname(), on the other hand, only uses access() to test for existence and leaves it to the caller to perform any additional file operations. This seemed like the most general and lightweight approach, and still preserves the semantics of is_shorthand_path(). zfs_append_partition() appends a partition suffix to a device path. This should be used to generate the name of a whole disk as it is stored in the vdev label. The user-visible names of whole disks do not contain the partition information, while the name in the vdev label does. The code was lifted from the function make_disks(), which now just calls the helper function. Again, having a helper function to do this supports general handling of shorthand names in the user interface. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2010-10-22 12:25:30 -07:00
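The lookup that zfs_resolve_shortname() is described as performing (try each search directory in turn and test only for existence) can be sketched as follows. This is an illustration, not the C implementation: the Python function name is hypothetical, and `os.access(..., os.F_OK)` stands in for the access() call mentioned in the message. The directory list is the one given in the commit.

```python
import os

# Search directories as listed in the commit message.
SEARCH_DIRS = [
    "/dev",
    "/dev/disk/by-id",
    "/dev/disk/by-label",
    "/dev/disk/by-path",
    "/dev/disk/by-uuid",
    "/dev/disk/zpool",
]


def resolve_shortname(name, search_dirs=SEARCH_DIRS):
    """Return the first existing '<dir>/<name>' path, or None if none exists.

    Only existence is tested; opening or stat-ing the device is left to
    the caller, mirroring the division of labor the commit describes.
    """
    for directory in search_dirs:
        candidate = os.path.join(directory, name)
        if os.access(candidate, os.F_OK):
            return candidate
    return None
```

Keeping the helper to a pure existence check is what lets other subcommands reuse it without inheriting the whole-disk and file-status logic of is_shorthand_path().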
Ned Bass	5c1bad0013	Fix undersized buffer in is_shorthand_path() The string array 'char dirs[5][8]' was too small to accommodate the terminating NUL character in "by-label". This change adds the needed additional byte. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2010-10-12 14:47:39 -07:00
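The arithmetic behind this fix is that a C string needs one byte per character plus one for the terminating NUL, so "by-label" (8 characters) needs 9 bytes. A quick check, assuming the five directory names are the /dev/disk subdirectories listed in the helper-functions commit above (the actual array contents are not shown in this log):

```python
def c_string_size(s):
    """Bytes needed to store `s` as a NUL-terminated C string."""
    return len(s.encode("ascii")) + 1


# Assumed contents of the dirs array; not confirmed by the commit message.
names = ["by-id", "by-label", "by-path", "by-uuid", "zpool"]

required = max(c_string_size(n) for n in names)
print(required)
```

With these names the longest entry requires 9 bytes, one more than the 8 the original declaration allowed.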
Brian Behlendorf	d603ed6c27	Add linux user disk support This topic branch contains all the changes needed to integrate the user side zfs tools with Linux style devices. Primarily this includes fixing up the Solaris libefi library to be Linux friendly, and integrating with the libblkid library which is provided by e2fsprogs. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2010-08-31 13:42:00 -07:00
Brian Behlendorf	d4ed667343	Fix gcc uninitialized variable warnings Gcc -Wall warn: 'uninitialized variable' Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2010-08-31 08:38:43 -07:00
Brian Behlendorf	428870ff73	Update core ZFS code from build 121 to build 141.	2010-05-28 13:45:14 -07:00
Brian Behlendorf	45d1cae3b8	Rebase master to b121	2009-08-18 11:43:27 -07:00
Brian Behlendorf	172bb4bd5e	Move the world out of /zfs/ and separate out module build tree	2008-12-11 11:08:09 -08:00

29 Commits