mirror_zfs

mirror of https://git.proxmox.com/git/mirror_zfs.git synced 2026-04-17 08:54:52 +03:00

Author	SHA1	Message	Date
Richard Yao	2fe5011008	Drive database update The Intel DC S3500 and Intel DC S3700 are optimized to handle 4KB sectors well despite of their 8KB page sizes, so we move them to a new category for enterprise drives where they will receive ashift=12. They are joined by the Intel 730 series, which uses the same disk controller, as well as a San Disk enterprise drive. The drive IDs for these two were obtained by myself with the drive_id utility. The drive ID for the 240GB Intel 730 model was extrapolated from the drive ID for the 480GB model. Lastly, we also add some Western Digital mobile drives. ryuo in \#zfsonlinux on freenode obtained "ATA WDC WD2500BEVT-0" from running drive_id on his own hardware. The additional drives in that family were extrapolated from that identifer. Signed-off-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #2601	2014-08-18 10:09:03 -07:00
George Wilson	f3a7f6610f	Illumos 4976-4984 - metaslab improvements 4976 zfs should only avoid writing to a failing non-redundant top-level vdev 4978 ztest fails in get_metaslab_refcount() 4979 extend free space histogram to device and pool 4980 metaslabs should have a fragmentation metric 4981 remove fragmented ops vector from block allocator 4982 space_map object should proactively upgrade when feature is enabled 4983 need to collect metaslab information via mdb 4984 device selection should use fragmentation metric Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Adam Leventhal <adam.leventhal@delphix.com> Reviewed by: Christopher Siden <christopher.siden@delphix.com> Approved by: Garrett D'Amore <garrett@damore.org> References: https://www.illumos.org/issues/4976 https://www.illumos.org/issues/4978 https://www.illumos.org/issues/4979 https://www.illumos.org/issues/4980 https://www.illumos.org/issues/4981 https://www.illumos.org/issues/4982 https://www.illumos.org/issues/4983 https://www.illumos.org/issues/4984 https://github.com/illumos/illumos-gate/commit/2e4c998 Notes: The "zdb -M" option has been re-tasked to display the new metaslab fragmentation metric and the new "zdb -I" option is used to control the maximum number of in-flight I/Os. The new fragmentation metric is derived from the space map histogram which has been rolled up to the vdev and pool level and is presented to the user via "zpool list". Add a number of module parameters related to the new metaslab weighting logic. Ported by: Tim Chase <tim@chase2k.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #2595	2014-08-18 08:40:49 -07:00
Turbo Fredriksson	f67d709080	Create an 'overlay' property Add a new 'overlay' property (default 'off') that controls whether the filesystem should be mounted even if the mountpoint is busy or if it should fail with a 'mountpoint not empty'. Doing overlay mounts is the default mount behavior on Linux, but not in ZFS. It have been decided that following the ZFS behavior should be the default, but this overlay allows for site administrator to override this decision on a per-dataset basis. Signed-off-by: Turbo Fredriksson <turbo@bayour.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes: #2503	2014-08-15 13:39:19 -07:00
Matthew Ahrens	5dbd68a352	Illumos 4914 - zfs on-disk bookmark structure should be named _phys_t 4914 zfs on-disk bookmark structure should be named _phys_t Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Christopher Siden <christopher.siden@delphix.com> Reviewed by: Richard Lowe <richlowe@richlowe.net> Reviewed by: Saso Kiselkov <skiselkov.ml@gmail.com> Approved by: Robert Mustacchi <rm@joyent.com> References: https://www.illumos.org/issues/4914 https://github.com/illumos/illumos-gate/commit/7802d7b Porting notes: There were a number of zfsonlinux-specific uses of zbookmark_t which needed to be updated. This should reduce the likelihood of further problems like issue #2094 from occurring. Ported by: Tim Chase <tim@chase2k.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #2558	2014-08-06 14:48:41 -07:00
Matthew Ahrens	9b67f60560	Illumos 4757, 4913 4757 ZFS embedded-data block pointers ("zero block compression") 4913 zfs release should not be subject to space checks Reviewed by: Adam Leventhal <ahl@delphix.com> Reviewed by: Max Grossman <max.grossman@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Christopher Siden <christopher.siden@delphix.com> Reviewed by: Dan McDonald <danmcd@omniti.com> Approved by: Dan McDonald <danmcd@omniti.com> References: https://www.illumos.org/issues/4757 https://www.illumos.org/issues/4913 https://github.com/illumos/illumos-gate/commit/5d7b4d4 Porting notes: For compatibility with the fastpath code the zio_done() function needed to be updated. Because embedded-data block pointers do not require DVAs to be allocated the associated vdevs will not be marked and therefore should not be unmarked. Ported by: Tim Chase <tim@chase2k.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #2544	2014-08-01 14:28:05 -07:00
Tim Chase	603cb25ca5	zed needs libzfs_core As of a recent group of Illumos/Delphix updates, zed needs libzfs_core in order to resolve lzc_get_bookmarks() and likely other functions going forward. Signed-off-by: Tim Chase <tim@chase2k.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #2534	2014-07-31 09:49:01 -07:00
Matthew Ahrens	9bd274ddd8	Illumos #4374 4374 dn_free_ranges should use range_tree_t Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Max Grossman <max.grossman@delphix.com> Reviewed by: Christopher Siden <christopher.siden@delphix.com Reviewed by: Garrett D'Amore <garrett@damore.org> Reviewed by: Dan McDonald <danmcd@omniti.com> Approved by: Dan McDonald <danmcd@omniti.com> References: https://www.illumos.org/issues/4374 https://github.com/illumos/illumos-gate/commit/bf16b11 Ported by: Tim Chase <tim@chase2k.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #2531	2014-07-30 09:20:35 -07:00
Matthew Ahrens	da536844d5	Illumos 4368, 4369. 4369 implement zfs bookmarks 4368 zfs send filesystems from readonly pools Reviewed by: Christopher Siden <christopher.siden@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Approved by: Garrett D'Amore <garrett@damore.org> References: https://www.illumos.org/issues/4369 https://www.illumos.org/issues/4368 https://github.com/illumos/illumos-gate/commit/78f1710 Ported by: Tim Chase <tim@chase2k.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #2530	2014-07-29 10:55:29 -07:00
Max Grossman	b0bc7a84d9	Illumos 4370, 4371 4370 avoid transmitting holes during zfs send 4371 DMU code clean up Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Christopher Siden <christopher.siden@delphix.com> Reviewed by: Josef 'Jeff' Sipek <jeffpc@josefsipek.net> Approved by: Garrett D'Amore <garrett@damore.org>a References: https://www.illumos.org/issues/4370 https://www.illumos.org/issues/4371 https://github.com/illumos/illumos-gate/commit/43466aa Ported by: Tim Chase <tim@chase2k.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #2529	2014-07-28 14:29:58 -07:00
Matthew Ahrens	fa86b5dbb6	Illumos 4171, 4172 4171 clean up spa_feature_*() interfaces 4172 implement extensible_dataset feature for use by other zpool features Reviewed by: Max Grossman <max.grossman@delphix.com> Reviewed by: Christopher Siden <christopher.siden@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Jerry Jelinek <jerry.jelinek@joyent.com> Approved by: Garrett D'Amore <garrett@damore.org>a References: https://www.illumos.org/issues/4171 https://www.illumos.org/issues/4172 https://github.com/illumos/illumos-gate/commit/2acef22 Ported-by: Tim Chase <tim@chase2k.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #2528	2014-07-25 16:40:07 -07:00
Turbo Fredriksson	79eb71dc6c	Support '-H' (scripted mode) to 'zpool get' This functionality is already available in 'zfs get'. Providing it for 'zpool get' is useful and good for consistency. Signed-off-by: Turbo Fredriksson <turbo@bayour.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes: #2522	2014-07-25 11:58:36 -07:00
George Wilson	080b310015	Illumos #4756 Fix metaslab_group_preload deadlock 4756 metaslab_group_preload() could deadlock Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Christopher Siden <christopher.siden@delphix.com> Reviewed by: Dan McDonald <danmcd@omniti.com> Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com> Approved by: Garrett D'Amore <garrett@damore.org> The metaslab_group_preload() function grabs the mg_lock and then later tries to grab the metaslab lock. This lock ordering may lead to a deadlock since other consumers of the mg_lock will grab the metaslab lock first. References: https://www.illumos.org/issues/4756 https://github.com/illumos/illumos-gate/commit/30beaff Ported-by: Prakash Surya <surya1@llnl.gov> Signed-off-by: Prakash Surya <surya1@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #2488	2014-07-22 09:41:32 -07:00
George Wilson	93cf20764a	Illumos #4101 , #4102 , #4103 , #4105 , #4106 4101 metaslab_debug should allow for fine-grained control 4102 space_maps should store more information about themselves 4103 space map object blocksize should be increased 4105 removing a mirrored log device results in a leaked object 4106 asynchronously load metaslab Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Adam Leventhal <ahl@delphix.com> Reviewed by: Sebastien Roy <seb@delphix.com> Approved by: Garrett D'Amore <garrett@damore.org> Prior to this patch, space_maps were preferred solely based on the amount of free space left in each. Unfortunately, this heuristic didn't contain any information about the make-up of that free space, which meant we could keep preferring and loading a highly fragmented space map that wouldn't actually have enough contiguous space to satisfy the allocation; then unloading that space_map and repeating the process. This change modifies the space_map's to store additional information about the contiguous space in the space_map, so that we can use this information to make a better decision about which space_map to load. This requires reallocating all space_map objects to increase their bonus buffer size sizes enough to fit the new metadata. The above feature can be enabled via a new feature flag introduced by this change: com.delphix:spacemap_histogram In addition to the above, this patch allows the space_map block size to be increase. Currently the block size is set to be 4K in size, which has certain implications including the following: * 4K sector devices will not see any compression benefit * large space_maps require more metadata on-disk * large space_maps require more time to load (typically random reads) Now the space_map block size can adjust as needed up to the maximum size set via the space_map_max_blksz variable. A bug was fixed which resulted in potentially leaking an object when removing a mirrored log device. The previous logic for vdev_remove() did not deal with removing top-level vdevs that are interior vdevs (i.e. mirror) correctly. The problem would occur when removing a mirrored log device, and result in the DTL space map object being leaked; because top-level vdevs don't have DTL space map objects associated with them. References: https://www.illumos.org/issues/4101 https://www.illumos.org/issues/4102 https://www.illumos.org/issues/4103 https://www.illumos.org/issues/4105 https://www.illumos.org/issues/4106 https://github.com/illumos/illumos-gate/commit/0713e23 Porting notes: A handful of kmem_alloc() calls were converted to kmem_zalloc(). Also, the KM_PUSHPAGE and TQ_PUSHPAGE flags were used as necessary. Ported-by: Tim Chase <tim@chase2k.com> Signed-off-by: Prakash Surya <surya1@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #2488	2014-07-22 09:39:16 -07:00
Richard Yao	a5778ea242	zdb: Introduce -V for verbatim import When given a pool name via -e, zdb would attempt an import. If it failed, then it would attempt a verbatim import. This behavior is not always desirable so a -V switch is added to zdb to control the behavior. When specified, a verbatim import is done. Otherwise, the behavior is as it was previously, except no verbatim import is done on failure. Signed-off-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #2372	2014-07-17 11:40:32 -07:00
George Wilson	2fbc542ebd	Illumos 4168, 4169, 4170: ztest, zdb and zhack fixes 4168 ztest assertion failure in dbuf_undirty 4169 verbatim import causes zdb to segfault 4170 zhack leaves pool in ACTIVE state Reviewed by: Adam Leventhal <ahl@delphix.com> Reviewed by: Eric Schrock <eric.schrock@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Approved by: Dan McDonald <danmcd@nexenta.com> References: https://www.illumos.org/issues/4168 https://www.illumos.org/issues/4169 https://www.illumos.org/issues/4170 https://github.com/illumos/illumos-gate/commit/7fdd916 Porting notes: Of particular interest when troubleshooting corrupted pools, the commonly-used "zdb -e" operation may perform verbatim imports and furthermore, it will soon have direct support for verbatim imports via a new "-V" option. The 4169 fix eliminates a common segfault case in which spa_history_log_version() tries to access an un-opened dsl_pool_t. Ported by: Tim Chase <tim@chase2k.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #2451 Closes #2283 Closes #2467	2014-07-17 11:37:57 -07:00
Matthew Ahrens	d586964141	Illumos #3641 compressed block histograms with zdb This patch is a zdb extension of the '-b' option, producing a histogram of the physical compressed block sizes per DMU object type on disk. The '-bbbb' option to zdb will uncover this new feature; here's an example usage on a new pool and snippet of the output it generates: # zpool create tank /dev/vd{b,c,d} # dd bs=1k if=/dev/urandom of=/tank/1kfile count=1 # dd bs=3k if=/dev/urandom of=/tank/3kfile count=1 # dd bs=64k if=/dev/urandom of=/tank/64kfile count=1 # zdb -bbbb tank ... 3 68.0K 68.0K 68.0K 22.7K 1.00 34.26 ZFS plain file psize (in 512-byte sectors): number of blocks 2: 1 * 3: 0 4: 0 5: 0 6: 1 * 7: 0 ... 127: 0 128: 1 * ... The blocks are also broken down by their indirection level. Expanding on the above example: # zfs set recordsize=1k tank # dd bs=1k if=/dev/urandom of=/tank/2x1kfile count=2 # zdb -bbbb tank ... 1 16K 1K 2K 2K 16.00 1.02 L1 ZFS plain file psize (in 512-byte sectors): number of blocks 2: 1 * 5 70.0K 70.0K 70.0K 14.0K 1.00 35.71 L0 ZFS plain file psize (in 512-byte sectors): number of blocks 2: 3 *** 3: 0 4: 0 5: 0 6: 1 * 7: 0 ... 127: 0 128: 1 * 6 86.0K 71.0K 72.0K 12.0K 1.21 36.73 ZFS plain file psize (in 512-byte sectors): number of blocks 2: 4 **** 3: 0 4: 0 5: 0 6: 1 * 7: 0 ... 127: 0 128: 1 * ... There's now a single 1K L1 block which is the indirect block needed for the '2x1kfile' file just created, as well as two more 1K L0 blocks from the same file. This can be used to get a distribution of the block sizes used within the pool, on a per object type basis. References: https://illumos.org/issues/3641 https://github.com/illumos/illumos-gate/commit/490d05b Ported by: Tim Chase <tim@chase2k.com> Signed-off-by: Prakash Surya <surya1@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Boris Protopopov <boris.protopopov@me.com> Closes #2456	2014-07-16 11:52:46 -07:00
Turbo Fredriksson	628668a39f	Add information about the -o option to zpool replace Users need to be aware that when replacing devices in an existing pool they may need to override automatically detected ashift value. This will all depend on the exact hardware they are using. Signed-off-by: Turbo Fredriksson <turbo@bayour.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #2024	2014-06-27 08:31:07 -07:00
Turbo Fredriksson	480f62655d	Only automatically mount a clone when 'canmount == on'. According to the man page, "When the noauto option is set, a dataset can only be mounted and unmounted explicitly. The dataset is not mounted automatically when the dataset is created or imported ...." When cloning a dataset the canmount property was not being honored. This patch adds the required check to achieve the behavior described in the man page. Signed-off-by: Turbo Fredriksson <turbo@bayour.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #2241	2014-06-06 12:30:35 -07:00
Richard Yao	2024041b6c	Remove superfluous statement Clang's static analyzer reported that the value assigned to pcksum is never used. That is because we initialize both zc and pcksum to {{ 0 }} and then do `pcksum = zc;`. That is fairly pointless. However, it has the effect of generating a false positive in Clang's static analyzer. Since noise from false positives can obscure real issues, we fix it anyway. Signed-off-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Ned Bass <bass6@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #2330	2014-05-30 17:02:37 -07:00
John Albietz	5f3c101b8f	Added INTEL SSD 530 Series INTEL SSD 530 Series... SSDSC2BW24 Signed-off-by: John Albietz <inthecloud247@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #2184	2014-05-19 16:57:14 -07:00
Richard Yao	1cbae971c5	Handle ZPOOL_STATUS_HOSTID_MISMATCH in zpool status Verbatim imports can cause hostid mismatches, but things otherwise work. `zpool status` does not handle this and will fail when assertions are enabled: ``` zpool: ../../cmd/zpool/zpool_main.c:4418: status_callback: Assertion `reason == ZPOOL_STATUS_OK' failed. Program received signal SIGABRT, Aborted. ``` Lets instead add a case to display an informative message such as this: ``` pool: rpool state: ONLINE status: Mismatch between pool hostid and system hostid on imported pool. This pool was previously imported into a system with a different hostid, and then was verbatim imported into this system. action: Export this pool on all systems on which it is imported. Then import it to correct the mismatch. see: http://zfsonlinux.org/msg/ZFS-8000-EY scan: scrub repaired 0 in 0h8m with 0 errors on Thu Apr 17 19:43:57 2014 config: NAME STATE READ WRITE CKSUM rpool ONLINE 0 0 0 sda ONLINE 0 0 0 errors: No known data errors ``` Signed-off-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #2342	2014-05-19 10:16:29 -07:00
Tim Chase	962d524212	Check the dataset type more rigorously when fetching properties. When fetching property values of snapshots, a check against the head dataset type must be performed. Previously, this additional check was performed only when fetching "version", "normalize", "utf8only" or "case". This caused the ZPL properties "acltype", "exec", "devices", "nbmand", "setuid" and "xattr" to be erroneously displayed with meaningless values for snapshots of volumes. It also did not allow for the display of "volsize" of a snapshot of a volume. This patch adds the headcheck flag paramater to zfs_prop_valid_for_type() and zprop_valid_for_type() to indicate the check is being done against a head dataset's type in order that properties valid only for snapshots are handled correctly. This allows the the head check in get_numeric_property() to be performed when fetching a property for a snapshot. Signed-off-by: Tim Chase <tim@chase2k.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #2265	2014-05-06 10:41:46 -07:00
Richard Yao	7809eb8b65	ztest: Switch to LWP rwlock interface ztest is intended to subject the ZFS code in userland to stress that it should be able to withstand. Any failures that occur when running it are failures that likely would occur inside the kernel. However, being in userland, it is much easier to debug them. In practice, this prevents a large number of problems from reaching production code. A design decision was made by the original authors of ztest to make a distinction between userland locking primitives and kernel locking primitives. The ztest code itself calls userland locking primitives while the kernel code being run in userland will call emulated kernel locking primitives that wrap the userland locking primitives. When ztest was first ported to Linux, a decision was made to use the emulated kernel interfaces everywhere. In effect, the userland rw_rdlock()/rw_wrlock() became the kernel rw_enter() and and the userland rw_unlock() became the kernel rw_exit(). This caused a regression because of an assertion in rw_enter() to catch recursive locking. That is permitted in userland, but not in the kernel. Consequently, the ztest code itself does recursive read locking. The use of the emulated kernel interfaces consequently caused the following failure: ztest: ../../lib/libzpool/kernel.c:384: Assertion `rwlp->rw_owner != zk_thread_current() (0x1c87150 != 0x1c87150)' failed. That occurs because ztest_dmu_objset_create_destroy() will take a read lock and call ztest_dmu_object_alloc_free(). That will call ztest_io(), which will take a readlock only when asked to do ZTEST_IO_REWRITE. This triggered the assertion. The pthreads rwlock interface was based on the LWP rwlock interface implemented in Illumos libc. Luckily enough, the subset used by ztest is almost identical, so we can solve this problem by switching to the LWP thread rwlock interface in ztest. This eliminates a point of divergence with Illumos and should make code sharing slightly easier. Signed-off-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1970	2014-05-01 15:53:58 -07:00
Richard Yao	c6e924fea8	Fix libblkid ZFS detection when making new pools zfsonlinux/zfs@1db7b9be75 should have fixed this, but this particular string was overlooked. Signed-off-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #2288	2014-05-01 13:26:33 -07:00
Chris Dunlap	6ac770b196	Replace zed_file_create_dirs() with mkdirp() When processing directory components starting from the root dir, zed_file_create_dirs() contained a bug in checking the return value of mkdir(). A typo was made, and the test for (mkdir_errno != EEXIST) was erroneously written as (mkdir_errno == EEXIST). If some of the leading directory components already existed, this bug would cause the routine to exit before creating the remaining directory components. Instead of fixing the above mkdir_errno test, this commit replaces zed_file_create_dirs() with mkdirp(). This cleanup was already planned, and zed_file_create_dirs() only existed because I didn't realize mkdirp() was already in tree at the time. Signed-off-by: Chris Dunlap <cdunlap@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #2248	2014-04-09 13:32:54 -07:00
John M. Layman	cbca6076b3	Fix for re-reading /etc/mtab. This is a continuation of `fb5c53ea65`: When /etc/mtab is updated on Linux it's done atomically with rename(2). A new mtab is written, the existing mtab is unlinked, and the new mtab is renamed to /etc/mtab. This means that we must close the old file and open the new file to get the updated contents. Using rewind(3) will just move the file pointer back to the start of the file, freopen(3) will close and open the file. In this commit, a few more rewind(3) calls were replaced with freopen(3) to allow updated mtab entries to be picked up immediately. Signed-off-by: John M. Layman <jml@frijid.net> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #2215 Issue #1611	2014-04-04 09:46:20 -07:00
Chris Dunlap	518eba1492	Replace check for _POSIX_MEMLOCK w/ HAVE_MLOCKALL zed supports a '-M' cmdline opt to lock all pages in memory via mlockall(). The _POSIX_MEMLOCK define is checked to determine whether this function is supported. The current test assumes mlockall() is supported if _POSIX_MEMLOCK is non-zero. However, this test is insufficient according to mlock(2) and sysconf(3). If _POSIX_MEMLOCK is -1, mlockall() is not supported; but if _POSIX_MEMLOCK is 0, availability must be checked at runtime. This commit adds an autoconf check for mlockall() to user.m4. The zed code block for mlockall() is now guarded with a test for HAVE_MLOCKALL. If defined, mlockall() will be called and its runtime availability checked via its return value. Signed-off-by: Chris Dunlap <cdunlap@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #2	2014-04-02 13:10:08 -07:00
Brian Behlendorf	904ea2763e	Add automatic hot spare functionality When a vdev starts getting I/O or checksum errors it is now possible to automatically rebuild to a hot spare device. To cleanly support this functionality in a shell script some additional information was added to all zevent ereports which include a vdev. This covers both io and checksum zevents but may be used but other scripts. In the Illumos FMA solution the same information is required but it is retrieved through the libzfs library interface. Specifically the following members were added: vdev_spare_paths - List of vdev paths for all hot spares. vdev_spare_guids - List of vdev guids for all hot spares. vdev_read_errors - Read errors for the problematic vdev vdev_write_errors - Write errors for the problematic vdev vdev_cksum_errors - Checksum errors for the problematic vdev. By default the required hot spare scripts are installed but this functionality is disabled. To enable hot sparing uncomment the ZED_SPARE_ON_IO_ERRORS and ZED_SPARE_ON_CHECKSUM_ERRORS in the /etc/zfs/zed.d/zed.rc configuration file. These scripts do no add support for the autoexpand property. At a minimum this requires adding a new udev rule to detect when a new device is added to the system. It also requires that the autoexpand policy be ported from Illumos, see: https://github.com/illumos/illumos-gate/blob/master/usr/src/cmd/syseventd/modules/zfs_mod/zfs_mod.c Support for detecting the correct name of a vdev when it's not a whole disk was added by Turbo Fredriksson. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Chris Dunlap <cdunlap@llnl.gov> Signed-off-by: Turbo Fredriksson <turbo@bayour.com> Issue #2	2014-04-02 13:10:08 -07:00
Brian Behlendorf	d21705eab9	Add missing DATA_TYPE_STRING_ARRAY output This functionality has always been missing. But until now there were no zevents which included an array of strings so it wasn't missed. However, that's now changed so to ensure this information is output correctly by 'zpool events -v' the DATA_TYPE_STRING_ARRAY has been implemented. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Chris Dunlap <cdunlap@llnl.gov> Issue #2	2014-04-02 13:10:08 -07:00
Brian Behlendorf	1a5c611a22	Make command line guid parsing more tolerant Several of the zfs utilities allow you to pass a vdev's guid rather than the device name. However, the utilities are not consistent in how they parse that guid. For example, 'zinject' expects the guid to be passed as a hex value while 'zpool replace' wants it as a decimal. The user is forced to just know what format to use. This patch improve things by making the parsing more tolerant. When strtol(3) is called using 0 for the base, rather than say 10 or 16, it will then accept hex, decimal, or octal input based on the prefix. From the man page. If base is zero or 16, the string may then include a "0x" prefix, and the number will be read in base 16; otherwise, a zero base is taken as 10 (decimal) unless the next character is '0', in which case it is taken as 8 (octal). NOTE: There may be additional conversions not caught be this patch. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Chris Dunlap <cdunlap@llnl.gov> Issue #2	2014-04-02 13:10:08 -07:00
Chris Dunlap	9e246ac3d8	Initial implementation of zed (ZFS Event Daemon) zed monitors ZFS events. When a zevent is posted, zed will run any scripts that have been enabled for the corresponding zevent class. Multiple scripts may be invoked for a given zevent. The zevent nvpairs are passed to the scripts as environment variables. Events are processed synchronously by the single thread, and there is no maximum timeout for script execution. Consequently, a misbehaving script can delay (or forever block) the processing of subsequent zevents. Plans are to address this in future commits. Initial scripts have been developed to log events to syslog and send email in response to checksum/data/io errors and resilver.finish/scrub.finish events. By default, email will only be sent if the ZED_EMAIL variable is configured in zed.rc (which is serving as a config file of sorts until a proper configuration file is implemented). Signed-off-by: Chris Dunlap <cdunlap@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #2	2014-04-02 13:10:03 -07:00
Chris Dunlap	8c7aa0cfc4	Replace zpool_events_next() "block" parm w/ "flags" zpool_events_next() can be called in blocking mode by specifying a non-zero value for the "block" parameter. However, the design of the ZFS Event Daemon (zed) requires additional functionality from zpool_events_next(). Instead of adding additional arguments to the function, it makes more sense to use flags that can be bitwise-or'd together. This commit replaces the zpool_events_next() int "block" parameter with an unsigned bitwise "flags" parameter. It also defines ZEVENT_NONE to specify the default behavior. Since non-blocking mode can be specified with the existing ZEVENT_NONBLOCK flag, the default behavior becomes blocking mode. This, in effect, inverts the previous use of the "block" parameter. Existing callers of zpool_events_next() have been modified to check for the ZEVENT_NONBLOCK flag. Signed-off-by: Chris Dunlap <cdunlap@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #2	2014-03-31 16:11:21 -07:00
Brian Behlendorf	9b101a7320	Clarify zpool_events_next() comment Due to the very poorly chosen argument name 'cleanup_fd' it was completely unclear that this file descriptor is used to track the current cursor location. When the file descriptor is created by opening ZFS_DEV a private cursor is created in the kernel for the returned file descriptor. Subsequent calls to zpool_events_next() and zpool_events_seek() then require the file descriptor as an argument to reposition the cursor. When the file descriptor is closed the kernel state tracking the cursor is destroyed. This patch contains no functional change, it just changes a few variable names and clarifies the documentation. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Chris Dunlap <cdunlap@llnl.gov> Issue #2	2014-03-31 16:11:08 -07:00
Yuri Pankov	d3773fda14	Illumos #3120 zinject hangs in zfsdev_ioctl() due to uninitialized zc 3120 zinject hangs in zfsdev_ioctl() due to uninitialized zc Reviewed by: Richard Lowe <richlowe@richlowe.net> Reviewed by: Eric Schrock <eric.schrock@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Approved by: Richard Lowe <richlowe@richlowe.net> References: https://www.illumos.org/issues/3120 illumos/illumos-gate@f4c46b1eda Ported-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #2152	2014-03-24 11:06:11 -07:00
Richard Yao	26b42f3f9d	Implement -t option to zpool import for temporary pool names Originally, users had to handle spa namespace collisions by either exporting the already imported pool or by specifying a new name for the pool with a conflicting name. In the case of root pools from virtual guests, neither approach to collision resolution is reasonable. This is addressed by extending the new name syntax with a -t option to specify that the new name is temporary. When specified, this sets an internal flag that is passed into the kernel to tell it that all label updates should refer to the name used in the original label. Consequently, the original pool name will be retained on export. Signed-off-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #2189	2014-03-20 12:05:30 -07:00
Isaac Huang	312f82ce65	sighandler() should take 2 arguments Stopping arcstat.py with ^C always ends up with error: TypeError: sighandler() takes no arguments (2 given) Since no special signal handling was done in sighandler(), it's simpler to just set SIGINT handler to SIG_DFL, which terminates the script. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Isaac Huang <he.huang@intel.com> Closes #2179	2014-03-20 11:03:46 -07:00
Brian Behlendorf	d9119bd66d	Revert "sighandler() should take 2 arguments" This reverts commit `0bb89b6c59` in favor of a cleaner implementation. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #2182	2014-03-20 11:03:21 -07:00
Richard Yao	e2282ef57e	Remove unused option -r from 'zpool import' Signed-off-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #2178	2014-03-12 09:09:52 -07:00
Richard Yao	2026196230	Switch ztest mmap(2) ASSERTs to VERIFYs This is just a small bit of cleanup to ensure ztest fails early on systems where mmap(2) is not functioning. For the automated testing which is the primary consumer of ztest there is no functional change because debugging is always enabled. Signed-off-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #2177	2014-03-12 09:05:00 -07:00
Richard Yao	85802aa42b	Free props in ztest_init() Valgrind complained about this and it's absolutely right. The props nvlist was not being freed in ztest_init. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Richard Yao <ryao@gentoo.org> Closes #2174	2014-03-12 09:01:48 -07:00
Isaac Huang	0bb89b6c59	sighandler() should take 2 arguments Stopping arcstat.py with ^C always ends up with error: TypeError: sighandler() takes no arguments (2 given) This patch corrects the error by updating the signal handler to take the correct number of arguments. Signed-off-by: Isaac Huang <he.huang@intel.com> Signed-off-by: Prakash Surya <surya1@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #2182	2014-03-12 09:00:18 -07:00
Isaac Huang	4e1c9f9c48	Make arcstat.py default to one header per screen Today arcstat.py prints one header every hdr_intr (20 by default) lines. It would be more consistent with out utilities like vmstat if hdr_intr defaulted to terminal window size, i.e. one header per screenful of outputs. Signed-off-by: Isaac Huang <he.huang@intel.com> Signed-off-by: Prakash Surya <surya1@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #2183	2014-03-12 08:56:12 -07:00
cburroughs	e78b6da3d0	Include l2asize in arcstat For consistency with upstream pull in the l2asize update after reworking it from Perl to Python. References: https://github.com/mharsch/arcstat/pull/11 https://github.com/mharsch/arcstat/pull/12 Signed-off-by: cburroughs <chris.burroughs@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #2122	2014-03-04 11:25:58 -08:00
Richard Yao	4f2dcb3eee	Add erratum for issue #2094 ZoL commit `1421c89` unintentionally changed the disk format in a forward- compatible, but not backward compatible way. This was accomplished by adding an entry to zbookmark_t, which is included in a couple of on-disk structures. That lead to the creation of pools with incorrect dsl_scan_phys_t objects that could only be imported by versions of ZoL containing that commit. Such pools cannot be imported by other versions of ZFS or past versions of ZoL. The additional field has been removed by the previous commit. However, affected pools must be imported and scrubbed using a version of ZoL with this commit applied. This will return the pools to a state in which they may be imported by other implementations. The 'zpool import' or 'zpool status' command can be used to determine if a pool is impacted. A message similar to one of the following means your pool must be scrubbed to restore compatibility. $ zpool import pool: zol-0.6.2-173 id: 1165955789558693437 state: ONLINE status: Errata #1 detected. action: The pool can be imported using its name or numeric identifier, however there is a compatibility issue which should be corrected by running 'zpool scrub' see: http://zfsonlinux.org/msg/ZFS-8000-ER config: ... $ zpool status pool: zol-0.6.2-173 state: ONLINE scan: pool compatibility issue detected. see: https://github.com/zfsonlinux/zfs/issues/2094 action: To correct the issue run 'zpool scrub'. config: ... If there was an async destroy in progress 'zpool import' will prevent the pool from being imported. Further advice on how to proceed will be provided by the error message as follows. $ zpool import pool: zol-0.6.2-173 id: 1165955789558693437 state: ONLINE status: Errata #2 detected. action: The pool can not be imported with this version of ZFS due to an active asynchronous destroy. Revert to an earlier version and allow the destroy to complete before updating. see: http://zfsonlinux.org/msg/ZFS-8000-ER config: ... Pools affected by the damaged dsl_scan_phys_t can be detected prior to an upgrade by running the following command as root: zdb -dddd poolname 1 \| grep -P '^\t\tscan = ' \| sed -e 's;scan = ;;' \| wc -w Note that `poolname` must be replaced with the name of the pool you wish to check. A value of 25 indicates the dsl_scan_phys_t has been damaged. A value of 24 indicates that the dsl_scan_phys_t is normal. A value of 0 indicates that there has never been a scrub run on the pool. The regression caused by the change to zbookmark_t never made it into a tagged release, Gentoo backports, Ubuntu, Debian, Fedora, or EPEL stable respositorys. Only those using the HEAD version directly from Github after the 0.6.2 but before the 0.6.3 tag are affected. This patch does have one limitation that should be mentioned. It will not detect errata #2 on a pool unless errata #1 is also present. It expected this will not be a significant problem because pools impacted by errata #2 have a high probably of being impacted by errata #1. End users can ensure they do no hit this unlikely case by waiting for all asynchronous destroy operations to complete before updating ZoL. The presence of any background destroys on any imported pools can be checked by running `zpool get freeing` as root. This will display a non-zero value for any pool with an active asynchronous destroy. Lastly, it is expected that no user data has been lost as a result of this erratum. Original-patch-by: Tim Chase <tim@chase2k.com> Reworked-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tim Chase <tim@chase2k.com> Signed-off-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #2094	2014-02-21 12:10:40 -08:00
Brian Behlendorf	ffe9d38275	Add generic errata infrastructure From time to time it may be necessary to inform the pool administrator about an errata which impacts their pool. These errata will by shown to the administrator through the 'zpool status' and 'zpool import' output as appropriate. The errata must clearly describe the issue detected, how the pool is impacted, and what action should be taken to resolve the situation. Additional information for each errata will be provided at http://zfsonlinux.org/msg/ZFS-8000-ER. To accomplish the above this patch adds the required infrastructure to allow the kernel modules to notify the utilities that an errata has been detected. This is done through the ZPOOL_CONFIG_ERRATA uint64_t which has been added to the pool configuration nvlist. To add a new errata the following changes must be made: * A new errata identifier must be assigned by adding a new enum value to the zpool_errata_t type. New enums must be added to the end to preserve the existing ordering. * Code must be added to detect the issue. This does not strictly need to be done at pool import time but doing so will make the errata visible in 'zpool import' as well as 'zpool status'. Once detected the spa->spa_errata member should be set to the new enum. * If possible code should be added to clear the spa->spa_errata member once the errata has been resolved. * The show_import() and status_callback() functions must be updated to include an informational message describing the errata. This should include an action message describing what an administrator should do to address the errata. * The documentation at http://zfsonlinux.org/msg/ZFS-8000-ER must be updated to describe the errata. This space can be used to provide as much additional information as needed to fully describe the errata. A link to this documentation will be automatically generated in the output of 'zpool import' and 'zpool status'. Original-idea-by: Tim Chase <tim@chase2k.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tim Chase <tim@chase2k.com> Signed-off-by: Richard Yao <ryao@gentoo.or Issue #2094	2014-02-21 12:10:40 -08:00
Brian Behlendorf	731782ec31	Use expected zpool_status_t type Both the zpool_import_status() and zpool_get_status() functions return the zpool_status_t enum. This explicit type should be used rather than the more generic int type. This patch makes no functional change and should only be considered code cleanup. It happens to have been done in the context of #2094 because that's when I noticed this issue. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tim Chase <tim@chase2k.com> Signed-off-by: Richard Yao <ryao@gentoo.or Issue #2094	2014-02-21 12:10:40 -08:00
Patrik Greco	2278381ce2	Fix error message in zpios The chunksize must always be strictly smaller than the regionsize. Signed-off-by: Andrew Uselton <andrew.c.uselton@intel.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #2072	2014-01-29 15:11:56 -08:00
Ned Bass	09d0b30fd1	vdev_id: support per-channel slot mappings The vdev_id udev helper currently applies slot renumbering rules to every channel (JBOD) in the system. This is too inflexible for systems with non-homogeneous storage topologies. The "slot" keyword now takes an optional third parameter which names a channel to which the mapping will apply. If the third parameter is omitted then the rule applies to all channels. The first-specified rule that can match a slot takes precedence. Therefore a channel-specific rule for a given slot should generally appear before a generic rule for the same slot number. In this way a custom slot mapping can be applied to a particular channel and a default mapping applied to the rest. Signed-off-by: Ned Bass <bass6@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #2056	2014-01-17 11:17:54 -08:00
Richard Yao	cbe8e6198c	Properly link zpool command to libblkid `31fc19399e` incorrectly removed $(LIBBLKID) from cmd/zpool/Makefile.am. This meant that the toolchain was not given -lblkid, which resulted in the following build failure on Ubuntu 13.10: /usr/bin/ld: zpool_vdev.o: undefined reference to symbol 'blkid_put_cache@@BLKID_1.0' /lib/x86_64-linux-gnu/libblkid.so.1: error adding symbols: DSO missing from command line collect2: error: ld returned 1 exit status That commit reworked various Makefile.am to follow best practices, so we reintroduce $(LIBBLKID) in a manner consistent with that, rather than explicitly reverting the change. Reproduction of this issue was done on a Gentoo Linux system by executing the following commands: zfs create -o mountpoint=/mnt/ubuntu-13.10 rpool/ROOT/ubuntu-13.10 debootstrap --variant=buildd --arch amd64 saucy /mnt/ubuntu-13.10 http://archive.ubuntu.com/ubuntu/ mount -o bind /dev /mnt/ubuntu-13.10/dev/ mount -o bind /proc/ /mnt/ubuntu-13.10/proc/ mount -o bind /sys/ /mnt/ubuntu-13.10/sys/ cp /etc/resolv.conf /mnt/ubuntu-13.10/etc/ (cd /mnt/ubuntu-13.10/root/ && git clone git://github.com/zfsonlinux/zfs.git) chroot /mnt/ubuntu-13.10/ apt-get install git autoconf libtool zlib1g-dev uuid-dev libblkid-dev \#apt-get install alien fakeroot vim cd /root/zfs ./autogen.sh ./configure --with-config=user --prefix=/usr make That will create a Ubuntu 13.10 chroot, fetch the sources and build test. At this point, cmd/zpool/Makefile.am was modified and the following commands were run to verify that the build issue was resolved: git clean -xdf ./autogen.sh ./configure --with-config=user --prefix=/usr make Although it is not shown here, the absence of libblkid-dev enables ZFS to build successfully without the patch. This could explain how this escaped detection until recently. A test without libblkid-dev was done to verify that the patch did not cause a regression in the absence of libblkid: apt-get remove libblkid-dev git clean -xdf ./autogen.sh ./configure --with-config=user --prefix=/usr make Additionally, the commands themselves were tested against my live system from within the chroot to ensure basic functionality. My live system had corresponding kernel modules already installed and basic commands such as `zpool list` and `zfs list` worked without incident. Lastly, this patch was also build tested on Gentoo Linux, where it caused no problems. At time of writing, these steps can be used to reproduce these results on any modern Linux system that has debootstrap installed. On Gentoo, installing debootstrap can be done with `emerge dev-util/debootstrap`. The current ZFSOnLinux HEAD revision as of writing is `fd23720ae1`. Once this is fixed in HEAD, either that revision or another before this fix and after `31fc19399e` will be needed to reproduce this issue. Lastly, it remains to be seen why the toolchains on the systems performing regression tests did not catch this. This is not a ZFS-specific issue, but it is something that we will want to explore in the future. Signed-off-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #2038	2014-01-14 10:27:17 -08:00
Brian Behlendorf	e07306687d	Enable /etc/mtab cache to improve performance Re-enable the /etc/mtab cache to prevent the zfs command from having to repeatedly open and read from the /etc/mtab file. Instead an AVL tree of the mounted filesystems is created and used to vastly speed up lookups. This means that if non-zfs filesystems are mounted concurrently the 'zfs mount' will not immediately detect them. In practice that will rarely happen and even if it does the absolute worst case would be a failed mount. This was originally disabled out of an abundance of paranoia. NOTE: There may still be some parts of the code which do not consult the mtab cache. They should be updated to check the mtab cache as they as discovered to be a problem. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Turbo Fredriksson <turbo@bayour.com> Signed-off-by: Chris Dunlop <chris@onthe.net.au> Issue #845	2014-01-07 09:48:09 -08:00
Tim Chase	fb8e608d9d	Fix the creation of ZPOOL_HIST_CMD pool history entries. Move the libzfs_fini() after the zpool_log_history() call so the ZPOOL_HIST_CMD entry can get written. Fix the handling of saved_poolname in zfsdev_ioctl() which was broken as part of the stack-reduction work in `a168788053`. Since ZoL destroys the TSD data in which the previously successful ioctl()'s pool name is stored following every vop, the ZFS_IOC_LOG_HISTORY ioctl has a very important restriction: it can only successfully write a long entry following a successful ioctl() if no intervening vops have been performed. Some of zfs subcommands do perform intervening vops and to do the logging themselves. At the moment, the "create" and "clone" subcommands have been modified appropriately. Signed-off-by: Tim Chase <tim@chase2k.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1998	2014-01-07 09:00:26 -08:00
Matthew Thode	11b9ec23b9	Add full SELinux support Four new dataset properties have been added to support SELinux. They are 'context', 'fscontext', 'defcontext' and 'rootcontext' which map directly to the context options described in mount(8). When one of these properties is set to something other than 'none'. That string will be passed verbatim as a mount option for the given context when the filesystem is mounted. For example, if you wanted the rootcontext for a filesystem to be set to 'system_u:object_r:fs_t' you would set the property as follows: $ zfs set rootcontext="system_u:object_r:fs_t" storage-pool/media This will ensure the filesystem is automatically mounted with that rootcontext. It is equivalent to manually specifying the rootcontext with the -o option like this: $ zfs mount -o rootcontext=system_u:object_r:fs_t storage-pool/media By default all four contexts are set to 'none'. Further information on SELinux contexts is detailed in mount(8) and selinux(8) man pages. Signed-off-by: Matthew Thode <prometheanfire@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Richard Yao <ryao@gentoo.org> Closes #1504	2013-12-19 10:37:31 -08:00
Michael Kjorling	d1d7e2689d	cstyle: Resolve C style issues The vast majority of these changes are in Linux specific code. They are the result of not having an automated style checker to validate the code when it was originally written. Others were caused when the common code was slightly adjusted for Linux. This patch contains no functional changes. It only refreshes the code to conform to style guide. Everyone submitting patches for inclusion upstream should now run 'make checkstyle' and resolve any warning prior to opening a pull request. The automated builders have been updated to fail a build if when 'make checkstyle' detects an issue. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1821	2013-12-18 16:46:35 -08:00
John Wren Kennedy	2820bc49c5	Illumos #4208 4208 Typo in zfs_main.c: "posxiuser" Reviewed by: Sonu Pillai <sonu.pillai@delphix.com> Reviewed by: Will Guyette <will.guyette@delphix.com> Reviewed by: Eric Diven <eric.diven@delphix.com> Reviewed by: Christopher Siden <christopher.siden@delphix.com> Approved by: Richard Lowe <richlowe@richlowe.net> References: https://www.illumos.org/issues/4208 illumos/illumos-gate@f38cb554a5 Ported-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1986	2013-12-18 16:46:35 -08:00
renelson	a5f3665168	Handle acl flags from util-linux mount command Add acl, noacl and posixacl to option_map, avoiding ENOENT error case when mount from util-linux-2.24 execs mount.zfs with any of those flags Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: renelson <bnelson@nelsonbe.com> Issue #1968	2013-12-18 16:46:20 -08:00
renelson	758d35520b	Fix grammar in parse_options() error message A minor grammar error was corrected in in the parse_options() error handling for the ENOENT case. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: renelson <bnelson@nelsonbe.com> Issue #1968	2013-12-17 10:48:28 -08:00
Simon Guest	383efa5743	Fix multipath bug in vdev_id caused by inconsistent field numbering The bug is caused by multipath output like this: 35000c50056bd77a7 dm-15 HP,MB3000FCWDH size=2.7T features='0' hwhandler='0' wp=rw \|-+- policy='round-robin 0' prio=0 status=active \| `- 2:0:16:0 sdq 65:0 active undef running `-+- policy='round-robin 0' prio=0 status=enabled `- 4:0:52:0 sdfp 130:176 active undef running Note that the pipe symbols mean that the field numbering is different between the sdq and sdfp lines. The fix edits out the pipe symbols. Signed-off-by: Ned Bass <bass6@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1692	2013-12-10 09:58:35 -08:00
Richard Yao	c8c8d1e7e5	Drive database update Added: Adata S396 (obtained from drive_id) Apple MacBookAir3,1 SSD (obtained from drive_id) Apple MacBookPro10,1 SSD (obtained from drive_id) Intel 510 (obtained from drive_id) Intel 710 (obtained from drive_id) Intel DC S3500 (obtained from drive_id) Netapp LUN (obtained from illumos user's sd.conf) OCZ Agility 3 (obtained from drive_id) OCZ Vertex (obtained from drive_id) Samsung PM800 (obtained from drive_id) Sandisk U100 (obtained from drive_id) Sun Comstar (obtained from illumos user's sd.conf) Notes: 1. The entries for the Intel DC S3500 were extrapolated from the 800GB model's entry, which is "ATA INTEL SSDSC2BB80". 2. The entires for the Intel 710 were extrapolated from the 120GG model's entry, which is "ATA INTEL SSDSA2BZ12". 3. The entires for the Intel 510 were extrapolated from the 250GB model's entry, which is "ATA INTEL SSDSC2MH25". 4. The entires for the Apple MacBookPro10,1 SSD were extrapolated from the 512GB model's entry, which is "ATA APPLE SSD SM512E". Google searches suggest that this is a rebadged Samsung 830. 5. The entires for the Apple MacBookAir3,1 SSD were extrapolated from the 128GB model's entry, which is "ATA APPLE SSD TS128C". Google searches suggest that this is a rebadged Kingston SSDNow V+ 100 (based on Toshiba). 6. Sun Comstar is an iSCSI Target, so we cannot tell what the correct sector size is through this method. We list it only for reference purposes, but it is commented out. Similarly, it is not clear what the right thing to do for Netapp is, so we comment it out. Signed-off-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1907	2013-12-02 14:07:26 -08:00
Yuri Pankov	54d5378fae	Illumos #2583 2583 Add -p (parsable) option to zfs list References: https://www.illumos.org/issues/2583 illumos/illumos-gate@43d68d68c1 Ported-by: Gregor Kopka <gregor@kopka.net> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes: #937	2013-11-21 11:13:53 -08:00
Maximilian Mehnert	539defc873	Add missing libzfs_core to Makefiles On some platforms symbols provided by libzfs_core and used by libzfs were not available to the linker. To avoid this issue libzfs_core has been added to the list of required libraries when building utilities which depend on libzfs. This should have been handled properly by libtool and it's still not entirely clear why it wasn't on all platforms. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1841	2013-11-20 15:44:15 -08:00
Matthew Thode	09d672d331	Python 3 fixes Future proofing for compatibility with newer versions of Python. Signed-off-by: Matthew Thode <prometheanfire@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #1838	2013-11-08 14:30:29 -08:00
Matthew Thode	23bc1f91fc	pep8 code readability changes Update the code to follow the pep8 style guide. References: http://www.python.org/dev/peps/pep-0008/ Signed-off-by: Matthew Thode <prometheanfire@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #1838	2013-11-08 14:30:22 -08:00
Tim Chase	fd4f76160c	Handle concurrent snapshot automounts failing due to EBUSY. In the current snapshot automount implementation, it is possible for multiple mounts to attempted concurrently. Only one of the mounts will succeed and the other will fail. The failed mounts will cause an EREMOTE to be propagated back to the application. This commit works around the problem by adding a new exit status, MOUNT_BUSY to the mount.zfs program which is used when the underlying mount(2) call returns EBUSY. The zfs code detects this condition and treats it as if the mount had succeeded. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1819	2013-11-08 10:45:14 -08:00
George Wilson	5d1f7fb647	Illumos #3956 , #3957 , #3958 , #3959 , #3960 , #3961 , #3962 3956 ::vdev -r should work with pipelines 3957 ztest should update the cachefile before killing itself 3958 multiple scans can lead to partial resilvering 3959 ddt entries are not always resilvered 3960 dsl_scan can skip over dedup-ed blocks if physical birth != logical birth 3961 freed gang blocks are not resilvered and can cause pool to suspend 3962 ztest should print out zfs debug buffer before exiting Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Adam Leventhal <ahl@delphix.com> Approved by: Richard Lowe <richlowe@richlowe.net> References: https://www.illumos.org/issues/3956 https://www.illumos.org/issues/3957 https://www.illumos.org/issues/3958 https://www.illumos.org/issues/3959 https://www.illumos.org/issues/3960 https://www.illumos.org/issues/3961 https://www.illumos.org/issues/3962 illumos/illumos-gate@b4952e17e8 Ported-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Porting notes: 1. zfs_dbgmsg_print() is only used in userland. Since we do not have mdb on Linux, it does not make sense to make it available in the kernel. This means that a build failure will occur if any future kernel patch depends on it. However, that is unlikely given that this functionality was added to support zdb. 2. zfs_dbgmsg_print() is only invoked for -VVV or greater log levels. This preserves the existing behavior of minimal noise when running with -V, and -VV. 3. In vdev_config_generate() the call to nvlist_alloc() was not changed to fnvlist_alloc() because we must pass KM_PUSHPAGE in the txg_sync context.	2013-11-05 12:23:05 -08:00
George Wilson	621dd7bb2c	Illumos #3949 , #3950 , #3952 , #3953 3949 ztest fault injection should avoid resilvering devices 3950 ztest: deadman fires when we're doing a scan 3951 ztest hang when running dedup test 3952 ztest: ztest_reguid test and ztest_fault_inject don't place nice together Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Adam Leventhal <ahl@delphix.com> Approved by: Richard Lowe <richlowe@richlowe.net> References: https://www.illumos.org/issues/3949 https://www.illumos.org/issues/3950 https://www.illumos.org/issues/3951 https://www.illumos.org/issues/3952 illumos/illumos-gate@2c1e2b4414 Ported-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #1775 Porting notes: 1. The deadman thread was removed from ztest during the original port because it depended on Solaris thr_create() interface. This functionality should be reintroduced using the more portable pthreads.	2013-11-05 12:17:07 -08:00
Matthew Ahrens	383fc4a997	Illumos #3955 3955 ztest failure: assertion refcount_count(&tx->tx_space_written) + delta <= tx->tx_space_towrite Reviewed by: Adam Leventhal <ahl@delphix.com> Reviewed by: Dan Kimmel <dan.kimmel@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Approved by: Richard Lowe <richlowe@richlowe.net> References: https://www.illumos.org/issues/3955 illumos/illumos-gate@be9000cc67 Ported-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #1775	2013-11-05 12:16:14 -08:00
Matthew Ahrens	498877baf5	Illumos #3112 , #3113 , #3114 3112 ztest does not honor ZFS_DEBUG 3113 ztest should use watchpoints to protect frozen arc bufs 3114 some leaked nvlists in zfsdev_ioctl Reviewed by: Adam Leventhal <ahl@delphix.com> Reviewed by: Matt Amdur <Matt.Amdur@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Christopher Siden <chris.siden@delphix.com> Approved by: Eric Schrock <eric.schrock@delphix.com> References: https://www.illumos.org/issues/3112 https://www.illumos.org/issues/3113 https://www.illumos.org/issues/3114 illumos/illumos-gate@cd1c8b85eb The /proc/self/cmd watchpoint interface is specific to Solaris. Therefore, the #3113 implementation was reworked to use the more portable mprotect(2) system call. When the pages are watched they are marked read-only for protection. Any write to the protected address range immediately trigger a SIGSEGV. The pages are marked writable again when they are unwatched. Ported-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #1489	2013-11-05 12:14:48 -08:00
George Wilson	03c6040bee	Illumos #3236 3236 zio nop-write Reviewed by: Matt Ahrens <matthew.ahrens@delphix.com> Reviewed by: Adam Leventhal <ahl@delphix.com> Reviewed by: Christopher Siden <chris.siden@delphix.com> Approved by: Garrett D'Amore <garrett@damore.org> References: illumos/illumos-gate@80901aea8e https://www.illumos.org/issues/3236 Porting Notes 1. This patch is being merged dispite an increased instance of https://www.illumos.org/issues/3113 being triggered by ztest. Ported-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #1489	2013-11-05 12:14:21 -08:00
Keith M Wesolowski	96c2e96193	Illumos #3894 3894 zfs should not allow snapshot of inconsistent dataset Reviewed by: Matthew Ahrens <mahrens@delphix.com> Approved by: Gordon Ross <gwr@nexenta.com> References: https://www.illumos.org/issues/3894 illumos/illumos-gate@ca48f36f20 Ported-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #1775	2013-11-04 11:18:14 -08:00
Steven Hartland	95fd54a1c5	Illumos #3740 3740 Poor ZFS send / receive performance due to snapshot hold / release processing Reviewed by: Matthew Ahrens <mahrens@delphix.com> Approved by: Christopher Siden <christopher.siden@delphix.com> References: https://www.illumos.org/issues/3740 illumos/illumos-gate@a7a845e4bf Ported-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #1775 Porting notes: 1. `13fe019870` introduced a merge conflict in dsl_dataset_user_release_tmp where some variables were moved outside of the preprocessor directive. 2. dea9dfefdd747534b3846845629d2200f0616dad made the previous merge conflict worse by switching KM_SLEEP to KM_PUSHPAGE. This is notable because this commit refactors the code, adding a new KM_SLEEP allocation. It is not clear to me whether this should be converted to KM_PUSHPAGE. 3. We had a merge conflict in libzfs_sendrecv.c because of copyright notices. 4. Several small C99 compatibility fixed were made.	2013-11-04 11:17:48 -08:00
Will Andrews	7bc7f25040	Illumos #3745 , #3811 3745 zpool create should treat -O mountpoint and -m the same 3811 zpool create -o altroot=/xyz -O mountpoint=/mnt ignores the mountpoint option Reviewed by: Matthew Ahrens <mahrens@delphix.com> Approved by: Christopher Siden <christopher.siden@delphix.com> References: https://www.illumos.org/issues/3745 https://www.illumos.org/issues/3811 illumos/illumos-gate@8b71377531 Ported-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #1775	2013-11-04 10:55:25 -08:00
Matthew Ahrens	d1fada1e6d	Illumos #3603 , #3604 : bobj improvements 3603 panic from bpobj_enqueue_subobj() 3604 zdb should print bpobjs more verbosely 3871 GCC 4.5.3 does not like issue 3604 patch Reviewed by: Henrik Mattson <henrik.mattson@delphix.com> Reviewed by: Adam Leventhal <ahl@delphix.com> Reviewed by: Christopher Siden <christopher.siden@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Garrett D'Amore <garrett@damore.org> Approved by: Dan McDonald <danmcd@nexenta.com> References: https://www.illumos.org/issues/3603 https://www.illumos.org/issues/3604 https://www.illumos.org/issues/3871 illumos/illumos-gate@d04756377d Ported-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #1775 Note that the patch from Illumos issue 3871 is not accepted into Illumos at the time of this writing. It is something that I wrote when porting this. Documentation is in the Illumos issue.	2013-10-31 14:57:51 -07:00
Ralf Ertzinger	d65e738109	Add -p switch to "zpool get" This works the same as the -p switch to "zfs get", displaying full resolution values for appropriate attributes. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1813	2013-10-28 15:40:12 -07:00
Steven Hartland	157c9b6981	Corrected "zfs list -t <type>" syntax in man page and in command help. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1805	2013-10-25 16:09:21 -07:00
Brian Behlendorf	d738d34da5	Add dbufstat.py command The dbufstat.py command was added to provide a conveniant way to easily determine what ZFS is caching. The script consumes the raw /proc/spl/kstat/zfs/dbufs kstat data can consolidates it in to a more human readable form. This was designed primarily as a tool to aid developers but it may also be useful for advanced users who want more visibility in to what the ARC is caching. When run without options dbufstat.py will default to showing a list of all objects with at least one buffer present in the cache. The total cache space consumed by that object will be printed on the right along with the object type. Similar to the arcstats.py command the -x option may used to display additional fields. Two other modes of operation are also supported by dbufstat.py and the expectation is additional display modes may be added as needed. The -t option will summerize the total number of bytes cached for each object type, and the -b option will show every dbuf currently cached. The script was designed to be consistent with arcstat.py and includes most of the same options and funcationality. Signed-off-by: Prakash Surya <surya1@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2013-10-25 14:52:45 -07:00
Brian Behlendorf	11cb9d773f	Increase default udev wait time When creating a new pool, or adding/replacing a disk in an existing pool, partition tables will be automatically created on the devices. Under normal circumstances it will take less than a second for udev to create the expected device files under /dev/. However, it has been observed that if the system is doing heavy IO concurrently udev may take far longer. If you also throw in some cheap dodgy hardware it may take even longer. To prevent zpool commands from failing due to this the default wait time for udev is being increased to 30 seconds. This will have no impact on normal usage, the increase timeout should only be noticed if your udev rules are incorrectly configured. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1646	2013-10-22 10:25:51 -07:00
Tim Chase	2e2ddc30b4	Dedup-related documentation additions for zpool and zdb. Document the "-D" and "-T" options and the optional interval and count or "zpool status". Also for zpool's man page, use a consistent order for the various "-T" options to match the program's help output. Document the effect of additional "-D" options for zdb. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1786	2013-10-22 10:08:51 -07:00
Richard Yao	31fc19399e	Generate libraries with correct DT_NEEDED entries Libraries that depend on other libraries should list them in ELF's DT_NEEDED field so that programs linking to them do not need to specify those libraries unless they depend on them as well. This is not the case in the current code and the consequence is that anything that needs a library must know its dependencies. This is fragile and caused GRUB2's configure script to break when a dependency was added on libblkid in libzfs. This resolves that problem by using LIBADD/LDADD to specify libraries in Makefile.am instead of LDFLAGS. This ensures that proper DT_NEEDED entries are generated and prevents GRUB2's configure script from breaking in the presence of a libblkid dependency. This also removes unneeded dependencies from various files. Signed-off-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #1751	2013-10-10 16:56:51 -07:00
Richard Yao	3549721c9e	Update drive database Add Corsair Force GS drive (obtained from drive_id) Add Kingston HyperX 3K (obtained from drive_id) Add OCZ Vertex 4 drive (obtained from drive_id) Add Samsung SM843T enterprise drive (obtained from drive_id) Add entries for additional sizes of Intel 320/330/335/520 series Add Cruical C400 (obtained from Illumos user's sd.conf) Add Toshiba SSD (obtained from Illumos user's sd.conf) Add Samsung's first SLC SSD (obtained from drive_id) Add OCZ Core Series (obtained from drive_id) Add Intel DC S3700 (obtained from drive_id) Notes: 1. The drive identifer obtained for the Samsung SM843T was MZ7WD480. The rest were extrapolated. The additional entries were checked with Google to verify that such drives exist in the wild. 2. The additional entries for Intel drives were extrapolated from existing entries. The additional entries were checked with Google to verify that such drives exist in the wild. 3. The "ATA C400-MTFDDAC512M" and "ATA TOSHIBA THNSNH51" entries are from the sd.conf of gcbirzan on freenode. Additional entries were extrapolated from them and checked with Google. 4. I obtained the Samsung MCCOE64G entry from an actual drive. The Samsung MCCOE32G entry was extrapolated from it and checked with Google. 5. I obtained the SSDSC2BA10 from a 100GB Intel DC S3700 drive and extrapolated the entries for the additional models. Signed-off-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1752	2013-10-09 09:16:23 -07:00
Matthew Ahrens	13fe019870	Illumos #3464 3464 zfs synctask code needs restructuring Reviewed by: Dan Kimmel <dan.kimmel@delphix.com> Reviewed by: Adam Leventhal <ahl@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Christopher Siden <christopher.siden@delphix.com> Approved by: Garrett D'Amore <garrett@damore.org> References: https://www.illumos.org/issues/3464 illumos/illumos-gate@3b2aab1880 Ported-by: Tim Chase <tim@chase2k.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1495	2013-09-04 16:01:24 -07:00
Matthew Ahrens	6f1ffb0665	Illumos #2882 , #2883 , #2900 2882 implement libzfs_core 2883 changing "canmount" property to "on" should not always remount dataset 2900 "zfs snapshot" should be able to create multiple, arbitrary snapshots at once Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Chris Siden <christopher.siden@delphix.com> Reviewed by: Garrett D'Amore <garrett@damore.org> Reviewed by: Bill Pijewski <wdp@joyent.com> Reviewed by: Dan Kruchinin <dan.kruchinin@gmail.com> Approved by: Eric Schrock <Eric.Schrock@delphix.com> References: https://www.illumos.org/issues/2882 https://www.illumos.org/issues/2883 https://www.illumos.org/issues/2900 illumos/illumos-gate@4445fffbbb Ported-by: Tim Chase <tim@chase2k.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1293 Porting notes: WARNING: This patch changes the user/kernel ABI. That means that the zfs/zpool utilities built from master are NOT compatible with the 0.6.2 kernel modules. Ensure you load the matching kernel modules from master after updating the utilities. Otherwise the zfs/zpool commands will be unable to interact with your pool and you will see errors similar to the following: $ zpool list failed to read pool configuration: bad address no pools available $ zfs list no datasets available Add zvol minor device creation to the new zfs_snapshot_nvl function. Remove the logging of the "release" operation in dsl_dataset_user_release_sync(). The logging caused a null dereference because ds->ds_dir is zeroed in dsl_dataset_destroy_sync() and the logging functions try to get the ds name via the dsl_dataset_name() function. I've got no idea why this particular code would have worked in Illumos. This code has subsequently been completely reworked in Illumos commit 3b2aab1 (3464 zfs synctask code needs restructuring). Squash some "may be used uninitialized" warning/erorrs. Fix some printf format warnings for %lld and %llu. Apply a few spa_writeable() changes that were made to Illumos in illumos/illumos-gate.git@cd1c8b8 as part of the 3112, 3113, 3114 and 3115 fixes. Add a missing call to fnvlist_free(nvl) in log_internal() that was added in Illumos to fix issue 3085 but couldn't be ported to ZoL at the time (zfsonlinux/zfs@9e11c73) because it depended on future work.	2013-09-04 15:49:00 -07:00
Richard Yao	bff32e0972	Implement database to workaround misreported physical sector sizes This implements vdev_bdev_database_check(). It alters the detected sector size of any device listed in a database of drives known to lie about their physical sector sizes. This is based on "6931570 Add flash devices' VID/PID to disk table to advertising 4K physical sector size" from Open Solaris and on sg_simple4.c from sg3_utils. About two dozen lines are taken from sg_simple4.c, which is GPLv2 licensed. However, sg_simple4.c is analogous to a Hello World program and is safe for us to use. We requested that Douglas Gilbert, the author of sg_simple4.c, confirm that this is the case. A cutdown version of his response is as follows: ``` I would consider a SCSI INQUIRY example using the Linux sg driver interface (also written by me) as the equivalent of an "hello world" program in C. ``` The database was created with the help of the freenode and ZFSOnLinux communities. Some notes: 1. The following drives both were confirmed to lie via reports in IRC and they contain capacity information in their identifiers: INTEL SSDSA2M080 INTEL SSDSA2M160 M4-CT256M4SSD2 WDC WD15EARS-00S WDC WD15EARS-00Z WDC WD20EARS-00M The identifiers for different capacity models were extrapolated and added under the assumption that those models also lie. Google was used to verify that the extrapolated drive identifiers existed prior to their inclusion. 2. The OCZ-VERTEX2 3.5 identifer applies to two drives that differ solely in page size (and slightly in capacity). One uses 4096-byte pages and the other uses 8192-byte pages. Both are set to use 8192-byte pages. We could detect the page size by checking the capacity, but that would unnecessarily complicate the code. 3. It is possible for updated drive firmware to correctly report the sector size. There were reports of a few advanced format drives doing that. One report stated that the vendor changed the identification string while another was unclear on this. Both reports involved WDC models. 4. Google was used to determine the size of pages in the listed flash devices. Reports of 8192-byte pages took precedence over reports of 4096-byte pages. 5. Devices behind USB adapters can have their identification strings altered. Identification strings obtained across USB adapters are omitted and no attempt is made to correct for alterations made by USB adapters when doing comparisons against the database. Two entries in the Open Solaris database that appear to have been altered by a USB adapter were omitted. Signed-off-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1652	2013-08-22 11:24:06 -07:00
Turbo Fredriksson	0bc7a7a754	Don't specifically open /etc/mtab - it is done in libzfs_init() a few lines further down and we can share the open file handle. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #1498	2013-08-15 10:19:38 -07:00
Turbo Fredriksson	f9e459d143	Use setmntent() OR fopen() For the same reasons it's used in libzfs_init(), this was just overlooked because zinject gets minimal use. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #1498	2013-08-15 10:09:09 -07:00
Yuri Pankov	105afebb15	Illumos #3098 zfs userspace/groupspace fail 3098 zfs userspace/groupspace fail without saying why when run as non-root Reviewed by: Eric Schrock <eric.schrock@delphix.com> Approved by: Richard Lowe <richlowe@richlowe.net> References: https://www.illumos.org/issues/3098 illumos/illumos-gate@70f56fa693 Ported-by: Tim Chase <tim@chase2k.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1596	2013-08-14 09:28:34 -07:00
Brian Behlendorf	cd72af9c68	Fix 'zpool list -H' error code Due to an uninitialized variable it was possible for the command 'zpool list -H' to return a non-zero error when there are no pools. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1605	2013-07-23 12:39:05 -07:00
Christer Ekholm	da91c90154	Add missing -v to usage help for zpool list. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2013-07-22 13:39:01 -07:00
Dmitry Khasanov	131cc95ca7	Add FreeBSD 'zpool labelclear' command The FreeBSD implementation of zfs adds the 'zpool labelclear' command. Since this functionality is helpful and straight forward to add it is being included in ZoL. References: freebsd/freebsd@119a041dc9 Ported-by: Dmitry Khasanov <pik4ez@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1126	2013-07-09 15:58:05 -07:00
Tim Chase	5021058756	zdb: enhancement - Display SA xattrs. If the znode has SA xattrs, display them following the other standard attributes. The format used is similar to that used when listing the contents of a ZAP. It is as follows: $ zdb -vvv <pool>/<dataset> <object> ... SA xattrs: <size> bytes, <number> entries <name1> = <value1> <name2> = <value2> ... Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1581	2013-07-09 13:52:28 -07:00
Mike Leddy	5d3dc3fb72	Avoid abort() in vn_rdwr(): libzpool/kernel.c Make sure that buffer is aligned to 512 bytes on linux so that pread call combined with O_DIRECT does not return EINVAL. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1570	2013-07-09 11:56:43 -07:00
Craig Loomis	50fe577d1f	Explicitly flush output at end of each zevent For "zpool events -f" flush stdout to ensure the last zevent is always printed immediately. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1568	2013-07-08 17:01:16 -07:00
Brian Behlendorf	c76955eaa5	Fix parse_dataset error handling A mount failure was accidentally introduced by commit `0c1171d` which reworked the parse_dataset() function to read pool names from devices. The error case where a label is read from the device but the pool name/value pair doesn't exist was not handled properly. In this case we should fall back to the previous behavior. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1560	2013-07-03 09:20:52 -07:00
George Wilson	294f68063b	Illumos #3498 panic in arc_read() 3498 panic in arc_read(): !refcount_is_zero(&pbuf->b_hdr->b_refcnt) Reviewed by: Adam Leventhal <ahl@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Approved by: Richard Lowe <richlowe@richlowe.net> References: illumos/illumos-gate@1b912ec710 https://www.illumos.org/issues/3498 Ported-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1249	2013-07-02 13:34:31 -07:00
Richard Yao	3db3ff4a78	Use MAXPATHLEN instead of sizeof in snprintf This silences a GCC 4.8.0 warning by fixing a programming error caught by static analysis: ../../cmd/ztest/ztest.c: In function ‘ztest_vdev_aux_add_remove’: ../../cmd/ztest/ztest.c:2584:33: error: argument to ‘sizeof’ in ‘snprintf’ call is the same expression as the destination; did you mean to provide an explicit length? [-Werror=sizeof-pointer-memaccess] (void) snprintf(path, sizeof (path), ztest_aux_template, ^ Signed-off-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1480	2013-07-02 10:39:24 -07:00
Cyril Plisko	64d7b6cf75	Override default SPA config location via environment When using zdb with non-default SPA config file it is not convenient to add -U <non-default-config-file-path> all the time. This commit introduces support for setting/overriding SPA config location via environment variable 'SPA_CONFIG_PATH'. If -U flag is specified in the command line it will override any other value as usual. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1545	2013-07-01 13:25:44 -07:00
Cyril Plisko	20c17b96c9	Add absent \n at the end of the help text line Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #1545	2013-06-28 11:26:41 -07:00
Brian Behlendorf	0c1171dcb5	Allow fetching the pool from the device at mount To simplify integration with the xfstests test suite the mount.zfs helper has been extended. When passed a block device (/dev/sdX) to mount, instead of a pool/dataset, the pool name will be read from any existing zfs label and used. This allows you to mount the root dataset of a zfs filesystem by specifing any of the member vdevs. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2013-06-26 15:20:09 -07:00
George Wilson	e51be06697	Illumos #3552 , #3564 3552 condensing one space map burns 3 seconds of CPU in spa_sync() thread 3564 spa_sync() spends 5-10% of its time in metaslab_sync() (when not condensing) Reviewed by: Adam Leventhal <ahl@delphix.com> Reviewed by: Dan Kimmel <dan.kimmel@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Approved by: Richard Lowe <richlowe@richlowe.net> References: illumos/illumos-gate@16a4a80742 https://www.illumos.org/issues/3552 https://www.illumos.org/issues/3564 Ported-by: Tim Chase <tim@chase2k.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1513	2013-06-19 16:22:39 -07:00
Madhav Suresh	c99c90015e	Illumos #3006 3006 VERIFY[S,U,P] and ASSERT[S,U,P] frequently check if first argument is zero Reviewed by Matt Ahrens <matthew.ahrens@delphix.com> Reviewed by George Wilson <george.wilson@delphix.com> Approved by Eric Schrock <eric.schrock@delphix.com> References: illumos/illumos-gate@fb09f5aad4 https://illumos.org/issues/3006 Requires: zfsonlinux/spl@1c6d149feb Ported-by: Tim Chase <tim@chase2k.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1509	2013-06-19 15:14:10 -07:00
Christ Schlacta	fb02fabf9b	Modified arcstat.py to run on linux * Modified kstat_update() to read arcstats from proc. * Fix shebang. * Added Makefile.am entries for arcstat.py Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1506	2013-06-18 15:43:15 -07:00
Christ Schlacta	7634cd54db	Added arcstat.py from FreeNAS Original source: http://support.freenas.org/browser/nanobsd/Files/usr/local/bin/arcstat.py Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #1506	2013-06-18 15:30:08 -07:00
Ned Bass	da29fe63f0	Don't leak mount flags into kernel When calling mount(), care must be taken to avoid passing in flags that are used only by the user space utilities. Otherwise we may stomp on flags that are reserved for other purposes in the kernel. In particular, openSUSE 12.3 kernels have added a new MS_RICHACL super-block flag whose value conflicts with our MS_COMMENT flag. This causes incorrect behavior such as the umask being ignored. The MS_COMMENT flag essentially serves as a placeholder in the option_map data structure of zfs_mount.c, but its value is never used. Therefore we can avoid the conflict by defining it to 0. The MS_USERS, MS_OWNER, and MS_GROUP flags also conflict with reserved flags in the kernel. While this is not known to have caused any problems, it is nevertheless incorrect. For the purposes of the mount.zfs helper, the "users", "owner", and "group" options just serve as hints to set additional implied options. Therefore we now define their associated mount flags in terms of the options that they imply rather than giving them unique values. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1457	2013-06-18 15:30:08 -07:00
George Wilson	5853fe790d	Illumos #3306 , #3321 3306 zdb should be able to issue reads in parallel 3321 'zpool reopen' command should be documented in the man page and help Reviewed by: Adam Leventhal <ahl@delphix.com> Reviewed by: Matt Ahrens <matthew.ahrens@delphix.com> Reviewed by: Christopher Siden <chris.siden@delphix.com> Approved by: Garrett D'Amore <garrett@damore.org> References: illumos/illumos-gate@31d7e8fa33 https://www.illumos.org/issues/3306 https://www.illumos.org/issues/3321 The vdev_file.c implementation in this patch diverges significantly from the upstream version. For consistenty with the vdev_disk.c code the upstream version leverages the Illumos bio interfaces. This makes sense for Illumos but not for ZoL for two reasons. 1) The vdev_disk.c code in ZoL has been rewritten to use the Linux block device interfaces which differ significantly from those in Illumos. Therefore, updating the vdev_file.c to use the Illumos interfaces doesn't get you consistency with vdev_disk.c. 2) Using the upstream patch as is would requiring implementing compatibility code for those Solaris block device interfaces in user and kernel space. That additional complexity could lead to confusion and doesn't buy us anything. For these reasons I've opted to simply move the existing vn_rdwr() as is in to the taskq function. This has the advantage of being low risk and easy to understand. Moving the vn_rdwr() function in to its own taskq thread also neatly avoids the possibility of a stack overflow. Finally, because of the additional work which is being handled by the free taskq the number of threads has been increased. The thread count under Illumos defaults to 100 but was decreased to 2 in commit 08d08e due to contention. We increase it to 8 until the contention can be address by porting Illumos #3581. Ported-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1354	2013-05-03 16:53:52 -07:00
Brian Behlendorf	937210a54b	Fix zinject list handlers The zfs_fd must be opened before calling print_all_handlers() or the ioctl() cannot be used to the zfs control device. This brings the zinject code back in sync with the Illumos implementation. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2013-05-01 17:05:58 -07:00
George.Wilson	cc92e9d0c3	3246 ZFS I/O deadman thread Reviewed by: Matt Ahrens <matthew.ahrens@delphix.com> Reviewed by: Eric Schrock <eric.schrock@delphix.com> Reviewed by: Christopher Siden <chris.siden@delphix.com> Approved by: Garrett D'Amore <garrett@damore.org> NOTES: This patch has been reworked from the original in the following ways to accomidate Linux ZFS implementation ) Usage of the cyclic interface was replaced by the delayed taskq interface. This avoids the need to implement new compatibility code and allows us to rely on the existing taskq implementation. ) An extern for zfs_txg_synctime_ms was added to sys/dsl_pool.h because declaring externs in source files as was done in the original patch is just plain wrong. ) Instead of panicing the system when the deadman triggers a zevent describing the blocked vdev and the first pending I/O is posted. If the panic behavior is desired Linux provides other generic methods to panic the system when threads are observed to hang. ) For reference, to delay zios by 30 seconds for testing you can use zinject as follows: 'zinject -d <vdev> -D30 <pool>' References: illumos/illumos-gate@283b84606b https://www.illumos.org/issues/3246 Ported-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1396	2013-05-01 17:05:52 -07:00
Chris Dunlop	254255f735	Trivial spelling fix Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1411	2013-04-19 15:43:16 -07:00
Ned Bass	92db59ca3b	Refresh links to web site A few files still refer to @behlendorf's private fork on github. Use the primary web site URL instead. Two typos are also corrected. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2013-03-06 15:46:41 -08:00
Richard Yao	b01615d5ac	Constify structures containing function pointers The PaX team modified the kernel's modpost to report writeable function pointers as section mismatches because they are potential exploit targets. We could ignore the warnings, but their presence can obscure actual issues. Proper const correctness can also catch programming mistakes. Building the kernel modules against a PaX/GrSecurity patched Linux 3.4.2 kernel reports 133 section mismatches prior to this patch. This patch eliminates 130 of them. The quantity of writeable function pointers eliminated by constifying each structure is as follows: vdev_opts_t 52 zil_replay_func_t 24 zio_compress_info_t 24 zio_checksum_info_t 9 space_map_ops_t 7 arc_byteswap_func_t 5 The remaining 3 writeable function pointers cannot be addressed by this patch. 2 of them are in zpl_fs_type. The kernel's sget function requires that this be non-const. The final writeable function pointer is created by SPL_SHRINKER_DECLARE. The kernel's set_shrinker() and remove_shrinker() functions also require that this be non-const. Signed-off-by: Richard Yao <ryao@cs.stonybrook.edu> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1300	2013-03-04 08:49:32 -08:00
Brian Behlendorf	8128bd89fb	Fix hot spares The issue with hot spares in ZoL is because it opens all leaf vdevs exclusively (O_EXCL). On Linux, exclusive opens cause subsequent exclusive opens to fail with EBUSY. This could be resolved by not opening any of the devices exclusively, which is what Illumos does, but the additional protection offered by exclusive opens is desirable. It cleanly prevents you from accidentally adding an in-use non-ZFS device to your pool. To fix this we very slightly relaxed the usage of O_EXCL in the following ways. 1) Functions which open the device but only read had the O_EXCL flag removed and were updated to use O_RDONLY. 2) A common holder was added to the vdev disk code. This allow the ZFS code to internally open the device multiple times but non-ZFS callers may not. 3) An exception was added to make_disks() for hot spare when creating partition tables. For hot spare devices which are already opened exclusively we skip creating the partition table because this must already have been done when the disk was originally added as a hot spare. Additional minor changes include fixing check_in_use() to use a partition instead of a slice suffix. And is_spare() was moved above make_disks() to avoid adding a forward reference. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #250	2013-03-01 13:31:02 -08:00
Tim Connors	c5b247f335	-x shouldn't warn about old on-disk format or unavailable features `zpool status -x` should only flag errors or where the pool is unavailable. If it imported fine but isn't using the latest features available in the code, that's not an error. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1319	2013-02-28 09:17:09 -08:00
Brian Behlendorf	f52b31eaf0	Honor 80 character limit in 'zpool status' This is a minor nit, but the second line of the 'action' message when you need to upgrade your pool to support feature flags exceeds the standard 80 character limit. Fix it by moving the word 'feature' on to the third line. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2013-01-31 11:06:57 -08:00
Brian Behlendorf	dbf763b39b	Retire zpool_id infrastructure In the interest of maintaining only one udev helper to give vdevs user friendly names, the zpool_id and zpool_layout infrastructure is being retired. They are superseded by vdev_id which incorporates all the previous functionality. Documentation for the new vdev_id(8) helper and its configuration file, vdev_id.conf(5), can be found in their respective man pages. Several useful example files are installed under /etc/zfs/. /etc/zfs/vdev_id.conf.alias.example /etc/zfs/vdev_id.conf.multipath.example /etc/zfs/vdev_id.conf.sas_direct.example /etc/zfs/vdev_id.conf.sas_switch.example Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #981	2013-01-29 12:23:17 -08:00
Ned Bass	ba43f4565a	vdev_id: improve keyword parsing flexibility The vdev_id udev helper strictly requires configuration file keywords to always be anchored at the beginning of the line and to be followed by a space character. However, users may prefer to use indentation or tab delimitation. Improve flexibility by simply requiring a keyword to be the first field on the line. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1239	2013-01-25 13:44:32 -08:00
Christopher Siden	e6f7d01502	Illumos #3397 , #3398 3397 zdb <pool> <objnum> output is too verbose 3398 zdb can't dump feature flags zap objects Reviewed by: Matt Ahrens <matthew.ahrens@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Dan Kimmel <dan.kimmel@delphix.com> Reviewed by: Eric Schrock <eric.schrock@delphix.com> Reviewed by: Richard Lowe <richlowe@richlowe.net> Approved by: Dan McDonald <danmcd@nexenta.com> References: illumos/illumos-gate@e690fb27a7 https://www.illumos.org/issues/3397 https://www.illumos.org/issues/3398 Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>	2013-01-11 16:42:50 -08:00
Yuri Pankov	5990da81a7	Illumos #1884 , #3028 , #3048 , #3049 , #3060 , #3061 , #3093 1884 Empty "used" field for zfs *space commands 3028 zfs {group,user}space -n prints (null) instead of numeric GID/UID 3048 zfs {user,group}space [-s\|-S] is broken 3049 zfs {user,group}space -t doesn't really filter the results 3060 zfs {user,group}space -H output isn't tab-delimited 3061 zfs {user,group}space -o doesn't use specified fields order 3093 zfs {user,group}space's -i is noop Reviewed by: Garry Mills <gary_mills@fastmail.fm> Reviewed by: Eric Schrock <eric.schrock@delphix.com> Approved by: Richard Lowe <richlowe@richlowe.net> References: illumos/illumos-gate@89f5d17b06 illumos changeset: 13803:b5e49d71ff0e https://www.illumos.org/issues/1884 https://www.illumos.org/issues/3028 https://www.illumos.org/issues/3048 https://www.illumos.org/issues/3049 https://www.illumos.org/issues/3060 https://www.illumos.org/issues/3061 https://www.illumos.org/issues/3093 Ported-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1194	2013-01-11 09:17:19 -08:00
Yuri Pankov	240245896a	Illumos #1377 `zpool status -D' should tell if there are no DDT entries 1337 `zpool status -D' should tell if there are no DDT entries Reviewed by: Eric Schrock <eric.schrock@delphix.com> Reviewed by: Igor Kozhukhov <ikozhukhov@gmail.com> Reviewed by: George Wilson <gwilson@zfsmail.com> Approved by: Albert Lee <trisk@nexenta.com> References: illumos/illumos-gate@ce72e614c1 illumos changeset: 13432:d1ad8d106d64 https://www.illumos.org/issues/1337 Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>	2013-01-11 09:17:13 -08:00
Brian Behlendorf	a1e147eef8	Add /sbin/fsck.zfs helper A fsck helper to accomidate distributions that expect to be able to execute a fsck on all filesystem types. Currently this script does nothing but it could be extended to act as a compatibility wrapper for 'zpool scrub'. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #964	2013-01-09 16:54:58 -08:00
Brian Behlendorf	87bdc45ccb	Report realpath() canonicalization error Rather than just reporting the failure include the passed mount point and error number. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1153	2013-01-09 16:54:58 -08:00
George Wilson	1eb5bfa3dc	Illumos #3145 , #3212 3145 single-copy arc 3212 ztest: race condition between vdev_online() and spa_vdev_remove() Reviewed by: Matt Ahrens <matthew.ahrens@delphix.com> Reviewed by: Adam Leventhal <ahl@delphix.com> Reviewed by: Eric Schrock <eric.schrock@delphix.com> Reviewed by: Justin T. Gibbs <gibbs@scsiguy.com> Approved by: Eric Schrock <eric.schrock@delphix.com> References: illumos-gate/commit/9253d63df408bb48584e0b1abfcc24ef2472382e illumos changeset: 13840:97fd5cdf328a https://www.illumos.org/issues/3145 https://www.illumos.org/issues/3212 Ported-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #989 Closes #1137	2013-01-08 10:35:44 -08:00
George Wilson	ea0b2538cd	Illumos #3349 : zpool upgrade -V bumps the on disk version number 3349 zpool upgrade -V bumps the on disk version number, but leaves the in core version Reviewed by: Adam Leventhal <ahl@delphix.com> Reviewed by: Christopher Siden <chris.siden@delphix.com> Reviewed by: Matt Ahrens <matthew.ahrens@delphix.com> Reviewed by: Richard Lowe <richlowe@richlowe.net> Approved by: Dan McDonald <danmcd@nexenta.com> References: illumos/illumos-gate@25345e4666 https://www.illumos.org/issues/3349 Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>	2013-01-08 10:35:43 -08:00
Matthew Ahrens	29809a6cba	Illumos #3086 : unnecessarily setting DS_FLAG_INCONSISTENT on async 3086 unnecessarily setting DS_FLAG_INCONSISTENT on async destroyed datasets Reviewed by: Christopher Siden <chris.siden@delphix.com> Approved by: Eric Schrock <Eric.Schrock@delphix.com> References: illumos/illumos-gate@ce636f8b38 illumos changeset: 13776:cd512c80fd75 https://www.illumos.org/issues/3086 Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>	2013-01-08 10:35:43 -08:00
Christopher Siden	b9b24bb4ca	Illumos #2762 : zpool command should have better support for feature flags 2762 zpool command should have better support for feature flags Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Approved by: Eric Schrock <Eric.Schrock@delphix.com> References: illumos/illumos-gate@57221772c3 https://www.illumos.org/issues/2762 Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>	2013-01-08 10:35:43 -08:00
George Wilson	3bc7e0fb0f	Illumos #3090 and #3102 3090 vdev_reopen() during reguid causes vdev to be treated as corrupt 3102 vdev_uberblock_load() and vdev_validate() may read the wrong label Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Christopher Siden <chris.siden@delphix.com> Reviewed by: Garrett D'Amore <garrett@damore.org> Approved by: Eric Schrock <Eric.Schrock@delphix.com> References: illumos/illumos-gate@dfbb943217 illumos changeset: 13777:b1e53580146d https://www.illumos.org/issues/3090 https://www.illumos.org/issues/3102 Ported-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #939	2013-01-08 10:35:42 -08:00
Brian Behlendorf	5ac0c30a94	Revert "Temporarily disable the reguid test." This reverts commit `d135245791`. Since feature flags have now been merged we can apply the real upstream fix from Illumos. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #997	2013-01-08 10:35:42 -08:00
Christopher Siden	9ae529ec5d	Illumos #2619 and #2747 2619 asynchronous destruction of ZFS file systems 2747 SPA versioning with zfs feature flags Reviewed by: Matt Ahrens <mahrens@delphix.com> Reviewed by: George Wilson <gwilson@delphix.com> Reviewed by: Richard Lowe <richlowe@richlowe.net> Reviewed by: Dan Kruchinin <dan.kruchinin@gmail.com> Approved by: Eric Schrock <Eric.Schrock@delphix.com> References: illumos/illumos-gate@53089ab7c8 illumos/illumos-gate@ad135b5d64 illumos changeset: 13700:2889e2596bd6 https://www.illumos.org/issues/2619 https://www.illumos.org/issues/2747 NOTE: The grub specific changes were not ported. This change must be made to the Linux grub packages. Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>	2013-01-08 10:35:35 -08:00
Will Rouesnel	462ee8e3f3	Allow fake mounts to succeed on non-legacy filesystems. mountall in Debian depends on being able to pass the -f parameter to mount, which specifies a fake mount and just updates the mtab. Currently mount.zfs will fail such a request if it is not passed with -o zfsutil. This patch allows a fake mount on a non-legacy filesystem to succeed in the same manner as a -o remount does, thus enabling mountall to work correctly. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1167	2013-01-07 11:30:27 -08:00
Ned Bass	2957f38d78	vdev_id support for device link aliases Add a vdev_id feature to map device names based on already defined udev device links. To increase the odds that vdev_id will run after the rules it depends on, increase the vdev.rules rule number from 60 to 69. With this change, vdev_id now provides functionality analogous to zpool_id and zpool_layout, paving the way to retire those tools. A defined alias takes precedence over a topology-derived name, but the two naming methods can otherwise coexist. For example, one might name drives in a JBOD with the sas_direct topology while naming an internal L2ARC device with an alias. For example, the following lines in vdev_id.conf will result in the creation of links /dev/disk/by-vdev/{d1,d2}, each pointing to the same target as the device link specified in the third field. # by-vdev # name fully qualified or base name of device link alias d1 /dev/disk/by-id/wwn-0x5000c5002de3b9ca alias d2 wwn-0x5000c5002def789e Also perform some minor vdev_id cleanup, such as removal of the unused -s command line option. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #981	2012-12-03 14:04:47 -08:00
Cyril Plisko	4588bf5701	Make zpool attach -o ashift=... actually work Commit `df83110856` missed update to getopt() call, while delivering all the rest. This commit adds "o" to getopt(). Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #566	2012-11-30 13:50:26 -08:00
Cyril Plisko	38b344d22a	vdev_id fails to handle complex device topologies While expanding positional parameters shell requires non-single digits to be enclosed in braces. When the SAS topology is non-trivial the number of positional parameters generated internally by vdev_id script (using set -- ...) easily crosses single digit limit and vdev_id fails to generate links. Signed-off-by: Ned Bass <bass6@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1119	2012-11-29 13:07:47 -08:00
Ned Bass	a6ef9522ea	Make vdev_id POSIX sh compatible Full bash may not be available in all environments where udev helpers run, such as in an initial ramdisk. To avoid breakage in this case, remove use of bash-specific features such as variable arrays and the `declare' keyword from the vdev_id script. Signed-off-by: Ned Bass <bass6@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #870	2012-11-27 14:23:22 -08:00
nordaux	33364b15d3	mount.zfs: canonicalize mount point for mtab Canonicalize the mount point passed to the mount.zfs helper. This way a clean path is always added to mtab which ensures the umount can properly locate and remove the entry. Test case: $ mkdir /mnt/foo $ mount -t zfs zpool/foo /mnt/../mnt/foo//// $ umount /mnt/foo $ cat /etc/mtab \| grep zpool/foo zpool/foo /mnt/../mnt/foo//// zfs rw 0 0 Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #573	2012-11-15 14:28:52 -08:00
Cyril Plisko	df83110856	Add "-o ashift" to zpool add and zpool attach When adding devices to an existing pool "ashift" property is auto-detected. However, if this property was overridden at the pool creation time (i.e. zpool create -o ashift=12 tank ...) this may not be what the user wants. This commit lets the user specify the value of "ashift" property to be used with newly added drives. For example, zpool add -o ashift=12 tank disk1 zpool attach -o ashift=12 tank disk1 disk2 Signed-off-by: Cyril Plisko <cyril.plisko@mountall.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #566	2012-11-15 11:06:14 -08:00
George Wilson	32a9872bba	Illumos #2671 : zpool import should not fail if vdev ashift has increased Reviewed by: Adam Leventhal <ahl@delphix.com> Reviewed by: Eric Schrock <eric.schrock@delphix.com> Reviewed by: Richard Elling <richard.elling@richardelling.com> Reviewed by: Gordon Ross <gwr@nexenta.com> Reviewed by: Garrett D'Amore <garrett@damore.org> Approved by: Richard Lowe <richlowe@richlowe.net> Refererces to Illumos issue: https://www.illumos.org/issues/2671 This patch has been slightly modified from the upstream Illumos version. In the upstream implementation a warning message is logged to the console. To prevent pointless console noise this notification is now posted as a "ereport.fs.zfs.vdev.bad_ashift" event. The event indicates a non-optimial (but entirely safe) ashift value was used to create the pool. Depending on your workload this may impact pool performance. Unfortunately, the only way to correct the issue is to recreate the pool with a new ashift. NOTE: The unrelated fix to the comment in zpool_main.c appears in the upstream commit and was preserved for consistnecy. Ported-by: Cyril Plisko <cyril.plisko@mountall.com> Reworked-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #955	2012-11-15 11:05:59 -08:00
Brian Behlendorf	e8fd45a0f9	Add ddt_object_count() error handling The interface for the ddt_zap_count() function assumes it can never fail. However, internally ddt_zap_count() is implemented with zap_count() which can potentially fail. Now because there was no way to return the error to the caller a VERIFY was used to ensure this case never happens. Unfortunately, it has been observed that pools can be damaged in such a way that zap_count() fails. The result is that the pool can not be imported without hitting the VERIFY and crashing the system. This patch reworks ddt_object_count() so the error can be safely caught and returned to the caller. This allows a pool which has be damaged in this way to be safely rewound for import. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #910	2012-10-29 08:57:45 -07:00
Brian Behlendorf	eac4720465	Allow 'zpool replace' to use short device names The 'zpool replace' command would fail when given a short name because unlike on other platforms the short name cannot be deterministically expanded to a single path. Multiple path prefixes must be checked and in addition the partition suffix for whole disks is determined by the prefix. To handle this complexity a zfs_strcmp_pathname() function was added which takes either a short or fully qualified device name. Short names will be expanded using the prefixes in the default import search path, or the ZPOOL_IMPORT_PATH environment variable if it's defined. All posible expansions are then compared against the comparison path. Care is taken to strip redundant slashes to ensure legitimate matches are not missed. In the context of this work the existing zfs_resolve_shortname() function was extended to consider the ZPOOL_IMPORT_PATH when set. The zfs_append_partition() interface was also simplified to take only a single buffer. The vast majority of these changes rework existing Linux specific code which was originally written to accomidate udev. However, there is some minimal cleanup which removes Illumos specific code. This was done to improve readability but the basic flow and intent of the upstream code was maintained. These changes are the logical conclusion of the previos work to adjust the 'zpool import' search behavior, see commit 44867b6a. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #544 Closes #976	2012-10-22 08:45:58 -07:00
Brian Behlendorf	26099167e6	Disable ztest deadman timer The ztest deadman timer has been causing false positives in the testing VMs. To make it easier to spot possible regressions I'm disabling this timer. The buildbot test infrastructure will still mark ztest instances which take to long to complete as failures. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #1018	2012-10-14 19:35:09 -07:00
Brian Behlendorf	ae380cfa76	Realpath arg 2 must be a minimum of PATH_MAX The realpath(3) function expects that when a buffer is passed for the 'resolved_path' that it be at least PATH_MAX in length. If it's not a buffer overflow may occur. Therefore the passed buffer size is changed from MAXNAMELEN to MAXPATHLEN. We also take this opertunity to dynamically allocate the buffer to keep it off the stack. warning: call to '__realpath_chk_warn' declared with attribute warning: second argument of realpath must be either NULL or at least PATH_MAX bytes long buffer [enabled by default] Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2012-10-04 13:19:10 -07:00
Brian Behlendorf	5be98cfe2f	Verify the return value for warn_unused_result functions Under Linux the following functions are flagged with the attribute warn_unused_result, this triggers a warning when ever they are used without checking the return value. To handle this case we check the result VERIFY(). It's better to detect this immediately on failure rather than segfault farther down in the function. ../../cmd/ztest/ztest.c:6033:2: warning: ignoring return value of 'asprintf', declared with attribute warn_unused_result [-Wunused-result] ../../cmd/ztest/ztest.c:739:3: warning: ignoring return value of 'realpath', declared with attribute warn_unused_result [-Wunused-result] Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2012-10-04 13:19:10 -07:00
Brian Behlendorf	facbbe4366	Replace tempnam() with mkstemp() The use of tempnam() is racy and it should be avoided in favor of mkstemp(). According to the Linux tempnam(3) man page. "Although tempnam() generates names that are difficult to guess, it is nevertheless possible that between the time that tempnam() returns a pathname, and the time that the program opens it, another program might create that pathname using open(2), or create it as a symbolic link. This can lead to security holes. To avoid such possibilities, use the open(2) O_EXCL flag to open the pathname. Or better yet, use mkstemp(3) or tmpfile(3)." This issue was flagged by gcc. ztest.o: In function `setup_data_fd': cmd/ztest/ztest.c:5822: warning: the use of `tempnam' is dangerous, better use `mkstemp' Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2012-10-04 13:19:10 -07:00
Brian Behlendorf	483106eb71	Minimize ztest stack frame size To ensure ztest behaves as similarly as possible to the kernel implementation of ZFS we attempt to honor the kernel stack limits. This includes keeping the individual stack frame sizes under 1K in size. We currently use gcc to detect and enforce this limit. Therefore to get this building cleanly with full debugging enabled the stack usage in the following functions has been reduced by moving the buffer to the heap. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2012-10-04 13:19:09 -07:00
Etienne Dechamps	9d81146b01	Use dynamic file descriptor numbers in ztest. Currently, ztest expects to get 3 and 4 as the file descriptors for data and random files, respectively. This is quite fragile and breaks easily if ztest is run with these file descriptors already opened (e.g. in a complex shell script). This patch fixes the issue by removing the assumptions on the file descriptor numbers that open() returns. For the random file (/dev/urandom), the new code doesn't rely on a shared file descriptor; instead, it reopens the file in the child. For the data file, the new code writes the file descriptor number into a "ZTEST_FD_DATA" environment variable so that it can be recovered after the execv() call. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2012-10-04 13:19:09 -07:00
Christopher Siden	22257dc0d5	Fix mmap() usage in ztest. illumos/illumos-gate@ad135b5d64 Illumos changeset: 13700:2889e2596bd6 Note that this is only a partial port of the aforementioned Illumos changeset. Reviewed by: Matt Ahrens <mahrens@delphix.com> Reviewed by: George Wilson <gwilson@delphix.com> Reviewed by: Richard Lowe <richlowe@richlowe.net> Reviewed by: Dan Kruchinin <dan.kruchinin@gmail.com> Approved by: Eric Schrock <Eric.Schrock@delphix.com> Ported to zfsonlinux by: Etienne Dechamps <etienne.dechamps@ovh.net>	2012-10-04 13:19:09 -07:00
Chris Siden	c242c188fd	Illumos #1950 : ztest backwards compatibility testing option. illumos/illumos-gate@420dfc9585 Illumos changeset: 13571:a5771a96228c 1950 ztest backwards compatibility testing option Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Adam Leventhal <ahl@delphix.com> Reviewed by: Matt Ahrens <matt@delphix.com> Reviewed by: Richard Lowe <richlowe@richlowe.net> Reviewed by: Robert Mustacchi <rm@joyent.com> Approved by: Eric Schrock <eric.schrock@delphix.com> Ported-by: Etienne Dechamps <etienne.dechamps@ovh.net> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2012-10-04 13:18:53 -07:00
Etienne Dechamps	d135245791	Temporarily disable the reguid test. Currently, ztest fails with the following error: error: Pool 'ztest' has encountered an uncorrectable I/O failure and the failure mode property for this pool is set to panic. We know how to fix it (see issue #939), but it may take some time before we get around to merging the fix, which has some heavy dependencies. In the mean time, it is not ideal to be unable to use ztest just because of a small isolated issue, so this patch works around the problem by disabling the reguid test. This is just a temporary hack to keep ztest usable. The reguid test will be enabled again when the proper fix is merged. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #997	2012-10-03 13:59:02 -07:00
Etienne Dechamps	6aec1cd5a6	Fix ztest vdev file paths. Currently, in several instances (but not all), ztest generates vdev file paths using a statement similar to this: snprintf(path, sizeof (path), ztest_dev_template, ...); This worked fine until `40b84e7aec`, which changed path to be a pointer to the heap instead of an array allocated on the stack. Before this change, sizeof(path) would return the size of the array; now, it returns the size of the pointer instead. As a result, the aforementioned sprintf statement uses the wrong size and truncates the vdev file path to the first 4 or 8 bytes (depending on the architecture). Typically, with default settings, the file path will become "/tmp/zt" instead of "/test/ztest.XXX". This issue only exists in ztest_vdev_attach_detach() and ztest_fault_inject(), which explains why ztest doesn't fail right away. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #989	2012-10-03 13:32:48 -07:00
Etienne Dechamps	0aebd4f9e3	Create threads in detached state in userspace. Currently, thread_create(), when called in userspace, creates a joinable (i.e. not detached thread). This is the pthread default. Unfortunately, this does not reproduce kthreads behavior (kthreads are always detached). In addition, this contradicts the original Solaris code which creates userspace threads in detached mode. These joinable threads are never joined, which leads to a leakage of pthread thread objects ("zombie threads"). This in turn results in excessive ressource consumption, and possible ressource exhaustion in extreme cases (e.g. long ztest runs). This patch fixes the issue by creating userspace threads in detached mode. The only exception is ztest worker threads which are meant to be joinable. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #989	2012-10-03 13:32:48 -07:00
Bill Pijewski	37abac6d55	Illumos #2703 : add mechanism to report ZFS send progress Reviewed by: Matt Ahrens <matt@delphix.com> Reviewed by: Robert Mustacchi <rm@joyent.com> Reviewed by: Richard Lowe <richlowe@richlowe.net> Approved by: Eric Schrock <Eric.Schrock@delphix.com> References: https://www.illumos.org/issues/2703 Ported by: Martin Matuska <martin@matuska.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2012-09-19 13:39:06 -07:00
Chris Siden	1bd201e70d	Illumos #1948 : zpool list should show more detailed pool info Reviewed by: Adam Leventhal <ahl@delphix.com> Reviewed by: Matt Ahrens <matt@delphix.com> Reviewed by: Eric Schrock <eric.schrock@delphix.com> Reviewed by: Richard Lowe <richlowe@richlowe.net> Reviewed by: Albert Lee <trisk@nexenta.com> Reviewed by: Dan McDonald <danmcd@nexenta.com> Reviewed by: Garrett D'Amore <garrett@damore.org> Approved by: Eric Schrock <eric.schrock@delphix.com> References: https://www.illumos.org/issues/1948 Ported by: Martin Matuska <martin@matuska.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #685	2012-09-19 13:39:05 -07:00
Richard Lowe	dd4769adc0	Illumos #2088 zdb could use a reasonable manual page Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com> Reviewed by: Garrett D'Amore <garrett@damore.org> Reviewed by: George Wilson <gwilson@zfsmail.com> Reviewed by: Steve Gonczi <gonczi@comcast.net> Reviewed by: Richard Elling <richard.elling@richardelling.com> Approved by: Garrett D'Amore <garrett@damore.org> References: https://www.illumos.org/issues/2088 Ported by: Cyril Plisko <cyril.plisko@mountall.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #682	2012-09-18 09:09:13 -07:00
Brian Behlendorf	44867b6d6e	Improve `zpool import` search behavior The goal of this change is to make 'zpool import' prefer to use the peristent /dev/mapper or /dev/disk/by-* paths. These are far preferable to the devices in /dev/ whos names are not persistent and are determined by the order in which a device is detected. This patch improves things by changing the default search path from just to the top level /dev/ directory to (in order): /dev/disk/by-vdev - Custom rules, use first if they exist /dev/disk/zpool - Custom rules, use first if they exist /dev/mapper - Use multipath devices before components /dev/disk/by-uuid - Single unique entry and persistent /dev/disk/by-id - May be multiple entries and persistent /dev/disk/by-path - Encodes physical location and persistent /dev/disk/by-label - Custom persistent labels /dev - UNSAFE device names will change The default search path can be overriden by setting the ZPOOL_IMPORT_PATH environment variable. This must be a colon delimited list of paths which are searched for vdevs. If the 'zpool import -d' option is specified only those listed paths will be searched. Finally, when multiple paths to the same device are found. If one of the paths is an exact match for the path used last time to import the pool it will be used. When there are no exact matches the prefered path will be determined by the provided search order. This means you can still import a pool and force specific names by providing the -d <path> option. And the prefered names will persist as long as those paths exist on your system. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #965	2012-09-17 13:49:07 -07:00
Cyril Plisko	8e8e7f35b7	Fix zdb printf format string for ZIL data blocks Without this fix the zdb printouts of ZIL data blocks look full of FF due to printf() handling its arguments as int by default. Here is the output before the fix TX_WRITE len 4136, txg 1093817, seq 149231 foid 4242, offset 0, length f68 G FFFFFF8EFFFFFF87FFFFFF91FFFFFFCC 1c FFFFFFAFFFFFFFC9FFFFFFBAZ FFFFFFC3 And the same after the fix TX_WRITE len 4136, txg 1093817, seq 149231 foid 4242, offset 0, length f68 G 8E8791CC 1cAFC9BAZ C3 Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #962	2012-09-13 09:02:12 -07:00
Cyril Plisko	af909a1089	Illumos #3064 : usr/src/cmd/zpool/zpool_main.c misspells "successful" Reviewed by: Andrew Stormont <Andrew.Stormont@nexenta.com> Reviewed by: Kartik Mistry <kartik.mistry@gmail.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> References: https://www.illumos.org/issues/3064 Signed-off-by: Cyril Plisko <cyril.plisko@mountall.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2012-09-11 10:19:50 -07:00
Etienne Dechamps	b815ff9a8f	Silence "setting dataset to sync always" message in ztest. ztest outputs a message when testing sync=always no matter what the verbosity level is. There is no point outputting this message for low verbosity levels. With this patch the message is only displayed at verbosity level 5 or above. The result is less output pollution. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #951	2012-09-10 10:55:44 -07:00
Brian Behlendorf	1ecc6d1265	Add zstreamdump .gitignore When zstreamdump was merged in commit `b79fc3f` we failed to add the needed .gitignore file. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2012-09-06 14:23:11 -07:00
Etienne Dechamps	ba7dbeb22e	Add libnvpair to mount_zfs dependencies Commit `e6f290535c` added libzpool to the mount_zfs dependencies. This brought in the nvpair symbols which are used by libzpool. To resolve this include the libnvpair library for mount_zfs even though mount_zfs doesn't directly require any of these symbols. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #926	2012-09-02 15:36:09 -07:00
Martin Matuska	b79fc3fea9	Add zstreamdump(8) command to examine ZFS send streams. Obtained from: illumos-gate revision 11935:538c866aaac6 Source: ssh://anonhg@hg.illumos.org/illumos-gate Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #905	2012-09-02 14:54:27 -07:00
Etienne Dechamps	e6f290535c	Fix mount_zfs dependency on libzpool. mount_zfs depends on libzpool for zfs_prop_written since `330d06f90d`. Unfortunately, the Makefile for mount_zfs has not been modified to reflect this. As a result, libtool doesn't know about the dependency, which may result in the wrong libzpool being used during the build (e.g. the libzpool from the system instead of the libzpool from the build directory). This patch adds the dependency to fix the issue. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Fixes #909.	2012-08-30 16:06:46 -07:00
Brian Behlendorf	ca8b5af89d	Remove autotools products Remove all of the generated autotools products from the repository and update the .gitignore files accordingly. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #718	2012-08-27 11:47:44 -07:00
Christopher Siden	e956d65106	Illumos #1796 , #2871 , #2903 , #2957 1796 "ZFS HOLD" should not be used when doing "ZFS SEND" from a read-only pool 2871 support for __ZFS_POOL_RESTRICT used by ZFS test suite 2903 zfs destroy -d does not work 2957 zfs destroy -R/r sometimes fails when removing defer-destroyed snapshot Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Approved by: Eric Schrock <Eric.Schrock@delphix.com> References: https://www.illumos.org/issues/1796 https://www.illumos.org/issues/2871 https://www.illumos.org/issues/2903 https://www.illumos.org/issues/2957 Ported by: Martin Matuska <martin@matuska.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2012-08-23 10:40:02 -07:00
Eric Schrock	db49968e5c	Illumos #2635 : 'zfs rename -f' to perform force unmount Reviewed by: Matt Ahrens <matt@delphix.com> Reviewed by: George Wilson <George.Wilson@delphix.com> Reviewed by: Bill Pijewski <wdp@joyent.com> Reviewed by: Richard Elling <richard.elling@richardelling.com> Approved by: Richard Lowe <richlowe@richlowe.net> References: https://www.illumos.org/issues/2635 Ported by: Martin Matuska <martin@matuska.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #717	2012-08-23 10:39:43 -07:00
Andrew Stormont	e346ec25af	Illumos #1936 : add support for "-t <datatype>" argument to zfs get Reviewed by: Kartik Mistry <kartik@nexenta.com> Reviewed by: Dan McDonald <danmcd@nexenta.com> Reviewed by: Richard Elling <richard.elling@gmail.com> Reviewed by: Garrett D'Amore <garrett@damore.org> Approved by: Richard Lowe <richlowe@richlowe.net> References: https://www.illumos.org/issues/1936 Ported by: Martin Matuska <martin@matuska.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #681	2012-08-23 10:35:59 -07:00
Alexander Eremin	684e8c0643	Illumos #1726 : Removal of pyzfs broke delegation for volumes Reviewed by: Andrew Stormont <andyjstormont@googlemail.com> Reviewed by: Garrett D'Amore <garrett@nexenta.com> Reviewed by: Richard Lowe <richlowe@richlowe.net> Reviewed by: Albert Lee <trisk@nexenta.com> Approved by: Garrett D'Amore <garrett@nexenta.com> References: https://www.illumos.org/issues/1726 Ported by: Martin Matuska <martin@matuska.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2012-08-23 10:35:37 -07:00
Alexander Eremin	79e722432f	Illumos #1977 : zfs allow arguments not parsed correctly after pyzfs removal Reviewed by: Garrett D'Amore <garrett.damore@gmail.com> Reviewed by: Albert Lee <trisk@nexenta.com> Approved by: Richard Lowe <richlowe@richlowe.net> References: https://www.illumos.org/issues/1977 Ported by: Martin Matuska <martin@matuska.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2012-08-23 10:21:04 -07:00
Dan McDonald	d96eb2b153	Illumos #1693 : persistent 'comment' field for a zpool Reviewed by: George Wilson <gwilson@zfsmail.com> Reviewed by: Eric Schrock <eric.schrock@delphix.com> Approved by: Richard Lowe <richlowe@richlowe.net> References: https://www.illumos.org/issues/1693 Ported by: Martin Matuska <martin@matuska.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #678	2012-08-08 11:49:37 -07:00
Etienne Dechamps	ee5fd0bb80	Set zvol discard_granularity to the volblocksize. Currently, zvols have a discard granularity set to 0, which suggests to the upper layer that discard requests of arbirarily small size and alignment can be made efficiently. In practice however, ZFS does not handle unaligned discard requests efficiently: indeed, it is unable to free a part of a block. It will write zeros to the specified range instead, which is both useless and inefficient (see dnode_free_range). With this patch, zvol block devices expose volblocksize as their discard granularity, so the upper layer is aware that it's not supposed to send discard requests smaller than volblocksize. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #862	2012-08-07 14:55:31 -07:00
Matthew Ahrens	330d06f90d	Illumos #1644 , #1645 , #1646 , #1647 , #1708 1644 add ZFS "clones" property 1645 add ZFS "written" and "written@..." properties 1646 "zfs send" should estimate size of stream 1647 "zfs destroy" should determine space reclaimed by destroying multiple snapshots 1708 adjust size of zpool history data References: https://www.illumos.org/issues/1644 https://www.illumos.org/issues/1645 https://www.illumos.org/issues/1646 https://www.illumos.org/issues/1647 https://www.illumos.org/issues/1708 This commit modifies the user to kernel space ioctl ABI. Extra care should be taken when updating to ensure both the kernel modules and utilities are updated. This change has reordered all of the new ioctl()s to the end of the list. This should help minimize this issue in the future. Reviewed by: Richard Lowe <richlowe@richlowe.net> Reviewed by: George Wilson <gwilson@zfsmail.com> Reviewed by: Albert Lee <trisk@opensolaris.org> Approved by: Garrett D'Amore <garret@nexenta.com> Ported by: Martin Matuska <martin@matuska.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #826 Closes #664	2012-07-31 09:25:30 -07:00
Brian Behlendorf	c171ea71bb	Allow '-o remount' for non-legacy datasets This is done for compatibility with existing Linux infrastructure. In particular, when using zfs as a root filesystem there are init scripts which as part of shutdown remount root read-only. Also, the new systemd infrastructure being used by Fedora expects to be able to remount a file system read-write. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #847	2012-07-30 15:58:02 -07:00
Richard Yao	739a1a82e0	Linux 3.5 compat, end_writeback() changed to clear_inode() The end_writeback() function was changed by moving the call to inode_sync_wait() earlier in to evict(). This effecitvely changes the ordering of the sync but it does not impact the details of the zfs implementation. However, as part of this change end_writeback() was renamed to clear_inode() to reflect the new semantics. This change does impact us and clear_inode() now maps to end_writeback() for kernels prior to 3.5. Signed-off-by: Richard Yao <ryao@cs.stonybrook.edu> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #784	2012-07-23 12:29:36 -07:00
Richard Yao	ea1fdf46e2	Linux 3.5 compat, iops->truncate_range() removed The vmtruncate_range() support has been removed from the kernel in favor of using the fallocate method in the file_operations table. Signed-off-by: Richard Yao <ryao@cs.stonybrook.edu> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #784	2012-07-23 12:29:32 -07:00
Richard Yao	756c3e5a9c	Linux 3.5 compat, eops->encode_fh() takes inodes The export_operations member ->encode_fh() has been updated to take both the child and parent inodes. This interface used to take the child dentry and a bool describing if the parent is needed. NOTE: While updating this code I noticed that we do not currently cleanly handle the case where we're passed a connectable parent. This code should be audited to make sure we're doing the right thing. Signed-off-by: Richard Yao <ryao@cs.stonybrook.edu> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #784	2012-07-23 12:29:23 -07:00
Etienne Dechamps	b5a28807cd	Move partition scanning from userspace to module. Currently, zpool online -e (dynamic vdev expansion) doesn't work on whole disks because we're invoking ioctl(BLKRRPART) from userspace while ZFS still has a partition open on the disk, which results in EBUSY. This patch moves the BLKRRPART invocation from the zpool utility to the module. Specifically, this is done just before opening the device in vdev_disk_open() which is called inside vdev_reopen(). This requires jumping through some hoops to get to the disk device from the partition device, and to make sure we can still open the partition after the BLKRRPART call. Note that this new code path is triggered on dynamic vdev expansion only; other actions, like creating a new pool, are unchanged and still call BLKRRPART from userspace. This change also depends on API changes which are available in 2.6.37 and latter kernels. The build system has been updated to detect this, but there is no compatibility mode for older kernels. This means that online expansion will NOT be available in older kernels. However, it will still be possible to expand the vdev offline. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #808	2012-07-17 09:17:31 -07:00
Garrett D'Amore	3541dc6d02	Illumos #1748 : desire support for reguid in zfs Reviewed by: George Wilson <gwilson@zfsmail.com> Reviewed by: Igor Kozhukhov <ikozhukhov@gmail.com> Reviewed by: Alexander Eremin <alexander.eremin@nexenta.com> Reviewed by: Alexander Stetsenko <ams@nexenta.com> Approved by: Richard Lowe <richlowe@richlowe.net> References: https://www.illumos.org/issues/1748 This commit modifies the user to kernel space ioctl ABI. Extra care should be taken when updating to ensure both the kernel modules and utilities are updated. If only the user space component is updated both the 'zpool events' command and the 'zpool reguid' command will not work until the kernel modules are updated. Ported by: Martin Matuska <martin@matuska.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #665	2012-07-11 13:08:56 -07:00
Pawel Jakub Dawidek	0cee24064a	Speed up 'zfs list -t snapshot -o name -s name' FreeBSD #xxx: Dramatically optimize listing snapshots when user requests only snapshot names and wants to sort them by name, ie. when executes: # zfs list -t snapshot -o name -s name Because only name is needed we don't have to read all snapshot properties. Below you can find how long does it take to list 34509 snapshots from a single disk pool before and after this change with cold and warm cache: before: # time zfs list -t snapshot -o name -s name > /dev/null cold cache: 525s warm cache: 218s after: # time zfs list -t snapshot -o name -s name > /dev/null cold cache: 1.7s warm cache: 1.1s NOTE: This patch only appears in FreeBSD. If/when Illumos picks up the change we may want to drop this patch and adopt their version. However, for now this addresses a real issue. Ported-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #450	2012-06-14 09:49:04 -07:00
Brian Behlendorf	fe2fc8f6d3	Workaround for failing zvol_id This is not a proper fix. It is just a workaround for the stack smashing detected by gcc in zvol_id. We simply disable the gcc stack protector for now when building the zvol_id udev helper. Once the root cause is resolved this patch should be reverted. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issues #569	2012-06-13 11:25:17 -07:00
Daniel Verite	92e91da208	Include <locale.h> to avoid error: 'LC_ALL' undeclared. When compiling ZFS with CFLAGS=-O0 it will trigger the following error. Resolve the issue by properly including locale.h. ../../cmd/mount_zfs/mount_zfs.c: In function 'main': ../../cmd/mount_zfs/mount_zfs.c:318:2: warning: implicit declaration of function 'setlocale' [-Wimplicit-function-declaration] ../../cmd/mount_zfs/mount_zfs.c:318:19: error: 'LC_ALL' undeclared (first use in this function) Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #724	2012-06-11 10:21:40 -07:00
Richard Yao	6a0936babc	Linux 3.4 compat, d_make_root() replaces d_alloc_root() torvalds/linux@adc0e91ab1 introduced introduced d_make_root() as a replacement for d_alloc_root(). Further commits appear to have removed d_alloc_root() from the Linux source tree. This causes the following failure: error: implicit declaration of function 'd_alloc_root' [-Werror=implicit-function-declaration] To correct this we update the code to use the current d_make_root() interface for readability. Then we introduce an autotools check to determine if d_make_root() is available. If it isn't then we define some compatibility logic which used the older d_alloc_root() interface. Signed-off-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #776	2012-06-11 10:04:49 -07:00
Ned A. Bass	821b683436	Add vdev_id for JBOD-friendly udev aliases vdev_id parses the file /etc/zfs/vdev_id.conf to map a physical path in a storage topology to a channel name. The channel name is combined with a disk enclosure slot number to create an alias that reflects the physical location of the drive. This is particularly helpful when it comes to tasks like replacing failed drives. Slot numbers may also be re-mapped in case the default numbering is unsatisfactory. The drive aliases will be created as symbolic links in /dev/disk/by-vdev. The only currently supported topologies are sas_direct and sas_switch: o sas_direct - a channel is uniquely identified by a PCI slot and a HBA port o sas_switch - a channel is uniquely identified by a SAS switch port A multipath mode is supported in which dm-mpath devices are handled by examining the first running component disk, as reported by 'multipath -l'. In multipath mode the configuration file should contain a channel definition with the same name for each path to a given enclosure. vdev_id can replace the existing zpool_id script on systems where the storage topology conforms to sas_direct or sas_switch. The script could be extended to support other topologies as well. The advantage of vdev_id is that it is driven by a single static input file that can be shared across multiple nodes having a common storage toplogy. zpool_id, on the other hand, requires a unique /etc/zfs/zdev.conf per node and a separate slot-mapping file. However, zpool_id provides the flexibility of using any device names that show up in /dev/disk/by-path, so it may still be needed on some systems. vdev_id's functionality subsumes that of the sas_switch_id script, and it is unlikely that anyone is using it, so sas_switch_id is removed. Finally, /dev/disk/by-vdev is added to the list of directories that 'zpool import' will scan. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #713	2012-06-01 08:55:14 -07:00
Brian Behlendorf	b39d3b9f7b	Linux 3.3 compat, iops->create()/mkdir()/mknod() The mode argument of iops->create()/mkdir()/mknod() was changed from an 'int' to a 'umode_t'. To prevent a compiler warning an autoconf check was added to detect the API change and then correctly set a zpl_umode_t typedef. There is no functional change. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #701	2012-04-30 12:52:38 -07:00
Richard Lowe	ad60af8e1b	Illumos #2067 : uninitialized variables in zfs(1M) may make snapshots undestroyable Reviewed by: Joshua M. Clulow <josh@sysmgr.org> Reviewed by: Milan Jurik <milan.jurik@xylab.cz> Reviewed by: Igor Kozhukhov <ikozhukhov@gmail.com> Reviewed by: Garrett D'Amore <garrett@damore.org> Reviewed by: Matt Ahrens <mahrens@delphix.com> Reviewed by: Steve Gonczi <gonczi@comcast.net> Approved by: Garrett D'Amore <garrett@damore.org> References: https://www.illumos.org/issues/2067 Ported by: Richard Yao <ryao@cs.stonybrook.edu> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2012-04-27 15:17:23 -07:00
Frederik Wessels	95bcd51ecc	Illumos #1946 : incorrect formatting when listing output of multiple pools with zpool iostat -v Reviewed by: Richard Elling <richard.elling@richardelling.com> Reviewed by: Joshua M. Clulow <josh@sysmgr.org> Approved by: Richard Lowe <richlowe@richlowe.net> Reference to Illumos issue: https://www.illumos.org/issues/1946 Ported by: Richard Yao <ryao@cs.stonybrook.edu> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2012-04-27 15:15:37 -07:00
Mike Harsch	187632dcef	Illumos #952 : separate intent logs should be obvious in 'zpool iostat' output Reviewed by: Adam Leventhal <ahl@delphix.com> Reviewed by: Matt Ahrens <mahrens@delphix.com> Reviewed by: Eric Schrock <eric.schrock@delphix.com> Reviewed by: Dan McDonald <danmcd@nexenta.com> Reviewed by: Garrett D'Amore <garrett@nexenta.com> Approved by: Eric Schrock <eric.schrock@delphix.com> Refererce to Illumos issue: https://www.illumos.org/issues/952 Ported-by: Richard Yao <ryao@cs.stonybrook.edu> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #607	2012-04-27 15:12:43 -07:00
P.SCH	cf81b00a73	ZFS list snapshot property alias Add support for the `zfs list -t snap` alias which is available under Oracle Solaris 11. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #640	2012-04-11 12:02:46 -07:00
P.SCH	10b75496bb	ZFS snapshot alias For consistency, and because it's handy, add the 'zfs snap' alias which was introduced by Oracle Solaris 11. This includes an update to the man page to reflect all the available alias (snap, umount, and recv). Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #640	2012-04-11 12:02:31 -07:00
Richard Yao	847de12271	Print human readable error message for ENOENT A cryptic error code is printed when mounting a legacy dataset to a non-existent mountpoint. This patch changes this behavior to print "mount point '%s' does not exist", which is similar to the error message printed when mounting procfs. The single quotes were added to be consistent with the existing EBUSY error message, which is the only difference between this error message and the one that is printed when the same condition occurs when mounting procfs. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #633	2012-04-03 10:24:34 -07:00
Craig Sanders	9fc60702c6	Remove hard-coded 80 column output When stdout is detected to be a tty use the number of columns specified by the terminal. If that fails fall back to a default 80 column width. In the non-tty case allow for 999 column lines. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2012-03-27 15:01:08 -07:00
Brian Behlendorf	f47e1351db	Fix executable permissions Caught by lint, this permission change was accidentally introduced by commit `42cb3819f1`. Restore the correct permissions and while I'm at it add a missing whack-bang to config/ltmain.sh. lint: executable-not-elf-or-script: zpool_main.c zfs_main.c Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #620	2012-03-26 11:52:44 -07:00
Brian Behlendorf	1c5de20ae2	Add --enable-debug-dmu-tx configure option Allow rigorous (and expensive) tx validation to be enabled/disabled indepentantly from the standard zfs debugging. When enabled these checks ensure that all txs are constructed properly and that a dbuf is never dirtied without taking the correct tx hold. This checking is particularly helpful when adding new dmu consumers like Lustre. However, for established consumers such as the zpl with no known outstanding tx construction problems this is just overhead. --enable-debug-dmu-tx - Enable/disable validation of each tx as --disable-debug-dmu-tx it is constructed. By default validation is disabled due to performance concerns. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2012-03-23 12:25:17 -07:00
Brian Behlendorf	ebe7e575ea	Add .zfs control directory Add support for the .zfs control directory. This was accomplished by leveraging as much of the existing ZFS infrastructure as posible and updating it for Linux as required. The bulk of the core functionality is now all there with the following limitations. ) The .zfs/snapshot directory automount support requires a 2.6.37 or newer kernel. The exception is RHEL6.2 which has backported the d_automount patches. ) Creating/destroying/renaming snapshots with mkdir/rmdir/mv in the .zfs/snapshot directory works as expected. However, this functionality is only available to root until zfs delegations are finished. * mkdir - create a snapshot * rmdir - destroy a snapshot * mv - rename a snapshot The following issues are known defeciences, but we expect them to be addressed by future commits. ) Add automount support for kernels older the 2.6.37. This should be possible using follow_link() which is what Linux did before. ) Accessing the .zfs/snapshot directory via NFS is not yet possible. The majority of the ground work for this is complete. However, finishing this work will require resolving some lingering integration issues with the Linux NFS kernel server. *) The .zfs/shares directory exists but no futher smb functionality has yet been implemented. Contributions-by: Rohan Puri <rohan.puri15@gmail.com> Contributiobs-by: Andrew Barnes <barnes333@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #173	2012-03-22 13:03:47 -07:00
Gregor Kopka	42cb3819f1	Use stderr for 'no pools/datasets available' error The 'zfs list' and 'zpool list' commands output the message 'no datasets/pools available' to stdout. This should go to stderr and only the available datasets/pools should go to stdout. Returning nothing to stdout is expected behavior when there is nothing to list. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #581	2012-03-15 10:24:00 -07:00
Brian Behlendorf	4b787d75c8	Cleanly support debug packages Allow a source rpm to be rebuilt with debugging enabled. This avoids the need to have to manually modify the spec file. By default debugging is still largely disabled. To enable specific debugging features use the following options with rpmbuild. '--with debug' - Enables ASSERTs # For example: $ rpmbuild --rebuild --with debug zfs-modules-0.6.0-rc6.src.rpm Additionally, ZFS_CONFIG has been added to zfs_config.h for packages which build against these headers. This is critical to ensure both zfs and the dependant package are using the same prototype and structure definitions. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2012-02-27 14:08:17 -08:00
Ned Bass	3a4f6caf08	Return success from check_slice() if device doesn't exist When creating a new pool, make_root_vdev() calls check_in_use() to ensure that none of the consituent disks are in use. If the disk contains a valid vdev label it is read to retrieve the list of its child vdevs and these are checked recursively. However, the partitions stored in the vdev label my no longer exist, for example if the partition table has since been altered. In any such case we would want the pool creation to proceed, so this change removes the check from check_slice() that returns an error if the device doesn't exist. As an added assurance, the Solaris implementation also returns sucess on ENOENT. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2012-02-27 08:52:38 -08:00
Etienne Dechamps	30930fba21	Add support for DISCARD to ZVOLs. DISCARD (REQ_DISCARD, BLKDISCARD) is useful for thin provisioning. It allows ZVOL clients to discard (unmap, trim) block ranges from a ZVOL, thus optimizing disk space usage by allowing a ZVOL to shrink instead of just grow. We can't use zfs_space() or zfs_freesp() here, since these functions only work on regular files, not volumes. Fortunately we can use the low-level function dmu_free_long_range() which does exactly what we want. Currently the discard operation is not added to the log. That's not a big deal since losing discard requests cannot result in data corruption. It would however result in disk space usage higher than it should be. Thus adding log support to zvol_discard() is probably a good idea for a future improvement. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2012-02-09 16:19:38 -08:00
Etienne Dechamps	cb2d19010d	Support the fallocate() file operation. Currently only the (FALLOC_FL_PUNCH_HOLE) flag combination is supported, since it's the only one that matches the behavior of zfs_space(). This makes it pretty much useless in its current form, but it's a start. To support other flag combinations we would need to modify zfs_space() to make it more flexible, or emulate the desired functionality in zpl_fallocate(). Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #334	2012-02-09 16:19:32 -08:00
Etienne Dechamps	34037afe24	Improve ZVOL queue behavior. The Linux block device queue subsystem exposes a number of configurable settings described in Linux block/blk-settings.c. The defaults for these settings are tuned for hard drives, and are not optimized for ZVOLs. Proper configuration of these options would allow upper layers (I/O scheduler) to take better decisions about write merging and ordering. Detailed rationale: - max_hw_sectors is set to unlimited (UINT_MAX). zvol_write() is able to handle writes of any size, so there's no reason to impose a limit. Let the upper layer decide. - max_segments and max_segment_size are set to unlimited. zvol_write() will copy the requests' contents into a dbuf anyway, so the number and size of the segments are irrelevant. Let the upper layer decide. - physical_block_size and io_opt are set to the ZVOL's block size. This has the potential to somewhat alleviate issue #361 for ZVOLs, by warning the upper layers that writes smaller than the volume's block size will be slow. - The NONROT flag is set to indicate this isn't a rotational device. Although the backing zpool might be composed of rotational devices, the resulting ZVOL often doesn't exhibit the same behavior due to the COW mechanisms used by ZFS. Setting this flag will prevent upper layers from making useless decisions (such as reordering writes) based on incorrect assumptions about the behavior of the ZVOL. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2012-02-07 16:23:06 -08:00
Etienne Dechamps	b18019d2d8	Fix synchronicity for ZVOLs. zvol_write() assumes that the write request must be written to stable storage if rq_is_sync() is true. Unfortunately, this assumption is incorrect. Indeed, "sync" does not mean what we think it means in the context of the Linux block layer. This is well explained in linux/fs.h: WRITE: A normal async write. Device will be plugged. WRITE_SYNC: Synchronous write. Identical to WRITE, but passes down the hint that someone will be waiting on this IO shortly. WRITE_FLUSH: Like WRITE_SYNC but with preceding cache flush. WRITE_FUA: Like WRITE_SYNC but data is guaranteed to be on non-volatile media on completion. In other words, SYNC does not mean that the write must be on stable storage on completion. It just means that someone is waiting on us to complete the write request. Thus triggering a ZIL commit for each SYNC write request on a ZVOL is unnecessary and harmful for performance. To make matters worse, ZVOL users have no way to express that they actually want data to be written to stable storage, which means the ZIL is broken for ZVOLs. The request for stable storage is expressed by the FUA flag, so we must commit the ZIL after the write if the FUA flag is set. In addition, we must commit the ZIL before the write if the FLUSH flag is set. Also, we must inform the block layer that we actually support FLUSH and FUA. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2012-02-07 16:23:06 -08:00
Brian Behlendorf	47621f3d76	Linux 3.3 compat, sops->show_options() The second argument of sops->show_options() was changed from a 'struct vfsmount ' to a 'struct dentry '. Add an autoconf check to detect the API change and then conditionally define the expected interface. In either case we are only interested in the zfs_sb_t. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #549	2012-02-03 10:02:01 -08:00
Darik Horn	750562833f	Combine libraries: spl, avl, efi, share, unicode. These libraries, which are an artifact of the ZoL development process, conflict with packages that are already in distribution: * libspl: SPL Programming Language * libavl: AVL for Linux * libefi: GRUB And these libraries are potential conflicts: * libshare: the Linux Mount Manager * libunicode: Perl and Python Recompose these five ZoL components into the four libraries that are conventionally provided by Solaris and FreeBSD systems: + libnvpair + libuutil + libzpool + libzfs This change resolves the name conflict, makes ZoL more compatible with existing software that uses autotools to detect ZFS, and allows pkg-zfs to better reflect the official Debian kFreeBSD packaging. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes: #430	2012-01-17 15:19:50 -08:00
Suman Chakravartula	e18be9a637	Add overlay(-O) mount option support Linux supports mounting over non-empty directories by default. In Solaris this is not the case and -O option is required for zfs mount to mount a zfs filesystem over a non-empty directory. For compatibility, I've added support for -O option to mount zfs filesystems over non-empty directories if the user wants to, just like in Solaris. I've defined MS_OVERLAY to record it in the flags variable if the -O option is supplied. The flags variable passes through a few functions and its checked before performing the empty directory check in zfs_mount function. If -O is given, the check is not performed. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #473	2012-01-12 15:49:38 -08:00
Darik Horn	b97f368d04	Avoid using awk in the zpool_id script. Some implementations of `awk` incorrectly parse the \< and \> regex symbols, so use a `while read` loop and regular globbing instead. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes: #259	2012-01-11 11:56:56 -08:00
Brian Behlendorf	ab26409db7	Linux 3.1 compat, super_block->s_shrink The Linux 3.1 kernel has introduced the concept of per-filesystem shrinkers which are directly assoicated with a super block. Prior to this change there was one shared global shrinker. The zfs code relied on being able to call the global shrinker when the arc_meta_limit was exceeded. This would cause the VFS to drop references on a fraction of the dentries in the dcache. The ARC could then safely reclaim the memory used by these entries and honor the arc_meta_limit. Unfortunately, when per-filesystem shrinkers were added the old interfaces were made unavailable. This change adds support to use the new per-filesystem shrinker interface so we can continue to honor the arc_meta_limit. The major benefit of the new interface is that we can now target only the zfs filesystem for dentry and inode pruning. Thus we can minimize any impact on the caching of other filesystems. In the context of making this change several other important issues related to managing the ARC were addressed, they include: * The dnlc_reduce_cache() function which was called by the ARC to drop dentries for the Posix layer was replaced with a generic zfs_prune_t callback. The ZPL layer now registers a callback to drop these dentries removing a layering violation which dates back to the Solaris code. This callback can also be used by other ARC consumers such as Lustre. arc_add_prune_callback() arc_remove_prune_callback() * The arc_reduce_dnlc_percent module option has been changed to arc_meta_prune for clarity. The dnlc functions are specific to Solaris's VFS and have already been largely eliminated already. The replacement tunable now represents the number of bytes the prune callback will request when invoked. * Less aggressively invoke the prune callback. We used to call this whenever we exceeded the arc_meta_limit however that's not strictly correct since it results in over zeleous reclaim of dentries and inodes. It is now only called once the arc_meta_limit is exceeded and every effort has been made to evict other data from the ARC cache. * More promptly manage exceeding the arc_meta_limit. When reading meta data in to the cache if a buffer was unable to be recycled notify the arc_reclaim thread to invoke the required prune. * Added arcstat_prune kstat which is incremented when the ARC is forced to request that a consumer prune its cache. Remember this will only occur when the ARC has no other choice. If it can evict buffers safely without invoking the prune callback it will. * This change is also expected to resolve the unexpect collapses of the ARC cache. This would occur because when exceeded just the arc_meta_limit reclaim presure would be excerted on the arc_c value via arc_shrink(). This effectively shrunk the entire cache when really we just needed to reclaim meta data. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #466 Closes #292	2012-01-11 11:46:02 -08:00
Darik Horn	afd7da0ce7	Add LIBSELINUX to mount_zfs_LDFLAGS. Regenerating the autotools configuration on Debian and Ubuntu systems causes compilation to fail with this error message: cmd/mount_zfs/../../cmd/mount_zfs/mount_zfs.c:403: undefined reference to `is_selinux_enabled' In the automake template, set "mount_zfs_LDFLAGS = ... $(LIBSELINUX)" so that the /sbin/mount.zfs utility is linked to libselinux. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2011-12-16 20:04:42 -08:00
Darik Horn	28eb9213d8	Linux 3.2 compat: set_nlink() Directly changing inode->i_nlink is deprecated in Linux 3.2 by commit SHA: bfe8684869601dacfcb2cd69ef8cfd9045f62170 Use the new set_nlink() kernel function instead. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes: #462	2011-12-16 20:02:52 -08:00
Prakash Surya	6ba3b44614	Add make rule for building Arch Linux packages Added the necessary build infrastructure for building packages compatible with the Arch Linux distribution. As such, one can now run: $ ./configure $ make pkg # Alternatively, one can run 'make arch' as well on the Arch Linux machine to create two binary packages compatible with the pacman package manager, one for the zfs userland utilities and another for the zfs kernel modules. The new packages can then be installed by running: # pacman -U $package.pkg.tar.xz In addition, source-only packages suitable for an Arch Linux chroot environment or remote builder can also be build using the 'sarch' make rule. NOTE: Since the source dist tarball is created on the fly from the head of the build tree, it's MD5 hash signature will be continually influx. As a result, the md5sum variable was intentionally omitted from the PKGBUILD files, and the '--skipinteg' makepkg option is used. This may or may not have any serious security implications, as the source tarball is not being downloaded from an outside source. Signed-off-by: Prakash Surya <surya1@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #491	2011-12-14 19:14:23 -08:00
Darik Horn	db7c1771da	Demote the whackbang in the zpool_id script. The zpool_id script is posixly correct and does not use bash features, so change its whackbang from /bin/bash to /bin/sh. Debian policy also stipulates that system scripts be dash compatible. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2011-12-05 09:48:18 -08:00
Darik Horn	87193e2b61	Demote egrep to grep in the zpool_id script. Direct invocation of GNU egrep is deprecated by its man page, and the its argument in the zpool_id script is not an extended expression. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2011-12-05 09:48:01 -08:00
Darik Horn	04bf5ecc1f	Quote variables in the zpool_id script. For consistency and safety, quote all variables in the zpool_id script. This accomodates a `-c CONFIG` parameter value with whitespace in the path name. Also fix a typo in the usage synopsis for `-h`. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #439	2011-12-05 09:47:03 -08:00
Darik Horn	9c8254f6f9	Support path_id changes in udev 174. The /lib/udev/path_id helper became a builtin command in the udev 174 release, so test whether path_id is external in the zpool_id script. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes: #429	2011-12-05 09:46:48 -08:00
Brian Behlendorf	5547c2f1bf	Simplify BDI integration Update the code to use the bdi_setup_and_register() helper to simplify the bdi integration code. The updated code now just registers the bdi during mount and destroys it during unmount. The only complication is that for 2.6.32 - 2.6.33 kernels the helper wasn't available so in these cases the zfs code must provide it. Luckily the bdi_setup_and_register() function is trivial. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #367	2011-11-08 10:19:03 -08:00
Darik Horn	3cee2262a6	Change sun.com URLs to zfsonlinux.org ZFS contains error messages that point to the defunct www.sun.com domain, which is currently offline. Change these error messages to use the zfsonlinux.org mirror instead. This commit depends on: zfsonlinux/zfsonlinux.github.com@8e10ead3dc Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2011-10-24 09:52:21 -07:00
Brian Behlendorf	86f35f34f4	Export symbols for the VFS API Export all symbols already marked extern in the zfs_vfsops.h header. Several non-static symbols have also been added to the header and exportewd. This allows external modules to more easily create and manipulate properly created ZFS filesystem type datasets. Rename zfsvfs_teardown() to zfs_sb_teardown and export it. This is done simply for consistency with the rest of the code base. All other zfsvfs_* functions have already been renamed. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2011-10-11 10:25:59 -07:00
Brian Behlendorf	c70602f1ea	Fix uninitialized varible in zfs_do_userspace() When compiling under Debian Lenny with gcc version 4.3.2 (Debian 4.3.2-1.1) the following warning occurs. To quiet the warning initialize 'error' to zero. Newer versions of gcc correctly determine that this uninitialized varible is impossible because ZFS_NUM_USERQUOTA_PROPS is known to be greater than zero. cmd/zfs/zfs_main.c:2377: warning: "error" may be used uninitialized in this function Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2011-09-27 16:56:38 -07:00
Brian Behlendorf	4c069d3494	Fixed uninitialized variable This warning was accidentally introduced by commit b7936d5c2337bc976ac831c1c38de563844c36b. The fix is to simply initialize the variable to ZFS_DELEG_WHO_UNKNOWN. cmd/zfs/zfs_main.c:4460:25: warning: 'who_type' may be used uninitialized in this function Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2011-08-19 16:26:06 -07:00
Brian Behlendorf	29b35200a7	Fix missing format arguments These warnings were accidentally introduced by commit b7936d5c2337bc976ac831c1c38de563844c36b. The fix is to simply add the missing format specifier. cmd/zfs/zfs_main.c:4565: warning: format not a string literal and no format arguments Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2011-08-19 15:16:34 -07:00
Brian Behlendorf	b740d602bd	Disable zfs /etc/mtab updates Completely disable the zfs binary from attempting to directly update /etc/mtab. The Linux port relies entirely on the mount.zfs helper to safely update /etc/mtab. If we left the /etc/mtab updates to the zfs binary then they could race with concurrent non-zfs mounts. Routing everything through the system mount command ensures the /etc/mtab updates are locked properly. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #329	2011-08-19 11:53:01 -07:00
Brian Behlendorf	de0a1c099b	Autogen refresh for udev changes Run autogen.sh using the same autotools versions as upstream: * autoconf-2.63 * automake-1.11.1 * libtool-2.2.6b	2011-08-08 16:30:27 -07:00
Kyle Fuller	12d06bac9b	Move udev rules from /etc/udev to /lib/udev This change moves the default install location for the zfs udev rules from /etc/udev/ to /lib/udev/. The correct convention is for rules provided by a package to be installed in /lib/udev/. The /etc/udev/ directory is reserved for custom rules or local overrides. Additionally, this patch cleans up some abuse of the bindir install location by adding a udevdir and udevruledir install directories. This allows us to revert to the default bin install location. The udev install directories can be set with the following new options. --with-udevdir=DIR install udev helpers [EPREFIX/lib/udev] --with-udevruledir=DIR install udev rules [UDEVDIR/rules.d] Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #356	2011-08-08 16:21:10 -07:00
Brian Behlendorf	76659dc110	Add backing_device_info per-filesystem For a long time now the kernel has been moving away from using the pdflush daemon to write 'old' dirty pages to disk. The primary reason for this is because the pdflush daemon is single threaded and can be a limiting factor for performance. Since pdflush sequentially walks the dirty inode list for each super block any delay in processing can slow down dirty page writeback for all filesystems. The replacement for pdflush is called bdi (backing device info). The bdi system involves creating a per-filesystem control structure each with its own private sets of queues to manage writeback. The advantage is greater parallelism which improves performance and prevents a single filesystem from slowing writeback to the others. For a long time both systems co-existed in the kernel so it wasn't strictly required to implement the bdi scheme. However, as of Linux 2.6.36 kernels the pdflush functionality has been retired. Since ZFS already bypasses the page cache for most I/O this is only an issue for mmap(2) writes which must go through the page cache. Even then adding this missing support for newer kernels was overlooked because there are other mechanisms which can trigger writeback. However, there is one critical case where not implementing the bdi functionality can cause problems. If an application handles a page fault it can enter the balance_dirty_pages() callpath. This will result in the application hanging until the number of dirty pages in the system drops below the dirty ratio. Without a registered backing_device_info for the filesystem the dirty pages will not get written out. Thus the application will hang. As mentioned above this was less of an issue with older kernels because pdflush would eventually write out the dirty pages. This change adds a backing_device_info structure to the zfs_sb_t which is already allocated per-super block. It is then registered when the filesystem mounted and unregistered on unmount. It will not be registered for mounted snapshots which are read-only. This change will result in flush-<pool> thread being dynamically created and destroyed per-mounted filesystem for writeback. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #174	2011-08-04 13:37:38 -07:00
Alexander Stetsenko	0b7936d5c2	Illumos #278 : get rid zfs of python and pyzfs dependencies Remove all python and pyzfs dependencies for consistency and to ensure full functionality even in a mimimalist environment. Reviewed by: gordon.w.ross@gmail.com Reviewed by: trisk@opensolaris.org Reviewed by: alexander.r.eremin@gmail.com Reviewed by: jerry.jelinek@joyent.com Approved by: garrett@nexenta.com References to Illumos issue and patch: - https://www.illumos.org/issues/278 - https://github.com/illumos/illumos-gate/commit/1af68beac3 Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #340 Issue #160 Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2011-08-01 12:09:36 -07:00
Eric Schrock	3e31d2b080	Illumos #883 : ZIL reuse during remount corruption Moving the zil_free() cleanup to zil_close() prevents this problem from occurring in the first place. There is a very good description of the issue and fix in Illumus #883. Reviewed by: Matt Ahrens <Matt.Ahrens@delphix.com> Reviewed by: Adam Leventhal <Adam.Leventhal@delphix.com> Reviewed by: Albert Lee <trisk@nexenta.com> Reviewed by: Gordon Ross <gwr@nexenta.com> Reviewed by: Garrett D'Amore <garrett@nexenta.com> Reivewed by: Dan McDonald <danmcd@nexenta.com> Approved by: Gordon Ross <gwr@nexenta.com> References to Illumos issue and patch: - https://www.illumos.org/issues/883 - https://github.com/illumos/illumos-gate/commit/c9ba2a43cb Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #340	2011-08-01 12:09:11 -07:00
George Wilson	6d974228ef	Illumos #1051 : zfs should handle imbalanced luns Today zfs tries to allocate blocks evenly across all devices. This means when devices are imbalanced zfs will use lots of CPU searching for space on devices which tend to be pretty full. It should instead fail quickly on the full LUNs and move onto devices which have more availability. Reviewed by: Eric Schrock <Eric.Schrock@delphix.com> Reviewed by: Matt Ahrens <Matt.Ahrens@delphix.com> Reviewed by: Adam Leventhal <Adam.Leventhal@delphix.com> Reviewed by: Albert Lee <trisk@nexenta.com> Reviewed by: Gordon Ross <gwr@nexenta.com> Approved by: Garrett D'Amore <garrett@nexenta.com> References to Illumos issue and patch: - https://www.illumos.org/issues/510 - https://github.com/illumos/illumos-gate/commit/5ead3ed965 Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #340	2011-08-01 12:09:11 -07:00
Shampavman	bb939d1085	Illumos #510 : 'zfs get' enhancement - mountpoint as an argument The 'zfs get' command should be able to deal with mountpoint as an argument. It already works with 'zfs list' command: # zfs list /export/home/estibi NAME USED AVAIL REFER MOUNTPOINT rpool/export/home/estibi 1.14G 3.86G 1.14G /export/home/estibi but it fails with 'zfs get': # zfs get all /export/home/estibi cannot open '/export/home/estibi': invalid dataset name Reviewed by: Eric Schrock <eric.schrock@delphix.com> Reviewed by: Deano <deano@rattie.demon.co.uk> Reviewed by: Garrett D'Amore <garrett@nexenta.com> Approved by: Garrett D'Amore <garrett@nexenta.com> References to Illumos issue and patch: - https://www.illumos.org/issues/510 - https://github.com/illumos/illumos-gate/commit/5ead3ed965 Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #340	2011-08-01 12:09:11 -07:00
Kyle Fuller	615ab66d18	Provide a rc.d script for archlinux Unlike most other Linux distributions archlinux installs its init scripts in /etc/rc.d insead of /etc/init.d. This commit provides an archlinux rc.d script for zfs and extends the build infrastructure to ensure it get's installed in the correct place. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #322	2011-07-11 14:12:23 -07:00
Brian Behlendorf	e0f86c9862	Update 'zfs send' documentation The -D and -p options were missing from the manpage. This commit adds documentation for these features. Closes #311	2011-07-08 12:16:09 -07:00
Brian Behlendorf	341b5f1d4c	Update ztest paths Unfortunately, ztest is hard coded to export the zdb utility to be installed in a certain location. When the packaging was updated to install zdb in /sbin/ ztest was broken. To fix this I'm updating ztest to check both common install paths.	2011-07-06 12:30:09 -07:00
Prasad Joshi	5a52105925	Use consistent error message in zpool sub-command The zpool sub-commands like iostat, list, and status should display consistent message when a given pool is unavailable or no pool is present. This change unifies the default behavior as follows: root@prasad:~# ./zpool list 1 2 no pools available no pools available root@prasad:~# ./zpool iostat 1 2 no pools available no pools available root@prasad:~# ./zpool status 1 2 no pools available no pools available root@prasad:~# ./zpool list tan 1 2 cannot open 'tan': no such pool root@prasad:~# ./zpool iostat tan 1 2 cannot open 'tan': no such pool root@prasad:~# ./zpool status tan 1 2 cannot open 'tan': no such pool Reported-by: Rajshree Thorat <rthorat@stec-inc.com> Signed-off-by: Prasad Joshi <pjoshi@stec-inc.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #306	2011-07-04 19:57:01 -07:00
Brian Behlendorf	2cf7f52bc4	Linux compat 2.6.39: mount_nodev() The .get_sb callback has been replaced by a .mount callback in the file_system_type structure. When using the new interface the caller must now use the mount_nodev() helper. Unfortunately, the new interface no longer passes the vfsmount down to the zfs layers. This poses a problem for the existing implementation because we currently save this pointer in the super block for latter use. It provides our only entry point in to the namespace layer for manipulating certain mount options. This needed to be done originally to allow commands like 'zfs set atime=off tank' to work properly. It also allowed me to keep more of the original Solaris code unmodified. Under Solaris there is a 1-to-1 mapping between a mount point and a file system so this is a fairly natural thing to do. However, under Linux they many be multiple entries in the namespace which reference the same filesystem. Thus keeping a back reference from the filesystem to the namespace is complicated. Rather than introduce some ugly hack to get the vfsmount and continue as before. I'm leveraging this API change to update the ZFS code to do things in a more natural way for Linux. This has the upside that is resolves the compatibility issue for the long term and fixes several other minor bugs which have been reported. This commit updates the code to remove this vfsmount back reference entirely. All modifications to filesystem mount options are now passed in to the kernel via a '-o remount'. This is the expected Linux mechanism and allows the namespace to properly handle any options which apply to it before passing them on to the file system itself. Aside from fixing the compatibility issue, removing the vfsmount has had the benefit of simplifying the code. This change which fairly involved has turned out nicely. Closes #246 Closes #217 Closes #187 Closes #248 Closes #231	2011-07-01 13:36:39 -07:00
Brian Behlendorf	5c03efc379	Linux compat 2.6.39: security_inode_init_security() The security_inode_init_security() function now takes an additional qstr argument which must be passed in from the dentry if available. Passing a NULL is safe when no qstr is available the relevant security checks will just be skipped. Closes #246 Closes #217 Closes #187	2011-07-01 12:40:08 -07:00
Prasad Joshi	b312979252	Tear down and flush the mmap region The inode eviction should unmap the pages associated with the inode. These pages should also be flushed to disk to avoid the data loss. Therefore, use truncate_setsize() in evict_inode() to release the pagecache. The API truncate_setsize() was added in 2.6.35 kernel. To ensure compatibility with the old kernel, the patch defines its own truncate_setsize function. Signed-off-by: Prasad Joshi <pjoshi@stec-inc.com> Closes #255	2011-06-27 09:59:19 -07:00
Ned A. Bass	560bcf9d14	Multipath device manageability improvements Update udev helper scripts to deal with device-mapper devices created by multipathd. These enhancements are targeted at a particular storage network topology under evaluation at LLNL consisting of two SAS switches providing redundant connectivity between multiple server nodes and disk enclosures. The key to making these systems manageable is to create shortnames for each disk that conveys its physical location in a drawer. In a direct-attached topology we infer a disk's enclosure from the PCI bus number and HBA port number in the by-path name provided by udev. In a switched topology, however, multiple drawers are accessed via a single HBA port. We therefore resort to assigning drawer identifiers based on which switch port a drive's enclosure is connected to. This information is available from sysfs. Add options to zpool_layout to generate an /etc/zfs/zdev.conf using symbolic links in /dev/disk/by-id of the form <label>-<UUID>-switch-port:<X>-slot:<Y>. <label> is a string that depends on the subsystem that created the link and defaults to "dm-uuid-mpath" (this prefix is used by multipathd). <UUID> is a unique identifier for the disk typically obtained from the scsi_id program, and <X> and <Y> denote the switch port and disk slot numbers, respectively. Add a callout script sas_switch_id for use by multipathd to help create symlinks of the form described above. Update zpool_id and the udev zpool rules file to handle both multipath devices and conventional drives.	2011-06-23 10:46:06 -07:00
Christian Kohlschütter	df30f56639	Add "ashift" property to zpool create Some disks with internal sectors larger than 512 bytes (e.g., 4k) can suffer from bad write performance when ashift is not configured correctly. This is caused by the disk not reporting its actual sector size, but a sector size of 512 bytes. The drive may behave this way for compatibility reasons. For example, the WDC WD20EARS disks are known to exhibit this behavior. When creating a zpool, ZFS takes that wrong sector size and sets the "ashift" property accordingly (to 9: 1<<9=512), whereas it should be set to 12 for 4k sectors (1<<12=4096). This patch allows an adminstrator to manual specify the known correct ashift size at 'zpool create' time. This can significantly improve performance in certain cases. However, it will have an impact on your total pool capacity. See the updated ashift property description in the zpool.8 man page for additional details. Valid values for the ashift property range from 9 to 17 (512B-128KB). Additionally, you may set the ashift to 0 if you wish to auto-detect the sector size based on what the disk reports, this is the default behavior. The most common ashift values are 9 and 12. Example: zpool create -o ashift=12 tank raidz2 sda sdb sdc sdd Closes #280 Original-patch-by: Richard Laager <rlaager@wiktel.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2011-06-17 16:35:49 -07:00
Brian Behlendorf	e130330a87	Handle /etc/mtab -> /proc/mounts symlink Under Fedora 15 /etc/mtab is now a symlink to /proc/mounts by default. When /etc/mtab is a symlink the mount.zfs helper should not update it. There was code in place to handle this case but it used stat() which traverses the link and then issues the stat on /proc/mounts. We need to use lstat() to prevent the link traversal and instead stat /etc/mtab. Closes #270	2011-06-14 16:48:38 -07:00
Brian Behlendorf	2e08aedba4	Always check -Wno-unused-but-set-variable gcc support The previous commit `8a7e1ceefa` wasn't quite right. This check applies to both the user and kernel space build and as such we must make sure it runs regardless of what the --with-config option is set too. For example, if --with-config=kernel then the autoconf test does not run and we generate build warnings when compiling the kernel packages.	2011-06-14 16:40:35 -07:00
Brian Behlendorf	8a7e1ceefa	Check for -Wno-unused-but-set-variable gcc support Gcc versions 4.3.2 and earlier do not support the compiler flag -Wno-unused-but-set-variable. This can lead to build failures on older Linux platforms such as Debian Lenny. Since this is an optional build argument this changes add a new autoconf check for the option. If it is supported by the installed version of gcc then it is used otherwise it is omited. See commit's `12c1acde76` and `79713039a2` for the reason the -Wno-unused-but-set-variable options was originally added.	2011-06-14 14:43:22 -07:00
Brian Behlendorf	4804b739e1	Default to internal 'zfs userspace' implementation We will never bring over the pyzfs.py helper script from Solaris to Linux. Instead the missing functionality will be directly integrated in to the zfs commands and libraries. To avoid confusion remove the warning about the missing pyzfs.py utility and simply use the default internal support. The Illumous developers are of the same mind and have proposed an initial patch to do this which has been integrated in to the 'allow' development branch. After some additional testing this code can be merged in to master as the right long term solution.	2011-05-20 10:25:41 -07:00
Brian Behlendorf	6ee44e32be	Fix awk usage The zpool_id and zpool_layout helper scripts have been updated to use the more common /usr/bin/awk symlink. On Fedora/Redhat systems there are both /bin/awk and /usr/bin/awk symlinks to your installed version of awk. On Debian/Ubuntu systems only the /usr/bin/awk symlink exists. Additionally, add the '\<' token to the beginning of the regex pattern to prevent partial matches. This pattern only appears to work with gawk despite the mawk man page claiming to support this extended regex. Thus you will need to have gawk installed to use these optional helper scripts. A comment has been added to the script to reflect this reality.	2011-05-06 10:16:04 -07:00
Brian Behlendorf	3613204cd7	Allow mounting of read-only snapshots With the addition of the mount helper we accidentally regressed the ability to manually mount snapshots. This commit updates the mount helper to expect the possibility of a ZFS_TYPE_SNAPSHOT. All snapshot will be automatically treated as 'legacy' type mounts so they can be mounted manually.	2011-05-05 10:13:38 -07:00
Brian Behlendorf	df554c148e	Fix 'zfs set volsize=N pool/dataset' This change fixes a kernel panic which would occur when resizing a dataset which was not open. The objset_t stored in the zvol_state_t will be set to NULL when the block device is closed. To avoid this issue we pass the correct objset_t as the third arg. The code has also been updated to correctly notify the kernel when the block device capacity changes. For 2.6.28 and newer kernels the capacity change will be immediately detected. For earlier kernels the capacity change will be detected when the device is next opened. This is a known limitation of older kernels. Online ext3 resize test case passes on 2.6.28+ kernels: $ dd if=/dev/zero of=/tmp/zvol bs=1M count=1 seek=1023 $ zpool create tank /tmp/zvol $ zfs create -V 500M tank/zd0 $ mkfs.ext3 /dev/zd0 $ mkdir /mnt/zd0 $ mount /dev/zd0 /mnt/zd0 $ df -h /mnt/zd0 $ zfs set volsize=800M tank/zd0 $ resize2fs /dev/zd0 $ df -h /mnt/zd0 Original-patch-by: Fajar A. Nugraha <github@fajar.net> Closes #68 Closes #84	2011-05-02 08:54:40 -07:00
Gunnar Beutner	055656d4f4	Implemented NFS export_operations. Implemented the required NFS operations for exporting ZFS datasets using the in-kernel NFS daemon.	2011-04-29 12:36:13 -07:00
Darik Horn	492b8e9e7b	Use gethostid in the Linux convention. Disable the gethostid() override for Solaris behavior because Linux systems implement the POSIX standard in a way that allows a negative result. Mask the gethostid() result to the lower four bytes, like coreutils does in /usr/bin/hostid, to prevent junk bits or sign-extension on systems that have an eight byte long type. This can cause a spurious hostid mismatch that prevents zpool import on 64-bit systems.	2011-04-25 10:36:17 -05:00
Brian Behlendorf	12c1acde76	Set -Wno-unused-but-set-variable globally As of gcc-4.6 the option -Wunused-but-set-variable is enabled by default. While this is a useful warning there are numerous places in the ZFS code when a variable is set and then only checked in an ASSERT(). To avoid having to update every instance of this in the code we now set -Wno-unused-but-set-variable to suppress the warning. Additionally, when building with --enable-debug and -Werror set these warning also become fatal. We can reevaluate the suppression of these error at a later time if it becomes an issue. For now we are basically just reverting to the previous gcc behavior.	2011-04-19 10:44:10 -07:00
Brian Behlendorf	03514b0110	Fix gcc compiler warning, parse_option() When compiling ZFS in user space gcc-4.6.0 correctly identifies the variable 'value' as being set but never used. This generates a warning and a build failure when using --enable-debug. Once again this is correct but I'm reluctant to remove 'value' because we are breaking the string in to name/value pairs. While it is not used now there's a good chance it will be soon and I'd rather not have to reinvent this. To suppress the warning with just as a VERIFY(). This was observed under Fedora 15. cmd/mount_zfs/mount_zfs.c: In function ‘parse_option’: cmd/mount_zfs/mount_zfs.c:112:21: error: variable ‘value’ set but not used [-Werror=unused-but-set-variable]	2011-04-19 09:04:51 -07:00
Manuel Amador (Rudd-O)	8610b52bd4	Added .gitignore for mount.zfs and zvol_id Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2011-04-07 10:28:38 -07:00
Ned Bass	fa417e57a6	Call udevadm trigger more safely Some udev hooks are not designed to be idempotent, so calling udevadm trigger outside of the distribution's initialization scripts can have unexpected (and potentially dangerous) side effects. For example, the system time may change or devices may appear multiple times. See Ubuntu launchpad bug 320200 and this mailing list post for more details: https://lists.ubuntu.com/archives/ubuntu-devel/2009-January/027260.html To avoid these problems we call udevadm trigger with --action=change --subsystem-match=block. The first argument tells udev just to refresh devices, and make sure everything's as it should be. The second argument limits the scope to block devices, so devices belonging to other subsystems cannot be affected. This doesn't fix the problem on older udev implementations that don't provide udevadm but instead have udevtrigger as a standalone program. In this case the above options aren't available so there's no way to call call udevtrigger safely. But we can live with that since this issue only exists in optional test and helper scripts, and most zfs-on-linux users are running newer systems anyways.	2011-04-05 13:00:51 -07:00
Fajar A. Nugraha	a5729f7b22	Fixes to enable zvol symlink creation This commit fixes issue on https://github.com/behlendorf/zfs/issues/#issue/172 Changes: - update BLKZNAME to use _IOR instead of _IO. Kernel 2.6.32 allows read parameters (copy_to_user) with _IO, while newer kernels (tested Archlinux's 2.6.37 kernel) enforces _IOR (which is correct) - fix return code and message on error Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2011-03-24 11:48:18 -07:00
Brian Behlendorf	bdf4328b04	Linux 2.6.28 compat, insert_inode_locked() Added insert_inode_locked() helper function, prior to this most callers used insert_inode_hash(). The older method doesn't check for collisions in the inode_hashtable but it still acceptible for use. Fallback to using insert_inode_hash() when insert_inode_locked() is unavailable.	2011-03-22 12:15:54 -07:00
Brian Behlendorf	f47c42e214	Merge branch 'dracut'	2011-03-22 12:13:04 -07:00
Brian Behlendorf	716895b161	Fix 'LDFLAGS=-Wl,--as-needed' build error Compiling with 'LDFLAGS=-Wl,--as-needed' exposed the fact that there were some library linking problems introduced by mount_zfs. In particular, the libzfs library does use nvpair symbols, and mount_zfs contains no dependencies on libzpool. Closes #161 Closes #162	2011-03-18 14:47:19 -07:00
Brian Behlendorf	ec49a5f0ec	Fix getcwd() warning New versions glibc declare getcwd() with the warn_unused_result attribute. This results in a warning because the updated mount helper was not checking this return value. This issue was fixed by checking the return type and in the case of an error simply returning the passed dataset. One possible, but unlikely, error would be having your cwd directory unlinked while the mount command was running. cmd/mount_zfs/mount_zfs.c: In function ‘parse_dataset’: cmd/mount_zfs/mount_zfs.c:223:2: error: ignoring return value of ‘getcwd’, declared with attribute warn_unused_result	2011-03-18 13:54:49 -07:00
Brian Behlendorf	01c0e61da0	Add init scripts To support automatically mounting your zfs on filesystem on boot a basic init script is needed. Unfortunately, every distribution has their own idea of the _right_ way to do things. Rather than write one very complicated portable init script, which would be invariably replaced by the distributions own anyway. I have instead added support to provide multiple distribution specific init scripts. The correct init script for your distribution will be selected by ZFS_AC_DEFAULT_PACKAGE which will set DEFAULT_INIT_SCRIPT. During 'make install' the correct script for your system will be installed from zfs/etc/init.d/zfs.DEFAULT_INIT_SCRIPT to the usual /etc/init.d/zfs location. Currently, there is zfs.fedora and a more generic zfs.lsb init script. Hopefully, the distribution maintainers who know best how they want their init scripts to function will feedback their approved versions to be included in the project. This change does not consider upstart jobs but I'm not at all opposed to add that sort of thing.	2011-03-17 16:51:54 -07:00
Brian Behlendorf	3aff775555	Strip 'zfsutil,remount' from /etc/mtab When updating /etc/mtab we should be careful and strip certain options. In particular, we need to strip 'zfsutil' because if we don't the mount utility will helpfull provide it to the mount helper when we issue mount(8) again. This subverts the check that the caller is zfs(8) and not mount(8).	2011-03-15 13:33:29 -07:00
Brian Behlendorf	093aa69286	Always allow '-o remount,ro' Allow the mount(8) utility to always operate on all datasets when remounting them read-only. This critical for rc.sysinit/umountroot which remounts the root filesystem read-only during shutdown to ensure everything is correctly flushed to disk. Fix minor typo, the check to set zfsutil should use the bitwise '&'. I must have accidentally hit the adjacent '*' and obviously neither the compiler or my code review caught this. Fix it now.	2011-03-15 13:33:29 -07:00
Brian Behlendorf	a6cba65cca	Check for trailing '/' in mount.zfs When run with a root '/' cwd the mount.zfs helper would strip not only the '/' but also the next character from the dataset name. For example, '/tank' was changed to 'ank' instead of just 'tank'. Originally, this was done for the '/tmp' cwd case where we needed to strip the '/' following the cwd. For example '/tmp/tank' needed to remove the '/tmp' cwd plus 1 character for the '/'. This change fixes the problem by checking the cwd and if it ends in a '/' it does not strip and extra character. Otherwise it will strip the next character. I believe this should only ever be true for the root directory. Closes #148	2011-03-10 12:58:44 -08:00
Brian Behlendorf	d53368f675	Fix mount helper Several issues related to strange mount/umount behavior were reported and this commit should address most of them. The original idea was to put in place a zfs mount helper (mount.zfs). This helper is used to enforce 'legacy' mount behavior, and perform any extra mount argument processing (selinux, zfsutil, etc). This helper wasn't ready for the 0.6.0-rc1 release but with this change it's functional but needs to extensively tested. This change addresses the following open issues. Closes #101 Closes #107 Closes #113 Closes #115 Closes #119	2011-03-09 15:26:48 -08:00
Brian Behlendorf	321a498b95	Add xvattr support With the removal of the minimal xvattr support from the spl this support needs to be replaced in the zfs package. This is fairly easily accomplished by directly adding portions of the sys/vnode.h header from OpenSolaris. These xvattr additions have been placed in the sys/xvattr.h header file and included as needed where simply a sys/vnode.h was included before. In additon to the xvattr types and helper macros two functions were also included. The xva_init() and xva_getxoptattr() functions were included as static inline functions in xvattr.h. They are simple enough and it was simpler to place them here rather than in their own .c file.	2011-03-02 11:43:50 -08:00
Fajar A. Nugraha	4c0d8e50b9	Use udev to create /dev/zvol/[dataset_name] links This commit allows zvols with names longer than 32 characters, which fixes issue on https://github.com/behlendorf/zfs/issues/#issue/102. Changes include: - use /dev/zd* device names for zvol, where * is the device minor (include/sys/fs/zfs.h, module/zfs/zvol.c). - add BLKZNAME ioctl to get dataset name from userland (include/sys/fs/zfs.h, module/zfs/zvol.c, cmd/zvol_id). - add udev rule to create /dev/zvol/[dataset_name] and the legacy /dev/[dataset_name] symlink. For partitions on zvol, it will create /dev/zvol/[dataset_name]-part* (etc/udev/rules.d/60-zvol.rules, cmd/zvol_id). Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2011-02-25 09:43:19 -08:00
Brian Behlendorf	718d77f622	Fix uninitialized variable It was possible for rc to be unitialized in the parse_options() function which triggered a compiler warning. Ensure rc is always initialized.	2011-02-23 12:57:25 -08:00
Brian Behlendorf	45066d1f20	Linux 2.6.38 compat, blkdev_get_by_path() The open_bdev_exclusive() function has been replaced (again) by the more generic blkdev_get_by_path() function. Additionally, the counterpart function close_bdev_exclusive() has been replaced by blkdev_put(). Because these functions are more generic versions of the functions they replaced the compatibility macro must add the FMODE_EXCL mask to ensure they are exclusive. Closes #114	2011-02-23 12:29:38 -08:00
Brian Behlendorf	2c395def27	Linux 2.6.36 compat, sops->evict_inode() The new prefered inteface for evicting an inode from the inode cache is the ->evict_inode() callback. It replaces both the ->delete_inode() and ->clear_inode() callbacks which were previously used for this.	2011-02-11 13:47:51 -08:00
Brian Behlendorf	7268e1bec8	Linux 2.6.35 compat, fops->fsync() The fsync() callback in the file_operations structure used to take 3 arguments. The callback now only takes 2 arguments because the dentry argument was determined to be unused by all consumers. To handle this a compatibility prototype was added to ensure the right prototype is used. Our implementation never used the dentry argument either so it's just a matter of using the right prototype.	2011-02-11 09:05:51 -08:00
Brian Behlendorf	777d4af891	Linux 2.6.35 compat, const struct xattr_handler The const keyword was added to the 'struct xattr_handler' in the generic Linux super_block structure. To handle this we define an appropriate xattr_handler_t typedef which can be used. This was the preferred solution because it keeps the code clean and readable.	2011-02-10 16:29:00 -08:00
Brian Behlendorf	afffb5cd10	MS_DIRSYNC and MS_REC compat It turns out that older versions of the glibc headers do not properly define MS_DIRSYNC despite it being explicitly mentioned in the man pages. They instead call it S_WRITE, so for system where this is not correct defined map MS_DIRSYNC to S_WRITE. At the time of this commit both Ubuntu Lucid, and Debian Squeeze both use the out of date glibc headers. As for MS_REC this field is also not available in the older headers. Since there is no obvious mapping in this case we simply disable the recursive mount option which used it.	2011-02-10 12:14:57 -08:00
Brian Behlendorf	1ac0ea38a5	Add missing -ldl linker option The inclusion on dlsym(), dlopen(), and dlclose() symbols require us to link against the dl library. Be careful to add the flag to both the libzfs library and the commands which depend on the library.	2011-02-10 11:05:44 -08:00
Brian Behlendorf	b4ead57cfb	Remove HAVE_ZPL from commands and libraries Thanks to the previous few commits we can now build all of the user space commands and libraries with support for the zpl.	2011-02-04 16:14:34 -08:00
Brian Behlendorf	9a616b5d17	Documentation updates Minor Linux specific documentation updates to the comments and man pages.	2011-02-04 16:14:34 -08:00
Brian Behlendorf	c5d915f423	Minimal libshare infrastructure ZFS even under Solaris does not strictly require libshare to be available. The current implementation attempts to dlopen() the library to access the needed symbols. If this fails libshare support is simply disabled. This means that on Linux we only need the most minimal libshare implementation. In fact just enough to prevent the build from failing. Longer term we can decide if we want to implement a libshare library like Solaris. At best this would be an abstraction layer between ZFS and NFS/SMB. Alternately, we can drop libshare entirely and directly integrate ZFS with Linux's NFS/SMB. Finally the bare bones user-libshare.m4 test was dropped. If we do decide to implement libshare at some point it will surely be as part of this package so the check is not needed.	2011-02-04 16:14:29 -08:00
Brian Behlendorf	3fb1fcdea1	Add 'zfs mount' support By design the zfs utility is supposed to handle mounting and unmounting a zfs filesystem. We could allow zfs to do this directly. There are system calls available to mount/umount a filesystem. And there are library calls available to manipulate /etc/mtab. But there are a couple very good reasons not to take this appraoch... for now. Instead of directly calling the system and library calls to (u)mount the filesystem we fork and exec a (u)mount process. The principle reason for this is to delegate the responsibility for locking and updating /etc/mtab to (u)mount(8). This ensures maximum portability and ensures the right locking scheme for your version of (u)mount will be used. If we didn't do this we would have to resort to an autoconf test to determine what locking mechanism is used. The downside to using mount(8) instead of mount(2) is that we lose the exact errno which was returned by the kernel. The return code from mount(8) provides some insight in to what went wrong but it not quite as good. For the moment this is translated as a best guess in to a errno for the higher layers of zfs. In the long term a shared library called libmount is under development which provides a common API to address the locking and errno issues. Once the standard mount utility has been updated to use this library we can then leverage it. Until then this is the only safe solution. http://www.kernel.org/pub/linux/utils/util-linux/libmount-docs/index.html	2011-02-04 16:11:58 -08:00
Brian Behlendorf	95c4cae39f	Disable umount.zfs helper For the moment, the only advantage in registering a umount helper would be to automatically unshare a zfs filesystem. Since under Linux this would be unexpected (but nice) behavior there is no harm in disabling it. This is desirable because the 'zfs unmount' path invokes the system umount. This is done to ensure correct mtab locking but has the side effect that the umount.zfs helper would be called if it exists. By default this helper calls back in to zfs to do the unmount on Solaris which we don't want under Linux. Once libmount is available and we have a safe way to correctly lock and update the /etc/mtab file we can reconsider the need for a umount helper. Using libmount is the prefered solution.	2011-01-28 12:47:57 -08:00
Brian Behlendorf	3b8cfee8af	Enable mount.zfs helper While not strictly required to mount a zfs filesystem using a mount helper has certain advantages. First, we need it if we want to honor the mount behavior as found on Solaris. As part of the mount we need to validate that the dataset has the legacy mount property set if we are using 'mount' instead of 'zfs mount'. Secondly, by using a mount helper we can automatically load the zpl kernel module. This way you can just issue a 'mount' or 'zfs mount' and it will just work. Finally, it gives us common hook in user space to add any zfs specific mount options we might want. At the moment we don't have any but now the infrastructure is at least in place.	2011-01-28 12:47:57 -08:00
Brian Behlendorf	b3259b6a2b	Autoconf selinux support If libselinux is detected on your system at configure time link against it. This allows us to use a library call to detect if selinux is enabled and if it is to pass the mount option: "context=\"system_u:object_r:file_t:s0" For now this is required because none of the existing selinux policies are aware of the zfs filesystem type. Because of this they do not properly enable xattr based labeling even though zfs supports all of the required hooks. Until distro's add zfs as a known xattr friendly fs type we must use mntpoint labeling. Alternately, end users could modify their existing selinux policy with a little guidance.	2011-01-28 12:45:19 -08:00
Brian Behlendorf	149e873ab1	Fix minor compiler warnings These compiler warnings were introduced when code which was previously #ifdef'ed out by HAVE_ZPL was re-added for use by the posix layer. All of the following changes should be obviously correct and will cause no semantic changes.	2011-01-06 15:04:28 -08:00
Ricardo M. Correia	8d4e8140ef	Fix block device-related issues in zdb. Specifically, this fixes the two following errors in zdb when a pool is composed of block devices: 1) 'Value too large for defined data type' when running 'zdb <dataset>'. 2) 'character device required' when running 'zdb -l <block-device>'. Signed-off-by: Ricardo M. Correia <ricardo.correia@oracle.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2010-12-14 09:52:46 -08:00
Ned Bass	3ee56c292b	Make rollbacks fail gracefully Support for rolling back datasets require a functional ZPL, which we currently do not have. The zfs command does not check for ZPL support before attempting a rollback, and in preparation for rolling back a zvol it removes the minor node of the device. To prevent the zvol device node from disappearing after a failed rollback operation, this change wraps the zfs_do_rollback() function in an #ifdef HAVE_ZPL and returns ENOSYS in the absence of a ZPL. This is consistent with the behavior of other ZPL dependent commands such as mount. The orginal error message observed with this bug was rather confusing: internal error: Unknown error 524 Aborted This was because zfs_ioc_rollback() returns ENOTSUP if we don't HAVE_ZPL, but Linux actually has no such error code. It should instead return EOPNOTSUPP, as that is how ENOTSUP is defined in user space. With that we would have gotten the somewhat more helpful message cannot rollback 'tank/fish': unsupported version This is rather a moot point with the above changes since we will no longer make that ioctl call without a ZPL. But, this change updates the error code just in case. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2010-11-08 14:03:36 -08:00
Ned Bass	d877ac6bfe	Fix intermittent 'zpool add' failures Creating whole-disk vdevs can intermittently fail if a udev-managed symlink to the disk partition is already in place. To avoid this, we now remove any such symlink before partitioning the disk. This makes zpool_label_disk_wait() truly wait for the new link to show up instead of returning if it finds an old link still in place. Otherwise there is a window between when udev deletes and recreates the link during which access attempts will fail with ENOENT. Also, clean up a comment about waiting for udev to create symlinks. It no longer needs to describe the special cases for the link names, since that is now handled in a separate helper function. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2010-10-22 12:38:58 -07:00
Ned Bass	4682b8c14e	Remove solaris-specific code from make_leaf_vdev() Portability between Solaris and Linux isn't really an issue for us anymore, and removing sections like this one helps simplify the code. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2010-10-22 12:25:58 -07:00
Ned Bass	79e7242a91	Add helper functions for manipulating device names This change adds two helper functions for working with vdev names and paths. zfs_resolve_shortname() resolves a shorthand vdev name to an absolute path of a file in /dev, /dev/disk/by-id, /dev/disk/by-label, /dev/disk/by-path, /dev/disk/by-uuid, /dev/disk/zpool. This was previously done only in the function is_shorthand_path(), but we need a general helper function to implement shorthand names for additional zpool subcommands like remove. is_shorthand_path() is accordingly updated to call the helper function. There is a minor change in the way zfs_resolve_shortname() tests if a file exists. is_shorthand_path() effectively used open() and stat64() to test for file existence, since its scope includes testing if a device is a whole disk and collecting file status information. zfs_resolve_shortname(), on the other hand, only uses access() to test for existence and leaves it to the caller to perform any additional file operations. This seemed like the most general and lightweight approach, and still preserves the semantics of is_shorthand_path(). zfs_append_partition() appends a partition suffix to a device path. This should be used to generate the name of a whole disk as it is stored in the vdev label. The user-visible names of whole disks do not contain the partition information, while the name in the vdev label does. The code was lifted from the function make_disks(), which now just calls the helper function. Again, having a helper function to do this supports general handling of shorthand names in the user interface. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2010-10-22 12:25:30 -07:00
Brian Behlendorf	2959d94a0a	Add FAILFAST support ZFS works best when it is notified as soon as possible when a device failure occurs. This allows it to immediately start any recovery actions which may be needed. In theory Linux supports a flag which can be set on bio's called FAILFAST which provides this quick notification by disabling the retry logic in the lower scsi layers. That's the theory at least. In practice is turns out that while the flag exists you oddly have to set it with the BIO_RW_AHEAD flag. And even when it's set it you may get retries in the low level drivers decides that's the right behavior, or if you don't get the right error codes reported to the scsi midlayer. Unfortunately, without additional kernels patchs there's not much which can be done to improve this. Basically, this just means that it may take 2-3 minutes before a ZFS is notified properly that a device has failed. This can be improved and I suspect I'll be submitting patches upstream to handle this.	2010-10-12 14:55:02 -07:00
Brian Behlendorf	c5343ba71b	Fix 'zpool events' formatting for awk To make the 'zpool events' output simple to parse with awk the extra newline after embedded nvlists has been dropped. This allows the entire event to be parsed as a single whitespace seperated record. The -H option has been added to operate in scripted mode. For the 'zpool events' command this means don't print the header. The usage of -H is consistent with scripted mode for other zpool commands.	2010-10-12 14:55:01 -07:00
Ned Bass	5c1bad0013	Fix undersized buffer in is_shorthand_path() The string array 'char dirs[5][8]' was too small to accomodate the terminating NUL character in "by-label". This change adds the needed additional byte. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2010-10-12 14:47:39 -07:00
Brian Behlendorf	a5b4d63582	Add [-m map] option to zpool_layout By default the zpool_layout command would always use the slot number assigned by Linux when generating the zdev.conf file. This is a reasonable default there are cases when it makes sense to remap the slot id assigned by Linux using your own custom mapping. This commit adds support to zpool_layout to provide a custom slot mapping file. The file contains in the first column the Linux slot it and in the second column the custom slot mapping. By passing this map file with '-m map' to zpool_config the mapping will be applied when generating zdev.conf. Additionally, two sample mapping have been added which reflect different ways to map the slots in the dragon drawers.	2010-09-17 11:02:19 -07:00
Brian Behlendorf	6283f55ea1	Support custom build directories and move includes One of the neat tricks an autoconf style project is capable of is allow configurion/building in a directory other than the source directory. The major advantage to this is that you can build the project various different ways while making changes in a single source tree. For example, this project is designed to work on various different Linux distributions each of which work slightly differently. This means that changes need to verified on each of those supported distributions perferably before the change is committed to the public git repo. Using nfs and custom build directories makes this much easier. I now have a single source tree in nfs mounted on several different systems each running a supported distribution. When I make a change to the source base I suspect may break things I can concurrently build from the same source on all the systems each in their own subdirectory. wget -c http://github.com/downloads/behlendorf/zfs/zfs-x.y.z.tar.gz tar -xzf zfs-x.y.z.tar.gz cd zfs-x-y-z ------------------------- run concurrently ---------------------- <ubuntu system> <fedora system> <debian system> <rhel6 system> mkdir ubuntu mkdir fedora mkdir debian mkdir rhel6 cd ubuntu cd fedora cd debian cd rhel6 ../configure ../configure ../configure ../configure make make make make make check make check make check make check This change also moves many of the include headers from individual incude/sys directories under the modules directory in to a single top level include directory. This has the advantage of making the build rules cleaner and logically it makes a bit more sense.	2010-09-08 12:38:56 -07:00
Brian Behlendorf	e70e591c51	Add initial autoconf products Add the initial products from autogen.sh. These products will be updated incrementally after this point as development occurs. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2010-08-31 13:42:02 -07:00
Brian Behlendorf	0e8d1b2d8b	Add linux ztest support Minor changes to ztest for this environment. These including updating ztest to run in the local development tree, as well as relocating some local variables in this function to the heap. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2010-08-31 13:42:02 -07:00
Brian Behlendorf	302ef1517e	Add linux zpios support Linux kernel implementation of PIOS test app. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2010-08-31 13:42:01 -07:00
Brian Behlendorf	9b020fd97a	Add linux user util support This topic branch contains required changes to the user space utilities to allow them to integrate cleanly with Linux. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2010-08-31 13:42:01 -07:00
Brian Behlendorf	d603ed6c27	Add linux user disk support This topic branch contains all the changes needed to integrate the user side zfs tools with Linux style devices. Primarily this includes fixing up the Solaris libefi library to be Linux friendly, and integrating with the libblkid library which is provided by e2fsprogs. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2010-08-31 13:42:00 -07:00
Brian Behlendorf	266852767f	Add linux events This topic branch leverages the Solaris style FMA call points in ZFS to create a user space visible event notification system under Linux. This new system is called zevent and it unifies all previous Solaris style ereports and sysevent notifications. Under this Linux specific scheme when a sysevent or ereport event occurs an nvlist describing the event is created which looks almost exactly like a Solaris ereport. These events are queued up in the kernel when they occur and conditionally logged to the console. It is then up to a user space application to consume the events and do whatever it likes with them. To make this possible the existing /dev/zfs ABI has been extended with two new ioctls which behave as follows. * ZFS_IOC_EVENTS_NEXT Get the next pending event. The kernel will keep track of the last event consumed by the file descriptor and provide the next one if available. If no new events are available the ioctl() will block waiting for the next event. This ioctl may also be called in a non-blocking mode by setting zc.zc_guid = ZEVENT_NONBLOCK. In the non-blocking case if no events are available ENOENT will be returned. It is possible that ESHUTDOWN will be returned if the ioctl() is called while module unloading is in progress. And finally ENOMEM may occur if the provided nvlist buffer is not large enough to contain the entire event. * ZFS_IOC_EVENTS_CLEAR Clear are events queued by the kernel. The kernel will keep a fairly large number of recent events queued, use this ioctl to clear the in kernel list. This will effect all user space processes consuming events. The zpool command has been extended to use this events ABI with the 'events' subcommand. You may run 'zpool events -v' to output a verbose log of all recent events. This is very similar to the Solaris 'fmdump -ev' command with the key difference being it also includes what would be considered sysevents under Solaris. You may also run in follow mode with the '-f' option. To clear the in kernel event queue use the '-c' option. $ sudo cmd/zpool/zpool events -fv TIME CLASS May 13 2010 16:31:15.777711000 ereport.fs.zfs.config.sync class = "ereport.fs.zfs.config.sync" ena = 0x40982b7897700001 detector = (embedded nvlist) version = 0x0 scheme = "zfs" pool = 0xed976600de75dfa6 (end detector) time = 0x4bec8bc3 0x2e5aed98 pool = "zpios" pool_guid = 0xed976600de75dfa6 pool_context = 0x0 While the 'zpool events' command is handy for interactive debugging it is not expected to be the primary consumer of zevents. This ABI was primarily added to facilitate the addition of a user space monitoring daemon. This daemon would consume all events posted by the kernel and based on the type of event perform an action. For most events simply forwarding them on to syslog is likely enough. But this interface also cleanly allows for more sophisticated actions to be taken such as generating an email for a failed drive. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2010-08-31 13:41:36 -07:00
Brian Behlendorf	c9c0d073da	Add build system Add autoconf style build infrastructure to the ZFS tree. This includes autogen.sh, configure.ac, m4 macros, some scripts/*, and makefiles for all the core ZFS components.	2010-08-31 13:41:27 -07:00
Brian Behlendorf	40b84e7aec	Fix stack ztest While ztest does run in user space we run it with the same stack restrictions it would have in kernel space. This ensures that any stack related issues which would be hit in the kernel can be caught and debugged in user space instead. This patch is a first pass to limit the stack usage of every ztest function to 1024 bytes. Subsequent updates can further reduce this. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2010-08-31 08:38:50 -07:00
Brian Behlendorf	1e33ac1e26	Fix Solaris thread dependency by using pthreads This is a portability change which removes the dependence of the Solaris thread library. All locations where Solaris thread API was used before have been replaced with equivilant Solaris kernel style thread calls. In user space the kernel style threading API is implemented in term of the portable pthreads library. This includes all threads, mutexs, condition variables, reader/writer locks, and taskqs. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2010-08-31 08:38:47 -07:00
Brian Behlendorf	235db0acea	Fix deadcode Remove deadcode. It's possible the code should be in use somewhere, but as the source code is laid out it currently is not. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2010-08-31 08:38:44 -07:00
Ricardo M. Correia	090ff0929e	Fix commit callbacks The upstream commit cb code had a few bugs: 1) The arguments of the list_move_tail() call in txg_dispatch_callbacks() were reversed by mistake. This caused the commit callbacks to not be called at all. 2) ztest had a bug in ztest_dmu_commit_callbacks() where "error" was not initialized correctly. This seems to have caused the test to always take the simulated error code path, which made ztest unable to detect whether commit cbs were being called for transactions that successfuly complete. 3) ztest had another bug in ztest_dmu_commit_callbacks() where the commit cb threshold was not being compared correctly. 4) The commit cb taskq was using 'max_ncpus * 2' as the maxalloc argument of taskq_create(), which could have caused unnecessary delays in the txg sync thread. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2010-08-31 08:38:44 -07:00
Brian Behlendorf	d4ed667343	Fix gcc uninitialized variable warnings Gcc -Wall warn: 'uninitialized variable' Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2010-08-31 08:38:43 -07:00
Brian Behlendorf	1fde1e3720	Fix gcc unused variable warnings Gcc -Wall warn: 'unused variable' Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2010-08-31 08:38:43 -07:00
Brian Behlendorf	c65aa5b2b9	Fix gcc missing parenthesis warnings Gcc -Wall warn: 'missing parenthesis' Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2010-08-31 08:38:35 -07:00
Brian Behlendorf	e75c13c353	Fix gcc missing case warnings Gcc ASSERT() missing cases are impossible Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2010-08-27 15:34:03 -07:00
Brian Behlendorf	2598c0012d	Fix gcc missing braces warnings Resolve compiler warnings concerning missing braces. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2010-08-27 15:34:03 -07:00
Brian Behlendorf	0bc8fd7884	Fix gcc invalid prototype warnings Gcc -Wall warn: 'invalid prototype' Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2010-08-27 15:34:03 -07:00
Ricardo M. Correia	e5dc681a50	Fix gcc ident pragma warnings Remove all ident pragmas which are unknown to gcc. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2010-08-27 15:34:02 -07:00
Brian Behlendorf	0e5b68e015	Fix gcc fortify source warnings Resolve issues uncovered by -D_FORTIFY_SOURCE=2, the default redhat macro's file adds this option to the cflags. This causes warnings of the following type designed to keep the developer honest: warning: ignoring return value of 'foo', declared with attribute warn_unused_result The short term fix is to wrap these calls in VERIFY() to check the return code. The code was already assusing these would never fail. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2010-08-27 15:34:02 -07:00
Brian Behlendorf	b8864a233c	Fix gcc cast warnings Gcc -Wall warn: 'lacks a cast' Gcc -Wall warn: 'comparison between pointer and integer' Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2010-08-27 15:33:32 -07:00

... 4 5 6 7 8 ...

561 Commits