mirror_zfs

mirror of https://git.proxmox.com/git/mirror_zfs.git synced 2026-04-17 08:54:52 +03:00

Author	SHA1	Message	Date
Tim Chase	e890dd85a7	Produce a full snapshot list for zfs send -p In order to accelerate zfs receive operations in the face of many property-containing snapshots, commit `0574855` changed the header nvlist ("fss") of a send stream to exclude snapshots which aren't part of the stream. This, however, would cause zfs receive -F to erroneously remove snapshots; it would remove any snapshot which wasn't listed in the header nvlist. This patch restores the full list of snapshots in fss[<id>[snaps]] but still suppresses the properties of non-sent snapshots and also removes a consistency check in which an error is raised if a listed snapshot does not have any properties in fss[<id>[snapprops]]. The `0574855` commit also introduced a bug in which zfs send -p of a complete stream (zfs send -p pool/fs@snap) would exclude the snapshot properties in fss[<id>[snapprops]]. This patch detects the last snapshot in a series when no "from" snapshot has been specified and includes its properties. Signed-off-by: Tim Chase <tim@chase2k.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #2907	2015-02-09 16:43:17 -08:00
Chunwei Chen	53698a453d	Read spl_hostid module parameter before gethostid() If spl_hostid is set via module parameter, it's likely different from gethostid(). Therefore, the userspace tool should read it first before falling back to gethostid(). Signed-off-by: Chunwei Chen <tuxoko@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #3034	2015-02-04 16:44:53 -08:00
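The lookup order this commit describes can be sketched in userspace C. The hostid is read from a module-parameter file first, and gethostid(3) is used only when that yields nothing usable. Several details are assumptions and do not come from the commit. The helper read_hostid() is hypothetical. The parameter is assumed to be readable as a decimal or 0x-prefixed number at a path such as /sys/module/spl/parameters/spl_hostid. The gethostid() result is assumed to be compared as a 32-bit value. The caller passes the path in, so the logic can be exercised without the spl module loaded.

```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper: return the hostid from a module-parameter file
 * if it holds a non-zero value, otherwise fall back to gethostid(3).
 */
static unsigned long
read_hostid(const char *param_path)
{
	char buf[64];
	FILE *fp = fopen(param_path, "r");

	if (fp != NULL) {
		char *line = fgets(buf, (int)sizeof (buf), fp);
		(void) fclose(fp);
		if (line != NULL) {
			/* Base 0 accepts decimal or 0x-prefixed hex. */
			unsigned long hostid = strtoul(buf, NULL, 0);
			if (hostid != 0)
				return (hostid);
		}
	}

	/* No usable module parameter: fall back to the C library. */
	return ((unsigned long)gethostid() & 0xffffffffUL);
}
```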
Brian Behlendorf	6466b61db6	Make `zpool import -d|-c` behave consistently When importing pools with zpool import -aN there is inconsistent behavior between '-d /dev/disk/by-id' (or another path) and '-c /etc/zfs/zpool.cache'. The difference in behavior is caused by zpool_find_import_cached() returning an empty nvlist_t when there are no pools to import but zpool_find_import_impl() returns NULL for the same situation. The behavior of zpool_find_import_cached() is arguably more correct because it allows returning NULL to be used for an error case and not an empty set. This change resolves the issue by updating get_configs() such that it returns an empty set instead of NULL when no config is found. The updated behavior will now always return 0 for this case. $ zpool import -aN; echo $? no pools available to import 0 $ zpool import -aN -d /var/tmp/; echo $? no pools available to import 0 $ zpool import -aN -c /etc/zfs/zpool.cache; echo $? no pools available to import 0 Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #2080	2015-01-28 11:12:31 -08:00
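This commit settles on a convention: NULL signals an error, and an empty set means "nothing found". That convention can be sketched in plain C. The names config_list_t, find_configs(), and import_all() are hypothetical stand-ins and are not libzfs functions. The real code returns an nvlist_t from get_configs().

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for the nvlist_t of discovered pool configs. */
typedef struct config_list {
	size_t count;	/* number of importable pools found */
} config_list_t;

/*
 * Return an empty list, not NULL, when no pools are found so that NULL
 * can be reserved for real errors such as allocation failure.
 */
static config_list_t *
find_configs(size_t npools_found)
{
	config_list_t *list = calloc(1, sizeof (*list));

	if (list == NULL)
		return (NULL);		/* error */
	list->count = npools_found;	/* may legitimately be 0 */
	return (list);
}

/* Exit status in the style of 'zpool import -aN'. */
static int
import_all(size_t npools_found)
{
	config_list_t *list = find_configs(npools_found);

	if (list == NULL)
		return (1);		/* only a genuine failure is non-zero */
	if (list->count == 0)
		(void) printf("no pools available to import\n");
	/* ... each config in list would be imported here ... */
	free(list);
	return (0);
}
```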
Andriy Gapon	057485504e	zfs send -p send properties only for snapshots that are actually sent ... as opposed to sending properties of all snapshots of the relevant filesystem. The previous behavior results in properties being set on all snapshots on the receiving side, which is quite slow. Behavior of zfs send -R is not changed. References: http://thread.gmane.org/gmane.comp.file-systems.openzfs.devel/346 Ported-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #2729 Issue #2210	2014-10-02 16:52:02 -07:00
smh	7509a3d299	FreeBSD PR kern/172259: Fixes zfs receive errors FreeBSD PR kern/172259: Fixes zfs receive errors caused by snapshot replication being processed in a random order instead of creation order. Eliminates needless filesystem renames caused by removed parent snapshots which subsequently causes many more errors. PR: kern/172259 Submitted by: Steven Hartland Reviewed by: pjd (mentor) Approved by: pjd (mentor) MFC after: 2 weeks References: https://github.com/freebsd/freebsd/commit/4995789 Porting notes: Minor whitespace fixes were made to conform with style requirements: lib/libzfs/libzfs_sendrecv.c: 2269: indent by spaces instead of tabs lib/libzfs/libzfs_sendrecv.c: 2270: indent by spaces instead of tabs Ported-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #2729	2014-10-02 16:48:19 -07:00
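Processing snapshots "in creation order instead of a random order" can be illustrated by sorting snapshot records by creation transaction group before replaying them. The snap_t type and sort_snaps_by_creation() are hypothetical. The actual libzfs_sendrecv.c change operates on nvlists and may select the next snapshot differently, so this is only a sketch of the ordering idea.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical minimal snapshot record. */
typedef struct snap {
	const char *name;
	uint64_t createtxg;	/* creation transaction group */
} snap_t;

/* qsort comparator: oldest (lowest createtxg) first. */
static int
snap_cmp_createtxg(const void *a, const void *b)
{
	uint64_t ta = ((const snap_t *)a)->createtxg;
	uint64_t tb = ((const snap_t *)b)->createtxg;

	/* Compare instead of subtracting, which could overflow an int. */
	return ((ta > tb) - (ta < tb));
}

static void
sort_snaps_by_creation(snap_t *snaps, size_t n)
{
	qsort(snaps, n, sizeof (snaps[0]), snap_cmp_createtxg);
}
```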
Richard Yao	83e9986f6e	Implement -t option to zpool create for temporary pool names Creating virtual machines that have their rootfs on ZFS on hosts that have their rootfs on ZFS causes SPA namespace collisions when the standard name rpool is used. The solution is either to give each guest pool a name unique to the host, which is not always desirable, or to boot a VM environment containing an ISO image to install it, which is cumbersome. `26b42f3f9d` introduced `zpool import -t ...` to simplify situations where a host must access a guest's pool when there is a SPA namespace conflict. We build upon that to introduce `zpool create -t tname ...`. That allows us to create a pool whose in-core name is tname, but whose on-disk name is the normal name specified. This simplifies the creation of machine images that use a rootfs on ZFS. That benefits not only real world deployments, but also ZFSOnLinux development by decreasing the time needed to perform rootfs on ZFS experiments. Signed-off-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #2417	2014-09-30 10:46:59 -07:00
Matthew Ahrens	1f6f97f304	Illumos 5116 - zpool history -i goes into infinite loop 5116 zpool history -i goes into infinite loop Reviewed by: Christopher Siden <christopher.siden@delphix.com> Reviewed by: Dan Kimmel <dan.kimmel@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Richard Elling <richard.elling@gmail.com> Reviewed by: Boris Protopopov <boris.protopopov@me.com> Approved by: Dan McDonald <danmcd@omniti.com> References: https://www.illumos.org/issues/5116 https://github.com/illumos/illumos-gate/commit/3339867 Ported by: Turbo Fredriksson <turbo@bayour.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #2715	2014-09-23 11:58:05 -07:00
Matthew Ahrens	ab2894e66f	Illumos 5135 - zpool_find_import_cached() can use fnvlist_* Reviewed by: Christopher Siden <christopher.siden@delphix.com> Reviewed by: Max Grossman <max.grossman@delphix.com> Reviewed by: Richard Elling <richard.elling@gmail.com> Approved by: Dan McDonald <danmcd@omniti.com> References: https://www.illumos.org/issues/5135 https://github.com/illumos/illumos-gate/commit/b18d6b0 Ported by: Turbo Fredriksson <turbo@bayour.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #2693	2014-09-23 11:44:29 -07:00
Richard Yao	928ee9fe18	Properly NULL terminate string in zfs_strcmp_pathname The utility cppcheck caught this. Signed-off-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #2330	2014-09-23 10:32:21 -07:00
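The commit does not show the offending code, so the following is only a generic illustration of the bug class cppcheck reports here: strncpy(3) leaves the destination unterminated when the source fills the buffer. copy_terminated() is a hypothetical helper and is not the zfs_strcmp_pathname() fix itself.

```c
#include <string.h>

/*
 * Copy at most n - 1 bytes of src into dst and always NUL-terminate.
 * strncpy() alone leaves dst unterminated when strlen(src) >= n.
 */
static void
copy_terminated(char *dst, const char *src, size_t n)
{
	if (n == 0)
		return;
	(void) strncpy(dst, src, n - 1);
	dst[n - 1] = '\0';
}
```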
George Wilson	a05dfd0028	Illumos 5147 - zpool list -v should show individual disk capacity The 'zpool list -v' command displays lots of info but excludes the capacity of each disk. This should be added. 5147 zpool list -v should show individual disk capacity Reviewed by: Adam Leventhal <adam.leventhal@delphix.com> Reviewed by: Christopher Siden <christopher.siden@delphix.com> Reviewed by: Matthew Ahrens <matthew.ahrens@delphix.com> Reviewed by: Richard Elling <richard.elling@gmail.com> Approved by: Dan McDonald <danmcd@omniti.com> References: https://www.illumos.org/issues/5147 https://github.com/illumos/illumos-gate/commit/7a09f97 Ported by: Turbo Fredriksson <turbo@bayour.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #2688	2014-09-23 10:10:42 -07:00
ilovezfs	1ca56e6033	Fragmentation should display as '-' if spacemap_histogram=disabled When com.delphix:spacemap_histogram is disabled, the value of fragmentation was printed as 18446744073709551615 (UINT64_MAX), when it should print as '-'. The issue was caused by a small mistake during the merge of "4980 metaslabs should have a fragmentation metric." upstream: https://github.com/illumos/illumos-gate/commit/2e4c998 ZoL: https://github.com/zfsonlinux/zfs/commit/f3a7f66 The problem is in zpool_get_prop_literal, where the handling of the pool property ZPOOL_PROP_FRAGMENTATION was added to the wrong section. In particular, ZPOOL_PROP_FRAGMENTATION should not be in the section where zpool_get_state(zhp) == POOL_STATE_UNAVAIL, but lower down after it's already been determined that the pool is in fact available, which is where upstream illumos has correctly had it. Thanks to lundman for helping to track down this bug. Signed-off-by: Jorgen Lundman <lundman@lundman.net> Signed-off-by: Tim Chase <tim@chase2k.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #2664	2014-09-05 09:19:01 -07:00
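The intended display rule is to map the UINT64_MAX "not available" sentinel to '-'. It can be sketched as a small formatter. format_frag() is hypothetical, and the trailing percent sign is an assumption about how the value is shown. The actual change only moved the ZPOOL_PROP_FRAGMENTATION handling to a different place within zpool_get_prop_literal().

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format a fragmentation value, printing "-" when it is the UINT64_MAX
 * sentinel used when no fragmentation metric is available.
 */
static void
format_frag(uint64_t frag, char *buf, size_t len)
{
	if (frag == UINT64_MAX)
		(void) snprintf(buf, len, "-");
	else
		(void) snprintf(buf, len, "%" PRIu64 "%%", frag);
}
```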
Turbo Fredriksson	c3f8dc2a48	Add a pkgconfig file Providing a pkg-config file makes it easy for 3rd party applications to link against the libzfs libraries. It also allows the libzfs developers to modify the list of required libraries and cflags without breaking existing applications. The following example illustrates how pkg-config can be used: cc -o myapp myapp.c `pkg-config --cflags --libs libzfs` /* myapp.c */ #include <libzfs.h> int main(void) { libzfs_handle_t *hdl; hdl = libzfs_init(); if (hdl != NULL) libzfs_fini(hdl); return (0); } Signed-off-by: Turbo Fredriksson <turbo@bayour.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes: #585	2014-08-28 07:59:43 -07:00
George Wilson	f3a7f6610f	Illumos 4976-4984 - metaslab improvements 4976 zfs should only avoid writing to a failing non-redundant top-level vdev 4978 ztest fails in get_metaslab_refcount() 4979 extend free space histogram to device and pool 4980 metaslabs should have a fragmentation metric 4981 remove fragmented ops vector from block allocator 4982 space_map object should proactively upgrade when feature is enabled 4983 need to collect metaslab information via mdb 4984 device selection should use fragmentation metric Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Adam Leventhal <adam.leventhal@delphix.com> Reviewed by: Christopher Siden <christopher.siden@delphix.com> Approved by: Garrett D'Amore <garrett@damore.org> References: https://www.illumos.org/issues/4976 https://www.illumos.org/issues/4978 https://www.illumos.org/issues/4979 https://www.illumos.org/issues/4980 https://www.illumos.org/issues/4981 https://www.illumos.org/issues/4982 https://www.illumos.org/issues/4983 https://www.illumos.org/issues/4984 https://github.com/illumos/illumos-gate/commit/2e4c998 Notes: The "zdb -M" option has been re-tasked to display the new metaslab fragmentation metric and the new "zdb -I" option is used to control the maximum number of in-flight I/Os. The new fragmentation metric is derived from the space map histogram which has been rolled up to the vdev and pool level and is presented to the user via "zpool list". Add a number of module parameters related to the new metaslab weighting logic. Ported by: Tim Chase <tim@chase2k.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #2595	2014-08-18 08:40:49 -07:00
Matthew Ahrens	5dbd68a352	Illumos 4914 - zfs on-disk bookmark structure should be named _phys_t 4914 zfs on-disk bookmark structure should be named _phys_t Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Christopher Siden <christopher.siden@delphix.com> Reviewed by: Richard Lowe <richlowe@richlowe.net> Reviewed by: Saso Kiselkov <skiselkov.ml@gmail.com> Approved by: Robert Mustacchi <rm@joyent.com> References: https://www.illumos.org/issues/4914 https://github.com/illumos/illumos-gate/commit/7802d7b Porting notes: There were a number of zfsonlinux-specific uses of zbookmark_t which needed to be updated. This should reduce the likelihood of further problems like issue #2094 occurring. Ported by: Tim Chase <tim@chase2k.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #2558	2014-08-06 14:48:41 -07:00
Matthew Ahrens	fbeddd60b7	Illumos 4390 - I/O errors can corrupt space map when deleting fs/vol 4390 i/o errors when deleting filesystem/zvol can lead to space map corruption Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Christopher Siden <christopher.siden@delphix.com> Reviewed by: Adam Leventhal <ahl@delphix.com> Reviewed by: Dan McDonald <danmcd@omniti.com> Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com> Approved by: Dan McDonald <danmcd@omniti.com> References: https://www.illumos.org/issues/4390 https://github.com/illumos/illumos-gate/commit/7fd05ac Porting notes: Previous stack-reduction efforts in traverse_visitb() caused a fair number of unmergeable pieces of code. This patch should reduce its stack footprint a bit more. The new local bptree_entry_phys_t in bptree_add() is dynamically allocated using kmem_zalloc() for the purpose of stack reduction. The new global zfs_free_leak_on_eio has been defined as an integer rather than a boolean_t as was the case with the related zfs_recover global. Also, zfs_free_leak_on_eio's definition has been inserted into zfs_debug.c for consistency with the existing definition of zfs_recover. Illumos placed it in spa_misc.c. Ported by: Tim Chase <tim@chase2k.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #2545	2014-08-04 11:50:52 -07:00
Matthew Ahrens	9b67f60560	Illumos 4757, 4913 4757 ZFS embedded-data block pointers ("zero block compression") 4913 zfs release should not be subject to space checks Reviewed by: Adam Leventhal <ahl@delphix.com> Reviewed by: Max Grossman <max.grossman@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Christopher Siden <christopher.siden@delphix.com> Reviewed by: Dan McDonald <danmcd@omniti.com> Approved by: Dan McDonald <danmcd@omniti.com> References: https://www.illumos.org/issues/4757 https://www.illumos.org/issues/4913 https://github.com/illumos/illumos-gate/commit/5d7b4d4 Porting notes: For compatibility with the fastpath code, the zio_done() function needed to be updated. Because embedded-data block pointers do not require DVAs to be allocated, the associated vdevs will not be marked and therefore should not be unmarked. Ported by: Tim Chase <tim@chase2k.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #2544	2014-08-01 14:28:05 -07:00
Matthew Ahrens	da536844d5	Illumos 4368, 4369. 4369 implement zfs bookmarks 4368 zfs send filesystems from readonly pools Reviewed by: Christopher Siden <christopher.siden@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Approved by: Garrett D'Amore <garrett@damore.org> References: https://www.illumos.org/issues/4369 https://www.illumos.org/issues/4368 https://github.com/illumos/illumos-gate/commit/78f1710 Ported by: Tim Chase <tim@chase2k.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #2530	2014-07-29 10:55:29 -07:00
Matthew Ahrens	fa86b5dbb6	Illumos 4171, 4172 4171 clean up spa_feature_*() interfaces 4172 implement extensible_dataset feature for use by other zpool features Reviewed by: Max Grossman <max.grossman@delphix.com> Reviewed by: Christopher Siden <christopher.siden@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Jerry Jelinek <jerry.jelinek@joyent.com> Approved by: Garrett D'Amore <garrett@damore.org> References: https://www.illumos.org/issues/4171 https://www.illumos.org/issues/4172 https://github.com/illumos/illumos-gate/commit/2acef22 Ported-by: Tim Chase <tim@chase2k.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #2528	2014-07-25 16:40:07 -07:00
Tim Chase	09c0b8fe5e	Return default value on numeric properties failing the "head check". Updates `962d524212`. The referenced fix to get_numeric_property() caused numeric property lookups to consider the type of the parent (head) dataset when checking validity, but there are some cases in which the caller expects to see the property's default value even when the lookup is invalid. One case in which this is true is change_one(), which is part of the renaming infrastructure. It may look up "zoned" on a snapshot of a volume, which is not valid, but it expects to see the default value of false. There may be other, yet unidentified cases in which zfs_prop_get_int() is used on technically invalid properties but which expect the property's default value to be returned. Signed-off-by: Tim Chase <tim@chase2k.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Turbo Fredriksson <turbo@bayour.com> Closes #2320	2014-07-01 14:14:31 -07:00
Brian Behlendorf	d4aae2a054	Improve differing sector size error When adding or replacing a vdev with a different sector size, the error message should be more useful. In addition to describing the problem, provide a hint that the '-o ashift' option can be used to override the optimal default value. Since using a non-optimal value may incur a significant performance penalty, we should issue this error. But there are numerous reasons why an administrator may wish to do this anyway. Signed-off-by: Niklas Edmundsson <ZNikke@github> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #2421	2014-06-27 11:44:36 -07:00
Richard Yao	4def05f8a6	Fix memory leak in zpool_clear_label() Clang's static analyzer reported a memory leak in zpool_clear_label(). Upon review, it turns out to be right. This should be a very short-lived leak because no daemons use this functionality, but that does not preclude the possibility of third-party daemons that do use it. Let's fix it to be a good Samaritan. Signed-off-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Ned Bass <bass6@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #2330	2014-05-30 17:00:37 -07:00
Tim Chase	962d524212	Check the dataset type more rigorously when fetching properties. When fetching property values of snapshots, a check against the head dataset type must be performed. Previously, this additional check was performed only when fetching "version", "normalize", "utf8only" or "case". This caused the ZPL properties "acltype", "exec", "devices", "nbmand", "setuid" and "xattr" to be erroneously displayed with meaningless values for snapshots of volumes. It also did not allow for the display of "volsize" of a snapshot of a volume. This patch adds the headcheck flag parameter to zfs_prop_valid_for_type() and zprop_valid_for_type() to indicate that the check is being done against a head dataset's type, so that properties valid only for snapshots are handled correctly. This allows the head check in get_numeric_property() to be performed when fetching a property for a snapshot. Signed-off-by: Tim Chase <tim@chase2k.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #2265	2014-05-06 10:41:46 -07:00
ilovezfs	78597769b4	Fill in mountpoint buffer before using it in errors zfs_is_mountable() fills in the mountpoint buffer, so, as in upstream, it needs to have been called before the mountpoint buffer can be used in error messages. In particular, return (zfs_error_fmt(hdl, EZFS_MOUNTFAILED, dgettext(TEXT_DOMAIN, "cannot mount '%s'"), mountpoint)); should not come before the call to zfs_is_mountable(). Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: ilovezfs <ilovezfs@icloud.com> Closes #2284	2014-04-30 15:52:01 -07:00
Tim Chase	b066274a77	Report atime and relatime as the property's actual value. Neither atime nor relatime should be considered to be "temporary mount point properties". Their semantics are enforced completely within ZFS and also they're (correctly) not documented as being temporary mount point properties. Signed-off-by: Tim Chase <tim@chase2k.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #2257	2014-04-16 11:57:17 -07:00
John M. Layman	cbca6076b3	Fix for re-reading /etc/mtab. This is a continuation of `fb5c53ea65`: When /etc/mtab is updated on Linux, it's done atomically with rename(2). A new mtab is written, the existing mtab is unlinked, and the new mtab is renamed to /etc/mtab. This means that we must close the old file and open the new file to get the updated contents. Using rewind(3) will just move the file pointer back to the start of the old file; freopen(3) will close the stream and open the file again. In this commit, a few more rewind(3) calls were replaced with freopen(3) to allow updated mtab entries to be picked up immediately. Signed-off-by: John M. Layman <jml@frijid.net> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #2215 Issue #1611	2014-04-04 09:46:20 -07:00
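The difference between rewind(3) and freopen(3) after an atomic rename can be shown directly. reread_first_line() is a hypothetical helper, not the libzfs code. It reopens the path on the existing stream and reads the first line. A stream that was only rewound keeps reading the old, unlinked file.

```c
#include <stdio.h>

/*
 * Reopen path on the existing stream and read its first line into buf.
 * Returns the (possibly new) stream, or NULL on failure.
 */
static FILE *
reread_first_line(const char *path, FILE *fp, char *buf, size_t len)
{
	/*
	 * freopen() closes fp and opens path afresh, so a file that was
	 * replaced via rename(2) since fp was opened becomes visible.
	 * rewind() alone would only seek within the old, unlinked file.
	 */
	fp = freopen(path, "r", fp);
	if (fp == NULL)
		return (NULL);
	if (fgets(buf, (int)len, fp) == NULL)
		buf[0] = '\0';
	return (fp);
}
```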
Brian Behlendorf	1a5c611a22	Make command line guid parsing more tolerant Several of the zfs utilities allow you to pass a vdev's guid rather than the device name. However, the utilities are not consistent in how they parse that guid. For example, 'zinject' expects the guid to be passed as a hex value while 'zpool replace' wants it as a decimal. The user is forced to just know what format to use. This patch improves things by making the parsing more tolerant. When strtol(3) is called using 0 for the base, rather than, say, 10 or 16, it will then accept hex, decimal, or octal input based on the prefix. From the man page: If base is zero or 16, the string may then include a "0x" prefix, and the number will be read in base 16; otherwise, a zero base is taken as 10 (decimal) unless the next character is '0', in which case it is taken as 8 (octal). NOTE: There may be additional conversions not caught by this patch. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Chris Dunlap <cdunlap@llnl.gov> Issue #2	2014-04-02 13:10:08 -07:00
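The following is a minimal sketch of base-0 guid parsing. The commit names strtol(3). This sketch uses strtoull(3) instead, which follows the same base-0 prefix rules but covers the full 64-bit range a guid can occupy. parse_guid() is a hypothetical helper. It also rejects trailing garbage, which the commit does not say the utilities do.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Parse a guid given as decimal, 0x-prefixed hex, or 0-prefixed octal.
 * Returns 0 on success and stores the value in *guidp, -1 on failure.
 */
static int
parse_guid(const char *str, uint64_t *guidp)
{
	char *end;
	unsigned long long val;

	errno = 0;
	val = strtoull(str, &end, 0);	/* base 0: prefix selects the radix */
	if (errno != 0 || end == str || *end != '\0')
		return (-1);
	*guidp = (uint64_t)val;
	return (0);
}
```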
Chris Dunlap	8c7aa0cfc4	Replace zpool_events_next() "block" parm w/ "flags" zpool_events_next() can be called in blocking mode by specifying a non-zero value for the "block" parameter. However, the design of the ZFS Event Daemon (zed) requires additional functionality from zpool_events_next(). Instead of adding additional arguments to the function, it makes more sense to use flags that can be bitwise-or'd together. This commit replaces the zpool_events_next() int "block" parameter with an unsigned bitwise "flags" parameter. It also defines ZEVENT_NONE to specify the default behavior. Since non-blocking mode can be specified with the existing ZEVENT_NONBLOCK flag, the default behavior becomes blocking mode. This, in effect, inverts the previous use of the "block" parameter. Existing callers of zpool_events_next() have been modified to check for the ZEVENT_NONBLOCK flag. Signed-off-by: Chris Dunlap <cdunlap@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #2	2014-03-31 16:11:21 -07:00
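The inversion of the old "block" argument can be sketched as follows. The flag names come from the commit, but the numeric values below are assumptions for illustration only. should_block() and flags_from_old_block_arg() are hypothetical helpers and are not libzfs functions.

```c
/* Values assumed for this sketch; the real ZFS headers may differ. */
#define	ZEVENT_NONE	0x0	/* default: block until an event arrives */
#define	ZEVENT_NONBLOCK	0x1	/* return immediately if none is queued */

/* Blocking is now the default; callers opt out with ZEVENT_NONBLOCK. */
static int
should_block(unsigned flags)
{
	return ((flags & ZEVENT_NONBLOCK) == 0);
}

/* Translate the old "int block" argument into the new flags word. */
static unsigned
flags_from_old_block_arg(int block)
{
	return (block ? ZEVENT_NONE : ZEVENT_NONBLOCK);
}
```

Because the flags are bits, future options can be OR'd in without changing the function signature again.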
Brian Behlendorf	9b101a7320	Clarify zpool_events_next() comment Due to the very poorly chosen argument name 'cleanup_fd', it was completely unclear that this file descriptor is used to track the current cursor location. When the file descriptor is created by opening ZFS_DEV, a private cursor is created in the kernel for the returned file descriptor. Subsequent calls to zpool_events_next() and zpool_events_seek() then require the file descriptor as an argument to reposition the cursor. When the file descriptor is closed, the kernel state tracking the cursor is destroyed. This patch contains no functional change; it just changes a few variable names and clarifies the documentation. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Chris Dunlap <cdunlap@llnl.gov> Issue #2	2014-03-31 16:11:08 -07:00
Brian Behlendorf	75e3ff58fe	Add zpool_events_seek() functionality The ZFS_IOC_EVENTS_SEEK ioctl was added to allow user space callers to seek around the zevent file descriptor by EID. When a specific EID is passed and it exists the cursor will be positioned there. If the EID is no longer cached by the kernel ENOENT is returned. The caller may also pass ZEVENT_SEEK_START or ZEVENT_SEEK_END to seek to those respective locations. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Chris Dunlap <cdunlap@llnl.gov> Issue #2	2014-03-31 16:10:57 -07:00
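The following toy model covers the seek semantics described in this commit. A seek positions the cursor on a cached EID, returns ENOENT when that EID has aged out, or jumps to either end. The zevent_cursor_t structure and cursor_seek() are hypothetical, and the sentinel values chosen for ZEVENT_SEEK_START and ZEVENT_SEEK_END are assumptions. The real cursor lives in the kernel and is reached through the ZFS_IOC_EVENTS_SEEK ioctl.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Sentinel values assumed for this sketch. */
#define	ZEVENT_SEEK_START	((uint64_t)0)
#define	ZEVENT_SEEK_END		UINT64_MAX

/* Toy per-descriptor cursor over the events the kernel still caches. */
typedef struct zevent_cursor {
	const uint64_t *eids;	/* cached event IDs, oldest first */
	size_t count;
	size_t pos;		/* index of the next event to return */
} zevent_cursor_t;

static int
cursor_seek(zevent_cursor_t *c, uint64_t eid)
{
	if (eid == ZEVENT_SEEK_START) {
		c->pos = 0;
		return (0);
	}
	if (eid == ZEVENT_SEEK_END) {
		c->pos = c->count;
		return (0);
	}
	for (size_t i = 0; i < c->count; i++) {
		if (c->eids[i] == eid) {
			c->pos = i;
			return (0);
		}
	}
	return (ENOENT);	/* no longer cached */
}
```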
Gunnar Beutner	4d8c78c844	Remount datasets for "zfs inherit". Changing properties with "zfs inherit" should cause the datasets to be remounted. This ensures that the modified property values will be propagated into the filesystem namespace where they can be enforced. This change is modeled after an identical fix made to zfs_prop_set(). Signed-off-by: Gunnar Beutner <gunnar@beutner.name> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #2201	2014-03-24 11:11:15 -07:00
Brian Behlendorf	ffe9d38275	Add generic errata infrastructure From time to time it may be necessary to inform the pool administrator about an errata which impacts their pool. These errata will be shown to the administrator through the 'zpool status' and 'zpool import' output as appropriate. The errata must clearly describe the issue detected, how the pool is impacted, and what action should be taken to resolve the situation. Additional information for each errata will be provided at http://zfsonlinux.org/msg/ZFS-8000-ER. To accomplish the above, this patch adds the required infrastructure to allow the kernel modules to notify the utilities that an errata has been detected. This is done through the ZPOOL_CONFIG_ERRATA uint64_t which has been added to the pool configuration nvlist. To add a new errata, the following changes must be made: * A new errata identifier must be assigned by adding a new enum value to the zpool_errata_t type. New enums must be added to the end to preserve the existing ordering. * Code must be added to detect the issue. This does not strictly need to be done at pool import time, but doing so will make the errata visible in 'zpool import' as well as 'zpool status'. Once detected, the spa->spa_errata member should be set to the new enum. * If possible, code should be added to clear the spa->spa_errata member once the errata has been resolved. * The show_import() and status_callback() functions must be updated to include an informational message describing the errata. This should include an action message describing what an administrator should do to address the errata. * The documentation at http://zfsonlinux.org/msg/ZFS-8000-ER must be updated to describe the errata. This space can be used to provide as much additional information as needed to fully describe the errata. A link to this documentation will be automatically generated in the output of 'zpool import' and 'zpool status'. 
Original-idea-by: Tim Chase <tim@chase2k.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tim Chase <tim@chase2k.com> Signed-off-by: Richard Yao <ryao@gentoo.or Issue #2094	2014-02-21 12:10:40 -08:00
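Two steps of the errata checklist in this commit can be sketched in C. The first appends a new enum value at the end. The second teaches the display code to describe that value. The enum and its enumerators are hypothetical stand-ins for zpool_errata_t. errata_message() is a simplified stand-in for the messages printed by show_import() and status_callback().

```c
#include <stdio.h>

/*
 * New errata are appended at the end so that the numeric values already
 * stored as ZPOOL_CONFIG_ERRATA keep their meaning. Names hypothetical.
 */
typedef enum example_errata {
	EXAMPLE_ERRATA_NONE = 0,
	EXAMPLE_ERRATA_FIRST,
	EXAMPLE_ERRATA_SECOND,	/* newer entries go here, never earlier */
} example_errata_t;

/* Informational text a status or import display might print. */
static const char *
errata_message(example_errata_t errata)
{
	switch (errata) {
	case EXAMPLE_ERRATA_NONE:
		return (NULL);	/* nothing to report */
	case EXAMPLE_ERRATA_FIRST:
		return ("hypothetical issue A detected; see ZFS-8000-ER");
	case EXAMPLE_ERRATA_SECOND:
		return ("hypothetical issue B detected; see ZFS-8000-ER");
	}
	return ("unknown errata");
}
```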
Tim Chase	6d111134c0	Implement relatime. Add the "relatime" property. When set to "on", a file's atime will only be updated if the existing atime is at least a day old or if the existing ctime or mtime has been updated since the last access. This behavior is compatible with the Linux "relatime" mount option. Signed-off-by: Tim Chase <tim@chase2k.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #2064 Closes #1917	2014-01-29 15:50:44 -08:00
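The relatime rule can be written as a small predicate. relatime_need_update() is a hypothetical name. Here, "updated since the last access" is read as mtime or ctime being no older than atime, which matches how the Linux relatime option behaves. "A day" is taken to be 86400 seconds.

```c
#include <stdbool.h>
#include <time.h>

#define	RELATIME_MAX_AGE	(24 * 60 * 60)	/* one day, in seconds */

/* Should this access update atime under relatime rules? */
static bool
relatime_need_update(time_t atime_s, time_t mtime_s, time_t ctime_s,
    time_t now)
{
	/* File modified or changed since it was last accessed. */
	if (mtime_s >= atime_s || ctime_s >= atime_s)
		return (true);
	/* Existing atime is at least a day old. */
	if (now - atime_s >= RELATIME_MAX_AGE)
		return (true);
	return (false);
}
```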
Brian Behlendorf	741304503a	Prevent duplicate mnttab cache entries Under Linux it's possible to mount the same filesystem multiple times in the namespace. This can be done either with bind mounts or simply with multiple mount points. Unfortunately, the mnttab cache code is implemented using an AVL tree which does not support duplicate entries. To avoid this issue, this patch updates the code to check for a duplicate entry before adding a new one. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Michael Martin <mgmartin.mgm@gmail.com> Closes #2041	2014-01-14 10:27:12 -08:00
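The check-before-insert pattern can be sketched with the POSIX tsearch(3) family in place of the AVL tree that the mnttab cache actually uses. cache_add_unique() is hypothetical. tsearch() alone would also decline to add a duplicate key. The explicit tfind() mirrors the commit's approach of looking up an entry before adding it, which matters for a tree whose add routine does not accept duplicates.

```c
#include <search.h>
#include <stdbool.h>
#include <string.h>

/* Order entries by mounted dataset name. */
static int
cmp_names(const void *a, const void *b)
{
	return (strcmp((const char *)a, (const char *)b));
}

/*
 * Add name to the tree rooted at *root unless it is already present,
 * e.g. when the same dataset is also bind-mounted elsewhere.
 * Returns true if the name was newly added.
 */
static bool
cache_add_unique(void **root, const char *name)
{
	if (tfind(name, root, cmp_names) != NULL)
		return (false);	/* duplicate: keep the existing entry */
	return (tsearch(name, root, cmp_names) != NULL);
}
```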
Michael Kjorling	d1d7e2689d	cstyle: Resolve C style issues The vast majority of these changes are in Linux-specific code. They are the result of not having an automated style checker to validate the code when it was originally written. Others were caused when the common code was slightly adjusted for Linux. This patch contains no functional changes. It only refreshes the code to conform to the style guide. Everyone submitting patches for inclusion upstream should now run 'make checkstyle' and resolve any warnings prior to opening a pull request. The automated builders have been updated to fail a build if 'make checkstyle' detects an issue. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1821	2013-12-18 16:46:35 -08:00
Brian Behlendorf	ba6a24026c	Remove ZFS_IOC_*_MINOR ioctl()s Early versions of ZFS coordinated the creation and destruction of device minors from userspace. This was inherently racy and in late 2009 these ioctl()s were removed, leaving everything up to the kernel. This significantly simplified the code. However, we never picked up these changes in ZoL since we'd already significantly adjusted this code for Linux. This patch aims to rectify that by finally removing ZFS_IOC_*_MINOR ioctl()s and moving all the functionality down into the kernel. Since this cleanup will change the kernel/user ABI, it's being done in the same tag as the previous libzfs_core ABI changes. This will minimize, but not eliminate, the disruption to end users. Once merged, ZoL, Illumos, and FreeBSD will basically be back in sync with regard to handling ZVOLs in the common code. While each platform must have its own custom zvol.c implementation, the interfaces provided are consistent. NOTES: 1) This patch introduces one subtle change in behavior which could not be easily avoided. Prior to this change, callers of 'zfs create -V ...' were guaranteed that upon exit the /dev/zvol/ block device link would be created or an error returned. That's no longer the case. The utilities will no longer block waiting for the symlink to be created. Callers are now responsible for blocking; this is why a 'udev_wait' call was added to the 'label' function in scripts/common.sh. 2) The read-only behavior of a ZVOL now solely depends on whether the ZVOL_RDONLY bit is set in zv->zv_flags. The redundant policy setting in the gendisk structure was removed. This both simplifies the code and allows us to safely leverage set_disk_ro() to issue a KOBJ_CHANGE uevent. See the comment in the code for further details on this. 3) Because __zvol_create_minor() and zvol_alloc() may now be called in a sync task, they must use KM_PUSHPAGE. 
References: illumos/illumos-gate@681d9761e8 Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ned Bass <bass6@llnl.gov> Signed-off-by: Tim Chase <tim@chase2k.com> Closes #1969	2013-12-16 09:15:57 -08:00
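'zfs create -V' no longer waits for the /dev/zvol link, so a caller that needs the device has to wait for it. The following is a minimal polling sketch. wait_for_path(), the 10 ms interval, and the timeout parameter are hypothetical choices. The commit itself handles this with a 'udev_wait' call in scripts/common.sh.

```c
#include <stdbool.h>
#include <sys/stat.h>
#include <time.h>

/*
 * Poll until path exists or timeout_ms elapses (approximately).
 * Returns true if the path appeared in time.
 */
static bool
wait_for_path(const char *path, int timeout_ms)
{
	struct stat st;
	const struct timespec step = { .tv_sec = 0, .tv_nsec = 10000000L };
	int waited_ms = 0;

	for (;;) {
		if (stat(path, &st) == 0)
			return (true);
		if (waited_ms >= timeout_ms)
			return (false);
		(void) nanosleep(&step, NULL);	/* sleep 10 ms */
		waited_ms += 10;
	}
}
```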
Yuri Pankov	54d5378fae	Illumos #2583 2583 Add -p (parsable) option to zfs list References: https://www.illumos.org/issues/2583 illumos/illumos-gate@43d68d68c1 Ported-by: Gregor Kopka <gregor@kopka.net> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes: #937	2013-11-21 11:13:53 -08:00
Brian Behlendorf	64ad2b26e2	Remove the slog restriction on bootfs pools Under Linux this restriction does not apply because we have access to all the required devices. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1631	2013-11-14 14:28:35 -08:00
Tim Chase	fd4f76160c	Handle concurrent snapshot automounts failing due to EBUSY. In the current snapshot automount implementation, it is possible for multiple mounts to be attempted concurrently. Only one of the mounts will succeed and the others will fail. The failed mounts will cause an EREMOTE to be propagated back to the application. This commit works around the problem by adding a new exit status, MOUNT_BUSY, to the mount.zfs program, which is used when the underlying mount(2) call returns EBUSY. The zfs code detects this condition and treats it as if the mount had succeeded. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1819	2013-11-08 10:45:14 -08:00
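Under this scheme, a mount helper and its caller might cooperate as follows. The helper maps EBUSY from mount(2) to a distinct exit status, and the caller counts that status as success. The numeric value of MOUNT_BUSY below is an assumption. MOUNT_SUCCESS and MOUNT_SYSERR follow the mount(8) exit-code convention (0 and 2). Both functions are hypothetical.

```c
#include <errno.h>
#include <stdbool.h>

#define	MOUNT_SUCCESS	0	/* mount(8) convention */
#define	MOUNT_SYSERR	2	/* mount(8) convention */
#define	MOUNT_BUSY	128	/* value assumed for this sketch */

/* Exit status a helper might use after mount(2) fails with err. */
static int
mount_exit_status(int err)
{
	if (err == 0)
		return (MOUNT_SUCCESS);
	if (err == EBUSY)
		return (MOUNT_BUSY);	/* a concurrent mount won the race */
	return (MOUNT_SYSERR);
}

/* The automount caller treats "already mounted by someone else" as done. */
static bool
automount_succeeded(int exit_status)
{
	return (exit_status == MOUNT_SUCCESS || exit_status == MOUNT_BUSY);
}
```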
Marcel Telka	8ce0af07bb	Illumos #4061 4061 libzfs: memory leak in iter_dependents_cb() Reviewed by: Jeffry Molanus <jeffry.molanus@nexenta.com> Reviewed by: Boris Protopopov <boris.protopopov@nexenta.com> Reviewed by: Andy Stormont <andyjstormont@gmail.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Approved by: Dan McDonald <danmcd@nexenta.com> References: https://www.illumos.org/issues/4061 illumos/illumos-gate@2fbdf8dbf0 Ported-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #1775	2013-11-05 12:24:10 -08:00
Matthew Ahrens	46ba1e59d3	Illumos #3996 3996 want a libzfs_core API to rollback to latest snapshot Reviewed by: Christopher Siden <christopher.siden@delphix.com> Reviewed by: Adam Leventhal <ahl@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Andy Stormont <andyjstormont@gmail.com> Approved by: Richard Lowe <richlowe@richlowe.net> References: https://www.illumos.org/issues/3996 illumos/illumos-gate@a7027df17f Ported-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #1775	2013-11-05 12:23:11 -08:00
Steven Hartland	6389d42205	Illumos #3909 3909 "zfs send -D" does not work Reviewed by: Matthew Ahrens <mahrens@delphix.com> Approved by: Christopher Siden <christopher.siden@delphix.com> References: https://www.illumos.org/issues/3909 illumos/illumos-gate@36f7455d36 Ported-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #1775	2013-11-05 12:15:11 -08:00
Keith M Wesolowski	96c2e96193	Illumos #3894 3894 zfs should not allow snapshot of inconsistent dataset Reviewed by: Matthew Ahrens <mahrens@delphix.com> Approved by: Gordon Ross <gwr@nexenta.com> References: https://www.illumos.org/issues/3894 illumos/illumos-gate@ca48f36f20 Ported-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #1775	2013-11-04 11:18:14 -08:00
Matthew Ahrens	1a077756e8	Illumos #3829 3829 fix for 3740 changed behavior of zfs destroy/hold/release ioctl Reviewed by: Matt Amdur <matt.amdur@delphix.com> Reviewed by: Christopher Siden <christopher.siden@delphix.com> Approved by: Richard Lowe <richlowe@richlowe.net> References: https://www.illumos.org/issues/3829 illumos/illumos-gate@bb6e70758d Ported-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #1775	2013-11-04 11:18:14 -08:00
Steven Hartland	34ffbed88c	Illumos #3818 3818 zpool status -x should report pools with removed l2arc devices Reviewed by: Saso Kiselkov <skiselkov.ml@gmail.com> Reviewed by: George Wilson <gwilson@zfsmail.com> Approved by: Christopher Siden <christopher.siden@delphix.com> References: https://www.illumos.org/issues/3818 illumos/illumos-gate@7f2416ef64 Ported-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #1775	2013-11-04 11:18:14 -08:00
Steven Hartland	95fd54a1c5	Illumos #3740 3740 Poor ZFS send / receive performance due to snapshot hold / release processing Reviewed by: Matthew Ahrens <mahrens@delphix.com> Approved by: Christopher Siden <christopher.siden@delphix.com> References: https://www.illumos.org/issues/3740 illumos/illumos-gate@a7a845e4bf Ported-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #1775 Porting notes: 1. `13fe019870` introduced a merge conflict in dsl_dataset_user_release_tmp where some variables were moved outside of the preprocessor directive. 2. dea9dfefdd747534b3846845629d2200f0616dad made the previous merge conflict worse by switching KM_SLEEP to KM_PUSHPAGE. This is notable because this commit refactors the code, adding a new KM_SLEEP allocation. It is not clear to me whether this should be converted to KM_PUSHPAGE. 3. We had a merge conflict in libzfs_sendrecv.c because of copyright notices. 4. Several small C99 compatibility fixes were made.	2013-11-04 11:17:48 -08:00
Will Andrews	7bc7f25040	Illumos #3745 , #3811 3745 zpool create should treat -O mountpoint and -m the same 3811 zpool create -o altroot=/xyz -O mountpoint=/mnt ignores the mountpoint option Reviewed by: Matthew Ahrens <mahrens@delphix.com> Approved by: Christopher Siden <christopher.siden@delphix.com> References: https://www.illumos.org/issues/3745 https://www.illumos.org/issues/3811 illumos/illumos-gate@8b71377531 Ported-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #1775	2013-11-04 10:55:25 -08:00
Will Andrews	e49f1e20a0	Illumos #3741 3741 zfs needs better comments Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Eric Schrock <eric.schrock@delphix.com> Approved by: Christopher Siden <christopher.siden@delphix.com> References: https://www.illumos.org/issues/3741 illumos/illumos-gate@3e30c24aee Ported-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #1775	2013-11-04 10:55:25 -08:00
Martin Matuska	b1118acbb1	Illumos #3699 , #3739 3699 zfs hold or release of a non-existent snapshot does not output error 3739 cannot set zfs quota or reservation on pool version < 22 Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Eric Shrock <eric.schrock@delphix.com> Approved by: Dan McDonald <danmcd@nexenta.com> References: https://www.illumos.org/issues/3699 https://www.illumos.org/issues/3739 illumos/illumos-gate@013023d4ed Ported-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #1775	2013-11-04 10:55:25 -08:00
Ralf Ertzinger	8b921f667a	Introduce zpool_get_prop_literal interface This change introduces zpool_get_prop_literal. It's an expanded version of zpool_get_prop taking one additional boolean parameter. With this parameter set to B_FALSE it will behave identically to zpool_get_prop. Setting it to B_TRUE will return full precision numbers for the following properties: ZPOOL_PROP_SIZE, ZPOOL_PROP_ALLOCATED, ZPOOL_PROP_FREE, ZPOOL_PROP_FREEING, ZPOOL_PROP_EXPANDSZ, ZPOOL_PROP_ASHIFT. Also introduced is a wrapper function for zpool_get_prop making it use zpool_get_prop_literal in the background. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #1813	2013-10-28 14:27:53 -07:00
Brian Behlendorf	11cb9d773f	Increase default udev wait time When creating a new pool, or adding/replacing a disk in an existing pool, partition tables will be automatically created on the devices. Under normal circumstances it will take less than a second for udev to create the expected device files under /dev/. However, it has been observed that if the system is doing heavy IO concurrently udev may take far longer. If you also throw in some cheap dodgy hardware it may take even longer. To prevent zpool commands from failing due to this, the default wait time for udev is being increased to 30 seconds. This will have no impact on normal usage; the increased timeout should only be noticed if your udev rules are incorrectly configured. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1646	2013-10-22 10:25:51 -07:00
Richard Yao	a6ce1eae54	Fix libzfs_core changes to follow GNU libtool guidelines The GNU libtool documentation states to start with a version of 0:0:0, rather than 1:1:0. Illumos uses the name libzfs_core.so.1, so to be consistent, we should go with 1:0:0. http://www.gnu.org/software/libtool/manual/libtool.html#Updating-version-info The GNU libtool documentation also provides guidance on how the version information should be incremented. Doing this does a SONAME bump of the libzfs and libzpool libraries. This is particularly important on Gentoo because a SONAME bump enables portage to retain the older libraries until any packages that link to them are rebuilt. The main example of this is GRUB2's grub2-mkconfig, which will break unless it is rebuilt against the new libraries. Signed-off-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #1751	2013-10-10 16:56:51 -07:00
Richard Yao	31fc19399e	Generate libraries with correct DT_NEEDED entries Libraries that depend on other libraries should list them in ELF's DT_NEEDED field so that programs linking to them do not need to specify those libraries unless they depend on them as well. This is not the case in the current code and the consequence is that anything that needs a library must know its dependencies. This is fragile and caused GRUB2's configure script to break when a dependency was added on libblkid in libzfs. This resolves that problem by using LIBADD/LDADD to specify libraries in Makefile.am instead of LDFLAGS. This ensures that proper DT_NEEDED entries are generated and prevents GRUB2's configure script from breaking in the presence of a libblkid dependency. This also removes unneeded dependencies from various files. Signed-off-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #1751	2013-10-10 16:56:51 -07:00
Richard Yao	1db7b9be75	Fix libblkid support libblkid support is dormant because the autotools check is broken and libblkid identifies ZFS vdevs as "zfs_member", not "zfs". We fix that with a few changes: First, we fix the libblkid autotools check to do a few things: 1. Make a 64MB file, which is the minimum size ZFS permits. 2. Make 4 fake uberblock entries to make libblkid's check succeed. 3. Return 0 upon success to make autotools use the success case. 4. Include stdlib.h to avoid implicit declaration of free(). 5. Check for "zfs_member", not "zfs". 6. Make --with-blkid disable autotools check (avoids Gentoo sandbox violation). 7. Pass '-lblkid' correctly using LIBS not LDFLAGS. Second, we change the libblkid support to scan for "zfs_member", not "zfs". This makes --with-blkid work on Gentoo. Signed-off-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #1751	2013-10-10 16:56:51 -07:00
Matthew Ahrens	13fe019870	Illumos #3464 3464 zfs synctask code needs restructuring Reviewed by: Dan Kimmel <dan.kimmel@delphix.com> Reviewed by: Adam Leventhal <ahl@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Christopher Siden <christopher.siden@delphix.com> Approved by: Garrett D'Amore <garrett@damore.org> References: https://www.illumos.org/issues/3464 illumos/illumos-gate@3b2aab1880 Ported-by: Tim Chase <tim@chase2k.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1495	2013-09-04 16:01:24 -07:00
Matthew Ahrens	6f1ffb0665	Illumos #2882 , #2883 , #2900 2882 implement libzfs_core 2883 changing "canmount" property to "on" should not always remount dataset 2900 "zfs snapshot" should be able to create multiple, arbitrary snapshots at once Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Chris Siden <christopher.siden@delphix.com> Reviewed by: Garrett D'Amore <garrett@damore.org> Reviewed by: Bill Pijewski <wdp@joyent.com> Reviewed by: Dan Kruchinin <dan.kruchinin@gmail.com> Approved by: Eric Schrock <Eric.Schrock@delphix.com> References: https://www.illumos.org/issues/2882 https://www.illumos.org/issues/2883 https://www.illumos.org/issues/2900 illumos/illumos-gate@4445fffbbb Ported-by: Tim Chase <tim@chase2k.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1293 Porting notes: WARNING: This patch changes the user/kernel ABI. That means that the zfs/zpool utilities built from master are NOT compatible with the 0.6.2 kernel modules. Ensure you load the matching kernel modules from master after updating the utilities. Otherwise the zfs/zpool commands will be unable to interact with your pool and you will see errors similar to the following: $ zpool list failed to read pool configuration: bad address no pools available $ zfs list no datasets available Add zvol minor device creation to the new zfs_snapshot_nvl function. Remove the logging of the "release" operation in dsl_dataset_user_release_sync(). The logging caused a null dereference because ds->ds_dir is zeroed in dsl_dataset_destroy_sync() and the logging functions try to get the ds name via the dsl_dataset_name() function. I've got no idea why this particular code would have worked in Illumos. This code has subsequently been completely reworked in Illumos commit 3b2aab1 (3464 zfs synctask code needs restructuring). Squash some "may be used uninitialized" warnings/errors. Fix some printf format warnings for %lld and %llu. Apply a few spa_writeable() changes that were made to Illumos in illumos/illumos-gate.git@cd1c8b8 as part of the 3112, 3113, 3114 and 3115 fixes. Add a missing call to fnvlist_free(nvl) in log_internal() that was added in Illumos to fix issue 3085 but couldn't be ported to ZoL at the time (zfsonlinux/zfs@9e11c73) because it depended on future work.	2013-09-04 15:49:00 -07:00
Turbo Fredriksson	abbfdca483	No point in rewind() mtab in zfs_unshare_proto(). We're not really reading the file, but instead use libzfs_mnttab_find() which does the necessary freopen() for us. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #1498	2013-08-15 10:18:31 -07:00
John Layman	fb5c53ea65	Fix for re-reading /etc/mtab in zfs_is_mounted() When /etc/mtab is updated on Linux it's done atomically with rename(2). A new mtab is written, the existing mtab is unlinked, and the new mtab is renamed to /etc/mtab. This means that we must close the old file and open the new file to get the updated contents. Using rewind(3) will just move the file pointer back to the start of the file; freopen(3) will close and open the file. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1611	2013-08-14 11:37:06 -07:00
Yuri Pankov	105afebb15	Illumos #3098 zfs userspace/groupspace fail 3098 zfs userspace/groupspace fail without saying why when run as non-root Reviewed by: Eric Schrock <eric.schrock@delphix.com> Approved by: Richard Lowe <richlowe@richlowe.net> References: https://www.illumos.org/issues/3098 illumos/illumos-gate@70f56fa693 Ported-by: Tim Chase <tim@chase2k.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1596	2013-08-14 09:28:34 -07:00
Brian Behlendorf	ff3510c1a5	Fix zpool_read_label() The zpool_read_label() function was subtly broken due to a difference of behavior in fstat64(2) on Solaris vs Linux. Under Solaris when a block device is stat'ed the st_size field will contain the size of the device in bytes. Under Linux this is only true for regular files and symlinks. A compatibility function called fstat64_blk(2) was added which can be used when the Solaris behavior is required. This flaw was never noticed because the only time we would need to use the device size is when the first two labels are damaged. I noticed this issue while adding the zpool_clear_label() function which is similar in design and does require us to write all the labels. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2013-07-09 16:02:04 -07:00
Dmitry Khasanov	131cc95ca7	Add FreeBSD 'zpool labelclear' command The FreeBSD implementation of zfs adds the 'zpool labelclear' command. Since this functionality is helpful and straightforward to add it is being included in ZoL. References: freebsd/freebsd@119a041dc9 Ported-by: Dmitry Khasanov <pik4ez@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1126	2013-07-09 15:58:05 -07:00
Dmitry Khasanov	51a3ae72d2	Readd zpool_clear_label() from OpenSolaris This patch restores the zpool_clear_label() function from OpenSolaris. This was removed by commit `d603ed6` because it wasn't clear we had a use for it in ZoL. However, this functionality is a prerequisite for adding the 'zpool labelclear' command from FreeBSD. As part of bringing this change in, the zpool_clear_label() function was changed to use fstat64_blk(2) for compatibility with Linux. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #1126	2013-07-09 15:42:27 -07:00
Aaron Fineman	bbb75c1190	Add error message for missing /etc/mtab The zpool command should not silently fail when the /etc/mtab file does not exist. This can occur in an initramfs environment when the /etc/mtab file hasn't yet been generated. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1541	2013-06-27 14:43:37 -07:00
George Wilson	295304bed6	Illumos #3422 , #3425 3422 zpool create/syseventd race yield non-importable pool 3425 first write to a new zvol can fail with EFBIG Reviewed by: Adam Leventhal <ahl@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Approved by: Garrett D'Amore <garrett@damore.org> References: illumos/illumos-gate@bda8819455 https://www.illumos.org/issues/3422 https://www.illumos.org/issues/3425 Ported-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1390	2013-04-12 09:01:36 -07:00
Eric Dillmann	0b4d1b5853	Add snapdev=[hidden|visible] dataset property The new snapdev dataset property may be set to control the visibility of zvol snapshot devices. By default this value is set to 'hidden' which will prevent zvol snapshots from appearing under /dev/zvol/ and /dev/<dataset>/. When set to 'visible' all zvol snapshots for the dataset will be visible. This functionality was largely added because when automatic snapshotting is enabled large numbers of read-only zvol snapshots will be created. When creating these devices the kernel will attempt to read their partition tables, and blkid will attempt to identify any filesystems on those partitions. This leads to a variety of issues: 1) The zvol partition tables will be read in the context of the `modprobe zfs` for automatically imported pools. This is undesirable and should be done asynchronously, but for now reducing the number of visible devices helps. 2) Udev expects to be able to complete its work for new block devices fairly quickly. When many zvol devices are added at the same time this is no longer true. It can lead to udev timeouts and missing /dev/zvol links. 3) Simply having lots of devices in /dev/ can be awkward from a management standpoint. Hiding the devices you're unlikely to ever use helps with this. Any snapshot device which is needed can be made visible by changing the snapdev property. NOTE: This patch changes the default behavior for zvols which was effectively 'snapdev=visible'. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1235 Closes #945 Issue #956 Issue #756	2013-03-05 12:37:54 -08:00
Brian Behlendorf	dbf763b39b	Retire zpool_id infrastructure In the interest of maintaining only one udev helper to give vdevs user friendly names, the zpool_id and zpool_layout infrastructure is being retired. They are superseded by vdev_id which incorporates all the previous functionality. Documentation for the new vdev_id(8) helper and its configuration file, vdev_id.conf(5), can be found in their respective man pages. Several useful example files are installed under /etc/zfs/. /etc/zfs/vdev_id.conf.alias.example /etc/zfs/vdev_id.conf.multipath.example /etc/zfs/vdev_id.conf.sas_direct.example /etc/zfs/vdev_id.conf.sas_switch.example Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #981	2013-01-29 12:23:17 -08:00
Brian Behlendorf	14ee71efbc	Use strerror() not strerror_r() The differ() function used strerror_r() instead of strerror() because it allowed the error message to be directly copied into a buffer. This causes two issues under Linux. * There are two versions of strerror_r() available: an XSI-compliant version which returns an 'int' error code, and a GNU-specific version which returns a 'char *' to the resulting error string. int strerror_r(int errnum, char *buf, size_t buflen); /* XSI */ char *strerror_r(int errnum, char *buf, size_t buflen); /* GNU */ * The most recent versions of strerror_r() are annotated with the warn_unused_result attribute. This causes the following warning since the upstream implementation casts the result to void. warning: ignoring return value of 'strerror_r', declared with attribute warn_unused_result [-Wunused-result] The cleanest way to resolve both of these problems is just to use strerror() and make a copy of the result into the buffer. This resolves both issues and this is the only instance of strerror_r() in the code base. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1231	2013-01-28 10:02:38 -08:00
Darik Horn	38145d6129	Ensure that zfs diff prints unicode safely. In the stream_bytes() library function used by `zfs diff`, explicitly cast each byte in the input string to an unsigned character so that the Linux fprintf() correctly escapes to octal and does not mangle the output. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1172	2013-01-16 10:15:57 -08:00
Christopher Siden	b9b24bb4ca	Illumos #2762 : zpool command should have better support for feature flags 2762 zpool command should have better support for feature flags Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Approved by: Eric Schrock <Eric.Schrock@delphix.com> References: illumos/illumos-gate@57221772c3 https://www.illumos.org/issues/2762 Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>	2013-01-08 10:35:43 -08:00
George Wilson	3bc7e0fb0f	Illumos #3090 and #3102 3090 vdev_reopen() during reguid causes vdev to be treated as corrupt 3102 vdev_uberblock_load() and vdev_validate() may read the wrong label Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Christopher Siden <chris.siden@delphix.com> Reviewed by: Garrett D'Amore <garrett@damore.org> Approved by: Eric Schrock <Eric.Schrock@delphix.com> References: illumos/illumos-gate@dfbb943217 illumos changeset: 13777:b1e53580146d https://www.illumos.org/issues/3090 https://www.illumos.org/issues/3102 Ported-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #939	2013-01-08 10:35:42 -08:00
Christopher Siden	9ae529ec5d	Illumos #2619 and #2747 2619 asynchronous destruction of ZFS file systems 2747 SPA versioning with zfs feature flags Reviewed by: Matt Ahrens <mahrens@delphix.com> Reviewed by: George Wilson <gwilson@delphix.com> Reviewed by: Richard Lowe <richlowe@richlowe.net> Reviewed by: Dan Kruchinin <dan.kruchinin@gmail.com> Approved by: Eric Schrock <Eric.Schrock@delphix.com> References: illumos/illumos-gate@53089ab7c8 illumos/illumos-gate@ad135b5d64 illumos changeset: 13700:2889e2596bd6 https://www.illumos.org/issues/2619 https://www.illumos.org/issues/2747 NOTE: The grub specific changes were not ported. This change must be made to the Linux grub packages. Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>	2013-01-08 10:35:35 -08:00
Massimo Maggi	5e6320cd12	Fix get/set users/groups in quota props via numeric id Fix setting/getting users/groups in quota properties through numeric identifier. This support was accidentally disabled in the original port by applying the HAVE_IDMAP wrapper macro too broadly. Fix obtained by moving #ifdef HAVE_IDMAP to exclude only the part of code that really needs IDMAP. Now zfs (get|set) (user|group)quota@1000 works as expected. Signed-off-by: Massimo Maggi <massimo@mmmm.it> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1147	2012-12-17 09:52:58 -08:00
Jorgen Lundman	53c2ec1d1b	Fix 'zpool create' segfault due to bad syntax Incorrect syntax should never cause a segfault. In this case listing multiple comma delimited options after '-o' triggered the problem. For example: zpool create -o ashift=12,listsnaps=on This patch resolves the issue by wrapping the calls which use hdr with a NULL test. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1118	2012-12-04 11:15:25 -08:00
Turbo Fredriksson	645fb9cc21	Implemented sharing datasets via SMB using libshare Add the initial support for the 'smbshare' option using the existing libshare infrastructure. Because this implementation relies on usershares, Samba version 3.0.23 is required. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #493	2012-12-03 09:42:15 -08:00
Brian Behlendorf	c372b36e3e	Allow GPT+EFI vdevs for root pools Commit `57a4edd` allows the bootfs property to be set on any pool. However, many of the zpool commands still prevent you from using EFI labeled devices for the root pool. For example: # zpool attach rpool /dev/sda /dev/sdb cannot label 'sdb': EFI labeled devices are not supported on root pools. on root devices. For non-Solaris builds such as Linux disable this error. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1077	2012-11-30 13:45:14 -08:00
Brian Behlendorf	0e20a31b4b	Recreate minors when renaming zvols When a zvol with snapshots is renamed the device files under /dev/zvol/ are not renamed. This patch resolves the problem by destroying and recreating the minors with the new name so the links can be recreated by udev. Original-patch-by: Suman Chakravartula <schakrava@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #408	2012-11-19 16:59:44 -08:00
Brian Behlendorf	30b937ee15	Update spare and cache device names on import During 'zpool import' all ZPOOL_CONFIG_PATH names are supposed to be updated by fix_paths(). This was not happening for spare and cache devices because the proper names were getting filtered out of the pool_list_t->names. Interestingly, the names were being filtered because the spare and cache devices do not contain the pool name in their vdev label. The fix is to exclude the device path from the list only if: 1) it has a valid ZPOOL_CONFIG_POOL_NAME key in the label, and 2) that pool name does not match the specified pool name. Since the label is valid and because it does properly store the vdev guid it will be correctly assembled without the pool name. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #725	2012-10-22 08:46:02 -07:00
Brian Behlendorf	eac4720465	Allow 'zpool replace' to use short device names The 'zpool replace' command would fail when given a short name because unlike on other platforms the short name cannot be deterministically expanded to a single path. Multiple path prefixes must be checked and in addition the partition suffix for whole disks is determined by the prefix. To handle this complexity a zfs_strcmp_pathname() function was added which takes either a short or fully qualified device name. Short names will be expanded using the prefixes in the default import search path, or the ZPOOL_IMPORT_PATH environment variable if it's defined. All possible expansions are then compared against the comparison path. Care is taken to strip redundant slashes to ensure legitimate matches are not missed. In the context of this work the existing zfs_resolve_shortname() function was extended to consider the ZPOOL_IMPORT_PATH when set. The zfs_append_partition() interface was also simplified to take only a single buffer. The vast majority of these changes rework existing Linux specific code which was originally written to accommodate udev. However, there is some minimal cleanup which removes Illumos specific code. This was done to improve readability but the basic flow and intent of the upstream code was maintained. These changes are the logical conclusion of the previous work to adjust the 'zpool import' search behavior, see commit 44867b6a. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #544 Closes #976	2012-10-22 08:45:58 -07:00
Matthew Ahrens	04434775b7	Illumos #3100 : zvol rename fails with EBUSY when dirty. illumos/illumos-gate@2e2c135528 Illumos changeset: 13780:6da32a929222 3100 zvol rename fails with EBUSY when dirty Reviewed by: Christopher Siden <chris.siden@delphix.com> Reviewed by: Adam H. Leventhal <ahl@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Garrett D'Amore <garrett@damore.org> Approved by: Eric Schrock <eric.schrock@delphix.com> Ported-by: Etienne Dechamps <etienne.dechamps@ovh.net> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #995	2012-10-03 13:59:02 -07:00
Bill Pijewski	37abac6d55	Illumos #2703 : add mechanism to report ZFS send progress Reviewed by: Matt Ahrens <matt@delphix.com> Reviewed by: Robert Mustacchi <rm@joyent.com> Reviewed by: Richard Lowe <richlowe@richlowe.net> Approved by: Eric Schrock <Eric.Schrock@delphix.com> References: https://www.illumos.org/issues/2703 Ported by: Martin Matuska <martin@matuska.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2012-09-19 13:39:06 -07:00
Chris Siden	1bd201e70d	Illumos #1948 : zpool list should show more detailed pool info Reviewed by: Adam Leventhal <ahl@delphix.com> Reviewed by: Matt Ahrens <matt@delphix.com> Reviewed by: Eric Schrock <eric.schrock@delphix.com> Reviewed by: Richard Lowe <richlowe@richlowe.net> Reviewed by: Albert Lee <trisk@nexenta.com> Reviewed by: Dan McDonald <danmcd@nexenta.com> Reviewed by: Garrett D'Amore <garrett@damore.org> Approved by: Eric Schrock <eric.schrock@delphix.com> References: https://www.illumos.org/issues/1948 Ported by: Martin Matuska <martin@matuska.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #685	2012-09-19 13:39:05 -07:00
Brian Behlendorf	0a2f7b3662	Seg fault 'zpool import -d /dev/disk/by-id -a' Introduced by commit `44867b6d6e`. We should of course check to ensure best isn't NULL before attempting to dereference it. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #974	2012-09-18 12:33:37 -07:00
Brian Behlendorf	44867b6d6e	Improve `zpool import` search behavior The goal of this change is to make 'zpool import' prefer to use the persistent /dev/mapper or /dev/disk/by-* paths. These are far preferable to the devices in /dev/ whose names are not persistent and are determined by the order in which a device is detected. This patch improves things by changing the default search path from just the top level /dev/ directory to (in order): /dev/disk/by-vdev - Custom rules, use first if they exist; /dev/disk/zpool - Custom rules, use first if they exist; /dev/mapper - Use multipath devices before components; /dev/disk/by-uuid - Single unique entry and persistent; /dev/disk/by-id - May be multiple entries and persistent; /dev/disk/by-path - Encodes physical location and persistent; /dev/disk/by-label - Custom persistent labels; /dev - UNSAFE device names will change. The default search path can be overridden by setting the ZPOOL_IMPORT_PATH environment variable. This must be a colon delimited list of paths which are searched for vdevs. If the 'zpool import -d' option is specified only those listed paths will be searched. Finally, when multiple paths to the same device are found, if one of the paths is an exact match for the path used last time to import the pool it will be used. When there are no exact matches the preferred path will be determined by the provided search order. This means you can still import a pool and force specific names by providing the -d <path> option. And the preferred names will persist as long as those paths exist on your system. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #965	2012-09-17 13:49:07 -07:00
Michael Martin	fc24f7c887	Fix missing vdev names in zpool status output Commit `858219c` makes more sense down below in the 'if (verbose)' section of the code. Initially, buf and path will never point to the same location. Once 'path = buf' is set on a raidz vdev, the code may drop into the verbose section depending on the verbose flag. In here, using a tmpbuf makes sense since now 'buf == path'. This issue does not occur in the upstream Solaris code because their implementations of snprintf() allow for buf and path to be the same address. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #57	2012-09-05 22:09:12 -07:00
Brian Behlendorf	ca8b5af89d	Remove autotools products Remove all of the generated autotools products from the repository and update the .gitignore files accordingly. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #718	2012-08-27 11:47:44 -07:00
Garrett D'Amore	08b1b21d58	Illumos #2803 : zfs get guid pretty-prints the output Reviewed by: Eric Schrock <eric.schrock@delphix.com> Reviewed by: Richard Elling <richard.elling@gmail.com> Reviewed by: Alexander Eremin <alexander.eremin@nexenta.com> Approved by: Dan McDonald <danmcd@nexenta.com> References: https://www.illumos.org/issues/2803 Ported by: Martin Matuska <martin@matuska.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2012-08-23 10:40:14 -07:00
Christopher Siden	e956d65106	Illumos #1796 , #2871 , #2903 , #2957 1796 "ZFS HOLD" should not be used when doing "ZFS SEND" from a read-only pool 2871 support for __ZFS_POOL_RESTRICT used by ZFS test suite 2903 zfs destroy -d does not work 2957 zfs destroy -R/r sometimes fails when removing defer-destroyed snapshot Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Approved by: Eric Schrock <Eric.Schrock@delphix.com> References: https://www.illumos.org/issues/1796 https://www.illumos.org/issues/2871 https://www.illumos.org/issues/2903 https://www.illumos.org/issues/2957 Ported by: Martin Matuska <martin@matuska.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2012-08-23 10:40:02 -07:00
Eric Schrock	db49968e5c	Illumos #2635 : 'zfs rename -f' to perform force unmount Reviewed by: Matt Ahrens <matt@delphix.com> Reviewed by: George Wilson <George.Wilson@delphix.com> Reviewed by: Bill Pijewski <wdp@joyent.com> Reviewed by: Richard Elling <richard.elling@richardelling.com> Approved by: Richard Lowe <richlowe@richlowe.net> References: https://www.illumos.org/issues/2635 Ported by: Martin Matuska <martin@matuska.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #717	2012-08-23 10:39:43 -07:00
Martin Matuska	cf997d797b	Properly initialize and free destroydata This regression was accidentally introduced by commit `330d06f90d` due to ZoL specific code. The fix is to simply ensure the passed nvlist is initialized and freed. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #876	2012-08-23 09:42:21 -07:00
Dan McDonald	d96eb2b153	Illumos #1693 : persistent 'comment' field for a zpool Reviewed by: George Wilson <gwilson@zfsmail.com> Reviewed by: Eric Schrock <eric.schrock@delphix.com> Approved by: Richard Lowe <richlowe@richlowe.net> References: https://www.illumos.org/issues/1693 Ported by: Martin Matuska <martin@matuska.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #678	2012-08-08 11:49:37 -07:00
Etienne Dechamps	ee5fd0bb80	Set zvol discard_granularity to the volblocksize. Currently, zvols have a discard granularity set to 0, which suggests to the upper layer that discard requests of arbitrarily small size and alignment can be made efficiently. In practice, however, ZFS does not handle unaligned discard requests efficiently: indeed, it is unable to free a part of a block. It will write zeros to the specified range instead, which is both useless and inefficient (see dnode_free_range). With this patch, zvol block devices expose volblocksize as their discard granularity, so the upper layer is aware that it's not supposed to send discard requests smaller than volblocksize. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #862	2012-08-07 14:55:31 -07:00
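The following sketch shows why the granularity matters. It computes the part of a discard request that covers whole volblocksize blocks. Only that part could actually be freed, and the partial head and tail would fall back to the zeroing path the commit describes. The function name `discard_aligned_range()` is hypothetical and is not part of the zvol code.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: shrink [off, off + len) to the largest sub-range
 * made of whole blksz-sized blocks. An empty result (start == end) means
 * the request does not cover any complete block.
 */
static void discard_aligned_range(uint64_t off, uint64_t len, uint64_t blksz,
    uint64_t *start, uint64_t *end)
{
	uint64_t s, e;

	assert(blksz > 0);
	s = (off + blksz - 1) / blksz * blksz;	/* round start up */
	e = (off + len) / blksz * blksz;	/* round end down */
	if (e < s)
		e = s;		/* smaller than one block: nothing to free */
	*start = s;
	*end = e;
}
```

With volblocksize=8K, a 16 KiB discard starting at offset 4 KiB covers only one whole block, [8 KiB, 16 KiB). A request that advertised granularity would have prevented is one smaller than a single block, which yields an empty range here.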
Matthew Ahrens	330d06f90d	Illumos #1644 , #1645 , #1646 , #1647 , #1708 1644 add ZFS "clones" property 1645 add ZFS "written" and "written@..." properties 1646 "zfs send" should estimate size of stream 1647 "zfs destroy" should determine space reclaimed by destroying multiple snapshots 1708 adjust size of zpool history data References: https://www.illumos.org/issues/1644 https://www.illumos.org/issues/1645 https://www.illumos.org/issues/1646 https://www.illumos.org/issues/1647 https://www.illumos.org/issues/1708 This commit modifies the user to kernel space ioctl ABI. Extra care should be taken when updating to ensure both the kernel modules and utilities are updated. This change has reordered all of the new ioctl()s to the end of the list. This should help minimize this issue in the future. Reviewed by: Richard Lowe <richlowe@richlowe.net> Reviewed by: George Wilson <gwilson@zfsmail.com> Reviewed by: Albert Lee <trisk@opensolaris.org> Approved by: Garrett D'Amore <garret@nexenta.com> Ported by: Martin Matuska <martin@matuska.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #826 Closes #664	2012-07-31 09:25:30 -07:00
Etienne Dechamps	f09398cec6	Use /sys/module instead of /proc/modules. When libzfs checks if the module is loaded or not, it currently reads /proc/modules and searches for a line matching the module name. Unfortunately, if the module is included in the kernel itself (built-in module), then /proc/modules won't list it, so libzfs will wrongly conclude that the module is not loaded, thus making all ZFS userspace tools unusable. Fortunately, all loaded modules appear as directories in /sys/module, even built-in ones. Thus we can use /sys/module in lieu of /proc/modules to fix the issue. As a bonus, the code for checking becomes much simpler. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #851	2012-07-26 13:45:33 -07:00
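Below is a minimal sketch of the /sys/module check in C. It passes the module directory in as a parameter so the function can be exercised without ZFS loaded. `module_is_loaded()` is a hypothetical name, and libzfs's actual helper may be structured differently.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: report whether <module_dir>/<name> exists as a
 * directory. Per the commit, every loaded module, including built-in
 * ones, shows up under /sys/module, whereas /proc/modules omits built-ins.
 */
static int module_is_loaded(const char *module_dir, const char *name)
{
	char path[4096];
	struct stat st;
	int n;

	assert(module_dir != NULL && name != NULL);
	n = snprintf(path, sizeof (path), "%s/%s", module_dir, name);
	if (n < 0 || (size_t)n >= sizeof (path))
		return (0);	/* path too long: treat as not loaded */
	return (stat(path, &st) == 0 && S_ISDIR(st.st_mode));
}
```

On a live system the call would be `module_is_loaded("/sys/module", "zfs")`. One `stat()` replaces scanning `/proc/modules` line by line, which matches the commit's remark that the check becomes simpler.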
Richard Yao	739a1a82e0	Linux 3.5 compat, end_writeback() changed to clear_inode() The end_writeback() function was changed by moving the call to inode_sync_wait() earlier into evict(). This effectively changes the ordering of the sync, but it does not impact the details of the zfs implementation. However, as part of this change end_writeback() was renamed to clear_inode() to reflect the new semantics. This change does impact us, and clear_inode() now maps to end_writeback() for kernels prior to 3.5. Signed-off-by: Richard Yao <ryao@cs.stonybrook.edu> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #784	2012-07-23 12:29:36 -07:00
Richard Yao	ea1fdf46e2	Linux 3.5 compat, iops->truncate_range() removed The vmtruncate_range() support has been removed from the kernel in favor of using the fallocate method in the file_operations table. Signed-off-by: Richard Yao <ryao@cs.stonybrook.edu> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #784	2012-07-23 12:29:32 -07:00
Richard Yao	756c3e5a9c	Linux 3.5 compat, eops->encode_fh() takes inodes The export_operations member ->encode_fh() has been updated to take both the child and parent inodes. This interface used to take the child dentry and a bool describing if the parent is needed. NOTE: While updating this code I noticed that we do not currently cleanly handle the case where we're passed a connectable parent. This code should be audited to make sure we're doing the right thing. Signed-off-by: Richard Yao <ryao@cs.stonybrook.edu> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #784	2012-07-23 12:29:23 -07:00
Etienne Dechamps	b5a28807cd	Move partition scanning from userspace to module. Currently, zpool online -e (dynamic vdev expansion) doesn't work on whole disks because we're invoking ioctl(BLKRRPART) from userspace while ZFS still has a partition open on the disk, which results in EBUSY. This patch moves the BLKRRPART invocation from the zpool utility to the module. Specifically, this is done just before opening the device in vdev_disk_open() which is called inside vdev_reopen(). This requires jumping through some hoops to get to the disk device from the partition device, and to make sure we can still open the partition after the BLKRRPART call. Note that this new code path is triggered on dynamic vdev expansion only; other actions, like creating a new pool, are unchanged and still call BLKRRPART from userspace. This change also depends on API changes which are available in 2.6.37 and later kernels. The build system has been updated to detect this, but there is no compatibility mode for older kernels. This means that online expansion will NOT be available in older kernels. However, it will still be possible to expand the vdev offline. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #808	2012-07-17 09:17:31 -07:00
Etienne Dechamps	7608bd0dd0	Use the right device path when relabeling. Currently, zpool_vdev_online() calls zpool_relabel_disk() with a short partition device name, which is obviously wrong because (1) zpool_relabel_disk() expects a full, absolute path to use with open() and (2) efi_write() must be called on an opened disk device, not a partition device. With this patch, zpool_relabel_disk() gets called with a full disk device path. The path is determined using the same algorithm as zpool_find_vdev(). Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #808	2012-07-12 08:59:16 -07:00
Etienne Dechamps	8adf486422	Fix error handling for "zpool online -e". The error handling code around zpool_relabel_disk() is either nonexistent or wrong. The function call itself is not checked, and zpool_relabel_disk() is generating error messages from an uninitialized buffer. Before: # zpool online -e homez sdb; echo $? `: cannot relabel 'sdb1': unable to open device: 2 0 After: # zpool online -e homez sdb; echo $? cannot expand sdb: cannot relabel 'sdb1': unable to open device: 2 1 Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #808	2012-07-12 08:58:19 -07:00
George Wilson	c7f2d69de3	Illumos #1949 , #1953 1949 crash during reguid causes stale config 1953 allow and unallow missing from zpool history since removal of pyzfs Reviewed by: Adam Leventhal <ahl@delphix.com> Reviewed by: Matt Ahrens <matt@delphix.com> Reviewed by: Eric Schrock <eric.schrock@delphix.com> Reviewed by: Bill Pijewski <wdp@joyent.com> Reviewed by: Richard Lowe <richlowe@richlowe.net> Reviewed by: Garrett D'Amore <garrett.damore@gmail.com> Reviewed by: Dan McDonald <danmcd@nexenta.com> Reviewed by: Steve Gonczi <gonczi@comcast.net> Approved by: Eric Schrock <eric.schrock@delphix.com> References: https://www.illumos.org/issues/1949 https://www.illumos.org/issues/1953 Ported by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #665	2012-07-11 13:33:31 -07:00
Garrett D'Amore	3541dc6d02	Illumos #1748 : desire support for reguid in zfs Reviewed by: George Wilson <gwilson@zfsmail.com> Reviewed by: Igor Kozhukhov <ikozhukhov@gmail.com> Reviewed by: Alexander Eremin <alexander.eremin@nexenta.com> Reviewed by: Alexander Stetsenko <ams@nexenta.com> Approved by: Richard Lowe <richlowe@richlowe.net> References: https://www.illumos.org/issues/1748 This commit modifies the user to kernel space ioctl ABI. Extra care should be taken when updating to ensure both the kernel modules and utilities are updated. If only the user space component is updated both the 'zpool events' command and the 'zpool reguid' command will not work until the kernel modules are updated. Ported by: Martin Matuska <martin@matuska.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #665	2012-07-11 13:08:56 -07:00
Pawel Jakub Dawidek	0cee24064a	Speed up 'zfs list -t snapshot -o name -s name' FreeBSD #xxx: Dramatically optimize listing snapshots when the user requests only snapshot names and wants to sort them by name, i.e. when executing: # zfs list -t snapshot -o name -s name Because only the name is needed, we don't have to read all snapshot properties. Below you can find how long it takes to list 34509 snapshots from a single disk pool before and after this change with cold and warm cache: before: # time zfs list -t snapshot -o name -s name > /dev/null cold cache: 525s warm cache: 218s after: # time zfs list -t snapshot -o name -s name > /dev/null cold cache: 1.7s warm cache: 1.1s NOTE: This patch only appears in FreeBSD. If/when Illumos picks up the change we may want to drop this patch and adopt their version. However, for now this addresses a real issue. Ported-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #450	2012-06-14 09:49:04 -07:00
Richard Yao	bc98d6c809	Make zvol_remove_link() print a more useful error message Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2012-06-13 16:27:19 -07:00
Daniel Verite	c6327b63e6	Retry removal of busy minors When failing to remove a zvol device link because it's busy, wait a bit and retry in a loop instead of giving up immediately. This technique is similar to the loop in zpool_label_disk_wait(), with the same goal: waiting for the asynchronous udev processes to finish their work. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #692	2012-06-11 10:50:20 -07:00
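The retry idea can be sketched as a generic loop. The sketch assumes the removal step is a callback that returns EBUSY while udev still holds the device. `retry_while_busy()`, the example callback, the attempt count and the delay are all hypothetical. The commit does not state the values it uses.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <time.h>

/* Callback type for the hypothetical removal step; returns 0 or an errno. */
typedef int (*remove_fn_t)(void *arg);

/*
 * Call fn up to `attempts` times, sleeping delay_ms between tries while
 * it keeps returning EBUSY. Any other result ends the loop immediately.
 */
static int retry_while_busy(remove_fn_t fn, void *arg, int attempts,
    long delay_ms)
{
	int error = EBUSY;

	assert(fn != NULL && attempts > 0 && delay_ms >= 0);
	for (int i = 0; i < attempts; i++) {
		error = fn(arg);
		if (error != EBUSY)
			break;		/* success or a non-retryable error */
		if (i + 1 < attempts) {	/* no pointless sleep after the last try */
			struct timespec ts = {
				.tv_sec = delay_ms / 1000,
				.tv_nsec = (delay_ms % 1000) * 1000000L
			};
			(void) nanosleep(&ts, NULL);
		}
	}
	return (error);
}

/* Example callback: reports EBUSY until `remaining` drops to zero. */
struct busy_countdown {
	int remaining;
	int calls;
};

static int countdown_remove(void *arg)
{
	struct busy_countdown *c = arg;

	c->calls++;
	if (c->remaining > 0) {
		c->remaining--;
		return (EBUSY);
	}
	return (0);
}
```

The loop gives up with EBUSY only after the last attempt. Any other error is returned immediately, because waiting does not help there.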
Richard Yao	6a0936babc	Linux 3.4 compat, d_make_root() replaces d_alloc_root() torvalds/linux@adc0e91ab1 introduced d_make_root() as a replacement for d_alloc_root(). Further commits appear to have removed d_alloc_root() from the Linux source tree. This causes the following failure: error: implicit declaration of function 'd_alloc_root' [-Werror=implicit-function-declaration] To correct this we update the code to use the current d_make_root() interface for readability. Then we introduce an autotools check to determine if d_make_root() is available. If it isn't, then we define some compatibility logic which uses the older d_alloc_root() interface. Signed-off-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #776	2012-06-11 10:04:49 -07:00
Brian Behlendorf	abe5b8fb66	Improve 'zpool import' EBUSY error message When a device is already open O_EXCL by another process, `zpool import` will correctly fail. However, the default failure message isn't very helpful. It may in fact be harmful if you take its advice and destroy your pool. cannot import 'tank': pool is busy Destroy and re-create the pool from a backup source. Improve the error message in the EBUSY case to simply print a message indicating that the devices are currently in use. The user will need to manually identify which process has the device open exclusively and why. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2012-06-01 08:55:24 -07:00
Brian Behlendorf	b04c9fc009	Add /dev/mapper/ to search path When creating pools short device names may be used when those devices appear in certain well known locations under /dev/. This change adds /dev/mapper/ to that list. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2012-06-01 08:55:24 -07:00
Ned A. Bass	821b683436	Add vdev_id for JBOD-friendly udev aliases vdev_id parses the file /etc/zfs/vdev_id.conf to map a physical path in a storage topology to a channel name. The channel name is combined with a disk enclosure slot number to create an alias that reflects the physical location of the drive. This is particularly helpful when it comes to tasks like replacing failed drives. Slot numbers may also be re-mapped in case the default numbering is unsatisfactory. The drive aliases will be created as symbolic links in /dev/disk/by-vdev. The only currently supported topologies are sas_direct and sas_switch: o sas_direct - a channel is uniquely identified by a PCI slot and an HBA port o sas_switch - a channel is uniquely identified by a SAS switch port A multipath mode is supported in which dm-mpath devices are handled by examining the first running component disk, as reported by 'multipath -l'. In multipath mode the configuration file should contain a channel definition with the same name for each path to a given enclosure. vdev_id can replace the existing zpool_id script on systems where the storage topology conforms to sas_direct or sas_switch. The script could be extended to support other topologies as well. The advantage of vdev_id is that it is driven by a single static input file that can be shared across multiple nodes having a common storage topology. zpool_id, on the other hand, requires a unique /etc/zfs/zdev.conf per node and a separate slot-mapping file. However, zpool_id provides the flexibility of using any device names that show up in /dev/disk/by-path, so it may still be needed on some systems. vdev_id's functionality subsumes that of the sas_switch_id script, and it is unlikely that anyone is using it, so sas_switch_id is removed. Finally, /dev/disk/by-vdev is added to the list of directories that 'zpool import' will scan. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #713	2012-06-01 08:55:14 -07:00
Brian Behlendorf	b39d3b9f7b	Linux 3.3 compat, iops->create()/mkdir()/mknod() The mode argument of iops->create()/mkdir()/mknod() was changed from an 'int' to a 'umode_t'. To prevent a compiler warning an autoconf check was added to detect the API change and then correctly set a zpl_umode_t typedef. There is no functional change. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #701	2012-04-30 12:52:38 -07:00
Richard Laager	109491a897	Improve error message consistency Signed-off-by: Richard Laager <rlaager@wiktel.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2012-04-11 10:43:17 -07:00
Brian Behlendorf	1c5de20ae2	Add --enable-debug-dmu-tx configure option Allow rigorous (and expensive) tx validation to be enabled/disabled independently from the standard zfs debugging. When enabled, these checks ensure that all txs are constructed properly and that a dbuf is never dirtied without taking the correct tx hold. This checking is particularly helpful when adding new dmu consumers like Lustre. However, for established consumers such as the zpl with no known outstanding tx construction problems, this is just overhead. --enable-debug-dmu-tx / --disable-debug-dmu-tx - Enable/disable validation of each tx as it is constructed. By default validation is disabled due to performance concerns. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2012-03-23 12:25:17 -07:00
Brian Behlendorf	ebe7e575ea	Add .zfs control directory Add support for the .zfs control directory. This was accomplished by leveraging as much of the existing ZFS infrastructure as possible and updating it for Linux as required. The bulk of the core functionality is now all there with the following limitations. *) The .zfs/snapshot directory automount support requires a 2.6.37 or newer kernel. The exception is RHEL6.2 which has backported the d_automount patches. *) Creating/destroying/renaming snapshots with mkdir/rmdir/mv in the .zfs/snapshot directory works as expected. However, this functionality is only available to root until zfs delegations are finished. * mkdir - create a snapshot * rmdir - destroy a snapshot * mv - rename a snapshot The following issues are known deficiencies, but we expect them to be addressed by future commits. *) Add automount support for kernels older than 2.6.37. This should be possible using follow_link() which is what Linux did before. *) Accessing the .zfs/snapshot directory via NFS is not yet possible. The majority of the groundwork for this is complete. However, finishing this work will require resolving some lingering integration issues with the Linux NFS kernel server. *) The .zfs/shares directory exists but no further smb functionality has yet been implemented. Contributions-by: Rohan Puri <rohan.puri15@gmail.com> Contributions-by: Andrew Barnes <barnes333@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #173	2012-03-22 13:03:47 -07:00
Ned Bass	613d88eda8	Align partition end on 1 MiB boundary Some devices have exhibited sensitivity to the ending alignment of partitions. In particular, even if the first partition begins at 1 MiB, we have seen many sd driver task abort errors with certain SSDs if the first partition doesn't end on a 1 MiB boundary. This occurs when the vdev label is read during pool creation or importation and causes a delay of about 30 seconds per device. It can also be simulated with dd when the pool isn't imported: dd if=/dev/sda1 of=/dev/null bs=262144 count=1 For the record, this problem was observed with SMARTMOD SG9XCA2E200GE01 200GB SSDs. Unfortunately I don't have a good explanation for this behavior. It seems to have something to do with highly fragmented single-sector requests being issued to the device, which it may not support. With end-aligned partitions, at least page-sized requests were queued and issued to the driver according to blktrace. In any case, aligning the partition end is a fairly innocuous work-around, wasting at most 1 MiB of space. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #574	2012-03-05 09:49:50 -08:00
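The end-alignment rule can be expressed as rounding the partition's exclusive end down to a multiple of 1 MiB. This sketch assumes 512-byte logical sectors, so 1 MiB is 2048 sectors. It uses a hypothetical helper `align_partition_end()` and is not the zpool labeling code.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 512-byte logical sectors, so 1 MiB = 2048 sectors. */
#define SECTOR_BYTES	512ULL
#define MIB_SECTORS	((1024ULL * 1024ULL) / SECTOR_BYTES)

/*
 * Hypothetical helper: given a partition's first sector and the last
 * sector it may use, return an end sector such that end + 1 is a
 * multiple of 1 MiB. Less than 1 MiB of space is given up.
 * Assumes at least one 1 MiB boundary lies after `start`.
 */
static uint64_t align_partition_end(uint64_t start, uint64_t max_end)
{
	uint64_t limit, aligned;

	assert(max_end >= start);
	limit = max_end + 1;				/* exclusive end */
	aligned = limit - (limit % MIB_SECTORS);	/* round down to 1 MiB */
	assert(aligned > start);
	return (aligned - 1);
}
```

For a partition starting at sector 2048 (1 MiB) that could extend to sector 205300, the sketch ends it at sector 204799. Sector 204800 then starts exactly 100 MiB into the disk, and 501 sectors are left unused.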
Brian Behlendorf	4b787d75c8	Cleanly support debug packages Allow a source rpm to be rebuilt with debugging enabled. This avoids the need to manually modify the spec file. By default debugging is still largely disabled. To enable specific debugging features use the following options with rpmbuild. '--with debug' - Enables ASSERTs # For example: $ rpmbuild --rebuild --with debug zfs-modules-0.6.0-rc6.src.rpm Additionally, ZFS_CONFIG has been added to zfs_config.h for packages which build against these headers. This is critical to ensure both zfs and the dependent package are using the same prototype and structure definitions. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2012-02-27 14:08:17 -08:00
Richard Yao	b41c9906dc	Support ashift=13 for 8KB SSD block sizes New SSDs are now available which use an internal 8k block size. To make sure ZFS can get the maximum performance out of these devices we're increasing the maximum ashift to 13 (8KB). This value is still small enough that we can fit 16 uberblocks in the vdev label ring. However, I don't want to increase this any further or it will limit the ability to safely roll back a pool to recover it. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #565	2012-02-13 12:25:27 -08:00
Etienne Dechamps	30930fba21	Add support for DISCARD to ZVOLs. DISCARD (REQ_DISCARD, BLKDISCARD) is useful for thin provisioning. It allows ZVOL clients to discard (unmap, trim) block ranges from a ZVOL, thus optimizing disk space usage by allowing a ZVOL to shrink instead of just grow. We can't use zfs_space() or zfs_freesp() here, since these functions only work on regular files, not volumes. Fortunately we can use the low-level function dmu_free_long_range() which does exactly what we want. Currently the discard operation is not added to the log. That's not a big deal since losing discard requests cannot result in data corruption. It would however result in disk space usage higher than it should be. Thus adding log support to zvol_discard() is probably a good idea for a future improvement. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2012-02-09 16:19:38 -08:00
Etienne Dechamps	cb2d19010d	Support the fallocate() file operation. Currently only the (FALLOC_FL_PUNCH_HOLE) flag combination is supported, since it's the only one that matches the behavior of zfs_space(). This makes it pretty much useless in its current form, but it's a start. To support other flag combinations we would need to modify zfs_space() to make it more flexible, or emulate the desired functionality in zpl_fallocate(). Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #334	2012-02-09 16:19:32 -08:00
Etienne Dechamps	34037afe24	Improve ZVOL queue behavior. The Linux block device queue subsystem exposes a number of configurable settings described in Linux block/blk-settings.c. The defaults for these settings are tuned for hard drives, and are not optimized for ZVOLs. Proper configuration of these options would allow upper layers (I/O scheduler) to take better decisions about write merging and ordering. Detailed rationale: - max_hw_sectors is set to unlimited (UINT_MAX). zvol_write() is able to handle writes of any size, so there's no reason to impose a limit. Let the upper layer decide. - max_segments and max_segment_size are set to unlimited. zvol_write() will copy the requests' contents into a dbuf anyway, so the number and size of the segments are irrelevant. Let the upper layer decide. - physical_block_size and io_opt are set to the ZVOL's block size. This has the potential to somewhat alleviate issue #361 for ZVOLs, by warning the upper layers that writes smaller than the volume's block size will be slow. - The NONROT flag is set to indicate this isn't a rotational device. Although the backing zpool might be composed of rotational devices, the resulting ZVOL often doesn't exhibit the same behavior due to the COW mechanisms used by ZFS. Setting this flag will prevent upper layers from making useless decisions (such as reordering writes) based on incorrect assumptions about the behavior of the ZVOL. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2012-02-07 16:23:06 -08:00
Etienne Dechamps	b18019d2d8	Fix synchronicity for ZVOLs. zvol_write() assumes that the write request must be written to stable storage if rq_is_sync() is true. Unfortunately, this assumption is incorrect. Indeed, "sync" does not mean what we think it means in the context of the Linux block layer. This is well explained in linux/fs.h: WRITE: A normal async write. Device will be plugged. WRITE_SYNC: Synchronous write. Identical to WRITE, but passes down the hint that someone will be waiting on this IO shortly. WRITE_FLUSH: Like WRITE_SYNC but with preceding cache flush. WRITE_FUA: Like WRITE_SYNC but data is guaranteed to be on non-volatile media on completion. In other words, SYNC does not mean that the write must be on stable storage on completion. It just means that someone is waiting on us to complete the write request. Thus triggering a ZIL commit for each SYNC write request on a ZVOL is unnecessary and harmful for performance. To make matters worse, ZVOL users have no way to express that they actually want data to be written to stable storage, which means the ZIL is broken for ZVOLs. The request for stable storage is expressed by the FUA flag, so we must commit the ZIL after the write if the FUA flag is set. In addition, we must commit the ZIL before the write if the FLUSH flag is set. Also, we must inform the block layer that we actually support FLUSH and FUA. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2012-02-07 16:23:06 -08:00
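The ordering rules above can be sketched as a small planning function. The flag names `RQ_SYNC`, `RQ_FLUSH` and `RQ_FUA` are hypothetical stand-ins, not the kernel's request flags. `STEP_ZIL_COMMIT` represents whatever ZIL commit the real zvol_write() performs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the block layer's request flags. */
enum {
	RQ_SYNC = 1 << 0,	/* someone is waiting: a hint only */
	RQ_FLUSH = 1 << 1,	/* preceding cache flush requested */
	RQ_FUA = 1 << 2		/* data must be stable on completion */
};

enum zvol_step { STEP_ZIL_COMMIT, STEP_WRITE };

/*
 * Fill steps[] with the actions for one write request and return how
 * many there are: commit the ZIL before the write for FLUSH, after it
 * for FUA, and never just because RQ_SYNC is set.
 */
static size_t plan_zvol_write(unsigned int flags, enum zvol_step steps[3])
{
	size_t n = 0;

	assert(steps != NULL);
	if (flags & RQ_FLUSH)
		steps[n++] = STEP_ZIL_COMMIT;
	steps[n++] = STEP_WRITE;
	if (flags & RQ_FUA)
		steps[n++] = STEP_ZIL_COMMIT;
	return (n);
}
```

A SYNC-only request therefore costs a single write. A request carrying both FLUSH and FUA is bracketed by two commits.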
Brian Behlendorf	47621f3d76	Linux 3.3 compat, sops->show_options() The second argument of sops->show_options() was changed from a 'struct vfsmount *' to a 'struct dentry *'. Add an autoconf check to detect the API change and then conditionally define the expected interface. In either case we are only interested in the zfs_sb_t. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #549	2012-02-03 10:02:01 -08:00
Prakash Surya	ff998d804f	Ignore dataset if the dds_type is DMU_OST_OTHER Since the zpios and potentially other ZFS tests use the DMU_OST_OTHER type to label their datasets, the zpool and zfs commands should gracefully handle this type when it is encountered. This patch modifies the commands' behavior to ignore any datasets with a dds_type of DMU_OST_OTHER. Signed-off-by: Prakash Surya <surya1@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #536	2012-01-19 09:29:48 -08:00
Darik Horn	f783130a1f	Allow GPT+EFI vdev replacement in boot pools. Commit zfsonlinux/zfs@57a4eddc4d allows the bootfs property to be set on any pool, but does not accommodate subsequent vdev changes. For example: # zpool replace rpool /dev/sda /dev/sdb operation not supported on this type of pool property 'bootfs' is not supported on EFI labeled devices For non-Solaris builds, disable the check that emits this error. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2012-01-18 11:05:24 -08:00
Darik Horn	750562833f	Combine libraries: spl, avl, efi, share, unicode. These libraries, which are an artifact of the ZoL development process, conflict with packages that are already in distribution: * libspl: SPL Programming Language * libavl: AVL for Linux * libefi: GRUB And these libraries are potential conflicts: * libshare: the Linux Mount Manager * libunicode: Perl and Python Recompose these five ZoL components into the four libraries that are conventionally provided by Solaris and FreeBSD systems: + libnvpair + libuutil + libzpool + libzfs This change resolves the name conflict, makes ZoL more compatible with existing software that uses autotools to detect ZFS, and allows pkg-zfs to better reflect the official Debian kFreeBSD packaging. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes: #430	2012-01-17 15:19:50 -08:00
Suman Chakravartula	e18be9a637	Add overlay(-O) mount option support Linux supports mounting over non-empty directories by default. In Solaris this is not the case, and the -O option is required for zfs mount to mount a zfs filesystem over a non-empty directory. For compatibility, I've added support for the -O option to mount zfs filesystems over non-empty directories if the user wants to, just like in Solaris. I've defined MS_OVERLAY to record it in the flags variable if the -O option is supplied. The flags variable passes through a few functions and it's checked before performing the empty directory check in the zfs_mount function. If -O is given, the check is not performed. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #473	2012-01-12 15:49:38 -08:00
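Here is a sketch of how the -O flag can bypass the empty-directory check. It assumes a hypothetical `SKETCH_MS_OVERLAY` bit, since the commit does not show the value of MS_OVERLAY. It also assumes an EBUSY return for a non-empty mountpoint, which may not match libzfs's actual error path.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical flag bit; the commit does not show MS_OVERLAY's value. */
#define SKETCH_MS_OVERLAY	0x1

/*
 * True if path can be opened and holds nothing besides "." and "..".
 * A directory that cannot be opened is treated as non-empty.
 */
static bool dir_is_empty(const char *path)
{
	DIR *dirp = opendir(path);
	struct dirent *dp;
	bool empty = true;

	if (dirp == NULL)
		return (false);
	while ((dp = readdir(dirp)) != NULL) {
		if (strcmp(dp->d_name, ".") != 0 &&
		    strcmp(dp->d_name, "..") != 0) {
			empty = false;
			break;
		}
	}
	(void) closedir(dirp);
	return (empty);
}

/*
 * Pre-mount check: refuse a non-empty mountpoint unless the overlay
 * flag was given, in which case the check is skipped entirely.
 */
static int check_mountpoint(const char *mountpoint, int flags)
{
	assert(mountpoint != NULL);
	if (!(flags & SKETCH_MS_OVERLAY) && !dir_is_empty(mountpoint))
		return (EBUSY);
	return (0);
}
```

Testing the flag first means the directory is never scanned when -O is given. This mirrors the commit's note that the check is not performed at all in that case.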
Brian Behlendorf	ab26409db7	Linux 3.1 compat, super_block->s_shrink The Linux 3.1 kernel has introduced the concept of per-filesystem shrinkers which are directly associated with a super block. Prior to this change there was one shared global shrinker. The zfs code relied on being able to call the global shrinker when the arc_meta_limit was exceeded. This would cause the VFS to drop references on a fraction of the dentries in the dcache. The ARC could then safely reclaim the memory used by these entries and honor the arc_meta_limit. Unfortunately, when per-filesystem shrinkers were added the old interfaces were made unavailable. This change adds support to use the new per-filesystem shrinker interface so we can continue to honor the arc_meta_limit. The major benefit of the new interface is that we can now target only the zfs filesystem for dentry and inode pruning. Thus we can minimize any impact on the caching of other filesystems. In the context of making this change several other important issues related to managing the ARC were addressed. They include: * The dnlc_reduce_cache() function which was called by the ARC to drop dentries for the Posix layer was replaced with a generic zfs_prune_t callback. The ZPL layer now registers a callback to drop these dentries, removing a layering violation which dates back to the Solaris code. This callback can also be used by other ARC consumers such as Lustre. arc_add_prune_callback() arc_remove_prune_callback() * The arc_reduce_dnlc_percent module option has been changed to arc_meta_prune for clarity. The dnlc functions are specific to Solaris's VFS and have already been largely eliminated. The replacement tunable now represents the number of bytes the prune callback will request when invoked. * Less aggressively invoke the prune callback. We used to call this whenever we exceeded the arc_meta_limit; however, that's not strictly correct since it results in overzealous reclaim of dentries and inodes. It is now only called once the arc_meta_limit is exceeded and every effort has been made to evict other data from the ARC cache. * More promptly manage exceeding the arc_meta_limit. When reading metadata into the cache, if a buffer could not be recycled, notify the arc_reclaim thread to invoke the required prune. * Added arcstat_prune kstat which is incremented when the ARC is forced to request that a consumer prune its cache. Remember this will only occur when the ARC has no other choice. If it can evict buffers safely without invoking the prune callback it will. * This change is also expected to resolve the unexpected collapses of the ARC cache. These would occur because, when only the arc_meta_limit was exceeded, reclaim pressure would be exerted on the arc_c value via arc_shrink(). This effectively shrank the entire cache when really we just needed to reclaim metadata. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #466 Closes #292	2012-01-11 11:46:02 -08:00
Darik Horn	28eb9213d8	Linux 3.2 compat: set_nlink() Directly changing inode->i_nlink is deprecated in Linux 3.2 by commit SHA: bfe8684869601dacfcb2cd69ef8cfd9045f62170 Use the new set_nlink() kernel function instead. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes: #462	2011-12-16 20:02:52 -08:00
Prakash Surya	6ba3b44614	Add make rule for building Arch Linux packages Added the necessary build infrastructure for building packages compatible with the Arch Linux distribution. As such, one can now run: $ ./configure $ make pkg # Alternatively, one can run 'make arch' as well on the Arch Linux machine to create two binary packages compatible with the pacman package manager, one for the zfs userland utilities and another for the zfs kernel modules. The new packages can then be installed by running: # pacman -U $package.pkg.tar.xz In addition, source-only packages suitable for an Arch Linux chroot environment or remote builder can also be built using the 'sarch' make rule. NOTE: Since the source dist tarball is created on the fly from the head of the build tree, its MD5 hash signature will be continually in flux. As a result, the md5sum variable was intentionally omitted from the PKGBUILD files, and the '--skipinteg' makepkg option is used. This may or may not have any serious security implications, as the source tarball is not being downloaded from an outside source. Signed-off-by: Prakash Surya <surya1@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #491	2011-12-14 19:14:23 -08:00
Suman Chakravartula	ada8ec1ec5	Allow leading digits in userquota/groupquota names While setting/getting userquota and groupquota properties, the input was not treated as a possible username or groupname if it had a leading digit. While useradd in Linux recommends the regexp [a-z_][a-z0-9_-]*[$]?, it is not enforced. This causes problems for usernames with leading digits in them. We need to be able to support getting and setting properties for this unconventional but possible input category. I've updated the code to validate the username or groupname directly via the API. Also, note that I moved this validation to the beginning, before the check for SID names with @. This also supports usernames containing the @ character, which are valid. Only when input with @ is not a valid username is it interpreted as a potential SID name. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #428	2011-11-21 16:29:18 -08:00
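A minimal sketch of the lookup order described above, assuming the "API" is POSIX getpwnam(3) (the commit does not name the call). The `quota_who` enum and `classify_user()` are hypothetical names, and treating an all-digit string as a numeric uid after the SID check is an assumption of this sketch:

```c
#include <assert.h>
#include <pwd.h>
#include <string.h>

/* Hypothetical classification of the <name> part of userquota@<name>. */
enum quota_who { WHO_USER, WHO_SID, WHO_NUMERIC, WHO_UNKNOWN };

static enum quota_who classify_user(const char *name)
{
    assert(name != NULL);

    /* 1. Ask the system first, so "1backup" or "a@b" resolve as users. */
    if (getpwnam(name) != NULL)
        return WHO_USER;

    /* 2. Only a name that is not a valid user and contains '@' is a SID. */
    if (strchr(name, '@') != NULL)
        return WHO_SID;

    /* 3. Assumption: an all-digit string is taken as a numeric uid. */
    size_t digits = strspn(name, "0123456789");
    if (digits > 0 && name[digits] == '\0')
        return WHO_NUMERIC;

    return WHO_UNKNOWN;
}
```

Group names would follow the same pattern with getgrnam(3).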
Brian Behlendorf	ca5fd24984	Limit maximum ashift value to 12 While we initially allowed you to set your ashift as large as 17 (SPA_MAXBLOCKSIZE) that is actually unsafe. What wasn't considered at the time is that each uberblock written to the vdev label ring buffer will be of this size. Now the buffer is statically sized to 128k and we need to be able to fit several uberblocks in it. With a large ashift that becomes a problem. Therefore I'm reducing the maximum configurable ashift value to 12. This is large enough for the 4k sector drives and small enough that we can still keep the most recent 32 uberblocks in the vdev label ring buffer. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #425	2011-11-11 14:50:48 -08:00
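The arithmetic behind this limit can be sketched as follows. It assumes, as the message states, a 128 KiB ring in which every uberblock occupies one 2^ashift-byte slot. Any minimum slot size the real label code enforces for small ashift values is ignored, and `uberblock_slots()` is a hypothetical helper:

```c
#include <assert.h>

/* Uberblock ring size per vdev label, as given in the commit message. */
#define UB_RING_SIZE (128u * 1024u)

/* Number of uberblocks the ring can hold when each one is written as a
 * single 2^ashift-byte block. */
static unsigned uberblock_slots(unsigned ashift)
{
    assert(ashift >= 9 && ashift <= 17);
    return UB_RING_SIZE >> ashift;
}
```

With ashift=12 this gives 131072 / 4096 = 32 slots, matching the "most recent 32 uberblocks" above, while ashift=17 leaves room for a single uberblock.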
Brian Behlendorf	5547c2f1bf	Simplify BDI integration Update the code to use the bdi_setup_and_register() helper to simplify the bdi integration code. The updated code now just registers the bdi during mount and destroys it during unmount. The only complication is that for 2.6.32 - 2.6.33 kernels the helper wasn't available so in these cases the zfs code must provide it. Luckily the bdi_setup_and_register() function is trivial. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #367	2011-11-08 10:19:03 -08:00
Brian Behlendorf	de0a1c099b	Autogen refresh for udev changes Run autogen.sh using the same autotools versions as upstream: * autoconf-2.63 * automake-1.11.1 * libtool-2.2.6b	2011-08-08 16:30:27 -07:00
Brian Behlendorf	76659dc110	Add backing_device_info per-filesystem For a long time now the kernel has been moving away from using the pdflush daemon to write 'old' dirty pages to disk. The primary reason for this is because the pdflush daemon is single threaded and can be a limiting factor for performance. Since pdflush sequentially walks the dirty inode list for each super block any delay in processing can slow down dirty page writeback for all filesystems. The replacement for pdflush is called bdi (backing device info). The bdi system involves creating a per-filesystem control structure each with its own private sets of queues to manage writeback. The advantage is greater parallelism which improves performance and prevents a single filesystem from slowing writeback to the others. For a long time both systems co-existed in the kernel so it wasn't strictly required to implement the bdi scheme. However, as of Linux 2.6.36 kernels the pdflush functionality has been retired. Since ZFS already bypasses the page cache for most I/O this is only an issue for mmap(2) writes which must go through the page cache. Even then adding this missing support for newer kernels was overlooked because there are other mechanisms which can trigger writeback. However, there is one critical case where not implementing the bdi functionality can cause problems. If an application handles a page fault it can enter the balance_dirty_pages() callpath. This will result in the application hanging until the number of dirty pages in the system drops below the dirty ratio. Without a registered backing_device_info for the filesystem the dirty pages will not get written out. Thus the application will hang. As mentioned above this was less of an issue with older kernels because pdflush would eventually write out the dirty pages. This change adds a backing_device_info structure to the zfs_sb_t which is already allocated per-super block. 
It is then registered when the filesystem is mounted and unregistered on unmount. It will not be registered for mounted snapshots, which are read-only. This change will result in a flush-<pool> thread being dynamically created and destroyed per mounted filesystem for writeback. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #174	2011-08-04 13:37:38 -07:00
Gunnar Beutner	3132cb397a	Use /dev/null for stdout/stderr in libzfs_run_process(). Simply closing the stdout and/or stderr file descriptors for the child process can have bad side effects if for example the child writes to stdout/stderr after open()ing a file. The open() call might have returned the same file descriptor one would usually expect for stdout/stderr (1 and 2), thereby causing mis-directed writes. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #190	2011-08-01 13:52:23 -07:00
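The difference between closing a descriptor and redirecting it can be sketched with plain POSIX calls. `run_process_quiet()` is a hypothetical stand-in, not the actual libzfs_run_process() signature. It shows the child pointing descriptors 1 and 2 at /dev/null with dup2(2), so they stay occupied and a later open() in the child cannot be handed either number:

```c
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv[0] (searched in PATH) and, when quiet is
 * set, discard its stdout/stderr. Returns the exit status, or -1. */
static int run_process_quiet(char *const argv[], bool quiet)
{
    assert(argv != NULL && argv[0] != NULL);

    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        if (quiet) {
            int devnull = open("/dev/null", O_WRONLY);
            if (devnull < 0)
                _exit(127);
            /* Redirect instead of close(): fds 1 and 2 remain in use. */
            (void) dup2(devnull, STDOUT_FILENO);
            (void) dup2(devnull, STDERR_FILENO);
            if (devnull > STDERR_FILENO)
                close(devnull);
        }
        execvp(argv[0], argv);
        _exit(127); /* only reached if exec failed */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The parent still sees the child's exit status; only the child's output is discarded.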
Alexander Stetsenko	0b7936d5c2	Illumos #278 : get rid zfs of python and pyzfs dependencies Remove all python and pyzfs dependencies for consistency and to ensure full functionality even in a minimalist environment. Reviewed by: gordon.w.ross@gmail.com Reviewed by: trisk@opensolaris.org Reviewed by: alexander.r.eremin@gmail.com Reviewed by: jerry.jelinek@joyent.com Approved by: garrett@nexenta.com References to Illumos issue and patch: - https://www.illumos.org/issues/278 - https://github.com/illumos/illumos-gate/commit/1af68beac3 Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #340 Issue #160 Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2011-08-01 12:09:36 -07:00
Matt Ahrens	f5fc4acaa7	Illumos #1092 : zfs refratio property Add a "REFRATIO" property, which is the compression ratio based on data referenced. For snapshots, this is the same as COMPRESSRATIO, but for filesystems/volumes, the COMPRESSRATIO is based on the data "USED" (ie, includes blocks in children, but not blocks shared with the origin). This is needed to figure out how much space a filesystem would use if it were not compressed (ignoring snapshots). Reviewed by: George Wilson <George.Wilson@delphix.com> Reviewed by: Adam Leventhal <Adam.Leventhal@delphix.com> Reviewed by: Dan McDonald <danmcd@nexenta.com> Reviewed by: Richard Elling <richard.elling@richardelling.com> Reviewed by: Mark Musante <Mark.Musante@oracle.com> Reviewed by: Garrett D'Amore <garrett@nexenta.com> Approved by: Garrett D'Amore <garrett@nexenta.com> References to Illumos issue and patch: - https://www.illumos.org/issues/1092 - https://github.com/illumos/illumos-gate/commit/187d6ac08a Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #340	2011-08-01 12:09:11 -07:00
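A small sketch of why the two ratios differ. It assumes each ratio is logical (uncompressed) bytes divided by physical (compressed) bytes, kept in hundredths so it can be printed as "1.50x". `ratio_x100()` and the byte counts in the example are illustrative only:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: compression ratio in hundredths (150 means 1.50x). */
static uint64_t ratio_x100(uint64_t logical, uint64_t physical)
{
    assert(logical <= UINT64_MAX / 100); /* keep logical * 100 in range */
    if (physical == 0)
        return 100; /* nothing written yet: report 1.00x */
    return logical * 100 / physical;
}
```

For a filesystem that references 100 logical / 100 physical bytes of its own but has a child holding 400 logical / 100 physical bytes, a USED-based ratio would be ratio_x100(500, 200) = 250 (2.50x), while the REFERENCED-based ratio is ratio_x100(100, 100) = 100 (1.00x). Only the second says how much space the filesystem itself would need uncompressed.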
Kyle Fuller	615ab66d18	Provide a rc.d script for archlinux Unlike most other Linux distributions archlinux installs its init scripts in /etc/rc.d instead of /etc/init.d. This commit provides an archlinux rc.d script for zfs and extends the build infrastructure to ensure it gets installed in the correct place. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #322	2011-07-11 14:12:23 -07:00
Brian Behlendorf	b1c932d318	Add proper library versioning The zfs libraries were never properly versioned. Since the API has remained static for quite some time this was never an issue. However, going forward they should be versioned. This commit versions all of the libraries to 1.0.0. From here on out this version must be updated to reflect changes to the library.	2011-07-06 09:20:28 -07:00
Gunnar Beutner	52e7c3a2e5	Link libshare directly to libzfs Drop usage of dlopen/dlsym for libshare. There is no need to do this because the zfs packages provide libshare. Unlike on Solaris we are guaranteed it will be available. This avoids possible problems with hardcoding the libshare path in the code (e.g. when users specify a different install path via configure options). It additionally simplifies the code which is good for maintainability. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2011-07-06 09:20:28 -07:00
Gunnar Beutner	46e18b3f0f	Implemented sharing datasets via NFS using libshare. The sharenfs and sharesmb properties depend on the libshare library to export datasets via NFS and SMB. This commit implements the base libshare functionality as well as support for managing NFS shares. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2011-07-06 09:20:28 -07:00
Brian Behlendorf	2cf7f52bc4	Linux compat 2.6.39: mount_nodev() The .get_sb callback has been replaced by a .mount callback in the file_system_type structure. When using the new interface the caller must now use the mount_nodev() helper. Unfortunately, the new interface no longer passes the vfsmount down to the zfs layers. This poses a problem for the existing implementation because we currently save this pointer in the super block for later use. It provides our only entry point into the namespace layer for manipulating certain mount options. This needed to be done originally to allow commands like 'zfs set atime=off tank' to work properly. It also allowed me to keep more of the original Solaris code unmodified. Under Solaris there is a 1-to-1 mapping between a mount point and a file system so this is a fairly natural thing to do. However, under Linux there may be multiple entries in the namespace which reference the same filesystem. Thus keeping a back reference from the filesystem to the namespace is complicated. Rather than introduce some ugly hack to get the vfsmount and continue as before, I'm leveraging this API change to update the ZFS code to do things in a more natural way for Linux. This has the upside that it resolves the compatibility issue for the long term and fixes several other minor bugs which have been reported. This commit updates the code to remove this vfsmount back reference entirely. All modifications to filesystem mount options are now passed into the kernel via a '-o remount'. This is the expected Linux mechanism and allows the namespace to properly handle any options which apply to it before passing them on to the file system itself. Aside from fixing the compatibility issue, removing the vfsmount has had the benefit of simplifying the code. This change, while fairly involved, has turned out nicely. Closes #246 Closes #217 Closes #187 Closes #248 Closes #231	2011-07-01 13:36:39 -07:00
Brian Behlendorf	5c03efc379	Linux compat 2.6.39: security_inode_init_security() The security_inode_init_security() function now takes an additional qstr argument which must be passed in from the dentry if available. Passing NULL is safe when no qstr is available; the relevant security checks will just be skipped. Closes #246 Closes #217 Closes #187	2011-07-01 12:40:08 -07:00
Prasad Joshi	b312979252	Tear down and flush the mmap region The inode eviction should unmap the pages associated with the inode. These pages should also be flushed to disk to avoid data loss. Therefore, use truncate_setsize() in evict_inode() to release the pagecache. The API truncate_setsize() was added in the 2.6.35 kernel. To ensure compatibility with older kernels, the patch defines its own truncate_setsize() function. Signed-off-by: Prasad Joshi <pjoshi@stec-inc.com> Closes #255	2011-06-27 09:59:19 -07:00
Christian Kohlschütter	df30f56639	Add "ashift" property to zpool create Some disks with internal sectors larger than 512 bytes (e.g., 4k) can suffer from bad write performance when ashift is not configured correctly. This is caused by the disk not reporting its actual sector size, but a sector size of 512 bytes. The drive may behave this way for compatibility reasons. For example, the WDC WD20EARS disks are known to exhibit this behavior. When creating a zpool, ZFS takes that wrong sector size and sets the "ashift" property accordingly (to 9: 1<<9=512), whereas it should be set to 12 for 4k sectors (1<<12=4096). This patch allows an administrator to manually specify the known correct ashift size at 'zpool create' time. This can significantly improve performance in certain cases. However, it will have an impact on your total pool capacity. See the updated ashift property description in the zpool.8 man page for additional details. Valid values for the ashift property range from 9 to 17 (512B-128KB). Additionally, you may set the ashift to 0 if you wish to auto-detect the sector size based on what the disk reports; this is the default behavior. The most common ashift values are 9 and 12. Example: zpool create -o ashift=12 tank raidz2 sda sdb sdc sdd Closes #280 Original-patch-by: Richard Laager <rlaager@wiktel.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2011-06-17 16:35:49 -07:00
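The mapping between sector size and ashift is a base-2 logarithm, as the 1<<9=512 and 1<<12=4096 examples show. A sketch, with `sector_size_to_ashift()` as a hypothetical helper; the accepted 9 to 17 range follows this commit (the later "Limit maximum ashift value to 12" commit narrows it):

```c
#include <assert.h>
#include <stdint.h>

/* Return log2(sector_size) if it is a power of two within ashift 9-17
 * (512 B - 128 KiB), or -1 otherwise. */
static int sector_size_to_ashift(uint64_t sector_size)
{
    /* Reject zero and anything that is not a power of two. */
    if (sector_size == 0 || (sector_size & (sector_size - 1)) != 0)
        return -1;

    int ashift = 0;
    while ((UINT64_C(1) << ashift) < sector_size)
        ashift++;
    assert((UINT64_C(1) << ashift) == sector_size);

    return (ashift >= 9 && ashift <= 17) ? ashift : -1;
}
```

For example, sector_size_to_ashift(4096) returns 12, the value passed in `zpool create -o ashift=12`.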
Brian Behlendorf	2e08aedba4	Always check -Wno-unused-but-set-variable gcc support The previous commit `8a7e1ceefa` wasn't quite right. This check applies to both the user and kernel space build and as such we must make sure it runs regardless of what the --with-config option is set to. For example, if --with-config=kernel then the autoconf test does not run and we generate build warnings when compiling the kernel packages.	2011-06-14 16:40:35 -07:00
Brian Behlendorf	8a7e1ceefa	Check for -Wno-unused-but-set-variable gcc support Gcc versions 4.3.2 and earlier do not support the compiler flag -Wno-unused-but-set-variable. This can lead to build failures on older Linux platforms such as Debian Lenny. Since this is an optional build argument, this change adds a new autoconf check for the option. If it is supported by the installed version of gcc then it is used; otherwise it is omitted. See commits `12c1acde76` and `79713039a2` for the reason the -Wno-unused-but-set-variable option was originally added.	2011-06-14 14:43:22 -07:00
Brian Behlendorf	1b9d8c340f	Fix 'zfs send -D' segfault Sending pools with dedup results in a segfault due to a Solaris portability issue. Under Solaris the pipe(2) library call creates a bidirectional data channel. Unfortunately, on Linux the pipe(2) call creates a unidirectional data channel. The fix is to use the socketpair(2) function to create the expected bidirectional channel. Seth Heeren did the original leg work on this issue for zfs-fuse. We finally just rediscovered the same portability issue and dfurphy was able to point me at the original issue for the fix. Closes #268	2011-06-09 13:58:48 -07:00
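A minimal sketch of the fix in plain POSIX terms: socketpair(2) with AF_UNIX/SOCK_STREAM returns two descriptors that can each be both read and written, unlike the read-only and write-only ends returned by Linux pipe(2). The helper names are hypothetical and the code is not taken from the zfs send path:

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Create a bidirectional channel: both fds[0] and fds[1] are read/write. */
static int open_bidirectional(int fds[2])
{
    return socketpair(AF_UNIX, SOCK_STREAM, 0, fds);
}

/* Write msg on one end and read it back on the other; 0 on success. */
static int roundtrip(int from, int to, const char *msg)
{
    char buf[64];
    size_t len = strlen(msg);
    assert(len < sizeof (buf));

    if (write(from, msg, len) != (ssize_t)len)
        return -1;
    if (read(to, buf, len) != (ssize_t)len)
        return -1;
    return memcmp(buf, msg, len) == 0 ? 0 : -1;
}
```

Data flows in both directions over the same pair, which is the property the Solaris code relied on when it called pipe(2).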
Brian Behlendorf	df554c148e	Fix 'zfs set volsize=N pool/dataset' This change fixes a kernel panic which would occur when resizing a dataset which was not open. The objset_t stored in the zvol_state_t will be set to NULL when the block device is closed. To avoid this issue we pass the correct objset_t as the third arg. The code has also been updated to correctly notify the kernel when the block device capacity changes. For 2.6.28 and newer kernels the capacity change will be immediately detected. For earlier kernels the capacity change will be detected when the device is next opened. This is a known limitation of older kernels. Online ext3 resize test case passes on 2.6.28+ kernels: $ dd if=/dev/zero of=/tmp/zvol bs=1M count=1 seek=1023 $ zpool create tank /tmp/zvol $ zfs create -V 500M tank/zd0 $ mkfs.ext3 /dev/zd0 $ mkdir /mnt/zd0 $ mount /dev/zd0 /mnt/zd0 $ df -h /mnt/zd0 $ zfs set volsize=800M tank/zd0 $ resize2fs /dev/zd0 $ df -h /mnt/zd0 Original-patch-by: Fajar A. Nugraha <github@fajar.net> Closes #68 Closes #84	2011-05-02 08:54:40 -07:00
Gunnar Beutner	055656d4f4	Implemented NFS export_operations. Implemented the required NFS operations for exporting ZFS datasets using the in-kernel NFS daemon.	2011-04-29 12:36:13 -07:00
Darik Horn	492b8e9e7b	Use gethostid in the Linux convention. Disable the gethostid() override for Solaris behavior because Linux systems implement the POSIX standard in a way that allows a negative result. Mask the gethostid() result to the lower four bytes, like coreutils does in /usr/bin/hostid, to prevent junk bits or sign-extension on systems that have an eight byte long type. This can cause a spurious hostid mismatch that prevents zpool import on 64-bit systems.	2011-04-25 10:36:17 -05:00
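The masking step can be sketched as below. `mask_hostid()` is a hypothetical helper; on a real system it would be applied to the value returned by gethostid(3), e.g. `mask_hostid(gethostid())`:

```c
#include <assert.h>
#include <stdint.h>

/* Keep only the low four bytes of a hostid. With an 8-byte long, a
 * negative 32-bit id from gethostid() is sign-extended, so comparing the
 * raw long values can report a spurious mismatch. */
static uint32_t mask_hostid(long raw)
{
    return (uint32_t)((unsigned long)raw & 0xffffffffUL);
}
```

For instance, a hostid of 0xffffffff read back as the long -1 becomes 0xffffffff again after masking, whatever the width of long.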
Richard Laager	826ab7ad19	Support IEC base-2 prefixes Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2011-04-19 12:56:21 -07:00
Brian Behlendorf	12c1acde76	Set -Wno-unused-but-set-variable globally As of gcc-4.6 the option -Wunused-but-set-variable is enabled by default. While this is a useful warning there are numerous places in the ZFS code where a variable is set and then only checked in an ASSERT(). To avoid having to update every instance of this in the code we now set -Wno-unused-but-set-variable to suppress the warning. Additionally, when building with --enable-debug and -Werror set, these warnings also become fatal. We can re-evaluate the suppression of these errors at a later time if it becomes an issue. For now we are basically just reverting to the previous gcc behavior.	2011-04-19 10:44:10 -07:00
Brian Behlendorf	bdf4328b04	Linux 2.6.28 compat, insert_inode_locked() Linux 2.6.28 added the insert_inode_locked() helper function; prior to this most callers used insert_inode_hash(). The older method doesn't check for collisions in the inode_hashtable but it is still acceptable for use. Fall back to using insert_inode_hash() when insert_inode_locked() is unavailable.	2011-03-22 12:15:54 -07:00
Brian Behlendorf	f47c42e214	Merge branch 'dracut'	2011-03-22 12:13:04 -07:00
Brian Behlendorf	716895b161	Fix 'LDFLAGS=-Wl,--as-needed' build error Compiling with 'LDFLAGS=-Wl,--as-needed' exposed the fact that there were some library linking problems introduced by mount_zfs. In particular, the libzfs library does use nvpair symbols, and mount_zfs contains no dependencies on libzpool. Closes #161 Closes #162	2011-03-18 14:47:19 -07:00
Brian Behlendorf	01c0e61da0	Add init scripts To support automatically mounting your zfs filesystems on boot, a basic init script is needed. Unfortunately, every distribution has its own idea of the _right_ way to do things. Rather than write one very complicated portable init script, which would invariably be replaced by the distribution's own anyway, I have instead added support to provide multiple distribution-specific init scripts. The correct init script for your distribution will be selected by ZFS_AC_DEFAULT_PACKAGE which will set DEFAULT_INIT_SCRIPT. During 'make install' the correct script for your system will be installed from zfs/etc/init.d/zfs.DEFAULT_INIT_SCRIPT to the usual /etc/init.d/zfs location. Currently, there is a zfs.fedora script and a more generic zfs.lsb init script. Hopefully, the distribution maintainers who know best how they want their init scripts to function will feed back their approved versions to be included in the project. This change does not consider upstart jobs but I'm not at all opposed to adding that sort of thing.	2011-03-17 16:51:54 -07:00
Brian Behlendorf	9ac97c2a93	Print mount/umount errors Because we are dependent on the system mount/umount utilities to ensure correct mtab locking, we should not suppress their error output. During a successful mount/umount they will be silent, but during a failure the error message they print is the only sure way to know why a mount failed. This is because the (u)mount(8) return code does not contain the result of the system call issued. The only way to clearly identify why things failed is to rely on the error message printed by the tool. Longer term, once libmount is available, we can issue the mount/umount system calls within the tool and still be assured of correct mtab locking. Closed #107	2011-03-09 15:26:48 -08:00
Brian Behlendorf	d53368f675	Fix mount helper Several issues related to strange mount/umount behavior were reported and this commit should address most of them. The original idea was to put in place a zfs mount helper (mount.zfs). This helper is used to enforce 'legacy' mount behavior, and perform any extra mount argument processing (selinux, zfsutil, etc). This helper wasn't ready for the 0.6.0-rc1 release but with this change it's functional, though it still needs to be extensively tested. This change addresses the following open issues. Closes #101 Closes #107 Closes #113 Closes #115 Closes #119	2011-03-09 15:26:48 -08:00
Brian Behlendorf	5075c7ea69	Add missing libspl+libzpool libs to libzfs The libspl and libzpool libraries were missing from the libzfs Makefile.am. They should be explicitly listed to avoid build issues when compiling static libraries and binaries. Additionally, ensure libzpool is built before libzfs because libzfs is dependent on libzpool. This was also exposed as an issue when forcing static linking.	2011-03-03 15:48:57 -08:00
Brian Behlendorf	45066d1f20	Linux 2.6.38 compat, blkdev_get_by_path() The open_bdev_exclusive() function has been replaced (again) by the more generic blkdev_get_by_path() function. Additionally, the counterpart function close_bdev_exclusive() has been replaced by blkdev_put(). Because these functions are more generic versions of the functions they replaced the compatibility macro must add the FMODE_EXCL mask to ensure they are exclusive. Closes #114	2011-02-23 12:29:38 -08:00
Brian Behlendorf	f03e41e8da	Improve 'zpool import' safety There are three improvements here to 'zpool import' proposed by Fajar in Github issue #98. They are all good so I'm committing all three. 1) Add descriptions for "hpet" and "core" blacklist entries. 2) Add "core" to the blacklist; as described in the issue, accessing this device will crash Xen dom0. 3) Refine probing behavior to use fstatat64(). This allows us to determine if a device is a block device or a regular file without having to open it. This is the safest approach when probing /dev/ because the simple act of opening a device may have unexpected consequences. Closes #98	2011-02-17 09:35:43 -08:00
Brian Behlendorf	07bd86718b	Suppress share error on mount Until code is added to support automatically sharing datasets we should return success instead of failure. This prevents the command line tools from returning a non-zero error code. While a user likely won't notice this, test scripts like zconfig.sh do and correctly fail because of it.	2011-02-16 11:05:55 -08:00
Brian Behlendorf	2c395def27	Linux 2.6.36 compat, sops->evict_inode() The new preferred interface for evicting an inode from the inode cache is the ->evict_inode() callback. It replaces both the ->delete_inode() and ->clear_inode() callbacks which were previously used for this.	2011-02-11 13:47:51 -08:00
Brian Behlendorf	7268e1bec8	Linux 2.6.35 compat, fops->fsync() The fsync() callback in the file_operations structure used to take 3 arguments. The callback now only takes 2 arguments because the dentry argument was determined to be unused by all consumers. To handle this a compatibility prototype was added to ensure the right prototype is used. Our implementation never used the dentry argument either so it's just a matter of using the right prototype.	2011-02-11 09:05:51 -08:00
Brian Behlendorf	777d4af891	Linux 2.6.35 compat, const struct xattr_handler The const keyword was added to the 'struct xattr_handler' in the generic Linux super_block structure. To handle this we define an appropriate xattr_handler_t typedef which can be used. This was the preferred solution because it keeps the code clean and readable.	2011-02-10 16:29:00 -08:00
Brian Behlendorf	1ac0ea38a5	Add missing -ldl linker option The inclusion of the dlsym(), dlopen(), and dlclose() symbols requires us to link against the dl library. Be careful to add the flag to both the libzfs library and the commands which depend on the library.	2011-02-10 11:05:44 -08:00
Brian Behlendorf	b4ead57cfb	Remove HAVE_ZPL from commands and libraries Thanks to the previous few commits we can now build all of the user space commands and libraries with support for the zpl.	2011-02-04 16:14:34 -08:00
Brian Behlendorf	9a616b5d17	Documentation updates Minor Linux specific documentation updates to the comments and man pages.	2011-02-04 16:14:34 -08:00
Brian Behlendorf	c5d915f423	Minimal libshare infrastructure ZFS even under Solaris does not strictly require libshare to be available. The current implementation attempts to dlopen() the library to access the needed symbols. If this fails libshare support is simply disabled. This means that on Linux we only need the most minimal libshare implementation. In fact just enough to prevent the build from failing. Longer term we can decide if we want to implement a libshare library like Solaris. At best this would be an abstraction layer between ZFS and NFS/SMB. Alternately, we can drop libshare entirely and directly integrate ZFS with Linux's NFS/SMB. Finally the bare bones user-libshare.m4 test was dropped. If we do decide to implement libshare at some point it will surely be as part of this package so the check is not needed.	2011-02-04 16:14:29 -08:00
Brian Behlendorf	3fb1fcdea1	Add 'zfs mount' support By design the zfs utility is supposed to handle mounting and unmounting a zfs filesystem. We could allow zfs to do this directly. There are system calls available to mount/umount a filesystem. And there are library calls available to manipulate /etc/mtab. But there are a couple of very good reasons not to take this approach... for now. Instead of directly calling the system and library calls to (u)mount the filesystem we fork and exec a (u)mount process. The principal reason for this is to delegate the responsibility for locking and updating /etc/mtab to (u)mount(8). This ensures maximum portability and ensures the right locking scheme for your version of (u)mount will be used. If we didn't do this we would have to resort to an autoconf test to determine what locking mechanism is used. The downside to using mount(8) instead of mount(2) is that we lose the exact errno which was returned by the kernel. The return code from mount(8) provides some insight into what went wrong but it is not quite as good. For the moment this is translated as a best guess into an errno for the higher layers of zfs. In the long term a shared library called libmount is under development which provides a common API to address the locking and errno issues. Once the standard mount utility has been updated to use this library we can then leverage it. Until then this is the only safe solution. http://www.kernel.org/pub/linux/utils/util-linux/libmount-docs/index.html	2011-02-04 16:11:58 -08:00
Brian Behlendorf	feb46b92a7	Open up libzfs_run_process/libzfs_load_module Recently helper functions were added to libzfs_util to load a kernel module or execute a process. Initially this functionality was limited to libzfs but it has become clear there will be other consumers. This change opens up the interface so it may be used where appropriate.	2011-01-28 12:47:57 -08:00
Brian Behlendorf	b3259b6a2b	Autoconf selinux support If libselinux is detected on your system at configure time, link against it. This allows us to use a library call to detect if selinux is enabled and, if it is, to pass the mount option: "context=\"system_u:object_r:file_t:s0" For now this is required because none of the existing selinux policies are aware of the zfs filesystem type. Because of this they do not properly enable xattr based labeling even though zfs supports all of the required hooks. Until distros add zfs as a known xattr friendly fs type we must use mntpoint labeling. Alternately, end users could modify their existing selinux policy with a little guidance.	2011-01-28 12:45:19 -08:00
Brian Behlendorf	149e873ab1	Fix minor compiler warnings These compiler warnings were introduced when code which was previously #ifdef'ed out by HAVE_ZPL was re-added for use by the posix layer. All of the following changes should be obviously correct and will cause no semantic changes.	2011-01-06 15:04:28 -08:00
Brian Behlendorf	5e7affae52	Skip /dev/hpet during 'zpool import' If libblkid does not contain ZFS support, then 'zpool import' will scan all block devices in /dev/ to determine which ones are components of a ZFS filesystem. It does this by opening all the devices and stat'ing them to determine which ones are block devices. If the device turns out not to be a block device it is skipped. Usually, this whole process is pretty harmless (although slow). But there are certain devices in /dev/ which must be handled in a very specific way or your system may crash. For example, if /dev/watchdog is simply opened the watchdog timer will be started and your system will panic when the timer expires. It turns out that /dev/hpet causes similar problems, although only when accessed under a virtual machine. For some reason accessing /dev/hpet causes qemu to crash. To address this issue this commit adds /dev/hpet to the device blacklist; it will be skipped solely based on its name.	2010-11-12 09:33:17 -08:00
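A name-based blacklist check of the kind described can be sketched as follows. The two entries come from this message; the array, the `skip_device()` helper, and the exact-match rule are assumptions of the sketch:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* /dev entries that must never be opened while probing for vdevs. */
static const char *const probe_blacklist[] = { "watchdog", "hpet", NULL };

/* Return true if the /dev entry should be skipped purely by its name. */
static bool skip_device(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; probe_blacklist[i] != NULL; i++) {
        if (strcmp(name, probe_blacklist[i]) == 0)
            return true;
    }
    return false;
}
```

Because the check never touches the device itself, it is safe even for nodes whose open() has side effects.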
Ned Bass	6ee71f5ce3	Call modprobe with absolute path Some sudo configurations may not include /sbin in the PATH. libzfs_load_module() currently does not call modprobe with an absolute path, so it may fail under such configurations if called under sudo. This change adds the absolute path to modprobe so we no longer rely on how PATH is set. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2010-10-22 12:39:57 -07:00
Ned Bass	a2c6816c34	Support shorthand names with zpool remove zpool status displays abbreviated vdev names without leading path components and, in the case of whole disks, without partition information. Also, the zpool subcommands 'create' and 'add' support using shorthand device names without qualified paths. Prior to this change, however, removing a device generally required specifying its name as it is stored in the vdev label. So while zpool status might list a cache disk with a name like A16, removing it would require a full path such as /dev/disk/zpool/A16-part1, which is non-intuitive. This change adds support for shorthand device names with the remove subcommand so one can simply type, for example, zpool remove tank A16. A consequence of this change is that including the partition information when removing a whole-disk vdev now results in an error. While this is arguably the correct behavior, it is a departure from how zpool previously worked in this project. This change removes the only reference to ctd_check_path(), so that function is also removed to avoid compiler warnings. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2010-10-22 12:25:46 -07:00
Ned Bass	79e7242a91	Add helper functions for manipulating device names This change adds two helper functions for working with vdev names and paths. zfs_resolve_shortname() resolves a shorthand vdev name to an absolute path of a file in /dev, /dev/disk/by-id, /dev/disk/by-label, /dev/disk/by-path, /dev/disk/by-uuid, /dev/disk/zpool. This was previously done only in the function is_shorthand_path(), but we need a general helper function to implement shorthand names for additional zpool subcommands like remove. is_shorthand_path() is accordingly updated to call the helper function. There is a minor change in the way zfs_resolve_shortname() tests if a file exists. is_shorthand_path() effectively used open() and stat64() to test for file existence, since its scope includes testing if a device is a whole disk and collecting file status information. zfs_resolve_shortname(), on the other hand, only uses access() to test for existence and leaves it to the caller to perform any additional file operations. This seemed like the most general and lightweight approach, and still preserves the semantics of is_shorthand_path(). zfs_append_partition() appends a partition suffix to a device path. This should be used to generate the name of a whole disk as it is stored in the vdev label. The user-visible names of whole disks do not contain the partition information, while the name in the vdev label does. The code was lifted from the function make_disks(), which now just calls the helper function. Again, having a helper function to do this supports general handling of shorthand names in the user interface. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2010-10-22 12:25:30 -07:00
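The existence test described for zfs_resolve_shortname() can be sketched as below. The directory list is copied from the message; `resolve_shortname()` is a hypothetical stand-in whose signature and error codes are assumptions, as is rejecting names that already contain a '/':

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Search order taken from the commit message. */
static const char *const vdev_dirs[] = {
    "/dev", "/dev/disk/by-id", "/dev/disk/by-label",
    "/dev/disk/by-path", "/dev/disk/by-uuid", "/dev/disk/zpool", NULL
};

/* Resolve a shorthand vdev name to the first existing path, testing only
 * for existence with access(2). Returns 0 and fills path, or an errno. */
static int resolve_shortname(const char *name, char *path, size_t len)
{
    assert(name != NULL && path != NULL);

    /* Assumption: a name containing '/' is already a path, not shorthand. */
    if (strchr(name, '/') != NULL)
        return EINVAL;

    for (size_t i = 0; vdev_dirs[i] != NULL; i++) {
        int n = snprintf(path, len, "%s/%s", vdev_dirs[i], name);
        if (n < 0 || (size_t)n >= len)
            return ENAMETOOLONG;
        if (access(path, F_OK) == 0)
            return 0;
    }
    return ENOENT;
}
```

As in the message, any further checks (whole-disk detection, stat information) are left to the caller.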
Brian Behlendorf	2959d94a0a	Add FAILFAST support ZFS works best when it is notified as soon as possible when a device failure occurs. This allows it to immediately start any recovery actions which may be needed. In theory Linux supports a flag which can be set on bios called FAILFAST which provides this quick notification by disabling the retry logic in the lower scsi layers. That's the theory at least. In practice it turns out that while the flag exists, you oddly have to set it with the BIO_RW_AHEAD flag. And even when it is set, you may still get retries if the low level drivers decide that's the right behavior, or if the right error codes are not reported to the scsi midlayer. Unfortunately, without additional kernel patches there's not much which can be done to improve this. Basically, this just means that it may take 2-3 minutes before ZFS is properly notified that a device has failed. This can be improved and I suspect I'll be submitting patches upstream to handle this.	2010-10-12 14:55:02 -07:00
Ned Bass	4b1abce9f5	Make commands load zfs module on demand This commit modifies libzfs_init() to attempt to load the zfs kernel module if it is not already loaded. This is done to simplify initialization by letting users simply import their zpools without having to first load the module. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2010-10-11 12:22:36 -07:00
Ned Bass	83c62c9399	Strip partition from device name for whole disks Under Solaris, the slice number is chopped off when displaying the device name if the vdev is a whole disk. Under Linux we should similarly discard the partition number. This commit adds the logic to perform the name truncation for devices ending in -partX, XpX, or X, where X is a string of digits. The second case handles devices like md0p0. The third case is limited to scsi and ide disks, i.e. those beginning with "sd" or "hd", in order to avoid stripping the number from names like "loop0". This commit removes the Solaris-specific code for removing slices, since we no longer reasonably expect our changes to be merged in upstream. The partition stripping code was moved off to a helper function to improve readability. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2010-10-04 13:53:24 -07:00
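The three suffix rules can be sketched in C as follows. strip_partition_sketch() is a hypothetical helper, not the actual libzfs function; it assumes the sd/hd check is applied to the last path component and edits the name in place.

```c
#include <ctype.h>
#include <string.h>

/*
 * Hypothetical helper: strip a trailing partition suffix from a device name
 * in place. X below is a non-empty run of digits:
 *   "...-partX" becomes "..."   (e.g. ata-XYZ-part1 becomes ata-XYZ)
 *   "...XpX" becomes "...X"     (e.g. md0p0 becomes md0)
 *   "sd..X" or "hd..X" loses X  (e.g. sda1 becomes sda; loop0 is kept)
 */
static void
strip_partition_sketch(char *name)
{
	size_t len = strlen(name);
	size_t d = len;
	const char *base;

	/* Walk back over the trailing digit run; d is where it starts. */
	while (d > 0 && isdigit((unsigned char)name[d - 1]))
		d--;
	if (d == len)
		return;		/* no trailing digits, nothing to strip */

	/* Case 1: "-partX" suffix. */
	if (d >= 5 && strncmp(&name[d - 5], "-part", 5) == 0) {
		name[d - 5] = '\0';
		return;
	}

	/* Case 2: digit, 'p', digits (e.g. md0p0). */
	if (d >= 2 && name[d - 1] == 'p' &&
	    isdigit((unsigned char)name[d - 2])) {
		name[d - 1] = '\0';
		return;
	}

	/* Case 3: bare digits, only for scsi/ide disk names. */
	base = strrchr(name, '/');
	base = (base != NULL) ? base + 1 : name;
	if (strncmp(base, "sd", 2) == 0 || strncmp(base, "hd", 2) == 0)
		name[d] = '\0';
}
```

Checking the -partX and XpX forms before the bare-digit case keeps names such as md0p0 from being treated as plain trailing digits, and the sd/hd restriction is what leaves loop0 untouched.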
Ned Bass	858219cc4e	Fix missing vdev names in zpool status output Top-level vdev names in zpool status output should follow a <type-id> naming convention. In the case of raidz devices, the type portion of the name was missing. This commit fixes a bug in zpool_vdev_name() where in this snprintf call (void) snprintf(buf, sizeof (buf), "%s-%llu", path, (u_longlong_t)id); buf and path may point to the same location. The result is that buf ends up containing only the "-id" part. This only occurred for raidz devices because the code for appending the parity level to the type string stored its result in buf then set path to point there. To fix this we allocate a new temporary buffer on the stack instead of reusing buf. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #57	2010-09-23 12:14:06 -07:00
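A sketch of the fix pattern: format into a separate stack buffer so that the snprintf() destination never overlaps its "%s" argument (copying between overlapping objects is undefined behavior for snprintf()). vdev_type_id_name_sketch() is a hypothetical stand-in for the relevant part of zpool_vdev_name(), not its actual code.

```c
#include <stdio.h>
#include <string.h>

/*
 * Build a top-level vdev name of the form "<type>-<id>". The caller may pass
 * the same storage for buf and type, so the result is first formatted into
 * a separate temporary buffer and only then copied into buf.
 */
static void
vdev_type_id_name_sketch(char *buf, size_t buflen, const char *type,
    unsigned long long id)
{
	char tmp[64];

	(void) snprintf(tmp, sizeof (tmp), "%s-%llu", type, id);
	(void) snprintf(buf, buflen, "%s", tmp);
}
```

In the raidz case described, buf first holds the type plus parity level (e.g. "raidz2") and is also passed as type; with the temporary buffer the result is "raidz2-0" rather than only the "-id" part.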
Brian Behlendorf	6283f55ea1	Support custom build directories and move includes One of the neat tricks an autoconf style project is capable of is allowing configuration/building in a directory other than the source directory. The major advantage to this is that you can build the project various different ways while making changes in a single source tree. For example, this project is designed to work on various different Linux distributions, each of which works slightly differently. This means that changes need to be verified on each of those supported distributions, preferably before the change is committed to the public git repo. Using nfs and custom build directories makes this much easier. I now have a single source tree in nfs mounted on several different systems, each running a supported distribution. When I make a change to the source base that I suspect may break things, I can concurrently build from the same source on all the systems, each in its own subdirectory. wget -c http://github.com/downloads/behlendorf/zfs/zfs-x.y.z.tar.gz; tar -xzf zfs-x.y.z.tar.gz; cd zfs-x.y.z. Then run the following concurrently on each system (ubuntu, fedora, debian, rhel6), using that distribution's name as the subdirectory <distro>: mkdir <distro>; cd <distro>; ../configure; make; make check. This change also moves many of the include headers from individual include/sys directories under the modules directory into a single top level include directory. This has the advantage of making the build rules cleaner and logically it makes a bit more sense.	2010-09-08 12:38:56 -07:00
Brian Behlendorf	e70e591c51	Add initial autoconf products Add the initial products from autogen.sh. These products will be updated incrementally after this point as development occurs. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2010-08-31 13:42:02 -07:00
Brian Behlendorf	9b020fd97a	Add linux user util support This topic branch contains required changes to the user space utilities to allow them to integrate cleanly with Linux. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2010-08-31 13:42:01 -07:00
Brian Behlendorf	d603ed6c27	Add linux user disk support This topic branch contains all the changes needed to integrate the user side zfs tools with Linux style devices. Primarily this includes fixing up the Solaris libefi library to be Linux friendly, and integrating with the libblkid library which is provided by e2fsprogs. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2010-08-31 13:42:00 -07:00
Brian Behlendorf	f1fb119f6b	Add linux unused code tracking Track various large hunks which have been dropped simply because they are not relevant to this port. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2010-08-31 13:42:00 -07:00
Brian Behlendorf	6b003d7cda	Add linux topology support Solaris recently introduced the idea of drive topology because where a drive is located does matter. I have already handled this with udev/blkid integration under Linux so I'm hopeful this case can simply be removed but for now I've just stubbed out what is needed in libspl and commented out the rest here. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2010-08-31 13:42:00 -07:00
Brian Behlendorf	054bc00b4c	Add linux compatibility Resolve minor Linux compatibility issues. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2010-08-31 13:41:59 -07:00
Brian Behlendorf	9c905c550b	Add linux sha2 support The upstream ZFS code has correctly moved to a faster native sha2 implementation. Unfortunately, under Linux that's going to be a little problematic so we revert the code to the more portable version contained in earlier ZFS releases. Using the native sha2 implementation in Linux is possible but the API is slightly different in kernel versus user space depending on which libraries are used. Ideally, we need a fast implementation of SHA256 which builds as part of ZFS; this shouldn't be that hard to do, but it will take some effort. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2010-08-31 13:41:59 -07:00
Brian Behlendorf	325f023544	Add linux kernel device support This branch contains the majority of the changes required to cleanly integrate with Linux style special devices (/dev/zfs). Mainly this means dropping all the Solaris style callbacks and replacing them with the Linux equivalents. This patch also adds the onexit infrastructure needed to track some minimal state between ioctls. Under Linux it would be easy to do this simply using the file->private_data. But under Solaris they apparently need to pass the file descriptor as part of the ioctl data and then perform a lookup in the kernel. Once again, to keep code changes to a minimum, I've implemented the Solaris solution. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2010-08-31 13:41:50 -07:00
Brian Behlendorf	2eadf037f5	Add linux mntent support Use mount entry if HAVE_SETMNTENT defined Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2010-08-31 13:41:50 -07:00
Brian Behlendorf	d2c15e84e9	Add linux mlslabel support The ZFS update to onnv_141 brought with it support for a security label attribute called mlslabel. This feature depends on zones to work correctly and thus I am disabling it under Linux. Equivalent functionality could be added at some point in the future. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2010-08-31 13:41:49 -07:00
Brian Behlendorf	be160928b7	Add linux idmap support Use idmap service if available. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2010-08-31 13:41:49 -07:00
Brian Behlendorf	266852767f	Add linux events This topic branch leverages the Solaris style FMA call points in ZFS to create a user space visible event notification system under Linux. This new system is called zevent and it unifies all previous Solaris style ereports and sysevent notifications. Under this Linux specific scheme, when a sysevent or ereport event occurs, an nvlist describing the event is created which looks almost exactly like a Solaris ereport. These events are queued up in the kernel when they occur and conditionally logged to the console. It is then up to a user space application to consume the events and do whatever it likes with them. To make this possible the existing /dev/zfs ABI has been extended with two new ioctls which behave as follows. * ZFS_IOC_EVENTS_NEXT Get the next pending event. The kernel will keep track of the last event consumed by the file descriptor and provide the next one if available. If no new events are available the ioctl() will block waiting for the next event. This ioctl may also be called in a non-blocking mode by setting zc.zc_guid = ZEVENT_NONBLOCK. In the non-blocking case, if no events are available, ENOENT will be returned. It is possible that ESHUTDOWN will be returned if the ioctl() is called while module unloading is in progress. And finally ENOMEM may occur if the provided nvlist buffer is not large enough to contain the entire event. * ZFS_IOC_EVENTS_CLEAR Clear all events queued by the kernel. The kernel will keep a fairly large number of recent events queued; use this ioctl to clear the in kernel list. This will affect all user space processes consuming events. The zpool command has been extended to use this events ABI with the 'events' subcommand. You may run 'zpool events -v' to output a verbose log of all recent events. This is very similar to the Solaris 'fmdump -ev' command with the key difference being it also includes what would be considered sysevents under Solaris. 
You may also run in follow mode with the '-f' option. To clear the in kernel event queue use the '-c' option. $ sudo cmd/zpool/zpool events -fv TIME CLASS May 13 2010 16:31:15.777711000 ereport.fs.zfs.config.sync class = "ereport.fs.zfs.config.sync" ena = 0x40982b7897700001 detector = (embedded nvlist) version = 0x0 scheme = "zfs" pool = 0xed976600de75dfa6 (end detector) time = 0x4bec8bc3 0x2e5aed98 pool = "zpios" pool_guid = 0xed976600de75dfa6 pool_context = 0x0 While the 'zpool events' command is handy for interactive debugging it is not expected to be the primary consumer of zevents. This ABI was primarily added to facilitate the addition of a user space monitoring daemon. This daemon would consume all events posted by the kernel and based on the type of event perform an action. For most events simply forwarding them on to syslog is likely enough. But this interface also cleanly allows for more sophisticated actions to be taken such as generating an email for a failed drive. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2010-08-31 13:41:36 -07:00
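The per-descriptor cursor semantics of ZFS_IOC_EVENTS_NEXT and ZFS_IOC_EVENTS_CLEAR can be modeled in user space as below. This is a hypothetical, single-threaded sketch (sketch_queue, sketch_cursor and the sketch_* functions are invented names): it covers only the non-blocking case, where an empty result is reported as ENOENT, it assumes the queue drops its oldest event when full, and it does not call the real ioctls.

```c
#include <errno.h>
#include <stddef.h>

#define	SKETCH_QLEN	8	/* hypothetical queue capacity */

struct sketch_event {
	unsigned long long eid;	/* monotonically increasing event id */
	int payload;		/* stands in for the event nvlist */
};

struct sketch_queue {
	struct sketch_event ev[SKETCH_QLEN];
	size_t head;			/* index of the oldest queued event */
	size_t count;			/* number of queued events */
	unsigned long long last_eid;	/* id given to the newest event */
};

/* Per-file-descriptor state: the id of the last event consumed. */
struct sketch_cursor {
	unsigned long long eid;
};

/* Kernel side: queue a new event, dropping the oldest one when full. */
static void
sketch_post(struct sketch_queue *q, int payload)
{
	size_t tail;

	if (q->count == SKETCH_QLEN) {
		q->head = (q->head + 1) % SKETCH_QLEN;
		q->count--;
	}
	tail = (q->head + q->count) % SKETCH_QLEN;
	q->ev[tail].eid = ++q->last_eid;
	q->ev[tail].payload = payload;
	q->count++;
}

/* Non-blocking "next": return the first event newer than the cursor. */
static int
sketch_next_nonblock(const struct sketch_queue *q, struct sketch_cursor *c,
    int *payload)
{
	size_t i;

	for (i = 0; i < q->count; i++) {
		const struct sketch_event *e =
		    &q->ev[(q->head + i) % SKETCH_QLEN];

		if (e->eid > c->eid) {
			c->eid = e->eid;
			*payload = e->payload;
			return (0);
		}
	}
	return (ENOENT);
}

/* "Clear": empty the queue for every consumer; ids keep increasing. */
static void
sketch_clear(struct sketch_queue *q)
{
	q->head = 0;
	q->count = 0;
}
```

Two cursors over the same queue consume events independently, while sketch_clear() empties the queue for both, in line with the note that clearing affects all user space consumers.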
Brian Behlendorf	c9c0d073da	Add build system Add autoconf style build infrastructure to the ZFS tree. This includes autogen.sh, configure.ac, m4 macros, some scripts/*, and makefiles for all the core ZFS components.	2010-08-31 13:41:27 -07:00
Brian Behlendorf	2a442d1629	Fix strncat usage This looks like a typo. The intention was to use strlcat(), however strncat() was accidentally used instead, which may lead to a buffer overflow. This was caught by gcc -D_FORTIFY_SOURCE=2. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2010-08-31 08:38:46 -07:00
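The difference matters because strncat()'s third argument limits how many bytes are appended from the source, not the total size of the destination, so passing sizeof (dst) can still overflow a partially filled buffer. strlcat() instead takes the full destination size. Since older C libraries such as glibc did not provide strlcat(), the sketch below is a hypothetical local equivalent with BSD strlcat() semantics, not the ZFS tree's own implementation.

```c
#include <string.h>

/*
 * strlcat()-style append (BSD semantics). size is the total size of dst.
 * The result is NUL-terminated whenever dst was terminated within size,
 * and the return value is the length the string would have had without
 * truncation, so (ret >= size) signals truncation.
 */
static size_t
sketch_strlcat(char *dst, const char *src, size_t size)
{
	size_t dlen = 0;
	size_t slen = strlen(src);
	size_t room, n;

	/* Length of dst, but never look past size bytes. */
	while (dlen < size && dst[dlen] != '\0')
		dlen++;
	if (dlen == size)
		return (size + slen);	/* no terminator within size */

	room = size - dlen - 1;		/* space left for characters */
	n = (slen < room) ? slen : room;
	memcpy(dst + dlen, src, n);
	dst[dlen + n] = '\0';
	return (dlen + slen);
}
```

Appending "defgh" to "abc" in an 8-byte buffer stores "abcdefg" and returns 8; since 8 is not less than the buffer size, the caller can tell the result was truncated.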
Brian Behlendorf	235db0acea	Fix deadcode Remove deadcode. It's possible the code should be in use somewhere, but as the source code is laid out it currently is not. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2010-08-31 08:38:44 -07:00
Brian Behlendorf	a6098088eb	Fix minor acl issue Minor fixes for newly introduced acl support. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2010-08-31 08:38:43 -07:00
Brian Behlendorf	d4ed667343	Fix gcc uninitialized variable warnings Gcc -Wall warn: 'uninitialized variable' Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2010-08-31 08:38:43 -07:00
Brian Behlendorf	c65aa5b2b9	Fix gcc missing parenthesis warnings Gcc -Wall warn: 'missing parenthesis' Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2010-08-31 08:38:35 -07:00
Brian Behlendorf	e75c13c353	Fix gcc missing case warnings Gcc ASSERT() missing cases are impossible Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2010-08-27 15:34:03 -07:00
Brian Behlendorf	2598c0012d	Fix gcc missing braces warnings Resolve compiler warnings concerning missing braces. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2010-08-27 15:34:03 -07:00
Brian Behlendorf	0ccd9d24e4	Fix gcc init pragma warnings Use the constructor attribute on non-Solaris platforms. The #pragma init/fini -> __attribute__((constructor/destructor)) conversions should go upstream. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2010-08-27 15:34:02 -07:00
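A minimal sketch of that conversion, assuming GCC or Clang (constructor and destructor are compiler attributes, not standard C). The function and variable names are hypothetical; the Solaris pragmas they replace are shown in the comment.

```c
#include <stdio.h>

static int sketch_lib_ready = 0;

/*
 * GCC/Clang attributes standing in for the Solaris pragmas
 *   #pragma init(sketch_lib_init)
 *   #pragma fini(sketch_lib_fini)
 * The constructor runs before main() (or when a shared library is loaded);
 * the destructor runs after main() returns or exit() is called.
 */
static void sketch_lib_init(void) __attribute__((constructor));
static void sketch_lib_fini(void) __attribute__((destructor));

static void
sketch_lib_init(void)
{
	sketch_lib_ready = 1;
}

static void
sketch_lib_fini(void)
{
	(void) fprintf(stderr, "sketch_lib_fini: cleaning up\n");
}
```

Declaring the attributes on separate prototypes keeps the function definitions identical to their Solaris counterparts, which is convenient when only the pragma lines differ between platforms.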
Brian Behlendorf	b8864a233c	Fix gcc cast warnings Gcc -Wall warn: 'lacks a cast' Gcc -Wall warn: 'comparison between pointer and integer' Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2010-08-27 15:33:32 -07:00
Brian Behlendorf	d6320ddb78	Fix gcc c90 compliance warnings Fix non-c90 compliant code. For the most part these changes simply deal with where a particular variable is declared; under c90 declarations must always come at the very start of a block. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2010-08-27 15:28:32 -07:00
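A small illustration of the c90 rule, using a hypothetical function: every declaration, including the loop counter, is placed before the first statement of its block.

```c
/*
 * C90 style: all declarations precede the first statement of the block.
 * Writing "for (int i = 0; ...)" or declaring total after a statement are
 * C99 features that gcc diagnoses when compiling as C90 (for example via
 * -Wdeclaration-after-statement).
 */
static int
sketch_sum_c90(const int *v, int n)
{
	int i;
	int total = 0;

	for (i = 0; i < n; i++)
		total += v[i];
	return (total);
}
```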
Brian Behlendorf	572e285762	Update to onnv_147 This is the last official OpenSolaris tag before the public development tree was closed.	2010-08-26 14:24:34 -07:00
Brian Behlendorf	428870ff73	Update core ZFS code from build 121 to build 141.	2010-05-28 13:45:14 -07:00
Brian Behlendorf	45d1cae3b8	Rebase master to b121	2009-08-18 11:43:27 -07:00
Brian Behlendorf	9babb37438	Rebase master to b117	2009-07-02 15:44:48 -07:00
Brian Behlendorf	d164b20935	Rebase master to b108	2009-02-18 12:51:31 -08:00
Brian Behlendorf	fb5f0bc833	Rebase master to b105	2009-01-15 13:59:39 -08:00
Brian Behlendorf	172bb4bd5e	Move the world out of /zfs/ and separate out module build tree	2008-12-11 11:08:09 -08:00
Brian Behlendorf	b6097ae55a	Remove stray stub kernel files which should be brought in by linux-kernel-module patch	2008-12-02 08:47:21 -08:00
Brian Behlendorf	34dc7c2f25	Initial Linux ZFS GIT Repo	2008-11-20 12:01:55 -08:00

462 Commits