Commit Graph

439 Commits

Author SHA1 Message Date
Gvozden Neskovic
ee36c709c3 Performance optimization of AVL tree comparator functions
perf: 2.75x faster ddt_entry_compare()
    First 256bits of ddt_key_t is a block checksum, which are expected
to be close to random data. Hence, on average, comparison only needs to
look at first few bytes of the keys. To reduce number of conditional
jump instructions, the result is computed as: sign(memcmp(k1, k2)).

Sign of an integer 'a' can be obtained as: `(0 < a) - (a < 0)` := {-1, 0, 1} ,
which is computed efficiently.  Synthetic performance evaluation of
original and new algorithm over 1G random keys on 2.6GHz Intel(R) Xeon(R)
CPU E5-2660 v3:

old	6.85789 s
new	2.49089 s

perf: 2.8x faster vdev_queue_offset_compare() and vdev_queue_timestamp_compare()
    Compute the result directly instead of using conditionals

perf: zfs_range_compare()
    Speedup between 1.1x - 2.5x, depending on compiler version and
optimization level.

perf: spa_error_entry_compare()
    `bcmp()` is not suitable for comparator use. Use `memcmp()` instead.

perf: 2.8x faster metaslab_compare() and metaslab_rangesize_compare()
perf: 2.8x faster zil_bp_compare()
perf: 2.8x faster mze_compare()
perf: faster dbuf_compare()
perf: faster compares in spa_misc
perf: 2.8x faster layout_hash_compare()
perf: 2.8x faster space_reftree_compare()
perf: libzfs: faster avl tree comparators
perf: guid_compare()
perf: dsl_deadlist_compare()
perf: perm_set_compare()
perf: 2x faster range_tree_seg_compare()
perf: faster unique_compare()
perf: faster vdev_cache _compare()
perf: faster vdev_uberblock_compare()
perf: faster fuid _compare()
perf: faster zfs_znode_hold_compare()

Signed-off-by: Gvozden Neskovic <neskovic@gmail.com>
Signed-off-by: Richard Elling <richard.elling@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #5033
2016-08-31 14:35:34 -07:00
GeLiXin
c40db193a5 Fix: Build warnings with different gcc optimization levels in debug mode
This fix resolves warnings reported during compiling with different gcc
optimization levels in debug mode,

Test tools:
gcc version 4.4.7 20120313 (Red Hat 4.4.7-16) (GCC)
Linux version: 2.6.32-573.18.1.el6.x86_64, Red Hat Enterprise Linux Server release 6.1 (Santiago)

List of warnings:
CFLAGS=-O1 ./configure --enable-debug ;make
../../module/icp/core/kcf_sched.c: In function ‘kcf_aop_done’:
../../module/icp/core/kcf_sched.c:499: error: ‘fg’ may be used uninitialized in this function
../../module/icp/core/kcf_sched.c:499: note: ‘fg’ was declared here

CFLAGS=-Os ./configure --enable-debug ; make
libzfs_dataset.c: In function ‘zfs_prop_set_list’:
libzfs_dataset.c:1575: error: ‘nvl_len’ may be used uninitialized in this function

Signed-off-by: GeLiXin <ge.lixin@zte.com.cn>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #5022
2016-08-29 12:46:18 -07:00
GeLiXin
23827a4ca1 Fix: Array bounds read in zprop_print_one_property()
If the loop index i comes to (ZFS_GET_NCOLS - 1), the cbp->cb_columns[i + 1]
actually read the data of cbp->cb_colwidths[0], which means the array
subscript is above array bounds.

Luckily the cbp->cb_colwidths[0] is always 0 and it seems we haven't
looped enough times to exceed the array bounds so far, but it's really
a secluded risk someday.

Signed-off-by: GeLiXin <ge.lixin@zte.com.cn>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #5003
2016-08-22 10:23:47 -07:00
Gvozden Neskovic
fc897b24b2 Rework of fletcher_4 module
- Benchmark memory block is increased to 128kiB to reflect real block sizes more
accurately. Measurements include all three stages needed for checksum generation,
i.e. `init()/compute()/fini()`. The inner loop is repeated multiple times to offset
overhead of time function.

- Fastest implementation selects native and byteswap methods independently in
benchmark. To support this new function pointers `init_byteswap()/fini_byteswap()`
are introduced.

- Implementation mutex lock is replaced by atomic variable.

- To save time, benchmark is not executed in userspace. Instead, highest supported
implementation is used for fastest. Default userspace selector is still 'cycle'.

- `fletcher_4_native/byteswap()` methods use incremental methods to finish
calculation if data size is not multiple of vector stride (currently 64B).

- Added `fletcher_4_native_varsize()` special purpose method for use when buffer size
is not known in advance. The method does not enforce 4B alignment on buffer size, and
will ignore last (size % 4) bytes of the data buffer.

- Benchmark `kstat` is changed to match the one of vdev_raidz. It now shows
throughput for all supported implementations (in B/s), native and byteswap,
as well as the code [fastest] is running.

Example of `fletcher_4_bench` running on `Intel(R) Xeon(R) CPU E5-2660 v3 @ 2.60GHz`:
implementation   native         byteswap
scalar           4768120823     3426105750
sse2             7947841777     4318964249
ssse3            7951922722     6112191941
avx2             13269714358    11043200912
fastest          avx2           avx2

Example of `fletcher_4_bench` running on `Intel(R) Xeon Phi(TM) CPU 7210 @ 1.30GHz`:
implementation   native         byteswap
scalar           1291115967     1031555336
sse2             2539571138     1280970926
ssse3            2537778746     1080016762
avx2             4950749767     1078493449
avx512f          9581379998     4010029046
fastest          avx512f        avx512f

Signed-off-by: Gvozden Neskovic <neskovic@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4952
2016-08-16 14:11:55 -07:00
GeLiXin
d5884c3453 Fix indefinite article
The indefinite article before nvlist should be "an", not "a".

We have 27 "an nvlist" and 7 "a nvlist" in our comment, they should
stay the same as we are such a strict filesystem.

Signed-off-by: GeLiXin <ge.lixin@zte.com.cn>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4941
2016-08-11 11:23:49 -07:00
Gvozden Neskovic
689f093ebc Build user-space with different gcc optimization levels
This fix resolves warnings reported during compiling of user-space
libraries with different gcc optimization levels.

Tested with gcc versions: 4.9.2 (Debian), and 6.1.1 (Fedora).
The patch enables use of following opt levels: O0, O1, O2, O3, Og, Os, Ofast.

List of warnings:

[GCC 4.9.2 -Os]
libzfs_sendrecv.c:3726:26: error: 'clp' may be used uninitialized in this function [-Werror=maybe-uninitialized]

[GCC 4.9.2 -Og]
fs_fletcher.c:323:26: error: 'idx' may be used uninitialized in this function [-Werror=maybe-uninitialized]
dsl_dataset.c:1290:12: error: 'atp' may be used uninitialized in this function [-Werror=maybe-uninitialized]

[GCC 4.9.2 -Ofast]
u8_textprep.c:1310:9: error: 'tc[3ul]' may be used uninitialized in this function [-Werror=maybe-uninitialized]
u8_textprep.c:177:23: error: 'u8t[0ul]' may be used uninitialized in this function [-Werror=maybe-uninitialized]
dsl_dataset.c:2089:37: error: ‘hds’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
dsl_dataset.c:3216:2: error: ‘ds’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
dsl_dataset.c:1591:2: error: ‘ds’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
dsl_dataset.c:3341:2: error: ‘ds’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
vdev_raidz.c:1153:8: error: 'dcount[2]' may be used uninitialized in this function [-Werror=maybe-uninitialized]
vdev_raidz.c:1167:17: error: 'dst[2]' may be used uninitialized in this function [-Werror=maybe-uninitialized]
kernel.c:1005:2: error: ‘resid’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
libzfs_dataset.c:2826:8: error: ‘val’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
libzfs_dataset.c:3056:35: error: ‘val’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
libzfs_dataset.c:1584:13: error: ‘val’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
libzfs_dataset.c:3056:35: error: ‘val’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
libzfs_dataset.c:1792:66: error: ‘val’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
libzfs_dataset.c:3986:35: error: ‘val’ may be used uninitialized in this function [-Werror=maybe-uninitialized]

[GCC 6.1.1]
Resolved in PR #4907

Signed-off-by: Gvozden Neskovic <neskovic@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4937
2016-08-09 14:40:35 -07:00
GeLiXin
88c4c7a067 Fix call zfs_get_name() with invalid parameter
zfs_get_name() expects a parameter of type zfs_handle_t *zhp , but
gets an invalid parameter type of zfs_handle_t **zhp actually in
libzfs_dataset_cmp(), which may trigger a coredump if called.

libzfs_dataset_cmp() working normally so far, just because all the
callers only give datasets of type ZFS_TYPE_FILESYSTEM to it, we
compared their mountpoint and return, luckily.

Signed-off-by: GeLiXin <ge.lixin@zte.com.cn>
Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4919
2016-08-08 12:24:57 -07:00
liaoyuxiangqin
e24e62a948 Fix memory leak in function add_config()
Config of a hot spare or l2cache device will leak memory in function
add_config().  At the start of this function, when dealing with a
config which belongs to a hot spare not currently in use or a l2cache
device the config should be freed.

Signed-off-by: liaoyuxiangqin <guo.yong33@zte.com.cn>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4910
2016-08-01 12:49:03 -07:00
Gvozden Neskovic
78867a0a0a libzfs_import.c: Uninitialized pointer read
In zpool_find_import_scan: Reads an uninitialized pointer or
its target Coverity #150966

Found by static analysis with CoverityScan 0.8.5

Signed-off-by: Gvozden Neskovic <neskovic@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4897
2016-07-29 15:34:12 -07:00
Colin Ian King
b264d9b3e5 libzfs: Fix missing va_end call on ENOSPC and EDQUOT cases
The switch statement in function zfs_standard_error_fmt for the
ENOSPC and EDQUOT cases returns immediately and unlike all other
cases in the switch this does not perform the va_end call.

Perform a break which ends up calling va_end rather than returning
immediately.

Found by static analysis with CoverityScan 0.8.5

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4900
2016-07-29 15:34:12 -07:00
Brian Behlendorf
8a39abaafa Multi-thread 'zpool import' for blkid
Commit 519129f added support to multi-thread 'zpool import' for
the case where block devices are scanned for under /dev/.  This
commit generalizes that logic and applies it to the case where
device names are acquired from libblkid.

The zpool_find_import_scan() and zpool_find_import_blkid()
functions create an AVL tree containing each device name.  Each
entry in this tree is dispatched to a taskq where the function
zpool_open_func() validates the device by opening it and reading
the label.  This may result in additional entries being added
to the tree and those device paths being verified.

This is largely how the upstream OpenZFS code behaves but due to
significant differences the non-Linux code has been dropped for
readability.  Additionally, this code makes use of taskqs and
kmutexs which are normally not available to the command line tools.
Special care has been taken to allow their use in the import
functions.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Closes #4794
2016-07-27 13:38:46 -07:00
Gvozden Neskovic
a64f903b06 Fixes for issues found with cppcheck tool
The patch fixes small number of errors/false positives reported by `cppcheck`,
static analysis tool for C/C++.

cppcheck 1.72

$ cppcheck . --force --quiet
[cmd/zfs/zfs_main.c:4444]: (error) Possible null pointer dereference: who_perm
[cmd/zfs/zfs_main.c:4445]: (error) Possible null pointer dereference: who_perm
[cmd/zfs/zfs_main.c:4446]: (error) Possible null pointer dereference: who_perm
[cmd/zpool/zpool_iter.c:317]: (error) Uninitialized variable: nvroot
[cmd/zpool/zpool_vdev.c:1526]: (error) Memory leak: child
[lib/libefi/rdwr_efi.c:1118]: (error) Memory leak: efi_label
[lib/libuutil/uu_misc.c:207]: (error) va_list 'args' was opened but not closed by va_end().
[lib/libzfs/libzfs_import.c:1554]: (error) Dangerous usage of 'diskname' (strncpy doesn't always null-terminate it).
[lib/libzfs/libzfs_sendrecv.c:3279]: (error) Dereferencing 'cp' after it is deallocated / released
[tests/zfs-tests/cmd/file_write/file_write.c:154]: (error) Possible null pointer dereference: operation
[tests/zfs-tests/cmd/randfree_file/randfree_file.c:90]: (error) Memory leak: buf
[cmd/zinject/zinject.c:1068]: (error) Uninitialized variable: dataset
[module/icp/io/sha2_mod.c:698]: (error) Uninitialized variable: blocks_per_int64

Signed-off-by: Gvozden Neskovic <neskovic@gmail.com>
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #1392
2016-07-27 13:31:22 -07:00
Paul Dagnelie
d1d19c7854 OpenZFS 6876 - Stack corruption after importing a pool with a too-long name
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com>
Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>

Calling dsl_dataset_name on a dataset with a 256 byte buffer is asking
for trouble. We should check every dataset on import, using a 1024 byte
buffer and checking each time to see if the dataset's new name is longer
than 256 bytes.

OpenZFS-issue: https://www.illumos.org/issues/6876
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/ca8674e
2016-06-28 13:47:04 -07:00
Igor Kozhukhov
eca7b76001 OpenZFS 6314 - buffer overflow in dsl_dataset_name
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Igor Kozhukhov <ikozhukhov@gmail.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>

OpenZFS-issue: https://www.illumos.org/issues/6314
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/d6160ee
2016-06-28 13:47:03 -07:00
Brian Behlendorf
43e52eddb1 Implement zfs_ioc_recv_new() for OpenZFS 2605
Adds ZFS_IOC_RECV_NEW for resumable streams and preserves the legacy
ZFS_IOC_RECV user/kernel interface.  The new interface supports all
stream options but is currently only used for resumable streams.
This way updated user space utilities will interoperate with older
kernel modules.

ZFS_IOC_RECV_NEW is modeled after the existing ZFS_IOC_SEND_NEW
handler.  Non-Linux OpenZFS platforms have opted to change the
legacy interface in an incompatible fashion instead of adding a
new ioctl.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2016-06-28 13:47:03 -07:00
Dan McDonald
671c93546c OpenZFS 4986 - receiving replication stream fails if any snapshot exceeds refquota
Authored by: Dan McDonald <danmcd@omniti.com>
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Gordon Ross <gordon.ross@nexenta.com>
Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>

OpenZFS-issue: https://www.illumos.org/issues/4986
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/5878fad
2016-06-28 13:47:03 -07:00
Matthew Ahrens
47dfff3b86 OpenZFS 2605, 6980, 6902
2605 want to resume interrupted zfs send
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Richard Elling <Richard.Elling@RichardElling.com>
Reviewed by: Xin Li <delphij@freebsd.org>
Reviewed by: Arne Jansen <sensille@gmx.net>
Approved by: Dan McDonald <danmcd@omniti.com>
Ported-by: kernelOfTruth <kerneloftruth@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>

OpenZFS-issue: https://www.illumos.org/issues/2605
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/9c3fd12

6980 6902 causes zfs send to break due to 32-bit/64-bit struct mismatch
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Ported by: Brian Behlendorf <behlendorf1@llnl.gov>

OpenZFS-issue: https://www.illumos.org/issues/6980
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/ea4a67f

Porting notes:
- All rsend and snapshop tests enabled and updated for Linux.
- Fix misuse of input argument in traverse_visitbp().
- Fix ISO C90 warnings and errors.
- Fix gcc 'missing braces around initializer' in
  'struct send_thread_arg to_arg =' warning.
- Replace 4 argument fletcher_4_native() with 3 argument version,
  this change was made in OpenZFS 4185 which has not been ported.
- Part of the sections for 'zfs receive' and 'zfs send' was
  rewritten and reordered to approximate upstream.
- Fix mktree xattr creation, 'user.' prefix required.
- Minor fixes to newly enabled test cases
- Long holds for volumes allowed during receive for minor registration.
2016-06-28 13:47:02 -07:00
Ned Bass
50c957f702 Implement large_dnode pool feature
Justification
-------------

This feature adds support for variable length dnodes. Our motivation is
to eliminate the overhead associated with using spill blocks.  Spill
blocks are used to store system attribute data (i.e. file metadata) that
does not fit in the dnode's bonus buffer. By allowing a larger bonus
buffer area the use of a spill block can be avoided.  Spill blocks
potentially incur an additional read I/O for every dnode in a dnode
block. As a worst case example, reading 32 dnodes from a 16k dnode block
and all of the spill blocks could issue 33 separate reads. Now suppose
those dnodes have size 1024 and therefore don't need spill blocks.  Then
the worst case number of blocks read is reduced to from 33 to two--one
per dnode block. In practice spill blocks may tend to be co-located on
disk with the dnode blocks so the reduction in I/O would not be this
drastic. In a badly fragmented pool, however, the improvement could be
significant.

ZFS-on-Linux systems that make heavy use of extended attributes would
benefit from this feature. In particular, ZFS-on-Linux supports the
xattr=sa dataset property which allows file extended attribute data
to be stored in the dnode bonus buffer as an alternative to the
traditional directory-based format. Workloads such as SELinux and the
Lustre distributed filesystem often store enough xattr data to force
spill bocks when xattr=sa is in effect. Large dnodes may therefore
provide a performance benefit to such systems.

Other use cases that may benefit from this feature include files with
large ACLs and symbolic links with long target names. Furthermore,
this feature may be desirable on other platforms in case future
applications or features are developed that could make use of a
larger bonus buffer area.

Implementation
--------------

The size of a dnode may be a multiple of 512 bytes up to the size of
a dnode block (currently 16384 bytes). A dn_extra_slots field was
added to the current on-disk dnode_phys_t structure to describe the
size of the physical dnode on disk. The 8 bits for this field were
taken from the zero filled dn_pad2 field. The field represents how
many "extra" dnode_phys_t slots a dnode consumes in its dnode block.
This convention results in a value of 0 for 512 byte dnodes which
preserves on-disk format compatibility with older software.

Similarly, the in-memory dnode_t structure has a new dn_num_slots field
to represent the total number of dnode_phys_t slots consumed on disk.
Thus dn->dn_num_slots is 1 greater than the corresponding
dnp->dn_extra_slots. This difference in convention was adopted
because, unlike on-disk structures, backward compatibility is not a
concern for in-memory objects, so we used a more natural way to
represent size for a dnode_t.

The default size for newly created dnodes is determined by the value of
a new "dnodesize" dataset property. By default the property is set to
"legacy" which is compatible with older software. Setting the property
to "auto" will allow the filesystem to choose the most suitable dnode
size. Currently this just sets the default dnode size to 1k, but future
code improvements could dynamically choose a size based on observed
workload patterns. Dnodes of varying sizes can coexist within the same
dataset and even within the same dnode block. For example, to enable
automatically-sized dnodes, run

 # zfs set dnodesize=auto tank/fish

The user can also specify literal values for the dnodesize property.
These are currently limited to powers of two from 1k to 16k. The
power-of-2 limitation is only for simplicity of the user interface.
Internally the implementation can handle any multiple of 512 up to 16k,
and consumers of the DMU API can specify any legal dnode value.

The size of a new dnode is determined at object allocation time and
stored as a new field in the znode in-memory structure. New DMU
interfaces are added to allow the consumer to specify the dnode size
that a newly allocated object should use. Existing interfaces are
unchanged to avoid having to update every call site and to preserve
compatibility with external consumers such as Lustre. The new
interfaces names are given below. The versions of these functions that
don't take a dnodesize parameter now just call the _dnsize() versions
with a dnodesize of 0, which means use the legacy dnode size.

New DMU interfaces:
  dmu_object_alloc_dnsize()
  dmu_object_claim_dnsize()
  dmu_object_reclaim_dnsize()

New ZAP interfaces:
  zap_create_dnsize()
  zap_create_norm_dnsize()
  zap_create_flags_dnsize()
  zap_create_claim_norm_dnsize()
  zap_create_link_dnsize()

The constant DN_MAX_BONUSLEN is renamed to DN_OLD_MAX_BONUSLEN. The
spa_maxdnodesize() function should be used to determine the maximum
bonus length for a pool.

These are a few noteworthy changes to key functions:

* The prototype for dnode_hold_impl() now takes a "slots" parameter.
  When the DNODE_MUST_BE_FREE flag is set, this parameter is used to
  ensure the hole at the specified object offset is large enough to
  hold the dnode being created. The slots parameter is also used
  to ensure a dnode does not span multiple dnode blocks. In both of
  these cases, if a failure occurs, ENOSPC is returned. Keep in mind,
  these failure cases are only possible when using DNODE_MUST_BE_FREE.

  If the DNODE_MUST_BE_ALLOCATED flag is set, "slots" must be 0.
  dnode_hold_impl() will check if the requested dnode is already
  consumed as an extra dnode slot by an large dnode, in which case
  it returns ENOENT.

* The function dmu_object_alloc() advances to the next dnode block
  if dnode_hold_impl() returns an error for a requested object.
  This is because the beginning of the next dnode block is the only
  location it can safely assume to either be a hole or a valid
  starting point for a dnode.

* dnode_next_offset_level() and other functions that iterate
  through dnode blocks may no longer use a simple array indexing
  scheme. These now use the current dnode's dn_num_slots field to
  advance to the next dnode in the block. This is to ensure we
  properly skip the current dnode's bonus area and don't interpret it
  as a valid dnode.

zdb
---
The zdb command was updated to display a dnode's size under the
"dnsize" column when the object is dumped.

For ZIL create log records, zdb will now display the slot count for
the object.

ztest
-----
Ztest chooses a random dnodesize for every newly created object. The
random distribution is more heavily weighted toward small dnodes to
better simulate real-world datasets.

Unused bonus buffer space is filled with non-zero values computed from
the object number, dataset id, offset, and generation number.  This
helps ensure that the dnode traversal code properly skips the interior
regions of large dnodes, and that these interior regions are not
overwritten by data belonging to other dnodes. A new test visits each
object in a dataset. It verifies that the actual dnode size matches what
was stored in the ztest block tag when it was created. It also verifies
that the unused bonus buffer space is filled with the expected data
patterns.

ZFS Test Suite
--------------
Added six new large dnode-specific tests, and integrated the dnodesize
property into existing tests for zfs allow and send/recv.

Send/Receive
------------
ZFS send streams for datasets containing large dnodes cannot be received
on pools that don't support the large_dnode feature. A send stream with
large dnodes sets a DMU_BACKUP_FEATURE_LARGE_DNODE flag which will be
unrecognized by an incompatible receiving pool so that the zfs receive
will fail gracefully.

While not implemented here, it may be possible to generate a
backward-compatible send stream from a dataset containing large
dnodes. The implementation may be tricky, however, because the send
object record for a large dnode would need to be resized to a 512
byte dnode, possibly kicking in a spill block in the process. This
means we would need to construct a new SA layout and possibly
register it in the SA layout object. The SA layout is normally just
sent as an ordinary object record. But if we are constructing new
layouts while generating the send stream we'd have to build the SA
layout object dynamically and send it at the end of the stream.

For sending and receiving between pools that do support large dnodes,
the drr_object send record type is extended with a new field to store
the dnode slot count. This field was repurposed from unused padding
in the structure.

ZIL Replay
----------
The dnode slot count is stored in the uppermost 8 bits of the lr_foid
field. The bits were unused as the object id is currently capped at
48 bits.

Resizing Dnodes
---------------
It should be possible to resize a dnode when it is dirtied if the
current dnodesize dataset property differs from the dnode's size, but
this functionality is not currently implemented. Clearly a dnode can
only grow if there are sufficient contiguous unused slots in the
dnode block, but it should always be possible to shrink a dnode.
Growing dnodes may be useful to reduce fragmentation in a pool with
many spill blocks in use. Shrinking dnodes may be useful to allow
sending a dataset to a pool that doesn't support the large_dnode
feature.

Feature Reference Counting
--------------------------
The reference count for the large_dnode pool feature tracks the
number of datasets that have ever contained a dnode of size larger
than 512 bytes. The first time a large dnode is created in a dataset
the dataset is converted to an extensible dataset. This is a one-way
operation and the only way to decrement the feature count is to
destroy the dataset, even if the dataset no longer contains any large
dnodes. The complexity of reference counting on a per-dnode basis was
too high, so we chose to track it on a per-dataset basis similarly to
the large_block feature.

Signed-off-by: Ned Bass <bass6@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3542
2016-06-24 13:13:21 -07:00
Brian Behlendorf
46ab35954c Remove libzfs_graph.c
The libzfs_graph.c source file should have been removed in 330d06f,
it is entirely unused.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4766
2016-06-16 13:53:15 -07:00
Brian Behlendorf
f74b821a66 Add zfs allow and zfs unallow support
ZFS allows for specific permissions to be delegated to normal users
with the `zfs allow` and `zfs unallow` commands.  In addition, non-
privileged users should be able to run all of the following commands:

  * zpool [list | iostat | status | get]
  * zfs [list | get]

Historically this functionality was not available on Linux.  In order
to add it the secpolicy_* functions needed to be implemented and mapped
to the equivalent Linux capability.  Only then could the permissions on
the `/dev/zfs` be relaxed and the internal ZFS permission checks used.

Even with this change some limitations remain.  Under Linux only the
root user is allowed to modify the namespace (unless it's a private
namespace).  This means the mount, mountpoint, canmount, unmount,
and remount delegations cannot be supported with the existing code.  It
may be possible to add this functionality in the future.

This functionality was validated with the cli_user and delegation test
cases from the ZFS Test Suite.  These tests exhaustively verify each
of the supported permissions which can be delegated and ensures only
an authorized user can perform it.

Two minor bug fixes were required for test-running.py.  First, the
Timer() object cannot be safely created in a `try:` block when there
is an unconditional `finally` block which references it.  Second,
when running as a normal user also check for scripts using the
both the .ksh and .sh suffixes.

Finally, existing users who are simulating delegations by setting
group permissions on the /dev/zfs device should revert that
customization when updating to a version with this change.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #362 
Closes #434 
Closes #4100
Closes #4394 
Closes #4410 
Closes #4487
2016-06-07 09:16:52 -07:00
Christer Ekholm
bc2d809387 Make zpool list -vp print individual vdev sizes parsable.
Add argument format to print_one_column(), and use it to call
zfs_nicenum_format with, instead of just zfs_nicenum. Don't print "%"
for fragmentation or capacity percent values.

The calls to print_one_colum is made with ZFS_NICENUM_RAW if
cb->cb_literal (zpool list called with -p), and ZFS_NICENUM_1024 if not.

Also zpool_get_prop is modified to don't add "%" or "x" if literal.

Signed-off-by: Christer Ekholm <che@chrekh.se>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov
Closes #4657
2016-05-18 10:15:32 -07:00
Tony Hutter
193a37cb24 Add -lhHpw options to "zpool iostat" for avg latency, histograms, & queues
Update the zfs module to collect statistics on average latencies, queue sizes,
and keep an internal histogram of all IO latencies.  Along with this, update
"zpool iostat" with some new options to print out the stats:

-l: Include average IO latencies stats:

 total_wait     disk_wait    syncq_wait    asyncq_wait  scrub
 read  write   read  write   read  write   read  write   wait
-----  -----  -----  -----  -----  -----  -----  -----  -----
    -   41ms      -    2ms      -   46ms      -    4ms      -
    -    5ms      -    1ms      -    1us      -    4ms      -
    -    5ms      -    1ms      -    1us      -    4ms      -
    -      -      -      -      -      -      -      -      -
    -   49ms      -    2ms      -   47ms      -      -      -
    -      -      -      -      -      -      -      -      -
    -    2ms      -    1ms      -      -      -    1ms      -
-----  -----  -----  -----  -----  -----  -----  -----  -----
  1ms    1ms    1ms  413us   16us   25us      -    5ms      -
  1ms    1ms    1ms  413us   16us   25us      -    5ms      -
  2ms    1ms    2ms  412us   26us   25us      -    5ms      -
    -    1ms      -  413us      -   25us      -    5ms      -
    -    1ms      -  460us      -   29us      -    5ms      -
196us    1ms  196us  370us    7us   23us      -    5ms      -
-----  -----  -----  -----  -----  -----  -----  -----  -----

-w: Print out latency histograms:

sdb           total           disk         sync_queue      async_queue
latency    read   write    read   write    read   write    read   write   scrub
-------  ------  ------  ------  ------  ------  ------  ------  ------  ------
1ns           0       0       0       0       0       0       0       0       0
...
33us          0       0       0       0       0       0       0       0       0
66us          0       0     107    2486       2     788      12      12       0
131us         2     797     359    4499      10     558     184     184       6
262us        22     801     264    1563      10     286     287     287      24
524us        87     575      71   52086      15    1063     136     136      92
1ms         152    1190       5   41292       4    1693     252     252     141
2ms         245    2018       0   50007       0    2322     371     371     220
4ms         189    7455      22  162957       0    3912    6726    6726     199
8ms         108    9461       0  102320       0    5775    2526    2526      86
17ms         23   11287       0   37142       0    8043    1813    1813      19
34ms          0   14725       0   24015       0   11732    3071    3071       0
67ms          0   23597       0    7914       0   18113    5025    5025       0
134ms         0   33798       0     254       0   25755    7326    7326       0
268ms         0   51780       0      12       0   41593   10002   10002       0
537ms         0   77808       0       0       0   64255   13120   13120       0
1s            0  105281       0       0       0   83805   20841   20841       0
2s            0   88248       0       0       0   73772   14006   14006       0
4s            0   47266       0       0       0   29783   17176   17176       0
9s            0   10460       0       0       0    4130    6295    6295       0
17s           0       0       0       0       0       0       0       0       0
34s           0       0       0       0       0       0       0       0       0
69s           0       0       0       0       0       0       0       0       0
137s          0       0       0       0       0       0       0       0       0
-------------------------------------------------------------------------------

-h: Help

-H: Scripted mode. Do not display headers, and separate fields by a single
    tab instead of arbitrary space.

-q: Include current number of entries in sync & async read/write queues,
    and scrub queue:

 syncq_read    syncq_write   asyncq_read  asyncq_write   scrubq_read
 pend  activ   pend  activ   pend  activ   pend  activ   pend  activ
-----  -----  -----  -----  -----  -----  -----  -----  -----  -----
    0      0      0      0     78     29      0      0      0      0
    0      0      0      0     78     29      0      0      0      0
    0      0      0      0      0      0      0      0      0      0
    -      -      -      -      -      -      -      -      -      -
    0      0      0      0      0      0      0      0      0      0
    -      -      -      -      -      -      -      -      -      -
    0      0      0      0      0      0      0      0      0      0
-----  -----  -----  -----  -----  -----  -----  -----  -----  -----
    0      0    227    394      0     19      0      0      0      0
    0      0    227    394      0     19      0      0      0      0
    0      0    108     98      0     19      0      0      0      0
    0      0     19     98      0      0      0      0      0      0
    0      0     78     98      0      0      0      0      0      0
    0      0     19     88      0      0      0      0      0      0
-----  -----  -----  -----  -----  -----  -----  -----  -----  -----

-p: Display numbers in parseable (exact) values.

Also, update iostat syntax to allow the user to specify specific vdevs
to show statistics for.  The three options for choosing pools/vdevs are:

Display a list of pools:
    zpool iostat ... [pool ...]

Display a list of vdevs from a specific pool:
    zpool iostat ... [pool vdev ...]

Display a list of vdevs from any pools:
    zpool iostat ... [vdev ...]

Lastly, allow zpool command "interval" value to be floating point:
    zpool iostat -v 0.5

Signed-off-by: Tony Hutter <hutter2@llnl.gov
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4433
2016-05-12 12:36:32 -07:00
Marcel Huber
20c901dc7a Fixes bug in fix_paths()
Fixes bug introduced in commit 7d90f569a.  Hinted by gcc:

libzfs_import.c: In function ‘fix_paths’:
libzfs_import.c:602:28: warning: self-comparison always evaluates to true [-Wtautological-compare]
    if (best->ne_num_labels == best->ne_num_labels &&

Signed-off-by: Marcel Huber <marcelhuberfoo@gmail.com>
Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #4632
2016-05-12 12:36:28 -07:00
Adam Stevko
2a8b84b747 OpenZFS 3993, 4700
3993 zpool(1M) and zfs(1M) should support -p for "list" and "get"
4700 "zpool get" doesn't support -H or -o options

Reviewed by: Dan McDonald <danmcd@omniti.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Ported by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>

OpenZFS-issue: https://www.illumos.org/issues/3993
OpenZFS-issue: https://www.illumos.org/issues/4700
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/c58b352

Porting notes:
I removed ZoL's zpool_get_prop_literal() in favor of
zpool_get_prop(..., boolean_t literal) since that's what OpenZFS
uses.  The functionality is the same.
2016-05-11 11:49:37 -07:00
Chris Williamson
b3744ae611 OpenZFS 6873 - zfs_destroy_snaps_nvl leaks errlist
Authored by: Chris Williamson <chris.williamson@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Ported-by: Denys Rtveliashvili <denys@rtveliashvili.name>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>

lzc_destroy_snaps() returns an nvlist in errlist.
zfs_destroy_snaps_nvl() should nvlist_free() it before returning.

OpenZFS-issue: https://www.illumos.org/issues/6873
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/ee06391
Closes #4614
2016-05-09 11:25:41 -07:00
Denys Rtveliashvili
9f8026c802 OpenZFS 6879 - Incorrect endianness swap
Authored by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Ported-by: Denys Rtveliashvili <denys@rtveliashvili.name>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>

Incorrect endianness swap for drr_spill.drr_length in libzfs_sendrecv.c
Instead of drr_write.drr_length, we should be assigning the result of
the byteswap to drr_spill.drr_length.

OpenZFS-issue: https ://www.illumos.org/issues/6879
OpenZFS-commit: https ://github.com/openzfs/openzfs/commit/74c8720
Closes #4613
2016-05-09 11:22:00 -07:00
Josef 'Jeff' Sipek
8a5fc74880 Illumos 6659 - nvlist_free(NULL) is a no-op
6659 nvlist_free(NULL) is a no-op
Reviewed by: Toomas Soome <tsoome@me.com>
Reviewed by: Marcel Telka <marcel@telka.sk>
Approved by: Robert Mustacchi <rm@joyent.com>

References:
  https://www.illumos.org/issues/6659
  https://github.com/illumos/illumos-gate/commit/aab83bb

Ported-by: David Quigley <dpquigl@davequigley.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4566
2016-04-27 15:58:23 -07:00
Brian Behlendorf
325414e483 Fix 'zpool import' blkid device names
When importing a pool using the blkid cache only the device
node path was added to the list of known paths for a device.
This results in 'zpool import' always using the sdX names
in preference to the 'path' name stored in the label.

To fix the issue the blkid import path has been updated to
add both the 'path', 'devid', and 'devname' names from the
label to the known paths.  A sanity check is done to ensure
these paths do refer to the same device identified by blkid.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #4523
Closes #3043
2016-04-25 11:13:26 -07:00
Brian Behlendorf
2d82ea8b11 Use udev for partition detection
When ZFS partitions a block device it must wait for udev to create
both a device node and all the device symlinks.  This process takes
a variable length of time and depends on factors such how many links
must be created, the complexity of the rules, etc.  Complicating
the situation further it is not uncommon for udev to create and
then remove a link multiple times while processing the udev rules.

Given the above, the existing scheme of waiting for an expected
partition to appear by name isn't 100% reliable.  At this point
udev may still remove and recreate think link resulting in the
kernel modules being unable to open the device.

In order to address this the zpool_label_disk_wait() function
has been updated to use libudev.  Until the registered system
device acknowledges that it in fully initialized the function
will wait.  Once fully initialized all device links are checked
and allowed to settle for 50ms.  This makes it far more likely
that all the device nodes will exist when the kernel modules
need to open them.

For systems without libudev an alternate zpool_label_disk_wait()
was updated to include a settle time.  In addition, the kernel
modules were updated to include retry logic for this ENOENT case.
Due to the improved checks in the utilities it is unlikely this
logic will be invoked.  However, if the rare event it is needed
it will prevent a failure.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Richard Laager <rlaager@wiktel.com>
Closes #4523
Closes #3708
Closes #4077
Closes #4144
Closes #4214
Closes #4517
2016-04-25 11:13:20 -07:00
Brian Behlendorf
5b4136bd49 Create unique partition labels
When partitioning a device a name may be specified for each partition.
Internally zfs doesn't use this partition name for anything so it
has always just been set to "zfs".

However this isn't optimal because udev will create symlinks using
this name in /dev/disk/by-partlabel/.  If the name isn't unique
then all the links cannot be created.

Therefore a random 64-bit value has been added to the partition
label, i.e "zfs-1234567890abcdef".  Additional information could
be encoded here but since partitions may be reused that might
result in confusion and it was decided against.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Richard Laager <rlaager@wiktel.com>
Closes #4517
2016-04-25 11:13:09 -07:00
Nikolay Borisov
8fc5674c52 Rework zpool import excluded devices check
Current zpool import code skips directory entries which have prefixes
similar to some system files on linux such as "fd", "core" etc. However,
this means one cannot have one's zpools hosted inside files which are named
e.g. core-1 or lp. Furthermore, apart from the string checks there is already
which makes the zpool_open_func work only with regular files and block devices.

To fix this problem remove most of the checks since they are redundant but
leave the checks for the 'hpet' and 'watchdog' names. Furthermore, change
the checks to strcmp which albeit less safe than strncmp allows to have
devices whose names are prefixed by 'hpet' or 'watchdog'.

Signed-off-by: Nikolay Borisov <kernel@kyup.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4438
2016-04-18 11:55:22 -07:00
Chunwei Chen
6760077194 Make zfs mount according to relatime config in dataset
Also enable lazytime in mount.zfs

Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #4482
2016-04-05 18:55:59 -07:00
Don Brady
39fc0cb557 Add support for devid and phys_path keys in vdev disk labels
This is foundational work for ZED.

Updates a leaf vdev's persistent device strings on Linux platform

* only applies for a dedicated leaf vdev (aka whole disk)
* updated during pool create|add|attach|import
* used for matching device matching during auto-{online,expand,replace}
* stored in a leaf disk config label (i.e. alongside 'path' NVP)
* can opt-out using env var ZFS_VDEV_DEVID_OPT_OUT=YES

Some examples:

    path: '/dev/sdb1'
    devid: 'scsi-350000394a8ca4fbc-part1'
    phys_path: 'pci-0000:04:00.0-sas-0x50000394a8ca4fbf-lun-0'

    path: '/dev/mapper/mpatha'
    devid: 'dm-uuid-mpath-35000c5006304de3f'

Signed-off-by: Don Brady <don.brady@intel.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #2856
Closes #3978
Closes #4416
2016-03-31 13:45:53 -07:00
Brian Behlendorf
505d9655c9 Fix zdb -e and zhack thread_init()
This issue was caused by calling `thread_init()` and `thread_fini()`
multiple times resulting in `kthread_key` being invalid.  To resolve
the issue the explicit calls to `thread_init()` and `thread_fini()`
required by the `zpool` command have been moved in to the command.
Consumers such as `zdb` and `zhack` perform the same initialized
through `kernel_init()` and `kernel_fini()`.

Resolving this issue allows multiple additional test cases to
be enabled.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Signed-off-by: Tim Chase <tim@chase2k.com>
Closes #4331
2016-03-21 10:20:02 -07:00
Richard Yao
e853ba3519 Cleanup linking
I noticed during code review of zfsonlinux/zfs#4385 that the author of a
commit had peppered the various Makefile.am files with `$(TIRPC_LIBS)`
when putting it into `lib/libspl/Makefile.am` should have sufficed. Upon
further examination, it seems that he had copied what we do with
`$(ZLIB)`. We also have a bit of that with `-ldl` too.  Unfortunately,
what we do is wrong, so lets fix it to set a good example for future
contributors.

In addition, we have multiple `-lz` and `-luuid` passed to the compiler
because each `AC_CHECK_LIB` adds it to `$LIBS`. That is somewhat
annoying to see, so we switch to `AC_SEARCH_LIBS` to avoid it.  This is
consistent with the recommendation to use `AC_SEARCH_LIBS` over
`AC_CHECK_LIB` by autotools upstream:

https://www.gnu.org/software/autoconf/manual/autoconf-2.66/html_node/Libraries.html

In an ideal world, this would translate into improvements in ELF's
`DT_NEEDED` entries, but that is not the case because of a couple of
bugs in libtool.

The first bug causes libtool to overlink by using static link
dependencies for dynamic linking:

https://wiki.mageia.org/en/Overlinking_issues_in_packaging#libtool_issues

The workaround for this should be to pass `-Wl,--as-needed` in
`LDFLAGS`. That leads us to the second bug, where libtool passes
`LDFLAGS` after the libraries are specified and `ld` will only honor
`--as-needed` on libraries specified before it:

https://sigquit.wordpress.com/2011/02/16/why-asneeded-doesnt-work-as-expected-for-your-libraries-on-your-autotools-project/

There are a few possible workarounds for the second bug. One is to
either patch the compiler spec file to specify `-Wl,--as-needed` or pass
`-Wl,--as-needed` via `CC` like `CC='gcc -Wl,--as-needed'` so that it is
specified early. Another is to patch ltmain.sh like Gentoo does:

https://gitweb.gentoo.org/repo/gentoo.git/tree/eclass/ELT-patches/as-needed

Without one of those workarounds, this cleanup provides no benefit in
terms of `DT_NEEDED` entry generation. It should still be an improvement
because it nicely simplifies the code while encouraging good habits when
patching autotools scripts.

Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4426
2016-03-18 13:31:11 -07:00
Brian Behlendorf
6bb24f4dc7 Add the ZFS Test Suite
Add the ZFS Test Suite and test-runner framework from illumos.
This is a continuation of the work done by Turbo Fredriksson to
port the ZFS Test Suite to Linux.  While this work was originally
conceived as a stand alone project integrating it directly with
the ZoL source tree has several advantages:

  * Allows the ZFS Test Suite to be packaged in zfs-test package.
    * Facilitates easy integration with the CI testing.
    * Users can locally run the ZFS Test Suite to validate ZFS.
      This testing should ONLY be done on a dedicated test system
      because the ZFS Test Suite in its current form is destructive.
  * Allows the ZFS Test Suite to be run directly in the ZoL source
    tree enabled developers to iterate quickly during development.
  * Developers can easily add/modify tests in the framework as
    features are added or functionality is changed.  The tests
    will then always be in sync with the implementation.

Full documentation for how to run the ZFS Test Suite is available
in the tests/README.md file.

Warning: This test suite is designed to be run on a dedicated test
system.  It will make modifications to the system including, but
not limited to, the following.

  * Adding new users
  * Adding new groups
  * Modifying the following /proc files:
    * /proc/sys/kernel/core_pattern
    * /proc/sys/kernel/core_uses_pid
  * Creating directories under /

Notes:
  * Not all of the test cases are expected to pass and by default
    these test cases are disabled.  The failures are primarily due
    to assumption made for illumos which are invalid under Linux.
  * When updating these test cases it should be done in as generic
    a way as possible so the patch can be submitted back upstream.
    Most existing library functions have been updated to be Linux
    aware, and the following functions and variables have been added.
    * Functions:
      * is_linux          - Used to wrap a Linux specific section.
      * block_device_wait - Waits for block devices to be added to /dev/.
    * Variables:            Linux          Illumos
      * ZVOL_DEVDIR         "/dev/zvol"    "/dev/zvol/dsk"
      * ZVOL_RDEVDIR        "/dev/zvol"    "/dev/zvol/rdsk"
      * DEV_DSKDIR          "/dev"         "/dev/dsk"
      * DEV_RDSKDIR         "/dev"         "/dev/rdsk"
      * NEWFS_DEFAULT_FS    "ext2"         "ufs"
  * Many of the disabled test cases fail because 'zfs/zpool destroy'
    returns EBUSY.  This is largely causes by the asynchronous nature
    of device handling on Linux and is expected, the impacted test
    cases will need to be updated to handle this.
  * There are several test cases which have been disabled because
    they can trigger a deadlock.  A primary example of this is to
    recursively create zpools within zpools.  These tests have been
    disabled until the root issue can be addressed.
  * Illumos specific utilities such as (mkfile) should be added to
    the tests/zfs-tests/cmd/ directory.  Custom programs required by
    the test scripts can also be added here.
  * SELinux should be either is permissive mode or disabled when
    running the tests.  The test cases should be updated to conform
    to a standard policy.
  * Redundant test functionality has been removed (zfault.sh).
  * Existing test scripts (zconfig.sh) should be migrated to use
    the framework for consistency and ease of testing.
  * The DISKS environment variable currently only supports loopback
    devices because of how the ZFS Test Suite expects partitions to
    be named (p1, p2, etc).  Support must be added to generate the
    correct partition name based on the device location and name.
  * The ZFS Test Suite is part of the illumos code base at:
    https://github.com/illumos/illumos-gate/tree/master/usr/src/test

Original-patch-by: Turbo Fredriksson <turbo@bayour.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Closes #6
Closes #1534
2016-03-16 13:46:16 -07:00
Thijs Cramer
95003f7098 Updated paths to scan when importing zpool(s)
Added by-partlabel and by-partuuid to the default device search
path.  Made made device names in by-label more preferable.

Signed-off-by: Thijs Cramer <thijs.cramer@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3892
2016-03-09 10:41:23 -08:00
Brian Behlendorf
7d11e37e55 Require libblkid
Historically libblkid support was detected as part of configure
and optionally enabled.  This was done because at the time support
for detecting ZFS pool vdevs had just be added to libblkid and
those updated packages were not yet part of many distributions.
This is no longer the case and any reasonably current distribution
will ship a version of libblkid which can detect ZFS pool vdevs.

This patch makes libblkid mandatory at build time and libblkid
the preferred method of scanning for ZFS pools.  For distributions
which include a modern version of libblkid there is no change in
behavior.  Explicitly scanning the default search paths is still
supported and can be enabled with the '-s' command line option.

Additionally making libblkid mandatory means that the 'zpool create'
command can reliably detect if a specified device has an existing
non-ZFS filesystem (ext4, xfs) and print a warning.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #2448
2016-03-09 10:39:22 -08:00
Tony Hutter
272be6834c Fix zpool iostat bandwidth/ops calculation
print_vdev_stats() subtracts the old bandwidth/ops stats from the new stats
to calculate the bandwidth/ops numbers in "zpool iostat".  However when the
TXG numbers change between stats, zpool_refresh_stats() will incorrectly assign
a NULL to the old stats. This causes print_vdev_stats() to use zeroes for
the old bandwidth/ops numbers, resulting in an inaccurate calculation.

This fix allows the calculation to happen even when TXGs change.

Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4387
2016-03-08 17:43:33 -08:00
Richard Yao
d2f3e292dc Add -gLp to zpool subcommands for alt vdev names
The following options have been added to the zpool add, iostat,
list, status, and split subcommands.  The default behavior was
not modified, from zfs(8).

  -g    Display vdev GUIDs  instead  of  the  normal  short
        device  names.  These GUIDs can be used in-place of
        device   names   for    the    zpool    detach/off‐
        line/remove/replace commands.

  -L    Display real paths for vdevs resolving all symbolic
        links. This can be used to lookup the current block
        device  name regardless of the /dev/disk/ path used
        to open it.

  -p    Display  full  paths  for vdevs instead of only the
        last component of the path.  This can  be  used  in
        conjunction with the -L flag.

This behavior may also be enabled using the following environment
variables.

  ZPOOL_VDEV_NAME_GUID
  ZPOOL_VDEV_NAME_FOLLOW_LINKS
  ZPOOL_VDEV_NAME_PATH

This change is based on worked originally started by Richard Yao
to add a -g option.  Then extended by @ilovezfs to add a -L option
for openzfsonosx.  Those changes have been merged, re-factored,
a -p option added and extended to all relevant zpool subcommands.

Original-patch-by: Richard Yao <ryao@gentoo.org>
Extended-by: ilovezfs <ilovezfs@icloud.com>
Extended-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: ilovezfs <ilovezfs@icloud.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #2011
Closes #4341
2016-02-25 11:58:39 -08:00
Brian Behlendorf
eea9309423 Prevent zpool_find_vdev() from truncating vdev path
When extracting tokens from the string strtok(2) is allowed to modify
the passed buffer.  Therefore the zfs_strcmp_pathname() function must
make a copy of the passed string before passing it to strtok(3).

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Don Brady <don.brady@intel.com>
Closes #4312
2016-02-08 09:37:55 -08:00
Joshua M. Clulow
007595564e Illumos 4448 - zfs diff misprints unicode characters
4448 zfs diff misprints unicode characters
Reviewed by: Igor Kozhukhov <ikozhukhov@gmail.com>
Reviewed by: Toomas Soome <tsoome@me.com>
Approved by: Matthew Ahrens <mahrens@delphix.com>

References:
  https://www.illumos.org/issues/4448
  https://github.com/illumos/illumos-gate/commit/b211eb9

Porting Notes:
- [lib/libzfs/libzfs_diff.c]
  - 38145d6 Ensure that zfs diff prints unicode safely.
  - 141b638 Change 3-digit octal escapes to 4-digit ones

Ported-by: kernelOfTruth kerneloftruth@gmail.com
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2016-02-05 13:04:58 -08:00
Andrew Stormont
ee42b3d6c3 Illumos 1778 - Assertion failed: rn->rn_nozpool == B_FALSE
1778 Assertion failed: rn->rn_nozpool == B_FALSE
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george@delphix.com>
Reviewed by: Richard Elling <richard.elling@richardelling.com>
Approved by: Gordon Ross <gordon.ross@nexenta.com>

References:
  https://www.illumos.org/issues/1778
  https://github.com/illumos/illumos-gate/commit/bd0f709

Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>
2016-01-22 16:18:26 -08:00
Marcel Telka
0fdd8d6482 Illumos 5518 - Memory leaks in libzfs import implementation
5518 Memory leaks in libzfs import implementation
Reviewed by: Dan Fields <dan.fields@nexenta.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Serghei Samsi <sscdvp@gmail.com>
Approved by: Dan McDonald <danmcd@omniti.com>

References:
  https://www.illumos.org/issues/5518
  https://github.com/illumos/illumos-gate/commit/078266a

Porting notes:
- One hunk of this change was already applied independently in
  commit 4def05f.

Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>
2016-01-22 16:17:23 -08:00
Brian Behlendorf
519129ff4e Illumos 6815179, 6844191
6815179 zpool import with a large number of LUNs is too slow
6844191 zpool import, scanning of disks should be multi-threaded

References:
  https://github.com/illumos/illumos-gate/commit/4f67d75

Porting notes:
- This change was originally never ported to Linux due to it
  dependence on the thread pool interface.  This patch solves
  that issue by switching the code to use the existing taskq
  implementation which provides the same basic functionality.
  However, in order for this to work properly thread_init()
  and thread_fini() must be called around to taskq consumer
  to perform the needed thread initialization.

- The check_one_slice, nozpool_all_slices, and check_slices
  functions have been disabled for Linux.  They are difficult,
  but possible, to implement for Linux due to how partitions
  are get names.  Since this is only an optimization this code
  can be added at a latter date.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2016-01-22 09:39:46 -08:00
Matthew Ahrens
e3e670d006 Illumos 4953, 4954, 4955
4953 zfs rename <snapshot> need not involve libshare
4954 "zfs create" need not involve libshare if we are not sharing
4955 libshare's get_zfs_dataset need not sort the datasets
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed by: Dan McDonald <danmcd@omniti.com>
Reviewed by: Gordon Ross <gordon.ross@nexenta.com>
Approved by: Garrett D'Amore <garrett@damore.org>

References:
  https://www.illumos.org/issues/4953
  https://www.illumos.org/issues/4954
  https://www.illumos.org/issues/4955
  https://github.com/illumos/illumos-gate/commit/33cde0d

Porting notes:
- Dropped qsort libshare_zfs.c hunk, no equivalent ZoL code.

Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4219
2016-01-15 15:38:36 -08:00
Joe Stein
82f6f6e654 Illumos 6298 - zfs_create_008_neg and zpool_create_023_neg
6298 zfs_create_008_neg and zpool_create_023_neg need to be updated
for large block support
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>

References:
  https://www.illumos.org/issues/6298
  https://github.com/illumos/illumos-gate/commit/e9316f7

Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4217
2016-01-15 15:38:35 -08:00
George Wilson
59d4c71cca Illumos 3557, 3558, 3559, 3560
3557 dumpvp_size is not updated correctly when a dump zvol's size is changed
3558 setting the volsize on a dump device does not return back ENOSPC
3559 setting a volsize larger than the space available sometimes succeeds
3560 dumpadm should be able to remove a dump device
Reviewed by: Adam Leventhal <ahl@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Approved by: Albert Lee <trisk@nexenta.com>

References:
  https://www.illumos.org/issues/3559
  https://github.com/illumos/illumos-gate/commit/c61ea56

Porting notes:
- Internal zvol.c changes not applied due to implementation differences.
  The external interface and behavior was already consistent with the
  latest upstream code.
- Retired 2.6.28 HAVE_CHECK_DISK_SIZE_CHANGE configure check.  All
  supported kernels (2.6.32 and newer) provide this interface.

Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #4217
2016-01-15 15:38:35 -08:00
Marcel Telka
7ea4f88f8f Illumos 6280 - libzfs: unshare_one() could fail with EZFS_SHARENFSFAILED
6280 libzfs: unshare_one() could fail with EZFS_SHARENFSFAILED
Reviewed by: Toomas Soome <tsoome@me.com>
Reviewed by: Dan McDonald <danmcd@omniti.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Gordon Ross <gwr@nexenta.com>

References:
  https://www.illumos.org/issues/6280
  https://github.com/illumos/illumos-gate/commit/d1672ef

Ported-by: kernelOfTruth kerneloftruth@gmail.com
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2016-01-12 14:27:50 -08:00
Dan Vatca
fe467e06fd Illumos 6358 - A faulted pool with only unavailable vdevs
6358 A faulted pool with only unavailable vdevs triggers assertion
failure in libzfs
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Andrew Stormont <andyjstormont@gmail.com>
Reviewed by: Serban Maduta <serban.maduta@gmail.com>
Approved by: Dan McDonald <danmcd@omniti.com>

References:
  https://illumos.org/issues/6358
  https://github.com/illumos/illumos-gate/commit/b289d04

Ported-by: kernelOfTruth kerneloftruth@gmail.com
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2016-01-12 11:15:26 -08:00
Joshua M. Clulow
616a57bea8 Illumos 6268 - zfs diff confused by moving a file to another directory
6268 zfs diff confused by moving a file to another directory
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Justin Gibbs <gibbs@scsiguy.com>
Approved by: Dan McDonald <danmcd@omniti.com>

References:
  https://www.illumos.org/issues/6268
  https://github.com/illumos/illumos-gate/commit/aab0441

Ported-by: kernelOfTruth kerneloftruth@gmail.com
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2016-01-12 10:59:24 -08:00
Paul Dagnelie
fcff0f35bd Illumos 5960, 5925
5960 zfs recv should prefetch indirect blocks
5925 zfs receive -o origin=
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>

References:
  https://www.illumos.org/issues/5960
  https://www.illumos.org/issues/5925
  https://github.com/illumos/illumos-gate/commit/a2cdcdd

Porting notes:
- [lib/libzfs/libzfs_sendrecv.c]
  - b8864a2 Fix gcc cast warnings
  - 325f023 Add linux kernel device support
  - 5c3f61e Increase Linux pipe buffer size on 'zfs receive'
- [module/zfs/zfs_vnops.c]
  - 3558fd7 Prototype/structure update for Linux
  - c12e3a5 Restructure zfs_readdir() to fix regressions
- [module/zfs/zvol.c]
  - Function @zvol_map_block() isn't needed in ZoL
  - 9965059 Prefetch start and end of volumes
- [module/zfs/dmu.c]
  - Fixed ISO C90 - mixed declarations and code
  - Function dmu_prefetch() 'int i' is initialized before
    the following code block (c90 vs. c99)
- [module/zfs/dbuf.c]
  - fc5bb51 Fix stack dbuf_hold_impl()
  - 9b67f60 Illumos 4757, 4913
  - 34229a2 Reduce stack usage for recursive traverse_visitbp()
- [module/zfs/dmu_send.c]
  - Fixed ISO C90 - mixed declarations and code
  - b58986e Use large stacks when available
  - 241b541 Illumos 5959 - clean up per-dataset feature count code
  - 77aef6f Use vmem_alloc() for nvlists
  - 00b4602 Add linux kernel memory support

Ported-by: kernelOfTruth kerneloftruth@gmail.com
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2016-01-08 15:08:19 -08:00
Matthew Ahrens
37f8a8835a Illumos 5746 - more checksumming in zfs send
5746 more checksumming in zfs send
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Bayard Bell <buffer.g.overflow@gmail.com>
Approved by: Albert Lee <trisk@omniti.com>

References:
  https://www.illumos.org/issues/5746
  https://github.com/illumos/illumos-gate/commit/98110f0
  https://github.com/zfsonlinux/zfs/issues/905

Porting notes:
- Minor conflicts due to:
  - https://github.com/zfsonlinux/zfs/commit/2024041
  - https://github.com/zfsonlinux/zfs/commit/044baf0
  - https://github.com/zfsonlinux/zfs/commit/88904bb
- Fix ISO C90 warnings (-Werror=declaration-after-statement)
  - arc_buf_t *abuf;
  - dmu_buf_t *bonus;
  - zio_cksum_t cksum_orig;
  - zio_cksum_t *cksump;
- Fix format '%llx' format specifier warning
- Align message in zstreamdump safe_malloc() with upstream

Ported-by: kernelOfTruth kerneloftruth@gmail.com
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3611
2015-12-30 14:24:14 -08:00
Chris Williamson
23de906c72 Illumos 5745 - zfs set allows only one dataset property to be set at a time
5745 zfs set allows only one dataset property to be set at a time
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: George Wilson <george@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Bayard Bell <buffer.g.overflow@gmail.com>
Reviewed by: Richard PALO <richard@NetBSD.org>
Reviewed by: Steven Hartland <killing@multiplay.co.uk>
Approved by: Rich Lowe <richlowe@richlowe.net>

References:
  https://www.illumos.org/issues/5745
  https://github.com/illumos/illumos-gate/commit/3092556

Porting notes:
- Fix the missing braces around initializer, zfs_cmd_t zc = {"\0"};
- Remove extra format argument in zfs_do_set()
- Declare at the top:
  - zfs_prop_t prop;
  - nvpair_t *elem;
  - nvpair_t *next;
  - int i;
- Additionally initialize:
  - int added_resv = 0;
  - zfs_prop_t prop = 0;
- Assign 0 install of NULL for uint64_t types.
  - zc->zc_nvlist_conf = '\0';
  - zc->zc_nvlist_src = '\0';
  - zc->zc_nvlist_dst = '\0';

Ported-by: kernelOfTruth kerneloftruth@gmail.com
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3574
2015-12-29 16:59:26 -08:00
Brian Behlendorf
98401d2361 Fix maybe uninitialized
As of gcc 5.1.1 20150618 (Red Hat 5.1.1-4) the -Werror=maybe-uninitialized
check detects that 'snapname' in recv_incremental_replication() may not be
initialized.  Explicitly initialize the variable to resolved the warning.

  libzfs_sendrecv.c: In function ‘recv_incremental_replication’:
  libzfs_sendrecv.c:2019:2: error: ‘snapname’ may be used uninitialized in
    (void) snprintf(buf, sizeof (buf), "%s@%s", fsname, snapname);

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2015-11-09 12:15:19 -08:00
DHE
385f9691c4 libzfs: handle EDOM errors
EDOM may occur if a user tries to set `recordsize` too large without
use "zfs set". This can be demonstrated with:

> zpool create testpool -O recordsize=32M /dev/...

Signed-off-by: DHE <git@dehacked.net>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3911
2015-10-13 09:54:04 -07:00
Ned Bass
3af56fd95f Honor xattr=sa dataset property
ZFS incorrectly uses directory-based extended attributes even when
xattr=sa is specified as a dataset property or mount option. Support to
honor temporary mount options including "xattr" was added in commit
0282c4137e. There are two issues with the
mount option handling:

* Libzfs has historically included "xattr" in its list of default mount
  options. This overrides the dataset property, so the dataset is always
  configured to use directory-based xattrs even when the xattr dataset
  property is set to off or sa. Address this by removing "xattr" from
  the set of default mount options in libzfs.

* There was no way to enable system attribute-based extended attributes
  using temporary mount options. Add the mount options "saxattr" and
  "dirxattr" which enable the xattr behavior their names suggest.  This
  approach has the advantages of mirroring the valid xattr dataset
  property values and following existing conventions for mount option
  names.

Signed-off-by: Ned Bass <bass6@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3787
2015-09-19 14:04:14 -07:00
Brian Behlendorf
0282c4137e Add temporary mount options
Add the required kernel side infrastructure to parse arbitrary
mount options.  This enables us to support temporary mount
options in largely the same way it is handled on other platforms.

See the 'Temporary Mount Point Properties' section of zfs(8)
for complete details.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #985
Closes #3351
2015-09-03 14:14:55 -07:00
Brian Behlendorf
4cb7b9c5d4 Check large block feature flag on volumes
Since ZoL allows large blocks to be used by volumes, unlike upstream
illumos, the feature flag must be checked prior to volume creation.
This is critical because unlike filesystems, volumes will create a
object which uses large blocks as part of the create.  Therefore, it
cannot be safely checked in zfs_check_settable() after the dataset
can been created.

In addition this patch updates the relevant error messages to use
zfs_nicenum() to print the maximum blocksize.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3591
2015-08-28 09:25:03 -07:00
Turbo Fredriksson
47a4a6fd5f Support parallel build trees (VPATH builds)
Build products from an out of tree build should be written
relative to the build directory.  Sources should be referred
to by their locations in the source directory.

This is accomplished by adding the 'src' and 'obj' variables
for the module Makefile.am, using relative paths to reference
source files, and by setting VPATH when source files are not
co-located with the Makefile.  This enables the following:

  $ mkdir build
  $ cd build
  $ ../configure \
    --with-spl=$HOME/src/git/spl/ \
    --with-spl-obj=$HOME/src/git/spl/build
  $ make -s

This change also has the advantage of resolving the following
warning which is generated by modern versions of automake.

  Makefile.am:00: warning: source file 'xxx' is in a subdirectory,
  Makefile.am:00: but option 'subdir-objects' is disabled

Signed-off-by: Turbo Fredriksson <turbo@bayour.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #1082
2015-07-17 13:42:51 -07:00
Manoj Joseph
93f6d7e2e5 Illumos 5764 - "zfs send -nv" directs output to stderr
5764 "zfs send -nv" directs output to stderr
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Paul Dagnelie <paul.dagnelie@delphix.com>
Reviewed by: Basil Crow <basil.crow@delphix.com>
Reviewed by: Steven Hartland <killing@multiplay.co.uk>
Reviewed by: Bayard Bell <buffer.g.overflow@gmail.com>
Approved by: Dan McDonald <danmcd@omniti.com>

References:
  https://github.com/illumos/illumos-gate/commit/dc5f28a
  https://www.illumos.org/issues/5764

Ported-by: kernelOfTruth kerneloftruth@gmail.com
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3585
2015-07-14 10:28:32 -07:00
Richard Yao
541da9935d Fix Xen Virtual Block Device detection
We fail to make partitions on xvd (Xen Virtual Block) devices. This also
causes debug builds of zpool create to return an error when given xen
virtual block devices. These devices should be given the same treatment
as vd (KVM Virtual Block) devices, so we adjust the relevant code paths.

Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3576
2015-07-10 12:21:14 -07:00
Will Andrews
98cb3a7655 Illumos 5813 - zfs_setprop_error(): Handle errno value E2BIG.
5813 zfs_setprop_error(): Handle errno value E2BIG.
Reviewed by: Paul Dagnelie <paul.dagnelie@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Richard Elling <richard.elling@richardelling.com>
Approved by: Garrett D'Amore <garrett@damore.org>

References:
  https://github.com/illumos/illumos-gate/commit/6fdcb3d
  https://www.illumos.org/issues/5813

Ported-by: kernelOfTruth kerneloftruth@gmail.com
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3572
2015-07-10 12:13:19 -07:00
Jan Kryl
15cfbb38fd Illumos 5427 - memory leak in libzfs when doing rollback
5427 memory leak in libzfs when doing rollback
Reviewed by: Michael Tsymbalyuk <mtzaurus@gmail.com>
Reviewed by: Steven Hartland <killing@multiplay.co.uk>
Approved by: Dan McDonald <danmcd@omniti.com>

References
  https://github.com/illumos/illumos-gate/commit/b7070b7
  https://www.illumos.org/issues/5427

Ported-by: kernelOfTruth kerneloftruth@gmail.com
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3569
2015-07-10 12:09:32 -07:00
Josef 'Jeff' Sipek
02f8fe4260 Illumos 4626 - libzfs memleak in zpool_in_use()
4626 libzfs memleak in zpool_in_use()
Reviewed by: Tony Nguyen <tony.nguyen@nexenta.com>
Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Approved by: Dan McDonald <danmcd@omniti.com>

References:
  https://github.com/illumos/illumos-gate/commit/fb13f48
  https://www.illumos.org/issues/4626

Ported-by: kernelOfTruth kerneloftruth@gmail.com
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3563
2015-07-10 11:57:38 -07:00
Brian Behlendorf
65037d9b25 Add libzfs_error_init() function
All fprintf() error messages are moved out of the libzfs_init()
library function where they never belonged in the first place.  A
libzfs_error_init() function is added to provide useful error
messages for the most common causes of failure.

Additionally, in libzfs_run_process() the 'rc' variable was renamed
to 'error' for consistency with the rest of the code base.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chris Dunlap <cdunlap@llnl.gov>
Signed-off-by: Richard Yao <ryao@gentoo.org>
2015-05-22 13:34:58 -07:00
Brian Behlendorf
87abfcba22 Wait in libzfs_init() for the /dev/zfs device
While module loading itself is synchronous the creation of the /dev/zfs
device is not.  This is because /dev/zfs is typically created by a udev
rule after the module is registered and presented to user space through
sysfs.  This small window between module loading and device creation
can result in spurious failures of libzfs_init().

This patch closes that race by extending libzfs_init() so it can detect
that the modules are loaded and only if required wait for the /dev/zfs
device to be created.  This allows scripts to reliably use the following
shell construct without the need for additional error handling.

$ /sbin/modprobe zfs && /sbin/zpool import -a

To minimize the potential time waiting in libzfs_init() a strategy
similar to adaptive mutexes is employed.  The function will busy-wait
for up to 10ms based on the expectation that the modules were just
loaded and therefore the /dev/zfs will be created imminently.  If it
takes longer than this it will fall back to polling for up to 10 seconds.

This behavior can be customized to some degree by setting the following
new environment variables.  This functionality is provided for backwards
compatibility with existing scripts which depend on the module auto-load
behavior.  By default module auto-loading is now disabled.

* ZFS_MODULE_LOADING="YES|yes|ON|on" - Attempt to load modules.
* ZFS_MODULE_TIMEOUT="<seconds>"     - Seconds to wait for /dev/zfs

The zfs-import-* systemd service files have been updated to call
'/sbin/modprobe zfs' so they no longer rely on the legacy auto-loading
behavior.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chris Dunlap <cdunlap@llnl.gov>
Signed-off-by: Richard Yao <ryao@gentoo.org>
Closes #2556
2015-05-22 13:31:58 -07:00
Hajo Möller
141b6381d3 Change 3-digit octal escapes to 4-digit ones
Prefixing an octal value with a leading zero is the standard way
to disambiguate it.  This change only impacts the `zfs diff` output
and is therefore very limited in scope.

Signed-off-by: Hajo M<C3><B6>ller <dasjoe@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3417
2015-05-18 17:14:07 -07:00
Alexander Eremin
7fec46b9d8 Illumos 5847 - libzfs_diff should check zfs_prop_get() return
5847 libzfs_diff should check zfs_prop_get() return
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Albert Lee <trisk@omniti.com>
Approved by: Dan McDonald <danmcd@omniti.com>

References:
  https://www.illumos.org/issues/5847
  https://github.com/illumos/illumos-gate/commit/8430278

Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3412
2015-05-15 11:17:14 -07:00
Matthew Ahrens
f1512ee61e Illumos 5027 - zfs large block support
5027 zfs large block support
Reviewed by: Alek Pinchuk <pinchuk.alek@gmail.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Josef 'Jeff' Sipek <josef.sipek@nexenta.com>
Reviewed by: Richard Elling <richard.elling@richardelling.com>
Reviewed by: Saso Kiselkov <skiselkov.ml@gmail.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Approved by: Dan McDonald <danmcd@omniti.com>

References:
  https://www.illumos.org/issues/5027
  https://github.com/illumos/illumos-gate/commit/b515258

Porting Notes:

* Included in this patch is a tiny ISP2() cleanup in zio_init() from
Illumos 5255.

* Unlike the upstream Illumos commit this patch does not impose an
arbitrary 128K block size limit on volumes.  Volumes, like filesystems,
are limited by the zfs_max_recordsize=1M module option.

* By default the maximum record size is limited to 1M by the module
option zfs_max_recordsize.  This value may be safely increased up to
16M which is the largest block size supported by the on-disk format.
At the moment, 1M blocks clearly offer a significant performance
improvement but the benefits of going beyond this for the majority
of workloads are less clear.

* The illumos version of this patch increased DMU_MAX_ACCESS to 32M.
This was determined not to be large enough when using 16M blocks
because the zfs_make_xattrdir() function will fail (EFBIG) when
assigning a TX.  This was immediately observed under Linux because
all newly created files must have a security xattr created and
that was failing.  Therefore, we've set DMU_MAX_ACCESS to 64M.

* On 32-bit platforms a hard limit of 1M is set for blocks due
to the limited virtual address space.  We should be able to relax
this one the ABD patches are merged.

Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #354
2015-05-11 12:23:16 -07:00
Brian Behlendorf
3df293404a Fix type mismatch on 32-bit systems
The umem_alloc_aligned() function should not assume that a 'void *'
type is 64-bit.  It will not be on 32-bit platforms.  Rather than
complicating the ASSERT to handle this it is simply removed.

Additionally, the '%lu' format specifier should not be assumed to
imply a 64-bit value.  Fix this by using the 'llu' format specifier
which will always be atleast 64-bit and explicitly casing the
variable to an u_longlong_t.  This issue is handled the same way
in many other parts of the code.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2015-05-11 12:15:41 -07:00
Jerry Jelinek
788eb90c4c Illumos 3897 - zfs filesystem and snapshot limits
3897 zfs filesystem and snapshot limits
Author: Jerry Jelinek <jerry.jelinek@joyent.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Christopher Siden <christopher.siden@delphix.com>

References:
  https://www.illumos.org/issues/3897
  https://github.com/illumos/illumos-gate/commit/a2afb61

Porting Notes:

dsl_dataset_snapshot_check(): reduce stack usage using kmem_alloc().

Ported-by: Chris Dunlop <chris@onthe.net.au>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2015-04-28 16:22:51 -07:00
Tim Chase
59199d9083 Support the "version" property on volumes via the zfs_prop_get_int() API
As of this commit, volumes do not possess the version property in
any existing OpenZFS implementation.  The zpool upgrade code, however,
uses zfs_prop_get_int() to fetch the version property of all children in
a pool.  The semantics of the function, however, demand that it only be
used for known valid properties so it returns a garbage value for volumes.

This patch causes the version of a volume to appear to callers using
the zfs_prop_get_int() API to be that of the default ZPL version for
the implementation.  In the future, should volumes gain the property,
its actual value will be used.

Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Fixes #3313
2015-04-24 11:08:37 -07:00
Ned Bass
9540be9b23 zpool import should honor overlay property
Make the 'zpool import' command honor the overlay property to allow
filesystems to be mounted on a non-empty directory. As it stands now
this property is only checked by the 'zfs mount' command.  Move the
check into 'zfs_mount()` in libzpool so the property is honored for all
callers.

Signed-off-by: Ned Bass <bass6@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3227
2015-03-27 14:46:58 -07:00
Brian Behlendorf
7d90f569b3 Check all vdev labels in 'zpool import'
When using 'zpool import' to scan for available pools prefer vdev names
which reference vdevs with more valid labels.  There should be two labels
at the start of the device and two labels at the end of the device.  If
labels are missing then the device has been damaged or is in some other
way incomplete.  Preferring names with fully intact labels helps weed out
bad paths and improves the likelihood of being able to import the pool.

This behavior only applies when scanning /dev/ for valid pools.  If a
cache file exists the pools described by the cache file will be used.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chris Dunlap <cdunlap@llnl.gov>
Closes #3145
Closes #2844
Closes #3107
2015-03-25 14:52:52 -07:00
Richard Yao
5c3f61eb49 Increase Linux pipe buffer size on 'zfs receive'
I noticed when reviewing documentation that it is possible for user
space to use fctnl(fd, F_SETPIPE_SZ, (unsigned long) size) to change
the kernel pipe buffer size on Linux to increase the pipe size up to
the value specified in /proc/sys/fs/pipe-max-size. There are users using
mbuffer to improve zfs recv performance when piping over the network, so
it seems advantageous to integrate such functionality directly into the
zfs recv tool. This avoids the addition of two buffers and two copies
(one for the buffer mbuffer adds and another for the additional pipe),
so it should be more efficient.

This could have been made configurable and/or this could have changed
the value back to the original after we were done with the file
descriptor, but I do not see a strong case for doing either, so I
went with a simple implementation.

Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #1161
2015-03-20 10:03:34 -07:00
Christer Ekholm
8bdcfb5396 Fix possible future overflow in zfs_nicenum
The function zfs_nicenum that converts number to human-readable output
uses a index to a string of letters. This patch limits the index to
the length of the string.

Signed-off-by: Christer Ekholm <che@chrekh.se>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3122
2015-02-24 11:39:10 -08:00
Tim Chase
e890dd85a7 Produce a full snapshot list for zfs send -p
In order to accelerate zfs receive operations in the face of many
property-containing snapshots, commit 0574855 changed the header nvlist
("fss") of a send stream to exclude snapshots which aren't part of the
stream.  This, however, would cause zfs receive -F to erroneously remove
snapshots; it would remove any snapshot which wasn't listed in the header
nvlist.

This patch restores the full list of snapshots in fss[<id>[snaps]] but
still suppresses the properties of non-sent snapshots and also removes a
consistency check in which an error is raised if a listed snapshot does
not have any properties in fss[<id>[snapprops]].

The 0574855 commit also introduced a bug in which zfs send -p of a
complete stream (zfs send -p pool/fs@snap) would exclude the snapshot
properties in fss[<id>[snapprops]].  This patch detects the last snapshot
in a series when no "from" snapshot has been specified and includes its
properties.

Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #2907
2015-02-09 16:43:17 -08:00
Chunwei Chen
53698a453d Read spl_hostid module parameter before gethostid()
If spl_hostid is set via module parameter, it's likely different from
gethostid(). Therefore, the userspace tool should read it first before
falling back to gethostid().

Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #3034
2015-02-04 16:44:53 -08:00
Brian Behlendorf
6466b61db6 Make zpool import -d|-c behave consistently
When importing pools with zpool import -aN there is inconsistent
behavior between '-d /dev/disk/by-id' (or another path) and
'-c /etc/zfs/zpool.cache'.

The difference in behavior is caused by zpool_find_import_cached()
returning an empty nvlist_t when there are no pools to import but
zpool_find_import_impl() returns NULL for the same situation. The
behavior of zpool_find_import_cached() is arguably more correct
because it allows returning NULL to be used for an error case and
not an empty set.

This change resolves the issue by updating get_configs() such that
it returns an empty set instead of NULL when no config is found.
The updated behavior will now always return 0 for this case.

$ zpool import -aN; echo $?
no pools available to import
0

$ zpool import -aN -d /var/tmp/; echo $?
no pools available to import
0

$ zpool import -aN -c /etc/zfs/zpool.cache; echo $?
no pools available to import
0

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #2080
2015-01-28 11:12:31 -08:00
Andriy Gapon
057485504e zfs send -p send properties only for snapshots that are actually sent
... as opposed to sending properties of all snapshots of the relevant
filesystem.  The previous behavior results in properties being set on
all snapshots on the receiving side, which is quite slow.

Behavior of zfs send -R is not changed.

References:
  http://thread.gmane.org/gmane.comp.file-systems.openzfs.devel/346

Ported-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #2729
Issue #2210
2014-10-02 16:52:02 -07:00
smh
7509a3d299 FreeBSD PR kern/172259: Fixes zfs receive errors
FreeBSD PR kern/172259: Fixes zfs receive errors caused by snapshot
replication being processed in a random order instead of creation
order.

Eliminates needless filesystem renames caused by removed parent
snapshots which subsequently causes many more errors.

PR:		kern/172259
Submitted by:	Steven Hartland
Reviewed by:	pjd (mentor)
Approved by:	pjd (mentor)
MFC after:	2 weeks

References:
  https://github.com/freebsd/freebsd/commit/4995789

Porting notes:

Minor whitespace fixes were made to conform with style requirements:

lib/libzfs/libzfs_sendrecv.c: 2269: indent by spaces instead of tabs
lib/libzfs/libzfs_sendrecv.c: 2270: indent by spaces instead of tabs

Ported-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #2729
2014-10-02 16:48:19 -07:00
Richard Yao
83e9986f6e Implement -t option to zpool create for temporary pool names
Creating virtual machines that have their rootfs on ZFS on hosts that
have their rootfs on ZFS causes SPA namespace collisions when the
standard name rpool is used. The solution is either to give each guest
pool a name unique to the host, which is not always desireable, or boot
a VM environment containing an ISO image to install it, which is
cumbersome.

26b42f3f9d introduced `zpool import -t
...` to simplify situations where a host must access a guest's pool when
there is a SPA namespace conflict. We build upon that to introduce
`zpool import -t tname ...`. That allows us to create a pool whose
in-core name is tname, but whose on-disk name is the normal name
specified.

This simplifies the creation of machine images that use a rootfs on ZFS.
That benefits not only real world deployments, but also ZFSOnLinux
development by decreasing the time needed to perform rootfs on ZFS
experiments.

Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #2417
2014-09-30 10:46:59 -07:00
Matthew Ahrens
1f6f97f304 Illumos 5116 - zpool history -i goes into infinite loop
5116 zpool history -i goes into infinite loop
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Richard Elling <richard.elling@gmail.com>
Reviewed by: Boris Protopopov <boris.protopopov@me.com>
Approved by: Dan McDonald <danmcd@omniti.com>

References:
  https://www.illumos.org/issues/5116
  https://github.com/illumos/illumos-gate/commit/3339867

Ported by: Turbo Fredriksson <turbo@bayour.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #2715
2014-09-23 11:58:05 -07:00
Matthew Ahrens
ab2894e66f Illumos 5135 - zpool_find_import_cached() can use fnvlist_*
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: Max Grossman <max.grossman@delphix.com>
Reviewed by: Richard Elling <richard.elling@gmail.com>
Approved by: Dan McDonald <danmcd@omniti.com>

References:
  https://www.illumos.org/issues/5135
  https://github.com/illumos/illumos-gate/commit/b18d6b0

Ported by: Turbo Fredriksson <turbo@bayour.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #2693
2014-09-23 11:44:29 -07:00
Richard Yao
928ee9fe18 Properly NULL terminate string in zfs_strcmp_pathname
The utility cppcheck caught this.

Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #2330
2014-09-23 10:32:21 -07:00
George Wilson
a05dfd0028 Illumos 5147 - zpool list -v should show individual disk capacity
The 'zpool list -v' command displays lots of info but excludes the
capacity of each disk. This should be added.

5147 zpool list -v should show individual disk capacity
Reviewed by: Adam Leventhal <adam.leventhal@delphix.com>
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: Matthew Ahrens <matthew.ahrens@delphix.com>
Reviewed by: Richard Elling <richard.elling@gmail.com>
Approved by: Dan McDonald <danmcd@omniti.com>

References:
  https://www.illumos.org/issues/5147
  https://github.com/illumos/illumos-gate/commit/7a09f97

Ported by: Turbo Fredriksson <turbo@bayour.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #2688
2014-09-23 10:10:42 -07:00
ilovezfs
1ca56e6033 Fragmentation should display as '-' if spacemap_histogram=disabled
When com.delphix:spacemap_histogram is disabled, the value of
fragmentation was printing as 18446744073709551615 (UINT64_MAX),
when it should print as '-'.

The issue was caused by a small mistake during the merge of
"4980 metaslabs should have a fragmentation metric."

upstream: https://github.com/illumos/illumos-gate/commit/2e4c998
ZoL: https://github.com/zfsonlinux/zfs/commit/f3a7f66

The problem is in zpool_get_prop_literal, where the handling of the
pool property ZPOOL_PROP_FRAGMENTATION was added to wrong the
section. In particular, ZPOOL_PROP_FRAGMENTATION should not be in
the section where zpool_get_state(zhp) == POOL_STATE_UNAVAIL, but
lower down after it's already been determined that the pool is in
fact available, which is where upstream illumos correctly has had
it.

Thanks to lundman for helping to track down this bug.

Signed-off-by: Jorgen Lundman <lundman@lundman.net>
Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #2664
2014-09-05 09:19:01 -07:00
Turbo Fredriksson
c3f8dc2a48 Add a pkgconfig file
Providing a pkg-config file makes is easy for 3rd party applications
to link against the libzfs libraries.  It also allows the libzfs
developers to modify the list of required libraries and cflags
without breaking existing applications.

The following example illustrates how pkg-config can be used:

cc `pkg-config --cflags --libs libzfs` -o myapp myapp.c

/*
 * myapp.c
 */
void main()
{
	libzfs_handle_t *hdl;

	hdl = libzfs_init();
	if (hdl)
		libzfs_fini(hdl);
}

Signed-off-by: Turbo Fredriksson <turbo@bayour.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes: #585
2014-08-28 07:59:43 -07:00
George Wilson
f3a7f6610f Illumos 4976-4984 - metaslab improvements
4976 zfs should only avoid writing to a failing non-redundant top-level vdev
4978 ztest fails in get_metaslab_refcount()
4979 extend free space histogram to device and pool
4980 metaslabs should have a fragmentation metric
4981 remove fragmented ops vector from block allocator
4982 space_map object should proactively upgrade when feature is enabled
4983 need to collect metaslab information via mdb
4984 device selection should use fragmentation metric
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Adam Leventhal <adam.leventhal@delphix.com>
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Approved by: Garrett D'Amore <garrett@damore.org>

References:
  https://www.illumos.org/issues/4976
  https://www.illumos.org/issues/4978
  https://www.illumos.org/issues/4979
  https://www.illumos.org/issues/4980
  https://www.illumos.org/issues/4981
  https://www.illumos.org/issues/4982
  https://www.illumos.org/issues/4983
  https://www.illumos.org/issues/4984
  https://github.com/illumos/illumos-gate/commit/2e4c998

Notes:
    The "zdb -M" option has been re-tasked to display the new metaslab
    fragmentation metric and the new "zdb -I" option is used to control
    the maximum number of in-flight I/Os.

    The new fragmentation metric is derived from the space map histogram
    which has been rolled up to the vdev and pool level and is presented
    to the user via "zpool list".

    Add a number of module parameters related to the new metaslab weighting
    logic.

Ported by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #2595
2014-08-18 08:40:49 -07:00
Matthew Ahrens
5dbd68a352 Illumos 4914 - zfs on-disk bookmark structure should be named *_phys_t
4914 zfs on-disk bookmark structure should be named *_phys_t

Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: Richard Lowe <richlowe@richlowe.net>
Reviewed by: Saso Kiselkov <skiselkov.ml@gmail.com>
Approved by: Robert Mustacchi <rm@joyent.com>

References:
  https://www.illumos.org/issues/4914
  https://github.com/illumos/illumos-gate/commit/7802d7b

Porting notes:

There were a number of zfsonlinux-specific uses of zbookmark_t which
needed to be updated.  This should reduce the likelihood of further
problems like issue #2094 from occurring.

Ported by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #2558
2014-08-06 14:48:41 -07:00
Matthew Ahrens
fbeddd60b7 Illumos 4390 - I/O errors can corrupt space map when deleting fs/vol
4390 i/o errors when deleting filesystem/zvol can lead to space map corruption
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: Adam Leventhal <ahl@delphix.com>
Reviewed by: Dan McDonald <danmcd@omniti.com>
Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com>
Approved by: Dan McDonald <danmcd@omniti.com>

References:
  https://www.illumos.org/issues/4390
  https://github.com/illumos/illumos-gate/commit/7fd05ac

Porting notes:

Previous stack-reduction efforts in traverse_visitb() caused a fair
number of un-mergable pieces of code.  This patch should reduce its
stack footprint a bit more.

The new local bptree_entry_phys_t in bptree_add() is dynamically-allocated
using kmem_zalloc() for the purpose of stack reduction.

The new global zfs_free_leak_on_eio has been defined as an integer
rather than a boolean_t as was the case with the related zfs_recover
global.  Also, zfs_free_leak_on_eio's definition has been inserted into
zfs_debug.c for consistency with the existing definition of zfs_recover.
Illumos placed it in spa_misc.c.

Ported by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #2545
2014-08-04 11:50:52 -07:00
Matthew Ahrens
9b67f60560 Illumos 4757, 4913
4757 ZFS embedded-data block pointers ("zero block compression")
4913 zfs release should not be subject to space checks

Reviewed by: Adam Leventhal <ahl@delphix.com>
Reviewed by: Max Grossman <max.grossman@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: Dan McDonald <danmcd@omniti.com>
Approved by: Dan McDonald <danmcd@omniti.com>

References:
  https://www.illumos.org/issues/4757
  https://www.illumos.org/issues/4913
  https://github.com/illumos/illumos-gate/commit/5d7b4d4

Porting notes:

For compatibility with the fastpath code the zio_done() function
needed to be updated.  Because embedded-data block pointers do
not require DVAs to be allocated the associated vdevs will not
be marked and therefore should not be unmarked.

Ported by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #2544
2014-08-01 14:28:05 -07:00
Matthew Ahrens
da536844d5 Illumos 4368, 4369.
4369 implement zfs bookmarks
4368 zfs send filesystems from readonly pools
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Approved by: Garrett D'Amore <garrett@damore.org>

References:
  https://www.illumos.org/issues/4369
  https://www.illumos.org/issues/4368
  https://github.com/illumos/illumos-gate/commit/78f1710

Ported by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #2530
2014-07-29 10:55:29 -07:00
Matthew Ahrens
fa86b5dbb6 Illumos 4171, 4172
4171 clean up spa_feature_*() interfaces
4172 implement extensible_dataset feature for use by other zpool features

Reviewed by: Max Grossman <max.grossman@delphix.com>
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Jerry Jelinek <jerry.jelinek@joyent.com>
Approved by: Garrett D'Amore <garrett@damore.org>a

References:
  https://www.illumos.org/issues/4171
  https://www.illumos.org/issues/4172
  https://github.com/illumos/illumos-gate/commit/2acef22

Ported-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #2528
2014-07-25 16:40:07 -07:00
Tim Chase
09c0b8fe5e Return default value on numeric properties failing the "head check.
Updates 962d524212.

The referenced fix to get_numeric_property() caused numeric property
lookups to consider the type of the parent (head) dataset when checking
validity but there are some cases in the caller expects to see the
property's default value even when the lookup is invalid.

One case in which this is true is change_one() which is part of the
renaming infrastructure.  It may look up "zoned" on a snapshot of a volume
which is not valid but it expects to see the default value of false.

There may be other, yet unidentified cases in which zfs_prop_get_int()
is used on technically invalid properties but which expect the property's
default value to be returned.

Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Turbo Fredriksson <turbo@bayour.com>
Closes #2320
2014-07-01 14:14:31 -07:00
Brian Behlendorf
d4aae2a054 Improve differing sector size error
When adding or replacing a vdev with a different sector size the
error message should be more useful.  In addition to describing
the problem provide a hint that the '-o ashift' option can be
used to override the optimal default value.

Since using a non-optimal value may incur a significant performance
penalty we should issue this error.  But there a numerous reasons
why a administrator may wish to do this anyway.

Signed-off-by: Niklas Edmundsson <ZNikke@github>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #2421
2014-06-27 11:44:36 -07:00
Richard Yao
4def05f8a6 Fix memory leak in zpool_clear_label()
Clang's static analyzer reported a memory leak in zpool_clear_label().
Upon review, it turns out to be right. This should be a very short lived
leak because no daemons use this functionality, but that does not
preclude the possibility of third party daemons that do use it. Lets fix
it to be a good Samaritan.

Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Ned Bass <bass6@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #2330
2014-05-30 17:00:37 -07:00
Tim Chase
962d524212 Check the dataset type more rigorously when fetching properties.
When fetching property values of snapshots, a check against the head
dataset type must be performed.  Previously, this additional check was
performed only when fetching "version", "normalize", "utf8only" or "case".

This caused the ZPL properties "acltype", "exec", "devices", "nbmand",
"setuid" and "xattr" to be erroneously displayed with meaningless values
for snapshots of volumes.  It also did not allow for the display of
"volsize" of a snapshot of a volume.

This patch adds the headcheck flag paramater to zfs_prop_valid_for_type()
and zprop_valid_for_type() to indicate the check is being done
against a head dataset's type in order that properties valid only for
snapshots are handled correctly.  This allows the the head check in
get_numeric_property() to be performed when fetching a property for
a snapshot.

Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #2265
2014-05-06 10:41:46 -07:00
ilovezfs
78597769b4 Fill in mountpoint buffer before using it in errors
zfs_is_mountable() fills in the mountpoint buffer, so, as in
upstream, it needs to have been called before the mountpoint
buffer can be used in error messages.

In particular,

	return (zfs_error_fmt(hdl, EZFS_MOUNTFAILED,
	    dgettext(TEXT_DOMAIN, "cannot mount '%s'"),
	    mountpoint));

should not come before the call to zfs_is_mountable().

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: ilovezfs <ilovezfs@icloud.com>
Closes #2284
2014-04-30 15:52:01 -07:00
Tim Chase
b066274a77 Report atime and relatime as the property's actual value.
Neither atime nor relatime should be considered to be "temporary mount
point properties".  Their semantics are enforced completely within ZFS
and also they're (correctly) not documented as being temporary mount
point properties.

Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #2257
2014-04-16 11:57:17 -07:00
John M. Layman
cbca6076b3 Fix for re-reading /etc/mtab.
This is a continuation of fb5c53ea65:

    When /etc/mtab is updated on Linux it's done atomically with
    rename(2).  A new mtab is written, the existing mtab is unlinked,
    and the new mtab is renamed to /etc/mtab.  This means that we
    must close the old file and open the new file to get the updated
    contents.  Using rewind(3) will just move the file pointer back
    to the start of the file, freopen(3) will close and open the file.

In this commit, a few more rewind(3) calls were replaced with freopen(3)
to allow updated mtab entries to be picked up immediately.

Signed-off-by: John M. Layman <jml@frijid.net>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #2215
Issue #1611
2014-04-04 09:46:20 -07:00
Brian Behlendorf
1a5c611a22 Make command line guid parsing more tolerant
Several of the zfs utilities allow you to pass a vdev's guid rather
than the device name.  However, the utilities are not consistent in
how they parse that guid.  For example, 'zinject' expects the guid
to be passed as a hex value while 'zpool replace' wants it as a
decimal.  The user is forced to just know what format to use.

This patch improve things by making the parsing more tolerant.
When strtol(3) is called using 0 for the base, rather than say
10 or 16, it will then accept hex, decimal, or octal input based
on the prefix.  From the man page.

    If base is zero or 16, the string may then include a "0x"
    prefix, and  the number  will  be read in base 16; otherwise,
    a zero base is taken as 10 (decimal) unless the next character
    is '0', in which case it  is  taken as 8 (octal).

NOTE: There may be additional conversions not caught be this patch.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chris Dunlap <cdunlap@llnl.gov>
Issue #2
2014-04-02 13:10:08 -07:00
Chris Dunlap
8c7aa0cfc4 Replace zpool_events_next() "block" parm w/ "flags"
zpool_events_next() can be called in blocking mode by specifying a
non-zero value for the "block" parameter.  However, the design of
the ZFS Event Daemon (zed) requires additional functionality from
zpool_events_next().  Instead of adding additional arguments to the
function, it makes more sense to use flags that can be bitwise-or'd
together.

This commit replaces the zpool_events_next() int "block" parameter with
an unsigned bitwise "flags" parameter.  It also defines ZEVENT_NONE
to specify the default behavior.  Since non-blocking mode can be
specified with the existing ZEVENT_NONBLOCK flag, the default behavior
becomes blocking mode.  This, in effect, inverts the previous use
of the "block" parameter.  Existing callers of zpool_events_next()
have been modified to check for the ZEVENT_NONBLOCK flag.

Signed-off-by: Chris Dunlap <cdunlap@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #2
2014-03-31 16:11:21 -07:00
Brian Behlendorf
9b101a7320 Clarify zpool_events_next() comment
Due to the very poorly chosen argument name 'cleanup_fd' it was
completely unclear that this file descriptor is used to track the
current cursor location.  When the file descriptor is created by
opening ZFS_DEV a private cursor is created in the kernel for the
returned file descriptor.  Subsequent calls to zpool_events_next()
and zpool_events_seek() then require the file descriptor as an
argument to reposition the cursor.  When the file descriptor is
closed the kernel state tracking the cursor is destroyed.

This patch contains no functional change, it just changes a
few variable names and clarifies the documentation.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chris Dunlap <cdunlap@llnl.gov>
Issue #2
2014-03-31 16:11:08 -07:00
Brian Behlendorf
75e3ff58fe Add zpool_events_seek() functionality
The ZFS_IOC_EVENTS_SEEK ioctl was added to allow user space callers
to seek around the zevent file descriptor by EID.  When a specific
EID is passed and it exists the cursor will be positioned there.
If the EID is no longer cached by the kernel ENOENT is returned.
The caller may also pass ZEVENT_SEEK_START or ZEVENT_SEEK_END to seek
to those respective locations.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chris Dunlap <cdunlap@llnl.gov>
Issue #2
2014-03-31 16:10:57 -07:00
Gunnar Beutner
4d8c78c844 Remount datasets for "zfs inherit".
Changing properties with "zfs inherit" should cause the datasets
to be remounted.  This ensures that the modified property values
will be propagated in to the filesystem namespace where they can
be enforced.  This change is modeled after an identical fix made
to zfs_prop_set().

Signed-off-by: Gunnar Beutner <gunnar@beutner.name>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #2201
2014-03-24 11:11:15 -07:00
Brian Behlendorf
ffe9d38275 Add generic errata infrastructure
From time to time it may be necessary to inform the pool administrator
about an errata which impacts their pool.  These errata will by shown
to the administrator through the 'zpool status' and 'zpool import'
output as appropriate.  The errata must clearly describe the issue
detected, how the pool is impacted, and what action should be taken
to resolve the situation.  Additional information for each errata will
be provided at http://zfsonlinux.org/msg/ZFS-8000-ER.

To accomplish the above this patch adds the required infrastructure to
allow the kernel modules to notify the utilities that an errata has
been detected.  This is done through the ZPOOL_CONFIG_ERRATA uint64_t
which has been added to the pool configuration nvlist.

To add a new errata the following changes must be made:

* A new errata identifier must be assigned by adding a new enum value
  to the zpool_errata_t type.  New enums must be added to the end to
  preserve the existing ordering.

* Code must be added to detect the issue.  This does not strictly
  need to be done at pool import time but doing so will make the
  errata visible in 'zpool import' as well as 'zpool status'.  Once
  detected the spa->spa_errata member should be set to the new enum.

* If possible code should be added to clear the spa->spa_errata member
  once the errata has been resolved.

* The show_import() and status_callback() functions must be updated
  to include an informational message describing the errata.  This
  should include an action message describing what an administrator
  should do to address the errata.

* The documentation at http://zfsonlinux.org/msg/ZFS-8000-ER must be
  updated to describe the errata.  This space can be used to provide
  as much additional information as needed to fully describe the errata.
  A link to this documentation will be automatically generated in the
  output of 'zpool import' and 'zpool status'.

Original-idea-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Richard Yao <ryao@gentoo.or
Issue #2094
2014-02-21 12:10:40 -08:00
Tim Chase
6d111134c0 Implement relatime.
Add the "relatime" property.  When set to "on", a file's atime will only
be updated if the existing atime at least a day old or if the existing
ctime or mtime has been updated since the last access.  This behavior
is compatible with the Linux "relatime" mount option.

Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #2064
Closes #1917
2014-01-29 15:50:44 -08:00
Brian Behlendorf
741304503a Prevent duplicate mnttab cache entries
Under Linux its possible to mount the same filesystem multiple
times in the namespace.  This can be done either with bind mounts
or simply with multiple mount points.  Unfortunately, the mnttab
cache code is implemented using an AVL tree which does not support
duplicate entries.  To avoid this issue this patch updates the
code to check for a duplicate entry before adding a new one.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Michael Martin <mgmartin.mgm@gmail.com>
Closes #2041
2014-01-14 10:27:12 -08:00
Michael Kjorling
d1d7e2689d cstyle: Resolve C style issues
The vast majority of these changes are in Linux specific code.
They are the result of not having an automated style checker to
validate the code when it was originally written.  Others were
caused when the common code was slightly adjusted for Linux.

This patch contains no functional changes.  It only refreshes
the code to conform to style guide.

Everyone submitting patches for inclusion upstream should now
run 'make checkstyle' and resolve any warning prior to opening
a pull request.  The automated builders have been updated to
fail a build if when 'make checkstyle' detects an issue.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #1821
2013-12-18 16:46:35 -08:00
Brian Behlendorf
ba6a24026c Remove ZFC_IOC_*_MINOR ioctl()s
Early versions of ZFS coordinated the creation and destruction
of device minors from userspace.  This was inherently racy and
in late 2009 these ioctl()s were removed leaving everything up
to the kernel.  This significantly simplified the code.

However, we never picked up these changes in ZoL since we'd
already significantly adjusted this code for Linux.  This patch
aims to rectify that by finally removing ZFC_IOC_*_MINOR ioctl()s
and moving all the functionality down in to the kernel.  Since
this cleanup will change the kernel/user ABI it's being done
in the same tag as the previous libzfs_core ABI changes.  This
will minimize, but not eliminate, the disruption to end users.

Once merged ZoL, Illumos, and FreeBSD will basically be back
in sync in regards to handling ZVOLs in the common code.  While
each platform must have its own custom zvol.c implemenation the
interfaces provided are consistent.

NOTES:

1) This patch introduces one subtle change in behavior which
   could not be easily avoided.  Prior to this change callers
   of 'zfs create -V ...' were guaranteed that upon exit the
   /dev/zvol/ block device link would be created or an error
   returned.  That's no longer the case.  The utilities will no
   longer block waiting for the symlink to be created.  Callers
   are now responsible for blocking, this is why a 'udev_wait'
   call was added to the 'label' function in scripts/common.sh.

2) The read-only behavior of a ZVOL now solely depends on if
   the ZVOL_RDONLY bit is set in zv->zv_flags.  The redundant
   policy setting in the gendisk structure was removed.  This
   both simplifies the code and allows us to safely leverage
   set_disk_ro() to issue a KOBJ_CHANGE uevent.  See the
   comment in the code for futher details on this.

3) Because __zvol_create_minor() and zvol_alloc() may now be
   called in a sync task they must use KM_PUSHPAGE.

References:
  illumos/illumos-gate@681d9761e8

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ned Bass <bass6@llnl.gov>
Signed-off-by: Tim Chase <tim@chase2k.com>
Closes #1969
2013-12-16 09:15:57 -08:00
Yuri Pankov
54d5378fae Illumos #2583
2583 Add -p (parsable) option to zfs list

References:
  https://www.illumos.org/issues/2583
  illumos/illumos-gate@43d68d68c1

Ported-by: Gregor Kopka <gregor@kopka.net>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes: #937
2013-11-21 11:13:53 -08:00
Brian Behlendorf
64ad2b26e2 Remove the slog restriction on bootfs pools
Under Linux this restriction does not apply because we have access
to all the required devices.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #1631
2013-11-14 14:28:35 -08:00
Tim Chase
fd4f76160c Handle concurrent snapshot automounts failing due to EBUSY.
In the current snapshot automount implementation, it is possible for
multiple mounts to attempted concurrently.  Only one of the mounts will
succeed and the other will fail.  The failed mounts will cause an EREMOTE
to be propagated back to the application.

This commit works around the problem by adding a new exit status,
MOUNT_BUSY to the mount.zfs program which is used when the underlying
mount(2) call returns EBUSY.  The zfs code detects this condition and
treats it as if the mount had succeeded.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #1819
2013-11-08 10:45:14 -08:00
Marcel Telka
8ce0af07bb Illumos #4061
4061 libzfs: memory leak in iter_dependents_cb()
Reviewed by: Jeffry Molanus <jeffry.molanus@nexenta.com>
Reviewed by: Boris Protopopov <boris.protopopov@nexenta.com>
Reviewed by: Andy Stormont <andyjstormont@gmail.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Dan McDonald <danmcd@nexenta.com>

References:
  https://www.illumos.org/issues/4061
  illumos/illumos-gate@2fbdf8dbf0

Ported-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #1775
2013-11-05 12:24:10 -08:00
Matthew Ahrens
46ba1e59d3 Illumos #3996
3996 want a libzfs_core API to rollback to latest snapshot
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: Adam Leventhal <ahl@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Andy Stormont <andyjstormont@gmail.com>
Approved by: Richard Lowe <richlowe@richlowe.net>

References:
  https://www.illumos.org/issues/3996
  illumos/illumos-gate@a7027df17f

Ported-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #1775
2013-11-05 12:23:11 -08:00
Steven Hartland
6389d42205 Illumos #3909
3909 "zfs send -D" does not work
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Christopher Siden <christopher.siden@delphix.com>

References:
  https://www.illumos.org/issues/3909
  illumos/illumos-gate@36f7455d36

Ported-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #1775
2013-11-05 12:15:11 -08:00
Keith M Wesolowski
96c2e96193 Illumos #3894
3894 zfs should not allow snapshot of inconsistent dataset
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Gordon Ross <gwr@nexenta.com>

References:
  https://www.illumos.org/issues/3894
  illumos/illumos-gate@ca48f36f20

Ported-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #1775
2013-11-04 11:18:14 -08:00
Matthew Ahrens
1a077756e8 Illumos #3829
3829 fix for 3740 changed behavior of zfs destroy/hold/release ioctl
Reviewed by: Matt Amdur <matt.amdur@delphix.com>
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Approved by: Richard Lowe <richlowe@richlowe.net>

References:
  https://www.illumos.org/issues/3829
  illumos/illumos-gate@bb6e70758d

Ported-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #1775
2013-11-04 11:18:14 -08:00
Steven Hartland
34ffbed88c Illumos #3818
3818 zpool status -x should report pools with removed l2arc devices
Reviewed by: Saso Kiselkov <skiselkov.ml@gmail.com>
Reviewed by: George Wilson <gwilson@zfsmail.com>
Approved by: Christopher Siden <christopher.siden@delphix.com>

References:
  https://www.illumos.org/issues/3818
  illumos/illumos-gate@7f2416ef64

Ported-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #1775
2013-11-04 11:18:14 -08:00
Steven Hartland
95fd54a1c5 Illumos #3740
3740 Poor ZFS send / receive performance due to snapshot
     hold / release processing
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Christopher Siden <christopher.siden@delphix.com>

References:
  https://www.illumos.org/issues/3740
  illumos/illumos-gate@a7a845e4bf

Ported-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #1775

Porting notes:

1. 13fe019870 introduced a merge conflict
   in dsl_dataset_user_release_tmp where some variables were moved
   outside of the preprocessor directive.

2. dea9dfefdd747534b3846845629d2200f0616dad made the previous merge
   conflict worse by switching KM_SLEEP to KM_PUSHPAGE. This is notable
   because this commit refactors the code, adding a new KM_SLEEP
   allocation. It is not clear to me whether this should be converted
   to KM_PUSHPAGE.

3. We had a merge conflict in libzfs_sendrecv.c because of copyright
   notices.

4. Several small C99 compatibility fixed were made.
2013-11-04 11:17:48 -08:00
Will Andrews
7bc7f25040 Illumos #3745, #3811
3745 zpool create should treat -O mountpoint and -m the same
3811 zpool create -o altroot=/xyz -O mountpoint=/mnt ignores
     the mountpoint option
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Christopher Siden <christopher.siden@delphix.com>

References:
  https://www.illumos.org/issues/3745
  https://www.illumos.org/issues/3811
  illumos/illumos-gate@8b71377531

Ported-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #1775
2013-11-04 10:55:25 -08:00
Will Andrews
e49f1e20a0 Illumos #3741
3741 zfs needs better comments
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Eric Schrock <eric.schrock@delphix.com>
Approved by: Christopher Siden <christopher.siden@delphix.com>

References:
  https://www.illumos.org/issues/3741
  illumos/illumos-gate@3e30c24aee

Ported-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #1775
2013-11-04 10:55:25 -08:00
Martin Matuska
b1118acbb1 Illumos #3699, #3739
3699 zfs hold or release of a non-existent snapshot does not output error
3739 cannot set zfs quota or reservation on pool version < 22
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Eric Shrock <eric.schrock@delphix.com>
Approved by: Dan McDonald <danmcd@nexenta.com>

References:
  https://www.illumos.org/issues/3699
  https://www.illumos.org/issues/3739
  illumos/illumos-gate@013023d4ed

Ported-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #1775
2013-11-04 10:55:25 -08:00
Ralf Ertzinger
8b921f667a Introduce zpool_get_prop_literal interface
This change introduces zpool_get_prop_literal. It's an expanded version
of zpool_get_prop taking one additional boolean parameter. With this
parameter set to B_FALSE it will behave identically to zpool_get_prop.
Setting it to B_TRUE will return full precision numbers for the
following properties:

ZPOOL_PROP_SIZE
ZPOOL_PROP_ALLOCATED
ZPOOL_PROP_FREE
ZPOOL_PROP_FREEING
ZPOOL_PROP_EXPANDSZ
ZPOOL_PROP_ASHIFT

Also introduced is a wrapper function for zpool_get_prop making it
use zpool_get_prop_literal in the background.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #1813
2013-10-28 14:27:53 -07:00
Brian Behlendorf
11cb9d773f Increase default udev wait time
When creating a new pool, or adding/replacing a disk in an existing
pool, partition tables will be automatically created on the devices.
Under normal circumstances it will take less than a second for udev
to create the expected device files under /dev/.  However, it has
been observed that if the system is doing heavy IO concurrently udev
may take far longer.  If you also throw in some cheap dodgy hardware
it may take even longer.

To prevent zpool commands from failing due to this the default wait
time for udev is being increased to 30 seconds.  This will have no
impact on normal usage, the increase timeout should only be noticed
if your udev rules are incorrectly configured.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #1646
2013-10-22 10:25:51 -07:00
Richard Yao
a6ce1eae54 Fix libzfs_core changes to follow GNU libtool guidelines
The GNU libtool documentation states to start with a version of 0:0:0,
rather than 1:1:0. Illumos uses the name libzfs_core.so.1, so to be
consistent, we should go with 1:0:0.

http://www.gnu.org/software/libtool/manual/libtool.html#Updating-version-info

The GNU libtool documentation also provides guidence on how the version
information should be incremented. Doing this does a SONAME bump of the
libzfs and libzpool libraries. This is particularly important on Gentoo
because a SONAME bump enables portage to retain the older libraries
until any packages that link to them are rebuilt. The main example of
this is GRUB2's grub2-mkconfig, which will break unless it is rebuilt
against the new libraries.

Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #1751
2013-10-10 16:56:51 -07:00
Richard Yao
31fc19399e Generate libraries with correct DT_NEEDED entries
Libraries that depend on other libraries should list them in ELF's
DT_NEEDED field so that programs linking to them do not need to specify
those libraries unless they depend on them as well. This is not the case
in the current code and the consequence is that anything that needs a
library must know its dependencies. This is fragile and caused GRUB2's
configure script to break when a dependency was added on libblkid in
libzfs.

This resolves that problem by using LIBADD/LDADD to specify libraries in
Makefile.am instead of LDFLAGS. This ensures that proper DT_NEEDED
entries are generated and prevents GRUB2's configure script from
breaking in the presence of a libblkid dependency. This also removes
unneeded dependencies from various files.

Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #1751
2013-10-10 16:56:51 -07:00
Richard Yao
1db7b9be75 Fix libblkid support
libblkid support is dormant because the autotools check is broken and
liblkid identifies ZFS vdevs as "zfs_member", not "zfs". We fix that
with a few changes:

First, we fix the libblkid autotools check to do a few things:

1. Make a 64MB file, which is the minimum size ZFS permits.
2. Make 4 fake uberblock entries to make libblkid's check succeed.
3. Return 0 upon success to make autotools use the success case.
4. Include stdlib.h to avoid implicit declration of free().
5. Check for "zfs_member", not "zfs"
6. Make --with-blkid disable autotools check (avoids Gentoo sandbox violation)
7. Pass '-lblkid' correctly using LIBS not LDFLAGS.

Second, we change the libblkid support to scan for "zfs_member", not
"zfs".

This makes --with-blkid work on Gentoo.

Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #1751
2013-10-10 16:56:51 -07:00
Matthew Ahrens
13fe019870 Illumos #3464
3464 zfs synctask code needs restructuring
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Adam Leventhal <ahl@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Approved by: Garrett D'Amore <garrett@damore.org>

References:
  https://www.illumos.org/issues/3464
  illumos/illumos-gate@3b2aab1880

Ported-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #1495
2013-09-04 16:01:24 -07:00
Matthew Ahrens
6f1ffb0665 Illumos #2882, #2883, #2900
2882 implement libzfs_core
2883 changing "canmount" property to "on" should not always remount dataset
2900 "zfs snapshot" should be able to create multiple, arbitrary snapshots at once

Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Chris Siden <christopher.siden@delphix.com>
Reviewed by: Garrett D'Amore <garrett@damore.org>
Reviewed by: Bill Pijewski <wdp@joyent.com>
Reviewed by: Dan Kruchinin <dan.kruchinin@gmail.com>
Approved by: Eric Schrock <Eric.Schrock@delphix.com>

References:
  https://www.illumos.org/issues/2882
  https://www.illumos.org/issues/2883
  https://www.illumos.org/issues/2900
  illumos/illumos-gate@4445fffbbb

Ported-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #1293

Porting notes:

WARNING: This patch changes the user/kernel ABI.  That means that
the zfs/zpool utilities built from master are NOT compatible with
the 0.6.2 kernel modules.  Ensure you load the matching kernel
modules from master after updating the utilities.  Otherwise the
zfs/zpool commands will be unable to interact with your pool and
you will see errors similar to the following:

  $ zpool list
  failed to read pool configuration: bad address
  no pools available

  $ zfs list
  no datasets available

Add zvol minor device creation to the new zfs_snapshot_nvl function.

Remove the logging of the "release" operation in
dsl_dataset_user_release_sync().  The logging caused a null dereference
because ds->ds_dir is zeroed in dsl_dataset_destroy_sync() and the
logging functions try to get the ds name via the dsl_dataset_name()
function. I've got no idea why this particular code would have worked
in Illumos.  This code has subsequently been completely reworked in
Illumos commit 3b2aab1 (3464 zfs synctask code needs restructuring).

Squash some "may be used uninitialized" warning/erorrs.

Fix some printf format warnings for %lld and %llu.

Apply a few spa_writeable() changes that were made to Illumos in
illumos/illumos-gate.git@cd1c8b8 as part of the 3112, 3113, 3114 and
3115 fixes.

Add a missing call to fnvlist_free(nvl) in log_internal() that was added
in Illumos to fix issue 3085 but couldn't be ported to ZoL at the time
(zfsonlinux/zfs@9e11c73) because it depended on future work.
2013-09-04 15:49:00 -07:00
Turbo Fredriksson
abbfdca483 No point in rewind() mtab in zfs_unshare_proto(). We're not really
reading the file, but instead use libzfs_mnttab_find() which does
the nessesary freopen() for us.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #1498
2013-08-15 10:18:31 -07:00
John Layman
fb5c53ea65 Fix for re-reading /etc/mtab in zfs_is_mounted()
When /etc/mtab is updated on Linux it's done atomically with
rename(2).  A new mtab is written, the existing mtab is unlinked,
and the new mtab is renamed to /etc/mtab.  This means that we
must close the old file and open the new file to get the updated
contents.  Using rewind(3) will just move the file pointer back
to the start of the file, freopen(3) will close and open the file.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #1611
2013-08-14 11:37:06 -07:00
Yuri Pankov
105afebb15 Illumos #3098 zfs userspace/groupspace fail
3098 zfs userspace/groupspace fail without saying why when run as non-root
Reviewed by: Eric Schrock <eric.schrock@delphix.com>
Approved by: Richard Lowe <richlowe@richlowe.net>

References:
  https://www.illumos.org/issues/3098
  illumos/illumos-gate@70f56fa693

Ported-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #1596
2013-08-14 09:28:34 -07:00
Brian Behlendorf
ff3510c1a5 Fix zpool_read_label()
The zpool_read_label() function was subtly broken due to a
difference of behavior in fstat64(2) on Solaris vs Linux.

Under Solaris when a block device is stat'ed the st_size
field will contain the size of the device in bytes.  Under
Linux this is only true for regular file and symlinks.  A
compatibility function called fstat64_blk(2) was added
which can be used when the Solaris behavior is required.

This flaw was never noticed because the only time we would
need to use the device size is when the first two labels
are damaged.  I noticed this issue while adding the
zpool_clear_label() function which is similar in design
and does require us to write all the labels.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2013-07-09 16:02:04 -07:00
Dmitry Khasanov
131cc95ca7 Add FreeBSD 'zpool labelclear' command
The FreeBSD implementation of zfs adds the 'zpool labelclear'
command.  Since this functionality is helpful and straight
forward to add it is being included in ZoL.

References:
  freebsd/freebsd@119a041dc9

Ported-by: Dmitry Khasanov <pik4ez@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #1126
2013-07-09 15:58:05 -07:00
Dmitry Khasanov
51a3ae72d2 Readd zpool_clear_label() from OpenSolaris
This patch restores the zpool_clear_label() function from
OpenSolaris.  This was removed by commit d603ed6 because
it wasn't clear we had a use for it in ZoL.  However, this
functionality is a prerequisite for adding the 'zpool labelclear'
command from FreeBSD.

As part of bringing this change in the zpool_clear_label()
function was changed to use fstat64_blk(2) for compatibility
with Linux.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #1126
2013-07-09 15:42:27 -07:00
Aaron Fineman
bbb75c1190 Add error message for missing /etc/mtab
The zpool command should not silently fail when the /etc/mtab
file does not exist.  This can occur in an initramfs environment
when the /etc/mtab file hasn't yet been generated.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #1541
2013-06-27 14:43:37 -07:00
George Wilson
295304bed6 Illumos #3422, #3425
3422 zpool create/syseventd race yield non-importable pool
3425 first write to a new zvol can fail with EFBIG

Reviewed by: Adam Leventhal <ahl@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Garrett D'Amore <garrett@damore.org>

References:
  illumos/illumos-gate@bda8819455
  https://www.illumos.org/issues/3422
  https://www.illumos.org/issues/3425

Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #1390
2013-04-12 09:01:36 -07:00
Eric Dillmann
0b4d1b5853 Add snapdev=[hidden|visible] dataset property
The new snapdev dataset property may be set to control the
visibility of zvol snapshot devices.  By default this value
is set to 'hidden' which will prevent zvol snapshots from
appearing under /dev/zvol/ and /dev/<dataset>/.  When set to
'visible' all zvol snapshots for the dataset will be visible.

This functionality was largely added because when automatic
snapshoting is enabled large numbers of read-only zvol snapshots
will be created.  When creating these devices the kernel will
attempt to read their partition tables, and blkid will attempt
to identify any filesystems on those partitions.  This leads
to a variety of issues:

1) The zvol partition tables will be read in the context of
   the `modprobe zfs` for automatically imported pools.  This
   is undesirable and should be done asynchronously, but for
   now reducing the number of visible devices helps.

2) Udev expects to be able to complete its work for a new
   block devices fairly quickly.  When many zvol devices are
   added at the same time this is no longer be true.  It can
   lead to udev timeouts and missing /dev/zvol links.

3) Simply having lots of devices in /dev/ can be aukward from
   a management standpoint.  Hidding the devices your unlikely
   to ever use helps with this.  Any snapshot device which is
   needed can be made visible by changing the snapdev property.

NOTE: This patch changes the default behavior for zvols which
      was effectively 'snapdev=visible'.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #1235
Closes #945
Issue #956
Issue #756
2013-03-05 12:37:54 -08:00
Brian Behlendorf
dbf763b39b Retire zpool_id infrastructure
In the interest of maintaining only one udev helper to give vdevs
user friendly names, the zpool_id and zpool_layout infrastructure
is being retired.  They are superseded by vdev_id which incorporates
all the previous functionality.

Documentation for the new vdev_id(8) helper and its configuration
file, vdev_id.conf(5), can be found in their respective man pages.
Several useful example files are installed under /etc/zfs/.

  /etc/zfs/vdev_id.conf.alias.example
  /etc/zfs/vdev_id.conf.multipath.example
  /etc/zfs/vdev_id.conf.sas_direct.example
  /etc/zfs/vdev_id.conf.sas_switch.example

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #981
2013-01-29 12:23:17 -08:00
Brian Behlendorf
14ee71efbc Use strerror() not strerror_r()
The differ() function used strerror_r() instead of strerror() because
it allowed the error message to be directly copied in to a buffer.
This causes two issues under Linux.

* There are two versions of strerror_r() available an XSI-compliant
  version which returns an 'int' error code.  And a GNU-specific
  version which return a 'char *' to the resulting error string.

    int strerror_r(int errnum, char *buf, size_t buflen);   /* XSI */
    char *strerror_r(int errnum, char *buf, size_t buflen); /* GNU */

* The most recent versions of strerror_r() are annotated with the
  warn_unused_result attribute.  This causes the following warning
  since the upstream implementation casts the result to void.

    warning: ignoring return value of 'strerror_r', declared with
    attribute warn_unused_result [-Wunused-result]

The cleanest way to resolve both of these problems is just to use
strerror() and make a copy of the result in to the buffer.  This
resolves both issues and this is the only instance of strerror_r()
in the code base.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #1231
2013-01-28 10:02:38 -08:00
Darik Horn
38145d6129 Ensure that zfs diff prints unicode safely.
In the stream_bytes() library function used by `zfs diff`, explicitly
cast each byte in the input string to an unsigned character so that the
Linux fprintf() correctly escapes to octal and does not mangle the output.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #1172
2013-01-16 10:15:57 -08:00
Christopher Siden
b9b24bb4ca Illumos #2762: zpool command should have better support for feature flags
2762 zpool command should have better support for feature flags
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Approved by: Eric Schrock <Eric.Schrock@delphix.com>

References:
  illumos/illumos-gate@57221772c3
  https://www.illumos.org/issues/2762

Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>
2013-01-08 10:35:43 -08:00
George Wilson
3bc7e0fb0f Illumos #3090 and #3102
3090 vdev_reopen() during reguid causes vdev to be treated as corrupt
3102 vdev_uberblock_load() and vdev_validate() may read the wrong label

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Christopher Siden <chris.siden@delphix.com>
Reviewed by: Garrett D'Amore <garrett@damore.org>
Approved by: Eric Schrock <Eric.Schrock@delphix.com>

References:
  illumos/illumos-gate@dfbb943217
  illumos changeset: 13777:b1e53580146d
  https://www.illumos.org/issues/3090
  https://www.illumos.org/issues/3102

Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #939
2013-01-08 10:35:42 -08:00
Christopher Siden
9ae529ec5d Illumos #2619 and #2747
2619 asynchronous destruction of ZFS file systems
2747 SPA versioning with zfs feature flags
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <gwilson@delphix.com>
Reviewed by: Richard Lowe <richlowe@richlowe.net>
Reviewed by: Dan Kruchinin <dan.kruchinin@gmail.com>
Approved by: Eric Schrock <Eric.Schrock@delphix.com>

References:
  illumos/illumos-gate@53089ab7c8
  illumos/illumos-gate@ad135b5d64
  illumos changeset: 13700:2889e2596bd6
  https://www.illumos.org/issues/2619
  https://www.illumos.org/issues/2747

NOTE: The grub specific changes were not ported.  This change
must be made to the Linux grub packages.

Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>
2013-01-08 10:35:35 -08:00
Massimo Maggi
5e6320cd12 Fix get/set users/groups in quota props via numeric id
Fix setting/getting users/groups in quota properties through
numeric identifier.  This support was accidentally disabled
in the original port by applying the HAVE_IDMAP wrapper macro
too broadly.

Fix obtained by moving #ifdef HAVE_IDMAP to exclude only
the part of code that really needs IDMAP.  Now zfs (get|set)
(user|group)quota@1000 works as expected.

Signed-off-by: Massimo Maggi <massimo@mmmm.it>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #1147
2012-12-17 09:52:58 -08:00
Jorgen Lundman
53c2ec1d1b Fix 'zpool create' segfault due to bad syntax
Incorrect syntax should never cause a segfault.  In this case
listing multiple comma delimited options after '-o' triggered
the problem.  For example:

  zpool create -o ashift=12,listsnaps=on

This patch resolves the issue by wrapping the calls which use
hdr with a NULL test.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #1118
2012-12-04 11:15:25 -08:00
Turbo Fredriksson
645fb9cc21 Implemented sharing datasets via SMB using libshare
Add the initial support for the 'smbshare' option using the
existing libshare infrastructure.  Because this implementation
relies on usershares samba version 3.0.23 is required.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #493
2012-12-03 09:42:15 -08:00
Brian Behlendorf
c372b36e3e Allow GPT+EFI vdevs for root pools
Commit 57a4edd allows the bootfs property to be set on any pool.
However, many of the zpool commands still prevent you from using
EFI labeled devices for the root pool.  For example:

    # zpool attach rpool /dev/sda /dev/sdb
    cannot label 'sdb': EFI labeled devices are not supported on
    root pools. on root devices.

For non-Solaris builds such as Linux disable this error.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #1077
2012-11-30 13:45:14 -08:00
Brian Behlendorf
0e20a31b4b Recreate minors when renaming zvols
When a zvol with snapshots is renamed the device files under
/dev/zvol/ are not renamed.  This patch resolves the problem
by destroying and recreating the minors with the new name so
the links can be recreated bu udev.

Original-patch-by: Suman Chakravartula <schakrava@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #408
2012-11-19 16:59:44 -08:00
Brian Behlendorf
30b937ee15 Update spare and cache device names on import
During 'zpool import' all ZPOOL_CONFIG_PATH names are supposed
to be updated by fix_paths().  This was not happening for spare
and cache devices because the proper names were getting filtered
out of the pool_list_t->names.  Interestingly, the names were
being filtered because the spare and cache devices do not
contain the pool name in their vdev label.

The fix is to exclude the device path from the list only if:

  1) has a valid ZPOOL_CONFIG_POOL_NAME key in the label, and
  2) that pool name does not match the specified pool name.

Since the label is valid and because it does properly store the
vdev guid it will be correctly assembled without the pool name.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #725
2012-10-22 08:46:02 -07:00
Brian Behlendorf
eac4720465 Allow 'zpool replace' to use short device names
The 'zpool replace' command would fail when given a short name
because unlike on other platforms the short name cannot be
deterministically expanded to a single path.  Multiple path
prefixes must be checked and in addition the partition suffix
for whole disks is determined by the prefix.

To handle this complexity a zfs_strcmp_pathname() function was
added which takes either a short or fully qualified device name.
Short names will be expanded using the prefixes in the default
import search path, or the ZPOOL_IMPORT_PATH environment variable
if it's defined.  All posible expansions are then compared against
the comparison path.  Care is taken to strip redundant slashes to
ensure legitimate matches are not missed.

In the context of this work the existing zfs_resolve_shortname()
function was extended to consider the ZPOOL_IMPORT_PATH when set.
The zfs_append_partition() interface was also simplified to take
only a single buffer.

The vast majority of these changes rework existing Linux specific
code which was originally written to accomidate udev.  However,
there is some minimal cleanup which removes Illumos specific code.
This was done to improve readability but the basic flow and intent
of the upstream code was maintained.

These changes are the logical conclusion of the previos work to
adjust the 'zpool import' search behavior, see commit 44867b6a.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #544
Closes #976
2012-10-22 08:45:58 -07:00
Matthew Ahrens
04434775b7 Illumos #3100: zvol rename fails with EBUSY when dirty.
illumos/illumos-gate@2e2c135528
Illumos changeset: 13780:6da32a929222

3100 zvol rename fails with EBUSY when dirty

Reviewed by: Christopher Siden <chris.siden@delphix.com>
Reviewed by: Adam H. Leventhal <ahl@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Garrett D'Amore <garrett@damore.org>
Approved by: Eric Schrock <eric.schrock@delphix.com>

Ported-by: Etienne Dechamps <etienne.dechamps@ovh.net>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #995
2012-10-03 13:59:02 -07:00
Bill Pijewski
37abac6d55 Illumos #2703: add mechanism to report ZFS send progress
Reviewed by: Matt Ahrens <matt@delphix.com>
Reviewed by: Robert Mustacchi <rm@joyent.com>
Reviewed by: Richard Lowe <richlowe@richlowe.net>
Approved by: Eric Schrock <Eric.Schrock@delphix.com>

References:
  https://www.illumos.org/issues/2703

Ported by: Martin Matuska <martin@matuska.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2012-09-19 13:39:06 -07:00
Chris Siden
1bd201e70d Illumos #1948: zpool list should show more detailed pool info
Reviewed by: Adam Leventhal <ahl@delphix.com>
Reviewed by: Matt Ahrens <matt@delphix.com>
Reviewed by: Eric Schrock <eric.schrock@delphix.com>
Reviewed by: Richard Lowe <richlowe@richlowe.net>
Reviewed by: Albert Lee <trisk@nexenta.com>
Reviewed by: Dan McDonald <danmcd@nexenta.com>
Reviewed by: Garrett D'Amore <garrett@damore.org>
Approved by: Eric Schrock <eric.schrock@delphix.com>

References:
  https://www.illumos.org/issues/1948

Ported by:	Martin Matuska <martin@matuska.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #685
2012-09-19 13:39:05 -07:00
Brian Behlendorf
0a2f7b3662 Seg fault 'zpool import -d /dev/disk/by-id -a'
Introduced by commit 44867b6d6e.
We should of course check to ensure best isn't NULL before
attempting to dereference it.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #974
2012-09-18 12:33:37 -07:00
Brian Behlendorf
44867b6d6e Improve zpool import search behavior
The goal of this change is to make 'zpool import' prefer to use
the peristent /dev/mapper or /dev/disk/by-* paths.  These are far
preferable to the devices in /dev/ whos names are not persistent
and are determined by the order in which a device is detected.

This patch improves things by changing the default search path from
just to the top level /dev/ directory to (in order):

  /dev/disk/by-vdev   - Custom rules, use first if they exist
  /dev/disk/zpool     - Custom rules, use first if they exist
  /dev/mapper         - Use multipath devices before components
  /dev/disk/by-uuid   - Single unique entry and persistent
  /dev/disk/by-id     - May be multiple entries and persistent
  /dev/disk/by-path   - Encodes physical location and persistent
  /dev/disk/by-label  - Custom persistent labels
  /dev                - UNSAFE device names will change

The default search path can be overriden by setting the
ZPOOL_IMPORT_PATH environment variable.  This must be a colon
delimited list of paths which are searched for vdevs.  If the
'zpool import -d' option is specified only those listed paths
will be searched.

Finally, when multiple paths to the same device are found.  If one
of the paths is an exact match for the path used last time to import
the pool it will be used.  When there are no exact matches the
prefered path will be determined by the provided search order.

This means you can still import a pool and force specific names by
providing the -d <path> option.  And the prefered names will persist
as long as those paths exist on your system.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #965
2012-09-17 13:49:07 -07:00
Michael Martin
fc24f7c887 Fix missing vdev names in zpool status output
Commit 858219c makes more sense down below in the 'if (verbose)'
section of the code.  Initially, buf and path will never point
to the same location.  Once 'path = buf' is set on a raidz vdev,
the code may drop into the verbose section depending on the
verbose flag.  In here, using a tmpbuf makes sense since now
'buf == path'.

This issue does not occur in the upstream Solaris code because
their implementations of snprintf() allow for buf and path to
be the same address.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #57
2012-09-05 22:09:12 -07:00
Brian Behlendorf
ca8b5af89d Remove autotools products
Remove all of the generated autotools products from the repository
and update the .gitignore files accordingly.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #718
2012-08-27 11:47:44 -07:00
Garrett D'Amore
08b1b21d58 Illumos #2803: zfs get guid pretty-prints the output
Reviewed by: Eric Schrock <eric.schrock@delphix.com>
Reviewed by: Richard Elling <richard.elling@gmail.com>
Reviewed by: Alexander Eremin <alexander.eremin@nexenta.com>
Approved by: Dan McDonald <danmcd@nexenta.com>

References:
  https://www.illumos.org/issues/2803

Ported by: Martin Matuska <martin@matuska.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2012-08-23 10:40:14 -07:00
Christopher Siden
e956d65106 Illumos #1796, #2871, #2903, #2957
1796 "ZFS HOLD" should not be used when doing "ZFS SEND" from a read-only pool
2871 support for __ZFS_POOL_RESTRICT used by ZFS test suite
2903 zfs destroy -d does not work
2957 zfs destroy -R/r sometimes fails when removing defer-destroyed snapshot
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Approved by: Eric Schrock <Eric.Schrock@delphix.com>

References:
  https://www.illumos.org/issues/1796
  https://www.illumos.org/issues/2871
  https://www.illumos.org/issues/2903
  https://www.illumos.org/issues/2957

Ported by: Martin Matuska <martin@matuska.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2012-08-23 10:40:02 -07:00
Eric Schrock
db49968e5c Illumos #2635: 'zfs rename -f' to perform force unmount
Reviewed by: Matt Ahrens <matt@delphix.com>
Reviewed by: George Wilson <George.Wilson@delphix.com>
Reviewed by: Bill Pijewski <wdp@joyent.com>
Reviewed by: Richard Elling <richard.elling@richardelling.com>
Approved by: Richard Lowe <richlowe@richlowe.net>

References:
  https://www.illumos.org/issues/2635

Ported by: Martin Matuska <martin@matuska.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #717
2012-08-23 10:39:43 -07:00
Martin Matuska
cf997d797b Properly initialize and free destroydata
This regression was accidentally introduced by commit
330d06f90d due to ZoL
specific code.  The fix is to simply ensure the passed
nvlist is initialized and freed.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #876
2012-08-23 09:42:21 -07:00
Dan McDonald
d96eb2b153 Illumos #1693: persistent 'comment' field for a zpool
Reviewed by: George Wilson <gwilson@zfsmail.com>
Reviewed by: Eric Schrock <eric.schrock@delphix.com>
Approved by: Richard Lowe <richlowe@richlowe.net>

References:
  https://www.illumos.org/issues/1693

Ported by: Martin Matuska <martin@matuska.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #678
2012-08-08 11:49:37 -07:00
Etienne Dechamps
ee5fd0bb80 Set zvol discard_granularity to the volblocksize.
Currently, zvols have a discard granularity set to 0, which suggests to
the upper layer that discard requests of arbirarily small size and
alignment can be made efficiently.

In practice however, ZFS does not handle unaligned discard requests
efficiently: indeed, it is unable to free a part of a block. It will
write zeros to the specified range instead, which is both useless and
inefficient (see dnode_free_range).

With this patch, zvol block devices expose volblocksize as their discard
granularity, so the upper layer is aware that it's not supposed to send
discard requests smaller than volblocksize.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #862
2012-08-07 14:55:31 -07:00
Matthew Ahrens
330d06f90d Illumos #1644, #1645, #1646, #1647, #1708
1644 add ZFS "clones" property
1645 add ZFS "written" and "written@..." properties
1646 "zfs send" should estimate size of stream
1647 "zfs destroy" should determine space reclaimed by
     destroying multiple snapshots
1708 adjust size of zpool history data

References:
  https://www.illumos.org/issues/1644
  https://www.illumos.org/issues/1645
  https://www.illumos.org/issues/1646
  https://www.illumos.org/issues/1647
  https://www.illumos.org/issues/1708

This commit modifies the user to kernel space ioctl ABI.  Extra
care should be taken when updating to ensure both the kernel
modules and utilities are updated.  This change has reordered
all of the new ioctl()s to the end of the list.  This should
help minimize this issue in the future.

Reviewed by: Richard Lowe <richlowe@richlowe.net>
Reviewed by: George Wilson <gwilson@zfsmail.com>
Reviewed by: Albert Lee <trisk@opensolaris.org>
Approved by: Garrett D'Amore <garret@nexenta.com>

Ported by: Martin Matuska <martin@matuska.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #826
Closes #664
2012-07-31 09:25:30 -07:00
Etienne Dechamps
f09398cec6 Use /sys/module instead of /proc/modules.
When libzfs checks if the module is loaded or not, it currently reads
/proc/modules and searches for a line matching the module name.

Unfortunately, if the module is included in the kernel itself (built-in
module), then /proc/modules won't list it, so libzfs will wrongly conclude
that the module is not loaded, thus making all ZFS userspace tools unusable.

Fortunately, all loaded modules appear as directories in /sys/module, even
built-in ones. Thus we can use /sys/module in lieu of /proc/modules to fix
the issue.

As a bonus, the code for checking becomes much simpler.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #851
2012-07-26 13:45:33 -07:00
Richard Yao
739a1a82e0 Linux 3.5 compat, end_writeback() changed to clear_inode()
The end_writeback() function was changed by moving the call to
inode_sync_wait() earlier in to evict().   This effecitvely changes
the ordering of the sync but it does not impact the details of
the zfs implementation.

However, as part of this change end_writeback() was renamed to
clear_inode() to reflect the new semantics.  This change does
impact us and clear_inode() now maps to end_writeback() for
kernels prior to 3.5.

Signed-off-by: Richard Yao <ryao@cs.stonybrook.edu>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #784
2012-07-23 12:29:36 -07:00
Richard Yao
ea1fdf46e2 Linux 3.5 compat, iops->truncate_range() removed
The vmtruncate_range() support has been removed from the kernel in
favor of using the fallocate method in the file_operations table.

Signed-off-by: Richard Yao <ryao@cs.stonybrook.edu>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #784
2012-07-23 12:29:32 -07:00
Richard Yao
756c3e5a9c Linux 3.5 compat, eops->encode_fh() takes inodes
The export_operations member ->encode_fh() has been updated to
take both the child and parent inodes.  This interface used to
take the child dentry and a bool describing if the parent is needed.

NOTE: While updating this code I noticed that we do not currently
cleanly handle the case where we're passed a connectable parent.
This code should be audited to make sure we're doing the right thing.

Signed-off-by: Richard Yao <ryao@cs.stonybrook.edu>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #784
2012-07-23 12:29:23 -07:00
Etienne Dechamps
b5a28807cd Move partition scanning from userspace to module.
Currently, zpool online -e (dynamic vdev expansion) doesn't work on
whole disks because we're invoking ioctl(BLKRRPART) from userspace
while ZFS still has a partition open on the disk, which results in
EBUSY.

This patch moves the BLKRRPART invocation from the zpool utility to the
module. Specifically, this is done just before opening the device in
vdev_disk_open() which is called inside vdev_reopen(). This requires
jumping through some hoops to get to the disk device from the partition
device, and to make sure we can still open the partition after the
BLKRRPART call.

Note that this new code path is triggered on dynamic vdev expansion
only; other actions, like creating a new pool, are unchanged and still
call BLKRRPART from userspace.

This change also depends on API changes which are available in 2.6.37
and latter kernels.  The build system has been updated to detect this,
but there is no compatibility mode for older kernels.  This means that
online expansion will NOT be available in older kernels.  However, it
will still be possible to expand the vdev offline.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #808
2012-07-17 09:17:31 -07:00
Etienne Dechamps
7608bd0dd0 Use the right device path when relabeling.
Currently, zpool_vdev_online() calls zpool_relabel_disk() with a short
partition device name, which is obviously wrong because (1)
zpool_relabel_disk() expects a full, absolute path to use with open()
and (2) efi_write() must be called on an opened disk device, not a
partition device.

With this patch, zpool_relabel_disk() gets called with a full disk
device path. The path is determined using the same algorithm as
zpool_find_vdev().

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #808
2012-07-12 08:59:16 -07:00
Etienne Dechamps
8adf486422 Fix error handling for "zpool online -e".
The error handling code around zpool_relabel_disk() is either inexistent
or wrong. The function call itself is not checked, and
zpool_relabel_disk() is generating error messages from an unitialized
buffer.

Before:

    # zpool online -e homez sdb; echo $?
    `: cannot relabel 'sdb1': unable to open device: 2
    0

After:

    # zpool online -e homez sdb; echo $?
    cannot expand sdb: cannot relabel 'sdb1': unable to open device: 2
    1

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #808
2012-07-12 08:58:19 -07:00
George Wilson
c7f2d69de3 Illumos #1949, #1953
1949 crash during reguid causes stale config
1953 allow and unallow missing from zpool history since removal of pyzfs

Reviewed by: Adam Leventhal <ahl@delphix.com>
Reviewed by: Matt Ahrens <matt@delphix.com>
Reviewed by: Eric Schrock <eric.schrock@delphix.com>
Reviewed by: Bill Pijewski <wdp@joyent.com>
Reviewed by: Richard Lowe <richlowe@richlowe.net>
Reviewed by: Garrett D'Amore <garrett.damore@gmail.com>
Reviewed by: Dan McDonald <danmcd@nexenta.com>
Reviewed by: Steve Gonczi <gonczi@comcast.net>
Approved by: Eric Schrock <eric.schrock@delphix.com>

References:
  https://www.illumos.org/issues/1949
  https://www.illumos.org/issues/1953

Ported by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #665
2012-07-11 13:33:31 -07:00
Garrett D'Amore
3541dc6d02 Illumos #1748: desire support for reguid in zfs
Reviewed by: George Wilson <gwilson@zfsmail.com>
Reviewed by: Igor Kozhukhov <ikozhukhov@gmail.com>
Reviewed by: Alexander Eremin <alexander.eremin@nexenta.com>
Reviewed by: Alexander Stetsenko <ams@nexenta.com>
Approved by: Richard Lowe <richlowe@richlowe.net>

References:
  https://www.illumos.org/issues/1748

This commit modifies the user to kernel space ioctl ABI.  Extra
care should be taken when updating to ensure both the kernel
modules and utilities are updated.  If only the user space
component is updated both the 'zpool events' command and the
'zpool reguid' command will not work until the kernel modules
are updated.

Ported by:     Martin Matuska <martin@matuska.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #665
2012-07-11 13:08:56 -07:00
Pawel Jakub Dawidek
0cee24064a Speed up 'zfs list -t snapshot -o name -s name'
FreeBSD #xxx:  Dramatically optimize listing snapshots when user
requests only snapshot names and wants to sort them by name, ie.
when executes:

  # zfs list -t snapshot -o name -s name

Because only name is needed we don't have to read all snapshot
properties.

Below you can find how long does it take to list 34509 snapshots
from a single disk pool before and after this change with cold and
warm cache:

    before:

        # time zfs list -t snapshot -o name -s name > /dev/null
        cold cache: 525s
        warm cache: 218s

    after:

        # time zfs list -t snapshot -o name -s name > /dev/null
        cold cache: 1.7s
        warm cache: 1.1s

NOTE: This patch only appears in FreeBSD.  If/when Illumos picks up
the change we may want to drop this patch and adopt their version.
However, for now this addresses a real issue.

Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #450
2012-06-14 09:49:04 -07:00
Richard Yao
bc98d6c809 Make zvol_remove_link() print a more useful error message
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2012-06-13 16:27:19 -07:00
Daniel Verite
c6327b63e6 Retry removal of busy minors
When failing to remove a zvol device link because it's busy, wait
a bit and retry in a loop instead of giving up immediately.  This
technique is similar to the loop in zpool_label_disk_wait(), with
the same goal: waiting for the asynchronous udev processes to finish
their work.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #692
2012-06-11 10:50:20 -07:00
Richard Yao
6a0936babc Linux 3.4 compat, d_make_root() replaces d_alloc_root()
torvalds/linux@adc0e91ab1 introduced
introduced d_make_root() as a replacement for d_alloc_root(). Further
commits appear to have removed d_alloc_root() from the Linux source
tree. This causes the following failure:

  error: implicit declaration of function 'd_alloc_root'
  [-Werror=implicit-function-declaration]

To correct this we update the code to use the current d_make_root()
interface for readability.  Then we introduce an autotools check
to determine if d_make_root() is available.  If it isn't then we
define some compatibility logic which used the older d_alloc_root()
interface.

Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #776
2012-06-11 10:04:49 -07:00
Brian Behlendorf
abe5b8fb66 Improve 'zpool import' EBUSY error message
When a device is already open O_EXCL by another process the
`zpool import` will correctly fail.  However, the default failure
message isn't very helpful.  It may in fact be harmful if you
take its advise and destroy your pool.

    cannot import 'tank': pool is busy
            Destroy and re-create the pool from
            a backup source.

Improve the error message in the EBUSY case to simply print a
message indicating that the devices are current in use.  The user
will need to manually identify which process has the device open
exclusively and why.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2012-06-01 08:55:24 -07:00
Brian Behlendorf
b04c9fc009 Add /dev/mapper/ to search path
When creating pools short device names may be used when those
devices appear in certain well known locations under /dev/.
This change adds /dev/mapper/ to that list.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2012-06-01 08:55:24 -07:00
Ned A. Bass
821b683436 Add vdev_id for JBOD-friendly udev aliases
vdev_id parses the file /etc/zfs/vdev_id.conf to map a physical path
in a storage topology to a channel name.  The channel name is combined
with a disk enclosure slot number to create an alias that reflects the
physical location of the drive.  This is particularly helpful when it
comes to tasks like replacing failed drives.  Slot numbers may also be
re-mapped in case the default numbering is unsatisfactory.  The drive
aliases will be created as symbolic links in /dev/disk/by-vdev.

The only currently supported topologies are sas_direct and sas_switch:

o  sas_direct - a channel is uniquely identified by a PCI slot and a
   HBA port

o  sas_switch - a channel is uniquely identified by a SAS switch port

A multipath mode is supported in which dm-mpath devices are handled by
examining the first running component disk, as reported by 'multipath
-l'.  In multipath mode the configuration file should contain a
channel definition with the same name for each path to a given
enclosure.

vdev_id can replace the existing zpool_id script on systems where the
storage topology conforms to sas_direct or sas_switch.  The script
could be extended to support other topologies as well.  The advantage
of vdev_id is that it is driven by a single static input file that can
be shared across multiple nodes having a common storage toplogy.
zpool_id, on the other hand, requires a unique /etc/zfs/zdev.conf per
node and a separate slot-mapping file.  However, zpool_id provides the
flexibility of using any device names that show up in
/dev/disk/by-path, so it may still be needed on some systems.

vdev_id's functionality subsumes that of the sas_switch_id script, and
it is unlikely that anyone is using it, so sas_switch_id is removed.

Finally, /dev/disk/by-vdev is added to the list of directories that
'zpool import' will scan.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #713
2012-06-01 08:55:14 -07:00
Brian Behlendorf
b39d3b9f7b Linux 3.3 compat, iops->create()/mkdir()/mknod()
The mode argument of iops->create()/mkdir()/mknod() was changed from
an 'int' to a 'umode_t'.  To prevent a compiler warning an autoconf
check was added to detect the API change and then correctly set a
zpl_umode_t typedef.  There is no functional change.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #701
2012-04-30 12:52:38 -07:00
Richard Laager
109491a897 Improve error message consistency
Signed-off-by: Richard Laager <rlaager@wiktel.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2012-04-11 10:43:17 -07:00
Brian Behlendorf
1c5de20ae2 Add --enable-debug-dmu-tx configure option
Allow rigorous (and expensive) tx validation to be enabled/disabled
indepentantly from the standard zfs debugging.  When enabled these
checks ensure that all txs are constructed properly and that a dbuf
is never dirtied without taking the correct tx hold.

This checking is particularly helpful when adding new dmu consumers
like Lustre.  However, for established consumers such as the zpl
with no known outstanding tx construction problems this is just
overhead.

--enable-debug-dmu-tx  - Enable/disable validation of each tx as
--disable-debug-dmu-tx   it is constructed.  By default validation
                         is disabled due to performance concerns.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2012-03-23 12:25:17 -07:00
Brian Behlendorf
ebe7e575ea Add .zfs control directory
Add support for the .zfs control directory.  This was accomplished
by leveraging as much of the existing ZFS infrastructure as posible
and updating it for Linux as required.  The bulk of the core
functionality is now all there with the following limitations.

*) The .zfs/snapshot directory automount support requires a 2.6.37
   or newer kernel.  The exception is RHEL6.2 which has backported
   the d_automount patches.

*) Creating/destroying/renaming snapshots with mkdir/rmdir/mv
   in the .zfs/snapshot directory works as expected.  However,
   this functionality is only available to root until zfs
   delegations are finished.

      * mkdir - create a snapshot
      * rmdir - destroy a snapshot
      * mv    - rename a snapshot

The following issues are known defeciences, but we expect them to
be addressed by future commits.

*) Add automount support for kernels older the 2.6.37.  This should
   be possible using follow_link() which is what Linux did before.

*) Accessing the .zfs/snapshot directory via NFS is not yet possible.
   The majority of the ground work for this is complete.  However,
   finishing this work will require resolving some lingering
   integration issues with the Linux NFS kernel server.

*) The .zfs/shares directory exists but no futher smb functionality
   has yet been implemented.

Contributions-by: Rohan Puri <rohan.puri15@gmail.com>
Contributiobs-by: Andrew Barnes <barnes333@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #173
2012-03-22 13:03:47 -07:00
Ned Bass
613d88eda8 Align parition end on 1 MiB boundary
Some devices have exhibited sensitivity to the ending alignment of
partitions.  In particular, even if the first partition begins at 1
MiB, we have seen many sd driver task abort errors with certain SSDs
if the first partition doesn't end on a 1 MiB boundary.  This occurs
when the vdev label is read during pool creation or importation and
causes a delay of about 30 seconds per device.  It can also be
simulated with dd when the pool isn't imported:

  dd if=/dev/sda1 of=/dev/null bs=262144 count=1

For the record, this problem was observed with SMARTMOD
SG9XCA2E200GE01 200GB SSDs.  Unfortunately I don't have a good
explanation for this behavior. It seems to have something to do with
highly fragmented single-sector requests being issued to the device,
which it may not support.  With end-aligned partitions at least
page-sized requests were queued and issued to the driver according
to blktrace. In any case, aligning the partition end is a fairly
innocuous work-around, wasting at most 1 MiB of space.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #574
2012-03-05 09:49:50 -08:00
Brian Behlendorf
4b787d75c8 Cleanly support debug packages
Allow a source rpm to be rebuilt with debugging enabled.  This
avoids the need to have to manually modify the spec file.  By
default debugging is still largely disabled.  To enable specific
debugging features use the following options with rpmbuild.

  '--with debug'               - Enables ASSERTs

  # For example:
  $ rpmbuild --rebuild --with debug zfs-modules-0.6.0-rc6.src.rpm

Additionally, ZFS_CONFIG has been added to zfs_config.h for
packages which build against these headers.  This is critical
to ensure both zfs and the dependant package are using the same
prototype and structure definitions.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2012-02-27 14:08:17 -08:00
Richard Yao
b41c9906dc Support ashift=13 for 8KB SSD block sizes
New SSDs are now available which use an internal 8k block size.
To make sure ZFS can get the maximum performance out of these
devices we're increasing the maximum ashift to 13 (8KB).

This value is still small enough that we can fit 16 uberblocks
in the vdev ring label.  However, I don't want to increase this
any futher or it will limit the ability the safely roll back a
pool to recover it.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #565
2012-02-13 12:25:27 -08:00
Etienne Dechamps
30930fba21 Add support for DISCARD to ZVOLs.
DISCARD (REQ_DISCARD, BLKDISCARD) is useful for thin provisioning.
It allows ZVOL clients to discard (unmap, trim) block ranges from
a ZVOL, thus optimizing disk space usage by allowing a ZVOL to
shrink instead of just grow.

We can't use zfs_space() or zfs_freesp() here, since these functions
only work on regular files, not volumes. Fortunately we can use the
low-level function dmu_free_long_range() which does exactly what we
want.

Currently the discard operation is not added to the log. That's not
a big deal since losing discard requests cannot result in data
corruption. It would however result in disk space usage higher than
it should be. Thus adding log support to zvol_discard() is probably
a good idea for a future improvement.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2012-02-09 16:19:38 -08:00
Etienne Dechamps
cb2d19010d Support the fallocate() file operation.
Currently only the (FALLOC_FL_PUNCH_HOLE) flag combination is
supported, since it's the only one that matches the behavior of
zfs_space(). This makes it pretty much useless in its current
form, but it's a start.

To support other flag combinations we would need to modify
zfs_space() to make it more flexible, or emulate the desired
functionality in zpl_fallocate().

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #334
2012-02-09 16:19:32 -08:00
Etienne Dechamps
34037afe24 Improve ZVOL queue behavior.
The Linux block device queue subsystem exposes a number of configurable
settings described in Linux block/blk-settings.c. The defaults for these
settings are tuned for hard drives, and are not optimized for ZVOLs. Proper
configuration of these options would allow upper layers (I/O scheduler) to
take better decisions about write merging and ordering.

Detailed rationale:

 - max_hw_sectors is set to unlimited (UINT_MAX). zvol_write() is able to
   handle writes of any size, so there's no reason to impose a limit. Let the
   upper layer decide.

 - max_segments and max_segment_size are set to unlimited. zvol_write() will
   copy the requests' contents into a dbuf anyway, so the number and size of
   the segments are irrelevant. Let the upper layer decide.

 - physical_block_size and io_opt are set to the ZVOL's block size. This
   has the potential to somewhat alleviate issue #361 for ZVOLs, by warning
   the upper layers that writes smaller than the volume's block size will be
   slow.

 - The NONROT flag is set to indicate this isn't a rotational device.
   Although the backing zpool might be composed of rotational devices, the
   resulting ZVOL often doesn't exhibit the same behavior due to the COW
   mechanisms used by ZFS. Setting this flag will prevent upper layers from
   making useless decisions (such as reordering writes) based on incorrect
   assumptions about the behavior of the ZVOL.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2012-02-07 16:23:06 -08:00
Etienne Dechamps
b18019d2d8 Fix synchronicity for ZVOLs.
zvol_write() assumes that the write request must be written to stable storage
if rq_is_sync() is true. Unfortunately, this assumption is incorrect. Indeed,
"sync" does *not* mean what we think it means in the context of the Linux
block layer. This is well explained in linux/fs.h:

    WRITE:       A normal async write. Device will be plugged.
    WRITE_SYNC:  Synchronous write. Identical to WRITE, but passes down
                 the hint that someone will be waiting on this IO
                 shortly.
    WRITE_FLUSH: Like WRITE_SYNC but with preceding cache flush.
    WRITE_FUA:   Like WRITE_SYNC but data is guaranteed to be on
                 non-volatile media on completion.

In other words, SYNC does not *mean* that the write must be on stable storage
on completion. It just means that someone is waiting on us to complete the
write request. Thus triggering a ZIL commit for each SYNC write request on a
ZVOL is unnecessary and harmful for performance. To make matters worse, ZVOL
users have no way to express that they actually want data to be written to
stable storage, which means the ZIL is broken for ZVOLs.

The request for stable storage is expressed by the FUA flag, so we must
commit the ZIL after the write if the FUA flag is set. In addition, we must
commit the ZIL before the write if the FLUSH flag is set.

Also, we must inform the block layer that we actually support FLUSH and FUA.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2012-02-07 16:23:06 -08:00
Brian Behlendorf
47621f3d76 Linux 3.3 compat, sops->show_options()
The second argument of sops->show_options() was changed from a
'struct vfsmount *' to a 'struct dentry *'.  Add an autoconf check
to detect the API change and then conditionally define the expected
interface.  In either case we are only interested in the zfs_sb_t.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #549
2012-02-03 10:02:01 -08:00
Prakash Surya
ff998d804f Ignore dataset if the dds_type is DMU_OST_OTHER
Since the zpios and potentially other ZFS tests use the
DMU_OST_OTHER type to label their datasets, the zpool and
zfs commands should gracefully handle this type when it is
encountered.  This patch modifies the commands' behavior
to ignore any datasets with a dds_type of DMU_OST_OTHER.

Signed-off-by: Prakash Surya <surya1@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #536
2012-01-19 09:29:48 -08:00
Darik Horn
f783130a1f Allow GPT+EFI vdev replacement in boot pools.
Commit zfsonlinux/zfs@57a4eddc4d
allows the bootfs property to be set on any pool, but does not
accommodate subsequent vdev changes. For example:

	# zpool replace rpool /dev/sda /dev/sdb
	operation not supported on this type of pool
	property 'bootfs' is not supported on EFI labeled devices

For non-Solaris builds, disable the check that emits this error.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2012-01-18 11:05:24 -08:00
Darik Horn
750562833f Combine libraries: spl, avl, efi, share, unicode.
These libraries, which are an artifact of the ZoL development
process, conflict with packages that are already in distribution:

  * libspl: SPL Programming Language
  * libavl: AVL for Linux
  * libefi: GRUB

And these libraries are potential conflicts:

  * libshare: the Linux Mount Manager
  * libunicode: Perl and Python

Recompose these five ZoL components into the four libraries that are
conventionally provided by Solaris and FreeBSD systems:

  + libnvpair
  + libuutil
  + libzpool
  + libzfs

This change resolves the name conflict, makes ZoL more compatible
with existing software that uses autotools to detect ZFS, and allows
pkg-zfs to better reflect the official Debian kFreeBSD packaging.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes: #430
2012-01-17 15:19:50 -08:00
Suman Chakravartula
e18be9a637 Add overlay(-O) mount option support
Linux supports mounting over non-empty directories by default.
In Solaris this is not the case and -O option is required for
zfs mount to mount a zfs filesystem over a non-empty directory.

For compatibility, I've added support for -O option to mount
zfs filesystems over non-empty directories if the user wants
to, just like in Solaris.

I've defined MS_OVERLAY to record it in the flags variable if
the -O option is supplied.  The flags variable passes through
a few functions and its checked before performing the empty
directory check in zfs_mount function.  If -O is given, the
check is not performed.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #473
2012-01-12 15:49:38 -08:00