After resuming a pool, the godfather zio could have both the
ZIO_REEXECUTE_NOW and ZIO_REEXECUTE_SUSPEND bits set. This
can occur if some child zios set ZIO_REEXECUTE_NOW while
others set ZIO_REEXECUTE_SUSPEND. The godfather zio can
inherit both flags in zio_notify_parent().
The child zios that set the ZIO_REEXECUTE_SUSPEND flag
are removed from the godfather's child list and added to
the spa->spa_suspend_zio_root child list, while child zios
with the ZIO_REEXECUTE_NOW bit set remain monitored
by the godfather zio.
When the godfather zio executes zio_done(), the presence of
the ZIO_REEXECUTE_SUSPEND bit results in all io_reexecute bits
being cleared. The remaining ZIO_REEXECUTE_NOW child zios are
then not re-executed; instead they are destroyed and lost.
The most straightforward way to address this situation is
to clear only the ZIO_REEXECUTE_SUSPEND bit and leave the
ZIO_REEXECUTE_NOW bit set.
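A minimal sketch of the difference in flag handling. The flag values
below are assumed for illustration (see zio.h for the real ones), and
the two helpers are hypothetical models of only the io_reexecute
handling in zio_done(), not the real function:

```c
#include <stdint.h>

/* Assumed flag values for illustration only; see zio.h for the real ones. */
#define ZIO_REEXECUTE_NOW	0x01
#define ZIO_REEXECUTE_SUSPEND	0x02

/* Hypothetical model of the old behavior: SUSPEND wipes every bit. */
static uint8_t
godfather_flags_old(uint8_t io_reexecute)
{
	if (io_reexecute & ZIO_REEXECUTE_SUSPEND)
		io_reexecute = 0;
	return (io_reexecute);
}

/* Hypothetical model of the fix: clear only SUSPEND so NOW survives. */
static uint8_t
godfather_flags_new(uint8_t io_reexecute)
{
	io_reexecute &= (uint8_t)~ZIO_REEXECUTE_SUSPEND;
	return (io_reexecute);
}
```

With both bits set, the old model returns 0 and the NOW children are
lost, while the new model keeps ZIO_REEXECUTE_NOW.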
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: yuxiang <guo.yong33@zte.com.cn>
The .write/.read file operation callbacks can be retired since
support for .read_iter/.write_iter and .aio_read/.aio_write has
been added. The vfs_write()/vfs_read() entry functions will
select the correct interface for the kernel. This is desirable
because all VFS write/read operations now rely on common code.
This change also adds the generic write checks to make sure that
ulimits are enforced correctly on write.
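To illustrate what enforcing the file size ulimit on write means, here
is a simplified, hypothetical model of just that part of a generic
write check. It is not the kernel helper: the real one also raises
SIGXFSZ and checks other limits such as the superblock's maximum file
size.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical model: a write of *count bytes starting at 'pos' is
 * clamped so it does not extend past 'limit' (the RLIMIT_FSIZE value).
 * Returns 0 and possibly shrinks *count, or -EFBIG if the write starts
 * at or beyond the limit.
 */
static int
write_check_fsize(uint64_t pos, uint64_t *count, uint64_t limit)
{
	if (*count == 0)
		return (0);
	if (pos >= limit)
		return (-EFBIG);
	if (*count > limit - pos)
		*count = limit - pos;	/* short write up to the limit */
	return (0);
}
```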
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Closes #5587
Closes #5673
When importing a pool with a large number of filesystems within the same
parent filesystem, we see that dmu_objset_find_dp() takes a long time.
It is called from 3 places: spa_check_logs(), spa_ld_claim_log_blocks(),
and spa_load_verify().
There are several ways to improve performance here:
1. We don't really need to do spa_check_logs() or
spa_ld_claim_log_blocks() if the pool was closed cleanly.
2. spa_load_verify() uses dmu_objset_find_dp() to check that no
   dataset names are too long.
3. dmu_objset_find_dp() is slow because it's doing
zap_value_search() (which is O(N sibling datasets)) to determine
the name of each dsl_dir when it's opened. In this case we
actually know the name when we are opening it, so we can provide
it and avoid the lookup.
This change implements fix #3 from the above list, i.e. making
dmu_objset_find_dp() provide the name of the dataset so that we don't
have to search for it.
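The idea behind fix #3 is that a top-down walk already knows the
parent's name, so each child's full name can be built directly rather
than recovered with an O(N) zap_value_search(). A minimal sketch of that
step; the helper name is hypothetical, and MAX_DATASET_NAME_LEN is
assumed to mirror ZFS_MAX_DATASET_NAME_LEN (256):

```c
#include <stdio.h>
#include <string.h>

/* Assumed to mirror ZFS_MAX_DATASET_NAME_LEN; illustrative only. */
#define MAX_DATASET_NAME_LEN	256

/*
 * Hypothetical helper: build "<parent>/<child>" from names already known
 * during the walk. Returns 0 on success, or -1 if the name does not fit
 * in buf (i.e. it would be too long).
 */
static int
child_dataset_name(char *buf, size_t buflen, const char *parent,
    const char *child)
{
	int n = snprintf(buf, buflen, "%s/%s", parent, child);

	if (n < 0 || (size_t)n >= buflen)
		return (-1);
	return (0);
}
```

Building the name this way also yields its length for free, which is
the check spa_load_verify() needs.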
Authored by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Steve Gonczi <steve.gonczi@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Prashanth Sreenivasa <prashksp@gmail.com>
Reviewed-by: David Quigley <david.quigley@intel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Ported-by: George Melikov <mail@gmelikov.ru>
OpenZFS-issue: https://www.illumos.org/issues/7606
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/cac6bab
Closes #5662
During recv and rollback, dsl_dataset_clone_swap_sync_impl() is
called to swap out the ds_objset and call dmu_objset_evict() on the old one.
However, zv->zv_objset is currently not swapped out accordingly, so
if anyone holds an fd on the zvol, we risk hitting a use-after-free.
We fix this by extending the zsb suspend and resume mechanism to
zv. Before the recv or rollback, we use zvol_suspend to block all access to
zv_objset and shut it down. After the recv or rollback, we use zvol_resume
to set zv_objset to the new ds_objset and unblock access.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Closes #4866
Closes #5609
Fix dmu_object_next() to correctly handle unallocated objects on
large_dnode datasets.
We implement this by scanning the dnode block until we find the correct
offset to be used in dnode_next_offset(). This is necessary because we
can't assume *objectp is a hole even if dmu_object_info() returns
ENOENT.
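A simplified model of why the scan must start at the beginning of the
dnode block. Here a block is an array of slots: a positive value marks
the start of an allocated dnode and gives how many slots it spans, 0 is
a free slot, and interior slots of a large dnode hold arbitrary data.
The function name and encoding are hypothetical, not the DMU's on-disk
format:

```c
#include <stdint.h>

/*
 * Return the first allocated object strictly after 'obj' in this block,
 * or -1 if there is none. Scanning from slot 0 and stepping over each
 * dnode's full span means interior slots are never interpreted.
 */
static int64_t
next_object_in_block(const uint8_t *slots, int64_t nslots, int64_t obj)
{
	int64_t i = 0;

	while (i < nslots) {
		if (slots[i] > 0) {
			if (i > obj)
				return (i);
			i += slots[i];	/* skip the whole (large) dnode */
		} else {
			i++;		/* free slot */
		}
	}
	return (-1);
}
```

For the block { 1, 3, 7, 9, 0, 1 }, the dnode at slot 1 spans slots 1
to 3, so the next object after 2 is 5. A naive scan starting at 3 would
misread the interior value 9 as an allocated dnode.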
This fixes a couple of issues with zfs receive on large_dnode datasets.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ned Bass <bass6@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #5027
Closes #5532
Assuming /bin/cp causes problems on systems where cp is
not in /bin, such as NixOS.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Joerg Thalheim <joerg@higgsboson.tk>
Closes#5548
Add *_by_dnode() routines for accessing objects given their
dnode_t *; this is more efficient than accessing the object by
(objset_t *, uint64_t object). This change converts some but
not all of the existing consumers. As performance-sensitive
code paths are discovered they should be converted to use
these routines.
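A toy illustration of why a dnode handle is cheaper: access by
(objset, object number) pays for a lookup on every call, while a
*_by_dnode() style call reuses a handle the caller already holds. The
types, the linear lookup and the lookup counter are all hypothetical
simplifications (the real path goes through dnode_hold() and friends):

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for dnode_t and objset_t. */
typedef struct dnode_sketch {
	uint64_t object;
	uint64_t size;
} dnode_sketch_t;

typedef struct objset_sketch {
	dnode_sketch_t *dnodes;
	size_t ndnodes;
	unsigned long lookups;	/* counts object-number lookups */
} objset_sketch_t;

/* Resolve an object number to its dnode; every call costs a lookup. */
static dnode_sketch_t *
dnode_lookup(objset_sketch_t *os, uint64_t object)
{
	os->lookups++;
	for (size_t i = 0; i < os->ndnodes; i++)
		if (os->dnodes[i].object == object)
			return (&os->dnodes[i]);
	return (NULL);
}

/* (objset, object) style accessor. */
static uint64_t
object_size(objset_sketch_t *os, uint64_t object)
{
	dnode_sketch_t *dn = dnode_lookup(os, object);

	return (dn != NULL ? dn->size : 0);
}

/* *_by_dnode() style accessor: no lookup needed. */
static uint64_t
object_size_by_dnode(const dnode_sketch_t *dn)
{
	return (dn->size);
}
```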
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Closes#5534
Issue #4802
Authored by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Don Brady <don.brady@intel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Joe Stein <jas14@cs.brown.edu>
Ported-by: Don Brady <don.brady@intel.com>
When loading a pool that was created before the existence of
per-vdev zaps, on a system that knows about per-vdev zaps, the
per-vdev zaps will not be allocated and initialized.
This appears to be because the logic that would have done so, in
spa_sync_config_object(), is not reached under normal operation. It is
only reached if spa_config_dirty_list is non-empty.
The fix is to add another `AVZ_ACTION_` enum that will allow this code
to be reached when we detect that we're loading an old pool, even when
there are no dirty configs.
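A hedged sketch of the gating logic described above. The enum is modeled
on spa_avz_action in spa_impl.h, with AVZ_ACTION_INITIALIZE assumed to be
the new value, and the predicate is a hypothetical simplification of when
the AVZ setup in spa_sync_config_object() gets to run:

```c
#include <stdbool.h>

/* Modeled on spa_avz_action; AVZ_ACTION_INITIALIZE assumed to be new. */
typedef enum avz_action_sketch {
	AVZ_ACTION_NONE = 0,
	AVZ_ACTION_DESTROY,
	AVZ_ACTION_REBUILD,
	AVZ_ACTION_INITIALIZE
} avz_action_sketch_t;

/*
 * Hypothetical predicate: previously only a non-empty dirty config list
 * reached the per-vdev zap setup; the new action lets an old pool reach
 * it even when no configs are dirty.
 */
static bool
should_sync_config_object(bool config_dirty, avz_action_sketch_t action)
{
	return (config_dirty || action == AVZ_ACTION_INITIALIZE);
}
```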
OpenZFS-issue: https://www.illumos.org/issues/7743
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/e2d29d0
Closes #5582
[bio] The req_op enum was changed to req_opf. Update the "Linux 4.8 API"
autotools checks to use an int to determine whether the various REQ_OP
values are defined. This should work properly on kernels >= 4.8.
[bio] bio_set_op_attrs() is now an inline function and can't be detected
with #ifdef. Add a configure check to determine whether bio_set_op_attrs()
is defined. Move the local definition of it from vdev_disk.c to
blkdev_compat.h for consistency with other related compatibility shims.
[bio] The read/write flags and their modifiers, including WRITE_FLUSH,
WRITE_FUA and WRITE_FLUSH_FUA have been removed from fs.h. Add the new
bio_set_flush() compatibility wrapper to replace VDEV_WRITE_FLUSH_FUA
and set the flags appropriately for each supported kernel version.
[vfs] The generic_readlink() function has been made static. If .readlink
in inode_operations is NULL, generic_readlink() is used.
[zol typo] Completely unrelated to 4.10 compat, fix a typo in the check
for REQ_OP_SECURE_ERASE so that the proper macro is defined:
s/HAVE_REQ_OP_SECURE_DISCARD/HAVE_REQ_OP_SECURE_ERASE/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Chunwei Chen <david.chen@osnexus.com>
Signed-off-by: Tim Chase <tim@chase2k.com>
Closes#5499
Fix a regression accidentally introduced by e0ab3ab.
Additionally, add a new script zpool_import_014_pos.ksh to
the ZFS test suite to exercise 'zpool import -t' functionality.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #5466
Closes #5515
The introduction of parallel zvol prefetch causes a deadlock when using
vdev_file:
spa_async->(spa_namespace_lock)->txg_wait_synced->(wait for txg_sync)
txg_sync->zio_wait->(wait for vdev_file_io_fsync on system_taskq)
zvol_prefetch_minors_impl (on system_taskq)->spa_open_common->(wait for spa_namespace_lock)
We fix this by using a dedicated taskq for vdev_file. This same change
was originally made in commit bc25c93 but reverted in commit aa9af22
when dynamic taskqs were added.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Closes #5506
Closes #5495
When iterating over the input nvlist in dsl_props_set_sync_impl(), we don't
preserve the nvpair name before looking up ZPROP_VALUE, so when we later
process the pair, nvpair_name() is always "value" and not the actual property name.
This fixes a couple of bugs in zfs_ioc_recv():
* Received properties were not restored correctly when failing to receive an
incremental send stream
* Received properties were not completely replaced by the new ones when
successfully receiving an incremental send stream
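The ordering bug can be shown with a simplified, hypothetical pair type
standing in for nvpair_t (the real code uses the nvpair API). The buggy
version replaces the pair with its nested "value" pair before reading
the name; the fixed version saves the name first:

```c
#include <stddef.h>

/* Hypothetical stand-in for an nvpair with an optional nested pair. */
typedef struct pair {
	const char *name;
	int value;
	const struct pair *nested;	/* e.g. the ZPROP_VALUE sub-pair */
} pair_t;

/* Buggy order: 'p' is replaced before its name is read. */
static void
read_prop_buggy(const pair_t *p, const char **name, int *value)
{
	if (p->nested != NULL)
		p = p->nested;
	*name = p->name;	/* always "value" for nested properties */
	*value = p->value;
}

/* Fixed order: remember the property name before descending. */
static void
read_prop_fixed(const pair_t *p, const char **name, int *value)
{
	*name = p->name;
	if (p->nested != NULL)
		p = p->nested;
	*value = p->value;
}
```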
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes#5497
This branch contains the following fixes/improvements.
* Fix setting i_flags
* Fix wrong operator in xvattr.h
* Fix fchange macro in zpl_ioctl_setflags()
* Add a configure check to use inode_set_flags()
* Add a test case for chattr for better test coverage
Reviewed-by: Tim Chase <tim@chase2k.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Closes #5486
Closes #5470
Closes #5469
zfs_sb_create() normally takes ownership of zmo, which is then freed in
zfs_sb_free(). However, when zfs_sb_create() fails, we need to free it explicitly.
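A minimal sketch of the ownership rule: on success the created object
owns the options and frees them with itself; on failure ownership stays
with the caller, which must free them. All names are hypothetical, and
the 'fail' argument exists only to simulate the error path:

```c
#include <stdlib.h>

/* Hypothetical stand-ins for the mount options and super block. */
typedef struct mntopts { int flags; } mntopts_t;
typedef struct sb { mntopts_t *mntopts; } sb_t;

/* On success the new sb owns zmo; on failure the caller still owns it. */
static int
sb_create(mntopts_t *zmo, int fail, sb_t **out)
{
	sb_t *sb;

	if (fail || (sb = malloc(sizeof (*sb))) == NULL)
		return (-1);
	sb->mntopts = zmo;
	*out = sb;
	return (0);
}

/* Freeing the sb also frees the options it owns. */
static void
sb_free(sb_t *sb)
{
	free(sb->mntopts);
	free(sb);
}

static int
mount_sketch(int fail)
{
	mntopts_t *zmo = calloc(1, sizeof (*zmo));
	sb_t *sb;

	if (zmo == NULL)
		return (-1);
	if (sb_create(zmo, fail, &sb) != 0) {
		free(zmo);	/* nobody else owns it: free it explicitly */
		return (-1);
	}
	sb_free(sb);
	return (0);
}
```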
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Closes #5490
Closes #5496
The fchange macro in zpl_ioctl_setflags() is meant to detect flag changes. However, it
was incorrect and always failed to detect a flag change from set to unset,
allowing users without CAP_LINUX_IMMUTABLE to unset flags.
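For reference, a change detector that works in both directions compares
the boolean state of the bit before and after. This is an illustrative
macro, not the actual fchange definition, and IMMUTABLE_BIT is a
hypothetical value standing in for the real flag:

```c
#include <stdbool.h>

/* True when bit b0 in old flags f0 and bit b1 in new flags f1 differ. */
#define FLAG_CHANGED(f0, f1, b0, b1) \
	((((f0) & (b0)) != 0) != (((f1) & (b1)) != 0))

/* Hypothetical bit value for illustration only. */
#define IMMUTABLE_BIT	0x10u

/* A capability check is needed whenever the bit changes either way. */
static bool
needs_cap_linux_immutable(unsigned old_flags, unsigned new_flags)
{
	return (FLAG_CHANGED(old_flags, new_flags,
	    IMMUTABLE_BIT, IMMUTABLE_BIT));
}
```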
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Adapt the avx512bw implementation for use with abd buffers. The mul2 implementation
is rewritten to take advantage of the BW instruction set.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Romain Dolbeau <romain.dolbeau@atos.net>
Signed-off-by: Gvozden Neskovic <neskovic@gmail.com>
Closes#5477
Fix zfs_xvattr_set() to set the S_IMMUTABLE and S_APPEND flags correctly.
Reinstate zfs_set_inode_flags() and use it in zfs_xvattr_set() and also when
setting up the inode in zfs_znode_alloc() and zfs_rezget().
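One plausible shape of a helper like zfs_set_inode_flags() is to derive
each inode flag from the corresponding ZFS attribute, clearing it as
well as setting it. The sketch below assumes that; it is not taken from
the patch, and the SK_* bit values are hypothetical stand-ins for
S_IMMUTABLE and S_APPEND:

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's S_IMMUTABLE and S_APPEND. */
#define SK_IMMUTABLE	0x1u
#define SK_APPEND	0x2u

/* Set or clear each flag from its attribute; other bits are untouched. */
static unsigned
set_inode_flags_sketch(unsigned i_flags, bool immutable, bool append)
{
	if (immutable)
		i_flags |= SK_IMMUTABLE;
	else
		i_flags &= ~SK_IMMUTABLE;

	if (append)
		i_flags |= SK_APPEND;
	else
		i_flags &= ~SK_APPEND;

	return (i_flags);
}
```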
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Users of an ida need to call ida_destroy() when they are done with it; otherwise
ida->free_bitmap and possibly other allocations may leak.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Closes#5484
This removes two large runs of whitespace in the "modinfo zfs" output and corrects
a couple of typos.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: bunder2015 <omfgbunder@gmail.com>
Closes#5475