mirror_zfs/module/zfs
Matthew Ahrens ec21397127
async zvol minor node creation interferes with receive
When we finish a zfs receive, dmu_recv_end_sync() calls
zvol_create_minors(async=TRUE).  This kicks off some other threads that
create the minor device nodes (in /dev/zvol/poolname/...).  These async
threads call zvol_prefetch_minors_impl() and zvol_create_minor(), which
both call dmu_objset_own(), which puts a "long hold" on the dataset.
Since the zvol minor node creation is asynchronous, this can happen
after the `ZFS_IOC_RECV[_NEW]` ioctl and `zfs receive` process have
completed.

After the first receive ioctl has completed, userland may attempt to do
another receive into the same dataset (e.g. the next incremental
stream).  This second receive and the asynchronous minor node creation
can interfere with one another in several different ways, because they
both require exclusive access to the dataset:

1. When the second receive is finishing up, dmu_recv_end_check() does
dsl_dataset_handoff_check(), which can fail with EBUSY if the async
minor node creation already has a "long hold" on this dataset.  This
causes the 2nd receive to fail.

2. The async udev rule can fail if zvol_id and/or systemd-udevd try to
open the device while the the second receive's async attempt at minor
node creation owns the dataset (via zvol_prefetch_minors_impl).  This
causes the minor node (/dev/zd*) to exist, but the udev-generated
/dev/zvol/... to not exist.

3. The async minor node creation can silently fail with EBUSY if the
first receive's zvol_create_minor() trys to own the dataset while the
second receive's zvol_prefetch_minors_impl already owns the dataset.

To address these problems, this change synchronously creates the minor
node.  To avoid the lock ordering problems that the asynchrony was
introduced to fix (see #3681), we create the minor nodes from open
context, with no locks held, rather than from syncing contex as was
originally done.

Implementation notes:

We generally do not need to traverse children or prefetch anything (e.g.
when running the recv, snapshot, create, or clone subcommands of zfs).
We only need recursion when importing/opening a pool and when loading
encryption keys.  The existing recursive, asynchronous, prefetching code
is preserved for use in these cases.

Channel programs may need to create zvol minor nodes, when creating a
snapshot of a zvol with the snapdev property set.  We figure out what
snapshots are created when running the LUA program in syncing context.
In this case we need to remember what snapshots were created, and then
try to create their minor nodes from open context, after the LUA code
has completed.

There are additional zvol use cases that asynchronously own the dataset,
which can cause similar problems.  E.g. changing the volmode or snapdev
properties.  These are less problematic because they are not recursive
and don't touch datasets that are not involved in the operation, there
is still potential for interference with subsequent operations.  In the
future, these cases should be similarly converted to create the zvol
minor node synchronously from open context.

The async tasks of removing and renaming minors do not own the objset,
so they do not have this problem.  However, it may make sense to also
convert these operations to happen synchronously from open context, in
the future.

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
External-issue: DLPX-65948
Closes #7863
Closes #9885
2020-02-03 09:33:14 -08:00
..
aggsum.c OpenZFS 9688 - aggsum_fini leaks memory 2018-10-19 12:08:03 -07:00
arc.c Replace ASSERTV macro with compiler annotation 2019-12-05 12:37:00 -08:00
blkptr.c Undo c89 workarounds to match with upstream 2017-11-04 13:25:13 -07:00
bplist.c Fast Clone Deletion 2019-07-26 10:54:14 -07:00
bpobj.c Add subcommand to wait for background zfs activity to complete 2019-09-13 18:09:06 -07:00
bptree.c Implement Redacted Send/Receive 2019-06-19 09:48:12 -07:00
bqueue.c Implement Redacted Send/Receive 2019-06-19 09:48:12 -07:00
btree.c Replace ASSERTV macro with compiler annotation 2019-12-05 12:37:00 -08:00
cityhash.c OpenZFS 8484 - Implement aggregate sum and use for arc counters 2018-06-06 09:35:59 -07:00
dataset_kstats.c Fix panic on DilOS with kstat per dataset statistics 2019-09-03 12:12:31 -07:00
dbuf_stats.c Make module tunables cross platform 2019-09-05 14:49:49 -07:00
dbuf.c Replace ASSERTV macro with compiler annotation 2019-12-05 12:37:00 -08:00
ddt_zap.c fat zap should prefetch when iterating 2019-06-12 13:13:09 -07:00
ddt.c Reduce loaded range tree memory usage 2019-10-09 10:36:03 -07:00
dmu_diff.c Mark write_record static 2019-12-03 09:51:44 -08:00
dmu_object.c Make module tunables cross platform 2019-09-05 14:49:49 -07:00
dmu_objset.c async zvol minor node creation interferes with receive 2020-02-03 09:33:14 -08:00
dmu_recv.c async zvol minor node creation interferes with receive 2020-02-03 09:33:14 -08:00
dmu_redact.c Move objnode handling to common code 2019-09-12 13:31:09 -07:00
dmu_send.c dmu_send: redacted: fix memory leak on invalid redaction/from bookmark 2020-01-23 09:34:31 -08:00
dmu_traverse.c Make module tunables cross platform 2019-09-05 14:49:49 -07:00
dmu_tx.c Enable use of DTRACE_PROBE* macros in "spl" module 2019-11-01 13:13:43 -07:00
dmu_zfetch.c Make module tunables cross platform 2019-09-05 14:49:49 -07:00
dmu.c Replace ASSERTV macro with compiler annotation 2019-12-05 12:37:00 -08:00
dnode_sync.c Replace ASSERTV macro with compiler annotation 2019-12-05 12:37:00 -08:00
dnode.c cppcheck: (warning) Possible null pointer dereference: dnp 2019-12-18 17:25:13 -08:00
dsl_bookmark.c dsl_bookmark_create_check: fix NULL pointer deref if dbca_errors == NULL 2020-01-23 21:13:42 -08:00
dsl_crypt.c async zvol minor node creation interferes with receive 2020-02-03 09:33:14 -08:00
dsl_dataset.c async zvol minor node creation interferes with receive 2020-02-03 09:33:14 -08:00
dsl_deadlist.c Reduce loaded range tree memory usage 2019-10-09 10:36:03 -07:00
dsl_deleg.c Reduce loaded range tree memory usage 2019-10-09 10:36:03 -07:00
dsl_destroy.c Replace ASSERTV macro with compiler annotation 2019-12-05 12:37:00 -08:00
dsl_dir.c Add FreeBSD jail support hooks 2019-12-11 11:58:37 -08:00
dsl_pool.c Eliminate Linux specific inode usage from common code 2019-12-11 11:53:57 -08:00
dsl_prop.c Support inheriting properties in channel programs 2020-01-22 17:03:17 -08:00
dsl_scan.c Prevent unnecessary resilver restarts 2019-11-27 10:15:01 -08:00
dsl_synctask.c Fix typos in module/zfs/ 2019-09-02 17:56:41 -07:00
dsl_userhold.c Fix strdup conflict on other platforms 2019-10-10 09:47:06 -07:00
edonr_zfs.c DLPX-44812 integrate EP-220 large memory scalability 2016-11-29 14:34:27 -08:00
fm.c Add zfs_file_* interface, remove vnodes 2019-11-21 09:32:57 -08:00
gzip.c OpenZFS restructuring - move platform specific sources 2019-09-06 11:26:26 -07:00
hkdf.c Encryption patch follow-up 2017-10-11 16:54:48 -04:00
lz4.c Enable clang to use intrinsics for lz4 2019-10-01 13:17:32 -07:00
lzjb.c Change KM_PUSHPAGE -> KM_SLEEP 2015-01-16 14:41:26 -08:00
Makefile.in Add AltiVec RAID-Z 2020-01-23 11:01:24 -08:00
metaslab.c Replace ASSERTV macro with compiler annotation 2019-12-05 12:37:00 -08:00
mmp.c Move linux specific mmp module_param_call handler to platform code 2019-10-16 18:37:31 -07:00
multilist.c Enable use of DTRACE_PROBE* macros in "spl" module 2019-11-01 13:13:43 -07:00
objlist.c Implement Redacted Send/Receive 2019-06-19 09:48:12 -07:00
pathname.c Disable unused pathname::pn_path* (unneeded in Linux) 2019-07-15 13:57:56 -07:00
range_tree.c Function name and comment updates 2019-10-11 10:13:21 -07:00
refcount.c Prevent race in blkptr_verify against device removal 2019-08-13 21:24:43 -06:00
rrwlock.c Enable use of DTRACE_PROBE* macros in "spl" module 2019-11-01 13:13:43 -07:00
sa.c Replace ASSERTV macro with compiler annotation 2019-12-05 12:37:00 -08:00
sha256.c OpenZFS restructuring - move platform specific sources 2019-09-06 11:26:26 -07:00
skein_zfs.c DLPX-44812 integrate EP-220 large memory scalability 2016-11-29 14:34:27 -08:00
spa_boot.c Add linux kernel module support 2010-08-31 13:41:58 -07:00
spa_checkpoint.c Add subcommand to wait for background zfs activity to complete 2019-09-13 18:09:06 -07:00
spa_config.c Add zfs_file_* interface, remove vnodes 2019-11-21 09:32:57 -08:00
spa_errlog.c Fix typos in module/zfs/ 2019-09-02 17:56:41 -07:00
spa_history.c Fix strdup conflict on other platforms 2019-10-10 09:47:06 -07:00
spa_log_spacemap.c Make module tunables cross platform 2019-09-05 14:49:49 -07:00
spa_misc.c Refactor deadman set failmode to be cross platform 2019-12-05 12:40:45 -08:00
spa.c async zvol minor node creation interferes with receive 2020-02-03 09:33:14 -08:00
space_map.c Reduce loaded range tree memory usage 2019-10-09 10:36:03 -07:00
space_reftree.c Reduce loaded range tree memory usage 2019-10-09 10:36:03 -07:00
THIRDPARTYLICENSE.cityhash OpenZFS 8484 - Implement aggregate sum and use for arc counters 2018-06-06 09:35:59 -07:00
THIRDPARTYLICENSE.cityhash.descrip OpenZFS 8484 - Implement aggregate sum and use for arc counters 2018-06-06 09:35:59 -07:00
txg.c Replace ASSERTV macro with compiler annotation 2019-12-05 12:37:00 -08:00
uberblock.c MMP interval and fail_intervals in uberblock 2019-03-21 12:47:57 -07:00
unique.c Reduce loaded range tree memory usage 2019-10-09 10:36:03 -07:00
vdev_cache.c Replace ASSERTV macro with compiler annotation 2019-12-05 12:37:00 -08:00
vdev_indirect_births.c Fixes: #8934 Large kmem_alloc 2019-07-10 15:54:49 -07:00
vdev_indirect_mapping.c Replace ASSERTV macro with compiler annotation 2019-12-05 12:37:00 -08:00
vdev_indirect.c Replace ASSERTV macro with compiler annotation 2019-12-05 12:37:00 -08:00
vdev_initialize.c Include prototypes for vdev_initialize 2019-10-31 10:09:01 -07:00
vdev_label.c Add zfs_file_* interface, remove vnodes 2019-11-21 09:32:57 -08:00
vdev_mirror.c Avoid some crashes when importing a pool with corrupt metadata 2019-12-26 10:57:05 -08:00
vdev_missing.c Update vdev_ops_t from illumos 2019-06-20 18:29:02 -07:00
vdev_queue.c Reduce loaded range tree memory usage 2019-10-09 10:36:03 -07:00
vdev_raidz_math_aarch64_neon_common.h Minor performance fix for NEON RAID-Z 2019-12-17 19:34:52 -08:00
vdev_raidz_math_aarch64_neon.c Linux 5.0 compat: SIMD compatibility 2019-07-12 09:31:20 -07:00
vdev_raidz_math_aarch64_neonx2.c Linux 5.0 compat: SIMD compatibility 2019-07-12 09:31:20 -07:00
vdev_raidz_math_avx2.c OpenZFS restructuring - move platform specific headers 2019-09-05 09:34:54 -07:00
vdev_raidz_math_avx512bw.c OpenZFS restructuring - move platform specific headers 2019-09-05 09:34:54 -07:00
vdev_raidz_math_avx512f.c Make clang happy with vdev_raidz_ code 2019-10-10 09:45:37 -07:00
vdev_raidz_math_impl.h codebase style improvements for OpenZFS 6459 port 2017-01-22 13:25:40 -08:00
vdev_raidz_math_powerpc_altivec_common.h Add AltiVec RAID-Z 2020-01-23 11:01:24 -08:00
vdev_raidz_math_powerpc_altivec.c Add AltiVec RAID-Z 2020-01-23 11:01:24 -08:00
vdev_raidz_math_scalar.c Linux 5.3: Fix switch() fall though compiler errors 2019-08-21 09:29:23 -07:00
vdev_raidz_math_sse2.c Make clang happy with vdev_raidz_ code 2019-10-10 09:45:37 -07:00
vdev_raidz_math_ssse3.c OpenZFS restructuring - move platform specific headers 2019-09-05 09:34:54 -07:00
vdev_raidz_math.c Add AltiVec RAID-Z 2020-01-23 11:01:24 -08:00
vdev_raidz.c Reduce loaded range tree memory usage 2019-10-09 10:36:03 -07:00
vdev_removal.c Cancel initialize and TRIM before vdev_metaslab_fini() 2019-12-26 10:50:23 -08:00
vdev_root.c Update vdev_ops_t from illumos 2019-06-20 18:29:02 -07:00
vdev_trim.c Reduce loaded range tree memory usage 2019-10-09 10:36:03 -07:00
vdev.c Replace ASSERTV macro with compiler annotation 2019-12-05 12:37:00 -08:00
zap_leaf.c Off-by-one in zap_leaf_array_create() 2019-01-18 09:58:46 -08:00
zap_micro.c Reduce loaded range tree memory usage 2019-10-09 10:36:03 -07:00
zap.c Make module tunables cross platform 2019-09-05 14:49:49 -07:00
zcp_get.c Relocate common quota functions to shared code 2019-12-11 12:12:08 -08:00
zcp_global.c OpenZFS 8600 - ZFS channel programs - snapshot 2018-02-08 15:29:24 -08:00
zcp_iter.c Fix typos in module/zfs/ 2019-09-02 17:56:41 -07:00
zcp_synctask.c async zvol minor node creation interferes with receive 2020-02-03 09:33:14 -08:00
zcp.c async zvol minor node creation interferes with receive 2020-02-03 09:33:14 -08:00
zfeature.c Replace ASSERTV macro with compiler annotation 2019-12-05 12:37:00 -08:00
zfs_byteswap.c Fix typos in module/zfs/ 2019-09-02 17:56:41 -07:00
zfs_fm.c Add zpool status -s (slow I/Os) and -p (parseable) 2018-11-08 16:47:24 -08:00
zfs_fuid.c Relocate common quota functions to shared code 2019-12-11 12:12:08 -08:00
zfs_ioctl.c async zvol minor node creation interferes with receive 2020-02-03 09:33:14 -08:00
zfs_log.c Fix zfs_xattr_owner_unlinked on FreeBSD and comment 2019-12-16 09:49:05 -08:00
zfs_onexit.c Move zfs_onexit_fd_hold to platform code 2019-10-13 19:19:39 -07:00
zfs_quota.c Relocate common quota functions to shared code 2019-12-11 12:12:08 -08:00
zfs_ratelimit.c Change checksum & IO delay ratelimit values 2018-03-04 17:34:51 -08:00
zfs_replay.c Simplify FreeBSD's locking requirements in zfs_replay.c 2020-01-22 17:55:56 -08:00
zfs_rlock.c Move platform specific parts of zfs_znode.h to platform code 2019-11-06 10:54:25 -08:00
zfs_sa.c Add inode accessors to common code 2019-10-02 09:15:12 -07:00
zil.c Improve logging of 128KB writes 2019-11-11 09:27:59 -08:00
zio_checksum.c Disable EDONR on FreeBSD 2019-12-05 13:10:29 -08:00
zio_compress.c zio_decompress_data always ASSERTs successful decompression 2019-12-10 15:51:58 -08:00
zio_inject.c Replace ASSERTV macro with compiler annotation 2019-12-05 12:37:00 -08:00
zio.c Re-consolidate zio_delay_interrupt 2020-01-21 15:04:13 -08:00
zle.c Fix zle_decompress out of bound access 2018-02-09 10:08:05 -08:00
zrlock.c Enable use of DTRACE_PROBE* macros in "spl" module 2019-11-01 13:13:43 -07:00
zthr.c Fast Clone Deletion 2019-07-26 10:54:14 -07:00
zvol.c async zvol minor node creation interferes with receive 2020-02-03 09:33:14 -08:00