mirror_zfs/include/sys
Pavel Zakharov 6cb8e5306d OpenZFS 9075 - Improve ZFS pool import/load process and corrupted pool recovery
Some work has been done lately to improve the debugability of the ZFS pool
load (and import) process. This includes:

	7638 Refactor spa_load_impl into several functions
	8961 SPA load/import should tell us why it failed
	7277 zdb should be able to print zfs_dbgmsg's

To iterate on top of that, there's a few changes that were made to make the
import process more resilient and crash free. One of the first tasks during the
pool load process is to parse a config provided from userland that describes
what devices the pool is composed of. A vdev tree is generated from that config,
and then all the vdevs are opened.

The Meta Object Set (MOS) of the pool is accessed, and several metadata objects
that are necessary to load the pool are read. The exact configuration of the
pool is also stored inside the MOS. Since the configuration provided from
userland is external and might not accurately describe the vdev tree
of the pool at the txg that is being loaded, it cannot be relied upon to safely
operate the pool. For that reason, the configuration in the MOS is read early
on. In the past, the two configurations were compared together and if there was
a mismatch then the load process was aborted and an error was returned.

The latter was a good way to ensure a pool does not get corrupted, however it
made the pool load process needlessly fragile in cases where the vdev
configuration changed or the userland configuration was outdated. Since the MOS
is stored in 3 copies, the configuration provided by userland doesn't have to be
perfect in order to read its contents. Hence, a new approach has been adopted:
The pool is first opened with the untrusted userland configuration just so that
the real configuration can be read from the MOS. The trusted MOS configuration
is then used to generate a new vdev tree and the pool is re-opened.

When the pool is opened with an untrusted configuration, writes are disabled
to avoid accidentally damaging it. During reads, some sanity checks are
performed on block pointers to see if each DVA points to a known vdev;
when the configuration is untrusted, instead of panicking the system if those
checks fail we simply avoid issuing reads to the invalid DVAs.

This new two-step pool load process now allows rewinding pools accross
vdev tree changes such as device replacement, addition, etc. Loading a pool
from an external config file in a clustering environment also becomes much
safer now since the pool will import even if the config is outdated and didn't,
for instance, register a recent device addition.

With this code in place, it became relatively easy to implement a
long-sought-after feature: the ability to import a pool with missing top level
(i.e. non-redundant) devices. Note that since this almost guarantees some loss
of data, this feature is for now restricted to a read-only import.

Porting notes (ZTS):
* Fix 'make dist' target in zpool_import

* The maximum path length allowed by tar is 99 characters.  Several
  of the new test cases exceeded this limit resulting in them not
  being included in the tarball.  Shorten the names slightly.

* Set/get tunables using accessor functions.

* Get last synced txg via the "zfs_txg_history" mechanism.

* Clear zinject handlers in cleanup for import_cache_device_replaced
  and import_rewind_device_replaced in order that the zpool can be
  exported if there is an error.

* Increase FILESIZE to 8G in zfs-test.sh to allow for a larger
  ext4 file system to be created on ZFS_DISK2.  Also, there's
  no need to partition ZFS_DISK2 at all.  The partitioning had
  already been disabled for multipath devices.  Among other things,
  the partitioning steals some space from the ext4 file system,
  makes it difficult to accurately calculate the paramters to
  parted and can make some of the tests fail.

* Increase FS_SIZE and FILE_SIZE in the zpool_import test
  configuration now that FILESIZE is larger.

* Write more data in order that device evacuation take lonnger in
  a couple tests.

* Use mkdir -p to avoid errors when the directory already exists.

* Remove use of sudo in import_rewind_config_changed.

Authored by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Andrew Stormont <andyjstormont@gmail.com>
Approved by: Hans Rosenfeld <rosenfeld@grumpf.hope-2000.org>
Ported-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Tim Chase <tim@chase2k.com>

OpenZFS-issue: https://illumos.org/issues/9075
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/619c0123
Closes #7459
2018-05-08 21:35:27 -07:00
..
crypto OpenZFS 4185 - add new cryptographic checksums to ZFS: SHA-512, Skein, Edon-R 2016-10-03 14:51:15 -07:00
fm Extend deadman logic 2018-01-25 13:40:38 -08:00
fs OpenZFS 9075 - Improve ZFS pool import/load process and corrupted pool recovery 2018-05-08 21:35:27 -07:00
lua Fix coverity defects: zfs channel programs 2018-02-20 11:19:42 -08:00
sysevent OpenZFS 8959 - Add notifications when a scrub is paused or resumed 2018-01-17 10:31:00 -08:00
abd.h OpenZFS 8416 - abd.h is not C++ friendly 2017-06-30 11:11:01 -07:00
arc_impl.h Add support for decryption faults in zinject 2018-05-02 15:36:20 -07:00
arc.h Decryption error handling improvements 2018-03-31 11:12:51 -07:00
avl_impl.h Support custom build directories and move includes 2010-09-08 12:38:56 -07:00
avl.h Remove dead code from AVL tree 2017-10-05 19:28:00 -07:00
blkptr.h OpenZFS 8067 - zdb should be able to dump literal embedded block pointer 2017-07-07 11:28:01 -07:00
bplist.h Support custom build directories and move includes 2010-09-08 12:38:56 -07:00
bpobj.h OpenZFS 7614, 9064 - zfs device evacuation/removal 2018-04-14 12:16:17 -07:00
bptree.h Illumos 4914 - zfs on-disk bookmark structure should be named *_phys_t 2014-08-06 14:48:41 -07:00
bqueue.h Illumos 5960, 5925 2016-01-08 15:08:19 -08:00
dbuf.h assertion in arc_release() during encrypted receive 2018-04-17 11:06:54 -07:00
ddt.h Incorrect maximum DVA value in DDE_GET_NDVAS() 2018-02-26 14:20:12 -08:00
dmu_impl.h Fix race in dnode_check_slots_free() 2018-04-10 11:15:05 -07:00
dmu_objset.h assertion in arc_release() during encrypted receive 2018-04-17 11:06:54 -07:00
dmu_send.h Raw receive should change key atomically 2018-02-21 12:31:03 -08:00
dmu_traverse.h Native Encryption for ZFS on Linux 2017-08-14 10:36:48 -07:00
dmu_tx.h OpenZFS 8997 - ztest assertion failure in zil_lwb_write_issue 2018-01-26 20:19:46 -08:00
dmu_zfetch.h OpenZFS 6322 - ZFS indirect block predictive prefetch 2016-08-30 14:26:55 -07:00
dmu.h assertion in arc_release() during encrypted receive 2018-04-17 11:06:54 -07:00
dnode.h OpenZFS 7614, 9064 - zfs device evacuation/removal 2018-04-14 12:16:17 -07:00
dsl_bookmark.h Illumos 4368, 4369. 2014-07-29 10:55:29 -07:00
dsl_crypt.h Add support for decryption faults in zinject 2018-05-02 15:36:20 -07:00
dsl_dataset.h OpenZFS 7614, 9064 - zfs device evacuation/removal 2018-04-14 12:16:17 -07:00
dsl_deadlist.h OpenZFS 7614, 9064 - zfs device evacuation/removal 2018-04-14 12:16:17 -07:00
dsl_deleg.h OpenZFS 7614, 9064 - zfs device evacuation/removal 2018-04-14 12:16:17 -07:00
dsl_destroy.h OpenZFS 7431 - ZFS Channel Programs 2018-02-08 15:28:18 -08:00
dsl_dir.h OpenZFS 7614, 9064 - zfs device evacuation/removal 2018-04-14 12:16:17 -07:00
dsl_pool.h OpenZFS 7614, 9064 - zfs device evacuation/removal 2018-04-14 12:16:17 -07:00
dsl_prop.h Illumos 6171 - dsl_prop_unregister() slows down dataset eviction. 2016-01-12 10:53:12 -08:00
dsl_scan.h OpenZFS 7614, 9064 - zfs device evacuation/removal 2018-04-14 12:16:17 -07:00
dsl_synctask.h Illumos 4951 - ZFS administrative commands should use reserved space 2015-05-04 09:41:10 -07:00
dsl_userhold.h Illumos #3740 2013-11-04 11:17:48 -08:00
edonr.h OpenZFS 4185 - add new cryptographic checksums to ZFS: SHA-512, Skein, Edon-R 2016-10-03 14:51:15 -07:00
efi_partition.h Fix spelling 2017-01-03 11:31:18 -06:00
frame.h Suppress incorrect objtool warnings 2017-12-07 10:28:50 -08:00
hkdf.h Encryption patch follow-up 2017-10-11 16:54:48 -04:00
Makefile.am OpenZFS 9079 - race condition in starting and ending condensing thread for indirect vdevs 2018-04-14 12:23:53 -07:00
metaslab_impl.h OpenZFS 7614, 9064 - zfs device evacuation/removal 2018-04-14 12:16:17 -07:00
metaslab.h OpenZFS 7614, 9064 - zfs device evacuation/removal 2018-04-14 12:16:17 -07:00
mmp.h Record skipped MMP writes in multihost_history 2018-03-06 15:15:15 -08:00
mntent.h Make zfs mount according to relatime config in dataset 2016-04-05 18:55:59 -07:00
multilist.h OpenZFS 7968 - multi-threaded spa_sync() 2017-03-20 18:36:00 -07:00
nvpair_impl.h Support custom build directories and move includes 2010-09-08 12:38:56 -07:00
nvpair.h Replace __va_list with va_list 2014-08-13 10:35:00 -07:00
pathname.h Add pn_alloc()/pn_free() functions 2016-04-21 09:49:25 -07:00
policy.h Add zfs allow and zfs unallow support 2016-06-07 09:16:52 -07:00
range_tree.h OpenZFS 7614, 9064 - zfs device evacuation/removal 2018-04-14 12:16:17 -07:00
refcount.h OpenZFS 8081 - Compiler warnings in zdb 2017-10-27 12:46:35 -07:00
rrwlock.h Illumos 5008 - lock contention (rrw_exit) while running a read only load 2015-07-06 09:34:13 -07:00
sa_impl.h Implement large_dnode pool feature 2016-06-24 13:13:21 -07:00
sa.h Project Quota on ZFS 2018-02-13 14:54:54 -08:00
sdt.h Add line info and SET_ERROR() to ZFS debug log 2017-07-25 23:09:48 -07:00
sha2.h OpenZFS 4185 - add new cryptographic checksums to ZFS: SHA-512, Skein, Edon-R 2016-10-03 14:51:15 -07:00
skein.h OpenZFS 4185 - add new cryptographic checksums to ZFS: SHA-512, Skein, Edon-R 2016-10-03 14:51:15 -07:00
spa_boot.h Support custom build directories and move includes 2010-09-08 12:38:56 -07:00
spa_checksum.h Implementation of AVX2 optimized Fletcher-4 2016-06-02 14:30:51 -07:00
spa_impl.h OpenZFS 9075 - Improve ZFS pool import/load process and corrupted pool recovery 2018-05-08 21:35:27 -07:00
spa.h OpenZFS 9075 - Improve ZFS pool import/load process and corrupted pool recovery 2018-05-08 21:35:27 -07:00
space_map.h OpenZFS 7614, 9064 - zfs device evacuation/removal 2018-04-14 12:16:17 -07:00
space_reftree.h Illumos #4101, #4102, #4103, #4105, #4106 2014-07-22 09:39:16 -07:00
sysevent.h OpenZFS 6939 - add sysevents to zfs core for commands 2017-07-12 21:28:13 -07:00
trace_acl.h Linux 4.16 compat: inode_set_iversion() 2018-02-08 21:25:19 -08:00
trace_arc.h Support re-prioritizing asynchronous prefetches 2017-12-21 09:13:06 -08:00
trace_common.h OpenZFS 6531 - Provide mechanism to artificially limit disk performance 2016-05-26 10:11:51 -07:00
trace_dbgmsg.h Add line info and SET_ERROR() to ZFS debug log 2017-07-25 23:09:48 -07:00
trace_dbuf.h Crash in dbuf_evict_one with DTRACE_PROBE 2017-08-09 11:04:41 -07:00
trace_dmu.h tx_waited -> tx_dirty_delayed in trace_dmu.h 2018-01-31 16:13:26 -08:00
trace_dnode.h Fix build-it compilation regression 2017-01-24 08:50:15 -08:00
trace_multilist.h Fix build-it compilation regression 2017-01-24 08:50:15 -08:00
trace_txg.h Fix build-it compilation regression 2017-01-24 08:50:15 -08:00
trace_vdev.h OpenZFS 7614, 9064 - zfs device evacuation/removal 2018-04-14 12:16:17 -07:00
trace_zil.h OpenZFS 8585 - improve batching done in zil_commit() 2017-12-05 09:39:16 -08:00
trace_zio.h Use cstyle -cpP in make cstyle check 2016-12-12 10:46:26 -08:00
trace_zrlock.h Fix race in trace point in zrl_add_impl 2018-03-12 11:27:02 -07:00
trace.h Remove duplicate typedefs from trace.h 2015-01-06 16:53:24 -08:00
txg_impl.h Fix spelling 2017-01-03 11:31:18 -06:00
txg.h OpenZFS 8063 - verify that we do not attempt to access inactive txg 2017-05-10 13:52:22 -04:00
u8_textprep_data.h Support custom build directories and move includes 2010-09-08 12:38:56 -07:00
u8_textprep.h Support custom build directories and move includes 2010-09-08 12:38:56 -07:00
uberblock_impl.h OpenZFS 8491 - uberblock on-disk padding to reserve space for smoothly merging zpool checkpoint & MMP in ZFS 2017-07-24 13:47:51 -04:00
uberblock.h Multi-modifier protection (MMP) 2017-07-13 13:54:00 -04:00
uio_impl.h Add basic uio support 2011-02-10 09:21:43 -08:00
unique.h Illumos #3742 2013-11-04 10:55:25 -08:00
uuid.h Support custom build directories and move includes 2010-09-08 12:38:56 -07:00
vdev_disk.h Remove custom root pool import code 2016-08-11 11:19:34 -07:00
vdev_file.h Use a dedicated taskq for vdev_file 2016-12-21 10:47:15 -08:00
vdev_impl.h OpenZFS 9075 - Improve ZFS pool import/load process and corrupted pool recovery 2018-05-08 21:35:27 -07:00
vdev_indirect_births.h OpenZFS 7614, 9064 - zfs device evacuation/removal 2018-04-14 12:16:17 -07:00
vdev_indirect_mapping.h OpenZFS 7614, 9064 - zfs device evacuation/removal 2018-04-14 12:16:17 -07:00
vdev_raidz_impl.h Revert raidz_map and _col structure types 2018-01-09 14:46:52 -08:00
vdev_raidz.h Use cstyle -cpP in make cstyle check 2016-12-12 10:46:26 -08:00
vdev_removal.h OpenZFS 9079 - race condition in starting and ending condensing thread for indirect vdevs 2018-04-14 12:23:53 -07:00
vdev.h OpenZFS 9075 - Improve ZFS pool import/load process and corrupted pool recovery 2018-05-08 21:35:27 -07:00
xvattr.h Project Quota on ZFS 2018-02-13 14:54:54 -08:00
zap_impl.h OpenZFS 7793 - ztest fails assertion in dmu_tx_willuse_space 2017-03-07 09:51:59 -08:00
zap_leaf.h Fix ENOSPC in "Handle zap_add() failures in ..." 2018-04-18 14:19:50 -07:00
zap.h OpenZFS 1300 - filename normalization doesn't work for removes 2017-02-02 14:13:41 -08:00
zcp_global.h OpenZFS 7431 - ZFS Channel Programs 2018-02-08 15:28:18 -08:00
zcp_iter.h OpenZFS 7431 - ZFS Channel Programs 2018-02-08 15:28:18 -08:00
zcp_prop.h OpenZFS 7431 - ZFS Channel Programs 2018-02-08 15:28:18 -08:00
zcp.h OpenZFS 8677 - Open-Context Channel Programs 2018-02-08 16:05:57 -08:00
zfeature.h Revert "zhack: Add 'feature disable' command" 2016-05-17 11:52:07 -07:00
zfs_acl.h Project Quota on ZFS 2018-02-13 14:54:54 -08:00
zfs_context.h OpenZFS 9075 - Improve ZFS pool import/load process and corrupted pool recovery 2018-05-08 21:35:27 -07:00
zfs_ctldir.h Rename zfs_sb_t -> zfsvfs_t 2017-03-10 09:51:33 -08:00
zfs_debug.h OpenZFS 9236 - nuke spa_dbgmsg 2018-04-30 10:19:48 -07:00
zfs_delay.h cstyle: Resolve C style issues 2013-12-18 16:46:35 -08:00
zfs_dir.h Rename zfs_sb_t -> zfsvfs_t 2017-03-10 09:51:33 -08:00
zfs_fuid.h Rename zfs_sb_t -> zfsvfs_t 2017-03-10 09:51:33 -08:00
zfs_ioctl.h Add support for decryption faults in zinject 2018-05-02 15:36:20 -07:00
zfs_onexit.h Support custom build directories and move includes 2010-09-08 12:38:56 -07:00
zfs_project.h Project Quota on ZFS 2018-02-13 14:54:54 -08:00
zfs_ratelimit.h Change checksum & IO delay ratelimit values 2018-03-04 17:34:51 -08:00
zfs_rlock.h Rename zfs_sb_t -> zfsvfs_t 2017-03-10 09:51:33 -08:00
zfs_sa.h Project Quota on ZFS 2018-02-13 14:54:54 -08:00
zfs_stat.h Support custom build directories and move includes 2010-09-08 12:38:56 -07:00
zfs_vfsops.h ZIL claiming should not start user accounting 2018-02-20 16:27:31 -08:00
zfs_vnops.h RHEL 7.5 compat: FMODE_KABI_ITERATE 2018-05-02 15:01:24 -07:00
zfs_znode.h Project Quota on ZFS 2018-02-13 14:54:54 -08:00
zil_impl.h OpenZFS 8909 - 8585 can cause a use-after-free kernel panic 2017-12-28 10:18:04 -08:00
zil.h OpenZFS 7614, 9064 - zfs device evacuation/removal 2018-04-14 12:16:17 -07:00
zio_checksum.h Remove dependency on linear ABD 2017-03-29 12:24:51 -07:00
zio_compress.h DLPX-44812 integrate EP-220 large memory scalability 2016-11-29 14:34:27 -08:00
zio_crypt.h Add support for decryption faults in zinject 2018-05-02 15:36:20 -07:00
zio_impl.h Native Encryption for ZFS on Linux 2017-08-14 10:36:48 -07:00
zio_priority.h OpenZFS 7614, 9064 - zfs device evacuation/removal 2018-04-14 12:16:17 -07:00
zio.h Add support for decryption faults in zinject 2018-05-02 15:36:20 -07:00
zpl.h RHEL 7.5 compat: FMODE_KABI_ITERATE 2018-05-02 15:01:24 -07:00
zrlock.h OpenZFS 6328 - Fix cstyle errors in zfs codebase 2017-01-12 09:42:11 -08:00
zthr.h OpenZFS 9079 - race condition in starting and ending condensing thread for indirect vdevs 2018-04-14 12:23:53 -07:00
zvol.h Add port of FreeBSD 'volmode' property 2017-07-12 13:05:37 -07:00