mirror_zfs/module/zfs
George Wilson 619f097693 OpenZFS 9102 - zfs should be able to initialize storage devices
PROBLEM
========

The first access to a block incurs a performance penalty on some platforms
(e.g. AWS's EBS, VMware VMDKs). Therefore we recommend that volumes are
"thick provisioned", where supported by the platform (VMware). This can
create a large delay in getting a new virtual machines up and running (or
adding storage to an existing Engine). If the thick provision step is
omitted, write performance will be suboptimal until all blocks on the LUN
have been written.

SOLUTION
=========

This feature introduces a way to 'initialize' the disks at install or in the
background to make sure we don't incur this first read penalty.

When an entire LUN is added to ZFS, we make all space available immediately,
and allow ZFS to find unallocated space and zero it out. This works with
concurrent writes to arbitrary offsets, ensuring that we don't zero out
something that has been (or is in the middle of being) written. This scheme
can also be applied to existing pools (affecting only free regions on the
vdev). Detailed design:
        - new subcommand:zpool initialize [-cs] <pool> [<vdev> ...]
                - start, suspend, or cancel initialization
        - Creates new open-context thread for each vdev
        - Thread iterates through all metaslabs in this vdev
        - Each metaslab:
                - select a metaslab
                - load the metaslab
                - mark the metaslab as being zeroed
                - walk all free ranges within that metaslab and translate
                  them to ranges on the leaf vdev
                - issue a "zeroing" I/O on the leaf vdev that corresponds to
                  a free range on the metaslab we're working on
                - continue until all free ranges for this metaslab have been
                  "zeroed"
                - reset/unmark the metaslab being zeroed
                - if more metaslabs exist, then repeat above tasks.
                - if no more metaslabs, then we're done.

        - progress for the initialization is stored on-disk in the vdev’s
          leaf zap object. The following information is stored:
                - the last offset that has been initialized
                - the state of the initialization process (i.e. active,
                  suspended, or canceled)
                - the start time for the initialization

        - progress is reported via the zpool status command and shows
          information for each of the vdevs that are initializing

Porting notes:
- Added zfs_initialize_value module parameter to set the pattern
  written by "zpool initialize".
- Added zfs_vdev_{initializing,removal}_{min,max}_active module options.

Authored by: George Wilson <george.wilson@delphix.com>
Reviewed by: John Wren Kennedy <john.kennedy@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: loli10K <ezomori.nozomu@gmail.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Approved by: Richard Lowe <richlowe@richlowe.net>
Signed-off-by: Tim Chase <tim@chase2k.com>
Ported-by: Tim Chase <tim@chase2k.com>

OpenZFS-issue: https://www.illumos.org/issues/9102
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/c3963210eb
Closes #8230
2019-01-07 10:37:26 -08:00
..
abd.c Prefix all refcount functions with zfs_ 2018-10-01 10:42:05 -07:00
aggsum.c OpenZFS 9688 - aggsum_fini leaks memory 2018-10-19 12:08:03 -07:00
arc.c OpenZFS 9284 - arc_reclaim_thread has 2 jobs 2018-12-26 13:22:28 -08:00
blkptr.c Undo c89 workarounds to match with upstream 2017-11-04 13:25:13 -07:00
bplist.c Change KM_PUSHPAGE -> KM_SLEEP 2015-01-16 14:41:26 -08:00
bpobj.c bpobj_enqueue_subobj() should copy small subobj's 2018-10-31 11:58:17 -05:00
bptree.c Native Encryption for ZFS on Linux 2017-08-14 10:36:48 -07:00
bqueue.c Call cv_signal() with mutex held 2017-06-26 14:36:49 -07:00
cityhash.c OpenZFS 8484 - Implement aggregate sum and use for arc counters 2018-06-06 09:35:59 -07:00
dataset_kstats.c Introduce read/write kstats per dataset 2018-08-20 09:52:37 -07:00
dbuf_stats.c Prefix all refcount functions with zfs_ 2018-10-01 10:42:05 -07:00
dbuf.c Prefix all refcount functions with zfs_ 2018-10-01 10:42:05 -07:00
ddt_zap.c Update build system and packaging 2018-05-29 16:00:33 -07:00
ddt.c Update build system and packaging 2018-05-29 16:00:33 -07:00
dmu_diff.c Fix issues found with zfs diff 2018-05-01 11:24:20 -07:00
dmu_object.c Add types to featureflags in zfs 2018-10-16 11:15:04 -07:00
dmu_objset.c Add types to featureflags in zfs 2018-10-16 11:15:04 -07:00
dmu_recv.c Refactor dmu_recv into its own file 2018-10-09 14:05:13 -07:00
dmu_send.c Add types to featureflags in zfs 2018-10-16 11:15:04 -07:00
dmu_traverse.c Fix traverse_impl() kmem leak 2018-08-15 09:53:44 -07:00
dmu_tx.c Prefix all refcount functions with zfs_ 2018-10-01 10:42:05 -07:00
dmu_zfetch.c Update build system and packaging 2018-05-29 16:00:33 -07:00
dmu.c Fix 'zpool remap' freeing race 2019-01-02 11:46:04 -08:00
dnode_sync.c Add types to featureflags in zfs 2018-10-16 11:15:04 -07:00
dnode.c Fix dnode_hold() freeing dnode behavior 2018-12-05 09:29:33 -08:00
dsl_bookmark.c Undo c89 workarounds to match with upstream 2017-11-04 13:25:13 -07:00
dsl_crypt.c Replay logs before starting ztest workers 2018-11-07 15:40:24 -08:00
dsl_dataset.c Add types to featureflags in zfs 2018-10-16 11:15:04 -07:00
dsl_deadlist.c OpenZFS 7614, 9064 - zfs device evacuation/removal 2018-04-14 12:16:17 -07:00
dsl_deleg.c Update build system and packaging 2018-05-29 16:00:33 -07:00
dsl_destroy.c Add types to featureflags in zfs 2018-10-16 11:15:04 -07:00
dsl_dir.c zfs_ioc_unload_key can drop extra spa ref 2018-08-03 14:50:51 -07:00
dsl_pool.c Fix zfs_dirty_data_sync_percent documentation 2018-12-18 14:47:33 -08:00
dsl_prop.c Update build system and packaging 2018-05-29 16:00:33 -07:00
dsl_scan.c Ensure dsl scan prefetch queue is emptied 2018-12-06 09:47:23 -08:00
dsl_synctask.c OpenZFS 9166 - zfs storage pool checkpoint 2018-06-26 10:07:42 -07:00
dsl_userhold.c OpenZFS 9166 - zfs storage pool checkpoint 2018-06-26 10:07:42 -07:00
edonr_zfs.c DLPX-44812 integrate EP-220 large memory scalability 2016-11-29 14:34:27 -08:00
fm.c OpenZFS 9580 - Add a hash-table on top of nvlist to speed-up operations 2018-07-30 11:30:03 -07:00
gzip.c Update build system and packaging 2018-05-29 16:00:33 -07:00
hkdf.c Encryption patch follow-up 2017-10-11 16:54:48 -04:00
lz4.c Fix LZ4_uncompress_unknownOutputSize caused panic 2017-05-19 13:45:46 -07:00
lzjb.c Change KM_PUSHPAGE -> KM_SLEEP 2015-01-16 14:41:26 -08:00
Makefile.in OpenZFS 9102 - zfs should be able to initialize storage devices 2019-01-07 10:37:26 -08:00
metaslab.c OpenZFS 9102 - zfs should be able to initialize storage devices 2019-01-07 10:37:26 -08:00
mmp.c Update build system and packaging 2018-05-29 16:00:33 -07:00
multilist.c Update build system and packaging 2018-05-29 16:00:33 -07:00
pathname.c Update build system and packaging 2018-05-29 16:00:33 -07:00
policy.c Take user namespaces into account in policy checks 2018-03-07 15:40:42 -08:00
qat_compress.c Fix inst_num overflow in qat_crypt.c 2018-05-01 20:44:24 -07:00
qat_crypt.c Fix inst_num overflow in qat_crypt.c 2018-05-01 20:44:24 -07:00
qat.c SHA256 QAT acceleration 2018-03-15 10:53:58 -07:00
qat.h Resolve QAT issues with incompressible data 2018-03-29 17:40:34 -07:00
range_tree.c OpenZFS 9166 - zfs storage pool checkpoint 2018-06-26 10:07:42 -07:00
refcount.c Add zfs_refcount_transfer_ownership_many() 2018-10-09 10:05:48 -07:00
rrwlock.c Prefix all refcount functions with zfs_ 2018-10-01 10:42:05 -07:00
sa.c Prefix all refcount functions with zfs_ 2018-10-01 10:42:05 -07:00
sha256.c SHA256 QAT acceleration 2018-03-15 10:53:58 -07:00
skein_zfs.c DLPX-44812 integrate EP-220 large memory scalability 2016-11-29 14:34:27 -08:00
spa_boot.c Add linux kernel module support 2010-08-31 13:41:58 -07:00
spa_checkpoint.c Fix consistency of ztest_device_removal_active 2018-11-28 20:47:09 -08:00
spa_config.c OpenZFS 9591 - ms_shift can be incorrectly changed 2018-06-21 09:35:26 -07:00
spa_errlog.c Update build system and packaging 2018-05-29 16:00:33 -07:00
spa_history.c Create /proc/sys/kernel/spl/gitrev with git hash 2018-10-08 21:57:02 -07:00
spa_misc.c OpenZFS 9102 - zfs should be able to initialize storage devices 2019-01-07 10:37:26 -08:00
spa_stats.c Fixes for procfs files backed by linked lists 2018-09-26 11:08:12 -07:00
spa.c OpenZFS 9102 - zfs should be able to initialize storage devices 2019-01-07 10:37:26 -08:00
space_map.c OpenZFS 9442 - decrease indirect block size of spacemaps 2018-07-25 14:11:35 -07:00
space_reftree.c OpenZFS 7614, 9064 - zfs device evacuation/removal 2018-04-14 12:16:17 -07:00
THIRDPARTYLICENSE.cityhash OpenZFS 8484 - Implement aggregate sum and use for arc counters 2018-06-06 09:35:59 -07:00
THIRDPARTYLICENSE.cityhash.descrip OpenZFS 8484 - Implement aggregate sum and use for arc counters 2018-06-06 09:35:59 -07:00
trace.c OpenZFS 7614, 9064 - zfs device evacuation/removal 2018-04-14 12:16:17 -07:00
txg.c Fix lock inversion in txg_sync_thread() 2018-10-24 14:37:02 -07:00
uberblock.c OpenZFS 9166 - zfs storage pool checkpoint 2018-06-26 10:07:42 -07:00
unique.c Performance optimization of AVL tree comparator functions 2016-08-31 14:35:34 -07:00
vdev_cache.c Update build system and packaging 2018-05-29 16:00:33 -07:00
vdev_disk.c OpenZFS 9102 - zfs should be able to initialize storage devices 2019-01-07 10:37:26 -08:00
vdev_file.c OpenZFS 9102 - zfs should be able to initialize storage devices 2019-01-07 10:37:26 -08:00
vdev_indirect_births.c Update build system and packaging 2018-05-29 16:00:33 -07:00
vdev_indirect_mapping.c OpenZFS 9847 - leaking dd_clones (DMU_OT_DSL_CLONES) objects (#7979) 2018-10-12 11:28:26 -07:00
vdev_indirect.c OpenZFS 9102 - zfs should be able to initialize storage devices 2019-01-07 10:37:26 -08:00
vdev_initialize.c OpenZFS 9102 - zfs should be able to initialize storage devices 2019-01-07 10:37:26 -08:00
vdev_label.c Fix coverity defects: CID 184285 2018-11-11 18:09:00 -08:00
vdev_mirror.c OpenZFS 9102 - zfs should be able to initialize storage devices 2019-01-07 10:37:26 -08:00
vdev_missing.c OpenZFS 9102 - zfs should be able to initialize storage devices 2019-01-07 10:37:26 -08:00
vdev_queue.c OpenZFS 9102 - zfs should be able to initialize storage devices 2019-01-07 10:37:26 -08:00
vdev_raidz_math_aarch64_neon_common.h ABD raidz NEON support 2016-11-29 14:34:33 -08:00
vdev_raidz_math_aarch64_neon.c codebase style improvements for OpenZFS 6459 port 2017-01-22 13:25:40 -08:00
vdev_raidz_math_aarch64_neonx2.c ABD raidz NEON support 2016-11-29 14:34:33 -08:00
vdev_raidz_math_avx2.c ABD raidz avx512f support 2016-11-29 14:34:33 -08:00
vdev_raidz_math_avx512bw.c ABD: Adapt avx512bw raidz assembly 2016-12-15 17:31:33 -08:00
vdev_raidz_math_avx512f.c Use cstyle -cpP in make cstyle check 2016-12-12 10:46:26 -08:00
vdev_raidz_math_impl.h codebase style improvements for OpenZFS 6459 port 2017-01-22 13:25:40 -08:00
vdev_raidz_math_scalar.c ABD Vectorized raidz 2016-11-29 14:34:33 -08:00
vdev_raidz_math_sse2.c ABD raidz avx512f support 2016-11-29 14:34:33 -08:00
vdev_raidz_math_ssse3.c codebase style improvements for OpenZFS 6459 port 2017-01-22 13:25:40 -08:00
vdev_raidz_math.c Update build system and packaging 2018-05-29 16:00:33 -07:00
vdev_raidz.c OpenZFS 9102 - zfs should be able to initialize storage devices 2019-01-07 10:37:26 -08:00
vdev_removal.c OpenZFS 9102 - zfs should be able to initialize storage devices 2019-01-07 10:37:26 -08:00
vdev_root.c OpenZFS 9102 - zfs should be able to initialize storage devices 2019-01-07 10:37:26 -08:00
vdev.c OpenZFS 9102 - zfs should be able to initialize storage devices 2019-01-07 10:37:26 -08:00
zap_leaf.c OpenZFS 9328 - zap code can take advantage of c99 2018-05-31 10:53:11 -07:00
zap_micro.c Fix zap_update() ASSERT from ztest 2018-12-14 10:04:11 -08:00
zap.c OpenZFS 9328 - zap code can take advantage of c99 2018-05-31 10:53:11 -07:00
zcp_get.c Fix coverity defects: zfs channel programs 2018-02-20 11:19:42 -08:00
zcp_global.c OpenZFS 8600 - ZFS channel programs - snapshot 2018-02-08 15:29:24 -08:00
zcp_iter.c OpenZFS 9337 - zfs get all is slow due to uncached metadata 2018-07-12 10:49:27 -07:00
zcp_synctask.c OpenZFS 9166 - zfs storage pool checkpoint 2018-06-26 10:07:42 -07:00
zcp.c OpenZFS 9424 - ztest failure: "unprotected error in call to Lua API (Invalid value type 'function' for key 'error')" 2018-07-10 21:29:23 -07:00
zfeature.c Undo c89 workarounds to match with upstream 2017-11-04 13:25:13 -07:00
zfs_acl.c Update build system and packaging 2018-05-29 16:00:33 -07:00
zfs_byteswap.c Update build system and packaging 2018-05-29 16:00:33 -07:00
zfs_ctldir.c Update zfs_admin_snapshot value (disabled) 2018-11-08 16:17:12 -08:00
zfs_debug.c Fix dbgmsg printing in ztest and zdb 2018-10-24 14:36:50 -07:00
zfs_dir.c Linux does not HAVE_DNLC 2018-10-17 10:30:08 -07:00
zfs_fm.c Add zpool status -s (slow I/Os) and -p (parseable) 2018-11-08 16:47:24 -08:00
zfs_fuid.c Update build system and packaging 2018-05-29 16:00:33 -07:00
zfs_ioctl.c OpenZFS 9102 - zfs should be able to initialize storage devices 2019-01-07 10:37:26 -08:00
zfs_log.c Update build system and packaging 2018-05-29 16:00:33 -07:00
zfs_onexit.c Update build system and packaging 2018-05-29 16:00:33 -07:00
zfs_ratelimit.c Change checksum & IO delay ratelimit values 2018-03-04 17:34:51 -08:00
zfs_replay.c Update build system and packaging 2018-05-29 16:00:33 -07:00
zfs_rlock.c OpenZFS 9689 - zfs range lock code should not be zpl-specific 2018-10-11 10:19:33 -07:00
zfs_sa.c Project Quota on ZFS 2018-02-13 14:54:54 -08:00
zfs_sysfs.c OpenZFS 9102 - zfs should be able to initialize storage devices 2019-01-07 10:37:26 -08:00
zfs_vfsops.c Fix statfs(2) for 32-bit user space 2018-09-24 17:11:25 -07:00
zfs_vnops.c OpenZFS 9962 - zil_commit should omit cache thrash 2018-12-07 11:09:42 -08:00
zfs_znode.c Linux does not HAVE_SMB_SHARE 2018-10-17 10:31:38 -07:00
zil.c OpenZFS 9962 - zil_commit should omit cache thrash 2018-12-07 11:09:42 -08:00
zio_checksum.c Undo c89 workarounds to match with upstream 2017-11-04 13:25:13 -07:00
zio_compress.c OpenZFS 9403 - assertion failed in arc_buf_destroy() 2018-08-29 11:33:33 -07:00
zio_crypt.c Update build system and packaging 2018-05-29 16:00:33 -07:00
zio_inject.c Add libzutil for libzfs or libzpool consumers 2018-11-05 11:22:33 -08:00
zio.c OpenZFS 9993 - zil writes can get delayed in zio pipeline 2018-12-07 11:05:35 -08:00
zle.c Fix zle_decompress out of bound access 2018-02-09 10:08:05 -08:00
zpl_ctldir.c RHEL 7.5 compat: FMODE_KABI_ITERATE 2018-05-02 15:01:24 -07:00
zpl_export.c Use cstyle -cpP in make cstyle check 2016-12-12 10:46:26 -08:00
zpl_file.c Direct IO support 2018-08-27 10:04:21 -07:00
zpl_inode.c Linux 4.18 compat: inode timespec -> timespec64 2018-06-19 21:51:18 -07:00
zpl_super.c Fix statfs(2) for 32-bit user space 2018-09-24 17:11:25 -07:00
zpl_xattr.c Add missing checks to zpl_xattr_* functions 2018-08-02 14:03:56 -07:00
zrlock.c Update build system and packaging 2018-05-29 16:00:33 -07:00
zthr.c OpenZFS 9284 - arc_reclaim_thread has 2 jobs 2018-12-26 13:22:28 -08:00
zvol.c OpenZFS 9962 - zil_commit should omit cache thrash 2018-12-07 11:09:42 -08:00