mirror_zfs

mirror of https://git.proxmox.com/git/mirror_zfs.git synced 2026-05-22 02:27:36 +03:00

Files

Paul Dagnelie 492f64e941 OpenZFS 9112 - Improve allocation performance on high-end systems

Overview
========

We parallelize the allocation process by creating the concept of
"allocators". There are a certain number of allocators per metaslab
group, defined by the value of a tunable at pool open time.  Each
allocator for a given metaslab group has up to 2 active metaslabs: one
"primary", and one "secondary". The primary and secondary weights mean
the same thing they did in the pre-allocator world; primary metaslabs
are used for most allocations, secondary metaslabs are used for ditto
blocks being allocated in the same metaslab group.  There is also the
CLAIM weight, which has been separated out from the other weights, but
that is less important to understanding the patch.  The active metaslabs
for each allocator are moved from their normal place in the metaslab
tree for the group to the back of the tree. This way, they will not be
selected for use by other allocators searching for new metaslabs unless
all the passive metaslabs are unsuitable for allocations.  If that does
happen, the allocators will "steal" from each other to ensure that IOs
don't fail until there is truly no space left to perform allocations.
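
The selection order described above can be sketched roughly as follows.
This is an illustrative model only, not the code from the patch: the
sketch_* names, the struct layout, and the linear scan (standing in for
the metaslab tree) are all assumptions, and the primary/secondary
distinction is left out for brevity. An allocator first reuses a
metaslab it already owns, then takes the best passive one, and only
steals from another allocator when nothing else fits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define	SKETCH_NO_ALLOCATOR	(-1)

/* Hypothetical, simplified stand-in for a metaslab (not metaslab_t). */
typedef struct sketch_metaslab {
	uint64_t	sm_weight;	/* higher weight is preferred */
	uint64_t	sm_free;	/* bytes still available */
	int		sm_allocator;	/* owner, or SKETCH_NO_ALLOCATOR */
} sketch_metaslab_t;

enum { PASS_OWN, PASS_PASSIVE, PASS_STEAL, PASS_COUNT };

/*
 * Pick a metaslab that can hold `size` bytes for `allocator`.
 * Returns its index and marks it active for that allocator,
 * or returns -1 when no metaslab in the group has enough space.
 */
int
sketch_select(sketch_metaslab_t *ms, size_t n, int allocator, uint64_t size)
{
	assert(allocator >= 0);

	for (int pass = PASS_OWN; pass < PASS_COUNT; pass++) {
		int best = -1;

		for (size_t i = 0; i < n; i++) {
			int owner = ms[i].sm_allocator;

			if (ms[i].sm_free < size)
				continue;
			/* Pass 1: only metaslabs this allocator owns. */
			if (pass == PASS_OWN && owner != allocator)
				continue;
			/* Pass 2: only metaslabs nobody owns. */
			if (pass == PASS_PASSIVE &&
			    owner != SKETCH_NO_ALLOCATOR)
				continue;
			/* Pass 3: anything that fits, i.e. stealing. */
			if (best == -1 ||
			    ms[i].sm_weight > ms[best].sm_weight)
				best = (int)i;
		}
		if (best != -1) {
			ms[best].sm_allocator = allocator;
			return (best);
		}
	}
	return (-1);
}
```

In this model the allocation fails only when every metaslab is too
small, which mirrors the goal that IOs don't fail while space remains.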

In addition, the alloc queue for each metaslab group has been broken
into a separate queue for each allocator. We don't want to dramatically
increase the number of inflight IOs on low-end systems, because it can
significantly increase txg times. On the other hand, we want to ensure
that there are enough IOs for each allocator to allow for good
coalescing before sending the IOs to the disk.  As a result, we take a
compromise path; each allocator's alloc queue max depth starts at a
certain value for every txg. Every time an IO completes, we increase the
max depth. This should hopefully provide a good balance between the two
failure modes, while not dramatically increasing complexity.
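
A minimal sketch of that depth policy, assuming a hypothetical
per-allocator queue structure (the sketch_* names and fields are
invented for illustration and are not the identifiers used in the
patch): the cap is reset at the start of each txg and grows by one for
every completed IO.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-allocator alloc queue state. */
typedef struct sketch_alloc_queue {
	uint64_t	aq_initial_depth;	/* cap at the start of a txg */
	uint64_t	aq_max_depth;		/* current cap */
	uint64_t	aq_inflight;		/* IOs currently issued */
} sketch_alloc_queue_t;

/* Every txg starts over from the same initial cap. */
void
sketch_queue_txg_start(sketch_alloc_queue_t *q)
{
	q->aq_max_depth = q->aq_initial_depth;
}

/* Issue an IO only while the queue is below its current cap. */
bool
sketch_queue_try_issue(sketch_alloc_queue_t *q)
{
	if (q->aq_inflight >= q->aq_max_depth)
		return (false);
	q->aq_inflight++;
	return (true);
}

/* A completed IO frees its slot and raises the cap by one. */
void
sketch_queue_io_done(sketch_alloc_queue_t *q)
{
	assert(q->aq_inflight > 0);
	q->aq_inflight--;
	q->aq_max_depth++;
}
```

With this shape, a slow device keeps the queue near its initial depth,
while a device that completes IOs quickly earns a deeper queue within
the same txg.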

We also parallelize the spa_alloc_tree and spa_alloc_lock, which cause
very similar contention when selecting IOs to allocate. This
parallelization uses the same allocator scheme as metaslab selection.
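
One way to picture the split lock is the sketch below. It is an
assumption-laden illustration rather than the patch itself: the
sketch_* names are hypothetical, a simple counter stands in for each
per-allocator tree, the allocator count is a compile-time constant
instead of a tunable, and the modulo mapping from IO to allocator is
just one possible stable mapping. The point is that two IOs mapped to
different allocators never contend on the same lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define	SKETCH_ALLOC_COUNT	4	/* hypothetical fixed count */

/* One lock and one pending set per allocator. */
typedef struct sketch_alloc_shard {
	pthread_mutex_t	as_lock;	/* protects as_pending only */
	uint64_t	as_pending;	/* stand-in for the waiting-IO tree */
} sketch_alloc_shard_t;

typedef struct sketch_spa {
	sketch_alloc_shard_t	ss_shards[SKETCH_ALLOC_COUNT];
} sketch_spa_t;

void
sketch_spa_init(sketch_spa_t *spa)
{
	for (int i = 0; i < SKETCH_ALLOC_COUNT; i++) {
		pthread_mutex_init(&spa->ss_shards[i].as_lock, NULL);
		spa->ss_shards[i].as_pending = 0;
	}
}

/* Assumed mapping: any stable function of the IO would do. */
int
sketch_io_allocator(uint64_t io_id)
{
	return ((int)(io_id % SKETCH_ALLOC_COUNT));
}

/* Queue an IO under its own allocator's lock; returns the allocator. */
int
sketch_enqueue(sketch_spa_t *spa, uint64_t io_id)
{
	int a = sketch_io_allocator(io_id);

	pthread_mutex_lock(&spa->ss_shards[a].as_lock);
	spa->ss_shards[a].as_pending++;
	pthread_mutex_unlock(&spa->ss_shards[a].as_lock);
	return (a);
}
```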

Performance Results
===================

Performance improvements from this change can vary significantly based
on the number of CPUs in the system, whether or not the system has a
NUMA architecture, the speed of the drives, the values for the various
tunables, and the workload being performed. For an fio async sequential
write workload on a 24-core NUMA system with 256 GB of RAM and eight
128 GB SSDs, there is a roughly 25% performance improvement.

Future Work
===========

Analysis of the performance of the system with this patch applied shows
that a significant new bottleneck is the vdev disk queues, which also
need to be parallelized.  Prototyping of this change has occurred, and
there was a performance improvement, but more work is needed to verify
its stability before it is ready to be upstreamed.

Authored by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com>
Reviewed by: Alexander Motin <mav@FreeBSD.org>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Approved by: Gordon Ross <gwr@nexenta.com>
Ported-by: Paul Dagnelie <pcd@delphix.com>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>

Porting Notes:
* Fix reservation test failures by increasing tolerance.

OpenZFS-issue: https://illumos.org/issues/9112
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/3f3cc3c3
Closes #7682

2018-07-31 10:52:33 -07:00

crypto

OpenZFS 4185 - add new cryptographic checksums to ZFS: SHA-512, Skein, Edon-R

2016-10-03 14:51:15 -07:00

Update build system and packaging

2018-05-29 16:00:33 -07:00

Enforce PROP_ONETIME on zpool properties

2018-06-28 14:49:17 -07:00

lua

Fix coverity defects: zfs channel programs

2018-02-20 11:19:42 -08:00

sysevent

OpenZFS 8959 - Add notifications when a scrub is paused or resumed

2018-01-17 10:31:00 -08:00

abd.h

Update build system and packaging

2018-05-29 16:00:33 -07:00

aggsum.h

OpenZFS 8484 - Implement aggregate sum and use for arc counters

2018-06-06 09:35:59 -07:00

arc_impl.h

Add support for decryption faults in zinject

2018-05-02 15:36:20 -07:00

arc.h

OpenZFS 9465 - ARC check for 'anon_size > arc_c/2' can stall the system

2018-07-30 11:30:41 -07:00

avl_impl.h

Support custom build directories and move includes

2010-09-08 12:38:56 -07:00

avl.h

Remove dead code from AVL tree

2017-10-05 19:28:00 -07:00

blkptr.h

OpenZFS 8067 - zdb should be able to dump literal embedded block pointer

2017-07-07 11:28:01 -07:00

bplist.h

Support custom build directories and move includes

2010-09-08 12:38:56 -07:00

bpobj.h

OpenZFS 7614, 9064 - zfs device evacuation/removal

2018-04-14 12:16:17 -07:00

bptree.h

Illumos 4914 - zfs on-disk bookmark structure should be named *_phys_t

2014-08-06 14:48:41 -07:00

bqueue.h

Illumos 5960, 5925

2016-01-08 15:08:19 -08:00

cityhash.h

OpenZFS 8484 - Implement aggregate sum and use for arc counters

2018-06-06 09:35:59 -07:00

dbuf.h

OpenZFS 9337 - zfs get all is slow due to uncached metadata

2018-07-12 10:49:27 -07:00

ddt.h

Incorrect maximum DVA value in DDE_GET_NDVAS()

2018-02-26 14:20:12 -08:00

dmu_impl.h

Fix race in dnode_check_slots_free()

2018-04-10 11:15:05 -07:00

dmu_objset.h

OpenZFS 9337 - zfs get all is slow due to uncached metadata

2018-07-12 10:49:27 -07:00

dmu_send.h

Raw receive should change key atomically

2018-02-21 12:31:03 -08:00

dmu_traverse.h

Native Encryption for ZFS on Linux

2017-08-14 10:36:48 -07:00

dmu_tx.h

Introduce kstat dmu_tx_dirty_frees_delay

2018-07-25 09:52:27 -07:00

dmu_zfetch.h

OpenZFS 6322 - ZFS indirect block predictive prefetch

2016-08-30 14:26:55 -07:00

dmu.h

OpenZFS 9442 - decrease indirect block size of spacemaps

2018-07-25 14:11:35 -07:00

dnode.h

OpenZFS 9337 - zfs get all is slow due to uncached metadata

2018-07-12 10:49:27 -07:00

dsl_bookmark.h

Illumos 4368, 4369.

2014-07-29 10:55:29 -07:00

dsl_crypt.h

Add support for decryption faults in zinject

2018-05-02 15:36:20 -07:00

dsl_dataset.h

OpenZFS 7614, 9064 - zfs device evacuation/removal

2018-04-14 12:16:17 -07:00

dsl_deadlist.h

OpenZFS 7614, 9064 - zfs device evacuation/removal

2018-04-14 12:16:17 -07:00

dsl_deleg.h

OpenZFS 7614, 9064 - zfs device evacuation/removal

2018-04-14 12:16:17 -07:00

dsl_destroy.h

OpenZFS 7431 - ZFS Channel Programs

2018-02-08 15:28:18 -08:00

dsl_dir.h

OpenZFS 9166 - zfs storage pool checkpoint

2018-06-26 10:07:42 -07:00

dsl_pool.h

OpenZFS 9166 - zfs storage pool checkpoint

2018-06-26 10:07:42 -07:00

dsl_prop.h

Illumos 6171 - dsl_prop_unregister() slows down dataset eviction.

2016-01-12 10:53:12 -08:00

dsl_scan.h

OpenZFS 7614, 9064 - zfs device evacuation/removal

2018-04-14 12:16:17 -07:00

dsl_synctask.h

OpenZFS 9166 - zfs storage pool checkpoint

2018-06-26 10:07:42 -07:00

dsl_userhold.h

Illumos #3740

2013-11-04 11:17:48 -08:00

edonr.h

OpenZFS 4185 - add new cryptographic checksums to ZFS: SHA-512, Skein, Edon-R

2016-10-03 14:51:15 -07:00

efi_partition.h

Fix spelling

2017-01-03 11:31:18 -06:00

frame.h

Suppress incorrect objtool warnings

2017-12-07 10:28:50 -08:00

hkdf.h

Encryption patch follow-up

2017-10-11 16:54:48 -04:00

Makefile.am

OpenZFS 9166 - zfs storage pool checkpoint

2018-06-26 10:07:42 -07:00

metaslab_impl.h

OpenZFS 9112 - Improve allocation performance on high-end systems

2018-07-31 10:52:33 -07:00

metaslab.h

OpenZFS 9112 - Improve allocation performance on high-end systems

2018-07-31 10:52:33 -07:00

mmp.h

Record skipped MMP writes in multihost_history

2018-03-06 15:15:15 -08:00

mntent.h

Make zfs mount according to relatime config in dataset

2016-04-05 18:55:59 -07:00

multilist.h

OpenZFS 7968 - multi-threaded spa_sync()

2017-03-20 18:36:00 -07:00

note.h

Update build system and packaging

2018-05-29 16:00:33 -07:00

nvpair_impl.h

OpenZFS 9580 - Add a hash-table on top of nvlist to speed-up operations

2018-07-30 11:30:03 -07:00

nvpair.h

OpenZFS 9580 - Add a hash-table on top of nvlist to speed-up operations

2018-07-30 11:30:03 -07:00

pathname.h

Add pn_alloc()/pn_free() functions

2016-04-21 09:49:25 -07:00

policy.h

Add zfs allow and zfs unallow support

2016-06-07 09:16:52 -07:00

range_tree.h

OpenZFS 9166 - zfs storage pool checkpoint

2018-06-26 10:07:42 -07:00

refcount.h

OpenZFS 8081 - Compiler warnings in zdb

2017-10-27 12:46:35 -07:00

rrwlock.h

Illumos 5008 - lock contention (rrw_exit) while running a read only load

2015-07-06 09:34:13 -07:00

sa_impl.h

Implement large_dnode pool feature

2016-06-24 13:13:21 -07:00

sa.h

Project Quota on ZFS

2018-02-13 14:54:54 -08:00

sdt.h

Add line info and SET_ERROR() to ZFS debug log

2017-07-25 23:09:48 -07:00

sha2.h

OpenZFS 4185 - add new cryptographic checksums to ZFS: SHA-512, Skein, Edon-R

2016-10-03 14:51:15 -07:00

skein.h

OpenZFS 4185 - add new cryptographic checksums to ZFS: SHA-512, Skein, Edon-R

2016-10-03 14:51:15 -07:00

spa_boot.h

Support custom build directories and move includes

2010-09-08 12:38:56 -07:00

spa_checkpoint.h

OpenZFS 9166 - zfs storage pool checkpoint

2018-06-26 10:07:42 -07:00

spa_checksum.h

Implementation of AVX2 optimized Fletcher-4

2016-06-02 14:30:51 -07:00

spa_impl.h

OpenZFS 9112 - Improve allocation performance on high-end systems

2018-07-31 10:52:33 -07:00

spa.h

OpenZFS 9465 - ARC check for 'anon_size > arc_c/2' can stall the system

2018-07-30 11:30:41 -07:00

space_map.h

OpenZFS 9238 - ZFS Spacemap Encoding V2

2018-07-05 12:02:34 -07:00

space_reftree.h

Illumos #4101 , #4102 , #4103 , #4105 , #4106

2014-07-22 09:39:16 -07:00

sysevent.h

OpenZFS 6939 - add sysevents to zfs core for commands

2017-07-12 21:28:13 -07:00

trace_acl.h

Linux 4.16 compat: inode_set_iversion()

2018-02-08 21:25:19 -08:00

trace_arc.h

Support re-prioritizing asynchronous prefetches

2017-12-21 09:13:06 -08:00

trace_common.h

OpenZFS 6531 - Provide mechanism to artificially limit disk performance

2016-05-26 10:11:51 -07:00

trace_dbgmsg.h

Add line info and SET_ERROR() to ZFS debug log

2017-07-25 23:09:48 -07:00

trace_dbuf.h

Crash in dbuf_evict_one with DTRACE_PROBE

2017-08-09 11:04:41 -07:00

trace_dmu.h

tx_waited -> tx_dirty_delayed in trace_dmu.h

2018-01-31 16:13:26 -08:00

trace_dnode.h

Fix build-it compilation regression

2017-01-24 08:50:15 -08:00

trace_multilist.h

Fix build-it compilation regression

2017-01-24 08:50:15 -08:00

trace_txg.h

Fix build-it compilation regression

2017-01-24 08:50:15 -08:00

trace_vdev.h

OpenZFS 7614, 9064 - zfs device evacuation/removal

2018-04-14 12:16:17 -07:00

trace_zil.h

OpenZFS 8585 - improve batching done in zil_commit()

2017-12-05 09:39:16 -08:00

trace_zio.h

Use cstyle -cpP in make cstyle check

2016-12-12 10:46:26 -08:00

trace_zrlock.h

Fix race in trace point in zrl_add_impl

2018-03-12 11:27:02 -07:00

trace.h

Remove duplicate typedefs from trace.h

2015-01-06 16:53:24 -08:00

txg_impl.h

OpenZFS 9464 - txg_kick() fails to see that we are quiescing

2018-06-04 14:56:06 -07:00

txg.h

OpenZFS 8063 - verify that we do not attempt to access inactive txg

2017-05-10 13:52:22 -04:00

u8_textprep_data.h

Support custom build directories and move includes

2010-09-08 12:38:56 -07:00

u8_textprep.h

Support custom build directories and move includes

2010-09-08 12:38:56 -07:00

uberblock_impl.h

OpenZFS 9166 - zfs storage pool checkpoint

2018-06-26 10:07:42 -07:00

uberblock.h

Multi-modifier protection (MMP)

2017-07-13 13:54:00 -04:00

uio_impl.h

Add basic uio support

2011-02-10 09:21:43 -08:00

unique.h

Illumos #3742

2013-11-04 10:55:25 -08:00

uuid.h

Support custom build directories and move includes

2010-09-08 12:38:56 -07:00

vdev_disk.h

Add support for autoexpand property

2018-07-23 15:40:15 -07:00

vdev_file.h

Use a dedicated taskq for vdev_file

2016-12-21 10:47:15 -08:00

vdev_impl.h

OpenZFS 9112 - Improve allocation performance on high-end systems

2018-07-31 10:52:33 -07:00

vdev_indirect_births.h

OpenZFS 7614, 9064 - zfs device evacuation/removal

2018-04-14 12:16:17 -07:00

vdev_indirect_mapping.h

OpenZFS 7614, 9064 - zfs device evacuation/removal

2018-04-14 12:16:17 -07:00

vdev_raidz_impl.h

Revert raidz_map and _col structure types

2018-01-09 14:46:52 -08:00

vdev_raidz.h

Use cstyle -cpP in make cstyle check

2016-12-12 10:46:26 -08:00

vdev_removal.h

OpenZFS 9166 - zfs storage pool checkpoint

2018-06-26 10:07:42 -07:00

vdev.h

OpenZFS 9166 - zfs storage pool checkpoint

2018-06-26 10:07:42 -07:00

xvattr.h

Linux 4.18 compat: inode timespec -> timespec64

2018-06-19 21:51:18 -07:00

zap_impl.h

OpenZFS 7793 - ztest fails assertion in dmu_tx_willuse_space

2017-03-07 09:51:59 -08:00

zap_leaf.h

Fix ENOSPC in "Handle zap_add() failures in ..."

2018-04-18 14:19:50 -07:00

zap.h

OpenZFS 1300 - filename normalization doesn't work for removes

2017-02-02 14:13:41 -08:00

zcp_global.h

OpenZFS 7431 - ZFS Channel Programs

2018-02-08 15:28:18 -08:00

zcp_iter.h

OpenZFS 7431 - ZFS Channel Programs

2018-02-08 15:28:18 -08:00

zcp_prop.h

OpenZFS 7431 - ZFS Channel Programs

2018-02-08 15:28:18 -08:00

zcp.h

Add tunables for channel programs

2018-06-15 15:10:42 -07:00

zfeature.h

Revert "zhack: Add 'feature disable' command"

2016-05-17 11:52:07 -07:00

zfs_acl.h

Project Quota on ZFS

2018-02-13 14:54:54 -08:00

zfs_context.h

Linux 4.18 compat: inode timespec -> timespec64

2018-06-19 21:51:18 -07:00

zfs_ctldir.h

Rename zfs_sb_t -> zfsvfs_t

2017-03-10 09:51:33 -08:00

zfs_debug.h

OpenZFS 9236 - nuke spa_dbgmsg

2018-04-30 10:19:48 -07:00

zfs_delay.h

Update build system and packaging

2018-05-29 16:00:33 -07:00

zfs_dir.h

Rename zfs_sb_t -> zfsvfs_t

2017-03-10 09:51:33 -08:00

zfs_fuid.h

Update build system and packaging

2018-05-29 16:00:33 -07:00

zfs_ioctl.h

OpenZFS 9337 - zfs get all is slow due to uncached metadata

2018-07-12 10:49:27 -07:00

zfs_onexit.h

Support custom build directories and move includes

2010-09-08 12:38:56 -07:00

zfs_project.h

Project Quota on ZFS

2018-02-13 14:54:54 -08:00

zfs_ratelimit.h

Change checksum & IO delay ratelimit values

2018-03-04 17:34:51 -08:00

zfs_rlock.h

Rename zfs_sb_t -> zfsvfs_t

2017-03-10 09:51:33 -08:00

zfs_sa.h

Project Quota on ZFS

2018-02-13 14:54:54 -08:00

zfs_stat.h

Support custom build directories and move includes

2010-09-08 12:38:56 -07:00

zfs_vfsops.h

Fix zpl_mount() deadlock

2018-07-11 15:49:10 -07:00

zfs_vnops.h

RHEL 7.5 compat: FMODE_KABI_ITERATE

2018-05-02 15:01:24 -07:00

zfs_znode.h

Linux 4.18 compat: inode timespec -> timespec64

2018-06-19 21:51:18 -07:00

zil_impl.h

OpenZFS 8909 - 8585 can cause a use-after-free kernel panic

2017-12-28 10:18:04 -08:00

zil.h

OpenZFS 7614, 9064 - zfs device evacuation/removal

2018-04-14 12:16:17 -07:00

zio_checksum.h

Remove dependency on linear ABD

2017-03-29 12:24:51 -07:00

zio_compress.h

DLPX-44812 integrate EP-220 large memory scalability

2016-11-29 14:34:27 -08:00

zio_crypt.h

Add support for decryption faults in zinject

2018-05-02 15:36:20 -07:00

zio_impl.h

Native Encryption for ZFS on Linux

2017-08-14 10:36:48 -07:00

zio_priority.h

OpenZFS 7614, 9064 - zfs device evacuation/removal

2018-04-14 12:16:17 -07:00

zio.h

OpenZFS 9112 - Improve allocation performance on high-end systems

2018-07-31 10:52:33 -07:00

zpl.h

Linux 4.18 compat: inode timespec -> timespec64

2018-06-19 21:51:18 -07:00

zrlock.h

OpenZFS 6328 - Fix cstyle errors in zfs codebase

2017-01-12 09:42:11 -08:00

zthr.h

OpenZFS 9166 - zfs storage pool checkpoint

2018-06-26 10:07:42 -07:00

zvol.h

Add port of FreeBSD 'volmode' property

2017-07-12 13:05:37 -07:00