mirror_zfs/module/zfs
Serapheim Dimitropoulos 6b64382b17 OpenZFS 9580 - Add a hash-table on top of nvlist to speed-up operations
= Motivation

While dealing with another performance issue (see 126118f) we noticed
that we spend a lot of time in various places in the kernel when
constructing long nvlists. The problem is that when an nvlist is created
with the NV_UNIQUE_NAME set (which is the case most of the time), we do
a linear search through the whole list to ensure uniqueness for every
entry we add.

An example of the above scenario can be seen in the following
flamegraph, where more than have the time of the zfsdev_ioctl() is spent
on constructing nvlists.  Flamegraph:
https://sdimitro.github.io/img/flame/sdimitro_snap_unmount3.svg

Adding a table to speed up lookups will help situations where we just
construct an nvlist (like the scenario above), in addition to regular
lookups and removals.

= What this patch does

In this diff we've implemented a hash-table on top of the nvlist code
that converts most nvlist operations from O(# number of entries) to
O(1)* (the start is for amortized time as the hash-table grows and
shrinks depending on the # of entries - plain lookup is strictly O(1)).

= Performance Analysis

To analyze the performance improvement I just used the setup from the
snapshot deletion issue mentioned above in the Motivation section.
Basically I created 10K filesystems with one snapshot each and then I
just used the API of libZFS_Core to pass down an nvlist of all the
snapshots to have them deleted. The reason I used my own driver program
was to have clean performance results of what actually happens in the
kernel. The flamegraphs and wall clock times mentioned below were
gathered from the start to the end of the driver program's run. Between
trials the testpool used was completely destroyed, the system was
rebooted and the testpool was completely recreated. The reason for this
dance was to get consistent results.

== Results (before patch):

=== Sampling Flamegraphs

[Trial 1] https://sdimitro.github.io/img/flame/DLPX-53417/trial-A.svg
[Trial 2] https://sdimitro.github.io/img/flame/DLPX-53417/trial-A2.svg
[Trial 3] https://sdimitro.github.io/img/flame/DLPX-53417/trial-A3.svg

=== Wall clock times (in seconds)

```
[Trial 4]
real        5.3
user        0.4
sys         2.3

[Trial 5]
real        8.2
user        0.4
sys         2.4

[Trial 6]
real        6.0
user        0.5
sys         2.3
```

== Results (after patch):

=== Sampling Flamegraphs

[Trial 1] https://sdimitro.github.io/img/flame/DLPX-53417/trial-Ae.svg
[Trial 2] https://sdimitro.github.io/img/flame/DLPX-53417/trial-A2e.svg
[Trial 3] https://sdimitro.github.io/img/flame/DLPX-53417/trial-A3e.svg

=== Wall clock times (in seconds)

```
[Trial 4]
real        4.9
user        0.0
sys         0.9

[Trial 5]
real        3.8
user        0.0
sys         0.9

[Trial 6]
real        3.6
user        0.0
sys         0.9
```

== Analysis

The results between the trials are consistent so in this sections I will
only talk about the flamegraph results from trial-1 and the wall-clock
results from trial-4.

From trial-1 we can see that zfs_dev_ioctl() goes from 2,331 to 996
samples counts.  Specifically, the samples from fnvlist_add_nvlist() and
spa_history_log_nvl() are almost gone (~500 & ~800 to 5 & 5 samples),
leaving zfs_ioc_destroy_snaps() to dominate most samples from
zfs_dev_ioctl().

From trial-4 we see that the user time dropped to 0 secods. I believe
the consistent 0.4 seconds before my patch was applied was due to my
driver program constructing the long nvlist of snapshots so it can pass
it to the kernel. As for the system time, the effect there is more clear
(2.3 down to 0.9 seconds).

Porting Notes:
* DATA_TYPE_DONTCARE case added to switch in fm_nvprintr() and
  zpool_do_events_nvprint().

Authored by: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com>
Reviewed by: Matt Ahrens <matt@delphix.com>
Reviewed by: Sebastien Roy <sebastien.roy@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>

OpenZFS-issue: https://www.illumos.org/issues/9580
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/b5eca7b1
Closes #7748
2018-07-30 11:30:03 -07:00
..
abd.c Update build system and packaging 2018-05-29 16:00:33 -07:00
aggsum.c Fix preemptible warning in aggsum_add() 2018-06-07 15:55:11 -07:00
arc.c Refactor arc_hdr_realloc_crypt() 2018-07-24 12:20:04 -07:00
blkptr.c Undo c89 workarounds to match with upstream 2017-11-04 13:25:13 -07:00
bplist.c Change KM_PUSHPAGE -> KM_SLEEP 2015-01-16 14:41:26 -08:00
bpobj.c OpenZFS 7614, 9064 - zfs device evacuation/removal 2018-04-14 12:16:17 -07:00
bptree.c Native Encryption for ZFS on Linux 2017-08-14 10:36:48 -07:00
bqueue.c Call cv_signal() with mutex held 2017-06-26 14:36:49 -07:00
cityhash.c OpenZFS 8484 - Implement aggregate sum and use for arc counters 2018-06-06 09:35:59 -07:00
dbuf_stats.c Update build system and packaging 2018-05-29 16:00:33 -07:00
dbuf.c OpenZFS 9337 - zfs get all is slow due to uncached metadata 2018-07-12 10:49:27 -07:00
ddt_zap.c Update build system and packaging 2018-05-29 16:00:33 -07:00
ddt.c Update build system and packaging 2018-05-29 16:00:33 -07:00
dmu_diff.c Fix issues found with zfs diff 2018-05-01 11:24:20 -07:00
dmu_object.c OpenZFS 9438 - Holes can lose birth time info if a block has a mix of birth times 2018-07-30 09:27:49 -07:00
dmu_objset.c OpenZFS 9337 - zfs get all is slow due to uncached metadata 2018-07-12 10:49:27 -07:00
dmu_send.c Fix 'zfs recv' of non large_dnode send streams 2018-06-28 14:55:11 -07:00
dmu_traverse.c OpenZFS 9166 - zfs storage pool checkpoint 2018-06-26 10:07:42 -07:00
dmu_tx.c Introduce kstat dmu_tx_dirty_frees_delay 2018-07-25 09:52:27 -07:00
dmu_zfetch.c Update build system and packaging 2018-05-29 16:00:33 -07:00
dmu.c Introduce kstat dmu_tx_dirty_frees_delay 2018-07-25 09:52:27 -07:00
dnode_sync.c OpenZFS 9439 - ZFS double-free due to failure to dirty indirect block 2018-07-30 09:28:09 -07:00
dnode.c OpenZFS 9439 - ZFS double-free due to failure to dirty indirect block 2018-07-30 09:28:09 -07:00
dsl_bookmark.c Undo c89 workarounds to match with upstream 2017-11-04 13:25:13 -07:00
dsl_crypt.c Add support for decryption faults in zinject 2018-05-02 15:36:20 -07:00
dsl_dataset.c OpenZFS 9166 - zfs storage pool checkpoint 2018-06-26 10:07:42 -07:00
dsl_deadlist.c OpenZFS 7614, 9064 - zfs device evacuation/removal 2018-04-14 12:16:17 -07:00
dsl_deleg.c Update build system and packaging 2018-05-29 16:00:33 -07:00
dsl_destroy.c OpenZFS 9166 - zfs storage pool checkpoint 2018-06-26 10:07:42 -07:00
dsl_dir.c OpenZFS 9330 - stack overflow when creating a deeply nested dataset 2018-07-09 13:02:50 -07:00
dsl_pool.c OpenZFS 9166 - zfs storage pool checkpoint 2018-06-26 10:07:42 -07:00
dsl_prop.c Update build system and packaging 2018-05-29 16:00:33 -07:00
dsl_scan.c dsl_scan_scrub_cb: don't double-account non-embedded blocks 2018-07-24 09:33:56 -07:00
dsl_synctask.c OpenZFS 9166 - zfs storage pool checkpoint 2018-06-26 10:07:42 -07:00
dsl_userhold.c OpenZFS 9166 - zfs storage pool checkpoint 2018-06-26 10:07:42 -07:00
edonr_zfs.c DLPX-44812 integrate EP-220 large memory scalability 2016-11-29 14:34:27 -08:00
fm.c OpenZFS 9580 - Add a hash-table on top of nvlist to speed-up operations 2018-07-30 11:30:03 -07:00
gzip.c Update build system and packaging 2018-05-29 16:00:33 -07:00
hkdf.c Encryption patch follow-up 2017-10-11 16:54:48 -04:00
lz4.c Fix LZ4_uncompress_unknownOutputSize caused panic 2017-05-19 13:45:46 -07:00
lzjb.c Change KM_PUSHPAGE -> KM_SLEEP 2015-01-16 14:41:26 -08:00
Makefile.in OpenZFS 9166 - zfs storage pool checkpoint 2018-06-26 10:07:42 -07:00
metaslab.c OpenZFS 9238 - ZFS Spacemap Encoding V2 2018-07-05 12:02:34 -07:00
mmp.c Update build system and packaging 2018-05-29 16:00:33 -07:00
multilist.c Update build system and packaging 2018-05-29 16:00:33 -07:00
pathname.c Update build system and packaging 2018-05-29 16:00:33 -07:00
policy.c Take user namespaces into account in policy checks 2018-03-07 15:40:42 -08:00
qat_compress.c Fix inst_num overflow in qat_crypt.c 2018-05-01 20:44:24 -07:00
qat_crypt.c Fix inst_num overflow in qat_crypt.c 2018-05-01 20:44:24 -07:00
qat.c SHA256 QAT acceleration 2018-03-15 10:53:58 -07:00
qat.h Resolve QAT issues with incompressible data 2018-03-29 17:40:34 -07:00
range_tree.c OpenZFS 9166 - zfs storage pool checkpoint 2018-06-26 10:07:42 -07:00
refcount.c Linux 4.11 compat: avoid refcount_t name conflict 2017-02-28 16:10:18 -08:00
rrwlock.c Fix spelling 2017-01-03 11:31:18 -06:00
sa.c Fix kernel unaligned access on sparc64 2018-07-11 13:10:40 -07:00
sha256.c SHA256 QAT acceleration 2018-03-15 10:53:58 -07:00
skein_zfs.c DLPX-44812 integrate EP-220 large memory scalability 2016-11-29 14:34:27 -08:00
spa_boot.c Add linux kernel module support 2010-08-31 13:41:58 -07:00
spa_checkpoint.c OpenZFS 9238 - ZFS Spacemap Encoding V2 2018-07-05 12:02:34 -07:00
spa_config.c OpenZFS 9591 - ms_shift can be incorrectly changed 2018-06-21 09:35:26 -07:00
spa_errlog.c Update build system and packaging 2018-05-29 16:00:33 -07:00
spa_history.c Update build system and packaging 2018-05-29 16:00:33 -07:00
spa_misc.c OpenZFS 9166 - zfs storage pool checkpoint 2018-06-26 10:07:42 -07:00
spa_stats.c Add pool state /proc entry, "SUSPENDED" pools 2018-06-06 09:33:54 -07:00
spa.c OpenZFS 9166 - zfs storage pool checkpoint 2018-06-26 10:07:42 -07:00
space_map.c OpenZFS 9442 - decrease indirect block size of spacemaps 2018-07-25 14:11:35 -07:00
space_reftree.c OpenZFS 7614, 9064 - zfs device evacuation/removal 2018-04-14 12:16:17 -07:00
THIRDPARTYLICENSE.cityhash OpenZFS 8484 - Implement aggregate sum and use for arc counters 2018-06-06 09:35:59 -07:00
THIRDPARTYLICENSE.cityhash.descrip OpenZFS 8484 - Implement aggregate sum and use for arc counters 2018-06-06 09:35:59 -07:00
trace.c OpenZFS 7614, 9064 - zfs device evacuation/removal 2018-04-14 12:16:17 -07:00
txg.c OpenZFS 9464 - txg_kick() fails to see that we are quiescing 2018-06-04 14:56:06 -07:00
uberblock.c OpenZFS 9166 - zfs storage pool checkpoint 2018-06-26 10:07:42 -07:00
unique.c Performance optimization of AVL tree comparator functions 2016-08-31 14:35:34 -07:00
vdev_cache.c Update build system and packaging 2018-05-29 16:00:33 -07:00
vdev_disk.c Add support for autoexpand property 2018-07-23 15:40:15 -07:00
vdev_file.c OpenZFS 7614, 9064 - zfs device evacuation/removal 2018-04-14 12:16:17 -07:00
vdev_indirect_births.c Update build system and packaging 2018-05-29 16:00:33 -07:00
vdev_indirect_mapping.c OpenZFS 9238 - ZFS Spacemap Encoding V2 2018-07-05 12:02:34 -07:00
vdev_indirect.c OpenZFS 9238 - ZFS Spacemap Encoding V2 2018-07-05 12:02:34 -07:00
vdev_label.c OpenZFS 9166 - zfs storage pool checkpoint 2018-06-26 10:07:42 -07:00
vdev_mirror.c Update build system and packaging 2018-05-29 16:00:33 -07:00
vdev_missing.c OpenZFS 7614, 9064 - zfs device evacuation/removal 2018-04-14 12:16:17 -07:00
vdev_queue.c Fix zio->io_priority failed (7 < 6) assert 2018-05-29 18:13:48 -07:00
vdev_raidz_math_aarch64_neon_common.h ABD raidz NEON support 2016-11-29 14:34:33 -08:00
vdev_raidz_math_aarch64_neon.c codebase style improvements for OpenZFS 6459 port 2017-01-22 13:25:40 -08:00
vdev_raidz_math_aarch64_neonx2.c ABD raidz NEON support 2016-11-29 14:34:33 -08:00
vdev_raidz_math_avx2.c ABD raidz avx512f support 2016-11-29 14:34:33 -08:00
vdev_raidz_math_avx512bw.c ABD: Adapt avx512bw raidz assembly 2016-12-15 17:31:33 -08:00
vdev_raidz_math_avx512f.c Use cstyle -cpP in make cstyle check 2016-12-12 10:46:26 -08:00
vdev_raidz_math_impl.h codebase style improvements for OpenZFS 6459 port 2017-01-22 13:25:40 -08:00
vdev_raidz_math_scalar.c ABD Vectorized raidz 2016-11-29 14:34:33 -08:00
vdev_raidz_math_sse2.c ABD raidz avx512f support 2016-11-29 14:34:33 -08:00
vdev_raidz_math_ssse3.c codebase style improvements for OpenZFS 6459 port 2017-01-22 13:25:40 -08:00
vdev_raidz_math.c Update build system and packaging 2018-05-29 16:00:33 -07:00
vdev_raidz.c OpenZFS 7614, 9064 - zfs device evacuation/removal 2018-04-14 12:16:17 -07:00
vdev_removal.c OpenZFS 9166 - zfs storage pool checkpoint 2018-06-26 10:07:42 -07:00
vdev_root.c OpenZFS 9075 - Improve ZFS pool import/load process and corrupted pool recovery 2018-05-08 21:35:27 -07:00
vdev.c Add support for autoexpand property 2018-07-23 15:40:15 -07:00
zap_leaf.c OpenZFS 9328 - zap code can take advantage of c99 2018-05-31 10:53:11 -07:00
zap_micro.c OpenZFS 9329 - panic in zap_leaf_lookup() due to concurrent zapification 2018-05-31 10:53:49 -07:00
zap.c OpenZFS 9328 - zap code can take advantage of c99 2018-05-31 10:53:11 -07:00
zcp_get.c Fix coverity defects: zfs channel programs 2018-02-20 11:19:42 -08:00
zcp_global.c OpenZFS 8600 - ZFS channel programs - snapshot 2018-02-08 15:29:24 -08:00
zcp_iter.c OpenZFS 9337 - zfs get all is slow due to uncached metadata 2018-07-12 10:49:27 -07:00
zcp_synctask.c OpenZFS 9166 - zfs storage pool checkpoint 2018-06-26 10:07:42 -07:00
zcp.c OpenZFS 9424 - ztest failure: "unprotected error in call to Lua API (Invalid value type 'function' for key 'error')" 2018-07-10 21:29:23 -07:00
zfeature.c Undo c89 workarounds to match with upstream 2017-11-04 13:25:13 -07:00
zfs_acl.c Update build system and packaging 2018-05-29 16:00:33 -07:00
zfs_byteswap.c Update build system and packaging 2018-05-29 16:00:33 -07:00
zfs_ctldir.c Linux 4.18 compat: inode timespec -> timespec64 2018-06-19 21:51:18 -07:00
zfs_debug.c enable zfs_dbgmsg() by default, without dprintf() 2018-03-21 15:37:32 -07:00
zfs_dir.c Update build system and packaging 2018-05-29 16:00:33 -07:00
zfs_fm.c Update build system and packaging 2018-05-29 16:00:33 -07:00
zfs_fuid.c Update build system and packaging 2018-05-29 16:00:33 -07:00
zfs_ioctl.c OpenZFS 8906 - uts: illumos rootfs should support salted cksum 2018-07-27 08:35:28 -07:00
zfs_log.c Update build system and packaging 2018-05-29 16:00:33 -07:00
zfs_onexit.c Update build system and packaging 2018-05-29 16:00:33 -07:00
zfs_ratelimit.c Change checksum & IO delay ratelimit values 2018-03-04 17:34:51 -08:00
zfs_replay.c Update build system and packaging 2018-05-29 16:00:33 -07:00
zfs_rlock.c Update build system and packaging 2018-05-29 16:00:33 -07:00
zfs_sa.c Project Quota on ZFS 2018-02-13 14:54:54 -08:00
zfs_vfsops.c OpenZFS 9337 - zfs get all is slow due to uncached metadata 2018-07-12 10:49:27 -07:00
zfs_vnops.c Linux 4.18 compat: inode timespec -> timespec64 2018-06-19 21:51:18 -07:00
zfs_znode.c Linux 4.18 compat: inode timespec -> timespec64 2018-06-19 21:51:18 -07:00
zil.c OpenZFS 9456 - ztest failure in zil_commit_waiter_timeout 2018-07-10 10:25:14 -07:00
zio_checksum.c Undo c89 workarounds to match with upstream 2017-11-04 13:25:13 -07:00
zio_compress.c Update build system and packaging 2018-05-29 16:00:33 -07:00
zio_crypt.c Update build system and packaging 2018-05-29 16:00:33 -07:00
zio_inject.c Update build system and packaging 2018-05-29 16:00:33 -07:00
zio.c OpenZFS 9166 - zfs storage pool checkpoint 2018-06-26 10:07:42 -07:00
zle.c Fix zle_decompress out of bound access 2018-02-09 10:08:05 -08:00
zpl_ctldir.c RHEL 7.5 compat: FMODE_KABI_ITERATE 2018-05-02 15:01:24 -07:00
zpl_export.c Use cstyle -cpP in make cstyle check 2016-12-12 10:46:26 -08:00
zpl_file.c Update build system and packaging 2018-05-29 16:00:33 -07:00
zpl_inode.c Linux 4.18 compat: inode timespec -> timespec64 2018-06-19 21:51:18 -07:00
zpl_super.c Fix zpl_mount() deadlock 2018-07-11 15:49:10 -07:00
zpl_xattr.c Update build system and packaging 2018-05-29 16:00:33 -07:00
zrlock.c Update build system and packaging 2018-05-29 16:00:33 -07:00
zthr.c OpenZFS 9166 - zfs storage pool checkpoint 2018-06-26 10:07:42 -07:00
zvol.c Linux compat 4.18: check_disk_size_change() 2018-06-15 15:05:21 -07:00