mirror_zfs

mirror of https://git.proxmox.com/git/mirror_zfs.git synced 2026-05-22 02:27:36 +03:00

Serapheim Dimitropoulos 37d5a3e04b Stop ganging due to past vdev write errors

= Problem

While examining a customer's system we noticed unreasonable space
usage from a few snapshots due to gang blocks. On further analysis
we discovered that the pool was creating gang blocks because all its
disks had non-zero write error counts, so they were skipped for
normal metaslab allocations by the following if-clause in
`metaslab_alloc_dva()`:
```
	/*
	 * Avoid writing single-copy data to a failing,
	 * non-redundant vdev, unless we've already tried all
	 * other vdevs.
	 */
	if ((vd->vdev_stat.vs_write_errors > 0 ||
	    vd->vdev_state < VDEV_STATE_HEALTHY) &&
	    d == 0 && !try_hard && vd->vdev_children == 0) {
		metaslab_trace_add(zal, mg, NULL, psize, d,
		    TRACE_VDEV_ERROR, allocator);
		goto next;
	}
```

= Proposed Solution

Remove the predicate in the if-clause that checks the past write
errors of the selected vdev. We still prefer allocating from HEALTHY
vdevs by checking vdev_state, so the past write error count doesn't
seem to help us (quite the opposite: it can cause issues in
long-lived pools like the one from our customer).
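
As an illustration only, the effect of dropping the write-error check can be sketched with a stand-alone predicate. The struct, enum, and function names below (`vdev_sketch`, `SKETCH_HEALTHY`, `skip_before`, `skip_after`) are hypothetical stand-ins and not the actual OpenZFS types; the field comments note which real fields from the if-clause quoted above they are meant to mirror. This is a sketch of the logic, not the patch itself.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the vdev states; only the ordering matters. */
enum { SKETCH_DEGRADED = 0, SKETCH_HEALTHY = 1 };

/* Hypothetical, simplified stand-in for the fields the if-clause reads. */
struct vdev_sketch {
	uint64_t write_errors;	/* mirrors vdev_stat.vs_write_errors */
	int state;		/* mirrors vdev_state */
	uint64_t children;	/* mirrors vdev_children */
};

/* Predicate as quoted above: any past write error causes a skip. */
static bool
skip_before(const struct vdev_sketch *vd, int d, bool try_hard)
{
	return ((vd->write_errors > 0 || vd->state < SKETCH_HEALTHY) &&
	    d == 0 && !try_hard && vd->children == 0);
}

/* Same predicate with the write-error check removed, as proposed. */
static bool
skip_after(const struct vdev_sketch *vd, int d, bool try_hard)
{
	return (vd->state < SKETCH_HEALTHY &&
	    d == 0 && !try_hard && vd->children == 0);
}
```

Under these assumptions, a HEALTHY leaf vdev with a single past write error is skipped by `skip_before()` on the first DVA but not by `skip_after()`, while a vdev whose state is below HEALTHY is still skipped by both.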

= Testing

I first created a pool with 3 vdevs:
```
$ zpool list -v volpool
NAME        SIZE  ALLOC   FREE
volpool    22.5G   117M  22.4G
  xvdb     7.99G  40.2M  7.46G
  xvdc     7.99G  39.1M  7.46G
  xvdd     7.99G  37.8M  7.46G
```

Then I used `zinject` on each one of them like so:
```
$ sudo zinject -d xvdb -e io -T write -f 0.1 volpool
```

This got the vdevs into the following state:
```
$ zpool status volpool
  pool: volpool
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.
...<cropped>..
action: Determine if the device needs to be replaced, and clear the
...<cropped>..
config:

	NAME        STATE     READ WRITE CKSUM
	volpool     ONLINE       0     0     0
	  xvdb      ONLINE       0     1     0
	  xvdc      ONLINE       0     1     0
	  xvdd      ONLINE       0     4     0

```

I also double-checked their write error counters with sdb:
```
sdb> spa volpool | vdev | member vdev_stat.vs_write_errors
(uint64_t)0  # <---- this is the root vdev
(uint64_t)2
(uint64_t)1
(uint64_t)1
```

Then I checked that the problem was reproduced in my VM, as the
gang count in zdb kept growing while I was writing more data:
```
$ sudo zdb volpool | grep gang
        ganged count:              1384

$ sudo zdb volpool | grep gang
        ganged count:              1393

$ sudo zdb volpool | grep gang
        ganged count:              1402

$ sudo zdb volpool | grep gang
        ganged count:              1414
```

Then I updated my bits with this patch and the gang count stayed the
same.

Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Closes #14003

2022-11-01 12:36:25 -07:00

abd.c

Avoid small buffer copying on write

2022-07-26 10:10:37 -07:00

aggsum.c

More aggsum optimizations

2021-06-09 13:05:34 -07:00

arc.c

Add Module Parameter Regarding Log Size Limit

2022-09-21 16:12:14 -07:00

blkptr.c

Add zstd support to zfs

2020-08-20 10:30:06 -07:00

bplist.c

Fast Clone Deletion

2019-07-26 10:54:14 -07:00

bpobj.c

module: zfs: fix unused, remove argsused

2022-02-16 17:58:56 -08:00

bptree.c

module: zfs: fix unused, remove argsused

2022-02-16 17:58:56 -08:00

bqueue.c

zfs recv hangs if max recordsize is less than received recordsize

2022-09-19 09:39:07 -07:00

btree.c

Add zfs_btree_verify_intensity kernel module parameter

2022-09-21 13:15:51 -07:00

dataset_kstats.c

Introduce write-mostly sums

2021-06-09 13:05:34 -07:00

dbuf_stats.c

Revert "Reduce dbuf_find() lock contention"

2022-09-21 13:15:51 -07:00

dbuf.c

Revert "Reduce dbuf_find() lock contention"

2022-09-21 13:15:51 -07:00

ddt_zap.c

Refactor dnode dirty context from dbuf_dirty

2020-02-26 16:09:17 -08:00

ddt.c

Tinker with slop space accounting with dedup

2021-09-14 12:38:05 -07:00

dmu_diff.c

module: zfs: fix unused, remove argsused

2022-02-16 17:58:56 -08:00

dmu_object.c

Introduce CPU_SEQID_UNSTABLE

2020-11-02 11:51:12 -08:00

dmu_objset.c

Add options to zfs redundant_metadata property

2022-11-01 12:25:58 -07:00

dmu_recv.c

Receive checks should allow unencrypted child datasets

2022-02-16 17:58:55 -08:00

dmu_redact.c

Fix incorrect size given to bqueue_enqueue() call in dmu_redact.c

2022-09-21 13:15:51 -07:00

dmu_send.c

module: zfs: fix unused, remove argsused

2022-02-16 17:58:56 -08:00

dmu_traverse.c

module: zfs: fix unused, remove argsused

2022-02-16 17:58:56 -08:00

dmu_tx.c

Refactor Log Size Limit

2022-09-26 14:55:27 -07:00

dmu_zfetch.c

More speculative prefetcher improvements

2022-07-26 10:10:37 -07:00

dmu.c

Bring per_txg_dirty_frees_percent back to 30

2022-11-01 12:32:40 -07:00

dnode_sync.c

Report dnodes with faulty bonuslen

2022-02-16 17:58:55 -08:00

dnode.c

module: zfs: fix unused, remove argsused

2022-02-16 17:58:56 -08:00

dsl_bookmark.c

Fix -Wattribute-warning in dsl layer

2022-07-27 13:38:56 -07:00

dsl_crypt.c

Introduce a flag to skip comparing the local mac when raw sending

2022-02-04 16:14:56 -08:00

dsl_dataset.c

Remove unneeded "extern inline" function declarations

2022-02-16 17:58:56 -08:00

dsl_deadlist.c

Fix panic in dsl_process_sub_livelist for EINTR

2022-11-01 12:34:08 -07:00

dsl_deleg.c

Reduce loaded range tree memory usage

2019-10-09 10:36:03 -07:00

dsl_destroy.c

module: zfs: fix unused, remove argsused

2022-02-16 17:58:56 -08:00

dsl_dir.c

Fix ENOSPC when unlinking multiple files from full pool

2022-03-08 11:46:03 -08:00

dsl_pool.c

Refactor Log Size Limit

2022-09-26 14:55:27 -07:00

dsl_prop.c

Add options to zfs redundant_metadata property

2022-11-01 12:25:58 -07:00

dsl_scan.c

Fix scrub resume from newly created hole.

2022-07-26 10:10:37 -07:00

dsl_synctask.c

module: zfs: fix unused, remove argsused

2022-02-16 17:58:56 -08:00

dsl_userhold.c

Replace sprintf()->snprintf() and strcpy()->strlcpy()

2020-06-07 11:42:12 -07:00

edonr_zfs.c

Add include files for prototypes

2020-06-18 12:21:25 -07:00

fm.c

module: zfs: fix unused, remove argsused

2022-02-16 17:58:56 -08:00

gzip.c

module: zfs: fix unused, remove argsused

2022-02-16 17:58:56 -08:00

hkdf.c

…

lz4.c

module: zfs: fix unused, remove argsused

2022-02-16 17:58:56 -08:00

lzjb.c

module: zfs: fix unused, remove argsused

2022-02-16 17:58:56 -08:00

Makefile.in

Distributed Spare (dRAID) Feature

2020-11-13 13:51:51 -08:00

metaslab.c

Stop ganging due to past vdev write errors

2022-11-01 12:36:25 -07:00

mmp.c

Optimize small random numbers generation

2021-09-14 12:10:17 -07:00

multilist.c

Optimize small random numbers generation

2021-09-14 12:10:17 -07:00

objlist.c

Implement Redacted Send/Receive

2019-06-19 09:48:12 -07:00

pathname.c

Replace ZFS on Linux references with OpenZFS

2020-10-08 20:10:13 -07:00

range_tree.c

Several sorted scrub optimizations

2022-07-26 10:10:37 -07:00

refcount.c

Export minimal zfs_refcount interfaces

2022-04-06 10:29:00 -07:00

rrwlock.c

Rename refcount.h to zfs_refcount.h

2020-07-29 16:35:33 -07:00

sa.c

module: zfs: fix unused, remove argsused

2022-02-16 17:58:56 -08:00

sha256.c

module: zfs: fix unused, remove argsused

2022-02-16 17:58:56 -08:00

skein_zfs.c

Add include files for prototypes

2020-06-18 12:21:25 -07:00

spa_boot.c

Add include files for prototypes

2020-06-18 12:21:25 -07:00

spa_checkpoint.c

module: zfs: fix unused, remove argsused

2022-02-16 17:58:56 -08:00

spa_config.c

Cleaning up uio headers

2021-02-20 20:16:50 -08:00

spa_errlog.c

module: zfs: fix unused, remove argsused

2022-02-16 17:58:56 -08:00

spa_history.c

Annotated dprintf as printf-like

2021-06-24 13:12:36 -07:00

spa_log_spacemap.c

Improve log spacemap load time

2022-07-26 10:10:37 -07:00

spa_misc.c

Remove refcount from spa_config_*()

2022-07-26 10:10:37 -07:00

spa_stats.c

Remove pool io kstats

2021-06-10 10:50:16 -07:00

spa.c

Improve log spacemap load time

2022-07-26 10:10:37 -07:00

space_map.c

Optimize small random numbers generation

2021-09-14 12:10:17 -07:00

space_reftree.c

Reduce loaded range tree memory usage

2019-10-09 10:36:03 -07:00

THIRDPARTYLICENSE.cityhash

OpenZFS 8484 - Implement aggregate sum and use for arc counters

2018-06-06 09:35:59 -07:00

THIRDPARTYLICENSE.cityhash.descrip

OpenZFS 8484 - Implement aggregate sum and use for arc counters

2018-06-06 09:35:59 -07:00

txg.c

Optimize txg_kick() process (#12274)

2022-09-21 16:12:14 -07:00

uberblock.c

MMP interval and fail_intervals in uberblock

2019-03-21 12:47:57 -07:00

unique.c

Reduce loaded range tree memory usage

2019-10-09 10:36:03 -07:00

vdev_cache.c

Replace ASSERTV macro with compiler annotation

2019-12-05 12:37:00 -08:00

vdev_draid_rand.c

Distributed Spare (dRAID) Feature

2020-11-13 13:51:51 -08:00

vdev_draid.c

Improve too large physical ashift handling

2022-09-21 13:15:15 -07:00

vdev_indirect_births.c

Fixes: #8934 Large kmem_alloc

2019-07-10 15:54:49 -07:00

vdev_indirect_mapping.c

Replace ASSERTV macro with compiler annotation

2019-12-05 12:37:00 -08:00

vdev_indirect.c

module/zfs: vdev_indirect: vdev_indirect_repair: remove unused variable

2022-05-02 15:42:58 -07:00

vdev_initialize.c

module: zfs: fix unused, remove argsused

2022-02-16 17:58:56 -08:00

vdev_label.c

Use fallthrough macro

2021-11-02 09:50:30 -07:00

vdev_mirror.c

Improve too large physical ashift handling

2022-09-21 13:15:15 -07:00

vdev_missing.c

module: zfs: fix unused, remove argsused

2022-02-16 17:58:56 -08:00

vdev_queue.c

Avoid vq_lock drop in vdev_queue_aggregate()

2021-09-14 14:31:22 -07:00

vdev_raidz_math_aarch64_neon_common.h

FreeBSD: fix the build with Clang 11

2020-08-17 15:40:17 -07:00

vdev_raidz_math_aarch64_neon.c

Linux 5.0 compat: SIMD compatibility

2019-07-12 09:31:20 -07:00

vdev_raidz_math_aarch64_neonx2.c

Linux 5.0 compat: SIMD compatibility

2019-07-12 09:31:20 -07:00

vdev_raidz_math_avx2.c

FreeBSD: fix the build with Clang 11

2020-08-17 15:40:17 -07:00

vdev_raidz_math_avx512bw.c

Refactor ccompile.h to not include system headers

2020-07-25 20:09:50 -07:00

vdev_raidz_math_avx512f.c

FreeBSD: fix the build with Clang 11

2020-08-17 15:40:17 -07:00

vdev_raidz_math_impl.h

Distributed Spare (dRAID) Feature

2020-11-13 13:51:51 -08:00

vdev_raidz_math_powerpc_altivec_common.h

FreeBSD: fix the build with Clang 11

2020-08-17 15:40:17 -07:00

vdev_raidz_math_powerpc_altivec.c

Prefix zfs internal endian checks with _ZFS

2020-07-28 13:02:49 -07:00

vdev_raidz_math_scalar.c

Use fallthrough macro

2021-11-02 09:50:30 -07:00

vdev_raidz_math_sse2.c

FreeBSD: fix the build with Clang 11

2020-08-17 15:40:17 -07:00

vdev_raidz_math_ssse3.c

Refactor ccompile.h to not include system headers

2020-07-25 20:09:50 -07:00

vdev_raidz_math.c

Initialize parity blocks before RAID-Z reconstruction benchmarking

2021-09-14 14:32:16 -07:00

vdev_raidz.c

Improve too large physical ashift handling

2022-09-21 13:15:15 -07:00

vdev_rebuild.c

Fix sequential resilver drive failure race condition

2022-10-21 14:05:06 -07:00

vdev_removal.c

Improve log spacemap load time

2022-07-26 10:10:37 -07:00

vdev_root.c

Distributed Spare (dRAID) Feature

2020-11-13 13:51:51 -08:00

vdev_trim.c

module: zfs: fix unused, remove argsused

2022-02-16 17:58:56 -08:00

vdev.c

Improve too large physical ashift handling

2022-09-21 13:15:15 -07:00

zap_leaf.c

Remove unneeded "extern inline" function declarations

2022-02-16 17:58:56 -08:00

zap_micro.c

Remove unneeded "extern inline" function declarations

2022-02-16 17:58:56 -08:00

zap.c

Remove unneeded "extern inline" function declarations

2022-02-16 17:58:56 -08:00

zcp_get.c

Add include files for prototypes

2020-06-18 12:21:25 -07:00

zcp_global.c

OpenZFS 8600 - ZFS channel programs - snapshot

2018-02-08 15:29:24 -08:00

zcp_iter.c

Fix typos in module/zfs/

2019-09-02 17:56:41 -07:00

zcp_set.c

Support setting user properties in a channel program

2020-02-14 13:41:42 -08:00

zcp_synctask.c

module: zfs: fix unused, remove argsused

2022-02-16 17:58:56 -08:00

zcp.c

module: zfs: fix unused, remove argsused

2022-02-16 17:58:56 -08:00

zfeature.c

Throw const on some strings

2020-10-02 17:44:10 -07:00

zfs_byteswap.c

Mark functions as static

2020-06-18 12:20:38 -07:00

zfs_fm.c

fm: remove unused variables

2022-05-02 15:42:58 -07:00

zfs_fuid.c

Fix regression in POSIX mode behavior

2021-03-19 22:50:46 -07:00

zfs_ioctl.c

Delay ZFS_PROP_SHARESMB property to handle it for encrypted raw receive

2022-09-21 13:15:26 -07:00

zfs_log.c

Add Module Parameter Regarding Log Size Limit

2022-09-21 16:12:14 -07:00

zfs_onexit.c

file reference counts can get corrupted

2021-09-14 12:37:38 -07:00

zfs_quota.c

File incorrectly zeroed when receiving incremental stream that toggles -L

2020-06-09 10:41:01 -07:00

zfs_ratelimit.c

Change checksum & IO delay ratelimit values

2018-03-04 17:34:51 -08:00

zfs_replay.c

Use fallthrough macro

2021-11-02 09:50:30 -07:00

zfs_rlock.c

Add a "try" operation for range locks

2020-07-06 11:53:31 -07:00

zfs_sa.c

Extending FreeBSD UIO Struct

2021-01-20 21:27:30 -08:00

zfs_vnops.c

Revert behavior of 59eab109 on not-Linux

2022-08-02 10:05:14 -07:00

zil.c

module: zfs: fix unused, remove argsused

2022-02-16 17:58:56 -08:00

zio_checksum.c

module: zfs: fix unused, remove argsused

2022-02-16 17:58:56 -08:00

zio_compress.c

module: zfs: fix unused, remove argsused

2022-02-16 17:58:56 -08:00

zio_inject.c

Optimize small random numbers generation

2021-09-14 12:10:17 -07:00

zio.c

Fix scrub resume from newly created hole.

2022-07-26 10:10:37 -07:00

zle.c

Add include files for prototypes

2020-06-18 12:21:25 -07:00

zrlock.c

Remove dead code

2020-06-18 12:21:18 -07:00

zthr.c

Avoid memory allocations in the ARC eviction thread

2022-02-03 15:30:52 -08:00

zvol.c

Add Module Parameter Regarding Log Size Limit

2022-09-21 16:12:14 -07:00