Go to file
Serapheim Dimitropoulos 4dcc2bde9c
Stop ganging due to past vdev write errors
= Problem

While examining a customer's system we noticed unreasonable space
usage from a few snapshots due to gang blocks. Under some further
analysis we discovered that the pool would create gang blocks because
all its disks had non-zero write error counts and they'd be skipped
for normal metaslab allocations due to the following if-clause in
`metaslab_alloc_dva()`:
```
	/*
	 * Avoid writing single-copy data to a failing,
	 * non-redundant vdev, unless we've already tried all
	 * other vdevs.
	 */
	if ((vd->vdev_stat.vs_write_errors > 0 ||
	    vd->vdev_state < VDEV_STATE_HEALTHY) &&
	    d == 0 && !try_hard && vd->vdev_children == 0) {
		metaslab_trace_add(zal, mg, NULL, psize, d,
		    TRACE_VDEV_ERROR, allocator);
		goto next;
	}
```

= Proposed Solution

Get rid of the predicate in the if-clause that checks the past
write errors of the selected vdev. We still try to allocate from
HEALTHY vdevs anyway by checking vdev_state so the past write
errors doesn't seem to help us (quite the opposite - it can cause
issues in long-lived pools like the one from our customer).

= Testing

I first created a pool with 3 vdevs:
```
$ zpool list -v volpool
NAME        SIZE  ALLOC   FREE
volpool    22.5G   117M  22.4G
  xvdb     7.99G  40.2M  7.46G
  xvdc     7.99G  39.1M  7.46G
  xvdd     7.99G  37.8M  7.46G
```

And used `zinject` like so with each one of them:
```
$ sudo zinject -d xvdb -e io -T write -f 0.1 volpool
```

And got the vdevs to the following state:
```
$ zpool status volpool
  pool: volpool
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.
...<cropped>..
action: Determine if the device needs to be replaced, and clear the
...<cropped>..
config:

	NAME        STATE     READ WRITE CKSUM
	volpool     ONLINE       0     0     0
	  xvdb      ONLINE       0     1     0
	  xvdc      ONLINE       0     1     0
	  xvdd      ONLINE       0     4     0

```

I also double-checked their write error counters with sdb:
```
sdb> spa volpool | vdev | member vdev_stat.vs_write_errors
(uint64_t)0  # <---- this is the root vdev
(uint64_t)2
(uint64_t)1
(uint64_t)1
```

Then I checked that I the problem was reproduced in my VM as I the
gang count was growing in zdb as I was writting more data:
```
$ sudo zdb volpool | grep gang
        ganged count:              1384

$ sudo zdb volpool | grep gang
        ganged count:              1393

$ sudo zdb volpool | grep gang
        ganged count:              1402

$ sudo zdb volpool | grep gang
        ganged count:              1414
```

Then I updated my bits with this patch and the gang count stayed the
same.

Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Closes #14003
2022-10-11 12:27:41 -07:00
.github CI: revert --with-config=dist to hotfix Ubuntu 20.04 2022-09-14 16:26:57 -07:00
cmd zvol_wait logic may terminate prematurely 2022-10-11 12:12:04 -07:00
config zed: mark disks as REMOVED when they are removed 2022-09-28 09:48:46 -07:00
contrib PAM: Fix unchecked return value from zfs_key_config_load() 2022-10-05 17:09:24 -07:00
etc Update zfs-mount to load before fstab, matches systemd service. 2022-09-26 17:11:43 -07:00
include Expose libzutil error info in libpc_handle_t 2022-10-04 09:54:35 -07:00
lib Handle possible null pointers from malloc/strdup/strndup() 2022-10-06 17:18:40 -07:00
man Bring per_txg_dirty_frees_percent back to 30 2022-09-27 17:38:03 -07:00
module Stop ganging due to past vdev write errors 2022-10-11 12:27:41 -07:00
rpm Add zilstat script to report zil kstats in a user friendly manner 2022-09-02 13:24:07 -07:00
scripts Replace dead opensolaris.org license link 2022-07-11 14:16:13 -07:00
tests Handle possible null pointers from malloc/strdup/strndup() 2022-10-06 17:18:40 -07:00
udev Replace dead opensolaris.org license link 2022-07-11 14:16:13 -07:00
.editorconfig Add an .editorconfig; document git whitespace settings 2020-01-27 13:32:52 -08:00
.gitignore autoconf: use include directives instead of recursing down cmd 2022-05-10 10:18:38 -07:00
.gitmodules .gitmodules: link to openzfs github repository 2021-04-12 09:37:23 -07:00
AUTHORS Introduce BLAKE3 checksums as an OpenZFS feature 2022-06-08 15:55:57 -07:00
autogen.sh autogen.sh: paper over automake <1.14's lack of %reldir% support 2022-05-10 10:20:46 -07:00
CODE_OF_CONDUCT.md Replace ZFS on Linux references with OpenZFS 2020-10-08 20:10:13 -07:00
configure.ac Replace dead opensolaris.org license link 2022-07-11 14:16:13 -07:00
copy-builtin copy-builtin: add hooks with sed/>> 2022-05-10 10:17:43 -07:00
COPYRIGHT Fix typos 2020-06-09 21:24:09 -07:00
LICENSE Update build system and packaging 2018-05-29 16:00:33 -07:00
Makefile.am Replace EXTRA_DIST with dist_noinst_DATA 2022-05-26 09:24:50 -07:00
META Linux 5.19 compat: META 2022-08-02 10:04:38 -07:00
NEWS Fix NEWS file 2020-08-26 21:44:41 -07:00
NOTICE Update build system and packaging 2018-05-29 16:00:33 -07:00
README.md README: Update OpenZFS website url 2022-01-06 16:25:01 -08:00
RELEASES.md Add RELEASES.md file 2021-04-02 16:33:40 -07:00
TEST Remove CI builder customization from TEST 2020-03-16 10:46:03 -07:00
zfs.release.in Move zfs.release generation to configure step 2012-07-12 12:22:51 -07:00

img

OpenZFS is an advanced file system and volume manager which was originally developed for Solaris and is now maintained by the OpenZFS community. This repository contains the code for running OpenZFS on Linux and FreeBSD.

codecov coverity

Official Resources

Installation

Full documentation for installing OpenZFS on your favorite operating system can be found at the Getting Started Page.

Contribute & Develop

We have a separate document with contribution guidelines.

We have a Code of Conduct.

Release

OpenZFS is released under a CDDL license. For more details see the NOTICE, LICENSE and COPYRIGHT files; UCRL-CODE-235197

Supported Kernels

  • The META file contains the officially recognized supported Linux kernel versions.
  • Supported FreeBSD versions are any supported branches and releases starting from 12.2-RELEASE.