Go to file
Serapheim Dimitropoulos 7bf4c97a36
Bypass metaslab throttle for removal allocations
Context:
We recently had a scenario where a customer with 2x10TB disks at 95+%
fragmentation and capacity, wanted to migrate their disks to a 2x20TB
setup. So they added the 2 new disks and submitted the removal of the
first 10TB disk.  The removal took a lot more than expected (order of
more than a week to 2 weeks vs a couple of days) and once it was done it
generated a huge indirect mappign table in RAM (~16GB vs expected ~1GB).

Root-Cause:
The removal code calls `metaslab_alloc_dva()` to allocate a new block
for each evacuating block in the removing device and it tries to batch
them into 16MB segments. If it can't find such a segment it tries for
8MBs, 4MBs, all the way down to 512 bytes.

In our scenario what would happen is that `metaslab_alloc_dva()` from
the removal thread pick the new devices initially but wouldn't allocate
from them because of throttling in their metaslab allocation queue's
depth (see `metaslab_group_allocatable()`) as these devices are new and
favored for most types of allocations because of their free space. So
then the removal thread would look at the old fragmented disk for
allocations and wouldn't find any contiguous space and finally retry
with a smaller allocation size until it would to the low KB range. This
caused a lot of small mappings to be generated blowing up the size of
the indirect table. It also wasted a lot of CPU while the removal was
active making everything slow.

This patch:
Make all allocations coming from the device removal thread bypass the
throttle checks. These allocations are not even counted in the metaslab
allocation queues anyway so why check them?

Side-Fix:
Allocations with METASLAB_DONT_THROTTLE in their flags would not be
accounted at the throttle queues but they'd still abide by the
throttling rules which seems wrong. This patch fixes this by checking
for that flag in `metaslab_group_allocatable()`. I did a quick check to
see where else this flag is used and it doesn't seem like this change
would cause issues.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Closes #14159
2022-12-09 10:48:33 -08:00
.github Retire Ubuntu 18.04 CI builder 2022-11-30 10:25:57 -08:00
cmd zdb: Handle theoretical buffer overflow when printing float 2022-12-08 14:15:15 -08:00
config autoconf: add support for openEuler 2022-12-02 17:39:48 -08:00
contrib Coverity Model Update 2022-11-29 10:00:56 -08:00
etc Correct multipathd.target to .service 2022-11-18 11:36:19 -08:00
include Fix the last two CFI callback prototype mismatches 2022-11-29 09:56:16 -08:00
lib Do not pass -1 to strerror() from zfs_send_cb_impl() 2022-12-08 14:15:27 -08:00
man zfs.4: Correct the default for zfs_vdev_async_write_max_active 2022-11-28 11:41:14 -08:00
module Bypass metaslab throttle for removal allocations 2022-12-09 10:48:33 -08:00
rpm rpm: add support for openEuler 2022-11-29 09:27:22 -08:00
scripts Ubuntu 22.04 integration: ShellCheck 2022-11-18 11:24:48 -08:00
tests ZTS: Add missing tests to Makefile.am 2022-12-07 17:26:33 -08:00
udev Replace dead opensolaris.org license link 2022-07-11 14:16:13 -07:00
.editorconfig Add an .editorconfig; document git whitespace settings 2020-01-27 13:32:52 -08:00
.gitignore autoconf: use include directives instead of recursing down cmd 2022-05-10 10:18:38 -07:00
.gitmodules .gitmodules: link to openzfs github repository 2021-04-12 09:37:23 -07:00
AUTHORS zfs_rename: support RENAME_* flags 2022-10-28 09:49:20 -07:00
autogen.sh Ubuntu 22.04 integration: ShellCheck 2022-11-18 11:24:48 -08:00
CODE_OF_CONDUCT.md Replace ZFS on Linux references with OpenZFS 2020-10-08 20:10:13 -07:00
configure.ac Replace dead opensolaris.org license link 2022-07-11 14:16:13 -07:00
copy-builtin copy-builtin: add hooks with sed/>> 2022-05-10 10:17:43 -07:00
COPYRIGHT Fix typos 2020-06-09 21:24:09 -07:00
LICENSE Update build system and packaging 2018-05-29 16:00:33 -07:00
Makefile.am Process script directory for all configs 2022-10-27 16:45:14 -07:00
META Linux 6.0 compat: META 2022-10-26 14:55:12 -07:00
NEWS Fix NEWS file 2020-08-26 21:44:41 -07:00
NOTICE Update build system and packaging 2018-05-29 16:00:33 -07:00
README.md README: Update OpenZFS website url 2022-01-06 16:25:01 -08:00
RELEASES.md Add RELEASES.md file 2021-04-02 16:33:40 -07:00
TEST Remove CI builder customization from TEST 2020-03-16 10:46:03 -07:00
zfs.release.in Move zfs.release generation to configure step 2012-07-12 12:22:51 -07:00

img

OpenZFS is an advanced file system and volume manager which was originally developed for Solaris and is now maintained by the OpenZFS community. This repository contains the code for running OpenZFS on Linux and FreeBSD.

codecov coverity

Official Resources

Installation

Full documentation for installing OpenZFS on your favorite operating system can be found at the Getting Started Page.

Contribute & Develop

We have a separate document with contribution guidelines.

We have a Code of Conduct.

Release

OpenZFS is released under a CDDL license. For more details see the NOTICE, LICENSE and COPYRIGHT files; UCRL-CODE-235197

Supported Kernels

  • The META file contains the officially recognized supported Linux kernel versions.
  • Supported FreeBSD versions are any supported branches and releases starting from 12.2-RELEASE.