Compare commits

..

86 Commits

Author SHA1 Message Date
Tony Hutter 2566592045 Tag zfs-2.2.4
META file and changelog updated.

Signed-off-by: Tony Hutter <hutter2@llnl.gov>
2024-04-30 10:01:15 -07:00
Alan Somers 3d4d61988a Fix updating the zvol_htable when renaming a zvol
When renaming a zvol, insert it into zvol_htable using the new name, not
the old name.  Otherwise some operations won't work.  For example,
"zfs set volsize" while the zvol is open.

Sponsored by:	Axcient
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alek Pinchuk <apinchuk@axcient.com>
Signed-off-by:	Alan Somers <asomers@FreeBSD.org>
Closes #16127
Closes #16128
2024-04-30 10:01:15 -07:00
Brian Behlendorf 61f3638a34 Add prefetch property
ZFS prefetch is currently governed by the zfs_prefetch_disable
tunable. However, this is a module-wide settings - if a specific
dataset benefits from prefetch, while others have issue with it,
an optimal solution does not exists.

This commit introduce the "prefetch" tri-state property, which enable
granular control (at dataset/volume level) for prefetching.

This patch does not remove the zfs_prefetch_disable, which remains
a system-wide switch for enable/disable prefetch. However, to avoid
duplication, it would be preferable to deprecate and then remove
the module tunable.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Signed-off-by: Gionatan Danti <g.danti@assyoma.it>
Co-authored-by: Gionatan Danti <g.danti@assyoma.it>
Closes #15237 
Closes #15436
2024-04-30 10:01:15 -07:00
Don Brady 706307445e vdev probe to slow disk can stall mmp write checker
Simplify vdev probes in the zio_vdev_io_done context to
avoid holding the spa config lock for a long duration.

Also allow zpool clear if no evidence of another host
is using the pool.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Don Brady <don.brady@klarasystems.com>
Closes #15839
2024-04-30 10:01:15 -07:00
Don Brady ea3f7c12a9 Extend import_progress kstat with a notes field
Detail the import progress of log spacemaps as they can take a very
long time.  Also grab the spa_note() messages to, as they provide
insight into what is happening

Sponsored-By: OpenDrives Inc.
Sponsored-By: Klara Inc.
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Don Brady <don.brady@klarasystems.com>
Co-authored-by: Allan Jude <allan@klarasystems.com>
Closes #15539
2024-04-29 17:45:53 -07:00
George Wilson 6f323353d2 Add ashift validation when adding devices to a pool
Currently, zpool add allows users to add top-level vdevs that have
different ashifts but doing so prevents users from being able to
perform a top-level vdev removal. Often times consumers may not realize
that they have mismatched ashifts until the top-level removal fails.

This feature adds ashift validation to the zpool add command and will
fail the operation if the sector size of the specified vdev does not
match the existing pool. This behavior can be disabled by using the -f
flag. In addition, new flags have been added to provide fine-grained
control to disable specific checks. These flags
are:

--allow-in-use
--allow-ashift-mismatch
--allow-replicaton-mismatch

The force flag will disable all of these checks.

Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Mark Maybee <mmaybee@delphix.com>
Signed-off-by: George Wilson <gwilson@delphix.com>
Closes #15509
2024-04-29 13:50:05 -07:00
Ameer Hamza b3b37b84e8 Fix arcstats for FreeBSD after zfetch support
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #16141
2024-04-29 13:50:05 -07:00
Ameer Hamza 4d17e200dd Add zfetch stats in arcstats
arc_summary also reports zfetch stats but it's inconvenient to monitor
contiguously incrementing numbers. Adding them in arcstats allows us to
observe streams more conveniently.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #16094
2024-04-29 13:50:05 -07:00
Dag-Erling Smørgrav 5972bb856c Use ASSERT0P() to check that a pointer is NULL.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Kay Pedersen <mail@mkwg.de>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Dag-Erling Smørgrav <des@FreeBSD.org>
Closes #15225
2024-04-29 13:50:05 -07:00
Tony Hutter ef3fea63eb GCC: Fixes for gcc 14 on Fedora 40
- Workaround dangling pointer in uu_list.c (#16124)
- Fix calloc() transposed arguments in zpool_vdev_os.c
- Make some temp variables unsigned to prevent triggering a
  '-Werror=alloc-size-larger-than' error.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #16124
Closes #16125
2024-04-29 13:50:05 -07:00
Brian Behlendorf 71216b91d2 Python 3.12 deprecated python3-distutils
As for python-3.12 the distutils package has been deprecated.
The latest ax_python_devel.m4 macro from the autoconf archive
has been updated accordingly so let's pull in the new version.

We can also drop the changes made to our customized version
to continue if the development version is not installed since
this functionality has been included upstream.

Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #16126
Closes #16129
2024-04-29 13:50:05 -07:00
Todd 284489893b zfs-kmod: fix empty rpm requires/conflicts
Fix an error in zfs-kmod.spec that causes kmod-zfs packages not to
include the correct RPM requires/conflicts relationships.  With this
change applied, RPM correctly no longer allows kmod-zfs & zfs-dkms
packages to be installed together.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Todd Seidelmann <18294602+seidelma@users.noreply.github.com>
Closes #16121
2024-04-29 13:50:05 -07:00
Seth Troisi 6581b17842 ZTS: user_namespace_004.ksh avoid error in cleanup if unsupported
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Seth Troisi <sethtroisi@google.com>
Closes #16114
2024-04-29 13:50:05 -07:00
Seth Troisi 51d3c23150 Add newline to two zpool messages
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Seth Troisi <sethtroisi@google.com>
Closes #16113
2024-04-29 13:50:05 -07:00
Tino Reichardt 16c223eec9 Do no use .cfi_negate_ra_state within the assembly on Arm64
Compiling openzfs on aarch64 with gcc-8 and gcc-9 is failing currently.
See issue #14965 for deeper context.

On platforms without pointer authentication, .cfi_negate_ra_state can be
defined to a no-op:
https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=gdb/aarch64-tdep.c#l1413

I have tested this on Arm64 FreeBSD 13.2 and AlmaLinux-8.

Reviewed-by: Andrew Turner <andrew.turner4@arm.com>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Closes #14965
Closes #15784
2024-04-29 13:50:05 -07:00
Andrew Turner 7aaf6ce9d8 Add the BTI elf note to the AArch64 SHA2 assembly
On ELF platforms there is a note to specify when an application or
library supports BTI. When linking one of these the linker needs
all input object files to have the note. If not it will not include
it in the output file.

Normally the compiler would generate it, but for assembly files we
need to do it our selves.

Add the note to the aarch64 sha256 and sha512 assembly files.

Tested by building with BTI enabled and using the -zbti-report=error
flag to lld that makes it an error if the note is missing.

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Andrew Turner <andrew.turner4@arm.com>
Closes #16086
2024-04-29 13:50:05 -07:00
Rob N 3f817debb4 AUTHORS: refresh with recent new contributors
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16079
2024-04-29 13:50:05 -07:00
Jason Lee 97889c037a return NULL at end of send_progress_thread
Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Jason Lee <jasonlee@lanl.gov>
Closes #16074
2024-04-29 13:50:05 -07:00
Maxim Filimonov 86b39b41a0 Fix locale-specific time
In `zpool status -t`, scrub date/time is reported using the C locale,
while trim time is reported using the current one. This is inconsistent.
This patch fixes that.

Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Maxim Filimonov <che@bein.link>
Closes #15878
Closes #15879
2024-04-29 13:50:05 -07:00
Pavel Snajdr 531572b590 Fix panics when truncating/deleting files
There's an union in dbuf_dirty_record_t; dr_brtwrite could evaluate
to B_TRUE if the dirty record is of another type than dl. Adding
more explicit dr type check before trying to access dr_brtwrite.

Fixes two similar panics:

[ 1373.806119] VERIFY0(db->db_level) failed (0 == 1)
[ 1373.807232] PANIC at dbuf.c:2549:dbuf_undirty()
[ 1373.814979]  dump_stack_lvl+0x71/0x90
[ 1373.815799]  spl_panic+0xd3/0x100 [spl]
[ 1373.827709]  dbuf_undirty+0x62a/0x970 [zfs]
[ 1373.829204]  dmu_buf_will_dirty_impl+0x1e9/0x5b0 [zfs]
[ 1373.831010]  dnode_free_range+0x532/0x1220 [zfs]
[ 1373.833922]  dmu_free_long_range+0x4e0/0x930 [zfs]
[ 1373.835277]  zfs_trunc+0x75/0x1e0 [zfs]
[ 1373.837958]  zfs_freesp+0x9b/0x470 [zfs]
[ 1373.847236]  zfs_setattr+0x161a/0x3500 [zfs]
[ 1373.855267]  zpl_setattr+0x125/0x320 [zfs]
[ 1373.856725]  notify_change+0x1ee/0x4a0
[ 1373.859207]  do_truncate+0x7f/0xd0
[ 1373.859968]  do_sys_ftruncate+0x28e/0x2e0
[ 1373.860962]  do_syscall_64+0x38/0x90
[ 1373.861751]  entry_SYSCALL_64_after_hwframe+0x6e/0xd8

[ 1822.381337] VERIFY0(db->db_level) failed (0 == 1)
[ 1822.382376] PANIC at dbuf.c:2549:dbuf_undirty()
[ 1822.389232]  dump_stack_lvl+0x71/0x90
[ 1822.389920]  spl_panic+0xd3/0x100 [spl]
[ 1822.399567]  dbuf_undirty+0x62a/0x970 [zfs]
[ 1822.400583]  dmu_buf_will_dirty_impl+0x1e9/0x5b0 [zfs]
[ 1822.401752]  dnode_free_range+0x532/0x1220 [zfs]
[ 1822.402841]  dmu_object_free+0x74/0x120 [zfs]
[ 1822.403869]  zfs_znode_delete+0x75/0x120 [zfs]
[ 1822.404906]  zfs_rmnode+0x3f6/0x7f0 [zfs]
[ 1822.405870]  zfs_inactive+0xa3/0x610 [zfs]
[ 1822.407803]  zpl_evict_inode+0x3e/0x90 [zfs]
[ 1822.408831]  evict+0xc1/0x1c0
[ 1822.409387]  do_unlinkat+0x147/0x300
[ 1822.410060]  __x64_sys_unlinkat+0x33/0x60
[ 1822.410802]  do_syscall_64+0x38/0x90
[ 1822.411458]  entry_SYSCALL_64_after_hwframe+0x6e/0xd8

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Pavel Snajdr <snajpa@snajpa.net>
Closes #15983
2024-04-29 13:50:05 -07:00
Alek P 74101f7e2a vdev props comment and manpage should include zfsd and FreeBSD mentions
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Alek Pinchuk <apinchuk@axcient.com>
Closes #15968
2024-04-29 13:50:05 -07:00
Don Brady c1c26a77ff Add slow disk diagnosis to ZED
Slow disk response times can be indicative of a failing drive. ZFS
currently tracks slow I/Os (slower than zio_slow_io_ms) and generates
events (ereport.fs.zfs.delay).  However, no action is taken by ZED,
like is done for checksum or I/O errors.  This change adds slow disk
diagnosis to ZED which is opt-in using new VDEV properties:
  VDEV_PROP_SLOW_IO_N
  VDEV_PROP_SLOW_IO_T

If multiple VDEVs in a pool are undergoing slow I/Os, then it skips
the zpool_vdev_degrade().

Sponsored-By: OpenDrives Inc.
Sponsored-By: Klara Inc.
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Rob Wing <rob.wing@klarasystems.com>
Signed-off-by: Don Brady <don.brady@klarasystems.com>
Closes #15469
2024-04-29 13:50:05 -07:00
Tony Hutter db65272aef [2.2.4-only] Stub RAIDZ enums to prevent conflicts
Stub in the RAIDZ expansions enums for now so that the slow IO
commit merges cleanly.

Signed-off-by: Tony Hutter <hutter2@llnl.gov>
2024-04-29 13:50:05 -07:00
Rob N da88fc4ac9 zap_leaf: make l_hash[] variable length to silence UBSAN
When UBSAN is active and OpenZFS is a debug build, the l_hash assert at
the bottom of zap_open_leaf() causes UBSAN to complain.

This follows the example in 786641dcf to shut it up.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #15964
2024-04-29 13:50:05 -07:00
Paul Dagnelie 889152ce4a Give a better message from 'zpool get' with invalid pool name
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Don Brady <don.brady@klarasystems.com>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #15942
2024-04-29 13:50:05 -07:00
Rob N 5d859a2e22 xdr: header cleanup
#16047 notes that include/os/freebsd/spl/rpc/xdr.h carried an
(apparently) incompatible license. While looking into it, it seems that
this file is actually unnecessary these days - FreeBSD's kernel XDR has
XDR_CONTROL, xdrmem_control and XDR_GET_BYTES_AVAIL, while userspace has
XDR_CONTROL and xdrmem_control, and our implementation of
XDR_GET_BYTES_AVAIL for libspl works nicely with it. So this removes
that file outright.

To keep the includes in nvpair.c tidy, I've made a few small adjustments
to the Linux headers. By definition, rpc/types.h provides bool_t and is
included before rpc/xdr.h, so I've created rpc/types.h for Linux. This
isn't necessary for userspace; both FreeBSD native and tirpc on Linux
already have these headers set up correctly.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #16047 
Closes #16051
2024-04-29 13:50:05 -07:00
Robert Evans e0cfa1592d Fix buffer underflow if sysfs file is empty
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Jason Lee <jasonlee@lanl.gov>
Signed-off-by: Robert Evans <evansr@google.com>
Closes #16028
Closes #16035
2024-04-29 13:50:05 -07:00
Robert Evans d088fb7d24 ZTS: fix flakiness in cp_files_002_pos
Fix RANDOM to not return zero.

Overwriting with `dd ... count=0` does not test anything.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Robert Evans <evansr@google.com>
Closes #16029
2024-04-29 13:50:05 -07:00
Cameron Harr 67995229a8 Fix option string, adding -e and fixing order
The recently added '-e' option (PR #15769) missed adding the
new option in the online `zpool status` help command. This
adds the options and reorders a couple of the other options
that were not listed alphabetically.

Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Cameron Harr <harr1@llnl.gov>
Closes #16008
2024-04-29 13:50:05 -07:00
Rob N 2ff09e8fed freebsd: fix missing headers in distribution tarball
arc_os.h and freebsd_event.h aren't included in release tarballs, so the
build fails on FreeBSD. This fixes it.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #15963
2024-04-29 13:50:05 -07:00
Brian Behlendorf 9f1d3db730 Check for minimum partition size
On Linux block devices used for vdevs will by partitioned.  The block
device must be large enough for an 64M partition starting at offset
of 2048 sectors (part1), and a second 64M reserved partition at the
end of the device (part9).

This commit adds a capacity check when creating the GPT label to
immediately detect a device which is too small.  With the existing
code this would be caught slightly latter when attempting to use
the partition.  Catching it sooner let's us print a more useful error.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #15898
2024-04-29 13:50:05 -07:00
Dag-Erling Smørgrav 5dda8c0910 Add VERIFY0P() and ASSERT0P() macros.
These macros are similar to VERIFY0() and ASSERT0() but are intended
for pointers, and therefore use uintptr_t instead of int64_t.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Kay Pedersen <mail@mkwg.de>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Dag-Erling Smørgrav <des@FreeBSD.org>
Closes #15225
2024-04-29 13:50:05 -07:00
Dag-Erling Smørgrav d6da6cbd74 Clean up existing VERIFY*() macros.
Chiefly:

- Remove unnecessary parentheses around variable names.
- Remove spaces between the type and variable in casts.
- Make the panic message for VERIFY0() reflect how the macro is used.
- Use %p to format pointers, except in Linux kernel code.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Kay Pedersen <mail@mkwg.de>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Dag-Erling Smørgrav <des@FreeBSD.org>
Closes #15225
2024-04-22 13:32:33 -07:00
Benda Xu 6732e223bf etc/init.d: decide which variant to use at build time.
Let Debian use the sysv-rc variant of the script, even when OpenRC is
installed. Unlike on Gentoo, OpenRC on Debian consumes both the
sysv-rc scripts and OpenRC ones. ZFS initscripts on Debian should be
the sysv-rc version to provide most compatibility and to integrate
with the rest of initscripts for dependency tracking.

Restrict the substitution in the Makefile to the dedicated list.

This construct is inspired by Mo Zhou's detection of the execution
shell and follows the strategy of Peter in 6ef28c526b.

As of 2024, the initscripts are mostly relevant on Debian, Gentoo and
their derivatives.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Benda Xu <orv@debian.org>
Issue #8063
Issue #8204
Issue #8359
Closes #15977
2024-04-22 09:28:06 -07:00
Benda Xu baaac31655 config/Substfiles.am: restrict to the dedicated list.
We recover the scope of $(SUBSTFILES) to explicitly control what files
are being generated from the corresponding .in.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Benda Xu <orv@debian.org>
Closes #15980
2024-04-22 09:28:06 -07:00
Shengqi Chen b0b0d07b13 man: move zfs_prepare_disk.8 to nodist_man_MANS
The commit b53077a added zfs_prepare_disk.8 to the wrong list
dist_man_MANS, in which @zfsexecdir@ will not be properly substituted.
This leads to wrong path in the manpage in generated release tarballs.

Reported-by: Benda Xu <orv@debian.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
Closes #15979
2024-04-22 09:28:06 -07:00
Umer Saleem 8a56047135 Add support for zfs mount -R <filesystem>
This commit adds support for mounting a dataset along with all of
it's children with '-R' flag for zfs mount. There can be scenarios
where we want to mount all datasets under one hierarchy instead of
mounting all datasets present on system with '-a' flag.

'-R' flag should work on all root and non-root datasets. Usage
information and man page has been updated for zfs mount. A test
for verifying the behavior for '-R' flag is also added.

Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
Closes #16015
2024-04-22 09:28:06 -07:00
Rob Norris 9a7ef02f4d Linux 6.9 compat: blk_alloc_disk() now takes two args
There's an extra nullable arg for queue limits. Detect it, and set it to
NULL. Similar change for blk_mq_alloc_disk(), now three args, same
treatment.

Error return now has error encoded in the return, so detect with
IS_ERR() and explicitly NULL our own return.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #16027
Closes #16033
2024-04-22 09:23:23 -07:00
Rob Norris 3bd7cd06b7 Linux 6.9 compat: bdev handles are now struct file
bdev_open_by_path() is replaced by bdev_file_open_by_path(), which
returns a plain old struct file*. Release function is gone entirely; the
regular file release function fput() will take care of the bdev
specifics.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #16027
Closes #16033
2024-04-22 09:23:23 -07:00
Rob N b9c3040b10 vdev_disk: clean up spa/bdev mode conversion
43e8f6e37 introduced a subtle API misuse, in that it passed the output
from vdev_bdev_mode() back into itself. Fortunately, the
SPA_MODE_(READ|WRITE) bit values exactly map to the FMODE_(READ|WRITE) &
BLK_OPEN_(READ|WRITE) bit values, so it didn't result in a bug, but it
was hard to read and understand, so I cleaned it up.

In doing so, I noticed that the only call to vdev_bdev_mode() without
the "exclusive" flag set was in that misuse, and actually, we never do a
non-exclusive blkdev_get_by_path(). So I've just made exclusive be
always-on.


Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #15995
2024-04-22 09:23:23 -07:00
Robert Evans 5dbed50429 Linux 5.18+ compat: Detect filemap_range_has_page
In v5.18 `filemap_range_has_page` moved to `pagemap.h`

`pagemap.h` has been around since 3.10 so just include both

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Robert Evans <evansr@google.com>
Closes #16034
2024-04-22 09:23:23 -07:00
Fabian-Gruenbichler 3fb0942cc5 udev: correctly handle partition #16 and later
If a zvol has more than 15 partitions, the minor device number exhausts
the slot count reserved for partitions next to the zvol itself. As a
result, the minor number cannot be used to determine the partition
number for the higher partition, and doing so results in wrong named
symlinks being generated by udev.

Since the partition number is encoded in the block device name anyway,
let's just extract it from there instead.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Closes #15904
Closes #15970
2024-04-22 09:23:23 -07:00
Fabian-Gruenbichler fa2cbd4007 zvols: prevent overflow of minor device numbers
currently, the linux kernel allows 2^20 minor devices per major device
number.  ZFS reserves blocks of 2^4 minors per zvol: 1 for the zvol
itself, the other 15 for the first partitions of that zvol. as a result,
only 2^16 such blocks are available for use.

there are no checks in place to avoid overflowing into the major device
number when more than 2^16 zvols are allocated (with volmode=dev or
default). instead of ignoring this limit, which comes with all sorts of
weird knock-on effects, detect this situation and simply fail allocating
the zvol block device early on.

without this safeguard, the kernel will reject the attempt to create an
already existing block device, but ZFS doesn't handle this error and
gets confused about which zvol occupies which minor slot, potentially
resulting in kernel NULL derefs and other issues later on.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Closes #16006
2024-04-22 09:23:23 -07:00
Tony Hutter bb9542a2a0 Linux 6.8 compat: META (#16099)
Update the META file to reflect compatibility with the 6.8 kernel.

Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <rob.norris@klarasystems.com>
2024-04-22 09:23:23 -07:00
Rob N 72e4996a54 bdev_discard_supported: understand discard_granularity=0
Kernel documentation for the discard_granularity property says:

    A discard_granularity of 0 means that the device does not support
    discard functionality.

Some older kernels had drivers (notably loop, but also some USB-SATA
adapters) that would set the QUEUE_FLAG_DISCARD capability flag, but
have discard_granularity=0. Since 5.10 (torvalds/linux@b35fd7422c) the
discard entry point blkdev_issue_discard() has had a check for this,
which would immediately reject the call with EOPNOTSUPP, and throw a
scary diagnostic message into the log. See #16068.

Since 6.8, the block layer sets a non-zero default for
discard_granularity (torvalds/linux@3c407dc723), and a future kernel
will remove the check entirely[1].

As such, there's no good reason for us to enable discard when
discard_granularity=0. The kernel will never let the request go in
anyway; better that we just disable it so we can report it properly to
the user.

1. https://patchwork.kernel.org/project/linux-block/patch/20240312144826.1045212-2-hch@lst.de/

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
(cherry picked from commit b181b2e604)
2024-04-19 10:19:53 -07:00
Alexander Motin 575872cc37 L2ARC: Relax locking during write
Previous code held ARC state sublist lock throughout all L2ARC
write process, which included number of allocations and even ZIO
issues.  Being blocked in any of those places the code could also
block ARC eviction, that could cause OOM activation or even dead-
lock if system is low on memory or one is too fragmented.

Fix it by dropping the lock as soon as we see a block eligible
for L2ARC writing and pick it up later using earlier inserted
marker.  While there, also reduce scope of hash lock, moving
ZIO allocation and other operations not requiring header access
out of it.  All operations requiring header access move under
hash lock, since L2_WRITING flag does not prevent header eviction
only transition to arc_l2c_only state with L1 header.

To be able to manipulate sublist lock and marker as needed add few
more multilist functions and modify one.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16040
2024-04-19 10:13:38 -07:00
Alexander Motin f4ce02ae42 Small fix to prefetch ranges aggregation
When after #16022 adding new range we aggregate more than two
existing ranges, that should be very rare, only if several streams
overlap, we may need to zero not the last range, but some earlier.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16072
2024-04-19 10:13:38 -07:00
Alexander Motin 97d7228f42 Remove db_state DB_NOFILL checks from syncing context
Syncing context should not depend on current state of dbuf, which
could already change several times in later transaction groups,
but rely solely on dirty record for the transaction group being
synced. Some of the checks seem already impossible, while instead
of others I think we should better check for absence of data in
the specific dirty record rather than DB_NOFILL.

Reviewed-by: Robert Evans <evansr@google.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16057
2024-04-19 10:13:38 -07:00
Alexander Motin 026fe79646 Speculative prefetch for reordered requests
Before this change speculative prefetcher was able to detect a stream
only if all of its accesses are perfectly sequential.  It was easy to
implement and is perfectly fine for single-threaded applications.
Unfortunately multi-threaded network servers, such as iSCSI, SMB or
NFS usually have plenty of threads and may often reorder requests,
preventing successful speculation and prefetch.

This change allows speculative prefetcher to detect streams even if
requests are reordered by introducing a list of 9 non-contiguous
ranges up to 16MB ahead of current stream position and filling the
gaps as more requests arrive.  It also allows stream to proceed
even with holes up to a certain configurable threshold (25%).

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16022
2024-04-19 10:13:38 -07:00
Alexander Motin 602b5dca7b Fix read errors race after block cloning
Investigating read errors triggering panic fixed in #16042 I've
found that we have a race in a sync process between the moment
dirty record for cloned block is removed and the moment dbuf is
destroyed.  If dmu_buf_hold_array_by_dnode() take a hold on a
cloned dbuf before it is synced/destroyed, then dbuf_read_impl()
may see it still in DB_NOFILL state, but without the dirty record.
Such case is not an error, but equivalent to DB_UNCACHED, since
the dbuf block pointer is already updated by dbuf_write_ready().
Unfortunately it is impossible to safely change the dbuf state
to DB_UNCACHED there, since there may already be another cloning
in progress, that dropped dbuf lock before creating a new dirty
record, protected only by the range lock.

Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Robert Evans <evansr@google.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16052
2024-04-19 10:13:38 -07:00
Alexander Motin d5fb6abd36 Improve dbuf_read() error reporting
Previous code reported non-ZIO errors only via return value, but
not via parent ZIO.  It could cause NULL-dereference panics due
to dmu_buf_hold_array_by_dnode() ignoring the return value,
relying solely on parent ZIO status.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Reported by:	Ameer Hamza <ahamza@ixsystems.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16042
2024-04-19 10:13:38 -07:00
Alexander Motin 39993c3dfe BRT: Check pool clone stats in more tests
This should allow to catch some leaks, if those happen.

While there fix some cosmetic issues.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #16007
2024-04-19 10:13:38 -07:00
Alexander Motin e3c1c9153f BRT: Fix tests to work on non-empty pools
It should not normally happen, but if it does, better to not fail
everything for no good reason, or it may be hard to debug.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #16007
2024-04-19 10:13:38 -07:00
Alexander Motin 2ea370a4e3 BRT: Fix holes cloning.
- When reading L0 block pointers handle buffers without ones and
without dirty records as a holes.  Those appear when dnode size
was increased, but the end was never written, so there are no new
indirection levels to store the pointers.  It makes no sense to
return EAGAIN here, since sync won't create new indirection levels
until there will be actual writes.
 - When cloning blocks set destination hole logical birth time
to the current TXG.  Otherwise if we are cloning over existing
data, newly created holes may not be properly replicated later.
Use BP_SET_BIRTH() when possible to not replicate its logic.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes #15994
Closes #16007
2024-04-19 10:13:38 -07:00
Alexander Motin 3e91a9c525 BRT: Skip getting length in brt_entry_lookup()
Unlike DDT, where ZAP values may have different lengths due to
compression, all BRT entries are identical 8-byte counters.  It
does not make sense to first fetch the length only to assert it.
zap_lookup_uint64() is specifically designed to work with counters
of different size and should return error if something odd found.
Calling it straight allows to save some measurable CPU time.

Reviewed-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15950
2024-04-19 10:13:38 -07:00
Alexander Motin c94f730078 BRT: Make BRT block sizes configurable
Similar to DDT make BRT data and indirect block sizes configurable
via module parameters.  I am not sure what would be the best yet,
but similar to DDT 4KB blocks kill all chances of compression on
vdev with ashift=12 or more, that on my tests reaches 3x.

While here, fix documentation for respective DDT parameters.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15967
2024-04-19 10:13:38 -07:00
Alexander Motin 457e62d7ca BRT: Relax brt_pending_apply() locking
Since brt_pending_apply() is running in syncing context, no other
brt_pending_tree accesses are possible for the TXG.  We don't need
to acquire brt_pending_lock here.

Reviewed-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15955
2024-04-19 10:13:38 -07:00
Alexander Motin 19bf54b764 ZAP: Massively switch to _by_dnode() interfaces
Before this change ZAP called dnode_hold() for almost every block
access, that was clearly visible in profiler under heavy load, such
as BRT.  This patch makes it always hold the dnode reference between
zap_lockdir() and zap_unlockdir().  It allows to avoid most of dnode
operations between those.  It also adds several new _by_dnode() APIs
to ZAP and uses them in BRT code.  Also adds dmu_prefetch_by_dnode()
variant and uses it in the ZAP code.

After this there remains only one call to dmu_buf_dnode_enter(),
which seems to be unneeded.  So remove the call and the functions.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15951
2024-04-19 10:13:38 -07:00
Alexander Motin fdd8c0aea1 BRT: Skip duplicate BRT prefetches
If there is a pending entry for this block, then we've already
issued BRT prefetch for it within this TXG, so don't do it again.
BRT vdev lookup and following zap_prefetch_uint64() call can be
pretty expensive and should be avoided when not necessary.

Reviewed-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15941
2024-04-19 10:13:38 -07:00
Alexander Motin dced953b62 ZAP: Some cleanups/micro-optimizations
- Remove custom zap_memset(), use regular memset().
- Use PANIC() instead of opaque cmn_err(CE_PANIC).
- Provide entry parameter to zap_leaf_rehash_entry().
- Reduce branching in zap_leaf_array_create() inner loop.
- Remove signedness where it should not be.

Should be no function changes.

Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15976
2024-04-19 10:13:38 -07:00
Alexander Motin f7c1db6366 BRT: Change brt_pending_tree sorting order
It does not look important how exactly brt_pending_tree is sorted.
When cloning large file, it is quite likely that all of its blocks
have identical physical birth times, so comparing them first does
not provide useful entropy, while accesses additional cache line.
In most cases combination of vdev and offset provides unique result
and physical birth time comparison is not even needed.  Meanwhile,
when traversing the tree inside brt_pending_apply(), it can be
beneficial for dbuf cache and CPU cache hits to group processing
by vdev and so by the per-VDEV BRT ZAPs.

Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15954
2024-04-19 10:13:38 -07:00
Alexander Motin fa5de0c5cd Update resume token at object receive.
Before this change resume token was updated only on data receive.
Usually it is enough to resume replication without much overlap.
But we've got a report of a curios case, where replication source
was traversed with recursive grep, which through enabled atime
modified every object without modifying any data.  It produced
several gigabytes of replication traffic without a single data
write and so without a single resume point.

While the resume token was not designed to resume from an object,
I've found that the send implementation always sends object before
any data. So by requesting resume from offset 0 we are effectively
resuming from the object, followed (or not) by the data at offset
0, just as we need it.

Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15927
2024-04-19 10:13:38 -07:00
Alexander Motin 793a2cff2a Linux: Cleanup taskq threads spawn/exit
This changes taskq_thread_should_stop() to limit maximum exit rate
for idle threads to one per 5 seconds.  I believe the previous one
was broken, not allowing any thread exits for tasks arriving more
than one at a time and so completing while others are running.

Also while there:
 - Remove taskq_thread_spawn() calls on task allocation errors.
 - Remove extra taskq_thread_should_stop() call.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15873
2024-04-19 10:13:38 -07:00
Alexander Motin fdd97e0093 Refactor dmu_prefetch().
- Split dmu_prefetch_dnode() from dmu_prefetch() into a separate
function.  It is quite inconvenient to read the code where len = 0
means dnode prefetch instead indirect/data prefetch.  One function
doing both has no benefits, since the code paths are independent.
 - Improve dmu_prefetch() handling of long block ranges.  Instead
of limiting L0 data length to prefetch for to dmu_prefetch_max,
make dmu_prefetch_max limit the actual amount of prefetch at the
specified level, and, if there is more, prefetch all the rest at
higher indirection level.  It should improve random access times
within the prefetched range of any length, reducing importance of
specific dmu_prefetch_max value.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15076
2024-04-19 10:13:38 -07:00
Alexander Motin 3b8817db96 ZIL: Update Linux tracing after #15635
While picking parts from #14909 I've missed Linux tracing specific
ones, that went unnoticed in default configurations, but breaks the
build in some.

Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15730
2024-04-19 10:13:38 -07:00
Alexander Motin 25ea8ce94b ZIL: Improve next log block size prediction
Track history in context of bursts, not individual log blocks. It
allows to not blow away all the history by single large burst of
many block, and same time allows optimizations covering multiple
blocks in a burst and even predicted following burst.  For each
burst account its optimal block size and minimal first block size.
Use that statistics from the last 8 bursts to predict first block
size of the next burst.

Remove predefined set of block sizes. Allocate any size we see fit,
multiple of 4KB, as required by ZIL now.  With compression enabled
by default, ZFS already writes pretty random block sizes, so this
should not surprise space allocator any more.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15635
2024-04-19 10:13:38 -07:00
Alexander Motin 8b1a132de7 ZIO: Optimize zio_flush()
- Generalize vdev_nowritecache handling by traversing through the
VDEV tree and skipping children ZIOs where not supported.
 - Remove intermediate zio_null() in case of several VDEV children.
 - Remove children handling from zio_ioctl().  There are no other
use cases for this code beside DKIOCFLUSHWRITECACHED, and would there
be, I doubt they would so straightforward apply to all VDEV children.

Comparing to removed previous optimization this should improve cases
of redundant ZILs/SLOGs.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15515
2024-04-19 10:13:38 -07:00
Alexander Motin 7ea8331009 ZIL: Detect single-threaded workloads
... by checking that previous block is fully written and flushed.
It allows to skip commit delays since we can give up on aggregation
in that case.  This removes zil_min_commit_timeout parameter, since
for single-threaded workloads it is not needed at all, while on very
fast devices even some multi-threaded workloads may get detected as
single-threaded and still bypass the wait.  To give multi-threaded
workloads more aggregation chances increase zfs_commit_timeout_pct
from 5 to 10%, as they should suffer less from additional latency.

Also single-threaded workloads detection allows in perspective better
prediction of the next block size.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Prakash Surya <prakash.surya@delphix.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15381
2024-04-19 10:13:38 -07:00
Rob N 3c5f354a8c zvol_os: fix compile with blk-mq on Linux 4.x
99741bde5 accesses a cached blk-mq hardware context through the mq_hctx
field of struct request. However, this field did not exist until 5.0.
Before that, the private function blk_mq_map_queue() was used to dig it
out of broader queue context. This commit detects this situation, and
handles it with a poor-man's simulation of that function.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16069
2024-04-17 10:10:24 -07:00
Rob N 5c0fe099ec zvol_os: fix build on Linux <3.13
99741bde5 introduced zvol_num_taskqs, but put it behind the HAVE_BLK_MQ
define, preventing builds on versions of Linux that don't have it
(<3.13, incl EL7).

Nothing about it seems dependent on blk-mq, so this just moves it out
from behind that define and so fixes the build.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16062
2024-04-17 10:10:24 -07:00
Ameer Hamza 5fc134ff2f zvol: use multiple taskq
Currently, zvol uses a single taskq, resulting in throughput bottleneck
under heavy load due to lock contention on the single taskq. This patch
addresses the performance bottleneck under heavy load conditions by
utilizing multiple taskqs, thus mitigating lock contention. The number
of taskqs scale dynamically based on the available CPUs in the system,
as illustrated below:

                taskq   total
cpus    taskqs  threads threads
------- ------- ------- -------
1       1       32       32
2       1       32       32
4       1       32       32
8       2       16       32
16      3       11       33
32      5       7        35
64      8       8        64
128     11      12       132
256     16      16       256

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #15992
2024-04-17 10:10:24 -07:00
Rob Norris 7ad2616d37 vdev_disk: fix alignment check when buffer has non-zero starting offset
If a linear buffer spans multiple pages, and the first page has a
non-zero starting offset, the checker would not include the offset, and
so would think there was an alignment gap at the end of the first page,
rather than at the start.

That is, for a 16K buffer spread across five pages with an initial 512B
offset:

    [.XXXXXXX][XXXXXXXX][XXXXXXXX][XXXXXXXX][XXXXXXX.]

It would be interpreted as:

    [XXXXXXX.][XXXXXXXX]...

And be rejected as misaligned.

Since it's already a linear ABD, the "linearising" copy would just reuse
the buffer as-is, and the second check would failing, tripping the
VERIFY in vdev_disk_io_rw().

This commit fixes all this by including the offset in the check for
end-of-page alignment.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
(cherry picked from commit 1bf649cb0a)
2024-04-12 08:53:48 -07:00
Rob N d0d9dccc61 vdev_disk: ensure trim errors are returned immediately
After 08fd5ccc3, the discard issuing code was organised such that if
requesting an async discard or secure erase failed before the IO was
issued (that is, calling __blkdev_issue_discard() returned an error),
the failed zio would never be executed, resulting in txg_sync hanging
forever waiting for IO to finish.

This commit fixes that by immediately executing a failed zio on error.
To handle the successful synchronous op case, we fake an async op by,
when not using an asynchronous submission method, queuing the successful
result zio as part of the discard handler.

Since it was hard to understand the differences between discard and
secure erase, and sync and async, across different kernel versions, I've
commented and reorganised the code a bit to try and make everything more
contained and linear.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
(cherry picked from commit ba9f587a77)
2024-04-11 12:25:40 -07:00
Rob Norris 28520cad25 vdev_disk: don't touch vbio after its handed off to the kernel
After IO is unplugged, it may complete immediately and vbio_completion
be called on interrupt context. That may interrupt or deschedule our
task. If its the last bio, the vbio will be freed. Then, we get
rescheduled, and try to write to freed memory through vbio->.

This patch just removes the the cleanup, and the corresponding assert.
These were leftovers from a previous iteration of vbio_submit() and were
always "belt and suspenders" ops anyway, never strictly required.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc
Reported-by: Rich Ercolani <rincebrain@gmail.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
(cherry picked from commit 917ff75e95)
2024-04-08 10:13:55 -07:00
Robert Evans deb7a84231 Fix corruption caused by mmap flushing problems
1) Make mmap flushes synchronous. Linux may skip flushing dirty pages
   already in writeback unless data-integrity sync is requested.

2) Change zfs_putpage to use TXG_WAIT. Otherwise dirty pages may be
   skipped due to DMU pushing back on TX assign.

3) Add missing mmap flush when doing block cloning.

4) While here, pass errors from putpage to writepage/writepages.

This change fixes corruption edge cases, but unfortunately adds
synchronous ZIL flushes for dirty mmap pages to llseek and bclone
operations. It may be possible to avoid these sync writes later
but would need more tricky refactoring of the writeback code.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Robert Evans <evansr@google.com>
Closes #15933 
Closes #16019
2024-03-29 17:10:04 -07:00
Rob Norris eebf00bee9 vdev_disk: default to classic submission for 2.2.x
We don't want to change to brand-new code in the middle of a stable
series, but we want it available to test for people running into page
splitting issues.

This commits make zfs_vdev_disk_classic=1 the default, and updates the
documentation to better explain what's going on.

Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
2024-03-28 13:29:46 -07:00
Rob Norris d0b3be763f abd_iter_page: don't use compound heads on Linux <4.5
Before 4.5 (specifically, torvalds/linux@ddc58f2), head and tail pages
in a compound page were refcounted separately. This means that using the
head page without taking a reference to it could see it cleaned up later
before we're finished with it. Specifically, bio_add_page() would take a
reference, and drop its reference after the bio completion callback
returns.

If the zio is executed immediately from the completion callback, this is
usually ok, as any data is referenced through the tail page referenced
by the ABD, and so becomes "live" that way. If there's a delay in zio
execution (high load, error injection), then the head page can be freed,
along with any dirty flags or other indicators that the underlying
memory is used. Later, when the zio completes and that memory is
accessed, its either unmapped and an unhandled fault takes down the
entire system, or it is mapped and we end up messing around in someone
else's memory. Both of these are very bad.

The solution on these older kernels is to take a reference to the head
page when we use it, and release it when we're done. There's not really
a sensible way under our current structure to do this; the "best" would
be to keep a list of head page references in the ABD, and release them
when the ABD is freed.

Since this additional overhead is totally unnecessary on 4.5+, where
head and tail pages share refcounts, I've opted to simply not use the
compound head in ABD page iteration there. This is theoretically less
efficient (though cleaning up head page references would add overhead),
but its safe, and we still get the other benefits of not mapping pages
before adding them to a bio and not mis-splitting pages.

There doesn't appear to be an obvious symbol name or config option we
can match on to discover this behaviour in configure (and the mm/page
APIs have changed a lot since then anyway), so I've gone with a simple
version check.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Closes #15533
Closes #15588
(cherry picked from commit c6be6ce175)
2024-03-28 13:29:46 -07:00
Rob Norris cb599d27ed vdev_disk: use bio_chain() to submit multiple BIOs
Simplifies our code a lot, so we don't have to wait for each and
reassemble them.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Closes #15533
Closes #15588
(cherry picked from commit 72fd834c47)
2024-03-28 13:29:46 -07:00
Rob Norris af3a5bb40d vdev_disk: add module parameter to select BIO submission method
This makes the submission method selectable at module load time via the
`zfs_vdev_disk_classic` parameter, allowing this change to be backported
to 2.2 safely, and disabled in favour of the "classic" submission method
if new problems come up.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Closes #15533
Closes #15588
(cherry picked from commit df2169d141)
2024-03-28 13:29:46 -07:00
Rob Norris 51c2bd0def vdev_disk: rewrite BIO filling machinery to avoid split pages
This commit tackles a number of issues in the way BIOs (`struct bio`)
are constructed for submission to the Linux block layer.

The kernel has a hard upper limit on the number of pages/segments that
can be added to a BIO, as well as a separate limit for each device
(related to its queue depth and other scheduling characteristics).

ZFS counts the number of memory pages in the request ABD
(`abd_nr_pages_off()`, and then uses that as the number of segments to
put into the BIO, up to the hard upper limit. If it requires more than
the limit, it will create multiple BIOs.

Leaving aside the fact that page count method is wrong (see below), not
limiting to the device segment max means that the device driver will
need to split the BIO in half. This is alone is not necessarily a
problem, but it interacts with another issue to cause a much larger
problem.

The kernel function to add a segment to a BIO (`bio_add_page()`) takes a
`struct page` pointer, and offset+len within it. `struct page` can
represent a run of contiguous memory pages (known as a "compound page").
In can be of arbitrary length.

The ZFS functions that count ABD pages and load them into the BIO
(`abd_nr_pages_off()`, `bio_map()` and `abd_bio_map_off()`) will never
consider a page to be more than `PAGE_SIZE` (4K), even if the `struct
page` is for multiple pages. In this case, it will load the same `struct
page` into the BIO multiple times, with the offset adjusted each time.

With a sufficiently large ABD, this can easily lead to the BIO being
entirely filled much earlier than it could have been. This is also
further contributes to the problem caused by the incorrect segment limit
calculation, as its much easier to go past the device limit, and so
require a split.

Again, this is not a problem on its own.

The logic for "never submit more than `PAGE_SIZE`" is actually a little
more subtle. It will actually never submit a buffer that crosses a 4K
page boundary.

In practice, this is fine, as most ABDs are scattered, that is a list of
complete 4K pages, and so are loaded in as such.

Linear ABDs are typically allocated from slabs, and for small sizes they
are frequently not aligned to page boundaries. For example, a 12K
allocation can span four pages, eg:

     -- 4K -- -- 4K -- -- 4K -- -- 4K --
    |        |        |        |        |
          :## ######## ######## ######:    [1K, 4K, 4K, 3K]

Such an allocation would be loaded into a BIO as you see:

    [1K, 4K, 4K, 3K]

This tends not to be a problem in practice, because even if the BIO were
filled and needed to be split, each half would still have either a start
or end aligned to the logical block size of the device (assuming 4K at
least).

---

In ideal circumstances, these shortcomings don't cause any particular
problems. Its when they start to interact with other ZFS features that
things get interesting.

Aggregation will create a "gang" ABD, which is simply a list of other
ABDs. Iterating over a gang ABD is just iterating over each ABD within
it in turn.

Because the segments are simply loaded in order, we can end up with
uneven segments either side of the "gap" between the two ABDs. For
example, two 12K ABDs might be aggregated and then loaded as:

    [1K, 4K, 4K, 3K, 2K, 4K, 4K, 2K]

Should a split occur, each individual BIO can end up either having an
start or end offset that is not aligned to the logical block size, which
some drivers (eg SCSI) will reject. However, this tends not to happen
because the default aggregation limit usually keeps the BIO small enough
to not require more than one split, and most pages are actually full 4K
pages, so hitting an uneven gap is very rare anyway.

If the pool is under particular memory pressure, then an IO can be
broken down into a "gang block", a 512-byte block composed of a header
and up to three block pointers. Each points to a fragment of the
original write, or in turn, another gang block, breaking the original
data up over and over until space can be found in the pool for each of
them.

Each gang header is a separate 512-byte memory allocation from a slab,
that needs to be written down to disk. When the gang header is added to
the BIO, its a single 512-byte segment.

Pulling all this together, consider a large aggregated write of gang
blocks. This results a BIO containing lots of 512-byte segments. Given
our tendency to overfill the BIO, a split is likely, and most possible
split points will yield a pair of BIOs that are misaligned. Drivers that
care, like the SCSI driver, will reject them.

---

This commit is a substantial refactor and rewrite of much of `vdev_disk`
to sort all this out.

`vdev_bio_max_segs()` now returns the ideal maximum size for the device,
if available. There's also a tuneable `zfs_vdev_disk_max_segs` to
override this, to assist with testing.

We scan the ABD up front to count the number of pages within it, and to
confirm that if we submitted all those pages to one or more BIOs, it
could be split at any point with creating a misaligned BIO.  If the
pages in the BIO are not usable (as in any of the above situations), the
ABD is linearised, and then checked again. This is the same technique
used in `vdev_geom` on FreeBSD, adjusted for Linux's variable page size
and allocator quirks.

`vbio_t` is a cleanup and enhancement of the old `dio_request_t`. The
idea is simply that it can hold all the state needed to create, submit
and return multiple BIOs, including all the refcounts, the ABD copy if
it was needed, and so on. Apart from what I hope is a clearer interface,
the major difference is that because we know how many BIOs we'll need up
front, we don't need the old overflow logic that would grow the BIO
array, throw away all the old work and restart. We can get it right from
the start.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Closes #15533
Closes #15588
(cherry picked from commit 06a196020e)
2024-03-28 13:29:46 -07:00
Rob Norris 03ff875e09 vdev_disk: make read/write IO function configurable
This is just setting up for the next couple of commits, which will add a
new IO function and a parameter to select it.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Closes #15533
Closes #15588
(cherry picked from commit c4a13ba483)
2024-03-28 13:29:46 -07:00
Rob Norris 13b5348848 vdev_disk: reorganise vdev_disk_io_start
Light reshuffle to make it a bit more linear to read and get rid of a
bunch of args that aren't needed in all cases.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Closes #15533
Closes #15588
(cherry picked from commit 867178ae1d)
2024-03-28 13:29:46 -07:00
Rob Norris 4820185031 vdev_disk: rename existing functions to vdev_classic_*
This is just renaming the existing functions we're about to replace and
grouping them together to make the next commits easier to follow.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Closes #15533
Closes #15588
(cherry picked from commit f3b85d706b)
2024-03-28 13:29:46 -07:00
Rob Norris 52a2af6fd1 abd: add page iterator
The regular ABD iterators yield data buffers, so they have to map and
unmap pages into kernel memory. If the caller only wants to count
chunks, or can use page pointers directly, then the map/unmap is just
unnecessary overhead.

This adds adb_iterate_page_func, which yields unmapped struct page
instead.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Closes #15533
Closes #15588
(cherry picked from commit 390b448726)
2024-03-28 13:29:46 -07:00
Rob Norris 220bb7341e linux 5.4 compat: page_size()
Before 5.4 we have to do a little math.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Closes #15533
Closes #15588
(cherry picked from commit df04efe321)
2024-03-28 13:29:46 -07:00
Rob N 58211157bf Linux 6.8 compat: use splice_copy_file_range() for fallback
Linux 6.8 removes generic_copy_file_range(), which had been reduced to a
simple wrapper around splice_copy_file_range(). Detect that function
directly and use it if generic_ is not available.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #15930
Closes #15931
(cherry picked from commit ef08a4d406)
2024-03-21 09:35:17 -07:00
155 changed files with 4783 additions and 1643 deletions
+18
View File
@@ -30,6 +30,7 @@ Andreas Dilger <adilger@dilger.ca>
Andrew Walker <awalker@ixsystems.com>
Benedikt Neuffer <github@itfriend.de>
Chengfei Zhu <chengfeix.zhu@intel.com>
ChenHao Lu <18302010006@fudan.edu.cn>
Chris Lindee <chris.lindee+github@gmail.com>
Colm Buckley <colm@tuatha.org>
Crag Wang <crag0715@gmail.com>
@@ -43,6 +44,7 @@ Glenn Washburn <development@efficientek.com>
Gordan Bobic <gordan.bobic@gmail.com>
Gregory Bartholomew <gregory.lee.bartholomew@gmail.com>
hedong zhang <h_d_zhang@163.com>
Ilkka Sovanto <github@ilkka.kapsi.fi>
InsanePrawn <Insane.Prawny@gmail.com>
Jason Cohen <jwittlincohen@gmail.com>
Jason Harmening <jason.harmening@gmail.com>
@@ -57,6 +59,7 @@ KernelOfTruth <kerneloftruth@gmail.com>
Liu Hua <liu.hua130@zte.com.cn>
Liu Qing <winglq@gmail.com>
loli10K <ezomori.nozomu@gmail.com>
Mart Frauenlob <allkind@fastest.cc>
Matthias Blankertz <matthias@blankertz.org>
Michael Gmelin <grembo@FreeBSD.org>
Olivier Mazouffre <olivier.mazouffre@ims-bordeaux.fr>
@@ -73,6 +76,9 @@ WHR <msl0000023508@gmail.com>
Yanping Gao <yanping.gao@xtaotech.com>
Youzhong Yang <youzhong@gmail.com>
# Signed-off-by: overriding Author:
Yuxin Wang <yuxinwang9999@gmail.com> <Bi11gates9999@gmail.com>
# Commits from strange places, long ago
Brian Behlendorf <behlendorf1@llnl.gov> <behlendo@7e1ea52c-4ff2-0310-8f11-9dd32ca42a1c>
Brian Behlendorf <behlendorf1@llnl.gov> <behlendo@fedora-17-amd64.(none)>
@@ -102,12 +108,15 @@ Brandon Thetford <brandon@dodecatec.com> <dodexahedron@users.noreply.github.com>
buzzingwires <buzzingwires@outlook.com> <131118055+buzzingwires@users.noreply.github.com>
Cedric Maunoury <cedric.maunoury@gmail.com> <38213715+cedricmaunoury@users.noreply.github.com>
Charles Suh <charles.suh@gmail.com> <charlessuh@users.noreply.github.com>
Chris Peredun <chris.peredun@ixsystems.com> <126915832+chrisperedun@users.noreply.github.com>
Dacian Reece-Stremtan <dacianstremtan@gmail.com> <35844628+dacianstremtan@users.noreply.github.com>
Damian Szuberski <szuberskidamian@gmail.com> <30863496+szubersk@users.noreply.github.com>
Daniel Hiepler <d-git@coderdu.de> <32984777+heeplr@users.noreply.github.com>
Daniel Kobras <d.kobras@science-computing.de> <sckobras@users.noreply.github.com>
Daniel Reichelt <hacking@nachtgeist.net> <nachtgeist@users.noreply.github.com>
David Quigley <david.quigley@intel.com> <dpquigl@users.noreply.github.com>
Dennis R. Friedrichsen <dennis.r.friedrichsen@gmail.com> <31087738+dennisfriedrichsen@users.noreply.github.com>
Dex Wood <slash2314@gmail.com> <slash2314@users.noreply.github.com>
DHE <git@dehacked.net> <DeHackEd@users.noreply.github.com>
Dmitri John Ledkov <dimitri.ledkov@canonical.com> <19779+xnox@users.noreply.github.com>
Dries Michiels <driesm.michiels@gmail.com> <32487486+driesmp@users.noreply.github.com>
@@ -128,6 +137,7 @@ Harry Mallon <hjmallon@gmail.com> <1816667+hjmallon@users.noreply.github.com>
Hiếu Lê <leorize+oss@disroot.org> <alaviss@users.noreply.github.com>
Jake Howard <git@theorangeone.net> <RealOrangeOne@users.noreply.github.com>
James Cowgill <james.cowgill@mips.com> <jcowgill@users.noreply.github.com>
Jaron Kent-Dobias <jaron@kent-dobias.com> <kentdobias@users.noreply.github.com>
Jason King <jason.king@joyent.com> <jasonbking@users.noreply.github.com>
Jeff Dike <jdike@akamai.com> <52420226+jdike@users.noreply.github.com>
Jitendra Patidar <jitendra.patidar@nutanix.com> <53164267+jsai20@users.noreply.github.com>
@@ -137,7 +147,9 @@ John L. Hammond <john.hammond@intel.com> <35266395+jhammond-intel@users.noreply.
John-Mark Gurney <jmg@funkthat.com> <jmgurney@users.noreply.github.com>
John Ramsden <johnramsden@riseup.net> <johnramsden@users.noreply.github.com>
Jonathon Fernyhough <jonathon@m2x.dev> <559369+jonathonf@users.noreply.github.com>
Jose Luis Duran <jlduran@gmail.com> <jlduran@users.noreply.github.com>
Justin Hibbits <chmeeedalf@gmail.com> <chmeeedalf@users.noreply.github.com>
Kevin Greene <kevin.greene@delphix.com> <104801862+kxgreene@users.noreply.github.com>
Kevin Jin <lostking2008@hotmail.com> <33590050+jxdking@users.noreply.github.com>
Kevin P. Fleming <kevin@km6g.us> <kpfleming@users.noreply.github.com>
Krzysztof Piecuch <piecuch@kpiecuch.pl> <3964215+pikrzysztof@users.noreply.github.com>
@@ -148,9 +160,11 @@ Lorenz Hüdepohl <dev@stellardeath.org> <lhuedepohl@users.noreply.github.com>
Luís Henriques <henrix@camandro.org> <73643340+lumigch@users.noreply.github.com>
Marcin Skarbek <git@skarbek.name> <mskarbek@users.noreply.github.com>
Matt Fiddaman <github@m.fiddaman.uk> <81489167+matt-fidd@users.noreply.github.com>
Maxim Filimonov <che@bein.link> <part1zano@users.noreply.github.com>
Max Zettlmeißl <max@zettlmeissl.de> <6818198+maxz@users.noreply.github.com>
Michael Niewöhner <foss@mniewoehner.de> <c0d3z3r0@users.noreply.github.com>
Michael Zhivich <mzhivich@akamai.com> <33133421+mzhivich@users.noreply.github.com>
MigeljanImeri <ImeriMigel@gmail.com> <78048439+MigeljanImeri@users.noreply.github.com>
Mo Zhou <cdluminate@gmail.com> <5723047+cdluminate@users.noreply.github.com>
Nick Mattis <nickm970@gmail.com> <nmattis@users.noreply.github.com>
omni <omni+vagant@hack.org> <79493359+omnivagant@users.noreply.github.com>
@@ -164,6 +178,7 @@ Ping Huang <huangping@smartx.com> <101400146+hpingfs@users.noreply.github.com>
Piotr P. Stefaniak <pstef@freebsd.org> <pstef@users.noreply.github.com>
Richard Allen <belperite@gmail.com> <33836503+belperite@users.noreply.github.com>
Rich Ercolani <rincebrain@gmail.com> <214141+rincebrain@users.noreply.github.com>
Rick Macklem <rmacklem@uoguelph.ca> <64620010+rmacklem@users.noreply.github.com>
Rob Wing <rob.wing@klarasystems.com> <98866084+rob-wing@users.noreply.github.com>
Roman Strashkin <roman.strashkin@nexenta.com> <Ramzec@users.noreply.github.com>
Ryan Hirasaki <ryanhirasaki@gmail.com> <4690732+RyanHir@users.noreply.github.com>
@@ -174,6 +189,8 @@ Scott Colby <scott@scolby.com> <scolby33@users.noreply.github.com>
Sean Eric Fagan <kithrup@mac.com> <kithrup@users.noreply.github.com>
Spencer Kinny <spencerkinny1995@gmail.com> <30333052+Spencer-Kinny@users.noreply.github.com>
Srikanth N S <srikanth.nagasubbaraoseetharaman@hpe.com> <75025422+nssrikanth@users.noreply.github.com>
Stefan Lendl <s.lendl@proxmox.com> <1321542+stfl@users.noreply.github.com>
Thomas Bertschinger <bertschinger@lanl.gov> <101425190+bertschinger@users.noreply.github.com>
Thomas Geppert <geppi@digitx.de> <geppi@users.noreply.github.com>
Tim Crawford <tcrawford@datto.com> <crawfxrd@users.noreply.github.com>
Tom Matthews <tom@axiom-partners.com> <tomtastic@users.noreply.github.com>
@@ -181,6 +198,7 @@ Tony Perkins <tperkins@datto.com> <62951051+tony-zfs@users.noreply.github.com>
Torsten Wörtwein <twoertwein@gmail.com> <twoertwein@users.noreply.github.com>
Tulsi Jain <tulsi.jain@delphix.com> <TulsiJain@users.noreply.github.com>
Václav Skála <skala@vshosting.cz> <33496485+vaclavskala@users.noreply.github.com>
Vaibhav Bhanawat <vaibhav.bhanawat@delphix.com> <88050553+vaibhav-delphix@users.noreply.github.com>
Violet Purcell <vimproved@inventati.org> <66446404+vimproved@users.noreply.github.com>
Vipin Kumar Verma <vipin.verma@hpe.com> <75025470+vermavipinkumar@users.noreply.github.com>
Wolfgang Bumiller <w.bumiller@proxmox.com> <Blub@users.noreply.github.com>
+35
View File
@@ -88,9 +88,11 @@ CONTRIBUTORS:
Bassu <bassu@phi9.com>
Ben Allen <bsallen@alcf.anl.gov>
Ben Cordero <bencord0@condi.me>
Benda Xu <orv@debian.org>
Benedikt Neuffer <github@itfriend.de>
Benjamin Albrecht <git@albrecht.io>
Benjamin Gentil <benjgentil.pro@gmail.com>
Benjamin Sherman <benjamin@holyarmy.org>
Ben McGough <bmcgough@fredhutch.org>
Ben Rubson <ben.rubson@gmail.com>
Ben Wolsieffer <benwolsieffer@gmail.com>
@@ -111,6 +113,7 @@ CONTRIBUTORS:
bzzz77 <bzzz.tomas@gmail.com>
cable2999 <cable2999@users.noreply.github.com>
Caleb James DeLisle <calebdelisle@lavabit.com>
Cameron Harr <harr1@llnl.gov>
Cao Xuewen <cao.xuewen@zte.com.cn>
Carlo Landmeter <clandmeter@gmail.com>
Carlos Alberto Lopez Perez <clopez@igalia.com>
@@ -120,12 +123,15 @@ CONTRIBUTORS:
Chen Can <chen.can2@zte.com.cn>
Chengfei Zhu <chengfeix.zhu@intel.com>
Chen Haiquan <oc@yunify.com>
ChenHao Lu <18302010006@fudan.edu.cn>
Chip Parker <aparker@enthought.com>
Chris Burroughs <chris.burroughs@gmail.com>
Chris Davidson <christopher.davidson@gmail.com>
Chris Dunlap <cdunlap@llnl.gov>
Chris Dunlop <chris@onthe.net.au>
Chris Lindee <chris.lindee+github@gmail.com>
Chris McDonough <chrism@plope.com>
Chris Peredun <chris.peredun@ixsystems.com>
Chris Siden <chris.siden@delphix.com>
Chris Siebenmann <cks.github@cs.toronto.edu>
Christer Ekholm <che@chrekh.se>
@@ -144,6 +150,7 @@ CONTRIBUTORS:
Clint Armstrong <clint@clintarmstrong.net>
Coleman Kane <ckane@colemankane.org>
Colin Ian King <colin.king@canonical.com>
Colin Percival <cperciva@tarsnap.com>
Colm Buckley <colm@tuatha.org>
Crag Wang <crag0715@gmail.com>
Craig Loomis <cloomis@astro.princeton.edu>
@@ -156,6 +163,7 @@ CONTRIBUTORS:
Damiano Albani <damiano.albani@gmail.com>
Damian Szuberski <szuberskidamian@gmail.com>
Damian Wojsław <damian@wojslaw.pl>
Daniel Berlin <dberlin@dberlin.org>
Daniel Hiepler <d-git@coderdu.de>
Daniel Hoffman <dj.hoffman@delphix.com>
Daniel Kobras <d.kobras@science-computing.de>
@@ -176,8 +184,10 @@ CONTRIBUTORS:
David Quigley <david.quigley@intel.com>
Debabrata Banerjee <dbanerje@akamai.com>
D. Ebdrup <debdrup@freebsd.org>
Dennis R. Friedrichsen <dennis.r.friedrichsen@gmail.com>
Denys Rtveliashvili <denys@rtveliashvili.name>
Derek Dai <daiderek@gmail.com>
Dex Wood <slash2314@gmail.com>
DHE <git@dehacked.net>
Didier Roche <didrocks@ubuntu.com>
Dimitri John Ledkov <xnox@ubuntu.com>
@@ -235,9 +245,11 @@ CONTRIBUTORS:
Gionatan Danti <g.danti@assyoma.it>
Giuseppe Di Natale <guss80@gmail.com>
Glenn Washburn <development@efficientek.com>
gofaster <felix.gofaster@gmail.com>
Gordan Bobic <gordan@redsleeve.org>
Gordon Bergling <gbergling@googlemail.com>
Gordon Ross <gwr@nexenta.com>
Gordon Tetlow <gordon@freebsd.org>
Graham Christensen <graham@grahamc.com>
Graham Perrin <grahamperrin@gmail.com>
Gregor Kopka <gregor@kopka.net>
@@ -265,6 +277,7 @@ CONTRIBUTORS:
Igor Kozhukhov <ikozhukhov@gmail.com>
Igor Lvovsky <ilvovsky@gmail.com>
ilbsmart <wgqimut@gmail.com>
Ilkka Sovanto <github@ilkka.kapsi.fi>
illiliti <illiliti@protonmail.com>
ilovezfs <ilovezfs@icloud.com>
InsanePrawn <Insane.Prawny@gmail.com>
@@ -280,9 +293,11 @@ CONTRIBUTORS:
Jan Engelhardt <jengelh@inai.de>
Jan Kryl <jan.kryl@nexenta.com>
Jan Sanislo <oystr@cs.washington.edu>
Jaron Kent-Dobias <jaron@kent-dobias.com>
Jason Cohen <jwittlincohen@gmail.com>
Jason Harmening <jason.harmening@gmail.com>
Jason King <jason.brian.king@gmail.com>
Jason Lee <jasonlee@lanl.gov>
Jason Zaman <jasonzaman@gmail.com>
Javen Wu <wu.javen@gmail.com>
Jean-Baptiste Lallement <jean-baptiste@ubuntu.com>
@@ -313,6 +328,7 @@ CONTRIBUTORS:
Jonathon Fernyhough <jonathon@m2x.dev>
Jorgen Lundman <lundman@lundman.net>
Josef 'Jeff' Sipek <josef.sipek@nexenta.com>
Jose Luis Duran <jlduran@gmail.com>
Josh Soref <jsoref@users.noreply.github.com>
Joshua M. Clulow <josh@sysmgr.org>
José Luis Salvador Rufo <salvador.joseluis@gmail.com>
@@ -336,8 +352,10 @@ CONTRIBUTORS:
Kash Pande <kash@tripleback.net>
Kay Pedersen <christianpe96@gmail.com>
Keith M Wesolowski <wesolows@foobazco.org>
Kent Ross <k@mad.cash>
KernelOfTruth <kerneloftruth@gmail.com>
Kevin Bowling <kevin.bowling@kev009.com>
Kevin Greene <kevin.greene@delphix.com>
Kevin Jin <lostking2008@hotmail.com>
Kevin P. Fleming <kevin@km6g.us>
Kevin Tanguy <kevin.tanguy@ovh.net>
@@ -389,6 +407,7 @@ CONTRIBUTORS:
Mark Shellenbaum <Mark.Shellenbaum@Oracle.COM>
marku89 <mar42@kola.li>
Mark Wright <markwright@internode.on.net>
Mart Frauenlob <allkind@fastest.cc>
Martin Matuska <mm@FreeBSD.org>
Martin Rüegg <martin.rueegg@metaworx.ch>
Massimo Maggi <me@massimo-maggi.eu>
@@ -405,6 +424,7 @@ CONTRIBUTORS:
Matus Kral <matuskral@me.com>
Mauricio Faria de Oliveira <mfo@canonical.com>
Max Grossman <max.grossman@delphix.com>
Maxim Filimonov <che@bein.link>
Maximilian Mehnert <maximilian.mehnert@gmx.de>
Max Zettlmeißl <max@zettlmeissl.de>
Md Islam <mdnahian@outlook.com>
@@ -417,6 +437,7 @@ CONTRIBUTORS:
Michael Niewöhner <foss@mniewoehner.de>
Michael Zhivich <mzhivich@akamai.com>
Michal Vasilek <michal@vasilek.cz>
MigeljanImeri <ImeriMigel@gmail.com>
Mike Gerdts <mike.gerdts@joyent.com>
Mike Harsch <mike@harschsystems.com>
Mike Leddy <mike.leddy@gmail.com>
@@ -448,6 +469,7 @@ CONTRIBUTORS:
Olaf Faaland <faaland1@llnl.gov>
Oleg Drokin <green@linuxhacker.ru>
Oleg Stepura <oleg@stepura.com>
Olivier Certner <olce.freebsd@certner.fr>
Olivier Mazouffre <olivier.mazouffre@ims-bordeaux.fr>
omni <omni+vagant@hack.org>
Orivej Desh <orivej@gmx.fr>
@@ -479,6 +501,7 @@ CONTRIBUTORS:
Prasad Joshi <prasadjoshi124@gmail.com>
privb0x23 <privb0x23@users.noreply.github.com>
P.SCH <p88@yahoo.com>
Quartz <yyhran@163.com>
Quentin Zdanis <zdanisq@gmail.com>
Rafael Kitover <rkitover@gmail.com>
RageLtMan <sempervictus@users.noreply.github.com>
@@ -491,11 +514,15 @@ CONTRIBUTORS:
Riccardo Schirone <rschirone91@gmail.com>
Richard Allen <belperite@gmail.com>
Richard Elling <Richard.Elling@RichardElling.com>
Richard Kojedzinszky <richard@kojedz.in>
Richard Laager <rlaager@wiktel.com>
Richard Lowe <richlowe@richlowe.net>
Richard Sharpe <rsharpe@samba.org>
Richard Yao <ryao@gentoo.org>
Rich Ercolani <rincebrain@gmail.com>
Rick Macklem <rmacklem@uoguelph.ca>
rilysh <nightquick@proton.me>
Robert Evans <evansr@google.com>
Robert Novak <sailnfool@gmail.com>
Roberto Ricci <ricci@disroot.org>
Rob Norris <robn@despairlabs.com>
@@ -509,7 +536,9 @@ CONTRIBUTORS:
Ryan Lahfa <masterancpp@gmail.com>
Ryan Libby <rlibby@FreeBSD.org>
Ryan Moeller <freqlabs@FreeBSD.org>
Sam Atkinson <samatk@amazon.com>
Sam Hathaway <github.com@munkynet.org>
Sam James <sam@gentoo.org>
Sam Lunt <samuel.j.lunt@gmail.com>
Samuel VERSCHELDE <stormi-github@ylix.fr>
Samuel Wycliffe <samuelwycliffe@gmail.com>
@@ -530,6 +559,8 @@ CONTRIBUTORS:
Shaan Nobee <sniper111@gmail.com>
Shampavman <sham.pavman@nexenta.com>
Shaun Tancheff <shaun@aeonazure.com>
Shawn Bayern <sbayern@law.fsu.edu>
Shengqi Chen <harry-chen@outlook.com>
Shen Yan <shenyanxxxy@qq.com>
Simon Guest <simon.guest@tesujimath.org>
Simon Klinkert <simon.klinkert@gmail.com>
@@ -537,6 +568,7 @@ CONTRIBUTORS:
Spencer Kinny <spencerkinny1995@gmail.com>
Srikanth N S <srikanth.nagasubbaraoseetharaman@hpe.com>
Stanislav Seletskiy <s.seletskiy@gmail.com>
Stefan Lendl <s.lendl@proxmox.com>
Steffen Müthing <steffen.muething@iwr.uni-heidelberg.de>
Stephen Blinick <stephen.blinick@delphix.com>
sterlingjensen <sterlingjensen@users.noreply.github.com>
@@ -557,6 +589,7 @@ CONTRIBUTORS:
Teodor Spæren <teodor_spaeren@riseup.net>
TerraTech <TerraTech@users.noreply.github.com>
Thijs Cramer <thijs.cramer@gmail.com>
Thomas Bertschinger <bertschinger@lanl.gov>
Thomas Geppert <geppi@digitx.de>
Thomas Lamprecht <guggentom@hotmail.de>
Till Maas <opensource@till.name>
@@ -586,6 +619,7 @@ CONTRIBUTORS:
Turbo Fredriksson <turbo@bayour.com>
Tyler J. Stachecki <stachecki.tyler@gmail.com>
Umer Saleem <usaleem@ixsystems.com>
Vaibhav Bhanawat <vaibhav.bhanawat@delphix.com>
Valmiky Arquissandas <kayvlim@gmail.com>
Val Packett <val@packett.cool>
Vince van Oosten <techhazard@codeforyouand.me>
@@ -614,6 +648,7 @@ CONTRIBUTORS:
yuina822 <ayuichi@club.kyutech.ac.jp>
YunQiang Su <syq@debian.org>
Yuri Pankov <yuri.pankov@gmail.com>
Yuxin Wang <yuxinwang9999@gmail.com>
Yuxuan Shui <yshuiv7@gmail.com>
Zachary Bedell <zac@thebedells.org>
Zach Dykstra <dykstra.zachary@gmail.com>
+2 -2
View File
@@ -1,10 +1,10 @@
Meta: 1
Name: zfs
Branch: 1.0
Version: 2.2.3
Version: 2.2.4
Release: 1
Release-Tags: relext
License: CDDL
Author: OpenZFS
Linux-Maximum: 6.7
Linux-Maximum: 6.8
Linux-Minimum: 3.10
+10 -1
View File
@@ -793,18 +793,27 @@ def section_dmu(kstats_dict):
zfetch_stats = isolate_section('zfetchstats', kstats_dict)
zfetch_access_total = int(zfetch_stats['hits'])+int(zfetch_stats['misses'])
zfetch_access_total = int(zfetch_stats['hits']) +\
int(zfetch_stats['future']) + int(zfetch_stats['stride']) +\
int(zfetch_stats['past']) + int(zfetch_stats['misses'])
prt_1('DMU predictive prefetcher calls:', f_hits(zfetch_access_total))
prt_i2('Stream hits:',
f_perc(zfetch_stats['hits'], zfetch_access_total),
f_hits(zfetch_stats['hits']))
future = int(zfetch_stats['future']) + int(zfetch_stats['stride'])
prt_i2('Hits ahead of stream:', f_perc(future, zfetch_access_total),
f_hits(future))
prt_i2('Hits behind stream:',
f_perc(zfetch_stats['past'], zfetch_access_total),
f_hits(zfetch_stats['past']))
prt_i2('Stream misses:',
f_perc(zfetch_stats['misses'], zfetch_access_total),
f_hits(zfetch_stats['misses']))
prt_i2('Streams limit reached:',
f_perc(zfetch_stats['max_streams'], zfetch_stats['misses']),
f_hits(zfetch_stats['max_streams']))
prt_i1('Stream strides:', f_hits(zfetch_stats['stride']))
prt_i1('Prefetches issued', f_hits(zfetch_stats['io_issued']))
print()
+50 -7
View File
@@ -157,6 +157,16 @@ cols = {
"free": [5, 1024, "ARC free memory"],
"avail": [5, 1024, "ARC available memory"],
"waste": [5, 1024, "Wasted memory due to round up to pagesize"],
"ztotal": [6, 1000, "zfetch total prefetcher calls per second"],
"zhits": [5, 1000, "zfetch stream hits per second"],
"zahead": [6, 1000, "zfetch hits ahead of streams per second"],
"zpast": [5, 1000, "zfetch hits behind streams per second"],
"zmisses": [7, 1000, "zfetch stream misses per second"],
"zmax": [4, 1000, "zfetch limit reached per second"],
"zfuture": [7, 1000, "zfetch stream future per second"],
"zstride": [7, 1000, "zfetch stream strides per second"],
"zissued": [7, 1000, "zfetch prefetches issued per second"],
"zactive": [7, 1000, "zfetch prefetches active per second"],
}
v = {}
@@ -164,6 +174,8 @@ hdr = ["time", "read", "ddread", "ddh%", "dmread", "dmh%", "pread", "ph%",
"size", "c", "avail"]
xhdr = ["time", "mfu", "mru", "mfug", "mrug", "unc", "eskip", "mtxmis",
"dread", "pread", "read"]
zhdr = ["time", "ztotal", "zhits", "zahead", "zpast", "zmisses", "zmax",
"zfuture", "zstride", "zissued", "zactive"]
sint = 1 # Default interval is 1 second
count = 1 # Default count is 1
hdr_intr = 20 # Print header every 20 lines of output
@@ -188,6 +200,8 @@ if sys.platform.startswith('freebsd'):
k = [ctl for ctl in sysctl.filter('kstat.zfs.misc.arcstats')
if ctl.type != sysctl.CTLTYPE_NODE]
k += [ctl for ctl in sysctl.filter('kstat.zfs.misc.zfetchstats')
if ctl.type != sysctl.CTLTYPE_NODE]
if not k:
sys.exit(1)
@@ -199,19 +213,28 @@ if sys.platform.startswith('freebsd'):
continue
name, value = s.name, s.value
# Trims 'kstat.zfs.misc.arcstats' from the name
kstat[name[24:]] = int(value)
if "arcstats" in name:
# Trims 'kstat.zfs.misc.arcstats' from the name
kstat[name[24:]] = int(value)
else:
kstat["zfetch_" + name[27:]] = int(value)
elif sys.platform.startswith('linux'):
def kstat_update():
global kstat
k = [line.strip() for line in open('/proc/spl/kstat/zfs/arcstats')]
k1 = [line.strip() for line in open('/proc/spl/kstat/zfs/arcstats')]
if not k:
k2 = ["zfetch_" + line.strip() for line in
open('/proc/spl/kstat/zfs/zfetchstats')]
if k1 is None or k2 is None:
sys.exit(1)
del k[0:2]
del k1[0:2]
del k2[0:2]
k = k1 + k2
kstat = {}
for s in k:
@@ -239,6 +262,7 @@ def usage():
sys.stderr.write("\t -v : List all possible field headers and definitions"
"\n")
sys.stderr.write("\t -x : Print extended stats\n")
sys.stderr.write("\t -z : Print zfetch stats\n")
sys.stderr.write("\t -f : Specify specific fields to print (see -v)\n")
sys.stderr.write("\t -o : Redirect output to the specified file\n")
sys.stderr.write("\t -s : Override default field separator with custom "
@@ -357,6 +381,7 @@ def init():
global count
global hdr
global xhdr
global zhdr
global opfile
global sep
global out
@@ -368,15 +393,17 @@ def init():
xflag = False
hflag = False
vflag = False
zflag = False
i = 1
try:
opts, args = getopt.getopt(
sys.argv[1:],
"axo:hvs:f:p",
"axzo:hvs:f:p",
[
"all",
"extended",
"zfetch",
"outfile",
"help",
"verbose",
@@ -410,13 +437,15 @@ def init():
i += 1
if opt in ('-p', '--parsable'):
pretty_print = False
if opt in ('-z', '--zfetch'):
zflag = True
i += 1
argv = sys.argv[i:]
sint = int(argv[0]) if argv else sint
count = int(argv[1]) if len(argv) > 1 else (0 if len(argv) > 0 else 1)
if hflag or (xflag and desired_cols):
if hflag or (xflag and zflag) or ((zflag or xflag) and desired_cols):
usage()
if vflag:
@@ -425,6 +454,9 @@ def init():
if xflag:
hdr = xhdr
if zflag:
hdr = zhdr
update_hdr_intr()
# check if L2ARC exists
@@ -569,6 +601,17 @@ def calculate():
v["el2mru"] = d["evict_l2_eligible_mru"] // sint
v["el2inel"] = d["evict_l2_ineligible"] // sint
v["mtxmis"] = d["mutex_miss"] // sint
v["ztotal"] = (d["zfetch_hits"] + d["zfetch_future"] + d["zfetch_stride"] +
d["zfetch_past"] + d["zfetch_misses"]) // sint
v["zhits"] = d["zfetch_hits"] // sint
v["zahead"] = (d["zfetch_future"] + d["zfetch_stride"]) // sint
v["zpast"] = d["zfetch_past"] // sint
v["zmisses"] = d["zfetch_misses"] // sint
v["zmax"] = d["zfetch_max_streams"] // sint
v["zfuture"] = d["zfetch_future"] // sint
v["zstride"] = d["zfetch_stride"] // sint
v["zissued"] = d["zfetch_io_issued"] // sint
v["zactive"] = d["zfetch_io_active"] // sint
if l2exist:
v["l2hits"] = d["l2_hits"] // sint
+26 -31
View File
@@ -22,6 +22,7 @@
* Copyright (c) 2004, 2010, Oracle and/or its affiliates. All rights reserved.
*
* Copyright (c) 2016, Intel Corporation.
* Copyright (c) 2023, Klara Inc.
*/
/*
@@ -231,28 +232,6 @@ fmd_prop_get_int32(fmd_hdl_t *hdl, const char *name)
if (strcmp(name, "spare_on_remove") == 0)
return (1);
if (strcmp(name, "io_N") == 0 || strcmp(name, "checksum_N") == 0)
return (10); /* N = 10 events */
return (0);
}
int64_t
fmd_prop_get_int64(fmd_hdl_t *hdl, const char *name)
{
(void) hdl;
/*
* These can be looked up in mp->modinfo->fmdi_props
* For now we just hard code for phase 2. In the
* future, there can be a ZED based override.
*/
if (strcmp(name, "remove_timeout") == 0)
return (15ULL * 1000ULL * 1000ULL * 1000ULL); /* 15 sec */
if (strcmp(name, "io_T") == 0 || strcmp(name, "checksum_T") == 0)
return (1000ULL * 1000ULL * 1000ULL * 600ULL); /* 10 min */
return (0);
}
@@ -535,6 +514,19 @@ fmd_serd_exists(fmd_hdl_t *hdl, const char *name)
return (fmd_serd_eng_lookup(&mp->mod_serds, name) != NULL);
}
int
fmd_serd_active(fmd_hdl_t *hdl, const char *name)
{
fmd_module_t *mp = (fmd_module_t *)hdl;
fmd_serd_eng_t *sgp;
if ((sgp = fmd_serd_eng_lookup(&mp->mod_serds, name)) == NULL) {
zed_log_msg(LOG_ERR, "serd engine '%s' does not exist", name);
return (0);
}
return (fmd_serd_eng_fired(sgp) || !fmd_serd_eng_empty(sgp));
}
void
fmd_serd_reset(fmd_hdl_t *hdl, const char *name)
{
@@ -543,12 +535,10 @@ fmd_serd_reset(fmd_hdl_t *hdl, const char *name)
if ((sgp = fmd_serd_eng_lookup(&mp->mod_serds, name)) == NULL) {
zed_log_msg(LOG_ERR, "serd engine '%s' does not exist", name);
return;
} else {
fmd_serd_eng_reset(sgp);
fmd_hdl_debug(hdl, "serd_reset %s", name);
}
fmd_serd_eng_reset(sgp);
fmd_hdl_debug(hdl, "serd_reset %s", name);
}
int
@@ -556,16 +546,21 @@ fmd_serd_record(fmd_hdl_t *hdl, const char *name, fmd_event_t *ep)
{
fmd_module_t *mp = (fmd_module_t *)hdl;
fmd_serd_eng_t *sgp;
int err;
if ((sgp = fmd_serd_eng_lookup(&mp->mod_serds, name)) == NULL) {
zed_log_msg(LOG_ERR, "failed to add record to SERD engine '%s'",
name);
return (0);
}
err = fmd_serd_eng_record(sgp, ep->ev_hrt);
return (fmd_serd_eng_record(sgp, ep->ev_hrt));
}
return (err);
void
fmd_serd_gc(fmd_hdl_t *hdl)
{
fmd_module_t *mp = (fmd_module_t *)hdl;
fmd_serd_hash_apply(&mp->mod_serds, fmd_serd_eng_gc, NULL);
}
/* FMD Timers */
@@ -579,7 +574,7 @@ _timer_notify(union sigval sv)
const fmd_hdl_ops_t *ops = mp->mod_info->fmdi_ops;
struct itimerspec its;
fmd_hdl_debug(hdl, "timer fired (%p)", ftp->ft_tid);
fmd_hdl_debug(hdl, "%s timer fired (%p)", mp->mod_name, ftp->ft_tid);
/* disarm the timer */
memset(&its, 0, sizeof (struct itimerspec));
+2 -1
View File
@@ -151,7 +151,6 @@ extern void fmd_hdl_vdebug(fmd_hdl_t *, const char *, va_list);
extern void fmd_hdl_debug(fmd_hdl_t *, const char *, ...);
extern int32_t fmd_prop_get_int32(fmd_hdl_t *, const char *);
extern int64_t fmd_prop_get_int64(fmd_hdl_t *, const char *);
#define FMD_STAT_NOALLOC 0x0 /* fmd should use caller's memory */
#define FMD_STAT_ALLOC 0x1 /* fmd should allocate stats memory */
@@ -195,10 +194,12 @@ extern size_t fmd_buf_size(fmd_hdl_t *, fmd_case_t *, const char *);
extern void fmd_serd_create(fmd_hdl_t *, const char *, uint_t, hrtime_t);
extern void fmd_serd_destroy(fmd_hdl_t *, const char *);
extern int fmd_serd_exists(fmd_hdl_t *, const char *);
extern int fmd_serd_active(fmd_hdl_t *, const char *);
extern void fmd_serd_reset(fmd_hdl_t *, const char *);
extern int fmd_serd_record(fmd_hdl_t *, const char *, fmd_event_t *);
extern int fmd_serd_fired(fmd_hdl_t *, const char *);
extern int fmd_serd_empty(fmd_hdl_t *, const char *);
extern void fmd_serd_gc(fmd_hdl_t *);
extern id_t fmd_timer_install(fmd_hdl_t *, void *, fmd_event_t *, hrtime_t);
extern void fmd_timer_remove(fmd_hdl_t *, id_t);
+2 -1
View File
@@ -310,8 +310,9 @@ fmd_serd_eng_reset(fmd_serd_eng_t *sgp)
}
void
fmd_serd_eng_gc(fmd_serd_eng_t *sgp)
fmd_serd_eng_gc(fmd_serd_eng_t *sgp, void *arg)
{
(void) arg;
fmd_serd_elem_t *sep, *nep;
hrtime_t hrt;
+1 -1
View File
@@ -77,7 +77,7 @@ extern int fmd_serd_eng_fired(fmd_serd_eng_t *);
extern int fmd_serd_eng_empty(fmd_serd_eng_t *);
extern void fmd_serd_eng_reset(fmd_serd_eng_t *);
extern void fmd_serd_eng_gc(fmd_serd_eng_t *);
extern void fmd_serd_eng_gc(fmd_serd_eng_t *, void *);
#ifdef __cplusplus
}
+117 -26
View File
@@ -23,6 +23,7 @@
* Copyright (c) 2010, Oracle and/or its affiliates. All rights reserved.
* Copyright 2015 Nexenta Systems, Inc. All rights reserved.
* Copyright (c) 2016, Intel Corporation.
* Copyright (c) 2023, Klara Inc.
*/
#include <stddef.h>
@@ -47,11 +48,16 @@
#define DEFAULT_CHECKSUM_T 600 /* seconds */
#define DEFAULT_IO_N 10 /* events */
#define DEFAULT_IO_T 600 /* seconds */
#define DEFAULT_SLOW_IO_N 10 /* events */
#define DEFAULT_SLOW_IO_T 30 /* seconds */
#define CASE_GC_TIMEOUT_SECS 43200 /* 12 hours */
/*
* Our serd engines are named 'zfs_<pool_guid>_<vdev_guid>_{checksum,io}'. This
* #define reserves enough space for two 64-bit hex values plus the length of
* the longest string.
* Our serd engines are named in the following format:
* 'zfs_<pool_guid>_<vdev_guid>_{checksum,io,slow_io}'
* This #define reserves enough space for two 64-bit hex values plus the
* length of the longest string.
*/
#define MAX_SERDLEN (16 * 2 + sizeof ("zfs___checksum"))
@@ -68,6 +74,7 @@ typedef struct zfs_case_data {
int zc_pool_state;
char zc_serd_checksum[MAX_SERDLEN];
char zc_serd_io[MAX_SERDLEN];
char zc_serd_slow_io[MAX_SERDLEN];
int zc_has_remove_timer;
} zfs_case_data_t;
@@ -114,7 +121,8 @@ zfs_de_stats_t zfs_stats = {
{ "resource_drops", FMD_TYPE_UINT64, "resource related ereports" }
};
static hrtime_t zfs_remove_timeout;
/* wait 15 seconds after a removal */
static hrtime_t zfs_remove_timeout = SEC2NSEC(15);
uu_list_pool_t *zfs_case_pool;
uu_list_t *zfs_cases;
@@ -124,6 +132,8 @@ uu_list_t *zfs_cases;
#define ZFS_MAKE_EREPORT(type) \
FM_EREPORT_CLASS "." ZFS_ERROR_CLASS "." type
static void zfs_purge_cases(fmd_hdl_t *hdl);
/*
* Write out the persistent representation of an active case.
*/
@@ -170,6 +180,42 @@ zfs_case_unserialize(fmd_hdl_t *hdl, fmd_case_t *cp)
return (zcp);
}
/*
* count other unique slow-io cases in a pool
*/
static uint_t
zfs_other_slow_cases(fmd_hdl_t *hdl, const zfs_case_data_t *zfs_case)
{
zfs_case_t *zcp;
uint_t cases = 0;
static hrtime_t next_check = 0;
/*
* Note that plumbing in some external GC would require adding locking,
* since most of this module code is not thread safe and assumes there
* is only one thread running against the module. So we perform GC here
* inline periodically so that future delay induced faults will be
* possible once the issue causing multiple vdev delays is resolved.
*/
if (gethrestime_sec() > next_check) {
/* Periodically purge old SERD entries and stale cases */
fmd_serd_gc(hdl);
zfs_purge_cases(hdl);
next_check = gethrestime_sec() + CASE_GC_TIMEOUT_SECS;
}
for (zcp = uu_list_first(zfs_cases); zcp != NULL;
zcp = uu_list_next(zfs_cases, zcp)) {
if (zcp->zc_data.zc_pool_guid == zfs_case->zc_pool_guid &&
zcp->zc_data.zc_vdev_guid != zfs_case->zc_vdev_guid &&
zcp->zc_data.zc_serd_slow_io[0] != '\0' &&
fmd_serd_active(hdl, zcp->zc_data.zc_serd_slow_io)) {
cases++;
}
}
return (cases);
}
/*
* Iterate over any active cases. If any cases are associated with a pool or
* vdev which is no longer present on the system, close the associated case.
@@ -376,6 +422,14 @@ zfs_serd_name(char *buf, uint64_t pool_guid, uint64_t vdev_guid,
(long long unsigned int)vdev_guid, type);
}
static void
zfs_case_retire(fmd_hdl_t *hdl, zfs_case_t *zcp)
{
fmd_hdl_debug(hdl, "retiring case");
fmd_case_close(hdl, zcp->zc_case);
}
/*
* Solve a given ZFS case. This first checks to make sure the diagnosis is
* still valid, as well as cleaning up any pending timer associated with the
@@ -632,9 +686,7 @@ zfs_fm_recv(fmd_hdl_t *hdl, fmd_event_t *ep, nvlist_t *nvl, const char *class)
if (strcmp(class,
ZFS_MAKE_EREPORT(FM_EREPORT_ZFS_DATA)) == 0 ||
strcmp(class,
ZFS_MAKE_EREPORT(FM_EREPORT_ZFS_CONFIG_CACHE_WRITE)) == 0 ||
strcmp(class,
ZFS_MAKE_EREPORT(FM_EREPORT_ZFS_DELAY)) == 0) {
ZFS_MAKE_EREPORT(FM_EREPORT_ZFS_CONFIG_CACHE_WRITE)) == 0) {
zfs_stats.resource_drops.fmds_value.ui64++;
return;
}
@@ -702,6 +754,9 @@ zfs_fm_recv(fmd_hdl_t *hdl, fmd_event_t *ep, nvlist_t *nvl, const char *class)
if (zcp->zc_data.zc_serd_checksum[0] != '\0')
fmd_serd_reset(hdl,
zcp->zc_data.zc_serd_checksum);
if (zcp->zc_data.zc_serd_slow_io[0] != '\0')
fmd_serd_reset(hdl,
zcp->zc_data.zc_serd_slow_io);
} else if (fmd_nvl_class_match(hdl, nvl,
ZFS_MAKE_RSRC(FM_RESOURCE_STATECHANGE))) {
uint64_t state = 0;
@@ -730,7 +785,11 @@ zfs_fm_recv(fmd_hdl_t *hdl, fmd_event_t *ep, nvlist_t *nvl, const char *class)
if (fmd_case_solved(hdl, zcp->zc_case))
return;
fmd_hdl_debug(hdl, "error event '%s'", class);
if (vdev_guid)
fmd_hdl_debug(hdl, "error event '%s', vdev %llu", class,
vdev_guid);
else
fmd_hdl_debug(hdl, "error event '%s'", class);
/*
* Determine if we should solve the case and generate a fault. We solve
@@ -779,6 +838,8 @@ zfs_fm_recv(fmd_hdl_t *hdl, fmd_event_t *ep, nvlist_t *nvl, const char *class)
fmd_nvl_class_match(hdl, nvl,
ZFS_MAKE_EREPORT(FM_EREPORT_ZFS_IO_FAILURE)) ||
fmd_nvl_class_match(hdl, nvl,
ZFS_MAKE_EREPORT(FM_EREPORT_ZFS_DELAY)) ||
fmd_nvl_class_match(hdl, nvl,
ZFS_MAKE_EREPORT(FM_EREPORT_ZFS_PROBE_FAILURE))) {
const char *failmode = NULL;
boolean_t checkremove = B_FALSE;
@@ -814,6 +875,51 @@ zfs_fm_recv(fmd_hdl_t *hdl, fmd_event_t *ep, nvlist_t *nvl, const char *class)
}
if (fmd_serd_record(hdl, zcp->zc_data.zc_serd_io, ep))
checkremove = B_TRUE;
} else if (fmd_nvl_class_match(hdl, nvl,
ZFS_MAKE_EREPORT(FM_EREPORT_ZFS_DELAY))) {
uint64_t slow_io_n, slow_io_t;
/*
* Create a slow io SERD engine when the VDEV has the
* 'vdev_slow_io_n' and 'vdev_slow_io_n' properties.
*/
if (zcp->zc_data.zc_serd_slow_io[0] == '\0' &&
nvlist_lookup_uint64(nvl,
FM_EREPORT_PAYLOAD_ZFS_VDEV_SLOW_IO_N,
&slow_io_n) == 0 &&
nvlist_lookup_uint64(nvl,
FM_EREPORT_PAYLOAD_ZFS_VDEV_SLOW_IO_T,
&slow_io_t) == 0) {
zfs_serd_name(zcp->zc_data.zc_serd_slow_io,
pool_guid, vdev_guid, "slow_io");
fmd_serd_create(hdl,
zcp->zc_data.zc_serd_slow_io,
slow_io_n,
SEC2NSEC(slow_io_t));
zfs_case_serialize(zcp);
}
/* Pass event to SERD engine and see if this triggers */
if (zcp->zc_data.zc_serd_slow_io[0] != '\0' &&
fmd_serd_record(hdl, zcp->zc_data.zc_serd_slow_io,
ep)) {
/*
* Ignore a slow io diagnosis when other
* VDEVs in the pool show signs of being slow.
*/
if (zfs_other_slow_cases(hdl, &zcp->zc_data)) {
zfs_case_retire(hdl, zcp);
fmd_hdl_debug(hdl, "pool %llu has "
"multiple slow io cases -- skip "
"degrading vdev %llu",
(u_longlong_t)
zcp->zc_data.zc_pool_guid,
(u_longlong_t)
zcp->zc_data.zc_vdev_guid);
} else {
zfs_case_solve(hdl, zcp,
"fault.fs.zfs.vdev.slow_io");
}
}
} else if (fmd_nvl_class_match(hdl, nvl,
ZFS_MAKE_EREPORT(FM_EREPORT_ZFS_CHECKSUM))) {
/*
@@ -924,6 +1030,8 @@ zfs_fm_close(fmd_hdl_t *hdl, fmd_case_t *cs)
fmd_serd_destroy(hdl, zcp->zc_data.zc_serd_checksum);
if (zcp->zc_data.zc_serd_io[0] != '\0')
fmd_serd_destroy(hdl, zcp->zc_data.zc_serd_io);
if (zcp->zc_data.zc_serd_slow_io[0] != '\0')
fmd_serd_destroy(hdl, zcp->zc_data.zc_serd_slow_io);
if (zcp->zc_data.zc_has_remove_timer)
fmd_timer_remove(hdl, zcp->zc_remove_timer);
@@ -932,30 +1040,15 @@ zfs_fm_close(fmd_hdl_t *hdl, fmd_case_t *cs)
fmd_hdl_free(hdl, zcp, sizeof (zfs_case_t));
}
/*
* We use the fmd gc entry point to look for old cases that no longer apply.
* This allows us to keep our set of case data small in a long running system.
*/
static void
zfs_fm_gc(fmd_hdl_t *hdl)
{
zfs_purge_cases(hdl);
}
static const fmd_hdl_ops_t fmd_ops = {
zfs_fm_recv, /* fmdo_recv */
zfs_fm_timeout, /* fmdo_timeout */
zfs_fm_close, /* fmdo_close */
NULL, /* fmdo_stats */
zfs_fm_gc, /* fmdo_gc */
NULL, /* fmdo_gc */
};
static const fmd_prop_t fmd_props[] = {
{ "checksum_N", FMD_TYPE_UINT32, "10" },
{ "checksum_T", FMD_TYPE_TIME, "10min" },
{ "io_N", FMD_TYPE_UINT32, "10" },
{ "io_T", FMD_TYPE_TIME, "10min" },
{ "remove_timeout", FMD_TYPE_TIME, "15sec" },
{ NULL, 0, NULL }
};
@@ -996,8 +1089,6 @@ _zfs_diagnosis_init(fmd_hdl_t *hdl)
(void) fmd_stat_create(hdl, FMD_STAT_NOALLOC, sizeof (zfs_stats) /
sizeof (fmd_stat_t), (fmd_stat_t *)&zfs_stats);
zfs_remove_timeout = fmd_prop_get_int64(hdl, "remove_timeout");
}
void
+3
View File
@@ -523,6 +523,9 @@ zfs_retire_recv(fmd_hdl_t *hdl, fmd_event_t *ep, nvlist_t *nvl,
} else if (fmd_nvl_class_match(hdl, fault,
"fault.fs.zfs.vdev.checksum")) {
degrade_device = B_TRUE;
} else if (fmd_nvl_class_match(hdl, fault,
"fault.fs.zfs.vdev.slow_io")) {
degrade_device = B_TRUE;
} else if (fmd_nvl_class_match(hdl, fault,
"fault.fs.zfs.device")) {
fault_device = B_FALSE;
+61 -14
View File
@@ -309,7 +309,8 @@ get_usage(zfs_help_t idx)
"[filesystem|volume|snapshot] ...\n"));
case HELP_MOUNT:
return (gettext("\tmount\n"
"\tmount [-flvO] [-o opts] <-a | filesystem>\n"));
"\tmount [-flvO] [-o opts] <-a|-R filesystem|"
"filesystem>\n"));
case HELP_PROMOTE:
return (gettext("\tpromote <clone-filesystem>\n"));
case HELP_RECEIVE:
@@ -6750,6 +6751,8 @@ zfs_do_holds(int argc, char **argv)
#define MOUNT_TIME 1 /* seconds */
typedef struct get_all_state {
char **ga_datasets;
int ga_count;
boolean_t ga_verbose;
get_all_cb_t *ga_cbp;
} get_all_state_t;
@@ -6796,19 +6799,35 @@ get_one_dataset(zfs_handle_t *zhp, void *data)
return (0);
}
static void
get_all_datasets(get_all_cb_t *cbp, boolean_t verbose)
static int
get_recursive_datasets(zfs_handle_t *zhp, void *data)
{
get_all_state_t state = {
.ga_verbose = verbose,
.ga_cbp = cbp
};
get_all_state_t *state = data;
int len = strlen(zfs_get_name(zhp));
for (int i = 0; i < state->ga_count; ++i) {
if (strcmp(state->ga_datasets[i], zfs_get_name(zhp)) == 0)
return (get_one_dataset(zhp, data));
else if ((strncmp(state->ga_datasets[i], zfs_get_name(zhp),
len) == 0) && state->ga_datasets[i][len] == '/') {
(void) zfs_iter_filesystems_v2(zhp, 0,
get_recursive_datasets, data);
}
}
zfs_close(zhp);
return (0);
}
if (verbose)
static void
get_all_datasets(get_all_state_t *state)
{
if (state->ga_verbose)
set_progress_header(gettext("Reading ZFS config"));
(void) zfs_iter_root(g_zfs, get_one_dataset, &state);
if (state->ga_datasets == NULL)
(void) zfs_iter_root(g_zfs, get_one_dataset, state);
else
(void) zfs_iter_root(g_zfs, get_recursive_datasets, state);
if (verbose)
if (state->ga_verbose)
finish_progress(gettext("done."));
}
@@ -7154,18 +7173,22 @@ static int
share_mount(int op, int argc, char **argv)
{
int do_all = 0;
int recursive = 0;
boolean_t verbose = B_FALSE;
int c, ret = 0;
char *options = NULL;
int flags = 0;
/* check options */
while ((c = getopt(argc, argv, op == OP_MOUNT ? ":alvo:Of" : "al"))
while ((c = getopt(argc, argv, op == OP_MOUNT ? ":aRlvo:Of" : "al"))
!= -1) {
switch (c) {
case 'a':
do_all = 1;
break;
case 'R':
recursive = 1;
break;
case 'v':
verbose = B_TRUE;
break;
@@ -7207,7 +7230,7 @@ share_mount(int op, int argc, char **argv)
argv += optind;
/* check number of arguments */
if (do_all) {
if (do_all || recursive) {
enum sa_protocol protocol = SA_NO_PROTOCOL;
if (op == OP_SHARE && argc > 0) {
@@ -7216,14 +7239,38 @@ share_mount(int op, int argc, char **argv)
argv++;
}
if (argc != 0) {
if (argc != 0 && do_all) {
(void) fprintf(stderr, gettext("too many arguments\n"));
usage(B_FALSE);
}
if (argc == 0 && recursive) {
(void) fprintf(stderr,
gettext("no dataset provided\n"));
usage(B_FALSE);
}
start_progress_timer();
get_all_cb_t cb = { 0 };
get_all_datasets(&cb, verbose);
get_all_state_t state = { 0 };
if (argc == 0) {
state.ga_datasets = NULL;
state.ga_count = -1;
} else {
zfs_handle_t *zhp;
for (int i = 0; i < argc; i++) {
zhp = zfs_open(g_zfs, argv[i],
ZFS_TYPE_FILESYSTEM);
if (zhp == NULL)
usage(B_FALSE);
zfs_close(zhp);
}
state.ga_datasets = argv;
state.ga_count = argc;
}
state.ga_verbose = verbose;
state.ga_cbp = &cb;
get_all_datasets(&state);
if (cb.cb_used == 0) {
free(options);
+16
View File
@@ -1083,6 +1083,22 @@ main(int argc, char **argv)
libzfs_fini(g_zfs);
return (1);
}
if (record.zi_nlanes) {
switch (io_type) {
case ZIO_TYPE_READ:
case ZIO_TYPE_WRITE:
case ZIO_TYPES:
break;
default:
(void) fprintf(stderr, "I/O type for a delay "
"must be 'read' or 'write'\n");
usage();
libzfs_fini(g_zfs);
return (1);
}
}
if (!error)
error = ENXIO;
+2 -2
View File
@@ -438,7 +438,7 @@ static char *zpool_sysfs_gets(char *path)
return (NULL);
}
buf = calloc(sizeof (*buf), statbuf.st_size + 1);
buf = calloc(statbuf.st_size + 1, sizeof (*buf));
if (buf == NULL) {
close(fd);
return (NULL);
@@ -458,7 +458,7 @@ static char *zpool_sysfs_gets(char *path)
}
/* Remove trailing newline */
if (buf[count - 1] == '\n')
if (count > 0 && buf[count - 1] == '\n')
buf[count - 1] = 0;
close(fd);
+95 -51
View File
@@ -22,7 +22,7 @@
/*
* Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved.
* Copyright 2011 Nexenta Systems, Inc. All rights reserved.
* Copyright (c) 2011, 2020 by Delphix. All rights reserved.
* Copyright (c) 2011, 2024 by Delphix. All rights reserved.
* Copyright (c) 2012 by Frederik Wessels. All rights reserved.
* Copyright (c) 2012 by Cyril Plisko. All rights reserved.
* Copyright (c) 2013 by Prasad Joshi (sTec). All rights reserved.
@@ -131,6 +131,13 @@ static int zpool_do_help(int argc, char **argv);
static zpool_compat_status_t zpool_do_load_compat(
const char *, boolean_t *);
enum zpool_options {
ZPOOL_OPTION_POWER = 1024,
ZPOOL_OPTION_ALLOW_INUSE,
ZPOOL_OPTION_ALLOW_REPLICATION_MISMATCH,
ZPOOL_OPTION_ALLOW_ASHIFT_MISMATCH
};
/*
* These libumem hooks provide a reasonable set of defaults for the allocator's
* debugging facilities.
@@ -347,7 +354,7 @@ get_usage(zpool_help_t idx)
{
switch (idx) {
case HELP_ADD:
return (gettext("\tadd [-fgLnP] [-o property=value] "
return (gettext("\tadd [-afgLnP] [-o property=value] "
"<pool> <vdev> ...\n"));
case HELP_ATTACH:
return (gettext("\tattach [-fsw] [-o property=value] "
@@ -413,7 +420,7 @@ get_usage(zpool_help_t idx)
"[<device> ...]\n"));
case HELP_STATUS:
return (gettext("\tstatus [--power] [-c [script1,script2,...]] "
"[-igLpPstvxD] [-T d|u] [pool] ... \n"
"[-DegiLpPstvx] [-T d|u] [pool] ...\n"
"\t [interval [count]]\n"));
case HELP_UPGRADE:
return (gettext("\tupgrade\n"
@@ -1009,8 +1016,9 @@ add_prop_list_default(const char *propname, const char *propval,
}
/*
* zpool add [-fgLnP] [-o property=value] <pool> <vdev> ...
* zpool add [-afgLnP] [-o property=value] <pool> <vdev> ...
*
* -a Disable the ashift validation checks
* -f Force addition of devices, even if they appear in use
* -g Display guid for individual vdev name.
* -L Follow links when resolving vdev path name.
@@ -1026,8 +1034,11 @@ add_prop_list_default(const char *propname, const char *propval,
int
zpool_do_add(int argc, char **argv)
{
boolean_t force = B_FALSE;
boolean_t check_replication = B_TRUE;
boolean_t check_inuse = B_TRUE;
boolean_t dryrun = B_FALSE;
boolean_t check_ashift = B_TRUE;
boolean_t force = B_FALSE;
int name_flags = 0;
int c;
nvlist_t *nvroot;
@@ -1038,8 +1049,18 @@ zpool_do_add(int argc, char **argv)
nvlist_t *props = NULL;
char *propval;
struct option long_options[] = {
{"allow-in-use", no_argument, NULL, ZPOOL_OPTION_ALLOW_INUSE},
{"allow-replication-mismatch", no_argument, NULL,
ZPOOL_OPTION_ALLOW_REPLICATION_MISMATCH},
{"allow-ashift-mismatch", no_argument, NULL,
ZPOOL_OPTION_ALLOW_ASHIFT_MISMATCH},
{0, 0, 0, 0}
};
/* check options */
while ((c = getopt(argc, argv, "fgLno:P")) != -1) {
while ((c = getopt_long(argc, argv, "fgLno:P", long_options, NULL))
!= -1) {
switch (c) {
case 'f':
force = B_TRUE;
@@ -1069,6 +1090,15 @@ zpool_do_add(int argc, char **argv)
case 'P':
name_flags |= VDEV_NAME_PATH;
break;
case ZPOOL_OPTION_ALLOW_INUSE:
check_inuse = B_FALSE;
break;
case ZPOOL_OPTION_ALLOW_REPLICATION_MISMATCH:
check_replication = B_FALSE;
break;
case ZPOOL_OPTION_ALLOW_ASHIFT_MISMATCH:
check_ashift = B_FALSE;
break;
case '?':
(void) fprintf(stderr, gettext("invalid option '%c'\n"),
optopt);
@@ -1089,6 +1119,19 @@ zpool_do_add(int argc, char **argv)
usage(B_FALSE);
}
if (force) {
if (!check_inuse || !check_replication || !check_ashift) {
(void) fprintf(stderr, gettext("'-f' option is not "
"allowed with '--allow-replication-mismatch', "
"'--allow-ashift-mismatch', or "
"'--allow-in-use'\n"));
usage(B_FALSE);
}
check_inuse = B_FALSE;
check_replication = B_FALSE;
check_ashift = B_FALSE;
}
poolname = argv[0];
argc--;
@@ -1119,8 +1162,8 @@ zpool_do_add(int argc, char **argv)
}
/* pass off to make_root_vdev for processing */
nvroot = make_root_vdev(zhp, props, force, !force, B_FALSE, dryrun,
argc, argv);
nvroot = make_root_vdev(zhp, props, !check_inuse,
check_replication, B_FALSE, dryrun, argc, argv);
if (nvroot == NULL) {
zpool_close(zhp);
return (1);
@@ -1224,7 +1267,7 @@ zpool_do_add(int argc, char **argv)
ret = 0;
} else {
ret = (zpool_add(zhp, nvroot) != 0);
ret = (zpool_add(zhp, nvroot, check_ashift) != 0);
}
nvlist_free(props);
@@ -2246,7 +2289,6 @@ print_status_initialize(vdev_stat_t *vs, boolean_t verbose)
!vs->vs_scan_removing) {
char zbuf[1024];
char tbuf[256];
struct tm zaction_ts;
time_t t = vs->vs_initialize_action_time;
int initialize_pct = 100;
@@ -2256,8 +2298,8 @@ print_status_initialize(vdev_stat_t *vs, boolean_t verbose)
100 / (vs->vs_initialize_bytes_est + 1));
}
(void) localtime_r(&t, &zaction_ts);
(void) strftime(tbuf, sizeof (tbuf), "%c", &zaction_ts);
(void) ctime_r(&t, tbuf);
tbuf[24] = 0;
switch (vs->vs_initialize_state) {
case VDEV_INITIALIZE_SUSPENDED:
@@ -2297,7 +2339,6 @@ print_status_trim(vdev_stat_t *vs, boolean_t verbose)
!vs->vs_scan_removing) {
char zbuf[1024];
char tbuf[256];
struct tm zaction_ts;
time_t t = vs->vs_trim_action_time;
int trim_pct = 100;
@@ -2306,8 +2347,8 @@ print_status_trim(vdev_stat_t *vs, boolean_t verbose)
100 / (vs->vs_trim_bytes_est + 1));
}
(void) localtime_r(&t, &zaction_ts);
(void) strftime(tbuf, sizeof (tbuf), "%c", &zaction_ts);
(void) ctime_r(&t, tbuf);
tbuf[24] = 0;
switch (vs->vs_trim_state) {
case VDEV_TRIM_SUSPENDED:
@@ -2569,7 +2610,13 @@ print_status_config(zpool_handle_t *zhp, status_cbdata_t *cb, const char *name,
break;
case VDEV_AUX_ERR_EXCEEDED:
(void) printf(gettext("too many errors"));
if (vs->vs_read_errors + vs->vs_write_errors +
vs->vs_checksum_errors == 0 && children == 0 &&
vs->vs_slow_ios > 0) {
(void) printf(gettext("too many slow I/Os"));
} else {
(void) printf(gettext("too many errors"));
}
break;
case VDEV_AUX_IO_FAILURE:
@@ -3396,10 +3443,10 @@ do_import(nvlist_t *config, const char *newname, const char *mntopts,
ms_status = zpool_enable_datasets(zhp, mntopts, 0);
if (ms_status == EZFS_SHAREFAILED) {
(void) fprintf(stderr, gettext("Import was "
"successful, but unable to share some datasets"));
"successful, but unable to share some datasets\n"));
} else if (ms_status == EZFS_MOUNTFAILED) {
(void) fprintf(stderr, gettext("Import was "
"successful, but unable to mount some datasets"));
"successful, but unable to mount some datasets\n"));
}
}
@@ -7064,7 +7111,6 @@ zpool_do_split(int argc, char **argv)
return (ret);
}
#define POWER_OPT 1024
/*
* zpool online [--power] <pool> <device> ...
@@ -7082,7 +7128,7 @@ zpool_do_online(int argc, char **argv)
int flags = 0;
boolean_t is_power_on = B_FALSE;
struct option long_options[] = {
{"power", no_argument, NULL, POWER_OPT},
{"power", no_argument, NULL, ZPOOL_OPTION_POWER},
{0, 0, 0, 0}
};
@@ -7092,7 +7138,7 @@ zpool_do_online(int argc, char **argv)
case 'e':
flags |= ZFS_ONLINE_EXPAND;
break;
case POWER_OPT:
case ZPOOL_OPTION_POWER:
is_power_on = B_TRUE;
break;
case '?':
@@ -7205,7 +7251,7 @@ zpool_do_offline(int argc, char **argv)
boolean_t is_power_off = B_FALSE;
struct option long_options[] = {
{"power", no_argument, NULL, POWER_OPT},
{"power", no_argument, NULL, ZPOOL_OPTION_POWER},
{0, 0, 0, 0}
};
@@ -7218,7 +7264,7 @@ zpool_do_offline(int argc, char **argv)
case 't':
istmp = B_TRUE;
break;
case POWER_OPT:
case ZPOOL_OPTION_POWER:
is_power_off = B_TRUE;
break;
case '?':
@@ -7318,7 +7364,7 @@ zpool_do_clear(int argc, char **argv)
char *pool, *device;
struct option long_options[] = {
{"power", no_argument, NULL, POWER_OPT},
{"power", no_argument, NULL, ZPOOL_OPTION_POWER},
{0, 0, 0, 0}
};
@@ -7335,7 +7381,7 @@ zpool_do_clear(int argc, char **argv)
case 'X':
xtreme_rewind = B_TRUE;
break;
case POWER_OPT:
case ZPOOL_OPTION_POWER:
is_power_on = B_TRUE;
break;
case '?':
@@ -8864,7 +8910,7 @@ status_callback(zpool_handle_t *zhp, void *data)
printf_color(ANSI_BOLD, gettext("action: "));
printf_color(ANSI_YELLOW, gettext("Make sure the pool's devices"
" are connected, then reboot your system and\n\timport the "
"pool.\n"));
"pool or run 'zpool clear' to resume the pool.\n"));
break;
case ZPOOL_STATUS_IO_FAILURE_WAIT:
@@ -9064,22 +9110,22 @@ status_callback(zpool_handle_t *zhp, void *data)
}
/*
* zpool status [-c [script1,script2,...]] [-igLpPstvx] [--power] [-T d|u] ...
* zpool status [-c [script1,script2,...]] [-DegiLpPstvx] [--power] [-T d|u] ...
* [pool] [interval [count]]
*
* -c CMD For each vdev, run command CMD
* -D Display dedup status (undocumented)
* -e Display only unhealthy vdevs
* -i Display vdev initialization status.
* -g Display guid for individual vdev name.
* -i Display vdev initialization status.
* -L Follow links when resolving vdev path name.
* -p Display values in parsable (exact) format.
* -P Display full path for vdev name.
* -s Display slow IOs column.
* -v Display complete error logs
* -x Display only pools with potential problems
* -D Display dedup status (undocumented)
* -t Display vdev TRIM status.
* -T Display a timestamp in date(1) or Unix format
* -v Display complete error logs
* -x Display only pools with potential problems
* --power Display vdev enclosure slot power status
*
* Describes the health status of all pools or some subset.
@@ -9095,12 +9141,12 @@ zpool_do_status(int argc, char **argv)
char *cmd = NULL;
struct option long_options[] = {
{"power", no_argument, NULL, POWER_OPT},
{"power", no_argument, NULL, ZPOOL_OPTION_POWER},
{0, 0, 0, 0}
};
/* check options */
while ((c = getopt_long(argc, argv, "c:eigLpPsvxDtT:", long_options,
while ((c = getopt_long(argc, argv, "c:DegiLpPstT:vx", long_options,
NULL)) != -1) {
switch (c) {
case 'c':
@@ -9127,15 +9173,18 @@ zpool_do_status(int argc, char **argv)
}
cmd = optarg;
break;
case 'D':
cb.cb_dedup_stats = B_TRUE;
break;
case 'e':
cb.cb_print_unhealthy = B_TRUE;
break;
case 'i':
cb.cb_print_vdev_init = B_TRUE;
break;
case 'g':
cb.cb_name_flags |= VDEV_NAME_GUID;
break;
case 'i':
cb.cb_print_vdev_init = B_TRUE;
break;
case 'L':
cb.cb_name_flags |= VDEV_NAME_FOLLOW_LINKS;
break;
@@ -9148,22 +9197,19 @@ zpool_do_status(int argc, char **argv)
case 's':
cb.cb_print_slow_ios = B_TRUE;
break;
case 'v':
cb.cb_verbose = B_TRUE;
break;
case 'x':
cb.cb_explain = B_TRUE;
break;
case 'D':
cb.cb_dedup_stats = B_TRUE;
break;
case 't':
cb.cb_print_vdev_trim = B_TRUE;
break;
case 'T':
get_timestamp_arg(*optarg);
break;
case POWER_OPT:
case 'v':
cb.cb_verbose = B_TRUE;
break;
case 'x':
cb.cb_explain = B_TRUE;
break;
case ZPOOL_OPTION_POWER:
cb.cb_print_power = B_TRUE;
break;
case '?':
@@ -9202,7 +9248,6 @@ zpool_do_status(int argc, char **argv)
if (cb.vcdl != NULL)
free_vdev_cmd_data_list(cb.vcdl);
if (argc == 0 && cb.cb_count == 0)
(void) fprintf(stderr, gettext("no pools available\n"));
else if (cb.cb_explain && cb.cb_first && cb.cb_allpools)
@@ -10638,11 +10683,10 @@ found:
}
} else {
/*
* The first arg isn't a pool name,
* The first arg isn't the name of a valid pool.
*/
fprintf(stderr, gettext("missing pool name.\n"));
fprintf(stderr, "\n");
usage(B_FALSE);
fprintf(stderr, gettext("Cannot get properties of %s: "
"no such pool available.\n"), argv[0]);
return (1);
}
+4 -4
View File
@@ -20,7 +20,7 @@
*/
/*
* Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved.
* Copyright (c) 2011, 2018 by Delphix. All rights reserved.
* Copyright (c) 2011, 2024 by Delphix. All rights reserved.
* Copyright 2011 Nexenta Systems, Inc. All rights reserved.
* Copyright (c) 2013 Steven Hartland. All rights reserved.
* Copyright (c) 2014 Integros [integros.com]
@@ -3270,7 +3270,7 @@ ztest_vdev_add_remove(ztest_ds_t *zd, uint64_t id)
"log" : NULL, ztest_opts.zo_raid_children, zs->zs_mirrors,
1);
error = spa_vdev_add(spa, nvroot);
error = spa_vdev_add(spa, nvroot, B_FALSE);
fnvlist_free(nvroot);
switch (error) {
@@ -3332,7 +3332,7 @@ ztest_vdev_class_add(ztest_ds_t *zd, uint64_t id)
nvroot = make_vdev_root(NULL, NULL, NULL, ztest_opts.zo_vdev_size, 0,
class, ztest_opts.zo_raid_children, zs->zs_mirrors, 1);
error = spa_vdev_add(spa, nvroot);
error = spa_vdev_add(spa, nvroot, B_FALSE);
fnvlist_free(nvroot);
if (error == ENOSPC)
@@ -3439,7 +3439,7 @@ ztest_vdev_aux_add_remove(ztest_ds_t *zd, uint64_t id)
*/
nvlist_t *nvroot = make_vdev_root(NULL, aux, NULL,
(ztest_opts.zo_vdev_size * 5) / 4, 0, NULL, 0, 0, 1);
error = spa_vdev_add(spa, nvroot);
error = spa_vdev_add(spa, nvroot, B_FALSE);
switch (error) {
case 0:
+2 -1
View File
@@ -18,6 +18,7 @@ subst_sed_cmd = \
-e 's|@ASAN_ENABLED[@]|$(ASAN_ENABLED)|g' \
-e 's|@DEFAULT_INIT_NFS_SERVER[@]|$(DEFAULT_INIT_NFS_SERVER)|g' \
-e 's|@DEFAULT_INIT_SHELL[@]|$(DEFAULT_INIT_SHELL)|g' \
-e 's|@IS_SYSV_RC[@]|$(IS_SYSV_RC)|g' \
-e 's|@LIBFETCH_DYNAMIC[@]|$(LIBFETCH_DYNAMIC)|g' \
-e 's|@LIBFETCH_SONAME[@]|$(LIBFETCH_SONAME)|g' \
-e 's|@PYTHON[@]|$(PYTHON)|g' \
@@ -43,4 +44,4 @@ SUBSTFILES =
CLEANFILES += $(SUBSTFILES)
dist_noinst_DATA += $(SUBSTFILES:=.in)
$(call SUBST,%,)
$(SUBSTFILES): $(call SUBST,%,)
+5 -4
View File
@@ -80,10 +80,11 @@ AC_DEFUN([ZFS_AC_CONFIG_ALWAYS_PYZFS], [
[AC_MSG_ERROR("Python $PYTHON_VERSION unknown")]
)
AX_PYTHON_DEVEL([$PYTHON_REQUIRED_VERSION], [
AS_IF([test "x$enable_pyzfs" = xyes], [
AC_MSG_ERROR("Python $PYTHON_REQUIRED_VERSION development library is not installed")
], [test "x$enable_pyzfs" != xno], [
AS_IF([test "x$enable_pyzfs" = xyes], [
AX_PYTHON_DEVEL([$PYTHON_REQUIRED_VERSION])
], [
AX_PYTHON_DEVEL([$PYTHON_REQUIRED_VERSION], [true])
AS_IF([test "x$ax_python_devel_found" = xno], [
enable_pyzfs=no
])
])
+229 -112
View File
@@ -4,18 +4,13 @@
#
# SYNOPSIS
#
# AX_PYTHON_DEVEL([version], [action-if-not-found])
# AX_PYTHON_DEVEL([version[,optional]])
#
# DESCRIPTION
#
# Note: Defines as a precious variable "PYTHON_VERSION". Don't override it
# in your configure.ac.
#
# Note: this is a slightly modified version of the original AX_PYTHON_DEVEL
# macro which accepts an additional [action-if-not-found] argument. This
# allow to detect if Python development is available without aborting the
# configure phase with an hard error in case it is not.
#
# This macro checks for Python and tries to get the include path to
# 'Python.h'. It provides the $(PYTHON_CPPFLAGS) and $(PYTHON_LIBS) output
# variables. It also exports $(PYTHON_EXTRA_LIBS) and
@@ -28,6 +23,11 @@
# version number. Don't use "PYTHON_VERSION" for this: that environment
# variable is declared as precious and thus reserved for the end-user.
#
# By default this will fail if it does not detect a development version of
# python. If you want it to continue, set optional to true, like
# AX_PYTHON_DEVEL([], [true]). The ax_python_devel_found variable will be
# "no" if it fails.
#
# This macro should work for all versions of Python >= 2.1.0. As an end
# user, you can disable the check for the python version by setting the
# PYTHON_NOVERSIONCHECK environment variable to something else than the
@@ -45,7 +45,6 @@
# Copyright (c) 2009 Matteo Settenvini <matteo@member.fsf.org>
# Copyright (c) 2009 Horst Knorr <hk_classes@knoda.org>
# Copyright (c) 2013 Daniel Mullner <muellner@math.stanford.edu>
# Copyright (c) 2018 loli10K <ezomori.nozomu@gmail.com>
#
# This program is free software: you can redistribute it and/or modify it
# under the terms of the GNU General Public License as published by the
@@ -73,10 +72,18 @@
# modified version of the Autoconf Macro, you may extend this special
# exception to the GPL to apply to your modified version as well.
#serial 21
#serial 36
AU_ALIAS([AC_PYTHON_DEVEL], [AX_PYTHON_DEVEL])
AC_DEFUN([AX_PYTHON_DEVEL],[
# Get whether it's optional
if test -z "$2"; then
ax_python_devel_optional=false
else
ax_python_devel_optional=$2
fi
ax_python_devel_found=yes
#
# Allow the use of a (user set) custom python version
#
@@ -87,23 +94,26 @@ AC_DEFUN([AX_PYTHON_DEVEL],[
AC_PATH_PROG([PYTHON],[python[$PYTHON_VERSION]])
if test -z "$PYTHON"; then
m4_ifvaln([$2],[$2],[
AC_MSG_ERROR([Cannot find python$PYTHON_VERSION in your system path])
PYTHON_VERSION=""
])
AC_MSG_WARN([Cannot find python$PYTHON_VERSION in your system path])
if ! $ax_python_devel_optional; then
AC_MSG_ERROR([Giving up, python development not available])
fi
ax_python_devel_found=no
PYTHON_VERSION=""
fi
#
# Check for a version of Python >= 2.1.0
#
AC_MSG_CHECKING([for a version of Python >= '2.1.0'])
ac_supports_python_ver=`$PYTHON -c "import sys; \
if test $ax_python_devel_found = yes; then
#
# Check for a version of Python >= 2.1.0
#
AC_MSG_CHECKING([for a version of Python >= '2.1.0'])
ac_supports_python_ver=`$PYTHON -c "import sys; \
ver = sys.version.split ()[[0]]; \
print (ver >= '2.1.0')"`
if test "$ac_supports_python_ver" != "True"; then
if test "$ac_supports_python_ver" != "True"; then
if test -z "$PYTHON_NOVERSIONCHECK"; then
AC_MSG_RESULT([no])
AC_MSG_FAILURE([
AC_MSG_WARN([
This version of the AC@&t@_PYTHON_DEVEL macro
doesn't work properly with versions of Python before
2.1.0. You may need to re-run configure, setting the
@@ -112,20 +122,27 @@ PYTHON_EXTRA_LIBS and PYTHON_EXTRA_LDFLAGS by hand.
Moreover, to disable this check, set PYTHON_NOVERSIONCHECK
to something else than an empty string.
])
if ! $ax_python_devel_optional; then
AC_MSG_FAILURE([Giving up])
fi
ax_python_devel_found=no
PYTHON_VERSION=""
else
AC_MSG_RESULT([skip at user request])
fi
else
else
AC_MSG_RESULT([yes])
fi
fi
#
# If the macro parameter ``version'' is set, honour it.
# A Python shim class, VPy, is used to implement correct version comparisons via
# string expressions, since e.g. a naive textual ">= 2.7.3" won't work for
# Python 2.7.10 (the ".1" being evaluated as less than ".3").
#
if test -n "$1"; then
if test $ax_python_devel_found = yes; then
#
# If the macro parameter ``version'' is set, honour it.
# A Python shim class, VPy, is used to implement correct version comparisons via
# string expressions, since e.g. a naive textual ">= 2.7.3" won't work for
# Python 2.7.10 (the ".1" being evaluated as less than ".3").
#
if test -n "$1"; then
AC_MSG_CHECKING([for a version of Python $1])
cat << EOF > ax_python_devel_vpy.py
class VPy:
@@ -133,7 +150,7 @@ class VPy:
return tuple(map(int, s.strip().replace("rc", ".").split(".")))
def __init__(self):
import sys
self.vpy = tuple(sys.version_info)
self.vpy = tuple(sys.version_info)[[:3]]
def __eq__(self, s):
return self.vpy == self.vtup(s)
def __ne__(self, s):
@@ -155,25 +172,69 @@ EOF
AC_MSG_RESULT([yes])
else
AC_MSG_RESULT([no])
AC_MSG_ERROR([this package requires Python $1.
AC_MSG_WARN([this package requires Python $1.
If you have it installed, but it isn't the default Python
interpreter in your system path, please pass the PYTHON_VERSION
variable to configure. See ``configure --help'' for reference.
])
if ! $ax_python_devel_optional; then
AC_MSG_ERROR([Giving up])
fi
ax_python_devel_found=no
PYTHON_VERSION=""
fi
fi
fi
#
# Check for Python include path
#
#
AC_MSG_CHECKING([for Python include path])
if test -z "$PYTHON_CPPFLAGS"; then
python_path=`$PYTHON -c "import sysconfig; \
print (sysconfig.get_path('include'));"`
plat_python_path=`$PYTHON -c "import sysconfig; \
print (sysconfig.get_path('platinclude'));"`
if test $ax_python_devel_found = yes; then
#
# Check if you have distutils, else fail
#
AC_MSG_CHECKING([for the sysconfig Python package])
ac_sysconfig_result=`$PYTHON -c "import sysconfig" 2>&1`
if test $? -eq 0; then
AC_MSG_RESULT([yes])
IMPORT_SYSCONFIG="import sysconfig"
else
AC_MSG_RESULT([no])
AC_MSG_CHECKING([for the distutils Python package])
ac_sysconfig_result=`$PYTHON -c "from distutils import sysconfig" 2>&1`
if test $? -eq 0; then
AC_MSG_RESULT([yes])
IMPORT_SYSCONFIG="from distutils import sysconfig"
else
AC_MSG_WARN([cannot import Python module "distutils".
Please check your Python installation. The error was:
$ac_sysconfig_result])
if ! $ax_python_devel_optional; then
AC_MSG_ERROR([Giving up])
fi
ax_python_devel_found=no
PYTHON_VERSION=""
fi
fi
fi
if test $ax_python_devel_found = yes; then
#
# Check for Python include path
#
AC_MSG_CHECKING([for Python include path])
if test -z "$PYTHON_CPPFLAGS"; then
if test "$IMPORT_SYSCONFIG" = "import sysconfig"; then
# sysconfig module has different functions
python_path=`$PYTHON -c "$IMPORT_SYSCONFIG; \
print (sysconfig.get_path ('include'));"`
plat_python_path=`$PYTHON -c "$IMPORT_SYSCONFIG; \
print (sysconfig.get_path ('platinclude'));"`
else
# old distutils way
python_path=`$PYTHON -c "$IMPORT_SYSCONFIG; \
print (sysconfig.get_python_inc ());"`
plat_python_path=`$PYTHON -c "$IMPORT_SYSCONFIG; \
print (sysconfig.get_python_inc (plat_specific=1));"`
fi
if test -n "${python_path}"; then
if test "${plat_python_path}" != "${python_path}"; then
python_path="-I$python_path -I$plat_python_path"
@@ -182,15 +243,15 @@ variable to configure. See ``configure --help'' for reference.
fi
fi
PYTHON_CPPFLAGS=$python_path
fi
AC_MSG_RESULT([$PYTHON_CPPFLAGS])
AC_SUBST([PYTHON_CPPFLAGS])
fi
AC_MSG_RESULT([$PYTHON_CPPFLAGS])
AC_SUBST([PYTHON_CPPFLAGS])
#
# Check for Python library path
#
AC_MSG_CHECKING([for Python library path])
if test -z "$PYTHON_LIBS"; then
#
# Check for Python library path
#
AC_MSG_CHECKING([for Python library path])
if test -z "$PYTHON_LIBS"; then
# (makes two attempts to ensure we've got a version number
# from the interpreter)
ac_python_version=`cat<<EOD | $PYTHON -
@@ -208,7 +269,7 @@ EOD`
ac_python_version=$PYTHON_VERSION
else
ac_python_version=`$PYTHON -c "import sys; \
print ('.'.join(sys.version.split('.')[[:2]]))"`
print ("%d.%d" % sys.version_info[[:2]])"`
fi
fi
@@ -220,7 +281,7 @@ EOD`
ac_python_libdir=`cat<<EOD | $PYTHON -
# There should be only one
import sysconfig
$IMPORT_SYSCONFIG
e = sysconfig.get_config_var('LIBDIR')
if e is not None:
print (e)
@@ -229,7 +290,7 @@ EOD`
# Now, for the library:
ac_python_library=`cat<<EOD | $PYTHON -
import sysconfig
$IMPORT_SYSCONFIG
c = sysconfig.get_config_vars()
if 'LDVERSION' in c:
print ('python'+c[['LDVERSION']])
@@ -249,88 +310,140 @@ EOD`
else
# old way: use libpython from python_configdir
ac_python_libdir=`$PYTHON -c \
"import sysconfig; \
"from sysconfig import get_python_lib as f; \
import os; \
print (os.path.join(sysconfig.get_path('platstdlib'), 'config'));"`
print (os.path.join(f(plat_specific=1, standard_lib=1), 'config'));"`
PYTHON_LIBS="-L$ac_python_libdir -lpython$ac_python_version"
fi
if test -z "PYTHON_LIBS"; then
m4_ifvaln([$2],[$2],[
AC_MSG_ERROR([
AC_MSG_WARN([
Cannot determine location of your Python DSO. Please check it was installed with
dynamic libraries enabled, or try setting PYTHON_LIBS by hand.
])
])
if ! $ax_python_devel_optional; then
AC_MSG_ERROR([Giving up])
fi
ax_python_devel_found=no
PYTHON_VERSION=""
fi
fi
fi
AC_MSG_RESULT([$PYTHON_LIBS])
AC_SUBST([PYTHON_LIBS])
#
# Check for site packages
#
AC_MSG_CHECKING([for Python site-packages path])
if test -z "$PYTHON_SITE_PKG"; then
PYTHON_SITE_PKG=`$PYTHON -c "import distutils.sysconfig; \
print (distutils.sysconfig.get_python_lib(0,0));" 2>/dev/null || \
$PYTHON -c "import sysconfig; \
print (sysconfig.get_path('purelib'));"`
fi
AC_MSG_RESULT([$PYTHON_SITE_PKG])
AC_SUBST([PYTHON_SITE_PKG])
if test $ax_python_devel_found = yes; then
AC_MSG_RESULT([$PYTHON_LIBS])
AC_SUBST([PYTHON_LIBS])
#
# libraries which must be linked in when embedding
#
AC_MSG_CHECKING(python extra libraries)
if test -z "$PYTHON_EXTRA_LIBS"; then
PYTHON_EXTRA_LIBS=`$PYTHON -c "import sysconfig; \
#
# Check for site packages
#
AC_MSG_CHECKING([for Python site-packages path])
if test -z "$PYTHON_SITE_PKG"; then
if test "$IMPORT_SYSCONFIG" = "import sysconfig"; then
PYTHON_SITE_PKG=`$PYTHON -c "
$IMPORT_SYSCONFIG;
if hasattr(sysconfig, 'get_default_scheme'):
scheme = sysconfig.get_default_scheme()
else:
scheme = sysconfig._get_default_scheme()
if scheme == 'posix_local':
# Debian's default scheme installs to /usr/local/ but we want to find headers in /usr/
scheme = 'posix_prefix'
prefix = '$prefix'
if prefix == 'NONE':
prefix = '$ac_default_prefix'
sitedir = sysconfig.get_path('purelib', scheme, vars={'base': prefix})
print(sitedir)"`
else
# distutils.sysconfig way
PYTHON_SITE_PKG=`$PYTHON -c "$IMPORT_SYSCONFIG; \
print (sysconfig.get_python_lib(0,0));"`
fi
fi
AC_MSG_RESULT([$PYTHON_SITE_PKG])
AC_SUBST([PYTHON_SITE_PKG])
#
# Check for platform-specific site packages
#
AC_MSG_CHECKING([for Python platform specific site-packages path])
if test -z "$PYTHON_PLATFORM_SITE_PKG"; then
if test "$IMPORT_SYSCONFIG" = "import sysconfig"; then
PYTHON_PLATFORM_SITE_PKG=`$PYTHON -c "
$IMPORT_SYSCONFIG;
if hasattr(sysconfig, 'get_default_scheme'):
scheme = sysconfig.get_default_scheme()
else:
scheme = sysconfig._get_default_scheme()
if scheme == 'posix_local':
# Debian's default scheme installs to /usr/local/ but we want to find headers in /usr/
scheme = 'posix_prefix'
prefix = '$prefix'
if prefix == 'NONE':
prefix = '$ac_default_prefix'
sitedir = sysconfig.get_path('platlib', scheme, vars={'platbase': prefix})
print(sitedir)"`
else
# distutils.sysconfig way
PYTHON_PLATFORM_SITE_PKG=`$PYTHON -c "$IMPORT_SYSCONFIG; \
print (sysconfig.get_python_lib(1,0));"`
fi
fi
AC_MSG_RESULT([$PYTHON_PLATFORM_SITE_PKG])
AC_SUBST([PYTHON_PLATFORM_SITE_PKG])
#
# libraries which must be linked in when embedding
#
AC_MSG_CHECKING(python extra libraries)
if test -z "$PYTHON_EXTRA_LIBS"; then
PYTHON_EXTRA_LIBS=`$PYTHON -c "$IMPORT_SYSCONFIG; \
conf = sysconfig.get_config_var; \
print (conf('LIBS') + ' ' + conf('SYSLIBS'))"`
fi
AC_MSG_RESULT([$PYTHON_EXTRA_LIBS])
AC_SUBST(PYTHON_EXTRA_LIBS)
fi
AC_MSG_RESULT([$PYTHON_EXTRA_LIBS])
AC_SUBST(PYTHON_EXTRA_LIBS)
#
# linking flags needed when embedding
#
AC_MSG_CHECKING(python extra linking flags)
if test -z "$PYTHON_EXTRA_LDFLAGS"; then
PYTHON_EXTRA_LDFLAGS=`$PYTHON -c "import sysconfig; \
#
# linking flags needed when embedding
#
AC_MSG_CHECKING(python extra linking flags)
if test -z "$PYTHON_EXTRA_LDFLAGS"; then
PYTHON_EXTRA_LDFLAGS=`$PYTHON -c "$IMPORT_SYSCONFIG; \
conf = sysconfig.get_config_var; \
print (conf('LINKFORSHARED'))"`
fi
AC_MSG_RESULT([$PYTHON_EXTRA_LDFLAGS])
AC_SUBST(PYTHON_EXTRA_LDFLAGS)
# Hack for macos, it sticks this in here.
PYTHON_EXTRA_LDFLAGS=`echo $PYTHON_EXTRA_LDFLAGS | sed 's/CoreFoundation.*$/CoreFoundation/'`
fi
AC_MSG_RESULT([$PYTHON_EXTRA_LDFLAGS])
AC_SUBST(PYTHON_EXTRA_LDFLAGS)
#
# final check to see if everything compiles alright
#
AC_MSG_CHECKING([consistency of all components of python development environment])
# save current global flags
ac_save_LIBS="$LIBS"
ac_save_LDFLAGS="$LDFLAGS"
ac_save_CPPFLAGS="$CPPFLAGS"
LIBS="$ac_save_LIBS $PYTHON_LIBS $PYTHON_EXTRA_LIBS $PYTHON_EXTRA_LIBS"
LDFLAGS="$ac_save_LDFLAGS $PYTHON_EXTRA_LDFLAGS"
CPPFLAGS="$ac_save_CPPFLAGS $PYTHON_CPPFLAGS"
AC_LANG_PUSH([C])
AC_LINK_IFELSE([
#
# final check to see if everything compiles alright
#
AC_MSG_CHECKING([consistency of all components of python development environment])
# save current global flags
ac_save_LIBS="$LIBS"
ac_save_LDFLAGS="$LDFLAGS"
ac_save_CPPFLAGS="$CPPFLAGS"
LIBS="$ac_save_LIBS $PYTHON_LIBS $PYTHON_EXTRA_LIBS"
LDFLAGS="$ac_save_LDFLAGS $PYTHON_EXTRA_LDFLAGS"
CPPFLAGS="$ac_save_CPPFLAGS $PYTHON_CPPFLAGS"
AC_LANG_PUSH([C])
AC_LINK_IFELSE([
AC_LANG_PROGRAM([[#include <Python.h>]],
[[Py_Initialize();]])
],[pythonexists=yes],[pythonexists=no])
AC_LANG_POP([C])
# turn back to default flags
CPPFLAGS="$ac_save_CPPFLAGS"
LIBS="$ac_save_LIBS"
LDFLAGS="$ac_save_LDFLAGS"
AC_LANG_POP([C])
# turn back to default flags
CPPFLAGS="$ac_save_CPPFLAGS"
LIBS="$ac_save_LIBS"
LDFLAGS="$ac_save_LDFLAGS"
AC_MSG_RESULT([$pythonexists])
AC_MSG_RESULT([$pythonexists])
if test ! "x$pythonexists" = "xyes"; then
m4_ifvaln([$2],[$2],[
AC_MSG_FAILURE([
if test ! "x$pythonexists" = "xyes"; then
AC_MSG_WARN([
Could not link test program to Python. Maybe the main Python library has been
installed in some non-standard library path. If so, pass it to configure,
via the LIBS environment variable.
@@ -340,9 +453,13 @@ EOD`
You probably have to install the development version of the Python package
for your distribution. The exact name of this package varies among them.
============================================================================
])
PYTHON_VERSION=""
])
])
if ! $ax_python_devel_optional; then
AC_MSG_ERROR([Giving up])
fi
ax_python_devel_found=no
PYTHON_VERSION=""
fi
fi
#
+15
View File
@@ -377,6 +377,14 @@ AC_DEFUN([ZFS_AC_KERNEL_SRC_BLK_MQ], [
(void) blk_mq_alloc_tag_set(&tag_set);
return BLK_STS_OK;
], [])
ZFS_LINUX_TEST_SRC([blk_mq_rq_hctx], [
#include <linux/blk-mq.h>
#include <linux/blkdev.h>
], [
struct request rq = {0};
struct blk_mq_hw_ctx *hctx = NULL;
rq.mq_hctx = hctx;
], [])
])
AC_DEFUN([ZFS_AC_KERNEL_BLK_MQ], [
@@ -384,6 +392,13 @@ AC_DEFUN([ZFS_AC_KERNEL_BLK_MQ], [
ZFS_LINUX_TEST_RESULT([blk_mq], [
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_BLK_MQ, 1, [block multiqueue is available])
AC_MSG_CHECKING([whether block multiqueue hardware context is cached in struct request])
ZFS_LINUX_TEST_RESULT([blk_mq_rq_hctx], [
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_BLK_MQ_RQ_HCTX, 1, [block multiqueue hardware context is cached in struct request])
], [
AC_MSG_RESULT(no)
])
], [
AC_MSG_RESULT(no)
])
+125 -34
View File
@@ -54,6 +54,26 @@ AC_DEFUN([ZFS_AC_KERNEL_SRC_BLKDEV_BDEV_OPEN_BY_PATH], [
])
])
dnl #
dnl # 6.9.x API change
dnl # bdev_file_open_by_path() replaced bdev_open_by_path(),
dnl # and returns struct file*
dnl #
AC_DEFUN([ZFS_AC_KERNEL_SRC_BDEV_FILE_OPEN_BY_PATH], [
ZFS_LINUX_TEST_SRC([bdev_file_open_by_path], [
#include <linux/fs.h>
#include <linux/blkdev.h>
], [
struct file *file __attribute__ ((unused)) = NULL;
const char *path = "path";
fmode_t mode = 0;
void *holder = NULL;
struct blk_holder_ops h;
file = bdev_file_open_by_path(path, mode, holder, &h);
])
])
AC_DEFUN([ZFS_AC_KERNEL_BLKDEV_GET_BY_PATH], [
AC_MSG_CHECKING([whether blkdev_get_by_path() exists and takes 3 args])
ZFS_LINUX_TEST_RESULT([blkdev_get_by_path], [
@@ -73,7 +93,16 @@ AC_DEFUN([ZFS_AC_KERNEL_BLKDEV_GET_BY_PATH], [
[bdev_open_by_path() exists])
AC_MSG_RESULT(yes)
], [
ZFS_LINUX_TEST_ERROR([blkdev_get_by_path()])
AC_MSG_RESULT(no)
AC_MSG_CHECKING([whether bdev_file_open_by_path() exists])
ZFS_LINUX_TEST_RESULT([bdev_file_open_by_path], [
AC_DEFINE(HAVE_BDEV_FILE_OPEN_BY_PATH, 1,
[bdev_file_open_by_path() exists])
AC_MSG_RESULT(yes)
], [
AC_MSG_RESULT(no)
ZFS_LINUX_TEST_ERROR([blkdev_get_by_path()])
])
])
])
])
@@ -149,10 +178,19 @@ AC_DEFUN([ZFS_AC_KERNEL_SRC_BLKDEV_BDEV_RELEASE], [
])
])
dnl #
dnl # 6.9.x API change
dnl #
dnl # bdev_release() now private, but because bdev_file_open_by_path() returns
dnl # struct file*, we can just use fput(). So the blkdev_put test no longer
dnl # fails if not found.
dnl #
AC_DEFUN([ZFS_AC_KERNEL_BLKDEV_PUT], [
AC_MSG_CHECKING([whether blkdev_put() exists])
ZFS_LINUX_TEST_RESULT([blkdev_put], [
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_BLKDEV_PUT, 1, [blkdev_put() exists])
], [
AC_MSG_RESULT(no)
AC_MSG_CHECKING([whether blkdev_put() accepts void* as arg 2])
@@ -168,7 +206,7 @@ AC_DEFUN([ZFS_AC_KERNEL_BLKDEV_PUT], [
AC_DEFINE(HAVE_BDEV_RELEASE, 1,
[bdev_release() exists])
], [
ZFS_LINUX_TEST_ERROR([blkdev_put()])
AC_MSG_RESULT(no)
])
])
])
@@ -523,12 +561,29 @@ AC_DEFUN([ZFS_AC_KERNEL_BLKDEV_BDEVNAME], [
])
dnl #
dnl # 5.19 API: blkdev_issue_secure_erase()
dnl # 4.7 API: __blkdev_issue_discard(..., BLKDEV_DISCARD_SECURE)
dnl # 3.10 API: blkdev_issue_discard(..., BLKDEV_DISCARD_SECURE)
dnl # TRIM support: discard and secure erase. We make use of asynchronous
dnl # functions when available.
dnl #
AC_DEFUN([ZFS_AC_KERNEL_SRC_BLKDEV_ISSUE_SECURE_ERASE], [
ZFS_LINUX_TEST_SRC([blkdev_issue_secure_erase], [
dnl # 3.10:
dnl # sync discard: blkdev_issue_discard(..., 0)
dnl # sync erase: blkdev_issue_discard(..., BLKDEV_DISCARD_SECURE)
dnl # async discard: [not available]
dnl # async erase: [not available]
dnl #
dnl # 4.7:
dnl # sync discard: blkdev_issue_discard(..., 0)
dnl # sync erase: blkdev_issue_discard(..., BLKDEV_DISCARD_SECURE)
dnl # async discard: __blkdev_issue_discard(..., 0)
dnl # async erase: __blkdev_issue_discard(..., BLKDEV_DISCARD_SECURE)
dnl #
dnl # 5.19:
dnl # sync discard: blkdev_issue_discard(...)
dnl # sync erase: blkdev_issue_secure_erase(...)
dnl # async discard: __blkdev_issue_discard(...)
dnl # async erase: [not available]
dnl #
AC_DEFUN([ZFS_AC_KERNEL_SRC_BLKDEV_ISSUE_DISCARD], [
ZFS_LINUX_TEST_SRC([blkdev_issue_discard_noflags], [
#include <linux/blkdev.h>
],[
struct block_device *bdev = NULL;
@@ -536,10 +591,33 @@ AC_DEFUN([ZFS_AC_KERNEL_SRC_BLKDEV_ISSUE_SECURE_ERASE], [
sector_t nr_sects = 0;
int error __attribute__ ((unused));
error = blkdev_issue_secure_erase(bdev,
error = blkdev_issue_discard(bdev,
sector, nr_sects, GFP_KERNEL);
])
ZFS_LINUX_TEST_SRC([blkdev_issue_discard_flags], [
#include <linux/blkdev.h>
],[
struct block_device *bdev = NULL;
sector_t sector = 0;
sector_t nr_sects = 0;
unsigned long flags = 0;
int error __attribute__ ((unused));
error = blkdev_issue_discard(bdev,
sector, nr_sects, GFP_KERNEL, flags);
])
ZFS_LINUX_TEST_SRC([blkdev_issue_discard_async_noflags], [
#include <linux/blkdev.h>
],[
struct block_device *bdev = NULL;
sector_t sector = 0;
sector_t nr_sects = 0;
struct bio *biop = NULL;
int error __attribute__ ((unused));
error = __blkdev_issue_discard(bdev,
sector, nr_sects, GFP_KERNEL, &biop);
])
ZFS_LINUX_TEST_SRC([blkdev_issue_discard_async_flags], [
#include <linux/blkdev.h>
],[
@@ -553,22 +631,52 @@ AC_DEFUN([ZFS_AC_KERNEL_SRC_BLKDEV_ISSUE_SECURE_ERASE], [
error = __blkdev_issue_discard(bdev,
sector, nr_sects, GFP_KERNEL, flags, &biop);
])
ZFS_LINUX_TEST_SRC([blkdev_issue_discard_flags], [
ZFS_LINUX_TEST_SRC([blkdev_issue_secure_erase], [
#include <linux/blkdev.h>
],[
struct block_device *bdev = NULL;
sector_t sector = 0;
sector_t nr_sects = 0;
unsigned long flags = 0;
int error __attribute__ ((unused));
error = blkdev_issue_discard(bdev,
sector, nr_sects, GFP_KERNEL, flags);
error = blkdev_issue_secure_erase(bdev,
sector, nr_sects, GFP_KERNEL);
])
])
AC_DEFUN([ZFS_AC_KERNEL_BLKDEV_ISSUE_SECURE_ERASE], [
AC_DEFUN([ZFS_AC_KERNEL_BLKDEV_ISSUE_DISCARD], [
AC_MSG_CHECKING([whether blkdev_issue_discard() is available])
ZFS_LINUX_TEST_RESULT([blkdev_issue_discard_noflags], [
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_BLKDEV_ISSUE_DISCARD_NOFLAGS, 1,
[blkdev_issue_discard() is available])
],[
AC_MSG_RESULT(no)
])
AC_MSG_CHECKING([whether blkdev_issue_discard(flags) is available])
ZFS_LINUX_TEST_RESULT([blkdev_issue_discard_flags], [
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_BLKDEV_ISSUE_DISCARD_FLAGS, 1,
[blkdev_issue_discard(flags) is available])
],[
AC_MSG_RESULT(no)
])
AC_MSG_CHECKING([whether __blkdev_issue_discard() is available])
ZFS_LINUX_TEST_RESULT([blkdev_issue_discard_async_noflags], [
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_BLKDEV_ISSUE_DISCARD_ASYNC_NOFLAGS, 1,
[__blkdev_issue_discard() is available])
],[
AC_MSG_RESULT(no)
])
AC_MSG_CHECKING([whether __blkdev_issue_discard(flags) is available])
ZFS_LINUX_TEST_RESULT([blkdev_issue_discard_async_flags], [
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_BLKDEV_ISSUE_DISCARD_ASYNC_FLAGS, 1,
[__blkdev_issue_discard(flags) is available])
],[
AC_MSG_RESULT(no)
])
AC_MSG_CHECKING([whether blkdev_issue_secure_erase() is available])
ZFS_LINUX_TEST_RESULT([blkdev_issue_secure_erase], [
AC_MSG_RESULT(yes)
@@ -576,24 +684,6 @@ AC_DEFUN([ZFS_AC_KERNEL_BLKDEV_ISSUE_SECURE_ERASE], [
[blkdev_issue_secure_erase() is available])
],[
AC_MSG_RESULT(no)
AC_MSG_CHECKING([whether __blkdev_issue_discard() is available])
ZFS_LINUX_TEST_RESULT([blkdev_issue_discard_async_flags], [
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_BLKDEV_ISSUE_DISCARD_ASYNC, 1,
[__blkdev_issue_discard() is available])
],[
AC_MSG_RESULT(no)
AC_MSG_CHECKING([whether blkdev_issue_discard() is available])
ZFS_LINUX_TEST_RESULT([blkdev_issue_discard_flags], [
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_BLKDEV_ISSUE_DISCARD, 1,
[blkdev_issue_discard() is available])
],[
ZFS_LINUX_TEST_ERROR([blkdev_issue_discard()])
])
])
])
])
@@ -645,6 +735,7 @@ AC_DEFUN([ZFS_AC_KERNEL_SRC_BLKDEV], [
ZFS_AC_KERNEL_SRC_BLKDEV_GET_BY_PATH
ZFS_AC_KERNEL_SRC_BLKDEV_GET_BY_PATH_4ARG
ZFS_AC_KERNEL_SRC_BLKDEV_BDEV_OPEN_BY_PATH
ZFS_AC_KERNEL_SRC_BDEV_FILE_OPEN_BY_PATH
ZFS_AC_KERNEL_SRC_BLKDEV_PUT
ZFS_AC_KERNEL_SRC_BLKDEV_PUT_HOLDER
ZFS_AC_KERNEL_SRC_BLKDEV_BDEV_RELEASE
@@ -657,7 +748,7 @@ AC_DEFUN([ZFS_AC_KERNEL_SRC_BLKDEV], [
ZFS_AC_KERNEL_SRC_BLKDEV_BDEV_CHECK_MEDIA_CHANGE
ZFS_AC_KERNEL_SRC_BLKDEV_BDEV_WHOLE
ZFS_AC_KERNEL_SRC_BLKDEV_BDEVNAME
ZFS_AC_KERNEL_SRC_BLKDEV_ISSUE_SECURE_ERASE
ZFS_AC_KERNEL_SRC_BLKDEV_ISSUE_DISCARD
ZFS_AC_KERNEL_SRC_BLKDEV_BDEV_KOBJ
ZFS_AC_KERNEL_SRC_BLKDEV_PART_TO_DEV
ZFS_AC_KERNEL_SRC_BLKDEV_DISK_CHECK_MEDIA_CHANGE
@@ -678,7 +769,7 @@ AC_DEFUN([ZFS_AC_KERNEL_BLKDEV], [
ZFS_AC_KERNEL_BLKDEV_BDEV_WHOLE
ZFS_AC_KERNEL_BLKDEV_BDEVNAME
ZFS_AC_KERNEL_BLKDEV_GET_ERESTARTSYS
ZFS_AC_KERNEL_BLKDEV_ISSUE_SECURE_ERASE
ZFS_AC_KERNEL_BLKDEV_ISSUE_DISCARD
ZFS_AC_KERNEL_BLKDEV_BDEV_KOBJ
ZFS_AC_KERNEL_BLKDEV_PART_TO_DEV
ZFS_AC_KERNEL_BLKDEV_DISK_CHECK_MEDIA_CHANGE
+1
View File
@@ -4,6 +4,7 @@ dnl #
AC_DEFUN([ZFS_AC_KERNEL_SRC_FILEMAP], [
ZFS_LINUX_TEST_SRC([filemap_range_has_page], [
#include <linux/fs.h>
#include <linux/pagemap.h>
],[
struct address_space *mapping = NULL;
loff_t lstart = 0;
+33
View File
@@ -50,6 +50,14 @@ AC_DEFUN([ZFS_AC_KERNEL_SRC_MAKE_REQUEST_FN], [
disk = blk_alloc_disk(NUMA_NO_NODE);
])
ZFS_LINUX_TEST_SRC([blk_alloc_disk_2arg], [
#include <linux/blkdev.h>
],[
struct queue_limits *lim = NULL;
struct gendisk *disk __attribute__ ((unused));
disk = blk_alloc_disk(lim, NUMA_NO_NODE);
])
ZFS_LINUX_TEST_SRC([blk_cleanup_disk], [
#include <linux/blkdev.h>
],[
@@ -96,6 +104,31 @@ AC_DEFUN([ZFS_AC_KERNEL_MAKE_REQUEST_FN], [
], [
AC_MSG_RESULT(no)
])
dnl #
dnl # Linux 6.9 API Change:
dnl # blk_alloc_queue() takes a nullable queue_limits arg.
dnl #
AC_MSG_CHECKING([whether blk_alloc_disk() exists and takes 2 args])
ZFS_LINUX_TEST_RESULT([blk_alloc_disk_2arg], [
AC_MSG_RESULT(yes)
AC_DEFINE([HAVE_BLK_ALLOC_DISK_2ARG], 1, [blk_alloc_disk() exists and takes 2 args])
dnl #
dnl # 5.20 API change,
dnl # Removed blk_cleanup_disk(), put_disk() should be used.
dnl #
AC_MSG_CHECKING([whether blk_cleanup_disk() exists])
ZFS_LINUX_TEST_RESULT([blk_cleanup_disk], [
AC_MSG_RESULT(yes)
AC_DEFINE([HAVE_BLK_CLEANUP_DISK], 1,
[blk_cleanup_disk() exists])
], [
AC_MSG_RESULT(no)
])
], [
AC_MSG_RESULT(no)
])
],[
AC_MSG_RESULT(no)
+17
View File
@@ -0,0 +1,17 @@
AC_DEFUN([ZFS_AC_KERNEL_SRC_MM_PAGE_SIZE], [
ZFS_LINUX_TEST_SRC([page_size], [
#include <linux/mm.h>
],[
unsigned long s;
s = page_size(NULL);
])
])
AC_DEFUN([ZFS_AC_KERNEL_MM_PAGE_SIZE], [
AC_MSG_CHECKING([whether page_size() is available])
ZFS_LINUX_TEST_RESULT([page_size], [
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_MM_PAGE_SIZE, 1, [page_size() is available])
],[
AC_MSG_RESULT(no)
])
])
+27
View File
@@ -16,6 +16,9 @@ dnl #
dnl # 5.3: VFS copy_file_range() expected to do its own fallback,
dnl # generic_copy_file_range() added to support it
dnl #
dnl # 6.8: generic_copy_file_range() removed, replaced by
dnl # splice_copy_file_range()
dnl #
AC_DEFUN([ZFS_AC_KERNEL_SRC_VFS_COPY_FILE_RANGE], [
ZFS_LINUX_TEST_SRC([vfs_copy_file_range], [
#include <linux/fs.h>
@@ -72,6 +75,30 @@ AC_DEFUN([ZFS_AC_KERNEL_VFS_GENERIC_COPY_FILE_RANGE], [
])
])
AC_DEFUN([ZFS_AC_KERNEL_SRC_VFS_SPLICE_COPY_FILE_RANGE], [
ZFS_LINUX_TEST_SRC([splice_copy_file_range], [
#include <linux/splice.h>
], [
struct file *src_file __attribute__ ((unused)) = NULL;
loff_t src_off __attribute__ ((unused)) = 0;
struct file *dst_file __attribute__ ((unused)) = NULL;
loff_t dst_off __attribute__ ((unused)) = 0;
size_t len __attribute__ ((unused)) = 0;
splice_copy_file_range(src_file, src_off, dst_file, dst_off,
len);
])
])
AC_DEFUN([ZFS_AC_KERNEL_VFS_SPLICE_COPY_FILE_RANGE], [
AC_MSG_CHECKING([whether splice_copy_file_range() is available])
ZFS_LINUX_TEST_RESULT([splice_copy_file_range], [
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_VFS_SPLICE_COPY_FILE_RANGE, 1,
[splice_copy_file_range() is available])
],[
AC_MSG_RESULT(no)
])
])
AC_DEFUN([ZFS_AC_KERNEL_SRC_VFS_CLONE_FILE_RANGE], [
ZFS_LINUX_TEST_SRC([vfs_clone_file_range], [
#include <linux/fs.h>
+4
View File
@@ -118,6 +118,7 @@ AC_DEFUN([ZFS_AC_KERNEL_TEST_SRC], [
ZFS_AC_KERNEL_SRC_VFS_IOV_ITER
ZFS_AC_KERNEL_SRC_VFS_COPY_FILE_RANGE
ZFS_AC_KERNEL_SRC_VFS_GENERIC_COPY_FILE_RANGE
ZFS_AC_KERNEL_SRC_VFS_SPLICE_COPY_FILE_RANGE
ZFS_AC_KERNEL_SRC_VFS_REMAP_FILE_RANGE
ZFS_AC_KERNEL_SRC_VFS_CLONE_FILE_RANGE
ZFS_AC_KERNEL_SRC_VFS_DEDUPE_FILE_RANGE
@@ -166,6 +167,7 @@ AC_DEFUN([ZFS_AC_KERNEL_TEST_SRC], [
ZFS_AC_KERNEL_SRC_REGISTER_SYSCTL_TABLE
ZFS_AC_KERNEL_SRC_COPY_SPLICE_READ
ZFS_AC_KERNEL_SRC_SYNC_BDEV
ZFS_AC_KERNEL_SRC_MM_PAGE_SIZE
case "$host_cpu" in
powerpc*)
ZFS_AC_KERNEL_SRC_CPU_HAS_FEATURE
@@ -266,6 +268,7 @@ AC_DEFUN([ZFS_AC_KERNEL_TEST_RESULT], [
ZFS_AC_KERNEL_VFS_IOV_ITER
ZFS_AC_KERNEL_VFS_COPY_FILE_RANGE
ZFS_AC_KERNEL_VFS_GENERIC_COPY_FILE_RANGE
ZFS_AC_KERNEL_VFS_SPLICE_COPY_FILE_RANGE
ZFS_AC_KERNEL_VFS_REMAP_FILE_RANGE
ZFS_AC_KERNEL_VFS_CLONE_FILE_RANGE
ZFS_AC_KERNEL_VFS_DEDUPE_FILE_RANGE
@@ -314,6 +317,7 @@ AC_DEFUN([ZFS_AC_KERNEL_TEST_RESULT], [
ZFS_AC_KERNEL_REGISTER_SYSCTL_TABLE
ZFS_AC_KERNEL_COPY_SPLICE_READ
ZFS_AC_KERNEL_SYNC_BDEV
ZFS_AC_KERNEL_MM_PAGE_SIZE
case "$host_cpu" in
powerpc*)
ZFS_AC_KERNEL_CPU_HAS_FEATURE
+5 -3
View File
@@ -578,13 +578,15 @@ AC_DEFUN([ZFS_AC_DEFAULT_PACKAGE], [
AC_MSG_CHECKING([default shell])
case "$VENDOR" in
gentoo) DEFAULT_INIT_SHELL="/sbin/openrc-run";;
alpine) DEFAULT_INIT_SHELL="/sbin/openrc-run";;
*) DEFAULT_INIT_SHELL="/bin/sh" ;;
gentoo|alpine) DEFAULT_INIT_SHELL=/sbin/openrc-run
IS_SYSV_RC=false ;;
*) DEFAULT_INIT_SHELL=/bin/sh
IS_SYSV_RC=true ;;
esac
AC_MSG_RESULT([$DEFAULT_INIT_SHELL])
AC_SUBST(DEFAULT_INIT_SHELL)
AC_SUBST(IS_SYSV_RC)
AC_MSG_CHECKING([default nfs server init script])
AS_IF([test "$VENDOR" = "debian"],
+1 -1
View File
@@ -189,7 +189,7 @@ Depends: dkms (>> 2.1.1.2-5),
file,
libc6-dev | libc-dev,
lsb-release,
python3-distutils | libpython3-stdlib (<< 3.6.4),
python3 (>> 3.12) | python3-distutils | libpython3-stdlib (<< 3.6.4),
${misc:Depends},
${perl:Depends}
Recommends: openzfs-zfs-zed, openzfs-zfsutils (>= ${source:Version}), ${linux:Recommends}
+1 -5
View File
@@ -7,11 +7,7 @@ DESCRIPTION
They have been tested successfully on:
* Debian GNU/Linux Wheezy
* Debian GNU/Linux Jessie
* Ubuntu Trusty
* CentOS 6.0
* CentOS 6.6
* Debian GNU/Linux Bookworm
* Gentoo
SUPPORT
+1 -1
View File
@@ -307,7 +307,7 @@ do_start()
# ----------------------------------------------------
if [ ! -e /sbin/openrc-run ]
if @IS_SYSV_RC@
then
case "$1" in
start)
+1 -1
View File
@@ -104,7 +104,7 @@ do_stop()
# ----------------------------------------------------
if [ ! -e /sbin/openrc-run ]
if @IS_SYSV_RC@
then
case "$1" in
start)
+1 -1
View File
@@ -114,7 +114,7 @@ do_stop()
# ----------------------------------------------------
if [ ! -e /sbin/openrc-run ]
if @IS_SYSV_RC@
then
case "$1" in
start)
+2 -1
View File
@@ -57,7 +57,8 @@ do_stop()
# ----------------------------------------------------
if [ ! -e /sbin/openrc-run ]; then
if @IS_SYSV_RC@
then
case "$1" in
start)
do_start
+2 -1
View File
@@ -93,7 +93,8 @@ do_reload()
# ----------------------------------------------------
if [ ! -e /sbin/openrc-run ]; then
if @IS_SYSV_RC@
then
case "$1" in
start)
do_start
+4 -2
View File
@@ -21,7 +21,7 @@
/*
* Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved.
* Copyright (c) 2011, 2022 by Delphix. All rights reserved.
* Copyright (c) 2011, 2024 by Delphix. All rights reserved.
* Copyright Joyent, Inc.
* Copyright (c) 2013 Steven Hartland. All rights reserved.
* Copyright (c) 2016, Intel Corporation.
@@ -157,6 +157,8 @@ typedef enum zfs_error {
EZFS_CKSUM, /* insufficient replicas */
EZFS_RESUME_EXISTS, /* Resume on existing dataset without force */
EZFS_SHAREFAILED, /* filesystem share failed */
EZFS_RAIDZ_EXPAND_IN_PROGRESS, /* a raidz is currently expanding */
EZFS_ASHIFT_MISMATCH, /* can't add vdevs with different ashifts */
EZFS_UNKNOWN
} zfs_error_t;
@@ -260,7 +262,7 @@ _LIBZFS_H boolean_t zpool_skip_pool(const char *);
_LIBZFS_H int zpool_create(libzfs_handle_t *, const char *, nvlist_t *,
nvlist_t *, nvlist_t *);
_LIBZFS_H int zpool_destroy(zpool_handle_t *, const char *);
_LIBZFS_H int zpool_add(zpool_handle_t *, nvlist_t *);
_LIBZFS_H int zpool_add(zpool_handle_t *, nvlist_t *, boolean_t check_ashift);
typedef struct splitflags {
/* do not split, but return the config that would be split off */
+2 -2
View File
@@ -4,8 +4,6 @@ noinst_HEADERS = \
\
%D%/spl/acl/acl_common.h \
\
%D%/spl/rpc/xdr.h \
\
%D%/spl/sys/ia32/asm_linkage.h \
\
%D%/spl/sys/acl.h \
@@ -80,7 +78,9 @@ noinst_HEADERS = \
%D%/spl/sys/zmod.h \
%D%/spl/sys/zone.h \
\
%D%/zfs/sys/arc_os.h \
%D%/zfs/sys/freebsd_crypto.h \
%D%/zfs/sys/freebsd_event.h \
%D%/zfs/sys/vdev_os.h \
%D%/zfs/sys/zfs_bootenv_os.h \
%D%/zfs/sys/zfs_context_os.h \
-71
View File
@@ -1,71 +0,0 @@
/*
* Sun RPC is a product of Sun Microsystems, Inc. and is provided for
* unrestricted use provided that this legend is included on all tape
* media and as a part of the software program in whole or part. Users
* may copy or modify Sun RPC without charge, but are not authorized
* to license or distribute it to anyone else except as part of a product or
* program developed by the user.
*
* SUN RPC IS PROVIDED AS IS WITH NO WARRANTIES OF ANY KIND INCLUDING THE
* WARRANTIES OF DESIGN, MERCHANTABILITY AND FITNESS FOR A PARTICULAR
* PURPOSE, OR ARISING FROM A COURSE OF DEALING, USAGE OR TRADE PRACTICE.
*
* Sun RPC is provided with no support and without any obligation on the
* part of Sun Microsystems, Inc. to assist in its use, correction,
* modification or enhancement.
*
* SUN MICROSYSTEMS, INC. SHALL HAVE NO LIABILITY WITH RESPECT TO THE
* INFRINGEMENT OF COPYRIGHTS, TRADE SECRETS OR ANY PATENTS BY SUN RPC
* OR ANY PART THEREOF.
*
* In no event will Sun Microsystems, Inc. be liable for any lost revenue
* or profits or other special, indirect and consequential damages, even if
* Sun has been advised of the possibility of such damages.
*
* Sun Microsystems, Inc.
* 2550 Garcia Avenue
* Mountain View, California 94043
*/
#ifndef _OPENSOLARIS_RPC_XDR_H_
#define _OPENSOLARIS_RPC_XDR_H_
#include <rpc/types.h>
#include_next <rpc/xdr.h>
#if !defined(_KERNEL) && !defined(_STANDALONE)
#include <assert.h>
/*
* Taken from sys/xdr/xdr_mem.c.
*
* FreeBSD's userland XDR doesn't implement control method (only the kernel),
* but OpenSolaris nvpair still depend on it, so we have to implement it here.
*/
static __inline bool_t
xdrmem_control(XDR *xdrs, int request, void *info)
{
xdr_bytesrec *xptr;
switch (request) {
case XDR_GET_BYTES_AVAIL:
xptr = (xdr_bytesrec *)info;
xptr->xc_is_last_record = TRUE;
xptr->xc_num_avail = xdrs->x_handy;
return (TRUE);
default:
assert(!"unexpected request");
}
return (FALSE);
}
#undef XDR_CONTROL
#define XDR_CONTROL(xdrs, req, op) \
(((xdrs)->x_ops->x_control == NULL) ? \
xdrmem_control((xdrs), (req), (op)) : \
(*(xdrs)->x_ops->x_control)(xdrs, req, op))
#endif /* !_KERNEL && !_STANDALONE */
#endif /* !_OPENSOLARIS_RPC_XDR_H_ */
+26 -14
View File
@@ -39,12 +39,14 @@
* ASSERT3U() - Assert unsigned X OP Y is true, if not panic.
* ASSERT3P() - Assert pointer X OP Y is true, if not panic.
* ASSERT0() - Assert value is zero, if not panic.
* ASSERT0P() - Assert pointer is null, if not panic.
* VERIFY() - Verify X is true, if not panic.
* VERIFY3B() - Verify boolean X OP Y is true, if not panic.
* VERIFY3S() - Verify signed X OP Y is true, if not panic.
* VERIFY3U() - Verify unsigned X OP Y is true, if not panic.
* VERIFY3P() - Verify pointer X OP Y is true, if not panic.
* VERIFY0() - Verify value is zero, if not panic.
* VERIFY0P() - Verify pointer is null, if not panic.
*/
#ifndef _SPL_DEBUG_H
@@ -89,8 +91,8 @@ spl_assert(const char *buf, const char *file, const char *func, int line)
spl_panic(__FILE__, __FUNCTION__, __LINE__, \
"VERIFY3(" #LEFT " " #OP " " #RIGHT ") " \
"failed (%d " #OP " %d)\n", \
(boolean_t)(_verify3_left), \
(boolean_t)(_verify3_right)); \
(boolean_t)_verify3_left, \
(boolean_t)_verify3_right); \
} while (0)
#define VERIFY3S(LEFT, OP, RIGHT) do { \
@@ -100,8 +102,8 @@ spl_assert(const char *buf, const char *file, const char *func, int line)
spl_panic(__FILE__, __FUNCTION__, __LINE__, \
"VERIFY3(" #LEFT " " #OP " " #RIGHT ") " \
"failed (%lld " #OP " %lld)\n", \
(long long) (_verify3_left), \
(long long) (_verify3_right)); \
(long long)_verify3_left, \
(long long)_verify3_right); \
} while (0)
#define VERIFY3U(LEFT, OP, RIGHT) do { \
@@ -111,8 +113,8 @@ spl_assert(const char *buf, const char *file, const char *func, int line)
spl_panic(__FILE__, __FUNCTION__, __LINE__, \
"VERIFY3(" #LEFT " " #OP " " #RIGHT ") " \
"failed (%llu " #OP " %llu)\n", \
(unsigned long long) (_verify3_left), \
(unsigned long long) (_verify3_right)); \
(unsigned long long)_verify3_left, \
(unsigned long long)_verify3_right); \
} while (0)
#define VERIFY3P(LEFT, OP, RIGHT) do { \
@@ -121,19 +123,27 @@ spl_assert(const char *buf, const char *file, const char *func, int line)
if (unlikely(!(_verify3_left OP _verify3_right))) \
spl_panic(__FILE__, __FUNCTION__, __LINE__, \
"VERIFY3(" #LEFT " " #OP " " #RIGHT ") " \
"failed (%px " #OP " %px)\n", \
(void *) (_verify3_left), \
(void *) (_verify3_right)); \
"failed (%p " #OP " %p)\n", \
(void *)_verify3_left, \
(void *)_verify3_right); \
} while (0)
#define VERIFY0(RIGHT) do { \
const int64_t _verify3_left = (int64_t)(0); \
const int64_t _verify3_right = (int64_t)(RIGHT); \
if (unlikely(!(_verify3_left == _verify3_right))) \
const int64_t _verify0_right = (int64_t)(RIGHT); \
if (unlikely(!(0 == _verify0_right))) \
spl_panic(__FILE__, __FUNCTION__, __LINE__, \
"VERIFY0(0 == " #RIGHT ") " \
"VERIFY0(" #RIGHT ") " \
"failed (0 == %lld)\n", \
(long long) (_verify3_right)); \
(long long)_verify0_right); \
} while (0)
#define VERIFY0P(RIGHT) do { \
const uintptr_t _verify0_right = (uintptr_t)(RIGHT); \
if (unlikely(!(0 == _verify0_right))) \
spl_panic(__FILE__, __FUNCTION__, __LINE__, \
"VERIFY0P(" #RIGHT ") " \
"failed (NULL == %p)\n", \
(void *)_verify0_right); \
} while (0)
/*
@@ -151,6 +161,7 @@ spl_assert(const char *buf, const char *file, const char *func, int line)
#define ASSERT3P(x, y, z) \
((void) sizeof ((uintptr_t)(x)), (void) sizeof ((uintptr_t)(z)))
#define ASSERT0(x) ((void) sizeof ((uintptr_t)(x)))
#define ASSERT0P(x) ((void) sizeof ((uintptr_t)(x)))
#define IMPLY(A, B) \
((void) sizeof ((uintptr_t)(A)), (void) sizeof ((uintptr_t)(B)))
#define EQUIV(A, B) \
@@ -166,6 +177,7 @@ spl_assert(const char *buf, const char *file, const char *func, int line)
#define ASSERT3U VERIFY3U
#define ASSERT3P VERIFY3P
#define ASSERT0 VERIFY0
#define ASSERT0P VERIFY0P
#define ASSERT VERIFY
#define IMPLY(A, B) \
((void)(likely((!(A)) || (B)) || \
+2
View File
@@ -5,6 +5,7 @@ kernel_linux_HEADERS = \
%D%/kernel/linux/compiler_compat.h \
%D%/kernel/linux/dcache_compat.h \
%D%/kernel/linux/kmap_compat.h \
%D%/kernel/linux/mm_compat.h \
%D%/kernel/linux/mod_compat.h \
%D%/kernel/linux/page_compat.h \
%D%/kernel/linux/percpu_compat.h \
@@ -46,6 +47,7 @@ kernel_sys_HEADERS = \
kernel_spl_rpcdir = $(kerneldir)/spl/rpc
kernel_spl_rpc_HEADERS = \
%D%/spl/rpc/types.h \
%D%/spl/rpc/xdr.h
kernel_spl_sysdir = $(kerneldir)/spl/sys
@@ -563,9 +563,11 @@ static inline boolean_t
bdev_discard_supported(struct block_device *bdev)
{
#if defined(HAVE_BDEV_MAX_DISCARD_SECTORS)
return (!!bdev_max_discard_sectors(bdev));
return (bdev_max_discard_sectors(bdev) > 0 &&
bdev_discard_granularity(bdev) > 0);
#elif defined(HAVE_BLK_QUEUE_DISCARD)
return (!!blk_queue_discard(bdev_get_queue(bdev)));
return (blk_queue_discard(bdev_get_queue(bdev)) > 0 &&
bdev_get_queue(bdev)->limits.discard_granularity > 0);
#else
#error "Unsupported kernel"
#endif
+36
View File
@@ -0,0 +1,36 @@
/*
* CDDL HEADER START
*
* The contents of this file are subject to the terms of the
* Common Development and Distribution License (the "License").
* You may not use this file except in compliance with the License.
*
* You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
* or https://opensource.org/licenses/CDDL-1.0.
* See the License for the specific language governing permissions
* and limitations under the License.
*
* When distributing Covered Code, include this CDDL HEADER in each
* file and include the License file at usr/src/OPENSOLARIS.LICENSE.
* If applicable, add the following below this CDDL HEADER, with the
* fields enclosed by brackets "[]" replaced with your own identifying
* information: Portions Copyright [yyyy] [name of copyright owner]
*
* CDDL HEADER END
*/
/*
* Copyright (c) 2023, 2024, Klara Inc.
*/
#ifndef _ZFS_MM_COMPAT_H
#define _ZFS_MM_COMPAT_H
#include <linux/mm.h>
/* 5.4 introduced page_size(). Older kernels can use a trivial macro instead */
#ifndef HAVE_MM_PAGE_SIZE
#define page_size(p) ((unsigned long)(PAGE_SIZE << compound_order(p)))
#endif
#endif /* _ZFS_MM_COMPAT_H */
@@ -68,6 +68,7 @@ enum scope_prefix_types {
zfs_trim,
zfs_txg,
zfs_vdev,
zfs_vdev_disk,
zfs_vdev_file,
zfs_vdev_mirror,
zfs_vnops,
+30
View File
@@ -0,0 +1,30 @@
/*
* Copyright (c) 2008 Sun Microsystems, Inc.
* Written by Ricardo Correia <Ricardo.M.Correia@Sun.COM>
*
* This file is part of the SPL, Solaris Porting Layer.
*
* The SPL is free software; you can redistribute it and/or modify it
* under the terms of the GNU General Public License as published by the
* Free Software Foundation; either version 2 of the License, or (at your
* option) any later version.
*
* The SPL is distributed in the hope that it will be useful, but WITHOUT
* ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
* FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
* for more details.
*
* You should have received a copy of the GNU General Public License along
* with the SPL. If not, see <http://www.gnu.org/licenses/>.
*/
#ifndef _SPL_RPC_TYPES_H
#define _SPL_RPC_TYPES_H
#include <sys/types.h>
/* Just enough to support rpc/xdr.h */
typedef int bool_t;
#endif /* SPL_RPC_TYPES_H */
-2
View File
@@ -23,8 +23,6 @@
#include <sys/types.h>
typedef int bool_t;
/*
* XDR enums and types.
*/
+25 -13
View File
@@ -34,12 +34,14 @@
* ASSERT3U() - Assert unsigned X OP Y is true, if not panic.
* ASSERT3P() - Assert pointer X OP Y is true, if not panic.
* ASSERT0() - Assert value is zero, if not panic.
* ASSERT0P() - Assert pointer is null, if not panic.
* VERIFY() - Verify X is true, if not panic.
* VERIFY3B() - Verify boolean X OP Y is true, if not panic.
* VERIFY3S() - Verify signed X OP Y is true, if not panic.
* VERIFY3U() - Verify unsigned X OP Y is true, if not panic.
* VERIFY3P() - Verify pointer X OP Y is true, if not panic.
* VERIFY0() - Verify value is zero, if not panic.
* VERIFY0P() - Verify pointer is null, if not panic.
*/
#ifndef _SPL_DEBUG_H
@@ -93,8 +95,8 @@ spl_assert(const char *buf, const char *file, const char *func, int line)
spl_panic(__FILE__, __FUNCTION__, __LINE__, \
"VERIFY3(" #LEFT " " #OP " " #RIGHT ") " \
"failed (%d " #OP " %d)\n", \
(boolean_t)(_verify3_left), \
(boolean_t)(_verify3_right)); \
(boolean_t)_verify3_left, \
(boolean_t)_verify3_right); \
} while (0)
#define VERIFY3S(LEFT, OP, RIGHT) do { \
@@ -104,8 +106,8 @@ spl_assert(const char *buf, const char *file, const char *func, int line)
spl_panic(__FILE__, __FUNCTION__, __LINE__, \
"VERIFY3(" #LEFT " " #OP " " #RIGHT ") " \
"failed (%lld " #OP " %lld)\n", \
(long long)(_verify3_left), \
(long long)(_verify3_right)); \
(long long)_verify3_left, \
(long long)_verify3_right); \
} while (0)
#define VERIFY3U(LEFT, OP, RIGHT) do { \
@@ -115,8 +117,8 @@ spl_assert(const char *buf, const char *file, const char *func, int line)
spl_panic(__FILE__, __FUNCTION__, __LINE__, \
"VERIFY3(" #LEFT " " #OP " " #RIGHT ") " \
"failed (%llu " #OP " %llu)\n", \
(unsigned long long)(_verify3_left), \
(unsigned long long)(_verify3_right)); \
(unsigned long long)_verify3_left, \
(unsigned long long)_verify3_right); \
} while (0)
#define VERIFY3P(LEFT, OP, RIGHT) do { \
@@ -126,18 +128,26 @@ spl_assert(const char *buf, const char *file, const char *func, int line)
spl_panic(__FILE__, __FUNCTION__, __LINE__, \
"VERIFY3(" #LEFT " " #OP " " #RIGHT ") " \
"failed (%px " #OP " %px)\n", \
(void *) (_verify3_left), \
(void *) (_verify3_right)); \
(void *)_verify3_left, \
(void *)_verify3_right); \
} while (0)
#define VERIFY0(RIGHT) do { \
const int64_t _verify3_left = (int64_t)(0); \
const int64_t _verify3_right = (int64_t)(RIGHT); \
if (unlikely(!(_verify3_left == _verify3_right))) \
const int64_t _verify0_right = (int64_t)(RIGHT); \
if (unlikely(!(0 == _verify0_right))) \
spl_panic(__FILE__, __FUNCTION__, __LINE__, \
"VERIFY0(0 == " #RIGHT ") " \
"VERIFY0(" #RIGHT ") " \
"failed (0 == %lld)\n", \
(long long) (_verify3_right)); \
(long long)_verify0_right); \
} while (0)
#define VERIFY0P(RIGHT) do { \
const uintptr_t _verify0_right = (uintptr_t)(RIGHT); \
if (unlikely(!(0 == _verify0_right))) \
spl_panic(__FILE__, __FUNCTION__, __LINE__, \
"VERIFY0P(" #RIGHT ") " \
"failed (NULL == %px)\n", \
(void *)_verify0_right); \
} while (0)
#define VERIFY_IMPLY(A, B) \
@@ -165,6 +175,7 @@ spl_assert(const char *buf, const char *file, const char *func, int line)
#define ASSERT3P(x, y, z) \
((void) sizeof ((uintptr_t)(x)), (void) sizeof ((uintptr_t)(z)))
#define ASSERT0(x) ((void) sizeof ((uintptr_t)(x)))
#define ASSERT0P(x) ((void) sizeof ((uintptr_t)(x)))
#define IMPLY(A, B) \
((void) sizeof ((uintptr_t)(A)), (void) sizeof ((uintptr_t)(B)))
#define EQUIV(A, B) \
@@ -180,6 +191,7 @@ spl_assert(const char *buf, const char *file, const char *func, int line)
#define ASSERT3U VERIFY3U
#define ASSERT3P VERIFY3P
#define ASSERT0 VERIFY0
#define ASSERT0P VERIFY0P
#define ASSERT VERIFY
#define IMPLY VERIFY_IMPLY
#define EQUIV VERIFY_EQUIV
+1 -1
View File
@@ -104,7 +104,7 @@ typedef struct taskq {
/* list node for the cpu hotplug callback */
struct hlist_node tq_hp_cb_node;
boolean_t tq_hp_support;
unsigned long lastshouldstop; /* when to purge dynamic */
unsigned long lastspawnstop; /* when to purge dynamic */
} taskq_t;
typedef struct taskq_ent {
+10 -4
View File
@@ -51,7 +51,9 @@
__field(uint64_t, zl_parse_lr_seq) \
__field(uint64_t, zl_parse_blk_count) \
__field(uint64_t, zl_parse_lr_count) \
__field(uint64_t, zl_cur_used) \
__field(uint64_t, zl_cur_size) \
__field(uint64_t, zl_cur_left) \
__field(uint64_t, zl_cur_max) \
__field(clock_t, zl_replay_time) \
__field(uint64_t, zl_replay_blks)
@@ -72,7 +74,9 @@
__entry->zl_parse_lr_seq = zilog->zl_parse_lr_seq; \
__entry->zl_parse_blk_count = zilog->zl_parse_blk_count;\
__entry->zl_parse_lr_count = zilog->zl_parse_lr_count; \
__entry->zl_cur_used = zilog->zl_cur_used; \
__entry->zl_cur_size = zilog->zl_cur_size; \
__entry->zl_cur_left = zilog->zl_cur_left; \
__entry->zl_cur_max = zilog->zl_cur_max; \
__entry->zl_replay_time = zilog->zl_replay_time; \
__entry->zl_replay_blks = zilog->zl_replay_blks;
@@ -82,7 +86,8 @@
"replay %u stop_sync %u logbias %u sync %u " \
"parse_error %u parse_blk_seq %llu parse_lr_seq %llu " \
"parse_blk_count %llu parse_lr_count %llu " \
"cur_used %llu replay_time %lu replay_blks %llu }"
"cur_size %llu cur_left %llu cur_max %llu replay_time %lu " \
"replay_blks %llu }"
#define ZILOG_TP_PRINTK_ARGS \
__entry->zl_lr_seq, __entry->zl_commit_lr_seq, \
@@ -92,7 +97,8 @@
__entry->zl_stop_sync, __entry->zl_logbias, __entry->zl_sync, \
__entry->zl_parse_error, __entry->zl_parse_blk_seq, \
__entry->zl_parse_lr_seq, __entry->zl_parse_blk_count, \
__entry->zl_parse_lr_count, __entry->zl_cur_used, \
__entry->zl_parse_lr_count, __entry->zl_cur_size, \
__entry->zl_cur_left, __entry->zl_cur_max, \
__entry->zl_replay_time, __entry->zl_replay_blks
#define ITX_TP_STRUCT_ENTRY \
+9
View File
@@ -79,6 +79,9 @@ typedef struct abd {
typedef int abd_iter_func_t(void *buf, size_t len, void *priv);
typedef int abd_iter_func2_t(void *bufa, void *bufb, size_t len, void *priv);
#if defined(__linux__) && defined(_KERNEL)
typedef int abd_iter_page_func_t(struct page *, size_t, size_t, void *);
#endif
extern int zfs_abd_scatter_enabled;
@@ -125,6 +128,10 @@ void abd_release_ownership_of_buf(abd_t *);
int abd_iterate_func(abd_t *, size_t, size_t, abd_iter_func_t *, void *);
int abd_iterate_func2(abd_t *, abd_t *, size_t, size_t, size_t,
abd_iter_func2_t *, void *);
#if defined(__linux__) && defined(_KERNEL)
int abd_iterate_page_func(abd_t *, size_t, size_t, abd_iter_page_func_t *,
void *);
#endif
void abd_copy_off(abd_t *, abd_t *, size_t, size_t, size_t);
void abd_copy_from_buf_off(abd_t *, const void *, size_t, size_t);
void abd_copy_to_buf_off(void *, abd_t *, size_t, size_t);
@@ -213,6 +220,8 @@ void abd_fini(void);
/*
* Linux ABD bio functions
* Note: these are only needed to support vdev_classic. See comment in
* vdev_disk.c.
*/
#if defined(__linux__) && defined(_KERNEL)
unsigned int abd_bio_map_off(struct bio *, abd_t *, unsigned int, size_t);
+23 -3
View File
@@ -21,6 +21,7 @@
/*
* Copyright (c) 2014 by Chunwei Chen. All rights reserved.
* Copyright (c) 2016, 2019 by Delphix. All rights reserved.
* Copyright (c) 2023, 2024, Klara Inc.
*/
#ifndef _ABD_IMPL_H
@@ -38,12 +39,30 @@ typedef enum abd_stats_op {
ABDSTAT_DECR /* Decrease abdstat values */
} abd_stats_op_t;
struct scatterlist; /* forward declaration */
/* forward declarations */
struct scatterlist;
struct page;
struct abd_iter {
/* public interface */
void *iter_mapaddr; /* addr corresponding to iter_pos */
size_t iter_mapsize; /* length of data valid at mapaddr */
union {
/* for abd_iter_map()/abd_iter_unmap() */
struct {
/* addr corresponding to iter_pos */
void *iter_mapaddr;
/* length of data valid at mapaddr */
size_t iter_mapsize;
};
/* for abd_iter_page() */
struct {
/* current page */
struct page *iter_page;
/* offset of data in page */
size_t iter_page_doff;
/* size of data in page */
size_t iter_page_dsize;
};
};
/* private */
abd_t *iter_abd; /* ABD being iterated through */
@@ -78,6 +97,7 @@ boolean_t abd_iter_at_end(struct abd_iter *);
void abd_iter_advance(struct abd_iter *, size_t);
void abd_iter_map(struct abd_iter *);
void abd_iter_unmap(struct abd_iter *);
void abd_iter_page(struct abd_iter *);
/*
* Helper macros
+3 -2
View File
@@ -739,8 +739,6 @@ void *dmu_buf_remove_user(dmu_buf_t *db, dmu_buf_user_t *user);
void *dmu_buf_get_user(dmu_buf_t *db);
objset_t *dmu_buf_get_objset(dmu_buf_t *db);
dnode_t *dmu_buf_dnode_enter(dmu_buf_t *db);
void dmu_buf_dnode_exit(dmu_buf_t *db);
/* Block until any in-progress dmu buf user evictions complete. */
void dmu_buf_user_evict_wait(void);
@@ -889,6 +887,9 @@ extern uint_t zfs_max_recordsize;
*/
void dmu_prefetch(objset_t *os, uint64_t object, int64_t level, uint64_t offset,
uint64_t len, enum zio_priority pri);
void dmu_prefetch_by_dnode(dnode_t *dn, int64_t level, uint64_t offset,
uint64_t len, enum zio_priority pri);
void dmu_prefetch_dnode(objset_t *os, uint64_t object, enum zio_priority pri);
typedef struct dmu_object_info {
/* All sizes are in bytes unless otherwise indicated. */
+1
View File
@@ -132,6 +132,7 @@ struct objset {
zfs_logbias_op_t os_logbias;
zfs_cache_type_t os_primary_cache;
zfs_cache_type_t os_secondary_cache;
zfs_prefetch_type_t os_prefetch;
zfs_sync_type_t os_sync;
zfs_redundant_metadata_type_t os_redundant_metadata;
uint64_t os_recordsize;
+11 -5
View File
@@ -45,18 +45,24 @@ typedef struct zfetch {
int zf_numstreams; /* number of zstream_t's */
} zfetch_t;
typedef struct zsrange {
uint16_t start;
uint16_t end;
} zsrange_t;
#define ZFETCH_RANGES 9 /* Fits zstream_t into 128 bytes */
typedef struct zstream {
list_node_t zs_node; /* link for zf_stream */
uint64_t zs_blkid; /* expect next access at this blkid */
uint_t zs_atime; /* time last prefetch issued */
zsrange_t zs_ranges[ZFETCH_RANGES]; /* ranges from future */
unsigned int zs_pf_dist; /* data prefetch distance in bytes */
unsigned int zs_ipf_dist; /* L1 prefetch distance in bytes */
uint64_t zs_pf_start; /* first data block to prefetch */
uint64_t zs_pf_end; /* data block to prefetch up to */
uint64_t zs_ipf_start; /* first data block to prefetch L1 */
uint64_t zs_ipf_end; /* data block to prefetch L1 up to */
list_node_t zs_node; /* link for zf_stream */
hrtime_t zs_atime; /* time last prefetch issued */
zfetch_t *zs_fetch; /* parent fetch */
boolean_t zs_missed; /* stream saw cache misses */
boolean_t zs_more; /* need more distant prefetch */
zfs_refcount_t zs_callers; /* number of pending callers */
@@ -74,7 +80,7 @@ void dmu_zfetch_init(zfetch_t *, struct dnode *);
void dmu_zfetch_fini(zfetch_t *);
zstream_t *dmu_zfetch_prepare(zfetch_t *, uint64_t, uint64_t, boolean_t,
boolean_t);
void dmu_zfetch_run(zstream_t *, boolean_t, boolean_t);
void dmu_zfetch_run(zfetch_t *, zstream_t *, boolean_t, boolean_t);
void dmu_zfetch(zfetch_t *, uint64_t, uint64_t, boolean_t, boolean_t,
boolean_t);
+2
View File
@@ -82,6 +82,8 @@ extern "C" {
#define FM_EREPORT_PAYLOAD_ZFS_VDEV_CKSUM_T "vdev_cksum_t"
#define FM_EREPORT_PAYLOAD_ZFS_VDEV_IO_N "vdev_io_n"
#define FM_EREPORT_PAYLOAD_ZFS_VDEV_IO_T "vdev_io_t"
#define FM_EREPORT_PAYLOAD_ZFS_VDEV_SLOW_IO_N "vdev_slow_io_n"
#define FM_EREPORT_PAYLOAD_ZFS_VDEV_SLOW_IO_T "vdev_slow_io_t"
#define FM_EREPORT_PAYLOAD_ZFS_VDEV_DELAYS "vdev_delays"
#define FM_EREPORT_PAYLOAD_ZFS_PARENT_GUID "parent_guid"
#define FM_EREPORT_PAYLOAD_ZFS_PARENT_TYPE "parent_type"
+13 -1
View File
@@ -21,7 +21,7 @@
/*
* Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved.
* Copyright (c) 2011, 2020 by Delphix. All rights reserved.
* Copyright (c) 2011, 2024 by Delphix. All rights reserved.
* Copyright 2011 Nexenta Systems, Inc. All rights reserved.
* Copyright (c) 2013, 2017 Joyent, Inc. All rights reserved.
* Copyright (c) 2014 Integros [integros.com]
@@ -191,6 +191,7 @@ typedef enum {
ZFS_PROP_REDACTED,
ZFS_PROP_REDACT_SNAPS,
ZFS_PROP_SNAPSHOTS_CHANGED,
ZFS_PROP_PREFETCH,
ZFS_NUM_PROPS
} zfs_prop_t;
@@ -363,6 +364,9 @@ typedef enum {
VDEV_PROP_CHECKSUM_T,
VDEV_PROP_IO_N,
VDEV_PROP_IO_T,
VDEV_PROP_RAIDZ_EXPANDING,
VDEV_PROP_SLOW_IO_N,
VDEV_PROP_SLOW_IO_T,
VDEV_NUM_PROPS
} vdev_prop_t;
@@ -543,6 +547,12 @@ typedef enum zfs_key_location {
ZFS_KEYLOCATION_LOCATIONS
} zfs_keylocation_t;
typedef enum {
ZFS_PREFETCH_NONE = 0,
ZFS_PREFETCH_METADATA = 1,
ZFS_PREFETCH_ALL = 2
} zfs_prefetch_type_t;
#define DEFAULT_PBKDF2_ITERATIONS 350000
#define MIN_PBKDF2_ITERATIONS 100000
@@ -1569,6 +1579,8 @@ typedef enum {
ZFS_ERR_NOT_USER_NAMESPACE,
ZFS_ERR_RESUME_EXISTS,
ZFS_ERR_CRYPTO_NOTSUP,
ZFS_ERR_RAIDZ_EXPAND_IN_PROGRESS,
ZFS_ERR_ASHIFT_MISMATCH,
} zfs_errno_t;
/*
+4 -1
View File
@@ -82,12 +82,15 @@ int multilist_is_empty(multilist_t *);
unsigned int multilist_get_num_sublists(multilist_t *);
unsigned int multilist_get_random_index(multilist_t *);
multilist_sublist_t *multilist_sublist_lock(multilist_t *, unsigned int);
void multilist_sublist_lock(multilist_sublist_t *);
multilist_sublist_t *multilist_sublist_lock_idx(multilist_t *, unsigned int);
multilist_sublist_t *multilist_sublist_lock_obj(multilist_t *, void *);
void multilist_sublist_unlock(multilist_sublist_t *);
void multilist_sublist_insert_head(multilist_sublist_t *, void *);
void multilist_sublist_insert_tail(multilist_sublist_t *, void *);
void multilist_sublist_insert_after(multilist_sublist_t *, void *, void *);
void multilist_sublist_insert_before(multilist_sublist_t *, void *, void *);
void multilist_sublist_move_forward(multilist_sublist_t *mls, void *obj);
void multilist_sublist_remove(multilist_sublist_t *, void *);
int multilist_sublist_is_empty(multilist_sublist_t *);
+9 -3
View File
@@ -20,7 +20,7 @@
*/
/*
* Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved.
* Copyright (c) 2011, 2021 by Delphix. All rights reserved.
* Copyright (c) 2011, 2024 by Delphix. All rights reserved.
* Copyright 2011 Nexenta Systems, Inc. All rights reserved.
* Copyright (c) 2014 Spectra Logic Corporation, All rights reserved.
* Copyright 2013 Saso Kiselkov. All rights reserved.
@@ -769,7 +769,7 @@ extern int bpobj_enqueue_free_cb(void *arg, const blkptr_t *bp, dmu_tx_t *tx);
#define SPA_ASYNC_CONFIG_UPDATE 0x01
#define SPA_ASYNC_REMOVE 0x02
#define SPA_ASYNC_PROBE 0x04
#define SPA_ASYNC_FAULT_VDEV 0x04
#define SPA_ASYNC_RESILVER_DONE 0x08
#define SPA_ASYNC_RESILVER 0x10
#define SPA_ASYNC_AUTOEXPAND 0x20
@@ -784,7 +784,7 @@ extern int bpobj_enqueue_free_cb(void *arg, const blkptr_t *bp, dmu_tx_t *tx);
#define SPA_ASYNC_DETACH_SPARE 0x4000
/* device manipulation */
extern int spa_vdev_add(spa_t *spa, nvlist_t *nvroot);
extern int spa_vdev_add(spa_t *spa, nvlist_t *nvroot, boolean_t ashift_check);
extern int spa_vdev_attach(spa_t *spa, uint64_t guid, nvlist_t *nvroot,
int replacing, int rebuild);
extern int spa_vdev_detach(spa_t *spa, uint64_t guid, uint64_t pguid,
@@ -966,6 +966,10 @@ extern int spa_import_progress_set_max_txg(uint64_t pool_guid,
uint64_t max_txg);
extern int spa_import_progress_set_state(uint64_t pool_guid,
spa_load_state_t spa_load_state);
extern void spa_import_progress_set_notes(spa_t *spa,
const char *fmt, ...) __printflike(2, 3);
extern void spa_import_progress_set_notes_nolog(spa_t *spa,
const char *fmt, ...) __printflike(2, 3);
/* Pool configuration locks */
extern int spa_config_tryenter(spa_t *spa, int locks, const void *tag,
@@ -1109,6 +1113,8 @@ extern uint32_t spa_get_hostid(spa_t *spa);
extern void spa_activate_allocation_classes(spa_t *, dmu_tx_t *);
extern boolean_t spa_livelist_delete_check(spa_t *spa);
extern boolean_t spa_mmp_remote_host_activity(spa_t *spa);
extern spa_mode_t spa_mode(spa_t *spa);
extern uint64_t zfs_strtonum(const char *str, char **nptr);
+8 -8
View File
@@ -50,20 +50,20 @@ extern "C" {
#define MMP_SEQ_VALID_BIT 0x02
#define MMP_FAIL_INT_VALID_BIT 0x04
#define MMP_VALID(ubp) (ubp->ub_magic == UBERBLOCK_MAGIC && \
ubp->ub_mmp_magic == MMP_MAGIC)
#define MMP_INTERVAL_VALID(ubp) (MMP_VALID(ubp) && (ubp->ub_mmp_config & \
#define MMP_VALID(ubp) ((ubp)->ub_magic == UBERBLOCK_MAGIC && \
(ubp)->ub_mmp_magic == MMP_MAGIC)
#define MMP_INTERVAL_VALID(ubp) (MMP_VALID(ubp) && ((ubp)->ub_mmp_config & \
MMP_INTERVAL_VALID_BIT))
#define MMP_SEQ_VALID(ubp) (MMP_VALID(ubp) && (ubp->ub_mmp_config & \
#define MMP_SEQ_VALID(ubp) (MMP_VALID(ubp) && ((ubp)->ub_mmp_config & \
MMP_SEQ_VALID_BIT))
#define MMP_FAIL_INT_VALID(ubp) (MMP_VALID(ubp) && (ubp->ub_mmp_config & \
#define MMP_FAIL_INT_VALID(ubp) (MMP_VALID(ubp) && ((ubp)->ub_mmp_config & \
MMP_FAIL_INT_VALID_BIT))
#define MMP_INTERVAL(ubp) ((ubp->ub_mmp_config & 0x00000000FFFFFF00) \
#define MMP_INTERVAL(ubp) (((ubp)->ub_mmp_config & 0x00000000FFFFFF00) \
>> 8)
#define MMP_SEQ(ubp) ((ubp->ub_mmp_config & 0x0000FFFF00000000) \
#define MMP_SEQ(ubp) (((ubp)->ub_mmp_config & 0x0000FFFF00000000) \
>> 32)
#define MMP_FAIL_INT(ubp) ((ubp->ub_mmp_config & 0xFFFF000000000000) \
#define MMP_FAIL_INT(ubp) (((ubp)->ub_mmp_config & 0xFFFF000000000000) \
>> 48)
#define MMP_INTERVAL_SET(write) \
+5 -2
View File
@@ -22,6 +22,7 @@
* Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved.
* Copyright (c) 2011, 2020 by Delphix. All rights reserved.
* Copyright (c) 2017, Intel Corporation.
* Copyright (c) 2023, Klara Inc.
*/
#ifndef _SYS_VDEV_IMPL_H
@@ -273,7 +274,7 @@ struct vdev {
txg_list_t vdev_dtl_list; /* per-txg dirty DTL lists */
txg_node_t vdev_txg_node; /* per-txg dirty vdev linkage */
boolean_t vdev_remove_wanted; /* async remove wanted? */
boolean_t vdev_probe_wanted; /* async probe wanted? */
boolean_t vdev_fault_wanted; /* async faulted wanted? */
list_node_t vdev_config_dirty_node; /* config dirty list */
list_node_t vdev_state_dirty_node; /* state dirty list */
uint64_t vdev_deflate_ratio; /* deflation ratio (x512) */
@@ -453,12 +454,14 @@ struct vdev {
zfs_ratelimit_t vdev_checksum_rl;
/*
* Checksum and IO thresholds for tuning ZED
* Vdev properties for tuning ZED or zfsd
*/
uint64_t vdev_checksum_n;
uint64_t vdev_checksum_t;
uint64_t vdev_io_n;
uint64_t vdev_io_t;
uint64_t vdev_slow_io_n;
uint64_t vdev_slow_io_t;
};
#define VDEV_PAD_SIZE (8 << 10)
+8
View File
@@ -253,6 +253,9 @@ int zap_add_by_dnode(dnode_t *dn, const char *key,
int zap_add_uint64(objset_t *ds, uint64_t zapobj, const uint64_t *key,
int key_numints, int integer_size, uint64_t num_integers,
const void *val, dmu_tx_t *tx);
int zap_add_uint64_by_dnode(dnode_t *dn, const uint64_t *key,
int key_numints, int integer_size, uint64_t num_integers,
const void *val, dmu_tx_t *tx);
/*
* Set the attribute with the given name to the given value. If an
@@ -267,6 +270,9 @@ int zap_update(objset_t *ds, uint64_t zapobj, const char *name,
int zap_update_uint64(objset_t *os, uint64_t zapobj, const uint64_t *key,
int key_numints,
int integer_size, uint64_t num_integers, const void *val, dmu_tx_t *tx);
int zap_update_uint64_by_dnode(dnode_t *dn, const uint64_t *key,
int key_numints,
int integer_size, uint64_t num_integers, const void *val, dmu_tx_t *tx);
/*
* Get the length (in integers) and the integer size of the specified
@@ -292,6 +298,8 @@ int zap_remove_norm(objset_t *ds, uint64_t zapobj, const char *name,
int zap_remove_by_dnode(dnode_t *dn, const char *name, dmu_tx_t *tx);
int zap_remove_uint64(objset_t *os, uint64_t zapobj, const uint64_t *key,
int key_numints, dmu_tx_t *tx);
int zap_remove_uint64_by_dnode(dnode_t *dn, const uint64_t *key,
int key_numints, dmu_tx_t *tx);
/*
* Returns (in *count) the number of attributes in the specified zap
+1
View File
@@ -145,6 +145,7 @@ typedef struct zap {
dmu_buf_user_t zap_dbu;
objset_t *zap_objset;
uint64_t zap_object;
dnode_t *zap_dnode;
struct dmu_buf *zap_dbuf;
krwlock_t zap_rwlock;
boolean_t zap_ismicro;
+5 -5
View File
@@ -47,7 +47,7 @@ struct zap_stats;
* entries - header space (2*chunksize)
*/
#define ZAP_LEAF_NUMCHUNKS_BS(bs) \
(((1<<(bs)) - 2*ZAP_LEAF_HASH_NUMENTRIES_BS(bs)) / \
(((1U << (bs)) - 2 * ZAP_LEAF_HASH_NUMENTRIES_BS(bs)) / \
ZAP_LEAF_CHUNKSIZE - 2)
#define ZAP_LEAF_NUMCHUNKS(l) (ZAP_LEAF_NUMCHUNKS_BS(((l)->l_bs)))
@@ -80,7 +80,7 @@ struct zap_stats;
* chunks per entry (3).
*/
#define ZAP_LEAF_HASH_SHIFT_BS(bs) ((bs) - 5)
#define ZAP_LEAF_HASH_NUMENTRIES_BS(bs) (1 << ZAP_LEAF_HASH_SHIFT_BS(bs))
#define ZAP_LEAF_HASH_NUMENTRIES_BS(bs) (1U << ZAP_LEAF_HASH_SHIFT_BS(bs))
#define ZAP_LEAF_HASH_SHIFT(l) (ZAP_LEAF_HASH_SHIFT_BS(((l)->l_bs)))
#define ZAP_LEAF_HASH_NUMENTRIES(l) (ZAP_LEAF_HASH_NUMENTRIES_BS(((l)->l_bs)))
@@ -132,7 +132,7 @@ typedef struct zap_leaf_phys {
* with the ZAP_LEAF_CHUNK() macro.
*/
uint16_t l_hash[1];
uint16_t l_hash[];
} zap_leaf_phys_t;
typedef union zap_leaf_chunk {
@@ -163,7 +163,7 @@ typedef struct zap_leaf {
dmu_buf_user_t l_dbu;
krwlock_t l_rwlock;
uint64_t l_blkid; /* 1<<ZAP_BLOCK_SHIFT byte block off */
int l_bs; /* block size shift */
uint_t l_bs; /* block size shift */
dmu_buf_t *l_dbuf;
} zap_leaf_t;
@@ -243,7 +243,7 @@ extern boolean_t zap_entry_normalization_conflict(zap_entry_handle_t *zeh,
*/
extern void zap_leaf_init(zap_leaf_t *l, boolean_t sort);
extern void zap_leaf_byteswap(zap_leaf_phys_t *buf, int len);
extern void zap_leaf_byteswap(zap_leaf_phys_t *buf, size_t len);
extern void zap_leaf_split(zap_leaf_t *l, zap_leaf_t *nl, boolean_t sort);
extern void zap_leaf_stats(struct zap *zap, zap_leaf_t *l,
struct zap_stats *zs);
+7 -3
View File
@@ -181,7 +181,7 @@ typedef struct zil_vdev_node {
avl_node_t zv_node; /* AVL tree linkage */
} zil_vdev_node_t;
#define ZIL_PREV_BLKS 16
#define ZIL_BURSTS 8
/*
* Stable storage intent log management structure. One per dataset.
@@ -216,14 +216,18 @@ struct zilog {
uint64_t zl_parse_lr_count; /* number of log records parsed */
itxg_t zl_itxg[TXG_SIZE]; /* intent log txg chains */
list_t zl_itx_commit_list; /* itx list to be committed */
uint64_t zl_cur_used; /* current commit log size used */
uint64_t zl_cur_size; /* current burst full size */
uint64_t zl_cur_left; /* current burst remaining size */
uint64_t zl_cur_max; /* biggest record in current burst */
list_t zl_lwb_list; /* in-flight log write list */
avl_tree_t zl_bp_tree; /* track bps during log parse */
clock_t zl_replay_time; /* lbolt of when replay started */
uint64_t zl_replay_blks; /* number of log blocks replayed */
zil_header_t zl_old_header; /* debugging aid */
uint_t zl_prev_blks[ZIL_PREV_BLKS]; /* size - sector rounded */
uint_t zl_parallel; /* workload is multi-threaded */
uint_t zl_prev_rotor; /* rotor for zl_prev[] */
uint_t zl_prev_opt[ZIL_BURSTS]; /* optimal block size */
uint_t zl_prev_min[ZIL_BURSTS]; /* minimal first block size */
txg_node_t zl_dirty_link; /* protected by dp_dirty_zilogs list */
uint64_t zl_dirty_max_txg; /* highest txg used to dirty zilog */
+13 -2
View File
@@ -110,8 +110,8 @@ do { \
const uintptr_t __right = (uintptr_t)(RIGHT); \
if (!(__left OP __right)) \
libspl_assertf(__FILE__, __FUNCTION__, __LINE__, \
"%s %s %s (0x%llx %s 0x%llx)", #LEFT, #OP, #RIGHT, \
(u_longlong_t)__left, #OP, (u_longlong_t)__right); \
"%s %s %s (%p %s %p)", #LEFT, #OP, #RIGHT, \
(void *)__left, #OP, (void *)__right); \
} while (0)
#define VERIFY0(LEFT) \
@@ -123,6 +123,15 @@ do { \
(u_longlong_t)__left); \
} while (0)
#define VERIFY0P(LEFT) \
do { \
const uintptr_t __left = (uintptr_t)(LEFT); \
if (!(__left == 0)) \
libspl_assertf(__FILE__, __FUNCTION__, __LINE__, \
"%s == 0 (%p == 0)", #LEFT, \
(void *)__left); \
} while (0)
#ifdef assert
#undef assert
#endif
@@ -137,6 +146,7 @@ do { \
#define ASSERT3P(x, y, z) \
((void) sizeof ((uintptr_t)(x)), (void) sizeof ((uintptr_t)(z)))
#define ASSERT0(x) ((void) sizeof ((uintptr_t)(x)))
#define ASSERT0P(x) ((void) sizeof ((uintptr_t)(x)))
#define ASSERT(x) ((void) sizeof ((uintptr_t)(x)))
#define assert(x) ((void) sizeof ((uintptr_t)(x)))
#define IMPLY(A, B) \
@@ -149,6 +159,7 @@ do { \
#define ASSERT3U VERIFY3U
#define ASSERT3P VERIFY3P
#define ASSERT0 VERIFY0
#define ASSERT0P VERIFY0P
#define ASSERT VERIFY
#define assert VERIFY
#define IMPLY(A, B) \
+10 -4
View File
@@ -505,14 +505,20 @@ uu_list_walk(uu_list_t *lp, uu_walk_fn_t *func, void *private, uint32_t flags)
}
if (lp->ul_debug || robust) {
uu_list_walk_t my_walk;
uu_list_walk_t *my_walk;
void *e;
list_walk_init(&my_walk, lp, flags);
my_walk = uu_zalloc(sizeof (*my_walk));
if (my_walk == NULL)
return (-1);
list_walk_init(my_walk, lp, flags);
while (status == UU_WALK_NEXT &&
(e = uu_list_walk_next(&my_walk)) != NULL)
(e = uu_list_walk_next(my_walk)) != NULL)
status = (*func)(e, private);
list_walk_fini(&my_walk);
list_walk_fini(my_walk);
uu_free(my_walk);
} else {
if (!reverse) {
for (np = lp->ul_null_node.uln_next;
+68 -16
View File
@@ -1112,14 +1112,11 @@
<var-decl name='prev' type-id='b03eadb4' visibility='default'/>
</data-member>
</class-decl>
<class-decl name='list' size-in-bits='256' is-struct='yes' visibility='default' id='e824dae9'>
<class-decl name='list' size-in-bits='192' is-struct='yes' visibility='default' id='e824dae9'>
<data-member access='public' layout-offset-in-bits='0'>
<var-decl name='list_size' type-id='b59d7dce' visibility='default'/>
</data-member>
<data-member access='public' layout-offset-in-bits='64'>
<var-decl name='list_offset' type-id='b59d7dce' visibility='default'/>
</data-member>
<data-member access='public' layout-offset-in-bits='128'>
<data-member access='public' layout-offset-in-bits='64'>
<var-decl name='list_head' type-id='b0b5e45e' visibility='default'/>
</data-member>
</class-decl>
@@ -1870,7 +1867,8 @@
<enumerator name='ZFS_PROP_REDACTED' value='93'/>
<enumerator name='ZFS_PROP_REDACT_SNAPS' value='94'/>
<enumerator name='ZFS_PROP_SNAPSHOTS_CHANGED' value='95'/>
<enumerator name='ZFS_NUM_PROPS' value='96'/>
<enumerator name='ZFS_PROP_PREFETCH' value='96'/>
<enumerator name='ZFS_NUM_PROPS' value='97'/>
</enum-decl>
<typedef-decl name='zfs_prop_t' type-id='4b000d60' id='58603c44'/>
<enum-decl name='zprop_source_t' naming-typedef-id='a2256d42' id='5903f80e'>
@@ -2878,6 +2876,9 @@
</function-type>
</abi-instr>
<abi-instr address-size='64' path='lib/libzfs/libzfs_crypto.c' language='LANG_C99'>
<array-type-def dimensions='1' type-id='38b51b3c' size-in-bits='832' id='02b72c00'>
<subrange length='13' type-id='7359adad' id='487fded1'/>
</array-type-def>
<array-type-def dimensions='1' type-id='fb7c6451' size-in-bits='256' id='64177143'>
<subrange length='32' type-id='7359adad' id='ae5bde82'/>
</array-type-def>
@@ -2890,6 +2891,10 @@
<class-decl name='_IO_codecvt' is-struct='yes' visibility='default' is-declaration-only='yes' id='a4036571'/>
<class-decl name='_IO_marker' is-struct='yes' visibility='default' is-declaration-only='yes' id='010ae0b9'/>
<class-decl name='_IO_wide_data' is-struct='yes' visibility='default' is-declaration-only='yes' id='79bd3751'/>
<class-decl name='__locale_data' is-struct='yes' visibility='default' is-declaration-only='yes' id='23de8b96'/>
<array-type-def dimensions='1' type-id='80f4b756' size-in-bits='832' id='39e6f84a'>
<subrange length='13' type-id='7359adad' id='487fded1'/>
</array-type-def>
<array-type-def dimensions='1' type-id='95e97e5e' size-in-bits='896' id='47394ee0'>
<subrange length='28' type-id='7359adad' id='3db583d7'/>
</array-type-def>
@@ -3010,6 +3015,24 @@
<typedef-decl name='__clock_t' type-id='bd54fe1a' id='4d66c6d7'/>
<typedef-decl name='__ssize_t' type-id='bd54fe1a' id='41060289'/>
<typedef-decl name='FILE' type-id='ec1ed955' id='aa12d1ba'/>
<class-decl name='__locale_struct' size-in-bits='1856' is-struct='yes' visibility='default' id='90cc1ce3'>
<data-member access='public' layout-offset-in-bits='0'>
<var-decl name='__locales' type-id='02b72c00' visibility='default'/>
</data-member>
<data-member access='public' layout-offset-in-bits='832'>
<var-decl name='__ctype_b' type-id='31347b7a' visibility='default'/>
</data-member>
<data-member access='public' layout-offset-in-bits='896'>
<var-decl name='__ctype_tolower' type-id='6d60f45d' visibility='default'/>
</data-member>
<data-member access='public' layout-offset-in-bits='960'>
<var-decl name='__ctype_toupper' type-id='6d60f45d' visibility='default'/>
</data-member>
<data-member access='public' layout-offset-in-bits='1024'>
<var-decl name='__names' type-id='39e6f84a' visibility='default'/>
</data-member>
</class-decl>
<typedef-decl name='__locale_t' type-id='f01e1813' id='b7ac9b5f'/>
<class-decl name='__sigset_t' size-in-bits='1024' is-struct='yes' naming-typedef-id='b9c97942' visibility='default' id='2616147f'>
<data-member access='public' layout-offset-in-bits='0'>
<var-decl name='__val' type-id='d2baa450' visibility='default'/>
@@ -3025,6 +3048,7 @@
</data-member>
</union-decl>
<typedef-decl name='__sigval_t' type-id='a094b870' id='eabacd01'/>
<typedef-decl name='locale_t' type-id='b7ac9b5f' id='973a4f8d'/>
<class-decl name='siginfo_t' size-in-bits='1024' is-struct='yes' naming-typedef-id='cb681f62' visibility='default' id='d8149419'>
<data-member access='public' layout-offset-in-bits='0'>
<var-decl name='si_signo' type-id='95e97e5e' visibility='default'/>
@@ -3260,9 +3284,13 @@
<pointer-type-def type-id='bb4788fa' size-in-bits='64' id='cecf4ea7'/>
<pointer-type-def type-id='010ae0b9' size-in-bits='64' id='e4c6fa61'/>
<pointer-type-def type-id='79bd3751' size-in-bits='64' id='c65a1f29'/>
<pointer-type-def type-id='23de8b96' size-in-bits='64' id='38b51b3c'/>
<pointer-type-def type-id='90cc1ce3' size-in-bits='64' id='f01e1813'/>
<qualified-type-def type-id='9b23c9ad' restrict='yes' id='8c85230f'/>
<qualified-type-def type-id='80f4b756' restrict='yes' id='9d26089a'/>
<pointer-type-def type-id='80f4b756' size-in-bits='64' id='7d3cd834'/>
<qualified-type-def type-id='95e97e5e' const='yes' id='2448a865'/>
<pointer-type-def type-id='2448a865' size-in-bits='64' id='6d60f45d'/>
<qualified-type-def type-id='aca3bac8' const='yes' id='2498fd78'/>
<pointer-type-def type-id='2498fd78' size-in-bits='64' id='eed6c816'/>
<qualified-type-def type-id='eed6c816' restrict='yes' id='a431a9da'/>
@@ -3295,6 +3323,7 @@
<class-decl name='_IO_codecvt' is-struct='yes' visibility='default' is-declaration-only='yes' id='a4036571'/>
<class-decl name='_IO_marker' is-struct='yes' visibility='default' is-declaration-only='yes' id='010ae0b9'/>
<class-decl name='_IO_wide_data' is-struct='yes' visibility='default' is-declaration-only='yes' id='79bd3751'/>
<class-decl name='__locale_data' is-struct='yes' visibility='default' is-declaration-only='yes' id='23de8b96'/>
<function-decl name='zpool_get_prop_int' mangled-name='zpool_get_prop_int' visibility='default' binding='global' size-in-bits='64' elf-symbol-id='zpool_get_prop_int'>
<parameter type-id='4c81de99'/>
<parameter type-id='5d0c23fb'/>
@@ -3399,6 +3428,10 @@
<function-decl name='dlerror' visibility='default' binding='global' size-in-bits='64'>
<return type-id='26a90f95'/>
</function-decl>
<function-decl name='uselocale' visibility='default' binding='global' size-in-bits='64'>
<parameter type-id='973a4f8d'/>
<return type-id='973a4f8d'/>
</function-decl>
<function-decl name='PKCS5_PBKDF2_HMAC_SHA1' visibility='default' binding='global' size-in-bits='64'>
<parameter type-id='80f4b756'/>
<parameter type-id='95e97e5e'/>
@@ -3482,8 +3515,9 @@
<parameter type-id='80f4b756'/>
<return type-id='26a90f95'/>
</function-decl>
<function-decl name='strerror' visibility='default' binding='global' size-in-bits='64'>
<function-decl name='strerror_l' visibility='default' binding='global' size-in-bits='64'>
<parameter type-id='95e97e5e'/>
<parameter type-id='973a4f8d'/>
<return type-id='26a90f95'/>
</function-decl>
<function-decl name='tcgetattr' visibility='default' binding='global' size-in-bits='64'>
@@ -3840,12 +3874,18 @@
<qualified-type-def type-id='9c313c2d' const='yes' id='c3b7ba7d'/>
<pointer-type-def type-id='c3b7ba7d' size-in-bits='64' id='713a56f5'/>
<pointer-type-def type-id='01a1b934' size-in-bits='64' id='566b3f52'/>
<qualified-type-def type-id='566b3f52' restrict='yes' id='c878edd6'/>
<pointer-type-def type-id='566b3f52' size-in-bits='64' id='82d4e9e8'/>
<qualified-type-def type-id='82d4e9e8' restrict='yes' id='aa19c230'/>
<pointer-type-def type-id='7e291ce6' size-in-bits='64' id='ca64ff60'/>
<pointer-type-def type-id='9da381c4' size-in-bits='64' id='cb785ebf'/>
<pointer-type-def type-id='1b055409' size-in-bits='64' id='9d424d31'/>
<pointer-type-def type-id='8e0af06e' size-in-bits='64' id='053457bd'/>
<pointer-type-def type-id='857bb57e' size-in-bits='64' id='75be733c'/>
<pointer-type-def type-id='a63d15a3' size-in-bits='64' id='a195f4a3'/>
<qualified-type-def type-id='a195f4a3' restrict='yes' id='33518961'/>
<pointer-type-def type-id='a195f4a3' size-in-bits='64' id='e80ff3ab'/>
<qualified-type-def type-id='e80ff3ab' restrict='yes' id='8f2c7109'/>
<pointer-type-def type-id='eae6431d' size-in-bits='64' id='0d41d328'/>
<pointer-type-def type-id='7a6844eb' size-in-bits='64' id='18c91f9e'/>
<pointer-type-def type-id='dddf6ca2' size-in-bits='64' id='d915a820'/>
@@ -4278,9 +4318,13 @@
<parameter type-id='9d424d31'/>
<return type-id='95e97e5e'/>
</function-decl>
<function-decl name='getgrnam' visibility='default' binding='global' size-in-bits='64'>
<parameter type-id='80f4b756'/>
<return type-id='566b3f52'/>
<function-decl name='getgrnam_r' visibility='default' binding='global' size-in-bits='64'>
<parameter type-id='9d26089a'/>
<parameter type-id='c878edd6'/>
<parameter type-id='266fe297'/>
<parameter type-id='b59d7dce'/>
<parameter type-id='aa19c230'/>
<return type-id='95e97e5e'/>
</function-decl>
<function-decl name='hasmntopt' visibility='default' binding='global' size-in-bits='64'>
<parameter type-id='48bea5ec'/>
@@ -4304,9 +4348,13 @@
<parameter type-id='18c91f9e'/>
<return type-id='95e97e5e'/>
</function-decl>
<function-decl name='getpwnam' visibility='default' binding='global' size-in-bits='64'>
<parameter type-id='80f4b756'/>
<return type-id='a195f4a3'/>
<function-decl name='getpwnam_r' visibility='default' binding='global' size-in-bits='64'>
<parameter type-id='9d26089a'/>
<parameter type-id='33518961'/>
<parameter type-id='266fe297'/>
<parameter type-id='b59d7dce'/>
<parameter type-id='8f2c7109'/>
<return type-id='95e97e5e'/>
</function-decl>
<function-decl name='strtol' visibility='default' binding='global' size-in-bits='64'>
<parameter type-id='9d26089a'/>
@@ -5671,7 +5719,10 @@
<enumerator name='VDEV_PROP_CHECKSUM_T' value='43'/>
<enumerator name='VDEV_PROP_IO_N' value='44'/>
<enumerator name='VDEV_PROP_IO_T' value='45'/>
<enumerator name='VDEV_NUM_PROPS' value='46'/>
<enumerator name='VDEV_PROP_RAIDZ_EXPANDING' value='46'/>
<enumerator name='VDEV_PROP_SLOW_IO_N' value='47'/>
<enumerator name='VDEV_PROP_SLOW_IO_T' value='48'/>
<enumerator name='VDEV_NUM_PROPS' value='49'/>
</enum-decl>
<typedef-decl name='vdev_prop_t' type-id='1573bec8' id='5aa5c90c'/>
<class-decl name='zpool_load_policy' size-in-bits='256' is-struct='yes' visibility='default' id='2f65b36f'>
@@ -6242,6 +6293,7 @@
<function-decl name='zpool_add' mangled-name='zpool_add' visibility='default' binding='global' size-in-bits='64' elf-symbol-id='zpool_add'>
<parameter type-id='4c81de99' name='zhp'/>
<parameter type-id='5ce45b60' name='nvroot'/>
<parameter type-id='c19b74c3' name='ashift_check'/>
<return type-id='95e97e5e'/>
</function-decl>
<function-decl name='zpool_export' mangled-name='zpool_export' visibility='default' binding='global' size-in-bits='64' elf-symbol-id='zpool_export'>
@@ -6693,7 +6745,7 @@
<enumerator name='LZC_SEND_FLAG_RAW' value='8'/>
<enumerator name='LZC_SEND_FLAG_SAVED' value='16'/>
</enum-decl>
<class-decl name='ddt_key' size-in-bits='320' is-struct='yes' visibility='default' id='e0a4a1cb'>
<class-decl name='ddt_key_t' size-in-bits='320' is-struct='yes' naming-typedef-id='67f6d2cf' visibility='default' id='5fae1718'>
<data-member access='public' layout-offset-in-bits='0'>
<var-decl name='ddk_cksum' type-id='39730d0b' visibility='default'/>
</data-member>
@@ -6701,7 +6753,7 @@
<var-decl name='ddk_prop' type-id='9c313c2d' visibility='default'/>
</data-member>
</class-decl>
<typedef-decl name='ddt_key_t' type-id='e0a4a1cb' id='67f6d2cf'/>
<typedef-decl name='ddt_key_t' type-id='5fae1718' id='67f6d2cf'/>
<enum-decl name='dmu_object_type' id='04b3b0b9'>
<underlying-type type-id='9cac1fee'/>
<enumerator name='DMU_OT_NONE' value='0'/>
+9 -4
View File
@@ -22,7 +22,7 @@
/*
* Copyright 2015 Nexenta Systems, Inc. All rights reserved.
* Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved.
* Copyright (c) 2011, 2020 by Delphix. All rights reserved.
* Copyright (c) 2011, 2024 by Delphix. All rights reserved.
* Copyright 2016 Igor Kozhukhov <ikozhukhov@gmail.com>
* Copyright (c) 2018 Datto Inc.
* Copyright (c) 2017 Open-E, Inc. All Rights Reserved.
@@ -1724,7 +1724,7 @@ zpool_discard_checkpoint(zpool_handle_t *zhp)
* necessary verification to ensure that the vdev specification is well-formed.
*/
int
zpool_add(zpool_handle_t *zhp, nvlist_t *nvroot)
zpool_add(zpool_handle_t *zhp, nvlist_t *nvroot, boolean_t check_ashift)
{
zfs_cmd_t zc = {"\0"};
int ret;
@@ -1756,6 +1756,7 @@ zpool_add(zpool_handle_t *zhp, nvlist_t *nvroot)
zcmd_write_conf_nvlist(hdl, &zc, nvroot);
(void) strlcpy(zc.zc_name, zhp->zpool_name, sizeof (zc.zc_name));
zc.zc_flags = check_ashift;
if (zfs_ioctl(hdl, ZFS_IOC_VDEV_ADD, &zc) != 0) {
switch (errno) {
@@ -1899,7 +1900,8 @@ zpool_rewind_exclaim(libzfs_handle_t *hdl, const char *name, boolean_t dryrun,
(void) nvlist_lookup_int64(nv, ZPOOL_CONFIG_REWIND_TIME, &loss);
if (localtime_r((time_t *)&rewindto, &t) != NULL &&
strftime(timestr, 128, "%c", &t) != 0) {
ctime_r((time_t *)&rewindto, timestr) != NULL) {
timestr[24] = 0;
if (dryrun) {
(void) printf(dgettext(TEXT_DOMAIN,
"Would be able to return %s "
@@ -1961,7 +1963,8 @@ zpool_explain_recover(libzfs_handle_t *hdl, const char *name, int reason,
"Recovery is possible, but will result in some data loss.\n"));
if (localtime_r((time_t *)&rewindto, &t) != NULL &&
strftime(timestr, 128, "%c", &t) != 0) {
ctime_r((time_t *)&rewindto, timestr) != NULL) {
timestr[24] = 0;
(void) printf(dgettext(TEXT_DOMAIN,
"\tReturning the pool to its state as of %s\n"
"\tshould correct the problem. "),
@@ -5224,6 +5227,8 @@ zpool_get_vdev_prop_value(nvlist_t *nvprop, vdev_prop_t prop, char *prop_name,
case VDEV_PROP_CHECKSUM_T:
case VDEV_PROP_IO_N:
case VDEV_PROP_IO_T:
case VDEV_PROP_SLOW_IO_N:
case VDEV_PROP_SLOW_IO_T:
if (intval == UINT64_MAX) {
(void) strlcpy(buf, "-", len);
} else {
+1
View File
@@ -1053,6 +1053,7 @@ send_progress_thread(void *arg)
}
}
pthread_cleanup_pop(B_TRUE);
return (NULL);
}
static boolean_t
+10 -2
View File
@@ -22,7 +22,7 @@
/*
* Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved.
* Copyright 2020 Joyent, Inc. All rights reserved.
* Copyright (c) 2011, 2020 by Delphix. All rights reserved.
* Copyright (c) 2011, 2024 by Delphix. All rights reserved.
* Copyright 2016 Igor Kozhukhov <ikozhukhov@gmail.com>
* Copyright (c) 2017 Datto Inc.
* Copyright (c) 2020 The FreeBSD Foundation
@@ -317,6 +317,9 @@ libzfs_error_description(libzfs_handle_t *hdl)
case EZFS_RESUME_EXISTS:
return (dgettext(TEXT_DOMAIN, "Resuming recv on existing "
"dataset without force"));
case EZFS_ASHIFT_MISMATCH:
return (dgettext(TEXT_DOMAIN, "adding devices with "
"different physical sector sizes is not allowed"));
case EZFS_UNKNOWN:
return (dgettext(TEXT_DOMAIN, "unknown error"));
default:
@@ -763,6 +766,9 @@ zpool_standard_error_fmt(libzfs_handle_t *hdl, int error, const char *fmt, ...)
case ZFS_ERR_IOC_ARG_BADTYPE:
zfs_verror(hdl, EZFS_IOC_NOTSUPPORTED, fmt, ap);
break;
case ZFS_ERR_ASHIFT_MISMATCH:
zfs_verror(hdl, EZFS_ASHIFT_MISMATCH, fmt, ap);
break;
default:
zfs_error_aux(hdl, "%s", strerror(error));
zfs_verror(hdl, EZFS_UNKNOWN, fmt, ap);
@@ -1699,7 +1705,9 @@ zprop_parse_value(libzfs_handle_t *hdl, nvpair_t *elem, int prop,
(prop == VDEV_PROP_CHECKSUM_N ||
prop == VDEV_PROP_CHECKSUM_T ||
prop == VDEV_PROP_IO_N ||
prop == VDEV_PROP_IO_T)) {
prop == VDEV_PROP_IO_T ||
prop == VDEV_PROP_SLOW_IO_N ||
prop == VDEV_PROP_SLOW_IO_T)) {
*ivalp = UINT64_MAX;
}
+10
View File
@@ -273,6 +273,16 @@ zpool_label_disk(libzfs_handle_t *hdl, zpool_handle_t *zhp, const char *name)
vtoc->efi_parts[0].p_start = start_block;
vtoc->efi_parts[0].p_size = slice_size;
if (vtoc->efi_parts[0].p_size * vtoc->efi_lbasize < SPA_MINDEVSIZE) {
(void) close(fd);
efi_free(vtoc);
zfs_error_aux(hdl, dgettext(TEXT_DOMAIN, "cannot "
"label '%s': partition would be less than the minimum "
"device size (64M)"), path);
return (zfs_error(hdl, EZFS_LABELFAILED, errbuf));
}
/*
* Why we use V_USR: V_BACKUP confuses users, and is considered
* disposable by some EFI utilities (since EFI doesn't have a backup
+2 -2
View File
@@ -62,7 +62,6 @@ dist_man_MANS = \
%D%/man8/zfs-userspace.8 \
%D%/man8/zfs-wait.8 \
%D%/man8/zfs_ids_to_path.8 \
%D%/man8/zfs_prepare_disk.8 \
%D%/man8/zgenhostid.8 \
%D%/man8/zinject.8 \
%D%/man8/zpool.8 \
@@ -115,7 +114,8 @@ endif
nodist_man_MANS = \
%D%/man8/zed.8 \
%D%/man8/zfs-mount-generator.8
%D%/man8/zfs-mount-generator.8 \
%D%/man8/zfs_prepare_disk.8
dist_noinst_DATA += $(dist_noinst_man_MANS) $(dist_man_MANS)
+4 -14
View File
@@ -186,18 +186,8 @@ reading it could cause a lock-up if the list grow too large
without limiting the output.
"(truncated)" will be shown if the list is larger than the limit.
.
.It Sy spl_taskq_thread_timeout_ms Ns = Ns Sy 10000 Pq uint
(Linux-only)
How long a taskq has to have had no work before we tear it down.
Previously, we would tear down a dynamic taskq worker as soon
as we noticed it had no work, but it was observed that this led
to a lot of churn in tearing down things we then immediately
spawned anew.
In practice, it seems any nonzero value will remove the vast
majority of this churn, while the nontrivially larger value
was chosen to help filter out the little remaining churn on
a mostly idle system.
Setting this value to
.Sy 0
will revert to the previous behavior.
.It Sy spl_taskq_thread_timeout_ms Ns = Ns Sy 5000 Pq uint
Minimum idle threads exit interval for dynamic taskqs.
Smaller values allow idle threads exit more often and potentially be
respawned again on demand, causing more churn.
.El
+72 -11
View File
@@ -2,6 +2,7 @@
.\" Copyright (c) 2013 by Turbo Fredriksson <turbo@bayour.com>. All rights reserved.
.\" Copyright (c) 2019, 2021 by Delphix. All rights reserved.
.\" Copyright (c) 2019 Datto Inc.
.\" Copyright (c) 2023, 2024 Klara, Inc.
.\" The contents of this file are subject to the terms of the Common Development
.\" and Distribution License (the "License"). You may not use this file except
.\" in compliance with the License. You can obtain a copy of the license at
@@ -15,7 +16,7 @@
.\" own identifying information:
.\" Portions Copyright [yyyy] [name of copyright owner]
.\"
.Dd July 21, 2023
.Dd January 9, 2024
.Dt ZFS 4
.Os
.
@@ -244,12 +245,25 @@ For blocks that could be forced to be a gang block (due to
.Sy metaslab_force_ganging ) ,
force this many of them to be gang blocks.
.
.It Sy zfs_ddt_zap_default_bs Ns = Ns Sy 15 Po 32 KiB Pc Pq int
.It Sy brt_zap_prefetch Ns = Ns Sy 1 Ns | Ns 0 Pq int
Controls prefetching BRT records for blocks which are going to be cloned.
.
.It Sy brt_zap_default_bs Ns = Ns Sy 12 Po 4 KiB Pc Pq int
Default BRT ZAP data block size as a power of 2. Note that changing this after
creating a BRT on the pool will not affect existing BRTs, only newly created
ones.
.
.It Sy brt_zap_default_ibs Ns = Ns Sy 12 Po 4 KiB Pc Pq int
Default BRT ZAP indirect block size as a power of 2. Note that changing this
after creating a BRT on the pool will not affect existing BRTs, only newly
created ones.
.
.It Sy ddt_zap_default_bs Ns = Ns Sy 15 Po 32 KiB Pc Pq int
Default DDT ZAP data block size as a power of 2. Note that changing this after
creating a DDT on the pool will not affect existing DDTs, only newly created
ones.
.
.It Sy zfs_ddt_zap_default_ibs Ns = Ns Sy 15 Po 32 KiB Pc Pq int
.It Sy ddt_zap_default_ibs Ns = Ns Sy 15 Po 32 KiB Pc Pq int
Default DDT ZAP indirect block size as a power of 2. Note that changing this
after creating a DDT on the pool will not affect existing DDTs, only newly
created ones.
@@ -530,6 +544,10 @@ However, this is limited by
Maximum micro ZAP size.
A micro ZAP is upgraded to a fat ZAP, once it grows beyond the specified size.
.
.It Sy zfetch_hole_shift Ns = Ns Sy 2 Pq uint
Log2 fraction of holes in speculative prefetch stream allowed for it to
proceed.
.
.It Sy zfetch_min_distance Ns = Ns Sy 4194304 Ns B Po 4 MiB Pc Pq uint
Min bytes to prefetch per stream.
Prefetch distance starts from the demand access size and quickly grows to
@@ -544,6 +562,13 @@ Max bytes to prefetch per stream.
.It Sy zfetch_max_idistance Ns = Ns Sy 67108864 Ns B Po 64 MiB Pc Pq uint
Max bytes to prefetch indirects for per stream.
.
.It Sy zfetch_max_reorder Ns = Ns Sy 16777216 Ns B Po 16 MiB Pc Pq uint
Requests within this byte distance from the current prefetch stream position
are considered parts of the stream, reordered due to parallel processing.
Such requests do not advance the stream position immediately unless
.Sy zfetch_hole_shift
fill threshold is reached, but saved to fill holes in the stream later.
.
.It Sy zfetch_max_streams Ns = Ns Sy 8 Pq uint
Max number of streams per zfetch (prefetch streams per file).
.
@@ -798,7 +823,7 @@ Note that this should not be set below the ZED thresholds
(currently 10 checksums over 10 seconds)
or else the daemon may not trigger any action.
.
.It Sy zfs_commit_timeout_pct Ns = Ns Sy 5 Ns % Pq uint
.It Sy zfs_commit_timeout_pct Ns = Ns Sy 10 Ns % Pq uint
This controls the amount of time that a ZIL block (lwb) will remain "open"
when it isn't "full", and it has a thread waiting for it to be committed to
stable storage.
@@ -1345,6 +1370,42 @@ _
4 Driver No driver retries on driver errors.
.TE
.
.It Sy zfs_vdev_disk_max_segs Ns = Ns Sy 0 Pq uint
Maximum number of segments to add to a BIO (min 4).
If this is higher than the maximum allowed by the device queue or the kernel
itself, it will be clamped.
Setting it to zero will cause the kernel's ideal size to be used.
This parameter only applies on Linux.
This parameter is ignored if
.Sy zfs_vdev_disk_classic Ns = Ns Sy 1 .
.
.It Sy zfs_vdev_disk_classic Ns = Ns 0 Ns | Ns Sy 1 Pq uint
Controls the method used to submit IO to the Linux block layer
(default
.Sy 1 "classic" Ns
)
.Pp
If set to 1, the "classic" method is used.
This is the method that has been in use since the earliest versions of
ZFS-on-Linux.
It has known issues with highly fragmented IO requests and is less efficient on
many workloads, but it well known and well understood.
.Pp
If set to 0, the "new" method is used.
This method is available since 2.2.4 and should resolve all known issues and be
far more efficient, but has not had as much testing.
In the 2.2.x series, this parameter defaults to 1, to use the "classic" method.
.Pp
It is not recommended that you change it except on advice from the OpenZFS
developers.
If you do change it, please also open a bug report describing why you did so,
including the workload involved and any error messages.
.Pp
This parameter and the "classic" submission method will be removed in a future
release of OpenZFS once we have total confidence in the new method.
.Pp
This parameter only applies on Linux, and can only be set at module load time.
.
.It Sy zfs_expire_snapshot Ns = Ns Sy 300 Ns s Pq int
Time before expiring
.Pa .zfs/snapshot .
@@ -2169,13 +2230,6 @@ This sets the maximum number of write bytes logged via WR_COPIED.
It tunes a tradeoff between additional memory copy and possibly worse log
space efficiency vs additional range lock/unlock.
.
.It Sy zil_min_commit_timeout Ns = Ns Sy 5000 Pq u64
This sets the minimum delay in nanoseconds ZIL care to delay block commit,
waiting for more records.
If ZIL writes are too fast, kernel may not be able sleep for so short interval,
increasing log latency above allowed by
.Sy zfs_commit_timeout_pct .
.
.It Sy zil_nocacheflush Ns = Ns Sy 0 Ns | Ns 1 Pq int
Disable the cache flush commands that are normally sent to disk by
the ZIL after an LWB write has completed.
@@ -2333,6 +2387,13 @@ The number of requests which can be handled concurrently is controlled by
is ignored when running on a kernel that supports block multiqueue
.Pq Li blk-mq .
.
.It Sy zvol_num_taskqs Ns = Ns Sy 0 Pq uint
Number of zvol taskqs.
If
.Sy 0
(the default) then scaling is done internally to prefer 6 threads per taskq.
This only applies on Linux.
.
.It Sy zvol_threads Ns = Ns Sy 0 Pq uint
The number of system wide threads to use for processing zvol block IOs.
If
+9 -3
View File
@@ -44,7 +44,7 @@ section, below.
Every vdev has a set of properties that export statistics about the vdev
as well as control various behaviors.
Properties are not inherited from top-level vdevs, with the exception of
checksum_n, checksum_t, io_n, and io_t.
checksum_n, checksum_t, io_n, io_t, slow_io_n, and slow_io_t.
.Pp
The values of numeric properties can be specified using human-readable suffixes
.Po for example,
@@ -117,7 +117,7 @@ If this device is currently being removed from the pool
.Pp
The following native properties can be used to change the behavior of a vdev.
.Bl -tag -width "allocating"
.It Sy checksum_n , checksum_t , io_n , io_t
.It Sy checksum_n , checksum_t , io_n , io_t , slow_io_n , slow_io_t
Tune the fault management daemon by specifying checksum/io thresholds of <N>
errors in <T> seconds, respectively.
These properties can be set on leaf and top-level vdevs.
@@ -127,7 +127,13 @@ If the property is only set on the top-level vdev, this value will be used.
The value of these properties do not persist across vdev replacement.
For this reason, it is advisable to set the property on the top-level vdev -
not on the leaf vdev itself.
The default values are 10 errors in 600 seconds.
The default values for
.Sy OpenZFS on Linux
are 10 errors in 600 seconds.
For
.Sy OpenZFS on FreeBSD
defaults see
.Xr zfsd 8 .
.It Sy comment
A text comment up to 8192 characters long
.It Sy bootsize
+17
View File
@@ -1613,6 +1613,23 @@ If this property is set to
then only metadata is cached.
The default value is
.Sy all .
.It Sy prefetch Ns = Ns Sy all Ns | Ns Sy none Ns | Ns Sy metadata
Controls what speculative prefetch does.
If this property is set to
.Sy all ,
then both user data and metadata are prefetched.
If this property is set to
.Sy none ,
then neither user data nor metadata are prefetched.
If this property is set to
.Sy metadata ,
then only metadata are prefetched.
The default value is
.Sy all .
.Pp
Please note that the module parameter zfs_disable_prefetch=1 can
be used to totally disable speculative prefetch, bypassing anything
this property does.
.It Sy setuid Ns = Ns Sy on Ns | Ns Sy off
Controls whether the setuid bit is respected for the file system.
The default value is
+2 -2
View File
@@ -260,8 +260,8 @@ sufficient replicas exist to continue functioning.
The underlying conditions are as follows:
.Bl -bullet -compact
.It
The number of checksum errors exceeds acceptable levels and the device is
degraded as an indication that something may be wrong.
The number of checksum errors or slow I/Os exceeds acceptable levels and the
device is degraded as an indication that something may be wrong.
ZFS continues to use the device as necessary.
.It
The number of I/O errors exceeds acceptable levels.
+4 -2
View File
@@ -43,7 +43,7 @@
.Cm mount
.Op Fl Oflv
.Op Fl o Ar options
.Fl a Ns | Ns Ar filesystem
.Fl a Ns | Ns Fl R Ar filesystem Ns | Ns Ar filesystem
.Nm zfs
.Cm unmount
.Op Fl fu
@@ -61,7 +61,7 @@ Displays all ZFS file systems currently mounted.
.Cm mount
.Op Fl Oflv
.Op Fl o Ar options
.Fl a Ns | Ns Ar filesystem
.Fl a Ns | Ns Fl R Ar filesystem Ns | Ns Ar filesystem
.Xc
Mount ZFS filesystem on a path described by its
.Sy mountpoint
@@ -83,6 +83,8 @@ for more information.
.It Fl a
Mount all available ZFS file systems.
Invoked automatically as part of the boot process if configured.
.It Fl R
Mount the specified filesystems along with all their children.
.It Ar filesystem
Mount the specified filesystem.
.It Fl o Ar options
+1
View File
@@ -69,6 +69,7 @@ Force a vdev into the DEGRADED or FAULTED state.
.Nm zinject
.Fl d Ar vdev
.Fl D Ar latency : Ns Ar lanes
.Op Fl T Ar read|write
.Ar pool
.Xc
Add an artificial delay to I/O requests on a particular
+16 -2
View File
@@ -24,8 +24,9 @@
.\" Copyright (c) 2018 George Melikov. All Rights Reserved.
.\" Copyright 2017 Nexenta Systems, Inc.
.\" Copyright (c) 2017 Open-E, Inc. All Rights Reserved.
.\" Copyright (c) 2024 by Delphix. All Rights Reserved.
.\"
.Dd March 16, 2022
.Dd March 8, 2024
.Dt ZPOOL-ADD 8
.Os
.
@@ -36,6 +37,7 @@
.Nm zpool
.Cm add
.Op Fl fgLnP
.Op Fl -allow-in-use -allow-replication-mismatch -allow-ashift-mismatch
.Oo Fl o Ar property Ns = Ns Ar value Oc
.Ar pool vdev Ns …
.
@@ -56,7 +58,8 @@ subcommand.
.It Fl f
Forces use of
.Ar vdev Ns s ,
even if they appear in use or specify a conflicting replication level.
even if they appear in use, have conflicting ashift values, or specify
a conflicting replication level.
Not all devices can be overridden in this manner.
.It Fl g
Display
@@ -91,6 +94,17 @@ See the
manual page for a list of valid properties that can be set.
The only property supported at the moment is
.Sy ashift .
.It Fl -allow-ashift-mismatch
Disable the ashift validation which allows mismatched ashift values in the
pool.
Adding top-level
.Ar vdev Ns s
with different sector sizes will prohibit future device removal operations, see
.Xr zpool-remove 8 .
.It Fl -allow-in-use
Allow vdevs to be added even if they might be in use in another pool.
.It Fl -allow-replication-mismatch
Allow vdevs with conflicting replication levels to be added to the pool.
.El
.
.Sh EXAMPLES
+4 -3
View File
@@ -50,9 +50,10 @@ If the pool was suspended it will be brought back online provided the
devices can be accessed.
Pools with
.Sy multihost
enabled which have been suspended cannot be resumed.
While the pool was suspended, it may have been imported on
another host, and resuming I/O could result in pool damage.
enabled which have been suspended cannot be resumed when there is evidence
that the pool was imported by another host.
The same checks performed during an import will be applied before the clear
proceeds.
.Bl -tag -width Ds
.It Fl -power
Power on the devices's slot in the storage enclosure and wait for the device
+9 -9
View File
@@ -36,7 +36,7 @@
.Sh SYNOPSIS
.Nm zpool
.Cm status
.Op Fl DeigLpPstvx
.Op Fl DegiLpPstvx
.Op Fl T Sy u Ns | Ns Sy d
.Op Fl c Op Ar SCRIPT1 Ns Oo , Ns Ar SCRIPT2 Oc Ns …
.Oo Ar pool Oc Ns …
@@ -69,14 +69,20 @@ See the
option of
.Nm zpool Cm iostat
for complete details.
.It Fl D
Display a histogram of deduplication statistics, showing the allocated
.Pq physically present on disk
and referenced
.Pq logically referenced in the pool
block counts and sizes by reference count.
.It Fl e
Only show unhealthy vdevs (not-ONLINE or with errors).
.It Fl i
Display vdev initialization status.
.It Fl g
Display vdev GUIDs instead of the normal device names
These GUIDs can be used in place of device names for the zpool
detach/offline/remove/replace commands.
.It Fl i
Display vdev initialization status.
.It Fl L
Display real paths for vdevs resolving all symbolic links.
This can be used to look up the current block device name regardless of the
@@ -90,12 +96,6 @@ the path.
This can be used in conjunction with the
.Fl L
flag.
.It Fl D
Display a histogram of deduplication statistics, showing the allocated
.Pq physically present on disk
and referenced
.Pq logically referenced in the pool
block counts and sizes by reference count.
.It Fl s
Display the number of leaf vdev slow I/O operations.
This is the number of I/O operations that didn't complete in
@@ -32,6 +32,14 @@
*/
#if defined(__aarch64__)
/* make gcc <= 9 happy */
#if LD_VERSION >= 233010000
#define CFI_NEGATE_RA_STATE .cfi_negate_ra_state
#else
#define CFI_NEGATE_RA_STATE
#endif
.text
.section .note.gnu.property,"a",@note
.p2align 3
@@ -51,7 +59,7 @@
zfs_blake3_compress_in_place_sse2:
.cfi_startproc
hint #25
.cfi_negate_ra_state
CFI_NEGATE_RA_STATE
sub sp, sp, #96
stp x29, x30, [sp, #64]
add x29, sp, #64
@@ -555,7 +563,7 @@ compress_pre:
zfs_blake3_compress_xof_sse2:
.cfi_startproc
hint #25
.cfi_negate_ra_state
CFI_NEGATE_RA_STATE
sub sp, sp, #96
stp x29, x30, [sp, #64]
add x29, sp, #64
@@ -608,7 +616,7 @@ zfs_blake3_compress_xof_sse2:
zfs_blake3_hash_many_sse2:
.cfi_startproc
hint #25
.cfi_negate_ra_state
CFI_NEGATE_RA_STATE
stp d15, d14, [sp, #-160]!
stp d13, d12, [sp, #16]
stp d11, d10, [sp, #32]
@@ -32,6 +32,14 @@
*/
#if defined(__aarch64__)
/* make gcc <= 9 happy */
#if LD_VERSION >= 233010000
#define CFI_NEGATE_RA_STATE .cfi_negate_ra_state
#else
#define CFI_NEGATE_RA_STATE
#endif
.text
.section .note.gnu.property,"a",@note
.p2align 3
@@ -51,7 +59,7 @@
zfs_blake3_compress_in_place_sse41:
.cfi_startproc
hint #25
.cfi_negate_ra_state
CFI_NEGATE_RA_STATE
sub sp, sp, #96
stp x29, x30, [sp, #64]
add x29, sp, #64
@@ -565,7 +573,7 @@ compress_pre:
zfs_blake3_compress_xof_sse41:
.cfi_startproc
hint #25
.cfi_negate_ra_state
CFI_NEGATE_RA_STATE
sub sp, sp, #96
stp x29, x30, [sp, #64]
add x29, sp, #64
@@ -21,6 +21,16 @@
#if defined(__aarch64__)
.section .note.gnu.property,"a",@note
.p2align 3
.word 4
.word 16
.word 5
.asciz "GNU"
.word 3221225472
.word 4
.word 3
.word 0
.text
.align 6
@@ -21,6 +21,16 @@
#if defined(__aarch64__)
.section .note.gnu.property,"a",@note
.p2align 3
.word 4
.word 16
.word 5
.asciz "GNU"
.word 3221225472
.word 4
.word 3
.word 0
.text
.align 6
+1
View File
@@ -41,6 +41,7 @@
#include <sys/types.h>
#include <sys/param.h>
#include <sys/string.h>
#include <rpc/types.h>
#include <rpc/xdr.h>
#include <sys/mod.h>
+1 -3
View File
@@ -417,10 +417,8 @@ abd_iter_init(struct abd_iter *aiter, abd_t *abd)
{
ASSERT(!abd_is_gang(abd));
abd_verify(abd);
memset(aiter, 0, sizeof (struct abd_iter));
aiter->iter_abd = abd;
aiter->iter_pos = 0;
aiter->iter_mapaddr = NULL;
aiter->iter_mapsize = 0;
}
/*
+1 -3
View File
@@ -1869,10 +1869,8 @@ zfs_readdir(vnode_t *vp, zfs_uio_t *uio, cred_t *cr, int *eofp,
ASSERT3S(outcount, <=, bufsize);
/* Prefetch znode */
if (prefetch)
dmu_prefetch(os, objnum, 0, 0, 0,
ZIO_PRIORITY_SYNC_READ);
dmu_prefetch_dnode(os, objnum, ZIO_PRIORITY_SYNC_READ);
/*
* Move to the next entry, fill in the previous offset.
+1 -1
View File
@@ -1273,7 +1273,7 @@ zvol_os_rename_minor(zvol_state_t *zv, const char *newname)
ASSERT(MUTEX_HELD(&zv->zv_state_lock));
/* Move to a new hashtable entry. */
zv->zv_hash = zvol_name_hash(zv->zv_name);
zv->zv_hash = zvol_name_hash(newname);
hlist_del(&zv->zv_hlink);
hlist_add_head(&zv->zv_hlink, ZVOL_HT_HEAD(zv->zv_hash));
+29 -56
View File
@@ -36,12 +36,12 @@ static int spl_taskq_thread_bind = 0;
module_param(spl_taskq_thread_bind, int, 0644);
MODULE_PARM_DESC(spl_taskq_thread_bind, "Bind taskq thread to CPU by default");
static uint_t spl_taskq_thread_timeout_ms = 10000;
static uint_t spl_taskq_thread_timeout_ms = 5000;
/* BEGIN CSTYLED */
module_param(spl_taskq_thread_timeout_ms, uint, 0644);
/* END CSTYLED */
MODULE_PARM_DESC(spl_taskq_thread_timeout_ms,
"Time to require a dynamic thread be idle before it gets cleaned up");
"Minimum idle threads exit interval for dynamic taskqs");
static int spl_taskq_thread_dynamic = 1;
module_param(spl_taskq_thread_dynamic, int, 0444);
@@ -594,8 +594,7 @@ taskq_dispatch(taskq_t *tq, task_func_t func, void *arg, uint_t flags)
ASSERT(tq->tq_nactive <= tq->tq_nthreads);
if ((flags & TQ_NOQUEUE) && (tq->tq_nactive == tq->tq_nthreads)) {
/* Dynamic taskq may be able to spawn another thread */
if (!(tq->tq_flags & TASKQ_DYNAMIC) ||
taskq_thread_spawn(tq) == 0)
if (taskq_thread_spawn(tq) == 0)
goto out;
}
@@ -629,11 +628,11 @@ taskq_dispatch(taskq_t *tq, task_func_t func, void *arg, uint_t flags)
spin_unlock(&t->tqent_lock);
wake_up(&tq->tq_work_waitq);
out:
/* Spawn additional taskq threads if required. */
if (!(flags & TQ_NOQUEUE) && tq->tq_nactive == tq->tq_nthreads)
(void) taskq_thread_spawn(tq);
out:
spin_unlock_irqrestore(&tq->tq_lock, irqflags);
return (rc);
}
@@ -676,10 +675,11 @@ taskq_dispatch_delay(taskq_t *tq, task_func_t func, void *arg,
ASSERT(!(t->tqent_flags & TQENT_FLAG_PREALLOC));
spin_unlock(&t->tqent_lock);
out:
/* Spawn additional taskq threads if required. */
if (tq->tq_nactive == tq->tq_nthreads)
(void) taskq_thread_spawn(tq);
out:
spin_unlock_irqrestore(&tq->tq_lock, irqflags);
return (rc);
}
@@ -704,9 +704,8 @@ taskq_dispatch_ent(taskq_t *tq, task_func_t func, void *arg, uint_t flags,
if ((flags & TQ_NOQUEUE) && (tq->tq_nactive == tq->tq_nthreads)) {
/* Dynamic taskq may be able to spawn another thread */
if (!(tq->tq_flags & TASKQ_DYNAMIC) ||
taskq_thread_spawn(tq) == 0)
goto out2;
if (taskq_thread_spawn(tq) == 0)
goto out;
flags |= TQ_FRONT;
}
@@ -742,11 +741,11 @@ taskq_dispatch_ent(taskq_t *tq, task_func_t func, void *arg, uint_t flags,
spin_unlock(&t->tqent_lock);
wake_up(&tq->tq_work_waitq);
out:
/* Spawn additional taskq threads if required. */
if (tq->tq_nactive == tq->tq_nthreads)
(void) taskq_thread_spawn(tq);
out2:
out:
spin_unlock_irqrestore(&tq->tq_lock, irqflags);
}
EXPORT_SYMBOL(taskq_dispatch_ent);
@@ -825,6 +824,7 @@ taskq_thread_spawn(taskq_t *tq)
if (!(tq->tq_flags & TASKQ_DYNAMIC))
return (0);
tq->lastspawnstop = jiffies;
if ((tq->tq_nthreads + tq->tq_nspawn < tq->tq_maxthreads) &&
(tq->tq_flags & TASKQ_ACTIVE)) {
spawning = (++tq->tq_nspawn);
@@ -836,9 +836,9 @@ taskq_thread_spawn(taskq_t *tq)
}
/*
* Threads in a dynamic taskq should only exit once it has been completely
* drained and no other threads are actively servicing tasks. This prevents
* threads from being created and destroyed more than is required.
* Threads in a dynamic taskq may exit once there is no more work to do.
* To prevent threads from being created and destroyed too often limit
* the exit rate to one per spl_taskq_thread_timeout_ms.
*
* The first thread is the thread list is treated as the primary thread.
* There is nothing special about the primary thread but in order to avoid
@@ -847,44 +847,22 @@ taskq_thread_spawn(taskq_t *tq)
static int
taskq_thread_should_stop(taskq_t *tq, taskq_thread_t *tqt)
{
if (!(tq->tq_flags & TASKQ_DYNAMIC))
ASSERT(!taskq_next_ent(tq));
if (!(tq->tq_flags & TASKQ_DYNAMIC) || !spl_taskq_thread_dynamic)
return (0);
if (!(tq->tq_flags & TASKQ_ACTIVE))
return (1);
if (list_first_entry(&(tq->tq_thread_list), taskq_thread_t,
tqt_thread_list) == tqt)
return (0);
int no_work =
((tq->tq_nspawn == 0) && /* No threads are being spawned */
(tq->tq_nactive == 0) && /* No threads are handling tasks */
(tq->tq_nthreads > 1) && /* More than 1 thread is running */
(!taskq_next_ent(tq)) && /* There are no pending tasks */
(spl_taskq_thread_dynamic)); /* Dynamic taskqs are allowed */
/*
* If we would have said stop before, let's instead wait a bit, maybe
* we'll see more work come our way soon...
*/
if (no_work) {
/* if it's 0, we want the old behavior. */
/* if the taskq is being torn down, we also want to go away. */
if (spl_taskq_thread_timeout_ms == 0 ||
!(tq->tq_flags & TASKQ_ACTIVE))
return (1);
unsigned long lasttime = tq->lastshouldstop;
if (lasttime > 0) {
if (time_after(jiffies, lasttime +
msecs_to_jiffies(spl_taskq_thread_timeout_ms)))
return (1);
else
return (0);
} else {
tq->lastshouldstop = jiffies;
}
} else {
tq->lastshouldstop = 0;
}
return (0);
ASSERT3U(tq->tq_nthreads, >, 1);
if (tq->tq_nspawn != 0)
return (0);
if (time_before(jiffies, tq->lastspawnstop +
msecs_to_jiffies(spl_taskq_thread_timeout_ms)))
return (0);
tq->lastspawnstop = jiffies;
return (1);
}
static int
@@ -935,10 +913,8 @@ taskq_thread(void *args)
if (list_empty(&tq->tq_pend_list) &&
list_empty(&tq->tq_prio_list)) {
if (taskq_thread_should_stop(tq, tqt)) {
wake_up_all(&tq->tq_wait_waitq);
if (taskq_thread_should_stop(tq, tqt))
break;
}
add_wait_queue_exclusive(&tq->tq_work_waitq, &wait);
spin_unlock_irqrestore(&tq->tq_lock, flags);
@@ -1013,9 +989,6 @@ taskq_thread(void *args)
tqt->tqt_id = TASKQID_INVALID;
tqt->tqt_flags = 0;
wake_up_all(&tq->tq_wait_waitq);
} else {
if (taskq_thread_should_stop(tq, tqt))
break;
}
set_current_state(TASK_INTERRUPTIBLE);
@@ -1122,7 +1095,7 @@ taskq_create(const char *name, int threads_arg, pri_t pri,
tq->tq_flags = (flags | TASKQ_ACTIVE);
tq->tq_next_id = TASKQID_INITIAL;
tq->tq_lowest_id = TASKQID_INITIAL;
tq->lastshouldstop = 0;
tq->lastspawnstop = jiffies;
INIT_LIST_HEAD(&tq->tq_free_list);
INIT_LIST_HEAD(&tq->tq_pend_list);
INIT_LIST_HEAD(&tq->tq_prio_list);
+1
View File
@@ -25,6 +25,7 @@
#include <sys/debug.h>
#include <sys/types.h>
#include <sys/sysmacros.h>
#include <rpc/types.h>
#include <rpc/xdr.h>
/*
+115 -8
View File
@@ -21,6 +21,7 @@
/*
* Copyright (c) 2014 by Chunwei Chen. All rights reserved.
* Copyright (c) 2019 by Delphix. All rights reserved.
* Copyright (c) 2023, 2024, Klara Inc.
*/
/*
@@ -59,7 +60,9 @@
#include <sys/zfs_znode.h>
#ifdef _KERNEL
#include <linux/kmap_compat.h>
#include <linux/mm_compat.h>
#include <linux/scatterlist.h>
#include <linux/version.h>
#endif
#ifdef _KERNEL
@@ -895,14 +898,9 @@ abd_iter_init(struct abd_iter *aiter, abd_t *abd)
{
ASSERT(!abd_is_gang(abd));
abd_verify(abd);
memset(aiter, 0, sizeof (struct abd_iter));
aiter->iter_abd = abd;
aiter->iter_mapaddr = NULL;
aiter->iter_mapsize = 0;
aiter->iter_pos = 0;
if (abd_is_linear(abd)) {
aiter->iter_offset = 0;
aiter->iter_sg = NULL;
} else {
if (!abd_is_linear(abd)) {
aiter->iter_offset = ABD_SCATTER(abd).abd_offset;
aiter->iter_sg = ABD_SCATTER(abd).abd_sgl;
}
@@ -915,6 +913,7 @@ abd_iter_init(struct abd_iter *aiter, abd_t *abd)
boolean_t
abd_iter_at_end(struct abd_iter *aiter)
{
ASSERT3U(aiter->iter_pos, <=, aiter->iter_abd->abd_size);
return (aiter->iter_pos == aiter->iter_abd->abd_size);
}
@@ -926,8 +925,15 @@ abd_iter_at_end(struct abd_iter *aiter)
void
abd_iter_advance(struct abd_iter *aiter, size_t amount)
{
/*
* Ensure that last chunk is not in use. abd_iterate_*() must clear
* this state (directly or abd_iter_unmap()) before advancing.
*/
ASSERT3P(aiter->iter_mapaddr, ==, NULL);
ASSERT0(aiter->iter_mapsize);
ASSERT3P(aiter->iter_page, ==, NULL);
ASSERT0(aiter->iter_page_doff);
ASSERT0(aiter->iter_page_dsize);
/* There's nothing left to advance to, so do nothing */
if (abd_iter_at_end(aiter))
@@ -1009,6 +1015,106 @@ abd_cache_reap_now(void)
}
#if defined(_KERNEL)
/*
* Yield the next page struct and data offset and size within it, without
* mapping it into the address space.
*/
void
abd_iter_page(struct abd_iter *aiter)
{
if (abd_iter_at_end(aiter)) {
aiter->iter_page = NULL;
aiter->iter_page_doff = 0;
aiter->iter_page_dsize = 0;
return;
}
struct page *page;
size_t doff, dsize;
if (abd_is_linear(aiter->iter_abd)) {
ASSERT3U(aiter->iter_pos, ==, aiter->iter_offset);
/* memory address at iter_pos */
void *paddr = ABD_LINEAR_BUF(aiter->iter_abd) + aiter->iter_pos;
/* struct page for address */
page = is_vmalloc_addr(paddr) ?
vmalloc_to_page(paddr) : virt_to_page(paddr);
/* offset of address within the page */
doff = offset_in_page(paddr);
/* total data remaining in abd from this position */
dsize = aiter->iter_abd->abd_size - aiter->iter_offset;
} else {
ASSERT(!abd_is_gang(aiter->iter_abd));
/* current scatter page */
page = sg_page(aiter->iter_sg);
/* position within page */
doff = aiter->iter_offset;
/* remaining data in scatterlist */
dsize = MIN(aiter->iter_sg->length - aiter->iter_offset,
aiter->iter_abd->abd_size - aiter->iter_pos);
}
ASSERT(page);
#if LINUX_VERSION_CODE >= KERNEL_VERSION(4, 5, 0)
if (PageTail(page)) {
/*
* This page is part of a "compound page", which is a group of
* pages that can be referenced from a single struct page *.
* Its organised as a "head" page, followed by a series of
* "tail" pages.
*
* In OpenZFS, compound pages are allocated using the
* __GFP_COMP flag, which we get from scatter ABDs and SPL
* vmalloc slabs (ie >16K allocations). So a great many of the
* IO buffers we get are going to be of this type.
*
* The tail pages are just regular PAGE_SIZE pages, and can be
* safely used as-is. However, the head page has length
* covering itself and all the tail pages. If this ABD chunk
* spans multiple pages, then we can use the head page and a
* >PAGE_SIZE length, which is far more efficient.
*
* To do this, we need to adjust the offset to be counted from
* the head page. struct page for compound pages are stored
* contiguously, so we can just adjust by a simple offset.
*
* Before kernel 4.5, compound page heads were refcounted
* separately, such that moving back to the head page would
* require us to take a reference to it and releasing it once
* we're completely finished with it. In practice, that means
* when our caller is done with the ABD, which we have no
* insight into from here. Rather than contort this API to
* track head page references on such ancient kernels, we just
* compile this block out and use the tail pages directly. This
* is slightly less efficient, but makes everything far
* simpler.
*/
struct page *head = compound_head(page);
doff += ((page - head) * PAGESIZE);
page = head;
}
#endif
/* final page and position within it */
aiter->iter_page = page;
aiter->iter_page_doff = doff;
/* amount of data in the chunk, up to the end of the page */
aiter->iter_page_dsize = MIN(dsize, page_size(page) - doff);
}
/*
* Note: ABD BIO functions only needed to support vdev_classic. See comments in
* vdev_disk.c.
*/
/*
* bio_nr_pages for ABD.
* @off is the offset in @abd
@@ -1163,4 +1269,5 @@ MODULE_PARM_DESC(zfs_abd_scatter_min_size,
module_param(zfs_abd_scatter_max_order, uint, 0644);
MODULE_PARM_DESC(zfs_abd_scatter_max_order,
"Maximum order allocation used for a scatter ABD.");
#endif
#endif /* _KERNEL */
File diff suppressed because it is too large Load Diff
+3 -9
View File
@@ -1610,11 +1610,8 @@ zfs_readdir(struct inode *ip, zpl_dir_context_t *ctx, cred_t *cr)
if (done)
break;
/* Prefetch znode */
if (prefetch) {
dmu_prefetch(os, objnum, 0, 0, 0,
ZIO_PRIORITY_SYNC_READ);
}
if (prefetch)
dmu_prefetch_dnode(os, objnum, ZIO_PRIORITY_SYNC_READ);
/*
* Move to the next entry, fill in the previous offset.
@@ -3792,11 +3789,8 @@ zfs_putpage(struct inode *ip, struct page *pp, struct writeback_control *wbc,
dmu_tx_hold_sa(tx, zp->z_sa_hdl, B_FALSE);
zfs_sa_upgrade_txholds(tx, zp);
err = dmu_tx_assign(tx, TXG_NOWAIT);
err = dmu_tx_assign(tx, TXG_WAIT);
if (err != 0) {
if (err == ERESTART)
dmu_tx_wait(tx);
dmu_tx_abort(tx);
#ifdef HAVE_VFS_FILEMAP_DIRTY_FOLIO
filemap_dirty_folio(page_mapping(pp), page_folio(pp));
+4 -4
View File
@@ -720,23 +720,23 @@ zpl_putpage(struct page *pp, struct writeback_control *wbc, void *data)
{
boolean_t *for_sync = data;
fstrans_cookie_t cookie;
int ret;
ASSERT(PageLocked(pp));
ASSERT(!PageWriteback(pp));
cookie = spl_fstrans_mark();
(void) zfs_putpage(pp->mapping->host, pp, wbc, *for_sync);
ret = zfs_putpage(pp->mapping->host, pp, wbc, *for_sync);
spl_fstrans_unmark(cookie);
return (0);
return (ret);
}
#ifdef HAVE_WRITEPAGE_T_FOLIO
static int
zpl_putfolio(struct folio *pp, struct writeback_control *wbc, void *data)
{
(void) zpl_putpage(&pp->page, wbc, data);
return (0);
return (zpl_putpage(&pp->page, wbc, data));
}
#endif
+14 -2
View File
@@ -26,6 +26,9 @@
#include <linux/compat.h>
#endif
#include <linux/fs.h>
#ifdef HAVE_VFS_SPLICE_COPY_FILE_RANGE
#include <linux/splice.h>
#endif
#include <sys/file.h>
#include <sys/zfs_znode.h>
#include <sys/zfs_vnops.h>
@@ -102,7 +105,7 @@ zpl_copy_file_range(struct file *src_file, loff_t src_off,
ret = zpl_clone_file_range_impl(src_file, src_off,
dst_file, dst_off, len);
#ifdef HAVE_VFS_GENERIC_COPY_FILE_RANGE
#if defined(HAVE_VFS_GENERIC_COPY_FILE_RANGE)
/*
* Since Linux 5.3 the filesystem driver is responsible for executing
* an appropriate fallback, and a generic fallback function is provided.
@@ -111,6 +114,15 @@ zpl_copy_file_range(struct file *src_file, loff_t src_off,
ret == -EAGAIN)
ret = generic_copy_file_range(src_file, src_off, dst_file,
dst_off, len, flags);
#elif defined(HAVE_VFS_SPLICE_COPY_FILE_RANGE)
/*
* Since 6.8 the fallback function is called splice_copy_file_range
* and has a slightly different signature.
*/
if (ret == -EOPNOTSUPP || ret == -EINVAL || ret == -EXDEV ||
ret == -EAGAIN)
ret = splice_copy_file_range(src_file, src_off, dst_file,
dst_off, len);
#else
/*
* Before Linux 5.3 the filesystem has to return -EOPNOTSUPP to signal
@@ -118,7 +130,7 @@ zpl_copy_file_range(struct file *src_file, loff_t src_off,
*/
if (ret == -EINVAL || ret == -EXDEV || ret == -EAGAIN)
ret = -EOPNOTSUPP;
#endif /* HAVE_VFS_GENERIC_COPY_FILE_RANGE */
#endif /* HAVE_VFS_GENERIC_COPY_FILE_RANGE || HAVE_VFS_SPLICE_COPY_FILE_RANGE */
return (ret);
}
+128 -12
View File
@@ -37,6 +37,7 @@
#include <sys/spa_impl.h>
#include <sys/zvol.h>
#include <sys/zvol_impl.h>
#include <cityhash.h>
#include <linux/blkdev_compat.h>
#include <linux/task_io_accounting_ops.h>
@@ -53,6 +54,12 @@ static unsigned int zvol_request_sync = 0;
static unsigned int zvol_prefetch_bytes = (128 * 1024);
static unsigned long zvol_max_discard_blocks = 16384;
/*
* Switch taskq at multiple of 512 MB offset. This can be set to a lower value
* to utilize more threads for small files but may affect prefetch hits.
*/
#define ZVOL_TASKQ_OFFSET_SHIFT 29
#ifndef HAVE_BLKDEV_GET_ERESTARTSYS
static unsigned int zvol_open_timeout_ms = 1000;
#endif
@@ -76,6 +83,8 @@ static boolean_t zvol_use_blk_mq = B_FALSE;
static unsigned int zvol_blk_mq_blocks_per_thread = 8;
#endif
static unsigned int zvol_num_taskqs = 0;
#ifndef BLKDEV_DEFAULT_RQ
/* BLKDEV_MAX_RQ was renamed to BLKDEV_DEFAULT_RQ in the 5.16 kernel */
#define BLKDEV_DEFAULT_RQ BLKDEV_MAX_RQ
@@ -114,7 +123,11 @@ struct zvol_state_os {
boolean_t use_blk_mq;
};
static taskq_t *zvol_taskq;
typedef struct zv_taskq {
uint_t tqs_cnt;
taskq_t **tqs_taskq;
} zv_taskq_t;
static zv_taskq_t zvol_taskqs;
static struct ida zvol_ida;
typedef struct zv_request_stack {
@@ -532,6 +545,22 @@ zvol_request_impl(zvol_state_t *zv, struct bio *bio, struct request *rq,
}
zv_request_task_t *task;
zv_taskq_t *ztqs = &zvol_taskqs;
uint_t blk_mq_hw_queue = 0;
uint_t tq_idx;
uint_t taskq_hash;
#ifdef HAVE_BLK_MQ
if (rq)
#ifdef HAVE_BLK_MQ_RQ_HCTX
blk_mq_hw_queue = rq->mq_hctx->queue_num;
#else
blk_mq_hw_queue =
rq->q->queue_hw_ctx[rq->q->mq_map[rq->cpu]]->queue_num;
#endif
#endif
taskq_hash = cityhash4((uintptr_t)zv, offset >> ZVOL_TASKQ_OFFSET_SHIFT,
blk_mq_hw_queue, 0);
tq_idx = taskq_hash % ztqs->tqs_cnt;
if (rw == WRITE) {
if (unlikely(zv->zv_flags & ZVOL_RDONLY)) {
@@ -601,7 +630,7 @@ zvol_request_impl(zvol_state_t *zv, struct bio *bio, struct request *rq,
zvol_discard(&zvr);
} else {
task = zv_request_task_create(zvr);
taskq_dispatch_ent(zvol_taskq,
taskq_dispatch_ent(ztqs->tqs_taskq[tq_idx],
zvol_discard_task, task, 0, &task->ent);
}
} else {
@@ -609,7 +638,7 @@ zvol_request_impl(zvol_state_t *zv, struct bio *bio, struct request *rq,
zvol_write(&zvr);
} else {
task = zv_request_task_create(zvr);
taskq_dispatch_ent(zvol_taskq,
taskq_dispatch_ent(ztqs->tqs_taskq[tq_idx],
zvol_write_task, task, 0, &task->ent);
}
}
@@ -631,7 +660,7 @@ zvol_request_impl(zvol_state_t *zv, struct bio *bio, struct request *rq,
zvol_read(&zvr);
} else {
task = zv_request_task_create(zvr);
taskq_dispatch_ent(zvol_taskq,
taskq_dispatch_ent(ztqs->tqs_taskq[tq_idx],
zvol_read_task, task, 0, &task->ent);
}
}
@@ -1053,6 +1082,16 @@ zvol_alloc_non_blk_mq(struct zvol_state_os *zso)
if (zso->zvo_disk == NULL)
return (1);
zso->zvo_disk->minors = ZVOL_MINORS;
zso->zvo_queue = zso->zvo_disk->queue;
#elif defined(HAVE_BLK_ALLOC_DISK_2ARG)
struct gendisk *disk = blk_alloc_disk(NULL, NUMA_NO_NODE);
if (IS_ERR(disk)) {
zso->zvo_disk = NULL;
return (1);
}
zso->zvo_disk = disk;
zso->zvo_disk->minors = ZVOL_MINORS;
zso->zvo_queue = zso->zvo_disk->queue;
#else
@@ -1103,6 +1142,17 @@ zvol_alloc_blk_mq(zvol_state_t *zv)
}
zso->zvo_queue = zso->zvo_disk->queue;
zso->zvo_disk->minors = ZVOL_MINORS;
#elif defined(HAVE_BLK_ALLOC_DISK_2ARG)
struct gendisk *disk = blk_mq_alloc_disk(&zso->tag_set, NULL, zv);
if (IS_ERR(disk)) {
zso->zvo_disk = NULL;
blk_mq_free_tag_set(&zso->tag_set);
return (1);
}
zso->zvo_disk = disk;
zso->zvo_queue = zso->zvo_disk->queue;
zso->zvo_disk->minors = ZVOL_MINORS;
#else
zso->zvo_disk = alloc_disk(ZVOL_MINORS);
if (zso->zvo_disk == NULL) {
@@ -1256,7 +1306,7 @@ zvol_os_free(zvol_state_t *zv)
del_gendisk(zv->zv_zso->zvo_disk);
#if defined(HAVE_SUBMIT_BIO_IN_BLOCK_DEVICE_OPERATIONS) && \
defined(HAVE_BLK_ALLOC_DISK)
(defined(HAVE_BLK_ALLOC_DISK) || defined(HAVE_BLK_ALLOC_DISK_2ARG))
#if defined(HAVE_BLK_CLEANUP_DISK)
blk_cleanup_disk(zv->zv_zso->zvo_disk);
#else
@@ -1313,6 +1363,13 @@ zvol_os_create_minor(const char *name)
if (idx < 0)
return (SET_ERROR(-idx));
minor = idx << ZVOL_MINOR_BITS;
if (MINOR(minor) != minor) {
/* too many partitions can cause an overflow */
zfs_dbgmsg("zvol: create minor overflow: %s, minor %u/%u",
name, minor, MINOR(minor));
ida_simple_remove(&zvol_ida, idx);
return (SET_ERROR(EINVAL));
}
zv = zvol_find_by_name_hash(name, hash, RW_NONE);
if (zv) {
@@ -1507,7 +1564,7 @@ zvol_os_rename_minor(zvol_state_t *zv, const char *newname)
strlcpy(zv->zv_name, newname, sizeof (zv->zv_name));
/* move to new hashtable entry */
zv->zv_hash = zvol_name_hash(zv->zv_name);
zv->zv_hash = zvol_name_hash(newname);
hlist_del(&zv->zv_hlink);
hlist_add_head(&zv->zv_hlink, ZVOL_HT_HEAD(zv->zv_hash));
@@ -1563,8 +1620,40 @@ zvol_init(void)
zvol_actual_threads = MIN(MAX(zvol_threads, 1), 1024);
}
/*
* Use atleast 32 zvol_threads but for many core system,
* prefer 6 threads per taskq, but no more taskqs
* than threads in them on large systems.
*
* taskq total
* cpus taskqs threads threads
* ------- ------- ------- -------
* 1 1 32 32
* 2 1 32 32
* 4 1 32 32
* 8 2 16 32
* 16 3 11 33
* 32 5 7 35
* 64 8 8 64
* 128 11 12 132
* 256 16 16 256
*/
zv_taskq_t *ztqs = &zvol_taskqs;
uint_t num_tqs = MIN(num_online_cpus(), zvol_num_taskqs);
if (num_tqs == 0) {
num_tqs = 1 + num_online_cpus() / 6;
while (num_tqs * num_tqs > zvol_actual_threads)
num_tqs--;
}
uint_t per_tq_thread = zvol_actual_threads / num_tqs;
if (per_tq_thread * num_tqs < zvol_actual_threads)
per_tq_thread++;
ztqs->tqs_cnt = num_tqs;
ztqs->tqs_taskq = kmem_alloc(num_tqs * sizeof (taskq_t *), KM_SLEEP);
error = register_blkdev(zvol_major, ZVOL_DRIVER);
if (error) {
kmem_free(ztqs->tqs_taskq, ztqs->tqs_cnt * sizeof (taskq_t *));
ztqs->tqs_taskq = NULL;
printk(KERN_INFO "ZFS: register_blkdev() failed %d\n", error);
return (error);
}
@@ -1584,11 +1673,22 @@ zvol_init(void)
1024);
}
#endif
zvol_taskq = taskq_create(ZVOL_DRIVER, zvol_actual_threads, maxclsyspri,
zvol_actual_threads, INT_MAX, TASKQ_PREPOPULATE | TASKQ_DYNAMIC);
if (zvol_taskq == NULL) {
unregister_blkdev(zvol_major, ZVOL_DRIVER);
return (-ENOMEM);
for (uint_t i = 0; i < num_tqs; i++) {
char name[32];
(void) snprintf(name, sizeof (name), "%s_tq-%u",
ZVOL_DRIVER, i);
ztqs->tqs_taskq[i] = taskq_create(name, per_tq_thread,
maxclsyspri, per_tq_thread, INT_MAX,
TASKQ_PREPOPULATE | TASKQ_DYNAMIC);
if (ztqs->tqs_taskq[i] == NULL) {
for (int j = i - 1; j >= 0; j--)
taskq_destroy(ztqs->tqs_taskq[j]);
unregister_blkdev(zvol_major, ZVOL_DRIVER);
kmem_free(ztqs->tqs_taskq, ztqs->tqs_cnt *
sizeof (taskq_t *));
ztqs->tqs_taskq = NULL;
return (-ENOMEM);
}
}
zvol_init_impl();
@@ -1599,9 +1699,22 @@ zvol_init(void)
void
zvol_fini(void)
{
zv_taskq_t *ztqs = &zvol_taskqs;
zvol_fini_impl();
unregister_blkdev(zvol_major, ZVOL_DRIVER);
taskq_destroy(zvol_taskq);
if (ztqs->tqs_taskq == NULL) {
ASSERT3U(ztqs->tqs_cnt, ==, 0);
} else {
for (uint_t i = 0; i < ztqs->tqs_cnt; i++) {
ASSERT3P(ztqs->tqs_taskq[i], !=, NULL);
taskq_destroy(ztqs->tqs_taskq[i]);
}
kmem_free(ztqs->tqs_taskq, ztqs->tqs_cnt *
sizeof (taskq_t *));
ztqs->tqs_taskq = NULL;
}
ida_destroy(&zvol_ida);
}
@@ -1622,6 +1735,9 @@ MODULE_PARM_DESC(zvol_request_sync, "Synchronously handle bio requests");
module_param(zvol_max_discard_blocks, ulong, 0444);
MODULE_PARM_DESC(zvol_max_discard_blocks, "Max number of blocks to discard");
module_param(zvol_num_taskqs, uint, 0444);
MODULE_PARM_DESC(zvol_num_taskqs, "Number of zvol taskqs");
module_param(zvol_prefetch_bytes, uint, 0644);
MODULE_PARM_DESC(zvol_prefetch_bytes, "Prefetch N bytes at zvol start+end");
+11
View File
@@ -345,6 +345,13 @@ zfs_prop_init(void)
{ NULL }
};
static const zprop_index_t prefetch_table[] = {
{ "none", ZFS_PREFETCH_NONE },
{ "metadata", ZFS_PREFETCH_METADATA },
{ "all", ZFS_PREFETCH_ALL },
{ NULL }
};
static const zprop_index_t sync_table[] = {
{ "standard", ZFS_SYNC_STANDARD },
{ "always", ZFS_SYNC_ALWAYS },
@@ -453,6 +460,10 @@ zfs_prop_init(void)
ZFS_CACHE_ALL, PROP_INHERIT,
ZFS_TYPE_FILESYSTEM | ZFS_TYPE_SNAPSHOT | ZFS_TYPE_VOLUME,
"all | none | metadata", "SECONDARYCACHE", cache_table, sfeatures);
zprop_register_index(ZFS_PROP_PREFETCH, "prefetch",
ZFS_PREFETCH_ALL, PROP_INHERIT,
ZFS_TYPE_FILESYSTEM | ZFS_TYPE_SNAPSHOT | ZFS_TYPE_VOLUME,
"none | metadata | all", "PREFETCH", prefetch_table, sfeatures);
zprop_register_index(ZFS_PROP_LOGBIAS, "logbias", ZFS_LOGBIAS_LATENCY,
PROP_INHERIT, ZFS_TYPE_FILESYSTEM | ZFS_TYPE_VOLUME,
"latency | throughput", "LOGBIAS", logbias_table, sfeatures);
+6
View File
@@ -431,6 +431,12 @@ vdev_prop_init(void)
zprop_register_number(VDEV_PROP_IO_T, "io_t", UINT64_MAX,
PROP_DEFAULT, ZFS_TYPE_VDEV, "<seconds>", "IO_T", B_FALSE,
sfeatures);
zprop_register_number(VDEV_PROP_SLOW_IO_N, "slow_io_n", UINT64_MAX,
PROP_DEFAULT, ZFS_TYPE_VDEV, "<events>", "SLOW_IO_N", B_FALSE,
sfeatures);
zprop_register_number(VDEV_PROP_SLOW_IO_T, "slow_io_t", UINT64_MAX,
PROP_DEFAULT, ZFS_TYPE_VDEV, "<seconds>", "SLOW_IO_T", B_FALSE,
sfeatures);
/* default index (boolean) properties */
zprop_register_index(VDEV_PROP_REMOVING, "removing", 0,
+42
View File
@@ -826,6 +826,48 @@ abd_iterate_func(abd_t *abd, size_t off, size_t size,
return (ret);
}
#if defined(__linux__) && defined(_KERNEL)
int
abd_iterate_page_func(abd_t *abd, size_t off, size_t size,
abd_iter_page_func_t *func, void *private)
{
struct abd_iter aiter;
int ret = 0;
if (size == 0)
return (0);
abd_verify(abd);
ASSERT3U(off + size, <=, abd->abd_size);
abd_t *c_abd = abd_init_abd_iter(abd, &aiter, off);
while (size > 0) {
IMPLY(abd_is_gang(abd), c_abd != NULL);
abd_iter_page(&aiter);
size_t len = MIN(aiter.iter_page_dsize, size);
ASSERT3U(len, >, 0);
ret = func(aiter.iter_page, aiter.iter_page_doff,
len, private);
aiter.iter_page = NULL;
aiter.iter_page_doff = 0;
aiter.iter_page_dsize = 0;
if (ret != 0)
break;
size -= len;
c_abd = abd_advance_abd_iter(abd, c_abd, &aiter, len);
}
return (ret);
}
#endif
struct buf_arg {
void *arg_buf;
};
+98 -91
View File
@@ -3872,7 +3872,7 @@ arc_evict_state_impl(multilist_t *ml, int idx, arc_buf_hdr_t *marker,
ASSERT3P(marker, !=, NULL);
mls = multilist_sublist_lock(ml, idx);
mls = multilist_sublist_lock_idx(ml, idx);
for (hdr = multilist_sublist_prev(mls, marker); likely(hdr != NULL);
hdr = multilist_sublist_prev(mls, marker)) {
@@ -3984,6 +3984,26 @@ arc_evict_state_impl(multilist_t *ml, int idx, arc_buf_hdr_t *marker,
return (bytes_evicted);
}
static arc_buf_hdr_t *
arc_state_alloc_marker(void)
{
arc_buf_hdr_t *marker = kmem_cache_alloc(hdr_full_cache, KM_SLEEP);
/*
* A b_spa of 0 is used to indicate that this header is
* a marker. This fact is used in arc_evict_state_impl().
*/
marker->b_spa = 0;
return (marker);
}
static void
arc_state_free_marker(arc_buf_hdr_t *marker)
{
kmem_cache_free(hdr_full_cache, marker);
}
/*
* Allocate an array of buffer headers used as placeholders during arc state
* eviction.
@@ -3994,16 +4014,8 @@ arc_state_alloc_markers(int count)
arc_buf_hdr_t **markers;
markers = kmem_zalloc(sizeof (*markers) * count, KM_SLEEP);
for (int i = 0; i < count; i++) {
markers[i] = kmem_cache_alloc(hdr_full_cache, KM_SLEEP);
/*
* A b_spa of 0 is used to indicate that this header is
* a marker. This fact is used in arc_evict_state_impl().
*/
markers[i]->b_spa = 0;
}
for (int i = 0; i < count; i++)
markers[i] = arc_state_alloc_marker();
return (markers);
}
@@ -4011,7 +4023,7 @@ static void
arc_state_free_markers(arc_buf_hdr_t **markers, int count)
{
for (int i = 0; i < count; i++)
kmem_cache_free(hdr_full_cache, markers[i]);
arc_state_free_marker(markers[i]);
kmem_free(markers, sizeof (*markers) * count);
}
@@ -4055,7 +4067,7 @@ arc_evict_state(arc_state_t *state, arc_buf_contents_t type, uint64_t spa,
for (int i = 0; i < num_sublists; i++) {
multilist_sublist_t *mls;
mls = multilist_sublist_lock(ml, i);
mls = multilist_sublist_lock_idx(ml, i);
multilist_sublist_insert_tail(mls, markers[i]);
multilist_sublist_unlock(mls);
}
@@ -4120,7 +4132,7 @@ arc_evict_state(arc_state_t *state, arc_buf_contents_t type, uint64_t spa,
}
for (int i = 0; i < num_sublists; i++) {
multilist_sublist_t *mls = multilist_sublist_lock(ml, i);
multilist_sublist_t *mls = multilist_sublist_lock_idx(ml, i);
multilist_sublist_remove(mls, markers[i]);
multilist_sublist_unlock(mls);
}
@@ -8628,7 +8640,7 @@ l2arc_sublist_lock(int list_num)
* sublists being selected.
*/
idx = multilist_get_random_index(ml);
return (multilist_sublist_lock(ml, idx));
return (multilist_sublist_lock_idx(ml, idx));
}
/*
@@ -9040,9 +9052,9 @@ l2arc_blk_fetch_done(zio_t *zio)
static uint64_t
l2arc_write_buffers(spa_t *spa, l2arc_dev_t *dev, uint64_t target_sz)
{
arc_buf_hdr_t *hdr, *hdr_prev, *head;
uint64_t write_asize, write_psize, write_lsize, headroom;
boolean_t full;
arc_buf_hdr_t *hdr, *head, *marker;
uint64_t write_asize, write_psize, headroom;
boolean_t full, from_head = !arc_warm;
l2arc_write_callback_t *cb = NULL;
zio_t *pio, *wzio;
uint64_t guid = spa_load_guid(spa);
@@ -9051,10 +9063,11 @@ l2arc_write_buffers(spa_t *spa, l2arc_dev_t *dev, uint64_t target_sz)
ASSERT3P(dev->l2ad_vdev, !=, NULL);
pio = NULL;
write_lsize = write_asize = write_psize = 0;
write_asize = write_psize = 0;
full = B_FALSE;
head = kmem_cache_alloc(hdr_l2only_cache, KM_PUSHPAGE);
arc_hdr_set_flags(head, ARC_FLAG_L2_WRITE_HEAD | ARC_FLAG_HAS_L2HDR);
marker = arc_state_alloc_marker();
/*
* Copy buffers for L2ARC writing.
@@ -9069,40 +9082,34 @@ l2arc_write_buffers(spa_t *spa, l2arc_dev_t *dev, uint64_t target_sz)
continue;
}
multilist_sublist_t *mls = l2arc_sublist_lock(pass);
uint64_t passed_sz = 0;
VERIFY3P(mls, !=, NULL);
/*
* L2ARC fast warmup.
*
* Until the ARC is warm and starts to evict, read from the
* head of the ARC lists rather than the tail.
*/
if (arc_warm == B_FALSE)
hdr = multilist_sublist_head(mls);
else
hdr = multilist_sublist_tail(mls);
headroom = target_sz * l2arc_headroom;
if (zfs_compressed_arc_enabled)
headroom = (headroom * l2arc_headroom_boost) / 100;
for (; hdr; hdr = hdr_prev) {
/*
* Until the ARC is warm and starts to evict, read from the
* head of the ARC lists rather than the tail.
*/
multilist_sublist_t *mls = l2arc_sublist_lock(pass);
ASSERT3P(mls, !=, NULL);
if (from_head)
hdr = multilist_sublist_head(mls);
else
hdr = multilist_sublist_tail(mls);
while (hdr != NULL) {
kmutex_t *hash_lock;
abd_t *to_write = NULL;
if (arc_warm == B_FALSE)
hdr_prev = multilist_sublist_next(mls, hdr);
else
hdr_prev = multilist_sublist_prev(mls, hdr);
hash_lock = HDR_LOCK(hdr);
if (!mutex_tryenter(hash_lock)) {
/*
* Skip this buffer rather than waiting.
*/
skip:
/* Skip this buffer rather than waiting. */
if (from_head)
hdr = multilist_sublist_next(mls, hdr);
else
hdr = multilist_sublist_prev(mls, hdr);
continue;
}
@@ -9117,11 +9124,10 @@ l2arc_write_buffers(spa_t *spa, l2arc_dev_t *dev, uint64_t target_sz)
if (!l2arc_write_eligible(guid, hdr)) {
mutex_exit(hash_lock);
continue;
goto skip;
}
ASSERT(HDR_HAS_L1HDR(hdr));
ASSERT3U(HDR_GET_PSIZE(hdr), >, 0);
ASSERT3U(arc_hdr_size(hdr), >, 0);
ASSERT(hdr->b_l1hdr.b_pabd != NULL ||
@@ -9143,12 +9149,18 @@ l2arc_write_buffers(spa_t *spa, l2arc_dev_t *dev, uint64_t target_sz)
}
/*
* We rely on the L1 portion of the header below, so
* it's invalid for this header to have been evicted out
* of the ghost cache, prior to being written out. The
* ARC_FLAG_L2_WRITING bit ensures this won't happen.
* We should not sleep with sublist lock held or it
* may block ARC eviction. Insert a marker to save
* the position and drop the lock.
*/
arc_hdr_set_flags(hdr, ARC_FLAG_L2_WRITING);
if (from_head) {
multilist_sublist_insert_after(mls, hdr,
marker);
} else {
multilist_sublist_insert_before(mls, hdr,
marker);
}
multilist_sublist_unlock(mls);
/*
* If this header has b_rabd, we can use this since it
@@ -9179,32 +9191,45 @@ l2arc_write_buffers(spa_t *spa, l2arc_dev_t *dev, uint64_t target_sz)
&to_write);
if (ret != 0) {
arc_hdr_clear_flags(hdr,
ARC_FLAG_L2_WRITING);
ARC_FLAG_L2CACHE);
mutex_exit(hash_lock);
continue;
goto next;
}
l2arc_free_abd_on_write(to_write, asize, type);
}
hdr->b_l2hdr.b_dev = dev;
hdr->b_l2hdr.b_daddr = dev->l2ad_hand;
hdr->b_l2hdr.b_hits = 0;
hdr->b_l2hdr.b_arcs_state =
hdr->b_l1hdr.b_state->arcs_state;
mutex_enter(&dev->l2ad_mtx);
if (pio == NULL) {
/*
* Insert a dummy header on the buflist so
* l2arc_write_done() can find where the
* write buffers begin without searching.
*/
mutex_enter(&dev->l2ad_mtx);
list_insert_head(&dev->l2ad_buflist, head);
mutex_exit(&dev->l2ad_mtx);
}
list_insert_head(&dev->l2ad_buflist, hdr);
mutex_exit(&dev->l2ad_mtx);
arc_hdr_set_flags(hdr, ARC_FLAG_HAS_L2HDR |
ARC_FLAG_L2_WRITING);
(void) zfs_refcount_add_many(&dev->l2ad_alloc,
arc_hdr_size(hdr), hdr);
l2arc_hdr_arcstats_increment(hdr);
boolean_t commit = l2arc_log_blk_insert(dev, hdr);
mutex_exit(hash_lock);
if (pio == NULL) {
cb = kmem_alloc(
sizeof (l2arc_write_callback_t), KM_SLEEP);
cb->l2wcb_dev = dev;
cb->l2wcb_head = head;
/*
* Create a list to save allocated abd buffers
* for l2arc_log_blk_commit().
*/
list_create(&cb->l2wcb_abd_list,
sizeof (l2arc_lb_abd_buf_t),
offsetof(l2arc_lb_abd_buf_t, node));
@@ -9212,54 +9237,34 @@ l2arc_write_buffers(spa_t *spa, l2arc_dev_t *dev, uint64_t target_sz)
ZIO_FLAG_CANFAIL);
}
hdr->b_l2hdr.b_dev = dev;
hdr->b_l2hdr.b_hits = 0;
hdr->b_l2hdr.b_daddr = dev->l2ad_hand;
hdr->b_l2hdr.b_arcs_state =
hdr->b_l1hdr.b_state->arcs_state;
arc_hdr_set_flags(hdr, ARC_FLAG_HAS_L2HDR);
mutex_enter(&dev->l2ad_mtx);
list_insert_head(&dev->l2ad_buflist, hdr);
mutex_exit(&dev->l2ad_mtx);
(void) zfs_refcount_add_many(&dev->l2ad_alloc,
arc_hdr_size(hdr), hdr);
wzio = zio_write_phys(pio, dev->l2ad_vdev,
hdr->b_l2hdr.b_daddr, asize, to_write,
dev->l2ad_hand, asize, to_write,
ZIO_CHECKSUM_OFF, NULL, hdr,
ZIO_PRIORITY_ASYNC_WRITE,
ZIO_FLAG_CANFAIL, B_FALSE);
write_lsize += HDR_GET_LSIZE(hdr);
DTRACE_PROBE2(l2arc__write, vdev_t *, dev->l2ad_vdev,
zio_t *, wzio);
zio_nowait(wzio);
write_psize += psize;
write_asize += asize;
dev->l2ad_hand += asize;
l2arc_hdr_arcstats_increment(hdr);
vdev_space_update(dev->l2ad_vdev, asize, 0, 0);
mutex_exit(hash_lock);
/*
* Append buf info to current log and commit if full.
* arcstat_l2_{size,asize} kstats are updated
* internally.
*/
if (l2arc_log_blk_insert(dev, hdr)) {
/*
* l2ad_hand will be adjusted in
* l2arc_log_blk_commit().
*/
if (commit) {
/* l2ad_hand will be adjusted inside. */
write_asize +=
l2arc_log_blk_commit(dev, pio, cb);
}
zio_nowait(wzio);
next:
multilist_sublist_lock(mls);
if (from_head)
hdr = multilist_sublist_next(mls, marker);
else
hdr = multilist_sublist_prev(mls, marker);
multilist_sublist_remove(mls, marker);
}
multilist_sublist_unlock(mls);
@@ -9268,9 +9273,11 @@ l2arc_write_buffers(spa_t *spa, l2arc_dev_t *dev, uint64_t target_sz)
break;
}
arc_state_free_marker(marker);
/* No buffers selected for writing? */
if (pio == NULL) {
ASSERT0(write_lsize);
ASSERT0(write_psize);
ASSERT(!HDR_HAS_L1HDR(head));
kmem_cache_free(hdr_l2only_cache, head);
@@ -10598,7 +10605,7 @@ l2arc_log_blk_insert(l2arc_dev_t *dev, const arc_buf_hdr_t *hdr)
L2BLK_SET_TYPE((le)->le_prop, hdr->b_type);
L2BLK_SET_PROTECTED((le)->le_prop, !!(HDR_PROTECTED(hdr)));
L2BLK_SET_PREFETCH((le)->le_prop, !!(HDR_PREFETCH(hdr)));
L2BLK_SET_STATE((le)->le_prop, hdr->b_l1hdr.b_state->arcs_state);
L2BLK_SET_STATE((le)->le_prop, hdr->b_l2hdr.b_arcs_state);
dev->l2ad_log_blk_payload_asize += vdev_psize_to_asize(dev->l2ad_vdev,
HDR_GET_PSIZE(hdr));

Some files were not shown because too many files have changed in this diff Show More