Compare commits

59 Commits

Author SHA1 Message Date
Tony Hutter c840612ee1 Tag zfs-2.3.6
META file and changelog updated.

Signed-off-by: Tony Hutter <hutter2@llnl.gov>
2026-02-19 14:58:21 -08:00
Tony Hutter 65579f4cba CI: Test & fix Linux ZFS built-in build
ZFS can be built directly into the Linux kernel.  Add a test build
of this to the CI to verify it works.  The test build is only enabled
on Fedora runners (since they run the newest kernels) and is done in
parallel with ZTS.  The test build is done on vm2, since it typically
finishes ~15min before vm1 and thus has time to spare.

In addition:

- Update 'copy-builtin' to check that $1 is a directory
- Fix some VERIFYs that were causing the built-in build to fail

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #18234
2026-02-19 14:58:21 -08:00
Alexx Saver 2032f21857 chksum: run 256K benchmark on demand, preserve chksum_stat_data
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexx Saver <lzsaver.eth@ethermail.io>
Co-authored-by: Adam Moss <c@yotes.com>
Closes #17945
Closes #17946
2026-02-17 10:18:14 -08:00
Tony Hutter dc58baf9d1 Linux 6.19 compat: META
Update the META file to reflect compatibility with the 6.19
kernel.

Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #18197
2026-02-11 16:18:01 -08:00
Brooks Davis 06a88f9d13 nvpair: chase FreeBSD xdrproc_t definition
As of FreeBSD 16, xdrproc_t will take exactly two arguments in both
kernel and userspace in line with the Linux kernel.

Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Alan Somers <asomers@freebsd.org>
Signed-off-by: Brooks Davis <brooks@capabilitieslimited.co.uk>
Closes #18154
2026-02-11 16:18:01 -08:00
Alek P 88ce22ed95 remove thread unsafe debug code causing FreeBSD double free panic
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alan Somers <asomers@gmail.com>
Signed-off-by: Alek Pinchuk <apinchuk@axcient.com>
Closes #18140
2026-02-11 16:18:01 -08:00
Mark Johnston 366dad1cac FreeBSD: Remove references to DEBUG_VFS_LOCKS
This option is removed upstream in favour of plain INVARIANTS.

VNASSERT is always defined so I see no reason to use it conditionally.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Closes #18136
2026-02-11 16:18:01 -08:00
Alexander Motin 1dc5088e6a FreeBSD: Remove HAVE_INLINE_FLSL use
These macros have been deprecated in the FreeBSD kernel for several
years, and unneeded for much longer.  Instead, as on Linux, let the
kernel and the compiler do the right thing.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #18004
2026-02-11 16:18:01 -08:00
Rob Norris 135fffbc3e Linux 6.19: replace i_state access with inode_state_read_once()
Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #18053
2026-02-11 16:18:01 -08:00
Rob Norris 18065e9296 Linux 6.18: generic_drop_inode() and generic_delete_inode() renamed
Sponsored-by: https://despairlabs.com/sponsor/
Signed-off-by: Rob Norris <robn@despairlabs.com>
2026-02-11 16:18:01 -08:00
Rob Norris 00ee7f9430 linux/super: add tunable to request immediate reclaim of unused dentries
Traditionally, unused dentries would be cached in the dentry cache until
the associated entry is no longer on disk. The cached dentry continues
to hold an inode reference, causing the inode to be pinned (see previous
commit).

Here we implement the dentry op d_delete, which is roughly analogous to
the drop_inode superblock op, and add a zfs_delete_dentry tunable to
control its behaviour. By default it continues the traditional
behaviour, but when the tunable is enabled, we signal that an unused
dentry should be freed immediately, releasing its inode reference, and
so allowing that inode to be deleted if no longer in use.

Sponsored-by: Klara, Inc.
Sponsored-by: Fastmail Pty Ltd
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17746
2026-02-11 16:18:01 -08:00
Rob Norris 3662c7f33c linux/super: add tunable to request immediate reclaim of unused inodes
Traditionally, unused inodes would be held on the superblock inode cache
until the associated on-disk file is removed or the kernel requests
reclaim.  On filesystems with millions of rarely-used files, this can be
a lot of unusable memory.

Here we implement the superblock drop_inode method, and add a
zfs_delete_inode tunable to control its behaviour. By default it
continues the traditional behaviour, but when the tunable is enabled, we
signal that the inode should be deleted immediately when the last
reference is dropped, rather than cached. This releases the associated
data to the dbuf cache and ARC, allowing them to be reclaimed normally.
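
As a rough userspace illustration of the behaviour described above (not the actual patch), a drop_inode-style callback can be modelled as a function whose nonzero return asks for immediate eviction.  The names sim_zfs_delete_inode and sim_drop_inode are hypothetical, and the "traditional" rule shown is an assumption based on how generic drop_inode helpers usually behave:

```c
#include <stdbool.h>

/* Hypothetical stand-in for the zfs_delete_inode tunable (0 = default). */
int sim_zfs_delete_inode = 0;

/*
 * Models a drop_inode-style callback: a nonzero return asks the VFS to
 * evict the inode now, zero asks it to keep the inode cached.
 */
int
sim_drop_inode(unsigned int nlink, bool unhashed)
{
	/* Tunable enabled: always request immediate deletion. */
	if (sim_zfs_delete_inode)
		return (1);

	/* Assumed traditional rule: drop only unlinked or unhashed inodes. */
	return (nlink == 0 || unhashed);
}
```

With the tunable left at 0, a still-linked inode stays cached; setting it to 1 makes the same call return 1.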

Sponsored-by: Klara, Inc.
Sponsored-by: Fastmail Pty Ltd
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #17746
2026-02-11 16:18:01 -08:00
Rob Norris 29bda86d7b config: restore ZFS_AC_KERNEL_DENTRY tests
Accidentally removed calls in ed048fdc5b.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #17621
2026-02-11 16:18:01 -08:00
Alex 6788dcd47c Fix a declaration position of the nth_page.
Compile-time bug introduced by commit 87df5e4.
Fix for the compilation error (Linux kernel 6.18.0):
"zfs/module/os/linux/zfs/abd_os.c:920:32: error: implicit declaration
of function ‘nth_page’; did you mean ‘pte_page’?
[-Werror=implicit-function-declaration]".

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: agiUnderground <alex.dev.cv@gmail.com>
Closes #18034
2026-02-11 13:33:19 -08:00
Erik Larsson beb25b936b Fix build for Linux 6.18 with PowerPC/RISC-V kernels. (#18145)
The macro 'flush_dcache_page(...)' modifies the page flags, but in Linux
6.18 the type of the page flags changed from 'unsigned long' to the
struct type 'memdesc_flags_t' with a single member 'f' which is the page
flags field.

Signed-off-by: Erik Larsson <catacombae@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2026-02-11 13:33:19 -08:00
John Cabaj d857aea6d4 Linux 6.19: handle --werror with CONFIG_OBJTOOL_WERROR=y
Linux upstream commit 56754f0f46f6: "objtool: Rename
--Werror to --werror" did just that, so we should check for
either "--Werror" or "--werror", else the build will fail

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Attila Fülöp <attila@fueloep.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: John Cabaj <john.cabaj@canonical.com>
Closes #18152
2026-02-11 13:33:19 -08:00
Brian Behlendorf f7ab47908b ZTS: update the relevant mmp test cases
- mmp_concurrent_import: added test case to verify concurrent
  import correctness.  The pool may only be imported once.

- mmp_exported_import: an activity check is now required for pools
  which were cleanly exported if the system and pool hostids don't
  match.

- mmp_inactive_import: an activity check is now required for any
  pool which wasn't cleanly exported, even if the system and pool
  hostids match.

- mmp_on_uberblocks: updated expected uberblocks to take into account
  the value MMP_INTERVAL_DEFAULT is set to.

- mmp_reset_interval: reduce the number of iterations from 10 to 3.
  This is sufficient to verify functionality and significantly speeds
  up the test.

- mmp_on_uberblocks: adjust the thresholds and increase the runtime
  to avoid false positives observed in CI.

- Update tests to use 'zhack action idle' instead of ztest to improve
  the reliability of the tests.

- Add additional log_note messages to test cases which have multiple
  verification steps to make it clear which portion of a test failed
  when reviewing the logs.

- Replace default_setup/cleanup_noexit calls with 'zpool create' and
  'zpool destroy' calls to avoid additional unnecessary dataset
  creation work.

- Update activity/noactivity check helper functions to use the
  ZFS_LOAD_INFO_DEBUG information now available from 'zpool import'
  to determine if this activity check ran and why.  This is more
  reliable in the CI than measuring the runtime.

- Removed all mmp tests from the zts-report.py exceptions list.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
Reviewed-by: Akash B <akash-b@hpe.com>
2026-02-11 13:33:19 -08:00
Brian Behlendorf d56f3cb331 zhack: add "action idle" subcommand
In order to reliably test the multihost protection we need two (or more)
systems attempting to import the pool at the same time.  Historically, we've
used ztest running in userspace to simulate an active pool and attempted to
import the pool with the kernel modules.  This works but ztest is a bit
unwieldy for this and if it crashes for unrelated reasons it can result
in false positives.

All we really need is the pool imported in userspace so the MMP thread is
active and writing out uberblocks.  We can extend zhack which already knows
how to import the pool read/write and add an option to leave the pool open
and idle.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
Reviewed-by: Akash B <akash-b@hpe.com>
2026-02-11 13:33:19 -08:00
Brian Behlendorf d8594ba2b8 zhack: add -G option to dump debug buffer
Add a -G option to zhack to dump the internal debug buffer on exit.
We were able to use the same code from zdb for this which was nice.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
Reviewed-by: Akash B <akash-b@hpe.com>
2026-02-11 13:33:19 -08:00
Brian Behlendorf a65bb7c518 mmp: claim sequence id before final import
As part of SPA_LOAD_IMPORT add an additional activity check to
detect simultaneous imports from different hosts.  This check is
only required when the timing is such that there's no activity
for the read-only tryimport check to detect.  This extra
safety check operates as follows:

1. Repeats the following MMP check 10 times:
  a. Write out an MMP uberblock with the best txg and a random
     sequence id to all primary pool vdevs.
  b. Verify a minimum number of good writes such that even if
     the pool appears degraded on the remote host it will see
     at least one of the updated MMP uberblocks.
  c. Wait for the MMP interval.  This leaves a window for other
     racing hosts to make similar modifications which can be
     detected.
  d. Call vdev_uberblock_load() to determine the best uberblock
     to use, this should be the MMP uberblock just written.
  e. Verify the txg and random sequence number match the MMP
     uberblock written in 1a.

2. Restore the original MMP uberblocks.  This allows the check
   to be performed again if the pool fails to import for an
   unrelated reason.
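
The steps above can be sketched as a small userspace simulation.  This is only an illustration under stated assumptions, not the code in spa.c: sim_disk_t, mmp_claim_check() and the race hook are hypothetical names, a single struct stands in for all vdev labels, step 1b is elided, and returning early without restoring the uberblock when a racing host is detected is an assumption.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for an MMP uberblock. */
typedef struct {
	uint64_t txg;
	uint64_t seq;
} mmp_ub_t;

/* Hypothetical "disk": what a vdev_uberblock_load() call would find. */
typedef struct {
	mmp_ub_t best;
} sim_disk_t;

/* Hook simulating a racing host writing during the wait in step 1c. */
typedef void (*race_fn_t)(sim_disk_t *disk, int iteration);

#define	CLAIM_ITERATIONS	10

/* Example racing host: overwrites the sequence id on the third pass. */
void
race_on_third_pass(sim_disk_t *disk, int iteration)
{
	if (iteration == 2)
		disk->best.seq = 0;	/* never equals rand() + 1 */
}

/* Returns true if the import may proceed, false if a race was seen. */
bool
mmp_claim_check(sim_disk_t *disk, race_fn_t race)
{
	mmp_ub_t orig = disk->best;

	for (int i = 0; i < CLAIM_ITERATIONS; i++) {
		/* 1a. Write the best txg with a random sequence id. */
		mmp_ub_t mine = { orig.txg, (uint64_t)rand() + 1 };
		disk->best = mine;

		/* 1b. (Elided) verify a minimum number of good writes. */

		/* 1c. Wait one MMP interval; a racing host may write here. */
		if (race != NULL)
			race(disk, i);

		/* 1d. Reload the best uberblock. */
		mmp_ub_t seen = disk->best;

		/* 1e. A mismatch means another host modified the pool. */
		if (seen.txg != mine.txg || seen.seq != mine.seq)
			return (false);
	}

	/* 2. Restore the original uberblock so the check can be rerun. */
	disk->best = orig;
	return (true);
}
```

Calling mmp_claim_check() with a NULL hook models an uncontended import; passing race_on_third_pass models a second host claiming the pool during the wait.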

This change also includes some refactoring and minor improvements.

- Never try loading earlier txgs during import when the import
  fails with EREMOTEIO or EINTR.  These errors don't indicate
  the txg is damaged but instead that it's either in use on a
  remote host or the import was interactively cancelled.  No
  rewind is also performed for EBADD which can result from a
  stale trusted config when doing a verbatim import.

- Refactor the code for consistent logging of the multihost
  activity check using spa_load_note() and console messages
  indicating when the activity check was triggered and the result.

- Added MMP_*_MASK and MMP_SEQ_CLEAR() macros to allow easier
  modification of the sequence number in an uberblock.

- Added ZFS_LOAD_INFO_DEBUG environment variable which can be
  set to dump to stdout the spa_load_info nvlist returned
  during import.  This is used by the updated mmp test cases
  to determine if an activity check was run and its result.

- Standardize the mmp messages similarly to make it easier to
  find all the relevant mmp lines in the debug log.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
Reviewed-by: Akash B <akash-b@hpe.com>
2026-02-11 13:33:19 -08:00
Brian Behlendorf 328a823848 mmp: add spa_load_name() for tryimport
Tryimport adds a unique prefix to the pool name to avoid name
collisions.  This makes it awkward to log user-friendly info
during a tryimport.  Add a spa_load_name() function which can
be used to report the unmodified pool name.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
Reviewed-by: Akash B <akash-b@hpe.com>
2026-02-11 13:33:19 -08:00
Brian Behlendorf 8bbd86693e mmp: move "Starting import" log message
Move the "Starting import" log message into the import block so
it's matched with the "Finished importing" debug message.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
Reviewed-by: Akash B <akash-b@hpe.com>
2026-02-11 13:33:19 -08:00
Brian Behlendorf 36c315571c mmp: further restrict mmp exported pool check
For cleanly exported pools there exists a small window where
both systems may determine it's safe to import the pool and skip
the activity check.  Only allow the check to be skipped when the
last imported hostid matches the system's hostid and the pool was
cleanly exported.
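
The rule can be written as a simple predicate.  This is a sketch covering only the two conditions named in this message; the function and parameter names are hypothetical, and the real check may consider other state as well.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: the activity check may be skipped only when the
 * pool was cleanly exported AND was last imported by this same host.
 */
bool
mmp_can_skip_activity_check(bool cleanly_exported, uint64_t pool_hostid,
    uint64_t system_hostid)
{
	return (cleanly_exported && pool_hostid == system_hostid);
}
```

Any other combination, such as a clean export last imported by a different hostid, falls through to the activity check.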

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
Reviewed-by: Akash B <akash-b@hpe.com>
2026-02-11 13:33:19 -08:00
Rob Norris 8010a8a3ca spa_activity_check: narrow scope of MMP vars
They aren't used outside these very small blocks, and their initial
values are never used at all.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #17551
2026-02-11 13:33:19 -08:00
Paul Dagnelie a1d839eddd Enable zhack to work properly with 4k sector size disks
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Closes #17576
2026-02-11 13:33:12 -08:00
Paul Dagnelie 411249498e Add allocation profile export and zhack subcommand for import
When attempting to debug performance problems on large systems, one of
the major factors that affect performance is free space
fragmentation. This heavily affects the allocation process, which is an
area of active development in ZFS. Unfortunately, fragmenting a large
pool for testing purposes is time consuming; it usually involves filling
the pool and then repeatedly overwriting data until the free space
becomes fragmented, which can take many hours. And even if the time is
available, artificial workloads rarely generate the same fragmentation
patterns as the natural workloads they're attempting to mimic.

This patch has two parts. First, in zdb, we add the ability to export
the full allocation map of the pool. It iterates over each vdev,
printing every allocated segment in the ms_allocatable range tree. This
can be done while the pool is online, though in that case the allocation
map may actually be from several different TXGs as new ones are loaded
on demand.

The second is a new subcommand for zhack, zhack metaslab leak (and its
supporting kernel changes). This is a zhack subcommand that imports a
pool and then modifies the range trees of the metaslabs, allowing the
sync process to write them out normally. It does not currently store
those allocations anywhere to make them reversible, and there is no
corresponding free subcommand (which would be extremely dangerous); this
is an irreversible process, only intended for performance testing. The
only way to reclaim the space afterwards is to destroy the pool or roll
back to a checkpoint.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <paul.dagnelie@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Closes #17576
2026-02-11 10:27:01 -08:00
Tony Hutter 9a5027ccce CI: Test build Lustre against ZFS
The Lustre filesystem calls a number of exported ZFS functions.  Do a
test build on the Almalinux runners to make sure we're not breaking
Lustre.  We do the Lustre build in parallel with the normal ZTS test
for efficiency, since ZTS isn't very CPU intensive. The full Lustre
build takes around 15min when run on its own.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #18161
2026-02-11 10:26:41 -08:00
Tony Hutter a50b8a727c CI: Fix qemu-1-setup failure, remove debug stuff
- For whatever reason, the runner will now start up with either two 75GB
  disks or one 150GB disk.  Previously the runner was always booting
  with two 75GB, but about a quarter of the time it now starts up
  with a single 150GB disk.  This caused qemu-1-setup.sh to fail
  since it expected the two 75GB disks.  This commit updates
  qemu-1-setup.sh to work with either disk config.

- Remove the watchdog from qemu-1-setup.sh.  It didn't turn out to be
  useful.

- Remove the timestamps that zfs-qemu.yml added to the qemu-1-setup.sh
  output.  The timestamps were redundant, since you can already
  download timestamped logs from the GitHub web interface.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #18166
2026-02-11 10:26:36 -08:00
Alexander Moch bf4b271af1 CI: Add Alpine Linux 3.23 runner to the pipeline (#18087)
Add an Alpine Linux 3.23 runner to the CI chain to run OpenZFS builds
and tests against musl libc.

Currently, zfs_send_sparse is killed after 10 minutes on Alpine, causing
cascading EBUSY failures in the test suite. With zfs_send_sparse
disabled, the ZFS test suite reaches a pass rate of 94.62%.

This commit introduces the required Alpine-specific setup and a small
set of shell and cloud-init compatibility fixes that also apply to
existing Linux runners.

The Alpine runner is not enabled by default and is not executed for new
pull requests.

Sponsored-by: ERNW Research GmbH - https://ernw-research.de/

Signed-off-by: Alexander Moch <amoch@ernw.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
2026-02-11 10:26:30 -08:00
Tony Hutter 38ed094954 ZTS: add mount_loopback to test zfs behind loop dev
Add a test case to reproduce issue #17277:

1. Make a pool
2. Write a file to the pool
3. Mount the file as a loopback device
4. Make an XFS filesystem on the loopback device
5. Mount the XFS filesystem... <hangs>

Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Issue #17277
Closes #17329
2026-02-11 10:26:24 -08:00
Tony Hutter 497b9291d1 CI: Test 2.4.x in qemu-test-repo-vm.sh, quick mode
The qemu-test-repo-vm.sh script tests installing ZFS from different
repos.  Have it test from the new 2.4.x repos as well.

Also add a checkbox to run in "lookup mode".  This just does a
quick lookup to see what version is installed in each repo.  It does
not do a test install and module load.  It only takes 3min to run vs
over an hour for the full version.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #18070
2026-02-11 10:25:56 -08:00
Brian Behlendorf 60b37bc647 CI: Add smatch static analysis workflow
Smatch is an actively maintained kernel-aware static analyzer
for C with a low false positive rate.  Since the code checker
can be run relatively quickly against the entire OpenZFS code
base (15 min) it makes sense to add it as a GitHub Actions
workflow.  Today smatch reports a significant number of warnings
so the workflow is configured to always pass as long as the
analysis was run.  The results are available for reference.
Long term it would be ideal to resolve all of the errors/warnings
at which point the workflow can be updated to fail when new
problems are detected.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Toomas Soome <tsoome@me.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #17935
2026-02-11 10:25:48 -08:00
Alexander Motin 44e6a07bff ZIO: Set minimum number of free issue threads to 32
Free issue threads might block waiting for synchronous DDT, BRT or
GANG header reads. So unlike other taskqs using ZTI_SCALE to scale
with number of CPUs, here we also need some amount of threads to
potentially saturate pool reads.  I am not sure we always want the
96 threads we had before ZTI_SCALE introduction at #11966 on small
systems, but let's make it at least 32.
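
A minimal sketch of the floor described above, assuming some CPU-scaled thread count has already been computed.  The name free_issue_threads() and the way the scaled value is obtained are hypothetical; only the minimum of 32 comes from this message.

```c
/* Minimum number of free issue threads, per the commit message. */
#define	SIM_FREE_ISSUE_MIN	32u

/*
 * Hypothetical helper: cpu_scaled is whatever a ZTI_SCALE-style
 * calculation produced for this system.
 */
unsigned int
free_issue_threads(unsigned int cpu_scaled)
{
	return (cpu_scaled < SIM_FREE_ISSUE_MIN ?
	    SIM_FREE_ISSUE_MIN : cpu_scaled);
}
```

On a small system the floor applies; on a large one the scaled value is used unchanged.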

While here, make free taskqs configurable, similar to read and
write ones.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #17903
2025-12-19 19:55:14 -08:00
Alexander Motin 08d34f28f1 Suppress some ashift warnings
Do not warn about vdev ashifts being smaller than physical ashifts
in a pool status if the pool ashift property is set and vdev ashift
satisfies it (bigger or equal), since user explicitly requested
this.  The ashifts of individual vdevs are still reported.
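
The pool status rule can be sketched as a predicate.  This is an illustration only: should_warn_ashift() is a hypothetical name, and treating a property value of 0 as "unset" is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: decide whether pool status should warn that a
 * vdev's ashift is below its physical ashift.
 */
bool
should_warn_ashift(uint64_t vdev_ashift, uint64_t physical_ashift,
    uint64_t pool_ashift_prop)
{
	/* Nothing to warn about if the vdev already matches the device. */
	if (vdev_ashift >= physical_ashift)
		return (false);

	/* The user asked for this ashift explicitly: stay quiet. */
	if (pool_ashift_prop != 0 && vdev_ashift >= pool_ashift_prop)
		return (false);

	return (true);
}
```

For example, a vdev at ashift 9 on a 4K-physical device warns when the property is unset, but not when the pool was created with ashift=9.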

Do not warn about vdev ashifts in zpool import, since it doesn't
matter much, and we don't even report individual vdevs ashifts
there.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Rob Norris <rob.norris@klarasystems.com>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #17830
2025-12-19 19:55:14 -08:00
Alexander Motin 1397bf1e0e Explicit set ashift for non-leaf vdevs
Before this change ashift property was applied only to leaf
vdevs.  As a result, it worked only as a minimal value for parent
vdevs, since bigger physical_ashift value reported by any child
could be used instead when deciding parent's ashift, as if the
ashift property was never set.

This change explicitly passes ZPOOL_CONFIG_ASHIFT to all vdevs,
allowing override for parents only if the passed value is below
logical_ashift and so unacceptable.
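
One way to picture the new selection rule for a parent vdev is the sketch below.  It is not the actual vdev code: sim_parent_ashift() is a hypothetical name, 0 is assumed to mean "property not set", and falling back to logical_ashift when the requested value is too small is an assumption, since the message only says the value is overridden.

```c
#include <stdint.h>

/*
 * Hypothetical helper: pick a parent vdev's ashift from the requested
 * property value and the ashifts reported by its children.
 */
uint64_t
sim_parent_ashift(uint64_t requested, uint64_t logical, uint64_t physical)
{
	/* Property unset: a bigger physical ashift may be used. */
	if (requested == 0)
		return (physical > logical ? physical : logical);

	/* Requested value is unacceptable: override it (assumed floor). */
	if (requested < logical)
		return (logical);

	/* Otherwise honour the explicit request, ignoring physical. */
	return (requested);
}
```

With requested = 9, logical = 9 and physical = 12 this returns 9, whereas the behaviour described as "before this change" would have ended up at 12.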

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Rob Norris <rob.norris@klarasystems.com>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #17826
2025-12-19 19:55:14 -08:00
Alexander Motin c87a1f7137 raidz_test: Restore rand_data protection
It feels dirty to modify protection of a memory allocated via libc,
but at least we should try to restore it before freeing.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #17977
2025-12-19 19:55:14 -08:00
Alexander Motin b2d052e617 raidz_test: Fix ZIO ABDs initialization
- When filling ABDs of several segments, consider offset.
- "Corrupt" ABDs with actually different data to fail something.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #17977
2025-12-19 19:55:14 -08:00
Alexander Motin 23d4ce66f8 raidz_test: Set io_offset reasonably
- io_offset of 1 makes no sense.  Set default to 0.
- Initialize io_offset in all cases.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #17977
2025-12-19 19:55:14 -08:00
Alexander Motin af9ae623e0 ZFS: Enable more logs for raidz_001_neg
The output is not so big here, so let's collect something useful.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com>
Closes #17977
2025-12-19 19:55:14 -08:00
Tony Hutter e35fdeb411 CI: Use Ubuntu mirrors instead of azure (#18057)
Use the official Ubuntu apt mirrors instead of
azure.archive.ubuntu.com, since that mirror can be slow:

    https://github.com/actions/runner-images/issues/7048

This can help speed up the 'Setup QEMU' stage.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #18057
2025-12-19 19:55:14 -08:00
Tony Hutter 00ab445b51 CI: Change timeout values
The 'Setup QEMU' CI step updates and installs all packages necessary to
start up QEMU.  Typically the step takes a little over a minute, but
we've seen cases where it can legitimately take more than 45
minutes.  Change the timeout to 60 minutes.

In addition, change the 'Install dependencies' timeout to 60min since
we've also seen timeouts there.

Lastly, remove all timeouts from the zfs-qemu-packages workflow.
We do this so that we can always build packages from a branch, even if
the time it takes to do a CI step changes over time.  It's ok to
eliminate the timeouts from the zfs-qemu-packages completely since that
workflow is only run manually.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #18056
2025-12-19 19:55:14 -08:00
Tony Hutter e3fe4293f7 CI: zfs-test-packages: Add in new repos
Test install from our new repos: zfs-latest, zfs-legacy,
zfs-2.3, zfs-2.2, from the zfs-test-packages workflow.
This on-demand workflow is used to verify that the zfs RPMs
in the repos are correct.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #17956
2025-12-19 19:55:14 -08:00
Tony Hutter e51c8c0e83 CI: Fix Ubuntu 22.01 rsend failures
For whatever reason, the single `log_note` in the `directory_diff`
function causes the function to stop executing on Ubuntu 22.  This
causes most of the rsend tests to fail.  Remove the line since it's only
informational.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #18032
2025-12-19 19:55:14 -08:00
Brian Behlendorf 0afe9b67c2 CI: exclude signed-off-by/reviewed-by from 72 char limit
Allow an author or reviewer's name and email address to exceed
the 72 character limit enforced by the commitcheck target.

Reviewed-by: RageLtMan <rageltman@sempervictus>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #18030
2025-12-19 19:55:14 -08:00
bspengler-oss 4f77b30135 Fix HIGHMEM/kmap API violation in zfs_uiomove_bvec_impl()
Fix another instance where ZFS assumes multiple pages can be
mapped at once via zfs_kmap_local(), resulting in crashes and
potential memory corruption on HIGHMEM-enabled (typically 32-bit)
systems.

Reviewed-by: RageLtMan <rageltman@sempervictus>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: bspengler-oss <94915855+bspengler-oss@users.noreply.github.com>
Closes #15668
Closes #18030
2025-12-19 19:55:14 -08:00
bspengler-oss 445879656b Preserve LIFO ordering of kmap ops in abd_raidz_gen_iterate()
ZFS typically preserves proper LIFO ordering regarding map/unmap
operations that wrap the Linux kernel's kmap interfaces that
require such ordering, but one instance in abd_raidz_gen_iterate()
did not.
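
The ordering requirement can be illustrated with a tiny userspace model in which mappings are pushed on a stack and may only be released from the top.  All names here (map_stack_t, sim_map, sim_unmap) are hypothetical and stand in for the kernel's kmap-style interfaces rather than reproducing them.

```c
#define	SIM_MAX_MAPS	8

/* Hypothetical per-context stack of active mappings. */
typedef struct {
	int slots[SIM_MAX_MAPS];
	int depth;
} map_stack_t;

/* Push a mapping; returns 0 on success, -1 if the stack is full. */
int
sim_map(map_stack_t *s, int id)
{
	if (s->depth >= SIM_MAX_MAPS)
		return (-1);
	s->slots[s->depth++] = id;
	return (0);
}

/* Pop a mapping; returns -1 if id is not the most recent one (non-LIFO). */
int
sim_unmap(map_stack_t *s, int id)
{
	if (s->depth == 0 || s->slots[s->depth - 1] != id)
		return (-1);
	s->depth--;
	return (0);
}
```

Mapping 1 then 2 and releasing 1 first is rejected by this model; releasing 2 and then 1 succeeds.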

Similar issues have been fixed in the Linux kernel in the past,
see for instance CVE-2025-39899 for userfaultfd.

Reviewed-by: RageLtMan <rageltman@sempervictus>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: bspengler-oss <94915855+bspengler-oss@users.noreply.github.com>
Closes #15668
Closes #18030
2025-12-19 19:55:14 -08:00
bspengler-oss 0dcb882037 Fix interaction of abd_iter_map()/abd_iter_unmap() with HIGHMEM
HIGHMEM kmap interfaces operate on only a single page at a time
yet ZFS hadn't accounted for this, resulting in crashes and
potential memory corruption on HIGHMEM (typically 32-bit) systems.
This was caught by PaX's KERNSEAL feature as it makes use of
HIGHMEM functionality on x64.

On typical 64-bit systems, this issue wouldn't have been observed,
as the map interfaces simply fall back to returning an address in
lowmem where the contiguous pages can be accessed directly.
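
The general shape of a fix for this class of bug is to split an access into chunks that never cross a page boundary, mapping one page per chunk.  The sketch below shows only the chunking arithmetic in userspace; SIM_PAGE_SIZE and both function names are hypothetical, and the real code would map, copy and unmap where the comment indicates.

```c
#include <stddef.h>

#define	SIM_PAGE_SIZE	4096u	/* assumed page size for illustration */

/* Length of the next chunk that stays within a single page. */
size_t
next_chunk_len(size_t offset, size_t remaining)
{
	size_t room = SIM_PAGE_SIZE - (offset % SIM_PAGE_SIZE);

	return (remaining < room ? remaining : room);
}

/* Count the single-page mappings needed for len bytes at offset. */
size_t
count_page_mappings(size_t offset, size_t len)
{
	size_t n = 0;

	while (len > 0) {
		size_t chunk = next_chunk_len(offset, len);

		/* A real implementation would map, copy, then unmap here. */
		offset += chunk;
		len -= chunk;
		n++;
	}
	return (n);
}
```

A 500-byte access starting at offset 4000 is split into a 96-byte chunk and a 404-byte chunk, so it needs two mappings rather than one.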

Joint work with the PaX Team, tested by Mark van Dijk

Reviewed-by: RageLtMan <rageltman@sempervictus>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: bspengler-oss <94915855+bspengler-oss@users.noreply.github.com>
Closes #15668
Closes #18030
2025-12-19 19:55:14 -08:00
Rob Norris 2fec0e3add Linux 6.18: META
Sponsored-by: https://despairlabs.com/sponsor/
Signed-off-by: Rob Norris <robn@despairlabs.com>
2025-12-17 11:24:44 -08:00
Rob Norris b8c5d43f34 config/kmap_atomic: initialise test data
6.18 changes kmap_atomic() to take a const pointer. This is no problem
for the places we use it, but Clang fails the test due to a warning
about being unable to guarantee that uninitialised data will definitely
not change. Easily solved by forcibly initialising it.

Sponsored-by: https://despairlabs.com/sponsor/
Signed-off-by: Rob Norris <robn@despairlabs.com>
2025-12-17 11:24:44 -08:00
Rob Norris b3922eb8c1 zvol_id: make array length properly known at compile time
Using strlen() in a static array declaration is a GCC extension. Clang
calls it "gnu-folding-constant" and warns about it, which breaks the
build. If it were widespread we could just turn off the warning, but
since there's only one case, let's just change the array to an explicit
size.
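
For illustration, the portability problem and one standard-C way around it are sketched below.  Note that this uses sizeof on a string literal, which is a different technique from the explicit size chosen in this commit; SIM_DEV_PREFIX, SIM_NAME_LEN and path_buf are hypothetical names, not the ones in zvol_id.

```c
#include <stddef.h>
#include <string.h>

#define	SIM_DEV_PREFIX	"/dev/"	/* hypothetical prefix */
#define	SIM_NAME_LEN	32	/* hypothetical name length */

/*
 * Relies on a GNU extension (strlen() is not a constant expression):
 *
 *	static char buf[strlen(SIM_DEV_PREFIX) + SIM_NAME_LEN + 1];
 *
 * Standard C: sizeof on a string literal is an integer constant
 * expression and already counts the terminating NUL.
 */
static char path_buf[sizeof (SIM_DEV_PREFIX) + SIM_NAME_LEN];

/* Report the compile-time size of the buffer. */
size_t
path_buf_size(void)
{
	return (sizeof (path_buf));
}
```

Here sizeof (SIM_DEV_PREFIX) is 6, so the buffer is 38 bytes, the same size the strlen() form was aiming for.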

Sponsored-by: https://despairlabs.com/sponsor/
Signed-off-by: Rob Norris <robn@despairlabs.com>
2025-12-17 11:24:44 -08:00
Rob Norris 8ebb586e0e Linux: bump -std to gnu11
Linux switched from -std=gnu89 to -std=gnu11 in 5.18
(torvalds/linux@e8c07082a8). We've always overridden that with gnu99
because we use some newer features.

More recent kernels are using C11 features in headers that we include.
GCC generally doesn't seem to care, but more recent versions of Clang
seem to be enforcing our gnu99 override more strictly, which breaks the
build in some configurations.

Just bumping our "override" to match the kernel seems to be the easiest
workaround. It's an effective no-op since 5.18, while still allowing us
to build on older kernels.

Sponsored-by: https://despairlabs.com/sponsor/
Signed-off-by: Rob Norris <robn@despairlabs.com>
2025-12-17 11:24:44 -08:00
Rob Norris f4ead6682c sha256_generic: make internal functions a little more private
Linux 6.18 has conflicting prototypes for various sha256_* and sha512_*
functions, which we get through a very long include chain. That's tough
to fix right now; easier is just to rename our internal functions.

Sponsored-by: https://despairlabs.com/sponsor/
Signed-off-by: Rob Norris <robn@despairlabs.com>
2025-12-17 11:24:44 -08:00
Rob Norris a82b804250 Linux 6.18: namespace type moved to ns_common
The namespace type has moved from the namespace ops struct to the
"common" base namespace struct. Detect this and define a macro that does
the right thing for both versions.
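
A minimal userspace sketch of the "one macro for both layouts" idea follows. It uses mock structures rather than the kernel's definitions, and the names mock_ns_ops, mock_ns_common, MOCK_NS_TYPE and HAVE_NS_COMMON_NS_TYPE are assumptions made for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Mock of the older layout: the type lives in the ops structure. */
struct mock_ns_ops {
	int type;
};

/* Mock of the common base; newer layouts carry the type directly. */
struct mock_ns_common {
#ifdef HAVE_NS_COMMON_NS_TYPE
	int ns_type;
#endif
	const struct mock_ns_ops *ops;
};

/* One accessor, selected at build time by a configure-style check. */
#ifdef HAVE_NS_COMMON_NS_TYPE
#define	MOCK_NS_TYPE(ns)	((ns)->ns_type)
#else
#define	MOCK_NS_TYPE(ns)	((ns)->ops->type)
#endif

/* Build a mock namespace of the given type and read it back. */
int
mock_ns_type_of(int type)
{
	struct mock_ns_ops ops = { .type = type };
	struct mock_ns_common ns = { .ops = &ops };

#ifdef HAVE_NS_COMMON_NS_TYPE
	ns.ns_type = type;
#endif
	assert(ns.ops != NULL);
	return (MOCK_NS_TYPE(&ns));
}
```

Callers only ever use the macro, so the choice of layout stays confined to one place.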

Sponsored-by: https://despairlabs.com/sponsor/
Signed-off-by: Rob Norris <robn@despairlabs.com>
2025-12-17 11:24:44 -08:00
Rob Norris 8757506930 Linux 6.18: replace write_cache_pages()
Linux 6.18 removed write_cache_pages() without a usable replacement.
Here we implement a minimal zpl_write_cache_pages() that finds the dirty
pages within the mapping, gets them into the expected state and hands
them off to zfs_putpage(), which handles the rest.

Sponsored-by: https://despairlabs.com/sponsor/
Signed-off-by: Rob Norris <robn@despairlabs.com>
2025-12-17 11:24:44 -08:00
Rob Norris c1f1464525 Linux 6.18: block_device_operations->getgeo takes struct gendisk*
Sponsored-by: https://despairlabs.com/sponsor/
Signed-off-by: Rob Norris <robn@despairlabs.com>
2025-12-17 11:24:44 -08:00
Rob Norris 51ab0e2185 Linux 6.18: convert ida_simple_* calls
ida_simple_get() and ida_simple_remove() are removed in 6.18. However,
since 4.19 they have been simple wrappers around ida_alloc() and
ida_free(), so we can just use those directly.

Sponsored-by: https://despairlabs.com/sponsor/
Signed-off-by: Rob Norris <robn@despairlabs.com>
2025-12-17 11:24:44 -08:00
Rob Norris 51421ecbe8 Linux 6.18: replace nth_page()
Sponsored-by: https://despairlabs.com/sponsor/
Signed-off-by: Rob Norris <robn@despairlabs.com>
2025-12-17 11:24:44 -08:00
Alexander Moch bbbf438d66 linux: use sys/stat.h instead of linux/stat.h
glibc includes linux/stat.h for statx, but musl defines its own statx
struct and associated constants, which does not include STATX_MNT_ID
yet. Thus, including linux/stat.h directly should be avoided for
maximum libc compatibility.

Tested on:
  - glibc: x86_64, i686, aarch64, armv7l, armv6l
  - musl: x86_64, aarch64, armv7l, armv6l

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Tested-By: Achill Gilgenast <achill@achill.org>

Closes #17675
(cherry picked from commit ccf5a8a6fc)

Signed-off-by: classabbyamp <dev@placeviolette.net>
Signed-off-by: Alexander Moch <mail@alexmoch.com>
Co-authored-by: classabbyamp <5366828+classabbyamp@users.noreply.github.com>
2025-12-09 11:58:45 -08:00
Alexander Moch 2d9ba1e3c8 config: Fix LLVM-21 -Wuninitialized-const-pointer warning (#17997)
LLVM-21 enables -Wuninitialized-const-pointer which results in the
following compiler warning and the bdev_file_open_by_path() interface
not being detected for 6.9 and newer kernels.  The blk_holder_ops
are not used by the ZFS code so we can safely use a NULL argument
for this check.

    bdev_file_open_by_path/bdev_file_open_by_path.c:110:54: error:
    variable 'h' is uninitialized when passed as a const pointer
    argument here [-Werror,-Wuninitialized-const-pointer]
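
The two usual ways to avoid this class of warning can be sketched as below: pass NULL when the argument is not needed, or initialise the object before taking its address. This is an illustrative userspace stand-in, not the configure test itself; holder_ops and open_by_path() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an ops structure the caller does not use. */
struct holder_ops {
	int ho_flags;
};

/* Hypothetical API under test: takes a const pointer it never writes. */
int
open_by_path(const char *path, const struct holder_ops *hops)
{
	assert(path != NULL);
	return (hops == NULL ? 0 : hops->ho_flags);
}

/* Option 1: pass NULL, so there is no uninitialised object at all. */
int
probe_with_null(void)
{
	return (open_by_path("/dev/null", NULL));
}

/* Option 2: zero-initialise the object before passing its address. */
int
probe_with_init(void)
{
	struct holder_ops h = { 0 };

	return (open_by_path("/dev/null", &h));
}
```

Either form gives the compiler a defined value behind the const pointer.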

Reviewed-by: Rob Norris <robn@despairlabs.com>

Closes #17682
Closes #17684
(cherry picked from commit 9acedbacee)

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Moch <mail@alexmoch.com>
Co-authored-by: Brian Behlendorf <behlendorf1@llnl.gov>
2025-12-03 11:30:55 -08:00
111 changed files with 2991 additions and 790 deletions
+90 -32
@@ -6,12 +6,19 @@
set -eu
# We've been seeing this script take over 15min to run. This may or
# may not be normal. Just to get a little more insight, print out
# a message to stdout with the top running process, and do this every
# 30 seconds. We can delete this watchdog later once we get a better
# handle on what the timeout value should be.
(while [ 1 ] ; do sleep 30 && echo "[watchdog: $(ps -eo cmd --sort=-pcpu | head -n 2 | tail -n 1)}')]"; done) &
# The default 'azure.archive.ubuntu.com' mirrors can be really slow.
# Prioritize the official Ubuntu mirrors.
#
# The normal apt-mirrors.txt will look like:
#
# http://azure.archive.ubuntu.com/ubuntu/ priority:1
# https://archive.ubuntu.com/ubuntu/ priority:2
# https://security.ubuntu.com/ubuntu/ priority:3
#
# Just delete the 'azure.archive.ubuntu.com' line.
sudo sed -i '/azure.archive.ubuntu.com/d' /etc/apt/apt-mirrors.txt
echo "Using mirrors:"
cat /etc/apt/apt-mirrors.txt
# install needed packages
export DEBIAN_FRONTEND="noninteractive"
@@ -27,35 +34,89 @@ ssh-keygen -t ed25519 -f ~/.ssh/id_ed25519 -q -N ""
sudo systemctl stop docker.socket
sudo systemctl stop multipathd.socket
# remove default swapfile and /mnt
sudo swapoff -a
sudo umount -l /mnt
DISK="/dev/disk/cloud/azure_resource-part1"
sudo sed -e "s|^$DISK.*||g" -i /etc/fstab
sudo wipefs -aq $DISK
sudo systemctl daemon-reload
# Special case:
#
# For reasons unknown, the runner can boot-up with two different block device
# configurations. On one config you get two 75GB block devices, and on the
# other you get a single 150GB block device. Here's what both look like:
#
# --- One 150GB block device ---
# NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
# sda 8:0 0 150G 0 disk
# ├─sda1 8:1 0 149G 0 part /
# ├─sda14 8:14 0 4M 0 part
# ├─sda15 8:15 0 106M 0 part /boot/efi
# └─sda16 259:0 0 913M 0 part /boot
#
# lrwxrwxrwx 1 root root 9 Jan 29 18:07 azure_root -> ../../sda
# lrwxrwxrwx 1 root root 10 Jan 29 18:07 azure_root-part1 -> ../../sda1
# lrwxrwxrwx 1 root root 11 Jan 29 18:07 azure_root-part14 -> ../../sda14
# lrwxrwxrwx 1 root root 11 Jan 29 18:07 azure_root-part15 -> ../../sda15
# lrwxrwxrwx 1 root root 11 Jan 29 18:07 azure_root-part16 -> ../../sda16
#
# --- Two 75GB block devices ---
# NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
# sda 8:0 0 75G 0 disk
# ├─sda1 8:1 0 74G 0 part /
# ├─sda14 8:14 0 4M 0 part
# ├─sda15 8:15 0 106M 0 part /boot/efi
# └─sda16 259:0 0 913M 0 part /boot
# sdb 8:16 0 75G 0 disk
# └─sdb1 8:17 0 75G 0 part
#
# lrwxrwxrwx 1 root root 9 Jan 29 18:07 azure_resource -> ../../sdb
# lrwxrwxrwx 1 root root 10 Jan 29 18:07 azure_resource-part1 -> ../../sdb1
# lrwxrwxrwx 1 root root 9 Jan 29 18:07 azure_root -> ../../sda
# lrwxrwxrwx 1 root root 10 Jan 29 18:07 azure_root-part1 -> ../../sda1
# lrwxrwxrwx 1 root root 11 Jan 29 18:07 azure_root-part14 -> ../../sda14
# lrwxrwxrwx 1 root root 11 Jan 29 18:07 azure_root-part15 -> ../../sda15
#
# If we have the azure_resource-part1 partition, umount it, partition it, and
# use it as our ZFS disk and swap partition. If not, just create a file VDEV
# and swap file and use that instead.
# remove default swapfile and /mnt
if [ -e /dev/disk/cloud/azure_resource-part1 ] ; then
sudo umount -l /mnt
DISK="/dev/disk/cloud/azure_resource-part1"
sudo sed -e "s|^$DISK.*||g" -i /etc/fstab
sudo wipefs -aq $DISK
sudo systemctl daemon-reload
fi
sudo modprobe loop
sudo modprobe zfs
# partition the disk as needed
DISK="/dev/disk/cloud/azure_resource"
sudo sgdisk --zap-all $DISK
sudo sgdisk -p \
-n 1:0:+16G -c 1:"swap" \
-n 2:0:0 -c 2:"tests" \
$DISK
sync
sleep 1
if [ -e /dev/disk/cloud/azure_resource-part1 ] ; then
echo "We have two 75GB block devices"
# partition the disk as needed
DISK="/dev/disk/cloud/azure_resource"
sudo sgdisk --zap-all $DISK
sudo sgdisk -p \
-n 1:0:+16G -c 1:"swap" \
-n 2:0:0 -c 2:"tests" \
$DISK
sync
sleep 1
sudo fallocate -l 12G /test.ssd2
DISKS="$DISK-part2 /test.ssd2"
SWAP=$DISK-part1
else
echo "We have a single 150GB block device"
sudo fallocate -l 72G /test.ssd2
SWAP=/swapfile.ssd
sudo fallocate -l 16G $SWAP
sudo chmod 600 $SWAP
DISKS="/test.ssd2"
fi
# swap with same size as RAM (16GiB)
sudo mkswap $DISK-part1
sudo swapon $DISK-part1
# JBOD 2xdisk for OpenZFS storage (test vm's)
SSD1="$DISK-part2"
sudo fallocate -l 12G /test.ssd2
SSD2=$(sudo losetup -b 4096 -f /test.ssd2 --show)
sudo mkswap $SWAP
sudo swapon $SWAP
# adjust zfs module parameter and create pool
exec 1>/dev/null
@@ -64,7 +125,7 @@ ARC_MAX=$((1024*1024*512))
echo $ARC_MIN | sudo tee /sys/module/zfs/parameters/zfs_arc_min
echo $ARC_MAX | sudo tee /sys/module/zfs/parameters/zfs_arc_max
echo 1 | sudo tee /sys/module/zfs/parameters/zvol_use_blk_mq
sudo zpool create -f -o ashift=12 zpool $SSD1 $SSD2 -O relatime=off \
sudo zpool create -f -o ashift=12 zpool $DISKS -O relatime=off \
-O atime=off -O xattr=sa -O compression=lz4 -O sync=disabled \
-O redundant_metadata=none -O mountpoint=/mnt/tests
@@ -72,6 +133,3 @@ sudo zpool create -f -o ashift=12 zpool $SSD1 $SSD2 -O relatime=off \
for i in /sys/block/s*/queue/scheduler; do
echo "none" | sudo tee $i
done
# Kill off our watchdog
kill $(jobs -p)
+41 -7
@@ -43,6 +43,12 @@ case "$OS" in
OSv="almalinux9"
URL="https://repo.almalinux.org/almalinux/10/cloud/x86_64/images/AlmaLinux-10-GenericCloud-latest.x86_64.qcow2"
;;
alpine3-23)
OSNAME="Alpine Linux 3.23.2"
# Alpine Linux v3.22 and v3.23 are unknown to osinfo as of 2025-12-26.
OSv="alpinelinux3.21"
URL="https://dl-cdn.alpinelinux.org/alpine/v3.23/releases/cloud/generic_alpine-3.23.2-x86_64-bios-cloudinit-r0.qcow2"
;;
archlinux)
OSNAME="Archlinux"
URL="https://geo.mirror.pkgbuild.com/images/latest/Arch-Linux-x86_64-cloudimg.qcow2"
@@ -223,13 +229,21 @@ if [ ${OS:0:7} != "freebsd" ]; then
hostname: $OS
users:
- name: root
shell: $BASH
- name: zfs
sudo: ALL=(ALL) NOPASSWD:ALL
shell: $BASH
ssh_authorized_keys:
- $PUBKEY
- name: root
shell: /bin/bash
sudo: ['ALL=(ALL) NOPASSWD:ALL']
- name: zfs
shell: /bin/bash
sudo: ['ALL=(ALL) NOPASSWD:ALL']
ssh_authorized_keys:
- $PUBKEY
# Workaround for Alpine Linux.
lock_passwd: false
passwd: '*'
packages:
- sudo
- bash
growpart:
mode: auto
@@ -312,3 +326,23 @@ else
scp ~/src.txz "root@vm0:/tmp/src.txz"
ssh root@vm0 'tar -C / -zxf /tmp/src.txz'
fi
#
# Config for Alpine Linux similar to FreeBSD.
#
if [ ${OS:0:6} == "alpine" ]; then
while pidof /usr/bin/qemu-system-x86_64 >/dev/null; do
ssh 2>/dev/null zfs@vm0 "uname -a" && break
done
# Enable community and testing repositories.
ssh zfs@vm0 "sudo rm -rf /etc/apk/repositories"
ssh zfs@vm0 "sudo setup-apkrepos -c1"
ssh zfs@vm0 "echo '@testing http://dl-cdn.alpinelinux.org/alpine/edge/testing' | sudo tee -a /etc/apk/repositories"
# Upgrade to edge or latest-stable.
#ssh zfs@vm0 "sudo sed -i 's#/v[0-9]\+\.[0-9]\+/#/edge/#g' /etc/apk/repositories"
#ssh zfs@vm0 "sudo sed -i 's#/v[0-9]\+\.[0-9]\+/#/latest-stable/#g' /etc/apk/repositories"
# Update and upgrade after repository setup.
ssh zfs@vm0 "sudo apk update"
ssh zfs@vm0 "sudo apk add --upgrade apk-tools"
ssh zfs@vm0 "sudo apk upgrade --available"
fi
+49 -1
@@ -10,6 +10,32 @@
set -eu
function alpine() {
echo "##[group]Install Development Tools"
sudo apk add \
acl alpine-sdk attr autoconf automake bash build-base clang21 coreutils \
cpio cryptsetup curl curl-dev dhcpcd eudev eudev-dev eudev-libs findutils \
fio gawk gdb gettext-dev git grep jq libaio libaio-dev libcurl \
libtirpc-dev libtool libunwind libunwind-dev linux-headers linux-tools \
linux-virt linux-virt-dev lsscsi m4 make nfs-utils openssl-dev parted \
pax procps py3-cffi py3-distlib py3-packaging py3-setuptools python3 \
python3-dev qemu-guest-agent rng-tools rsync samba samba-server sed \
strace sysstat util-linux util-linux-dev wget words xfsprogs xxhash \
zlib-dev pamtester@testing
echo "##[endgroup]"
echo "##[group]Switch to eudev"
sudo setup-devd udev
echo "##[endgroup]"
echo "##[group]Install ksh93 from Source"
git clone --depth 1 https://github.com/ksh93/ksh.git /tmp/ksh
cd /tmp/ksh
./bin/package make
sudo ./bin/package install /
echo "##[endgroup]"
}
function archlinux() {
echo "##[group]Running pacman -Syu"
sudo btrfs filesystem resize max /
@@ -27,6 +53,10 @@ function archlinux() {
function debian() {
export DEBIAN_FRONTEND="noninteractive"
echo "##[group]Wait for cloud-init to finish"
cloud-init status --wait
echo "##[endgroup]"
echo "##[group]Running apt-get update+upgrade"
sudo sed -i '/[[:alpha:]]-backports/d' /etc/apt/sources.list
sudo apt-get update -y
@@ -90,6 +120,11 @@ function rhel() {
kernel-devel python3-setuptools qemu-guest-agent rng-tools rpcgen \
rpm-build rsync samba strace sysstat systemd watchdog wget xfsprogs-devel \
xxhash zlib-devel
# These are needed for building Lustre. We only install these on EL VMs since
# we don't plan to test build Lustre on other platforms.
sudo dnf install -y libnl3-devel libyaml-devel libmount-devel
echo "##[endgroup]"
}
@@ -140,6 +175,9 @@ case "$1" in
sudo dnf install -y kernel-abi-stablelists
echo "##[endgroup]"
;;
alpine*)
alpine
;;
archlinux)
archlinux
;;
@@ -188,6 +226,16 @@ test -z "${ONLY_DEPS:-}" || exit 0
# Start services
echo "##[group]Enable services"
case "$1" in
alpine*)
sudo -E rc-update add qemu-guest-agent
sudo -E rc-update add nfs
sudo -E rc-update add samba
sudo -E rc-update add dhcpcd
# Remove services related to cloud-init.
sudo -E rc-update del cloud-init default
sudo -E rc-update del cloud-final default
sudo -E rc-update del cloud-config default
;;
freebsd*)
# add virtio things
echo 'virtio_load="YES"' | sudo -E tee -a /boot/loader.conf
@@ -243,7 +291,7 @@ case "$1" in
esac
case "$1" in
archlinux|freebsd*)
alpine*|archlinux|freebsd*)
true
;;
*)
+15 -7
@@ -58,13 +58,21 @@ for ((i=1; i<=VMs; i++)); do
fqdn: vm$i
users:
- name: root
shell: $BASH
- name: zfs
sudo: ALL=(ALL) NOPASSWD:ALL
shell: $BASH
ssh_authorized_keys:
- $PUBKEY
- name: root
shell: /bin/bash
sudo: ['ALL=(ALL) NOPASSWD:ALL']
- name: zfs
shell: /bin/bash
sudo: ['ALL=(ALL) NOPASSWD:ALL']
ssh_authorized_keys:
- $PUBKEY
# Workaround for Alpine Linux.
lock_passwd: false
passwd: '*'
packages:
- sudo
- bash
growpart:
mode: auto
+51
@@ -0,0 +1,51 @@
#!/usr/bin/env bash
######################################################################
# 6) Test if Lustre can still build against ZFS
######################################################################
set -e
# Build from the latest Lustre tag rather than the master branch. We do this
# under the assumption that master is going to have a lot of churn thus will be
# more prone to breaking the build than a point release. We don't want ZFS
# PR's reporting bad test results simply because upstream Lustre accidentally
# broke their build.
#
# Skip any RC tags, or any tags where the last version digit is 50 or more.
# Versions with 50 or more are development versions of Lustre.
repo=https://github.com/lustre/lustre-release.git
tag="$(git ls-remote --refs --exit-code --sort=version:refname --tags $repo | \
awk -F '_' '/-RC/{next}; /refs\/tags\/v/{if ($NF < 50){print}}' | \
tail -n 1 | sed 's/.*\///')"
echo "Cloning Lustre tag $tag"
git clone --depth 1 --branch "$tag" "$repo"
cd lustre-release
# Include Lustre patches to build against master/zfs-2.4.x. Once these
# patches are merged we can remove these lines.
patches=('https://review.whamcloud.com/changes/fs%2Flustre-release~62101/revisions/2/patch?download'
'https://review.whamcloud.com/changes/fs%2Flustre-release~63267/revisions/9/patch?download')
for p in "${patches[@]}" ; do
curl $p | base64 -d > patch
patch -p1 < patch || true
done
echo "Configure Lustre"
./autogen.sh
# EL 9 needs '--disable-gss-keyring'
./configure --with-zfs --disable-gss-keyring
echo "Building Lustre RPMs"
make rpms
ls *.rpm
# There's only a handful of Lustre RPMs we actually need to install
lustrerpms="$(ls *.rpm | grep -E 'kmod-lustre-osd-zfs-[0-9]|kmod-lustre-[0-9]|lustre-osd-zfs-mount-[0-9]')"
echo "Installing: $lustrerpms"
sudo dnf -y install $lustrerpms
sudo modprobe -v lustre
# Should see some Lustre lines in dmesg
sudo dmesg | grep -Ei 'lnet|lustre'
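
The tag-selection rule described in the comments of this script (skip RC tags, skip tags whose last version field is 50 or more) can be restated as a small C predicate. This is a sketch of the rule only, not a replacement for the awk filter: it assumes tags of the form vMAJOR_MINOR_PATCH and rejects anything whose last field is not a plain number, which is stricter than awk's comparison. The function name is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Return 1 if a "git ls-remote" ref line names a usable release tag. */
int
is_release_tag(const char *ref)
{
	const char *last;
	char *end;
	long n;

	assert(ref != NULL);
	if (strstr(ref, "-RC") != NULL)
		return (0);	/* release candidates are skipped */
	if (strstr(ref, "refs/tags/v") == NULL)
		return (0);	/* only v-prefixed tags are considered */
	last = strrchr(ref, '_');
	if (last == NULL)
		return (0);	/* assumption: fields are '_'-separated */
	n = strtol(last + 1, &end, 10);
	if (end == last + 1 || *end != '\0')
		return (0);	/* assumption: last field is numeric */
	return (n < 50);	/* 50 and above are development versions */
}
```

Under these assumptions a ref ending in v2_15_6 is kept, while v2_15_50 and v2_16_0-RC1 are rejected.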
+122 -10
@@ -4,7 +4,10 @@
# 6) load openzfs module and run the tests
#
# called on runner: qemu-6-tests.sh
# called on qemu-vm: qemu-6-tests.sh $OS $2/$3
# called on qemu-vm: qemu-6-tests.sh $OS $2 $3 [--lustre|--builtin] [quick|default]
#
# --lustre: Test build lustre in addition to the normal tests
# --builtin: Test build ZFS as a kernel built-in in addition to the normal tests
######################################################################
set -eu
@@ -38,6 +41,54 @@ function prefix() {
fi
}
function do_lustre_build() {
local rc=0
$HOME/zfs/.github/workflows/scripts/qemu-6-lustre-tests-vm.sh &> /var/tmp/lustre.txt || rc=$?
echo "$rc" > /var/tmp/lustre-exitcode.txt
if [ "$rc" != "0" ] ; then
echo "$rc" > /var/tmp/tests-exitcode.txt
fi
}
export -f do_lustre_build
# Test build ZFS into the kernel directly
function do_builtin_build() {
local rc=0
# Get the current full kernel version (like '6.18.8')
fullver=$(uname -r | grep -Eo '^[0-9]+\.[0-9]+\.[0-9]+')
# Get just the major ('6')
major=$(echo $fullver | grep -Eo '^[0-9]+')
(
set -e
wget https://cdn.kernel.org/pub/linux/kernel/v${major}.x/linux-$fullver.tar.xz
tar -xf $HOME/linux-$fullver.tar.xz
cd $HOME/linux-$fullver
make tinyconfig
./scripts/config --enable EFI_PARTITION
./scripts/config --enable BLOCK
# BTRFS_FS is the easiest config option to enable CONFIG_ZLIB_INFLATE|DEFLATE
./scripts/config --enable BTRFS_FS
yes "" | make oldconfig
make prepare
cd $HOME/zfs
./configure --with-linux=$HOME/linux-$fullver --enable-linux-builtin --enable-debug
./copy-builtin $HOME/linux-$fullver
cd $HOME/linux-$fullver
./scripts/config --enable ZFS
yes "" | make oldconfig
make -j `nproc`
) &> /var/tmp/builtin.txt || rc=$?
echo "$rc" > /var/tmp/builtin-exitcode.txt
if [ "$rc" != "0" ] ; then
echo "$rc" > /var/tmp/tests-exitcode.txt
fi
}
export -f do_builtin_build
# called directly on the runner
if [ -z ${1:-} ]; then
cd "/var/tmp"
@@ -49,8 +100,24 @@ if [ -z ${1:-} ]; then
for ((i=1; i<=VMs; i++)); do
IP="192.168.122.1$i"
# We do an additional test build of Lustre against ZFS if we're vm2
# on almalinux*. At the time of writing, the vm2 tests were
# completing roughly 15min before the vm1 tests, so it makes sense
# to have vm2 do the build.
#
# In addition, we do an additional test build of ZFS as a Linux
# kernel built-in on Fedora. Again, we do it on vm2 to exploit vm2's
# early finish time.
extra=""
if [[ "$OS" == almalinux* ]] && [[ "$i" == "2" ]] ; then
extra="--lustre"
elif [[ "$OS" == fedora* ]] && [[ "$i" == "2" ]] ; then
extra="--builtin"
fi
daemonize -c /var/tmp -p vm${i}.pid -o vm${i}log.txt -- \
$SSH zfs@$IP $TESTS $OS $i $VMs $CI_TYPE
$SSH zfs@$IP $TESTS $OS $i $VMs $extra $CI_TYPE
# handle line by line and add info prefix
stdbuf -oL tail -fq vm${i}log.txt \
| while read -r line; do prefix "$i" "$line"; done &
@@ -70,9 +137,35 @@ if [ -z ${1:-} ]; then
exit 0
fi
# this part runs inside qemu vm
#############################################
# Everything from here on runs inside qemu vm
#############################################
# Process cmd line args
OS="$1"
shift
NUM="$1"
shift
DEN="$1"
shift
BUILD_LUSTRE=0
BUILD_BUILTIN=0
if [ "$1" == "--lustre" ] ; then
BUILD_LUSTRE=1
shift
elif [ "$1" == "--builtin" ] ; then
BUILD_BUILTIN=1
shift
fi
if [ "$1" == "quick" ] ; then
export RUNFILES="sanity.run"
fi
export PATH="$PATH:/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/sbin:/usr/local/bin"
case "$1" in
case "$OS" in
freebsd*)
TDIR="/usr/local/share/zfs"
sudo kldstat -n zfs 2>/dev/null && sudo kldunload zfs
@@ -95,23 +188,42 @@ case "$1" in
;;
esac
# enable io_uring on el9/el10
case "$1" in
# Distribution-specific settings.
case "$OS" in
almalinux9|almalinux10|centos-stream*)
# Enable io_uring on Enterprise Linux 9 and 10.
sudo sysctl kernel.io_uring_disabled=0 > /dev/null
;;
alpine*)
# Ensure `/etc/zfs/zpool.cache` exists.
sudo mkdir -p /etc/zfs
sudo touch /etc/zfs/zpool.cache
sudo chmod 644 /etc/zfs/zpool.cache
;;
esac
# Lustre calls a number of exported ZFS module symbols. To make sure we don't
# change the symbols and break Lustre, do a quick Lustre build of the latest
# released Lustre against ZFS.
#
# Note that we do the Lustre test build in parallel with ZTS. ZTS isn't very
# CPU intensive, so we can use idle CPU cycles "guilt free" for the build.
# The Lustre build on its own takes ~15min.
if [ "$BUILD_LUSTRE" == "1" ] ; then
do_lustre_build &
elif [ "$BUILD_BUILTIN" == "1" ] ; then
# Try building ZFS directly into the Linux kernel (not as a module)
do_builtin_build &
fi
# run functional tests and save exitcode
cd /var/tmp
TAGS=$2/$3
if [ "$4" == "quick" ]; then
export RUNFILES="sanity.run"
fi
TAGS=$NUM/$DEN
sudo dmesg -c > dmesg-prerun.txt
mount > mount.txt
df -h > df-prerun.txt
$TDIR/zfs-tests.sh -vKO -s 3GB -T $TAGS
RV=$?
df -h > df-postrun.txt
echo $RV > tests-exitcode.txt
@@ -31,6 +31,12 @@ EOF
rm -f tmp$$
}
function showfile_tail() {
echo "##[group]$2 (final lines)"
tail -n 80 $1
echo "##[endgroup]"
}
# overview
cat /tmp/summary.txt
echo ""
@@ -46,6 +52,32 @@ fi
echo -e "\nFull logs for download:\n $1\n"
for ((i=1; i<=VMs; i++)); do
# Print Lustre build test results (the build is only done on vm2)
if [ -f vm$i/lustre-exitcode.txt ] ; then
rv=$(< vm$i/lustre-exitcode.txt)
if [ $rv = 0 ]; then
vm="vm$i"
else
vm="vm$i"
touch /tmp/have_failed_tests
fi
file="vm$i/lustre.txt"
test -s "$file" && showfile_tail "$file" "$vm: Lustre build"
fi
if [ -f vm$i/builtin-exitcode.txt ] ; then
rv=$(< vm$i/builtin-exitcode.txt)
if [ $rv = 0 ]; then
vm="vm$i"
else
vm="vm$i"
touch /tmp/have_failed_tests
fi
file="vm$i/builtin.txt"
test -s "$file" && showfile_tail "$file" "$vm: Linux built-in build"
fi
rv=$(cat vm$i/tests-exitcode.txt)
if [ $rv = 0 ]; then
+33 -8
@@ -4,7 +4,11 @@
#
# USAGE:
#
# ./qemu-test-repo-vm [URL]
# ./qemu-test-repo-vm [--lookup] [URL]
#
# --lookup: When testing a repo, only lookup the latest package versions,
# don't try to install them. Installing all of them takes over
# an hour, so this is much quicker.
#
# URL: URL to use instead of http://download.zfsonlinux.org
# If blank, use the default repo from zfs-release RPM.
@@ -15,6 +19,13 @@ source /etc/os-release
OS="$ID"
VERSION="$VERSION_ID"
LOOKUP=""
if [ -n "$1" ] && [ "$1" == "--lookup" ] ; then
LOOKUP=1
shift
fi
ALTHOST=""
if [ -n "$1" ] ; then
ALTHOST="$1"
@@ -42,7 +53,19 @@ function test_install {
sudo sed -i "s;baseurl=http://download.zfsonlinux.org;baseurl=$host;g" /etc/yum.repos.d/zfs.repo
fi
sudo dnf -y install $args zfs zfs-test
baseurl=$(grep -A 5 "\[$repo\]" /etc/yum.repos.d/zfs.repo | awk -F'=' '/baseurl=/{print $2; exit}')
# Just do a version lookup - don't try to install any RPMs
if [ "$LOOKUP" == "1" ] ; then
package="$(dnf list $args zfs | tail -n 1 | awk '{print $2}')"
echo "$repo ${package} $baseurl" >> $SUMMARY
return
fi
if ! sudo dnf -y install $args zfs zfs-test ; then
echo "$repo ${package}...[FAILED] $baseurl" >> $SUMMARY
return
fi
# Load modules and create a simple pool as a sanity test.
sudo /usr/share/zfs/zfs.sh -r
@@ -51,7 +74,6 @@ function test_install {
sudo zpool status
# Print out repo name, rpm installed (kmod or dkms), and repo URL
baseurl=$(grep -A 5 "\[$repo\]" /etc/yum.repos.d/zfs.repo | awk -F'=' '/baseurl=/{print $2; exit}')
package=$(sudo rpm -qa | grep zfs | grep -E 'kmod|dkms')
echo "$repo $package $baseurl" >> $SUMMARY
@@ -70,16 +92,19 @@ almalinux*)
name=$(curl -Ls $url | grep 'dnf install' | grep -Eo 'zfs-release-[0-9]+-[0-9]+')
sudo dnf -y install https://zfsonlinux.org/epel/$name$(rpm --eval "%{dist}").noarch.rpm 2>&1
sudo rpm -qi zfs-release
test_install zfs $ALTHOST
test_install zfs-kmod $ALTHOST
test_install zfs-testing $ALTHOST
test_install zfs-testing-kmod $ALTHOST
for i in zfs zfs-kmod zfs-testing zfs-testing-kmod zfs-latest \
zfs-latest-kmod zfs-legacy zfs-legacy-kmod zfs-2.2 \
zfs-2.2-kmod zfs-2.3 zfs-2.3-kmod zfs-2.4 zfs-2.4-kmod; do
test_install $i $ALTHOST
done
;;
fedora*)
url='https://raw.githubusercontent.com/openzfs/openzfs-docs/refs/heads/master/docs/Getting%20Started/Fedora/index.rst'
name=$(curl -Ls $url | grep 'dnf install' | grep -Eo 'zfs-release-[0-9]+-[0-9]+')
sudo dnf -y install -y https://zfsonlinux.org/fedora/$name$(rpm --eval "%{dist}").noarch.rpm
test_install zfs $ALTHOST
for i in zfs zfs-latest zfs-legacy zfs-2.2 zfs-2.3 zfs-2.4 ; do
test_install $i $ALTHOST
done
;;
esac
echo "##[endgroup]"
+52
@@ -0,0 +1,52 @@
name: smatch
on:
push:
pull_request:
concurrency:
group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }}
cancel-in-progress: true
jobs:
smatch:
runs-on: ubuntu-24.04
steps:
- name: Checkout smatch
uses: actions/checkout@v4
with:
repository: error27/smatch
ref: master
path: smatch
- name: Install smatch dependencies
run: |
sudo apt-get install -y llvm gcc make sqlite3 libsqlite3-dev libdbd-sqlite3-perl libssl-dev libtry-tiny-perl
- name: Make smatch
run: |
cd $GITHUB_WORKSPACE/smatch
make -j$(nproc)
- name: Checkout OpenZFS
uses: actions/checkout@v4
with:
ref: ${{ github.event.pull_request.head.sha }}
path: zfs
- name: Install OpenZFS dependencies
run: |
cd $GITHUB_WORKSPACE/zfs
sudo apt-get purge -y snapd google-chrome-stable firefox
ONLY_DEPS=1 .github/workflows/scripts/qemu-3-deps-vm.sh ubuntu24
- name: Autogen.sh OpenZFS
run: |
cd $GITHUB_WORKSPACE/zfs
./autogen.sh
- name: Configure OpenZFS
run: |
cd $GITHUB_WORKSPACE/zfs
./configure --enable-debug
- name: Make OpenZFS
run: |
cd $GITHUB_WORKSPACE/zfs
make -j$(nproc) CHECK="$GITHUB_WORKSPACE/smatch/smatch" CC=$GITHUB_WORKSPACE/smatch/cgcc | tee $GITHUB_WORKSPACE/smatch.log
- name: Smatch results log
run: |
grep -E 'error:|warn:|warning:' $GITHUB_WORKSPACE/smatch.log
+12 -6
@@ -42,6 +42,12 @@ on:
required: false
default: ""
description: "(optional) repo URL (blank: use http://download.zfsonlinux.org)"
lookup:
type: boolean
required: false
default: false
description: "(optional) do version lookup only on repo test"
concurrency:
group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }}
cancel-in-progress: true
@@ -60,20 +66,16 @@ jobs:
ref: ${{ github.event.pull_request.head.sha }}
- name: Setup QEMU
timeout-minutes: 10
run: .github/workflows/scripts/qemu-1-setup.sh
- name: Start build machine
timeout-minutes: 10
run: .github/workflows/scripts/qemu-2-start.sh ${{ matrix.os }}
- name: Install dependencies
timeout-minutes: 20
run: |
.github/workflows/scripts/qemu-3-deps.sh ${{ matrix.os }}
- name: Build modules or Test repo
timeout-minutes: 30
run: |
set -e
if [ "${{ github.event.inputs.test_type }}" == "Test repo" ] ; then
@@ -81,7 +83,12 @@ jobs:
.github/workflows/scripts/qemu-prepare-for-build.sh
mkdir -p /tmp/repo
ssh zfs@vm0 '$HOME/zfs/.github/workflows/scripts/qemu-test-repo-vm.sh' ${{ github.event.inputs.repo_url }}
EXTRA=""
if [ "${{ github.event.inputs.lookup }}" == 'true' ] ; then
EXTRA="--lookup"
fi
ssh zfs@vm0 '$HOME/zfs/.github/workflows/scripts/qemu-test-repo-vm.sh' $EXTRA ${{ github.event.inputs.repo_url }}
else
EXTRA=""
if [ -n "${{ github.event.inputs.patch_level }}" ] ; then
@@ -94,7 +101,6 @@ jobs:
- name: Prepare artifacts
if: always()
timeout-minutes: 10
run: |
rsync -a zfs@vm0:/tmp/repo /tmp || true
.github/workflows/scripts/replace-dupes-with-symlinks.sh /tmp/repo
+11 -7
@@ -10,6 +10,11 @@ on:
required: false
default: ""
description: "(optional) Experimental kernel version to install on Fedora (like '6.14' or '6.13.3-0.rc3')"
specific_os:
type: string
required: false
default: ""
description: "(optional) Only run on this specific OS (like 'fedora42' or 'alpine3-23')"
concurrency:
group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }}
@@ -58,6 +63,9 @@ jobs:
# They specified a custom kernel version for Fedora.
# Use only Fedora runners.
os_json=$(echo ${os_selection} | jq -c '[.[] | select(startswith("fedora"))]')
elif ${{ github.event.inputs.specific_os != '' }}; then
# Use only the specified runner.
os_json=$(jq -cn --arg os "${{ github.event.inputs.specific_os }}" '[ $os ]')
else
# Normal case
os_json=$(echo ${os_selection} | jq -c)
@@ -87,19 +95,15 @@ jobs:
ref: ${{ github.event.pull_request.head.sha }}
- name: Setup QEMU
timeout-minutes: 20
run: |
# Add a timestamp to each line to debug timeouts
while IFS=$'\n' read -r line; do
echo "$(date +'%H:%M:%S') $line"
done < <(.github/workflows/scripts/qemu-1-setup.sh)
timeout-minutes: 60
run: .github/workflows/scripts/qemu-1-setup.sh
- name: Start build machine
timeout-minutes: 10
run: .github/workflows/scripts/qemu-2-start.sh ${{ matrix.os }}
- name: Install dependencies
timeout-minutes: 20
timeout-minutes: 60
run: .github/workflows/scripts/qemu-3-deps.sh ${{ matrix.os }} ${{ github.event.inputs.fedora_kernel_ver }}
- name: Build modules
+2 -2
@@ -1,10 +1,10 @@
Meta: 1
Name: zfs
Branch: 1.0
Version: 2.3.5
Version: 2.3.6
Release: 1
Release-Tags: relext
License: CDDL
Author: OpenZFS
Linux-Maximum: 6.17
Linux-Maximum: 6.19
Linux-Minimum: 4.18
+19 -4
@@ -264,9 +264,21 @@ cmp_data(raidz_test_opts_t *opts, raidz_map_t *rm)
static int
init_rand(void *data, size_t size, void *private)
{
size_t *offsetp = (size_t *)private;
size_t offset = *offsetp;
VERIFY3U(offset + size, <=, SPA_MAXBLOCKSIZE);
memcpy(data, (char *)rand_data + offset, size);
*offsetp = offset + size;
return (0);
}
static int
corrupt_rand_fill(void *data, size_t size, void *private)
{
(void) private;
memcpy(data, rand_data, size);
memset(data, 0xAA, size);
return (0);
}
@@ -278,7 +290,7 @@ corrupt_colums(raidz_map_t *rm, const int *tgts, const int cnt)
for (int i = 0; i < cnt; i++) {
raidz_col_t *col = &rr->rr_col[tgts[i]];
abd_iterate_func(col->rc_abd, 0, col->rc_size,
init_rand, NULL);
corrupt_rand_fill, NULL);
}
}
}
@@ -286,7 +298,8 @@ corrupt_colums(raidz_map_t *rm, const int *tgts, const int cnt)
void
init_zio_abd(zio_t *zio)
{
abd_iterate_func(zio->io_abd, 0, zio->io_size, init_rand, NULL);
size_t offset = 0;
abd_iterate_func(zio->io_abd, 0, zio->io_size, init_rand, &offset);
}
static void
@@ -373,7 +386,7 @@ init_raidz_map(raidz_test_opts_t *opts, zio_t **zio, const int parity)
*zio = umem_zalloc(sizeof (zio_t), UMEM_NOFAIL);
(*zio)->io_offset = 0;
(*zio)->io_offset = opts->rto_offset;
(*zio)->io_size = alloc_dsize;
(*zio)->io_abd = raidz_alloc(alloc_dsize);
init_zio_abd(*zio);
@@ -834,6 +847,8 @@ main(int argc, char **argv)
err = run_test(NULL);
}
mprotect(rand_data, SPA_MAXBLOCKSIZE, PROT_READ | PROT_WRITE);
umem_free(rand_data, SPA_MAXBLOCKSIZE);
kernel_fini();
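
The cursor pattern used by init_rand() in this diff, where the callback's private pointer carries a running offset so that each chunk receives the next slice of the source data, can be sketched in isolation. The iterator iterate_chunks() and the 64-byte source buffer are hypothetical stand-ins for abd_iterate_func() and rand_data.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define	SRC_SIZE	64	/* stand-in for SPA_MAXBLOCKSIZE */

static unsigned char src_data[SRC_SIZE];

/* Callback type modelled on the one passed to abd_iterate_func(). */
typedef int (*iter_func_t)(void *buf, size_t size, void *private);

/* Copy the next `size` source bytes and advance the caller's cursor. */
int
fill_from_source(void *buf, size_t size, void *private)
{
	size_t *offsetp = private;
	size_t offset = *offsetp;

	assert(offset + size <= SRC_SIZE);
	memcpy(buf, src_data + offset, size);
	*offsetp = offset + size;
	return (0);
}

/* Hypothetical iterator: hands `dst` to the callback in fixed chunks. */
int
iterate_chunks(unsigned char *dst, size_t len, size_t chunk,
    iter_func_t func, void *private)
{
	for (size_t done = 0; done < len; done += chunk) {
		size_t n = (len - done < chunk) ? len - done : chunk;
		int err = func(dst + done, n, private);

		if (err != 0)
			return (err);
	}
	return (0);
}

/* Fill the source, copy `len` bytes in chunks, and compare the result. */
int
copy_matches_source(size_t len, size_t chunk)
{
	unsigned char dst[SRC_SIZE];
	size_t offset = 0;

	for (size_t i = 0; i < SRC_SIZE; i++)
		src_data[i] = (unsigned char)i;
	if (iterate_chunks(dst, len, chunk, fill_from_source, &offset) != 0)
		return (0);
	return (offset == len && memcmp(dst, src_data, len) == 0);
}
```

Without the cursor, every chunk would be filled from the start of the source, so a multi-chunk buffer would repeat the same bytes.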
+1 -1
@@ -72,7 +72,7 @@ typedef struct raidz_test_opts {
static const raidz_test_opts_t rto_opts_defaults = {
.rto_ashift = 9,
.rto_offset = 1ULL << 0,
.rto_offset = 0,
.rto_dcols = 8,
.rto_dsize = 1<<19,
.rto_v = D_ALL,
+39 -6
@@ -107,7 +107,9 @@ extern uint_t zfs_reconstruct_indirect_combinations_max;
extern uint_t zfs_btree_verify_intensity;
static const char cmdname[] = "zdb";
uint8_t dump_opt[256];
uint8_t dump_opt[512];
#define ALLOCATED_OPT 256
typedef void object_viewer_t(objset_t *, uint64_t, void *data, size_t size);
@@ -1651,6 +1653,16 @@ dump_metaslab_stats(metaslab_t *msp)
dump_histogram(rt->rt_histogram, ZFS_RANGE_TREE_HISTOGRAM_SIZE, 0);
}
static void
dump_allocated(void *arg, uint64_t start, uint64_t size)
{
uint64_t *off = arg;
if (*off != start)
(void) printf("ALLOC: %"PRIu64" %"PRIu64"\n", *off,
start - *off);
*off = start + size;
}
static void
dump_metaslab(metaslab_t *msp)
{
@@ -1667,13 +1679,24 @@ dump_metaslab(metaslab_t *msp)
(u_longlong_t)msp->ms_id, (u_longlong_t)msp->ms_start,
(u_longlong_t)space_map_object(sm), freebuf);
if (dump_opt['m'] > 2 && !dump_opt['L']) {
if (dump_opt[ALLOCATED_OPT] ||
(dump_opt['m'] > 2 && !dump_opt['L'])) {
mutex_enter(&msp->ms_lock);
VERIFY0(metaslab_load(msp));
}
if (dump_opt['m'] > 2 && !dump_opt['L']) {
zfs_range_tree_stat_verify(msp->ms_allocatable);
dump_metaslab_stats(msp);
metaslab_unload(msp);
mutex_exit(&msp->ms_lock);
}
if (dump_opt[ALLOCATED_OPT]) {
uint64_t off = msp->ms_start;
zfs_range_tree_walk(msp->ms_allocatable, dump_allocated,
&off);
if (off != msp->ms_start + msp->ms_size)
(void) printf("ALLOC: %"PRIu64" %"PRIu64"\n", off,
msp->ms_start + msp->ms_size - off);
}
if (dump_opt['m'] > 1 && sm != NULL &&
@@ -1688,6 +1711,12 @@ dump_metaslab(metaslab_t *msp)
SPACE_MAP_HISTOGRAM_SIZE, sm->sm_shift);
}
if (dump_opt[ALLOCATED_OPT] ||
(dump_opt['m'] > 2 && !dump_opt['L'])) {
metaslab_unload(msp);
mutex_exit(&msp->ms_lock);
}
if (vd->vdev_ops == &vdev_draid_ops)
ASSERT3U(msp->ms_size, <=, 1ULL << vd->vdev_ms_shift);
else
@@ -1724,8 +1753,9 @@ print_vdev_metaslab_header(vdev_t *vd)
}
}
(void) printf("\tvdev %10llu %s",
(u_longlong_t)vd->vdev_id, bias_str);
(void) printf("\tvdev %10llu\t%s metaslab shift %4llu",
(u_longlong_t)vd->vdev_id, bias_str,
(u_longlong_t)vd->vdev_ms_shift);
if (ms_flush_data_obj != 0) {
(void) printf(" ms_unflushed_phys object %llu",
@@ -9318,6 +9348,8 @@ main(int argc, char **argv)
{"all-reconstruction", no_argument, NULL, 'Y'},
{"livelist", no_argument, NULL, 'y'},
{"zstd-headers", no_argument, NULL, 'Z'},
{"allocated-map", no_argument, NULL,
ALLOCATED_OPT},
{0, 0, 0, 0}
};
@@ -9348,6 +9380,7 @@ main(int argc, char **argv)
case 'u':
case 'y':
case 'Z':
case ALLOCATED_OPT:
dump_opt[c]++;
dump_all = 0;
break;
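
The `--allocated-map` output in the hunks above is produced by walking the free (allocatable) segments of a metaslab in offset order and printing the gaps between them as `ALLOC:` records. A minimal standalone sketch of that gap-walking idea follows. It assumes a sorted array of free segments in place of the range tree, and `seg_t` and `print_allocated` are hypothetical names used only for illustration, not part of zdb. The sketch sizes the trailing record relative to the end of the metaslab (`ms_start + ms_size`).

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for one free segment of a range tree. */
typedef struct seg {
	uint64_t start;
	uint64_t size;
} seg_t;

/*
 * Walk free segments sorted by offset and print every allocated gap in
 * [ms_start, ms_start + ms_size) as "ALLOC: <offset> <length>".
 * Returns the total number of allocated bytes reported.
 */
static uint64_t
print_allocated(FILE *out, const seg_t *freesegs, size_t nfree,
    uint64_t ms_start, uint64_t ms_size)
{
	uint64_t off = ms_start;
	uint64_t end = ms_start + ms_size;
	uint64_t total = 0;

	for (size_t i = 0; i < nfree; i++) {
		/* Segments must be sorted, disjoint and inside the metaslab. */
		assert(freesegs[i].start >= off);
		assert(freesegs[i].start + freesegs[i].size <= end);
		if (freesegs[i].start != off) {
			uint64_t len = freesegs[i].start - off;
			(void) fprintf(out, "ALLOC: %" PRIu64 " %" PRIu64 "\n",
			    off, len);
			total += len;
		}
		/* Skip past the free segment itself. */
		off = freesegs[i].start + freesegs[i].size;
	}
	/* Anything between the last free segment and the end is allocated. */
	if (off != end) {
		(void) fprintf(out, "ALLOC: %" PRIu64 " %" PRIu64 "\n",
		    off, end - off);
		total += end - off;
	}
	return (total);
}
```

Calling `print_allocated(stdout, segs, n, ms_start, ms_size)` with an empty segment list reports the whole metaslab as a single allocated record.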
+1 -1
@@ -29,6 +29,6 @@
#define _ZDB_H
void dump_intent_log(zilog_t *);
extern uint8_t dump_opt[256];
extern uint8_t dump_opt[512];
#endif /* _ZDB_H */
-2
@@ -48,8 +48,6 @@
#include "zdb.h"
extern uint8_t dump_opt[256];
static char tab_prefix[4] = "\t\t\t";
static void
+344 -6
@@ -52,12 +52,15 @@
#include <sys/zio_compress.h>
#include <sys/zfeature.h>
#include <sys/dmu_tx.h>
#include <sys/backtrace.h>
#include <zfeature_common.h>
#include <libzutil.h>
#include <sys/metaslab_impl.h>
static importargs_t g_importargs;
static char *g_pool;
static boolean_t g_readonly;
static boolean_t g_dump_dbgmsg;
typedef enum {
ZHACK_REPAIR_OP_UNKNOWN = 0,
@@ -69,11 +72,23 @@ static __attribute__((noreturn)) void
usage(void)
{
(void) fprintf(stderr,
"Usage: zhack [-c cachefile] [-d dir] <subcommand> <args> ...\n"
"where <subcommand> <args> is one of the following:\n"
"Usage: zhack [-o tunable] [-c cachefile] [-d dir] [-G] "
"<subcommand> <args> ...\n"
" where <subcommand> <args> is one of the following:\n"
"\n");
(void) fprintf(stderr,
" global options:\n"
" -c <cachefile> reads config from the given cachefile\n"
" -d <dir> directory with vdevs for import\n"
" -o var=value... set global variable to an unsigned "
"32-bit integer\n"
" -G dump zfs_dbgmsg buffer before exiting\n"
"\n"
" action idle <pool> [-f] [-t seconds]\n"
" import the pool for a set time then export it\n"
" -t <seconds> sets the time the pool is imported\n"
"\n"
" feature stat <pool>\n"
" print information about enabled features\n"
" feature enable [-r] [-d desc] <pool> <feature>\n"
@@ -93,10 +108,46 @@ usage(void)
" -c repair corrupted label checksums\n"
" -u restore the label on a detached device\n"
"\n"
" <device> : path to vdev\n");
" <device> : path to vdev\n"
"\n"
" metaslab leak <pool>\n"
" apply allocation map from zdb to specified pool\n");
exit(1);
}
static void
dump_debug_buffer(void)
{
ssize_t ret __attribute__((unused));
if (!g_dump_dbgmsg)
return;
/*
* We use write() instead of printf() so that this function
* is safe to call from a signal handler.
*/
ret = write(STDERR_FILENO, "\n", 1);
zfs_dbgmsg_print(STDERR_FILENO, "zhack");
}
static void sig_handler(int signo)
{
struct sigaction action;
libspl_backtrace(STDERR_FILENO);
dump_debug_buffer();
/*
* Restore default action and re-raise signal so SIGSEGV and
* SIGABRT can trigger a core dump.
*/
action.sa_handler = SIG_DFL;
sigemptyset(&action.sa_mask);
action.sa_flags = 0;
(void) sigaction(signo, &action, NULL);
raise(signo);
}
static __attribute__((format(printf, 3, 4))) __attribute__((noreturn)) void
fatal(spa_t *spa, const void *tag, const char *fmt, ...)
@@ -114,6 +165,8 @@ fatal(spa_t *spa, const void *tag, const char *fmt, ...)
va_end(ap);
(void) fputc('\n', stderr);
dump_debug_buffer();
exit(1);
}
@@ -169,7 +222,7 @@ zhack_import(char *target, boolean_t readonly)
zfeature_checks_disable = B_TRUE;
error = spa_import(target, config, props,
(readonly ? ZFS_IMPORT_SKIP_MMP : ZFS_IMPORT_NORMAL));
(readonly ? ZFS_IMPORT_SKIP_MMP : ZFS_IMPORT_NORMAL));
fnvlist_free(config);
zfeature_checks_disable = B_FALSE;
if (error == EEXIST)
@@ -500,6 +553,259 @@ zhack_do_feature(int argc, char **argv)
return (0);
}
static void
zhack_do_action_idle(int argc, char **argv)
{
spa_t *spa;
char *target, *tmp;
int idle_time = 0;
int c;
optind = 1;
while ((c = getopt(argc, argv, "+t:")) != -1) {
switch (c) {
case 't':
idle_time = strtol(optarg, &tmp, 0);
if (*tmp) {
(void) fprintf(stderr, "error: time must "
"be an integer in seconds: %s\n", tmp);
usage();
}
if (idle_time < 0) {
(void) fprintf(stderr, "error: time must "
"not be negative: %d\n", idle_time);
usage();
}
break;
default:
usage();
break;
}
}
argc -= optind;
argv += optind;
if (argc < 1) {
(void) fprintf(stderr, "error: missing pool name\n");
usage();
}
target = argv[0];
zhack_spa_open(target, B_FALSE, FTAG, &spa);
fprintf(stdout, "Imported pool %s, idle for %d seconds\n",
target, idle_time);
sleep(idle_time);
spa_close(spa, FTAG);
}
static int
zhack_do_action(int argc, char **argv)
{
char *subcommand;
argc--;
argv++;
if (argc == 0) {
(void) fprintf(stderr,
"error: no import operation specified\n");
usage();
}
subcommand = argv[0];
if (strcmp(subcommand, "idle") == 0) {
zhack_do_action_idle(argc, argv);
} else {
(void) fprintf(stderr, "error: unknown subcommand: %s\n",
subcommand);
usage();
}
return (0);
}
static boolean_t
strstarts(const char *a, const char *b)
{
return (strncmp(a, b, strlen(b)) == 0);
}
static void
metaslab_force_alloc(metaslab_t *msp, uint64_t start, uint64_t size,
dmu_tx_t *tx)
{
ASSERT(msp->ms_disabled);
ASSERT(MUTEX_HELD(&msp->ms_lock));
uint64_t txg = dmu_tx_get_txg(tx);
uint64_t off = start;
while (off < start + size) {
uint64_t ostart, osize;
boolean_t found = zfs_range_tree_find_in(msp->ms_allocatable,
off, start + size - off, &ostart, &osize);
if (!found)
break;
zfs_range_tree_remove(msp->ms_allocatable, ostart, osize);
if (zfs_range_tree_is_empty(msp->ms_allocating[txg & TXG_MASK]))
vdev_dirty(msp->ms_group->mg_vd, VDD_METASLAB, msp,
txg);
zfs_range_tree_add(msp->ms_allocating[txg & TXG_MASK], ostart,
osize);
msp->ms_allocating_total += osize;
off = ostart + osize;
}
}
static void
zhack_do_metaslab_leak(int argc, char **argv)
{
int c;
char *target;
spa_t *spa;
optind = 1;
boolean_t force = B_FALSE;
while ((c = getopt(argc, argv, "f")) != -1) {
switch (c) {
case 'f':
force = B_TRUE;
break;
default:
usage();
break;
}
}
argc -= optind;
argv += optind;
if (argc < 1) {
(void) fprintf(stderr, "error: missing pool name\n");
usage();
}
target = argv[0];
zhack_spa_open(target, B_FALSE, FTAG, &spa);
spa_config_enter(spa, SCL_VDEV | SCL_ALLOC, FTAG, RW_READER);
char *line = NULL;
size_t cap = 0;
vdev_t *vd = NULL;
metaslab_t *prev = NULL;
dmu_tx_t *tx = NULL;
while (getline(&line, &cap, stdin) > 0) {
if (strstarts(line, "\tvdev ")) {
uint64_t vdev_id, ms_shift;
if (sscanf(line,
"\tvdev %10"PRIu64"\t%*s metaslab shift %4"PRIu64,
&vdev_id, &ms_shift) == 1) {
VERIFY3U(sscanf(line, "\tvdev %"PRIu64
"\t metaslab shift %4"PRIu64,
&vdev_id, &ms_shift), ==, 2);
}
vd = vdev_lookup_top(spa, vdev_id);
if (vd == NULL) {
fprintf(stderr, "error: no such vdev with "
"id %"PRIu64"\n", vdev_id);
break;
}
if (tx) {
dmu_tx_commit(tx);
mutex_exit(&prev->ms_lock);
metaslab_enable(prev, B_FALSE, B_FALSE);
tx = NULL;
prev = NULL;
}
if (vd->vdev_ms_shift != ms_shift) {
fprintf(stderr, "error: ms_shift mismatch: %"
PRIu64" != %"PRIu64"\n", vd->vdev_ms_shift,
ms_shift);
break;
}
} else if (strstarts(line, "\tmetaslabs ")) {
uint64_t ms_count;
VERIFY3U(sscanf(line, "\tmetaslabs %"PRIu64, &ms_count),
==, 1);
ASSERT(vd);
if (!force && vd->vdev_ms_count != ms_count) {
fprintf(stderr, "error: ms_count mismatch: %"
PRIu64" != %"PRIu64"\n", vd->vdev_ms_count,
ms_count);
break;
}
} else if (strstarts(line, "ALLOC:")) {
uint64_t start, size;
VERIFY3U(sscanf(line, "ALLOC: %"PRIu64" %"PRIu64"\n",
&start, &size), ==, 2);
ASSERT(vd);
metaslab_t *cur =
vd->vdev_ms[start >> vd->vdev_ms_shift];
if (prev != cur) {
if (prev) {
dmu_tx_commit(tx);
mutex_exit(&prev->ms_lock);
metaslab_enable(prev, B_FALSE, B_FALSE);
}
ASSERT(cur);
metaslab_disable(cur);
mutex_enter(&cur->ms_lock);
metaslab_load(cur);
prev = cur;
tx = dmu_tx_create_dd(
spa_get_dsl(vd->vdev_spa)->dp_root_dir);
dmu_tx_assign(tx, DMU_TX_WAIT);
}
metaslab_force_alloc(cur, start, size, tx);
} else {
continue;
}
}
if (tx) {
dmu_tx_commit(tx);
mutex_exit(&prev->ms_lock);
metaslab_enable(prev, B_FALSE, B_FALSE);
tx = NULL;
prev = NULL;
}
if (line)
free(line);
spa_config_exit(spa, SCL_VDEV | SCL_ALLOC, FTAG);
spa_close(spa, FTAG);
}
static int
zhack_do_metaslab(int argc, char **argv)
{
char *subcommand;
argc--;
argv++;
if (argc == 0) {
(void) fprintf(stderr,
"error: no metaslab operation specified\n");
usage();
}
subcommand = argv[0];
if (strcmp(subcommand, "leak") == 0) {
zhack_do_metaslab_leak(argc, argv);
} else {
(void) fprintf(stderr, "error: unknown subcommand: %s\n",
subcommand);
usage();
}
return (0);
}
#define ASHIFT_UBERBLOCK_SHIFT(ashift) \
MIN(MAX(ashift, UBERBLOCK_SHIFT), \
MAX_UBERBLOCK_SHIFT)
@@ -975,17 +1281,35 @@ zhack_do_label(int argc, char **argv)
int
main(int argc, char **argv)
{
struct sigaction action;
char *path[MAX_NUM_PATHS];
const char *subcommand;
int rv = 0;
int c;
/*
* Set up signal handlers, so if we crash due to bad on-disk data we
* can get more info. Unlike ztest, we don't bail out if we can't set
* up signal handlers, because zhack is very useful without them.
*/
action.sa_handler = sig_handler;
sigemptyset(&action.sa_mask);
action.sa_flags = 0;
if (sigaction(SIGSEGV, &action, NULL) < 0) {
(void) fprintf(stderr, "zhack: cannot catch SIGSEGV: %s\n",
strerror(errno));
}
if (sigaction(SIGABRT, &action, NULL) < 0) {
(void) fprintf(stderr, "zhack: cannot catch SIGABRT: %s\n",
strerror(errno));
}
g_importargs.path = path;
dprintf_setup(&argc, argv);
zfs_prop_init();
while ((c = getopt(argc, argv, "+c:d:")) != -1) {
while ((c = getopt(argc, argv, "+c:d:Go:")) != -1) {
switch (c) {
case 'c':
g_importargs.cachefile = optarg;
@@ -994,6 +1318,13 @@ main(int argc, char **argv)
assert(g_importargs.paths < MAX_NUM_PATHS);
g_importargs.path[g_importargs.paths++] = optarg;
break;
case 'G':
g_dump_dbgmsg = B_TRUE;
break;
case 'o':
if (set_global_var(optarg) != 0)
exit(1);
break;
default:
usage();
break;
@@ -1011,10 +1342,14 @@ main(int argc, char **argv)
subcommand = argv[0];
if (strcmp(subcommand, "feature") == 0) {
if (strcmp(subcommand, "action") == 0) {
rv = zhack_do_action(argc, argv);
} else if (strcmp(subcommand, "feature") == 0) {
rv = zhack_do_feature(argc, argv);
} else if (strcmp(subcommand, "label") == 0) {
return (zhack_do_label(argc, argv));
} else if (strcmp(subcommand, "metaslab") == 0) {
rv = zhack_do_metaslab(argc, argv);
} else {
(void) fprintf(stderr, "error: unknown subcommand: %s\n",
subcommand);
@@ -1026,6 +1361,9 @@ main(int argc, char **argv)
"changes may not be committed to disk\n");
}
if (g_dump_dbgmsg)
dump_debug_buffer();
kernel_fini();
return (rv);
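
The `metaslab leak` subcommand above reads zdb output line by line, recognizes records by prefix, and maps each `ALLOC:` offset to a metaslab with `start >> vdev_ms_shift`. The parsing step might be sketched as below. `parse_alloc_line` is a hypothetical helper, not a zhack function, and the sketch uses the `SCNu64` scanf macro where the diff passes `PRIu64` to `sscanf`; it returns an error on malformed input instead of asserting with `VERIFY3U`.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Same prefix test as the strstarts() helper in the diff. */
static bool
strstarts(const char *a, const char *b)
{
	return (strncmp(a, b, strlen(b)) == 0);
}

/*
 * Hypothetical helper: parse "ALLOC: <start> <size>" and compute the
 * index of the metaslab containing <start>. Returns false if the line
 * is not an ALLOC record or is malformed.
 */
static bool
parse_alloc_line(const char *line, unsigned ms_shift, uint64_t *start,
    uint64_t *size, uint64_t *ms_index)
{
	assert(ms_shift < 64);

	if (!strstarts(line, "ALLOC:"))
		return (false);
	if (sscanf(line, "ALLOC: %" SCNu64 " %" SCNu64, start, size) != 2)
		return (false);

	/* Each metaslab covers 2^ms_shift bytes of the vdev. */
	*ms_index = *start >> ms_shift;
	return (true);
}
```

With a metaslab shift of 29 (512 MiB metaslabs), an offset of 1 GiB falls in metaslab index 2.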
+3
@@ -3883,6 +3883,9 @@ do_import(nvlist_t *config, const char *newname, const char *mntopts,
hostid, ctime(&timestamp));
}
if (getenv("ZFS_LOAD_INFO_DEBUG"))
dump_nvlist(nvinfo, 4);
return (1);
}
+33 -31
@@ -270,14 +270,13 @@ is_spare(nvlist_t *config, const char *path)
* draid* Virtual dRAID spare
*/
static nvlist_t *
make_leaf_vdev(nvlist_t *props, const char *arg, boolean_t is_primary)
make_leaf_vdev(const char *arg, boolean_t is_primary, uint64_t ashift)
{
char path[MAXPATHLEN];
struct stat64 statbuf;
nvlist_t *vdev = NULL;
const char *type = NULL;
boolean_t wholedisk = B_FALSE;
uint64_t ashift = 0;
int err;
/*
@@ -381,31 +380,6 @@ make_leaf_vdev(nvlist_t *props, const char *arg, boolean_t is_primary)
verify(nvlist_add_uint64(vdev, ZPOOL_CONFIG_WHOLE_DISK,
(uint64_t)wholedisk) == 0);
/*
* Override defaults if custom properties are provided.
*/
if (props != NULL) {
const char *value = NULL;
if (nvlist_lookup_string(props,
zpool_prop_to_name(ZPOOL_PROP_ASHIFT), &value) == 0) {
if (zfs_nicestrtonum(NULL, value, &ashift) != 0) {
(void) fprintf(stderr,
gettext("ashift must be a number.\n"));
return (NULL);
}
if (ashift != 0 &&
(ashift < ASHIFT_MIN || ashift > ASHIFT_MAX)) {
(void) fprintf(stderr,
gettext("invalid 'ashift=%" PRIu64 "' "
"property: only values between %" PRId32 " "
"and %" PRId32 " are allowed.\n"),
ashift, ASHIFT_MIN, ASHIFT_MAX);
return (NULL);
}
}
}
/*
* If the device is known to incorrectly report its physical sector
* size explicitly provide the known correct value.
@@ -1502,6 +1476,29 @@ construct_spec(nvlist_t *props, int argc, char **argv)
const char *type, *fulltype;
boolean_t is_log, is_special, is_dedup, is_spare;
boolean_t seen_logs;
uint64_t ashift = 0;
if (props != NULL) {
const char *value = NULL;
if (nvlist_lookup_string(props,
zpool_prop_to_name(ZPOOL_PROP_ASHIFT), &value) == 0) {
if (zfs_nicestrtonum(NULL, value, &ashift) != 0) {
(void) fprintf(stderr,
gettext("ashift must be a number.\n"));
return (NULL);
}
if (ashift != 0 &&
(ashift < ASHIFT_MIN || ashift > ASHIFT_MAX)) {
(void) fprintf(stderr,
gettext("invalid 'ashift=%" PRIu64 "' "
"property: only values between %" PRId32 " "
"and %" PRId32 " are allowed.\n"),
ashift, ASHIFT_MIN, ASHIFT_MAX);
return (NULL);
}
}
}
top = NULL;
toplevels = 0;
@@ -1608,9 +1605,9 @@ construct_spec(nvlist_t *props, int argc, char **argv)
children * sizeof (nvlist_t *));
if (child == NULL)
zpool_no_memory();
if ((nv = make_leaf_vdev(props, argv[c],
if ((nv = make_leaf_vdev(argv[c],
!(is_log || is_special || is_dedup ||
is_spare))) == NULL) {
is_spare), ashift)) == NULL) {
for (c = 0; c < children - 1; c++)
nvlist_free(child[c]);
free(child);
@@ -1674,6 +1671,10 @@ construct_spec(nvlist_t *props, int argc, char **argv)
ZPOOL_CONFIG_ALLOCATION_BIAS,
VDEV_ALLOC_BIAS_DEDUP) == 0);
}
if (ashift > 0) {
fnvlist_add_uint64(nv,
ZPOOL_CONFIG_ASHIFT, ashift);
}
if (strcmp(type, VDEV_TYPE_RAIDZ) == 0) {
verify(nvlist_add_uint64(nv,
ZPOOL_CONFIG_NPARITY,
@@ -1701,8 +1702,9 @@ construct_spec(nvlist_t *props, int argc, char **argv)
* We have a device. Pass off to make_leaf_vdev() to
* construct the appropriate nvlist describing the vdev.
*/
if ((nv = make_leaf_vdev(props, argv[0], !(is_log ||
is_special || is_dedup || is_spare))) == NULL)
if ((nv = make_leaf_vdev(argv[0], !(is_log ||
is_special || is_dedup || is_spare),
ashift)) == NULL)
goto spec_out;
verify(nvlist_add_uint64(nv,
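
The change above hoists the `ashift` lookup out of `make_leaf_vdev()` into `construct_spec()`, so the property is parsed and range-checked once and then passed down as a plain integer. The validation rule itself (0 means "auto", otherwise the value must lie within the allowed range) can be sketched in isolation. This sketch makes several assumptions: it uses `strtoull()` in place of `zfs_nicestrtonum()`, so it does not accept unit suffixes, and the bounds `EX_ASHIFT_MIN` and `EX_ASHIFT_MAX` are local placeholders set to 9 and 16 rather than the real `ASHIFT_MIN`/`ASHIFT_MAX` definitions. `parse_ashift` is a hypothetical name.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed bounds for illustration only. */
#define	EX_ASHIFT_MIN	9
#define	EX_ASHIFT_MAX	16

/*
 * Hypothetical helper: parse an ashift property value.
 * Returns 0 and stores the value on success, -1 on invalid input.
 * A value of 0 is accepted and means "let the vdev decide".
 */
static int
parse_ashift(const char *value, uint64_t *ashift)
{
	char *end;
	unsigned long long v;

	assert(value != NULL && ashift != NULL);

	errno = 0;
	v = strtoull(value, &end, 10);
	if (errno != 0 || end == value || *end != '\0')
		return (-1);	/* not a plain decimal number */
	if (v != 0 && (v < EX_ASHIFT_MIN || v > EX_ASHIFT_MAX))
		return (-1);	/* outside the allowed range */

	*ashift = v;
	return (0);
}
```

Validating once at the top of the spec parser means every leaf and every top-level vdev built afterwards sees the same, already-checked value.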
+3 -6
@@ -29,9 +29,8 @@ AC_DEFUN([ZFS_AC_KERNEL_SRC_BLKDEV_GET_BY_PATH_4ARG], [
const char *path = "path";
fmode_t mode = 0;
void *holder = NULL;
struct blk_holder_ops h;
bdev = blkdev_get_by_path(path, mode, holder, &h);
bdev = blkdev_get_by_path(path, mode, holder, NULL);
])
])
@@ -48,9 +47,8 @@ AC_DEFUN([ZFS_AC_KERNEL_SRC_BLKDEV_BDEV_OPEN_BY_PATH], [
const char *path = "path";
fmode_t mode = 0;
void *holder = NULL;
struct blk_holder_ops h;
bdh = bdev_open_by_path(path, mode, holder, &h);
bdh = bdev_open_by_path(path, mode, holder, NULL);
])
])
@@ -68,9 +66,8 @@ AC_DEFUN([ZFS_AC_KERNEL_SRC_BDEV_FILE_OPEN_BY_PATH], [
const char *path = "path";
fmode_t mode = 0;
void *holder = NULL;
struct blk_holder_ops h;
file = bdev_file_open_by_path(path, mode, holder, &h);
file = bdev_file_open_by_path(path, mode, holder, NULL);
])
])
+34
@@ -119,15 +119,49 @@ AC_DEFUN([ZFS_AC_KERNEL_BLOCK_DEVICE_OPERATIONS_REVALIDATE_DISK], [
])
])
dnl #
dnl # 6.18 API change
dnl # block_device_operations->getgeo takes struct gendisk* as first arg
dnl #
AC_DEFUN([ZFS_AC_KERNEL_SRC_BLOCK_DEVICE_OPERATIONS_GETGEO_GENDISK], [
ZFS_LINUX_TEST_SRC([block_device_operations_getgeo_gendisk], [
#include <linux/blkdev.h>
static int blk_getgeo(struct gendisk *disk, struct hd_geometry *geo)
{
(void) disk, (void) geo;
return (0);
}
static const struct block_device_operations
bops __attribute__ ((unused)) = {
.getgeo = blk_getgeo,
};
], [], [])
])
AC_DEFUN([ZFS_AC_KERNEL_BLOCK_DEVICE_OPERATIONS_GETGEO_GENDISK], [
AC_MSG_CHECKING([whether bops->getgeo() takes gendisk as first arg])
ZFS_LINUX_TEST_RESULT([block_device_operations_getgeo_gendisk], [
AC_MSG_RESULT(yes)
AC_DEFINE([HAVE_BLOCK_DEVICE_OPERATIONS_GETGEO_GENDISK], [1],
[Define if getgeo() in block_device_operations takes struct gendisk * as its first arg])
],[
AC_MSG_RESULT(no)
])
])
AC_DEFUN([ZFS_AC_KERNEL_SRC_BLOCK_DEVICE_OPERATIONS], [
ZFS_AC_KERNEL_SRC_BLOCK_DEVICE_OPERATIONS_CHECK_EVENTS
ZFS_AC_KERNEL_SRC_BLOCK_DEVICE_OPERATIONS_RELEASE_VOID
ZFS_AC_KERNEL_SRC_BLOCK_DEVICE_OPERATIONS_RELEASE_1ARG
ZFS_AC_KERNEL_SRC_BLOCK_DEVICE_OPERATIONS_REVALIDATE_DISK
ZFS_AC_KERNEL_SRC_BLOCK_DEVICE_OPERATIONS_GETGEO_GENDISK
])
AC_DEFUN([ZFS_AC_KERNEL_BLOCK_DEVICE_OPERATIONS], [
ZFS_AC_KERNEL_BLOCK_DEVICE_OPERATIONS_CHECK_EVENTS
ZFS_AC_KERNEL_BLOCK_DEVICE_OPERATIONS_RELEASE_VOID
ZFS_AC_KERNEL_BLOCK_DEVICE_OPERATIONS_REVALIDATE_DISK
ZFS_AC_KERNEL_BLOCK_DEVICE_OPERATIONS_GETGEO_GENDISK
])
+25 -2
@@ -46,14 +46,37 @@ AC_DEFUN([ZFS_AC_KERNEL_D_SET_D_OP], [
])
])
dnl #
dnl # 6.17 API change
dnl # sb->s_d_op removed; set_default_d_op(sb, dop) added
dnl #
AC_DEFUN([ZFS_AC_KERNEL_SRC_SET_DEFAULT_D_OP], [
ZFS_LINUX_TEST_SRC([set_default_d_op], [
#include <linux/dcache.h>
], [
set_default_d_op(NULL, NULL);
])
])
AC_DEFUN([ZFS_AC_KERNEL_SET_DEFAULT_D_OP], [
AC_MSG_CHECKING([whether set_default_d_op() is available])
ZFS_LINUX_TEST_RESULT([set_default_d_op], [
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_SET_DEFAULT_D_OP, 1,
[Define if set_default_d_op() is available])
], [
AC_MSG_RESULT(no)
])
])
AC_DEFUN([ZFS_AC_KERNEL_SRC_DENTRY], [
ZFS_AC_KERNEL_SRC_D_OBTAIN_ALIAS
ZFS_AC_KERNEL_SRC_D_SET_D_OP
ZFS_AC_KERNEL_SRC_S_D_OP
ZFS_AC_KERNEL_SRC_SET_DEFAULT_D_OP
])
AC_DEFUN([ZFS_AC_KERNEL_DENTRY], [
ZFS_AC_KERNEL_D_OBTAIN_ALIAS
ZFS_AC_KERNEL_D_SET_D_OP
ZFS_AC_KERNEL_S_D_OP
ZFS_AC_KERNEL_SET_DEFAULT_D_OP
])
+24
@@ -0,0 +1,24 @@
dnl #
dnl # 6.18 API change
dnl # - generic_drop_inode() renamed to inode_generic_drop()
dnl # - generic_delete_inode() renamed to inode_just_drop()
dnl #
AC_DEFUN([ZFS_AC_KERNEL_SRC_INODE_GENERIC_DROP], [
ZFS_LINUX_TEST_SRC([inode_generic_drop], [
#include <linux/fs.h>
],[
struct inode *ip = NULL;
inode_generic_drop(ip);
])
])
AC_DEFUN([ZFS_AC_KERNEL_INODE_GENERIC_DROP], [
AC_MSG_CHECKING([whether inode_generic_drop() exists])
ZFS_LINUX_TEST_RESULT([inode_generic_drop], [
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_INODE_GENERIC_DROP, 1,
[inode_generic_drop() exists])
],[
AC_MSG_RESULT(no)
])
])
+23
@@ -0,0 +1,23 @@
dnl #
dnl # 6.19 API change. inode->i_state no longer accessible directly; helper
dnl # functions exist.
dnl #
AC_DEFUN([ZFS_AC_KERNEL_SRC_INODE_STATE_READ_ONCE], [
ZFS_LINUX_TEST_SRC([inode_state_read_once], [
#include <linux/fs.h>
], [
struct inode i = {};
inode_state_read_once(&i);
],[])
])
AC_DEFUN([ZFS_AC_KERNEL_INODE_STATE_READ_ONCE], [
AC_MSG_CHECKING([whether inode_state_read_once() exists])
ZFS_LINUX_TEST_RESULT([inode_state_read_once], [
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_INODE_STATE_READ_ONCE, 1,
[inode_state_read_once() exists])
],[
AC_MSG_RESULT(no)
])
])
+1 -1
@@ -7,7 +7,7 @@ AC_DEFUN([ZFS_AC_KERNEL_SRC_KMAP_ATOMIC_ARGS], [
ZFS_LINUX_TEST_SRC([kmap_atomic], [
#include <linux/pagemap.h>
],[
struct page page;
struct page page = {};
kmap_atomic(&page);
])
])
+27
@@ -16,9 +16,36 @@ AC_DEFUN([ZFS_AC_KERNEL_MM_PAGE_FLAG_ERROR], [
])
])
dnl #
dnl # Linux 6.18+ uses a struct typedef (memdesc_flags_t) instead of an
dnl # 'unsigned long' for the 'flags' field in 'struct page'.
dnl #
AC_DEFUN([ZFS_AC_KERNEL_SRC_MM_PAGE_FLAGS_STRUCT], [
ZFS_LINUX_TEST_SRC([mm_page_flags_struct], [
#include <linux/mm.h>
static const struct page p __attribute__ ((unused)) = {
.flags = { .f = 0 }
};
],[])
])
AC_DEFUN([ZFS_AC_KERNEL_MM_PAGE_FLAGS_STRUCT], [
AC_MSG_CHECKING([whether 'flags' in 'struct page' is a struct])
ZFS_LINUX_TEST_RESULT([mm_page_flags_struct], [
AC_MSG_RESULT([yes])
AC_DEFINE(HAVE_MM_PAGE_FLAGS_STRUCT, 1,
['flags' in 'struct page' is a struct])
],[
AC_MSG_RESULT([no])
])
])
AC_DEFUN([ZFS_AC_KERNEL_SRC_MM_PAGE_FLAGS], [
ZFS_AC_KERNEL_SRC_MM_PAGE_FLAG_ERROR
ZFS_AC_KERNEL_SRC_MM_PAGE_FLAGS_STRUCT
])
AC_DEFUN([ZFS_AC_KERNEL_MM_PAGE_FLAGS], [
ZFS_AC_KERNEL_MM_PAGE_FLAG_ERROR
ZFS_AC_KERNEL_MM_PAGE_FLAGS_STRUCT
])
+31
@@ -0,0 +1,31 @@
dnl #
dnl # 6.18 API change
dnl # ns->ops->type was moved to ns->ns.ns_type (struct ns_common)
dnl #
AC_DEFUN([ZFS_AC_KERNEL_SRC_NS_COMMON_TYPE], [
ZFS_LINUX_TEST_SRC([ns_common_type], [
#include <linux/user_namespace.h>
],[
struct user_namespace ns;
ns.ns.ns_type = 0;
])
])
AC_DEFUN([ZFS_AC_KERNEL_NS_COMMON_TYPE], [
AC_MSG_CHECKING([whether ns_type is accessible through ns_common])
ZFS_LINUX_TEST_RESULT([ns_common_type], [
AC_MSG_RESULT(yes)
AC_DEFINE([HAVE_NS_COMMON_TYPE], 1,
[Define if ns_type is accessible through ns_common])
],[
AC_MSG_RESULT(no)
])
])
AC_DEFUN([ZFS_AC_KERNEL_SRC_NAMESPACE], [
ZFS_AC_KERNEL_SRC_NS_COMMON_TYPE
])
AC_DEFUN([ZFS_AC_KERNEL_NAMESPACE], [
ZFS_AC_KERNEL_NS_COMMON_TYPE
])
-79
@@ -1,79 +0,0 @@
dnl #
dnl # 2.6.38 API change
dnl # ns_capable() was introduced
dnl #
AC_DEFUN([ZFS_AC_KERNEL_SRC_NS_CAPABLE], [
ZFS_LINUX_TEST_SRC([ns_capable], [
#include <linux/capability.h>
],[
ns_capable((struct user_namespace *)NULL, CAP_SYS_ADMIN);
])
])
AC_DEFUN([ZFS_AC_KERNEL_NS_CAPABLE], [
AC_MSG_CHECKING([whether ns_capable exists])
ZFS_LINUX_TEST_RESULT([ns_capable], [
AC_MSG_RESULT(yes)
],[
ZFS_LINUX_TEST_ERROR([ns_capable()])
])
])
dnl #
dnl # 2.6.39 API change
dnl # struct user_namespace was added to struct cred_t as cred->user_ns member
dnl #
AC_DEFUN([ZFS_AC_KERNEL_SRC_CRED_USER_NS], [
ZFS_LINUX_TEST_SRC([cred_user_ns], [
#include <linux/cred.h>
],[
struct cred cr;
cr.user_ns = (struct user_namespace *)NULL;
])
])
AC_DEFUN([ZFS_AC_KERNEL_CRED_USER_NS], [
AC_MSG_CHECKING([whether cred_t->user_ns exists])
ZFS_LINUX_TEST_RESULT([cred_user_ns], [
AC_MSG_RESULT(yes)
],[
ZFS_LINUX_TEST_ERROR([cred_t->user_ns()])
])
])
dnl #
dnl # 3.4 API change
dnl # kuid_has_mapping() and kgid_has_mapping() were added to distinguish
dnl # between internal kernel uids/gids and user namespace uids/gids.
dnl #
AC_DEFUN([ZFS_AC_KERNEL_SRC_KUID_HAS_MAPPING], [
ZFS_LINUX_TEST_SRC([kuid_has_mapping], [
#include <linux/uidgid.h>
],[
kuid_has_mapping((struct user_namespace *)NULL, KUIDT_INIT(0));
kgid_has_mapping((struct user_namespace *)NULL, KGIDT_INIT(0));
])
])
AC_DEFUN([ZFS_AC_KERNEL_KUID_HAS_MAPPING], [
AC_MSG_CHECKING([whether kuid_has_mapping/kgid_has_mapping exist])
ZFS_LINUX_TEST_RESULT([kuid_has_mapping], [
AC_MSG_RESULT(yes)
],[
ZFS_LINUX_TEST_ERROR([kuid_has_mapping()])
])
])
AC_DEFUN([ZFS_AC_KERNEL_SRC_USERNS_CAPABILITIES], [
ZFS_AC_KERNEL_SRC_NS_CAPABLE
ZFS_AC_KERNEL_SRC_HAS_CAPABILITY
ZFS_AC_KERNEL_SRC_CRED_USER_NS
ZFS_AC_KERNEL_SRC_KUID_HAS_MAPPING
])
AC_DEFUN([ZFS_AC_KERNEL_USERNS_CAPABILITIES], [
ZFS_AC_KERNEL_NS_CAPABLE
ZFS_AC_KERNEL_HAS_CAPABILITY
ZFS_AC_KERNEL_CRED_USER_NS
ZFS_AC_KERNEL_KUID_HAS_MAPPING
])
+58
@@ -0,0 +1,58 @@
AC_DEFUN([ZFS_AC_KERNEL_SRC_WRITEPAGE_T], [
dnl #
dnl # 6.3 API change
dnl # The writepage_t function type now has its first argument as
dnl # struct folio* instead of struct page*
dnl #
ZFS_LINUX_TEST_SRC([writepage_t_folio], [
#include <linux/writeback.h>
static int putpage(struct folio *folio,
struct writeback_control *wbc, void *data)
{ return 0; }
writepage_t func = putpage;
],[])
])
AC_DEFUN([ZFS_AC_KERNEL_WRITEPAGE_T], [
AC_MSG_CHECKING([whether int (*writepage_t)() takes struct folio*])
ZFS_LINUX_TEST_RESULT([writepage_t_folio], [
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_WRITEPAGE_T_FOLIO, 1,
[int (*writepage_t)() takes struct folio*])
],[
AC_MSG_RESULT(no)
])
])
AC_DEFUN([ZFS_AC_KERNEL_SRC_WRITE_CACHE_PAGES], [
dnl #
dnl # 6.18 API change
dnl # write_cache_pages() has been removed.
dnl #
ZFS_LINUX_TEST_SRC([write_cache_pages], [
#include <linux/writeback.h>
], [
(void) write_cache_pages(NULL, NULL, NULL, NULL);
])
])
AC_DEFUN([ZFS_AC_KERNEL_WRITE_CACHE_PAGES], [
AC_MSG_CHECKING([whether write_cache_pages() is available])
ZFS_LINUX_TEST_RESULT([write_cache_pages], [
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_WRITE_CACHE_PAGES, 1,
[write_cache_pages() is available])
],[
AC_MSG_RESULT(no)
])
])
AC_DEFUN([ZFS_AC_KERNEL_SRC_WRITEBACK], [
ZFS_AC_KERNEL_SRC_WRITEPAGE_T
ZFS_AC_KERNEL_SRC_WRITE_CACHE_PAGES
])
AC_DEFUN([ZFS_AC_KERNEL_WRITEBACK], [
ZFS_AC_KERNEL_WRITEPAGE_T
ZFS_AC_KERNEL_WRITE_CACHE_PAGES
])
-26
@@ -1,26 +0,0 @@
AC_DEFUN([ZFS_AC_KERNEL_SRC_WRITEPAGE_T], [
dnl #
dnl # 6.3 API change
dnl # The writepage_t function type now has its first argument as
dnl # struct folio* instead of struct page*
dnl #
ZFS_LINUX_TEST_SRC([writepage_t_folio], [
#include <linux/writeback.h>
static int putpage(struct folio *folio,
struct writeback_control *wbc, void *data)
{ return 0; }
writepage_t func = putpage;
],[])
])
AC_DEFUN([ZFS_AC_KERNEL_WRITEPAGE_T], [
AC_MSG_CHECKING([whether int (*writepage_t)() takes struct folio*])
ZFS_LINUX_TEST_RESULT([writepage_t_folio], [
AC_MSG_RESULT(yes)
AC_DEFINE(HAVE_WRITEPAGE_T_FOLIO, 1,
[int (*writepage_t)() takes struct folio*])
],[
AC_MSG_RESULT(no)
])
])
+10 -2
@@ -59,6 +59,7 @@ AC_DEFUN([ZFS_AC_KERNEL_TEST_SRC], [
ZFS_AC_KERNEL_SRC_ACL
ZFS_AC_KERNEL_SRC_INODE_SETATTR
ZFS_AC_KERNEL_SRC_INODE_GETATTR
ZFS_AC_KERNEL_SRC_INODE_STATE_READ_ONCE
ZFS_AC_KERNEL_SRC_SHOW_OPTIONS
ZFS_AC_KERNEL_SRC_SHRINKER
ZFS_AC_KERNEL_SRC_MKDIR
@@ -70,6 +71,7 @@ AC_DEFUN([ZFS_AC_KERNEL_TEST_SRC], [
ZFS_AC_KERNEL_SRC_COMMIT_METADATA
ZFS_AC_KERNEL_SRC_SETATTR_PREPARE
ZFS_AC_KERNEL_SRC_INSERT_INODE_LOCKED
ZFS_AC_KERNEL_SRC_DENTRY
ZFS_AC_KERNEL_SRC_TRUNCATE_SETSIZE
ZFS_AC_KERNEL_SRC_SECURITY_INODE
ZFS_AC_KERNEL_SRC_FST_MOUNT
@@ -120,7 +122,7 @@ AC_DEFUN([ZFS_AC_KERNEL_TEST_SRC], [
ZFS_AC_KERNEL_SRC_IDMAP_MNT_API
ZFS_AC_KERNEL_SRC_IDMAP_NO_USERNS
ZFS_AC_KERNEL_SRC_IATTR_VFSID
ZFS_AC_KERNEL_SRC_WRITEPAGE_T
ZFS_AC_KERNEL_SRC_WRITEBACK
ZFS_AC_KERNEL_SRC_RECLAIMED
ZFS_AC_KERNEL_SRC_REGISTER_SYSCTL_TABLE
ZFS_AC_KERNEL_SRC_REGISTER_SYSCTL_SZ
@@ -135,6 +137,8 @@ AC_DEFUN([ZFS_AC_KERNEL_TEST_SRC], [
ZFS_AC_KERNEL_SRC_TIMER
ZFS_AC_KERNEL_SRC_SUPER_BLOCK_S_WB_ERR
ZFS_AC_KERNEL_SRC_SOPS_FREE_INODE
ZFS_AC_KERNEL_SRC_NAMESPACE
ZFS_AC_KERNEL_SRC_INODE_GENERIC_DROP
case "$host_cpu" in
powerpc*)
ZFS_AC_KERNEL_SRC_CPU_HAS_FEATURE
@@ -177,6 +181,7 @@ AC_DEFUN([ZFS_AC_KERNEL_TEST_RESULT], [
ZFS_AC_KERNEL_ACL
ZFS_AC_KERNEL_INODE_SETATTR
ZFS_AC_KERNEL_INODE_GETATTR
ZFS_AC_KERNEL_INODE_STATE_READ_ONCE
ZFS_AC_KERNEL_SHOW_OPTIONS
ZFS_AC_KERNEL_SHRINKER
ZFS_AC_KERNEL_MKDIR
@@ -188,6 +193,7 @@ AC_DEFUN([ZFS_AC_KERNEL_TEST_RESULT], [
ZFS_AC_KERNEL_COMMIT_METADATA
ZFS_AC_KERNEL_SETATTR_PREPARE
ZFS_AC_KERNEL_INSERT_INODE_LOCKED
ZFS_AC_KERNEL_DENTRY
ZFS_AC_KERNEL_TRUNCATE_SETSIZE
ZFS_AC_KERNEL_SECURITY_INODE
ZFS_AC_KERNEL_FST_MOUNT
@@ -238,7 +244,7 @@ AC_DEFUN([ZFS_AC_KERNEL_TEST_RESULT], [
ZFS_AC_KERNEL_IDMAP_MNT_API
ZFS_AC_KERNEL_IDMAP_NO_USERNS
ZFS_AC_KERNEL_IATTR_VFSID
ZFS_AC_KERNEL_WRITEPAGE_T
ZFS_AC_KERNEL_WRITEBACK
ZFS_AC_KERNEL_RECLAIMED
ZFS_AC_KERNEL_REGISTER_SYSCTL_TABLE
ZFS_AC_KERNEL_REGISTER_SYSCTL_SZ
@@ -254,6 +260,8 @@ AC_DEFUN([ZFS_AC_KERNEL_TEST_RESULT], [
ZFS_AC_KERNEL_TIMER
ZFS_AC_KERNEL_SUPER_BLOCK_S_WB_ERR
ZFS_AC_KERNEL_SOPS_FREE_INODE
ZFS_AC_KERNEL_NAMESPACE
ZFS_AC_KERNEL_INODE_GENERIC_DROP
case "$host_cpu" in
powerpc*)
ZFS_AC_KERNEL_CPU_HAS_FEATURE
+3 -3
@@ -2,7 +2,7 @@ dnl #
dnl # Check for statx() function and STATX_MNT_ID availability
dnl #
AC_DEFUN([ZFS_AC_CONFIG_USER_STATX], [
AC_CHECK_HEADERS([linux/stat.h],
AC_CHECK_HEADERS([sys/stat.h],
[have_stat_headers=yes],
[have_stat_headers=no])
@@ -14,7 +14,7 @@ AC_DEFUN([ZFS_AC_CONFIG_USER_STATX], [
AC_MSG_CHECKING([for STATX_MNT_ID])
AC_COMPILE_IFELSE([
AC_LANG_PROGRAM([[
#include <linux/stat.h>
#include <sys/stat.h>
]], [[
struct statx stx;
int mask = STATX_MNT_ID;
@@ -29,6 +29,6 @@ AC_DEFUN([ZFS_AC_CONFIG_USER_STATX], [
])
])
], [
AC_MSG_WARN([linux/stat.h not found; skipping statx support])
AC_MSG_WARN([sys/stat.h not found; skipping statx support])
])
]) dnl end AC_DEFUN
-1
@@ -604,5 +604,4 @@ class RaidzExpansionRunning(ZFSError):
errno = ZFS_ERR_RAIDZ_EXPAND_IN_PROGRESS
message = "A raidz device is currently expanding"
# vim: softtabstop=4 tabstop=4 expandtab shiftwidth=4
+4 -1
@@ -8,7 +8,9 @@ usage()
exit 1
}
[ "$#" -eq 1 ] || usage
if ! [ -d "$1" ] ; then
usage
fi
KERNEL_DIR="$1"
if ! [ -e 'zfs_config.h' ]
@@ -31,6 +33,7 @@ cat > "$KERNEL_DIR/fs/zfs/Kconfig" <<EOF
config ZFS
tristate "ZFS filesystem support"
depends on EFI_PARTITION
depends on BLOCK
select ZLIB_INFLATE
select ZLIB_DEFLATE
help
+3
@@ -104,6 +104,9 @@
#define spa_taskq_write_param_set_args(var) \
CTLTYPE_STRING, NULL, 0, spa_taskq_write_param, "A"
#define spa_taskq_free_param_set_args(var) \
CTLTYPE_STRING, NULL, 0, spa_taskq_free_param, "A"
#define fletcher_4_param_set_args(var) \
CTLTYPE_STRING, NULL, 0, fletcher_4_param, "A"
+4 -73
@@ -290,80 +290,11 @@ extern unsigned char bcd_to_byte[256];
#define offsetof(type, field) __offsetof(type, field)
#endif
/*
* Find highest one bit set.
* Returns bit number + 1 of highest bit that is set, otherwise returns 0.
* High order bit is 31 (or 63 in _LP64 kernel).
*/
static __inline int
highbit(ulong_t i)
{
#if defined(HAVE_INLINE_FLSL)
return (flsl(i));
#else
int h = 1;
#define highbit(x) flsl(x)
#define lowbit(x) ffsl(x)
if (i == 0)
return (0);
#ifdef _LP64
if (i & 0xffffffff00000000ul) {
h += 32; i >>= 32;
}
#endif
if (i & 0xffff0000) {
h += 16; i >>= 16;
}
if (i & 0xff00) {
h += 8; i >>= 8;
}
if (i & 0xf0) {
h += 4; i >>= 4;
}
if (i & 0xc) {
h += 2; i >>= 2;
}
if (i & 0x2) {
h += 1;
}
return (h);
#endif
}
/*
* Find highest one bit set.
* Returns bit number + 1 of highest bit that is set, otherwise returns 0.
*/
static __inline int
highbit64(uint64_t i)
{
#if defined(HAVE_INLINE_FLSLL)
return (flsll(i));
#else
int h = 1;
if (i == 0)
return (0);
if (i & 0xffffffff00000000ULL) {
h += 32; i >>= 32;
}
if (i & 0xffff0000) {
h += 16; i >>= 16;
}
if (i & 0xff00) {
h += 8; i >>= 8;
}
if (i & 0xf0) {
h += 4; i >>= 4;
}
if (i & 0xc) {
h += 2; i >>= 2;
}
if (i & 0x2) {
h += 1;
}
return (h);
#endif
}
#define highbit64(x) flsll(x)
#define lowbit64(x) ffsll(x)
#ifdef __cplusplus
}
+15 -4
@@ -34,6 +34,17 @@
#define d_alias d_u.d_alias
#ifdef HAVE_MM_PAGE_FLAGS_STRUCT
/*
* Starting from Linux 6.18, the 'flags' field in 'struct page' is defined
* as a struct ('memdesc_flags_t' typedef) instead of an unsigned long for
* improved type safety.
*/
#define page_flags flags.f
#else
#define page_flags flags
#endif
/*
* Starting from Linux 5.13, flush_dcache_page() becomes an inline function
* and, under some configurations, may indirectly reference GPL-only
@@ -44,8 +55,8 @@
#include <linux/simd_powerpc.h>
#define flush_dcache_page(page) do { \
if (!cpu_has_feature(CPU_FTR_COHERENT_ICACHE) && \
test_bit(PG_dcache_clean, &(page)->flags)) \
clear_bit(PG_dcache_clean, &(page)->flags); \
test_bit(PG_dcache_clean, &(page)->page_flags)) \
clear_bit(PG_dcache_clean, &(page)->page_flags);\
} while (0)
#endif
/*
@@ -55,8 +66,8 @@
*/
#if defined __riscv && defined HAVE_FLUSH_DCACHE_PAGE_GPL_ONLY
#define flush_dcache_page(page) do { \
if (test_bit(PG_dcache_clean, &(page)->flags)) \
clear_bit(PG_dcache_clean, &(page)->flags); \
if (test_bit(PG_dcache_clean, &(page)->page_flags)) \
clear_bit(PG_dcache_clean, &(page)->page_flags);\
} while (0)
#endif
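
The `page_flags` macro above lets the same source line compile whether `flags` is a plain `unsigned long` or a struct wrapping one, by aliasing the member path at preprocessing time. A minimal userspace sketch of that trick follows. Everything prefixed with `ex_` or `EX_` is hypothetical and only mirrors the shape of the kernel types; the sketch uses plain bit arithmetic instead of the kernel's `test_bit()`/`clear_bit()`.

```c
#include <assert.h>

/* Pretend the configure check found the struct-typed 'flags' field. */
#define	EX_HAVE_FLAGS_STRUCT	1

#ifdef EX_HAVE_FLAGS_STRUCT
typedef struct { unsigned long f; } ex_flags_t;
struct ex_page { ex_flags_t flags; };
#define	ex_page_flags	flags.f
#else
struct ex_page { unsigned long flags; };
#define	ex_page_flags	flags
#endif

/*
 * Clear a flag bit and report whether it was set. The member access
 * expands to page->flags.f or page->flags depending on the layout.
 */
static int
ex_test_and_clear(struct ex_page *page, unsigned bit)
{
	assert(bit < sizeof (unsigned long) * 8);

	unsigned long mask = 1UL << bit;
	int was_set = (page->ex_page_flags & mask) != 0;

	page->ex_page_flags &= ~mask;
	return (was_set);
}
```

Removing the `EX_HAVE_FLAGS_STRUCT` definition switches the sketch to the older layout without touching `ex_test_and_clear()`.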
@@ -23,6 +23,7 @@
/*
* Copyright (C) 2011 Lawrence Livermore National Security, LLC.
* Copyright (C) 2015 Jörg Thalheim.
* Copyright (c) 2025, Rob Norris <robn@despairlabs.com>
*/
#ifndef _ZFS_VFS_H
@@ -262,4 +263,18 @@ zpl_is_32bit_api(void)
#define zpl_generic_fillattr(user_ns, ip, sp) generic_fillattr(ip, sp)
#endif
#ifdef HAVE_INODE_GENERIC_DROP
/* 6.18 API change. These were renamed, alias the old names to the new. */
#define generic_delete_inode(ip) inode_just_drop(ip)
#define generic_drop_inode(ip) inode_generic_drop(ip)
#endif
#ifndef HAVE_INODE_STATE_READ_ONCE
/*
* 6.19 API change. We should no longer access i_state directly. If the new
* helper function doesn't exist, define our own.
*/
#define inode_state_read_once(ip) READ_ONCE(ip->i_state)
#endif
#endif /* _ZFS_VFS_H */
+1 -1
@@ -25,6 +25,6 @@
#ifndef _SPL_STAT_H
#define _SPL_STAT_H
#include <linux/stat.h>
#include <sys/stat.h>
#endif /* SPL_STAT_H */
+1
@@ -55,6 +55,7 @@ extern const struct file_operations zpl_dir_file_operations;
extern void zpl_prune_sb(uint64_t nr_to_scan, void *arg);
extern const struct super_operations zpl_super_operations;
extern const struct dentry_operations zpl_dentry_operations;
extern const struct export_operations zpl_export_operations;
extern struct file_system_type zpl_fs_type;
+4
@@ -863,6 +863,10 @@ typedef struct zpool_load_policy {
#define ZPOOL_CONFIG_MMP_SEQ "mmp_seq" /* not stored on disk */
#define ZPOOL_CONFIG_MMP_HOSTNAME "mmp_hostname" /* not stored on disk */
#define ZPOOL_CONFIG_MMP_HOSTID "mmp_hostid" /* not stored on disk */
#define ZPOOL_CONFIG_MMP_RESULT "mmp_result" /* not stored on disk */
#define ZPOOL_CONFIG_MMP_TRYIMPORT_NS "mmp_tryimport_ns" /* not stored */
#define ZPOOL_CONFIG_MMP_IMPORT_NS "mmp_import_ns" /* not stored on disk */
#define ZPOOL_CONFIG_MMP_CLAIM_NS "mmp_claim_ns" /* not stored on disk */
#define ZPOOL_CONFIG_ALLOCATION_BIAS "alloc_bias" /* not stored on disk */
#define ZPOOL_CONFIG_EXPANSION_TIME "expansion_time" /* not stored */
#define ZPOOL_CONFIG_REBUILD_STATS "org.openzfs:rebuild_stats"
+5
@@ -33,6 +33,7 @@ extern "C" {
#define MMP_DEFAULT_IMPORT_INTERVALS 20
#define MMP_DEFAULT_FAIL_INTERVALS 10
#define MMP_MIN_FAIL_INTERVALS 2 /* min if != 0 */
#define MMP_IMPORT_VERIFY_ITERS 10
#define MMP_IMPORT_SAFETY_FACTOR 200 /* pct */
#define MMP_INTERVAL_OK(interval) MAX(interval, MMP_MIN_INTERVAL)
#define MMP_FAIL_INTVS_OK(fails) (fails == 0 ? 0 : MAX(fails, \
@@ -53,6 +54,9 @@ typedef struct mmp_thread {
vdev_t *mmp_last_leaf; /* last mmp write sent here */
uint64_t mmp_leaf_last_gen; /* last mmp write sent here */
uint32_t mmp_seq; /* intra-second update counter */
uint64_t mmp_tryimport_ns; /* tryimport activity check time */
uint64_t mmp_import_ns; /* import activity check time */
uint64_t mmp_claim_ns; /* claim activity check time */
} mmp_thread_t;
@@ -62,6 +66,7 @@ extern void mmp_thread_start(struct spa *spa);
extern void mmp_thread_stop(struct spa *spa);
extern void mmp_update_uberblock(struct spa *spa, struct uberblock *ub);
extern void mmp_signal_all_threads(void);
extern int mmp_claim_uberblock(spa_t *spa, vdev_t *vd, uberblock_t *ub);
/* Global tuning */
extern int param_set_multihost_interval(ZFS_MODULE_PARAM_ARGS);
+1
@@ -1044,6 +1044,7 @@ extern void spa_set_rootblkptr(spa_t *spa, const blkptr_t *bp);
extern void spa_altroot(spa_t *, char *, size_t);
extern uint32_t spa_sync_pass(spa_t *spa);
extern char *spa_name(spa_t *spa);
extern char *spa_load_name(spa_t *spa);
extern uint64_t spa_guid(spa_t *spa);
extern uint64_t spa_load_guid(spa_t *spa);
extern uint64_t spa_last_synced_txg(spa_t *spa);
+2
@@ -224,6 +224,7 @@ struct spa {
* Fields protected by spa_namespace_lock.
*/
char spa_name[ZFS_MAX_DATASET_NAME_LEN]; /* pool name */
char *spa_load_name; /* unmodified pool name */
char *spa_comment; /* comment */
avl_node_t spa_avl; /* node in spa_namespace_avl */
nvlist_t *spa_config; /* last synced config */
@@ -303,6 +304,7 @@ struct spa {
void *spa_cksum_tmpls[ZIO_CHECKSUM_FUNCTIONS];
uberblock_t spa_ubsync; /* last synced uberblock */
uberblock_t spa_uberblock; /* current uberblock */
boolean_t spa_activity_check; /* activity check required */
boolean_t spa_extreme_rewind; /* rewind past deferred frees */
kmutex_t spa_scrub_lock; /* resilver/scrub lock */
uint64_t spa_scrub_inflight; /* in-flight scrub bytes */
+16 -6
@@ -51,6 +51,12 @@ extern "C" {
#define MMP_SEQ_VALID_BIT 0x02
#define MMP_FAIL_INT_VALID_BIT 0x04
#define MMP_INTERVAL_MASK 0x00000000FFFFFF00
#define MMP_SEQ_MASK 0x0000FFFF00000000
#define MMP_FAIL_INT_MASK 0xFFFF000000000000
#define MMP_SEQ_MAX UINT16_MAX
#define MMP_VALID(ubp) ((ubp)->ub_magic == UBERBLOCK_MAGIC && \
(ubp)->ub_mmp_magic == MMP_MAGIC)
#define MMP_INTERVAL_VALID(ubp) (MMP_VALID(ubp) && ((ubp)->ub_mmp_config & \
@@ -60,21 +66,25 @@ extern "C" {
#define MMP_FAIL_INT_VALID(ubp) (MMP_VALID(ubp) && ((ubp)->ub_mmp_config & \
MMP_FAIL_INT_VALID_BIT))
#define MMP_INTERVAL(ubp) (((ubp)->ub_mmp_config & 0x00000000FFFFFF00) \
#define MMP_INTERVAL(ubp) (((ubp)->ub_mmp_config & MMP_INTERVAL_MASK) \
>> 8)
#define MMP_SEQ(ubp) (((ubp)->ub_mmp_config & 0x0000FFFF00000000) \
#define MMP_SEQ(ubp) (((ubp)->ub_mmp_config & MMP_SEQ_MASK) \
>> 32)
#define MMP_FAIL_INT(ubp) (((ubp)->ub_mmp_config & 0xFFFF000000000000) \
#define MMP_FAIL_INT(ubp) (((ubp)->ub_mmp_config & MMP_FAIL_INT_MASK) \
>> 48)
#define MMP_INTERVAL_SET(write) \
(((uint64_t)(write & 0xFFFFFF) << 8) | MMP_INTERVAL_VALID_BIT)
(((uint64_t)((write) & 0xFFFFFF) << 8) | MMP_INTERVAL_VALID_BIT)
#define MMP_SEQ_SET(seq) \
(((uint64_t)(seq & 0xFFFF) << 32) | MMP_SEQ_VALID_BIT)
(((uint64_t)((seq) & 0xFFFF) << 32) | MMP_SEQ_VALID_BIT)
#define MMP_FAIL_INT_SET(fail) \
(((uint64_t)(fail & 0xFFFF) << 48) | MMP_FAIL_INT_VALID_BIT)
(((uint64_t)((fail) & 0xFFFF) << 48) | MMP_FAIL_INT_VALID_BIT)
#define MMP_SEQ_CLEAR(ubp) \
((ubp)->ub_mmp_config &= ~(MMP_SEQ_MASK | MMP_SEQ_VALID_BIT))
/*
* RAIDZ expansion reflow information.
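The MMP macros in the hunk above pack three fields plus their valid bits into the 64-bit `ub_mmp_config` word. The following is a minimal standalone sketch of that layout, not the header itself: `demo_uberblock_t` and `demo_pack()` are hypothetical names, `MMP_INTERVAL_VALID_BIT` is assumed to be `0x01` (it is not shown in this hunk), the `MMP_VALID()` magic checks are left out, and `ULL` suffixes are added to the masks for portability.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value: this bit is defined outside the hunk shown above. */
#define	MMP_INTERVAL_VALID_BIT	0x01
#define	MMP_SEQ_VALID_BIT	0x02
#define	MMP_FAIL_INT_VALID_BIT	0x04

#define	MMP_INTERVAL_MASK	0x00000000FFFFFF00ULL
#define	MMP_SEQ_MASK		0x0000FFFF00000000ULL
#define	MMP_FAIL_INT_MASK	0xFFFF000000000000ULL
#define	MMP_SEQ_MAX		UINT16_MAX

/* Hypothetical stand-in for the one uberblock field used here. */
typedef struct demo_uberblock {
	uint64_t ub_mmp_config;
} demo_uberblock_t;

#define	MMP_INTERVAL(ubp) (((ubp)->ub_mmp_config & MMP_INTERVAL_MASK) >> 8)
#define	MMP_SEQ(ubp)	  (((ubp)->ub_mmp_config & MMP_SEQ_MASK) >> 32)
#define	MMP_FAIL_INT(ubp) (((ubp)->ub_mmp_config & MMP_FAIL_INT_MASK) >> 48)

/* Arguments are parenthesized so expressions such as (a | b) mask whole. */
#define	MMP_INTERVAL_SET(write) \
	(((uint64_t)((write) & 0xFFFFFF) << 8) | MMP_INTERVAL_VALID_BIT)
#define	MMP_SEQ_SET(seq) \
	(((uint64_t)((seq) & 0xFFFF) << 32) | MMP_SEQ_VALID_BIT)
#define	MMP_FAIL_INT_SET(fail) \
	(((uint64_t)((fail) & 0xFFFF) << 48) | MMP_FAIL_INT_VALID_BIT)

/* Clears both the sequence field and its valid bit. */
#define	MMP_SEQ_CLEAR(ubp) \
	((ubp)->ub_mmp_config &= ~(MMP_SEQ_MASK | MMP_SEQ_VALID_BIT))

/* Hypothetical helper: build a config word with all three fields set. */
static uint64_t
demo_pack(uint64_t interval_ms, uint64_t seq, uint64_t fail_intervals)
{
	assert(seq <= MMP_SEQ_MAX);
	return (MMP_INTERVAL_SET(interval_ms) | MMP_SEQ_SET(seq) |
	    MMP_FAIL_INT_SET(fail_intervals));
}
```

With the older unparenthesized form, `MMP_SEQ_SET(a | b)` would have expanded to `a | b & 0xFFFF`, which masks only `b`. The parenthesized versions avoid that.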
+2
@@ -212,6 +212,8 @@ extern void vdev_label_write(zio_t *zio, vdev_t *vd, int l, abd_t *buf, uint64_t
extern int vdev_label_read_bootenv(vdev_t *, nvlist_t *);
extern int vdev_label_write_bootenv(vdev_t *, nvlist_t *);
extern int vdev_uberblock_sync_list(vdev_t **, int, struct uberblock *, int);
extern int vdev_uberblock_compare(const struct uberblock *,
const struct uberblock *);
extern int vdev_check_boot_reserve(spa_t *, vdev_t *);
typedef enum {
+1 -1
@@ -46,7 +46,7 @@ void zfs_file_close(zfs_file_t *fp);
int zfs_file_write(zfs_file_t *fp, const void *buf, size_t len, ssize_t *resid);
int zfs_file_pwrite(zfs_file_t *fp, const void *buf, size_t len, loff_t off,
ssize_t *resid);
uint8_t ashift, ssize_t *resid);
int zfs_file_read(zfs_file_t *fp, void *buf, size_t len, ssize_t *resid);
int zfs_file_pread(zfs_file_t *fp, void *buf, size_t len, loff_t off,
ssize_t *resid);
+1 -1
@@ -33,7 +33,7 @@
#ifdef HAVE_STATX
#include <fcntl.h>
#include <linux/stat.h>
#include <sys/stat.h>
#endif
/*
+5
@@ -2238,6 +2238,11 @@ zpool_import_props(libzfs_handle_t *hdl, nvlist_t *config, const char *newname,
zpool_get_load_policy(config, &policy);
if (getenv("ZFS_LOAD_INFO_DEBUG") && nv != NULL &&
nvlist_lookup_nvlist(nv, ZPOOL_CONFIG_LOAD_INFO, &nvinfo) == 0) {
dump_nvlist(nvinfo, 4);
}
if (error) {
char desc[1024];
char aux[256];
+43 -35
@@ -98,57 +98,57 @@ static const char *const zfs_msgid_table[] = {
#define NMSGID (sizeof (zfs_msgid_table) / sizeof (zfs_msgid_table[0]))
static int
vdev_missing(vdev_stat_t *vs, uint_t vsc)
vdev_missing(vdev_stat_t *vs, uint_t vsc, void *arg)
{
(void) vsc;
(void) vsc, (void) arg;
return (vs->vs_state == VDEV_STATE_CANT_OPEN &&
vs->vs_aux == VDEV_AUX_OPEN_FAILED);
}
static int
vdev_faulted(vdev_stat_t *vs, uint_t vsc)
vdev_faulted(vdev_stat_t *vs, uint_t vsc, void *arg)
{
(void) vsc;
(void) vsc, (void) arg;
return (vs->vs_state == VDEV_STATE_FAULTED);
}
static int
vdev_errors(vdev_stat_t *vs, uint_t vsc)
vdev_errors(vdev_stat_t *vs, uint_t vsc, void *arg)
{
(void) vsc;
(void) vsc, (void) arg;
return (vs->vs_state == VDEV_STATE_DEGRADED ||
vs->vs_read_errors != 0 || vs->vs_write_errors != 0 ||
vs->vs_checksum_errors != 0);
}
static int
vdev_broken(vdev_stat_t *vs, uint_t vsc)
vdev_broken(vdev_stat_t *vs, uint_t vsc, void *arg)
{
(void) vsc;
(void) vsc, (void) arg;
return (vs->vs_state == VDEV_STATE_CANT_OPEN);
}
static int
vdev_offlined(vdev_stat_t *vs, uint_t vsc)
vdev_offlined(vdev_stat_t *vs, uint_t vsc, void *arg)
{
(void) vsc;
(void) vsc, (void) arg;
return (vs->vs_state == VDEV_STATE_OFFLINE);
}
static int
vdev_removed(vdev_stat_t *vs, uint_t vsc)
vdev_removed(vdev_stat_t *vs, uint_t vsc, void *arg)
{
(void) vsc;
(void) vsc, (void) arg;
return (vs->vs_state == VDEV_STATE_REMOVED);
}
static int
vdev_non_native_ashift(vdev_stat_t *vs, uint_t vsc)
vdev_non_native_ashift(vdev_stat_t *vs, uint_t vsc, void *arg)
{
if (getenv("ZPOOL_STATUS_NON_NATIVE_ASHIFT_IGNORE") != NULL)
return (0);
uint64_t ashift = *(uint64_t *)arg;
return (VDEV_STAT_VALID(vs_physical_ashift, vsc) &&
(ashift == 0 || vs->vs_configured_ashift < ashift) &&
vs->vs_configured_ashift < vs->vs_physical_ashift);
}
@@ -156,8 +156,8 @@ vdev_non_native_ashift(vdev_stat_t *vs, uint_t vsc)
* Detect if any leaf devices have seen errors or could not be opened.
*/
static boolean_t
find_vdev_problem(nvlist_t *vdev, int (*func)(vdev_stat_t *, uint_t),
boolean_t ignore_replacing)
find_vdev_problem(nvlist_t *vdev, int (*func)(vdev_stat_t *, uint_t, void *),
void *arg, boolean_t ignore_replacing)
{
nvlist_t **child;
uint_t c, children;
@@ -177,14 +177,16 @@ find_vdev_problem(nvlist_t *vdev, int (*func)(vdev_stat_t *, uint_t),
if (nvlist_lookup_nvlist_array(vdev, ZPOOL_CONFIG_CHILDREN, &child,
&children) == 0) {
for (c = 0; c < children; c++)
if (find_vdev_problem(child[c], func, ignore_replacing))
for (c = 0; c < children; c++) {
if (find_vdev_problem(child[c], func, arg,
ignore_replacing))
return (B_TRUE);
}
} else {
uint_t vsc;
vdev_stat_t *vs = (vdev_stat_t *)fnvlist_lookup_uint64_array(
vdev, ZPOOL_CONFIG_VDEV_STATS, &vsc);
if (func(vs, vsc) != 0)
if (func(vs, vsc, arg) != 0)
return (B_TRUE);
}
@@ -193,9 +195,11 @@ find_vdev_problem(nvlist_t *vdev, int (*func)(vdev_stat_t *, uint_t),
*/
if (nvlist_lookup_nvlist_array(vdev, ZPOOL_CONFIG_L2CACHE, &child,
&children) == 0) {
for (c = 0; c < children; c++)
if (find_vdev_problem(child[c], func, ignore_replacing))
for (c = 0; c < children; c++) {
if (find_vdev_problem(child[c], func, arg,
ignore_replacing))
return (B_TRUE);
}
}
return (B_FALSE);
@@ -220,7 +224,7 @@ find_vdev_problem(nvlist_t *vdev, int (*func)(vdev_stat_t *, uint_t),
*/
static zpool_status_t
check_status(nvlist_t *config, boolean_t isimport,
zpool_errata_t *erratap, const char *compat)
zpool_errata_t *erratap, const char *compat, uint64_t ashift)
{
pool_scan_stat_t *ps = NULL;
uint_t vsc, psc;
@@ -371,15 +375,15 @@ check_status(nvlist_t *config, boolean_t isimport,
* Bad devices in non-replicated config.
*/
if (vs->vs_state == VDEV_STATE_CANT_OPEN &&
find_vdev_problem(nvroot, vdev_faulted, B_TRUE))
find_vdev_problem(nvroot, vdev_faulted, NULL, B_TRUE))
return (ZPOOL_STATUS_FAULTED_DEV_NR);
if (vs->vs_state == VDEV_STATE_CANT_OPEN &&
find_vdev_problem(nvroot, vdev_missing, B_TRUE))
find_vdev_problem(nvroot, vdev_missing, NULL, B_TRUE))
return (ZPOOL_STATUS_MISSING_DEV_NR);
if (vs->vs_state == VDEV_STATE_CANT_OPEN &&
find_vdev_problem(nvroot, vdev_broken, B_TRUE))
find_vdev_problem(nvroot, vdev_broken, NULL, B_TRUE))
return (ZPOOL_STATUS_CORRUPT_LABEL_NR);
/*
@@ -402,35 +406,37 @@ check_status(nvlist_t *config, boolean_t isimport,
/*
* Missing devices in a replicated config.
*/
if (find_vdev_problem(nvroot, vdev_faulted, B_TRUE))
if (find_vdev_problem(nvroot, vdev_faulted, NULL, B_TRUE))
return (ZPOOL_STATUS_FAULTED_DEV_R);
if (find_vdev_problem(nvroot, vdev_missing, B_TRUE))
if (find_vdev_problem(nvroot, vdev_missing, NULL, B_TRUE))
return (ZPOOL_STATUS_MISSING_DEV_R);
if (find_vdev_problem(nvroot, vdev_broken, B_TRUE))
if (find_vdev_problem(nvroot, vdev_broken, NULL, B_TRUE))
return (ZPOOL_STATUS_CORRUPT_LABEL_R);
/*
* Devices with errors
*/
if (!isimport && find_vdev_problem(nvroot, vdev_errors, B_TRUE))
if (!isimport && find_vdev_problem(nvroot, vdev_errors, NULL, B_TRUE))
return (ZPOOL_STATUS_FAILING_DEV);
/*
* Offlined devices
*/
if (find_vdev_problem(nvroot, vdev_offlined, B_TRUE))
if (find_vdev_problem(nvroot, vdev_offlined, NULL, B_TRUE))
return (ZPOOL_STATUS_OFFLINE_DEV);
/*
* Removed device
*/
if (find_vdev_problem(nvroot, vdev_removed, B_TRUE))
if (find_vdev_problem(nvroot, vdev_removed, NULL, B_TRUE))
return (ZPOOL_STATUS_REMOVED_DEV);
/*
* Suboptimal, but usable, ashift configuration.
*/
if (find_vdev_problem(nvroot, vdev_non_native_ashift, B_FALSE))
if (!isimport &&
getenv("ZPOOL_STATUS_NON_NATIVE_ASHIFT_IGNORE") == NULL &&
find_vdev_problem(nvroot, vdev_non_native_ashift, &ashift, B_FALSE))
return (ZPOOL_STATUS_NON_NATIVE_ASHIFT);
/*
@@ -509,8 +515,10 @@ zpool_get_status(zpool_handle_t *zhp, const char **msgid,
ZFS_MAXPROPLEN, NULL, B_FALSE) != 0)
compatibility[0] = '\0';
uint64_t ashift = zpool_get_prop_int(zhp, ZPOOL_PROP_ASHIFT, NULL);
zpool_status_t ret = check_status(zhp->zpool_config, B_FALSE, errata,
compatibility);
compatibility, ashift);
if (msgid != NULL) {
if (ret >= NMSGID)
@@ -525,7 +533,7 @@ zpool_status_t
zpool_import_status(nvlist_t *config, const char **msgid,
zpool_errata_t *errata)
{
zpool_status_t ret = check_status(config, B_TRUE, errata, NULL);
zpool_status_t ret = check_status(config, B_TRUE, errata, NULL, 0);
if (ret >= NMSGID)
*msgid = NULL;
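The change above threads an opaque `void *arg` through `find_vdev_problem()` so that `vdev_non_native_ashift()` can compare each leaf against the pool's `ashift` property. The sketch below illustrates that callback-with-context pattern under simplifying assumptions: `demo_vdev_stat_t` is a hypothetical two-field stand-in for `vdev_stat_t`, the nvlist tree walk is replaced by a flat array, and the `VDEV_STAT_VALID()` check is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down stand-in for vdev_stat_t. */
typedef struct demo_vdev_stat {
	uint64_t vs_configured_ashift;
	uint64_t vs_physical_ashift;
} demo_vdev_stat_t;

/* Predicate signature: per-leaf stats plus caller-supplied context. */
typedef int (*demo_check_fn)(const demo_vdev_stat_t *, void *);

/*
 * Same comparison as the new vdev_non_native_ashift(): report a leaf only
 * if the pool ashift is unset (0) or larger than the leaf's configured
 * ashift, and the configured ashift is below the physical one.
 */
static int
demo_non_native_ashift(const demo_vdev_stat_t *vs, void *arg)
{
	assert(arg != NULL);
	uint64_t ashift = *(uint64_t *)arg;

	return ((ashift == 0 || vs->vs_configured_ashift < ashift) &&
	    vs->vs_configured_ashift < vs->vs_physical_ashift);
}

/* Flat-array stand-in for the recursive walk in find_vdev_problem(). */
static int
demo_find_problem(const demo_vdev_stat_t *leaves, size_t nleaves,
    demo_check_fn func, void *arg)
{
	for (size_t i = 0; i < nleaves; i++) {
		if (func(&leaves[i], arg) != 0)
			return (1);
	}
	return (0);
}
```

Predicates that need no context, such as `vdev_faulted()` in the diff, simply ignore `arg`, and their callers pass `NULL`.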
+3 -3
@@ -1175,7 +1175,7 @@ zfs_file_write(zfs_file_t *fp, const void *buf, size_t count, ssize_t *resid)
*/
int
zfs_file_pwrite(zfs_file_t *fp, const void *buf,
size_t count, loff_t pos, ssize_t *resid)
size_t count, loff_t pos, uint8_t ashift, ssize_t *resid)
{
ssize_t rc, split, done;
int sectors;
@@ -1185,8 +1185,8 @@ zfs_file_pwrite(zfs_file_t *fp, const void *buf,
* system calls so that the process can be killed in between.
* This is used by ztest to simulate realistic failure modes.
*/
sectors = count >> SPA_MINBLOCKSHIFT;
split = (sectors > 0 ? rand() % sectors : 0) << SPA_MINBLOCKSHIFT;
sectors = count >> ashift;
split = (sectors > 0 ? rand() % sectors : 0) << ashift;
rc = pwrite64(fp->f_fd, buf, split, pos);
if (rc != -1) {
done = rc;
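The userspace `zfs_file_pwrite()` above now picks its split point in units of `1 << ashift` rather than `1 << SPA_MINBLOCKSHIFT`. As a rough illustration of the arithmetic only, the hypothetical helper below takes the random value as a parameter so the result is deterministic. It is a sketch, not the libzpool code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: return the byte offset at which a write of 'count'
 * bytes would be split, given a random value 'r'. The offset is always a
 * multiple of the sector size (1 << ashift) and strictly less than count
 * whenever the write covers at least one full sector.
 */
static size_t
demo_split_point(size_t count, uint8_t ashift, unsigned int r)
{
	assert(ashift < sizeof (size_t) * 8);

	size_t sectors = count >> ashift;
	return ((sectors > 0 ? r % sectors : 0) << ashift);
}
```

For example, an 8192-byte write with `ashift` 12 spans two sectors, so the split can only land at offset 0 or 4096. With `ashift` 9 the same write has sixteen candidate offsets.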
+20
@@ -122,6 +122,24 @@ Example:
.Nm zhack Cm label repair Fl cu Ar device
Fix checksums and undetach a device
.
.It Xo
.Nm zhack
.Cm metaslab leak
.Op Fl f
.Ar pool
.Xc
Apply a fragmentation profile generated by
.Sy zdb
to the specified
.Ar pool Ns
\&.
.Pp
The
.Fl f
flag forces the profile to apply even if the vdevs in the
.Ar pool
don't have the same number of metaslabs as the fragmentation profile.
.
.El
.
.Sh GLOBAL OPTIONS
@@ -143,6 +161,8 @@ Search for
members in
.Ar dir .
Can be specified more than once.
.It Fl o Ar var Ns = Ns Ar value
Set the given tunable to the provided value.
.El
.
.Sh EXAMPLES
+82 -1
@@ -17,7 +17,7 @@
.\" own identifying information:
.\" Portions Copyright [yyyy] [name of copyright owner]
.\"
.Dd May 24, 2025
.Dd September 15, 2025
.Dt ZFS 4
.Os
.
@@ -2551,6 +2551,49 @@ the xattr so as to not accumulate duplicates.
.It Sy zio_requeue_io_start_cut_in_line Ns = Ns Sy 0 Ns | Ns 1 Pq int
Prioritize requeued I/O.
.
.It Sy zfs_delete_inode Ns = Ns Sy 0 Ns | Ns 1 Pq int
Sets whether the kernel should free an inode structure when the last reference
is released, or cache it in memory.
Intended for testing/debugging.
.Pp
A live inode structure "pins" various internal OpenZFS structures in memory,
which can result in large amounts of "unusable" memory on systems with lots of
infrequently-accessed files, until the kernel's memory pressure mechanism
asks OpenZFS to release them.
.Pp
The default value of
.Sy 0
always caches inodes that appear to still exist on disk.
Setting it to
.Sy 1
will immediately release unused inodes and their associated memory back to the
dbuf cache or the ARC for reuse, but may reduce performance if inodes are
frequently evicted and reloaded.
.Pp
This parameter is only available on Linux.
.
.It Sy zfs_delete_dentry Ns = Ns Sy 0 Ns | Ns 1 Pq int
Sets whether the kernel should free a dentry structure when it is no longer
required, or hold it in the dentry cache.
Intended for testing/debugging.
.Pp
Since a dentry structure holds an inode reference, a cached dentry can "pin"
an inode in memory indefinitely, along with associated OpenZFS structures (See
.Sy zfs_delete_inode ) .
.Pp
The default value of
.Sy 0
instructs the kernel to cache entries and their associated inodes when they
are no longer directly referenced.
They will be reclaimed as part of the kernel's normal cache management
processes.
Setting it to
.Sy 1
will instruct the kernel to release directory entries and their inodes as soon
as they are no longer referenced by the filesystem.
.Pp
This parameter is only available on Linux.
.
.It Sy zio_taskq_batch_pct Ns = Ns Sy 80 Ns % Pq uint
Percentage of online CPUs which will run a worker thread for I/O.
These workers are responsible for I/O work such as compression, encryption,
@@ -2585,12 +2628,50 @@ Set value only applies to pools imported/created after that.
Set the queue and thread configuration for the IO read queues.
This is an advanced debugging parameter.
Don't change this unless you understand what it does.
Each of the four values corresponds to the issue, issue high-priority,
interrupt, and interrupt high-priority queues.
Valid values are
.Sy fixed,N,M
(M queues with N threads each),
.Sy scale[,MIN]
(scale with CPUs, minimum MIN total threads),
.Sy sync ,
and
.Sy null .
Set values only apply to pools imported/created after that.
.
.It Sy zio_taskq_write Ns = Ns Sy sync null scale null Pq charp
Set the queue and thread configuration for the IO write queues.
This is an advanced debugging parameter.
Don't change this unless you understand what it does.
Each of the four values corresponds to the issue, issue high-priority,
interrupt, and interrupt high-priority queues.
Valid values are
.Sy fixed,N,M
(M queues with N threads each),
.Sy scale[,MIN]
(scale with CPUs, minimum MIN total threads),
.Sy sync ,
and
.Sy null .
Set values only apply to pools imported/created after that.
.
.It Sy zio_taskq_free Ns = Ns Sy scale,32 null null null Pq charp
Set the queue and thread configuration for the IO free queues.
This is an advanced debugging parameter.
Don't change this unless you understand what it does.
Each of the four values corresponds to the issue, issue high-priority,
interrupt, and interrupt high-priority queues.
Valid values are
.Sy fixed,N,M
(M queues with N threads each),
.Sy scale[,MIN]
(scale with CPUs, minimum MIN total threads),
.Sy sync ,
and
.Sy null .
The default uses a minimum of 32 threads to improve parallelism for
DDT and BRT metadata operations during frees.
Set values only apply to pools imported/created after that.
.
.It Sy zvol_inhibit_dev Ns = Ns Sy 0 Ns | Ns 1 Pq uint
+12
@@ -69,6 +69,13 @@
.Op Fl U Ar cache
.Ar poolname Op Ar vdev Oo Ar metaslab Oc Ns …
.Nm
.Fl -allocated-map
.Op Fl mAFLPXY
.Op Fl e Oo Fl V Oc Oo Fl p Ar path Oc Ns …
.Op Fl t Ar txg
.Op Fl U Ar cache
.Ar poolname Op Ar vdev Oo Ar metaslab Oc Ns …
.Nm
.Fl O
.Op Fl K Ar key
.Ar dataset path
@@ -128,6 +135,11 @@ that zdb may interpret inconsistent pool data and behave erratically.
.Sh OPTIONS
Display options:
.Bl -tag -width Ds
.It Fl Sy -allocated-map
Prints out a list of all the allocated regions in the pool.
Primarily intended for use with the
.Nm zhack metaslab leak
subcommand.
.It Fl b , -block-stats
Display statistics regarding the number, size
.Pq logical, physical and allocated
+1 -1
@@ -2,7 +2,7 @@
# first. This ensures its module initialization function is run before
# any of the other module initialization functions which depend on it.
ZFS_MODULE_CFLAGS += -std=gnu99 -Wno-declaration-after-statement
ZFS_MODULE_CFLAGS += -std=gnu11 -Wno-declaration-after-statement
ZFS_MODULE_CFLAGS += -Wmissing-prototypes
ZFS_MODULE_CFLAGS += @KERNEL_DEBUG_CFLAGS@ @NO_FORMAT_ZERO_LENGTH@
-5
@@ -62,11 +62,6 @@ CFLAGS+= -DZFS_DEBUG -g
CFLAGS += -DNDEBUG
.endif
.if defined(WITH_VFS_DEBUG) && ${WITH_VFS_DEBUG} == "true"
# kernel must also be built with this option for this to work
CFLAGS+= -DDEBUG_VFS_LOCKS
.endif
.if defined(WITH_GCOV) && ${WITH_GCOV} == "true"
CFLAGS+= -fprofile-arcs -ftest-coverage
.endif
+24 -17
@@ -77,7 +77,8 @@ static const uint32_t SHA256_K[64] = {
h = g, g = f, f = e, e = d + T1; \
d = c, c = b, b = a, a = T1 + T2;
static void sha256_generic(uint32_t state[8], const void *data, size_t num_blks)
static void
icp_sha256_generic(uint32_t state[8], const void *data, size_t num_blks)
{
uint64_t blk;
@@ -173,7 +174,8 @@ static const uint64_t SHA512_K[80] = {
0x5fcb6fab3ad6faec, 0x6c44198c4a475817
};
static void sha512_generic(uint64_t state[8], const void *data, size_t num_blks)
static void
icp_sha512_generic(uint64_t state[8], const void *data, size_t num_blks)
{
uint64_t blk;
@@ -226,7 +228,8 @@ static void sha512_generic(uint64_t state[8], const void *data, size_t num_blks)
}
}
static void sha256_update(sha256_ctx *ctx, const uint8_t *data, size_t len)
static void
icp_sha256_update(sha256_ctx *ctx, const uint8_t *data, size_t len)
{
uint64_t pos = ctx->count[0];
uint64_t total = ctx->count[1];
@@ -258,7 +261,8 @@ static void sha256_update(sha256_ctx *ctx, const uint8_t *data, size_t len)
ctx->count[1] = total;
}
static void sha512_update(sha512_ctx *ctx, const uint8_t *data, size_t len)
static void
icp_sha512_update(sha512_ctx *ctx, const uint8_t *data, size_t len)
{
uint64_t pos = ctx->count[0];
uint64_t total = ctx->count[1];
@@ -290,7 +294,8 @@ static void sha512_update(sha512_ctx *ctx, const uint8_t *data, size_t len)
ctx->count[1] = total;
}
static void sha256_final(sha256_ctx *ctx, uint8_t *result, int bits)
static void
icp_sha256_final(sha256_ctx *ctx, uint8_t *result, int bits)
{
uint64_t mlen, pos = ctx->count[0];
uint8_t *m = ctx->wbuf;
@@ -334,7 +339,8 @@ static void sha256_final(sha256_ctx *ctx, uint8_t *result, int bits)
memset(ctx, 0, sizeof (*ctx));
}
static void sha512_final(sha512_ctx *ctx, uint8_t *result, int bits)
static void
icp_sha512_final(sha512_ctx *ctx, uint8_t *result, int bits)
{
uint64_t mlen, pos = ctx->count[0];
uint8_t *m = ctx->wbuf, *r;
@@ -461,14 +467,14 @@ SHA2Update(SHA2_CTX *ctx, const void *data, size_t len)
switch (ctx->algotype) {
case SHA256:
sha256_update(&ctx->sha256, data, len);
icp_sha256_update(&ctx->sha256, data, len);
break;
case SHA512:
case SHA512_HMAC_MECH_INFO_TYPE:
sha512_update(&ctx->sha512, data, len);
icp_sha512_update(&ctx->sha512, data, len);
break;
case SHA512_256:
sha512_update(&ctx->sha512, data, len);
icp_sha512_update(&ctx->sha512, data, len);
break;
}
}
@@ -479,32 +485,33 @@ SHA2Final(void *digest, SHA2_CTX *ctx)
{
switch (ctx->algotype) {
case SHA256:
sha256_final(&ctx->sha256, digest, 256);
icp_sha256_final(&ctx->sha256, digest, 256);
break;
case SHA512:
case SHA512_HMAC_MECH_INFO_TYPE:
sha512_final(&ctx->sha512, digest, 512);
icp_sha512_final(&ctx->sha512, digest, 512);
break;
case SHA512_256:
sha512_final(&ctx->sha512, digest, 256);
icp_sha512_final(&ctx->sha512, digest, 256);
break;
}
}
/* the generic implementation is always okay */
static boolean_t sha2_is_supported(void)
static boolean_t
icp_sha2_is_supported(void)
{
return (B_TRUE);
}
const sha256_ops_t sha256_generic_impl = {
.name = "generic",
.transform = sha256_generic,
.is_supported = sha2_is_supported
.transform = icp_sha256_generic,
.is_supported = icp_sha2_is_supported
};
const sha512_ops_t sha512_generic_impl = {
.name = "generic",
.transform = sha512_generic,
.is_supported = sha2_is_supported
.transform = icp_sha512_generic,
.is_supported = icp_sha2_is_supported
};
+4 -3
@@ -3246,7 +3246,8 @@ nvs_xdr_nvl_fini(nvstream_t *nvs)
* xdrproc_t-compatible callbacks for xdr_array()
*/
#if defined(_KERNEL) && defined(__linux__) /* Linux kernel */
#if (defined(__FreeBSD_version) && __FreeBSD_version >= 1600010) || \
defined(_KERNEL) && defined(__linux__) /* Linux kernel */
#define NVS_BUILD_XDRPROC_T(type) \
static bool_t \
@@ -3255,7 +3256,7 @@ nvs_xdr_nvp_##type(XDR *xdrs, void *ptr) \
return (xdr_##type(xdrs, ptr)); \
}
#elif !defined(_KERNEL) && defined(XDR_CONTROL) /* tirpc */
#elif !defined(_KERNEL) && defined(XDR_CONTROL) /* tirpc, FreeBSD < 16 */
#define NVS_BUILD_XDRPROC_T(type) \
static bool_t \
@@ -3271,7 +3272,7 @@ nvs_xdr_nvp_##type(XDR *xdrs, ...) \
return (xdr_##type(xdrs, ptr)); \
}
#else /* FreeBSD, sunrpc */
#else /* FreeBSD kernel < 16, sunrpc */
#define NVS_BUILD_XDRPROC_T(type) \
static bool_t \
+2 -1
@@ -164,8 +164,9 @@ zfs_file_write(zfs_file_t *fp, const void *buf, size_t count, ssize_t *resid)
int
zfs_file_pwrite(zfs_file_t *fp, const void *buf, size_t count, loff_t off,
ssize_t *resid)
uint8_t ashift, ssize_t *resid)
{
(void) ashift;
return (zfs_file_write_impl(fp, buf, count, &off, resid));
}
+2 -12
@@ -100,14 +100,6 @@
VFS_SMR_DECLARE;
#ifdef DEBUG_VFS_LOCKS
#define VNCHECKREF(vp) \
VNASSERT((vp)->v_holdcnt > 0 && (vp)->v_usecount > 0, vp, \
("%s: wrong ref counts", __func__));
#else
#define VNCHECKREF(vp)
#endif
#if __FreeBSD_version >= 1400045
typedef uint64_t cookie_t;
#else
@@ -965,9 +957,6 @@ zfs_create(znode_t *dzp, const char *name, vattr_t *vap, int excl, int mode,
zfs_acl_ids_t acl_ids;
boolean_t fuid_dirtied;
uint64_t txtype;
#ifdef DEBUG_VFS_LOCKS
vnode_t *dvp = ZTOV(dzp);
#endif
if (is_nametoolong(zfsvfs, name))
return (SET_ERROR(ENAMETOOLONG));
@@ -1097,7 +1086,8 @@ zfs_create(znode_t *dzp, const char *name, vattr_t *vap, int excl, int mode,
getnewvnode_drop_reserve();
out:
VNCHECKREF(dvp);
VNASSERT(ZTOV(dzp)->v_holdcnt > 0 && ZTOV(dzp)->v_usecount > 0,
ZTOV(dzp), ("%s: wrong ref counts", __func__));
if (error == 0) {
*zpp = zp;
}
-13
@@ -371,9 +371,6 @@ error:
return (ret);
}
void *failed_decrypt_buf;
int failed_decrypt_size;
/*
* This function handles all encryption and decryption in zfs. When
* encrypting it expects puio to reference the plaintext and cuio to
@@ -1663,9 +1660,6 @@ error:
return (ret);
}
void *failed_decrypt_buf;
int faile_decrypt_size;
/*
* Primary encryption / decryption entrypoint for zio data.
*/
@@ -1758,13 +1752,6 @@ zio_do_crypt_data(boolean_t encrypt, zio_crypt_key_t *key,
return (0);
error:
if (!encrypt) {
if (failed_decrypt_buf != NULL)
kmem_free(failed_decrypt_buf, failed_decrypt_size);
failed_decrypt_buf = kmem_alloc(datalen, KM_SLEEP);
failed_decrypt_size = datalen;
memcpy(failed_decrypt_buf, cipherbuf, datalen);
}
if (locked)
rw_exit(&key->zk_salt_lock);
if (authbuf != NULL)
+18 -1
@@ -25,6 +25,10 @@
* SUCH DAMAGE.
*/
/*
* Copyright (c) 2025, Rob Norris <robn@despairlabs.com>
*/
#include <sys/types.h>
#include <sys/sysmacros.h>
#include <sys/kmem.h>
@@ -56,6 +60,19 @@ typedef struct zone_dataset {
} zone_dataset_t;
#ifdef CONFIG_USER_NS
/*
* Linux 6.18 moved the generic namespace type away from ns->ops->type onto
* ns_common itself.
*/
#ifdef HAVE_NS_COMMON_TYPE
#define ns_is_newuser(ns) \
((ns)->ns_type == CLONE_NEWUSER)
#else
#define ns_is_newuser(ns) \
((ns)->ops != NULL && (ns)->ops->type == CLONE_NEWUSER)
#endif
/*
* Returns:
* - 0 on success
@@ -84,7 +101,7 @@ user_ns_get(int fd, struct user_namespace **userns)
goto done;
}
ns = get_proc_ns(file_inode(nsfile));
if (ns->ops->type != CLONE_NEWUSER) {
if (!ns_is_newuser(ns)) {
error = ENOTTY;
goto done;
}
+24 -2
@@ -23,6 +23,7 @@
* Copyright (c) 2014 by Chunwei Chen. All rights reserved.
* Copyright (c) 2019 by Delphix. All rights reserved.
* Copyright (c) 2023, 2024, Klara Inc.
* Copyright (c) 2025, Rob Norris <robn@despairlabs.com>
*/
/*
@@ -891,6 +892,14 @@ abd_iter_advance(struct abd_iter *aiter, size_t amount)
}
}
#ifndef nth_page
/*
* Since 6.18 nth_page() no longer exists, and is no longer required to iterate
* within a single SG entry, so we replace it with a simple addition.
*/
#define nth_page(p, n) ((p)+(n))
#endif
/*
* Map the current chunk into aiter. This can be safely called when the aiter
* is already exhausted, in which case this does nothing.
@@ -918,7 +927,14 @@ abd_iter_map(struct abd_iter *aiter)
aiter->iter_mapsize = MIN(aiter->iter_sg->length - offset,
aiter->iter_abd->abd_size - aiter->iter_pos);
paddr = zfs_kmap_local(sg_page(aiter->iter_sg));
struct page *page = sg_page(aiter->iter_sg);
if (PageHighMem(page)) {
page = nth_page(page, offset / PAGE_SIZE);
offset &= PAGE_SIZE - 1;
aiter->iter_mapsize = MIN(aiter->iter_mapsize,
PAGE_SIZE - offset);
}
paddr = zfs_kmap_local(page);
}
aiter->iter_mapaddr = (char *)paddr + offset;
@@ -936,8 +952,14 @@ abd_iter_unmap(struct abd_iter *aiter)
return;
if (!abd_is_linear(aiter->iter_abd)) {
size_t offset = aiter->iter_offset;
struct page *page = sg_page(aiter->iter_sg);
if (PageHighMem(page))
offset &= PAGE_SIZE - 1;
/* LINTED E_FUNC_SET_NOT_USED */
zfs_kunmap_local(aiter->iter_mapaddr - aiter->iter_offset);
zfs_kunmap_local(aiter->iter_mapaddr - offset);
}
ASSERT3P(aiter->iter_mapaddr, !=, NULL);
+2 -1
@@ -115,8 +115,9 @@ zfs_file_write(zfs_file_t *fp, const void *buf, size_t count, ssize_t *resid)
*/
int
zfs_file_pwrite(zfs_file_t *fp, const void *buf, size_t count, loff_t off,
ssize_t *resid)
uint8_t ashift, ssize_t *resid)
{
(void) ashift;
ssize_t rc;
rc = kernel_write(fp, buf, count, &off);
+6 -4
@@ -100,15 +100,17 @@ zfs_uiomove_bvec_impl(void *p, size_t n, zfs_uio_rw_t rw, zfs_uio_t *uio)
while (n && uio->uio_resid) {
void *paddr;
cnt = MIN(bv->bv_len - skip, n);
size_t offset = bv->bv_offset + skip;
cnt = MIN(PAGE_SIZE - (offset & ~PAGE_MASK),
MIN(bv->bv_len - skip, n));
paddr = zfs_kmap_local(bv->bv_page);
paddr = zfs_kmap_local(bv->bv_page + (offset >> PAGE_SHIFT));
if (rw == UIO_READ) {
/* Copy from buffer 'p' to the bvec data */
memcpy(paddr + bv->bv_offset + skip, p, cnt);
memcpy(paddr + (offset & ~PAGE_MASK), p, cnt);
} else {
/* Copy from bvec data to buffer 'p' */
memcpy(p, paddr + bv->bv_offset + skip, cnt);
memcpy(p, paddr + (offset & ~PAGE_MASK), cnt);
}
zfs_kunmap_local(paddr);
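The `zfs_uiomove_bvec_impl()` change above stops assuming that a bvec fits in the single page at `bv_page`. It selects the page that actually contains `bv_offset + skip` and caps each copy at that page's end. The sketch below reproduces only the index arithmetic, under stated assumptions: a 4 KiB page size, and hypothetical `demo_*` names in place of the kernel's `PAGE_SHIFT`, `PAGE_MASK`, and `MIN()`.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed 4 KiB pages; the kernel's PAGE_SHIFT may differ. */
#define	DEMO_PAGE_SHIFT	12
#define	DEMO_PAGE_SIZE	((size_t)1 << DEMO_PAGE_SHIFT)
#define	DEMO_MIN(a, b)	((a) < (b) ? (a) : (b))

/* Hypothetical result type: which page to map and how much to copy. */
typedef struct demo_chunk {
	size_t page_index;	/* pages past bv_page */
	size_t page_offset;	/* byte offset inside that page */
	size_t len;		/* bytes that can be copied in one mapping */
} demo_chunk_t;

static demo_chunk_t
demo_next_chunk(size_t bv_offset, size_t bv_len, size_t skip, size_t n)
{
	assert(skip < bv_len);

	size_t offset = bv_offset + skip;
	demo_chunk_t c;

	c.page_index = offset >> DEMO_PAGE_SHIFT;
	/* Equivalent to (offset & ~PAGE_MASK) in the kernel code. */
	c.page_offset = offset & (DEMO_PAGE_SIZE - 1);
	/* Never copy past the end of the mapped page. */
	c.len = DEMO_MIN(DEMO_PAGE_SIZE - c.page_offset,
	    DEMO_MIN(bv_len - skip, n));
	return (c);
}
```

The caller would loop, advancing `skip` by `len` each time, so a multi-page bvec is copied one mapped page at a time.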
+6
@@ -1521,6 +1521,12 @@ zfs_domount(struct super_block *sb, zfs_mnt_t *zm, int silent)
sb->s_xattr = zpl_xattr_handlers;
sb->s_export_op = &zpl_export_operations;
#ifdef HAVE_SET_DEFAULT_D_OP
set_default_d_op(sb, &zpl_dentry_operations);
#else
sb->s_d_op = &zpl_dentry_operations;
#endif
/* Set features for file system. */
zfs_set_fuid_feature(zfsvfs);
+2 -1
@@ -3516,7 +3516,8 @@ zfs_link(znode_t *tdzp, znode_t *szp, char *name, cred_t *cr,
boolean_t is_tmpfile = 0;
uint64_t txg;
is_tmpfile = (sip->i_nlink == 0 && (sip->i_state & I_LINKABLE));
is_tmpfile = (sip->i_nlink == 0 &&
(inode_state_read_once(sip) & I_LINKABLE));
ASSERT(S_ISDIR(ZTOI(tdzp)->i_mode));
+74
@@ -22,6 +22,7 @@
/*
* Copyright (c) 2011, Lawrence Livermore National Security, LLC.
* Copyright (c) 2015 by Chunwei Chen. All rights reserved.
* Copyright (c) 2025, Rob Norris <robn@despairlabs.com>
*/
@@ -444,6 +445,7 @@ zpl_putpage(struct page *pp, struct writeback_control *wbc, void *data)
return (ret);
}
#ifdef HAVE_WRITE_CACHE_PAGES
#ifdef HAVE_WRITEPAGE_T_FOLIO
static int
zpl_putfolio(struct folio *pp, struct writeback_control *wbc, void *data)
@@ -465,6 +467,78 @@ zpl_write_cache_pages(struct address_space *mapping,
#endif
return (result);
}
#else
static inline int
zpl_write_cache_pages(struct address_space *mapping,
struct writeback_control *wbc, void *data)
{
pgoff_t start = wbc->range_start >> PAGE_SHIFT;
pgoff_t end = wbc->range_end >> PAGE_SHIFT;
struct folio_batch fbatch;
folio_batch_init(&fbatch);
/*
* This atomically (-ish) tags all DIRTY pages in the range with
* TOWRITE, allowing users to continue dirtying or undirtying pages
* while we get on with writeback, without us treading on each other.
*/
tag_pages_for_writeback(mapping, start, end);
int err = 0;
unsigned int npages;
/*
* Grab references to the TOWRITE pages just flagged. This may not get
* all of them, so we do it in a loop until there are none left.
*/
while ((npages = filemap_get_folios_tag(mapping, &start, end,
PAGECACHE_TAG_TOWRITE, &fbatch)) != 0) {
/* Loop over each page and write it out. */
struct folio *folio;
while ((folio = folio_batch_next(&fbatch)) != NULL) {
folio_lock(folio);
/*
* If the folio has been remapped, or is no longer
* dirty, then there's nothing to do.
*/
if (folio->mapping != mapping ||
!folio_test_dirty(folio)) {
folio_unlock(folio);
continue;
}
/*
* If writeback is already in progress, wait for it to
* finish. We continue after this even if the page
* ends up clean; zfs_putpage() will skip it if no
* further work is required.
*/
while (folio_test_writeback(folio))
folio_wait_bit(folio, PG_writeback);
/*
* Write it out and collect any error. zfs_putpage()
* will clear the TOWRITE and DIRTY flags, and return
* with the page unlocked.
*/
int ferr = zpl_putpage(&folio->page, wbc, data);
if (err == 0 && ferr != 0)
err = ferr;
/* Housekeeping for the caller. */
wbc->nr_to_write -= folio_nr_pages(folio);
}
/* Release any remaining references on the batch. */
folio_batch_release(&fbatch);
}
return (err);
}
#endif
static int
zpl_writepages(struct address_space *mapping, struct writeback_control *wbc)
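The fallback zpl_write_cache_pages() in this hunk keeps writing after a failure and reports only the first error it saw. A minimal user-space sketch of that error-collection pattern, assuming hypothetical names throughout: `model_write_all`, the `results` array (standing in for zpl_putpage() return values) and the `npages` array (standing in for folio_nr_pages()) are illustrations, not kernel API. It also ignores the "skipped because no longer dirty" path.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical model of the loop in zpl_write_cache_pages(): results[i]
 * stands in for the return value of zpl_putpage() on the i-th folio, and
 * npages[i] for folio_nr_pages().  Every item is attempted; only the
 * first non-zero result is reported.
 */
static int
model_write_all(const int *results, const long *npages, size_t n,
    long *nr_to_write)
{
	int err = 0;

	assert(results != NULL && npages != NULL && nr_to_write != NULL);

	for (size_t i = 0; i < n; i++) {
		int ferr = results[i];

		/* Keep the first error, but do not stop the loop. */
		if (err == 0 && ferr != 0)
			err = ferr;

		/* Housekeeping for the caller, as in the original. */
		*nr_to_write -= npages[i];
	}

	return (err);
}
```

Collecting the first error rather than returning early means one bad folio does not leave the rest of the tagged range unwritten.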
+83 -4
@@ -22,6 +22,8 @@
/*
* Copyright (c) 2011, Lawrence Livermore National Security, LLC.
* Copyright (c) 2023, Datto Inc. All rights reserved.
* Copyright (c) 2025, Klara, Inc.
* Copyright (c) 2025, Rob Norris <robn@despairlabs.com>
*/
@@ -32,7 +34,22 @@
#include <sys/zpl.h>
#include <linux/iversion.h>
#include <linux/version.h>
#include <linux/vfs_compat.h>
/*
* What to do when the last reference to an inode is released. If 0, the kernel
* will cache it on the superblock. If 1, the inode will be freed immediately.
* See zpl_drop_inode().
*/
int zfs_delete_inode = 0;
/*
* What to do when the last reference to a dentry is released. If 0, the kernel
* will cache it until the entry (file) is destroyed. If 1, the dentry will be
* marked for cleanup, at which time its inode reference will be released. See
* zpl_dentry_delete().
*/
int zfs_delete_dentry = 0;
static struct inode *
zpl_inode_alloc(struct super_block *sb)
@@ -77,11 +94,36 @@ zpl_dirty_inode(struct inode *ip, int flags)
}
/*
* When ->drop_inode() is called its return value indicates if the
* inode should be evicted from the inode cache. If the inode is
* unhashed and has no links the default policy is to evict it
* immediately.
* ->drop_inode() is called when the last reference to an inode is released.
* Its return value indicates if the inode should be destroyed immediately, or
* cached on the superblock structure.
*
* By default (zfs_delete_inode=0), we call generic_drop_inode(), which returns
* "destroy immediately" if the inode is unhashed and has no links (roughly: no
* longer exists on disk). On datasets with millions of rarely-accessed files,
* this can cause a large amount of memory to be "pinned" by cached inodes,
* which in turn pin their associated dnodes and dbufs, until the kernel starts
* reporting memory pressure and requests OpenZFS release some memory (see
* zfs_prune()).
*
* When set to 1, we call generic_delete_inode(), which always returns "destroy
* immediately", resulting in inodes being destroyed immediately, releasing
* their associated dnodes and dbufs to the dbuf cache and the ARC to be
* evicted as normal.
*
* Note that the "last reference" doesn't always mean the last _userspace_
* reference; the dentry cache also holds a reference, so "busy" inodes will
* still be kept alive that way (subject to dcache tuning).
*/
static int
zpl_drop_inode(struct inode *ip)
{
if (zfs_delete_inode)
return (generic_delete_inode(ip));
return (generic_drop_inode(ip));
}
/*
* The ->evict_inode() callback must minimally truncate the inode pages,
* and call clear_inode(). For 2.6.35 and later kernels this will
* simply update the inode state, with the sync occurring before the
@@ -470,6 +512,7 @@ const struct super_operations zpl_super_operations = {
.destroy_inode = zpl_inode_destroy,
.dirty_inode = zpl_dirty_inode,
.write_inode = NULL,
.drop_inode = zpl_drop_inode,
.evict_inode = zpl_evict_inode,
.put_super = zpl_put_super,
.sync_fs = zpl_sync_fs,
@@ -480,6 +523,35 @@ const struct super_operations zpl_super_operations = {
.show_stats = NULL,
};
/*
* ->d_delete() is called when the last reference to a dentry is released. Its
* return value indicates if the dentry should be destroyed immediately, or
* retained in the dentry cache.
*
* By default (zfs_delete_dentry=0) the kernel will always cache unused
* entries. Each dentry holds an inode reference, so cached dentries can hold
* the final inode reference indefinitely, leading to the inode and its related
* data being pinned (see zpl_drop_inode()).
*
* When set to 1, we signal that the dentry should be destroyed immediately and
* never cached. This reduces memory usage, at the cost of higher overhead to
* look up a file, as the inode and its underlying data (dnode/dbuf) need to be
* reloaded and reinflated.
*
* Note that userspace does not have direct control over dentry references and
* reclaim; rather, this is part of the kernel's caching and reclaim subsystems
* (e.g. vm.vfs_cache_pressure).
*/
static int
zpl_dentry_delete(const struct dentry *dentry)
{
return (zfs_delete_dentry ? 1 : 0);
}
const struct dentry_operations zpl_dentry_operations = {
.d_delete = zpl_dentry_delete,
};
struct file_system_type zpl_fs_type = {
.owner = THIS_MODULE,
.name = ZFS_DRIVER,
@@ -491,3 +563,10 @@ struct file_system_type zpl_fs_type = {
.mount = zpl_mount,
.kill_sb = zpl_kill_sb,
};
ZFS_MODULE_PARAM(zfs, zfs_, delete_inode, INT, ZMOD_RW,
"Delete inodes as soon as the last reference is released.");
ZFS_MODULE_PARAM(zfs, zfs_, delete_dentry, INT, ZMOD_RW,
"Delete dentries from dentry cache as soon as the last reference is "
"released.");
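The two tunables added in this file interact: a cached dentry pins its inode, so zfs_delete_inode only takes effect once the dentry reference is gone. A rough user-space sketch of that interaction, assuming a hypothetical function `model_inode_stays_cached` and a `default_policy_evicts` flag that stands in for whatever generic_drop_inode() would return. It deliberately ignores memory-pressure reclaim and dcache tuning.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model, not kernel code: decide whether an inode stays
 * cached once the last userspace reference is gone.
 *
 * delete_dentry models zfs_delete_dentry, delete_inode models
 * zfs_delete_inode, and default_policy_evicts stands in for the result
 * of generic_drop_inode().
 */
static bool
model_inode_stays_cached(int delete_dentry, int delete_inode,
    bool default_policy_evicts)
{
	/* zpl_dentry_delete() == 0: dentry is kept and pins the inode. */
	if (!delete_dentry)
		return (true);

	/*
	 * The dentry is dropped, releasing its inode reference, so
	 * zpl_drop_inode() now decides.  With the tunable set it always
	 * evicts; otherwise it defers to the default policy.
	 */
	if (delete_inode)
		return (false);

	return (!default_policy_evicts);
}
```

Under this model, setting only zfs_delete_inode=1 changes nothing for files whose dentries are still cached, which matches the note in the zpl_drop_inode() comment.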
+24 -9
@@ -21,7 +21,7 @@
*/
/*
* Copyright (c) 2012, 2020 by Delphix. All rights reserved.
* Copyright (c) 2024, Rob Norris <robn@despairlabs.com>
* Copyright (c) 2024, 2025, Rob Norris <robn@despairlabs.com>
* Copyright (c) 2024, Klara, Inc.
*/
@@ -1066,12 +1066,13 @@ zvol_os_clear_private(zvol_state_t *zv)
* tiny devices. For devices over 1 MiB a standard head and sector count
* is used to keep the cylinders count reasonable.
*/
static int
zvol_getgeo(struct block_device *bdev, struct hd_geometry *geo)
static inline int
zvol_getgeo_impl(struct gendisk *disk, struct hd_geometry *geo)
{
zvol_state_t *zv = bdev->bd_disk->private_data;
zvol_state_t *zv = disk->private_data;
sector_t sectors;
ASSERT3P(zv, !=, NULL);
ASSERT3U(zv->zv_open_count, >, 0);
sectors = get_capacity(zv->zv_zso->zvo_disk);
@@ -1090,6 +1091,20 @@ zvol_getgeo(struct block_device *bdev, struct hd_geometry *geo)
return (0);
}
#ifdef HAVE_BLOCK_DEVICE_OPERATIONS_GETGEO_GENDISK
static int
zvol_getgeo(struct gendisk *disk, struct hd_geometry *geo)
{
return (zvol_getgeo_impl(disk, geo));
}
#else
static int
zvol_getgeo(struct block_device *bdev, struct hd_geometry *geo)
{
return (zvol_getgeo_impl(bdev->bd_disk, geo));
}
#endif
/*
* Why have two separate block_device_operations structs?
*
@@ -1531,7 +1546,7 @@ zvol_os_free(zvol_state_t *zv)
if (zv->zv_zso->use_blk_mq)
blk_mq_free_tag_set(&zv->zv_zso->tag_set);
ida_simple_remove(&zvol_ida,
ida_free(&zvol_ida,
MINOR(zv->zv_zso->zvo_dev) >> ZVOL_MINOR_BITS);
cv_destroy(&zv->zv_removing_cv);
@@ -1665,7 +1680,7 @@ zvol_os_create_minor(const char *name)
if (zvol_inhibit_dev)
return (0);
idx = ida_simple_get(&zvol_ida, 0, 0, kmem_flags_convert(KM_SLEEP));
idx = ida_alloc(&zvol_ida, kmem_flags_convert(KM_SLEEP));
if (idx < 0)
return (SET_ERROR(-idx));
minor = idx << ZVOL_MINOR_BITS;
@@ -1673,7 +1688,7 @@ zvol_os_create_minor(const char *name)
/* too many partitions can cause an overflow */
zfs_dbgmsg("zvol: create minor overflow: %s, minor %u/%u",
name, minor, MINOR(minor));
ida_simple_remove(&zvol_ida, idx);
ida_free(&zvol_ida, idx);
return (SET_ERROR(EINVAL));
}
@@ -1681,7 +1696,7 @@ zvol_os_create_minor(const char *name)
if (zv) {
ASSERT(MUTEX_HELD(&zv->zv_state_lock));
mutex_exit(&zv->zv_state_lock);
ida_simple_remove(&zvol_ida, idx);
ida_free(&zvol_ida, idx);
return (SET_ERROR(EEXIST));
}
@@ -1783,7 +1798,7 @@ out_doi:
rw_exit(&zvol_state_lock);
error = zvol_os_add_disk(zv->zv_zso->zvo_disk);
} else {
ida_simple_remove(&zvol_ida, idx);
ida_free(&zvol_ida, idx);
}
return (error);
+8 -8
@@ -1111,13 +1111,6 @@ abd_raidz_gen_iterate(abd_t **cabds, abd_t *dabd, size_t off,
func_raidz_gen(caddrs, daddr, len, dlen);
for (i = parity-1; i >= 0; i--) {
abd_iter_unmap(&caiters[i]);
c_cabds[i] =
abd_advance_abd_iter(cabds[i], c_cabds[i],
&caiters[i], len);
}
if (dsize > 0) {
abd_iter_unmap(&daiter);
c_dabd =
@@ -1126,6 +1119,13 @@ abd_raidz_gen_iterate(abd_t **cabds, abd_t *dabd, size_t off,
dsize -= dlen;
}
for (i = parity - 1; i >= 0; i--) {
abd_iter_unmap(&caiters[i]);
c_cabds[i] =
abd_advance_abd_iter(cabds[i], c_cabds[i],
&caiters[i], len);
}
csize -= len;
}
abd_exit_critical(flags);
@@ -1194,7 +1194,7 @@ abd_raidz_rec_iterate(abd_t **cabds, abd_t **tabds,
func_raidz_rec(xaddrs, len, caddrs, mul);
for (i = parity-1; i >= 0; i--) {
for (i = parity - 1; i >= 0; i--) {
abd_iter_unmap(&xiters[i]);
abd_iter_unmap(&citers[i]);
c_tabds[i] =
+1 -1
@@ -212,7 +212,7 @@ dataset_kstats_rename(dataset_kstats_t *dk, const char *name)
char *ds_name;
ds_name = KSTAT_NAMED_STR_PTR(&dkv->dkv_ds_name);
ASSERT3S(ds_name, !=, NULL);
ASSERT3P(ds_name, !=, NULL);
(void) strlcpy(ds_name, name,
KSTAT_NAMED_STR_BUFLEN(&dkv->dkv_ds_name));
}
+1 -1
@@ -1508,7 +1508,7 @@ ddt_configure(ddt_t *ddt, boolean_t new)
DMU_POOL_DIRECTORY_OBJECT, name, sizeof (uint64_t), 1,
&ddt->ddt_dir_object);
if (error == 0) {
ASSERT3U(spa->spa_meta_objset, ==, ddt->ddt_os);
ASSERT3P(spa->spa_meta_objset, ==, ddt->ddt_os);
error = zap_lookup(ddt->ddt_os, ddt->ddt_dir_object,
DDT_DIR_VERSION, sizeof (uint64_t), 1,
+2 -2
@@ -262,7 +262,7 @@ ddt_log_update_entry(ddt_t *ddt, ddt_log_t *ddl, ddt_lightweight_entry_t *ddlwe)
void
ddt_log_entry(ddt_t *ddt, ddt_lightweight_entry_t *ddlwe, ddt_log_update_t *dlu)
{
ASSERT3U(dlu->dlu_dbp, !=, NULL);
ASSERT3P(dlu->dlu_dbp, !=, NULL);
ddt_log_update_entry(ddt, ddt->ddt_log_active, ddlwe);
ddt_histogram_add_entry(ddt, &ddt->ddt_log_histogram, ddlwe);
@@ -312,7 +312,7 @@ ddt_log_entry(ddt_t *ddt, ddt_lightweight_entry_t *ddlwe, ddt_log_update_t *dlu)
void
ddt_log_commit(ddt_t *ddt, ddt_log_update_t *dlu)
{
ASSERT3U(dlu->dlu_dbp, !=, NULL);
ASSERT3P(dlu->dlu_dbp, !=, NULL);
ASSERT3U(dlu->dlu_block+1, ==, dlu->dlu_ndbp);
ASSERT3U(dlu->dlu_offset, >, 0);
+135 -23
@@ -145,6 +145,15 @@
* Additionally, the duration is then extended by a random 25% to attempt to
* detect simultaneous imports. For example, if both partner hosts are rebooted
* at the same time and automatically attempt to import the pool.
*
* Once the read-only activity check completes and the pool is determined to
* be inactive a second check is performed to claim the pool. During this
* phase the host writes out MMP uberblocks to each of the devices which are
* identical to the best uberblock but with a randomly selected sequence id.
* The "best" uberblock is then read back and it must contain this new sequence
* number. This check is performed multiple times to ensure that there is
* no window where a concurrently importing system can incorrectly determine
* the pool to be inactive.
*/
/*
@@ -237,8 +246,8 @@ mmp_thread_start(spa_t *spa)
if (!mmp->mmp_thread) {
mmp->mmp_thread = thread_create(NULL, 0, mmp_thread,
spa, 0, &p0, TS_RUN, defclsyspri);
zfs_dbgmsg("MMP thread started pool '%s' "
"gethrtime %llu", spa_name(spa), gethrtime());
zfs_dbgmsg("mmp: mmp thread started spa=%s "
"gethrtime=%llu", spa_name(spa), gethrtime());
}
mutex_exit(&mmp->mmp_thread_lock);
}
@@ -257,7 +266,7 @@ mmp_thread_stop(spa_t *spa)
cv_wait(&mmp->mmp_thread_cv, &mmp->mmp_thread_lock);
}
mutex_exit(&mmp->mmp_thread_lock);
zfs_dbgmsg("MMP thread stopped pool '%s' gethrtime %llu",
zfs_dbgmsg("mmp: mmp thread stopped spa=%s gethrtime=%llu",
spa_name(spa), gethrtime());
ASSERT(mmp->mmp_thread == NULL);
@@ -449,9 +458,9 @@ mmp_write_uberblock(spa_t *spa)
spa_config_enter_mmp(spa, SCL_STATE, mmp_tag, RW_READER);
lock_acquire_time = gethrtime() - lock_acquire_time;
if (lock_acquire_time > (MSEC2NSEC(MMP_MIN_INTERVAL) / 10))
zfs_dbgmsg("MMP SCL_STATE acquisition pool '%s' took %llu ns "
"gethrtime %llu", spa_name(spa), lock_acquire_time,
gethrtime());
zfs_dbgmsg("mmp: long SCL_STATE acquisition, spa=%s "
"acquire_time=%llu gethrtime=%llu", spa_name(spa),
lock_acquire_time, gethrtime());
mutex_enter(&mmp->mmp_io_lock);
@@ -474,8 +483,8 @@ mmp_write_uberblock(spa_t *spa)
spa_mmp_history_add(spa, mmp->mmp_ub.ub_txg,
gethrestime_sec(), mmp->mmp_delay, NULL, 0,
mmp->mmp_kstat_id++, error);
zfs_dbgmsg("MMP error choosing leaf pool '%s' "
"gethrtime %llu fail_mask %#x", spa_name(spa),
zfs_dbgmsg("mmp: error choosing leaf, spa=%s "
"gethrtime=%llu fail_mask=%#x", spa_name(spa),
gethrtime(), error);
}
mutex_exit(&mmp->mmp_io_lock);
@@ -485,11 +494,11 @@ mmp_write_uberblock(spa_t *spa)
vd = spa->spa_mmp.mmp_last_leaf;
if (mmp->mmp_skip_error != 0) {
mmp->mmp_skip_error = 0;
zfs_dbgmsg("MMP write after skipping due to unavailable "
"leaves, pool '%s' gethrtime %llu leaf %llu",
zfs_dbgmsg("mmp: write after skipping due to unavailable "
"leaves, spa=%s gethrtime=%llu vdev=%llu error=%d",
spa_name(spa), (u_longlong_t)gethrtime(),
(u_longlong_t)vd->vdev_guid);
(u_longlong_t)vd->vdev_guid, mmp->mmp_skip_error);
mmp->mmp_skip_error = 0;
}
if (mmp->mmp_zio_root == NULL)
@@ -540,6 +549,108 @@ mmp_write_uberblock(spa_t *spa)
zio_nowait(zio);
}
static void
mmp_claim_uberblock_sync_done(zio_t *zio)
{
uint64_t *good_writes = zio->io_private;
if (zio->io_error == 0 && zio->io_vd->vdev_top->vdev_ms_array != 0)
atomic_inc_64(good_writes);
}
/*
* Write the uberblock to the first label of all leaves of the specified vdev.
* Two writes are required for each mirror, one for a singleton, and parity+1
* for raidz or draid vdevs.
*/
static void
mmp_claim_uberblock_sync(zio_t *zio, uint64_t *good_writes,
uint64_t *req_writes, uberblock_t *ub, vdev_t *vd, int flags)
{
for (uint64_t c = 0; c < vd->vdev_children; c++) {
vdev_t *cvd = vd->vdev_child[c];
if (cvd->vdev_islog || cvd->vdev_isspare || cvd->vdev_isl2cache)
continue;
if (cvd->vdev_top == cvd) {
uint64_t nparity = vdev_get_nparity(cvd);
if (nparity) {
*req_writes += nparity + 1;
} else {
*req_writes +=
MIN(MAX(cvd->vdev_children, 1), 2);
}
}
mmp_claim_uberblock_sync(zio, good_writes, req_writes,
ub, cvd, flags);
}
if (!vd->vdev_ops->vdev_op_leaf)
return;
if (!vdev_writeable(vd))
return;
if (vd->vdev_ops == &vdev_draid_spare_ops)
return;
abd_t *ub_abd = abd_alloc_for_io(VDEV_UBERBLOCK_SIZE(vd), B_TRUE);
abd_copy_from_buf(ub_abd, ub, sizeof (uberblock_t));
abd_zero_off(ub_abd, sizeof (uberblock_t),
VDEV_UBERBLOCK_SIZE(vd) - sizeof (uberblock_t));
vdev_label_write(zio, vd, 0, ub_abd,
VDEV_UBERBLOCK_OFFSET(vd, VDEV_UBERBLOCK_COUNT(vd) -
MMP_BLOCKS_PER_LABEL), VDEV_UBERBLOCK_SIZE(vd),
mmp_claim_uberblock_sync_done, good_writes,
flags | ZIO_FLAG_DONT_PROPAGATE);
abd_free(ub_abd);
}
int
mmp_claim_uberblock(spa_t *spa, vdev_t *vd, uberblock_t *ub)
{
int flags = ZIO_FLAG_CONFIG_WRITER | ZIO_FLAG_CANFAIL;
uint64_t good_writes = 0;
uint64_t req_writes = 0;
zio_t *zio;
ASSERT(MMP_VALID(ub));
ASSERT(MMP_SEQ_VALID(ub));
spa_config_enter(spa, SCL_ALL, mmp_tag, RW_WRITER);
/* Sync the uberblock to all writeable leaves */
zio = zio_root(spa, NULL, NULL, flags);
mmp_claim_uberblock_sync(zio, &good_writes, &req_writes, ub, vd, flags);
(void) zio_wait(zio);
/* Flush the new uberblocks so they're immediately visible */
zio = zio_root(spa, NULL, NULL, flags);
zio_flush(zio, vd);
(void) zio_wait(zio);
spa_config_exit(spa, SCL_ALL, mmp_tag);
zfs_dbgmsg("mmp: claiming uberblock, spa=%s txg=%llu seq=%llu "
"req_writes=%llu good_writes=%llu", spa_load_name(spa),
(u_longlong_t)ub->ub_txg, (u_longlong_t)MMP_SEQ(ub),
(u_longlong_t)req_writes, (u_longlong_t)good_writes);
/*
* To guarantee visibility from a remote host we require a minimum
* number of good writes. For raidz/draid vdevs parity+1 writes, for
* mirrors 2 writes, and for singletons 1 write.
*/
if (req_writes == 0 || good_writes < req_writes)
return (SET_ERROR(EIO));
return (0);
}
static __attribute__((noreturn)) void
mmp_thread(void *arg)
{
@@ -616,11 +727,11 @@ mmp_thread(void *arg)
next_time = gethrtime() + mmp_interval / leaves;
if (mmp_fail_ns != last_mmp_fail_ns) {
zfs_dbgmsg("MMP interval change pool '%s' "
"gethrtime %llu last_mmp_interval %llu "
"mmp_interval %llu last_mmp_fail_intervals %u "
"mmp_fail_intervals %u mmp_fail_ns %llu "
"skip_wait %d leaves %d next_time %llu",
zfs_dbgmsg("mmp: interval change, spa=%s "
"gethrtime=%llu last_mmp_interval=%llu "
"mmp_interval=%llu last_mmp_fail_intervals=%u "
"mmp_fail_intervals=%u mmp_fail_ns=%llu "
"skip_wait=%d leaves=%d next_time=%llu",
spa_name(spa), (u_longlong_t)gethrtime(),
(u_longlong_t)last_mmp_interval,
(u_longlong_t)mmp_interval, last_mmp_fail_intervals,
@@ -635,9 +746,9 @@ mmp_thread(void *arg)
*/
if ((!last_spa_multihost && multihost) ||
(last_spa_suspended && !suspended)) {
zfs_dbgmsg("MMP state change pool '%s': gethrtime %llu "
"last_spa_multihost %u multihost %u "
"last_spa_suspended %u suspended %u",
zfs_dbgmsg("mmp: state change spa=%s: gethrtime=%llu "
"last_spa_multihost=%u multihost=%u "
"last_spa_suspended=%u suspended=%u",
spa_name(spa), (u_longlong_t)gethrtime(),
last_spa_multihost, multihost, last_spa_suspended,
suspended);
@@ -663,9 +774,10 @@ mmp_thread(void *arg)
*/
if (multihost && !suspended && mmp_fail_intervals &&
(gethrtime() - mmp->mmp_last_write) > mmp_fail_ns) {
zfs_dbgmsg("MMP suspending pool '%s': gethrtime %llu "
"mmp_last_write %llu mmp_interval %llu "
"mmp_fail_intervals %llu mmp_fail_ns %llu txg %llu",
zfs_dbgmsg("mmp: suspending pool, spa=%s "
"gethrtime=%llu mmp_last_write=%llu "
"mmp_interval=%llu mmp_fail_intervals=%llu "
"mmp_fail_ns=%llu txg=%llu",
spa_name(spa), (u_longlong_t)gethrtime(),
(u_longlong_t)mmp->mmp_last_write,
(u_longlong_t)mmp_interval,
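The claim phase added in this file counts how many uberblock writes must succeed before the pool is considered claimed. A small user-space sketch of that arithmetic, assuming a hypothetical `struct model_top_vdev` in place of vdev_t and ignoring the log, spare, and l2cache devices that the real code skips:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical summary of a top-level vdev; not the real vdev_t. */
struct model_top_vdev {
	uint64_t nparity;	/* 0 for mirrors and single disks */
	uint64_t children;	/* 0 for a single-disk top-level vdev */
};

/* Writes required for one top-level vdev, following the diff's rule. */
static uint64_t
model_req_writes_one(const struct model_top_vdev *tv)
{
	if (tv->nparity != 0)
		return (tv->nparity + 1);

	/*
	 * MIN(MAX(children, 1), 2): 1 for a singleton, 2 for a mirror
	 * with two or more children.
	 */
	uint64_t n = tv->children > 1 ? tv->children : 1;
	return (n < 2 ? n : 2);
}

/* Sum the requirement over every top-level vdev in the pool. */
static uint64_t
model_req_writes(const struct model_top_vdev *tvs, size_t count)
{
	uint64_t req = 0;

	for (size_t i = 0; i < count; i++)
		req += model_req_writes_one(&tvs[i]);
	return (req);
}

/* Mirrors the final check: 0 on success, -1 where the diff returns EIO. */
static int
model_claim_result(uint64_t req_writes, uint64_t good_writes)
{
	if (req_writes == 0 || good_writes < req_writes)
		return (-1);
	return (0);
}
```

For a pool made of one single disk, one three-way mirror, and one raidz2 vdev, this sketch requires 1 + 2 + 3 = 6 good writes.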
+658 -193
File diff suppressed because it is too large
+21 -4
@@ -413,7 +413,7 @@ spa_load_failed(spa_t *spa, const char *fmt, ...)
(void) vsnprintf(buf, sizeof (buf), fmt, adx);
va_end(adx);
zfs_dbgmsg("spa_load(%s, config %s): FAILED: %s", spa->spa_name,
zfs_dbgmsg("spa_load(%s, config %s): FAILED: %s", spa_load_name(spa),
spa->spa_trust_config ? "trusted" : "untrusted", buf);
}
@@ -427,7 +427,7 @@ spa_load_note(spa_t *spa, const char *fmt, ...)
(void) vsnprintf(buf, sizeof (buf), fmt, adx);
va_end(adx);
zfs_dbgmsg("spa_load(%s, config %s): %s", spa->spa_name,
zfs_dbgmsg("spa_load(%s, config %s): %s", spa_load_name(spa),
spa->spa_trust_config ? "trusted" : "untrusted", buf);
spa_import_progress_set_notes_nolog(spa, "%s", buf);
@@ -857,6 +857,9 @@ spa_remove(spa_t *spa)
if (spa->spa_root)
spa_strfree(spa->spa_root);
if (spa->spa_load_name)
spa_strfree(spa->spa_load_name);
while ((dp = list_remove_head(&spa->spa_config_list)) != NULL) {
if (dp->scd_path != NULL)
spa_strfree(dp->scd_path);
@@ -1242,7 +1245,7 @@ spa_vdev_enter(spa_t *spa)
mutex_enter(&spa->spa_vdev_top_lock);
mutex_enter(&spa_namespace_lock);
ASSERT0(spa->spa_export_thread);
ASSERT0P(spa->spa_export_thread);
vdev_autotrim_stop_all(spa);
@@ -1261,7 +1264,7 @@ spa_vdev_detach_enter(spa_t *spa, uint64_t guid)
mutex_enter(&spa->spa_vdev_top_lock);
mutex_enter(&spa_namespace_lock);
ASSERT0(spa->spa_export_thread);
ASSERT0P(spa->spa_export_thread);
vdev_autotrim_stop_all(spa);
@@ -1777,6 +1780,19 @@ spa_name(spa_t *spa)
return (spa->spa_name);
}
char *
spa_load_name(spa_t *spa)
{
/*
* During spa_tryimport() the pool name includes a unique prefix.
* Return the original name, which can be used for log messages.
*/
if (spa->spa_load_name)
return (spa->spa_load_name);
return (spa->spa_name);
}
uint64_t
spa_guid(spa_t *spa)
{
@@ -3089,6 +3105,7 @@ EXPORT_SYMBOL(spa_set_rootblkptr);
EXPORT_SYMBOL(spa_altroot);
EXPORT_SYMBOL(spa_sync_pass);
EXPORT_SYMBOL(spa_name);
EXPORT_SYMBOL(spa_load_name);
EXPORT_SYMBOL(spa_guid);
EXPORT_SYMBOL(spa_last_synced_txg);
EXPORT_SYMBOL(spa_first_txg);
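The new spa_load_name() accessor is a plain "prefer the override, else fall back" lookup. A user-space sketch of the same shape, assuming a hypothetical `struct model_spa` with only the two name fields; the prefixed name used in the usage below is made up for illustration and is not the format spa_tryimport() actually generates.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the two name fields of spa_t. */
struct model_spa {
	const char *name;	/* may carry a temporary prefix */
	const char *load_name;	/* original name, or NULL if unset */
};

/* Prefer the original name for log messages when one was recorded. */
static const char *
model_load_name(const struct model_spa *spa)
{
	if (spa->load_name != NULL)
		return (spa->load_name);
	return (spa->name);
}
```

Because the fallback is spa_name itself, callers that log through this accessor behave exactly as before for pools that never set a load name.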
+2 -1
@@ -228,7 +228,8 @@ vdev_file_io_strategy(void *arg)
abd_return_buf_copy(zio->io_abd, buf, size);
} else {
buf = abd_borrow_buf_copy(zio->io_abd, zio->io_size);
err = zfs_file_pwrite(vf->vf_file, buf, size, off, &resid);
err = zfs_file_pwrite(vf->vf_file, buf, size, off,
vd->vdev_ashift, &resid);
abd_return_buf(zio->io_abd, buf, size);
}
zio->io_error = err;
+6 -4
@@ -1500,7 +1500,7 @@ retry:
* conflicting uberblocks on disk with the same txg. The solution is simple:
* among uberblocks with equal txg, choose the one with the latest timestamp.
*/
static int
int
vdev_uberblock_compare(const uberblock_t *ub1, const uberblock_t *ub2)
{
int cmp = TREE_CMP(ub1->ub_txg, ub2->ub_txg);
@@ -1631,8 +1631,10 @@ vdev_uberblock_load(vdev_t *rvd, uberblock_t *ub, nvlist_t **config)
* matches the txg for our uberblock.
*/
if (cb.ubl_vd != NULL) {
vdev_dbgmsg(cb.ubl_vd, "best uberblock found for spa %s. "
"txg %llu", spa->spa_name, (u_longlong_t)ub->ub_txg);
vdev_dbgmsg(cb.ubl_vd, "best uberblock found for spa %s, "
"txg=%llu seq=%llu", spa_load_name(spa),
(u_longlong_t)ub->ub_txg,
(u_longlong_t)(MMP_SEQ_VALID(ub) ? MMP_SEQ(ub) : 0));
if (ub->ub_raidz_reflow_info !=
cb.ubl_latest.ub_raidz_reflow_info) {
@@ -1640,7 +1642,7 @@ vdev_uberblock_load(vdev_t *rvd, uberblock_t *ub, nvlist_t **config)
"spa=%s best uberblock (txg=%llu info=0x%llx) "
"has different raidz_reflow_info than latest "
"uberblock (txg=%llu info=0x%llx)",
spa->spa_name,
spa_load_name(spa),
(u_longlong_t)ub->ub_txg,
(u_longlong_t)ub->ub_raidz_reflow_info,
(u_longlong_t)cb.ubl_latest.ub_txg,
+13 -11
@@ -155,11 +155,11 @@ chksum_run(chksum_stat_t *cs, abd_t *abd, void *ctx, int round,
switch (round) {
case 1: /* 1k */
size = 1<<10; loops = 128; break;
case 2: /* 2k */
case 2: /* 4k */
size = 1<<12; loops = 64; break;
case 3: /* 4k */
case 3: /* 16k */
size = 1<<14; loops = 32; break;
case 4: /* 16k */
case 4: /* 64k */
size = 1<<16; loops = 16; break;
case 5: /* 256k */
size = 1<<18; loops = 8; break;
@@ -212,6 +212,7 @@ chksum_benchit(chksum_stat_t *cs)
chksum_run(cs, abd, ctx, 2, &cs->bs4k);
chksum_run(cs, abd, ctx, 3, &cs->bs16k);
chksum_run(cs, abd, ctx, 4, &cs->bs64k);
chksum_run(cs, abd, ctx, 5, &cs->bs256k);
chksum_run(cs, abd, ctx, 6, &cs->bs1m);
abd_free(abd);
@@ -249,15 +250,16 @@ chksum_benchmark(void)
if (chksum_stat_limit == AT_DONE)
return;
/* count implementations */
chksum_stat_cnt = 1; /* edonr */
chksum_stat_cnt += 1; /* skein */
chksum_stat_cnt += sha256->getcnt();
chksum_stat_cnt += sha512->getcnt();
chksum_stat_cnt += blake3->getcnt();
chksum_stat_data = kmem_zalloc(
sizeof (chksum_stat_t) * chksum_stat_cnt, KM_SLEEP);
if (chksum_stat_limit == AT_STARTUP) {
chksum_stat_cnt = 1; /* edonr */
chksum_stat_cnt += 1; /* skein */
chksum_stat_cnt += sha256->getcnt();
chksum_stat_cnt += sha512->getcnt();
chksum_stat_cnt += blake3->getcnt();
chksum_stat_data = kmem_zalloc(
sizeof (chksum_stat_t) * chksum_stat_cnt, KM_SLEEP);
}
/* edonr - needs to be the first one here (slow CPU check) */
cs = &chksum_stat_data[cbid++];
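The comment fixes in chksum_run() can be checked against the shift expressions: for the rounds visible in this hunk, each round quadruples the buffer size and halves the loop count. A short sketch, assuming hypothetical helper names `model_round_size` and `model_round_loops` and covering only rounds 1 through 5, since later rounds are not shown here:

```c
#include <assert.h>
#include <stddef.h>

/* Buffer size for rounds 1..5 as shown in the hunk: 1<<10 .. 1<<18. */
static size_t
model_round_size(int round)
{
	assert(round >= 1 && round <= 5);
	return ((size_t)1 << (10 + 2 * (round - 1)));
}

/* Loop count for rounds 1..5 as shown in the hunk: 128 down to 8. */
static int
model_round_loops(int round)
{
	assert(round >= 1 && round <= 5);
	return (128 >> (round - 1));
}
```

Round 2 therefore benchmarks 4 KiB buffers, round 3 uses 16 KiB, and round 4 uses 64 KiB, which is what the corrected case labels say.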
+1 -1
@@ -128,7 +128,7 @@ zio_compress_data(enum zio_compress c, abd_t *src, abd_t **dst, size_t s_len,
uint8_t complevel;
zio_compress_info_t *ci = &zio_compress_table[c];
ASSERT3U(ci->ci_compress, !=, NULL);
ASSERT3P(ci->ci_compress, !=, NULL);
ASSERT3U(s_len, >, 0);
complevel = ci->ci_level;
+3 -1
@@ -7,7 +7,9 @@ REF="HEAD"
test_commit_bodylength()
{
length="72"
body=$(git log --no-show-signature -n 1 --pretty=%b "$REF" | grep -Ev "http(s)*://" | grep -E -m 1 ".{$((length + 1))}")
body=$(git log --no-show-signature -n 1 --pretty=%b "$REF" |
grep -Evi -e "http(s)*://" -e "signed-off-by:" -e "reviewed-by:" |
grep -E -m 1 ".{$((length + 1))}")
if [ -n "$body" ]; then
echo "error: commit message body contains line over ${length} characters"
return 1
+2 -2
@@ -22,10 +22,10 @@
# FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
# IN THE SOFTWARE.
# Filter out objtools '--Werror' flag.
# Filter out objtool's '--Werror' or '--werror' flag.
objtool="@abs_objtool_binary@"
args=$(echo "$*" | sed s/--Werror//)
args=$(echo "$*" | sed 's/--Werror\|--werror//')
if [ -z "$objtool" ]; then
echo "$(basename "$0"): No objtool binary configured" 1>&2
+1 -1
@@ -378,7 +378,7 @@ tags = ['functional', 'cli_root', 'zfs_wait']
[tests/functional/cli_root/zhack]
tests = ['zhack_label_repair_001', 'zhack_label_repair_002',
'zhack_label_repair_003', 'zhack_label_repair_004']
'zhack_label_repair_003', 'zhack_label_repair_004', 'zhack_metaslab_leak']
pre =
post =
tags = ['functional', 'cli_root', 'zhack']
+4 -2
@@ -161,11 +161,13 @@ tags = ['functional', 'mmap']
tests = ['mmp_on_thread', 'mmp_on_uberblocks', 'mmp_on_off', 'mmp_interval',
'mmp_active_import', 'mmp_inactive_import', 'mmp_exported_import',
'mmp_write_uberblocks', 'mmp_reset_interval', 'multihost_history',
'mmp_on_zdb', 'mmp_write_distribution', 'mmp_hostid', 'mmp_write_slow_disk']
'mmp_on_zdb', 'mmp_write_distribution', 'mmp_hostid', 'mmp_write_slow_disk',
'mmp_concurrent_import']
tags = ['functional', 'mmp']
timeout = 1200
[tests/functional/mount:Linux]
tests = ['umount_unlinked_drain']
tests = ['umount_unlinked_drain', 'mount_loopback']
tags = ['functional', 'mount']
[tests/functional/pam:Linux]
-4
@@ -247,7 +247,6 @@ maybe = {
'l2arc/persist_l2arc_005_pos': ['FAIL', known_reason],
'largest_pool/largest_pool_001_pos': ['FAIL', known_reason],
'mmap/mmap_sync_001_pos': ['FAIL', known_reason],
'mmp/mmp_on_uberblocks': ['FAIL', known_reason],
'pam/setup': ['SKIP', "pamtester might be not available"],
'pool_checkpoint/checkpoint_discard_busy': ['FAIL', 11946],
'projectquota/setup': ['SKIP', exec_reason],
@@ -366,9 +365,6 @@ elif sys.platform.startswith('linux'):
'io/io_uring': ['SKIP', 'io_uring support required'],
'limits/filesystem_limit': ['SKIP', known_reason],
'limits/snapshot_limit': ['SKIP', known_reason],
'mmp/mmp_active_import': ['FAIL', known_reason],
'mmp/mmp_exported_import': ['FAIL', known_reason],
'mmp/mmp_inactive_import': ['FAIL', known_reason],
'stat/statx_dioalign': ['SKIP', 'statx_reason'],
})
+1 -1
@@ -26,7 +26,7 @@ echo "================================================================="
sudo tail -n $lines /proc/spl/kstat/zfs/dbgmsg
# reset dbgmsg
sudo bash -c "echo > /proc/spl/kstat/zfs/dbgmsg"
sudo sh -c "echo > /proc/spl/kstat/zfs/dbgmsg"
echo "================================================================="
echo " End of zfs_dbgmsg log"
+1 -1
@@ -31,7 +31,7 @@ for f in /proc/spl/kstat/zfs/*/multihost; do
echo "================================================================="
sudo tail -n $lines $f
sudo bash -c "echo > $f"
sudo sh -c "echo > $f"
done
echo "================================================================="
+2
@@ -100,6 +100,7 @@ export SYSTEM_FILES_COMMON='awk
uniq
vmstat
wc
which
xargs
xxh128sum'
@@ -146,6 +147,7 @@ export SYSTEM_FILES_LINUX='attr
lscpu
lsmod
lsscsi
mkfs.xfs
mkswap
modprobe
mountpoint
+1 -3
@@ -559,7 +559,7 @@ function default_cleanup_noexit
# Here, we loop through the pools we're allowed to
# destroy, only destroying them if it's safe to do
# so.
while [ ! -z ${ALL_POOLS} ]
while [ -n "${ALL_POOLS}" ]
do
for pool in ${ALL_POOLS}
do
@@ -3803,8 +3803,6 @@ function directory_diff # dir_a dir_b
# do not match there is a "c" entry in one of the columns).
if rsync --version | grep -q "[, ] crtimes"; then
args+=("--crtimes")
else
log_note "This rsync package does not support --crtimes (-N)."
fi
# If we are testing a ZIL replay, we need to ignore timestamp changes.
+3
@@ -1004,6 +1004,7 @@ nobase_dist_datadir_zfs_tests_tests_SCRIPTS += \
functional/cli_root/zhack/zhack_label_repair_002.ksh \
functional/cli_root/zhack/zhack_label_repair_003.ksh \
functional/cli_root/zhack/zhack_label_repair_004.ksh \
functional/cli_root/zhack/zhack_metaslab_leak.ksh \
functional/cli_root/zpool_add/add_nested_replacing_spare.ksh \
functional/cli_root/zpool_add/add-o_ashift.ksh \
functional/cli_root/zpool_add/add_prop_ashift.ksh \
@@ -1666,6 +1667,7 @@ nobase_dist_datadir_zfs_tests_tests_SCRIPTS += \
functional/mmap/setup.ksh \
functional/mmp/cleanup.ksh \
functional/mmp/mmp_active_import.ksh \
functional/mmp/mmp_concurrent_import.ksh \
functional/mmp/mmp_exported_import.ksh \
functional/mmp/mmp_hostid.ksh \
functional/mmp/mmp_inactive_import.ksh \
@@ -1682,6 +1684,7 @@ nobase_dist_datadir_zfs_tests_tests_SCRIPTS += \
functional/mmp/setup.ksh \
functional/mount/cleanup.ksh \
functional/mount/setup.ksh \
functional/mount/mount_loopback.ksh \
functional/mount/umount_001.ksh \
functional/mount/umountall_001.ksh \
functional/mount/umount_unlinked_drain.ksh \
@@ -0,0 +1,70 @@
#!/bin/ksh
# SPDX-License-Identifier: CDDL-1.0
#
# This file and its contents are supplied under the terms of the
# Common Development and Distribution License ("CDDL"), version 1.0.
# You may only use this file in accordance with the terms of version
# 1.0 of the CDDL.
#
#
# Description:
#
# Test whether zhack metaslab leak functions correctly
#
# Strategy:
#
# 1. Create pool on a loopback device with some test data
# 2. Gather pool capacity stats
# 3. Generate fragmentation data with zdb
# 4. Destroy the pool
# 5. Create a new pool with the same configuration
# 6. Export the pool
# 7. Apply the fragmentation information with zhack metaslab leak
# 8. Import the pool
# 9. Verify that pool capacity stats match
. "$STF_SUITE"/include/libtest.shlib
verify_runnable "global"
function cleanup
{
zpool destroy $TESTPOOL
rm $tmp
}
log_onexit cleanup
log_assert "zhack metaslab leak leaks the right amount of space"
typeset tmp=$(mktemp)
log_must zpool create $TESTPOOL $DISKS
for i in `seq 1 16`; do
log_must dd if=/dev/urandom of=/$TESTPOOL/f$i bs=1M count=16
log_must zpool sync $TESTPOOL
done
for i in `seq 2 2 16`; do
log_must rm /$TESTPOOL/f$i
done
for i in `seq 1 16`; do
log_must touch /$TESTPOOL/g$i
log_must zpool sync $TESTPOOL
done
alloc=$(zpool get -Hpo value alloc $TESTPOOL)
log_must eval "zdb -m --allocated-map $TESTPOOL > $tmp"
log_must zpool destroy $TESTPOOL
log_must zpool create $TESTPOOL $DISKS
log_must zpool export $TESTPOOL
log_must eval "zhack metaslab leak $TESTPOOL < $tmp"
log_must zpool import $TESTPOOL
alloc2=$(zpool get -Hpo value alloc $TESTPOOL)
[[ $((alloc * 1.05)) -gt $alloc2 ]] && [[ $alloc -lt $alloc2 ]] || \
log_fail "space usage changed too much: $alloc to $alloc2"
log_pass "zhack metaslab leak behaved correctly"
+3 -3
@@ -25,6 +25,8 @@ export DISK=${DISKS%% *}
export HOSTID_FILE="/etc/hostid"
export HOSTID1=01234567
export HOSTID2=89abcdef
export HOSTID3=aaaabbbb
export HOSTID4=ccccdddd
export TXG_TIMEOUT_LONG=5000
export TXG_TIMEOUT_DEFAULT=5
@@ -32,7 +34,7 @@ export TXG_TIMEOUT_DEFAULT=5
export MMP_POOL=mmppool
export MMP_DIR=$TEST_BASE_DIR/mmp
export MMP_CACHE=$MMP_DIR/zpool.cache
export MMP_ZTEST_LOG=$MMP_DIR/ztest.log
export MMP_ZHACK_LOG=$MMP_DIR/zhack.log
export MMP_HISTORY=100
export MMP_HISTORY_OFF=0
@@ -43,5 +45,3 @@ export MMP_INTERVAL_MIN=100
export MMP_IMPORT_INTERVALS=20
export MMP_FAIL_INTERVALS_DEFAULT=10
export MMP_FAIL_INTERVALS_MIN=2
export MMP_TEST_DURATION_DEFAULT=$((MMP_IMPORT_INTERVALS*MMP_INTERVAL_DEFAULT/1000))
+24 -23
@@ -99,11 +99,11 @@ function mmp_pool_create_simple # pool dir
log_must zpool set multihost=on $pool
}
function mmp_pool_create # pool dir
function mmp_pool_create_zhack # pool dir
{
typeset pool=${1:-$MMP_POOL}
typeset dir=${2:-$MMP_DIR}
typeset opts="-VVVVV -T120 -M -k0 -f $dir -E -p $pool"
typeset opts="-d $dir action idle -t120 $pool"
mmp_pool_create_simple $pool $dir
@@ -112,11 +112,11 @@ function mmp_pool_create # pool dir
log_must mmp_clear_hostid
log_must mmp_set_hostid $HOSTID2
log_note "Starting ztest in the background as hostid $HOSTID1"
log_must eval "ZFS_HOSTID=$HOSTID1 ztest $opts >$MMP_ZTEST_LOG 2>&1 &"
log_note "Starting zhack in the background as hostid $HOSTID1"
log_must eval "ZFS_HOSTID=$HOSTID1 zhack $opts >$MMP_ZHACK_LOG 2>&1 &"
while ! is_pool_imported "$pool" "-d $dir"; do
log_must pgrep ztest
log_must pgrep zhack
log_must sleep 5
done
}
@@ -126,10 +126,10 @@ function mmp_pool_destroy # pool dir
typeset pool=${1:-$MMP_POOL}
typeset dir=${2:-$MMP_DIR}
ZTESTPID=$(pgrep ztest)
if [ -n "$ZTESTPID" ]; then
log_must kill $ZTESTPID
wait $ZTESTPID
ZHACKPID=$(pgrep zhack)
if [ -n "$ZHACKPID" ]; then
log_must kill $ZHACKPID
wait $ZHACKPID
fi
if poolexists $pool; then
@@ -158,33 +158,34 @@ function import_no_activity_check # pool opts
typeset pool=$1
typeset opts=$2
typeset max_duration=$((MMP_TEST_DURATION_DEFAULT-1))
SECONDS=0
zpool import $opts $pool
RESULT=$(ZFS_LOAD_INFO_DEBUG=1 zpool import $opts $pool)
typeset rc=$?
if [[ $SECONDS -gt $max_duration ]]; then
log_fail "ERROR: import_no_activity_check unexpected activity \
check (${SECONDS}s gt $max_duration)"
# mmp_result: 3 (ESRCH) no activity check was run; not required
# mmp_result: 6 (ENXIO) no activity check was run; hostid not set
if ! echo "$RESULT" | grep -q "mmp_result: 3" &&
! echo "$RESULT" | grep -q "mmp_result: 6"; then
log_note "ERROR: $RESULT"
log_fail "ERROR: import_no_activity_check unexpected activity check"
fi
return $rc
}
function import_activity_check # pool opts act_test_duration
function import_activity_check # pool opts
{
typeset pool=$1
typeset opts=$2
typeset min_duration=${3:-$MMP_TEST_DURATION_DEFAULT}
SECONDS=0
zpool import $opts $pool
RESULT=$(ZFS_LOAD_INFO_DEBUG=1 zpool import $opts $pool)
typeset rc=$?
if [[ $SECONDS -le $min_duration ]]; then
log_fail "ERROR: import_activity_check expected activity check \
(${SECONDS}s le min_duration $min_duration)"
# mmp_result: 0 (Success) check was run; no activity detected
# mmp_result: 121 (EREMOTEIO) check was run; activity detected
# mmp_result: 4 (EINTR) check was run, but interrupted by user
if ! echo "$RESULT" | grep -q "mmp_result: 0"; then
log_note "ERROR: $RESULT"
log_fail "ERROR: import_activity_check expected activity check"
fi
return $rc
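The two helpers above now decide whether an activity check ran by looking for an `mmp_result:` value in the captured import output rather than by timing the import. The mapping in their comments can be sketched as a small lookup. `classify_mmp_result` is a hypothetical helper, and the exact layout of the debug line is an assumption; only the `mmp_result: N` token is taken from the diff. Extracting the number before comparing avoids a substring match such as `mmp_result: 3` also matching a longer value.

```shell
# Hypothetical helper: map the "mmp_result: N" token in captured import
# output to the meaning given in the mmp.kshlib comments.
classify_mmp_result() {
	# Pull out the digits that follow "mmp_result: " (first matching line).
	code=$(printf '%s\n' "$1" |
	    sed -n 's/.*mmp_result: \([0-9][0-9]*\).*/\1/p' | head -n 1)

	case "$code" in
	0)   echo "check ran: no activity detected" ;;
	121) echo "check ran: activity detected" ;;
	4)   echo "check ran: interrupted" ;;
	3)   echo "no check: not required" ;;
	6)   echo "no check: hostid not set" ;;
	"")  echo "unknown: no mmp_result found" ;;
	*)   echo "unknown: mmp_result $code" ;;
	esac
}
```

A caller could pass `"$RESULT"` from either helper and compare the printed category instead of chaining several `grep -q` calls.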
@@ -24,10 +24,10 @@
# with one hostid be importable by a host with a different hostid.
#
# STRATEGY:
# 1. Simulate an active pool on another host with ztest.
# 1. Simulate an active pool on another host with zhack.
# 2. Verify 'zpool import' reports an active pool.
# 3. Verify 'zpool import [-f] $MMP_POOL' cannot import the pool.
# 4. Kill ztest to make pool eligible for import.
# 4. Kill zhack to make pool eligible for import.
# 5. Verify 'zpool import' fails with the expected error message.
# 6. Verify 'zpool import $MMP_POOL' fails with the expected message.
# 7. Verify 'zpool import -f $MMP_POOL' can now import the pool.
@@ -44,16 +44,16 @@ function cleanup
{
mmp_pool_destroy $MMP_POOL $MMP_DIR
log_must mmp_clear_hostid
ZTESTPID=$(pgrep ztest)
if [ -n "$ZTESTPID" ]; then
for pid in $ZTESTPID; do
ZHACKPID=$(pgrep zhack)
if [ -n "$ZHACKPID" ]; then
for pid in $ZHACKPID; do
log_must kill -9 $pid
done
else
# if ztest not running and log present, ztest crashed
if [ -f $MMP_ZTEST_LOG ]; then
log_note "ztest appears to have crashed. Tail of log:"
tail -n 50 $MMP_ZTEST_LOG
# if zhack is not running and log present, zhack crashed
if [ -f $MMP_ZHACK_LOG ]; then
log_note "zhack appears to have crashed. Tail of log:"
tail -n 50 $MMP_ZHACK_LOG
fi
fi
}
@@ -61,15 +61,18 @@ function cleanup
log_assert "multihost=on|off active pool activity checks"
log_onexit cleanup
# 1. Simulate an active pool on another host with ztest.
# 1. Simulate an active pool on another host with zhack.
log_note "Simulate an active pool on another host with zhack"
mmp_pool_destroy $MMP_POOL $MMP_DIR
mmp_pool_create $MMP_POOL $MMP_DIR
mmp_pool_create_zhack $MMP_POOL $MMP_DIR
# 2. Verify 'zpool import' reports an active pool.
log_note "Verify 'zpool import' reports an active pool"
log_must mmp_set_hostid $HOSTID2
log_must is_pool_imported $MMP_POOL "-d $MMP_DIR"
# 3. Verify 'zpool import [-f] $MMP_POOL' cannot import the pool.
log_note "Verify 'zpool import [-f] $MMP_POOL' cannot import the pool"
MMP_IMPORTED_MSG="Cannot import '$MMP_POOL': pool is imported"
log_must try_pool_import $MMP_POOL "-d $MMP_DIR" "$MMP_IMPORTED_MSG"
@@ -84,14 +87,15 @@ for i in {1..10}; do
"$MMP_IMPORTED_MSG"
done
# 4. Kill ztest to make pool eligible for import. Poll with 'zpool status'.
ZTESTPID=$(pgrep ztest)
if [ -n "$ZTESTPID" ]; then
log_must kill -9 $ZTESTPID
# 4. Kill zhack to make pool eligible for import. Poll with 'zpool status'.
log_note "Kill zhack to make pool eligible for import. Poll with 'zpool status'"
ZHACKPID=$(pgrep zhack)
if [ -n "$ZHACKPID" ]; then
log_must kill -9 $ZHACKPID
fi
log_must wait_pool_imported $MMP_POOL "-d $MMP_DIR"
if [ -f $MMP_ZTEST_LOG ]; then
log_must rm $MMP_ZTEST_LOG
if [ -f $MMP_ZHACK_LOG ]; then
log_must rm $MMP_ZHACK_LOG
fi
# 5. Verify 'zpool import' fails with the expected error message, when
@@ -99,6 +103,7 @@ fi
# - hostid=matches - safe to import the pool
# - hostid=different - previously imported on a different system
#
log_note "Verify 'zpool import' fails with the expected error message"
log_must mmp_clear_hostid
MMP_IMPORTED_MSG="Set a unique system hostid"
log_must check_pool_import $MMP_POOL "-d $MMP_DIR" "action" "$MMP_IMPORTED_MSG"
@@ -113,13 +118,16 @@ MMP_IMPORTED_MSG="The pool was last accessed by another system."
log_must check_pool_import $MMP_POOL "-d $MMP_DIR" "status" "$MMP_IMPORTED_MSG"
# 6. Verify 'zpool import $MMP_POOL' fails with the expected message.
log_note "Verify 'zpool import $MMP_POOL' fails with the expected message"
MMP_IMPORTED_MSG="pool was previously in use from another system."
log_must try_pool_import $MMP_POOL "-d $MMP_DIR" "$MMP_IMPORTED_MSG"
# 7. Verify 'zpool import -f $MMP_POOL' can now import the pool.
log_note "Verify 'zpool import -f $MMP_POOL' can now import the pool"
log_must import_activity_check $MMP_POOL "-f -d $MMP_DIR"
# 8. Verify pool may be exported/imported without -f argument.
log_note "Verify pool may be exported/imported without -f argument"
log_must zpool export $MMP_POOL
log_must import_no_activity_check $MMP_POOL "-d $MMP_DIR"
@@ -0,0 +1,133 @@
#!/bin/ksh -p
# SPDX-License-Identifier: CDDL-1.0
#
# CDDL HEADER START
#
# This file and its contents are supplied under the terms of the
# Common Development and Distribution License ("CDDL"), version 1.0.
# You may only use this file in accordance with the terms of version
# 1.0 of the CDDL.
#
# A full copy of the text of the CDDL should have accompanied this
# source. A copy of the CDDL is also available via the Internet at
# http://www.illumos.org/license/CDDL.
#
# CDDL HEADER END
#
#
# Copyright (c) 2026 by Lawrence Livermore National Security, LLC.
#
# DESCRIPTION:
# Verify that even when a shared pool is imported simultaneously
# on systems with different host ids, at most one import will succeed.
#
# STRATEGY:
# 1. Create a multihost enabled pool
# 2. zhack imports: $HOSTID1 (matching) and $HOSTID1 (matching)
# 3. zhack imports: $HOSTID1 (matching) and $HOSTID2 (different)
# 4. zhack imports: $HOSTID1 (different) and $HOSTID2 (different)
#
. $STF_SUITE/include/libtest.shlib
. $STF_SUITE/tests/functional/mmp/mmp.cfg
. $STF_SUITE/tests/functional/mmp/mmp.kshlib
verify_runnable "both"
function cleanup
{
ZHACKPIDS=$(pgrep zhack)
if [ -n "$ZHACKPIDS" ]; then
for pid in $ZHACKPIDS; do
log_must kill -9 $pid
done
fi
log_must rm -f $MMP_ZHACK_LOG.1 $MMP_ZHACK_LOG.2
mmp_pool_destroy $MMP_POOL $MMP_DIR
log_must mmp_clear_hostid
}
# Verify that pool was imported by at most one of the zhack processes.
# Check both the return code and expected import message.
function verify_zhack
{
IMPORT_COUNT=0
IMPORT_MSGS=0
ZHACKPIDS=$(pgrep zhack)
for pid in $ZHACKPIDS; do
wait $pid
STATUS=$?
if [[ $STATUS -eq 0 ]]; then
(( IMPORT_COUNT++ ))
fi
log_note "PID $pid exited with status $STATUS"
done
grep -H "Imported pool $MMP_POOL" $MMP_ZHACK_LOG.1 && (( IMPORT_MSGS++ ))
grep -H "Imported pool $MMP_POOL" $MMP_ZHACK_LOG.2 && (( IMPORT_MSGS++ ))
if [[ $IMPORT_MSGS -gt 1 ]]; then
cat $MMP_ZHACK_LOG.*
log_fail "Multiple import success messages"
fi
if [[ $IMPORT_COUNT -gt 1 ]]; then
cat $MMP_ZHACK_LOG.*
log_fail "Multiple import success return codes"
fi
if [[ $IMPORT_MSGS -ne $IMPORT_COUNT ]]; then
cat $MMP_ZHACK_LOG.*
log_fail "Messages ($IMPORT_MSGS) differ from count ($IMPORT_COUNT)"
fi
}
OPTS="-d $MMP_DIR action idle -t5 $MMP_POOL"
log_assert "multihost=on concurrent imports"
log_onexit cleanup
# 1. Create a multihost enabled pool with HOSTID1
mmp_pool_create_simple $MMP_POOL $MMP_DIR
log_must zpool export -F $MMP_POOL
# 2. zhack imports: $HOSTID1 (matching) and $HOSTID1 (matching)
# Activity check required because the pool was exported with -F above; the
# claim phase will detect the double import despite matching hostids.
log_note "zhack import with $HOSTID1 (matching) and $HOSTID1 (matching)"
log_must eval "ZFS_HOSTID=$HOSTID1 zhack $OPTS >$MMP_ZHACK_LOG.1 2>&1 &"
log_must eval "ZFS_HOSTID=$HOSTID1 zhack $OPTS >$MMP_ZHACK_LOG.2 2>&1 &"
log_must verify_zhack
mmp_clear_hostid
mmp_set_hostid $HOSTID1
log_must import_activity_check $MMP_POOL "-d $MMP_DIR"
log_must zpool export $MMP_POOL
# 3. zhack imports: $HOSTID1 (matching) and $HOSTID2 (different)
# Activity check skipped for HOSTID1; it is expected to import successfully.
# zhack with HOSTID2 will run the activity check and detect the active pool.
log_note "zhack import with $HOSTID1 (matching) and $HOSTID2 (different)"
log_must eval "ZFS_HOSTID=$HOSTID1 zhack $OPTS >$MMP_ZHACK_LOG.1 2>&1 &"
log_must eval "ZFS_HOSTID=$HOSTID2 zhack $OPTS >$MMP_ZHACK_LOG.2 2>&1 &"
log_must verify_zhack
mmp_clear_hostid
mmp_set_hostid $HOSTID3
log_must import_activity_check $MMP_POOL "-d $MMP_DIR"
log_must zpool export $MMP_POOL
# 4. zhack imports: $HOSTID1 (different) and $HOSTID2 (different)
# Both zhacks will run the activity checks. Depending on the exact timing,
# one may succeed and the other fail, or both may fail.
log_note "zhack import with $HOSTID1 (different) and $HOSTID2 (different)"
log_must eval "ZFS_HOSTID=$HOSTID1 zhack $OPTS >$MMP_ZHACK_LOG.1 2>&1 &"
log_must eval "ZFS_HOSTID=$HOSTID2 zhack $OPTS >$MMP_ZHACK_LOG.2 2>&1 &"
log_must verify_zhack
log_pass "multihost=on concurrent imports"
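The core of `verify_zhack` in the new test is a "start two contenders, wait on each, count the zero exit codes" pattern. The sketch below reproduces only that pattern with a stand-in command: `try_claim` and `count_winners` are hypothetical names, and an atomic `mkdir` takes the place of the zhack import so the example runs without a pool. It also records PIDs from `$!` instead of `pgrep`, which is a deliberate difference from the test and limits the wait loop to jobs started here.

```shell
# Stand-in for an import attempt: mkdir either creates the directory or
# fails, so at most one of two concurrent callers can succeed.
try_claim() {
	mkdir "$1" 2>/dev/null
}

# Launch two contenders for the same claim and print how many succeeded.
count_winners() {
	claim=$1
	winners=0

	try_claim "$claim" &
	pid1=$!
	try_claim "$claim" &
	pid2=$!

	for pid in $pid1 $pid2; do
		# wait returns the exit status of the given background job.
		if wait "$pid"; then
			winners=$((winners + 1))
		fi
	done

	echo "$winners"
}
```

In the test itself the acceptable outcomes are zero or one winner, since both zhack processes may fail the activity check; the `mkdir` stand-in always yields exactly one.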
@@ -28,7 +28,7 @@
# 3. Verify multihost=off and hostids differ (no activity check)
# 4. Verify multihost=off and hostid zero allowed (no activity check)
# 5. Verify multihost=on and hostids match (no activity check)
# 6. Verify multihost=on and hostids differ (no activity check)
# 6. Verify multihost=on and hostids differ (activity check)
# 7. Verify multihost=on and hostid zero fails (no activity check)
#
@@ -40,7 +40,7 @@ verify_runnable "both"
function cleanup
{
default_cleanup_noexit
datasetexists $TESTPOOL && destroy_pool $TESTPOOL
log_must mmp_clear_hostid
}
@@ -49,9 +49,10 @@ log_onexit cleanup
# 1. Create a zpool
log_must mmp_set_hostid $HOSTID1
default_setup_noexit $DISK
log_must zpool create -f $TESTPOOL $DISK
# 2. Verify multihost=off and hostids match (no activity check)
log_note "Verify multihost=off and hostids match (no activity check)"
log_must zpool set multihost=off $TESTPOOL
for opt in "" "-f"; do
@@ -60,6 +61,7 @@ for opt in "" "-f"; do
done
# 3. Verify multihost=off and hostids differ (no activity check)
log_note "Verify multihost=off and hostids differ (no activity check)"
for opt in "" "-f"; do
log_must mmp_pool_set_hostid $TESTPOOL $HOSTID1
log_must zpool export $TESTPOOL
@@ -69,6 +71,7 @@ for opt in "" "-f"; do
done
# 4. Verify multihost=off and hostid zero allowed (no activity check)
log_note "Verify multihost=off and hostid zero allowed (no activity check)"
log_must mmp_clear_hostid
for opt in "" "-f"; do
@@ -77,6 +80,7 @@ for opt in "" "-f"; do
done
# 5. Verify multihost=on and hostids match (no activity check)
log_note "Verify multihost=on and hostids match (no activity check)"
log_must mmp_pool_set_hostid $TESTPOOL $HOSTID1
log_must zpool set multihost=on $TESTPOOL
@@ -85,16 +89,18 @@ for opt in "" "-f"; do
log_must import_no_activity_check $TESTPOOL $opt
done
# 6. Verify multihost=on and hostids differ (no activity check)
# 6. Verify multihost=on and hostids differ (activity check)
log_note "Verify multihost=on and hostids differ (activity check)"
for opt in "" "-f"; do
log_must mmp_pool_set_hostid $TESTPOOL $HOSTID1
log_must zpool export $TESTPOOL
log_must mmp_clear_hostid
log_must mmp_set_hostid $HOSTID2
log_must import_no_activity_check $TESTPOOL $opt
log_must import_activity_check $TESTPOOL $opt
done
# 7. Verify multihost=on and hostid zero fails (no activity check)
log_note "Verify multihost=on and hostid zero fails (no activity check)"
log_must zpool export $TESTPOOL
log_must mmp_clear_hostid
@@ -40,7 +40,7 @@ verify_runnable "both"
function cleanup
{
default_cleanup_noexit
datasetexists $MMP_POOL && destroy_pool $MMP_POOL
log_must rm $MMP_DIR/file.{0,1,2,3,4,5}
log_must rmdir $MMP_DIR
log_must mmp_clear_hostid
@@ -56,7 +56,7 @@ log_must mkdir -p $MMP_DIR
log_must truncate -s $MINVDEVSIZE $MMP_DIR/file.{0,1,2,3,4,5}
# 1. Create a non-redundant pool
log_must zpool create $MMP_POOL $MMP_DIR/file.0
log_must zpool create -f $MMP_POOL $MMP_DIR/file.0
# 2. Create an 'etc' dataset containing a valid hostid file; caching is
# disabled on the dataset to force the hostid to be read from disk.
@@ -71,16 +71,19 @@ mntpnt_fs=$(get_prop mountpoint $MMP_POOL/fs)
log_must mkfile 1M $mntpnt_fs/file
# 4. Verify multihost cannot be enabled until the /etc/hostid is linked
log_note "Verify multihost cannot be enabled until the /etc/hostid is linked"
log_mustnot zpool set multihost=on $MMP_POOL
log_mustnot ls -l $HOSTID_FILE
log_must ln -s $mntpnt_etc/hostid $HOSTID_FILE
log_must zpool set multihost=on $MMP_POOL
# 5. Verify vdevs may be attached and detached
log_note "Verify vdevs may be attached and detached"
log_must zpool attach $MMP_POOL $MMP_DIR/file.0 $MMP_DIR/file.1
log_must zpool detach $MMP_POOL $MMP_DIR/file.1
# 6. Verify normal, cache, log and special vdevs can be added
log_note "Verify normal, cache, log and special vdevs can be added"
log_must zpool add $MMP_POOL $MMP_DIR/file.1
log_must zpool add $MMP_POOL $MMP_DIR/file.2
log_must zpool add $MMP_POOL cache $MMP_DIR/file.3
@@ -88,6 +91,7 @@ log_must zpool add $MMP_POOL log $MMP_DIR/file.4
log_must zpool add $MMP_POOL special $MMP_DIR/file.5
# 7. Verify normal, cache, and log vdevs can be removed
log_note "Verify normal, cache, and log vdevs can be removed"
log_must zpool remove $MMP_POOL $MMP_DIR/file.2
log_must zpool remove $MMP_POOL $MMP_DIR/file.3
log_must zpool remove $MMP_POOL $MMP_DIR/file.4
@@ -27,7 +27,7 @@
# 2. Verify multihost=off and hostids match (no activity check)
# 3. Verify multihost=off and hostids differ (no activity check)
# 4. Verify multihost=off and hostid zero allowed (no activity check)
# 5. Verify multihost=on and hostids match (no activity check)
# 5. Verify multihost=on and hostids match (activity check)
# 6. Verify multihost=on and hostids differ (activity check)
# 7. Verify mmp_write and mmp_fail are set correctly
# 8. Verify multihost=on and hostid zero fails (no activity check)
@@ -42,7 +42,7 @@ verify_runnable "both"
function cleanup
{
default_cleanup_noexit
datasetexists $TESTPOOL && destroy_pool $TESTPOOL
log_must mmp_clear_hostid
log_must set_tunable64 MULTIHOST_INTERVAL $MMP_INTERVAL_DEFAULT
}
@@ -52,9 +52,10 @@ log_onexit cleanup
# 1. Create a zpool
log_must mmp_set_hostid $HOSTID1
default_setup_noexit $DISK
log_must zpool create -f $TESTPOOL $DISK
# 2. Verify multihost=off and hostids match (no activity check)
log_note "Verify multihost=off and hostids match (no activity check)"
log_must zpool set multihost=off $TESTPOOL
for opt in "" "-f"; do
@@ -63,6 +64,7 @@ for opt in "" "-f"; do
done
# 3. Verify multihost=off and hostids differ (no activity check)
log_note "Verify multihost=off and hostids differ (no activity check)"
log_must zpool export -F $TESTPOOL
log_must mmp_clear_hostid
log_must mmp_set_hostid $HOSTID2
@@ -70,21 +72,24 @@ log_mustnot import_no_activity_check $TESTPOOL ""
log_must import_no_activity_check $TESTPOOL "-f"
# 4. Verify multihost=off and hostid zero allowed (no activity check)
log_note "Verify multihost=off and hostid zero allowed (no activity check)"
log_must zpool export -F $TESTPOOL
log_must mmp_clear_hostid
log_mustnot import_no_activity_check $TESTPOOL ""
log_must import_no_activity_check $TESTPOOL "-f"
# 5. Verify multihost=on and hostids match (no activity check)
# 5. Verify multihost=on and hostids match (activity check)
log_note "Verify multihost=on and hostids match (activity check)"
log_must mmp_pool_set_hostid $TESTPOOL $HOSTID1
log_must zpool set multihost=on $TESTPOOL
for opt in "" "-f"; do
log_must zpool export -F $TESTPOOL
log_must import_no_activity_check $TESTPOOL $opt
log_must import_activity_check $TESTPOOL $opt
done
# 6. Verify multihost=on and hostids differ (activity check)
log_note "Verify multihost=on and hostids differ (activity check)"
log_must zpool export -F $TESTPOOL
log_must mmp_clear_hostid
log_must mmp_set_hostid $HOSTID2
@@ -92,10 +97,12 @@ log_mustnot import_activity_check $TESTPOOL ""
log_must import_activity_check $TESTPOOL "-f"
# 7. Verify mmp_write and mmp_fail are set correctly
log_note "Verify mmp_write and mmp_fail are set correctly"
log_must zpool export -F $TESTPOOL
log_must verify_mmp_write_fail_present ${DISK[0]}
# 8. Verify multihost=on and hostid zero fails (no activity check)
log_note "Verify multihost=on and hostid zero fails (no activity check)"
log_must mmp_clear_hostid
MMP_IMPORTED_MSG="Set a unique system hostid"
log_must check_pool_import $TESTPOOL "-f" "action" "$MMP_IMPORTED_MSG"
@@ -104,9 +111,10 @@ log_mustnot import_no_activity_check $TESTPOOL "-f"
# 9. Verify activity check duration based on mmp_write and mmp_fail
# Specify a short test via tunables, but import a pool that was last
# imported while the tunables were set to the default duration.
log_note "Verify activity check duration based on mmp_write and mmp_fail"
log_must set_tunable64 MULTIHOST_INTERVAL $MMP_INTERVAL_MIN
log_must mmp_clear_hostid
log_must mmp_set_hostid $HOSTID1
log_must import_activity_check $TESTPOOL "-f" $MMP_TEST_DURATION_DEFAULT
log_must import_activity_check $TESTPOOL "-f"
log_pass "multihost=on|off inactive pool activity checks passed"
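Taken together, the step comments in the exported-import and inactive-import tests above describe when an activity check is expected. The sketch below encodes only those stated expectations. `expects_activity_check` is a hypothetical helper, and the argument values (`on`/`off`, `match`/`differ`/`zero`, `plain`/`F`) are labels invented for this illustration rather than anything the test suite defines.

```shell
# Hypothetical summary of the expectations stated in the test comments.
# $1: multihost property (on|off)
# $2: hostid relation    (match|differ|zero)
# $3: how it was exported (plain = 'zpool export', F = 'zpool export -F')
expects_activity_check() {
	multihost=$1
	hostids=$2
	export_kind=$3

	# multihost=off: every step is labelled "no activity check".
	if [ "$multihost" != "on" ]; then
		echo no
		return
	fi

	case "$hostids" in
	zero)
		# The import fails outright; the steps say "no activity check".
		echo no ;;
	differ)
		echo yes ;;
	*)
		# Matching hostids: only an export with -F leads to a check.
		if [ "$export_kind" = "F" ]; then echo yes; else echo no; fi ;;
	esac
}
```

For example, `expects_activity_check on match F` prints `yes`, which corresponds to the step 5 change in the inactive-import test, while `expects_activity_check on match plain` prints `no`.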

Some files were not shown because too many files have changed in this diff.