mirror_zfs

mirror of https://git.proxmox.com/git/mirror_zfs.git synced 2026-06-11 08:26:39 +03:00

Author	SHA1	Message	Date
Aditya Gollamudi	b481a8bbbf	Make zpool status dedup table support raw bytes -p output Check if -p flag is enabled, and if so print dedup table with raw bytes. Restructure the logic in zutil_pool to check if -p flag is enabled before printing either the bytes or raw numbers. Calls to print the data for DDT now all use zfs_nicenum_format(). Increased DDT histogram column buffers to 32 bytes to prevent truncation when -p is enabled. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com> Signed-off-by: Adi Gollamudi <adigollamudi@gmail.com> Closes #11626 Closes #17926	2026-03-13 09:53:56 -07:00
Sean Eric Fagan	f109c7bb98	Add the --file-layout (-f) option to zdb(8) Displays the physical raidz block layout for a given file. This leverages the internal vdev_raidz_map_alloc() function to find the map of how the block data is laid out across the child disks. The column entry for each row looks like: +------------+ \| D2 43 \| \| 6020da \| +------------+ representing here the logical data column 2 that is 43 sectors high starting at sector 0x6020da. With -H, the output is a list of disks, LBAs, and block counts, given in 512 byte block values. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reviewed-by: sean.fagan@klarasystems.com Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Don Brady <don.brady@klarasystems.com> Co-authored-by: Sean Fagan <sean.fagan@klarasystems.com> Closes #18264	2026-03-12 14:41:23 -07:00
Tony Hutter	d2f5cb3a50	Move range_tree, btree, highbit64 to common code Break out the range_tree, btree, and highbit64/lowbit64 code from kernel space into shared kernel and userspace code. This is needed for the updated `zpool status -vv` error byte range reporting that will be coming in a future commit. That commit needs the range_tree code in kernel and userspace. Reviewed-by: Rob Norris <robn@despairlabs.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tony Hutter <hutter2@llnl.gov> Closes #18133	2026-02-22 11:43:51 -08:00
Rob Norris	d11c661544	zdb: handle key load/derive failures a bit more gracefully There's no real need to outright crash if key loading fails; we can just unwind nicely. Sponsored-by: TrueNAS Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com> Signed-off-by: Rob Norris <rob.norris@truenas.com> Closes #18230	2026-02-20 13:37:43 -08:00
Rob Norris	9f874ad092	zdb: don't try to load key for unencrypted dataset Previously using -K/--key on an unencrypted dataset would trip a VERIFY, because the dataset has nowhere to load the key into. Now, just ignore it. This makes zdb much easier to drive when there's a mix of encrypt and non-encrypted datasets, as the key can provided for all of them (at least, assuming the same encryption root, which is a common enough case). Sponsored-by: TrueNAS Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com> Signed-off-by: Rob Norris <rob.norris@truenas.com> Closes #18230	2026-02-20 13:37:11 -08:00
Rob Norris	85391ee931	build: add SPDX license tags to build system files Sponsored-by: https://despairlabs.com/sponsor/ Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Rob Norris <robn@despairlabs.com> Closes #18077	2026-01-08 15:08:03 -08:00
Allan Jude	1d43387dd8	zdb: Add -O option for -r to specify object-id "zdb -r -O pool/dataset obj-id destination" will copy the file with object-id obj-id to the named destination; without -O it'll still be interpreted as a pathname. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Akash B <akash-b@hpe.com> Signed-off-by: Sean Eric Fagan <sean.fagan@klarasystems.com> Closes #16307	2025-12-18 09:25:09 -08:00
Rob Norris	4e3b88927c	libzpool: separate driver-side include Sponsored-by: https://despairlabs.com/sponsor/ Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rob Norris <robn@despairlabs.com> Closes #17861	2025-11-12 10:01:04 -08:00
Rob Norris	6e12f0bd77	spa_misc: add an API for spa_namespace_lock This is useful as debugging support, as it lets namespace lock operations be traced directly. It will also be useful for future work to reduce the use of spa_namespace_lock, traditionally a source of difficult deadlocks. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Rob Norris <rob.norris@klarasystems.com> Closes #17906	2025-11-10 14:23:39 -08:00
Shreshth3	3ea8ca8c0f	zdb: fix bug with -A flag Fixes #10544. According to the manpage, zdb -A should ignore all assertions. But it currently does not do that. This commit fixes this bug. Signed-off-by: Shreshth Srivastava <shreshthsrivastava2@gmail.com> Reviewed-by: Rob Norris <rob.norris@klarasystems.com> Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com> Closes #17825	2025-10-20 09:30:20 -04:00
Ivan Shapovalov	f8b082b5af	zdb: adjust block histogram binning strategy Previously, a bin included all blocks _starting_ from given size (e.g., a "4K" bin would include all blocks within the [4K; 8K) region). This is counter-intuitive and does not match the typical use-case of the block histogram (that is, to estimate disk usage considering how ZFS' block allocation works). In other words, if I'm looking at the "4K" row, I'm interested in records that _fit into_ a 4K block. Adjust the binning strategy such that a bin includes all blocks _up to_ given size, such that e.g. a "4K" bin would include all blocks within the (2K; 4K] region. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ivan Shapovalov <intelfx@intelfx.name> Closes #16999	2025-10-06 09:35:32 -07:00
Ivan Shapovalov	3a1a22abb4	zdb: factor out block histogram bin number computation Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ivan Shapovalov <intelfx@intelfx.name> Closes #16999	2025-10-06 09:35:28 -07:00
Ivan Shapovalov	c0a874fced	zdb: add `--class=(normal\|special\|...)` to filter blocks by alloc class When counting blocks to generate block size histograms (`-bb`), accept a `--class=` argument (as a comma-separated list of either "normal", "special", "dedup" or "other") to only consider blocks that belong to these metaslab classes. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ivan Shapovalov <intelfx@intelfx.name> Closes #16999	2025-10-06 09:35:23 -07:00
Ivan Shapovalov	8e97b98140	zdb: add `--bin=(lsize\|psize\|asize)` arg to control histogram binning When counting blocks to generate block size histograms (`-bb`), accept a `--bin=` argument to force placing blocks into all three bins based on this size. E.g. with `--bin=lsize`, a block with lsize=512K, psize=128K, asize=256K will be placed into the "512K" bin in all three output columns. This way, by looking at the "512K" row the user will be able to determine how well was ZFS able to compress blocks of this logical size. Conversely, with `--bin=psize`, by looking at the "128K" row the user will be able to determine how much overhead was incurred for storage of blocks of this physical size. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ivan Shapovalov <intelfx@intelfx.name> Closes #16999	2025-10-06 09:35:02 -07:00
Ivan Shapovalov	1269fa9b79	zdb: convert `ALLOCATED_OPT` into anonymous enum We are adding more long-only options, so use an enum for all of them to avoid manually numbering these constants. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ivan Shapovalov <intelfx@intelfx.name> Closes #16999	2025-10-06 09:34:03 -07:00
patrickxia	5c38029f4b	zdb: add ZFS_KEYFORMAT_RAW support for -K option This change adds support for ZFS_KEYFORMAT_RAW to zdb_derive_key in zdb.c. The implementation reads the raw key from the file specified by the -K option which is consistent with how raw keys are handled in the other parts of ZFS, along with a check to ensure that the keyfile doesn't have too many bytes. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Patrick Xia <patrickx@google.com> Closes #17783	2025-09-25 12:05:42 -07:00
Alexander Motin	ea37c30fcb	zdb: Fix asize overflow in verify_livelist_allocs() Spacemap entry might be too big to fit into a block pointer ashift. We hit an assertion trying to run `zdb -bvy` on a large pool. But it seems the code does not really need size there, since we only need to search for a range of offsets, so setting it to zero should just make btree return position just before the first entry. I suspect the previous code could actually miss the first entry due to this if its size was smaller. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Igor Kozhukhov <igor@dilos.org> Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com> Closes #17764	2025-09-23 16:09:37 -07:00
Paul Dagnelie	8f15d2e4d5	Add allocation profile export and zhack subcommand for import When attempting to debug performance problems on large systems, one of the major factors that affect performance is free space fragmentation. This heavily affects the allocation process, which is an area of active development in ZFS. Unfortunately, fragmenting a large pool for testing purposes is time consuming; it usually involves filling the pool and then repeatedly overwriting data until the free space becomes fragmented, which can take many hours. And even if the time is available, artificial workloads rarely generate the same fragmentation patterns as the natural workloads they're attempting to mimic. This patch has two parts. First, in zdb, we add the ability to export the full allocation map of the pool. It iterates over each vdev, printing every allocated segment in the ms_allocatable range tree. This can be done while the pool is online, though in that case the allocation map may actually be from several different TXGs as new ones are loaded on demand. The second is a new subcommand for zhack, zhack metaslab leak (and its supporting kernel changes). This is a zhack subcommand that imports a pool and then modified the range trees of the metaslabs, allowing the sync process to write them out normall. It does not currently store those allocations anywhere to make them reversible, and there is no corresponding free subcommand (which would be extremely dangerous); this is an irreversible process, only intended for performance testing. The only way to reclaim the space afterwards is to destroy the pool or roll back to a checkpoint. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Paul Dagnelie <paul.dagnelie@klarasystems.com> Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Closes #17576	2025-09-10 11:13:24 -07:00
Mark Johnston	0d54ae2880	zdb: Fix format strings on 32-bit systems Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com> Signed-off-by: Mark Johnston <markj@FreeBSD.org> Closes #17665	2025-08-25 08:59:41 -07:00
Alexander Motin	94413bc75d	zdb: Filter log spacemaps by vdev When requested to dump metaslabs only for specific vdev, apply the filter also to log spacemaps to reduce the output. Unfortunately filtering by metaslab numbers is more difficult so leave those. While there, tune the output formatting. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com> Closes #17643	2025-08-21 11:19:46 -04:00
Alan Somers	d3c1d27afd	zdb: better handling for corrupt block pointers When dumping indirect blocks, attempt to print corrupt block pointers rather than abort the program. When corruption is detected zdb will exit with an error code of 3. Sponsored by: ConnectWise Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com> Reviewed-by: Alek Pinchuk <alek.pinchuk@connectwise.com> Signed-off-by: Alan Somers <asomers@gmail.com> Closes #17166	2025-08-12 14:16:37 -07:00
Rob Norris	82d6f7b047	Prefer VERIFY0P(n) over VERIFY3P(n, ==, NULL) Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com> Signed-off-by: Rob Norris <robn@despairlabs.com> Sponsored-by: https://despairlabs.com/sponsor/ Closes #17591	2025-08-07 11:41:42 -07:00
Rob Norris	f7bdd84328	Prefer VERIFY0P(n) over VERIFY(n == NULL) Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com> Signed-off-by: Rob Norris <robn@despairlabs.com> Sponsored-by: https://despairlabs.com/sponsor/ Closes #17591	2025-08-07 11:41:37 -07:00
Rob Norris	5c7df3bcac	Prefer VERIFY0(n) over VERIFY3U(n, ==, 0) Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com> Signed-off-by: Rob Norris <robn@despairlabs.com> Sponsored-by: https://despairlabs.com/sponsor/ Closes #17591	2025-08-07 11:41:25 -07:00
Rob Norris	c39e076f23	Prefer VERIFY0(n) over VERIFY(n == 0) Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com> Signed-off-by: Rob Norris <robn@despairlabs.com> Sponsored-by: https://despairlabs.com/sponsor/ Closes #17591	2025-08-07 11:40:59 -07:00
Alexander Motin	4ae8bf406b	Allow physical rewrite without logical During regular block writes ZFS sets both logical and physical birth times equal to the current TXG. During dedup and block cloning logical birth time is still set to the current TXG, but physical may be copied from the original block that was used. This represents the fact that logically user data has changed, but the physically it is the same old block. But block rewrite introduces a new situation, when block is not changed logically, but stored in a different place of the pool. From ARC, scrub and some other perspectives this is a new block, but for example for user applications or incremental replication it is not. Somewhat similar thing happen during remap phase of device removal, but in that case space blocks are still acounted as allocated at their logical birth times. This patch introduces a new "rewrite" flag in the block pointer structure, allowing to differentiate physical rewrite (when the block is actually reallocated at the physical birth time) from the device reval case (when the logical birth time is used). The new functionality is not used at this point, and the only expected change is that error log is now kept in terms of physical physical birth times, rather than logical, since if a block with logged error was somehow rewritten, then the previous error does not matter any more. This change also introduces a new TRAVERSE_LOGICAL flag to the traverse code, allowing zfs send, redact and diff to work in context of logical birth times, ignoring physical-only rewrites. It also changes nothing at this point due to lack of those writes, but they will come in a following patch. Reviewed-by: Rob Norris <robn@despairlabs.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Alexander Motin <alexander.motin@TrueNAS.com> Closes #17565	2025-08-06 10:36:07 -07:00
Igor Ostapenko	cb5e7e097d	range_tree: Provide more debug details upon unexpected add/remove Sponsored-by: Klara, Inc. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com> Signed-off-by: Igor Ostapenko <igor.ostapenko@klarasystems.com> Closes #17581	2025-07-31 10:44:42 -04:00
Andriy Tkachuk	4bd7a2eaa5	zdb: fix checksum calculation for decompressed blocks Currently, when reading compressed blocks with -R and decompressing them with :d option and specifying lsize, which is normally bigger than psize for compressed blocks, the checksum is calculated on decompressed data. But it makes no sense since zfs always calculates checksum on physical, i.e. compressed data. So reading the same block produces different checksum results depending on how we read it, whether we decompress it or not, which, again, makes no sense. Fix: use psize instead of lsize when calculating the checksum so that it is always calculated on the physical block size, no matter was it compressed or not. Signed-off-by: Andriy Tkachuk <andriy.tkachuk@seagate.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <alexander.motin@TrueNAS.com> Closes #17547	2025-07-24 21:24:15 -04:00
Rob Norris	bf38c15071	everywhere: misc unnecessary var init/update These are all cases where we initialise or update a variable, and then never use it. None of them particularly matter, as the compiler should optimise them all away during dead store elimination, but some static analysers complain about them and they are extra work for casual readers to follow, so worth removing. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Rob Norris <robn@despairlabs.com> Sponsored-by: https://despairlabs.com/sponsor/ Closes #17551	2025-07-22 15:23:58 -07:00
Alexander Motin	be1e991a1a	Allow and prefer special vdevs as ZIL Before this change ZIL blocks were allocated only from normal or SLOG vdevs. In typical situation when special vdevs are SSDs and normal are HDDs it could cause weird inversions when data blocks are written to SSDs, but ZIL referencing them to HDDs. This change assumes that special vdevs typically have much better (or at least not worse) latency than normal, and so in absence of SLOGs should store ZIL blocks. It means similar to normal vdevs introduction of special embedded log allocation class and updating the allocation fallback order to: SLOG -> special embedded log -> special -> normal embedded log -> normal. The code tries to guess whether data block is going to be written to normal or special vdev (it can not be done precisely before compression) and prefer indirect writes for blocks written to a special vdev to avoid double-write. For blocks that are going to be written to normal vdev, special vdev by default plays as SLOG, reducing write latency by the cost of higher special vdev wear, but it is tunable via module parameter. This should allow HDD pools with decent SSD as special vdev to work under synchronous workloads without requiring additional SLOG SSD, impractical in many scenarios. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Rob Norris <rob.norris@klarasystems.com> Reviewed-by: Paul Dagnelie <paul.dagnelie@klarasystems.com> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Closes #17505	2025-07-18 18:44:14 -07:00
Paul Dagnelie	b21e04e8d9	Fix zdb pool/ with -k When examining the root dataset with zdb -k, we get into a mismatched state. main() knows we are not examining the whole pool, but it strips off the trailing slash. import_checkpointed_state() then thinks we are examining the whole pool, and does not update the target path appropriately. The fix is to directly inform import_checkpointed_state that we are examining a filesystem, and not the whole pool. Sponsored-by: Klara, Inc. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Rob Norris <rob.norris@klarasystems.com> Signed-off-by: Paul Dagnelie <paul.dagnelie@klarasystems.com> Co-authored-by: Paul Dagnelie <paul.dagnelie@klarasystems.com> Closes #17536	2025-07-15 17:01:49 -07:00
Rob Norris	fce18e04d5	libzpool: tunable-based option interface for zdb/ztest Removes the old dlsym() based option setter and adds a new function handle_tunable_option() that can set, get and list all the tunables in the system. And then wire it up to zdb and ztest. Sponsored-by: https://despairlabs.com/sponsor/ Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Rob Norris <robn@despairlabs.com> Closes #17537	2025-07-15 15:47:03 -07:00
Paul Dagnelie	a981cb69e4	Implement dynamic gang header sizes ZFS gang block headers are currently fixed at 512 bytes. This is increasingly wasteful in the era of larger disk sector sizes. This PR allows any size allocation to work as a gang header. It also contains supporting changes to ZDB to make gang headers easier to work with. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Rob Norris <rob.norris@klarasystems.com> Reviewed-by: Allan Jude <allan@klarasystems.com> Signed-off-by: Paul Dagnelie <paul.dagnelie@klarasystems.com> Closes #17004	2025-07-09 14:02:53 -07:00
Paul Dagnelie	9250403ba6	Make ganging redundancy respect redundant_metadata property (#17073 ) The redundant_metadata setting in ZFS allows users to trade resilience for performance and space savings. This applies to all data and metadata blocks in zfs, with one exception: gang blocks. Gang blocks currently just take the copies property of the IO being ganged and, if it's 1, sets it to 2. This means that we always make at least two copies of a gang header, which is good for resilience. However, if the users care more about performance than resilience, their gang blocks will be even more of a penalty than usual. We add logic to calculate the number of gang headers copies directly, and store it as a separate IO property. This is stored in the IO properties and not calculated when we decide to gang because by that point we may not have easy access to the relevant information about what kind of block is being stored. We also check the redundant_metadata property when doing so, and use that to decide whether to store an extra copy of the gang headers, compared to the underlying blocks. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Signed-off-by: Paul Dagnelie <paul.dagnelie@klarasystems.com> Co-authored-by: Paul Dagnelie <paul.dagnelie@klarasystems.com> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Tony Hutter <hutter2@llnl.gov>	2025-03-19 15:58:29 -07:00
Rob Norris	eb9098ed47	SPDX: license tags: CDDL-1.0 Sponsored-by: https://despairlabs.com/sponsor/ Signed-off-by: Rob Norris <robn@despairlabs.com> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>	2025-03-13 17:56:27 -07:00
Alexander Motin	0ea44e576b	Fix deduplication of overridden blocks Implementation of DDT pruning introduced verification of DVAs in a block pointer during ddt_lookup() to not by mistake free previous pruned incarnation of the entry. But when writing a new block in zio_ddt_write() we might have the DVAs only from override pointer, which may never have "D" flag to be confused with pruned DDT entry, and we'll abandon those DVAs if we find a matching entry in DDT. This fixes deduplication for blocks written via dmu_sync() for purposes of indirect ZIL write records, that I have tested. And I suspect it might actually allow deduplication for Direct I/O, even though in an odd way -- first write block directly and then delete it later during TXG commit if found duplicate, which part I haven't tested. Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Closes #17120	2025-03-13 13:27:57 -04:00
Paul Dagnelie	fc460bfbaf	Add more DDT tests The new Fast Dedup feature has a lot of moving parts, and only some of them have tests. We have some tests for prefetch and quota, and a generic ZAP shrinking test, but we don't have anything for the pruning command or specific to DDT zap shrinking. Here we add a couple small new tests for zpool ddtprune and DDT-specific ZAP shrinking. Sponsored-by: Klara, Inc. Sponsored-by: iXsystems, Inc. Reviewed-by: Allan Jude <allan@klarasystems.com> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Paul Dagnelie <paul.dagnelie@klarasystems.com> Closes #17049	2025-03-04 20:02:34 -05:00
Ameer Hamza	ab3db6d15d	arc: avoid possible deadlock in arc_read In l2arc_evict(), the config lock may be acquired in reverse order (e.g., first the config lock (writer), then a hash lock) unlike in arc_read() during scenarios like L2ARC device removal. To avoid deadlocks, if the attempt to acquire the config lock (reader) fails in arc_read(), release the hash lock, wait for the config lock, and retry from the beginning. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Ameer Hamza <ahamza@ixsystems.com> Closes #17071	2025-02-25 14:32:12 -05:00
Rob Norris	68473c4fd8	range_tree: convert remaining range_* defs to zfs_range_* Signed-off-by: Rob Norris <robn@despairlabs.com> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Rob Norris <robn@despairlabs.com>	2025-02-14 15:37:56 -08:00
Ivan Volosyuk	d4a5a7e3aa	Linux 6.12 compat: Rename range_tree_* to zfs_range_tree_* Linux 6.12 has conflicting range_tree_{find,destroy,clear} symbols. Signed-off-by: Ivan Volosyuk <Ivan.Volosyuk@gmail.com> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Rob Norris <robn@despairlabs.com>	2025-02-14 15:37:48 -08:00
Rob Norris	2507db612d	zdb_il: use flex array member to access ZIL records In `6f50f8e16` we added flex arrays to lr_XX_t structs to silence kernel bounds check warnings. Userspace code was mostly not updated to use them though. It seems that in the right circumstances, compilers can get confused about sizes in the same way, and throw warnings. This commits switch those uses over to use the flex array fields also. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Rob Norris <robn@despairlabs.com> Sponsored-by: https://despairlabs.com/sponsor/ Closes #16832	2024-12-04 19:03:20 -05:00
Alexander Motin	c33a55b0c2	Allow dsl_deadlist_open() return errors In some cases like dsl_dataset_hold_obj() it is possible to handle those errors, so failure to hold dataset should be better than kernel panic. Some other places where these errors are still not handled but asserted should be less dangerous just as unreachable. We have a user report about pool corruption leading to assertions on these errors. Hopefully this will make behavior a bit nicer. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Closes #16836	2024-12-04 15:15:58 -08:00
Rob Norris	0ffa6f3464	zdb: show dedup table and log attributes There's interesting info in there that is going to help with understanding dedup behavior at any given moment. Since this is a format change, tests that rely on that output have been modified to match. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Rob Norris <rob.norris@klarasystems.com> Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Closes #16755	2024-11-26 19:28:19 -05:00
Alexander Motin	fd6e8c1d2a	BRT: Rework structures and locks to be per-vdev While block cloning operation from the beginning was made per-vdev, before this change most of its data were protected by two pool- wide locks. It created lots of lock contention in many workload. This change makes most of block cloning data structures per-vdev, which allows to lock them separately. The only pool-wide lock now it spa_brt_lock, protecting array of per-vdev pointers and in most cases taken as reader. Also this splits per-vdev locks into three different ones: bv_pending_lock protects the AVL-tree of pending operations in open context, bv_mos_entries_lock protects BRT ZAP object from while being prefetched, and bv_lock protects the rest of per-vdev context during TXG commit process. There should be no functional difference aside of some optimizations. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Pawel Jakub Dawidek <pjd@FreeBSD.org> Reviewed-by: Brian Atkinson <batkinson@lanl.gov> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Closes #16740	2024-11-15 15:04:11 -08:00
Rob Norris	673efbbf5d	zdb: add extra -T flag to show histograms of BRT refcounts Sponsored-by: https://despairlabs.com/sponsor/ Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Rob Norris <robn@despairlabs.com> Closes #16692	2024-11-01 12:08:33 -04:00
Rob Norris	b2f6de7b58	zdb: show bp in uberblock dump Just another useful nugget of info in times of strife. Sponsored-by: https://despairlabs.com/sponsor/ Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de> Signed-off-by: Rob Norris <robn@despairlabs.com> Closes #16667	2024-10-20 10:01:49 -07:00
Martin Matuška	7e4be92750	zdb: fix printf format in dump_zap() When compiling zdb.c on 32-bit platforms, a format conversion error is reported for a printf() in dump_zap(). Change %l to macro %" PRIu64 " to match the platform size of a 64-bit unsigned integer. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Martin Matuska <mm@FreeBSD.org> Closes #16635	2024-10-11 09:55:17 -07:00
rilysh	86737c5927	Avoid computing strlen() inside loops Compiling with -O0 (no proper optimizations), strlen() call in loops for comparing the size, isn't being called/initialized before the actual loop gets started, which causes n-numbers of strlen() calls (as long as the string is). Keeping the length before entering in the loop is a good idea. On some places, even with -O2, both GCC and Clang can't recognize this pattern, which seem to happen in an array of char pointer. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: rilysh <nightquick@proton.me> Closes #16584	2024-10-02 09:10:06 -07:00
Sanjeev Bagewadi	20232ecfaa	Support for longnames for files/directories (Linux part) This patch adds the ability for zfs to support file/dir name up to 1023 bytes. This number is chosen so we can support up to 255 4-byte characters. This new feature is represented by the new feature flag feature@longname. A new dataset property "longname" is also introduced to toggle longname support for each dataset individually. This property can be disabled, even if it contains longname files. In such case, new file cannot be created with longname but existing longname files can still be looked up. Note that, to my knowledge native Linux filesystems don't support name longer than 255 bytes. So there might be programs not able to work with longname. Note that NFS server may needs to use exportfs_get_name to reconnect dentries, and the buffer being passed is limit to NAME_MAX+1 (256). So NFS may not work when longname is enabled. Note, FreeBSD vfs layer imposes a limit of 255 name lengh, so even though we add code to support it here, it won't actually work. Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Chunwei Chen <david.chen@nutanix.com> Closes #15921	2024-10-01 13:40:27 -07:00
Sanjeev Bagewadi	3cf2bfa570	Allocate zap_attribute_t from kmem instead of stack This patch is preparatory work for long name feature. It changes all users of zap_attribute_t to allocate it from kmem instead of stack. It also make zap_attribute_t and zap_name_t structure variable length. Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Chunwei Chen <david.chen@nutanix.com> Closes #15921	2024-10-01 13:39:08 -07:00

1 2 3 4 5 ...

401 Commits