mirror_zfs

mirror of https://git.proxmox.com/git/mirror_zfs.git synced 2026-05-24 19:28:53 +03:00

Author	SHA1	Message	Date
Ameer Hamza	23a489a411	zdb: detect cachefile automatically otherwise force import If a pool is created with the cache file located in a non-default path /etc/default/zpool.cache, removed, or the cachefile property is set to none, zdb fails to show the pool unless we specify the cache file or use the -e option. This PR automates this process. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Akash B <akash-b@hpe.com> Signed-off-by: Ameer Hamza <ahamza@ixsystems.com> Closes #16071	2024-06-03 16:28:43 -07:00
Rob Norris	ae512620d0	icp: remove skein module Nothing calls it through the KCF interface, so this is all unused. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rob Norris <rob.norris@klarasystems.com> Closes #16209	2024-05-31 15:13:39 -07:00
Rob Norris	57249bcddc	icp: brutally remove unused AES modes Still retaining the struture, for now. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rob Norris <rob.norris@klarasystems.com> Closes #16209	2024-05-31 15:12:51 -07:00
Martin Matuška	ae22044da9	spl: fix compilation without HAVE_BACKTRACE The __maybe_unused macro is defined in spl/sys/debug.h Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Martin Matuska <mm@FreeBSD.org> Closes #16229	2024-05-29 10:51:01 -07:00
Rob N	d0aa9dbccf	Use memset to zero stack allocations containing unions C99 6.7.8.17 says that when an undesignated initialiser is used, only the first element of a union is initialised. If the first element is not the largest within the union, how the remaining space is initialised is up to the compiler. GCC extends the initialiser to the entire union, while Clang treats the remainder as padding, and so initialises according to whatever automatic/implicit initialisation rules are currently active. When Linux is compiled with CONFIG_INIT_STACK_ALL_PATTERN, -ftrivial-auto-var-init=pattern is added to the kernel CFLAGS. This flag sets the policy for automatic/implicit initialisation of variables on the stack. Taken together, this means that when compiling under CONFIG_INIT_STACK_ALL_PATTERN on Clang, the "zero" initialiser will only zero the first element in a union, and the rest will be filled with a pattern. This is significant for aes_ctx_t, which in aes_encrypt_atomic() and aes_decrypt_atomic() is initialised to zero, but then used as a gcm_ctx_t, which is the fifth element in the union, and thus gets pattern initialisation. Later, it's assumed to be zero, resulting in a hang. As confusing and undiscoverable as it is, by the spec, we are at fault when we initialise a structure containing a union with the zero initializer. As such, this commit replaces these uses with an explicit memset(0). Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rob Norris <rob.norris@klarasystems.com> Closes #16135 Closes #16206	2024-05-24 19:00:29 -07:00
Rob Norris	1ea8c59441	backtrace: rework for signal safety Mostly, try a lot harder to not allocate anything. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rob Norris <rob.norris@klarasystems.com> Closes #16181	2024-05-14 09:48:51 -07:00
Rob Norris	3974ef045e	libspl: lift backtrace into a separate file If it's going to be used directly by zdb/ztest, then it sort of doesn't make sense to carry it with the assert code. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rob Norris <rob.norris@klarasystems.com> Closes #16181	2024-05-14 09:48:45 -07:00
Rob Norris	e7b451941b	zdb/ztest: use libspl backtrace for crashes We can show much nicer backtraces these days, lets use them. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rob Norris <rob.norris@klarasystems.com> Closes #16181	2024-05-14 09:48:39 -07:00
Alan Somers	b64afa41d5	Better control the thread pool size when mounting datasets Ever since `a10d50f999`, ZFS has mounted file systems in parallel when importing a pool. It uses a fixed size of 512 for the thread pool. But since `c183d164aa`, it has also imported pools in parallel. So the total number of threads at one time is 513 * npools + 1. That can easily exceed the system's limit on the number of threads per process, which will cause one or more pools to be unable to allocate any worker threads, forcing them to fallback to slow serial mounting . To forestall that, manage the threadpool size in /sbin/zpool, not libzfs. Use the same size (512), but divided by the number of pools. This is a backwards-incompatible change to the libzfs abi. Sponsored by: Axcient Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: George Wilson <gwilson@delphix.com> Signed-off-by: Alan Somers <asomers@FreeBSD.org> Closes #16178	2024-05-14 09:36:21 -07:00
Alan Somers	eced2e2f1e	libzfs: Fix mounting datasets under thread limit pressure During parallel zpool import, /sbin/zpool will create a separate thread pool for each pool, used to mount that pool's datasets. If the total thread count exceed's the system's limit on threads per process, then tpool_dispatch may fail. If it does, directly execute the mount operation instead. Sponsored by: Axcient Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: George Wilson <gwilson@delphix.com> Signed-off-by: Alan Somers <asomers@FreeBSD.org> Closes #16178 Fixes #16172	2024-05-14 09:36:14 -07:00
Alan Somers	f625d038d2	tpool_dispatch: fail if it cannot start at least 1 worker. Sponsored by: Axcient Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: George Wilson <gwilson@delphix.com> Signed-off-by: Alan Somers <asomers@FreeBSD.org> Closes #16178	2024-05-14 09:35:48 -07:00
chenqiuhao1997	41ae864b69	Replace P2ALIGN with P2ALIGN_TYPED and delete P2ALIGN. In P2ALIGN, the result would be incorrect when align is unsigned integer and x is larger than max value of the type of align. In that case, -(align) would be a positive integer, which means high bits would be zero and finally stay zero after '&' when align is converted to a larger integer type. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Youzhong Yang <yyang@mathworks.com> Signed-off-by: Qiuhao Chen <chenqiuhao1997@gmail.com> Closes #15940	2024-05-10 08:47:21 -07:00
Rob N	1ede0c716b	libspl_assert: always link -lpthread on FreeBSD The pthread_* functions are in -lpthread on FreeBSD. Some of them are implicitly linked through libc, but on FreeBSD 13 at least pthread_getname_np() is not. Just be explicit, since -lpthread is the documented interface anyway. Sponsored-by: https://despairlabs.com/sponsor/ Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rob Norris <robn@despairlabs.com> Closes #16168	2024-05-09 07:43:48 -07:00
Martin Matuška	414acbd37e	Unbreak FreeBSD cross-build on MacOS broken in `051460b8b` MacOS used FreeBSD-compatible getprogname() and pthread_getname_np(). But pthread_getthreadid_np() does not exist on MacOS. This implements libspl_gettid() using pthread_threadid_np() to get the thread id of the current thread. Tested with FreeBSD GitHub actions freebsd-src/.github/workflows/cross-bootstrap-tools.yml Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Rob Norris <rob.norris@klarasystems.com> Signed-off-by: Martin Matuska <mm@FreeBSD.org> Closes #16167	2024-05-09 07:42:51 -07:00
Rob Norris	051460b8b2	libspl/assert: use libunwind for backtrace when available libunwind seems to do a better job of resolving a symbols than backtrace(), and is also useful on platforms that don't have backtrace() (eg musl). If it's available, use it. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rob Norris <robn@despairlabs.com> Sponsored-by: https://despairlabs.com/sponsor/ Closes #16140	2024-05-01 10:52:05 -07:00
Rob Norris	2152c405ba	libspl/assert: dump backtrace in assert Adds a check for the backtrace() function. If available, uses it to show a stack backtrace in the assertion output. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rob Norris <robn@despairlabs.com> Sponsored-by: https://despairlabs.com/sponsor/ Closes #16140	2024-05-01 10:52:00 -07:00
Rob Norris	dec697ad68	libspl/assert: add lock around assertion output If multiple threads trip an assertion at the same moment (quite common), they can be printing at the same time, and their output gets messy. This adds a simple lock around the whole thing, to prevent a second task printing assert output before the first has finished. Additionally, if libspl_assert_ok is not set, abort() is called without dropping the lock, so that any other asserting tasks will be killed before starting any output, rather than only getting part-way through. This is a tradeoff; it's assumed that multiple threads asserting at the same moment are likely the same fault in different instances of a thread, and so there won't be any more useful information from the other tasks anyway. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rob Norris <robn@despairlabs.com> Sponsored-by: https://despairlabs.com/sponsor/ Closes #16140	2024-05-01 10:51:54 -07:00
Rob Norris	394800200e	libspl/assert: show process/task details in assert output Makes it much easier to see what thing complained. Getting thread id, program name and thread name vary wildly between Linux and FreeBSD, so those are set up in macros. pthread_getname_np() did not appear in musl until very recently, but the same info has always been available via prctl(PR_GET_NAME), so we use that instead. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rob Norris <robn@despairlabs.com> Sponsored-by: https://despairlabs.com/sponsor/ Closes #16140	2024-05-01 10:51:49 -07:00
Rob Norris	4429ad9276	libzpool: set thread names Arrange for the thread/task name to be set when new threads are created. This makes them visible in the process table etc. pthread_setname_np() is generally available in glibc, musl and FreeBSD, so no test is required. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rob Norris <robn@despairlabs.com> Sponsored-by: https://despairlabs.com/sponsor/ Closes #16140	2024-05-01 10:51:44 -07:00
Rich Ercolani	db499e68f9	Overflowing refreservation is bad Someone came to me and pointed out that you could pretty readily cause the refreservation calculation to exceed 264, given the 217 multiplier in it, and produce refreservations wildly less than the actual volsize in cases where it should have failed. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rich Ercolani <rincebrain@gmail.com> Closes #15996	2024-04-29 11:32:49 -07:00
Tony Hutter	4840f023af	GCC: Fixes for gcc 14 on Fedora 40 - Workaround dangling pointer in uu_list.c (#16124) - Fix calloc() transposed arguments in zpool_vdev_os.c - Make some temp variables unsigned to prevent triggering a '-Werror=alloc-size-larger-than' error. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tony Hutter <hutter2@llnl.gov> Closes #16124 Closes #16125	2024-04-29 11:31:50 -07:00
Jason Lee	e5ddecd1a7	return NULL at end of send_progress_thread Reviewed-by: Rob Norris <robn@despairlabs.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Jason Lee <jasonlee@lanl.gov> Closes #16074	2024-04-10 15:01:39 -07:00
Rich Ercolani	e5e2a5a3b8	Add custom debug printing for your asserts Being able to print custom debug information on assert trip seems useful. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Paul Dagnelie <pcd@delphix.com> Signed-off-by: Rich Ercolani <rincebrain@gmail.com> Closes #15792	2024-04-10 13:30:25 -07:00
Maxim Filimonov	f07389d3ad	Fix locale-specific time In `zpool status -t`, scrub date/time is reported using the C locale, while trim time is reported using the current one. This is inconsistent. This patch fixes that. Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Maxim Filimonov <che@bein.link> Closes #15878 Closes #15879	2024-04-08 15:37:41 -07:00
George Wilson	b1e46f869e	Add ashift validation when adding devices to a pool Currently, zpool add allows users to add top-level vdevs that have different ashifts but doing so prevents users from being able to perform a top-level vdev removal. Often times consumers may not realize that they have mismatched ashifts until the top-level removal fails. This feature adds ashift validation to the zpool add command and will fail the operation if the sector size of the specified vdev does not match the existing pool. This behavior can be disabled by using the -f flag. In addition, new flags have been added to provide fine-grained control to disable specific checks. These flags are: --allow-in-use --allow-ashift-mismatch --allow-replicaton-mismatch The force flag will disable all of these checks. Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Mark Maybee <mmaybee@delphix.com> Signed-off-by: George Wilson <gwilson@delphix.com> Closes #15509	2024-03-29 13:15:56 -06:00
George Wilson	493fcce9be	Provide macros for setting and getting blkptr birth times There exist a couple of macros that are used to update the blkptr birth times but they can often be confusing. For example, the BP_PHYSICAL_BIRTH() macro will provide either the physical birth time if it is set or else return back the logical birth time. The complement to this macro is BP_SET_BIRTH() which will set the logical birth time and set the physical birth time if they are not the same. Consumers may get confused when they are trying to get the physical birth time and use the BP_PHYSICAL_BIRTH() macro only to find out that the logical birth time is what is actually returned. This change cleans up these macros and makes them symmetrical. The same functionally is preserved but the name is changed. Instead of calling BP_PHYSICAL_BIRTH(), consumer can now call BP_GET_BIRTH(). In additional to cleaning up this naming conventions, two new sets of macros are introduced -- BP_[SET\|GET]_LOGICAL_BIRTH() and BP_[SET\|GET]_PHYSICAL_BIRTH. These new macros allow the consumer to get and set the specific birth time. As part of the cleanup, the unused GRID macros have been removed and that portion of the blkptr are currently unused. Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Mark Maybee <mark.maybee@delphix.com> Signed-off-by: George Wilson <gwilson@delphix.com> Closes #15962	2024-03-25 15:01:54 -07:00
Brian Behlendorf	af4da5ccf2	Check for minimum partition size On Linux block devices used for vdevs will by partitioned. The block device must be large enough for an 64M partition starting at offset of 2048 sectors (part1), and a second 64M reserved partition at the end of the device (part9). This commit adds a capacity check when creating the GPT label to immediately detect a device which is too small. With the existing code this would be caught slightly latter when attempting to use the partition. Catching it sooner let's us print a more useful error. Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #15898	2024-02-16 09:07:32 -08:00
Rob Norris	5973854153	ddt: lift dedup stats out to separate file We want to add other kinds of dedup-related objects and keep stats for them. This makes those functions easier to use from outside ddt.c. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rob Norris <rob.norris@klarasystems.com> Sponsored-by: Klara, Inc. Sponsored-by: iXsystems, Inc. Closes #15887	2024-02-15 11:45:05 -08:00
Don Brady	cbe882298e	Add slow disk diagnosis to ZED Slow disk response times can be indicative of a failing drive. ZFS currently tracks slow I/Os (slower than zio_slow_io_ms) and generates events (ereport.fs.zfs.delay). However, no action is taken by ZED, like is done for checksum or I/O errors. This change adds slow disk diagnosis to ZED which is opt-in using new VDEV properties: VDEV_PROP_SLOW_IO_N VDEV_PROP_SLOW_IO_T If multiple VDEVs in a pool are undergoing slow I/Os, then it skips the zpool_vdev_degrade(). Sponsored-By: OpenDrives Inc. Sponsored-By: Klara Inc. Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Allan Jude <allan@klarasystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Co-authored-by: Rob Wing <rob.wing@klarasystems.com> Signed-off-by: Don Brady <don.brady@klarasystems.com> Closes #15469	2024-02-08 09:19:52 -08:00
Rich Ercolani	a0d3fe72bf	libzdb: Initial breakout of libzdb Step 1 in trying to slowly rip the zdb functions out of zdb.c to allow people to play with more flexible things to leverage zdb's functionality. No promises on any functions or structs being stable, now or probably in general unless someone builds a more polished abstraction, the goal at the moment is to slowly untangle the global state usage in zdb... Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rich Ercolani <rincebrain@gmail.com> Closes #15804	2024-02-05 10:00:41 -08:00
Richard Kojedzinszky	401c3563d4	libzfs: use zfs_strerror() in place of strerror() Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de> Signed-off-by: Richard Kojedzinszky <richard@kojedz.in> Closes #15793	2024-01-29 09:54:57 -08:00
Richard Kojedzinszky	692f0daba3	libzfs: make userquota_propname_decode threadsafe Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de> Signed-off-by: Richard Kojedzinszky <richard@kojedz.in> Closes #15793	2024-01-29 09:54:43 -08:00
rilysh	0cbf135293	libnvpair.c: replace strstr() with strchr() for a single character Since we're looking for a single new-line character in the haystack, it's better (and slightly more efficient) to use strchr() instead of strstr(). Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Signed-off-by: rilysh <nightquick@proton.me> Closes #15798	2024-01-29 09:46:13 -08:00
MigeljanImeri	78e8c1f844	Remove list_size struct member from list implementation Removed the list_size struct member as it was only used in a single assertion, as mentioned in PR #15478. Reviewed-by: Brian Atkinson <batkinson@lanl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: MigeljanImeri <imerimigel@gmail.com> Closes #15812	2024-01-26 14:46:42 -08:00
Val Packett	d9cb42da99	FreeBSD: Fix bootstrapping tools under Linux/musl musl libc has deprecated LFS64 aliases, so bootstrapping FreeBSD tools under musl distros has been failing with stat64 errors. Apply the aliases under non-glibc Linux to fix this problem. Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Val Packett <val@packett.cool> Closes #15780	2024-01-19 13:01:26 -08:00
Ameer Hamza	b64be1624c	Add path handling for aux vdevs in `label_path` If the AUX vdev is added using UUID, importing the pool falls back AUX vdev to open it with disk name instead of UUID due to the absence of path information for AUX vdevs. Since AUX label now have path information, this PR adds path handling for it in `label_path`. Reviewed-by: Umer Saleem <usaleem@ixsystems.com> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Ameer Hamza <ahamza@ixsystems.com> Closes #15737	2024-01-16 13:18:07 -08:00
Brian Behlendorf	a1771d243a	Fix "out of memory" error Drop the no_memory() call from zpool_in_use() when reading the label fails and instead return the error to the caller. This prevents a misleading "internal error: out of memory" error when the label can't be read. This will result in is_spare() returning B_FALSE instead of aborting, which is already safely handled. Furthermore, on Linux it's possible for EREMOTEIO to returned by an NVMe device if the device has been low-level formatted and not rescanned. In this case we want to fallback to the legacy scanning method and read any of the labels we can. Reviewed-by: Brian Atkinson <batkinson@lanl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #13538 Closes #15747	2024-01-12 12:35:29 -08:00
Mark Johnston	07e95b4670	Fix the FreeBSD userspace build (#15716 ) - Mark some parameters to zpool_power*() as unused. - Add a stub zpool_disk_wait(). Fixes: `a9520e6e5` ("zpool: Add slot power control, print power status") Signed-off-by: Mark Johnston <markj@FreeBSD.org> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Tony Hutter <hutter2@llnl.gov>	2023-12-27 12:17:53 -08:00
Tony Hutter	a9520e6e59	zpool: Add slot power control, print power status Add `zpool` flags to control the slot power to drives. This assumes your SAS or NVMe enclosure supports slot power control via sysfs. The new `--power` flag is added to `zpool offline\|online\|clear`: zpool offline --power <pool> <device> Turn off device slot power zpool online --power <pool> <device> Turn on device slot power zpool clear --power <pool> [device] Turn on device slot power If the ZPOOL_AUTO_POWER_ON_SLOT env var is set, then the '--power' option is automatically implied for `zpool online` and `zpool clear` and does not need to be passed. zpool status also gets a --power option to print the slot power status. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Mart Frauenlob <AllKind@fastest.cc> Signed-off-by: Tony Hutter <hutter2@llnl.gov> Closes #15662	2023-12-21 10:53:16 -08:00
Alexander Motin	9b1677fb5a	dmu: Allow buffer fills to fail When ZFS overwrites a whole block, it does not bother to read the old content from disk. It is a good optimization, but if the buffer fill fails due to page fault or something else, the buffer ends up corrupted, neither keeping old content, nor getting the new one. On FreeBSD this is additionally complicated by page faults being blocked by VFS layer, always returning EFAULT on attempt to write from mmap()'ed but not yet cached address range. Normally it is not a big problem, since after original failure VFS will retry the write after reading the required data. The problem becomes worse in specific case when somebody tries to write into a file its own mmap()'ed content from the same location. In that situation the only copy of the data is getting corrupted on the page fault and the following retries only fixate the status quo. Block cloning makes this issue easier to reproduce, since it does not read the old data, unlike traditional file copy, that may work by chance. This patch provides the fill status to dmu_buf_fill_done(), that in case of error can destroy the corrupted buffer as if no write happened. One more complication in case of block cloning is that if error is possible during fill, dmu_buf_will_fill() must read the data via fall-back to dmu_buf_will_dirty(). It is required to allow in case of error restoring the buffer to a state after the cloning, not not before it, that would happen if we just call dbuf_undirty(). Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Rob Norris <robn@despairlabs.com> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Closes #15665	2023-12-15 09:51:41 -08:00
Rob N	f0cb6482e1	setproctitle: fix ununitialised variable Reported-by: Coverity (CID-1573333) Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ameer Hamza <ahamza@ixsystems.com> Signed-off-by: Rob Norris <robn@despairlabs.com> Closes #15649	2023-12-07 08:23:16 -08:00
Alexander Motin	adcea23cb0	ZIO: Add overflow checks for linear buffers Since we use a limited set of kmem caches, quite often we have unused memory after the end of the buffer. Put there up to a 512-byte canary when built with debug to detect buffer overflows at the free time. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Closes #15553	2023-12-01 11:50:10 -08:00
Brooks Davis	3e4bef52b0	Only provide execvpe(3) when needed Check for the existence of execvpe(3) and only provide the FreeBSD compat version if required. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Brooks Davis <brooks.davis@sri.com> Closes #15609	2023-11-30 16:04:18 -08:00
Yuri Pankov	735ba3a7b7	Use uint64_t instead of u_int64_t Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Yuri Pankov <ypankov@tintri.com> Closes #15610	2023-11-30 10:36:33 -08:00
Wraithh	8adf2e3066	Fix zoneid when USER_NS is disabled getzoneid() should return GLOBAL_ZONEID instead of 0 when USER_NS is disabled. Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ilkka Sovanto <github@ilkka.kapsi.fi> Closes #15560	2023-11-29 09:55:17 -08:00
Brooks Davis	cd67bc0ae4	freebsd: remove __FBSDID macro use With FreeBSD's switch to git the $FreeBSD$ string is no longer expanded and they have mostly been removed upstream. Stop using __FBSDID and remove the no-longer needed sys/cdefs.h includes. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Brooks Davis <brooks.davis@sri.com> Closes #15527	2023-11-17 14:02:09 -08:00
Paul Dagnelie	6c6fae6fae	Fix memory leak in zfs_setprocinit code Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Rich Ercolani <rincebrain@gmail.com> Signed-off-by: Paul Dagnelie <pcd@delphix.com> Closes #15508	2023-11-17 13:21:04 -08:00
Don Brady	5caeef02fa	RAID-Z expansion feature This feature allows disks to be added one at a time to a RAID-Z group, expanding its capacity incrementally. This feature is especially useful for small pools (typically with only one RAID-Z group), where there isn't sufficient hardware to add capacity by adding a whole new RAID-Z group (typically doubling the number of disks). == Initiating expansion == A new device (disk) can be attached to an existing RAIDZ vdev, by running `zpool attach POOL raidzP-N NEW_DEVICE`, e.g. `zpool attach tank raidz2-0 sda`. The new device will become part of the RAIDZ group. A "raidz expansion" will be initiated, and the new device will contribute additional space to the RAIDZ group once the expansion completes. The `feature@raidz_expansion` on-disk feature flag must be `enabled` to initiate an expansion, and it remains `active` for the life of the pool. In other words, pools with expanded RAIDZ vdevs can not be imported by older releases of the ZFS software. == During expansion == The expansion entails reading all allocated space from existing disks in the RAIDZ group, and rewriting it to the new disks in the RAIDZ group (including the newly added device). The expansion progress can be monitored with `zpool status`. Data redundancy is maintained during (and after) the expansion. If a disk fails while the expansion is in progress, the expansion pauses until the health of the RAIDZ vdev is restored (e.g. by replacing the failed disk and waiting for reconstruction to complete). The pool remains accessible during expansion. Following a reboot or export/import, the expansion resumes where it left off. == After expansion == When the expansion completes, the additional space is available for use, and is reflected in the `available` zfs property (as seen in `zfs list`, `df`, etc). Expansion does not change the number of failures that can be tolerated without data loss (e.g. a RAIDZ2 is still a RAIDZ2 even after expansion). A RAIDZ vdev can be expanded multiple times. After the expansion completes, old blocks remain with their old data-to-parity ratio (e.g. 5-wide RAIDZ2, has 3 data to 2 parity), but distributed among the larger set of disks. New blocks will be written with the new data-to-parity ratio (e.g. a 5-wide RAIDZ2 which has been expanded once to 6-wide, has 4 data to 2 parity). However, the RAIDZ vdev's "assumed parity ratio" does not change, so slightly less space than is expected may be reported for newly-written blocks, according to `zfs list`, `df`, `ls -s`, and similar tools. Sponsored-by: The FreeBSD Foundation Sponsored-by: iXsystems, Inc. Sponsored-by: vStack Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Mark Maybee <mark.maybee@delphix.com> Authored-by: Matthew Ahrens <mahrens@delphix.com> Contributions-by: Fedor Uporov <fuporov.vstack@gmail.com> Contributions-by: Stuart Maybee <stuart.maybee@comcast.net> Contributions-by: Thorsten Behrens <tbehrens@outlook.com> Contributions-by: Fmstrat <nospam@nowsci.com> Contributions-by: Don Brady <dev.fs.zfs@gmail.com> Signed-off-by: Don Brady <dev.fs.zfs@gmail.com> Closes #15022	2023-11-08 10:19:41 -08:00
Tony Hutter	358ce2cf28	zed: misc vdev_enc_sysfs_path fixes There have been rare cases where the VDEV_ENC_SYSFS_PATH value that zed gets passed is stale. To mitigate this, dynamically check the sysfs path at the time of zed event processing, and use the dynamic value if possible. Note that there will be other times when we can not dynamically detect the sysfs path (like if a disk disappears) and have to rely on the old value for things like turning on the fault LED. That is to say, we can't just blindly use the dynamic path in every case. Also: - Add enclosure sysfs entry when running 'zpool add' - Fix 'slot' and 'enc' zpool.d scripts for nvme Reviewed-by: Don Brady <dev.fs.zfs@gmail.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tony Hutter <hutter2@llnl.gov> Closes #15462	2023-11-07 09:09:24 -08:00
ednadolski-ix	3bd4df3841	Improve ZFS objset sync parallelism As part of transaction group commit, dsl_pool_sync() sequentially calls dsl_dataset_sync() for each dirty dataset, which subsequently calls dmu_objset_sync(). dmu_objset_sync() in turn uses up to 75% of CPU cores to run sync_dnodes_task() in taskq threads to sync the dirty dnodes (files). There are two problems: 1. Each ZVOL in a pool is a separate dataset/objset having a single dnode. This means the objsets are synchronized serially, which leads to a bottleneck of ~330K blocks written per second per pool. 2. In the case of multiple dirty dnodes/files on a dataset/objset on a big system they will be sync'd in parallel taskq threads. However, it is inefficient to to use 75% of CPU cores of a big system to do that, because of (a) bottlenecks on a single write issue taskq, and (b) allocation throttling. In addition, if not for the allocation throttling sorting write requests by bookmarks (logical address), writes for different files may reach space allocators interleaved, leading to unwanted fragmentation. The solution to both problems is to always sync no more and (if possible) no fewer dnodes at the same time than there are allocators the pool. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Edmund Nadolski <edmund.nadolski@ixsystems.com> Closes #15197	2023-11-06 10:38:42 -08:00

1 2 3 4 5 ...

1364 Commits