mirror_zfs

mirror of https://git.proxmox.com/git/mirror_zfs.git synced 2026-05-26 12:12:13 +03:00

Author	SHA1	Message	Date
Jorgen Lundman	41eba77061	pass handle to do_unmount() The same change has already been done for domount(). On macOS platform we need to have access to zhp to handle devdisks and snapshots. Also, symmetry is pleasing. In addition, the code in zpool_disable_datasets which sorts the mountpoints did not sort the related handle, which meant that the mountpoint, and the handle that it is paired with, was lost. You'd get a random handle with the mountpoint. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: John Kennedy <john.kennedy@delphix.com> Signed-off-by: Jorgen Lundman <lundman@lundman.net> Closes #12296	2021-07-15 12:31:00 -06:00
наб	d9f0f1582c	config/libatomic: require -latomic iff atomic.c doesn't link w/o it In absence of LTO, and dynamic libatomic, la.so ends up in the needs section of every toolchain executable; some consider this an issue. Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Closes #12345 Closes #12359	2021-07-13 13:50:48 -07:00
Rich Ercolani	1325434b2d	Tinker with slop space accounting with dedup * Tinker with slop space accounting with dedup Do not include the deduplicated space usage in the slop space reservation, it leads to surprising outcomes. * Update spa_dedup_dspace sometimes Sometimes, we get into spa_get_slop_space() with spa_dedup_dspace=~0ULL, AKA "unset", while spa_dspace is correctly set. So call the code to update it before we use it if we hit that case. Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Mark Maybee <mark.maybee@delphix.com> Signed-off-by: Rich Ercolani <rincebrain@gmail.com> Closes #12271	2021-07-13 09:47:57 -06:00
Alexander Motin	f7de776da2	Fix ARC ghost states eviction accounting arc_evict_hdr() returns number of evicted bytes in scope of specific state. For ghost states it does not mean the amount of really freed memory, but the logical buffer size. It is correct for the eviction process, but not for waking up threads waiting for ARC size reduction, as added in "Revise ARC shrinker algorithm" commit, causing premature wakeups while ARC is still overflowed, allowing even bigger overflow, plus processing overhead when next allocation will also get blocked, probably also for too short time. To fix that make arc_evict_hdr() also return the amount of really freed memory, which for the ghost states is only the header, and use it to update arc_evict_count instead. Originally I was thinking to not return it at all, since arc_get_data_impl() does not account for the headers, but decided that some slow allocation progress is better than long waits, reaching on my tests up to 100ms. To reduce negative latency effects of long time periods when reclaim thread can free little real memory, start reclamation process earlier, before we actually reached the overflow threshold, when we have to throttle new allocations. We can also do it without taking global arc_evict_lock, reducing the contention. Reviewed-by: George Wilson <gwilson@delphix.com> Reviewed-by: Allan Jude <allan@klarasystems.com> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored-By: iXsystems, Inc. Closes #12279	2021-07-13 09:41:59 -06:00
Brian Behlendorf	07a4c76e90	Update bug report template - Remove the "SPL Version" line, the repositories have been merged since the 0.8 release and we no longer need to ask about this. - Simply ask for the kernel version / patch level and add a hint about how to get this information on Linux and FreeBSD. - Remove "Status: Triage Needed" from the template, in practice we really haven't been using this label so let's step setting it. Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: John Kennedy <john.kennedy@delphix.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes: #12340	2021-07-12 14:05:50 -06:00
George Wilson	958826be7a	file reference counts can get corrupted Callers of zfs_file_get and zfs_file_put can corrupt the reference counts for the file structure resulting in a panic or a soft lockup. When zfs send/recv runs, it will add a reference count to the open file, and begin to send or recv the stream. If the file descriptor is closed, then when dmu_recv_stream() or dmu_send() return we will call zfs_file_put to remove the reference we placed on the file structure. Unfortunately, because zfs_file_put() uses the file descriptor to lookup the file structure, it may end up finding that the file descriptor table no longer contains the file struct, thus leaking the file structure. Or it might end up finding a file descriptor for a different file and blindly updating its reference counts. Other failure modes probably exists. This change reworks the zfs_file_[get\|put] interface to not rely on the file descriptor but instead pass the zfs_file_t pointer around. Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Mark Maybee <mark.maybee@delphix.com> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Co-authored-by: Allan Jude <allan@klarasystems.com> Signed-off-by: George Wilson <gwilson@delphix.com> External-issue: DLPX-76119 Closes #12299	2021-07-10 19:00:37 -06:00
Jorgen Lundman	03dba7ae31	dprintf_dnode: strcpy -> strlcpy Missed a couple of strcpy() in earlier commit, this is only used with --enable-debug. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com> Signed-off-by: Jorgen Lundman <lundman@lundman.net> Closes #12311	2021-07-07 21:13:40 -06:00
Jorgen Lundman	d2119d0e6f	Replace strchrnul() with strrchr() Could have gone either way with this one, either adding it to macOS/Windows SPL, or returning it to "classic" usage with strrchr(). Since the new special way isn't really used, and only used once, we have this commit. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Jorgen Lundman <lundman@lundman.net> Closes #12312	2021-07-07 21:08:13 -06:00
Alexander Motin	eb5983e1b7	FreeBSD: Use unmapped I/O for scattered/gang ABD buffers Many FreeBSD disk drivers support "unmapped" I/O mode, when data buffer represented not with a virtually contiguous KVA-mapped address range, but with a list of physical memory pages. Originally it was designed to do I/O from buffers without KVA mapping (unmapped). But moving virtual addresses out of equation allows us to operate even non-contiguous data buffers with one condition: all buffer discon- tinuities must be aligned to memory page borders. Doing I/O to capable GEOM device this patch traverses through non- linear ABD buffers, validating the chunks borders. If the condition is met, it supplies GEOM with the list of original physical memory pages instead of copying the data into temporary contiguous buffer. On capable hardware on pools with ashift=12 and default ABD chunk of 4KB it should handle all the I/O without additional memory copying. Reviewed-by: Brian Atkinson <batkinson@lanl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Closes #12320	2021-07-07 16:39:00 -07:00
Alexander Motin	bdd11cbb90	FreeBSD: Hardcode abd_chunk_size to PAGE_SIZE It makes no sense to set it below PAGE_SIZE, since it increases all overheads and makes returning memory to OS problematic. It makes no sense to set it above PAGE_SIZE, since such allocations and especially frees are too expensive and cause KVA fragmentation to benefit from fewer chunks. After that it makes no sense to keep more complicated math here. What may have sense though is just a tunable border between linear and scatter ABDs, previously also controlled by this tunable. Retain that functionality by taking abd_scatter_min_size tunable from Linux, just with different default value. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Brian Atkinson <batkinson@lanl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Closes #12328	2021-07-06 17:39:23 -07:00
Alexander Motin	97752ba22a	Move gethrtime() calls out of vdev queue lock This dramatically reduces the lock contention on systems with slower (non-TSC) timecounters. With TSC the difference is minimal, but since this lock is pretty congested, any improvement counts. Plus I don't see any reason to do it under the lock other than the latency of the lock itself, which this change actually reduces. Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored-By: iXsystems, Inc. Closes #12281	2021-07-06 14:38:00 -07:00
Justin Gottula	6e4e3c3ab6	Udev rules: remove zvol compat symlinks (without the leading zvol/) This is a potentially arguable change, because it removes some compatibility cruft that certain systems or people may have come to rely on (either a very long time ago, or unwisely in recent times). On the other hand, it's been literally over a decade since OpenZFS switched to the strategy of using opaque numbered /dev/zd* device nodes, with the canonical zvol access path being a directory tree of symlinks created by udev rules inside /dev/zvol/. (See #102.) Even at the time, the /dev/ scheme was labeled as being for "compatibility". This commit removes the second tree of symlinks located directly at /dev/, under the assumption that anybody with any sense has been using the intended /dev/zvol/ path for a very very long time now. (The more I think about this, the more I anticipate that some large fraction of people will have been blissfully unaware that the intention has been for them to use the /dev/zvol/* tree all along, and they will have come to rely upon the /dev/* tree simply because it's been there this whole time despite being a compat thing.) Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com> Reviewed-by: Neal Gompa <ngompa@datto.com> Signed-off-by: Justin Gottula <justin@jgottula.com> Closes #12303	2021-07-06 13:41:17 -07:00
Justin Gottula	f24c7c359e	Use substantially more robust program exit status logic in zvol_id Currently, there are several places in zvol_id where the program logic returns particular errno values, or even particular ioctl return values, as the program exit status, rather than a straightforward system of explicit zero on success and explicit nonzero value(s) on failure. This is problematic for multiple reasons. One particularly interesting problem that can arise, is that if any of these values happens to have all 8 least significant bits unset (i.e., it is a positive or negative multiple of 256), then although the C program sees a nonzero int value (presumed to be a failure exit status), the actual exit status as seen by the system is only the bottom 8 bits of that integer: zero. This can happen in practice, and I have encountered it myself. In a particularly weird situation, the zvol_open code in the zfs kernel module was behaving in such a manner that it caused the open() syscall to fail and for errno to be set to a kernel-private value (ERESTARTSYS, which happens to be defined as 512). It turns out that 512 is evenly divisible by 256; or, in other words, its least significant 8 bits are all-zero. So even though zvol_id believed it was returning a nonzero (failure) exit status of 512, the system modulo'd that value by 256, resulting in the actual exit status visible by other programs being 0! This actually-zero (non-failure) exit status caused problems: udev believed that the program was operating successfully, when in fact it was attempting to indicate failure via a nonzero exit status integer. Combined with another problem, this led to the creation of nonsense symlinks for zvol dev nodes by udev. Let's get rid of all this problematic logic, and simply return EXIT_SUCCESS (0) is everything went fine, and EXIT_FAILURE (1) if anything went wrong. Additionally, let's clarify some of the variable names (error is similar to errno, etc) and clean up the overall program flow a bit. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com> Signed-off-by: Justin Gottula <justin@jgottula.com> Closes #12302	2021-07-02 13:10:36 -07:00
Justin Gottula	b19e2bdfb5	Print zvol_id error messages to stderr rather than stdout The zvol_id program is invoked by udev, via a PROGRAM key in the 60-zvol.rules.in rule file, to determine the "pretty" /dev/zvol/* symlink paths paths that should be generated for each opaquely named /dev/zd* dev node. The udev rule uses the PROGRAM key, followed by a SYMLINK+= assignment containing the %c substitution, to collect the program's stdout and then "paste" it directly into the name of the symlink(s) to be created. Unfortunately, as currently written, zvol_id outputs both its intended output (a single string representing the symlink path that should be created to refer to the name of the dataset whose /dev/zd* path is given) AND its error messages (if any) to stdout. When processing PROGRAM keys (and others, such as IMPORT{program}), udev uses only the data written to stdout for functional purposes. Any data written to stderr is used solely for the purposes of logging (if udev's log_level is set to debug). The unintended consequence of this is as follows: if zvol_id encounters an error condition; and then udev fails to halt processing of the current rule (either because zvol_id didn't return a nonzero exit status, or because the PROGRAM key in the rule wasn't written properly to result in a "non-match" condition that would stop the current rule on a nonzero exit); then udev will create a space-delimited list of symlink names derived directly from the words of the error message string! I've observed this exact behavior on my own system, in a situation where the open() syscall on /dev/zd* dev nodes was failing sporadically (for reasons that aren't especially relevant here). Because the open() call failed, zvol_id printed "Unable to open device file: /dev/zd736\n" to stdout and then exited. The udev rule finished with SYMLINK+="zvol/%c %c". Assuming a volume name like pool/foo/bar, this would ordinarily expand to SYMLINK+="zvol/pool/foo/bar pool/foo/bar" and would cause symlinks to be created like this: /dev/zvol/pool/foo/bar -> /dev/zd736 /dev/pool/foo/bar -> /dev/zd736 But because of the combination of error messages being printed to stdout, and the udev syntax freely accepting a space-delimited sequence of names in this context, the error message string "Unable to open device file: /dev/zd736\n" in reality expanded to SYMLINK+="zvol/Unable to open device file: /dev/zd736" which caused the following symlinks to actually be created: /dev/zvol/Unable -> /dev/zd736 /dev/to -> /dev/zd736 /dev/open -> /dev/zd736 /dev/device -> /dev/zd736 /dev/file: -> /dev/zd736 /dev//dev/zd736 -> /dev/zd736 (And, because multiple zvols had open() syscall errors, multiple zvols attempted to claim several of those symlink names, resulting in numerous udev errors and timeouts and general chaos.) This commit rectifies all this silliness by simply printing error messages to stderr, as Dennis Ritchie originally intended. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com> Signed-off-by: Justin Gottula <justin@jgottula.com> Closes #12302	2021-07-02 13:10:06 -07:00
Justin Gottula	f1aef8d4df	Udev rules: use match (==) rather than assign (=) for PROGRAM Assignment syntax (=) can be used for the PROGRAM key. But the PROGRAM key is really a match key, not an assign key. The internal logic used by udev to decide whether a PROGRAM key "matched" or not (which determines whether the remainder of the rule is evaluated) depends on whether the operator was OP_MATCH (==) or OP_NOMATCH (!=). [1] The man page claims that '"=", ":=", and "+=" have the same effect as "=="' for PROGRAM keys. And, after a brief perusal, the udev source code does seem to confirm that operators other than OP_MATCH (==) or OP_NOMATCH (!=) are implicitly converted to OP_MATCH (==). [2] But it's not entirely clear that this is definitely the case: anecdotal testing seems to indicate that when OP_ASSIGN (=) is used, the program's exit status is disregarded and the remainder of the rule is processed regardless of whether it was, in fact, a successful exit. The bottom line here is that, if zvol_id hits some snag and returns a nonzero exit status, then we almost certainly do NOT want to continue on with the rule and use whatever the stdout contents may have been to mindlessly create /dev/zvol/* symlinks. Therefore, let's be extra-sure and use the match (==) operator explicitly, to eliminate any possibility that udev might do the wrong thing, and ensure that a nonzero exit status will definitely short-circuit the rest of the rule, bypassing the SYMLINK+= assignments. [1] udev, file src/udev/udev-rules.c, func udev_rule_apply_token_to_event, switch case TK_M_PROGRAM if r != 0 (nonzero exit status): return token->op == OP_NOMATCH; switch case TK_M_PROGRAM if r == 0 (zero exit status): return token->op == OP_MATCH; func retval 0 => key is considered to have matched func retval 1 => key is considered to have NOT matched [2] udev, file src/udev/udev-rules.c, func parse_token, at func start: bool is_match = IN_SET(op, OP_MATCH, OP_NOMATCH); in else-if case streq(key, "PROGRAM"): if (!is_match) op = OP_MATCH; Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com> Signed-off-by: Justin Gottula <justin@jgottula.com> Closes #12302	2021-07-02 13:09:39 -07:00
Justin Gottula	17c794e7b0	Udev rules: replace deprecated $tempnode with $devnode The $tempnode substitution is so old that it's not even mentioned in the man page anymore. It is still technically supported by udev, but with plenty of "deprecated" comments surrounding it. The preferred modern equivalent of $tempnode is $devnode (or alternatively, %N). Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com> Signed-off-by: Justin Gottula <justin@jgottula.com> Closes #12302	2021-07-02 13:09:09 -07:00
Justin Gottula	a5398f8782	Udev rules: use non-ancient comma syntax This file is old as dirt. It's entirely possible that commas were optional in udev back at that time. But they're definitely supposed to be there nowadays. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com> Signed-off-by: Justin Gottula <justin@jgottula.com> Closes #12302	2021-07-02 13:08:42 -07:00
Alexander Motin	b192a2c0a1	Remove avl_size field from struct avl_tree This field is used only by illumos mdb. On other platforms it only increases the struct size from 32 to 40 bytes. For struct vdev_queue including 13 instances of avl_tree_t size means active cache lines. Keep the padding in user-space for now to not break the ABI. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored-By: iXsystems, Inc. Closes #12290	2021-07-01 09:32:31 -06:00
Alexander Motin	490c845efe	Compact dbuf/buf hashes and lock arrays With default dbuf cache size of 1/32 of ARC, it makes no sense to have hash table of the same size (or even bigger on Linux). Reduce it to 1/8 of ARC's one, still leaving some slack, assuming higher I/O rate via dbuf cache than via ARC. Remove padding from ARC hash locks array. The idea behind padding is to avoid false sharing between locks. It would have sense if there would be a limited number of very busy locks. But since we have no limit on the number, using the same memory for more locks we can achieve even lower lock contention with the same false sharing, or we can use less memory for the same contention level. Reduce number of hash locks from 8192 to 2048. The number is still big enough to not cause contention, but reduced memory size improves cache hit rate for mutex_tryenter() in ARC eviction thread, saving about 1% of the thread time. Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored-By: iXsystems, Inc. Closes #12289	2021-07-01 09:30:31 -06:00
Jorgen Lundman	c6d1112bf4	Fix abd leak, kmem_free correct size of abd_t Fix a leak of abd_t that manifested mostly when using raidzN with at least as many columns as N (e.g. a four-disk raidz2 but not a three-disk raidz2). Sufficiently heavy raidz use would eventually run a system out of memory. Additionally: * Switch abd_cache arena to FIRSTFIT, which empirically improves perofrmance. * Make abd_chunk_cache more performant and debuggable. * Allocate the abd_zero_buf from abd_chunk_cache rather than the heap. * Don't try to reap non-existent qcaches in abd_cache arena. * KM_PUSHPAGE->KM_SLEEP when allocating chunks from their own arena Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Jorgen Lundman <lundman@lundman.net> Co-authored-by: Sean Doran <smd@use.net> Closes #12295	2021-07-01 09:28:15 -06:00
Jorgen Lundman	eca174527e	Upstream: dmu_zfetch_stream_fini leaks refcount dmu_zfetch_stream_fini() is missing calls to destroy the refcounts, leaking them and the mutex inside. Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Jorgen Lundman <lundman@lundman.net> Closes #12294	2021-07-01 09:22:16 -06:00
Kevin Jin	50e09eddd0	Optimize txg_kick() process (#12274 ) Use dp_dirty_pertxg[] for txg_kick(), instead of dp_dirty_total in original code. Extra parameter "txg" is added for txg_kick(), thus it knows which txg to kick. Also txg_kick() call is moved from dsl_pool_need_dirty_delay() to dsl_pool_dirty_space() so that we can know the txg number assigned for txg_kick(). Some unnecessary code regarding dp_dirty_total in txg_sync_thread() is also cleaned up. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: jxdking <lostking2008@hotmail.com> Closes #12274	2021-07-01 09:20:27 -06:00
Alexander Motin	42afb12da7	Remove refcount from spa_config_() The only reason for spa_config_() to use refcount instead of simple non-atomic (thanks to scl_lock) variable for scl_count is tracking, hard disabled for the last 8 years. Switch to simple int scl_count reduces the lock hold time by avoiding atomic, plus makes structure fit into single cache line, reducing the locks contention. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Mark Maybee <mark.maybee@delphix.com> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored-By: iXsystems, Inc. Closes #12287	2021-07-01 09:16:54 -06:00
Ryan Moeller	cfc564f9b1	ZED: Match added disk by pool/vdev GUID if found (#12217 ) This enables ZED to auto-online vdevs that are not wholedisk managed by ZFS. Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Don Brady <don.brady@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tony Hutter <hutter2@llnl.gov>	2021-06-30 07:37:20 -07:00
Brian Behlendorf	4694131a0a	Linux 5.13 compat: META Increase the Linux-Maximum version in the META file to 5.13. All of the required compatibility patches have been merged and the 5.13 kernel has been officially released. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2021-06-29 13:16:38 -07:00
Laurențiu Nicola	3482c2b79a	zed: fix sending emails (#12292 ) Commit `6fc3099` broke the quoting when invoking the mail program, revert that change. Signed-off-by: Laurențiu Nicola <lnicola@dend.ro> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tony Hutter <hutter2@llnl.gov>	2021-06-29 12:33:49 -07:00
Alexander	6a19dea7f6	module/zfs: simplify ddt_stat_add() loop LLVM's Polly (ISL to be precise) is unhappy with the loop from ddt_stat_add(): CC [M] fs/zfs/zfs/ddt.o ../lib/External/isl/isl_schedule_node.c:2470: cannot insert node between set or sequence node and its filter children (building with the custom patch which adds Polly support to Kbuild) The mentioned loop is rather suboptimal. All that we need is to just treat ddt_stat_t as an array of u64 and perform 1:1 addition or substraction. This can be done in simpler for-loop with the determined index and bounds. Compiler will expand d_end - d into a number of ddt_stat_t fields at compile time. This prevents Polly from failing on this file. Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Alexander Lobakin <alobakin@pm.me> Closes #12253	2021-06-29 08:26:11 -06:00
Alexander Motin	5b7053a9a5	Avoid 64bit division in multilist index functions The number of sublists in a multilist is relatively small. We dont need 64 bits to calculate an index. 32 bits is sufficient and makes the code more efficient. Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Mark Maybee <mark.maybee@delphix.com> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored-By: iXsystems, Inc. Closes #12288	2021-06-29 06:59:14 -06:00
Michal Vasilek	f20fb199e6	Fix plymouth passphrase prompt with dracut plymouth --command splits the command on spaces which means that zfs-load-key was getting the filesystem name enclosed in single quotes (since `13c59bb76`) and failing. This commit fixes it by piping the password directly to the command similar to how it's done in other scripts (initramfs, dracut without plymouth). Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Michal Vasilek <michal@vasilek.cz> Related-to: #9193 Related-to: #9202 Closes #12147	2021-06-25 22:43:25 -07:00
Rich Ercolani	2084e9f748	Fix build with KASAN The stock zstd code expects some helpers from ASAN if present. This works fine in userland, but in kernel, KASAN also gets detected, and lacks those helpers. So let's make some empty substitutes for that case. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rich Ercolani <rincebrain@gmail.com> Closes #12232	2021-06-25 22:28:12 -07:00
Alexander Motin	5e2c8338bf	Help compiller optimize out abd_verify() While abd_verify() does nothing when built without debug, compiler can't optimize it out by itself due to calls to external list_*() and abd_verify_scatter(). This commit makes it explicit. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Adam Moss <c@yotes.com> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored-By: iXsystems, Inc. Closes #12280	2021-06-25 16:38:31 -07:00
Martin Matuška	14d2841b53	FreeBSD: fix compilation of FreeBSD world after `29274c9f6` prng32_bounded() is available to kernel only on FreeBSD 13+. Call inline random_get_pseudo_bytes() with correct pointer type. To be consistent, apply to Linux as well. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Martin Matuska <mm@FreeBSD.org> Closes #12282	2021-06-25 10:28:51 -07:00
Brian Behlendorf	88a4833039	Update cache file when setting compatibility property Unlike most other properties the 'compatibility' property is stored in the pool config object and not the DMU_OT_POOL_PROPS object. This had the advantage that the compatibility information is available without needing to fully import the pool (it can be read with zdb). However, this means we need to make sure to update both the copy of the config in the MOS and the cache file. This wasn't being done. This commit adds a call to spa_async_request() to ensure the copy of the config in the cache file gets updated as well as the one stored in the pool. This same change is made for the 'comment' property which suffers from the same inconsistency. Reviewed-by: Sean Eric Fagan <sef@ixsystems.com> Reviewed-by: Colm Buckley <colm@tuatha.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #12261 Closes #12276	2021-06-24 14:30:02 -07:00
Paul Dagnelie	8f11b1d26e	Fix flag copying in resume case A couple flags weren't being copied in the case where we're doing size estimation on a resume. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Signed-off-by: Paul Dagnelie <pcd@delphix.com> Closes: #12266	2021-06-24 13:42:01 -06:00
jumbi77	86f5e0bbce	zfs_metaslab_mem_limit should be 25 instead of 75 According to current zfs man page zfs_metaslab_mem_limit should be 25 instead of 75. Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Paul Dagnelie <pcd@delphix.com> Reviewed-by: Mark Maybee <mark.maybee@delphix.com> Signed-off-by: jumbi77@users.noreply.github.com Closes #12273	2021-06-24 10:02:54 -07:00
Rich Ercolani	126615303d	Stop using "zstreamdump" in tests/ zstreamdump was replaced with "zstream dump"; let's stop using the old name, compat symlink or no. Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rich Ercolani <rincebrain@gmail.com> Closes #12277	2021-06-24 09:38:33 -07:00
Jonathon	942121cfcb	Update libera webchat client URL Libera have made a webchat client available. This change builds on #12127. Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Jonathon Fernyhough <jonathon@m2x.dev> Closes #12251	2021-06-23 17:14:58 -07:00
Attila Fülöp	1b610ae45f	gcc 11 cleanup Compiling with gcc 11.1.0 produces three new warnings. Change the code slightly to avoid them. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Signed-off-by: Attila Fülöp <attila@fueloep.org> Closes #12130 Closes #12188 Closes #12237	2021-06-23 17:57:06 -06:00
Brian Behlendorf	63f4b959a6	ZTS: Add known exceptions The receive-o-x_props_override test case reliably fails on the FreeBSD main builders (but not on Linux), until the root cause is understood add this test to the FreeBSD exception list. On Linux the alloc_class_012_pos test case may occasionally fail. This is a known false positive which has also been added to the Linux exception list until the test can be made entirely reliable. Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: John Kennedy <john.kennedy@delphix.com> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #12272	2021-06-23 15:53:13 -07:00
Rich Ercolani	8e739b2c9f	Annotated dprintf as printf-like ZFS loves using %llu for uint64_t, but that requires a cast to not be noisy - which is even done in many, though not all, places. Also a couple places used %u for uint64_t, which were promoted to %llu. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rich Ercolani <rincebrain@gmail.com> Closes #12233	2021-06-22 21:53:45 -07:00
Antonio Russo	a81b812495	Revert Consolidate arc_buf allocation checks This reverts commit `13fac09868`. Per the discussion in #11531, the reverted commit---which intended only to be a cleanup commit---introduced a subtle, unintended change in behavior. Care was taken to partially revert and then reapply `10b3c7f5e4` which would otherwise have caused a conflict. These changes were squashed in to this commit. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Suggested-by: @chrisrd Suggested-by: robn@despairlabs.com Signed-off-by: Antonio Russo <aerusso@aerusso.net> Closes #11531 Closes #12227	2021-06-22 21:39:15 -07:00
Alexander Motin	29274c9f6d	Optimize small random numbers generation In all places except two spa_get_random() is used for small values, and the consumers do not require well seeded high quality values. Switch those two exceptions directly to random_get_pseudo_bytes() and optimize spa_get_random(), renaming it to random_in_range(), since it is not related to SPA or ZFS in general. On FreeBSD directly map random_in_range() to new prng32_bounded() KPI added in FreeBSD 13. On Linux and in user-space just reduce the type used to uint32_t to avoid more expensive 64bit division. Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored-By: iXsystems, Inc. Closes #12183	2021-06-22 17:35:23 -06:00
Brian Behlendorf	ba91311561	Linux 5.12 compat: META Increase the Linux-Maximum version in the META file to 5.12. All of the required compatibility patches have been merged. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2021-06-21 08:22:40 -07:00
Alexander Motin	c4c162c1e8	Use wmsum for arc, abd, dbuf and zfetch statistics. (#12172 ) wmsum was designed exactly for cases like these with many updates and rare reads. It allows to completely avoid atomic operations on congested global variables. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Mark Maybee <mark.maybee@delphix.com> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored-By: iXsystems, Inc. Closes #12172	2021-06-16 18:19:34 -06:00
George Amanakis	9ffcaa370a	Avoid deadlock when removing L2ARC devices under I/O In case we have I/O and try to remove an L2ARC device a deadlock might occur. arc_read()->zio_read()->zfs_blkptr_verify() waits for SCL_VDEV to be dropped while holding the hash_lock. However, spa_l2cache_load() holds SCL_ALL and waits for the hash_lock in l2arc_evict(). Fix this by moving zfs_blkptr_verify() to the top top arc_read() before the hash_lock is taken. Verify the block pointer and return a checksum error if damaged rather than halting the system, by using BLK_VERIFY_LOG instead of BLK_VERIFY_HALT. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Mark Maybee <mark.maybee@delphix.com> Signed-off-by: George Amanakis <gamanakis@gmail.com> Closes #12054	2021-06-16 18:17:42 -06:00
Rich Ercolani	ff31750405	Fix importing with symlinks It turns out that symlinks are heavily used on Linux in /dev/disk. So let's allow importing from them. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Rich Ercolani <rincebrain@gmail.com> Closes #12238	2021-06-14 09:59:54 -07:00
наб	83ba91adf6	systemd: import: expand $ZPOOL_IMPORT_OPTS correctly Turns out $ZPOOL_IMPORT_OPTS expands in a shell-like fashion, yielding 'import' '-aN' '-o' 'cachefile=none' for an unset variable, and 'import' '-aN' '-o' 'cachefile=none' 'word1' 'word2' for a white-spaced one, but ${ZPOOL_IMPORT_OPTS} expands like "${Z_I_O}" would in a shell, yielding 'import' '-aN' '-o' 'cachefile=none' '' (empty) and 'import' '-aN' '-o' 'cachefile=none' 'word1 word2' (spaced) Fixes `eec5ba113e` "dracut: 90zfs: respect zfs_force=1 on systemd systems" Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: John Kennedy <john.kennedy@delphix.com> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Closes: #12231	2021-06-14 09:48:53 -06:00
Matthew Ahrens	069bf406b4	vdev_draid_min_asize() ignores reserved space vdev_draid_min_asize() returns the minimum size of a child vdev. This is used when determining if a disk is big enough to replace a child. It's also used by zdb to determine how big of a child to make to test replacement. vdev_draid_min_asize() says that the child’s asize has to be at least 1/Nth of the entire draid’s asize, which is the same logic as raidz. However, this contradicts the code in vdev_draid_open(), which calculates the draid’s asize based on a reduced child size: An additional 32MB of scratch space is reserved at the end of each child for use by the dRAID expansion feature So the problem is that you can replace a draid disk with one that’s vdev_draid_min_asize(), but it actually needs to be larger to accommodate the additional 32MB. The replacement is allowed and everything works at first (since the reserved space is at the end, and we don’t try to use it yet), but when you try to close and reopen the pool, vdev_draid_open() calculates a smaller asize for the draid, because of the smaller leaf, which is not allowed. I think the confusion is that vdev_draid_min_asize() is correctly returning the amount of required allocatable space in a leaf, but the actual size of the leaf needs to be at least 32MB more than that. ztest_vdev_attach_detach() assumes that it can attach that size of device, and it actually can (the kernel/libzpool accepts it), but it then later causes zdb to not be able to open the pool. This commit changes vdev_draid_min_asize() to return the required size of the leaf, not the size that draid will make available to the metaslab allocator. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Mark Maybee <mark.maybee@delphix.com> Signed-off-by: Matthew Ahrens <mahrens@delphix.com> Closes #11459 Closes #12221	2021-06-13 10:48:53 -07:00
Paul Zuchowski	afa7b34845	Do not hash unlinked inodes In zfs_znode_alloc we always hash inodes. If the znode is unlinked, we do not need to hash it. This fixes the problem where zfs_suspend_fs is doing zrele (iput) in an async fashion, and zfs_resume_fs unlinked drain processing will try to hash an inode that could still be hashed, resulting in a panic. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alan Somers <asomers@gmail.com> Signed-off-by: Paul Zuchowski <pzuchowski@datto.com> Closes #9741 Closes #11223 Closes #11648 Closes #12210	2021-06-11 17:00:33 -07:00
наб	10bcc4da6c	scripts/commitcheck.sh: fix false positive for signed commits Reviewed-by: John Kennedy <john.kennedy@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Closes #12105	2021-06-11 09:10:25 -07:00

1 2 3 4 5 ...

7010 Commits