mirror_zfs

mirror of https://git.proxmox.com/git/mirror_zfs.git synced 2026-01-25 10:12:13 +03:00

Author	SHA1	Message	Date
Arvind Sankar	60356b1a21	Add include files for prototypes Include the header with prototypes in the file that provides definitions as well, to catch any mismatch between prototype and definition. Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu> Closes #10470	2020-06-18 12:21:25 -07:00
Arvind Sankar	c3fe42aabd	Remove dead code Delete unused functions. Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu> Closes #10470	2020-06-18 12:21:18 -07:00
Arvind Sankar	65c7cc49bf	Mark functions as static Mark functions used only in the same translation unit as static. This only includes functions that do not have a prototype in a header file either. Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu> Closes #10470	2020-06-18 12:20:38 -07:00
adilger	f734301d22	linux: add basic fallocate(mode=0/2) compatibility Implement semi-compatible functionality for mode=0 (preallocation) and mode=FALLOC_FL_KEEP_SIZE (preallocation beyond EOF) for ZPL. Since ZFS does COW and snapshots, preallocating blocks for a file cannot guarantee that writes to the file will not run out of space. Even if the first overwrite was guaranteed, it would not handle any later overwrite of blocks due to COW, so strict compliance is futile. Instead, make a best-effort check that at least enough free space is currently available in the pool (with a bit of margin), then create a sparse file of the requested size and continue on with life. This does not handle all cases (e.g. several fallocate() calls before writing into the files when the filesystem is nearly full), which would require a more complex mechanism to be implemented, probably based on a modified version of dmu_prealloc(), but is usable as-is. A new module option zfs_fallocate_reserve_percent is used to control the reserve margin for any single fallocate call. By default, this is 110% of the requested preallocation size, so an additional 10% of available space is reserved for overhead to allow the application a good chance of finishing the write when the fallocate() succeeds. If the heuristics of this basic fallocate implementation are not desirable, the old non-functional behavior of returning EOPNOTSUPP for calls can be restored by setting zfs_fallocate_reserve_percent=0. The parameter of zfs_statvfs() is changed to take an inode instead of a dentry, since no dentry is available in zfs_fallocate_common(). A few tests from @behlendorf cover basic fallocate functionality. Reviewed-by: Richard Laager <rlaager@wiktel.com> Reviewed-by: Arshad Hussain <arshad.super@gmail.com> Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Co-authored-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Andreas Dilger <adilger@dilger.ca> Issue #326 Closes #10408	2020-06-18 11:22:11 -07:00
Matthew Macy	8056a75672	Disambiguate condvar API contract On Illumos callers of cv_timedwait and cv_timedwait_hires can't distinguish between whether or not the cv was signaled or the call timed out. Illumos handles this (for some definition of handles) by calling cv_signal in the return path if we were signaled but the return value indicates instead that we timed out. This would make sense if it were possible to query the the cv for its net signal disposition. However, this isn't possible and, in spite of the fact that there are places in the code that clearly take a different and incompatible path if a timeout value is indicated, this distinction appears to be rather subtle to most developers. This problem is further compounded by the fact that on Linux, calling cv_signal in the return path wouldn't even do the right thing unless there are other waiters. Since it is possible for the caller to independently determine how much time is remaining but it is not possible to query if the cv was in fact signaled, prioritizing signalling over timeout seems like a cleaner solution. In addition, judging from usage patterns within the code itself, it is also less error prone. Reviewed-by: Jorgen Lundman <lundman@lundman.net> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Matt Macy <mmacy@FreeBSD.org> Closes #10471	2020-06-18 10:17:50 -07:00
Matthew Macy	7564073ed6	Add abd_cache_reap_now for abd_chunk_cache users Apparently missed in the initial port integration was the need to reap the abd_chunk_cache on FreeBSD. This change addresses that oversight. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Matt Macy <mmacy@FreeBSD.org> Closes #10474	2020-06-17 21:44:13 -07:00
Ryan Moeller	86a0f49483	FreeBSD: Kernel module should depend on xdr not krpc after 1300092 Since https://reviews.freebsd.org/D24408 FreeBSD provides XDR functions in the xdr module instead of krpc. For FreeBSD 13, the MODULE_DEPEND should be changed to xdr Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #10442 Closes #10443	2020-06-16 11:47:04 -07:00
Jorgen Lundman	d366c8fd7a	Make struct vdev_disk_t be platform private Linux defines different vdev_disk_t members to macOS, but they are only used in vdev_disk.c so move the declaration there. Reviewed-by: Ryan Moeller <ryan@ixsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Jorgen Lundman <lundman@lundman.net> Closes #10452	2020-06-16 11:43:33 -07:00
Brian Atkinson	0a03495e3e	Fixing ABD struct allocation for FreeBSD In the event we are allocating a gang ABD in FreeBSD we are passing 0 to abd_alloc_struct(); however, this led to an allocation of ABD scatter with 0 chunks. This left the gang ABD allocation 24 bytes smaller than it should have been. Reviewed-by: Matt Macy <mmacy@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Co-authored-by: Matt Macy <mmacy@FreeBSD.org> Signed-off-by: Brian Atkinson <batkinson@lanl.gov> Closes #10431	2020-06-16 10:05:22 -07:00
Brian Atkinson	e08b993396	Removing ZERO_PAGE abd_alloc_zero_scatter For MIPS architectures on Linux the ZERO_PAGE macro references empty_zero_page, which is exported as a GPL symbol. The call to ZERO_PAGE in abd_alloc_zero_scatter has been removed and a single zero'd page is now allocated for each of the pages in abd_zero_scatter in the kernel ABD code path. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Brian Atkinson <batkinson@lanl.gov> Closes #10428	2020-06-10 17:54:11 -07:00
Ryan Moeller	feff3f69fc	Fixup "Avoid the GEOM topology lock recursion when autoexpanding a pool" The patch was applied to vdev_geom_open instead of vdev_geom_close by mistake. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #10427	2020-06-10 11:05:15 -07:00
Arvind Sankar	71504277ae	Cleanup linux module kbuild files The linux module can be built either as an external module, or compiled into the kernel, using copy-builtin. The source and build directories are slightly different between the two cases, and currently, compiling into the kernel still refers to some files from the configured ZFS source tree, instead of the copies inside the kernel source tree. There is also duplication between copy-builtin, which creates a Kbuild file to build ZFS inside the kernel tree, and the top-level module/Makefile.in. Fix this by moving the list of modules and the CFLAGS settings into a new module/Kbuild.in, which will be used by the kernel kbuild infrastructure, and using KBUILD_EXTMOD to distinguish the two cases within the Makefiles, in order to choose appropriate include directories etc. Module CFLAGS setting is simplified by using subdir-ccflags-y (available since 2.6.30) to set them in the top-level Kbuild instead of each individual module. The disabling of -Wunused-but-set-variable is removed from the lua and zfs modules. The variable that the Makefile uses is actually not defined, so this has no effect; and the warning has long been disabled by the kernel Makefile itself. The target_cpu definition in module/{zfs,zcommon} is removed as it was replaced by use of CONFIG_SPARC64 in commit `70835c5b75` ("Unify target_cpu handling") os/linux/{spl,zfs} are removed from obj-m, as they are not modules in themselves, but are included by the Makefile in the spl and zfs module directories. The vestigial Makefiles in os and os/linux are removed. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu> Closes #10379 Closes #10421	2020-06-10 09:24:15 -07:00
Andrea Gelmini	dd4bc569b9	Fix typos Correct various typos in the comments and tests. Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net> Closes #10423	2020-06-09 21:24:09 -07:00
Matthew Ahrens	7bcb7f0840	File incorrectly zeroed when receiving incremental stream that toggles -L Background: By increasing the recordsize property above the default of 128KB, a filesystem may have "large" blocks. By default, a send stream of such a filesystem does not contain large WRITE records, instead it decreases objects' block sizes to 128KB and splits the large blocks into 128KB blocks, allowing the large-block filesystem to be received by a system that does not support the `large_blocks` feature. A send stream generated by `zfs send -L` (or `--large-block`) preserves the large block size on the receiving system, by using large WRITE records. When receiving an incremental send stream for a filesystem with large blocks, if the send stream's -L flag was toggled, a bug is encountered in which the file's contents are incorrectly zeroed out. The contents of any blocks that were not modified by this send stream will be lost. "Toggled" means that the previous send used `-L`, but this incremental does not use `-L` (-L to no-L); or that the previous send did not use `-L`, but this incremental does use `-L` (no-L to -L). Changes: This commit addresses the problem with several changes to the semantics of zfs send/receive: 1. "-L to no-L" incrementals are rejected. If the previous send used `-L`, but this incremental does not use `-L`, the `zfs receive` will fail with this error message: incremental send stream requires -L (--large-block), to match previous receive. 2. "no-L to -L" incrementals are handled correctly, preserving the smaller (128KB) block size of any already-received files that used large blocks on the sending system but were split by `zfs send` without the `-L` flag. 3. A new send stream format flag is added, `SWITCH_TO_LARGE_BLOCKS`. This feature indicates that we can correctly handle "no-L to -L" incrementals. This flag is currently not set on any send streams. In the future, we intend for incremental send streams of snapshots that have large blocks to use `-L` by default, and these streams will also have the `SWITCH_TO_LARGE_BLOCKS` feature set. This ensures that streams from the default use of `zfs send` won't encounter the bug mentioned above, because they can't be received by software with the bug. Implementation notes: To facilitate accessing the ZPL's generation number, `zfs_space_delta_cb()` has been renamed to `zpl_get_file_info()` and restructured to fill in a struct with ZPL-specific info including owner and generation. In the "no-L to -L" case, if this is a compressed send stream (from `zfs send -cL`), large WRITE records that are being written to small (128KB) blocksize files need to be decompressed so that they can be written split up into multiple blocks. The zio pipeline will recompress each smaller block individually. A new test case, `send-L_toggle`, is added, which tests the "no-L to -L" case and verifies that we get an error for the "-L to no-L" case. Reviewed-by: Paul Dagnelie <pcd@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Matthew Ahrens <mahrens@delphix.com> Closes #6224 Closes #10383	2020-06-09 10:41:01 -07:00
George Amanakis	b7654bd794	Trim L2ARC The l2arc_evict() function is responsible for evicting buffers which reference the next bytes of the L2ARC device to be overwritten. Teach this function to additionally TRIM that vdev space before it is overwritten if the device has been filled with data. This is done by vdev_trim_simple() which trims by issuing a new type of TRIM, TRIM_TYPE_SIMPLE. We also implement a "Trim Ahead" feature. It is a zfs module parameter, expressed in % of the current write size. This trims ahead of the current write size. A minimum of 64MB will be trimmed. The default is 0 which disables TRIM on L2ARC as it can put significant stress to underlying storage devices. To enable TRIM on L2ARC we set l2arc_trim_ahead > 0. We also implement TRIM of the whole cache device upon addition to a pool, pool creation or when the header of the device is invalid upon importing a pool or onlining a cache device. This is dependent on l2arc_trim_ahead > 0. TRIM of the whole device is done with TRIM_TYPE_MANUAL so that its status can be monitored by zpool status -t. We save the TRIM state for the whole device and the time of completion on-disk in the header, and restore these upon L2ARC rebuild so that zpool status -t can correctly report them. Whole device TRIM is done asynchronously so that the user can export of the pool or remove the cache device while it is trimming (ie if it is too slow). We do not TRIM the whole device if persistent L2ARC has been disabled by l2arc_rebuild_enabled = 0 because we may not want to lose all cached buffers (eg we may want to import the pool with l2arc_rebuild_enabled = 0 only once because of memory pressure). If persistent L2ARC has been disabled by setting the module parameter l2arc_rebuild_blocks_min_l2size to a value greater than the size of the cache device then the whole device is trimmed upon creation or import of a pool if l2arc_trim_ahead > 0. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Adam D. Moss <c@yotes.com> Signed-off-by: George Amanakis <gamanakis@gmail.com> Closes #9713 Closes #9789 Closes #10224	2020-06-09 10:15:08 -07:00
Michael Niewöhner	32f26eaa70	Move GFP flags kernel compatibility code Move the GFP flags kernel compat code from c file to kmem header. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Michael Niewöhner <foss@mniewoehner.de> Closes #10424	2020-06-08 16:33:46 -07:00
Michael Niewöhner	080102a1b6	Linux 5.8 compat: __vmalloc() The `pgprot` argument has been removed from `__vmalloc` in Linux 5.8, being `PAGE_KERNEL` always now [1]. Detect this during configure and define a wrapper for older kernels. [1] https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/mm/vmalloc.c?h=next-20200605&id=88dca4ca5a93d2c09e5bbc6a62fbfc3af83c4fca Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Co-authored-by: Sebastian Gottschall <s.gottschall@dd-wrt.com> Co-authored-by: Michael Niewöhner <foss@mniewoehner.de> Signed-off-by: Sebastian Gottschall <s.gottschall@dd-wrt.com> Signed-off-by: Michael Niewöhner <foss@mniewoehner.de> Closes #10422	2020-06-08 16:32:02 -07:00
Pawel Jakub Dawidek	529246df96	Restore support for in-kernel ZFS ioctls In Illumos it is possible to call ioctl functions from within the kernel by passing the FKIOCTL flag. Neither FreeBSD nor Linux support that, but it doesn't hurt to keep it around, as all the code is there. Before this commit it was a dead code and zc_iflags was always zero. Restore this functionality by allowing to pass a flag to the zfsdev_ioctl_common() function. Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Pawel Jakub Dawidek <pawel@dawidek.net> Closes #10417	2020-06-08 13:57:22 -07:00
Pawel Jakub Dawidek	77b998fa70	Remove redundant includes By removing excessive includes it takes us a small step close to compiling this file in userland. Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Pawel Jakub Dawidek <pawel@dawidek.net> Closes #10415	2020-06-08 09:57:36 -07:00
Jorgen Lundman	c9e319faae	Replace sprintf()->snprintf() and strcpy()->strlcpy() The strcpy() and sprintf() functions are deprecated on some platforms. Care is needed to ensure correct size is used. If some platforms miss snprintf, we can add a #define to sprintf, likewise strlcpy(). The biggest change is adding a size parameter to zfs_id_to_fuidstr(). The various *_impl_get() functions are only used on linux and have not yet been updated. Reviewed by: Sean Eric Fagan <sef@ixsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Jorgen Lundman <lundman@lundman.net> Closes #10400	2020-06-07 11:42:12 -07:00
Ryan Moeller	60265072e0	Improve compatibility with C++ consumers C++ is a little picky about not using keywords for names, or string constness. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Jorgen Lundman <lundman@lundman.net> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #10409	2020-06-06 12:54:04 -07:00
Brian Behlendorf	63761a8f1a	zfsvfs_setup(): zap_stats_t may have undefined content when accessed (#10398 ) Signed-off-by: Allan Jude <allanjude@klarasystems.com> Co-authored-by: Allan Jude <allanjude@klarasystems.com>	2020-06-05 17:21:04 -07:00
Allan Jude	4547fc4e07	Connect dataset_kstats for FreeBSD Expand the FreeBSD spl for kstats to support all current types Move the dataset_kstats_t back to zvol_state_t from zfs_state_os_t now that it is common once again ``` kstat.zfs/mypool.dataset.objset-0x10b.nunlinked: 0 kstat.zfs/mypool.dataset.objset-0x10b.nunlinks: 0 kstat.zfs/mypool.dataset.objset-0x10b.nread: 150528 kstat.zfs/mypool.dataset.objset-0x10b.reads: 48 kstat.zfs/mypool.dataset.objset-0x10b.nwritten: 134217728 kstat.zfs/mypool.dataset.objset-0x10b.writes: 1024 kstat.zfs/mypool.dataset.objset-0x10b.dataset_name: mypool/datasetname ``` Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed by: Sean Eric Fagan <sef@ixsystems.com> Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Allan Jude <allan@klarasystems.com> Closes #10386	2020-06-05 17:17:02 -07:00
Allan Jude	7da304bbce	zfsvfs_setup(): zap_stats_t may have undefined content when accessed Signed-off-by: Allan Jude <allanjude@klarasystems.com>	2020-06-03 22:20:27 +00:00
Ryan Moeller	c0eb5c35ef	FreeBSD: Simplify zvol and fix locking zvol_geom_bio_strategy should handle its own use of the zvol suspend reader lock and ensure the zilog exists when needed. A few other places using the zvol zilog should use the suspend reader lock as well. Simplify consumers of zvol_geom_bio_strategy, fix the locking, and while in here, use the boolean_t constants with doread. Reviewed-by: Matt Macy <mmacy@FreeBSD.org> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #10381	2020-06-03 10:45:12 -07:00
Matthew Macy	3bf3b164ee	Fix crypto build on FreeBSD HEAD Update API usage to reflect recent change. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Matt Macy <mmacy@FreeBSD.org> Closes #10384	2020-05-30 12:54:57 -07:00
Brian Atkinson	fb822260b1	Gang ABD Type Adding the gang ABD type, which allows for linear and scatter ABDs to be chained together into a single ABD. This can be used to avoid doing memory copies to/from ABDs. An example of this can be found in vdev_queue.c in the vdev_queue_aggregate() function. Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Co-authored-by: Brian <bwa@clemson.edu> Co-authored-by: Mark Maybee <mmaybee@cray.com> Signed-off-by: Brian Atkinson <batkinson@lanl.gov> Closes #10069	2020-05-20 18:06:09 -07:00
Ryan Moeller	7b0e39030c	freebsd: Correct the order of arguments to copyin() for Q_SETQUOTA Sponsored by: DARPA External-issue: https://reviews.freebsd.org/D24656 FreeBSD-commit: freebsd/freebsd@a431c095d3 Authored by: jhb <jhb@FreeBSD.org> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Ported-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #10344	2020-05-19 16:45:25 -07:00
Kyle Evans	5b090d57d4	freebsd: return EISDIR for read(2) on directories This is arguably a change for internal consistency within OpenZFS, as the Linux implementation will reject read(2) on directories with EISDIR. It's not unreasonable for read(2) to do something here on FreeBSD, but we don't currently copy out anything useful anyways so start rejecting it with the appropriate error. Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Kyle Evans <kevans@FreeBSD.org> Closes #10338	2020-05-16 10:12:01 -07:00
Ryan Moeller	d2782af461	Fix ZVOL_DIR We only use ZVOL_DIR on FreeBSD, and on FreeBSD it isn't correct. Move the definition to the file where it is needed, and define it as /dev/zvol/. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #10337	2020-05-16 10:10:38 -07:00
yparitcher	cdcce2f019	Fix VN_OPEN_INVFS typo The VN_OPEN_INVFS literal is in the wrong field. Reviewed-by: Matt Macy <mmacy@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: yparitcher <y@paritcher.com> Closes #10322	2020-05-14 20:47:14 -07:00
Brian Behlendorf	2ade659eb4	Fix abd_enter/exit_critical wrappers Commit `fc551d7` introduced the wrappers abd_enter_critical() and abd_exit_critical() to mark critical sections. On Linux these are implemented with the local_irq_save() and local_irq_restore() macros which set the 'flags' argument when saving. By wrapping them with a function the local variable is no longer set by the macro and is no longer properly restored. Convert abd_enter_critical() and abd_exit_critical() to macros to resolve this issue and ensure the flags are properly restored. Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Brian Atkinson <batkinson@lanl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #10332	2020-05-14 20:45:16 -07:00
Brian Atkinson	fc551d7efb	Combine OS-independent ABD Code into Common Source File Reorganizing ABD code base so OS-independent ABD code has been placed into a common abd.c file. OS-dependent ABD code has been left in each OS's ABD source files, and these source files have been renamed to abd_os. The OS-independent ABD code is now under: module/zfs/abd.c With the OS-dependent code in: module/os/linux/zfs/abd_os.c module/os/freebsd/zfs/abd_os.c Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Brian Atkinson <batkinson@lanl.gov> Closes #10293	2020-05-10 12:23:52 -07:00
Paul Dagnelie	108a454a46	Add support for boot environment data to be stored in the label Modern bootloaders leverage data stored in the root filesystem to enable some of their powerful features. GRUB specifically has a grubenv file which can store large amounts of configuration data that can be read and written at boot time and during normal operation. This allows sysadmins to configure useful features like automated failover after failed boot attempts. Unfortunately, due to the Copy-on-Write nature of ZFS, the standard behavior of these tools cannot handle writing to ZFS files safely at boot time. We need an alternative way to store data that allows the bootloader to make changes to the data. This work is very similar to work that was done on Illumos to enable similar functionality in the FreeBSD bootloader. This patch is different in that the data being stored is a raw grubenv file; this file can store arbitrary variables and values, and the scripting provided by grub is powerful enough that special structures are not required to implement advanced behavior. We repurpose the second padding area in each label to store the grubenv file, protected by an embedded checksum. We add two ioctls to get and set this data, and libzfs_core and libzfs functions to access them more easily. There are no direct command line interfaces to these functions; these will be added directly to the bootloader utilities. Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com> Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Paul Dagnelie <pcd@delphix.com> Closes #10009	2020-05-07 09:36:33 -07:00
Ryan Moeller	ddc7a2dd3b	taskq: Don't leak system_delay_taskq on FreeBSD Adds a missing taskq_destroy() call. Reported by: Jorgen Lundman <lundman@lundman.net> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #10292	2020-05-05 09:36:41 -07:00
Ryan Moeller	6f3e1a4828	Avoid the GEOM topology lock recursion when autoexpanding a pool The steps to reproduce the problem: mdconfig -a -t swap -s 3g -u 0 gpart create -s GPT md0 gpart add -t freebsd-zfs -s 1g md0 zpool create -o autoexpand=on foo md0p1 gpart resize -i 1 -s 2g md0 Authored by: pjd <pjd@FreeBSD.org> FreeBSD-commit: freebsd/freebsd@bccd2db598 Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Ported-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #10270	2020-05-04 15:10:41 -07:00
Ryan Moeller	639dfeb831	Update FreeBSD SPL atomics Sync up with the following changes from FreeBSD: ZFS: add emulation of atomic_swap_64 and atomic_load_64 Some 32-bit platforms do not provide 64-bit atomic operations that ZFS requires, either in userland or at all. We emulate those operations for those platforms using a mutex. That is not entirely correct and it's very efficient. Besides, the loads are plain loads, so torn values are possible. Nevertheless, the emulation seems to work for some definition of work. This change adds atomic_swap_64, which is already used in ZFS code, and atomic_load_64 that can be used to prevent torn reads. Authored by: avg <avg@FreeBSD.org> FreeBSD-commit: freebsd/freebsd@3458e5d1e6 cleanup of illumos compatibility atomics atomic_cas_32 is implemented using atomic_fcmpset_32 on all platforms. Ditto for atomic_cas_64 and atomic_fcmpset_64 on platforms that have it. The only exception is sparc64 that provides MD atomic_cas_32 and atomic_cas_64. This is slightly inefficient as fcmpset reports whether the operation updated the target and that information is not needed for cas. Nevertheless, there is less code to maintain and to add for new platforms. Also, the operations are done inline now as opposed to function calls before. atomic_add_64_nv is implemented using atomic_fetchadd_64 on platforms that provide it. casptr, cas32, atomic_or_8, atomic_or_8_nv are completely removed as they have no users. atomic_mtx that is used to emulate 64-bit atomics on platforms that lack them is defined only on those platforms. As a result, platform specific opensolaris_atomic.S files have lost most of their code. The only exception is i386 where the compat+contrib code provides 64-bit atomics for userland use. That code assumes availability of cmpxchg8b instruction. FreeBSD does not have that assumption for i386 userland and does not provide 64-bit atomics. Hopefully, this can and will be fixed. Authored by: avg <avg@FreeBSD.org> FreeBSD-commit: freebsd/freebsd@e9642c209b emulate illumos membar_producer with atomic_thread_fence_rel membar_producer is supposed to be a store-store barrier. Also, in the code that FreeBSD has ported from illumos membar_producer is used only with regular stores to regular memory (with respect to caching). We do not have an MI primitive for the store-store barrier, so atomic_thread_fence_rel is the closest we have as it provides (load \| store) -> store barrier. Previously, membar_producer was an empty function call on all 32-bit arm-s, 32-bit powerpc, riscv and all mips variants. I think that it was inadequate. On other platforms, such as amd64, arm64, i386, powerpc64, sparc64, membar_producer was implemented using stronger primitives than required for a store-store barrier with respect to regular memory access. For example, it used sfence on amd64 and lock-ed nop in i386 (despite TSO). On powerpc64 we now use recommended lwsync instead of eieio. On sparc64 FreeBSD uses TSO mode. On arm64/aarch64 we now use dmb sy instead of dmb ish. Not sure if this is an improvement, actually. After this change we can drop opensolaris_atomic.S for aarch64, amd64, powerpc64 and sparc64 as all required atomic operations have either direct or light-weight mapping to FreeBSD native atomic operations. Discussed with: kib Authored by: avg <avg@FreeBSD.org> FreeBSD-commit: freebsd/freebsd@50cdda62fc fix up r353340, don't assume that fcmpset has strong semantics fcmpset can have two kinds of semantics, weak and strong. For practical purposes, strong semantics means that if fcmpset fails then the reported current value is always different from the expected value. Weak semantics means that the reported current value may be the same as the expected value even though fcmpset failed. That's a so called "sporadic" failure. I originally implemented atomic_cas expecting strong semantics, but many platforms actually have weak one. Reported by: pkubaj (not confirmed if same issue) Discussed with: kib, mjg Authored by: avg <avg@FreeBSD.org> FreeBSD-commit: freebsd/freebsd@238787c74e [PowerPC] [MIPS] Implement 32-bit kernel emulation of atomic64 operations This is a lock-based emulation of 64-bit atomics for kernel use, split off from an earlier patch by jhibbits. This is needed to unblock future improvements that reduce the need for locking on 64-bit platforms by using atomic updates. The implementation allows for future integration with userland atomic64, but as that implies going through sysarch for every use, the current status quo of userland doing its own locking may be for the best. Submitted by: jhibbits (original patch), kevans (mips bits) Reviewed by: jhibbits, jeff, kevans Authored by: bdragon <bdragon@FreeBSD.org> Differential Revision: https://reviews.freebsd.org/D22976 FreeBSD-commit: freebsd/freebsd@db39dab3a8 Remove sparc64 kernel support Remove all sparc64 specific files Remove all sparc64 ifdefs Removee indireeect sparc64 ifdefs Authored by: imp <imp@FreeBSD.org> FreeBSD-commit: freebsd/freebsd@48b94864c5 Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Ported-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #10250	2020-05-04 15:07:04 -07:00
Paul B. Henson	0aeb0bed6f	OpenZFS 6765 - zfs_zaccess_delete() comments do not accurately reflect delete permissions for ACLs Authored by: Kevin Crowe <kevin.crowe@nexenta.com> Reviewed by: Gordon Ross <gwr@nexenta.com> Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com> Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov> Approved by: Richard Lowe <richlowe@richlowe.net> Ported-by: Paul B. Henson <henson@acm.org> Porting Notes: * Only comments are updated OpenZFS-issue: https://www.illumos.org/issues/6765 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/da412744bc Closes #10266	2020-04-30 11:24:55 -07:00
Paul B. Henson	235a856576	OpenZFS 6762 - POSIX write should imply DELETE_CHILD on directories - and some additional considerations Authored by: Kevin Crowe <kevin.crowe@nexenta.com> Reviewed by: Gordon Ross <gwr@nexenta.com> Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com> Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov> Approved by: Richard Lowe <richlowe@richlowe.net> Ported-by: Paul B. Henson <henson@acm.org> OpenZFS-issue: https://www.illumos.org/issues/6762 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/1eb4e906ec Closes #10266	2020-04-30 11:24:38 -07:00
Paul B. Henson	99495ba6ab	OpenZFS 8984 - fix for 6764 breaks ACL inheritance Authored by: Dominik Hassler <hadfl@omniosce.org> Reviewed by: Sam Zaydel <szaydel@racktopsystems.com> Reviewed by: Paul B. Henson <henson@acm.org> Reviewed by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov> Approved by: Matthew Ahrens <mahrens@delphix.com> Ported-by: Paul B. Henson <henson@acm.org> OpenZFS-issue: https://www.illumos.org/issues/8984 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/e9bacc6d1a Closes #10266	2020-04-30 11:24:27 -07:00
Paul B. Henson	5a2f527d4b	OpenZFS 6764 - zfs issues with inheritance flags during chmod(2) with aclmode=passthrough Authored by: Albert Lee <trisk@nexenta.com> Reviewed by: Gordon Ross <gwr@nexenta.com> Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com> Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov> Approved by: Richard Lowe <richlowe@richlowe.net> Ported-by: Paul B. Henson <henson@acm.org> OpenZFS-issue: https://www.illumos.org/issues/6764 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/de0f1ddb59 Closes #10266	2020-04-30 11:24:14 -07:00
Paul B. Henson	7bf3e1fa0f	OpenZFS 3254 - add support in zfs for aclmode=restricted Authored-by: Paul B. Henson <henson@acm.org> Reviewed by: Albert Lee <trisk@nexenta.com> Reviewed by: Gordon Ross <gwr@nexenta.com> Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov> Approved by: Richard Lowe <richlowe@richlowe.net> Ported-by: Paul B. Henson <henson@acm.org> OpenZFS-issue: https://www.illumos.org/issues/3254 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/71dbfc287c Closes #10266	2020-04-30 11:23:59 -07:00
Paul B. Henson	a1af567bb6	OpenZFS 742 - Resurrect the ZFS "aclmode" property OpenZFS 664 - Umask masking "deny" ACL entries OpenZFS 279 - Bug in the new ACL (post-PSARC/2010/029) semantics Porting notes: * Updated zfs_acl_chmod to take 'boolean_t isdir' as first parameter rather than 'zfsvfs_t zfsvfs' zfs man pages changes mixed between zfs and new zfsprops man pages Reviewed by: Aram Hvrneanu <aram@nexenta.com> Reviewed by: Gordon Ross <gwr@nexenta.com> Reviewed by: Robert Gordon <rbg@openrbg.com> Reviewed by: Mark.Maybee@oracle.com Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov> Approved by: Garrett D'Amore <garrett@nexenta.com> Ported-by: Paul B. Henson <henson@acm.org> OpenZFS-issue: https://www.illumos.org/issues/742 OpenZFS-issue: https://www.illumos.org/issues/664 OpenZFS-issue: https://www.illumos.org/issues/279 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/a3c49ce110 Closes #10266	2020-04-30 11:22:45 -07:00
Ryan Moeller	a8085184d6	Fix zlib leak on FreeBSD zlib_inflateEnd was accidentally a wrapper for inflateInit instead of inflateEnd, and hilarity ensues. Fix the typo so we free memory instead of allocating more. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #10225 Closes #10252	2020-04-28 09:14:30 -07:00
Matthew Macy	c614fd6e12	Use new FreeBSD API to largely eliminate object locking Propagate changes in HEAD that mostly eliminate object locking. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Matt Macy <mmacy@FreeBSD.org> Closes #10205	2020-04-17 09:30:26 -07:00
Ryan Moeller	a7929f3137	Update FreeBSD tunables Remove some obsolete legacy compat, rename some misnamed, and add some missing tunables for FreeBSD. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #10203	2020-04-15 11:14:47 -07:00
Matthew Macy	9f0a21e641	Add FreeBSD support to OpenZFS Add the FreeBSD platform code to the OpenZFS repository. As of this commit the source can be compiled and tested on FreeBSD 11 and 12. Subsequent commits are now required to compile on FreeBSD and Linux. Additionally, they must pass the ZFS Test Suite on FreeBSD which is being run by the CI. As of this commit 1230 tests pass on FreeBSD and there are no unexpected failures. Reviewed-by: Sean Eric Fagan <sef@ixsystems.com> Reviewed-by: Jorgen Lundman <lundman@lundman.net> Reviewed-by: Richard Laager <rlaager@wiktel.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Co-authored-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Matt Macy <mmacy@FreeBSD.org> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #898 Closes #8987	2020-04-14 11:36:28 -07:00
Matthew Ahrens	20f287855a	zvol_write() can use dmu_tx_hold_write_by_dnode() We can improve the performance of writes to zvols by using dmu_tx_hold_write_by_dnode() instead of dmu_tx_hold_write(). This reduces lock contention on the first block of the dnode object, and also reduces the amount of CPU needed. The benefit will be highest with multi-threaded async writes (i.e. writes that don't call zil_commit()). Reviewed-by: Jorgen Lundman <lundman@lundman.net> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com> Signed-off-by: Matthew Ahrens <mahrens@delphix.com> Closes #10184	2020-04-10 21:14:01 -07:00
George Amanakis	77f6826b83	Persistent L2ARC This commit makes the L2ARC persistent across reboots. We implement a light-weight persistent L2ARC metadata structure that allows L2ARC contents to be recovered after a reboot. This significantly eases the impact a reboot has on read performance on systems with large caches. Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: George Wilson <gwilson@delphix.com> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Co-authored-by: Saso Kiselkov <skiselkov@gmail.com> Co-authored-by: Jorgen Lundman <lundman@lundman.net> Co-authored-by: George Amanakis <gamanakis@gmail.com> Ported-by: Yuxuan Shui <yshuiv7@gmail.com> Signed-off-by: George Amanakis <gamanakis@gmail.com> Closes #925 Closes #1823 Closes #2672 Closes #3744 Closes #9582	2020-04-10 10:33:35 -07:00
Ryan Moeller	36a6e2335c	Don't ignore zfs_arc_max below allmem/32 Set arc_c_min before arc_c_max so that when zfs_arc_min is set lower than the default allmem/32 zfs_arc_max can also be set lower. Add warning messages when tunables are being ignored. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #10157 Closes #10158	2020-04-09 15:39:48 -07:00
Brian Behlendorf	68dde63d13	Linux 5.7 compat: blk_alloc_queue() Commit https://github.com/torvalds/linux/commit/3d745ea5 simplified the blk_alloc_queue() interface by updating it to take the request queue as an argument. Add a wrapper function which accepts the new arguments and internally uses the available interfaces. Other minor changes include increasing the Linux-Maximum to 5.6 now that 5.6 has been released. It was not bumped to 5.7 because this release has not yet been finalized and is still subject to change. Added local 'struct zvol_state_os *zso' variable to zvol_alloc. Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #10181 Closes #10187	2020-04-09 09:16:46 -07:00
Paul Dagnelie	5a42ef04fd	Add 'zfs wait' command Add a mechanism to wait for delete queue to drain. When doing redacted send/recv, many workflows involve deleting files that contain sensitive data. Because of the way zfs handles file deletions, snapshots taken quickly after a rm operation can sometimes still contain the file in question, especially if the file is very large. This can result in issues for redacted send/recv users who expect the deleted files to be redacted in the send streams, and not appear in their clones. This change duplicates much of the zpool wait related logic into a zfs wait command, which can be used to wait until the internal deleteq has been drained. Additional wait activities may be added in the future. Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: John Gallagher <john.gallagher@delphix.com> Signed-off-by: Paul Dagnelie <pcd@delphix.com> Closes #9707	2020-04-01 10:02:06 -07:00
Fabio Scaccabarozzi	c9e3efdb3a	Bugfix/fix uio partial copies In zfs_write(), the loop continues to the next iteration without accounting for partial copies occurring in uiomove_iov when copy_from_user/__copy_from_user_inatomic return a non-zero status. This results in "zfs: accessing past end of object..." in the kernel log, and the write failing. Account for partial copies and update uio struct before returning EFAULT, leave a comment explaining the reason why this is done. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: ilbsmart <wgqimut@gmail.com> Signed-off-by: Fabio Scaccabarozzi <fsvm88@gmail.com> Closes #8673 Closes #10148	2020-04-01 09:48:54 -07:00
Matthew Ahrens	0929c4de39	Improve ZVOL sync write performance by using a taskq == Summary == Prior to this change, sync writes to a zvol are processed serially. This commit makes zvols process concurrently outstanding sync writes in parallel, similar to how reads and async writes are already handled. The result is that the throughput of sync writes is tripled. == Background == When a write comes in for a zvol (e.g. over iscsi), it is processed by calling `zvol_request()` to initiate the operation. ZFS is expected to later call `BIO_END_IO()` when the operation completes (possibly from a different thread). There are a limited number of threads that are available to call `zvol_request()` - one one per iscsi client (unless using MC/S). Therefore, to ensure good performance, the latency of `zvol_request()` is important, so that many i/o operations to the zvol can be processed concurrently. In other words, if the client has multiple outstanding requests to the zvol, the zvol should have multiple outstanding requests to the storage hardware (i.e. issue multiple concurrent `zio_t`'s). For reads, and async writes (i.e. writes which can be acknowledged before the data reaches stable storage), `zvol_request()` achieves low latency by dispatching the bulk of the work (including waiting for i/o to disk) to a taskq. The taskq callback (`zvol_read()` or `zvol_write()`) blocks while waiting for the i/o to disk to complete. The `zvol_taskq` has 32 threads (by default), so we can have up to 32 concurrent i/os to disk in service of requests to zvols. However, for sync writes (i.e. writes which must be persisted to stable storage before they can be acknowledged, by calling `zil_commit()`), `zvol_request()` does not use `zvol_taskq`. Instead it blocks while waiting for the ZIL write to disk to complete. This has the effect of serializing sync writes to each zvol. In other words, each zvol will only process one sync write at a time, waiting for it to be written to the ZIL before accepting the next request. The same issue applies to FLUSH operations, for which `zvol_request()` calls `zil_commit()` directly. == Description of change == This commit changes `zvol_request()` to use `taskq_dispatch_ent(zvol_taskq)` for sync writes, and FLUSh operations. Therefore we can have up to 32 threads (the taskq threads) simultaneously calling `zil_commit()`, for a theoretical performance improvement of up to 32x. To avoid the locking issue described in the comment (which this commit removes), we acquire the rangelock from the taskq callback (e.g. `zvol_write()`) rather than from `zvol_request()`. This applies to all writes (sync and async), reads, and discard operations. This means that multiple simultaneously-outstanding i/o's which access the same block can complete in any order. This was previously thought to be incorrect, but a review of the block device interface requirements revealed that this is fine - the order is inherently not defined. The shorter hold time of the rangelock should also have a slight performance improvement. For an additional slight performance improvement, we use `taskq_dispatch_ent()` instead of `taskq_dispatch()`, which avoids a `kmem_alloc()` and eliminates a failure mode. This applies to all writes (sync and async), reads, and discard operations. == Performance results == We used a zvol as an iscsi target (server) for a Windows initiator (client), with a single connection (the default - i.e. not MC/S). We used `diskspd` to generate a workload with 4 threads, doing 1MB writes to random offsets in the zvol. Without this change we get 231MB/s, and with the change we get 728MB/s, which is 3.15x the original performance. We ran a real-world workload, restoring a MSSQL database, and saw throughput 2.5x the original. We saw more modest performance wins (typically 1.5x-2x) when using MC/S with 4 connections, and with different number of client threads (1, 8, 32). Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com> Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Matthew Ahrens <mahrens@delphix.com> Closes #10163	2020-03-31 10:50:44 -07:00
Ryan Moeller	9a51738b60	Let default arc_c_max be platform dependent Linux changed the default max ARC size to 1/2 of physical memory to deal with shortcomings of the Linux SLUB allocator. Other platforms do not require the same logic. Implement an arc_default_max() function to determine a default max ARC size in platform code. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #10155	2020-03-27 09:14:46 -07:00
Brian Behlendorf	5351951274	Fix zfs_rmnode() unlink / rollback issue If a has rollback has occurred while a file is open and unlinked. Then when the file is closed post rollback it will not exist in the rolled back version of the unlinked object. Therefore, the call to zap_remove_int() may correctly return ENOENT and should be allowed. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #6812 Closes #9739	2020-03-18 11:47:07 -07:00
Matthew Ahrens	0fdd6106bb	dmu_objset_from_ds must be called with dp_config_rwlock held The normal lock order is that the dp_config_rwlock must be held before the ds_opening_lock. For example, dmu_objset_hold() does this. However, dmu_objset_open_impl() is called with the ds_opening_lock held, and if the dp_config_rwlock is not already held, it will attempt to acquire it. This may lead to deadlock, since the lock order is reversed. Looking at all the callers of dmu_objset_open_impl() (which is principally the callers of dmu_objset_from_ds()), almost all callers already have the dp_config_rwlock. However, there are a few places in the send and receive code paths that do not. For example: dsl_crypto_populate_key_nvlist, send_cb, dmu_recv_stream, receive_write_byref, redact_traverse_thread. This commit resolves the problem by requiring all callers ot dmu_objset_from_ds() to hold the dp_config_rwlock. In most cases, the code has been restructured such that we call dmu_objset_from_ds() earlier on in the send and receive processes, when we already have the dp_config_rwlock, and save the objset_t until we need it in the middle of the send or receive (similar to what we already do with the dsl_dataset_t). Thus we do not need to acquire the dp_config_rwlock in many new places. I also cleaned up code in dmu_redact_snap() and send_traverse_thread(). Reviewed-by: Paul Dagnelie <pcd@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Paul Zuchowski <pzuchowski@datto.com> Signed-off-by: Matthew Ahrens <mahrens@delphix.com> Closes #9662 Closes #10115	2020-03-12 10:55:02 -07:00
Matthew Ahrens	b3212d2fa6	Improve performance of zio_taskq_member __zio_execute() calls zio_taskq_member() to determine if we are running in a zio interrupt taskq, in which case we may need to switch to processing this zio in a zio issue taskq. The call to zio_taskq_member() can become a performance bottleneck when we are processing a high rate of zio's. zio_taskq_member() calls taskq_member() on each of the zio interrupt taskqs, of which there are 21. This is slow because each call to taskq_member() does tsd_get(taskq_tsd), which on Linux is relatively slow. This commit improves the performance of zio_taskq_member() by having it cache the value of tsd_get(taskq_tsd), reducing the number of those calls to 1/21th of the current behavior. In a test case running `zfs send -c >/dev/null` of a filesystem with small blocks (average 2.5KB/block), zio_taskq_member() was using 6.7% of one CPU, and with this change it is reduced to 1.3%. Overall time to perform the `zfs send` reduced by 10% (~150,000 block/sec to ~165,000 blocks/sec). Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com> Signed-off-by: Matthew Ahrens <mahrens@delphix.com> Closes #10070	2020-03-03 10:29:38 -08:00
Ryan Moeller	9bb907bc3f	Make spa_history_zone platform-dependent in kernel This function should only return "linux" on Linux. Move the kernel part of the function out of common code. Fix the tests for FreeBSD. Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #10079	2020-03-02 09:43:30 -08:00
Matthew Macy	ae9f92f6f3	Re-share zfsdev_getminor and zfs_onexit_fd_hold By adding a zfs_file_private accessor to the common interfaces and some extensions to FreeBSD platform code it is now possible to share the implementations for the aforementioned functions. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Matt Macy <mmacy@FreeBSD.org> Closes #10073	2020-02-28 14:50:32 -08:00
Brian Behlendorf	bd0d24e09b	Linux 5.5 compat: blkg_tryget() Commit https://github.com/torvalds/linux/commit/9e8d42a0f accidentally converted the static inline function blkg_tryget() to GPL-only for kernels built with CONFIG_PREEMPT_RCU=y and CONFIG_BLK_CGROUP=y. Resolve the build issue by providing our own equivalent functionality when needed which uses rcu_read_lock_sched() internally as before. Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #9745 Closes #10072	2020-02-28 08:58:39 -08:00
Brian Behlendorf	2c3a83701d	Linux 5.6 compat: time_t As part of the Linux kernel's y2038 changes the time_t type has been fully retired. Callers are now required to use the time64_t type. Rather than move to the new type, I've removed the few remaining places where a time_t is used in the kernel code. They've been replaced with a uint64_t which is already how ZFS internally handled these values. Going forward we should work towards updating the remaining user space time_t consumers to the 64-bit interfaces. Reviewed-by: Matthew Macy <mmacy@freebsd.org> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #10052 Closes #10064	2020-02-27 09:31:02 -08:00
Matthew Macy	28caa74b19	Refactor dnode dirty context from dbuf_dirty * Add dedicated donde_set_dirtyctx routine. * Add empty dirty record on destroy assertion. * Make much more extensive use of the SET_ERROR macro. Reviewed-by: Will Andrews <wca@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Signed-off-by: Matt Macy <mmacy@FreeBSD.org> Closes #9924	2020-02-26 16:09:17 -08:00
Dirkjan Bussink	327000ce04	Remove zfs_getattr and convoff dead code The `convoff` function is called only in one code path in `zfs_space`. Each caller of `zfs_space` is called with a `flock64_t` that has `l_whence` set to `SEEK_SET`. This means that `convoff` always results in a no-op as the `bfp` parameter has `l_whence` set to `SEEK_SET` and `int whence` is `SEEK_SET` as well. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Dirkjan Bussink <d.bussink@gmail.com> Closes #10006	2020-02-24 15:38:22 -08:00
DeHackEd	d09dc5980c	Honour sync=disabled when relinking tpmfiles Unlinked files don't respect synchronous flush commands, but when they get relinked their state is unknown. Previously we force flushed all such files even when sync=disabled. Correct this case. Reviewed-by: Chunwei Chen <tuxoko@gmail.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: DHE <git@dehacked.net> Closes #10005	2020-02-16 12:44:08 -08:00
Matthew Ahrens	2adc6b35ae	Missed wakeup when growing kmem cache When growing the size of a (VMEM or KVMEM) kmem cache, spl_cache_grow() always does taskq_dispatch(spl_cache_grow_work), and then waits for the KMC_BIT_GROWING to be cleared by the taskq thread. The taskq thread (spl_cache_grow_work()) does: 1. allocate new slab and add to list 2. wake_up_all(skc_waitq) 3. clear_bit(KMC_BIT_GROWING) Therefore, the waiting thread can wake up before GROWING has been cleared. It will see that the growing has not yet completed, and go back to sleep until it hits the 100ms timeout. This can have an extreme performance impact on workloads that alloc/free more than fits in the (statically-sized) magazines. These workloads allocate and free slabs with high frequency. The problem can be observed with `funclatency spl_cache_grow`, which on some workloads shows that 99.5% of the time it takes <64us to allocate slabs, but we spend ~70% of our time in outliers, waiting for the 100ms timeout. The fix is to do `clear_bit(KMC_BIT_GROWING)` before `wake_up_all(skc_waitq)`. A future investigation should evaluate if we still actually need to taskq_dispatch() at all, and if so on which kernel versions. Reviewed-by: Paul Dagnelie <pcd@delphix.com> Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: George Wilson <gwilson@delphix.com> Signed-off-by: Matthew Ahrens <mahrens@delphix.com> Closes #9989	2020-02-13 11:23:02 -08:00
Ryan Moeller	57940b435c	Share some code for spa deadman tunables We need to do the same thing to update all spas on any OS for these tunables, so let's share the code. While here let's match the types of the literals initializing the variables with the type of the variable. Reviewed-by: Allan Jude <allanjude@freebsd.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Olaf Faaland <faaland1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@iXsystems.com> Closes #9964	2020-02-10 13:11:30 -08:00
Brian Behlendorf	795699a6cc	Linux 5.6 compat: timestamp_truncate() The timestamp_truncate() function was added, it replaces the existing timespec64_trunc() function. This change renames our wrapper function to be consistent with the upstream name and updates the compatibility code for older kernels accordingly. Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #9956 Closes #9961	2020-02-07 11:04:32 -08:00
Brian Behlendorf	0dd7364853	Linux 5.6 compat: struct proc_ops The proc_ops structure was introduced to replace the use of of the file_operations structure when registering proc handlers. This change creates a new kstat_proc_op_t typedef for compatibility which can be used to pass around the correct structure. This change additionally adds the 'const' keyword to all of the existing proc operations structures. Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #9961	2020-02-07 11:03:53 -08:00
Romain Dolbeau	77122f9d68	Replace static per-cpu with dynamic per-cpu data This solves the issue of loading the spl module on RISC-V. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Romain Dolbeau <romain.dolbeau@european-processor-initiative.eu> Closes #9942	2020-02-06 09:26:13 -08:00
Alexander Motin	741db5a346	Prepare ks_data before calling kstat_install() It violated sequence described in kstat.h, and at least on FreeBSD kstat_install() uses provided names to create the sysctls. If the names are not available at the time, it ends up bad. Reviewed-by: Igor Kozhukhov <igor@dilos.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored-By: iXsystems, Inc. Closes #9933	2020-02-04 08:49:12 -08:00
Matthew Ahrens	ec21397127	async zvol minor node creation interferes with receive When we finish a zfs receive, dmu_recv_end_sync() calls zvol_create_minors(async=TRUE). This kicks off some other threads that create the minor device nodes (in /dev/zvol/poolname/...). These async threads call zvol_prefetch_minors_impl() and zvol_create_minor(), which both call dmu_objset_own(), which puts a "long hold" on the dataset. Since the zvol minor node creation is asynchronous, this can happen after the `ZFS_IOC_RECV[_NEW]` ioctl and `zfs receive` process have completed. After the first receive ioctl has completed, userland may attempt to do another receive into the same dataset (e.g. the next incremental stream). This second receive and the asynchronous minor node creation can interfere with one another in several different ways, because they both require exclusive access to the dataset: 1. When the second receive is finishing up, dmu_recv_end_check() does dsl_dataset_handoff_check(), which can fail with EBUSY if the async minor node creation already has a "long hold" on this dataset. This causes the 2nd receive to fail. 2. The async udev rule can fail if zvol_id and/or systemd-udevd try to open the device while the the second receive's async attempt at minor node creation owns the dataset (via zvol_prefetch_minors_impl). This causes the minor node (/dev/zd*) to exist, but the udev-generated /dev/zvol/... to not exist. 3. The async minor node creation can silently fail with EBUSY if the first receive's zvol_create_minor() trys to own the dataset while the second receive's zvol_prefetch_minors_impl already owns the dataset. To address these problems, this change synchronously creates the minor node. To avoid the lock ordering problems that the asynchrony was introduced to fix (see #3681), we create the minor nodes from open context, with no locks held, rather than from syncing contex as was originally done. Implementation notes: We generally do not need to traverse children or prefetch anything (e.g. when running the recv, snapshot, create, or clone subcommands of zfs). We only need recursion when importing/opening a pool and when loading encryption keys. The existing recursive, asynchronous, prefetching code is preserved for use in these cases. Channel programs may need to create zvol minor nodes, when creating a snapshot of a zvol with the snapdev property set. We figure out what snapshots are created when running the LUA program in syncing context. In this case we need to remember what snapshots were created, and then try to create their minor nodes from open context, after the LUA code has completed. There are additional zvol use cases that asynchronously own the dataset, which can cause similar problems. E.g. changing the volmode or snapdev properties. These are less problematic because they are not recursive and don't touch datasets that are not involved in the operation, there is still potential for interference with subsequent operations. In the future, these cases should be similarly converted to create the zvol minor node synchronously from open context. The async tasks of removing and renaming minors do not own the objset, so they do not have this problem. However, it may make sense to also convert these operations to happen synchronously from open context, in the future. Reviewed-by: Paul Dagnelie <pcd@delphix.com> Reviewed-by: Prakash Surya <prakash.surya@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Matthew Ahrens <mahrens@delphix.com> External-issue: DLPX-65948 Closes #7863 Closes #9885	2020-02-03 09:33:14 -08:00
Matthew Macy	d3c1e45b7a	Re-consolidate zio_delay_interrupt With recent SPL changes there is no longer any need for a per platform version. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Matt Macy <mmacy@FreeBSD.org> Closes #9860	2020-01-21 15:04:13 -08:00
Brian Behlendorf	70835c5b75	Unify target_cpu handling Over the years several slightly different approaches were used in the Makefiles to determine the target architecture. This change updates both the build system and Makefile to handle this in a consistent fashion. TARGET_CPU is set to i386, x86_64, powerpc, aarch6 or sparc64 and made available in the Makefiles to be used as appropriate. Reviewed-by: Ryan Moeller <ryan@ixsystems.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #9848	2020-01-17 12:40:09 -08:00
loli10K	7e2da7786e	KMC_KVMEM disrupts kv_alloc() memory alignment expectations On kernels with KASAN enabled the following failure can be observed as soon as the zfs module is loaded: VERIFY(IS_P2ALIGNED(ptr, PAGE_SIZE)) failed PANIC at spl-kmem-cache.c:228:kv_alloc() The problem is kmalloc() has never guaranteed aligned allocations; this requirement resulted in zfsonlinux/spl@8b45dda which removed all kmalloc() usage in kv_alloc(). Until a GFP_ALIGNED flag (or equivalent functionality) is provided by the kernel this commit partially reverts `66955885` and `6d948c35` to prevent k(v)malloc() allocations in kv_alloc(). Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl> Reviewed-by: Michael Niewöhner <foss@mniewoehner.de> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: loli10K <ezomori.nozomu@gmail.com> Closes #9813	2020-01-14 09:09:59 -08:00
Brian Behlendorf	e458fcca75	Change http://zfsonlinux.org links to https://zfsonlinux.org Update the project website links contained in to repository to reference the secure https://zfsonlinux.org address. Reviewed-By: Richard Laager <rlaager@wiktel.com> Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Garrett Fields <ghfields@gmail.com> Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #9837	2020-01-13 16:43:59 -08:00
Brian Behlendorf	bc9cef11fd	Fix QAT allocation failure return value When qat_compress() fails to allocate the required contiguous memory it mistakenly returns success. This prevents the fallback software compression from taking over and (un)compressing the block. Resolve the issue by correctly setting the local 'status' variable on all exit paths. Furthermore, initialize it to CPA_STATUS_FAIL to ensure qat_compress() always fails safe to guard against any similar bugs in the future. Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #9784 Closes #9788	2020-01-06 11:17:53 -08:00
Ubuntu	abfdb83607	cppcheck: (error) Shifting signed 64-bit value by 63 bits As of cppcheck 1.82 surpress the warning regarding shifting too many bits for __divdi3() implemention. The algorithm used here is correct. Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #9732	2019-12-18 17:24:42 -08:00
Ubuntu	7cf1fe6331	cppcheck: (error) Uninitialized variable As of cppcheck 1.82 warnings are issued when using the list_for_each_* functions with an uninitialized variable. Functionally, this is fine but to resolve the warning initialize these variables. Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #9732	2019-12-18 17:24:29 -08:00
Tomohiro Kusumi	ddb4e69db5	Don't fail to apply umask for O_TMPFILE files Apply umask to `mode` which will eventually be applied to inode. This is needed since VFS doesn't apply umask for O_TMPFILE files. (Note that zpl_init_acl() applies `ip->i_mode &= ~current_umask();` only when POSIX ACL is used.) Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@gmail.com> Closes #8997 Closes #8998	2019-12-13 15:02:23 -08:00
Matthew Macy	13a9a6f5e8	Make zfs_replay.c work on FreeBSD FreeBSD's vfs currently doesn't permit file systems to do their own locking. To avoid having to have duplicate zfs functions with and without locking add locking here. With luck these changes can be removed in the future. Reviewed-by: Sean Eric Fagan <sef@ixsystems.com> Reviewed-by: Jorgen Lundman <lundman@lundman.net> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Matt Macy <mmacy@FreeBSD.org> Closes #9715	2019-12-13 07:54:10 -08:00
Ryan Moeller	957c7aa23c	Relocate common quota functions to shared code The quota functions are common to all implementations and can be moved to common code. As a simplification they were moved to the Linux platform code in the initial refactoring. Reviewed-by: Jorgen Lundman <lundman@lundman.net> Reviewed-by: Igor Kozhukhov <igor@dilos.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan Moeller <ryan@ixsystems.com> Closes #9710	2019-12-11 12:12:08 -08:00
Matthew Macy	657ce25357	Eliminate Linux specific inode usage from common code Change many of the znops routines to take a znode rather than an inode so that zfs_replay code can be largely shared and in the future the much of the znops code may be shared. Reviewed-by: Jorgen Lundman <lundman@lundman.net> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Matt Macy <mmacy@FreeBSD.org> Closes #9708	2019-12-11 11:53:57 -08:00
Matthew Macy	362ae8d11f	Abstract away platform specific superblock references The zfsvfs->z_sb field is Linux specified and should be abstracted. Reviewed-by: Richard Laager <rlaager@wiktel.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Matt Macy <mmacy@FreeBSD.org> Closes #9697	2019-12-10 09:21:07 -08:00
Brian Behlendorf	a25861dcae	ZTS: Fix zpool_reopen_001_pos Update the vdev_disk_open() retry logic to use a specified number of milliseconds to be more robust. Additionally, on failure log both the time waited and requested timeout to the internal log. The default maximum allowed open retry time has been increased from 500ms to 1000ms. Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #9680	2019-12-09 11:09:14 -08:00
Matthew Macy	e64e84eca5	Refactor deadman set failmode to be cross platform Update zfs_deadman_failmode to use the ZFS_MODULE_PARAM_CALL wrapper, and split the common and platform specific portions. Reviewed-by: Jorgen Lundman <lundman@lundman.net> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Matt Macy <mmacy@FreeBSD.org> Closes #9670	2019-12-05 12:40:45 -08:00
Matthew Macy	2a8ba608d3	Replace ASSERTV macro with compiler annotation Remove the ASSERTV macro and handle suppressing unused compiler warnings for variables only in ASSERTs using the __attribute__((unused)) compiler annotation. The annotation is understood by both gcc and clang. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Jorgen Lundman <lundman@lundman.net> Signed-off-by: Matt Macy <mmacy@FreeBSD.org> Closes #9671	2019-12-05 12:37:00 -08:00
Matthew Macy	5142032106	Move zfs_cmd_t copyin/copyout to platform code FreeBSD needs to cope with multiple version of the zfs_cmd_t structure. Allowing the platform code to pre and post process the cmd structure makes it possible to work with legacy tooling. Reviewed-by: Jorgen Lundman <lundman@lundman.net> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Matt Macy <mmacy@FreeBSD.org> Closes #9624	2019-12-02 10:08:27 -08:00
Matthew Macy	f348c78f97	Mark Linux fallocate extensions as specific to Linux fallocate(2) is a Linux-specific system call which in unavailable on other platforms. Reviewed-by: Jorgen Lundman <lundman@lundman.net> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Matt Macy <mmacy@FreeBSD.org> Closes #9633	2019-11-30 15:40:22 -08:00
Matthew Macy	a5b762ab1d	Resolve ZoF differences in zfs_ioctl.h FreeBSD needs to be able to pass the jail id to the jail/unjail ioctls and the struct file in the device structure is unused. Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl> Reviewed-by: Jorgen Lundman <lundman@lundman.net> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Matt Macy <mmacy@FreeBSD.org> Closes #9625	2019-11-30 15:35:54 -08:00
Brian Behlendorf	9e17e6f254	Remove zfs_vdev_elevator module option As described in commit `f81d5ef6` the zfs_vdev_elevator module option is being removed. Users who require this functionality should update their systems to set the disk scheduler using a udev rule. Reviewed-by: Richard Laager <rlaager@wiktel.com> Reviewed-by: loli10K <ezomori.nozomu@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #8664 Closes #9417 Closes #9609	2019-11-27 10:35:49 -08:00
Mauricio Faria de Oliveira	0c46813805	Check for unlinked znodes after igrab() The changes in commit `41e1aa2a` / PR #9583 introduced a regression on tmpfile_001_pos: fsetxattr() on a O_TMPFILE file descriptor started to fail with errno ENODATA: openat(AT_FDCWD, "/test", O_RDWR\|O_TMPFILE, 0666) = 3 <...> fsetxattr(3, "user.test", <...>, 64, 0) = -1 ENODATA The originally proposed change on PR #9583 is not susceptible to it, so just move the code/if-checks around back in that way, to fix it. Reviewed-by: Pavel Snajdr <snajpa@snajpa.net> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Original-patch-by: Heitor Alves de Siqueira <halves@canonical.com> Signed-off-by: Mauricio Faria de Oliveira <mfo@canonical.com> Closes #9602	2019-11-21 12:24:03 -08:00
Matthew Macy	da92d5cbb3	Add zfs_file_* interface, remove vnodes Provide a common zfs_file_* interface which can be implemented on all platforms to perform normal file access from either the kernel module or the libzpool library. This allows all non-portable vnode_t usage in the common code to be replaced by the new portable zfs_file_t. The associated vnode and kobj compatibility functions, types, and macros have been removed from the SPL. Moving forward, vnodes should only be used in platform specific code when provided by the native operating system. Reviewed-by: Sean Eric Fagan <sef@ixsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Igor Kozhukhov <igor@dilos.org> Reviewed-by: Jorgen Lundman <lundman@lundman.net> Signed-off-by: Matt Macy <mmacy@FreeBSD.org> Closes #9556	2019-11-21 09:32:57 -08:00
Brian Behlendorf	7ae3f8dc8f	Partially revert `5a6ac4c` Reinstate the zpl_revalidate() functionality to resolve a regression where dentries for open files during a rollback are not invalidated. The unrelated functionality for automatically unmounting .zfs/snapshots was not reverted. Nor was the addition of shrink_dcache_sb() to the zfs_resume_fs() function. This issue was not immediately caught by the CI because the test case intended to catch it was included in the list of ZTS tests which may occasionally fail for unrelated reasons. Remove all of the rollback tests from this list to help identify the frequency of any spurious failures. The rollback_003_pos.ksh test case exposes a real issue with the long standing code which needs to be investigated. Regardless, it has been enable with a small workaround in the test case itself. Reviewed-by: Matt Ahrens <matt@delphix.com> Reviewed-by: Pavel Snajdr <snajpa@snajpa.net> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #9587 Closes #9592	2019-11-18 13:05:56 -08:00
Heitor Alves de Siqueira	41e1aa2a06	Break out of zfs_zget early if unlinked znode If zp->z_unlinked is set, we're working with a znode that has been marked for deletion. If that's the case, we can skip the "goto again" loop and return ENOENT, as the znode should not be discovered. Reviewed-by: Richard Yao <ryao@gentoo.org> Reviewed-by: Matt Ahrens <mahrens@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Heitor Alves de Siqueira <halves@canonical.com> Closes #9583	2019-11-15 09:56:05 -08:00
loli10K	7ba964cc3f	Prevent NULL pointer dereference in blkg_tryget() on EL8 kernels blkg_tryget() as shipped in EL8 kernels does not seem to handle NULL @blkg as input; this is different from its mainline counterpart where NULL is accepted. To prevent dereferencing a NULL pointer when dealing with block devices which do not set a root_blkg on the request queue perform the NULL check in vdev_bio_associate_blkg(). Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: loli10K <ezomori.nozomu@gmail.com> Closes #9546 Closes #9577	2019-11-13 10:19:06 -08:00
Michael Niewöhner	8aaa10a9a0	Check for __GFP_RECLAIM instead of GFP_KERNEL Check for __GFP_RECLAIM instead of GFP_KERNEL because zfs modifies IO and FS flags which breaks the check for GFP_KERNEL. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Matt Ahrens <matt@delphix.com> Reviewed-by: Sebastian Gottschall <s.gottschall@dd-wrt.com> Signed-off-by: Michael Niewöhner <foss@mniewoehner.de> Closes #9034	2019-11-13 10:05:23 -08:00
Michael Niewöhner	6d948c3519	Add kmem_cache flag for forcing kvmalloc This adds a new KMC_KVMEM flag was added to enforce use of the kvmalloc allocator in kmem_cache_create even for large blocks, which may also increase performance in some specific cases (e.g. zstd), too. Default to KVMEM instead of VMEM in spl_kmem_cache_create. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Matt Ahrens <matt@delphix.com> Signed-off-by: Sebastian Gottschall <s.gottschall@dd-wrt.com> Signed-off-by: Michael Niewöhner <foss@mniewoehner.de> Closes #9034	2019-11-13 10:05:23 -08:00
Michael Niewöhner	66955885e2	Make use of kvmalloc if available and fix vmem_alloc implementation This patch implements use of kvmalloc for GFP_KERNEL allocations, which may increase performance if the allocator is able to allocate physical memory, if kvmalloc is available as a public kernel interface (since v4.12). Otherwise it will simply fall back to virtual memory (vmalloc). Also fix vmem_alloc implementation which can lead to slow allocations since the first attempt with kmalloc does not make use of the noretry flag but tells the linux kernel to retry several times before it fails. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Matt Ahrens <matt@delphix.com> Signed-off-by: Sebastian Gottschall <s.gottschall@dd-wrt.com> Signed-off-by: Michael Niewöhner <foss@mniewoehner.de> Closes #9034	2019-11-13 10:05:10 -08:00
Michael Niewöhner	c025008df5	Add missing documentation for some KMC flags Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Matt Ahrens <matt@delphix.com> Signed-off-by: Sebastian Gottschall <s.gottschall@dd-wrt.com> Signed-off-by: Michael Niewöhner <foss@mniewoehner.de> Closes #9034	2019-11-13 09:34:51 -08:00

1 2 3 4

185 Commits