mirror_zfs

mirror of https://git.proxmox.com/git/mirror_zfs.git synced 2026-05-27 04:32:16 +03:00

Author	SHA1	Message	Date
Tim Schumacher	b884768e46	Prefix all refcount functions with zfs_ Recent changes in the Linux kernel made it necessary to prefix the refcount_add() function with zfs_ due to a name collision. To bring the other functions in line with that and to avoid future collisions, prefix the other refcount functions as well. Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tim Schumacher <timschumi@gmx.de> Closes #7963	2018-11-08 14:38:28 -08:00
Tim Schumacher	f8f4e13776	Linux 4.19-rc3+ compat: Remove refcount_t compat torvalds/linux@59b57717f ("blkcg: delay blkg destruction until after writeback has finished") added a refcount_t to the blkcg structure. Due to the refcount_t compatibility code, zfs_refcount_t was used by mistake. Resolve this by removing the compatibility code and replacing the occurrences of refcount_t with zfs_refcount_t. Reviewed-by: Franz Pletz <fpletz@fnordicwalking.de> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tim Schumacher <timschumi@gmx.de> Closes #7885 Closes #7932	2018-11-08 14:38:28 -08:00
Gregor Kopka	5f07d51751	Zpool iostat: remove latency/queue scaling Bandwidth and iops are average per second while _wait are averages per request for latency or, for queue depths, an instantaneous measurement at the end of an interval (according to man zpool). When calculating the first two it makes sense to do x/interval_duration (x being the increase in total bytes or number of requests over the duration of the interval, interval_duration in seconds) to 'scale' from amount/interval_duration to amount/second. But applying the same math for the latter (_wait latencies/queue) is wrong as there is no interval_duration component in the values (these are time/requests to get to average_time/request or already an absulute number). This bug leads to the only correct continuous *_wait figures for both latencies and queue depths from 'zpool iostat -l/q' being with duration=1 as then the wrong math cancels itself (x/1 is a nop). This removes temporal scaling from latency and queue depth figures. Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Gregor Kopka <gregor@kopka.net> Closes #7945 Closes #7694	2018-11-08 14:38:28 -08:00
Brian Behlendorf	b2f003c4f4	Fix statfs(2) for 32-bit user space When handling a 32-bit statfs() system call the returned fields, although 64-bit in the kernel, must be limited to 32-bits or an EOVERFLOW error will be returned. This is less of an issue for block counts since the default reported block size in 128KiB. But since it is possible to set a smaller block size, these values will be scaled as needed to fit in a 32-bit unsigned long. Unlike most other filesystems the total possible file counts are more likely to overflow because they are calculated based on the available free space in the pool. In order to prevent this the reported value must be capped at 2^32-1. This is only for statfs(2) reporting, there are no changes to the internal ZFS limits. Reviewed-by: Andreas Dilger <andreas.dilger@whamcloud.com> Reviewed-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #7927 Closes #7122 Closes #7937	2018-11-08 14:38:28 -08:00
Olaf Faaland	9014da2b01	Skip import activity test in more zdb code paths Since zdb opens the pools read-only, it cannot damage the pool in the event the pool is already imported either on the same host or on another one. If the pool vdev structure is changing while zdb is importing the pool, it may cause zdb to crash. However this is unlikely, and in any case it's a user space process and can simply be run again. For this reason, zdb should disable the multihost activity test on import that is normally run. This commit fixes a few zdb code paths where that had been overlooked. It also adds tests to ensure that several common use cases handle this properly in the future. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Gu Zheng <guzheng2331314@163.com> Signed-off-by: Olaf Faaland <faaland1@llnl.gov> Closes #7797 Closes #7801	2018-11-08 14:38:28 -08:00
Matthew Ahrens	45579c9515	Reduce taskq and context-switch cost of zio pipe When doing a read from disk, ZFS creates 3 ZIO's: a zio_null(), the logical zio_read(), and then a physical zio. Currently, each of these results in a separate taskq_dispatch(zio_execute). On high-read-iops workloads, this causes a significant performance impact. By processing all 3 ZIO's in a single taskq entry, we reduce the overhead on taskq locking and context switching. We accomplish this by allowing zio_done() to return a "next zio to execute" to zio_execute(). This results in a ~12% performance increase for random reads, from 96,000 iops to 108,000 iops (with recordsize=8k, on SSD's). Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed by: George Wilson <george.wilson@delphix.com> Signed-off-by: Matthew Ahrens <mahrens@delphix.com> External-issue: DLPX-59292 Closes #7736	2018-11-08 14:38:28 -08:00
Tom Caputi	b32f1279d4	Fix race in dnode_check_slots_free() Currently, dnode_check_slots_free() works by checking dn->dn_type in the dnode to determine if the dnode is reclaimable. However, there is a small window of time between dnode_free_sync() in the first call to dsl_dataset_sync() and when the useraccounting code is run when the type is set DMU_OT_NONE, but the dnode is not yet evictable, leading to crashes. This patch adds the ability for dnodes to track which txg they were last dirtied in and adds a check for this before performing the reclaim. This patch also corrects several instances when dn_dirty_link was treated as a list_node_t when it is technically a multilist_node_t. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tom Caputi <tcaputi@datto.com> Closes #7147 Closes #7388	2018-11-08 14:38:28 -08:00
Tony Hutter	1b0cd07131	Tag zfs-0.7.11 META file and changelog updated. Signed-off-by: Tony Hutter <hutter2@llnl.gov>	2018-09-13 10:13:41 -07:00
Dr. András Korn	8c6867dae4	tx_waited -> tx_dirty_delayed in trace_dmu.h This change was missed in `0735ecb334`. Reviewed-by: Fabian Grünbichler <f.gruenbichler@proxmox.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Signed-off-by: András Korn <korn-github.com@elan.rulez.org> Closes #7096	2018-09-13 10:12:22 -07:00
Tony Hutter	99310c0aa0	Revert "zpool reopen should detect expanded devices" This reverts commit `2a16d4cfaf`. The commit was causing a "attempt to access beyond the end of device" error: list.zfsonlinux.org/pipermail/zfs-discuss/2018-September/032217.html	2018-09-13 10:11:42 -07:00
Tony Hutter	d126980e5f	Tag zfs-0.7.10 META file and changelog updated. Signed-off-by: Tony Hutter <hutter2@llnl.gov>	2018-09-05 10:37:32 -07:00
Chris Siebenmann	88ef5b238b	Correctly handle errors from kern_path As a regular kernel function, kern_path() returns errors as negative errnos, such as -ELOOP. zfsctl_snapdir_vget() must convert these into the positive errnos used throughout the ZFS code when it returns them to other ZFS functions so that the ZFS code properly sees them as errors. Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Chris Siebenmann <cks.git01@cs.toronto.edu> Closes #7764 Closes #7864	2018-07-06 02:46:51 -07:00
Georgy Yakovlev	30d8b85702	Fix build with CONFIG_GCC_PLUGIN_RANDSTRUCT fs/zfs/zfs/metaslab.c:1055:2: error: positional initialization of field in ‘struct’ declared with ‘designated_init’ attribute [-Werror=designated-init] metaslab_rt_remove, Signed-off-by: Georgy Yakovlev <ya@sysdump.net> Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Closes: #7069	2018-07-06 02:46:51 -07:00
Tom Caputi	45f0437912	Fix 'zfs recv' of non large_dnode send streams Currently, there is a bug where older send streams without the DMU_BACKUP_FEATURE_LARGE_DNODE flag are not handled correctly. The code in receive_object() fails to handle cases where drro->drr_dn_slots is set to 0, which is always the case when the sending code does not support this feature flag. This patch fixes the issue by ensuring that that a value of 0 is treated as DNODE_MIN_SLOTS. Tested-by: DHE <git@dehacked.net> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tom Caputi <tcaputi@datto.com> Closes #7617 Closes #7662	2018-07-06 02:46:51 -07:00
Tom Caputi	dc3eea871a	Fix object reclaim when using large dnodes Currently, when the receive_object() code wants to reclaim an object, it always assumes that the dnode is the legacy 512 bytes, even when the incoming bonus buffer exceeds this length. This causes a buffer overflow if --enable-debug is not provided and triggers an ASSERT if it is. This patch resolves this issue and adds an ASSERT to ensure this can't happen again. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tom Caputi <tcaputi@datto.com> Closes #7097 Closes #7433	2018-07-06 02:46:51 -07:00
Tim Chase	d2c8103a68	Fix problems receiving reallocated dnodes This is a port of `047116ac` - Raw sends must be able to decrease nlevels, to the zfs-0.7-stable branch. It includes the various fixes to the problem of receiving incremental streams which include reallocated dnodes in which the number of dnode slots has changed but excludes the parts which are related to raw streams. From `047116ac`: Currently, when a raw zfs send file includes a DRR_OBJECT record that would decrease the number of levels of an existing object, the object is reallocated with dmu_object_reclaim() which creates the new dnode using the old object's nlevels. For non-raw sends this doesn't really matter, but raw sends require that nlevels on the receive side match that of the send side so that the checksum-of-MAC tree can be properly maintained. This patch corrects the issue by freeing the object completely before allocating it again in this case. This patch also corrects several issues with dnode_hold_impl() and related functions that prevented dnodes (particularly multi-slot dnodes) from being reallocated properly due to the fact that existing dnodes were not being fully cleaned up when they were freed. This patch adds a test to make sure that zfs recv functions properly with incremental streams containing dnodes of different sizes. This also includes a one-liner fix from loli10K to fix a test failure: https://github.com/zfsonlinux/zfs/pull/7792#discussion_r212769264 Authored-by: Tom Caputi <tcaputi@datto.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Jorgen Lundman <lundman@lundman.net> Signed-off-by: Tom Caputi <tcaputi@datto.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tim Chase <tim@chase2k.com> Ported-by: Tim Chase <tim@chase2k.com> Closes #6821 Closes #6864 NOTE: This is the first of the port of 3 related patches patches to the zfs-0.7-release branch of ZoL. The other two patches should immediately follow this one.	2018-07-06 02:46:51 -07:00
Joao Carlos Mendes Luis	3ea1f7f193	Fedora 28: Fix misc bounds check compiler warnings Fix a bunch of truncation compiler warnings that show up on Fedora 28 (GCC 8.0.1). Reviewed-by: Giuseppe Di Natale <guss80@gmail.com> Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #7368 Closes #7826 Closes #7830	2018-07-06 02:46:51 -07:00
LOLi	4356dd23a9	Fix libaio-devel requirement for Debian-based distributions BuildRequires tags for "-devel" packages in the RPM spec file do not work when building on Debian-based distributions. Fix this issue by making this requirement conditional to RPM-based distributions. Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: loli10K <ezomori.nozomu@gmail.com> Closes #7829 Closes #7831	2018-07-06 02:46:51 -07:00
Brian Behlendorf	75318ec497	Add libaio-devel BuildRequires The zfs-test package needs a build requirement on the libaio-devel package. Without it ./configure will correctly determine that mmap_libaio cannot be built and it will be skipped. Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #7821 Closes #7824	2018-07-06 02:46:51 -07:00
Brian Behlendorf	c1629734ab	Add missing zfs-dracut RPM dependencies The zfs-dracut package requires the hostid, basename, head, awk, and grep utilities be installed. The first three are provided by coreutils but additional dependencies are required for awk and grep. Reviewed-by: Manuel Amador (Rudd-O) <rudd-o@rudd-o.com> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #7729 Closes #7747	2018-07-06 02:46:51 -07:00
DeHackEd	778290d5bc	Don't modify argv[] in user tools argv[] gets modified during string parsing for input arguments. This is reflected in the live process listing. Don't do that. Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com> Reviewed-by: loli10K <ezomori.nozomu@gmail.com> Reviewed-by: Giuseppe Di Natale <guss80@gmail.com> Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: DHE <git@dehacked.net> Closes #7760	2018-07-06 02:46:51 -07:00
LOLi	98bc8e0b23	Fix arcstat.py handling of unsupported options This change allows the arcstat.py script to handle unsupported options gracefully and print both error and usage messages when one such option is provided. Reviewed-by: Giuseppe Di Natale <guss80@gmail.com> Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: loli10K <ezomori.nozomu@gmail.com> Closes #7799	2018-07-06 02:46:51 -07:00
LOLi	caafa436eb	Allow inherited properties in zfs_check_settable() This change modifies how 'checksum' and 'dedup' properties are verified in zfs_check_settable() handling the case where they are explicitly inherited in the dataset hierarchy when receiving a recursive send stream. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tom Caputi <tcaputi@datto.com> Signed-off-by: loli10K <ezomori.nozomu@gmail.com> Closes #7755 Closes #7576 Closes #7757	2018-07-06 02:46:51 -07:00
LOLi	fe8de1c8a6	Fix zfs incremental send remove '-o' properties When receiving an incremental send stream with intermediary snapshots zfs_receive_one() does not correctly identify the top-level dataset: consequently we restore said snapshots as if they were children datasets in the hierarchy, forcing inheritance of any property received with 'zfs send -o' and effectively removing any locally set value. The test case did not correctly verify this situation because it uses adjacent snapshots, basically testing 'zfs send -i' instead of 'zfs send -I': this commit adds an additional intermediary snapshot to the test script. Reviewed-by: Paul Dagnelie <pcd@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: loli10K <ezomori.nozomu@gmail.com> Closes #7478	2018-07-06 02:46:51 -07:00
Toomas Soome	1bd93ea1e0	OpenZFS 8906 - uts: illumos rootfs should support salted cksum Porting notes: * As of grub-2.02 these checksums are not supported. However, as pointed out in #6501 there are alternatives such as EFISTUB which work and have no such restriction. A warning was added to the checksum property section of the zfs.8 man page. Authored by: Toomas Soome <tsoome@me.com> Reviewed by: C Fraire <cfraire@me.com> Reviewed by: Robert Mustacchi <rm@joyent.com> Reviewed by: Yuri Pankov <yuripv@yuripv.net> Approved by: Dan McDonald <danmcd@joyent.com> Ported-by: Brian Behlendorf <behlendorf1@llnl.gov> OpenZFS-issue: https://illumos.org/issues/8906 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/7dec52f Closes #6501 Closes #7714	2018-07-06 02:46:51 -07:00
Brian Behlendorf	6857950e46	Fix zpl_mount() deadlock Commit `93b43af10` inadvertently introduced the following scenario which can result in a deadlock. This issue was most easily reproduced by LXD containers using a ZFS storage backend but should be reproducible under any workload which is frequently mounting and unmounting. -- THREAD A -- spa_sync() spa_sync_upgrades() rrw_enter(&dp->dp_config_rwlock, RW_WRITER, FTAG); <- Waiting on B -- THREAD B -- mount_fs() zpl_mount() zpl_mount_impl() dmu_objset_hold() dmu_objset_hold_flags() dsl_pool_hold() dsl_pool_config_enter() rrw_enter(&dp->dp_config_rwlock, RW_READER, tag); sget() sget_userns() grab_super() down_write(&s->s_umount); <- Waiting on C -- THREAD C -- cleanup_mnt() deactivate_super() down_write(&s->s_umount); deactivate_locked_super() zpl_kill_sb() kill_anon_super() generic_shutdown_super() sync_filesystem() zpl_sync_fs() zfs_sync() zil_commit() txg_wait_synced() <- Waiting on A Reviewed by: Alek Pinchuk <apinchuk@datto.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #7598 Closes #7659 Closes #7691 Closes #7693	2018-07-06 02:46:51 -07:00
Brian Behlendorf	716ce2b89e	Fix kernel unaligned access on sparc64 Update the SA_COPY_DATA macro to check if architecture supports efficient unaligned memory accesses at compile time. Otherwise fallback to using the sa_copy_data() function. The kernel provided CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS is used to determine availability in kernel space. In user space the x86_64, x86, powerpc, and sometimes arm architectures will define the HAVE_EFFICIENT_UNALIGNED_ACCESS macro. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #7642 Closes #7684	2018-07-06 02:46:51 -07:00
Troels Nørgaard	9daae583d8	Default ashift for Amazon EC2 NVMe devices Add a default 4 KiB ashift for Amazon EC2 NVMe devices on instances with NVMe ephemeral devices, such as the types c5d, f1, i3 and m5d. As per the official documentation [1] a 4096 byte blocksize should be used to match the underlying hardware. The string was identified via: $ sudo sginfo -M /dev/nvme0n1 INQUIRY response (cmd: 0x12) ---------------------------- Device Type 0 Vendor: NVMe Product: Amazon EC2 NVMe Revision level: $ lsblk -io KNAME,TYPE,SIZE,MODEL KNAME TYPE SIZE MODEL nvme0n1 disk 442.4G Amazon EC2 NVMe Instance Storage [1] https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ storage-optimized-instances.html Retrived 2018-07-03 Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Giuseppe Di Natale <guss80@gmail.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Troels Nørgaard <tnn@tradeshift.com> Closes #7676	2018-07-06 02:46:51 -07:00
Brian Behlendorf	b5ee3df776	Linux 4.14 compat: blk_queue_stackable() The blk_queue_stackable() function was replaced in the 4.14 kernel by queue_is_rq_based(), commit torvalds/linux@5fdee212. This change resulted in the default elevator being used which can negatively impact performance. Rather than adding additional compatibility code to detect the new interface unconditionally attempt to set the elevator. Since we expect this to fail for block devices without an elevator the error message has been moved in to zfs_dbgmsg(). Finally, it was observed that the elevator_change() was removed from the 4.12 kernel, commit torvalds/linux@c033269. Update the comment to clearly specify which are expected to export the elevator_change() symbol. Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #7645	2018-07-06 02:46:51 -07:00
Tony Hutter	17cd9a8e0c	Add pool state /proc entry, "SUSPENDED" pools 1. Add a proc entry to display the pool's state: $ cat /proc/spl/kstat/zfs/tank/state ONLINE This is done without using the spa config locks, so it will never hang. 2. Fix 'zpool status' and 'zpool list -o health' output to print "SUSPENDED" instead of "ONLINE" for suspended pools. Reviewed-by: Olaf Faaland <faaland1@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed by: Richard Elling <Richard.Elling@RichardElling.com> Signed-off-by: Tony Hutter <hutter2@llnl.gov> Closes #7331 Closes #7563	2018-07-06 02:46:51 -07:00
Sara Hartse	2a16d4cfaf	zpool reopen should detect expanded devices Update bdev_capacity to have wholedisk vdevs query the size of the underlying block device (correcting for the size of the efi parition and partition alignment) and therefore detect expanded space. Correct vdev_get_stats_ex so that the expandsize is aligned to metaslab size and new space is only reported if it is large enough for a new metaslab. Reviewed by: Don Brady <don.brady@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: John Wren Kennedy <jwk404@gmail.com> Signed-off-by: sara hartse <sara.hartse@delphix.com> External-issue: LX-165 Closes #7546 Issue #7582	2018-07-06 02:46:51 -07:00
Antonio Russo	3350a33908	Support Debian DKMS builds scripts/dkms.mkconf calls configure with `--with-linux=${kernel_source_dir}`, but Debian puts it kernel source at `/lib/modules/<version>/source`. This patch adds the same logic to the DKMS file produced by `scripts/dkms.mkconf` that Debian has shipped in its official ZFS packaging: at DKMS build time, it checks if the system is a Debian system, and adjusts the path accordingly. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Antonio Russo <antonio.e.russo@gmail.com> Closes #7358 Closes #7540 Closes #7554	2018-07-06 02:46:51 -07:00
Olaf Faaland	3eef58c9b6	module param callbacks check for initialized spa Callbacks provided for module parameters are executed both after the module is loaded, when a user alters it via sysfs, e.g echo bar > /sys/modules/zfs/parameters/foo as well as when the module is loaded with an argument, e.g. modprobe zfs foo=bar In the latter case, the init functions likely have not run yet, including spa_init() which initializes the namespace lock so it is safe to use. Instead of immediately taking the namespace lock and attemping to iterate over initialized spa structures, check whether spa_mode_global is nonzero. This is set by spa_init() after it has initialized the namespace lock. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tim Chase <tim@chase2k.com> Signed-off-by: Olaf Faaland <faaland1@llnl.gov> Closes #7496 Closes #7521	2018-07-06 02:46:51 -07:00
Brian Behlendorf	4805781c74	Trim new line from zfs_vdev_scheduler Add a helper function to trim the tailing new line. While we're here use this new hook to immediately apply the new scheduler. Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #3356 Closes #6573	2018-07-06 02:46:51 -07:00
Chunwei Chen	b06f40ea9b	Fix ENOSPC in "Handle zap_add() failures in ..." Commit `cc63068` caused ENOSPC error when copy a large amount of files between two directories. The reason is that the patch limits zap leaf expansion to 2 retries, and return ENOSPC when failed. The intent for limiting retries is to prevent pointlessly growing table to max size when adding a block full of entries with same name in different case in mixed mode. However, it turns out we cannot use any limit on the retry. When we copy files from one directory in readdir order, we are copying in hash order, one leaf block at a time. Which means that if the leaf block in source directory has expanded 6 times, and you copy those entries in that block, by the time you need to expand the leaf in destination directory, you need to expand it 6 times in one go. So any limit on the retry will result in error where it shouldn't. Note that while we do use different salt for different directories, it seems that the salt/hash function doesn't provide enough randomization to the hash distance to prevent this from happening. Since `cc63068` has already been reverted. This patch adds it back and removes the retry limit. Also, as it turn out, failing on zap_add() has a serious side effect for mzap_upgrade(). When upgrading from micro zap to fat zap, it will call zap_add() to transfer entries one at a time. If it hit any error halfway through, the remaining entries will be lost, causing those files to become orphan. This patch add a VERIFY to catch it. Reviewed-by: Sanjeev Bagewadi <sanjeev.bagewadi@gmail.com> Reviewed-by: Richard Yao <ryao@gentoo.org> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Albert Lee <trisk@forkgnu.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Signed-off-by: Chunwei Chen <david.chen@nutanix.com> Closes #7401 Closes #7421	2018-07-06 02:46:51 -07:00
Olaf Faaland	6b5cc49d81	Fix divide-by-zero in mmp_delay_update() vdev_count_leaves() in the denominator may return 0, caught by Coverity. Introduced by * `533ea04` Update mmp_delay on sync or skipped, failed write Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Olaf Faaland <faaland1@llnl.gov> Closes #7391	2018-07-06 02:46:51 -07:00
Prakash Surya	ef7a79488a	OpenZFS 8997 - ztest assertion failure in zil_lwb_write_issue PROBLEM ======= When `dmu_tx_assign` is called from `zil_lwb_write_issue`, it's possible for either `ERESTART` or `EIO` to be returned. If `ERESTART` is returned, this will cause an assertion to fail directly in `zil_lwb_write_issue`, where the code assumes the return value is `EIO` if `dmu_tx_assign` returns a non-zero value. This can occur if the SPA is suspended when `dmu_tx_assign` is called, and most often occurs when running `zloop`. If `EIO` is returned, this can cause assertions to fail elsewhere in the ZIL code. For example, `zil_commit_waiter_timeout` contains the following logic: lwb_t *nlwb = zil_lwb_write_issue(zilog, lwb); ASSERT3S(lwb->lwb_state, !=, LWB_STATE_OPENED); In this case, if `dmu_tx_assign` returned `EIO` from within `zil_lwb_write_issue`, the `lwb` variable passed in will not be issued to disk. Thus, it's `lwb_state` field will remain `LWB_STATE_OPENED` and this assertion will fail. `zil_commit_waiter_timeout` assumes that after it calls `zil_lwb_write_issue`, the `lwb` will be issued to disk, and doesn't handle the case where this is not true; i.e. it doesn't handle the case where `dmu_tx_assign` returns `EIO`. SOLUTION ======== This change modifies the `dmu_tx_assign` function such that `txg_how` is a bitmask, rather than of the `txg_how_t` enum type. Now, the previous `TXG_WAITED` semantics can be used via `TXG_NOTHROTTLE`, along with specifying either `TXG_NOWAIT` or `TXG_WAIT` semantics. Previously, when `TXG_WAITED` was specified, `TXG_NOWAIT` semantics was automatically invoked. This was not ideal when using `TXG_WAITED` within `zil_lwb_write_issued`, leading the problem described above. Rather, we want to achieve the semantics of `TXG_WAIT`, while also preventing the `tx` from being penalized via the dirty delay throttling. With this change, `zil_lwb_write_issued` can acheive the semtantics that it requires by passing in the value `TXG_WAIT \| TXG_NOTHROTTLE` to `dmu_tx_assign`. Further, consumers of `dmu_tx_assign` wishing to achieve the old `TXG_WAITED` semantics can pass in the value `TXG_NOWAIT \| TXG_NOTHROTTLE`. Authored by: Prakash Surya <prakash.surya@delphix.com> Approved by: Robert Mustacchi <rm@joyent.com> Reviewed by: Matt Ahrens <mahrens@delphix.com> Reviewed by: Andriy Gapon <avg@FreeBSD.org> Ported-by: Brian Behlendorf <behlendorf1@llnl.gov> Porting Notes: - Additionally updated `zfs_tmpfile` to use `TXG_NOTHROTTLE` OpenZFS-issue: https://www.illumos.org/issues/8997 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/19ea6cb0f9 Closes #7084	2018-07-06 02:46:51 -07:00
Brian Behlendorf	a2f759146d	Linux compat 4.18: check_disk_size_change() Added support for the bops->check_events() interface which was added in the 2.6.38 kernel to replace bops->media_changed(). Fully implementing this functionality allows the volume resize code to rely on revalidate_disk(), which is the preferred mechanism, and removes the need to use check_disk_size_change(). In order for bops->check_events() to lookup the zvol_state_t stored in the disk->private_data the zvol_state_lock needs to be held. Since the check events interface may poll the mutex has been converted to a rwlock for better concurrently. The rwlock need only be taken as a writer in the zvol_free() path when disk->private_data is set to NULL. The configure checks for the block_device_operations structure were consolidated in a single kernel-block-device-operations.m4 file. The ZFS_AC_KERNEL_BDEV_BLOCK_DEVICE_OPERATIONS configure checks and assoicated dead code was removed. This interface was added to the 2.6.28 kernel which predates the oldest supported 2.6.32 kernel and will therefore always be available. Updated maximum Linux version in META file. The 4.17 kernel was released on 2018-06-03 and ZoL is compatible with the finalized kernel. Reviewed-by: Boris Protopopov <boris.protopopov@actifio.com> Reviewed-by: Sara Hartse <sara.hartse@delphix.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #7611	2018-07-06 02:46:51 -07:00
Brian Behlendorf	f79c0de208	Linux 4.18 compat: inode timespec -> timespec64 Commit torvalds/linux@95582b0 changes the inode i_atime, i_mtime, and i_ctime members form timespec's to timespec64's to make them 2038 safe. As part of this change the current_time() function was also updated to return the timespec64 type. Resolve this issue by introducing a new inode_timespec_t type which is defined to match the timespec type used by the inode. It should be used when working with inode timestamps to ensure matching types. The timestruc_t type under Illumos was used in a similar fashion but was specified to always be a timespec_t. Rather than incorrectly define this type all timespec_t types have been replaced by the new inode_timespec_t type. Finally, the kernel and user space 'sys/time.h' headers were aligned with each other. They define as appropriate for the context several constants as macros and include static inline implementation of gethrestime(), gethrestime_sec(), and gethrtime(). Reviewed-by: Chunwei Chen <tuxoko@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #7643 Backported-by: Richard Yao <ryao@gentoo.org>	2018-07-06 02:46:51 -07:00
Boris Protopopov	1667816089	zv_suspend_lock in zvol_open()/zvol_release() Acquire zv_suspend_lock on first open and last close only. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Boris Protopopov <boris.protopopov@actifio.com> Closes #6342	2018-07-06 02:46:51 -07:00
Tony Hutter	d1ed1be3cd	Tag zfs-0.7.9 META file and changelog updated. Signed-off-by: Tony Hutter <hutter2@llnl.gov>	2018-05-08 13:33:38 -07:00
Tony Hutter	e749242a99	Remove DEBUG_STACKFLAGS to bypass compiler error 'Support -fsanitize=address with --enable-asan' (`fed9035`) removed DEBUG_STACKFLAGS="-fstack-check" from zfs-build.m4 in master. However, that's too heavyweight a patch to merge in to the 0.7.x branch, so just take the one-liner we need to get around a compiler error on Fedora 28: $ ./configure --enable-debug --enable-debuginfo && make pkg-utils CC gethrtime.lo cc1: error: '-fstack-check=' and '-fstack-clash_protection' are mutually exclusive. Disabling '-fstack-check=' [-Werror] Signed-off-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Requires-spl: #701	2018-05-07 17:19:58 -07:00
Tony Hutter	9267ef84fd	Fedora 28: Add BuildRequires: libtirpc-devel Add "BuildRequires: libtirpc-devel" to fix mock builds on Fedora 28. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tony Hutter <hutter2@llnl.gov> Closes #7494 Closes #7495	2018-05-07 17:19:57 -07:00
Brian Behlendorf	0ee129199f	RHEL 7.5 compat: FMODE_KABI_ITERATE As of RHEL 7.5 the mainline fops.iterate() method was added to the file_operations structure and is correctly detected by the configure script. Normally this is what we want, but in order to maintain KABI compatibility the RHEL change additionally does the following: * Requires that callers intending to use this extended interface set the FMODE_KABI_ITERATE flag on the file structure when opening the directory. * Adds the fops.iterate() method to the end of the structure, without removing fops.readdir(). This change updates the configure check to ignore the RHEL 7.5+ variant of fops.iterate() when detected. Instead fallback to the fops.readdir() interface which will be available. Finally, add the 'zpl_' prefix to the directory context wrappers to avoid colliding with the kernel provided symbols when both the fops.iterate() and fops.readdir() are provided by the kernel. Reviewed-by: Olaf Faaland <faaland1@llnl.gov> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #7460 Closes #7463	2018-05-07 17:19:57 -07:00
George Melikov	245be00597	Add back iostat -y or -w descriptions The iostat -y and -w descriptions were left in `cda0317e`, get them back. Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: George Melikov <mail@gmelikov.ru> Closes #7479 Closes #7483	2018-05-07 17:19:57 -07:00
Antonio Russo	c38d702330	Add test with two kinds of file creation orders Data loss was identified in #7401 when many small files were copied. This adds a reproducer for this bug and other similar ones: randomly generate N files. Then, listing M of them by `ls -U` order, produce those same files in a directory of the same name. This triggers the bug consistently, provided N and M are large enough. Here, N=2^16 and M=2^13. Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Antonio Russo <antonio.e.russo@gmail.com> Closes #7411	2018-05-07 17:19:57 -07:00
Seth Forshee	3f729907c8	Allow mounting datasets more than once Currently mounting an already mounted zfs dataset results in an error, whereas it is typically allowed with other filesystems. This causes some bad interactions with mount namespaces. Take this sequence for example: - Create a dataset - Create a snapshot of the dataset - Create a clone of the snapshot - Create a new mount namespace - Rename the original dataset The rename results in unmounting and remounting the clone in the original mount namespace, however the remount fails because the dataset is still mounted in the new mount namespace. (Note that this means the mount in the new mount namespace is never being unmounted, so perhaps the unmount/remount of the clone isn't actually necessary.) The problem here is a result of the way mounting is implemented in the kernel module. Since it is not mounting block devices it uses mount_nodev() instead of the usual mount_bdev(). However, mount_nodev() is written for filesystems for which each mount is a new instance (i.e. a new super block), and zfs should be able to detect when a mount request can be satisfied using an existing super block. Change zpl_mount() to call sget() directly with it's own test callback. Passing the objset_t object as the fs data allows checking if a superblock already exists for the dataset, and in that case we just need to return a new reference for the sb's root dentry. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tom Caputi <tcaputi@datto.com> Signed-off-by: Alek Pinchuk <apinchuk@datto.com> Signed-off-by: Seth Forshee <seth.forshee@canonical.com> Closes #5796 Closes #7207	2018-05-07 17:19:57 -07:00
beren12	cca220d7c6	Fix zfs_arc_max minimum tuning When setting `zfs_arc_max` its minimum value is allowed to be 64 MiB. There was an off-by-1 error which can matter on tiny systems. Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Chris Zubrzycki <github@mid-earth.net> Closes #7417	2018-05-07 17:19:57 -07:00
Brian Behlendorf	4ed30958ce	Linux compat 4.16: blk_queue_flag_{set,clear} The HAVE_BLK_QUEUE_WRITE_CACHE_GPL_ONLY case was overlooked in the original `10f88c5c` commit because blk_queue_write_cache() was available for the in-kernel builds. Update the blk_queue_flag_{set,clear} wrappers to call the locked versions to avoid confusion. This is safe for all existing callers. The blk_queue_set_write_cache() function has been updated to use these wrappers. This means setting/clearing both QUEUE_FLAG_WC and QUEUE_FLAG_FUA is no longer atomic but this only done early in zvol_alloc() prior to any requests so there is no issue. Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Reviewed-by: Kash Pande <kash@tripleback.net> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #7428 Closes #7431	2018-05-07 17:19:57 -07:00
Giuseppe Di Natale	2f118072cb	Linux compat 4.16: blk_queue_flag_{set,clear} queue_flag_{set,clear}_unlocked are now private interfaces in the Linux kernel (https://github.com/torvalds/linux/commit/8a0ac14). Use blk_queue_flag_{set,clear} interfaces which were introduced as of https://github.com/torvalds/linux/commit/8814ce8. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Giuseppe Di Natale <dinatale2@llnl.gov> Closes #7410	2018-05-07 17:19:57 -07:00

1 2 3 4 5 ...

3238 Commits