mirror_zfs

mirror of https://git.proxmox.com/git/mirror_zfs.git synced 2026-05-26 12:12:13 +03:00

Author	SHA1	Message	Date
Antonio Russo	55d80e651a	systemd mount generator and tracking ZEDLET zfs-mount-generator implements the "systemd generator" protocol, producing systemd.mount units from the cached outputs of zfs list, during early boot, integrating with systemd. Each pool has an indpendent cache of the command zfs list -H -oname,mountpoint,canmount -tfilesystem -r $pool which is kept synchronized by the ZEDLET history_event-zfs-list-cacher.sh Datasets not in the cache will be loaded later in the boot process by zfs-mount.service, including pools without a cache. Among other things, this allows for complex mount hierarchies. Reviewed-by: Fabian Grünbichler <f.gruenbichler@proxmox.com> Reviewed-by: Richard Laager <rlaager@wiktel.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Antonio Russo <antonio.e.russo@gmail.com> Closes #7329	2018-04-06 14:11:09 -07:00
Matthew Ahrens	5c27ec1088	Fixes for SNPRINTF_BLKPTR with encrypted BP's mdb doesn't have dmu_ot[], so we need a different mechanism for its SNPRINTF_BLKPTR() to determine if the BP is encrypted vs authenticated. Additionally, since it already relies on BP_IS_ENCRYPTED (etc), SNPRINTF_BLKPTR might as well figure out the "crypt_type" on its own, rather than making the caller do so. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tom Caputi <tcaputi@datto.com> Signed-off-by: Matthew Ahrens <mahrens@delphix.com> Closes #7390	2018-04-06 13:30:26 -07:00
Olaf Faaland	0ba106e75c	Fix divide-by-zero in mmp_delay_update() vdev_count_leaves() in the denominator may return 0, caught by Coverity. Introduced by * `533ea04` Update mmp_delay on sync or skipped, failed write Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Olaf Faaland <faaland1@llnl.gov> Closes #7391	2018-04-06 13:29:11 -07:00
Tom Caputi	1bf9a552bb	Make encrypted "zfs mount -a" failures consistent Currently, "zfs mount -a" will print a warning and fail to mount any encrypted datasets that do not have a key loaded. This patch makes the behavior of this failure consistent with other failure modes ("zfs mount -a" will silently continue, explict "zfs mount" will print a message and return an error code. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tom Caputi <tcaputi@datto.com> Closes #7382	2018-04-06 13:28:15 -07:00
Olaf Faaland	533ea0415b	Update mmp_delay on sync or skipped, failed write When an MMP write is skipped, or fails, and time since mts->mmp_last_write is already greater than mts->mmp_delay, increase mts->mmp_delay. The original code only updated mts->mmp_delay when a write succeeded, but this results in the write(s) after delays and failed write(s) reporting an ub_mmp_delay which is too low. Update mmp_last_write and mmp_delay if a txg sync was successful. At least one uberblock was written, thus extending the time we can be sure the pool will not be imported by another host. Do not allow mmp_delay to go below (MSEC2NSEC(zfs_multihost_interval) / vdev_count_leaves()) so that a period of frequent successful MMP writes, e.g. due to frequent txg syncs, does not result in an import activity check so short it is not reliable based on mmp thread writes alone. Remove unnecessary local variable, start. We do not use the start time of the loop iteration. Add a debug message in spa_activity_check() to allow verification of the import_delay value and to prove the activity check occurred. Alter the tests that import pools and attempt to detect an activity check. Calculate the expected duration of spa_activity_check() based on module parameters at the time the import is performed, rather than a fixed time set in mmp.cfg. The fixed time may be wrong. Also, use the default zfs_multihost_interval value so the activity check is longer and easier to recognize. Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Signed-off-by: Olaf Faaland <faaland1@llnl.gov> Closes #7330	2018-04-04 16:38:44 -07:00
Tony Hutter	21a4f5cc86	Fedora 28: Fix misc bounds check compiler warnings Fix a bunch of (mostly) sprintf/snprintf truncation compiler warnings that show up on Fedora 28 (GCC 8.0.1). Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tony Hutter <hutter2@llnl.gov> Closes #7361 Closes #7368	2018-04-04 10:16:47 -07:00
Brian Behlendorf	581bc01a07	Remove sysevents These headers are provided in the ZFS repository and never used by the SPL. Remove them to ensure the right ones are included. Reviewed-by: Tim Chase <tim@chase2k.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #696	2018-04-04 09:54:20 -07:00
LOLi	1724eb62de	Fix spa reference leak in zfs_ioc_pool_scan zfs_ioc_pool_scan leaks a spa reference when zc->zc_flags is not a valid pool_scrub_cmd_t: this could happen if the userland binaries and ZFS kernel module differ in version and would prevent the pool from being exported. Reviewed by: Matt Ahrens <matt@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Signed-off-by: loli10K <ezomori.nozomu@gmail.com> Closes #7380	2018-04-03 17:31:30 -07:00
LOLi	f119d00c1f	Fix add_nested_replacing_spare test case Use 'zpool reopen' instead of 'zpool scrub' to kick in the spare device: this is required to avoid spurious failures caused by a race condition in events processing by the ZFS Event Daemon: P1 (zpool scrub) P2 (zed) --- zfs_ioc_pool_scan() -> dsl_scan() -> vdev_reopen() -> vdev_set_state(VDEV_STATE_CANT_OPEN) zfs_ioc_vdev_attach() -> spa_vdev_attach() -> dsl_resilver_restart() -> dsl_sync_task() -> dsl_scan_setup_check() <- dsl_scan_setup_check(): EBUSY Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: loli10K <ezomori.nozomu@gmail.com> Closes #7247 Closes #7342	2018-04-03 17:30:14 -07:00
Tim Chase	10adee27ce	Remove ASSERT() in l2arc_apply_transforms() The ASSERT was erroneously copied from the next section of code. The buffer's size should be expanded from "psize" to "asize" if necessary. Reviewed-by: Tom Caputi <tcaputi@datto.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tim Chase <tim@chase2k.com> Closes #7375	2018-03-31 15:14:21 -07:00
Tom Caputi	a2c2ed1bd4	Decryption error handling improvements Currently, the decryption and block authentication code in the ZIO / ARC layers is a bit inconsistent with regards to the ereports that are produces and the error codes that are passed to calling functions. This patch ensures that all of these errors (which begin as ECKSUM) are converted to EIO before they leave the ZIO or ARC layer and that ereports are correctly generated on each decryption / authentication failure. In addition, this patch fixes a bug in zio_decrypt() where ECKSUM never gets written to zio->io_error. Reviewed by: Matt Ahrens <matt@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tom Caputi <tcaputi@datto.com> Closes #7372	2018-03-31 11:12:51 -07:00
Tom Caputi	4515b1d01c	Encrypted dnode blocks should be prefetched raw Encrypted dnode blocks are always initially read as raw data and converted to decrypted data when an encrypted bonus buffer is needed. This allows the DMU to be used for things like fetching the DMU master node without requiring keys to be loaded. However, dbuf_issue_final_prefetch() does not currently read the data as raw. The end result of this is that prefetched dnode blocks are read twice from disk: once decrypted and then again as raw data. This patch corrects the issue by adding the flag when appropriate. Reviewed by: Matt Ahrens <matt@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tom Caputi <tcaputi@datto.com> Closes #7362	2018-03-31 11:11:48 -07:00
LOLi	77d8a0f1a4	Fix hung z_zvol tasks during 'zfs receive' During a receive operation zvol_create_minors_impl() can wait needlessly for the prefetch thread because both share the same tasks queue. This results in hung tasks: <3>INFO: task z_zvol:5541 blocked for more than 120 seconds. <3> Tainted: P O 3.16.0-4-amd64 <3>"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. The first z_zvol:5541 (zvol_task_cb) is waiting for the long running traverse_prefetch_thread:260 root@linux:~# cat /proc/spl/taskq taskq act nthr spwn maxt pri mina spl_system_taskq/0 1 2 0 64 100 1 active: [260]traverse_prefetch_thread [zfs](0xffff88003347ae40) wait: 5541 spl_delay_taskq/0 0 1 0 4 100 1 delay: spa_deadman [zfs](0xffff880039924000) z_zvol/1 1 1 0 1 120 1 active: [5541]zvol_task_cb [zfs](0xffff88001fde6400) pend: zvol_task_cb [zfs](0xffff88001fde6800) This change adds a dedicated, per-pool, prefetch taskq to prevent the traverse code from monopolizing the global (and limited) system_taskq by inappropriately scheduling long running tasks on it. Reviewed-by: Albert Lee <trisk@forkgnu.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: loli10K <ezomori.nozomu@gmail.com> Closes #6330 Closes #6890 Closes #7343	2018-03-30 12:10:01 -07:00
Georgy Yakovlev	2f291ebaed	zfs-functions: skip lines where mntpnt is 'none' This fixes zfs-mount initscript trying to mount swap volumes as filesystems or anything that has 'none' as a mountpoint in /etc/fstab. Additionally, fixes trying to mount swap volumes as a filesystem on RHEL. RHEL defines mountpoint for swap as `swap`. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Georgy Yakovlev <ya@sysdump.net> Closes #7346 Closes #7347	2018-03-30 12:05:24 -07:00
Andriy Gapon	5e00213e43	OpenZFS 9164 - assert: newds == os->os_dsl_dataset Authored by: Andriy Gapon <avg@FreeBSD.org> Reviewed by: Matt Ahrens <mahrens@delphix.com> Reviewed by: Don Brady <don.brady@delphix.com> Reviewed-by: loli10K <ezomori.nozomu@gmail.com> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Approved by: Richard Lowe <richlowe@richlowe.net> Ported-by: Giuseppe Di Natale <dinatale2@llnl.gov> Porting Notes: * Re-enabled and tweaked the zpool_upgrade_007_pos test case to successfully run in under 5 minutes. OpenZFS-issue: https://www.illumos.org/issues/9164 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/0e776dc06a Closes #6112 Closes #7336	2018-03-30 12:00:40 -07:00
Don Brady	99f505a4d7	Add support for nvme based devids Adds a devid for nvme devices. This is very similar to how the other 'bus' (scsi\|sata\|usb) devids are generated. The devid resides in a name/value pair in the leaf vdevs in a zpool config. Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Don Brady <don.brady@delphix.com> Closes #7356	2018-03-29 17:43:25 -07:00
Tom Caputi	32dce2da0c	Resolve QAT issues with incompressible data Currently, when ZFS wants to accelerate compression with QAT, it passes a destination buffer of the same size as the source buffer. Unfortunately, if the data is incompressible, QAT can actually "compress" the data to be larger than the source buffer. When this happens, the QAT driver will return a FAILED error code and print warnings to dmesg. This patch fixes these issues by providing the QAT driver with an additional buffer to work with so that even completely incompressible source data will not cause an overflow. This patch also resolves an error handling issue where incompressible data attempts compression twice: once by QAT and once in software. To fix this issue, a new (and fake) error code CPA_STATUS_INOMPRESSIBLE has been added so that the calling code can correctly account for the difference between a hardware failure and data that simply cannot be compressed. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Weigang Li <weigang.li@intel.com> Signed-off-by: Tom Caputi <tcaputi@datto.com> Closes #7338	2018-03-29 17:40:34 -07:00
Tom Caputi	13a2ff2727	Fix ASSERT in dsl_scan_fini() and cleanup comments This patch fixes an issue where dsl_scan_prefetch_cb() might add more prefetch I/Os to the prefetch queue after prefetching has been completed. This was happening because that code was checking scn->scn_suspending instead of scn->scn_prefetch_stop. This occasionally triggered an ASSERT during ztest runs in dsl_scan_fini() when the code attempted to destroy an AVL tree that still had entires in it. This patch also includes a number of spelling corrections and comment cleanups throughout dsl_scan.c Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tom Caputi <tcaputi@datto.com> Closes #7353	2018-03-28 18:30:44 -07:00
Tony Hutter	d2812de6f7	chmod -x on etc/init.d/zfs-*.in automake files Clear executable bit on zfs-import.in, zfs-mount.in, zfs-share.in, and zfs-zed.in. These are automake files and should not be marked executable. This fixes a RPM build error on Fedora 28. Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tony Hutter <hutter2@llnl.gov> Closes #7355 Closes #7327	2018-03-28 12:08:21 -07:00
Brian Behlendorf	b2ab468dde	Fix mmap / libaio deadlock Calling uiomove() in mappedread() under the page lock can result in a deadlock if the user space page needs to be faulted in. Resolve the issue by dropping the page lock before the uiomove(). The inode range lock protects against concurrent updates via zfs_read() and zfs_write(). Reviewed-by: Albert Lee <trisk@forkgnu.org> Reviewed-by: Chunwei Chen <david.chen@nutanix.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #7335 Closes #7339	2018-03-28 10:19:22 -07:00
DeHackEd	668173b576	Remove libattr requirement RHEL/CentOS 6 supports sys/xattr.h eliminating the need for libattr-devel as a dependency. Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: DHE <git@dehacked.net> Closes #7344 Closes #7351	2018-03-27 16:51:32 -07:00
Allan Jude	5152a74088	OpenZFS 9321 - arc_loan_compressed_buf() can increment arc_loaned_bytes by the wrong value Authored by: Allan Jude <allanjude@freebsd.org> Reviewed by: Matt Ahrens <matt@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Approved by: Garrett D'Amore <garrett@damore.org> Ported-by: Giuseppe Di Natale <dinatale2@llnl.gov> OpenZFS-issue: https://www.illumos.org/issues/9321 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/92b05f3a18 Closes #7333	2018-03-26 20:40:15 -07:00
Tony Hutter	9ea6c3d39d	Fedora 28: Fix "Macro %_dracutdir has empty body" If you run ./configure --with-config=srpm, it will not trigger the user m4 scripts to populate the dracut and udev directories. This causes a build error on Fedora 28. Make the dracut and udev lines conditional to get around this. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tony Hutter <hutter2@llnl.gov> Closes #7326 Closes #7328	2018-03-25 15:00:47 -07:00
Tom Caputi	157ef7f6a5	Don't count embedded bps in read stats Currently, ZFS tracks statistics about calls to arc_read() via the /proc/spl/kstat/zfs/<pool>/reads file for debugging. Unfortunately, this file currently counts embedded bps as disk reads since they are technically processed by the ZIO layer. This pollutes the log since the ARC will never cache embedded bps. This patch corrects this issue by preventing the logging of embedded bp reads. Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tom Caputi <tcaputi@datto.com> Closes #7334	2018-03-23 21:35:19 -07:00
Paul Dagnelie	387b6856d6	OpenZFS 9193 - bootcfg -C doesn't work When given an empty string as a rootds value, bootcfg -C fails with the error message 'could not set nextboot: '' is an invalid name'. This should be allowed because it represents clearing the nextboot configuration. Authored by: Paul Dagnelie <pcd@delphix.com> Reviewed by: Chris Williamson <chris.williamson@delphix.com> Reviewed by: Sebastien Roy <sebastien.roy@delphix.com> Reviewed by: Igor Kozhukhov <igor@dilos.org> Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Approved by: Robert Mustacchi <rm@joyent.com> Ported-by: Giuseppe Di Natale <dinatale2@llnl.gov> OpenZFS-issue: https://www.illumos.org/issues/9193 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/504645d227 Closes #7230	2018-03-22 16:16:55 -07:00
Peter Ashford	910f3ce739	Clarify zpool actions for an intent log device Updated the "Intent Log" section of the "zpool" manual page to properly reflect the actions that may be performed. Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Peter Ashford <ashford@accs.com> Closes #6938 Closes #7318	2018-03-22 15:12:08 -07:00
kpande	05747eca5b	modprobe zfs during dracut mount Resolves importing root pool during boot in dracut. This case was inadvertently broken with the module autoloading change in #7287. Reviewed-by: Matthew Thode <prometheanfire@gentoo.org> Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Signed-off-by: Kash Pande <kash@tripleback.net> Closes #7322	2018-03-22 10:14:29 -07:00
Matthew Ahrens	2fd92c3d6c	enable zfs_dbgmsg() by default, without dprintf() zfs_dbgmsg() should record a message by default. As a general principal, these messages shouldn't be too verbose. Furthermore, the amount of memory used is limited to 4MB (by default). dprintf() should only record a message if this is a debug build, and ZFS_DEBUG_DPRINTF is set in zfs_flags. This flag is not set by default (even on debug builds). These messages are extremely verbose, and sometimes nontrivial to compute. SET_ERROR() should only record a message if ZFS_DEBUG_SET_ERROR is set in zfs_flags. This flag is not set by default (even on debug builds). This brings our behavior in line with illumos. Note that the message format is unchanged (including file, line, and function, even though these are not recorded on illumos). Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Prakash Surya <prakash.surya@delphix.com> Signed-off-by: Matthew Ahrens <mahrens@delphix.com> Closes #7314	2018-03-21 15:37:32 -07:00
Tom Caputi	8d9e7c8fbe	Fix spelling errors in comments This patch simply corrects some spelling / grammar errors in the QAT and encryption code comments. No functional changes Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Signed-off-by: Tom Caputi <tcaputi@datto.com> Closes #7319	2018-03-21 08:42:13 -07:00
timor	c66e54e9dc	Add support for nvme disk detection This treats /dev/nvme.. devices the same way as /dev/sd... devices. The motivation behind this is that whole disk detection did not work on nvme SSDs without that, because it DKC_UNKNOWN was returned for such devices. Perhaps there should be a separate DKC_ type for this, but I don't know enough about the code to know the implications of that. Reviewed-by: Don Brady <don.brady@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: timor <timor.dd@googlemail.com> Closes #7304	2018-03-21 08:35:20 -07:00
Tom Caputi	089fbf313c	Add comments for portable dnode / objset flags This patch adds some comments describing the purpose of "portable" dnode and objset flags so that it is clear when new flags should be added to the repective flag masks. This patch includes no functional changes. Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tom Caputi <tcaputi@datto.com> Closes #7313	2018-03-20 11:55:21 -07:00
Alek P	272b5d730f	Add JSON output support to channel programs The changes piggyback JSON output support on top of channel programs (#6558). This way the JSON output support is targeted to scripting use cases and is easily maintainable since it really only touches one function (zfs_do_channel_program()). This patch ports Joyent's JSON nvlist library from illumos to enable easy JSON printing of channel program output nvlist. To keep the delta small I also took advantage of the fact that printing in zfs_do_channel_program() was almost always done before exiting the program. Reviewed by: Matt Ahrens <mahrens@delphix.com> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Alek Pinchuk <apinchuk@datto.com> Closes #7281	2018-03-19 12:40:58 -07:00
Brian Behlendorf	a76f3d0437	Fix deadlock in IO pipeline In vdev_queue_aggregate() the zio_execute() bypass should not be called under the vdev queue lock. This can result in a deadlock as shown in the stack traces below. Drop the vdev queue lock then walk the parents of the aggregate IO to determine the list of component IOs to be bypassed. This can be done safely without holding the io_lock since the new aggregate IO has not yet been returned and its parents cannot change. --- THREAD 1 --- arc_read() zio_nowait() zio_vdev_io_start() vdev_queue_io() <--- mutex_enter(vq->vq_lock) vdev_queue_io_to_issue() vdev_queue_aggregate() zio_execute() zio_vdev_io_assess() zio_wait_for_children() <- mutex_enter(zio->io_lock) --- THREAD 2 --- (inverse order) arc_read() zio_change_priority() <- mutex_enter(zio->zio_lock) vdev_queue_change_io_priority() <- mutex_enter(vq->vq_lock) Reviewed-by: Tom Caputi <tcaputi@datto.com> Reviewed-by: Don Brady <don.brady@delphix.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #7307	2018-03-16 16:46:06 -07:00
Tom Caputi	17dd88352e	Allow QAT to handle non page-aligned buffers This patch adds some handling to the QAT acceleration functions that allows them to handle buffers that are not aligned with the page cache. At the moment this never happens since callers only happen to work with page-aligned buffers, but this code should prevent headaches if this isn't always true in the future. This patch also adds some cleanups to align the QAT compression code with the encryption and checksumming code. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Weigang Li <weigang.li@intel.com> Signed-off-by: Tom Caputi <tcaputi@datto.com> Closes #7305	2018-03-16 16:02:49 -07:00
Olaf Faaland	cec3a0a1bb	Report pool suspended due to MMP When the pool is suspended, record whether it was due to an I/O error or due to MMP writes failing to succeed within the required time. Change spa_suspended from uint8_t to zio_suspend_reason_t to store the reason. When userspace queries pool status via spa_tryimport(), report the reason the pool was suspended in a new key, ZPOOL_CONFIG_SUSPENDED_REASON. In libzfs, when interpreting the returned config nvlist, report suspension due to MMP with a new pool status enum value, ZPOOL_STATUS_IO_FAILURE_MMP. In status_callback(), which generates and emits the message when 'zpool status' is executed, add a case to print an appropriate message for the new pool status enum value. Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Olaf Faaland <faaland1@llnl.gov> Closes #7296	2018-03-15 10:56:55 -07:00
Tom Caputi	3874220932	SHA256 QAT acceleration This patch enables acceleration of SHA256 checksums using Intel Quick Assist Technology. This patch also fixes up and refactors some of the code from QAT encryption to make the behavior consistent. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Chengfeix Zhu <chengfeix.zhu@intel.com> Signed-off-by: Weigang Li <weigang.li@intel.com> Signed-off-by: Tom Caputi <tcaputi@datto.com> Closes #7295	2018-03-15 10:53:58 -07:00
Stephen Blinick	8a2a9db8df	OpenZFS 9076 - Adjust perf test concurrency settings ZFS Performance test concurrency should be lowered for better latency Work by Stephen Blinick. Nightly performance runs typically consist of two levels of concurrency; and both are fairly high. Since the IO runs are to a ZFS filesystem, within a zpool, which is based on some variable number of vdev's, the amount of IO driven to each device is variable. Additionally, different device types (HDD vs SSD, etc) can generally handle a different amount of concurrent IO before saturating. Nevertheless, in practice, it appears that most tests are well past the concurrency saturation point and therefore both perform with the same throughput, the maximum of the device. Because the queuedepth to the device(s) is so high however, the latency is much higher than the best possible at that throughput, and increases linearly with the increase in concurrency. This means that changes in code that impact latency during normal operation (before saturation) may not be apparent when a large component of the measured latency is from the IO sitting in a queue to be serviced. Therefore, changing the concurrency settings is recommended Authored by: Stephen Blinick <stephen.blinick@delphix.com> Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed by: Dan Kimmel <dan.kimmel@delphix.com> Reviewed by: John Wren Kennedy <jwk404@gmail.com> Ported-by: John Wren Kennedy <jwk404@gmail.com> OpenZFS-issue: https://www.illumos.org/issues/9076 OpenZFS-commit: https://github.com/openzfs/openzfs/pull/562 Upstream bug: DLPX-45477 Closes #7302	2018-03-15 10:51:00 -07:00
Paul Zuchowski	1a2342784a	receive_spill does not byte swap spill contents In zfs receive, the function receive_spill should account for spill block endian conversion as a defensive measure. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tom Caputi <tcaputi@datto.com> Signed-off-by: Paul Zuchowski <pzuchowski@datto.com> Closes #7300	2018-03-15 10:29:51 -07:00
Brian Behlendorf	de4f8d5d26	OpenZFS 9188 - increase size of dbuf cache to reduce indirect block decompression With compressed ARC (bug 6950) we use up to 25% of our CPU to decompress indirect blocks, under a workload of random cached reads. To reduce this decompression cost, we would like to increase the size of the dbuf cache so that more indirect blocks can be stored uncompressed. If we are caching entire large files of recordsize=8K, the indirect blocks use 1/64th as much memory as the data blocks (assuming they have the same compression ratio). We suggest making the dbuf cache be 1/32nd of all memory, so that in this scenario we should be able to keep all the indirect blocks decompressed in the dbuf cache. (We want it to be more than the 1/64th that the indirect blocks would use because we need to cache other stuff in the dbuf cache as well.) In real world workloads, this won't help as dramatically as the example above, but we think it's still worth it because the risk of decreasing performance is low. The potential negative performance impact is that we will be slightly reducing the size of the ARC (by ~3%). Porting Notes: * Added modules options to zfs-module-parameters.5 man page. * Preserved scaling based on target ARC size rather than max ARC size. Authored by: George Wilson <george.wilson@delphix.com> Reviewed by: Dan Kimmel <dan.kimmel@delphix.com> Reviewed by: Prashanth Sreenivasa <pks@delphix.com> Reviewed by: Paul Dagnelie <pcd@delphix.com> Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com> Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Ported-by: Brian Behlendorf <behlendorf1@llnl.gov> OpenZFS-issue: https://www.illumos.org/issues/9188 OpenZFS-commit: https://github.com/openzfs/openzfs/pull/564 Upstream bug: DLPX-46942 Closes #7273	2018-03-13 10:52:48 -07:00
Brian Behlendorf	a6cc97566c	Add kernel module auto-loading Historically a dynamic misc minor number was registered for the /dev/zfs device in order to prevent minor number collisions. This was fine but it prevented us from being able to use the kernel module auto-loaded which requires a known reserved value. Resolve this issue by adding a configure test to find an available misc minor number which can then be used in MODULE_ALIAS_MISCDEV at build time. By adding this alias the zfs kmod is added to the list of known static-nodes and the systemd-tmpfiles-setup-dev service will create a /dev/zfs character device at boot time. This in turn allows us to update the 90-zfs.rules file to make it aware this is a static node. The upshot of this is that whenever a process (zpool, zfs, zed) opens the /dev/zfs the kmods will be automatic loaded. This even works for unprivileged users so there is no longer a need to manually load the modules at boot time. As an additional bonus the zed now no longer needs to start after the zfs-import.service since it will trigger the module load. In the unlikely event the minor number we selected conflicts with another out of tree unregistered minor number the code falls back to dynamically allocating it. In this case the modules again must be manually loaded. Note that due to the change in the method of registering the minor number the zimport.sh test case may incorrectly fail when the static node for the installed packages is created instead of the dynamic one. This issue will only transiently impact zimport.sh for this single commit when we transition and are mixing and matching methods. Reviewed-by: Fabian Grünbichler <f.gruenbichler@proxmox.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> TEST_ZIMPORT_SKIP="yes" Closes #7287	2018-03-13 10:45:55 -07:00
Tim Chase	02638a30ef	Add zfs_scan_ignore_errors tunable When it's set, a DTL range will be cleared even if its scan/scrub had errors. This allows to work around resilver/scrub upon import when the pool has errors. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tim Chase <tim@chase2k.com> Closes #7293	2018-03-13 10:43:14 -07:00
Paul Zuchowski	83362e8e67	Destroy makes full snap list before destroying Change zfs destroy logic so destroying begins before the entire list of snapshots is built. Reviewed-by: loli10K <ezomori.nozomu@gmail.com> Reviewed-by: Kash Pande <kash@tripleback.net> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Paul Zuchowski <pzuchowski@datto.com> Closes #7271	2018-03-12 15:24:08 -07:00
Chunwei Chen	9470cbd4f9	Fix race in trace point in zrl_add_impl We hit an illegal memory access in the zrlock trace point. The problem is that zrl->zr_owner and zrl->zr_caller are assigned locklessly. And if zrl->zr_owner got assigned a longer string between when __string() calculate the strlen, and when __assign_str() does strcpy. The copy will overflow the buffer. == For example: Initial condition: zrl->zr_owner = A zrl->zr_caller = "abc" Thread A Thread B ------------------------------------------------- if (zrl->zr_owner == A) { DTRACE_PROBE2() { __string() { strlen(zrl->zr_caller) -> 3 allocate buf[4] } zrl->zr_owner = B zrl->zr_caller = "abcd" __assign_str() { strcpy(buf, zrl->zr_caller) <- buffer overflow == Dereferencing zrl->zr_owner->pid may also be problematic, in that the zrl->zr_owner got changed to other task, and that task exits, freeing the task_struct. This should be very unlikely, as the other task need to zrl_remove and exit between the dereferencing zr->zr_owner and zr->zr_owner->pid. Nevertheless, we'll deal with it as well. To fix the zrl->zr_caller issue, instead of copy the string content, we just copy the pointer, this is safe because it always points to __func__, which is static. As for the zrl->zr_owner issue, we pass in curthread instead of using zrl->zr_owner. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Chunwei Chen <david.chen@nutanix.com> Closes #7291	2018-03-12 11:27:02 -07:00
Brian Behlendorf	b7eec00f9f	Fix MMP write frequency for large pools When a single pool contains more vdevs than the CONFIG_HZ for for the kernel the mmp thread will not delay properly. Switch to using cv_timedwait_sig_hires() to handle higher resolution delays. This issue was reported on Arch Linux where HZ defaults to only 100 and this could be fairly easily reproduced with a reasonably large pool. Most distribution kernels set CONFIG_HZ=250 or CONFIG_HZ=1000 and thus are unlikely to be impacted. Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Reviewed-by: Olaf Faaland <faaland1@llnl.gov> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #7205 Closes #7289	2018-03-12 11:26:05 -07:00
Olaf Faaland	743253df70	Hold SCL_VDEV when counting leaves A config lock should be held while vdev_count_leaves() walks the tree, otherwise the pointers reference may become invalid during the walk. SCL_VDEV is a minimal lock provided for such uses cases. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Olaf Faaland <faaland1@llnl.gov> Closes #7286	2018-03-09 15:42:54 -08:00
Olaf Faaland	ebed90a598	Handle zio_resume and mmp => off When multihost is disabled on a pool, and the pool is resumed via zpool clear, within a single cycle of the mmp thread's loop (e.g. while it's in the cv_timedwait call), both mmp_last_write and mmp_delay should be updated. The original code mistakenly treated the two cases as if they could not occur at the same time. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Olaf Faaland <faaland1@llnl.gov> Closes #7286	2018-03-09 15:42:11 -08:00
Tomohiro Kusumi	5ee220ba5c	Document allowed pool names PR #7208 was a patch to allow non-reserved pool names which begin with mirror, raidz, spare (but do not equal), however we'd rather document it in the man page for compatibility with other OpenZFS implementations, to avoid pool names that may not work on non-Linux platforms. Reviewed-by: Richard Laager <rlaager@wiktel.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@osnexus.com> Closes #7216	2018-03-09 14:04:15 -08:00
LOLi	c45c6d9212	Fix zfs-kmod builds when using rpm >= 4.14 With rpm-software-management/rpm@5e94633 a package version containing invalid characters (most commonly a double '-') causes the kmod package generation to terminate with an error. This change takes advantage of the newly introduced rpm macro "_wrong_version_format_terminate_build" to allow kmod packages to be built. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: loli10K <ezomori.nozomu@gmail.com> Closes #7284	2018-03-09 13:52:37 -08:00
LOLi	43983eb202	Fix spl-kmod builds when using rpm >= 4.14 With rpm-software-management/rpm@5e94633 a package version containing invalid characters (most commonly a double '-') causes the kmod package generation to terminate with an error. This change takes advantage of the newly introduced rpm macro "_wrong_version_format_terminate_build" to allow kmod packages to be built. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: loli10K <ezomori.nozomu@gmail.com> Closes #691	2018-03-09 13:51:31 -08:00
Tomohiro Kusumi	6b8655ad3f	Change functions which return literals to return `const char` get_format_prompt_string() and zpool_state_to_name() return a string literal which is read-only, thus they should return `const char`. zpool_get_prop_string() returns a non-const string after successful nv-lookup, and returns a string literal otherwise. Since this function is designed to be used for read-only purpose, the return type should also be `const char*`. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@osnexus.com> Closes #7285	2018-03-09 13:47:32 -08:00

1 2 3 4 5 ...

4576 Commits