Commit Graph

6399 Commits

Author SHA1 Message Date
melak
aa51adf0a2 Fix trivial typo in zfs-diff.8
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tamas TEVESZ <ice@extreme.hu>
Closes #11268 
Closes #11272
2020-12-23 13:09:32 -08:00
Alexander Motin
1d02bdee6c Fix for "Reduce latency effects of non-interactive I/O"
It was found that setting min_active tunables for non-interactive I/Os
makes them stuck.  It is caused by zfs_vdev_nia_delay, that can never
be reached if we never issue any I/Os due to min_active set to zero.

Fix this by issuing at least one non-interactive I/O at a time when
there are no interactive I/Os.  When there are interactive I/Os, zero
min_active allows to completely block any non-interactive I/O.  It may
min_active starvation in some scenarios, but who we are to deny foot
shooting?

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Closes #11261
2020-12-23 13:09:17 -08:00
Alexander Motin
2080c4f27e Reduce latency effects of non-interactive I/O
Investigating influence of scrub (especially sequential) on random read
latency I've noticed that on some HDDs single 4KB read may take up to 4
seconds!  Deeper investigation shown that many HDDs heavily prioritize
sequential reads even when those are submitted with queue depth of 1.

This patch addresses the latency from two sides:
 - by using _min_active queue depths for non-interactive requests while
   the interactive request(s) are active and few requests after;
 - by throttling it further if no interactive requests has completed
   while configured amount of non-interactive did.

While there, I've also modified vdev_queue_class_to_issue() to give
more chances to schedule at least _min_active requests to the lowest
priorities.  It should reduce starvation if several non-interactive
processes are running same time with some interactive and I think should
make possible setting of zfs_vdev_max_active to as low as 1.

I've benchmarked this change with 4KB random reads from ZVOL with 16KB
block size on newly written non-fragmented pool.  On fragmented pool I
also saw improvements, but not so dramatic.  Below are log2 histograms
of the random read latency in milliseconds for different devices:

4 2x mirror vdevs of SATA HDD WDC WD20EFRX-68EUZN0 before:
0, 0, 2,  1,  12,  21,  19,  18, 10, 15, 17, 21
after:
0, 0, 0, 24, 101, 195, 419, 250, 47,  4,  0,  0
, that means maximum latency reduction from 2s to 500ms.

4 2x mirror vdevs of SATA HDD WDC WD80EFZX-68UW8N0 before:
0, 0,  2,  31,  38,  28,  18,  12, 17, 20, 24, 10, 3
after:
0, 0, 55, 247, 455, 470, 412, 181, 36,  0,  0,  0, 0
, i.e. from 4s to 250ms.

1 SAS HDD SEAGATE ST14000NM0048 before:
0,  0,  29,   70, 107,   45,  27, 1, 0, 0, 1, 4, 19
after:
1, 29, 681, 1261, 676, 1633,  67, 1, 0, 0, 0, 0,  0
, i.e. from 4s to 125ms.

1 SAS SSD SEAGATE XS3840TE70014 before (microseconds):
0, 0, 0, 0, 0, 0, 0, 0,  70, 18343, 82548, 618
after:
0, 0, 0, 0, 0, 0, 0, 0, 283, 92351, 34844,  90

I've also measured scrub time during the test and on idle pools.  On
idle fragmented pool I've measured scrub getting few percent faster
due to use of QD3 instead of QD2 before.  On idle non-fragmented pool
I've measured no difference.  On busy non-fragmented pool I've measured
scrub time increase about 1.5-1.7x, while IOPS increase reached 5-9x.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #11166
2020-12-23 13:09:03 -08:00
qzdanis
49ba502f99 Add compatibility for busybox mktemp
Busybox's mktemp requires at least six X's in the template, causing
the current sed --in-place check to fail because the file does not
exist. This change adds additional X's to mktemp templates that do
not already have at least six X's in them.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Quentin Zdanis <zdanisq@gmail.com>
Closes #11269
2020-12-23 13:08:30 -08:00
Ryan Moeller
7735c9addf FreeBSD: notify userspace when a vdev is removed
This is needed for zfsd to autoreplace vdevs.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11260
2020-12-23 13:08:12 -08:00
Andrew Sun
ed02d603a1 Make zpool status "remove:" label print in bold
When ZFS_COLOR is set, zpool status shows row headings in bold,
except for the "remove:" heading. This is a quick fix that makes
it print in bold too.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Andrew Sun <me@andrewsun.com>
Closes #11255
2020-12-23 13:07:27 -08:00
George Melikov
529469769f CI: simplify checkstyle runner
Remove excess steps.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes #11262
2020-12-23 13:07:14 -08:00
Prawn
3b854534f0
ZED/zfs-list-cacher.sh: don't exit on ignored event type
Check for the history_event type instead.

The zfs-list-cacher.sh script currently respects the event types
excluded from syslog(!) in ZED_SYSLOG_SUBCLASS_EXCLUDE.
This makes little sense in this single-purpose script and
silently breaks when history_events are excluded from syslog,
which is the default since 13d65987a9.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: InsanePrawn <insane.prawny@gmail.com>
Closes #11164
Closes #11347
2020-12-19 18:01:21 -08:00
Brian Behlendorf
dcbf847493 Tag 2.0.0
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2020-11-30 10:13:14 -08:00
Brian Behlendorf
2757204434 Verify zfs module loaded before starting services
Extend the change made in ae12b02 to verify the zfs kernel
modules are loaded to the rest of the OpenZFS services.  If
the modules aren't loaded the neither the share, volume, or
and zed services can be started.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11243
2020-11-30 09:44:08 -08:00
Đoàn Trần Công Danh
24a6f83847 dracut: use /bin/sh instead of bash as the intepreter
Despite that dracut has a hard dependency on bash,
its modules doesn't, dracut only has a hard dependency on bash for
module-setup (on a fully usable machine). Inside initramfs, dracut
allows users choose from a list of handful other shells, e.g. bash,
busybox, dash, mkfsh.

In fact, my local machine's initramfs is being built with dash,
and it's functional for a very long time.

Before 64025fa3a (Silence 'make checkbashisms', 2020-08-20), we also
allows our users to have that right, too.

Let's fix the problem 'make checkbashisms' reported and allows our users
to have that right, again.

For 'plymouth' case, let's simply run the command inside the if instead
of checking for the existence of command before running it, because the
status is also failture if plymouth is unavailable.

While we're at it, let's remove an unnecessary fork for grep in
zfs-generator.sh.in and its following complicated 'if elif fi' with
a simple 'case ... esac'.

To support this change, also exclude 90zfs from "make checkbashisms"
because the current CI infrastructure ships an old version of
"checkbashisms", which complains about "command -v", while the current
latest "checkbashisms" thinks it's fine. In the near future, we can
revert that change to "Makefile.am" when CI infrastructure is updated.

Reviewed-by: Gabriel A. Devenyi <gdevenyi@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Đoàn Trần Công Danh <congdanhqx@gmail.com>
Closes #11244
2020-11-30 09:44:02 -08:00
Brian Behlendorf
2c36eb763f Revert "Reduce latency effects of non-interactive I/O"
Under certain conditions commit a3a4b8def appears to result in a
hang, or poor performance, when importing a pool.  Until the root
cause can be identified it has been reverted from the release branch.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #11245
2020-11-30 09:43:09 -08:00
Brian Behlendorf
a4ab0c607e Tag 2.0.0-rc7
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2020-11-25 09:18:29 -08:00
Alexander Motin
a3a4b8def7 Reduce latency effects of non-interactive I/O
Investigating influence of scrub (especially sequential) on random read
latency I've noticed that on some HDDs single 4KB read may take up to 4
seconds!  Deeper investigation shown that many HDDs heavily prioritize
sequential reads even when those are submitted with queue depth of 1.

This patch addresses the latency from two sides:
 - by using _min_active queue depths for non-interactive requests while
   the interactive request(s) are active and few requests after;
 - by throttling it further if no interactive requests has completed
   while configured amount of non-interactive did.

While there, I've also modified vdev_queue_class_to_issue() to give
more chances to schedule at least _min_active requests to the lowest
priorities.  It should reduce starvation if several non-interactive
processes are running same time with some interactive and I think should
make possible setting of zfs_vdev_max_active to as low as 1.

I've benchmarked this change with 4KB random reads from ZVOL with 16KB
block size on newly written non-fragmented pool.  On fragmented pool I
also saw improvements, but not so dramatic.  Below are log2 histograms
of the random read latency in milliseconds for different devices:

4 2x mirror vdevs of SATA HDD WDC WD20EFRX-68EUZN0 before:
0, 0, 2,  1,  12,  21,  19,  18, 10, 15, 17, 21
after:
0, 0, 0, 24, 101, 195, 419, 250, 47,  4,  0,  0
, that means maximum latency reduction from 2s to 500ms.

4 2x mirror vdevs of SATA HDD WDC WD80EFZX-68UW8N0 before:
0, 0,  2,  31,  38,  28,  18,  12, 17, 20, 24, 10, 3
after:
0, 0, 55, 247, 455, 470, 412, 181, 36,  0,  0,  0, 0
, i.e. from 4s to 250ms.

1 SAS HDD SEAGATE ST14000NM0048 before:
0,  0,  29,   70, 107,   45,  27, 1, 0, 0, 1, 4, 19
after:
1, 29, 681, 1261, 676, 1633,  67, 1, 0, 0, 0, 0,  0
, i.e. from 4s to 125ms.

1 SAS SSD SEAGATE XS3840TE70014 before (microseconds):
0, 0, 0, 0, 0, 0, 0, 0,  70, 18343, 82548, 618
after:
0, 0, 0, 0, 0, 0, 0, 0, 283, 92351, 34844,  90

I've also measured scrub time during the test and on idle pools.  On
idle fragmented pool I've measured scrub getting few percent faster
due to use of QD3 instead of QD2 before.  On idle non-fragmented pool
I've measured no difference.  On busy non-fragmented pool I've measured
scrub time increase about 1.5-1.7x, while IOPS increase reached 5-9x.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #11166
2020-11-25 08:45:38 -08:00
cragw
14bdf57a99 pam_zfs_key: accommodate different dataset naming scheme
Name of dataset for user home directory may vary from the expected
$homes_prefix/$username, if different naming scheme is being used.

We can use property mountpoint to specify the dataset for $username
as long as its value is identical to passwd's pw_dir.

For example:
    NAME                       PROPERTY     VALUE
    rpool/home/myuser_123456   mountpoint   /home/myuser

Reviewed-by: Felix Dörre <felix@dogcraft.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Crag Wang <crag0715@gmail.com>
Closes #11165
2020-11-25 08:42:37 -08:00
Matthew Macy
45061cc797 FreeBSD: decouple ZFS_DEBUG from kernel debug settings
Reviewed-by: Martelli Nikola @martellini
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #11213
2020-11-25 08:42:26 -08:00
Brian Behlendorf
cb01817f22 Obsolete earlier packages due to version bump
In order for package managers such as dnf to upgrade cleanly after
the package SONAME bump the obsolete package names must be known.
Update the new packages to correctly obsolete the old ones.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11230 
Closes #11233
2020-11-24 10:27:14 -08:00
Antonio Russo
1c4ccfb34e libzfsbootenv: do not depend on libnvpair
We do not build libnvpair.pc.  Moreover, it is automatically pulled in
by libzfs.pc, so no additional specific dependency is required.

Reviewed by: Toomas Soome <tsoome@me.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Antonio Russo <aerusso@aerusso.net>
Closes #11227
2020-11-24 10:27:14 -08:00
Brian Behlendorf
056287e3f7 Include the ABI with dist tarball
The ABI should be included when generating the `make dist` tarball
since it's required by the `make checkabi` target.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11225
2020-11-22 10:01:47 -08:00
Brian Behlendorf
ccfa35c6f9 Correct missing zil_claim() DTL updates
Commit a1d477c2 accidentally disabled DTL updates for the zil_claim()
case described at the end of vdev_stat_update() by unconditionally
disabling all DTL updates when loading.  This was done to avoid
a deadlock on the vd_dtl_lock when loading the DTLs from disk.

    vdev_dtl_contains <--- Takes vd->vd_dtl_lock
    vdev_mirror_child_missing
    vdev_mirror_io_start
    zio_vdev_io_start
    __zio_execute
    arc_read
    dbuf_issue_final_prefetch
    dbuf_prefetch_impl
    dbuf_prefetch
    dmu_prefetch
    space_map_iterate
    space_map_load_length
    space_map_load
    vdev_dtl_load <--- Takes vd->vd_dtl_lock
    vdev_load
    spa_ld_load_vdev_metadata
    spa_tryimport

The missing DTL updates can be restored by moving the space_map_load()
call outside the vd_dtl_lock.  A private range tree is populated by
reading the space map and then merged in to the DTL_MISSING tree
under the lock.

Furthermore, the SPA_LOAD_NONE check in vdev_dtl_contains() leads to an
additional problem.  Any resilvering which occurs before SPA_LOAD_NONE
is set will incorrectly determine that there's nothing to repair.  This
can result in full redundancy not being restored for some blocks.

Reviewed-by: Matt Ahrens <matt@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11218
2020-11-22 10:01:43 -08:00
Antonio Russo
8471f71132 Track SONAME version bump in packaging
RPM and DEB packages are named after the SONAME version of the library
they contain.  After bumping this version, the packaging should be
renamed.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Antonio Russo <aerusso@aerusso.net>
Closes #11219
2020-11-20 08:58:27 -08:00
Brian Behlendorf
813185d141
Enable ABI checks for the checkstyle workflow
For the OpenZFS 2.0 release branch extend the CI checkstyle
workflow to perform the library ABI checks.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11215
2020-11-18 09:32:49 -08:00
Brian Behlendorf
be12087783 Add ABI snapshot
Add a snapshot of the OpenZFS 2.0 ABI using libabigail-1.7-2.
The included ABI passes `make checkabi` for CentOS 7, Fedora 33,
Debian 10, and Ubuntu 20.04.  This covers a fairly wide range
of glibc, gcc, and libabigail versions plus other changes which
are platform specific.

Reviewed-by: Antonio Russo <aerusso@aerusso.net>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11144
2020-11-17 20:48:53 +00:00
Antonio Russo
4f9014b70b Library ABI tracking with abigail
Provide two make targets: checkabi and storeabi.

storeabi uses libabigail to generate a reference copy of the ABI for the
public libraries.

checkabi compares such a reference to the compiled version, failing if
they are not compatible.  No ABI is generated for libzpool.so, it is
only used by ztest and zdb and not external consumers.

Co-authored-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Antonio Russo <aerusso@aerusso.net>
Closes #11144
2020-11-17 20:29:02 +00:00
Matthew Macy
043ef5c25e Fix problems in zvol_set_volmode_impl
- Don't leave fstrans set when passed a snapshot
- Don't remove minor if volmode already matches new value
- (FreeBSD) Wait for GEOM ops to complete before trying
  remove (at create time GEOM will be "tasting" in parallel)
- (FreeBSD) Don't leak zvol_state_lock on open if zv == NULL
- (FreeBSD) Don't try to unlock zv->zv_state lock if zv == NULL

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #11199
2020-11-17 12:20:09 -08:00
наб
c06118e0b1 zpool: correctly align columns with -p
zpool_expand_proplist() now ignores pl_fixed if its new literal
argument is true.  The rest is a consequence of needing to pass
that down.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiao?=~Dska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11202
2020-11-17 12:20:01 -08:00
наб
5f24bd11ee zpool(8): fix pool-wi[sd]e typo
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11202
2020-11-17 12:19:56 -08:00
loli10K
f6f3089cf6 Fix 'zfs userspace' for received datasets in encrypted root
For encrypted receives, where user accounting is initially disabled on
creation, both 'zfs userspace' and 'zfs groupspace' fails with
EOPNOTSUPP: this is because dmu_objset_id_quota_upgrade_cb() forgets to
set OBJSET_FLAG_USERACCOUNTING_COMPLETE on the objset flags after a
successful dmu_objset_space_upgrade().

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #9501
Closes #9596
2020-11-17 12:19:51 -08:00
George Amanakis
a09aeb9fc4 Fix ASSERT logic in l2arc_evict()
In case of cache device removal it is possible that at the end of
l2arc_evict() we have l2ad_hand = l2ad_evict. This can lead to the
following panic in case of a debug build:

VERIFY3(dev->l2ad_hand < dev->l2ad_evict) failed (321920512 < 321920512)
Call Trace:
 dump_stack+0x66/0x90
 spl_panic+0xef/0x117 [spl]
 l2arc_remove_vdev+0x11d/0x290 [zfs]
 spa_load_l2cache+0x275/0x5b0 [zfs]
 spa_vdev_remove+0x4a5/0x6e0 [zfs]
 zfs_ioc_vdev_remove+0x59/0xa0 [zfs]
 zfsdev_ioctl_common+0x5b3/0x630 [zfs]
 zfsdev_ioctl+0x53/0xe0 [zfs]
 do_vfs_ioctl+0x42e/0x6b0
 ksys_ioctl+0x5e/0x90
 do_syscall_64+0x5b/0x1a0
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

In case of cache device removal it also possible that l2ad_hand +
distance > l2ad_end since we do not iterate l2arc_evict() and l2ad_hand
is not reset. This has no functional consequence however as the cache
device is about to be removed.

Fix this by omitting the ASSERT in case of device removal.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #11205
2020-11-17 12:19:46 -08:00
Érico Rolim
e4257ed76d config/dracut/90zfs: handle cases where hostid(1) returns all zeros
On systems with musl libc, hostid(1) always prints "00000000", which
will cause improper behavior when the 90zfs module is configured in a
dracut initramfs. Work around this by copying the host /etc/hostid if
the file exists, and otherwise only write /etc/hostid if hostid(1)
returns something meaningful. This avoids zgenhostid creating a random
/etc/hostid for the initramfs, which could lead to errors when trying to
import the pool if spl_hostid isn't defined in the kernel command line.

Furthermore, tag the /etc/hostid file as hostonly, since it is system
specific and shouldn't be taken into account when trying to use an
initramfs generated in one system to boot into a different system.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Georgy Yakovlev <gyakovlev@gentoo.org>
Co-authored-by: Andrew J. Hesford <ajh@sideband.org>
Signed-off-by: Érico Rolim <erico.erc@gmail.com>
Closes #11174
Closes #11189
2020-11-17 12:19:42 -08:00
Érico Rolim
c4a5e3b90f zgenhostid: accept hostid arguments equal to zero.
A common usage pattern for zgenhostid, including in the ZFS dracut
module, is running it as:

  zgenhostid $(hostid)

However, zgenhostid only accepted hostid arguments greater than 0, which
meant that, when the output of hostid(1) was "00000000", zgenhostid
would error out, even though 0 is a possible return value for the
gethostid(3) function used by hostid(1):

- On current musl libc, gethostid(3) is a stub that always returns 0.
- On glibc, gethostid(3) will return 0 if /etc/hostid exists but is
  smaller than 4 bytes.

In these cases, it makes more sense for zgenhostid to treat a value of 0
as other parts of the zfs codebase do, meaning that a hostid value
couldn't be determined; therefore, it should attempt to generate a
random value to write into /etc/hostid.

The manpage and usage output have been updated to reflect this.

Whitespace has also been fixed in the usage output.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Georgy Yakovlev <gyakovlev@gentoo.org>
Reviewed-by: Andrew J. Hesford <ajh@sideband.org>
Signed-off-by: Érico Rolim <erico.erc@gmail.com>
Closes #11174
Closes #11189
2020-11-17 12:19:33 -08:00
Brian Behlendorf
d02fc15ba1 Linux: Fix ZFS_ENTER/ZFS_EXIT/ZFS_VERFY_ZP usage
The ZFS_ENTER/ZFS_EXIT/ZFS_VERFY_ZP macros should not be used
in the Linux zpl_*.c source files.  They return a positive error
value which is correct for the common code, but not for the Linux
specific kernel code which expects a negative return value.  The
ZPL_ENTER/ZPL_EXIT/ZPL_VERFY_ZP macros should be used instead.

Furthermore, the ZPL_EXIT macro has been updated to not call the
zfs_exit_fs() function.  This prevents a possible deadlock which
can occur when a snapshot is automatically unmounted because the
zpl_show_devname() must never wait on in progress automatic
snapshot unmounts.

Reviewed-by: Adam Moss <c@yotes.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11169 
Closes #11201
2020-11-14 10:51:27 -08:00
Matthew Ahrens
435dc4baab Assertion failure when logging large output of channel program
The output of ZFS channel programs is logged on-disk in the zpool
history, and printed by `zpool history -i`.  Channel programs can use
10MB of memory by default, and up to 100MB by using the `zfs program -m`
flag.  Therefore their output can be up to some fraction of 100MB.

In addition to being somewhat wasteful of the limited space reserved for
the pool history (which for large pools is 1GB), in extreme cases this
can result in a failure of `ASSERT(length <= DMU_MAX_ACCESS);` in
`dmu_buf_hold_array_by_dnode()`.

This commit limits the output size that will be logged to 1MB.  Larger
outputs will not be logged, instead a entry will be logged indicating
the size of the omitted output.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #11194
2020-11-14 10:51:21 -08:00
Brian Behlendorf
04177b9c3f Tag 2.0.0-rc6
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2020-11-12 11:02:58 -08:00
Matthew Ahrens
4a87c280dc Channel program may spuriously fail with "memory limit exhausted"
ZFS channel programs (invoked by `zfs program`) are executed in a LUA
sandbox with a limit on the amount of memory they can consume.  The
limit is 10MB by default, and can be raised to 100MB with the `-m` flag.
If the memory limit is exceeded, the LUA program exits and the command
fails with a message like `Channel program execution failed: Memory
limit exhausted.`

The LUA sandbox allocates memory with `vmem_alloc(KM_NOSLEEP)`, which
will fail if the requested memory is not immediately available.  In this
case, the program fails with the same message, `Memory limit exhausted`.
However, in this case the specified memory limit has not been reached,
and the memory may only be temporarily unavailable.

This commit changes the LUA memory allocator `zcp_lua_alloc()` to use
`vmem_alloc(KM_SLEEP)`, so that we won't spuriously fail when memory is
temporarily low.  Instead, we rely on the system to be able to free up
memory (e.g. by evicting from the ARC), and we assume that even at the
highest memory limit of 100MB, the channel program will not truly
exhaust the system's memory.

External-issue: DLPX-71924
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #11190
2020-11-12 09:02:00 -08:00
Brian Behlendorf
d237d9a918 Linux: Fix mount/unmount when dataset name has a space
The custom zpl_show_devname() helper should translate spaces in
to the octal escape sequence \040.  The getmntent(2) function
is aware of this convention and properly translates the escape
character back to a space when reading the fsname.

Without this change the `zfs mount` and `zfs unmount` commands
incorrectly detect when a dataset with a name containing spaces
is mounted.

Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #11182
Closes #11187
2020-11-12 09:01:55 -08:00
Tony Perkins
2132ae465d Start snapdir_iterate traversals to begin wtih the value of zero.
The microzap hash can sometimes be zero for single digit snapnames.
The zap cursor can then have a serialized value of two (for . and ..),
and skip the first entry in the avl tree for the .zfs/snapshot directory
listing, and therefore does not return all snapshots.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Cedric Berger <cedric@precidata.com>
Signed-off-by: Tony Perkins <tperkins@datto.com>
Closes #11039
2020-11-12 09:01:27 -08:00
Mateusz Guzik
87f01fc158 G/C data_alloc_arena
It is a leftover from illumos always set to NULL and introducing a
spurious difference between zio_buf and zio_data_buf.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #11188
2020-11-11 18:46:22 -08:00
Mateusz Guzik
995b80fa3a G/C struct znode -> z_moved
The field is yet another leftover from unsupported zfs_znode_move.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes #11186
2020-11-11 11:40:15 -08:00
Adrian Chadd
d842f99c6b Fix compiling on FreeBSD + gcc - don't assume illmnos bits
This looks like it was once from the illumnos compat code.
FreeBSD doesn't have cmn_err as a compiler format attribute, so
it definitely errors out.

It doesn't show up on LLVM because it doesn't trigger at all.

Add in the format flags but keep them behind #if 0 for now;
there are too many format issues that trigger when one does
format checking in the shared code.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: adrian chadd <adrian@freebsd.org>
Closes #11068
Closes #11069
2020-11-11 11:09:50 -08:00
Adrian Chadd
8cad25a39c Fix pointer-is-uint64_t-sized assumption in the ioctl path
This shows up when compiling freebsd-head on amd64 using gcc-6.4.
The lib32 compat build ends up tripping over this assumption.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: adrian chadd <adrian@freebsd.org>
Closes #11068
Closes #11069
2020-11-11 11:09:41 -08:00
sterlingjensen
c00bb5f4ea Fix memleak in cmd/mount_zfs.c
Convert dynamic allocation to static buffer, simplify parse_dataset
function return path. Add tests specific to the mount helper.

Reviewed-by: Mateusz Guzik <mjguzik@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Sterling Jensen <sterlingjensen@users.noreply.github.com>
Closes #11098
2020-11-11 11:09:31 -08:00
наб
d33cbbbf93 zpoolprops.8: clarify vdev expansion rules
Remove reference to EFI(?), explain that the new space
is beyond the GPT for whole-disk vdevs, and add section noting how it
behaves with partition vdevs in terms of how the user is most likely to
encounter it ‒ the previous phrasing was confusing
and seemed to indicate that "zpool online -e" will be able to claim

  GPT[whatever, ZFS, free space, whatever]

into

  GPT[whatever, ZFS, whatever]
but that's not the case, as it'll only be able to do so after manually
resizing the ZFS partition to include the free space beforehand, i.e.:
  GPT[whatever, ZFS, free space, whatever]
  GPT[whatever, [ZFS + free space], potentially left-overs, whatever]
  # zpool online -e
  GPT[whatever, ZFS, whatever]

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #11158
2020-11-11 11:08:57 -08:00
Pavel Zakharov
a4efa59a94 initramfs: zfsunlock hook breaks /usr/bin
The copy_exec() function expects that the full path of the target
file is passed rather than just the directory, and will take care
of creating the underlying directories if they don't exist.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Closes #11162
2020-11-11 11:07:40 -08:00
Ryan Moeller
cb4d3fb737 FreeBSD: Simplify zvol_geom_open and zvol_cdev_open
We can consolidate the unlocking procedure into one place by starting
with drop_suspend set to B_FALSE and moving the open count check up.

While here, a little code cleanup. Match the out labels between
zvol_geom_open and zvol_cdev_open, and add a missing period in some
comments.

Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11175
2020-11-11 11:07:40 -08:00
Ryan Moeller
4a2e9811e9 FreeBSD: Avoid spurious EINTR in zvol_cdev_open
zvol_first_open can fail with EINTR if spa_namespace_lock is not held
and cannot be taken without waiting.

Apply the same logic that was done for zvol_geom_open to take
spa_namespace_lock if not already held on first open in zvol_cdev_open.

Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11175
2020-11-11 11:07:40 -08:00
Alexander Motin
050dfc5045 Fix dmu_tx_dirty_throttle after arc_c reduction
After initial arc_c was reduced to arc_c_min it became possible that
on datasets with primarycache=metadata or none dirty data make up most
of ARC capacity and easily more than configured 50% of initial arc_c,
that causes forced txg commits by arc_tempreserve_space() and periodic
very long write delays.

This patch makes arc_tempreserve_space() to use arc_c only after ARC
warmed up once and arc_c really means something, but use arc_c_max
before that.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Matt Macy <mmacy@FreeBSD.org>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes #11178
2020-11-11 11:03:43 -08:00
Matthew Macy
b49118220c Fix dnode refcount tracking
Fix a couple of places where the wrong tag is passed
to dnode_{hold, rele}

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
Closes #11184
2020-11-11 11:03:24 -08:00
Ryan Moeller
806dda56ce ZTS: Add L1 corruption test
Add a new test case which corrupts all level 1 block in a file.
Then verifies that corruption is detected and repaired.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11141
2020-11-11 11:03:02 -08:00
Ryan Moeller
97f5cfea77 ZTS: Output all block copies in list_file_blocks
The second part of list_file_blocks transforms the object description
output by zdb -ddddd $ds $objnum into a stream of lines of the form
"level path offset length" for the indirect blocks in the given file.
The current code only works for the first copy of L0 blocks.  L1 and
L2 indirect blocks have more than one copy on disk.

Add one more -d to the zdb command so we get all block copies and
rewrite the transformation to match more than L0 and output all DVAs.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #11141
2020-11-11 11:01:42 -08:00