The blk_queue_stackable() function was replaced in the 4.14 kernel
by queue_is_rq_based(), commit torvalds/linux@5fdee212. This change
resulted in the default elevator being used which can negatively
impact performance.
Rather than adding additional compatibility code to detect the
new interface unconditionally attempt to set the elevator. Since
we expect this to fail for block devices without an elevator the
error message has been moved in to zfs_dbgmsg().
Finally, it was observed that the elevator_change() was removed
from the 4.12 kernel, commit torvalds/linux@c033269. Update the
comment to clearly specify which are expected to export the
elevator_change() symbol.
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#7645
Commit torvalds/linux@95582b0 changes the inode i_atime, i_mtime,
and i_ctime members form timespec's to timespec64's to make them
2038 safe. As part of this change the current_time() function was
also updated to return the timespec64 type.
Resolve this issue by introducing a new inode_timespec_t type which
is defined to match the timespec type used by the inode. It should
be used when working with inode timestamps to ensure matching types.
The timestruc_t type under Illumos was used in a similar fashion but
was specified to always be a timespec_t. Rather than incorrectly
define this type all timespec_t types have been replaced by the new
inode_timespec_t type.
Finally, the kernel and user space 'sys/time.h' headers were aligned
with each other. They define as appropriate for the context several
constants as macros and include static inline implementation of
gethrestime(), gethrestime_sec(), and gethrtime().
Reviewed-by: Chunwei Chen <tuxoko@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#7643
This patch simply adds an ASSERT that confirms that the last
decrypting reference on a dataset waits until the dataset is
no longer dirty. This should help to debug issues where the
ZIO layer cannot find encryption keys after a dataset has been
disowned.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes#7637
The recent SPL merge caused a regression in kernels with ZFS integrated
into the sources where our modules would be initialized in alphabetical
order, despite icp requiring the spl module be loaded first. This caused
kernels with ZFS builtin to fail to boot.
We resolve this by adding a special case for the spl that lists it
first. It is somewhat ugly, but it works.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Thode <prometheanfire@gentoo.org>
Signed-off-by: Richard Yao <ryao@gentoo.org>
Closes#7595Closes#7606
This patch adds tunables for modifying the maximum memory limit and
maximum instruction limit that can be specified when running a channel
program.
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov
Reviewed-by: Sara Hartse <sara.hartse@delphix.com>
Signed-off-by: John Gallagher <john.gallagher@delphix.com>
External-issue: LX-1085
Closes#7618
Added support for the bops->check_events() interface which was
added in the 2.6.38 kernel to replace bops->media_changed().
Fully implementing this functionality allows the volume resize
code to rely on revalidate_disk(), which is the preferred
mechanism, and removes the need to use check_disk_size_change().
In order for bops->check_events() to lookup the zvol_state_t
stored in the disk->private_data the zvol_state_lock needs to
be held. Since the check events interface may poll the mutex
has been converted to a rwlock for better concurrently. The
rwlock need only be taken as a writer in the zvol_free() path
when disk->private_data is set to NULL.
The configure checks for the block_device_operations structure
were consolidated in a single kernel-block-device-operations.m4
file.
The ZFS_AC_KERNEL_BDEV_BLOCK_DEVICE_OPERATIONS configure checks
and assoicated dead code was removed. This interface was added
to the 2.6.28 kernel which predates the oldest supported 2.6.32
kernel and will therefore always be available.
Updated maximum Linux version in META file. The 4.17 kernel
was released on 2018-06-03 and ZoL is compatible with the
finalized kernel.
Reviewed-by: Boris Protopopov <boris.protopopov@actifio.com>
Reviewed-by: Sara Hartse <sara.hartse@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#7611
Reserve bit 25 for the ZSTD compression feature from FreeBSD.
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Allan Jude <allanjude@freebsd.org>
Closes#7626
The ZoL version of libefi has been modified for Linux in several
places outside the existing __linux__ wrappers. Remove them to
make the code easier to read and so as not to mislead anyone that
these are the sole modifications for Linux.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#7625
Commit 2ffd89fc allowed two new errors to be reported by zil_reset()
in order to provide a descriptive error message regarding why a log
device could not be removed. However, the new return values were
not handled in the ztest_vdev_add_remove() test case resulting in
ztest failures during automated testing.
Reviewed-by: Tim Chase <tim@chase2k.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Paul Zuchowski <pzuchowski@datto.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#7630
The zfs_dbuf_evict_key TSD (thread-specific data) is not necessary -
we can instead pass a flag down in a few places to prevent recursive
dbuf eviction. Making this change has 3 benefits:
1. The code semantics are easier to understand.
2. On Linux, performance is improved, because creating/removing
TSD values (by setting to NULL vs non-NULL) is expensive, and
we do it very often.
3. According to Nexenta, the current semantics can cause a
deadlock when concurrently calling dmu_objset_evict_dbufs()
(which is rare today, but they are working on a "parallel
unmount" change that triggers this more easily):
Porting Notes:
* Minor conflict with OpenZFS 9337 which has not yet been ported.
Authored by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Serapheim Dimitropoulos <serapheim.dimitro@delphix.com>
Reviewed by: Brad Lewis <brad.lewis@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>
OpenZFS-issue: https://illumos.org/issues/9577
OpenZFS-commit: https://github.com/openzfs/openzfs/pull/645
External-issue: DLPX-58547
Closes#7602
Partition detection for zvol devices was not working correctly
resulting inconsistent partitioning behavior when layering pools
on top of zvols. This isn't a supported configuration but we'd
still like it to work properly.
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#7624
In the cleanup for the privilege tests, an empty variable, empty because
the corresponding setup is skipped on Linux, results in /export/home
being deleted. This patch adds an assertion that the variable is not
empty, and causes the cleanup to be skipped on Linux as well.
Reviewed by: John Wren Kennedy <jwk404@gmail.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Signed-off-by: John Gallagher <john.gallagher@delphix.com>
External-issue: LX-1099
Closes#7615
user_run leaves two files in /tmp, moving them to $TEST_BASE_DIR and
adding them to the default cleanup routine.
Reviewed by: John Wren Kennedy <jwk404@gmail.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Giuseppe Di Natale <guss80@gmail.com>
Signed-off-by: bunder2015 <omfgbunder@gmail.com>
Closes#7614
The recent addition of pyzfs does not include the generated 'build'
and 'pyzfs.egg-info' directories in the pyzfs .gitignore or the
'make clean' target. This patch simply corrects this problem.
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: loli10K <ezomori.nozomu@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes#7612
In the case where the pool is loaded without the crypto
keys necessary to playback the intent log, and log device
removal is attempted, a generic busy message is received.
Change the message to inform the user that the datasets
must be mounted.
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tom Caputi <tcaputi@datto.com>
Signed-off-by: Paul Zuchowski <pzuchowski@datto.com>
Closes#7518
In the new aggsum counters the CPU_SEQID macro should be surrounded by
kpreempt_disable)() and kpreempt_enable() calls to prevent a Linux
kernel BUG warning. The addsum_add() function use the cpuid to
minimize lock contention when selecting a bucket, after selection
the bucket is protected by a mutex and it is safe to reschedule the
process to a different processor at any time.
Reviewed-by: Matthew Thode <prometheanfire@gentoo.org>
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#7609Closes#7610
zpool and zed place scripts in subdirectories of libexecdir. Some
distributions locate architecture independent scripts in other locations
(e.g. Debian). To avoid these paths getting out of sync, centralize the
definitions.
Build zfs-test's default.cfg by Makefile. Use the new directory
logic building tests/zfs-tests/include/default.cfg.in.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Antonio Russo <antonio.e.russo@gmail.com>
Closes#7597
If sa_build_index() encounters a corrupt buffer, don't panic.
Add info to zfs ring buffer and return EIO. This allows for a cleaner
error recovery path.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Issue #6500Closes#7487
This patch collects some minor inconsistencies and typos in the
documentation, logging and testing infrastructure.
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Antonio Russo <antonio.e.russo@gmail.com>
Closes#7608
This patch fixes an issue where l2arc_read_done() would always
write data to b_pabd, even if raw encrypted data was requested.
This only occured in cases where the L2ARC device had a different
ashift than the main pool.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes#7586Closes#7593
This patch fixes a small bug found where receive_spill() sometimes
attempted to decrypt spill blocks when doing a raw receive. In
addition, this patch fixes another small issue in arc_buf_fill()'s
error handling where a decryption failure (which could be caused by
the first bug) would attempt to set the arc header's IO_ERROR flag
without holding the header's lock.
Reviewed-by: Matthew Thode <prometheanfire@gentoo.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes#7564Closes#7584Closes#7592
Currently, during a recursive zfs destroy the first error that is
encountered will stop the destruction of the datasets. Errors may
happen for a variety of reasons including competing deletions
and busy datasets.
This patch switches recursive destroy to always do a best-effort
recursive dataset destroy.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Alek Pinchuk <apinchuk@datto.com>
Closes#7574
History tests were hard coded to use /tmp and didn't clean up
properly after testing.
Reviewed by: John Wren Kennedy <jwk404@gmail.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: bunder2015 <omfgbunder@gmail.com>
Issue #7507Closes#7600
In pursuit of improving performance on multi-core systems, we should
implements fanned out counters and use them to improve the performance of
some of the arc statistics. These stats are updated extremely frequently,
and can consume a significant amount of CPU time.
Authored by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Dan McDonald <danmcd@joyent.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Ported-by: Paul Dagnelie <pcd@delphix.com>
OpenZFS-issue: https://www.illumos.org/issues/8484
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/7028a8b92b7
Issue #3752Closes#7462
1. Add a proc entry to display the pool's state:
$ cat /proc/spl/kstat/zfs/tank/state
ONLINE
This is done without using the spa config locks, so it will
never hang.
2. Fix 'zpool status' and 'zpool list -o health' output to print
"SUSPENDED" instead of "ONLINE" for suspended pools.
Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Richard Elling <Richard.Elling@RichardElling.com>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes#7331Closes#7563
The only remaining consumer of the rwlock compatibility wrappers
is ztest. Remove the wrappers and convert the few remaining
calls to the underlying pthread functions.
rwlock_init() -> pthread_rwlock_init()
rwlock_destroy() -> pthread_rwlock_destroy()
rw_rdlock() -> pthread_rwlock_rdlock()
rw_wrlock() -> pthread_rwlock_wrlock()
rw_unlock() -> pthread_rwlock_unlock()
Note pthread_rwlock_init() defaults to PTHREAD_PROCESS_PRIVATE
which is equivilant to the USYNC_THREAD behavior. There is no
functional change.
Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#7591
txg_kick() fails to see that we are quiescing, forcing transactions to
their next stages without leaving them accumulate changes
Creating a fragmented pool in a DCenter VM and continuously writing to it with
multiple instances of randwritecomp, we get the following output from txg.d:
0ms 311MB in 4114ms (95% p1) 75MB/s 544MB (76%) 336us 153ms 0ms
0ms 8MB in 51ms ( 0% p1) 163MB/s 474MB (66%) 129us 34ms 0ms
0ms 366MB in 4454ms (93% p1) 82MB/s 572MB (79%) 498us 20ms 0ms
0ms 406MB in 5212ms (95% p1) 77MB/s 591MB (82%) 661us 37ms 0ms
0ms 340MB in 5110ms (94% p1) 66MB/s 622MB (86%) 1048us 41ms 1ms
0ms 3MB in 61ms ( 0% p1) 51MB/s 419MB (58%) 33us 0ms 0ms
0ms 361MB in 3555ms (88% p1) 101MB/s 542MB (75%) 335us 40ms 0ms
0ms 356MB in 4592ms (92% p1) 77MB/s 561MB (78%) 430us 89ms 1ms
0ms 11MB in 129ms (13% p1) 90MB/s 507MB (70%) 222us 15ms 0ms
0ms 281MB in 2520ms (89% p1) 111MB/s 542MB (75%) 334us 42ms 0ms
0ms 383MB in 3666ms (91% p1) 104MB/s 557MB (77%) 411us 133ms 0ms
0ms 404MB in 5757ms (94% p1) 70MB/s 635MB (88%) 1274us 123ms 2ms
4ms 367MB in 4172ms (89% p1) 88MB/s 556MB (77%) 401us 51ms 0ms
0ms 42MB in 470ms (44% p1) 90MB/s 557MB (77%) 412us 43ms 0ms
0ms 261MB in 2273ms (88% p1) 114MB/s 556MB (77%) 407us 27ms 0ms
0ms 394MB in 3646ms (85% p1) 108MB/s 552MB (77%) 393us 304ms 0ms
0ms 275MB in 2416ms (89% p1) 113MB/s 510MB (71%) 200us 53ms 0ms
0ms 9MB in 53ms ( 0% p1) 169MB/s 483MB (67%) 140us 100ms 1ms
The TXGs that are getting synced and don't have lots of changes are pushed by
txg_kick() which basically forces the current open txg to get to the quiesced
state:
if (tx->tx_syncing_txg == 0 &&
tx->tx_quiesce_txg_waiting <= tx->tx_open_txg &&
tx->tx_sync_txg_waiting <= tx->tx_synced_txg &&
tx->tx_quiesced_txg <= tx->tx_synced_txg) {
tx->tx_quiesce_txg_waiting = tx->tx_open_txg + 1;
cv_broadcast(&tx->tx_quiesce_more_cv);
}
The problem is that the above code doesn't check if we are currently quiescing
anything (only if a quiesce or a sync has been requested, ..etc) so the
following scenario can happen:
1] We have an open txg A that had enough dirty data (more than
zfs_dirty_data_sync) and it was pushed to the quiesced state, and opened
a new txg B. No txg is currently being synced.
2] Immediately after the opening of B, txg_kick() was run by some other write
(and because of A's dirty data) and saw that we are not currently syncing
any txg and no one has requested quiescing so it requests one by bumping
tx_quiesce_txg_waiting and broadcasts the quiesce thread.
3] The quiesce thread just passed txg A to be synced and sees that a quiescing
request has been sent to it so it immediately grabs B without letting it
gather enough data, putting it in a quiesced state and opening a new txg C.
In this scenario txg B, is an example of how the entries of interest show up in
the txg.d output.
Ideally we would like txg_kick() to get triggered only when we are sure that
we are not syncing AND not quiescing any txg. This way we can kick an open TXG
to the quiescing state when we are sure that there is nothing going on and we
would benefit from the different states running concurrently.
Authored by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed by: Matt Ahrens <matt@delphix.com>
Reviewed by: Brad Lewis <brad.lewis@delphix.com>
Reviewed by: Andriy Gapon <avg@FreeBSD.org>
Approved by: Dan McDonald <danmcd@joyent.com>
Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>
OpenZFS-issue: https://illumos.org/issues/9464
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/1cd7635bCloses#7587
Test 13 would fail because of attempts to zpool destroy -f a pool that
was still busy. Changed those calls to destroy_pool which does a retry
loop, and the problem is no longer reproducible. Also removed some non
functional code in the test which is why it was originally commented out
by placing it after the call to log_pass.
Test 14 would fail because sometimes the check for a degraded pool would
complete before the pool had changed state. Changed the logic to check
in a loop with a timeout and the problem is no longer reproducible.
Authored by: John Wren Kennedy <john.kennedy@delphix.com>
Reviewed by: Matt Ahrens <matt@delphix.com>
Reviewed by: Chris Williamson <chris.williamson@delphix.com>
Reviewed by: Yuri Pankov <yuripv@yuripv.net>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Approved by: Dan McDonald <danmcd@joyent.com>
Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>
Porting Notes:
* Re-enabled slog_013_pos.ksh
OpenZFS-issue: https://illumos.org/issues/9245
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/8f323b5Closes#7585
We want to be able to pass various settings during import/open of a
pool, which are not only related to rewind. Instead of adding a new
policy and duplicate a bunch of code, we should just rename
rewind_policy to a more generic term like load_policy.
For instance, we'd like to set spa->spa_import_flags from the nvlist,
rather from a flags parameter passed to spa_import as in some cases we
want those flags not only for the import case, but also for the open
case. One such flag could be ZFS_IMPORT_MISSING_LOG (as used in zdb)
which would allow zfs to open a pool when logs are missing.
Authored by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Matt Ahrens <matt@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>
OpenZFS-issue: https://illumos.org/issues/9235
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/d2b1e44Closes#7532
Fixed some highlighting in the zpool man page
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: bunder2015 <omfgbunder@gmail.com>
Closes#7596
For the null pointer issue shown below, the solution is to initialize the
contents of the object before changing its type, so that concurrent accessors
will see it as non-zapified until it is ready for access via the ZAP.
BAD TRAP: type=e (#pf Page fault) rp=ffffff00ff520440 addr=20 occurred
in module "zfs" due to a NULL pointer dereference
ffffff00ff520320 unix:die+df ()
ffffff00ff520430 unix:trap+dc0 ()
ffffff00ff520440 unix:cmntrap+e6 ()
ffffff00ff520590 zfs:zap_leaf_lookup+46 ()
ffffff00ff520640 zfs:fzap_lookup+a9 ()
ffffff00ff5206e0 zfs:zap_lookup_norm+111 ()
ffffff00ff520730 zfs:zap_contains+42 ()
ffffff00ff520760 zfs:dsl_dataset_has_resume_receive_state+47 ()
ffffff00ff520900 zfs:get_receive_resume_stats+3e ()
ffffff00ff520a90 zfs:dsl_dataset_stats+262 ()
ffffff00ff520ac0 zfs:dmu_objset_stats+2b ()
ffffff00ff520b10 zfs:zfs_ioc_objset_stats_impl+64 ()
ffffff00ff520b60 zfs:zfs_ioc_objset_stats+33 ()
ffffff00ff520bd0 zfs:zfs_ioc_dataset_list_next+140 ()
ffffff00ff520c80 zfs:zfsdev_ioctl+4d7 ()
ffffff00ff520cc0 genunix:cdev_ioctl+39 ()
ffffff00ff520d10 specfs:spec_ioctl+60 ()
ffffff00ff520da0 genunix:fop_ioctl+55 ()
ffffff00ff520ec0 genunix:ioctl+9b ()
ffffff00ff520f10 unix:brand_sys_sysenter+1c9 ()
Porting Notes:
* DMU_OT_BYTESWAP conditional in zap_lockdir_impl() kept.
Authored by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Brad Lewis <brad.lewis@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Approved by: Dan McDonald <danmcd@joyent.com>
Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>
OpenZFS-issue: https://illumos.org/issues/9329
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/e8e0f97Closes#7578
The ZAP code was written before we allowed c99 in the Solaris kernel. We
should change it to take advantage of being able to declare variables where
they are first used. This reduces variable scope and means less scrolling
to find the type of variables.
Authored by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Steve Gonczi <steve.gonczi@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Approved by: Dan McDonald <danmcd@joyent.com>
Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>
OpenZFS-issue: https://illumos.org/issues/9328
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/76ead05Closes#7578
Update bdev_capacity to have wholedisk vdevs query the
size of the underlying block device (correcting for the size
of the efi parition and partition alignment) and therefore detect
expanded space.
Correct vdev_get_stats_ex so that the expandsize is aligned
to metaslab size and new space is only reported if it is large
enough for a new metaslab.
Reviewed by: Don Brady <don.brady@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: John Wren Kennedy <jwk404@gmail.com>
Signed-off-by: sara hartse <sara.hartse@delphix.com>
External-issue: LX-165
Closes#7546
Issue #7582
`make install` shouldn't fail if a directory it created still exists.
In this case we can blow away the spl src directory before recreating
it. This also gracefully handles the migration from pre-spl-merge to
post-spl-merge.
Reviewed by: Richard Elling <Richard.Elling@RichardElling.com>
Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes#7580
Add META tags Linux-Maximum and Linux-Minimum.
One pain point for package maintainers is ensuring the compatibility of
the packaged version of ZFS with the Linux kernel. By providing an
authoritative compatibility guide in the source tree, maintainers can
automate compatibility checks.
Additionally, increase META string extraction specificity.
configure.ac finds Name and Version by a very simple `grep`, which might
conceivably find other fields. Require the string be at the beginning of
a line, and be followed by a colon to avoid such confusions.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Giuseppe Di Natale <guss80@gmail.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Antonio Russo <antonio.e.russo@gmail.com>
Closes#7571
This adds a new test to measure ZIL performance.
- Adds the ability to induce IO delays with zinject
- Adds a new variable (PERF_NTHREADS_PER_FS) to allow fio threads to
be distributed to individual file systems as opposed to all IO going
to one, as happens elsewhere.
- Refactoring of do_fio_run
Authored by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Ported-by: John Wren Kennedy <jwk404@gmail.com>
OpenZFS-issue: https://www.illumos.org/issues/9082
OpenZFS-commit: https://github.com/openzfs/openzfs/pull/634
External-issue: DLPX-48625
Closes#7491
This fixes an assert in vdev_queue_change_io_priority():
VERIFY3(zio->io_priority < ZIO_PRIORITY_NUM_QUEUEABLE) failed (7 < 6)
PANIC at vdev_queue.c:832:vdev_queue_change_io_priority()
Reviewed-by: Tom Caputi <tcaputi@datto.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes#7566Closes#7542
vdev_id requires the program `basename` when handling short aliases
defined in `vdev_id.conf` (those defined without a leading path), but
`basename` is not always available in the dracut environment. This
causes the pool device names to change when using `by-vdev/` devices
or (in extreme cases) can make the pool import fail in dracut.
This commit fixes the problem by explicitly installing `basename`.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Steffen Müthing <steffen.muething@iwr.uni-heidelberg.de>
Closes#7562
Minimal changes required to integrate the SPL sources in to the
ZFS repository build infrastructure and packaging.
Build system and packaging:
* Renamed SPL_* autoconf m4 macros to ZFS_*.
* Removed redundant SPL_* autoconf m4 macros.
* Updated the RPM spec files to remove SPL package dependency.
* The zfs package obsoletes the spl package, and the zfs-kmod
package obsoletes the spl-kmod package.
* The zfs-kmod-devel* packages were updated to add compatibility
symlinks under /usr/src/spl-x.y.z until all dependent packages
can be updated. They will be removed in a future release.
* Updated copy-builtin script for in-kernel builds.
* Updated DKMS package to include the spl.ko.
* Updated stale AUTHORS file to include all contributors.
* Updated stale COPYRIGHT and included the SPL as an exception.
* Renamed README.markdown to README.md
* Renamed OPENSOLARIS.LICENSE to LICENSE.
* Renamed DISCLAIMER to NOTICE.
Required code changes:
* Removed redundant HAVE_SPL macro.
* Removed _BOOT from nvpairs since it doesn't apply for Linux.
* Initial header cleanup (removal of empty headers, refactoring).
* Remove SPL repository clone/build from zimport.sh.
* Use of DEFINE_RATELIMIT_STATE and DEFINE_SPINLOCK removed due
to build issues when forcing C99 compilation.
* Replaced legacy ACCESS_ONCE with READ_ONCE.
* Include needed headers for `current` and `EXPORT_SYMBOL`.
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
TEST_ZIMPORT_SKIP="yes"
Closes#7556
Merge a minimal version of the zfsonlinux/spl repository in to the
zfsonlinux/zfs repository. Care was taken to prevent file conflicts
when merging and to preserve the spl repository history. The spl
kernel module remains under the GPLv2 license as documented by the
additional THIRDPARTYLICENSE.gplv2 file.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
This commit removes everything from the repository except the core
SPL implementation for Linux. Those files which remain have been
moved to non-conflicting locations to facilitate the merge.
The README.md and associated files have been updated accordingly.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
scripts/dkms.mkconf calls configure with
`--with-linux=${kernel_source_dir}`, but Debian puts it kernel source at
`/lib/modules/<version>/source`. This patch adds the same logic to the
DKMS file produced by `scripts/dkms.mkconf` that Debian has shipped in
its official ZFS packaging: at DKMS build time, it checks if the system
is a Debian system, and adjusts the path accordingly.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Antonio Russo <antonio.e.russo@gmail.com>
Closes#7358Closes#7540Closes#7554
16MB alloc in zdb_embedded_block() can cause cores in certain
situations (clang, gcc55).
Authored by: Jorgen Lundman <lundman@lundman.net>
Reviewed by: Igor Kozhukhov <igor@dilos.org>
Reviewed by: Andriy Gapon <avg@FreeBSD.org>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Approved by: Dan McDonald <danmcd@joyent.com>
Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>
Porting Notes:
* Replaces an equivalent fix previously made for Linux.
OpenZFS-issue: https://illumos.org/issues/9523
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/2c1964aCloses#7561
Device removal allocates a new location for each allocated segment on
the disk that's being removed. Each allocation results in one entry in
the mapping table, which maps from old location + length to new
location. When a fragmented disk is removed, this can result in a large
number of mapping entries, and thus a large amount of memory consumed by
the mapping table. In the worst real-world cases, we've seen around 1GB
of RAM per 1TB of storage removed.
We can improve on this situation by allocating larger segments, which
span across both allocated and free regions of the device being removed.
By including free regions in the allocation (and thus mapping), we
reduce the number of mapping entries. For example, if we have a 4K
allocation followed by 1K free and then 4K allocated, we would allocate
4+1+4 = 9KB, and then move the entire region (including allocated and
free parts). In this case we used one mapping where previously we would
have used two, but often the ratio is much higher (up to 20:1 in
real-world use). We then need to mark the regions that were free on the
removing device as free in the new locations, and also obsolete in the
mapping entry.
This method preserves the fragmentation of the removing device, rather
than consolidating its allocated space into a small number of chunks
where possible. But it results in drastic reduction of memory used by
the mapping table - around 20x in the most-fragmented cases.
In the most fragmented real-world cases, this reduces memory used by the
mapping from ~1GB to ~50MB of RAM per 1TB of storage removed. Less
fragmented cases will typically also see around 50-100MB of RAM per 1TB
of storage.
Porting notes:
* Add the following as module parameters:
* zfs_condense_indirect_vdevs_enable
* zfs_condense_max_obsolete_bytes
* Document the following module parameters:
* zfs_condense_indirect_vdevs_enable
* zfs_condense_max_obsolete_bytes
* zfs_condense_min_mapping_bytes
Authored by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Ported-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Tim Chase <tim@chase2k.com>
OpenZFS-issue: https://illumos.org/issues/9486
OpenZFS-commit: https://github.com/ahrens/illumos/commit/07152e142e44c
External-issue: DLPX-57962
Closes#7536
Stack profiling is quite useful and Linux ZFS test suite does not
current collect that data.
Linux perf is a common tool for this purpose though the perf record
data file can be quite large. With this change, Linux ZFS perf tests
capture perf record data if perf is installed on the system and
PERF_DO_PROFILING environment variable is set.
Reviewed by: John Wren Kennedy <jwk404@gmail.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Tony Nguyen <tony.nguyen@delphix.com>
External-issue: LX-971
Closes#7549
Before running zloop.sh, we need to run `scripts/zfs-tests.sh -c` to
create and populate the `bin` directory with symlinks to our utilities.
Rather than making developers remember to do this, `make` should do it
for them.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes#7525Closes#7547
- Add links to PULL_REQUEST_TEMPLATE.md
- Clean `System information` table
It's easier to find needes documentation about
PR process with links.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Giuseppe Di Natale <guss80@gmail.com>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes#7539
- there are more options now
- command examples are more readable in code style
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Giuseppe Di Natale <guss80@gmail.com>
Signed-off-by: George Melikov <mail@gmelikov.ru>
Closes#7538
It's possible for the `zpool attach` portion of this test case
to complete before the `zpool scrub` can be issued. Update the
test case to force the resilvering phase to take longer.
Reviewed-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#5444Closes#7541