When pools are assembled from partitions, vdev_id.conf aliases
do not work. The directory /dev/disk/by-vdev is not created because
the associated udev rule for parsing vdev_id.conf is never called.
Extend to logic to match "disk" and "partition".
Patch-proposed-by: @sparksh
Reviewed-by: Ned Bass <bass6@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#3859Closes#5266
Authored by: Akash Ayare <aayare@delphix.com>
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed-by: luozhengzheng <luo.zhengzheng@zte.com.cn>
Reviewed-by: yuxiang <guo.yong33@zte.com.cn>
Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>
Bug was caused due to a change in functionality. At some point, ZFS
snapshots no longer created associated device files which were being
used in the test. To resolve this issue, a clone of the snapshot can be
produced which will also create the expected device files; then, the
test will behave as it did historically.
OpenZFS-issue: https://www.illumos.org/issues/6877
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/2200f27Closes#5275
Porting Notes:
- Hardcoded /dev/zvol/rdsk changed to $ZVOL_RDEVDIR for compatibility.
- Enabled in linux runfile.
These tests all pass once updated to wait for udev to create the
expected linked under /dev/zvol/.
Reviewed-by: luozhengzheng <luo.zhengzheng@zte.com.cn>
Reviewed-by: yuxiang <guo.yong33@zte.com.cn>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#5275
In this test the 'ls -ls' command was used to print testfile size in
blocks. Because the environment variable BLOCK_SIZE was set
the 'ls -ls' command detected this and output its block count as the
number of 8192 blocks. Rather than change the variable name
the -k was was added to force ls to return 1k blocks. This has the
additional advantage of behaving consistently across platforms.
For additional details on GNU 'ls' behavior regarding block size:
https://www.gnu.org/software/coreutils/manual/html_node/Block-size.html
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: yuxiang <guo.yong33@zte.com.cn>
Closes#5269
The zfs_receive_011_pos test can be enabled now that OpenZFS 6562
has been merged.
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#5276
OpenZFS 7090 - zfs should throttle allocations
Authored by: George Wilson <george.wilson@delphix.com>
Reviewed by: Alex Reece <alex@delphix.com>
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Paul Dagnelie <paul.dagnelie@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Sebastien Roy <sebastien.roy@delphix.com>
Approved by: Matthew Ahrens <mahrens@delphix.com>
Ported-by: Don Brady <don.brady@intel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
When write I/Os are issued, they are issued in block order but the ZIO
pipeline will drive them asynchronously through the allocation stage
which can result in blocks being allocated out-of-order. It would be
nice to preserve as much of the logical order as possible.
In addition, the allocations are equally scattered across all top-level
VDEVs but not all top-level VDEVs are created equally. The pipeline
should be able to detect devices that are more capable of handling
allocations and should allocate more blocks to those devices. This
allows for dynamic allocation distribution when devices are imbalanced
as fuller devices will tend to be slower than empty devices.
The change includes a new pool-wide allocation queue which would
throttle and order allocations in the ZIO pipeline. The queue would be
ordered by issued time and offset and would provide an initial amount of
allocation of work to each top-level vdev. The allocation logic utilizes
a reservation system to reserve allocations that will be performed by
the allocator. Once an allocation is successfully completed it's
scheduled on a given top-level vdev. Each top-level vdev maintains a
maximum number of allocations that it can handle (mg_alloc_queue_depth).
The pool-wide reserved allocations (top-levels * mg_alloc_queue_depth)
are distributed across the top-level vdevs metaslab groups and round
robin across all eligible metaslab groups to distribute the work. As
top-levels complete their work, they receive additional work from the
pool-wide allocation queue until the allocation queue is emptied.
OpenZFS-issue: https://www.illumos.org/issues/7090
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/4756c3d7Closes#5258
Porting Notes:
- Maintained minimal stack in zio_done
- Preserve linux-specific io sizes in zio_write_compress
- Added module params and documentation
- Updated to use optimize AVL cmp macros
The refquota_002_pos and refquota_004_pos test cases can pass
without modification.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: yuxiang <guo.yong33@zte.com.cn>
Closes#5273
coverity scan CID:147654,type: Copy into fixed size buffer
- string operation may write past the end of the fixed-size
destination buffer
coverity scan CID:147690,type: Uninitialized scalar variable
- call zfs_prop_get first in case we use sourcetype and
share_sourcetype without initialization
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: GeLiXin <ge.lixin@zte.com.cn>
Closes#5253
The ICP requires destructors to for each crypto module that is added.
These do not necessarily exist in Illumos because they assume that
these modules can never be unloaded from the kernel. Some of this
cleanup code was missed when #4760 was merged, resulting in leaks.
This patch simply fixes that.
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Issue #4760Closes#5265
coverity scan CID:147452, Type:Unchecked return value from library
coverity scan CID:147447, Type:Unchecked return value from library
coverity scan CID:147446, Type:Unchecked return value from library
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: cao.xuewen <cao.xuewen@zte.com.cn>
Closes#5264
When the exception branch exits, the buf is leaked.
Reviewed by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: luozhengzheng <luo.zhengzheng@zte.com.cn>
Closes#5262
Fix use after free in zfsctl_snapshot_unmount(). Use /usr/bin/env
instead of /bin/sh to fix a shell code injection flaw and allow use
with grsecurity.
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov
Reviewed-by: Chunwei Chen <david.chen@osnexus.com>
Signed-off-by: Stian Ellingsen <stian@plaimi.net>
Closes#5250Closes#4377
The zfs_snapshot_008_neg test case does not use nested pools and
can be safely enabled. The zfs_snapshot_009_pos test case is
also passing without modification.
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: ChaoyuZhang <zhang.chaoyu@zte.com.cn>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#5260
Enable reservation_012_pos, reservation_015_pos and reservation_016_pos
test cases which are passing.
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: yuxiang <guo.yong33@zte.com.cn>
Closes#5254
When array is passed as a parameter it degenerates into a
pointer so the sizeof(path) in is_shorthand_path() and always
get return value of 8, instead of the string length we want.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: GeLiXin <ge.lixin@zte.com.cn>
Closes#5198
This is as much an upstream compatibility as it's a bit of a performance
gain.
The illumos taskq implemention doesn't allow a TASKQ_THREADS_CPU_PCT type
to be dynamic and in fact enforces as much with an ASSERT.
As to performance, if this taskq is dynamic, it can cause excessive
contention on tq_lock as the threads are created and destroyed because it
can see bursts of many thousands of tasks in a short time, particularly
in heavy high-concurrency zvol write workloads.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tim Chase <tim@chase2k.com>
Closes#5236
When #4760 was merged tests were added to ensure that the new checksums
were working properly. However, some of the functionality for sha2
functions were not ported over, resulting in some Coverity defects and
code that would be unstable when needed in the future. This patch
simply ports over the missing code and fixes the defects in the
process.
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Issue #4760Closes#5251
Enable readonly_001_pos this test is now passing.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: ChaoyuZhang <zhang.chaoyu@zte.com.cn>
The following new test cases need to have execute permissions set:
userquota/groupspace_003_pos.ksh
userquota/userquota_013_pos.ksh
userquota/userspace_003_pos.ksh
upgrade/upgrade_userobj_001_pos.ksh
upgrade/setup.ksh
upgrade/cleanup.ksh
The following source files accidentally were marked executable:
lib/libzpool/kernel.c
lib/libshare/nfs.c
lib/libzfs/libzfs_dataset.c
lib/libzfs/libzfs_util.c
tests/zfs-tests/cmd/rm_lnkcnt_zero_file/rm_lnkcnt_zero_file.c
tests/zfs-tests/cmd/dir_rd_update/dir_rd_update.c
cmd/zed/zed_exec.c
module/icp/core/kcf_sched.c
module/zfs/dsl_pool.c
module/zfs/arc.c
module/nvpair/nvpair.c
man/man5/zfs-module-parameters.5
Reviewed-by: GeLiXin <ge.lixin@zte.com.cn>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#5241
Call mount and umount via /usr/bin/env instead of /bin/sh in
zfsctl_snapshot_mount() and zfsctl_snapshot_unmount().
This change fixes a shell code injection flaw. The call to /bin/sh
passed the mountpoint unescaped, only surrounded by single quotes. A
mountpoint containing one or more single quotes would cause the command
to fail or potentially execute arbitrary shell code.
This change also provides compatibility with grsecurity patches.
Grsecurity only allows call_usermodehelper() to use helper binaries in
certain paths. /usr/bin/* is allowed, /bin/* is not.
OpenZFS decided that ignore_hole_birth was too imprecise and
incorrect a name (and went with send_holes_without_birth_time).
Rename it in ZoL too, while keeping the name "ignore_hole_birth"
pointing to the same variable for existing consumers.
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes#5239
Updating vd->vdev_parent->vdev_nonrot in vdev_open_child()
is a race when vdev_open_child is called for many children
from a task queue.
vdev_open_child() is only called by vdev_open_children(), let
the latter update the parent vdev_nonrot member. The update
was already there, so done twice previously. Thus using the
same logic at the end in vdev_open_children() to update
vdev_nonrot, either we are vdev_uses_zvols() or not.
Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Haakan T Johansson <f96hajo@chalmers.se>
Closes#5162
Fixes ABI issues with fletcher4 code, adds support for
incremental updates, and adds ztest method for testing.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Chunwei Chen <david.chen@osnexus.com>
Signed-off-by: Gvozden Neskovic <neskovic@gmail.com>
Closes#5164
The variable snapprops_nvlist was never initialized, so properties
were not applied to the received snapshot.
Additionally, add zfs_receive_013_pos.ksh script to ZFS test suite to exercise
'zfs receive' functionality for user properties.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes#4338
Introduce a make recipe for flake8 to enable python
style checking. Ensure all python scripts pass flake8.
Return an error code of 0 for arcstat.py -v and
dbufstat.py -v. Add test cases for python scripts.
Reviewed by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ian Lee <IanLee1521@gmail.com>
Closes#5230
Using a benchmark which creates 2 million files in one TXG, I observe
that the thread running spa_sync() is on CPU almost the entire time we
are syncing, and therefore can be a performance bottleneck. About 50% of
the time in spa_sync() is in dmu_objset_do_userquota_updates().
The problem is that dmu_objset_do_userquota_updates() calls
zap_increment_int(DMU_USERUSED_OBJECT) once for every file that was
modified (or created). In this benchmark, all the files are owned by the
same user/group, so all 2 million calls to zap_increment_int() are
modifying the same entry in the zap. The same issue exists for the
DMU_GROUPUSED_OBJECT.
We should keep an in-memory map from user to space delta while we are
syncing, and when we finish, iterate over the in-memory map and modify
the ZAP once per entry. This reduces the number of calls to
zap_increment_int() from "number of objects modified" to "number of
owners/groups of modified files".
This reduced the time spent in spa_sync() in the file create benchmark
by ~33%, from 11 seconds to 7 seconds.
Upstream bugs: DLPX-44799
Ported by: Ned Bass <bass6@llnl.gov>
OpenZFS-issue: https://www.illumos.org/issues/6988
ZFSonLinux-issue: https://github.com/zfsonlinux/zfs/issues/4642
OpenZFS-commit: unmerged
Porting notes:
- Added curly braces around declaration of userquota_cache_t cache to
quiet compiler warning;
- Handled the userobj accounting the same way it proposed in this path.
Signed-off-by: Jinshan Xiong <jinshan.xiong@intel.com>
This patch tracks dnode usage for each user/group in the
DMU_USER/GROUPUSED_OBJECT ZAPs. ZAP entries dedicated to dnode
accounting have the key prefixed with "obj-" followed by the UID/GID
in string format (as done for the block accounting).
A new SPA feature has been added for dnode accounting as well as
a new ZPL version. The SPA feature must be enabled in the pool
before upgrading the zfs filesystem. During the zfs version upgrade,
a "quotacheck" will be executed by marking all dnode as dirty.
ZoL-bug-id: https://github.com/zfsonlinux/zfs/issues/3500
Signed-off-by: Jinshan Xiong <jinshan.xiong@intel.com>
Signed-off-by: Johann Lombardi <johann.lombardi@intel.com>
Implement tests to ensure that python scripts
that are distributed with ZFS continue to at
minimum run without errors. This will help prevent
accidental breaking of these scripts.
Signed-off-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Add a make recipe to enable developers
to easily run flake8 if it is available.
This will help enforce good python coding
standards.
Signed-off-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Both scripts were returning an error code of 1
when using the -v argument. -v should exit with
an error code of 0.
Signed-off-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Added new section to build icp module.
Reviewed by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Tom Caputi <tcaputi@datto.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#5232Closes#5234
The behavior of the Dracut module was very wrong before.
The correct behavior: initramfs should not run `zfs-mount` to completion
if the two generator files exist. If, however, one of them is missing,
it indicates one of three cases:
* The kernel command line did not specify a root ZFS file system, and
another Dracut module is already handling root mount (via systemd).
`mount-zfs` can run, but it will do nothing.
* There is no systemd to run `sysroot.mount` to begin with.
`mount-zfs` must run.
* The root parameter is zfs:AUTO, which cannot be run in sysroot.mount.
`mount-zfs` must run.
In any of these three cases, it is safe to run `zfs-mount` to completion.
`zfs-mount` must also delete itself if it determines it should not run,
or else Dracut will do the insane thing of running it over and over again.
Literally, the definition of insanity, doing the same thing that did not
work before, expecting different results. Doing that may have had a great
result before, when we had a race between devices appearing and pools
being mounted, and `mount-zfs` was tasked with the full responsibility
of importing the needed pool, but nowadays it is wrong behavior and
should be suppressed.
I deduced that self-deletion was the correct thing to do by looking at
other Dracut code, because (as we all are very fully aware of) Dracut
is entirely, ahem, "implementation-defined".
Tested-by: @wphilips
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Manuel Amador (Rudd-O) <rudd-o@rudd-o.com>
Closes#5157Closes#5204
Move the synchronization of inode/znode i_flgas/pflags into
the respective internal zfs function. This is mostly
mechanical work and shouldn't introduce any functional
changes.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Nikolay Borisov <n.borisov.lkml@gmail.com>
Issue #227Closes#5223
Init, compute, and fini methods are changed to work on internal context object.
This is necessary because ABI does not guarantee that SIMD registers will be preserved
on function calls. This is technically the case in Linux kernel in between
`kfpu_begin()/kfpu_end()`, but it breaks user-space tests and some kernels that
don't require disabling preemption for using SIMD (osx).
Use scalar compute methods in-place for small buffers, and when the buffer size
does not meet SIMD size alignment.
Signed-off-by: Gvozden Neskovic <neskovic@gmail.com>
Combine incrementally computed fletcher4 checksums. Checksums are combined
a posteriori, allowing for parallel computation on chunks to be implemented if
required. The algorithm is general, and does not add changes in each SIMD
implementation.
New test in ztest verifies incremental fletcher computations.
Checksum combining matrix for two buffers `a` and `b`, where `Ca` and `Cb` are
respective fletcher4 checksums, `Cab` is combined checksum, `s` is size of buffer
`b` (divided by sizeof(uint32_t)) is:
Cab[A] = Cb[A] + Ca[A]
Cab[B] = Cb[B] + Ca[B] + s * Ca[A]
Cab[C] = Cb[C] + Ca[C] + s * Ca[B] + s(s+1)/2 * Ca[A]
Cab[D] = Cb[D] + Ca[D] + s * Ca[C] + s(s+1)/2 * Ca[B] + s(s+1)(s+2)/6 * Ca[A]
NOTE: this calculation overflows for larger buffers. Thus, internally, the calculation
is performed on 8MiB chunks.
Signed-off-by: Gvozden Neskovic <neskovic@gmail.com>