Fixes ABI issues with fletcher4 code, adds support for
incremental updates, and adds ztest method for testing.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Chunwei Chen <david.chen@osnexus.com>
Signed-off-by: Gvozden Neskovic <neskovic@gmail.com>
Closes #5164
The variable snapprops_nvlist was never initialized, so properties
were not applied to the received snapshot.
Additionally, add the zfs_receive_013_pos.ksh script to the ZFS test suite to
exercise 'zfs receive' functionality for user properties.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #4338
Introduce a make recipe for flake8 to enable python
style checking. Ensure all python scripts pass flake8.
Return an error code of 0 for arcstat.py -v and
dbufstat.py -v. Add test cases for python scripts.
Reviewed by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ian Lee <IanLee1521@gmail.com>
Closes #5230
Using a benchmark which creates 2 million files in one TXG, I observe
that the thread running spa_sync() is on CPU almost the entire time we
are syncing, and therefore can be a performance bottleneck. About 50% of
the time in spa_sync() is in dmu_objset_do_userquota_updates().
The problem is that dmu_objset_do_userquota_updates() calls
zap_increment_int(DMU_USERUSED_OBJECT) once for every file that was
modified (or created). In this benchmark, all the files are owned by the
same user/group, so all 2 million calls to zap_increment_int() are
modifying the same entry in the zap. The same issue exists for the
DMU_GROUPUSED_OBJECT.
We should keep an in-memory map from user to space delta while we are
syncing, and when we finish, iterate over the in-memory map and modify
the ZAP once per entry. This reduces the number of calls to
zap_increment_int() from "number of objects modified" to "number of
owners/groups of modified files".
This reduced the time spent in spa_sync() in the file create benchmark
by ~33%, from 11 seconds to 7 seconds.
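A minimal sketch of the batching idea (names and data structures below are
illustrative only; the actual patch is structured differently):

    #include <stdint.h>
    #include <stdlib.h>

    /* Hedged sketch: accumulate per-id deltas in memory during the sync
     * pass, then issue one ZAP update per distinct user/group id. */
    typedef struct uq_entry {
            uint64_t        uqe_id;         /* uid or gid */
            int64_t         uqe_delta;      /* accumulated space delta */
            struct uq_entry *uqe_next;
    } uq_entry_t;

    static void
    uq_cache_add(uq_entry_t **head, uint64_t id, int64_t delta)
    {
            uq_entry_t *e;

            for (e = *head; e != NULL; e = e->uqe_next) {
                    if (e->uqe_id == id) {
                            e->uqe_delta += delta;  /* no ZAP touch here */
                            return;
                    }
            }
            e = calloc(1, sizeof (*e));
            e->uqe_id = id;
            e->uqe_delta = delta;
            e->uqe_next = *head;
            *head = e;
    }

    /* At the end of the sync pass, something like:
     *   for each cached entry e:
     *           zap_increment_int(os, DMU_USERUSED_OBJECT,
     *               e->uqe_id, e->uqe_delta, tx);   -- once per id
     */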
Upstream bugs: DLPX-44799
Ported by: Ned Bass <bass6@llnl.gov>
OpenZFS-issue: https://www.illumos.org/issues/6988
ZFSonLinux-issue: https://github.com/zfsonlinux/zfs/issues/4642
OpenZFS-commit: unmerged
Porting notes:
- Added curly braces around declaration of userquota_cache_t cache to
quiet compiler warning;
- Handled the userobj accounting the same way as proposed in this patch.
Signed-off-by: Jinshan Xiong <jinshan.xiong@intel.com>
This patch tracks dnode usage for each user/group in the
DMU_USER/GROUPUSED_OBJECT ZAPs. ZAP entries dedicated to dnode
accounting have the key prefixed with "obj-" followed by the UID/GID
in string format (as done for the block accounting).
A new SPA feature has been added for dnode accounting as well as
a new ZPL version. The SPA feature must be enabled in the pool
before upgrading the zfs filesystem. During the zfs version upgrade,
a "quotacheck" will be executed by marking all dnodes dirty.
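A hedged sketch of how such a dnode-accounting key might be formed (the exact
prefix macro and format used by the patch may differ):

    /* Illustrative only: "obj-" prefix plus the id rendered as a string,
     * mirroring the existing block-accounting key format. */
    char key[32];
    (void) snprintf(key, sizeof (key), "obj-%llx", (unsigned long long)uid);
    /* ... then e.g. zap_increment(os, DMU_USERUSED_OBJECT, key, delta, tx); */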
ZoL-bug-id: https://github.com/zfsonlinux/zfs/issues/3500
Signed-off-by: Jinshan Xiong <jinshan.xiong@intel.com>
Signed-off-by: Johann Lombardi <johann.lombardi@intel.com>
Implement tests to ensure that python scripts
that are distributed with ZFS continue to at
minimum run without errors. This will help prevent
accidental breaking of these scripts.
Signed-off-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Add a make recipe to enable developers
to easily run flake8 if it is available.
This will help enforce good python coding
standards.
Signed-off-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Both scripts were returning an error code of 1
when using the -v argument. -v should exit with
an error code of 0.
Signed-off-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Added a new section to build the icp module.
Reviewed by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Tom Caputi <tcaputi@datto.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #5232
Closes #5234
The behavior of the Dracut module was very wrong before.
The correct behavior: initramfs should not run `zfs-mount` to completion
if the two generator files exist. If, however, one of them is missing,
it indicates one of three cases:
* The kernel command line did not specify a root ZFS file system, and
another Dracut module is already handling root mount (via systemd).
`mount-zfs` can run, but it will do nothing.
* There is no systemd to run `sysroot.mount` to begin with.
`mount-zfs` must run.
* The root parameter is zfs:AUTO, which sysroot.mount cannot handle.
`mount-zfs` must run.
In any of these three cases, it is safe to run `zfs-mount` to completion.
`zfs-mount` must also delete itself if it determines it should not run,
or else Dracut will do the insane thing of running it over and over again.
That is literally the definition of insanity: doing the same thing that did
not work before and expecting different results. Doing that may have had a great
result before, when we had a race between devices appearing and pools
being mounted, and `mount-zfs` was tasked with the full responsibility
of importing the needed pool, but nowadays it is wrong behavior and
should be suppressed.
I deduced that self-deletion was the correct thing to do by looking at
other Dracut code, because (as we are all very well aware) Dracut
is entirely, ahem, "implementation-defined".
Tested-by: @wphilips
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Manuel Amador (Rudd-O) <rudd-o@rudd-o.com>
Closes #5157
Closes #5204
Move the synchronization of inode/znode i_flags/pflags into
the respective internal zfs function. This is mostly
mechanical work and shouldn't introduce any functional
changes.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Nikolay Borisov <n.borisov.lkml@gmail.com>
Issue #227
Closes #5223
Init, compute, and fini methods are changed to work on an internal context
object. This is necessary because the ABI does not guarantee that SIMD
registers are preserved across function calls. This technically holds in the
Linux kernel between `kfpu_begin()/kfpu_end()`, but it breaks user-space tests
and kernels that don't require disabling preemption to use SIMD (OS X).
Use scalar compute methods in place for small buffers, and when the buffer
size does not meet the SIMD size alignment.
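A hedged sketch of what the context-based method layout might look like (type
and field names here are illustrative, not necessarily the patch's):

    /* All intermediate SIMD state lives in the context object, so nothing
     * is assumed to survive in vector registers across function calls. */
    typedef union fletcher_4_ctx fletcher_4_ctx_t;

    typedef struct fletcher_4_ops {
            void (*init)(fletcher_4_ctx_t *ctx);
            void (*compute)(fletcher_4_ctx_t *ctx, const void *buf,
                uint64_t size);
            void (*fini)(fletcher_4_ctx_t *ctx, zio_cksum_t *zcp);
    } fletcher_4_ops_t;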
Signed-off-by: Gvozden Neskovic <neskovic@gmail.com>
Combine incrementally computed fletcher4 checksums. Checksums are combined
a posteriori, allowing parallel computation on chunks to be implemented if
required. The algorithm is general and does not require changes to each SIMD
implementation.
New test in ztest verifies incremental fletcher computations.
The checksum combining matrix for two buffers `a` and `b`, where `Ca` and `Cb`
are the respective fletcher4 checksums, `Cab` is the combined checksum, and `s`
is the size of buffer `b` (divided by sizeof(uint32_t)), is:
Cab[A] = Cb[A] + Ca[A]
Cab[B] = Cb[B] + Ca[B] + s * Ca[A]
Cab[C] = Cb[C] + Ca[C] + s * Ca[B] + s(s+1)/2 * Ca[A]
Cab[D] = Cb[D] + Ca[D] + s * Ca[C] + s(s+1)/2 * Ca[B] + s(s+1)(s+2)/6 * Ca[A]
NOTE: this calculation overflows for larger buffers. Thus, internally, the calculation
is performed on 8MiB chunks.
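A hedged C sketch of that combine step (assuming the usual zio_cksum_t layout
with words A..D in zc_word[0..3]; size is the byte length of buffer `b`):

    static void
    fletcher_4_combine(zio_cksum_t *cab, const zio_cksum_t *ca,
        const zio_cksum_t *cb, uint64_t size)
    {
            const uint64_t s = size / sizeof (uint32_t);

            cab->zc_word[0] = cb->zc_word[0] + ca->zc_word[0];
            cab->zc_word[1] = cb->zc_word[1] + ca->zc_word[1] +
                s * ca->zc_word[0];
            cab->zc_word[2] = cb->zc_word[2] + ca->zc_word[2] +
                s * ca->zc_word[1] + s * (s + 1) / 2 * ca->zc_word[0];
            cab->zc_word[3] = cb->zc_word[3] + ca->zc_word[3] +
                s * ca->zc_word[2] + s * (s + 1) / 2 * ca->zc_word[1] +
                s * (s + 1) * (s + 2) / 6 * ca->zc_word[0];
    }

    /* The s*(s+1)*(s+2)/6 term is what overflows for large s, hence the
     * internal 8MiB chunking mentioned above. */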
Signed-off-by: Gvozden Neskovic <neskovic@gmail.com>
As part of its tests, filetest_001_pos.ksh wipes the entire vdev to
create checksum errors. This patch uses the setup/cleanup scripts from
the scrub_mirror test to create a custom 100MB pool, rather than
using the entire device size that is passed into zfs-tests.sh
(which defaults to 2GB). This speeds up the buildbot tests, and also
makes it possible for someone to use real disks (say, 1TB) without the
test taking an insanely long amount of time.
Authored by: ilovezfs <ilovezfs@icloud.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Richard Laager <rlaager@wiktel.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Ported by: Tony Hutter <hutter2@llnl.gov>
In any pool without the extensible dataset feature flag already enabled,
creating a dataset with dedup set to use one of the new checksums would
result in the following panic as soon as any data was added:
panic[cpu0]/thread=ffffff0006761c40: feature_get_refcount(spa, feature,
&refcount) != 48 (0x30 != 0x30), file: ../../common/fs/zfs/zfeature.c
line 390
Inspection showed that feature->fi_feature was 7, which is the value of
SPA_FEATURE_EXTENSIBLE_DATASET in the spa_feature enum. This commit
adds extensible dataset as a dependency for the sha512, edonr, and skein
feature flags, which prevents the panic.
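A hedged sketch of how the dependency might be expressed when those checksum
features are registered (illumos zfeature style; the exact registration call in
this tree may differ):

    /* Each of sha512/skein/edonr would list extensible_dataset as a dep. */
    static const spa_feature_t sha512_deps[] = {
            SPA_FEATURE_EXTENSIBLE_DATASET,
            SPA_FEATURE_NONE                /* terminator */
    };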
OpenZFS-issue: https://www.illumos.org/issues/6585
OpenZFS-commit: 892586e8a1
Porting Notes:
This code was originally from Illumos, but I actually ported it from:
openzfsonosx/zfs@b62a652
Authored by: ilovezfs <ilovezfs@icloud.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Richard Laager <rlaager@wiktel.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Ported-by: Tony Hutter <hutter2@llnl.gov>
zio_checksum_to_feature() expects a zio_checksum enum, not a raw property
intval, so the new checksums weren't being detected when the
ZIO_CHECKSUM_VERIFY flag got in the way.
Given a pool without feature@sha512,
zfs create -o dedup=sha512 naughty/fivetwelve_noverify_ds
would fail as expected since the raw intval would indeed be equal to
SPA_FEATURE_SHA512.
However,
zfs create -o dedup=sha512,verify naughty/fivetwelve_verify_ds
would incorrectly succeed because ZIO_CHECKSUM_VERIFY would be in the
way, the raw intval would not be a member of the enum, and
zio_checksum_to_feature() would return SPA_FEATURE_NONE, with the result
that spa_feature_is_enabled() would never be called.
This was first detected with edonr, since in that case verify is
required.
This commit clears the ZIO_CHECKSUM_VERIFY flag (using ZIO_CHECKSUM_MASK)
before calling zio_checksum_to_feature(), and adds a check in
zio_checksum_to_feature() that the caller has applied ZIO_CHECKSUM_MASK, to
prevent the same bug from occurring again in the future.
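A hedged sketch of the caller-side masking (variable names are illustrative):

    /* Strip modifier bits such as ZIO_CHECKSUM_VERIFY before mapping the
     * dedup/checksum property value to a feature flag. */
    spa_feature_t feature =
        zio_checksum_to_feature(intval & ZIO_CHECKSUM_MASK);

    if (feature != SPA_FEATURE_NONE && !spa_feature_is_enabled(spa, feature))
            return (SET_ERROR(ENOTSUP));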
OpenZFS-issue: https://www.illumos.org/issues/6541
OpenZFS-commit: 971640e6aa
Porting notes:
This code was originally from Illumos, but I actually ported it from:
openzfsonosx/zfs@bef06e1
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com>
Reviewed by: Richard Lowe <richlowe@richlowe.net>
Approved by: Garrett D'Amore <garrett@damore.org>
Ported by: Tony Hutter <hutter2@llnl.gov>
OpenZFS-issue: https://www.illumos.org/issues/4185
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/45818ee
Porting Notes:
This code is ported on top of the Illumos Crypto Framework code:
b5e030c8db
The list of porting changes includes:
- Copied module/icp/include/sha2/sha2.h directly from illumos
- Removed from module/icp/algs/sha2/sha2.c:
#pragma inline(SHA256Init, SHA384Init, SHA512Init)
- Added 'ctx' to lib/libzfs/libzfs_sendrecv.c:zio_checksum_SHA256() since
it now takes in an extra parameter.
- Added CTASSERT() to assert.h for module/zfs/edonr_zfs.c
- Added skein & edonr to libicp/Makefile.am
- Added sha512.S. It was generated from sha512-x86_64.pl in Illumos.
- Updated ztest.c with new fletcher_4_*() args; used NULL for new CTX argument.
- In icp/algs/edonr/edonr_byteorder.h, removed the #if defined(__linux) section
to avoid #including the non-existent endian.h.
- In skein_test.c, renamed NULL to 0 in "no test vector" array entries to get
around a compiler warning.
- Fixup test files:
- Rename <sys/varargs.h> -> <varargs.h>, <strings.h> -> <string.h>,
- Remove <note.h> and define NOTE() as NOP.
- Define u_longlong_t
- Rename "#!/usr/bin/ksh" -> "#!/bin/ksh -p"
- Rename NULL to 0 in "no test vector" array entries to get around a
compiler warning.
- Remove "for isa in $($ISAINFO); do" stuff
- Add/update Makefiles
- Add some userspace headers like stdio.h/stdlib.h in place of
sys/types.h.
- EXPORT_SYMBOL *_Init/*_Update/*_Final... routines in ICP modules.
- Update scripts/zfs2zol-patch.sed
- include <sys/sha2.h> in sha2_impl.h
- Add sha2.h to include/sys/Makefile.am
- Add skein and edonr dirs to icp Makefile
- Add new checksums to zpool_get.cfg
- Move checksum switch block from zfs_secpolicy_setprop() to
zfs_check_settable()
- Fix -Wuninitialized error in edonr_byteorder.h on PPC
- Fix stack frame size errors on ARM32
- Don't unroll loops in Skein on 32-bit to save stack space
- Add memory barriers in sha2.c on 32-bit to save stack space
- Add filetest_001_pos.ksh checksum sanity test
- Add option to write pseudorandom data in file_write utility
All users of fletcher4 methods must call `fletcher_4_init()/_fini()`.
There's no benchmarking overhead when called from user-space.
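A hedged usage sketch of the required bracketing:

    /* Any code path that computes fletcher4 checksums brackets its use
     * with init/fini; in user space the init step is cheap because no
     * implementation benchmark is run there. */
    fletcher_4_init();
    /* ... fletcher_4_native() / fletcher_4_byteswap() calls ... */
    fletcher_4_fini();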
Signed-off-by: Gvozden Neskovic <neskovic@gmail.com>
This re-uses the framework established for SSE2, SSSE3 and
AVX2. However, GCC is using FP registers on Aarch64, so
unlike SSE/AVX2 we can't rely on the registers being left alone
between ASM statements. So instead, the NEON code uses
C variables and GCC extended ASM syntax. Note that since
the kernel explicitly disables vector registers, they
have to be explicitly re-enabled locally.
As we use the variable's number to define the symbolic
name, and GCC won't allow duplicate symbolic names, the
numbers have to be unique, even when the code is not
going to be used (e.g. the case for 4 registers when
using the macro with only 2). Only the variables actually
used should be declared, otherwise the build
will fail in debug mode.
This requires the replacement of the XOR(X,X) syntax
by a new ZERO(X) macro, which does the same thing but
without repeating the argument. And perhaps someday
there will be a machine where there is a more efficient
way to zero a register than XOR with itself. This affects
scalar, SSE2, SSSE3 and AVX2 as they need the new macro.
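A hedged sketch of the macro change in the x86 flavours (the asm bodies are
illustrative, not the tree's actual definitions):

    /* Old call sites wrote XOR(R, R) to clear a register; ZERO(R) names the
     * operand once, which the NEON extended-asm version requires. */
    #define XOR(s, d)   asm("pxor %" #s ", %" #d)
    #define ZERO(r)     XOR(r, r)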
It's possible to write faster implementations (different
scheduling, different unrolling, interleaving NEON and
scalar, ...) for various cores, but this one has the
advantage of fitting in the current state of the code,
and thus is likely easier to review/check/merge.
The only difference between aarch64-neon and aarch64-neonx2
is that aarch64-neonx2 unrolls some functions more.
Reviewed-by: Gvozden Neskovic <neskovic@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Romain Dolbeau <romain.dolbeau@atos.net>
Closes #4801
The error message in zpool_vdev_remove() said top-level devices
could be removed, but that has never been true.
Reported-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Laager <rlaager@wiktel.com>
Closes #4506
Closes #5213
In the default case the function must return to avoid dereferencing
'prov_mech' which will be NULL.
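A hedged sketch of the shape of the fix (identifiers other than prov_mech and
CRYPTO_MECHANISM_INVALID are illustrative):

    switch (mech_type) {
    case SOME_SUPPORTED_MECH:       /* illustrative mechanism */
            /* ... sets up prov_mech ... */
            break;
    default:
            /* prov_mech was never assigned; returning here avoids the
             * NULL dereference that follows the switch. */
            return (CRYPTO_MECHANISM_INVALID);
    }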
Reviewed-by: Tom Caputi <tcaputi@datto.com>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: candychencan <chen.can2@zte.com.cn>
Closes #5134
coverity scan CID:147531, type: Argument cannot be negative
- may copy data with negative size
coverity scan CID:147532, type: resource leaks
- may close a fd which is negative
coverity scan CID:147533, type: resource leaks
- may call pwrite64 with a negative size
coverity scan CID:147535, type: resource leaks
- may call fdopen with a negative fd
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: GeLiXin <ge.lixin@zte.com.cn>
Closes #5176
coverity scan CID:147536, type: Argument cannot be negative
- may write or close fd which is negative
coverity scan CID:147537, type: Argument cannot be negative
- may call dup2 with a negative fd
coverity scan CID:147538, type: Argument cannot be negative
- may read or fchown with a negative fd
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: GeLiXin <ge.lixin@zte.com.cn>
Closes #5185
When a timeout is specified (-t), stop worker threads in the middle of work units.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Gvozden Neskovic <neskovic@gmail.com>
Issue #5180
Closes #5190
Cppcheck 1.63 erroneously complains about an uninitialized value
in buf_init(). Newer versions of cppcheck (1.72) handle this
correctly but we'll initialize the value anyway to silence the
warning.
Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #5203
These test cases fail some percentage of the time resulting
in automated testing failures. Disable the offending tests
until they can be made reliable.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #5201
Issue #5202
Closes #5194
Add a script to change file names when upstreaming to OpenZFS/illumos.
Reviewed-by: Prashanth Sreenivasa <prashksp@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matt Ahrens <mahrens@delphix.com>
Closes #5178
Avoid calculating (1<<64) if lh_prefix_len == 0. The semantics of the method
remain the same.
Assert (lh_prefix_len > 0) in zap_expand_leaf() to detect a possible recurrence
of the same problem.
Issue #4883
Signed-off-by: Gvozden Neskovic <neskovic@gmail.com>
Explicitly promote variables to the correct type. Undefined behavior is
reported because the width of int is not well defined by the C standard.
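A hedged illustration of the class of fix:

    unsigned shift = 40;
    uint64_t bad  = 1 << shift;             /* int operand: undefined behavior */
    uint64_t good = (uint64_t)1 << shift;   /* promote first: defined for shift < 64 */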
Issue #4883
Signed-off-by: Gvozden Neskovic <neskovic@gmail.com>
An undefined operation is reported when running ztest (or zloop) compiled with
the GCC UndefinedBehaviorSanitizer. The error only happens at the top level of
dnode indirection with large enough offset values. Logically, the left shift
operation would work, but the bit-shift semantics of C and the limits of
uint64_t do not produce the desired result.
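A hedged sketch of the kind of helper that sidesteps the undefined shift (the
actual change may be structured differently):

    /* For a 64-bit value, a shift count of 64 or more is undefined in C
     * rather than "all bits shifted out", so the intended result has to
     * be special-cased. */
    static inline uint64_t
    shl64(uint64_t v, unsigned int shift)
    {
            return (shift >= 64 ? 0 : v << shift);
    }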
Issue #5059, #4883
Signed-off-by: Gvozden Neskovic <neskovic@gmail.com>
Without plugging, the default 'noop' scheduler will not merge
the BIOs which are part of a large ZIO.
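A hedged sketch of the plugging pattern around BIO submission (standard Linux
block-layer API; surrounding code is illustrative):

    struct blk_plug plug;

    blk_start_plug(&plug);
    /* submit all of the BIOs that make up the large ZIO ... */
    blk_finish_plug(&plug);

    /* With submissions batched under the plug, adjacent BIOs can be merged
     * even when the 'noop' elevator is in use. */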
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Isaac Huang <he.huang@intel.com>
Closes #5181
Because the macro ZFS_MAXPROPLEN used in the print_dataset function differs
between platforms, set it appropriately and calculate the expected number of
passes.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Signed-off-by: yuxiang <guo.yong33@zte.com.cn>
Closes #5154
Enable ro_props_001_pos and onoffs_001_pos which pass reliably.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: ChaoyuZhang <zhang.chaoyu@zte.com.cn>
Closes #5183
The default blocksize in Linux is 1024 due to a GNU-ism. Setting the
expected blocksize resolves the issue. As mentioned in the PR, an
alternative solution would be to set POSIXLY_CORRECT=1.
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: yuxiang <guo.yong33@zte.com.cn>
Closes #5167
Refactor the code so that inode->i_mode is set at the same time zp->z_mode
is changed. This has the effect of keeping both in sync without relying on
zfs_inode_update.
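A hedged sketch of the pattern (ZTOI maps a znode to its VFS inode):

    /* Update the cached VFS mode in the same place the znode mode
     * changes, so the two cannot drift apart. */
    zp->z_mode = new_mode;
    ZTOI(zp)->i_mode = new_mode;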
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Nikolay Borisov <n.borisov.lkml@gmail.com>
Closes #5158
In arc_state_fini() the `arc_l2c_only->arcs_list[*]` multilists
must be destroyed. This accidentally regressed in d3c2ae1c.
Reviewed by: Tom Caputi <tcaputi@datto.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #5151
Closes #5152