In illumos-gate, `zvol_read` and `zvol_write` are both passed uio_t
rather than bio_t. Since we are translating from bio to uio for both, we
might as well unify the logic and have code more similar to its illumos
counterpart. At the same time, we can fix some regressions that occurred
versus the original code from illumos-gate.
We refactor zvol_write to take uio and also correct the
following problems:
1. We did `dnode_hold()` on each IO when we already had a hold.
2. We would attempt to send writes that exceeded `DMU_MAX_ACCESS` to the
DMU.
3. We could call `zil_commit()` twice. In this case, this is because
Linux uses the `->write` function to send flushes and can aggregate the
flush with a write. If a synchronous write occurred with the flush, we
effectively flushed twice when there is no need to do that.
zvol_read also suffers from the first two problems. Other platforms
suffer from the first, so we leave that for a second patch so that there
is a discrete patch for them to cherry-pick.
Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Closes#4316
Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <ryao@gentoo.org>
Closes#4251
When using large blocks like 1M, there will be more than UINT16_MAX qwords in
one block, so this ASSERT would go off. Also, it is possible for the histogram
to overflow. We cap them to UINT16_MAX to prevent this.
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#4257
Perf profiling of dd on a zvol revealed that my system spent 3.16% of
its time in random_get_pseudo_bytes(). No SPL consumers need
cryptographic strength entropy, so we can reduce our overhead by
changing the implementation to utilize a fast PRNG.
The Linux kernel did not export a suitable PRNG function until it
exported get_random_int() in Linux 3.10. While we could implement an
autotools check so that we use it when it is available or even try to
access the symbol on older kernels where it is not exported using the
fact that it is exported on newer ones as justification, we can instead
implement our own pseudo-random data generator. For this purpose, I have
written one based on a 128-bit pseudo-random number generator proposed
in a paper by Sebastiano Vigna that itself was based on work by the late
George Marsaglia.
http://vigna.di.unimi.it/ftp/papers/xorshiftplus.pdf
Profiling the same benchmark with an earlier variant of this patch that
used a slightly different generator (roughly same number of
instructions) by the same author showed that time spent in
random_get_pseudo_bytes() dropped to 0.06%. That is a factor of 50
improvement. This particular generator algorithm is also well known to
be fast:
http://xorshift.di.unimi.it/#speed
The benchmark numbers there state that it runs at 1.12ns/64-bits or 7.14
GBps of throughput on an Intel Core i7-4770 in what is presumably a
single-threaded context. Using it in `random_get_pseudo_bytes()` in the
manner I have will probably not reach that level of performance, but it
should be fairly high and many times higher than the Linux
`get_random_bytes()` function that we use now, which runs at 16.3 MB/s
on my Intel Xeon E3-1276v3 processor when measured by using dd on
/dev/urandom.
Also, putting this generator's seed into per-CPU variables allows us to
eliminate overhead from both spin locks and CPU memory barriers, which
is NUMA friendly.
We could have alternatively modified consumers to use something like
`gethrtime() % 3` as suggested by both Matthew Ahrens and Tim Chase, but
that has a few potential problems that this approach avoids:
1. Switching to `gethrtime() % 3` in hot code paths today requires
diverging from illumos-gate and does nothing about potential future
patches from illumos-gate that call our slow `random_get_pseudo_bytes()`
in different hot code paths. Reimplementing `random_get_pseudo_bytes()`
with a per-CPU PRNG avoids both of those things entirely, which means
less work for us in the future.
2. Looking at the code that implements `gethrtime()`, I think it is
unlikely to be faster than this per-CPU PRNG implementation of
`random_get_pseudo_bytes()`. It would be best to go with something fast
now so that there is no point in revisiting this from a performance
perspective.
3. `gethrtime() % 3` can vary in behavior from system to system based on
kernel version, architecture and clock source. In comparison, this
per-CPU PRNG is about ~40 lines of code in `random_get_pseudo_bytes()`
that should behave consistently across all systems regardless of kernel
version, system architecture or machine clock source. It is unlikely
that we would ever need to revisit this per-CPU PRNG while the same
cannot be said for `gethrtime() % 3`.
4. `gethrtime()` uses CPU memory barriers and maybe atomic instructions
depending on the clock source, so replacing `random_get_pseudo_bytes()`
with `gethrtime()` in hot code paths could still require a future person
working on NUMA scalability to reimplement it anyway while this per-CPU
PRNG would not by virtue of using neither CPU memory barriers nor atomic
instructions. Note that I did not check various clock sources for the
presence of atomic instructions. There is simply too much code to read
and given the drawbacks versus this per-cpu PRNG, there is no point in
being certain.
5. I have heard of instances where poor quality pseudo-random numbers
caused problems for HPC code in ways that took more than a year to
identify and were remedied by switching to a higher quality source of
pseudo-random numbers. While filesystems are different than HPC code, I
do not think it is impossible for us to have instances where poor
quality pseudo-random numbers can cause problems. Opting for a well
studied PRNG algorithm that passes tests for statistical randomness over
changing callers to use `gethrtime() % 3` bypasses the need to think
about both whether poor quality pseudo-random numbers can cause problems
and the statistical quality of numbers from `gethrtime() % 3`.
6. `gethrtime()` calls `getrawmonotonic()`, which uses seqlocks. This is
probably not a huge issue, but anyone using kgdb would never be able to
step through a seqlock critical section, which is not a problem either
now or with the per-CPU PRNG:
https://en.wikipedia.org/wiki/Seqlock
The only downside that I can see is that this code's memory requirement
is O(N) where N is NR_CPUS, versus the current code and `gethrtime() %
3`, which are O(1), but that should not be a problem. The seeds will use
64KB of memory at the high end (i.e `NR_CPU == 4096`) and 16 bytes of
memory at the low end (i.e. `NR_CPU == 1`). In either case, we should
only use a few hundred bytes of code for text, especially since
`spl_rand_jump()` should be inlined into `spl_random_init()`, which
should be removed during early boot as part of "Freeing unused kernel
memory". In either case, the memory requirements are minuscule.
Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tim Chase <tim@chase2k.com>
Closes#372
When extracting tokens from the string strtok(2) is allowed to modify
the passed buffer. Therefore the zfs_strcmp_pathname() function must
make a copy of the passed string before passing it to strtok(3).
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Don Brady <don.brady@intel.com>
Closes#4312
5809 Blowaway full receive in v1 pool causes kernel panic
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Alex Reece <alex@delphix.com>
Reviewed by: Will Andrews <will@freebsd.org>
Approved by: Gordon Ross <gwr@nexenta.com>
References:
https://www.illumos.org/issues/5809https://github.com/illumos/illumos-gate/commit/f40b29c
Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>
5767 fix several problems with zfs test suite
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Approved by: Gordon Ross <gwr@nexenta.com>
References:
https://www.illumos.org/issues/5767https://github.com/illumos/illumos-gate/commit/52244c0
Porting Notes:
- Only the updates to zpool_main.c were kept because the ZFS test
suite is not currently part of the ZoL source tree. The test
suite itself should be updated to include the latest versions
of the tests once we're running it for every commit
- Fixes `zpool list` output.
Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>
This patch add a module parameter spl_taskq_kick. When writing non-zero value
to it, it will scan all the taskq, if a taskq contains a task pending for more
than 5 seconds, it will be forced to spawn a new thread. This is use as an
emergency recovery from deadlock, not a general solution.
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#529
Currently, only the 'b' flag takes an argument which is an offset into
the block at which a blkptr should be decoded. The index into the flag
string needed to be updated after parsing an argument.
Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#4304
6537 Panic on zpool scrub with DEBUG kernel
Reviewed by: Steve Gonczi <gonczi@comcast.net>
Reviewed by: Dan McDonald <danmcd@omniti.com>
Reviewed by: Igor Kozhukhov <ikozhukhov@gmail.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Matthew Ahrens <mahrens@delphix.com>
References:
https://www.illumos.org/issues/6537https://github.com/illumos/illumos-gate/commit/8c04a1f
Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>
6096 ZFS_SMB_ACL_RENAME needs to cleanup better
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Gordon Ross <gordon.w.ross@gmail.com>
Reviewed by: George Wilson <gwilson@zfsmail.com>
Approved by: Robert Mustacchi <rm@joyent.com>
References:
https://www.illumos.org/issues/6096https://github.com/illumos/illumos-gate/commit/8f5190a5
Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>
6450 scrub/resilver unnecessarily traverses snapshots created
after the scrub started
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Richard Elling <Richard.Elling@RichardElling.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
References:
https://www.illumos.org/issues/6450https://github.com/illumos/illumos-gate/commit/38d6103
Ported-by: kernelOfTruth kerneloftruth@gmail.com
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
For a Case Insensitive file system we must avoid creating negative
entries in the dentry cache. We must also pass the FIGNORECASE into
zfs_lookup so that special files are handled correctly.
We must also prevent negative dentries from being created when files are
unlinked.
Tested by running fsstress from LTP (10 loops, 10 processes, 10,000 ops.)
Also tested with printks (now removed) to ensure that lookups come to
zpl_lookup when negative should not exist.
Tests:
1. ls Some-file.txt; touch some-file.txt; ls Some-file.txt
and ensure no errors.
2. touch Some-file.txt; rm some-file.txt; ls Some-file.txt
and ensure that the last ls shows log messages showing the lookup
went all the way to zpl_lookup.
Thanks to tuxoko for helping me get this correct.
Signed-off-by: Richard Sharpe <realrichardsharpe@gmail.com>
Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#4243
6527 Possible access beyond end of string in zpool comment
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Dan McDonald <danmcd@omniti.com>
Approved by: Gordon Ross <gwr@nexenta.com>
References:
https://www.illumos.org/issues/6527https://github.com/illumos/illumos-gate/commit/2bd7a8d
Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
6495 Fix mutex leak in dmu_objset_find_dp
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Albert Lee <trisk@omniti.com>
References:
https://www.illumos.org/issues/6495https://github.com/illumos/illumos-gate/commit/2bad225
Ported-by: Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
6414 vdev_config_sync could be simpler
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
References:
https://www.illumos.org/issues/6414https://github.com/illumos/illumos-gate/commit/eb5bb58
Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Reintroduce a slightly adapted version of the Illumos logic for
synchronous unlinks. The basic idea here is that only files
smaller than zfs_delete_blocks (20480) blocks should be deleted
synchronously. Unlinking larger files should be handled
asynchronously to minimize impact to the caller.
To accomplish this iput() which is responsible for calling
zfs_znode_delete() on Linux is only called in the delete_now
path. Otherwise zfs_async_iput() is used which allows the
last reference to be dropped by a taskq thread effectively
making the removal asynchronous.
Porting notes:
- Add zfs_delete_blocks module option for performance analysis.
The default value is DMU_MAX_DELETEBLKCNT which is the same
as upstream. Reducing this value means that smaller files
will be unlinked asynchronously like large files.
- All occurrences of zfsvfs changes to zsb.
Ported-by: KernelOfTruth kerneloftruth@gmail.com
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Update copy-builtin so it may be run multiple times against
the kernel source tree. This change makes sed more discriminating
to ensure spl/ only occurs once in core-y.
Signed-off-by: Chip Parker <aparker@enthought.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#526
mount.zfs is called by convention (and util-linux) with arguments
last, i.e.
% mount.zfs <dataset> <mountpoint> -o <options>
This is not a problem on glibc since GNU getopt(3) will reorder the
arguments. However, alternative libc such as musl libc (or glibc with
$POSIXLY_CORRECT set) will not permute argv and fail to parse the -o
<options>. Use getopt_long so musl will permute arguments.
Signed-off-by: Christian Neukirchen <chneukirchen@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#4222
Since it's set to arc_c_max / 2, it must be set after arc_c_max is set.
Also added protection against it falling below 2 * maxblocksize in
userland builds.
Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#4268
Adjusting arc_c directly is racy because it can happen in the context
of multiple threads. It should always be >= 2 * maxblocksize. Set it
to a known valid value rather than adjusting it directly.
In addition refactor arc_shrink() to a simpler structure, protect against
underflow in the calculation of the new arc_c value.
Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reverts: 935434efCloses: #3904Closes: #4161
Previous commit be29e6a updated kobj_read_file() so it no longer
unconditionally passes RLIM64_INFINITY. The vn_rdwr() function
needs to be updated accordingly.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #513
LLVM's static analyzer showed that we could subtract using an
uninitialized value on an error from vn_rdwr().
The correct behavior is to return -1 on an error, so lets do that
instead.
Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#4104
I noticed that the SPL implementation of kobj_read_file is not correct
after comparing it with the userland implementation of kobj_read_file()
in zfsonlinux/zfs#4104.
Note that we no longer pass RLIM64_INFINITY with this, but our vn_rdwr
implementation did not support it anyway, so there is no difference.
Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#513
1778 Assertion failed: rn->rn_nozpool == B_FALSE
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george@delphix.com>
Reviewed by: Richard Elling <richard.elling@richardelling.com>
Approved by: Gordon Ross <gordon.ross@nexenta.com>
References:
https://www.illumos.org/issues/1778https://github.com/illumos/illumos-gate/commit/bd0f709
Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>
5518 Memory leaks in libzfs import implementation
Reviewed by: Dan Fields <dan.fields@nexenta.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Serghei Samsi <sscdvp@gmail.com>
Approved by: Dan McDonald <danmcd@omniti.com>
References:
https://www.illumos.org/issues/5518https://github.com/illumos/illumos-gate/commit/078266a
Porting notes:
- One hunk of this change was already applied independently in
commit 4def05f.
Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>
6815179 zpool import with a large number of LUNs is too slow
6844191 zpool import, scanning of disks should be multi-threaded
References:
https://github.com/illumos/illumos-gate/commit/4f67d75
Porting notes:
- This change was originally never ported to Linux due to it
dependence on the thread pool interface. This patch solves
that issue by switching the code to use the existing taskq
implementation which provides the same basic functionality.
However, in order for this to work properly thread_init()
and thread_fini() must be called around to taskq consumer
to perform the needed thread initialization.
- The check_one_slice, nozpool_all_slices, and check_slices
functions have been disabled for Linux. They are difficult,
but possible, to implement for Linux due to how partitions
are get names. Since this is only an optimization this code
can be added at a latter date.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
4950 files sometimes can't be removed from a full filesystem
Reviewed by: Adam Leventhal <adam.leventhal@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed by: Boris Protopopov <bprotopopov@hotmail.com>
Approved by: Dan McDonald <danmcd@omniti.com>
References:
https://www.illumos.org/issues/4950https://github.com/illumos/illumos-gate/commit/4bb7380
Porting notes:
- ZoL currently does not log discards to zvols, so the portion of
this patch that modifies the discard logging to mark it as
freeing space has been discarded.
2. may_delete_now had been removed from zfs_remove() in ZoL.
It has been reintroduced.
3. We do not try to emulate vnodes, so the following lines are
not valid on Linux:
mutex_enter(&vp->v_lock);
may_delete_now = vp->v_count == 1 && !vn_has_cached_data(vp);
mutex_exit(&vp->v_lock);
This has been replaced with:
mutex_enter(&zp->z_lock);
may_delete_now = atomic_read(&ip->i_count) == 1 && !(zp->z_is_mapped);
mutex_exit(&zp->z_lock);
Ported-by: Richard Yao <richard.yao@clusterhq.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Correct the redhat specfile so that working debuginfo rpms are created
for the kernel modules. The generic specfile already does the right
thing.
Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closeszfsonlinux/zfs#4224
Correct the redhat specfile so that working debuginfo rpms are created
for the kernel modules. The generic specfile already does the right
thing.
Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#4224
Check if the lock is held while holding the z_hold_locks() lock.
This prevents a possible use-after-free bug for callers which are
not holding the lock. There currently are no such callers so this
can't cause a problem today but it has been fixed regardless.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Closes#4244
Issue #4124
To prevent taskq_member holding tq_lock and doing linear search, thus causing
contention. We store the taskq pointer to which the thread belongs in tsd.
This way taskq_member will not need to touch tq_lock, and tsd has per slot
spinlock. So the contention should be reduced greatly.
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#500Closes#504Closes#505
The pfn_t typedef was inherited from Illumos but never directly
used by any SPL consumers. This didn't cause any issues until
the Linux 4.5 kernel introduced a typedef of the same name.
See torvalds/linux/commit/34c0fd54, this patch removes the
unused Illumos version to prevent a conflict.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Closes#524
In b4ad50a, we abandoned memalloc_noio_save in favor of spl_fstrans_mark
because earlier kernel with it doesn't turn off __GFP_FS. However, for newer
kernel, we would prefer PF_MEMALLOC_NOIO because it would work for allocation
in kernel which we cannot control otherwise. So in this patch, we turn on both
PF_FSTRANS and PF_MEMALLOC_NOIO in spl_fstrans_mark.
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#523
The pfn_t typedef was inherited from Illumos but never directly
used by any libspl consumers. This doesn't cause any issues in
user space but for consistency with the kernel build it has been
removed. See torvalds/linux/commit/34c0fd54.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Issue #4228
The registered xattr .list handler was simplified in the 4.5 kernel
to only perform a permission check. Given a dentry for the file it
must return a boolean indicating if the name is visible. This
differs slightly from the previous APIs which also required the
function to copy the name in to the provided list and return its
size. That is now all the responsibility of the caller.
This should be straight forward change to make to ZoL since we've
always required the caller to make the copy. However, this was
slightly complicated by the need to support 3 older APIs. Yes,
between 2.6.32 and 4.5 there are 4 versions of this interface!
Therefore, while the functional change in this patch is small it
includes significant cleanup to make the code understandable and
maintainable. These changes include:
- Improved configure checks for .list, .get, and .set interfaces.
- Interfaces checked from newest to oldest.
- Strict checking for each possible known interface.
- Configure fails when no known interface is available.
- HAVE_*_XATTR_LIST renamed HAVE_XATTR_LIST_* for consistency
with similar iops and fops configure checks.
- POSIX_ACL_XATTR_{DEFAULT|ACCESS} were removed forcing callers to
move to their replacements, XATTR_NAME_POSIX_ACL_{DEFAULT|ACCESS}.
Compatibility wrapper were added for old kernels.
- ZPL_XATTR_LIST_WRAPPER added which behaves the same as the existing
ZPL_XATTR_{GET|SET} WRAPPERs. Only the inode is guaranteed to be
a valid pointer, passing NULL for the 'list' and 'name' variables
is allowed and must be checked for. All .list functions were
updated to use the wrapper to aid readability.
- zpl_xattr_filldir() updated to use the .list function for its
permission check which is consistent with the updated Linux 4.5
interface. If a .list function is registered it should return 0
to indicate a name should be skipped, if there is no registered
function the name will be added.
- Additional documentation from xattr(7) describing the correct
behavior for each namespace was added before the relevant handlers.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Issue #4228
The follow_link() interface was retired in favor of get_link().
In the process of phasing in get_link() the Linux kernel went
through two different versions. The first of which depended
on put_link() and the final version on a delayed done function.
- Improved configure checks for .follow_link, .get_link, .put_link.
- Interfaces checked from newest to oldest.
- Strict checking for each possible known interface.
- Configure fails when no known interface is available.
- Both versions .get_link are detected and supported as well
two previous versions of .follow_link.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Issue #4228
5045 use atomic_{inc,dec}_* instead of atomic_add_*
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Garrett D'Amore <garrett@damore.org>
Approved by: Robert Mustacchi <rm@joyent.com>
References:
https://www.illumos.org/issues/5045https://github.com/illumos/illumos-gate/commit/1a5e258
Porting notes:
- All changes to non-ZFS files dropped.
- Changes to zfs_vfsops.c dropped because they were Illumos specific.
Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#4220
4953 zfs rename <snapshot> need not involve libshare
4954 "zfs create" need not involve libshare if we are not sharing
4955 libshare's get_zfs_dataset need not sort the datasets
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed by: Dan McDonald <danmcd@omniti.com>
Reviewed by: Gordon Ross <gordon.ross@nexenta.com>
Approved by: Garrett D'Amore <garrett@damore.org>
References:
https://www.illumos.org/issues/4953https://www.illumos.org/issues/4954https://www.illumos.org/issues/4955https://github.com/illumos/illumos-gate/commit/33cde0d
Porting notes:
- Dropped qsort libshare_zfs.c hunk, no equivalent ZoL code.
Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#4219
4039 zfs_rename()/zfs_link() needs stronger test for XDEV
Reviewed by: Gordon Ross <gordon.ross@nexenta.com>
Reviewed by: Kevin Crowe <kevin.crowe@nexenta.com>
Reviewed by: Saso Kiselkov <skiselkov.ml@gmail.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Dan McDonald <danmcd@nexenta.com>
References:
https://www.illumos.org/issues/4039https://github.com/illumos/illumos-gate/commit/18e6497
Porting notes:
- This check was updated in Linux in a similar fashion early on in
the port. Therefore, this patch just reorders the function and
updates the comment so it flows the same way as the upstream code.
Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#4218
6298 zfs_create_008_neg and zpool_create_023_neg need to be updated
for large block support
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
References:
https://www.illumos.org/issues/6298https://github.com/illumos/illumos-gate/commit/e9316f7
Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#4217
3557 dumpvp_size is not updated correctly when a dump zvol's size is changed
3558 setting the volsize on a dump device does not return back ENOSPC
3559 setting a volsize larger than the space available sometimes succeeds
3560 dumpadm should be able to remove a dump device
Reviewed by: Adam Leventhal <ahl@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Approved by: Albert Lee <trisk@nexenta.com>
References:
https://www.illumos.org/issues/3559https://github.com/illumos/illumos-gate/commit/c61ea56
Porting notes:
- Internal zvol.c changes not applied due to implementation differences.
The external interface and behavior was already consistent with the
latest upstream code.
- Retired 2.6.28 HAVE_CHECK_DISK_SIZE_CHANGE configure check. All
supported kernels (2.6.32 and newer) provide this interface.
Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#4217
When replacing an xattr would cause overflowing in SA, we would fallback
to xattr dir. However, current implementation don't clear the one in SA,
so we would end up with duplicated SA.
For example, running the following script on an xattr=sa filesystem
would cause duplicated "user.1".
-- dup_xattr.sh begin --
randbase64()
{
dd if=/dev/urandom bs=1 count=$1 2>/dev/null | openssl enc -a -A
}
file=$1
touch $file
setfattr -h -n user.1 -v `randbase64 5000` $file
setfattr -h -n user.2 -v `randbase64 20000` $file
setfattr -h -n user.3 -v `randbase64 20000` $file
setfattr -h -n user.1 -v `randbase64 20000` $file
getfattr -m. -d $file
-- dup_xattr.sh end --
Also, when a filesystem is switch from xattr=sa to xattr=on, it will
never modify those in SA. This would cause strange behavior like, you
cannot delete an xattr, or setxattr would cause duplicate and the result
would not match when you getxattr.
For example, the following shell sequence.
-- shell begin --
$ sudo zfs set xattr=sa pp/fs0
$ touch zzz
$ setfattr -n user.test -v asdf zzz
$ sudo zfs set xattr=on pp/fs0
$ setfattr -x user.test zzz
setfattr: zzz: No such attribute
$ getfattr -d zzz
user.test="asdf"
$ setfattr -n user.test -v zxcv zzz
$ getfattr -d zzz
user.test="asdf"
user.test="asdf"
-- shell end --
We fix this behavior, by first finding where the xattr resides before
setxattr. Then, after we successfully updated the xattr in one location,
we will clear the other location. Note that, because update and clear
are not in single tx, we could still end up with duplicated xattr. But
by doing setxattr again, it can be fixed.
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Closes#3472Closes#4153
The fast write mutex is intended to protect accounting, but it is
redundant because all accounting is performed through atomic operations.
It also serializes all metaslab IO behind a mutex, which introduces a
theoretical scaling regression that the Illumos developers did not like
when we showed this to them. Removing it makes the selection of the
metaslab_group lock free as it is on Illumos. The selection is not quite
the same without the lock because the loop races with IO completions,
but any imbalances caused by this are likely to be corrected by
subsequent metaslab group selections.
Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#3643