ZoL had lowered the minimum ARC size to 4MiB to better accommodate tiny
systems such as the raspberry pi, however, as of addition of large block
support, the arc_adapt() function depends on arc_c being >= 32MiB (2 *
SPA_MAXBLOCKSIZE).
This patch raises the minimum ARC size to 32MiB and adds a VERIFY test
to arc_adapt() for future-proofing.
Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
As described in the comment above arc_adapt_thread() it is critical
that the arc_adapt_thread() function never sleep while holding a hash
lock. This behavior was possible in the Linux implementation because
the arc_prune() logic was implemented to be synchronous. Under
illumos the analogous dnlc_reduce_cache() function is asynchronous.
To address this the arc_do_user_prune() function is has been reworked
in to two new functions as follows:
* arc_prune_async() is an asynchronous implementation which dispatches
the prune callback to be run by the system taskq. This makes it
suitable to use in the context of the arc_adapt_thread().
* arc_prune() is a synchronous implementation which depends on the
arc_prune_async() implementation but blocks until the outstanding
callbacks complete. This is used in arc_kmem_reap_now() where it
is safe, and expected, that memory will be freed.
This patch additionally adds the zfs_arc_meta_strategy module option
while allows the meta reclaim strategy to be configured. It defaults
to a balanced strategy which has been proved to work well under Linux
but the illumos meta-only strategy can be enabled.
Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Replace taskq_wait() with taskq_wait_oustanding(). This way callers
will only block until previously submitted tasks have been completed.
This was the previous behavior of task_wait() prior to the introduction
of taskq_wait_outstanding() so this isn't really a functionalty change
for these callers.
Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
SPL commit behlendorf/spl@9cef1b5 adds the taskq_wait_outstanding()
interface. See the commit log for the full justification for this
addition. This patch adds the required user space counterpart.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tim Chase <tim@chase2k.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Richard Elling <richard.elling@richardelling.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Porting notes and other significant code changes:
The illumos 5368 patch (ARC should cache more metadata), which
was never picked up by ZoL, is mostly reverted by this patch.
Since ZoL relies on the kernel asynchronously calling the shrinker to
actually reap memory, the shrinker wakes up arc_reclaim_waiters_cv every
time it runs.
The arc_adapt_thread() function no longer calls arc_do_user_evicts()
since the newly-added arc_user_evicts_thread() calls it periodically.
Notable conflicting ZoL commits which conflicted with this patch or
whose effects are either duplicated or un-done by this patch:
302f753 - Integrate ARC more tightly with Linux
39e055c - Adjust arc_p based on "bytes" in arc_shrink
f521ce1 - Allow "arc_p" to drop to zero or grow to "arc_c"
77765b5 - Remove "arc_meta_used" from arc_adjust calculation
94520ca - Prune metadata from ghost lists in arc_adjust_meta
Trace support for multilist_insert() and multilist_remove() has been
added and produces the following output:
fio-12498 [077] .... 112936.448324: zfs_multilist__insert: ml { offset 240 numsublists 80 sublistidx 63 }
fio-12498 [077] .... 112936.448347: zfs_multilist__remove: ml { offset 240 numsublists 80 sublistidx 29 }
The following arcstats have been removed:
recycle_miss - Used by arcstat.py and arc_summary.py, both of which
have been updated appropriately.
l2_writes_hdr_miss
The following arcstats have been added:
evict_not_enough - Number of times arc_evict_state() was unable to
evict enough buffers to reach its target amount.
evict_l2_skip - Number of times arc_evict_hdr() skipped eviction
because it was being written to the l2arc.
l2_writes_lock_retry - Replaces l2_writes_hdr_miss. Number of times
l2arc_write_done() failed to acquire hash_lock (and re-tries).
arc_meta_min - Shows the value of the zfs_arc_meta_min module
parameter (see below).
The "index" column of the "dbuf" kstat has been removed since it doesn't
have a direct analog in the new multilist scheme. Additional multilist-
related stats could be added in the future but would likely require
extensions to the mulilist API.
The following module parameters have been added:
zfs_arc_evict_batch_limit - Number of ARC headers to free per sub-list
before moving on to the next sub-list.
zfs_arc_meta_min - Enforce a floor on the amount of metadata in
the ARC.
zfs_arc_num_sublists_per_state - Number of multilist sub-lists per
ARC state.
zfs_arc_overflow_shift - Controls amount by which the ARC must exceed
the target size to be considered "overflowing".
Ported-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov
5408 managing ZFS cache devices requires lots of RAM
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Don Brady <dev.fs.zfs@gmail.com>
Reviewed by: Josef 'Jeff' Sipek <josef.sipek@nexenta.com>
Approved by: Garrett D'Amore <garrett@damore.org>
Porting notes:
Due to the restructuring of the ARC-related structures, this
patch conflicts with at least the following existing ZoL commits:
6e1d7276c9
Fix inaccurate arcstat_l2_hdr_size calculations
The ARC_SPACE_HDRS constant no longer exists and has been
somewhat equivalently replaced by HDR_L2ONLY_SIZE.
e0b0ca983d
Add visibility in to cached dbufs
The new layering of l{1,2}arc_buf_hdr_t within the arc_buf_hdr
struct requires additional structure member names to be used
when referencing the inner items. Also, the presence of L1 or L2
inner member is indicated by flags using the new HDR_HAS_L{1,2}HDR
macros.
Ported by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
5369 arc flags should be an enum
5370 consistent arc_buf_hdr_t naming scheme
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Alex Reece <alex.reece@delphix.com>
Reviewed by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed by: Richard Elling <richard.elling@richardelling.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Porting notes:
ZoL has moved some ARC definitions into arc_impl.h.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Ported by: Tim Chase <tim@chase2k.com>
This reverts only the l2arc_hdr part of commit
ecf3d9b8e6 in preparation for the illumos
5497 "lock contention on arcs_mtx" patch which does the same thing
but uses the newer two-level ARC structure following the Illumos 5408
"managing ZFS cache devices requires lots of RAM" patch.
Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Illumos 5497 "lock contention on arcs_mtx" reworks eviction and obviates
the need for this.
Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
This reverts commit 037763e44e in
preparation for the illumos 5497 "lock contention on arcs_mtx" patch
which includes a fix for this very problem.
ZoL had picked up a subset of the illumos 5497 patch to deal with the
l2arc compression buffer leak.
Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
This reverts commit 16fcdea363 in preparation
for the illumos 5497 "lock contention on arcs_mtx" patch which eliminates
"marker" within the ARC code.
Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Commit c3520e7 restructured vdev_add_child() in such a way that
the spa variable was unused during non-debug builds. This is
consistent with the upstream illumos code but because ZoL, unlike
illumos, is built with all compiler warnings enabled this causes
a legitimate warning. Revert this hunk of the patch to keep the
build clean.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #3432
5818 zfs {ref}compressratio is incorrect with 4k sector size
Reviewed by: Alex Reece <alex@delphix.com>
Reviewed by: George Wilson <george@delphix.com>
Reviewed by: Richard Elling <richard.elling@richardelling.com>
Reviewed by: Steven Hartland <killing@multiplay.co.uk>
Approved by: Albert Lee <trisk@omniti.com>
References:
https://www.illumos.org/issues/5818https://github.com/illumos/illumos-gate/commit/81cd5c5
Ported-by: Don Brady <don.brady@intel.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#3432
5269 zpool import slow
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george@delphix.com>
Reviewed by: Dan McDonald <danmcd@omniti.com>
Approved by: Dan McDonald <danmcd@omniti.com>
References:
https://www.illumos.org/issues/5269https://github.com/illumos/illumos-gate/commit/12380e1e
Ported-by: DHE <git@dehacked.net>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#3396
* Add information about the 'zpool events' command in zpool(8).
* More events and payloads defined in zfs-events(5).
* I/O Stages and I/O Flags sections added.
* Remove unused legacy "zio_deadline" payload define.
Signed-off-by: Turbo Fredriksson <turbo@bayour.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#3467
The function dmu_objset_userquota_get_ids() checks and uses dn->dn_bonus
outside of dn_struct_rwlock. If the dnode is being freed then the bonus
dbuf may be in the process of getting evicted. In this case there is a
race that may cause dmu_objset_userquota_get_ids() to access the dbuf
after it has been destroyed. To prevent this, ensure that when we are
using the bonus dbuf we are either holding a reference on it or have
taken dn_struct_rwlock.
Signed-off-by: Ned Bass <bass6@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#3443
- Don't check db->bb_blkid, but use the blkid argument instead.
Checking db->db_blkid may be unsafe since we doesn't yet have a
hold on the dbuf so its validity is unknown.
- Call mutex_exit() on found_db, not db, since it's not certain that
they point to the same dbuf, and the mutex was taken on found_db.
Signed-off-by: Ned Bass <bass6@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #3443
* Change the order of the function library check/load.
Redhat based system _can_ have a /lib/lsb/init-functions file (from
the redhat-lsb-core package), but it's only partially what we can use.
Instead, look for that file last, giving the script a chance to catch
the 'real' distribution file.
* Filter out dashes and dots in dataset name in read_mtab().
* Get rid of 'awk' entirely. This is usually in /usr, which might not
be availible.
* Get rid of the 'find /dev/disk/by-*' (find is on /usr, which might not
be availible). Instead use echo in a for loop.
* Rebuild scripts if any of the *.in files changed.
* Move the sed part that filters out duplicates inside the check fo
valid variable.
Signed-off-by: Turbo Fredriksson turbo@bayour.com
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#3463Closes#3457
* Based on the init scripts included with Debian GNU/Linux, then take code
from the already existing ones, trying to merge them into one set of
scripts that will work for 'everyone' for better maintainability.
* Add configurable variables to control the workings of the init scripts:
* ZFS_INITRD_PRE_MOUNTROOT_SLEEP
Set a sleep time before we load the module (used primarily by initrd
scripts to allow for slower media (such as USB devices etc) to be
availible before we load the zfs module).
* ZFS_INITRD_POST_MODPROBE_SLEEP
Set a timed sleep in the initrd to after the load of the zfs module.
* ZFS_INITRD_ADDITIONAL_DATASETS
To allow for mounting additional datasets in the initrd. Primarily used
in initrd scripts to allow for when filesystem needed to boot (such as
/usr, /opt, /var etc) isn't directly under the root dataset.
* ZFS_POOL_EXCEPTIONS
Exclude pools from being imported (in the initrd and/or init scripts).
* ZFS_DKMS_ENABLE_DEBUG, ZFS_DKMS_ENABLE_DEBUG_DMU_TX, ZFS_DKMS_DISABLE_STRIP
Set to control how dkms should build the dkms packages.
* ZPOOL_IMPORT_PATH
Set path(s) where "zpool import" should import pools from.
This was previously the job of "USE_DISK_BY_ID" (which is still used
for backwards compatibility) but was renamed to allow for better
control of import path(s).
* If old USE_DISK_BY_ID is set, but not new ZPOOL_IMPORT_PATH, then we
set ZPOOL_IMPORT_PATH to sane defaults just to be on the safe side.
* ZED_ARGS
To allow for local options to zed without having to change the init script.
* The import function, do_import(), imports pools by name instead of '-a'
for better control of pools to import and from where.
* If USE_DISK_BY_ID is set (for backwards compatibility), but isn't 'yes'
then ignore it.
* If pool(s) isn't found with a simple "zpool import" (seen it happen),
try looking for them in /dev/disk/by-id (if it exists). Any duplicates
(pools found with both commands) is filtered out.
* IF we have found extra pool(s) this way, we must force USE_DISK_BY_ID
so that the first, simple "zpool import $pool" is able to find it.
* Fallback on importing the pool using the cache file (if it exists) only
if 'simple' import (either with ZPOOL_IMPORT_PATH or the 'built in'
defaults) didn't work.
* The export function, do_export(), will export all pools imported, EXCEPT
the root pool (if there is one).
* ZED script from the Debian GNU/Linux packages added.
* Refreshed ZED init script from behlendorf@5e7a660 to be portable so it
may be used on both LSB and Redhat style systems.
* If there is no pool(s) imported and zed successfully shut down, we will
unload the zfs modules.
* The function library file for the ZoL init script is installed as
/etc/init.d/zfs-functions.
* The four init scripts, the /etc/{defaults,sysconfig,conf.d}/zfs config file
as well as the common function library is tagged as '%config(noreplace)' in
the rpm rules file to make sure they are not replaced automatically if locally
modifed.
* Pitfals and workarounds:
* If we're running from init, remove stale /etc/dfs/sharetab before importing
pools in the zfs-import init script.
* On Debian GNU/Linux, there's a 'sendsigs' script that will kill basically
everything quite early in the shutdown phase and zed is/should be stopped
much later than that. We don't want zed to be among the ones killed, so add
the zed pid to list of pids for 'sendsigs' to ignore.
* CentOS uses echo_success() and echo_failure() to print out status of
command. These in turn uses "echo -n \0xx[etc]" to move cursor and choose
colour etc. This doesn't work with the modified IFS variable we need to
use in zfs-import for some reason, so work around that when we define
zfs_log_{end,failure}_msg() for RedHat and derivative distributions.
* All scripts passes ShellCheck (with one false positive in do_mount()).
Signed-off-by: Turbo Fredriksson turbo@bayour.com
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Richard Yao <ryao@gentoo.org>
Reviewed by: Chris Dunlap <cdunlap@llnl.gov>
Closes#2974Closes#2107
Commit 87abfcb broke the systemd import service by treating the
ExecStart line as if it were a shell command that could be executed.
This isn't the way systemd works and the correct way to handle this
case is with ExecStartPre. This patch updates the zfs import service
files accordingly,
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Steven Noonan <steven@uplinklabs.net>
Signed-off-by: Chris Siebenmann <cks.git01@cs.toronto.edu>
Closes#3440
All fprintf() error messages are moved out of the libzfs_init()
library function where they never belonged in the first place. A
libzfs_error_init() function is added to provide useful error
messages for the most common causes of failure.
Additionally, in libzfs_run_process() the 'rc' variable was renamed
to 'error' for consistency with the rest of the code base.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chris Dunlap <cdunlap@llnl.gov>
Signed-off-by: Richard Yao <ryao@gentoo.org>
While module loading itself is synchronous the creation of the /dev/zfs
device is not. This is because /dev/zfs is typically created by a udev
rule after the module is registered and presented to user space through
sysfs. This small window between module loading and device creation
can result in spurious failures of libzfs_init().
This patch closes that race by extending libzfs_init() so it can detect
that the modules are loaded and only if required wait for the /dev/zfs
device to be created. This allows scripts to reliably use the following
shell construct without the need for additional error handling.
$ /sbin/modprobe zfs && /sbin/zpool import -a
To minimize the potential time waiting in libzfs_init() a strategy
similar to adaptive mutexes is employed. The function will busy-wait
for up to 10ms based on the expectation that the modules were just
loaded and therefore the /dev/zfs will be created imminently. If it
takes longer than this it will fall back to polling for up to 10 seconds.
This behavior can be customized to some degree by setting the following
new environment variables. This functionality is provided for backwards
compatibility with existing scripts which depend on the module auto-load
behavior. By default module auto-loading is now disabled.
* ZFS_MODULE_LOADING="YES|yes|ON|on" - Attempt to load modules.
* ZFS_MODULE_TIMEOUT="<seconds>" - Seconds to wait for /dev/zfs
The zfs-import-* systemd service files have been updated to call
'/sbin/modprobe zfs' so they no longer rely on the legacy auto-loading
behavior.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chris Dunlap <cdunlap@llnl.gov>
Signed-off-by: Richard Yao <ryao@gentoo.org>
Closes#2556
If the command "shellcheck" exists, then find all shell scripts and
run shellcheck on them.
* Use 'gcc' format with shellcheck.
* Exclude zfs-script-config.sh (which isn't really a script).
Signed-off-by: Turbo Fredriksson <turbo@bayour.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#3428
Prefixing an octal value with a leading zero is the standard way
to disambiguate it. This change only impacts the `zfs diff` output
and is therefore very limited in scope.
Signed-off-by: Hajo M<C3><B6>ller <dasjoe@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#3417
Commit 60e9f69 added the --with-mounthelperdir option for Gentoo
and in the process accidentally modified the default installation
location. For security reasons mount(8) expects it to only be
installed under /sbin.
Signed-off-by: Turbo Fredriksson <turbo@bayour.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#3426
The dbu_evict_taskq added in 0c66c32d is only invoked via
taskq_dispatch_ent(). In these cases, ZoL's implementation of taskqs
requires the entries to be initialized first with taskq_init_ent() in
order that, among other things, the embedded spinlock is initialized
properly.
Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#3419
5243 zdb -b could be much faster
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Richard Elling <richard.elling@gmail.com>
Approved by: Dan McDonald <danmcd@omniti.com>
References:
https://www.illumos.org/issues/5243https://github.com/illumos/illumos-gate/commit/f7950bf
Ported-by: Don Brady <don.brady@intel.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#3414
If the pool/dataset command-line argument is specified with a trailing
slash, for example, "tank/", it is interpreted as the root dataset.
Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#3415
There seems to be a annoying problem using NFSv4 to access ZFS
file systems under certain circumstances. It's easily reproduced:
nfs_client1: mount server:/export /mnt
nfs_client1: cd /mnt
nfs_client1: echo foo >junk
nfs_client1: cat junk
foo
Now on a different NFSv4 client:
nfs_client2: mount server:/export /mnt
nfs_client2: cd /mnt
nfs_client2: vi junk
# Make some changes to /mnt/junk and save
# This change the inode associated with /mnt/junk
Now back to the original client:
nfs_client1: cat junk
cat: junk: No such file or directory
Admittedly NFSv4 is not advertised as a cluster file system that
maintains a completely coherent view of data across multiple nodes.
But it does have some mechanisms built in that try to deal with
situations like the above. Namely, it employs specialized file
handle lookup routines that return ESTALE when a file handle contains
a non-existant inode value. The ESTALE return triggers a return
full file path lookup from the client to determine if the file has
actually gone away or if the cached file handle is no longer valid.
ZFS behavior can be brought into line with other file systems
(e.g., ext4) by applying the following patch:
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#3224
Per the documentation for dnode_next_offset in dnode.c, the "txg"
parameter specifies a lower bound on which transaction the dnode can
be found in. We are interested in all dnodes that are removed between
the first and last transaction in the snapshot. It doesn't need to be
created in that snapshot to correspond to a removed file.
In fact, the behavior of zfs diff in the test case exactly matches
this: the transaction that created the data that was deleted in snapshot
"2" was produced before, in snapshot "1", definitely predating the first
transaction in snapshot "2".
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tim Chase <Tim Chase <tim@onlight.com>
Closes#2081
When ASSERTs are compiled out by using the --disable-debug configure
option. Then the local variable 'dsl_pool_t *dp' will be unused and
generate a compiler warning. Since this variable is only used once
in the ASSERT replace it with 'ds->ds_dir->dd_pool'.
This has the additional advantage of potentially saving a few bytes
on the stack depending on how gcc decides to compile the function.
This issue was not noticed immediately because the automated builders
use --enable-debug to make the testing as rigorous as possible.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chris Dunlap <cdunlap@llnl.gov>
Closes#3410
5765 add support for estimating send stream size with lzc_send_space when source is a bookmark
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: Steven Hartland <killing@multiplay.co.uk>
Reviewed by: Bayard Bell <buffer.g.overflow@gmail.com>
Approved by: Albert Lee <trisk@nexenta.com>
References:
https://www.illumos.org/issues/5765https://github.com/illumos/illumos-gate/commit/643da460
Porting notes:
* Unused variable 'recordsize' in dmu_send_estimate() dropped
Ported-by: DHE <git@dehacked.net>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#3397
5810 zdb should print details of bpobj
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Alex Reece <alex@delphix.com>
Reviewed by: George Wilson <george@delphix.com>
Reviewed by: Will Andrews <will@freebsd.org>
Reviewed by: Simon Klinkert <simon.klinkert@gmail.com>
Approved by: Gordon Ross <gwr@nexenta.com>
References:
https://www.illumos.org/issues/5810https://github.com/illumos/illumos-gate/commit/732885fc
Ported-by: DHE <git@dehacked.net>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#3387
5351 scrub goes for an extra second each txg
5352 scrub should pause when there is some dirty data
Author: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Alex Reece <alex.reece@delphix.com>
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Richard Elling <richard.elling@richardelling.com>
Approved by: Dan McDonald <danmcd@omniti.com>
References:
https://www.illumos.org/issues/5351https://github.com/illumos/illumos-gate/commit/6f6a76a
Ported-by: Chris Dunlop <chris@onthe.net.au>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#3383
Author: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Alex Reece <alex.reece@delphix.com>
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Richard Elling <richard.elling@richardelling.com>
Approved by: Dan McDonald <danmcd@omniti.com>
References:
https://www.illumos.org/issues/5350https://github.com/illumos/illumos-gate/commit/e651831
Ported-by: Chris Dunlop <chris@onthe.net.au>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#3382
Author: Alex Reece <alex@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Paul Dagnelie <paul.dagnelie@delphix.com>
Reviewed by: Josef 'Jeff' Sipek <josef.sipek@nexenta.com>
Reviewed by: Albert Lee <trisk@nexenta.com>
Approved by: Dan McDonald <danmcd@omniti.com>
References:
https://www.illumos.org/issues/5422https://github.com/illumos/illumos-gate/commit/a846f19
Ported-by: Chris Dunlop <chris@onthe.net.au>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#3381
Commit f4af6bb783 which added support
for REQ_FAILFAST_MASK but the new autoconf test didn't use the same
preprocessor macro name as the code did.
The effect is that FAILFAST mode has not been enabled for ZoL in any
post-2.6.35 kernel.
Retire the HAVE_BIO_RW_FAILFAST interface used in pre-2.6.28 kernels.
Raise an error condition if the FAILFAST interface can't be detected.
Signed-off-by: Tim Chase <tim@onlight.com
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#3386
This commit updates the copyright boilerplate within the ZED subtree.
The instructions for appending a contributor copyright line have
been removed. Manually maintaining copyright notices in this
manner is error-prone, imprecise at a file-scope granularity, and
oftentimes inaccurate. These lines can become a pernicious source of
merge conflicts. A commit log is better suited to maintaining this
information. Consequently, a line has been added to the boilerplate
to refer to the git commit log for authoritative copyright attribution.
To account for the scenario where a file may become separated from
the codebase and commit history (i.e., it is copied somewhere else),
a line has been added to identify the file's origin.
http://softwarefreedom.org/resources/2012/ManagingCopyrightInformation.html
Signed-off-by: Chris Dunlap <cdunlap@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#3384
The security and ACL operations should all be performed atomically.
To accomplish this there would need to significant invasive changes
made to the common code base. For the moment it's desirable for
compatibility reasons to avoid this. Therefore the code has been
updated to attempt to unwind the operation in case of failure
rather than panic.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#2445
Improve the large block feature test coverage by extending ztest
to frequently change the recordsize. This is specificially designed
to catch corner cases which might otherwise go unnoticed.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #354
5027 zfs large block support
Reviewed by: Alek Pinchuk <pinchuk.alek@gmail.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Josef 'Jeff' Sipek <josef.sipek@nexenta.com>
Reviewed by: Richard Elling <richard.elling@richardelling.com>
Reviewed by: Saso Kiselkov <skiselkov.ml@gmail.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Approved by: Dan McDonald <danmcd@omniti.com>
References:
https://www.illumos.org/issues/5027https://github.com/illumos/illumos-gate/commit/b515258
Porting Notes:
* Included in this patch is a tiny ISP2() cleanup in zio_init() from
Illumos 5255.
* Unlike the upstream Illumos commit this patch does not impose an
arbitrary 128K block size limit on volumes. Volumes, like filesystems,
are limited by the zfs_max_recordsize=1M module option.
* By default the maximum record size is limited to 1M by the module
option zfs_max_recordsize. This value may be safely increased up to
16M which is the largest block size supported by the on-disk format.
At the moment, 1M blocks clearly offer a significant performance
improvement but the benefits of going beyond this for the majority
of workloads are less clear.
* The illumos version of this patch increased DMU_MAX_ACCESS to 32M.
This was determined not to be large enough when using 16M blocks
because the zfs_make_xattrdir() function will fail (EFBIG) when
assigning a TX. This was immediately observed under Linux because
all newly created files must have a security xattr created and
that was failing. Therefore, we've set DMU_MAX_ACCESS to 64M.
* On 32-bit platforms a hard limit of 1M is set for blocks due
to the limited virtual address space. We should be able to relax
this one the ABD patches are merged.
Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#354
The umem_alloc_aligned() function should not assume that a 'void *'
type is 64-bit. It will not be on 32-bit platforms. Rather than
complicating the ASSERT to handle this it is simply removed.
Additionally, the '%lu' format specifier should not be assumed to
imply a 64-bit value. Fix this by using the 'llu' format specifier
which will always be atleast 64-bit and explicitly casing the
variable to an u_longlong_t. This issue is handled the same way
in many other parts of the code.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
The metaslab_min_alloc_size option is no longer used in the code.
This functionality was removed by commit f3a7f66 and the module
options should have been dropped at that time.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
With debugging enabled and depending on your kernel config, the size of
arc_buf_hdr_t can blow out the stack of arc_evict() and arc_evict_ghost()
to greater than 1024 bytes. Let's avoid this.
Signed-off-by: Chris Dunlop <chris@onthe.net.au>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#3377
5349 verify that block pointer is plausible before reading
Reviewed by: Alex Reece <alex.reece@delphix.com>
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: Dan McDonald <danmcd@omniti.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Richard Lowe <richlowe@richlowe.net>
Reviewed by: Xin Li <delphij@FreeBSD.org>
Reviewed by: Josef 'Jeff' Sipek <josef.sipek@nexenta.com>
Approved by: Gordon Ross <gwr@nexenta.com>
References:
https://www.illumos.org/issues/5349https://github.com/illumos/illumos-gate/commit/f63ab3d5
Porting notes:
* Several variable declarations were moved due to C style needs
Ported-by: DHE <git@dehacked.net>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#3373
By the time we're tearing down our superblock the VFS has started releasing
all our inodes/znodes. Some of this work may have been handed off to our
iput taskq so we need to wait for that work to complete. However the iput
from the taskq can itself result in additional work being added to the
taskq:
dsl_pool_iput_taskq
iput
iput_final
evict
destroy_inode
zpl_inode_destroy
zfs_inode_destroy
zfs_iput_async(ZTOI(zp->z_xattr_parent))
taskq_dispatch(dsl_pool_iput_taskq..., iput, ...)
Let's wait until all our znodes have been released.
Signed-off-by: Chris Dunlop <chris@onthe.net.au>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#3281
5348 zio_checksum_error() only fills in info if ECKSUM
Reviewed by: Alex Reece <alex.reece@delphix.com>
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Steven Hartland <killing@multiplay.co.uk>
Approved by: Dan McDonald <danmcd@omniti.com>
References:
https://www.illumos.org/issues/5348https://github.com/illumos/illumos-gate/commit/373dc1cf
Ported-by: DHE <git@dehacked.net>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#3372