OpenZFS provides a library called tpool which implements thread
pools for user space applications. Porting this library means
the zpool utility no longer needs to borrow the kernel mutex and
taskq interfaces from libzpool. This code was updated to use
the tpool library which behaves in a very similar fashion.
Porting libtpool was relatively straight forward and minimal
modifications were needed. The core changes were:
* Fully convert the library to use pthreads.
* Updated signal handling.
* lmalloc/lfree converted to calloc/free
* Implemented portable pthread_attr_clone() function.
Finally, update the build system such that libzpool.so is no
longer linked in to zfs(8), zpool(8), etc. All that is required
is libzfs to which the zcommon soures were added (which is the way
it always should have been). Removing the libzpool dependency
resulted in several build issues which needed to be resolved.
* Moved zfeature support to module/zcommon/zfeature_common.c
* Moved ratelimiting to to module/zfs/zfs_ratelimit.c
* Moved get_system_hostid() to lib/libspl/gethostid.c
* Removed use of cmn_err() in zcommon source
* Removed dprintf_setup() call from zpool_main.c and zfs_main.c
* Removed highbit() and lowbit()
* Removed unnecessary library dependencies from Makefiles
* Removed fletcher-4 kstat in user space
* Added sha2 support explicitly to libzfs
* Added highbit64() and lowbit64() to zpool_util.c
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#6442
Turning the multihost property on requires that a hostid be set to allow
ZFS to determine when a foreign system is attemping to import a pool.
The error message instructing the user to set a hostid refers to
genhostid(1).
Genhostid(1) is not available on SUSE Linux. This commit adds a script
modeled after genhostid(1) for those users.
Zgenhostid checks for an /etc/hostid file; if it does not exist, it
creates one and stores a value. If the user has provided a hostid as an
argument, that value is used. Otherwise, a random hostid is generated
and stored.
This differs from the CENTOS 6/7 versions of genhostid, which overwrite
the /etc/hostid file even though their manpages state otherwise.
A man page for zgenhostid is added. The one for genhostid is in (1), but
I put zgenhostid in (8) because I believe it's more appropriate.
The mmp tests are modified to use zgenhostid to set the hostid instead
of using the spl_hostid module parameter. zgenhostid will not replace
an existing /etc/hostid file, so new mmp_clear_hostid calls are
required.
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Closes#6358Closes#6379
Add multihost=on|off pool property to control MMP. When enabled
a new thread writes uberblocks to the last slot in each label, at a
set frequency, to indicate to other hosts the pool is actively imported.
These uberblocks are the last synced uberblock with an updated
timestamp. Property defaults to off.
During tryimport, find the "best" uberblock (newest txg and timestamp)
repeatedly, checking for change in the found uberblock. Include the
results of the activity test in the config returned by tryimport.
These results are reported to user in "zpool import".
Allow the user to control the period between MMP writes, and the
duration of the activity test on import, via a new module parameter
zfs_multihost_interval. The period is specified in milliseconds. The
activity test duration is calculated from this value, and from the
mmp_delay in the "best" uberblock found initially.
Add a kstat interface to export statistics about Multiple Modifier
Protection (MMP) updates. Include the last synced txg number, the
timestamp, the delay since the last MMP update, the VDEV GUID, the VDEV
label that received the last MMP update, and the VDEV path. Abbreviated
output below.
$ cat /proc/spl/kstat/zfs/mypool/multihost
31 0 0x01 10 880 105092382393521 105144180101111
txg timestamp mmp_delay vdev_guid vdev_label vdev_path
20468 261337 250274925 68396651780 3 /dev/sda
20468 261339 252023374 6267402363293 1 /dev/sdc
20468 261340 252000858 6698080955233 1 /dev/sdx
20468 261341 251980635 783892869810 2 /dev/sdy
20468 261342 253385953 8923255792467 3 /dev/sdd
20468 261344 253336622 042125143176 0 /dev/sdab
20468 261345 253310522 1200778101278 2 /dev/sde
20468 261346 253286429 0950576198362 2 /dev/sdt
20468 261347 253261545 96209817917 3 /dev/sds
20468 261349 253238188 8555725937673 3 /dev/sdb
Add a new tunable zfs_multihost_history to specify the number of MMP
updates to store history for. By default it is set to zero meaning that
no MMP statistics are stored.
When using ztest to generate activity, for automated tests of the MMP
function, some test functions interfere with the test. For example, the
pool is exported to run zdb and then imported again. Add a new ztest
function, "-M", to alter ztest behavior to prevent this.
Add new tests to verify the new functionality. Tests provided by
Giuseppe Di Natale.
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: Ned Bass <bass6@llnl.gov>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Closes#745Closes#6279
Users can now provide their own scripts to be run
with 'zpool iostat/status -c'. User scripts should be
placed in ~/.zpool.d to be included in zpool's
default search path.
Provide a script which can be used with
'zpool iostat|status -c' that will return the type of
device (hdd, sdd, file).
Provide a script to get various values from smartctl
when using 'zpool iostat/status -c'.
Allow users to define the ZPOOL_SCRIPTS_PATH
environment variable which can be used to override
the default 'zpool iostat/status -c' search path.
Allow the ZPOOL_SCRIPTS_ENABLED environment
variable to enable or disable 'zpool status/iostat -c'
functionality.
Use the new smart script to provide the serial command.
Install /etc/sudoers.d/zfs file which contains the sudoer
rule for smartctl as a sample.
Allow 'zpool iostat/status -c' tests to run in tree.
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Closes#6121Closes#6153
* events_001_pos - Verify the expected events are generated when
invoking the various zpool sub-commands. These events must
appear in `zpool event` and be consumed by the ZED.
* events_002_pos - Verify the ZED consumes events which were
generated while it wasn't running when it is started.
Additionally, verify that events are only processed once.
As part of this change the default.cfg used by the test suite
was changed to a default.cfg.in file. This was needed so the
install location of all zed scripts, not only the enabled ones,
could be reliably determined.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#6128
This addition will enable us to sync an open TXG to the main pool
on demand. The functionality is similar to 'sync(2)' but 'zpool sync'
will return when data has hit the main storage instead of potentially
just the ZIL as is the case with the 'sync(2)' cmd.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Alek Pinchuk <apinchuk@datto.com>
Closes#6122
Authored by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Approved by: Gordon Ross <gordon.w.ross@gmail.com>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>
Porting Notes:
- Enable internal log for DEBUG builds and in zfs-tests.sh.
- callbacks/zfs_dbgmsg.ksh - Dump interal log via kstat.
- callbacks/zfs_dmesg.ksh - Dump dmesg log.
- default.cfg - 'Test Suite Specific Commands' dropped.
OpenZFS-issue: https://www.illumos.org/issues/7503
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/55a1300Closes#6002
Authored by: Yuri Pankov <yuri.pankov@nexenta.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Reviewed-by: loli10K <ezomori.nozomu@gmail.com>
Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>
Porting Notes:
- Updated 'zpool labelclear' and 'zdb -l' such that they attempt
to find a vdev given solely its short name. This behavior is
consistent with the upstream OpenZFS code and the test cases
depend on it. The actual implementation differs slightly due
to device naming conventions on Linux.
- auto_online_001_pos, auto_replace_001_pos and add-o_ashift
test cases updated to expect failure when no label exists.
- read_efi_label() and zpool_label_disk_check() are read-only
operations and should use O_RDONLY at open time to enforce this.
- zpool_label_disk() and zpool_relabel_disk() write the partition
information using O_DIRECT an fsync() and page cache invalidation
to ensure a consistent view of the device.
- dump_label() in zdb should invalidate the page cache in order
to get the authoritative label from disk.
OpenZFS-issue: https://www.illumos.org/issues/6865
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/c95076cCloses#5981
Authored by: John Wren Kennedy <john.kennedy@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Dan McDonald <danmcd@omniti.com>
Approved by: Gordon Ross <gordon.w.ross@gmail.com>
Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>
Ported-by: George Melikov <mail@gmelikov.ru>
Porting Notes:
- Utilities which aren't available under Linux have been removed.
- Because of sudo's default secure path behavior PATH must be
explicitly reset at the top of libtest.shlib. This avoids the
need for all users to customize secure path on their system.
- Updated ZoL infrastructure to manage constrained path
- Updated all test cases
- Check permissions for usergroup tests
- When testing in-tree create links under bin/
- Update fault cleanup such that missing files during
cleanup aren't fatal.
- Configure su environment with constrained path
OpenZFS-issue: https://www.illumos.org/issues/7290
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/1d32ba6Closes#5903
Reviewed by: Steve Gonczi <steve.gonczi@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>
Background information: This assertion about tx_space_* verifies that we
are not dirtying more stuff than we thought we would. We “need” to know
how much we will dirty so that we can check if we should fail this
transaction with ENOSPC/EDQUOT, in dmu_tx_assign(). While the
transaction is open (i.e. between dmu_tx_assign() and dmu_tx_commit() —
typically less than a millisecond), we call dbuf_dirty() on the exact
blocks that will be modified. Once this happens, the temporary
accounting in tx_space_* is unnecessary, because we know exactly what
blocks are newly dirtied; we call dnode_willuse_space() to track this
more exact accounting.
The fundamental problem causing this bug is that dmu_tx_hold_*() relies
on the current state in the DMU (e.g. dn_nlevels) to predict how much
will be dirtied by this transaction, but this state can change before we
actually perform the transaction (i.e. call dbuf_dirty()).
This bug will be fixed by removing the assertion that the tx_space_*
accounting is perfectly accurate (i.e. we never dirty more than was
predicted by dmu_tx_hold_*()). By removing the requirement that this
accounting be perfectly accurate, we can also vastly simplify it, e.g.
removing most of the logic in dmu_tx_count_*().
The new tx space accounting will be very approximate, and may be more or
less than what is actually dirtied. It will still be used to determine
if this transaction will put us over quota. Transactions that are marked
by dmu_tx_mark_netfree() will be excepted from this check. We won’t make
an attempt to determine how much space will be freed by the transaction
— this was rarely accurate enough to determine if a transaction should
be permitted when we are over quota, which is why dmu_tx_mark_netfree()
was introduced in 2014.
We also won’t attempt to give “credit” when overwriting existing blocks,
if those blocks may be freed. This allows us to remove the
do_free_accounting logic in dbuf_dirty(), and associated routines. This
logic attempted to predict what will be on disk when this txg syncs, to
know if the overwritten block will be freed (i.e. exists, and has no
snapshots).
OpenZFS-issue: https://www.illumos.org/issues/7793
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/3704e0a
Upstream bugs: DLPX-32883a
Closes#5804
Porting notes:
- DNODE_SIZE replaced with DNODE_MIN_SIZE in dmu_tx_count_dnode(),
Using the default dnode size would be slightly better.
- DEBUG_DMU_TX wrappers and configure option removed.
- Resolved _by_dnode() conflicts these changes have not yet been
applied to OpenZFS.
Automated auto-online test to go along with ZED FMA integration (PR 4673)
auto_online_001.pos works with real devices (sd- and mpath) and with non-real
block devices (loop) by adding a scsi_debug device to the pool
Note: In order for test group to run, ZED must not currently be running.
Kernel 3.16.37 or higher needed for scsi_debug to work properly
If timeout occurs on test using a scsi_debug device (error noticed on Ubuntu
system), a reboot might be needed in order for test to pass. (more
investigation into this)
Also suppressed output from is_real_device/is_loop_device/is_mpath_device -
was making the log file very cluttered with useless error messages
"ie /dev/mapper/sdc is not a block device" from previous patch
Reviewed-by: Don Brady <don.brady@intel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: David Quigley <david.quigley@intel.com>
Signed-off-by: Sydney Vanda <sydney.m.vanda@intel.com>
Closes#5774
This patch adds a command (-c) option to zpool status and zpool iostat. The
-c option allows you to run an arbitrary command on each vdev and display
the first line of output in zpool status/iostat. The environment vars
VDEV_PATH and VDEV_UPATH are set to the vdev's path and "underlying path"
before running the command. For device mapper, multipath, or partitioned
vdevs, VDEV_UPATH is the actual underlying /dev/sd* disk. This can be useful
if the command you're running requires a /dev/sd* device.
The patch also uses /sys/block/<dev>/slaves/ to lookup the underlying device
instead of using libdevmapper. This not only removes the libdevmapper
requirement at build time, but also allows you to resolve device mapper
devices without being root. This means that UDEV_UPATH get set correctly
when running zpool status/iostat as an unprivileged user.
Example:
$ zpool status -c 'echo I am $VDEV_PATH, $VDEV_UPATH'
NAME STATE READ WRITE CKSUM
mypool ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
mpatha ONLINE 0 0 0 I am /dev/mapper/mpatha, /dev/sdc
sdb ONLINE 0 0 0 I am /dev/sdb1, /dev/sdb
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes#5368
Linux 3.11 add O_TMPFILE to open(2), which allow creating an unlinked file on
supported filesystem. It's basically doing open(2) and unlink(2) atomically.
The filesystem support is added through i_op->tmpfile. We basically copy the
create operation except we get rid of the link and name related stuff and add
the new node to unlinked set.
We also add support for linkat(2) to link tmpfile. However, since all previous
file operation will skip ZIL, we force a txg_wait_synced to make sure we are
sync safe.
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
- Fix autoreplace behaviour on statechange-led.sh script.
ZED sends the following events on an auto-replace:
1. statechange: Disk goes UNAVAIL->ONLINE
2. statechange: Disk goes ONLINE->UNAVAIL
3. vdev_attach: Disk goes ONLINE
Events 1-2 happen when ZED first attempts to do an auto-online. When that
fails, ZED then tries an auto-replace, generating the vdev_attach event in #3.
In the previous code, statechange-led was only looking at the UNAVAIL->ONLINE
transition to turn off the LED. It ignored the #2 ONLINE->UNAVAIL transition,
assuming it was just the "old" VDEV going offline. This is problematic, as
a drive can go from ONLINE->UNAVAIL when it's malfunctioning, and we don't want
to ignore that.
This new patch correctly turns on the fault LED every time a drive becomes
UNAVAIL. It also monitors vdev_attach events to trigger turning off the LED
when an auto-replaced disk comes online.
- Remove unnecessary libdevmapper warning with --with-config=kernel
This fixes an unnecessary libdevmapper warning when building
--with-config=kernel. Kernel code does not use libdevmapper, so the warning
is not needed.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes#2375Closes#5312Closes#5331
1. Enable multipath autoreplace support for FMA.
This extends FMA autoreplace to work with multipath disks. This
requires libdevmapper to be installed at build time.
2. Turn on/off fault LEDs when VDEVs become degraded/faulted/online
Set ZED_USE_ENCLOSURE_LEDS=1 in zed.rc to have ZED turn on/off the enclosure
LED for a drive when a drive becomes FAULTED/DEGRADED. Your enclosure must
be supported by the Linux SES driver for this to work. The enclosure LED
scripts work for multipath devices as well. The scripts will clear the LED
when the fault is cleared.
3. Rate limit ZIO delay and checksum events so as not to flood ZED
ZIO delay and checksum events are rate limited to 5/sec in the zfs module.
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed by: Don Brady <don.brady@intel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes#2449Closes#3017Closes#5159
This patch tracks dnode usage for each user/group in the
DMU_USER/GROUPUSED_OBJECT ZAPs. ZAP entries dedicated to dnode
accounting have the key prefixed with "obj-" followed by the UID/GID
in string format (as done for the block accounting).
A new SPA feature has been added for dnode accounting as well as
a new ZPL version. The SPA feature must be enabled in the pool
before upgrading the zfs filesystem. During the zfs version upgrade,
a "quotacheck" will be executed by marking all dnode as dirty.
ZoL-bug-id: https://github.com/zfsonlinux/zfs/issues/3500
Signed-off-by: Jinshan Xiong <jinshan.xiong@intel.com>
Signed-off-by: Johann Lombardi <johann.lombardi@intel.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com>
Reviewed by: Richard Lowe <richlowe@richlowe.net>
Approved by: Garrett D'Amore <garrett@damore.org>
Ported by: Tony Hutter <hutter2@llnl.gov>
OpenZFS-issue: https://www.illumos.org/issues/4185
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/45818ee
Porting Notes:
This code is ported on top of the Illumos Crypto Framework code:
b5e030c8db
The list of porting changes includes:
- Copied module/icp/include/sha2/sha2.h directly from illumos
- Removed from module/icp/algs/sha2/sha2.c:
#pragma inline(SHA256Init, SHA384Init, SHA512Init)
- Added 'ctx' to lib/libzfs/libzfs_sendrecv.c:zio_checksum_SHA256() since
it now takes in an extra parameter.
- Added CTASSERT() to assert.h from for module/zfs/edonr_zfs.c
- Added skein & edonr to libicp/Makefile.am
- Added sha512.S. It was generated from sha512-x86_64.pl in Illumos.
- Updated ztest.c with new fletcher_4_*() args; used NULL for new CTX argument.
- In icp/algs/edonr/edonr_byteorder.h, Removed the #if defined(__linux) section
to not #include the non-existant endian.h.
- In skein_test.c, renane NULL to 0 in "no test vector" array entries to get
around a compiler warning.
- Fixup test files:
- Rename <sys/varargs.h> -> <varargs.h>, <strings.h> -> <string.h>,
- Remove <note.h> and define NOTE() as NOP.
- Define u_longlong_t
- Rename "#!/usr/bin/ksh" -> "#!/bin/ksh -p"
- Rename NULL to 0 in "no test vector" array entries to get around a
compiler warning.
- Remove "for isa in $($ISAINFO); do" stuff
- Add/update Makefiles
- Add some userspace headers like stdio.h/stdlib.h in places of
sys/types.h.
- EXPORT_SYMBOL *_Init/*_Update/*_Final... routines in ICP modules.
- Update scripts/zfs2zol-patch.sed
- include <sys/sha2.h> in sha2_impl.h
- Add sha2.h to include/sys/Makefile.am
- Add skein and edonr dirs to icp Makefile
- Add new checksums to zpool_get.cfg
- Move checksum switch block from zfs_secpolicy_setprop() to
zfs_check_settable()
- Fix -Wuninitialized error in edonr_byteorder.h on PPC
- Fix stack frame size errors on ARM32
- Don't unroll loops in Skein on 32-bit to save stack space
- Add memory barriers in sha2.c on 32-bit to save stack space
- Add filetest_001_pos.ksh checksum sanity test
- Add option to write psudorandom data in file_write utility
Author: John Wren Kennedy <john.kennedy@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Don Brady <don.brady@intel.com>
Reviewed by: Richard Elling <Richard.Elling@RichardElling.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: David Quigley <david.quigley@intel.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Ported-by: Don Brady <don.brady@intel.com>
OpenZFS-issue: https://www.illumos.org/issues/6950
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/dcbf3bd6
Delphix-commit: https://github.com/delphix/delphix-os/commit/978ed49Closes#4929
ZFS Test Suite Performance Regression Tests
This was pulled into OpenZFS via the compressed arc featureand was
separated out in zfsonlinux as a separate pull request from PR-4768.
It originally came in as QA-4903 in Delphix-OS from John Kennedy.
Expected Usage:
$ DISKS="sdb sdc sdd" zfs-tests.sh -r perf-regression.run
Porting Notes:
1. Added assertions in the setup script to make sure required tools
(fio, mpstat, ...) are present.
2. For the config.json generation in perf.shlib used arcstats and
other binaries instead of dtrace to query the values.
3. For the perf data collection:
- use "zpool iostat -lpvyL" instead of the io.d dtrace script
(currently not collecting zfs_read/write latency stats)
- mpstat and iostat take different arguments
- prefetch_io.sh is a placeholder that uses arcstats instead of
dtrace
4. Build machines require fio, mdadm and sysstat pakage (YMMV).
Future Work:
- Need a way to measure zfs_read and zfs_write latencies per pool.
- Need tools to takes two sets of output and display/graph the
differences
- Bring over additional regression tests from Delphix
Authored by: Hans Rosenfeld <hans.rosenfeld@nexenta.com>
Reviewed by: Dan Fields <dan.fields@nexenta.com>
Reviewed by: Josef Sipek <josef.sipek@nexenta.com>
Reviewed by: Richard Elling <richard.elling@gmail.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Signed-off-by: Don Brady <don.brady@intel.com>
Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>
OpenZFS-issue: https://www.illumos.org/issues/5997
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/1437283
Porting Notes:
In addition to the OpenZFS changes this patch realigns the events
with those found in OpenZFS.
Events which would be logged as sysevents on illumos have been
been mapped to the 'sysevent' class for Linux. In addition, several
subclass names have been changed to match what is used in OpenZFS.
In all cases this means a '.' was changed to an '_' in the subclass.
The scripts provided by ZoL have been updated, however users which
provide scripts for any of the following events will need to rename
them based on the new subclass names.
ereport.fs.zfs.config.sync sysevent.fs.zfs.config_sync
ereport.fs.zfs.zpool.destroy sysevent.fs.zfs.pool_destroy
ereport.fs.zfs.zpool.reguid sysevent.fs.zfs.pool_reguid
ereport.fs.zfs.vdev.remove sysevent.fs.zfs.vdev_remove
ereport.fs.zfs.vdev.clear sysevent.fs.zfs.vdev_clear
ereport.fs.zfs.vdev.check sysevent.fs.zfs.vdev_check
ereport.fs.zfs.vdev.spare sysevent.fs.zfs.vdev_spare
ereport.fs.zfs.vdev.autoexpand sysevent.fs.zfs.vdev_autoexpand
ereport.fs.zfs.resilver.start sysevent.fs.zfs.resilver_start
ereport.fs.zfs.resilver.finish sysevent.fs.zfs.resilver_finish
ereport.fs.zfs.scrub.start sysevent.fs.zfs.scrub_start
ereport.fs.zfs.scrub.finish sysevent.fs.zfs.scrub_finish
ereport.fs.zfs.bootfs.vdev.attach sysevent.fs.zfs.bootfs_vdev_attach
A port of the Illumos Crypto Framework to a Linux kernel module (found
in module/icp). This is needed to do the actual encryption work. We cannot
use the Linux kernel's built in crypto api because it is only exported to
GPL-licensed modules. Having the ICP also means the crypto code can run on
any of the other kernels under OpenZFS. I ended up porting over most of the
internals of the framework, which means that porting over other API calls (if
we need them) should be fairly easy. Specifically, I have ported over the API
functions related to encryption, digests, macs, and crypto templates. The ICP
is able to use assembly-accelerated encryption on amd64 machines and AES-NI
instructions on Intel chips that support it. There are place-holder
directories for similar assembly optimizations for other architectures
(although they have not been written).
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #4329
Justification
-------------
This feature adds support for variable length dnodes. Our motivation is
to eliminate the overhead associated with using spill blocks. Spill
blocks are used to store system attribute data (i.e. file metadata) that
does not fit in the dnode's bonus buffer. By allowing a larger bonus
buffer area the use of a spill block can be avoided. Spill blocks
potentially incur an additional read I/O for every dnode in a dnode
block. As a worst case example, reading 32 dnodes from a 16k dnode block
and all of the spill blocks could issue 33 separate reads. Now suppose
those dnodes have size 1024 and therefore don't need spill blocks. Then
the worst case number of blocks read is reduced to from 33 to two--one
per dnode block. In practice spill blocks may tend to be co-located on
disk with the dnode blocks so the reduction in I/O would not be this
drastic. In a badly fragmented pool, however, the improvement could be
significant.
ZFS-on-Linux systems that make heavy use of extended attributes would
benefit from this feature. In particular, ZFS-on-Linux supports the
xattr=sa dataset property which allows file extended attribute data
to be stored in the dnode bonus buffer as an alternative to the
traditional directory-based format. Workloads such as SELinux and the
Lustre distributed filesystem often store enough xattr data to force
spill bocks when xattr=sa is in effect. Large dnodes may therefore
provide a performance benefit to such systems.
Other use cases that may benefit from this feature include files with
large ACLs and symbolic links with long target names. Furthermore,
this feature may be desirable on other platforms in case future
applications or features are developed that could make use of a
larger bonus buffer area.
Implementation
--------------
The size of a dnode may be a multiple of 512 bytes up to the size of
a dnode block (currently 16384 bytes). A dn_extra_slots field was
added to the current on-disk dnode_phys_t structure to describe the
size of the physical dnode on disk. The 8 bits for this field were
taken from the zero filled dn_pad2 field. The field represents how
many "extra" dnode_phys_t slots a dnode consumes in its dnode block.
This convention results in a value of 0 for 512 byte dnodes which
preserves on-disk format compatibility with older software.
Similarly, the in-memory dnode_t structure has a new dn_num_slots field
to represent the total number of dnode_phys_t slots consumed on disk.
Thus dn->dn_num_slots is 1 greater than the corresponding
dnp->dn_extra_slots. This difference in convention was adopted
because, unlike on-disk structures, backward compatibility is not a
concern for in-memory objects, so we used a more natural way to
represent size for a dnode_t.
The default size for newly created dnodes is determined by the value of
a new "dnodesize" dataset property. By default the property is set to
"legacy" which is compatible with older software. Setting the property
to "auto" will allow the filesystem to choose the most suitable dnode
size. Currently this just sets the default dnode size to 1k, but future
code improvements could dynamically choose a size based on observed
workload patterns. Dnodes of varying sizes can coexist within the same
dataset and even within the same dnode block. For example, to enable
automatically-sized dnodes, run
# zfs set dnodesize=auto tank/fish
The user can also specify literal values for the dnodesize property.
These are currently limited to powers of two from 1k to 16k. The
power-of-2 limitation is only for simplicity of the user interface.
Internally the implementation can handle any multiple of 512 up to 16k,
and consumers of the DMU API can specify any legal dnode value.
The size of a new dnode is determined at object allocation time and
stored as a new field in the znode in-memory structure. New DMU
interfaces are added to allow the consumer to specify the dnode size
that a newly allocated object should use. Existing interfaces are
unchanged to avoid having to update every call site and to preserve
compatibility with external consumers such as Lustre. The new
interfaces names are given below. The versions of these functions that
don't take a dnodesize parameter now just call the _dnsize() versions
with a dnodesize of 0, which means use the legacy dnode size.
New DMU interfaces:
dmu_object_alloc_dnsize()
dmu_object_claim_dnsize()
dmu_object_reclaim_dnsize()
New ZAP interfaces:
zap_create_dnsize()
zap_create_norm_dnsize()
zap_create_flags_dnsize()
zap_create_claim_norm_dnsize()
zap_create_link_dnsize()
The constant DN_MAX_BONUSLEN is renamed to DN_OLD_MAX_BONUSLEN. The
spa_maxdnodesize() function should be used to determine the maximum
bonus length for a pool.
These are a few noteworthy changes to key functions:
* The prototype for dnode_hold_impl() now takes a "slots" parameter.
When the DNODE_MUST_BE_FREE flag is set, this parameter is used to
ensure the hole at the specified object offset is large enough to
hold the dnode being created. The slots parameter is also used
to ensure a dnode does not span multiple dnode blocks. In both of
these cases, if a failure occurs, ENOSPC is returned. Keep in mind,
these failure cases are only possible when using DNODE_MUST_BE_FREE.
If the DNODE_MUST_BE_ALLOCATED flag is set, "slots" must be 0.
dnode_hold_impl() will check if the requested dnode is already
consumed as an extra dnode slot by an large dnode, in which case
it returns ENOENT.
* The function dmu_object_alloc() advances to the next dnode block
if dnode_hold_impl() returns an error for a requested object.
This is because the beginning of the next dnode block is the only
location it can safely assume to either be a hole or a valid
starting point for a dnode.
* dnode_next_offset_level() and other functions that iterate
through dnode blocks may no longer use a simple array indexing
scheme. These now use the current dnode's dn_num_slots field to
advance to the next dnode in the block. This is to ensure we
properly skip the current dnode's bonus area and don't interpret it
as a valid dnode.
zdb
---
The zdb command was updated to display a dnode's size under the
"dnsize" column when the object is dumped.
For ZIL create log records, zdb will now display the slot count for
the object.
ztest
-----
Ztest chooses a random dnodesize for every newly created object. The
random distribution is more heavily weighted toward small dnodes to
better simulate real-world datasets.
Unused bonus buffer space is filled with non-zero values computed from
the object number, dataset id, offset, and generation number. This
helps ensure that the dnode traversal code properly skips the interior
regions of large dnodes, and that these interior regions are not
overwritten by data belonging to other dnodes. A new test visits each
object in a dataset. It verifies that the actual dnode size matches what
was stored in the ztest block tag when it was created. It also verifies
that the unused bonus buffer space is filled with the expected data
patterns.
ZFS Test Suite
--------------
Added six new large dnode-specific tests, and integrated the dnodesize
property into existing tests for zfs allow and send/recv.
Send/Receive
------------
ZFS send streams for datasets containing large dnodes cannot be received
on pools that don't support the large_dnode feature. A send stream with
large dnodes sets a DMU_BACKUP_FEATURE_LARGE_DNODE flag which will be
unrecognized by an incompatible receiving pool so that the zfs receive
will fail gracefully.
While not implemented here, it may be possible to generate a
backward-compatible send stream from a dataset containing large
dnodes. The implementation may be tricky, however, because the send
object record for a large dnode would need to be resized to a 512
byte dnode, possibly kicking in a spill block in the process. This
means we would need to construct a new SA layout and possibly
register it in the SA layout object. The SA layout is normally just
sent as an ordinary object record. But if we are constructing new
layouts while generating the send stream we'd have to build the SA
layout object dynamically and send it at the end of the stream.
For sending and receiving between pools that do support large dnodes,
the drr_object send record type is extended with a new field to store
the dnode slot count. This field was repurposed from unused padding
in the structure.
ZIL Replay
----------
The dnode slot count is stored in the uppermost 8 bits of the lr_foid
field. The bits were unused as the object id is currently capped at
48 bits.
Resizing Dnodes
---------------
It should be possible to resize a dnode when it is dirtied if the
current dnodesize dataset property differs from the dnode's size, but
this functionality is not currently implemented. Clearly a dnode can
only grow if there are sufficient contiguous unused slots in the
dnode block, but it should always be possible to shrink a dnode.
Growing dnodes may be useful to reduce fragmentation in a pool with
many spill blocks in use. Shrinking dnodes may be useful to allow
sending a dataset to a pool that doesn't support the large_dnode
feature.
Feature Reference Counting
--------------------------
The reference count for the large_dnode pool feature tracks the
number of datasets that have ever contained a dnode of size larger
than 512 bytes. The first time a large dnode is created in a dataset
the dataset is converted to an extensible dataset. This is a one-way
operation and the only way to decrement the feature count is to
destroy the dataset, even if the dataset no longer contains any large
dnodes. The complexity of reference counting on a per-dnode basis was
too high, so we chose to track it on a per-dataset basis similarly to
the large_block feature.
Signed-off-by: Ned Bass <bass6@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#3542
This is a new implementation of RAIDZ1/2/3 routines using x86_64
scalar, SSE, and AVX2 instruction sets. Included are 3 parity
generation routines (P, PQ, and PQR) and 7 reconstruction routines,
for all RAIDZ level. On module load, a quick benchmark of supported
routines will select the fastest for each operation and they will
be used at runtime. Original implementation is still present and
can be selected via module parameter.
Patch contains:
- specialized gen/rec routines for all RAIDZ levels,
- new scalar raidz implementation (unrolled),
- two x86_64 SIMD implementations (SSE and AVX2 instructions sets),
- fastest routines selected on module load (benchmark).
- cmd/raidz_test - verify and benchmark all implementations
- added raidz_test to the ZFS Test Suite
New zfs module parameters:
- zfs_vdev_raidz_impl (str): selects the implementation to use. On
module load, the parameter will only accept first 3 options, and
the other implementations can be set once module is finished
loading. Possible values for this option are:
"fastest" - use the fastest math available
"original" - use the original raidz code
"scalar" - new scalar impl
"sse" - new SSE impl if available
"avx2" - new AVX2 impl if available
See contents of `/sys/module/zfs/parameters/zfs_vdev_raidz_impl` to
get the list of supported values. If an implementation is not supported
on the system, it will not be shown. Currently selected option is
enclosed in `[]`.
Signed-off-by: Gvozden Neskovic <neskovic@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#4328
6736 ZFS per-vdev ZAPs
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Don Brady <don.brady@intel.com>
Reviewed by: Dan McDonald <danmcd@omniti.com>
References:
https://www.illumos.org/issues/6736https://github.com/openzfs/openzfs/commit/215198a
Ported-by: Don Brady <don.brady@intel.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#4515
Dracut and Systemd updated how they integrate with each other, because
of this our current integrations stopped working (around the time
4.1.13 came out). This patch addresses that issue and gets us booting
again.
Thanks to @Rudd-O for doing the work to get dracut working again and
letting me submit this on his behalf.
Signed-off-by: Manuel Amador (Rudd-O) <rudd-o@rudd-o.com>
Signed-off-by: Matthew Thode <mthode@mthode.org>
Closes#3605Closes#4478
Add the ZFS Test Suite and test-runner framework from illumos.
This is a continuation of the work done by Turbo Fredriksson to
port the ZFS Test Suite to Linux. While this work was originally
conceived as a stand alone project integrating it directly with
the ZoL source tree has several advantages:
* Allows the ZFS Test Suite to be packaged in zfs-test package.
* Facilitates easy integration with the CI testing.
* Users can locally run the ZFS Test Suite to validate ZFS.
This testing should ONLY be done on a dedicated test system
because the ZFS Test Suite in its current form is destructive.
* Allows the ZFS Test Suite to be run directly in the ZoL source
tree enabled developers to iterate quickly during development.
* Developers can easily add/modify tests in the framework as
features are added or functionality is changed. The tests
will then always be in sync with the implementation.
Full documentation for how to run the ZFS Test Suite is available
in the tests/README.md file.
Warning: This test suite is designed to be run on a dedicated test
system. It will make modifications to the system including, but
not limited to, the following.
* Adding new users
* Adding new groups
* Modifying the following /proc files:
* /proc/sys/kernel/core_pattern
* /proc/sys/kernel/core_uses_pid
* Creating directories under /
Notes:
* Not all of the test cases are expected to pass and by default
these test cases are disabled. The failures are primarily due
to assumption made for illumos which are invalid under Linux.
* When updating these test cases it should be done in as generic
a way as possible so the patch can be submitted back upstream.
Most existing library functions have been updated to be Linux
aware, and the following functions and variables have been added.
* Functions:
* is_linux - Used to wrap a Linux specific section.
* block_device_wait - Waits for block devices to be added to /dev/.
* Variables: Linux Illumos
* ZVOL_DEVDIR "/dev/zvol" "/dev/zvol/dsk"
* ZVOL_RDEVDIR "/dev/zvol" "/dev/zvol/rdsk"
* DEV_DSKDIR "/dev" "/dev/dsk"
* DEV_RDSKDIR "/dev" "/dev/rdsk"
* NEWFS_DEFAULT_FS "ext2" "ufs"
* Many of the disabled test cases fail because 'zfs/zpool destroy'
returns EBUSY. This is largely causes by the asynchronous nature
of device handling on Linux and is expected, the impacted test
cases will need to be updated to handle this.
* There are several test cases which have been disabled because
they can trigger a deadlock. A primary example of this is to
recursively create zpools within zpools. These tests have been
disabled until the root issue can be addressed.
* Illumos specific utilities such as (mkfile) should be added to
the tests/zfs-tests/cmd/ directory. Custom programs required by
the test scripts can also be added here.
* SELinux should be either is permissive mode or disabled when
running the tests. The test cases should be updated to conform
to a standard policy.
* Redundant test functionality has been removed (zfault.sh).
* Existing test scripts (zconfig.sh) should be migrated to use
the framework for consistency and ease of testing.
* The DISKS environment variable currently only supports loopback
devices because of how the ZFS Test Suite expects partitions to
be named (p1, p2, etc). Support must be added to generate the
correct partition name based on the device location and name.
* The ZFS Test Suite is part of the illumos code base at:
https://github.com/illumos/illumos-gate/tree/master/usr/src/test
Original-patch-by: Turbo Fredriksson <turbo@bayour.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Closes#6Closes#1534
The dracut code is analogous to the initramfs code and as such
it should be located in the contrib with initramfs for consistency.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
* Supports booting of a ZFS snapshot.
Do this by cloning the snapshot into a dataset. If this, the resulting
dataset, already exists, destroy it. Then mount it on root.
* If snapshot does not exist, use base dataset (the part before '@')
as boot filesystem instead.
* If no snapshot is specified on the 'root=' kernel command line, but there
is an '@', then get a list of snapshots below that filesystem and ask the
user which to use.
* Clone with 'mountpoint=none' and 'canmount=noauto' - we mount manually
and explicitly.
* For sub-filesystems, that doesn't have a mountpoint property set, we use
the 'org.zol:mountpoint' to keep track of it's mountpoint.
* Allow rollback of snapshots instead of clone it and boot from the clone.
* Allow mounting a root- and subfs with mountpoint=legacy set
* Allow mounting a filesystem which is using nativ encryption.
* Support all currently used kernel command line arguments
All the different distributions have their own standard on what to specify
on the kernel command line to boot of a ZFS filesystem.
* Extra options:
* zfsdebug=(on,yes,1) Show extra debugging information
* zfsforce=(on,yes,1) Force import the pool
* rollback=(on,yes,1) Rollback (instead of clone) the snapshot
* Only try to import pool if it haven't already been imported
* This will negate the need to force import a pool that have not been exported cleanly.
* Support exclusion of pools to import by setting ZFS_POOL_EXCEPTIONS in /etc/default/zfs.
* Support additional configuration variable ZFS_INITRD_ADDITIONAL_DATASETS
to mount additional filesystems not located under your root dataset.
* Include /etc/modprobe.d/{zfs,spl}.conf in the initrd if it/they exist.
* Include the udev rule to use by-vdev for pool imports.
* Include the /etc/default/zfs file to the initrd.
* Only try /dev/disk/by-* in the initrd if USE_DISK_BY_ID is set.
* Use /dev/disk/by-vdev before anything.
* Add /dev as a last ditch attempt.
* Fallback to using the cache file if that exist if nothing else worked.
* Use /sbin/modprobe instead of built-in (BusyBox) modprobe.
This gets rid of the message "modprobe: can't load module zcommon".
Thanx to pcoultha for finding this.
Signed-off-by: Turbo Fredriksson <turbo@bayour.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#2116Closes#2114
Provide a Redhat specific zfs-kmod.spec file which uses the old style
kmods (not kmods2) packaging. By using the provided kmodtool script
packages can be built which support weak modules. This allows for the
kernel to be updated without having to rebuild the ZFS kernel modules.
Packages for RHEL/Centos/SL/TOSS which use this spec file can by built
as follows:
$ ./configure --with-spec=redhat
$ make rpms
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Originally it was thought that custom spec files might be required
for Fedora. Happily that has turns out not to be the case. Since
this directory just contains symlinks to the generic spec files it
can be removed.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
As of automake 1.14.2, currently shipped with Ubuntu 14.04, automake
warns about AM_INIT_AUTOMAKE having more than one argument:
configure.ac:41: warning: AM_INIT_AUTOMAKE: two- and three-arguments forms are deprecated. For more info, see:
configure.ac:41: http://www.gnu.org/software/automake/manual/automake.html#Modernize-AM_005fINIT_005fAUTOMAKE-invocation
This commit fixes the warnings by following above link's advice, so
AM_INIT gets called with the package's name and version. As both are
defined in the META file we're parsing it with `grep`, `cut` and `tr`.
NOTE: autoconf < 1.14 not supporting m4_esyscmd_s so m4_esyscmd was
used and modified `tr` to truncate newlines, too.
Signed-off-by: Hajo M<C3><B6>ller <dasjoe@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#3174
Add the arc_summary Makefile to the build system so the script is
properly included in the distribution tarball and installed.
Signed-off-by: Turbo Fredriksson <turbo@bayour.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#3147
Providing a pkg-config file makes is easy for 3rd party applications
to link against the libzfs libraries. It also allows the libzfs
developers to modify the list of required libraries and cflags
without breaking existing applications.
The following example illustrates how pkg-config can be used:
cc `pkg-config --cflags --libs libzfs` -o myapp myapp.c
/*
* myapp.c
*/
void main()
{
libzfs_handle_t *hdl;
hdl = libzfs_init();
if (hdl)
libzfs_fini(hdl);
}
Signed-off-by: Turbo Fredriksson <turbo@bayour.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes: #585
These can be manually installed as needed by end users. They
have been added to the repository so they can be kept up to date
with the latest code.
Signed-off-by: Turbo Fredriksson <turbo@bayour.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#1588
zed monitors ZFS events. When a zevent is posted, zed will run any
scripts that have been enabled for the corresponding zevent class.
Multiple scripts may be invoked for a given zevent. The zevent
nvpairs are passed to the scripts as environment variables.
Events are processed synchronously by the single thread, and there is
no maximum timeout for script execution. Consequently, a misbehaving
script can delay (or forever block) the processing of subsequent
zevents. Plans are to address this in future commits.
Initial scripts have been developed to log events to syslog
and send email in response to checksum/data/io errors and
resilver.finish/scrub.finish events. By default, email will only
be sent if the ZED_EMAIL variable is configured in zed.rc (which is
serving as a config file of sorts until a proper configuration file
is implemented).
Signed-off-by: Chris Dunlap <cdunlap@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #2
This adds systemd unit files replacing the functionality offered by
the SysV init script found in etc/init.d.
It has been developed and tested on Fedora 19, Fedora 20
and openSuSE 13.1.
Four unit files and one target are offered.
zfs-import-cache.service:
Import pools from /etc/zfs/zpool.cache. This unit will wait for
udev to settle.
zfs-import-scan.service:
Import pools by scanning /dev/disk/by-id for zvols. This unit will
only run if /etc/zfs/zpool.cache is not present. This unit will wait
for udev to settle
zfs-mount.service:
Mount ZFS native filesystems. It contains a dependency to be loaded
before local-fs.target.
zfs-share.service:
Share NFS/SMB filesystems. This unit contains a dependency that
will cause it to be restarted whenever the smb or nfs-server unit
is restarted, restoring the shares added.
zfs.target:
This target pulls in the other units in order to start ZFS. It's
the only unit that can be enabled/disabled, all other services
are static and pulled in by dependencies. It will honour zfs=off
and zfs=no options on the kernel command line.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#2108
The dbufstat.py command was added to provide a conveniant way to
easily determine what ZFS is caching. The script consumes the
raw /proc/spl/kstat/zfs/dbufs kstat data can consolidates it in
to a more human readable form. This was designed primarily as
a tool to aid developers but it may also be useful for advanced
users who want more visibility in to what the ARC is caching.
When run without options dbufstat.py will default to showing a
list of all objects with at least one buffer present in the
cache. The total cache space consumed by that object will be
printed on the right along with the object type. Similar to the
arcstats.py command the -x option may used to display additional
fields.
Two other modes of operation are also supported by dbufstat.py
and the expectation is additional display modes may be added as
needed. The -t option will summerize the total number of bytes
cached for each object type, and the -b option will show every
dbuf currently cached.
The script was designed to be consistent with arcstat.py and
includes most of the same options and funcationality.
Signed-off-by: Prakash Surya <surya1@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2882 implement libzfs_core
2883 changing "canmount" property to "on" should not always remount dataset
2900 "zfs snapshot" should be able to create multiple, arbitrary snapshots at once
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Chris Siden <christopher.siden@delphix.com>
Reviewed by: Garrett D'Amore <garrett@damore.org>
Reviewed by: Bill Pijewski <wdp@joyent.com>
Reviewed by: Dan Kruchinin <dan.kruchinin@gmail.com>
Approved by: Eric Schrock <Eric.Schrock@delphix.com>
References:
https://www.illumos.org/issues/2882https://www.illumos.org/issues/2883https://www.illumos.org/issues/2900illumos/illumos-gate@4445fffbbb
Ported-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#1293
Porting notes:
WARNING: This patch changes the user/kernel ABI. That means that
the zfs/zpool utilities built from master are NOT compatible with
the 0.6.2 kernel modules. Ensure you load the matching kernel
modules from master after updating the utilities. Otherwise the
zfs/zpool commands will be unable to interact with your pool and
you will see errors similar to the following:
$ zpool list
failed to read pool configuration: bad address
no pools available
$ zfs list
no datasets available
Add zvol minor device creation to the new zfs_snapshot_nvl function.
Remove the logging of the "release" operation in
dsl_dataset_user_release_sync(). The logging caused a null dereference
because ds->ds_dir is zeroed in dsl_dataset_destroy_sync() and the
logging functions try to get the ds name via the dsl_dataset_name()
function. I've got no idea why this particular code would have worked
in Illumos. This code has subsequently been completely reworked in
Illumos commit 3b2aab1 (3464 zfs synctask code needs restructuring).
Squash some "may be used uninitialized" warning/erorrs.
Fix some printf format warnings for %lld and %llu.
Apply a few spa_writeable() changes that were made to Illumos in
illumos/illumos-gate.git@cd1c8b8 as part of the 3112, 3113, 3114 and
3115 fixes.
Add a missing call to fnvlist_free(nvl) in log_internal() that was added
in Illumos to fix issue 3085 but couldn't be ported to ZoL at the time
(zfsonlinux/zfs@9e11c73) because it depended on future work.
* Modified kstat_update() to read arcstats from proc.
* Fix shebang.
* Added Makefile.am entries for arcstat.py
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#1506
Part of the automated testing involves building the source on Debian Lenny
which ships an ancient version of automake (1.10.1). Historically, this
has caused a non-fatal warning about AM_SILENT_RULES not being defined.
But when the autogen.sh script was updated to use autoreconf the warning
became fatal.
configure.ac:31: warning: macro `AM_SILENT_RULES' not found in library
autoreconf: running: /usr/bin/autoconf --force
configure.ac:34: error: possibly undefined macro: AM_SILENT_RULES
If this token and others are legitimate, please use m4_pattern_allow.
To resolve this build issue the call to AM_SILENT_RULES has been wrapped
by m4_ifdef(). This prevents the macro from being expanded on platforms
where it's undefined.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Refresh the existing RPM packaging to conform to the 'Fedora
Packaging Guidelines'. This includes adopting the kmods2
packaging standard which is used fod kmods distributed by
rpmfusion for Fedora/RHEL.
http://fedoraproject.org/wiki/Packaging:Guidelineshttp://rpmfusion.org/Packaging/KernelModules/Kmods2
While the spec files have been entirely rewritten from a
user perspective the only major changes are:
* The Fedora packages now have a build dependency on the
rpmfusion repositories. The generic kmod packages also
have a new dependency on kmodtool-1.22 but it is bundled
with the source rpm so no additional packages are needed.
* The kernel binary module packages have been renamed from
zfs-modules-* to kmod-zfs-* as specificed by kmods2.
* The is now a common kmod-zfs-devel-* package in addition
to the per-kernel devel packages. The common package
contains the development headers while the per-kernel
package contains kernel specific build products.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#1341
The kernel modules are now available in the Arch User Repository
(AUR) via zfs. Since their packaging is maintained and superior
to ours it is being removed from the tree.
https://wiki.archlinux.org/index.php/ZFS
Now that various distributions are picking up the packages we
should eventually be able to remove most of this infrastructure.
Packaging belongs with the distributions not upstream.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
In the interest of maintaining only one udev helper to give vdevs
user friendly names, the zpool_id and zpool_layout infrastructure
is being retired. They are superseded by vdev_id which incorporates
all the previous functionality.
Documentation for the new vdev_id(8) helper and its configuration
file, vdev_id.conf(5), can be found in their respective man pages.
Several useful example files are installed under /etc/zfs/.
/etc/zfs/vdev_id.conf.alias.example
/etc/zfs/vdev_id.conf.multipath.example
/etc/zfs/vdev_id.conf.sas_direct.example
/etc/zfs/vdev_id.conf.sas_switch.example
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#981
A fsck helper to accomidate distributions that expect to be able
to execute a fsck on all filesystem types. Currently this script
does nothing but it could be extended to act as a compatibility
wrapper for 'zpool scrub'.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#964
2619 asynchronous destruction of ZFS file systems
2747 SPA versioning with zfs feature flags
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <gwilson@delphix.com>
Reviewed by: Richard Lowe <richlowe@richlowe.net>
Reviewed by: Dan Kruchinin <dan.kruchinin@gmail.com>
Approved by: Eric Schrock <Eric.Schrock@delphix.com>
References:
illumos/illumos-gate@53089ab7c8illumos/illumos-gate@ad135b5d64
illumos changeset: 13700:2889e2596bd6
https://www.illumos.org/issues/2619https://www.illumos.org/issues/2747
NOTE: The grub specific changes were not ported. This change
must be made to the Linux grub packages.
Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>
This commit adds support for building a zfs-modules-dkms sub package
built around Dynamic Kernel Module Support. This is to allow building
packages using the DKMS infrastructure which is intended to ease the
burden of kernel version changes, upgrades, etc.
By default zfs-modules-dkms-* sub package will be built as part of
the 'make rpm' target. Alternately, you can build only the DKMS
module package using the 'make rpm-dkms' target.
Examples:
# To build packaged binaries as well as a dkms packages
$ ./configure && make rpm
# To build only the packaged binary utilities and dkms packages
$ ./configure && make rpm-utils rpm-dkms
Note: Only the RHEL 5/6, CHAOS 5, and Fedora distributions are
supported for building the dkms sub package.
Signed-off-by: Prakash Surya <surya1@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #535