mirror_zfs

mirror of https://git.proxmox.com/git/mirror_zfs.git synced 2026-05-22 02:27:36 +03:00

Author	SHA1	Message	Date
Brian Behlendorf	06401e4222	Fix ztest_verify_dnode_bt() test case In ztest_verify_dnode_bt the ztest_object_lock must be held in order to safely verify the unused bonus space. Reviewed-by: DHE <git@dehacked.net> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #6941	2018-01-09 12:27:12 -08:00
Nathaniel Wesley Filardo	8b20a9f996	zhack: fix getopt return type This fixes zhack's command processing on ARM. On ARM char is unsigned, and so, in promotion to an int, it will never compare equal to -1. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Nathaniel Wesley Filardo <nwf@cs.jhu.edu> Closes #7016	2018-01-09 11:14:45 -08:00
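The failure mode can be modeled in a few lines of Python. This is a sketch, not zhack's code. It assumes an 8-bit `char`, and `store_in_char()` is a hypothetical helper that mimics assigning getopt()'s `int` result to a `char` and promoting it back to `int` for the `!= -1` test.

```python
def store_in_char(value: int, signed: bool) -> int:
    """Model storing an int in an 8-bit char, then promoting it back to int."""
    byte = value & 0xFF                 # keep only the low 8 bits
    if signed and byte >= 0x80:         # signed char: two's complement reinterpretation
        byte -= 0x100
    return byte


def loop_terminates(signed_char: bool) -> bool:
    """Return True if `while ((c = getopt(...)) != -1)` would ever see -1."""
    end_of_options = -1                 # getopt() returns -1 when options are exhausted
    c = store_in_char(end_of_options, signed_char)
    return c == -1


# Signed char (e.g. x86): the loop ends. Unsigned char (ARM): c is 255, never -1.
print(loop_terminates(signed_char=True))   # True
print(loop_terminates(signed_char=False))  # False
```

Declaring the variable as `int`, which is what fixing the getopt return type amounts to, avoids the narrowing altogether.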
LOLi	390d679acd	Fix 'zpool add' handling of nested interior VDEVs When replacing a faulted device which was previously handled by a spare, multiple levels of nested interior VDEVs will be present in the pool configuration; the following example illustrates one of the possible situations:
NAME                          STATE    READ WRITE CKSUM
testpool                      DEGRADED    0     0     0
  raidz1-0                    DEGRADED    0     0     0
    spare-0                   DEGRADED    0     0     0
      replacing-0             DEGRADED    0     0     0
        /var/tmp/fault-dev    UNAVAIL     0     0     0  cannot open
        /var/tmp/replace-dev  ONLINE      0     0     0
      /var/tmp/spare-dev1     ONLINE      0     0     0
    /var/tmp/safe-dev         ONLINE      0     0     0
spares
  /var/tmp/spare-dev1         INUSE     currently in use
This is safe and allowed, but get_replication() needs to handle this situation gracefully to let zpool add new devices to the pool. Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: loli10K <ezomori.nozomu@gmail.com> Closes #6678 Closes #6996	2017-12-28 10:15:32 -08:00
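Tolerating arbitrary nesting amounts to walking interior VDEVs recursively instead of assuming a fixed depth. The sketch below uses a hypothetical dict-based layout in place of libzfs's nvlist configuration and only collects leaf paths; the real get_replication() also compares replication types and device sizes.

```python
from typing import Any, Dict, List

Vdev = Dict[str, Any]  # hypothetical layout: {"type", "path"} or {"type", "children"}


def leaf_paths(vdev: Vdev) -> List[str]:
    """Collect leaf device paths below `vdev`, at any nesting depth."""
    children = vdev.get("children")
    if not children:                    # leaf: disk or file
        return [vdev["path"]]
    paths: List[str] = []
    for child in children:              # interior: spare-N, replacing-N, raidz, ...
        paths.extend(leaf_paths(child))
    return paths


# Layout from the example above: raidz1-0 > spare-0 > replacing-0 > leaves.
raidz = {
    "type": "raidz1",
    "children": [
        {"type": "spare", "children": [
            {"type": "replacing", "children": [
                {"type": "file", "path": "/var/tmp/fault-dev"},
                {"type": "file", "path": "/var/tmp/replace-dev"},
            ]},
            {"type": "file", "path": "/var/tmp/spare-dev1"},
        ]},
        {"type": "file", "path": "/var/tmp/safe-dev"},
    ],
}
print(leaf_paths(raidz))
```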
Simon Guest	993669a7bf	vdev_id: new slot type ses This extends vdev_id to support a new slot type, ses, for SCSI Enclosure Services. With slot type ses, the disk slot numbers are determined by using the device slot number reported by sg_ses for the device with matching SAS address, found by querying all available enclosures. This is primarily of use on systems with a deficient driver omitting support for bay_identifier in /sys/devices. In my testing, I found that the existing slot types of port and id were not stable across disk replacement, so an alternative was required. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Simon Guest <simon.guest@tesujimath.org> Closes #6956	2017-12-20 09:42:07 -08:00
Giuseppe Di Natale	89a66a0457	Handle broken pipes in arc_summary Using a command similar to 'arc_summary.py | head' causes a broken pipe exception. Gracefully exit in the case of a broken pipe in arc_summary.py. Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com> Reviewed-by: loli10K <ezomori.nozomu@gmail.com> Signed-off-by: Giuseppe Di Natale <dinatale2@llnl.gov> Closes #6965 Closes #6969	2017-12-19 13:19:24 -08:00
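One way to stop quietly on a broken pipe in Python 3 is to catch `BrokenPipeError` around the output loop. This is an illustrative sketch rather than the arc_summary.py change; `emit()` and its `write` parameter are hypothetical names chosen so the behavior can be exercised without a real pipe.

```python
from typing import Callable, Iterable


def emit(lines: Iterable[str], write: Callable[[str], object]) -> int:
    """Write each line; stop quietly if the reader closed its end of the pipe.

    Returns the number of lines written before stopping.
    """
    written = 0
    try:
        for line in lines:
            write(line + "\n")
            written += 1
    except BrokenPipeError:
        # The consumer (e.g. `head`) exited early; treat it as a normal stop.
        pass
    return written
```

When writing to the real `sys.stdout`, the Python `signal` module documentation additionally suggests pointing stdout at `os.devnull` after catching the error, so the interpreter does not report a second failure while flushing at exit.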
LOLi	c4ba46dead	Handle invalid options in arc_summary If an invalid option is provided to arc_summary.py we handle any error thrown from the getopt Python module and print the usage help message. Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: loli10K <ezomori.nozomu@gmail.com> Closes #6983	2017-12-19 13:02:40 -08:00
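With the standard `getopt` module, an unknown option (or a missing option argument) raises `getopt.GetoptError`, which can be caught to print the usage text instead of a traceback. The option letters, long names and `USAGE` string below are illustrative assumptions, not arc_summary.py's exact definitions.

```python
import getopt
from typing import List, Optional, Tuple

# Illustrative option set and usage text (assumptions, not the real script's).
SHORT_OPTS = "adp:"
LONG_OPTS = ["alternate", "description", "page="]
USAGE = "usage: arc_summary.py [-a] [-d] [-p PAGE]"


def parse_args(argv: List[str]) -> Optional[List[Tuple[str, str]]]:
    """Return the parsed options, or None after printing usage on bad input."""
    try:
        opts, _args = getopt.getopt(argv, SHORT_OPTS, LONG_OPTS)
    except getopt.GetoptError as err:
        print(err)       # e.g. "option -z not recognized"
        print(USAGE)
        return None
    return opts
```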
LOLi	c30e34faa1	ZTS: Fix create-o_ashift test case The function that fills the uberblock ring buffer on every device label has been reworked to avoid occasional failures caused by a race condition that prevents 'zpool sync' from writing some uberblocks sequentially: this happens when the pool sync ioctl dispatch code calls txg_wait_synced() while we're already waiting for a TXG to sync. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: loli10K <ezomori.nozomu@gmail.com> Closes #6924 Closes #6977	2017-12-19 10:49:33 -08:00
LOLi	e2d936e0f8	Honor --with-mounthelperdir where applicable Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Giuseppe Di Natale <dinatale2@llnl.gov> Signed-off-by: loli10K <ezomori.nozomu@gmail.com> Closes #6962	2017-12-17 14:14:07 -08:00
LOLi	4e9b156960	Various ZED fixes * Teach ZED to handle spares using the configured ashift: if the zpool 'ashift' property is set then ZED should use its value when kicking in a hotspare; with this change 512e disks can be used as spares for VDEVs that were created with ashift=9, even if ZFS natively detects them as 4K block devices. * Introduce an additional auto_spare test case which verifies that in the face of multiple device failures an appropriate number of spares are kicked in. * Fix zed_stop() in "libtest.shlib" which did not correctly wait for the target pid. * Fix ZED crashing on startup caused by a race condition in libzfs when used in a multi-threaded context. * Convert ZED over to using the tpool library which is already present in the Illumos FMA code. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: loli10K <ezomori.nozomu@gmail.com> Closes #2562 Closes #6858	2017-12-08 16:58:41 -08:00
Tony Hutter	674b89342e	Fix segfault in zpool iostat when adding VDEVs Fix a segfault when running 'zpool iostat -v 1' while adding a VDEV. Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tony Hutter <hutter2@llnl.gov> Closes #6748 Closes #6872	2017-12-06 11:43:07 -08:00
Prakash Surya	1ce23dcaff	OpenZFS 8585 - improve batching done in zil_commit() Authored by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: Brad Lewis <brad.lewis@delphix.com> Reviewed by: Matt Ahrens <mahrens@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Approved by: Dan McDonald <danmcd@joyent.com> Ported-by: Prakash Surya <prakash.surya@delphix.com>
Problem
=======
The current implementation of zil_commit() can introduce significant latency, beyond what is inherent due to the latency of the underlying storage. The additional latency comes from two main problems:
1. When there are outstanding ZIL blocks being written (i.e. there's already a "writer thread" in progress), any new calls to zil_commit() will block waiting for the currently outstanding ZIL blocks to complete. The blocks written by each "writer thread" are collectively called a "batch", and there can only ever be a single "batch" being written at a time. When a batch is being written, any new ZIL transactions will have to wait for the next batch to be written, which won't occur until the current batch finishes. As a result, the underlying storage may not be used as efficiently as possible. While "new" threads enter zil_commit() and are blocked waiting for the next batch, it's possible that the underlying storage isn't fully utilized by the current batch of ZIL blocks. In that case, it'd be better to allow these new threads to generate (and issue) a new ZIL block, such that it could be serviced by the underlying storage concurrently with the other ZIL blocks that are being serviced.
2. Any call to zil_commit() must wait for all ZIL blocks in its "batch" to complete, prior to zil_commit() returning. The size of any given batch is proportional to the number of ZIL transactions in the queue at the time that the batch starts processing the queue, which doesn't occur until the previous batch completes.
Thus, if there are many transactions in the queue, the batch could be composed of many ZIL blocks, and each call to zil_commit() will have to wait for all of these writes to complete (even if the thread calling zil_commit() only cared about one of the transactions in the batch). To further complicate the situation, these two issues result in the following side effect:
3. If a given batch takes longer to complete than normal, this results in larger batch sizes, which then take longer to complete and further drive up the latency of zil_commit(). This can occur for a number of reasons, including (but not limited to) transient changes in the workload and storage latency irregularities.
Solution
========
The solution attempted by this change has the following goals:
1. no on-disk changes; maintain current on-disk format.
2. modify the "batch size" to be equal to the "ZIL block size".
3. allow new batches to be generated and issued to disk, while there are already batches being serviced by the disk.
4. allow zil_commit() to wait for as few ZIL blocks as possible.
5. use as few ZIL blocks as possible, for the same number of ZIL transactions, without introducing significant latency to any individual ZIL transaction, i.e. use fewer, but larger, ZIL blocks.
In theory, with these goals met, the new algorithm will allow the following improvements:
1. new ZIL blocks can be generated and issued, while there are already outstanding ZIL blocks being serviced by the storage.
2. the latency of zil_commit() should be proportional to the underlying storage latency, rather than the incoming synchronous workload.
Porting Notes
=============
Due to the changes made in commit `119a394ab0`, the lifetime of an itx structure differs from that in OpenZFS. Specifically, the itx structure is kept around until the data associated with the itx is considered to be safe on disk; this is so that the itx's callback can be called after the data is committed to stable storage.
Since OpenZFS doesn't have this itx callback mechanism, it's able to destroy the itx structure immediately after the itx is committed to an lwb (before the lwb is written to disk). To support this difference, and to ensure the itx's callbacks can still be called after the itx's data is on disk, a few changes had to be made: * A list of itxs was added to the lwb structure. This list contains all of the itxs that have been committed to the lwb, such that the callbacks for these itxs can be called from zil_lwb_flush_vdevs_done(), after the data for the itxs is committed to disk. * A list of itxs was added on the stack of the zil_process_commit_list() function; the "nolwb_itxs" list. In some circumstances, an itx may not be committed to an lwb (e.g. if allocating the "next" ZIL block on disk fails), so this list is used to keep track of which itxs fall into this state, such that their callbacks can be called after the ZIL's writer pipeline is "stalled". * The logic to actually call the itx's callback was moved into the zil_itx_destroy() function. Since all consumers of zil_itx_destroy() were effectively performing the same logic (i.e. if callback is non-null, call the callback), it seemed like useful code cleanup to consolidate this logic into a single function. Additionally, the existing Linux tracepoint infrastructure dealing with the ZIL's probes and structures had to be updated to reflect these code changes. Specifically: * The "zil__cw1" and "zil__cw2" probes were removed, so they had to be removed from "trace_zil.h" as well. * Some of the zilog structure's fields were removed, which affected the tracepoint definitions of the structure. * New tracepoints had to be added for the following 3 new probes: * zil__process__commit__itx * zil__process__normal__itx * zil__commit__io__error OpenZFS-issue: https://www.illumos.org/issues/8585 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/5d95a3a Closes #6566	2017-12-05 09:39:16 -08:00
Brian Behlendorf	ea39f75f64	Fix 'zpool create|add' replication level check When the pool configuration contains a hole due to a previous device removal, ignore this top-level vdev. Failure to do so will result in the current configuration being assessed as having a non-uniform replication level, and the expected warning will be disabled. The zpool_add_010_pos test case was extended to cover this scenario. Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #6907 Closes #6911	2017-12-04 11:50:35 -08:00
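The rule can be sketched as filtering holes out before comparing the top-level vdevs. The dict layout is hypothetical, and comparing only the vdev type is a simplification; the real check also considers things such as the number of devices in each vdev.

```python
from typing import Dict, List, Optional


def uniform_replication(top_level: List[Dict[str, str]]) -> Optional[str]:
    """Return the replication type shared by all top-level vdevs, else None.

    Holes left behind by a device removal are skipped: they have no
    replication level of their own and would otherwise look like a mismatch.
    """
    kinds = {vdev["type"] for vdev in top_level if vdev["type"] != "hole"}
    return kinds.pop() if len(kinds) == 1 else None
```

Without the filter, a pool of `mirror`, `hole`, `mirror` would produce the set `{"mirror", "hole"}` and be reported as non-uniform.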
Scot W. Stevenson	8d18776973	Fix data on evict_skips in arc_summary.py Display correct data from kstat arcstats for evict_skips, which is currently repeating the data from mutex_misses. Fixes #6882 Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Signed-off-by: Scot W. Stevenson <scot.stevenson@gmail.com> Closes #6882 Closes #6883	2017-11-18 14:07:04 -08:00
Tom Caputi	d4a72f2386	Sequential scrub and resilvers Currently, scrubs and resilvers can take an extremely long time to complete. This is largely due to the fact that zfs scans process pools in logical order, as determined by each block's bookmark. This makes sense from a simplicity perspective, but blocks in zfs are often scattered randomly across disks, particularly due to zfs's copy-on-write mechanisms. This patch improves performance by splitting scrubs and resilvers into a metadata scanning phase and an IO issuing phase. The metadata scan reads through the structure of the pool and gathers an in-memory queue of I/Os, sorted by size and offset on disk. The issuing phase will then issue the scrub I/Os as sequentially as possible, greatly improving performance. This patch also updates and cleans up some of the scan code which has not been updated in several years. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Authored-by: Saso Kiselkov <saso.kiselkov@nexenta.com> Authored-by: Alek Pinchuk <apinchuk@datto.com> Authored-by: Tom Caputi <tcaputi@datto.com> Signed-off-by: Tom Caputi <tcaputi@datto.com> Closes #3625 Closes #6256	2017-11-15 17:27:01 -08:00
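The issuing phase can be pictured as sorting the gathered I/Os by on-disk offset and merging neighbors into larger sequential requests. This is a simplified sketch: `issue_order()` and `max_gap` are hypothetical, it models a single device, and it ignores the size-based ordering the commit message also mentions.

```python
from typing import List, Tuple

Extent = Tuple[int, int]  # (offset, size) in bytes on one device


def issue_order(queued: List[Extent], max_gap: int = 0) -> List[Extent]:
    """Sort gathered I/Os by offset and coalesce ones that are (nearly) adjacent.

    Extents separated by at most `max_gap` bytes are merged into one larger
    sequential I/O; the bytes in the gap are simply read as well.
    """
    merged: List[Extent] = []
    for off, size in sorted(queued):
        if merged and off <= merged[-1][0] + merged[-1][1] + max_gap:
            prev_off, prev_size = merged[-1]
            end = max(prev_off + prev_size, off + size)
            merged[-1] = (prev_off, end - prev_off)
        else:
            merged.append((off, size))
    return merged


# Blocks discovered in logical (bookmark) order, scattered on disk.
found = [(8192, 4096), (0, 4096), (4096, 4096), (65536, 4096)]
print(issue_order(found))
```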
Scot W. Stevenson	e301113c17	Minor code cleanups in arc_python.py Remove unused library re and associated variable kstat_pobj. Add note to documentation at start of program about required support for old versions of Python. Change variable "format" (which is a built-in function) to "fmt". Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Scot W. Stevenson <scot.stevenson@gmail.com> Closes #6869	2017-11-15 10:28:11 -08:00
Scot W. Stevenson	5277f208f2	Fix arc_summary.py -d crash with Python3 Prevents arc_summary.py crashing when called with parameter -d or long form --description with Python3. Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Signed-off-by: Scot W. Stevenson <scot.stevenson@gmail.com> Closes #6849 Closes #6850	2017-11-11 20:27:43 -08:00
Scot W. Stevenson	681957fe2e	Sort output of tunables in arc_summary.py Sort list of tunables printed by _tunable_summary() alphabetically Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Signed-off-by: Scot W. Stevenson <scot.stevenson@gmail.com> Closes #6828	2017-11-07 14:50:15 -08:00
Scot W. Stevenson	23ea00a1fe	Add documentation strings to arc_summary.py Include docstrings (PEP8, PEP257) for module and all functions. Separately, remove outdated section in comment at start of module. Separately, remove unused global constant "usetunable". Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Signed-off-by: Scot W. Stevenson <scot.stevenson@gmail.com> Closes #6818	2017-11-05 13:11:37 -08:00
George G	2df9ad1c07	Fix column alignment with long zpool names `zpool status` normally aligns NAME/STATE/etc columns: NAME STATE READ WRITE CKSUM dummy ONLINE 0 0 0 mirror-0 ONLINE 0 0 0 /tmp/dummy-long-1.bin ONLINE 0 0 0 /tmp/dummy-long-2.bin ONLINE 0 0 0 mirror-1 ONLINE 0 0 0 /tmp/dummy-long-3.bin ONLINE 0 0 0 /tmp/dummy-long-4.bin ONLINE 0 0 0 However, if the zpool name is longer than the zvol names, alignment issues arise: NAME STATE READ WRITE CKSUM dummy-very-very-long-zpool-name ONLINE 0 0 0 mirror-0 ONLINE 0 0 0 /tmp/dummy-1.bin ONLINE 0 0 0 /tmp/dummy-2.bin ONLINE 0 0 0 mirror-1 ONLINE 0 0 0 /tmp/dummy-3.bin ONLINE 0 0 0 /tmp/dummy-4.bin ONLINE 0 0 0 `zpool iostat` and `zpool import` are also affected: capacity operations bandwidth pool alloc free read write read write ---------- ----- ----- ----- ----- ----- ----- dummy 104K 1.97G 0 0 152 9.84K dummy-very-very-long-zpool-name 152K 1.97G 0 1 144 13.1K ---------- ----- ----- ----- ----- ----- ----- dummy-very-very-long-zpool-name ONLINE mirror-0 ONLINE /tmp/dummy-1.bin ONLINE /tmp/dummy-2.bin ONLINE mirror-1 ONLINE /tmp/dummy-3.bin ONLINE /tmp/dummy-4.bin ONLINE Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: George Gaydarov <git@gg7.io> Closes #6786	2017-11-05 13:09:56 -08:00
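The general idea is to size the NAME column from the widest indented name, pool name included. A hypothetical `format_status()` illustrating it for two columns; this is not the zpool implementation.

```python
from typing import List, Tuple

Row = Tuple[int, str, str]  # (nesting depth, name, state)


def format_status(rows: List[Row], indent: int = 2) -> List[str]:
    """Render NAME/STATE columns, sizing NAME to the widest indented name."""
    width = max([len("NAME")] + [depth * indent + len(name) for depth, name, _ in rows])
    lines = [f"{'NAME':<{width}}  STATE"]
    for depth, name, state in rows:
        label = " " * (depth * indent) + name
        lines.append(f"{label:<{width}}  {state}")
    return lines


for line in format_status([
    (0, "dummy-very-very-long-zpool-name", "ONLINE"),
    (1, "mirror-0", "ONLINE"),
    (2, "/tmp/dummy-1.bin", "ONLINE"),
]):
    print(line)
```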
Scot W. Stevenson	cd1813d36e	Rewrite fHits() in arc_summary.py with SI units Complete rewrite of fHits(). Move units from non-standard English abbreviations to SI units, thereby avoiding confusion because of "long scale" and "short scale" numbers. Remove unused parameter "Decimal". Add function documentation string. Aim to conform to PEP8. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Signed-off-by: Scot W. Stevenson <scot.stevenson@gmail.com> Closes #6815	2017-11-04 13:33:28 -07:00
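A minimal sketch of SI-prefix scaling for hit counts. `f_hits()` is a hypothetical stand-in, and its output format (one decimal place, prefix appended) is an assumption, not necessarily what arc_summary.py prints.

```python
SI_PREFIXES = ("", "k", "M", "G", "T", "P", "E", "Z", "Y")


def f_hits(hits: int) -> str:
    """Format a hit count using SI prefixes (powers of 1000)."""
    value = float(hits)
    idx = 0
    while value >= 1000 and idx < len(SI_PREFIXES) - 1:
        value /= 1000
        idx += 1
    if idx == 0:
        return str(int(hits))          # small counts are printed as-is
    return f"{value:.1f}{SI_PREFIXES[idx]}"
```

SI prefixes sidestep the ambiguity of words like "billion", which means 10^9 in the short scale but 10^12 in the long scale.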
Scot W. Stevenson	df1f129bc4	Minor code cleanup in arc_summary.py Simplify and inline single-use function div1(); inline twice-used function div2(); add function comment to zfs_header(); replace variable "unused" in get_Kstat() with "_" following convention. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Signed-off-by: Scot W. Stevenson <scot.stevenson@gmail.com> Closes #6802	2017-11-03 15:43:53 -07:00
Jason King	f3c8c9e6f0	OpenZFS 640 - number_to_scaled_string is duplicated in several commands Porting Notes: - The OpenZFS patch added nicenum_scale() and nicenum() to a library not used by ZFS. Rather than pull in a new dependency the version of nicenum in lib/libzpool/util.c was simply replaced with the new one. Reviewed by: Sebastian Wiedenroth <wiedi@frubar.net> Reviewed by: Robert Mustacchi <rm@joyent.com> Reviewed by: Yuri Pankov <yuripv@gmx.com> Approved by: Dan McDonald <danmcd@joyent.com> Authored by: Jason King <jason.brian.king@gmail.com> Ported-by: Brian Behlendorf <behlendorf1@llnl.gov> OpenZFS-issue: https://www.illumos.org/issues/640 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/0a055120 Closes #6796	2017-10-30 14:47:20 -07:00
Scot W. Stevenson	47c8e7fd97	Rewrite of function fBytes() in arc_summary.py Replace if-elif-else construction with shorter loop; remove unused parameter "Decimal"; centralize format string; add function documentation string; conform to PEP8. Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Scot W. Stevenson <scot.stevenson@gmail.com> Closes #6784	2017-10-30 14:44:35 -07:00
Brian Behlendorf	867959b588	OpenZFS 8081 - Compiler warnings in zdb Fix compiler warnings in zdb. With these changes, FreeBSD can compile zdb with all compiler warnings enabled save -Wunused-parameter. usr/src/cmd/zdb/zdb.c usr/src/cmd/zdb/zdb_il.c usr/src/uts/common/fs/zfs/sys/sa.h usr/src/uts/common/fs/zfs/sys/spa.h Fix numerous warnings, including: * const-correctness * shadowing global definitions * signed vs unsigned comparisons * missing prototypes, or missing static declarations * unused variables and functions * Unreadable array initializations * Missing struct initializers usr/src/cmd/zdb/zdb.h Add a header file to declare common symbols usr/src/lib/libzpool/common/sys/zfs_context.h usr/src/uts/common/fs/zfs/arc.c usr/src/uts/common/fs/zfs/dbuf.c usr/src/uts/common/fs/zfs/spa.c usr/src/uts/common/fs/zfs/txg.c Add a function prototype for zk_thread_create, and ensure that every callback supplied to this function actually matches the prototype. usr/src/cmd/ztest/ztest.c usr/src/uts/common/fs/zfs/sys/zil.h usr/src/uts/common/fs/zfs/zfs_replay.c usr/src/uts/common/fs/zfs/zvol.c Add a function prototype for zil_replay_func_t, and ensure that every function of this type actually matches the prototype. usr/src/uts/common/fs/zfs/sys/refcount.h Change FTAG so it discards any constness of __func__, necessary since existing APIs expect it passed as void *. Porting Notes: - Many of these fixes have already been applied to Linux. For consistency the OpenZFS version of a change was applied if the warning was addressed in an equivalent but different fashion. Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Prakash Surya <prakash.surya@delphix.com> Authored by: Alan Somers <asomers@gmail.com> Approved by: Richard Lowe <richlowe@richlowe.net> Ported-by: Brian Behlendorf <behlendorf1@llnl.gov> OpenZFS-issue: https://www.illumos.org/issues/8081 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/843abe1b8a Closes #6787	2017-10-27 12:46:35 -07:00
LOLi	88f9c9396b	Allow 'zpool events' filtering by pool name Additionally add four new tests: * zpool_events_clear: verify 'zpool events -c' functionality * zpool_events_cliargs: verify command line options and arguments * zpool_events_follow: verify 'zpool events -f' * zpool_events_poolname: verify events filtering by pool name Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Signed-off-by: loli10K <ezomori.nozomu@gmail.com> Closes #3285 Closes #6762	2017-10-26 16:49:33 -07:00
Arkadiusz Bubała	d3f2cd7e3b	Added no_scrub_restart flag to zpool reopen Added -n flag to zpool reopen that allows a running scrub operation to continue if there is a device with a Dirty Time Log (DTL). By default, if a component device has a DTL and zpool reopen is executed, all running scan operations will be restarted. Added functional tests for `zpool reopen`. The tests cover the following scenarios: * `zpool reopen` without arguments, * `zpool reopen` with pool name as argument, * `zpool reopen` while scrubbing, * `zpool reopen -n` while scrubbing, * `zpool reopen -n` while resilvering, * `zpool reopen` with bad arguments. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tom Caputi <tcaputi@datto.com> Signed-off-by: Arkadiusz Bubała <arkadiusz.bubala@open-e.com> Closes #6076 Closes #6746	2017-10-26 12:26:09 -07:00
Fabian-Gruenbichler	3ad59c015d	arcstat: flush stdout / outfile after each line Otherwise, if arcstat gets interrupted before the desired number of iterations is reached, the output file will be empty (both if set via '-o' or via shell redirection). Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com> Closes #6775	2017-10-26 12:18:49 -07:00
Giuseppe Di Natale	64b8c58e3e	Ensure arc_size_break is filled in arc_summary.py Use mfu_size and mru_size pulled from the arcstats kstat file to calculate the mfu and mru percentages for arc size breakdown. Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com> Reviewed-by: AndCycle <andcycle@andcycle.idv.tw> Signed-off-by: Giuseppe Di Natale <dinatale2@llnl.gov> Closes #5526 Closes #6770	2017-10-23 14:18:12 -07:00
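A sketch of reading the arcstats kstat and deriving the MRU/MFU split. It assumes the Linux kstat text layout (a header line, a `name type data` title line, then one `name type value` row per statistic) and assumes the percentages are taken relative to `mru_size + mfu_size`; the denominator used by arc_summary.py may differ.

```python
from typing import Dict


def parse_kstat(text: str) -> Dict[str, int]:
    """Parse kstat text such as /proc/spl/kstat/zfs/arcstats into a dict."""
    stats: Dict[str, int] = {}
    for line in text.splitlines()[2:]:          # skip the two header lines
        fields = line.split()
        if len(fields) == 3:
            stats[fields[0]] = int(fields[2])
    return stats


def arc_size_breakdown(arcstats: Dict[str, int]) -> Dict[str, float]:
    """Percentage of mru_size + mfu_size held by each list (assumed denominator)."""
    mru = arcstats["mru_size"]
    mfu = arcstats["mfu_size"]
    total = mru + mfu
    if total == 0:                              # avoid dividing by zero on an empty ARC
        return {"mru": 0.0, "mfu": 0.0}
    return {"mru": 100.0 * mru / total, "mfu": 100.0 * mfu / total}
```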
Giuseppe Di Natale	63e5e960ba	Correct flake8 errors after STYLE builder update Fix new flake8 errors related to bare excepts and ambiguous variable names due to a STYLE builder update. Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Giuseppe Di Natale <dinatale2@llnl.gov> Closes #6776	2017-10-23 14:01:43 -07:00
adisbladis	f8cd871a01	Use ashift=12 by default on SSDSC2BW48 disks Currently the 480GB models of this disk do not use ashift=12 by default. SSDSC2BW48 is also optimized for 4k blocks. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: adisbladis <adis@blad.is> Closes #6774	2017-10-23 11:00:45 -07:00
Tobin Harding	c721ba435f	Fix coverity defects: CID 161388 CID 161388: Resource Leak (RESOURCE_LEAK) Jump to errout so that the file descriptor is closed before returning from the function. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tobin C. Harding <me@tobin.cc> Closes #6755	2017-10-17 09:37:50 -07:00
Tobin Harding	ced28193b0	Fix coverity defects: 147480, 147584 CID 147480: Logically dead code (DEADCODE) Remove non-null check and subsequent function call. Add ASSERT to future proof the code. usage label is only jumped to before `zhp` is initialized. CID 147584: Out-of-bounds access (OVERRUN) Subtract length of current string from buffer length for `size` argument to `snprintf`. Starting address for the write is the start of the buffer + the current string length. We need to subtract this string length else risk a buffer overflow. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tobin C. Harding <me@tobin.cc> Closes #6745	2017-10-16 15:32:48 -07:00
Tom Caputi	440a3eb939	Fixes for #6639 Several issues were uncovered by running stress tests with zfs encryption and raw sends in particular. The issues and their associated fixes are as follows: * arc_read_done() has the ability to chain several requests for the same block of data via the arc_callback_t struct. In these cases, the ARC would only use the first request's dsobj from the bookmark to decrypt the data. This is problematic because the first request might be a prefetch zio which is able to handle the key not being loaded, while the second might use a different key that it is sure will work. The fix here is to pass the dsobj with each individual arc_callback_t so that each request can attempt to decrypt the data separately. * DRR_FREE and DRR_FREEOBJECT records in a send file were not having their transactions properly tagged as raw during raw sends, which caused a panic when the dbuf code attempted to decrypt these blocks. * traverse_prefetch_metadata() did not properly set ZIO_FLAG_SPECULATIVE when issuing prefetch IOs. * Added a few asserts and code cleanups to ensure these issues are more detectable in the future. Signed-off-by: Tom Caputi <tcaputi@datto.com>	2017-10-11 16:55:50 -04:00
Tom Caputi	4807c0badb	Encryption patch follow-up * PBKDF2 implementation changed to OpenSSL implementation. * HKDF implementation moved to its own file and tests added to ensure correctness. * Removed libzfs's now unnecessary dependency on libzpool and libicp. * Ztest can now create and test encrypted datasets. This is currently disabled until issue #6526 is resolved, but otherwise functions as advertised. * Several small bug fixes discovered after enabling ztest to run on encrypted datasets. * Fixed coverity defects added by the encryption patch. * Updated man pages for encrypted send / receive behavior. * Fixed a bug where encrypted datasets could receive DRR_WRITE_EMBEDDED records. * Minor code cleanups / consolidation. Signed-off-by: Tom Caputi <tcaputi@datto.com>	2017-10-11 16:54:48 -04:00
Tobin Harding	a0430cc5a9	Use logical '&&' instead of bitwise '&' Make two instances of the same change: change bitwise AND (&) to logical AND (&&). Currently the code uses a bitwise AND between two boolean values. In the first instance, the first operand is a flag that has been bitwise combined with a bit mask to get a boolean value as to whether a file has group write permissions set. The second operand is a struct member that is intended as a boolean flag, not a bit mask. In the second instance the argument is the same, except with world write permissions instead of group write (S_IWOTH rather than S_IWGRP). Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Chris Dunlop <chris@onthe.net.au> Signed-off-by: Tobin C. Harding <me@tobin.cc> Closes #6684 Closes #6722	2017-10-05 19:38:55 -07:00
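The pitfall can be reproduced with Python's `stat` constants, where `&` behaves like C's bitwise AND and `and` like `&&`. The `flag` argument is a hypothetical stand-in for the boolean struct member described above.

```python
import stat


def group_writable_bitwise(mode: int, flag: int) -> int:
    # Buggy form: S_IWGRP is 0o020 (16), and 16 & 1 == 0, so the result
    # is false even when both conditions hold.
    return (mode & stat.S_IWGRP) & flag


def group_writable_logical(mode: int, flag: int) -> bool:
    # Fixed form: each operand is reduced to a truth value first.
    return bool(mode & stat.S_IWGRP) and bool(flag)


mode = 0o664                             # rw-rw-r--: group write bit is set
print(group_writable_bitwise(mode, 1))   # 0
print(group_writable_logical(mode, 1))   # True
```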
Simon Guest	269db7a4b3	vdev_id: extension for new scsi topology On systems with SCSI rather than SAS disk topology, this change enables the vdev_id script to match against the block device path, and therefore create a vdev alias in /dev/disk/by-vdev. Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Simon Guest <simon.guest@tesujimath.org> Closes #6592	2017-09-27 10:39:47 -07:00
LOLi	3fd3e56cfd	Fix some ZFS Test Suite issues * Add 'zfs bookmark' coverage (zfs_bookmark_cliargs) * Add OpenZFS 8166 coverage (zpool_scrub_offline_device) * Fix "busy" zfs_mount_remount failures * Fix bootfs_003_pos, bootfs_004_neg, zdb_005_pos local cleanup * Update usage of $KEEP variable, add get_all_pools() function * Enable history_008_pos and rsend_019_pos (non-32bit builders) * Enable zfs_copies_005_neg, update local cleanup * Fix zfs_send_007_pos (large_dnode + OpenZFS 8199) * Fix rollback_003_pos (use dataset name, not mountpoint, to unmount) * Update default_raidz_setup() to work properly with more than 3 disks * Use $TEST_BASE_DIR instead of hardcoded (/var)/tmp for file VDEVs * Update usage of /dev/random to /dev/urandom Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: loli10K <ezomori.nozomu@gmail.com> Issue #6086 Closes #5658 Closes #6143 Closes #6421 Closes #6627 Closes #6632	2017-09-25 10:32:34 -07:00
Olaf Faaland	b33d668ddb	Fix ZTS MMP tests and ztest -M behavior Quote "$MMP_IMPORT_MSG" when it is passed as an argument, as it is a multi-word string. Some tests were passing when they should not have, because the grep was only testing for the first word. Correct the message expected when no hostid is set and the test attempts to enable multihost. It did not match the actual output in that situation. Disable ztest_reguid() when ztest is invoked with the -M option. If ztest performs a reguid, a concurrent import attempt may fail with the error "one or more devices is currently unavailable" if the guid sum is calculated on the original device guids but compared against the guid sum ztest wrote based on the new device guids. Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Olaf Faaland <faaland1@llnl.gov> Closes #6666	2017-09-23 09:28:18 -07:00
David Quigley	a9a2bf7152	Remove FRU and LIBTOPO Support FRU and LIBTOPO support are illumos-only features that will not be ported to Linux and that make the code more complicated than necessary. This commit makes way for further cleanups of the zed/FMA code. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: David Quigley <david.quigley@intel.com> Closes #6641	2017-09-18 17:06:40 -07:00
David Quigley	1f4e2c88fd	ZTEST: Always enable asserts The ztest build always enables debug information but does not enable asserts unless --enable-debug is used. This change always enables asserts in the ztest code. Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: David Quigley <david.quigley@intel.com> Closes #6640	2017-09-15 13:26:05 -07:00
LOLi	835db58592	Add -vnP support to 'zfs send' for bookmarks This leverages the functionality introduced in `cf7684b` to expose verbose, dry-run and parsable 'zfs send' options for bookmarks. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: loli10K <ezomori.nozomu@gmail.com> Closes #3666 Closes #6601	2017-09-08 15:24:31 -07:00
Olaf Faaland	4c5b89f59e	Improved dnode allocation and dmu_hold_impl() Refactor dmu_object_alloc_dnsize() and dnode_hold_impl() to simplify the code, fix errors introduced by commit `dbeb879` (PR #6117) interacting badly with large dnodes, and improve performance. * When allocating a new dnode in dmu_object_alloc_dnsize(), update the percpu object ID for the core's metadnode chunk immediately. This eliminates most lock contention when taking the hold and creating the dnode. * Correct detection of the chunk boundary to work properly with large dnodes. * Separate the dmu_hold_impl() code for the FREE case from the code for the ALLOCATED case to make it easier to read. * Fully populate the dnode handle array immediately after reading a block of the metadnode from disk. Subsequently the dnode handle array provides enough information to determine which dnode slots are in use and which are free. * Add several kstats to allow the behavior of the code to be examined. * Verify dnode packing in large_dnode_008_pos.ksh. Since the test performs only creates, it should leave very few holes in the metadnode. * Add test large_dnode_009_pos.ksh, which performs concurrent creates and deletes, to complement the existing test, which performs only creates. With the above fixes, there is very little contention in a test of about 200,000 racing dnode allocations produced by tests 'large_dnode_008_pos' and 'large_dnode_009_pos'.
name                          type  data
dnode_hold_dbuf_hold          4     0
dnode_hold_dbuf_read          4     0
dnode_hold_alloc_hits         4     3804690
dnode_hold_alloc_misses       4     216
dnode_hold_alloc_interior     4     3
dnode_hold_alloc_lock_retry   4     0
dnode_hold_alloc_lock_misses  4     0
dnode_hold_alloc_type_none    4     0
dnode_hold_free_hits          4     203105
dnode_hold_free_misses        4     4
dnode_hold_free_lock_misses   4     0
dnode_hold_free_lock_retry    4     0
dnode_hold_free_overflow      4     0
dnode_hold_free_refcount      4     57
dnode_hold_free_txg           4     0
dnode_allocate                4     203154
dnode_reallocate              4     0
dnode_buf_evict               4     23918
dnode_alloc_next_chunk        4     4887
dnode_alloc_race              4     0
dnode_alloc_next_block        4     18
The performance is slightly improved for concurrent creates with 16+ threads, and unchanged for low thread counts. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Olaf Faaland <faaland1@llnl.gov> Closes #5396 Closes #6522 Closes #6414 Closes #6564	2017-09-05 16:15:04 -07:00
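The chunk-boundary bullet above can be illustrated with a small sketch. It assumes that each per-CPU metadnode chunk spans a power-of-two number of dnode slots (`dnodes_per_chunk`) and that a large dnode occupies `dn_slots` consecutive slots. The helper names are hypothetical, and this is not the actual dmu_object_alloc_dnsize() code. The point is that a multi-slot dnode must not straddle the end of a chunk, so checking only whether the object number lands exactly on a boundary is not enough.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Offset of x within an aligned region of size align (a power of two). */
static uint64_t
chunk_phase(uint64_t x, uint64_t align)
{
	return (x & (align - 1));
}

/*
 * Hypothetical check: would a dnode starting at `object` and spanning
 * `dn_slots` slots run past the end of its chunk?
 */
static bool
dnode_crosses_chunk(uint64_t object, uint64_t dn_slots,
    uint64_t dnodes_per_chunk)
{
	assert(dnodes_per_chunk != 0 &&
	    (dnodes_per_chunk & (dnodes_per_chunk - 1)) == 0);
	assert(dn_slots >= 1 && dn_slots <= dnodes_per_chunk);
	return (chunk_phase(object, dnodes_per_chunk) + dn_slots >
	    dnodes_per_chunk);
}

/*
 * Hypothetical helper: first object number at or after `object` where
 * a dnode of `dn_slots` slots fits entirely within one chunk.
 */
static uint64_t
dnode_next_fit(uint64_t object, uint64_t dn_slots, uint64_t dnodes_per_chunk)
{
	if (dnode_crosses_chunk(object, dn_slots, dnodes_per_chunk)) {
		/* Skip the leftover slots and start at the next chunk. */
		return (object - chunk_phase(object, dnodes_per_chunk) +
		    dnodes_per_chunk);
	}
	return (object);
}
```

For example, with 128 slots per chunk (a value chosen only for illustration), a 2-slot dnode at object 127 would straddle the boundary, so `dnode_next_fit(127, 2, 128)` returns 128. A 2-slot dnode at object 126 still fits and stays at 126.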
LOLi	db4c1adaf8	Add support for DMU_OTN_* types in dbufstat.py Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: loli10K <ezomori.nozomu@gmail.com> Closes #6535	2017-08-22 11:53:40 -07:00
LOLi	f763c3d1df	Fix range locking in ZIL commit codepath Since OpenZFS 7578 (`1b7c1e5`), if we have a ZVOL with logbias=throughput we will force WR_INDIRECT itxs in zvol_log_write(), setting the itx->itx_lr offset and length to the offset and length of the BIO passed from zvol_write() to zvol_log_write(). This offset and length are later used to take a range lock in the zilog->zl_get_data function, zvol_get_data(). Now suppose we have a ZVOL with volblocksize=8K and push 4K writes to offset 0: we will only be range-locking 0-4096. This means the ASSERT we make in dbuf_unoverride() is no longer valid, because dmu_sync() is now called from the zilog's get_data functions while holding only a partial lock on the dbuf. Fix this by taking a range lock on the whole block in zvol_get_data(). Reviewed-by: Chunwei Chen <tuxoko@gmail.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: loli10K <ezomori.nozomu@gmail.com> Closes #6238 Closes #6315 Closes #6356 Closes #6477	2017-08-21 08:59:48 -07:00
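The fix described above amounts to rounding the logged offset down to the start of its block and locking a full block. Below is a minimal sketch of that arithmetic. It assumes the block size is a power of two, as ZVOL block sizes are, and that each logged write lies within a single block. `lock_range_t` and `whole_block_range()` are hypothetical names, not ZFS APIs.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical range descriptor used only for this illustration. */
typedef struct lock_range {
	uint64_t lr_offset;
	uint64_t lr_length;
} lock_range_t;

/*
 * Return the range covering the entire block that contains `offset`,
 * instead of only the bytes that were written.
 */
static lock_range_t
whole_block_range(uint64_t offset, uint64_t blocksize)
{
	lock_range_t r;

	assert(blocksize != 0 && (blocksize & (blocksize - 1)) == 0);
	r.lr_offset = offset & ~(blocksize - 1);	/* round down to block start */
	r.lr_length = blocksize;
	return (r);
}
```

With an 8K block, a 4K write at offset 0 now locks 0-8192 rather than 0-4096, and a write at offset 12288 locks 8192-16384.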
Brian Behlendorf	c8f9061fc7	Retire legacy test infrastructure * Removed zpios kmod, utility, headers and man page. * Removed unused scripts zpios-profile/, zpios-test/, zpool-config/, smb.sh, zpios-sanity.sh, zpios-survey.sh, zpios.sh, and zpool-create.sh. Removed zfs-script-config.sh.in. When building, 'make' generates a common.sh with in-tree path information from the common.sh.in template. This file is sourced by the test scripts and used for in-tree testing; it is not included in the packages. When building packages, 'make install' uses the same template to create a new common.sh which is appropriate for the packaging. * Removed unused functions/variables from scripts/common.sh.in. Only minimal path information and configuration environment variables remain. * Removed unused scripts from the scripts/ directory. * Remaining shell scripts in the scripts directory were updated to cleanly pass shellcheck and added to the checked scripts. * Renamed tests/test-runner/cmd/ to tests/test-runner/bin/ to match the install location name. * Removed the last traces of the --enable-debug-dmu-tx configure option, which was retired some time ago. Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #6509	2017-08-15 17:26:38 -07:00
sckobras	d49d9c2bdc	vdev_id: implement slot numbering by port id With HPE hardware and hpsa-driven SAS adapters, only a single phy is reported, but no individual per-port phys (i.e. no phy* entry below port_dir), which breaks topology detection in the current sas_handler code. Instead, slot information can be derived directly from the port number. This change implements a new slot keyword "port" similar to "id" and "lun", and assumes a default phy/port of 0 if no individual phy entry can be found. It allows the "sas_direct" topology to be used with current HPE Dxxxx and Apollo 45xx JBODs. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Daniel Kobras <d.kobras@science-computing.de> Closes #6484	2017-08-14 15:18:26 -07:00
Don Brady	d977122da9	Add corruption failure option to zinject(8) Added a 'corrupt' error option that will flip a bit in the data after a read operation. This is useful for generating checksum errors at the device layer (in a mirror config for example). It is also used to validate the diagnosis of checksum errors from the zfs diagnosis engine. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Don Brady <don.brady@intel.com> Closes #6345	2017-08-14 15:17:15 -07:00
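The effect of the 'corrupt' option on a buffer can be sketched as a single bit flip. This is an illustration only: it assumes a non-empty buffer, picks the bit with rand(), and `flip_random_bit()` is a hypothetical helper, not zinject's actual injection code. Because any flipped bit changes the data, the block's checksum no longer matches when it is verified.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Flip one pseudo-randomly chosen bit in buf and return its index.
 * rand() is used only for illustration. Flipping the same bit again
 * restores the original data.
 */
static size_t
flip_random_bit(uint8_t *buf, size_t len)
{
	size_t bit;

	assert(buf != NULL && len > 0);
	bit = (size_t)rand() % (len * 8);
	buf[bit / 8] ^= (uint8_t)(1u << (bit % 8));
	return (bit);
}
```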
Tom Caputi	b525630342	Native Encryption for ZFS on Linux This change incorporates three major pieces. The first piece is a keystore that manages wrapping and encryption keys for encrypted datasets. Its commands mostly involve manipulating the new DSL Crypto Key ZAP objects that live in the MOS. Each encrypted dataset has its own DSL Crypto Key that is protected with a user's key. This level of indirection allows users to change their keys without re-encrypting their entire datasets. The change implements the new subcommands "zfs load-key", "zfs unload-key" and "zfs change-key", which allow the user to manage their encryption keys and settings. In addition, several new flags and properties have been added to support creating encrypted datasets and to make mounting and unmounting more convenient. The second piece of this patch provides the ability to encrypt, decrypt, and authenticate protected datasets. Each object set maintains a Merkle tree of Message Authentication Codes that protect the lower layers, similarly to how checksums are maintained. This part impacts the zio layer, which handles the actual encryption and generation of MACs, as well as the ARC and DMU, which need to be able to handle encrypted buffers and protected data. The last piece is the ability to do raw, encrypted sends and receives. The idea here is to send raw encrypted and compressed data and receive it exactly as-is on a backup system. This means that the dataset on the receiving system is protected using the same user key that is in use on the sending side. By doing so, datasets can be efficiently backed up to an untrusted system without fear of data being compromised. Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Jorgen Lundman <lundman@lundman.net> Signed-off-by: Tom Caputi <tcaputi@datto.com> Closes #494 Closes #5769	2017-08-14 10:36:48 -07:00
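The keystore's level of indirection can be illustrated with a toy model. Data is encrypted with a dataset master key, and only a wrapped copy of that master key, encrypted with the user's key, is stored. Changing the user's key then re-wraps the master key while the data stays untouched. This sketch uses XOR as a stand-in cipher purely to keep it short: it is NOT secure, it is not how ZFS encrypts anything, and all names here are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define	TOY_KEYLEN	16

/* Toy stand-in for a cipher: XOR with a repeating key. NOT secure. */
static void
toy_xor(uint8_t *dst, const uint8_t *src, size_t len, const uint8_t *key)
{
	for (size_t i = 0; i < len; i++)
		dst[i] = src[i] ^ key[i % TOY_KEYLEN];
}

/* Hypothetical stored state: only the wrapped master key is kept. */
typedef struct toy_keystore {
	uint8_t wrapped_master[TOY_KEYLEN];
} toy_keystore_t;

static void
toy_wrap(toy_keystore_t *ks, const uint8_t *master, const uint8_t *user_key)
{
	toy_xor(ks->wrapped_master, master, TOY_KEYLEN, user_key);
}

static void
toy_unwrap(const toy_keystore_t *ks, uint8_t *master, const uint8_t *user_key)
{
	toy_xor(master, ks->wrapped_master, TOY_KEYLEN, user_key);
}

/*
 * Changing the user key: unwrap with the old key, re-wrap with the new
 * one. The master key, and therefore all data encrypted with it, is
 * unchanged, so nothing else needs to be re-encrypted.
 */
static void
toy_change_key(toy_keystore_t *ks, const uint8_t *old_key,
    const uint8_t *new_key)
{
	uint8_t master[TOY_KEYLEN];

	assert(ks != NULL && old_key != NULL && new_key != NULL);
	toy_unwrap(ks, master, old_key);
	toy_wrap(ks, master, new_key);
}
```

In this toy model, data encrypted before a key change still decrypts afterwards, because the master key recovered with the new user key is identical to the original one.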
Brian Behlendorf	c25b8f99f8	Simplify threads, mutexs, cvs and rwlocks * Update the zk_thread_create() function to use the same trick as Illumos. Specifically, cast the new pthread_t to a void pointer and return that as the kthread_t. This avoids the issues associated with managing a wrapper structure and is safe as long as the callers never attempt to dereference it. Update all functions passed to pthread_create() to match the expected prototype. We were getting away with this before since the functions were explicitly cast. * Replaced direct zk_thread_create() calls with thread_create() for code consistency. All consumers of libzpool now use the proper wrappers. * The mutex_held() calls were converted to MUTEX_HELD(). * Removed all mutex_owner() calls and retired the interface. Use MUTEX_HELD() instead, which provides the same information and allows the implementation details to be hidden, in this case the use of the pthread_equal() function. * The kthread_t, kmutex_t, krwlock_t, and kcondvar_t types had all non-essential fields removed. In the case of kthread_t and kcondvar_t they could be directly typedef'd to pthread_t and pthread_cond_t respectively. * Removed all extra ASSERTs from the thread, mutex, rwlock, and cv wrapper functions. In practice, pthreads already provides the vast majority of checks as long as we check the return code. Removing this code from our wrappers helps readability. * Added a TS_JOINABLE state flag to request a joinable rather than a detached thread. This isn't a standard thread_create() state but it's the least invasive way to pass this information and is only used by ztest. TEST_ZTEST_TIMEOUT=3600 Reviewed-by: Chunwei Chen <tuxoko@gmail.com> Reviewed-by: Tom Caputi <tcaputi@datto.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #4547 Closes #5503 Closes #5523 Closes #6377 Closes #6495	2017-08-11 08:51:44 -07:00
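The Illumos-style trick from the first bullet above, together with the TS_JOINABLE flag, can be sketched as follows. This is a minimal illustration, not the libzpool code. `sketch_thread_create()`, `sketch_thread_join()`, `set_to_42()` and the TS_JOINABLE value are hypothetical. The pointer cast also assumes that pthread_t is an integer or pointer type, which holds on Linux glibc but is not guaranteed by POSIX.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* kthread_t is a plain pthread_t; callers only ever see a kthread_t *. */
typedef pthread_t kthread_t;

#define	TS_JOINABLE	0x0004	/* hypothetical value for this sketch */

/* Thread functions use exactly the prototype pthread_create() expects. */
typedef void *(*thread_func_t)(void *);

static kthread_t *
sketch_thread_create(thread_func_t func, void *arg, int state)
{
	pthread_attr_t attr;
	pthread_t tid;
	int detach = (state & TS_JOINABLE) ?
	    PTHREAD_CREATE_JOINABLE : PTHREAD_CREATE_DETACHED;

	if (pthread_attr_init(&attr) != 0)
		return (NULL);
	if (pthread_attr_setdetachstate(&attr, detach) != 0 ||
	    pthread_create(&tid, &attr, func, arg) != 0) {
		(void) pthread_attr_destroy(&attr);
		return (NULL);
	}
	(void) pthread_attr_destroy(&attr);

	/* Smuggle the pthread_t through a pointer; never dereference it. */
	return ((kthread_t *)(uintptr_t)tid);
}

static int
sketch_thread_join(kthread_t *kt, void **retval)
{
	/* Only valid for threads created with TS_JOINABLE. */
	assert(kt != NULL);
	return (pthread_join((pthread_t)(uintptr_t)kt, retval));
}

/* Example thread function: stores 42 through its argument. */
static void *
set_to_42(void *arg)
{
	*(int *)arg = 42;
	return (arg);
}
```

A joinable thread is created with `sketch_thread_create(set_to_42, &value, TS_JOINABLE)` and reaped with `sketch_thread_join()`. Passing 0 as the state yields a detached thread, which must not be joined.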
Brian Behlendorf	46364cb2f3	Add libtpool (thread pools) OpenZFS provides a library called tpool which implements thread pools for user space applications. Porting this library means the zpool utility no longer needs to borrow the kernel mutex and taskq interfaces from libzpool. This code was updated to use the tpool library, which behaves in a very similar fashion. Porting libtpool was relatively straightforward and only minimal modifications were needed. The core changes were: * Fully convert the library to use pthreads. * Updated signal handling. * lmalloc/lfree converted to calloc/free * Implemented a portable pthread_attr_clone() function. Finally, update the build system such that libzpool.so is no longer linked into zfs(8), zpool(8), etc. All that is required is libzfs, to which the zcommon sources were added (which is the way it always should have been). Removing the libzpool dependency resulted in several build issues which needed to be resolved. * Moved zfeature support to module/zcommon/zfeature_common.c * Moved ratelimiting to module/zfs/zfs_ratelimit.c * Moved get_system_hostid() to lib/libspl/gethostid.c * Removed use of cmn_err() in zcommon source * Removed dprintf_setup() call from zpool_main.c and zfs_main.c * Removed highbit() and lowbit() * Removed unnecessary library dependencies from Makefiles * Removed fletcher-4 kstat in user space * Added sha2 support explicitly to libzfs * Added highbit64() and lowbit64() to zpool_util.c Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #6442	2017-08-09 15:31:08 -07:00
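A portable pthread_attr_clone(), as mentioned above, can be built purely from the standard POSIX attribute getters and setters. The sketch below is an assumption about what such a function might look like, not the libtpool source, and the name `sketch_attr_clone()` is hypothetical. It initializes `attr` with defaults and then, if `old_attr` is given, copies each attribute across. Further attributes, such as the guard size, could be copied the same way.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stddef.h>

static int
sketch_attr_clone(pthread_attr_t *attr, const pthread_attr_t *old_attr)
{
	struct sched_param param;
	size_t stacksize;
	int detachstate, scope, inherit, policy;
	int error;

	assert(attr != NULL);
	if ((error = pthread_attr_init(attr)) != 0)
		return (error);
	if (old_attr == NULL)
		return (0);	/* nothing to copy; keep the defaults */

	/* Copy each attribute; stop at the first failure. */
	if ((error = pthread_attr_getdetachstate(old_attr, &detachstate)) != 0 ||
	    (error = pthread_attr_setdetachstate(attr, detachstate)) != 0 ||
	    (error = pthread_attr_getstacksize(old_attr, &stacksize)) != 0 ||
	    (error = pthread_attr_setstacksize(attr, stacksize)) != 0 ||
	    (error = pthread_attr_getscope(old_attr, &scope)) != 0 ||
	    (error = pthread_attr_setscope(attr, scope)) != 0 ||
	    (error = pthread_attr_getinheritsched(old_attr, &inherit)) != 0 ||
	    (error = pthread_attr_setinheritsched(attr, inherit)) != 0 ||
	    (error = pthread_attr_getschedpolicy(old_attr, &policy)) != 0 ||
	    (error = pthread_attr_setschedpolicy(attr, policy)) != 0 ||
	    (error = pthread_attr_getschedparam(old_attr, &param)) != 0 ||
	    (error = pthread_attr_setschedparam(attr, &param)) != 0) {
		(void) pthread_attr_destroy(attr);
		return (error);
	}
	return (0);
}
```

The scheduling policy is copied before the scheduling parameters so that the priority is validated against the right policy.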


664 Commits