mirror_zfs

mirror of https://git.proxmox.com/git/mirror_zfs.git synced 2026-06-06 14:16:36 +03:00

Author	SHA1	Message	Date
Etienne Dechamps	e8fe6684a5	Document metaslab_aliquot. Signed-off-by: Etienne Dechamps <etienne@edechamps.fr> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2015-06-22 14:19:31 -07:00
Etienne Dechamps	bb3250d07e	Allocate disk space fairly in the presence of vdevs of unequal size. The metaslab allocator device selection algorithm contains a bias mechanism whose goal is to achieve roughly equal disk space usage across all top-level vdevs. It seems that the initial rationale for this code was to allow newly added (empty) vdevs to "come up to speed" faster in an attempt to make the pool quickly converge to a steady state where all vdevs are equally utilized. While the code seems to work reasonably well for this use case, there is another scenario in which this algorithm fails miserably: the case where top-level vdevs don't have the same sizes (capacities). ZFS allows this, and it is a good feature to have, so that users who simply want to build a pool with the disks they happen to have lying around can do so even if the disks have heteregenous sizes. Here's a script that simulates a pool with two vdevs, with one 4X larger than the other: dd if=/dev/zero of=/tmp/d1 bs=1 count=1 seek=134217728 dd if=/dev/zero of=/tmp/d2 bs=1 count=1 seek=536870912 zpool create testspace /tmp/d1 /tmp/d2 dd if=/dev/zero of=/testspace/foobar bs=1M count=256 zpool iostat -v testspace Before this commit, the script would output the following: capacity pool alloc free ---------- ----- ----- testspace 252M 375M /tmp/d1 104M 18.5M /tmp/d2 148M 356M ---------- ----- ----- This demonstrates that the current code handles this situation very poorly: d1 shows 85% usage despite the pool itself being only 40% full. d1 is quite saturated at this point, and is slowing down the entire pool due to saturation, fragmentation and the like. In contrast, here's the result with the code in this commit: capacity pool alloc free ---------- ----- ----- testspace 252M 375M /tmp/d1 56.7M 66.3M /tmp/d2 195M 309M ---------- ----- ------ This looks much better. d1 is 46% used, which is close to the overall pool utilization (40%). The code still doesn't result in perfectly balanced allocation, probably because of the way mg_bias is applied which does not guarantee perfect accuracy, but this is still much better than before. Signed-off-by: Etienne Dechamps <etienne@edechamps.fr> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #3389	2015-06-22 14:18:29 -07:00
Brian Behlendorf	218b4e0a76	Add zfs_sb_prune_aliases() function For kernels which do not implement a per-suberblock shrinker, those older than Linux 3.1, the shrink_dcache_parent() function was used to attempt to reclaim dentries. This was found not be entirely reliable and could lead to performance issues on older kernels running meta-data heavy workloads. To address this issue a zfs_sb_prune_aliases() function has been added to implement this functionality. It relies on traversing the list of znodes for a filesystem and adding them to a private list with a reference held. The private list can then be safely walked outside the z_znodes_lock to prune dentires and drop the last reference so the inode can be freed. This provides the same synchronous behavior as the per-filesystem shrinker and has the advantage of depending on only long standing interfaces. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tim Chase <tim@chase2k.com> Closes #3501	2015-06-22 10:22:49 -07:00
Brian Behlendorf	4c6a700910	Increase the number of iput taskq threads The number of threads in the iput taskq has been increased to speed up the number of iputs which can be handled. This has been observed to improve the meta data reclaim regardless of zfs_sb_prune() implementation in use. The taskq has also been renamed z_iput to for consistency with the rest of the I/O pipeline taskqs which are all named z_*. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tim Chase <tim@chase2k.com>	2015-06-22 10:22:10 -07:00
Matus Kral	57ae840077	Linux 4.1 compat: use read_iter() / write_iter() Linux 3.15 commit torvalds/linux@293bc98 introduced two new methods. The ->read_iter() and ->write_iter() methods were designed to replace the ->aio_read() and ->aio_write() interfaces. Both interfaces were preserved for several kernel releases in order to migrate all existing consumers to the new interfaces. But as of Linux 4.1 the legacy interface has been retired and the ZFS code must be updated to use the new interfaces. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #3352	2015-06-18 12:06:59 -07:00
Tim Chase	90947b2357	3.12 compat, NUMA-aware per-superblock shrinker Kernels >= 3.12 have a NUMA-aware superblock shrinker which is used in ZoL by zfs_sb_prune(). This patch calls the shrinker for each on-line NUMA node in order that memory be freed for each one. Signed-off-by: Tim Chase <tim@chase2k.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #3495	2015-06-17 10:43:13 -07:00
Brian Behlendorf	8e70975f90	Wait interruptibly in prefetch thread The Linux kernel watchdog will automatically dump a backtrace for any process while sleeps for over 120s in an uninterruptible state. The solution is for the prefetch thread to sleep in an interruptible state. The way the existing code was written this is safe because when woken it will always reevaluate its conditional. As a general rule it is preferable to sleep in an interruptible when possible. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #3450 Closes #3402	2015-06-16 16:18:11 -07:00
Brian Behlendorf	b64ccd6c52	Rename cv_wait_interruptible() to cv_wait_sig() This is the counterpart to zfsonlinux/spl@2345368 which replaces the cv_wait_interruptible() function with cv_wait_sig(). There is no functional change to patch merely brings the function names in to sync to maximize portability. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #3450 Issue #3402	2015-06-11 10:50:47 -07:00
Tim Chase	121b3cae74	Increase arc_c_min to allow safe operation of arc_adapt() ZoL had lowered the minimum ARC size to 4MiB to better accommodate tiny systems such as the raspberry pi, however, as of addition of large block support, the arc_adapt() function depends on arc_c being >= 32MiB (2 * SPA_MAXBLOCKSIZE). This patch raises the minimum ARC size to 32MiB and adds a VERIFY test to arc_adapt() for future-proofing. Signed-off-by: Tim Chase <tim@chase2k.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2015-06-11 10:27:25 -07:00
Brian Behlendorf	f604673836	Make arc_prune() asynchronous As described in the comment above arc_adapt_thread() it is critical that the arc_adapt_thread() function never sleep while holding a hash lock. This behavior was possible in the Linux implementation because the arc_prune() logic was implemented to be synchronous. Under illumos the analogous dnlc_reduce_cache() function is asynchronous. To address this the arc_do_user_prune() function is has been reworked in to two new functions as follows: * arc_prune_async() is an asynchronous implementation which dispatches the prune callback to be run by the system taskq. This makes it suitable to use in the context of the arc_adapt_thread(). * arc_prune() is a synchronous implementation which depends on the arc_prune_async() implementation but blocks until the outstanding callbacks complete. This is used in arc_kmem_reap_now() where it is safe, and expected, that memory will be freed. This patch additionally adds the zfs_arc_meta_strategy module option while allows the meta reclaim strategy to be configured. It defaults to a balanced strategy which has been proved to work well under Linux but the illumos meta-only strategy can be enabled. Signed-off-by: Tim Chase <tim@chase2k.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2015-06-11 10:27:25 -07:00
Brian Behlendorf	c5528b9ba6	Use taskq_wait_outstanding() function Replace taskq_wait() with taskq_wait_oustanding(). This way callers will only block until previously submitted tasks have been completed. This was the previous behavior of task_wait() prior to the introduction of taskq_wait_outstanding() so this isn't really a functionalty change for these callers. Signed-off-by: Tim Chase <tim@chase2k.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2015-06-11 10:27:25 -07:00
Prakash Surya	ca0bf58d65	Illumos 5497 - lock contention on arcs_mtx Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Richard Elling <richard.elling@richardelling.com> Approved by: Dan McDonald <danmcd@omniti.com> Porting notes and other significant code changes: The illumos 5368 patch (ARC should cache more metadata), which was never picked up by ZoL, is mostly reverted by this patch. Since ZoL relies on the kernel asynchronously calling the shrinker to actually reap memory, the shrinker wakes up arc_reclaim_waiters_cv every time it runs. The arc_adapt_thread() function no longer calls arc_do_user_evicts() since the newly-added arc_user_evicts_thread() calls it periodically. Notable conflicting ZoL commits which conflicted with this patch or whose effects are either duplicated or un-done by this patch: `302f753` - Integrate ARC more tightly with Linux `39e055c` - Adjust arc_p based on "bytes" in arc_shrink `f521ce1` - Allow "arc_p" to drop to zero or grow to "arc_c" `77765b5` - Remove "arc_meta_used" from arc_adjust calculation `94520ca` - Prune metadata from ghost lists in arc_adjust_meta Trace support for multilist_insert() and multilist_remove() has been added and produces the following output: fio-12498 [077] .... 112936.448324: zfs_multilist__insert: ml { offset 240 numsublists 80 sublistidx 63 } fio-12498 [077] .... 112936.448347: zfs_multilist__remove: ml { offset 240 numsublists 80 sublistidx 29 } The following arcstats have been removed: recycle_miss - Used by arcstat.py and arc_summary.py, both of which have been updated appropriately. l2_writes_hdr_miss The following arcstats have been added: evict_not_enough - Number of times arc_evict_state() was unable to evict enough buffers to reach its target amount. evict_l2_skip - Number of times arc_evict_hdr() skipped eviction because it was being written to the l2arc. l2_writes_lock_retry - Replaces l2_writes_hdr_miss. Number of times l2arc_write_done() failed to acquire hash_lock (and re-tries). arc_meta_min - Shows the value of the zfs_arc_meta_min module parameter (see below). The "index" column of the "dbuf" kstat has been removed since it doesn't have a direct analog in the new multilist scheme. Additional multilist- related stats could be added in the future but would likely require extensions to the mulilist API. The following module parameters have been added: zfs_arc_evict_batch_limit - Number of ARC headers to free per sub-list before moving on to the next sub-list. zfs_arc_meta_min - Enforce a floor on the amount of metadata in the ARC. zfs_arc_num_sublists_per_state - Number of multilist sub-lists per ARC state. zfs_arc_overflow_shift - Controls amount by which the ARC must exceed the target size to be considered "overflowing". Ported-by: Tim Chase <tim@chase2k.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov	2015-06-11 10:27:25 -07:00
Chris Williamson	b9541d6b7d	Illumos 5408 - managing ZFS cache devices requires lots of RAM 5408 managing ZFS cache devices requires lots of RAM Reviewed by: Christopher Siden <christopher.siden@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Don Brady <dev.fs.zfs@gmail.com> Reviewed by: Josef 'Jeff' Sipek <josef.sipek@nexenta.com> Approved by: Garrett D'Amore <garrett@damore.org> Porting notes: Due to the restructuring of the ARC-related structures, this patch conflicts with at least the following existing ZoL commits: `6e1d7276c9` Fix inaccurate arcstat_l2_hdr_size calculations The ARC_SPACE_HDRS constant no longer exists and has been somewhat equivalently replaced by HDR_L2ONLY_SIZE. `e0b0ca983d` Add visibility in to cached dbufs The new layering of l{1,2}arc_buf_hdr_t within the arc_buf_hdr struct requires additional structure member names to be used when referencing the inner items. Also, the presence of L1 or L2 inner member is indicated by flags using the new HDR_HAS_L{1,2}HDR macros. Ported by: Tim Chase <tim@chase2k.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2015-06-11 10:27:25 -07:00
George Wilson	2a4324141f	Illumos 5369 - arc flags should be an enum 5369 arc flags should be an enum 5370 consistent arc_buf_hdr_t naming scheme Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Alex Reece <alex.reece@delphix.com> Reviewed by: Sebastien Roy <sebastien.roy@delphix.com> Reviewed by: Richard Elling <richard.elling@richardelling.com> Approved by: Richard Lowe <richlowe@richlowe.net> Porting notes: ZoL has moved some ARC definitions into arc_impl.h. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Ported by: Tim Chase <tim@chase2k.com>	2015-06-11 10:27:25 -07:00
Tim Chase	ad4af89561	Partially revert "Add ddt, ddt_entry, and l2arc_hdr caches" This reverts only the l2arc_hdr part of commit `ecf3d9b8e6` in preparation for the illumos 5497 "lock contention on arcs_mtx" patch which does the same thing but uses the newer two-level ARC structure following the Illumos 5408 "managing ZFS cache devices requires lots of RAM" patch. Signed-off-by: Tim Chase <tim@chase2k.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2015-06-11 10:27:25 -07:00
Tim Chase	97639d0a52	Revert "Allow arc_evict_ghost() to only evict meta data" Illumos 5497 "lock contention on arcs_mtx" reworks eviction and obviates the need for this. Signed-off-by: Tim Chase <tim@chase2k.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2015-06-11 10:27:25 -07:00
Tim Chase	f6b3b1f5d6	Revert "fix l2arc compression buffers leak" This reverts commit `037763e44e` in preparation for the illumos 5497 "lock contention on arcs_mtx" patch which includes a fix for this very problem. ZoL had picked up a subset of the illumos 5497 patch to deal with the l2arc compression buffer leak. Signed-off-by: Tim Chase <tim@chase2k.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2015-06-11 10:27:24 -07:00
Tim Chase	7807028ccd	Revert "arc_evict, arc_evict_ghost: reduce stack usage using kmem_zalloc" This reverts commit `16fcdea363` in preparation for the illumos 5497 "lock contention on arcs_mtx" patch which eliminates "marker" within the ARC code. Signed-off-by: Tim Chase <tim@chase2k.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2015-06-11 10:27:24 -07:00
Brian Behlendorf	44de2f02d6	Remove unused variable in vdev_add_child() Commit `c3520e7` restructured vdev_add_child() in such a way that the spa variable was unused during non-debug builds. This is consistent with the upstream illumos code but because ZoL, unlike illumos, is built with all compiler warnings enabled this causes a legitimate warning. Revert this hunk of the patch to keep the build clean. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #3432	2015-06-11 10:22:38 -07:00
Matthew Ahrens	c3520e7f1f	Illumos 5818 - zfs {ref}compressratio is incorrect with 4k sector size 5818 zfs {ref}compressratio is incorrect with 4k sector size Reviewed by: Alex Reece <alex@delphix.com> Reviewed by: George Wilson <george@delphix.com> Reviewed by: Richard Elling <richard.elling@richardelling.com> Reviewed by: Steven Hartland <killing@multiplay.co.uk> Approved by: Albert Lee <trisk@omniti.com> References: https://www.illumos.org/issues/5818 https://github.com/illumos/illumos-gate/commit/81cd5c5 Ported-by: Don Brady <don.brady@intel.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #3432	2015-06-10 16:24:01 -07:00
Arne Jansen	9c43027b3f	Illumos 5269 - zpool import slow 5269 zpool import slow Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: George Wilson <george@delphix.com> Reviewed by: Dan McDonald <danmcd@omniti.com> Approved by: Dan McDonald <danmcd@omniti.com> References: https://www.illumos.org/issues/5269 https://github.com/illumos/illumos-gate/commit/12380e1e Ported-by: DHE <git@dehacked.net> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #3396	2015-06-09 13:48:02 -07:00
Ned Bass	5f8e1e8505	dmu_objset_userquota_get_ids uses dn_bonus unsafely The function dmu_objset_userquota_get_ids() checks and uses dn->dn_bonus outside of dn_struct_rwlock. If the dnode is being freed then the bonus dbuf may be in the process of getting evicted. In this case there is a race that may cause dmu_objset_userquota_get_ids() to access the dbuf after it has been destroyed. To prevent this, ensure that when we are using the bonus dbuf we are either holding a reference on it or have taken dn_struct_rwlock. Signed-off-by: Ned Bass <bass6@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #3443	2015-06-05 12:41:17 -07:00
Ned Bass	d617648c7f	dbuf_try_add_ref minor bug fixes - Don't check db->bb_blkid, but use the blkid argument instead. Checking db->db_blkid may be unsafe since we doesn't yet have a hold on the dbuf so its validity is unknown. - Call mutex_exit() on found_db, not db, since it's not certain that they point to the same dbuf, and the mutex was taken on found_db. Signed-off-by: Ned Bass <bass6@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #3443	2015-06-05 12:40:38 -07:00
Matthew Ahrens	e5fd1dd682	Illumos 5243 - zdb -b could be much faster 5243 zdb -b could be much faster Reviewed by: Christopher Siden <christopher.siden@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Richard Elling <richard.elling@gmail.com> Approved by: Dan McDonald <danmcd@omniti.com> References: https://www.illumos.org/issues/5243 https://github.com/illumos/illumos-gate/commit/f7950bf Ported-by: Don Brady <don.brady@intel.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #3414	2015-05-15 11:14:54 -07:00
Jan Sanislo	79065ed5a4	Return -ESTALE to force lookup for missing NFS file handles There seems to be a annoying problem using NFSv4 to access ZFS file systems under certain circumstances. It's easily reproduced: nfs_client1: mount server:/export /mnt nfs_client1: cd /mnt nfs_client1: echo foo >junk nfs_client1: cat junk foo Now on a different NFSv4 client: nfs_client2: mount server:/export /mnt nfs_client2: cd /mnt nfs_client2: vi junk # Make some changes to /mnt/junk and save # This change the inode associated with /mnt/junk Now back to the original client: nfs_client1: cat junk cat: junk: No such file or directory Admittedly NFSv4 is not advertised as a cluster file system that maintains a completely coherent view of data across multiple nodes. But it does have some mechanisms built in that try to deal with situations like the above. Namely, it employs specialized file handle lookup routines that return ESTALE when a file handle contains a non-existant inode value. The ESTALE return triggers a return full file path lookup from the client to determine if the file has actually gone away or if the cached file handle is no longer valid. ZFS behavior can be brought into line with other file systems (e.g., ext4) by applying the following patch: Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #3224	2015-05-14 11:16:52 -07:00
Antonio Russo	7290cd3c4e	Relax restriction on zfs_ioc_next_obj() iteration Per the documentation for dnode_next_offset in dnode.c, the "txg" parameter specifies a lower bound on which transaction the dnode can be found in. We are interested in all dnodes that are removed between the first and last transaction in the snapshot. It doesn't need to be created in that snapshot to correspond to a removed file. In fact, the behavior of zfs diff in the test case exactly matches this: the transaction that created the data that was deleted in snapshot "2" was produced before, in snapshot "1", definitely predating the first transaction in snapshot "2". Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tim Chase <Tim Chase <tim@onlight.com> Closes #2081	2015-05-14 11:16:08 -07:00
Brian Behlendorf	fd0fd6467b	Remove unused 'dsl_pool_t dp' variable When ASSERTs are compiled out by using the --disable-debug configure option. Then the local variable 'dsl_pool_t dp' will be unused and generate a compiler warning. Since this variable is only used once in the ASSERT replace it with 'ds->ds_dir->dd_pool'. This has the additional advantage of potentially saving a few bytes on the stack depending on how gcc decides to compile the function. This issue was not noticed immediately because the automated builders use --enable-debug to make the testing as rigorous as possible. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Chris Dunlap <cdunlap@llnl.gov> Closes #3410	2015-05-14 11:09:47 -07:00
Max Grossman	5dc8b7365f	Illumos 5765 - add support for estimating send stream size with lzc_send_space when source is a bookmark 5765 add support for estimating send stream size with lzc_send_space when source is a bookmark Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Christopher Siden <christopher.siden@delphix.com> Reviewed by: Steven Hartland <killing@multiplay.co.uk> Reviewed by: Bayard Bell <buffer.g.overflow@gmail.com> Approved by: Albert Lee <trisk@nexenta.com> References: https://www.illumos.org/issues/5765 https://github.com/illumos/illumos-gate/commit/643da460 Porting notes: * Unused variable 'recordsize' in dmu_send_estimate() dropped Ported-by: DHE <git@dehacked.net> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #3397	2015-05-13 09:03:59 -07:00
Justin T. Gibbs	19b3b1d2a2	Illumos 5393 - spurious failures from dsl_dataset_hold_obj() 5393 spurious failures from dsl_dataset_hold_obj() Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Will Andrews <willa@spectralogic.com> Reviewed by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: Steven Hartland <killing@multiplay.co.uk> Approved by: Dan McDonald <danmcd@omniti.com> References: https://www.illumos.org/issues/5393 https://github.com/illumos/illumos-gate/commit/e1f3c20 Ported-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #3403	2015-05-13 08:56:04 -07:00
Justin T. Gibbs	63b33e878a	Illumos 5562 - ZFS sa_handle's violate kmem invariants, debug kernels panic on boot 5562 ZFS sa_handle's violate kmem invariants, debug kernels panic on boot Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Robert Mustacchi <rm@fingolfin.org> Reviewed by: George Wilson <george@delphix.com> Reviewed by: Rich Lowe <richlowe@richlowe.net> Approved by: Dan McDonald <danmcd@omniti.com> References: https://www.illumos.org/issues/5562 https://github.com/illumos/illumos-gate/commit/0fda3cc5 Ported-by: DHE <git@dehacked.net> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #3388	2015-05-11 15:10:57 -07:00
Matthew Ahrens	252e1a54ab	Illumos 5810 - zdb should print details of bpobj 5810 zdb should print details of bpobj Reviewed by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: Alex Reece <alex@delphix.com> Reviewed by: George Wilson <george@delphix.com> Reviewed by: Will Andrews <will@freebsd.org> Reviewed by: Simon Klinkert <simon.klinkert@gmail.com> Approved by: Gordon Ross <gwr@nexenta.com> References: https://www.illumos.org/issues/5810 https://github.com/illumos/illumos-gate/commit/732885fc Ported-by: DHE <git@dehacked.net> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #3387	2015-05-11 15:10:24 -07:00
Matthew Ahrens	10400bfeac	Illumos 5351, 5352 - scrub pauses 5351 scrub goes for an extra second each txg 5352 scrub should pause when there is some dirty data Author: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Alex Reece <alex.reece@delphix.com> Reviewed by: Christopher Siden <christopher.siden@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Richard Elling <richard.elling@richardelling.com> Approved by: Dan McDonald <danmcd@omniti.com> References: https://www.illumos.org/issues/5351 https://github.com/illumos/illumos-gate/commit/6f6a76a Ported-by: Chris Dunlop <chris@onthe.net.au> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #3383	2015-05-11 15:09:56 -07:00
Matthew Ahrens	08dc1b2ddd	Illumos 5350 - clean up code in dnode_sync() Author: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Alex Reece <alex.reece@delphix.com> Reviewed by: Christopher Siden <christopher.siden@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Richard Elling <richard.elling@richardelling.com> Approved by: Dan McDonald <danmcd@omniti.com> References: https://www.illumos.org/issues/5350 https://github.com/illumos/illumos-gate/commit/e651831 Ported-by: Chris Dunlop <chris@onthe.net.au> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #3382	2015-05-11 15:09:51 -07:00
Alex Reece	7224c67fea	Illumos 5422 - preserve AVL invariants in dn_dbufs Author: Alex Reece <alex@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Paul Dagnelie <paul.dagnelie@delphix.com> Reviewed by: Josef 'Jeff' Sipek <josef.sipek@nexenta.com> Reviewed by: Albert Lee <trisk@nexenta.com> Approved by: Dan McDonald <danmcd@omniti.com> References: https://www.illumos.org/issues/5422 https://github.com/illumos/illumos-gate/commit/a846f19 Ported-by: Chris Dunlop <chris@onthe.net.au> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #3381	2015-05-11 15:09:29 -07:00
David Lamparter	214806c7e9	Safely handle security / ACL failures The security and ACL operations should all be performed atomically. To accomplish this there would need to significant invasive changes made to the common code base. For the moment it's desirable for compatibility reasons to avoid this. Therefore the code has been updated to attempt to unwind the operation in case of failure rather than panic. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #2445	2015-05-11 14:35:14 -07:00
Matthew Ahrens	f1512ee61e	Illumos 5027 - zfs large block support 5027 zfs large block support Reviewed by: Alek Pinchuk <pinchuk.alek@gmail.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Josef 'Jeff' Sipek <josef.sipek@nexenta.com> Reviewed by: Richard Elling <richard.elling@richardelling.com> Reviewed by: Saso Kiselkov <skiselkov.ml@gmail.com> Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov> Approved by: Dan McDonald <danmcd@omniti.com> References: https://www.illumos.org/issues/5027 https://github.com/illumos/illumos-gate/commit/b515258 Porting Notes: * Included in this patch is a tiny ISP2() cleanup in zio_init() from Illumos 5255. * Unlike the upstream Illumos commit this patch does not impose an arbitrary 128K block size limit on volumes. Volumes, like filesystems, are limited by the zfs_max_recordsize=1M module option. * By default the maximum record size is limited to 1M by the module option zfs_max_recordsize. This value may be safely increased up to 16M which is the largest block size supported by the on-disk format. At the moment, 1M blocks clearly offer a significant performance improvement but the benefits of going beyond this for the majority of workloads are less clear. * The illumos version of this patch increased DMU_MAX_ACCESS to 32M. This was determined not to be large enough when using 16M blocks because the zfs_make_xattrdir() function will fail (EFBIG) when assigning a TX. This was immediately observed under Linux because all newly created files must have a security xattr created and that was failing. Therefore, we've set DMU_MAX_ACCESS to 64M. * On 32-bit platforms a hard limit of 1M is set for blocks due to the limited virtual address space. We should be able to relax this one the ABD patches are merged. Ported-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #354	2015-05-11 12:23:16 -07:00
Brian Behlendorf	f9cab37291	Remove metaslab_min_alloc_size module option The metaslab_min_alloc_size option is no longer used in the code. This functionality was removed by commit `f3a7f66` and the module options should have been dropped at that time. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2015-05-11 09:51:57 -07:00
Chris Dunlop	16fcdea363	arc_evict, arc_evict_ghost: reduce stack usage using kmem_zalloc With debugging enabled and depending on your kernel config, the size of arc_buf_hdr_t can blow out the stack of arc_evict() and arc_evict_ghost() to greater than 1024 bytes. Let's avoid this. Signed-off-by: Chris Dunlop <chris@onthe.net.au> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #3377	2015-05-08 14:14:35 -07:00
Matthew Ahrens	63e3a8616b	Illumos 5349 - verify that block pointer is plausible before reading 5349 verify that block pointer is plausible before reading Reviewed by: Alex Reece <alex.reece@delphix.com> Reviewed by: Christopher Siden <christopher.siden@delphix.com> Reviewed by: Dan McDonald <danmcd@omniti.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Richard Lowe <richlowe@richlowe.net> Reviewed by: Xin Li <delphij@FreeBSD.org> Reviewed by: Josef 'Jeff' Sipek <josef.sipek@nexenta.com> Approved by: Gordon Ross <gwr@nexenta.com> References: https://www.illumos.org/issues/5349 https://github.com/illumos/illumos-gate/commit/f63ab3d5 Porting notes: * Several variable declarations were moved due to C style needs Ported-by: DHE <git@dehacked.net> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #3373	2015-05-08 14:09:15 -07:00
Chris Dunlop	f0da4d1508	Wait for all znodes to be released before tearing down the superblock By the time we're tearing down our superblock the VFS has started releasing all our inodes/znodes. Some of this work may have been handed off to our iput taskq so we need to wait for that work to complete. However the iput from the taskq can itself result in additional work being added to the taskq: dsl_pool_iput_taskq iput iput_final evict destroy_inode zpl_inode_destroy zfs_inode_destroy zfs_iput_async(ZTOI(zp->z_xattr_parent)) taskq_dispatch(dsl_pool_iput_taskq..., iput, ...) Let's wait until all our znodes have been released. Signed-off-by: Chris Dunlop <chris@onthe.net.au> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #3281	2015-05-06 14:13:19 -07:00
Matthew Ahrens	7a3066ffdd	Illumos 5348 - zio_checksum_error() only fills in info if ECKSUM 5348 zio_checksum_error() only fills in info if ECKSUM Reviewed by: Alex Reece <alex.reece@delphix.com> Reviewed by: Christopher Siden <christopher.siden@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Steven Hartland <killing@multiplay.co.uk> Approved by: Dan McDonald <danmcd@omniti.com> References: https://www.illumos.org/issues/5348 https://github.com/illumos/illumos-gate/commit/373dc1cf Ported-by: DHE <git@dehacked.net> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #3372	2015-05-05 11:05:37 -07:00
Matthew Ahrens	f3c517d814	Illumos 5820 - verify failed in zio_done(): BP_EQUAL(bp, io_bp_orig) 5820 verify failed in zio_done(): BP_EQUAL(bp, io_bp_orig) Reviewed by: Alex Reece <alex@delphix.com> Reviewed by: George Wilson <george@delphix.com> Reviewed by: Steven Hartland <killing@multiplay.co.uk> Approved by: Garrett D'Amore <garrett@damore.org> References: https://www.illumos.org/issues/5820 https://github.com/illumos/illumos-gate/commit/34e8acef00 Ported-by: DHE <git@dehacked.net> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #3364	2015-05-04 10:49:49 -07:00
Matthew Ahrens	36c6ffb6b6	Illumos 5808 - spa_check_logs is not necessary on readonly pools 5808 spa_check_logs is not necessary on readonly pools Reviewed by: George Wilson <george@delphix.com> Reviewed by: Paul Dagnelie <paul.dagnelie@delphix.com> Reviewed by: Simon Klinkert <simon.klinkert@gmail.com> Reviewed by: Will Andrews <will@freebsd.org> Approved by: Gordon Ross <gwr@nexenta.com> References: https://www.illumos.org/issues/5808 https://github.com/illumos/illumos-gate/commit/23367a2f Ported-by: DHE <git@dehacked.net> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #3369	2015-05-04 10:45:42 -07:00
Will Andrews	50f9ea0149	Illumos 5814 - bpobj_iterate_impl(): Close a refcount leak iterating on a sublist. 5814 bpobj_iterate_impl(): Close a refcount leak iterating on a sublist. Reviewed by: Prakash Surya <prakash.surya@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Paul Dagnelie <paul.dagnelie@delphix.com> Reviewed by: Simon Klinkert <simon.klinkert@gmail.com> Approved by: Gordon Ross <gwr@nexenta.com> References: https://www.illumos.org/issues/5814 https://github.com/illumos/illumos-gate/commit/b67dde11 Ported-by: DHE <git@dehacked.net> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #3368	2015-05-04 10:28:18 -07:00
Christopher Siden	0c60cc326b	Illumos 4951 - ZFS administrative commands (fix) 4951 ZFS administrative commands should use reserved space, not fail with ENOSPC Approved by: Christopher Siden <christopher.siden@delphix.com> References: https://www.illumos.org/issues/4951 https://github.com/illumos/illumos-gate/commit/c39f2c8 Ported by: Brian Behlendorf <behlendorf1@llnl.gov>	2015-05-04 09:41:10 -07:00
Matthew Ahrens	3d45fdd6c0	Illumos 4951 - ZFS administrative commands should use reserved space 4951 ZFS administrative commands should use reserved space, not with ENOSPC Reviewed by: John Kennedy <john.kennedy@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Christopher Siden <christopher.siden@delphix.com> Reviewed by: Dan McDonald <danmcd@omniti.com> Approved by: Garrett D'Amore <garrett@damore.org> References: https://www.illumos.org/issues/4373 https://github.com/illumos/illumos-gate/commit/7d46dc6 Ported by: Brian Behlendorf <behlendorf1@llnl.gov>	2015-05-04 09:41:10 -07:00
Jerry Jelinek	a0c9a17aef	Illumos 4901 - zfs filesystem/snapshot limit leaks 4901 zfs filesystem/snapshot limit leaks Reviewed by: Matthew Ahrens <mahrens@delphix.com> Approved by: Dan McDonald <danmcd@omniti.com> References: https://www.illumos.org/issues/4901 https://github.com/illumos/illumos-gate/commit/adf3407 Ported by: Brian Behlendorf <behlendorf1@llnl.gov>	2015-05-04 09:41:09 -07:00
Matthew Ahrens	83017311e4	Illumos 3654,3656 3654 zdb should print number of ganged blocks 3656 remove unused function zap_cursor_move_to_key() Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Christopher Siden <christopher.siden@delphix.com> Reviewed by: Dan McDonald <danmcd@nexenta.com> Approved by: Garrett D'Amore <garrett@damore.org> References: https://www.illumos.org/issues/3654 https://www.illumos.org/issues/3656 https://github.com/illumos/illumos-gate/commit/d5ee8a1 Porting Notes: 3655 and 3657 were part of this commit but those hunks were dropped since they apply to mdb. Ported by: Brian Behlendorf <behlendorf1@llnl.gov>	2015-05-04 09:41:09 -07:00
tuxoko	6102d0376e	Add cond_resched to zfs_zget to prevent infinite loop It's been reported that threads would loop infinitely inside zfs_zget. The speculated cause for this is that if an inode is marked for evict, zfs_zget would see that and loop. However, if the looping thread doesn't yield, the inode may not have a chance to finish evict, thus causing a infinite loop. This patch solve this issue by add cond_resched to zfs_zget, making the looping thread to yield when needed. Tested-by: jlavoy <jalavoy@gmail.com> Signed-off-by: Chunwei Chen <tuxoko@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #3349	2015-05-04 09:12:41 -07:00
Jason Zaman	c9520ecc0f	dmu: fix integer overflows The params to the functions are uint64_t, but the offsets to memcpy / bcopy are calculated using 32bit ints. This patch changes them to also be uint64_t so there isnt an overflow. PaX's Size Overflow caught this when formatting a zvol. Gentoo bug: #546490 PAX: offset: 1ffffb000 db->db_offset: 1ffffa000 db->db_size: 2000 size: 5000 PAX: size overflow detected in function dmu_read /var/tmp/portage/sys-fs/zfs-kmod-0.6.3-r1/work/zfs-zfs-0.6.3/module/zfs/../../module/zfs/dmu.c:781 cicus.366_146 max, count: 15 CPU: 1 PID: 2236 Comm: zvol/10 Tainted: P O 3.17.7-hardened-r1 #1 Call Trace: [<ffffffffa0382ee8>] ? dsl_dataset_get_holds+0x9d58/0x343ce [zfs] [<ffffffff81a59c88>] dump_stack+0x4e/0x7a [<ffffffffa0393c2a>] ? dsl_dataset_get_holds+0x1aa9a/0x343ce [zfs] [<ffffffff81206696>] report_size_overflow+0x36/0x40 [<ffffffffa02dba2b>] dmu_read+0x52b/0x920 [zfs] [<ffffffffa0373ad1>] zrl_is_locked+0x7d1/0x1ce0 [zfs] [<ffffffffa0364cd2>] zil_clean+0x9d2/0xc00 [zfs] [<ffffffffa0364f21>] zil_commit+0x21/0x30 [zfs] [<ffffffffa0373fe1>] zrl_is_locked+0xce1/0x1ce0 [zfs] [<ffffffff81a5e2c7>] ? __schedule+0x547/0xbc0 [<ffffffffa01582e6>] taskq_cancel_id+0x2a6/0x5b0 [spl] [<ffffffff81103eb0>] ? wake_up_state+0x20/0x20 [<ffffffffa0158150>] ? taskq_cancel_id+0x110/0x5b0 [spl] [<ffffffff810f7ff4>] kthread+0xc4/0xe0 [<ffffffff810f7f30>] ? kthread_create_on_node+0x170/0x170 [<ffffffff81a62fa4>] ret_from_fork+0x74/0xa0 [<ffffffff810f7f30>] ? kthread_create_on_node+0x170/0x170 Signed-off-by: Jason Zaman <jason@perfinion.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #3333	2015-05-04 09:12:00 -07:00

1 2 3 4 5 ...

902 Commits