mirror of
https://git.proxmox.com/git/mirror_zfs.git
synced 2026-05-25 19:57:43 +03:00
7590972f76e5b6f9880417c593464e259d7c4698
69 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
50c957f702 |
Implement large_dnode pool feature
Justification ------------- This feature adds support for variable length dnodes. Our motivation is to eliminate the overhead associated with using spill blocks. Spill blocks are used to store system attribute data (i.e. file metadata) that does not fit in the dnode's bonus buffer. By allowing a larger bonus buffer area the use of a spill block can be avoided. Spill blocks potentially incur an additional read I/O for every dnode in a dnode block. As a worst case example, reading 32 dnodes from a 16k dnode block and all of the spill blocks could issue 33 separate reads. Now suppose those dnodes have size 1024 and therefore don't need spill blocks. Then the worst case number of blocks read is reduced to from 33 to two--one per dnode block. In practice spill blocks may tend to be co-located on disk with the dnode blocks so the reduction in I/O would not be this drastic. In a badly fragmented pool, however, the improvement could be significant. ZFS-on-Linux systems that make heavy use of extended attributes would benefit from this feature. In particular, ZFS-on-Linux supports the xattr=sa dataset property which allows file extended attribute data to be stored in the dnode bonus buffer as an alternative to the traditional directory-based format. Workloads such as SELinux and the Lustre distributed filesystem often store enough xattr data to force spill bocks when xattr=sa is in effect. Large dnodes may therefore provide a performance benefit to such systems. Other use cases that may benefit from this feature include files with large ACLs and symbolic links with long target names. Furthermore, this feature may be desirable on other platforms in case future applications or features are developed that could make use of a larger bonus buffer area. Implementation -------------- The size of a dnode may be a multiple of 512 bytes up to the size of a dnode block (currently 16384 bytes). A dn_extra_slots field was added to the current on-disk dnode_phys_t structure to describe the size of the physical dnode on disk. The 8 bits for this field were taken from the zero filled dn_pad2 field. The field represents how many "extra" dnode_phys_t slots a dnode consumes in its dnode block. This convention results in a value of 0 for 512 byte dnodes which preserves on-disk format compatibility with older software. Similarly, the in-memory dnode_t structure has a new dn_num_slots field to represent the total number of dnode_phys_t slots consumed on disk. Thus dn->dn_num_slots is 1 greater than the corresponding dnp->dn_extra_slots. This difference in convention was adopted because, unlike on-disk structures, backward compatibility is not a concern for in-memory objects, so we used a more natural way to represent size for a dnode_t. The default size for newly created dnodes is determined by the value of a new "dnodesize" dataset property. By default the property is set to "legacy" which is compatible with older software. Setting the property to "auto" will allow the filesystem to choose the most suitable dnode size. Currently this just sets the default dnode size to 1k, but future code improvements could dynamically choose a size based on observed workload patterns. Dnodes of varying sizes can coexist within the same dataset and even within the same dnode block. For example, to enable automatically-sized dnodes, run # zfs set dnodesize=auto tank/fish The user can also specify literal values for the dnodesize property. These are currently limited to powers of two from 1k to 16k. The power-of-2 limitation is only for simplicity of the user interface. Internally the implementation can handle any multiple of 512 up to 16k, and consumers of the DMU API can specify any legal dnode value. The size of a new dnode is determined at object allocation time and stored as a new field in the znode in-memory structure. New DMU interfaces are added to allow the consumer to specify the dnode size that a newly allocated object should use. Existing interfaces are unchanged to avoid having to update every call site and to preserve compatibility with external consumers such as Lustre. The new interfaces names are given below. The versions of these functions that don't take a dnodesize parameter now just call the _dnsize() versions with a dnodesize of 0, which means use the legacy dnode size. New DMU interfaces: dmu_object_alloc_dnsize() dmu_object_claim_dnsize() dmu_object_reclaim_dnsize() New ZAP interfaces: zap_create_dnsize() zap_create_norm_dnsize() zap_create_flags_dnsize() zap_create_claim_norm_dnsize() zap_create_link_dnsize() The constant DN_MAX_BONUSLEN is renamed to DN_OLD_MAX_BONUSLEN. The spa_maxdnodesize() function should be used to determine the maximum bonus length for a pool. These are a few noteworthy changes to key functions: * The prototype for dnode_hold_impl() now takes a "slots" parameter. When the DNODE_MUST_BE_FREE flag is set, this parameter is used to ensure the hole at the specified object offset is large enough to hold the dnode being created. The slots parameter is also used to ensure a dnode does not span multiple dnode blocks. In both of these cases, if a failure occurs, ENOSPC is returned. Keep in mind, these failure cases are only possible when using DNODE_MUST_BE_FREE. If the DNODE_MUST_BE_ALLOCATED flag is set, "slots" must be 0. dnode_hold_impl() will check if the requested dnode is already consumed as an extra dnode slot by an large dnode, in which case it returns ENOENT. * The function dmu_object_alloc() advances to the next dnode block if dnode_hold_impl() returns an error for a requested object. This is because the beginning of the next dnode block is the only location it can safely assume to either be a hole or a valid starting point for a dnode. * dnode_next_offset_level() and other functions that iterate through dnode blocks may no longer use a simple array indexing scheme. These now use the current dnode's dn_num_slots field to advance to the next dnode in the block. This is to ensure we properly skip the current dnode's bonus area and don't interpret it as a valid dnode. zdb --- The zdb command was updated to display a dnode's size under the "dnsize" column when the object is dumped. For ZIL create log records, zdb will now display the slot count for the object. ztest ----- Ztest chooses a random dnodesize for every newly created object. The random distribution is more heavily weighted toward small dnodes to better simulate real-world datasets. Unused bonus buffer space is filled with non-zero values computed from the object number, dataset id, offset, and generation number. This helps ensure that the dnode traversal code properly skips the interior regions of large dnodes, and that these interior regions are not overwritten by data belonging to other dnodes. A new test visits each object in a dataset. It verifies that the actual dnode size matches what was stored in the ztest block tag when it was created. It also verifies that the unused bonus buffer space is filled with the expected data patterns. ZFS Test Suite -------------- Added six new large dnode-specific tests, and integrated the dnodesize property into existing tests for zfs allow and send/recv. Send/Receive ------------ ZFS send streams for datasets containing large dnodes cannot be received on pools that don't support the large_dnode feature. A send stream with large dnodes sets a DMU_BACKUP_FEATURE_LARGE_DNODE flag which will be unrecognized by an incompatible receiving pool so that the zfs receive will fail gracefully. While not implemented here, it may be possible to generate a backward-compatible send stream from a dataset containing large dnodes. The implementation may be tricky, however, because the send object record for a large dnode would need to be resized to a 512 byte dnode, possibly kicking in a spill block in the process. This means we would need to construct a new SA layout and possibly register it in the SA layout object. The SA layout is normally just sent as an ordinary object record. But if we are constructing new layouts while generating the send stream we'd have to build the SA layout object dynamically and send it at the end of the stream. For sending and receiving between pools that do support large dnodes, the drr_object send record type is extended with a new field to store the dnode slot count. This field was repurposed from unused padding in the structure. ZIL Replay ---------- The dnode slot count is stored in the uppermost 8 bits of the lr_foid field. The bits were unused as the object id is currently capped at 48 bits. Resizing Dnodes --------------- It should be possible to resize a dnode when it is dirtied if the current dnodesize dataset property differs from the dnode's size, but this functionality is not currently implemented. Clearly a dnode can only grow if there are sufficient contiguous unused slots in the dnode block, but it should always be possible to shrink a dnode. Growing dnodes may be useful to reduce fragmentation in a pool with many spill blocks in use. Shrinking dnodes may be useful to allow sending a dataset to a pool that doesn't support the large_dnode feature. Feature Reference Counting -------------------------- The reference count for the large_dnode pool feature tracks the number of datasets that have ever contained a dnode of size larger than 512 bytes. The first time a large dnode is created in a dataset the dataset is converted to an extensible dataset. This is a one-way operation and the only way to decrement the feature count is to destroy the dataset, even if the dataset no longer contains any large dnodes. The complexity of reference counting on a per-dnode basis was too high, so we chose to track it on a per-dataset basis similarly to the large_block feature. Signed-off-by: Ned Bass <bass6@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #3542 |
||
|
|
26ef0cc7db |
OpenZFS 6531 - Provide mechanism to artificially limit disk performance
Reviewed by: Paul Dagnelie <pcd@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Approved by: Dan McDonald <danmcd@omniti.com> Ported by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> OpenZFS-issue: https://www.illumos.org/issues/6531 OpenZFS-commit: https://github.com/openzfs/openzfs/commit/97e8130 Porting notes: - Added new IO delay tracepoints, and moved common ZIO tracepoint macros to a new trace_common.h file. - Used zio_delay_taskq() in place of OpenZFS's timeout_generic() function. - Updated zinject man page - Updated zpool_scrub test files |
||
|
|
37f8a8835a |
Illumos 5746 - more checksumming in zfs send
5746 more checksumming in zfs send Reviewed by: Christopher Siden <christopher.siden@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Bayard Bell <buffer.g.overflow@gmail.com> Approved by: Albert Lee <trisk@omniti.com> References: https://www.illumos.org/issues/5746 https://github.com/illumos/illumos-gate/commit/98110f0 https://github.com/zfsonlinux/zfs/issues/905 Porting notes: - Minor conflicts due to: - https://github.com/zfsonlinux/zfs/commit/2024041 - https://github.com/zfsonlinux/zfs/commit/044baf0 - https://github.com/zfsonlinux/zfs/commit/88904bb - Fix ISO C90 warnings (-Werror=declaration-after-statement) - arc_buf_t *abuf; - dmu_buf_t *bonus; - zio_cksum_t cksum_orig; - zio_cksum_t *cksump; - Fix format '%llx' format specifier warning - Align message in zstreamdump safe_malloc() with upstream Ported-by: kernelOfTruth kerneloftruth@gmail.com Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #3611 |
||
|
|
72540ea314 |
zfsdev_getminor() should check for invalid file handles
Unit testing at ClusterHQ found that passing an invalid file handle to zfs_ioc_hold results in a NULL pointer dereference on a system without assertions: IP: [<ffffffffa0218aa0>] zfsdev_getminor+0x10/0x20 [zfs] Call Trace: [<ffffffffa021b4b0>] zfs_onexit_fd_hold+0x20/0x40 [zfs] [<ffffffffa0214043>] zfs_ioc_hold+0x93/0xd0 [zfs] [<ffffffffa0215890>] zfsdev_ioctl+0x200/0x500 [zfs] An assertion would have caught this had they been enabled, but this is something that the kernel module should handle without failing. We resolve this by searching the linked list to ensure that the file handle's private_data points to a valid zfsdev_state_t. Signed-off-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Andriy Gapon <avg@FreeBSD.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #3506 |
||
|
|
f1512ee61e |
Illumos 5027 - zfs large block support
5027 zfs large block support Reviewed by: Alek Pinchuk <pinchuk.alek@gmail.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Josef 'Jeff' Sipek <josef.sipek@nexenta.com> Reviewed by: Richard Elling <richard.elling@richardelling.com> Reviewed by: Saso Kiselkov <skiselkov.ml@gmail.com> Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov> Approved by: Dan McDonald <danmcd@omniti.com> References: https://www.illumos.org/issues/5027 https://github.com/illumos/illumos-gate/commit/b515258 Porting Notes: * Included in this patch is a tiny ISP2() cleanup in zio_init() from Illumos 5255. * Unlike the upstream Illumos commit this patch does not impose an arbitrary 128K block size limit on volumes. Volumes, like filesystems, are limited by the zfs_max_recordsize=1M module option. * By default the maximum record size is limited to 1M by the module option zfs_max_recordsize. This value may be safely increased up to 16M which is the largest block size supported by the on-disk format. At the moment, 1M blocks clearly offer a significant performance improvement but the benefits of going beyond this for the majority of workloads are less clear. * The illumos version of this patch increased DMU_MAX_ACCESS to 32M. This was determined not to be large enough when using 16M blocks because the zfs_make_xattrdir() function will fail (EFBIG) when assigning a TX. This was immediately observed under Linux because all newly created files must have a security xattr created and that was failing. Therefore, we've set DMU_MAX_ACCESS to 64M. * On 32-bit platforms a hard limit of 1M is set for blocks due to the limited virtual address space. We should be able to relax this one the ABD patches are merged. Ported-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #354 |
||
|
|
9b67f60560 |
Illumos 4757, 4913
4757 ZFS embedded-data block pointers ("zero block compression")
4913 zfs release should not be subject to space checks
Reviewed by: Adam Leventhal <ahl@delphix.com>
Reviewed by: Max Grossman <max.grossman@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: Dan McDonald <danmcd@omniti.com>
Approved by: Dan McDonald <danmcd@omniti.com>
References:
https://www.illumos.org/issues/4757
https://www.illumos.org/issues/4913
https://github.com/illumos/illumos-gate/commit/5d7b4d4
Porting notes:
For compatibility with the fastpath code the zio_done() function
needed to be updated. Because embedded-data block pointers do
not require DVAs to be allocated the associated vdevs will not
be marked and therefore should not be unmarked.
Ported by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #2544
|
||
|
|
3937ab20f3 |
Allow for lock-free reading zfsdev_state_list.
Restructure the zfsdev_state_list to allow for lock-free reading by converting to a simple singly-linked list from which items are never deleted and over which only forward iterations are performed. It depends on, among other things, the atomicity of accessing the zs_minor integer and zs_next pointer. This fixes a lock inversion in which the zfsdev_state_lock is used by both the sync task (txg_sync) and indirectly by any user program which uses /dev/zfs; the zfsdev_release method uses the same lock and then blocks on the sync task. The most typical failure scenerio occurs when the sync task is cleaning up a user hold while various concurrent "zfs" commands are in progress. Neither Illumos nor Solaris are affected by this issue because they use DDI interface which provides lock-free reading of device state via the ddi_get_soft_state() function. Signed-off-by: Tim Chase <tim@chase2k.com> Signed-off-by: Chunwei Chen <tuxoko@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #2301 |
||
|
|
8c7aa0cfc4 |
Replace zpool_events_next() "block" parm w/ "flags"
zpool_events_next() can be called in blocking mode by specifying a non-zero value for the "block" parameter. However, the design of the ZFS Event Daemon (zed) requires additional functionality from zpool_events_next(). Instead of adding additional arguments to the function, it makes more sense to use flags that can be bitwise-or'd together. This commit replaces the zpool_events_next() int "block" parameter with an unsigned bitwise "flags" parameter. It also defines ZEVENT_NONE to specify the default behavior. Since non-blocking mode can be specified with the existing ZEVENT_NONBLOCK flag, the default behavior becomes blocking mode. This, in effect, inverts the previous use of the "block" parameter. Existing callers of zpool_events_next() have been modified to check for the ZEVENT_NONBLOCK flag. Signed-off-by: Chris Dunlap <cdunlap@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #2 |
||
|
|
75e3ff58fe |
Add zpool_events_seek() functionality
The ZFS_IOC_EVENTS_SEEK ioctl was added to allow user space callers to seek around the zevent file descriptor by EID. When a specific EID is passed and it exists the cursor will be positioned there. If the EID is no longer cached by the kernel ENOENT is returned. The caller may also pass ZEVENT_SEEK_START or ZEVENT_SEEK_END to seek to those respective locations. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Chris Dunlap <cdunlap@llnl.gov> Issue #2 |
||
|
|
d1d7e2689d |
cstyle: Resolve C style issues
The vast majority of these changes are in Linux specific code. They are the result of not having an automated style checker to validate the code when it was originally written. Others were caused when the common code was slightly adjusted for Linux. This patch contains no functional changes. It only refreshes the code to conform to style guide. Everyone submitting patches for inclusion upstream should now run 'make checkstyle' and resolve any warning prior to opening a pull request. The automated builders have been updated to fail a build if when 'make checkstyle' detects an issue. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1821 |
||
|
|
ba6a24026c |
Remove ZFC_IOC_*_MINOR ioctl()s
Early versions of ZFS coordinated the creation and destruction of device minors from userspace. This was inherently racy and in late 2009 these ioctl()s were removed leaving everything up to the kernel. This significantly simplified the code. However, we never picked up these changes in ZoL since we'd already significantly adjusted this code for Linux. This patch aims to rectify that by finally removing ZFC_IOC_*_MINOR ioctl()s and moving all the functionality down in to the kernel. Since this cleanup will change the kernel/user ABI it's being done in the same tag as the previous libzfs_core ABI changes. This will minimize, but not eliminate, the disruption to end users. Once merged ZoL, Illumos, and FreeBSD will basically be back in sync in regards to handling ZVOLs in the common code. While each platform must have its own custom zvol.c implemenation the interfaces provided are consistent. NOTES: 1) This patch introduces one subtle change in behavior which could not be easily avoided. Prior to this change callers of 'zfs create -V ...' were guaranteed that upon exit the /dev/zvol/ block device link would be created or an error returned. That's no longer the case. The utilities will no longer block waiting for the symlink to be created. Callers are now responsible for blocking, this is why a 'udev_wait' call was added to the 'label' function in scripts/common.sh. 2) The read-only behavior of a ZVOL now solely depends on if the ZVOL_RDONLY bit is set in zv->zv_flags. The redundant policy setting in the gendisk structure was removed. This both simplifies the code and allows us to safely leverage set_disk_ro() to issue a KOBJ_CHANGE uevent. See the comment in the code for futher details on this. 3) Because __zvol_create_minor() and zvol_alloc() may now be called in a sync task they must use KM_PUSHPAGE. References: illumos/illumos-gate@681d9761e8 Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ned Bass <bass6@llnl.gov> Signed-off-by: Tim Chase <tim@chase2k.com> Closes #1969 |
||
|
|
d09f25dc66 |
Illumos #3744
3744 zfs shouldn't ignore errors unmounting snapshots Reviewed by: Matthew Ahrens <mahrens@delphix.com> Approved by: Christopher Siden <christopher.siden@delphix.com> References: https://www.illumos.org/issues/3744 illumos/illumos-gate@fc7a6e3fef Ported-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #1775 Porting notes: 1. There is no clear way to distinguish between a failure when we tried to unmount the snapdir of a zvol (which does not exist) and the failure when we try to unmount a snapdir of a dataset, so the changes to zfs_unmount_snap() were dropped in favor of an altered Linux function that unconditionally returns 0. |
||
|
|
023699cd62 |
Posix ACL Support
This change adds support for Posix ACLs by storing them as an xattr which is common practice for many Linux file systems. Since the Posix ACL is stored as an xattr it will not overwrite any existing ZFS/NFSv4 ACLs which may have been set. The Posix ACL will also be non-functional on other platforms although it may be visible as an xattr if that platform understands SA based xattrs. By default Posix ACLs are disabled but they may be enabled with the new 'aclmode=noacl|posixacl' property. Set the property to 'posixacl' to enable them. If ZFS/NFSv4 ACL support is ever added an appropriate acltype will be added. This change passes the POSIX Test Suite cleanly with the exception of xacl/00.t test 45 which is incorrect for Linux (Ext4 fails too). http://www.tuxera.com/community/posix-test-suite/ Signed-off-by: Massimo Maggi <me@massimo-maggi.eu> Signed-off-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #170 |
||
|
|
13fe019870 |
Illumos #3464
3464 zfs synctask code needs restructuring Reviewed by: Dan Kimmel <dan.kimmel@delphix.com> Reviewed by: Adam Leventhal <ahl@delphix.com> Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Christopher Siden <christopher.siden@delphix.com> Approved by: Garrett D'Amore <garrett@damore.org> References: https://www.illumos.org/issues/3464 illumos/illumos-gate@3b2aab1880 Ported-by: Tim Chase <tim@chase2k.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1495 |
||
|
|
6f1ffb0665 |
Illumos #2882, #2883, #2900
2882 implement libzfs_core 2883 changing "canmount" property to "on" should not always remount dataset 2900 "zfs snapshot" should be able to create multiple, arbitrary snapshots at once Reviewed by: George Wilson <george.wilson@delphix.com> Reviewed by: Chris Siden <christopher.siden@delphix.com> Reviewed by: Garrett D'Amore <garrett@damore.org> Reviewed by: Bill Pijewski <wdp@joyent.com> Reviewed by: Dan Kruchinin <dan.kruchinin@gmail.com> Approved by: Eric Schrock <Eric.Schrock@delphix.com> References: https://www.illumos.org/issues/2882 https://www.illumos.org/issues/2883 https://www.illumos.org/issues/2900 illumos/illumos-gate@4445fffbbb Ported-by: Tim Chase <tim@chase2k.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1293 Porting notes: WARNING: This patch changes the user/kernel ABI. That means that the zfs/zpool utilities built from master are NOT compatible with the 0.6.2 kernel modules. Ensure you load the matching kernel modules from master after updating the utilities. Otherwise the zfs/zpool commands will be unable to interact with your pool and you will see errors similar to the following: $ zpool list failed to read pool configuration: bad address no pools available $ zfs list no datasets available Add zvol minor device creation to the new zfs_snapshot_nvl function. Remove the logging of the "release" operation in dsl_dataset_user_release_sync(). The logging caused a null dereference because ds->ds_dir is zeroed in dsl_dataset_destroy_sync() and the logging functions try to get the ds name via the dsl_dataset_name() function. I've got no idea why this particular code would have worked in Illumos. This code has subsequently been completely reworked in Illumos commit 3b2aab1 (3464 zfs synctask code needs restructuring). Squash some "may be used uninitialized" warning/erorrs. Fix some printf format warnings for %lld and %llu. Apply a few spa_writeable() changes that were made to Illumos in illumos/illumos-gate.git@cd1c8b8 as part of the 3112, 3113, 3114 and 3115 fixes. Add a missing call to fnvlist_free(nvl) in log_internal() that was added in Illumos to fix issue 3085 but couldn't be ported to ZoL at the time (zfsonlinux/zfs@9e11c73) because it depended on future work. |
||
|
|
cc92e9d0c3 |
3246 ZFS I/O deadman thread
Reviewed by: Matt Ahrens <matthew.ahrens@delphix.com> Reviewed by: Eric Schrock <eric.schrock@delphix.com> Reviewed by: Christopher Siden <chris.siden@delphix.com> Approved by: Garrett D'Amore <garrett@damore.org> NOTES: This patch has been reworked from the original in the following ways to accomidate Linux ZFS implementation *) Usage of the cyclic interface was replaced by the delayed taskq interface. This avoids the need to implement new compatibility code and allows us to rely on the existing taskq implementation. *) An extern for zfs_txg_synctime_ms was added to sys/dsl_pool.h because declaring externs in source files as was done in the original patch is just plain wrong. *) Instead of panicing the system when the deadman triggers a zevent describing the blocked vdev and the first pending I/O is posted. If the panic behavior is desired Linux provides other generic methods to panic the system when threads are observed to hang. *) For reference, to delay zios by 30 seconds for testing you can use zinject as follows: 'zinject -d <vdev> -D30 <pool>' References: illumos/illumos-gate@283b84606b https://www.illumos.org/issues/3246 Ported-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1396 |
||
|
|
0b4d1b5853 |
Add snapdev=[hidden|visible] dataset property
The new snapdev dataset property may be set to control the
visibility of zvol snapshot devices. By default this value
is set to 'hidden' which will prevent zvol snapshots from
appearing under /dev/zvol/ and /dev/<dataset>/. When set to
'visible' all zvol snapshots for the dataset will be visible.
This functionality was largely added because when automatic
snapshoting is enabled large numbers of read-only zvol snapshots
will be created. When creating these devices the kernel will
attempt to read their partition tables, and blkid will attempt
to identify any filesystems on those partitions. This leads
to a variety of issues:
1) The zvol partition tables will be read in the context of
the `modprobe zfs` for automatically imported pools. This
is undesirable and should be done asynchronously, but for
now reducing the number of visible devices helps.
2) Udev expects to be able to complete its work for a new
block devices fairly quickly. When many zvol devices are
added at the same time this is no longer be true. It can
lead to udev timeouts and missing /dev/zvol links.
3) Simply having lots of devices in /dev/ can be aukward from
a management standpoint. Hidding the devices your unlikely
to ever use helps with this. Any snapshot device which is
needed can be made visible by changing the snapdev property.
NOTE: This patch changes the default behavior for zvols which
was effectively 'snapdev=visible'.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #1235
Closes #945
Issue #956
Issue #756
|
||
|
|
0cee24064a |
Speed up 'zfs list -t snapshot -o name -s name'
FreeBSD #xxx: Dramatically optimize listing snapshots when user
requests only snapshot names and wants to sort them by name, ie.
when executes:
# zfs list -t snapshot -o name -s name
Because only name is needed we don't have to read all snapshot
properties.
Below you can find how long does it take to list 34509 snapshots
from a single disk pool before and after this change with cold and
warm cache:
before:
# time zfs list -t snapshot -o name -s name > /dev/null
cold cache: 525s
warm cache: 218s
after:
# time zfs list -t snapshot -o name -s name > /dev/null
cold cache: 1.7s
warm cache: 1.1s
NOTE: This patch only appears in FreeBSD. If/when Illumos picks up
the change we may want to drop this patch and adopt their version.
However, for now this addresses a real issue.
Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #450
|
||
|
|
6283f55ea1 |
Support custom build directories and move includes
One of the neat tricks an autoconf style project is capable of is allow configurion/building in a directory other than the source directory. The major advantage to this is that you can build the project various different ways while making changes in a single source tree. For example, this project is designed to work on various different Linux distributions each of which work slightly differently. This means that changes need to verified on each of those supported distributions perferably before the change is committed to the public git repo. Using nfs and custom build directories makes this much easier. I now have a single source tree in nfs mounted on several different systems each running a supported distribution. When I make a change to the source base I suspect may break things I can concurrently build from the same source on all the systems each in their own subdirectory. wget -c http://github.com/downloads/behlendorf/zfs/zfs-x.y.z.tar.gz tar -xzf zfs-x.y.z.tar.gz cd zfs-x-y-z ------------------------- run concurrently ---------------------- <ubuntu system> <fedora system> <debian system> <rhel6 system> mkdir ubuntu mkdir fedora mkdir debian mkdir rhel6 cd ubuntu cd fedora cd debian cd rhel6 ../configure ../configure ../configure ../configure make make make make make check make check make check make check This change also moves many of the include headers from individual incude/sys directories under the modules directory in to a single top level include directory. This has the advantage of making the build rules cleaner and logically it makes a bit more sense. |