mirror_zfs

mirror of https://git.proxmox.com/git/mirror_zfs.git synced 2026-01-25 10:12:13 +03:00

Author	SHA1	Message	Date
Brian Behlendorf	9a8b7a7458	Add basic dynamic kstat support Add the bare minimum functionality to support dynamic kstats. A complete kstat implementation should be done as part of issue #84. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #84	2012-02-02 11:28:00 -08:00
Brian Behlendorf	4b2220f0b9	Add --enable-debug-log configure option Until now the notion of an internal debug logging infrastructure was conflated with enabling ASSERT()s. This patch clarifies things by cleanly breaking the two subsystem apart. The result of this is the following behavior. --enable-debug - Enable/disable code wrapped in ASSERT()s. --disable-debug ASSERT()s are used to check invariants and are never required for correct operation. They are disabled by default because they may impact performance. --enable-debug-log - Enable/disable the debug log infrastructure. --disable-debug-log This infrastructure allows the spl code and its consumer to log messages to an in-kernel log. The granularity of the logging can be controlled by a debug mask. By default the mask disables most debug messages resulting in a negligible performance impact. Because of this the debug log is enabled by default. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2012-02-02 11:27:54 -08:00
Brian Behlendorf	d7e398ce1a	Cleanup ZFS debug infrastructure Historically the internal zfs debug infrastructure has been scattered throughout the code. Since we expect to start making more use of this code this patch performs some cleanup. * Consolidate the zfs debug infrastructure in the zfs_debug.[ch] files. This includes moving the zfs_flags and zfs_recover variables, plus moving the zfs_panic_recover() function. * Remove the existing unused functionality in zfs_debug.c and replace it with code which correctly utilized the spl logging infrastructure. * Remove the __dprintf() function from zfs_ioctl.c. This is dead code, the dprintf() functionality in the kernel relies on the spl log support. * Remove dprintf() from hdr_recl(). This wasn't particularly useful and was missing the required format specifier anyway. * Subsequent patches should unify the dprintf() and zfs_dbgmsg() functions. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2012-02-02 11:24:30 -08:00
Brian Behlendorf	0c5dde492f	Allow multiple values per directory entry When using zfs to back a Lustre filesystem it's advantageous to to store a fid with the object id in the directory zap. The only technical impediment to doing this is that the zpl code expects a single value in the zap per directory entry. This change relaxes that requirement such that multiple entries are allowed provided the first one is the object id. The zpl code will just ignore additional entries. This allows the ZoL count to mount datasets which are being used as Lustre server backends. Once the upstream feature flags support is merged in this change should be updated to a read-only feature. Until this occurs other zfs implementations will not be able to read the zfs filesystems created by Lustre. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2012-02-02 11:22:08 -08:00
Brian Behlendorf	e29be02e46	Export symbol zfs_attr_table Export the zfs_attr_table symbol so it may be used by non-zpl consumers which are still interested in writing a zpl compatible dataset (e.g. Lustre). Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2012-01-27 09:23:36 -08:00
Ned Bass	3c6ed5410b	Taskq locking optimizations Testing has shown that tq->tq_lock can be highly contended when a large number of small work items are dispatched. The lock hold time is reduced by the following changes: 1) Use exclusive threads in the work_waitq When a single work item is dispatched we only need to wake a single thread to service it. The current implementation uses non-exclusive threads so all threads are woken when the dispatcher calls wake_up(). If a large number of threads are in the queue this overhead can become non-negligible. 2) Conditionally add/remove threads from work waitq Taskq threads need only add themselves to the work wait queue if there are no pending work items. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #32	2012-01-19 14:42:49 -08:00
Ned Bass	0bb43ca282	Revert "Taskq locking optimizations" This reverts commit `ec2b41049f`. A race condition was introduced by which a wake_up() call can be lost after the taskq thread determines there is no pending work items, leading to deadlock: 1. taksq thread enables interrupts 2. dispatcher thread runs, queues work item, call wake_up() 3. taskq thread runs, adds self to waitq, sleeps This could easily happen if an interrupt for an IO completion was outstanding at the point where the taskq thread reenables interrupts, just before the call to add_wait_queue_exclusive(). The handler would run immediately within the race window. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #32	2012-01-19 14:42:39 -08:00
Prakash Surya	ff998d804f	Ignore dataset if the dds_type is DMU_OST_OTHER Since the zpios and potentially other ZFS tests use the DMU_OST_OTHER type to label their datasets, the zpool and zfs commands should gracefully handle this type when it is encountered. This patch modifies the commands' behavior to ignore any datasets with a dds_type of DMU_OST_OTHER. Signed-off-by: Prakash Surya <surya1@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #536	2012-01-19 09:29:48 -08:00
Brian Behlendorf	b4b599d250	Fix rpm dependencies This change updates the rpm spec files to have strictly correct package dependencies. That means a few things: * The zfs-modules package is now tied to a specific build of the spl-modules packages based on the kernel version. This ensures that the correct spl-modules packages will always get installed and not just the newest. * The zfs package now requires both the zfs-modules and spl packages. Thus a 'yum install zfs' will pull in the minimal set of packages required for a functional system. * The zfs-devel packages now require the zfs package to be installed which is normal behavior for -devel packages. * Remove the redundant distribution release extension. This is already added once because it is part of the kernel package release name. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2012-01-18 12:19:52 -08:00
Brian Behlendorf	b40a77aefc	Add the release component to headers When the original build system code was added the release component was accidentally omited from the development header install path. This patch adds the missing path component so it's always clear exactly what release your compiling against. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2012-01-18 12:19:47 -08:00
Brian Behlendorf	87d1123924	Fix rpm dependencies This change updates the rpm spec files to have strictly correct package dependencies. That means a few things: * Add a dependency to the spl package for the spl-modules package. This ensures that when running 'yum install spl' that newest version of the spl-modules will be installed. * Remove the redundant distribution release extension. This is already added once because it is part of the kernel package release name. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2012-01-18 11:24:36 -08:00
Brian Behlendorf	a2eda2ff48	Add the release component to headers When the original build system code was added the release component was accidentally omited from the development header install path. This patch adds the missing path component so it's always clear exactly what release your compiling against. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2012-01-18 11:06:26 -08:00
Darik Horn	f783130a1f	Allow GPT+EFI vdev replacement in boot pools. Commit zfsonlinux/zfs@57a4eddc4d allows the bootfs property to be set on any pool, but does not accommodate subsequent vdev changes. For example: # zpool replace rpool /dev/sda /dev/sdb operation not supported on this type of pool property 'bootfs' is not supported on EFI labeled devices For non-Solaris builds, disable the check that emits this error. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2012-01-18 11:05:24 -08:00
Ned Bass	ec2b41049f	Taskq locking optimizations Testing has shown that tq->tq_lock can be highly contended when a large number of small work items are dispatched. The lock hold time is reduced by the following changes: 1) Use exclusive threads in the work_waitq When a single work item is dispatched we only need to wake a single thread to service it. The current implementation uses non-exclusive threads so all threads are woken when the dispatcher calls wake_up(). If a large number of threads are in the queue this overhead can become non-negligible. 2) Conditionally add/remove threads from work waitq outside of tq_lock Taskq threads need only add themselves to the work wait queue if there are no pending work items. Furthermore, the add and remove function calls can be made outside of the taskq lock since the wait queues are protected from concurrent access by their own spinlocks. 3) Call wake_up() outside of tq->tq_lock Again, the wait queues are protected by their own spinlock, so the dispatcher functions can drop tq->tq_lock before calling wake_up(). A new splat test taskq:contention was added in a prior commit to measure the impact of these changes. The following table summarizes the results using data from the kernel lock profiler. tq_lock time %diff Wall clock (s) %diff original: 39117614.10 0 41.72 0 exclusive threads: 31871483.61 18.5 34.2 18.0 unlocked add/rm waitq: 13794303.90 64.7 16.17 61.2 unlocked wake_up(): 1589172.08 95.9 16.61 60.2 Each row reflects the average result over 5 test runs. /proc/lock_stats was zeroed out before and collected after each run. Column 1 is the cumulative hold time in microseconds for tq->tq_lock. The tests are cumulative; each row reflects the code changes of the previous rows. %diff is calculated with respect to "original" as 100*(orig-new)/orig. Although calling wake_up() outside of the taskq lock dramatically reduced the taskq lock hold time, the test actually took slightly more wall clock time. This is because the point of contention shifts from the taskq lock to the wait queue lock. But the change still seems worthwhile since it removes our taskq implementation as a bottleneck, assuming the small increase in wall clock time to be statistical noise. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #32	2012-01-18 10:36:57 -08:00
Ned Bass	cf5d23fa1e	Add taskq contention splat test Add a test designed to generate contention on the taskq spinlock by using a large number of threads (100) to perform a large number (131072) of trivial work items from a single queue. This simulates conditions that may occur with the zio free taskq when a 1TB file is removed from a ZFS filesystem, for example. This test should always pass. Its purpose is to provide a benchmark to easily measure the effectiveness of taskq optimizations using statistics from the kernel lock profiler. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #32	2012-01-18 10:36:51 -08:00
Darik Horn	750562833f	Combine libraries: spl, avl, efi, share, unicode. These libraries, which are an artifact of the ZoL development process, conflict with packages that are already in distribution: * libspl: SPL Programming Language * libavl: AVL for Linux * libefi: GRUB And these libraries are potential conflicts: * libshare: the Linux Mount Manager * libunicode: Perl and Python Recompose these five ZoL components into the four libraries that are conventionally provided by Solaris and FreeBSD systems: + libnvpair + libuutil + libzpool + libzfs This change resolves the name conflict, makes ZoL more compatible with existing software that uses autotools to detect ZFS, and allows pkg-zfs to better reflect the official Debian kFreeBSD packaging. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes: #430	2012-01-17 15:19:50 -08:00
Richard Laager	57a4eddc4d	Allow setting bootfs on any pool The vdev_is_bootable() restrictions are no longer necessary with recent GRUB2 code. FreeBSD has implemented the same change, except that I moved the Solaris comment to be inside the #ifdef __sun__ block. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #317	2012-01-17 13:49:07 -08:00
Darik Horn	966e5200d3	Fix `make distclean` for `--with-config=user` Apply the same fix to SPL that was applied to ZFS earlier at: zfsonlinux/zfs@d433c20651 Additionally quote @LINUX_SYMBOLS@ because it is a null substitution in this configuration, which results in a `[ -f ]` expression that incorrectly evaluates to true. # ./configure --with-config=user # make distclean Making distclean in module make[1]: Entering directory `/spl/module' make -C SUBDIRS=`pwd` clean make: Entering an unknown directory make: *** SUBDIRS=/spl/module: No such file or directory. Stop. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2012-01-17 10:06:00 -08:00
Ned Bass	08d08ebba2	Reduce number of zio free threads As described in Issue #458 and #258, unlinking large amounts of data can cause the threads in the zio free wait queue to start spinning. Reducing the number of z_fr_iss threads from a fixed value of 100 to 1 per cpu signficantly reduces contention on the taskq spinlock and improves throughput. Instrumenting the taskq code showed that __taskq_dispatch() can spend a long time holding tq->tq_lock if there are a large number of threads in the queue. It turns out the time spent in wake_up() scales linearly with the number of threads in the queue. When a large number of short work items are dispatched, as seems to be the case with unlink, the worker threads drain the queue faster than the dispatcher can fill it. They then all pile into the work wait queue to wait for new work items. So if 100 threads are in the queue, wake_up() takes about 100 times as long, and the woken threads have to spin until the dispatcher releases the lock. Reducing the number of threads helps with the symptoms, but doesn't get to the root of the problem. It would seem that wake_up() shouldn't scale linearly in time with queue depth, particularly if we are only trying to wake up one thread. In that vein, I tried making all of the waiting processes exclusive to prevent the scheduler from iterating over the entire list, but I still saw the linear time scaling. So further investigation is needed, but in the meantime reducing the thread count is an easy workaround. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #258 Issue #458	2012-01-17 08:54:00 -08:00
Brian Behlendorf	a8783adf24	Increase link count limit to 2^31-1 Originally, the per-file link limit was set to 65536 because the exact Linux VFS limit was unclear. Internally ZFS is able to support 64-bit link counts. After a more careful investigation the limit can be safely raised to 2^31-1. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #514	2012-01-13 11:43:59 -08:00
Brian Behlendorf	0b14b9f327	Run SPL_AC_PACMAN only if $VENDOR is "arch" Unfortunately, Arch's package manager `pacman` shares it's name with a popular arcade video game. Thus, in order to refrain from executing the video game when we mean to execute the package manager, SPL_AC_PACMAN is now only run when $VENDOR is determined to be "arch". Signed-off-by: Prakash Surya <surya1@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes zfsonlinux/zfs#517	2012-01-13 09:08:12 -08:00
Prakash Surya	58d956b085	Run ZFS_AC_PACMAN only if $VENDOR is "arch" Unfortunately, Arch's package manager `pacman` shares it's name with a popular arcade video game. Thus, in order to refrain from executing the video game when we mean to execute the package manager, ZFS_AC_PACMAN is now only run when $VENDOR is determined to be "arch". Signed-off-by: Prakash Surya <surya1@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #517	2012-01-13 09:03:11 -08:00
Suman Chakravartula	e18be9a637	Add overlay(-O) mount option support Linux supports mounting over non-empty directories by default. In Solaris this is not the case and -O option is required for zfs mount to mount a zfs filesystem over a non-empty directory. For compatibility, I've added support for -O option to mount zfs filesystems over non-empty directories if the user wants to, just like in Solaris. I've defined MS_OVERLAY to record it in the flags variable if the -O option is supplied. The flags variable passes through a few functions and its checked before performing the empty directory check in zfs_mount function. If -O is given, the check is not performed. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #473	2012-01-12 15:49:38 -08:00
Darik Horn	96b91ef0d6	Apply the ZoL coding standard to zpl_xattr.c Make the indenting in the zpl_xattr.c file consistent with the Sun coding standard by removing soft tabs. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2012-01-12 15:12:03 -08:00
Brian Behlendorf	166dd49de0	Linux 3.2 compat, security_inode_init_security() The security_inode_init_security() API has been changed to include a filesystem specific callback to write security extended attributes. This was done to support the initialization of multiple LSM xattrs and the EVM xattr. This change updates the code to use the new API when it's available. Otherwise it falls back to the previous implementation. In addition, the ZFS_AC_KERNEL_6ARGS_SECURITY_INODE_INIT_SECURITY autoconf test has been made more rigerous by passing the expected types. This is done to ensure we always properly the detect the correct form for the security_inode_init_security() API. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #516	2012-01-12 15:06:39 -08:00
Richard Laager	2932b6a800	Treat /dev/vd* as whole disks Correctly detect /dev/vd devices as whole disks and attempt to create an EFI partition table. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2012-01-11 16:44:54 -08:00
Darik Horn	588d900433	Linux 3.2 compat: rw_semaphore.wait_lock is raw The wait_lock member of the rw_semaphore struct became a raw_spinlock_t in Linux 3.2 at torvalds/linux@ddb6c9b58a. Wrap spin_lock_* function calls in a new spl_rwsem_* interface to ensure type safety if raw_spinlock_t becomes architecture specific, and to satisfy these compiler warnings: warning: passing argument 1 of ‘spinlock_check’ from incompatible pointer type [enabled by default] note: expected ‘struct spinlock_t ’ but argument is of type ‘struct raw_spinlock_t ’ Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes: #76 Closes: zfsonlinux/zfs#463	2012-01-11 16:28:05 -08:00
Darik Horn	b97f368d04	Avoid using awk in the zpool_id script. Some implementations of `awk` incorrectly parse the \< and \> regex symbols, so use a `while read` loop and regular globbing instead. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes: #259	2012-01-11 11:56:56 -08:00
Brian Behlendorf	ab26409db7	Linux 3.1 compat, super_block->s_shrink The Linux 3.1 kernel has introduced the concept of per-filesystem shrinkers which are directly assoicated with a super block. Prior to this change there was one shared global shrinker. The zfs code relied on being able to call the global shrinker when the arc_meta_limit was exceeded. This would cause the VFS to drop references on a fraction of the dentries in the dcache. The ARC could then safely reclaim the memory used by these entries and honor the arc_meta_limit. Unfortunately, when per-filesystem shrinkers were added the old interfaces were made unavailable. This change adds support to use the new per-filesystem shrinker interface so we can continue to honor the arc_meta_limit. The major benefit of the new interface is that we can now target only the zfs filesystem for dentry and inode pruning. Thus we can minimize any impact on the caching of other filesystems. In the context of making this change several other important issues related to managing the ARC were addressed, they include: * The dnlc_reduce_cache() function which was called by the ARC to drop dentries for the Posix layer was replaced with a generic zfs_prune_t callback. The ZPL layer now registers a callback to drop these dentries removing a layering violation which dates back to the Solaris code. This callback can also be used by other ARC consumers such as Lustre. arc_add_prune_callback() arc_remove_prune_callback() * The arc_reduce_dnlc_percent module option has been changed to arc_meta_prune for clarity. The dnlc functions are specific to Solaris's VFS and have already been largely eliminated already. The replacement tunable now represents the number of bytes the prune callback will request when invoked. * Less aggressively invoke the prune callback. We used to call this whenever we exceeded the arc_meta_limit however that's not strictly correct since it results in over zeleous reclaim of dentries and inodes. It is now only called once the arc_meta_limit is exceeded and every effort has been made to evict other data from the ARC cache. * More promptly manage exceeding the arc_meta_limit. When reading meta data in to the cache if a buffer was unable to be recycled notify the arc_reclaim thread to invoke the required prune. * Added arcstat_prune kstat which is incremented when the ARC is forced to request that a consumer prune its cache. Remember this will only occur when the ARC has no other choice. If it can evict buffers safely without invoking the prune callback it will. * This change is also expected to resolve the unexpect collapses of the ARC cache. This would occur because when exceeded just the arc_meta_limit reclaim presure would be excerted on the arc_c value via arc_shrink(). This effectively shrunk the entire cache when really we just needed to reclaim meta data. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #466 Closes #292	2012-01-11 11:46:02 -08:00
Brian Behlendorf	5f6c14b1ed	Proxmox VE kernel compat, invalidate_inodes() The Proxmox VE kernel contains a patch which renames the function invalidate_inodes() to invalidate_inodes_check(). In the process it adds a 'check' argument and a '#define invalidate_inodes(x)' compatibility wrapper for legacy callers. Therefore, if either of these functions are exported invalidate_inodes() can be safely used. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #58	2011-12-21 14:29:45 -08:00
Prakash Surya	8eaa020b46	Move Arch Linux's VENDOR check above Ubuntu's If the lsb-release package is installed on an Arch Linux distribution, the configure step will incorrectly detect the running distribution as Ubuntu. This is a result of both distributions providing an /etc/lsb-release file, and the Ubuntu VENDOR check being performed first. Since the Arch Linux test check's for a file more specific to the Arch Linux distribution, moving Arch Linux's VENDOR check above Unbuntu's check provides a quick and easy solution. Signed-off-by: Prakash Surya <surya1@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2011-12-19 12:05:10 -08:00
Prakash Surya	cd2817f8a6	Move Arch Linux's VENDOR check above Ubuntu's If the lsb-release package is installed on an Arch Linux distribution, the configure step will incorrectly detect the running distribution as Ubuntu. This is a result of both distributions providing an /etc/lsb-release file, and the Ubuntu VENDOR check being performed first. Since the Arch Linux test check's for a file more specific to the Arch Linux distribution, moving Arch Linux's VENDOR check above Unbuntu's check provides a quick and easy solution. Signed-off-by: Prakash Surya <surya1@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #72	2011-12-19 12:03:40 -08:00
Darik Horn	afd7da0ce7	Add LIBSELINUX to mount_zfs_LDFLAGS. Regenerating the autotools configuration on Debian and Ubuntu systems causes compilation to fail with this error message: cmd/mount_zfs/../../cmd/mount_zfs/mount_zfs.c:403: undefined reference to `is_selinux_enabled' In the automake template, set "mount_zfs_LDFLAGS = ... $(LIBSELINUX)" so that the /sbin/mount.zfs utility is linked to libselinux. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2011-12-16 20:04:42 -08:00
Darik Horn	28eb9213d8	Linux 3.2 compat: set_nlink() Directly changing inode->i_nlink is deprecated in Linux 3.2 by commit SHA: bfe8684869601dacfcb2cd69ef8cfd9045f62170 Use the new set_nlink() kernel function instead. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes: #462	2011-12-16 20:02:52 -08:00
Prakash Surya	8f2503e0af	Store copy of tqent_flags prior to servicing task A preallocated taskq_ent_t's tqent_flags must be checked prior to servicing the taskq_ent_t. Once a preallocated taskq entry is serviced, the ownership of the entry is handed back to the caller of taskq_dispatch, thus the entry's contents can potentially be mangled. In particular, this is a problem in the case where a preallocated taskq entry is serviced, and the caller clears it's tqent_flags field. Thus, when the function returns and task_done is called, it looks as though the entry is not a preallocated task (when in fact it is a preallocated task). In this situation, task_done will place the preallocated taskq_ent_t structure onto the taskq_t's free list. This is a huge mistake. If the taskq_ent_t is then freed by the caller of taskq_dispatch, the taskq_t's free list will hold a pointer to garbage data. Even worse, if nothing has over written the freed memory before the pointer is dereferenced, it may still look as though it points to a valid list_head belonging to a taskq_ent_t structure. Thus, the task entry's flags are now copied prior to servicing the task. This copy is then checked to see if it is a preallocated task, and determine if the entry needs to be passed down to the task_done function. Signed-off-by: Prakash Surya <surya1@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #71	2011-12-16 16:54:00 -08:00
Darik Horn	e6101ea87f	Update the character class in the zpool man page. ZoL and all Solaris derivatives allow pool names to contain the colon and space characters. Update the man page to reflect current behavior. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes: #438	2011-12-16 14:00:38 -08:00
Prakash Surya	e7e5f78e7b	Swap taskq_ent_t with taskqid_t in taskq_thread_t The taskq_t's active thread list is sorted based on its tqt_ent->tqent_id field. The list is kept sorted solely by inserting new taskq_thread_t's in their correct sorted location; no other means is used. This means that once inserted, if a taskq_thread_t's tqt_ent->tqent_id field changes, the list runs the risk of no longer being sorted. Prior to the introduction of the taskq_dispatch_prealloc() interface, this was not a problem as a taskq_ent_t actively being serviced under the old interface should always have a static tqent_id field. Thus, once the taskq_thread_t is added to the taskq_t's active thread list, the taskq_thread_t's tqt_ent->tqent_id field would remain constant. Now, this is no longer the case. Currently, if using the taskq_dispatch_prealloc() interface, any given taskq_ent_t actively being serviced _may_ have its tqent_id value incremented. This happens when the preallocated taskq_ent_t structure is recursively dispatched. Thus, a taskq_thread_t could potentially have its tqt_ent->tqent_id field silently modified from under its feet. If this were to happen to a taskq_thread_t on a taskq_t's active thread list, this would compromise the integrity of the order of the list (as the list _may_ no longer be sorted). To get around this, the taskq_thread_t's taskq_ent_t pointer was replaced with its own static copy of the tqent_id. So, as a taskq_ent_t is pulled off of the taskq_t's pending list, a static copy of its tqent_id is made and this copy is used to sort the active thread list. Using a static copy is key in ensuring the integrity of the order of the active thread list. Even if the underlying taskq_ent_t is recursively dispatched (as has its tqent_id modified), this static copy stored inside the taskq_thread_t will remain constant. Signed-off-by: Prakash Surya <surya1@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #71	2011-12-16 13:26:54 -08:00
Prakash Surya	6ba3b44614	Add make rule for building Arch Linux packages Added the necessary build infrastructure for building packages compatible with the Arch Linux distribution. As such, one can now run: $ ./configure $ make pkg # Alternatively, one can run 'make arch' as well on the Arch Linux machine to create two binary packages compatible with the pacman package manager, one for the zfs userland utilities and another for the zfs kernel modules. The new packages can then be installed by running: # pacman -U $package.pkg.tar.xz In addition, source-only packages suitable for an Arch Linux chroot environment or remote builder can also be build using the 'sarch' make rule. NOTE: Since the source dist tarball is created on the fly from the head of the build tree, it's MD5 hash signature will be continually influx. As a result, the md5sum variable was intentionally omitted from the PKGBUILD files, and the '--skipinteg' makepkg option is used. This may or may not have any serious security implications, as the source tarball is not being downloaded from an outside source. Signed-off-by: Prakash Surya <surya1@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #491	2011-12-14 19:14:23 -08:00
Prakash Surya	c2dceb5cd5	Add make rule for building Arch Linux packages Added the necessary build infrastructure for building packages compatible with the Arch Linux distribution. As such, one can now run: $ ./configure $ make pkg # Alternatively, one can run 'make arch' as well on an Arch Linux machine to create two binary packages compatible with the pacman package manager, one for the spl userland utilties and another for the spl kernel modules. The new packages can then be installed by running: # pacman -U $package.pkg.tar.xz In addition, source-only packages suitable for an Arch Linux chroot environment or remote builder can also be built using the 'sarch' make rule. NOTE: Since the source dist tarball is created on the fly from the head of the build tree, it's MD5 hash signature will be continually influx. As a result, the md5sum variable was intentionally omitted from the PKGBUILD files, and the '--skipinteg' makepkg option is used. This may or may not have any serious security implications, as the source tarball is not being downloaded from an outside source. Signed-off-by: Prakash Surya <surya1@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes: #68	2011-12-14 16:44:10 -08:00
Garrett D'Amore	a38718a63d	Illumos #734 : Use taskq_dispatch_ent() interface It has been observed that some of the hottest locks are those of the zio taskqs. Contention on these locks can limit the rate at which zios are dispatched which limits performance. This upstream change from Illumos uses new interface to the taskqs which allow them to utilize a prealloc'ed taskq_ent_t. This removes the need to perform an allocation at dispatch time while holding the contended lock. This has the effect of improving system performance. Reviewed by: Albert Lee <trisk@nexenta.com> Reviewed by: Richard Lowe <richlowe@richlowe.net> Reviewed by: Alexey Zaytsev <alexey.zaytsev@nexenta.com> Reviewed by: Jason Brian King <jason.brian.king@gmail.com> Reviewed by: George Wilson <gwilson@zfsmail.com> Reviewed by: Adam Leventhal <ahl@delphix.com> Approved by: Gordon Ross <gwr@nexenta.com> References to Illumos issue: https://www.illumos.org/issues/734 Ported-by: Prakash Surya <surya1@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #482	2011-12-14 09:19:30 -08:00
Prakash Surya	699d5ee8a9	Exercise new taskq interface in splat-taskq tests The splat-taskq test functions were slightly modified to exercise the new taskq interface in addition to the old interface. If the old interface passes each of its tests, the new interface is exercised. Both sub tests (old interface and new interface) must pass for each test as a whole to pass. Signed-off-by: Prakash Surya <surya1@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #65	2011-12-13 16:10:57 -08:00
Prakash Surya	44217f7aad	Implement taskq_dispatch_prealloc() interface This patch implements the taskq_dispatch_prealloc() interface which was introduced by the following illumos-gate commit. It allows for a preallocated taskq_ent_t to be used when dispatching items to a taskq. This eliminates a memory allocation which helps minimize lock contention in the taskq when dispatching functions. commit 5aeb94743e3be0c51e86f73096334611ae3a058e Author: Garrett D'Amore <garrett@nexenta.com> Date: Wed Jul 27 07:13:44 2011 -0700 734 taskq_dispatch_prealloc() desired 943 zio_interrupt ends up calling taskq_dispatch with TQ_SLEEP Signed-off-by: Prakash Surya <surya1@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #65	2011-12-13 16:10:57 -08:00
Prakash Surya	ac1e5b6033	Add Test: "Single task queue, recursive dispatch" Added another splat taskq test to ensure tasks can be recursively submitted to a single task queue without issue. When the taskq_dispatch_prealloc() interface is introduced, this use case can potentially cause a deadlock if a taskq_ent_t is dispatched while its tqent_list field is not empty. This _should_ never be a problem with the existing taskq_dispatch() interface. Signed-off-by: Prakash Surya <surya1@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #65	2011-12-13 16:10:57 -08:00
Prakash Surya	2c02b71b14	Replace tq_work_list and tq_threads in taskq_t To lay the ground work for introducing the taskq_dispatch_prealloc() interface, the tq_work_list and tq_threads fields had to be replaced with new alternatives in the taskq_t structure. The tq_threads field was replaced with tq_thread_list. Rather than storing the pointers to the taskq's kernel threads in an array, they are now stored as a list. In addition to laying the ground work for the taskq_dispatch_prealloc() interface, this change could also enable taskq threads to be dynamically created and destroyed as threads can now be added and removed to this list relatively easily. The tq_work_list field was replaced with tq_active_list. Instead of keeping a list of taskq_ent_t's which are currently being serviced, a list of taskq_threads currently servicing a taskq_ent_t is kept. This frees up the taskq_ent_t's tqent_list field when it is being serviced (i.e. now when a taskq_ent_t is being serviced, it's tqent_list field will be empty). Signed-off-by: Prakash Surya <surya1@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #65	2011-12-13 16:10:50 -08:00
Prakash Surya	046a70c93b	Replace struct spl_task with struct taskq_ent The spl_task structure was renamed to taskq_ent, and all of its fields were renamed to have a prefix of 'tqent' rather than 't'. This was to align with the naming convention which the ZFS code assumes. Previously these fields were private so the name never mattered. Signed-off-by: Prakash Surya <surya1@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #65	2011-12-13 12:28:09 -08:00
Prakash Surya	ed948fa72b	Add SPLAT_TEST_FINI call for SPLAT_TASKQ_TEST6_ID This change adds the neglected SPLAT_TEST_FINI call for the SPLAT_TASKQ_TEST6_ID, just as is done for the other 5 SPLAT_TASKQ_* tests. Signed-off-by: Prakash Surya <surya1@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #64	2011-12-13 12:26:16 -08:00
Prakash Surya	93806f58a6	Fix usage of MUTEX macro in mutex_enter_nested A call site of the MUTEX macro had incorrectly placed its closing parenthesis, causing two parameters to be passed rather than one. This change moves the misplaced parenthesis to fix the typographical error. Signed-off-by: Prakash Surya <surya1@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #70	2011-12-13 11:04:21 -08:00
Chris Dunlop	791dc876eb	Allow 64-bit timestamps to be set on 64-bit kernels ZFS and 64-bit linux are perfectly capable of dealing with 64-bit timestamps, but ZFS deliberately prevents setting them. Adjust the SPL such that TIMESPEC_OVERFLOW will not always assume 32-bit values and instead use the correct values for your kernel build. This effectively allows 64-bit timestamps on 64-bit systems. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes ZFS issue #487	2011-12-12 11:06:03 -08:00
Brian Behlendorf	30a9524e45	Set zvol_major/zvol_threads permissions The zvol_major and zvol_threads module options were being created with 0 permission bits. This prevented them from being listed in the /sys/module/zfs/parameters/ directory, although they were visible in `modinfo zfs`. This patch fixes the issue by updating the permission bits to 0444. For the moment these options must be read-only because they are used during module initialization. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #392	2011-12-07 09:27:50 -08:00
Brian Behlendorf	23bdb07d4e	Update default ARC memory limits In the upstream OpenSolaris ZFS code the maximum ARC usage is limited to 3/4 of memory or all but 1GB, whichever is larger. Because of how Linux's VM subsystem is organized these defaults have proven to be too large which can lead to stability issues. To avoid making everyone manually tune the ARC the defaults are being changed to 1/2 of memory or all but 4GB. The rational for this is as follows: * Desktop Systems (less than 8GB of memory) Limiting the ARC to 1/2 of memory is desirable for desktop systems which have highly dynamic memory requirements. For example, launching your web browser can suddenly result in a demand for several gigabytes of memory. This memory must be reclaimed from the ARC cache which can take some time. The user will experience this reclaim time as a sluggish system with poor interactive performance. Thus in this case it is preferable to leave the memory as free and available for immediate use. * Server Systems (more than 8GB of memory) Using all but 4GB of memory for the ARC is preferable for server systems. These systems often run with minimal user interaction and have long running daemons with relatively stable memory demands. These systems will benefit most by having as much data cached in memory as possible. These values should work well for most configurations. However, if you have a desktop system with more than 8GB of memory you may wish to further restrict the ARC. This can still be accomplished by setting the 'zfs_arc_max' module option. Additionally, keep in mind these aren't currently hard limits. The ARC is based on a slab implementation which can suffer from memory fragmentation. Because this fragmentation is not visible from the ARC it may believe it is within the specified limits while actually consuming slightly more memory. How much more memory get's consumed will be determined by how badly fragmented the slabs are. In the long term this can be mitigated by slab defragmentation code which was OpenSolaris solution. Or preferably, using the page cache to back the ARC under Linux would be even better. See issue #75 for the benefits of more tightly integrating with the page cache. This change also fixes a issue where the default ARC max was being set incorrectly for machines with less than 2GB of memory. The constant in the arc_c_max comparison must be explicitly cast to a uint64_t type to prevent overflow and the wrong conditional branch being taken. This failure was typically observed in VMs which are commonly created with less than 2GB of memory. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #75	2011-12-05 12:02:12 -08:00

... 111 112 113 114 115 ...

6747 Commits