mirror_zfs

mirror of https://git.proxmox.com/git/mirror_zfs.git synced 2026-06-10 07:56:39 +03:00

Author	SHA1	Message	Date
Richard Yao	15d0411297	Remove Makefile from non-toplevel .gitignore files When building SPL support into the kernel, ./copy-builtin will copy non-toplevel .gitignore files. These files list /Makefile, which causes git-archive to omit ./module/{spl,splat}/Makefile. The absence of these files result in build failures when SPL is selected. ZFS is unaffected because it puts Makefile in the toplevel .gitignore, which is not copied. We fix SPL by emulating that behavior. Reported-by: Fabio Erculiani <lxnay@gentoo.org> Signed-off-by: Richard Yao <ryao@cs.stonybrook.edu> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #152	2012-08-23 12:49:04 -07:00
Prakash Surya	9baf44bc17	Wrap trace_set_debug_header in trace_[get\|put]_tcd To properly support CONFIG_PREEMPT enabled kernels, we must refrain from using a CPU index when preemption is enabled. As a result, this change moves the trace_set_debug_header call (which calls smp_processor_id) within trace_get_tcd and trace_put_tcd (which disable and enable preemption respectively). Signed-off-by: Prakash Surya <surya1@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #160	2012-08-23 10:01:20 -07:00
Brian Behlendorf	4047414a6a	Export dmu_buf_rele() symbol While I'd like to remove the various pragmas in module/zfs/dbuf.c. There are consumers such as Lustre which still depend on dmu_buf_* versions of the symbols. Until all consumers can be converted to use only the dbuf_* names leave this symbol exported. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2012-08-14 08:38:19 -07:00
Brian Behlendorf	bafc4e9e2a	Suppress 'zfs_sb_create' memory warning When mutex debugging is enabled in your kernel the increased size of the mutex structures can push the zfs_sb_t type beyond the 8k warning threshold. This isn't harmful so we suppress the warning for this case. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #628	2012-08-10 16:43:32 -07:00
Brian Behlendorf	8f576c2321	Export dbuf_* symbols Export these symbols so they may be used by other ZFS consumers besides the ZPL. Remove three stale prototype definites from dbuf.h. The actual implementations of these functions were removed/renamed long ago. It would be good in the long term to remove the existing pragmas we inherited from Solaris and simply use the dbuf_* names. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2012-08-10 16:45:13 -07:00
Dan McDonald	d96eb2b153	Illumos #1693 : persistent 'comment' field for a zpool Reviewed by: George Wilson <gwilson@zfsmail.com> Reviewed by: Eric Schrock <eric.schrock@delphix.com> Approved by: Richard Lowe <richlowe@richlowe.net> References: https://www.illumos.org/issues/1693 Ported by: Martin Matuska <martin@matuska.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #678	2012-08-08 11:49:37 -07:00
Etienne Dechamps	ee5fd0bb80	Set zvol discard_granularity to the volblocksize. Currently, zvols have a discard granularity set to 0, which suggests to the upper layer that discard requests of arbirarily small size and alignment can be made efficiently. In practice however, ZFS does not handle unaligned discard requests efficiently: indeed, it is unable to free a part of a block. It will write zeros to the specified range instead, which is both useless and inefficient (see dnode_free_range). With this patch, zvol block devices expose volblocksize as their discard granularity, so the upper layer is aware that it's not supposed to send discard requests smaller than volblocksize. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #862	2012-08-07 14:55:31 -07:00
Richard Yao	6576a1a70d	Fix incorrect type in spl_kmem_cache_set_move() parameter A preprocessor definition renders this harmless. However, it is a good idea to change this to be consistent. Signed-off-by: Richard Yao <ryao@cs.stonybrook.edu>	2012-08-01 16:35:18 -07:00
Etienne Dechamps	7c0e570888	Limit the number of blocks to discard at once. The number of blocks that can be discarded in one BLKDISCARD ioctl on a zvol is currently unlimited. Some applications, such as mkfs, discard the whole volume at once and they use the maximum possible discard size to do that. As a result, several gigabytes discard requests are not uncommon. Unfortunately, if a large amount of data is allocated in the zvol, ZFS can be quite slow to process discard requests. This is especially true if the volblocksize is low (e.g. the 8K default). As a result, very large discard requests can take a very long time (seconds to minutes under heavy load) to complete. This can cause a number of problems, most notably if the zvol is accessed remotely (e.g. via iSCSI), in which case the client has a high probability of timing out on the request. This patch solves the issue by adding a new tunable module parameter: zvol_max_discard_blocks. This indicates the maximum possible range, in zvol blocks, of one discard operation. It is set by default to 16384 blocks, which appears to be a good tradeoff. Using the default volblocksize of 8K this is equivalent to 128 MB. When using the maximum volblocksize of 128K this is equivalent to 2 GB. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #858	2012-07-31 09:46:09 -07:00
Matthew Ahrens	330d06f90d	Illumos #1644 , #1645 , #1646 , #1647 , #1708 1644 add ZFS "clones" property 1645 add ZFS "written" and "written@..." properties 1646 "zfs send" should estimate size of stream 1647 "zfs destroy" should determine space reclaimed by destroying multiple snapshots 1708 adjust size of zpool history data References: https://www.illumos.org/issues/1644 https://www.illumos.org/issues/1645 https://www.illumos.org/issues/1646 https://www.illumos.org/issues/1647 https://www.illumos.org/issues/1708 This commit modifies the user to kernel space ioctl ABI. Extra care should be taken when updating to ensure both the kernel modules and utilities are updated. This change has reordered all of the new ioctl()s to the end of the list. This should help minimize this issue in the future. Reviewed by: Richard Lowe <richlowe@richlowe.net> Reviewed by: George Wilson <gwilson@zfsmail.com> Reviewed by: Albert Lee <trisk@opensolaris.org> Approved by: Garrett D'Amore <garret@nexenta.com> Ported by: Martin Matuska <martin@matuska.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #826 Closes #664	2012-07-31 09:25:30 -07:00
Etienne Dechamps	a9f2397ee9	Determine the hostid on demand. Currently, the SPL tries to determine the hostid at module load. The hostid is usually determined by running the userland program "hostid" during module initialization. Unfortunately, when the module initializes, it may be way too soon to be able to run any userland programs. This is especially true when the module is compiled directly inside the kernel (built-in); in that case, the SPL would try to run hostid when the kernel is still initializing, which of course is doomed to fail. This patch fixes the issue by deferring hostid generation until something actually needs the hostid (that is, when zone_get_hostid() is called), thus switching to a "on-initialization" model to a "on-demand" (lazy loading) model. ZFS only needs the hostid when some pool operations are requested, and this always happens way after the kernel has finished initialization, thus solving the problem. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue zfsonlinux/zfs#851	2012-07-26 15:14:02 -07:00
Etienne Dechamps	c167aadb27	Add script for builtin module building. This commit introduces a "copy-builtin" script designed to prepare a kernel source tree for building SPL as a builtin module. The script makes a full copy of all needed files, thus making the kernel source tree fully independent of the spl source package. To achieve that, some compilation flags (-include, -I) have been moved to module/Makefile. This Makefile is only used when compiling external modules; when compiling builtin modules, a Kbuild file generated by the configure-builtin script is used instead. This makes sure Makefiles inside the kernel source tree does not contain references to the spl source package. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue zfsonlinux/zfs#851	2012-07-26 15:13:09 -07:00
Etienne Dechamps	38b5ff4d07	Fix undefined reference on spl_mutex_spin_max(). Commit `3160d4f56b` changed the set of conditions under which spl_mutex_spin_max would be implemented as a function by changing an #if in sys/mutex.h. The corresponding implementation file spl-mutex.c, however, has not been updated to reflect the change. This results in undefined reference errors on spl_mutex_spin_max under the following condition: ((!CONFIG_SMP \|\| CONFIG_DEBUG_MUTEXES) && HAVE_MUTEX_OWNER && HAVE_TASK_CURR) This patch fixes the issue by using the same #if in sys/mutex.h and spl-mutex.c. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue zfsonlinux/zfs#851	2012-07-26 14:54:53 -07:00
Etienne Dechamps	94aac9c9bc	Use MODULE variable in module Makefile like zfs. In zfs, each module Makefile contains a MODULE variable which contains the name of the module, and the following declarations reference this variable. In spl, there is a MODULES variable which is never used. Rename it to MODULE and use it like in zfs. This improves consistency between the two build systems. Signed-off-by: Prakash Surya <surya1@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue zfsonlinux/zfs#851	2012-07-26 14:53:48 -07:00
Etienne Dechamps	2ee4a18b2a	Add script for builtin module building. This commit introduces a "copy-builtin" script designed to prepare a kernel source tree for building ZFS as a builtin module. The script makes a full copy of all needed files, thus making the kernel source tree fully independent of the zfs source package. To achieve that, some compilation flags (-include, -I) have been moved to module/Makefile. This Makefile is only used when compiling external modules; when compiling builtin modules, a Kbuild file generated by the configure-builtin script is used instead. This makes sure Makefiles inside the kernel source tree does not contain references to the zfs source package. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #851	2012-07-26 13:45:09 -07:00
Richard Yao	739a1a82e0	Linux 3.5 compat, end_writeback() changed to clear_inode() The end_writeback() function was changed by moving the call to inode_sync_wait() earlier in to evict(). This effecitvely changes the ordering of the sync but it does not impact the details of the zfs implementation. However, as part of this change end_writeback() was renamed to clear_inode() to reflect the new semantics. This change does impact us and clear_inode() now maps to end_writeback() for kernels prior to 3.5. Signed-off-by: Richard Yao <ryao@cs.stonybrook.edu> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #784	2012-07-23 12:29:36 -07:00
Richard Yao	ea1fdf46e2	Linux 3.5 compat, iops->truncate_range() removed The vmtruncate_range() support has been removed from the kernel in favor of using the fallocate method in the file_operations table. Signed-off-by: Richard Yao <ryao@cs.stonybrook.edu> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #784	2012-07-23 12:29:32 -07:00
Richard Yao	756c3e5a9c	Linux 3.5 compat, eops->encode_fh() takes inodes The export_operations member ->encode_fh() has been updated to take both the child and parent inodes. This interface used to take the child dentry and a bool describing if the parent is needed. NOTE: While updating this code I noticed that we do not currently cleanly handle the case where we're passed a connectable parent. This code should be audited to make sure we're doing the right thing. Signed-off-by: Richard Yao <ryao@cs.stonybrook.edu> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #784	2012-07-23 12:29:23 -07:00
Brian Behlendorf	fc173c8589	Disable .zfs directory on 32-bit systems The .zfs control directory implementation currently relies on the fact that there is a direct 1:1 mapping from an object id to its inode number. This works well as long as the system uses a 64-bit value to store the inode number. Unfortunately, the Linux kernel defines the inode number as an 'unsigned long' type. This means that for 32-bit systems will only have 32-bit inode numbers but we still have 64-bit object ids. This problem is particularly acute for the .zfs directories which leverage those upper 32-bits. This is done to avoid conflicting with object ids which are allocated monotonically starting from 0. This is likely to also be a problem for datasets on 32-bit systems with more than ~2 billion files. The right long term fix must remove the simple 1:1 mapping. Until that's done the only safe thing to do is to disable the .zfs directory on 32-bit systems. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2012-07-20 12:20:57 -07:00
Brian Behlendorf	e8267acd25	32-bit compat, hostid_read() Explicitly cast the sizeof in hostid_read() to prevent the following compiler warning on 32-bit systems. module/spl/spl-generic.c:490:10: error: format '%lu' expects argument of type 'long unsigned int', but argument 4 has type 'unsigned int' [-Werror=format] Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2012-07-20 11:14:04 -07:00
Brian Behlendorf	2a4a9dc2f0	Add ddt_object_load() error handling Add the missing error handling to ddt_object_load(). There's no good reason this needs to be fatal. It is preferable that an error be returned. This will allow 'zpool import -FX' to safely attempt to rollback through previous txgs looking for a good one. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2012-07-20 10:36:21 -07:00
Brian Behlendorf	10be533e33	Add 'inline' keyword The '__attribute__((always_inline))' does not strictly imply 'inline'. Newer versions of gcc detect this misuse and issue the following warning. Including the missing 'inline' resolves the build warning. ./module/zfs/dsl_scan.c:758:1:error: always_inline function might not be inlinable [-Werror=attributes] Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2012-07-19 13:41:00 -07:00
Richard Yao	0a6b03d3b8	Fix build failures on PaX/GRSecurity patched kernels Gentoo Hardened kernels include the PaX/GRSecurity patches. They use a dialect of C that relies on a GCC plugin. In particular, struct file_operations has been marked do_const in the PaX/GRSecurity dialect, which causes GCC to consider all instances of it as const. This caused failures in the autotools checks and the ZFS source code. To address this, we modify the autotools checks to take into account differences between the PaX C dialect and the regular C dialect. We also modify struct zfs_acl's z_ops member to be a pointer to a function pointer table. Lastly, we modify zpl_put_link() to address a PaX change to the function prototype of nd_get_link(). This avoids compiler errors in the PaX/GRSecurity dialect. Note that the change in zpl_put_link() causes a warning that becomes a build failure when debugging is enabled. Fixing that warning requires ryao/spl@5ca50ef459. Signed-off-by: Richard Yao <ryao@cs.stonybrook.edu> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #484	2012-07-17 09:22:43 -07:00
Etienne Dechamps	b5a28807cd	Move partition scanning from userspace to module. Currently, zpool online -e (dynamic vdev expansion) doesn't work on whole disks because we're invoking ioctl(BLKRRPART) from userspace while ZFS still has a partition open on the disk, which results in EBUSY. This patch moves the BLKRRPART invocation from the zpool utility to the module. Specifically, this is done just before opening the device in vdev_disk_open() which is called inside vdev_reopen(). This requires jumping through some hoops to get to the disk device from the partition device, and to make sure we can still open the partition after the BLKRRPART call. Note that this new code path is triggered on dynamic vdev expansion only; other actions, like creating a new pool, are unchanged and still call BLKRRPART from userspace. This change also depends on API changes which are available in 2.6.37 and latter kernels. The build system has been updated to detect this, but there is no compatibility mode for older kernels. This means that online expansion will NOT be available in older kernels. However, it will still be possible to expand the vdev offline. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #808	2012-07-17 09:17:31 -07:00
George Wilson	c7f2d69de3	Illumos #1949 , #1953 1949 crash during reguid causes stale config 1953 allow and unallow missing from zpool history since removal of pyzfs Reviewed by: Adam Leventhal <ahl@delphix.com> Reviewed by: Matt Ahrens <matt@delphix.com> Reviewed by: Eric Schrock <eric.schrock@delphix.com> Reviewed by: Bill Pijewski <wdp@joyent.com> Reviewed by: Richard Lowe <richlowe@richlowe.net> Reviewed by: Garrett D'Amore <garrett.damore@gmail.com> Reviewed by: Dan McDonald <danmcd@nexenta.com> Reviewed by: Steve Gonczi <gonczi@comcast.net> Approved by: Eric Schrock <eric.schrock@delphix.com> References: https://www.illumos.org/issues/1949 https://www.illumos.org/issues/1953 Ported by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #665	2012-07-11 13:33:31 -07:00
Garrett D'Amore	3541dc6d02	Illumos #1748 : desire support for reguid in zfs Reviewed by: George Wilson <gwilson@zfsmail.com> Reviewed by: Igor Kozhukhov <ikozhukhov@gmail.com> Reviewed by: Alexander Eremin <alexander.eremin@nexenta.com> Reviewed by: Alexander Stetsenko <ams@nexenta.com> Approved by: Richard Lowe <richlowe@richlowe.net> References: https://www.illumos.org/issues/1748 This commit modifies the user to kernel space ioctl ABI. Extra care should be taken when updating to ensure both the kernel modules and utilities are updated. If only the user space component is updated both the 'zpool events' command and the 'zpool reguid' command will not work until the kernel modules are updated. Ported by: Martin Matuska <martin@matuska.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #665	2012-07-11 13:08:56 -07:00
Richard Yao	36811b4430	Detect kernels that honor gfp flags passed to vmalloc() zfsonlinux/spl@2092cf68d8 used PF_MEMALLOC to workaround a bug in the Linux kernel where allocations did not honor the gfp flags passed to vmalloc(). Unfortunately, PF_MEMALLOC has the side effect of permitting allocations to allocate pages outside of ZONE_NORMAL. This has been observed to result in the depletion of ZONE_DMA32. A kernel patch is available in the Gentoo bug tracker for this issue. https://bugs.gentoo.org/show_bug.cgi?id=416685 This negates any benefit PF_MEMALLOC provides, so we introduce an autotools check to disable the use of PF_MEMALLOC on systems with patched kernels. Signed-off-by: Richard Yao <ryao@cs.stonybrook.edu> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #126	2012-07-11 11:44:27 -07:00
Richard Yao	973e8269bd	Constify memory management functions This prevents warnings in ZFS that were caused by changes necessary to support PaX patched kernels. When debugging is enabled, these warnings become build failures. Signed-off-by: Richard Yao <ryao@cs.stonybrook.edu> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #131	2012-07-03 16:07:27 -07:00
Brian Behlendorf	42d3b990cf	Update incorrect ddt_zap_lookup() assertion When the ddt_zap_lookup() function was updated to dynamically allocate memory for the cbuf variable, to save stack space, the 'csize <= sizeof (cbuf)' assertion was not updated. The result of this was that the size of the pointer was being used in the comparison rather than the buffer size. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Prakash Surya <surya1@llnl.gov>	2012-07-03 15:14:34 -07:00
Brian Behlendorf	44e406d712	PowerPC Compatibility Usage of get_current() is not supported across all architectures. The correct interface to use is the '#define current' which will map to the appropriate function, usually current_thread_info(). Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #119	2012-07-02 09:33:09 -07:00
Etienne Dechamps	b6ad9671ac	Add ZIL statistics. The performance of the ZIL is usually the main bottleneck when dealing with synchronous, write-heavy workloads (e.g. databases). Understanding the behavior of the ZIL is required to diagnose performance issues for these workloads, and to tune ZIL parameters (like zil_slog_limit) accordingly. This commit adds a new kstat page dedicated to the ZIL with some counters which, hopefully, scheds some light into what the ZIL is doing, and how it is doing it. Currently, these statistics are available in /proc/spl/kstat/zfs/zil. A description of the fields can be found in zil.h. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #786	2012-06-29 09:56:51 -07:00
Pawel Jakub Dawidek	0cee24064a	Speed up 'zfs list -t snapshot -o name -s name' FreeBSD #xxx: Dramatically optimize listing snapshots when user requests only snapshot names and wants to sort them by name, ie. when executes: # zfs list -t snapshot -o name -s name Because only name is needed we don't have to read all snapshot properties. Below you can find how long does it take to list 34509 snapshots from a single disk pool before and after this change with cold and warm cache: before: # time zfs list -t snapshot -o name -s name > /dev/null cold cache: 525s warm cache: 218s after: # time zfs list -t snapshot -o name -s name > /dev/null cold cache: 1.7s warm cache: 1.1s NOTE: This patch only appears in FreeBSD. If/when Illumos picks up the change we may want to drop this patch and adopt their version. However, for now this addresses a real issue. Ported-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #450	2012-06-14 09:49:04 -07:00
Darik Horn	74497b7ab6	Add zvol_inhibit_dev module option. ZoL can create more zvols at runtime than can be configured during system start, which hangs the init stack at reboot. When a slow system has more than a few hundred zvols, udev will fork bomb during system start and spend too much time in device detection routines, so upstart kills it. The zfs_inhibit_dev option allows an affected system to be rescued by skipping /dev/zd* creation and thereby avoiding the udev overload. All zvols are made inaccessible if this option is set, but the `zfs destroy` and `zfs send` commands still work, and ZFS filesystems can be mounted. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2012-06-13 17:05:16 -07:00
Richard Yao	e0093fea58	Linux 3.4 compat, __clear_close_on_exec replaces FD_CLR torvalds/linux@1dce27c5aa introduced __clear_close_on_exec() as a replacement for FD_CLR. Further commits appear to have removed FD_CLR from the Linux source tree. This causes the following failure: error: implicit declaration of function '__FD_CLR' [-Werror=implicit-function-declaration] To correct this we update the code to use the current __clear_close_on_exec() interface for readability. Then we introduce an autotools check to determine if __clear_close_on_exec() is available. If it isn't then we define some compatibility logic which used the older FD_CLR() interface. Signed-off-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #124	2012-06-13 16:18:51 -07:00
Brian Behlendorf	eaac9ba510	Fix uninit variable in slab reclaim test Gcc version 4.7.0 reports the delta.tv_sec in the slab reclaim test as potentially unitialized. In practice this will never occur but to keep gcc happy we initialize the variable to zero. Signed-off-by: Brian Behlendorf <behlendo@fedora-17-amd64.(none)>	2012-06-13 16:17:22 -07:00
Etienne Dechamps	ee191e802c	Make zil_slog_limit a tunable module parameter. zil_slog_limit specifies the maximum commit size to be written to the separate log device. Larger commits bypass the separate log device and go directly to the data devices. The optimal value for zil_slog_limit directly depends on the latency and throughput characteristics of both the separate log device and the data disks. Small synchronous writes are faster on low-latency separate log devices (e.g. SSDs) whereas large synchronous writes are faster on high-latency data disks (e.g. spindles) because of higher throughput, especially with a large array. The point is, the line between "small" and "large" synchronous writes in this scenario is heavily dependent on the hardware used. That's why it should be made configurable. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #783	2012-06-12 08:45:53 -07:00
Richard Yao	6a0936babc	Linux 3.4 compat, d_make_root() replaces d_alloc_root() torvalds/linux@adc0e91ab1 introduced introduced d_make_root() as a replacement for d_alloc_root(). Further commits appear to have removed d_alloc_root() from the Linux source tree. This causes the following failure: error: implicit declaration of function 'd_alloc_root' [-Werror=implicit-function-declaration] To correct this we update the code to use the current d_make_root() interface for readability. Then we introduce an autotools check to determine if d_make_root() is available. If it isn't then we define some compatibility logic which used the older d_alloc_root() interface. Signed-off-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #776	2012-06-11 10:04:49 -07:00
Etienne Dechamps	ab85f8455b	Honor logbias when writing to ZVOLs. The logbias option is not taken into account when writing to ZVOLs. We fix that by using the same logic as in the zfs filesystem write code (see zfs_log.c). Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #774	2012-06-11 09:43:48 -07:00
Brian Behlendorf	2371321e8a	Fix invalid context bug In the module unload path the vm_file_cache was being destroyed under a spin lock. Because this operation might sleep it was possible, although very very unlikely, that this could result in a deadlock. This issue was indentified by using a Linux debug kernel and has been fixed by moving the kmem_cache_destroy() out from under the spin lock. There is no need to lock this operation here. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes zfsonlinux/zfs#771	2012-06-11 09:17:45 -07:00
Jorgen Lundman	93b0dc92ea	Fix ARM 64-bit division Correctly implementating 64-bit division for ARM requires more than just providing the __aeabi_uldivmod() and __aeabi_ldivmod() symbols. They are need to be implemented is such a way that the quotient and remainder and left in specific registers after the division operation completes. This change updates the wrapper functions to accomplish this according to the official ARM Run-time ABI. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes zfsonlinux/zfs#706	2012-05-22 09:27:11 -07:00
Brian Behlendorf	38d31a1e57	Remove Solaris module emulation Originally I believed that these interfaces would be needed. However, in practice it turned out that it was more straight forward and maintainable to use the native Linux interfaces. As such, this is all dead code and can be safely removed. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #109	2012-05-18 13:57:44 -07:00
Prakash Surya	a9a7a01cf5	Add SPLAT test to exercise slab direct reclaim This test is designed to verify that direct reclaim is functioning as expected. We allocate a large number of objects thus creating a large number of slabs. We then apply memory pressure and expect that the direct reclaim path can easily recover those slabs. The registered reclaim function will free the objects and the slab shrinker will call it repeatedly until at least a single slab can be freed. Note it may not be possible to reclaim every last slab via direct reclaim without a failure because the shrinker_rwsem may be contended. For this reason, quickly reclaiming 3/4 of the slabs is considered a success. This should all be possible within 10 seconds. For reference, on a system with 2G of memory this test takes roughly 0.2 seconds to run. It may take longer on larger memory systems but should still easily complete in the alloted 10 seconds. Signed-off-by: Prakash Surya <surya1@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #107	2012-05-07 11:55:59 -07:00
Brian Behlendorf	b78d4b9d98	Ensure a minimum of one slab is reclaimed To minimize the chance of triggering an OOM during direct reclaim. The kmem caches have been improved to make a best effort to reclaim at least one slab when a reclaim function is registered. This helps avoid the case where objects are released but they are spread over multiple slabs so no memory gets reclaimed. Care has been taken to avoid deadlocking if the reclaim function is unable to make forward progress. Additionally, the reclaim function may be skipped entirely if there are already free slabs which can be safely reaped. Signed-off-by: Prakash Surya <surya1@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #107	2012-05-07 11:54:28 -07:00
Brian Behlendorf	06089b9e19	Ensure direct reclaim forward progress The Linux direct reclaim path uses this out of band value to determine if forward progress is being made. Normally this is incremented by kmem_freepages() which is part of the various Linux slab implementations. However, since we are using none of that infrastructure we're responsible for incrementing this count. If no forward progress is detected and a subsequent allocation fails the OOM killer will be invoked. If there was forward progress additional reclaim will be attempted via the page cache and registerd shrinker until the allocation succeeds. Signed-off-by: Prakash Surya <surya1@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #107	2012-05-07 11:54:19 -07:00
Prakash Surya	c0e0fc14e3	Ignore slab cache age and delay in direct reclaim When memory pressure triggers direct memory reclaim, a slabs age and delay should not prevent it from being freed. This patch ensures these values are ignored, allowing an empty slab to be freed in this code path no matter the value of its age and delay. This prevents needless scanning of the partial slabs and has been observed to significantly reduce the total cpu usage. In addition, it should allow for snappier reclaim under memory pressure. Signed-off-by: Prakash Surya <surya1@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #102	2012-05-07 11:50:04 -07:00
Prakash Surya	cef7605c34	Throttle number of freed slabs based on nr_to_scan Previously, the SPL tried to maintain Solaris semantics by freeing all available (empty) slabs from its slab caches when the shrinker was called. This is not desirable when running on Linux. To make the SPL shrinker more Linux friendly, the actual number of freed slabs from each of the slab caches is now derived from nr_to_scan and skc_slab_objs. Additionally, an accounting bug was fixed in spl_slab_reclaim() which could cause us to reclaim one more slab than requested. Signed-off-by: Prakash Surya <surya1@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #101	2012-05-07 11:46:15 -07:00
Jorgen Lundman	ef6f91ce0c	Add missing 64-bit divide for 32-bit ARM Leverage the existing generic 64-bit division operations which were originally implemented for x86 to support ARM. All that is required is to make the symbols available to the linker with the expected names. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2012-05-03 10:07:54 -07:00
Brian Behlendorf	710114089f	Revert "Disable direct reclaim on zvols" This reverts commit `ce90208cf9`. This change was observed to cause problems when using a zvol to back a VM under 2.6.32.59 kernels. This issue was filed as #710. Signed-off-by: Richard Yao <ryao@cs.stonybrook.edu> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #342 Issue #710	2012-04-30 14:26:49 -07:00
Brian Behlendorf	b39d3b9f7b	Linux 3.3 compat, iops->create()/mkdir()/mknod() The mode argument of iops->create()/mkdir()/mknod() was changed from an 'int' to a 'umode_t'. To prevent a compiler warning an autoconf check was added to detect the API change and then correctly set a zpl_umode_t typedef. There is no functional change. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #701	2012-04-30 12:52:38 -07:00
Richard Yao	ce90208cf9	Disable direct reclaim on zvols Previously, it was possible for the direct reclaim path to be invoked when a write to a zvol was made. When a zvol is used as a swap device, this often causes swap requests to depend on additional swap requests, which deadlocks. We address this by disabling the direct reclaim path on zvols. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #342	2012-04-30 11:25:36 -07:00

... 76 77 78 79 80 ...

4413 Commits