Platforms such as Illumos and FreeBSD have historically provided
global variables which summerize the memory state of a system.
Linux on the otherhand doesn't expose any of this information
to kernel modules and uses entirely different mechanisms for
memory management.
In order to simplify the original ZFS port to Linux these global
variables were emulated by the SPL for the benefit of ZFS. As ZoL
has matured over the years it has moved steadily away from these
interfaces and now no longer depends on them at all.
Therefore, this patch completely removes the global variables
availrmem, minfree, desfree, lotsfree, needfree, swapfs_minfree,
and swapfs_reserve. This greatly simplifies the memory management
code and eliminates a common area of confusion.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
The get_vmalloc_info() function was used to back the vmem_size()
function. This was always problematic and resulted in brittle
code because the kernel never provided a clean interface for
modules.
However, it turns out that the only caller of this function in
ZFS uses it to determine the total virtual address space size.
This can be determined easily without get_vmalloc_info() so
vmem_size() has been updated to take this approach which allows
us to shed the get_vmalloc_info() dependency.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
The on_each_cpu() function has been available since Linux 2.6.27.
There is no longer a need to maintain this compatibility code.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
The mutex_lock_nested() function has been available since Linux 2.6.18.
There is no longer a need to maintain this compatibility code.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
The inode structure has used i_mutex as its internal locking
primitive since 2.6.16. The compatibility code to check for
the previous semaphore primitive has been removed. However,
the wrapper function itself is being kept because it's entirely
possible this primitive will change again to allow finer grained
locking.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
The kmalloc_node() function has been available since Linux 2.6.12.
There is no longer a need to maintain this compatibility code.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
The uaccess header has been available in the same location since
Linux 2.6.18. There is no longer a need to maintain this
compatibility code.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
The uintptr_t typedef has been available since Linux 2.6.24.
There is no longer a need to maintain this compatibility code.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
The atomic64_xchg() and atomic64_cmpxchg() functions have been
available since Linux 2.6.24. There is no longer a need to
maintain this compatibility code.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Many of the time functions had grown overly complex in order to
handle kernel compatibility issues. However, as of Linux 2.6.26
all the required functionality is available. This allows us to
retire numerous configure checks and greatly simplify the time
compatibility wrappers.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
The fls64() function has been available since Linux 2.6.16 and
it should be used to implemented highbit64(). This allows us
to provide an optimized implementation and simplify the code.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Support for the CTL_UNNUMBERED sysctl interface was removed in
Linux 2.6.19. There is no longer any reason to maintain this
compatibility code. There also issue any reason to keep around
the CTL_NAME macro and helpers so they have been retired.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
The register_sysctl() interface has been stable since Linux 2.6.21.
There is no longer a need to maintain compatibility code.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
There is no longer a need to wrap this because utsname() is provided
by the kernel and can be called directly. This will require a small
change in the ZFS code because utsname is expected to be a global
structure and not a function.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Since the Linux 2.6.29 kernel all mutexes have been adaptive mutexs.
There is no longer any point in keeping this code so it is being
removed to simplify the code.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
When the SPL was originally written it was designed to use the
device_create() and device_destroy() functions. Unfortunately,
these functions changed considerably over the years making them
difficult to rely on.
As it turns out a better choice would have been to use the
misc_register()/misc_deregister() functions. This interface
for registering character devices has remained stable, is simple,
and provides everything we need.
Therefore the code has been reworked to use this interface. The
higher level ZFS code has always depended on these same interfaces
so this is also as a step towards minimizing our kernel dependencies.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Apply the license specified in the META file to ensure the
compatibility checks are all performed consistently.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
zfsonlinux/spl#bcb15891ab394e11615eee08bba1fd85ac32e158 implemented
Linux 3.6+ support by adding duplicate vn_rename and vn_remove
functions. The new ones were cleaner, but the duplicate functions made
the codebase less maintainable. This adds some compatibility shims that
allow us to retire the older vn_rename and vn_remove in favor of the new
ones on old kernels. The result is a net 143 line reduction in lines of
code and a cleaner codebase.
Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#370
Linux kernel 3.17 removes the action function argument from
wait_on_bit(). Add autoconf test and compatibility macro to support
the new interface.
The former "wait_on_bit" interface required an 'action' function to
be provided which does the actual waiting. There were over 20 such
functions in the kernel, many of them identical, though most cases
can be satisfied by one of just two functions: one which uses
io_schedule() and one which just uses schedule(). This API change
was made to consolidate all of those redundant wait functions.
References: torvalds/linux@7431620
Signed-off-by: Ned Bass <bass6@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#378
Set LANG=C before calling 'rpmbuild' to avoid rpmbuild failing on
the translated date string in the changelog.
Signed-off-by: Turbo Fredriksson <turbo@bayour.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#306
When creating packages in a git repository the release number
can be automatically set by 'git describe'. This normally works
well but if your repository has newer tags which match the form
NAME-VERSION* the release may be incorrectly calculated. To
prevent this the match patten has been restricted to NAME-VERSION.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
For small objects the Linux slab allocator has several advantages
over its counterpart in the SPL. These include:
1) It is more memory-efficient and packs objects more tightly.
2) It is continually tuned to maximize performance.
Therefore it makes sense to layer the SPLs slab allocator on top
of the Linux slab allocator. This allows us to leverage the
advantages above while preserving the Illumos semantics we depend
on. However, there are some things we need to be careful of:
1) The Linux slab allocator was never designed to work well with
large objects. Because the SPL slab must still handle this use
case a cut off limit was added to transition from Linux slab
backed objects to kmem or vmem backed slabs.
spl_kmem_cache_slab_limit - Objects less than or equal to this
size in bytes will be backed by the Linux slab. By default
this value is zero which disables the Linux slab functionality.
Reasonable values for this cut off limit are in the range of
4096-16386 bytes.
spl_kmem_cache_kmem_limit - Objects less than or equal to this
size in bytes will be backed by a kmem slab. Objects over this
size will be vmem backed instead. This value defaults to
1/8 a page, or 512 bytes on an x86_64 architecture.
2) Be aware that using the Linux slab may inadvertently introduce
new deadlocks. Care has been taken previously to ensure that
all allocations which occur in the write path use GFP_NOIO.
However, there may be internal allocations performed in the
Linux slab which do not honor these flags. If this is the case
a deadlock may occur.
The path forward is definitely to start relying on the Linux slab.
But for that to happen we need to start building confidence that
there aren't any unexpected surprises lurking for us. And ideally
need to move completely away from using the SPLs slab for large
memory allocations. This patch is a first step.
NOTES:
1) The KMC_NOMAGAZINE flag was leveraged to support the Linux slab
backed caches but it is not supported for kmem/vmem backed caches.
2) Regardless of the spl_kmem_cache_*_limit settings a cache may
be explicitly set to a given type by passed the KMC_KMEM,
KMC_VMEM, or KMC_SLAB flags during cache creation.
3) The constructors, destructors, and reclaim callbacks are all
functional and will be called regardless of the cache type.
4) KMC_SLAB caches will not appear in /proc/spl/kmem/slab due to
the issues involved in presenting correct object accounting.
Instead they will appear in /proc/slabinfo under the same names.
5) Several kmem SPLAT tests needed to be fixed because they relied
incorrectly on internal kmem slab accounting. With the updated
test cases all the SPLAT tests pass as expected.
6) An autoconf test was added to ensure that the __GFP_COMP flag
was correctly added to the default flags used when allocating
a slab. This is required to ensure all pages in higher order
slabs are properly refcounted, see ae16ed9.
7) When using the SLUB allocator there is no need to attempt to
set the __GFP_COMP flag. This has been the default behavior
for the SLUB since Linux 2.6.25.
8) When using the SLUB it may be desirable to set the slub_nomerge
kernel parameter to prevent caches from being merged.
Original-patch-by: DHE <git@dehacked.net>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Prakash Surya <surya1@llnl.gov>
Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: DHE <git@dehacked.net>
Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Closes#356
Detect the updated vfs_rename() interface and call it with an
extra flags argument.
References:
torvalds/linux@520c8b1
Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #355
This check was originally added for SLES10, a093c6a, to check for
a 'struct vfsmount *' argument which they added. However, since
SLES10 is based on a 2.6.16 kernel which is no longer supported
this functionality was dropped. The checks were refactored to
support Linux 3.13 without concern for historical versions.
Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#312
torvalds/linux@24f7c6 introduced a new shrinker API while
torvalds/linux@a0b021 dropped support for the old shrinker API.
This patch adds support for the new shrinker API by wrapping
the old one with the new one.
This change also reorganizes the autotools checks on the shrinker
API such that the configure script will fail early if an unknown
API is encountered in the future.
Support for the set_shrinker() API which was used by Linux 2.6.22
and older has been dropped. As a general rule compatibility is
only maintained back to Linux 2.6.26.
Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes zfsonlinux/zfs#1732
Closes zfsonlinux/zfs#1822
Closes#293Closes#307
Needed for Illumos #3582. This interface is supposed to support
a variable-resolution timeout with nanosecond granularity. This
implementation rounds up to microsecond resolution, as nanosecond-
precision timing is rarely needed for real-world performance
tuning and may incur unnecessary busy-waiting. usleep_range() is
used if available, otherwise udelay() or msleep() are used
depending on the length of the delay interval.
Add flags from sys/callo.h as these are used to control the behavior of
cv_timedwait_hires(). Specifically,
CALLOUT_FLAG_ABSOLUTE
Normally, the expiration passed to the timeout API functions is
an expiration interval. If this flag is specified, then it is
interpreted as the expiration time itself.
CALLOUT_FLAG_ROUNDUP
Roundup the expiration time to the next resolution boundary. If this
flag is not specified, the expiration time is rounded down.
References:
https://www.illumos.org/issues/3582illumos/illumos-gate@0689f76
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#304
When CONFIG_UIDGID_STRICT_TYPE_CHECKS is enabled uid_t/git_t are
replaced by kuid_t/kgid_t, which are structures instead of integral
types. This causes any code that uses an integral type to fail to build.
The User Namespace functionality introduced in Linux 3.8 requires
CONFIG_UIDGID_STRICT_TYPE_CHECKS, so we could not build against any
kernel that supported it.
We resolve this by converting between the new kuid_t/kgid_t structures
and the original uid_t/gid_t types.
Original-patch-by: DHE
Rewrite-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#260
Linux kernel commit torvalds/linux@d9dda78b renamed PDE() to
PDE_DATA(). To handle this detect the prefered interface
and define a PDE_DATA() wrapper for consistency.
Signed-off-by: Yuxuan Shui <yshuiv7@gmail.com>
Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #257
Linux kernel commmit torvalds/linux@db3808c1 moved the
vmalloc_info structure from a private to a public header.
Now that it's available for kernel modules use it.
Signed-off-by: Yuxuan Shui <yshuiv7@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #257
Preserve the release field when creating Debian packages. The
--keep-version option was not used because it results in a failure
when the git '<commit>_<hash>' syntax is used for the release.
The '_' is a valid character for RPM packages but not for DEBs.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Turbo Fredriksson <turbo@bayour.com>
Issue zfsonlinux/zfs#1402
Issue zfsonlinux/zfs#928
When building a custom release in a git tree provide the ability
to prevent the release field from being overwritten by the
`git describe` output.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue zfsonlinux/zfs#1402
The only remaining perl dependency is part of the SPL_AC_META macro.
By eliminating this and replacing it with awk we can avoid the need
to pull in perl to rebuild the packages.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue zfsonlinux/zfs#1380
Rationale see section 3.5 "Using `autoreconf' to Update `configure'
Scripts" of the autoconf manual.
Signed-off-by: Jan Engelhardt <jengelh@inai.de>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
-D and -I are preprocessor flags, so should preferably be in the
appropriate variable.
Signed-off-by: Jan Engelhardt <jengelh@inai.de>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
When building from an arbitrary commit in the git tree it's useful
for the resulting packages to be uniquely identifiable. Therefore,
the build system has been updated to detect if your compiling in
git tree.
If you are building in a git tree, and there are commits after the
last annotated tag. Then the <id>-<hash> component of 'git describe'
will be used to overwrite the 'Release:' field in the META file.
The only tricky part is that to ensure the 'make dist' tarball is
built using the correct release. A dist-hook was added to the top
level make file to rewrite the META file using the correct release.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#195
Issue #111
Refresh the existing RPM packaging to conform to the 'Fedora
Packaging Guidelines'. This includes adopting the kmods2
packaging standard which is used fod kmods distributed by
rpmfusion for Fedora/RHEL.
http://fedoraproject.org/wiki/Packaging:Guidelineshttp://rpmfusion.org/Packaging/KernelModules/Kmods2
While the spec files have been entirely rewritten from a
user perspective the only major changes are:
* The Fedora packages now have a build dependency on the
rpmfusion repositories. The generic kmod packages also
have a new dependency on kmodtool-1.22 but it is bundled
with the source rpm so no additional packages are needed.
* The kernel binary module packages have been renamed from
spl-modules-* to kmod-spl-* as specificed by kmods2.
* The is now a common kmod-spl-devel-* package in addition
to the per-kernel devel packages. The common package
contains the development headers while the per-kernel
package contains kernel specific build products.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#222
This was a suggestion that Brian Behlendorf made when reviewing an early
pull request for Linux 3.9 support. This commit was made intentionally
easy to revert should we ever have a reason to reintroduce support for
older kernels.
Signed-off-by: Richard Yao <ryao@cs.stonybrook.edu>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
torvalds/linux@dcf787f391 enforces
const-correctness in passing struct path *.
Signed-off-by: Richard Yao <ryao@cs.stonybrook.edu>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
The function prototype of vfs_getattr previoulsy took struct vfsmount *
and struct dentry * as arguments. These would always be defined together
in a struct path *.
torvalds/linux@3dadecce20 modified
vfs_getattr to take struct path * is taken as an argument instead.
Signed-off-by: Richard Yao <ryao@cs.stonybrook.edu>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Linux 3.9 reorganized sched.h, splitting it into numerous files.
torvalds/linux@8bd75c77b7 moved MAX_PRIO
and MAX_RT_PRIO to linux/sched/rt.h.
Signed-off-by: Richard Yao <ryao@cs.stonybrook.edu>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
The kernel modules are now available in the Arch User Repository
(AUR) via zfs. Since their packaging is maintained and superior
to ours it is being removed from the tree.
https://wiki.archlinux.org/index.php/ZFS
Now that various distributions are picking up the packages we
should eventually be able to remove most of this infrastructure.
Packaging belongs with the distributions not upstream.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
The HAVE_MUTEX_OWNER_TASK_STRUCT fails on PPC64 with the following
error:
error: 'current' undeclared (first use in this function)
We include linux/sched.h to ensure that current is available.
Signed-off-by: Richard Yao <ryao@cs.stonybrook.edu>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
The SPL_AC_ATOMIC_SPINLOCK, SPL_AC_TYPE_ATOMIC64_CMPXCHG, and
SPL_AC_TYPE_ATOMIC64_XCHG were all directly including the
'asm/atomic.h' header. As of Linux 3.4 this header was removed
which results in a build failure.
The right thing to do is include 'linux/atomic.h' however we
can't safely do this because it doesn't exist in 2.6.26 kernels.
Therefore, we include 'linux/fs.h' which in turn includes the
correct atomic header regardless of the kernel version.
When these incorrect APIs are used in ZFS the following build
failure results.
arc.c:791:80: warning: '__ret' may be used uninitialized
in this function [-Wuninitialized]
arc.c:791:1875: error: call to '__cmpxchg_wrong_size'
declared with attribute error: Bad argument size for cmpxchg
Since this is all Linux 2.6.24 compatibility code there's
an argument to be made that it should be removed because
kernels this old are not supported. However, because we're
so close to a release I'm going to leave it in place for now.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closeszfsonlinux/zfs#814Closeszfsonlinux/zfs#1254
Check at ./configure time that the kernel was built with kallsyms
support. If the kernel doesn't have CONFIG_KALLSYMS defined the
modules will still compile cleanly but will not be loadable. So
we really want to catch this early during ./configure. Note that
we do not require CONFIG_KALLSYMS_ALL but it may be safely defined.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#6
This functionality is no longer required by ZFS, see commit
zfsonlinux/zfs@7b3e34ba5a.
Since there are no other consumers, and because it adds
additional autoconf complexity which must be maintained
the spl_invalidate_inodes() function has been removed.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue zfsonlinux/zfs#795
Check at ./configure time that the kernel was built with zlib
support enabled. This support may either be configured as a
module or builtin to the kernel. But if it's missing the build
will fail so it's best to catch this early.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closeszfsonlinux/zfs#582
Some kernels require that we include the 'linux/irqflags.h'
header for the SPL_AC_3ARGS_ON_EACH_CPU check. Otherwise,
the functions local_irq_enable()/local_irq_disable() will not
be defined and the prototype will be misdetected as the four
argument version.
This change actually include 'linux/interrupt.h' which in turn
includes 'linux/irqflags.h' to be as generic as possible.
Additionally, passing NULL as the function can result in a
gcc error because the on_each_cpu() macro executes it
unconditionally. To make the test more robust we pass the
dummy function on_each_cpu_func().
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#204
In the upstream kernel the FALLOC_FL_PUNCH_HOLE #define was
introduced after the fallocate() function was moved from the
inode_operations to the file_operations structure. Therefore,
the SPL code assumed that if FALLOC_FL_PUNCH_HOLE was defined
it was safe to use f_ops->fallocate().
Unfortunately, the RHEL6.4 kernel has only backported the
FALLOC_FL_PUNCH_HOLE #define and not the fallocate() change.
To address this compatibility issue the spl_filp_fallocate()
helper function was added to properly detect which interface
is available.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>