1
0
mirror of https://git.proxmox.com/git/mirror_zfs.git synced 2025-01-19 06:26:33 +03:00
mirror_zfs/man/man5/spl-module-parameters.5

358 lines
11 KiB
Groff
Raw Normal View History

'\" te
.\"
.\" Copyright 2013 Turbo Fredriksson <turbo@bayour.com>. All rights reserved.
.\"
.TH SPL-MODULE-PARAMETERS 5 "Oct 28, 2017"
.SH NAME
spl\-module\-parameters \- SPL module parameters
.SH DESCRIPTION
.sp
.LP
Description of the different parameters to the SPL module.
.SS "Module parameters"
.sp
.LP
.sp
.ne 2
.na
\fBspl_kmem_cache_expire\fR (uint)
.ad
.RS 12n
Cache expiration is part of default Illumos cache behavior. The idea is
that objects in magazines which have not been recently accessed should be
returned to the slabs periodically. This is known as cache aging and
when enabled objects will be typically returned after 15 seconds.
.sp
On the other hand Linux slabs are designed to never move objects back to
the slabs unless there is memory pressure. This is possible because under
Linux the cache will be notified when memory is low and objects can be
released.
.sp
By default only the Linux method is enabled. It has been shown to improve
responsiveness on low memory systems and not negatively impact the performance
of systems with more memory. This policy may be changed by setting the
\fBspl_kmem_cache_expire\fR bit mask as follows, both policies may be enabled
concurrently.
.sp
0x01 - Aging (Illumos), 0x02 - Low memory (Linux)
.sp
Default value: \fB0x02\fR
.RE
.sp
.ne 2
.na
\fBspl_kmem_cache_kmem_threads\fR (uint)
.ad
.RS 12n
The number of threads created for the spl_kmem_cache task queue. This task
queue is responsible for allocating new slabs for use by the kmem caches.
For the majority of systems and workloads only a small number of threads are
required.
.sp
Default value: \fB4\fR
.RE
.sp
.ne 2
.na
\fBspl_kmem_cache_reclaim\fR (uint)
.ad
.RS 12n
When this is set it prevents Linux from being able to rapidly reclaim all the
memory held by the kmem caches. This may be useful in circumstances where
it's preferable that Linux reclaim memory from some other subsystem first.
Setting this will increase the likelihood out of memory events on a memory
constrained system.
.sp
Default value: \fB0\fR
.RE
.sp
.ne 2
.na
\fBspl_kmem_cache_obj_per_slab\fR (uint)
.ad
.RS 12n
The preferred number of objects per slab in the cache. In general, a larger
value will increase the caches memory footprint while decreasing the time
required to perform an allocation. Conversely, a smaller value will minimize
the footprint and improve cache reclaim time but individual allocations may
take longer.
.sp
Refine slab cache sizing This change is designed to improve the memory utilization of slabs by more carefully setting their size. The way the code currently works is problematic for slabs which contain large objects (>1MB). This is due to slabs being unconditionally rounded up to a power of two which may result in unused space at the end of the slab. The reason the existing code rounds up every slab is because it assumes it will backed by the buddy allocator. Since the buddy allocator can only performs power of two allocations this is desirable because it avoids wasting any space. However, this logic breaks down if slab is backed by vmalloc() which operates at a page level granularity. In this case, the optimal thing to do is calculate the minimum required slab size given certain constraints (object size, alignment, objects/slab, etc). Therefore, this patch reworks the spl_slab_size() function so that it sizes KMC_KMEM slabs differently than KMC_VMEM slabs. KMC_KMEM slabs are rounded up to the nearest power of two, and KMC_VMEM slabs are allowed to be the minimum required size. This change also reduces the default number of objects per slab. This reduces how much memory a single cache object can pin, which can result in significant memory saving for highly fragmented caches. But depending on the workload it may result in slabs being allocated and freed more frequently. In practice, this has been shown to be a better default for most workloads. Also the maximum slab size has been reduced to 4MB on 32-bit systems. Due to the limited virtual address space it's critical the we be as frugal as possible. A limit of 4M still lets us reasonably comfortably allocate a limited number of 1MB objects. Finally, the kmem:slab_small and kmem:slab_large SPLAT tests were extended to provide better test coverage of various object sizes and alignments. Caches are created with random parameters and their basic functionality is verified by allocating several slabs worth of objects. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2014-12-16 01:06:18 +03:00
Default value: \fB8\fR
.RE
.sp
.ne 2
.na
\fBspl_kmem_cache_obj_per_slab_min\fR (uint)
.ad
.RS 12n
The minimum number of objects allowed per slab. Normally slabs will contain
\fBspl_kmem_cache_obj_per_slab\fR objects but for caches that contain very
large objects it's desirable to only have a few, or even just one, object per
slab.
.sp
Default value: \fB1\fR
.RE
.sp
.ne 2
.na
\fBspl_kmem_cache_max_size\fR (uint)
.ad
.RS 12n
The maximum size of a kmem cache slab in MiB. This effectively limits
the maximum cache object size to \fBspl_kmem_cache_max_size\fR /
\fBspl_kmem_cache_obj_per_slab\fR. Caches may not be created with
object sized larger than this limit.
.sp
Refine slab cache sizing This change is designed to improve the memory utilization of slabs by more carefully setting their size. The way the code currently works is problematic for slabs which contain large objects (>1MB). This is due to slabs being unconditionally rounded up to a power of two which may result in unused space at the end of the slab. The reason the existing code rounds up every slab is because it assumes it will backed by the buddy allocator. Since the buddy allocator can only performs power of two allocations this is desirable because it avoids wasting any space. However, this logic breaks down if slab is backed by vmalloc() which operates at a page level granularity. In this case, the optimal thing to do is calculate the minimum required slab size given certain constraints (object size, alignment, objects/slab, etc). Therefore, this patch reworks the spl_slab_size() function so that it sizes KMC_KMEM slabs differently than KMC_VMEM slabs. KMC_KMEM slabs are rounded up to the nearest power of two, and KMC_VMEM slabs are allowed to be the minimum required size. This change also reduces the default number of objects per slab. This reduces how much memory a single cache object can pin, which can result in significant memory saving for highly fragmented caches. But depending on the workload it may result in slabs being allocated and freed more frequently. In practice, this has been shown to be a better default for most workloads. Also the maximum slab size has been reduced to 4MB on 32-bit systems. Due to the limited virtual address space it's critical the we be as frugal as possible. A limit of 4M still lets us reasonably comfortably allocate a limited number of 1MB objects. Finally, the kmem:slab_small and kmem:slab_large SPLAT tests were extended to provide better test coverage of various object sizes and alignments. Caches are created with random parameters and their basic functionality is verified by allocating several slabs worth of objects. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2014-12-16 01:06:18 +03:00
Default value: \fB32 (64-bit) or 4 (32-bit)\fR
.RE
.sp
.ne 2
.na
\fBspl_kmem_cache_slab_limit\fR (uint)
.ad
.RS 12n
For small objects the Linux slab allocator should be used to make the most
efficient use of the memory. However, large objects are not supported by
the Linux slab and therefore the SPL implementation is preferred. This
value is used to determine the cutoff between a small and large object.
.sp
Objects of \fBspl_kmem_cache_slab_limit\fR or smaller will be allocated
using the Linux slab allocator, large objects use the SPL allocator. A
cutoff of 16K was determined to be optimal for architectures using 4K pages.
.sp
Default value: \fB16,384\fR
.RE
.sp
.ne 2
.na
\fBspl_kmem_cache_kmem_limit\fR (uint)
.ad
.RS 12n
Depending on the size of a cache object it may be backed by kmalloc()'d
or vmalloc()'d memory. This is because the size of the required allocation
greatly impacts the best way to allocate the memory.
.sp
When objects are small and only a small number of memory pages need to be
allocated, ideally just one, then kmalloc() is very efficient. However,
when allocating multiple pages with kmalloc() it gets increasingly expensive
because the pages must be physically contiguous.
.sp
For this reason we shift to vmalloc() for slabs of large objects which
which removes the need for contiguous pages. We cannot use vmalloc() in
all cases because there is significant locking overhead involved. This
function takes a single global lock over the entire virtual address range
which serializes all allocations. Using slightly different allocation
functions for small and large objects allows us to handle a wide range of
object sizes.
.sp
The \fBspl_kmem_cache_kmem_limit\fR value is used to determine this cutoff
size. One quarter the PAGE_SIZE is used as the default value because
\fBspl_kmem_cache_obj_per_slab\fR defaults to 16. This means that at
most we will need to allocate four contiguous pages.
.sp
Default value: \fBPAGE_SIZE/4\fR
.RE
Refactor generic memory allocation interfaces This patch achieves the following goals: 1. It replaces the preprocessor kmem flag to gfp flag mapping with proper translation logic. This eliminates the potential for surprises that were previously possible where kmem flags were mapped to gfp flags. 2. It maps vmem_alloc() allocations to kmem_alloc() for allocations sized less than or equal to the newly-added spl_kmem_alloc_max parameter. This ensures that small allocations will not contend on a single global lock, large allocations can still be handled, and potentially limited virtual address space will not be squandered. This behavior is entirely different than under Illumos due to different memory management strategies employed by the respective kernels. However, this functionally provides the semantics required. 3. The --disable-debug-kmem, --enable-debug-kmem (default), and --enable-debug-kmem-tracking allocators have been unified in to a single spl_kmem_alloc_impl() allocation function. This was done to simplify the code and make it more maintainable. 4. Improve portability by exposing an implementation of the memory allocations functions that can be safely used in the same way they are used on Illumos. Specifically, callers may safely use KM_SLEEP in contexts which perform filesystem IO. This allows us to eliminate an entire class of Linux specific changes which were previously required to avoid deadlocking the system. This change will be largely transparent to existing callers but there are a few caveats: 1. Because the headers were refactored and extraneous includes removed callers may find they need to explicitly add additional #includes. In particular, kmem_cache.h must now be explicitly includes to access the SPL's kmem cache implementation. This behavior is different from Illumos but it was done to avoid always masking the Linux slab functions when kmem.h is included. 2. Callers, like Lustre, which made assumptions about the definitions of KM_SLEEP, KM_NOSLEEP, and KM_PUSHPAGE will need to be updated. Other callers such as ZFS which did not will not require changes. 3. KM_PUSHPAGE is no longer overloaded to imply GFP_NOIO. It retains its original meaning of allowing allocations to access reserved memory. KM_PUSHPAGE callers can be converted back to KM_SLEEP. 4. The KM_NODEBUG flags has been retired and the default warning threshold increased to 32k. 5. The kmem_virt() functions has been removed. For callers which need to distinguish between a physical and virtual address use is_vmalloc_addr(). Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2014-12-08 23:37:14 +03:00
.sp
.ne 2
.na
\fBspl_kmem_alloc_warn\fR (uint)
.ad
.RS 12n
As a general rule kmem_alloc() allocations should be small, preferably
just a few pages since they must by physically contiguous. Therefore, a
rate limited warning will be printed to the console for any kmem_alloc()
which exceeds a reasonable threshold.
.sp
Refactor generic memory allocation interfaces This patch achieves the following goals: 1. It replaces the preprocessor kmem flag to gfp flag mapping with proper translation logic. This eliminates the potential for surprises that were previously possible where kmem flags were mapped to gfp flags. 2. It maps vmem_alloc() allocations to kmem_alloc() for allocations sized less than or equal to the newly-added spl_kmem_alloc_max parameter. This ensures that small allocations will not contend on a single global lock, large allocations can still be handled, and potentially limited virtual address space will not be squandered. This behavior is entirely different than under Illumos due to different memory management strategies employed by the respective kernels. However, this functionally provides the semantics required. 3. The --disable-debug-kmem, --enable-debug-kmem (default), and --enable-debug-kmem-tracking allocators have been unified in to a single spl_kmem_alloc_impl() allocation function. This was done to simplify the code and make it more maintainable. 4. Improve portability by exposing an implementation of the memory allocations functions that can be safely used in the same way they are used on Illumos. Specifically, callers may safely use KM_SLEEP in contexts which perform filesystem IO. This allows us to eliminate an entire class of Linux specific changes which were previously required to avoid deadlocking the system. This change will be largely transparent to existing callers but there are a few caveats: 1. Because the headers were refactored and extraneous includes removed callers may find they need to explicitly add additional #includes. In particular, kmem_cache.h must now be explicitly includes to access the SPL's kmem cache implementation. This behavior is different from Illumos but it was done to avoid always masking the Linux slab functions when kmem.h is included. 2. Callers, like Lustre, which made assumptions about the definitions of KM_SLEEP, KM_NOSLEEP, and KM_PUSHPAGE will need to be updated. Other callers such as ZFS which did not will not require changes. 3. KM_PUSHPAGE is no longer overloaded to imply GFP_NOIO. It retains its original meaning of allowing allocations to access reserved memory. KM_PUSHPAGE callers can be converted back to KM_SLEEP. 4. The KM_NODEBUG flags has been retired and the default warning threshold increased to 32k. 5. The kmem_virt() functions has been removed. For callers which need to distinguish between a physical and virtual address use is_vmalloc_addr(). Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2014-12-08 23:37:14 +03:00
The default warning threshold is set to eight pages but capped at 32K to
accommodate systems using large pages. This value was selected to be small
enough to ensure the largest allocations are quickly noticed and fixed.
But large enough to avoid logging any warnings when a allocation size is
larger than optimal but not a serious concern. Since this value is tunable,
developers are encouraged to set it lower when testing so any new largish
allocations are quickly caught. These warnings may be disabled by setting
the threshold to zero.
.sp
Default value: \fB32,768\fR
Refactor generic memory allocation interfaces This patch achieves the following goals: 1. It replaces the preprocessor kmem flag to gfp flag mapping with proper translation logic. This eliminates the potential for surprises that were previously possible where kmem flags were mapped to gfp flags. 2. It maps vmem_alloc() allocations to kmem_alloc() for allocations sized less than or equal to the newly-added spl_kmem_alloc_max parameter. This ensures that small allocations will not contend on a single global lock, large allocations can still be handled, and potentially limited virtual address space will not be squandered. This behavior is entirely different than under Illumos due to different memory management strategies employed by the respective kernels. However, this functionally provides the semantics required. 3. The --disable-debug-kmem, --enable-debug-kmem (default), and --enable-debug-kmem-tracking allocators have been unified in to a single spl_kmem_alloc_impl() allocation function. This was done to simplify the code and make it more maintainable. 4. Improve portability by exposing an implementation of the memory allocations functions that can be safely used in the same way they are used on Illumos. Specifically, callers may safely use KM_SLEEP in contexts which perform filesystem IO. This allows us to eliminate an entire class of Linux specific changes which were previously required to avoid deadlocking the system. This change will be largely transparent to existing callers but there are a few caveats: 1. Because the headers were refactored and extraneous includes removed callers may find they need to explicitly add additional #includes. In particular, kmem_cache.h must now be explicitly includes to access the SPL's kmem cache implementation. This behavior is different from Illumos but it was done to avoid always masking the Linux slab functions when kmem.h is included. 2. Callers, like Lustre, which made assumptions about the definitions of KM_SLEEP, KM_NOSLEEP, and KM_PUSHPAGE will need to be updated. Other callers such as ZFS which did not will not require changes. 3. KM_PUSHPAGE is no longer overloaded to imply GFP_NOIO. It retains its original meaning of allowing allocations to access reserved memory. KM_PUSHPAGE callers can be converted back to KM_SLEEP. 4. The KM_NODEBUG flags has been retired and the default warning threshold increased to 32k. 5. The kmem_virt() functions has been removed. For callers which need to distinguish between a physical and virtual address use is_vmalloc_addr(). Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2014-12-08 23:37:14 +03:00
.RE
.sp
.ne 2
.na
\fBspl_kmem_alloc_max\fR (uint)
.ad
.RS 12n
Large kmem_alloc() allocations will fail if they exceed KMALLOC_MAX_SIZE.
Allocations which are marginally smaller than this limit may succeed but
should still be avoided due to the expense of locating a contiguous range
of free pages. Therefore, a maximum kmem size with reasonable safely
margin of 4x is set. Kmem_alloc() allocations larger than this maximum
will quickly fail. Vmem_alloc() allocations less than or equal to this
value will use kmalloc(), but shift to vmalloc() when exceeding this value.
.sp
Default value: \fBKMALLOC_MAX_SIZE/4\fR
Refactor generic memory allocation interfaces This patch achieves the following goals: 1. It replaces the preprocessor kmem flag to gfp flag mapping with proper translation logic. This eliminates the potential for surprises that were previously possible where kmem flags were mapped to gfp flags. 2. It maps vmem_alloc() allocations to kmem_alloc() for allocations sized less than or equal to the newly-added spl_kmem_alloc_max parameter. This ensures that small allocations will not contend on a single global lock, large allocations can still be handled, and potentially limited virtual address space will not be squandered. This behavior is entirely different than under Illumos due to different memory management strategies employed by the respective kernels. However, this functionally provides the semantics required. 3. The --disable-debug-kmem, --enable-debug-kmem (default), and --enable-debug-kmem-tracking allocators have been unified in to a single spl_kmem_alloc_impl() allocation function. This was done to simplify the code and make it more maintainable. 4. Improve portability by exposing an implementation of the memory allocations functions that can be safely used in the same way they are used on Illumos. Specifically, callers may safely use KM_SLEEP in contexts which perform filesystem IO. This allows us to eliminate an entire class of Linux specific changes which were previously required to avoid deadlocking the system. This change will be largely transparent to existing callers but there are a few caveats: 1. Because the headers were refactored and extraneous includes removed callers may find they need to explicitly add additional #includes. In particular, kmem_cache.h must now be explicitly includes to access the SPL's kmem cache implementation. This behavior is different from Illumos but it was done to avoid always masking the Linux slab functions when kmem.h is included. 2. Callers, like Lustre, which made assumptions about the definitions of KM_SLEEP, KM_NOSLEEP, and KM_PUSHPAGE will need to be updated. Other callers such as ZFS which did not will not require changes. 3. KM_PUSHPAGE is no longer overloaded to imply GFP_NOIO. It retains its original meaning of allowing allocations to access reserved memory. KM_PUSHPAGE callers can be converted back to KM_SLEEP. 4. The KM_NODEBUG flags has been retired and the default warning threshold increased to 32k. 5. The kmem_virt() functions has been removed. For callers which need to distinguish between a physical and virtual address use is_vmalloc_addr(). Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2014-12-08 23:37:14 +03:00
.RE
.sp
.ne 2
.na
\fBspl_kmem_cache_magazine_size\fR (uint)
.ad
.RS 12n
Cache magazines are an optimization designed to minimize the cost of
allocating memory. They do this by keeping a per-cpu cache of recently
freed objects, which can then be reallocated without taking a lock. This
can improve performance on highly contended caches. However, because
objects in magazines will prevent otherwise empty slabs from being
immediately released this may not be ideal for low memory machines.
.sp
For this reason \fBspl_kmem_cache_magazine_size\fR can be used to set a
maximum magazine size. When this value is set to 0 the magazine size will
be automatically determined based on the object size. Otherwise magazines
will be limited to 2-256 objects per magazine (i.e per cpu). Magazines
may never be entirely disabled in this implementation.
.sp
Default value: \fB0\fR
.RE
.sp
.ne 2
.na
\fBspl_hostid\fR (ulong)
.ad
.RS 12n
The system hostid, when set this can be used to uniquely identify a system.
By default this value is set to zero which indicates the hostid is disabled.
It can be explicitly enabled by placing a unique non-zero value in
\fB/etc/hostid/\fR.
.sp
Default value: \fB0\fR
.RE
.sp
.ne 2
.na
\fBspl_hostid_path\fR (charp)
.ad
.RS 12n
The expected path to locate the system hostid when specified. This value
may be overridden for non-standard configurations.
.sp
Default value: \fB/etc/hostid\fR
.RE
.sp
.ne 2
.na
\fBspl_panic_halt\fR (uint)
.ad
.RS 12n
Cause a kernel panic on assertion failures. When not enabled, the thread is
halted to facilitate further debugging.
.sp
Set to a non-zero value to enable.
.sp
Default value: \fB0\fR
.RE
.sp
.ne 2
.na
\fBspl_taskq_kick\fR (uint)
.ad
.RS 12n
Kick stuck taskq to spawn threads. When writing a non-zero value to it, it will
scan all the taskqs. If any of them have a pending task more than 5 seconds old,
it will kick it to spawn more threads. This can be used if you find a rare
deadlock occurs because one or more taskqs didn't spawn a thread when it should.
.sp
Default value: \fB0\fR
.RE
.sp
.ne 2
.na
\fBspl_taskq_thread_bind\fR (int)
.ad
.RS 12n
Bind taskq threads to specific CPUs. When enabled all taskq threads will
be distributed evenly over the available CPUs. By default, this behavior
is disabled to allow the Linux scheduler the maximum flexibility to determine
where a thread should run.
.sp
Default value: \fB0\fR
.RE
.sp
.ne 2
.na
\fBspl_taskq_thread_dynamic\fR (int)
.ad
.RS 12n
Allow dynamic taskqs. When enabled taskqs which set the TASKQ_DYNAMIC flag
will by default create only a single thread. New threads will be created on
demand up to a maximum allowed number to facilitate the completion of
outstanding tasks. Threads which are no longer needed will be promptly
destroyed. By default this behavior is enabled but it can be disabled to
aid performance analysis or troubleshooting.
.sp
Default value: \fB1\fR
.RE
.sp
.ne 2
.na
\fBspl_taskq_thread_priority\fR (int)
.ad
.RS 12n
Allow newly created taskq threads to set a non-default scheduler priority.
When enabled the priority specified when a taskq is created will be applied
to all threads created by that taskq. When disabled all threads will use
the default Linux kernel thread priority. By default, this behavior is
enabled.
.sp
Default value: \fB1\fR
.RE
.sp
.ne 2
.na
\fBspl_taskq_thread_sequential\fR (int)
.ad
.RS 12n
The number of items a taskq worker thread must handle without interruption
before requesting a new worker thread be spawned. This is used to control
how quickly taskqs ramp up the number of threads processing the queue.
Because Linux thread creation and destruction are relatively inexpensive a
small default value has been selected. This means that normally threads will
be created aggressively which is desirable. Increasing this value will
result in a slower thread creation rate which may be preferable for some
configurations.
.sp
Default value: \fB4\fR
.RE
.sp
.ne 2
.na
\fBspl_max_show_tasks\fR (uint)
.ad
.RS 12n
The maximum number of tasks per pending list in each taskq shown in
/proc/spl/{taskq,taskq-all}. Write 0 to turn off the limit. The proc file will
walk the lists with lock held, reading it could cause a lock up if the list
grow too large without limiting the output. "(truncated)" will be shown if the
list is larger than the limit.
.sp
Default value: \fB512\fR
.RE