The __get_free_pages() function must be used in place of kmalloc()
to ensure the __GFP_COMP is strictly honored. This is due to
kmalloc() being layered on the generic Linux slab caches. It
wasn't until recently that all caches were created using __GFP_COMP.
This means that it is possible for a kmalloc() which passed the
__GFP_COMP flag to be returned a non-compound allocation.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
This change is designed to improve the memory utilization of
slabs by more carefully setting their size. The way the code
currently works is problematic for slabs which contain large
objects (>1MB). This is due to slabs being unconditionally
rounded up to a power of two which may result in unused space
at the end of the slab.
The reason the existing code rounds up every slab is because it
assumes it will backed by the buddy allocator. Since the buddy
allocator can only performs power of two allocations this is
desirable because it avoids wasting any space. However, this
logic breaks down if slab is backed by vmalloc() which operates
at a page level granularity. In this case, the optimal thing to
do is calculate the minimum required slab size given certain
constraints (object size, alignment, objects/slab, etc).
Therefore, this patch reworks the spl_slab_size() function so
that it sizes KMC_KMEM slabs differently than KMC_VMEM slabs.
KMC_KMEM slabs are rounded up to the nearest power of two, and
KMC_VMEM slabs are allowed to be the minimum required size.
This change also reduces the default number of objects per slab.
This reduces how much memory a single cache object can pin, which
can result in significant memory saving for highly fragmented
caches. But depending on the workload it may result in slabs
being allocated and freed more frequently. In practice, this
has been shown to be a better default for most workloads.
Also the maximum slab size has been reduced to 4MB on 32-bit
systems. Due to the limited virtual address space it's critical
the we be as frugal as possible. A limit of 4M still lets us
reasonably comfortably allocate a limited number of 1MB objects.
Finally, the kmem:slab_small and kmem:slab_large SPLAT tests
were extended to provide better test coverage of various object
sizes and alignments. Caches are created with random parameters
and their basic functionality is verified by allocating several
slabs worth of objects.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
This patch achieves the following goals:
1. It replaces the preprocessor kmem flag to gfp flag mapping with
proper translation logic. This eliminates the potential for
surprises that were previously possible where kmem flags were
mapped to gfp flags.
2. It maps vmem_alloc() allocations to kmem_alloc() for allocations
sized less than or equal to the newly-added spl_kmem_alloc_max
parameter. This ensures that small allocations will not contend
on a single global lock, large allocations can still be handled,
and potentially limited virtual address space will not be squandered.
This behavior is entirely different than under Illumos due to
different memory management strategies employed by the respective
kernels. However, this functionally provides the semantics required.
3. The --disable-debug-kmem, --enable-debug-kmem (default), and
--enable-debug-kmem-tracking allocators have been unified in to
a single spl_kmem_alloc_impl() allocation function. This was
done to simplify the code and make it more maintainable.
4. Improve portability by exposing an implementation of the memory
allocations functions that can be safely used in the same way
they are used on Illumos. Specifically, callers may safely
use KM_SLEEP in contexts which perform filesystem IO. This
allows us to eliminate an entire class of Linux specific changes
which were previously required to avoid deadlocking the system.
This change will be largely transparent to existing callers but there
are a few caveats:
1. Because the headers were refactored and extraneous includes removed
callers may find they need to explicitly add additional #includes.
In particular, kmem_cache.h must now be explicitly includes to
access the SPL's kmem cache implementation. This behavior is
different from Illumos but it was done to avoid always masking
the Linux slab functions when kmem.h is included.
2. Callers, like Lustre, which made assumptions about the definitions
of KM_SLEEP, KM_NOSLEEP, and KM_PUSHPAGE will need to be updated.
Other callers such as ZFS which did not will not require changes.
3. KM_PUSHPAGE is no longer overloaded to imply GFP_NOIO. It retains
its original meaning of allowing allocations to access reserved
memory. KM_PUSHPAGE callers can be converted back to KM_SLEEP.
4. The KM_NODEBUG flags has been retired and the default warning
threshold increased to 32k.
5. The kmem_virt() functions has been removed. For callers which
need to distinguish between a physical and virtual address use
is_vmalloc_addr().
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Address all cstyle issues in the kmem, vmem, and kmem_cache source
and headers. This will done to make it easier to review subsequent
changes which will rework the kmem/vmem implementation.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
This change introduces no functional changes to the memory management
interfaces. It only restructures the existing codes by separating the
kmem, vmem, and kmem cache implementations in the separate source and
header files.
Splitting this functionality in to separate files required the addition
of spl_vmem_{init,fini}() and spl_kmem_cache_{initi,fini}() functions.
Additionally, several minor changes to the #include's were required to
accommodate the removal of extraneous header from kmem.h.
But again, while large this patch introduces no functional changes.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>