/*
 * Copyright (C) 2007-2010 Lawrence Livermore National Security, LLC.
 * Copyright (C) 2007 The Regents of the University of California.
 * Produced at Lawrence Livermore National Laboratory (cf, DISCLAIMER).
 * Written by Brian Behlendorf <behlendorf1@llnl.gov>.
 * UCRL-CODE-235197
 *
 * This file is part of the SPL, Solaris Porting Layer.
 * For details, see <http://zfsonlinux.org/>.
 *
 * The SPL is free software; you can redistribute it and/or modify it
 * under the terms of the GNU General Public License as published by the
 * Free Software Foundation; either version 2 of the License, or (at your
 * option) any later version.
 *
 * The SPL is distributed in the hope that it will be useful, but WITHOUT
 * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
 * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
 * for more details.
 *
 * You should have received a copy of the GNU General Public License along
 * with the SPL. If not, see <http://www.gnu.org/licenses/>.
 */

#ifndef _SPL_KMEM_CACHE_H
#define _SPL_KMEM_CACHE_H

#include <sys/taskq.h>

/*
 * Slab allocation interfaces. The SPL slab differs from the standard
 * Linux SLAB or SLUB primarily in that each cache may be backed by slabs
 * allocated from the physical or virtual memory address space. The virtual
 * slabs allow for good behavior when allocating large objects of identical
 * size. This slab implementation also supports both constructors and
 * destructors which the Linux slab does not.
 */
Name anonymous enum of KMC_BIT constants
Giving a name to this enum makes it discoverable from
debugging tools like DRGN and SDB. For example, with
the name proposed on this patch we can iterate over
these values in DRGN:
```
>>> prog.type('enum kmc_bit').enumerators
(('KMC_BIT_NOTOUCH', 0), ('KMC_BIT_NODEBUG', 1),
('KMC_BIT_NOMAGAZINE', 2), ('KMC_BIT_NOHASH', 3),
('KMC_BIT_QCACHE', 4), ('KMC_BIT_KMEM', 5),
('KMC_BIT_VMEM', 6), ('KMC_BIT_SLAB', 7),
...
```
This enables SDB to easily pretty-print the flags of
the spl_kmem_caches in the system like this:
```
> spl_kmem_caches -o "name,flags,total_memory"
name flags total_memory
------------------------ ----------------------- ------------
abd_t KMC_NOMAGAZINE|KMC_SLAB 4.5MB
arc_buf_hdr_t_full KMC_NOMAGAZINE|KMC_SLAB 12.3MB
... <cropped> ...
ddt_cache KMC_VMEM 583.7KB
ddt_entry_cache KMC_NOMAGAZINE|KMC_SLAB 0.0B
... <cropped> ...
zio_buf_1048576 KMC_NODEBUG|KMC_VMEM 0.0B
... <cropped> ...
```
Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Closes #9478
2019-10-18 20:25:44 +03:00
typedef enum kmc_bit {
	KMC_BIT_NOTOUCH    = 0,  /* Don't update ages */
	KMC_BIT_NODEBUG    = 1,  /* Default behavior */
	KMC_BIT_NOMAGAZINE = 2,  /* XXX: Unsupported */
	KMC_BIT_NOHASH     = 3,  /* XXX: Unsupported */
	KMC_BIT_QCACHE     = 4,  /* XXX: Unsupported */
	KMC_BIT_KMEM       = 5,  /* Use kmem cache */
	KMC_BIT_VMEM       = 6,  /* Use vmem cache */
	KMC_BIT_KVMEM      = 7,  /* Use kvmalloc linux allocator */
	KMC_BIT_SLAB       = 8,  /* Use Linux slab cache */
	KMC_BIT_OFFSLAB    = 9,  /* Objects not on slab */
	KMC_BIT_DEADLOCKED = 14, /* Deadlock detected */
	KMC_BIT_GROWING    = 15, /* Growing in progress */
	KMC_BIT_REAPING    = 16, /* Reaping in progress */
	KMC_BIT_DESTROY    = 17, /* Destroy in progress */
	KMC_BIT_TOTAL      = 18, /* Proc handler helper bit */
	KMC_BIT_ALLOC      = 19, /* Proc handler helper bit */
	KMC_BIT_MAX        = 20, /* Proc handler helper bit */
} kmc_bit_t;

/* kmem move callback return values */
typedef enum kmem_cbrc {
	KMEM_CBRC_YES       = 0, /* Object moved */
	KMEM_CBRC_NO        = 1, /* Object not moved */
	KMEM_CBRC_LATER     = 2, /* Object not moved, try again later */
	KMEM_CBRC_DONT_NEED = 3, /* Neither object is needed */
	KMEM_CBRC_DONT_KNOW = 4, /* Object unknown */
} kmem_cbrc_t;

#define KMC_NOTOUCH    (1 << KMC_BIT_NOTOUCH)
#define KMC_NODEBUG    (1 << KMC_BIT_NODEBUG)
#define KMC_NOMAGAZINE (1 << KMC_BIT_NOMAGAZINE)
#define KMC_NOHASH     (1 << KMC_BIT_NOHASH)
#define KMC_QCACHE     (1 << KMC_BIT_QCACHE)
#define KMC_KMEM       (1 << KMC_BIT_KMEM)
#define KMC_VMEM       (1 << KMC_BIT_VMEM)
#define KMC_KVMEM      (1 << KMC_BIT_KVMEM)
#define KMC_SLAB       (1 << KMC_BIT_SLAB)
#define KMC_OFFSLAB    (1 << KMC_BIT_OFFSLAB)
#define KMC_DEADLOCKED (1 << KMC_BIT_DEADLOCKED)
#define KMC_GROWING    (1 << KMC_BIT_GROWING)
#define KMC_REAPING    (1 << KMC_BIT_REAPING)
#define KMC_DESTROY    (1 << KMC_BIT_DESTROY)
#define KMC_TOTAL      (1 << KMC_BIT_TOTAL)
#define KMC_ALLOC      (1 << KMC_BIT_ALLOC)
#define KMC_MAX        (1 << KMC_BIT_MAX)

#define KMC_REAP_CHUNK    INT_MAX
#define KMC_DEFAULT_SEEKS 1

#define KMC_RECLAIM_ONCE 0x1 /* Force a single shrinker pass */

extern struct list_head spl_kmem_cache_list;
extern struct rw_semaphore spl_kmem_cache_sem;

#define SKM_MAGIC 0x2e2e2e2e
#define SKO_MAGIC 0x20202020
#define SKS_MAGIC 0x22222222
#define SKC_MAGIC 0x2c2c2c2c

Refine slab cache sizing
This change is designed to improve the memory utilization of
slabs by more carefully setting their size. The way the code
currently works is problematic for slabs which contain large
objects (>1MB). This is due to slabs being unconditionally
rounded up to a power of two which may result in unused space
at the end of the slab.
The existing code rounds up every slab because it assumes the slab
will be backed by the buddy allocator. Since the buddy allocator can
only perform power-of-two allocations, this is desirable because it
avoids wasting any space. However, this logic breaks down if the slab
is backed by vmalloc(), which operates at page-level granularity.
In this case, the optimal thing to
do is calculate the minimum required slab size given certain
constraints (object size, alignment, objects/slab, etc).
Therefore, this patch reworks the spl_slab_size() function so
that it sizes KMC_KMEM slabs differently than KMC_VMEM slabs.
KMC_KMEM slabs are rounded up to the nearest power of two, and
KMC_VMEM slabs are allowed to be the minimum required size.
This change also reduces the default number of objects per slab.
This reduces how much memory a single cache object can pin, which
can result in significant memory saving for highly fragmented
caches. But depending on the workload it may result in slabs
being allocated and freed more frequently. In practice, this
has been shown to be a better default for most workloads.
Also, the maximum slab size has been reduced to 4MB on 32-bit
systems. Due to the limited virtual address space it's critical
that we be as frugal as possible. A limit of 4MB still lets us
reasonably comfortably allocate a limited number of 1MB objects.
Finally, the kmem:slab_small and kmem:slab_large SPLAT tests
were extended to provide better test coverage of various object
sizes and alignments. Caches are created with random parameters
and their basic functionality is verified by allocating several
slabs worth of objects.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2014-12-16 01:06:18 +03:00
#define SPL_KMEM_CACHE_OBJ_PER_SLAB 8 /* Target objects per slab */
#define SPL_KMEM_CACHE_ALIGN        8 /* Default object alignment */
#ifdef _LP64
#define SPL_KMEM_CACHE_MAX_SIZE 32 /* Max slab size in MB */
#else
#define SPL_KMEM_CACHE_MAX_SIZE 4 /* Max slab size in MB */
#endif

#define SPL_MAX_ORDER          (MAX_ORDER - 3)
#define SPL_MAX_ORDER_NR_PAGES (1 << (SPL_MAX_ORDER - 1))

#ifdef CONFIG_SLUB
#define SPL_MAX_KMEM_CACHE_ORDER    PAGE_ALLOC_COSTLY_ORDER
#define SPL_MAX_KMEM_ORDER_NR_PAGES (1 << (SPL_MAX_KMEM_CACHE_ORDER - 1))
#else
#define SPL_MAX_KMEM_ORDER_NR_PAGES (KMALLOC_MAX_SIZE >> PAGE_SHIFT)
#endif

#define POINTER_IS_VALID(p)    0 /* Unimplemented */
#define POINTER_INVALIDATE(pp)   /* Unimplemented */

typedef int (*spl_kmem_ctor_t)(void *, void *, int);
typedef void (*spl_kmem_dtor_t)(void *, void *);

typedef struct spl_kmem_magazine {
	uint32_t skm_magic;               /* Sanity magic */
	uint32_t skm_avail;               /* Available objects */
	uint32_t skm_size;                /* Magazine size */
	uint32_t skm_refill;              /* Batch refill size */
	struct spl_kmem_cache *skm_cache; /* Owned by cache */
	unsigned int skm_cpu;             /* Owned by cpu */
	void *skm_objs[0];                /* Object pointers */
} spl_kmem_magazine_t;

typedef struct spl_kmem_obj {
	uint32_t sko_magic;               /* Sanity magic */
	void *sko_addr;                   /* Buffer address */
	struct spl_kmem_slab *sko_slab;   /* Owned by slab */
	struct list_head sko_list;        /* Free object list linkage */
} spl_kmem_obj_t;

typedef struct spl_kmem_slab {
	uint32_t sks_magic;               /* Sanity magic */
	uint32_t sks_objs;                /* Objects per slab */
	struct spl_kmem_cache *sks_cache; /* Owned by cache */
	struct list_head sks_list;        /* Slab list linkage */
	struct list_head sks_free_list;   /* Free object list */
	unsigned long sks_age;            /* Last modify jiffie */
	uint32_t sks_ref;                 /* Ref count used objects */
} spl_kmem_slab_t;

typedef struct spl_kmem_alloc {
	struct spl_kmem_cache *ska_cache; /* Owned by cache */
	int ska_flags;                    /* Allocation flags */
	taskq_ent_t ska_tqe;              /* Task queue entry */
} spl_kmem_alloc_t;

typedef struct spl_kmem_emergency {
	struct rb_node ske_node;          /* Emergency tree linkage */
	unsigned long ske_obj;            /* Buffer address */
} spl_kmem_emergency_t;

typedef struct spl_kmem_cache {
	uint32_t skc_magic;                    /* Sanity magic */
	uint32_t skc_name_size;                /* Name length */
	char *skc_name;                        /* Name string */
	spl_kmem_magazine_t **skc_mag;         /* Per-CPU warm cache */
	uint32_t skc_mag_size;                 /* Magazine size */
	uint32_t skc_mag_refill;               /* Magazine refill count */
	spl_kmem_ctor_t skc_ctor;              /* Constructor */
	spl_kmem_dtor_t skc_dtor;              /* Destructor */
	void *skc_private;                     /* Private data */
	void *skc_vmp;                         /* Unused */
	struct kmem_cache *skc_linux_cache;    /* Linux slab cache if used */
	unsigned long skc_flags;               /* Flags */
	uint32_t skc_obj_size;                 /* Object size */
	uint32_t skc_obj_align;                /* Object alignment */
	uint32_t skc_slab_objs;                /* Objects per slab */
	uint32_t skc_slab_size;                /* Slab size */
	atomic_t skc_ref;                      /* Ref count callers */
	taskqid_t skc_taskqid;                 /* Slab reclaim task */
	struct list_head skc_list;             /* List of caches linkage */
	struct list_head skc_complete_list;    /* Completely alloc'ed */
	struct list_head skc_partial_list;     /* Partially alloc'ed */
	struct rb_root skc_emergency_tree;     /* Min sized objects */
	spinlock_t skc_lock;                   /* Cache lock */
	spl_wait_queue_head_t skc_waitq;       /* Allocation waiters */
	uint64_t skc_slab_fail;                /* Slab alloc failures */
	uint64_t skc_slab_create;              /* Slab creates */
	uint64_t skc_slab_destroy;             /* Slab destroys */
	uint64_t skc_slab_total;               /* Slab total current */
	uint64_t skc_slab_alloc;               /* Slab alloc current */
	uint64_t skc_slab_max;                 /* Slab max historic */
	uint64_t skc_obj_total;                /* Obj total current */
	uint64_t skc_obj_alloc;                /* Obj alloc current */
	struct percpu_counter skc_linux_alloc; /* Linux-backed Obj alloc */
	uint64_t skc_obj_max;                  /* Obj max historic */
	uint64_t skc_obj_deadlock;             /* Obj emergency deadlocks */
	uint64_t skc_obj_emergency;            /* Obj emergency current */
	uint64_t skc_obj_emergency_max;        /* Obj emergency max */
} spl_kmem_cache_t;
#define kmem_cache_t spl_kmem_cache_t

Remove skc_reclaim, hdr_recl, kmem_cache shrinker
The SPL kmem_cache implementation provides a mechanism, `skc_reclaim`,
whereby individual caches can register a callback to be invoked when
there is memory pressure. This mechanism is used in only one place: the
ARC registers the `hdr_recl()` reclaim function. This function wakes up
the `arc_reap_zthr`, whose job is to call `kmem_cache_reap()` and
`arc_reduce_target_size()`.
The `skc_reclaim` callbacks are invoked only by shrinker callbacks and
`arc_reap_zthr`, and the only callback only wakes up `arc_reap_zthr`. When
called from `arc_reap_zthr`, waking `arc_reap_zthr` is a no-op. When
called from shrinker callbacks, we are already aware of memory pressure
and responding to it. Therefore there is little benefit to ever calling
the `hdr_recl()` `skc_reclaim` callback.
The `arc_reap_zthr` also wakes once a second, and if memory is low when
allocating an ARC buffer. Therefore, additionally waking it from the
shrinker callbacks has little benefit.
The shrinker callbacks can be invoked very frequently, e.g. 10,000 times
per second. Additionally, for each invocation of the shrinker callback,
skc_reclaim is invoked many times. Therefore, this mechanism consumes
significant amounts of CPU time.
The kmem_cache shrinker calls `spl_kmem_cache_reap_now()`, which,
in addition to invoking `skc_reclaim()`, does two things to attempt to
free pages for use by the system:
1. Return free objects from the magazine layer to the slab layer
2. Return entirely-free slabs to the page layer (i.e. free pages)
These actions apply only to caches implemented by the SPL, not those
that use the underlying kernel SLAB/SLUB caches. The SPL caches are
used for objects >=32KB, which are primarily linear ABD's cached in the
DBUF cache.
These actions (freeing objects from the magazine layer and returning
entirely-free slabs) are also taken whenever a `kmem_cache_free()` call
finds a full magazine. So there would typically be zero entirely-free
slabs, and the number of objects in magazines is limited (typically no
more than 64 objects per magazine, and there's one magazine per CPU).
Therefore the benefit of `spl_kmem_cache_reap_now()`, while nonzero, is
modest.
We also call `spl_kmem_cache_reap_now()` from the `arc_reap_zthr`, when
memory pressure is detected. Therefore, calling
`spl_kmem_cache_reap_now()` from the kmem_cache shrinker is not needed.
This commit removes the `skc_reclaim` mechanism, its only callback
`hdr_recl()`, and the kmem_cache shrinker callback.
Reviewed-By: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Wilson <gwilson@delphix.com>
Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #10576
2020-07-19 19:58:30 +03:00
extern spl_kmem_cache_t *spl_kmem_cache_create(char *name, size_t size,
    size_t align, spl_kmem_ctor_t ctor, spl_kmem_dtor_t dtor,
    void *reclaim, void *priv, void *vmp, int flags);
extern void spl_kmem_cache_set_move(spl_kmem_cache_t *,
    kmem_cbrc_t (*)(void *, void *, size_t, void *));
extern void spl_kmem_cache_destroy(spl_kmem_cache_t *skc);
extern void *spl_kmem_cache_alloc(spl_kmem_cache_t *skc, int flags);
extern void spl_kmem_cache_free(spl_kmem_cache_t *skc, void *obj);
Refactor generic memory allocation interfaces
This patch achieves the following goals:
1. It replaces the preprocessor kmem flag to gfp flag mapping with
proper translation logic. This eliminates the potential for
surprises that were previously possible where kmem flags were
mapped to gfp flags.
2. It maps vmem_alloc() allocations to kmem_alloc() for allocations
sized less than or equal to the newly-added spl_kmem_alloc_max
parameter. This ensures that small allocations will not contend
on a single global lock, large allocations can still be handled,
and potentially limited virtual address space will not be squandered.
This behavior is entirely different than under Illumos due to
different memory management strategies employed by the respective
kernels. However, this functionally provides the semantics required.
3. The --disable-debug-kmem, --enable-debug-kmem (default), and
--enable-debug-kmem-tracking allocators have been unified into
a single spl_kmem_alloc_impl() allocation function. This was
done to simplify the code and make it more maintainable.
4. Improve portability by exposing an implementation of the memory
allocation functions that can be safely used in the same way
they are used on Illumos. Specifically, callers may safely
use KM_SLEEP in contexts which perform filesystem IO. This
allows us to eliminate an entire class of Linux specific changes
which were previously required to avoid deadlocking the system.
This change will be largely transparent to existing callers but there
are a few caveats:
1. Because the headers were refactored and extraneous includes removed
callers may find they need to explicitly add additional #includes.
In particular, kmem_cache.h must now be explicitly included to
access the SPL's kmem cache implementation. This behavior is
different from Illumos but it was done to avoid always masking
the Linux slab functions when kmem.h is included.
2. Callers, like Lustre, which made assumptions about the definitions
of KM_SLEEP, KM_NOSLEEP, and KM_PUSHPAGE will need to be updated.
Other callers such as ZFS which did not will not require changes.
3. KM_PUSHPAGE is no longer overloaded to imply GFP_NOIO. It retains
its original meaning of allowing allocations to access reserved
memory. KM_PUSHPAGE callers can be converted back to KM_SLEEP.
4. The KM_NODEBUG flag has been retired and the default warning
threshold increased to 32k.
5. The kmem_virt() function has been removed. For callers which
need to distinguish between a physical and virtual address use
is_vmalloc_addr().
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2014-12-08 23:37:14 +03:00
extern void spl_kmem_cache_set_allocflags(spl_kmem_cache_t *skc, gfp_t flags);
Clean up OS-specific ARC and kmem code
OS-specific code (e.g. under `module/os/linux`) does not need to share
its code structure with any other operating systems. In particular, the
ARC and kmem code need not be similar to the code in illumos, because we
won't be syncing this OS-specific code between operating systems. For
example, if/when illumos support is added to the common repo, we would
add a file `module/os/illumos/zfs/arc_os.c` for the illumos versions of
this code.
Therefore, we can simplify the code in the OS-specific ARC and kmem
routines.
These changes do not impact system behavior, they are purely code
cleanup. The changes are:
Arenas are not used on Linux or FreeBSD (they are always `NULL`), so
`heap_arena`, `zio_arena`, and `zio_alloc_arena` can be removed, along
with code that uses them.
In `arc_available_memory()`:
* `desfree` is unused, remove it
* rename `freemem` to avoid conflict with pre-existing `#define`
* remove checks related to arenas
* use units of bytes, rather than converting from bytes to pages and
then back to bytes
`SPL_KMEM_CACHE_REAP` is unused, remove it.
`skc_reap` is unused, remove it.
The `count` argument to `spl_kmem_cache_reap_now()` is unused, remove
it.
`vmem_size()` and associated type and macros are unused, remove them.
In `arc_memory_throttle()`, use a less confusing variable name to store
the result of `arc_free_memory()`.
Reviewed-by: George Wilson <gwilson@delphix.com>
Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #10499
2020-06-29 19:01:07 +03:00
extern void spl_kmem_cache_reap_now(spl_kmem_cache_t *skc);
extern void spl_kmem_reap(void);
extern uint64_t spl_kmem_cache_inuse(kmem_cache_t *cache);
extern uint64_t spl_kmem_cache_entry_size(kmem_cache_t *cache);

#define kmem_cache_create(name, size, align, ctor, dtor, rclm, priv, vmp, fl) \
    spl_kmem_cache_create(name, size, align, ctor, dtor, rclm, priv, vmp, fl)
#define kmem_cache_set_move(skc, move) spl_kmem_cache_set_move(skc, move)
#define kmem_cache_destroy(skc)        spl_kmem_cache_destroy(skc)
#define kmem_cache_alloc(skc, flags)   spl_kmem_cache_alloc(skc, flags)
#define kmem_cache_free(skc, obj)      spl_kmem_cache_free(skc, obj)
#define kmem_cache_reap_now(skc)       spl_kmem_cache_reap_now(skc)
#define kmem_reap()                    spl_kmem_reap()

/*
 * The following functions are only available for internal use.
 */
extern int spl_kmem_cache_init(void);
extern void spl_kmem_cache_fini(void);

#endif /* _SPL_KMEM_CACHE_H */