Add TRIM support

UNMAP/TRIM support is a frequently-requested feature to help
prevent performance from degrading on SSDs and on various other
SAN-like storage back-ends.  By issuing UNMAP/TRIM commands for
sectors which are no longer allocated the underlying device can
often more efficiently manage itself.

This TRIM implementation is modeled on the `zpool initialize`
feature which writes a pattern to all unallocated space in the
pool.  The new `zpool trim` command uses the same vdev_xlate()
code to calculate what sectors are unallocated, the same per-
vdev TRIM thread model and locking, and the same basic CLI for
a consistent user experience.  The core difference is that
instead of writing a pattern it will issue UNMAP/TRIM commands
for those extents.

The zio pipeline was updated to accommodate this by adding a new
ZIO_TYPE_TRIM type and associated spa taskq.  This new type makes
is straight forward to add the platform specific TRIM/UNMAP calls
to vdev_disk.c and vdev_file.c.  These new ZIO_TYPE_TRIM zios are
handled largely the same way as ZIO_TYPE_READs or ZIO_TYPE_WRITEs.
This makes it possible to largely avoid changing the pipieline,
one exception is that TRIM zio's may exceed the 16M block size
limit since they contain no data.

In addition to the manual `zpool trim` command, a background
automatic TRIM was added and is controlled by the 'autotrim'
property.  It relies on the exact same infrastructure as the
manual TRIM.  However, instead of relying on the extents in a
metaslab's ms_allocatable range tree, a ms_trim tree is kept
per metaslab.  When 'autotrim=on', ranges added back to the
ms_allocatable tree are also added to the ms_free tree.  The
ms_free tree is then periodically consumed by an autotrim
thread which systematically walks a top level vdev's metaslabs.

Since the automatic TRIM will skip ranges it considers too small
there is value in occasionally running a full `zpool trim`.  This
may occur when the freed blocks are small and not enough time
was allowed to aggregate them.  An automatic TRIM and a manual
`zpool trim` may be run concurrently, in which case the automatic
TRIM will yield to the manual TRIM.

Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Tim Chase <tim@chase2k.com>
Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Reviewed-by: George Wilson <george.wilson@delphix.com>
Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Contributions-by: Saso Kiselkov <saso.kiselkov@nexenta.com>
Contributions-by: Tim Chase <tim@chase2k.com>
Contributions-by: Chunwei Chen <tuxoko@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #8419 
Closes #598
This commit is contained in:
Brian Behlendorf
2019-03-29 09:13:20 -07:00
committed by GitHub
parent f94b3cbf43
commit 1b939560be
91 changed files with 5593 additions and 439 deletions
+29 -1
View File
@@ -145,6 +145,7 @@ struct vdev_queue {
avl_tree_t vq_active_tree;
avl_tree_t vq_read_offset_tree;
avl_tree_t vq_write_offset_tree;
avl_tree_t vq_trim_offset_tree;
uint64_t vq_last_offset;
hrtime_t vq_io_complete_ts; /* time last i/o completed */
hrtime_t vq_io_delta_ts;
@@ -260,6 +261,7 @@ struct vdev {
/* pool checkpoint related */
space_map_t *vdev_checkpoint_sm; /* contains reserved blocks */
/* Initialize related */
boolean_t vdev_initialize_exit_wanted;
vdev_initializing_state_t vdev_initialize_state;
list_node_t vdev_initialize_node;
@@ -274,10 +276,34 @@ struct vdev {
uint64_t vdev_initialize_bytes_done;
time_t vdev_initialize_action_time; /* start and end time */
/* for limiting outstanding I/Os */
/* TRIM related */
boolean_t vdev_trim_exit_wanted;
boolean_t vdev_autotrim_exit_wanted;
vdev_trim_state_t vdev_trim_state;
list_node_t vdev_trim_node;
kmutex_t vdev_autotrim_lock;
kcondvar_t vdev_autotrim_cv;
kthread_t *vdev_autotrim_thread;
/* Protects vdev_trim_thread and vdev_trim_state. */
kmutex_t vdev_trim_lock;
kcondvar_t vdev_trim_cv;
kthread_t *vdev_trim_thread;
uint64_t vdev_trim_offset[TXG_SIZE];
uint64_t vdev_trim_last_offset;
uint64_t vdev_trim_bytes_est;
uint64_t vdev_trim_bytes_done;
uint64_t vdev_trim_rate; /* requested rate (bytes/sec) */
uint64_t vdev_trim_partial; /* requested partial TRIM */
uint64_t vdev_trim_secure; /* requested secure TRIM */
time_t vdev_trim_action_time; /* start and end time */
/* for limiting outstanding I/Os (initialize and TRIM) */
kmutex_t vdev_initialize_io_lock;
kcondvar_t vdev_initialize_io_cv;
uint64_t vdev_initialize_inflight;
kmutex_t vdev_trim_io_lock;
kcondvar_t vdev_trim_io_cv;
uint64_t vdev_trim_inflight[2];
/*
* Values stored in the config for an indirect or removing vdev.
@@ -343,6 +369,8 @@ struct vdev {
uint64_t vdev_not_present; /* not present during import */
uint64_t vdev_unspare; /* unspare when resilvering done */
boolean_t vdev_nowritecache; /* true if flushwritecache failed */
boolean_t vdev_has_trim; /* TRIM is supported */
boolean_t vdev_has_securetrim; /* secure TRIM is supported */
boolean_t vdev_checkremove; /* temporary online test */
boolean_t vdev_forcefault; /* force online fault */
boolean_t vdev_splitting; /* split or repair in progress */