Fix consistency of ztest_device_removal_active

ztest currently uses the boolean flag ztest_device_removal_active
to protect some tests that may not run successfully if they occur
at the same time as ztest_device_removal(). Unfortunately, in the
event that ztest is in the middle of a device removal when it
decides to issue a SIGKILL, the device removal will be
automatically restarted (without setting the flag) when the pool
is re-imported on the next run. This patch corrects this by
ensuring that any in-progress removals are completed before running
further tests after the re-import.

This patch also makes a few small changes to prevent race conditions
involving the creation and destruction of spa->spa_vdev_removal,
since this field is not protected by any locks. Some checks that
may run concurrently with setting / unsetting this field have been
updated to check spa->spa_removing_phys.sr_state instead. The most
significant change here is that spa_removal_get_stats() no longer
accounts for in-flight work done, since that could result in a NULL
pointer dereference.
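
As a rough illustration of the second change, the sketch below contrasts the two kinds of check using simplified stand-in types. The names mock_spa_t, mock_removal_t, and removal_in_progress() are hypothetical and exist only for this example; the real spa_t and the real enum differ. The sketch is single-threaded, so it shows the shape of the check rather than reproducing the race.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the scan/removal state enum; simplified for illustration. */
typedef enum {
	DSS_NONE,
	DSS_SCANNING,
	DSS_FINISHED,
	DSS_CANCELED
} mock_state_t;

/* Hypothetical stand-in for the in-core removal object. */
typedef struct {
	uint64_t svr_bytes_done;
} mock_removal_t;

/* Hypothetical stand-in for the relevant fields of spa_t. */
typedef struct {
	struct {
		uint64_t sr_state;
	} spa_removing_phys;
	/* Assumed to be created and destroyed without a lock held. */
	mock_removal_t *spa_vdev_removal;
} mock_spa_t;

/*
 * Decide whether a removal is in progress from the state field alone,
 * without reading or dereferencing the unprotected pointer.
 */
static int
removal_in_progress(const mock_spa_t *spa)
{
	assert(spa != NULL);
	return (spa->spa_removing_phys.sr_state == DSS_SCANNING);
}
```

With this shape, a caller gets an answer from sr_state even during a window in which spa_vdev_removal has not yet been set or has already been cleared.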

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #8105
Tom Caputi
2018-11-28 23:47:09 -05:00
committed by Brian Behlendorf
parent c71c8c715b
commit c40a1124e1
3 changed files with 23 additions and 10 deletions
+1 -1
@@ -462,7 +462,7 @@ spa_checkpoint_check(void *arg, dmu_tx_t *tx)
 	if (!spa_top_vdevs_spacemap_addressable(spa))
 		return (SET_ERROR(ZFS_ERR_VDEV_TOO_BIG));
-	if (spa->spa_vdev_removal != NULL)
+	if (spa->spa_removing_phys.sr_state == DSS_SCANNING)
 		return (SET_ERROR(ZFS_ERR_DEVRM_IN_PROGRESS));
 	if (spa->spa_checkpoint_txg != 0)