zed: Ensure spare activation after kernel-initiated device removal

In addition to hotplug events, the kernel may also mark a failing vdev
as REMOVED. This was observed in a customer report and reproduced by
forcing the NVMe host driver to disable the device after a failed reset
due to command timeout. In such cases, the spare was not activated
because the device had already transitioned to a REMOVED state before
zed processed the event.
To address this, explicitly attempt hot spare activation when the
kernel marks a device as REMOVED.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #17187
This commit is contained in:
Ameer Hamza
2025-03-29 00:48:38 +05:00
committed by GitHub
parent dd2a46b5e6
commit 30cc2331f4
5 changed files with 46 additions and 15 deletions
+1 -1
View File
@@ -4271,7 +4271,7 @@ vdev_remove_wanted(spa_t *spa, uint64_t guid)
return (spa_vdev_state_exit(spa, NULL, SET_ERROR(EEXIST)));
vd->vdev_remove_wanted = B_TRUE;
spa_async_request(spa, SPA_ASYNC_REMOVE);
spa_async_request(spa, SPA_ASYNC_REMOVE_BY_USER);
return (spa_vdev_state_exit(spa, vd, 0));
}