Fix dnode allocation race

When performing concurrent object allocations using the new
multi-threaded allocator and large dnodes it's possible to
allocate overlapping large dnodes.

This case should have been handled by detecting an error
returned by dnode_hold_impl().  But that logic only checked
that the returned dnp was non-NULL, and the dnp variable was
not reset to NULL when retrying.  Resolve this issue by
properly checking the return value of dnode_hold_impl().
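
The stale-pointer failure mode described above can be sketched in
isolation.  This is a toy model, not the ZFS code: toy_node and
toy_hold() are hypothetical stand-ins, and the sketch assumes only
that the hold function leaves its out-pointer untouched on failure.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a dnode; not the real ZFS structure. */
struct toy_node { int id; };

static struct toy_node slot0 = { 0 };

/*
 * Hypothetical hold function: succeeds only for object 0.  On failure
 * it returns an errno value and leaves *out untouched.
 */
static int
toy_hold(int object, struct toy_node **out)
{
	if (object != 0)
		return (EEXIST);
	*out = &slot0;
	return (0);
}

/* Buggy pattern: infers success from the out-pointer, which may be stale. */
static int
buggy_second_hold_succeeded(void)
{
	struct toy_node *n = NULL;

	(void) toy_hold(0, &n);	/* first pass succeeds, n is set */
	(void) toy_hold(1, &n);	/* retry fails, n still points at slot0 */
	return (n != NULL);	/* wrongly reports success */
}

/* Fixed pattern: trusts only the return value of the call. */
static int
fixed_second_hold_succeeded(void)
{
	struct toy_node *n = NULL;
	int error;

	(void) toy_hold(0, &n);
	error = toy_hold(1, &n);
	return (error == 0);
}
```

In this model the buggy variant reports success for the failed retry
because the pointer still holds the result of the earlier call, while
the variant that checks the return value does not.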

Additionally, it was possible that dnode_hold_impl() would
misreport a dnode as free when it was in fact in use.  This
could occur for two reasons:

* The per-slot zrl_lock must be held over the entire critical
  section which includes the alloc/free until the new dnode
  is assigned to children_dnodes.  Additionally, all of the
  zrl_locks in the range must be held to protect moving
  dnodes.

* The dn->dn_ot_type cannot be solely relied upon to check
  the type.  When allocating a new dnode its type will be
  DMU_OT_NONE after dnode_create().  Only later, when
  dnode_allocate() is called, will it transition to the new
  type.  This means there's a window when allocating where
  it can be mistaken for a free dnode.
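
The window described in the second point can be illustrated with a
toy two-step allocation.  Everything here is hypothetical: toy_slot,
its holds counter, and the strict check are assumptions made for
illustration, and are not claimed to be the check this commit adds.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical object types; TOY_OT_NONE plays the role of DMU_OT_NONE. */
enum toy_type { TOY_OT_NONE = 0, TOY_OT_PLAIN_FILE };

/* Hypothetical slot: a type plus a count of active holders. */
struct toy_slot {
	enum toy_type type;
	int holds;
};

/* Step 1 of allocation: the slot is claimed but its type is still NONE. */
static void
toy_create(struct toy_slot *s)
{
	s->type = TOY_OT_NONE;
	s->holds = 1;
}

/* Step 2 of allocation: the slot finally receives its real type. */
static void
toy_allocate(struct toy_slot *s, enum toy_type t)
{
	s->type = t;
}

/* Type-only check: reports "free" inside the create/allocate window. */
static bool
naive_is_free(const struct toy_slot *s)
{
	return (s->type == TOY_OT_NONE);
}

/* Stricter check: a slot with any holder is never treated as free. */
static bool
strict_is_free(const struct toy_slot *s)
{
	return (s->type == TOY_OT_NONE && s->holds == 0);
}
```

Between toy_create() and toy_allocate() the type-only check still
answers "free", which is the kind of misreport the text describes; a
check that consults a second signal does not.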

Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: Ned Bass <bass6@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #6414 
Closes #6439
Brian Behlendorf
2017-08-08 08:38:53 -07:00
committed by GitHub
parent d19a6d5c80
commit 9631681b75
8 changed files with 136 additions and 68 deletions
+3 -3
@@ -61,6 +61,7 @@ dmu_object_alloc_dnsize(objset_t *os, dmu_object_type_t ot, int blocksize,
boolean_t restarted = B_FALSE;
uint64_t *cpuobj = NULL;
int dnodes_per_chunk = 1 << dmu_object_alloc_chunk_shift;
+int error;
kpreempt_disable();
cpuobj = &os->os_obj_next_percpu[CPU_SEQID %
@@ -129,7 +130,6 @@ dmu_object_alloc_dnsize(objset_t *os, dmu_object_type_t ot, int blocksize,
uint64_t offset;
uint64_t blkfill;
int minlvl;
-int error;
if (os->os_rescan_dnodes) {
offset = 0;
os->os_rescan_dnodes = B_FALSE;
@@ -163,9 +163,9 @@ dmu_object_alloc_dnsize(objset_t *os, dmu_object_type_t ot, int blocksize,
* dmu_tx_assign(), but there is currently no mechanism
* to do so.
*/
-(void) dnode_hold_impl(os, object, DNODE_MUST_BE_FREE,
+error = dnode_hold_impl(os, object, DNODE_MUST_BE_FREE,
dn_slots, FTAG, &dn);
-if (dn != NULL) {
+if (error == 0) {
rw_enter(&dn->dn_struct_rwlock, RW_WRITER);
/*
* Another thread could have allocated it; check