Fix zio_execute() deadlock

To avoid deadlocking the system it is crucial that all memory
allocations performed in the zio_execute() call path are marked
KM_PUSHPAGE (GFP_NOFS).  This ensures that while a z_wr_iss
thread is processing the syncing transaction group it does
not re-enter the filesystem code and deadlock on itself.

Call Trace:
 [<ffffffffa02580e8>] cv_wait_common+0x78/0xe0 [spl]
 [<ffffffffa0347bab>] txg_wait_open+0x7b/0xa0 [zfs]
 [<ffffffffa030e73d>] dmu_tx_wait+0xed/0xf0 [zfs]
 [<ffffffffa0376a49>] zfs_putpage+0x219/0x360 [zfs]
 [<ffffffffa038d75e>] zpl_putpage+0x1e/0x60 [zfs]
 [<ffffffffa038d7b2>] zpl_writepage+0x12/0x20 [zfs]
 [<ffffffff8115f907>] writeout+0xa7/0xd0
 [<ffffffff8115fa6b>] move_to_new_page+0x13b/0x170
 [<ffffffff8115fed4>] migrate_pages+0x434/0x4c0
 [<ffffffff811559ab>] compact_zone+0x4fb/0x780
 [<ffffffff81155ed1>] compact_zone_order+0xa1/0xe0
 [<ffffffff8115602c>] try_to_compact_pages+0x11c/0x190
 [<ffffffff811200bb>] __alloc_pages_nodemask+0x5eb/0x8b0
 [<ffffffff81159932>] kmem_getpages+0x62/0x170
 [<ffffffff8115a54a>] fallback_alloc+0x1ba/0x270
 [<ffffffff8115a2c9>] ____cache_alloc_node+0x99/0x160
 [<ffffffff8115b059>] __kmalloc+0x189/0x220
 [<ffffffffa02539fb>] kmem_alloc_debug+0xeb/0x130 [spl]
 [<ffffffffa031454a>] dnode_hold_impl+0x46a/0x550 [zfs]
 [<ffffffffa0314649>] dnode_hold+0x19/0x20 [zfs]
 [<ffffffffa03042e3>] dmu_read+0x33/0x180 [zfs]
 [<ffffffffa034729d>] space_map_load+0xfd/0x320 [zfs]
 [<ffffffffa03300bc>] metaslab_activate+0x10c/0x170 [zfs]
 [<ffffffffa0330ad9>] metaslab_alloc+0x469/0x800 [zfs]
 [<ffffffffa038963c>] zio_dva_allocate+0x6c/0x2f0 [zfs]
 [<ffffffffa038a249>] zio_execute+0x99/0xf0 [zfs]
 [<ffffffffa0254b1c>] taskq_thread+0x1cc/0x330 [spl]
 [<ffffffff8108ddf6>] kthread+0x96/0xa0

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #291
This commit is contained in:
Brian Behlendorf 2011-07-19 09:31:23 -07:00
parent a140dc5469
commit abd39a8289

View File

@ -1074,7 +1074,8 @@ dnode_hold_impl(objset_t *os, uint64_t object, int flag,
int i;
dnode_children_t *winner;
children_dnodes = kmem_alloc(sizeof (dnode_children_t) +
(epb - 1) * sizeof (dnode_handle_t), KM_SLEEP | KM_NODEBUG);
(epb - 1) * sizeof (dnode_handle_t),
KM_PUSHPAGE | KM_NODEBUG);
children_dnodes->dnc_count = epb;
dnh = &children_dnodes->dnc_children[0];
for (i = 0; i < epb; i++) {