Improved dnode allocation and dmu_hold_impl() (#6611)

Refactor dmu_object_alloc_dnsize() and dnode_hold_impl() to simplify the
code, fix errors introduced by commit dbeb879 (PR #6117) interacting
badly with large dnodes, and improve performance.

* When allocating a new dnode in dmu_object_alloc_dnsize(), update the
percpu object ID for the core's metadnode chunk immediately.  This
eliminates most lock contention when taking the hold and creating the
dnode.

* Correct detection of the chunk boundary to work properly with large
dnodes.

* Separate the dmu_hold_impl() code for the FREE case from the code for
the ALLOCATED case to make it easier to read.

* Fully populate the dnode handle array immediately after reading a
block of the metadnode from disk.  Subsequently the dnode handle array
provides enough information to determine which dnode slots are in use
and which are free.

* Add several kstats to allow the behavior of the code to be examined.

* Verify dnode packing in large_dnode_008_pos.ksh.  Since the test is
purely creates, it should leave very few holes in the metadnode.

* Add test large_dnode_009_pos.ksh, which performs concurrent creates
and deletes, to complement existing test which does only creates.

With the above fixes, there is very little contention in a test of about
200,000 racing dnode allocations produced by tests 'large_dnode_008_pos'
and 'large_dnode_009_pos'.

name                            type data
dnode_hold_dbuf_hold            4    0
dnode_hold_dbuf_read            4    0
dnode_hold_alloc_hits           4    3804690
dnode_hold_alloc_misses         4    216
dnode_hold_alloc_interior       4    3
dnode_hold_alloc_lock_retry     4    0
dnode_hold_alloc_lock_misses    4    0
dnode_hold_alloc_type_none      4    0
dnode_hold_free_hits            4    203105
dnode_hold_free_misses          4    4
dnode_hold_free_lock_misses     4    0
dnode_hold_free_lock_retry      4    0
dnode_hold_free_overflow        4    0
dnode_hold_free_refcount        4    57
dnode_hold_free_txg             4    0
dnode_allocate                  4    203154
dnode_reallocate                4    0
dnode_buf_evict                 4    23918
dnode_alloc_next_chunk          4    4887
dnode_alloc_race                4    0
dnode_alloc_next_block          4    18

The performance is slightly improved for concurrent creates with
16+ threads, and unchanged for low thread counts.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
This commit is contained in:
Giuseppe Di Natale
2017-09-13 15:46:15 -07:00
committed by Tony Hutter
parent 89950722c6
commit 45d1abc74d
9 changed files with 616 additions and 258 deletions
+1 -1
View File
@@ -365,7 +365,7 @@ tests = ['async_destroy_001_pos']
[tests/functional/features/large_dnode]
tests = ['large_dnode_001_pos', 'large_dnode_002_pos', 'large_dnode_003_pos',
'large_dnode_004_neg', 'large_dnode_005_pos', 'large_dnode_006_pos',
'large_dnode_007_neg', 'large_dnode_008_pos']
'large_dnode_007_neg', 'large_dnode_008_pos', 'large_dnode_009_pos']
[tests/functional/grow_pool]
tests = ['grow_pool_001_pos']
@@ -9,4 +9,5 @@ dist_pkgdata_SCRIPTS = \
large_dnode_005_pos.ksh \
large_dnode_006_pos.ksh \
large_dnode_007_neg.ksh \
large_dnode_008_pos.ksh
large_dnode_008_pos.ksh \
large_dnode_009_pos.ksh
@@ -42,6 +42,21 @@ function cleanup
datasetexists $TEST_FS && log_must zfs destroy $TEST_FS
}
function verify_dnode_packing
{
zdb -dd $TEST_FS | grep -A 3 'Dnode slots' | awk '
/Total used:/ {total_used=$NF}
/Max used:/ {max_used=$NF}
/Percent empty:/ {print total_used, max_used, int($NF)}
' | while read total_used max_used pct_empty
do
log_note "total_used $total_used max_used $max_used pct_empty $pct_empty"
if [ $pct_empty -gt 5 ]; then
log_fail "Holes in dnode array: pct empty $pct_empty > 5"
fi
done
}
log_onexit cleanup
log_assert "xattrtest runs concurrently on dataset with large dnodes"
@@ -52,9 +67,11 @@ log_must zfs set xattr=sa $TEST_FS
for ((i=0; i < 100; i++)); do
dir="/$TEST_FS/dir.$i"
log_must mkdir "$dir"
log_must eval "xattrtest -R -r -y -x 1 -f 1024 -k -p $dir &"
log_must eval "xattrtest -R -r -y -x 1 -f 1024 -k -p $dir >/dev/null 2>&1 &"
done
log_must wait
verify_dnode_packing
log_pass
@@ -0,0 +1,71 @@
#!/bin/ksh -p
#
# CDDL HEADER START
#
# The contents of this file are subject to the terms of the
# Common Development and Distribution License (the "License").
# You may not use this file except in compliance with the License.
#
# You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
# or http://www.opensolaris.org/os/licensing.
# See the License for the specific language governing permissions
# and limitations under the License.
#
# When distributing Covered Code, include this CDDL HEADER in each
# file and include the License file at usr/src/OPENSOLARIS.LICENSE.
# If applicable, add the following below this CDDL HEADER, with the
# fields enclosed by brackets "[]" replaced with your own identifying
# information: Portions Copyright [yyyy] [name of copyright owner]
#
# CDDL HEADER END
#
#
# Copyright (c) 2017 by Lawrence Livermore National Security, LLC.
# Use is subject to license terms.
#
. $STF_SUITE/include/libtest.shlib
#
# DESCRIPTION:
# Run many xattrtests on a dataset with large dnodes and xattr=sa to
# stress concurrent allocation of large dnodes.
#
TEST_FS=$TESTPOOL/large_dnode
verify_runnable "both"
function cleanup
{
datasetexists $TEST_FS && log_must zfs destroy $TEST_FS
}
log_onexit cleanup
log_assert "xattrtest runs concurrently on dataset with large dnodes"
log_must zfs create $TEST_FS
log_must zfs set dnsize=auto $TEST_FS
log_must zfs set xattr=sa $TEST_FS
for ((i=0; i < 100; i++)); do
dir="/$TEST_FS/dir.$i"
log_must mkdir "$dir"
do_unlink=""
if [ $((RANDOM % 2)) -eq 0 ]; then
do_unlink="-k -f 1024"
else
do_unlink="-f $((RANDOM % 1024))"
fi
log_must eval "xattrtest -R -r -y -x 1 $do_unlink -p $dir >/dev/null 2>&1 &"
done
log_must wait
log_must zpool export $TESTPOOL
log_must zpool import $TESTPOOL
log_must ls -lR "/$TEST_FS/" >/dev/null 2>&1
log_must zdb -d $TESTPOOL
log_pass