Reduce zfs_dmu_offset_next_sync penalty

Looking at txg_wait_synced(, 0) I've noticed that it always syncs
5 TXGs: 3 TXG_CONCURRENT_STATES + 2 TXG_DEFER_SIZE.  But in the case
of dmu_offset_next() we do not care about deferred frees.  And even
for the concurrent TXGs we might not need to sync all 3 if the dnode
was not dirtied in the last few TXGs.
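
For reference, the txg == 0 case expands roughly like this inside
txg_wait_synced_impl() in module/zfs/txg.c (a condensed sketch, not
the verbatim code):

	static void
	txg_wait_synced_impl(dsl_pool_t *dp, uint64_t txg, boolean_t wait_sig)
	{
		tx_state_t *tx = &dp->dp_tx;

		mutex_enter(&tx->tx_sync_lock);
		/*
		 * txg == 0 means "sync everything": wait past the
		 * currently open TXG (up to TXG_CONCURRENT_STATES
		 * ahead of the last synced one) plus TXG_DEFER_SIZE
		 * deferred-free TXGs -- five TXGs in steady state.
		 */
		if (txg == 0)
			txg = tx->tx_open_txg + TXG_DEFER_SIZE;
		if (tx->tx_sync_txg_waiting < txg)
			tx->tx_sync_txg_waiting = txg;
		while (tx->tx_synced_txg < txg) {
			cv_broadcast(&tx->tx_sync_more_cv);
			cv_wait_io(&tx->tx_sync_done_cv, &tx->tx_sync_lock);
		}
		mutex_exit(&tx->tx_sync_lock);
	}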

This patch makes dmu_offset_next() sync one TXG at a time until
the dnode is clean, but no more than TXG_CONCURRENT_STATES (3) times.
My tests with random simultaneous writes and seeks over many files
on an HDD pool show a 7-14% performance increase.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #17434

module/zfs/dmu.c:

@@ -2530,7 +2530,8 @@ int
 dmu_offset_next(objset_t *os, uint64_t object, boolean_t hole, uint64_t *off)
 {
 	dnode_t *dn;
-	int restarted = 0, err;
+	uint64_t txg, maxtxg = 0;
+	int err;
 
 restart:
 	err = dnode_hold(os, object, FTAG, &dn);
@@ -2546,19 +2547,22 @@ restart:
 	 * must be synced to disk to accurately report holes.
 	 *
 	 * Provided a RL_READER rangelock spanning 0-UINT64_MAX is
-	 * held by the caller only a single restart will be required.
+	 * held by the caller only limited restarts will be required.
 	 * We tolerate callers which do not hold the rangelock by
-	 * returning EBUSY and not reporting holes after one restart.
+	 * returning EBUSY and not reporting holes after at most
+	 * TXG_CONCURRENT_STATES (3) restarts.
 	 */
 	if (zfs_dmu_offset_next_sync) {
 		rw_exit(&dn->dn_struct_rwlock);
 		dnode_rele(dn, FTAG);
-		if (restarted)
+		if (maxtxg == 0) {
+			txg = spa_last_synced_txg(dmu_objset_spa(os));
+			maxtxg = txg + TXG_CONCURRENT_STATES;
+		} else if (txg >= maxtxg)
 			return (SET_ERROR(EBUSY));
-		txg_wait_synced(dmu_objset_pool(os), 0);
-		restarted = 1;
+		txg_wait_synced(dmu_objset_pool(os), ++txg);
 		goto restart;
 	}
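
For context, this code backs lseek(2) SEEK_HOLE/SEEK_DATA on ZFS
(reached via zfs_holey()).  A minimal user-space demonstration of the
affected path -- the file name here is hypothetical:

	#include <fcntl.h>
	#include <stdio.h>
	#include <unistd.h>

	int
	main(void)
	{
		/* Hypothetical sparse file on a ZFS dataset. */
		int fd = open("/tank/sparsefile", O_RDONLY);
		if (fd == -1) {
			perror("open");
			return (1);
		}
		/*
		 * With zfs_dmu_offset_next_sync=1 and a dirty dnode, this
		 * now waits for at most TXG_CONCURRENT_STATES TXGs to sync
		 * instead of the full five of txg_wait_synced(, 0).
		 */
		off_t hole = lseek(fd, 0, SEEK_HOLE);
		if (hole == -1)
			perror("lseek(SEEK_HOLE)");
		else
			printf("first hole at %lld\n", (long long)hole);
		close(fd);
		return (0);
	}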