Fix read corruption after block clone after truncate

When copy_file_range overwrites a recent truncation, subsequent reads
can incorrectly determine that it is read hole instead of reading the
cloned blocks.

This can happen when the following conditions are met:
- Truncate adds blkid to dn_free_ranges
- A new TXG is created
- copy_file_range calls dmu_brt_clone which override the block pointer
  and set DB_NOFILL
- Subsequent read, given DB_NOFILL, hits dbuf_read_impl and
  dbuf_read_hole
- dbuf_read_hole calls dnode_block_freed, which returns TRUE because the
  truncated blkids are still in dn_free_ranges

This will not happen if the clone and truncate are in the same TXG,
because the block clone would update the current TXG's dn_free_ranges,
which is why this bug only triggers under high IO load (such as
compilation).

Fix this by skipping the dnode_block_freed call if the block is
overridden. The fix shouldn't cause an issue when the cloned block is
subsequently freed in later TXGs, as dbuf_undirty would remove the
override.

This requires a dedicated test program as it is much harder to trigger
with scripts (this needs to generate a lot of I/O in short period of
time for the bug to trigger reliably).

Assisted-by: Gemini:gemini-3.1-pro
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Gary Guo <gary@kernel.org>
Closes #18412
Closes #18421
This commit is contained in:
Gary Guo
2026-04-15 22:51:53 +01:00
committed by Tony Hutter
parent b2602a400a
commit e7524594a9
9 changed files with 161 additions and 2 deletions
+5 -1
View File
@@ -1481,8 +1481,12 @@ dbuf_read_hole(dmu_buf_impl_t *db, dnode_t *dn, blkptr_t *bp)
* Recheck BP_IS_HOLE() after dnode_block_freed() in case dnode_sync()
* processes the delete record and clears the bp while we are waiting
* for the dn_mtx (resulting in a "no" from block_freed).
*
* If bp != db->db_blkptr, it means that it was overridden (by a block
* clone or direct I/O write). We cannot rely on dnode_block_freed as
* the range can be freed in an earlier TXG but overridden in later.
*/
if (!is_hole && db->db_level == 0)
if (!is_hole && db->db_level == 0 && bp == db->db_blkptr)
is_hole = dnode_block_freed(dn, db->db_blkid) || BP_IS_HOLE(bp);
if (is_hole) {