Commit Graph

8 Commits

Author SHA1 Message Date
Chunwei Chen
c79d5e4f33 Define sops->free_inode() to prevent use-after-free during lookup
On Linux, when doing path lookup with LOOKUP_RCU, dentry and inode can
be dereferenced without refcounts and locks. For this reason, dentry and
inode must only be freed after RCU grace period.

However, zfs currently frees inode in zfs_inode_destroy synchronously
and we can't use GPL-only call_rcu() in zfs directly. Fortunately, on
Linux 5.2 and after, if we define sops->free_inode(), the kernel will do
call_rcu() for us.

This issue may be triggered more easily with init_on_free=1 boot
parameter:

BUG: kernel NULL pointer dereference, address: 0000000000000020
RIP: 0010:selinux_inode_permission+0x10e/0x1c0
Call Trace:
 ? show_trace_log_lvl+0x1be/0x2d9
 ? show_trace_log_lvl+0x1be/0x2d9
 ? show_trace_log_lvl+0x1be/0x2d9
 ? security_inode_permission+0x37/0x60
 ? __die_body.cold+0x8/0xd
 ? no_context+0x113/0x220
 ? exc_page_fault+0x6d/0x130
 ? asm_exc_page_fault+0x1e/0x30
 ? selinux_inode_permission+0x10e/0x1c0
 security_inode_permission+0x37/0x60
 link_path_walk.part.0.constprop.0+0xb5/0x360
 ? path_init+0x27d/0x3c0
 path_lookupat+0x3e/0x1a0
 filename_lookup+0xc0/0x1d0
 ? __check_object_size.part.0+0x123/0x150
 ? strncpy_from_user+0x4e/0x130
 ? getname_flags.part.0+0x4b/0x1c0
 vfs_statx+0x72/0x120
 ? ioctl_has_perm.constprop.0.isra.0+0xbd/0x120
 __do_sys_newlstat+0x39/0x70
 ? __x64_sys_ioctl+0x8d/0xd0
 do_syscall_64+0x30/0x40
 entry_SYSCALL_64_after_hwframe+0x62/0xc7

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Rob Norris <rob.norris@klarasystems.com>
Signed-off-by: Chunwei Chen <david.chen@nutanix.com>
Co-authored-by: Chunwei Chen <david.chen@nutanix.com>
Closes #17546
2025-08-05 12:30:23 -04:00
Rob Norris
9e009acbdc dmu_tx: rename dmu_tx_assign() flags from TXG_* to DMU_TX_* (#17143)
This helps to avoids confusion with the similarly-named
txg_wait_synced().

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.

Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Mariusz Zaborski <mariusz.zaborski@klarasystems.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
2025-04-16 09:59:45 -07:00
Rob Norris
9530eb64e0 SPDX: license tags: CDDL-1.0
Sponsored-by: https://despairlabs.com/sponsor/
Signed-off-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
2025-04-16 09:59:44 -07:00
Rob Norris
0d51852ec7 Remove unnecessary CSTYLED escapes on top-level macro invocations
cstyle can handle these cases now, so we don't need to disable it.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16840
2024-12-06 09:05:02 -08:00
tstabrawa
ae48c2f6a9 Revert "Avoid BUG in migrate_folio_extra"
This reverts commit b052035990.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: tstabrawa <59430211+tstabrawa@users.noreply.github.com>
Closes #16568
Closes #16723
2024-11-06 11:54:32 -08:00
Rob Norris
0cf14bf4b5 Linux 6.12: PG_error flag was removed
torvalds/linux@09022bc196 removes the flag, and the corresponding
SetPageError() and ClearPageError() macros, with no replacement offered.

Going back through the upstream history, use of this flag has been
gradually removed over the last year as part of the long tail of
converting everything to folios. Interesting tidbit comments from
torvalds/linux@29e9412b25 and torvalds/linux@420e05d0de suggest that
this flag has not been used meaningfully since page writeback failures
started being recorded in errseq_t instead (the whole "fsyncgate" thing,
~2017, around torvalds/linux@8ed1e46aaf).

Given that, it's possible that since perhaps Linux 4.13 we haven't been
getting anything by setting the flag. I don't know if that's true and/or
if there's something we should be doing instead, but my gut feel is that
its probably fine we only use the page cache as a proxy to allow mmap()
to work, rather than backing IO with it.

As such, I'm expecting that removing this will do no harm, but I'm
leaving it in for older kernels to maintain status quo, and if there is
an overall better way, that is left for a future change.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Closes #16582
2024-10-01 13:54:05 -07:00
tstabrawa
b052035990
Avoid BUG in migrate_folio_extra
Linux page migration code won't wait for writeback to complete unless
it needs to call release_folio.  Call SetPagePrivate wherever
PageUptodate is set and define .release_folio, to cause
fallback_migrate_folio to wait for us.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: tstabrawa <59430211+tstabrawa@users.noreply.github.com>
Closes #15140
Closes #16568
2024-09-26 08:57:09 -07:00
Rob Norris
c22d56e3ed zfs_znode: lift common code to a single shared file
For now, userspace has no znode implementation. Some of the property and
path handling code is used there though and is the same on all
platforms, so we only need a single copy of it.

Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Rob Norris <robn@despairlabs.com>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #16492
2024-09-19 15:49:45 -07:00