As a regular kernel function, kern_path() returns errors as negative
errnos, such as -ELOOP. zfsctl_snapdir_vget() must convert these into
the positive errnos used throughout the ZFS code when it returns them
to other ZFS functions so that the ZFS code properly sees them as
errors.
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chris Siebenmann <cks.git01@cs.toronto.edu>
Closes#7764Closes#7864
Commit torvalds/linux@95582b0 changes the inode i_atime, i_mtime,
and i_ctime members form timespec's to timespec64's to make them
2038 safe. As part of this change the current_time() function was
also updated to return the timespec64 type.
Resolve this issue by introducing a new inode_timespec_t type which
is defined to match the timespec type used by the inode. It should
be used when working with inode timestamps to ensure matching types.
The timestruc_t type under Illumos was used in a similar fashion but
was specified to always be a timespec_t. Rather than incorrectly
define this type all timespec_t types have been replaced by the new
inode_timespec_t type.
Finally, the kernel and user space 'sys/time.h' headers were aligned
with each other. They define as appropriate for the context several
constants as macros and include static inline implementation of
gethrestime(), gethrestime_sec(), and gethrtime().
Reviewed-by: Chunwei Chen <tuxoko@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#7643
Backported-by: Richard Yao <ryao@gentoo.org>
Since torvalds/linux@d0a5b99 IOP_XATTR is used to indicate the inode
has xattr support: clear it for the ctldir inodes to avoid EIO errors.
Reviewed-by: Chunwei Chen <david.chen@osnexus.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes#6189
Linux 4.9 added current_time() as the preferred interface to get
the filesystem time. CURRENT_TIME was retired in Linux 4.12.
Reviewed-by: Chunwei Chen <david.chen@osnexus.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#6114
Restructure the handling of mount options to be consistent with
upstream OpenZFS. This required making the following changes.
- The zfs_mntopts_t was renamed vfs_t and adjusted to provide
the minimal needed functionality. This includes a pointer
back to the associated zfsvfs_t. Plus it made it possible
to revert zfs_register_callbacks() and zfsvfs_create() back
to their original prototypes.
- A zfs_mnt_t structure was added for the sole purpose of
providing a structure to pass the osname and raw mount
pointer to zfs_domount() without having to copy them.
- Mount option parsing was moved down from the zpl_* wrapper
functions in to the zfs_* functions. This allowed for the
code to be simplied and it's where similar functionality
appears on other platforms.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
The use of zfs_sb_t instead of zfsvfs_t results in unnecessary
conflicts with the upstream source. Change all instances of
zfs_sb_t to zfsvfs_t including updating the variables names.
Whenever possible the code was updated to be consistent with
hope it appears in the upstream OpenZFS source.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
The current implementation for allowing nfs to access snapdir is very buggy.
It uses a special fh for snapdirs, such that the next time nfsd does
fh_to_dentry, it actually returns the root inode inside the snapshot. So nfsd
never knows it cross a mountpoint.
The problem is that nfsd will not hold a reference on the vfsmount of the
snapshot. This cause auto unmounter to unmount the snapshot even though nfs is
still holding dentries in it.
To fix this, we return the inode for the snapdirs themselves. However, we also
trigger automount upon fh_to_dentry, and return ESTALE so nfsd will revalidate
and see the mountpoint and do crossmnt.
Because nfsd will now be aware that these are different filesystems users
must add crossmnt to their export options to access snapshot directories.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Closes#3794Closes#4716Closes#5810Closes#5833
Enable picky cstyle checks and resolve the new warnings. The vast
majority of the changes needed were to handle minor issues with
whitespace formatting. This patch contains no functional changes.
Non-whitespace changes are as follows:
* 8 times ; to { } in for/while loop
* fix missing ; in cmd/zed/agents/zfs_diagnosis.c
* comment (confim -> confirm)
* change endline , to ; in cmd/zpool/zpool_main.c
* a number of /* BEGIN CSTYLED */ /* END CSTYLED */ blocks
* /* CSTYLED */ markers
* change == 0 to !
* ulong to unsigned long in module/zfs/dsl_scan.c
* rearrangement of module_param lines in module/zfs/metaslab.c
* add { } block around statement after for_each_online_node
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: Håkan Johansson <f96hajo@chalmers.se>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#5465
Use it for spa_deadman, zpl_posix_acl_free, snapentry_expire.
This free system_taskq from the above long delay tasks, and allow us to do
taskq_wait_outstanding on system_taskq without being blocked forever, making
system_taskq more generic and useful.
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Originally the .zfs/snapshot directory was disabled for 32-bit systems
because 64-bit inode numbers were not supported. This is no longer
the case and this functionality can be enabled by default.
Reviewed-by: Tom Caputi <tcaputi@datto.com>
Reviewed-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #5347Closes#2002
Add the TASKQID_INVALID macros and update callers to use the macro
instead of testing against 0. There is no functional change
even though the functions in zfs_ctldir.c incorrectly used -1
instead of 0.
Reviewed-by: Tom Caputi <tcaputi@datto.com>
Reviewed-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #5347
Call mount and umount via /usr/bin/env instead of /bin/sh in
zfsctl_snapshot_mount() and zfsctl_snapshot_unmount().
This change fixes a shell code injection flaw. The call to /bin/sh
passed the mountpoint unescaped, only surrounded by single quotes. A
mountpoint containing one or more single quotes would cause the command
to fail or potentially execute arbitrary shell code.
This change also provides compatibility with grsecurity patches.
Grsecurity only allows call_usermodehelper() to use helper binaries in
certain paths. /usr/bin/* is allowed, /bin/* is not.
Remove duplicate z_uid/z_gid member which are also held in the
generic vfs inode struct. This is done by first removing the members
from struct znode and then using the KUID_TO_SUID/KGID_TO_SGID
macros to access the respective member from struct inode. In cases
where the uid/gids are being marshalled from/to disk, use the newly
introduced zfs_(uid|gid)_(read|write) functions to properly
save the uids rather than the internal kernel representation.
Signed-off-by: Nikolay Borisov <n.borisov.lkml@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #4685
Issue #227
A mostly mechanical change, taking into account i_nlink is 32 bits vs ZFS's
64 bit on-disk link count.
We revert "xattr dir doesn't get purged during iput" (ddae16a) as this is a
more Linux-integrated fix for the same issue.
In addition, setting the initial link count on a new node has been changed
from setting one less than required in zfs_mknode() then incrementing to the
correct count in zfs_link_create() (which was somewhat bizarre in the first
place), to setting the correct count in zfs_mknode() and not incrementing it
in zfs_link_create(). This both means we no longer set the link count in
sa_bulk_update() twice (once for the initial incorrect count then again for
the correct count), as well as adhering to the Linux requirement of not
incrementing a zero link count without I_LINKABLE (see linux commit
f4e0c30c).
Signed-off-by: Chris Dunlop <chris@onthe.net.au>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Closes#4838
Issue #227
ZFS allows for specific permissions to be delegated to normal users
with the `zfs allow` and `zfs unallow` commands. In addition, non-
privileged users should be able to run all of the following commands:
* zpool [list | iostat | status | get]
* zfs [list | get]
Historically this functionality was not available on Linux. In order
to add it the secpolicy_* functions needed to be implemented and mapped
to the equivalent Linux capability. Only then could the permissions on
the `/dev/zfs` be relaxed and the internal ZFS permission checks used.
Even with this change some limitations remain. Under Linux only the
root user is allowed to modify the namespace (unless it's a private
namespace). This means the mount, mountpoint, canmount, unmount,
and remount delegations cannot be supported with the existing code. It
may be possible to add this functionality in the future.
This functionality was validated with the cli_user and delegation test
cases from the ZFS Test Suite. These tests exhaustively verify each
of the supported permissions which can be delegated and ensures only
an authorized user can perform it.
Two minor bug fixes were required for test-running.py. First, the
Timer() object cannot be safely created in a `try:` block when there
is an unconditional `finally` block which references it. Second,
when running as a normal user also check for scripts using the
both the .ksh and .sh suffixes.
Finally, existing users who are simulating delegations by setting
group permissions on the /dev/zfs device should revert that
customization when updating to a version with this change.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes#362Closes#434Closes#4100Closes#4394Closes#4410Closes#4487
This field is a duplicate of the inode->i_generation, so just
kill it.
Signed-off-by: Nikolay Borisov <n.borisov.lkml@gmail.com>
Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#4538Closes#4654
struct zvol_state contains a dummy znode, which is around 1KB on x64,
only for zfs_range_lock. But in reality, other than z_range_lock and
z_range_avl, zfs_range_lock only need znode on regular file, which
means we add 1KB on a structure and gain nothing.
In this patch, we remove the dummy znode for zvol_state. In order to
do that, we also need to refactor zfs_range_lock a bit. We move
z_range_lock and z_range_avl pair out of znode_t to form zfs_rlock_t.
This new struct replaces znode_t as the main handle inside the range
lock functions.
We also add pointers to z_size, z_blksz, and z_max_blksz so range lock
code doesn't depend on znode_t. This allows non-ZPL consumers like
Lustre to use the range locks with their equivalent znode_t structure.
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Signed-off-by: Boris Protopopov <boris.protopopov@actifio.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#4510
6093 zfsctl_shares_lookup should only VN_RELE() on zfs_zget() success
Reviewed by: Gordon Ross <gwr@nexenta.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>
OpenZFS-issue: https://www.illumos.org/issues/6093
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/0f92170Closes#4630
This function was always implemented slightly differently under Linux
and therefore never suffered from this issue. The patch has been
updated and applied as cleanup in order to minimize differences with
the upstream OpenZFS code.
This reverts commit 4cd77889b6. The
i_generation field in the inode is 32-bit and the SA code expects
64-bit fixed values. Revert this optimization for now until
this is cleanly addressed.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #4538
This field is a duplicate of the inode->i_generation, so just kill it
Signed-off-by: Nikolay Borisov <n.borisov.lkml@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#4538
The problem for atime:
We have 3 places for atime: inode->i_atime, znode->z_atime and SA. And its
handling is a mess. A huge part of mess regarding atime comes from
zfs_tstamp_update_setup, zfs_inode_update, and zfs_getattr, which behave
inconsistently with those three values.
zfs_tstamp_update_setup clears z_atime_dirty unconditionally as long as you
don't pass ATTR_ATIME. Which means every write(2) operation which only updates
ctime and mtime will cause atime changes to not be written to disk.
Also zfs_inode_update from write(2) will replace inode->i_atime with what's
inside SA(stale). But doesn't touch z_atime. So after read(2) and write(2).
You'll have i_atime(stale), z_atime(new), SA(stale) and z_atime_dirty=0.
Now, if you do stat(2), zfs_getattr will actually replace i_atime with what's
inside, z_atime. So you will have now you'll have i_atime(new), z_atime(new),
SA(stale) and z_atime_dirty=0. These will all gone after umount. And you'll
leave with a stale atime.
The problem for relatime:
We do have a relatime config inside ZFS dataset, but how it should interact
with the mount flag MS_RELATIME is not well defined. It seems it wanted
relatime mount option to override the dataset config by showing it as
temporary in `zfs get`. But at the same time, `zfs set relatime=on|off` would
also seems to want to override the mount option. Not to mention that
MS_RELATIME flag is actually never passed into ZFS, so it never really worked.
How Linux handles atime:
The Linux kernel actually handles atime completely in VFS, except for writing
it to disk. So if we remove the atime handling in ZFS, things would just work,
no matter it's strictatime, relatime, noatime, or even O_NOATIME. And whenever
VFS updates the i_atime, it will notify the underlying filesystem via
sb->dirty_inode().
And also there's one thing to note about atime flags like MS_RELATIME and
other flags like MS_NODEV, etc. They are mount point flags rather than
filesystem(sb) flags. Since native linux filesystem can be mounted at multiple
places at the same time, they can all have different atime settings. So these
flags are never passed down to filesystem drivers.
What this patch tries to do:
We remove znode->z_atime, since we won't gain anything from it. We remove most
of the atime handling and leave it to VFS. The only thing we do with atime is
to write it when dirty_inode() or setattr() is called. We also add
file_accessed() in zpl_read() since it's not provided in vfs_read().
After this patch, only the MS_RELATIME flag will have effect. The setting in
dataset won't do anything. We will make zfstuil to mount ZFS with MS_RELATIME
set according to the setting in dataset in future patch.
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #4482
While exceptionally unlikely to cause a problem the zfs_snapentry_t
hold should be taken before the dispatch to prevent any possibility
of the task being processed before the hold.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
When a concorrent mount finishes just before calling to
zfsctl_snapshot_ismounted, if we return EISDIR, the VFS will return
with EREMOTE. We should instead just return 0, so VFS may retry and
would likely notice the dentry is alreadly mounted. This will be
inline with when usermode helper return EBUSY.
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
By changing the zfs_snapshot_lock from a mutex to a rw lock the
zfsctl_lookup_objset() function can be allowed to run concurrently.
This should reduce the latency of fh_to_dentry lookups in ZFS
snapshots which are being accessed over NFS.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
The zfsctl_snapshot_unmount_delay() function must not be called
from zfsctl_lookup_objset() while it is currently holding the
zfs_snapshot_lock. This will result in a deadlock. It is safe
to call zfsctl_snapshot_unmount_delay_impl() directly because the
function already has a reference on the zfs_snapentry_t.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Closes#3997
There are cases where it's desirable that auto-mounted snapshots
not expire after a fixed duration. They should be unmounted only
when the filesystem they are a snapshot of is unmounted.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
objsetid is not unique across pool, so using it solely as key would cause
panic when automounting two snapshot on different pools with the same
objsetid. We fix this by adding spa pointer as additional key.
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <ryao@gentoo.org>
Issue #3948
Issue #3786
Issue #3887
When concurrent threads accessing the snapdir, one will succeed the user
helper mount while others will get EBUSY. However, the original code treats
those EBUSY threads as success and goes on to do zfsctl_snapshot_add, which
causes repeated avl_add and thus panic.
Also, if the snapshot is already mounted somewhere else, a thread accessing
the snapdir will also get EBUSY from user helper mount. And it will cause
strange things as doing follow_down_one will fail and then follow_up will jump
up to the mountpoint of the filesystem and confuse the hell out of VFS.
The patch fix both behavior by returning 0 immediately for the EBUSY threads.
Note, this will have a side effect for the second case where the VFS will
retry several times before returning ELOOP.
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#4018
Accessing a snapshot via NFS should cause an auto-unmount of that
snapshot to be deferred until such as time as the snapshot is idle.
This is analogous to the zpl_revalidate logic employed by locally
mounted snapshots.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #3794
This patch is based on the previous work done by @andrey-ve and
@yshui. It triggers the automount by using kern_path() to traverse
to the known snapshout mount point. Once the snapshot is mounted
NFS can access the contents of the snapshot.
Allowing NFS clients to access to the .zfs/snapshot directory would
normally mean that a root user on a client mounting an export with
'no_root_squash' would be able to use mkdir/rmdir/mv to manipulate
snapshots on the server. To prevent configuration mistakes a
zfs_admin_snapshot module option was added which disables the
mkdir/rmdir/mv functionally. System administators desiring this
functionally must explicitly enable it.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#2797Closes#1655Closes#616
Re-factor the .zfs/snapshot auto-mouting code to take in to account
changes made to the upstream kernels. And to lay the groundwork for
enabling access to .zfs snapshots via NFS clients. This patch makes
the following core improvements.
* All actively auto-mounted snapshots are now tracked in two global
trees which are indexed by snapshot name and objset id respectively.
This allows for fast lookups of any auto-mounted snapshot regardless
without needing access to the parent dataset.
* Snapshot entries are added to the tree in zfsctl_snapshot_mount().
However, they are now removed from the tree in the context of the
unmount process. This eliminates the need complicated error logic
in zfsctl_snapshot_unmount() to handle unmount failures.
* References are now taken on the snapshot entries in the tree to
ensure they always remain valid while a task is outstanding.
* The MNT_SHRINKABLE flag is set on the snapshot vfsmount_t right
after the auto-mount succeeds. This allows to kernel to unmount
idle auto-mounted snapshots if needed removing the need for the
zfsctl_unmount_snapshots() function.
* Snapshots in active use will not be automatically unmounted. As
long as at least one dentry is revalidated every zfs_expire_snapshot/2
seconds the auto-unmount expiration timer will be extended.
* Commit torvalds/linux@bafc9b7 caused snapshots auto-mounted by ZFS
to be immediately unmounted when the dentry was revalidated. This
was a consequence of ZFS invaliding all snapdir dentries to ensure that
negative dentries didn't mask new snapshots. This patch modifies the
behavior such that only negative dentries are invalidated. This solves
the issue and may result in a performance improvement.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#3589Closes#3344Closes#3295Closes#3257Closes#3243Closes#3030Closes#2841
Under Linux filesystem threads responsible for handling I/O are
normally created with the maximum priority. Non-I/O filesystem
processes run with the default priority. ZFS should adopt the
same priority scheme under Linux to maintain good performance
and so that it will complete fairly when other Linux filesystems
are active. The priorities have been updated to the following:
$ ps -eLo rtprio,cls,pid,pri,nice,cmd | egrep 'z_|spl_|zvol|arc|dbu|meta'
- TS 10743 19 -20 [spl_kmem_cache]
- TS 10744 19 -20 [spl_system_task]
- TS 10745 19 -20 [spl_dynamic_tas]
- TS 10764 19 0 [dbu_evict]
- TS 10765 19 0 [arc_prune]
- TS 10766 19 0 [arc_reclaim]
- TS 10767 19 0 [arc_user_evicts]
- TS 10768 19 0 [l2arc_feed]
- TS 10769 39 0 [z_unmount]
- TS 10770 39 -20 [zvol]
- TS 11011 39 -20 [z_null_iss]
- TS 11012 39 -20 [z_null_int]
- TS 11013 39 -20 [z_rd_iss]
- TS 11014 39 -20 [z_rd_int_0]
- TS 11022 38 -19 [z_wr_iss]
- TS 11023 39 -20 [z_wr_iss_h]
- TS 11024 39 -20 [z_wr_int_0]
- TS 11032 39 -20 [z_wr_int_h]
- TS 11033 39 -20 [z_fr_iss_0]
- TS 11041 39 -20 [z_fr_int]
- TS 11042 39 -20 [z_cl_iss]
- TS 11043 39 -20 [z_cl_int]
- TS 11044 39 -20 [z_ioctl_iss]
- TS 11045 39 -20 [z_ioctl_int]
- TS 11046 39 -20 [metaslab_group_]
- TS 11050 19 0 [z_iput]
- TS 11121 38 -19 [z_wr_iss]
Note that under Linux the meaning of a processes priority is inverted
with respect to illumos. High values on Linux indicate a _low_ priority
while high value on illumos indicate a _high_ priority.
In order to preserve the logical meaning of the minclsyspri and
maxclsyspri macros when they are used by the illumos wrapper functions
their values have been inverted. This way when changes are merged
from upstream illumos we won't need to remember to invert the macro.
It could also lead to confusion.
This patch depends on https://github.com/zfsonlinux/spl/pull/466.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ned Bass <bass6@llnl.gov>
Closes#3607
The vast majority of these changes are in Linux specific code.
They are the result of not having an automated style checker to
validate the code when it was originally written. Others were
caused when the common code was slightly adjusted for Linux.
This patch contains no functional changes. It only refreshes
the code to conform to style guide.
Everyone submitting patches for inclusion upstream should now
run 'make checkstyle' and resolve any warning prior to opening
a pull request. The automated builders have been updated to
fail a build if when 'make checkstyle' detects an issue.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#1821
In the current snapshot automount implementation, it is possible for
multiple mounts to attempted concurrently. Only one of the mounts will
succeed and the other will fail. The failed mounts will cause an EREMOTE
to be propagated back to the application.
This commit works around the problem by adding a new exit status,
MOUNT_BUSY to the mount.zfs program which is used when the underlying
mount(2) call returns EBUSY. The zfs code detects this condition and
treats it as if the mount had succeeded.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#1819
3741 zfs needs better comments
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Eric Schrock <eric.schrock@delphix.com>
Approved by: Christopher Siden <christopher.siden@delphix.com>
References:
https://www.illumos.org/issues/3741illumos/illumos-gate@3e30c24aee
Ported-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #1775
3598 want to dtrace when errors are generated in zfs
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Adam Leventhal <ahl@delphix.com>
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Approved by: Garrett D'Amore <garrett@damore.org>
References:
https://www.illumos.org/issues/3598illumos/illumos-gate@be6fd75a69
Ported-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #1775
Porting notes:
1. include/sys/zfs_context.h has been modified to render some new
macros inert until dtrace is available on Linux.
2. Linux-specific changes have been adapted to use SET_ERROR().
3. I'm NOT happy about this change. It does nothing but ugly
up the code under Linux. Unfortunately we need to take it to
avoid more merge conflicts in the future. -Brian
2882 implement libzfs_core
2883 changing "canmount" property to "on" should not always remount dataset
2900 "zfs snapshot" should be able to create multiple, arbitrary snapshots at once
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Chris Siden <christopher.siden@delphix.com>
Reviewed by: Garrett D'Amore <garrett@damore.org>
Reviewed by: Bill Pijewski <wdp@joyent.com>
Reviewed by: Dan Kruchinin <dan.kruchinin@gmail.com>
Approved by: Eric Schrock <Eric.Schrock@delphix.com>
References:
https://www.illumos.org/issues/2882https://www.illumos.org/issues/2883https://www.illumos.org/issues/2900illumos/illumos-gate@4445fffbbb
Ported-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#1293
Porting notes:
WARNING: This patch changes the user/kernel ABI. That means that
the zfs/zpool utilities built from master are NOT compatible with
the 0.6.2 kernel modules. Ensure you load the matching kernel
modules from master after updating the utilities. Otherwise the
zfs/zpool commands will be unable to interact with your pool and
you will see errors similar to the following:
$ zpool list
failed to read pool configuration: bad address
no pools available
$ zfs list
no datasets available
Add zvol minor device creation to the new zfs_snapshot_nvl function.
Remove the logging of the "release" operation in
dsl_dataset_user_release_sync(). The logging caused a null dereference
because ds->ds_dir is zeroed in dsl_dataset_destroy_sync() and the
logging functions try to get the ds name via the dsl_dataset_name()
function. I've got no idea why this particular code would have worked
in Illumos. This code has subsequently been completely reworked in
Illumos commit 3b2aab1 (3464 zfs synctask code needs restructuring).
Squash some "may be used uninitialized" warning/erorrs.
Fix some printf format warnings for %lld and %llu.
Apply a few spa_writeable() changes that were made to Illumos in
illumos/illumos-gate.git@cd1c8b8 as part of the 3112, 3113, 3114 and
3115 fixes.
Add a missing call to fnvlist_free(nvl) in log_internal() that was added
in Illumos to fix issue 3085 but couldn't be ported to ZoL at the time
(zfsonlinux/zfs@9e11c73) because it depended on future work.
When CONFIG_UIDGID_STRICT_TYPE_CHECKS is enabled uid_t/git_t are
replaced by kuid_t/kgid_t, which are structures instead of integral
types. This causes any code that uses an integral type to fail to build.
The User Namespace functionality introduced in Linux 3.8 requires
CONFIG_UIDGID_STRICT_TYPE_CHECKS, so we could not build against any
kernel that supported it.
We resolve this by converting between the new kuid_t/kgid_t structures
and the original uid_t/gid_t types.
Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#1589
It is possible for an automounted snapshot which is expiring to
deadlock with a manual unmount of the snapshot. This can occur
because taskq_cancel_id() will block if the task is currently
executing until it completes. But it will never complete because
zfsctl_unmount_snapshot() is holding the zsb->z_ctldir_lock which
zfsctl_expire_snapshot() must acquire.
---------------------- z_unmount/0:2153 ---------------------
mutex_lock <blocking on zsb->z_ctldir_lock>
zfsctl_unmount_snapshot
zfsctl_expire_snapshot
taskq_thread
------------------------- zfs:10690 -------------------------
taskq_wait_id <waiting for z_unmount to exit>
taskq_cancel_id
__zfsctl_unmount_snapshot
zfsctl_unmount_snapshot <takes zsb->z_ctldir_lock>
zfs_unmount_snap
zfs_ioc_destroy_snaps_nvl
zfsdev_ioctl
do_vfs_ioctl
We resolve the deadlock by dropping the zsb->z_ctldir_lock before
calling __zfsctl_unmount_snapshot(). The lock is only there to
prevent concurrent modification to the zsb->z_ctldir_snaps AVL
tree. Moreover, we're careful to remove the zfs_snapentry_t from
the AVL tree before dropping the lock which ensures no other tasks
can find it. On failure it's added back to the tree.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chris Dunlap <cdunlap@llnl.gov>
Closes#1527
Retire the dmu_snapshot_id() function which was introduced in the
initial .zfs control directory implementation. There is already
an existing dsl_dataset_snap_lookup() which does exactly what we
need, and the dmu_snapshot_id() function as implemented is racy.
https://github.com/zfsonlinux/zfs/issues/1215#issuecomment-12579879
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#1238
Rolling back a mounted filesystem with open file handles and
cached dentries+inodes never worked properly in ZoL. The
major issue was that Linux provides no easy mechanism for
modules to invalidate the inode cache for a file system.
Because of this it was possible that an inode from the previous
filesystem would not get properly dropped from the cache during
rolling back. Then a new inode with the same inode number would
be create and collide with the existing cached inode. Ideally
this would trigger an VERIFY() but in practice the error wasn't
handled and it would just NULL reference.
Luckily, this issue can be resolved by sprucing up the existing
Solaris zfs_rezget() functionality for the Linux VFS.
The way it works now is that when a file system is rolled back
all the cached inodes will be traversed and refetched from disk.
If a version of the cached inode exists on disk the in-core
copy will be updated accordingly. If there is no match for that
object on disk it will be unhashed from the inode cache and
marked as stale.
This will effectively make the inode unfindable for lookups
allowing the inode number to be immediately recycled. The inode
will then only be accessible from the cached dentries. Subsequent
dentry lookups which reference a stale inode will result in the
dentry being invalidated. Once invalidated the dentry will drop
its reference on the inode allowing it to be safely pruned from
the cache.
Special care is taken for negative dentries since they do not
reference any inode. These dentires will be invalidate based
on when they were added to the dentry cache. Entries added
before the last rollback will be invalidate to prevent them
from masking real files in the dataset.
Two nice side effects of this fix are:
* Removes the dependency on spl_invalidate_inodes(), it can now
be safely removed from the SPL when we choose to do so.
* zfs_znode_alloc() no longer requires a dentry to be passed.
This effectively reverts this portition of the code to its
upstream counterpart. The dentry is not instantiated more
correctly in the Linux ZPL layer.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ned Bass <bass6@llnl.gov>
Closes#795
A misplaced single quote caused the umount command to fail with a
syntax error when unmounting snapshots under the .zfs/snapshot
control directory.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#1210
As of Linux 3.4 the UMH_WAIT_* constants were renumbered. In
particular, the meaning of "1" changed from UMH_WAIT_PROC (wait for
process to complete), to UMH_WAIT_EXEC (wait for the exec, but not the
process). A number of call sites used the number 1 instead of the
constant name, so the behavior was not as expected on kernels with this
change.
One visible consequence of this change was that processes accessing
automounted snapshots received an ELOOP error because they failed to
wait for zfs.mount to complete.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#816
Linux kernel commit d8e794d accidentally broke the delayed work
APIs for non-GPL callers. While the APIs to schedule a delayed
work item are still available to all callers, it is no longer
possible to initialize the delayed work item.
I'm cautiously optimistic we could get the delayed_work_timer_fn
exported for all callers in the upstream kernel. But frankly
the compatibility code to use this kernel interface has always
been problematic.
Therefore, this patch abandons direct use the of the Linux
kernel interface in favor of the new delayed taskq interface.
It provides roughly the same functionality as delayed work queues
but it's a stable interface under our control.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#1053
When automounting a snapshot in the .zfs/snapshot directory
make sure to quote both the dataset name and the mount point.
This ensures that if either component contains spaces, which
are allowed, they get handled correctly.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#1027