mirror_zfs/include/sys/zfs_vfsops.h

225 lines
8.3 KiB
C
Raw Normal View History

2008-11-20 23:01:55 +03:00
/*
* CDDL HEADER START
*
* The contents of this file are subject to the terms of the
* Common Development and Distribution License (the "License").
* You may not use this file except in compliance with the License.
*
* You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
* or http://www.opensolaris.org/os/licensing.
* See the License for the specific language governing permissions
* and limitations under the License.
*
* When distributing Covered Code, include this CDDL HEADER in each
* file and include the License file at usr/src/OPENSOLARIS.LICENSE.
* If applicable, add the following below this CDDL HEADER, with the
* fields enclosed by brackets "[]" replaced with your own identifying
* information: Portions Copyright [yyyy] [name of copyright owner]
*
* CDDL HEADER END
*/
/*
* Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved.
2008-11-20 23:01:55 +03:00
*/
#ifndef _SYS_FS_ZFS_VFSOPS_H
#define _SYS_FS_ZFS_VFSOPS_H
#include <sys/isa_defs.h>
#include <sys/types32.h>
#include <sys/list.h>
#include <sys/vfs.h>
#include <sys/zil.h>
#include <sys/sa.h>
2008-11-20 23:01:55 +03:00
#include <sys/rrwlock.h>
#include <sys/zfs_ioctl.h>
#ifdef __cplusplus
extern "C" {
#endif
typedef struct zfsvfs zfsvfs_t;
struct znode;
2008-11-20 23:01:55 +03:00
/*
* This structure emulates the vfs_t from other platforms. It's purpose
* is to faciliate the handling of mount options and minimize structural
* differences between the platforms.
*/
typedef struct vfs {
struct zfsvfs *vfs_data;
char *vfs_mntpoint; /* Primary mount point */
uint64_t vfs_xattr;
boolean_t vfs_readonly;
boolean_t vfs_do_readonly;
boolean_t vfs_setuid;
boolean_t vfs_do_setuid;
boolean_t vfs_exec;
boolean_t vfs_do_exec;
boolean_t vfs_devices;
boolean_t vfs_do_devices;
boolean_t vfs_do_xattr;
boolean_t vfs_atime;
boolean_t vfs_do_atime;
boolean_t vfs_relatime;
boolean_t vfs_do_relatime;
boolean_t vfs_nbmand;
boolean_t vfs_do_nbmand;
} vfs_t;
typedef struct zfs_mnt {
const char *mnt_osname; /* Objset name */
char *mnt_data; /* Raw mount options */
} zfs_mnt_t;
struct zfsvfs {
vfs_t *z_vfs; /* generic fs struct */
Prototype/structure update for Linux I appologize in advance why to many things ended up in this commit. When it could be seperated in to a whole series of commits teasing that all apart now would take considerable time and I'm not sure there's much merrit in it. As such I'll just summerize the intent of the changes which are all (or partly) in this commit. Broadly the intent is to remove as much Solaris specific code as possible and replace it with native Linux equivilants. More specifically: 1) Replace all instances of zfsvfs_t with zfs_sb_t. While the type is largely the same calling it private super block data rather than a zfsvfs is more consistent with how Linux names this. While non critical it makes the code easier to read when your thinking in Linux friendly VFS terms. 2) Replace vnode_t with struct inode. The Linux VFS doesn't have the notion of a vnode and there's absolutely no good reason to create one. There are in fact several good reasons to remove it. It just adds overhead on Linux if we were to manage one, it conplicates the code, and it likely will lead to bugs so there's a good change it will be out of date. The code has been updated to remove all need for this type. 3) Replace all vtype_t's with umode types. Along with this shift all uses of types to mode bits. The Solaris code would pass a vtype which is redundant with the Linux mode. Just update all the code to use the Linux mode macros and remove this redundancy. 4) Remove using of vn_* helpers and replace where needed with inode helpers. The big example here is creating iput_aync to replace vn_rele_async. Other vn helpers will be addressed as needed but they should be be emulated. They are a Solaris VFS'ism and should simply be replaced with Linux equivilants. 5) Update znode alloc/free code. Under Linux it's common to embed the inode specific data with the inode itself. This removes the need for an extra memory allocation. In zfs this information is called a znode and it now embeds the inode with it. Allocators have been updated accordingly. 6) Minimal integration with the vfs flags for setting up the super block and handling mount options has been added this code will need to be refined but functionally it's all there. This will be the first and last of these to large to review commits.
2011-02-08 22:16:06 +03:00
struct super_block *z_sb; /* generic super_block */
Add backing_device_info per-filesystem For a long time now the kernel has been moving away from using the pdflush daemon to write 'old' dirty pages to disk. The primary reason for this is because the pdflush daemon is single threaded and can be a limiting factor for performance. Since pdflush sequentially walks the dirty inode list for each super block any delay in processing can slow down dirty page writeback for all filesystems. The replacement for pdflush is called bdi (backing device info). The bdi system involves creating a per-filesystem control structure each with its own private sets of queues to manage writeback. The advantage is greater parallelism which improves performance and prevents a single filesystem from slowing writeback to the others. For a long time both systems co-existed in the kernel so it wasn't strictly required to implement the bdi scheme. However, as of Linux 2.6.36 kernels the pdflush functionality has been retired. Since ZFS already bypasses the page cache for most I/O this is only an issue for mmap(2) writes which must go through the page cache. Even then adding this missing support for newer kernels was overlooked because there are other mechanisms which can trigger writeback. However, there is one critical case where not implementing the bdi functionality can cause problems. If an application handles a page fault it can enter the balance_dirty_pages() callpath. This will result in the application hanging until the number of dirty pages in the system drops below the dirty ratio. Without a registered backing_device_info for the filesystem the dirty pages will not get written out. Thus the application will hang. As mentioned above this was less of an issue with older kernels because pdflush would eventually write out the dirty pages. This change adds a backing_device_info structure to the zfs_sb_t which is already allocated per-super block. It is then registered when the filesystem mounted and unregistered on unmount. It will not be registered for mounted snapshots which are read-only. This change will result in flush-<pool> thread being dynamically created and destroyed per-mounted filesystem for writeback. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #174
2011-08-02 05:24:40 +04:00
struct backing_dev_info z_bdi; /* generic backing dev info */
struct zfsvfs *z_parent; /* parent fs */
2008-11-20 23:01:55 +03:00
objset_t *z_os; /* objset reference */
Prototype/structure update for Linux I appologize in advance why to many things ended up in this commit. When it could be seperated in to a whole series of commits teasing that all apart now would take considerable time and I'm not sure there's much merrit in it. As such I'll just summerize the intent of the changes which are all (or partly) in this commit. Broadly the intent is to remove as much Solaris specific code as possible and replace it with native Linux equivilants. More specifically: 1) Replace all instances of zfsvfs_t with zfs_sb_t. While the type is largely the same calling it private super block data rather than a zfsvfs is more consistent with how Linux names this. While non critical it makes the code easier to read when your thinking in Linux friendly VFS terms. 2) Replace vnode_t with struct inode. The Linux VFS doesn't have the notion of a vnode and there's absolutely no good reason to create one. There are in fact several good reasons to remove it. It just adds overhead on Linux if we were to manage one, it conplicates the code, and it likely will lead to bugs so there's a good change it will be out of date. The code has been updated to remove all need for this type. 3) Replace all vtype_t's with umode types. Along with this shift all uses of types to mode bits. The Solaris code would pass a vtype which is redundant with the Linux mode. Just update all the code to use the Linux mode macros and remove this redundancy. 4) Remove using of vn_* helpers and replace where needed with inode helpers. The big example here is creating iput_aync to replace vn_rele_async. Other vn helpers will be addressed as needed but they should be be emulated. They are a Solaris VFS'ism and should simply be replaced with Linux equivilants. 5) Update znode alloc/free code. Under Linux it's common to embed the inode specific data with the inode itself. This removes the need for an extra memory allocation. In zfs this information is called a znode and it now embeds the inode with it. Allocators have been updated accordingly. 6) Minimal integration with the vfs flags for setting up the super block and handling mount options has been added this code will need to be refined but functionally it's all there. This will be the first and last of these to large to review commits.
2011-02-08 22:16:06 +03:00
uint64_t z_flags; /* super_block flags */
2008-11-20 23:01:55 +03:00
uint64_t z_root; /* id of root znode */
uint64_t z_unlinkedobj; /* id of unlinked zapobj */
uint64_t z_max_blksz; /* maximum block size for files */
uint64_t z_fuid_obj; /* fuid table object number */
uint64_t z_fuid_size; /* fuid table size */
avl_tree_t z_fuid_idx; /* fuid tree keyed by index */
avl_tree_t z_fuid_domain; /* fuid tree keyed by domain */
krwlock_t z_fuid_lock; /* fuid lock */
boolean_t z_fuid_loaded; /* fuid tables are loaded */
2009-07-03 02:44:48 +04:00
boolean_t z_fuid_dirty; /* need to sync fuid table ? */
2008-11-20 23:01:55 +03:00
struct zfs_fuid_info *z_fuid_replay; /* fuid info for replay */
zilog_t *z_log; /* intent log pointer */
uint_t z_acl_inherit; /* acl inheritance behavior */
uint_t z_acl_type; /* type of ACL usable on this FS */
2008-11-20 23:01:55 +03:00
zfs_case_t z_case; /* case-sense */
boolean_t z_utf8; /* utf8-only */
int z_norm; /* normalization flags */
boolean_t z_atime; /* enable atimes mount option */
boolean_t z_relatime; /* enable relatime mount option */
2008-11-20 23:01:55 +03:00
boolean_t z_unmounted; /* unmounted */
rrmlock_t z_teardown_lock;
2008-11-20 23:01:55 +03:00
krwlock_t z_teardown_inactive_lock;
Linux 3.1 compat, super_block->s_shrink The Linux 3.1 kernel has introduced the concept of per-filesystem shrinkers which are directly assoicated with a super block. Prior to this change there was one shared global shrinker. The zfs code relied on being able to call the global shrinker when the arc_meta_limit was exceeded. This would cause the VFS to drop references on a fraction of the dentries in the dcache. The ARC could then safely reclaim the memory used by these entries and honor the arc_meta_limit. Unfortunately, when per-filesystem shrinkers were added the old interfaces were made unavailable. This change adds support to use the new per-filesystem shrinker interface so we can continue to honor the arc_meta_limit. The major benefit of the new interface is that we can now target only the zfs filesystem for dentry and inode pruning. Thus we can minimize any impact on the caching of other filesystems. In the context of making this change several other important issues related to managing the ARC were addressed, they include: * The dnlc_reduce_cache() function which was called by the ARC to drop dentries for the Posix layer was replaced with a generic zfs_prune_t callback. The ZPL layer now registers a callback to drop these dentries removing a layering violation which dates back to the Solaris code. This callback can also be used by other ARC consumers such as Lustre. arc_add_prune_callback() arc_remove_prune_callback() * The arc_reduce_dnlc_percent module option has been changed to arc_meta_prune for clarity. The dnlc functions are specific to Solaris's VFS and have already been largely eliminated already. The replacement tunable now represents the number of bytes the prune callback will request when invoked. * Less aggressively invoke the prune callback. We used to call this whenever we exceeded the arc_meta_limit however that's not strictly correct since it results in over zeleous reclaim of dentries and inodes. It is now only called once the arc_meta_limit is exceeded and every effort has been made to evict other data from the ARC cache. * More promptly manage exceeding the arc_meta_limit. When reading meta data in to the cache if a buffer was unable to be recycled notify the arc_reclaim thread to invoke the required prune. * Added arcstat_prune kstat which is incremented when the ARC is forced to request that a consumer prune its cache. Remember this will only occur when the ARC has no other choice. If it can evict buffers safely without invoking the prune callback it will. * This change is also expected to resolve the unexpect collapses of the ARC cache. This would occur because when exceeded just the arc_meta_limit reclaim presure would be excerted on the arc_c value via arc_shrink(). This effectively shrunk the entire cache when really we just needed to reclaim meta data. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #466 Closes #292
2011-12-23 00:20:43 +04:00
list_t z_all_znodes; /* all znodes in the fs */
uint64_t z_nr_znodes; /* number of znodes in the fs */
unsigned long z_rollback_time; /* last online rollback time */
Linux 3.18 compat: Snapshot auto-mounting Re-factor the .zfs/snapshot auto-mouting code to take in to account changes made to the upstream kernels. And to lay the groundwork for enabling access to .zfs snapshots via NFS clients. This patch makes the following core improvements. * All actively auto-mounted snapshots are now tracked in two global trees which are indexed by snapshot name and objset id respectively. This allows for fast lookups of any auto-mounted snapshot regardless without needing access to the parent dataset. * Snapshot entries are added to the tree in zfsctl_snapshot_mount(). However, they are now removed from the tree in the context of the unmount process. This eliminates the need complicated error logic in zfsctl_snapshot_unmount() to handle unmount failures. * References are now taken on the snapshot entries in the tree to ensure they always remain valid while a task is outstanding. * The MNT_SHRINKABLE flag is set on the snapshot vfsmount_t right after the auto-mount succeeds. This allows to kernel to unmount idle auto-mounted snapshots if needed removing the need for the zfsctl_unmount_snapshots() function. * Snapshots in active use will not be automatically unmounted. As long as at least one dentry is revalidated every zfs_expire_snapshot/2 seconds the auto-unmount expiration timer will be extended. * Commit torvalds/linux@bafc9b7 caused snapshots auto-mounted by ZFS to be immediately unmounted when the dentry was revalidated. This was a consequence of ZFS invaliding all snapdir dentries to ensure that negative dentries didn't mask new snapshots. This patch modifies the behavior such that only negative dentries are invalidated. This solves the issue and may result in a performance improvement. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #3589 Closes #3344 Closes #3295 Closes #3257 Closes #3243 Closes #3030 Closes #2841
2015-04-25 02:21:13 +03:00
unsigned long z_snap_defer_time; /* last snapshot unmount deferal */
2008-11-20 23:01:55 +03:00
kmutex_t z_znodes_lock; /* lock for z_all_znodes */
Restructure per-filesystem reclaim Originally when the ARC prune callback was introduced the idea was to register a single callback for the ZPL. The ARC could invoke this call back if it needed the ZPL to drop dentries, inodes, or other cache objects which might be pinning buffers in the ARC. The ZPL would iterate over all ZFS super blocks and perform the reclaim. For the most part this design has worked well but due to limitations in 2.6.35 and earlier kernels there were some problems. This patch is designed to address those issues. 1) iterate_supers_type() is not provided by all kernels which makes it impossible to safely iterate over all zpl_fs_type filesystems in a single callback. The most straight forward and portable way to resolve this is to register a callback per-filesystem during mount. The arc_*_prune_callback() functions have always supported multiple callbacks so this is functionally a very small change. 2) Commit 050d22b removed the non-portable shrink_dcache_memory() and shrink_icache_memory() functions and didn't replace them with equivalent functionality. This meant that for Linux 3.1 and older kernels the ARC had no mechanism to drop dentries and inodes from the caches if needed. This patch adds that missing functionality by calling shrink_dcache_parent() to release dentries which may be pinning inodes. This will result in all unused cache entries being dropped which is a bit heavy handed but it's the only interface available for old kernels. 3) A zpl_drop_inode() callback is registered for kernels older than 2.6.35 which do not support the .evict_inode callback. This ensures that when the last reference on an inode is dropped it is immediately removed from the cache. If this isn't done than inode can end up on the global unused LRU with no mechanism available to ZFS to drop them. Since the ARC buffers are not dropped the hottest inodes can still be recreated without performing disk IO. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Pavel Snajdr <snajpa@snajpa.net> Issue #3160
2015-03-18 01:07:47 +03:00
arc_prune_t *z_arc_prune; /* called by ARC to prune caches */
struct inode *z_ctldir; /* .zfs directory inode */
2008-11-20 23:01:55 +03:00
boolean_t z_show_ctldir; /* expose .zfs in the root dir */
boolean_t z_issnap; /* true if this is a snapshot */
boolean_t z_vscan; /* virus scan on/off */
boolean_t z_use_fuids; /* version allows fuids */
2009-01-16 00:59:39 +03:00
boolean_t z_replay; /* set during ZIL replay */
boolean_t z_use_sa; /* version allow system attributes */
Implement SA based xattrs The current ZFS implementation stores xattrs on disk using a hidden directory. In this directory a file name represents the xattr name and the file contexts are the xattr binary data. This approach is very flexible and allows for arbitrarily large xattrs. However, it also suffers from a significant performance penalty. Accessing a single xattr can requires up to three disk seeks. 1) Lookup the dnode object. 2) Lookup the dnodes's xattr directory object. 3) Lookup the xattr object in the directory. To avoid this performance penalty Linux filesystems such as ext3 and xfs try to store the xattr as part of the inode on disk. When the xattr is to large to store in the inode then a single external block is allocated for them. In practice most xattrs are small and this approach works well. The addition of System Attributes (SA) to zfs provides us a clean way to make this optimization. When the dataset property 'xattr=sa' is set then xattrs will be preferentially stored as System Attributes. This allows tiny xattrs (~100 bytes) to be stored with the dnode and up to 64k of xattrs to be stored in the spill block. If additional xattr space is required, which is unlikely under Linux, they will be stored using the traditional directory approach. This optimization results in roughly a 3x performance improvement when accessing xattrs which brings zfs roughly to parity with ext4 and xfs (see table below). When multiple xattrs are stored per-file the performance improvements are even greater because all of the xattrs stored in the spill block will be cached. However, by default SA based xattrs are disabled in the Linux port to maximize compatibility with other implementations. If you do enable SA based xattrs then they will not be visible on platforms which do not support this feature. ---------------------------------------------------------------------- Time in seconds to get/set one xattr of N bytes on 100,000 files ------+--------------------------------+------------------------------ | setxattr | getxattr bytes | ext4 xfs zfs-dir zfs-sa | ext4 xfs zfs-dir zfs-sa ------+--------------------------------+------------------------------ 1 | 2.33 31.88 21.50 4.57 | 2.35 2.64 6.29 2.43 32 | 2.79 30.68 21.98 4.60 | 2.44 2.59 6.78 2.48 256 | 3.25 31.99 21.36 5.92 | 2.32 2.71 6.22 3.14 1024 | 3.30 32.61 22.83 8.45 | 2.40 2.79 6.24 3.27 4096 | 3.57 317.46 22.52 10.73 | 2.78 28.62 6.90 3.94 16384 | n/a 2342.39 34.30 19.20 | n/a 45.44 145.90 7.55 65536 | n/a 2941.39 128.15 131.32* | n/a 141.92 256.85 262.12* Legend: * ext4 - Stock RHEL6.1 ext4 mounted with '-o user_xattr'. * xfs - Stock RHEL6.1 xfs mounted with default options. * zfs-dir - Directory based xattrs only. * zfs-sa - Prefer SAs but spill in to directories as needed, a trailing * indicates overflow in to directories occured. NOTE: Ext4 supports 4096 bytes of xattr name/value pairs per file. NOTE: XFS and ZFS have no limit on xattr name/value pairs per file. NOTE: Linux limits individual name/value pairs to 65536 bytes. NOTE: All setattr/getattr's were done after dropping the cache. NOTE: All tests were run against a single hard drive. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #443
2011-10-25 03:55:20 +04:00
boolean_t z_xattr_sa; /* allow xattrs to be stores as SA */
2008-11-20 23:01:55 +03:00
uint64_t z_version; /* ZPL version */
2009-07-03 02:44:48 +04:00
uint64_t z_shares_dir; /* hidden shares dir */
kmutex_t z_lock;
uint64_t z_userquota_obj;
uint64_t z_groupquota_obj;
uint64_t z_userobjquota_obj;
uint64_t z_groupobjquota_obj;
uint64_t z_replay_eof; /* New end of file - replay only */
sa_attr_type_t *z_attr_table; /* SA attr mapping->id */
uint64_t z_hold_size; /* znode hold array size */
avl_tree_t *z_hold_trees; /* znode hold trees */
kmutex_t *z_hold_locks; /* znode hold locks */
};
Linux compat 2.6.39: mount_nodev() The .get_sb callback has been replaced by a .mount callback in the file_system_type structure. When using the new interface the caller must now use the mount_nodev() helper. Unfortunately, the new interface no longer passes the vfsmount down to the zfs layers. This poses a problem for the existing implementation because we currently save this pointer in the super block for latter use. It provides our only entry point in to the namespace layer for manipulating certain mount options. This needed to be done originally to allow commands like 'zfs set atime=off tank' to work properly. It also allowed me to keep more of the original Solaris code unmodified. Under Solaris there is a 1-to-1 mapping between a mount point and a file system so this is a fairly natural thing to do. However, under Linux they many be multiple entries in the namespace which reference the same filesystem. Thus keeping a back reference from the filesystem to the namespace is complicated. Rather than introduce some ugly hack to get the vfsmount and continue as before. I'm leveraging this API change to update the ZFS code to do things in a more natural way for Linux. This has the upside that is resolves the compatibility issue for the long term and fixes several other minor bugs which have been reported. This commit updates the code to remove this vfsmount back reference entirely. All modifications to filesystem mount options are now passed in to the kernel via a '-o remount'. This is the expected Linux mechanism and allows the namespace to properly handle any options which apply to it before passing them on to the file system itself. Aside from fixing the compatibility issue, removing the vfsmount has had the benefit of simplifying the code. This change which fairly involved has turned out nicely. Closes #246 Closes #217 Closes #187 Closes #248 Closes #231
2011-05-19 22:44:07 +04:00
#define ZSB_XATTR 0x0001 /* Enable user xattrs */
Prototype/structure update for Linux I appologize in advance why to many things ended up in this commit. When it could be seperated in to a whole series of commits teasing that all apart now would take considerable time and I'm not sure there's much merrit in it. As such I'll just summerize the intent of the changes which are all (or partly) in this commit. Broadly the intent is to remove as much Solaris specific code as possible and replace it with native Linux equivilants. More specifically: 1) Replace all instances of zfsvfs_t with zfs_sb_t. While the type is largely the same calling it private super block data rather than a zfsvfs is more consistent with how Linux names this. While non critical it makes the code easier to read when your thinking in Linux friendly VFS terms. 2) Replace vnode_t with struct inode. The Linux VFS doesn't have the notion of a vnode and there's absolutely no good reason to create one. There are in fact several good reasons to remove it. It just adds overhead on Linux if we were to manage one, it conplicates the code, and it likely will lead to bugs so there's a good change it will be out of date. The code has been updated to remove all need for this type. 3) Replace all vtype_t's with umode types. Along with this shift all uses of types to mode bits. The Solaris code would pass a vtype which is redundant with the Linux mode. Just update all the code to use the Linux mode macros and remove this redundancy. 4) Remove using of vn_* helpers and replace where needed with inode helpers. The big example here is creating iput_aync to replace vn_rele_async. Other vn helpers will be addressed as needed but they should be be emulated. They are a Solaris VFS'ism and should simply be replaced with Linux equivilants. 5) Update znode alloc/free code. Under Linux it's common to embed the inode specific data with the inode itself. This removes the need for an extra memory allocation. In zfs this information is called a znode and it now embeds the inode with it. Allocators have been updated accordingly. 6) Minimal integration with the vfs flags for setting up the super block and handling mount options has been added this code will need to be refined but functionally it's all there. This will be the first and last of these to large to review commits.
2011-02-08 22:16:06 +03:00
/*
* Allow a maximum number of links. While ZFS does not internally limit
* this the inode->i_nlink member is defined as an unsigned int. To be
* safe we use 2^31-1 as the limit.
*/
#define ZFS_LINK_MAX ((1U << 31) - 1U)
2008-11-20 23:01:55 +03:00
/*
* Normal filesystems (those not under .zfs/snapshot) have a total
* file ID size limited to 12 bytes (including the length field) due to
* NFSv2 protocol's limitation of 32 bytes for a filehandle. For historical
* reasons, this same limit is being imposed by the Solaris NFSv3 implementation
* (although the NFSv3 protocol actually permits a maximum of 64 bytes). It
* is not possible to expand beyond 12 bytes without abandoning support
* of NFSv2.
*
* For normal filesystems, we partition up the available space as follows:
* 2 bytes fid length (required)
* 6 bytes object number (48 bits)
* 4 bytes generation number (32 bits)
*
* We reserve only 48 bits for the object number, as this is the limit
* currently defined and imposed by the DMU.
*/
typedef struct zfid_short {
uint16_t zf_len;
uint8_t zf_object[6]; /* obj[i] = obj >> (8 * i) */
uint8_t zf_gen[4]; /* gen[i] = gen >> (8 * i) */
} zfid_short_t;
/*
* Filesystems under .zfs/snapshot have a total file ID size of 22 bytes
* (including the length field). This makes files under .zfs/snapshot
* accessible by NFSv3 and NFSv4, but not NFSv2.
*
* For files under .zfs/snapshot, we partition up the available space
* as follows:
* 2 bytes fid length (required)
* 6 bytes object number (48 bits)
* 4 bytes generation number (32 bits)
* 6 bytes objset id (48 bits)
* 4 bytes currently just zero (32 bits)
*
* We reserve only 48 bits for the object number and objset id, as these are
* the limits currently defined and imposed by the DMU.
*/
typedef struct zfid_long {
zfid_short_t z_fid;
uint8_t zf_setid[6]; /* obj[i] = obj >> (8 * i) */
uint8_t zf_setgen[4]; /* gen[i] = gen >> (8 * i) */
} zfid_long_t;
#define SHORT_FID_LEN (sizeof (zfid_short_t) - sizeof (uint16_t))
#define LONG_FID_LEN (sizeof (zfid_long_t) - sizeof (uint16_t))
extern uint_t zfs_fsyncer_key;
extern int zfs_suspend_fs(zfsvfs_t *zfsvfs);
extern int zfs_resume_fs(zfsvfs_t *zfsvfs, struct dsl_dataset *ds);
extern int zfs_userspace_one(zfsvfs_t *zfsvfs, zfs_userquota_prop_t type,
2009-07-03 02:44:48 +04:00
const char *domain, uint64_t rid, uint64_t *valuep);
extern int zfs_userspace_many(zfsvfs_t *zfsvfs, zfs_userquota_prop_t type,
2009-07-03 02:44:48 +04:00
uint64_t *cookiep, void *vbuf, uint64_t *bufsizep);
extern int zfs_set_userquota(zfsvfs_t *zfsvfs, zfs_userquota_prop_t type,
2009-07-03 02:44:48 +04:00
const char *domain, uint64_t rid, uint64_t quota);
extern boolean_t zfs_owner_overquota(zfsvfs_t *zfsvfs, struct znode *,
boolean_t isgroup);
extern boolean_t zfs_fuid_overquota(zfsvfs_t *zfsvfs, boolean_t isgroup,
uint64_t fuid);
extern boolean_t zfs_fuid_overobjquota(zfsvfs_t *zfsvfs, boolean_t isgroup,
uint64_t fuid);
extern int zfs_set_version(zfsvfs_t *zfsvfs, uint64_t newvers);
extern int zfsvfs_create(const char *name, zfsvfs_t **zfvp);
extern void zfsvfs_free(zfsvfs_t *zfsvfs);
extern int zfs_check_global_label(const char *dsname, const char *hexsl);
2008-11-20 23:01:55 +03:00
extern boolean_t zfs_is_readonly(zfsvfs_t *zfsvfs);
extern int zfs_domount(struct super_block *sb, zfs_mnt_t *zm, int silent);
extern void zfs_preumount(struct super_block *sb);
Prototype/structure update for Linux I appologize in advance why to many things ended up in this commit. When it could be seperated in to a whole series of commits teasing that all apart now would take considerable time and I'm not sure there's much merrit in it. As such I'll just summerize the intent of the changes which are all (or partly) in this commit. Broadly the intent is to remove as much Solaris specific code as possible and replace it with native Linux equivilants. More specifically: 1) Replace all instances of zfsvfs_t with zfs_sb_t. While the type is largely the same calling it private super block data rather than a zfsvfs is more consistent with how Linux names this. While non critical it makes the code easier to read when your thinking in Linux friendly VFS terms. 2) Replace vnode_t with struct inode. The Linux VFS doesn't have the notion of a vnode and there's absolutely no good reason to create one. There are in fact several good reasons to remove it. It just adds overhead on Linux if we were to manage one, it conplicates the code, and it likely will lead to bugs so there's a good change it will be out of date. The code has been updated to remove all need for this type. 3) Replace all vtype_t's with umode types. Along with this shift all uses of types to mode bits. The Solaris code would pass a vtype which is redundant with the Linux mode. Just update all the code to use the Linux mode macros and remove this redundancy. 4) Remove using of vn_* helpers and replace where needed with inode helpers. The big example here is creating iput_aync to replace vn_rele_async. Other vn helpers will be addressed as needed but they should be be emulated. They are a Solaris VFS'ism and should simply be replaced with Linux equivilants. 5) Update znode alloc/free code. Under Linux it's common to embed the inode specific data with the inode itself. This removes the need for an extra memory allocation. In zfs this information is called a znode and it now embeds the inode with it. Allocators have been updated accordingly. 6) Minimal integration with the vfs flags for setting up the super block and handling mount options has been added this code will need to be refined but functionally it's all there. This will be the first and last of these to large to review commits.
2011-02-08 22:16:06 +03:00
extern int zfs_umount(struct super_block *sb);
extern int zfs_remount(struct super_block *sb, int *flags, zfs_mnt_t *zm);
Prototype/structure update for Linux I appologize in advance why to many things ended up in this commit. When it could be seperated in to a whole series of commits teasing that all apart now would take considerable time and I'm not sure there's much merrit in it. As such I'll just summerize the intent of the changes which are all (or partly) in this commit. Broadly the intent is to remove as much Solaris specific code as possible and replace it with native Linux equivilants. More specifically: 1) Replace all instances of zfsvfs_t with zfs_sb_t. While the type is largely the same calling it private super block data rather than a zfsvfs is more consistent with how Linux names this. While non critical it makes the code easier to read when your thinking in Linux friendly VFS terms. 2) Replace vnode_t with struct inode. The Linux VFS doesn't have the notion of a vnode and there's absolutely no good reason to create one. There are in fact several good reasons to remove it. It just adds overhead on Linux if we were to manage one, it conplicates the code, and it likely will lead to bugs so there's a good change it will be out of date. The code has been updated to remove all need for this type. 3) Replace all vtype_t's with umode types. Along with this shift all uses of types to mode bits. The Solaris code would pass a vtype which is redundant with the Linux mode. Just update all the code to use the Linux mode macros and remove this redundancy. 4) Remove using of vn_* helpers and replace where needed with inode helpers. The big example here is creating iput_aync to replace vn_rele_async. Other vn helpers will be addressed as needed but they should be be emulated. They are a Solaris VFS'ism and should simply be replaced with Linux equivilants. 5) Update znode alloc/free code. Under Linux it's common to embed the inode specific data with the inode itself. This removes the need for an extra memory allocation. In zfs this information is called a znode and it now embeds the inode with it. Allocators have been updated accordingly. 6) Minimal integration with the vfs flags for setting up the super block and handling mount options has been added this code will need to be refined but functionally it's all there. This will be the first and last of these to large to review commits.
2011-02-08 22:16:06 +03:00
extern int zfs_statvfs(struct dentry *dentry, struct kstatfs *statp);
Linux compat 2.6.39: mount_nodev() The .get_sb callback has been replaced by a .mount callback in the file_system_type structure. When using the new interface the caller must now use the mount_nodev() helper. Unfortunately, the new interface no longer passes the vfsmount down to the zfs layers. This poses a problem for the existing implementation because we currently save this pointer in the super block for latter use. It provides our only entry point in to the namespace layer for manipulating certain mount options. This needed to be done originally to allow commands like 'zfs set atime=off tank' to work properly. It also allowed me to keep more of the original Solaris code unmodified. Under Solaris there is a 1-to-1 mapping between a mount point and a file system so this is a fairly natural thing to do. However, under Linux they many be multiple entries in the namespace which reference the same filesystem. Thus keeping a back reference from the filesystem to the namespace is complicated. Rather than introduce some ugly hack to get the vfsmount and continue as before. I'm leveraging this API change to update the ZFS code to do things in a more natural way for Linux. This has the upside that is resolves the compatibility issue for the long term and fixes several other minor bugs which have been reported. This commit updates the code to remove this vfsmount back reference entirely. All modifications to filesystem mount options are now passed in to the kernel via a '-o remount'. This is the expected Linux mechanism and allows the namespace to properly handle any options which apply to it before passing them on to the file system itself. Aside from fixing the compatibility issue, removing the vfsmount has had the benefit of simplifying the code. This change which fairly involved has turned out nicely. Closes #246 Closes #217 Closes #187 Closes #248 Closes #231
2011-05-19 22:44:07 +04:00
extern int zfs_vget(struct super_block *sb, struct inode **ipp, fid_t *fidp);
extern int zfs_prune(struct super_block *sb, unsigned long nr_to_scan,
int *objects);
2010-12-17 22:18:08 +03:00
2008-11-20 23:01:55 +03:00
#ifdef __cplusplus
}
#endif
#endif /* _SYS_FS_ZFS_VFSOPS_H */