mirror_zfs

mirror of https://git.proxmox.com/git/mirror_zfs.git synced 2024-11-18 02:20:59 +03:00

Author	SHA1	Message	Date
Brian Behlendorf	9fd91daeef	Honor setgid bit on directories Newly created files were always being created with the fsuid/fsgid in the current user's credentials. This is correct except in the case when the parent directory sets the 'setgid' bit. In this case, according to POSIX, the newly created file/directory should inherit the gid of the parent directory. Additionally, in the case of a subdirectory it should also inherit the 'setgid' bit. Finally, this commit performs a little cleanup of the vattr_t initialization by moving it to a common helper function. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #262	2011-07-20 14:07:13 -07:00
Brian Behlendorf	057e8eee35	Improve fstat(2) performance There is at most a factor of 3x performance improvement to be had by using the Linux generic_fillattr() helper. However, to use it safely we need to ensure the values in a cached inode are kept rigorously up to date. Unfortunately, this isn't the case for the blksize, blocks, and atime fields. At the moment the authoritative values are still stored in the znode. This patch introduces an optimized zfs_getattr_fast() call. The idea is to use the up to date values from the inode and the blksize, blocks, and atime fields from the znode. At some later date we should be able to strictly use the inode values and further improve performance. The remaining overhead in the zfs_getattr_fast() call can be attributed to having to take the znode mutex. This overhead is unavoidable until the inode is kept strictly up to date. The careful reader will notice that we do not use the customary ZFS_ENTER()/ZFS_EXIT() macros. These macros are designed to ensure the filesystem is not torn down in the middle of an operation. However, in this case the VFS is holding a reference on the active inode so we know this is impossible.

=================== Performance Tests ========================

This test calls the fstat(2) system call 10,000,000 times on an open file description in a tight loop. The test results show the zfs stat(2) performance is now only 22% slower than ext4. This is a 2.5x improvement and there is a clear long term plan to get to parity with ext4.

filesystem    |  test-1  test-2  test-3 | average | times-ext4
--------------+-------------------------+---------+-----------
ext4          |  7.785s  7.899s  7.284s |  7.656s | 1.000x
zfs-0.6.0-rc4 | 24.052s 22.531s 23.857s | 23.480s | 3.066x
zfs-faststat  |  9.224s  9.398s  9.485s |  9.369s | 1.223x

The second test is to run 'du' on a copy of the /usr tree which contains 110514 files. The test is run multiple times using both a cold cache (/proc/sys/vm/drop_caches) and a hot cache. As expected this change significantly improved the zfs hot cache performance, but it doesn't quite bring zfs to parity with ext4. A little surprisingly the zfs cold cache performance is better than ext4. This can probably be attributed to the zfs allocation policy of co-locating all the metadata on disk, which minimizes seek times. By default the ext4 allocator will spread the data over the entire disk, only co-locating each directory.

filesystem    | cold    | hot
--------------+---------+--------
ext4          | 13.318s | 1.040s
zfs-0.6.0-rc4 |  4.982s | 1.762s
zfs-faststat  |  4.933s | 1.345s	2011-07-11 09:11:22 -07:00
Ned A. Bass	aa6d8c1086	Don't store rdev in SA for FIFOs and sockets Update the handling of named pipes and sockets to be consistent with other platforms with regard to the rdev attribute. While all ZFS implementations store the rdev for device files in a system attribute (SA), this is not the case for FIFOs and sockets. Indeed, Linux always passes rdev=0 to mknod() for FIFOs and sockets, so the value is not needed. Add an ASSERT that rdev==0 for FIFOs and sockets to detect if the expected behavior ever changes. Closes #216	2011-05-09 13:35:07 -07:00
Brian Behlendorf	c85b224faf	Call d_instantiate before unlocking inode Under Linux a dentry referencing an inode must be instantiated before the inode is unlocked. To accomplish this without overly modifying the core ZFS code the dentry is passed via the vattr_t. There are cases, such as replay, when a dentry is not available. In those cases it is not initialized at inode creation time; if a dentry is needed later it will be spliced in as required via d_lookup().	2011-04-07 09:51:57 -07:00
Brian Behlendorf	81e97e2187	Linux 2.6.29 compat, credentials As of Linux 2.6.29 a clean credential API was added to the Linux kernel. Previously the credential was embedded in the task_struct. Because the SPL already has considerable support for handling this API change the ZPL code has been updated to use the Solaris credential API.	2011-03-22 12:15:54 -07:00
Brian Behlendorf	53cf50e081	Set stat->st_dev and statfs->f_fsid Filesystems like ZFS must use what the kernel calls an anonymous super block. Basically, this is just a filesystem which is not backed by a single block device. Normally this block device's dev_t is stored in the super block. For anonymous super blocks a unique reserved dev_t is assigned as part of get_sb(). This sb->s_dev must then be set in the returned stat structures as stat->st_dev. This allows userspace utilities to easily detect the boundaries of a specific filesystem. Tools such as 'du' depend on this for proper accounting. Additionally, under OpenSolaris the statfs->f_fsid is set to the device id. To preserve consistency with OpenSolaris we also set the fsid to the device id. Other Linux filesystems (ext) set the fsid to a unique value determined by the filesystem's UUID. This value is unique but maintains no relationship to the device id. This may be desirable when exporting NFS filesystems because it minimizes the chance of a client observing the same fsid from two different servers. Closes #140	2011-03-07 16:06:22 -08:00
Brian Behlendorf	5484965ab6	Drop HAVE_XVATTR macros When I began work on the POSIX layer it immediately became clear to me that to integrate cleanly with the Linux VFS certain Solaris specific things would have to go. One of these things was to eliminate as many Solaris specific types from the ZPL layer as possible. They would be replaced with their Linux equivalents. This would not only be good for performance, but for the general readability and health of the code. The Solaris and Linux VFS are different beasts and should be treated as such. Most of the code remains common for constructing transactions and such, but there are subtle and important differences which need to be respected. This policy went quite far for certain types such as the vnode_t, and it initially seemed to be working out well for the vattr_t. There was a relatively small amount of related xvattr_t code I was forced to comment out with HAVE_XVATTR. But it didn't look that hard to come back soon and replace it all with a native Linux type. However, after going down this path with xvattr some distance it became clear that this code was woven into the ZPL more deeply than I thought. In particular its hooks went very deep into the ZPL replay code and replacing it would not be as easy as I originally thought. Rather than continue pursuing replacing and removing this code I've taken a step back and reevaluated things. This commit reverts many of my previous commits which removed xvattr related code. It restores much of the code to its original upstream state and now relies on improved xvattr_t support in the zfs package itself. The result of this is that much of the code which I had commented out, which accidentally broke things like replay, is now back in place and working. However, there may be a small performance impact for getattr/setattr operations because they now require a translation from native Linux to Solaris types. For now that's a price I'm willing to pay.
Once everything is completely functional we can revisit the issue of removing the vattr_t/xvattr_t types. Closes #111	2011-03-02 11:44:34 -08:00
Brian Behlendorf	5095000169	Use -zfs_readlink() error The zfs_readlink() function returns a Solaris positive error value and that needs to be converted to a Linux negative error value. While in this case nothing would actually go wrong, it's still incorrect and should be fixed if for no other reason than clarity.	2011-02-17 09:48:06 -08:00
Brian Behlendorf	8b4f9a2d55	Fix readlink(2) This patch addresses three issues related to symlinks. 1) Revert the zfs_follow_link() function to a modified version of the original zfs_readlink(). The only changes from the original OpenSolaris version relate to using Linux types. For the moment this means no vnodes and no zfsvfs_t. The caller zpl_follow_link() was also updated accordingly. This change was reverted because it was slightly gratuitous. 2) Update zpl_follow_link() to use local variables for the link buffer. I'd forgotten that iov.iov_base is updated by uiomove() so after the call to zfs_readlink() it can no longer be used. We need our own private copy of the link pointer. 3) Allocate MAXPATHLEN instead of MAXPATHLEN+1. By default MAXPATHLEN is 4096 bytes which is a full page, adding one to it pushes it slightly over a page. That means you'll likely end up allocating 2 pages which is wasteful of memory and possibly slightly slower.	2011-02-16 15:54:55 -08:00
Brian Behlendorf	a6695d83b7	Add get/setattr, get/setxattr hooks While the attr/xattr hooks were already in place for regular files, these hooks can also apply to directories and special files. While they aren't typically used in this way, it should be supported. This patch registers these additional callbacks for both directory and special inode types.	2011-02-16 09:55:53 -08:00
Brian Behlendorf	ee154f01bf	Add Hooks for Linux Inode Operations The Linux specific inode operations have all been located in the file zpl_inode.c. These functions primarily rely on the reworked zfs_* functions to do their job. They are also responsible for converting the possible Solaris style error codes to negative Linux errors.	2011-02-10 09:27:21 -08:00
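
Several of the commits listed above mention converting Solaris style positive error values into the negative values Linux expects. A minimal sketch of that convention follows. It is an illustration only, not the code from zpl_inode.c: the names solaris_style_op() and linux_style_op() are hypothetical, and the failure is simulated with a flag instead of a real filesystem call.

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical stand-in for a zfs_* function. Following the Solaris
 * convention it returns 0 on success or a positive errno on failure.
 */
static int
solaris_style_op(int should_fail)
{
	return (should_fail ? ENOENT : 0);
}

/*
 * Hypothetical stand-in for a zpl_* hook. The Linux VFS expects 0 on
 * success or a negative errno, so the result is negated on the way out.
 */
static int
linux_style_op(int should_fail)
{
	int error = solaris_style_op(should_fail);

	assert(error >= 0);	/* Solaris-style values are never negative */
	return (-error);
}
```

Calling linux_style_op(1) yields -ENOENT, while linux_style_op(0) yields 0. Keeping the negation in one thin wrapper leaves the shared code on its original convention.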


61 Commits