mirror_zfs

mirror of https://git.proxmox.com/git/mirror_zfs.git synced 2024-12-27 03:19:35 +03:00

Author	SHA1	Message	Date
Brian Behlendorf	7973e464de	Revert "Revert "Fix unlink/xattr deadlock"" This reverts commit `53c7411919` effectively reinstating the asynchronous xattr cleanup code. These Linux changes were reverted because after testing and careful contemplation I was convinced that due to the 89260a1c8851ce05ea04b23606ba438b271d890 commit they were no longer required. Unfortunately, the deadlock described in #1176 was a case which wasn't considered. At mount zfs_unlinked_drain() can occur which will unlink a list of znodes in effectively a random order which isn't safe. The only reason it was safe to originally revert this change was the we could guarantee that the VFS would always prune the xattr leaves before the parents. Therefore, until we can cleanly resolve this deadlock for all cases we need to keep this change in spite of the xattr unlink performance penalty associated with it. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #1176 Issue #457	2013-01-17 11:24:20 -08:00
Brian Behlendorf	53c7411919	Revert "Fix unlink/xattr deadlock" This reverts commit `b00131d43c` which is no longer needed due to `e89260a1c8`. This change forces all xattr znodes to hold a reference on their parent which ensures prune_icache() will never attempt to evict both the parent and child concurrently. This effectively prevents the deadlock condition from ever occuring. Therefore we can safely revert back to the upstream synchronous cleanup code. This is nice because it keeps our code base closer to upstream and resolves the performance issues introduced by the original deadlock fix. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #457	2012-12-05 13:41:30 -08:00
Brian Behlendorf	cd38ac58a3	rmdir(2) should return ENOTEMPTY Under Solaris the behavior for rmdir(2) is to return EEXIST when a directory still contains entries. However, on Linux ENOTEMPTY is the expected return value with EEXIST being technically allowed. According to rmdir(2): ENOTEMPTY pathname contains entries other than . and .. ; or, pathname has .. as its final component. POSIX.1-2001 also allows EEXIST for this condition. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #895	2012-08-26 13:55:45 -07:00
Brian Behlendorf	ebe7e575ea	Add .zfs control directory Add support for the .zfs control directory. This was accomplished by leveraging as much of the existing ZFS infrastructure as posible and updating it for Linux as required. The bulk of the core functionality is now all there with the following limitations. ) The .zfs/snapshot directory automount support requires a 2.6.37 or newer kernel. The exception is RHEL6.2 which has backported the d_automount patches. ) Creating/destroying/renaming snapshots with mkdir/rmdir/mv in the .zfs/snapshot directory works as expected. However, this functionality is only available to root until zfs delegations are finished. * mkdir - create a snapshot * rmdir - destroy a snapshot * mv - rename a snapshot The following issues are known defeciences, but we expect them to be addressed by future commits. ) Add automount support for kernels older the 2.6.37. This should be possible using follow_link() which is what Linux did before. ) Accessing the .zfs/snapshot directory via NFS is not yet possible. The majority of the ground work for this is complete. However, finishing this work will require resolving some lingering integration issues with the Linux NFS kernel server. *) The .zfs/shares directory exists but no futher smb functionality has yet been implemented. Contributions-by: Rohan Puri <rohan.puri15@gmail.com> Contributiobs-by: Andrew Barnes <barnes333@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #173	2012-03-22 13:03:47 -07:00
Brian Behlendorf	0c5dde492f	Allow multiple values per directory entry When using zfs to back a Lustre filesystem it's advantageous to to store a fid with the object id in the directory zap. The only technical impediment to doing this is that the zpl code expects a single value in the zap per directory entry. This change relaxes that requirement such that multiple entries are allowed provided the first one is the object id. The zpl code will just ignore additional entries. This allows the ZoL count to mount datasets which are being used as Lustre server backends. Once the upstream feature flags support is merged in this change should be updated to a read-only feature. Until this occurs other zfs implementations will not be able to read the zfs filesystems created by Lustre. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2012-02-02 11:22:08 -08:00
Brian Behlendorf	2cf7f52bc4	Linux compat 2.6.39: mount_nodev() The .get_sb callback has been replaced by a .mount callback in the file_system_type structure. When using the new interface the caller must now use the mount_nodev() helper. Unfortunately, the new interface no longer passes the vfsmount down to the zfs layers. This poses a problem for the existing implementation because we currently save this pointer in the super block for latter use. It provides our only entry point in to the namespace layer for manipulating certain mount options. This needed to be done originally to allow commands like 'zfs set atime=off tank' to work properly. It also allowed me to keep more of the original Solaris code unmodified. Under Solaris there is a 1-to-1 mapping between a mount point and a file system so this is a fairly natural thing to do. However, under Linux they many be multiple entries in the namespace which reference the same filesystem. Thus keeping a back reference from the filesystem to the namespace is complicated. Rather than introduce some ugly hack to get the vfsmount and continue as before. I'm leveraging this API change to update the ZFS code to do things in a more natural way for Linux. This has the upside that is resolves the compatibility issue for the long term and fixes several other minor bugs which have been reported. This commit updates the code to remove this vfsmount back reference entirely. All modifications to filesystem mount options are now passed in to the kernel via a '-o remount'. This is the expected Linux mechanism and allows the namespace to properly handle any options which apply to it before passing them on to the file system itself. Aside from fixing the compatibility issue, removing the vfsmount has had the benefit of simplifying the code. This change which fairly involved has turned out nicely. Closes #246 Closes #217 Closes #187 Closes #248 Closes #231	2011-07-01 13:36:39 -07:00
Gunnar Beutner	b00131d43c	Fix unlink/xattr deadlock The problem here is that prune_icache() tries to evict/delete both the xattr directory inode as well as at least one xattr inode contained in that directory. Here's what happens: 1. File is created. 2. xattr is created for that file (behind the scenes a xattr directory and a file in that xattr directory are created) 3. File is deleted. 4. Both the xattr directory inode and at least one xattr inode from that directory are evicted by prune_icache(); prune_icache() acquires a lock on both inodes before it calls ->evict() on the inodes When the xattr directory inode is evicted zfs_zinactive attempts to delete the xattr files contained in that directory. While enumerating these files zfs_zget() is called to obtain a reference to the xattr file znode - which tries to lock the xattr inode. However that very same xattr inode was already locked by prune_icache() further up the call stack, thus leading to a deadlock. This can be reliably reproduced like this: $ touch test $ attr -s a -V b test $ rm test $ echo 3 > /proc/sys/vm/drop_caches This patch fixes the deadlock by moving the zfs_purgedir() call to zfs_unlinked_drain(). Instead zfs_rmnode() now checks whether the xattr dir is empty and leaves the xattr dir in the unlinked set if it finds any xattrs. To ensure zfs_unlinked_drain() never accesses a stale super block zfsvfs_teardown() has been update to block until the iput taskq has been drained. This avoids a potential race where a file with an xattr directory is removed and the file system is immediately unmounted. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #266	2011-06-20 13:47:03 -07:00
Gunnar Beutner	6f0cf71e0d	Removed erroneous zfs_inode_destroy() calls from zfs_rmnode(). iput_final() already calls zpl_inode_destroy() -> zfs_inode_destroy() for us after zfs_zinactive(), thus making sure that the inode is properly cleaned up. The zfs_inode_destroy() calls in zfs_rmnode() would lead to a double-free. Fixes #282	2011-06-20 10:30:17 -07:00
Gunnar Beutner	274b7e79f3	Added missing initialization for va.va_dentry in zfs_get_xattrdir. Without this we may mistakenly believe we have a dentry and try to d_instantiate() it. This will result in the following BUG. It's important to note that while the xattr directory has an inode assoicated with it we never create a dentry for it. kernel BUG at fs/dcache.c:1418! Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #202	2011-04-19 13:57:41 -07:00
Brian Behlendorf	c85b224faf	Call d_instantiate before unlocking inode Under Linux a dentry referencing an inode must be instantiated before the inode is unlocked. To accomplish this without overly modifing the core ZFS code the dentry it passed via the vattr_t. There are cases such as replay when a dentry is not available. In which case it is obviously not initialized at inode creation time, if a dentry is needed it will be spliced as when required via d_lookup().	2011-04-07 09:51:57 -07:00
Brian Behlendorf	3558fd73b5	Prototype/structure update for Linux I appologize in advance why to many things ended up in this commit. When it could be seperated in to a whole series of commits teasing that all apart now would take considerable time and I'm not sure there's much merrit in it. As such I'll just summerize the intent of the changes which are all (or partly) in this commit. Broadly the intent is to remove as much Solaris specific code as possible and replace it with native Linux equivilants. More specifically: 1) Replace all instances of zfsvfs_t with zfs_sb_t. While the type is largely the same calling it private super block data rather than a zfsvfs is more consistent with how Linux names this. While non critical it makes the code easier to read when your thinking in Linux friendly VFS terms. 2) Replace vnode_t with struct inode. The Linux VFS doesn't have the notion of a vnode and there's absolutely no good reason to create one. There are in fact several good reasons to remove it. It just adds overhead on Linux if we were to manage one, it conplicates the code, and it likely will lead to bugs so there's a good change it will be out of date. The code has been updated to remove all need for this type. 3) Replace all vtype_t's with umode types. Along with this shift all uses of types to mode bits. The Solaris code would pass a vtype which is redundant with the Linux mode. Just update all the code to use the Linux mode macros and remove this redundancy. 4) Remove using of vn_* helpers and replace where needed with inode helpers. The big example here is creating iput_aync to replace vn_rele_async. Other vn helpers will be addressed as needed but they should be be emulated. They are a Solaris VFS'ism and should simply be replaced with Linux equivilants. 5) Update znode alloc/free code. Under Linux it's common to embed the inode specific data with the inode itself. This removes the need for an extra memory allocation. In zfs this information is called a znode and it now embeds the inode with it. Allocators have been updated accordingly. 6) Minimal integration with the vfs flags for setting up the super block and handling mount options has been added this code will need to be refined but functionally it's all there. This will be the first and last of these to large to review commits.	2011-02-10 09:27:21 -08:00
Brian Behlendorf	bcf308227c	Remove zfs_ctldir.[ch] This code is used for snapshot and heavily leverages Solaris functionality we do not want to reimplement. These files have been removed, including references to them, and will be replaced by a zfs_snap.c/zpl_snap.c implementation which handles snapshots.	2011-02-10 09:27:21 -08:00
Brian Behlendorf	149e873ab1	Fix minor compiler warnings These compiler warnings were introduced when code which was previously #ifdef'ed out by HAVE_ZPL was re-added for use by the posix layer. All of the following changes should be obviously correct and will cause no semantic changes.	2011-01-06 15:04:28 -08:00
Brian Behlendorf	60101509ee	Add linux kernel disk support Native Linux vdev disk interfaces Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2010-08-31 13:41:57 -07:00
Brian Behlendorf	572e285762	Update to onnv_147 This is the last official OpenSolaris tag before the public development tree was closed.	2010-08-26 14:24:34 -07:00
Brian Behlendorf	428870ff73	Update core ZFS code from build 121 to build 141.	2010-05-28 13:45:14 -07:00
Brian Behlendorf	9babb37438	Rebase master to b117	2009-07-02 15:44:48 -07:00
Brian Behlendorf	fb5f0bc833	Rebase master to b105	2009-01-15 13:59:39 -08:00
Brian Behlendorf	172bb4bd5e	Move the world out of /zfs/ and seperate out module build tree	2008-12-11 11:08:09 -08:00

19 Commits