mirror_zfs

mirror of https://git.proxmox.com/git/mirror_zfs.git synced 2026-05-22 10:37:35 +03:00

Author	SHA1	Message	Date
Brian Behlendorf	acb5376940	Disable Shutdown/Reboot This support has been disabled with HAVE_SHUTDOWN. We can support this at some point by adding the needed reboot notifiers.	2011-02-10 09:27:20 -08:00
Brian Behlendorf	cb28b3494e	Remove SYNC_ATTR check This flag does not need to be supported under Linux. As the comment says, it was only there to support fsflush() for old filesystems like UFS. This is not needed under Linux.	2011-02-10 09:27:20 -08:00
Brian Behlendorf	e15c023014	Remove mount options Mount option parsing is still very Linux specific and will be handled above this zfs filesystem layer. Honoring those mount options once set is of course the responsibility of the lower layers.	2011-02-10 09:27:20 -08:00
Brian Behlendorf	d7cafa8e3e	Remove zfs_active_fs_count This variable was used to ensure that the ZFS module is never removed while the filesystem is mounted. Once again the generic Linux VFS handles this case for us so it can be removed.	2011-02-10 09:27:20 -08:00
Brian Behlendorf	42ab36aa36	Remove unused mount functions The functions zfs_mount_label_policy(), zfs_mountroot(), zfs_mount() will not be needed because most of what they do is already handled by the generic Linux VFS layer. They all call zfs_domount() which creates the actual dataset, the caller of this library call which will be in the zpl layer is responsible for what's left.	2011-02-10 09:27:20 -08:00
Brian Behlendorf	c0b3dc7d07	Remove zfs_major/zfs_minor/zfsfstype Under Linux we don't need to reserve a major or minor number for the filesystem. We can rely on the VFS to handle collisions without this being handled by the lower ZFS layers. Additionally, there is no need to keep a zfsfstype around. We are not limited on Linux by the OpenSolaris infrastructure which needed this. The upper zpl layer can specify the filesystem type.	2011-02-10 09:27:20 -08:00
Brian Behlendorf	4b3f12ecd5	Remove Solaris VFS Hooks The ZFS code is being restructured to act as a library and a stand alone module. This allows us to leverage most of the existing code with minimal modification. It also means we need to drop the Solaris vfs/vnode functions; they will be replaced by Linux equivalents and updated to be Linux friendly.	2011-02-10 09:27:20 -08:00
Brian Behlendorf	960e08fe3e	VFS: Add zfs_inode_update() helper For the moment we have left ZFS unchanged and it updates many values as part of the znode. However, some of these values should be set in the inode. For the moment this is handled by adding a function called zfs_inode_update() which updates the inode based on the znode. This is considered a workaround until we can systematically go through the ZFS code and have it directly update the inode, at which point zfs_inode_update() can be dropped entirely. Keeping two copies of the same data isn't only inefficient, it's a breeding ground for bugs.	2011-02-10 09:27:20 -08:00
Brian Behlendorf	7304b6e50f	VFS: Integrate zfs_znode_alloc() Under Linux the convention for filesystem specific data structures is to embed them along with the generic vfs data structure. This differs significantly from Solaris. Since we want to integrate as cleanly with the Linux VFS as possible, this change modifies zfs_znode_alloc() to allocate a znode with an embedded inode for use with the generic VFS. This is done by calling iget_locked() which will allocate a new inode if needed by calling sb->alloc_inode(). This function allocates enough memory for a znode_t but returns a pointer to the inode structure for Linux's VFS. This function is also responsible for setting the callback znode->z_set_ops_inodes() which is used to register the correct handlers for the inode.	2011-02-10 09:27:20 -08:00
Brian Behlendorf	10c6047ea5	Enable zfs_znode compilation Basic compilation of the bulk of zfs_znode.c has been enabled. After much consideration it was decided to convert the existing vnode based interfaces to more friendly Linux interfaces. The following commits will systematically update the required interfaces. There are of course pros and cons to this decision. Pros: * This simplifies integration with Linux in the long term. There is no longer any need to manage vnodes which are a foreign concept to the Linux VFS. * Improved long term maintainability. * Minor performance improvements by removing vnode overhead. Cons: * Added work in the short term to modify multiple ZFS interfaces. * Harder to pull in changes if we ever see any new code from Solaris. * Mixed Solaris and Linux interfaces in some ZFS code.	2011-02-10 09:27:20 -08:00
Brian Behlendorf	a405c8a665	ACL related changes A small collection of ACL changes related to not supporting fuid mapping. This whole area will need to be closely investigated.	2011-02-10 09:26:26 -08:00
Brian Behlendorf	3fc050aaf2	Init/destroy tsd Add missing tsd_destroy() call for rrw_tsd_key to avoid a leak.	2011-02-10 09:25:38 -08:00
Brian Behlendorf	ab892c5f0a	Replace VOP_* calls with direct zfs_* calls These generic Solaris wrappers are no longer required. Simply directly call the correct zfs functions for clarity.	2011-02-10 09:21:43 -08:00
Brian Behlendorf	590329b50c	Add basic uio support This code originates in OpenSolaris and was modified by KQ Infotech to be compatible with Linux. While supporting uios in the short term is useful to get something working, this is not an abstraction we want to keep. This code is expected to be short lived and removed as soon as all the remaining uio based APIs are updated.	2011-02-10 09:21:43 -08:00
Brian Behlendorf	538f669f63	Add trivial acl helpers The zfs acl code makes use of the two OpenSolaris helper functions acl_trivial_access_masks() and ace_trivial_common(). Since they are only called from zfs_acl.c I've brought them over from OpenSolaris and added them as static functions to this file. This way I don't need to reimplement this functionality from scratch in the SPL. Long term once I take a more careful look at the acl implementation it may be the case that these functions really aren't needed. If that turns out to be the case they can then be removed.	2011-02-10 09:21:43 -08:00
Brian Behlendorf	c60bc1fbf0	Remove dead ACL code The following code was unused which caused gcc to complain. Since it was dead code it has simply been removed.	2011-02-10 09:21:43 -08:00
Brian Behlendorf	4e1b54fdde	Remove zfs_parse_bootfs() support Remove unneeded bootfs functions. This support shouldn't be required for the Linux port, and even if it is it would need to be reworked to integrate cleanly with Linux.	2011-02-10 09:21:43 -08:00
Brian Behlendorf	9ee7fac531	VFS: Wrap with HAVE_SHARE Certain NFS/SMB share functionality is not yet in place. These functions used to be wrapped with the generic HAVE_ZPL to prevent them from being compiled. I still don't want them compiled but I'm working toward eliminating the use of HAVE_ZPL. So I'm just renaming the wrapper here to HAVE_SHARE. They still won't be compiled until all the share issues are worked through. Share support is the last missing piece from zfs_ioctl.c.	2011-02-10 09:21:43 -08:00
Brian Behlendorf	bc3e15e386	Wrap with HAVE_MLSLABEL The zfs_check_global_label() function is part of the HAVE_MLSLABEL support which was previously commented out by a HAVE_ZPL check. Since we're still deciding what to do about mls labels wrap it with the preexisting macro to keep it compiled out.	2011-02-10 09:21:42 -08:00
Brian Behlendorf	5649246dd3	Remove znode move functionality Unlike Solaris the Linux implementation embeds the inode in the znode, and has no use for a vnode. So while it's true that fragmentation of the znode cache may occur it should not be worse than any of the other Linux FS inode caches. Until proven that this is a problem it's just added complexity we don't need.	2011-02-10 09:21:42 -08:00
Brian Behlendorf	f30484afc3	Conserve stack in zfs_mkdir() Move the sa_attrs array from the stack to the heap to minimize stack space usage.	2011-02-10 09:21:42 -08:00
Brian Behlendorf	1ee1b76786	Conserve stack in zfs_sa_upgrade() As always under Linux stack space is at a premium. Relocate two 20 element sa_bulk_attr_t arrays in zfs_sa_upgrade() from the stack to the heap.	2011-02-10 09:21:42 -08:00
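The stack-to-heap move described in 1ee1b76786 and f30484afc3 can be illustrated with a small userspace sketch. This is not the ZFS code: bulk_attr_t and upgrade_attrs() are hypothetical stand-ins for sa_bulk_attr_t and zfs_sa_upgrade(), and malloc-family calls are used here where kernel code would use its own allocator.

```c
#include <stdlib.h>

#define ATTR_COUNT 20 /* the 20-element arrays mentioned in the commit */

/* Hypothetical stand-in for an attribute descriptor. */
typedef struct bulk_attr {
	int id;
	void *data;
	size_t len;
} bulk_attr_t;

/*
 * Prepare two scratch arrays on the heap instead of the stack.
 * Returns the number of entries prepared, or -1 if allocation fails.
 */
int
upgrade_attrs(void)
{
	bulk_attr_t *bulk, *sa_attrs;
	int i, count = 0;

	/* Previously: bulk_attr_t bulk[ATTR_COUNT], sa_attrs[ATTR_COUNT]; */
	bulk = calloc(ATTR_COUNT, sizeof (*bulk));
	sa_attrs = calloc(ATTR_COUNT, sizeof (*sa_attrs));
	if (bulk == NULL || sa_attrs == NULL) {
		free(bulk);	/* free(NULL) is a no-op */
		free(sa_attrs);
		return (-1);
	}

	for (i = 0; i < ATTR_COUNT; i++) {
		bulk[i].id = i;
		sa_attrs[i].id = i;
		count++;
	}

	/* Every exit path must release the arrays now that they are heap-backed. */
	free(bulk);
	free(sa_attrs);
	return (count);
}
```

The trade-off is an allocation that can fail and must be freed on every return path, in exchange for a much smaller stack frame.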
Brian Behlendorf	e5c39b95a7	Export required vfs/vn symbols	2011-02-10 09:21:42 -08:00
Brian Behlendorf	72d5e2da3e	Add HAVE_SCANSTAMP This functionality is not supported under Linux, perhaps it will be some day if it's decided it's useful.	2011-02-10 09:20:33 -08:00
Brian Behlendorf	872e8d2697	Add initial rw_uio functions to the dmu These functions were dropped originally because I felt they would need to be rewritten anyway to avoid using uios. However, this patch re-adds them with the idea they can just be reworked and the uio bits dropped.	2011-02-04 16:14:34 -08:00
Brian Behlendorf	9a616b5d17	Documentation updates Minor Linux specific documentation updates to the comments and man pages.	2011-02-04 16:14:34 -08:00
Brian Behlendorf	95c73795b0	Fix ZVOL rename minor devices During a rename we need to be careful to destroy and create a new minor for the ZVOL _only_ if the rename succeeded. The previous code would destroy the minor device unconditionally, and it would also fail to create the new minor device on success.	2011-01-07 12:26:02 -08:00
Brian Behlendorf	149e873ab1	Fix minor compiler warnings These compiler warnings were introduced when code which was previously #ifdef'ed out by HAVE_ZPL was re-added for use by the posix layer. All of the following changes should be obviously correct and will cause no semantic changes.	2011-01-06 15:04:28 -08:00
Brian Behlendorf	5b63b3eb6f	Use cv_timedwait_interruptible in arc The issue is that cv_timedwait() sleeps uninterruptibly to block signals and avoid waking up early. Under Linux this counts against the load average keeping it artificially high. This change allows the arc to sleep interruptibly which means it may be woken up early due to a signal. Normally this means some extra care must be taken to handle a potential signal. But for the arc's usage of cv_timedwait() there is no harm in waking up before the timeout expires so no extra handling is required.	2010-12-14 10:06:44 -08:00
Brian Behlendorf	a7dc7e5d5a	Enable rrwlock.c compilation With the addition of the thread specific data interfaces to the SPL it is safe to enable compilation of the re-entrant reader/writer locks.	2010-12-07 16:05:25 -08:00
Ned Bass	e06be58641	Fix for access beyond end of device error This commit fixes a sign extension bug affecting l2arc devices. Extremely large offsets may be passed down to the low level block device driver on reads, generating errors similar to: attempt to access beyond end of device sdbi1: rw=14, want=36028797014862705, limit=125026959 The unwanted sign extension occurs because the function arc_read_nolock() stores the offset as a daddr_t, a 32-bit signed int type in the Linux kernel. This offset is then passed to zio_read_phys() as a uint64_t argument, causing sign extension for values of 0x80000000 or greater. To avoid this, we store the offset in a uint64_t. This change also changes a few daddr_t struct members to uint64_t in the libspl headers to avoid similar bugs cropping up in the future. We also add an ASSERT to __vdev_disk_physio() to check for invalid offsets. Closes #66 Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2010-11-10 21:29:07 -08:00
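The sign extension described in e06be58641 can be reproduced in a few lines of userspace C. This is a sketch only: daddr32_t is a hypothetical stand-in for the 32-bit signed daddr_t named in the commit, and the two functions are illustrative, not the arc_read_nolock() code path.

```c
#include <stdint.h>

/* Hypothetical stand-in for a 32-bit signed disk address type. */
typedef int32_t daddr32_t;

/*
 * Buggy path: the offset is parked in a 32-bit signed variable before
 * being widened to 64 bits. Converting an out-of-range value to int32_t
 * is implementation-defined in C; on common two's complement compilers
 * (gcc, clang) it wraps, so bit 31 becomes the sign bit and is then
 * copied into the upper 32 bits when the value is widened.
 */
uint64_t
offset_via_signed32(uint64_t offset)
{
	daddr32_t narrow = (daddr32_t)(uint32_t)offset;

	return ((uint64_t)narrow);
}

/* Fixed path: the offset stays in a 64-bit unsigned type throughout. */
uint64_t
offset_via_uint64(uint64_t offset)
{
	uint64_t wide = offset;

	return (wide);
}
```

With gcc or clang, offset_via_signed32(0x80000000) yields 0xFFFFFFFF80000000, while any offset below 0x80000000 passes through unchanged, which matches the threshold given in the commit message.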
Brian Behlendorf	1f30b9d432	Linux 2.6.36 compat, use fops->unlocked_ioctl() As of linux-2.6.36 the last in-tree consumer of fops->ioctl() has been removed and thus fops->ioctl() has also been removed. The replacement hook is fops->unlocked_ioctl() which has existed in the kernel since 2.6.12. Since the ZFS code only contains support back to 2.6.18 vintage kernels, I'm not adding an autoconf check for this and simply moving everything to use fops->unlocked_ioctl().	2010-11-10 17:01:08 -08:00
Brian Behlendorf	675de5aa37	Linux 2.6.36 compat, synchronous bio flag The name of the flag used to mark a bio as synchronous has changed again in the 2.6.36 kernel due to the unification of the BIO_RW_* and REQ_* flags. The new flag is called REQ_SYNC. To simplify checking this flag I have introduced the vdev_disk_dio_is_sync() helper function. Based on the results of several new autoconf tests it uses the correct mask to check for a synchronous bio. Preferred interface for flagging a synchronous bio: 2.6.12-2.6.29: BIO_RW_SYNC 2.6.30-2.6.35: BIO_RW_SYNCIO 2.6.36-2.6.xx: REQ_SYNC	2010-11-10 17:00:33 -08:00
Ned Bass	b04cffc9b0	Remove inconsistent use of EOPNOTSUPP Commit `3ee56c292b` changed an ENOTSUP return value in one location to EOPNOTSUPP to fix user programs seeing an invalid ioctl() error code. However, use of ENOTSUP is widespread in the zfs module. Instead of changing all of those uses, we fixed the ENOTSUP definition in the SPL to be consistent with user space. The changed return value in the above commit is therefore no longer needed, so this commit reverses it to maintain consistency. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2010-11-10 13:26:56 -08:00
Ned Bass	3ee56c292b	Make rollbacks fail gracefully Support for rolling back datasets requires a functional ZPL, which we currently do not have. The zfs command does not check for ZPL support before attempting a rollback, and in preparation for rolling back a zvol it removes the minor node of the device. To prevent the zvol device node from disappearing after a failed rollback operation, this change wraps the zfs_do_rollback() function in an #ifdef HAVE_ZPL and returns ENOSYS in the absence of a ZPL. This is consistent with the behavior of other ZPL dependent commands such as mount. The original error message observed with this bug was rather confusing: internal error: Unknown error 524 Aborted This was because zfs_ioc_rollback() returns ENOTSUP if we don't HAVE_ZPL, but Linux actually has no such error code. It should instead return EOPNOTSUPP, as that is how ENOTSUP is defined in user space. With that we would have gotten the somewhat more helpful message cannot rollback 'tank/fish': unsupported version This is rather a moot point with the above changes since we will no longer make that ioctl call without a ZPL. But, this change updates the error code just in case. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2010-11-08 14:03:36 -08:00
Brian Behlendorf	7e55f4e00c	Increase zio write interrupt thread count. Increasing the default zio_wr_int thread count from 8 to 16 improves write performance by 13% on large systems. More testing needs to be done but I suspect the ideal tuning here is ZTI_BATCH() with a minimum of 8 threads.	2010-11-08 14:03:35 -08:00
Brian Behlendorf	451041db53	Shorten zio_* thread names Linux kernel thread names are expected to be short. This change shortens the zio thread names to 10 characters leaving a few characters to append the /<cpuid> to which the thread is bound. For example: z_wr_iss/0.	2010-11-08 14:03:35 -08:00
Ned Bass	b1c5821375	Fix panic mounting unformatted zvol On some older kernels, i.e. 2.6.18, zvol_ioctl_by_inode() may get passed a NULL file pointer if the user tries to mount a zvol without a filesystem on it. This change adds checks to prevent a null pointer dereference. Closes #73. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2010-10-29 14:46:33 -07:00
Brian Behlendorf	baa40d45cb	Fix missing 'zpool events' It turns out that 'zpool events' over 1024 bytes in size were being silently dropped. This was discovered while writing the zfault.sh tests to validate common failure modes. This could occur because the zfs interface for passing an arbitrary size nvlist_t over an ioctl() is to provide a buffer for the packed nvlist which is usually big enough. In this case 1024 bytes is the default. If the kernel determines the buffer is too small it returns ENOMEM and the minimum required size of the nvlist_t. This was working properly but in the case of 'zpool events' the event stream was advanced despite the error. Thus the retry with the bigger buffer would succeed but it would skip over the previous event. The fix is to pass this size to zfs_zevent_next() and determine before removing the event from the list if it will fit. This was preferable to checking after the event was returned because this avoids the need to rewind the stream.	2010-10-12 14:55:03 -07:00
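The fix described in baa40d45cb amounts to checking the size before the stream is advanced. The sketch below shows that ordering with a hypothetical singly linked queue; event_t, event_stream_t and stream_next() are illustrative names, not the zfs_zevent_next() interface.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical event record; a real zevent carries a packed nvlist. */
typedef struct event {
	size_t size;
	struct event *next;
} event_t;

typedef struct event_stream {
	event_t *head;
} event_stream_t;

/*
 * Hand out the next event only if it fits in buf_size bytes.
 * On ENOMEM the required size is stored in *needed and the stream is
 * left untouched, so a retry with a larger buffer sees the same event.
 */
int
stream_next(event_stream_t *s, size_t buf_size, size_t *needed, event_t **out)
{
	event_t *ev = s->head;

	if (ev == NULL)
		return (ENOENT);

	/* Size check comes first: nothing has been dequeued yet. */
	if (ev->size > buf_size) {
		*needed = ev->size;
		return (ENOMEM);
	}

	/* Only now is the stream advanced past the event. */
	s->head = ev->next;
	*out = ev;
	return (0);
}
```

A caller that starts with a 1024-byte buffer and receives ENOMEM can reallocate to the reported size and call again without losing the oversized event, and no rewind of the stream is needed.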
Brian Behlendorf	a69052be7f	Initial zio delay timing While there is no right maximum timeout for a disk IO we can start laying the ground work to measure how long they do take in practice. This change simply measures the IO time and if it exceeds 30s an event is posted for 'zpool events'. This value was carefully selected because for sd devices it implies that at least one timeout (SD_TIMEOUT) has occurred. Unfortunately, even with FAILFAST set we may retry a request and not get an error. This behavior is strongly dependent on the device driver and how it is hooked in to the scsi error handling stack. However by setting the limit at 30s we can log the event even if no error was returned. Slightly longer term we can start recording these delays perhaps as a simple power-of-two histogram. This histogram can then be reported as part of the 'zpool status' command when given a command line option. None of this code changes the internal behavior of ZFS. Currently it is simply for reporting excessively long delays.	2010-10-12 14:55:02 -07:00
Brian Behlendorf	2959d94a0a	Add FAILFAST support ZFS works best when it is notified as soon as possible when a device failure occurs. This allows it to immediately start any recovery actions which may be needed. In theory Linux supports a flag which can be set on bio's called FAILFAST which provides this quick notification by disabling the retry logic in the lower scsi layers. That's the theory at least. In practice it turns out that while the flag exists you oddly have to set it with the BIO_RW_AHEAD flag. And even when it's set you may get retries if the low level driver decides that's the right behavior, or if you don't get the right error codes reported to the scsi midlayer. Unfortunately, without additional kernel patches there's not much which can be done to improve this. Basically, this just means that it may take 2-3 minutes before ZFS is notified properly that a device has failed. This can be improved and I suspect I'll be submitting patches upstream to handle this.	2010-10-12 14:55:02 -07:00
Brian Behlendorf	312c07edfd	Generate zevents for speculative and soft errors By default the Solaris code does not log speculative or soft io errors in either 'zpool status' or post an event. Under Linux we don't want to change the expected behavior of 'zpool status' so these io errors are still suppressed there. However, since we do need to know about these events for Linux FMA and the 'zpool events' interface is new we do post the events. With the addition of the zio_flags field the posted events now contain enough information that a user space consumer can identify and discard these events if it sees fit.	2010-10-12 14:55:00 -07:00
Brian Behlendorf	d148e95156	Fix negative zio->io_error which must be positive. All the upper layers of zfs expect zio->io_error to be positive. I was careful but I missed one instance in vdev_disk_physio_completion() which could return a negative error. To ensure all cases are always caught I have additionally added an ASSERT() to check this before zio_interpret(). Finally, as a debugging aid when zfs is built with --enable-debug all errors from the backing block devices will be reported to the console with an error message like this: ZFS: zio error=5 type=1 offset=4217856 size=8192 flags=60440	2010-10-12 14:55:00 -07:00
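The sign convention behind d148e95156 can be sketched as follows. Linux kernel interfaces conventionally report failures as negative errno values, while the commit states that the upper ZFS layers expect positive ones. The helper names to_positive_errno() and record_io_error() are hypothetical and do not appear in the ZFS source; the assert() plays the role of the ASSERT() the commit mentions.

```c
#include <assert.h>
#include <errno.h>

/*
 * Convert a Linux-style negative errno into the positive form.
 * Errno values are small, so negation cannot overflow here.
 */
int
to_positive_errno(int error)
{
	return (error < 0 ? -error : error);
}

/*
 * Hypothetical completion hook: normalize the error reported by the
 * block layer, then verify the invariant before handing it upward.
 */
int
record_io_error(int block_layer_error)
{
	int io_error = to_positive_errno(block_layer_error);

	assert(io_error >= 0);	/* upper layers rely on this */
	return (io_error);
}
```

Normalizing in one place and asserting afterwards means a missed call site shows up as an assertion failure in debug builds rather than as a misinterpreted error further up the stack.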
Brian Behlendorf	398f129ca3	Suppress large kmem_alloc() warning. Observed during failure mode testing, dsl_scan_setup_sync() allocates 73920 bytes. This is way over the limit of what is wise to do with a kmem_alloc() and it should probably be moved to a slab. For now I'm just flagging it with KM_NODEBUG to quiet the error until this can be revisited.	2010-10-12 14:54:59 -07:00
Ned Bass	3a7381e531	Use stored whole_disk property when opening a vdev This commit fixes a bug in vdev_disk_open() in which the whole_disk property was getting set to 0 for disk devices, even when it was stored as a 1 when the zpool was created. The whole_disk property lets us detect when the partition suffix should be stripped from the device name in CLI output. It is also used to determine how writeback cache should be set for a device. When an existing zpool is imported its configuration is read from the vdev label by user space in zpool_read_label(). The whole_disk property is saved in the nvlist which gets passed into the kernel, where it in turn gets saved in the vdev struct in vdev_alloc(). Therefore, this value is available in vdev_disk_open() and should not be overridden by checking the provided device path, since that path will likely point to a partition and the check will return the wrong result. We also add an ASSERT that the whole_disk property is set. We are not aware of any cases where vdev_disk_open() should be called with a config that doesn't have this property set. The ASSERT is there so that when debugging is enabled we can identify any legitimate cases that we are missing. If we never hit the ASSERT, we can at some point remove it along with the conditional whole_disk check. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2010-10-04 13:53:18 -07:00
Ricardo M. Correia	0151834d65	Register the space accounting callback even when we don't have the ZPL. This callback is needed for properly accounting the per-uid and per-gid space usage. Even if we don't have the ZPL, we still need this callback in order to have proper on-disk ZPL compatibility and to be able to use Lustre quotas. Fortunately, the callback doesn't have any ZPL/VFS dependencies so we can just move it out of #ifdef HAVE_ZPL. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2010-10-04 11:34:39 -07:00
Ricardo M. Correia	368f4c10ae	Export ZFS symbols needed by Lustre. Required for the DB_DNODE_ENTER()/DB_DNODE_EXIT() helpers. Signed-off-by: Ricardo M. Correia <ricardo.correia@oracle.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2010-09-17 16:24:15 -07:00
Ricardo M. Correia	1e411a4c12	Quiet down very frequent large allocation warning in ZFS. In my machine, dnode_hold_impl() allocates 9992 bytes in DEBUG mode and it causes a large stream of stack traces in the logs. Instead, use KM_NODEBUG to quiet down this known large alloc. Signed-off-by: Ricardo M. Correia <ricardo.correia@oracle.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2010-09-17 16:24:15 -07:00
Brian Behlendorf	6283f55ea1	Support custom build directories and move includes One of the neat tricks an autoconf style project is capable of is allowing configuration/building in a directory other than the source directory. The major advantage to this is that you can build the project various different ways while making changes in a single source tree. For example, this project is designed to work on various different Linux distributions each of which work slightly differently. This means that changes need to be verified on each of those supported distributions preferably before the change is committed to the public git repo. Using nfs and custom build directories makes this much easier. I now have a single source tree in nfs mounted on several different systems each running a supported distribution. When I make a change to the source base I suspect may break things I can concurrently build from the same source on all the systems each in their own subdirectory. wget -c http://github.com/downloads/behlendorf/zfs/zfs-x.y.z.tar.gz tar -xzf zfs-x.y.z.tar.gz cd zfs-x-y-z ------------------------- run concurrently ---------------------- <ubuntu system> <fedora system> <debian system> <rhel6 system> mkdir ubuntu mkdir fedora mkdir debian mkdir rhel6 cd ubuntu cd fedora cd debian cd rhel6 ../configure ../configure ../configure ../configure make make make make make check make check make check make check This change also moves many of the include headers from individual include/sys directories under the modules directory in to a single top level include directory. This has the advantage of making the build rules cleaner and logically it makes a bit more sense.	2010-09-08 12:38:56 -07:00
Brian Behlendorf	f5e79474f0	Fix zfsdev_compat_ioctl() case For the !CONFIG_COMPAT case fix the zfsdev_compat_ioctl() compatibility function name. This was caught by the chaos4.3 builder.	2010-09-01 16:00:15 -07:00


115 Commits