The Linux specific xattr operations have all been located in the
file zpl_xattr.c. These functions primarily rely on the reworked
zfs_* functions to do their job. They are also responsible for
converting the possible Solaris style error codes to negative
Linux errors.
The Linux specific super block operations have all been located in the
file zpl_super.c. These functions primarily rely on the reworked
zfs_* functions to do their job. They are also responsible for
converting the possible Solaris style error codes to negative
Linux errors.
The Linux specific inode operations have all been located in the
file zpl_inode.c. These functions primarily rely on the reworked
zfs_* functions to do their job. They are also responsible for
converting the possible Solaris style error codes to negative
Linux errors.
The Linux specific file operations have all been located in the
file zpl_file.c. These functions primarily rely on the reworked
zfs_* functions to do their job. They are also responsible for
converting the possible Solaris style error codes to negative
Linux errors.
This first zpl_* commit also includes a common zpl.h header with
minimal entries to register the Linux specific hooks. In also
adds all the new zpl_* file to the Makefile.in. This is not a
standalone commit, you required the following zpl_* commits.
For the moment exactly how to handle xvattr is not clear. This
change largely consists of the code to comment out the offending
bits until something reasonable can be done.
A new flag is required for the zfs_rlock code to determine if
it is operation of the zvol of zpl dataset. This used to be
keyed off the zp->z_vnode, which was a hack to begin with, but
with the removal of vnodes we needed a dedicated flag.
I appologize in advance why to many things ended up in this commit.
When it could be seperated in to a whole series of commits teasing
that all apart now would take considerable time and I'm not sure
there's much merrit in it. As such I'll just summerize the intent
of the changes which are all (or partly) in this commit. Broadly
the intent is to remove as much Solaris specific code as possible
and replace it with native Linux equivilants. More specifically:
1) Replace all instances of zfsvfs_t with zfs_sb_t. While the
type is largely the same calling it private super block data
rather than a zfsvfs is more consistent with how Linux names
this. While non critical it makes the code easier to read when
your thinking in Linux friendly VFS terms.
2) Replace vnode_t with struct inode. The Linux VFS doesn't have
the notion of a vnode and there's absolutely no good reason to
create one. There are in fact several good reasons to remove it.
It just adds overhead on Linux if we were to manage one, it
conplicates the code, and it likely will lead to bugs so there's
a good change it will be out of date. The code has been updated
to remove all need for this type.
3) Replace all vtype_t's with umode types. Along with this shift
all uses of types to mode bits. The Solaris code would pass a
vtype which is redundant with the Linux mode. Just update all the
code to use the Linux mode macros and remove this redundancy.
4) Remove using of vn_* helpers and replace where needed with
inode helpers. The big example here is creating iput_aync to
replace vn_rele_async. Other vn helpers will be addressed as
needed but they should be be emulated. They are a Solaris VFS'ism
and should simply be replaced with Linux equivilants.
5) Update znode alloc/free code. Under Linux it's common to
embed the inode specific data with the inode itself. This removes
the need for an extra memory allocation. In zfs this information
is called a znode and it now embeds the inode with it. Allocators
have been updated accordingly.
6) Minimal integration with the vfs flags for setting up the
super block and handling mount options has been added this
code will need to be refined but functionally it's all there.
This will be the first and last of these to large to review commits.
For the moment we do not use dmu_write_pages() to write pages
directly in to a dmu object. It may be required at some point
in the future, but for now is simplest and cleanest to drop it.
It can be easily readded if/when needed.
For portability reasons it's handy to be able to create a root
znode and basic filesystem components without requiring the full
cooperation of the VFS. We are committing to this to simply the
filesystem creations code.
This code is used for snapshot and heavily leverages Solaris
functionality we do not want to reimplement. These files have
been removed, including references to them, and will be replaced
by a zfs_snap.c/zpl_snap.c implementation which handles snapshots.
Minor update to ensure zfs_sync() is disabled if a kernel oops/panic
is triggered. As the comment says 'data integrity is job one'. This
change could have been done by defining panicstr to oops_in_progress
in the SPL. But I felt it was better to use the native Linux API
here since to be clear.
This flag does not need to be support under Linux. As the comment
says it was only there to support fsflush() for old filesystem like
UFS. This is not needed under Linux.
Mount option parsing is still very Linux specific and will be
handled above this zfs filesystem layer. Honoring those mount
options once set if of course the responsibility of the lower
layers.
This variable was used to ensure that the ZFS module is never
removed while the filesystem is mounted. Once again the generic
Linux VFS handles this case for us so it can be removed.
The functions zfs_mount_label_policy(), zfs_mountroot(), zfs_mount()
will not be needed because most of what they do is already handled
by the generic Linux VFS layer. They all call zfs_domount() which
creates the actual dataset, the caller of this library call which
will be in the zpl layer is responsible for what's left.
Under Linux we don't need to reserve a major or minor number for
the filesystem. We can rely on the VFS to handle colisions without
this being handled by the lower ZFS layers.
Additionally, there is no need to keep a zfsfstype around. We are
not limited on Linux by the OpenSolaris infrastructure which needed
this. The upper zpl layer can specify the filesystem type.
The ZFS code is being restructured to act as a library and a stand
alone module. This allows us to leverage most of the existing code
with minimal modification. It also means we need to drop the Solaris
vfs/vnode functions they will be replaced by Linux equivilants and
updated to be Linux friendly.
For the moment we have left ZFS unchanged and it updates many values
as part of the znode. However, some of these values should be set
in the inode. For the moment this is handled by adding a function
called zfs_inode_update() which updates the inode based on the znode.
This is considered a workaround until we can systematically go
through the ZFS code and have it directly update the inode. At
which point zfs_update_inode() can be dropped entirely. Keeping
two copies of the same data isn't only inefficient it's a breeding
ground for bugs.
Under Linux the convention for filesystem specific data structure is
to embed it along with the generic vfs data structure. This differs
significantly from Solaris.
Since we want to integrates as cleanly with the Linux VFS as possible.
This changes modifies zfs_znode_alloc() to allocate a znode with an
embedded inode for use with the generic VFS. This is done by calling
iget_locked() which will allocate a new inode if needed by calling
sb->alloc_inode(). This function allocates enough memory for a
znode_t by returns a pointer to the inode structure for Linux's VFS.
This function is also responsible for setting the callback
znode->z_set_ops_inodes() which is used to register the correct
handlers for the inode.
Basic compilation of the bulk of zfs_znode.c has been enabled. After
much consideration it was decided to convert the existing vnode based
interfaces to more friendly Linux interfaces. The following commits
will systematically replace update the requiter interfaces. There
are of course pros and cons to this decision.
Pros:
* This simplifies intergration with Linux in the long term. There is
no longer any need to manage vnodes which are a foreign concept to
the Linux VFS.
* Improved long term maintainability.
* Minor performance improvements by removing vnode overhead.
Cons:
* Added work in the short term to modify multiple ZFS interfaces.
* Harder to pull in changes if we ever see any new code from Solaris.
* Mixed Solaris and Linux interfaces in some ZFS code.
Lay the initial ground work for a include/linux/ compatibility
directory. This was less critical in the past because the bulk
of the ZFS code consumes the Solaris API via the SPL. This API
was stable and the bulk Linux API differences were handled in
the SPL.
However, with the addition of a full Posix layer written directly
against the Linux APIs we are going to need more compatibility
code. It makes sense that all this code should be cleanly located
in one place. Subsequent patches should move the existing zvol
and vdev_disk compatibility code in to this directory.
This code originates in OpenSolaris and was modified by KQ Infotech
to be compatible with Linux. While supporting uios in the short
term is useful to get something working this is not an abstraction
we want to keep. This code is expected to be short lived and
removed as soon as all the remaining uio based APIs and updated.
The zfs acl code makes use of the two OpenSolaris helper functions
acl_trivial_access_masks() and ace_trivial_common(). Since they are
only called from zfs_acl.c I've brought them over from OpenSolaris
and added them as static function to this file. This way I don't
need to reimplement this functionality from scratch in the SPL.
Long term once I take a more careful look at the acl implementation
it may be the case that these functions really aren't needed. If
that turns out to be the case they can then be removed.
Remove unneeded bootfs functions. This support shouldn't be required
for the Linux port, and even if it is it would need to be reworked
to integrate cleanly with Linux.
Certain NFS/SMB share functionality is not yet in place. These
functions used to be wrapped with the generic HAVE_ZPL to prevent
them from being compiled. I still don't want them compiled but
I'm working toward eliminating the use of HAVE_ZPL. So I'm just
renaming the wrapper here to HAVE_SHARE. They still won't be
compiled until all the share issues are worked through. Share
support is the last missing piece from zfs_ioctl.c.
The zfs_check_global_label() function is part of the HAVE_MLSLABEL
support which was previously commented out by a HAVE_ZPL check.
Since we're still deciding what to do about mls labels wrap it
with the preexisting macro to keep it compiled out.
Unlike Solaris the Linux implementation embeds the inode in the
znode, and has no use for a vnode. So while it's true that fragmention
of the znode cache may occur it should not be worse than any of the
other Linux FS inode caches. Until proven that this is a problem it's
just added complexity we don't need.
These functions were dropped originally because I felt they would
need to be rewritten anyway to avoid using uios. However, this
patch readds then with they dea they can just be reworked and
the uio bits dropped.
ZFS even under Solaris does not strictly require libshare to be
available. The current implementation attempts to dlopen() the
library to access the needed symbols. If this fails libshare
support is simply disabled.
This means that on Linux we only need the most minimal libshare
implementation. In fact just enough to prevent the build from
failing. Longer term we can decide if we want to implement a
libshare library like Solaris. At best this would be an abstraction
layer between ZFS and NFS/SMB. Alternately, we can drop libshare
entirely and directly integrate ZFS with Linux's NFS/SMB.
Finally the bare bones user-libshare.m4 test was dropped. If we
do decide to implement libshare at some point it will surely be
as part of this package so the check is not needed.
By design the zfs utility is supposed to handle mounting and unmounting
a zfs filesystem. We could allow zfs to do this directly. There are
system calls available to mount/umount a filesystem. And there are
library calls available to manipulate /etc/mtab. But there are a
couple very good reasons not to take this appraoch... for now.
Instead of directly calling the system and library calls to (u)mount
the filesystem we fork and exec a (u)mount process. The principle
reason for this is to delegate the responsibility for locking and
updating /etc/mtab to (u)mount(8). This ensures maximum portability
and ensures the right locking scheme for your version of (u)mount
will be used. If we didn't do this we would have to resort to an
autoconf test to determine what locking mechanism is used.
The downside to using mount(8) instead of mount(2) is that we lose
the exact errno which was returned by the kernel. The return code
from mount(8) provides some insight in to what went wrong but it
not quite as good. For the moment this is translated as a best
guess in to a errno for the higher layers of zfs.
In the long term a shared library called libmount is under development
which provides a common API to address the locking and errno issues.
Once the standard mount utility has been updated to use this library
we can then leverage it. Until then this is the only safe solution.
http://www.kernel.org/pub/linux/utils/util-linux/libmount-docs/index.html
Recently helper functions were added to libzfs_util to load a kernel
module or execute a process. Initially this functionality was limited
to libzfs but it has become clear there will be other consumers. This
change opens up the interface so it may be used where appropriate.
For the moment, the only advantage in registering a umount helper
would be to automatically unshare a zfs filesystem. Since under
Linux this would be unexpected (but nice) behavior there is no
harm in disabling it.
This is desirable because the 'zfs unmount' path invokes the system
umount. This is done to ensure correct mtab locking but has the
side effect that the umount.zfs helper would be called if it exists.
By default this helper calls back in to zfs to do the unmount on
Solaris which we don't want under Linux.
Once libmount is available and we have a safe way to correctly
lock and update the /etc/mtab file we can reconsider the need
for a umount helper. Using libmount is the prefered solution.
While not strictly required to mount a zfs filesystem using a
mount helper has certain advantages.
First, we need it if we want to honor the mount behavior as found
on Solaris. As part of the mount we need to validate that the
dataset has the legacy mount property set if we are using 'mount'
instead of 'zfs mount'.
Secondly, by using a mount helper we can automatically load the
zpl kernel module. This way you can just issue a 'mount' or
'zfs mount' and it will just work.
Finally, it gives us common hook in user space to add any zfs
specific mount options we might want. At the moment we don't
have any but now the infrastructure is at least in place.
If libselinux is detected on your system at configure time link
against it. This allows us to use a library call to detect if
selinux is enabled and if it is to pass the mount option:
"context=\"system_u:object_r:file_t:s0"
For now this is required because none of the existing selinux
policies are aware of the zfs filesystem type. Because of this
they do not properly enable xattr based labeling even though
zfs supports all of the required hooks.
Until distro's add zfs as a known xattr friendly fs type we
must use mntpoint labeling. Alternately, end users could modify
their existing selinux policy with a little guidance.
During a rename we need to be careful to destroy and create a
new minor for the ZVOL _only_ if the rename succeeded. The previous
code would both destroy you minor device unconditionally, it would
also fail to create the new minor device on success.
These compiler warnings were introduced when code which was
previously #ifdef'ed out by HAVE_ZPL was re-added for use
by the posix layer. All of the following changes should be
obviously correct and will cause no semantic changes.
For while now mkdirp has been built as part of libspl however
the protoype was never added to libgen.h. This went unnoticed
until enabling the mount support which uses mkdirp().
The issue is that cv_timedwait() sleeps uninterruptibly to block signals
and avoid waking up early. Under Linux this counts against the load
average keeping it artificially high. This change allows the arc to
sleep interruptibly which mean it may be woken up early due to a signal.
Normally this means some extra care must be taken to handle a potential
signal. But for the arcs usage of cv_timedwait() there is no harm in
waking up before the timeout expires so no extra handling is required.