Commit Graph

160 Commits

Author SHA1 Message Date
Richard Yao
20f04f08aa Fix incorrect usage of strdup() in zfs_unmount_snap()
Modifying the length of a string returned by strdup() is incorrect
because strfree() is allowed to use strlen() to determine which slab
cache was used to do the allocation.

Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #1775
2013-10-29 15:06:18 -07:00
Tim Chase
8769db3966 Allocate the ioctl "output" nvlist with KM_PUSHPAGE.
Some ZFS errors such as certain snapshot failures can occur in
the sync task context.  Because they may require additional memory
allocations, the initial nvlist must be allocated with KM_PUSHPAGE.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #1746
Issue #1737
2013-09-25 15:44:22 -07:00
Matthew Ahrens
13fe019870 Illumos #3464
3464 zfs synctask code needs restructuring
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Adam Leventhal <ahl@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Approved by: Garrett D'Amore <garrett@damore.org>

References:
  https://www.illumos.org/issues/3464
  illumos/illumos-gate@3b2aab1880

Ported-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #1495
2013-09-04 16:01:24 -07:00
Matthew Ahrens
6f1ffb0665 Illumos #2882, #2883, #2900
2882 implement libzfs_core
2883 changing "canmount" property to "on" should not always remount dataset
2900 "zfs snapshot" should be able to create multiple, arbitrary snapshots at once

Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Chris Siden <christopher.siden@delphix.com>
Reviewed by: Garrett D'Amore <garrett@damore.org>
Reviewed by: Bill Pijewski <wdp@joyent.com>
Reviewed by: Dan Kruchinin <dan.kruchinin@gmail.com>
Approved by: Eric Schrock <Eric.Schrock@delphix.com>

References:
  https://www.illumos.org/issues/2882
  https://www.illumos.org/issues/2883
  https://www.illumos.org/issues/2900
  illumos/illumos-gate@4445fffbbb

Ported-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #1293

Porting notes:

WARNING: This patch changes the user/kernel ABI.  That means that
the zfs/zpool utilities built from master are NOT compatible with
the 0.6.2 kernel modules.  Ensure you load the matching kernel
modules from master after updating the utilities.  Otherwise the
zfs/zpool commands will be unable to interact with your pool and
you will see errors similar to the following:

  $ zpool list
  failed to read pool configuration: bad address
  no pools available

  $ zfs list
  no datasets available

Add zvol minor device creation to the new zfs_snapshot_nvl function.

Remove the logging of the "release" operation in
dsl_dataset_user_release_sync().  The logging caused a null dereference
because ds->ds_dir is zeroed in dsl_dataset_destroy_sync() and the
logging functions try to get the ds name via the dsl_dataset_name()
function. I've got no idea why this particular code would have worked
in Illumos.  This code has subsequently been completely reworked in
Illumos commit 3b2aab1 (3464 zfs synctask code needs restructuring).

Squash some "may be used uninitialized" warning/erorrs.

Fix some printf format warnings for %lld and %llu.

Apply a few spa_writeable() changes that were made to Illumos in
illumos/illumos-gate.git@cd1c8b8 as part of the 3112, 3113, 3114 and
3115 fixes.

Add a missing call to fnvlist_free(nvl) in log_internal() that was added
in Illumos to fix issue 3085 but couldn't be ported to ZoL at the time
(zfsonlinux/zfs@9e11c73) because it depended on future work.
2013-09-04 15:49:00 -07:00
Pawel Jakub Dawidek
526af78550 Call zvol_create_minors() in spa_open_common() when initializing pool
There is an extremely odd bug that causes zvols to fail to appear on
some systems, but not others. Recently, I was able to consistently
reproduce this issue over a period of 1 month. The issue disappeared
after I applied this change from FreeBSD.

This is from FreeBSD's pool version 28 import, which occurred in
revision 219089.

Ported-by: Richard Yao <ryao@cs.stonybrook.edu>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #441
Issue #599
2013-07-03 09:22:44 -07:00
shenyan1
0a6bef26ec kmem_zalloc(..., KM_SLEEP) will never fail
By definitition these allocations will never fail.  For
consistency with the rest of the code remove this dead error
handling code.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #1558
2013-07-01 14:51:48 -07:00
Madhav Suresh
c99c90015e Illumos #3006
3006 VERIFY[S,U,P] and ASSERT[S,U,P] frequently check if first
     argument is zero

Reviewed by Matt Ahrens <matthew.ahrens@delphix.com>
Reviewed by George Wilson <george.wilson@delphix.com>
Approved by Eric Schrock <eric.schrock@delphix.com>

References:
  illumos/illumos-gate@fb09f5aad4
  https://illumos.org/issues/3006

Requires:
  zfsonlinux/spl@1c6d149feb

Ported-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #1509
2013-06-19 15:14:10 -07:00
Eric Dillmann
0b4d1b5853 Add snapdev=[hidden|visible] dataset property
The new snapdev dataset property may be set to control the
visibility of zvol snapshot devices.  By default this value
is set to 'hidden' which will prevent zvol snapshots from
appearing under /dev/zvol/ and /dev/<dataset>/.  When set to
'visible' all zvol snapshots for the dataset will be visible.

This functionality was largely added because when automatic
snapshoting is enabled large numbers of read-only zvol snapshots
will be created.  When creating these devices the kernel will
attempt to read their partition tables, and blkid will attempt
to identify any filesystems on those partitions.  This leads
to a variety of issues:

1) The zvol partition tables will be read in the context of
   the `modprobe zfs` for automatically imported pools.  This
   is undesirable and should be done asynchronously, but for
   now reducing the number of visible devices helps.

2) Udev expects to be able to complete its work for a new
   block devices fairly quickly.  When many zvol devices are
   added at the same time this is no longer be true.  It can
   lead to udev timeouts and missing /dev/zvol links.

3) Simply having lots of devices in /dev/ can be aukward from
   a management standpoint.  Hidding the devices your unlikely
   to ever use helps with this.  Any snapshot device which is
   needed can be made visible by changing the snapdev property.

NOTE: This patch changes the default behavior for zvols which
      was effectively 'snapdev=visible'.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #1235
Closes #945
Issue #956
Issue #756
2013-03-05 12:37:54 -08:00
Eric Dillmann
9759c60f1a Illumos #3035 LZ4 compression support in ZFS and GRUB
3035 LZ4 compression support in ZFS and GRUB

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Approved by: Christopher Siden <csiden@delphix.com>

References:
  illumos/illumos-gate@a6f561b4ae
  https://www.illumos.org/issues/3035
  http://wiki.illumos.org/display/illumos/LZ4+Compression+In+ZFS

This patch has been slightly modified from the upstream Illumos
version to be compatible with Linux.  Due to the very limited
stack space in the kernel a lz4 workspace kmem cache is used.
Since we are using gcc we are also able to take advantage of the
gcc optimized __builtin_ctz functions.

Support for GRUB has been dropped from this patch.  That code
is available but those changes will need to made to the upstream
GRUB package.

Lastly, several hunks of dead code were dropped for clarity.  They
include the functions real_LZ4_uncompress(), LZ4_compressBound()
and the Visual Studio specific hunks wrapped in _MSC_VER.

Ported-by: Eric Dillmann <eric@jave.fr>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #1217
2013-01-29 09:28:20 -08:00
Christopher Siden
9ae529ec5d Illumos #2619 and #2747
2619 asynchronous destruction of ZFS file systems
2747 SPA versioning with zfs feature flags
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <gwilson@delphix.com>
Reviewed by: Richard Lowe <richlowe@richlowe.net>
Reviewed by: Dan Kruchinin <dan.kruchinin@gmail.com>
Approved by: Eric Schrock <Eric.Schrock@delphix.com>

References:
  illumos/illumos-gate@53089ab7c8
  illumos/illumos-gate@ad135b5d64
  illumos changeset: 13700:2889e2596bd6
  https://www.illumos.org/issues/2619
  https://www.illumos.org/issues/2747

NOTE: The grub specific changes were not ported.  This change
must be made to the Linux grub packages.

Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>
2013-01-08 10:35:35 -08:00
Brian Behlendorf
d5446cfc52 Revert "Remove TSD zfs_fsyncer_key"
This reverts commit 31f2b5abdf back
to the original code until the fsync(2) performance regression
can be addressed.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2012-12-20 09:56:28 -08:00
Brian Behlendorf
31f2b5abdf Remove TSD zfs_fsyncer_key
It's my understanding that the zfs_fsyncer_key TSD was added as
a performance omtimization to reduce contention on the zl_lock
from zil_commit().  This issue manifested itself as very long
(100+ms) fsync() system call times for fsync() heavy workloads.

However, under Linux I'm not seeing the same contention that
was originally described.  Therefore, I'm removing this code
in order to ween ourselves off any dependence on TSD.  If the
original performance issue reappears on Linux we can revisit
fixing it without resorting to TSD.

This just leaves one small ZFS TSD consumer.  If it can be
cleanly removed from the code we'll be able to shed the SPL
TSD implementation entirely.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes zfsonlinux/spl#174
2012-12-19 09:08:01 -08:00
George Wilson
65947351e7 Illumos #3129, #3130
illumos/illumos-gate@d6afdce20f
Illumos changeset: 13794:7c5e0e746b2c

3129 'zpool reopen' restarts resilvers
3130 ztest failure: Assertion failed:
     0 == dmu_objset_destroy(name, B_FALSE) (0x0 == 0x10)

Reviewed by: Eric Schrock <eric.schrock@delphix.com>
Reviewed by: Matt Ahrens <matthew.ahrens@delphix.com>
Reviewed by: Christopher Siden <chris.siden@delphix.com>
Reviewed by: Adam Leventhal <ahl@delphix.com>
Approved by: Dan McDonald <danmcd@nexenta.com>

References:
  https://www.illumos.org/issues/3129
  https://www.illumos.org/issues/3130

Ported by: Etienne Dechamps <etienne.dechamps@ovh.net>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #994
2012-10-03 13:59:02 -07:00
Bill Pijewski
37abac6d55 Illumos #2703: add mechanism to report ZFS send progress
Reviewed by: Matt Ahrens <matt@delphix.com>
Reviewed by: Robert Mustacchi <rm@joyent.com>
Reviewed by: Richard Lowe <richlowe@richlowe.net>
Approved by: Eric Schrock <Eric.Schrock@delphix.com>

References:
  https://www.illumos.org/issues/2703

Ported by: Martin Matuska <martin@matuska.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2012-09-19 13:39:06 -07:00
Chris Siden
1bd201e70d Illumos #1948: zpool list should show more detailed pool info
Reviewed by: Adam Leventhal <ahl@delphix.com>
Reviewed by: Matt Ahrens <matt@delphix.com>
Reviewed by: Eric Schrock <eric.schrock@delphix.com>
Reviewed by: Richard Lowe <richlowe@richlowe.net>
Reviewed by: Albert Lee <trisk@nexenta.com>
Reviewed by: Dan McDonald <danmcd@nexenta.com>
Reviewed by: Garrett D'Amore <garrett@damore.org>
Approved by: Eric Schrock <eric.schrock@delphix.com>

References:
  https://www.illumos.org/issues/1948

Ported by:	Martin Matuska <martin@matuska.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #685
2012-09-19 13:39:05 -07:00
Brian Behlendorf
4ca9a43644 Remove zvol device node
The 'zfs destroy' changes in 330d06f disrupted how zvol devices
get removed on ZoL.  However, it basically boils down to the
fact that we are no longer reliably calling zvol_remove_minor()
via zfs_ioc_destroy_snaps().

Therefore we add the missing call and handle things similarly
to the existing zfs_unmount_snap() case.  Ideally we would check
if this is of type DMU_OST_ZFS or DMU_OST_ZVOL and just do the
right thing as in zfs_ioc_destroy().  However, it looks like
it would be fairly expensive to get the type, and it's harmless
to simply attempt the umount and minor removal.

This is also an issue in the latest FreeBSD and Illumos code.
It was being tracked under the following issue, and we may want
to refresh our code when they settle on what they want to do
about it upstream.

  https://www.illumos.org/issues/3170

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #903
2012-09-10 10:25:08 -07:00
Matthew Ahrens
330d06f90d Illumos #1644, #1645, #1646, #1647, #1708
1644 add ZFS "clones" property
1645 add ZFS "written" and "written@..." properties
1646 "zfs send" should estimate size of stream
1647 "zfs destroy" should determine space reclaimed by
     destroying multiple snapshots
1708 adjust size of zpool history data

References:
  https://www.illumos.org/issues/1644
  https://www.illumos.org/issues/1645
  https://www.illumos.org/issues/1646
  https://www.illumos.org/issues/1647
  https://www.illumos.org/issues/1708

This commit modifies the user to kernel space ioctl ABI.  Extra
care should be taken when updating to ensure both the kernel
modules and utilities are updated.  This change has reordered
all of the new ioctl()s to the end of the list.  This should
help minimize this issue in the future.

Reviewed by: Richard Lowe <richlowe@richlowe.net>
Reviewed by: George Wilson <gwilson@zfsmail.com>
Reviewed by: Albert Lee <trisk@opensolaris.org>
Approved by: Garrett D'Amore <garret@nexenta.com>

Ported by: Martin Matuska <martin@matuska.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #826
Closes #664
2012-07-31 09:25:30 -07:00
Garrett D'Amore
3541dc6d02 Illumos #1748: desire support for reguid in zfs
Reviewed by: George Wilson <gwilson@zfsmail.com>
Reviewed by: Igor Kozhukhov <ikozhukhov@gmail.com>
Reviewed by: Alexander Eremin <alexander.eremin@nexenta.com>
Reviewed by: Alexander Stetsenko <ams@nexenta.com>
Approved by: Richard Lowe <richlowe@richlowe.net>

References:
  https://www.illumos.org/issues/1748

This commit modifies the user to kernel space ioctl ABI.  Extra
care should be taken when updating to ensure both the kernel
modules and utilities are updated.  If only the user space
component is updated both the 'zpool events' command and the
'zpool reguid' command will not work until the kernel modules
are updated.

Ported by:     Martin Matuska <martin@matuska.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #665
2012-07-11 13:08:56 -07:00
Pawel Jakub Dawidek
0cee24064a Speed up 'zfs list -t snapshot -o name -s name'
FreeBSD #xxx:  Dramatically optimize listing snapshots when user
requests only snapshot names and wants to sort them by name, ie.
when executes:

  # zfs list -t snapshot -o name -s name

Because only name is needed we don't have to read all snapshot
properties.

Below you can find how long does it take to list 34509 snapshots
from a single disk pool before and after this change with cold and
warm cache:

    before:

        # time zfs list -t snapshot -o name -s name > /dev/null
        cold cache: 525s
        warm cache: 218s

    after:

        # time zfs list -t snapshot -o name -s name > /dev/null
        cold cache: 1.7s
        warm cache: 1.1s

NOTE: This patch only appears in FreeBSD.  If/when Illumos picks up
the change we may want to drop this patch and adopt their version.
However, for now this addresses a real issue.

Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #450
2012-06-14 09:49:04 -07:00
Martin Matuska
7d5cd71da6 Illumos #1346: zfs incremental receive may leave behind temporary clones
1356 zfs dataset prefetch code not working
Reviewed by: Matthew Ahrens <matt@delphix.com>
Reviewed by: Dan McDonald <danmcd@nexenta.com>
Approved by: Gordon Ross <gwr@nexenta.com>

References to Illumos issue:
  https://www.illumos.org/issues/1346
  https://www.illumos.org/issues/1356

Ported-by: Richard Yao <ryao@cs.stonybrook.edu>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #647
2012-04-11 12:02:27 -07:00
Martin Matuska
b129c6590e OS-926: zfs panic in zfs_fill_zplprops_impl()
This change appears to be exclusive to SmartOS. It is not present in
illumos-gate but it just adds some needed error handling.  This is
clearly preferable to simply ASSERTING which is what would occur
prior to the patch.

Reviewed by: Jerry Jelinek <jerry.jelinek@joyent.com>
Reviewed by: Matt Ahrens <matt@delphix.com>
Ported-by: Richard Yao <ryao@cs.stonybrook.edu>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #652
2012-04-11 11:29:19 -07:00
Brian Behlendorf
4b5d425f14 Add ZFS_META_RELEASE to module load/unload messages
Include the ZFS_META_RELEASE in the module load/unload messages
to more clearly indidcate exactly what version of ZFS has been
loaded.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2012-03-23 12:14:35 -07:00
Brian Behlendorf
ebe7e575ea Add .zfs control directory
Add support for the .zfs control directory.  This was accomplished
by leveraging as much of the existing ZFS infrastructure as posible
and updating it for Linux as required.  The bulk of the core
functionality is now all there with the following limitations.

*) The .zfs/snapshot directory automount support requires a 2.6.37
   or newer kernel.  The exception is RHEL6.2 which has backported
   the d_automount patches.

*) Creating/destroying/renaming snapshots with mkdir/rmdir/mv
   in the .zfs/snapshot directory works as expected.  However,
   this functionality is only available to root until zfs
   delegations are finished.

      * mkdir - create a snapshot
      * rmdir - destroy a snapshot
      * mv    - rename a snapshot

The following issues are known defeciences, but we expect them to
be addressed by future commits.

*) Add automount support for kernels older the 2.6.37.  This should
   be possible using follow_link() which is what Linux did before.

*) Accessing the .zfs/snapshot directory via NFS is not yet possible.
   The majority of the ground work for this is complete.  However,
   finishing this work will require resolving some lingering
   integration issues with the Linux NFS kernel server.

*) The .zfs/shares directory exists but no futher smb functionality
   has yet been implemented.

Contributions-by: Rohan Puri <rohan.puri15@gmail.com>
Contributiobs-by: Andrew Barnes <barnes333@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #173
2012-03-22 13:03:47 -07:00
Brian Behlendorf
d7e398ce1a Cleanup ZFS debug infrastructure
Historically the internal zfs debug infrastructure has been
scattered throughout the code.  Since we expect to start making
more use of this code this patch performs some cleanup.

* Consolidate the zfs debug infrastructure in the zfs_debug.[ch]
  files.  This includes moving the zfs_flags and zfs_recover
  variables, plus moving the zfs_panic_recover() function.

* Remove the existing unused functionality in zfs_debug.c and
  replace it with code which correctly utilized the spl logging
  infrastructure.

* Remove the __dprintf() function from zfs_ioctl.c.  This is
  dead code, the dprintf() functionality in the kernel relies
  on the spl log support.

* Remove dprintf() from hdr_recl().  This wasn't particularly
  useful and was missing the required format specifier anyway.

* Subsequent patches should unify the dprintf() and zfs_dbgmsg()
  functions.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2012-02-02 11:24:30 -08:00
Brian Behlendorf
fa6e5ced2f Suppress kmem_alloc() warning in zfs_prop_set_special()
Suppress the warning for this large kmem_alloc() because it is not
that far over the warning threshhold (8k) and it is short lived.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2011-09-15 20:26:51 -07:00
Martin Matuska
ca5252204a Illumos #1043: Recursive zfs snapshot destroy fails
Prior to revision 11314 if a user was recursively destroying
snapshots of a dataset the target dataset was not required to
exist.  The zfs_secpolicy_destroy_snaps() function introduced
the security check on the target dataset, so since then if the
target dataset does not exist, the recursive destroy is not
performed.  Before 11314, only a delete permission check on
the snapshot's master dataset was performed.

Steps to reproduce:
zfs create pool/a
zfs snapshot pool/a@s1
zfs destroy -r pool@s1

Therefore I suggest to fallback to the old security check, if
the target snapshot does not exist and continue with the destroy.

References to Illumos issue and patch:
- https://www.illumos.org/issues/1043
- https://www.illumos.org/attachments/217/recursive_dataset_destroy.patch

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #340
2011-08-01 12:09:11 -07:00
Brian Behlendorf
61f218b090 Fix send/recv 'dataset is busy' errors
This commit fixes a regression which was accidentally introduced by
the Linux 2.6.39 compatibility chanages.  As part of these changes
instead of holding an active reference on the namepsace (which is
no longer posible) a reference is taken on the super block.  This
reference ensures the super block remains valid while it is in use.

To handle the unlikely race condition of the filesystem being
unmounted concurrently with the start of a 'zfs send/recv' the
code was updated to only take the super block reference when there
was an existing reference.  This indicates that the filesystem is
active and in use.

Unfortunately, in the 'zfs recv' case this is not the case.  The
newly created dataset will not have a super block without an
active reference which results in the 'dataset is busy' error.

The most straight forward fix for this is to simply update the
code to always take the reference even when it's zero.  This
may expose us to very very unlikely concurrent umount/send/recv
case but the consequences of that are minor.

Closes #319
2011-07-15 16:37:19 -07:00
Gunnar Beutner
3c9609b322 Renamed HAVE_SHARE ifdefs to HAVE_SMB_SHARE.
The remaining code that is guarded by HAVE_SHARE ifdefs is related to the
.zfs/shares functionality which is currently not available on Linux.

On Solaris the .zfs/shares directory can be used to set permissions for
SMB shares.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2011-07-06 09:20:28 -07:00
Gunnar Beutner
46e18b3f0f Implemented sharing datasets via NFS using libshare.
The sharenfs and sharesmb properties depend on the libshare library
to export datasets via NFS and SMB. This commit implements the base
libshare functionality as well as support for managing NFS shares.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2011-07-06 09:20:28 -07:00
Brian Behlendorf
2cf7f52bc4 Linux compat 2.6.39: mount_nodev()
The .get_sb callback has been replaced by a .mount callback
in the file_system_type structure.  When using the new
interface the caller must now use the mount_nodev() helper.

Unfortunately, the new interface no longer passes the vfsmount
down to the zfs layers.  This poses a problem for the existing
implementation because we currently save this pointer in the
super block for latter use.  It provides our only entry point
in to the namespace layer for manipulating certain mount options.

This needed to be done originally to allow commands like
'zfs set atime=off tank' to work properly.  It also allowed me
to keep more of the original Solaris code unmodified.  Under
Solaris there is a 1-to-1 mapping between a mount point and a
file system so this is a fairly natural thing to do.  However,
under Linux they many be multiple entries in the namespace
which reference the same filesystem.  Thus keeping a back
reference from the filesystem to the namespace is complicated.

Rather than introduce some ugly hack to get the vfsmount and
continue as before.  I'm leveraging this API change to update
the ZFS code to do things in a more natural way for Linux.
This has the upside that is resolves the compatibility issue
for the long term and fixes several other minor bugs which
have been reported.

This commit updates the code to remove this vfsmount back
reference entirely.  All modifications to filesystem mount
options are now passed in to the kernel via a '-o remount'.
This is the expected Linux mechanism and allows the namespace
to properly handle any options which apply to it before passing
them on to the file system itself.

Aside from fixing the compatibility issue, removing the
vfsmount has had the benefit of simplifying the code.  This
change which fairly involved has turned out nicely.

Closes #246
Closes #217
Closes #187
Closes #248
Closes #231
2011-07-01 13:36:39 -07:00
Brian Behlendorf
2b8cad6159 Use vmem_alloc() for zfs_ioc_userspace_many()
The default buffer size when requesting multiple quota entries
is 100 times the zfs_useracct_t size.  In practice this works out
to exactly 27200 bytes.  Since this will be a short lived buffer
in a non-performance critical path it is preferable to vmem_alloc()
the needed memory.
2011-05-20 14:23:18 -07:00
Brian Behlendorf
f01b360e67 Pass caller's credential in zfsdev_ioctl()
Initially when zfsdev_ioctl() was ported to Linux we didn't have
any credential support implemented.  So at the time we simply
passed NULL which wasn't much of a problem since most of the
secpolicy code was disabled.

However, one exception is quota handling which does require the
credential.  Now that proper credentials are supported we can
safely start passing the callers credential.  This is also an
initial step towards fully implemented the zfs secpolicy.
2011-05-20 10:12:25 -07:00
Brian Behlendorf
34b84cb831 Use vmem_alloc() for zfs_ioc_pool_get_history()
The default buffer size when requesting history is 128k.  This
is far to large for a kmem_alloc() so instead use the slower
vmem_alloc().  This path has no performance concerns and the
buffer is immediately free'd after its contents are copied to
the user space buffer.
2011-05-06 09:59:52 -07:00
Brian Behlendorf
d247f2a3cc Suppress 'zfs receive' memory warning
As part of zfs_ioc_recv() a zfs_cmd_t is allocated in the kernel
which is 17808 bytes in size.  This sort of thing in general should
be avoided.  However, since this should be an infrequent event for
now we allow it and simply suppress the warning with the KM_NODEBUG
flag.  This can be revisited latter if/when it becomes an issue.

Closes #178
2011-04-20 10:22:31 -07:00
Darik Horn
a23cc0a443 Add the zpool and filesystem versions
Print the supported zpool and filesystem versions at module load
time.  This change removes an ambiguity and adds information that
system administrators care about.  The phrase "ZFS pool version %s"
is the same as zpool upgrade -v so that the operator is familiar
with the message.

  ZFS: Loaded module v0.6.0, ZFS pool version 28, ZFS filesystem version 5
  ZFS: Unloaded module v0.6.0

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2011-02-28 09:46:23 -08:00
Brian Behlendorf
3558fd73b5 Prototype/structure update for Linux
I appologize in advance why to many things ended up in this commit.
When it could be seperated in to a whole series of commits teasing
that all apart now would take considerable time and I'm not sure
there's much merrit in it.  As such I'll just summerize the intent
of the changes which are all (or partly) in this commit.  Broadly
the intent is to remove as much Solaris specific code as possible
and replace it with native Linux equivilants.  More specifically:

1) Replace all instances of zfsvfs_t with zfs_sb_t.  While the
type is largely the same calling it private super block data
rather than a zfsvfs is more consistent with how Linux names
this.  While non critical it makes the code easier to read when
your thinking in Linux friendly VFS terms.

2) Replace vnode_t with struct inode.  The Linux VFS doesn't have
the notion of a vnode and there's absolutely no good reason to
create one.  There are in fact several good reasons to remove it.
It just adds overhead on Linux if we were to manage one, it
conplicates the code, and it likely will lead to bugs so there's
a good change it will be out of date.  The code has been updated
to remove all need for this type.

3) Replace all vtype_t's with umode types.  Along with this shift
all uses of types to mode bits.  The Solaris code would pass a
vtype which is redundant with the Linux mode.  Just update all the
code to use the Linux mode macros and remove this redundancy.

4) Remove using of vn_* helpers and replace where needed with
inode helpers.  The big example here is creating iput_aync to
replace vn_rele_async.  Other vn helpers will be addressed as
needed but they should be be emulated.  They are a Solaris VFS'ism
and should simply be replaced with Linux equivilants.

5) Update znode alloc/free code.  Under Linux it's common to
embed the inode specific data with the inode itself.  This removes
the need for an extra memory allocation.  In zfs this information
is called a znode and it now embeds the inode with it.  Allocators
have been updated accordingly.

6) Minimal integration with the vfs flags for setting up the
super block and handling mount options has been added this
code will need to be refined but functionally it's all there.

This will be the first and last of these to large to review commits.
2011-02-10 09:27:21 -08:00
Brian Behlendorf
bcf308227c Remove zfs_ctldir.[ch]
This code is used for snapshot and heavily leverages Solaris
functionality we do not want to reimplement.  These files have
been removed, including references to them, and will be replaced
by a zfs_snap.c/zpl_snap.c implementation which handles snapshots.
2011-02-10 09:27:21 -08:00
Brian Behlendorf
3fc050aaf2 Init/destroy tsd
Add missing tsd_destroy() call for rrw_tsd_key to avoid a leak.
2011-02-10 09:25:38 -08:00
Brian Behlendorf
9ee7fac531 VFS: Wrap with HAVE_SHARE
Certain NFS/SMB share functionality is not yet in place.  These
functions used to be wrapped with the generic HAVE_ZPL to prevent
them from being compiled.  I still don't want them compiled but
I'm working toward eliminating the use of HAVE_ZPL.  So I'm just
renaming the wrapper here to HAVE_SHARE.  They still won't be
compiled until all the share issues are worked through.  Share
support is the last missing piece from zfs_ioctl.c.
2011-02-10 09:21:43 -08:00
Brian Behlendorf
95c73795b0 Fix ZVOL rename minor devices
During a rename we need to be careful to destroy and create a
new minor for the ZVOL _only_ if the rename succeeded.  The previous
code would both destroy you minor device unconditionally, it would
also fail to create the new minor device on success.
2011-01-07 12:26:02 -08:00
Ned Bass
b04cffc9b0 Remove inconsistent use of EOPNOTSUPP
Commit 3ee56c292b changed an ENOTSUP return value
in one location to ENOTSUPP to fix user programs seeing an invalid ioctl()
error code.  However, use of ENOTSUP is widespread in the zfs module.  Instead
of changing all of those uses, we fixed the ENOTSUP definition in the SPL to be
consistent with user space.  The changed return value in the above commit is
therefore no longer needed, so this commit reverses it to maintain consistency.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2010-11-10 13:26:56 -08:00
Ned Bass
3ee56c292b Make rollbacks fail gracefully
Support for rolling back datasets require a functional ZPL, which we currently
do not have.  The zfs command does not check for ZPL support before attempting
a rollback, and in preparation for rolling back a zvol it removes the minor
node of the device.  To prevent the zvol device node from disappearing after a
failed rollback operation, this change wraps the zfs_do_rollback() function in
an #ifdef HAVE_ZPL and returns ENOSYS in the absence of a ZPL.  This is
consistent with the behavior of other ZPL dependent commands such as mount.

The orginal error message observed with this bug was rather confusing:

    internal error: Unknown error 524
    Aborted

This was because zfs_ioc_rollback() returns ENOTSUP if we don't HAVE_ZPL, but
Linux actually has no such error code.  It should instead return EOPNOTSUPP, as
that is how ENOTSUP is defined in user space.  With that we would have gotten
the somewhat more helpful message

    cannot rollback 'tank/fish': unsupported version

This is rather a moot point with the above changes since we will no longer make
that ioctl call without a ZPL.  But, this change updates the error code just in
case.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2010-11-08 14:03:36 -08:00
Brian Behlendorf
baa40d45cb Fix missing 'zpool events'
It turns out that 'zpool events' over 1024 bytes in size where being
silently dropped.  This was discovered while writing the zfault.sh
tests to validate common failure modes.

This could occur because the zfs interface for passing an arbitrary
size nvlist_t over an ioctl() is to provide a buffer for the packed
nvlist which is usually big enough.  In this case 1024 byte is the
default.  If the kernel determines the buffer is to small it returns
ENOMEM and the minimum required size of the nvlist_t.  This was
working properly but in the case of 'zpool events' the event stream
was advanced dispite the error.  Thus the retry with the bigger
buffer would succeed but it would skip over the previous event.

The fix is to pass this size to zfs_zevent_next() and determine
before removing the event from the list if it will fit.  This was
preferable to checking after the event was returned because this
avoids the need to rewind the stream.
2010-10-12 14:55:03 -07:00
Brian Behlendorf
f5e79474f0 Fix zfsdev_compat_ioctl() case
For the !CONFIG_COMPAT case fix the zfsdev_compat_ioctl()
compatibility function name.  This was caught by the
chaos4.3 builder.
2010-09-01 16:00:15 -07:00
Brian Behlendorf
00b46022c6 Add linux kernel memory support
Required kmem/vmem changes

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2010-08-31 13:41:57 -07:00
Brian Behlendorf
60101509ee Add linux kernel disk support
Native Linux vdev disk interfaces

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2010-08-31 13:41:57 -07:00
Brian Behlendorf
325f023544 Add linux kernel device support
This branch contains the majority of the changes required to cleanly
intergrate with Linux style special devices (/dev/zfs).  Mainly this
means dropping all the Solaris style callbacks and replacing them
with the Linux equivilants.

This patch also adds the onexit infrastructure needed to track
some minimal state between ioctls.  Under Linux it would be easy
to do this simply using the file->private_data.  But under Solaris
they apparent need to pass the file descriptor as part of the ioctl
data and then perform a lookup in the kernel.  Once again to keep
code change to a minimum I've implemented the Solaris solution.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2010-08-31 13:41:50 -07:00
Brian Behlendorf
d2c15e84e9 Add linux mlslabel support
The ZFS update to onnv_141 brought with it support for a
security label attribute called mlslabel.  This feature
depends on zones to work correctly and thus I am disabling
it under Linux.  Equivilant functionality could be added
at some point in the future.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2010-08-31 13:41:49 -07:00
Brian Behlendorf
266852767f Add linux events
This topic branch leverages the Solaris style FMA call points
in ZFS to create a user space visible event notification system
under Linux.  This new system is called zevent and it unifies
all previous Solaris style ereports and sysevent notifications.

Under this Linux specific scheme when a sysevent or ereport event
occurs an nvlist describing the event is created which looks almost
exactly like a Solaris ereport.  These events are queued up in the
kernel when they occur and conditionally logged to the console.
It is then up to a user space application to consume the events
and do whatever it likes with them.

To make this possible the existing /dev/zfs ABI has been extended
with two new ioctls which behave as follows.

* ZFS_IOC_EVENTS_NEXT
Get the next pending event.  The kernel will keep track of the last
event consumed by the file descriptor and provide the next one if
available.  If no new events are available the ioctl() will block
waiting for the next event.  This ioctl may also be called in a
non-blocking mode by setting zc.zc_guid = ZEVENT_NONBLOCK.  In the
non-blocking case if no events are available ENOENT will be returned.
It is possible that ESHUTDOWN will be returned if the ioctl() is
called while module unloading is in progress.  And finally ENOMEM
may occur if the provided nvlist buffer is not large enough to
contain the entire event.

* ZFS_IOC_EVENTS_CLEAR
Clear are events queued by the kernel.  The kernel will keep a fairly
large number of recent events queued, use this ioctl to clear the
in kernel list.  This will effect all user space processes consuming
events.

The zpool command has been extended to use this events ABI with the
'events' subcommand.  You may run 'zpool events -v' to output a
verbose log of all recent events.  This is very similar to the
Solaris 'fmdump -ev' command with the key difference being it also
includes what would be considered sysevents under Solaris.  You
may also run in follow mode with the '-f' option.  To clear the
in kernel event queue use the '-c' option.

$ sudo cmd/zpool/zpool events -fv
TIME                        CLASS
May 13 2010 16:31:15.777711000 ereport.fs.zfs.config.sync
        class = "ereport.fs.zfs.config.sync"
        ena = 0x40982b7897700001
        detector = (embedded nvlist)
                version = 0x0
                scheme = "zfs"
                pool = 0xed976600de75dfa6
        (end detector)

        time = 0x4bec8bc3 0x2e5aed98
        pool = "zpios"
        pool_guid = 0xed976600de75dfa6
        pool_context = 0x0

While the 'zpool events' command is handy for interactive debugging
it is not expected to be the primary consumer of zevents.  This ABI
was primarily added to facilitate the addition of a user space
monitoring daemon.  This daemon would consume all events posted by
the kernel and based on the type of event perform an action.  For
most events simply forwarding them on to syslog is likely enough.
But this interface also cleanly allows for more sophisticated
actions to be taken such as generating an email for a failed drive.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2010-08-31 13:41:36 -07:00
Brian Behlendorf
8a8f5c6b3c Fix zfs_ioc_objset_stats
Interestingly this looks like an upstream bug as well.  If for some
reason we are unable to get a zvols statistics, because perhaps the
zpool is hopelessly corrupt, we would trigger the VERIFY.  This
commit adds the proper error handling just to propagate the error
back to user space.  Now the user space tools still must handle this
properly but in the worst case the tool will crash or perhaps have
some missing output.  That's far far better than crashing the host.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2010-08-31 08:38:47 -07:00
Brian Behlendorf
c65aa5b2b9 Fix gcc missing parenthesis warnings
Gcc -Wall warn: 'missing parenthesis'

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2010-08-31 08:38:35 -07:00
Brian Behlendorf
e75c13c353 Fix gcc missing case warnings
Gcc ASSERT() missing cases are impossible

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2010-08-27 15:34:03 -07:00
Brian Behlendorf
b8864a233c Fix gcc cast warnings
Gcc -Wall warn: 'lacks a cast'
Gcc -Wall warn: 'comparison between pointer and integer'

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2010-08-27 15:33:32 -07:00
Brian Behlendorf
572e285762 Update to onnv_147
This is the last official OpenSolaris tag before the public
development tree was closed.
2010-08-26 14:24:34 -07:00
Brian Behlendorf
428870ff73 Update core ZFS code from build 121 to build 141. 2010-05-28 13:45:14 -07:00
Brian Behlendorf
45d1cae3b8 Rebase master to b121 2009-08-18 11:43:27 -07:00
Brian Behlendorf
9babb37438 Rebase master to b117 2009-07-02 15:44:48 -07:00
Brian Behlendorf
d164b20935 Rebase master to b108 2009-02-18 12:51:31 -08:00
Brian Behlendorf
fb5f0bc833 Rebase master to b105 2009-01-15 13:59:39 -08:00
Brian Behlendorf
172bb4bd5e Move the world out of /zfs/ and seperate out module build tree 2008-12-11 11:08:09 -08:00