Commit Graph

127 Commits

Author SHA1 Message Date
Suman Chakravartula
e18be9a637 Add overlay(-O) mount option support
Linux supports mounting over non-empty directories by default.
In Solaris this is not the case and -O option is required for
zfs mount to mount a zfs filesystem over a non-empty directory.

For compatibility, I've added support for -O option to mount
zfs filesystems over non-empty directories if the user wants
to, just like in Solaris.

I've defined MS_OVERLAY to record it in the flags variable if
the -O option is supplied.  The flags variable passes through
a few functions and its checked before performing the empty
directory check in zfs_mount function.  If -O is given, the
check is not performed.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #473
2012-01-12 15:49:38 -08:00
Richard Laager
2932b6a800 Treat /dev/vd* as whole disks
Correctly detect /dev/vd devices as whole disks and attempt to
create an EFI partition table.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2012-01-11 16:44:54 -08:00
Brian Behlendorf
ab26409db7 Linux 3.1 compat, super_block->s_shrink
The Linux 3.1 kernel has introduced the concept of per-filesystem
shrinkers which are directly assoicated with a super block.  Prior
to this change there was one shared global shrinker.

The zfs code relied on being able to call the global shrinker when
the arc_meta_limit was exceeded.  This would cause the VFS to drop
references on a fraction of the dentries in the dcache.  The ARC
could then safely reclaim the memory used by these entries and
honor the arc_meta_limit.  Unfortunately, when per-filesystem
shrinkers were added the old interfaces were made unavailable.

This change adds support to use the new per-filesystem shrinker
interface so we can continue to honor the arc_meta_limit.  The
major benefit of the new interface is that we can now target
only the zfs filesystem for dentry and inode pruning.  Thus we
can minimize any impact on the caching of other filesystems.

In the context of making this change several other important
issues related to managing the ARC were addressed, they include:

* The dnlc_reduce_cache() function which was called by the ARC
to drop dentries for the Posix layer was replaced with a generic
zfs_prune_t callback.  The ZPL layer now registers a callback to
drop these dentries removing a layering violation which dates
back to the Solaris code.  This callback can also be used by
other ARC consumers such as Lustre.

  arc_add_prune_callback()
  arc_remove_prune_callback()

* The arc_reduce_dnlc_percent module option has been changed to
arc_meta_prune for clarity.  The dnlc functions are specific to
Solaris's VFS and have already been largely eliminated already.
The replacement tunable now represents the number of bytes the
prune callback will request when invoked.

* Less aggressively invoke the prune callback.  We used to call
this whenever we exceeded the arc_meta_limit however that's not
strictly correct since it results in over zeleous reclaim of
dentries and inodes.  It is now only called once the arc_meta_limit
is exceeded and every effort has been made to evict other data from
the ARC cache.

* More promptly manage exceeding the arc_meta_limit.  When reading
meta data in to the cache if a buffer was unable to be recycled
notify the arc_reclaim thread to invoke the required prune.

* Added arcstat_prune kstat which is incremented when the ARC
is forced to request that a consumer prune its cache.  Remember
this will only occur when the ARC has no other choice.  If it
can evict buffers safely without invoking the prune callback
it will.

* This change is also expected to resolve the unexpect collapses
of the ARC cache.  This would occur because when exceeded just the
arc_meta_limit reclaim presure would be excerted on the arc_c
value via arc_shrink().  This effectively shrunk the entire cache
when really we just needed to reclaim meta data.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #466
Closes #292
2012-01-11 11:46:02 -08:00
Darik Horn
28eb9213d8 Linux 3.2 compat: set_nlink()
Directly changing inode->i_nlink is deprecated in Linux 3.2 by commit

  SHA: bfe8684869601dacfcb2cd69ef8cfd9045f62170

Use the new set_nlink() kernel function instead.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes: #462
2011-12-16 20:02:52 -08:00
Prakash Surya
6ba3b44614 Add make rule for building Arch Linux packages
Added the necessary build infrastructure for building packages
compatible with the Arch Linux distribution. As such, one can now run:

    $ ./configure
    $ make pkg     # Alternatively, one can run 'make arch' as well

on the Arch Linux machine to create two binary packages compatible with
the pacman package manager, one for the zfs userland utilities and
another for the zfs kernel modules. The new packages can then be
installed by running:

    # pacman -U $package.pkg.tar.xz

In addition, source-only packages suitable for an Arch Linux chroot
environment or remote builder can also be build using the 'sarch' make
rule.

NOTE: Since the source dist tarball is created on the fly from the head
of the build tree, it's MD5 hash signature will be continually influx.
As a result, the md5sum variable was intentionally omitted from the
PKGBUILD files, and the '--skipinteg' makepkg option is used. This may
or may not have any serious security implications, as the source tarball
is not being downloaded from an outside source.

Signed-off-by: Prakash Surya <surya1@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #491
2011-12-14 19:14:23 -08:00
Garrett D'Amore
a38718a63d Illumos #734: Use taskq_dispatch_ent() interface
It has been observed that some of the hottest locks are those
of the zio taskqs.  Contention on these locks can limit the
rate at which zios are dispatched which limits performance.

This upstream change from Illumos uses new interface to the
taskqs which allow them to utilize a prealloc'ed taskq_ent_t.
This removes the need to perform an allocation at dispatch
time while holding the contended lock.  This has the effect
of improving system performance.

Reviewed by: Albert Lee <trisk@nexenta.com>
Reviewed by: Richard Lowe <richlowe@richlowe.net>
Reviewed by: Alexey Zaytsev <alexey.zaytsev@nexenta.com>
Reviewed by: Jason Brian King <jason.brian.king@gmail.com>
Reviewed by: George Wilson <gwilson@zfsmail.com>
Reviewed by: Adam Leventhal <ahl@delphix.com>
Approved by: Gordon Ross <gwr@nexenta.com>

References to Illumos issue:
  https://www.illumos.org/issues/734

Ported-by: Prakash Surya <surya1@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #482
2011-12-14 09:19:30 -08:00
Gunnar Beutner
590338f63e Added comments for libshare's NFS functions.
Some of the functions' purpose wasn't immediately obvious without
additional explanations. This commit adds these missing comments.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2011-12-05 09:33:00 -08:00
Suman Chakravartula
ada8ec1ec5 Allow leading digits in userquota/groupquota names
While setting/getting userquota and groupquota properties, the input
was not treated as a possible username or groupname if it had a
leading digit. While useradd in linux recommends the regexp
[a-z_][a-z0-9_-]*[$]? , it is not enforced. This causes problem for
usernames with leading digits in them. We need to be able to support
getting and setting properties for this unconventional but possible
input category

I've updated the code to validate the username or groupname directly
via the API. Also, note that I moved this validation to the beginning
before the check for SID names with @. This also supports usernames
with @ character in them which are valid. Only when input with @ is
not a valid username, it is interpreted as a potential SID name.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #428
2011-11-21 16:29:18 -08:00
Brian Behlendorf
ca5fd24984 Limit maximum ashift value to 12
While we initially allowed you to set your ashift as large as 17
(SPA_MAXBLOCKSIZE) that is actually unsafe.  What wasn't considered
at the time is that each uberblock written to the vdev label ring
buffer will be of this size.  Now the buffer is statically sized
to 128k and we need to be able to fit several uberblocks in it.
With a large ashift that becomes a problem.

Therefore I'm reducing the maximum configurable ashift value to 12.
This is large enough for the 4k sector drives and small enough that
we can still keep the most recent 32 uberblock in the vdev label
ring buffer.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #425
2011-11-11 14:50:48 -08:00
Brian Behlendorf
5547c2f1bf Simplify BDI integration
Update the code to use the bdi_setup_and_register() helper to
simplify the bdi integration code.  The updated code now just
registers the bdi during mount and destroys it during unmount.

The only complication is that for 2.6.32 - 2.6.33 kernels the
helper wasn't available so in these cases the zfs code must
provide it.  Luckily the bdi_setup_and_register() function
is trivial.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #367
2011-11-08 10:19:03 -08:00
Zachary Bedell
7a0232735d Make libefi-created GPT compatible with gptfdisk
GPT's created by libefi set the HeaderSize attribute in the GPT
header to 512 -- size of the GPT header INCLUDING the 420 padding
bytes at the end.  Most other tools set the size to 92 -- size of
the actual header itself excluding the padding.  Most tools check
the recorded HeaderSize when verifying CRC, but gptfdisk hardcodes
92 and thus reports CRC verification problems for full-disk vdevs
created IE with `zpool create pool sdc`.

This commit changes libefi's behavior for GPT creation and also
fixes several edge cases where libefi's behavior was similar
(though in an incompatible manner) to gptfdisk.  Libefi assumed
HeaderSize was always 512 even if the GPT recorded a different
value.  Sanity checks of the GPT headersize read from disk were
added before applying checksum calculation -- this will prevent
segfault in cases of bogus on-disk values.

Zpools created with the resuling libefi are verified as correct
both by parted and gptfdisk.  Also pool have been tested to
import correctly on ZFS on Linux as well as Solaris Express 11
livecd.

Signed-off-by: Zachary Bedell <zac@thebedells.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #344
2011-09-26 09:44:43 -07:00
Brian Behlendorf
de0a1c099b Autogen refresh for udev changes
Run autogen.sh using the same autotools versions as upstream:

 * autoconf-2.63
 * automake-1.11.1
 * libtool-2.2.6b
2011-08-08 16:30:27 -07:00
Brian Behlendorf
76659dc110 Add backing_device_info per-filesystem
For a long time now the kernel has been moving away from using the
pdflush daemon to write 'old' dirty pages to disk.  The primary reason
for this is because the pdflush daemon is single threaded and can be
a limiting factor for performance.  Since pdflush sequentially walks
the dirty inode list for each super block any delay in processing can
slow down dirty page writeback for all filesystems.

The replacement for pdflush is called bdi (backing device info).  The
bdi system involves creating a per-filesystem control structure each
with its own private sets of queues to manage writeback.  The advantage
is greater parallelism which improves performance and prevents a single
filesystem from slowing writeback to the others.

For a long time both systems co-existed in the kernel so it wasn't
strictly required to implement the bdi scheme.  However, as of
Linux 2.6.36 kernels the pdflush functionality has been retired.

Since ZFS already bypasses the page cache for most I/O this is only
an issue for mmap(2) writes which must go through the page cache.
Even then adding this missing support for newer kernels was overlooked
because there are other mechanisms which can trigger writeback.

However, there is one critical case where not implementing the bdi
functionality can cause problems.  If an application handles a page
fault it can enter the balance_dirty_pages() callpath.  This will
result in the application hanging until the number of dirty pages in
the system drops below the dirty ratio.

Without a registered backing_device_info for the filesystem the
dirty pages will not get written out.  Thus the application will hang.
As mentioned above this was less of an issue with older kernels because
pdflush would eventually write out the dirty pages.

This change adds a backing_device_info structure to the zfs_sb_t
which is already allocated per-super block.  It is then registered
when the filesystem mounted and unregistered on unmount.  It will
not be registered for mounted snapshots which are read-only.  This
change will result in flush-<pool> thread being dynamically created
and destroyed per-mounted filesystem for writeback.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #174
2011-08-04 13:37:38 -07:00
Gunnar Beutner
ddd0fd9ef6 Use libzfs_run_process() in libshare.
This should simplify the code a bit by re-using existing code
to fork and exec a process.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #190
2011-08-01 13:52:35 -07:00
Gunnar Beutner
3132cb397a Use /dev/null for stdout/stderr in libzfs_run_process().
Simply closing the stdout and/or stderr file descriptors for
the child process can have bad side effects if for example
the child writes to stdout/stderr after open()ing a file.
The open() call might have returned the same file descriptor
one would usually expect for stdout/stderr (1 and 2), thereby
causing mis-directed writes.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #190
2011-08-01 13:52:23 -07:00
James H
5333eb0b3b Call exportfs -v once for NFS shares.
At the moment we call exportfs -v every time we check whether an
NFS share is active. This happens every time you run a zfs or
zpool command, making them extremely slow when you have a lot of
exports. The time taken is approx O(n2) of the number of shares.

This commit stores the output from exportfs -v in a temporary file
and use this to speed up subsequent accesses.

This mechanism is still too slow - if you have tens of thousands
of NFS shares it will still be painful running ANY zfs/zpool
command.

Signed-off-by: Gunnar Beutner <gunnar@beutner.name>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #341
2011-08-01 13:50:40 -07:00
Alexander Stetsenko
0b7936d5c2 Illumos #278: get rid zfs of python and pyzfs dependencies
Remove all python and pyzfs dependencies for consistency and
to ensure full functionality even in a mimimalist environment.

Reviewed by: gordon.w.ross@gmail.com
Reviewed by: trisk@opensolaris.org
Reviewed by: alexander.r.eremin@gmail.com
Reviewed by: jerry.jelinek@joyent.com
Approved by: garrett@nexenta.com

References to Illumos issue and patch:
- https://www.illumos.org/issues/278
- https://github.com/illumos/illumos-gate/commit/1af68beac3

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #340
Issue #160

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2011-08-01 12:09:36 -07:00
Matt Ahrens
f5fc4acaa7 Illumos #1092: zfs refratio property
Add a "REFRATIO" property, which is the compression ratio based on
data referenced. For snapshots, this is the same as COMPRESSRATIO,
but for filesystems/volumes, the COMPRESSRATIO is based on the
data "USED" (ie, includes blocks in children, but not blocks
shared with the origin).

This is needed to figure out how much space a filesystem would
use if it were not compressed (ignoring snapshots).

Reviewed by: George Wilson <George.Wilson@delphix.com>
Reviewed by: Adam Leventhal <Adam.Leventhal@delphix.com>
Reviewed by: Dan McDonald <danmcd@nexenta.com>
Reviewed by: Richard Elling <richard.elling@richardelling.com>
Reviewed by: Mark Musante <Mark.Musante@oracle.com>
Reviewed by: Garrett D'Amore <garrett@nexenta.com>
Approved by: Garrett D'Amore <garrett@nexenta.com>

References to Illumos issue and patch:
- https://www.illumos.org/issues/1092
- https://github.com/illumos/illumos-gate/commit/187d6ac08a

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #340
2011-08-01 12:09:11 -07:00
Kyle Fuller
615ab66d18 Provide a rc.d script for archlinux
Unlike most other Linux distributions archlinux installs its
init scripts in /etc/rc.d insead of /etc/init.d.  This commit
provides an archlinux rc.d script for zfs and extends the
build infrastructure to ensure it get's installed in the
correct place.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #322
2011-07-11 14:12:23 -07:00
Brian Behlendorf
b1c932d318 Add proper library versioning
The zfs libraries were never properly versioned.  Since the API has
remained static for quite some time this we never an issue.  However,
going forward they should be versioned.  This commit versions all
of the libraries to 1.0.0.  From here on out this version must be
updated to reflect changes to the library.
2011-07-06 09:20:28 -07:00
Gunnar Beutner
52e7c3a2e5 Link libshare directly to libzfs
Drop usage of dlopen/dlsym for libshare.  There is no need to do
this because the zfs packages provide libshare.  Unlike on Solaris
we are guaranteed it will be available.

This avoids possible problems with hardcoding the libshare path in
the code (e.g. when users specify a different install path via
configure options).  It additionally simplifies the code which is
good for maintainability.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2011-07-06 09:20:28 -07:00
Gunnar Beutner
46e18b3f0f Implemented sharing datasets via NFS using libshare.
The sharenfs and sharesmb properties depend on the libshare library
to export datasets via NFS and SMB. This commit implements the base
libshare functionality as well as support for managing NFS shares.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2011-07-06 09:20:28 -07:00
Brian Behlendorf
f2cfee80e3 Fix implicit declaration of 'mkdirp'
The lib/libspl/include/libgen.h header file was being mistakenly
left out of the 'make dist' tarball.  It just happens this doesn't
cause a build failure when creating packages because the system
libgen/h is included instead.  This simply results in the following
warning due to the missing forward declaration of mkdirp().

  ../../lib/libzfs/libzfs_mount.c:417:3: warning: implicit declaration
  of function 'mkdirp' [-Wimplicit-function-declaration]
2011-07-01 13:39:47 -07:00
Brian Behlendorf
2cf7f52bc4 Linux compat 2.6.39: mount_nodev()
The .get_sb callback has been replaced by a .mount callback
in the file_system_type structure.  When using the new
interface the caller must now use the mount_nodev() helper.

Unfortunately, the new interface no longer passes the vfsmount
down to the zfs layers.  This poses a problem for the existing
implementation because we currently save this pointer in the
super block for latter use.  It provides our only entry point
in to the namespace layer for manipulating certain mount options.

This needed to be done originally to allow commands like
'zfs set atime=off tank' to work properly.  It also allowed me
to keep more of the original Solaris code unmodified.  Under
Solaris there is a 1-to-1 mapping between a mount point and a
file system so this is a fairly natural thing to do.  However,
under Linux they many be multiple entries in the namespace
which reference the same filesystem.  Thus keeping a back
reference from the filesystem to the namespace is complicated.

Rather than introduce some ugly hack to get the vfsmount and
continue as before.  I'm leveraging this API change to update
the ZFS code to do things in a more natural way for Linux.
This has the upside that is resolves the compatibility issue
for the long term and fixes several other minor bugs which
have been reported.

This commit updates the code to remove this vfsmount back
reference entirely.  All modifications to filesystem mount
options are now passed in to the kernel via a '-o remount'.
This is the expected Linux mechanism and allows the namespace
to properly handle any options which apply to it before passing
them on to the file system itself.

Aside from fixing the compatibility issue, removing the
vfsmount has had the benefit of simplifying the code.  This
change which fairly involved has turned out nicely.

Closes #246
Closes #217
Closes #187
Closes #248
Closes #231
2011-07-01 13:36:39 -07:00
Brian Behlendorf
5c03efc379 Linux compat 2.6.39: security_inode_init_security()
The security_inode_init_security() function now takes an additional
qstr argument which must be passed in from the dentry if available.
Passing a NULL is safe when no qstr is available the relevant
security checks will just be skipped.

Closes #246
Closes #217
Closes #187
2011-07-01 12:40:08 -07:00
Prasad Joshi
b312979252 Tear down and flush the mmap region
The inode eviction should unmap the pages associated with the inode.
These pages should also be flushed to disk to avoid the data loss.
Therefore, use truncate_setsize() in evict_inode() to release the
pagecache.

The API truncate_setsize() was added in 2.6.35 kernel. To ensure
compatibility with the old kernel, the patch defines its own
truncate_setsize function.

Signed-off-by: Prasad Joshi <pjoshi@stec-inc.com>
Closes #255
2011-06-27 09:59:19 -07:00
Christian Kohlschütter
df30f56639 Add "ashift" property to zpool create
Some disks with internal sectors larger than 512 bytes (e.g., 4k) can
suffer from bad write performance when ashift is not configured
correctly.  This is caused by the disk not reporting its actual sector
size, but a sector size of 512 bytes.  The drive may behave this way
for compatibility reasons.  For example, the WDC WD20EARS disks are
known to exhibit this behavior.

When creating a zpool, ZFS takes that wrong sector size and sets the
"ashift" property accordingly (to 9: 1<<9=512), whereas it should be
set to 12 for 4k sectors (1<<12=4096).

This patch allows an adminstrator to manual specify the known correct
ashift size at 'zpool create' time.  This can significantly improve
performance in certain cases.  However, it will have an impact on your
total pool capacity.  See the updated ashift property description
in the zpool.8 man page for additional details.

Valid values for the ashift property range from 9 to 17 (512B-128KB).
Additionally, you may set the ashift to 0 if you wish to auto-detect
the sector size based on what the disk reports, this is the default
behavior.  The most common ashift values are 9 and 12.

  Example:
  zpool create -o ashift=12 tank raidz2 sda sdb sdc sdd

Closes #280

Original-patch-by: Richard Laager <rlaager@wiktel.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2011-06-17 16:35:49 -07:00
Brian Behlendorf
2e08aedba4 Always check -Wno-unused-but-set-variable gcc support
The previous commit 8a7e1ceefa wasn't
quite right.  This check applies to both the user and kernel space
build and as such we must make sure it runs regardless of what
the --with-config option is set too.

For example, if --with-config=kernel then the autoconf test does
not run and we generate build warnings when compiling the kernel
packages.
2011-06-14 16:40:35 -07:00
Brian Behlendorf
8a7e1ceefa Check for -Wno-unused-but-set-variable gcc support
Gcc versions 4.3.2 and earlier do not support the compiler flag
-Wno-unused-but-set-variable.  This can lead to build failures
on older Linux platforms such as Debian Lenny.  Since this is
an optional build argument this changes add a new autoconf check
for the option.  If it is supported by the installed version of
gcc then it is used otherwise it is omited.

See commit's 12c1acde76 and
79713039a2 for the reason the
-Wno-unused-but-set-variable options was originally added.
2011-06-14 14:43:22 -07:00
Brian Behlendorf
1b9d8c340f Fix 'zfs send -D' segfault
Sending pools with dedup results in a segfault due to a Solaris
portability issue.  Under Solaris the pipe(2) library call
creates a bidirectional data channel.  Unfortunately, on Linux
pipe(2) call creates unidirection data channel.  The fix is to
use the socketpair(2) function to create the expected
bidirectional channel.

Seth Heeren did the original leg work on this issue for zfs-fuse.
We finally just rediscovered the same portability issue and
dfurphy was able to point me at the original issue for the fix.

Closes #268
2011-06-09 13:58:48 -07:00
Brian Behlendorf
df554c148e Fix 'zfs set volsize=N pool/dataset'
This change fixes a kernel panic which would occur when resizing
a dataset which was not open.  The objset_t stored in the
zvol_state_t will be set to NULL when the block device is closed.
To avoid this issue we pass the correct objset_t as the third arg.

The code has also been updated to correctly notify the kernel
when the block device capacity changes.  For 2.6.28 and newer
kernels the capacity change will be immediately detected.  For
earlier kernels the capacity change will be detected when the
device is next opened.  This is a known limitation of older
kernels.

Online ext3 resize test case passes on 2.6.28+ kernels:
$ dd if=/dev/zero of=/tmp/zvol bs=1M count=1 seek=1023
$ zpool create tank /tmp/zvol
$ zfs create -V 500M tank/zd0
$ mkfs.ext3 /dev/zd0
$ mkdir /mnt/zd0
$ mount /dev/zd0 /mnt/zd0
$ df -h /mnt/zd0
$ zfs set volsize=800M tank/zd0
$ resize2fs /dev/zd0
$ df -h /mnt/zd0

Original-patch-by: Fajar A. Nugraha <github@fajar.net>
Closes #68
Closes #84
2011-05-02 08:54:40 -07:00
Alejandro R. Sedeño
e90a3de3e8 Add zpl_export.c to the list of targets. 2011-04-29 19:50:35 -07:00
Brian Behlendorf
94e954257a Correct MAXUID
The uid_t on most systems is in fact and unsigned 32-bit value.
This is almost always correct, however you could compile your
kernel to use an unsigned 16-bit value for uid_t.  In practice
I've never encountered a distribution which does this so I'm
willing to overlook this corner case for now.

Closes #165
2011-04-29 14:03:12 -07:00
Gunnar Beutner
055656d4f4 Implemented NFS export_operations.
Implemented the required NFS operations for exporting ZFS datasets
using the in-kernel NFS daemon.
2011-04-29 12:36:13 -07:00
Darik Horn
492b8e9e7b Use gethostid in the Linux convention.
Disable the gethostid() override for Solaris behavior because Linux systems
implement the POSIX standard in a way that allows a negative result.

Mask the gethostid() result to the lower four bytes, like coreutils does in
/usr/bin/hostid, to prevent junk bits or sign-extension on systems that have an
eight byte long type. This can cause a spurious hostid mismatch that prevents
zpool import on 64-bit systems.
2011-04-25 10:36:17 -05:00
Brian Behlendorf
a1cc0b3290 Fix 32-bit MAXOFFSET_T definition
Having MAXOFFSET_T defined to 0x7fffffffl was artificially limiting
the maximum file size on 32-bit systems.  In reality MAXOFFSET_T is
used when working with 'long long' types and as such we now define
it as LLONG_MAX.  This resolves the 2GB file size limit for files
and additionally allows zvols greater than 2GB on 32-bit systems.

Closes #136
Closes #81
2011-04-22 16:21:26 -07:00
Richard Laager
826ab7ad19 Support IEC base-2 prefixes
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2011-04-19 12:56:21 -07:00
Brian Behlendorf
12c1acde76 Set -Wno-unused-but-set-variable globally
As of gcc-4.6 the option -Wunused-but-set-variable is enabled by
default.  While this is a useful warning there are numerous places
in the ZFS code when a variable is set and then only checked in an
ASSERT().  To avoid having to update every instance of this in the
code we now set -Wno-unused-but-set-variable to suppress the warning.

Additionally, when building with --enable-debug and -Werror set these
warning also become fatal.  We can reevaluate the suppression of these
error at a later time if it becomes an issue.  For now we are basically
just reverting to the previous gcc behavior.
2011-04-19 10:44:10 -07:00
Brian Behlendorf
bdf4328b04 Linux 2.6.28 compat, insert_inode_locked()
Added insert_inode_locked() helper function, prior to this most callers
used insert_inode_hash().  The older method doesn't check for collisions
in the inode_hashtable but it still acceptible for use.  Fallback to
using insert_inode_hash() when insert_inode_locked() is unavailable.
2011-03-22 12:15:54 -07:00
Brian Behlendorf
1073d746d6 Linux compat, umount2(2) flags
Older glibc <sys/mount.h> headers did not define all the available
umount2(2) flags.  Both MNT_FORCE and MNT_DETACH are supported in the
kernel back to 2.4.11 so we define them correctly if they are missing.

Closes #95
2011-03-22 12:15:54 -07:00
Brian Behlendorf
f47c42e214 Merge branch 'dracut' 2011-03-22 12:13:04 -07:00
Brian Behlendorf
716895b161 Fix 'LDFLAGS=-Wl,--as-needed' build error
Compiling with 'LDFLAGS=-Wl,--as-needed' exposed the fact that
there were some library linking problems introduced by mount_zfs.
In particular, the libzfs library does use nvpair symbols, and
mount_zfs contains no dependencies on libzpool.

Closes #161
Closes #162
2011-03-18 14:47:19 -07:00
Brian Behlendorf
01c0e61da0 Add init scripts
To support automatically mounting your zfs on filesystem on boot
a basic init script is needed.  Unfortunately, every distribution
has their own idea of the _right_ way to do things.  Rather than
write one very complicated portable init script, which would be
invariably replaced by the distributions own anyway.  I have
instead added support to provide multiple distribution specific
init scripts.

The correct init script for your distribution will be selected
by ZFS_AC_DEFAULT_PACKAGE which will set DEFAULT_INIT_SCRIPT.
During 'make install' the correct script for your system will
be installed from zfs/etc/init.d/zfs.DEFAULT_INIT_SCRIPT to the
usual /etc/init.d/zfs location.

Currently, there is zfs.fedora and a more generic zfs.lsb init
script.  Hopefully, the distribution maintainers who know best
how they want their init scripts to function will feedback their
approved versions to be included in the project.

This change does not consider upstart jobs but I'm not at all
opposed to add that sort of thing.
2011-03-17 16:51:54 -07:00
Brian Behlendorf
9ac97c2a93 Print mount/umount errors
Because we are dependent of the system mount/umount utilities to
ensure correct mtab locking, we should not suppress their error
output.  During a successful mount/umount they will be silent,
but during a failure the error message they print is the only sure
way to know why a mount failed.  This is because the (u)mount(8)
return code does not contain the result of the system call issued.
The only way to clearly idenify why thing failed is to rely on
the error message printed by the tool.

Longer term once libmount is available we can issue the mount/umount
system calls within the tool and still be ensured correct mtab locking.

Closed #107
2011-03-09 15:26:48 -08:00
Brian Behlendorf
d53368f675 Fix mount helper
Several issues related to strange mount/umount behavior were reported
and this commit should address most of them.  The original idea was
to put in place a zfs mount helper (mount.zfs).  This helper is used
to enforce 'legacy' mount behavior, and perform any extra mount argument
processing (selinux, zfsutil, etc).  This helper wasn't ready for the
0.6.0-rc1 release but with this change it's functional but needs to
extensively tested.

This change addresses the following open issues.
Closes #101
Closes #107
Closes #113
Closes #115
Closes #119
2011-03-09 15:26:48 -08:00
Brian Behlendorf
5075c7ea69 Add missing libspl+libzpool libs to libzfs
The libspl and libzpool libraries were missing from the libzfs
Makefile.am.  They should be explicitly listed to avoid build
issues when compiling static libraries and binaries.

Additionally, ensure libzpool is built before libzfs because
libzfs is dependent on libzpool.  This was also exposed as an
issue when forcing static linking.
2011-03-03 15:48:57 -08:00
Brian Behlendorf
45066d1f20 Linux 2.6.38 compat, blkdev_get_by_path()
The open_bdev_exclusive() function has been replaced (again) by the
more generic blkdev_get_by_path() function.  Additionally, the
counterpart function close_bdev_exclusive() has been replaced by
blkdev_put().  Because these functions are more generic versions
of the functions they replaced the compatibility macro must add
the FMODE_EXCL mask to ensure they are exclusive.

Closes #114
2011-02-23 12:29:38 -08:00
Brian Behlendorf
f03e41e8da Improve 'zpool import' safety
There are three improvements here to 'zpool import' proposed by Fajar
in Github issue #98.  They are all good so I'm commiting all three.

1) Add descriptions for "hpet" and "core" blacklist entries.

2) Add "core" to the blacklist, as described in the issue accessing
this device will crash Xen dom0.

3) Refine probing behavior to use fstatat64().  This allows us to
determine if a device is a block device or a regular file without
having to open it.  This is the safest appraoch when probing /dev/
because the simple act of opening a device may have unexpected
consequences.

Closes #98
2011-02-17 09:35:43 -08:00
Brian Behlendorf
07bd86718b Suppress share error on mount
Until code is added to support automatically sharing datasets
we should return success instead of failure.  This prevents the
command line tools from returning a non-zero error code.  While
a user likely won't notice this, test scripts like zconfig.sh
do and correctly fail because of it.
2011-02-16 11:05:55 -08:00
Brian Behlendorf
2c395def27 Linux 2.6.36 compat, sops->evict_inode()
The new prefered inteface for evicting an inode from the inode cache
is the ->evict_inode() callback.  It replaces both the ->delete_inode()
and ->clear_inode() callbacks which were previously used for this.
2011-02-11 13:47:51 -08:00