When calling mount(), care must be taken to avoid passing in flags
that are used only by the user space utilities. Otherwise we may
stomp on flags that are reserved for other purposes in the kernel.
In particular, openSUSE 12.3 kernels have added a new MS_RICHACL
super-block flag whose value conflicts with our MS_COMMENT flag. This
causes incorrect behavior such as the umask being ignored. The
MS_COMMENT flag essentially serves as a placeholder in the option_map
data structure of zfs_mount.c, but its value is never used. Therefore
we can avoid the conflict by defining it to 0.
The MS_USERS, MS_OWNER, and MS_GROUP flags also conflict with reserved
flags in the kernel. While this is not known to have caused any
problems, it is nevertheless incorrect. For the purposes of the
mount.zfs helper, the "users", "owner", and "group" options just serve
as hints to set additional implied options. Therefore we now define
their associated mount flags in terms of the options that they imply
rather than giving them unique values.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#1457
Adds zpool split documentation to the zpool man page. I only documented
the options that I could get to work. While it is documented on some
sun blogs that devices can be specified for split, I was not able to get
that to work during my testing.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#1456
The Illumos code introduced the ASSERT0 and VERIFY0 macros which
are to be used instead of ASSERT3S(x, ==, 0) and VERIFY3S(x, ==, 0).
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Madhav Suresh <madhav.suresh@delphix.com>
Closes#246
Re-order initialization in spl_kmem_init to allow for kmem tracing
to work. The spl_kmem_init function calls taskq_create prior to
initializing the tracking (calling spl_kmem_init_tracking). Since
taskq_create uses kmem_alloc, NULL dereferences occur because the
global kmem_list hasn't had its next & prev pointers initialized yet.
This commit moves the calls to spl_kmem_init_tracking earlier in the
spl_kmem_init function in order that the subsequent kmem_alloc calls
(by taskq_create) work properly.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#243
When SA xattrs are enabled only fallback to checking the directory
xattrs when the name is not found as a SA xattr. Otherwise, the SA
error which should be returned to the caller is overwritten by the
directory xattr errors. Positive return values indicating success
will also be immediately returned.
In the case of #1437 the ERANGE error was being correctly returned
by zpl_xattr_get_sa() only to be overridden with ENOENT which was
returned by the subsequent unnessisary call to zpl_xattr_get_dir().
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#1437
As a part of scrub/resilver tuning zfs_scrub_limit fell out of use,
but the definition of the variable remained in place.
Moreover various guides still (misleadingly) mention it as a way
to influence resilver/scrub behavior.
This commit removes its finally.
Signed-off-by: Cyril Plisko <cyril.plisko@mountall.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#1444
The assertions in ddt_phys_decref and ddt_sync_entry cast ddp->ddp_refcnt
from uint64_t to int64_t, with a reference count bigger than 2^63, e.g. the
reference count of zero blocks commonly available in spare files, we may
mistakenly hit these assertations, so drop the type conversions here.
Signed-off-by: Ying Zhu <casualfisher@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#1436
The vn_rdwr() function performs I/O by calling the vfs_write() or
vfs_read() functions. These functions reside just below the system
call layer and the expectation is they have almost the entire 8k of
stack space to work with. In fact, certain layered configurations
such as ext+lvm+md+multipath require the majority of this stack to
avoid stack overflows.
To avoid this posibility the vn_rdwr() call in dump_bytes() has been
moved to the ZIO_TYPE_FREE, taskq. This ensures that all I/O will be
performed with the majority of the stack space available. This ends
up being very similiar to as if the I/O were issued via sys_write()
or sys_read().
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#1399Closes#1423
3581 spa_zio_taskq[ZIO_TYPE_FREE][ZIO_TASKQ_ISSUE]->tq_lock is piping hot
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: Gordon Ross <gordon.ross@nexenta.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
References:
illumos/illumos-gate@ec94d32https://illumos.org/issues/3581
Notes for Linux port:
Earlier commit 08d08eb reduced contention on this taskq lock by simply
reducing the number of z_fr_iss threads from 100 to one-per-CPU. We
also optimized the taskq implementation in zfsonlinux/spl@3c6ed54.
These changes significantly improved unlink performance to acceptable
levels.
This patch further reduces time spent spinning on this lock by
randomly dispatching the work items over multiple independent task
queues. The Illumos ZFS developers stated that this lock contention
only arose after "3329 spa_sync() spends 10-20% of its time in
spa_free_sync_cb()" was landed. It's not clear if 3329 affects the
Linux port or not. I didn't see spa_free_sync_cb() show up in
oprofile sessions while unlinking large files, but I may just not
have used the right test case.
I tested unlinking a 1 TB of data with and without the patch and
didn't observe a meaningful difference in elapsed time. However,
oprofile showed that the percent time spent in taskq_thread() was
reduced from about 16% to about 5%. Aside from a possible slight
performance benefit this may be worth landing if only for the sake of
maintaining consistency with upstream.
Ported-by: Ned Bass <bass6@llnl.gov>
Closes#1327
3329 spa_sync() spends 10-20% of its time in spa_free_sync_cb()
3330 space_seg_t should have its own kmem_cache
3331 deferred frees should happen after sync_pass 1
3335 make SYNC_PASS_* constants tunable
Reviewed by: Adam Leventhal <ahl@delphix.com>
Reviewed by: Matt Ahrens <matthew.ahrens@delphix.com>
Reviewed by: Christopher Siden <chris.siden@delphix.com>
Reviewed by: Eric Schrock <eric.schrock@delphix.com>
Reviewed by: Richard Lowe <richlowe@richlowe.net>
Reviewed by: Dan McDonald <danmcd@nexenta.com>
Approved by: Eric Schrock <eric.schrock@delphix.com>
References:
illumos/illumos-gate@01f55e48fbhttps://www.illumos.org/issues/3329https://www.illumos.org/issues/3330https://www.illumos.org/issues/3331https://www.illumos.org/issues/3335
Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>
3306 zdb should be able to issue reads in parallel
3321 'zpool reopen' command should be documented in the man
page and help
Reviewed by: Adam Leventhal <ahl@delphix.com>
Reviewed by: Matt Ahrens <matthew.ahrens@delphix.com>
Reviewed by: Christopher Siden <chris.siden@delphix.com>
Approved by: Garrett D'Amore <garrett@damore.org>
References:
illumos/illumos-gate@31d7e8fa33https://www.illumos.org/issues/3306https://www.illumos.org/issues/3321
The vdev_file.c implementation in this patch diverges significantly
from the upstream version. For consistenty with the vdev_disk.c
code the upstream version leverages the Illumos bio interfaces.
This makes sense for Illumos but not for ZoL for two reasons.
1) The vdev_disk.c code in ZoL has been rewritten to use the
Linux block device interfaces which differ significantly
from those in Illumos. Therefore, updating the vdev_file.c
to use the Illumos interfaces doesn't get you consistency
with vdev_disk.c.
2) Using the upstream patch as is would requiring implementing
compatibility code for those Solaris block device interfaces
in user and kernel space. That additional complexity could
lead to confusion and doesn't buy us anything.
For these reasons I've opted to simply move the existing vn_rdwr()
as is in to the taskq function. This has the advantage of being
low risk and easy to understand. Moving the vn_rdwr() function
in to its own taskq thread also neatly avoids the possibility of
a stack overflow.
Finally, because of the additional work which is being handled by
the free taskq the number of threads has been increased. The
thread count under Illumos defaults to 100 but was decreased to 2
in commit 08d08e due to contention. We increase it to 8 until
the contention can be address by porting Illumos #3581.
Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#1354
The existing taskq_wait_id() function can incorrectly block
indefinitely. Reimplement it more simply using wait_event()
in a similar fashion to taskq_wait_all().
This flaw was uncovered in the context of moving vn_rdwr() to
a taskq. Previously taskq_wait_id() had no consumers outside
the SPLAT task framework which is why the issue went unnoticed.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
The previous code was only waiting for the symver file. But the
postinst target of the DKMS script for SPL will not only create
the symvers file, but also the header spl_config.h.
If we are waiting in the configure script of ZFS for the SPL
symvers file, then we also need to wait for spl_config.h.
Otherwise the configure script will abort because the spl_config.h
is not yet available.
On top of that, the function ZFS_AC_SPL_MODULE_SYMVERS is moved
to the end of the function ZFS_AC_SPL to allow both checks share
the with-spl-timeout parameter.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#1431
Recent changes have caused older versions of gcc to mistakenly
flag 'old_umask' in vn_open() as an unitialized variable. To
silence the warning initialize it.
kernel.c: In function 'vn_open':
kernel.c:525:6: error: 'old_umask' may be used uninitialized
in this function
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
The zfs_fd must be opened before calling print_all_handlers() or
the ioctl() cannot be used to the zfs control device. This brings
the zinject code back in sync with the Illumos implementation.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Matt Ahrens <matthew.ahrens@delphix.com>
Reviewed by: Eric Schrock <eric.schrock@delphix.com>
Reviewed by: Christopher Siden <chris.siden@delphix.com>
Approved by: Garrett D'Amore <garrett@damore.org>
NOTES: This patch has been reworked from the original in the
following ways to accomidate Linux ZFS implementation
*) Usage of the cyclic interface was replaced by the delayed taskq
interface. This avoids the need to implement new compatibility
code and allows us to rely on the existing taskq implementation.
*) An extern for zfs_txg_synctime_ms was added to sys/dsl_pool.h
because declaring externs in source files as was done in the
original patch is just plain wrong.
*) Instead of panicing the system when the deadman triggers a
zevent describing the blocked vdev and the first pending I/O
is posted. If the panic behavior is desired Linux provides
other generic methods to panic the system when threads are
observed to hang.
*) For reference, to delay zios by 30 seconds for testing you can
use zinject as follows: 'zinject -d <vdev> -D30 <pool>'
References:
illumos/illumos-gate@283b84606bhttps://www.illumos.org/issues/3246
Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#1396
Somewhat amazingly it went unnoticed that the delay() function
doesn't actually cause the task to block. Since the task state
is never changed from TASK_RUNNING before schedule_timeout() the
scheduler allows to task to continue running without any delay.
Using schedule_timeout_interruptible() resolves the issue by
correctly setting TASK_UNINTERRUPTIBLE.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Add wrappers for the Solaris MSEC_TO_TICK, USEC_TO_TICK, and
NSEC_TO_TICK conversion functions. They are mapped directly to
their Linux counterparts with the exception of NSEC_TO_TICK
can cannot use usecs_to_jiffies() because it is not exported
by the kernel.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
A deadlock was accidentally introduced by commit e95853a which
can occur when the system is under memory pressure. What happens
is that while the txg_quiesce thread is holding the tx->tx_cpu
locks it enters memory reclaim. In the context of this memory
reclaim it then issues synchronous I/O to a ZVOL swap device.
Because the txg_quiesce thread is holding the tx->tx_cpu locks
a new txg cannot be opened to handle the I/O. Deadlock.
The fix is straight forward. Move the memory allocation outside
the critical region where the tx->tx_cpu locks are held. And for
good measure change the offending allocation to KM_PUSHPAGE to
ensure it never attempts to issue I/O during reclaim.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #1274
These are build products and should be ignored.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Turbo Fredriksson <turbo@bayour.com>
Issue #1402
When the kmod packaging was introduced the ability to pass the
--enable-debug and --enable-dmu-tx options from configure all
the way through to `make rpm|deb` was accidenally lost. Update
ZFS_AC_RPM to explicitlu set RPM_DEFINE_COMMON with these
rpmbuild defines.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #1402
Preserve the release field when creating Debian packages. The
--keep-version option was not used because it results in a failure
when the git '<commit>_<hash>' syntax is used for the release.
The '_' is a valid character for RPM packages but not for DEBs.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Turbo Fredriksson <turbo@bayour.com>
Issue #1402
Issue #928
When building a custom release in a git tree provide the ability
to prevent the release field from being overwritten by the
`git describe` output.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #1402
There are a number of issues with the generic kmod RPM spec in its
current state:
- The "%{__id_u}" macro seems to not be available on some systems (e.g.
Debian squeeze). It appears it has been deprecated. Use "${__id} -u"
instead.
- The way the "--with-linux=" configure option is generated in the
non-RHEL/Fedora case is completely wrong with various newline and
escaping issues (also, $kernel_version is not available in the
generator context).
The second issue made the generator shell snippet (almost) silently
fail, which under specific circumstances can result in broken builds
against the wrong kernel sources.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#1416
These are build products and should be ignored.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Turbo Fredriksson <turbo@bayour.com>
Issue zfsonlinux/zfs#1402
Preserve the release field when creating Debian packages. The
--keep-version option was not used because it results in a failure
when the git '<commit>_<hash>' syntax is used for the release.
The '_' is a valid character for RPM packages but not for DEBs.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Turbo Fredriksson <turbo@bayour.com>
Issue zfsonlinux/zfs#1402
Issue zfsonlinux/zfs#928
When building a custom release in a git tree provide the ability
to prevent the release field from being overwritten by the
`git describe` output.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue zfsonlinux/zfs#1402
There are a number of issues with the generic kmod RPM spec in its
current state:
- The "%{__id_u}" macro seems to not be available on some systems (e.g.
Debian squeeze). It appears it has been deprecated. Use "${__id} -u"
instead.
- The way the "--with-linux=" configure option is generated in the
non-RHEL/Fedora case is completely wrong with various newline and
escaping issues (also, $kernel_version is not available in the
generator context).
The second issue made the generator shell snippet (almost) silently
fail, which under specific circumstances can result in broken builds
against the wrong kernel sources.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#231
According to the getxattr(2) man page the ERANGE errno should be
returned when the size of the value buffer is to small to hold the
result. Prior to this patch the implementation would just truncate
the value to size bytes.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#1408
The zpl_readdir() function shouldn't be registered as part of
the zpl_file_operations table, it must only be part of the
zpl_dir_file_operations table. By removing this callback
the VFS will now correctly return ENOTDIR when calling
getdents() on a file.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#1404
Previous patches have allowed you to set an increased ashift to
avoid doing 512b IO with 4k sector devices. However, it was not
possible to set the ashift lower than the reported physical sector
size even when a smaller logical size was supported. In practice,
there are several cases where settong a lower ashift is useful:
* Most modern drives now correctly report their physical sector
size as 4k. This causes zfs to correctly default to using a 4k
sector size (ashift=12). However, for some usage models this
new default ashift value causes an unacceptable increase in
space usage. Filesystems with many small files may see the
total available space reduced to 30-40% which is unacceptable.
* When replacing a drive in an existing pool which was created
with ashift=9 a modern 4k sector drive cannot be used. The
'zpool replace' command will issue an error that the new drive
has an 'incompatible sector alignment'. However, by allowing
the ashift to be manual specified as smaller, non-optimal,
value the device may still be safely used.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#1381Closes#1328
Issue #967
Issue #548
3422 zpool create/syseventd race yield non-importable pool
3425 first write to a new zvol can fail with EFBIG
Reviewed by: Adam Leventhal <ahl@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Garrett D'Amore <garrett@damore.org>
References:
illumos/illumos-gate@bda8819455https://www.illumos.org/issues/3422https://www.illumos.org/issues/3425
Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#1390
For the DKMS package to successfully build the kernel-devel
headers must be included along gcc, make, and perl. The SPL
code never directly invokes perl but the kernel build system
depends on it.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue zfsonlinux/zfs#1380
For the DKMS package to successfully build the kernel-devel
headers must be included along gcc, make, and perl. The ZFS
code never directly invokes perl but the kernel build system
depends on it.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#1380
The only remaining perl dependency is part of the ZFS_AC_META macro.
By eliminating this and replacing it with awk we can avoid the need
to pull in perl to rebuild the packages.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #1380
Commit f6fb7651a0 introduced the idea
of working builds which work correctly. However, because the zfs-kmod
depends on a specific 'spl-devel-kmod = {version}-%{release}' package
and the release component is unique the dependency is never satisfied.
This requires line was introduced to ensure the correct version of the
spl is always used. In this context only the version number is required
so the release component has been dropped to satisfy the dependency.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Part of the automated testing involves building the source on Debian Lenny
which ships an ancient version of automake (1.10.1). Historically, this
has caused a non-fatal warning about AM_SILENT_RULES not being defined.
But when the autogen.sh script was updated to use autoreconf the warning
became fatal.
configure.ac:31: warning: macro `AM_SILENT_RULES' not found in library
autoreconf: running: /usr/bin/autoconf --force
configure.ac:34: error: possibly undefined macro: AM_SILENT_RULES
If this token and others are legitimate, please use m4_pattern_allow.
To resolve this build issue the call to AM_SILENT_RULES has been wrapped
by m4_ifdef(). This prevents the macro from being expanded on platforms
where it's undefined.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
The only remaining perl dependency is part of the SPL_AC_META macro.
By eliminating this and replacing it with awk we can avoid the need
to pull in perl to rebuild the packages.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue zfsonlinux/zfs#1380
Part of the automated testing involves building the source on Debian Lenny
which ships an ancient version of automake (1.10.1). Historically, this
has caused a non-fatal warning about AM_SILENT_RULES not being defined.
But when the autogen.sh script was updated to use autoreconf the warning
became fatal.
configure.ac:31: warning: macro `AM_SILENT_RULES' not found in library
autoreconf: running: /usr/bin/autoconf --force
configure.ac:34: error: possibly undefined macro: AM_SILENT_RULES
If this token and others are legitimate, please use m4_pattern_allow.
To resolve this build issue the call to AM_SILENT_RULES has been wrapped
by m4_ifdef(). This prevents the macro from being expanded on platforms
where it's undefined.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Rationale see section 3.5 "Using `autoreconf' to Update `configure'
Scripts" of the autoconf manual.
Signed-off-by: Jan Engelhardt <jengelh@inai.de>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
-D and -I are preprocessor flags, so should preferably be in the
appropriate variable.
Signed-off-by: Jan Engelhardt <jengelh@inai.de>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
-D and -I are preprocessor flags, so should preferably be in the
appropriate variable.
Signed-off-by: Jan Engelhardt <jengelh@inai.de>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
* Update manpage with more information about the ACL, guest access
and that samba needs to be able to authenticate user(s).
* Add information that 'net' can be used to modify the share after
ZFS sharing and that it will be undone with a 'zfs unshare'.
* Give an example on how to mount a SMB filesystem shared via ZFS.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#1181
Issue #1170