mirror_zfs/etc/init.d
Brian Behlendorf 3ec3bc2167 OpenZFS 7793 - ztest fails assertion in dmu_tx_willuse_space
Reviewed by: Steve Gonczi <steve.gonczi@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>

Background information: This assertion about tx_space_* verifies that we
are not dirtying more stuff than we thought we would. We “need” to know
how much we will dirty so that we can check if we should fail this
transaction with ENOSPC/EDQUOT, in dmu_tx_assign(). While the
transaction is open (i.e. between dmu_tx_assign() and dmu_tx_commit() —
typically less than a millisecond), we call dbuf_dirty() on the exact
blocks that will be modified. Once this happens, the temporary
accounting in tx_space_* is unnecessary, because we know exactly what
blocks are newly dirtied; we call dnode_willuse_space() to track this
more exact accounting.

The fundamental problem causing this bug is that dmu_tx_hold_*() relies
on the current state in the DMU (e.g. dn_nlevels) to predict how much
will be dirtied by this transaction, but this state can change before we
actually perform the transaction (i.e. call dbuf_dirty()).

This bug will be fixed by removing the assertion that the tx_space_*
accounting is perfectly accurate (i.e. we never dirty more than was
predicted by dmu_tx_hold_*()). By removing the requirement that this
accounting be perfectly accurate, we can also vastly simplify it, e.g.
removing most of the logic in dmu_tx_count_*().

The new tx space accounting will be very approximate, and may be more or
less than what is actually dirtied. It will still be used to determine
if this transaction will put us over quota. Transactions that are marked
by dmu_tx_mark_netfree() will be excepted from this check. We won’t make
an attempt to determine how much space will be freed by the transaction
— this was rarely accurate enough to determine if a transaction should
be permitted when we are over quota, which is why dmu_tx_mark_netfree()
was introduced in 2014.

We also won’t attempt to give “credit” when overwriting existing blocks,
if those blocks may be freed. This allows us to remove the
do_free_accounting logic in dbuf_dirty(), and associated routines. This
logic attempted to predict what will be on disk when this txg syncs, to
know if the overwritten block will be freed (i.e. exists, and has no
snapshots).

OpenZFS-issue: https://www.illumos.org/issues/7793
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/3704e0a
Upstream bugs: DLPX-32883a
Closes #5804 

Porting notes:
- DNODE_SIZE replaced with DNODE_MIN_SIZE in dmu_tx_count_dnode().
  Using the default dnode size would be slightly better.
- DEBUG_DMU_TX wrappers and configure option removed.
- Resolved _by_dnode() conflicts; these changes have not yet been
  applied to OpenZFS.
2017-03-07 09:51:59 -08:00

| File | Last commit | Date |
|---|---|---|
| .gitignore | Base init scripts for SYSV systems | 2015-05-28 14:14:53 -07:00 |
| Makefile.am | Set proper dependency for string replacement targets | 2016-08-02 10:28:29 -07:00 |
| README.md | Init script fixes | 2015-09-29 11:42:24 -07:00 |
| zfs-functions.in | Change /etc/mtab to /proc/self/mounts | 2016-09-20 10:07:58 -07:00 |
| zfs-import.in | Fix spelling | 2017-01-03 11:31:18 -06:00 |
| zfs-mount.in | Fix spelling | 2017-01-03 11:31:18 -06:00 |
| zfs-share.in | Add support for alpine linux | 2016-03-08 13:19:53 -08:00 |
| zfs-zed.in | Add support for alpine linux | 2016-03-08 13:19:53 -08:00 |
| zfs.in | OpenZFS 7793 - ztest fails assertion in dmu_tx_willuse_space | 2017-03-07 09:51:59 -08:00 |

DESCRIPTION

These scripts were written with the primary intention of being portable and usable on as many systems as possible.

This is, in practice, usually not possible. But the intention is there. And it is a good one.

They have been tested successfully on:

* Debian GNU/Linux Wheezy
* Debian GNU/Linux Jessie
* Ubuntu Trusty
* CentOS 6.0
* CentOS 6.6
* Gentoo

SUPPORT

If you find that they don't work for your platform, please report this at the ZFS On Linux issue tracker at https://github.com/zfsonlinux/zfs/issues.

Please include:

* Distribution name
* Distribution version
* Where to find an install CD image
* Architecture

If you have code to share that fixes the problem, that is much better. But please remember to try your best to keep portability in mind. If you suspect that what you're writing/modifying won't work on anything other than your distribution, please make sure to put that code in an appropriate if/else/fi block.
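A distribution guard of this kind might look like the following minimal sketch. It assumes the usual release marker files (/etc/debian_version, /etc/redhat-release, /etc/gentoo-release); the function name `detect_distro_family` and its optional root argument are hypothetical and are not part of the shipped scripts.

```shell
# Hypothetical helper, not part of the shipped init scripts.
# Prints a distribution family name based on well-known release files.
# The optional first argument is a filesystem root, used only for testing.
detect_distro_family() {
    local root="${1:-}"
    if [ -f "$root/etc/debian_version" ]; then
        echo "debian"
    elif [ -f "$root/etc/redhat-release" ]; then
        echo "redhat"
    elif [ -f "$root/etc/gentoo-release" ]; then
        echo "gentoo"
    else
        echo "unknown"
    fi
}

# Keep distribution-specific code inside a guarded branch so that every
# other platform falls through to the portable path.
if [ "$(detect_distro_family)" = "debian" ]; then
    : # Debian-only code goes here
else
    : # portable fallback goes here
fi
```

Checking marker files rather than parsing command output keeps the guard cheap and avoids depending on tools that may be missing early in boot.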

It currently MUST be bash (or fully compatible) for this to work.

If you're making your own distribution and you want the scripts to work on that, the biggest problem you'll (probably) have is the part at the beginning of the "zfs-functions.in" file which sets up the logging output.
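To give a rough idea of what such a logging setup involves, here is a minimal sketch of a shim that uses Debian-style `log_begin_msg`/`log_end_msg` helpers when they are defined and falls back to plain output otherwise. The `my_log_begin` and `my_log_end` names are hypothetical; this is not the code in "zfs-functions.in".

```shell
# Hypothetical logging shim; the my_* names are illustrative only.
# A real script would first source the distribution's helper library
# (for example /lib/lsb/init-functions on Debian) if it is present.

# Announce the start of an action.
my_log_begin() {
    if command -v log_begin_msg >/dev/null 2>&1; then
        log_begin_msg "$1"
    else
        printf '%s... ' "$1"
    fi
}

# Report the result of an action; $1 is its exit status.
my_log_end() {
    if command -v log_end_msg >/dev/null 2>&1; then
        log_end_msg "$1"
    elif [ "$1" -eq 0 ]; then
        echo "done"
    else
        echo "failed"
    fi
}
```

A caller would wrap each action, for example `my_log_begin "Mounting ZFS filesystems"; zfs mount -a; my_log_end $?`. On a new distribution, only the two branches that call the native helpers should need adapting.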

INSTALLING INIT SCRIPT LINKS

To set up the init script links in /etc/rc?.d manually on a Debian GNU/Linux (or derived) system, run the following commands (the order is important!):

update-rc.d zfs-import start 07 S .       stop 07 0 1 6 .
update-rc.d zfs-mount  start 02 2 3 4 5 . stop 06 0 1 6 .
update-rc.d zfs-zed    start 07 2 3 4 5 . stop 08 0 1 6 .
update-rc.d zfs-share  start 27 2 3 4 5 . stop 05 0 1 6 .

To do the same on RedHat, Fedora and/or CentOS:

chkconfig --add zfs-import
chkconfig --add zfs-mount
chkconfig --add zfs-zed
chkconfig --add zfs-share

On Gentoo:

rc-update add zfs-import boot
rc-update add zfs-mount boot
rc-update add zfs-zed default
rc-update add zfs-share default

The idea here is to make sure all of the ZFS filesystems, including possibly separate datasets like /var, are mounted before anything else is started.

Then, ZED, which depends on /var, can be started. It will consume and act on events that occurred before it started. ZED may also play a role in sharing filesystems in the future, so it is important to start it before the 'share' service.

Finally, we share filesystems configured with the share* property.
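
The start order described above (import, mount, ZED, share) can be sketched as a small helper that stops at the first failure. The function name `start_zfs_services` and the idea of passing in the start command are assumptions made for illustration; the init system normally enforces this order through the links set up earlier.

```shell
# Hypothetical helper: start the four services in dependency order and
# stop at the first failure. "$@" is the command used to start one
# service; it receives the service name as its last argument.
start_zfs_services() {
    local svc
    for svc in zfs-import zfs-mount zfs-zed zfs-share; do
        "$@" "$svc" || return 1
    done
}

# Example runner (an assumption): invoke the installed init script directly.
run_init_script() {
    "/etc/init.d/$1" start
}
```

Calling `start_zfs_services run_init_script` would start the services in order, while `start_zfs_services echo` simply prints the order without starting anything.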