Only under Ubuntu Lucid the rpm packaging step mistakenly adds
the following files twice to the package because of the /lib
naming convention. This is harmless but results in a warning
which the buildot flags as a failure. Suppress this warning.
warning: File listed twice: /lib/udev/rules.d
warning: File listed twice: /lib/udev/rules.d/60-zpool.rules
warning: File listed twice: /lib/udev/rules.d/60-zvol.rules
warning: File listed twice: /lib/udev/rules.d/90-zfs.rules
warning: File listed twice: /lib/udev/sas_switch_id
warning: File listed twice: /lib/udev/zpool_id
warning: File listed twice: /lib/udev/zvol_id
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Update the code to use the bdi_setup_and_register() helper to
simplify the bdi integration code. The updated code now just
registers the bdi during mount and destroys it during unmount.
The only complication is that for 2.6.32 - 2.6.33 kernels the
helper wasn't available so in these cases the zfs code must
provide it. Luckily the bdi_setup_and_register() function
is trivial.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#367
Fix an unlikely failure cause in zfs_sb_create() which could
leave the dataset owned on error and thus unavailable until
after a reboot. Disown the dataset if SA are expected but
are in fact missing.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Profiling the system during meta data intensive workloads such
as creating/removing millions of files, revealed that the system
was cpu bound. A large fraction of that cpu time was being spent
waiting on the virtual address space spin lock.
It turns out this was caused by certain heavily used kmem_caches
being backed by virtual memory. By default a kmem_cache will
dynamically determine the type of memory used based on the object
size. For large objects virtual memory is usually preferable
and for small object physical memory is a better choice. See
the spl_slab_alloc() function for a longer discussion on this.
However, there is a certain amount of gray area when defining a
'large' object. For the following caches it turns out they were
just over the line:
* dnode_cache
* zio_cache
* zio_link_cache
* zio_buf_512_cache
* zfs_data_buf_512_cache
Now because we know there will be a lot of churn in these caches,
and because we know the slabs will still be reasonably sized.
We can safely request with the KMC_KMEM flag that the caches be
backed with physical memory addresses. This entirely avoids the
need to serialize on the virtual address space lock.
As a bonus this also reduces our vmalloc usage which will be good
for 32-bit kernels which have a very small virtual address space.
It will also probably be good for interactive performance since
unrelated processes could also block of this same global lock.
Finally, we may see less cpu time being burned in the arc_reclaim
and txg_sync_threads.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #258
Be careful not to unconditionally clear the PF_MEMALLOC bit in
the task structure. It may have already been set when entering
zpl_putpage() in which case it must remain set on exit. In
particular the kswapd thread will have PF_MEMALLOC set in
order to prevent it from entering direct reclaim. By clearing
it we allow the following NULL deref to potentially occur.
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<ffffffff8109c7ab>] balance_pgdat+0x25b/0x4ff
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #287
zfs_getattr_fast() was missing a lock on the ZFS superblock which
could result in zfs_znode_dmu_fini() clearing the zp->z_sa_hdl member
while zfs_getattr_fast() was accessing the znode. The result of this
would usually be a panic.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Fixes#431
Be careful not to unconditionally clear the PF_MEMALLOC bit in
the task structure. It may have already been set when entering
kv_alloc() in which case it must remain set on exit. In
particular the kswapd thread will have PF_MEMALLOC set in
order to prevent it from entering direct reclaim. By clearing
it we allow the following NULL deref to potentially occur.
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<ffffffff8109c7ab>] balance_pgdat+0x25b/0x4ff
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes ZFS issue #287
When calculating space needed for SA_BONUS buffers, hdrsize is
always rounded up to next 8-aligned boundary. However, in two places
the round up was done against sum of 'total' plus hdrsize. On the
other hand, hdrsize increments by 4 each time, which means in certain
conditions, we would end up returning with will_spill == 0 and
(total + hdrsize) larger than full_space, leading to a failed
assertion because it's invalid for dmu_set_bonus.
Reviewed by: Matthew Ahrens <matt@delphix.com>
Reviewed by: Dan McDonald <danmcd@nexenta.com>
Approved by: Gordon Ross <gwr@nexenta.com>
References to Illumos issue:
https://www.illumos.org/issues/1661
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#426
ZFS contains error messages that point to the defunct www.sun.com
domain, which is currently offline. Change these error messages
to use the zfsonlinux.org mirror instead.
This commit depends on:
zfsonlinux/zfsonlinux.github.com@8e10ead3dc
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
The old define assumed a specific layout of the kmutex_t struct. This
patch makes the macro independent from the actual struct layout.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
The m_owner variable is protected by the mutex itself. Reading the variable
is guaranteed to be atomic (due to it being a word-sized reference) and
ACCESS_ONCE() takes care of read cache effects.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
On kernels with CONFIG_DEBUG_MUTEXES mutex_exit() clears the mutex
owner after releasing the mutex. This would cause mutex_owner()
to return an incorrect owner if another thread managed to lock the
mutex before mutex_exit() had a chance to clear the owner.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes ZFS issue #167
This would cause problems when using 'zfs send' with a file as the
target (rather than a pipe or a socket as is usually the case) as
for each write the destination offset in the file would be 0.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes ZFS issue #391
Register the setattr/getattr callbacks for symlinks. Without these
the generic inode_setattr() and generic_fillattr() functions will
be used. In the setattr case this will only result in the inode being
updated in memory, the dirty_inode callback would also normally run
but none is registered for zfs.
The straight forward fix is to set the setattr/getattr callbacks
for symlinks so they are handled just like files and directories.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#412
An incomplete guid_to_ds_map would cause restore_write_byref() to fail
while receiving a de-duplicated backup stream.
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Garrett D`Amore <garrett@nexenta.com>
Reviewed by: Gordon Ross <gwr@nexenta.com>
Approved by: Gordon Ross <gwr@nexenta.com>
References to Illumos issue and patch:
- https://www.illumos.org/issues/755
- https://github.com/illumos/illumos-gate/commit/ec5cf9d53a
Signed-off-by: Gunnar Beutner <gunnar@beutner.name>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#372
zfs.spec.in and zfs-modules.spec.in had the License field incorrectly
set to @LICENSE@, causing generated rpm packages to report an invalid
license string. Fix this by using @ZFS_META_LICENSE@.
Signed-off-by: Ned Bass <bass6@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#422
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
The URL field in the spl-modules and spl package spec files were
updated to point to the ZFS on Linux repository hosted by github.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
When running the zconfig.sh, zpios-sanity.sh, and zfault.sh
from the installed packages the 90-zfs.rules can cause failures.
These will occur because the test suite assumes it has full
control over loading/unloading the module stack. If the stack
gets asynchronously loaded by the udev rule the test suite
will treat it as a failure. Resolve the issue by disabling
the offending rule during the tests and enabling it on exit.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Export all symbols already marked extern in the zfs_vfsops.h
header. Several non-static symbols have also been added to
the header and exportewd. This allows external modules to
more easily create and manipulate properly created ZFS
filesystem type datasets.
Rename zfsvfs_teardown() to zfs_sb_teardown and export it.
This is done simply for consistency with the rest of the code
base. All other zfsvfs_* functions have already been renamed.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
The typo did not have any effect (apart from a negligible performance
impact) because skc->skc_flags * KMC_OFFSLAB is always non-null when
at least one bit in skc->skc_flags is set.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
In a non-debug build the ASSERT() would be optimized away
which could cause pending work items to not be cancelled.
We must also use cancel_delayed_work_sync() rather than just
cancel_delayed_work() to actually wait until work items have
completed. Otherwise they might accidentally access free'd
memory.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes ZFS bugs #279, #62, #363, #418
File descriptors are a per-process resource. The same descriptor
in different processes can refer to different files. find_file()
incorrectly assumed that file descriptors are globally unique.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes ZFS issue #386
The Lustre packages satify their backend fs requirement by
checking that lustre-backend-fs is provided. Update the zfs
packaging accordingly.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
No longer print the following warning to the console when the
/etc/hostid file is missing. This is the expected default behavior.
Keeping the hostid in sync with the initramfs is now accomplished
by creating the /etc/hostid in the initramfs not on the system.
SPL: The /etc/hostid file is not found.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Export all the symbols for the system attribute (SA) API. This
allows external module to cleanly manipulate the SAs associated
with a dnode. Documention for the SA API can be found in the
module/zfs/sa.c source.
This change also removes the zfs_sa_uprade_pre, and
zfs_sa_uprade_post prototypes. The functions themselves were
dropped some time ago.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Relying on an /etc/hostid file which is installed in the system
image breaks diskless systems which share an image. Certain
cluster infrastructure such as MPI relies on all nodes having
a unique hostid. However, we still must be careful to ensure
the hostid is syncronized between the initramfs and system
images when using zfs root filesystems.
To accompish this the automatically created /etc/hostid file has
been removed from the spl rpm packaging. The /etc/hostid file
is now dynamically created for your initramfs as part of the
dracut install process. This avoids the need to install it in
the actual system images.
This change also resolves the spl_hostid parameter handling
for dracut.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#398Closes#399
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Creating an /etc/hostid file as part of the rpm post install
causes problems for diskless systems which are sharing an image.
While it's still critical to ensure the hostid doesn't change
for zfs root filesystems. This will now be done by setting
the /etc/hostid in the initramfs created by dracut.
This reverts commit 79593b0dec.
The == operator is specific to bash, replace it with the more
correct = operator for sh. This bug can prevent correct booting
when using a zfs root pool.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#416
Due to the confusion in Linux statfs between f_frsize and f_bsize
the blocks counts were changed to be in units of z_max_blksize
instead of SPA_MINBLOCKSIZE as it is on other platforms.
However, the free files calculation in zfs_statvfs() is limited by
the free blocks count, since each dnode consumes one block/sector.
This provided a reasonable estimate of free inodes, but on Linux
this meant that the free inodes count was underestimated by a large
amount, since 256 512-byte dnodes can fit into a 128kB block, and
more if the max blocksize is increased to 1MB or larger.
Also, the use of SPA_MINBLOCKSIZE is semantically incorrect since
DNODE_SIZE may change to a value other than SPA_MINBLOCKSIZE and
may even change per dataset, and devices with large sectors setting
ashift will also use a larger blocksize.
Correct the f_ffree calculation to use (availbytes >> DNODE_SHIFT)
to more accurately compute the maximum number of dnodes that can
be created.
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#413Closes#400
When compiling under Debian Lenny with gcc version 4.3.2
(Debian 4.3.2-1.1) the following warning occurs. To quiet
the warning initialize 'error' to zero. Newer versions of
gcc correctly determine that this uninitialized varible is
impossible because ZFS_NUM_USERQUOTA_PROPS is known to be
greater than zero.
cmd/zfs/zfs_main.c:2377: warning: "error" may be
used uninitialized in this function
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Export all the symbols for the ZAP API. This allows external modules
to cleanly interface with ZAP type objects. Previously only a subset
of the functionality was exposed. Documention for the ZAP API can be
found in the sys/zap.h header.
This change also removes a duplicate zap_increment_int() prototype.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
GPT's created by libefi set the HeaderSize attribute in the GPT
header to 512 -- size of the GPT header INCLUDING the 420 padding
bytes at the end. Most other tools set the size to 92 -- size of
the actual header itself excluding the padding. Most tools check
the recorded HeaderSize when verifying CRC, but gptfdisk hardcodes
92 and thus reports CRC verification problems for full-disk vdevs
created IE with `zpool create pool sdc`.
This commit changes libefi's behavior for GPT creation and also
fixes several edge cases where libefi's behavior was similar
(though in an incompatible manner) to gptfdisk. Libefi assumed
HeaderSize was always 512 even if the GPT recorded a different
value. Sanity checks of the GPT headersize read from disk were
added before applying checksum calculation -- this will prevent
segfault in cases of bogus on-disk values.
Zpools created with the resuling libefi are verified as correct
both by parted and gptfdisk. Also pool have been tested to
import correctly on ZFS on Linux as well as Solaris Express 11
livecd.
Signed-off-by: Zachary Bedell <zac@thebedells.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#344
mount-zfs.sh script incorrectly parsed results from zpool list. Correct
bootfs attribute was only found on systems with a single pool or where
the bootable pool's name alphabetized to before all other pool names.
Boot failed when the bootable pool's name came after other pools
(IE 'rpool' and 'mypool' would fail to find bootfs on rpool.)
Patch correctly discards pools whose bootfs attribute is blank ('-').
Signed-off-by: Zachary Bedell <zac@thebedells.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#402
As written, the $(init_SCRIPTS) rule in etc/init.d/Makefule.am
would not work as expected if the init_SCRIPTS variable were
to contain any elements other than zfs. Fix this by replacing
the hard-coded 'zfs' reference with $@.
Signed-off-by: Ned Bass <bass6@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#410
Older versions of gcc (gcc-4.1.2) will treat an 'incompatible
pointer type' as a warning instead of an error. This results
in HAVE_FS_STRUCT_SPINLOCK being defined incorrectly. This
failure mode was observed when using a RHEL6 2.6.32 based kernel
under RHEL5.5 which contains the old version of gcc. To resolve
the issue the warning is explicitly promoted to an error.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Suppress the warning for this large kmem_alloc() because it is not
that far over the warning threshhold (8k) and it is short lived.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
The zfs-devel header files for linking with the libspl/libzfs
libraries should be installed under /usr/include not /include.
Ensure the correct install location is used when building an
rpm package.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Caught by code inspection, the variable zsb was referenced after
being freed. Move the kmem_free() to the end of the function.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
It seems that dracut version 009 through 013 won't boot correctly when
the zfs-dracut rpm package has been installed, but 'root=zfs' isn't
used on the boot commandline, for example when the package has been
installed on a system that _doesn't_ boot from a zfs filesystem.
Signed-off-by: Jeremy Gill <jgill@parallax-innovations.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#377
The zfs.spec.in file had the license field hard coded to specify the
CDDL. This was changed to use the @LICENSE@ variable, maintaining
consistency with the zfs-modules.spec.in file.
Signed-off-by: Prakash Surya <surya1@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
The URL field in the zfs-modules and zfs package spec files were
updated to point to the ZFS on Linux repository hosted by github.
Signed-off-by: Prakash Surya <surya1@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
The 'if' statements found in kernel.m4 were converted to use the
portable alternative provided by autoconf, the AS_IF macro.
Signed-off-by: Prakash Surya <surya1@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
A few of the autoconf error messages were inconsistent with the rest of
the build system. To be specific, the inconsistencies addressed by this
commit are the following:
* The second line of the error message for the CONFIG_PREEMPT check
was missing it's third asterisk.
* A few of the error messages were prefixed by two tabs, whereas the
majority of error messages are only prefixed by a single tab.
Signed-off-by: Prakash Surya <surya1@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
This regression was accidentally introduced by commit aa2b489.
I was attempting to simplify the init scripts and accidentally
confused the /etc/init.d and /etc/zfs paths. This change reverts
the init script modifications.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#370