It is just plain unsafe to peek inside the in-kernel
mutex structure and make assumptions about what the kernel
does with internal fields like owner.
The kernel is all too happy to stop doing the expected things,
such as tracking the lock owner, once you load a tainted module
like spl/zfs that is not GPL.
As such, you will get instant assertion failures like this:
VERIFY3(((*(volatile typeof((&((&zo->zo_lock)->m_mutex))->owner) *)&
((&((&zo->zo_lock)->m_mutex))->owner))) ==
((void *)0)) failed (ffff88030be28500 == (null))
PANIC at zfs_onexit.c:104:zfs_onexit_destroy()
Showing stack for process 3626
CPU: 0 PID: 3626 Comm: mkfs.lustre Tainted: P OE ------------ 3.10.0-debug #1
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
Call Trace:
dump_stack+0x19/0x1b
spl_dumpstack+0x44/0x50 [spl]
spl_panic+0xbf/0xf0 [spl]
zfs_onexit_destroy+0x17c/0x280 [zfs]
zfsdev_release+0x48/0xd0 [zfs]
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Chunwei Chen <david.chen@osnexus.com>
Reviewed-by: Gvozden Neskovic <neskovic@gmail.com>
Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
Closes #639
Closes #632
Prevent a race when accessing a kmutex_t that is
embedded in a reference-counted structure.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Chunwei Chen <david.chen@osnexus.com>
Signed-off-by: Gvozden Neskovic <neskovic@gmail.com>
Closes zfsonlinux/zfs#6401
Closes #637
This reverts commit d89616fda8, which
introduced some build failures that need to be resolved before
this can be merged.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #633
It is just plain unsafe to peek inside the in-kernel
mutex structure and make assumptions about what the kernel
does with internal fields like owner.
The kernel is all too happy to stop doing the expected things,
such as tracking the lock owner, once you load a tainted module
like spl/zfs that is not GPL.
As such, you will get instant assertion failures like this:
VERIFY3(((*(volatile typeof((&((&zo->zo_lock)->m_mutex))->owner) *)&
((&((&zo->zo_lock)->m_mutex))->owner))) ==
((void *)0)) failed (ffff88030be28500 == (null))
PANIC at zfs_onexit.c:104:zfs_onexit_destroy()
Showing stack for process 3626
CPU: 0 PID: 3626 Comm: mkfs.lustre Tainted: P OE ------------ 3.10.0-debug #1
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
Call Trace:
dump_stack+0x19/0x1b
spl_dumpstack+0x44/0x50 [spl]
spl_panic+0xbf/0xf0 [spl]
zfs_onexit_destroy+0x17c/0x280 [zfs]
zfsdev_release+0x48/0xd0 [zfs]
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Chunwei Chen <david.chen@osnexus.com>
Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
Closes #632
Closes #633
Add aarch64 to the list of architectures which do not sanitize the
LDFLAGS from the environment. See e0aacd9b for details.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #635
In unattended operation it's often more useful to have the node
panic and reboot when it encounters problems, as opposed to
sitting there indefinitely waiting for somebody to discover it.
This implements an spl_panic_crash module parameter; set it
to nonzero to cause spl_panic() to call panic().
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
Closes #634
When we load a ZFS pool whose spa_name equals the name of an existing
kstat, we would have to create a duplicate entry, which procfs doesn't like.
For instance, a ZFS pool named "zil" would have its kstat "txgs"
(module "zfs/zil") installed under "/proc/spl/kstat/zfs/zil";
unfortunately, we already have a kstat named "zil" (module "zfs")
installed in the same procfs location.
Avoid this issue by skipping the duplicate entry creation in procfs.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #628
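The check described above amounts to looking the name up in the parent directory before creating a new entry. The sketch below models that in userspace, assuming a flat, fixed-size array as a stand-in for the procfs directory; `struct kstat_dir`, `kstat_dir_has()` and `kstat_dir_add()` are hypothetical names for illustration, not SPL functions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define KSTAT_DIR_MAX	16	/* hypothetical capacity for the sketch */

/* Hypothetical stand-in for one procfs directory, e.g. /proc/spl/kstat/zfs. */
struct kstat_dir {
	const char *names[KSTAT_DIR_MAX];
	size_t count;
};

/* Return true if an entry called `name` already exists in `dir`. */
static bool
kstat_dir_has(const struct kstat_dir *dir, const char *name)
{
	for (size_t i = 0; i < dir->count; i++) {
		if (strcmp(dir->names[i], name) == 0)
			return (true);
	}
	return (false);
}

/*
 * Create an entry only if the name is free.  Returns true if the entry was
 * added, false if it was skipped as a duplicate or the directory is full.
 */
static bool
kstat_dir_add(struct kstat_dir *dir, const char *name)
{
	if (kstat_dir_has(dir, name))
		return (false);		/* skip: procfs rejects duplicates */
	if (dir->count == KSTAT_DIR_MAX)
		return (false);
	dir->names[dir->count++] = name;
	return (true);
}
```

With the zfs "zil" kstat registered first, a later attempt by a pool named "zil" to add a same-named entry is skipped rather than handed to procfs.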
Historically the SPL cached the system hostid the first time it
was accessed. This was done to speed up subsequent accesses.
But in practice the system hostid is rarely accessed, and it's
inconvenient that the cached value doesn't promptly reflect /etc/hostid
configuration changes. Therefore, zone_get_hostid() has been
updated to always refresh the system hostid reported.
Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #626
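A minimal userspace sketch of the uncached behavior, assuming /etc/hostid holds a 4-byte binary value in host byte order. `read_hostid()` is a hypothetical helper that takes the path as a parameter so it can be pointed at a test file; the SPL's in-kernel lookup is not reproduced here.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: read the hostid from `path` on every call instead
 * of caching it, so edits to the file are picked up immediately.
 * Assumes the file holds a 4-byte value in host byte order.
 * Returns 0 on success, -1 if the file is missing or too short.
 */
static int
read_hostid(const char *path, uint32_t *hostid)
{
	FILE *fp = fopen(path, "rb");
	uint32_t value;
	size_t n;

	if (fp == NULL)
		return (-1);
	n = fread(&value, sizeof (value), 1, fp);
	fclose(fp);
	if (n != 1)
		return (-1);
	*hostid = value;
	return (0);
}
```

Because nothing is cached, rewriting the file between two calls changes the value returned by the second call.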
Initialize dummy_lock to fix the following build error with gcc 7.1.1:
error: ‘dummy_lock’ is used uninitialized in this function
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Closes #622
Don't use `uname -r` to determine the kernel build directory when the user
specified the kernel source with --with-linux. Otherwise, the user is forced
to use --with-linux-obj even if they are the same directory, which is
very counterintuitive.
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Exclude Makefile.in in module/ and fix the gitignore in cmd/.
Also, ignore *.patch and *.orig files.
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Perform the already past expiration time check before updating
cvp->cv_mutex with the provided mutex. This check only depends
on local state. Doing it first ensures that cvp->cv_mutex will not
be updated in the timeout case or if it's ever called with an
expire_time <= now.
Reviewed-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #616
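The reordering can be illustrated with a simplified userspace model. `cv_sketch_t` and `cv_timedwait_sketch()` are hypothetical: they keep only the cv_mutex field and omit the actual sleep, and returning -1 for an already-expired deadline is an assumption of the sketch.

```c
#include <stddef.h>

/* Hypothetical, simplified condition variable: only the field the fix touches. */
typedef struct {
	void *cv_mutex;		/* mutex most recently associated with the cv */
} cv_sketch_t;

/*
 * Sketch of the reordered entry checks.  `expire_time` and `now` share the
 * same (hypothetical) tick units.  An already-past deadline returns -1
 * without touching cvp->cv_mutex; otherwise the mutex is recorded and the
 * remaining ticks are returned (the wait itself is omitted).
 */
static long
cv_timedwait_sketch(cv_sketch_t *cvp, void *mp, long expire_time, long now)
{
	/* Depends only on local state, so check it before any shared update. */
	if (expire_time <= now)
		return (-1);

	cvp->cv_mutex = mp;
	return (expire_time - now);
}
```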
Change SPL_FSTRANS to optionally contain PF_FSTRANS. Also, add
__spl_pf_fstrans_check to check specifically for PF_FSTRANS.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Closes #614
The assert() related definitions in glibc 2.25 were altered to warn
about assert(X=Y) when -Wparentheses is used. See
https://abi-laboratory.pro/tracker/changelog/glibc/2.25/log.html
lib/list.c used this construct to set the value of a magic field which
is defined only when debugging.
Replace the assert()s with #ifndef/#endif blocks.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Closes #610
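A minimal sketch of the pattern, assuming the debug-only guard is `NDEBUG` (the same macro that compiles assert() out); `struct list`, `list_init()` and `LIST_MAGIC` here are simplified stand-ins, not the actual lib/list.c definitions. The commented-out line is the construct glibc 2.25 now warns about.

```c
#include <assert.h>

#define LIST_MAGIC	0xDEADBEEF	/* hypothetical value */

struct list {
#ifndef NDEBUG
	unsigned int magic;	/* exists only in debug builds */
#endif
	int count;
};

static void
list_init(struct list *l)
{
	l->count = 0;
	/*
	 * Old form, which glibc 2.25 warns about under -Wparentheses:
	 *
	 *	assert(l->magic = LIST_MAGIC);
	 *
	 * It relied on the assignment vanishing together with assert() when
	 * NDEBUG is defined.  Spelling out the guard does the same thing
	 * explicitly and keeps the compiler quiet.
	 */
#ifndef NDEBUG
	l->magic = LIST_MAGIC;
#endif
}
```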
Before kernel 2.6.29 credentials were embedded in task_structs, and zfs had
cases where one thread would need to refer to the credential of another thread,
forcing it to take a hold on the foreign thread's task_struct to ensure it was
not freed.
Since 2.6.29, the credential has been moved out of the task_struct into a
cred_t.
In addition, the mainline kernel originally did not export __put_task_struct()
but the RHEL5 kernel did, according to zfsonlinux/spl@e811949a57. As of
2.6.39 the mainline kernel exports it.
There is no longer zfs code that takes or releases holds on a task_struct, and
so there is no longer any reference to __put_task_struct().
This affects the Linux 4.11 kernel because the prototype for
__put_task_struct() is in a new include file (linux/sched/task.h), so the
config check failed to detect the exported symbol.
Remove the unnecessary stub and corresponding config check. This works on
all kernels back to the oldest one currently supported, 2.6.32 as shipped with
CentOS/RHEL.
Reviewed-by: Chunwei Chen <david.chen@osnexus.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Closes #608
In Linux 4.11, torvalds/linux@2a1f062, signal handling related functions
were moved from sched.h into sched/signal.h.
Add configure checks to detect this and include the new file where
needed.
Reviewed-by: Chunwei Chen <david.chen@osnexus.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Closes #608
There are changes to vfs_getattr() in torvalds/linux@a528d35. The new
interface is:
int vfs_getattr(const struct path *path, struct kstat *stat,
u32 request_mask, unsigned int query_flags)
The request_mask argument indicates which field(s) the caller intends to
use. Fields the caller does not specify via request_mask may be set in
the returned struct anyway, but their values may be approximate.
The query_flags argument indicates whether the filesystem must update
the attributes from the backing store.
This patch uses the query_flags which result in vfs_getattr behaving the same
as it did with the 2-argument version which the kernel provided before
Linux 4.11.
Members blksize and blocks are now always the same size regardless of
arch. They match the size of the equivalent members in vnode_t.
The configure checks are modified to ensure that the appropriate
vfs_getattr() interface is used.
A more complete fix, removing the ZFS dependency on vfs_getattr()
entirely, is deferred as it is a much larger project.
Reviewed-by: Chunwei Chen <david.chen@osnexus.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Closes #608
Unlike other architectures, which sanitize the LDFLAGS from the
environment in arch/<arch>/Makefile, the powerpc Makefile
allows LDFLAGS to be passed through, resulting in the following
build failure:
/usr/bin/ld: unrecognized option '-Wl,-z,relro'
LDFLAGS is set in /usr/lib/rpm/redhat/macros by default. Clear
the environment variable when building kmods for powerpc.
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #607
Replace uses of set_task_state(current, STATE) with
set_current_state(STATE).
In Linux 4.11, torvalds/linux@642fa44, set_task_state() is removed.
All spl uses are of the form set_task_state(current, STATE).
set_current_state(STATE) is equivalent and has been available since
Linux 2.2.26.
Furthermore, set_current_state(STATE) is already used in about 15
locations within spl. This change should have no impact other than
removing an unnecessary dependency.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Closes #603
Resolve a false positive in the kmemleak checker by shifting to the
kernel slab. It shows up because vn_file_cache is using KMC_KMEM
which is directly allocated using __get_free_pages, which is not
automatically tracked by kmemleak.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Closes #599
The SLAB_USERCOPY flag was used to tell PaX
not to kill copies from kernel to userland.
With the recent grsecurity patchset and
CONFIG_GRKERNSEC_HIDESYM, which enables
CONFIG_PAX_USERCOPY, zfs would panic.
Handle the newer API while keeping the old one functional.
Tested-by: RageLtMan <rageltman@sempervictus>
Reviewed-by: spendergrsec <spender@grsecurity.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Kevin Tanguy <kevin.tanguy@ovh.net>
Closes #595
When building SPL within the kernel tree, C99 initializers cause
build failures and need to be converted to C89 as kernel CFLAGS
specify -std=gnu89.
This fix was provided by @behlendorf in #595 discussion notes and
manually implemented in the current master revision.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: RageLtMan <rageltman@sempervictus>
Closes #597
The main complication from the RT patch set is that the RW semaphore
locks change such that read locks on an rwsem can be taken only by
a single thread. All other threads are locked out. This single
thread can take a read lock multiple times though. The underlying
implementation changes to a mutex with an additional read_depth
count.
The implementation can be best understood by inspecting the RT
patch. rwsem_rt.h and rt.c give the best insight into how RT
rwsem works. My implementation for rwsem_tryupgrade is basically
an inversion of rt_downgrade_write found in rt.c. Please see the
comments in the code.
Unfortunately, I have to drop SPLAT rwlock test4 completely as this
test tries to take multiple locks from different threads, which RT
rwsems do not support. Otherwise SPLAT, zconfig.sh, zpios-sanity.sh
and zfs-tests.sh pass on my Debian-testing VM with the kernel
linux-image-4.8.0-1-rt-amd64.
Tested-by: kernelOfTruth <kerneloftruth@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Clemens Fruhwirth <clemens@endorphin.org>
Closes zfsonlinux/zfs#5491
Closes #589
Closes #308
Commit f58040c0fc should have removed
this comment which is no longer relevant.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Clemens Fruhwirth <clemens@endorphin.org>
Issue #589
Refactor the code by turning splat_test_{init,fini} and splat_subsystem_{init,fini}
into functions. There is no reason for them to be macros, and inlining every
call would cause too much bloat.
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Add a dedicated system_delay_taskq for long delays like spa_deadman and
zpl_posix_acl_free. This will allow us to use system_taskq in the manner of
dispatching multiple tasks and calling taskq_wait_outstanding.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Closes #588
This prevents holding tq_lock for too long.
Before zfsonlinux/zfs@8e71ab9, piling up delayed tasks and then reading
/proc/spl/taskq could easily cause a lockup. While that bug has been fixed,
it's probably still a good idea to do this just in case the task lists grow too large.
Reviewed-by: Tim Chase <tim@chase2k.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Closes #586
Add the TASKQID_INVALID and TASKQID_INITIAL macros and update the
taskq implementation and test cases to use them. This is solely
for the purposes of readability and introduces no functional change.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
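A sketch of how the two macros read in practice, assuming `taskqid_t` is an unsigned integer type, `TASKQID_INVALID` is 0 and `TASKQID_INITIAL` is 1. The allocator (`struct taskq_ids`, `taskq_ids_dispatch()`) is hypothetical and only shows where each macro replaces a bare literal.

```c
typedef unsigned long taskqid_t;	/* assumed integer id type */

/* Assumed values: id 0 is never handed out, real ids start at 1. */
#define TASKQID_INVALID	((taskqid_t)0)
#define TASKQID_INITIAL	((taskqid_t)1)

/* Hypothetical minimal id allocator standing in for a taskq. */
struct taskq_ids {
	taskqid_t next_id;
};

static void
taskq_ids_init(struct taskq_ids *tq)
{
	tq->next_id = TASKQID_INITIAL;		/* instead of a bare 1 */
}

/* Hand out the next id, or TASKQID_INVALID (instead of 0) on failure. */
static taskqid_t
taskq_ids_dispatch(struct taskq_ids *tq, int can_dispatch)
{
	if (!can_dispatch)
		return (TASKQID_INVALID);
	return (tq->next_id++);
}
```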
Add a minimal implementation of vmem_size() which accounts for the
virtual memory usage of the SPL's kmem cache. This functionality
is only useful on 32-bit systems with a small virtual address space.
The following assumptions are made:
1) The major SPL consumer of virtual memory is the kmem cache.
2) Memory allocated with vmem_alloc() is short lived and can be ignored.
3) Allow a 4MB floor as a generous pad given normal consumption.
4) The spl_kmem_cache_sem only contends with cache create/destroy.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
In Linux 4.9, torvalds/linux@81243ea, group_info changed from a 2D array accessed via
->blocks to a 1D array accessed via ->gid. Change the spl cred functions accordingly.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Closes #581
There is no need to crhold current_cred(); fix a possible leak in splat_cred_test2.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Closes #556
init_groups has 0 nblocks, so calling the current crgetgroups() with
init_groups would result in an out-of-bounds access. Fix this by returning NULL
when nblocks is 0.
Cap crgetngroups() to NGROUPS_PER_BLOCK, since crgetgroups() will only return
blocks[0].
Also, remove all get_group_info() calls. The cred already holds a reference on the
group_info, and the cred is immutable, so there's no reason to hold an extra
reference as long as we hold the cred.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Closes #556
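A userspace model of the two fixes, assuming a cut-down version of the pre-4.9 two-level group_info layout. `struct group_info_sketch`, the `_sketch` functions and the small `NGROUPS_PER_BLOCK` value are hypothetical (the kernel derives its value from the page size).

```c
#include <stddef.h>

#define NGROUPS_PER_BLOCK	4	/* hypothetical; kernel uses a page-size-derived value */

typedef unsigned int gid_sketch_t;

/* Simplified model of the pre-4.9 two-level group_info layout. */
struct group_info_sketch {
	int ngroups;
	int nblocks;
	gid_sketch_t *blocks[2];
};

/* Only blocks[0] is ever returned, so never report more groups than it holds. */
static int
crgetngroups_sketch(const struct group_info_sketch *gi)
{
	return (gi->ngroups < NGROUPS_PER_BLOCK ? gi->ngroups : NGROUPS_PER_BLOCK);
}

/* init_groups has no blocks at all: return NULL instead of reading blocks[0]. */
static gid_sketch_t *
crgetgroups_sketch(const struct group_info_sketch *gi)
{
	if (gi->nblocks == 0)
		return (NULL);
	return (gi->blocks[0]);
}
```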
When iterating per_cpu values, we need to use for_each_possible_cpu. While
NR_CPUS indicates the number of CPUs supported by the kernel, the kernel might not
initialize all of them if it decides they cannot be used.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Closes #578
Linux 4.8, starting from torvalds/linux@19c5d690e, sets owner to 1 when
read held instead of leaving it NULL. So we change the condition to
`rw_owner(rwp) <= 1` in RW_READ_HELD.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Closes zfsonlinux/zfs#5233
Closes #577
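A minimal model of the revised check, assuming a hypothetical `struct rwsem_sketch` whose `owner` holds a writer's task pointer, NULL, or the value 1 for readers. It illustrates why a plain NULL comparison misses the 4.8 reader marker while `<= 1` accepts both.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the parts of an rw_semaphore the check reads. */
struct rwsem_sketch {
	bool locked;	/* held by anyone (reader or writer) */
	void *owner;	/* writer's task pointer; NULL or 1 when read held */
};

static void *
rw_owner(const struct rwsem_sketch *rwp)
{
	return (rwp->owner);
}

/*
 * Read held means locked with no real writer recorded.  Older kernels leave
 * owner NULL for readers; 4.8 and later store 1, so accept both with <= 1.
 */
static bool
rw_read_held(const struct rwsem_sketch *rwp)
{
	return (rwp->locked && (uintptr_t)rw_owner(rwp) <= 1);
}
```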
Due to changes in the task_struct, the following warning occurs
when initializing the global p0. Since this structure only exists
for its address to be taken, initialize it in a manner which isn't
sensitive to internal changes to the structure.
module/spl/spl-generic.c:58:1: error: missing braces around
initializer [-Werror=missing-braces]
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #576
Explicitly cast type in splat-rwlock.c test case to silence
the following warning.
warning: format ‘%ld’ expects argument of type ‘long int’,
but argument N has type ‘int’
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #574
When building from the head of a branch a release number is
automatically generated with `git describe` using the last tag
on that branch as the base. For this to work the last tag on the
branch needs to be predictable given the current META file.
This logic was accidentally broken when an -rcX tag was added to
the branch. Update it to search for a VERSION or VERSION-RELEASE
tag.
Reviewed-by: Chris Siebenmann <cks.git01@cs.toronto.edu>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue zfsonlinux/zfs#5105
Closes #572
In order to support ABD with large blocks the spl_kmem_alloc_warn
limit needs to be increased to 64K.
A 16M block requires that pointers be stored for 4096 4K-pages
on an x86_64 system. Each of these pointers is 8 bytes requiring
an allocation of 8*4096=32,768 bytes. The addition of a small
header to this structure pushes the allocation over the default
32K warning threshold.
In addition, fix a small bug where MAX was used instead of MIN
when setting the default. This ensures a reasonable limit is
still set on systems with page sizes larger than 4K.
Reviewed-by: David Quigley <david.quigley@intel.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #571
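The sizing argument can be checked with a line of arithmetic. The helper below is hypothetical and counts only the pointer array, not the header, since the actual header size is not stated above.

```c
#include <stddef.h>

/*
 * Bytes needed to store one page pointer per page of a block, excluding
 * any header.  Hypothetical helper for checking the numbers above.
 */
static size_t
page_pointer_bytes(size_t block_size, size_t page_size, size_t ptr_size)
{
	return ((block_size / page_size) * ptr_size);
}
```

For a 16M block with 4K pages and 8-byte pointers, `page_pointer_bytes((size_t)16 << 20, 4096, 8)` gives 4096 * 8 = 32,768 bytes, exactly the old 32K warning threshold, so any nonzero header crosses it while leaving nearly 32K of headroom below the new 64K limit.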
Update splat_cmd to reference the correct location of the splat utility.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Liu Hua <liu.hua130@zte.com.cn>
Closes #570