When --enable-asan is provided to configure then build all user
space components with fsanitize=address. For kernel support
use the Linux KASAN feature instead.
https://github.com/google/sanitizers/wiki/AddressSanitizer
When using gcc version 4.8 any test case which intentionally
generates a core dump will fail when using --enable-asan.
The default behavior is to disable core dumps and only newer
versions allow this behavior to be controled at run time with
the ASAN_OPTIONS environment variable.
Additionally, this patch includes some build system cleanup.
* Rules.am updated to set the minimum AM_CFLAGS, AM_CPPFLAGS,
and AM_LDFLAGS. Any additional flags should be added on a
per-Makefile basic. The --enable-debug and --enable-asan
options apply to all user space binaries and libraries.
* Compiler checks consolidated in always-compiler-options.m4
and renamed for consistency.
* -fstack-check compiler flag was removed, this functionality
is provided by asan when configured with --enable-asan.
* Split DEBUG_CFLAGS in to DEBUG_CFLAGS, DEBUG_CPPFLAGS, and
DEBUG_LDFLAGS.
* Moved default kernel build flags in to module/Makefile.in and
split in to ZFS_MODULE_CFLAGS and ZFS_MODULE_CPPFLAGS. These
flags are set with the standard ccflags-y kbuild mechanism.
* -Wframe-larger-than checks applied only to binaries or
libraries which include source files which are built in
both user space and kernel space. This restriction is
relaxed for user space only utilities.
* -Wno-unused-but-set-variable applied only to libzfs and
libzpool. The remaining warnings are the result of an
ASSERT using a variable when is always declared.
* -D_POSIX_PTHREAD_SEMANTICS and -D__EXTENSIONS__ dropped
because they are Solaris specific and thus not needed.
* Ensure $GDB is defined as gdb by default in zloop.sh.
Signed-off-by: DHE <git@dehacked.net>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#7027
Support integration with new QAT products: Intel(R) C62x Chipset,
or Atom(R) C3000 Processor Product Family SoC:
1. Detect new file name in auto-conf.
2. Change MAX_INSTANCES to 48.
3. Change "num_inst" to U16 to clean a build warning.
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Weigang Li <weigang.li@intel.com>
Closes#6767
generic_start_io_acct/generic_end_io_acct in the master
branch of the linux kernel requires that the request_queue
be provided.
Move the logic from freemem in the spl to arc_free_memory
in arc.c. Do this so we can take advantage of global_page_state
interface checks in zfs.
Upstream kernel replaced struct block_device with
struct gendisk in struct bio. Determine if the
function bio_set_dev exists during configure
and have zfs use that if it exists.
bio_set_dev https://github.com/torvalds/linux/commit/74d4699
global_node_page_state https://github.com/torvalds/linux/commit/75ef718
io acct https://github.com/torvalds/linux/commit/d62e26b
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Closes#6635
vm_node_stat must be used instead of vm_zone_stat. Unfortunately the
old code still compiles potentially leading to silent failure of
arc_evictable_memory()
AKAMAI: CR 3816601: Regression in zfs dropcache test
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Chunwei Chen <tuxoko@gmail.com>
Signed-off-by: Debabrata Banerjee <dbanerje@akamai.com>
Closes#6528
Commit torvalds/linux@4e4cbee9. The bio->bi_error field was
replaced with bio->bi_status which is an enum that describes
all possible error types.
Reviewed-by: Chunwei Chen <david.chen@osnexus.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#6351
Don't use `uname -r` to determine kernel build directory when the user
specified kernel source with --with-linux. Otherwise, the user is forced
to use --with-linux-obj even if they are the same directory, which is
very counterintuitive.
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Requires-spl: refs/pull/617/head
Linux 4.9 added current_time() as the preferred interface to get
the filesystem time. CURRENT_TIME was retired in Linux 4.12.
Reviewed-by: Chunwei Chen <david.chen@osnexus.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#6114
Linux has read-ahead logic designed to accelerate sequential workloads.
ZFS has its own read-ahead logic called zprefetch that operates on both
ZVOLs and datasets. Having two prefetchers active at the same time can
cause overprefetching, which unnecessarily reduces IOPS performance on
CoW filesystems like ZFS.
Testing shows that entirely disabling the Linux prefetch results in
a significant performance penalty for reads while commensurate benefits
are seen in random writes. It appears that read-ahead benefits are
inversely proportional to random write benefits, and so a single page
of Linux-layer read-ahead appears to offer the middle ground for both
workloads.
Reviewed-by: Chunwei Chen <david.chen@osnexus.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <ryao@gentoo.org>
Issue #5902
All filesystems were converted to dynamically allocated BDIs. The
destruction of backing_dev_info structures is handled as part of
super block destruction. Refactor the code to abstract away the
details of creating and destroying a BDI.
Reviewed-by: Chunwei Chen <david.chen@osnexus.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#6089
This patch implement the hardware accelerator method in GZIP compression
in ZFS. When the ZFS pool is enabled GZIP compression, the compression
API will be automatically transferred to the hardware accelerator to
free up CPU resource and speed up the compression time.
* To enable Intel QAT hardware acceleration in ZOL you need to have QAT
hardware and the driver installed:
* QAT hardware DH8950:
http://ark.intel.com/products/79483/Intel-QuickAssist-Adapter-8950
* QAT driver:
https://01.org/intel-quickassist-technology
* Start QAT driver in your system:
service qat_service start
* Enable QAT in ZFS, e.g.:
./configure --with-qat=<qat-driver-path>/QAT1.6
make
* Set GZIP compression in ZFS dataset:
zfs set compression = gzip <dataset>
* Get QAT hardware statistics by:
cat /proc/spl/kstat/zfs/qat
* To disable QAT in ZFS:
insmod zfs.ko zfs_qat_disable=1
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Signed-off-by: Weigang Li <weigang.li@intel.com>
Closes#5846
In torvalds/linux@a528d35, there are changes to the getattr family of functions,
struct kstat, and the interface of inode_operations .getattr.
The inode_operations .getattr and simple_getattr() interface changed to:
int (*getattr) (const struct path *, struct dentry *, struct kstat *,
u32 request_mask, unsigned int query_flags)
The request_mask argument indicates which field(s) the caller intends to use.
Fields the caller has not specified via request_mask may be set in the returned
struct anyway, but their values may be approximate.
The query_flags argument indicates whether the filesystem must update
the attributes from the backing store.
Currently both fields are ignored. It is possible that getattr-related
functions within zfs could be optimized based on the request_mask.
struct kstat includes new fields:
u32 result_mask; /* What fields the user got */
u64 attributes; /* See STATX_ATTR_* flags */
struct timespec btime; /* File creation time */
Fields attribute and btime are cleared; the result_mask reflects this. These
appear to be optional based on simple_getattr() and vfs_getattr() within the
kernel, which take the same approach.
Reviewed-by: Chunwei Chen <david.chen@osnexus.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Closes#5875
Properly annotate functions and data section so that objtool does not complain
when CONFIG_STACK_VALIDATION and CONFIG_FRAME_POINTER are enabled.
Pass KERNELCPPFLAGS to assembler.
Use kfpu_begin()/kfpu_end() to protect SIMD regions in Linux kernel.
Reviewed-by: Tom Caputi <tcaputi@datto.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Gvozden Neskovic <neskovic@gmail.com>
Closes#5872Closes#5041
Add the appropriate compiler flags to accept c99 code. This will help to
minimize differences with upstream, and aid porting changes. One change was
necessary in zvol.c because the DEFINE_IDA() macro does not work with the new
compiler flags.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Chunwei Chen <david.chen@osnexus.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes#5756
Disable the warnings regarding ISO C90 forbidding mixed
declarations and code. While this functionality was
introduced as part of C99 gcc does allow this in C90
mode as an extension.
https://gcc.gnu.org/onlinedocs/gcc/Mixed-Declarations.html#Mixed-Declarations
Allowing this usage helps minimize the changes required
when porting patches from OpenZFS. The downside here is
that this functionality is specific to gcc.
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#5686
The .write/.read file operations callbacks can be retired since
support for .read_iter/.write_iter and .aio_read/.aio_write has
been added. The vfs_write()/vfs_read() entry functions will
select the correct interface for the kernel. This is desirable
because all VFS write/read operations now rely on common code.
This change also add the generic write checks to make sure that
ulimits are enforced correctly on write.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Closes#5587Closes#5673
[bio] The req_op enum was changed to req_opf. Update the "Linux 4.8 API"
autotools checks to use an int to determine whether the various REQ_OP
values are defined. This should work properly on kernels >= 4.8.
[bio] bio_set_op_attrs() is now an inline function and can't be detected
with #ifdef. Add a configure check to determine whether bio_set_op_attrs()
is defined. Move the local definition of it from vdev_disk.c to
blkdev_compat.h for consistency with other related compability shims.
[bio] The read/write flags and their modifiers, including WRITE_FLUSH,
WRITE_FUA and WRITE_FLUSH_FUA have been removed from fs.h. Add the new
bio_set_flush() compatibility wrapper to replace VDEV_WRITE_FLUSH_FUA
and set the flags appropriately for each supported kernel version.
[vfs] The generic_readlink() function has been made static. If .readlink
in inode_operations is NULL, generic_readlink() is used.
[zol typo] Completely unrelated to 4.10 compat, fix a typo in the check
for REQ_OP_SECURE_ERASE so that the proper macro is defined:
s/HAVE_REQ_OP_SECURE_DISCARD/HAVE_REQ_OP_SECURE_ERASE/
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Chunwei Chen <david.chen@osnexus.com>
Signed-off-by: Tim Chase <tim@chase2k.com>
Closes#5499
Linux kernel commit 723c038475b78 removed this field.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: DHE <git@dehacked.net>
Closes#5393
Linux 3.14 introduces inode->set_acl(). Normally, acl modification will come
from setxattr, which will handle by the acl xattr_handler, and we already
handles that well. However, nfsd will directly calls inode->set_acl or
return error if it doesn't exists.
Reviewed-by: Tim Chase <tim@chase2k.com>
Reviewed-by: Massimo Maggi <me@massimo-maggi.eu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Closes#5371Closes#5375
Originally, these two function are inline, so their usability is tied to
posix_acl_release. However, since Linux 3.14, they became EXPORT_SYMBOL, so we
can always use them. In this patch, we create an independent test for these
two functions so we can use them when possible.
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Linux 3.11 add O_TMPFILE to open(2), which allow creating an unlinked file on
supported filesystem. It's basically doing open(2) and unlink(2) atomically.
The filesystem support is added through i_op->tmpfile. We basically copy the
create operation except we get rid of the link and name related stuff and add
the new node to unlinked set.
We also add support for linkat(2) to link tmpfile. However, since all previous
file operation will skip ZIL, we force a txg_wait_synced to make sure we are
sync safe.
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
In torvalds/linux@31051c8 the inode_change_ok() function was
renamed setattr_prepare() and updated to take a dentry ratheri
than an inode. Update the code to call the setattr_prepare()
and add a wrapper function which call inode_change_ok() for
older kernels.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Requires-spl: refs/pull/581/head
In Linux 4.9, torvalds/linux@fd50eca, iops->{set,get,remove}xattr and
generic_{set,get,remove}xattr are removed. xattr operations will directly
go through sb->s_xattr.
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
In Linux 4.9, torvalds/linux@2773bf0, iops->rename() and iops->rename2() are
merged together into iops->rename(), it now wants flags.
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Without plugging, the default 'noop' scheduler will not merge
the BIOs which are part of a large ZIO.
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Isaac Huang <he.huang@intel.com>
Closes#5181
API Change: Module parameter set/get methods take const parameter in
Grsecurity kernel v4.7.1
Signed-off-by: Gvozden Neskovic <neskovic@gmail.com>
Signed-off-by: Jason Zaman <jason@perfinion.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#4997Closes#5001
Fix bugs due to kernel change in torvalds/linux@4bacc9c923 ("overlayfs:
Make f_path always point to the overlay and f_inode to the underlay").
This problem crashes system when use zfs as a layer of overlayfs.
Signed-off-by: Chen Haiquan <oc@yunify.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#4914Closes#4935
All users of bio->bi_rw have been replaced with compatibility wrappers.
This allows the kernel specific logic to be abstracted away, and for
each of the supported cases to be documented with the wrapper. The
updated interfaces are as follows:
* void blk_queue_set_write_cache(struct request_queue *, bool, bool)
* boolean_t bio_is_flush(struct bio *)
* boolean_t bio_is_fua(struct bio *)
* boolean_t bio_is_discard(struct bio *)
* boolean_t bio_is_secure_erase(struct bio *)
* VDEV_WRITE_FLUSH_FUA
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Closes#4951
Starting from Linux 4.7, get_acl will set acl cache pointer to temporary
sentinel value before calling i_op->get_acl. Therefore we can't compare
against ACL_NOT_CACHED and return.
Since from Linux 3.14, get_acl already check the cache for us, so we
disable this in zpl_get_acl.
Linux 4.7 also does set_cached_acl for us so we disable it in zpl_get_acl.
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Signed-off-by: Nikolay Borisov <n.borisov.lkml@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#4944Closes#4946
The posix_acl_valid() function has been updated to require a
user namespace. Filesystem callers should normally provide the
user_ns from the super block associcated with the ACL; the
zpl_posix_acl_valid() wrapper has been added for this purpose.
See https://github.com/torvalds/linux/commit/0d4d717f for
complete details.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Nikolay Borisov <n.borisov.lkml@gmail.com>
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Closes#4922
Remove ZFS_AC_KERNEL_CURRENT_UMASK and ZFS_AC_KERNEL_POSIX_ACL_CACHING
configure checks, all supported kernel provide this functionality.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Nikolay Borisov <n.borisov.lkml@gmail.com>
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Closes#4922
Kernel 4.8 paved the way to enabling mounting a file system inside a
non-init user namespace. To facilitate this a s_user_ns member was
added holding the userns in which the filesystem's instance was
mounted. This enables doing the uid/gid translation relative to
this particular username space and not the default init_user_ns.
Signed-off-by: Nikolay Borisov <n.borisov.lkml@gmail.com>
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#4928
The rw argument has been removed from submit_bio/submit_bio_wait.
Callers are now expected to set bio->bi_rw instead of passing it
in. See https://github.com/torvalds/linux/commit/4e49ea4a for
complete details.
Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #4892
Issue #4899
Prior to b39c22b, which was first generally available in the 0.6.5
release as b39c22b, ZoL never actually submitted synchronous read or write
requests to the Linux block layer. This means the vdev_disk_dio_is_sync()
function had always returned false and, therefore, the completion in
dio_request_t.dr_comp was never actually used.
In b39c22b, synchronous ZIO operations were translated to synchronous
BIO requests in vdev_disk_io_start(). The follow-on commits 5592404 and
aa159af fixed several problems introduced by b39c22b. In particular,
5592404 introduced the new flag parameter "wait" to __vdev_disk_physio()
but under ZoL, since vdev_disk_physio() is never actually used, the wait
flag was always zero so the new code had no effect other than to cause
a bug in the use of the dio_request_t.dr_comp which was fixed by aa159af.
The original rationale for introducing synchronous operations in b39c22b
was to hurry certains requests through the BIO layer which would have
otherwise been subject to its unplug timer which would increase the
latency. This behavior of the unplug timer, however, went away during the
transition of the plug/unplug system between kernels 2.6.32 and 2.6.39.
To handle the unplug timer behavior on 2.6.32-2.6.35 kernels the
BIO_RW_UNPLUG flag is used as a hint to suppress the plugging behavior.
For kernels 2.6.36-2.6.38, the REQ_UNPLUG macro will be available and
ise used for the same purpose.
Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#4858
Since the concept of a kuid and the need to translate from it to
ordinary integer type was added in kernel version 3.5 implement necessary
plumbing to be able to detect this condition during compile time. If
the kernel doesn't support the kuid then just fall back to directly
accessing the respective struct inode's members
Signed-off-by: Nikolay Borisov <n.borisov.lkml@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #4685
Issue #227
Linux 4.5 added member "name" to xattr_handler. xattr_handler which matches to
whole name rather than prefix should use "name" instead of "prefix".
Otherwise, kernel will return with EINVAL when it tries to resolve handlers.
Also, we remove the strcmp checks when xattr_handler has name, because
xattr_resolve_name will do the check for us.
Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#4549Closes#4537
This is initial support for x86 vectorized implementations of ZFS parity
and checksum algorithms.
For the compilation phase, configure step checks if toolchain supports relevant
instruction sets. Each implementation must ensure that the code is not passed
to compiler if relevant instruction set is not supported. For this purpose,
following new defines are provided if instruction set is supported:
- HAVE_SSE,
- HAVE_SSE2,
- HAVE_SSE3,
- HAVE_SSSE3,
- HAVE_SSE4_1,
- HAVE_SSE4_2,
- HAVE_AVX,
- HAVE_AVX2.
For detecting if an instruction set can be used in runtime, following functions
are provided in (include/linux/simd_x86.h):
- zfs_sse_available()
- zfs_sse2_available()
- zfs_sse3_available()
- zfs_ssse3_available()
- zfs_sse4_1_available()
- zfs_sse4_2_available()
- zfs_avx_available()
- zfs_avx2_available()
- zfs_bmi1_available()
- zfs_bmi2_available()
These function should be called once, on module load, or initialization.
They are safe to use from user and kernel space.
If an implementation is using more than single instruction set, both compiler
and runtime support for all relevant instruction sets should be checked.
Kernel fpu methods:
- kfpu_begin()
- kfpu_end()
Use __get_cpuid_max and __cpuid_count from <cpuid.h>
Both gcc and clang have support for these. They also handle ebx register
in case it is used for PIC code.
Signed-off-by: Gvozden Neskovic <neskovic@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Closes#4381
Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <ryao@gentoo.org>
Closes#4251
The follow_link() interface was retired in favor of get_link().
In the process of phasing in get_link() the Linux kernel went
through two different versions. The first of which depended
on put_link() and the final version on a delayed done function.
- Improved configure checks for .follow_link, .get_link, .put_link.
- Interfaces checked from newest to oldest.
- Strict checking for each possible known interface.
- Configure fails when no known interface is available.
- Both versions .get_link are detected and supported as well
two previous versions of .follow_link.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Issue #4228
3557 dumpvp_size is not updated correctly when a dump zvol's size is changed
3558 setting the volsize on a dump device does not return back ENOSPC
3559 setting a volsize larger than the space available sometimes succeeds
3560 dumpadm should be able to remove a dump device
Reviewed by: Adam Leventhal <ahl@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Approved by: Albert Lee <trisk@nexenta.com>
References:
https://www.illumos.org/issues/3559https://github.com/illumos/illumos-gate/commit/c61ea56
Porting notes:
- Internal zvol.c changes not applied due to implementation differences.
The external interface and behavior was already consistent with the
latest upstream code.
- Retired 2.6.28 HAVE_CHECK_DISK_SIZE_CHANGE configure check. All
supported kernels (2.6.32 and newer) provide this interface.
Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#4217
While stack size will vary by architecture it has historically defaulted to
8K on x86_64 systems. However, as of Linux 3.15 the default thread stack
size was increased to 16K. These kernels are now the default in most non-
enterprise distributions which means we no longer need to assume 8K stacks.
This patch takes advantage of that fact by appropriately reverting stack
conservation changes which were made to ensure stability. Changes which
may have had a negative impact on performance for certain workloads. This
also has the side effect of bringing the code slightly more in line with
upstream.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <ryao@gentoo.org>
Closes#4059
Commit b39c22b set the READ_SYNC and WRITE_SYNC flags for a bio
based on the ZIO_PRIORITY_* flag passed in. This had the unnoticed
side-effect of making the vdev_disk_io_start() synchronous for
certain I/Os.
This in turn resulted in vdev_disk_io_start() being able to
re-dispatch zio's which would result in a RCU stalls when a disk
was removed from the system. Additionally, this could negatively
impact performance and explains the performance regressions reported
in both #3829 and #3780.
This patch resolves the issue by making the blocking behavior
dependent on a 'wait' flag being passed rather than overloading
the passed bio flags.
Finally, the WRITE_SYNC and READ_SYNC behavior is restricted to
non-rotational devices where there is no benefit to queuing to
aggregate the I/O.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #3652
Issue #3780
Issue #3785
Issue #3817
Issue #3821
Issue #3829
Issue #3832
Issue #3870
zfsonlinux/zfs@e20cd6f7a8 caused us to
lose IO accounting on zvols. When I originally wrote that last year, the
symbols we needed to maintain IO accounting were GPL exported, but
torvalds/linux@394ffa503b provided
suitable symbols for restoring this functionality 4 months later. We
can call them to restore the IO accounting on Linux 3.19 and later as
well as any older kernels where that patch is backported.
Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#3741
This autotools check was never needed because we can check for the
existence of QUEUE_FLAG_NONROT in the kernel headers.
Also, the comment in config/kernel-blk-queue-nonrot.m4 is incorrect.
This was a Linux 2.6.28 API change, not a Linux 2.6.27 API change.
Signed-off-by: Richard Yao <ryao@gentoo.org>
This autotools check was never needed because we can check for the
existence of QUEUE_FLAG_DISCARD in the kernel headers.
Signed-off-by: Richard Yao <ryao@gentoo.org>