mirror_zfs/module/zfs
Brian Behlendorf a584ef2605
Direct IO support
Direct IO via the O_DIRECT flag was originally introduced in XFS by
IRIX for database workloads. Its purpose was to allow the database
to bypass the page and buffer caches to prevent unnecessary IO
operations (e.g.  readahead) while preventing contention for system
memory between the database and kernel caches.

On Illumos, there is a library function called directio(3C) that
allows user space to provide a hint to the file system that Direct IO
is useful, but the file system is free to ignore it. The semantics
are also entirely a file system decision. Those that do not
implement it return ENOTTY.

Since the semantics were never defined in any standard, O_DIRECT is
implemented such that it conforms to the behavior described in the
Linux open(2) man page as follows.

    1.  Minimize cache effects of the I/O.

    By design the ARC is already scan-resistant which helps mitigate
    the need for special O_DIRECT handling.  Data which is only
    accessed once will be the first to be evicted from the cache.
    This behavior is in consistent with Illumos and FreeBSD.

    Future performance work may wish to investigate the benefits of
    immediately evicting data from the cache which has been read or
    written with the O_DIRECT flag.  Functionally this behavior is
    very similar to applying the 'primarycache=metadata' property
    per open file.

    2. O_DIRECT _MAY_ impose restrictions on IO alignment and length.

    No additional alignment or length restrictions are imposed.

    3. O_DIRECT _MAY_ perform unbuffered IO operations directly
       between user memory and block device.

    No unbuffered IO operations are currently supported.  In order
    to support features such as transparent compression, encryption,
    and checksumming a copy must be made to transform the data.

    4. O_DIRECT _MAY_ imply O_DSYNC (XFS).

    O_DIRECT does not imply O_DSYNC for ZFS.  Callers must provide
    O_DSYNC to request synchronous semantics.

    5. O_DIRECT _MAY_ disable file locking that serializes IO
       operations.  Applications should avoid mixing O_DIRECT
       and normal IO or mmap(2) IO to the same file.  This is
       particularly true for overlapping regions.

    All I/O in ZFS is locked for correctness and this locking is not
    disabled by O_DIRECT.  However, concurrently mixing O_DIRECT,
    mmap(2), and normal I/O on the same file is not recommended.

This change is implemented by layering the aops->direct_IO operations
on the existing AIO operations.  Code already existed in ZFS on Linux
for bypassing the page cache when O_DIRECT is specified.

References:
  * http://xfs.org/docs/xfsdocs-xml-dev/XFS_User_Guide/tmp/en-US/html/ch02s09.html
  * https://blogs.oracle.com/roch/entry/zfs_and_directio
  * https://ext4.wiki.kernel.org/index.php/Clarifying_Direct_IO's_Semantics
  * https://illumos.org/man/3c/directio

Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com>
Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #224 
Closes #7823
2018-08-27 10:04:21 -07:00
..
abd.c Update build system and packaging 2018-05-29 16:00:33 -07:00
aggsum.c Fix preemptible warning in aggsum_add() 2018-06-07 15:55:11 -07:00
arc.c Fix issues with raw receive_write_byref() 2018-08-20 11:03:56 -07:00
blkptr.c Undo c89 workarounds to match with upstream 2017-11-04 13:25:13 -07:00
bplist.c Change KM_PUSHPAGE -> KM_SLEEP 2015-01-16 14:41:26 -08:00
bpobj.c OpenZFS 7614, 9064 - zfs device evacuation/removal 2018-04-14 12:16:17 -07:00
bptree.c Native Encryption for ZFS on Linux 2017-08-14 10:36:48 -07:00
bqueue.c Call cv_signal() with mutex held 2017-06-26 14:36:49 -07:00
cityhash.c OpenZFS 8484 - Implement aggregate sum and use for arc counters 2018-06-06 09:35:59 -07:00
dataset_kstats.c Introduce read/write kstats per dataset 2018-08-20 09:52:37 -07:00
dbuf_stats.c Update build system and packaging 2018-05-29 16:00:33 -07:00
dbuf.c Fix issues with raw receive_write_byref() 2018-08-20 11:03:56 -07:00
ddt_zap.c Update build system and packaging 2018-05-29 16:00:33 -07:00
ddt.c Update build system and packaging 2018-05-29 16:00:33 -07:00
dmu_diff.c Fix issues found with zfs diff 2018-05-01 11:24:20 -07:00
dmu_object.c OpenZFS 9438 - Holes can lose birth time info if a block has a mix of birth times 2018-07-30 09:27:49 -07:00
dmu_objset.c Check encrypted dataset + embedded recv earlier 2018-08-15 09:49:19 -07:00
dmu_send.c Fix issues with raw receive_write_byref() 2018-08-20 11:03:56 -07:00
dmu_traverse.c Fix traverse_impl() kmem leak 2018-08-15 09:53:44 -07:00
dmu_tx.c Introduce kstat dmu_tx_dirty_frees_delay 2018-07-25 09:52:27 -07:00
dmu_zfetch.c Update build system and packaging 2018-05-29 16:00:33 -07:00
dmu.c Fix issues with raw receive_write_byref() 2018-08-20 11:03:56 -07:00
dnode_sync.c Fix OpenZFS 9337 mismerge 2018-08-02 10:21:48 -07:00
dnode.c Fix OpenZFS 9337 mismerge 2018-08-02 10:21:48 -07:00
dsl_bookmark.c Undo c89 workarounds to match with upstream 2017-11-04 13:25:13 -07:00
dsl_crypt.c Check encrypted dataset + embedded recv earlier 2018-08-15 09:49:19 -07:00
dsl_dataset.c OpenZFS 9166 - zfs storage pool checkpoint 2018-06-26 10:07:42 -07:00
dsl_deadlist.c OpenZFS 7614, 9064 - zfs device evacuation/removal 2018-04-14 12:16:17 -07:00
dsl_deleg.c Update build system and packaging 2018-05-29 16:00:33 -07:00
dsl_destroy.c Stack overflow when destroying deeply nested clones 2018-08-22 11:03:31 -07:00
dsl_dir.c zfs_ioc_unload_key can drop extra spa ref 2018-08-03 14:50:51 -07:00
dsl_pool.c OpenZFS 9166 - zfs storage pool checkpoint 2018-06-26 10:07:42 -07:00
dsl_prop.c Update build system and packaging 2018-05-29 16:00:33 -07:00
dsl_scan.c dsl_scan_scrub_cb: don't double-account non-embedded blocks 2018-07-24 09:33:56 -07:00
dsl_synctask.c OpenZFS 9166 - zfs storage pool checkpoint 2018-06-26 10:07:42 -07:00
dsl_userhold.c OpenZFS 9166 - zfs storage pool checkpoint 2018-06-26 10:07:42 -07:00
edonr_zfs.c DLPX-44812 integrate EP-220 large memory scalability 2016-11-29 14:34:27 -08:00
fm.c OpenZFS 9580 - Add a hash-table on top of nvlist to speed-up operations 2018-07-30 11:30:03 -07:00
gzip.c Update build system and packaging 2018-05-29 16:00:33 -07:00
hkdf.c Encryption patch follow-up 2017-10-11 16:54:48 -04:00
lz4.c Fix LZ4_uncompress_unknownOutputSize caused panic 2017-05-19 13:45:46 -07:00
lzjb.c Change KM_PUSHPAGE -> KM_SLEEP 2015-01-16 14:41:26 -08:00
Makefile.in Introduce read/write kstats per dataset 2018-08-20 09:52:37 -07:00
metaslab.c OpenZFS 9112 - Improve allocation performance on high-end systems 2018-07-31 10:52:33 -07:00
mmp.c Update build system and packaging 2018-05-29 16:00:33 -07:00
multilist.c Update build system and packaging 2018-05-29 16:00:33 -07:00
pathname.c Update build system and packaging 2018-05-29 16:00:33 -07:00
policy.c Take user namespaces into account in policy checks 2018-03-07 15:40:42 -08:00
qat_compress.c Fix inst_num overflow in qat_crypt.c 2018-05-01 20:44:24 -07:00
qat_crypt.c Fix inst_num overflow in qat_crypt.c 2018-05-01 20:44:24 -07:00
qat.c SHA256 QAT acceleration 2018-03-15 10:53:58 -07:00
qat.h Resolve QAT issues with incompressible data 2018-03-29 17:40:34 -07:00
range_tree.c OpenZFS 9166 - zfs storage pool checkpoint 2018-06-26 10:07:42 -07:00
refcount.c Linux 4.11 compat: avoid refcount_t name conflict 2017-02-28 16:10:18 -08:00
rrwlock.c Fix spelling 2017-01-03 11:31:18 -06:00
sa.c Fix kernel unaligned access on sparc64 2018-07-11 13:10:40 -07:00
sha256.c SHA256 QAT acceleration 2018-03-15 10:53:58 -07:00
skein_zfs.c DLPX-44812 integrate EP-220 large memory scalability 2016-11-29 14:34:27 -08:00
spa_boot.c Add linux kernel module support 2010-08-31 13:41:58 -07:00
spa_checkpoint.c OpenZFS 9238 - ZFS Spacemap Encoding V2 2018-07-05 12:02:34 -07:00
spa_config.c OpenZFS 9591 - ms_shift can be incorrectly changed 2018-06-21 09:35:26 -07:00
spa_errlog.c Update build system and packaging 2018-05-29 16:00:33 -07:00
spa_history.c Update build system and packaging 2018-05-29 16:00:33 -07:00
spa_misc.c OpenZFS 9112 - Improve allocation performance on high-end systems 2018-07-31 10:52:33 -07:00
spa_stats.c Add pool state /proc entry, "SUSPENDED" pools 2018-06-06 09:33:54 -07:00
spa.c Introduce read/write kstats per dataset 2018-08-20 09:52:37 -07:00
space_map.c OpenZFS 9442 - decrease indirect block size of spacemaps 2018-07-25 14:11:35 -07:00
space_reftree.c OpenZFS 7614, 9064 - zfs device evacuation/removal 2018-04-14 12:16:17 -07:00
THIRDPARTYLICENSE.cityhash OpenZFS 8484 - Implement aggregate sum and use for arc counters 2018-06-06 09:35:59 -07:00
THIRDPARTYLICENSE.cityhash.descrip OpenZFS 8484 - Implement aggregate sum and use for arc counters 2018-06-06 09:35:59 -07:00
trace.c OpenZFS 7614, 9064 - zfs device evacuation/removal 2018-04-14 12:16:17 -07:00
txg.c OpenZFS 9464 - txg_kick() fails to see that we are quiescing 2018-06-04 14:56:06 -07:00
uberblock.c OpenZFS 9166 - zfs storage pool checkpoint 2018-06-26 10:07:42 -07:00
unique.c Performance optimization of AVL tree comparator functions 2016-08-31 14:35:34 -07:00
vdev_cache.c Update build system and packaging 2018-05-29 16:00:33 -07:00
vdev_disk.c Add support for autoexpand property 2018-07-23 15:40:15 -07:00
vdev_file.c OpenZFS 7614, 9064 - zfs device evacuation/removal 2018-04-14 12:16:17 -07:00
vdev_indirect_births.c Update build system and packaging 2018-05-29 16:00:33 -07:00
vdev_indirect_mapping.c OpenZFS 9238 - ZFS Spacemap Encoding V2 2018-07-05 12:02:34 -07:00
vdev_indirect.c OpenZFS 9238 - ZFS Spacemap Encoding V2 2018-07-05 12:02:34 -07:00
vdev_label.c OpenZFS 9166 - zfs storage pool checkpoint 2018-06-26 10:07:42 -07:00
vdev_mirror.c Update build system and packaging 2018-05-29 16:00:33 -07:00
vdev_missing.c OpenZFS 7614, 9064 - zfs device evacuation/removal 2018-04-14 12:16:17 -07:00
vdev_queue.c OpenZFS 9112 - Improve allocation performance on high-end systems 2018-07-31 10:52:33 -07:00
vdev_raidz_math_aarch64_neon_common.h ABD raidz NEON support 2016-11-29 14:34:33 -08:00
vdev_raidz_math_aarch64_neon.c codebase style improvements for OpenZFS 6459 port 2017-01-22 13:25:40 -08:00
vdev_raidz_math_aarch64_neonx2.c ABD raidz NEON support 2016-11-29 14:34:33 -08:00
vdev_raidz_math_avx2.c ABD raidz avx512f support 2016-11-29 14:34:33 -08:00
vdev_raidz_math_avx512bw.c ABD: Adapt avx512bw raidz assembly 2016-12-15 17:31:33 -08:00
vdev_raidz_math_avx512f.c Use cstyle -cpP in make cstyle check 2016-12-12 10:46:26 -08:00
vdev_raidz_math_impl.h codebase style improvements for OpenZFS 6459 port 2017-01-22 13:25:40 -08:00
vdev_raidz_math_scalar.c ABD Vectorized raidz 2016-11-29 14:34:33 -08:00
vdev_raidz_math_sse2.c ABD raidz avx512f support 2016-11-29 14:34:33 -08:00
vdev_raidz_math_ssse3.c codebase style improvements for OpenZFS 6459 port 2017-01-22 13:25:40 -08:00
vdev_raidz_math.c Update build system and packaging 2018-05-29 16:00:33 -07:00
vdev_raidz.c OpenZFS 7614, 9064 - zfs device evacuation/removal 2018-04-14 12:16:17 -07:00
vdev_removal.c OpenZFS 9112 - Improve allocation performance on high-end systems 2018-07-31 10:52:33 -07:00
vdev_root.c OpenZFS 9075 - Improve ZFS pool import/load process and corrupted pool recovery 2018-05-08 21:35:27 -07:00
vdev.c s/VERIFY/VERIFY3S in vdev_checkpoint_sm_object 2018-08-21 16:08:14 -07:00
zap_leaf.c OpenZFS 9328 - zap code can take advantage of c99 2018-05-31 10:53:11 -07:00
zap_micro.c OpenZFS 9329 - panic in zap_leaf_lookup() due to concurrent zapification 2018-05-31 10:53:49 -07:00
zap.c OpenZFS 9328 - zap code can take advantage of c99 2018-05-31 10:53:11 -07:00
zcp_get.c Fix coverity defects: zfs channel programs 2018-02-20 11:19:42 -08:00
zcp_global.c OpenZFS 8600 - ZFS channel programs - snapshot 2018-02-08 15:29:24 -08:00
zcp_iter.c OpenZFS 9337 - zfs get all is slow due to uncached metadata 2018-07-12 10:49:27 -07:00
zcp_synctask.c OpenZFS 9166 - zfs storage pool checkpoint 2018-06-26 10:07:42 -07:00
zcp.c OpenZFS 9424 - ztest failure: "unprotected error in call to Lua API (Invalid value type 'function' for key 'error')" 2018-07-10 21:29:23 -07:00
zfeature.c Undo c89 workarounds to match with upstream 2017-11-04 13:25:13 -07:00
zfs_acl.c Update build system and packaging 2018-05-29 16:00:33 -07:00
zfs_byteswap.c Update build system and packaging 2018-05-29 16:00:33 -07:00
zfs_ctldir.c Fix deadlock between zfs umount & snapentry_expire 2018-08-01 15:00:02 -07:00
zfs_debug.c enable zfs_dbgmsg() by default, without dprintf() 2018-03-21 15:37:32 -07:00
zfs_dir.c Update build system and packaging 2018-05-29 16:00:33 -07:00
zfs_fm.c Update build system and packaging 2018-05-29 16:00:33 -07:00
zfs_fuid.c Update build system and packaging 2018-05-29 16:00:33 -07:00
zfs_ioctl.c Added encryption support for zfs recv -o / -x 2018-08-15 09:48:49 -07:00
zfs_log.c Update build system and packaging 2018-05-29 16:00:33 -07:00
zfs_onexit.c Update build system and packaging 2018-05-29 16:00:33 -07:00
zfs_ratelimit.c Change checksum & IO delay ratelimit values 2018-03-04 17:34:51 -08:00
zfs_replay.c Update build system and packaging 2018-05-29 16:00:33 -07:00
zfs_rlock.c Update build system and packaging 2018-05-29 16:00:33 -07:00
zfs_sa.c Project Quota on ZFS 2018-02-13 14:54:54 -08:00
zfs_vfsops.c Introduce read/write kstats per dataset 2018-08-20 09:52:37 -07:00
zfs_vnops.c Introduce read/write kstats per dataset 2018-08-20 09:52:37 -07:00
zfs_znode.c Linux 4.18 compat: inode timespec -> timespec64 2018-06-19 21:51:18 -07:00
zil.c OpenZFS 9112 - Improve allocation performance on high-end systems 2018-07-31 10:52:33 -07:00
zio_checksum.c Undo c89 workarounds to match with upstream 2017-11-04 13:25:13 -07:00
zio_compress.c Update build system and packaging 2018-05-29 16:00:33 -07:00
zio_crypt.c Update build system and packaging 2018-05-29 16:00:33 -07:00
zio_inject.c Update build system and packaging 2018-05-29 16:00:33 -07:00
zio.c Reduce taskq and context-switch cost of zio pipe 2018-08-02 15:51:45 -07:00
zle.c Fix zle_decompress out of bound access 2018-02-09 10:08:05 -08:00
zpl_ctldir.c RHEL 7.5 compat: FMODE_KABI_ITERATE 2018-05-02 15:01:24 -07:00
zpl_export.c Use cstyle -cpP in make cstyle check 2016-12-12 10:46:26 -08:00
zpl_file.c Direct IO support 2018-08-27 10:04:21 -07:00
zpl_inode.c Linux 4.18 compat: inode timespec -> timespec64 2018-06-19 21:51:18 -07:00
zpl_super.c Fix zpl_mount() deadlock 2018-07-11 15:49:10 -07:00
zpl_xattr.c Add missing checks to zpl_xattr_* functions 2018-08-02 14:03:56 -07:00
zrlock.c Update build system and packaging 2018-05-29 16:00:33 -07:00
zthr.c OpenZFS 9166 - zfs storage pool checkpoint 2018-06-26 10:07:42 -07:00
zvol.c Introduce read/write kstats per dataset 2018-08-20 09:52:37 -07:00