Go to file
Matthew Ahrens 0929c4de39
Improve ZVOL sync write performance by using a taskq
== Summary ==

Prior to this change, sync writes to a zvol are processed serially.
This commit makes zvols process concurrently outstanding sync writes in
parallel, similar to how reads and async writes are already handled.
The result is that the throughput of sync writes is tripled.

== Background ==

When a write comes in for a zvol (e.g. over iscsi), it is processed by
calling `zvol_request()` to initiate the operation.  ZFS is expected to
later call `BIO_END_IO()` when the operation completes (possibly from a
different thread).  There are a limited number of threads that are
available to call `zvol_request()` - one one per iscsi client (unless
using MC/S).  Therefore, to ensure good performance, the latency of
`zvol_request()` is important, so that many i/o operations to the zvol
can be processed concurrently.  In other words, if the client has
multiple outstanding requests to the zvol, the zvol should have multiple
outstanding requests to the storage hardware (i.e. issue multiple
concurrent `zio_t`'s).

For reads, and async writes (i.e. writes which can be acknowledged
before the data reaches stable storage), `zvol_request()` achieves low
latency by dispatching the bulk of the work (including waiting for i/o
to disk) to a taskq.  The taskq callback (`zvol_read()` or
`zvol_write()`) blocks while waiting for the i/o to disk to complete.
The `zvol_taskq` has 32 threads (by default), so we can have up to 32
concurrent i/os to disk in service of requests to zvols.

However, for sync writes (i.e. writes which must be persisted to stable
storage before they can be acknowledged, by calling `zil_commit()`),
`zvol_request()` does not use `zvol_taskq`.  Instead it blocks while
waiting for the ZIL write to disk to complete.  This has the effect of
serializing sync writes to each zvol.  In other words, each zvol will
only process one sync write at a time, waiting for it to be written to
the ZIL before accepting the next request.

The same issue applies to FLUSH operations, for which `zvol_request()`
calls `zil_commit()` directly.

== Description of change ==

This commit changes `zvol_request()` to use
`taskq_dispatch_ent(zvol_taskq)` for sync writes, and FLUSh operations.
Therefore we can have up to 32 threads (the taskq threads)
simultaneously calling `zil_commit()`, for a theoretical performance
improvement of up to 32x.

To avoid the locking issue described in the comment (which this commit
removes), we acquire the rangelock from the taskq callback (e.g.
`zvol_write()`) rather than from `zvol_request()`.  This applies to all
writes (sync and async), reads, and discard operations.  This means that
multiple simultaneously-outstanding i/o's which access the same block
can complete in any order.  This was previously thought to be incorrect,
but a review of the block device interface requirements revealed that
this is fine - the order is inherently not defined.  The shorter hold
time of the rangelock should also have a slight performance improvement.

For an additional slight performance improvement, we use
`taskq_dispatch_ent()` instead of `taskq_dispatch()`, which avoids a
`kmem_alloc()` and eliminates a failure mode.  This applies to all
writes (sync and async), reads, and discard operations.

== Performance results ==

We used a zvol as an iscsi target (server) for a Windows initiator
(client), with a single connection (the default - i.e. not MC/S).

We used `diskspd` to generate a workload with 4 threads, doing 1MB
writes to random offsets in the zvol.  Without this change we get
231MB/s, and with the change we get 728MB/s, which is 3.15x the original
performance.

We ran a real-world workload, restoring a MSSQL database, and saw
throughput 2.5x the original.

We saw more modest performance wins (typically 1.5x-2x) when using MC/S
with 4 connections, and with different number of client threads (1, 8,
32).

Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #10163
2020-03-31 10:50:44 -07:00
.github Add an .editorconfig; document git whitespace settings 2020-01-27 13:32:52 -08:00
cmd zfs_get: change time format string from %k to %H 2020-03-26 08:28:22 -07:00
config Fix CONFIG_MODULES=no Linux kernel config 2020-02-28 09:23:48 -08:00
contrib Fix cstyle warnings 2020-03-17 15:42:27 -07:00
etc Fix zfs-functions packaging bug 2020-03-10 09:53:20 -07:00
include Let default arc_c_max be platform dependent 2020-03-27 09:14:46 -07:00
lib Compile cityhash code into libzfs 2020-03-27 09:11:22 -07:00
man Let default arc_c_max be platform dependent 2020-03-27 09:14:46 -07:00
module Improve ZVOL sync write performance by using a taskq 2020-03-31 10:50:44 -07:00
rpm Change http://zfsonlinux.org links to https://zfsonlinux.org 2020-01-13 16:43:59 -08:00
scripts zloop.sh should call ZDB with pool name 2020-03-11 10:02:23 -07:00
tests Reset l2ad_hand and l2ad_first in l2arc_evict 2020-03-31 10:46:48 -07:00
udev Create symbolic links in /dev/disk/by-vdev for nvme disk devices 2019-12-17 17:50:20 -08:00
.editorconfig Add an .editorconfig; document git whitespace settings 2020-01-27 13:32:52 -08:00
.gitignore Adapt gitignore for modules 2019-12-02 13:23:47 -08:00
.gitmodules Add zimport.sh compatibility test script 2014-02-21 12:10:31 -08:00
.travis.yml Add .travis.yml 2017-11-13 09:18:18 -08:00
AUTHORS Update build system and packaging 2018-05-29 16:00:33 -07:00
autogen.sh Cause autogen.sh to fail if autoreconf fails 2018-07-06 09:27:37 -07:00
CODE_OF_CONDUCT.md Add CODE_OF_CONDUCT.md 2019-04-30 10:58:45 -07:00
configure.ac Fix zfs-functions packaging bug 2020-03-10 09:53:20 -07:00
copy-builtin bash scripts: use /usr/bin/env for bash shebangs 2020-02-10 13:13:46 -08:00
COPYRIGHT ICP: Improve AES-GCM performance 2020-02-10 12:59:50 -08:00
LICENSE Update build system and packaging 2018-05-29 16:00:33 -07:00
Makefile.am Perform KABI checks in parallel 2019-10-01 12:50:34 -07:00
META Update maximum kernel version to 5.4 2019-12-23 14:24:36 -08:00
NEWS Add NEWS file 2018-09-18 12:03:47 -07:00
NOTICE Update build system and packaging 2018-05-29 16:00:33 -07:00
README.md Update README for OpenZFS 2020-02-25 11:43:20 -08:00
TEST Remove CI builder customization from TEST 2020-03-16 10:46:03 -07:00
zfs.release.in Move zfs.release generation to configure step 2012-07-12 12:22:51 -07:00

img

OpenZFS is an advanced file system and volume manager which was originally developed for Solaris and is now maintained by the OpenZFS community. This repository contains the code for running OpenZFS on Linux and FreeBSD.

codecov coverity

Official Resources

  • Wiki - for using and developing this repo
  • ZoL Site - Linux release info & links
  • Mailing lists
  • OpenZFS site - for conference videos and info on other platforms (illumos, OSX, Windows, etc)

Installation

Full documentation for installing OpenZFS on your favorite Linux distribution can be found at the ZoL Site.

FreeBSD support is a work in progress. See the PR.

Contribute & Develop

We have a separate document with contribution guidelines.

We have a Code of Conduct.

Release

OpenZFS is released under a CDDL license. For more details see the NOTICE, LICENSE and COPYRIGHT files; UCRL-CODE-235197

Supported Kernels

  • The META file contains the officially recognized supported Linux kernel versions.