f1eed34ac7
This version includes both the AioContext lock and the block graph
lock, so there might be some deadlocks lurking. It's not possible to
disable the block graph lock like was done in QEMU 8.1, because there
are now changes like the function bdrv_schedule_unref() that require
it. QEMU 9.0 will finally get rid of the AioContext locking.

During live-restore with a VirtIO SCSI drive with iothread there is a
known racy deadlock related to the AioContext lock. Not new [1], but
not sure if more likely now. Should be fixed in QEMU 9.0.

The block graph lock comes with annotations that can be checked by
clang's TSA. This required changes to the block drivers, i.e.
alloc-track, pbs, zeroinit as well as taking the appropriate locks in
pve-backup, savevm-async, vma-reader.

Local variable shadowing is prohibited via a compiler flag now,
required slight adaptation in vma.c.

Major changes only affect alloc-track:

* It is not possible to call a generated co-wrapper like
  bdrv_get_info() while holding the block graph lock exclusively [0],
  which does happen during initialization of alloc-track when the
  backing hd is set and the refresh_limits driver callback is invoked.
  The bdrv_get_info() call to get the cluster size is moved to
  directly after opening the file child in track_open().

  The important thing is that at least the request alignment for the
  write target is used, because then the RMW cycle in bdrv_pwritev
  will gather enough data from the backing file. Partial cluster
  allocations in the target are not a fundamental issue, because the
  driver returns its allocation status based on the bitmap, so any
  other data that maps to the same cluster will still be copied later
  by a stream job (or during writes to that cluster).

* Replacing the node cannot be done in the
  track_co_change_backing_file() callback, because it is a coroutine
  and cannot hold the block graph lock exclusively. So it is moved to
  the stream job itself with the auto-remove option not having an
  effect anymore (qemu-server would always set it anyways).

  In the future, there could either be a special option for the stream
  job, or maybe the upcoming blockdev-replace QMP command can be used.

  Replacing the backing child is actually already done in the stream
  job, so no need to do it in the track_co_change_backing_file()
  callback. It also cannot be called from a coroutine. Looking at the
  implementation in the qcow2 driver, it doesn't seem to be intended
  to change the backing child itself, just update driver-internal
  state.

Other changes:

* alloc-track: Error out early when used without auto-remove. Since
  replacing the node now happens in the stream job, where the option
  cannot be read from (it's internal to the driver), it will always be
  treated as 'on'. Makes sure to have users beside qemu-server notice
  the change (should they even exist). The option can be fully dropped
  in the future while adding a version guard in qemu-server.

* alloc-track: Avoid seemingly superfluous child permission update.
  Doesn't seem necessary nowadays (maybe after commit "alloc-track:
  fix deadlock during drop" where the dropping is not rescheduled and
  delayed anymore or some upstream change). Replacing the block node
  will already update the permissions of the new node (which was the
  file child before). Should there really be some issue, instead of
  having a drop state, this could also be just based off the fact
  whether there is still a backing child.

  Dumping the cumulative (shared) permissions for the BDS with a debug
  print yields the same values after this patch and with QEMU 8.1,
  namely 3 and 5.

* PBS block driver: compile unconditionally. Proxmox VE always needs
  it and something in the build process changed to make it not enabled
  by default. Probably would need to move the build option to meson
  otherwise.

* backup: job unreferencing during cleanup needs to happen outside of
  coroutine, so it was moved to before invoking the clean

* mirror: Cherry-pick stable fix to avoid potential deadlock.

* savevm-async: migrate_init now can fail, so propagate potential
  error.

* savevm-async: compression counters are not accessible outside
  migration/ram-compress now, so drop code that prophylactically set
  it to zero.

[0]: https://lore.kernel.org/qemu-devel/220be383-3b0d-4938-b584-69ad214e5d5d@proxmox.com/
[1]: https://lore.kernel.org/qemu-devel/e13b488e-bf13-44f2-acca-e724d14f43fd@proxmox.com/

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
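
The commit message mentions that local variable shadowing is now rejected by a compiler flag. As a rough illustration only, and assuming the flag in question is GCC's `-Wshadow=local` (the message does not name it), the kind of adaptation involved can be sketched as follows. The function `count_nonzero` is hypothetical and is not taken from vma.c.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, not code from vma.c: counts the non-zero bytes in
 * a buffer. It only illustrates the rename needed when an inner local
 * would otherwise shadow an outer one.
 */
static int count_nonzero(const unsigned char *buf, size_t len)
{
    int ret = 0;

    assert(buf != NULL || len == 0);

    for (size_t i = 0; i < len; i++) {
        /*
         * Declaring `int ret = buf[i] != 0;` here would shadow the outer
         * `ret`, which a shadowing warning turned into an error rejects.
         * Giving the inner variable its own name avoids that.
         */
        int is_set = buf[i] != 0;

        ret += is_set;
    }

    return ret;
}
```

Renaming the inner variable keeps the behavior unchanged and makes it obvious which value the final `return` refers to.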
329 lines
11 KiB
Diff
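
The qemu-img.c part of the patch shown here picks the number of bytes to copy in a fixed order: an explicit `osize` wins, otherwise the length of the input image is used, otherwise `count * bs`, and with none of these available the command fails. A given `count` additionally clamps the input length, but never an explicit `osize`. The following is a minimal standalone sketch of that order. The names `dd_size_req` and `dd_resolve_size` are hypothetical and do not exist in QEMU, and the overflow check in the stdin branch is an addition of this sketch, not something the patch does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the C_* flag bits and the DdInfo/DdIo fields. */
struct dd_size_req {
    bool have_osize;    /* osize= was given */
    bool have_input;    /* if= was given, so the input length is known */
    bool have_count;    /* count= was given */
    int64_t osize;      /* value of osize= */
    int64_t input_len;  /* what blk_getlength() would report for the input */
    int64_t count;      /* value of count= */
    int64_t bsz;        /* block size, must be positive */
};

/* Returns the number of bytes to copy, or -1 if it cannot be determined. */
static int64_t dd_resolve_size(const struct dd_size_req *r)
{
    int64_t size;

    assert(r->bsz > 0);

    if (r->have_osize) {
        /* Explicit override: count does not clamp it. */
        return r->osize;
    }

    if (r->have_input) {
        if (r->input_len < 0) {
            return -1;  /* querying the input length failed */
        }
        size = r->input_len;
    } else if (r->have_count) {
        /* Reading from stdin: count * bs is the only size hint left. */
        if (r->count > INT64_MAX / r->bsz) {
            return -1;  /* sketch-only overflow guard */
        }
        return r->count * r->bsz;
    } else {
        return -1;      /* stdin without osize or count */
    }

    /* A given count limits how much of the input image is copied. */
    if (r->have_count && r->count <= INT64_MAX / r->bsz &&
        r->count * r->bsz < size) {
        size = r->count * r->bsz;
    }

    return size;
}
```

For example, a request with `osize=1000`, `count=1` and `bs=512` resolves to 1000 bytes, while the same `count` and `bs` against a 4096-byte input image without `osize` resolves to 512 bytes.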
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Wolfgang Bumiller <w.bumiller@proxmox.com>
Date: Mon, 6 Apr 2020 12:16:40 +0200
Subject: [PATCH] PVE: [Up] qemu-img dd: add osize and read from/to
 stdin/stdout

Neither convert nor dd were previously able to write to or
read from a pipe. Particularly serializing an image file
into a raw stream or vice versa can be useful, but using
`qemu-img convert -f qcow2 -O raw foo.qcow2 /dev/stdout` in
a pipe will fail trying to seek.

While dd and convert have overlapping use cases, `dd` is a
simple read/write loop while convert is much more
sophisticated and has ways of dealing with holes and blocks
of zeroes.
Since these typically can't be detected in pipes via
SEEK_DATA/HOLE or skipped while writing, dd seems to be the
better choice for implementing stdin/stdout streams.

This patch causes "if" and "of" to default to stdin and
stdout respectively, allowing only the "raw" format to be
used in these cases.
Since the input can now be a pipe, we have no way of
detecting the size of the output image to create. Since we
also want to support images with a size not matching the
dd command's "bs" parameter (which, together with "count",
could be used to calculate the desired size, and is already
used to limit it), the "osize" option is added to explicitly
override the output file's size.

Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
 qemu-img-cmds.hx |   4 +-
 qemu-img.c       | 202 ++++++++++++++++++++++++++++++-----------------
 2 files changed, 133 insertions(+), 73 deletions(-)

diff --git a/qemu-img-cmds.hx b/qemu-img-cmds.hx
index 068692d13e..73e0bb1d2c 100644
--- a/qemu-img-cmds.hx
+++ b/qemu-img-cmds.hx
@@ -58,9 +58,9 @@ SRST
 ERST
 
 DEF("dd", img_dd,
-    "dd [--image-opts] [-U] [-f fmt] [-O output_fmt] [bs=block_size] [count=blocks] [skip=blocks] if=input of=output")
+    "dd [--image-opts] [-U] [-f fmt] [-O output_fmt] [bs=block_size] [count=blocks] [skip=blocks] [osize=output_size] if=input of=output")
 SRST
-.. option:: dd [--image-opts] [-U] [-f FMT] [-O OUTPUT_FMT] [bs=BLOCK_SIZE] [count=BLOCKS] [skip=BLOCKS] if=INPUT of=OUTPUT
+.. option:: dd [--image-opts] [-U] [-f FMT] [-O OUTPUT_FMT] [bs=BLOCK_SIZE] [count=BLOCKS] [skip=BLOCKS] [osize=OUTPUT_SIZE] if=INPUT of=OUTPUT
 ERST
 
 DEF("info", img_info,
diff --git a/qemu-img.c b/qemu-img.c
index de51233825..ad770f6570 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -4997,10 +4997,12 @@ static int img_bitmap(int argc, char **argv)
 #define C_IF      04
 #define C_OF      010
 #define C_SKIP    020
+#define C_OSIZE   040
 
 struct DdInfo {
     unsigned int flags;
     int64_t count;
+    int64_t osize;
 };
 
 struct DdIo {
@@ -5076,6 +5078,19 @@ static int img_dd_skip(const char *arg,
     return 0;
 }
 
+static int img_dd_osize(const char *arg,
+                        struct DdIo *in, struct DdIo *out,
+                        struct DdInfo *dd)
+{
+    dd->osize = cvtnum("size", arg);
+
+    if (dd->osize < 0) {
+        return 1;
+    }
+
+    return 0;
+}
+
 static int img_dd(int argc, char **argv)
 {
     int ret = 0;
@@ -5116,6 +5131,7 @@ static int img_dd(int argc, char **argv)
         { "if", img_dd_if, C_IF },
         { "of", img_dd_of, C_OF },
         { "skip", img_dd_skip, C_SKIP },
+        { "osize", img_dd_osize, C_OSIZE },
         { NULL, NULL, 0 }
     };
     const struct option long_options[] = {
@@ -5191,91 +5207,112 @@ static int img_dd(int argc, char **argv)
         arg = NULL;
     }
 
-    if (!(dd.flags & C_IF && dd.flags & C_OF)) {
-        error_report("Must specify both input and output files");
+    if (!(dd.flags & C_IF) && (!fmt || strcmp(fmt, "raw") != 0)) {
+        error_report("Input format must be raw when reading from stdin");
         ret = -1;
         goto out;
     }
-
-    blk1 = img_open(image_opts, in.filename, fmt, 0, false, false,
-                    force_share);
-
-    if (!blk1) {
+    if (!(dd.flags & C_OF) && strcmp(out_fmt, "raw") != 0) {
+        error_report("Output format must be raw when writing to stdout");
         ret = -1;
         goto out;
     }
 
-    drv = bdrv_find_format(out_fmt);
-    if (!drv) {
-        error_report("Unknown file format");
-        ret = -1;
-        goto out;
-    }
-    proto_drv = bdrv_find_protocol(out.filename, true, &local_err);
+    if (dd.flags & C_IF) {
+        blk1 = img_open(image_opts, in.filename, fmt, 0, false, false,
+                        force_share);
 
-    if (!proto_drv) {
-        error_report_err(local_err);
-        ret = -1;
-        goto out;
-    }
-    if (!drv->create_opts) {
-        error_report("Format driver '%s' does not support image creation",
-                     drv->format_name);
-        ret = -1;
-        goto out;
-    }
-    if (!proto_drv->create_opts) {
-        error_report("Protocol driver '%s' does not support image creation",
-                     proto_drv->format_name);
-        ret = -1;
-        goto out;
+        if (!blk1) {
+            ret = -1;
+            goto out;
+        }
     }
-    create_opts = qemu_opts_append(create_opts, drv->create_opts);
-    create_opts = qemu_opts_append(create_opts, proto_drv->create_opts);
-
-    opts = qemu_opts_create(create_opts, NULL, 0, &error_abort);
 
-    size = blk_getlength(blk1);
-    if (size < 0) {
-        error_report("Failed to get size for '%s'", in.filename);
+    if (dd.flags & C_OSIZE) {
+        size = dd.osize;
+    } else if (dd.flags & C_IF) {
+        size = blk_getlength(blk1);
+        if (size < 0) {
+            error_report("Failed to get size for '%s'", in.filename);
+            ret = -1;
+            goto out;
+        }
+    } else if (dd.flags & C_COUNT) {
+        size = dd.count * in.bsz;
+    } else {
+        error_report("Output size must be known when reading from stdin");
         ret = -1;
         goto out;
     }
 
-    if (dd.flags & C_COUNT && dd.count <= INT64_MAX / in.bsz &&
+    if (!(dd.flags & C_OSIZE) && dd.flags & C_COUNT && dd.count <= INT64_MAX / in.bsz &&
         dd.count * in.bsz < size) {
         size = dd.count * in.bsz;
     }
 
-    /* Overflow means the specified offset is beyond input image's size */
-    if (dd.flags & C_SKIP && (in.offset > INT64_MAX / in.bsz ||
-                              size < in.bsz * in.offset)) {
-        qemu_opt_set_number(opts, BLOCK_OPT_SIZE, 0, &error_abort);
-    } else {
-        qemu_opt_set_number(opts, BLOCK_OPT_SIZE,
-                            size - in.bsz * in.offset, &error_abort);
-    }
+    if (dd.flags & C_OF) {
+        drv = bdrv_find_format(out_fmt);
+        if (!drv) {
+            error_report("Unknown file format");
+            ret = -1;
+            goto out;
+        }
+        proto_drv = bdrv_find_protocol(out.filename, true, &local_err);
 
-    ret = bdrv_create(drv, out.filename, opts, &local_err);
-    if (ret < 0) {
-        error_reportf_err(local_err,
-                          "%s: error while creating output image: ",
-                          out.filename);
-        ret = -1;
-        goto out;
-    }
+        if (!proto_drv) {
+            error_report_err(local_err);
+            ret = -1;
+            goto out;
+        }
+        if (!drv->create_opts) {
+            error_report("Format driver '%s' does not support image creation",
+                         drv->format_name);
+            ret = -1;
+            goto out;
+        }
+        if (!proto_drv->create_opts) {
+            error_report("Protocol driver '%s' does not support image creation",
+                         proto_drv->format_name);
+            ret = -1;
+            goto out;
+        }
+        create_opts = qemu_opts_append(create_opts, drv->create_opts);
+        create_opts = qemu_opts_append(create_opts, proto_drv->create_opts);
 
-    /* TODO, we can't honour --image-opts for the target,
-     * since it needs to be given in a format compatible
-     * with the bdrv_create() call above which does not
-     * support image-opts style.
-     */
-    blk2 = img_open_file(out.filename, NULL, out_fmt, BDRV_O_RDWR,
-                         false, false, false);
+        opts = qemu_opts_create(create_opts, NULL, 0, &error_abort);
 
-    if (!blk2) {
-        ret = -1;
-        goto out;
+        /* Overflow means the specified offset is beyond input image's size */
+        if (dd.flags & C_OSIZE) {
+            qemu_opt_set_number(opts, BLOCK_OPT_SIZE, size, &error_abort);
+        } else if (dd.flags & C_SKIP && (in.offset > INT64_MAX / in.bsz ||
+                                         size < in.bsz * in.offset)) {
+            qemu_opt_set_number(opts, BLOCK_OPT_SIZE, 0, &error_abort);
+        } else {
+            qemu_opt_set_number(opts, BLOCK_OPT_SIZE,
+                                size - in.bsz * in.offset, &error_abort);
+        }
+
+        ret = bdrv_create(drv, out.filename, opts, &local_err);
+        if (ret < 0) {
+            error_reportf_err(local_err,
+                              "%s: error while creating output image: ",
+                              out.filename);
+            ret = -1;
+            goto out;
+        }
+
+        /* TODO, we can't honour --image-opts for the target,
+         * since it needs to be given in a format compatible
+         * with the bdrv_create() call above which does not
+         * support image-opts style.
+         */
+        blk2 = img_open_file(out.filename, NULL, out_fmt, BDRV_O_RDWR,
+                             false, false, false);
+
+        if (!blk2) {
+            ret = -1;
+            goto out;
+        }
     }
 
     if (dd.flags & C_SKIP && (in.offset > INT64_MAX / in.bsz ||
@@ -5292,20 +5329,43 @@ static int img_dd(int argc, char **argv)
     in.buf = g_new(uint8_t, in.bsz);
 
     for (out_pos = 0; in_pos < size; ) {
+        int in_ret, out_ret;
         int bytes = (in_pos + in.bsz > size) ? size - in_pos : in.bsz;
-
-        ret = blk_pread(blk1, in_pos, bytes, in.buf, 0);
-        if (ret < 0) {
+        if (blk1) {
+            in_ret = blk_pread(blk1, in_pos, bytes, in.buf, 0);
+            if (in_ret == 0) {
+                in_ret = bytes;
+            }
+        } else {
+            in_ret = read(STDIN_FILENO, in.buf, bytes);
+            if (in_ret == 0) {
+                /* early EOF is considered an error */
+                error_report("Input ended unexpectedly");
+                ret = -1;
+                goto out;
+            }
+        }
+        if (in_ret < 0) {
             error_report("error while reading from input image file: %s",
-                         strerror(-ret));
+                         strerror(-in_ret));
+            ret = -1;
             goto out;
         }
         in_pos += bytes;
 
-        ret = blk_pwrite(blk2, out_pos, bytes, in.buf, 0);
-        if (ret < 0) {
+        if (blk2) {
+            out_ret = blk_pwrite(blk2, out_pos, in_ret, in.buf, 0);
+            if (out_ret == 0) {
+                out_ret = in_ret;
+            }
+        } else {
+            out_ret = write(STDOUT_FILENO, in.buf, in_ret);
+        }
+
+        if (out_ret != in_ret) {
             error_report("error while writing to output image file: %s",
-                         strerror(-ret));
+                         strerror(-out_ret));
+            ret = -1;
             goto out;
         }
         out_pos += bytes;