From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Stefan Reiter <s.reiter@proxmox.com>
Date: Mon, 7 Dec 2020 15:21:03 +0100
Subject: [PATCH] block: add alloc-track driver
Add a new filter node 'alloc-track', which separates reads and writes to
different children, thus allowing one to put a backing image behind any
blockdev (regardless of driver support). Since we can't detect any
pre-allocated blocks, we can only track new writes, hence the write
target ('file') for this node must always be empty.

The intended use case is live restore: add a backup image as a block
device to a VM, put an alloc-track node on the restore target and set
the backup as its backing image. With this, a regular 'block-stream'
job can restore the image while the VM is already running in the
background. Copy-on-read helps make progress as the VM reads as well.

Previously, this only worked if the target supported backing images,
i.e. only for qcow2; with alloc-track, any driver can be used for the
target.

If 'auto-remove' is set, alloc-track will automatically detach itself
once the backing image is removed. It will be replaced by 'file'.

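For illustration, a rough sketch of how such a node might be wired up
for a live restore follows. The exact option spelling, the path and the
names ('backup0', 'drive-scsi0') are assumptions for this example; only
the 'alloc-track' driver name, its 'file' and 'backing' children and
the 'auto-remove' option are defined by the code below:

    # hypothetical: open the empty restore target through alloc-track,
    # with an already-added backup node 'backup0' as backing
    -drive file=/path/to/restore-target.raw,if=none,id=drive-scsi0,\
        format=alloc-track,backing=backup0,auto-remove=on

    # stream the backing image into 'file'; once the job completes and
    # the backing link is dropped, auto-remove replaces this node by 'file'
    { "execute": "block-stream", "arguments": { "device": "drive-scsi0" } }
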
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
[FE: adapt to changed function signatures
     make error return value consistent with QEMU
     avoid premature break during read]
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
block/alloc-track.c | 352 ++++++++++++++++++++++++++++++++++++++++++++
block/meson.build | 1 +
2 files changed, 353 insertions(+)
create mode 100644 block/alloc-track.c
diff --git a/block/alloc-track.c b/block/alloc-track.c
new file mode 100644
index 0000000000..b75d7c6460
--- /dev/null
+++ b/block/alloc-track.c
@@ -0,0 +1,352 @@
+/*
+ * Node to allow backing images to be applied to any node. Assumes a blank
+ * image to begin with, only new writes are tracked as allocated, thus this
+ * must never be put on a node that already contains data.
+ *
+ * Copyright (c) 2020 Proxmox Server Solutions GmbH
+ * Copyright (c) 2020 Stefan Reiter <s.reiter@proxmox.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "block/block_int.h"
+#include "block/dirty-bitmap.h"
+#include "qapi/qmp/qdict.h"
+#include "qapi/qmp/qstring.h"
+#include "qemu/cutils.h"
+#include "qemu/option.h"
+#include "qemu/module.h"
+#include "sysemu/block-backend.h"
+
+#define TRACK_OPT_AUTO_REMOVE "auto-remove"
+
+typedef enum DropState {
+    DropNone,
+    DropRequested,
+    DropInProgress,
+} DropState;
+
+typedef struct {
+    BdrvDirtyBitmap *bitmap;
+    DropState drop_state;
+    bool auto_remove;
+} BDRVAllocTrackState;
+
+static QemuOptsList runtime_opts = {
+    .name = "alloc-track",
+    .head = QTAILQ_HEAD_INITIALIZER(runtime_opts.head),
+    .desc = {
+        {
+            .name = TRACK_OPT_AUTO_REMOVE,
+            .type = QEMU_OPT_BOOL,
+            .help = "automatically replace this node with 'file' when 'backing'"
+                    "is detached",
+        },
+        { /* end of list */ }
+    },
+};
+
+static void track_refresh_limits(BlockDriverState *bs, Error **errp)
+{
+    BlockDriverInfo bdi;
+
+    if (!bs->file) {
+        return;
+    }
+
+    /* always use alignment from underlying write device so RMW cycle for
+     * bdrv_pwritev reads data from our backing via track_co_preadv (no partial
+     * cluster allocation in 'file') */
+    bdrv_get_info(bs->file->bs, &bdi);
+    bs->bl.request_alignment = MAX(bs->file->bs->bl.request_alignment,
+                                   MAX(bdi.cluster_size, BDRV_SECTOR_SIZE));
+}
+
+static int track_open(BlockDriverState *bs, QDict *options, int flags,
+                      Error **errp)
+{
+    BDRVAllocTrackState *s = bs->opaque;
+    QemuOpts *opts;
+    Error *local_err = NULL;
+    int ret = 0;
+
+    opts = qemu_opts_create(&runtime_opts, NULL, 0, &error_abort);
+    qemu_opts_absorb_qdict(opts, options, &local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+        ret = -EINVAL;
+        goto fail;
+    }
+
+    s->auto_remove = qemu_opt_get_bool(opts, TRACK_OPT_AUTO_REMOVE, false);
+
+    /* open the target (write) node, backing will be attached by block layer */
+    bs->file = bdrv_open_child(NULL, options, "file", bs, &child_of_bds,
+                               BDRV_CHILD_DATA | BDRV_CHILD_METADATA, false,
+                               &local_err);
+    if (local_err) {
+        ret = -EINVAL;
+        error_propagate(errp, local_err);
+        goto fail;
+    }
+
+    track_refresh_limits(bs, errp);
+    uint64_t gran = bs->bl.request_alignment;
+    s->bitmap = bdrv_create_dirty_bitmap(bs->file->bs, gran, NULL, &local_err);
+    if (local_err) {
+        ret = -EIO;
+        error_propagate(errp, local_err);
+        goto fail;
+    }
+
+    s->drop_state = DropNone;
+
+fail:
+    if (ret < 0) {
+        bdrv_unref_child(bs, bs->file);
+        if (s->bitmap) {
+            bdrv_release_dirty_bitmap(s->bitmap);
+        }
+    }
+    qemu_opts_del(opts);
+    return ret;
+}
+
+static void track_close(BlockDriverState *bs)
+{
+    BDRVAllocTrackState *s = bs->opaque;
+    if (s->bitmap) {
+        bdrv_release_dirty_bitmap(s->bitmap);
+    }
+}
+
+static coroutine_fn int64_t track_co_getlength(BlockDriverState *bs)
+{
+    return bdrv_co_getlength(bs->file->bs);
+}
+
+static int coroutine_fn track_co_preadv(BlockDriverState *bs,
+    int64_t offset, int64_t bytes, QEMUIOVector *qiov, BdrvRequestFlags flags)
+{
+    BDRVAllocTrackState *s = bs->opaque;
+    QEMUIOVector local_qiov;
+    int ret;
+
+    /* 'cur_offset' is relative to 'offset', 'local_offset' to image start */
+    uint64_t cur_offset, local_offset;
+    int64_t local_bytes;
+    bool alloc;
+
+    if (offset < 0 || bytes < 0) {
+        fprintf(stderr, "unexpected negative 'offset' or 'bytes' value!\n");
+        return -EIO;
+    }
+
+    /* a read request can span multiple granularity-sized chunks, and can thus
+     * contain blocks with different allocation status - we could just iterate
+     * granularity-wise, but for better performance use bdrv_dirty_bitmap_next_X
+     * to find the next flip and consider everything up to that in one go */
+    for (cur_offset = 0; cur_offset < bytes; cur_offset += local_bytes) {
+        local_offset = offset + cur_offset;
+        alloc = bdrv_dirty_bitmap_get(s->bitmap, local_offset);
+        if (alloc) {
+            local_bytes = bdrv_dirty_bitmap_next_zero(s->bitmap, local_offset,
+                                                      bytes - cur_offset);
+        } else {
+            local_bytes = bdrv_dirty_bitmap_next_dirty(s->bitmap, local_offset,
+                                                       bytes - cur_offset);
+        }
+
+        /* _bitmap_next_X return is -1 if no end found within limit, otherwise
+         * offset of next flip (to start of image) */
+        local_bytes = local_bytes < 0 ?
+            bytes - cur_offset :
+            local_bytes - local_offset;
+
+        qemu_iovec_init_slice(&local_qiov, qiov, cur_offset, local_bytes);
+
+        if (alloc) {
+            ret = bdrv_co_preadv(bs->file, local_offset, local_bytes,
+                                 &local_qiov, flags);
+        } else if (bs->backing) {
+            ret = bdrv_co_preadv(bs->backing, local_offset, local_bytes,
+                                 &local_qiov, flags);
+        } else {
+            qemu_iovec_memset(&local_qiov, cur_offset, 0, local_bytes);
+            ret = 0;
+        }
+
+        if (ret != 0) {
+            break;
+        }
+    }
+
+    return ret;
+}
+
+static int coroutine_fn track_co_pwritev(BlockDriverState *bs,
+    int64_t offset, int64_t bytes, QEMUIOVector *qiov, BdrvRequestFlags flags)
+{
+    return bdrv_co_pwritev(bs->file, offset, bytes, qiov, flags);
+}
+
+static int coroutine_fn track_co_pwrite_zeroes(BlockDriverState *bs,
+    int64_t offset, int64_t bytes, BdrvRequestFlags flags)
+{
+    return bdrv_co_pwrite_zeroes(bs->file, offset, bytes, flags);
+}
+
+static int coroutine_fn track_co_pdiscard(BlockDriverState *bs,
+    int64_t offset, int64_t bytes)
+{
+    return bdrv_co_pdiscard(bs->file, offset, bytes);
+}
+
+static coroutine_fn int track_co_flush(BlockDriverState *bs)
+{
+    return bdrv_co_flush(bs->file->bs);
+}
+
+static int coroutine_fn track_co_block_status(BlockDriverState *bs,
+                                              bool want_zero,
+                                              int64_t offset,
+                                              int64_t bytes,
+                                              int64_t *pnum,
+                                              int64_t *map,
+                                              BlockDriverState **file)
+{
+    BDRVAllocTrackState *s = bs->opaque;
+
+    bool alloc = bdrv_dirty_bitmap_get(s->bitmap, offset);
+    int64_t next_flipped;
+    if (alloc) {
+        next_flipped = bdrv_dirty_bitmap_next_zero(s->bitmap, offset, bytes);
+    } else {
+        next_flipped = bdrv_dirty_bitmap_next_dirty(s->bitmap, offset, bytes);
+    }
+
+    /* in case not the entire region has the same state, we need to set pnum to
+     * indicate for how many bytes our result is valid */
+    *pnum = next_flipped == -1 ? bytes : next_flipped - offset;
+    *map = offset;
+
+    if (alloc) {
+        *file = bs->file->bs;
+        return BDRV_BLOCK_RAW | BDRV_BLOCK_OFFSET_VALID;
+    } else if (bs->backing) {
+        *file = bs->backing->bs;
+    }
+    return 0;
+}
+
+static void track_child_perm(BlockDriverState *bs, BdrvChild *c,
+                             BdrvChildRole role, BlockReopenQueue *reopen_queue,
+                             uint64_t perm, uint64_t shared,
+                             uint64_t *nperm, uint64_t *nshared)
+{
+    BDRVAllocTrackState *s = bs->opaque;
+
+    *nshared = BLK_PERM_ALL;
+
+    /* in case we're currently dropping ourselves, claim to not use any
+     * permissions at all - which is fine, since from this point on we will
+     * never issue a read or write anymore */
+    if (s->drop_state == DropInProgress) {
+        *nperm = 0;
+        return;
+    }
+
+    if (role & BDRV_CHILD_DATA) {
+        *nperm = perm & DEFAULT_PERM_PASSTHROUGH;
+    } else {
+        /* 'backing' is also a child of our BDS, but we don't expect it to be
+         * writeable, so we only forward 'consistent read' */
+        *nperm = perm & BLK_PERM_CONSISTENT_READ;
+    }
+}
+
+static void track_drop(void *opaque)
+{
+    BlockDriverState *bs = (BlockDriverState*)opaque;
+    BlockDriverState *file = bs->file->bs;
+    BDRVAllocTrackState *s = bs->opaque;
+
+    assert(file);
+
+    /* we rely on the fact that we're not used anywhere else, so let's wait
+     * until we're only used once - in the drive connected to the guest (and one
+     * ref is held by bdrv_ref in track_change_backing_file) */
+    if (bs->refcnt > 2) {
+        aio_bh_schedule_oneshot(qemu_get_aio_context(), track_drop, opaque);
+        return;
+    }
+    AioContext *aio_context = bdrv_get_aio_context(bs);
+    aio_context_acquire(aio_context);
+
+    bdrv_drained_begin(bs);
+
+    /* now that we're drained, we can safely set 'DropInProgress' */
+    s->drop_state = DropInProgress;
+    bdrv_child_refresh_perms(bs, bs->file, &error_abort);
+
+    bdrv_replace_node(bs, file, &error_abort);
+    bdrv_set_backing_hd(bs, NULL, &error_abort);
+    bdrv_drained_end(bs);
+    bdrv_unref(bs);
+    aio_context_release(aio_context);
+}
+
+static int track_change_backing_file(BlockDriverState *bs,
+                                     const char *backing_file,
+                                     const char *backing_fmt)
+{
+    BDRVAllocTrackState *s = bs->opaque;
+    if (s->auto_remove && s->drop_state == DropNone &&
+        backing_file == NULL && backing_fmt == NULL)
+    {
+        /* backing file has been disconnected, there's no longer any use for
+         * this node, so let's remove ourselves from the block graph - we need
+         * to schedule this for later however, since when this function is
+         * called, the blockjob modifying us is probably not done yet and has a
+         * blocker on 'bs' */
+        s->drop_state = DropRequested;
+        bdrv_ref(bs);
+        aio_bh_schedule_oneshot(qemu_get_aio_context(), track_drop, (void*)bs);
+    }
+
+    return 0;
+}
+
+static BlockDriver bdrv_alloc_track = {
+    .format_name = "alloc-track",
+    .instance_size = sizeof(BDRVAllocTrackState),
+
+    .bdrv_file_open = track_open,
+    .bdrv_close = track_close,
+    .bdrv_co_getlength = track_co_getlength,
+    .bdrv_child_perm = track_child_perm,
+    .bdrv_refresh_limits = track_refresh_limits,
+
+    .bdrv_co_pwrite_zeroes = track_co_pwrite_zeroes,
+    .bdrv_co_pwritev = track_co_pwritev,
+    .bdrv_co_preadv = track_co_preadv,
+    .bdrv_co_pdiscard = track_co_pdiscard,
+
+    .bdrv_co_flush = track_co_flush,
+    .bdrv_co_flush_to_disk = track_co_flush,
+
+    .supports_backing = true,
+
+    .bdrv_co_block_status = track_co_block_status,
+    .bdrv_change_backing_file = track_change_backing_file,
+};
+
+static void bdrv_alloc_track_init(void)
+{
+    bdrv_register(&bdrv_alloc_track);
+}
+
+block_init(bdrv_alloc_track_init);
diff --git a/block/meson.build b/block/meson.build
index eece0d5743..8a68162cc0 100644
--- a/block/meson.build
+++ b/block/meson.build
@@ -2,6 +2,7 @@ block_ss.add(genh)
 block_ss.add(files(
   'accounting.c',
   'aio_task.c',
+  'alloc-track.c',
   'amend.c',
   'backup.c',
   'backup-dump.c',