2022-05-17 11:29:49 +03:00
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Fabian Ebner <f.ebner@proxmox.com>
Date: Tue, 17 May 2022 09:46:02 +0200
Subject: [PATCH] Revert "block/rbd: implement bdrv_co_block_status"

During backup, bdrv_co_block_status is called for each block copy
chunk. When RBD is used, the current implementation with
rbd_diff_iterate2() using whole_object=true takes time that grows
roughly linearly with the image size. Since the number of chunks also
grows linearly with the image size, the slowdown is quadratic,
becoming unacceptable for large
images (starting somewhere between 500-1000 GiB in my testing).
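
As a rough back-of-the-envelope model of the scaling described above
(the symbols S, c and k are illustrative assumptions, not measured
values): let S be the image size, c the block copy chunk size, and
k * S the approximate cost of a single bdrv_co_block_status call.

```latex
\begin{align*}
N_{\text{calls}} &= \frac{S}{c}, \\
T_{\text{call}}  &\approx k \cdot S, \\
T_{\text{total}} &\approx N_{\text{calls}} \cdot T_{\text{call}}
                  = \frac{k}{c}\, S^{2}.
\end{align*}
```

Under this simplified model, doubling the image size roughly
quadruples the total time spent in block status queries, for example
when going from 500 GiB to 1000 GiB.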

This reverts commit 0347a8fd4c3faaedf119be04c197804be40a384b as a
stop-gap measure, until it's clear how to make the implementation
more efficient.

Upstream bug report:
https://gitlab.com/qemu-project/qemu/-/issues/1026

Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
---
 block/rbd.c | 112 ----------------------------------------------------
 1 file changed, 112 deletions(-)

diff --git a/block/rbd.c b/block/rbd.c
update submodule and patches to QEMU 8.0.0
Many changes were necessary this time around:
* QAPI was changed to avoid redundant has_* variables, see commit
  44ea9d9be3 ("qapi: Start to elide redundant has_FOO in generated C")
  for details. This affected many QMP commands added by Proxmox too.
* Pending querying for migration got split into two functions, one to
  estimate, one for exact value, see commit c8df4a7aef ("migration:
  Split save_live_pending() into state_pending_*") for details. Relevant
  for savevm-async and PBS dirty bitmap.
* Some block (driver) functions got converted to coroutines, so the
  Proxmox block drivers needed to be adapted.
* Alloc track auto-detaching during PBS live restore got broken by
  AioContext-related changes resulting in a deadlock. The current, hacky
  method was replaced by a simpler one. Stefan apparently ran into a
  problem with that when he wrote the driver, but there were
  improvements in the stream job code since then and I didn't manage to
  reproduce the issue. It's a separate patch "alloc-track: fix deadlock
  during drop" for now, you can find the details there.
* Async snapshot-related changes:
  - The pending querying got adapted to the above-mentioned split and
    a patch is added to optimize it/make it more similar to what
    upstream code does.
  - Added initialization of the compression counters (for
    future-proofing).
  - It's necessary to hold the BQL (big QEMU lock = iothread mutex)
    during the setup phase, because block layer functions are used there
    and not doing so leads to racy, hard-to-debug crashes or hangs. It's
    necessary to change some upstream code too for this, a version of
    the patch "migration: for snapshots, hold the BQL during setup
    callbacks" is intended to be upstreamed.
  - Need to take the bdrv graph read lock before flushing.
* hmp_info_balloon was moved to a different file.
* Needed to include new headers from time to time to still get the
  correct functions.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2023-05-15 16:39:53 +03:00
index 0913a0af39..1dab254517 100644
--- a/block/rbd.c
+++ b/block/rbd.c
@@ -108,12 +108,6 @@ typedef struct RBDTask {
     int64_t ret;
 } RBDTask;
 
-typedef struct RBDDiffIterateReq {
-    uint64_t offs;
-    uint64_t bytes;
-    bool exists;
-} RBDDiffIterateReq;
-
 static int qemu_rbd_connect(rados_t *cluster, rados_ioctx_t *io_ctx,
                             BlockdevOptionsRbd *opts, bool cache,
                             const char *keypairs, const char *secretid,
@@ -1456,111 +1450,6 @@ static ImageInfoSpecific *qemu_rbd_get_specific_info(BlockDriverState *bs,
     return spec_info;
 }
 
-/*
- * rbd_diff_iterate2 allows to interrupt the exection by returning a negative
- * value in the callback routine. Choose a value that does not conflict with
- * an existing exitcode and return it if we want to prematurely stop the
- * execution because we detected a change in the allocation status.
- */
-#define QEMU_RBD_EXIT_DIFF_ITERATE2 -9000
-
-static int qemu_rbd_diff_iterate_cb(uint64_t offs, size_t len,
-                                    int exists, void *opaque)
-{
-    RBDDiffIterateReq *req = opaque;
-
-    assert(req->offs + req->bytes <= offs);
-    /*
-     * we do not diff against a snapshot so we should never receive a callback
-     * for a hole.
-     */
-    assert(exists);
-
-    if (!req->exists && offs > req->offs) {
-        /*
-         * we started in an unallocated area and hit the first allocated
-         * block. req->bytes must be set to the length of the unallocated area
-         * before the allocated area. stop further processing.
-         */
-        req->bytes = offs - req->offs;
-        return QEMU_RBD_EXIT_DIFF_ITERATE2;
-    }
-
-    if (req->exists && offs > req->offs + req->bytes) {
-        /*
-         * we started in an allocated area and jumped over an unallocated area,
-         * req->bytes contains the length of the allocated area before the
-         * unallocated area. stop further processing.
-         */
-        return QEMU_RBD_EXIT_DIFF_ITERATE2;
-    }
-
-    req->bytes += len;
-    req->exists = true;
-
-    return 0;
-}
-
-static int coroutine_fn qemu_rbd_co_block_status(BlockDriverState *bs,
-                                                 bool want_zero, int64_t offset,
-                                                 int64_t bytes, int64_t *pnum,
-                                                 int64_t *map,
-                                                 BlockDriverState **file)
-{
-    BDRVRBDState *s = bs->opaque;
-    int status, r;
-    RBDDiffIterateReq req = { .offs = offset };
-    uint64_t features, flags;
-
-    assert(offset + bytes <= s->image_size);
-
-    /* default to all sectors allocated */
-    status = BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID;
-    *map = offset;
-    *file = bs;
-    *pnum = bytes;
-
-    /* check if RBD image supports fast-diff */
-    r = rbd_get_features(s->image, &features);
-    if (r < 0) {
-        return status;
-    }
-    if (!(features & RBD_FEATURE_FAST_DIFF)) {
-        return status;
-    }
-
-    /* check if RBD fast-diff result is valid */
-    r = rbd_get_flags(s->image, &flags);
-    if (r < 0) {
-        return status;
-    }
-    if (flags & RBD_FLAG_FAST_DIFF_INVALID) {
-        return status;
-    }
-
-    r = rbd_diff_iterate2(s->image, NULL, offset, bytes, true, true,
-                          qemu_rbd_diff_iterate_cb, &req);
-    if (r < 0 && r != QEMU_RBD_EXIT_DIFF_ITERATE2) {
-        return status;
-    }
-    assert(req.bytes <= bytes);
-    if (!req.exists) {
-        if (r == 0) {
-            /*
-             * rbd_diff_iterate2 does not invoke callbacks for unallocated
-             * areas. This here catches the case where no callback was
-             * invoked at all (req.bytes == 0).
-             */
-            assert(req.bytes == 0);
-            req.bytes = bytes;
-        }
-        status = BDRV_BLOCK_ZERO | BDRV_BLOCK_OFFSET_VALID;
-    }
-
-    *pnum = req.bytes;
-    return status;
-}
-
 static int64_t coroutine_fn qemu_rbd_co_getlength(BlockDriverState *bs)
 {
     BDRVRBDState *s = bs->opaque;
@@ -1796,7 +1685,6 @@ static BlockDriver bdrv_rbd = {
 #ifdef LIBRBD_SUPPORTS_WRITE_ZEROES
     .bdrv_co_pwrite_zeroes  = qemu_rbd_co_pwrite_zeroes,
 #endif
-    .bdrv_co_block_status   = qemu_rbd_co_block_status,
 
     .bdrv_snapshot_create   = qemu_rbd_snap_create,
     .bdrv_snapshot_delete   = qemu_rbd_snap_remove,