2020-10-29 20:05:43 +03:00
|
|
|
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
|
|
|
|
From: Stefan Reiter <s.reiter@proxmox.com>
|
|
|
|
Date: Thu, 22 Oct 2020 17:34:18 +0200
|
|
|
|
Subject: [PATCH] PVE: Migrate dirty bitmap state via savevm
|
|
|
|
|
|
|
|
QEMU provides 'savevm' registrations as a mechanism for arbitrary state
|
|
|
|
to be migrated along with a VM. Use this to send a serialized version of
|
|
|
|
dirty bitmap state data from proxmox-backup-qemu, and restore it on the
|
|
|
|
target node.
|
|
|
|
|
|
|
|
Also add a flag to query-proxmox-support so qemu-server can determine if
|
|
|
|
safe migration is possible and makes sense.
|
|
|
|
|
|
|
|
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
|
2022-01-13 12:34:33 +03:00
|
|
|
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
|
update submodule and patches to QEMU 8.0.0
Many changes were necessary this time around:
* QAPI was changed to avoid redundant has_* variables, see commit
44ea9d9be3 ("qapi: Start to elide redundant has_FOO in generated C")
for details. This affected many QMP commands added by Proxmox too.
* Pending querying for migration got split into two functions, one to
estimate, one for exact value, see commit c8df4a7aef ("migration:
Split save_live_pending() into state_pending_*") for details. Relevant
for savevm-async and PBS dirty bitmap.
* Some block (driver) functions got converted to coroutines, so the
Proxmox block drivers needed to be adapted.
* Alloc track auto-detaching during PBS live restore got broken by
AioContext-related changes resulting in a deadlock. The current, hacky
method was replaced by a simpler one. Stefan apparently ran into a
problem with that when he wrote the driver, but there were
improvements in the stream job code since then and I didn't manage to
reproduce the issue. It's a separate patch "alloc-track: fix deadlock
during drop" for now, you can find the details there.
* Async snapshot-related changes:
- The pending querying got adapted to the above-mentioned split and
a patch is added to optimize it/make it more similar to what
upstream code does.
- Added initialization of the compression counters (for
future-proofing).
- It's necessary the hold the BQL (big QEMU lock = iothread mutex)
during the setup phase, because block layer functions are used there
and not doing so leads to racy, hard-to-debug crashes or hangs. It's
necessary to change some upstream code too for this, a version of
the patch "migration: for snapshots, hold the BQL during setup
callbacks" is intended to be upstreamed.
- Need to take the bdrv graph read lock before flushing.
* hmp_info_balloon was moved to a different file.
* Needed to include a new headers from time to time to still get the
correct functions.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2023-05-15 16:39:53 +03:00
|
|
|
[FE: split up state_pending for 8.0]
|
|
|
|
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
|
2020-10-29 20:05:43 +03:00
|
|
|
---
|
2020-11-10 19:16:15 +03:00
|
|
|
include/migration/misc.h | 3 ++
|
2021-02-11 19:11:11 +03:00
|
|
|
migration/meson.build | 2 +
|
2021-05-27 13:43:32 +03:00
|
|
|
migration/migration.c | 1 +
|
update submodule and patches to QEMU 8.0.0
Many changes were necessary this time around:
* QAPI was changed to avoid redundant has_* variables, see commit
44ea9d9be3 ("qapi: Start to elide redundant has_FOO in generated C")
for details. This affected many QMP commands added by Proxmox too.
* Pending querying for migration got split into two functions, one to
estimate, one for exact value, see commit c8df4a7aef ("migration:
Split save_live_pending() into state_pending_*") for details. Relevant
for savevm-async and PBS dirty bitmap.
* Some block (driver) functions got converted to coroutines, so the
Proxmox block drivers needed to be adapted.
* Alloc track auto-detaching during PBS live restore got broken by
AioContext-related changes resulting in a deadlock. The current, hacky
method was replaced by a simpler one. Stefan apparently ran into a
problem with that when he wrote the driver, but there were
improvements in the stream job code since then and I didn't manage to
reproduce the issue. It's a separate patch "alloc-track: fix deadlock
during drop" for now, you can find the details there.
* Async snapshot-related changes:
- The pending querying got adapted to the above-mentioned split and
a patch is added to optimize it/make it more similar to what
upstream code does.
- Added initialization of the compression counters (for
future-proofing).
- It's necessary the hold the BQL (big QEMU lock = iothread mutex)
during the setup phase, because block layer functions are used there
and not doing so leads to racy, hard-to-debug crashes or hangs. It's
necessary to change some upstream code too for this, a version of
the patch "migration: for snapshots, hold the BQL during setup
callbacks" is intended to be upstreamed.
- Need to take the bdrv graph read lock before flushing.
* hmp_info_balloon was moved to a different file.
* Needed to include a new headers from time to time to still get the
correct functions.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2023-05-15 16:39:53 +03:00
|
|
|
migration/pbs-state.c | 104 +++++++++++++++++++++++++++++++++++++++
|
2020-11-10 19:16:15 +03:00
|
|
|
pve-backup.c | 1 +
|
|
|
|
qapi/block-core.json | 6 +++
|
update submodule and patches to QEMU 8.0.0
Many changes were necessary this time around:
* QAPI was changed to avoid redundant has_* variables, see commit
44ea9d9be3 ("qapi: Start to elide redundant has_FOO in generated C")
for details. This affected many QMP commands added by Proxmox too.
* Pending querying for migration got split into two functions, one to
estimate, one for exact value, see commit c8df4a7aef ("migration:
Split save_live_pending() into state_pending_*") for details. Relevant
for savevm-async and PBS dirty bitmap.
* Some block (driver) functions got converted to coroutines, so the
Proxmox block drivers needed to be adapted.
* Alloc track auto-detaching during PBS live restore got broken by
AioContext-related changes resulting in a deadlock. The current, hacky
method was replaced by a simpler one. Stefan apparently ran into a
problem with that when he wrote the driver, but there were
improvements in the stream job code since then and I didn't manage to
reproduce the issue. It's a separate patch "alloc-track: fix deadlock
during drop" for now, you can find the details there.
* Async snapshot-related changes:
- The pending querying got adapted to the above-mentioned split and
a patch is added to optimize it/make it more similar to what
upstream code does.
- Added initialization of the compression counters (for
future-proofing).
- It's necessary the hold the BQL (big QEMU lock = iothread mutex)
during the setup phase, because block layer functions are used there
and not doing so leads to racy, hard-to-debug crashes or hangs. It's
necessary to change some upstream code too for this, a version of
the patch "migration: for snapshots, hold the BQL during setup
callbacks" is intended to be upstreamed.
- Need to take the bdrv graph read lock before flushing.
* hmp_info_balloon was moved to a different file.
* Needed to include a new headers from time to time to still get the
correct functions.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2023-05-15 16:39:53 +03:00
|
|
|
6 files changed, 117 insertions(+)
|
2020-10-29 20:05:43 +03:00
|
|
|
create mode 100644 migration/pbs-state.c
|
|
|
|
|
|
|
|
diff --git a/include/migration/misc.h b/include/migration/misc.h
|
update submodule and patches to QEMU 8.0.0
Many changes were necessary this time around:
* QAPI was changed to avoid redundant has_* variables, see commit
44ea9d9be3 ("qapi: Start to elide redundant has_FOO in generated C")
for details. This affected many QMP commands added by Proxmox too.
* Pending querying for migration got split into two functions, one to
estimate, one for exact value, see commit c8df4a7aef ("migration:
Split save_live_pending() into state_pending_*") for details. Relevant
for savevm-async and PBS dirty bitmap.
* Some block (driver) functions got converted to coroutines, so the
Proxmox block drivers needed to be adapted.
* Alloc track auto-detaching during PBS live restore got broken by
AioContext-related changes resulting in a deadlock. The current, hacky
method was replaced by a simpler one. Stefan apparently ran into a
problem with that when he wrote the driver, but there were
improvements in the stream job code since then and I didn't manage to
reproduce the issue. It's a separate patch "alloc-track: fix deadlock
during drop" for now, you can find the details there.
* Async snapshot-related changes:
- The pending querying got adapted to the above-mentioned split and
a patch is added to optimize it/make it more similar to what
upstream code does.
- Added initialization of the compression counters (for
future-proofing).
- It's necessary the hold the BQL (big QEMU lock = iothread mutex)
during the setup phase, because block layer functions are used there
and not doing so leads to racy, hard-to-debug crashes or hangs. It's
necessary to change some upstream code too for this, a version of
the patch "migration: for snapshots, hold the BQL during setup
callbacks" is intended to be upstreamed.
- Need to take the bdrv graph read lock before flushing.
* hmp_info_balloon was moved to a different file.
* Needed to include a new headers from time to time to still get the
correct functions.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2023-05-15 16:39:53 +03:00
|
|
|
index 8b49841016..78f63ca400 100644
|
2020-10-29 20:05:43 +03:00
|
|
|
--- a/include/migration/misc.h
|
|
|
|
+++ b/include/migration/misc.h
|
update submodule and patches to QEMU 8.0.0
Many changes were necessary this time around:
* QAPI was changed to avoid redundant has_* variables, see commit
44ea9d9be3 ("qapi: Start to elide redundant has_FOO in generated C")
for details. This affected many QMP commands added by Proxmox too.
* Pending querying for migration got split into two functions, one to
estimate, one for exact value, see commit c8df4a7aef ("migration:
Split save_live_pending() into state_pending_*") for details. Relevant
for savevm-async and PBS dirty bitmap.
* Some block (driver) functions got converted to coroutines, so the
Proxmox block drivers needed to be adapted.
* Alloc track auto-detaching during PBS live restore got broken by
AioContext-related changes resulting in a deadlock. The current, hacky
method was replaced by a simpler one. Stefan apparently ran into a
problem with that when he wrote the driver, but there were
improvements in the stream job code since then and I didn't manage to
reproduce the issue. It's a separate patch "alloc-track: fix deadlock
during drop" for now, you can find the details there.
* Async snapshot-related changes:
- The pending querying got adapted to the above-mentioned split and
a patch is added to optimize it/make it more similar to what
upstream code does.
- Added initialization of the compression counters (for
future-proofing).
- It's necessary the hold the BQL (big QEMU lock = iothread mutex)
during the setup phase, because block layer functions are used there
and not doing so leads to racy, hard-to-debug crashes or hangs. It's
necessary to change some upstream code too for this, a version of
the patch "migration: for snapshots, hold the BQL during setup
callbacks" is intended to be upstreamed.
- Need to take the bdrv graph read lock before flushing.
* hmp_info_balloon was moved to a different file.
* Needed to include a new headers from time to time to still get the
correct functions.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2023-05-15 16:39:53 +03:00
|
|
|
@@ -77,4 +77,7 @@ bool migration_in_bg_snapshot(void);
|
2020-10-29 20:05:43 +03:00
|
|
|
/* migration/block-dirty-bitmap.c */
|
|
|
|
void dirty_bitmap_mig_init(void);
|
|
|
|
|
|
|
|
+/* migration/pbs-state.c */
|
|
|
|
+void pbs_state_mig_init(void);
|
|
|
|
+
|
|
|
|
#endif
|
2021-02-11 19:11:11 +03:00
|
|
|
diff --git a/migration/meson.build b/migration/meson.build
|
update submodule and patches to QEMU 8.0.0
Many changes were necessary this time around:
* QAPI was changed to avoid redundant has_* variables, see commit
44ea9d9be3 ("qapi: Start to elide redundant has_FOO in generated C")
for details. This affected many QMP commands added by Proxmox too.
* Pending querying for migration got split into two functions, one to
estimate, one for exact value, see commit c8df4a7aef ("migration:
Split save_live_pending() into state_pending_*") for details. Relevant
for savevm-async and PBS dirty bitmap.
* Some block (driver) functions got converted to coroutines, so the
Proxmox block drivers needed to be adapted.
* Alloc track auto-detaching during PBS live restore got broken by
AioContext-related changes resulting in a deadlock. The current, hacky
method was replaced by a simpler one. Stefan apparently ran into a
problem with that when he wrote the driver, but there were
improvements in the stream job code since then and I didn't manage to
reproduce the issue. It's a separate patch "alloc-track: fix deadlock
during drop" for now, you can find the details there.
* Async snapshot-related changes:
- The pending querying got adapted to the above-mentioned split and
a patch is added to optimize it/make it more similar to what
upstream code does.
- Added initialization of the compression counters (for
future-proofing).
- It's necessary the hold the BQL (big QEMU lock = iothread mutex)
during the setup phase, because block layer functions are used there
and not doing so leads to racy, hard-to-debug crashes or hangs. It's
necessary to change some upstream code too for this, a version of
the patch "migration: for snapshots, hold the BQL during setup
callbacks" is intended to be upstreamed.
- Need to take the bdrv graph read lock before flushing.
* hmp_info_balloon was moved to a different file.
* Needed to include a new headers from time to time to still get the
correct functions.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2023-05-15 16:39:53 +03:00
|
|
|
index a7824b5266..de6a271b58 100644
|
2021-02-11 19:11:11 +03:00
|
|
|
--- a/migration/meson.build
|
|
|
|
+++ b/migration/meson.build
|
update submodule and patches to 7.1.0
Notable changes:
* The only big change is the switch to using a custom QIOChannel for
savevm-async, because the previously used QEMUFileOps was dropped.
Changes to the current implementation:
* Switch to vector based methods as required for an IO channel. For
short reads the passed-in IO vector is stuffed with zeroes at the
end, just to be sure.
* For reading: The documentation in include/io/channel.h states that
at least one byte should be read, so also error out when whe are
at the very end instead of returning 0.
* For reading: Fix off-by-one error when request goes beyond end.
The wrong code piece was:
if ((pos + size) > maxlen) {
size = maxlen - pos - 1;
}
Previously, the last byte would not be read. It's actually
possible to get a snapshot .raw file that has content all the way
up the final 512 byte (= BDRV_SECTOR_SIZE) boundary without any
trailing zero bytes (I wrote a script to do it).
Luckily, it didn't cause a real issue, because qemu_loadvm_state()
is not interested in the final (i.e. QEMU_VM_VMDESCRIPTION)
section. The buffer for reading it is simply freed up afterwards
and the function will assume that it read the whole section, even
if that's not the case.
* For writing: Make use of the generated blk_pwritev() wrapper
instead of manually wrapping the coroutine to simplify and save a
few lines.
* Adapt to changed interfaces for blk_{pread,pwrite}:
* a9262f551e ("block: Change blk_{pread,pwrite}() param order")
* 3b35d4542c ("block: Add a 'flags' param to blk_pread()")
* bf5b16fa40 ("block: Make blk_{pread,pwrite}() return 0 on success")
Those changes especially affected the qemu-img dd patches, because
the context also changed, but also some of our block drivers used
the functions.
* Drop qemu-common.h include: it got renamed after essentially
everything was moved to other headers. The only remaining user I
could find for things dropped from the header between 7.0 and 7.1
was qemu_get_vm_name() in the iscsi-initiatorname patch, but it
already includes the header to which the function was moved.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2022-10-14 15:07:13 +03:00
|
|
|
@@ -6,8 +6,10 @@ migration_files = files(
|
|
|
|
'vmstate.c',
|
2021-02-11 19:11:11 +03:00
|
|
|
'qemu-file.c',
|
2021-05-27 13:43:32 +03:00
|
|
|
'yank_functions.c',
|
2021-02-11 19:11:11 +03:00
|
|
|
+ 'pbs-state.c',
|
|
|
|
)
|
|
|
|
softmmu_ss.add(migration_files)
|
|
|
|
+softmmu_ss.add(libproxmox_backup_qemu)
|
2020-10-29 20:05:43 +03:00
|
|
|
|
2021-02-11 19:11:11 +03:00
|
|
|
softmmu_ss.add(files(
|
|
|
|
'block-dirty-bitmap.c',
|
2021-05-27 13:43:32 +03:00
|
|
|
diff --git a/migration/migration.c b/migration/migration.c
|
update submodule and patches to QEMU 8.0.0
Many changes were necessary this time around:
* QAPI was changed to avoid redundant has_* variables, see commit
44ea9d9be3 ("qapi: Start to elide redundant has_FOO in generated C")
for details. This affected many QMP commands added by Proxmox too.
* Pending querying for migration got split into two functions, one to
estimate, one for exact value, see commit c8df4a7aef ("migration:
Split save_live_pending() into state_pending_*") for details. Relevant
for savevm-async and PBS dirty bitmap.
* Some block (driver) functions got converted to coroutines, so the
Proxmox block drivers needed to be adapted.
* Alloc track auto-detaching during PBS live restore got broken by
AioContext-related changes resulting in a deadlock. The current, hacky
method was replaced by a simpler one. Stefan apparently ran into a
problem with that when he wrote the driver, but there were
improvements in the stream job code since then and I didn't manage to
reproduce the issue. It's a separate patch "alloc-track: fix deadlock
during drop" for now, you can find the details there.
* Async snapshot-related changes:
- The pending querying got adapted to the above-mentioned split and
a patch is added to optimize it/make it more similar to what
upstream code does.
- Added initialization of the compression counters (for
future-proofing).
- It's necessary the hold the BQL (big QEMU lock = iothread mutex)
during the setup phase, because block layer functions are used there
and not doing so leads to racy, hard-to-debug crashes or hangs. It's
necessary to change some upstream code too for this, a version of
the patch "migration: for snapshots, hold the BQL during setup
callbacks" is intended to be upstreamed.
- Need to take the bdrv graph read lock before flushing.
* hmp_info_balloon was moved to a different file.
* Needed to include a new headers from time to time to still get the
correct functions.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2023-05-15 16:39:53 +03:00
|
|
|
index bda4789193..c8318aa258 100644
|
2021-05-27 13:43:32 +03:00
|
|
|
--- a/migration/migration.c
|
|
|
|
+++ b/migration/migration.c
|
update submodule and patches to QEMU 8.0.0
Many changes were necessary this time around:
* QAPI was changed to avoid redundant has_* variables, see commit
44ea9d9be3 ("qapi: Start to elide redundant has_FOO in generated C")
for details. This affected many QMP commands added by Proxmox too.
* Pending querying for migration got split into two functions, one to
estimate, one for exact value, see commit c8df4a7aef ("migration:
Split save_live_pending() into state_pending_*") for details. Relevant
for savevm-async and PBS dirty bitmap.
* Some block (driver) functions got converted to coroutines, so the
Proxmox block drivers needed to be adapted.
* Alloc track auto-detaching during PBS live restore got broken by
AioContext-related changes resulting in a deadlock. The current, hacky
method was replaced by a simpler one. Stefan apparently ran into a
problem with that when he wrote the driver, but there were
improvements in the stream job code since then and I didn't manage to
reproduce the issue. It's a separate patch "alloc-track: fix deadlock
during drop" for now, you can find the details there.
* Async snapshot-related changes:
- The pending querying got adapted to the above-mentioned split and
a patch is added to optimize it/make it more similar to what
upstream code does.
- Added initialization of the compression counters (for
future-proofing).
- It's necessary the hold the BQL (big QEMU lock = iothread mutex)
during the setup phase, because block layer functions are used there
and not doing so leads to racy, hard-to-debug crashes or hangs. It's
necessary to change some upstream code too for this, a version of
the patch "migration: for snapshots, hold the BQL during setup
callbacks" is intended to be upstreamed.
- Need to take the bdrv graph read lock before flushing.
* hmp_info_balloon was moved to a different file.
* Needed to include a new headers from time to time to still get the
correct functions.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2023-05-15 16:39:53 +03:00
|
|
|
@@ -245,6 +245,7 @@ void migration_object_init(void)
|
2021-05-27 13:43:32 +03:00
|
|
|
blk_mig_init();
|
|
|
|
ram_mig_init();
|
|
|
|
dirty_bitmap_mig_init();
|
|
|
|
+ pbs_state_mig_init();
|
|
|
|
}
|
|
|
|
|
2022-02-11 12:24:33 +03:00
|
|
|
void migration_cancel(const Error *error)
|
2020-10-29 20:05:43 +03:00
|
|
|
diff --git a/migration/pbs-state.c b/migration/pbs-state.c
|
|
|
|
new file mode 100644
|
update submodule and patches to QEMU 8.0.0
Many changes were necessary this time around:
* QAPI was changed to avoid redundant has_* variables, see commit
44ea9d9be3 ("qapi: Start to elide redundant has_FOO in generated C")
for details. This affected many QMP commands added by Proxmox too.
* Pending querying for migration got split into two functions, one to
estimate, one for exact value, see commit c8df4a7aef ("migration:
Split save_live_pending() into state_pending_*") for details. Relevant
for savevm-async and PBS dirty bitmap.
* Some block (driver) functions got converted to coroutines, so the
Proxmox block drivers needed to be adapted.
* Alloc track auto-detaching during PBS live restore got broken by
AioContext-related changes resulting in a deadlock. The current, hacky
method was replaced by a simpler one. Stefan apparently ran into a
problem with that when he wrote the driver, but there were
improvements in the stream job code since then and I didn't manage to
reproduce the issue. It's a separate patch "alloc-track: fix deadlock
during drop" for now, you can find the details there.
* Async snapshot-related changes:
- The pending querying got adapted to the above-mentioned split and
a patch is added to optimize it/make it more similar to what
upstream code does.
- Added initialization of the compression counters (for
future-proofing).
- It's necessary the hold the BQL (big QEMU lock = iothread mutex)
during the setup phase, because block layer functions are used there
and not doing so leads to racy, hard-to-debug crashes or hangs. It's
necessary to change some upstream code too for this, a version of
the patch "migration: for snapshots, hold the BQL during setup
callbacks" is intended to be upstreamed.
- Need to take the bdrv graph read lock before flushing.
* hmp_info_balloon was moved to a different file.
* Needed to include a new headers from time to time to still get the
correct functions.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2023-05-15 16:39:53 +03:00
|
|
|
index 0000000000..887e998b9e
|
2020-10-29 20:05:43 +03:00
|
|
|
--- /dev/null
|
|
|
|
+++ b/migration/pbs-state.c
|
update submodule and patches to QEMU 8.0.0
Many changes were necessary this time around:
* QAPI was changed to avoid redundant has_* variables, see commit
44ea9d9be3 ("qapi: Start to elide redundant has_FOO in generated C")
for details. This affected many QMP commands added by Proxmox too.
* Pending querying for migration got split into two functions, one to
estimate, one for exact value, see commit c8df4a7aef ("migration:
Split save_live_pending() into state_pending_*") for details. Relevant
for savevm-async and PBS dirty bitmap.
* Some block (driver) functions got converted to coroutines, so the
Proxmox block drivers needed to be adapted.
* Alloc track auto-detaching during PBS live restore got broken by
AioContext-related changes resulting in a deadlock. The current, hacky
method was replaced by a simpler one. Stefan apparently ran into a
problem with that when he wrote the driver, but there were
improvements in the stream job code since then and I didn't manage to
reproduce the issue. It's a separate patch "alloc-track: fix deadlock
during drop" for now, you can find the details there.
* Async snapshot-related changes:
- The pending querying got adapted to the above-mentioned split and
a patch is added to optimize it/make it more similar to what
upstream code does.
- Added initialization of the compression counters (for
future-proofing).
- It's necessary the hold the BQL (big QEMU lock = iothread mutex)
during the setup phase, because block layer functions are used there
and not doing so leads to racy, hard-to-debug crashes or hangs. It's
necessary to change some upstream code too for this, a version of
the patch "migration: for snapshots, hold the BQL during setup
callbacks" is intended to be upstreamed.
- Need to take the bdrv graph read lock before flushing.
* hmp_info_balloon was moved to a different file.
* Needed to include a new headers from time to time to still get the
correct functions.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2023-05-15 16:39:53 +03:00
|
|
|
@@ -0,0 +1,104 @@
|
2020-10-29 20:05:43 +03:00
|
|
|
+/*
|
|
|
|
+ * PBS (dirty-bitmap) state migration
|
|
|
|
+ */
|
|
|
|
+
|
|
|
|
+#include "qemu/osdep.h"
|
|
|
|
+#include "migration/misc.h"
|
|
|
|
+#include "qemu-file.h"
|
|
|
|
+#include "migration/vmstate.h"
|
|
|
|
+#include "migration/register.h"
|
|
|
|
+#include "proxmox-backup-qemu.h"
|
|
|
|
+
|
fix dirty-bitmap state migration freeze
The idea in general is to migrate all the state, which is small for
us, in a single step once. But, QEMU only calls save state if we
return active true.
Hardcoding is-active to return true, like done initially, makes the
migration freeze, as QEMU thinks this is never done, and only stops
calling us and finishes after a few seconds.
So, add a state with an "active" boolean, set to true when
initializing a migration, and set it to false when the state was
saved.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2020-11-05 20:43:01 +03:00
|
|
|
+typedef struct PBSState {
|
|
|
|
+ bool active;
|
|
|
|
+} PBSState;
|
|
|
|
+
|
2020-11-10 19:16:15 +03:00
|
|
|
+/* state is accessed via this static variable directly, 'opaque' is NULL */
|
fix dirty-bitmap state migration freeze
The idea in general is to migrate all the state, which is small for
us, in a single step once. But, QEMU only calls save state if we
return active true.
Hardcoding is-active to return true, like done initially, makes the
migration freeze, as QEMU thinks this is never done, and only stops
calling us and finishes after a few seconds.
So, add a state with an "active" boolean, set to true when
initializing a migration, and set it to false when the state was
saved.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2020-11-05 20:43:01 +03:00
|
|
|
+static PBSState pbs_state;
|
|
|
|
+
|
update submodule and patches to QEMU 8.0.0
Many changes were necessary this time around:
* QAPI was changed to avoid redundant has_* variables, see commit
44ea9d9be3 ("qapi: Start to elide redundant has_FOO in generated C")
for details. This affected many QMP commands added by Proxmox too.
* Pending querying for migration got split into two functions, one to
estimate, one for exact value, see commit c8df4a7aef ("migration:
Split save_live_pending() into state_pending_*") for details. Relevant
for savevm-async and PBS dirty bitmap.
* Some block (driver) functions got converted to coroutines, so the
Proxmox block drivers needed to be adapted.
* Alloc track auto-detaching during PBS live restore got broken by
AioContext-related changes resulting in a deadlock. The current, hacky
method was replaced by a simpler one. Stefan apparently ran into a
problem with that when he wrote the driver, but there were
improvements in the stream job code since then and I didn't manage to
reproduce the issue. It's a separate patch "alloc-track: fix deadlock
during drop" for now, you can find the details there.
* Async snapshot-related changes:
- The pending querying got adapted to the above-mentioned split and
a patch is added to optimize it/make it more similar to what
upstream code does.
- Added initialization of the compression counters (for
future-proofing).
- It's necessary the hold the BQL (big QEMU lock = iothread mutex)
during the setup phase, because block layer functions are used there
and not doing so leads to racy, hard-to-debug crashes or hangs. It's
necessary to change some upstream code too for this, a version of
the patch "migration: for snapshots, hold the BQL during setup
callbacks" is intended to be upstreamed.
- Need to take the bdrv graph read lock before flushing.
* hmp_info_balloon was moved to a different file.
* Needed to include a new headers from time to time to still get the
correct functions.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2023-05-15 16:39:53 +03:00
|
|
|
+static void pbs_state_pending(void *opaque, uint64_t *must_precopy,
|
|
|
|
+ uint64_t *can_postcopy)
|
2020-10-29 20:05:43 +03:00
|
|
|
+{
|
|
|
|
+ /* we send everything in save_setup, so nothing is ever pending */
|
|
|
|
+}
|
|
|
|
+
|
|
|
|
+/* receive PBS state via f and deserialize, called on target */
|
|
|
|
+static int pbs_state_load(QEMUFile *f, void *opaque, int version_id)
|
|
|
|
+{
|
|
|
|
+ /* safe cast, we cannot migrate to target with less bits than source */
|
|
|
|
+ size_t buf_size = (size_t)qemu_get_be64(f);
|
|
|
|
+
|
|
|
|
+ uint8_t *buf = (uint8_t *)malloc(buf_size);
|
|
|
|
+ size_t read = qemu_get_buffer(f, buf, buf_size);
|
|
|
|
+
|
|
|
|
+ if (read < buf_size) {
|
|
|
|
+ fprintf(stderr, "error receiving PBS state: not enough data\n");
|
|
|
|
+ return -EIO;
|
|
|
|
+ }
|
|
|
|
+
|
|
|
|
+ proxmox_import_state(buf, buf_size);
|
|
|
|
+
|
|
|
|
+ free(buf);
|
|
|
|
+ return 0;
|
|
|
|
+}
|
|
|
|
+
|
|
|
|
+/* serialize PBS state and send to target via f, called on source */
|
|
|
|
+static int pbs_state_save_setup(QEMUFile *f, void *opaque)
|
|
|
|
+{
|
|
|
|
+ size_t buf_size;
|
|
|
|
+ uint8_t *buf = proxmox_export_state(&buf_size);
|
|
|
|
+
|
|
|
|
+ /* LV encoding */
|
|
|
|
+ qemu_put_be64(f, buf_size);
|
|
|
|
+ qemu_put_buffer(f, buf, buf_size);
|
|
|
|
+
|
|
|
|
+ proxmox_free_state_buf(buf);
|
fix dirty-bitmap state migration freeze
The idea in general is to migrate all the state, which is small for
us, in a single step once. But, QEMU only calls save state if we
return active true.
Hardcoding is-active to return true, like done initially, makes the
migration freeze, as QEMU thinks this is never done, and only stops
calling us and finishes after a few seconds.
So, add a state with an "active" boolean, set to true when
initializing a migration, and set it to false when the state was
saved.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2020-11-05 20:43:01 +03:00
|
|
|
+ pbs_state.active = false;
|
2020-10-29 20:05:43 +03:00
|
|
|
+ return 0;
|
|
|
|
+}
|
|
|
|
+
|
|
|
|
+static bool pbs_state_is_active(void *opaque)
|
|
|
|
+{
|
2020-11-10 19:16:15 +03:00
|
|
|
+ /* we need to return active exactly once, else .save_setup is never called,
|
|
|
|
+ * but if we'd just return true the migration doesn't make progress since
|
|
|
|
+ * it'd be waiting for us */
|
fix dirty-bitmap state migration freeze
The idea in general is to migrate all the state, which is small for
us, in a single step once. But, QEMU only calls save state if we
return active true.
Hardcoding is-active to return true, like done initially, makes the
migration freeze, as QEMU thinks this is never done, and only stops
calling us and finishes after a few seconds.
So, add a state with an "active" boolean, set to true when
initializing a migration, and set it to false when the state was
saved.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2020-11-05 20:43:01 +03:00
|
|
|
+ return pbs_state.active;
|
2020-10-29 20:05:43 +03:00
|
|
|
+}
|
|
|
|
+
|
|
|
|
+static bool pbs_state_is_active_iterate(void *opaque)
|
|
|
|
+{
|
|
|
|
+ /* we don't iterate, everything is sent in save_setup */
|
fix dirty-bitmap state migration freeze
The idea in general is to migrate all the state, which is small for
us, in a single step once. But, QEMU only calls save state if we
return active true.
Hardcoding is-active to return true, like done initially, makes the
migration freeze, as QEMU thinks this is never done, and only stops
calling us and finishes after a few seconds.
So, add a state with an "active" boolean, set to true when
initializing a migration, and set it to false when the state was
saved.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2020-11-05 20:43:01 +03:00
|
|
|
+ return pbs_state_is_active(opaque);
|
2020-10-29 20:05:43 +03:00
|
|
|
+}
|
|
|
|
+
|
|
|
|
+static bool pbs_state_has_postcopy(void *opaque)
|
|
|
|
+{
|
|
|
|
+ /* PBS state can't change during a migration (since that's blocking any
|
|
|
|
+ * potential backups), so we can copy everything before the VM is stopped */
|
|
|
|
+ return false;
|
|
|
|
+}
|
|
|
|
+
|
2020-11-10 19:16:15 +03:00
|
|
|
+static void pbs_state_save_cleanup(void *opaque)
|
|
|
|
+{
|
|
|
|
+ /* reset active after migration succeeds or fails */
|
|
|
|
+ pbs_state.active = false;
|
|
|
|
+}
|
|
|
|
+
|
2020-10-29 20:05:43 +03:00
|
|
|
+static SaveVMHandlers savevm_pbs_state_handlers = {
|
|
|
|
+ .save_setup = pbs_state_save_setup,
|
|
|
|
+ .has_postcopy = pbs_state_has_postcopy,
|
update submodule and patches to QEMU 8.0.0
Many changes were necessary this time around:
* QAPI was changed to avoid redundant has_* variables, see commit
44ea9d9be3 ("qapi: Start to elide redundant has_FOO in generated C")
for details. This affected many QMP commands added by Proxmox too.
* Pending querying for migration got split into two functions, one to
estimate, one for exact value, see commit c8df4a7aef ("migration:
Split save_live_pending() into state_pending_*") for details. Relevant
for savevm-async and PBS dirty bitmap.
* Some block (driver) functions got converted to coroutines, so the
Proxmox block drivers needed to be adapted.
* Alloc track auto-detaching during PBS live restore got broken by
AioContext-related changes resulting in a deadlock. The current, hacky
method was replaced by a simpler one. Stefan apparently ran into a
problem with that when he wrote the driver, but there were
improvements in the stream job code since then and I didn't manage to
reproduce the issue. It's a separate patch "alloc-track: fix deadlock
during drop" for now, you can find the details there.
* Async snapshot-related changes:
- The pending querying got adapted to the above-mentioned split and
a patch is added to optimize it/make it more similar to what
upstream code does.
- Added initialization of the compression counters (for
future-proofing).
- It's necessary the hold the BQL (big QEMU lock = iothread mutex)
during the setup phase, because block layer functions are used there
and not doing so leads to racy, hard-to-debug crashes or hangs. It's
necessary to change some upstream code too for this, a version of
the patch "migration: for snapshots, hold the BQL during setup
callbacks" is intended to be upstreamed.
- Need to take the bdrv graph read lock before flushing.
* hmp_info_balloon was moved to a different file.
* Needed to include a new headers from time to time to still get the
correct functions.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2023-05-15 16:39:53 +03:00
|
|
|
+ .state_pending_exact = pbs_state_pending,
|
|
|
|
+ .state_pending_estimate = pbs_state_pending,
|
2020-10-29 20:05:43 +03:00
|
|
|
+ .is_active_iterate = pbs_state_is_active_iterate,
|
|
|
|
+ .load_state = pbs_state_load,
|
|
|
|
+ .is_active = pbs_state_is_active,
|
2020-11-10 19:16:15 +03:00
|
|
|
+ .save_cleanup = pbs_state_save_cleanup,
|
2020-10-29 20:05:43 +03:00
|
|
|
+};
|
|
|
|
+
|
|
|
|
+void pbs_state_mig_init(void)
|
|
|
|
+{
|
fix dirty-bitmap state migration freeze
The idea in general is to migrate all the state, which is small for
us, in a single step once. But, QEMU only calls save state if we
return active true.
Hardcoding is-active to return true, like done initially, makes the
migration freeze, as QEMU thinks this is never done, and only stops
calling us and finishes after a few seconds.
So, add a state with an "active" boolean, set to true when
initializing a migration, and set it to false when the state was
saved.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2020-11-05 20:43:01 +03:00
|
|
|
+ pbs_state.active = true;
|
2020-10-29 20:05:43 +03:00
|
|
|
+ register_savevm_live("pbs-state", 0, 1,
|
|
|
|
+ &savevm_pbs_state_handlers,
|
2020-11-10 19:16:15 +03:00
|
|
|
+ NULL);
|
2020-10-29 20:05:43 +03:00
|
|
|
+}
|
|
|
|
diff --git a/pve-backup.c b/pve-backup.c
|
2023-05-15 16:39:54 +03:00
|
|
|
index 6d6d7708b6..e9264e5025 100644
|
2020-10-29 20:05:43 +03:00
|
|
|
--- a/pve-backup.c
|
|
|
|
+++ b/pve-backup.c
|
update submodule and patches to QEMU 8.0.0
Many changes were necessary this time around:
* QAPI was changed to avoid redundant has_* variables, see commit
44ea9d9be3 ("qapi: Start to elide redundant has_FOO in generated C")
for details. This affected many QMP commands added by Proxmox too.
* Pending querying for migration got split into two functions, one to
estimate, one for exact value, see commit c8df4a7aef ("migration:
Split save_live_pending() into state_pending_*") for details. Relevant
for savevm-async and PBS dirty bitmap.
* Some block (driver) functions got converted to coroutines, so the
Proxmox block drivers needed to be adapted.
* Alloc track auto-detaching during PBS live restore got broken by
AioContext-related changes resulting in a deadlock. The current, hacky
method was replaced by a simpler one. Stefan apparently ran into a
problem with that when he wrote the driver, but there were
improvements in the stream job code since then and I didn't manage to
reproduce the issue. It's a separate patch "alloc-track: fix deadlock
during drop" for now, you can find the details there.
* Async snapshot-related changes:
- The pending querying got adapted to the above-mentioned split and
a patch is added to optimize it/make it more similar to what
upstream code does.
- Added initialization of the compression counters (for
future-proofing).
- It's necessary the hold the BQL (big QEMU lock = iothread mutex)
during the setup phase, because block layer functions are used there
and not doing so leads to racy, hard-to-debug crashes or hangs. It's
necessary to change some upstream code too for this, a version of
the patch "migration: for snapshots, hold the BQL during setup
callbacks" is intended to be upstreamed.
- Need to take the bdrv graph read lock before flushing.
* hmp_info_balloon was moved to a different file.
* Needed to include a new headers from time to time to still get the
correct functions.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2023-05-15 16:39:53 +03:00
|
|
|
@@ -1110,6 +1110,7 @@ ProxmoxSupportStatus *qmp_query_proxmox_support(Error **errp)
|
2020-11-24 18:41:20 +03:00
|
|
|
ret->pbs_library_version = g_strdup(proxmox_backup_qemu_version());
|
2020-10-29 20:05:43 +03:00
|
|
|
ret->pbs_dirty_bitmap = true;
|
2021-03-16 19:30:22 +03:00
|
|
|
ret->pbs_dirty_bitmap_savevm = true;
|
2020-10-29 20:05:43 +03:00
|
|
|
+ ret->pbs_dirty_bitmap_migration = true;
|
2021-03-16 19:30:22 +03:00
|
|
|
ret->query_bitmap_info = true;
|
2020-10-29 20:05:43 +03:00
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
diff --git a/qapi/block-core.json b/qapi/block-core.json
|
update submodule and patches to QEMU 8.0.0
Many changes were necessary this time around:
* QAPI was changed to avoid redundant has_* variables, see commit
44ea9d9be3 ("qapi: Start to elide redundant has_FOO in generated C")
for details. This affected many QMP commands added by Proxmox too.
* Pending querying for migration got split into two functions, one to
estimate, one for exact value, see commit c8df4a7aef ("migration:
Split save_live_pending() into state_pending_*") for details. Relevant
for savevm-async and PBS dirty bitmap.
* Some block (driver) functions got converted to coroutines, so the
Proxmox block drivers needed to be adapted.
* Alloc track auto-detaching during PBS live restore got broken by
AioContext-related changes resulting in a deadlock. The current, hacky
method was replaced by a simpler one. Stefan apparently ran into a
problem with that when he wrote the driver, but there were
improvements in the stream job code since then and I didn't manage to
reproduce the issue. It's a separate patch "alloc-track: fix deadlock
during drop" for now, you can find the details there.
* Async snapshot-related changes:
- The pending querying got adapted to the above-mentioned split and
a patch is added to optimize it/make it more similar to what
upstream code does.
- Added initialization of the compression counters (for
future-proofing).
- It's necessary the hold the BQL (big QEMU lock = iothread mutex)
during the setup phase, because block layer functions are used there
and not doing so leads to racy, hard-to-debug crashes or hangs. It's
necessary to change some upstream code too for this, a version of
the patch "migration: for snapshots, hold the BQL during setup
callbacks" is intended to be upstreamed.
- Need to take the bdrv graph read lock before flushing.
* hmp_info_balloon was moved to a different file.
* Needed to include a new headers from time to time to still get the
correct functions.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2023-05-15 16:39:53 +03:00
|
|
|
index 43838212e3..e7412f6322 100644
|
2020-10-29 20:05:43 +03:00
|
|
|
--- a/qapi/block-core.json
|
|
|
|
+++ b/qapi/block-core.json
|
update submodule and patches to QEMU 8.0.0
Many changes were necessary this time around:
* QAPI was changed to avoid redundant has_* variables, see commit
44ea9d9be3 ("qapi: Start to elide redundant has_FOO in generated C")
for details. This affected many QMP commands added by Proxmox too.
* Pending querying for migration got split into two functions, one to
estimate, one for exact value, see commit c8df4a7aef ("migration:
Split save_live_pending() into state_pending_*") for details. Relevant
for savevm-async and PBS dirty bitmap.
* Some block (driver) functions got converted to coroutines, so the
Proxmox block drivers needed to be adapted.
* Alloc track auto-detaching during PBS live restore got broken by
AioContext-related changes resulting in a deadlock. The current, hacky
method was replaced by a simpler one. Stefan apparently ran into a
problem with that when he wrote the driver, but there were
improvements in the stream job code since then and I didn't manage to
reproduce the issue. It's a separate patch "alloc-track: fix deadlock
during drop" for now, you can find the details there.
* Async snapshot-related changes:
- The pending querying got adapted to the above-mentioned split and
a patch is added to optimize it/make it more similar to what
upstream code does.
- Added initialization of the compression counters (for
future-proofing).
- It's necessary the hold the BQL (big QEMU lock = iothread mutex)
during the setup phase, because block layer functions are used there
and not doing so leads to racy, hard-to-debug crashes or hangs. It's
necessary to change some upstream code too for this, a version of
the patch "migration: for snapshots, hold the BQL during setup
callbacks" is intended to be upstreamed.
- Need to take the bdrv graph read lock before flushing.
* hmp_info_balloon was moved to a different file.
* Needed to include a new headers from time to time to still get the
correct functions.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2023-05-15 16:39:53 +03:00
|
|
|
@@ -974,6 +974,11 @@
|
2021-03-16 19:30:22 +03:00
|
|
|
# @pbs-dirty-bitmap-savevm: True if 'dirty-bitmaps' migration capability can
|
|
|
|
# safely be set for savevm-async.
|
2020-10-29 20:05:43 +03:00
|
|
|
#
|
|
|
|
+# @pbs-dirty-bitmap-migration: True if safe migration of dirty-bitmaps including
|
|
|
|
+# PBS state is supported. Enabling 'dirty-bitmaps'
|
|
|
|
+# migration cap if this is false/unset may lead
|
|
|
|
+# to crashes on migration!
|
|
|
|
+#
|
2020-11-24 18:41:20 +03:00
|
|
|
# @pbs-library-version: Running version of libproxmox-backup-qemu0 library.
|
|
|
|
#
|
2020-10-29 20:05:43 +03:00
|
|
|
##
|
update submodule and patches to QEMU 8.0.0
Many changes were necessary this time around:
* QAPI was changed to avoid redundant has_* variables, see commit
44ea9d9be3 ("qapi: Start to elide redundant has_FOO in generated C")
for details. This affected many QMP commands added by Proxmox too.
* Pending querying for migration got split into two functions, one to
estimate, one for exact value, see commit c8df4a7aef ("migration:
Split save_live_pending() into state_pending_*") for details. Relevant
for savevm-async and PBS dirty bitmap.
* Some block (driver) functions got converted to coroutines, so the
Proxmox block drivers needed to be adapted.
* Alloc track auto-detaching during PBS live restore got broken by
AioContext-related changes resulting in a deadlock. The current, hacky
method was replaced by a simpler one. Stefan apparently ran into a
problem with that when he wrote the driver, but there were
improvements in the stream job code since then and I didn't manage to
reproduce the issue. It's a separate patch "alloc-track: fix deadlock
during drop" for now, you can find the details there.
* Async snapshot-related changes:
- The pending querying got adapted to the above-mentioned split and
a patch is added to optimize it/make it more similar to what
upstream code does.
- Added initialization of the compression counters (for
future-proofing).
- It's necessary the hold the BQL (big QEMU lock = iothread mutex)
during the setup phase, because block layer functions are used there
and not doing so leads to racy, hard-to-debug crashes or hangs. It's
necessary to change some upstream code too for this, a version of
the patch "migration: for snapshots, hold the BQL during setup
callbacks" is intended to be upstreamed.
- Need to take the bdrv graph read lock before flushing.
* hmp_info_balloon was moved to a different file.
* Needed to include a new headers from time to time to still get the
correct functions.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2023-05-15 16:39:53 +03:00
|
|
|
@@ -981,6 +986,7 @@
|
2020-11-24 18:41:20 +03:00
|
|
|
'data': { 'pbs-dirty-bitmap': 'bool',
|
|
|
|
'query-bitmap-info': 'bool',
|
2021-03-16 19:30:22 +03:00
|
|
|
'pbs-dirty-bitmap-savevm': 'bool',
|
2020-11-24 18:41:20 +03:00
|
|
|
+ 'pbs-dirty-bitmap-migration': 'bool',
|
|
|
|
'pbs-library-version': 'str' } }
|
2020-10-29 20:05:43 +03:00
|
|
|
|
|
|
|
##
|