From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Dietmar Maurer <dietmar@proxmox.com>
Date: Mon, 6 Apr 2020 12:16:46 +0200
Subject: [PATCH] PVE: add savevm-async for background state snapshots

Put qemu_savevm_state_{header,setup} into the main loop and the rest
of the iteration into a coroutine. The former need to lock the
iothread (and we can't unlock it in the coroutine), and the latter
can't deal with being in a separate thread, so a coroutine it must
be.

Truncate output file at 1024 boundary.

Do not block the VM and save the state on aborting a snapshot, as the
snapshot will be invalid anyway.

Also, when aborting, wait for the target file to be closed, otherwise a
client might run into race-conditions when trying to remove the file
still opened by QEMU.

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Signed-off-by: Dietmar Maurer <dietmar@proxmox.com>
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
squash related patches
where there is no good reason to keep them separate. It's a pain
during rebase if there are multiple patches changing the same code
over and over again. This was especially bad for the backup-related
patches. If the history of patches really is needed, it can be
extracted via git. Additionally, compilation with partial application
of patches has been broken for a long time, because one of the master key
changes became part of an earlier patch during a past rebase.
If only the same files were changed by a subsequent patch and the
changes seemed to belong together (obvious for later bug fixes, but also
done for features, e.g. adding master key support for PBS), the patches
were squashed together.
The PBS namespace support patch was split into the individual parts
it changes, i.e. PBS block driver, pbs-restore binary and QMP backup
infrastructure, and squashed into the respective patches.
No code change is intended, git diff in the submodule should not show
any difference between applying all patches before this commit and
applying all patches after this commit.
The query-proxmox-support QMP function has been left as part of the
"PVE-Backup: Proxmox backup patches for QEMU" patch, because it's
currently only used there. If it ever is used elsewhere too, it can
be split out from there.
The recent alloc-track and BQL-related savevm-async changes have been
left separate for now, because it's not 100% clear they are the best
approach yet. This depends on what upstream decides about the BQL
stuff and whether and what kind of issues with the changes pop up.
The qemu-img dd snapshot patch has been re-ordered to after the other
qemu-img dd patches.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
[SR: improve aborting
     register yank before migration_incoming_state_destroy]
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
update submodule and patches to 7.1.0
Notable changes:
* The only big change is the switch to using a custom QIOChannel for
savevm-async, because the previously used QEMUFileOps was dropped.
Changes to the current implementation:
* Switch to vector based methods as required for an IO channel. For
short reads the passed-in IO vector is stuffed with zeroes at the
end, just to be sure.
* For reading: The documentation in include/io/channel.h states that
at least one byte should be read, so also error out when we are
at the very end instead of returning 0.
* For reading: Fix off-by-one error when request goes beyond end.
The wrong code piece was:
if ((pos + size) > maxlen) {
size = maxlen - pos - 1;
}
Previously, the last byte would not be read. It's actually
possible to get a snapshot .raw file that has content all the way
up to the final 512-byte (= BDRV_SECTOR_SIZE) boundary without any
trailing zero bytes (I wrote a script to do it).
Luckily, it didn't cause a real issue, because qemu_loadvm_state()
is not interested in the final (i.e. QEMU_VM_VMDESCRIPTION)
section. The buffer for reading it is simply freed up afterwards
and the function will assume that it read the whole section, even
if that's not the case.
* For writing: Make use of the generated blk_pwritev() wrapper
instead of manually wrapping the coroutine to simplify and save a
few lines.
* Adapt to changed interfaces for blk_{pread,pwrite}:
* a9262f551e ("block: Change blk_{pread,pwrite}() param order")
* 3b35d4542c ("block: Add a 'flags' param to blk_pread()")
* bf5b16fa40 ("block: Make blk_{pread,pwrite}() return 0 on success")
Those changes especially affected the qemu-img dd patches, because
the context also changed, but also some of our block drivers used
the functions.
* Drop qemu-common.h include: it got renamed after essentially
everything was moved to other headers. The only remaining user I
could find for things dropped from the header between 7.0 and 7.1
was qemu_get_vm_name() in the iscsi-initiatorname patch, but it
already includes the header to which the function was moved.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
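For illustration, the off-by-one read clamp described in the 7.1.0 notes
can be sketched in isolation as follows. This is a minimal standalone
sketch, not the code from the patch: both helper names and the use of
size_t for pos, size and maxlen are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of the patch): clamp a read of `size`
 * bytes starting at offset `pos` so it never goes past `maxlen`, the
 * total length of the readable data.
 */
static size_t clamp_read_size(size_t pos, size_t size, size_t maxlen)
{
    assert(pos <= maxlen);
    if (size > maxlen - pos) {
        /* Exactly the bytes left, including the very last one. */
        size = maxlen - pos;
    }
    return size;
}

/*
 * The old behaviour quoted in the commit message, kept for comparison.
 * The extra "- 1" means the final byte is never returned.
 */
static size_t clamp_read_size_off_by_one(size_t pos, size_t size,
                                         size_t maxlen)
{
    assert(pos < maxlen);
    if (pos + size > maxlen) {
        size = maxlen - pos - 1;
    }
    return size;
}
```

With pos = 510, size = 10 and maxlen = 512, the first helper yields 2
bytes while the old variant yields only 1, which is the lost last byte
mentioned above.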
[FE: further improve aborting
     adapt to removal of QEMUFileOps
update submodule and patches to QEMU 8.0.0
Many changes were necessary this time around:
* QAPI was changed to avoid redundant has_* variables, see commit
44ea9d9be3 ("qapi: Start to elide redundant has_FOO in generated C")
for details. This affected many QMP commands added by Proxmox too.
* Pending querying for migration got split into two functions, one to
estimate, one for exact value, see commit c8df4a7aef ("migration:
Split save_live_pending() into state_pending_*") for details. Relevant
for savevm-async and PBS dirty bitmap.
* Some block (driver) functions got converted to coroutines, so the
Proxmox block drivers needed to be adapted.
* Alloc track auto-detaching during PBS live restore got broken by
AioContext-related changes resulting in a deadlock. The current, hacky
method was replaced by a simpler one. Stefan apparently ran into a
problem with that when he wrote the driver, but there were
improvements in the stream job code since then and I didn't manage to
reproduce the issue. It's a separate patch "alloc-track: fix deadlock
during drop" for now, you can find the details there.
* Async snapshot-related changes:
- The pending querying got adapted to the above-mentioned split and
a patch is added to optimize it/make it more similar to what
upstream code does.
- Added initialization of the compression counters (for
future-proofing).
- It's necessary to hold the BQL (big QEMU lock = iothread mutex)
during the setup phase, because block layer functions are used there
and not doing so leads to racy, hard-to-debug crashes or hangs. It's
necessary to change some upstream code too for this; a version of
the patch "migration: for snapshots, hold the BQL during setup
callbacks" is intended to be upstreamed.
- Need to take the bdrv graph read lock before flushing.
* hmp_info_balloon was moved to a different file.
* Needed to include new headers from time to time to still get the
correct functions.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
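As a rough illustration of the has_* change mentioned in the 8.0.0
notes: optional members of pointer type no longer get a separate
presence flag (NULL means absent), while optional scalars keep theirs.
The struct below is a hypothetical stand-in written for this example,
not the generated SaveVMInfo type, and the helper names are likewise
assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a QAPI-generated struct. */
typedef struct DemoInfo {
    const char *status; /* optional string: NULL means absent */
    bool has_bytes;     /* optional integer: presence flag is kept */
    int64_t bytes;
    const char *error;  /* optional string: NULL means absent */
} DemoInfo;

/* Presence of a pointer member is checked via NULL, not via has_error. */
static bool demo_info_has_error(const DemoInfo *info)
{
    assert(info != NULL);
    return info->error != NULL;
}

/* Scalar members still need their has_* flag set next to the value. */
static void demo_info_set_bytes(DemoInfo *info, int64_t bytes)
{
    assert(info != NULL);
    info->has_bytes = true;
    info->bytes = bytes;
}
```

This matches what qmp_query_savevm() in this patch does: it sets
info->has_bytes and info->has_total_time, but assigns info->error
directly.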
     improve condition for entering final stage
     adapt to QAPI and other changes for 8.0]
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
 hmp-commands-info.hx         |  13 +
 hmp-commands.hx              |  17 ++
 include/migration/snapshot.h |   2 +
 include/monitor/hmp.h        |   3 +
 migration/meson.build        |   1 +
 migration/savevm-async.c     | 531 +++++++++++++++++++++++++++++++++++
 monitor/hmp-cmds.c           |  38 +++
 qapi/migration.json          |  34 +++
 qapi/misc.json               |  16 ++
 qemu-options.hx              |  12 +
 softmmu/vl.c                 |  10 +
 11 files changed, 677 insertions(+)
 create mode 100644 migration/savevm-async.c

diff --git a/hmp-commands-info.hx b/hmp-commands-info.hx
index f5b37eb74a..10fdd822e0 100644
--- a/hmp-commands-info.hx
+++ b/hmp-commands-info.hx
@@ -525,6 +525,19 @@ SRST
     Show current migration parameters.
 ERST

+    {
+        .name       = "savevm",
+        .args_type  = "",
+        .params     = "",
+        .help       = "show savevm status",
+        .cmd        = hmp_info_savevm,
+    },
+
+SRST
+  ``info savevm``
+    Show savevm status.
+ERST
+
     {
         .name       = "balloon",
         .args_type  = "",
diff --git a/hmp-commands.hx b/hmp-commands.hx
index 2cbd0f77a0..e352f86872 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -1865,3 +1865,20 @@ SRST
   List event channels in the guest
 ERST
 #endif
+
+    {
+        .name       = "savevm-start",
+        .args_type  = "statefile:s?",
+        .params     = "[statefile]",
+        .help       = "Prepare for snapshot and halt VM. Save VM state to statefile.",
+        .cmd        = hmp_savevm_start,
+    },
+
+    {
+        .name       = "savevm-end",
+        .args_type  = "",
+        .params     = "",
+        .help       = "Resume VM after snapshot.",
+        .cmd        = hmp_savevm_end,
+        .coroutine  = true,
+    },
diff --git a/include/migration/snapshot.h b/include/migration/snapshot.h
index e72083b117..c846d37806 100644
--- a/include/migration/snapshot.h
+++ b/include/migration/snapshot.h
@@ -61,4 +61,6 @@ bool delete_snapshot(const char *name,
                      bool has_devices, strList *devices,
                      Error **errp);

+int load_snapshot_from_blockdev(const char *filename, Error **errp);
+
 #endif
diff --git a/include/monitor/hmp.h b/include/monitor/hmp.h
index 13f9a2dedb..7a7def7530 100644
--- a/include/monitor/hmp.h
+++ b/include/monitor/hmp.h
@@ -28,6 +28,7 @@ void hmp_info_status(Monitor *mon, const QDict *qdict);
 void hmp_info_uuid(Monitor *mon, const QDict *qdict);
 void hmp_info_chardev(Monitor *mon, const QDict *qdict);
 void hmp_info_mice(Monitor *mon, const QDict *qdict);
+void hmp_info_savevm(Monitor *mon, const QDict *qdict);
 void hmp_info_migrate(Monitor *mon, const QDict *qdict);
 void hmp_info_migrate_capabilities(Monitor *mon, const QDict *qdict);
 void hmp_info_migrate_parameters(Monitor *mon, const QDict *qdict);
@@ -94,6 +95,8 @@ void hmp_closefd(Monitor *mon, const QDict *qdict);
 void hmp_mouse_move(Monitor *mon, const QDict *qdict);
 void hmp_mouse_button(Monitor *mon, const QDict *qdict);
 void hmp_mouse_set(Monitor *mon, const QDict *qdict);
+void hmp_savevm_start(Monitor *mon, const QDict *qdict);
+void hmp_savevm_end(Monitor *mon, const QDict *qdict);
 void hmp_sendkey(Monitor *mon, const QDict *qdict);
 void coroutine_fn hmp_screendump(Monitor *mon, const QDict *qdict);
 void hmp_chardev_add(Monitor *mon, const QDict *qdict);
diff --git a/migration/meson.build b/migration/meson.build
index 37ddcb5d60..07f6057acc 100644
--- a/migration/meson.build
+++ b/migration/meson.build
@@ -26,6 +26,7 @@ system_ss.add(files(
   'options.c',
   'postcopy-ram.c',
   'savevm.c',
+  'savevm-async.c',
   'socket.c',
   'tls.c',
   'threadinfo.c',
diff --git a/migration/savevm-async.c b/migration/savevm-async.c
new file mode 100644
index 0000000000..e9fc18fb10
--- /dev/null
+++ b/migration/savevm-async.c
@@ -0,0 +1,531 @@
+#include "qemu/osdep.h"
+#include "migration/channel-savevm-async.h"
+#include "migration/migration.h"
+#include "migration/migration-stats.h"
+#include "migration/options.h"
+#include "migration/savevm.h"
+#include "migration/snapshot.h"
+#include "migration/global_state.h"
+#include "migration/ram.h"
+#include "migration/qemu-file.h"
+#include "sysemu/sysemu.h"
+#include "sysemu/runstate.h"
+#include "block/block.h"
+#include "sysemu/block-backend.h"
+#include "qapi/error.h"
+#include "qapi/qmp/qerror.h"
+#include "qapi/qmp/qdict.h"
+#include "qapi/qapi-commands-migration.h"
+#include "qapi/qapi-commands-misc.h"
+#include "qapi/qapi-commands-block.h"
+#include "qemu/cutils.h"
+#include "qemu/timer.h"
+#include "qemu/main-loop.h"
+#include "qemu/rcu.h"
+#include "qemu/yank.h"
+
+/* #define DEBUG_SAVEVM_STATE */
+
+#ifdef DEBUG_SAVEVM_STATE
+#define DPRINTF(fmt, ...) \
+    do { printf("savevm-async: " fmt, ## __VA_ARGS__); } while (0)
+#else
+#define DPRINTF(fmt, ...) \
+    do { } while (0)
+#endif
+
+enum {
+    SAVE_STATE_DONE,
+    SAVE_STATE_ERROR,
+    SAVE_STATE_ACTIVE,
+    SAVE_STATE_COMPLETED,
+    SAVE_STATE_CANCELLED
+};
+
+
+static struct SnapshotState {
+    BlockBackend *target;
+    size_t bs_pos;
+    int state;
+    Error *error;
+    Error *blocker;
+    int saved_vm_running;
+    QEMUFile *file;
+    int64_t total_time;
+    QEMUBH *finalize_bh;
+    Coroutine *co;
+    QemuCoSleep target_close_wait;
+} snap_state;
+
+static bool savevm_aborted(void)
+{
+    return snap_state.state == SAVE_STATE_CANCELLED ||
+        snap_state.state == SAVE_STATE_ERROR;
+}
+
+SaveVMInfo *qmp_query_savevm(Error **errp)
+{
+    SaveVMInfo *info = g_malloc0(sizeof(*info));
+    struct SnapshotState *s = &snap_state;
+
+    if (s->state != SAVE_STATE_DONE) {
+        info->has_bytes = true;
+        info->bytes = s->bs_pos;
+        switch (s->state) {
+        case SAVE_STATE_ERROR:
+            info->status = g_strdup("failed");
+            info->has_total_time = true;
+            info->total_time = s->total_time;
+            if (s->error) {
+                info->error = g_strdup(error_get_pretty(s->error));
+            }
+            break;
+        case SAVE_STATE_ACTIVE:
+            info->status = g_strdup("active");
+            info->has_total_time = true;
+            info->total_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME)
+                - s->total_time;
+            break;
+        case SAVE_STATE_COMPLETED:
+            info->status = g_strdup("completed");
+            info->has_total_time = true;
+            info->total_time = s->total_time;
+            break;
+        }
+    }
+
+    return info;
+}
+
+static int save_snapshot_cleanup(void)
+{
+    int ret = 0;
+
+    DPRINTF("save_snapshot_cleanup\n");
+
+    snap_state.total_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME) -
+        snap_state.total_time;
+
+    if (snap_state.file) {
+        ret = qemu_fclose(snap_state.file);
update submodule and patches to 7.1.0
Notable changes:
* The only big change is the switch to using a custom QIOChannel for
savevm-async, because the previously used QEMUFileOps was dropped.
Changes to the current implementation:
* Switch to vector based methods as required for an IO channel. For
short reads the passed-in IO vector is stuffed with zeroes at the
end, just to be sure.
* For reading: The documentation in include/io/channel.h states that
at least one byte should be read, so also error out when we are
at the very end instead of returning 0.
* For reading: Fix off-by-one error when request goes beyond end.
The wrong code piece was:
if ((pos + size) > maxlen) {
size = maxlen - pos - 1;
}
Previously, the last byte would not be read. It's actually
possible to get a snapshot .raw file that has content all the way
up to the final 512 byte (= BDRV_SECTOR_SIZE) boundary without any
trailing zero bytes (I wrote a script to do it).
Luckily, it didn't cause a real issue, because qemu_loadvm_state()
is not interested in the final (i.e. QEMU_VM_VMDESCRIPTION)
section. The buffer for reading it is simply freed up afterwards
and the function will assume that it read the whole section, even
if that's not the case.
* For writing: Make use of the generated blk_pwritev() wrapper
instead of manually wrapping the coroutine to simplify and save a
few lines.
* Adapt to changed interfaces for blk_{pread,pwrite}:
* a9262f551e ("block: Change blk_{pread,pwrite}() param order")
* 3b35d4542c ("block: Add a 'flags' param to blk_pread()")
* bf5b16fa40 ("block: Make blk_{pread,pwrite}() return 0 on success")
Those changes especially affected the qemu-img dd patches, because
the context also changed, but also some of our block drivers used
the functions.
* Drop qemu-common.h include: it got renamed after essentially
everything was moved to other headers. The only remaining user I
could find for things dropped from the header between 7.0 and 7.1
was qemu_get_vm_name() in the iscsi-initiatorname patch, but it
already includes the header to which the function was moved.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
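The off-by-one described in the message above can be illustrated with a small sketch. The helper `clamp_read_size` and its signature are hypothetical and not taken from the patch. It only shows the corrected clamp, which keeps the last byte readable, and reports an error when the position is already at the end, in line with the note that at least one byte should be read.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not part of the patch: clamp a read request of
 * `size` bytes at offset `pos` so that it does not go past `maxlen`.
 *
 * Returns the number of bytes that may be read, or -1 if the arguments are
 * invalid or `pos` is already at or beyond the end.
 */
static int64_t clamp_read_size(int64_t pos, int64_t size, int64_t maxlen)
{
    if (pos < 0 || size < 0 || pos >= maxlen) {
        return -1;
    }
    if (size > maxlen - pos) {
        /* The buggy variant used maxlen - pos - 1 and lost the last byte. */
        size = maxlen - pos;
    }
    return size;
}
```

Comparing `size` against `maxlen - pos` instead of computing `pos + size` avoids a signed overflow for very large requests.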
2022-10-14 15:07:13 +03:00
|
|
|
+ snap_state.file = NULL;
|
2017-04-05 11:49:19 +03:00
|
|
|
+ }
|
|
|
|
+
|
2017-08-07 10:10:07 +03:00
|
|
|
+ if (snap_state.target) {
|
2021-02-11 19:11:11 +03:00
|
|
|
+ if (!savevm_aborted()) {
|
|
|
|
+ /* try to truncate, but ignore errors (will fail on block devices).
|
|
|
|
+ * note1: bdrv_read() needs whole blocks, so we need to round up
|
|
|
|
+ * note2: PVE requires 1024 (BDRV_SECTOR_SIZE*2) alignment
|
|
|
|
+ */
|
|
|
|
+ size_t size = QEMU_ALIGN_UP(snap_state.bs_pos, BDRV_SECTOR_SIZE*2);
|
|
|
|
+ blk_truncate(snap_state.target, size, false, PREALLOC_MODE_OFF, 0, NULL);
|
|
|
|
+ }
|
2017-08-07 10:10:07 +03:00
|
|
|
+ blk_op_unblock_all(snap_state.target, snap_state.blocker);
|
2017-04-05 11:49:19 +03:00
|
|
|
+ error_free(snap_state.blocker);
|
|
|
|
+ snap_state.blocker = NULL;
|
2017-08-07 10:10:07 +03:00
|
|
|
+ blk_unref(snap_state.target);
|
|
|
|
+ snap_state.target = NULL;
|
2021-02-11 19:11:11 +03:00
|
|
|
+
|
2022-08-18 14:44:16 +03:00
|
|
|
+ qemu_co_sleep_wake(&snap_state.target_close_wait);
|
2017-04-05 11:49:19 +03:00
|
|
|
+ }
|
|
|
|
+
|
|
|
|
+ return ret;
|
|
|
|
+}
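The rounding used before the truncate call can be sketched in isolation. The macros below are local stand-ins written for this example, assuming a 512-byte sector as mentioned in the commit message; QEMU ships its own `QEMU_ALIGN_UP` and `BDRV_SECTOR_SIZE`. The function name `truncated_state_size` is hypothetical.

```c
#include <stdint.h>

/* Local stand-ins for illustration only; QEMU provides its own macros. */
#define SKETCH_SECTOR_SIZE 512ULL
#define SKETCH_ALIGN_UP(n, m) ((((n) + (m) - 1) / (m)) * (m))

/*
 * Hypothetical helper, not part of the patch: round the number of bytes
 * written so far up to the next 1024-byte boundary (two sectors).
 */
static uint64_t truncated_state_size(uint64_t bs_pos)
{
    return SKETCH_ALIGN_UP(bs_pos, SKETCH_SECTOR_SIZE * 2);
}
```

A position that is already aligned stays unchanged, so the truncate never cuts into written state data.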
|
|
|
|
+
|
2023-08-07 16:19:42 +03:00
|
|
|
+static void G_GNUC_PRINTF(1, 2) save_snapshot_error(const char *fmt, ...)
|
2017-04-05 11:49:19 +03:00
|
|
|
+{
|
|
|
|
+ va_list ap;
|
|
|
|
+ char *msg;
|
|
|
|
+
|
|
|
|
+ va_start(ap, fmt);
|
|
|
|
+ msg = g_strdup_vprintf(fmt, ap);
|
|
|
|
+ va_end(ap);
|
|
|
|
+
|
|
|
|
+ DPRINTF("save_snapshot_error: %s\n", msg);
|
|
|
|
+
|
|
|
|
+ if (!snap_state.error) {
|
|
|
|
+ error_set(&snap_state.error, ERROR_CLASS_GENERIC_ERROR, "%s", msg);
|
|
|
|
+ }
|
|
|
|
+
|
|
|
|
+ g_free(msg);
|
|
|
|
+
|
|
|
|
+ snap_state.state = SAVE_STATE_ERROR;
|
|
|
|
+}
|
|
|
|
+
|
2020-07-02 14:07:28 +03:00
|
|
|
+static void process_savevm_finalize(void *opaque)
|
2019-04-19 10:53:37 +03:00
|
|
|
+{
|
|
|
|
+ int ret;
|
2020-07-02 14:07:28 +03:00
|
|
|
+ AioContext *iohandler_ctx = iohandler_get_aio_context();
|
|
|
|
+ MigrationState *ms = migrate_get_current();
|
|
|
|
+
|
2021-02-11 19:11:11 +03:00
|
|
|
+ bool aborted = savevm_aborted();
|
|
|
|
+
|
2020-07-02 14:07:28 +03:00
|
|
|
+#ifdef DEBUG_SAVEVM_STATE
|
|
|
|
+ int64_t start_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
|
|
|
|
+#endif
|
|
|
|
+
|
|
|
|
+ qemu_bh_delete(snap_state.finalize_bh);
|
|
|
|
+ snap_state.finalize_bh = NULL;
|
|
|
|
+ snap_state.co = NULL;
|
|
|
|
+
|
|
|
|
+ /* We need to own the target bdrv's context for the following functions,
|
|
|
|
+ * so move it back. It can stay in the main context and live out its life
|
|
|
|
+ * there, since we're done with it after this method ends anyway.
|
|
|
|
+ */
|
|
|
|
+ aio_context_acquire(iohandler_ctx);
|
|
|
|
+ blk_set_aio_context(snap_state.target, qemu_get_aio_context(), NULL);
|
|
|
|
+ aio_context_release(iohandler_ctx);
|
|
|
|
+
|
|
|
|
+ ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
|
|
|
|
+ if (ret < 0) {
|
|
|
|
+ save_snapshot_error("vm_stop_force_state error %d", ret);
|
|
|
|
+ }
|
|
|
|
+
|
2021-02-11 19:11:11 +03:00
|
|
|
+ if (!aborted) {
|
|
|
|
+ /* skip state saving if we aborted, snapshot will be invalid anyway */
|
|
|
|
+ (void)qemu_savevm_state_complete_precopy(snap_state.file, false, false);
|
|
|
|
+ ret = qemu_file_get_error(snap_state.file);
|
|
|
|
+ if (ret < 0) {
|
2023-01-23 14:43:23 +03:00
|
|
|
+ save_snapshot_error("qemu_savevm_state_complete_precopy error %d", ret);
|
2021-02-11 19:11:11 +03:00
|
|
|
+ }
|
2020-07-02 14:07:28 +03:00
|
|
|
+ }
|
|
|
|
+
|
|
|
|
+ DPRINTF("state saving complete\n");
|
|
|
|
+ DPRINTF("timing: process_savevm_finalize (state saving) took %ld ms\n",
|
|
|
|
+ qemu_clock_get_ms(QEMU_CLOCK_REALTIME) - start_time);
|
|
|
|
+
|
|
|
|
+ /* clear migration state */
|
|
|
|
+ migrate_set_state(&ms->state, MIGRATION_STATUS_SETUP,
|
2021-02-11 19:11:11 +03:00
|
|
|
+ ret || aborted ? MIGRATION_STATUS_FAILED : MIGRATION_STATUS_COMPLETED);
|
2020-07-02 14:07:28 +03:00
|
|
|
+ ms->to_dst_file = NULL;
|
|
|
|
+
|
|
|
|
+ qemu_savevm_state_cleanup();
|
|
|
|
+
|
2019-04-19 10:53:37 +03:00
|
|
|
+ ret = save_snapshot_cleanup();
|
|
|
|
+ if (ret < 0) {
|
|
|
|
+ save_snapshot_error("save_snapshot_cleanup error %d", ret);
|
|
|
|
+ } else if (snap_state.state == SAVE_STATE_ACTIVE) {
|
|
|
|
+ snap_state.state = SAVE_STATE_COMPLETED;
|
2021-02-11 19:11:11 +03:00
|
|
|
+ } else if (aborted) {
|
2022-08-18 14:44:17 +03:00
|
|
|
+ /*
|
|
|
|
+ * If there was an error, there's no need to set a new one here.
|
|
|
|
+ * If the snapshot was canceled, leave setting the state to
|
|
|
|
+ * qmp_savevm_end(), which is woken by save_snapshot_cleanup().
|
|
|
|
+ */
|
2019-04-19 10:53:37 +03:00
|
|
|
+ } else {
|
|
|
|
+ save_snapshot_error("process_savevm_finalize: invalid state: %d",
|
|
|
|
+ snap_state.state);
|
2017-04-05 11:49:19 +03:00
|
|
|
+ }
|
2019-04-19 10:53:37 +03:00
|
|
|
+ if (snap_state.saved_vm_running) {
|
|
|
|
+ vm_start();
|
|
|
|
+ snap_state.saved_vm_running = false;
|
2017-04-05 11:49:19 +03:00
|
|
|
+ }
|
2020-07-02 14:07:28 +03:00
|
|
|
+
|
|
|
|
+ DPRINTF("timing: process_savevm_finalize (full) took %ld ms\n",
|
|
|
|
+ qemu_clock_get_ms(QEMU_CLOCK_REALTIME) - start_time);
|
2017-04-05 11:49:19 +03:00
|
|
|
+}
|
|
|
|
+
|
2020-07-02 14:07:28 +03:00
|
|
|
+static void coroutine_fn process_savevm_co(void *opaque)
|
2017-04-05 11:49:19 +03:00
|
|
|
+{
|
|
|
|
+ int ret;
|
|
|
|
+ int64_t maxlen;
|
2020-07-02 14:07:28 +03:00
|
|
|
+ BdrvNextIterator it;
|
|
|
|
+ BlockDriverState *bs = NULL;
|
2017-04-05 11:49:19 +03:00
|
|
|
+
|
2020-07-02 14:07:28 +03:00
|
|
|
+#ifdef DEBUG_SAVEVM_STATE
|
|
|
|
+ int64_t start_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
|
|
|
|
+#endif
|
2017-04-05 11:49:19 +03:00
|
|
|
+
|
2018-02-22 14:34:57 +03:00
|
|
|
+ ret = qemu_file_get_error(snap_state.file);
|
2017-04-05 11:49:19 +03:00
|
|
|
+ if (ret < 0) {
|
2018-02-22 14:34:57 +03:00
|
|
|
+ save_snapshot_error("qemu_savevm_state_setup failed");
|
2020-07-02 14:07:28 +03:00
|
|
|
+ return;
|
2017-04-05 11:49:19 +03:00
|
|
|
+ }
|
|
|
|
+
|
|
|
|
+ while (snap_state.state == SAVE_STATE_ACTIVE) {
|
update submodule and patches to QEMU 8.0.0
Many changes were necessary this time around:
* QAPI was changed to avoid redundant has_* variables, see commit
44ea9d9be3 ("qapi: Start to elide redundant has_FOO in generated C")
for details. This affected many QMP commands added by Proxmox too.
* Pending querying for migration got split into two functions, one to
estimate, one for exact value, see commit c8df4a7aef ("migration:
Split save_live_pending() into state_pending_*") for details. Relevant
for savevm-async and PBS dirty bitmap.
* Some block (driver) functions got converted to coroutines, so the
Proxmox block drivers needed to be adapted.
* Alloc track auto-detaching during PBS live restore got broken by
AioContext-related changes resulting in a deadlock. The current, hacky
method was replaced by a simpler one. Stefan apparently ran into a
problem with that when he wrote the driver, but there were
improvements in the stream job code since then and I didn't manage to
reproduce the issue. It's a separate patch "alloc-track: fix deadlock
during drop" for now, you can find the details there.
* Async snapshot-related changes:
- The pending querying got adapted to the above-mentioned split and
a patch is added to optimize it/make it more similar to what
upstream code does.
- Added initialization of the compression counters (for
future-proofing).
- It's necessary to hold the BQL (big QEMU lock = iothread mutex)
during the setup phase, because block layer functions are used there
and not doing so leads to racy, hard-to-debug crashes or hangs. It's
necessary to change some upstream code too for this, a version of
the patch "migration: for snapshots, hold the BQL during setup
callbacks" is intended to be upstreamed.
- Need to take the bdrv graph read lock before flushing.
* hmp_info_balloon was moved to a different file.
* Needed to include new headers from time to time to still get the
correct functions.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2023-05-15 16:39:53 +03:00
|
|
|
+ uint64_t pending_size, pend_precopy, pend_postcopy;
|
2023-05-15 16:39:56 +03:00
|
|
|
+ uint64_t threshold = 400 * 1000;
|
2017-04-05 11:49:19 +03:00
|
|
|
+
|
2023-05-15 16:39:56 +03:00
|
|
|
+ /*
|
|
|
|
+ * pending_{estimate,exact} are expected to be called without iothread
|
|
|
|
+ * lock. Similar to what is done in migration.c, call the exact variant
|
|
|
|
+ * only once pend_precopy in the estimate is below the threshold.
|
|
|
|
+ */
|
2021-03-16 19:30:22 +03:00
|
|
|
+ qemu_mutex_unlock_iothread();
|
2023-05-15 16:39:56 +03:00
|
|
|
+ qemu_savevm_state_pending_estimate(&pend_precopy, &pend_postcopy);
|
|
|
|
+ if (pend_precopy <= threshold) {
|
|
|
|
+ qemu_savevm_state_pending_exact(&pend_precopy, &pend_postcopy);
|
|
|
|
+ }
|
2021-03-16 19:30:22 +03:00
|
|
|
+ qemu_mutex_lock_iothread();
|
2023-05-15 16:39:53 +03:00
|
|
|
+ pending_size = pend_precopy + pend_postcopy;
|
2017-04-05 11:49:19 +03:00
|
|
|
+
|
savevm-async: keep more free space when entering final stage
In qemu-server, we already allocate 2 * $mem_size + 500 MiB for driver
state (which was 32 MiB long ago according to git history). It seems
likely that the 30 MiB cutoff in the savevm-async implementation was
chosen based on that.
In bug #4476 [0], another issue caused the iteration to not make any
progress and the state file filled up all the way to the 30 MiB +
pending_size cutoff. Since the guest is not stopped immediately after
the check, it can still dirty some RAM and the current cutoff is not
enough for a reproducer VM (was done while bug #4476 still was not
fixed), dirtying memory with
> stress-ng -B 2 --bigheap-growth 64.0M
After entering the final stage, savevm actually filled up the state
file completely, leading to an I/O error. It's probably the same
scenario as reported in the bug report, the error message was fixed in
commit a020815 ("savevm-async: fix function name in error message")
after the bug report.
If not for the bug, the cutoff will only be reached by a VM that's
dirtying RAM faster than can be written to the storage, so increase
the cutoff to 100 MiB to have a bigger chance to finish successfully,
while still trying to not increase downtime too much for
non-hibernation snapshots.
[0]: https://bugzilla.proxmox.com/show_bug.cgi?id=4476
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
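The check that decides between another iteration and entering the final stage can be sketched on its own. The helper `should_keep_iterating` and the constant names are hypothetical; the values (a 400 * 1000 byte precopy threshold and a 100 MiB reserve) are the ones that appear in the patch. The guard for a target smaller than the reserve is an assumption of this sketch and is not taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PRECOPY_THRESHOLD (400ULL * 1000)        /* bytes */
#define SKETCH_FINAL_RESERVE     (100LL * 1024 * 1024)  /* 100 MiB */

/*
 * Hypothetical helper, not part of the patch: returns true if another
 * iteration should run, false if the final stage should be entered.
 *
 * bs_pos      bytes already written to the state file
 * target_len  total size of the state file
 */
static bool should_keep_iterating(uint64_t pend_precopy, uint64_t pend_postcopy,
                                  uint64_t bs_pos, int64_t target_len)
{
    int64_t maxlen = target_len - SKETCH_FINAL_RESERVE;
    uint64_t pending_size = pend_precopy + pend_postcopy;

    /* Assumption of this sketch: a target below the reserve never iterates. */
    if (maxlen <= 0) {
        return false;
    }
    return pend_precopy > SKETCH_PRECOPY_THRESHOLD &&
           bs_pos + pending_size < (uint64_t)maxlen;
}
```

With a 1 GiB target, iteration stops either once the remaining precopy data is at most the threshold, or once the written bytes plus the pending bytes would reach the last 100 MiB of the file.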
2023-01-26 16:46:14 +03:00
|
|
|
+ /*
|
|
|
|
+ * A guest reaching this cutoff is dirtying lots of RAM. It should be
|
|
|
|
+ * large enough so that the guest can't dirty this much between the
|
|
|
|
+ * check and the guest actually being stopped, but it should be small
|
|
|
|
+ * enough to avoid long downtimes for non-hibernation snapshots.
|
|
|
|
+ */
|
|
|
|
+ maxlen = blk_getlength(snap_state.target) - 100*1024*1024;
|
2019-04-19 10:53:37 +03:00
|
|
|
+
|
2023-01-26 16:46:13 +03:00
|
|
|
+ /* Note that there is no progress for pend_postcopy when iterating */
|
2023-05-15 16:39:56 +03:00
|
|
|
+ if (pend_precopy > threshold && snap_state.bs_pos + pending_size < maxlen) {
|
2019-04-19 10:53:37 +03:00
|
|
|
+ ret = qemu_savevm_state_iterate(snap_state.file, false);
|
|
|
|
+ if (ret < 0) {
|
|
|
|
+ save_snapshot_error("qemu_savevm_state_iterate error %d", ret);
|
|
|
|
+ break;
|
|
|
|
+ }
|
2020-07-02 14:07:28 +03:00
|
|
|
+ DPRINTF("savevm iterate pending size %lu ret %d\n", pending_size, ret);
|
2017-04-05 11:49:19 +03:00
|
|
|
+ } else {
|
2019-06-06 13:58:15 +03:00
|
|
|
+ qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER, NULL);
|
2023-10-17 15:10:09 +03:00
|
|
|
+ global_state_store();
|
2020-07-02 14:07:28 +03:00
|
|
|
+
|
|
|
|
+ DPRINTF("savevm iterate complete\n");
|
2017-04-05 11:49:19 +03:00
|
|
|
+ break;
|
|
|
|
+ }
|
|
|
|
+ }
|
|
|
|
+
|
2020-07-02 14:07:28 +03:00
|
|
|
+ DPRINTF("timing: process_savevm_co took %ld ms\n",
|
|
|
|
+ qemu_clock_get_ms(QEMU_CLOCK_REALTIME) - start_time);
|
|
|
|
+
|
|
|
|
+#ifdef DEBUG_SAVEVM_STATE
|
|
|
|
+ int64_t start_time_flush = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
|
|
|
|
+#endif
|
|
|
|
+ /* If a drive runs in an IOThread we can flush it async, and only
|
|
|
|
+ * need to sync-flush whatever IO happens between now and
|
|
|
|
+ * vm_stop_force_state. bdrv_next can only be called from main AioContext,
|
|
|
|
+ * so move there now and after every flush.
|
|
|
|
+ */
|
|
|
|
+ aio_co_reschedule_self(qemu_get_aio_context());
|
|
|
|
+ for (bs = bdrv_first(&it); bs; bs = bdrv_next(&it)) {
|
|
|
|
+ /* target has BDRV_O_NO_FLUSH, no sense calling bdrv_flush on it */
|
|
|
|
+ if (bs == blk_bs(snap_state.target)) {
|
|
|
|
+ continue;
|
|
|
|
+ }
|
|
|
|
+
|
|
|
|
+ AioContext *bs_ctx = bdrv_get_aio_context(bs);
|
|
|
|
+ if (bs_ctx != qemu_get_aio_context()) {
|
|
|
|
+ DPRINTF("savevm: async flushing drive %s\n", bs->filename);
|
|
|
|
+ aio_co_reschedule_self(bs_ctx);
|
2023-05-15 16:39:53 +03:00
|
|
|
+ bdrv_graph_co_rdlock();
|
2020-07-02 14:07:28 +03:00
|
|
|
+ bdrv_flush(bs);
|
2023-05-15 16:39:53 +03:00
|
|
|
+ bdrv_graph_co_rdunlock();
|
2020-07-02 14:07:28 +03:00
|
|
|
+ aio_co_reschedule_self(qemu_get_aio_context());
|
|
|
|
+ }
|
|
|
|
+ }
|
|
|
|
+
|
|
|
|
+ DPRINTF("timing: async flushing took %ld ms\n",
|
|
|
|
+ qemu_clock_get_ms(QEMU_CLOCK_REALTIME) - start_time_flush);
|
2017-04-05 11:49:19 +03:00
|
|
|
+
|
2020-07-02 14:07:28 +03:00
|
|
|
+ qemu_bh_schedule(snap_state.finalize_bh);
|
2017-04-05 11:49:19 +03:00
|
|
|
+}
|
|
|
|
+
|
2023-05-15 16:39:53 +03:00
|
|
|
+void qmp_savevm_start(const char *statefile, Error **errp)
|
2017-04-05 11:49:19 +03:00
|
|
|
+{
|
|
|
|
+ Error *local_err = NULL;
|
2020-07-02 14:07:28 +03:00
|
|
|
+ MigrationState *ms = migrate_get_current();
|
|
|
|
+ AioContext *iohandler_ctx = iohandler_get_aio_context();
|
2017-04-05 11:49:19 +03:00
|
|
|
+
|
2017-08-07 10:10:07 +03:00
|
|
|
+ int bdrv_oflags = BDRV_O_RDWR | BDRV_O_RESIZE | BDRV_O_NO_FLUSH;
|
2017-04-05 11:49:19 +03:00
|
|
|
+
|
|
|
|
+ if (snap_state.state != SAVE_STATE_DONE) {
|
|
|
|
+ error_set(errp, ERROR_CLASS_GENERIC_ERROR,
|
|
|
|
+ "VM snapshot already started");
|
|
|
|
+ return;
|
|
|
|
+ }
|
|
|
|
+
|
2020-07-02 14:07:28 +03:00
|
|
|
+ if (migration_is_running(ms->state)) {
|
|
|
|
+ error_set(errp, ERROR_CLASS_GENERIC_ERROR, QERR_MIGRATION_ACTIVE);
|
|
|
|
+ return;
|
|
|
|
+ }
|
|
|
|
+
|
2023-10-17 15:10:09 +03:00
|
|
|
+ if (migrate_block()) {
|
2020-07-02 14:07:28 +03:00
|
|
|
+ error_set(errp, ERROR_CLASS_GENERIC_ERROR,
|
|
|
|
+ "Block migration and snapshots are incompatible");
|
|
|
|
+ return;
|
|
|
|
+ }
|
|
|
|
+
|
2017-04-05 11:49:19 +03:00
|
|
|
+ /* initialize snapshot info */
|
|
|
|
+ snap_state.saved_vm_running = runstate_is_running();
|
|
|
|
+ snap_state.bs_pos = 0;
|
|
|
|
+ snap_state.total_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
|
|
|
|
+ snap_state.blocker = NULL;
|
2022-10-14 15:07:15 +03:00
|
|
|
+ snap_state.target_close_wait = (QemuCoSleep){ .to_wake = NULL };
|
2017-04-05 11:49:19 +03:00
|
|
|
+
|
|
|
|
+ if (snap_state.error) {
|
|
|
|
+ error_free(snap_state.error);
|
|
|
|
+ snap_state.error = NULL;
|
|
|
|
+ }
|
|
|
|
+
|
update submodule and patches to QEMU 8.0.0
Many changes were necessary this time around:
* QAPI was changed to avoid redundant has_* variables, see commit
44ea9d9be3 ("qapi: Start to elide redundant has_FOO in generated C")
for details. This affected many QMP commands added by Proxmox too.
* Pending querying for migration got split into two functions, one to
estimate, one for exact value, see commit c8df4a7aef ("migration:
Split save_live_pending() into state_pending_*") for details. Relevant
for savevm-async and PBS dirty bitmap.
* Some block (driver) functions got converted to coroutines, so the
Proxmox block drivers needed to be adapted.
* Alloc track auto-detaching during PBS live restore got broken by
AioContext-related changes resulting in a deadlock. The current, hacky
method was replaced by a simpler one. Stefan apparently ran into a
problem with that when he wrote the driver, but there were
improvements in the stream job code since then and I didn't manage to
reproduce the issue. It's a separate patch "alloc-track: fix deadlock
during drop" for now, you can find the details there.
* Async snapshot-related changes:
- The pending querying got adapted to the above-mentioned split and
a patch is added to optimize it/make it more similar to what
upstream code does.
- Added initialization of the compression counters (for
future-proofing).
- It's necessary to hold the BQL (big QEMU lock = iothread mutex)
during the setup phase, because block layer functions are used there
and not doing so leads to racy, hard-to-debug crashes or hangs. It's
necessary to change some upstream code too for this, a version of
the patch "migration: for snapshots, hold the BQL during setup
callbacks" is intended to be upstreamed.
- Need to take the bdrv graph read lock before flushing.
* hmp_info_balloon was moved to a different file.
* Needed to include new headers from time to time to still get the
correct functions.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
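A minimal sketch of what the has_* elision mentioned above means for C callers, assuming a hypothetical struct `SaveInfoSketch` (it is not the generated SaveVMInfo, and `describe_save_info` is an illustrative helper, not a QEMU function). Optional pointer-typed members are tested against NULL, while optional scalar members keep their has_* flag:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for a generated QAPI struct; not the real SaveVMInfo. */
typedef struct SaveInfoSketch {
    const char *status; /* optional string: NULL means "absent" */
    bool has_bytes;     /* optional scalar: the presence flag is kept */
    int64_t bytes;
} SaveInfoSketch;

/* Writes a one-line summary into buf and returns the snprintf() result. */
static int describe_save_info(const SaveInfoSketch *info, char *buf, size_t len)
{
    assert(info != NULL && buf != NULL);

    /* Pointer member: presence is checked via NULL, no has_status needed. */
    const char *status = info->status ? info->status : "not running";

    /* Scalar member: 0 is a valid value, so has_bytes is still required. */
    if (info->has_bytes) {
        return snprintf(buf, len, "status=%s bytes=%lld",
                        status, (long long)info->bytes);
    }
    return snprintf(buf, len, "status=%s", status);
}
```

The same split shows up in hmp_info_savevm() in this patch, which tests `info->status` and `info->error` directly but still checks `info->has_bytes`.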
2023-05-15 16:39:53 +03:00
|
|
|
+ if (!statefile) {
|
2017-04-05 11:49:19 +03:00
|
|
|
+ vm_stop(RUN_STATE_SAVE_VM);
|
|
|
|
+ snap_state.state = SAVE_STATE_COMPLETED;
|
|
|
|
+ return;
|
|
|
|
+ }
|
|
|
|
+
|
|
|
|
+ if (qemu_savevm_state_blocked(errp)) {
|
|
|
|
+ return;
|
|
|
|
+ }
|
|
|
|
+
|
|
|
|
+ /* Open the image */
|
|
|
|
+ QDict *options = NULL;
|
|
|
|
+ options = qdict_new();
|
2018-08-30 16:00:07 +03:00
|
|
|
+ qdict_put_str(options, "driver", "raw");
|
2017-08-07 10:10:07 +03:00
|
|
|
+ snap_state.target = blk_new_open(statefile, NULL, options, bdrv_oflags, &local_err);
|
|
|
|
+ if (!snap_state.target) {
|
2017-04-05 11:49:19 +03:00
|
|
|
+ error_set(errp, ERROR_CLASS_GENERIC_ERROR, "failed to open '%s'", statefile);
|
|
|
|
+ goto restart;
|
|
|
|
+ }
|
|
|
|
+
|
update submodule and patches to 7.1.0
Notable changes:
* The only big change is the switch to using a custom QIOChannel for
savevm-async, because the previously used QEMUFileOps was dropped.
Changes to the current implementation:
* Switch to vector based methods as required for an IO channel. For
short reads the passed-in IO vector is stuffed with zeroes at the
end, just to be sure.
* For reading: The documentation in include/io/channel.h states that
at least one byte should be read, so also error out when we are
at the very end instead of returning 0.
* For reading: Fix off-by-one error when request goes beyond end.
The wrong code piece was:
if ((pos + size) > maxlen) {
size = maxlen - pos - 1;
}
Previously, the last byte would not be read. It's actually
possible to get a snapshot .raw file that has content all the way
up to the final 512-byte (= BDRV_SECTOR_SIZE) boundary without any
trailing zero bytes (I wrote a script to do it).
Luckily, it didn't cause a real issue, because qemu_loadvm_state()
is not interested in the final (i.e. QEMU_VM_VMDESCRIPTION)
section. The buffer for reading it is simply freed up afterwards
and the function will assume that it read the whole section, even
if that's not the case.
* For writing: Make use of the generated blk_pwritev() wrapper
instead of manually wrapping the coroutine to simplify and save a
few lines.
* Adapt to changed interfaces for blk_{pread,pwrite}:
* a9262f551e ("block: Change blk_{pread,pwrite}() param order")
* 3b35d4542c ("block: Add a 'flags' param to blk_pread()")
* bf5b16fa40 ("block: Make blk_{pread,pwrite}() return 0 on success")
Those changes especially affected the qemu-img dd patches, because
the context also changed, but also some of our block drivers used
the functions.
* Drop qemu-common.h include: it got renamed after essentially
everything was moved to other headers. The only remaining user I
could find for things dropped from the header between 7.0 and 7.1
was qemu_get_vm_name() in the iscsi-initiatorname patch, but it
already includes the header to which the function was moved.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
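The off-by-one described above can be sketched in isolation. This is an illustrative example under assumptions, not the code from the patch: `clamp_read_size` and `clamp_read_size_old` are hypothetical helper names, and `maxlen` is taken to be the total length of the state file in bytes.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: clamp a read of `size` bytes at offset `pos` so that
 * it never goes past `maxlen`, the total length of the file.
 */
static size_t clamp_read_size(size_t pos, size_t size, size_t maxlen)
{
    assert(pos <= maxlen);
    if (size > maxlen - pos) {  /* overflow-safe form of pos + size > maxlen */
        size = maxlen - pos;    /* every remaining byte, including the last */
    }
    return size;
}

/* The pre-fix arithmetic quoted in the message, kept only for comparison. */
static size_t clamp_read_size_old(size_t pos, size_t size, size_t maxlen)
{
    assert(pos < maxlen);
    if (pos + size > maxlen) {
        size = maxlen - pos - 1; /* drops the final byte of the file */
    }
    return size;
}
```

For a 512-byte file and an 8-byte request at offset 508, the corrected clamp yields 4 bytes while the old arithmetic yields 3, which is the missing last byte the message refers to.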
2022-10-14 15:07:13 +03:00
|
|
|
+ QIOChannel *ioc = QIO_CHANNEL(qio_channel_savevm_async_new(snap_state.target,
|
|
|
|
+ &snap_state.bs_pos));
|
|
|
|
+ snap_state.file = qemu_file_new_output(ioc);
|
2017-04-05 11:49:19 +03:00
|
|
|
+
|
|
|
|
+ if (!snap_state.file) {
|
|
|
|
+ error_set(errp, ERROR_CLASS_GENERIC_ERROR, "failed to open '%s'", statefile);
|
|
|
|
+ goto restart;
|
|
|
|
+ }
|
|
|
|
+
|
2020-07-02 14:07:28 +03:00
|
|
|
+ /*
|
|
|
|
+ * qemu_savevm_* paths use migration code and expect a migration state.
|
|
|
|
+ * State is cleared in process_savevm_co, but has to be initialized
|
|
|
|
+ * here (blocking main thread, from QMP) to avoid race conditions.
|
|
|
|
+ */
|
|
|
|
+ migrate_init(ms);
|
2023-10-17 15:10:09 +03:00
|
|
|
+ memset(&mig_stats, 0, sizeof(mig_stats));
|
squash related patches
where there is no good reason to keep them separate. It's a pain
during rebase if there are multiple patches changing the same code
over and over again. This was especially bad for the backup-related
patches. If the history of patches really is needed, it can be
extracted via git. Additionally, compilation with partial application
of patches has been broken for a long time, because one of the master key
changes became part of an earlier patch during a past rebase.
If only the same files were changed by a subsequent patch and the
changes felt to belong together (obvious for later bug fixes, but also
done for features e.g. adding master key support for PBS), the patches
were squashed together.
The PBS namespace support patch was split into the individual parts
it changes, i.e. PBS block driver, pbs-restore binary and QMP backup
infrastructure, and squashed into the respective patches.
No code change is intended, git diff in the submodule should not show
any difference between applying all patches before this commit and
applying all patches after this commit.
The query-proxmox-support QMP function has been left as part of the
"PVE-Backup: Proxmox backup patches for QEMU" patch, because it's
currently only used there. If it ever is used elsewhere too, it can
be split out from there.
The recent alloc-track and BQL-related savevm-async changes have been
left separate for now, because it's not 100% clear they are the best
approach yet. This depends on what upstream decides about the BQL
stuff and whether and what kind of issues with the changes pop up.
The qemu-img dd snapshot patch has been re-ordered to after the other
qemu-img dd patches.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2023-05-15 16:39:56 +03:00
|
|
|
+ memset(&compression_counters, 0, sizeof(compression_counters));
|
2020-07-02 14:07:28 +03:00
|
|
|
+ ms->to_dst_file = snap_state.file;
|
2017-04-05 11:49:19 +03:00
|
|
|
+
|
|
|
|
+ error_setg(&snap_state.blocker, "block device is in use by savevm");
|
2017-08-07 10:10:07 +03:00
|
|
|
+ blk_op_block_all(snap_state.target, snap_state.blocker);
|
2017-04-05 11:49:19 +03:00
|
|
|
+
|
2019-04-19 10:53:37 +03:00
|
|
|
+ snap_state.state = SAVE_STATE_ACTIVE;
|
2020-07-02 14:07:28 +03:00
|
|
|
+ snap_state.finalize_bh = qemu_bh_new(process_savevm_finalize, &snap_state);
|
|
|
|
+ snap_state.co = qemu_coroutine_create(&process_savevm_co, NULL);
|
|
|
|
+ qemu_mutex_unlock_iothread();
|
|
|
|
+ qemu_savevm_state_header(snap_state.file);
|
|
|
|
+ qemu_savevm_state_setup(snap_state.file);
|
|
|
|
+ qemu_mutex_lock_iothread();
|
|
|
|
+
|
|
|
|
+ /* Async processing from here on out happens in iohandler context, so let
|
|
|
|
+ * the target bdrv have its home there.
|
|
|
|
+ */
|
|
|
|
+ blk_set_aio_context(snap_state.target, iohandler_ctx, &local_err);
|
|
|
|
+
|
|
|
|
+ aio_co_schedule(iohandler_ctx, snap_state.co);
|
2017-04-05 11:49:19 +03:00
|
|
|
+
|
|
|
|
+ return;
|
|
|
|
+
|
|
|
|
+restart:
|
|
|
|
+
|
|
|
|
+ save_snapshot_error("setup failed");
|
|
|
|
+
|
|
|
|
+ if (snap_state.saved_vm_running) {
|
|
|
|
+ vm_start();
|
2021-02-11 19:11:11 +03:00
|
|
|
+ snap_state.saved_vm_running = false;
|
2017-04-05 11:49:19 +03:00
|
|
|
+ }
|
|
|
|
+}
|
|
|
|
+
|
2021-02-11 19:11:11 +03:00
|
|
|
+void coroutine_fn qmp_savevm_end(Error **errp)
|
2017-04-05 11:49:19 +03:00
|
|
|
+{
|
2021-02-11 19:11:11 +03:00
|
|
|
+ int64_t timeout;
|
|
|
|
+
|
2017-04-05 11:49:19 +03:00
|
|
|
+ if (snap_state.state == SAVE_STATE_DONE) {
|
|
|
|
+ error_set(errp, ERROR_CLASS_GENERIC_ERROR,
|
|
|
|
+ "VM snapshot not started\n");
|
|
|
|
+ return;
|
|
|
|
+ }
|
|
|
|
+
|
|
|
|
+ if (snap_state.state == SAVE_STATE_ACTIVE) {
|
|
|
|
+ snap_state.state = SAVE_STATE_CANCELLED;
|
2021-02-11 19:11:11 +03:00
|
|
|
+ goto wait_for_close;
|
2017-04-05 11:49:19 +03:00
|
|
|
+ }
|
|
|
|
+
|
|
|
|
+ if (snap_state.saved_vm_running) {
|
|
|
|
+ vm_start();
|
2021-02-11 19:11:11 +03:00
|
|
|
+ snap_state.saved_vm_running = false;
|
2017-04-05 11:49:19 +03:00
|
|
|
+ }
|
|
|
|
+
|
|
|
|
+ snap_state.state = SAVE_STATE_DONE;
|
2021-02-11 19:11:11 +03:00
|
|
|
+
|
|
|
|
+wait_for_close:
|
|
|
|
+ if (!snap_state.target) {
|
|
|
|
+ DPRINTF("savevm-end: no target file open\n");
|
|
|
|
+ return;
|
|
|
|
+ }
|
|
|
|
+
|
|
|
|
+ /* wait until cleanup is done before returning, this ensures that after this
|
|
|
|
+ * call exits the statefile will be closed and can be removed immediately */
|
|
|
|
+ DPRINTF("savevm-end: waiting for cleanup\n");
|
|
|
|
+ timeout = 30L * 1000 * 1000 * 1000;
|
2022-08-18 14:44:16 +03:00
|
|
|
+ qemu_co_sleep_ns_wakeable(&snap_state.target_close_wait,
|
2021-10-11 14:55:34 +03:00
|
|
|
+ QEMU_CLOCK_REALTIME, timeout);
|
2021-02-11 19:11:11 +03:00
|
|
|
+ if (snap_state.target) {
|
|
|
|
+ save_snapshot_error("timeout waiting for target file close in "
|
|
|
|
+ "qmp_savevm_end");
|
|
|
|
+ /* we cannot assume the snapshot finished in this case, so leave the
|
|
|
|
+ * state alone - caller has to figure something out */
|
|
|
|
+ return;
|
|
|
|
+ }
|
|
|
|
+
|
2022-08-18 14:44:17 +03:00
|
|
|
+ // File closed and no other error, so ensure next snapshot can be started.
|
|
|
|
+ if (snap_state.state != SAVE_STATE_ERROR) {
|
|
|
|
+ snap_state.state = SAVE_STATE_DONE;
|
|
|
|
+ }
|
|
|
|
+
|
2021-02-11 19:11:11 +03:00
|
|
|
+ DPRINTF("savevm-end: cleanup done\n");
|
2017-04-05 11:49:19 +03:00
|
|
|
+}
|
|
|
|
+
|
2018-02-22 14:34:57 +03:00
|
|
|
+int load_snapshot_from_blockdev(const char *filename, Error **errp)
|
2017-04-05 11:49:19 +03:00
|
|
|
+{
|
2017-08-07 10:10:07 +03:00
|
|
|
+ BlockBackend *be;
|
2017-04-05 11:49:19 +03:00
|
|
|
+ Error *local_err = NULL;
|
|
|
|
+ Error *blocker = NULL;
|
|
|
|
+
|
|
|
|
+ QEMUFile *f;
|
2022-10-14 15:07:13 +03:00
|
|
|
+ size_t bs_pos = 0;
|
2017-08-07 10:10:07 +03:00
|
|
|
+ int ret = -EINVAL;
|
2017-04-05 11:49:19 +03:00
|
|
|
+
|
2017-08-07 10:10:07 +03:00
|
|
|
+ be = blk_new_open(filename, NULL, NULL, 0, &local_err);
|
2017-04-05 11:49:19 +03:00
|
|
|
+
|
2017-08-07 10:10:07 +03:00
|
|
|
+ if (!be) {
|
2018-02-22 14:34:57 +03:00
|
|
|
+ error_setg(errp, "Could not open VM state file");
|
2017-04-05 11:49:19 +03:00
|
|
|
+ goto the_end;
|
|
|
|
+ }
|
|
|
|
+
|
2017-08-07 10:10:07 +03:00
|
|
|
+ error_setg(&blocker, "block device is in use by load state");
|
|
|
|
+ blk_op_block_all(be, blocker);
|
|
|
|
+
|
2017-04-05 11:49:19 +03:00
|
|
|
+ /* restore the VM state */
|
2022-10-14 15:07:13 +03:00
|
|
|
+ f = qemu_file_new_input(QIO_CHANNEL(qio_channel_savevm_async_new(be, &bs_pos)));
|
2017-04-05 11:49:19 +03:00
|
|
|
+ if (!f) {
|
2018-02-22 14:34:57 +03:00
|
|
|
+ error_setg(errp, "Could not open VM state file");
|
2017-04-05 11:49:19 +03:00
|
|
|
+ goto the_end;
|
|
|
|
+ }
|
|
|
|
+
|
2018-02-22 14:34:57 +03:00
|
|
|
+ qemu_system_reset(SHUTDOWN_CAUSE_NONE);
|
2017-04-05 11:49:19 +03:00
|
|
|
+ ret = qemu_loadvm_state(f);
|
|
|
|
+
|
2021-03-16 19:30:22 +03:00
|
|
|
+ /* dirty bitmap migration has a special case we need to trigger manually */
|
|
|
|
+ dirty_bitmap_mig_before_vm_start();
|
|
|
|
+
|
2017-04-05 11:49:19 +03:00
|
|
|
+ qemu_fclose(f);
|
2023-05-15 16:39:56 +03:00
|
|
|
+
|
|
|
|
+ /* state_destroy assumes a real migration which would have added a yank */
|
|
|
|
+ yank_register_instance(MIGRATION_YANK_INSTANCE, &error_abort);
|
|
|
|
+
|
2017-04-05 11:49:19 +03:00
|
|
|
+ migration_incoming_state_destroy();
|
|
|
|
+ if (ret < 0) {
|
2018-02-22 14:34:57 +03:00
|
|
|
+ error_setg_errno(errp, -ret, "Error while loading VM state");
|
2017-04-05 11:49:19 +03:00
|
|
|
+ goto the_end;
|
|
|
|
+ }
|
|
|
|
+
|
|
|
|
+ ret = 0;
|
|
|
|
+
|
|
|
|
+ the_end:
|
2017-08-07 10:10:07 +03:00
|
|
|
+ if (be) {
|
|
|
|
+ blk_op_unblock_all(be, blocker);
|
2017-04-05 11:49:19 +03:00
|
|
|
+ error_free(blocker);
|
2017-08-07 10:10:07 +03:00
|
|
|
+ blk_unref(be);
|
2017-04-05 11:49:19 +03:00
|
|
|
+ }
|
|
|
|
+ return ret;
|
|
|
|
+}
|
2021-02-11 19:11:11 +03:00
|
|
|
diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c
|
2023-05-24 16:56:53 +03:00
|
|
|
index 6c559b48c8..91be698308 100644
|
2021-02-11 19:11:11 +03:00
|
|
|
--- a/monitor/hmp-cmds.c
|
|
|
|
+++ b/monitor/hmp-cmds.c
|
2023-05-15 16:39:53 +03:00
|
|
|
@@ -22,6 +22,7 @@
|
|
|
|
#include "monitor/monitor-internal.h"
|
|
|
|
#include "qapi/error.h"
|
|
|
|
#include "qapi/qapi-commands-control.h"
|
|
|
|
+#include "qapi/qapi-commands-migration.h"
|
|
|
|
#include "qapi/qapi-commands-misc.h"
|
|
|
|
#include "qapi/qmp/qdict.h"
|
|
|
|
#include "qapi/qmp/qerror.h"
|
2023-05-24 16:56:53 +03:00
|
|
|
@@ -443,3 +444,40 @@ void hmp_info_mtree(Monitor *mon, const QDict *qdict)
|
2021-02-11 19:11:11 +03:00
|
|
|
|
2023-05-15 16:39:53 +03:00
|
|
|
mtree_info(flatview, dispatch_tree, owner, disabled);
|
|
|
|
}
|
|
|
|
+
|
2021-02-11 19:11:11 +03:00
|
|
|
+void hmp_savevm_start(Monitor *mon, const QDict *qdict)
|
|
|
|
+{
|
|
|
|
+ Error *errp = NULL;
|
|
|
|
+ const char *statefile = qdict_get_try_str(qdict, "statefile");
|
|
|
|
+
|
2023-05-15 16:39:53 +03:00
|
|
|
+ qmp_savevm_start(statefile, &errp);
|
2021-02-11 19:11:11 +03:00
|
|
|
+ hmp_handle_error(mon, errp);
|
|
|
|
+}
|
|
|
|
+
|
|
|
|
+void coroutine_fn hmp_savevm_end(Monitor *mon, const QDict *qdict)
|
|
|
|
+{
|
|
|
|
+ Error *errp = NULL;
|
|
|
|
+
|
|
|
|
+ qmp_savevm_end(&errp);
|
|
|
|
+ hmp_handle_error(mon, errp);
|
|
|
|
+}
|
|
|
|
+
|
|
|
|
+void hmp_info_savevm(Monitor *mon, const QDict *qdict)
|
|
|
|
+{
|
|
|
|
+ SaveVMInfo *info;
|
|
|
|
+ info = qmp_query_savevm(NULL);
|
|
|
|
+
|
2023-05-15 16:39:53 +03:00
|
|
|
+ if (info->status) {
|
2021-02-11 19:11:11 +03:00
|
|
|
+ monitor_printf(mon, "savevm status: %s\n", info->status);
|
|
|
|
+ monitor_printf(mon, "total time: %" PRIu64 " milliseconds\n",
|
|
|
|
+ info->total_time);
|
|
|
|
+ } else {
|
|
|
|
+ monitor_printf(mon, "savevm status: not running\n");
|
|
|
|
+ }
|
|
|
|
+ if (info->has_bytes) {
|
|
|
|
+ monitor_printf(mon, "Bytes saved: %"PRIu64"\n", info->bytes);
|
|
|
|
+ }
|
2023-05-15 16:39:53 +03:00
|
|
|
+ if (info->error) {
|
2021-02-11 19:11:11 +03:00
|
|
|
+ monitor_printf(mon, "Error: %s\n", info->error);
|
|
|
|
+ }
|
|
|
|
+}
|
|
|
|
diff --git a/qapi/migration.json b/qapi/migration.json
|
2023-10-17 15:10:09 +03:00
|
|
|
index 8843e74b59..aca0ca1ac1 100644
|
2021-02-11 19:11:11 +03:00
|
|
|
--- a/qapi/migration.json
|
|
|
|
+++ b/qapi/migration.json
|
2023-10-17 15:10:09 +03:00
|
|
|
@@ -291,6 +291,40 @@
|
|
|
|
'*dirty-limit-throttle-time-per-round': 'uint64',
|
|
|
|
'*dirty-limit-ring-full-time': 'uint64'} }
|
2021-02-11 19:11:11 +03:00
|
|
|
|
|
|
|
+##
|
|
|
|
+# @SaveVMInfo:
|
|
|
|
+#
|
|
|
|
+# Information about current migration process.
|
|
|
|
+#
|
|
|
|
+# @status: string describing the current savevm status.
|
|
|
|
+# This can be 'active', 'completed', 'failed'.
|
|
|
|
+# If this field is not returned, no savevm process
|
|
|
|
+# has been initiated
|
|
|
|
+#
|
|
|
|
+# @error: string containing error message if status is failed.
|
|
|
|
+#
|
|
|
|
+# @total-time: total amount of milliseconds since savevm started.
|
|
|
|
+# If savevm has ended, it returns the total save time
|
|
|
|
+#
|
|
|
|
+# @bytes: total amount of data transferred
|
|
|
|
+#
|
|
|
|
+# Since: 1.3
|
|
|
|
+##
|
|
|
|
+{ 'struct': 'SaveVMInfo',
|
|
|
|
+ 'data': {'*status': 'str', '*error': 'str',
|
|
|
|
+ '*total-time': 'int', '*bytes': 'int'} }
|
|
|
|
+
|
|
|
|
+##
|
|
|
|
+# @query-savevm:
|
|
|
|
+#
|
|
|
|
+# Returns information about current savevm process.
|
|
|
|
+#
|
|
|
|
+# Returns: @SaveVMInfo
|
|
|
|
+#
|
|
|
|
+# Since: 1.3
|
|
|
|
+##
|
|
|
|
+{ 'command': 'query-savevm', 'returns': 'SaveVMInfo' }
|
|
|
|
+
|
|
|
|
##
|
|
|
|
# @query-migrate:
|
|
|
|
#
|
|
|
|
diff --git a/qapi/misc.json b/qapi/misc.json
|
2023-10-17 15:10:09 +03:00
|
|
|
index cda2effa81..94a58bb0bf 100644
|
2021-02-11 19:11:11 +03:00
|
|
|
--- a/qapi/misc.json
|
|
|
|
+++ b/qapi/misc.json
|
2023-10-17 15:10:09 +03:00
|
|
|
@@ -456,6 +456,22 @@
|
2021-02-11 19:11:11 +03:00
|
|
|
##
|
|
|
|
{ 'command': 'query-fdsets', 'returns': ['FdsetInfo'] }
|
|
|
|
|
|
|
|
+##
|
|
|
|
+# @savevm-start:
|
|
|
|
+#
|
|
|
|
+# Prepare for snapshot and halt VM. Save VM state to statefile.
|
|
|
|
+#
|
|
|
|
+##
|
|
|
|
+{ 'command': 'savevm-start', 'data': { '*statefile': 'str' } }
|
|
|
|
+
|
|
|
|
+##
|
|
|
|
+# @savevm-end:
|
|
|
|
+#
|
|
|
|
+# Resume VM after a snapshot.
|
|
|
|
+#
|
|
|
|
+##
|
|
|
|
+{ 'command': 'savevm-end', 'coroutine': true }
|
|
|
|
+
|
|
|
|
##
|
|
|
|
# @CommandLineParameterType:
|
|
|
|
#
|
|
|
|
diff --git a/qemu-options.hx b/qemu-options.hx
|
2024-01-30 17:14:37 +03:00
|
|
|
index 8073f5edf5..dc1ececc9c 100644
|
2021-02-11 19:11:11 +03:00
|
|
|
--- a/qemu-options.hx
|
|
|
|
+++ b/qemu-options.hx
|
2024-01-30 17:14:37 +03:00
|
|
|
@@ -4483,6 +4483,18 @@ SRST
|
2021-02-11 19:11:11 +03:00
|
|
|
Start right away with a saved state (``loadvm`` in monitor)
|
|
|
|
ERST
|
|
|
|
|
|
|
|
+DEF("loadstate", HAS_ARG, QEMU_OPTION_loadstate, \
|
|
|
|
+ "-loadstate file\n" \
|
|
|
|
+ " start right away with a saved state\n",
|
|
|
|
+ QEMU_ARCH_ALL)
|
|
|
|
+SRST
|
|
|
|
+``-loadstate file``
|
|
|
|
+ Start right away with a saved state. This option does not rollback
|
|
|
|
+ disk state like @code{loadvm}, so the user must make sure that the disks
|
|
|
|
+ have the correct state. @var{file} can be any valid device URL. See the section
|
|
|
|
+ for "Device URL Syntax" for more information.
|
|
|
|
+ERST
|
|
|
|
+
|
|
|
|
#ifndef _WIN32
|
|
|
|
DEF("daemonize", 0, QEMU_OPTION_daemonize, \
|
|
|
|
"-daemonize daemonize QEMU after initializing\n", QEMU_ARCH_ALL)
|
2020-04-07 17:53:19 +03:00
|
|
|
diff --git a/softmmu/vl.c b/softmmu/vl.c
|
2024-01-30 17:14:37 +03:00
|
|
|
index c9e9ede237..3f2681aded 100644
|
2020-04-07 17:53:19 +03:00
|
|
|
--- a/softmmu/vl.c
|
|
|
|
+++ b/softmmu/vl.c
|
2022-12-14 17:16:32 +03:00
|
|
|
@@ -164,6 +164,7 @@ static const char *accelerators;
|
update submodule and patches to 7.1.0
Notable changes:
* The only big change is the switch to using a custom QIOChannel for
savevm-async, because the previously used QEMUFileOps was dropped.
Changes to the current implementation:
* Switch to vector based methods as required for an IO channel. For
short reads the passed-in IO vector is stuffed with zeroes at the
end, just to be sure.
* For reading: The documentation in include/io/channel.h states that
at least one byte should be read, so also error out when we are
at the very end instead of returning 0.
* For reading: Fix off-by-one error when request goes beyond end.
The wrong code piece was:
if ((pos + size) > maxlen) {
size = maxlen - pos - 1;
}
Previously, the last byte would not be read. It's actually
possible to get a snapshot .raw file that has content all the way
up to the final 512 byte (= BDRV_SECTOR_SIZE) boundary without any
trailing zero bytes (I wrote a script to do it).
Luckily, it didn't cause a real issue, because qemu_loadvm_state()
is not interested in the final (i.e. QEMU_VM_VMDESCRIPTION)
section. The buffer for reading it is simply freed up afterwards
and the function will assume that it read the whole section, even
if that's not the case.
* For writing: Make use of the generated blk_pwritev() wrapper
instead of manually wrapping the coroutine to simplify and save a
few lines.
* Adapt to changed interfaces for blk_{pread,pwrite}:
* a9262f551e ("block: Change blk_{pread,pwrite}() param order")
* 3b35d4542c ("block: Add a 'flags' param to blk_pread()")
* bf5b16fa40 ("block: Make blk_{pread,pwrite}() return 0 on success")
Those changes especially affected the qemu-img dd patches, because
the context also changed, but also some of our block drivers used
the functions.
* Drop qemu-common.h include: it got renamed after essentially
everything was moved to other headers. The only remaining user I
could find for things dropped from the header between 7.0 and 7.1
was qemu_get_vm_name() in the iscsi-initiatorname patch, but it
already includes the header to which the function was moved.
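
The off-by-one fix and the zero-stuffing of short reads described above
can be sketched as follows. This is a simplified, self-contained
illustration that reads from an in-memory buffer instead of a block
device; the helper names clamp_read_size() and read_zero_filled() are
hypothetical and do not appear in the patch.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not taken from the patch: number of bytes that can
 * be read at offset `pos` when `size` bytes are requested and the state
 * occupies `maxlen` bytes in total.
 */
static size_t clamp_read_size(size_t pos, size_t size, size_t maxlen)
{
    if (pos >= maxlen) {
        return 0; /* at or past the end: the caller should report an error */
    }
    if (size > maxlen - pos) {
        /* The old code used maxlen - pos - 1 and lost the last byte. */
        size = maxlen - pos;
    }
    return size;
}

/*
 * Hypothetical helper: copy from an in-memory image standing in for the
 * block device and zero the unused tail of the buffer on a short read.
 * Returns the number of bytes actually read.
 */
static size_t read_zero_filled(const unsigned char *image, size_t maxlen,
                               size_t pos, unsigned char *buf, size_t size)
{
    size_t n = clamp_read_size(pos, size, maxlen);

    if (n > 0) {
        memcpy(buf, image + pos, n);
    }
    memset(buf + n, 0, size - n);
    return n;
}
```

With an 8-byte image, a 4-byte request at offset 6 yields the last two
bytes followed by two zero bytes, whereas the old clamp would have
returned only one byte. Comparing `size` against `maxlen - pos` rather
than computing `pos + size` also avoids an overflow in the check itself.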
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2022-10-14 15:07:13 +03:00
|
|
|
static bool have_custom_ram_size;
|
|
|
|
static const char *ram_memdev_id;
|
2021-10-11 14:55:34 +03:00
|
|
|
static QDict *machine_opts_dict;
|
2021-05-27 13:43:32 +03:00
|
|
|
+static const char *loadstate;
|
|
|
|
static QTAILQ_HEAD(, ObjectOption) object_opts = QTAILQ_HEAD_INITIALIZER(object_opts);
|
2022-02-11 12:24:33 +03:00
|
|
|
static QTAILQ_HEAD(, DeviceOption) device_opts = QTAILQ_HEAD_INITIALIZER(device_opts);
|
2022-10-14 15:07:13 +03:00
|
|
|
static int display_remote;
|
2024-01-30 17:14:37 +03:00
|
|
|
@@ -2647,6 +2648,12 @@ void qmp_x_exit_preconfig(Error **errp)
|
2022-02-11 12:24:33 +03:00
|
|
|
|
|
|
|
if (loadvm) {
|
|
|
|
load_snapshot(loadvm, NULL, false, NULL, &error_fatal);
|
2017-04-05 11:49:19 +03:00
|
|
|
+ } else if (loadstate) {
|
2018-02-22 14:34:57 +03:00
|
|
|
+ Error *local_err = NULL;
|
|
|
|
+ if (load_snapshot_from_blockdev(loadstate, &local_err) < 0) {
|
|
|
|
+ error_report_err(local_err);
|
2017-04-05 11:49:19 +03:00
|
|
|
+ autostart = 0;
|
|
|
|
+ }
|
|
|
|
}
|
2019-06-06 13:58:15 +03:00
|
|
|
if (replay_mode != REPLAY_MODE_NONE) {
|
|
|
|
replay_vmstate_init();
|
2024-01-30 17:14:37 +03:00
|
|
|
@@ -3194,6 +3201,9 @@ void qemu_init(int argc, char **argv)
|
2021-05-27 13:43:32 +03:00
|
|
|
case QEMU_OPTION_loadvm:
|
|
|
|
loadvm = optarg;
|
|
|
|
break;
|
|
|
|
+ case QEMU_OPTION_loadstate:
|
|
|
|
+ loadstate = optarg;
|
|
|
|
+ break;
|
|
|
|
case QEMU_OPTION_full_screen:
|
|
|
|
dpy.has_full_screen = true;
|
|
|
|
dpy.full_screen = true;
|