pve-qemu-qoup/debian/patches/pve/0018-PVE-add-optional-buffer-size-to-QEMUFile.patch

218 lines
7.6 KiB
Diff
Raw Permalink Normal View History

From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Wolfgang Bumiller <w.bumiller@proxmox.com>
Date: Mon, 4 May 2020 11:05:08 +0200
Subject: [PATCH] PVE: add optional buffer size to QEMUFile
So we can use a 4M buffer for savevm-async which should
increase performance storing the state onto ceph.
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
[increase max IOV count in QEMUFile to actually write more data]
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
update submodule and patches to 7.1.0 Notable changes: * The only big change is the switch to using a custom QIOChannel for savevm-async, because the previously used QEMUFileOps was dropped. Changes to the current implementation: * Switch to vector based methods as required for an IO channel. For short reads the passed-in IO vector is stuffed with zeroes at the end, just to be sure. * For reading: The documentation in include/io/channel.h states that at least one byte should be read, so also error out when whe are at the very end instead of returning 0. * For reading: Fix off-by-one error when request goes beyond end. The wrong code piece was: if ((pos + size) > maxlen) { size = maxlen - pos - 1; } Previously, the last byte would not be read. It's actually possible to get a snapshot .raw file that has content all the way up the final 512 byte (= BDRV_SECTOR_SIZE) boundary without any trailing zero bytes (I wrote a script to do it). Luckily, it didn't cause a real issue, because qemu_loadvm_state() is not interested in the final (i.e. QEMU_VM_VMDESCRIPTION) section. The buffer for reading it is simply freed up afterwards and the function will assume that it read the whole section, even if that's not the case. * For writing: Make use of the generated blk_pwritev() wrapper instead of manually wrapping the coroutine to simplify and save a few lines. * Adapt to changed interfaces for blk_{pread,pwrite}: * a9262f551e ("block: Change blk_{pread,pwrite}() param order") * 3b35d4542c ("block: Add a 'flags' param to blk_pread()") * bf5b16fa40 ("block: Make blk_{pread,pwrite}() return 0 on success") Those changes especially affected the qemu-img dd patches, because the context also changed, but also some of our block drivers used the functions. * Drop qemu-common.h include: it got renamed after essentially everything was moved to other headers. The only remaining user I could find for things dropped from the header between 7.0 and 7.1 was qemu_get_vm_name() in the iscsi-initiatorname patch, but it already includes the header to which the function was moved. Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2022-10-14 15:07:13 +03:00
[FE: adapt to removal of QEMUFileOps]
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
update submodule and patches to QEMU 8.2.2 This version includes both the AioContext lock and the block graph lock, so there might be some deadlocks lurking. It's not possible to disable the block graph lock like was done in QEMU 8.1, because there are no changes like the function bdrv_schedule_unref() that require it. QEMU 9.0 will finally get rid of the AioContext locking. During live-restore with a VirtIO SCSI drive with iothread there is a known racy deadlock related to the AioContext lock. Not new [1], but not sure if more likely now. Should be fixed in QEMU 9.0. The block graph lock comes with annotations that can be checked by clang's TSA. This required changes to the block drivers, i.e. alloc-track, pbs, zeroinit as well as taking the appropriate locks in pve-backup, savevm-async, vma-reader. Local variable shadowing is prohibited via a compiler flag now, required slight adaptation in vma.c. Major changes only affect alloc-track: * It is not possible to call a generated co-wrapper like bdrv_get_info() while holding the block graph lock exclusively [0], which does happen during initialization of alloc-track when the backing hd is set and the refresh_limits driver callback is invoked. The bdrv_get_info() call to get the cluster size is moved to directly after opening the file child in track_open(). The important thing is that at least the request alignment for the write target is used, because then the RMW cycle in bdrv_pwritev will gather enough data from the backing file. Partial cluster allocations in the target are not a fundamental issue, because the driver returns its allocation status based on the bitmap, so any other data that maps to the same cluster will still be copied later by a stream job (or during writes to that cluster). * Replacing the node cannot be done in the track_co_change_backing_file() callback, because it is a coroutine and cannot hold the block graph lock exclusively. So it is moved to the stream job itself with the auto-remove option not having an effect anymore (qemu-server would always set it anyways). In the future, there could either be a special option for the stream job, or maybe the upcoming blockdev-replace QMP command can be used. Replacing the backing child is actually already done in the stream job, so no need to do it in the track_co_change_backing_file() callback. It also cannot be called from a coroutine. Looking at the implementation in the qcow2 driver, it doesn't seem to be intended to change the backing child itself, just update driver-internal state. Other changes: * alloc-track: Error out early when used without auto-remove. Since replacing the node now happens in the stream job, where the option cannot be read from (it's internal to the driver), it will always be treated as 'on'. Makes sure to have users beside qemu-server notice the change (should they even exist). The option can be fully dropped in the future while adding a version guard in qemu-server. * alloc-track: Avoid seemingly superfluous child permission update. Doesn't seem necessary nowadays (maybe after commit "alloc-track: fix deadlock during drop" where the dropping is not rescheduled and delayed anymore or some upstream change). Replacing the block node will already update the permissions of the new node (which was the file child before). Should there really be some issue, instead of having a drop state, this could also be just based off the fact whether there is still a backing child. Dumping the cumulative (shared) permissions for the BDS with a debug print yields the same values after this patch and with QEMU 8.1, namely 3 and 5. * PBS block driver: compile unconditionally. Proxmox VE always needs it and something in the build process changed to make it not enabled by default. Probably would need to move the build option to meson otherwise. * backup: job unreferencing during cleanup needs to happen outside of coroutine, so it was moved to before invoking the clean * mirror: Cherry-pick stable fix to avoid potential deadlock. * savevm-async: migrate_init now can fail, so propagate potential error. * savevm-async: compression counters are not accessible outside migration/ram-compress now, so drop code that prophylactically set it to zero. [0]: https://lore.kernel.org/qemu-devel/220be383-3b0d-4938-b584-69ad214e5d5d@proxmox.com/ [1]: https://lore.kernel.org/qemu-devel/e13b488e-bf13-44f2-acca-e724d14f43fd@proxmox.com/ Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2024-04-25 18:21:28 +03:00
migration/qemu-file.c | 50 +++++++++++++++++++++++++++-------------
update submodule and patches to 7.1.0 Notable changes: * The only big change is the switch to using a custom QIOChannel for savevm-async, because the previously used QEMUFileOps was dropped. Changes to the current implementation: * Switch to vector based methods as required for an IO channel. For short reads the passed-in IO vector is stuffed with zeroes at the end, just to be sure. * For reading: The documentation in include/io/channel.h states that at least one byte should be read, so also error out when whe are at the very end instead of returning 0. * For reading: Fix off-by-one error when request goes beyond end. The wrong code piece was: if ((pos + size) > maxlen) { size = maxlen - pos - 1; } Previously, the last byte would not be read. It's actually possible to get a snapshot .raw file that has content all the way up the final 512 byte (= BDRV_SECTOR_SIZE) boundary without any trailing zero bytes (I wrote a script to do it). Luckily, it didn't cause a real issue, because qemu_loadvm_state() is not interested in the final (i.e. QEMU_VM_VMDESCRIPTION) section. The buffer for reading it is simply freed up afterwards and the function will assume that it read the whole section, even if that's not the case. * For writing: Make use of the generated blk_pwritev() wrapper instead of manually wrapping the coroutine to simplify and save a few lines. * Adapt to changed interfaces for blk_{pread,pwrite}: * a9262f551e ("block: Change blk_{pread,pwrite}() param order") * 3b35d4542c ("block: Add a 'flags' param to blk_pread()") * bf5b16fa40 ("block: Make blk_{pread,pwrite}() return 0 on success") Those changes especially affected the qemu-img dd patches, because the context also changed, but also some of our block drivers used the functions. * Drop qemu-common.h include: it got renamed after essentially everything was moved to other headers. The only remaining user I could find for things dropped from the header between 7.0 and 7.1 was qemu_get_vm_name() in the iscsi-initiatorname patch, but it already includes the header to which the function was moved. Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2022-10-14 15:07:13 +03:00
migration/qemu-file.h | 2 ++
migration/savevm-async.c | 5 ++--
update submodule and patches to QEMU 8.2.2 This version includes both the AioContext lock and the block graph lock, so there might be some deadlocks lurking. It's not possible to disable the block graph lock like was done in QEMU 8.1, because there are no changes like the function bdrv_schedule_unref() that require it. QEMU 9.0 will finally get rid of the AioContext locking. During live-restore with a VirtIO SCSI drive with iothread there is a known racy deadlock related to the AioContext lock. Not new [1], but not sure if more likely now. Should be fixed in QEMU 9.0. The block graph lock comes with annotations that can be checked by clang's TSA. This required changes to the block drivers, i.e. alloc-track, pbs, zeroinit as well as taking the appropriate locks in pve-backup, savevm-async, vma-reader. Local variable shadowing is prohibited via a compiler flag now, required slight adaptation in vma.c. Major changes only affect alloc-track: * It is not possible to call a generated co-wrapper like bdrv_get_info() while holding the block graph lock exclusively [0], which does happen during initialization of alloc-track when the backing hd is set and the refresh_limits driver callback is invoked. The bdrv_get_info() call to get the cluster size is moved to directly after opening the file child in track_open(). The important thing is that at least the request alignment for the write target is used, because then the RMW cycle in bdrv_pwritev will gather enough data from the backing file. Partial cluster allocations in the target are not a fundamental issue, because the driver returns its allocation status based on the bitmap, so any other data that maps to the same cluster will still be copied later by a stream job (or during writes to that cluster). * Replacing the node cannot be done in the track_co_change_backing_file() callback, because it is a coroutine and cannot hold the block graph lock exclusively. So it is moved to the stream job itself with the auto-remove option not having an effect anymore (qemu-server would always set it anyways). In the future, there could either be a special option for the stream job, or maybe the upcoming blockdev-replace QMP command can be used. Replacing the backing child is actually already done in the stream job, so no need to do it in the track_co_change_backing_file() callback. It also cannot be called from a coroutine. Looking at the implementation in the qcow2 driver, it doesn't seem to be intended to change the backing child itself, just update driver-internal state. Other changes: * alloc-track: Error out early when used without auto-remove. Since replacing the node now happens in the stream job, where the option cannot be read from (it's internal to the driver), it will always be treated as 'on'. Makes sure to have users beside qemu-server notice the change (should they even exist). The option can be fully dropped in the future while adding a version guard in qemu-server. * alloc-track: Avoid seemingly superfluous child permission update. Doesn't seem necessary nowadays (maybe after commit "alloc-track: fix deadlock during drop" where the dropping is not rescheduled and delayed anymore or some upstream change). Replacing the block node will already update the permissions of the new node (which was the file child before). Should there really be some issue, instead of having a drop state, this could also be just based off the fact whether there is still a backing child. Dumping the cumulative (shared) permissions for the BDS with a debug print yields the same values after this patch and with QEMU 8.1, namely 3 and 5. * PBS block driver: compile unconditionally. Proxmox VE always needs it and something in the build process changed to make it not enabled by default. Probably would need to move the build option to meson otherwise. * backup: job unreferencing during cleanup needs to happen outside of coroutine, so it was moved to before invoking the clean * mirror: Cherry-pick stable fix to avoid potential deadlock. * savevm-async: migrate_init now can fail, so propagate potential error. * savevm-async: compression counters are not accessible outside migration/ram-compress now, so drop code that prophylactically set it to zero. [0]: https://lore.kernel.org/qemu-devel/220be383-3b0d-4938-b584-69ad214e5d5d@proxmox.com/ [1]: https://lore.kernel.org/qemu-devel/e13b488e-bf13-44f2-acca-e724d14f43fd@proxmox.com/ Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2024-04-25 18:21:28 +03:00
3 files changed, 39 insertions(+), 18 deletions(-)
diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index a10882d47f..19c1de0472 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -35,8 +35,8 @@
update submodule and patches to QEMU 8.2.2 This version includes both the AioContext lock and the block graph lock, so there might be some deadlocks lurking. It's not possible to disable the block graph lock like was done in QEMU 8.1, because there are no changes like the function bdrv_schedule_unref() that require it. QEMU 9.0 will finally get rid of the AioContext locking. During live-restore with a VirtIO SCSI drive with iothread there is a known racy deadlock related to the AioContext lock. Not new [1], but not sure if more likely now. Should be fixed in QEMU 9.0. The block graph lock comes with annotations that can be checked by clang's TSA. This required changes to the block drivers, i.e. alloc-track, pbs, zeroinit as well as taking the appropriate locks in pve-backup, savevm-async, vma-reader. Local variable shadowing is prohibited via a compiler flag now, required slight adaptation in vma.c. Major changes only affect alloc-track: * It is not possible to call a generated co-wrapper like bdrv_get_info() while holding the block graph lock exclusively [0], which does happen during initialization of alloc-track when the backing hd is set and the refresh_limits driver callback is invoked. The bdrv_get_info() call to get the cluster size is moved to directly after opening the file child in track_open(). The important thing is that at least the request alignment for the write target is used, because then the RMW cycle in bdrv_pwritev will gather enough data from the backing file. Partial cluster allocations in the target are not a fundamental issue, because the driver returns its allocation status based on the bitmap, so any other data that maps to the same cluster will still be copied later by a stream job (or during writes to that cluster). * Replacing the node cannot be done in the track_co_change_backing_file() callback, because it is a coroutine and cannot hold the block graph lock exclusively. So it is moved to the stream job itself with the auto-remove option not having an effect anymore (qemu-server would always set it anyways). In the future, there could either be a special option for the stream job, or maybe the upcoming blockdev-replace QMP command can be used. Replacing the backing child is actually already done in the stream job, so no need to do it in the track_co_change_backing_file() callback. It also cannot be called from a coroutine. Looking at the implementation in the qcow2 driver, it doesn't seem to be intended to change the backing child itself, just update driver-internal state. Other changes: * alloc-track: Error out early when used without auto-remove. Since replacing the node now happens in the stream job, where the option cannot be read from (it's internal to the driver), it will always be treated as 'on'. Makes sure to have users beside qemu-server notice the change (should they even exist). The option can be fully dropped in the future while adding a version guard in qemu-server. * alloc-track: Avoid seemingly superfluous child permission update. Doesn't seem necessary nowadays (maybe after commit "alloc-track: fix deadlock during drop" where the dropping is not rescheduled and delayed anymore or some upstream change). Replacing the block node will already update the permissions of the new node (which was the file child before). Should there really be some issue, instead of having a drop state, this could also be just based off the fact whether there is still a backing child. Dumping the cumulative (shared) permissions for the BDS with a debug print yields the same values after this patch and with QEMU 8.1, namely 3 and 5. * PBS block driver: compile unconditionally. Proxmox VE always needs it and something in the build process changed to make it not enabled by default. Probably would need to move the build option to meson otherwise. * backup: job unreferencing during cleanup needs to happen outside of coroutine, so it was moved to before invoking the clean * mirror: Cherry-pick stable fix to avoid potential deadlock. * savevm-async: migrate_init now can fail, so propagate potential error. * savevm-async: compression counters are not accessible outside migration/ram-compress now, so drop code that prophylactically set it to zero. [0]: https://lore.kernel.org/qemu-devel/220be383-3b0d-4938-b584-69ad214e5d5d@proxmox.com/ [1]: https://lore.kernel.org/qemu-devel/e13b488e-bf13-44f2-acca-e724d14f43fd@proxmox.com/ Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2024-04-25 18:21:28 +03:00
#include "rdma.h"
#include "io/channel-file.h"
-#define IO_BUF_SIZE 32768
-#define MAX_IOV_SIZE MIN_CONST(IOV_MAX, 64)
+#define DEFAULT_IO_BUF_SIZE 32768
+#define MAX_IOV_SIZE MIN_CONST(IOV_MAX, 256)
struct QEMUFile {
update submodule and patches to QEMU 8.2.2 This version includes both the AioContext lock and the block graph lock, so there might be some deadlocks lurking. It's not possible to disable the block graph lock like was done in QEMU 8.1, because there are no changes like the function bdrv_schedule_unref() that require it. QEMU 9.0 will finally get rid of the AioContext locking. During live-restore with a VirtIO SCSI drive with iothread there is a known racy deadlock related to the AioContext lock. Not new [1], but not sure if more likely now. Should be fixed in QEMU 9.0. The block graph lock comes with annotations that can be checked by clang's TSA. This required changes to the block drivers, i.e. alloc-track, pbs, zeroinit as well as taking the appropriate locks in pve-backup, savevm-async, vma-reader. Local variable shadowing is prohibited via a compiler flag now, required slight adaptation in vma.c. Major changes only affect alloc-track: * It is not possible to call a generated co-wrapper like bdrv_get_info() while holding the block graph lock exclusively [0], which does happen during initialization of alloc-track when the backing hd is set and the refresh_limits driver callback is invoked. The bdrv_get_info() call to get the cluster size is moved to directly after opening the file child in track_open(). The important thing is that at least the request alignment for the write target is used, because then the RMW cycle in bdrv_pwritev will gather enough data from the backing file. Partial cluster allocations in the target are not a fundamental issue, because the driver returns its allocation status based on the bitmap, so any other data that maps to the same cluster will still be copied later by a stream job (or during writes to that cluster). * Replacing the node cannot be done in the track_co_change_backing_file() callback, because it is a coroutine and cannot hold the block graph lock exclusively. So it is moved to the stream job itself with the auto-remove option not having an effect anymore (qemu-server would always set it anyways). In the future, there could either be a special option for the stream job, or maybe the upcoming blockdev-replace QMP command can be used. Replacing the backing child is actually already done in the stream job, so no need to do it in the track_co_change_backing_file() callback. It also cannot be called from a coroutine. Looking at the implementation in the qcow2 driver, it doesn't seem to be intended to change the backing child itself, just update driver-internal state. Other changes: * alloc-track: Error out early when used without auto-remove. Since replacing the node now happens in the stream job, where the option cannot be read from (it's internal to the driver), it will always be treated as 'on'. Makes sure to have users beside qemu-server notice the change (should they even exist). The option can be fully dropped in the future while adding a version guard in qemu-server. * alloc-track: Avoid seemingly superfluous child permission update. Doesn't seem necessary nowadays (maybe after commit "alloc-track: fix deadlock during drop" where the dropping is not rescheduled and delayed anymore or some upstream change). Replacing the block node will already update the permissions of the new node (which was the file child before). Should there really be some issue, instead of having a drop state, this could also be just based off the fact whether there is still a backing child. Dumping the cumulative (shared) permissions for the BDS with a debug print yields the same values after this patch and with QEMU 8.1, namely 3 and 5. * PBS block driver: compile unconditionally. Proxmox VE always needs it and something in the build process changed to make it not enabled by default. Probably would need to move the build option to meson otherwise. * backup: job unreferencing during cleanup needs to happen outside of coroutine, so it was moved to before invoking the clean * mirror: Cherry-pick stable fix to avoid potential deadlock. * savevm-async: migrate_init now can fail, so propagate potential error. * savevm-async: compression counters are not accessible outside migration/ram-compress now, so drop code that prophylactically set it to zero. [0]: https://lore.kernel.org/qemu-devel/220be383-3b0d-4938-b584-69ad214e5d5d@proxmox.com/ [1]: https://lore.kernel.org/qemu-devel/e13b488e-bf13-44f2-acca-e724d14f43fd@proxmox.com/ Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2024-04-25 18:21:28 +03:00
QIOChannel *ioc;
@@ -44,7 +44,8 @@ struct QEMUFile {
update submodule and patches to 7.1.0 Notable changes: * The only big change is the switch to using a custom QIOChannel for savevm-async, because the previously used QEMUFileOps was dropped. Changes to the current implementation: * Switch to vector based methods as required for an IO channel. For short reads the passed-in IO vector is stuffed with zeroes at the end, just to be sure. * For reading: The documentation in include/io/channel.h states that at least one byte should be read, so also error out when whe are at the very end instead of returning 0. * For reading: Fix off-by-one error when request goes beyond end. The wrong code piece was: if ((pos + size) > maxlen) { size = maxlen - pos - 1; } Previously, the last byte would not be read. It's actually possible to get a snapshot .raw file that has content all the way up the final 512 byte (= BDRV_SECTOR_SIZE) boundary without any trailing zero bytes (I wrote a script to do it). Luckily, it didn't cause a real issue, because qemu_loadvm_state() is not interested in the final (i.e. QEMU_VM_VMDESCRIPTION) section. The buffer for reading it is simply freed up afterwards and the function will assume that it read the whole section, even if that's not the case. * For writing: Make use of the generated blk_pwritev() wrapper instead of manually wrapping the coroutine to simplify and save a few lines. * Adapt to changed interfaces for blk_{pread,pwrite}: * a9262f551e ("block: Change blk_{pread,pwrite}() param order") * 3b35d4542c ("block: Add a 'flags' param to blk_pread()") * bf5b16fa40 ("block: Make blk_{pread,pwrite}() return 0 on success") Those changes especially affected the qemu-img dd patches, because the context also changed, but also some of our block drivers used the functions. * Drop qemu-common.h include: it got renamed after essentially everything was moved to other headers. The only remaining user I could find for things dropped from the header between 7.0 and 7.1 was qemu_get_vm_name() in the iscsi-initiatorname patch, but it already includes the header to which the function was moved. Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2022-10-14 15:07:13 +03:00
int buf_index;
int buf_size; /* 0 when writing */
- uint8_t buf[IO_BUF_SIZE];
+ size_t buf_allocated_size;
+ uint8_t *buf;
DECLARE_BITMAP(may_free, MAX_IOV_SIZE);
struct iovec iov[MAX_IOV_SIZE];
@@ -101,7 +102,9 @@ int qemu_file_shutdown(QEMUFile *f)
return 0;
}
update submodule and patches to 7.1.0 Notable changes: * The only big change is the switch to using a custom QIOChannel for savevm-async, because the previously used QEMUFileOps was dropped. Changes to the current implementation: * Switch to vector based methods as required for an IO channel. For short reads the passed-in IO vector is stuffed with zeroes at the end, just to be sure. * For reading: The documentation in include/io/channel.h states that at least one byte should be read, so also error out when whe are at the very end instead of returning 0. * For reading: Fix off-by-one error when request goes beyond end. The wrong code piece was: if ((pos + size) > maxlen) { size = maxlen - pos - 1; } Previously, the last byte would not be read. It's actually possible to get a snapshot .raw file that has content all the way up the final 512 byte (= BDRV_SECTOR_SIZE) boundary without any trailing zero bytes (I wrote a script to do it). Luckily, it didn't cause a real issue, because qemu_loadvm_state() is not interested in the final (i.e. QEMU_VM_VMDESCRIPTION) section. The buffer for reading it is simply freed up afterwards and the function will assume that it read the whole section, even if that's not the case. * For writing: Make use of the generated blk_pwritev() wrapper instead of manually wrapping the coroutine to simplify and save a few lines. * Adapt to changed interfaces for blk_{pread,pwrite}: * a9262f551e ("block: Change blk_{pread,pwrite}() param order") * 3b35d4542c ("block: Add a 'flags' param to blk_pread()") * bf5b16fa40 ("block: Make blk_{pread,pwrite}() return 0 on success") Those changes especially affected the qemu-img dd patches, because the context also changed, but also some of our block drivers used the functions. * Drop qemu-common.h include: it got renamed after essentially everything was moved to other headers. The only remaining user I could find for things dropped from the header between 7.0 and 7.1 was qemu_get_vm_name() in the iscsi-initiatorname patch, but it already includes the header to which the function was moved. Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2022-10-14 15:07:13 +03:00
-static QEMUFile *qemu_file_new_impl(QIOChannel *ioc, bool is_writable)
+static QEMUFile *qemu_file_new_impl(QIOChannel *ioc,
+ bool is_writable,
+ size_t buffer_size)
{
QEMUFile *f;
@@ -110,6 +113,8 @@ static QEMUFile *qemu_file_new_impl(QIOChannel *ioc, bool is_writable)
update submodule and patches to 7.1.0 Notable changes: * The only big change is the switch to using a custom QIOChannel for savevm-async, because the previously used QEMUFileOps was dropped. Changes to the current implementation: * Switch to vector based methods as required for an IO channel. For short reads the passed-in IO vector is stuffed with zeroes at the end, just to be sure. * For reading: The documentation in include/io/channel.h states that at least one byte should be read, so also error out when whe are at the very end instead of returning 0. * For reading: Fix off-by-one error when request goes beyond end. The wrong code piece was: if ((pos + size) > maxlen) { size = maxlen - pos - 1; } Previously, the last byte would not be read. It's actually possible to get a snapshot .raw file that has content all the way up the final 512 byte (= BDRV_SECTOR_SIZE) boundary without any trailing zero bytes (I wrote a script to do it). Luckily, it didn't cause a real issue, because qemu_loadvm_state() is not interested in the final (i.e. QEMU_VM_VMDESCRIPTION) section. The buffer for reading it is simply freed up afterwards and the function will assume that it read the whole section, even if that's not the case. * For writing: Make use of the generated blk_pwritev() wrapper instead of manually wrapping the coroutine to simplify and save a few lines. * Adapt to changed interfaces for blk_{pread,pwrite}: * a9262f551e ("block: Change blk_{pread,pwrite}() param order") * 3b35d4542c ("block: Add a 'flags' param to blk_pread()") * bf5b16fa40 ("block: Make blk_{pread,pwrite}() return 0 on success") Those changes especially affected the qemu-img dd patches, because the context also changed, but also some of our block drivers used the functions. * Drop qemu-common.h include: it got renamed after essentially everything was moved to other headers. The only remaining user I could find for things dropped from the header between 7.0 and 7.1 was qemu_get_vm_name() in the iscsi-initiatorname patch, but it already includes the header to which the function was moved. Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2022-10-14 15:07:13 +03:00
object_ref(ioc);
f->ioc = ioc;
f->is_writable = is_writable;
+ f->buf_allocated_size = buffer_size;
+ f->buf = malloc(buffer_size);
update submodule and patches to 7.1.0 Notable changes: * The only big change is the switch to using a custom QIOChannel for savevm-async, because the previously used QEMUFileOps was dropped. Changes to the current implementation: * Switch to vector based methods as required for an IO channel. For short reads the passed-in IO vector is stuffed with zeroes at the end, just to be sure. * For reading: The documentation in include/io/channel.h states that at least one byte should be read, so also error out when whe are at the very end instead of returning 0. * For reading: Fix off-by-one error when request goes beyond end. The wrong code piece was: if ((pos + size) > maxlen) { size = maxlen - pos - 1; } Previously, the last byte would not be read. It's actually possible to get a snapshot .raw file that has content all the way up the final 512 byte (= BDRV_SECTOR_SIZE) boundary without any trailing zero bytes (I wrote a script to do it). Luckily, it didn't cause a real issue, because qemu_loadvm_state() is not interested in the final (i.e. QEMU_VM_VMDESCRIPTION) section. The buffer for reading it is simply freed up afterwards and the function will assume that it read the whole section, even if that's not the case. * For writing: Make use of the generated blk_pwritev() wrapper instead of manually wrapping the coroutine to simplify and save a few lines. * Adapt to changed interfaces for blk_{pread,pwrite}: * a9262f551e ("block: Change blk_{pread,pwrite}() param order") * 3b35d4542c ("block: Add a 'flags' param to blk_pread()") * bf5b16fa40 ("block: Make blk_{pread,pwrite}() return 0 on success") Those changes especially affected the qemu-img dd patches, because the context also changed, but also some of our block drivers used the functions. * Drop qemu-common.h include: it got renamed after essentially everything was moved to other headers. The only remaining user I could find for things dropped from the header between 7.0 and 7.1 was qemu_get_vm_name() in the iscsi-initiatorname patch, but it already includes the header to which the function was moved. Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2022-10-14 15:07:13 +03:00
return f;
}
@@ -120,17 +125,27 @@ static QEMUFile *qemu_file_new_impl(QIOChannel *ioc, bool is_writable)
update submodule and patches to 7.1.0 Notable changes: * The only big change is the switch to using a custom QIOChannel for savevm-async, because the previously used QEMUFileOps was dropped. Changes to the current implementation: * Switch to vector based methods as required for an IO channel. For short reads the passed-in IO vector is stuffed with zeroes at the end, just to be sure. * For reading: The documentation in include/io/channel.h states that at least one byte should be read, so also error out when whe are at the very end instead of returning 0. * For reading: Fix off-by-one error when request goes beyond end. The wrong code piece was: if ((pos + size) > maxlen) { size = maxlen - pos - 1; } Previously, the last byte would not be read. It's actually possible to get a snapshot .raw file that has content all the way up the final 512 byte (= BDRV_SECTOR_SIZE) boundary without any trailing zero bytes (I wrote a script to do it). Luckily, it didn't cause a real issue, because qemu_loadvm_state() is not interested in the final (i.e. QEMU_VM_VMDESCRIPTION) section. The buffer for reading it is simply freed up afterwards and the function will assume that it read the whole section, even if that's not the case. * For writing: Make use of the generated blk_pwritev() wrapper instead of manually wrapping the coroutine to simplify and save a few lines. * Adapt to changed interfaces for blk_{pread,pwrite}: * a9262f551e ("block: Change blk_{pread,pwrite}() param order") * 3b35d4542c ("block: Add a 'flags' param to blk_pread()") * bf5b16fa40 ("block: Make blk_{pread,pwrite}() return 0 on success") Those changes especially affected the qemu-img dd patches, because the context also changed, but also some of our block drivers used the functions. * Drop qemu-common.h include: it got renamed after essentially everything was moved to other headers. The only remaining user I could find for things dropped from the header between 7.0 and 7.1 was qemu_get_vm_name() in the iscsi-initiatorname patch, but it already includes the header to which the function was moved. Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2022-10-14 15:07:13 +03:00
*/
QEMUFile *qemu_file_get_return_path(QEMUFile *f)
{
- return qemu_file_new_impl(f->ioc, !f->is_writable);
+ return qemu_file_new_impl(f->ioc, !f->is_writable, DEFAULT_IO_BUF_SIZE);
}
update submodule and patches to 7.1.0 Notable changes: * The only big change is the switch to using a custom QIOChannel for savevm-async, because the previously used QEMUFileOps was dropped. Changes to the current implementation: * Switch to vector based methods as required for an IO channel. For short reads the passed-in IO vector is stuffed with zeroes at the end, just to be sure. * For reading: The documentation in include/io/channel.h states that at least one byte should be read, so also error out when whe are at the very end instead of returning 0. * For reading: Fix off-by-one error when request goes beyond end. The wrong code piece was: if ((pos + size) > maxlen) { size = maxlen - pos - 1; } Previously, the last byte would not be read. It's actually possible to get a snapshot .raw file that has content all the way up the final 512 byte (= BDRV_SECTOR_SIZE) boundary without any trailing zero bytes (I wrote a script to do it). Luckily, it didn't cause a real issue, because qemu_loadvm_state() is not interested in the final (i.e. QEMU_VM_VMDESCRIPTION) section. The buffer for reading it is simply freed up afterwards and the function will assume that it read the whole section, even if that's not the case. * For writing: Make use of the generated blk_pwritev() wrapper instead of manually wrapping the coroutine to simplify and save a few lines. * Adapt to changed interfaces for blk_{pread,pwrite}: * a9262f551e ("block: Change blk_{pread,pwrite}() param order") * 3b35d4542c ("block: Add a 'flags' param to blk_pread()") * bf5b16fa40 ("block: Make blk_{pread,pwrite}() return 0 on success") Those changes especially affected the qemu-img dd patches, because the context also changed, but also some of our block drivers used the functions. * Drop qemu-common.h include: it got renamed after essentially everything was moved to other headers. The only remaining user I could find for things dropped from the header between 7.0 and 7.1 was qemu_get_vm_name() in the iscsi-initiatorname patch, but it already includes the header to which the function was moved. Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2022-10-14 15:07:13 +03:00
QEMUFile *qemu_file_new_output(QIOChannel *ioc)
{
- return qemu_file_new_impl(ioc, true);
+ return qemu_file_new_impl(ioc, true, DEFAULT_IO_BUF_SIZE);
+}
+
+QEMUFile *qemu_file_new_output_sized(QIOChannel *ioc, size_t buffer_size)
+{
update submodule and patches to 7.1.0 Notable changes: * The only big change is the switch to using a custom QIOChannel for savevm-async, because the previously used QEMUFileOps was dropped. Changes to the current implementation: * Switch to vector based methods as required for an IO channel. For short reads the passed-in IO vector is stuffed with zeroes at the end, just to be sure. * For reading: The documentation in include/io/channel.h states that at least one byte should be read, so also error out when whe are at the very end instead of returning 0. * For reading: Fix off-by-one error when request goes beyond end. The wrong code piece was: if ((pos + size) > maxlen) { size = maxlen - pos - 1; } Previously, the last byte would not be read. It's actually possible to get a snapshot .raw file that has content all the way up the final 512 byte (= BDRV_SECTOR_SIZE) boundary without any trailing zero bytes (I wrote a script to do it). Luckily, it didn't cause a real issue, because qemu_loadvm_state() is not interested in the final (i.e. QEMU_VM_VMDESCRIPTION) section. The buffer for reading it is simply freed up afterwards and the function will assume that it read the whole section, even if that's not the case. * For writing: Make use of the generated blk_pwritev() wrapper instead of manually wrapping the coroutine to simplify and save a few lines. * Adapt to changed interfaces for blk_{pread,pwrite}: * a9262f551e ("block: Change blk_{pread,pwrite}() param order") * 3b35d4542c ("block: Add a 'flags' param to blk_pread()") * bf5b16fa40 ("block: Make blk_{pread,pwrite}() return 0 on success") Those changes especially affected the qemu-img dd patches, because the context also changed, but also some of our block drivers used the functions. * Drop qemu-common.h include: it got renamed after essentially everything was moved to other headers. The only remaining user I could find for things dropped from the header between 7.0 and 7.1 was qemu_get_vm_name() in the iscsi-initiatorname patch, but it already includes the header to which the function was moved. Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2022-10-14 15:07:13 +03:00
+ return qemu_file_new_impl(ioc, true, buffer_size);
}
QEMUFile *qemu_file_new_input(QIOChannel *ioc)
{
- return qemu_file_new_impl(ioc, false);
+ return qemu_file_new_impl(ioc, false, DEFAULT_IO_BUF_SIZE);
+}
+
update submodule and patches to 7.1.0 Notable changes: * The only big change is the switch to using a custom QIOChannel for savevm-async, because the previously used QEMUFileOps was dropped. Changes to the current implementation: * Switch to vector based methods as required for an IO channel. For short reads the passed-in IO vector is stuffed with zeroes at the end, just to be sure. * For reading: The documentation in include/io/channel.h states that at least one byte should be read, so also error out when whe are at the very end instead of returning 0. * For reading: Fix off-by-one error when request goes beyond end. The wrong code piece was: if ((pos + size) > maxlen) { size = maxlen - pos - 1; } Previously, the last byte would not be read. It's actually possible to get a snapshot .raw file that has content all the way up the final 512 byte (= BDRV_SECTOR_SIZE) boundary without any trailing zero bytes (I wrote a script to do it). Luckily, it didn't cause a real issue, because qemu_loadvm_state() is not interested in the final (i.e. QEMU_VM_VMDESCRIPTION) section. The buffer for reading it is simply freed up afterwards and the function will assume that it read the whole section, even if that's not the case. * For writing: Make use of the generated blk_pwritev() wrapper instead of manually wrapping the coroutine to simplify and save a few lines. * Adapt to changed interfaces for blk_{pread,pwrite}: * a9262f551e ("block: Change blk_{pread,pwrite}() param order") * 3b35d4542c ("block: Add a 'flags' param to blk_pread()") * bf5b16fa40 ("block: Make blk_{pread,pwrite}() return 0 on success") Those changes especially affected the qemu-img dd patches, because the context also changed, but also some of our block drivers used the functions. * Drop qemu-common.h include: it got renamed after essentially everything was moved to other headers. The only remaining user I could find for things dropped from the header between 7.0 and 7.1 was qemu_get_vm_name() in the iscsi-initiatorname patch, but it already includes the header to which the function was moved. Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2022-10-14 15:07:13 +03:00
+QEMUFile *qemu_file_new_input_sized(QIOChannel *ioc, size_t buffer_size)
+{
+ return qemu_file_new_impl(ioc, false, buffer_size);
}
update submodule and patches to QEMU 8.2.2 This version includes both the AioContext lock and the block graph lock, so there might be some deadlocks lurking. It's not possible to disable the block graph lock like was done in QEMU 8.1, because there are no changes like the function bdrv_schedule_unref() that require it. QEMU 9.0 will finally get rid of the AioContext locking. During live-restore with a VirtIO SCSI drive with iothread there is a known racy deadlock related to the AioContext lock. Not new [1], but not sure if more likely now. Should be fixed in QEMU 9.0. The block graph lock comes with annotations that can be checked by clang's TSA. This required changes to the block drivers, i.e. alloc-track, pbs, zeroinit as well as taking the appropriate locks in pve-backup, savevm-async, vma-reader. Local variable shadowing is prohibited via a compiler flag now, required slight adaptation in vma.c. Major changes only affect alloc-track: * It is not possible to call a generated co-wrapper like bdrv_get_info() while holding the block graph lock exclusively [0], which does happen during initialization of alloc-track when the backing hd is set and the refresh_limits driver callback is invoked. The bdrv_get_info() call to get the cluster size is moved to directly after opening the file child in track_open(). The important thing is that at least the request alignment for the write target is used, because then the RMW cycle in bdrv_pwritev will gather enough data from the backing file. Partial cluster allocations in the target are not a fundamental issue, because the driver returns its allocation status based on the bitmap, so any other data that maps to the same cluster will still be copied later by a stream job (or during writes to that cluster). * Replacing the node cannot be done in the track_co_change_backing_file() callback, because it is a coroutine and cannot hold the block graph lock exclusively. So it is moved to the stream job itself with the auto-remove option not having an effect anymore (qemu-server would always set it anyways). In the future, there could either be a special option for the stream job, or maybe the upcoming blockdev-replace QMP command can be used. Replacing the backing child is actually already done in the stream job, so no need to do it in the track_co_change_backing_file() callback. It also cannot be called from a coroutine. Looking at the implementation in the qcow2 driver, it doesn't seem to be intended to change the backing child itself, just update driver-internal state. Other changes: * alloc-track: Error out early when used without auto-remove. Since replacing the node now happens in the stream job, where the option cannot be read from (it's internal to the driver), it will always be treated as 'on'. Makes sure to have users beside qemu-server notice the change (should they even exist). The option can be fully dropped in the future while adding a version guard in qemu-server. * alloc-track: Avoid seemingly superfluous child permission update. Doesn't seem necessary nowadays (maybe after commit "alloc-track: fix deadlock during drop" where the dropping is not rescheduled and delayed anymore or some upstream change). Replacing the block node will already update the permissions of the new node (which was the file child before). Should there really be some issue, instead of having a drop state, this could also be just based off the fact whether there is still a backing child. Dumping the cumulative (shared) permissions for the BDS with a debug print yields the same values after this patch and with QEMU 8.1, namely 3 and 5. * PBS block driver: compile unconditionally. Proxmox VE always needs it and something in the build process changed to make it not enabled by default. Probably would need to move the build option to meson otherwise. * backup: job unreferencing during cleanup needs to happen outside of coroutine, so it was moved to before invoking the clean * mirror: Cherry-pick stable fix to avoid potential deadlock. * savevm-async: migrate_init now can fail, so propagate potential error. * savevm-async: compression counters are not accessible outside migration/ram-compress now, so drop code that prophylactically set it to zero. [0]: https://lore.kernel.org/qemu-devel/220be383-3b0d-4938-b584-69ad214e5d5d@proxmox.com/ [1]: https://lore.kernel.org/qemu-devel/e13b488e-bf13-44f2-acca-e724d14f43fd@proxmox.com/ Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2024-04-25 18:21:28 +03:00
/*
@@ -328,7 +343,7 @@ static ssize_t coroutine_mixed_fn qemu_fill_buffer(QEMUFile *f)
update submodule and patches to 7.1.0 Notable changes: * The only big change is the switch to using a custom QIOChannel for savevm-async, because the previously used QEMUFileOps was dropped. Changes to the current implementation: * Switch to vector based methods as required for an IO channel. For short reads the passed-in IO vector is stuffed with zeroes at the end, just to be sure. * For reading: The documentation in include/io/channel.h states that at least one byte should be read, so also error out when whe are at the very end instead of returning 0. * For reading: Fix off-by-one error when request goes beyond end. The wrong code piece was: if ((pos + size) > maxlen) { size = maxlen - pos - 1; } Previously, the last byte would not be read. It's actually possible to get a snapshot .raw file that has content all the way up the final 512 byte (= BDRV_SECTOR_SIZE) boundary without any trailing zero bytes (I wrote a script to do it). Luckily, it didn't cause a real issue, because qemu_loadvm_state() is not interested in the final (i.e. QEMU_VM_VMDESCRIPTION) section. The buffer for reading it is simply freed up afterwards and the function will assume that it read the whole section, even if that's not the case. * For writing: Make use of the generated blk_pwritev() wrapper instead of manually wrapping the coroutine to simplify and save a few lines. * Adapt to changed interfaces for blk_{pread,pwrite}: * a9262f551e ("block: Change blk_{pread,pwrite}() param order") * 3b35d4542c ("block: Add a 'flags' param to blk_pread()") * bf5b16fa40 ("block: Make blk_{pread,pwrite}() return 0 on success") Those changes especially affected the qemu-img dd patches, because the context also changed, but also some of our block drivers used the functions. * Drop qemu-common.h include: it got renamed after essentially everything was moved to other headers. The only remaining user I could find for things dropped from the header between 7.0 and 7.1 was qemu_get_vm_name() in the iscsi-initiatorname patch, but it already includes the header to which the function was moved. Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2022-10-14 15:07:13 +03:00
do {
len = qio_channel_read(f->ioc,
(char *)f->buf + pending,
- IO_BUF_SIZE - pending,
+ f->buf_allocated_size - pending,
&local_error);
if (len == QIO_CHANNEL_ERR_BLOCK) {
if (qemu_in_coroutine()) {
@@ -368,6 +383,9 @@ int qemu_fclose(QEMUFile *f)
update submodule and patches to QEMU 8.2.2 This version includes both the AioContext lock and the block graph lock, so there might be some deadlocks lurking. It's not possible to disable the block graph lock like was done in QEMU 8.1, because there are no changes like the function bdrv_schedule_unref() that require it. QEMU 9.0 will finally get rid of the AioContext locking. During live-restore with a VirtIO SCSI drive with iothread there is a known racy deadlock related to the AioContext lock. Not new [1], but not sure if more likely now. Should be fixed in QEMU 9.0. The block graph lock comes with annotations that can be checked by clang's TSA. This required changes to the block drivers, i.e. alloc-track, pbs, zeroinit as well as taking the appropriate locks in pve-backup, savevm-async, vma-reader. Local variable shadowing is prohibited via a compiler flag now, required slight adaptation in vma.c. Major changes only affect alloc-track: * It is not possible to call a generated co-wrapper like bdrv_get_info() while holding the block graph lock exclusively [0], which does happen during initialization of alloc-track when the backing hd is set and the refresh_limits driver callback is invoked. The bdrv_get_info() call to get the cluster size is moved to directly after opening the file child in track_open(). The important thing is that at least the request alignment for the write target is used, because then the RMW cycle in bdrv_pwritev will gather enough data from the backing file. Partial cluster allocations in the target are not a fundamental issue, because the driver returns its allocation status based on the bitmap, so any other data that maps to the same cluster will still be copied later by a stream job (or during writes to that cluster). * Replacing the node cannot be done in the track_co_change_backing_file() callback, because it is a coroutine and cannot hold the block graph lock exclusively. So it is moved to the stream job itself with the auto-remove option not having an effect anymore (qemu-server would always set it anyways). In the future, there could either be a special option for the stream job, or maybe the upcoming blockdev-replace QMP command can be used. Replacing the backing child is actually already done in the stream job, so no need to do it in the track_co_change_backing_file() callback. It also cannot be called from a coroutine. Looking at the implementation in the qcow2 driver, it doesn't seem to be intended to change the backing child itself, just update driver-internal state. Other changes: * alloc-track: Error out early when used without auto-remove. Since replacing the node now happens in the stream job, where the option cannot be read from (it's internal to the driver), it will always be treated as 'on'. Makes sure to have users beside qemu-server notice the change (should they even exist). The option can be fully dropped in the future while adding a version guard in qemu-server. * alloc-track: Avoid seemingly superfluous child permission update. Doesn't seem necessary nowadays (maybe after commit "alloc-track: fix deadlock during drop" where the dropping is not rescheduled and delayed anymore or some upstream change). Replacing the block node will already update the permissions of the new node (which was the file child before). Should there really be some issue, instead of having a drop state, this could also be just based off the fact whether there is still a backing child. Dumping the cumulative (shared) permissions for the BDS with a debug print yields the same values after this patch and with QEMU 8.1, namely 3 and 5. * PBS block driver: compile unconditionally. Proxmox VE always needs it and something in the build process changed to make it not enabled by default. Probably would need to move the build option to meson otherwise. * backup: job unreferencing during cleanup needs to happen outside of coroutine, so it was moved to before invoking the clean * mirror: Cherry-pick stable fix to avoid potential deadlock. * savevm-async: migrate_init now can fail, so propagate potential error. * savevm-async: compression counters are not accessible outside migration/ram-compress now, so drop code that prophylactically set it to zero. [0]: https://lore.kernel.org/qemu-devel/220be383-3b0d-4938-b584-69ad214e5d5d@proxmox.com/ [1]: https://lore.kernel.org/qemu-devel/e13b488e-bf13-44f2-acca-e724d14f43fd@proxmox.com/ Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2024-04-25 18:21:28 +03:00
ret = ret2;
}
update submodule and patches to 7.1.0 Notable changes: * The only big change is the switch to using a custom QIOChannel for savevm-async, because the previously used QEMUFileOps was dropped. Changes to the current implementation: * Switch to vector based methods as required for an IO channel. For short reads the passed-in IO vector is stuffed with zeroes at the end, just to be sure. * For reading: The documentation in include/io/channel.h states that at least one byte should be read, so also error out when whe are at the very end instead of returning 0. * For reading: Fix off-by-one error when request goes beyond end. The wrong code piece was: if ((pos + size) > maxlen) { size = maxlen - pos - 1; } Previously, the last byte would not be read. It's actually possible to get a snapshot .raw file that has content all the way up the final 512 byte (= BDRV_SECTOR_SIZE) boundary without any trailing zero bytes (I wrote a script to do it). Luckily, it didn't cause a real issue, because qemu_loadvm_state() is not interested in the final (i.e. QEMU_VM_VMDESCRIPTION) section. The buffer for reading it is simply freed up afterwards and the function will assume that it read the whole section, even if that's not the case. * For writing: Make use of the generated blk_pwritev() wrapper instead of manually wrapping the coroutine to simplify and save a few lines. * Adapt to changed interfaces for blk_{pread,pwrite}: * a9262f551e ("block: Change blk_{pread,pwrite}() param order") * 3b35d4542c ("block: Add a 'flags' param to blk_pread()") * bf5b16fa40 ("block: Make blk_{pread,pwrite}() return 0 on success") Those changes especially affected the qemu-img dd patches, because the context also changed, but also some of our block drivers used the functions. * Drop qemu-common.h include: it got renamed after essentially everything was moved to other headers. The only remaining user I could find for things dropped from the header between 7.0 and 7.1 was qemu_get_vm_name() in the iscsi-initiatorname patch, but it already includes the header to which the function was moved. Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2022-10-14 15:07:13 +03:00
g_clear_pointer(&f->ioc, object_unref);
update submodule and patches to QEMU 8.2.2 This version includes both the AioContext lock and the block graph lock, so there might be some deadlocks lurking. It's not possible to disable the block graph lock like was done in QEMU 8.1, because there are no changes like the function bdrv_schedule_unref() that require it. QEMU 9.0 will finally get rid of the AioContext locking. During live-restore with a VirtIO SCSI drive with iothread there is a known racy deadlock related to the AioContext lock. Not new [1], but not sure if more likely now. Should be fixed in QEMU 9.0. The block graph lock comes with annotations that can be checked by clang's TSA. This required changes to the block drivers, i.e. alloc-track, pbs, zeroinit as well as taking the appropriate locks in pve-backup, savevm-async, vma-reader. Local variable shadowing is prohibited via a compiler flag now, required slight adaptation in vma.c. Major changes only affect alloc-track: * It is not possible to call a generated co-wrapper like bdrv_get_info() while holding the block graph lock exclusively [0], which does happen during initialization of alloc-track when the backing hd is set and the refresh_limits driver callback is invoked. The bdrv_get_info() call to get the cluster size is moved to directly after opening the file child in track_open(). The important thing is that at least the request alignment for the write target is used, because then the RMW cycle in bdrv_pwritev will gather enough data from the backing file. Partial cluster allocations in the target are not a fundamental issue, because the driver returns its allocation status based on the bitmap, so any other data that maps to the same cluster will still be copied later by a stream job (or during writes to that cluster). * Replacing the node cannot be done in the track_co_change_backing_file() callback, because it is a coroutine and cannot hold the block graph lock exclusively. So it is moved to the stream job itself with the auto-remove option not having an effect anymore (qemu-server would always set it anyways). In the future, there could either be a special option for the stream job, or maybe the upcoming blockdev-replace QMP command can be used. Replacing the backing child is actually already done in the stream job, so no need to do it in the track_co_change_backing_file() callback. It also cannot be called from a coroutine. Looking at the implementation in the qcow2 driver, it doesn't seem to be intended to change the backing child itself, just update driver-internal state. Other changes: * alloc-track: Error out early when used without auto-remove. Since replacing the node now happens in the stream job, where the option cannot be read from (it's internal to the driver), it will always be treated as 'on'. Makes sure to have users beside qemu-server notice the change (should they even exist). The option can be fully dropped in the future while adding a version guard in qemu-server. * alloc-track: Avoid seemingly superfluous child permission update. Doesn't seem necessary nowadays (maybe after commit "alloc-track: fix deadlock during drop" where the dropping is not rescheduled and delayed anymore or some upstream change). Replacing the block node will already update the permissions of the new node (which was the file child before). Should there really be some issue, instead of having a drop state, this could also be just based off the fact whether there is still a backing child. Dumping the cumulative (shared) permissions for the BDS with a debug print yields the same values after this patch and with QEMU 8.1, namely 3 and 5. * PBS block driver: compile unconditionally. Proxmox VE always needs it and something in the build process changed to make it not enabled by default. Probably would need to move the build option to meson otherwise. * backup: job unreferencing during cleanup needs to happen outside of coroutine, so it was moved to before invoking the clean * mirror: Cherry-pick stable fix to avoid potential deadlock. * savevm-async: migrate_init now can fail, so propagate potential error. * savevm-async: compression counters are not accessible outside migration/ram-compress now, so drop code that prophylactically set it to zero. [0]: https://lore.kernel.org/qemu-devel/220be383-3b0d-4938-b584-69ad214e5d5d@proxmox.com/ [1]: https://lore.kernel.org/qemu-devel/e13b488e-bf13-44f2-acca-e724d14f43fd@proxmox.com/ Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2024-04-25 18:21:28 +03:00
+
+ free(f->buf);
+
update submodule and patches to QEMU 8.2.2 This version includes both the AioContext lock and the block graph lock, so there might be some deadlocks lurking. It's not possible to disable the block graph lock like was done in QEMU 8.1, because there are no changes like the function bdrv_schedule_unref() that require it. QEMU 9.0 will finally get rid of the AioContext locking. During live-restore with a VirtIO SCSI drive with iothread there is a known racy deadlock related to the AioContext lock. Not new [1], but not sure if more likely now. Should be fixed in QEMU 9.0. The block graph lock comes with annotations that can be checked by clang's TSA. This required changes to the block drivers, i.e. alloc-track, pbs, zeroinit as well as taking the appropriate locks in pve-backup, savevm-async, vma-reader. Local variable shadowing is prohibited via a compiler flag now, required slight adaptation in vma.c. Major changes only affect alloc-track: * It is not possible to call a generated co-wrapper like bdrv_get_info() while holding the block graph lock exclusively [0], which does happen during initialization of alloc-track when the backing hd is set and the refresh_limits driver callback is invoked. The bdrv_get_info() call to get the cluster size is moved to directly after opening the file child in track_open(). The important thing is that at least the request alignment for the write target is used, because then the RMW cycle in bdrv_pwritev will gather enough data from the backing file. Partial cluster allocations in the target are not a fundamental issue, because the driver returns its allocation status based on the bitmap, so any other data that maps to the same cluster will still be copied later by a stream job (or during writes to that cluster). * Replacing the node cannot be done in the track_co_change_backing_file() callback, because it is a coroutine and cannot hold the block graph lock exclusively. So it is moved to the stream job itself with the auto-remove option not having an effect anymore (qemu-server would always set it anyways). In the future, there could either be a special option for the stream job, or maybe the upcoming blockdev-replace QMP command can be used. Replacing the backing child is actually already done in the stream job, so no need to do it in the track_co_change_backing_file() callback. It also cannot be called from a coroutine. Looking at the implementation in the qcow2 driver, it doesn't seem to be intended to change the backing child itself, just update driver-internal state. Other changes: * alloc-track: Error out early when used without auto-remove. Since replacing the node now happens in the stream job, where the option cannot be read from (it's internal to the driver), it will always be treated as 'on'. Makes sure to have users beside qemu-server notice the change (should they even exist). The option can be fully dropped in the future while adding a version guard in qemu-server. * alloc-track: Avoid seemingly superfluous child permission update. Doesn't seem necessary nowadays (maybe after commit "alloc-track: fix deadlock during drop" where the dropping is not rescheduled and delayed anymore or some upstream change). Replacing the block node will already update the permissions of the new node (which was the file child before). Should there really be some issue, instead of having a drop state, this could also be just based off the fact whether there is still a backing child. Dumping the cumulative (shared) permissions for the BDS with a debug print yields the same values after this patch and with QEMU 8.1, namely 3 and 5. * PBS block driver: compile unconditionally. Proxmox VE always needs it and something in the build process changed to make it not enabled by default. Probably would need to move the build option to meson otherwise. * backup: job unreferencing during cleanup needs to happen outside of coroutine, so it was moved to before invoking the clean * mirror: Cherry-pick stable fix to avoid potential deadlock. * savevm-async: migrate_init now can fail, so propagate potential error. * savevm-async: compression counters are not accessible outside migration/ram-compress now, so drop code that prophylactically set it to zero. [0]: https://lore.kernel.org/qemu-devel/220be383-3b0d-4938-b584-69ad214e5d5d@proxmox.com/ [1]: https://lore.kernel.org/qemu-devel/e13b488e-bf13-44f2-acca-e724d14f43fd@proxmox.com/ Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2024-04-25 18:21:28 +03:00
error_free(f->last_error_obj);
g_free(f);
trace_qemu_file_fclose();
@@ -416,7 +434,7 @@ static void add_buf_to_iovec(QEMUFile *f, size_t len)
{
if (!add_to_iovec(f, f->buf + f->buf_index, len, false)) {
f->buf_index += len;
- if (f->buf_index == IO_BUF_SIZE) {
+ if (f->buf_index == f->buf_allocated_size) {
qemu_fflush(f);
}
}
@@ -441,7 +459,7 @@ void qemu_put_buffer(QEMUFile *f, const uint8_t *buf, size_t size)
}
while (size > 0) {
- l = IO_BUF_SIZE - f->buf_index;
+ l = f->buf_allocated_size - f->buf_index;
if (l > size) {
l = size;
}
@@ -587,8 +605,8 @@ size_t coroutine_mixed_fn qemu_peek_buffer(QEMUFile *f, uint8_t **buf, size_t si
size_t index;
assert(!qemu_file_is_writable(f));
- assert(offset < IO_BUF_SIZE);
- assert(size <= IO_BUF_SIZE - offset);
+ assert(offset < f->buf_allocated_size);
+ assert(size <= f->buf_allocated_size - offset);
/* The 1st byte to read from */
index = f->buf_index + offset;
@@ -638,7 +656,7 @@ size_t coroutine_mixed_fn qemu_get_buffer(QEMUFile *f, uint8_t *buf, size_t size
size_t res;
uint8_t *src;
- res = qemu_peek_buffer(f, &src, MIN(pending, IO_BUF_SIZE), 0);
+ res = qemu_peek_buffer(f, &src, MIN(pending, f->buf_allocated_size), 0);
if (res == 0) {
return done;
}
@@ -672,7 +690,7 @@ size_t coroutine_mixed_fn qemu_get_buffer(QEMUFile *f, uint8_t *buf, size_t size
*/
size_t coroutine_mixed_fn qemu_get_buffer_in_place(QEMUFile *f, uint8_t **buf, size_t size)
{
- if (size < IO_BUF_SIZE) {
+ if (size < f->buf_allocated_size) {
size_t res;
uint8_t *src = NULL;
@@ -697,7 +715,7 @@ int coroutine_mixed_fn qemu_peek_byte(QEMUFile *f, int offset)
int index = f->buf_index + offset;
assert(!qemu_file_is_writable(f));
- assert(offset < IO_BUF_SIZE);
+ assert(offset < f->buf_allocated_size);
if (index >= f->buf_size) {
qemu_fill_buffer(f);
@@ -811,7 +829,7 @@ static int qemu_compress_data(z_stream *stream, uint8_t *dest, size_t dest_len,
ssize_t qemu_put_compression_data(QEMUFile *f, z_stream *stream,
const uint8_t *p, size_t size)
{
- ssize_t blen = IO_BUF_SIZE - f->buf_index - sizeof(int32_t);
+ ssize_t blen = f->buf_allocated_size - f->buf_index - sizeof(int32_t);
if (blen < compressBound(size)) {
return -1;
diff --git a/migration/qemu-file.h b/migration/qemu-file.h
index 32fd4a34fd..36a0cd8cc8 100644
--- a/migration/qemu-file.h
+++ b/migration/qemu-file.h
update submodule and patches to QEMU 8.2.2 This version includes both the AioContext lock and the block graph lock, so there might be some deadlocks lurking. It's not possible to disable the block graph lock like was done in QEMU 8.1, because there are no changes like the function bdrv_schedule_unref() that require it. QEMU 9.0 will finally get rid of the AioContext locking. During live-restore with a VirtIO SCSI drive with iothread there is a known racy deadlock related to the AioContext lock. Not new [1], but not sure if more likely now. Should be fixed in QEMU 9.0. The block graph lock comes with annotations that can be checked by clang's TSA. This required changes to the block drivers, i.e. alloc-track, pbs, zeroinit as well as taking the appropriate locks in pve-backup, savevm-async, vma-reader. Local variable shadowing is prohibited via a compiler flag now, required slight adaptation in vma.c. Major changes only affect alloc-track: * It is not possible to call a generated co-wrapper like bdrv_get_info() while holding the block graph lock exclusively [0], which does happen during initialization of alloc-track when the backing hd is set and the refresh_limits driver callback is invoked. The bdrv_get_info() call to get the cluster size is moved to directly after opening the file child in track_open(). The important thing is that at least the request alignment for the write target is used, because then the RMW cycle in bdrv_pwritev will gather enough data from the backing file. Partial cluster allocations in the target are not a fundamental issue, because the driver returns its allocation status based on the bitmap, so any other data that maps to the same cluster will still be copied later by a stream job (or during writes to that cluster). * Replacing the node cannot be done in the track_co_change_backing_file() callback, because it is a coroutine and cannot hold the block graph lock exclusively. So it is moved to the stream job itself with the auto-remove option not having an effect anymore (qemu-server would always set it anyways). In the future, there could either be a special option for the stream job, or maybe the upcoming blockdev-replace QMP command can be used. Replacing the backing child is actually already done in the stream job, so no need to do it in the track_co_change_backing_file() callback. It also cannot be called from a coroutine. Looking at the implementation in the qcow2 driver, it doesn't seem to be intended to change the backing child itself, just update driver-internal state. Other changes: * alloc-track: Error out early when used without auto-remove. Since replacing the node now happens in the stream job, where the option cannot be read from (it's internal to the driver), it will always be treated as 'on'. Makes sure to have users beside qemu-server notice the change (should they even exist). The option can be fully dropped in the future while adding a version guard in qemu-server. * alloc-track: Avoid seemingly superfluous child permission update. Doesn't seem necessary nowadays (maybe after commit "alloc-track: fix deadlock during drop" where the dropping is not rescheduled and delayed anymore or some upstream change). Replacing the block node will already update the permissions of the new node (which was the file child before). Should there really be some issue, instead of having a drop state, this could also be just based off the fact whether there is still a backing child. Dumping the cumulative (shared) permissions for the BDS with a debug print yields the same values after this patch and with QEMU 8.1, namely 3 and 5. * PBS block driver: compile unconditionally. Proxmox VE always needs it and something in the build process changed to make it not enabled by default. Probably would need to move the build option to meson otherwise. * backup: job unreferencing during cleanup needs to happen outside of coroutine, so it was moved to before invoking the clean * mirror: Cherry-pick stable fix to avoid potential deadlock. * savevm-async: migrate_init now can fail, so propagate potential error. * savevm-async: compression counters are not accessible outside migration/ram-compress now, so drop code that prophylactically set it to zero. [0]: https://lore.kernel.org/qemu-devel/220be383-3b0d-4938-b584-69ad214e5d5d@proxmox.com/ [1]: https://lore.kernel.org/qemu-devel/e13b488e-bf13-44f2-acca-e724d14f43fd@proxmox.com/ Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2024-04-25 18:21:28 +03:00
@@ -30,7 +30,9 @@
#include "io/channel.h"
update submodule and patches to 7.1.0 Notable changes: * The only big change is the switch to using a custom QIOChannel for savevm-async, because the previously used QEMUFileOps was dropped. Changes to the current implementation: * Switch to vector based methods as required for an IO channel. For short reads the passed-in IO vector is stuffed with zeroes at the end, just to be sure. * For reading: The documentation in include/io/channel.h states that at least one byte should be read, so also error out when whe are at the very end instead of returning 0. * For reading: Fix off-by-one error when request goes beyond end. The wrong code piece was: if ((pos + size) > maxlen) { size = maxlen - pos - 1; } Previously, the last byte would not be read. It's actually possible to get a snapshot .raw file that has content all the way up the final 512 byte (= BDRV_SECTOR_SIZE) boundary without any trailing zero bytes (I wrote a script to do it). Luckily, it didn't cause a real issue, because qemu_loadvm_state() is not interested in the final (i.e. QEMU_VM_VMDESCRIPTION) section. The buffer for reading it is simply freed up afterwards and the function will assume that it read the whole section, even if that's not the case. * For writing: Make use of the generated blk_pwritev() wrapper instead of manually wrapping the coroutine to simplify and save a few lines. * Adapt to changed interfaces for blk_{pread,pwrite}: * a9262f551e ("block: Change blk_{pread,pwrite}() param order") * 3b35d4542c ("block: Add a 'flags' param to blk_pread()") * bf5b16fa40 ("block: Make blk_{pread,pwrite}() return 0 on success") Those changes especially affected the qemu-img dd patches, because the context also changed, but also some of our block drivers used the functions. * Drop qemu-common.h include: it got renamed after essentially everything was moved to other headers. The only remaining user I could find for things dropped from the header between 7.0 and 7.1 was qemu_get_vm_name() in the iscsi-initiatorname patch, but it already includes the header to which the function was moved. Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2022-10-14 15:07:13 +03:00
QEMUFile *qemu_file_new_input(QIOChannel *ioc);
+QEMUFile *qemu_file_new_input_sized(QIOChannel *ioc, size_t buffer_size);
QEMUFile *qemu_file_new_output(QIOChannel *ioc);
+QEMUFile *qemu_file_new_output_sized(QIOChannel *ioc, size_t buffer_size);
int qemu_fclose(QEMUFile *f);
update submodule and patches to 7.1.0 Notable changes: * The only big change is the switch to using a custom QIOChannel for savevm-async, because the previously used QEMUFileOps was dropped. Changes to the current implementation: * Switch to vector based methods as required for an IO channel. For short reads the passed-in IO vector is stuffed with zeroes at the end, just to be sure. * For reading: The documentation in include/io/channel.h states that at least one byte should be read, so also error out when whe are at the very end instead of returning 0. * For reading: Fix off-by-one error when request goes beyond end. The wrong code piece was: if ((pos + size) > maxlen) { size = maxlen - pos - 1; } Previously, the last byte would not be read. It's actually possible to get a snapshot .raw file that has content all the way up the final 512 byte (= BDRV_SECTOR_SIZE) boundary without any trailing zero bytes (I wrote a script to do it). Luckily, it didn't cause a real issue, because qemu_loadvm_state() is not interested in the final (i.e. QEMU_VM_VMDESCRIPTION) section. The buffer for reading it is simply freed up afterwards and the function will assume that it read the whole section, even if that's not the case. * For writing: Make use of the generated blk_pwritev() wrapper instead of manually wrapping the coroutine to simplify and save a few lines. * Adapt to changed interfaces for blk_{pread,pwrite}: * a9262f551e ("block: Change blk_{pread,pwrite}() param order") * 3b35d4542c ("block: Add a 'flags' param to blk_pread()") * bf5b16fa40 ("block: Make blk_{pread,pwrite}() return 0 on success") Those changes especially affected the qemu-img dd patches, because the context also changed, but also some of our block drivers used the functions. * Drop qemu-common.h include: it got renamed after essentially everything was moved to other headers. The only remaining user I could find for things dropped from the header between 7.0 and 7.1 was qemu_get_vm_name() in the iscsi-initiatorname patch, but it already includes the header to which the function was moved. Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2022-10-14 15:07:13 +03:00
update submodule and patches to QEMU 8.2.2 This version includes both the AioContext lock and the block graph lock, so there might be some deadlocks lurking. It's not possible to disable the block graph lock like was done in QEMU 8.1, because there are no changes like the function bdrv_schedule_unref() that require it. QEMU 9.0 will finally get rid of the AioContext locking. During live-restore with a VirtIO SCSI drive with iothread there is a known racy deadlock related to the AioContext lock. Not new [1], but not sure if more likely now. Should be fixed in QEMU 9.0. The block graph lock comes with annotations that can be checked by clang's TSA. This required changes to the block drivers, i.e. alloc-track, pbs, zeroinit as well as taking the appropriate locks in pve-backup, savevm-async, vma-reader. Local variable shadowing is prohibited via a compiler flag now, required slight adaptation in vma.c. Major changes only affect alloc-track: * It is not possible to call a generated co-wrapper like bdrv_get_info() while holding the block graph lock exclusively [0], which does happen during initialization of alloc-track when the backing hd is set and the refresh_limits driver callback is invoked. The bdrv_get_info() call to get the cluster size is moved to directly after opening the file child in track_open(). The important thing is that at least the request alignment for the write target is used, because then the RMW cycle in bdrv_pwritev will gather enough data from the backing file. Partial cluster allocations in the target are not a fundamental issue, because the driver returns its allocation status based on the bitmap, so any other data that maps to the same cluster will still be copied later by a stream job (or during writes to that cluster). * Replacing the node cannot be done in the track_co_change_backing_file() callback, because it is a coroutine and cannot hold the block graph lock exclusively. So it is moved to the stream job itself with the auto-remove option not having an effect anymore (qemu-server would always set it anyways). In the future, there could either be a special option for the stream job, or maybe the upcoming blockdev-replace QMP command can be used. Replacing the backing child is actually already done in the stream job, so no need to do it in the track_co_change_backing_file() callback. It also cannot be called from a coroutine. Looking at the implementation in the qcow2 driver, it doesn't seem to be intended to change the backing child itself, just update driver-internal state. Other changes: * alloc-track: Error out early when used without auto-remove. Since replacing the node now happens in the stream job, where the option cannot be read from (it's internal to the driver), it will always be treated as 'on'. Makes sure to have users beside qemu-server notice the change (should they even exist). The option can be fully dropped in the future while adding a version guard in qemu-server. * alloc-track: Avoid seemingly superfluous child permission update. Doesn't seem necessary nowadays (maybe after commit "alloc-track: fix deadlock during drop" where the dropping is not rescheduled and delayed anymore or some upstream change). Replacing the block node will already update the permissions of the new node (which was the file child before). Should there really be some issue, instead of having a drop state, this could also be just based off the fact whether there is still a backing child. Dumping the cumulative (shared) permissions for the BDS with a debug print yields the same values after this patch and with QEMU 8.1, namely 3 and 5. * PBS block driver: compile unconditionally. Proxmox VE always needs it and something in the build process changed to make it not enabled by default. Probably would need to move the build option to meson otherwise. * backup: job unreferencing during cleanup needs to happen outside of coroutine, so it was moved to before invoking the clean * mirror: Cherry-pick stable fix to avoid potential deadlock. * savevm-async: migrate_init now can fail, so propagate potential error. * savevm-async: compression counters are not accessible outside migration/ram-compress now, so drop code that prophylactically set it to zero. [0]: https://lore.kernel.org/qemu-devel/220be383-3b0d-4938-b584-69ad214e5d5d@proxmox.com/ [1]: https://lore.kernel.org/qemu-devel/e13b488e-bf13-44f2-acca-e724d14f43fd@proxmox.com/ Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2024-04-25 18:21:28 +03:00
/*
diff --git a/migration/savevm-async.c b/migration/savevm-async.c
async snapshot: fix crash with VirtIO block with iothread when not saving VM state As reported in the community forum [0], doing a snapshot without saving the VM state for a VM with a VirtIO block device with iothread would lead to an assertion failure [1] and thus crash. The issue is that vm_start() is called from the coroutine qmp_savevm_end() which violates assumptions about graph locking down the line. Factor out the part of qmp_savevm_end() that actually needs to be a coroutine into a separate helper and turn qmp_savevm_end() into a non-coroutine, so that it can call vm_start() safely. The issue is likely not new, but was exposed by the recent graph locking rework introducing stricter checks. The issue does not occur when saving the VM state, because then the non-coroutine process_savevm_finalize() will already call vm_start() before qmp_savevm_end(). [0]: https://forum.proxmox.com/threads/149883/ [1]: > #0 0x00007353e6096e2c __pthread_kill_implementation (libc.so.6 + 0x8ae2c) > #1 0x00007353e6047fb2 __GI_raise (libc.so.6 + 0x3bfb2) > #2 0x00007353e6032472 __GI_abort (libc.so.6 + 0x26472) > #3 0x00007353e6032395 __assert_fail_base (libc.so.6 + 0x26395) > #4 0x00007353e6040eb2 __GI___assert_fail (libc.so.6 + 0x34eb2) > #5 0x0000592002307bb3 bdrv_graph_rdlock_main_loop (qemu-system-x86_64 + 0x83abb3) > #6 0x00005920022da455 bdrv_change_aio_context (qemu-system-x86_64 + 0x80d455) > #7 0x00005920022da6cb bdrv_try_change_aio_context (qemu-system-x86_64 + 0x80d6cb) > #8 0x00005920022fe122 blk_set_aio_context (qemu-system-x86_64 + 0x831122) > #9 0x00005920021b7b90 virtio_blk_start_ioeventfd (qemu-system-x86_64 + 0x6eab90) > #10 0x0000592002022927 virtio_bus_start_ioeventfd (qemu-system-x86_64 + 0x555927) > #11 0x0000592002066cc4 vm_state_notify (qemu-system-x86_64 + 0x599cc4) > #12 0x000059200205d517 vm_prepare_start (qemu-system-x86_64 + 0x590517) > #13 0x000059200205d56b vm_start (qemu-system-x86_64 + 0x59056b) > #14 0x00005920020a43fd qmp_savevm_end (qemu-system-x86_64 + 0x5d73fd) > #15 0x00005920023f3749 qmp_marshal_savevm_end (qemu-system-x86_64 + 0x926749) > #16 0x000059200242f1d8 qmp_dispatch (qemu-system-x86_64 + 0x9621d8) > #17 0x000059200238fa98 monitor_qmp_dispatch (qemu-system-x86_64 + 0x8c2a98) > #18 0x000059200239044e monitor_qmp_dispatcher_co (qemu-system-x86_64 + 0x8c344e) > #19 0x000059200245359b coroutine_trampoline (qemu-system-x86_64 + 0x98659b) > #20 0x00007353e605d9c0 n/a (libc.so.6 + 0x519c0) Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2024-06-28 11:46:56 +03:00
index 72cf6588c2..fb4e8ea689 100644
--- a/migration/savevm-async.c
+++ b/migration/savevm-async.c
@@ -379,7 +379,7 @@ void qmp_savevm_start(const char *statefile, Error **errp)
update submodule and patches to 7.1.0 Notable changes: * The only big change is the switch to using a custom QIOChannel for savevm-async, because the previously used QEMUFileOps was dropped. Changes to the current implementation: * Switch to vector based methods as required for an IO channel. For short reads the passed-in IO vector is stuffed with zeroes at the end, just to be sure. * For reading: The documentation in include/io/channel.h states that at least one byte should be read, so also error out when whe are at the very end instead of returning 0. * For reading: Fix off-by-one error when request goes beyond end. The wrong code piece was: if ((pos + size) > maxlen) { size = maxlen - pos - 1; } Previously, the last byte would not be read. It's actually possible to get a snapshot .raw file that has content all the way up the final 512 byte (= BDRV_SECTOR_SIZE) boundary without any trailing zero bytes (I wrote a script to do it). Luckily, it didn't cause a real issue, because qemu_loadvm_state() is not interested in the final (i.e. QEMU_VM_VMDESCRIPTION) section. The buffer for reading it is simply freed up afterwards and the function will assume that it read the whole section, even if that's not the case. * For writing: Make use of the generated blk_pwritev() wrapper instead of manually wrapping the coroutine to simplify and save a few lines. * Adapt to changed interfaces for blk_{pread,pwrite}: * a9262f551e ("block: Change blk_{pread,pwrite}() param order") * 3b35d4542c ("block: Add a 'flags' param to blk_pread()") * bf5b16fa40 ("block: Make blk_{pread,pwrite}() return 0 on success") Those changes especially affected the qemu-img dd patches, because the context also changed, but also some of our block drivers used the functions. * Drop qemu-common.h include: it got renamed after essentially everything was moved to other headers. The only remaining user I could find for things dropped from the header between 7.0 and 7.1 was qemu_get_vm_name() in the iscsi-initiatorname patch, but it already includes the header to which the function was moved. Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2022-10-14 15:07:13 +03:00
QIOChannel *ioc = QIO_CHANNEL(qio_channel_savevm_async_new(snap_state.target,
&snap_state.bs_pos));
- snap_state.file = qemu_file_new_output(ioc);
+ snap_state.file = qemu_file_new_output_sized(ioc, 4 * 1024 * 1024);
if (!snap_state.file) {
error_set(errp, ERROR_CLASS_GENERIC_ERROR, "failed to open '%s'", statefile);
async snapshot: fix crash with VirtIO block with iothread when not saving VM state As reported in the community forum [0], doing a snapshot without saving the VM state for a VM with a VirtIO block device with iothread would lead to an assertion failure [1] and thus crash. The issue is that vm_start() is called from the coroutine qmp_savevm_end() which violates assumptions about graph locking down the line. Factor out the part of qmp_savevm_end() that actually needs to be a coroutine into a separate helper and turn qmp_savevm_end() into a non-coroutine, so that it can call vm_start() safely. The issue is likely not new, but was exposed by the recent graph locking rework introducing stricter checks. The issue does not occur when saving the VM state, because then the non-coroutine process_savevm_finalize() will already call vm_start() before qmp_savevm_end(). [0]: https://forum.proxmox.com/threads/149883/ [1]: > #0 0x00007353e6096e2c __pthread_kill_implementation (libc.so.6 + 0x8ae2c) > #1 0x00007353e6047fb2 __GI_raise (libc.so.6 + 0x3bfb2) > #2 0x00007353e6032472 __GI_abort (libc.so.6 + 0x26472) > #3 0x00007353e6032395 __assert_fail_base (libc.so.6 + 0x26395) > #4 0x00007353e6040eb2 __GI___assert_fail (libc.so.6 + 0x34eb2) > #5 0x0000592002307bb3 bdrv_graph_rdlock_main_loop (qemu-system-x86_64 + 0x83abb3) > #6 0x00005920022da455 bdrv_change_aio_context (qemu-system-x86_64 + 0x80d455) > #7 0x00005920022da6cb bdrv_try_change_aio_context (qemu-system-x86_64 + 0x80d6cb) > #8 0x00005920022fe122 blk_set_aio_context (qemu-system-x86_64 + 0x831122) > #9 0x00005920021b7b90 virtio_blk_start_ioeventfd (qemu-system-x86_64 + 0x6eab90) > #10 0x0000592002022927 virtio_bus_start_ioeventfd (qemu-system-x86_64 + 0x555927) > #11 0x0000592002066cc4 vm_state_notify (qemu-system-x86_64 + 0x599cc4) > #12 0x000059200205d517 vm_prepare_start (qemu-system-x86_64 + 0x590517) > #13 0x000059200205d56b vm_start (qemu-system-x86_64 + 0x59056b) > #14 0x00005920020a43fd qmp_savevm_end (qemu-system-x86_64 + 0x5d73fd) > #15 0x00005920023f3749 qmp_marshal_savevm_end (qemu-system-x86_64 + 0x926749) > #16 0x000059200242f1d8 qmp_dispatch (qemu-system-x86_64 + 0x9621d8) > #17 0x000059200238fa98 monitor_qmp_dispatch (qemu-system-x86_64 + 0x8c2a98) > #18 0x000059200239044e monitor_qmp_dispatcher_co (qemu-system-x86_64 + 0x8c344e) > #19 0x000059200245359b coroutine_trampoline (qemu-system-x86_64 + 0x98659b) > #20 0x00007353e605d9c0 n/a (libc.so.6 + 0x519c0) Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2024-06-28 11:46:56 +03:00
@@ -503,7 +503,8 @@ int load_snapshot_from_blockdev(const char *filename, Error **errp)
blk_op_block_all(be, blocker);
/* restore the VM state */
update submodule and patches to 7.1.0 Notable changes: * The only big change is the switch to using a custom QIOChannel for savevm-async, because the previously used QEMUFileOps was dropped. Changes to the current implementation: * Switch to vector based methods as required for an IO channel. For short reads the passed-in IO vector is stuffed with zeroes at the end, just to be sure. * For reading: The documentation in include/io/channel.h states that at least one byte should be read, so also error out when whe are at the very end instead of returning 0. * For reading: Fix off-by-one error when request goes beyond end. The wrong code piece was: if ((pos + size) > maxlen) { size = maxlen - pos - 1; } Previously, the last byte would not be read. It's actually possible to get a snapshot .raw file that has content all the way up the final 512 byte (= BDRV_SECTOR_SIZE) boundary without any trailing zero bytes (I wrote a script to do it). Luckily, it didn't cause a real issue, because qemu_loadvm_state() is not interested in the final (i.e. QEMU_VM_VMDESCRIPTION) section. The buffer for reading it is simply freed up afterwards and the function will assume that it read the whole section, even if that's not the case. * For writing: Make use of the generated blk_pwritev() wrapper instead of manually wrapping the coroutine to simplify and save a few lines. * Adapt to changed interfaces for blk_{pread,pwrite}: * a9262f551e ("block: Change blk_{pread,pwrite}() param order") * 3b35d4542c ("block: Add a 'flags' param to blk_pread()") * bf5b16fa40 ("block: Make blk_{pread,pwrite}() return 0 on success") Those changes especially affected the qemu-img dd patches, because the context also changed, but also some of our block drivers used the functions. * Drop qemu-common.h include: it got renamed after essentially everything was moved to other headers. The only remaining user I could find for things dropped from the header between 7.0 and 7.1 was qemu_get_vm_name() in the iscsi-initiatorname patch, but it already includes the header to which the function was moved. Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2022-10-14 15:07:13 +03:00
- f = qemu_file_new_input(QIO_CHANNEL(qio_channel_savevm_async_new(be, &bs_pos)));
+ f = qemu_file_new_input_sized(QIO_CHANNEL(qio_channel_savevm_async_new(be, &bs_pos)),
+ 4 * 1024 * 1024);
if (!f) {
error_setg(errp, "Could not open VM state file");
goto the_end;