pve-qemu-qoup/debian/patches/pve/0016-PVE-add-IOChannel-implementation-for-savevm-async.patch

update submodule and patches to 7.1.0

Notable changes:

* The only big change is the switch to using a custom QIOChannel for
  savevm-async, because the previously used QEMUFileOps was dropped.
  Changes to the current implementation:

  * Switch to vector based methods as required for an IO channel. For
    short reads the passed-in IO vector is stuffed with zeroes at the
    end, just to be sure.

  * For reading: The documentation in include/io/channel.h states that
    at least one byte should be read, so also error out when we are at
    the very end instead of returning 0.

  * For reading: Fix off-by-one error when request goes beyond end.
    The wrong code piece was:

        if ((pos + size) > maxlen) {
            size = maxlen - pos - 1;
        }

    Previously, the last byte would not be read. It's actually
    possible to get a snapshot .raw file that has content all the way
    up to the final 512 byte (= BDRV_SECTOR_SIZE) boundary without any
    trailing zero bytes (I wrote a script to do it).

    Luckily, it didn't cause a real issue, because qemu_loadvm_state()
    is not interested in the final (i.e. QEMU_VM_VMDESCRIPTION)
    section. The buffer for reading it is simply freed up afterwards
    and the function will assume that it read the whole section, even
    if that's not the case.

  * For writing: Make use of the generated blk_pwritev() wrapper
    instead of manually wrapping the coroutine to simplify and save a
    few lines.

* Adapt to changed interfaces for blk_{pread,pwrite}:
  * a9262f551e ("block: Change blk_{pread,pwrite}() param order")
  * 3b35d4542c ("block: Add a 'flags' param to blk_pread()")
  * bf5b16fa40 ("block: Make blk_{pread,pwrite}() return 0 on success")
  Those changes especially affected the qemu-img dd patches, because
  the context also changed, but also some of our block drivers used
  the functions.

* Drop qemu-common.h include: it got renamed after essentially
  everything was moved to other headers. The only remaining user I
  could find for things dropped from the header between 7.0 and 7.1
  was qemu_get_vm_name() in the iscsi-initiatorname patch, but it
  already includes the header to which the function was moved.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2022-10-14 15:07:13 +03:00
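The read-side behaviour described in the 7.1.0 commit message (clamp at the end, zero-fill the unused tail, error out at the very end, expose the position to the caller) can be sketched in isolation. This is a minimal illustration only, not the code from the patch: FakeBackend, clamp_size_old(), clamp_size_fixed() and sketch_read() are hypothetical names, a flat byte array stands in for the BlockBackend, and a single buffer stands in for the IO vector. Only the body of clamp_size_old() is taken from the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h> /* ssize_t */

/*
 * Hypothetical stand-in for the block device: a flat byte array of
 * maxlen bytes. The real channel reads through a BlockBackend.
 */
typedef struct {
    const unsigned char *data;
    size_t maxlen;
} FakeBackend;

/* Clamp as quoted in the commit message: drops the final byte. */
static size_t clamp_size_old(size_t pos, size_t size, size_t maxlen)
{
    assert(pos < maxlen);
    if ((pos + size) > maxlen) {
        size = maxlen - pos - 1;
    }
    return size;
}

/* Corrected clamp: a request crossing the end is cut at maxlen. */
static size_t clamp_size_fixed(size_t pos, size_t size, size_t maxlen)
{
    assert(pos <= maxlen);
    if (size > maxlen - pos) {
        size = maxlen - pos;
    }
    return size;
}

/*
 * Read up to len bytes at *pos into buf and advance *pos, so the caller
 * can observe progress through the pointer it passed in.
 * Returns the number of bytes read, or -1 when already at the end,
 * because a channel read is expected to deliver at least one byte.
 * On a short read the unused tail of buf is filled with zeroes.
 */
static ssize_t sketch_read(const FakeBackend *be, size_t *pos,
                           unsigned char *buf, size_t len)
{
    if (*pos >= be->maxlen) {
        return -1;
    }
    size_t n = clamp_size_fixed(*pos, len, be->maxlen);
    memcpy(buf, be->data + *pos, n);
    memset(buf + n, 0, len - n);
    *pos += n;
    return (ssize_t)n;
}
```

With a 512 byte device and a request for 8 bytes at position 510, the old clamp yields 1 byte while the corrected one yields the remaining 2, which is exactly the last-byte loss the commit message describes.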
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Fiona Ebner <f.ebner@proxmox.com>
Date: Thu, 13 Oct 2022 11:33:50 +0200
Subject: [PATCH] PVE: add IOChannel implementation for savevm-async

based on migration/channel-block.c and the implementation that was
present in migration/savevm-async.c before QEMU 7.1.

Passes along read/write requests to the given BlockBackend, while
ensuring that a read request going beyond the end results in a
graceful short read.

Additionally, allows tracking the current position from the outside
(intended to be used for progress tracking).

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
---
update submodule and patches to QEMU 8.2.2

This version includes both the AioContext lock and the block graph
lock, so there might be some deadlocks lurking. It's not possible to
disable the block graph lock like was done in QEMU 8.1, because there
are no changes like the function bdrv_schedule_unref() that require
it. QEMU 9.0 will finally get rid of the AioContext locking.

During live-restore with a VirtIO SCSI drive with iothread there is a
known racy deadlock related to the AioContext lock. Not new [1], but
not sure if more likely now. Should be fixed in QEMU 9.0.

The block graph lock comes with annotations that can be checked by
clang's TSA. This required changes to the block drivers, i.e.
alloc-track, pbs, zeroinit as well as taking the appropriate locks
in pve-backup, savevm-async, vma-reader.

Local variable shadowing is prohibited via a compiler flag now,
required slight adaptation in vma.c.

Major changes only affect alloc-track:

* It is not possible to call a generated co-wrapper like
  bdrv_get_info() while holding the block graph lock exclusively [0],
  which does happen during initialization of alloc-track when the
  backing hd is set and the refresh_limits driver callback is invoked.
  The bdrv_get_info() call to get the cluster size is moved to
  directly after opening the file child in track_open().

  The important thing is that at least the request alignment for the
  write target is used, because then the RMW cycle in bdrv_pwritev
  will gather enough data from the backing file. Partial cluster
  allocations in the target are not a fundamental issue, because the
  driver returns its allocation status based on the bitmap, so any
  other data that maps to the same cluster will still be copied later
  by a stream job (or during writes to that cluster).

* Replacing the node cannot be done in the
  track_co_change_backing_file() callback, because it is a coroutine
  and cannot hold the block graph lock exclusively. So it is moved to
  the stream job itself with the auto-remove option not having an
  effect anymore (qemu-server would always set it anyways).

  In the future, there could either be a special option for the stream
  job, or maybe the upcoming blockdev-replace QMP command can be used.

  Replacing the backing child is actually already done in the stream
  job, so no need to do it in the track_co_change_backing_file()
  callback. It also cannot be called from a coroutine. Looking at the
  implementation in the qcow2 driver, it doesn't seem to be intended
  to change the backing child itself, just update driver-internal
  state.

Other changes:

* alloc-track: Error out early when used without auto-remove. Since
  replacing the node now happens in the stream job, where the option
  cannot be read from (it's internal to the driver), it will always be
  treated as 'on'. Makes sure to have users beside qemu-server notice
  the change (should they even exist). The option can be fully dropped
  in the future while adding a version guard in qemu-server.

* alloc-track: Avoid seemingly superfluous child permission update.
  Doesn't seem necessary nowadays (maybe after commit "alloc-track:
  fix deadlock during drop" where the dropping is not rescheduled and
  delayed anymore or some upstream change). Replacing the block node
  will already update the permissions of the new node (which was the
  file child before). Should there really be some issue, instead of
  having a drop state, this could also be just based off the fact
  whether there is still a backing child.

  Dumping the cumulative (shared) permissions for the BDS with a debug
  print yields the same values after this patch and with QEMU 8.1,
  namely 3 and 5.

* PBS block driver: compile unconditionally. Proxmox VE always needs
  it and something in the build process changed to make it not enabled
  by default. Probably would need to move the build option to meson
  otherwise.

* backup: job unreferencing during cleanup needs to happen outside of
  coroutine, so it was moved to before invoking the clean

* mirror: Cherry-pick stable fix to avoid potential deadlock.

* savevm-async: migrate_init now can fail, so propagate potential
  error.

* savevm-async: compression counters are not accessible outside
  migration/ram-compress now, so drop code that prophylactically set
  it to zero.

[0]: https://lore.kernel.org/qemu-devel/220be383-3b0d-4938-b584-69ad214e5d5d@proxmox.com/
[1]: https://lore.kernel.org/qemu-devel/e13b488e-bf13-44f2-acca-e724d14f43fd@proxmox.com/

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2024-04-25 18:21:28 +03:00
migration/channel-savevm-async.c | 184 +++++++++++++++++++++++++++++++
migration/channel-savevm-async.h | 51 +++++++++
migration/meson.build | 1 +
3 files changed, 236 insertions(+)
create mode 100644 migration/channel-savevm-async.c
create mode 100644 migration/channel-savevm-async.h
diff --git a/migration/channel-savevm-async.c b/migration/channel-savevm-async.c
new file mode 100644
index 0000000000..081a192f49
--- /dev/null
+++ b/migration/channel-savevm-async.c
@@ -0,0 +1,184 @@
update submodule and patches to 7.1.0 Notable changes: * The only big change is the switch to using a custom QIOChannel for savevm-async, because the previously used QEMUFileOps was dropped. Changes to the current implementation: * Switch to vector based methods as required for an IO channel. For short reads the passed-in IO vector is stuffed with zeroes at the end, just to be sure. * For reading: The documentation in include/io/channel.h states that at least one byte should be read, so also error out when whe are at the very end instead of returning 0. * For reading: Fix off-by-one error when request goes beyond end. The wrong code piece was: if ((pos + size) > maxlen) { size = maxlen - pos - 1; } Previously, the last byte would not be read. It's actually possible to get a snapshot .raw file that has content all the way up the final 512 byte (= BDRV_SECTOR_SIZE) boundary without any trailing zero bytes (I wrote a script to do it). Luckily, it didn't cause a real issue, because qemu_loadvm_state() is not interested in the final (i.e. QEMU_VM_VMDESCRIPTION) section. The buffer for reading it is simply freed up afterwards and the function will assume that it read the whole section, even if that's not the case. * For writing: Make use of the generated blk_pwritev() wrapper instead of manually wrapping the coroutine to simplify and save a few lines. * Adapt to changed interfaces for blk_{pread,pwrite}: * a9262f551e ("block: Change blk_{pread,pwrite}() param order") * 3b35d4542c ("block: Add a 'flags' param to blk_pread()") * bf5b16fa40 ("block: Make blk_{pread,pwrite}() return 0 on success") Those changes especially affected the qemu-img dd patches, because the context also changed, but also some of our block drivers used the functions. * Drop qemu-common.h include: it got renamed after essentially everything was moved to other headers. 
The only remaining user I could find for things dropped from the header between 7.0 and 7.1 was qemu_get_vm_name() in the iscsi-initiatorname patch, but it already includes the header to which the function was moved. Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2022-10-14 15:07:13 +03:00
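The off-by-one fix described in the commit message above can be illustrated with a small standalone sketch. This is not code from the patch: `clamp_read_old()`, `clamp_read_fixed()` and `clamp_self_check()` are hypothetical helper names, and the sketch assumes the caller has already ensured `0 <= pos < maxlen`, as the new readv implementation does before clamping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Old clamping quoted in the commit message: the final byte is never read. */
static size_t clamp_read_old(int64_t pos, size_t size, int64_t maxlen)
{
    if ((pos + (int64_t)size) > maxlen) {
        size = (size_t)(maxlen - pos - 1);
    }
    return size;
}

/* Fixed clamping: read everything that is left, up to and including
 * the last byte. Assumes 0 <= pos < maxlen. */
static size_t clamp_read_fixed(int64_t pos, size_t size, int64_t maxlen)
{
    uint64_t remaining = (uint64_t)(maxlen - pos);

    if (remaining < size) {
        size = (size_t)remaining;
    }
    return size;
}

/* Hypothetical self-check: a 512 byte image, request crossing the end. */
void clamp_self_check(void)
{
    /* 12 bytes remain at offset 500; the old code returned only 11. */
    assert(clamp_read_old(500, 100, 512) == 11);
    assert(clamp_read_fixed(500, 100, 512) == 12);
    /* Requests that fit entirely are left alone by both variants. */
    assert(clamp_read_old(0, 100, 512) == 100);
    assert(clamp_read_fixed(0, 100, 512) == 100);
}
```

With the old variant a read at the very last offset would be clamped to zero bytes, which is why the final byte of a fully populated image could not be retrieved.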
+/*
+ * QIO Channel implementation to be used by savevm-async QMP calls
+ */
+#include "qemu/osdep.h"
+#include "migration/channel-savevm-async.h"
+#include "qapi/error.h"
+#include "sysemu/block-backend.h"
+#include "trace.h"
+
+QIOChannelSavevmAsync *
+qio_channel_savevm_async_new(BlockBackend *be, size_t *bs_pos)
+{
+    QIOChannelSavevmAsync *ioc;
+
+    ioc = QIO_CHANNEL_SAVEVM_ASYNC(object_new(TYPE_QIO_CHANNEL_SAVEVM_ASYNC));
+
+    bdrv_ref(blk_bs(be));
+    ioc->be = be;
+    ioc->bs_pos = bs_pos;
+
+    return ioc;
+}
+
+
+static void
+qio_channel_savevm_async_finalize(Object *obj)
+{
+    QIOChannelSavevmAsync *ioc = QIO_CHANNEL_SAVEVM_ASYNC(obj);
+
+    if (ioc->be) {
+        bdrv_unref(blk_bs(ioc->be));
+        ioc->be = NULL;
+    }
+    ioc->bs_pos = NULL;
+}
+
+
+static ssize_t
+qio_channel_savevm_async_readv(QIOChannel *ioc,
+                               const struct iovec *iov,
+                               size_t niov,
+                               int **fds,
+                               size_t *nfds,
update submodule and patches to QEMU 8.0.0 Many changes were necessary this time around: * QAPI was changed to avoid redundant has_* variables, see commit 44ea9d9be3 ("qapi: Start to elide redundant has_FOO in generated C") for details. This affected many QMP commands added by Proxmox too. * Pending querying for migration got split into two functions, one to estimate, one for exact value, see commit c8df4a7aef ("migration: Split save_live_pending() into state_pending_*") for details. Relevant for savevm-async and PBS dirty bitmap. * Some block (driver) functions got converted to coroutines, so the Proxmox block drivers needed to be adapted. * Alloc track auto-detaching during PBS live restore got broken by AioContext-related changes resulting in a deadlock. The current, hacky method was replaced by a simpler one. Stefan apparently ran into a problem with that when he wrote the driver, but there were improvements in the stream job code since then and I didn't manage to reproduce the issue. It's a separate patch "alloc-track: fix deadlock during drop" for now, you can find the details there. * Async snapshot-related changes: - The pending querying got adapted to the above-mentioned split and a patch is added to optimize it/make it more similar to what upstream code does. - Added initialization of the compression counters (for future-proofing). - It's necessary to hold the BQL (big QEMU lock = iothread mutex) during the setup phase, because block layer functions are used there and not doing so leads to racy, hard-to-debug crashes or hangs. It's necessary to change some upstream code too for this, a version of the patch "migration: for snapshots, hold the BQL during setup callbacks" is intended to be upstreamed. - Need to take the bdrv graph read lock before flushing. * hmp_info_balloon was moved to a different file. * Needed to include new headers from time to time to still get the correct functions. Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2023-05-15 16:39:53 +03:00
+                               int flags,
+                               Error **errp)
+{
+    QIOChannelSavevmAsync *saioc = QIO_CHANNEL_SAVEVM_ASYNC(ioc);
+    BlockBackend *be = saioc->be;
+    int64_t maxlen = blk_getlength(be);
+    QEMUIOVector qiov;
+    size_t size;
+    int ret;
+
+    qemu_iovec_init_external(&qiov, (struct iovec *)iov, niov);
+
+    if (*saioc->bs_pos >= maxlen) {
+        error_setg(errp, "cannot read beyond maxlen");
+        return -1;
+    }
+
+    if (maxlen - *saioc->bs_pos < qiov.size) {
+        size = maxlen - *saioc->bs_pos;
+    } else {
+        size = qiov.size;
+    }
+
+    // returns 0 on success
+    ret = blk_preadv(be, *saioc->bs_pos, size, &qiov, 0);
+    if (ret < 0) {
+        error_setg_errno(errp, -ret, "blk_preadv failed");
+        return -1;
+    }
+
+    *saioc->bs_pos += size;
+    return size;
+}
+
+
+static ssize_t
+qio_channel_savevm_async_writev(QIOChannel *ioc,
+                                const struct iovec *iov,
+                                size_t niov,
+                                int *fds,
+                                size_t nfds,
+                                int flags,
+                                Error **errp)
+{
+    QIOChannelSavevmAsync *saioc = QIO_CHANNEL_SAVEVM_ASYNC(ioc);
+    BlockBackend *be = saioc->be;
+    QEMUIOVector qiov;
+    int ret;
+
+    qemu_iovec_init_external(&qiov, (struct iovec *)iov, niov);
+
+    if (qemu_in_coroutine()) {
+        ret = blk_co_pwritev(be, *saioc->bs_pos, qiov.size, &qiov, 0);
+        aio_wait_kick();
+    } else {
+        ret = blk_pwritev(be, *saioc->bs_pos, qiov.size, &qiov, 0);
+    }
+
+    if (ret < 0) {
+        error_setg_errno(errp, -ret, "blk(_co)_pwritev failed");
+        return -1;
+    }
+
+    *saioc->bs_pos += qiov.size;
+    return qiov.size;
+}
+
+
+static int
+qio_channel_savevm_async_set_blocking(QIOChannel *ioc,
+                                      bool enabled,
+                                      Error **errp)
+{
+    if (!enabled) {
+        error_setg(errp, "Non-blocking mode not supported for savevm-async");
+        return -1;
+    }
+    return 0;
+}
+
+
+static int
+qio_channel_savevm_async_close(QIOChannel *ioc,
+                               Error **errp)
+{
+    QIOChannelSavevmAsync *saioc = QIO_CHANNEL_SAVEVM_ASYNC(ioc);
+    int rv = bdrv_flush(blk_bs(saioc->be));
+
+    if (rv < 0) {
+        error_setg_errno(errp, -rv, "Unable to flush VMState");
+        return -1;
+    }
+
+    bdrv_unref(blk_bs(saioc->be));
+    saioc->be = NULL;
+    saioc->bs_pos = NULL;
+
+    return 0;
+}
+
+
+static void
+qio_channel_savevm_async_set_aio_fd_handler(QIOChannel *ioc,
update submodule and patches to QEMU 8.2.2 This version includes both the AioContext lock and the block graph lock, so there might be some deadlocks lurking. It's not possible to disable the block graph lock like was done in QEMU 8.1, because there are no changes like the function bdrv_schedule_unref() that require it. QEMU 9.0 will finally get rid of the AioContext locking. During live-restore with a VirtIO SCSI drive with iothread there is a known racy deadlock related to the AioContext lock. Not new [1], but not sure if more likely now. Should be fixed in QEMU 9.0. The block graph lock comes with annotations that can be checked by clang's TSA. This required changes to the block drivers, i.e. alloc-track, pbs, zeroinit as well as taking the appropriate locks in pve-backup, savevm-async, vma-reader. Local variable shadowing is prohibited via a compiler flag now, required slight adaptation in vma.c. Major changes only affect alloc-track: * It is not possible to call a generated co-wrapper like bdrv_get_info() while holding the block graph lock exclusively [0], which does happen during initialization of alloc-track when the backing hd is set and the refresh_limits driver callback is invoked. The bdrv_get_info() call to get the cluster size is moved to directly after opening the file child in track_open(). The important thing is that at least the request alignment for the write target is used, because then the RMW cycle in bdrv_pwritev will gather enough data from the backing file. Partial cluster allocations in the target are not a fundamental issue, because the driver returns its allocation status based on the bitmap, so any other data that maps to the same cluster will still be copied later by a stream job (or during writes to that cluster). * Replacing the node cannot be done in the track_co_change_backing_file() callback, because it is a coroutine and cannot hold the block graph lock exclusively. 
So it is moved to the stream job itself with the auto-remove option not having an effect anymore (qemu-server would always set it anyways). In the future, there could either be a special option for the stream job, or maybe the upcoming blockdev-replace QMP command can be used. Replacing the backing child is actually already done in the stream job, so no need to do it in the track_co_change_backing_file() callback. It also cannot be called from a coroutine. Looking at the implementation in the qcow2 driver, it doesn't seem to be intended to change the backing child itself, just update driver-internal state. Other changes: * alloc-track: Error out early when used without auto-remove. Since replacing the node now happens in the stream job, where the option cannot be read from (it's internal to the driver), it will always be treated as 'on'. Makes sure to have users beside qemu-server notice the change (should they even exist). The option can be fully dropped in the future while adding a version guard in qemu-server. * alloc-track: Avoid seemingly superfluous child permission update. Doesn't seem necessary nowadays (maybe after commit "alloc-track: fix deadlock during drop" where the dropping is not rescheduled and delayed anymore or some upstream change). Replacing the block node will already update the permissions of the new node (which was the file child before). Should there really be some issue, instead of having a drop state, this could also be just based off the fact whether there is still a backing child. Dumping the cumulative (shared) permissions for the BDS with a debug print yields the same values after this patch and with QEMU 8.1, namely 3 and 5. * PBS block driver: compile unconditionally. Proxmox VE always needs it and something in the build process changed to make it not enabled by default. Probably would need to move the build option to meson otherwise. 
* backup: job unreferencing during cleanup needs to happen outside of coroutine, so it was moved to before invoking the clean * mirror: Cherry-pick stable fix to avoid potential deadlock. * savevm-async: migrate_init now can fail, so propagate potential error. * savevm-async: compression counters are not accessible outside migration/ram-compress now, so drop code that prophylactically set it to zero. [0]: https://lore.kernel.org/qemu-devel/220be383-3b0d-4938-b584-69ad214e5d5d@proxmox.com/ [1]: https://lore.kernel.org/qemu-devel/e13b488e-bf13-44f2-acca-e724d14f43fd@proxmox.com/ Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2024-04-25 18:21:28 +03:00
+                                            AioContext *read_ctx,
+                                            IOHandler *io_read,
+                                            AioContext *write_ctx,
+                                            IOHandler *io_write,
+                                            void *opaque)
+{
+    // if channel-block starts doing something, check if this needs adaptation
+}
+
+
+static void
+qio_channel_savevm_async_class_init(ObjectClass *klass,
+                                    void *class_data G_GNUC_UNUSED)
+{
+    QIOChannelClass *ioc_klass = QIO_CHANNEL_CLASS(klass);
+
+    ioc_klass->io_writev = qio_channel_savevm_async_writev;
+    ioc_klass->io_readv = qio_channel_savevm_async_readv;
+    ioc_klass->io_set_blocking = qio_channel_savevm_async_set_blocking;
+    ioc_klass->io_close = qio_channel_savevm_async_close;
+    ioc_klass->io_set_aio_fd_handler = qio_channel_savevm_async_set_aio_fd_handler;
+}
+
+static const TypeInfo qio_channel_savevm_async_info = {
+    .parent = TYPE_QIO_CHANNEL,
+    .name = TYPE_QIO_CHANNEL_SAVEVM_ASYNC,
+    .instance_size = sizeof(QIOChannelSavevmAsync),
+    .instance_finalize = qio_channel_savevm_async_finalize,
+    .class_init = qio_channel_savevm_async_class_init,
+};
+
+static void
+qio_channel_savevm_async_register_types(void)
+{
+    type_register_static(&qio_channel_savevm_async_info);
+}
+
+type_init(qio_channel_savevm_async_register_types);
diff --git a/migration/channel-savevm-async.h b/migration/channel-savevm-async.h
new file mode 100644
index 0000000000..17ae2cb261
--- /dev/null
+++ b/migration/channel-savevm-async.h
@@ -0,0 +1,51 @@
+/*
+ * QEMU I/O channels driver for savevm-async.c
+ *
+ * Copyright (c) 2022 Proxmox Server Solutions
+ *
+ * Authors:
+ * Fiona Ebner (f.ebner@proxmox.com)
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#ifndef QIO_CHANNEL_SAVEVM_ASYNC_H
+#define QIO_CHANNEL_SAVEVM_ASYNC_H
+
+#include "io/channel.h"
+#include "qom/object.h"
+
+#define TYPE_QIO_CHANNEL_SAVEVM_ASYNC "qio-channel-savevm-async"
+OBJECT_DECLARE_SIMPLE_TYPE(QIOChannelSavevmAsync, QIO_CHANNEL_SAVEVM_ASYNC)
+
+
+/**
+ * QIOChannelSavevmAsync:
+ *
+ * The QIOChannelSavevmAsync object provides a channel implementation that is able to
+ * perform I/O on any BlockBackend whose BlockDriverState directly contains a
+ * VMState (as opposed to indirectly, like qcow2). It allows tracking the
+ * current position from the outside.
+ */
+struct QIOChannelSavevmAsync {
+    QIOChannel parent;
+    BlockBackend *be;
+    size_t *bs_pos;
+};
+
+
+/**
+ * qio_channel_savevm_async_new:
+ * @be: the block backend
+ * @bs_pos: used to keep track of the IOChannel's current position
+ *
+ * Create a new IO channel object that can perform I/O on a BlockBackend object
+ * whose BlockDriverState directly contains a VMState.
+ *
+ * Returns: the new channel object
+ */
+QIOChannelSavevmAsync *
+qio_channel_savevm_async_new(BlockBackend *be, size_t *bs_pos);
+
+#endif /* QIO_CHANNEL_SAVEVM_ASYNC_H */
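As a rough illustration of what "tracking the current position from the outside" means in the header comment above, the following standalone sketch keeps the position in a variable owned by the caller and lets the channel update it through a pointer, in the same spirit as `bs_pos`. It is only an analogy: `MemChannel`, `mem_channel_write()`, `mem_channel_read()` and `mem_channel_self_check()` are hypothetical names, the backing store is a plain memory buffer, and no QEMU API is used.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h> /* ssize_t */

/* Hypothetical stand-in for the channel: the position lives outside. */
typedef struct MemChannel {
    unsigned char *buf; /* backing storage (stands in for the BlockBackend) */
    size_t len;         /* total size of the backing storage */
    size_t *pos;        /* caller-owned position, analogous to bs_pos */
} MemChannel;

/* Write n bytes at the current position; fail if they do not fit. */
static ssize_t mem_channel_write(MemChannel *c, const void *data, size_t n)
{
    if (*c->pos > c->len || n > c->len - *c->pos) {
        return -1;
    }
    memcpy(c->buf + *c->pos, data, n);
    *c->pos += n;
    return (ssize_t)n;
}

/* Read up to n bytes; error out at the very end instead of returning 0. */
static ssize_t mem_channel_read(MemChannel *c, void *data, size_t n)
{
    size_t avail;

    if (*c->pos >= c->len) {
        return -1;
    }
    avail = c->len - *c->pos;
    if (avail < n) {
        n = avail; /* short read: clamp to what is left */
    }
    memcpy(data, c->buf + *c->pos, n);
    *c->pos += n;
    return (ssize_t)n;
}

/* Hypothetical self-check showing the caller observing and moving pos. */
void mem_channel_self_check(void)
{
    unsigned char storage[8] = { 0 };
    unsigned char out[8];
    size_t pos = 0;
    MemChannel c = { storage, sizeof(storage), &pos };

    assert(mem_channel_write(&c, "abcde", 5) == 5);
    assert(pos == 5);                          /* visible to the caller */
    pos = 3;                                   /* caller repositions */
    assert(mem_channel_read(&c, out, 8) == 5); /* clamped short read */
    assert(pos == 8);
    assert(mem_channel_read(&c, out, 1) == -1);
}
```

Keeping the position outside the channel object lets the code that created the channel inspect how far the state has been written or read without going through the channel itself.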
diff --git a/migration/meson.build b/migration/meson.build
index 92b1cc4297..0e689eac09 100644
update submodule and patches to 7.1.0

Notable changes:

* The only big change is the switch to using a custom QIOChannel for savevm-async, because the previously used QEMUFileOps was dropped. Changes to the current implementation:
  * Switch to vector based methods as required for an IO channel. For short reads the passed-in IO vector is stuffed with zeroes at the end, just to be sure.
  * For reading: The documentation in include/io/channel.h states that at least one byte should be read, so also error out when we are at the very end instead of returning 0.
  * For reading: Fix off-by-one error when request goes beyond end. The wrong code piece was:
        if ((pos + size) > maxlen) {
            size = maxlen - pos - 1;
        }
    Previously, the last byte would not be read. It's actually possible to get a snapshot .raw file that has content all the way up to the final 512 byte (= BDRV_SECTOR_SIZE) boundary without any trailing zero bytes (I wrote a script to do it). Luckily, it didn't cause a real issue, because qemu_loadvm_state() is not interested in the final (i.e. QEMU_VM_VMDESCRIPTION) section. The buffer for reading it is simply freed up afterwards and the function will assume that it read the whole section, even if that's not the case.
  * For writing: Make use of the generated blk_pwritev() wrapper instead of manually wrapping the coroutine to simplify and save a few lines.
* Adapt to changed interfaces for blk_{pread,pwrite}:
  * a9262f551e ("block: Change blk_{pread,pwrite}() param order")
  * 3b35d4542c ("block: Add a 'flags' param to blk_pread()")
  * bf5b16fa40 ("block: Make blk_{pread,pwrite}() return 0 on success")
  Those changes especially affected the qemu-img dd patches, because the context also changed, but also some of our block drivers used the functions.
* Drop qemu-common.h include: it got renamed after essentially everything was moved to other headers. The only remaining user I could find for things dropped from the header between 7.0 and 7.1 was qemu_get_vm_name() in the iscsi-initiatorname patch, but it already includes the header to which the function was moved.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2022-10-14 15:07:13 +03:00
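The off-by-one described in the commit message can be illustrated in isolation. The following is only a sketch, not the code from channel-savevm-async.c: the helper names `clamp_read_size()` and `clamp_read_size_old()` are hypothetical, and returning -1 at the end of the data is an assumed way of modelling "error out instead of returning 0".

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from the patch): limit a read of `size` bytes
 * starting at `pos` so that it does not go past `maxlen`.
 * Returns the number of bytes that may be read, or -1 when `pos` is already
 * at or beyond the end, modelling the "error out instead of returning 0" rule.
 */
int64_t clamp_read_size(int64_t pos, int64_t size, int64_t maxlen)
{
    assert(pos >= 0 && size >= 0 && maxlen >= 0);

    if (pos >= maxlen) {
        return -1; /* nothing left to read */
    }
    if (size > maxlen - pos) {
        size = maxlen - pos; /* keep the last byte, no extra "- 1" */
    }
    return size;
}

/* The clamp quoted in the commit message, kept only for comparison. */
int64_t clamp_read_size_old(int64_t pos, int64_t size, int64_t maxlen)
{
    if ((pos + size) > maxlen) {
        size = maxlen - pos - 1; /* off by one: drops the final byte */
    }
    return size;
}
```

With `pos = 500`, `size = 100` and `maxlen = 512`, the old clamp allows 11 bytes while 12 remain, which is the lost final byte mentioned above.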
--- a/migration/meson.build
+++ b/migration/meson.build
@@ -13,6 +13,7 @@ system_ss.add(files(
'block-dirty-bitmap.c',
'channel.c',
'channel-block.c',
+ 'channel-savevm-async.c',
'dirtyrate.c',
'exec.c',
'fd.c',