pve-qemu-qoup/debian/patches/pve/0058-PVE-Backup-avoid-segfault-issues-upon-backup-cancel.patch

From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Fabian Ebner <f.ebner@proxmox.com>
Date: Wed, 25 May 2022 13:59:39 +0200
Subject: [PATCH] PVE-Backup: avoid segfault issues upon backup-cancel

When canceling a backup in PVE via a signal, it's easy to run into a
situation where the job is already failing when the backup_cancel QMP
command comes in. With a bit of unlucky timing on top, it can happen
that job_exit() runs between the scheduling of job_cancel_bh() and the
execution of job_cancel_bh(). But job_cancel_sync() does not expect
that the job is already finalized (in fact, the job might've been
freed already, but even if it isn't, job_cancel_sync() would try to
deref job->txn, which would be NULL at that point).
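
Schematically, the problematic interleaving looks like this (an
illustration based on the description above, not actual code):

    coroutine (qmp_backup_cancel)          main loop
    -----------------------------          ---------
    pick job from di_list
    schedule job_cancel_bh(job)
                                           job_exit()  /* finalizes job;
                                                          job->txn becomes NULL */
    job_cancel_bh()
      -> job_cancel_sync(job)
           -> dereferences job->txn        /* NULL -> segfault */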

It is not possible to simply use job_cancel() (which is advertised as
being async but isn't in all cases) in qmp_backup_cancel() for the
same reason that job_cancel_sync() cannot be used: it can invoke
job_finish_sync() (which uses AIO_WAIT_WHILE and thus hangs if called
from a coroutine). This happens when there are multiple jobs in the
transaction and job->deferred_to_main_loop is true (it is set before
job_exit() is scheduled), or if the job was not started yet.
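
In other words, the hazardous call chain is roughly the following (the
exact path depends on the QEMU version):

    job_cancel()
      -> ... -> job_finish_sync()
                  -> AIO_WAIT_WHILE()     /* must not run in a coroutine */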

Fix the issue by selecting the job to cancel in job_cancel_bh() itself,
using the first job that's not completed yet. This is not necessarily
the first job in the list, because pvebackup_co_complete_stream()
might not yet have removed a completed job when job_cancel_bh() runs.

An alternative would be to continue using only the first job and
checking against JOB_STATUS_CONCLUDED or JOB_STATUS_NULL to decide if
it's still necessary and possible to cancel, but the approach of
using the first non-completed job seemed more robust.
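
Condensed, the new bottom-half logic in the hunk below amounts to the
following (a simplified sketch; the coroutine re-entry is elided and
the 'canceled' flag is introduced here only to keep the sketch clear):

    bool canceled = false;
    for (GList *l = backup_state.di_list; l && !canceled; l = l->next) {
        PVEBackupDevInfo *di = l->data;
        if (di && di->job) {
            WITH_JOB_LOCK_GUARD() {
                if (!job_is_completed_locked(&di->job->job)) {
                    /* canceling one job aborts the whole transaction */
                    job_cancel_sync_locked(&di->job->job, true);
                    canceled = true;
                }
            }
        }
    }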
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
[FE: adapt for new job lock mechanism replacing AioContext locks]
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
pve-backup.c | 57 ++++++++++++++++++++++++++++++++++------------------
1 file changed, 38 insertions(+), 19 deletions(-)

diff --git a/pve-backup.c b/pve-backup.c
index 67e2b99d74..7a8240363d 100644
--- a/pve-backup.c
+++ b/pve-backup.c
@@ -356,12 +356,41 @@ static void pvebackup_complete_cb(void *opaque, int ret)
/*
* job_cancel(_sync) does not like to be called from coroutines, so defer to
- * main loop processing via a bottom half.
+ * main loop processing via a bottom half. Assumes that caller holds
+ * backup_mutex.
*/
static void job_cancel_bh(void *opaque) {
CoCtxData *data = (CoCtxData*)opaque;
- Job *job = (Job*)data->data;
- job_cancel_sync(job, true);
+
+ /*
+ * Be careful to pick a valid job to cancel:
+ * 1. job_cancel_sync() does not expect the job to be finalized already.
+ * 2. job_exit() might run between scheduling and running job_cancel_bh()
+ * and pvebackup_co_complete_stream() might not have removed the job from
+ * the list yet (in fact, cannot, because it waits for the backup_mutex).
+ * Requiring !job_is_completed() ensures that no finalized job is picked.
+ */
+ GList *bdi = g_list_first(backup_state.di_list);
+ while (bdi) {
+ if (bdi->data) {
+ BlockJob *bj = ((PVEBackupDevInfo *)bdi->data)->job;
+ if (bj) {
+ Job *job = &bj->job;
+ WITH_JOB_LOCK_GUARD() {
+ if (!job_is_completed_locked(job)) {
+ job_cancel_sync_locked(job, true);
+ /*
+ * It's enough to cancel one job in the transaction, the
+ * rest will follow automatically.
+ */
+ break;
+ }
+ }
+ }
+ }
+ bdi = g_list_next(bdi);
+ }
+
aio_co_enter(data->ctx, data->co);
}
@@ -382,22 +411,12 @@ void coroutine_fn qmp_backup_cancel(Error **errp)
proxmox_backup_abort(backup_state.pbs, "backup canceled");
}
- /* it's enough to cancel one job in the transaction, the rest will follow
- * automatically */
- GList *bdi = g_list_first(backup_state.di_list);
- BlockJob *cancel_job = bdi && bdi->data ?
- ((PVEBackupDevInfo *)bdi->data)->job :
- NULL;
-
- if (cancel_job) {
- CoCtxData data = {
- .ctx = qemu_get_current_aio_context(),
- .co = qemu_coroutine_self(),
- .data = &cancel_job->job,
- };
- aio_bh_schedule_oneshot(data.ctx, job_cancel_bh, &data);
- qemu_coroutine_yield();
- }
+ CoCtxData data = {
+ .ctx = qemu_get_current_aio_context(),
+ .co = qemu_coroutine_self(),
+ };
+ aio_bh_schedule_oneshot(data.ctx, job_cancel_bh, &data);
+ qemu_coroutine_yield();
qemu_co_mutex_unlock(&backup_state.backup_mutex);
}