pve-qemu-qoup/debian/patches/pve/0030-PVE-Backup-Proxmox-backup-patches-for-QEMU.patch
Fiona Ebner f1eed34ac7 update submodule and patches to QEMU 8.2.2
This version includes both the AioContext lock and the block graph
lock, so there might be some deadlocks lurking. It's not possible to
disable the block graph lock as was done in QEMU 8.1, because there
are now changes like the function bdrv_schedule_unref() that require
it. QEMU 9.0 will finally get rid of the AioContext locking.

During live-restore with a VirtIO SCSI drive with iothread there is a
known racy deadlock related to the AioContext lock. Not new [1], but
not sure if more likely now. Should be fixed in QEMU 9.0.

The block graph lock comes with annotations that can be checked by
clang's TSA. This required changes to the block drivers, i.e.
alloc-track, pbs, zeroinit as well as taking the appropriate locks
in pve-backup, savevm-async, vma-reader.

Local variable shadowing is now prohibited via a compiler flag, which
required a slight adaptation in vma.c.

Major changes only affect alloc-track:

* It is not possible to call a generated co-wrapper like
  bdrv_get_info() while holding the block graph lock exclusively [0],
  which does happen during initialization of alloc-track when the
  backing hd is set and the refresh_limits driver callback is invoked.

  The bdrv_get_info() call to get the cluster size is moved to
  directly after opening the file child in track_open().

  The important thing is that at least the request alignment for the
  write target is used, because then the RMW cycle in bdrv_pwritev
  will gather enough data from the backing file. Partial cluster
  allocations in the target are not a fundamental issue, because the
  driver returns its allocation status based on the bitmap, so any
  other data that maps to the same cluster will still be copied later
  by a stream job (or during writes to that cluster).

* Replacing the node cannot be done in the
  track_co_change_backing_file() callback, because it is a coroutine
  and cannot hold the block graph lock exclusively. So it is moved to
  the stream job itself with the auto-remove option not having an
  effect anymore (qemu-server would always set it anyway).

  In the future, there could either be a special option for the stream
  job, or maybe the upcoming blockdev-replace QMP command can be used.

  Replacing the backing child is actually already done in the stream
  job, so no need to do it in the track_co_change_backing_file()
  callback. It also cannot be called from a coroutine. Looking at the
  implementation in the qcow2 driver, it doesn't seem to be intended
  to change the backing child itself, just update driver-internal
  state.
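The alignment argument in the first item can be illustrated with a small standalone sketch. This is not the alloc-track code; `AlignedRange` and `align_request()` are hypothetical names used only for illustration. A write is rounded out to the request alignment, and the resulting head and tail padding is what the RMW cycle would have to read from the backing file first.

```c
#include <stdint.h>

/* Hypothetical type, for illustration only (not part of QEMU). */
typedef struct AlignedRange {
    uint64_t offset; /* aligned start of the request */
    uint64_t bytes;  /* aligned length of the request */
    uint64_t head;   /* bytes to read in front of the original data */
    uint64_t tail;   /* bytes to read behind the original data */
} AlignedRange;

/* Round the byte range [offset, offset + bytes) out to multiples of
 * 'align'. 'align' must be non-zero. */
AlignedRange align_request(uint64_t offset, uint64_t bytes, uint64_t align)
{
    AlignedRange r;
    uint64_t end = offset + bytes;
    uint64_t aligned_end = (end + align - 1) / align * align;

    r.offset = offset / align * align;
    r.bytes = aligned_end - r.offset;
    r.head = offset - r.offset;
    r.tail = aligned_end - end;
    return r;
}
```

With an alignment equal to the target's request alignment, head and tail together cover everything outside the guest write that ends up in the aligned request.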

Other changes:

* alloc-track: Error out early when used without auto-remove. Since
  replacing the node now happens in the stream job, where the option
  cannot be read from (it's internal to the driver), it will always be
  treated as 'on'. This ensures that users besides qemu-server notice
  the change (should they even exist). The option can be fully dropped
  in the future while adding a version guard in qemu-server.

* alloc-track: Avoid seemingly superfluous child permission update.
  Doesn't seem necessary nowadays (maybe after commit "alloc-track:
  fix deadlock during drop" where the dropping is not rescheduled and
  delayed anymore or some upstream change). Replacing the block node
  will already update the permissions of the new node (which was the
  file child before). Should there really be some issue, instead of
  having a drop state, this could also be just based off the fact
  whether there is still a backing child.

  Dumping the cumulative (shared) permissions for the BDS with a debug
  print yields the same values after this patch and with QEMU 8.1,
  namely 3 and 5.

* PBS block driver: compile unconditionally. Proxmox VE always needs
  it and something in the build process changed to make it not enabled
  by default. Probably would need to move the build option to meson
  otherwise.

* backup: job unreferencing during cleanup needs to happen outside of
  coroutine, so it was moved to before invoking the clean

* mirror: Cherry-pick stable fix to avoid potential deadlock.

* savevm-async: migrate_init now can fail, so propagate potential
  error.

* savevm-async: compression counters are not accessible outside
  migration/ram-compress now, so drop code that prophylactically set
  them to zero.

[0]: https://lore.kernel.org/qemu-devel/220be383-3b0d-4938-b584-69ad214e5d5d@proxmox.com/
[1]: https://lore.kernel.org/qemu-devel/e13b488e-bf13-44f2-acca-e724d14f43fd@proxmox.com/

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2024-04-26 14:14:06 +02:00

1992 lines
62 KiB
Diff

From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Dietmar Maurer <dietmar@proxmox.com>
Date: Mon, 6 Apr 2020 12:16:59 +0200
Subject: [PATCH] PVE-Backup: Proxmox backup patches for QEMU
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
For PBS, using dirty bitmaps is supported via QEMU's
MIRROR_SYNC_MODE_BITMAP. When the feature is used, the data-write
callback is only executed for changed chunks; the PBS Rust code
reuses chunks from the previous index for everything it doesn't
receive if reuse_index is true. On error or cancellation, all dirty
bitmaps are removed to ensure consistency.
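As a rough illustration of the paragraph above, here is a standalone model. It is neither the PBS nor the QEMU code, every name in it is hypothetical, and it assumes that without reuse_index everything is transferred. Only dirty chunks are counted as passing through the write callback; everything else is counted as reused from the previous index.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical counters, loosely modelled on 'transferred' and 'reused'. */
typedef struct BackupCounters {
    uint64_t transferred; /* bytes handed to the write callback */
    uint64_t reused;      /* bytes taken over from the previous index */
} BackupCounters;

/* Walk all chunks; dirty[i] tells whether chunk i changed since the
 * previous backup. Assumption: without reuse_index, all chunks are sent. */
BackupCounters backup_chunks(const bool *dirty, size_t n_chunks,
                             uint64_t chunk_size, bool reuse_index)
{
    BackupCounters c = { 0, 0 };

    for (size_t i = 0; i < n_chunks; i++) {
        if (dirty[i] || !reuse_index) {
            c.transferred += chunk_size;
        } else {
            c.reused += chunk_size;
        }
    }
    return c;
}
```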
By using a JobTxn, we can sync dirty bitmaps only when *all* jobs were
successful - meaning we don't need to remove them when the backup
fails, since QEMU's BITMAP_SYNC_MODE_ON_SUCCESS will now handle that
for us. A sequential transaction is used, so drives will be backed up
one after the other.
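The on-success behaviour of a sequential transaction can be sketched with a standalone model. This is not the JobTxn API; `ModelJob`, `ModelJobState` and `run_sequential_txn()` are hypothetical names. Jobs run one after the other, a failure cancels the remaining jobs, and bitmaps are only synced when every job succeeded.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical job model, for illustration only. */
typedef enum {
    JOB_PENDING,
    JOB_DONE,
    JOB_FAILED,
    JOB_CANCELLED
} ModelJobState;

typedef struct ModelJob {
    bool will_fail;      /* knob to simulate an error in this job */
    ModelJobState state;
    bool bitmap_synced;  /* true once the dirty bitmap was synced */
} ModelJob;

/* Run the jobs in order. Returns true if the whole transaction succeeded. */
bool run_sequential_txn(ModelJob *jobs, size_t n)
{
    bool ok = true;

    for (size_t i = 0; i < n; i++) {
        if (!ok) {
            jobs[i].state = JOB_CANCELLED; /* an earlier job failed */
        } else if (jobs[i].will_fail) {
            jobs[i].state = JOB_FAILED;
            ok = false;
        } else {
            jobs[i].state = JOB_DONE;
        }
    }

    /* on-success semantics: sync bitmaps only if all jobs succeeded */
    for (size_t i = 0; i < n; i++) {
        jobs[i].bitmap_synced = ok;
    }
    return ok;
}
```

In this model a drive that was already backed up keeps its old bitmap when a later drive fails, so no explicit bitmap removal is needed in the error path.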
The backup and backup-cancel QMP calls are coroutines. This has the
benefit that calls are asynchronous to the main loop, i.e. long
running operations like connecting to a PBS server will no longer hang
the VM.
backup_job_create() and job_cancel_sync() cannot be run from a
coroutine and require an acquired AioContext, so the job creation and
canceling are extracted as bottom halves and called from the
respective QMP coroutines.
To communicate the finishing state, a dedicated property is used for
query-backup: 'finishing'. A dedicated state is explicitly not used,
since that would break compatibility with older qemu-server versions.
The first call to job_cancel_sync() will cancel and free all jobs in
the transaction, but it is necessary to pick a job that is:

1. still referenced. For this, there is a job_ref directly after job
   creation paired with a job_unref in cleanup paths.
2. not yet finalized. In job_cancel_bh(), the first job that's not
   completed yet is used. This is not necessarily the first job in the
   list, because pvebackup_co_complete_stream() might not yet have
   removed a completed job when job_cancel_bh() runs.

Why even bother with the bottom half at all and not use job_cancel() in
qmp_backup_cancel() directly? The reason is that qmp_backup_cancel()
is a coroutine, so it will hang when reaching AIO_WAIT_WHILE() and
job_cancel() might end up calling that.
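The selection rule for the job to cancel can be sketched as a standalone model. It does not use the QEMU job API; `ModelDevInfo` and `pick_job_to_cancel()` are hypothetical names. The first list entry that still has a job and whose job is not completed is picked, which need not be the first entry in the list.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a per-device backup entry. */
typedef struct ModelDevInfo {
    bool has_job;   /* models a non-NULL job pointer */
    bool completed; /* models a job that is already completed */
} ModelDevInfo;

/* Return the index of the first entry with a job that is not completed
 * yet, or -1 if there is no such entry. */
int pick_job_to_cancel(const ModelDevInfo *list, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (list[i].has_job && !list[i].completed) {
            return (int)i;
        }
    }
    return -1;
}
```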
Regarding BackupPerf performance settings: for now, only the
max-workers setting is exposed, because:

1. use-copy-range would need to be implemented in backup-dump and the
   feature was actually turned off by default in QEMU itself, because it
   didn't provide the expected benefit, see commit 6a30f663d4 ("qapi:
   backup: disable copy_range by default").
2. max-chunk: enforced to be at least the backup cluster size (4 MiB
   for PBS) and otherwise maximum of source and target cluster size.
   And block-copy has a maximum buffer size of 1 MiB, so setting a larger
   max-chunk doesn't even have an effect. To make the setting sensibly
   usable the check would need to be removed and optionally the
   block-copy max buffer size would need to be bumped. I tried doing just
   that, and tested different source/target combinations with different
   max-chunk settings, but there were no noticeable improvements over the
   default "unlimited" (resulting in 1 MiB for block-copy).
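The max-chunk reasoning can be made concrete with a small standalone model. It is not the block-copy code; `effective_copy_size()` and the `MODEL_*` constants are hypothetical, and the 1 MiB limit is simply taken from the description above. A max-chunk below the backup cluster size is rejected, and anything above the block-copy buffer limit is clamped to that limit.

```c
#include <stdint.h>

#define MODEL_MIB (1024 * 1024ULL)
/* Hypothetical constant; the 1 MiB value is the one stated in the text. */
#define MODEL_BLOCK_COPY_MAX_BUFFER (1 * MODEL_MIB)

/* max_chunk == 0 models the default "unlimited".
 * Returns the chunk size that would effectively be used, or 0 if the
 * setting would be rejected for being below the cluster size. */
uint64_t effective_copy_size(uint64_t max_chunk, uint64_t cluster_size)
{
    if (max_chunk != 0 && max_chunk < cluster_size) {
        return 0; /* rejected by the minimum-size check */
    }
    if (max_chunk == 0 || max_chunk > MODEL_BLOCK_COPY_MAX_BUFFER) {
        return MODEL_BLOCK_COPY_MAX_BUFFER; /* clamped to the buffer limit */
    }
    return max_chunk;
}
```

In this model, with the 4 MiB PBS cluster size every accepted max-chunk value ends up at 1 MiB, the same as the default.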
Signed-off-by: Dietmar Maurer <dietmar@proxmox.com>
[SR: Add dirty-bitmap tracking for incremental backups
Add query_proxmox_support and query-pbs-bitmap-info QMP calls
Use a transaction to synchronize job states
Co-routine and async-related improvements
Improve finishing backups/cleanups
Various other improvements]
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
[FG: add master key support]
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
[WB: add PBS namespace support]
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
[FE: add new force parameter to job_cancel_sync calls
adapt for new job lock mechanism replacing AioContext locks
adapt to QAPI changes
improve canceling
allow passing max-workers setting
use malloc_trim after backup
create jobs in a drained section]
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
block/meson.build | 5 +
block/monitor/block-hmp-cmds.c | 39 ++
blockdev.c | 1 +
hmp-commands-info.hx | 14 +
hmp-commands.hx | 29 +
include/monitor/hmp.h | 3 +
meson.build | 1 +
monitor/hmp-cmds.c | 72 +++
proxmox-backup-client.c | 146 +++++
proxmox-backup-client.h | 60 ++
pve-backup.c | 1103 ++++++++++++++++++++++++++++++++
qapi/block-core.json | 229 +++++++
qapi/common.json | 14 +
qapi/machine.json | 16 +-
14 files changed, 1718 insertions(+), 14 deletions(-)
create mode 100644 proxmox-backup-client.c
create mode 100644 proxmox-backup-client.h
create mode 100644 pve-backup.c
diff --git a/block/meson.build b/block/meson.build
index f7d1b7ac42..9df99aceb5 100644
--- a/block/meson.build
+++ b/block/meson.build
@@ -44,6 +44,11 @@ block_ss.add(files(
), zstd, zlib, gnutls)
block_ss.add(files('../vma-writer.c'), libuuid)
+block_ss.add(files(
+ '../proxmox-backup-client.c',
+ '../pve-backup.c',
+), libproxmox_backup_qemu)
+
system_ss.add(when: 'CONFIG_TCG', if_true: files('blkreplay.c'))
system_ss.add(files('block-ram-registrar.c'))
diff --git a/block/monitor/block-hmp-cmds.c b/block/monitor/block-hmp-cmds.c
index c729cbf1eb..1656859e03 100644
--- a/block/monitor/block-hmp-cmds.c
+++ b/block/monitor/block-hmp-cmds.c
@@ -1037,3 +1037,42 @@ void hmp_change_medium(Monitor *mon, const char *device, const char *target,
qmp_blockdev_change_medium(device, NULL, target, arg, true, force,
!!read_only, read_only_mode, errp);
}
+
+void coroutine_fn hmp_backup_cancel(Monitor *mon, const QDict *qdict)
+{
+ Error *error = NULL;
+
+ qmp_backup_cancel(&error);
+
+ hmp_handle_error(mon, error);
+}
+
+void coroutine_fn hmp_backup(Monitor *mon, const QDict *qdict)
+{
+ Error *error = NULL;
+
+ const char *backup_file = qdict_get_str(qdict, "backupfile");
+ const char *devlist = qdict_get_try_str(qdict, "devlist");
+ int64_t speed = qdict_get_try_int(qdict, "speed", 0);
+
+ qmp_backup(
+ backup_file,
+ NULL, // PBS password
+ NULL, // PBS keyfile
+ NULL, // PBS key_password
+ NULL, // PBS master_keyfile
+ NULL, // PBS fingerprint
+ NULL, // PBS backup-ns
+ NULL, // PBS backup-id
+ false, 0, // PBS backup-time
+ false, false, // PBS use-dirty-bitmap
+ false, false, // PBS compress
+ false, false, // PBS encrypt
+ true, BACKUP_FORMAT_VMA,
+ NULL, NULL,
+ devlist, qdict_haskey(qdict, "speed"), speed,
+ false, 0, // BackupPerf max-workers
+ &error);
+
+ hmp_handle_error(mon, error);
+}
diff --git a/blockdev.c b/blockdev.c
index 38a40e3e32..3049811be8 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -37,6 +37,7 @@
#include "block/blockjob.h"
#include "block/dirty-bitmap.h"
#include "block/qdict.h"
+#include "block/blockjob_int.h"
#include "block/throttle-groups.h"
#include "monitor/monitor.h"
#include "qemu/error-report.h"
diff --git a/hmp-commands-info.hx b/hmp-commands-info.hx
index 10fdd822e0..15937793c1 100644
--- a/hmp-commands-info.hx
+++ b/hmp-commands-info.hx
@@ -471,6 +471,20 @@ SRST
Show the current VM UUID.
ERST
+
+ {
+ .name = "backup",
+ .args_type = "",
+ .params = "",
+ .help = "show backup status",
+ .cmd = hmp_info_backup,
+ },
+
+SRST
+ ``info backup``
+ Show backup status.
+ERST
+
#if defined(CONFIG_SLIRP)
{
.name = "usernet",
diff --git a/hmp-commands.hx b/hmp-commands.hx
index 893c3bd240..5c1ffbc602 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -101,6 +101,35 @@ ERST
SRST
``block_stream``
Copy data from a backing file into a block device.
+ERST
+
+ {
+ .name = "backup",
+ .args_type = "backupfile:s,speed:o?,devlist:s?",
+ .params = "backupfile [speed [devlist]]",
+ .help = "create a VM backup (VMA format).",
+ .cmd = hmp_backup,
+ .coroutine = true,
+ },
+
+SRST
+``backup``
+ Create a VM backup.
+ERST
+
+ {
+ .name = "backup_cancel",
+ .args_type = "",
+ .params = "",
+ .help = "cancel the current VM backup",
+ .cmd = hmp_backup_cancel,
+ .coroutine = true,
+ },
+
+SRST
+``backup_cancel``
+ Cancel the current VM backup.
+
ERST
{
diff --git a/include/monitor/hmp.h b/include/monitor/hmp.h
index 7a7def7530..cba7afe70c 100644
--- a/include/monitor/hmp.h
+++ b/include/monitor/hmp.h
@@ -32,6 +32,7 @@ void hmp_info_savevm(Monitor *mon, const QDict *qdict);
void hmp_info_migrate(Monitor *mon, const QDict *qdict);
void hmp_info_migrate_capabilities(Monitor *mon, const QDict *qdict);
void hmp_info_migrate_parameters(Monitor *mon, const QDict *qdict);
+void hmp_info_backup(Monitor *mon, const QDict *qdict);
void hmp_info_cpus(Monitor *mon, const QDict *qdict);
void hmp_info_vnc(Monitor *mon, const QDict *qdict);
void hmp_info_spice(Monitor *mon, const QDict *qdict);
@@ -84,6 +85,8 @@ void hmp_change_vnc(Monitor *mon, const char *device, const char *target,
void hmp_change_medium(Monitor *mon, const char *device, const char *target,
const char *arg, const char *read_only, bool force,
Error **errp);
+void hmp_backup(Monitor *mon, const QDict *qdict);
+void hmp_backup_cancel(Monitor *mon, const QDict *qdict);
void hmp_migrate(Monitor *mon, const QDict *qdict);
void hmp_device_add(Monitor *mon, const QDict *qdict);
void hmp_device_del(Monitor *mon, const QDict *qdict);
diff --git a/meson.build b/meson.build
index 8cb1ccd5e1..955f579308 100644
--- a/meson.build
+++ b/meson.build
@@ -1803,6 +1803,7 @@ endif
has_gettid = cc.has_function('gettid')
libuuid = cc.find_library('uuid', required: true)
+libproxmox_backup_qemu = cc.find_library('proxmox_backup_qemu', required: true)
# libselinux
selinux = dependency('libselinux',
diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c
index ef4634e5c1..6e25279f42 100644
--- a/monitor/hmp-cmds.c
+++ b/monitor/hmp-cmds.c
@@ -21,6 +21,7 @@
#include "qemu/help_option.h"
#include "monitor/monitor-internal.h"
#include "qapi/error.h"
+#include "qapi/qapi-commands-block-core.h"
#include "qapi/qapi-commands-control.h"
#include "qapi/qapi-commands-migration.h"
#include "qapi/qapi-commands-misc.h"
@@ -144,6 +145,77 @@ void hmp_sync_profile(Monitor *mon, const QDict *qdict)
}
}
+void hmp_info_backup(Monitor *mon, const QDict *qdict)
+{
+ BackupStatus *info;
+ PBSBitmapInfoList *bitmap_info;
+
+ info = qmp_query_backup(NULL);
+
+ if (!info) {
+ monitor_printf(mon, "Backup status: not initialized\n");
+ return;
+ }
+
+ if (info->status) {
+ if (info->errmsg) {
+ monitor_printf(mon, "Backup status: %s - %s\n",
+ info->status, info->errmsg);
+ } else {
+ monitor_printf(mon, "Backup status: %s\n", info->status);
+ }
+ }
+
+ if (info->backup_file) {
+ monitor_printf(mon, "Start time: %s", ctime(&info->start_time));
+ if (info->end_time) {
+ monitor_printf(mon, "End time: %s", ctime(&info->end_time));
+ }
+
+ monitor_printf(mon, "Backup file: %s\n", info->backup_file);
+ monitor_printf(mon, "Backup uuid: %s\n", info->uuid);
+
+ if (!(info->has_total && info->total)) {
+ // this should not happen normally
+ monitor_printf(mon, "Total size: %d\n", 0);
+ } else {
+ size_t total_or_dirty = info->total;
+ bitmap_info = qmp_query_pbs_bitmap_info(NULL);
+
+ while (bitmap_info) {
+ monitor_printf(mon, "Drive %s:\n",
+ bitmap_info->value->drive);
+ monitor_printf(mon, " bitmap action: %s\n",
+ PBSBitmapAction_str(bitmap_info->value->action));
+ monitor_printf(mon, " size: %zd\n",
+ bitmap_info->value->size);
+ monitor_printf(mon, " dirty: %zd\n",
+ bitmap_info->value->dirty);
+ bitmap_info = bitmap_info->next;
+ }
+
+ qapi_free_PBSBitmapInfoList(bitmap_info);
+
+ int zero_per = (info->has_zero_bytes && info->zero_bytes) ?
+ (info->zero_bytes * 100)/info->total : 0;
+ monitor_printf(mon, "Total size: %zd\n", info->total);
+ int trans_per = (info->transferred * 100)/total_or_dirty;
+ monitor_printf(mon, "Transferred bytes: %zd (%d%%)\n",
+ info->transferred, trans_per);
+ monitor_printf(mon, "Zero bytes: %zd (%d%%)\n",
+ info->zero_bytes, zero_per);
+
+ if (info->has_reused) {
+ int reused_per = (info->reused * 100)/total_or_dirty;
+ monitor_printf(mon, "Reused bytes: %zd (%d%%)\n",
+ info->reused, reused_per);
+ }
+ }
+ }
+
+ qapi_free_BackupStatus(info);
+}
+
void hmp_exit_preconfig(Monitor *mon, const QDict *qdict)
{
Error *err = NULL;
diff --git a/proxmox-backup-client.c b/proxmox-backup-client.c
new file mode 100644
index 0000000000..0923037dec
--- /dev/null
+++ b/proxmox-backup-client.c
@@ -0,0 +1,146 @@
+#include "proxmox-backup-client.h"
+#include "qemu/main-loop.h"
+#include "block/aio-wait.h"
+#include "qapi/error.h"
+
+/* Proxmox Backup Server client bindings using coroutines */
+
+// This is called from another thread, so we use aio_co_schedule()
+static void proxmox_backup_schedule_wake(void *data) {
+ CoCtxData *waker = (CoCtxData *)data;
+ aio_co_schedule(waker->ctx, waker->co);
+}
+
+int coroutine_fn
+proxmox_backup_co_connect(ProxmoxBackupHandle *pbs, Error **errp)
+{
+ Coroutine *co = qemu_coroutine_self();
+ AioContext *ctx = qemu_get_current_aio_context();
+ CoCtxData waker = { .co = co, .ctx = ctx };
+ char *pbs_err = NULL;
+ int pbs_res = -1;
+
+ proxmox_backup_connect_async(pbs, proxmox_backup_schedule_wake, &waker, &pbs_res, &pbs_err);
+ qemu_coroutine_yield();
+ if (pbs_res < 0) {
+ if (errp) error_setg(errp, "backup connect failed: %s", pbs_err ? pbs_err : "unknown error");
+ if (pbs_err) proxmox_backup_free_error(pbs_err);
+ }
+ return pbs_res;
+}
+
+int coroutine_fn
+proxmox_backup_co_add_config(
+ ProxmoxBackupHandle *pbs,
+ const char *name,
+ const uint8_t *data,
+ uint64_t size,
+ Error **errp)
+{
+ Coroutine *co = qemu_coroutine_self();
+ AioContext *ctx = qemu_get_current_aio_context();
+ CoCtxData waker = { .co = co, .ctx = ctx };
+ char *pbs_err = NULL;
+ int pbs_res = -1;
+
+ proxmox_backup_add_config_async(
+ pbs, name, data, size ,proxmox_backup_schedule_wake, &waker, &pbs_res, &pbs_err);
+ qemu_coroutine_yield();
+ if (pbs_res < 0) {
+ if (errp) error_setg(errp, "backup add_config %s failed: %s", name, pbs_err ? pbs_err : "unknown error");
+ if (pbs_err) proxmox_backup_free_error(pbs_err);
+ }
+ return pbs_res;
+}
+
+int coroutine_fn
+proxmox_backup_co_register_image(
+ ProxmoxBackupHandle *pbs,
+ const char *device_name,
+ uint64_t size,
+ bool incremental,
+ Error **errp)
+{
+ Coroutine *co = qemu_coroutine_self();
+ AioContext *ctx = qemu_get_current_aio_context();
+ CoCtxData waker = { .co = co, .ctx = ctx };
+ char *pbs_err = NULL;
+ int pbs_res = -1;
+
+ proxmox_backup_register_image_async(
+ pbs, device_name, size, incremental, proxmox_backup_schedule_wake, &waker, &pbs_res, &pbs_err);
+ qemu_coroutine_yield();
+ if (pbs_res < 0) {
+ if (errp) error_setg(errp, "backup register image failed: %s", pbs_err ? pbs_err : "unknown error");
+ if (pbs_err) proxmox_backup_free_error(pbs_err);
+ }
+ return pbs_res;
+}
+
+int coroutine_fn
+proxmox_backup_co_finish(
+ ProxmoxBackupHandle *pbs,
+ Error **errp)
+{
+ Coroutine *co = qemu_coroutine_self();
+ AioContext *ctx = qemu_get_current_aio_context();
+ CoCtxData waker = { .co = co, .ctx = ctx };
+ char *pbs_err = NULL;
+ int pbs_res = -1;
+
+ proxmox_backup_finish_async(
+ pbs, proxmox_backup_schedule_wake, &waker, &pbs_res, &pbs_err);
+ qemu_coroutine_yield();
+ if (pbs_res < 0) {
+ if (errp) error_setg(errp, "backup finish failed: %s", pbs_err ? pbs_err : "unknown error");
+ if (pbs_err) proxmox_backup_free_error(pbs_err);
+ }
+ return pbs_res;
+}
+
+int coroutine_fn
+proxmox_backup_co_close_image(
+ ProxmoxBackupHandle *pbs,
+ uint8_t dev_id,
+ Error **errp)
+{
+ Coroutine *co = qemu_coroutine_self();
+ AioContext *ctx = qemu_get_current_aio_context();
+ CoCtxData waker = { .co = co, .ctx = ctx };
+ char *pbs_err = NULL;
+ int pbs_res = -1;
+
+ proxmox_backup_close_image_async(
+ pbs, dev_id, proxmox_backup_schedule_wake, &waker, &pbs_res, &pbs_err);
+ qemu_coroutine_yield();
+ if (pbs_res < 0) {
+ if (errp) error_setg(errp, "backup close image failed: %s", pbs_err ? pbs_err : "unknown error");
+ if (pbs_err) proxmox_backup_free_error(pbs_err);
+ }
+ return pbs_res;
+}
+
+int coroutine_fn
+proxmox_backup_co_write_data(
+ ProxmoxBackupHandle *pbs,
+ uint8_t dev_id,
+ const uint8_t *data,
+ uint64_t offset,
+ uint64_t size,
+ Error **errp)
+{
+ Coroutine *co = qemu_coroutine_self();
+ AioContext *ctx = qemu_get_current_aio_context();
+ CoCtxData waker = { .co = co, .ctx = ctx };
+ char *pbs_err = NULL;
+ int pbs_res = -1;
+
+ proxmox_backup_write_data_async(
+ pbs, dev_id, data, offset, size, proxmox_backup_schedule_wake, &waker, &pbs_res, &pbs_err);
+ qemu_coroutine_yield();
+ if (pbs_res < 0) {
+ if (errp) error_setg(errp, "backup write data failed: %s", pbs_err ? pbs_err : "unknown error");
+ if (pbs_err) proxmox_backup_free_error(pbs_err);
+ }
+ return pbs_res;
+}
diff --git a/proxmox-backup-client.h b/proxmox-backup-client.h
new file mode 100644
index 0000000000..8cbf645b2c
--- /dev/null
+++ b/proxmox-backup-client.h
@@ -0,0 +1,60 @@
+#ifndef PROXMOX_BACKUP_CLIENT_H
+#define PROXMOX_BACKUP_CLIENT_H
+
+#include "qemu/osdep.h"
+#include "qemu/coroutine.h"
+#include "proxmox-backup-qemu.h"
+
+typedef struct CoCtxData {
+ Coroutine *co;
+ AioContext *ctx;
+ void *data;
+} CoCtxData;
+
+// FIXME: Remove once coroutines are supported for QMP
+void block_on_coroutine_fn(CoroutineEntry *entry, void *entry_arg);
+
+int coroutine_fn
+proxmox_backup_co_connect(
+ ProxmoxBackupHandle *pbs,
+ Error **errp);
+
+int coroutine_fn
+proxmox_backup_co_add_config(
+ ProxmoxBackupHandle *pbs,
+ const char *name,
+ const uint8_t *data,
+ uint64_t size,
+ Error **errp);
+
+int coroutine_fn
+proxmox_backup_co_register_image(
+ ProxmoxBackupHandle *pbs,
+ const char *device_name,
+ uint64_t size,
+ bool incremental,
+ Error **errp);
+
+
+int coroutine_fn
+proxmox_backup_co_finish(
+ ProxmoxBackupHandle *pbs,
+ Error **errp);
+
+int coroutine_fn
+proxmox_backup_co_close_image(
+ ProxmoxBackupHandle *pbs,
+ uint8_t dev_id,
+ Error **errp);
+
+int coroutine_fn
+proxmox_backup_co_write_data(
+ ProxmoxBackupHandle *pbs,
+ uint8_t dev_id,
+ const uint8_t *data,
+ uint64_t offset,
+ uint64_t size,
+ Error **errp);
+
+
+#endif /* PROXMOX_BACKUP_CLIENT_H */
diff --git a/pve-backup.c b/pve-backup.c
new file mode 100644
index 0000000000..903afcd7e9
--- /dev/null
+++ b/pve-backup.c
@@ -0,0 +1,1103 @@
+#include "proxmox-backup-client.h"
+#include "vma.h"
+
+#include "qemu/osdep.h"
+#include "qemu/module.h"
+#include "sysemu/block-backend.h"
+#include "sysemu/blockdev.h"
+#include "block/block_int-global-state.h"
+#include "block/blockjob.h"
+#include "block/dirty-bitmap.h"
+#include "block/graph-lock.h"
+#include "qapi/qapi-commands-block.h"
+#include "qapi/qmp/qerror.h"
+#include "qemu/cutils.h"
+
+#if defined(CONFIG_MALLOC_TRIM)
+#include <malloc.h>
+#endif
+
+#include <proxmox-backup-qemu.h>
+
+/* PVE backup state and related function */
+
+/*
+ * Note: A resume from a qemu_coroutine_yield can happen in a different thread,
+ * so you may not use normal mutexes within coroutines:
+ *
+ * ---bad-example---
+ * qemu_rec_mutex_lock(lock)
+ * ...
+ * qemu_coroutine_yield() // wait for something
+ * // we are now inside a different thread
+ * qemu_rec_mutex_unlock(lock) // Crash - wrong thread!!
+ * ---end-bad-example--
+ *
+ * ==> Always use CoMutex inside coroutines.
+ * ==> Never acquire/release AioContext within coroutines (because that uses QemuRecMutex)
+ *
+ */
+
+const char *PBS_BITMAP_NAME = "pbs-incremental-dirty-bitmap";
+
+static struct PVEBackupState {
+ struct {
+ // Everything accessed from the qmp_query_backup command is protected using
+ // this lock. Do NOT hold this lock for long times, as it is sometimes
+ // acquired from coroutines, and thus any wait time may block the guest.
+ QemuMutex lock;
+ Error *error;
+ time_t start_time;
+ time_t end_time;
+ char *backup_file;
+ uuid_t uuid;
+ char uuid_str[37];
+ size_t total;
+ size_t dirty;
+ size_t transferred;
+ size_t reused;
+ size_t zero_bytes;
+ GList *bitmap_list;
+ bool finishing;
+ bool starting;
+ } stat;
+ int64_t speed;
+ BackupPerf perf;
+ VmaWriter *vmaw;
+ ProxmoxBackupHandle *pbs;
+ GList *di_list;
+ JobTxn *txn;
+ CoMutex backup_mutex;
+ CoMutex dump_callback_mutex;
+} backup_state;
+
+static void pvebackup_init(void)
+{
+ qemu_mutex_init(&backup_state.stat.lock);
+ qemu_co_mutex_init(&backup_state.backup_mutex);
+ qemu_co_mutex_init(&backup_state.dump_callback_mutex);
+}
+
+// initialize PVEBackupState at startup
+opts_init(pvebackup_init);
+
+typedef struct PVEBackupDevInfo {
+ BlockDriverState *bs;
+ size_t size;
+ uint64_t block_size;
+ uint8_t dev_id;
+ int completed_ret; // INT_MAX if not completed
+ char targetfile[PATH_MAX];
+ BdrvDirtyBitmap *bitmap;
+ BlockDriverState *target;
+ BlockJob *job;
+} PVEBackupDevInfo;
+
+static void pvebackup_propagate_error(Error *err)
+{
+ qemu_mutex_lock(&backup_state.stat.lock);
+ error_propagate(&backup_state.stat.error, err);
+ qemu_mutex_unlock(&backup_state.stat.lock);
+}
+
+static bool pvebackup_error_or_canceled(void)
+{
+ qemu_mutex_lock(&backup_state.stat.lock);
+ bool error_or_canceled = !!backup_state.stat.error;
+ qemu_mutex_unlock(&backup_state.stat.lock);
+
+ return error_or_canceled;
+}
+
+static void pvebackup_add_transferred_bytes(size_t transferred, size_t zero_bytes, size_t reused)
+{
+ qemu_mutex_lock(&backup_state.stat.lock);
+ backup_state.stat.zero_bytes += zero_bytes;
+ backup_state.stat.transferred += transferred;
+ backup_state.stat.reused += reused;
+ qemu_mutex_unlock(&backup_state.stat.lock);
+}
+
+// This may get called from multiple coroutines in multiple io-threads
+// Note1: this may get called after job_cancel()
+static int coroutine_fn
+pvebackup_co_dump_pbs_cb(
+ void *opaque,
+ uint64_t start,
+ uint64_t bytes,
+ const void *pbuf)
+{
+ assert(qemu_in_coroutine());
+
+ const uint64_t size = bytes;
+ const unsigned char *buf = pbuf;
+ PVEBackupDevInfo *di = opaque;
+
+ assert(backup_state.pbs);
+ assert(buf);
+
+ Error *local_err = NULL;
+ int pbs_res = -1;
+
+ bool is_zero_block = size == di->block_size && buffer_is_zero(buf, size);
+
+ qemu_co_mutex_lock(&backup_state.dump_callback_mutex);
+
+ // avoid deadlock if job is cancelled
+ if (pvebackup_error_or_canceled()) {
+ qemu_co_mutex_unlock(&backup_state.dump_callback_mutex);
+ return -1;
+ }
+
+ uint64_t transferred = 0;
+ uint64_t reused = 0;
+ while (transferred < size) {
+ uint64_t left = size - transferred;
+ uint64_t to_transfer = left < di->block_size ? left : di->block_size;
+
+ pbs_res = proxmox_backup_co_write_data(backup_state.pbs, di->dev_id,
+ is_zero_block ? NULL : buf + transferred, start + transferred,
+ to_transfer, &local_err);
+ transferred += to_transfer;
+
+ if (pbs_res < 0) {
+ pvebackup_propagate_error(local_err);
+ qemu_co_mutex_unlock(&backup_state.dump_callback_mutex);
+ return pbs_res;
+ }
+
+ reused += pbs_res == 0 ? to_transfer : 0;
+ }
+
+ qemu_co_mutex_unlock(&backup_state.dump_callback_mutex);
+ pvebackup_add_transferred_bytes(size, is_zero_block ? size : 0, reused);
+
+ return size;
+}
+
+// This may get called from multiple coroutines in multiple io-threads
+static int coroutine_fn
+pvebackup_co_dump_vma_cb(
+ void *opaque,
+ uint64_t start,
+ uint64_t bytes,
+ const void *pbuf)
+{
+ assert(qemu_in_coroutine());
+
+ const uint64_t size = bytes;
+ const unsigned char *buf = pbuf;
+ PVEBackupDevInfo *di = opaque;
+
+ int ret = -1;
+
+ assert(backup_state.vmaw);
+ assert(buf);
+
+ uint64_t remaining = size;
+
+ uint64_t cluster_num = start / VMA_CLUSTER_SIZE;
+ if ((cluster_num * VMA_CLUSTER_SIZE) != start) {
+ Error *local_err = NULL;
+ error_setg(&local_err,
+ "got unaligned write inside backup dump "
+ "callback (sector %ld)", start);
+ pvebackup_propagate_error(local_err);
+ return -1; // not aligned to cluster size
+ }
+
+ while (remaining > 0) {
+ qemu_co_mutex_lock(&backup_state.dump_callback_mutex);
+ // avoid deadlock if job is cancelled
+ if (pvebackup_error_or_canceled()) {
+ qemu_co_mutex_unlock(&backup_state.dump_callback_mutex);
+ return -1;
+ }
+
+ size_t zero_bytes = 0;
+ ret = vma_writer_write(backup_state.vmaw, di->dev_id, cluster_num, buf, &zero_bytes);
+ qemu_co_mutex_unlock(&backup_state.dump_callback_mutex);
+
+ ++cluster_num;
+ buf += VMA_CLUSTER_SIZE;
+ if (ret < 0) {
+ Error *local_err = NULL;
+ vma_writer_error_propagate(backup_state.vmaw, &local_err);
+ pvebackup_propagate_error(local_err);
+ return ret;
+ } else {
+ if (remaining >= VMA_CLUSTER_SIZE) {
+ assert(ret == VMA_CLUSTER_SIZE);
+ pvebackup_add_transferred_bytes(VMA_CLUSTER_SIZE, zero_bytes, 0);
+ remaining -= VMA_CLUSTER_SIZE;
+ } else {
+ assert(ret == remaining);
+ pvebackup_add_transferred_bytes(remaining, zero_bytes, 0);
+ remaining = 0;
+ }
+ }
+ }
+
+ return size;
+}
+
+// assumes the caller holds backup_mutex
+static void coroutine_fn pvebackup_co_cleanup(void)
+{
+ assert(qemu_in_coroutine());
+
+ qemu_mutex_lock(&backup_state.stat.lock);
+ backup_state.stat.finishing = true;
+ qemu_mutex_unlock(&backup_state.stat.lock);
+
+ if (backup_state.vmaw) {
+ Error *local_err = NULL;
+ vma_writer_close(backup_state.vmaw, &local_err);
+
+ if (local_err != NULL) {
+ pvebackup_propagate_error(local_err);
+ }
+
+ backup_state.vmaw = NULL;
+ }
+
+ if (backup_state.pbs) {
+ if (!pvebackup_error_or_canceled()) {
+ Error *local_err = NULL;
+ proxmox_backup_co_finish(backup_state.pbs, &local_err);
+ if (local_err != NULL) {
+ pvebackup_propagate_error(local_err);
+ }
+ }
+
+ proxmox_backup_disconnect(backup_state.pbs);
+ backup_state.pbs = NULL;
+ }
+
+ g_list_free(backup_state.di_list);
+ backup_state.di_list = NULL;
+
+ qemu_mutex_lock(&backup_state.stat.lock);
+ backup_state.stat.end_time = time(NULL);
+ backup_state.stat.finishing = false;
+ qemu_mutex_unlock(&backup_state.stat.lock);
+
+#if defined(CONFIG_MALLOC_TRIM)
+ /*
+ * Try to reclaim memory for buffers (and, in case of PBS, Rust futures), etc.
+ * Won't happen by default if there is fragmentation.
+ */
+ malloc_trim(4 * 1024 * 1024);
+#endif
+}
+
+static void coroutine_fn pvebackup_co_complete_stream(void *opaque)
+{
+ PVEBackupDevInfo *di = opaque;
+ int ret = di->completed_ret;
+
+ qemu_mutex_lock(&backup_state.stat.lock);
+ bool starting = backup_state.stat.starting;
+ qemu_mutex_unlock(&backup_state.stat.lock);
+ if (starting) {
+ /* in 'starting' state, no tasks have been run yet, meaning we can (and
+ * must) skip all cleanup, as we don't know what has and hasn't been
+ * initialized yet. */
+ return;
+ }
+
+ qemu_co_mutex_lock(&backup_state.backup_mutex);
+
+ /*
+ * All jobs in the transaction will be canceled when one receives an error.
+ * The first error wins, so only set it for ECANCELED if it was the last
+ * job. This allows more interesting errors from other jobs to win.
+ */
+ if (ret < 0 && (ret != -ECANCELED || !g_list_nth(backup_state.di_list, 1))) {
+ Error *local_err = NULL;
+ error_setg(&local_err, "job failed with err %d - %s", ret, strerror(-ret));
+ pvebackup_propagate_error(local_err);
+ }
+
+ di->bs = NULL;
+
+ assert(di->target == NULL);
+
+ bool error_or_canceled = pvebackup_error_or_canceled();
+
+ if (backup_state.vmaw) {
+ vma_writer_close_stream(backup_state.vmaw, di->dev_id);
+ }
+
+ if (backup_state.pbs && !error_or_canceled) {
+ Error *local_err = NULL;
+ proxmox_backup_co_close_image(backup_state.pbs, di->dev_id, &local_err);
+ if (local_err != NULL) {
+ pvebackup_propagate_error(local_err);
+ }
+ }
+
+ // remove self from job list
+ backup_state.di_list = g_list_remove(backup_state.di_list, di);
+
+ g_free(di);
+
+ /* call cleanup if we're the last job */
+ if (!g_list_first(backup_state.di_list)) {
+ pvebackup_co_cleanup();
+ }
+
+ qemu_co_mutex_unlock(&backup_state.backup_mutex);
+}
+
+static void pvebackup_complete_cb(void *opaque, int ret)
+{
+ PVEBackupDevInfo *di = opaque;
+ di->completed_ret = ret;
+
+ /*
+ * Needs to happen outside of coroutine, because it takes the graph write lock.
+ */
+ if (di->job) {
+ WITH_JOB_LOCK_GUARD() {
+ job_unref_locked(&di->job->job);
+ di->job = NULL;
+ }
+ }
+
+ /*
+ * Schedule stream cleanup in async coroutine. close_image and finish might
+ * take a while, so we can't block on them here. This way it also doesn't
+ * matter if we're already running in a coroutine or not.
+ * Note: di is a pointer to an entry in the global backup_state struct, so
+ * it stays valid.
+ */
+ Coroutine *co = qemu_coroutine_create(pvebackup_co_complete_stream, di);
+ aio_co_enter(qemu_get_aio_context(), co);
+}
+
+/*
+ * job_cancel(_sync) does not like to be called from coroutines, so defer to
+ * main loop processing via a bottom half. Assumes that caller holds
+ * backup_mutex.
+ */
+static void job_cancel_bh(void *opaque) {
+ CoCtxData *data = (CoCtxData*)opaque;
+
+ /*
+ * Be careful to pick a valid job to cancel:
+ * 1. job_cancel_sync() does not expect the job to be finalized already.
+ * 2. job_exit() might run between scheduling and running job_cancel_bh()
+ * and pvebackup_co_complete_stream() might not have removed the job from
+ * the list yet (in fact, cannot, because it waits for the backup_mutex).
+ * Requiring !job_is_completed() ensures that no finalized job is picked.
+ */
+ GList *bdi = g_list_first(backup_state.di_list);
+ while (bdi) {
+ if (bdi->data) {
+ BlockJob *bj = ((PVEBackupDevInfo *)bdi->data)->job;
+ if (bj) {
+ Job *job = &bj->job;
+ WITH_JOB_LOCK_GUARD() {
+ if (!job_is_completed_locked(job)) {
+ job_cancel_sync_locked(job, true);
+ /*
+ * It's enough to cancel one job in the transaction, the
+ * rest will follow automatically.
+ */
+ break;
+ }
+ }
+ }
+ }
+ bdi = g_list_next(bdi);
+ }
+
+ aio_co_enter(data->ctx, data->co);
+}
+
+void coroutine_fn qmp_backup_cancel(Error **errp)
+{
+ Error *cancel_err = NULL;
+ error_setg(&cancel_err, "backup canceled");
+ pvebackup_propagate_error(cancel_err);
+
+ qemu_co_mutex_lock(&backup_state.backup_mutex);
+
+ if (backup_state.vmaw) {
+ /* make sure vma writer does not block anymore */
+ vma_writer_set_error(backup_state.vmaw, "backup canceled");
+ }
+
+ if (backup_state.pbs) {
+ proxmox_backup_abort(backup_state.pbs, "backup canceled");
+ }
+
+ CoCtxData data = {
+ .ctx = qemu_get_current_aio_context(),
+ .co = qemu_coroutine_self(),
+ };
+ aio_bh_schedule_oneshot(data.ctx, job_cancel_bh, &data);
+ qemu_coroutine_yield();
+
+ qemu_co_mutex_unlock(&backup_state.backup_mutex);
+}
+
+// assumes the caller holds backup_mutex
+static int coroutine_fn pvebackup_co_add_config(
+ const char *file,
+ const char *name,
+ BackupFormat format,
+ VmaWriter *vmaw,
+ ProxmoxBackupHandle *pbs,
+ Error **errp)
+{
+ int res = 0;
+
+ char *cdata = NULL;
+ gsize clen = 0;
+ GError *err = NULL;
+ if (!g_file_get_contents(file, &cdata, &clen, &err)) {
+ error_setg(errp, "unable to read file '%s'", file);
+ return 1;
+ }
+
+ char *basename = g_path_get_basename(file);
+ if (name == NULL) name = basename;
+
+ if (format == BACKUP_FORMAT_VMA) {
+ if (vma_writer_add_config(vmaw, name, cdata, clen) != 0) {
+ error_setg(errp, "unable to add %s config data to vma archive", file);
+ goto err;
+ }
+ } else if (format == BACKUP_FORMAT_PBS) {
+ if (proxmox_backup_co_add_config(pbs, name, (unsigned char *)cdata, clen, errp) < 0)
+ goto err;
+ }
+
+ out:
+ g_free(basename);
+ g_free(cdata);
+ return res;
+
+ err:
+ res = -1;
+ goto out;
+}
+
+/*
+ * backup_job_create can *not* be run from a coroutine (and requires an
+ * acquired AioContext), so this can't either.
+ * The caller must nonetheless ensure that backup_mutex is held.
+ */
+static void create_backup_jobs_bh(void *opaque) {
+
+ assert(!qemu_in_coroutine());
+
+ CoCtxData *data = (CoCtxData*)opaque;
+ Error **errp = (Error**)data->data;
+
+ Error *local_err = NULL;
+
+ /* create job transaction to synchronize bitmap commit and cancel all
+ * jobs in case one errors */
+ if (backup_state.txn) {
+ job_txn_unref(backup_state.txn);
+ }
+ backup_state.txn = job_txn_new_seq();
+
+ /* create and start all jobs (paused state) */
+ GList *l = backup_state.di_list;
+ while (l) {
+ PVEBackupDevInfo *di = (PVEBackupDevInfo *)l->data;
+ l = g_list_next(l);
+
+ assert(di->target != NULL);
+
+ MirrorSyncMode sync_mode = MIRROR_SYNC_MODE_FULL;
+ BitmapSyncMode bitmap_mode = BITMAP_SYNC_MODE_NEVER;
+ if (di->bitmap) {
+ sync_mode = MIRROR_SYNC_MODE_BITMAP;
+ bitmap_mode = BITMAP_SYNC_MODE_ON_SUCCESS;
+ }
+ AioContext *aio_context = bdrv_get_aio_context(di->bs);
+ aio_context_acquire(aio_context);
+
+ bdrv_drained_begin(di->bs);
+
+ BlockJob *job = backup_job_create(
+ NULL, di->bs, di->target, backup_state.speed, sync_mode, di->bitmap,
+ bitmap_mode, false, NULL, &backup_state.perf, BLOCKDEV_ON_ERROR_REPORT,
+ BLOCKDEV_ON_ERROR_REPORT, JOB_DEFAULT, pvebackup_complete_cb, di, backup_state.txn,
+ &local_err);
+
+ bdrv_drained_end(di->bs);
+
+ aio_context_release(aio_context);
+
+ di->job = job;
+ if (job) {
+ WITH_JOB_LOCK_GUARD() {
+ job_ref_locked(&job->job);
+ }
+ }
+
+ if (!job || local_err) {
+ error_setg(errp, "backup_job_create failed: %s",
+ local_err ? error_get_pretty(local_err) : "null");
+ break;
+ }
+
+ bdrv_unref(di->target);
+ di->target = NULL;
+ }
+
+ if (*errp) {
+ /*
+ * It's enough to cancel one job in the transaction, the rest will
+ * follow automatically.
+ */
+ bool canceled = false;
+ l = backup_state.di_list;
+ while (l) {
+ PVEBackupDevInfo *di = (PVEBackupDevInfo *)l->data;
+ l = g_list_next(l);
+
+ if (di->target) {
+ bdrv_unref(di->target);
+ di->target = NULL;
+ }
+
+ if (di->job) {
+ WITH_JOB_LOCK_GUARD() {
+ if (!canceled) {
+ job_cancel_sync_locked(&di->job->job, true);
+ canceled = true;
+ }
+ job_unref_locked(&di->job->job);
+ di->job = NULL;
+ }
+ }
+ }
+ }
+
+ /* return */
+ aio_co_enter(data->ctx, data->co);
+}
+
+/*
+ * Returns a list of device infos, which needs to be freed by the caller. In
+ * case of an error, errp will be set, but the returned value might still be a
+ * list.
+ */
+static GList coroutine_fn GRAPH_RDLOCK *get_device_info(
+ const char *devlist,
+ Error **errp)
+{
+ gchar **devs = NULL;
+ GList *di_list = NULL;
+
+ if (devlist) {
+ devs = g_strsplit_set(devlist, ",;:", -1);
+
+ gchar **d = devs;
+ while (d && *d) {
+ BlockBackend *blk = blk_by_name(*d);
+ if (!blk) {
+ error_set(errp, ERROR_CLASS_DEVICE_NOT_FOUND,
+ "Device '%s' not found", *d);
+ goto err;
+ }
+ BlockDriverState *bs = blk_bs(blk);
+ if (!bdrv_co_is_inserted(bs)) {
+ error_setg(errp, QERR_DEVICE_HAS_NO_MEDIUM, *d);
+ goto err;
+ }
+ PVEBackupDevInfo *di = g_new0(PVEBackupDevInfo, 1);
+ di->bs = bs;
+ di_list = g_list_append(di_list, di);
+ d++;
+ }
+ } else {
+ BdrvNextIterator it;
+
+ for (BlockDriverState *bs = bdrv_first(&it); bs; bs = bdrv_next(&it)) {
+ if (!bdrv_co_is_inserted(bs) || bdrv_is_read_only(bs)) {
+ continue;
+ }
+
+ PVEBackupDevInfo *di = g_new0(PVEBackupDevInfo, 1);
+ di->bs = bs;
+ di_list = g_list_append(di_list, di);
+ }
+ }
+
+ if (!di_list) {
+ error_set(errp, ERROR_CLASS_GENERIC_ERROR, "empty device list");
+ goto err;
+ }
+
+err:
+ if (devs) {
+ g_strfreev(devs);
+ }
+
+ return di_list;
+}
+
+UuidInfo coroutine_fn *qmp_backup(
+ const char *backup_file,
+ const char *password,
+ const char *keyfile,
+ const char *key_password,
+ const char *master_keyfile,
+ const char *fingerprint,
+ const char *backup_ns,
+ const char *backup_id,
+ bool has_backup_time, int64_t backup_time,
+ bool has_use_dirty_bitmap, bool use_dirty_bitmap,
+ bool has_compress, bool compress,
+ bool has_encrypt, bool encrypt,
+ bool has_format, BackupFormat format,
+ const char *config_file,
+ const char *firewall_file,
+ const char *devlist,
+ bool has_speed, int64_t speed,
+ bool has_max_workers, int64_t max_workers,
+ Error **errp)
+{
+ assert(qemu_in_coroutine());
+
+ qemu_co_mutex_lock(&backup_state.backup_mutex);
+
+ Error *local_err = NULL;
+ uuid_t uuid;
+ VmaWriter *vmaw = NULL;
+ ProxmoxBackupHandle *pbs = NULL;
+ GList *di_list = NULL;
+ GList *l;
+ UuidInfo *uuid_info;
+
+ const char *config_name = "qemu-server.conf";
+ const char *firewall_name = "qemu-server.fw";
+
+ if (backup_state.di_list) {
+ error_set(errp, ERROR_CLASS_GENERIC_ERROR,
+ "previous backup not finished");
+ qemu_co_mutex_unlock(&backup_state.backup_mutex);
+ return NULL;
+ }
+
+ /* Todo: try to auto-detect format based on file name */
+ format = has_format ? format : BACKUP_FORMAT_VMA;
+
+ bdrv_graph_co_rdlock();
+ di_list = get_device_info(devlist, &local_err);
+ bdrv_graph_co_rdunlock();
+ if (local_err) {
+ error_propagate(errp, local_err);
+ goto err;
+ }
+ assert(di_list);
+
+ size_t total = 0;
+
+ l = di_list;
+ while (l) {
+ PVEBackupDevInfo *di = (PVEBackupDevInfo *)l->data;
+ l = g_list_next(l);
+
+ bdrv_graph_co_rdlock();
+ bool blocked = bdrv_op_is_blocked(di->bs, BLOCK_OP_TYPE_BACKUP_SOURCE, errp);
+ bdrv_graph_co_rdunlock();
+ if (blocked) {
+ goto err;
+ }
+
+ ssize_t size = bdrv_getlength(di->bs);
+ if (size < 0) {
+ error_setg_errno(errp, -size, "bdrv_getlength failed");
+ goto err;
+ }
+ di->size = size;
+ total += size;
+
+ di->completed_ret = INT_MAX;
+ }
+
+ uuid_generate(uuid);
+
+ qemu_mutex_lock(&backup_state.stat.lock);
+ backup_state.stat.reused = 0;
+
+ /* clear previous backup's bitmap_list */
+ if (backup_state.stat.bitmap_list) {
+ GList *bl = backup_state.stat.bitmap_list;
+ while (bl) {
+ g_free(((PBSBitmapInfo *)bl->data)->drive);
+ g_free(bl->data);
+ bl = g_list_next(bl);
+ }
+ g_list_free(backup_state.stat.bitmap_list);
+ backup_state.stat.bitmap_list = NULL;
+ }
+
+ if (format == BACKUP_FORMAT_PBS) {
+ if (!password) {
+ error_set(errp, ERROR_CLASS_GENERIC_ERROR, "missing parameter 'password'");
+ goto err_mutex;
+ }
+ if (!backup_id) {
+ error_set(errp, ERROR_CLASS_GENERIC_ERROR, "missing parameter 'backup-id'");
+ goto err_mutex;
+ }
+ if (!has_backup_time) {
+ error_set(errp, ERROR_CLASS_GENERIC_ERROR, "missing parameter 'backup-time'");
+ goto err_mutex;
+ }
+
+ int dump_cb_block_size = PROXMOX_BACKUP_DEFAULT_CHUNK_SIZE; // Hardcoded (4M)
+ firewall_name = "fw.conf";
+
+ char *pbs_err = NULL;
+ pbs = proxmox_backup_new_ns(
+ backup_file,
+ backup_ns,
+ backup_id,
+ backup_time,
+ dump_cb_block_size,
+ password,
+ keyfile,
+ key_password,
+ master_keyfile,
+ has_compress ? compress : true,
+ has_encrypt ? encrypt : !!keyfile,
+ fingerprint,
+ &pbs_err);
+
+ if (!pbs) {
+ error_set(errp, ERROR_CLASS_GENERIC_ERROR,
+ "proxmox_backup_new failed: %s", pbs_err);
+ proxmox_backup_free_error(pbs_err);
+ goto err_mutex;
+ }
+
+ int connect_result = proxmox_backup_co_connect(pbs, errp);
+ if (connect_result < 0)
+ goto err_mutex;
+
+ /* register all devices */
+ l = di_list;
+ while (l) {
+ PVEBackupDevInfo *di = (PVEBackupDevInfo *)l->data;
+ l = g_list_next(l);
+
+ di->block_size = dump_cb_block_size;
+
+ bdrv_graph_co_rdlock();
+ const char *devname = bdrv_get_device_name(di->bs);
+ bdrv_graph_co_rdunlock();
+ PBSBitmapAction action = PBS_BITMAP_ACTION_NOT_USED;
+ size_t dirty = di->size;
+
+ BdrvDirtyBitmap *bitmap = bdrv_find_dirty_bitmap(di->bs, PBS_BITMAP_NAME);
+ bool expect_only_dirty = false;
+
+ if (has_use_dirty_bitmap && use_dirty_bitmap) {
+ if (bitmap == NULL) {
+ bitmap = bdrv_create_dirty_bitmap(di->bs, dump_cb_block_size, PBS_BITMAP_NAME, errp);
+ if (!bitmap) {
+ goto err_mutex;
+ }
+ action = PBS_BITMAP_ACTION_NEW;
+ } else {
+ expect_only_dirty = proxmox_backup_check_incremental(pbs, devname, di->size) != 0;
+ }
+
+ if (expect_only_dirty) {
+ /* track clean chunks as reused */
+ dirty = MIN(bdrv_get_dirty_count(bitmap), di->size);
+ backup_state.stat.reused += di->size - dirty;
+ action = PBS_BITMAP_ACTION_USED;
+ } else {
+ /* mark entire bitmap as dirty to make full backup */
+ bdrv_set_dirty_bitmap(bitmap, 0, di->size);
+ if (action != PBS_BITMAP_ACTION_NEW) {
+ action = PBS_BITMAP_ACTION_INVALID;
+ }
+ }
+ di->bitmap = bitmap;
+ } else {
+ /* after a full backup the old dirty bitmap is invalid anyway */
+ if (bitmap != NULL) {
+ bdrv_release_dirty_bitmap(bitmap);
+ action = PBS_BITMAP_ACTION_NOT_USED_REMOVED;
+ }
+ }
+
+ int dev_id = proxmox_backup_co_register_image(pbs, devname, di->size, expect_only_dirty, errp);
+ if (dev_id < 0) {
+ goto err_mutex;
+ }
+
+ if (!(di->target = bdrv_co_backup_dump_create(dump_cb_block_size, di->size, pvebackup_co_dump_pbs_cb, di, errp))) {
+ goto err_mutex;
+ }
+
+ di->dev_id = dev_id;
+
+ PBSBitmapInfo *info = g_malloc(sizeof(*info));
+ info->drive = g_strdup(devname);
+ info->action = action;
+ info->size = di->size;
+ info->dirty = dirty;
+ backup_state.stat.bitmap_list = g_list_append(backup_state.stat.bitmap_list, info);
+ }
+ } else if (format == BACKUP_FORMAT_VMA) {
+ vmaw = vma_writer_create(backup_file, uuid, &local_err);
+ if (!vmaw) {
+ if (local_err) {
+ error_propagate(errp, local_err);
+ }
+ goto err_mutex;
+ }
+
+ /* register all devices for vma writer */
+ l = di_list;
+ while (l) {
+ PVEBackupDevInfo *di = (PVEBackupDevInfo *)l->data;
+ l = g_list_next(l);
+
+ if (!(di->target = bdrv_co_backup_dump_create(VMA_CLUSTER_SIZE, di->size, pvebackup_co_dump_vma_cb, di, errp))) {
+ goto err_mutex;
+ }
+
+ bdrv_graph_co_rdlock();
+ const char *devname = bdrv_get_device_name(di->bs);
+ bdrv_graph_co_rdunlock();
+ di->dev_id = vma_writer_register_stream(vmaw, devname, di->size);
+ if (di->dev_id <= 0) {
+ error_set(errp, ERROR_CLASS_GENERIC_ERROR,
+ "register_stream failed");
+ goto err_mutex;
+ }
+ }
+ } else {
+ error_set(errp, ERROR_CLASS_GENERIC_ERROR, "unknown backup format");
+ goto err_mutex;
+ }
+
+ /* add configuration file to archive */
+ if (config_file) {
+ if (pvebackup_co_add_config(config_file, config_name, format, vmaw, pbs, errp) != 0) {
+ goto err_mutex;
+ }
+ }
+
+ /* add firewall file to archive */
+ if (firewall_file) {
+ if (pvebackup_co_add_config(firewall_file, firewall_name, format, vmaw, pbs, errp) != 0) {
+ goto err_mutex;
+ }
+ }
+ /* initialize global backup_state now */
+ /* note: 'reused' and 'bitmap_list' are initialized earlier */
+
+ if (backup_state.stat.error) {
+ error_free(backup_state.stat.error);
+ backup_state.stat.error = NULL;
+ }
+
+ backup_state.stat.start_time = time(NULL);
+ backup_state.stat.end_time = 0;
+
+ if (backup_state.stat.backup_file) {
+ g_free(backup_state.stat.backup_file);
+ }
+ backup_state.stat.backup_file = g_strdup(backup_file);
+
+ uuid_copy(backup_state.stat.uuid, uuid);
+ uuid_unparse_lower(uuid, backup_state.stat.uuid_str);
+ char *uuid_str = g_strdup(backup_state.stat.uuid_str);
+
+ backup_state.stat.total = total;
+ backup_state.stat.dirty = total - backup_state.stat.reused;
+ backup_state.stat.transferred = 0;
+ backup_state.stat.zero_bytes = 0;
+ backup_state.stat.finishing = false;
+ backup_state.stat.starting = true;
+
+ qemu_mutex_unlock(&backup_state.stat.lock);
+
+ backup_state.speed = (has_speed && speed > 0) ? speed : 0;
+
+ backup_state.perf = (BackupPerf){ .max_workers = 16 };
+ if (has_max_workers) {
+ backup_state.perf.max_workers = max_workers;
+ }
+
+ backup_state.vmaw = vmaw;
+ backup_state.pbs = pbs;
+
+ backup_state.di_list = di_list;
+
+ uuid_info = g_malloc0(sizeof(*uuid_info));
+ uuid_info->UUID = uuid_str;
+
+ /* Run create_backup_jobs_bh outside of coroutine (in BH) but keep
+ * backup_mutex locked. This is fine, a CoMutex can be held across yield
+ * points, and we'll release it as soon as the BH reschedules us.
+ */
+ CoCtxData waker = {
+ .co = qemu_coroutine_self(),
+ .ctx = qemu_get_current_aio_context(),
+ .data = &local_err,
+ };
+ aio_bh_schedule_oneshot(waker.ctx, create_backup_jobs_bh, &waker);
+ qemu_coroutine_yield();
+
+ if (local_err) {
+ error_propagate(errp, local_err);
+ goto err;
+ }
+
+ qemu_co_mutex_unlock(&backup_state.backup_mutex);
+
+ qemu_mutex_lock(&backup_state.stat.lock);
+ backup_state.stat.starting = false;
+ qemu_mutex_unlock(&backup_state.stat.lock);
+
+ /* start the first job in the transaction */
+ job_txn_start_seq(backup_state.txn);
+
+ return uuid_info;
+
+err_mutex:
+ qemu_mutex_unlock(&backup_state.stat.lock);
+
+err:
+
+ l = di_list;
+ while (l) {
+ PVEBackupDevInfo *di = (PVEBackupDevInfo *)l->data;
+ l = g_list_next(l);
+
+ if (di->target) {
+ bdrv_co_unref(di->target);
+ }
+
+ if (di->targetfile[0]) {
+ unlink(di->targetfile);
+ }
+ g_free(di);
+ }
+ g_list_free(di_list);
+ backup_state.di_list = NULL;
+
+ if (vmaw) {
+ Error *err = NULL;
+ vma_writer_close(vmaw, &err);
+ unlink(backup_file);
+ }
+
+ if (pbs) {
+ proxmox_backup_disconnect(pbs);
+ backup_state.pbs = NULL;
+ }
+
+ qemu_co_mutex_unlock(&backup_state.backup_mutex);
+ return NULL;
+}
+
+BackupStatus *qmp_query_backup(Error **errp)
+{
+ BackupStatus *info = g_malloc0(sizeof(*info));
+
+ qemu_mutex_lock(&backup_state.stat.lock);
+
+ if (!backup_state.stat.start_time) {
+ /* not started, return {} */
+ qemu_mutex_unlock(&backup_state.stat.lock);
+ return info;
+ }
+
+ info->has_start_time = true;
+ info->start_time = backup_state.stat.start_time;
+
+ if (backup_state.stat.backup_file) {
+ info->backup_file = g_strdup(backup_state.stat.backup_file);
+ }
+
+ info->uuid = g_strdup(backup_state.stat.uuid_str);
+
+ if (backup_state.stat.end_time) {
+ if (backup_state.stat.error) {
+ info->status = g_strdup("error");
+ info->errmsg = g_strdup(error_get_pretty(backup_state.stat.error));
+ } else {
+ info->status = g_strdup("done");
+ }
+ info->has_end_time = true;
+ info->end_time = backup_state.stat.end_time;
+ } else {
+ info->status = g_strdup("active");
+ }
+
+ info->has_total = true;
+ info->total = backup_state.stat.total;
+ info->has_dirty = true;
+ info->dirty = backup_state.stat.dirty;
+ info->has_zero_bytes = true;
+ info->zero_bytes = backup_state.stat.zero_bytes;
+ info->has_transferred = true;
+ info->transferred = backup_state.stat.transferred;
+ info->has_reused = true;
+ info->reused = backup_state.stat.reused;
+ info->finishing = backup_state.stat.finishing;
+
+ qemu_mutex_unlock(&backup_state.stat.lock);
+
+ return info;
+}
+
+PBSBitmapInfoList *qmp_query_pbs_bitmap_info(Error **errp)
+{
+ PBSBitmapInfoList *head = NULL, **p_next = &head;
+
+ qemu_mutex_lock(&backup_state.stat.lock);
+
+ GList *l = backup_state.stat.bitmap_list;
+ while (l) {
+ PBSBitmapInfo *info = (PBSBitmapInfo *)l->data;
+ l = g_list_next(l);
+
+ /* clone bitmap info to avoid auto free after QMP marshalling */
+ PBSBitmapInfo *info_ret = g_malloc0(sizeof(*info_ret));
+ info_ret->drive = g_strdup(info->drive);
+ info_ret->action = info->action;
+ info_ret->size = info->size;
+ info_ret->dirty = info->dirty;
+
+ PBSBitmapInfoList *info_list = g_malloc0(sizeof(*info_list));
+ info_list->value = info_ret;
+
+ *p_next = info_list;
+ p_next = &info_list->next;
+ }
+
+ qemu_mutex_unlock(&backup_state.stat.lock);
+
+ return head;
+}
+
+ProxmoxSupportStatus *qmp_query_proxmox_support(Error **errp)
+{
+ ProxmoxSupportStatus *ret = g_malloc0(sizeof(*ret));
+ ret->pbs_library_version = g_strdup(proxmox_backup_qemu_version());
+ ret->pbs_dirty_bitmap = true;
+ ret->pbs_dirty_bitmap_savevm = true;
+ ret->query_bitmap_info = true;
+ ret->pbs_masterkey = true;
+ ret->backup_max_workers = true;
+ return ret;
+}
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 299e3fc350..c155d74230 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -841,6 +841,235 @@
{ 'command': 'query-block', 'returns': ['BlockInfo'],
'allow-preconfig': true }
+##
+# @BackupStatus:
+#
+# Detailed backup status.
+#
+# @status: string describing the current backup status.
+# This can be 'active', 'done', or 'error'. If this field is not
+# returned, no backup process has been initiated.
+#
+# @errmsg: error message (only returned if status is 'error')
+#
+# @total: total amount of bytes involved in the backup process
+#
+# @dirty: with incremental mode (PBS) this is the amount of bytes involved
+# in the backup process which are marked dirty.
+#
+# @transferred: amount of bytes already backed up.
+#
+# @reused: amount of bytes reused due to deduplication.
+#
+# @zero-bytes: amount of 'zero' bytes detected.
+#
+# @start-time: time (epoch) when backup job started.
+#
+# @end-time: time (epoch) when backup job finished.
+#
+# @backup-file: backup file name
+#
+# @uuid: uuid for this backup job
+#
+# @finishing: if status='active' and finishing=true, then the backup process is
+# waiting for the target to finish.
+#
+##
+{ 'struct': 'BackupStatus',
+ 'data': {'*status': 'str', '*errmsg': 'str', '*total': 'int', '*dirty': 'int',
+ '*transferred': 'int', '*zero-bytes': 'int', '*reused': 'int',
+ '*start-time': 'int', '*end-time': 'int',
+ '*backup-file': 'str', '*uuid': 'str', 'finishing': 'bool' } }
+
+##
+# @BackupFormat:
+#
+# An enumeration of supported backup formats.
+#
+# @vma: Proxmox vma backup format
+#
+# @pbs: Proxmox backup server format
+#
+##
+{ 'enum': 'BackupFormat',
+ 'data': [ 'vma', 'pbs' ] }
+
+##
+# @backup:
+#
+# Starts a VM backup.
+#
+# @backup-file: the backup file name
+#
+# @format: format of the backup file
+#
+# @config-file: a configuration file to include in
+# the backup archive.
+#
+# @speed: the maximum speed, in bytes per second
+#
+# @devlist: list of block device names (separated by ',', ';'
+# or ':'). By default the backup includes all writable block devices.
+#
+# @password: backup server password (required for format 'pbs')
+#
+# @keyfile: keyfile used for encryption (optional for format 'pbs')
+#
+# @key-password: password for keyfile (optional for format 'pbs')
+#
+# @master-keyfile: PEM-formatted master public keyfile (optional for format 'pbs')
+#
+# @fingerprint: server cert fingerprint (optional for format 'pbs')
+#
+# @backup-ns: backup namespace (optional for format 'pbs')
+#
+# @backup-id: backup ID (required for format 'pbs')
+#
+# @backup-time: backup timestamp (Unix epoch, required for format 'pbs')
+#
+# @use-dirty-bitmap: use dirty bitmap to detect incremental changes since last job (optional for format 'pbs')
+#
+# @compress: use compression (optional for format 'pbs', defaults to true)
+#
+# @encrypt: use encryption (optional for format 'pbs', defaults to true if there is a keyfile)
+#
+# @max-workers: see @BackupPerf for details. Default 16.
+#
+# Returns: the uuid of the backup job
+#
+##
+{ 'command': 'backup', 'data': { 'backup-file': 'str',
+ '*password': 'str',
+ '*keyfile': 'str',
+ '*key-password': 'str',
+ '*master-keyfile': 'str',
+ '*fingerprint': 'str',
+ '*backup-ns': 'str',
+ '*backup-id': 'str',
+ '*backup-time': 'int',
+ '*use-dirty-bitmap': 'bool',
+ '*compress': 'bool',
+ '*encrypt': 'bool',
+ '*format': 'BackupFormat',
+ '*config-file': 'str',
+ '*firewall-file': 'str',
+ '*devlist': 'str',
+ '*speed': 'int',
+ '*max-workers': 'int' },
+ 'returns': 'UuidInfo', 'coroutine': true }
+
+##
+# @query-backup:
+#
+# Returns information about current/last backup task.
+#
+# Returns: @BackupStatus
+#
+##
+{ 'command': 'query-backup', 'returns': 'BackupStatus' }
+
+##
+# @backup-cancel:
+#
+# Cancel the currently executing backup process.
+#
+# Returns: nothing on success
+#
+# Notes: This command succeeds even if there is no backup process running.
+#
+##
+{ 'command': 'backup-cancel', 'coroutine': true }
+
+##
+# @ProxmoxSupportStatus:
+#
+# Contains info about supported features added by Proxmox.
+#
+# @pbs-dirty-bitmap: True if dirty-bitmap-incremental backups to PBS are
+# supported.
+#
+# @query-bitmap-info: True if the 'query-pbs-bitmap-info' QMP call is supported.
+#
+# @pbs-dirty-bitmap-savevm: True if 'dirty-bitmaps' migration capability can
+# safely be set for savevm-async.
+#
+# @pbs-masterkey: True if the QMP backup call supports the 'master-keyfile'
+# parameter.
+#
+# @pbs-library-version: Running version of libproxmox-backup-qemu0 library.
+#
+##
+{ 'struct': 'ProxmoxSupportStatus',
+ 'data': { 'pbs-dirty-bitmap': 'bool',
+ 'query-bitmap-info': 'bool',
+ 'pbs-dirty-bitmap-savevm': 'bool',
+ 'pbs-masterkey': 'bool',
+ 'pbs-library-version': 'str',
+ 'backup-max-workers': 'bool' } }
+
+##
+# @query-proxmox-support:
+#
+# Returns information about supported features added by Proxmox.
+#
+# Returns: @ProxmoxSupportStatus
+#
+##
+{ 'command': 'query-proxmox-support', 'returns': 'ProxmoxSupportStatus' }
+
+##
+# @PBSBitmapAction:
+#
+# An action taken on a dirty-bitmap when a backup job was started.
+#
+# @not-used: Bitmap mode was not enabled.
+#
+# @not-used-removed: Bitmap mode was not enabled, but a bitmap from a
+# previous backup still existed and was removed.
+#
+# @new: A new bitmap was attached to the drive for this backup.
+#
+# @used: An existing bitmap will be used to back up only changed data.
+#
+# @invalid: A bitmap existed, but had to be cleared since its associated
+# base snapshot did not match the base given for the current job or
+# the crypt mode has changed.
+#
+##
+{ 'enum': 'PBSBitmapAction',
+ 'data': ['not-used', 'not-used-removed', 'new', 'used', 'invalid'] }
+
+##
+# @PBSBitmapInfo:
+#
+# Contains information about dirty bitmaps used for each drive in a PBS backup.
+#
+# @drive: The underlying drive.
+#
+# @action: The action that was taken when the backup started.
+#
+# @size: The total size of the drive.
+#
+# @dirty: How much of the drive is considered dirty and will be backed up,
+# or 'size' if everything will be.
+#
+##
+{ 'struct': 'PBSBitmapInfo',
+ 'data': { 'drive': 'str', 'action': 'PBSBitmapAction', 'size': 'int',
+ 'dirty': 'int' } }
+
+##
+# @query-pbs-bitmap-info:
+#
+# Returns information about dirty bitmaps used on the most recently started
+# backup. Returns nothing when the last backup was not using PBS or if no
+# backup occurred in this session.
+#
+# Returns: a list of @PBSBitmapInfo
+#
+##
+{ 'command': 'query-pbs-bitmap-info', 'returns': ['PBSBitmapInfo'] }
+
##
# @BlockDeviceTimedStats:
#
diff --git a/qapi/common.json b/qapi/common.json
index 6fed9cde1a..630a2a8f9a 100644
--- a/qapi/common.json
+++ b/qapi/common.json
@@ -207,3 +207,17 @@
##
{ 'struct': 'HumanReadableText',
'data': { 'human-readable-text': 'str' } }
+
+##
+# @UuidInfo:
+#
+# Guest UUID information (Universally Unique Identifier).
+#
+# @UUID: the UUID of the guest
+#
+# Since: 0.14.0
+#
+# Notes: If no UUID was specified for the guest, a null UUID is
+# returned.
+##
+{ 'struct': 'UuidInfo', 'data': {'UUID': 'str'} }
diff --git a/qapi/machine.json b/qapi/machine.json
index a9fd40d844..d97f024173 100644
--- a/qapi/machine.json
+++ b/qapi/machine.json
@@ -4,6 +4,8 @@
# This work is licensed under the terms of the GNU GPL, version 2 or later.
# See the COPYING file in the top-level directory.
+{ 'include': 'common.json' }
+
##
# = Machines
##
@@ -237,20 +239,6 @@
##
{ 'command': 'query-target', 'returns': 'TargetInfo' }
-##
-# @UuidInfo:
-#
-# Guest UUID information (Universally Unique Identifier).
-#
-# @UUID: the UUID of the guest
-#
-# Since: 0.14
-#
-# Notes: If no UUID was specified for the guest, a null UUID is
-# returned.
-##
-{ 'struct': 'UuidInfo', 'data': {'UUID': 'str'} }
-
##
# @query-uuid:
#