2021-09-01 19:10:12 +03:00
|
|
|
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
|
|
|
|
From: Stefan Reiter <s.reiter@proxmox.com>
|
|
|
|
Date: Mon, 23 Aug 2021 11:28:32 +0200
|
|
|
|
Subject: [PATCH] monitor/qmp: fix race with clients disconnecting early
|
|
|
|
|
|
|
|
The following sequence can produce a race condition that results in
|
|
|
|
responses meant for different clients being sent to the wrong one:
|
|
|
|
|
|
|
|
(QMP, no OOB)
|
|
|
|
1) client A connects
|
|
|
|
2) client A sends 'qmp_capabilities'
|
|
|
|
3) 'qmp_dispatch' runs in coroutine, schedules out to
|
|
|
|
'do_qmp_dispatch_bh' and yields
|
|
|
|
4) client A disconnects (i.e. aborts, crashes, etc...)
|
|
|
|
5) client B connects
|
|
|
|
6) 'do_qmp_dispatch_bh' runs 'qmp_capabilities' and wakes calling coroutine
|
|
|
|
7) capabilities are now set and 'mon->commands' is set to '&qmp_commands'
|
|
|
|
8) 'qmp_dispatch' returns to 'monitor_qmp_dispatch'
|
|
|
|
9) success message is sent to client B *without it ever having sent
|
|
|
|
'qmp_capabilities' itself*
|
|
|
|
9a) even if client B ignores it, it will now presumably send it's own
|
|
|
|
greeting, which will error because caps are already set
|
|
|
|
|
|
|
|
The fix proposed here uses an atomic, sequential connection number
|
|
|
|
stored in the MonitorQMP struct, which is incremented everytime a new
|
|
|
|
client connects. Since it is not changed on CHR_EVENT_CLOSED, the
|
|
|
|
behaviour of allowing a client to disconnect only one side of the
|
|
|
|
connection is retained.
|
|
|
|
|
|
|
|
The connection_nr needs to be exposed outside of the monitor subsystem,
|
|
|
|
since qmp_dispatch lives in qapi code. It needs to be checked twice,
|
|
|
|
once for actually running the command in the BH (fixes 7), and once for
|
|
|
|
sending back a response (fixes 9).
|
|
|
|
|
|
|
|
This satisfies my local reproducer - using multiple clients constantly
|
|
|
|
looping to open a connection, send the greeting, then exiting no longer
|
|
|
|
crashes other, normally behaving clients with unrelated responses.
|
|
|
|
|
|
|
|
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
|
2022-01-13 12:34:33 +03:00
|
|
|
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
|
2021-09-01 19:10:12 +03:00
|
|
|
---
|
|
|
|
include/monitor/monitor.h | 1 +
|
|
|
|
monitor/monitor-internal.h | 7 +++++++
|
|
|
|
monitor/monitor.c | 15 +++++++++++++++
|
|
|
|
monitor/qmp.c | 15 ++++++++++++++-
|
|
|
|
qapi/qmp-dispatch.c | 21 +++++++++++++++++----
|
|
|
|
stubs/monitor-core.c | 5 +++++
|
|
|
|
6 files changed, 59 insertions(+), 5 deletions(-)
|
|
|
|
|
|
|
|
diff --git a/include/monitor/monitor.h b/include/monitor/monitor.h
|
2023-10-17 15:10:09 +03:00
|
|
|
index 965f5d5450..e04bd059b6 100644
|
2021-09-01 19:10:12 +03:00
|
|
|
--- a/include/monitor/monitor.h
|
|
|
|
+++ b/include/monitor/monitor.h
|
|
|
|
@@ -16,6 +16,7 @@ extern QemuOptsList qemu_mon_opts;
|
|
|
|
Monitor *monitor_cur(void);
|
|
|
|
Monitor *monitor_set_cur(Coroutine *co, Monitor *mon);
|
|
|
|
bool monitor_cur_is_qmp(void);
|
|
|
|
+int monitor_get_connection_nr(const Monitor *mon);
|
|
|
|
|
|
|
|
void monitor_init_globals(void);
|
|
|
|
void monitor_init_globals_core(void);
|
|
|
|
diff --git a/monitor/monitor-internal.h b/monitor/monitor-internal.h
|
2023-10-17 15:10:09 +03:00
|
|
|
index 252de85681..8db28f9272 100644
|
2021-09-01 19:10:12 +03:00
|
|
|
--- a/monitor/monitor-internal.h
|
|
|
|
+++ b/monitor/monitor-internal.h
|
2023-10-17 15:10:09 +03:00
|
|
|
@@ -151,6 +151,13 @@ typedef struct {
|
2021-09-01 19:10:12 +03:00
|
|
|
QemuMutex qmp_queue_lock;
|
|
|
|
/* Input queue that holds all the parsed QMP requests */
|
|
|
|
GQueue *qmp_requests;
|
|
|
|
+
|
|
|
|
+ /*
|
|
|
|
+ * A sequential number that gets incremented on every new CHR_EVENT_OPENED.
|
|
|
|
+ * Used to avoid leftover responses in BHs from being sent to the wrong
|
|
|
|
+ * client. Access with atomics.
|
|
|
|
+ */
|
|
|
|
+ int connection_nr;
|
|
|
|
} MonitorQMP;
|
|
|
|
|
|
|
|
/**
|
|
|
|
diff --git a/monitor/monitor.c b/monitor/monitor.c
|
update submodule and patches to QEMU 8.2.2
This version includes both the AioContext lock and the block graph
lock, so there might be some deadlocks lurking. It's not possible to
disable the block graph lock like was done in QEMU 8.1, because there
are no changes like the function bdrv_schedule_unref() that require
it. QEMU 9.0 will finally get rid of the AioContext locking.
During live-restore with a VirtIO SCSI drive with iothread there is a
known racy deadlock related to the AioContext lock. Not new [1], but
not sure if more likely now. Should be fixed in QEMU 9.0.
The block graph lock comes with annotations that can be checked by
clang's TSA. This required changes to the block drivers, i.e.
alloc-track, pbs, zeroinit as well as taking the appropriate locks
in pve-backup, savevm-async, vma-reader.
Local variable shadowing is prohibited via a compiler flag now,
required slight adaptation in vma.c.
Major changes only affect alloc-track:
* It is not possible to call a generated co-wrapper like
bdrv_get_info() while holding the block graph lock exclusively [0],
which does happen during initialization of alloc-track when the
backing hd is set and the refresh_limits driver callback is invoked.
The bdrv_get_info() call to get the cluster size is moved to
directly after opening the file child in track_open().
The important thing is that at least the request alignment for the
write target is used, because then the RMW cycle in bdrv_pwritev
will gather enough data from the backing file. Partial cluster
allocations in the target are not a fundamental issue, because the
driver returns its allocation status based on the bitmap, so any
other data that maps to the same cluster will still be copied later
by a stream job (or during writes to that cluster).
* Replacing the node cannot be done in the
track_co_change_backing_file() callback, because it is a coroutine
and cannot hold the block graph lock exclusively. So it is moved to
the stream job itself with the auto-remove option not having an
effect anymore (qemu-server would always set it anyways).
In the future, there could either be a special option for the stream
job, or maybe the upcoming blockdev-replace QMP command can be used.
Replacing the backing child is actually already done in the stream
job, so no need to do it in the track_co_change_backing_file()
callback. It also cannot be called from a coroutine. Looking at the
implementation in the qcow2 driver, it doesn't seem to be intended
to change the backing child itself, just update driver-internal
state.
Other changes:
* alloc-track: Error out early when used without auto-remove. Since
replacing the node now happens in the stream job, where the option
cannot be read from (it's internal to the driver), it will always be
treated as 'on'. Makes sure to have users beside qemu-server notice
the change (should they even exist). The option can be fully dropped
in the future while adding a version guard in qemu-server.
* alloc-track: Avoid seemingly superfluous child permission update.
Doesn't seem necessary nowadays (maybe after commit "alloc-track:
fix deadlock during drop" where the dropping is not rescheduled and
delayed anymore or some upstream change). Replacing the block node
will already update the permissions of the new node (which was the
file child before). Should there really be some issue, instead of
having a drop state, this could also be just based off the fact
whether there is still a backing child.
Dumping the cumulative (shared) permissions for the BDS with a debug
print yields the same values after this patch and with QEMU 8.1,
namely 3 and 5.
* PBS block driver: compile unconditionally. Proxmox VE always needs
it and something in the build process changed to make it not enabled
by default. Probably would need to move the build option to meson
otherwise.
* backup: job unreferencing during cleanup needs to happen outside of
coroutine, so it was moved to before invoking the clean
* mirror: Cherry-pick stable fix to avoid potential deadlock.
* savevm-async: migrate_init now can fail, so propagate potential
error.
* savevm-async: compression counters are not accessible outside
migration/ram-compress now, so drop code that prophylactically set
it to zero.
[0]: https://lore.kernel.org/qemu-devel/220be383-3b0d-4938-b584-69ad214e5d5d@proxmox.com/
[1]: https://lore.kernel.org/qemu-devel/e13b488e-bf13-44f2-acca-e724d14f43fd@proxmox.com/
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2024-04-25 18:21:28 +03:00
|
|
|
index 01ede1babd..5681bca346 100644
|
2021-09-01 19:10:12 +03:00
|
|
|
--- a/monitor/monitor.c
|
|
|
|
+++ b/monitor/monitor.c
|
2023-10-17 15:10:09 +03:00
|
|
|
@@ -117,6 +117,21 @@ bool monitor_cur_is_qmp(void)
|
2021-09-01 19:10:12 +03:00
|
|
|
return cur_mon && monitor_is_qmp(cur_mon);
|
|
|
|
}
|
|
|
|
|
|
|
|
+/**
|
|
|
|
+ * If @mon is a QMP monitor, return the connection_nr, otherwise -1.
|
|
|
|
+ */
|
|
|
|
+int monitor_get_connection_nr(const Monitor *mon)
|
|
|
|
+{
|
|
|
|
+ MonitorQMP *qmp_mon;
|
|
|
|
+
|
|
|
|
+ if (!monitor_is_qmp(mon)) {
|
|
|
|
+ return -1;
|
|
|
|
+ }
|
|
|
|
+
|
|
|
|
+ qmp_mon = container_of(mon, MonitorQMP, common);
|
|
|
|
+ return qatomic_read(&qmp_mon->connection_nr);
|
|
|
|
+}
|
|
|
|
+
|
|
|
|
/**
|
|
|
|
* Is @mon is using readline?
|
|
|
|
* Note: not all HMP monitors use readline, e.g., gdbserver has a
|
|
|
|
diff --git a/monitor/qmp.c b/monitor/qmp.c
|
2024-01-30 17:14:37 +03:00
|
|
|
index a239945e8d..589c9524f8 100644
|
2021-09-01 19:10:12 +03:00
|
|
|
--- a/monitor/qmp.c
|
|
|
|
+++ b/monitor/qmp.c
|
2023-10-17 15:10:09 +03:00
|
|
|
@@ -165,6 +165,8 @@ static void monitor_qmp_dispatch(MonitorQMP *mon, QObject *req)
|
2021-09-01 19:10:12 +03:00
|
|
|
QDict *rsp;
|
|
|
|
QDict *error;
|
|
|
|
|
|
|
|
+ int conn_nr_before = qatomic_read(&mon->connection_nr);
|
|
|
|
+
|
|
|
|
rsp = qmp_dispatch(mon->commands, req, qmp_oob_enabled(mon),
|
|
|
|
&mon->common);
|
|
|
|
|
2023-10-17 15:10:09 +03:00
|
|
|
@@ -180,7 +182,17 @@ static void monitor_qmp_dispatch(MonitorQMP *mon, QObject *req)
|
2021-09-01 19:10:12 +03:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
- monitor_qmp_respond(mon, rsp);
|
|
|
|
+ /*
|
|
|
|
+ * qmp_dispatch might have yielded and waited for a BH, in which case there
|
|
|
|
+ * is a chance a new client connected in the meantime - if this happened,
|
|
|
|
+ * the command will not have been executed, but we also need to ensure that
|
|
|
|
+ * we don't send back a corresponding response on a line that no longer
|
|
|
|
+ * belongs to this request.
|
|
|
|
+ */
|
|
|
|
+ if (conn_nr_before == qatomic_read(&mon->connection_nr)) {
|
|
|
|
+ monitor_qmp_respond(mon, rsp);
|
|
|
|
+ }
|
|
|
|
+
|
|
|
|
qobject_unref(rsp);
|
|
|
|
}
|
|
|
|
|
2024-01-30 17:14:37 +03:00
|
|
|
@@ -461,6 +473,7 @@ static void monitor_qmp_event(void *opaque, QEMUChrEvent event)
|
2021-09-01 19:10:12 +03:00
|
|
|
|
|
|
|
switch (event) {
|
|
|
|
case CHR_EVENT_OPENED:
|
|
|
|
+ qatomic_inc_fetch(&mon->connection_nr);
|
|
|
|
mon->commands = &qmp_cap_negotiation_commands;
|
|
|
|
monitor_qmp_caps_reset(mon);
|
|
|
|
data = qmp_greeting(mon);
|
|
|
|
diff --git a/qapi/qmp-dispatch.c b/qapi/qmp-dispatch.c
|
2024-01-30 17:14:37 +03:00
|
|
|
index 176b549473..790bb7d1da 100644
|
2021-09-01 19:10:12 +03:00
|
|
|
--- a/qapi/qmp-dispatch.c
|
|
|
|
+++ b/qapi/qmp-dispatch.c
|
2022-06-27 14:05:40 +03:00
|
|
|
@@ -117,16 +117,28 @@ typedef struct QmpDispatchBH {
|
2021-09-01 19:10:12 +03:00
|
|
|
QObject **ret;
|
|
|
|
Error **errp;
|
|
|
|
Coroutine *co;
|
|
|
|
+ int conn_nr;
|
|
|
|
} QmpDispatchBH;
|
|
|
|
|
|
|
|
static void do_qmp_dispatch_bh(void *opaque)
|
|
|
|
{
|
|
|
|
QmpDispatchBH *data = opaque;
|
|
|
|
|
|
|
|
- assert(monitor_cur() == NULL);
|
|
|
|
- monitor_set_cur(qemu_coroutine_self(), data->cur_mon);
|
|
|
|
- data->cmd->fn(data->args, data->ret, data->errp);
|
|
|
|
- monitor_set_cur(qemu_coroutine_self(), NULL);
|
|
|
|
+ /*
|
|
|
|
+ * A QMP monitor tracks it's client with a connection number, if this
|
|
|
|
+ * changes during the scheduling delay of this BH, we must not execute the
|
|
|
|
+ * command. Otherwise a badly placed 'qmp_capabilities' might affect the
|
|
|
|
+ * connection state of a client it was never meant for.
|
|
|
|
+ */
|
|
|
|
+ if (data->conn_nr == monitor_get_connection_nr(data->cur_mon)) {
|
|
|
|
+ assert(monitor_cur() == NULL);
|
|
|
|
+ monitor_set_cur(qemu_coroutine_self(), data->cur_mon);
|
|
|
|
+ data->cmd->fn(data->args, data->ret, data->errp);
|
|
|
|
+ monitor_set_cur(qemu_coroutine_self(), NULL);
|
|
|
|
+ } else {
|
|
|
|
+ error_setg(data->errp, "active monitor connection changed");
|
|
|
|
+ }
|
|
|
|
+
|
|
|
|
aio_co_wake(data->co);
|
|
|
|
}
|
|
|
|
|
2024-01-30 17:14:37 +03:00
|
|
|
@@ -253,6 +265,7 @@ QDict *coroutine_mixed_fn qmp_dispatch(const QmpCommandList *cmds, QObject *requ
|
2021-09-01 19:10:12 +03:00
|
|
|
.ret = &ret,
|
|
|
|
.errp = &err,
|
|
|
|
.co = qemu_coroutine_self(),
|
|
|
|
+ .conn_nr = monitor_get_connection_nr(cur_mon),
|
|
|
|
};
|
2024-01-30 17:14:37 +03:00
|
|
|
aio_bh_schedule_oneshot(iohandler_get_aio_context(), do_qmp_dispatch_bh,
|
2021-09-01 19:10:12 +03:00
|
|
|
&data);
|
|
|
|
diff --git a/stubs/monitor-core.c b/stubs/monitor-core.c
|
update submodule and patches to 7.1.0
Notable changes:
* The only big change is the switch to using a custom QIOChannel for
savevm-async, because the previously used QEMUFileOps was dropped.
Changes to the current implementation:
* Switch to vector based methods as required for an IO channel. For
short reads the passed-in IO vector is stuffed with zeroes at the
end, just to be sure.
* For reading: The documentation in include/io/channel.h states that
at least one byte should be read, so also error out when whe are
at the very end instead of returning 0.
* For reading: Fix off-by-one error when request goes beyond end.
The wrong code piece was:
if ((pos + size) > maxlen) {
size = maxlen - pos - 1;
}
Previously, the last byte would not be read. It's actually
possible to get a snapshot .raw file that has content all the way
up the final 512 byte (= BDRV_SECTOR_SIZE) boundary without any
trailing zero bytes (I wrote a script to do it).
Luckily, it didn't cause a real issue, because qemu_loadvm_state()
is not interested in the final (i.e. QEMU_VM_VMDESCRIPTION)
section. The buffer for reading it is simply freed up afterwards
and the function will assume that it read the whole section, even
if that's not the case.
* For writing: Make use of the generated blk_pwritev() wrapper
instead of manually wrapping the coroutine to simplify and save a
few lines.
* Adapt to changed interfaces for blk_{pread,pwrite}:
* a9262f551e ("block: Change blk_{pread,pwrite}() param order")
* 3b35d4542c ("block: Add a 'flags' param to blk_pread()")
* bf5b16fa40 ("block: Make blk_{pread,pwrite}() return 0 on success")
Those changes especially affected the qemu-img dd patches, because
the context also changed, but also some of our block drivers used
the functions.
* Drop qemu-common.h include: it got renamed after essentially
everything was moved to other headers. The only remaining user I
could find for things dropped from the header between 7.0 and 7.1
was qemu_get_vm_name() in the iscsi-initiatorname patch, but it
already includes the header to which the function was moved.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2022-10-14 15:07:13 +03:00
|
|
|
index afa477aae6..d3ff124bf3 100644
|
2021-09-01 19:10:12 +03:00
|
|
|
--- a/stubs/monitor-core.c
|
|
|
|
+++ b/stubs/monitor-core.c
|
update submodule and patches to 7.1.0
Notable changes:
* The only big change is the switch to using a custom QIOChannel for
savevm-async, because the previously used QEMUFileOps was dropped.
Changes to the current implementation:
* Switch to vector based methods as required for an IO channel. For
short reads the passed-in IO vector is stuffed with zeroes at the
end, just to be sure.
* For reading: The documentation in include/io/channel.h states that
at least one byte should be read, so also error out when whe are
at the very end instead of returning 0.
* For reading: Fix off-by-one error when request goes beyond end.
The wrong code piece was:
if ((pos + size) > maxlen) {
size = maxlen - pos - 1;
}
Previously, the last byte would not be read. It's actually
possible to get a snapshot .raw file that has content all the way
up the final 512 byte (= BDRV_SECTOR_SIZE) boundary without any
trailing zero bytes (I wrote a script to do it).
Luckily, it didn't cause a real issue, because qemu_loadvm_state()
is not interested in the final (i.e. QEMU_VM_VMDESCRIPTION)
section. The buffer for reading it is simply freed up afterwards
and the function will assume that it read the whole section, even
if that's not the case.
* For writing: Make use of the generated blk_pwritev() wrapper
instead of manually wrapping the coroutine to simplify and save a
few lines.
* Adapt to changed interfaces for blk_{pread,pwrite}:
* a9262f551e ("block: Change blk_{pread,pwrite}() param order")
* 3b35d4542c ("block: Add a 'flags' param to blk_pread()")
* bf5b16fa40 ("block: Make blk_{pread,pwrite}() return 0 on success")
Those changes especially affected the qemu-img dd patches, because
the context also changed, but also some of our block drivers used
the functions.
* Drop qemu-common.h include: it got renamed after essentially
everything was moved to other headers. The only remaining user I
could find for things dropped from the header between 7.0 and 7.1
was qemu_get_vm_name() in the iscsi-initiatorname patch, but it
already includes the header to which the function was moved.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2022-10-14 15:07:13 +03:00
|
|
|
@@ -12,6 +12,11 @@ Monitor *monitor_set_cur(Coroutine *co, Monitor *mon)
|
2021-09-01 19:10:12 +03:00
|
|
|
return NULL;
|
|
|
|
}
|
|
|
|
|
|
|
|
+int monitor_get_connection_nr(const Monitor *mon)
|
|
|
|
+{
|
|
|
|
+ return -1;
|
|
|
|
+}
|
|
|
|
+
|
|
|
|
void monitor_init_qmp(Chardev *chr, bool pretty, Error **errp)
|
|
|
|
{
|
|
|
|
}
|