Biggest change is that AioContext locking got removed, but no changes
required other than dropping the calls to acquire and release it. As a
consequence, the single parameter for the bdrv_graph_wrlock() call got
removed which also required adaptation.
QAPI docs became stricter requiring to document all members.
Other minor changes:
- Single parameter from migration_is_running() was dropped.
- qemu_mutex_(un)lock_iothread() got renamed to bql_(un)lock().
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
This version includes both the AioContext lock and the block graph
lock, so there might be some deadlocks lurking. It's not possible to
disable the block graph lock like was done in QEMU 8.1, because there
are no changes like the function bdrv_schedule_unref() that require
it. QEMU 9.0 will finally get rid of the AioContext locking.
During live-restore with a VirtIO SCSI drive with iothread there is a
known racy deadlock related to the AioContext lock. Not new [1], but
not sure if more likely now. Should be fixed in QEMU 9.0.
The block graph lock comes with annotations that can be checked by
clang's TSA. This required changes to the block drivers, i.e.
alloc-track, pbs, zeroinit as well as taking the appropriate locks
in pve-backup, savevm-async, vma-reader.
Local variable shadowing is prohibited via a compiler flag now,
required slight adaptation in vma.c.
Major changes only affect alloc-track:
* It is not possible to call a generated co-wrapper like
bdrv_get_info() while holding the block graph lock exclusively [0],
which does happen during initialization of alloc-track when the
backing hd is set and the refresh_limits driver callback is invoked.
The bdrv_get_info() call to get the cluster size is moved to
directly after opening the file child in track_open().
The important thing is that at least the request alignment for the
write target is used, because then the RMW cycle in bdrv_pwritev
will gather enough data from the backing file. Partial cluster
allocations in the target are not a fundamental issue, because the
driver returns its allocation status based on the bitmap, so any
other data that maps to the same cluster will still be copied later
by a stream job (or during writes to that cluster).
* Replacing the node cannot be done in the
track_co_change_backing_file() callback, because it is a coroutine
and cannot hold the block graph lock exclusively. So it is moved to
the stream job itself with the auto-remove option not having an
effect anymore (qemu-server would always set it anyways).
In the future, there could either be a special option for the stream
job, or maybe the upcoming blockdev-replace QMP command can be used.
Replacing the backing child is actually already done in the stream
job, so no need to do it in the track_co_change_backing_file()
callback. It also cannot be called from a coroutine. Looking at the
implementation in the qcow2 driver, it doesn't seem to be intended
to change the backing child itself, just update driver-internal
state.
Other changes:
* alloc-track: Error out early when used without auto-remove. Since
replacing the node now happens in the stream job, where the option
cannot be read from (it's internal to the driver), it will always be
treated as 'on'. Makes sure to have users beside qemu-server notice
the change (should they even exist). The option can be fully dropped
in the future while adding a version guard in qemu-server.
* alloc-track: Avoid seemingly superfluous child permission update.
Doesn't seem necessary nowadays (maybe after commit "alloc-track:
fix deadlock during drop" where the dropping is not rescheduled and
delayed anymore or some upstream change). Replacing the block node
will already update the permissions of the new node (which was the
file child before). Should there really be some issue, instead of
having a drop state, this could also be just based off the fact
whether there is still a backing child.
Dumping the cumulative (shared) permissions for the BDS with a debug
print yields the same values after this patch and with QEMU 8.1,
namely 3 and 5.
* PBS block driver: compile unconditionally. Proxmox VE always needs
it and something in the build process changed to make it not enabled
by default. Probably would need to move the build option to meson
otherwise.
* backup: job unreferencing during cleanup needs to happen outside of
coroutine, so it was moved to before invoking the clean
* mirror: Cherry-pick stable fix to avoid potential deadlock.
* savevm-async: migrate_init now can fail, so propagate potential
error.
* savevm-async: compression counters are not accessible outside
migration/ram-compress now, so drop code that prophylactically set
it to zero.
[0]: https://lore.kernel.org/qemu-devel/220be383-3b0d-4938-b584-69ad214e5d5d@proxmox.com/
[1]: https://lore.kernel.org/qemu-devel/e13b488e-bf13-44f2-acca-e724d14f43fd@proxmox.com/
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Most notable fixes from a Proxmox VE perspective are:
* "virtio-net: correctly copy vnet header when flushing TX"
To prevent a stack overflow that could lead to leaking parts of the
QEMU process's memory.
* "hw/pflash: implement update buffer for block writes"
To prevent an edge case for half-completed writes. This potentially
affected EFI disks.
* Fixes to i386 emulation and ARM emulation.
No changes for patches were necessary (all are just automatic context
changes).
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Many changes were necessary this time around:
* QAPI was changed to avoid redundant has_* variables, see commit
44ea9d9be3 ("qapi: Start to elide redundant has_FOO in generated C")
for details. This affected many QMP commands added by Proxmox too.
* Pending querying for migration got split into two functions, one to
estimate, one for exact value, see commit c8df4a7aef ("migration:
Split save_live_pending() into state_pending_*") for details. Relevant
for savevm-async and PBS dirty bitmap.
* Some block (driver) functions got converted to coroutines, so the
Proxmox block drivers needed to be adapted.
* Alloc track auto-detaching during PBS live restore got broken by
AioContext-related changes resulting in a deadlock. The current, hacky
method was replaced by a simpler one. Stefan apparently ran into a
problem with that when he wrote the driver, but there were
improvements in the stream job code since then and I didn't manage to
reproduce the issue. It's a separate patch "alloc-track: fix deadlock
during drop" for now, you can find the details there.
* Async snapshot-related changes:
- The pending querying got adapted to the above-mentioned split and
a patch is added to optimize it/make it more similar to what
upstream code does.
- Added initialization of the compression counters (for
future-proofing).
- It's necessary the hold the BQL (big QEMU lock = iothread mutex)
during the setup phase, because block layer functions are used there
and not doing so leads to racy, hard-to-debug crashes or hangs. It's
necessary to change some upstream code too for this, a version of
the patch "migration: for snapshots, hold the BQL during setup
callbacks" is intended to be upstreamed.
- Need to take the bdrv graph read lock before flushing.
* hmp_info_balloon was moved to a different file.
* Needed to include a new headers from time to time to still get the
correct functions.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
User-facing breaking change:
The slirp submodule for user networking got removed. It would be
necessary to add the --enable-slirp option to the build and/or install
the appropriate library to continue building it. Since PVE is not
explicitly supporting it, it would require additionally installing the
libslirp0 package on all installations and there is *very* little
mention on the community forum when searching for "slirp" or
"netdev user", the plan is to only enable it again if there is some
real demand for it.
Notable changes:
* The big change for this release is the rework of job locking, using
a job mutex and introducing _locked() variants of job API functions
moving away from call-side AioContext locking. See (in the qemu
submodule) commit 6f592e5aca ("job.c: enable job lock/unlock and
remove Aiocontext locks") and previous commits for context.
Changes required for the backup patches:
* Use WITH_JOB_LOCK_GUARD() and call the _locked() variant of job
API functions where appropriate (many are only availalbe as
a _locked() variant).
* Remove acquiring/releasing AioContext around functions taking the
job mutex lock internally.
The patch introducing sequential transaction support for jobs needs
to temporarily unlock the job mutex to call job_start() when
starting the next job in the transaction.
* The zeroinit block driver now marks its child as primary.
The documentation in include/block/block-common.h states:
> Filter node has exactly one FILTERED|PRIMARY child, and may have
> other children which must not have these bits
Without this, an assert will trigger when copying to a zeroinit target
with qemu-img convert, because bdrv_child_cb_attach() expects any
non-PRIMARY child to be not FILTERED:
> qemu-img convert -n -p -f raw -O raw input.raw zeroinit:output.raw
> qemu-img: ../block.c:1476: bdrv_child_cb_attach: Assertion
> `!(child->role & BDRV_CHILD_FILTERED)' failed.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Notable changes:
* bdrv_co_p{discard,readv,writev,write_zeroes} function signatures
changed, to using int64_t for offsets/bytes and some still had int
rather than BrdvRequestFlags for the flags.
* job_cancel_sync now has a force parameter. Commit messages in
73895f3838cd7fdaf185cf1dbc47be58844a966f
4cfb3f05627ad82af473e7f7ae113c3884cd04e3
sound like using force=true makes more sense.
* Added 3 patches coming in via qemu-stable tag, most important one is
to work around a librbd issue.
* Added another 3 patches from qemu-devel to fix issue leading to
crash when live migrating with iothread.
* cluster_size calculation helper changed (see patch pve/0026).
* QAPI's if conditionals now use 'CONFIG_FOO' rather than
'defined(CONFIG_FOO)'
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
Very clean rebase, only the +pve version handling needed manual fixing.
Drops two applied patches from extra/ and adds one new from upstream
(extra/0001*, fixes VNC over unix sockets) as well as 3 of my own for
allowing password changes on custom VNC displays again (as seen and
reviewed upstream, but not yet applied).
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
Mostly minor changes, bigger ones summarized:
* QEMU's internal backup code now uses a new async system, which allows
parallel requests - the default max_workers settings is 64, I chose
less, since 64 put enough stress on QEMU that the guest became
practically unusable during the backup, and 16 still shows quite a
nice measureable performance improvement. Little code changes for us
though.
* 'malformed' QAPI parameters/functions are now a build error (i.e.
using '_' vs '-'), I chose to just whitelist our calls in the name of
backwards compatibility.
* monitor OOB race fix now uses the upstream variant, cherry-picked from
origin/master since it's not in 6.0 by default
* last patch fixes a bug with snapshot rollback related to the new yank
system
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
Lots of patches touched and some slight changes to the build process
since QEMU switched to meson as their build system. Functionality-wise
very little rebasing required.
New patches introduced:
* pve/0058: to fix VMA backups and clean up some code in general with
new 5.2 features now available to us (namely coroutine-enabled QMP).
* extra/0002: don't build man pages for guest agent when disabled
* extra/0003: fix live-migration with hugepages
* 0017 and 0018 are adjusted to fix snapshot abort and improve
snap performance a bit
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
No major semantic changes, mostly just deprecations and changed function
signatures. Drop the extra/ patches, as they have been applied upstream.
The added extra/ patch was accepted upstream[0] but has not been picked
up for 5.1. It is required for non-4M aligned backups to work with PBS.
[0] https://lists.gnu.org/archive/html/qemu-devel/2020-08/msg01671.html
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
The long overdue nice rebase+cleanup was done by Dietmar
Originally-by: Dietmar Maurer <dietmar@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>