This reverts commit 29cb6fcbb7, user
feedback was showing any positive impact of this patch, and upstream
still hasn't a fix for older stable releases (but for 6.8), so for now
rather revert this and wait for either a better (well, actual) fix or
updating to 6.8 or newer.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Users have been reporting [1] that VMs occasionally become
unresponsive with high CPU usage for some time (varying between ~1 and
more than 60 seconds). After that time, the guests come back and
continue running. Windows VMs seem most affected (not responding to
pings during the hang, RDP sessions time out), but we also got reports
about Linux VMs (reporting soft lockups). The issue was not present on
host kernel 5.15 and was first reported with kernel 6.2. Users
reported that the issue becomes easier to trigger the more memory is
assigned to the guests. Setting mitigations=off was reported to
alleviate (but not eliminate) the issue. For most users the issue
seems to disappear after (also) disabling KSM [2], but some users
reported freezes even with KSM disabled [3].
It turned out the reports concerned NUMA hosts only, and that the
freezes correlated with runs of the NUMA balancer [4]. Users reported
that disabling the NUMA balancer resolves the issue (even with KSM
enabled).
We put together a Linux VM reproducer, ran a git-bisect on the kernel
to find the commit introducing the issue and asked upstream for help
[5]. As it turned out, an upstream bugreport was recently opened [6]
and a preliminary fix to the KVM TDP MMU was proposed [7]. With that
patch [7] on top of kernel 6.7, the reproducer does not trigger
freezes anymore. As of now, the patch (or its v2 [8]) is not yet
merged in the mainline kernel, and backporting it may be difficult due
to dependencies on other KVM changes [9].
However, the bugreport [6] also prompted an upstream developer to
propose a patch to the kernel scheduler logic that decides whether a
contended spinlock/rwlock should be dropped [10]. Without the patch,
PREEMPT_DYNAMIC kernels (such as ours) would always drop contended
locks. With the patch, the kernel only drops contended locks if the
kernel is currently set to preempt=full. As noted in the commit
message [10], this can (counter-intuitively) improve KVM performance.
Our kernel defaults to preempt=voluntary (according to
/sys/kernel/debug/sched/preempt), so with the patch it does not drop
contended locks anymore, and the reproducer does not trigger freezes
anymore. Hence, backport [10] to our kernel.
[1] https://forum.proxmox.com/threads/130727/
[2] https://forum.proxmox.com/threads/130727/page-4#post-575886
[3] https://forum.proxmox.com/threads/130727/page-8#post-617587
[4] https://www.kernel.org/doc/html/latest/admin-guide/sysctl/kernel.html#numa-balancing
[5] https://lore.kernel.org/kvm/832697b9-3652-422d-a019-8c0574a188ac@proxmox.com/
[6] https://bugzilla.kernel.org/show_bug.cgi?id=218259
[7] https://lore.kernel.org/all/20230825020733.2849862-1-seanjc@google.com/
[8] https://lore.kernel.org/all/20240110012045.505046-1-seanjc@google.com/
[9] https://lore.kernel.org/kvm/Zaa654hwFKba_7pf@google.com/
[10] https://lore.kernel.org/all/20240110214723.695930-1-seanjc@google.com/
Signed-off-by: Friedrich Weber <f.weber@proxmox.com>
There shouldn't be any changes for us w.r.t. data integrity and the
recent uncovered dnode dirtiness, as we backported those patches
already.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
since e2d55709398e ("vfio: Fold vfio_virqfd.ko into vfio.ko") this
config isn't a tristate anymore but a bool, so adapt to that.
Luckily the kconfig script did the right thing and set (or at least
kept) this to yes anyway
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
the signed template together with the binary package(s) containing the unsigned
files form the input to our secure boot signing service.
the signed template consists of
- files.json (specifying which files are signed how and by which key)
- packaging template used to build the signed package(s)
the signing service
- extracts and checks the signed-template binary package
- extracts the unsigned package(s)
- signs the needed files
- packs up the signatures + the template contained in the signed-template
package into the signed source package
the signed source package can then be built in the regular fashion (in case of
the kernel packages, it will copy the kernel image, modules and some helper
files from the unsigned package, attach the signature created by the signing
service, and re-pack the result as signed-kernel package).
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
it's really not just ZFS and AMDGPU modules, but way more and
generating scary looking messages for these "issues" is just noise
that drown real issues. Disable this for now, maybe in another few
years.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
to silence array-index-out-of-bounds warnings for dynamically-sized
arrays. All commits applied cleanly and just replace array[1] with
array[].
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
makes it easier to cherry-pick newer stable release tags, that
sometimes contain new config values one must pick from.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
it's mostly noise for users, and quiet some interpret this as real
problem and report it to us.
Ideally we'd either educate them, or take time ourself, to report this
upstream and see if the situation can be improved overall, but
currently that's not feasible. We should check this out a few releases
down, if the lower hanging fruits got fixed and noise got lower we
could enable it again to catch the more rare cases.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
We have a slightly better fix where only a few targeted ZFS module
parts are added to the UBSAN ignore-list, so the rest of the kernel
still gets exposure.
Link: https://github.com/openzfs/zfs/pull/15510
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
This is generating far too much noise in the logs, so keep it at once
per boot until we (and other user space tools) adapted to the kernel
wanting user space to chose memfd execution behavior very explicitly.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>