Compare commits

...

54 Commits

Author SHA1 Message Date
Thomas Lamprecht
b6eb6d7a4d update ABI file for 6.2.16-20-bpo11-pve
(generated with debian/scripts/abi-generate)

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-12-01 16:05:15 +01:00
Thomas Lamprecht
0817fc60f6 bump version to 6.2.16-20~bpo11+1
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-12-01 15:42:47 +01:00
Thomas Lamprecht
d35c4a21ef update ZFS to 2.1.14
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-12-01 15:24:52 +01:00
Thomas Lamprecht
93ecd382ac rebase patches on top of Ubuntu-6.2.0-39.40
(generated with debian/scripts/import-upstream-tag)

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
(cherry picked from commit ddd91a3b05)
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-12-01 15:24:01 +01:00
Thomas Lamprecht
be7f6da7d4 update sources to Ubuntu-6.2.0-39.40
(generated with debian/scripts/import-upstream-tag)

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
(cherry picked from commit b5335f0007)
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-12-01 15:24:01 +01:00
Thomas Lamprecht
da42753efe backport constraining guest-supported xfeatures only at KVM_GET_XSAVE{2}
This improves compatibility for guests w.r.t. live-migration, or live
snapshot rollback, to hosts with less (FPU) xfeatures supported, as
long as the set of features that was actually exposed to the guest is
still supported.

This improves on the ad856280ddea ("x86/kvm/fpu: Limit guest
user_xfeatures to supported bits of XCR0") bug fix.

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
(cherry picked from commit 6d825fcff3)
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-12-01 15:24:01 +01:00
Thomas Lamprecht
ec989f0029 normalize patches
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
(cherry picked from commit 9a2449d7c2)
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-12-01 15:24:01 +01:00
Stefan Sterz
7a0603cc5d backport exposing FLUSHBYASID when running nested VMs on AMD CPUs
this exposes the FLUSHBYASID CPU flag to nested VMs when running on an
AMD CPU. also reverts a made up check that would advertise
FLUSHBYASID as not supported. this enable certain modern hypervisors
such as VMWare ESXi 7 and Workstation 17 to run nested VMs properly
again.

Signed-off-by: Stefan Sterz <s.sterz@proxmox.com>
(cherry picked from commit 3202de9857)
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-12-01 15:24:01 +01:00
Thomas Lamprecht
3cef827603 backport fix for AMD erratum #1485 on Zen4-based CPUs
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
(cherry picked from commit 04f267a5c7)
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-12-01 15:24:01 +01:00
Thomas Lamprecht
74a10f6133 update fwlist
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
(cherry picked from commit 67d3491e09)
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-12-01 15:24:01 +01:00
Thomas Lamprecht
f9257a2fcd rebase patches on top of Ubuntu-6.2.0-36.36
(generated with debian/scripts/import-upstream-tag)

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
(cherry picked from commit 2db681b5f1)
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-12-01 15:21:56 +01:00
Thomas Lamprecht
9487cb3ce2 update sources to Ubuntu-6.2.0-36.36
(generated with debian/scripts/import-upstream-tag)

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
(cherry picked from commit f048d6bc26)
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-12-01 15:21:44 +01:00
Stoiko Ivanov
8699bd7f04 cherry-pick fix for new amd64 ucode
The latest amd64-microcode package in sid [0] (which probably will
eventually make it to bookworm-security) has a change that requires
the added patch to work properly.

The changelog-entry refers to stable k.o branches only - but a quick
look through the linux-firmware.git log identifies:
`f2eb058afc57348cde66852272d6bf11da1eef8f` as relevant commit, which
refers (as NOTE in the patch) to:
a32b0f0db3f3 ("x86/microcode/AMD: Load late on both threads too")
which applies cleanly (although I cherry-picked the patch from the
6.1.y stable branch to have the original commit in the commit
message).

quickly tested compiling and booting the result in a VM (however w/o
a fitting CPU (Epyc Genoa or Bergamo) it should cause a change)

reported in our Enterprise Support as potential culprit for one
thread from 128 being reported as offline in `lscpu`

[0] https://metadata.ftp-master.debian.org/changelogs//non-free-firmware/a/amd64-microcode/amd64-microcode_3.20230808.1.1_changelog

Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
(cherry picked from commit 4696b978f7)
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-12-01 15:20:23 +01:00
Thomas Lamprecht
a16ba5f76a fix thunderbolt ring-interrupt not being masked on suspend
Originally for v6.4-rc7 and now it also got already into some stable
trees, but not yet into a (released) ubuntu tag – so backport it
already.

Link: https://forum.proxmox.com/threads/133104/post-590457
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
(cherry picked from commit d772676031)
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-12-01 15:20:13 +01:00
Thomas Lamprecht
9e5b784c3c cherry-pick fix for setting X86_FEATURE_OSXSAVE feature
Avoids regressions where some code falsely think they cannot use some
CPU features like AVX1, e.g., ZFS.

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
(cherry picked from commit 9ba0dde971)
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-12-01 15:20:08 +01:00
Thomas Lamprecht
43eef4a616 rebase patches on top of Ubuntu-6.2.0-34.34
(generated with debian/scripts/import-upstream-tag)

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
(cherry picked from commit 8ff596f2d3)
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-12-01 15:20:01 +01:00
Thomas Lamprecht
479f8ddf68 update sources to Ubuntu-6.2.0-34.34
(generated with debian/scripts/import-upstream-tag)

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
(cherry picked from commit b3aeb8dba9)
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-12-01 15:20:01 +01:00
Thomas Lamprecht
0d7e7de56f backport thunderbolt-net fixes
A user of ours reported an issue with p2p thunderbolt-net w.r.t. IPv6
and failure to reestablish the connection after a reboot of a peer
node, in the forum [0] and the relayed it upstream, so lets
cherry-pick those two patches to our 6.2. Especially the IPv6 one
seems straight forward, and the other one makes it actually spec
conform and should only improve things.

[0]: https://forum.proxmox.com/threads/133104/

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
(cherry picked from commit ddba52024f)
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-12-01 15:19:50 +01:00
Fabian Grünbichler
31a20e1eb0 fix #4707: add override parameter for RMRR relaxation
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
(cherry picked from commit 1acfcad2f3)
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-12-01 15:19:46 +01:00
Thomas Lamprecht
db0095a49a buildsys: sync package version to d/changelog source-of-truth one
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-09-04 17:37:41 +02:00
Thomas Lamprecht
48aa0042ee bump version to 6.2.16-11~bpo11+2
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-09-04 16:49:20 +02:00
Fiona Ebner
44f3d669a5 cherry-pick fix for KVM vCPU page fault loop
The mailing list thread [0] (found by Friedrich, many thanks!) leading
up to this patch sounds very familiar to issues users reported in the
community forum [1] and enterprise support channel, where a VM would
be stuck for no discernable reason with all vCPU threads spinning.

[0]: https://lore.kernel.org/all/f023d927-52aa-7e08-2ee5-59a2fbc65953@gameservers.com/T/#u
[1]: https://forum.proxmox.com/threads/127459/

Suggested-by: Friedrich Weber <f.weber@proxmox.com>
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
(cherry picked from commit 6810c247a1)
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-09-04 16:46:34 +02:00
Thomas Lamprecht
e55e2c5f6b d/changelog: fixup dist
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-08-31 15:32:08 +02:00
Thomas Lamprecht
ec000fbeee update ABI file for 6.2.16-11-bpo11-pve
(generated with debian/scripts/abi-generate)

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-08-31 15:15:56 +02:00
Thomas Lamprecht
c856766990 bump version to 6.2.16-11~bpo11+1
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-08-31 14:44:09 +02:00
Thomas Lamprecht
eb24434957 update fwlist for 6.2.16-11
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-08-31 14:44:09 +02:00
Thomas Lamprecht
95a30681a7 sync patches and submodule state from master
from commit 243a198 ("bump version to 6.2.16-11")

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-08-31 12:14:44 +02:00
Thomas Lamprecht
4110bd2d03 update ABI file for 6.2.16-4-bpo11-pve
(generated with debian/scripts/abi-generate)

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-07-07 17:55:10 +02:00
Thomas Lamprecht
880f7c870d bump version to 6.2.16-4~bpo11+1
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-07-07 17:34:54 +02:00
Thomas Lamprecht
b355e714f7 Revert "update ZFS to 2.1.12"
This reverts commit 4c09c4f700 as we
avoid updating ZFS on bullseye for now.
2023-07-07 17:08:14 +02:00
Thomas Lamprecht
72cc608d55 d/control: set compat back to 12 for oldstable
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-07-07 17:08:14 +02:00
Thomas Lamprecht
37e73fdea5 update ABI file for 6.2.16-4-pve
(generated with debian/scripts/abi-generate)

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-07-07 17:00:27 +02:00
Thomas Lamprecht
b75cb66e19 bump version to 6.2.16-4
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-07-07 17:00:27 +02:00
Thomas Lamprecht
b93bf93302 update submodule to Proxmox-6.2.16-3
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-07-07 17:00:27 +02:00
Thomas Lamprecht
2ebe71c2f4 update ABI file for 6.2.16-3-pve
(generated with debian/scripts/abi-generate)

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-07-07 17:00:27 +02:00
Thomas Lamprecht
20e538aded bump version to 6.2.16-3
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-07-07 17:00:27 +02:00
Thomas Lamprecht
9d49533827 update to Proxmox-6.2.16-2 based on Ubuntu-6.2.0-25.25
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-07-07 17:00:27 +02:00
Thomas Lamprecht
60e902c637 buildsys: improve DSC target
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-07-07 17:00:27 +02:00
Thomas Lamprecht
e662807c34 bump version to 6.2.16-2
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-07-07 17:00:27 +02:00
Thomas Lamprecht
4c09c4f700 update ZFS to 2.1.12
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-07-07 17:00:27 +02:00
Thomas Lamprecht
9dc16d8b59 backport "net/sched: flower: fix possible OOB write in fl_set_geneve_opt()"
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-07-07 17:00:27 +02:00
Thomas Lamprecht
8d9ca43bc9 backport re-adding mdev_set_iommu_device() kABI
Should fix compat with SRIOV based Nvidia vGPU until they switch over
to using the vfio-pci-core framework instead of MDEV.

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-07-07 17:00:27 +02:00
Thomas Lamprecht
280a4afd00 scripts: modernize abi-generate & find-firmware
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-07-07 17:00:27 +02:00
Thomas Lamprecht
d198e88b50 scripts: modernize abi-check a bit
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-07-07 17:00:27 +02:00
Thomas Lamprecht
14c9961f1d debian: update postinst, postrm and prerm script style
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-07-07 17:00:27 +02:00
Thomas Lamprecht
9edf7f3338 update ABI file for 6.2.16-1-pve
(generated with debian/scripts/abi-generate)

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-07-07 17:00:27 +02:00
Thomas Lamprecht
6b2a7ba687 bump version to 6.2.16-1
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-07-07 17:00:27 +02:00
Thomas Lamprecht
205a1d1cfc update ZFS submodule to latest git
no actual source code changes, just packaging stuff for bookworm

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-07-07 17:00:27 +02:00
Thomas Lamprecht
74dd3bf8d0 update submodule to Proxmox-6.2.16-1 and refresh patches
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-07-07 17:00:04 +02:00
Thomas Lamprecht
243323e92d update patches for Ubuntu-6.2.0-23.23
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-07-07 17:00:04 +02:00
Thomas Lamprecht
fce511c937 d/control: define compat level via build-depends and raise to 13
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-07-07 17:00:04 +02:00
Thomas Lamprecht
04b9751b0f buildsys: derive upload dist automatically
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-06-12 15:25:34 +02:00
Thomas Lamprecht
13b09d1825 d/control: drop useless dependency on already essential coreutils
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-06-12 15:25:34 +02:00
Thomas Lamprecht
72d9da59dc buildsys: add sbuild convenience target
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-06-12 15:25:34 +02:00
29 changed files with 29025 additions and 28179 deletions

View File

@ -1,12 +1,14 @@
include /usr/share/dpkg/pkg-info.mk
# also bump pve-kernel-meta if either of MAJ.MIN, PATCHLEVEL or KREL change
KERNEL_MAJ=6
KERNEL_MIN=2
KERNEL_PATCHLEVEL=11
KERNEL_PATCHLEVEL=16
# increment KREL if the ABI changes (abicheck target in debian/rules)
# rebuild packages with new KREL and run 'make abiupdate'
KREL=2
KREL=20-bpo11
PKGREL=2
PKGREL=20~bpo11+1
KERNEL_MAJMIN=$(KERNEL_MAJ).$(KERNEL_MIN)
KERNEL_VER=$(KERNEL_MAJMIN).$(KERNEL_PATCHLEVEL)
@ -63,10 +65,15 @@ $(DST_DEB): $(BUILD_DIR).prepared
#lintian $(HDR_DEB)
lintian $(LINUX_TOOLS_DEB)
dsc: $(DSC)
dsc:
$(MAKE) $(DSC)
lintian $(DSC)
$(DSC): $(BUILD_DIR).prepared
cd $(BUILD_DIR); dpkg-buildpackage -S -uc -us -d
lintian $(DSC)
sbuild: $(DSC)
sbuild $(DSC)
$(BUILD_DIR).prepared: $(addsuffix .prepared,$(KERNEL_SRC) $(MODULES) debian)
cp -a fwlist-previous $(BUILD_DIR)/
@ -113,8 +120,9 @@ $(ZFSDIR).prepared: $(ZFSONLINUX_SUBMODULE)
touch $(ZFSDIR).prepared
.PHONY: upload
upload: UPLOAD_DIST ?= $(DEB_DISTRIBUTION)
upload: $(DEBS)
tar cf - $(DEBS)|ssh -X repoman@repo.proxmox.com -- upload --product pve,pmg,pbs --dist bullseye --arch $(ARCH)
tar cf - $(DEBS)|ssh -X repoman@repo.proxmox.com -- upload --product pve,pmg,pbs --dist $(UPLOAD_DIST) --arch $(ARCH)
.PHONY: distclean
distclean: clean

File diff suppressed because it is too large Load Diff

27621
abi-prev-6.2.16-20-bpo11-pve Normal file

File diff suppressed because it is too large Load Diff

144
debian/changelog vendored
View File

@ -1,3 +1,147 @@
pve-kernel (6.2.16-20~bpo11+1) bullseye-backports; urgency=medium
* backport to Debian Bullseye based Proxmox projects.
-- Proxmox Support Team <support@proxmox.com> Fri, 01 Dec 2023 15:42:23 +0100
proxmox-kernel-6.2 (6.2.16-20) bookworm; urgency=medium
* update sources to Ubuntu-6.2.0-39.40
* update ZFS to 2.1.14
-- Proxmox Support Team <support@proxmox.com> Fri, 01 Dec 2023 14:17:27 +0100
proxmox-kernel-6.2 (6.2.16-19) bookworm; urgency=medium
* backport exposing FLUSHBYASID when running nested VMs on AMD CPUs to fix
nesting of some hyper-visors like VMware Workstation.
* backport constraining guest-supported xfeatures only at KVM_GET_XSAVE{2}
to further improve compatibility for guests w.r.t. live-migration, or live
snapshot rollback, to hosts with less (FPU) xfeatures supported.
-- Proxmox Support Team <support@proxmox.com> Tue, 24 Oct 2023 14:07:51 +0200
proxmox-kernel-6.2 (6.2.16-18) bookworm; urgency=medium
* backport fix for AMD erratum #1485 on Zen4-based CPUs to avoid triggering
undefined instruction exceptions when disabling all, or certain security
mitigations, like using the "mitigations=off" kernel command line
parameter
* backport ZFS fix to avoid crashes and hangs if used on modern Intel HW
like the Xeon Scalable 4th Gen "Sapphire Rapids" CPUs due to a HW bug as
per Intel SPR erratum SPR4
-- Proxmox Support Team <support@proxmox.com> Wed, 11 Oct 2023 17:05:18 +0200
proxmox-kernel-6.2 (6.2.16-16) bookworm; urgency=medium
* update sources to Ubuntu-6.2.0-36.36
-- Proxmox Support Team <support@proxmox.com> Tue, 03 Oct 2023 07:42:21 +0200
proxmox-kernel-6.2 (6.2.16-15) bookworm; urgency=medium
* fix thunderbolt ring-interrupt not being masked on suspend
* cherry-pick fix to avoid potentially offlining one CPU thread on some EPYC
CPUs with a new amd64-microcode package (still in unstable).
* update ZFS to 2.1.13
-- Proxmox Support Team <support@proxmox.com> Thu, 28 Sep 2023 15:53:58 +0200
proxmox-kernel-6.2 (6.2.16-14) bookworm; urgency=medium
* cherry-pick fix for setting X86_FEATURE_OSXSAVE feature improving
performance of some code that tries to live-detect available CPU features,
like, e.g., ZFS.
-- Proxmox Support Team <support@proxmox.com> Tue, 19 Sep 2023 10:17:16 +0200
proxmox-kernel-6.2 (6.2.16-13) bookworm; urgency=medium
* fix #4707: add override parameter for RMRR relaxation
* backport thunderbolt-net fixes for IPv6 and connection re-establishment
after a node got rebooted
* update sources to Ubuntu-6.2.0-34.34
-- Proxmox Support Team <support@proxmox.com> Mon, 18 Sep 2023 15:31:57 +0200
pve-kernel (6.2.16-11~bpo11+2) bullseye; urgency=medium
* cherry-pick fix for KVM vCPU page-fault loop.
Due to to small and signed type used for an memory related sequence
counter there was a chance that for long-lived VMs KVM would effectively
hang vCPUs due to always thinking page faults are stale, which results in
KVM refusing to "fix" faults.
-- Proxmox Support Team <support@proxmox.com> Mon, 04 Sep 2023 16:49:15 +0200
proxmox-kernel-6.2 (6.2.16-12) bookworm; urgency=medium
* cherry-pick fix for KVM vCPU page-fault loop.
Due to too small and signed type used for an memory related sequence
counter there was a chance that for long-lived VMs KVM would effectively
hang vCPUs due to always thinking page faults are stale, which results in
KVM refusing to "fix" faults.
-- Proxmox Support Team <support@proxmox.com> Mon, 04 Sep 2023 15:21:22 +0200
pve-kernel (6.2.16-11~bpo11+1) bullseye; urgency=medium
* Rebuild 6.2.6-11 for Bullseye
* bump ABI to 6.2.16-11~bpo11
-- Proxmox Support Team <support@proxmox.com> Thu, 31 Aug 2023 12:18:44 +0200
pve-kernel (6.2.16-4~bpo11+1) bullseye; urgency=medium
* Rebuild for bullseye but keep ZFS at 2.1.11 for now.
* bump ABI to 6.2.16-4~bpo11
-- Proxmox Support Team <support@proxmox.com> Fri, 07 Jul 2023 17:05:59 +0200
pve-kernel (6.2.16-4) bookworm; urgency=medium
* backport fixes for StackRot (CVE-2023-3269)
-- Proxmox Support Team <support@proxmox.com> Fri, 07 Jul 2023 06:22:28 +0200
pve-kernel (6.2.16-3) bookworm; urgency=medium
* update to Ubuntu-6.2.0-25.25
-- Proxmox Support Team <support@proxmox.com> Sat, 17 Jun 2023 07:58:57 +0200
pve-kernel (6.2.16-2) bookworm; urgency=medium
* update ZFS to 2.1.12
* bump ABI to 6.2.16-2
* backport "net/sched: flower: fix possible OOB write in fl_set_geneve_opt()"
* backport re-adding mdev_set_iommu_device() kABI for support of SRIOV based
Nvidia vGPU
-- Proxmox Support Team <support@proxmox.com> Tue, 13 Jun 2023 15:30:53 +0200
pve-kernel (6.2.16-1) bookworm; urgency=medium
* update to Ubuntu-6.2.0-23.23 and pull in stable fixes up to v6.2.16
* build for Debian 12 Bookworm based releases
-- Proxmox Support Team <support@proxmox.com> Sat, 20 May 2023 19:23:34 +0200
pve-kernel (6.2.11-2) bullseye; urgency=medium
* backport "netfilter: nf_tables: deactivate anonymous set from preparation

1
debian/compat vendored
View File

@ -1 +0,0 @@
10

4
debian/control.in vendored
View File

@ -7,7 +7,7 @@ Build-Depends: asciidoc-base,
bc,
bison,
cpio,
debhelper (>= 10~),
debhelper-compat (= 12),
dh-python,
dwarves,
file,
@ -50,7 +50,7 @@ Section: devel
Priority: optional
Architecture: any
Provides: linux-headers-@KVNAME@-amd64,
Depends: coreutils | fileutils (>= 4.0), ${misc:Depends},
Depends: ${misc:Depends},
Description: Proxmox Kernel Headers
This package contains the linux kernel headers

View File

@ -1,6 +1,7 @@
#!/usr/bin/perl -w
#!/usr/bin/perl
use strict;
use warnings;
# Ignore all invocations except when called on to configure.
exit 0 unless $ARGV[0] =~ /configure/;
@ -16,10 +17,9 @@ system("depmod $version");
if (-d "/etc/kernel/postinst.d") {
print STDERR "Examining /etc/kernel/postinst.d.\n";
system ("run-parts --verbose --exit-on-error --arg=$version " .
"--arg=$imagedir/vmlinuz-$version " .
"/etc/kernel/postinst.d") &&
die "Failed to process /etc/kernel/postinst.d";
system(
"run-parts --verbose --exit-on-error --arg=$version --arg=$imagedir/vmlinuz-$version /etc/kernel/postinst.d"
) && die "Failed to process /etc/kernel/postinst.d";
}
exit 0

View File

@ -1,6 +1,7 @@
#!/usr/bin/perl -w
#!/usr/bin/perl
use strict;
use warnings;
# Ignore all 'upgrade' invocations .
exit 0 if $ARGV[0] =~ /upgrade/;
@ -11,10 +12,9 @@ my $version = "@@KVNAME@@";
if (-d "/etc/kernel/postrm.d") {
print STDERR "Examining /etc/kernel/postrm.d.\n";
system ("run-parts --verbose --exit-on-error --arg=$version " .
"--arg=$imagedir/vmlinuz-$version " .
"/etc/kernel/postrm.d") &&
die "Failed to process /etc/kernel/postrm.d";
system (
"run-parts --verbose --exit-on-error --arg=$version --arg=$imagedir/vmlinuz-$version /etc/kernel/postrm.d"
) && die "Failed to process /etc/kernel/postrm.d";
}
unlink "$imagedir/initrd.img-$version";
@ -25,15 +25,15 @@ unlink "/var/lib/initramfs-tools/$version";
exit 0 unless $ARGV[0] =~ /purge/;
my @files_to_remove = qw{
modules.dep modules.isapnpmap modules.pcimap
modules.usbmap modules.parportmap
modules.generic_string modules.ieee1394map
modules.ieee1394map modules.pnpbiosmap
modules.alias modules.ccwmap modules.inputmap
modules.symbols modules.ofmap
modules.seriomap modules.*.bin
modules.softdep modules.devname
};
modules.dep modules.isapnpmap modules.pcimap
modules.usbmap modules.parportmap
modules.generic_string modules.ieee1394map
modules.ieee1394map modules.pnpbiosmap
modules.alias modules.ccwmap modules.inputmap
modules.symbols modules.ofmap
modules.seriomap modules.*.bin
modules.softdep modules.devname
};
foreach my $extra_file (@files_to_remove) {
for (glob("/lib/modules/$version/$extra_file")) {

View File

@ -1,6 +1,7 @@
#!/usr/bin/perl -w
#!/usr/bin/perl
use strict;
use warnings;
# Ignore all invocations uxcept when called on to remove
exit 0 unless ($ARGV[0] && $ARGV[0] =~ /remove/) ;
@ -14,10 +15,9 @@ my $version = "@@KVNAME@@";
if (-d "/etc/kernel/prerm.d") {
print STDERR "Examining /etc/kernel/prerm.d.\n";
system ("run-parts --verbose --exit-on-error --arg=$version " .
"--arg=$imagedir/vmlinuz-$version " .
"/etc/kernel/prerm.d") &&
die "Failed to process /etc/kernel/prerm.d";
system(
"run-parts --verbose --exit-on-error --arg=$version --arg=$imagedir/vmlinuz-$version /etc/kernel/prerm.d"
) && die "Failed to process /etc/kernel/prerm.d";
}
exit 0

View File

@ -1,4 +1,7 @@
#!/usr/bin/perl -w
#!/usr/bin/perl
use strict;
use warnings;
my $abinew = shift;
my $abiold = shift;
@ -22,30 +25,30 @@ my $count;
print "II: Checking ABI...\n";
if ($skipabi) {
print "WW: Explicitly asked to ignore ABI, running in no-fail mode\n";
$fail_exit = 0;
$abiskip = 1;
$EE = "WW:";
print "WW: Explicitly asked to ignore ABI, running in no-fail mode\n";
$fail_exit = 0;
$abiskip = 1;
$EE = "WW:";
}
if ($prev_abistr ne $abistr) {
print "II: Different ABI's, running in no-fail mode\n";
$fail_exit = 0;
$EE = "WW:";
print "II: Different ABI's, running in no-fail mode\n";
$fail_exit = 0;
$EE = "WW:";
}
if (not -f "$abinew" or not -f "$abiold") {
print "EE: Previous or current ABI file missing!\n";
print " $abinew\n" if not -f "$abinew";
print " $abiold\n" if not -f "$abiold";
print "EE: Previous or current ABI file missing!\n";
print " $abinew\n" if not -f "$abinew";
print " $abiold\n" if not -f "$abiold";
# Exit if the ABI files are missing, but return status based on whether
# skip ABI was indicated.
if ("$abiskip" eq "1") {
exit(0);
} else {
exit(1);
}
# Exit if the ABI files are missing, but return status based on whether
# skip ABI was indicated.
if ("$abiskip" eq "1") {
exit(0);
} else {
exit(1);
}
}
my %symbols;
@ -57,101 +60,97 @@ my %module_syms;
my $ignore = 0;
print " Reading symbols/modules to ignore...";
for $file ("abi-blacklist") {
if (-f $file) {
open(IGNORE, "< $file") or
die "Could not open $file";
while (<IGNORE>) {
chomp;
if ($_ =~ m/M: (.*)/) {
$modules_ignore{$1} = 1;
} else {
$symbols_ignore{$_} = 1;
}
$ignore++;
}
close(IGNORE);
for my $file ("abi-blacklist") {
next if !-f $file;
open(my $IGNORE_FH, '<', $file) or die "Could not open $file - $!";
while (<$IGNORE_FH>) {
chomp;
if ($_ =~ m/M: (.*)/) {
$modules_ignore{$1} = 1;
} else {
$symbols_ignore{$_} = 1;
}
$ignore++;
}
close($IGNORE_FH);
}
print "read $ignore symbols/modules.\n";
sub is_ignored($$) {
my ($mod, $sym) = @_;
my ($mod, $sym) = @_;
die "Missing module name in is_ignored()" if not defined($mod);
die "Missing symbol name in is_ignored()" if not defined($sym);
die "Missing module name in is_ignored()" if not defined($mod);
die "Missing symbol name in is_ignored()" if not defined($sym);
if (defined($symbols_ignore{$sym}) or defined($modules_ignore{$mod})) {
return 1;
}
return 0;
if (defined($symbols_ignore{$sym}) or defined($modules_ignore{$mod})) {
return 1;
}
return 0;
}
# Read new syms first
print " Reading new symbols ($abistr)...";
$count = 0;
open(NEW, "< $abinew") or
die "Could not open $abinew";
while (<NEW>) {
chomp;
m/^(\S+)\s(.+)\s(0x[0-9a-f]+)\s(.+)$/;
$symbols{$4}{'type'} = $1;
$symbols{$4}{'loc'} = $2;
$symbols{$4}{'hash'} = $3;
$module_syms{$2} = 0;
$count++;
open(my $NEW_FH, '<', $abinew) or die "Could not open $abinew - $!";
while (<$NEW_FH>) {
chomp;
m/^(\S+)\s(.+)\s(0x[0-9a-f]+)\s(.+)$/;
$symbols{$4}{'type'} = $1;
$symbols{$4}{'loc'} = $2;
$symbols{$4}{'hash'} = $3;
$module_syms{$2} = 0;
$count++;
}
close(NEW);
close($NEW_FH);
print "read $count symbols.\n";
# Now the old symbols, checking for missing ones
print " Reading old symbols...";
$count = 0;
open(OLD, "< $abiold") or
die "Could not open $abiold";
while (<OLD>) {
chomp;
m/^(\S+)\s(.+)\s(0x[0-9a-f]+)\s(.+)$/;
$symbols{$4}{'old_type'} = $1;
$symbols{$4}{'old_loc'} = $2;
$symbols{$4}{'old_hash'} = $3;
$count++;
open(my $OLD_FH, '<', $abiold) or die "Could not open $abiold - $!";
while (<$OLD_FH>) {
chomp;
m/^(\S+)\s(.+)\s(0x[0-9a-f]+)\s(.+)$/;
$symbols{$4}{'old_type'} = $1;
$symbols{$4}{'old_loc'} = $2;
$symbols{$4}{'old_hash'} = $3;
$count++;
}
close(OLD);
close($OLD_FH);
print "read $count symbols.\n";
print "II: Checking for missing symbols in new ABI...";
$count = 0;
foreach $sym (keys(%symbols)) {
if (!defined($symbols{$sym}{'type'})) {
print "\n" if not $count;
printf(" MISS : %s%s\n", $sym,
is_ignored($symbols{$sym}{'old_loc'}, $sym) ? " (ignored)" : "");
$count++ if !is_ignored($symbols{$sym}{'old_loc'}, $sym);
}
for my $sym (keys(%symbols)) {
if (!defined($symbols{$sym}{'type'})) {
print "\n" if not $count;
printf(" MISS : %s%s\n", $sym, is_ignored($symbols{$sym}{'old_loc'}, $sym) ? " (ignored)" : "");
$count++ if !is_ignored($symbols{$sym}{'old_loc'}, $sym);
}
}
print " " if $count;
print "found $count missing symbols\n";
if ($count) {
print "$EE Symbols gone missing (what did you do!?!)\n";
$errors++;
print "$EE Symbols gone missing (what did you do!?!)\n";
$errors++;
}
print "II: Checking for new symbols in new ABI...";
$count = 0;
foreach $sym (keys(%symbols)) {
if (!defined($symbols{$sym}{'old_type'})) {
print "\n" if not $count;
print " NEW : $sym\n";
$count++;
}
for my $sym (keys(%symbols)) {
if (!defined($symbols{$sym}{'old_type'})) {
print "\n" if not $count;
print " NEW : $sym\n";
$count++;
}
}
print " " if $count;
print "found $count new symbols\n";
if ($count) {
print "WW: Found new symbols. Not recommended unless ABI was bumped\n";
print "WW: Found new symbols. Not recommended unless ABI was bumped\n";
}
print "II: Checking for changes to ABI...\n";
@ -159,37 +158,34 @@ $count = 0;
my $moved = 0;
my $changed_type = 0;
my $changed_hash = 0;
foreach $sym (keys(%symbols)) {
if (!defined($symbols{$sym}{'old_type'}) or
!defined($symbols{$sym}{'type'})) {
next;
}
for my $sym (keys(%symbols)) {
if (!defined($symbols{$sym}{'old_type'}) or !defined($symbols{$sym}{'type'})) {
next;
}
# Changes in location don't hurt us, but log it anyway
if ($symbols{$sym}{'loc'} ne $symbols{$sym}{'old_loc'}) {
printf(" MOVE : %-40s : %s => %s\n", $sym, $symbols{$sym}{'old_loc'},
$symbols{$sym}{'loc'});
$moved++;
}
# Changes in location don't hurt us, but log it anyway
if ($symbols{$sym}{'loc'} ne $symbols{$sym}{'old_loc'}) {
printf(" MOVE : %-40s : %s => %s\n", $sym, $symbols{$sym}{'old_loc'}, $symbols{$sym}{'loc'});
$moved++;
}
# Changes to export type are only bad if new type isn't
# EXPORT_SYMBOL. Changing things to GPL are bad.
if ($symbols{$sym}{'type'} ne $symbols{$sym}{'old_type'}) {
printf(" TYPE : %-40s : %s => %s%s\n", $sym, $symbols{$sym}{'old_type'}.
$symbols{$sym}{'type'}, is_ignored($symbols{$sym}{'loc'}, $sym)
? " (ignored)" : "");
$changed_type++ if $symbols{$sym}{'type'} ne "EXPORT_SYMBOL"
and !is_ignored($symbols{$sym}{'loc'}, $sym);
}
# Changes to export type are only bad if new type isn't
# EXPORT_SYMBOL. Changing things to GPL are bad.
if ($symbols{$sym}{'type'} ne $symbols{$sym}{'old_type'}) {
printf(" TYPE : %-40s : %s => %s%s\n", $sym, $symbols{$sym}{'old_type'}.
$symbols{$sym}{'type'}, is_ignored($symbols{$sym}{'loc'}, $sym)
? " (ignored)" : "");
$changed_type++ if $symbols{$sym}{'type'} ne "EXPORT_SYMBOL" and !is_ignored($symbols{$sym}{'loc'}, $sym);
}
# Changes to the hash are always bad
if ($symbols{$sym}{'hash'} ne $symbols{$sym}{'old_hash'}) {
printf(" HASH : %-40s : %s => %s%s\n", $sym, $symbols{$sym}{'old_hash'},
$symbols{$sym}{'hash'}, is_ignored($symbols{$sym}{'loc'}, $sym)
? " (ignored)" : "");
$changed_hash++ if !is_ignored($symbols{$sym}{'loc'}, $sym);
$module_syms{$symbols{$sym}{'loc'}}++;
}
# Changes to the hash are always bad
if ($symbols{$sym}{'hash'} ne $symbols{$sym}{'old_hash'}) {
printf(" HASH : %-40s : %s => %s%s\n", $sym, $symbols{$sym}{'old_hash'},
$symbols{$sym}{'hash'}, is_ignored($symbols{$sym}{'loc'}, $sym)
? " (ignored)" : "");
$changed_hash++ if !is_ignored($symbols{$sym}{'loc'}, $sym);
$module_syms{$symbols{$sym}{'loc'}}++;
}
}
print "WW: $moved symbols changed location\n" if $moved;
@ -198,17 +194,17 @@ print "$EE $changed_hash symbols changed hash and weren't ignored\n" if $changed
$errors++ if $changed_hash or $changed_type;
if ($changed_hash) {
print "II: Module hash change summary...\n";
foreach $mod (sort { $module_syms{$b} <=> $module_syms{$a} } keys %module_syms) {
next if ! $module_syms{$mod};
printf(" %-40s: %d\n", $mod, $module_syms{$mod});
}
print "II: Module hash change summary...\n";
for my $mod (sort { $module_syms{$b} <=> $module_syms{$a} } keys %module_syms) {
next if ! $module_syms{$mod};
printf(" %-40s: %d\n", $mod, $module_syms{$mod});
}
}
print "II: Done\n";
if ($errors) {
exit($fail_exit);
exit($fail_exit);
} else {
exit(0);
exit(0);
}

View File

@ -1,8 +1,11 @@
#!/usr/bin/perl -w
#!/usr/bin/perl
use PVE::Tools;
use strict;
use warnings;
use IO::File;
use PVE::Tools ();
use IO::File ();
sub usage {
die "USAGE: $0 INFILE OUTFILE [ABI INFILE-IS-DEB]\n";

View File

@ -1,6 +1,7 @@
#!/usr/bin/perl -w
#!/usr/bin/perl
use strict;
use warnings;
my $dir = shift;
@ -12,21 +13,21 @@ warn "\n\nNOTE: strange directory name: $dir\n\n" if $dir !~ m|^(.*/)?(\d+.\d+.\
my $apiver = $2;
open(TMP, "find '$dir' -name '*.ko'|");
while (defined(my $fn = <TMP>)) {
open(my $FIND_KO_FH, "find '$dir' -name '*.ko'|");
while (defined(my $fn = <$FIND_KO_FH>)) {
chomp $fn;
my $relfn = $fn;
$relfn =~ s|^$dir/*||;
my $cmd = "/sbin/modinfo -F firmware '$fn'";
open(MOD, "$cmd|");
while (defined(my $fw = <MOD>)) {
open(my $MOD_FH, "$cmd|");
while (defined(my $fw = <$MOD_FH>)) {
chomp $fw;
print "$fw $relfn\n";
}
close(MOD);
close($MOD_FH);
}
close TMP;
close($FIND_KO_FH);
exit 0;

File diff suppressed because it is too large Load Diff

View File

@ -21,7 +21,7 @@ Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/init/Makefile b/init/Makefile
index 26de459006c4..3157d9c79901 100644
index ec557ada3c12..72095034f338 100644
--- a/init/Makefile
+++ b/init/Makefile
@@ -29,7 +29,7 @@ preempt-flag-$(CONFIG_PREEMPT_DYNAMIC) := PREEMPT_DYNAMIC

View File

@ -55,10 +55,10 @@ Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2 files changed, 111 insertions(+)
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 2e77ecc12692..eae6fdc4c683 100644
index 5d47f23514d0..f06df077504b 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -4188,6 +4188,15 @@
@@ -4210,6 +4210,15 @@
Also, it enforces the PCI Local Bus spec
rule that those bits should be 0 in system reset
events (useful for kexec/kdump cases).
@ -75,7 +75,7 @@ index 2e77ecc12692..eae6fdc4c683 100644
Safety option to keep boot IRQs enabled. This
should never be necessary.
diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 267e6002e29f..fac76ca1d16a 100644
index 592e1c4ae697..aebf6f412203 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -194,6 +194,106 @@ static int __init pci_apply_final_quirks(void)
@ -185,7 +185,7 @@ index 267e6002e29f..fac76ca1d16a 100644
/*
* Decoding should be disabled for a PCI device during BAR sizing to avoid
* conflict. But doing so may cause problems on host bridge and perhaps other
@@ -4959,6 +5059,8 @@ static const struct pci_dev_acs_enabled {
@@ -4974,6 +5074,8 @@ static const struct pci_dev_acs_enabled {
{ PCI_VENDOR_ID_CAVIUM, 0xA060, pci_quirk_mf_endpoint_acs },
/* APM X-Gene */
{ PCI_VENDOR_ID_AMCC, 0xE004, pci_quirk_xgene_acs },

View File

@ -13,7 +13,7 @@ Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 07aae60288f9..949b7204cf52 100644
index 73fad57408f7..99ae3e468ce6 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -79,7 +79,7 @@ module_param(halt_poll_ns, uint, 0644);

View File

@ -14,10 +14,10 @@ Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/core/dev.c b/net/core/dev.c
index fce980d531bd..5079a3851798 100644
index 555bbe774734..de2e0d0185fc 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -10257,7 +10257,7 @@ static struct net_device *netdev_wait_allrefs_any(struct list_head *list)
@@ -10262,7 +10262,7 @@ static struct net_device *netdev_wait_allrefs_any(struct list_head *list)
if (time_after(jiffies, warning_time +
READ_ONCE(netdev_unregister_timeout_secs) * HZ)) {
list_for_each_entry(dev, list, todo_list) {

View File

@ -0,0 +1,133 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Thomas Lamprecht <t.lamprecht@proxmox.com>
Date: Fri, 14 Jul 2023 18:10:32 +0200
Subject: [PATCH] kvm: xsave set: mask-out PKRU bit in xfeatures if vCPU has no
support
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Fixes live-migrations & snapshot-rollback of VMs with a restricted
CPU type (e.g., qemu64) from our 5.15 based kernel (default Proxmox
VE 7.4) to the 6.2 (and future newer) of Proxmox VE 8.0.
Previous to ad856280ddea ("x86/kvm/fpu: Limit guest user_xfeatures to
supported bits of XCR0") the PKRU bit of the host could leak into the
state from the guest, which caused trouble when migrating between
hosts with different CPUs, i.e., where the source supported it but
the target did not, causing a general protection fault when the guest
tried to use a pkru related instruction after the migration.
But the fix, while welcome, caused a temporary out-of-sync state when
migrating such a VM from a kernel without the fix to a kernel with
the fix, as it threw of KVM when the CPUID of the guest and most of
the state doesn't report XSAVE and thus any xfeatures, but PKRU and
the related state is set as enabled, causing the vCPU to spin at 100%
without any progress forever.
The fix could be at two sites, either in QEMU or in the kernel, I
choose the kernel as we have all the info there for a targeted
heuristic so that we don't have to adapt QEMU and qemu-server, the
latter even on both sides.
Still, a short summary of the possible fixes and short drawbacks:
* on QEMU-side either
- clear the PKRU state in the migration saved state would be rather
complicated to implement as the vCPU is initialised way before we
have the saved xfeature state available to check what we'd need
to do, plus the user-space only gets a memory blob from ioctl
KVM_GET_XSAVE2 that it passes to KVM_SET_XSAVE ioctl, there are
no ABI guarantees, and while the struct seem stable for 5.15 to
6.5-rc1, that doesn't has to be for future kernels, so off the
table.
- enforce that the CPUID reports PKU support even if it normally
wouldn't. While this works (tested by hard-coding it as POC) it
is a) not really nice and b) needs some interaction from
qemu-server to enable this flag as otherwise we have no good info
to decide when it's OK to do this, which means we need to adapt
both PVE 7 and 8's qemu-server and also pve-qemu, workable but
not optimal
* on Kernel/KVM-side we can hook into the set XSAVE ioctl specific to
the KVM subsystem, which already reduces chance of regression for
all other places. There we have access to the union/struct
definitions of the saved state and thus can savely cast to that.
We also got access to the vCPU's CPUID capabilities, meaning we can
check if the XCR0 (first XSAVE Control Register) reports
that it support the PKRU feature, and if it does *NOT* but the
saved xfeatures register from XSAVE *DOES* report it, we can safely
assume that this combination is due to an migration from an older,
leaky kernel and clear the bit in the xfeature register before
restoring it to the guest vCPU KVM state, avoiding the confusing
situation that made the vCPU spin at 100%.
This should be safe to do, as the guest vCPU CPUID never reported
support for the PKRU feature, and it's also a relatively niche and
newish feature.
If it gains us something we can drop this patch a bit in the future
Proxmox VE 9 major release, but we should ensure that VMs that where
started before PVE 8 cannot be directly live-migrated to the release
that includes that change; so we should rather only drop it if the
maintenance burden is high.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
---
arch/x86/kvm/cpuid.c | 6 ++++++
arch/x86/kvm/cpuid.h | 2 ++
arch/x86/kvm/x86.c | 13 +++++++++++++
3 files changed, 21 insertions(+)
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 7ccdf991d18e..61aefeb3fdbc 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -251,6 +251,12 @@ static u64 cpuid_get_supported_xcr0(struct kvm_cpuid_entry2 *entries, int nent)
return (best->eax | ((u64)best->edx << 32)) & kvm_caps.supported_xcr0;
}
+bool vcpu_supports_xsave_pkru(struct kvm_vcpu *vcpu) {
+ u64 guest_supported_xcr0 = cpuid_get_supported_xcr0(
+ vcpu->arch.cpuid_entries, vcpu->arch.cpuid_nent);
+ return (guest_supported_xcr0 & XFEATURE_MASK_PKRU) != 0;
+}
+
static void __kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu, struct kvm_cpuid_entry2 *entries,
int nent)
{
diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
index b1658c0de847..12a02851ff57 100644
--- a/arch/x86/kvm/cpuid.h
+++ b/arch/x86/kvm/cpuid.h
@@ -32,6 +32,8 @@ int kvm_vcpu_ioctl_get_cpuid2(struct kvm_vcpu *vcpu,
bool kvm_cpuid(struct kvm_vcpu *vcpu, u32 *eax, u32 *ebx,
u32 *ecx, u32 *edx, bool exact_only);
+bool vcpu_supports_xsave_pkru(struct kvm_vcpu *vcpu);
+
u32 xstate_required_size(u64 xstate_bv, bool compacted);
int cpuid_query_maxphyaddr(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ee603f4edce1..ff92ff41d5ce 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5342,6 +5342,19 @@ static int kvm_vcpu_ioctl_x86_set_xsave(struct kvm_vcpu *vcpu,
if (fpstate_is_confidential(&vcpu->arch.guest_fpu))
return 0;
+ if (!vcpu_supports_xsave_pkru(vcpu)) {
+ void *buf = guest_xsave->region;
+ union fpregs_state *ustate = buf;
+ if (ustate->xsave.header.xfeatures & XFEATURE_MASK_PKRU) {
+ printk(
+ KERN_NOTICE "clearing PKRU xfeature bit as vCPU from PID %d"
+ " reports no PKRU support - migration from fpu-leaky kernel?",
+ current->pid
+ );
+ ustate->xsave.header.xfeatures &= ~XFEATURE_MASK_PKRU;
+ }
+ }
+
return fpu_copy_uabi_to_guest_fpstate(&vcpu->arch.guest_fpu,
guest_xsave->region,
kvm_caps.supported_xcr0,

View File

@ -1,120 +0,0 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Pablo Neira Ayuso <pablo@netfilter.org>
Date: Tue, 2 May 2023 10:25:24 +0200
Subject: [PATCH] netfilter: nf_tables: deactivate anonymous set from
preparation phase
Toggle deleted anonymous sets as inactive in the next generation, so
users cannot perform any update on it. Clear the generation bitmask
in case the transaction is aborted.
The following KASAN splat shows a set element deletion for a bound
anonymous set that has been already removed in the same transaction.
[ 64.921510] ==================================================================
[ 64.923123] BUG: KASAN: wild-memory-access in nf_tables_commit+0xa24/0x1490 [nf_tables]
[ 64.924745] Write of size 8 at addr dead000000000122 by task test/890
[ 64.927903] CPU: 3 PID: 890 Comm: test Not tainted 6.3.0+ #253
[ 64.931120] Call Trace:
[ 64.932699] <TASK>
[ 64.934292] dump_stack_lvl+0x33/0x50
[ 64.935908] ? nf_tables_commit+0xa24/0x1490 [nf_tables]
[ 64.937551] kasan_report+0xda/0x120
[ 64.939186] ? nf_tables_commit+0xa24/0x1490 [nf_tables]
[ 64.940814] nf_tables_commit+0xa24/0x1490 [nf_tables]
[ 64.942452] ? __kasan_slab_alloc+0x2d/0x60
[ 64.944070] ? nf_tables_setelem_notify+0x190/0x190 [nf_tables]
[ 64.945710] ? kasan_set_track+0x21/0x30
[ 64.947323] nfnetlink_rcv_batch+0x709/0xd90 [nfnetlink]
[ 64.948898] ? nfnetlink_rcv_msg+0x480/0x480 [nfnetlink]
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
---
include/net/netfilter/nf_tables.h | 1 +
net/netfilter/nf_tables_api.c | 12 ++++++++++++
net/netfilter/nft_dynset.c | 2 +-
net/netfilter/nft_lookup.c | 2 +-
net/netfilter/nft_objref.c | 2 +-
5 files changed, 16 insertions(+), 3 deletions(-)
diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h
index 9430128aae99..06815130e861 100644
--- a/include/net/netfilter/nf_tables.h
+++ b/include/net/netfilter/nf_tables.h
@@ -619,6 +619,7 @@ struct nft_set_binding {
};
enum nft_trans_phase;
+void nf_tables_activate_set(const struct nft_ctx *ctx, struct nft_set *set);
void nf_tables_deactivate_set(const struct nft_ctx *ctx, struct nft_set *set,
struct nft_set_binding *binding,
enum nft_trans_phase phase);
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index 6023c9f72cdc..26255c2a6692 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -4932,12 +4932,24 @@ static void nf_tables_unbind_set(const struct nft_ctx *ctx, struct nft_set *set,
}
}
+void nf_tables_activate_set(const struct nft_ctx *ctx, struct nft_set *set)
+{
+ if (nft_set_is_anonymous(set))
+ nft_clear(ctx->net, set);
+
+ set->use++;
+}
+EXPORT_SYMBOL_GPL(nf_tables_activate_set);
+
void nf_tables_deactivate_set(const struct nft_ctx *ctx, struct nft_set *set,
struct nft_set_binding *binding,
enum nft_trans_phase phase)
{
switch (phase) {
case NFT_TRANS_PREPARE:
+ if (nft_set_is_anonymous(set))
+ nft_deactivate_next(ctx->net, set);
+
set->use--;
return;
case NFT_TRANS_ABORT:
diff --git a/net/netfilter/nft_dynset.c b/net/netfilter/nft_dynset.c
index 274579b1696e..bd19c7aec92e 100644
--- a/net/netfilter/nft_dynset.c
+++ b/net/netfilter/nft_dynset.c
@@ -342,7 +342,7 @@ static void nft_dynset_activate(const struct nft_ctx *ctx,
{
struct nft_dynset *priv = nft_expr_priv(expr);
- priv->set->use++;
+ nf_tables_activate_set(ctx, priv->set);
}
static void nft_dynset_destroy(const struct nft_ctx *ctx,
diff --git a/net/netfilter/nft_lookup.c b/net/netfilter/nft_lookup.c
index cae5a6724163..925392bab58a 100644
--- a/net/netfilter/nft_lookup.c
+++ b/net/netfilter/nft_lookup.c
@@ -167,7 +167,7 @@ static void nft_lookup_activate(const struct nft_ctx *ctx,
{
struct nft_lookup *priv = nft_expr_priv(expr);
- priv->set->use++;
+ nf_tables_activate_set(ctx, priv->set);
}
static void nft_lookup_destroy(const struct nft_ctx *ctx,
diff --git a/net/netfilter/nft_objref.c b/net/netfilter/nft_objref.c
index 7b01aa2ef653..d985d361ed8a 100644
--- a/net/netfilter/nft_objref.c
+++ b/net/netfilter/nft_objref.c
@@ -185,7 +185,7 @@ static void nft_objref_map_activate(const struct nft_ctx *ctx,
{
struct nft_objref_map *priv = nft_expr_priv(expr);
- priv->set->use++;
+ nf_tables_activate_set(ctx, priv->set);
}
static void nft_objref_map_destroy(const struct nft_ctx *ctx,

View File

@ -0,0 +1,41 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: kiler129 <grzegorz@noflash.pl>
Date: Mon, 18 Sep 2023 15:19:26 +0200
Subject: [PATCH] allow opt-in to allow pass-through on broken hardware..
adapted from https://github.com/kiler129/relax-intel-rmrr , licensed under MIT or GPL 2.0+
---
drivers/iommu/intel/iommu.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 1c5ba4dbfe78..887667218e3b 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -297,6 +297,7 @@ static int dmar_map_gfx = 1;
static int dmar_map_ipu = 1;
static int intel_iommu_superpage = 1;
static int iommu_identity_mapping;
+static int intel_relaxable_rmrr = 0;
static int iommu_skip_te_disable;
#define IDENTMAP_GFX 2
@@ -358,6 +359,9 @@ static int __init intel_iommu_setup(char *str)
} else if (!strncmp(str, "tboot_noforce", 13)) {
pr_info("Intel-IOMMU: not forcing on after tboot. This could expose security risk for tboot\n");
intel_iommu_tboot_noforce = 1;
+ } else if (!strncmp(str, "relax_rmrr", 10)) {
+ pr_info("Intel-IOMMU: assuming all RMRRs are relaxable. This can lead to instability or data loss\n");
+ intel_relaxable_rmrr = 1;
} else {
pr_notice("Unknown option - '%s'\n", str);
}
@@ -2538,7 +2542,7 @@ static bool device_rmrr_is_relaxable(struct device *dev)
return false;
pdev = to_pci_dev(dev);
- if (IS_USB_DEVICE(pdev) || IS_GFX_DEVICE(pdev))
+ if (intel_relaxable_rmrr || IS_USB_DEVICE(pdev) || IS_GFX_DEVICE(pdev))
return true;
else
return false;

View File

@ -0,0 +1,42 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Mika Westerberg <mika.westerberg@linux.intel.com>
Date: Wed, 13 Sep 2023 08:26:47 +0300
Subject: [PATCH] net: thunderbolt: Fix TCPv6 GSO checksum calculation
Alex reported that running ssh over IPv6 does not work with
Thunderbolt/USB4 networking driver. The reason for that is that driver
should call skb_is_gso() before calling skb_is_gso_v6(), and it should
not return false after calculates the checksum successfully. This probably
was a copy paste error from the original driver where it was done properly.
Reported-by: Alex Balcanquall <alex@alexbal.com>
Fixes: e69b6c02b4c3 ("net: Add support for networking over Thunderbolt cable")
Cc: stable@vger.kernel.org
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
---
drivers/net/thunderbolt.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/net/thunderbolt.c b/drivers/net/thunderbolt.c
index 990484776f2d..0c554a7a5ce4 100644
--- a/drivers/net/thunderbolt.c
+++ b/drivers/net/thunderbolt.c
@@ -1005,12 +1005,11 @@ static bool tbnet_xmit_csum_and_map(struct tbnet *net, struct sk_buff *skb,
*tucso = ~csum_tcpudp_magic(ip_hdr(skb)->saddr,
ip_hdr(skb)->daddr, 0,
ip_hdr(skb)->protocol, 0);
- } else if (skb_is_gso_v6(skb)) {
+ } else if (skb_is_gso(skb) && skb_is_gso_v6(skb)) {
tucso = dest + ((void *)&(tcp_hdr(skb)->check) - data);
*tucso = ~csum_ipv6_magic(&ipv6_hdr(skb)->saddr,
&ipv6_hdr(skb)->daddr, 0,
IPPROTO_TCP, 0);
- return false;
} else if (protocol == htons(ETH_P_IPV6)) {
tucso = dest + skb_checksum_start_offset(skb) + skb->csum_offset;
*tucso = ~csum_ipv6_magic(&ipv6_hdr(skb)->saddr,

View File

@ -0,0 +1,134 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Mika Westerberg <mika.westerberg@linux.intel.com>
Date: Thu, 7 Sep 2023 16:02:30 +0300
Subject: [PATCH] thunderbolt: Restart XDomain discovery handshake after
failure
Alex reported that after rebooting the other host the peer-to-peer link
does not come up anymore. The reason for this is that the host that was
not rebooted tries to send the UUID request only 10 times according to
the USB4 Inter-Domain spec and gives up if it does not get reply. Then
when the other side is actually ready it cannot get the link established
anymore. The USB4 Inter-Domain spec requires that the discovery protocol
is restarted in that case so implement this now.
Reported-by: Alex Balcanquall <alex@alexbal.com>
Fixes: 8e1de7042596 ("thunderbolt: Add support for XDomain lane bonding")
Cc: stable@vger.kernel.org
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
---
drivers/thunderbolt/xdomain.c | 58 +++++++++++++++++++++++++----------
1 file changed, 41 insertions(+), 17 deletions(-)
diff --git a/drivers/thunderbolt/xdomain.c b/drivers/thunderbolt/xdomain.c
index 3c51e47dd86b..0b17a4d4e9b9 100644
--- a/drivers/thunderbolt/xdomain.c
+++ b/drivers/thunderbolt/xdomain.c
@@ -704,6 +704,27 @@ static void update_property_block(struct tb_xdomain *xd)
mutex_unlock(&xdomain_lock);
}
+static void start_handshake(struct tb_xdomain *xd)
+{
+ xd->state = XDOMAIN_STATE_INIT;
+ queue_delayed_work(xd->tb->wq, &xd->state_work,
+ msecs_to_jiffies(XDOMAIN_SHORT_TIMEOUT));
+}
+
+/* Can be called from state_work */
+static void __stop_handshake(struct tb_xdomain *xd)
+{
+ cancel_delayed_work_sync(&xd->properties_changed_work);
+ xd->properties_changed_retries = 0;
+ xd->state_retries = 0;
+}
+
+static void stop_handshake(struct tb_xdomain *xd)
+{
+ cancel_delayed_work_sync(&xd->state_work);
+ __stop_handshake(xd);
+}
+
static void tb_xdp_handle_request(struct work_struct *work)
{
struct xdomain_request_work *xw = container_of(work, typeof(*xw), work);
@@ -766,6 +787,15 @@ static void tb_xdp_handle_request(struct work_struct *work)
case UUID_REQUEST:
tb_dbg(tb, "%llx: received XDomain UUID request\n", route);
ret = tb_xdp_uuid_response(ctl, route, sequence, uuid);
+ /*
+ * If we've stopped the discovery with an error such as
+ * timing out, we will restart the handshake now that we
+ * received UUID request from the remote host.
+ */
+ if (!ret && xd && xd->state == XDOMAIN_STATE_ERROR) {
+ dev_dbg(&xd->dev, "restarting handshake\n");
+ start_handshake(xd);
+ }
break;
case LINK_STATE_STATUS_REQUEST:
@@ -1522,6 +1552,13 @@ static void tb_xdomain_queue_properties_changed(struct tb_xdomain *xd)
msecs_to_jiffies(XDOMAIN_SHORT_TIMEOUT));
}
+static void tb_xdomain_failed(struct tb_xdomain *xd)
+{
+ xd->state = XDOMAIN_STATE_ERROR;
+ queue_delayed_work(xd->tb->wq, &xd->state_work,
+ msecs_to_jiffies(XDOMAIN_DEFAULT_TIMEOUT));
+}
+
static void tb_xdomain_state_work(struct work_struct *work)
{
struct tb_xdomain *xd = container_of(work, typeof(*xd), state_work.work);
@@ -1548,7 +1585,7 @@ static void tb_xdomain_state_work(struct work_struct *work)
if (ret) {
if (ret == -EAGAIN)
goto retry_state;
- xd->state = XDOMAIN_STATE_ERROR;
+ tb_xdomain_failed(xd);
} else {
tb_xdomain_queue_properties_changed(xd);
if (xd->bonding_possible)
@@ -1613,7 +1650,7 @@ static void tb_xdomain_state_work(struct work_struct *work)
if (ret) {
if (ret == -EAGAIN)
goto retry_state;
- xd->state = XDOMAIN_STATE_ERROR;
+ tb_xdomain_failed(xd);
} else {
xd->state = XDOMAIN_STATE_ENUMERATED;
}
@@ -1624,6 +1661,8 @@ static void tb_xdomain_state_work(struct work_struct *work)
break;
case XDOMAIN_STATE_ERROR:
+ dev_dbg(&xd->dev, "discovery failed, stopping handshake\n");
+ __stop_handshake(xd);
break;
default:
@@ -1793,21 +1832,6 @@ static void tb_xdomain_release(struct device *dev)
kfree(xd);
}
-static void start_handshake(struct tb_xdomain *xd)
-{
- xd->state = XDOMAIN_STATE_INIT;
- queue_delayed_work(xd->tb->wq, &xd->state_work,
- msecs_to_jiffies(XDOMAIN_SHORT_TIMEOUT));
-}
-
-static void stop_handshake(struct tb_xdomain *xd)
-{
- cancel_delayed_work_sync(&xd->properties_changed_work);
- cancel_delayed_work_sync(&xd->state_work);
- xd->properties_changed_retries = 0;
- xd->state_retries = 0;
-}
-
static int __maybe_unused tb_xdomain_suspend(struct device *dev)
{
stop_handshake(tb_to_xdomain(dev));

View File

@ -0,0 +1,72 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: "Borislav Petkov (AMD)" <bp@alien8.de>
Date: Sat, 7 Oct 2023 12:57:02 +0200
Subject: [PATCH] x86/cpu: Fix AMD erratum #1485 on Zen4-based CPUs
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Fix erratum #1485 on Zen4 parts where running with STIBP disabled can
cause an #UD exception. The performance impact of the fix is negligible.
Reported-by: René Rebe <rene@exactcode.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Tested-by: René Rebe <rene@exactcode.de>
Cc: <stable@kernel.org>
Link: https://lore.kernel.org/r/D99589F4-BC5D-430B-87B2-72C20370CF57@exactcode.com
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
---
arch/x86/include/asm/msr-index.h | 9 +++++++--
arch/x86/kernel/cpu/amd.c | 8 ++++++++
2 files changed, 15 insertions(+), 2 deletions(-)
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index ebbf80d8b8bd..a79b10e57757 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -630,12 +630,17 @@
/* AMD Last Branch Record MSRs */
#define MSR_AMD64_LBR_SELECT 0xc000010e
-/* Fam 17h MSRs */
-#define MSR_F17H_IRPERF 0xc00000e9
+/* Zen4 */
+#define MSR_ZEN4_BP_CFG 0xc001102e
+#define MSR_ZEN4_BP_CFG_SHARED_BTB_FIX_BIT 5
+/* Zen 2 */
#define MSR_ZEN2_SPECTRAL_CHICKEN 0xc00110e3
#define MSR_ZEN2_SPECTRAL_CHICKEN_BIT BIT_ULL(1)
+/* Fam 17h MSRs */
+#define MSR_F17H_IRPERF 0xc00000e9
+
/* Fam 16h MSRs */
#define MSR_F16H_L2I_PERF_CTL 0xc0010230
#define MSR_F16H_L2I_PERF_CTR 0xc0010231
diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index a608a2b78073..154e9c0c16bd 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -80,6 +80,10 @@ static const int amd_div0[] =
AMD_LEGACY_ERRATUM(AMD_MODEL_RANGE(0x17, 0x00, 0x0, 0x2f, 0xf),
AMD_MODEL_RANGE(0x17, 0x50, 0x0, 0x5f, 0xf));
+static const int amd_erratum_1485[] =
+ AMD_LEGACY_ERRATUM(AMD_MODEL_RANGE(0x19, 0x10, 0x0, 0x1f, 0xf),
+ AMD_MODEL_RANGE(0x19, 0x60, 0x0, 0xaf, 0xf));
+
static bool cpu_has_amd_erratum(struct cpuinfo_x86 *cpu, const int *erratum)
{
int osvw_id = *erratum++;
@@ -1125,6 +1129,10 @@ static void init_amd(struct cpuinfo_x86 *c)
pr_notice_once("AMD Zen1 DIV0 bug detected. Disable SMT for full protection.\n");
setup_force_cpu_bug(X86_BUG_DIV0);
}
+
+ if (!cpu_has(c, X86_FEATURE_HYPERVISOR) &&
+ cpu_has_amd_erratum(c, amd_erratum_1485))
+ msr_set_bit(MSR_ZEN4_BP_CFG, MSR_ZEN4_BP_CFG_SHARED_BTB_FIX_BIT);
}
#ifdef CONFIG_X86_32

View File

@ -0,0 +1,46 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Stefan Sterz <s.sterz@proxmox.com>
Date: Wed, 18 Oct 2023 10:45:45 +0200
Subject: [PATCH] Revert "nSVM: Check for reserved encodings of TLB_CONTROL in
nested VMCB"
This reverts commit 174a921b6975ef959dd82ee9e8844067a62e3ec1.
Signed-off-by: Stefan Sterz <s.sterz@proxmox.com>
---
arch/x86/kvm/svm/nested.c | 15 ---------------
1 file changed, 15 deletions(-)
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index add65dd59756..61a6c0235519 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -242,18 +242,6 @@ static bool nested_svm_check_bitmap_pa(struct kvm_vcpu *vcpu, u64 pa, u32 size)
kvm_vcpu_is_legal_gpa(vcpu, addr + size - 1);
}
-static bool nested_svm_check_tlb_ctl(struct kvm_vcpu *vcpu, u8 tlb_ctl)
-{
- /* Nested FLUSHBYASID is not supported yet. */
- switch(tlb_ctl) {
- case TLB_CONTROL_DO_NOTHING:
- case TLB_CONTROL_FLUSH_ALL_ASID:
- return true;
- default:
- return false;
- }
-}
-
static bool __nested_vmcb_check_controls(struct kvm_vcpu *vcpu,
struct vmcb_ctrl_area_cached *control)
{
@@ -273,9 +261,6 @@ static bool __nested_vmcb_check_controls(struct kvm_vcpu *vcpu,
IOPM_SIZE)))
return false;
- if (CC(!nested_svm_check_tlb_ctl(vcpu, control->tlb_ctl)))
- return false;
-
return true;
}

View File

@ -0,0 +1,36 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc@google.com>
Date: Wed, 18 Oct 2023 12:41:04 -0700
Subject: [PATCH] KVM: nSVM: Advertise support for flush-by-ASID
Advertise support for FLUSHBYASID when nested SVM is enabled, as KVM can
always emulate flushing TLB entries for a vmcb12 ASID, e.g. by running L2
with a new, fresh ASID in vmcb02. Some modern hypervisors, e.g. VMWare
Workstation 17, require FLUSHBYASID support and will refuse to run if it's
not present.
Punt on proper support, as "Honor L1's request to flush an ASID on nested
VMRUN" is one of the TODO items in the (incomplete) list of issues that
need to be addressed in order for KVM to NOT do a full TLB flush on every
nested SVM transition (see nested_svm_transition_tlb_flush()).
Reported-by: Stefan Sterz <s.sterz@proxmox.com>
Closes: https://lkml.kernel.org/r/b9915c9c-4cf6-051a-2d91-44cc6380f455%40proxmox.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Stefan Sterz <s.sterz@proxmox.com>
---
arch/x86/kvm/svm/svm.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index cf31babfbbb9..99a7e93b2edf 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4920,6 +4920,7 @@ static __init void svm_set_cpu_caps(void)
if (nested) {
kvm_cpu_cap_set(X86_FEATURE_SVM);
kvm_cpu_cap_set(X86_FEATURE_VMCBCLEAN);
+ kvm_cpu_cap_set(X86_FEATURE_FLUSHBYASID);
if (nrips)
kvm_cpu_cap_set(X86_FEATURE_NRIPS);

View File

@ -0,0 +1,164 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc@google.com>
Date: Wed, 27 Sep 2023 17:19:52 -0700
Subject: [PATCH] x86/fpu: Allow caller to constrain xfeatures when copying to
uabi buffer
Plumb an xfeatures mask into __copy_xstate_to_uabi_buf() so that KVM can
constrain which xfeatures are saved into the userspace buffer without
having to modify the user_xfeatures field in KVM's guest_fpu state.
KVM's ABI for KVM_GET_XSAVE{2} is that features that are not exposed to
guest must not show up in the effective xstate_bv field of the buffer.
Saving only the guest-supported xfeatures allows userspace to load the
saved state on a different host with a fewer xfeatures, so long as the
target host supports the xfeatures that are exposed to the guest.
KVM currently sets user_xfeatures directly to restrict KVM_GET_XSAVE{2} to
the set of guest-supported xfeatures, but doing so broke KVM's historical
ABI for KVM_SET_XSAVE, which allows userspace to load any xfeatures that
are supported by the *host*.
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20230928001956.924301-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 18164f66e6c59fda15c198b371fa008431efdb22)
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
---
arch/x86/include/asm/fpu/api.h | 3 ++-
arch/x86/kernel/fpu/core.c | 5 +++--
arch/x86/kernel/fpu/xstate.c | 7 +++++--
arch/x86/kernel/fpu/xstate.h | 3 ++-
arch/x86/kvm/x86.c | 21 +++++++++------------
5 files changed, 21 insertions(+), 18 deletions(-)
diff --git a/arch/x86/include/asm/fpu/api.h b/arch/x86/include/asm/fpu/api.h
index b475d9a582b8..e829fa4c6788 100644
--- a/arch/x86/include/asm/fpu/api.h
+++ b/arch/x86/include/asm/fpu/api.h
@@ -148,7 +148,8 @@ static inline void fpu_update_guest_xfd(struct fpu_guest *guest_fpu, u64 xfd) {
static inline void fpu_sync_guest_vmexit_xfd_state(void) { }
#endif
-extern void fpu_copy_guest_fpstate_to_uabi(struct fpu_guest *gfpu, void *buf, unsigned int size, u32 pkru);
+extern void fpu_copy_guest_fpstate_to_uabi(struct fpu_guest *gfpu, void *buf,
+ unsigned int size, u64 xfeatures, u32 pkru);
extern int fpu_copy_uabi_to_guest_fpstate(struct fpu_guest *gfpu, const void *buf, u64 xcr0, u32 *vpkru);
static inline void fpstate_set_confidential(struct fpu_guest *gfpu)
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index a083f9ac9e4f..1d190761d00f 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -369,14 +369,15 @@ int fpu_swap_kvm_fpstate(struct fpu_guest *guest_fpu, bool enter_guest)
EXPORT_SYMBOL_GPL(fpu_swap_kvm_fpstate);
void fpu_copy_guest_fpstate_to_uabi(struct fpu_guest *gfpu, void *buf,
- unsigned int size, u32 pkru)
+ unsigned int size, u64 xfeatures, u32 pkru)
{
struct fpstate *kstate = gfpu->fpstate;
union fpregs_state *ustate = buf;
struct membuf mb = { .p = buf, .left = size };
if (cpu_feature_enabled(X86_FEATURE_XSAVE)) {
- __copy_xstate_to_uabi_buf(mb, kstate, pkru, XSTATE_COPY_XSAVE);
+ __copy_xstate_to_uabi_buf(mb, kstate, xfeatures, pkru,
+ XSTATE_COPY_XSAVE);
} else {
memcpy(&ustate->fxsave, &kstate->regs.fxsave,
sizeof(ustate->fxsave));
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 1afbc4866b10..463ec0cd0dab 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -1053,6 +1053,7 @@ static void copy_feature(bool from_xstate, struct membuf *to, void *xstate,
* __copy_xstate_to_uabi_buf - Copy kernel saved xstate to a UABI buffer
* @to: membuf descriptor
* @fpstate: The fpstate buffer from which to copy
+ * @xfeatures: The mask of xfeatures to save (XSAVE mode only)
* @pkru_val: The PKRU value to store in the PKRU component
* @copy_mode: The requested copy mode
*
@@ -1063,7 +1064,8 @@ static void copy_feature(bool from_xstate, struct membuf *to, void *xstate,
* It supports partial copy but @to.pos always starts from zero.
*/
void __copy_xstate_to_uabi_buf(struct membuf to, struct fpstate *fpstate,
- u32 pkru_val, enum xstate_copy_mode copy_mode)
+ u64 xfeatures, u32 pkru_val,
+ enum xstate_copy_mode copy_mode)
{
const unsigned int off_mxcsr = offsetof(struct fxregs_state, mxcsr);
struct xregs_state *xinit = &init_fpstate.regs.xsave;
@@ -1087,7 +1089,7 @@ void __copy_xstate_to_uabi_buf(struct membuf to, struct fpstate *fpstate,
break;
case XSTATE_COPY_XSAVE:
- header.xfeatures &= fpstate->user_xfeatures;
+ header.xfeatures &= fpstate->user_xfeatures & xfeatures;
break;
}
@@ -1189,6 +1191,7 @@ void copy_xstate_to_uabi_buf(struct membuf to, struct task_struct *tsk,
enum xstate_copy_mode copy_mode)
{
__copy_xstate_to_uabi_buf(to, tsk->thread.fpu.fpstate,
+ tsk->thread.fpu.fpstate->user_xfeatures,
tsk->thread.pkru, copy_mode);
}
diff --git a/arch/x86/kernel/fpu/xstate.h b/arch/x86/kernel/fpu/xstate.h
index a4ecb04d8d64..3518fb26d06b 100644
--- a/arch/x86/kernel/fpu/xstate.h
+++ b/arch/x86/kernel/fpu/xstate.h
@@ -43,7 +43,8 @@ enum xstate_copy_mode {
struct membuf;
extern void __copy_xstate_to_uabi_buf(struct membuf to, struct fpstate *fpstate,
- u32 pkru_val, enum xstate_copy_mode copy_mode);
+ u64 xfeatures, u32 pkru_val,
+ enum xstate_copy_mode copy_mode);
extern void copy_xstate_to_uabi_buf(struct membuf to, struct task_struct *tsk,
enum xstate_copy_mode mode);
extern int copy_uabi_from_kernel_to_xstate(struct fpstate *fpstate, const void *kbuf, u32 *pkru);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ff92ff41d5ce..a43a950d04cb 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5314,26 +5314,23 @@ static int kvm_vcpu_ioctl_x86_set_debugregs(struct kvm_vcpu *vcpu,
return 0;
}
-static void kvm_vcpu_ioctl_x86_get_xsave(struct kvm_vcpu *vcpu,
- struct kvm_xsave *guest_xsave)
+
+static void kvm_vcpu_ioctl_x86_get_xsave2(struct kvm_vcpu *vcpu,
+ u8 *state, unsigned int size)
{
if (fpstate_is_confidential(&vcpu->arch.guest_fpu))
return;
- fpu_copy_guest_fpstate_to_uabi(&vcpu->arch.guest_fpu,
- guest_xsave->region,
- sizeof(guest_xsave->region),
+ fpu_copy_guest_fpstate_to_uabi(&vcpu->arch.guest_fpu, state, size,
+ vcpu->arch.guest_fpu.fpstate->user_xfeatures,
vcpu->arch.pkru);
}
-static void kvm_vcpu_ioctl_x86_get_xsave2(struct kvm_vcpu *vcpu,
- u8 *state, unsigned int size)
+static void kvm_vcpu_ioctl_x86_get_xsave(struct kvm_vcpu *vcpu,
+ struct kvm_xsave *guest_xsave)
{
- if (fpstate_is_confidential(&vcpu->arch.guest_fpu))
- return;
-
- fpu_copy_guest_fpstate_to_uabi(&vcpu->arch.guest_fpu,
- state, size, vcpu->arch.pkru);
+ return kvm_vcpu_ioctl_x86_get_xsave2(vcpu, (void *)guest_xsave->region,
+ sizeof(guest_xsave->region));
}
static int kvm_vcpu_ioctl_x86_set_xsave(struct kvm_vcpu *vcpu,

View File

@ -0,0 +1,119 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc@google.com>
Date: Wed, 27 Sep 2023 17:19:53 -0700
Subject: [PATCH] KVM: x86: Constrain guest-supported xfeatures only at
KVM_GET_XSAVE{2}
Mask off xfeatures that aren't exposed to the guest only when saving guest
state via KVM_GET_XSAVE{2} instead of modifying user_xfeatures directly.
Preserving the maximal set of xfeatures in user_xfeatures restores KVM's
ABI for KVM_SET_XSAVE, which prior to commit ad856280ddea ("x86/kvm/fpu:
Limit guest user_xfeatures to supported bits of XCR0") allowed userspace
to load xfeatures that are supported by the host, irrespective of what
xfeatures are exposed to the guest.
There is no known use case where userspace *intentionally* loads xfeatures
that aren't exposed to the guest, but the bug fixed by commit ad856280ddea
was specifically that KVM_GET_SAVE{2} would save xfeatures that weren't
exposed to the guest, e.g. would lead to userspace unintentionally loading
guest-unsupported xfeatures when live migrating a VM.
Restricting KVM_SET_XSAVE to guest-supported xfeatures is especially
problematic for QEMU-based setups, as QEMU has a bug where instead of
terminating the VM if KVM_SET_XSAVE fails, QEMU instead simply stops
loading guest state, i.e. resumes the guest after live migration with
incomplete guest state, and ultimately results in guest data corruption.
Note, letting userspace restore all host-supported xfeatures does not fix
setups where a VM is migrated from a host *without* commit ad856280ddea,
to a target with a subset of host-supported xfeatures. However there is
no way to safely address that scenario, e.g. KVM could silently drop the
unsupported features, but that would be a clear violation of KVM's ABI and
so would require userspace to opt-in, at which point userspace could
simply be updated to sanitize the to-be-loaded XSAVE state.
Reported-by: Tyler Stachecki <stachecki.tyler@gmail.com>
Closes: https://lore.kernel.org/all/20230914010003.358162-1-tstachecki@bloomberg.net
Fixes: ad856280ddea ("x86/kvm/fpu: Limit guest user_xfeatures to supported bits of XCR0")
Cc: stable@vger.kernel.org
Cc: Leonardo Bras <leobras@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Message-Id: <20230928001956.924301-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 8647c52e9504c99752a39f1d44f6268f82c40a5c)
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
---
arch/x86/kernel/fpu/xstate.c | 5 +----
arch/x86/kvm/cpuid.c | 8 --------
arch/x86/kvm/x86.c | 18 ++++++++++++++++--
3 files changed, 17 insertions(+), 14 deletions(-)
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 463ec0cd0dab..ebe698f8af73 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -1543,10 +1543,7 @@ static int fpstate_realloc(u64 xfeatures, unsigned int ksize,
fpregs_restore_userregs();
newfps->xfeatures = curfps->xfeatures | xfeatures;
-
- if (!guest_fpu)
- newfps->user_xfeatures = curfps->user_xfeatures | xfeatures;
-
+ newfps->user_xfeatures = curfps->user_xfeatures | xfeatures;
newfps->xfd = curfps->xfd & ~xfeatures;
/* Do the final updates within the locked region */
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 61aefeb3fdbc..e5393ee652ba 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -350,14 +350,6 @@ static void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
vcpu->arch.guest_supported_xcr0 =
cpuid_get_supported_xcr0(vcpu->arch.cpuid_entries, vcpu->arch.cpuid_nent);
- /*
- * FP+SSE can always be saved/restored via KVM_{G,S}ET_XSAVE, even if
- * XSAVE/XCRO are not exposed to the guest, and even if XSAVE isn't
- * supported by the host.
- */
- vcpu->arch.guest_fpu.fpstate->user_xfeatures = vcpu->arch.guest_supported_xcr0 |
- XFEATURE_MASK_FPSSE;
-
kvm_update_pv_runtime(vcpu);
vcpu->arch.maxphyaddr = cpuid_query_maxphyaddr(vcpu);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index a43a950d04cb..a4a44adf7c72 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5318,12 +5318,26 @@ static int kvm_vcpu_ioctl_x86_set_debugregs(struct kvm_vcpu *vcpu,
static void kvm_vcpu_ioctl_x86_get_xsave2(struct kvm_vcpu *vcpu,
u8 *state, unsigned int size)
{
+ /*
+ * Only copy state for features that are enabled for the guest. The
+ * state itself isn't problematic, but setting bits in the header for
+ * features that are supported in *this* host but not exposed to the
+ * guest can result in KVM_SET_XSAVE failing when live migrating to a
+ * compatible host without the features that are NOT exposed to the
+ * guest.
+ *
+ * FP+SSE can always be saved/restored via KVM_{G,S}ET_XSAVE, even if
+ * XSAVE/XCRO are not exposed to the guest, and even if XSAVE isn't
+ * supported by the host.
+ */
+ u64 supported_xcr0 = vcpu->arch.guest_supported_xcr0 |
+ XFEATURE_MASK_FPSSE;
+
if (fpstate_is_confidential(&vcpu->arch.guest_fpu))
return;
fpu_copy_guest_fpstate_to_uabi(&vcpu->arch.guest_fpu, state, size,
- vcpu->arch.guest_fpu.fpstate->user_xfeatures,
- vcpu->arch.pkru);
+ supported_xcr0, vcpu->arch.pkru);
}
static void kvm_vcpu_ioctl_x86_get_xsave(struct kvm_vcpu *vcpu,

@ -1 +1 @@
Subproject commit d3bf8f1da09634a8b4e661d022702ead6c0dadfb
Subproject commit 7afee6a065ac401acfe0624f7e60e9fcd08f903c

@ -1 +1 @@
Subproject commit b0c4d8ac0b3c0d7de51130b2fd1f3a63efda832c
Subproject commit d5320c35ef466d28f470fd92fb8c4ce169a3defe