cherry-pick upstream fixes for 7.0.0
coming in via qemu-stable (except for the vdmk fix, which was tagged for-7.0 on the qemu-devel list, but didn't make it into the release). Also took the chance to switch the gluster fix to the version that made it into upstream. Signed-off-by: Fabian Ebner <f.ebner@proxmox.com> Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
This commit is contained in:
		
							parent
							
								
									eba403aafc
								
							
						
					
					
						commit
						14ed554660
					
				| @ -1,38 +0,0 @@ | ||||
| From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001 | ||||
| From: Fabian Ebner <f.ebner@proxmox.com> | ||||
| Date: Fri, 6 May 2022 14:38:35 +0200 | ||||
| Subject: [PATCH] block/gluster: correctly set max_pdiscard which is int64_t | ||||
| 
 | ||||
| Previously, max_pdiscard would be zero in the following assertion: | ||||
| qemu-system-x86_64: ../block/io.c:3166: bdrv_co_pdiscard: Assertion | ||||
| `max_pdiscard >= bs->bl.request_alignment' failed. | ||||
| 
 | ||||
| Fixes: 0c8022876f ("block: use int64_t instead of int in driver discard handlers") | ||||
| Signed-off-by: Fabian Ebner <f.ebner@proxmox.com> | ||||
| Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com> | ||||
| ---
 | ||||
|  block/gluster.c | 4 ++-- | ||||
|  1 file changed, 2 insertions(+), 2 deletions(-) | ||||
| 
 | ||||
| diff --git a/block/gluster.c b/block/gluster.c
 | ||||
| index 398976bc66..592e71b22a 100644
 | ||||
| --- a/block/gluster.c
 | ||||
| +++ b/block/gluster.c
 | ||||
| @@ -891,7 +891,7 @@ out:
 | ||||
|  static void qemu_gluster_refresh_limits(BlockDriverState *bs, Error **errp) | ||||
|  { | ||||
|      bs->bl.max_transfer = GLUSTER_MAX_TRANSFER; | ||||
| -    bs->bl.max_pdiscard = SIZE_MAX;
 | ||||
| +    bs->bl.max_pdiscard = INT64_MAX;
 | ||||
|  } | ||||
|   | ||||
|  static int qemu_gluster_reopen_prepare(BDRVReopenState *state, | ||||
| @@ -1304,7 +1304,7 @@ static coroutine_fn int qemu_gluster_co_pdiscard(BlockDriverState *bs,
 | ||||
|      GlusterAIOCB acb; | ||||
|      BDRVGlusterState *s = bs->opaque; | ||||
|   | ||||
| -    assert(bytes <= SIZE_MAX); /* rely on max_pdiscard */
 | ||||
| +    assert(bytes <= INT64_MAX); /* rely on max_pdiscard */
 | ||||
|   | ||||
|      acb.size = 0; | ||||
|      acb.ret = 0; | ||||
							
								
								
									
										47
									
								
								debian/patches/extra/0002-block-gluster-correctly-set-max_pdiscard.patch
									
									
									
									
										vendored
									
									
										Normal file
									
								
							
							
						
						
									
										47
									
								
								debian/patches/extra/0002-block-gluster-correctly-set-max_pdiscard.patch
									
									
									
									
										vendored
									
									
										Normal file
									
								
							| @ -0,0 +1,47 @@ | ||||
| From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001 | ||||
| From: Fabian Ebner <f.ebner@proxmox.com> | ||||
| Date: Fri, 20 May 2022 09:59:22 +0200 | ||||
| Subject: [PATCH] block/gluster: correctly set max_pdiscard | ||||
| 
 | ||||
| On 64-bit platforms, assigning SIZE_MAX to the int64_t max_pdiscard | ||||
| results in a negative value, and the following assertion would trigger | ||||
| down the line (it's not the same max_pdiscard, but computed from the | ||||
| other one): | ||||
| qemu-system-x86_64: ../block/io.c:3166: bdrv_co_pdiscard: Assertion | ||||
| `max_pdiscard >= bs->bl.request_alignment' failed. | ||||
| 
 | ||||
| On 32-bit platforms, it's fine to keep using SIZE_MAX. | ||||
| 
 | ||||
| The assertion in qemu_gluster_co_pdiscard() is checking that the value | ||||
| of 'bytes' can safely be passed to glfs_discard_async(), which takes a | ||||
| size_t for the argument in question, so it is kept as is. And since | ||||
| max_pdiscard is still <= SIZE_MAX, relying on max_pdiscard is still | ||||
| fine. | ||||
| 
 | ||||
| Fixes: 0c8022876f ("block: use int64_t instead of int in driver discard handlers") | ||||
| Cc: qemu-stable@nongnu.org | ||||
| Signed-off-by: Fabian Ebner <f.ebner@proxmox.com> | ||||
| Message-Id: <20220520075922.43972-1-f.ebner@proxmox.com> | ||||
| Reviewed-by: Eric Blake <eblake@redhat.com> | ||||
| Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> | ||||
| Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> | ||||
| Signed-off-by: Kevin Wolf <kwolf@redhat.com> | ||||
| (cherry-picked from commit 9b38fc56c054c7de65fa3bf7cdd82b32654f6b7d) | ||||
| Signed-off-by: Fabian Ebner <f.ebner@proxmox.com> | ||||
| ---
 | ||||
|  block/gluster.c | 2 +- | ||||
|  1 file changed, 1 insertion(+), 1 deletion(-) | ||||
| 
 | ||||
| diff --git a/block/gluster.c b/block/gluster.c
 | ||||
| index 80b75cb96c..1079b6186b 100644
 | ||||
| --- a/block/gluster.c
 | ||||
| +++ b/block/gluster.c
 | ||||
| @@ -901,7 +901,7 @@ out:
 | ||||
|  static void qemu_gluster_refresh_limits(BlockDriverState *bs, Error **errp) | ||||
|  { | ||||
|      bs->bl.max_transfer = GLUSTER_MAX_TRANSFER; | ||||
| -    bs->bl.max_pdiscard = SIZE_MAX;
 | ||||
| +    bs->bl.max_pdiscard = MIN(SIZE_MAX, INT64_MAX);
 | ||||
|  } | ||||
|   | ||||
|  static int qemu_gluster_reopen_prepare(BDRVReopenState *state, | ||||
							
								
								
									
										129
									
								
								debian/patches/extra/0003-block-vmdk-Fix-reopening-bs-file.patch
									
									
									
									
										vendored
									
									
										Normal file
									
								
							
							
						
						
									
										129
									
								
								debian/patches/extra/0003-block-vmdk-Fix-reopening-bs-file.patch
									
									
									
									
										vendored
									
									
										Normal file
									
								
							| @ -0,0 +1,129 @@ | ||||
| From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001 | ||||
| From: Hanna Reitz <hreitz@redhat.com> | ||||
| Date: Mon, 14 Mar 2022 17:27:18 +0100 | ||||
| Subject: [PATCH] block/vmdk: Fix reopening bs->file | ||||
| 
 | ||||
| VMDK disk data is stored in extents, which may or may not be separate | ||||
| from bs->file.  VmdkExtent.file points to where they are stored.  Each | ||||
| that is stored in bs->file will simply reuse the exact pointer value of | ||||
| bs->file. | ||||
| 
 | ||||
| (That is why vmdk_free_extents() will unref VmdkExtent.file (e->file) | ||||
| only if e->file != bs->file.) | ||||
| 
 | ||||
| Reopen operations can change bs->file (they will replace the whole | ||||
| BdrvChild object, not just the BDS stored in that BdrvChild), and then | ||||
| we will need to change all .file pointers of all such VmdkExtents to | ||||
| point to the new BdrvChild. | ||||
| 
 | ||||
| In vmdk_reopen_prepare(), we have to check which VmdkExtents are | ||||
| affected, and in vmdk_reopen_commit(), we can modify them.  We have to | ||||
| split this because: | ||||
| - The new BdrvChild is created only after prepare, so we can change
 | ||||
|   VmdkExtent.file only in commit | ||||
| - In commit, there no longer is any (valid) reference to the old
 | ||||
|   BdrvChild object, so there would be nothing to compare VmdkExtent.file | ||||
|   against to see whether it was equal to bs->file before reopening | ||||
|   (There is BDRVReopenState.old_file_bs, but the old bs->file | ||||
|   BdrvChild's .bs pointer will be NULL-ed when the new BdrvChild is | ||||
|   created, and so we cannot compare VmdkExtent.file->bs against | ||||
|   BDRVReopenState.old_file_bs) | ||||
| 
 | ||||
| Signed-off-by: Hanna Reitz <hreitz@redhat.com> | ||||
| Message-Id: <20220314162719.65384-2-hreitz@redhat.com> | ||||
| Signed-off-by: Kevin Wolf <kwolf@redhat.com> | ||||
| (cherry-picked from commit 6d17e2879854d7d0e623c06a9286085e97bf3545) | ||||
| Signed-off-by: Fabian Ebner <f.ebner@proxmox.com> | ||||
| ---
 | ||||
|  block/vmdk.c | 56 +++++++++++++++++++++++++++++++++++++++++++++++++++- | ||||
|  1 file changed, 55 insertions(+), 1 deletion(-) | ||||
| 
 | ||||
| diff --git a/block/vmdk.c b/block/vmdk.c
 | ||||
| index 37c0946066..38e5ab3806 100644
 | ||||
| --- a/block/vmdk.c
 | ||||
| +++ b/block/vmdk.c
 | ||||
| @@ -178,6 +178,10 @@ typedef struct BDRVVmdkState {
 | ||||
|      char *create_type; | ||||
|  } BDRVVmdkState; | ||||
|   | ||||
| +typedef struct BDRVVmdkReopenState {
 | ||||
| +    bool *extents_using_bs_file;
 | ||||
| +} BDRVVmdkReopenState;
 | ||||
| +
 | ||||
|  typedef struct VmdkMetaData { | ||||
|      unsigned int l1_index; | ||||
|      unsigned int l2_index; | ||||
| @@ -400,15 +404,63 @@ static int vmdk_is_cid_valid(BlockDriverState *bs)
 | ||||
|      return 1; | ||||
|  } | ||||
|   | ||||
| -/* We have nothing to do for VMDK reopen, stubs just return success */
 | ||||
|  static int vmdk_reopen_prepare(BDRVReopenState *state, | ||||
|                                 BlockReopenQueue *queue, Error **errp) | ||||
|  { | ||||
| +    BDRVVmdkState *s;
 | ||||
| +    BDRVVmdkReopenState *rs;
 | ||||
| +    int i;
 | ||||
| +
 | ||||
|      assert(state != NULL); | ||||
|      assert(state->bs != NULL); | ||||
| +    assert(state->opaque == NULL);
 | ||||
| +
 | ||||
| +    s = state->bs->opaque;
 | ||||
| +
 | ||||
| +    rs = g_new0(BDRVVmdkReopenState, 1);
 | ||||
| +    state->opaque = rs;
 | ||||
| +
 | ||||
| +    /*
 | ||||
| +     * Check whether there are any extents stored in bs->file; if bs->file
 | ||||
| +     * changes, we will need to update their .file pointers to follow suit
 | ||||
| +     */
 | ||||
| +    rs->extents_using_bs_file = g_new(bool, s->num_extents);
 | ||||
| +    for (i = 0; i < s->num_extents; i++) {
 | ||||
| +        rs->extents_using_bs_file[i] = s->extents[i].file == state->bs->file;
 | ||||
| +    }
 | ||||
| +
 | ||||
|      return 0; | ||||
|  } | ||||
|   | ||||
| +static void vmdk_reopen_clean(BDRVReopenState *state)
 | ||||
| +{
 | ||||
| +    BDRVVmdkReopenState *rs = state->opaque;
 | ||||
| +
 | ||||
| +    g_free(rs->extents_using_bs_file);
 | ||||
| +    g_free(rs);
 | ||||
| +    state->opaque = NULL;
 | ||||
| +}
 | ||||
| +
 | ||||
| +static void vmdk_reopen_commit(BDRVReopenState *state)
 | ||||
| +{
 | ||||
| +    BDRVVmdkState *s = state->bs->opaque;
 | ||||
| +    BDRVVmdkReopenState *rs = state->opaque;
 | ||||
| +    int i;
 | ||||
| +
 | ||||
| +    for (i = 0; i < s->num_extents; i++) {
 | ||||
| +        if (rs->extents_using_bs_file[i]) {
 | ||||
| +            s->extents[i].file = state->bs->file;
 | ||||
| +        }
 | ||||
| +    }
 | ||||
| +
 | ||||
| +    vmdk_reopen_clean(state);
 | ||||
| +}
 | ||||
| +
 | ||||
| +static void vmdk_reopen_abort(BDRVReopenState *state)
 | ||||
| +{
 | ||||
| +    vmdk_reopen_clean(state);
 | ||||
| +}
 | ||||
| +
 | ||||
|  static int vmdk_parent_open(BlockDriverState *bs) | ||||
|  { | ||||
|      char *p_name; | ||||
| @@ -3072,6 +3124,8 @@ static BlockDriver bdrv_vmdk = {
 | ||||
|      .bdrv_open                    = vmdk_open, | ||||
|      .bdrv_co_check                = vmdk_co_check, | ||||
|      .bdrv_reopen_prepare          = vmdk_reopen_prepare, | ||||
| +    .bdrv_reopen_commit           = vmdk_reopen_commit,
 | ||||
| +    .bdrv_reopen_abort            = vmdk_reopen_abort,
 | ||||
|      .bdrv_child_perm              = bdrv_default_perms, | ||||
|      .bdrv_co_preadv               = vmdk_co_preadv, | ||||
|      .bdrv_co_pwritev              = vmdk_co_pwritev, | ||||
							
								
								
									
										44
									
								
								debian/patches/extra/0004-linux-aio-fix-unbalanced-plugged-counter-in-laio_io_.patch
									
									
									
									
										vendored
									
									
										Normal file
									
								
							
							
						
						
									
										44
									
								
								debian/patches/extra/0004-linux-aio-fix-unbalanced-plugged-counter-in-laio_io_.patch
									
									
									
									
										vendored
									
									
										Normal file
									
								
							| @ -0,0 +1,44 @@ | ||||
| From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001 | ||||
| From: Stefan Hajnoczi <stefanha@redhat.com> | ||||
| Date: Thu, 9 Jun 2022 17:47:11 +0100 | ||||
| Subject: [PATCH] linux-aio: fix unbalanced plugged counter in laio_io_unplug() | ||||
| 
 | ||||
| Every laio_io_plug() call has a matching laio_io_unplug() call. There is | ||||
| a plugged counter that tracks the number of levels of plugging and | ||||
| allows for nesting. | ||||
| 
 | ||||
| The plugged counter must reflect the balance between laio_io_plug() and | ||||
| laio_io_unplug() calls accurately. Otherwise I/O stalls occur since | ||||
| io_submit(2) calls are skipped while plugged. | ||||
| 
 | ||||
| Reported-by: Nikolay Tenev <nt@storpool.com> | ||||
| Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> | ||||
| Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> | ||||
| Message-id: 20220609164712.1539045-2-stefanha@redhat.com | ||||
| Cc: Stefano Garzarella <sgarzare@redhat.com> | ||||
| Fixes: 68d7946648 ("linux-aio: add `dev_max_batch` parameter to laio_io_unplug()") | ||||
| [Stefano Garzarella suggested adding a Fixes tag. | ||||
| --Stefan]
 | ||||
| Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> | ||||
| (cherry-picked from commit f387cac5af030a58ac5a0dacf64cab5e5a4fe5c7) | ||||
| Signed-off-by: Fabian Ebner <f.ebner@proxmox.com> | ||||
| ---
 | ||||
|  block/linux-aio.c | 4 +++- | ||||
|  1 file changed, 3 insertions(+), 1 deletion(-) | ||||
| 
 | ||||
| diff --git a/block/linux-aio.c b/block/linux-aio.c
 | ||||
| index 4c423fcccf..6078da7e42 100644
 | ||||
| --- a/block/linux-aio.c
 | ||||
| +++ b/block/linux-aio.c
 | ||||
| @@ -363,8 +363,10 @@ void laio_io_unplug(BlockDriverState *bs, LinuxAioState *s,
 | ||||
|                      uint64_t dev_max_batch) | ||||
|  { | ||||
|      assert(s->io_q.plugged); | ||||
| +    s->io_q.plugged--;
 | ||||
| +
 | ||||
|      if (s->io_q.in_queue >= laio_max_batch(s, dev_max_batch) || | ||||
| -        (--s->io_q.plugged == 0 &&
 | ||||
| +        (!s->io_q.plugged &&
 | ||||
|           !s->io_q.blocked && !QSIMPLEQ_EMPTY(&s->io_q.pending))) { | ||||
|          ioq_submit(s); | ||||
|      } | ||||
							
								
								
									
										100
									
								
								debian/patches/extra/0005-pci-fix-overflow-in-snprintf-string-formatting.patch
									
									
									
									
										vendored
									
									
										Normal file
									
								
							
							
						
						
									
										100
									
								
								debian/patches/extra/0005-pci-fix-overflow-in-snprintf-string-formatting.patch
									
									
									
									
										vendored
									
									
										Normal file
									
								
							| @ -0,0 +1,100 @@ | ||||
| From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001 | ||||
| From: Claudio Fontana <cfontana@suse.de> | ||||
| Date: Tue, 31 May 2022 13:47:07 +0200 | ||||
| Subject: [PATCH] pci: fix overflow in snprintf string formatting | ||||
| MIME-Version: 1.0 | ||||
| Content-Type: text/plain; charset=UTF-8 | ||||
| Content-Transfer-Encoding: 8bit | ||||
| 
 | ||||
| the code in pcibus_get_fw_dev_path contained the potential for a | ||||
| stack buffer overflow of 1 byte, potentially writing to the stack an | ||||
| extra NUL byte. | ||||
| 
 | ||||
| This overflow could happen if the PCI slot is >= 0x10000000, | ||||
| and the PCI function is >= 0x10000000, due to the size parameter | ||||
| of snprintf being incorrectly calculated in the call: | ||||
| 
 | ||||
|     if (PCI_FUNC(d->devfn)) | ||||
|         snprintf(path + off, sizeof(path) + off, ",%x", PCI_FUNC(d->devfn)); | ||||
| 
 | ||||
| since the off obtained from a previous call to snprintf is added | ||||
| instead of subtracted from the total available size of the buffer. | ||||
| 
 | ||||
| Without the accurate size guard from snprintf, we end up writing in the | ||||
| worst case: | ||||
| 
 | ||||
| name (32) + "@" (1) + SLOT (8) + "," (1) + FUNC (8) + term NUL (1) = 51 bytes | ||||
| 
 | ||||
| In order to provide something more robust, replace all of the code in | ||||
| pcibus_get_fw_dev_path with a single call to g_strdup_printf, | ||||
| so there is no need to rely on manual calculations. | ||||
| 
 | ||||
| Found by compiling QEMU with FORTIFY_SOURCE=3 as the error: | ||||
| 
 | ||||
| *** buffer overflow detected ***: terminated | ||||
| 
 | ||||
| Thread 1 "qemu-system-x86" received signal SIGABRT, Aborted. | ||||
| [Switching to Thread 0x7ffff642c380 (LWP 121307)] | ||||
| 0x00007ffff71ff55c in __pthread_kill_implementation () from /lib64/libc.so.6 | ||||
| (gdb) bt | ||||
|  #0  0x00007ffff71ff55c in __pthread_kill_implementation () at /lib64/libc.so.6 | ||||
|  #1  0x00007ffff71ac6f6 in raise () at /lib64/libc.so.6 | ||||
|  #2  0x00007ffff7195814 in abort () at /lib64/libc.so.6 | ||||
|  #3  0x00007ffff71f279e in __libc_message () at /lib64/libc.so.6 | ||||
|  #4  0x00007ffff729767a in __fortify_fail () at /lib64/libc.so.6 | ||||
|  #5  0x00007ffff7295c36 in  () at /lib64/libc.so.6 | ||||
|  #6  0x00007ffff72957f5 in __snprintf_chk () at /lib64/libc.so.6 | ||||
|  #7  0x0000555555b1c1fd in pcibus_get_fw_dev_path () | ||||
|  #8  0x0000555555f2bde4 in qdev_get_fw_dev_path_helper.constprop () | ||||
|  #9  0x0000555555f2bd86 in qdev_get_fw_dev_path_helper.constprop () | ||||
|  #10 0x00005555559a6e5d in get_boot_device_path () | ||||
|  #11 0x00005555559a712c in get_boot_devices_list () | ||||
|  #12 0x0000555555b1a3d0 in fw_cfg_machine_reset () | ||||
|  #13 0x0000555555bf4c2d in pc_machine_reset () | ||||
|  #14 0x0000555555c66988 in qemu_system_reset () | ||||
|  #15 0x0000555555a6dff6 in qdev_machine_creation_done () | ||||
|  #16 0x0000555555c79186 in qmp_x_exit_preconfig.part () | ||||
|  #17 0x0000555555c7b459 in qemu_init () | ||||
|  #18 0x0000555555960a29 in main () | ||||
| 
 | ||||
| Found-by: Dario Faggioli <Dario Faggioli <dfaggioli@suse.com> | ||||
| Found-by: Martin Liška <martin.liska@suse.com> | ||||
| Cc: qemu-stable@nongnu.org | ||||
| Signed-off-by: Claudio Fontana <cfontana@suse.de> | ||||
| Message-Id: <20220531114707.18830-1-cfontana@suse.de> | ||||
| Reviewed-by: Ani Sinha <ani@anisinha.ca> | ||||
| (cherry-picked from commit 36f18c6989a3d1ff1d7a0e50b0868ef3958299b4) | ||||
| Signed-off-by: Fabian Ebner <f.ebner@proxmox.com> | ||||
| ---
 | ||||
|  hw/pci/pci.c | 18 +++++++++--------- | ||||
|  1 file changed, 9 insertions(+), 9 deletions(-) | ||||
| 
 | ||||
| diff --git a/hw/pci/pci.c b/hw/pci/pci.c
 | ||||
| index dae9119bfe..c69b412434 100644
 | ||||
| --- a/hw/pci/pci.c
 | ||||
| +++ b/hw/pci/pci.c
 | ||||
| @@ -2625,15 +2625,15 @@ static char *pci_dev_fw_name(DeviceState *dev, char *buf, int len)
 | ||||
|  static char *pcibus_get_fw_dev_path(DeviceState *dev) | ||||
|  { | ||||
|      PCIDevice *d = (PCIDevice *)dev; | ||||
| -    char path[50], name[33];
 | ||||
| -    int off;
 | ||||
| -
 | ||||
| -    off = snprintf(path, sizeof(path), "%s@%x",
 | ||||
| -                   pci_dev_fw_name(dev, name, sizeof name),
 | ||||
| -                   PCI_SLOT(d->devfn));
 | ||||
| -    if (PCI_FUNC(d->devfn))
 | ||||
| -        snprintf(path + off, sizeof(path) + off, ",%x", PCI_FUNC(d->devfn));
 | ||||
| -    return g_strdup(path);
 | ||||
| +    char name[33];
 | ||||
| +    int has_func = !!PCI_FUNC(d->devfn);
 | ||||
| +
 | ||||
| +    return g_strdup_printf("%s@%x%s%.*x",
 | ||||
| +                           pci_dev_fw_name(dev, name, sizeof(name)),
 | ||||
| +                           PCI_SLOT(d->devfn),
 | ||||
| +                           has_func ? "," : "",
 | ||||
| +                           has_func,
 | ||||
| +                           PCI_FUNC(d->devfn));
 | ||||
|  } | ||||
|   | ||||
|  static char *pcibus_get_dev_path(DeviceState *dev) | ||||
							
								
								
									
										48
									
								
								debian/patches/extra/0006-target-i386-kvm-Fix-disabling-MPX-on-cpu-host-with-M.patch
									
									
									
									
										vendored
									
									
										Normal file
									
								
							
							
						
						
									
										48
									
								
								debian/patches/extra/0006-target-i386-kvm-Fix-disabling-MPX-on-cpu-host-with-M.patch
									
									
									
									
										vendored
									
									
										Normal file
									
								
							| @ -0,0 +1,48 @@ | ||||
| From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001 | ||||
| From: "Maciej S. Szmigiero" <maciej.szmigiero@oracle.com> | ||||
| Date: Mon, 23 May 2022 18:26:58 +0200 | ||||
| Subject: [PATCH] target/i386/kvm: Fix disabling MPX on "-cpu host" with | ||||
|  MPX-capable host | ||||
| 
 | ||||
| Since KVM commit 5f76f6f5ff96 ("KVM: nVMX: Do not expose MPX VMX controls when guest MPX disabled") | ||||
| it is not possible to disable MPX on a "-cpu host" just by adding "-mpx" | ||||
| there if the host CPU does indeed support MPX. | ||||
| QEMU will fail to set MSR_IA32_VMX_TRUE_{EXIT,ENTRY}_CTLS MSRs in this case | ||||
| and so trigger an assertion failure. | ||||
| 
 | ||||
| Instead, besides "-mpx" one has to explicitly add also | ||||
| "-vmx-exit-clear-bndcfgs" and "-vmx-entry-load-bndcfgs" to QEMU command | ||||
| line to make it work, which is a bit convoluted. | ||||
| 
 | ||||
| Make the MPX-related bits in FEAT_VMX_{EXIT,ENTRY}_CTLS dependent on MPX | ||||
| being actually enabled so such workarounds are no longer necessary. | ||||
| 
 | ||||
| Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> | ||||
| Message-Id: <51aa2125c76363204cc23c27165e778097c33f0b.1653323077.git.maciej.szmigiero@oracle.com> | ||||
| Cc: qemu-stable@nongnu.org | ||||
| Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> | ||||
| (cherry-picked from commit 267b5e7e378afd260004cb37a66a6fcd641e3b53) | ||||
| Signed-off-by: Fabian Ebner <f.ebner@proxmox.com> | ||||
| ---
 | ||||
|  target/i386/cpu.c | 8 ++++++++ | ||||
|  1 file changed, 8 insertions(+) | ||||
| 
 | ||||
| diff --git a/target/i386/cpu.c b/target/i386/cpu.c
 | ||||
| index cb6b5467d0..6e6945139b 100644
 | ||||
| --- a/target/i386/cpu.c
 | ||||
| +++ b/target/i386/cpu.c
 | ||||
| @@ -1327,6 +1327,14 @@ static FeatureDep feature_dependencies[] = {
 | ||||
|          .from = { FEAT_7_0_EBX,             CPUID_7_0_EBX_INVPCID }, | ||||
|          .to = { FEAT_VMX_SECONDARY_CTLS,    VMX_SECONDARY_EXEC_ENABLE_INVPCID }, | ||||
|      }, | ||||
| +    {
 | ||||
| +        .from = { FEAT_7_0_EBX,             CPUID_7_0_EBX_MPX },
 | ||||
| +        .to = { FEAT_VMX_EXIT_CTLS,         VMX_VM_EXIT_CLEAR_BNDCFGS },
 | ||||
| +    },
 | ||||
| +    {
 | ||||
| +        .from = { FEAT_7_0_EBX,             CPUID_7_0_EBX_MPX },
 | ||||
| +        .to = { FEAT_VMX_ENTRY_CTLS,        VMX_VM_ENTRY_LOAD_BNDCFGS },
 | ||||
| +    },
 | ||||
|      { | ||||
|          .from = { FEAT_7_0_EBX,             CPUID_7_0_EBX_RDSEED }, | ||||
|          .to = { FEAT_VMX_SECONDARY_CTLS,    VMX_SECONDARY_EXEC_RDSEED_EXITING }, | ||||
							
								
								
									
										121
									
								
								debian/patches/extra/0007-coroutine-ucontext-use-QEMU_DEFINE_STATIC_CO_TLS.patch
									
									
									
									
										vendored
									
									
										Normal file
									
								
							
							
						
						
									
										121
									
								
								debian/patches/extra/0007-coroutine-ucontext-use-QEMU_DEFINE_STATIC_CO_TLS.patch
									
									
									
									
										vendored
									
									
										Normal file
									
								
							| @ -0,0 +1,121 @@ | ||||
| From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001 | ||||
| From: Stefan Hajnoczi <stefanha@redhat.com> | ||||
| Date: Mon, 7 Mar 2022 15:38:51 +0000 | ||||
| Subject: [PATCH] coroutine-ucontext: use QEMU_DEFINE_STATIC_CO_TLS() | ||||
| MIME-Version: 1.0 | ||||
| Content-Type: text/plain; charset=UTF-8 | ||||
| Content-Transfer-Encoding: 8bit | ||||
| 
 | ||||
| Thread-Local Storage variables cannot be used directly from coroutine | ||||
| code because the compiler may optimize TLS variable accesses across | ||||
| qemu_coroutine_yield() calls. When the coroutine is re-entered from | ||||
| another thread the TLS variables from the old thread must no longer be | ||||
| used. | ||||
| 
 | ||||
| Use QEMU_DEFINE_STATIC_CO_TLS() for the current and leader variables. | ||||
| 
 | ||||
| Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> | ||||
| Message-Id: <20220307153853.602859-2-stefanha@redhat.com> | ||||
| Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> | ||||
| Signed-off-by: Kevin Wolf <kwolf@redhat.com> | ||||
| (cherry-picked from commit 34145a307d849d0b6734d0222a7aa0bb9eef7407) | ||||
| Signed-off-by: Fabian Ebner <f.ebner@proxmox.com> | ||||
| ---
 | ||||
|  util/coroutine-ucontext.c | 38 ++++++++++++++++++++++++-------------- | ||||
|  1 file changed, 24 insertions(+), 14 deletions(-) | ||||
| 
 | ||||
| diff --git a/util/coroutine-ucontext.c b/util/coroutine-ucontext.c
 | ||||
| index 904b375192..127d5a13c8 100644
 | ||||
| --- a/util/coroutine-ucontext.c
 | ||||
| +++ b/util/coroutine-ucontext.c
 | ||||
| @@ -25,6 +25,7 @@
 | ||||
|  #include "qemu/osdep.h" | ||||
|  #include <ucontext.h> | ||||
|  #include "qemu/coroutine_int.h" | ||||
| +#include "qemu/coroutine-tls.h"
 | ||||
|   | ||||
|  #ifdef CONFIG_VALGRIND_H | ||||
|  #include <valgrind/valgrind.h> | ||||
| @@ -66,8 +67,8 @@ typedef struct {
 | ||||
|  /** | ||||
|   * Per-thread coroutine bookkeeping | ||||
|   */ | ||||
| -static __thread CoroutineUContext leader;
 | ||||
| -static __thread Coroutine *current;
 | ||||
| +QEMU_DEFINE_STATIC_CO_TLS(Coroutine *, current);
 | ||||
| +QEMU_DEFINE_STATIC_CO_TLS(CoroutineUContext, leader);
 | ||||
|   | ||||
|  /* | ||||
|   * va_args to makecontext() must be type 'int', so passing | ||||
| @@ -97,14 +98,15 @@ static inline __attribute__((always_inline))
 | ||||
|  void finish_switch_fiber(void *fake_stack_save) | ||||
|  { | ||||
|  #ifdef CONFIG_ASAN | ||||
| +    CoroutineUContext *leaderp = get_ptr_leader();
 | ||||
|      const void *bottom_old; | ||||
|      size_t size_old; | ||||
|   | ||||
|      __sanitizer_finish_switch_fiber(fake_stack_save, &bottom_old, &size_old); | ||||
|   | ||||
| -    if (!leader.stack) {
 | ||||
| -        leader.stack = (void *)bottom_old;
 | ||||
| -        leader.stack_size = size_old;
 | ||||
| +    if (!leaderp->stack) {
 | ||||
| +        leaderp->stack = (void *)bottom_old;
 | ||||
| +        leaderp->stack_size = size_old;
 | ||||
|      } | ||||
|  #endif | ||||
|  #ifdef CONFIG_TSAN | ||||
| @@ -161,8 +163,10 @@ static void coroutine_trampoline(int i0, int i1)
 | ||||
|   | ||||
|      /* Initialize longjmp environment and switch back the caller */ | ||||
|      if (!sigsetjmp(self->env, 0)) { | ||||
| -        start_switch_fiber_asan(COROUTINE_YIELD, &fake_stack_save, leader.stack,
 | ||||
| -                                leader.stack_size);
 | ||||
| +        CoroutineUContext *leaderp = get_ptr_leader();
 | ||||
| +
 | ||||
| +        start_switch_fiber_asan(COROUTINE_YIELD, &fake_stack_save,
 | ||||
| +                                leaderp->stack, leaderp->stack_size);
 | ||||
|          start_switch_fiber_tsan(&fake_stack_save, self, true); /* true=caller */ | ||||
|          siglongjmp(*(sigjmp_buf *)co->entry_arg, 1); | ||||
|      } | ||||
| @@ -297,7 +301,7 @@ qemu_coroutine_switch(Coroutine *from_, Coroutine *to_,
 | ||||
|      int ret; | ||||
|      void *fake_stack_save = NULL; | ||||
|   | ||||
| -    current = to_;
 | ||||
| +    set_current(to_);
 | ||||
|   | ||||
|      ret = sigsetjmp(from->env, 0); | ||||
|      if (ret == 0) { | ||||
| @@ -315,18 +319,24 @@ qemu_coroutine_switch(Coroutine *from_, Coroutine *to_,
 | ||||
|   | ||||
|  Coroutine *qemu_coroutine_self(void) | ||||
|  { | ||||
| -    if (!current) {
 | ||||
| -        current = &leader.base;
 | ||||
| +    Coroutine *self = get_current();
 | ||||
| +    CoroutineUContext *leaderp = get_ptr_leader();
 | ||||
| +
 | ||||
| +    if (!self) {
 | ||||
| +        self = &leaderp->base;
 | ||||
| +        set_current(self);
 | ||||
|      } | ||||
|  #ifdef CONFIG_TSAN | ||||
| -    if (!leader.tsan_co_fiber) {
 | ||||
| -        leader.tsan_co_fiber = __tsan_get_current_fiber();
 | ||||
| +    if (!leaderp->tsan_co_fiber) {
 | ||||
| +        leaderp->tsan_co_fiber = __tsan_get_current_fiber();
 | ||||
|      } | ||||
|  #endif | ||||
| -    return current;
 | ||||
| +    return self;
 | ||||
|  } | ||||
|   | ||||
|  bool qemu_in_coroutine(void) | ||||
|  { | ||||
| -    return current && current->caller;
 | ||||
| +    Coroutine *self = get_current();
 | ||||
| +
 | ||||
| +    return self && self->caller;
 | ||||
|  } | ||||
							
								
								
									
										123
									
								
								debian/patches/extra/0008-coroutine-use-QEMU_DEFINE_STATIC_CO_TLS.patch
									
									
									
									
										vendored
									
									
										Normal file
									
								
							
							
						
						
									
										123
									
								
								debian/patches/extra/0008-coroutine-use-QEMU_DEFINE_STATIC_CO_TLS.patch
									
									
									
									
										vendored
									
									
										Normal file
									
								
							| @ -0,0 +1,123 @@ | ||||
| From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001 | ||||
| From: Stefan Hajnoczi <stefanha@redhat.com> | ||||
| Date: Mon, 7 Mar 2022 15:38:52 +0000 | ||||
| Subject: [PATCH] coroutine: use QEMU_DEFINE_STATIC_CO_TLS() | ||||
| MIME-Version: 1.0 | ||||
| Content-Type: text/plain; charset=UTF-8 | ||||
| Content-Transfer-Encoding: 8bit | ||||
| 
 | ||||
| Thread-Local Storage variables cannot be used directly from coroutine | ||||
| code because the compiler may optimize TLS variable accesses across | ||||
| qemu_coroutine_yield() calls. When the coroutine is re-entered from | ||||
| another thread the TLS variables from the old thread must no longer be | ||||
| used. | ||||
| 
 | ||||
| Use QEMU_DEFINE_STATIC_CO_TLS() for the current and leader variables. | ||||
| The alloc_pool QSLIST needs a typedef so the return value of | ||||
| get_ptr_alloc_pool() can be stored in a local variable. | ||||
| 
 | ||||
| One example of why this code is necessary: a coroutine that yields | ||||
| before calling qemu_coroutine_create() to create another coroutine is | ||||
| affected by the TLS issue. | ||||
| 
 | ||||
| Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> | ||||
| Message-Id: <20220307153853.602859-3-stefanha@redhat.com> | ||||
| Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> | ||||
| Signed-off-by: Kevin Wolf <kwolf@redhat.com> | ||||
| (cherry-picked from commit ac387a08a9c9f6b36757da912f0339c25f421f90) | ||||
| Signed-off-by: Fabian Ebner <f.ebner@proxmox.com> | ||||
| ---
 | ||||
|  util/qemu-coroutine.c | 41 ++++++++++++++++++++++++----------------- | ||||
|  1 file changed, 24 insertions(+), 17 deletions(-) | ||||
| 
 | ||||
| diff --git a/util/qemu-coroutine.c b/util/qemu-coroutine.c
 | ||||
| index c03b2422ff..f3e8300c8d 100644
 | ||||
| --- a/util/qemu-coroutine.c
 | ||||
| +++ b/util/qemu-coroutine.c
 | ||||
| @@ -18,6 +18,7 @@
 | ||||
|  #include "qemu/atomic.h" | ||||
|  #include "qemu/coroutine.h" | ||||
|  #include "qemu/coroutine_int.h" | ||||
| +#include "qemu/coroutine-tls.h"
 | ||||
|  #include "block/aio.h" | ||||
|   | ||||
|  /** Initial batch size is 64, and is increased on demand */ | ||||
| @@ -29,17 +30,20 @@ enum {
 | ||||
|  static QSLIST_HEAD(, Coroutine) release_pool = QSLIST_HEAD_INITIALIZER(pool); | ||||
|  static unsigned int pool_batch_size = POOL_INITIAL_BATCH_SIZE; | ||||
|  static unsigned int release_pool_size; | ||||
| -static __thread QSLIST_HEAD(, Coroutine) alloc_pool = QSLIST_HEAD_INITIALIZER(pool);
 | ||||
| -static __thread unsigned int alloc_pool_size;
 | ||||
| -static __thread Notifier coroutine_pool_cleanup_notifier;
 | ||||
| +
 | ||||
| +typedef QSLIST_HEAD(, Coroutine) CoroutineQSList;
 | ||||
| +QEMU_DEFINE_STATIC_CO_TLS(CoroutineQSList, alloc_pool);
 | ||||
| +QEMU_DEFINE_STATIC_CO_TLS(unsigned int, alloc_pool_size);
 | ||||
| +QEMU_DEFINE_STATIC_CO_TLS(Notifier, coroutine_pool_cleanup_notifier);
 | ||||
|   | ||||
|  static void coroutine_pool_cleanup(Notifier *n, void *value) | ||||
|  { | ||||
|      Coroutine *co; | ||||
|      Coroutine *tmp; | ||||
| +    CoroutineQSList *alloc_pool = get_ptr_alloc_pool();
 | ||||
|   | ||||
| -    QSLIST_FOREACH_SAFE(co, &alloc_pool, pool_next, tmp) {
 | ||||
| -        QSLIST_REMOVE_HEAD(&alloc_pool, pool_next);
 | ||||
| +    QSLIST_FOREACH_SAFE(co, alloc_pool, pool_next, tmp) {
 | ||||
| +        QSLIST_REMOVE_HEAD(alloc_pool, pool_next);
 | ||||
|          qemu_coroutine_delete(co); | ||||
|      } | ||||
|  } | ||||
| @@ -49,27 +53,30 @@ Coroutine *qemu_coroutine_create(CoroutineEntry *entry, void *opaque)
 | ||||
|      Coroutine *co = NULL; | ||||
|   | ||||
|      if (CONFIG_COROUTINE_POOL) { | ||||
| -        co = QSLIST_FIRST(&alloc_pool);
 | ||||
| +        CoroutineQSList *alloc_pool = get_ptr_alloc_pool();
 | ||||
| +
 | ||||
| +        co = QSLIST_FIRST(alloc_pool);
 | ||||
|          if (!co) { | ||||
|              if (release_pool_size > qatomic_read(&pool_batch_size)) { | ||||
|                  /* Slow path; a good place to register the destructor, too.  */ | ||||
| -                if (!coroutine_pool_cleanup_notifier.notify) {
 | ||||
| -                    coroutine_pool_cleanup_notifier.notify = coroutine_pool_cleanup;
 | ||||
| -                    qemu_thread_atexit_add(&coroutine_pool_cleanup_notifier);
 | ||||
| +                Notifier *notifier = get_ptr_coroutine_pool_cleanup_notifier();
 | ||||
| +                if (!notifier->notify) {
 | ||||
| +                    notifier->notify = coroutine_pool_cleanup;
 | ||||
| +                    qemu_thread_atexit_add(notifier);
 | ||||
|                  } | ||||
|   | ||||
|                  /* This is not exact; there could be a little skew between | ||||
|                   * release_pool_size and the actual size of release_pool.  But | ||||
|                   * it is just a heuristic, it does not need to be perfect. | ||||
|                   */ | ||||
| -                alloc_pool_size = qatomic_xchg(&release_pool_size, 0);
 | ||||
| -                QSLIST_MOVE_ATOMIC(&alloc_pool, &release_pool);
 | ||||
| -                co = QSLIST_FIRST(&alloc_pool);
 | ||||
| +                set_alloc_pool_size(qatomic_xchg(&release_pool_size, 0));
 | ||||
| +                QSLIST_MOVE_ATOMIC(alloc_pool, &release_pool);
 | ||||
| +                co = QSLIST_FIRST(alloc_pool);
 | ||||
|              } | ||||
|          } | ||||
|          if (co) { | ||||
| -            QSLIST_REMOVE_HEAD(&alloc_pool, pool_next);
 | ||||
| -            alloc_pool_size--;
 | ||||
| +            QSLIST_REMOVE_HEAD(alloc_pool, pool_next);
 | ||||
| +            set_alloc_pool_size(get_alloc_pool_size() - 1);
 | ||||
|          } | ||||
|      } | ||||
|   | ||||
| @@ -93,9 +100,9 @@ static void coroutine_delete(Coroutine *co)
 | ||||
|              qatomic_inc(&release_pool_size); | ||||
|              return; | ||||
|          } | ||||
| -        if (alloc_pool_size < qatomic_read(&pool_batch_size)) {
 | ||||
| -            QSLIST_INSERT_HEAD(&alloc_pool, co, pool_next);
 | ||||
| -            alloc_pool_size++;
 | ||||
| +        if (get_alloc_pool_size() < qatomic_read(&pool_batch_size)) {
 | ||||
| +            QSLIST_INSERT_HEAD(get_ptr_alloc_pool(), co, pool_next);
 | ||||
| +            set_alloc_pool_size(get_alloc_pool_size() + 1);
 | ||||
|              return; | ||||
|          } | ||||
|      } | ||||
							
								
								
									
										90
									
								
								debian/patches/extra/0009-coroutine-Rename-qemu_coroutine_inc-dec_pool_size.patch
									
									
									
									
										vendored
									
									
										Normal file
									
								
							
							
						
						
									
										90
									
								
								debian/patches/extra/0009-coroutine-Rename-qemu_coroutine_inc-dec_pool_size.patch
									
									
									
									
										vendored
									
									
										Normal file
									
								
							| @ -0,0 +1,90 @@ | ||||
| From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001 | ||||
| From: Kevin Wolf <kwolf@redhat.com> | ||||
| Date: Tue, 10 May 2022 17:10:19 +0200 | ||||
| Subject: [PATCH] coroutine: Rename qemu_coroutine_inc/dec_pool_size() | ||||
| 
 | ||||
| It's true that these functions currently affect the batch size in which | ||||
| coroutines are reused (i.e. moved from the global release pool to the | ||||
| allocation pool of a specific thread), but this is a bug and will be | ||||
| fixed in a separate patch. | ||||
| 
 | ||||
| In fact, the comment in the header file already just promises that it | ||||
| influences the pool size, so reflect this in the name of the functions. | ||||
| As a nice side effect, the shorter function name makes some line | ||||
| wrapping unnecessary. | ||||
| 
 | ||||
| Cc: qemu-stable@nongnu.org | ||||
| Signed-off-by: Kevin Wolf <kwolf@redhat.com> | ||||
| Message-Id: <20220510151020.105528-2-kwolf@redhat.com> | ||||
| Signed-off-by: Kevin Wolf <kwolf@redhat.com> | ||||
| (cherry-picked from commit 98e3ab35054b946f7c2aba5408822532b0920b53) | ||||
| Signed-off-by: Fabian Ebner <f.ebner@proxmox.com> | ||||
| ---
 | ||||
|  hw/block/virtio-blk.c    | 6 ++---- | ||||
|  include/qemu/coroutine.h | 6 +++--- | ||||
|  util/qemu-coroutine.c    | 4 ++-- | ||||
|  3 files changed, 7 insertions(+), 9 deletions(-) | ||||
| 
 | ||||
| diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
 | ||||
| index 540c38f829..6a1cc41877 100644
 | ||||
| --- a/hw/block/virtio-blk.c
 | ||||
| +++ b/hw/block/virtio-blk.c
 | ||||
| @@ -1215,8 +1215,7 @@ static void virtio_blk_device_realize(DeviceState *dev, Error **errp)
 | ||||
|      for (i = 0; i < conf->num_queues; i++) { | ||||
|          virtio_add_queue(vdev, conf->queue_size, virtio_blk_handle_output); | ||||
|      } | ||||
| -    qemu_coroutine_increase_pool_batch_size(conf->num_queues * conf->queue_size
 | ||||
| -                                            / 2);
 | ||||
| +    qemu_coroutine_inc_pool_size(conf->num_queues * conf->queue_size / 2);
 | ||||
|      virtio_blk_data_plane_create(vdev, conf, &s->dataplane, &err); | ||||
|      if (err != NULL) { | ||||
|          error_propagate(errp, err); | ||||
| @@ -1253,8 +1252,7 @@ static void virtio_blk_device_unrealize(DeviceState *dev)
 | ||||
|      for (i = 0; i < conf->num_queues; i++) { | ||||
|          virtio_del_queue(vdev, i); | ||||
|      } | ||||
| -    qemu_coroutine_decrease_pool_batch_size(conf->num_queues * conf->queue_size
 | ||||
| -                                            / 2);
 | ||||
| +    qemu_coroutine_dec_pool_size(conf->num_queues * conf->queue_size / 2);
 | ||||
|      qemu_del_vm_change_state_handler(s->change); | ||||
|      blockdev_mark_auto_del(s->blk); | ||||
|      virtio_cleanup(vdev); | ||||
| diff --git a/include/qemu/coroutine.h b/include/qemu/coroutine.h
 | ||||
| index c828a95ee0..5b621d1295 100644
 | ||||
| --- a/include/qemu/coroutine.h
 | ||||
| +++ b/include/qemu/coroutine.h
 | ||||
| @@ -334,12 +334,12 @@ void coroutine_fn yield_until_fd_readable(int fd);
 | ||||
|  /** | ||||
|   * Increase coroutine pool size | ||||
|   */ | ||||
| -void qemu_coroutine_increase_pool_batch_size(unsigned int additional_pool_size);
 | ||||
| +void qemu_coroutine_inc_pool_size(unsigned int additional_pool_size);
 | ||||
|   | ||||
|  /** | ||||
| - * Devcrease coroutine pool size
 | ||||
| + * Decrease coroutine pool size
 | ||||
|   */ | ||||
| -void qemu_coroutine_decrease_pool_batch_size(unsigned int additional_pool_size);
 | ||||
| +void qemu_coroutine_dec_pool_size(unsigned int additional_pool_size);
 | ||||
|   | ||||
|  #include "qemu/lockable.h" | ||||
|   | ||||
| diff --git a/util/qemu-coroutine.c b/util/qemu-coroutine.c
 | ||||
| index f3e8300c8d..ea23929a74 100644
 | ||||
| --- a/util/qemu-coroutine.c
 | ||||
| +++ b/util/qemu-coroutine.c
 | ||||
| @@ -212,12 +212,12 @@ AioContext *coroutine_fn qemu_coroutine_get_aio_context(Coroutine *co)
 | ||||
|      return co->ctx; | ||||
|  } | ||||
|   | ||||
| -void qemu_coroutine_increase_pool_batch_size(unsigned int additional_pool_size)
 | ||||
| +void qemu_coroutine_inc_pool_size(unsigned int additional_pool_size)
 | ||||
|  { | ||||
|      qatomic_add(&pool_batch_size, additional_pool_size); | ||||
|  } | ||||
|   | ||||
| -void qemu_coroutine_decrease_pool_batch_size(unsigned int removing_pool_size)
 | ||||
| +void qemu_coroutine_dec_pool_size(unsigned int removing_pool_size)
 | ||||
|  { | ||||
|      qatomic_sub(&pool_batch_size, removing_pool_size); | ||||
|  } | ||||
							
								
								
									
										121
									
								
								debian/patches/extra/0010-coroutine-Revert-to-constant-batch-size.patch
									
									
									
									
										vendored
									
									
										Normal file
									
								
							
							
						
						
									
										121
									
								
								debian/patches/extra/0010-coroutine-Revert-to-constant-batch-size.patch
									
									
									
									
										vendored
									
									
										Normal file
									
								
							| @ -0,0 +1,121 @@ | ||||
| From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001 | ||||
| From: Kevin Wolf <kwolf@redhat.com> | ||||
| Date: Tue, 10 May 2022 17:10:20 +0200 | ||||
| Subject: [PATCH] coroutine: Revert to constant batch size | ||||
| 
 | ||||
| Commit 4c41c69e changed the way the coroutine pool is sized because for | ||||
| virtio-blk devices with a large queue size and heavy I/O, it was just | ||||
| too small and caused coroutines to be deleted and reallocated soon | ||||
| afterwards. The change made the size dynamic based on the number of | ||||
| queues and the queue size of virtio-blk devices. | ||||
| 
 | ||||
| There are two important numbers here: Slightly simplified, when a | ||||
| coroutine terminates, it is generally stored in the global release pool | ||||
| up to a certain pool size, and if the pool is full, it is freed. | ||||
| Conversely, when allocating a new coroutine, the coroutines in the | ||||
| release pool are reused if the pool already has reached a certain | ||||
| minimum size (the batch size), otherwise we allocate new coroutines. | ||||
| 
 | ||||
| The problem after commit 4c41c69e is that it not only increases the | ||||
| maximum pool size (which is the intended effect), but also the batch | ||||
| size for reusing coroutines (which is a bug). It means that in cases | ||||
| with many devices and/or a large queue size (which defaults to the | ||||
| number of vcpus for virtio-blk-pci), many thousand coroutines could be | ||||
| sitting in the release pool without being reused. | ||||
| 
 | ||||
| This is not only a waste of memory and allocations, but it actually | ||||
| makes the QEMU process likely to hit the vm.max_map_count limit on Linux | ||||
| because each coroutine requires two mappings (its stack and the guard | ||||
| page for the stack), causing it to abort() in qemu_alloc_stack() because | ||||
| when the limit is hit, mprotect() starts to fail with ENOMEM. | ||||
| 
 | ||||
| In order to fix the problem, change the batch size back to 64 to avoid | ||||
| uselessly accumulating coroutines in the release pool, but keep the | ||||
| dynamic maximum pool size so that coroutines aren't freed too early | ||||
| in heavy I/O scenarios. | ||||
| 
 | ||||
| Note that this fix doesn't strictly make it impossible to hit the limit, | ||||
| but this would only happen if most of the coroutines are actually in use | ||||
| at the same time, not just sitting in a pool. This is the same behaviour | ||||
| as we already had before commit 4c41c69e. Fully preventing this would | ||||
| require allowing qemu_coroutine_create() to return an error, but it | ||||
| doesn't seem to be a scenario that people hit in practice. | ||||
| 
 | ||||
| Cc: qemu-stable@nongnu.org | ||||
| Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2079938 | ||||
| Fixes: 4c41c69e05fe28c0f95f8abd2ebf407e95a4f04b | ||||
| Signed-off-by: Kevin Wolf <kwolf@redhat.com> | ||||
| Message-Id: <20220510151020.105528-3-kwolf@redhat.com> | ||||
| Tested-by: Hiroki Narukawa <hnarukaw@yahoo-corp.jp> | ||||
| Signed-off-by: Kevin Wolf <kwolf@redhat.com> | ||||
| (cherry-picked from commit 9ec7a59b5aad4b736871c378d30f5ef5ec51cb52) | ||||
| Signed-off-by: Fabian Ebner <f.ebner@proxmox.com> | ||||
| ---
 | ||||
|  util/qemu-coroutine.c | 22 ++++++++++++++-------- | ||||
|  1 file changed, 14 insertions(+), 8 deletions(-) | ||||
| 
 | ||||
| diff --git a/util/qemu-coroutine.c b/util/qemu-coroutine.c
 | ||||
| index ea23929a74..4a8bd63ef0 100644
 | ||||
| --- a/util/qemu-coroutine.c
 | ||||
| +++ b/util/qemu-coroutine.c
 | ||||
| @@ -21,14 +21,20 @@
 | ||||
|  #include "qemu/coroutine-tls.h" | ||||
|  #include "block/aio.h" | ||||
|   | ||||
| -/** Initial batch size is 64, and is increased on demand */
 | ||||
| +/**
 | ||||
| + * The minimal batch size is always 64, coroutines from the release_pool are
 | ||||
| + * reused as soon as there are 64 coroutines in it. The maximum pool size starts
 | ||||
| + * with 64 and is increased on demand so that coroutines are not deleted even if
 | ||||
| + * they are not immediately reused.
 | ||||
| + */
 | ||||
|  enum { | ||||
| -    POOL_INITIAL_BATCH_SIZE = 64,
 | ||||
| +    POOL_MIN_BATCH_SIZE = 64,
 | ||||
| +    POOL_INITIAL_MAX_SIZE = 64,
 | ||||
|  }; | ||||
|   | ||||
|  /** Free list to speed up creation */ | ||||
|  static QSLIST_HEAD(, Coroutine) release_pool = QSLIST_HEAD_INITIALIZER(pool); | ||||
| -static unsigned int pool_batch_size = POOL_INITIAL_BATCH_SIZE;
 | ||||
| +static unsigned int pool_max_size = POOL_INITIAL_MAX_SIZE;
 | ||||
|  static unsigned int release_pool_size; | ||||
|   | ||||
|  typedef QSLIST_HEAD(, Coroutine) CoroutineQSList; | ||||
| @@ -57,7 +63,7 @@ Coroutine *qemu_coroutine_create(CoroutineEntry *entry, void *opaque)
 | ||||
|   | ||||
|          co = QSLIST_FIRST(alloc_pool); | ||||
|          if (!co) { | ||||
| -            if (release_pool_size > qatomic_read(&pool_batch_size)) {
 | ||||
| +            if (release_pool_size > POOL_MIN_BATCH_SIZE) {
 | ||||
|                  /* Slow path; a good place to register the destructor, too.  */ | ||||
|                  Notifier *notifier = get_ptr_coroutine_pool_cleanup_notifier(); | ||||
|                  if (!notifier->notify) { | ||||
| @@ -95,12 +101,12 @@ static void coroutine_delete(Coroutine *co)
 | ||||
|      co->caller = NULL; | ||||
|   | ||||
|      if (CONFIG_COROUTINE_POOL) { | ||||
| -        if (release_pool_size < qatomic_read(&pool_batch_size) * 2) {
 | ||||
| +        if (release_pool_size < qatomic_read(&pool_max_size) * 2) {
 | ||||
|              QSLIST_INSERT_HEAD_ATOMIC(&release_pool, co, pool_next); | ||||
|              qatomic_inc(&release_pool_size); | ||||
|              return; | ||||
|          } | ||||
| -        if (get_alloc_pool_size() < qatomic_read(&pool_batch_size)) {
 | ||||
| +        if (get_alloc_pool_size() < qatomic_read(&pool_max_size)) {
 | ||||
|              QSLIST_INSERT_HEAD(get_ptr_alloc_pool(), co, pool_next); | ||||
|              set_alloc_pool_size(get_alloc_pool_size() + 1); | ||||
|              return; | ||||
| @@ -214,10 +220,10 @@ AioContext *coroutine_fn qemu_coroutine_get_aio_context(Coroutine *co)
 | ||||
|   | ||||
|  void qemu_coroutine_inc_pool_size(unsigned int additional_pool_size) | ||||
|  { | ||||
| -    qatomic_add(&pool_batch_size, additional_pool_size);
 | ||||
| +    qatomic_add(&pool_max_size, additional_pool_size);
 | ||||
|  } | ||||
|   | ||||
|  void qemu_coroutine_dec_pool_size(unsigned int removing_pool_size) | ||||
|  { | ||||
| -    qatomic_sub(&pool_batch_size, removing_pool_size);
 | ||||
| +    qatomic_sub(&pool_max_size, removing_pool_size);
 | ||||
|  } | ||||
							
								
								
									
										117
									
								
								debian/patches/extra/0011-target-i386-do-not-consult-nonexistent-host-leaves.patch
									
									
									
									
										vendored
									
									
										Normal file
									
								
							
							
						
						
									
										117
									
								
								debian/patches/extra/0011-target-i386-do-not-consult-nonexistent-host-leaves.patch
									
									
									
									
										vendored
									
									
										Normal file
									
								
							| @ -0,0 +1,117 @@ | ||||
| From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001 | ||||
| From: Paolo Bonzini <pbonzini@redhat.com> | ||||
| Date: Fri, 29 Apr 2022 21:16:28 +0200 | ||||
| Subject: [PATCH] target/i386: do not consult nonexistent host leaves | ||||
| 
 | ||||
| When cache_info_passthrough is requested, QEMU passes the host values | ||||
| of the cache information CPUID leaves down to the guest.  However, | ||||
| it blindly assumes that the CPUID leaf exists on the host, and this | ||||
| cannot be guaranteed: for example, KVM has recently started to | ||||
| synthesize AMD leaves up to 0x80000021 in order to provide accurate | ||||
| CPU bug information to guests. | ||||
| 
 | ||||
| Querying a nonexistent host leaf fills the output arguments of | ||||
| host_cpuid with data that (albeit deterministic) is nonsensical | ||||
| as cache information, namely the data in the highest Intel CPUID | ||||
| leaf.  If said highest leaf is not ECX-dependent, this can even | ||||
| cause an infinite loop when kvm_arch_init_vcpu prepares the input | ||||
| to KVM_SET_CPUID2.  The infinite loop is only terminated by an | ||||
| abort() when the array gets full. | ||||
| 
 | ||||
| Reported-by: Maxim Levitsky <mlevitsk@redhat.com> | ||||
| Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> | ||||
| Cc: qemu-stable@nongnu.org | ||||
| Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> | ||||
| (cherry-picked from commit 798d8ec0dacd4cc0034298d94f430c14f23e2919) | ||||
| Signed-off-by: Fabian Ebner <f.ebner@proxmox.com> | ||||
| ---
 | ||||
|  target/i386/cpu.c | 41 ++++++++++++++++++++++++++++++++++++----- | ||||
|  1 file changed, 36 insertions(+), 5 deletions(-) | ||||
| 
 | ||||
| diff --git a/target/i386/cpu.c b/target/i386/cpu.c
 | ||||
| index 6e6945139b..c79e151887 100644
 | ||||
| --- a/target/i386/cpu.c
 | ||||
| +++ b/target/i386/cpu.c
 | ||||
| @@ -5030,6 +5030,37 @@ uint64_t x86_cpu_get_supported_feature_word(FeatureWord w,
 | ||||
|      return r; | ||||
|  } | ||||
|   | ||||
| +static void x86_cpu_get_cache_cpuid(uint32_t func, uint32_t index,
 | ||||
| +                                    uint32_t *eax, uint32_t *ebx,
 | ||||
| +                                    uint32_t *ecx, uint32_t *edx)
 | ||||
| +{
 | ||||
| +    uint32_t level, unused;
 | ||||
| +
 | ||||
| +    /* Only return valid host leaves.  */
 | ||||
| +    switch (func) {
 | ||||
| +    case 2:
 | ||||
| +    case 4:
 | ||||
| +        host_cpuid(0, 0, &level, &unused, &unused, &unused);
 | ||||
| +        break;
 | ||||
| +    case 0x80000005:
 | ||||
| +    case 0x80000006:
 | ||||
| +    case 0x8000001d:
 | ||||
| +        host_cpuid(0x80000000, 0, &level, &unused, &unused, &unused);
 | ||||
| +        break;
 | ||||
| +    default:
 | ||||
| +        return;
 | ||||
| +    }
 | ||||
| +
 | ||||
| +    if (func > level) {
 | ||||
| +        *eax = 0;
 | ||||
| +        *ebx = 0;
 | ||||
| +        *ecx = 0;
 | ||||
| +        *edx = 0;
 | ||||
| +    } else {
 | ||||
| +        host_cpuid(func, index, eax, ebx, ecx, edx);
 | ||||
| +    }
 | ||||
| +}
 | ||||
| +
 | ||||
|  /* | ||||
|   * Only for builtin_x86_defs models initialized with x86_register_cpudef_types. | ||||
|   */ | ||||
| @@ -5288,7 +5319,7 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
 | ||||
|      case 2: | ||||
|          /* cache info: needed for Pentium Pro compatibility */ | ||||
|          if (cpu->cache_info_passthrough) { | ||||
| -            host_cpuid(index, 0, eax, ebx, ecx, edx);
 | ||||
| +            x86_cpu_get_cache_cpuid(index, 0, eax, ebx, ecx, edx);
 | ||||
|              break; | ||||
|          } else if (cpu->vendor_cpuid_only && IS_AMD_CPU(env)) { | ||||
|              *eax = *ebx = *ecx = *edx = 0; | ||||
| @@ -5308,7 +5339,7 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
 | ||||
|      case 4: | ||||
|          /* cache info: needed for Core compatibility */ | ||||
|          if (cpu->cache_info_passthrough) { | ||||
| -            host_cpuid(index, count, eax, ebx, ecx, edx);
 | ||||
| +            x86_cpu_get_cache_cpuid(index, count, eax, ebx, ecx, edx);
 | ||||
|              /* QEMU gives out its own APIC IDs, never pass down bits 31..26.  */ | ||||
|              *eax &= ~0xFC000000; | ||||
|              if ((*eax & 31) && cs->nr_cores > 1) { | ||||
| @@ -5710,7 +5741,7 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
 | ||||
|      case 0x80000005: | ||||
|          /* cache info (L1 cache) */ | ||||
|          if (cpu->cache_info_passthrough) { | ||||
| -            host_cpuid(index, 0, eax, ebx, ecx, edx);
 | ||||
| +            x86_cpu_get_cache_cpuid(index, 0, eax, ebx, ecx, edx);
 | ||||
|              break; | ||||
|          } | ||||
|          *eax = (L1_DTLB_2M_ASSOC << 24) | (L1_DTLB_2M_ENTRIES << 16) | | ||||
| @@ -5723,7 +5754,7 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
 | ||||
|      case 0x80000006: | ||||
|          /* cache info (L2 cache) */ | ||||
|          if (cpu->cache_info_passthrough) { | ||||
| -            host_cpuid(index, 0, eax, ebx, ecx, edx);
 | ||||
| +            x86_cpu_get_cache_cpuid(index, 0, eax, ebx, ecx, edx);
 | ||||
|              break; | ||||
|          } | ||||
|          *eax = (AMD_ENC_ASSOC(L2_DTLB_2M_ASSOC) << 28) | | ||||
| @@ -5783,7 +5814,7 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
 | ||||
|      case 0x8000001D: | ||||
|          *eax = 0; | ||||
|          if (cpu->cache_info_passthrough) { | ||||
| -            host_cpuid(index, count, eax, ebx, ecx, edx);
 | ||||
| +            x86_cpu_get_cache_cpuid(index, count, eax, ebx, ecx, edx);
 | ||||
|              break; | ||||
|          } | ||||
|          switch (count) { | ||||
							
								
								
									
										108
									
								
								debian/patches/extra/0012-virtio-scsi-fix-ctrl-and-event-handler-functions-in-.patch
									
									
									
									
										vendored
									
									
										Normal file
									
								
							
							
						
						
									
										108
									
								
								debian/patches/extra/0012-virtio-scsi-fix-ctrl-and-event-handler-functions-in-.patch
									
									
									
									
										vendored
									
									
										Normal file
									
								
							| @ -0,0 +1,108 @@ | ||||
| From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001 | ||||
| From: Stefan Hajnoczi <stefanha@redhat.com> | ||||
| Date: Wed, 27 Apr 2022 15:35:36 +0100 | ||||
| Subject: [PATCH] virtio-scsi: fix ctrl and event handler functions in | ||||
|  dataplane mode | ||||
| 
 | ||||
| Commit f34e8d8b8d48d73f36a67b6d5e492ef9784b5012 ("virtio-scsi: prepare | ||||
| virtio_scsi_handle_cmd for dataplane") prepared the virtio-scsi cmd | ||||
| virtqueue handler function to be used in both the dataplane and | ||||
| non-datpalane code paths. | ||||
| 
 | ||||
| It failed to convert the ctrl and event virtqueue handler functions, | ||||
| which are not designed to be called from the dataplane code path but | ||||
| will be since the ioeventfd is set up for those virtqueues when | ||||
| dataplane starts. | ||||
| 
 | ||||
| Convert the ctrl and event virtqueue handler functions now so they | ||||
| operate correctly when called from the dataplane code path. Avoid code | ||||
| duplication by extracting this code into a helper function. | ||||
| 
 | ||||
| Fixes: f34e8d8b8d48d73f36a67b6d5e492ef9784b5012 ("virtio-scsi: prepare virtio_scsi_handle_cmd for dataplane") | ||||
| Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> | ||||
| Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> | ||||
| Message-id: 20220427143541.119567-2-stefanha@redhat.com | ||||
| [Fixed s/by used/be used/ typo pointed out by Michael Tokarev | ||||
| <mjt@tls.msk.ru>. | ||||
| --Stefan]
 | ||||
| Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> | ||||
| (cherry-picked from commit 2f743ef6366c2df4ef51ef3ae318138cdc0125ab) | ||||
| Signed-off-by: Fabian Ebner <f.ebner@proxmox.com> | ||||
| ---
 | ||||
|  hw/scsi/virtio-scsi.c | 42 +++++++++++++++++++++++++++--------------- | ||||
|  1 file changed, 27 insertions(+), 15 deletions(-) | ||||
| 
 | ||||
| diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
 | ||||
| index 34a968ecfb..417fbc71d6 100644
 | ||||
| --- a/hw/scsi/virtio-scsi.c
 | ||||
| +++ b/hw/scsi/virtio-scsi.c
 | ||||
| @@ -472,16 +472,32 @@ bool virtio_scsi_handle_ctrl_vq(VirtIOSCSI *s, VirtQueue *vq)
 | ||||
|      return progress; | ||||
|  } | ||||
|   | ||||
| +/*
 | ||||
| + * If dataplane is configured but not yet started, do so now and return true on
 | ||||
| + * success.
 | ||||
| + *
 | ||||
| + * Dataplane is started by the core virtio code but virtqueue handler functions
 | ||||
| + * can also be invoked when a guest kicks before DRIVER_OK, so this helper
 | ||||
| + * function helps us deal with manually starting ioeventfd in that case.
 | ||||
| + */
 | ||||
| +static bool virtio_scsi_defer_to_dataplane(VirtIOSCSI *s)
 | ||||
| +{
 | ||||
| +    if (!s->ctx || s->dataplane_started) {
 | ||||
| +        return false;
 | ||||
| +    }
 | ||||
| +
 | ||||
| +    virtio_device_start_ioeventfd(&s->parent_obj.parent_obj);
 | ||||
| +    return !s->dataplane_fenced;
 | ||||
| +}
 | ||||
| +
 | ||||
|  static void virtio_scsi_handle_ctrl(VirtIODevice *vdev, VirtQueue *vq) | ||||
|  { | ||||
|      VirtIOSCSI *s = (VirtIOSCSI *)vdev; | ||||
|   | ||||
| -    if (s->ctx) {
 | ||||
| -        virtio_device_start_ioeventfd(vdev);
 | ||||
| -        if (!s->dataplane_fenced) {
 | ||||
| -            return;
 | ||||
| -        }
 | ||||
| +    if (virtio_scsi_defer_to_dataplane(s)) {
 | ||||
| +        return;
 | ||||
|      } | ||||
| +
 | ||||
|      virtio_scsi_acquire(s); | ||||
|      virtio_scsi_handle_ctrl_vq(s, vq); | ||||
|      virtio_scsi_release(s); | ||||
| @@ -720,12 +736,10 @@ static void virtio_scsi_handle_cmd(VirtIODevice *vdev, VirtQueue *vq)
 | ||||
|      /* use non-QOM casts in the data path */ | ||||
|      VirtIOSCSI *s = (VirtIOSCSI *)vdev; | ||||
|   | ||||
| -    if (s->ctx && !s->dataplane_started) {
 | ||||
| -        virtio_device_start_ioeventfd(vdev);
 | ||||
| -        if (!s->dataplane_fenced) {
 | ||||
| -            return;
 | ||||
| -        }
 | ||||
| +    if (virtio_scsi_defer_to_dataplane(s)) {
 | ||||
| +        return;
 | ||||
|      } | ||||
| +
 | ||||
|      virtio_scsi_acquire(s); | ||||
|      virtio_scsi_handle_cmd_vq(s, vq); | ||||
|      virtio_scsi_release(s); | ||||
| @@ -855,12 +869,10 @@ static void virtio_scsi_handle_event(VirtIODevice *vdev, VirtQueue *vq)
 | ||||
|  { | ||||
|      VirtIOSCSI *s = VIRTIO_SCSI(vdev); | ||||
|   | ||||
| -    if (s->ctx) {
 | ||||
| -        virtio_device_start_ioeventfd(vdev);
 | ||||
| -        if (!s->dataplane_fenced) {
 | ||||
| -            return;
 | ||||
| -        }
 | ||||
| +    if (virtio_scsi_defer_to_dataplane(s)) {
 | ||||
| +        return;
 | ||||
|      } | ||||
| +
 | ||||
|      virtio_scsi_acquire(s); | ||||
|      virtio_scsi_handle_event_vq(s, vq); | ||||
|      virtio_scsi_release(s); | ||||
							
								
								
									
										91
									
								
								debian/patches/extra/0013-virtio-scsi-don-t-waste-CPU-polling-the-event-virtqu.patch
									
									
									
									
										vendored
									
									
										Normal file
									
								
							
							
						
						
									
										91
									
								
								debian/patches/extra/0013-virtio-scsi-don-t-waste-CPU-polling-the-event-virtqu.patch
									
									
									
									
										vendored
									
									
										Normal file
									
								
							| @ -0,0 +1,91 @@ | ||||
| From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001 | ||||
| From: Stefan Hajnoczi <stefanha@redhat.com> | ||||
| Date: Wed, 27 Apr 2022 15:35:37 +0100 | ||||
| Subject: [PATCH] virtio-scsi: don't waste CPU polling the event virtqueue | ||||
| 
 | ||||
| The virtio-scsi event virtqueue is not emptied by its handler function. | ||||
| This is typical for rx virtqueues where the device uses buffers when | ||||
| some event occurs (e.g. a packet is received, an error condition | ||||
| happens, etc). | ||||
| 
 | ||||
| Polling non-empty virtqueues wastes CPU cycles. We are not waiting for | ||||
| new buffers to become available, we are waiting for an event to occur, | ||||
| so it's a misuse of CPU resources to poll for buffers. | ||||
| 
 | ||||
| Introduce the new virtio_queue_aio_attach_host_notifier_no_poll() API, | ||||
| which is identical to virtio_queue_aio_attach_host_notifier() except | ||||
| that it does not poll the virtqueue. | ||||
| 
 | ||||
| Before this patch the following command-line consumed 100% CPU in the | ||||
| IOThread polling and calling virtio_scsi_handle_event(): | ||||
| 
 | ||||
|   $ qemu-system-x86_64 -M accel=kvm -m 1G -cpu host \ | ||||
|       --object iothread,id=iothread0 \ | ||||
|       --device virtio-scsi-pci,iothread=iothread0 \ | ||||
|       --blockdev file,filename=test.img,aio=native,cache.direct=on,node-name=drive0 \ | ||||
|       --device scsi-hd,drive=drive0 | ||||
| 
 | ||||
| After this patch CPU is no longer wasted. | ||||
| 
 | ||||
| Reported-by: Nir Soffer <nsoffer@redhat.com> | ||||
| Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> | ||||
| Tested-by: Nir Soffer <nsoffer@redhat.com> | ||||
| Message-id: 20220427143541.119567-3-stefanha@redhat.com | ||||
| Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> | ||||
| (cherry-picked from commit 38738f7dbbda90fbc161757b7f4be35b52205552) | ||||
| Signed-off-by: Fabian Ebner <f.ebner@proxmox.com> | ||||
| ---
 | ||||
|  hw/scsi/virtio-scsi-dataplane.c |  2 +- | ||||
|  hw/virtio/virtio.c              | 13 +++++++++++++ | ||||
|  include/hw/virtio/virtio.h      |  1 + | ||||
|  3 files changed, 15 insertions(+), 1 deletion(-) | ||||
| 
 | ||||
| diff --git a/hw/scsi/virtio-scsi-dataplane.c b/hw/scsi/virtio-scsi-dataplane.c
 | ||||
| index 29575cbaf6..8bb6e6acfc 100644
 | ||||
| --- a/hw/scsi/virtio-scsi-dataplane.c
 | ||||
| +++ b/hw/scsi/virtio-scsi-dataplane.c
 | ||||
| @@ -138,7 +138,7 @@ int virtio_scsi_dataplane_start(VirtIODevice *vdev)
 | ||||
|   | ||||
|      aio_context_acquire(s->ctx); | ||||
|      virtio_queue_aio_attach_host_notifier(vs->ctrl_vq, s->ctx); | ||||
| -    virtio_queue_aio_attach_host_notifier(vs->event_vq, s->ctx);
 | ||||
| +    virtio_queue_aio_attach_host_notifier_no_poll(vs->event_vq, s->ctx);
 | ||||
|   | ||||
|      for (i = 0; i < vs->conf.num_queues; i++) { | ||||
|          virtio_queue_aio_attach_host_notifier(vs->cmd_vqs[i], s->ctx); | ||||
| diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
 | ||||
| index 9d637e043e..67a873f54a 100644
 | ||||
| --- a/hw/virtio/virtio.c
 | ||||
| +++ b/hw/virtio/virtio.c
 | ||||
| @@ -3534,6 +3534,19 @@ void virtio_queue_aio_attach_host_notifier(VirtQueue *vq, AioContext *ctx)
 | ||||
|                                  virtio_queue_host_notifier_aio_poll_end); | ||||
|  } | ||||
|   | ||||
| +/*
 | ||||
| + * Same as virtio_queue_aio_attach_host_notifier() but without polling. Use
 | ||||
| + * this for rx virtqueues and similar cases where the virtqueue handler
 | ||||
| + * function does not pop all elements. When the virtqueue is left non-empty
 | ||||
| + * polling consumes CPU cycles and should not be used.
 | ||||
| + */
 | ||||
| +void virtio_queue_aio_attach_host_notifier_no_poll(VirtQueue *vq, AioContext *ctx)
 | ||||
| +{
 | ||||
| +    aio_set_event_notifier(ctx, &vq->host_notifier, true,
 | ||||
| +                           virtio_queue_host_notifier_read,
 | ||||
| +                           NULL, NULL);
 | ||||
| +}
 | ||||
| +
 | ||||
|  void virtio_queue_aio_detach_host_notifier(VirtQueue *vq, AioContext *ctx) | ||||
|  { | ||||
|      aio_set_event_notifier(ctx, &vq->host_notifier, true, NULL, NULL, NULL); | ||||
| diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
 | ||||
| index b31c4507f5..b62a35fdca 100644
 | ||||
| --- a/include/hw/virtio/virtio.h
 | ||||
| +++ b/include/hw/virtio/virtio.h
 | ||||
| @@ -317,6 +317,7 @@ EventNotifier *virtio_queue_get_host_notifier(VirtQueue *vq);
 | ||||
|  void virtio_queue_set_host_notifier_enabled(VirtQueue *vq, bool enabled); | ||||
|  void virtio_queue_host_notifier_read(EventNotifier *n); | ||||
|  void virtio_queue_aio_attach_host_notifier(VirtQueue *vq, AioContext *ctx); | ||||
| +void virtio_queue_aio_attach_host_notifier_no_poll(VirtQueue *vq, AioContext *ctx);
 | ||||
|  void virtio_queue_aio_detach_host_notifier(VirtQueue *vq, AioContext *ctx); | ||||
|  VirtQueue *virtio_vector_first_queue(VirtIODevice *vdev, uint16_t vector); | ||||
|  VirtQueue *virtio_vector_next_queue(VirtQueue *vq); | ||||
							
								
								
									
										102
									
								
								debian/patches/extra/0014-vhost-Track-descriptor-chain-in-private-at-SVQ.patch
									
									
									
									
										vendored
									
									
										Normal file
									
								
							
							
						
						
									
										102
									
								
								debian/patches/extra/0014-vhost-Track-descriptor-chain-in-private-at-SVQ.patch
									
									
									
									
										vendored
									
									
										Normal file
									
								
							| @ -0,0 +1,102 @@ | ||||
| From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001 | ||||
| From: =?UTF-8?q?Eugenio=20P=C3=A9rez?= <eperezma@redhat.com> | ||||
| Date: Thu, 12 May 2022 19:57:42 +0200 | ||||
| Subject: [PATCH] vhost: Track descriptor chain in private at SVQ | ||||
| MIME-Version: 1.0 | ||||
| Content-Type: text/plain; charset=UTF-8 | ||||
| Content-Transfer-Encoding: 8bit | ||||
| 
 | ||||
| The device could have access to modify them, and it definitely have | ||||
| access when we implement packed vq. Harden SVQ maintaining a private | ||||
| copy of the descriptor chain. Other fields like buffer addresses are | ||||
| already maintained sepparatedly. | ||||
| 
 | ||||
| Signed-off-by: Eugenio Pérez <eperezma@redhat.com> | ||||
| Message-Id: <20220512175747.142058-2-eperezma@redhat.com> | ||||
| Reviewed-by: Michael S. Tsirkin <mst@redhat.com> | ||||
| Signed-off-by: Michael S. Tsirkin <mst@redhat.com> | ||||
| (cherry-picked from commit 495fe3a78749c39c0e772c4e1a55d6cb8a7e5292) | ||||
| Signed-off-by: Fabian Ebner <f.ebner@proxmox.com> | ||||
| ---
 | ||||
|  hw/virtio/vhost-shadow-virtqueue.c | 12 +++++++----- | ||||
|  hw/virtio/vhost-shadow-virtqueue.h |  6 ++++++ | ||||
|  2 files changed, 13 insertions(+), 5 deletions(-) | ||||
| 
 | ||||
| diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
 | ||||
| index b232803d1b..3155801f50 100644
 | ||||
| --- a/hw/virtio/vhost-shadow-virtqueue.c
 | ||||
| +++ b/hw/virtio/vhost-shadow-virtqueue.c
 | ||||
| @@ -138,6 +138,7 @@ static void vhost_vring_write_descs(VhostShadowVirtqueue *svq, hwaddr *sg,
 | ||||
|      for (n = 0; n < num; n++) { | ||||
|          if (more_descs || (n + 1 < num)) { | ||||
|              descs[i].flags = flags | cpu_to_le16(VRING_DESC_F_NEXT); | ||||
| +            descs[i].next = cpu_to_le16(svq->desc_next[i]);
 | ||||
|          } else { | ||||
|              descs[i].flags = flags; | ||||
|          } | ||||
| @@ -145,10 +146,10 @@ static void vhost_vring_write_descs(VhostShadowVirtqueue *svq, hwaddr *sg,
 | ||||
|          descs[i].len = cpu_to_le32(iovec[n].iov_len); | ||||
|   | ||||
|          last = i; | ||||
| -        i = cpu_to_le16(descs[i].next);
 | ||||
| +        i = cpu_to_le16(svq->desc_next[i]);
 | ||||
|      } | ||||
|   | ||||
| -    svq->free_head = le16_to_cpu(descs[last].next);
 | ||||
| +    svq->free_head = le16_to_cpu(svq->desc_next[last]);
 | ||||
|  } | ||||
|   | ||||
|  static bool vhost_svq_add_split(VhostShadowVirtqueue *svq, | ||||
| @@ -336,7 +337,6 @@ static void vhost_svq_disable_notification(VhostShadowVirtqueue *svq)
 | ||||
|  static VirtQueueElement *vhost_svq_get_buf(VhostShadowVirtqueue *svq, | ||||
|                                             uint32_t *len) | ||||
|  { | ||||
| -    vring_desc_t *descs = svq->vring.desc;
 | ||||
|      const vring_used_t *used = svq->vring.used; | ||||
|      vring_used_elem_t used_elem; | ||||
|      uint16_t last_used; | ||||
| @@ -365,7 +365,7 @@ static VirtQueueElement *vhost_svq_get_buf(VhostShadowVirtqueue *svq,
 | ||||
|          return NULL; | ||||
|      } | ||||
|   | ||||
| -    descs[used_elem.id].next = svq->free_head;
 | ||||
| +    svq->desc_next[used_elem.id] = svq->free_head;
 | ||||
|      svq->free_head = used_elem.id; | ||||
|   | ||||
|      *len = used_elem.len; | ||||
| @@ -540,8 +540,9 @@ void vhost_svq_start(VhostShadowVirtqueue *svq, VirtIODevice *vdev,
 | ||||
|      svq->vring.used = qemu_memalign(qemu_real_host_page_size, device_size); | ||||
|      memset(svq->vring.used, 0, device_size); | ||||
|      svq->ring_id_maps = g_new0(VirtQueueElement *, svq->vring.num); | ||||
| +    svq->desc_next = g_new0(uint16_t, svq->vring.num);
 | ||||
|      for (unsigned i = 0; i < svq->vring.num - 1; i++) { | ||||
| -        svq->vring.desc[i].next = cpu_to_le16(i + 1);
 | ||||
| +        svq->desc_next[i] = cpu_to_le16(i + 1);
 | ||||
|      } | ||||
|  } | ||||
|   | ||||
| @@ -574,6 +575,7 @@ void vhost_svq_stop(VhostShadowVirtqueue *svq)
 | ||||
|          virtqueue_detach_element(svq->vq, next_avail_elem, 0); | ||||
|      } | ||||
|      svq->vq = NULL; | ||||
| +    g_free(svq->desc_next);
 | ||||
|      g_free(svq->ring_id_maps); | ||||
|      qemu_vfree(svq->vring.desc); | ||||
|      qemu_vfree(svq->vring.used); | ||||
| diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
 | ||||
| index e5e24c536d..c132c994e9 100644
 | ||||
| --- a/hw/virtio/vhost-shadow-virtqueue.h
 | ||||
| +++ b/hw/virtio/vhost-shadow-virtqueue.h
 | ||||
| @@ -53,6 +53,12 @@ typedef struct VhostShadowVirtqueue {
 | ||||
|      /* Next VirtQueue element that guest made available */ | ||||
|      VirtQueueElement *next_guest_avail_elem; | ||||
|   | ||||
| +    /*
 | ||||
| +     * Backup next field for each descriptor so we can recover securely, not
 | ||||
| +     * needing to trust the device access.
 | ||||
| +     */
 | ||||
| +    uint16_t *desc_next;
 | ||||
| +
 | ||||
|      /* Next head to expose to the device */ | ||||
|      uint16_t shadow_avail_idx; | ||||
|   | ||||
							
								
								
									
										62
									
								
								debian/patches/extra/0015-vhost-Fix-device-s-used-descriptor-dequeue.patch
									
									
									
									
										vendored
									
									
										Normal file
									
								
							
							
						
						
									
										62
									
								
								debian/patches/extra/0015-vhost-Fix-device-s-used-descriptor-dequeue.patch
									
									
									
									
										vendored
									
									
										Normal file
									
								
							| @ -0,0 +1,62 @@ | ||||
| From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001 | ||||
| From: =?UTF-8?q?Eugenio=20P=C3=A9rez?= <eperezma@redhat.com> | ||||
| Date: Thu, 12 May 2022 19:57:43 +0200 | ||||
| Subject: [PATCH] vhost: Fix device's used descriptor dequeue | ||||
| MIME-Version: 1.0 | ||||
| Content-Type: text/plain; charset=UTF-8 | ||||
| Content-Transfer-Encoding: 8bit | ||||
| 
 | ||||
| Only the first one of them were properly enqueued back. | ||||
| 
 | ||||
| Fixes: 100890f7ca ("vhost: Shadow virtqueue buffers forwarding") | ||||
| 
 | ||||
| Signed-off-by: Eugenio Pérez <eperezma@redhat.com> | ||||
| Message-Id: <20220512175747.142058-3-eperezma@redhat.com> | ||||
| Reviewed-by: Michael S. Tsirkin <mst@redhat.com> | ||||
| Signed-off-by: Michael S. Tsirkin <mst@redhat.com> | ||||
| (cherry-picked from commit 81abfa5724c9a6502d7a1d3a67c55f2a303a1170) | ||||
| Signed-off-by: Fabian Ebner <f.ebner@proxmox.com> | ||||
| ---
 | ||||
|  hw/virtio/vhost-shadow-virtqueue.c | 17 +++++++++++++++-- | ||||
|  1 file changed, 15 insertions(+), 2 deletions(-) | ||||
| 
 | ||||
| diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
 | ||||
| index 3155801f50..31fc50907d 100644
 | ||||
| --- a/hw/virtio/vhost-shadow-virtqueue.c
 | ||||
| +++ b/hw/virtio/vhost-shadow-virtqueue.c
 | ||||
| @@ -334,12 +334,22 @@ static void vhost_svq_disable_notification(VhostShadowVirtqueue *svq)
 | ||||
|      svq->vring.avail->flags |= cpu_to_le16(VRING_AVAIL_F_NO_INTERRUPT); | ||||
|  } | ||||
|   | ||||
| +static uint16_t vhost_svq_last_desc_of_chain(const VhostShadowVirtqueue *svq,
 | ||||
| +                                             uint16_t num, uint16_t i)
 | ||||
| +{
 | ||||
| +    for (uint16_t j = 0; j < (num - 1); ++j) {
 | ||||
| +        i = le16_to_cpu(svq->desc_next[i]);
 | ||||
| +    }
 | ||||
| +
 | ||||
| +    return i;
 | ||||
| +}
 | ||||
| +
 | ||||
|  static VirtQueueElement *vhost_svq_get_buf(VhostShadowVirtqueue *svq, | ||||
|                                             uint32_t *len) | ||||
|  { | ||||
|      const vring_used_t *used = svq->vring.used; | ||||
|      vring_used_elem_t used_elem; | ||||
| -    uint16_t last_used;
 | ||||
| +    uint16_t last_used, last_used_chain, num;
 | ||||
|   | ||||
|      if (!vhost_svq_more_used(svq)) { | ||||
|          return NULL; | ||||
| @@ -365,7 +375,10 @@ static VirtQueueElement *vhost_svq_get_buf(VhostShadowVirtqueue *svq,
 | ||||
|          return NULL; | ||||
|      } | ||||
|   | ||||
| -    svq->desc_next[used_elem.id] = svq->free_head;
 | ||||
| +    num = svq->ring_id_maps[used_elem.id]->in_num +
 | ||||
| +          svq->ring_id_maps[used_elem.id]->out_num;
 | ||||
| +    last_used_chain = vhost_svq_last_desc_of_chain(svq, num, used_elem.id);
 | ||||
| +    svq->desc_next[last_used_chain] = svq->free_head;
 | ||||
|      svq->free_head = used_elem.id; | ||||
|   | ||||
|      *len = used_elem.len; | ||||
							
								
								
									
										39
									
								
								debian/patches/extra/0016-vdpa-Fix-bad-index-calculus-at-vhost_vdpa_get_vring_.patch
									
									
									
									
										vendored
									
									
										Normal file
									
								
							
							
						
						
									
										39
									
								
								debian/patches/extra/0016-vdpa-Fix-bad-index-calculus-at-vhost_vdpa_get_vring_.patch
									
									
									
									
										vendored
									
									
										Normal file
									
								
							| @ -0,0 +1,39 @@ | ||||
| From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001 | ||||
| From: =?UTF-8?q?Eugenio=20P=C3=A9rez?= <eperezma@redhat.com> | ||||
| Date: Thu, 12 May 2022 19:57:44 +0200 | ||||
| Subject: [PATCH] vdpa: Fix bad index calculus at vhost_vdpa_get_vring_base | ||||
| MIME-Version: 1.0 | ||||
| Content-Type: text/plain; charset=UTF-8 | ||||
| Content-Transfer-Encoding: 8bit | ||||
| 
 | ||||
| Fixes: 6d0b222666 ("vdpa: Adapt vhost_vdpa_get_vring_base to SVQ") | ||||
| 
 | ||||
| Acked-by: Jason Wang <jasowang@redhat.com> | ||||
| Signed-off-by: Eugenio Pérez <eperezma@redhat.com> | ||||
| Message-Id: <20220512175747.142058-4-eperezma@redhat.com> | ||||
| Reviewed-by: Michael S. Tsirkin <mst@redhat.com> | ||||
| Signed-off-by: Michael S. Tsirkin <mst@redhat.com> | ||||
| (cherry-picked from commit 639036477ef890958415967e753ca2cbb348c16c) | ||||
| Signed-off-by: Fabian Ebner <f.ebner@proxmox.com> | ||||
| ---
 | ||||
|  hw/virtio/vhost-vdpa.c | 4 ++-- | ||||
|  1 file changed, 2 insertions(+), 2 deletions(-) | ||||
| 
 | ||||
| diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
 | ||||
| index 8adf7c0b92..8555a84f87 100644
 | ||||
| --- a/hw/virtio/vhost-vdpa.c
 | ||||
| +++ b/hw/virtio/vhost-vdpa.c
 | ||||
| @@ -1170,11 +1170,11 @@ static int vhost_vdpa_get_vring_base(struct vhost_dev *dev,
 | ||||
|                                         struct vhost_vring_state *ring) | ||||
|  { | ||||
|      struct vhost_vdpa *v = dev->opaque; | ||||
| +    int vdpa_idx = ring->index - dev->vq_index;
 | ||||
|      int ret; | ||||
|   | ||||
|      if (v->shadow_vqs_enabled) { | ||||
| -        VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs,
 | ||||
| -                                                      ring->index);
 | ||||
| +        VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs, vdpa_idx);
 | ||||
|   | ||||
|          /* | ||||
|           * Setting base as last used idx, so destination will see as available | ||||
							
								
								
									
										35
									
								
								debian/patches/extra/0017-vdpa-Fix-index-calculus-at-vhost_vdpa_svqs_start.patch
									
									
									
									
										vendored
									
									
										Normal file
									
								
							
							
						
						
									
										35
									
								
								debian/patches/extra/0017-vdpa-Fix-index-calculus-at-vhost_vdpa_svqs_start.patch
									
									
									
									
										vendored
									
									
										Normal file
									
								
							| @ -0,0 +1,35 @@ | ||||
| From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001 | ||||
| From: =?UTF-8?q?Eugenio=20P=C3=A9rez?= <eperezma@redhat.com> | ||||
| Date: Thu, 12 May 2022 19:57:45 +0200 | ||||
| Subject: [PATCH] vdpa: Fix index calculus at vhost_vdpa_svqs_start | ||||
| MIME-Version: 1.0 | ||||
| Content-Type: text/plain; charset=UTF-8 | ||||
| Content-Transfer-Encoding: 8bit | ||||
| 
 | ||||
| With the introduction of MQ the index of the vq needs to be calculated | ||||
| with the device model vq_index. | ||||
| 
 | ||||
| Signed-off-by: Eugenio Pérez <eperezma@redhat.com> | ||||
| Acked-by: Jason Wang <jasowang@redhat.com> | ||||
| Message-Id: <20220512175747.142058-5-eperezma@redhat.com> | ||||
| Reviewed-by: Michael S. Tsirkin <mst@redhat.com> | ||||
| Signed-off-by: Michael S. Tsirkin <mst@redhat.com> | ||||
| (cherry-picked from commit 1c82fdfef8a227518ffecae9d419bcada995c202) | ||||
| Signed-off-by: Fabian Ebner <f.ebner@proxmox.com> | ||||
| ---
 | ||||
|  hw/virtio/vhost-vdpa.c | 2 +- | ||||
|  1 file changed, 1 insertion(+), 1 deletion(-) | ||||
| 
 | ||||
| diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
 | ||||
| index 8555a84f87..aa43cfa7d9 100644
 | ||||
| --- a/hw/virtio/vhost-vdpa.c
 | ||||
| +++ b/hw/virtio/vhost-vdpa.c
 | ||||
| @@ -1016,7 +1016,7 @@ static bool vhost_vdpa_svqs_start(struct vhost_dev *dev)
 | ||||
|          VirtQueue *vq = virtio_get_queue(dev->vdev, dev->vq_index + i); | ||||
|          VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs, i); | ||||
|          struct vhost_vring_addr addr = { | ||||
| -            .index = i,
 | ||||
| +            .index = dev->vq_index + i,
 | ||||
|          }; | ||||
|          int r; | ||||
|          bool ok = vhost_vdpa_svq_setup(dev, svq, i, &err); | ||||
							
								
								
									
										74
									
								
								debian/patches/extra/0018-hw-virtio-Replace-g_memdup-by-g_memdup2.patch
									
									
									
									
										vendored
									
									
										Normal file
									
								
							
							
						
						
									
										74
									
								
								debian/patches/extra/0018-hw-virtio-Replace-g_memdup-by-g_memdup2.patch
									
									
									
									
										vendored
									
									
										Normal file
									
								
							| @ -0,0 +1,74 @@ | ||||
| From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001 | ||||
| From: =?UTF-8?q?Philippe=20Mathieu-Daud=C3=A9?= <philmd@redhat.com> | ||||
| Date: Thu, 12 May 2022 19:57:46 +0200 | ||||
| Subject: [PATCH] hw/virtio: Replace g_memdup() by g_memdup2() | ||||
| MIME-Version: 1.0 | ||||
| Content-Type: text/plain; charset=UTF-8 | ||||
| Content-Transfer-Encoding: 8bit | ||||
| 
 | ||||
| Per https://discourse.gnome.org/t/port-your-module-from-g-memdup-to-g-memdup2-now/5538 | ||||
| 
 | ||||
|   The old API took the size of the memory to duplicate as a guint, | ||||
|   whereas most memory functions take memory sizes as a gsize. This | ||||
|   made it easy to accidentally pass a gsize to g_memdup(). For large | ||||
|   values, that would lead to a silent truncation of the size from 64 | ||||
|   to 32 bits, and result in a heap area being returned which is | ||||
|   significantly smaller than what the caller expects. This can likely | ||||
|   be exploited in various modules to cause a heap buffer overflow. | ||||
| 
 | ||||
| Replace g_memdup() by the safer g_memdup2() wrapper. | ||||
| 
 | ||||
| Acked-by: Jason Wang <jasowang@redhat.com> | ||||
| Acked-by: Eugenio Pérez <eperezma@redhat.com> | ||||
| Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> | ||||
| Message-Id: <20220512175747.142058-6-eperezma@redhat.com> | ||||
| Reviewed-by: Michael S. Tsirkin <mst@redhat.com> | ||||
| Signed-off-by: Michael S. Tsirkin <mst@redhat.com> | ||||
| (cherry-picked from commit d792199de55ca5cb5334016884039c740290b5c7) | ||||
| Signed-off-by: Fabian Ebner <f.ebner@proxmox.com> | ||||
| ---
 | ||||
|  hw/net/virtio-net.c       | 3 ++- | ||||
|  hw/virtio/virtio-crypto.c | 6 +++--- | ||||
|  2 files changed, 5 insertions(+), 4 deletions(-) | ||||
| 
 | ||||
| diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
 | ||||
| index 1067e72b39..e4748a7e6c 100644
 | ||||
| --- a/hw/net/virtio-net.c
 | ||||
| +++ b/hw/net/virtio-net.c
 | ||||
| @@ -1443,7 +1443,8 @@ static void virtio_net_handle_ctrl(VirtIODevice *vdev, VirtQueue *vq)
 | ||||
|          } | ||||
|   | ||||
|          iov_cnt = elem->out_num; | ||||
| -        iov2 = iov = g_memdup(elem->out_sg, sizeof(struct iovec) * elem->out_num);
 | ||||
| +        iov2 = iov = g_memdup2(elem->out_sg,
 | ||||
| +                               sizeof(struct iovec) * elem->out_num);
 | ||||
|          s = iov_to_buf(iov, iov_cnt, 0, &ctrl, sizeof(ctrl)); | ||||
|          iov_discard_front(&iov, &iov_cnt, sizeof(ctrl)); | ||||
|          if (s != sizeof(ctrl)) { | ||||
| diff --git a/hw/virtio/virtio-crypto.c b/hw/virtio/virtio-crypto.c
 | ||||
| index dcd80b904d..0e31e3cc04 100644
 | ||||
| --- a/hw/virtio/virtio-crypto.c
 | ||||
| +++ b/hw/virtio/virtio-crypto.c
 | ||||
| @@ -242,7 +242,7 @@ static void virtio_crypto_handle_ctrl(VirtIODevice *vdev, VirtQueue *vq)
 | ||||
|          } | ||||
|   | ||||
|          out_num = elem->out_num; | ||||
| -        out_iov_copy = g_memdup(elem->out_sg, sizeof(out_iov[0]) * out_num);
 | ||||
| +        out_iov_copy = g_memdup2(elem->out_sg, sizeof(out_iov[0]) * out_num);
 | ||||
|          out_iov = out_iov_copy; | ||||
|   | ||||
|          in_num = elem->in_num; | ||||
| @@ -605,11 +605,11 @@ virtio_crypto_handle_request(VirtIOCryptoReq *request)
 | ||||
|      } | ||||
|   | ||||
|      out_num = elem->out_num; | ||||
| -    out_iov_copy = g_memdup(elem->out_sg, sizeof(out_iov[0]) * out_num);
 | ||||
| +    out_iov_copy = g_memdup2(elem->out_sg, sizeof(out_iov[0]) * out_num);
 | ||||
|      out_iov = out_iov_copy; | ||||
|   | ||||
|      in_num = elem->in_num; | ||||
| -    in_iov_copy = g_memdup(elem->in_sg, sizeof(in_iov[0]) * in_num);
 | ||||
| +    in_iov_copy = g_memdup2(elem->in_sg, sizeof(in_iov[0]) * in_num);
 | ||||
|      in_iov = in_iov_copy; | ||||
|   | ||||
|      if (unlikely(iov_to_buf(out_iov, out_num, 0, &req, sizeof(req)) | ||||
							
								
								
									
										47
									
								
								debian/patches/extra/0019-vhost-Fix-element-in-vhost_svq_add-failure.patch
									
									
									
									
										vendored
									
									
										Normal file
									
								
							
							
						
						
									
										47
									
								
								debian/patches/extra/0019-vhost-Fix-element-in-vhost_svq_add-failure.patch
									
									
									
									
										vendored
									
									
										Normal file
									
								
							| @ -0,0 +1,47 @@ | ||||
| From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001 | ||||
| From: =?UTF-8?q?Eugenio=20P=C3=A9rez?= <eperezma@redhat.com> | ||||
| Date: Thu, 12 May 2022 19:57:47 +0200 | ||||
| Subject: [PATCH] vhost: Fix element in vhost_svq_add failure | ||||
| MIME-Version: 1.0 | ||||
| Content-Type: text/plain; charset=UTF-8 | ||||
| Content-Transfer-Encoding: 8bit | ||||
| 
 | ||||
| Coverity rightly reports that is not free in that case. | ||||
| 
 | ||||
| Fixes: Coverity CID 1487559 | ||||
| Fixes: 100890f7ca ("vhost: Shadow virtqueue buffers forwarding") | ||||
| 
 | ||||
| Signed-off-by: Eugenio Pérez <eperezma@redhat.com> | ||||
| Message-Id: <20220512175747.142058-7-eperezma@redhat.com> | ||||
| Reviewed-by: Michael S. Tsirkin <mst@redhat.com> | ||||
| Signed-off-by: Michael S. Tsirkin <mst@redhat.com> | ||||
| (cherry-picked from commit 5181db132b587754dda3a520eec923b87a65bbb7) | ||||
| Signed-off-by: Fabian Ebner <f.ebner@proxmox.com> | ||||
| ---
 | ||||
|  hw/virtio/vhost-shadow-virtqueue.c | 8 ++++++++ | ||||
|  1 file changed, 8 insertions(+) | ||||
| 
 | ||||
| diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
 | ||||
| index 31fc50907d..06d0bb39d9 100644
 | ||||
| --- a/hw/virtio/vhost-shadow-virtqueue.c
 | ||||
| +++ b/hw/virtio/vhost-shadow-virtqueue.c
 | ||||
| @@ -199,11 +199,19 @@ static bool vhost_svq_add_split(VhostShadowVirtqueue *svq,
 | ||||
|      return true; | ||||
|  } | ||||
|   | ||||
| +/**
 | ||||
| + * Add an element to a SVQ.
 | ||||
| + *
 | ||||
| + * The caller must check that there is enough slots for the new element. It
 | ||||
| + * takes ownership of the element: In case of failure, it is free and the SVQ
 | ||||
| + * is considered broken.
 | ||||
| + */
 | ||||
|  static bool vhost_svq_add(VhostShadowVirtqueue *svq, VirtQueueElement *elem) | ||||
|  { | ||||
|      unsigned qemu_head; | ||||
|      bool ok = vhost_svq_add_split(svq, elem, &qemu_head); | ||||
|      if (unlikely(!ok)) { | ||||
| +        g_free(elem);
 | ||||
|          return false; | ||||
|      } | ||||
|   | ||||
							
								
								
									
										19
									
								
								debian/patches/series
									
									
									
									
										vendored
									
									
								
							
							
						
						
									
										19
									
								
								debian/patches/series
									
									
									
									
										vendored
									
									
								
							| @ -1,5 +1,22 @@ | ||||
| extra/0001-monitor-qmp-fix-race-with-clients-disconnecting-earl.patch | ||||
| extra/0002-block-gluster-correctly-set-max_pdiscard-which-is-in.patch | ||||
| extra/0002-block-gluster-correctly-set-max_pdiscard.patch | ||||
| extra/0003-block-vmdk-Fix-reopening-bs-file.patch | ||||
| extra/0004-linux-aio-fix-unbalanced-plugged-counter-in-laio_io_.patch | ||||
| extra/0005-pci-fix-overflow-in-snprintf-string-formatting.patch | ||||
| extra/0006-target-i386-kvm-Fix-disabling-MPX-on-cpu-host-with-M.patch | ||||
| extra/0007-coroutine-ucontext-use-QEMU_DEFINE_STATIC_CO_TLS.patch | ||||
| extra/0008-coroutine-use-QEMU_DEFINE_STATIC_CO_TLS.patch | ||||
| extra/0009-coroutine-Rename-qemu_coroutine_inc-dec_pool_size.patch | ||||
| extra/0010-coroutine-Revert-to-constant-batch-size.patch | ||||
| extra/0011-target-i386-do-not-consult-nonexistent-host-leaves.patch | ||||
| extra/0012-virtio-scsi-fix-ctrl-and-event-handler-functions-in-.patch | ||||
| extra/0013-virtio-scsi-don-t-waste-CPU-polling-the-event-virtqu.patch | ||||
| extra/0014-vhost-Track-descriptor-chain-in-private-at-SVQ.patch | ||||
| extra/0015-vhost-Fix-device-s-used-descriptor-dequeue.patch | ||||
| extra/0016-vdpa-Fix-bad-index-calculus-at-vhost_vdpa_get_vring_.patch | ||||
| extra/0017-vdpa-Fix-index-calculus-at-vhost_vdpa_svqs_start.patch | ||||
| extra/0018-hw-virtio-Replace-g_memdup-by-g_memdup2.patch | ||||
| extra/0019-vhost-Fix-element-in-vhost_svq_add-failure.patch | ||||
| bitmap-mirror/0001-drive-mirror-add-support-for-sync-bitmap-mode-never.patch | ||||
| bitmap-mirror/0002-drive-mirror-add-support-for-conditional-and-always-.patch | ||||
| bitmap-mirror/0003-mirror-add-check-for-bitmap-mode-without-bitmap.patch | ||||
|  | ||||
		Loading…
	
		Reference in New Issue
	
	Block a user
	 Fabian Ebner
						Fabian Ebner