[Spice-devel] "qxl: add optinal 64bit vram bar" patch significantly slows down Spice

Submitted by Hans de Goede on April 16, 2012, 12:02 p.m.

Details

Message ID 4F8C0A46.3090303@redhat.com
State New
Headers show

Not browsing as part of any series.

Commit Message

Hans de Goede April 16, 2012, 12:02 p.m.
Hi,

On 04/16/2012 11:59 AM, Avi Kivity wrote:
> On 04/16/2012 12:39 PM, Gerd Hoffmann wrote:
>> On 04/06/12 15:54, Hans de Goede wrote:
>>> Hi Gerd,
>>>
>>> While cherry picking some spice patches into my own qemu-kvm-1.0 branch,
>>> so that we can add them to Fedora-17 I noticed a significant slowdown
>>> after I was done cherry picking. Investigation has shown your
>>> "qxl: add optinal 64bit vram bar" patch to be the culprit.
>>>
>>> I noticed this using an old Fedora-14 32 bits vm with an xorg-x11-drv-qxl
>>> patched to use the async methods.
>>>
>>> If I scroll through the gnome applications menu with the mouse with
>>> plain qemu-kvm-1.0 all is fine, but once I add your:
>>> "qxl: add optinal 64bit vram bar" the updating of the display
>>> becomes noticably slower, the blue bar highlighting the selected
>>> menu entry becomes lagged compared to the mouse cursor.
>>
>> Avi?  This looks like a memory aliasing issue.
>>
>> Old state:
>>    vram bar, backed by memory
>>
>> New state:
>>    vram bar, 64bit, backed by memory, might not be mapped.
>>    vram bar, 32bit, alias for the first part of the 64bit bar.
>>
>> Full patch:
>>
>> http://git.qemu.org/?p=qemu.git;a=commitdiff;h=6f2b175a090f367c3aab2226c4741b439671307a
>>
>> Any hints?
>
> The patch looks correct.  Hans, please post your git tree and a command
> line.

I've seen this with this git tree + branch:
http://cgit.freedesktop.org/~jwrdegoede/qemu/log/?h=qemu-kvm-1.0-usbredir

With the patch in question back-ported to it (attached), and this cmdline:
exec sudo gdb -ex "handle SIGUSR2 nostop noprint" -ex run --args \
  /usr/bin/qemu-system-x86_64 -enable-kvm \
  -m 1024 -name F14-x86_64 -boot menu=on \
  -drive file=/home/virt-images/f14-i386.qcow2.img,media=disk,index=0 \
  -net nic,macaddr=52:54:00:7a:b4:7e,vlan=0,model=virtio,name=virtio.0 -net tap,
  -vga qxl -spice port=5932,disable-ticketing,streaming-video=off,image-compress
  -device virtio-serial -chardev spicevmc,id=vdagent,debug=1,name=vdagent -devic
  -readconfig /etc/qemu/ich9-ehci-uhci.cfg \
  -chardev spicevmc,name=usbredir,id=usbredirchardev1,debug=0 \
  -device usb-redir,chardev=usbredirchardev1,id=usbredirdev1,debug=3 \
  -chardev spicevmc,name=usbredir,id=usbredirchardev2,debug=0 \
  -device usb-redir,chardev=usbredirchardev2,id=usbredirdev2,debug=3 \
  -chardev spicevmc,name=usbredir,id=usbredirchardev3,debug=0 \
  -device usb-redir,chardev=usbredirchardev3,id=usbredirdev3,debug=3 \
  -monitor stdio -device intel-hda -device hda-duplex

Note I've not actually run any benchmarks, running the vm with versus
without the patch just feels (and looks, in the form of slower updating
display) significantly slower.

Could it be that the patch somehow causes seabios and/or Xorg to
see the region as different wrt being cachable / uncachable or
some such?

Regards,

Hans

Patch hide | download patch | download mbox

From 4f9b6f4d19ca87aacb1a7a1c0a0e8a36304c5e60 Mon Sep 17 00:00:00 2001
From: Gerd Hoffmann <kraxel@redhat.com>
Date: Fri, 14 Oct 2011 18:05:48 +0200
Subject: [PATCH 19/31] qxl: add optinal 64bit vram bar

This patch adds an 64bit pci bar for vram.  It is turned off by default.
It can be enabled by setting the size of the 64bit bar to be larger than
the 32bit bar.  Both 32bit and 64bit bar refer to the same memory.  Only
the first part of the memory is available via 32bit bar.

The intention is to allow large vram sizes for 64bit guests, by allowing
the vram bar being mapped above 4G, so we don't have to squeeze it into
the pci I/O window below 4G.

With vram_size_mb=16 and vram64_size_mb=256 it looks like this:

00:02.0 VGA compatible controller: Red Hat, Inc. Device 0100 (rev 02) (prog-if 00 [VGA controller])
        Subsystem: Red Hat, Inc Device 1100
        Physical Slot: 2
        Flags: fast devsel, IRQ 10
        Memory at f8000000 (32-bit, non-prefetchable) [size=64M]
        Memory at fc000000 (32-bit, non-prefetchable) [size=16M]
        Memory at fd020000 (32-bit, non-prefetchable) [size=8K]
        I/O ports at c5a0 [size=32]
        Memory at ffe0000000 (64-bit, prefetchable) [size=256M]
        Expansion ROM at fd000000 [disabled] [size=64K]

[ mapping above 4G needs patched seabios:
  http://www.kraxel.org/cgit/seabios/commit/?h=pci64 ]

Conflicts:

	hw/qxl.c
---
 hw/qxl.c |   51 ++++++++++++++++++++++++++++++++++++++++++++-------
 hw/qxl.h |    7 +++++++
 2 files changed, 51 insertions(+), 7 deletions(-)

diff --git a/hw/qxl.c b/hw/qxl.c
index 73be115..1151147 100644
--- a/hw/qxl.c
+++ b/hw/qxl.c
@@ -979,6 +979,7 @@  static void qxl_add_memslot(PCIQXLDevice *d, uint32_t slot_id, uint64_t delta,
     static const int regions[] = {
         QXL_RAM_RANGE_INDEX,
         QXL_VRAM_RANGE_INDEX,
+        QXL_VRAM64_RANGE_INDEX,
     };
     uint64_t guest_start;
     uint64_t guest_end;
@@ -1025,6 +1026,7 @@  static void qxl_add_memslot(PCIQXLDevice *d, uint32_t slot_id, uint64_t delta,
         virt_start = (intptr_t)memory_region_get_ram_ptr(&d->vga.vram);
         break;
     case QXL_VRAM_RANGE_INDEX:
+    case 4 /* vram 64bit */:
         virt_start = (intptr_t)memory_region_get_ram_ptr(&d->vram_bar);
         break;
     default:
@@ -1625,18 +1627,28 @@  static void qxl_init_ramsize(PCIQXLDevice *qxl, uint32_t ram_min_mb)
         qxl->vga.vram_size = ram_min_mb * 1024 * 1024;
     }
 
-    /* vram (surfaces, bar 1) */
+    /* vram32 (surfaces, 32bit, bar 1) */
+    if (qxl->vram32_size_mb != -1) {
+        qxl->vram32_size = qxl->vram32_size_mb * 1024 * 1024;
+    }
+    if (qxl->vram32_size < 4096) {
+        qxl->vram32_size = 4096;
+    }
+
+    /* vram (surfaces, 64bit, bar 4+5) */
     if (qxl->vram_size_mb != -1) {
         qxl->vram_size = qxl->vram_size_mb * 1024 * 1024;
     }
-    if (qxl->vram_size < 4096) {
-        qxl->vram_size = 4096;
+    if (qxl->vram_size < qxl->vram32_size) {
+        qxl->vram_size = qxl->vram32_size;
     }
+
     if (qxl->revision == 1) {
+        qxl->vram32_size = 4096;
         qxl->vram_size = 4096;
     }
-
     qxl->vga.vram_size = msb_mask(qxl->vga.vram_size * 2 - 1);
+    qxl->vram32_size = msb_mask(qxl->vram32_size * 2 - 1);
     qxl->vram_size = msb_mask(qxl->vram_size * 2 - 1);
 }
 
@@ -1678,6 +1690,8 @@  static int qxl_init_common(PCIQXLDevice *qxl)
 
     memory_region_init_ram(&qxl->vram_bar, &qxl->pci.qdev, "qxl.vram",
                            qxl->vram_size);
+    memory_region_init_alias(&qxl->vram32_bar, "qxl.vram32", &qxl->vram_bar,
+                             0, qxl->vram32_size);
 
     io_size = msb_mask(QXL_IO_RANGE_SIZE * 2 - 1);
     if (qxl->revision == 1) {
@@ -1701,7 +1715,29 @@  static int qxl_init_common(PCIQXLDevice *qxl)
                      PCI_BASE_ADDRESS_SPACE_MEMORY, &qxl->vga.vram);
 
     pci_register_bar(&qxl->pci, QXL_VRAM_RANGE_INDEX,
-                     PCI_BASE_ADDRESS_SPACE_MEMORY, &qxl->vram_bar);
+                     PCI_BASE_ADDRESS_SPACE_MEMORY, &qxl->vram32_bar);
+
+    if (qxl->vram32_size < qxl->vram_size) {
+        /*
+         * Make the 64bit vram bar show up only in case it is
+         * configured to be larger than the 32bit vram bar.
+         */
+        pci_register_bar(&qxl->pci, QXL_VRAM64_RANGE_INDEX,
+                         PCI_BASE_ADDRESS_SPACE_MEMORY |
+                         PCI_BASE_ADDRESS_MEM_TYPE_64 |
+                         PCI_BASE_ADDRESS_MEM_PREFETCH,
+                         &qxl->vram_bar);
+    }
+
+    /* print pci bar details */
+    dprint(qxl, 1, "ram/%s: %d MB [region 0]\n",
+           qxl->id == 0 ? "pri" : "sec",
+           qxl->vga.vram_size / (1024*1024));
+    dprint(qxl, 1, "vram/32: %d MB [region 1]\n",
+           qxl->vram32_size / (1024*1024));
+    dprint(qxl, 1, "vram/64: %d MB %s\n",
+           qxl->vram_size / (1024*1024),
+           qxl->vram32_size < qxl->vram_size ? "[region 4]" : "[unmapped]");
 
     qxl->ssd.qxl.base.sif = &qxl_interface.base;
     qxl->ssd.qxl.id = qxl->id;
@@ -1920,7 +1956,7 @@  static VMStateDescription qxl_vmstate = {
 static Property qxl_properties[] = {
         DEFINE_PROP_UINT32("ram_size", PCIQXLDevice, vga.vram_size,
                            64 * 1024 * 1024),
-        DEFINE_PROP_UINT32("vram_size", PCIQXLDevice, vram_size,
+        DEFINE_PROP_UINT32("vram_size", PCIQXLDevice, vram32_size,
                            64 * 1024 * 1024),
         DEFINE_PROP_UINT32("revision", PCIQXLDevice, revision,
                            QXL_DEFAULT_REVISION),
@@ -1928,7 +1964,8 @@  static Property qxl_properties[] = {
         DEFINE_PROP_UINT32("guestdebug", PCIQXLDevice, guestdebug, 0),
         DEFINE_PROP_UINT32("cmdlog", PCIQXLDevice, cmdlog, 0),
         DEFINE_PROP_UINT32("ram_size_mb",  PCIQXLDevice, ram_size_mb, -1),
-        DEFINE_PROP_UINT32("vram_size_mb", PCIQXLDevice, vram_size_mb, -1),
+        DEFINE_PROP_UINT32("vram_size_mb", PCIQXLDevice, vram32_size_mb, 0),
+        DEFINE_PROP_UINT32("vram64_size_mb", PCIQXLDevice, vram_size_mb, 0),
         DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/hw/qxl.h b/hw/qxl.h
index 86e415b..11a0db3 100644
--- a/hw/qxl.h
+++ b/hw/qxl.h
@@ -16,6 +16,10 @@  enum qxl_mode {
     QXL_MODE_NATIVE,
 };
 
+#ifndef QXL_VRAM64_RANGE_INDEX
+#define QXL_VRAM64_RANGE_INDEX 4
+#endif
+
 #define QXL_UNDEFINED_IO UINT32_MAX
 
 #define QXL_NUM_DIRTY_RECTS 64
@@ -88,6 +92,8 @@  typedef struct PCIQXLDevice {
     /* vram pci bar */
     uint32_t           vram_size;
     MemoryRegion       vram_bar;
+    uint32_t           vram32_size;
+    MemoryRegion       vram32_bar;
 
     /* io bar */
     MemoryRegion       io_bar;
@@ -95,6 +101,7 @@  typedef struct PCIQXLDevice {
     /* user-friendly properties (in megabytes) */
     uint32_t          ram_size_mb;
     uint32_t          vram_size_mb;
+    uint32_t          vram32_size_mb;
 
     /* qxl_render_update state */
     int                render_update_cookie_num;
-- 
1.7.9.3


Comments

On 04/16/2012 03:02 PM, Hans de Goede wrote:
>
> Could it be that the patch somehow causes seabios and/or Xorg to
> see the region as different wrt being cachable / uncachable or
> some such?

In general we ignore cacheability in kvm.  Most likely a but in the
memory core caused the region to be interpreted as mmio instead of RAM,
so every access costs a few microseconds to trap and emulate.