[Spice-devel,10/12] qxl-wddm-dod: Optimize allocation of memory chunks

Submitted by Yuri Benditovich on March 12, 2017, 8:45 a.m.

Details

Message ID 1489308309-9728-11-git-send-email-yuri.benditovich@daynix.com
State New
Headers show
Series "Set of patches for further support of VSync" ( rev: 1 ) in Spice

Not browsing as part of any series.

Commit Message

Yuri Benditovich March 12, 2017, 8:45 a.m.
Increased size of allocation to reduce number of allocation per
bitmap. Before change the procedure ignored 'alloc_size' parameter
and always allocated memory chunk according to 'size' parameter.
As a result, first chunk could be up to 64K and all following are
limited by line size. For example, for bitmap 1280x1024 there was
more than 1008 chunks allocated, for bitmap 128x1024 - 897 chunks.
We change the procedure to use chunk size up to 64K (similar to first
chunk). This reduces in described examples number of allocation from
1008 to 64 and 897 to 8 respectively.

Signed-off-by: Yuri Benditovich <yuri.benditovich@daynix.com>
---
 qxldod/QxlDod.cpp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Patch hide | download patch | download mbox

diff --git a/qxldod/QxlDod.cpp b/qxldod/QxlDod.cpp
index 56a4cf2..94426dd 100755
--- a/qxldod/QxlDod.cpp
+++ b/qxldod/QxlDod.cpp
@@ -4586,7 +4586,7 @@  BOOLEAN QxlDevice::PutBytesAlign(QXLDataChunk **chunk_ptr, UINT8 **now_ptr,
     UINT8 *now = *now_ptr;
     UINT8 *end = *end_ptr;
     size_t maxAllocSize = BITS_BUF_MAX - BITS_BUF_MAX % size;
-    alloc_size = MIN(size, maxAllocSize);
+    alloc_size = MIN(alloc_size, maxAllocSize);
     DbgPrint(TRACE_LEVEL_VERBOSE, ("---> %s\n", __FUNCTION__));
 
     while (size) {

Comments

> 
> Increased size of allocation to reduce number of allocation per
> bitmap. Before change the procedure ignored 'alloc_size' parameter
> and always allocated memory chunk according to 'size' parameter.
> As a result, first chunk could be up to 64K and all following are
> limited by line size. For example, for bitmap 1280x1024 there was
> more than 1008 chunks allocated, for bitmap 128x1024 - 897 chunks.
> We change the procedure to use chunk size up to 64K (similar to first
> chunk). This reduces in described examples number of allocation from
> 1008 to 64 and 897 to 8 respectively.
> 
> Signed-off-by: Yuri Benditovich <yuri.benditovich@daynix.com>
> ---
>  qxldod/QxlDod.cpp | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/qxldod/QxlDod.cpp b/qxldod/QxlDod.cpp
> index 56a4cf2..94426dd 100755
> --- a/qxldod/QxlDod.cpp
> +++ b/qxldod/QxlDod.cpp
> @@ -4586,7 +4586,7 @@ BOOLEAN QxlDevice::PutBytesAlign(QXLDataChunk
> **chunk_ptr, UINT8 **now_ptr,
>      UINT8 *now = *now_ptr;
>      UINT8 *end = *end_ptr;
>      size_t maxAllocSize = BITS_BUF_MAX - BITS_BUF_MAX % size;
> -    alloc_size = MIN(size, maxAllocSize);
> +    alloc_size = MIN(alloc_size, maxAllocSize);
>      DbgPrint(TRACE_LEVEL_VERBOSE, ("---> %s\n", __FUNCTION__));
>  
>      while (size) {

As far as I can see this fix a regression introduced in 9/12 so
should be squashed together with 9/12.

On the same subject. Why we need to split the allocations?
Can't we try to allocate the whole image in one chunk and
only if fails fall back to multiple chunks ?

Frediano
On Tue, Mar 28, 2017 at 4:23 PM, Frediano Ziglio <fziglio@redhat.com> wrote:

> >
> > Increased size of allocation to reduce number of allocation per
> > bitmap. Before change the procedure ignored 'alloc_size' parameter
> > and always allocated memory chunk according to 'size' parameter.
> > As a result, first chunk could be up to 64K and all following are
> > limited by line size. For example, for bitmap 1280x1024 there was
> > more than 1008 chunks allocated, for bitmap 128x1024 - 897 chunks.
> > We change the procedure to use chunk size up to 64K (similar to first
> > chunk). This reduces in described examples number of allocation from
> > 1008 to 64 and 897 to 8 respectively.
> >
> > Signed-off-by: Yuri Benditovich <yuri.benditovich@daynix.com>
> > ---
> >  qxldod/QxlDod.cpp | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/qxldod/QxlDod.cpp b/qxldod/QxlDod.cpp
> > index 56a4cf2..94426dd 100755
> > --- a/qxldod/QxlDod.cpp
> > +++ b/qxldod/QxlDod.cpp
> > @@ -4586,7 +4586,7 @@ BOOLEAN QxlDevice::PutBytesAlign(QXLDataChunk
> > **chunk_ptr, UINT8 **now_ptr,
> >      UINT8 *now = *now_ptr;
> >      UINT8 *end = *end_ptr;
> >      size_t maxAllocSize = BITS_BUF_MAX - BITS_BUF_MAX % size;
> > -    alloc_size = MIN(size, maxAllocSize);
> > +    alloc_size = MIN(alloc_size, maxAllocSize);
> >      DbgPrint(TRACE_LEVEL_VERBOSE, ("---> %s\n", __FUNCTION__));
> >
> >      while (size) {
>
> As far as I can see this fix a regression introduced in 9/12 so
> should be squashed together with 9/12.
>

There was no regression in 9/12, see commit description
This patch forces the driver to make bigger allocations and in result we
will have smaller amount of allocations per operation.



>
> On the same subject. Why we need to split the allocations?
> Can't we try to allocate the whole image in one chunk and
> only if fails fall back to multiple chunks ?
>

No, we can't do that, as maximal size of allocation from device memory in
driver's code was always limited by ~64K.
So allocation were always splitted, see numbers in commit description.
I suppose there is some reason for that.


> Frediano
>
> On Tue, Mar 28, 2017 at 4:23 PM, Frediano Ziglio < fziglio@redhat.com >
> wrote:

> > >
> 
> > > Increased size of allocation to reduce number of allocation per
> 
> > > bitmap. Before change the procedure ignored 'alloc_size' parameter
> 
> > > and always allocated memory chunk according to 'size' parameter.
> 
> > > As a result, first chunk could be up to 64K and all following are
> 
> > > limited by line size. For example, for bitmap 1280x1024 there was
> 
> > > more than 1008 chunks allocated, for bitmap 128x1024 - 897 chunks.
> 
> > > We change the procedure to use chunk size up to 64K (similar to first
> 
> > > chunk). This reduces in described examples number of allocation from
> 
> > > 1008 to 64 and 897 to 8 respectively.
> 
> > >
> 
> > > Signed-off-by: Yuri Benditovich < yuri.benditovich@daynix.com >
> 
> > > ---
> 
> > > qxldod/QxlDod.cpp | 2 +-
> 
> > > 1 file changed, 1 insertion(+), 1 deletion(-)
> 
> > >
> 
> > > diff --git a/qxldod/QxlDod.cpp b/qxldod/QxlDod.cpp
> 
> > > index 56a4cf2..94426dd 100755
> 
> > > --- a/qxldod/QxlDod.cpp
> 
> > > +++ b/qxldod/QxlDod.cpp
> 
> > > @@ -4586,7 +4586,7 @@ BOOLEAN QxlDevice::PutBytesAlign(QXLDataChunk
> 
> > > **chunk_ptr, UINT8 **now_ptr,
> 
> > > UINT8 *now = *now_ptr;
> 
> > > UINT8 *end = *end_ptr;
> 
> > > size_t maxAllocSize = BITS_BUF_MAX - BITS_BUF_MAX % size;
> 
> > > - alloc_size = MIN(size, maxAllocSize);
> 
> > > + alloc_size = MIN(alloc_size, maxAllocSize);
> 
> > > DbgPrint(TRACE_LEVEL_VERBOSE, ("---> %s\n", __FUNCTION__));
> 
> > >
> 
> > > while (size) {
> 

> > As far as I can see this fix a regression introduced in 9/12 so
> 
> > should be squashed together with 9/12.
> 

> There was no regression in 9/12, see commit description
> This patch forces the driver to make bigger allocations and in result we will
> have smaller amount of allocations per operation.

Yes, you are right, there is this computation 

aligned_size = (int)MIN(alloc_size + alignment - 1, BITS_BUF_MAX); 
aligned_size -= aligned_size % alignment; 

but then the value is discarded and size is used instead. 

> > On the same subject. Why we need to split the allocations?
> 
> > Can't we try to allocate the whole image in one chunk and
> 
> > only if fails fall back to multiple chunks ?
> 

> No, we can't do that, as maximal size of allocation from device memory in
> driver's code was always limited by ~64K.
> So allocation were always splitted, see numbers in commit description.
> I suppose there is some reason for that.

Honestly knowing the server part I don't see any reasons. Maybe an history one... 

Frediano
On Tue, Mar 28, 2017 at 7:19 PM, Frediano Ziglio <fziglio@redhat.com> wrote:

>
> On Tue, Mar 28, 2017 at 4:23 PM, Frediano Ziglio <fziglio@redhat.com>
> wrote:
>
>> >
>> > Increased size of allocation to reduce number of allocation per
>> > bitmap. Before change the procedure ignored 'alloc_size' parameter
>> > and always allocated memory chunk according to 'size' parameter.
>> > As a result, first chunk could be up to 64K and all following are
>> > limited by line size. For example, for bitmap 1280x1024 there was
>> > more than 1008 chunks allocated, for bitmap 128x1024 - 897 chunks.
>> > We change the procedure to use chunk size up to 64K (similar to first
>> > chunk). This reduces in described examples number of allocation from
>> > 1008 to 64 and 897 to 8 respectively.
>> >
>> > Signed-off-by: Yuri Benditovich <yuri.benditovich@daynix.com>
>> > ---
>> >  qxldod/QxlDod.cpp | 2 +-
>> >  1 file changed, 1 insertion(+), 1 deletion(-)
>> >
>> > diff --git a/qxldod/QxlDod.cpp b/qxldod/QxlDod.cpp
>> > index 56a4cf2..94426dd 100755
>> > --- a/qxldod/QxlDod.cpp
>> > +++ b/qxldod/QxlDod.cpp
>> > @@ -4586,7 +4586,7 @@ BOOLEAN QxlDevice::PutBytesAlign(QXLDataChunk
>> > **chunk_ptr, UINT8 **now_ptr,
>> >      UINT8 *now = *now_ptr;
>> >      UINT8 *end = *end_ptr;
>> >      size_t maxAllocSize = BITS_BUF_MAX - BITS_BUF_MAX % size;
>> > -    alloc_size = MIN(size, maxAllocSize);
>> > +    alloc_size = MIN(alloc_size, maxAllocSize);
>> >      DbgPrint(TRACE_LEVEL_VERBOSE, ("---> %s\n", __FUNCTION__));
>> >
>> >      while (size) {
>>
>> As far as I can see this fix a regression introduced in 9/12 so
>> should be squashed together with 9/12.
>>
>
> There was no regression in 9/12, see commit description
> This patch forces the driver to make bigger allocations and in result we
> will have smaller amount of allocations per operation.
>
> Yes, you are right, there is this computation
>
>   aligned_size = (int)MIN(alloc_size + alignment - 1, BITS_BUF_MAX);
>   aligned_size -=  aligned_size % alignment;
>
> but then the value is discarded and size is used instead.
>
>
>> On the same subject. Why we need to split the allocations?
>> Can't we try to allocate the whole image in one chunk and
>> only if fails fall back to multiple chunks ?
>>
>
> No, we can't do that, as maximal size of allocation from device memory in
> driver's code was always limited by ~64K.
> So allocation were always splitted, see numbers in commit description.
> I suppose there is some reason for that.
>
> Honestly knowing the server part I don't see any reasons. Maybe an history
> one...
>
> If any version of Spice server is able to work with bigger chunks, this
optimization can be evaluated later.
In any case it requires non-forced allocations to be fully implemented and
used ( as we can 'try' to allocate
full size chunk only if we do not block during such allocation).



> Frediano
>
>
> On Tue, Mar 28, 2017 at 7:19 PM, Frediano Ziglio < fziglio@redhat.com >
> wrote:

> > > On Tue, Mar 28, 2017 at 4:23 PM, Frediano Ziglio < fziglio@redhat.com >
> > > wrote:
> > 
> 

> > > > >
> > > 
> > 
> 
> > > > > Increased size of allocation to reduce number of allocation per
> > > 
> > 
> 
> > > > > bitmap. Before change the procedure ignored 'alloc_size' parameter
> > > 
> > 
> 
> > > > > and always allocated memory chunk according to 'size' parameter.
> > > 
> > 
> 
> > > > > As a result, first chunk could be up to 64K and all following are
> > > 
> > 
> 
> > > > > limited by line size. For example, for bitmap 1280x1024 there was
> > > 
> > 
> 
> > > > > more than 1008 chunks allocated, for bitmap 128x1024 - 897 chunks.
> > > 
> > 
> 
> > > > > We change the procedure to use chunk size up to 64K (similar to first
> > > 
> > 
> 
> > > > > chunk). This reduces in described examples number of allocation from
> > > 
> > 
> 
> > > > > 1008 to 64 and 897 to 8 respectively.
> > > 
> > 
> 
> > > > >
> > > 
> > 
> 
> > > > > Signed-off-by: Yuri Benditovich < yuri.benditovich@daynix.com >
> > > 
> > 
> 
> > > > > ---
> > > 
> > 
> 
> > > > > qxldod/QxlDod.cpp | 2 +-
> > > 
> > 
> 
> > > > > 1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > 
> 
> > > > >
> > > 
> > 
> 
> > > > > diff --git a/qxldod/QxlDod.cpp b/qxldod/QxlDod.cpp
> > > 
> > 
> 
> > > > > index 56a4cf2..94426dd 100755
> > > 
> > 
> 
> > > > > --- a/qxldod/QxlDod.cpp
> > > 
> > 
> 
> > > > > +++ b/qxldod/QxlDod.cpp
> > > 
> > 
> 
> > > > > @@ -4586,7 +4586,7 @@ BOOLEAN QxlDevice::PutBytesAlign(QXLDataChunk
> > > 
> > 
> 
> > > > > **chunk_ptr, UINT8 **now_ptr,
> > > 
> > 
> 
> > > > > UINT8 *now = *now_ptr;
> > > 
> > 
> 
> > > > > UINT8 *end = *end_ptr;
> > > 
> > 
> 
> > > > > size_t maxAllocSize = BITS_BUF_MAX - BITS_BUF_MAX % size;
> > > 
> > 
> 
> > > > > - alloc_size = MIN(size, maxAllocSize);
> > > 
> > 
> 
> > > > > + alloc_size = MIN(alloc_size, maxAllocSize);
> > > 
> > 
> 
> > > > > DbgPrint(TRACE_LEVEL_VERBOSE, ("---> %s\n", __FUNCTION__));
> > > 
> > 
> 
> > > > >
> > > 
> > 
> 
> > > > > while (size) {
> > > 
> > 
> 

> > > > As far as I can see this fix a regression introduced in 9/12 so
> > > 
> > 
> 
> > > > should be squashed together with 9/12.
> > > 
> > 
> 

> > > There was no regression in 9/12, see commit description
> > 
> 
> > > This patch forces the driver to make bigger allocations and in result we
> > > will
> > > have smaller amount of allocations per operation.
> > 
> 

> > Yes, you are right, there is this computation
> 

> > aligned_size = (int)MIN(alloc_size + alignment - 1, BITS_BUF_MAX);
> 
> > aligned_size -= aligned_size % alignment;
> 

> > but then the value is discarded and size is used instead.
> 

> > > > On the same subject. Why we need to split the allocations?
> > > 
> > 
> 
> > > > Can't we try to allocate the whole image in one chunk and
> > > 
> > 
> 
> > > > only if fails fall back to multiple chunks ?
> > > 
> > 
> 

> > > No, we can't do that, as maximal size of allocation from device memory in
> > > driver's code was always limited by ~64K.
> > 
> 
> > > So allocation were always splitted, see numbers in commit description.
> > 
> 
> > > I suppose there is some reason for that.
> > 
> 

> > Honestly knowing the server part I don't see any reasons. Maybe an history
> > one...
> 

> If any version of Spice server is able to work with bigger chunks, this
> optimization can be evaluated later.
> In any case it requires non-forced allocations to be fully implemented and
> used ( as we can 'try' to allocate
> full size chunk only if we do not block during such allocation).

Was not a request to change anything but I noted that different split of the image are common, 
from Linux driver to other layers (spice or not). 
About spice I won't be surprised to learn that all this was a old compatibility with some 
16 bit computation. Just to make sure that there are no reason beside previous code 
for this WDDM DOD driver. 

Frediano