glx/dri: Take an extra reference on our own GLX drawables

Submitted by Adam Jackson on May 8, 2018, 7:47 p.m.

Details

Message ID 20180508194707.22492-1-ajax@redhat.com
State New
Headers show
Series "glx/dri: Take an extra reference on our own GLX drawables" ( rev: 1 ) in Mesa

Not browsing as part of any series.

Commit Message

Adam Jackson May 8, 2018, 7:47 p.m.
dri*_bind_context, when switching current drawables, will drop the
reference on the old one; since that refcount has probably now gone to
zero that means we lose all the state we applied to that drawable
before, like when swaps are expected to complete.

Dropping this reference might make some sense for drawables that aren't
_ours_, since we don't get events for destroyed resources and need to
rely on the server throwing errors when we name a no-longer-valid
drawable. But if the resource is one that this client created, we can be
reasonably sure that it will be explicitly destroyed by the same client
- and if not, the client is likely to exit anyway, so the memory leak
doesn't matter.

So, bump the refcnt if the XID of the drawable indicates that it's one
of ours. This is, admittedly, a hack. The proper solution would involve
rather more surgery to the MakeCurrent path than I can type quickly, let
alone test promptly against a wide enough range of servers and DRIs to
have any confidence in. I'll work on the real solution, but in the
meantime this is effectively not a memory leak for any real scenario,
and fixes a real bug.

Signed-off-by: Adam Jackson <ajax@redhat.com>
Cc: Michel Dänzer <michel.daenzer@amd.com>
Cc: Mike Lothian <mike@fireburn.co.uk>
Cc: Mario Kleiner <mario.kleiner.de@gmail.com>
Cc: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106351
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106372
---
 src/glx/dri_common.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

Patch hide | download patch | download mbox

diff --git a/src/glx/dri_common.c b/src/glx/dri_common.c
index ab5d6c5bc0..d42ca71124 100644
--- a/src/glx/dri_common.c
+++ b/src/glx/dri_common.c
@@ -411,7 +411,8 @@  driInferDrawableConfig(struct glx_screen *psc, GLXDrawable draw)
 _X_HIDDEN __GLXDRIdrawable *
 driFetchDrawable(struct glx_context *gc, GLXDrawable glxDrawable)
 {
-   struct glx_display *const priv = __glXInitialize(gc->psc->dpy);
+   Display *dpy = gc->psc->dpy;
+   struct glx_display *const priv = __glXInitialize(dpy);
    __GLXDRIdrawable *pdraw;
    struct glx_screen *psc;
    struct glx_config *config = gc->config;
@@ -449,6 +450,8 @@  driFetchDrawable(struct glx_context *gc, GLXDrawable glxDrawable)
       return NULL;
    }
    pdraw->refcount = 1;
+   if ((glxDrawable & dpy->resource_mask) == dpy->resource_base)
+      pdraw->refcount ++;
 
    return pdraw;
 }

Comments

On 2018-05-08 09:47 PM, Adam Jackson wrote:
> dri*_bind_context, when switching current drawables, will drop the
> reference on the old one; since that refcount has probably now gone to
> zero that means we lose all the state we applied to that drawable
> before, like when swaps are expected to complete.
> 
> Dropping this reference might make some sense for drawables that aren't
> _ours_, since we don't get events for destroyed resources and need to
> rely on the server throwing errors when we name a no-longer-valid
> drawable. But if the resource is one that this client created, we can be
> reasonably sure that it will be explicitly destroyed by the same client

Is there any mechanism in place for this to result in
loader_dri3_drawable_fini actually getting called when destroying a
window without using glXDestroyWindow?


> - and if not, the client is likely to exit anyway, so the memory leak
> doesn't matter.>
> So, bump the refcnt if the XID of the drawable indicates that it's one
> of ours. This is, admittedly, a hack. The proper solution would involve
> rather more surgery to the MakeCurrent path than I can type quickly, let
> alone test promptly against a wide enough range of servers and DRIs to
> have any confidence in. I'll work on the real solution, but in the
> meantime this is effectively not a memory leak for any real scenario,
> and fixes a real bug.

An application which opens and closes many windows doesn't seem that
unrealistic to me, and I suspect many (most?) GLX users still don't use
glXDestroyWindow, so I'm not sure I can agree with this assessment.


> @@ -449,6 +450,8 @@ driFetchDrawable(struct glx_context *gc, GLXDrawable glxDrawable)
>        return NULL;
>     }
>     pdraw->refcount = 1;
> +   if ((glxDrawable & dpy->resource_mask) == dpy->resource_base)

Should this be

   if ((glxDrawable & ~dpy->resource_mask) == dpy->resource_base)

(Negated dpy->resource_mask)? Otherwise this fails to take effect for me
here with glxgears:

(gdb) p/x glxDrawable
$1 = 0x200002
(gdb) p/x dpy->resource_mask
$2 = 0x1fffff
(gdb) p/x dpy->resource_base
$3 = 0x200000


With the code changed as above, pdraw->refcount is incremented to 2, and
then never drops to 0 with glxgears, and loader_dri3_drawable_fini is
never called, although glxgears explicitly calls XDestroyWindow.


How about the idea I described before, saving the loader_dri3_drawable
values in a hash table in dri3_screen? Maybe we could simply save a
pointer to the whole struct, after only freeing the renderbuffers.
On Tue, 8 May 2018 at 20:47 Adam Jackson <ajax@redhat.com> wrote:

> dri*_bind_context, when switching current drawables, will drop the
> reference on the old one; since that refcount has probably now gone to
> zero that means we lose all the state we applied to that drawable
> before, like when swaps are expected to complete.
>
> Dropping this reference might make some sense for drawables that aren't
> _ours_, since we don't get events for destroyed resources and need to
> rely on the server throwing errors when we name a no-longer-valid
> drawable. But if the resource is one that this client created, we can be
> reasonably sure that it will be explicitly destroyed by the same client
> - and if not, the client is likely to exit anyway, so the memory leak
> doesn't matter.
>
> So, bump the refcnt if the XID of the drawable indicates that it's one
> of ours. This is, admittedly, a hack. The proper solution would involve
> rather more surgery to the MakeCurrent path than I can type quickly, let
> alone test promptly against a wide enough range of servers and DRIs to
> have any confidence in. I'll work on the real solution, but in the
> meantime this is effectively not a memory leak for any real scenario,
> and fixes a real bug.
>
> Signed-off-by: Adam Jackson <ajax@redhat.com>
> Cc: Michel Dänzer <michel.daenzer@amd.com>
> Cc: Mike Lothian <mike@fireburn.co.uk>
> Cc: Mario Kleiner <mario.kleiner.de@gmail.com>
> Cc: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de>
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106351
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106372
> ---
>  src/glx/dri_common.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/src/glx/dri_common.c b/src/glx/dri_common.c
> index ab5d6c5bc0..d42ca71124 100644
> --- a/src/glx/dri_common.c
> +++ b/src/glx/dri_common.c
> @@ -411,7 +411,8 @@ driInferDrawableConfig(struct glx_screen *psc,
> GLXDrawable draw)
>  _X_HIDDEN __GLXDRIdrawable *
>  driFetchDrawable(struct glx_context *gc, GLXDrawable glxDrawable)
>  {
> -   struct glx_display *const priv = __glXInitialize(gc->psc->dpy);
> +   Display *dpy = gc->psc->dpy;
> +   struct glx_display *const priv = __glXInitialize(dpy);
>     __GLXDRIdrawable *pdraw;
>     struct glx_screen *psc;
>     struct glx_config *config = gc->config;
> @@ -449,6 +450,8 @@ driFetchDrawable(struct glx_context *gc, GLXDrawable
> glxDrawable)
>        return NULL;
>     }
>     pdraw->refcount = 1;
> +   if ((glxDrawable & dpy->resource_mask) == dpy->resource_base)
> +      pdraw->refcount ++;
>
>     return pdraw;
>  }
> --
> 2.17.0
>
>
Hi

This doesn't fix the freezes in Plasmashell

Both of Mario's patches improved things

Regards

Mike
With Michel's suggestion I've not seen a freeze in Plasmashell or Steam so
far

Tested-by: Mike Lothian <mike@fireburn.co.uk>

On Wed, 9 May 2018 at 21:47 Mike Lothian <mike@fireburn.co.uk> wrote:

> On Tue, 8 May 2018 at 20:47 Adam Jackson <ajax@redhat.com> wrote:
>
>> dri*_bind_context, when switching current drawables, will drop the
>> reference on the old one; since that refcount has probably now gone to
>> zero that means we lose all the state we applied to that drawable
>> before, like when swaps are expected to complete.
>>
>> Dropping this reference might make some sense for drawables that aren't
>> _ours_, since we don't get events for destroyed resources and need to
>> rely on the server throwing errors when we name a no-longer-valid
>> drawable. But if the resource is one that this client created, we can be
>> reasonably sure that it will be explicitly destroyed by the same client
>> - and if not, the client is likely to exit anyway, so the memory leak
>> doesn't matter.
>>
>> So, bump the refcnt if the XID of the drawable indicates that it's one
>> of ours. This is, admittedly, a hack. The proper solution would involve
>> rather more surgery to the MakeCurrent path than I can type quickly, let
>> alone test promptly against a wide enough range of servers and DRIs to
>> have any confidence in. I'll work on the real solution, but in the
>> meantime this is effectively not a memory leak for any real scenario,
>> and fixes a real bug.
>>
>> Signed-off-by: Adam Jackson <ajax@redhat.com>
>> Cc: Michel Dänzer <michel.daenzer@amd.com>
>> Cc: Mike Lothian <mike@fireburn.co.uk>
>> Cc: Mario Kleiner <mario.kleiner.de@gmail.com>
>> Cc: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de>
>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106351
>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106372
>> ---
>>  src/glx/dri_common.c | 5 ++++-
>>  1 file changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/src/glx/dri_common.c b/src/glx/dri_common.c
>> index ab5d6c5bc0..d42ca71124 100644
>> --- a/src/glx/dri_common.c
>> +++ b/src/glx/dri_common.c
>> @@ -411,7 +411,8 @@ driInferDrawableConfig(struct glx_screen *psc,
>> GLXDrawable draw)
>>  _X_HIDDEN __GLXDRIdrawable *
>>  driFetchDrawable(struct glx_context *gc, GLXDrawable glxDrawable)
>>  {
>> -   struct glx_display *const priv = __glXInitialize(gc->psc->dpy);
>> +   Display *dpy = gc->psc->dpy;
>> +   struct glx_display *const priv = __glXInitialize(dpy);
>>     __GLXDRIdrawable *pdraw;
>>     struct glx_screen *psc;
>>     struct glx_config *config = gc->config;
>> @@ -449,6 +450,8 @@ driFetchDrawable(struct glx_context *gc, GLXDrawable
>> glxDrawable)
>>        return NULL;
>>     }
>>     pdraw->refcount = 1;
>> +   if ((glxDrawable & dpy->resource_mask) == dpy->resource_base)
>> +      pdraw->refcount ++;
>>
>>     return pdraw;
>>  }
>> --
>> 2.17.0
>>
>>
> Hi
>
> This doesn't fix the freezes in Plasmashell
>
> Both of Mario's patches improved things
>
> Regards
>
> Mike
>
On Wed, 2018-05-09 at 18:17 +0200, Michel Dänzer wrote:
> On 2018-05-08 09:47 PM, Adam Jackson wrote:
> > dri*_bind_context, when switching current drawables, will drop the
> > reference on the old one; since that refcount has probably now gone to
> > zero that means we lose all the state we applied to that drawable
> > before, like when swaps are expected to complete.
> > 
> > Dropping this reference might make some sense for drawables that aren't
> > _ours_, since we don't get events for destroyed resources and need to
> > rely on the server throwing errors when we name a no-longer-valid
> > drawable. But if the resource is one that this client created, we can be
> > reasonably sure that it will be explicitly destroyed by the same client
> 
> Is there any mechanism in place for this to result in
> loader_dri3_drawable_fini actually getting called when destroying a
> window without using glXDestroyWindow?

No; there's not a way to hook XDestroyWindow, and we can't ourselves
listen for DestroyNotifys. In fact even closing the display wouldn't be
 enough, though I think that's true with or without this patch.

> > @@ -449,6 +450,8 @@ driFetchDrawable(struct glx_context *gc, GLXDrawable glxDrawable)
> >        return NULL;
> >     }
> >     pdraw->refcount = 1;
> > +   if ((glxDrawable & dpy->resource_mask) == dpy->resource_base)
> 
> Should this be
> 
>    if ((glxDrawable & ~dpy->resource_mask) == dpy->resource_base)
> 
> (Negated dpy->resource_mask)? Otherwise this fails to take effect for me
> here with glxgears:

Argh, yes. Too used to being on the server side where you have
clientAsMask.

> How about the idea I described before, saving the loader_dri3_drawable
> values in a hash table in dri3_screen? Maybe we could simply save a
> pointer to the whole struct, after only freeing the renderbuffers.

That would also leak, less in absolute terms but the same big-O, for
the same reasons.

- ajax
On 2018-05-10 05:09 PM, Adam Jackson wrote:
> On Wed, 2018-05-09 at 18:17 +0200, Michel Dänzer wrote:
> 
>> How about the idea I described before, saving the loader_dri3_drawable
>> values in a hash table in dri3_screen? Maybe we could simply save a
>> pointer to the whole struct, after only freeing the renderbuffers.
> 
> That would also leak, less in absolute terms but the same big-O, for
> the same reasons.

Sure, but with the same number x of leaked objects, there's a difference
between leaking x times tens of bytes, or x times potentially megabytes.
On 2018-05-11 10:23 AM, Michel Dänzer wrote:
> On 2018-05-10 05:09 PM, Adam Jackson wrote:
>> On Wed, 2018-05-09 at 18:17 +0200, Michel Dänzer wrote:
>>
>>> How about the idea I described before, saving the loader_dri3_drawable
>>> values in a hash table in dri3_screen? Maybe we could simply save a
>>> pointer to the whole struct, after only freeing the renderbuffers.
>>
>> That would also leak, less in absolute terms but the same big-O, for
>> the same reasons.
> 
> Sure, but with the same number x of leaked objects, there's a difference
> between leaking x times tens of bytes, or x times potentially megabytes.

Also, if we wanted to get fancy, we could periodically (in a thread?)
iterate over the hash table entries and check if the windows still exist.