[Mesa-dev] egl/dri2: dri2_make_current: Make sure display is initialized before using it

Submitted by Nicolas Boichat on July 15, 2016, 8:28 a.m.

Details

Message ID 1468571280-35869-1-git-send-email-drinkcat@google.com
State Superseded
Series "egl/dri2: dri2_make_current: Make sure display is initialized before using it"
Headers show

Commit Message

Nicolas Boichat July 15, 2016, 8:28 a.m.
android.opengl.cts.WrapperTest#testGetIntegerv1 CTS test calls
eglTerminate, followed by eglReleaseThread. In that case, the
display will not be initialized when eglReleaseThread calls
MakeCurrent with NULL parameters, to unbind the context, which
causes a a segfault in drv->API.MakeCurrent (dri2_make_current),
either in glFlush or in a latter call.

A similar case is observed in this bug:
https://bugs.freedesktop.org/show_bug.cgi?id=69622, where the test
call eglTerminate, then eglMakeCurrent(dpy, NULL, NULL, NULL).

This patch:
 1. Validates that the display is initialized, if ctx/dsurf/rsurf
    are not all NULL.
 2. Does not call glFlush/unBindContext is there is no display.
 3. However, it still goes through the normal path as
    drv->API.DestroyContext decrements the reference count on the
    context, and frees the structure.

Cc: "11.2 12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Nicolas Boichat <drinkcat@chromium.org>
---

I did not actually test the case reported on the bug, only the CTS
test, would be nice if someone can try it out (Rhys? Chad?).

Applies after "egl/dri2: dri2_make_current: Set EGL error if bindContext
fails", but wanted to keep the 2 patches separate as they are different
problems and discussions.

 src/egl/drivers/dri2/egl_dri2.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

Patch hide | download patch | download mbox

diff --git a/src/egl/drivers/dri2/egl_dri2.c b/src/egl/drivers/dri2/egl_dri2.c
index 2e97d85..3269e52 100644
--- a/src/egl/drivers/dri2/egl_dri2.c
+++ b/src/egl/drivers/dri2/egl_dri2.c
@@ -1166,7 +1166,8 @@  dri2_destroy_context(_EGLDriver *drv, _EGLDisplay *disp, _EGLContext *ctx)
    struct dri2_egl_display *dri2_dpy = dri2_egl_display(disp);
 
    if (_eglPutContext(ctx)) {
-      dri2_dpy->core->destroyContext(dri2_ctx->dri_context);
+      if (dri2_dpy)
+         dri2_dpy->core->destroyContext(dri2_ctx->dri_context);
       free(dri2_ctx);
    }
 
@@ -1189,6 +1190,10 @@  dri2_make_current(_EGLDriver *drv, _EGLDisplay *disp, _EGLSurface *dsurf,
    __DRIdrawable *ddraw, *rdraw;
    __DRIcontext *cctx;
 
+   /* Check that the display is initialized */
+   if ((ctx != NULL || dsurf != NULL || rsurf != NULL) && !dri2_dpy)
+      return _eglError(EGL_NOT_INITIALIZED, "eglMakeCurrent error");
+
    /* make new bindings */
    if (!_eglBindContext(ctx, dsurf, rsurf, &old_ctx, &old_dsurf, &old_rsurf)) {
       /* _eglBindContext already sets the EGL error (in _eglCheckMakeCurrent) */
@@ -1196,14 +1201,14 @@  dri2_make_current(_EGLDriver *drv, _EGLDisplay *disp, _EGLSurface *dsurf,
    }
 
    /* flush before context switch */
-   if (old_ctx && dri2_drv->glFlush)
+   if (old_ctx && dri2_dpy && dri2_drv->glFlush)
       dri2_drv->glFlush();
 
    ddraw = (dsurf) ? dri2_dpy->vtbl->get_dri_drawable(dsurf) : NULL;
    rdraw = (rsurf) ? dri2_dpy->vtbl->get_dri_drawable(rsurf) : NULL;
    cctx = (dri2_ctx) ? dri2_ctx->dri_context : NULL;
 
-   if (old_ctx) {
+   if (old_ctx && dri2_dpy) {
       __DRIcontext *old_cctx = dri2_egl_context(old_ctx)->dri_context;
       dri2_dpy->core->unbindContext(old_cctx);
    }
@@ -1292,6 +1297,8 @@  static EGLBoolean
 dri2_destroy_surface(_EGLDriver *drv, _EGLDisplay *dpy, _EGLSurface *surf)
 {
    struct dri2_egl_display *dri2_dpy = dri2_egl_display(dpy);
+   if (!dri2_dpy)
+      return EGL_TRUE;
    return dri2_dpy->vtbl->destroy_surface(drv, dpy, surf);
 }
 

Comments

Rhys Kidd July 15, 2016, 11:16 a.m.
Whilst I don't know that code as well as I'd like. It does get my

Tested-by: Rhys Kidd <rhyskidd@gmail.com>

for fixing that piglit on i965 ILK. Thanks for the two patches!

On 15 July 2016 at 04:28, Nicolas Boichat <drinkcat@chromium.org> wrote:

> android.opengl.cts.WrapperTest#testGetIntegerv1 CTS test calls
> eglTerminate, followed by eglReleaseThread. In that case, the
> display will not be initialized when eglReleaseThread calls
> MakeCurrent with NULL parameters, to unbind the context, which
> causes a a segfault in drv->API.MakeCurrent (dri2_make_current),
> either in glFlush or in a latter call.
>
> A similar case is observed in this bug:
> https://bugs.freedesktop.org/show_bug.cgi?id=69622, where the test
> call eglTerminate, then eglMakeCurrent(dpy, NULL, NULL, NULL).
>
> This patch:
>  1. Validates that the display is initialized, if ctx/dsurf/rsurf
>     are not all NULL.
>  2. Does not call glFlush/unBindContext is there is no display.
>  3. However, it still goes through the normal path as
>     drv->API.DestroyContext decrements the reference count on the
>     context, and frees the structure.
>
> Cc: "11.2 12.0" <mesa-stable@lists.freedesktop.org>
> Signed-off-by: Nicolas Boichat <drinkcat@chromium.org>
> ---
>
> I did not actually test the case reported on the bug, only the CTS
> test, would be nice if someone can try it out (Rhys? Chad?).
>
> Applies after "egl/dri2: dri2_make_current: Set EGL error if bindContext
> fails", but wanted to keep the 2 patches separate as they are different
> problems and discussions.
>
>  src/egl/drivers/dri2/egl_dri2.c | 13 ++++++++++---
>  1 file changed, 10 insertions(+), 3 deletions(-)
>
> diff --git a/src/egl/drivers/dri2/egl_dri2.c
> b/src/egl/drivers/dri2/egl_dri2.c
> index 2e97d85..3269e52 100644
> --- a/src/egl/drivers/dri2/egl_dri2.c
> +++ b/src/egl/drivers/dri2/egl_dri2.c
> @@ -1166,7 +1166,8 @@ dri2_destroy_context(_EGLDriver *drv, _EGLDisplay
> *disp, _EGLContext *ctx)
>     struct dri2_egl_display *dri2_dpy = dri2_egl_display(disp);
>
>     if (_eglPutContext(ctx)) {
> -      dri2_dpy->core->destroyContext(dri2_ctx->dri_context);
> +      if (dri2_dpy)
> +         dri2_dpy->core->destroyContext(dri2_ctx->dri_context);
>        free(dri2_ctx);
>     }
>
> @@ -1189,6 +1190,10 @@ dri2_make_current(_EGLDriver *drv, _EGLDisplay
> *disp, _EGLSurface *dsurf,
>     __DRIdrawable *ddraw, *rdraw;
>     __DRIcontext *cctx;
>
> +   /* Check that the display is initialized */
> +   if ((ctx != NULL || dsurf != NULL || rsurf != NULL) && !dri2_dpy)
> +      return _eglError(EGL_NOT_INITIALIZED, "eglMakeCurrent error");
> +
>     /* make new bindings */
>     if (!_eglBindContext(ctx, dsurf, rsurf, &old_ctx, &old_dsurf,
> &old_rsurf)) {
>        /* _eglBindContext already sets the EGL error (in
> _eglCheckMakeCurrent) */
> @@ -1196,14 +1201,14 @@ dri2_make_current(_EGLDriver *drv, _EGLDisplay
> *disp, _EGLSurface *dsurf,
>     }
>
>     /* flush before context switch */
> -   if (old_ctx && dri2_drv->glFlush)
> +   if (old_ctx && dri2_dpy && dri2_drv->glFlush)
>        dri2_drv->glFlush();
>
>     ddraw = (dsurf) ? dri2_dpy->vtbl->get_dri_drawable(dsurf) : NULL;
>     rdraw = (rsurf) ? dri2_dpy->vtbl->get_dri_drawable(rsurf) : NULL;
>     cctx = (dri2_ctx) ? dri2_ctx->dri_context : NULL;
>
> -   if (old_ctx) {
> +   if (old_ctx && dri2_dpy) {
>        __DRIcontext *old_cctx = dri2_egl_context(old_ctx)->dri_context;
>        dri2_dpy->core->unbindContext(old_cctx);
>     }
> @@ -1292,6 +1297,8 @@ static EGLBoolean
>  dri2_destroy_surface(_EGLDriver *drv, _EGLDisplay *dpy, _EGLSurface *surf)
>  {
>     struct dri2_egl_display *dri2_dpy = dri2_egl_display(dpy);
> +   if (!dri2_dpy)
> +      return EGL_TRUE;
>     return dri2_dpy->vtbl->destroy_surface(drv, dpy, surf);
>  }
>
> --
> 2.8.0.rc3.226.g39d4020
>
>
Emil Velikov July 15, 2016, 1:03 p.m.
On 15 July 2016 at 09:28, Nicolas Boichat <drinkcat@chromium.org> wrote:
> android.opengl.cts.WrapperTest#testGetIntegerv1 CTS test calls
> eglTerminate, followed by eglReleaseThread. In that case, the
> display will not be initialized when eglReleaseThread calls
> MakeCurrent with NULL parameters, to unbind the context, which
> causes a a segfault in drv->API.MakeCurrent (dri2_make_current),
> either in glFlush or in a latter call.
>
> A similar case is observed in this bug:
> https://bugs.freedesktop.org/show_bug.cgi?id=69622, where the test
> call eglTerminate, then eglMakeCurrent(dpy, NULL, NULL, NULL).
>
> This patch:
>  1. Validates that the display is initialized, if ctx/dsurf/rsurf
>     are not all NULL.
>  2. Does not call glFlush/unBindContext is there is no display.
>  3. However, it still goes through the normal path as
>     drv->API.DestroyContext decrements the reference count on the
>     context, and frees the structure.
>
> Cc: "11.2 12.0" <mesa-stable@lists.freedesktop.org>
> Signed-off-by: Nicolas Boichat <drinkcat@chromium.org>
> ---
>
> I did not actually test the case reported on the bug, only the CTS
> test, would be nice if someone can try it out (Rhys? Chad?).
>
> Applies after "egl/dri2: dri2_make_current: Set EGL error if bindContext
> fails", but wanted to keep the 2 patches separate as they are different
> problems and discussions.
>
>  src/egl/drivers/dri2/egl_dri2.c | 13 ++++++++++---
>  1 file changed, 10 insertions(+), 3 deletions(-)
>
> diff --git a/src/egl/drivers/dri2/egl_dri2.c b/src/egl/drivers/dri2/egl_dri2.c
> index 2e97d85..3269e52 100644
> --- a/src/egl/drivers/dri2/egl_dri2.c
> +++ b/src/egl/drivers/dri2/egl_dri2.c
> @@ -1166,7 +1166,8 @@ dri2_destroy_context(_EGLDriver *drv, _EGLDisplay *disp, _EGLContext *ctx)
>     struct dri2_egl_display *dri2_dpy = dri2_egl_display(disp);
>
>     if (_eglPutContext(ctx)) {
> -      dri2_dpy->core->destroyContext(dri2_ctx->dri_context);
> +      if (dri2_dpy)
Not sure if this is the right place/we need this.

When calling eglDestroyContext (after eglTerminate or not) this cannot
happen since the _EGL_CHECK_CONTEXT macro will dive into the _eglCheck
functions and will bail out since the since the dpy is no initialized.

The only case we can reach this is from dri2_make_current. See below for more.


> @@ -1189,6 +1190,10 @@ dri2_make_current(_EGLDriver *drv, _EGLDisplay *disp, _EGLSurface *dsurf,
>     __DRIdrawable *ddraw, *rdraw;
>     __DRIcontext *cctx;
>
> +   /* Check that the display is initialized */
> +   if ((ctx != NULL || dsurf != NULL || rsurf != NULL) && !dri2_dpy)
> +      return _eglError(EGL_NOT_INITIALIZED, "eglMakeCurrent error");
> +
Was doing to say "the first few lines in eglMakeCurrent) should
already handle these" only to realise that dri2_dpy doesn't wrap
around/subclass the respective _EGLfoo type (_EGLDisplay), like the
dri2_egl_{surface,context...} :-\

At the same time, dri2_make_current cannot (should not really) do
anything in the !dri2_dpy case, since we cannot:
 - call unbind the old ctx bind the new one, or even restore/rebind
the original ctx
 - free any of the ctx/surf resources. The latter should already be
gone, as upon eglTerminate walk over the ctx/surf lists and tear them
all down (see _eglReleaseDisplayResources).

Thus something like the following should be fine imho.

if (!dri2_dpy)
   /* copy/reword some my earlier comment in here */
   return 0;

>     /* make new bindings */
>     if (!_eglBindContext(ctx, dsurf, rsurf, &old_ctx, &old_dsurf, &old_rsurf)) {
>        /* _eglBindContext already sets the EGL error (in _eglCheckMakeCurrent) */
> @@ -1196,14 +1201,14 @@ dri2_make_current(_EGLDriver *drv, _EGLDisplay *disp, _EGLSurface *dsurf,
>     }
>
>     /* flush before context switch */
> -   if (old_ctx && dri2_drv->glFlush)
> +   if (old_ctx && dri2_dpy && dri2_drv->glFlush)
Mildly related: We might want to bail out in dri2_load on
!dri2_drv->glFlush and drop the check here. If we cannot get glapi
and/or glFlush symbol things are seriously stuffed and pretending to
continue just won't work.

>  dri2_destroy_surface(_EGLDriver *drv, _EGLDisplay *dpy, _EGLSurface *surf)
>  {
>     struct dri2_egl_display *dri2_dpy = dri2_egl_display(dpy);
> +   if (!dri2_dpy)
> +      return EGL_TRUE;
Analogous to the ctx hunk, this shouldn't be needed.

Cheers,
Emil
Nicolas Boichat July 18, 2016, 3:19 a.m.
On Fri, Jul 15, 2016 at 9:03 PM, Emil Velikov <emil.l.velikov@gmail.com> wrote:
> On 15 July 2016 at 09:28, Nicolas Boichat <drinkcat@chromium.org> wrote:
>> android.opengl.cts.WrapperTest#testGetIntegerv1 CTS test calls
>> eglTerminate, followed by eglReleaseThread. In that case, the
>> display will not be initialized when eglReleaseThread calls
>> MakeCurrent with NULL parameters, to unbind the context, which
>> causes a a segfault in drv->API.MakeCurrent (dri2_make_current),
>> either in glFlush or in a latter call.
>>
>> A similar case is observed in this bug:
>> https://bugs.freedesktop.org/show_bug.cgi?id=69622, where the test
>> call eglTerminate, then eglMakeCurrent(dpy, NULL, NULL, NULL).
>>
>> This patch:
>>  1. Validates that the display is initialized, if ctx/dsurf/rsurf
>>     are not all NULL.
>>  2. Does not call glFlush/unBindContext is there is no display.
>>  3. However, it still goes through the normal path as
>>     drv->API.DestroyContext decrements the reference count on the
>>     context, and frees the structure.
>>
>> Cc: "11.2 12.0" <mesa-stable@lists.freedesktop.org>
>> Signed-off-by: Nicolas Boichat <drinkcat@chromium.org>
>> ---
>>
>> I did not actually test the case reported on the bug, only the CTS
>> test, would be nice if someone can try it out (Rhys? Chad?).
>>
>> Applies after "egl/dri2: dri2_make_current: Set EGL error if bindContext
>> fails", but wanted to keep the 2 patches separate as they are different
>> problems and discussions.
>>
>>  src/egl/drivers/dri2/egl_dri2.c | 13 ++++++++++---
>>  1 file changed, 10 insertions(+), 3 deletions(-)
>>
>> diff --git a/src/egl/drivers/dri2/egl_dri2.c b/src/egl/drivers/dri2/egl_dri2.c
>> index 2e97d85..3269e52 100644
>> --- a/src/egl/drivers/dri2/egl_dri2.c
>> +++ b/src/egl/drivers/dri2/egl_dri2.c
>> @@ -1166,7 +1166,8 @@ dri2_destroy_context(_EGLDriver *drv, _EGLDisplay *disp, _EGLContext *ctx)
>>     struct dri2_egl_display *dri2_dpy = dri2_egl_display(disp);
>>
>>     if (_eglPutContext(ctx)) {
>> -      dri2_dpy->core->destroyContext(dri2_ctx->dri_context);
>> +      if (dri2_dpy)
> Not sure if this is the right place/we need this.
>
> When calling eglDestroyContext (after eglTerminate or not) this cannot
> happen since the _EGL_CHECK_CONTEXT macro will dive into the _eglCheck
> functions and will bail out since the since the dpy is no initialized.
>
> The only case we can reach this is from dri2_make_current. See below for more.

Yes, that's correct.

>> @@ -1189,6 +1190,10 @@ dri2_make_current(_EGLDriver *drv, _EGLDisplay *disp, _EGLSurface *dsurf,
>>     __DRIdrawable *ddraw, *rdraw;
>>     __DRIcontext *cctx;
>>
>> +   /* Check that the display is initialized */
>> +   if ((ctx != NULL || dsurf != NULL || rsurf != NULL) && !dri2_dpy)
>> +      return _eglError(EGL_NOT_INITIALIZED, "eglMakeCurrent error");
>> +
> Was doing to say "the first few lines in eglMakeCurrent) should
> already handle these" only to realise that dri2_dpy doesn't wrap
> around/subclass the respective _EGLfoo type (_EGLDisplay), like the
> dri2_egl_{surface,context...} :-\
>
> At the same time, dri2_make_current cannot (should not really) do
> anything in the !dri2_dpy case, since we cannot:
>  - call unbind the old ctx bind the new one, or even restore/rebind
> the original ctx
>  - free any of the ctx/surf resources. The latter should already be
> gone, as upon eglTerminate walk over the ctx/surf lists and tear them
> all down (see _eglReleaseDisplayResources).
>
> Thus something like the following should be fine imho.
>
> if (!dri2_dpy)
>    /* copy/reword some my earlier comment in here */
>    return 0;

Yes, that was my first approach, but we'd still leak the current dri2_ctx.

If I trace the reference counting right:
1. eglCreateContext => API.CreateContext =>
 - _eglInitResource => RefCount = 1
 - _eglLinkContext => RefCount = 2
2. eglMakeCurrent => API.MakeCurrent => _eglBindContext =>
eglGetContext => RefCount = 3
3. eglTerminate => API.Terminate => _eglReleaseDisplayResources =>
  - _eglUnlinkContext => _eglUnlinkResource => Refcount = 2
  - API.DestroyContext => RefCount = 1 (so context structure is not freed)
4. eglReleaseThread (or eglMakeCurrent(dpy, NULL, NULL, NULL)) =>
API.MakeCurrent(drv, disp, NULL, NULL, NULL) => API.DestroyContext =>
Refcount = 0 and structure is freed (with my current patch).

Well, the structure is freed, but driDestroyContext is never called
(so pcp->driScreenPriv->driver->DestroyContext is not called), which,
looks a bit concerning...

So one way around this is to get eglTerminate to destroy the current
context(s). But then the spec says
(https://www.khronos.org/registry/egl/sdk/docs/man/html/eglTerminate.xhtml):
If contexts or surfaces associated with display is current to any
thread, they are not released until they are no longer current as a
result of eglMakeCurrent.

(tbh I'm not sure if we are handling surfaces correctly either, but,
well, that's another problem...)

>>     /* make new bindings */
>>     if (!_eglBindContext(ctx, dsurf, rsurf, &old_ctx, &old_dsurf, &old_rsurf)) {
>>        /* _eglBindContext already sets the EGL error (in _eglCheckMakeCurrent) */
>> @@ -1196,14 +1201,14 @@ dri2_make_current(_EGLDriver *drv, _EGLDisplay *disp, _EGLSurface *dsurf,
>>     }
>>
>>     /* flush before context switch */
>> -   if (old_ctx && dri2_drv->glFlush)
>> +   if (old_ctx && dri2_dpy && dri2_drv->glFlush)
> Mildly related: We might want to bail out in dri2_load on
> !dri2_drv->glFlush and drop the check here. If we cannot get glapi
> and/or glFlush symbol things are seriously stuffed and pretending to
> continue just won't work.
>
>>  dri2_destroy_surface(_EGLDriver *drv, _EGLDisplay *dpy, _EGLSurface *surf)
>>  {
>>     struct dri2_egl_display *dri2_dpy = dri2_egl_display(dpy);
>> +   if (!dri2_dpy)
>> +      return EGL_TRUE;
> Analogous to the ctx hunk, this shouldn't be needed.
>
> Cheers,
> Emil
Emil Velikov July 18, 2016, 8:37 a.m.
On 18 July 2016 at 04:19, Nicolas Boichat <drinkcat@chromium.org> wrote:
> On Fri, Jul 15, 2016 at 9:03 PM, Emil Velikov <emil.l.velikov@gmail.com> wrote:
>> On 15 July 2016 at 09:28, Nicolas Boichat <drinkcat@chromium.org> wrote:
>>> android.opengl.cts.WrapperTest#testGetIntegerv1 CTS test calls
>>> eglTerminate, followed by eglReleaseThread. In that case, the
>>> display will not be initialized when eglReleaseThread calls
>>> MakeCurrent with NULL parameters, to unbind the context, which
>>> causes a a segfault in drv->API.MakeCurrent (dri2_make_current),
>>> either in glFlush or in a latter call.
>>>
>>> A similar case is observed in this bug:
>>> https://bugs.freedesktop.org/show_bug.cgi?id=69622, where the test
>>> call eglTerminate, then eglMakeCurrent(dpy, NULL, NULL, NULL).
>>>
>>> This patch:
>>>  1. Validates that the display is initialized, if ctx/dsurf/rsurf
>>>     are not all NULL.
>>>  2. Does not call glFlush/unBindContext is there is no display.
>>>  3. However, it still goes through the normal path as
>>>     drv->API.DestroyContext decrements the reference count on the
>>>     context, and frees the structure.
>>>
>>> Cc: "11.2 12.0" <mesa-stable@lists.freedesktop.org>
>>> Signed-off-by: Nicolas Boichat <drinkcat@chromium.org>
>>> ---
>>>
>>> I did not actually test the case reported on the bug, only the CTS
>>> test, would be nice if someone can try it out (Rhys? Chad?).
>>>
>>> Applies after "egl/dri2: dri2_make_current: Set EGL error if bindContext
>>> fails", but wanted to keep the 2 patches separate as they are different
>>> problems and discussions.
>>>
>>>  src/egl/drivers/dri2/egl_dri2.c | 13 ++++++++++---
>>>  1 file changed, 10 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/src/egl/drivers/dri2/egl_dri2.c b/src/egl/drivers/dri2/egl_dri2.c
>>> index 2e97d85..3269e52 100644
>>> --- a/src/egl/drivers/dri2/egl_dri2.c
>>> +++ b/src/egl/drivers/dri2/egl_dri2.c
>>> @@ -1166,7 +1166,8 @@ dri2_destroy_context(_EGLDriver *drv, _EGLDisplay *disp, _EGLContext *ctx)
>>>     struct dri2_egl_display *dri2_dpy = dri2_egl_display(disp);
>>>
>>>     if (_eglPutContext(ctx)) {
>>> -      dri2_dpy->core->destroyContext(dri2_ctx->dri_context);
>>> +      if (dri2_dpy)
>> Not sure if this is the right place/we need this.
>>
>> When calling eglDestroyContext (after eglTerminate or not) this cannot
>> happen since the _EGL_CHECK_CONTEXT macro will dive into the _eglCheck
>> functions and will bail out since the since the dpy is no initialized.
>>
>> The only case we can reach this is from dri2_make_current. See below for more.
>
> Yes, that's correct.
>
>>> @@ -1189,6 +1190,10 @@ dri2_make_current(_EGLDriver *drv, _EGLDisplay *disp, _EGLSurface *dsurf,
>>>     __DRIdrawable *ddraw, *rdraw;
>>>     __DRIcontext *cctx;
>>>
>>> +   /* Check that the display is initialized */
>>> +   if ((ctx != NULL || dsurf != NULL || rsurf != NULL) && !dri2_dpy)
>>> +      return _eglError(EGL_NOT_INITIALIZED, "eglMakeCurrent error");
>>> +
>> Was doing to say "the first few lines in eglMakeCurrent) should
>> already handle these" only to realise that dri2_dpy doesn't wrap
>> around/subclass the respective _EGLfoo type (_EGLDisplay), like the
>> dri2_egl_{surface,context...} :-\
>>
>> At the same time, dri2_make_current cannot (should not really) do
>> anything in the !dri2_dpy case, since we cannot:
>>  - call unbind the old ctx bind the new one, or even restore/rebind
>> the original ctx
>>  - free any of the ctx/surf resources. The latter should already be
>> gone, as upon eglTerminate walk over the ctx/surf lists and tear them
>> all down (see _eglReleaseDisplayResources).
>>
>> Thus something like the following should be fine imho.
>>
>> if (!dri2_dpy)
>>    /* copy/reword some my earlier comment in here */
>>    return 0;
>
> Yes, that was my first approach, but we'd still leak the current dri2_ctx.
>
> If I trace the reference counting right:
> 1. eglCreateContext => API.CreateContext =>
>  - _eglInitResource => RefCount = 1
>  - _eglLinkContext => RefCount = 2
> 2. eglMakeCurrent => API.MakeCurrent => _eglBindContext =>
> eglGetContext => RefCount = 3
> 3. eglTerminate => API.Terminate => _eglReleaseDisplayResources =>
>   - _eglUnlinkContext => _eglUnlinkResource => Refcount = 2
>   - API.DestroyContext => RefCount = 1 (so context structure is not freed)
> 4. eglReleaseThread (or eglMakeCurrent(dpy, NULL, NULL, NULL)) =>
> API.MakeCurrent(drv, disp, NULL, NULL, NULL) => API.DestroyContext =>
> Refcount = 0 and structure is freed (with my current patch).
>
> Well, the structure is freed, but driDestroyContext is never called
> (so pcp->driScreenPriv->driver->DestroyContext is not called), which,
> looks a bit concerning...
>
This is exactly what made me question the proposed approach.

> So one way around this is to get eglTerminate to destroy the current> context(s). But then the spec says
> (https://www.khronos.org/registry/egl/sdk/docs/man/html/eglTerminate.xhtml):
> If contexts or surfaces associated with display is current to any
> thread, they are not released until they are no longer current as a
> result of eglMakeCurrent.
>
The way I see/read it:
 - 'leaking' ctx/surf resources if the user has not called
[eglMakeCurrent(... !NULL, !NULL) + eglTerminate +] eglReleaseThread
or eglMakeCurrent(...NULL,NULL) [+ eglTerminate] is expected.
 - we need to defer dpy vtbls (and active dri dpy connection?) in
eglReleaseThread or ctx/surf will leak

IMHO the better approach is to defer the dpy teardown (in the case of
missing eglMakeCurrent(...NULL,NULL)) to eglReleaseThread. Since that
one is more evasive (but shouldn't be too crazy) we could get this
patch, and revert to 'if (!dri2_dpy) return;' once everything else is
handled.

Not sure when we'll get time for that one, so we want an inline
XXX/TODO or two that roughly explains the idea.

Related: from a very brief look we seems to be leaking the
_EGLsync/_EGLimage resources.

Regards,
Emil
Nicolas Boichat July 20, 2016, 6:51 a.m.
On Mon, Jul 18, 2016 at 4:37 PM, Emil Velikov <emil.l.velikov@gmail.com> wrote:
> On 18 July 2016 at 04:19, Nicolas Boichat <drinkcat@chromium.org> wrote:
>> On Fri, Jul 15, 2016 at 9:03 PM, Emil Velikov <emil.l.velikov@gmail.com> wrote:
>>> On 15 July 2016 at 09:28, Nicolas Boichat <drinkcat@chromium.org> wrote:
>>>> android.opengl.cts.WrapperTest#testGetIntegerv1 CTS test calls
>>>> eglTerminate, followed by eglReleaseThread. In that case, the
>>>> display will not be initialized when eglReleaseThread calls
>>>> MakeCurrent with NULL parameters, to unbind the context, which
>>>> causes a a segfault in drv->API.MakeCurrent (dri2_make_current),
>>>> either in glFlush or in a latter call.
>>>>
>>>> A similar case is observed in this bug:
>>>> https://bugs.freedesktop.org/show_bug.cgi?id=69622, where the test
>>>> call eglTerminate, then eglMakeCurrent(dpy, NULL, NULL, NULL).
>>>>
>>>> This patch:
>>>>  1. Validates that the display is initialized, if ctx/dsurf/rsurf
>>>>     are not all NULL.
>>>>  2. Does not call glFlush/unBindContext is there is no display.
>>>>  3. However, it still goes through the normal path as
>>>>     drv->API.DestroyContext decrements the reference count on the
>>>>     context, and frees the structure.
>>>>
>>>> Cc: "11.2 12.0" <mesa-stable@lists.freedesktop.org>
>>>> Signed-off-by: Nicolas Boichat <drinkcat@chromium.org>
>>>> ---
>>>>
>>>> I did not actually test the case reported on the bug, only the CTS
>>>> test, would be nice if someone can try it out (Rhys? Chad?).
>>>>
>>>> Applies after "egl/dri2: dri2_make_current: Set EGL error if bindContext
>>>> fails", but wanted to keep the 2 patches separate as they are different
>>>> problems and discussions.
>>>>
>>>>  src/egl/drivers/dri2/egl_dri2.c | 13 ++++++++++---
>>>>  1 file changed, 10 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/src/egl/drivers/dri2/egl_dri2.c b/src/egl/drivers/dri2/egl_dri2.c
>>>> index 2e97d85..3269e52 100644
>>>> --- a/src/egl/drivers/dri2/egl_dri2.c
>>>> +++ b/src/egl/drivers/dri2/egl_dri2.c
>>>> @@ -1166,7 +1166,8 @@ dri2_destroy_context(_EGLDriver *drv, _EGLDisplay *disp, _EGLContext *ctx)
>>>>     struct dri2_egl_display *dri2_dpy = dri2_egl_display(disp);
>>>>
>>>>     if (_eglPutContext(ctx)) {
>>>> -      dri2_dpy->core->destroyContext(dri2_ctx->dri_context);
>>>> +      if (dri2_dpy)
>>> Not sure if this is the right place/we need this.
>>>
>>> When calling eglDestroyContext (after eglTerminate or not) this cannot
>>> happen since the _EGL_CHECK_CONTEXT macro will dive into the _eglCheck
>>> functions and will bail out since the since the dpy is no initialized.
>>>
>>> The only case we can reach this is from dri2_make_current. See below for more.
>>
>> Yes, that's correct.
>>
>>>> @@ -1189,6 +1190,10 @@ dri2_make_current(_EGLDriver *drv, _EGLDisplay *disp, _EGLSurface *dsurf,
>>>>     __DRIdrawable *ddraw, *rdraw;
>>>>     __DRIcontext *cctx;
>>>>
>>>> +   /* Check that the display is initialized */
>>>> +   if ((ctx != NULL || dsurf != NULL || rsurf != NULL) && !dri2_dpy)
>>>> +      return _eglError(EGL_NOT_INITIALIZED, "eglMakeCurrent error");
>>>> +
>>> Was doing to say "the first few lines in eglMakeCurrent) should
>>> already handle these" only to realise that dri2_dpy doesn't wrap
>>> around/subclass the respective _EGLfoo type (_EGLDisplay), like the
>>> dri2_egl_{surface,context...} :-\
>>>
>>> At the same time, dri2_make_current cannot (should not really) do
>>> anything in the !dri2_dpy case, since we cannot:
>>>  - call unbind the old ctx bind the new one, or even restore/rebind
>>> the original ctx
>>>  - free any of the ctx/surf resources. The latter should already be
>>> gone, as upon eglTerminate walk over the ctx/surf lists and tear them
>>> all down (see _eglReleaseDisplayResources).
>>>
>>> Thus something like the following should be fine imho.
>>>
>>> if (!dri2_dpy)
>>>    /* copy/reword some my earlier comment in here */
>>>    return 0;
>>
>> Yes, that was my first approach, but we'd still leak the current dri2_ctx.
>>
>> If I trace the reference counting right:
>> 1. eglCreateContext => API.CreateContext =>
>>  - _eglInitResource => RefCount = 1
>>  - _eglLinkContext => RefCount = 2
>> 2. eglMakeCurrent => API.MakeCurrent => _eglBindContext =>
>> eglGetContext => RefCount = 3
>> 3. eglTerminate => API.Terminate => _eglReleaseDisplayResources =>
>>   - _eglUnlinkContext => _eglUnlinkResource => Refcount = 2
>>   - API.DestroyContext => RefCount = 1 (so context structure is not freed)
>> 4. eglReleaseThread (or eglMakeCurrent(dpy, NULL, NULL, NULL)) =>
>> API.MakeCurrent(drv, disp, NULL, NULL, NULL) => API.DestroyContext =>
>> Refcount = 0 and structure is freed (with my current patch).
>>
>> Well, the structure is freed, but driDestroyContext is never called
>> (so pcp->driScreenPriv->driver->DestroyContext is not called), which,
>> looks a bit concerning...
>>
> This is exactly what made me question the proposed approach.
>
>> So one way around this is to get eglTerminate to destroy the current> context(s). But then the spec says
>> (https://www.khronos.org/registry/egl/sdk/docs/man/html/eglTerminate.xhtml):
>> If contexts or surfaces associated with display is current to any
>> thread, they are not released until they are no longer current as a
>> result of eglMakeCurrent.
>>
> The way I see/read it:
>  - 'leaking' ctx/surf resources if the user has not called
> [eglMakeCurrent(... !NULL, !NULL) + eglTerminate +] eglReleaseThread
> or eglMakeCurrent(...NULL,NULL) [+ eglTerminate] is expected.
>  - we need to defer dpy vtbls (and active dri dpy connection?) in
> eglReleaseThread or ctx/surf will leak
>
> IMHO the better approach is to defer the dpy teardown (in the case of
> missing eglMakeCurrent(...NULL,NULL)) to eglReleaseThread. Since that
> one is more evasive (but shouldn't be too crazy) we could get this
> patch, and revert to 'if (!dri2_dpy) return;' once everything else is
> handled.
>
> Not sure when we'll get time for that one, so we want an inline
> XXX/TODO or two that roughly explains the idea.

Turns out this patch is quite leaky in this CTS test:
android.opengl.cts.WrapperTest#testThreadCleanup, which basically runs
eglInitialize, eglMakeCurrent, eglTerminate, eglReleaseThread in
separate thread, 1000 times in a row. (before, it would crash, with
this patch, it leaks a lot of memory)

I managed to add some reference counting to dri2_dpy instead, I'll
upload a patch soon.

> Related: from a very brief look we seems to be leaking the
> _EGLsync/_EGLimage resources.
>
> Regards,
> Emil