Xorg-1.20.1 crashes when using glamor on top of llvmpipe

Submitted by Hans de Goede on Aug. 30, 2018, 8:48 p.m.

Details

Message ID 861768d5-c991-852f-a56c-28438c66a9e1@redhat.com
State New
Series "Xorg-1.20.1 crashes when using glamor on top of llvmpipe"

Commit Message

Hans de Goede Aug. 30, 2018, 8:48 p.m.
Hi all,

I've been debugging some strange crashes with Xorg-1.20.1 inside
a VirtualBox guest and I could use some help with this.

At first Xorg completely failed to start, running it under
gdbserver showed a backtrace pointing to a lazy symbol lookup
failure triggered by:

drmmode_display.c:905:
         return gbm_bo_get_stride(bo->gbm);

Which is part of:


uint32_t
drmmode_bo_get_pitch(drmmode_bo *bo)
{
#ifdef GLAMOR_HAS_GBM
     if (bo->gbm)
         return gbm_bo_get_stride(bo->gbm);
#endif

     return bo->dumb->pitch;
}

Strangely enough, an LD_PRELOAD of libgbm does not fix this, and
libgbm already gets dragged in by libglamor_egl.so, so this should
not be a problem.
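
As a quick sanity check, something along these lines (a standalone
throwaway program, not server code; the name check_gbm.c is made up,
build with "gcc -o check_gbm check_gbm.c -ldl") should tell whether
the symbol is resolvable in-process at all:

/* Sketch: ask the dynamic linker whether gbm_bo_get_stride() is
 * resolvable, first in the global scope of the process and then
 * after explicitly loading libgbm. */
#define _GNU_SOURCE             /* for RTLD_DEFAULT on glibc */
#include <stdio.h>
#include <dlfcn.h>

int main(void)
{
    void *sym = dlsym(RTLD_DEFAULT, "gbm_bo_get_stride");

    if (!sym) {
        /* Roughly what libglamor_egl.so's DT_NEEDED on libgbm
         * should already give us. */
        void *gbm = dlopen("libgbm.so.1", RTLD_NOW | RTLD_GLOBAL);

        if (!gbm) {
            fprintf(stderr, "dlopen(libgbm.so.1): %s\n", dlerror());
            return 1;
        }
        sym = dlsym(gbm, "gbm_bo_get_stride");
    }

    printf("gbm_bo_get_stride resolves to %p\n", sym);
    return sym ? 0 : 1;
}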

Still, I tried the change shown in the patch below:


And now Xorg will start. Still very weird, since the exact same
Xorg binaries work fine on Intel integrated gfx, where the
gbm_bo_get_stride() call also happens ...


So with this "fix" it starts, but it crashes as soon as I resize the
vm-window and thus the screen gets resized:

bt
#0  OsSigHandler (signo=11, sip=0x7ffc160ad2f0, unused=0x7ffc160ad1c0)
     at osinit.c:114
#1  <signal handler called>
#2  miModifyPixmapHeader (pPixmap=0x28c4460, width=1920, height=992, depth=-1,
     bitsPerPixel=-1, devKind=7680, pPixData=0x0) at miscrinit.c:64
#3  0x00007fdc13d64471 in drmmode_xf86crtc_resize (scrn=0x21751f0, width=1920,
     height=992) at drmmode_display.c:3166
#4  0x00000000004bb9d8 in xf86RandR12ScreenSetSize (pScreen=0x23dc9b0,
     width=1920, height=992, mmWidth=508, mmHeight=262) at xf86RandR12.c:698
#5  0x00000000005092f0 in ProcRRSetScreenSize (client=0x29c6af0)
     at rrscreen.c:289
#6  0x000000000043fcee in Dispatch () at dispatch.c:478

And miscrinit.c:64 is the "{" of:

Bool
miModifyPixmapHeader(PixmapPtr pPixmap, int width, int height, int depth,
                      int bitsPerPixel, int devKind, void *pPixData)
{
     if (!pPixmap)
         return FALSE;

Which is a strange place to crash. Even weirder, after adding a
breakpoint a bit before drmmode_display.c:3166, I get a segfault
while stepping through earlier lines, pointing at SmartScheduleTimer()
and again specifically at the opening "{", as if something is wrong
with the stack and it cannot handle function calls being nested one
level deeper.

This makes me wonder if this is a stack depth/overflow issue. Does
the xserver have code somewhere to limit its stack size, and could
we be hitting that?
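
To rule out the rlimit side of this, a throwaway diagnostic along
these lines (not existing xserver code) could dump the stack limit
the server process actually runs with:

/* Throwaway diagnostic: print the soft RLIMIT_STACK of the process. */
#include <stdio.h>
#include <sys/resource.h>

int main(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_STACK, &rl) != 0) {
        perror("getrlimit");
        return 1;
    }

    if (rl.rlim_cur == RLIM_INFINITY)
        printf("RLIMIT_STACK (soft): unlimited\n");
    else
        printf("RLIMIT_STACK (soft): %llu bytes\n",
               (unsigned long long)rl.rlim_cur);
    return 0;
}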

Or maybe a bad interaction with gcc's stack protection?

It feels as if we are hitting a guard page at the end of the stack
here.
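
One way to test that theory would be a sketch like the one below
(names made up, not existing xserver code): install the SIGSEGV
handler with an alternate stack via sigaltstack(), so it still runs
when the normal stack is exhausted, and print the faulting address
from siginfo for comparison with the stack region in /proc/self/maps:

/* Sketch: catch SIGSEGV on an alternate stack and report si_addr. */
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static char alt_stack[64 * 1024];

static void segv_handler(int signo, siginfo_t *si, void *ctx)
{
    /* fprintf() is not async-signal-safe, but acceptable for a
     * one-shot debugging aid that exits immediately afterwards. */
    fprintf(stderr, "SIGSEGV, faulting address %p\n", si->si_addr);
    _exit(1);
}

int main(void)
{
    stack_t ss = { .ss_sp = alt_stack, .ss_size = sizeof(alt_stack) };
    struct sigaction sa;

    sigaltstack(&ss, NULL);             /* handler gets its own stack */
    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = segv_handler;
    sa.sa_flags = SA_SIGINFO | SA_ONSTACK;
    sigaction(SIGSEGV, &sa, NULL);

    /* ... trigger the suspect code path here ... */
    return 0;
}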

Regards,

Hans

Patch

--- a/hw/xfree86/drivers/modesetting/Makefile.am
+++ b/hw/xfree86/drivers/modesetting/Makefile.am
@@ -39,7 +39,7 @@  AM_CPPFLAGS = \

  modesetting_drv_la_LTLIBRARIES = modesetting_drv.la
  modesetting_drv_la_LDFLAGS = -module -avoid-version
-modesetting_drv_la_LIBADD = $(UDEV_LIBS) $(DRM_LIBS)
+modesetting_drv_la_LIBADD = $(UDEV_LIBS) $(DRM_LIBS) $(GBM_LIBS)
  modesetting_drv_ladir = @moduledir@/drivers

  modesetting_drv_la_SOURCES = \

Comments

Hans de Goede Aug. 30, 2018, 8:58 p.m.
Hi,

On 30-08-18 22:48, Hans de Goede wrote:
> [...]

One important thing which I only put in the subject: this happens
when using glamor on top of llvmpipe, something which is new in 1.20;
older xservers never used glamor on llvmpipe. Things work fine if I
disable glamor with an xorg.conf snippet.
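
For reference, a snippet along these lines should do that, using the
modesetting driver's AccelMethod option (untested as written here,
and "card0" is just an arbitrary identifier):

Section "Device"
    Identifier "card0"
    Driver     "modesetting"
    Option     "AccelMethod" "none"
EndSection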

Arguably we should disable glamor when running on llvmpipe for
performance reasons, but these crashes should not happen regardless.

Regards,

Hans
Ray Strode Aug. 30, 2018, 9:44 p.m.
Hi,

What version of mesa?

It might be:


https://cgit.freedesktop.org/mesa/mesa/commit/?id=9baff597ce021f7691187b0d1d1bbc16d07b13e1

Ray

On Thu, Aug 30, 2018, 5:00 PM Hans de Goede <hdegoede@redhat.com> wrote:

> [...]
Hans de Goede Sept. 2, 2018, 6:06 p.m.
Hi,

On 30-08-18 23:44, Ray Strode wrote:
> hi,
> 
> what version of mesa?
> 
> might be
> 
> https://cgit.freedesktop.org/mesa/mesa/commit/?id=9baff597ce021f7691187b0d1d1bbc16d07b13e1

Ah yes, we were still at 18.2.0-rc3. I'm preparing an update to
18.2.0-rc5 now; thanks for pointing me to this.

Regards,

Hans

