Xorg-1.20.1 crashes when using glamor on top of llvmpipe

Submitted by Hans de Goede on Aug. 30, 2018, 8:48 p.m.

Details

Message ID 861768d5-c991-852f-a56c-28438c66a9e1@redhat.com
State New
Headers show
Series "Xorg-1.20.1 crashes when using glamor on top of llvmpipe" ( rev: 1 ) in X.org


Commit Message

Hans de Goede Aug. 30, 2018, 8:48 p.m.
Hi all,

I've been debugging some strange crashes with Xorg-1.20.1 inside
a VirtualBox guest and I could use some help with this.

At first Xorg completely failed to start; running it under
gdbserver showed a backtrace pointing to a lazy symbol lookup
failure triggered by:

drmmode_display.c:905:
         return gbm_bo_get_stride(bo->gbm);

Which is part of:


uint32_t
drmmode_bo_get_pitch(drmmode_bo *bo)
{
#ifdef GLAMOR_HAS_GBM
     if (bo->gbm)
         return gbm_bo_get_stride(bo->gbm);
#endif

     return bo->dumb->pitch;
}

Strangely enough, an LD_PRELOAD of libgbm does not
fix this, and libgbm already gets dragged in by
libglamor_egl.so, so this should not be a problem in the first place.
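To double-check whether the symbol is actually visible in the process at
all (which the LD_PRELOAD should have guaranteed), a small test program
like the one below can be run with the same LD_PRELOAD. This is just a
minimal standalone sketch, not xserver code, and it only approximates the
lookup scope the module loader really gives the driver:

/* Minimal sketch: can gbm_bo_get_stride be resolved in this process?
 * This is roughly the lookup that the lazy PLT binding in the modesetting
 * driver has to perform on the first call.
 * Build with: gcc -o checksym checksym.c -ldl
 * Run as:     LD_PRELOAD=libgbm.so.1 ./checksym */
#define _GNU_SOURCE
#include <stdio.h>
#include <dlfcn.h>

int main(void)
{
    void *sym = dlsym(RTLD_DEFAULT, "gbm_bo_get_stride");

    if (!sym) {
        fprintf(stderr, "gbm_bo_get_stride not found: %s\n", dlerror());
        return 1;
    }
    printf("gbm_bo_get_stride resolves to %p\n", sym);
    return 0;
}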

Still I tried this change (the Makefile.am patch below, adding $(GBM_LIBS)
to the modesetting driver's LIBADD).

And now Xorg will start. This is still very weird, since the
exact same Xorg binaries work fine on Intel integrated
graphics, where the gbm_bo_get_stride() call also happens ...


So with this "fix" it starts, but it crashes as soon as I resize the
VM window and the screen thus gets resized:

bt
#0  OsSigHandler (signo=11, sip=0x7ffc160ad2f0, unused=0x7ffc160ad1c0)
     at osinit.c:114
#1  <signal handler called>
#2  miModifyPixmapHeader (pPixmap=0x28c4460, width=1920, height=992, depth=-1,
     bitsPerPixel=-1, devKind=7680, pPixData=0x0) at miscrinit.c:64
#3  0x00007fdc13d64471 in drmmode_xf86crtc_resize (scrn=0x21751f0, width=1920,
     height=992) at drmmode_display.c:3166
#4  0x00000000004bb9d8 in xf86RandR12ScreenSetSize (pScreen=0x23dc9b0,
     width=1920, height=992, mmWidth=508, mmHeight=262) at xf86RandR12.c:698
#5  0x00000000005092f0 in ProcRRSetScreenSize (client=0x29c6af0)
     at rrscreen.c:289
#6  0x000000000043fcee in Dispatch () at dispatch.c:478

And miscrinit.c:64 is the "{" of:

Bool
miModifyPixmapHeader(PixmapPtr pPixmap, int width, int height, int depth,
                      int bitsPerPixel, int devKind, void *pPixData)
{
     if (!pPixmap)
         return FALSE;

Which is a strange place to crash. Even weirder, after adding a
breakpoint a bit before drmmode_display.c:3166, I get a segfault
while stepping through earlier lines, pointing at SmartScheduleTimer()
and, again, specifically at the opening "{", as if something is wrong with
the stack and it cannot handle function calls being nested one
level deeper.

This makes me wonder if this is a stack depth/overflow issue. Does the
xserver have code somewhere to limit its stack size, and could we be
hitting that?
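One way to rule that in or out is to look at the RLIMIT_STACK the server
process actually ended up with (for a running Xorg, /proc/<pid>/limits
shows the same numbers). A minimal sketch of such a check, nothing
xserver-specific:

/* Minimal sketch: print the stack limit the current process runs with.
 * RLIM_INFINITY shows up as a very large number. */
#include <stdio.h>
#include <sys/resource.h>

int main(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_STACK, &rl) != 0) {
        perror("getrlimit");
        return 1;
    }
    printf("stack soft limit: %llu bytes\n", (unsigned long long) rl.rlim_cur);
    printf("stack hard limit: %llu bytes\n", (unsigned long long) rl.rlim_max);
    return 0;
}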

Or maybe a bad interaction with gcc's stack protection?

This feels as if we are hitting a guard page at the end of the stack
here?
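
For comparison, here is a small standalone experiment (not xserver code)
that runs a process into its stack limit: each call reserves a large frame
without touching most of it, so the SIGSEGV fires at whatever instruction
first has to grow the stack past the limit. Depending on where the boundary
falls, that is either the call instruction or the callee's frame setup, and
gdb reports the latter as the function's opening "{" -- which is exactly
what the backtrace above looks like. Build it with -O0 so each call really
gets its own frame:

/* Standalone sketch, not xserver code.
 * Build with: gcc -O0 -o stacktest stacktest.c */
#include <stdio.h>
#include <sys/resource.h>

static int frame(int depth)
{
    char scratch[64 * 1024];                     /* reserve ~64 KiB per call ... */

    scratch[sizeof(scratch) - 1] = (char) depth; /* ... but only touch the top of it */
    if (depth > 0)
        return frame(depth - 1) + scratch[sizeof(scratch) - 1];
    return scratch[sizeof(scratch) - 1];
}

int main(void)
{
    struct rlimit rl = { .rlim_cur = 256 * 1024, .rlim_max = 256 * 1024 };

    setrlimit(RLIMIT_STACK, &rl);                /* shrink the stack to 256 KiB */
    printf("%d\n", frame(32));                   /* wants ~2 MiB -> SIGSEGV */
    return 0;
}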

Regards,

Hans

Patch

--- a/hw/xfree86/drivers/modesetting/Makefile.am
+++ b/hw/xfree86/drivers/modesetting/Makefile.am
@@ -39,7 +39,7 @@  AM_CPPFLAGS = \

  modesetting_drv_la_LTLIBRARIES = modesetting_drv.la
  modesetting_drv_la_LDFLAGS = -module -avoid-version
-modesetting_drv_la_LIBADD = $(UDEV_LIBS) $(DRM_LIBS)
+modesetting_drv_la_LIBADD = $(UDEV_LIBS) $(DRM_LIBS) $(GBM_LIBS)
  modesetting_drv_ladir = @moduledir@/drivers

  modesetting_drv_la_SOURCES = \

Comments

Hi,

On 30-08-18 22:48, Hans de Goede wrote:
> This feels as if we are hitting a guard page at the end of the stack
> here?

One important thing which I only put in the subject: this happens
when using glamor on top of llvmpipe, something which is new in 1.20;
older xservers never used glamor on llvmpipe. Things work fine
if I disable glamor in an xorg.conf snippet.

Arguably we should disable glamor when running on llvmpipe for
performance reasons, but even so these crashes should not happen.
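
For reference, a rough sketch of what such a check could look like once a
GL context is current (a hypothetical helper, not how glamor/modesetting is
wired up today):

/* Hypothetical helper, not existing xserver code: detect a software
 * rasterizer from the GL renderer string once a context is current. */
#include <string.h>
#include <GL/gl.h>

static int renderer_is_software(void)
{
    const char *renderer = (const char *) glGetString(GL_RENDERER);

    if (!renderer)
        return 0;
    /* llvmpipe and softpipe both identify themselves in GL_RENDERER. */
    return strstr(renderer, "llvmpipe") != NULL ||
           strstr(renderer, "softpipe") != NULL;
}

The modesetting driver could then simply fall back to its non-glamor code
paths when this returns true.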

Regards,

Hans
Hi,

What version of mesa? This might be:

https://cgit.freedesktop.org/mesa/mesa/commit/?id=9baff597ce021f7691187b0d1d1bbc16d07b13e1

Ray

Hi,

On 30-08-18 23:44, Ray Strode wrote:
> Hi,
> 
> What version of mesa? This might be:
> 
> https://cgit.freedesktop.org/mesa/mesa/commit/?id=9baff597ce021f7691187b0d1d1bbc16d07b13e1

Ah yes, we were still at 18.2.0-rc3; I'm preparing an update to 18.2.0-rc5 now.
Thanks for pointing me to this.

Regards,

Hans