[1/5] i965: Hard code scratch_ids_per_subslice for Cherryview

Submitted by Jordan Justen on March 7, 2018, 8:16 a.m.

Details

Message ID 20180307081630.31882-1-jordan.l.justen@intel.com
State New
Series "Series without cover letter" (rev 1) in Mesa

Commit Message

Jordan Justen March 7, 2018, 8:16 a.m.
Ken suggested that we might be underallocating scratch space on HD
400. Allocating scratch space as though there were actually 8 EUs
seems to help with a GPU hang seen on synmark CSDof.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104636
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105290
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Eero Tamminen <eero.t.tamminen@intel.com>
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
---
 src/mesa/drivers/dri/i965/brw_program.c | 44 ++++++++++++++++++++-------------
 1 file changed, 27 insertions(+), 17 deletions(-)


diff --git a/src/mesa/drivers/dri/i965/brw_program.c b/src/mesa/drivers/dri/i965/brw_program.c
index 527f003977b..c121136c439 100644
--- a/src/mesa/drivers/dri/i965/brw_program.c
+++ b/src/mesa/drivers/dri/i965/brw_program.c
@@ -402,23 +402,33 @@  brw_alloc_stage_scratch(struct brw_context *brw,
       if (devinfo->gen >= 9)
          subslices = 4 * brw->screen->devinfo.num_slices;
 
-      /* WaCSScratchSize:hsw
-       *
-       * Haswell's scratch space address calculation appears to be sparse
-       * rather than tightly packed.  The Thread ID has bits indicating
-       * which subslice, EU within a subslice, and thread within an EU
-       * it is.  There's a maximum of two slices and two subslices, so these
-       * can be stored with a single bit.  Even though there are only 10 EUs
-       * per subslice, this is stored in 4 bits, so there's an effective
-       * maximum value of 16 EUs.  Similarly, although there are only 7
-       * threads per EU, this is stored in a 3 bit number, giving an effective
-       * maximum value of 8 threads per EU.
-       *
-       * This means that we need to use 16 * 8 instead of 10 * 7 for the
-       * number of threads per subslice.
-       */
-      const unsigned scratch_ids_per_subslice =
-         devinfo->is_haswell ? 16 * 8 : devinfo->max_cs_threads;
+      unsigned scratch_ids_per_subslice;
+      if (devinfo->is_haswell) {
+         /* WaCSScratchSize:hsw
+          *
+          * Haswell's scratch space address calculation appears to be sparse
+          * rather than tightly packed. The Thread ID has bits indicating
+          * which subslice, EU within a subslice, and thread within an EU it
+          * is. There's a maximum of two slices and two subslices, so these
+          * can be stored with a single bit. Even though there are only 10 EUs
+          * per subslice, this is stored in 4 bits, so there's an effective
+          * maximum value of 16 EUs. Similarly, although there are only 7
+          * threads per EU, this is stored in a 3 bit number, giving an
+          * effective maximum value of 8 threads per EU.
+          *
+          * This means that we need to use 16 * 8 instead of 10 * 7 for the
+          * number of threads per subslice.
+          */
+         scratch_ids_per_subslice = 16 * 8;
+      } else if (devinfo->is_cherryview) {
+         /* For Cherryview, it appears that the scratch addresses for the 6 EU
+          * devices may still generate compute scratch addresses covering the
+          * same range as 8 EU.
+          */
+         scratch_ids_per_subslice = 8 * 7;
+      } else {
+         scratch_ids_per_subslice = devinfo->max_cs_threads;
+      }
 
       thread_count = scratch_ids_per_subslice * subslices;
       break;

Comments

On Wednesday, March 7, 2018 12:16:26 AM PST Jordan Justen wrote:
> Ken suggested that we might be underallocating scratch space on HD
> 400. Allocating scratch space as though there was actually 8 EUs
> seems to help with a GPU hang seen on synmark CSDof.

Patches 1-2 are:
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Hi,

Tested SynMark CSDof and GfxBench Aztec Ruins (GL & GLES, normal & high
versions), which previously hung the GPU. With this patch the hangs are gone.

Tested-by: Eero Tamminen <eero.t.tamminen@intel.com>


On 07.03.2018 10:16, Jordan Justen wrote:
> Ken suggested that we might be underallocating scratch space on HD
> 400. Allocating scratch space as though there was actually 8 EUs

s/8/18/?

	- Eero


On 2018-03-07 07:41:04, Eero Tamminen wrote:
> Hi,
> 
> Tested SynMark CSDof and GfxBench Aztec Ruins GL & GLES / normal & high 
> versions, which were earlier GPU hanging.  With this patch hangs are gone.
> 
> Tested-by: Eero Tamminen <eero.t.tamminen@intel.com>

Thanks!

> On 07.03.2018 10:16, Jordan Justen wrote:
> > Ken suggested that we might be underallocating scratch space on HD
> > 400. Allocating scratch space as though there was actually 8 EUs
> 
> s/8/18/?
> 

I think you meant 16 rather than 18? I guess we have either 6 EU *per
subslice* (HD 400) or 8 EU per subslice (HD 405). With 2 subslices,
that'd be either 12 or 16 EU.

In my comments and commit message I should add 'per subslice' next to
the 6/8 EU numbers to make it clearer.

-Jordan

Could this be the reason that BSW systems never reliably passed all unit
tests?  Up to now, we re-execute each failing test, and mark it as a
pass if it succeeds a second time.

I'd like to remove that crutch if possible.

Jordan Justen <jordan.l.justen@intel.com> writes:

> Ken suggested that we might be underallocating scratch space on HD
> 400. Allocating scratch space as though there was actually 8 EUs
> seems to help with a GPU hang seen on synmark CSDof.
On 2018-03-09 09:51:31, Mark Janes wrote:
> Could this be the reason that BSW systems never reliably passed all unit
> tests?  Up to now, we re-execute each failing test, and mark it as a
> pass if it succeeds a second time.
> 
> I'd like to remove that crutch if possible.

It is possible. We basically had memory corruption happening outside
the scratch buffer. The corruption was happening a bit past the end
of the buffer we had allocated. It can be difficult to predict the
outcome of such corruption. :)

-Jordan

> Jordan Justen <jordan.l.justen@intel.com> writes:
> 
> > Ken suggested that we might be underallocating scratch space on HD
> > 400. Allocating scratch space as though there was actually 8 EUs
> > seems to help with a GPU hang seen on synmark CSDof.
On Wed, 2018-03-07 at 00:16 -0800, Jordan Justen wrote:
> Ken suggested that we might be underallocating scratch space on HD
> 400. Allocating scratch space as though there was actually 8 EUs
> seems to help with a GPU hang seen on synmark CSDof.
> 

FYI, in order to pick this commit for the next 17.3 stable release, I also
need to pick:

commit f9d5a7add42af5a2e4410526d1480a08f41317ae
Author: Jordan Justen <jordan.l.justen@intel.com>
Date:   Tue Oct 31 00:34:32 2017 -0700

    i965: Calculate thread_count in brw_alloc_stage_scratch
  

Unless you prefer not picking them, I'll add both.


Cheers!


	J.A.

On 2018-03-26 08:23:13, Juan A. Suarez Romero wrote:
> On Wed, 2018-03-07 at 00:16 -0800, Jordan Justen wrote:
> > Ken suggested that we might be underallocating scratch space on HD
> > 400. Allocating scratch space as though there was actually 8 EUs
> > seems to help with a GPU hang seen on synmark CSDof.
> > 
> 
> FYI, in order to pick this commit for next 17.3 stable release, I need to pick
> also:
> 
> commit f9d5a7add42af5a2e4410526d1480a08f41317ae
> Author: Jordan Justen <jordan.l.justen@intel.com>
> Date:   Tue Oct 31 00:34:32 2017 -0700
> 
>     i965: Calculate thread_count in brw_alloc_stage_scratch

I believe that this commit led to a regression with compute shaders,
which was fixed by:

commit a16dc04ad51c32e5c7d136e4dd6273d983385d3f
Author: Kenneth Graunke <kenneth@whitecape.org>
Date:   Tue Oct 31 00:56:24 2017 -0700

    i965: properly initialize brw->cs.base.stage to MESA_SHADER_COMPUTE

You should probably add Ken's a16dc04ad51c before f9d5a7add42a.

-Jordan

On Wed, 2018-03-28 at 14:55 -0700, Jordan Justen wrote:
> On 2018-03-26 08:23:13, Juan A. Suarez Romero wrote:
> > On Wed, 2018-03-07 at 00:16 -0800, Jordan Justen wrote:
> > > Ken suggested that we might be underallocating scratch space on
> > > HD 400. Allocating scratch space as though there were actually
> > > 8 EUs seems to help with a GPU hang seen on synmark CSDof.
> > > 
> > 
> > FYI, in order to pick this commit for the next 17.3 stable
> > release, I also need to pick:
> > 
> > commit f9d5a7add42af5a2e4410526d1480a08f41317ae
> > Author: Jordan Justen <jordan.l.justen@intel.com>
> > Date:   Tue Oct 31 00:34:32 2017 -0700
> > 
> >     i965: Calculate thread_count in brw_alloc_stage_scratch
> 
> I believe that this commit led to a regression with compute shaders,
> which was fixed by:
> 
> commit a16dc04ad51c32e5c7d136e4dd6273d983385d3f
> Author: Kenneth Graunke <kenneth@whitecape.org>
> Date:   Tue Oct 31 00:56:24 2017 -0700
> 
>     i965: properly initialize brw->cs.base.stage to MESA_SHADER_COMPUTE
> 
> You should probably add Ken's a16dc04ad51c before f9d5a7add42a.
> 

Thanks a lot! Fortunately, a16dc04ad51c was already nominated and
included in 17.3.0. So it is in the stable branch.


	J.A.


> -Jordan
> 
> 
> _______________________________________________
> mesa-stable mailing list
> mesa-stable@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-stable