Revert "intel/nir: Call nir_lower_io_to_scalar_early"

Submitted by Jason Ekstrand on Aug. 8, 2018, 7:05 p.m.

Details

Message ID 20180808190500.26112-1-jason.ekstrand@intel.com
State Accepted
Commit 10f44da775a69561c77438507298363ff4eeb65d
Headers show
Series "Revert "intel/nir: Call nir_lower_io_to_scalar_early"" ( rev: 1 ) in Mesa

Not browsing as part of any series.

Commit Message

Jason Ekstrand Aug. 8, 2018, 7:05 p.m.
Commit 4434591bf56a6b0 caused substantially more URB messages in
geometry and tessellation shaders.  Before we can really enable this
sort of optimization,  We either need some way of combining them back
together into vectors or we need to do cross-stage vector element
elimination without splitting everything into scalars.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107510
---
 src/intel/compiler/brw_nir.c | 17 +++++------------
 1 file changed, 5 insertions(+), 12 deletions(-)

Patch hide | download patch | download mbox

diff --git a/src/intel/compiler/brw_nir.c b/src/intel/compiler/brw_nir.c
index 31ffbe613ec..29ad68fdb2a 100644
--- a/src/intel/compiler/brw_nir.c
+++ b/src/intel/compiler/brw_nir.c
@@ -713,18 +713,6 @@  brw_nir_link_shaders(const struct brw_compiler *compiler,
    nir_validate_shader(*producer);
    nir_validate_shader(*consumer);
 
-   const bool p_is_scalar =
-      compiler->scalar_stage[(*producer)->info.stage];
-   const bool c_is_scalar =
-      compiler->scalar_stage[(*consumer)->info.stage];
-
-   if (p_is_scalar && c_is_scalar) {
-      NIR_PASS_V(*producer, nir_lower_io_to_scalar_early, nir_var_shader_out);
-      NIR_PASS_V(*consumer, nir_lower_io_to_scalar_early, nir_var_shader_in);
-      *producer = brw_nir_optimize(*producer, compiler, p_is_scalar);
-      *consumer = brw_nir_optimize(*consumer, compiler, c_is_scalar);
-   }
-
    NIR_PASS_V(*producer, nir_remove_dead_variables, nir_var_shader_out);
    NIR_PASS_V(*consumer, nir_remove_dead_variables, nir_var_shader_in);
 
@@ -741,7 +729,12 @@  brw_nir_link_shaders(const struct brw_compiler *compiler,
       NIR_PASS_V(*consumer, nir_lower_indirect_derefs,
                  brw_nir_no_indirect_mask(compiler, (*consumer)->info.stage));
 
+      const bool p_is_scalar =
+         compiler->scalar_stage[(*producer)->info.stage];
       *producer = brw_nir_optimize(*producer, compiler, p_is_scalar);
+
+      const bool c_is_scalar =
+         compiler->scalar_stage[(*consumer)->info.stage];
       *consumer = brw_nir_optimize(*consumer, compiler, c_is_scalar);
    }
 }

Comments

On Wednesday, August 8, 2018 12:05:00 PM PDT Jason Ekstrand wrote:
> Commit 4434591bf56a6b0 caused substantially more URB messages in
> geometry and tessellation shaders.  Before we can really enable this
> sort of optimization,  We either need some way of combining them back
> together into vectors or we need to do cross-stage vector element
> elimination without splitting everything into scalars.
> 
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107510
> ---
>  src/intel/compiler/brw_nir.c | 17 +++++------------
>  1 file changed, 5 insertions(+), 12 deletions(-)

For TCS outputs, we'd want something like you described.
For various inputs...we should just issue whole-vec4 URB loads
and CSE them.  But we're probably not.  We can fix that.

For now, this seems reasonable.

Acked-by: Kenneth Graunke <kenneth@whitecape.org>
I don't think this will be picked up by the stable spotting scripts, does it
need a Fixes: tag?

Dylan

Quoting Jason Ekstrand (2018-08-08 12:05:00)
> Commit 4434591bf56a6b0 caused substantially more URB messages in
> geometry and tessellation shaders.  Before we can really enable this
> sort of optimization,  We either need some way of combining them back
> together into vectors or we need to do cross-stage vector element
> elimination without splitting everything into scalars.
> 
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107510
> ---
>  src/intel/compiler/brw_nir.c | 17 +++++------------
>  1 file changed, 5 insertions(+), 12 deletions(-)
> 
> diff --git a/src/intel/compiler/brw_nir.c b/src/intel/compiler/brw_nir.c
> index 31ffbe613ec..29ad68fdb2a 100644
> --- a/src/intel/compiler/brw_nir.c
> +++ b/src/intel/compiler/brw_nir.c
> @@ -713,18 +713,6 @@ brw_nir_link_shaders(const struct brw_compiler *compiler,
>     nir_validate_shader(*producer);
>     nir_validate_shader(*consumer);
>  
> -   const bool p_is_scalar =
> -      compiler->scalar_stage[(*producer)->info.stage];
> -   const bool c_is_scalar =
> -      compiler->scalar_stage[(*consumer)->info.stage];
> -
> -   if (p_is_scalar && c_is_scalar) {
> -      NIR_PASS_V(*producer, nir_lower_io_to_scalar_early, nir_var_shader_out);
> -      NIR_PASS_V(*consumer, nir_lower_io_to_scalar_early, nir_var_shader_in);
> -      *producer = brw_nir_optimize(*producer, compiler, p_is_scalar);
> -      *consumer = brw_nir_optimize(*consumer, compiler, c_is_scalar);
> -   }
> -
>     NIR_PASS_V(*producer, nir_remove_dead_variables, nir_var_shader_out);
>     NIR_PASS_V(*consumer, nir_remove_dead_variables, nir_var_shader_in);
>  
> @@ -741,7 +729,12 @@ brw_nir_link_shaders(const struct brw_compiler *compiler,
>        NIR_PASS_V(*consumer, nir_lower_indirect_derefs,
>                   brw_nir_no_indirect_mask(compiler, (*consumer)->info.stage));
>  
> +      const bool p_is_scalar =
> +         compiler->scalar_stage[(*producer)->info.stage];
>        *producer = brw_nir_optimize(*producer, compiler, p_is_scalar);
> +
> +      const bool c_is_scalar =
> +         compiler->scalar_stage[(*consumer)->info.stage];
>        *consumer = brw_nir_optimize(*consumer, compiler, c_is_scalar);
>     }
>  }
> -- 
> 2.17.1
> 
> _______________________________________________
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
On 09/08/18 05:05, Jason Ekstrand wrote:
> Commit 4434591bf56a6b0 caused substantially more URB messages in
> geometry and tessellation shaders.  Before we can really enable this
> sort of optimization,  We either need some way of combining them back
> together into vectors or we need to do cross-stage vector element
> elimination without splitting everything into scalars.

You should be calling:

nir_compact_varyings() after remove_unused_varyings this should do most 
of what you want.
On Tue, Aug 14, 2018 at 7:25 PM Timothy Arceri <tarceri@itsqueeze.com>
wrote:

> On 09/08/18 05:05, Jason Ekstrand wrote:
> > Commit 4434591bf56a6b0 caused substantially more URB messages in
> > geometry and tessellation shaders.  Before we can really enable this
> > sort of optimization,  We either need some way of combining them back
> > together into vectors or we need to do cross-stage vector element
> > elimination without splitting everything into scalars.
>
> You should be calling:
>
> nir_compact_varyings() after remove_unused_varyings this should do most
> of what you want.
>

Does that turn things back into vectors?  The problem we're hitting is not
that things aren't compacted, it's that we want to read/write whole vectors
at a time as much as possible.  Just reverting the patch isn't quite the
proper solution because we want to be able to compact things and it would
be nice if we could load two variables at a time if they're packed
side-by-side in one vec4.  However, it does solve the perf regression for
now.

--Jason
On 16/08/18 02:10, Jason Ekstrand wrote:
> On Tue, Aug 14, 2018 at 7:25 PM Timothy Arceri <tarceri@itsqueeze.com 
> <mailto:tarceri@itsqueeze.com>> wrote:
> 
>     On 09/08/18 05:05, Jason Ekstrand wrote:
>      > Commit 4434591bf56a6b0 caused substantially more URB messages in
>      > geometry and tessellation shaders.  Before we can really enable this
>      > sort of optimization,  We either need some way of combining them back
>      > together into vectors or we need to do cross-stage vector element
>      > elimination without splitting everything into scalars.
> 
>     You should be calling:
> 
>     nir_compact_varyings() after remove_unused_varyings this should do most
>     of what you want.
> 
> 
> Does that turn things back into vectors?  The problem we're hitting is 
> not that things aren't compacted, it's that we want to read/write whole 
> vectors at a time as much as possible.  Just reverting the patch isn't 
> quite the proper solution because we want to be able to compact things 
> and it would be nice if we could load two variables at a time if they're 
> packed side-by-side in one vec4.  However, it does solve the perf 
> regression for now.

Yes. nir_compact_varyings() compacts everything into vec4s. The only 
thing it doesn't currently do is compact the locations (the nir->llvm 
pass does this for radeonsi anyway).

> 
> --Jason
This corrects all of the performance regressions that I can see on our
perf CI.

Please make sure to add the fixes tag before pushing.

Tested-by: Mark Janes <mark.a.janes@intel.com>

Jason Ekstrand <jason@jlekstrand.net> writes:

> Commit 4434591bf56a6b0 caused substantially more URB messages in
> geometry and tessellation shaders.  Before we can really enable this
> sort of optimization,  We either need some way of combining them back
> together into vectors or we need to do cross-stage vector element
> elimination without splitting everything into scalars.
>
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107510
> ---
>  src/intel/compiler/brw_nir.c | 17 +++++------------
>  1 file changed, 5 insertions(+), 12 deletions(-)
>
> diff --git a/src/intel/compiler/brw_nir.c b/src/intel/compiler/brw_nir.c
> index 31ffbe613ec..29ad68fdb2a 100644
> --- a/src/intel/compiler/brw_nir.c
> +++ b/src/intel/compiler/brw_nir.c
> @@ -713,18 +713,6 @@ brw_nir_link_shaders(const struct brw_compiler *compiler,
>     nir_validate_shader(*producer);
>     nir_validate_shader(*consumer);
>  
> -   const bool p_is_scalar =
> -      compiler->scalar_stage[(*producer)->info.stage];
> -   const bool c_is_scalar =
> -      compiler->scalar_stage[(*consumer)->info.stage];
> -
> -   if (p_is_scalar && c_is_scalar) {
> -      NIR_PASS_V(*producer, nir_lower_io_to_scalar_early, nir_var_shader_out);
> -      NIR_PASS_V(*consumer, nir_lower_io_to_scalar_early, nir_var_shader_in);
> -      *producer = brw_nir_optimize(*producer, compiler, p_is_scalar);
> -      *consumer = brw_nir_optimize(*consumer, compiler, c_is_scalar);
> -   }
> -
>     NIR_PASS_V(*producer, nir_remove_dead_variables, nir_var_shader_out);
>     NIR_PASS_V(*consumer, nir_remove_dead_variables, nir_var_shader_in);
>  
> @@ -741,7 +729,12 @@ brw_nir_link_shaders(const struct brw_compiler *compiler,
>        NIR_PASS_V(*consumer, nir_lower_indirect_derefs,
>                   brw_nir_no_indirect_mask(compiler, (*consumer)->info.stage));
>  
> +      const bool p_is_scalar =
> +         compiler->scalar_stage[(*producer)->info.stage];
>        *producer = brw_nir_optimize(*producer, compiler, p_is_scalar);
> +
> +      const bool c_is_scalar =
> +         compiler->scalar_stage[(*consumer)->info.stage];
>        *consumer = brw_nir_optimize(*consumer, compiler, c_is_scalar);
>     }
>  }
> -- 
> 2.17.1
>
> _______________________________________________
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
On Wed, Aug 15, 2018 at 5:36 PM Timothy Arceri <tarceri@itsqueeze.com>
wrote:

> On 16/08/18 02:10, Jason Ekstrand wrote:
> > On Tue, Aug 14, 2018 at 7:25 PM Timothy Arceri <tarceri@itsqueeze.com
> > <mailto:tarceri@itsqueeze.com>> wrote:
> >
> >     On 09/08/18 05:05, Jason Ekstrand wrote:
> >      > Commit 4434591bf56a6b0 caused substantially more URB messages in
> >      > geometry and tessellation shaders.  Before we can really enable
> this
> >      > sort of optimization,  We either need some way of combining them
> back
> >      > together into vectors or we need to do cross-stage vector element
> >      > elimination without splitting everything into scalars.
> >
> >     You should be calling:
> >
> >     nir_compact_varyings() after remove_unused_varyings this should do
> most
> >     of what you want.
> >
> >
> > Does that turn things back into vectors?  The problem we're hitting is
> > not that things aren't compacted, it's that we want to read/write whole
> > vectors at a time as much as possible.  Just reverting the patch isn't
> > quite the proper solution because we want to be able to compact things
> > and it would be nice if we could load two variables at a time if they're
> > packed side-by-side in one vec4.  However, it does solve the perf
> > regression for now.
>
> Yes. nir_compact_varyings() compacts everything into vec4s. The only
> thing it doesn't currently do is compact the locations (the nir->llvm
> pass does this for radeonsi anyway).
>

I think I was unclear.  nir_compact_varyings does pack things tightly but
it doesn't turn each set of 4 scalar vars into a single vec4 var which is
what we would need in order for it to fix the perf problem caused by these
patches.