[Mesa-dev] i965: Uniform loading via samplers with discard

Submitted by Cody Northrop on June 12, 2014, 7:35 p.m.

Details

Message ID CAP90KKcDnCvJW94y+WJmA9hLy593_XOmDEKcm1Sa1scJYd7cWQ@mail.gmail.com
State New
Headers show

Not browsing as part of any series.

Commit Message

Cody Northrop June 12, 2014, 7:35 p.m.
Commit 17c7ead7 exposed a bug in how uniform loading happens in the
presence of discard.  It manifested itself in an application as randomly
incorrect pixels on the borders of conditional areas.

We believe it is due to how discards jump to the end of the shader
incorrectly for some channels.  The current implementation checks each 2x2
subspan to preserve derivatives.  When uniform loading via samplers was
turned on, it uses a full execution mask, as stated in
lower_uniform_pull_constant_loads(), and only populates four channels of
the destination (see generate_uniform_pull_constant_load_gen7()).  We think
that is happening incorrectly when the first subspan has been jumped over.

A possible fix is to only jump to the end of the shader if all relevant
channels are disabled, i.e. all 8 or 16, depending on dispatch.  This
preserves the original speedup noted in commit beafced2.  There are other
more heavyweight fixes (i.e. don't use sampler for uniforms if discard in
use), but this seems like a good fix.  I've appended it below as a patch.
It changes the shader accordingly:

before    : 0x000000d0: (-f0.1.any4h) halt(8) 17 2
null                            { align1 WE_all 1Q };
after(8)  : 0x000000d0: (-f0.1.any8h) halt(8) 17 2
null                            { align1 WE_all 1Q };
after(16) : 0x00000250: (-f0.1.any16h) halt(16) 17 2
null                            { align1 WE_all 1H };

I attached a test case to the bugzilla entry below.

Thanks for any review or feedback.  Curious if anyone sees a better fix.

-C


From 871671c4ab8ff00e85b434865e8855fc356efa8f Mon Sep 17 00:00:00 2001
From: Cody Northrop <cody@lunarg.com>
Date: Thu, 12 Jun 2014 09:07:18 -0600
Subject: [PATCH] i965/fs: Update discard jump to preserve uniform loads via
 sampler.

The series that implemented this optimization was done before
the changes to use samplers for uniform loads.  Uniform sampler
loads use special execution masks and only populate four
channels, so we can't jump over those or corruption ensues.
Use a more conservative jump mask which only jumps to the end
if all relevant channels are disabled.

No change was observed in GLbenchmark 2.7, so the optimization
is preserved.

Signed-off-by: Cody Northrop <cody@lunarg.com>
Reviewed-by: Mike Stroyan <mike@lunarg.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=79948
---
 src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

Patch hide | download patch | download mbox

diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
index 8858852..fe05715 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
@@ -1907,7 +1907,15 @@  fs_visitor::visit(ir_discard *ir)
        */
       fs_inst *discard_jump = emit(FS_OPCODE_DISCARD_JUMP);
       discard_jump->flag_subreg = 1;
-      discard_jump->predicate = BRW_PREDICATE_ALIGN1_ANY4H;
+
+      /* Uniforms are now loaded using samplers with a routine that has
+       * its own execution mask, so we can only jump if all relevant
+       * channels are dead.  This is more conservative than the previous
+       * four channel checking, but still preserves speedups.
+       */
+      discard_jump->predicate = (8 == dispatch_width)
+                                ? BRW_PREDICATE_ALIGN1_ANY8H
+                                : BRW_PREDICATE_ALIGN1_ANY16H;
       discard_jump->predicate_inverse = true;
    }
 }

Comments

Send patches with git-send-email and no other method.  Save this message
and apply it to origin/master using git-am to see why.

Hopefully Matt or Ken can review this...

On 06/12/2014 12:35 PM, Cody Northrop wrote:
> Commit 17c7ead7 exposed a bug in how uniform loading happens in the
> presence of discard.  It manifested itself in an application as randomly
> incorrect pixels on the borders of conditional areas.
> 
> We believe it is due to how discards jump to the end of the shader
> incorrectly for some channels.  The current implementation checks each
> 2x2 subspan to preserve derivatives.  When uniform loading via samplers
> was turned on, it uses a full execution mask, as stated in
> lower_uniform_pull_constant_loads(), and only populates four channels of
> the destination (see generate_uniform_pull_constant_load_gen7()).  We
> think that is happening incorrectly when the first subspan has been
> jumped over.
> 
> A possible fix is to only jump to the end of the shader if all relevant
> channels are disabled, i.e. all 8 or 16, depending on dispatch.  This
> preserves the original speedup noted in commit beafced2.  There are
> other more heavyweight fixes (i.e. don't use sampler for uniforms if
> discard in use), but this seems like a good fix.  I've appended it below
> as a patch.  It changes the shader accordingly:
> 
> before    : 0x000000d0: (-f0.1.any4h) halt(8) 17 2     
> null                            { align1 WE_all 1Q };
> after(8)  : 0x000000d0: (-f0.1.any8h) halt(8) 17 2     
> null                            { align1 WE_all 1Q };
> after(16) : 0x00000250: (-f0.1.any16h) halt(16) 17 2   
> null                            { align1 WE_all 1H };

All of this information should go in the commit message.

> I attached a test case to the bugzilla entry below.

Was the test also sent to the piglit mailing list for inclusion in the
the test suite?

> Thanks for any review or feedback.  Curious if anyone sees a better fix.
> 
> -C
> 
> 
> From 871671c4ab8ff00e85b434865e8855fc356efa8f Mon Sep 17 00:00:00 2001
> From: Cody Northrop <cody@lunarg.com <mailto:cody@lunarg.com>>
> Date: Thu, 12 Jun 2014 09:07:18 -0600
> Subject: [PATCH] i965/fs: Update discard jump to preserve uniform loads via
>  sampler.
> 
> The series that implemented this optimization was done before
> the changes to use samplers for uniform loads.  Uniform sampler
> loads use special execution masks and only populate four
> channels, so we can't jump over those or corruption ensues.
> Use a more conservative jump mask which only jumps to the end
> if all relevant channels are disabled.
> 
> No change was observed in GLbenchmark 2.7, so the optimization
> is preserved.
> 
> Signed-off-by: Cody Northrop <cody@lunarg.com <mailto:cody@lunarg.com>>
> Reviewed-by: Mike Stroyan <mike@lunarg.com <mailto:mike@lunarg.com>>
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=79948
> ---
>  src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
> b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
> index 8858852..fe05715 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
> @@ -1907,7 +1907,15 @@ fs_visitor::visit(ir_discard *ir)
>         */
>        fs_inst *discard_jump = emit(FS_OPCODE_DISCARD_JUMP);
>        discard_jump->flag_subreg = 1;
> -      discard_jump->predicate = BRW_PREDICATE_ALIGN1_ANY4H;
> +
> +      /* Uniforms are now loaded using samplers with a routine that has
> +       * its own execution mask, so we can only jump if all relevant
> +       * channels are dead.  This is more conservative than the previous
> +       * four channel checking, but still preserves speedups.
> +       */
> +      discard_jump->predicate = (8 == dispatch_width)
> +                                ? BRW_PREDICATE_ALIGN1_ANY8H
> +                                : BRW_PREDICATE_ALIGN1_ANY16H;
>        discard_jump->predicate_inverse = true;
>     }
>  }
> -- 
> 1.8.3.2
> 
> _______________________________________________
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
On Tuesday, June 24, 2014 08:17:23 AM Ian Romanick wrote:
> Send patches with git-send-email and no other method.  Save this message
> and apply it to origin/master using git-am to see why.
> 
> Hopefully Matt or Ken can review this...
> 
> On 06/12/2014 12:35 PM, Cody Northrop wrote:
> > Commit 17c7ead7 exposed a bug in how uniform loading happens in the
> > presence of discard.  It manifested itself in an application as randomly
> > incorrect pixels on the borders of conditional areas.
> > 
> > We believe it is due to how discards jump to the end of the shader
> > incorrectly for some channels.  The current implementation checks each
> > 2x2 subspan to preserve derivatives.  When uniform loading via samplers
> > was turned on, it uses a full execution mask, as stated in
> > lower_uniform_pull_constant_loads(), and only populates four channels of
> > the destination (see generate_uniform_pull_constant_load_gen7()).  We
> > think that is happening incorrectly when the first subspan has been
> > jumped over.
> > 
> > A possible fix is to only jump to the end of the shader if all relevant
> > channels are disabled, i.e. all 8 or 16, depending on dispatch.  This
> > preserves the original speedup noted in commit beafced2.  There are
> > other more heavyweight fixes (i.e. don't use sampler for uniforms if
> > discard in use), but this seems like a good fix.  I've appended it below
> > as a patch.  It changes the shader accordingly:
> > 
> > before    : 0x000000d0: (-f0.1.any4h) halt(8) 17 2     
> > null                            { align1 WE_all 1Q };
> > after(8)  : 0x000000d0: (-f0.1.any8h) halt(8) 17 2     
> > null                            { align1 WE_all 1Q };
> > after(16) : 0x00000250: (-f0.1.any16h) halt(16) 17 2   
> > null                            { align1 WE_all 1H };
> 
> All of this information should go in the commit message.
> 
> > I attached a test case to the bugzilla entry below.
> 
> Was the test also sent to the piglit mailing list for inclusion in the
> the test suite?
> 
> > Thanks for any review or feedback.  Curious if anyone sees a better fix.
> > 
> > -C
> > 
> > 
> > From 871671c4ab8ff00e85b434865e8855fc356efa8f Mon Sep 17 00:00:00 2001
> > From: Cody Northrop <cody@lunarg.com <mailto:cody@lunarg.com>>
> > Date: Thu, 12 Jun 2014 09:07:18 -0600
> > Subject: [PATCH] i965/fs: Update discard jump to preserve uniform loads 
via
> >  sampler.
> > 
> > The series that implemented this optimization was done before
> > the changes to use samplers for uniform loads.  Uniform sampler
> > loads use special execution masks and only populate four
> > channels, so we can't jump over those or corruption ensues.
> > Use a more conservative jump mask which only jumps to the end
> > if all relevant channels are disabled.
> > 
> > No change was observed in GLbenchmark 2.7, so the optimization
> > is preserved.
> > 
> > Signed-off-by: Cody Northrop <cody@lunarg.com <mailto:cody@lunarg.com>>
> > Reviewed-by: Mike Stroyan <mike@lunarg.com <mailto:mike@lunarg.com>>
> > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=79948
> > ---
> >  src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 10 +++++++++-
> >  1 file changed, 9 insertions(+), 1 deletion(-)
> > 
> > diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
> > b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
> > index 8858852..fe05715 100644
> > --- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
> > +++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
> > @@ -1907,7 +1907,15 @@ fs_visitor::visit(ir_discard *ir)
> >         */
> >        fs_inst *discard_jump = emit(FS_OPCODE_DISCARD_JUMP);
> >        discard_jump->flag_subreg = 1;
> > -      discard_jump->predicate = BRW_PREDICATE_ALIGN1_ANY4H;
> > +
> > +      /* Uniforms are now loaded using samplers with a routine that has
> > +       * its own execution mask, so we can only jump if all relevant
> > +       * channels are dead.  This is more conservative than the previous
> > +       * four channel checking, but still preserves speedups.

This comment won't make much sense when reading the code, since "the previous 
four channel checking" no longer exists and can't be observed anywhere other 
than in git history.  Personally, I'd just drop the comment.

As far as I can tell, this is just broken and should have always been 
ANY8H/ANY16H.  Good catch, and thanks for fixing it!

> > +       */
> > +      discard_jump->predicate = (8 == dispatch_width)

Please drop the Yoda-conditional and use (dispatch_width == 8).

With those two changes, this would get:
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

If you resend a v2 using git-send-email, I'll be happy to push it.

> > +                                ? BRW_PREDICATE_ALIGN1_ANY8H
> > +                                : BRW_PREDICATE_ALIGN1_ANY16H;
> >        discard_jump->predicate_inverse = true;
> >     }
> >  }
On Tuesday, June 24, 2014 08:17:23 AM Ian Romanick wrote:
> Send patches with git-send-email and no other method.  Save this message
> and apply it to origin/master using git-am to see why.
> 
> Hopefully Matt or Ken can review this...
> 
> On 06/12/2014 12:35 PM, Cody Northrop wrote:
> > Commit 17c7ead7 exposed a bug in how uniform loading happens in the
> > presence of discard.  It manifested itself in an application as randomly
> > incorrect pixels on the borders of conditional areas.
> > 
> > We believe it is due to how discards jump to the end of the shader
> > incorrectly for some channels.  The current implementation checks each
> > 2x2 subspan to preserve derivatives.  When uniform loading via samplers
> > was turned on, it uses a full execution mask, as stated in
> > lower_uniform_pull_constant_loads(), and only populates four channels of
> > the destination (see generate_uniform_pull_constant_load_gen7()).  We
> > think that is happening incorrectly when the first subspan has been
> > jumped over.
> > 
> > A possible fix is to only jump to the end of the shader if all relevant
> > channels are disabled, i.e. all 8 or 16, depending on dispatch.  This
> > preserves the original speedup noted in commit beafced2.  There are
> > other more heavyweight fixes (i.e. don't use sampler for uniforms if
> > discard in use), but this seems like a good fix.  I've appended it below
> > as a patch.  It changes the shader accordingly:
> > 
> > before    : 0x000000d0: (-f0.1.any4h) halt(8) 17 2     
> > null                            { align1 WE_all 1Q };
> > after(8)  : 0x000000d0: (-f0.1.any8h) halt(8) 17 2     
> > null                            { align1 WE_all 1Q };
> > after(16) : 0x00000250: (-f0.1.any16h) halt(16) 17 2   
> > null                            { align1 WE_all 1H };
> 
> All of this information should go in the commit message.
> 
> > I attached a test case to the bugzilla entry below.
> 
> Was the test also sent to the piglit mailing list for inclusion in the
> the test suite?

Hi Cody,

I noticed that the Piglit test is a hack to an existing SSO test - would it be 
possible to create a new Piglit test for this, and submit it to 
piglit@lists.freedesktop.org for inclusion upstream?  It'd be great to have a 
regression test in place so we don't break it in the future.

Also, it apparently tries to use GLSL 1.40 in a legacy GL 3.0 context, which 
Mesa doesn't support.  I had to hack Mesa to get it to run.

Thanks again,
--Ken
On Tue, Jun 24, 2014 at 9:17 AM, Ian Romanick <idr@freedesktop.org> wrote:

> Send patches with git-send-email and no other method.  Save this message
> and apply it to origin/master using git-am to see why.
>
> Hopefully Matt or Ken can review this...
>
> On 06/12/2014 12:35 PM, Cody Northrop wrote:
> > Commit 17c7ead7 exposed a bug in how uniform loading happens in the
> > presence of discard.  It manifested itself in an application as randomly
> > incorrect pixels on the borders of conditional areas.
> >
> > We believe it is due to how discards jump to the end of the shader
> > incorrectly for some channels.  The current implementation checks each
> > 2x2 subspan to preserve derivatives.  When uniform loading via samplers
> > was turned on, it uses a full execution mask, as stated in
> > lower_uniform_pull_constant_loads(), and only populates four channels of
> > the destination (see generate_uniform_pull_constant_load_gen7()).  We
> > think that is happening incorrectly when the first subspan has been
> > jumped over.
> >
> > A possible fix is to only jump to the end of the shader if all relevant
> > channels are disabled, i.e. all 8 or 16, depending on dispatch.  This
> > preserves the original speedup noted in commit beafced2.  There are
> > other more heavyweight fixes (i.e. don't use sampler for uniforms if
> > discard in use), but this seems like a good fix.  I've appended it below
> > as a patch.  It changes the shader accordingly:
> >
> > before    : 0x000000d0: (-f0.1.any4h) halt(8) 17 2
> > null                            { align1 WE_all 1Q };
> > after(8)  : 0x000000d0: (-f0.1.any8h) halt(8) 17 2
> > null                            { align1 WE_all 1Q };
> > after(16) : 0x00000250: (-f0.1.any16h) halt(16) 17 2
> > null                            { align1 WE_all 1H };
>
> All of this information should go in the commit message.
>

Will do.


>
> > I attached a test case to the bugzilla entry below.
>
> Was the test also sent to the piglit mailing list for inclusion in the
> the test suite?
>

No, it wasn't a full featured test.  I'll invest the time in creating a
better example.


>
> > Thanks for any review or feedback.  Curious if anyone sees a better fix.
> >
> > -C
> >
> >
> > From 871671c4ab8ff00e85b434865e8855fc356efa8f Mon Sep 17 00:00:00 2001
> > From: Cody Northrop <cody@lunarg.com <mailto:cody@lunarg.com>>
> > Date: Thu, 12 Jun 2014 09:07:18 -0600
> > Subject: [PATCH] i965/fs: Update discard jump to preserve uniform loads
> via
> >  sampler.
> >
> > The series that implemented this optimization was done before
> > the changes to use samplers for uniform loads.  Uniform sampler
> > loads use special execution masks and only populate four
> > channels, so we can't jump over those or corruption ensues.
> > Use a more conservative jump mask which only jumps to the end
> > if all relevant channels are disabled.
> >
> > No change was observed in GLbenchmark 2.7, so the optimization
> > is preserved.
> >
> > Signed-off-by: Cody Northrop <cody@lunarg.com <mailto:cody@lunarg.com>>
> > Reviewed-by: Mike Stroyan <mike@lunarg.com <mailto:mike@lunarg.com>>
> > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=79948
> > ---
> >  src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 10 +++++++++-
> >  1 file changed, 9 insertions(+), 1 deletion(-)
> >
> > diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
> > b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
> > index 8858852..fe05715 100644
> > --- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
> > +++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
> > @@ -1907,7 +1907,15 @@ fs_visitor::visit(ir_discard *ir)
> >         */
> >        fs_inst *discard_jump = emit(FS_OPCODE_DISCARD_JUMP);
> >        discard_jump->flag_subreg = 1;
> > -      discard_jump->predicate = BRW_PREDICATE_ALIGN1_ANY4H;
> > +
> > +      /* Uniforms are now loaded using samplers with a routine that has
> > +       * its own execution mask, so we can only jump if all relevant
> > +       * channels are dead.  This is more conservative than the previous
> > +       * four channel checking, but still preserves speedups.
> > +       */
> > +      discard_jump->predicate = (8 == dispatch_width)
> > +                                ? BRW_PREDICATE_ALIGN1_ANY8H
> > +                                : BRW_PREDICATE_ALIGN1_ANY16H;
> >        discard_jump->predicate_inverse = true;
> >     }
> >  }
> > --
> > 1.8.3.2
> >
> > _______________________________________________
> > mesa-dev mailing list
> > mesa-dev@lists.freedesktop.org
> > http://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
>
On Tue, Jun 24, 2014 at 10:23 AM, Kenneth Graunke <kenneth@whitecape.org>
wrote:

> On Tuesday, June 24, 2014 08:17:23 AM Ian Romanick wrote:
> > Send patches with git-send-email and no other method.  Save this message
> > and apply it to origin/master using git-am to see why.
> >
> > Hopefully Matt or Ken can review this...
> >
> > On 06/12/2014 12:35 PM, Cody Northrop wrote:
> > > Commit 17c7ead7 exposed a bug in how uniform loading happens in the
> > > presence of discard.  It manifested itself in an application as
> randomly
> > > incorrect pixels on the borders of conditional areas.
> > >
> > > We believe it is due to how discards jump to the end of the shader
> > > incorrectly for some channels.  The current implementation checks each
> > > 2x2 subspan to preserve derivatives.  When uniform loading via samplers
> > > was turned on, it uses a full execution mask, as stated in
> > > lower_uniform_pull_constant_loads(), and only populates four channels
> of
> > > the destination (see generate_uniform_pull_constant_load_gen7()).  We
> > > think that is happening incorrectly when the first subspan has been
> > > jumped over.
> > >
> > > A possible fix is to only jump to the end of the shader if all relevant
> > > channels are disabled, i.e. all 8 or 16, depending on dispatch.  This
> > > preserves the original speedup noted in commit beafced2.  There are
> > > other more heavyweight fixes (i.e. don't use sampler for uniforms if
> > > discard in use), but this seems like a good fix.  I've appended it
> below
> > > as a patch.  It changes the shader accordingly:
> > >
> > > before    : 0x000000d0: (-f0.1.any4h) halt(8) 17 2
> > > null                            { align1 WE_all 1Q };
> > > after(8)  : 0x000000d0: (-f0.1.any8h) halt(8) 17 2
> > > null                            { align1 WE_all 1Q };
> > > after(16) : 0x00000250: (-f0.1.any16h) halt(16) 17 2
> > > null                            { align1 WE_all 1H };
> >
> > All of this information should go in the commit message.
> >
> > > I attached a test case to the bugzilla entry below.
> >
> > Was the test also sent to the piglit mailing list for inclusion in the
> > the test suite?
> >
> > > Thanks for any review or feedback.  Curious if anyone sees a better
> fix.
> > >
> > > -C
> > >
> > >
> > > From 871671c4ab8ff00e85b434865e8855fc356efa8f Mon Sep 17 00:00:00 2001
> > > From: Cody Northrop <cody@lunarg.com <mailto:cody@lunarg.com>>
> > > Date: Thu, 12 Jun 2014 09:07:18 -0600
> > > Subject: [PATCH] i965/fs: Update discard jump to preserve uniform loads
> via
> > >  sampler.
> > >
> > > The series that implemented this optimization was done before
> > > the changes to use samplers for uniform loads.  Uniform sampler
> > > loads use special execution masks and only populate four
> > > channels, so we can't jump over those or corruption ensues.
> > > Use a more conservative jump mask which only jumps to the end
> > > if all relevant channels are disabled.
> > >
> > > No change was observed in GLbenchmark 2.7, so the optimization
> > > is preserved.
> > >
> > > Signed-off-by: Cody Northrop <cody@lunarg.com <mailto:cody@lunarg.com
> >>
> > > Reviewed-by: Mike Stroyan <mike@lunarg.com <mailto:mike@lunarg.com>>
> > > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=79948
> > > ---
> > >  src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 10 +++++++++-
> > >  1 file changed, 9 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
> > > b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
> > > index 8858852..fe05715 100644
> > > --- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
> > > +++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
> > > @@ -1907,7 +1907,15 @@ fs_visitor::visit(ir_discard *ir)
> > >         */
> > >        fs_inst *discard_jump = emit(FS_OPCODE_DISCARD_JUMP);
> > >        discard_jump->flag_subreg = 1;
> > > -      discard_jump->predicate = BRW_PREDICATE_ALIGN1_ANY4H;
> > > +
> > > +      /* Uniforms are now loaded using samplers with a routine that
> has
> > > +       * its own execution mask, so we can only jump if all relevant
> > > +       * channels are dead.  This is more conservative than the
> previous
> > > +       * four channel checking, but still preserves speedups.
>
> This comment won't make much sense when reading the code, since "the
> previous
> four channel checking" no longer exists and can't be observed anywhere
> other
> than in git history.  Personally, I'd just drop the comment.
>
>
Sure, I'll clean up the comments.


> As far as I can tell, this is just broken and should have always been
> ANY8H/ANY16H.  Good catch, and thanks for fixing it!
>
> > > +       */
> > > +      discard_jump->predicate = (8 == dispatch_width)
>
> Please drop the Yoda-conditional and use (dispatch_width == 8).
>

Since dispatch_width is constant, it can go on the left.  '8' was more
read-only to my eyes.  I try to avoid common typo mistakes (= instead of
==).


>
> With those two changes, this would get:
> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
>
> If you resend a v2 using git-send-email, I'll be happy to push it.
>

Thanks, I'll fix all of this and send again correctly.


>
> > > +                                ? BRW_PREDICATE_ALIGN1_ANY8H
> > > +                                : BRW_PREDICATE_ALIGN1_ANY16H;
> > >        discard_jump->predicate_inverse = true;
> > >     }
> > >  }
>
On Tue, Jun 24, 2014 at 10:29 AM, Kenneth Graunke <kenneth@whitecape.org>
wrote:

> On Tuesday, June 24, 2014 08:17:23 AM Ian Romanick wrote:
> > Send patches with git-send-email and no other method.  Save this message
> > and apply it to origin/master using git-am to see why.
> >
> > Hopefully Matt or Ken can review this...
> >
> > On 06/12/2014 12:35 PM, Cody Northrop wrote:
> > > Commit 17c7ead7 exposed a bug in how uniform loading happens in the
> > > presence of discard.  It manifested itself in an application as
> randomly
> > > incorrect pixels on the borders of conditional areas.
> > >
> > > We believe it is due to how discards jump to the end of the shader
> > > incorrectly for some channels.  The current implementation checks each
> > > 2x2 subspan to preserve derivatives.  When uniform loading via samplers
> > > was turned on, it uses a full execution mask, as stated in
> > > lower_uniform_pull_constant_loads(), and only populates four channels
> of
> > > the destination (see generate_uniform_pull_constant_load_gen7()).  We
> > > think that is happening incorrectly when the first subspan has been
> > > jumped over.
> > >
> > > A possible fix is to only jump to the end of the shader if all relevant
> > > channels are disabled, i.e. all 8 or 16, depending on dispatch.  This
> > > preserves the original speedup noted in commit beafced2.  There are
> > > other more heavyweight fixes (i.e. don't use sampler for uniforms if
> > > discard in use), but this seems like a good fix.  I've appended it
> below
> > > as a patch.  It changes the shader accordingly:
> > >
> > > before    : 0x000000d0: (-f0.1.any4h) halt(8) 17 2
> > > null                            { align1 WE_all 1Q };
> > > after(8)  : 0x000000d0: (-f0.1.any8h) halt(8) 17 2
> > > null                            { align1 WE_all 1Q };
> > > after(16) : 0x00000250: (-f0.1.any16h) halt(16) 17 2
> > > null                            { align1 WE_all 1H };
> >
> > All of this information should go in the commit message.
> >
> > > I attached a test case to the bugzilla entry below.
> >
> > Was the test also sent to the piglit mailing list for inclusion in the
> > the test suite?
>
> Hi Cody,
>
> I noticed that the Piglit test is a hack to an existing SSO test - would
> it be
> possible to create a new Piglit test for this, and submit it to
> piglit@lists.freedesktop.org for inclusion upstream?  It'd be great to
> have a
> regression test in place so we don't break it in the future.
>

Yes, I'll create a different test and submit it for inclusion.


>
> Also, it apparently tries to use GLSL 1.40 in a legacy GL 3.0 context,
> which
> Mesa doesn't support.  I had to hack Mesa to get it to run.
>
>
I just rebased the changes to latest mesa/piglit/waffle masters and the
test runs fine.  :\  Anyway, I agree it has a couple problems and I'll
avoid them in the new version.

Thanks again,
> --Ken
On Tuesday, June 24, 2014 02:11:42 PM Cody Northrop wrote:
[snip]
> > Hi Cody,
> >
> > I noticed that the Piglit test is a hack to an existing SSO test - would
> > it be
> > possible to create a new Piglit test for this, and submit it to
> > piglit@lists.freedesktop.org for inclusion upstream?  It'd be great to
> > have a
> > regression test in place so we don't break it in the future.
> >
> 
> Yes, I'll create a different test and submit it for inclusion.

Great, thanks!

> > Also, it apparently tries to use GLSL 1.40 in a legacy GL 3.0 context,
> > which
> > Mesa doesn't support.  I had to hack Mesa to get it to run.
> >
> I just rebased the changes to latest mesa/piglit/waffle masters and the
> test runs fine.  :\  Anyway, I agree it has a couple problems and I'll
> avoid them in the new version.

Weird - perhaps I was running it wrong.  I just applied the patch and ran

./bin/arb_separate_shader_object-rendezvous_by_location

which complained about std140 being unknown.  *shrug*
On 06/24/2014 01:09 PM, Cody Northrop wrote:
> 
> 
> 
> On Tue, Jun 24, 2014 at 10:23 AM, Kenneth Graunke <kenneth@whitecape.org
> <mailto:kenneth@whitecape.org>> wrote:
> 
>     On Tuesday, June 24, 2014 08:17:23 AM Ian Romanick wrote:
>     > Send patches with git-send-email and no other method.  Save this
>     message
>     > and apply it to origin/master using git-am to see why.
>     >
>     > Hopefully Matt or Ken can review this...
>     >
>     > On 06/12/2014 12:35 PM, Cody Northrop wrote:
>     > > Commit 17c7ead7 exposed a bug in how uniform loading happens in the
>     > > presence of discard.  It manifested itself in an application as
>     randomly
>     > > incorrect pixels on the borders of conditional areas.
>     > >
>     > > We believe it is due to how discards jump to the end of the shader
>     > > incorrectly for some channels.  The current implementation
>     checks each
>     > > 2x2 subspan to preserve derivatives.  When uniform loading via
>     samplers
>     > > was turned on, it uses a full execution mask, as stated in
>     > > lower_uniform_pull_constant_loads(), and only populates four
>     channels of
>     > > the destination (see
>     generate_uniform_pull_constant_load_gen7()).  We
>     > > think that is happening incorrectly when the first subspan has been
>     > > jumped over.
>     > >
>     > > A possible fix is to only jump to the end of the shader if all
>     relevant
>     > > channels are disabled, i.e. all 8 or 16, depending on dispatch.
>      This
>     > > preserves the original speedup noted in commit beafced2.  There are
>     > > other more heavyweight fixes (i.e. don't use sampler for uniforms if
>     > > discard in use), but this seems like a good fix.  I've appended
>     it below
>     > > as a patch.  It changes the shader accordingly:
>     > >
>     > > before    : 0x000000d0: (-f0.1.any4h) halt(8) 17 2
>     > > null                            { align1 WE_all 1Q };
>     > > after(8)  : 0x000000d0: (-f0.1.any8h) halt(8) 17 2
>     > > null                            { align1 WE_all 1Q };
>     > > after(16) : 0x00000250: (-f0.1.any16h) halt(16) 17 2
>     > > null                            { align1 WE_all 1H };
>     >
>     > All of this information should go in the commit message.
>     >
>     > > I attached a test case to the bugzilla entry below.
>     >
>     > Was the test also sent to the piglit mailing list for inclusion in the
>     > the test suite?
>     >
>     > > Thanks for any review or feedback.  Curious if anyone sees a
>     better fix.
>     > >
>     > > -C
>     > >
>     > >
>     > > From 871671c4ab8ff00e85b434865e8855fc356efa8f Mon Sep 17
>     00:00:00 2001
>     > > From: Cody Northrop <cody@lunarg.com <mailto:cody@lunarg.com>
>     <mailto:cody@lunarg.com <mailto:cody@lunarg.com>>>
>     > > Date: Thu, 12 Jun 2014 09:07:18 -0600
>     > > Subject: [PATCH] i965/fs: Update discard jump to preserve
>     uniform loads
>     via
>     > >  sampler.
>     > >
>     > > The series that implemented this optimization was done before
>     > > the changes to use samplers for uniform loads.  Uniform sampler
>     > > loads use special execution masks and only populate four
>     > > channels, so we can't jump over those or corruption ensues.
>     > > Use a more conservative jump mask which only jumps to the end
>     > > if all relevant channels are disabled.
>     > >
>     > > No change was observed in GLbenchmark 2.7, so the optimization
>     > > is preserved.
>     > >
>     > > Signed-off-by: Cody Northrop <cody@lunarg.com
>     <mailto:cody@lunarg.com> <mailto:cody@lunarg.com
>     <mailto:cody@lunarg.com>>>
>     > > Reviewed-by: Mike Stroyan <mike@lunarg.com
>     <mailto:mike@lunarg.com> <mailto:mike@lunarg.com
>     <mailto:mike@lunarg.com>>>
>     > > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=79948
>     > > ---
>     > >  src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 10 +++++++++-
>     > >  1 file changed, 9 insertions(+), 1 deletion(-)
>     > >
>     > > diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
>     > > b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
>     > > index 8858852..fe05715 100644
>     > > --- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
>     > > +++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
>     > > @@ -1907,7 +1907,15 @@ fs_visitor::visit(ir_discard *ir)
>     > >         */
>     > >        fs_inst *discard_jump = emit(FS_OPCODE_DISCARD_JUMP);
>     > >        discard_jump->flag_subreg = 1;
>     > > -      discard_jump->predicate = BRW_PREDICATE_ALIGN1_ANY4H;
>     > > +
>     > > +      /* Uniforms are now loaded using samplers with a routine
>     that has
>     > > +       * its own execution mask, so we can only jump if all
>     relevant
>     > > +       * channels are dead.  This is more conservative than the
>     previous
>     > > +       * four channel checking, but still preserves speedups.
> 
>     This comment won't make much sense when reading the code, since "the
>     previous
>     four channel checking" no longer exists and can't be observed
>     anywhere other
>     than in git history.  Personally, I'd just drop the comment.
> 
> 
> Sure, I'll clean up the comments.
>  
> 
>     As far as I can tell, this is just broken and should have always been
>     ANY8H/ANY16H.  Good catch, and thanks for fixing it!
> 
>     > > +       */
>     > > +      discard_jump->predicate = (8 == dispatch_width)
> 
>     Please drop the Yoda-conditional and use (dispatch_width == 8).
> 
> 
> Since dispatch_width is constant, it can go on the left.  '8' was more
> read-only to my eyes.  I try to avoid common typo mistakes (= instead of
> ==).

Please follow the coding practices of the project.  We do not use "Yoda
conditionals."  The common typo mistake is so well known that the
compiler emits a warning if you use it.

>     With those two changes, this would get:
>     Reviewed-by: Kenneth Graunke <kenneth@whitecape.org
>     <mailto:kenneth@whitecape.org>>
> 
>     If you resend a v2 using git-send-email, I'll be happy to push it.
> 
> 
> Thanks, I'll fix all of this and send again correctly.
>  
> 
> 
>     > > +                                ? BRW_PREDICATE_ALIGN1_ANY8H
>     > > +                                : BRW_PREDICATE_ALIGN1_ANY16H;
>     > >        discard_jump->predicate_inverse = true;
>     > >     }
>     > >  }
> 
> -- 
>  Cody Northrop
>  Graphics Software Engineer
>  LunarG, Inc.- 3D Driver Innovations
>  Email: cody@lunarg.com <mailto:cody@lunarg.com>
>  Website: http://www.lunarg.com <http://www.lunarg.com/>