[Mesa-dev,2/3] i965: Add runtime checks for line antialiasing in Gen < 6.

Submitted by Iago Toral Quiroga on May 27, 2014, 10:50 a.m.

Details

Message ID 1401187816-14168-3-git-send-email-itoral@igalia.com
State Superseded
Headers show

Not browsing as part of any series.

Commit Message

Iago Toral Quiroga May 27, 2014, 10:50 a.m.
In Gen < 6 the hardware generates a runtime bit that indicates whether AA data
has to be sent as part of the framebuffer write SEND message. This affects the
specific case where we have setup antialiased line rendering and we render
polygons which have one face setup in GL_LINE mode (line antialiasing
will be used) and the other one in GL_FILL mode (no line antialiasing needed).

Currently we are not doing this runtime test and instead we always send AA
data, which produces incorrect rendering of the GL_FILL face of the polygon in
in the aforementioned scenario (verified in ironlake and gm45).

In Gen4 this is, likely, a regression introduced with commit 098acf6c843. In
Gen5 this has never worked properly. Gen > 5 are not affected by this.

The patch fixes the problem by adding the appropriate runtime check and
adjusting the framebuffer write message accordingly in the conflictive
scenario (detected with fs_visitor::runtime_check_aads_emit == TRUE).

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=78679
---
 src/mesa/drivers/dri/i965/brw_fs.h           |  4 ++
 src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 86 +++++++++++++++++-----------
 2 files changed, 58 insertions(+), 32 deletions(-)

Patch hide | download patch | download mbox

diff --git a/src/mesa/drivers/dri/i965/brw_fs.h b/src/mesa/drivers/dri/i965/brw_fs.h
index 60a4906..ab8912f 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.h
+++ b/src/mesa/drivers/dri/i965/brw_fs.h
@@ -452,6 +452,10 @@  public:
 
    void emit_color_write(int target, int index, int first_color_mrf);
    void emit_alpha_test();
+   void do_emit_fb_write(int target, int base_mrf, int mlen, bool eot,
+                         bool header_present);
+   void emit_fb_write(int target, int base_mrf, int mlen, bool eot,
+                      bool header_present);
    void emit_fb_writes();
 
    void emit_shader_time_begin();
diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
index 171f063..4c3897b 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
@@ -2731,6 +2731,54 @@  fs_visitor::emit_alpha_test()
 }
 
 void
+fs_visitor::do_emit_fb_write(int target, int base_mrf, int mlen, bool eot,
+                             bool header_present)
+{
+   fs_inst *inst = emit(FS_OPCODE_FB_WRITE);
+   inst->target = target;
+   inst->base_mrf = base_mrf;
+   inst->mlen = mlen;
+   inst->eot = eot;
+   inst->header_present = header_present;
+   if ((brw->gen >= 8 || brw->is_haswell) && fp->UsesKill) {
+      inst->predicate = BRW_PREDICATE_NORMAL;
+      inst->flag_subreg = 1;
+   }
+}
+
+void
+fs_visitor::emit_fb_write(int target, int base_mrf, int mlen, bool eot,
+                          bool header_present)
+{
+   if (!runtime_check_aads_emit) {
+      do_emit_fb_write(target, base_mrf, mlen, eot, header_present);
+   } else {
+      /* This can only happen in Gen < 6
+       */
+      fs_reg reg_tmp_ud = fs_reg(this, glsl_type::uint_type);
+      emit(AND(reg_tmp_ud,
+               fs_reg(get_element_ud(brw_vec8_grf(1,0), 6)),
+               fs_reg(brw_imm_ud(1<<26))));
+      emit(CMP(reg_null_ud,
+               reg_tmp_ud,
+               fs_reg(brw_imm_ud(0)),
+               BRW_CONDITIONAL_Z));
+      emit(IF(BRW_PREDICATE_NORMAL));
+      {
+         /* Shift message header one register since we are not sending
+          * AA data stored in base_mrf+2
+          */
+         do_emit_fb_write(target, base_mrf + 1, mlen - 1, eot, header_present);
+      }
+      emit(BRW_OPCODE_ELSE);
+      {
+         do_emit_fb_write(target, base_mrf, mlen, eot, header_present);
+      }
+      emit(BRW_OPCODE_ENDIF);
+   }
+}
+
+void
 fs_visitor::emit_fb_writes()
 {
    this->current_annotation = "FB write header";
@@ -2848,16 +2896,7 @@  fs_visitor::emit_fb_writes()
       if (INTEL_DEBUG & DEBUG_SHADER_TIME)
          emit_shader_time_end();
 
-      fs_inst *inst = emit(FS_OPCODE_FB_WRITE);
-      inst->target = 0;
-      inst->base_mrf = base_mrf;
-      inst->mlen = nr - base_mrf;
-      inst->eot = true;
-      inst->header_present = header_present;
-      if ((brw->gen >= 8 || brw->is_haswell) && fp->UsesKill) {
-         inst->predicate = BRW_PREDICATE_NORMAL;
-         inst->flag_subreg = 1;
-      }
+      emit_fb_write(0, base_mrf, nr - base_mrf, true, header_present);
 
       prog_data->dual_src_blend = true;
       this->current_annotation = NULL;
@@ -2894,19 +2933,10 @@  fs_visitor::emit_fb_writes()
             emit_shader_time_end();
       }
 
-      fs_inst *inst = emit(FS_OPCODE_FB_WRITE);
-      inst->target = target;
-      inst->base_mrf = base_mrf;
-      if (src0_alpha_to_render_target && target == 0)
-         inst->mlen = nr - base_mrf - reg_width;
-      else
-         inst->mlen = nr - base_mrf;
-      inst->eot = eot;
-      inst->header_present = header_present;
-      if ((brw->gen >= 8 || brw->is_haswell) && fp->UsesKill) {
-         inst->predicate = BRW_PREDICATE_NORMAL;
-         inst->flag_subreg = 1;
-      }
+      int mlen = (src0_alpha_to_render_target && target == 0) ?
+         nr - base_mrf - reg_width : nr - base_mrf;
+
+      emit_fb_write(target, base_mrf, mlen, eot, header_present);
    }
 
    if (key->nr_color_regions == 0) {
@@ -2919,15 +2949,7 @@  fs_visitor::emit_fb_writes()
       if (INTEL_DEBUG & DEBUG_SHADER_TIME)
          emit_shader_time_end();
 
-      fs_inst *inst = emit(FS_OPCODE_FB_WRITE);
-      inst->base_mrf = base_mrf;
-      inst->mlen = nr - base_mrf;
-      inst->eot = true;
-      inst->header_present = header_present;
-      if ((brw->gen >= 8 || brw->is_haswell) && fp->UsesKill) {
-         inst->predicate = BRW_PREDICATE_NORMAL;
-         inst->flag_subreg = 1;
-      }
+      emit_fb_write(0, base_mrf, nr - base_mrf, true, header_present);
    }
 
    this->current_annotation = NULL;

Comments

On 05/27/2014 03:50 AM, Iago Toral Quiroga wrote:
> In Gen < 6 the hardware generates a runtime bit that indicates whether AA data
> has to be sent as part of the framebuffer write SEND message. This affects the
> specific case where we have setup antialiased line rendering and we render
> polygons which have one face setup in GL_LINE mode (line antialiasing
> will be used) and the other one in GL_FILL mode (no line antialiasing needed).
> 
> Currently we are not doing this runtime test and instead we always send AA
> data, which produces incorrect rendering of the GL_FILL face of the polygon in
> in the aforementioned scenario (verified in ironlake and gm45).
> 
> In Gen4 this is, likely, a regression introduced with commit 098acf6c843. In
> Gen5 this has never worked properly. Gen > 5 are not affected by this.
> 
> The patch fixes the problem by adding the appropriate runtime check and
> adjusting the framebuffer write message accordingly in the conflictive
> scenario (detected with fs_visitor::runtime_check_aads_emit == TRUE).
> 
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=78679
> ---
>  src/mesa/drivers/dri/i965/brw_fs.h           |  4 ++
>  src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 86 +++++++++++++++++-----------
>  2 files changed, 58 insertions(+), 32 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_fs.h b/src/mesa/drivers/dri/i965/brw_fs.h
> index 60a4906..ab8912f 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs.h
> +++ b/src/mesa/drivers/dri/i965/brw_fs.h
> @@ -452,6 +452,10 @@ public:
>  
>     void emit_color_write(int target, int index, int first_color_mrf);
>     void emit_alpha_test();
> +   void do_emit_fb_write(int target, int base_mrf, int mlen, bool eot,
> +                         bool header_present);
> +   void emit_fb_write(int target, int base_mrf, int mlen, bool eot,
> +                      bool header_present);
>     void emit_fb_writes();
>  
>     void emit_shader_time_begin();
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
> index 171f063..4c3897b 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
> @@ -2731,6 +2731,54 @@ fs_visitor::emit_alpha_test()
>  }
>  
>  void
> +fs_visitor::do_emit_fb_write(int target, int base_mrf, int mlen, bool eot,
> +                             bool header_present)
> +{
> +   fs_inst *inst = emit(FS_OPCODE_FB_WRITE);
> +   inst->target = target;
> +   inst->base_mrf = base_mrf;
> +   inst->mlen = mlen;
> +   inst->eot = eot;
> +   inst->header_present = header_present;
> +   if ((brw->gen >= 8 || brw->is_haswell) && fp->UsesKill) {
> +      inst->predicate = BRW_PREDICATE_NORMAL;
> +      inst->flag_subreg = 1;
> +   }
> +}
> +
> +void
> +fs_visitor::emit_fb_write(int target, int base_mrf, int mlen, bool eot,
> +                          bool header_present)
> +{
> +   if (!runtime_check_aads_emit) {
> +      do_emit_fb_write(target, base_mrf, mlen, eot, header_present);
> +   } else {
> +      /* This can only happen in Gen < 6
> +       */
> +      fs_reg reg_tmp_ud = fs_reg(this, glsl_type::uint_type);
> +      emit(AND(reg_tmp_ud,
> +               fs_reg(get_element_ud(brw_vec8_grf(1,0), 6)),

I think

    retype(brw_vec1_grf(1, 6), BRW_REGISTER_TYPE_UD)

might be a little clearer than:

    get_element_ud(brw_vec8_grf(1,0), 6))

since it just refers to r1.6 right away, rather than r1.0 modified to
have a suboffset of 6.

> +               fs_reg(brw_imm_ud(1<<26))));
> +      emit(CMP(reg_null_ud,
> +               reg_tmp_ud,
> +               fs_reg(brw_imm_ud(0)),
> +               BRW_CONDITIONAL_Z));

You can actually generate a flag condition directly from the AND
instruction, and eliminate the CMP:

fs_inst *inst =
   emit(AND(reg_null_ud,
            fs_reg(retype(brw_vec1_grf(1, 6), BRW_REGISTER_TYPE_UD),
            fs_reg(0)));
inst->conditional_mod = BRW_CONDITIONAL_Z;

(you might have to use vec1(retype(brw_null_reg), BRW_REGISTER_TYPE_UD),
rather than reg_null_ud.)

> +      emit(IF(BRW_PREDICATE_NORMAL));
> +      {
> +         /* Shift message header one register since we are not sending
> +          * AA data stored in base_mrf+2
> +          */
> +         do_emit_fb_write(target, base_mrf + 1, mlen - 1, eot, header_present);
> +      }
> +      emit(BRW_OPCODE_ELSE);
> +      {
> +         do_emit_fb_write(target, base_mrf, mlen, eot, header_present);
> +      }
> +      emit(BRW_OPCODE_ENDIF);

I suppose this probably works, but I've never seen a program end with an
ENDIF.  I'm not really comfortable with doing that, since the ENDIF
should never be executed.  With JMPI or ADD brw_ip_reg(), we'd just jump
over one instruction and then execute one FB write or the other, which
seems really straightforward.

But, looking at the bug, I see you tried both of those suggestions, and
it didn't work for some reason.  Huh.  It really should...maybe I can
look into why...

> +   }
> +}
> +
> +void
>  fs_visitor::emit_fb_writes()
>  {
>     this->current_annotation = "FB write header";
> @@ -2848,16 +2896,7 @@ fs_visitor::emit_fb_writes()
>        if (INTEL_DEBUG & DEBUG_SHADER_TIME)
>           emit_shader_time_end();
>  
> -      fs_inst *inst = emit(FS_OPCODE_FB_WRITE);
> -      inst->target = 0;
> -      inst->base_mrf = base_mrf;
> -      inst->mlen = nr - base_mrf;
> -      inst->eot = true;
> -      inst->header_present = header_present;
> -      if ((brw->gen >= 8 || brw->is_haswell) && fp->UsesKill) {
> -         inst->predicate = BRW_PREDICATE_NORMAL;
> -         inst->flag_subreg = 1;
> -      }
> +      emit_fb_write(0, base_mrf, nr - base_mrf, true, header_present);
>  
>        prog_data->dual_src_blend = true;
>        this->current_annotation = NULL;
> @@ -2894,19 +2933,10 @@ fs_visitor::emit_fb_writes()
>              emit_shader_time_end();
>        }
>  
> -      fs_inst *inst = emit(FS_OPCODE_FB_WRITE);
> -      inst->target = target;
> -      inst->base_mrf = base_mrf;
> -      if (src0_alpha_to_render_target && target == 0)
> -         inst->mlen = nr - base_mrf - reg_width;
> -      else
> -         inst->mlen = nr - base_mrf;
> -      inst->eot = eot;
> -      inst->header_present = header_present;
> -      if ((brw->gen >= 8 || brw->is_haswell) && fp->UsesKill) {
> -         inst->predicate = BRW_PREDICATE_NORMAL;
> -         inst->flag_subreg = 1;
> -      }
> +      int mlen = (src0_alpha_to_render_target && target == 0) ?
> +         nr - base_mrf - reg_width : nr - base_mrf;

Might be a bit easier to read as:

int mlen = nr - base_mrf;
if (src0_alpha_to_render_target && target == 0)
   mlen -= reg_width;

But either way is fine.

> +
> +      emit_fb_write(target, base_mrf, mlen, eot, header_present);
>     }
>  
>     if (key->nr_color_regions == 0) {
> @@ -2919,15 +2949,7 @@ fs_visitor::emit_fb_writes()
>        if (INTEL_DEBUG & DEBUG_SHADER_TIME)
>           emit_shader_time_end();
>  
> -      fs_inst *inst = emit(FS_OPCODE_FB_WRITE);
> -      inst->base_mrf = base_mrf;
> -      inst->mlen = nr - base_mrf;
> -      inst->eot = true;
> -      inst->header_present = header_present;
> -      if ((brw->gen >= 8 || brw->is_haswell) && fp->UsesKill) {
> -         inst->predicate = BRW_PREDICATE_NORMAL;
> -         inst->flag_subreg = 1;
> -      }
> +      emit_fb_write(0, base_mrf, nr - base_mrf, true, header_present);
>     }
>  
>     this->current_annotation = NULL;
>
Hi Kenneth,

On Wed, 2014-05-28 at 12:42 -0700, Kenneth Graunke wrote:
> On 05/27/2014 03:50 AM, Iago Toral Quiroga wrote:
> > In Gen < 6 the hardware generates a runtime bit that indicates whether AA data
> > has to be sent as part of the framebuffer write SEND message. This affects the
> > specific case where we have setup antialiased line rendering and we render
> > polygons which have one face setup in GL_LINE mode (line antialiasing
> > will be used) and the other one in GL_FILL mode (no line antialiasing needed).
> > 
> > Currently we are not doing this runtime test and instead we always send AA
> > data, which produces incorrect rendering of the GL_FILL face of the polygon in
> > in the aforementioned scenario (verified in ironlake and gm45).
> > 
> > In Gen4 this is, likely, a regression introduced with commit 098acf6c843. In
> > Gen5 this has never worked properly. Gen > 5 are not affected by this.
> > 
> > The patch fixes the problem by adding the appropriate runtime check and
> > adjusting the framebuffer write message accordingly in the conflictive
> > scenario (detected with fs_visitor::runtime_check_aads_emit == TRUE).
> > 
> > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=78679
> > ---
> >  src/mesa/drivers/dri/i965/brw_fs.h           |  4 ++
> >  src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 86 +++++++++++++++++-----------
> >  2 files changed, 58 insertions(+), 32 deletions(-)
> > 
> > diff --git a/src/mesa/drivers/dri/i965/brw_fs.h b/src/mesa/drivers/dri/i965/brw_fs.h
> > index 60a4906..ab8912f 100644
> > --- a/src/mesa/drivers/dri/i965/brw_fs.h
> > +++ b/src/mesa/drivers/dri/i965/brw_fs.h
> > @@ -452,6 +452,10 @@ public:
> >  
> >     void emit_color_write(int target, int index, int first_color_mrf);
> >     void emit_alpha_test();
> > +   void do_emit_fb_write(int target, int base_mrf, int mlen, bool eot,
> > +                         bool header_present);
> > +   void emit_fb_write(int target, int base_mrf, int mlen, bool eot,
> > +                      bool header_present);
> >     void emit_fb_writes();
> >  
> >     void emit_shader_time_begin();
> > diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
> > index 171f063..4c3897b 100644
> > --- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
> > +++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
> > @@ -2731,6 +2731,54 @@ fs_visitor::emit_alpha_test()
> >  }
> >  
> >  void
> > +fs_visitor::do_emit_fb_write(int target, int base_mrf, int mlen, bool eot,
> > +                             bool header_present)
> > +{
> > +   fs_inst *inst = emit(FS_OPCODE_FB_WRITE);
> > +   inst->target = target;
> > +   inst->base_mrf = base_mrf;
> > +   inst->mlen = mlen;
> > +   inst->eot = eot;
> > +   inst->header_present = header_present;
> > +   if ((brw->gen >= 8 || brw->is_haswell) && fp->UsesKill) {
> > +      inst->predicate = BRW_PREDICATE_NORMAL;
> > +      inst->flag_subreg = 1;
> > +   }
> > +}
> > +
> > +void
> > +fs_visitor::emit_fb_write(int target, int base_mrf, int mlen, bool eot,
> > +                          bool header_present)
> > +{
> > +   if (!runtime_check_aads_emit) {
> > +      do_emit_fb_write(target, base_mrf, mlen, eot, header_present);
> > +   } else {
> > +      /* This can only happen in Gen < 6
> > +       */
> > +      fs_reg reg_tmp_ud = fs_reg(this, glsl_type::uint_type);
> > +      emit(AND(reg_tmp_ud,
> > +               fs_reg(get_element_ud(brw_vec8_grf(1,0), 6)),
> 
> I think
> 
>     retype(brw_vec1_grf(1, 6), BRW_REGISTER_TYPE_UD)
> 
> might be a little clearer than:
> 
>     get_element_ud(brw_vec8_grf(1,0), 6))
> 
> since it just refers to r1.6 right away, rather than r1.0 modified to
> have a suboffset of 6.

Sure, that looks better.

> > +               fs_reg(brw_imm_ud(1<<26))));
> > +      emit(CMP(reg_null_ud,
> > +               reg_tmp_ud,
> > +               fs_reg(brw_imm_ud(0)),
> > +               BRW_CONDITIONAL_Z));
> 
> You can actually generate a flag condition directly from the AND
> instruction, and eliminate the CMP:
> 
> fs_inst *inst =
>    emit(AND(reg_null_ud,
>             fs_reg(retype(brw_vec1_grf(1, 6), BRW_REGISTER_TYPE_UD),
>             fs_reg(0)));
> inst->conditional_mod = BRW_CONDITIONAL_Z;
> 
> (you might have to use vec1(retype(brw_null_reg), BRW_REGISTER_TYPE_UD),
> rather than reg_null_ud.)

Oh, that's much nicer! We also get rid of the temporary register with
this.

I think you messed up the parameters to the AND though. I believe this
is what you meant:

fs_inst *inst =
    emit(AND(vec1(retype(brw_null_reg(), BRW_REGISTER_TYPE_UD)),
             fs_reg(retype(brw_vec1_grf(1, 6), BRW_REGISTER_TYPE_UD)),
             fs_reg(brw_imm_ud(1<<26))));
inst->conditional_mod = BRW_CONDITIONAL_Z;

> > +      emit(IF(BRW_PREDICATE_NORMAL));
> > +      {
> > +         /* Shift message header one register since we are not sending
> > +          * AA data stored in base_mrf+2
> > +          */
> > +         do_emit_fb_write(target, base_mrf + 1, mlen - 1, eot, header_present);
> > +      }
> > +      emit(BRW_OPCODE_ELSE);
> > +      {
> > +         do_emit_fb_write(target, base_mrf, mlen, eot, header_present);
> > +      }
> > +      emit(BRW_OPCODE_ENDIF);
> 
> I suppose this probably works, but I've never seen a program end with an
> ENDIF.  I'm not really comfortable with doing that, since the ENDIF
> should never be executed.  With JMPI or ADD brw_ip_reg(), we'd just jump
> over one instruction and then execute one FB write or the other, which
> seems really straightforward.
> 
> But, looking at the bug, I see you tried both of those suggestions, and
> it didn't work for some reason.  Huh.  It really should...maybe I can
> look into why...

Right, I don't understand why that code did not work for me either. More
over, the code did not work as I expected in other ways, for example I
noticed that reverting the condition and the order of the fb_writes
(which should be completely the same thing) changed the behavior too...
I remember that it produced GPU hangs but I tried so many versions that
I can't tell... it felt as if the two fb_writes were being executed.  I
tried all kinds of things to make it work as intended, even including a
manual JMPI at the end of the IF branch to force the jump over the
second fb_write... but nothing.

I commented about this and included the C code and the generated
assembly for the JMPI version hoping that someone could tell if
something was off here:

http://lists.freedesktop.org/archives/mesa-dev/2014-May/059985.html

Maybe you can have a quick look at that see if there is something
obvious that I am missing...

Also, when I was trying the JMPI version I was developing the patches in
the generator rather than the visitor. I suppose that should not make
any difference though.

One last thing, JMPI was recently made private to the sf module, so if
we want to do go with it we need to revert f3cb2e6ed705.

> > +   }
> > +}
> > +
> > +void
> >  fs_visitor::emit_fb_writes()
> >  {
> >     this->current_annotation = "FB write header";
> > @@ -2848,16 +2896,7 @@ fs_visitor::emit_fb_writes()
> >        if (INTEL_DEBUG & DEBUG_SHADER_TIME)
> >           emit_shader_time_end();
> >  
> > -      fs_inst *inst = emit(FS_OPCODE_FB_WRITE);
> > -      inst->target = 0;
> > -      inst->base_mrf = base_mrf;
> > -      inst->mlen = nr - base_mrf;
> > -      inst->eot = true;
> > -      inst->header_present = header_present;
> > -      if ((brw->gen >= 8 || brw->is_haswell) && fp->UsesKill) {
> > -         inst->predicate = BRW_PREDICATE_NORMAL;
> > -         inst->flag_subreg = 1;
> > -      }
> > +      emit_fb_write(0, base_mrf, nr - base_mrf, true, header_present);
> >  
> >        prog_data->dual_src_blend = true;
> >        this->current_annotation = NULL;
> > @@ -2894,19 +2933,10 @@ fs_visitor::emit_fb_writes()
> >              emit_shader_time_end();
> >        }
> >  
> > -      fs_inst *inst = emit(FS_OPCODE_FB_WRITE);
> > -      inst->target = target;
> > -      inst->base_mrf = base_mrf;
> > -      if (src0_alpha_to_render_target && target == 0)
> > -         inst->mlen = nr - base_mrf - reg_width;
> > -      else
> > -         inst->mlen = nr - base_mrf;
> > -      inst->eot = eot;
> > -      inst->header_present = header_present;
> > -      if ((brw->gen >= 8 || brw->is_haswell) && fp->UsesKill) {
> > -         inst->predicate = BRW_PREDICATE_NORMAL;
> > -         inst->flag_subreg = 1;
> > -      }
> > +      int mlen = (src0_alpha_to_render_target && target == 0) ?
> > +         nr - base_mrf - reg_width : nr - base_mrf;
> 
> Might be a bit easier to read as:
> 
> int mlen = nr - base_mrf;
> if (src0_alpha_to_render_target && target == 0)
>    mlen -= reg_width;
> 
> But either way is fine.

Yeah, I think this looks better too.

> > +
> > +      emit_fb_write(target, base_mrf, mlen, eot, header_present);
> >     }
> >  
> >     if (key->nr_color_regions == 0) {
> > @@ -2919,15 +2949,7 @@ fs_visitor::emit_fb_writes()
> >        if (INTEL_DEBUG & DEBUG_SHADER_TIME)
> >           emit_shader_time_end();
> >  
> > -      fs_inst *inst = emit(FS_OPCODE_FB_WRITE);
> > -      inst->base_mrf = base_mrf;
> > -      inst->mlen = nr - base_mrf;
> > -      inst->eot = true;
> > -      inst->header_present = header_present;
> > -      if ((brw->gen >= 8 || brw->is_haswell) && fp->UsesKill) {
> > -         inst->predicate = BRW_PREDICATE_NORMAL;
> > -         inst->flag_subreg = 1;
> > -      }
> > +      emit_fb_write(0, base_mrf, nr - base_mrf, true, header_present);
> >     }
> >  
> >     this->current_annotation = NULL;
> > 
> 
>
On 05/29/2014 01:21 AM, Iago Toral wrote:
> Hi Kenneth,
> 
> On Wed, 2014-05-28 at 12:42 -0700, Kenneth Graunke wrote:
>> On 05/27/2014 03:50 AM, Iago Toral Quiroga wrote:
>>> In Gen < 6 the hardware generates a runtime bit that indicates whether AA data
>>> has to be sent as part of the framebuffer write SEND message. This affects the
>>> specific case where we have setup antialiased line rendering and we render
>>> polygons which have one face setup in GL_LINE mode (line antialiasing
>>> will be used) and the other one in GL_FILL mode (no line antialiasing needed).
>>>
>>> Currently we are not doing this runtime test and instead we always send AA
>>> data, which produces incorrect rendering of the GL_FILL face of the polygon in
>>> in the aforementioned scenario (verified in ironlake and gm45).
>>>
>>> In Gen4 this is, likely, a regression introduced with commit 098acf6c843. In
>>> Gen5 this has never worked properly. Gen > 5 are not affected by this.
>>>
>>> The patch fixes the problem by adding the appropriate runtime check and
>>> adjusting the framebuffer write message accordingly in the conflictive
>>> scenario (detected with fs_visitor::runtime_check_aads_emit == TRUE).
>>>
>>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=78679
>>> ---
>>>  src/mesa/drivers/dri/i965/brw_fs.h           |  4 ++
>>>  src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 86 +++++++++++++++++-----------
>>>  2 files changed, 58 insertions(+), 32 deletions(-)
>>>
>>> diff --git a/src/mesa/drivers/dri/i965/brw_fs.h b/src/mesa/drivers/dri/i965/brw_fs.h
>>> index 60a4906..ab8912f 100644
>>> --- a/src/mesa/drivers/dri/i965/brw_fs.h
>>> +++ b/src/mesa/drivers/dri/i965/brw_fs.h
>>> @@ -452,6 +452,10 @@ public:
>>>  
>>>     void emit_color_write(int target, int index, int first_color_mrf);
>>>     void emit_alpha_test();
>>> +   void do_emit_fb_write(int target, int base_mrf, int mlen, bool eot,
>>> +                         bool header_present);
>>> +   void emit_fb_write(int target, int base_mrf, int mlen, bool eot,
>>> +                      bool header_present);
>>>     void emit_fb_writes();
>>>  
>>>     void emit_shader_time_begin();
>>> diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
>>> index 171f063..4c3897b 100644
>>> --- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
>>> +++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
>>> @@ -2731,6 +2731,54 @@ fs_visitor::emit_alpha_test()
>>>  }
>>>  
>>>  void
>>> +fs_visitor::do_emit_fb_write(int target, int base_mrf, int mlen, bool eot,
>>> +                             bool header_present)
>>> +{
>>> +   fs_inst *inst = emit(FS_OPCODE_FB_WRITE);
>>> +   inst->target = target;
>>> +   inst->base_mrf = base_mrf;
>>> +   inst->mlen = mlen;
>>> +   inst->eot = eot;
>>> +   inst->header_present = header_present;
>>> +   if ((brw->gen >= 8 || brw->is_haswell) && fp->UsesKill) {
>>> +      inst->predicate = BRW_PREDICATE_NORMAL;
>>> +      inst->flag_subreg = 1;
>>> +   }
>>> +}
>>> +
>>> +void
>>> +fs_visitor::emit_fb_write(int target, int base_mrf, int mlen, bool eot,
>>> +                          bool header_present)
>>> +{
>>> +   if (!runtime_check_aads_emit) {
>>> +      do_emit_fb_write(target, base_mrf, mlen, eot, header_present);
>>> +   } else {
>>> +      /* This can only happen in Gen < 6
>>> +       */
>>> +      fs_reg reg_tmp_ud = fs_reg(this, glsl_type::uint_type);
>>> +      emit(AND(reg_tmp_ud,
>>> +               fs_reg(get_element_ud(brw_vec8_grf(1,0), 6)),
>>
>> I think
>>
>>     retype(brw_vec1_grf(1, 6), BRW_REGISTER_TYPE_UD)
>>
>> might be a little clearer than:
>>
>>     get_element_ud(brw_vec8_grf(1,0), 6))
>>
>> since it just refers to r1.6 right away, rather than r1.0 modified to
>> have a suboffset of 6.
> 
> Sure, that looks better.
> 
>>> +               fs_reg(brw_imm_ud(1<<26))));
>>> +      emit(CMP(reg_null_ud,
>>> +               reg_tmp_ud,
>>> +               fs_reg(brw_imm_ud(0)),
>>> +               BRW_CONDITIONAL_Z));
>>
>> You can actually generate a flag condition directly from the AND
>> instruction, and eliminate the CMP:
>>
>> fs_inst *inst =
>>    emit(AND(reg_null_ud,
>>             fs_reg(retype(brw_vec1_grf(1, 6), BRW_REGISTER_TYPE_UD),
>>             fs_reg(0)));
>> inst->conditional_mod = BRW_CONDITIONAL_Z;
>>
>> (you might have to use vec1(retype(brw_null_reg), BRW_REGISTER_TYPE_UD),
>> rather than reg_null_ud.)
> 
> Oh, that's much nicer! We also get rid of the temporary register with
> this.
> 
> I think you messed up the parameters to the AND though. I believe this
> is what you meant:
> 
> fs_inst *inst =
>     emit(AND(vec1(retype(brw_null_reg(), BRW_REGISTER_TYPE_UD)),
>              fs_reg(retype(brw_vec1_grf(1, 6), BRW_REGISTER_TYPE_UD)),
>              fs_reg(brw_imm_ud(1<<26))));
> inst->conditional_mod = BRW_CONDITIONAL_Z;

Whoops.  Yeah, that's what I meant :)

>>> +      emit(IF(BRW_PREDICATE_NORMAL));
>>> +      {
>>> +         /* Shift message header one register since we are not sending
>>> +          * AA data stored in base_mrf+2
>>> +          */
>>> +         do_emit_fb_write(target, base_mrf + 1, mlen - 1, eot, header_present);
>>> +      }
>>> +      emit(BRW_OPCODE_ELSE);
>>> +      {
>>> +         do_emit_fb_write(target, base_mrf, mlen, eot, header_present);
>>> +      }
>>> +      emit(BRW_OPCODE_ENDIF);
>>
>> I suppose this probably works, but I've never seen a program end with an
>> ENDIF.  I'm not really comfortable with doing that, since the ENDIF
>> should never be executed.  With JMPI or ADD brw_ip_reg(), we'd just jump
>> over one instruction and then execute one FB write or the other, which
>> seems really straightforward.
>>
>> But, looking at the bug, I see you tried both of those suggestions, and
>> it didn't work for some reason.  Huh.  It really should...maybe I can
>> look into why...
> 
> Right, I don't understand why that code did not work for me either. More
> over, the code did not work as I expected in other ways, for example I
> noticed that reverting the condition and the order of the fb_writes
> (which should be completely the same thing) changed the behavior too...
> I remember that it produced GPU hangs but I tried so many versions that
> I can't tell... it felt as if the two fb_writes were being executed.  I
> tried all kinds of things to make it work as intended, even including a
> manual JMPI at the end of the IF branch to force the jump over the
> second fb_write... but nothing.
> 
> I commented about this and included the C code and the generated
> assembly for the JMPI version hoping that someone could tell if
> something was off here:
> 
> http://lists.freedesktop.org/archives/mesa-dev/2014-May/059985.html
> 
> Maybe you can have a quick look at that see if there is something
> obvious that I am missing...

Yeah, I'll try and take a look...

> Also, when I was trying the JMPI version I was developing the patches in
> the generator rather than the visitor. I suppose that should not make
> any difference though.
> 
> One last thing, JMPI was recently made private to the sf module, so if
> we want to do go with it we need to revert f3cb2e6ed705.

It wasn't, actually - brw_JMPI is still available in brw_eu_emit.c.
It's only brw_land_fwd_jump that moved to brw_sf_state.c, but all that
does is set the jump distance on the JMPI instruction.  It seems like
you know the jump distance when emitting the JMPI (jump over 1 FB write
and maybe some MOVs) so you could just set it straightaway, and skip
that function.  Or you could revert it - that'd be fine too.
On Thu, 2014-05-29 at 08:31 -0700, Kenneth Graunke wrote:
(...)
> >>> +      emit(IF(BRW_PREDICATE_NORMAL));
> >>> +      {
> >>> +         /* Shift message header one register since we are not sending
> >>> +          * AA data stored in base_mrf+2
> >>> +          */
> >>> +         do_emit_fb_write(target, base_mrf + 1, mlen - 1, eot, header_present);
> >>> +      }
> >>> +      emit(BRW_OPCODE_ELSE);
> >>> +      {
> >>> +         do_emit_fb_write(target, base_mrf, mlen, eot, header_present);
> >>> +      }
> >>> +      emit(BRW_OPCODE_ENDIF);
> >>
> >> I suppose this probably works, but I've never seen a program end with an
> >> ENDIF.  I'm not really comfortable with doing that, since the ENDIF
> >> should never be executed.  With JMPI or ADD brw_ip_reg(), we'd just jump
> >> over one instruction and then execute one FB write or the other, which
> >> seems really straightforward.
> >>
> >> But, looking at the bug, I see you tried both of those suggestions, and
> >> it didn't work for some reason.  Huh.  It really should...maybe I can
> >> look into why...
> > 
> > Right, I don't understand why that code did not work for me either. More
> > over, the code did not work as I expected in other ways, for example I
> > noticed that reverting the condition and the order of the fb_writes
> > (which should be completely the same thing) changed the behavior too...
> > I remember that it produced GPU hangs but I tried so many versions that
> > I can't tell... it felt as if the two fb_writes were being executed.  I
> > tried all kinds of things to make it work as intended, even including a
> > manual JMPI at the end of the IF branch to force the jump over the
> > second fb_write... but nothing.
> > 
> > I commented about this and included the C code and the generated
> > assembly for the JMPI version hoping that someone could tell if
> > something was off here:
> > 
> > http://lists.freedesktop.org/archives/mesa-dev/2014-May/059985.html
> > 
> > Maybe you can have a quick look at that see if there is something
> > obvious that I am missing...
> 
> Yeah, I'll try and take a look...

I found what was going on with this. I have to manually set the
execution_size of the JMPI instruction to BRW_EXECUTE_1, that seems to
fix the problem.

> > Also, when I was trying the JMPI version I was developing the patches in
> > the generator rather than the visitor. I suppose that should not make
> > any difference though.
> > 
> > One last thing, JMPI was recently made private to the sf module, so if
> > we want to do go with it we need to revert f3cb2e6ed705.
> 
> It wasn't, actually - brw_JMPI is still available in brw_eu_emit.c.
> It's only brw_land_fwd_jump that moved to brw_sf_state.c, but all that
> does is set the jump distance on the JMPI instruction.  It seems like
> you know the jump distance when emitting the JMPI (jump over 1 FB write
> and maybe some MOVs) so you could just set it straightaway, and skip
> that function.  Or you could revert it - that'd be fine too.

Oh, right... I did not notice that.

Still, brw_land_fwd_jump() seems very convenient and safer to use than
relying on manual calculations for the jump distance (I think that would
be specially error prone if someone has to change the code at a later
point, since they have to know that they need to alter the jump distance
if they add/remove instructions that execute in the conditional) so I
think reverting is the best option.

Iago