[Mesa-dev,PATCHv3,09/20] i965/fs: Get 64-bit indirect moves working on IVB.

Submitted by Francisco Jerez on Feb. 9, 2017, 6:16 p.m.

Details

Message ID 20170209181658.19472-1-currojerez@riseup.net
State New
Headers show
Series "Series without cover letter" ( rev: 2 ) in Mesa

Not browsing as part of any series.

Commit Message

Francisco Jerez Feb. 9, 2017, 6:16 p.m.
---
This replaces "[PATCH v2 09/20] i965/fs: indirect addressing with
doubles is not supported in IVB/BYT".

src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 27 ++++++++++++++++++++++++--
 1 file changed, 25 insertions(+), 2 deletions(-)

Patch hide | download patch | download mbox

diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
index ea4a3fe1399..0e2dbe23571 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
@@ -440,7 +440,7 @@  fs_generator::generate_mov_indirect(fs_inst *inst,
       brw_MOV(p, dst, reg);
    } else {
       /* Prior to Broadwell, there are only 8 address registers. */
-      assert(inst->exec_size == 8 || devinfo->gen >= 8);
+      assert(inst->exec_size <= 8 || devinfo->gen >= 8);
 
       /* We use VxH indirect addressing, clobbering a0.0 through a0.7. */
       struct brw_reg addr = vec8(brw_address_reg(0));
@@ -478,7 +478,30 @@  fs_generator::generate_mov_indirect(fs_inst *inst,
        * code, using it saves us 0 instructions and would require quite a bit
        * of case-by-case work.  It's just not worth it.
        */
-      brw_ADD(p, addr, indirect_byte_offset, brw_imm_uw(imm_byte_offset));
+      if (devinfo->gen >= 8 || devinfo->is_haswell || type_sz(reg.type) < 8) {
+         brw_ADD(p, addr, indirect_byte_offset, brw_imm_uw(imm_byte_offset));
+      } else {
+         /* IVB reads two address register components per channel for
+          * indirectly addressed 64-bit sources, so we need to initialize
+          * adjacent address components to consecutive dwords of the source
+          * region by emitting two separate ADD instructions.  Found
+          * empirically.
+          */
+         assert(inst->exec_size <= 4);
+         brw_push_insn_state(p);
+         brw_set_default_exec_size(p, cvt(inst->exec_size) - 1);
+
+         brw_ADD(p, spread(addr, 2), indirect_byte_offset,
+                 brw_imm_uw(imm_byte_offset));
+         brw_inst_set_no_dd_clear(devinfo, brw_last_inst, true);
+
+         brw_ADD(p, spread(suboffset(addr, 1), 2), indirect_byte_offset,
+                 brw_imm_uw(imm_byte_offset + 4));
+         brw_inst_set_no_dd_check(devinfo, brw_last_inst, true);
+
+         brw_pop_insn_state(p);
+      }
+
       struct brw_reg ind_src = brw_VxH_indirect(0, 0);
 
       brw_inst *mov = brw_MOV(p, dst, retype(ind_src, dst.type));

Comments

Francisco Jerez <currojerez@riseup.net> writes:

> ---
> This replaces "[PATCH v2 09/20] i965/fs: indirect addressing with
> doubles is not supported in IVB/BYT".
>

Note that some of the fp64 indirect addressing test-cases still fail on
IVB even with this patch applied, but the reason doesn't seem to have
anything to do with indirect addressing.  Apparently the remaining
failures are caused by illegal code like:

| sel(8)          g30<1>D         g9<0,2,1>DF     g8.3<0,2,1>DF   { align1 1Q };

The problem is that the SEL instruction doesn't support most datatype
conversions, so this leads to data corruption on at least IVB.
According to the hardware docs, the only valid destination type of SEL
is DF if the execution type is DF, and F (or HF on CHV+) if the
execution type is F.  Integer 16 or 32-bit execution types seem to allow
converting to other single-precision integer types.

I'm not sure why we haven't seen this cause massive breakage on HSW+,
but I think we need some sort of legalization pass to do the type
conversion in a separate MOV for any instruction with an invalid
destination conversion.  You may be able to do it within the d2x pass
but then it would make sense to rename it to something more appropriate
like destination conversion lowering (lower_cvt? :P).

In addition there seem to be other issues with your d2x lowering code
(I'm bringing this up here because you don't seem to have sent the d2x
changes for review to the mailing list yet) -- It removes the original
instruction and creates a new one with the same opcode and first few
sources, which will miscompile the program if the original instruction
had a larger number of sources or any other instruction controls set.  I
suggest you modify the original instruction in-place to point it to the
temporary you've allocated, and then copy the data into the original
destination by using a strided move.

> src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 27 ++++++++++++++++++++++++--
>  1 file changed, 25 insertions(+), 2 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> index ea4a3fe1399..0e2dbe23571 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> @@ -440,7 +440,7 @@ fs_generator::generate_mov_indirect(fs_inst *inst,
>        brw_MOV(p, dst, reg);
>     } else {
>        /* Prior to Broadwell, there are only 8 address registers. */
> -      assert(inst->exec_size == 8 || devinfo->gen >= 8);
> +      assert(inst->exec_size <= 8 || devinfo->gen >= 8);
>  
>        /* We use VxH indirect addressing, clobbering a0.0 through a0.7. */
>        struct brw_reg addr = vec8(brw_address_reg(0));
> @@ -478,7 +478,30 @@ fs_generator::generate_mov_indirect(fs_inst *inst,
>         * code, using it saves us 0 instructions and would require quite a bit
>         * of case-by-case work.  It's just not worth it.
>         */
> -      brw_ADD(p, addr, indirect_byte_offset, brw_imm_uw(imm_byte_offset));
> +      if (devinfo->gen >= 8 || devinfo->is_haswell || type_sz(reg.type) < 8) {
> +         brw_ADD(p, addr, indirect_byte_offset, brw_imm_uw(imm_byte_offset));
> +      } else {
> +         /* IVB reads two address register components per channel for
> +          * indirectly addressed 64-bit sources, so we need to initialize
> +          * adjacent address components to consecutive dwords of the source
> +          * region by emitting two separate ADD instructions.  Found
> +          * empirically.
> +          */
> +         assert(inst->exec_size <= 4);
> +         brw_push_insn_state(p);
> +         brw_set_default_exec_size(p, cvt(inst->exec_size) - 1);
> +
> +         brw_ADD(p, spread(addr, 2), indirect_byte_offset,
> +                 brw_imm_uw(imm_byte_offset));
> +         brw_inst_set_no_dd_clear(devinfo, brw_last_inst, true);
> +
> +         brw_ADD(p, spread(suboffset(addr, 1), 2), indirect_byte_offset,
> +                 brw_imm_uw(imm_byte_offset + 4));
> +         brw_inst_set_no_dd_check(devinfo, brw_last_inst, true);
> +
> +         brw_pop_insn_state(p);
> +      }
> +
>        struct brw_reg ind_src = brw_VxH_indirect(0, 0);
>  
>        brw_inst *mov = brw_MOV(p, dst, retype(ind_src, dst.type));
> -- 
> 2.11.0
>
> _______________________________________________
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
On Thu, 2017-02-09 at 18:28 -0800, Francisco Jerez wrote:
> Francisco Jerez <currojerez@riseup.net> writes:
> 
> > ---
> > This replaces "[PATCH v2 09/20] i965/fs: indirect addressing with
> > doubles is not supported in IVB/BYT".
> > 
> 
> Note that some of the fp64 indirect addressing test-cases still fail
> on
> IVB even with this patch applied, but the reason doesn't seem to have
> anything to do with indirect addressing.  Apparently the remaining
> failures are caused by illegal code like:
> 
> > sel(8)          g30<1>D         g9<0,2,1>DF     g8.3<0,2,1>DF   {
> > align1 1Q };
> 
> The problem is that the SEL instruction doesn't support most datatype
> conversions, so this leads to data corruption on at least IVB.
> According to the hardware docs, the only valid destination type of
> SEL
> is DF if the execution type is DF, and F (or HF on CHV+) if the
> execution type is F.  Integer 16 or 32-bit execution types seem to
> allow
> converting to other single-precision integer types.
> 

Some weeks ago I saw this issue while working in a solution for IVB's
MOV INDIRECT. That SEL is added by opt_peephole_sel():

https://lists.freedesktop.org/archives/mesa-dev/2017-January/142023.htm
l

However,  I did it very restrictive by just don't allowing any
conversion between different data type sizes. I am going to improve
that patch to include all the allowed cases.

> I'm not sure why we haven't seen this cause massive breakage on HSW+,
> but I think we need some sort of legalization pass to do the type
> conversion in a separate MOV for any instruction with an invalid
> destination conversion.  You may be able to do it within the d2x pass
> but then it would make sense to rename it to something more
> appropriate
> like destination conversion lowering (lower_cvt? :P).
> 

:)

> In addition there seem to be other issues with your d2x lowering code
> (I'm bringing this up here because you don't seem to have sent the
> d2x
> changes for review to the mailing list yet) -- It removes the
> original
> instruction and creates a new one with the same opcode and first few
> sources, which will miscompile the program if the original
> instruction
> had a larger number of sources or any other instruction controls
> set.  I
> suggest you modify the original instruction in-place to point it to
> the
> temporary you've allocated, and then copy the data into the original
> destination by using a strided move.
> 

OK.

Thanks,

Sam

> > src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 27
> > ++++++++++++++++++++++++--
> >  1 file changed, 25 insertions(+), 2 deletions(-)
> > 
> > diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> > b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> > index ea4a3fe1399..0e2dbe23571 100644
> > --- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> > +++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> > @@ -440,7 +440,7 @@ fs_generator::generate_mov_indirect(fs_inst
> > *inst,
> >        brw_MOV(p, dst, reg);
> >     } else {
> >        /* Prior to Broadwell, there are only 8 address registers.
> > */
> > -      assert(inst->exec_size == 8 || devinfo->gen >= 8);
> > +      assert(inst->exec_size <= 8 || devinfo->gen >= 8);
> >  
> >        /* We use VxH indirect addressing, clobbering a0.0 through
> > a0.7. */
> >        struct brw_reg addr = vec8(brw_address_reg(0));
> > @@ -478,7 +478,30 @@ fs_generator::generate_mov_indirect(fs_inst
> > *inst,
> >         * code, using it saves us 0 instructions and would require
> > quite a bit
> >         * of case-by-case work.  It's just not worth it.
> >         */
> > -      brw_ADD(p, addr, indirect_byte_offset,
> > brw_imm_uw(imm_byte_offset));
> > +      if (devinfo->gen >= 8 || devinfo->is_haswell ||
> > type_sz(reg.type) < 8) {
> > +         brw_ADD(p, addr, indirect_byte_offset,
> > brw_imm_uw(imm_byte_offset));
> > +      } else {
> > +         /* IVB reads two address register components per channel
> > for
> > +          * indirectly addressed 64-bit sources, so we need to
> > initialize
> > +          * adjacent address components to consecutive dwords of
> > the source
> > +          * region by emitting two separate ADD
> > instructions.  Found
> > +          * empirically.
> > +          */
> > +         assert(inst->exec_size <= 4);
> > +         brw_push_insn_state(p);
> > +         brw_set_default_exec_size(p, cvt(inst->exec_size) - 1);
> > +
> > +         brw_ADD(p, spread(addr, 2), indirect_byte_offset,
> > +                 brw_imm_uw(imm_byte_offset));
> > +         brw_inst_set_no_dd_clear(devinfo, brw_last_inst, true);
> > +
> > +         brw_ADD(p, spread(suboffset(addr, 1), 2),
> > indirect_byte_offset,
> > +                 brw_imm_uw(imm_byte_offset + 4));
> > +         brw_inst_set_no_dd_check(devinfo, brw_last_inst, true);
> > +
> > +         brw_pop_insn_state(p);
> > +      }
> > +
> >        struct brw_reg ind_src = brw_VxH_indirect(0, 0);
> >  
> >        brw_inst *mov = brw_MOV(p, dst, retype(ind_src, dst.type));
> > -- 
> > 2.11.0
> > 
> > _______________________________________________
> > mesa-dev mailing list
> > mesa-dev@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/mesa-dev
On Thu, 2017-02-09 at 10:16 -0800, Francisco Jerez wrote:
> ---
> This replaces "[PATCH v2 09/20] i965/fs: indirect addressing with
> doubles is not supported in IVB/BYT".
> 
> src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 27
> ++++++++++++++++++++++++--
>  1 file changed, 25 insertions(+), 2 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> index ea4a3fe1399..0e2dbe23571 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> @@ -440,7 +440,7 @@ fs_generator::generate_mov_indirect(fs_inst
> *inst,
>        brw_MOV(p, dst, reg);
>     } else {
>        /* Prior to Broadwell, there are only 8 address registers. */
> -      assert(inst->exec_size == 8 || devinfo->gen >= 8);
> +      assert(inst->exec_size <= 8 || devinfo->gen >= 8);
>  
>        /* We use VxH indirect addressing, clobbering a0.0 through
> a0.7. */
>        struct brw_reg addr = vec8(brw_address_reg(0));
> @@ -478,7 +478,30 @@ fs_generator::generate_mov_indirect(fs_inst
> *inst,
>         * code, using it saves us 0 instructions and would require
> quite a bit
>         * of case-by-case work.  It's just not worth it.
>         */
> -      brw_ADD(p, addr, indirect_byte_offset,
> brw_imm_uw(imm_byte_offset));
> +      if (devinfo->gen >= 8 || devinfo->is_haswell ||
> type_sz(reg.type) < 8) {
> +         brw_ADD(p, addr, indirect_byte_offset,
> brw_imm_uw(imm_byte_offset));
> +      } else {
> +         /* IVB reads two address register components per channel
> for
> +          * indirectly addressed 64-bit sources, so we need to
> initialize
> +          * adjacent address components to consecutive dwords of the
> source
> +          * region by emitting two separate ADD instructions.  Found
> +          * empirically.
> +          */
> +         assert(inst->exec_size <= 4);
> +         brw_push_insn_state(p);
> +         brw_set_default_exec_size(p, cvt(inst->exec_size) - 1);
> +
> +         brw_ADD(p, spread(addr, 2), indirect_byte_offset,
> +                 brw_imm_uw(imm_byte_offset));
> +         brw_inst_set_no_dd_clear(devinfo, brw_last_inst, true);
> +
> +         brw_ADD(p, spread(suboffset(addr, 1), 2),
> indirect_byte_offset,
> +                 brw_imm_uw(imm_byte_offset + 4));
> +         brw_inst_set_no_dd_check(devinfo, brw_last_inst, true);
> +
> +         brw_pop_insn_state(p);
> +      }
> +

Oh, you did this change in the generator. The solution I was developing
last weeks was to do this in a lowering pass. Your change is better!

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>

I am going to add it to my rc3 branch.

Thanks!

Sam

>        struct brw_reg ind_src = brw_VxH_indirect(0, 0);
>  
>        brw_inst *mov = brw_MOV(p, dst, retype(ind_src, dst.type));
Samuel Iglesias Gonsálvez <siglesias@igalia.com> writes:

> On Thu, 2017-02-09 at 18:28 -0800, Francisco Jerez wrote:
>> Francisco Jerez <currojerez@riseup.net> writes:
>> 
>> > ---
>> > This replaces "[PATCH v2 09/20] i965/fs: indirect addressing with
>> > doubles is not supported in IVB/BYT".
>> > 
>> 
>> Note that some of the fp64 indirect addressing test-cases still fail
>> on
>> IVB even with this patch applied, but the reason doesn't seem to have
>> anything to do with indirect addressing.  Apparently the remaining
>> failures are caused by illegal code like:
>> 
>> > sel(8)          g30<1>D         g9<0,2,1>DF     g8.3<0,2,1>DF   {
>> > align1 1Q };
>> 
>> The problem is that the SEL instruction doesn't support most datatype
>> conversions, so this leads to data corruption on at least IVB.
>> According to the hardware docs, the only valid destination type of
>> SEL
>> is DF if the execution type is DF, and F (or HF on CHV+) if the
>> execution type is F.  Integer 16 or 32-bit execution types seem to
>> allow
>> converting to other single-precision integer types.
>> 
>
> Some weeks ago I saw this issue while working in a solution for IVB's
> MOV INDIRECT. That SEL is added by opt_peephole_sel():
>
> https://lists.freedesktop.org/archives/mesa-dev/2017-January/142023.htm
> l
>
> However,  I did it very restrictive by just don't allowing any
> conversion between different data type sizes. I am going to improve
> that patch to include all the allowed cases.
>

I wouldn't worry too much about the couple of supported type
conversions, it doesn't seem terribly important, my concern with your
patch is that it does seem to allow several type conversions which are
invalid, like D->F, F->D, DF->Q, etc.

Aside from that I'm not particularly confident that your patch will fix
all potential sources of invalid conversions.  E.g. do we know that no
other optimization pass will ever change the destination type of the SEL
assuming that the initial type was correct?  What about other
instructions with double-to-single or single-to-double type conversion
restrictions? (A quick look at the PRMs would likely reveal instructions
with similar restrictions) -- I don't think we can answer these
questions without auditing the rest of the compiler back-end (while
doing this in a legalization pass would be pretty much obviously
correct).  Alternatively you could fix the EU validator to recognize
this and other cases of invalid conversions, which would make sure we
get an assertion failure in the likely case that we've missed something
so we can find the problem quickly instead of spending hours debugging
non-deterministic data corruption issues.  I'm okay either way,
validation or legalization pass, do it as you like.

>> I'm not sure why we haven't seen this cause massive breakage on HSW+,
>> but I think we need some sort of legalization pass to do the type
>> conversion in a separate MOV for any instruction with an invalid
>> destination conversion.  You may be able to do it within the d2x pass
>> but then it would make sense to rename it to something more
>> appropriate
>> like destination conversion lowering (lower_cvt? :P).
>> 
>
> :)
>
>> In addition there seem to be other issues with your d2x lowering code
>> (I'm bringing this up here because you don't seem to have sent the
>> d2x
>> changes for review to the mailing list yet) -- It removes the
>> original
>> instruction and creates a new one with the same opcode and first few
>> sources, which will miscompile the program if the original
>> instruction
>> had a larger number of sources or any other instruction controls
>> set.  I
>> suggest you modify the original instruction in-place to point it to
>> the
>> temporary you've allocated, and then copy the data into the original
>> destination by using a strided move.
>> 
>
> OK.
>
> Thanks,
>
> Sam
>
>> > src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 27
>> > ++++++++++++++++++++++++--
>> >  1 file changed, 25 insertions(+), 2 deletions(-)
>> > 
>> > diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
>> > b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
>> > index ea4a3fe1399..0e2dbe23571 100644
>> > --- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
>> > +++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
>> > @@ -440,7 +440,7 @@ fs_generator::generate_mov_indirect(fs_inst
>> > *inst,
>> >        brw_MOV(p, dst, reg);
>> >     } else {
>> >        /* Prior to Broadwell, there are only 8 address registers.
>> > */
>> > -      assert(inst->exec_size == 8 || devinfo->gen >= 8);
>> > +      assert(inst->exec_size <= 8 || devinfo->gen >= 8);
>> >  
>> >        /* We use VxH indirect addressing, clobbering a0.0 through
>> > a0.7. */
>> >        struct brw_reg addr = vec8(brw_address_reg(0));
>> > @@ -478,7 +478,30 @@ fs_generator::generate_mov_indirect(fs_inst
>> > *inst,
>> >         * code, using it saves us 0 instructions and would require
>> > quite a bit
>> >         * of case-by-case work.  It's just not worth it.
>> >         */
>> > -      brw_ADD(p, addr, indirect_byte_offset,
>> > brw_imm_uw(imm_byte_offset));
>> > +      if (devinfo->gen >= 8 || devinfo->is_haswell ||
>> > type_sz(reg.type) < 8) {
>> > +         brw_ADD(p, addr, indirect_byte_offset,
>> > brw_imm_uw(imm_byte_offset));
>> > +      } else {
>> > +         /* IVB reads two address register components per channel
>> > for
>> > +          * indirectly addressed 64-bit sources, so we need to
>> > initialize
>> > +          * adjacent address components to consecutive dwords of
>> > the source
>> > +          * region by emitting two separate ADD
>> > instructions.  Found
>> > +          * empirically.
>> > +          */
>> > +         assert(inst->exec_size <= 4);
>> > +         brw_push_insn_state(p);
>> > +         brw_set_default_exec_size(p, cvt(inst->exec_size) - 1);
>> > +
>> > +         brw_ADD(p, spread(addr, 2), indirect_byte_offset,
>> > +                 brw_imm_uw(imm_byte_offset));
>> > +         brw_inst_set_no_dd_clear(devinfo, brw_last_inst, true);
>> > +
>> > +         brw_ADD(p, spread(suboffset(addr, 1), 2),
>> > indirect_byte_offset,
>> > +                 brw_imm_uw(imm_byte_offset + 4));
>> > +         brw_inst_set_no_dd_check(devinfo, brw_last_inst, true);
>> > +
>> > +         brw_pop_insn_state(p);
>> > +      }
>> > +
>> >        struct brw_reg ind_src = brw_VxH_indirect(0, 0);
>> >  
>> >        brw_inst *mov = brw_MOV(p, dst, retype(ind_src, dst.type));
>> > -- 
>> > 2.11.0
>> > 
>> > _______________________________________________
>> > mesa-dev mailing list
>> > mesa-dev@lists.freedesktop.org
>> > https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Samuel Iglesias Gonsálvez <siglesias@igalia.com> writes:

> On Thu, 2017-02-09 at 10:16 -0800, Francisco Jerez wrote:
>> ---
>> This replaces "[PATCH v2 09/20] i965/fs: indirect addressing with
>> doubles is not supported in IVB/BYT".
>> 
>> src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 27
>> ++++++++++++++++++++++++--
>>  1 file changed, 25 insertions(+), 2 deletions(-)
>> 
>> diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
>> b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
>> index ea4a3fe1399..0e2dbe23571 100644
>> --- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
>> +++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
>> @@ -440,7 +440,7 @@ fs_generator::generate_mov_indirect(fs_inst
>> *inst,
>>        brw_MOV(p, dst, reg);
>>     } else {
>>        /* Prior to Broadwell, there are only 8 address registers. */
>> -      assert(inst->exec_size == 8 || devinfo->gen >= 8);
>> +      assert(inst->exec_size <= 8 || devinfo->gen >= 8);
>>  
>>        /* We use VxH indirect addressing, clobbering a0.0 through
>> a0.7. */
>>        struct brw_reg addr = vec8(brw_address_reg(0));
>> @@ -478,7 +478,30 @@ fs_generator::generate_mov_indirect(fs_inst
>> *inst,
>>         * code, using it saves us 0 instructions and would require
>> quite a bit
>>         * of case-by-case work.  It's just not worth it.
>>         */
>> -      brw_ADD(p, addr, indirect_byte_offset,
>> brw_imm_uw(imm_byte_offset));
>> +      if (devinfo->gen >= 8 || devinfo->is_haswell ||
>> type_sz(reg.type) < 8) {
>> +         brw_ADD(p, addr, indirect_byte_offset,
>> brw_imm_uw(imm_byte_offset));
>> +      } else {
>> +         /* IVB reads two address register components per channel
>> for
>> +          * indirectly addressed 64-bit sources, so we need to
>> initialize
>> +          * adjacent address components to consecutive dwords of the
>> source
>> +          * region by emitting two separate ADD instructions.  Found
>> +          * empirically.
>> +          */
>> +         assert(inst->exec_size <= 4);
>> +         brw_push_insn_state(p);
>> +         brw_set_default_exec_size(p, cvt(inst->exec_size) - 1);
>> +
>> +         brw_ADD(p, spread(addr, 2), indirect_byte_offset,
>> +                 brw_imm_uw(imm_byte_offset));
>> +         brw_inst_set_no_dd_clear(devinfo, brw_last_inst, true);
>> +
>> +         brw_ADD(p, spread(suboffset(addr, 1), 2),
>> indirect_byte_offset,
>> +                 brw_imm_uw(imm_byte_offset + 4));
>> +         brw_inst_set_no_dd_check(devinfo, brw_last_inst, true);
>> +
>> +         brw_pop_insn_state(p);
>> +      }
>> +
>
> Oh, you did this change in the generator. The solution I was developing
> last weeks was to do this in a lowering pass. Your change is better!
>

I don't think that doing it at the IR level would have worked...
MOV_INDIRECT makes assumptions about the components of the index source
being valid channel-by-channel, so you'd have to mark it as
force_writemask_all which would have brought in even more problems...

> Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
>
> I am going to add it to my rc3 branch.
>
> Thanks!
>
Thank you.

> Sam
>
>>        struct brw_reg ind_src = brw_VxH_indirect(0, 0);
>>  
>>        brw_inst *mov = brw_MOV(p, dst, retype(ind_src, dst.type));
On Fri, 2017-02-10 at 10:44 -0800, Francisco Jerez wrote:
> Samuel Iglesias Gonsálvez <siglesias@igalia.com> writes:
> 
> > On Thu, 2017-02-09 at 18:28 -0800, Francisco Jerez wrote:
> > > Francisco Jerez <currojerez@riseup.net> writes:
> > > 
> > > > ---
> > > > This replaces "[PATCH v2 09/20] i965/fs: indirect addressing
> > > > with
> > > > doubles is not supported in IVB/BYT".
> > > > 
> > > 
> > > Note that some of the fp64 indirect addressing test-cases still
> > > fail
> > > on
> > > IVB even with this patch applied, but the reason doesn't seem to
> > > have
> > > anything to do with indirect addressing.  Apparently the
> > > remaining
> > > failures are caused by illegal code like:
> > > 
> > > > sel(8)          g30<1>D         g9<0,2,1>DF     g8.3<0,2,1>DF  
> > > >  {
> > > > align1 1Q };
> > > 
> > > The problem is that the SEL instruction doesn't support most
> > > datatype
> > > conversions, so this leads to data corruption on at least IVB.
> > > According to the hardware docs, the only valid destination type
> > > of
> > > SEL
> > > is DF if the execution type is DF, and F (or HF on CHV+) if the
> > > execution type is F.  Integer 16 or 32-bit execution types seem
> > > to
> > > allow
> > > converting to other single-precision integer types.
> > > 
> > 
> > Some weeks ago I saw this issue while working in a solution for
> > IVB's
> > MOV INDIRECT. That SEL is added by opt_peephole_sel():
> > 
> > https://lists.freedesktop.org/archives/mesa-dev/2017-January/142023
> > .htm
> > l
> > 
> > However,  I did it very restrictive by just don't allowing any
> > conversion between different data type sizes. I am going to improve
> > that patch to include all the allowed cases.
> > 
> 
> I wouldn't worry too much about the couple of supported type
> conversions, it doesn't seem terribly important, my concern with your
> patch is that it does seem to allow several type conversions which
> are
> invalid, like D->F, F->D, DF->Q, etc.
> 

Right. I can just disallowing all the conversions (src.type !=
dst.type) in that pass... but I prefer the solutions suggested below.

> Aside from that I'm not particularly confident that your patch will
> fix
> all potential sources of invalid conversions.  E.g. do we know that
> no
> other optimization pass will ever change the destination type of the
> SEL
> assuming that the initial type was correct?  What about other
> instructions with double-to-single or single-to-double type
> conversion
> restrictions? (A quick look at the PRMs would likely reveal
> instructions
> with similar restrictions) -- I don't think we can answer these
> questions without auditing the rest of the compiler back-end (while
> doing this in a legalization pass would be pretty much obviously
> correct).  Alternatively you could fix the EU validator to recognize
> this and other cases of invalid conversions, which would make sure we
> get an assertion failure in the likely case that we've missed
> something
> so we can find the problem quickly instead of spending hours
> debugging
> non-deterministic data corruption issues.  I'm okay either way,
> validation or legalization pass, do it as you like.
> 

OK, thanks.

Sam

> > > I'm not sure why we haven't seen this cause massive breakage on
> > > HSW+,
> > > but I think we need some sort of legalization pass to do the type
> > > conversion in a separate MOV for any instruction with an invalid
> > > destination conversion.  You may be able to do it within the d2x
> > > pass
> > > but then it would make sense to rename it to something more
> > > appropriate
> > > like destination conversion lowering (lower_cvt? :P).
> > > 
> > 
> > :)
> > 
> > > In addition there seem to be other issues with your d2x lowering
> > > code
> > > (I'm bringing this up here because you don't seem to have sent
> > > the
> > > d2x
> > > changes for review to the mailing list yet) -- It removes the
> > > original
> > > instruction and creates a new one with the same opcode and first
> > > few
> > > sources, which will miscompile the program if the original
> > > instruction
> > > had a larger number of sources or any other instruction controls
> > > set.  I
> > > suggest you modify the original instruction in-place to point it
> > > to
> > > the
> > > temporary you've allocated, and then copy the data into the
> > > original
> > > destination by using a strided move.
> > > 
> > 
> > OK.
> > 
> > Thanks,
> > 
> > Sam
> > 
> > > > src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 27
> > > > ++++++++++++++++++++++++--
> > > >  1 file changed, 25 insertions(+), 2 deletions(-)
> > > > 
> > > > diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> > > > b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> > > > index ea4a3fe1399..0e2dbe23571 100644
> > > > --- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> > > > +++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> > > > @@ -440,7 +440,7 @@ fs_generator::generate_mov_indirect(fs_inst
> > > > *inst,
> > > >        brw_MOV(p, dst, reg);
> > > >     } else {
> > > >        /* Prior to Broadwell, there are only 8 address
> > > > registers.
> > > > */
> > > > -      assert(inst->exec_size == 8 || devinfo->gen >= 8);
> > > > +      assert(inst->exec_size <= 8 || devinfo->gen >= 8);
> > > >  
> > > >        /* We use VxH indirect addressing, clobbering a0.0
> > > > through
> > > > a0.7. */
> > > >        struct brw_reg addr = vec8(brw_address_reg(0));
> > > > @@ -478,7 +478,30 @@
> > > > fs_generator::generate_mov_indirect(fs_inst
> > > > *inst,
> > > >         * code, using it saves us 0 instructions and would
> > > > require
> > > > quite a bit
> > > >         * of case-by-case work.  It's just not worth it.
> > > >         */
> > > > -      brw_ADD(p, addr, indirect_byte_offset,
> > > > brw_imm_uw(imm_byte_offset));
> > > > +      if (devinfo->gen >= 8 || devinfo->is_haswell ||
> > > > type_sz(reg.type) < 8) {
> > > > +         brw_ADD(p, addr, indirect_byte_offset,
> > > > brw_imm_uw(imm_byte_offset));
> > > > +      } else {
> > > > +         /* IVB reads two address register components per
> > > > channel
> > > > for
> > > > +          * indirectly addressed 64-bit sources, so we need to
> > > > initialize
> > > > +          * adjacent address components to consecutive dwords
> > > > of
> > > > the source
> > > > +          * region by emitting two separate ADD
> > > > instructions.  Found
> > > > +          * empirically.
> > > > +          */
> > > > +         assert(inst->exec_size <= 4);
> > > > +         brw_push_insn_state(p);
> > > > +         brw_set_default_exec_size(p, cvt(inst->exec_size) -
> > > > 1);
> > > > +
> > > > +         brw_ADD(p, spread(addr, 2), indirect_byte_offset,
> > > > +                 brw_imm_uw(imm_byte_offset));
> > > > +         brw_inst_set_no_dd_clear(devinfo, brw_last_inst,
> > > > true);
> > > > +
> > > > +         brw_ADD(p, spread(suboffset(addr, 1), 2),
> > > > indirect_byte_offset,
> > > > +                 brw_imm_uw(imm_byte_offset + 4));
> > > > +         brw_inst_set_no_dd_check(devinfo, brw_last_inst,
> > > > true);
> > > > +
> > > > +         brw_pop_insn_state(p);
> > > > +      }
> > > > +
> > > >        struct brw_reg ind_src = brw_VxH_indirect(0, 0);
> > > >  
> > > >        brw_inst *mov = brw_MOV(p, dst, retype(ind_src,
> > > > dst.type));
> > > > -- 
> > > > 2.11.0
> > > > 
> > > > _______________________________________________
> > > > mesa-dev mailing list
> > > > mesa-dev@lists.freedesktop.org
> > > > https://lists.freedesktop.org/mailman/listinfo/mesa-dev