[Mesa-dev,14/32] i965/fs: Fix register coalesce not to lose track of the second half of 16-wide moves.

Submitted by Francisco Jerez on Feb. 6, 2015, 2:42 p.m.

Details

Message ID 1423233792-11767-14-git-send-email-currojerez@riseup.net
State New
Headers show

Not browsing as part of any series.

Commit Message

Francisco Jerez Feb. 6, 2015, 2:42 p.m.
Fixes rewrite by the register coalesce pass of references to
individual halves of 16-wide coalesced registers.
---
 src/mesa/drivers/dri/i965/brw_fs_register_coalesce.cpp | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

Patch hide | download patch | download mbox

diff --git a/src/mesa/drivers/dri/i965/brw_fs_register_coalesce.cpp b/src/mesa/drivers/dri/i965/brw_fs_register_coalesce.cpp
index 09f0fad..2a26a46 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_register_coalesce.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_register_coalesce.cpp
@@ -211,9 +211,13 @@  fs_visitor::register_coalesce()
             continue;
          }
          reg_to_offset[offset] = inst->dst.reg_offset;
-         if (inst->src[0].width == 16)
-            reg_to_offset[offset + 1] = inst->dst.reg_offset + 1;
          mov[offset] = inst;
+
+         if (inst->exec_size * type_sz(inst->src[0].type) > REG_SIZE) {
+            reg_to_offset[offset + 1] = inst->dst.reg_offset + 1;
+            mov[offset + 1] = inst;
+         }
+
          channels_remaining -= inst->regs_written;
       }
 

Comments

On Fri, Feb 6, 2015 at 6:42 AM, Francisco Jerez <currojerez@riseup.net> wrote:
> Fixes rewrite by the register coalesce pass of references to
> individual halves of 16-wide coalesced registers.
> ---
>  src/mesa/drivers/dri/i965/brw_fs_register_coalesce.cpp | 8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_register_coalesce.cpp b/src/mesa/drivers/dri/i965/brw_fs_register_coalesce.cpp
> index 09f0fad..2a26a46 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_register_coalesce.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs_register_coalesce.cpp
> @@ -211,9 +211,13 @@ fs_visitor::register_coalesce()
>              continue;
>           }
>           reg_to_offset[offset] = inst->dst.reg_offset;
> -         if (inst->src[0].width == 16)
> -            reg_to_offset[offset + 1] = inst->dst.reg_offset + 1;
>           mov[offset] = inst;
> +
> +         if (inst->exec_size * type_sz(inst->src[0].type) > REG_SIZE) {
> +            reg_to_offset[offset + 1] = inst->dst.reg_offset + 1;
> +            mov[offset + 1] = inst;
> +         }
> +
>           channels_remaining -= inst->regs_written;
>        }
>
> --
> 2.1.3

I can believe it. It would help me to understand if we had an example
of a sequence of instructions that this code didn't handle properly.
Matt Turner <mattst88@gmail.com> writes:

> On Fri, Feb 6, 2015 at 6:42 AM, Francisco Jerez <currojerez@riseup.net> wrote:
>> Fixes rewrite by the register coalesce pass of references to
>> individual halves of 16-wide coalesced registers.
>> ---
>>  src/mesa/drivers/dri/i965/brw_fs_register_coalesce.cpp | 8 ++++++--
>>  1 file changed, 6 insertions(+), 2 deletions(-)
>>
>> diff --git a/src/mesa/drivers/dri/i965/brw_fs_register_coalesce.cpp b/src/mesa/drivers/dri/i965/brw_fs_register_coalesce.cpp
>> index 09f0fad..2a26a46 100644
>> --- a/src/mesa/drivers/dri/i965/brw_fs_register_coalesce.cpp
>> +++ b/src/mesa/drivers/dri/i965/brw_fs_register_coalesce.cpp
>> @@ -211,9 +211,13 @@ fs_visitor::register_coalesce()
>>              continue;
>>           }
>>           reg_to_offset[offset] = inst->dst.reg_offset;
>> -         if (inst->src[0].width == 16)
>> -            reg_to_offset[offset + 1] = inst->dst.reg_offset + 1;
>>           mov[offset] = inst;
>> +
>> +         if (inst->exec_size * type_sz(inst->src[0].type) > REG_SIZE) {
>> +            reg_to_offset[offset + 1] = inst->dst.reg_offset + 1;
>> +            mov[offset + 1] = inst;
>> +         }
>> +
>>           channels_remaining -= inst->regs_written;
>>        }
>>
>> --
>> 2.1.3
>
> I can believe it. It would help me to understand if we had an example
> of a sequence of instructions that this code didn't handle properly.

The problem is in the "rewrite" phase of the register coalesce pass
(roughly lines 264-283).  It won't fix up instructions that reference
some specific offset of the coalesced register if mov[i] is NULL for
that offset, as is the case for the second half of a 16-wide move.  For
example:

| ADD (16) vgrf0:f, vgrf0:f, 1.0:f
| MOV (16) vgrf1:f, vgrf0:f
| MOV (8)  vgrf2:f, vgrf0+1:f { sechalf }

will get incorrectly register-coalesced into:

| ADD (16) vgrf1:f, vgrf1:f, 1.0:f
| MOV (8)  vgrf2:f, vgrf0+1:f { sechalf }
Francisco Jerez <currojerez@riseup.net> writes:

> Matt Turner <mattst88@gmail.com> writes:
>
>> On Fri, Feb 6, 2015 at 6:42 AM, Francisco Jerez <currojerez@riseup.net> wrote:
>>> Fixes rewrite by the register coalesce pass of references to
>>> individual halves of 16-wide coalesced registers.
>>> ---
>>>  src/mesa/drivers/dri/i965/brw_fs_register_coalesce.cpp | 8 ++++++--
>>>  1 file changed, 6 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/src/mesa/drivers/dri/i965/brw_fs_register_coalesce.cpp b/src/mesa/drivers/dri/i965/brw_fs_register_coalesce.cpp
>>> index 09f0fad..2a26a46 100644
>>> --- a/src/mesa/drivers/dri/i965/brw_fs_register_coalesce.cpp
>>> +++ b/src/mesa/drivers/dri/i965/brw_fs_register_coalesce.cpp
>>> @@ -211,9 +211,13 @@ fs_visitor::register_coalesce()
>>>              continue;
>>>           }
>>>           reg_to_offset[offset] = inst->dst.reg_offset;
>>> -         if (inst->src[0].width == 16)
>>> -            reg_to_offset[offset + 1] = inst->dst.reg_offset + 1;
>>>           mov[offset] = inst;
>>> +
>>> +         if (inst->exec_size * type_sz(inst->src[0].type) > REG_SIZE) {
>>> +            reg_to_offset[offset + 1] = inst->dst.reg_offset + 1;
>>> +            mov[offset + 1] = inst;
>>> +         }
>>> +
>>>           channels_remaining -= inst->regs_written;
>>>        }
>>>
>>> --
>>> 2.1.3
>>
>> I can believe it. It would help me to understand if we had an example
>> of a sequence of instructions that this code didn't handle properly.
>
> The problem is in the "rewrite" phase of the register coalesce pass
> (roughly lines 264-283).  It won't fix up instructions that reference
> some specific offset of the coalesced register if mov[i] is NULL for
> that offset, as is the case for the second half of a 16-wide move.  For
> example:
>
> | ADD (16) vgrf0:f, vgrf0:f, 1.0:f
> | MOV (16) vgrf1:f, vgrf0:f
> | MOV (8)  vgrf2:f, vgrf0+1:f { sechalf }
>
> will get incorrectly register-coalesced into:
>
> | ADD (16) vgrf1:f, vgrf1:f, 1.0:f
> | MOV (8)  vgrf2:f, vgrf0+1:f { sechalf }

Ping.  The SIMD lowering pass emits this kind of code so this will lead
to actual piglit regressions
(e.g. tests/spec/arb_shader_texture_lod/execution/glsl-fs-shadow2DGradARB-07.shader_test).
On Fri, Feb 6, 2015 at 6:42 AM, Francisco Jerez <currojerez@riseup.net> wrote:
> Fixes rewrite by the register coalesce pass of references to
> individual halves of 16-wide coalesced registers.
> ---
>  src/mesa/drivers/dri/i965/brw_fs_register_coalesce.cpp | 8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_register_coalesce.cpp b/src/mesa/drivers/dri/i965/brw_fs_register_coalesce.cpp
> index 09f0fad..2a26a46 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_register_coalesce.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs_register_coalesce.cpp
> @@ -211,9 +211,13 @@ fs_visitor::register_coalesce()
>              continue;
>           }
>           reg_to_offset[offset] = inst->dst.reg_offset;
> -         if (inst->src[0].width == 16)
> -            reg_to_offset[offset + 1] = inst->dst.reg_offset + 1;
>           mov[offset] = inst;
> +
> +         if (inst->exec_size * type_sz(inst->src[0].type) > REG_SIZE) {

Why are we not using "inst->regs_read(0) > 1"?

> +            reg_to_offset[offset + 1] = inst->dst.reg_offset + 1;
> +            mov[offset + 1] = inst;
> +         }
> +
>           channels_remaining -= inst->regs_written;
>        }
>
> --
> 2.1.3
>
> _______________________________________________
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Jason Ekstrand <jason@jlekstrand.net> writes:

> On Fri, Feb 6, 2015 at 6:42 AM, Francisco Jerez <currojerez@riseup.net> wrote:
>> Fixes rewrite by the register coalesce pass of references to
>> individual halves of 16-wide coalesced registers.
>> ---
>>  src/mesa/drivers/dri/i965/brw_fs_register_coalesce.cpp | 8 ++++++--
>>  1 file changed, 6 insertions(+), 2 deletions(-)
>>
>> diff --git a/src/mesa/drivers/dri/i965/brw_fs_register_coalesce.cpp b/src/mesa/drivers/dri/i965/brw_fs_register_coalesce.cpp
>> index 09f0fad..2a26a46 100644
>> --- a/src/mesa/drivers/dri/i965/brw_fs_register_coalesce.cpp
>> +++ b/src/mesa/drivers/dri/i965/brw_fs_register_coalesce.cpp
>> @@ -211,9 +211,13 @@ fs_visitor::register_coalesce()
>>              continue;
>>           }
>>           reg_to_offset[offset] = inst->dst.reg_offset;
>> -         if (inst->src[0].width == 16)
>> -            reg_to_offset[offset + 1] = inst->dst.reg_offset + 1;
>>           mov[offset] = inst;
>> +
>> +         if (inst->exec_size * type_sz(inst->src[0].type) > REG_SIZE) {
>
> Why are we not using "inst->regs_read(0) > 1"?
>
Yeah, that would work too, and is certainly more readable, I'll change
it.

>> +            reg_to_offset[offset + 1] = inst->dst.reg_offset + 1;
>> +            mov[offset + 1] = inst;
>> +         }
>> +
>>           channels_remaining -= inst->regs_written;
>>        }
>>
>> --
>> 2.1.3
>>
>> _______________________________________________
>> mesa-dev mailing list
>> mesa-dev@lists.freedesktop.org
>> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Francisco Jerez <currojerez@riseup.net> writes:

> Jason Ekstrand <jason@jlekstrand.net> writes:
>
>> On Fri, Feb 6, 2015 at 6:42 AM, Francisco Jerez <currojerez@riseup.net> wrote:
>>> Fixes rewrite by the register coalesce pass of references to
>>> individual halves of 16-wide coalesced registers.
>>> ---
>>>  src/mesa/drivers/dri/i965/brw_fs_register_coalesce.cpp | 8 ++++++--
>>>  1 file changed, 6 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/src/mesa/drivers/dri/i965/brw_fs_register_coalesce.cpp b/src/mesa/drivers/dri/i965/brw_fs_register_coalesce.cpp
>>> index 09f0fad..2a26a46 100644
>>> --- a/src/mesa/drivers/dri/i965/brw_fs_register_coalesce.cpp
>>> +++ b/src/mesa/drivers/dri/i965/brw_fs_register_coalesce.cpp
>>> @@ -211,9 +211,13 @@ fs_visitor::register_coalesce()
>>>              continue;
>>>           }
>>>           reg_to_offset[offset] = inst->dst.reg_offset;
>>> -         if (inst->src[0].width == 16)
>>> -            reg_to_offset[offset + 1] = inst->dst.reg_offset + 1;
>>>           mov[offset] = inst;
>>> +
>>> +         if (inst->exec_size * type_sz(inst->src[0].type) > REG_SIZE) {
>>
>> Why are we not using "inst->regs_read(0) > 1"?
>>
> Yeah, that would work too, and is certainly more readable, I'll change
> it.
>
Actually upon re-reading the register coalesce pass I just cannot make
sense of the rewrite loop at the bottom...  Why does it need to loop for
each sub-register of the VGRF once for each instruction?  Wouldn't some
simple arithmetic be sufficient to find out whether a source or
destination overlaps with the coalesced register?  And why does it even
check that the movs are non-NULL?  Isn't it guaranteed because of the
previous checks that the whole register has been copied?

Bah...  How about we drop this patch (except for the line that changes
the condition in the if statement -- that was probably a bug of its own)
and fix the rewrite loop instead?

>>> +            reg_to_offset[offset + 1] = inst->dst.reg_offset + 1;
>>> +            mov[offset + 1] = inst;
>>> +         }
>>> +
>>>           channels_remaining -= inst->regs_written;
>>>        }
>>>
>>> --
>>> 2.1.3
>>>
>>> _______________________________________________
>>> mesa-dev mailing list
>>> mesa-dev@lists.freedesktop.org
>>> http://lists.freedesktop.org/mailman/listinfo/mesa-dev