[2/5] intel/fs: Lower integer multiply correctly when destination stride equals 4.

Submitted by Francisco Jerez on Jan. 19, 2019, 12:08 a.m.

Details

Message ID 20190119000902.11597-2-currojerez@riseup.net
State New
Headers show
Series "Series without cover letter" ( rev: 1 ) in Mesa

Not browsing as part of any series.

Commit Message

Francisco Jerez Jan. 19, 2019, 12:08 a.m.
Because the "low" temporary needs to be accessed with word type and
twice the original stride, attempting to preserve the alignment of the
original destination can potentially lead to instructions with illegal
destination stride greater than four.  Because the CHV/BXT alignment
restrictions are now being enforced by the regioning lowering pass run
after lower_integer_multiplication(), there is no real need to
preserve the original strides anymore.

Note that this bug can be reproduced on stable branches, but
back-porting would be non-trivial, because the fix relies on the
regioning lowering pass recently introduced.
---
 src/intel/compiler/brw_fs.cpp | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

Patch hide | download patch | download mbox

diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp
index f475b617df2..5768e0d6542 100644
--- a/src/intel/compiler/brw_fs.cpp
+++ b/src/intel/compiler/brw_fs.cpp
@@ -3962,13 +3962,11 @@  fs_visitor::lower_integer_multiplication()
                 regions_overlap(inst->dst, inst->size_written,
                                 inst->src[0], inst->size_read(0)) ||
                 regions_overlap(inst->dst, inst->size_written,
-                                inst->src[1], inst->size_read(1))) {
+                                inst->src[1], inst->size_read(1)) ||
+                inst->dst.stride >= 4) {
                needs_mov = true;
-               /* Get a new VGRF but keep the same stride as inst->dst */
                low = fs_reg(VGRF, alloc.allocate(regs_written(inst)),
                             inst->dst.type);
-               low.stride = inst->dst.stride;
-               low.offset = inst->dst.offset % REG_SIZE;
             }
 
             /* Get a new VGRF but keep the same stride as inst->dst */

Comments

On Fri, Jan 18, 2019 at 6:09 PM Francisco Jerez <currojerez@riseup.net>
wrote:

> Because the "low" temporary needs to be accessed with word type and
> twice the original stride, attempting to preserve the alignment of the
> original destination can potentially lead to instructions with illegal
> destination stride greater than four.  Because the CHV/BXT alignment
> restrictions are now being enforced by the regioning lowering pass run
> after lower_integer_multiplication(), there is no real need to
> preserve the original strides anymore.
>
> Note that this bug can be reproduced on stable branches, but
> back-porting would be non-trivial, because the fix relies on the
> regioning lowering pass recently introduced.
> ---
>  src/intel/compiler/brw_fs.cpp | 6 ++----
>  1 file changed, 2 insertions(+), 4 deletions(-)
>
> diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp
> index f475b617df2..5768e0d6542 100644
> --- a/src/intel/compiler/brw_fs.cpp
> +++ b/src/intel/compiler/brw_fs.cpp
> @@ -3962,13 +3962,11 @@ fs_visitor::lower_integer_multiplication()
>                  regions_overlap(inst->dst, inst->size_written,
>                                  inst->src[0], inst->size_read(0)) ||
>                  regions_overlap(inst->dst, inst->size_written,
> -                                inst->src[1], inst->size_read(1))) {
> +                                inst->src[1], inst->size_read(1)) ||
> +                inst->dst.stride >= 4) {
>

It would be nice to throw in a quick comment as to why we're adding a
temporary when stride >= 4.


>                 needs_mov = true;
> -               /* Get a new VGRF but keep the same stride as inst->dst */
>                 low = fs_reg(VGRF, alloc.allocate(regs_written(inst)),
>                              inst->dst.type);
> -               low.stride = inst->dst.stride;
> -               low.offset = inst->dst.offset % REG_SIZE;
>              }
>
>              /* Get a new VGRF but keep the same stride as inst->dst */
> --
> 2.19.2
>
> _______________________________________________
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
Jason Ekstrand <jason@jlekstrand.net> writes:

> On Fri, Jan 18, 2019 at 6:09 PM Francisco Jerez <currojerez@riseup.net>
> wrote:
>
>> Because the "low" temporary needs to be accessed with word type and
>> twice the original stride, attempting to preserve the alignment of the
>> original destination can potentially lead to instructions with illegal
>> destination stride greater than four.  Because the CHV/BXT alignment
>> restrictions are now being enforced by the regioning lowering pass run
>> after lower_integer_multiplication(), there is no real need to
>> preserve the original strides anymore.
>>
>> Note that this bug can be reproduced on stable branches, but
>> back-porting would be non-trivial, because the fix relies on the
>> regioning lowering pass recently introduced.
>> ---
>>  src/intel/compiler/brw_fs.cpp | 6 ++----
>>  1 file changed, 2 insertions(+), 4 deletions(-)
>>
>> diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp
>> index f475b617df2..5768e0d6542 100644
>> --- a/src/intel/compiler/brw_fs.cpp
>> +++ b/src/intel/compiler/brw_fs.cpp
>> @@ -3962,13 +3962,11 @@ fs_visitor::lower_integer_multiplication()
>>                  regions_overlap(inst->dst, inst->size_written,
>>                                  inst->src[0], inst->size_read(0)) ||
>>                  regions_overlap(inst->dst, inst->size_written,
>> -                                inst->src[1], inst->size_read(1))) {
>> +                                inst->src[1], inst->size_read(1)) ||
>> +                inst->dst.stride >= 4) {
>>
>
> It would be nice to throw in a quick comment as to why we're adding a
> temporary when stride >= 4.
>

There seemed to be no pre-existing comment about any of the other
conditions to allocate a temporary, so I've added the following:

+ /* Get a new VGRF for the "low" 32x16-bit multiplication result if
+  * reusing the original destination is impossible due to hardware
+  * restrictions, source/destination overlap, or it being the null
+  * register.
+  */
                                      
>
>>                 needs_mov = true;
>> -               /* Get a new VGRF but keep the same stride as inst->dst */
>>                 low = fs_reg(VGRF, alloc.allocate(regs_written(inst)),
>>                              inst->dst.type);
>> -               low.stride = inst->dst.stride;
>> -               low.offset = inst->dst.offset % REG_SIZE;
>>              }
>>
>>              /* Get a new VGRF but keep the same stride as inst->dst */
>> --
>> 2.19.2
>>
>> _______________________________________________
>> mesa-dev mailing list
>> mesa-dev@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>>
On February 14, 2019 21:59:18 Francisco Jerez <currojerez@riseup.net> wrote:

> Jason Ekstrand <jason@jlekstrand.net> writes:
>
>> On Fri, Jan 18, 2019 at 6:09 PM Francisco Jerez <currojerez@riseup.net>
>> wrote:
>>
>>> Because the "low" temporary needs to be accessed with word type and
>>> twice the original stride, attempting to preserve the alignment of the
>>> original destination can potentially lead to instructions with illegal
>>> destination stride greater than four.  Because the CHV/BXT alignment
>>> restrictions are now being enforced by the regioning lowering pass run
>>> after lower_integer_multiplication(), there is no real need to
>>> preserve the original strides anymore.
>>>
>>>
>>> Note that this bug can be reproduced on stable branches, but
>>> back-porting would be non-trivial, because the fix relies on the
>>> regioning lowering pass recently introduced.
>>> ---
>>> src/intel/compiler/brw_fs.cpp | 6 ++----
>>> 1 file changed, 2 insertions(+), 4 deletions(-)
>>>
>>>
>>> diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp
>>> index f475b617df2..5768e0d6542 100644
>>> --- a/src/intel/compiler/brw_fs.cpp
>>> +++ b/src/intel/compiler/brw_fs.cpp
>>> @@ -3962,13 +3962,11 @@ fs_visitor::lower_integer_multiplication()
>>>          regions_overlap(inst->dst, inst->size_written,
>>>                          inst->src[0], inst->size_read(0)) ||
>>>          regions_overlap(inst->dst, inst->size_written,
>>> -                                inst->src[1], inst->size_read(1))) {
>>> +                                inst->src[1], inst->size_read(1)) ||
>>> +                inst->dst.stride >= 4) {
>>
>> It would be nice to throw in a quick comment as to why we're adding a
>> temporary when stride >= 4.
>
> There seemed to be no pre-existing comment about any of the other
> conditions to allocate a temporary, so I've added the following:
>
> + /* Get a new VGRF for the "low" 32x16-bit multiplication result if
> +  * reusing the original destination is impossible due to hardware
> +  * restrictions, source/destination overlap, or it being the null
> +  * register.
> +  */

Works for me

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

>
>>
>>>         needs_mov = true;
>>> -               /* Get a new VGRF but keep the same stride as inst->dst */
>>>         low = fs_reg(VGRF, alloc.allocate(regs_written(inst)),
>>>                      inst->dst.type);
>>> -               low.stride = inst->dst.stride;
>>> -               low.offset = inst->dst.offset % REG_SIZE;
>>>      }
>>>
>>>
>>>      /* Get a new VGRF but keep the same stride as inst->dst */
>>> --
>>> 2.19.2
>>>
>>>
>>> _______________________________________________
>>> mesa-dev mailing list
>>> mesa-dev@lists.freedesktop.org
>>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev