[2/2] intel: Fix SIMD16 unaligned payload GRF reads on Gen4-5.

Submitted by Kenneth Graunke on Aug. 3, 2018, 1:51 a.m.

Details

Message ID 20180803015128.3463-2-kenneth@whitecape.org
State New
Headers show
Series "Series without cover letter" ( rev: 1 ) in Mesa

Not browsing as part of any series.

Commit Message

Kenneth Graunke Aug. 3, 2018, 1:51 a.m.
When the SIMD16 Gen4-5 fragment shader payload contains source depth
(g2-3), destination stencil (g4), and destination depth (g5-6), the
single register of stencil makes the destination depth unaligned.

We were generating this instruction in the RT write payload setup:

   mov(16)   m14<1>F   g5<8,8,1>F   { align1 compr };

which is illegal, instructions with a source region spanning more than
one register need to be aligned to even registers.  This is because the
hardware implicitly does (nr | 1) instead of (nr + 1) when splitting the
compressed instruction into two mov(8)'s.

I believe this would cause the hardware to load g5 twice, replicating
subspan 0-1's destination depth to subspan 2-3.  This showed up as 2x2
artifact blocks in both TIS-100 and Reicast.

Normally, we rely on the register allocator to even-align our virtual
GRFs.  But we don't control the payload, so we need to lower SIMD widths
to make it work.  To fix this, we teach lower_simd_width about the
restriction, and then call it again after lower_load_payload (which is
what generates the offending MOV).

Fixes: 8aee87fe4cce0a883867df3546db0e0a36908086 (i965: Use SIMD16 instead of SIMD8 on Gen4 when possible.)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107212
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=13728
Tested-by: Diego Viola <diego.viola@gmail.com>
---
 src/intel/compiler/brw_fs.cpp | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

Patch hide | download patch | download mbox

diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp
index 20b89035e1f..78dafd9b706 100644
--- a/src/intel/compiler/brw_fs.cpp
+++ b/src/intel/compiler/brw_fs.cpp
@@ -5115,6 +5115,25 @@  get_fpu_lowered_simd_width(const struct gen_device_info *devinfo,
       }
    }
 
+   if (devinfo->gen < 6) {
+      /* From the G45 PRM, Volume 4 Page 361:
+       *
+       *    "Operand Alignment Rule: With the exceptions listed below, a
+       *     source/destination operand in general should be aligned to even
+       *     256-bit physical register with a region size equal to two 256-bit
+       *     physical registers."
+       *
+       * Normally we enforce this by allocating virtual registers to the
+       * even-aligned class.  But we need to handle payload registers.
+       */
+      for (unsigned i = 0; i < inst->sources; i++) {
+         if (inst->src[i].file == FIXED_GRF && (inst->src[i].nr & 1) &&
+             inst->size_read(i) >= 2 * REG_SIZE) {
+            max_width = MIN2(max_width, 8);
+         }
+      }
+   }
+
    /* From the IVB PRMs:
     *  "When an instruction is SIMD32, the low 16 bits of the execution mask
     *   are applied for both halves of the SIMD32 instruction. If different
@@ -6321,6 +6340,7 @@  fs_visitor::optimize()
    if (OPT(lower_load_payload)) {
       split_virtual_grfs();
       OPT(register_coalesce);
+      OPT(lower_simd_width);
       OPT(compute_to_mrf);
       OPT(dead_code_eliminate);
    }

Comments

On Thu, Aug 2, 2018 at 6:52 PM Kenneth Graunke <kenneth@whitecape.org>
wrote:

> When the SIMD16 Gen4-5 fragment shader payload contains source depth
> (g2-3), destination stencil (g4), and destination depth (g5-6), the
> single register of stencil makes the destination depth unaligned.
>
> We were generating this instruction in the RT write payload setup:
>
>    mov(16)   m14<1>F   g5<8,8,1>F   { align1 compr };
>
> which is illegal, instructions with a source region spanning more than
> one register need to be aligned to even registers.  This is because the
> hardware implicitly does (nr | 1) instead of (nr + 1) when splitting the
> compressed instruction into two mov(8)'s.
>

This seems like something worth validating.  Also, gen4 so maybe meh?

One nit below.  With that fixed and with or without validation, both are

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>


> I believe this would cause the hardware to load g5 twice, replicating
> subspan 0-1's destination depth to subspan 2-3.  This showed up as 2x2
> artifact blocks in both TIS-100 and Reicast.
>
> Normally, we rely on the register allocator to even-align our virtual
> GRFs.  But we don't control the payload, so we need to lower SIMD widths
> to make it work.  To fix this, we teach lower_simd_width about the
> restriction, and then call it again after lower_load_payload (which is
> what generates the offending MOV).
>
> Fixes: 8aee87fe4cce0a883867df3546db0e0a36908086 (i965: Use SIMD16 instead
> of SIMD8 on Gen4 when possible.)
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107212
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=13728
> Tested-by: Diego Viola <diego.viola@gmail.com>
> ---
>  src/intel/compiler/brw_fs.cpp | 20 ++++++++++++++++++++
>  1 file changed, 20 insertions(+)
>
> diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp
> index 20b89035e1f..78dafd9b706 100644
> --- a/src/intel/compiler/brw_fs.cpp
> +++ b/src/intel/compiler/brw_fs.cpp
> @@ -5115,6 +5115,25 @@ get_fpu_lowered_simd_width(const struct
> gen_device_info *devinfo,
>        }
>     }
>
> +   if (devinfo->gen < 6) {
> +      /* From the G45 PRM, Volume 4 Page 361:
> +       *
> +       *    "Operand Alignment Rule: With the exceptions listed below, a
> +       *     source/destination operand in general should be aligned to
> even
> +       *     256-bit physical register with a region size equal to two
> 256-bit
> +       *     physical registers."
> +       *
> +       * Normally we enforce this by allocating virtual registers to the
> +       * even-aligned class.  But we need to handle payload registers.
> +       */
> +      for (unsigned i = 0; i < inst->sources; i++) {
> +         if (inst->src[i].file == FIXED_GRF && (inst->src[i].nr & 1) &&
> +             inst->size_read(i) >= 2 * REG_SIZE) {
>

How about just inst->size_read(i) > REG_SIZE so that it's "more than one"
rather than "at least two"?


> +            max_width = MIN2(max_width, 8);
> +         }
> +      }
> +   }
> +
>     /* From the IVB PRMs:
>      *  "When an instruction is SIMD32, the low 16 bits of the execution
> mask
>      *   are applied for both halves of the SIMD32 instruction. If
> different
> @@ -6321,6 +6340,7 @@ fs_visitor::optimize()
>     if (OPT(lower_load_payload)) {
>        split_virtual_grfs();
>        OPT(register_coalesce);
> +      OPT(lower_simd_width);
>        OPT(compute_to_mrf);
>        OPT(dead_code_eliminate);
>     }
> --
> 2.18.0
>
> _______________________________________________
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>