[Mesa-dev,73/95] i965/vec4: set force_vstride0 on any 64-bit source that has subnr > 0

Submitted by Iago Toral Quiroga on July 19, 2016, 10:41 a.m.

Details

Message ID 1468924892-6910-74-git-send-email-itoral@igalia.com
State New
Series "i965 Haswell ARB_gpu_shader_fp64 / OpenGL 4.0" ( rev: 2 1 ) in Mesa

Commit Message

From: Samuel Iglesias Gonsálvez <siglesias@igalia.com>

Sometimes we emit code that has subnr > 0 to select the second half
of a DF register (components Z or W). For example, the 64-bit
shuffling code does this. For that code to work properly we need to
make sure that we use vstride=0 on these source registers too
(that is, the flag force_vstride0 must be set on the source).

Instead of always having to remember to force the vstride to 0 in
these cases, it is better to do it here, together with the other
cases where we set this flag. This way there is only one place in
the driver where we handle this.
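
As a rough illustration only (this is not the driver code), the rule being
centralized can be sketched in isolation. SrcRegSketch and
apply_vstride0_rule are hypothetical names made up for this sketch; only
the subnr and force_vstride0 fields correspond to what the patch touches.

```cpp
#include <cassert>

// Hypothetical, simplified stand-in for a source register. Only the two
// fields discussed in this patch are modelled.
struct SrcRegSketch {
   unsigned subnr = 0;           // non-zero selects the second half of a DF register
   bool force_vstride0 = false;  // request a vertical stride of 0 for this source
};

// Single place where the rule is applied: any 64-bit source with subnr > 0
// gets force_vstride0. Returns true if the flag was newly set.
inline bool apply_vstride0_rule(SrcRegSketch &src)
{
   if (src.subnr == 0 || src.force_vstride0)
      return false;
   src.force_vstride0 = true;
   return true;
}
```

With a helper like this, callers that set subnr never touch the flag
themselves, which is the point of moving the assignment in the patch.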

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Signed-off-by: Iago Toral Quiroga <itoral@igalia.com>
---
 src/mesa/drivers/dri/i965/brw_vec4.cpp | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

Patch

diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index 1332d96..9672b2c 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
@@ -2298,7 +2298,6 @@  vec4_visitor::expand_64bit_swizzle_to_32bit()
                /* Subnr must be in units of bytes for FIXED_GRF */
                if (inst->src[arg].file == FIXED_GRF)
                   inst->src[arg].subnr *= type_sz(inst->src[arg].type);
-               inst->src[arg].force_vstride0 = true;
             } else {
                inst->src[arg].reg_offset += 1;
             }
@@ -2311,6 +2310,18 @@  vec4_visitor::expand_64bit_swizzle_to_32bit()
                inst->src[arg].force_vstride0 = true;
             }
          }
+
+         /* Any DF source with a subnr > 0 is intended to address the second
+          * half of a register and needs a vertical stride of 0 so we:
+          *
+          * 1. Don't violate register region restrictions, when execsize > 2
+          *    (we only use exec sizes of 4 and 8, so always)
+          * 2. Activate the gen7 instruction decompression bug exploit, when
+          *    execsize == 8.
+          */
+         if (inst->src[arg].subnr)
+            inst->src[arg].force_vstride0 = true;
+
          inst->src[arg].swizzle = BRW_SWIZZLE4(swizzle * 2, swizzle * 2 + 1,
                                                swizzle * 2, swizzle * 2 + 1);
          progress = true;