[Mesa-dev,49/95] i965/vec4: implement access to DF source components Z/W

Submitted by Iago Toral Quiroga on July 19, 2016, 10:40 a.m.

Details

Message ID 1468924892-6910-50-git-send-email-itoral@igalia.com
State New
Series "i965 Haswell ARB_gpu_shader_fp64 / OpenGL 4.0" ( rev: 2 1 ) in Mesa

Commit Message

The general idea is that with 32-bit swizzles we cannot address DF
components Z/W directly, so instead we select the region that starts
at the middle of the SIMD register and use X/Y swizzles.

This approach has a caveat, however: it violates register region
restrictions unless we do some sort of SIMD splitting.

Alternatively, we can accomplish what we need without SIMD splitting
by exploiting the gen7 hardware decompression bug for instructions
with a vstride=0. For example, an instruction like this:

mov(8) r2.x:DF r0.2<0>xyzw:DF

activates the hardware bug and produces this region:

Component: x0   y0   z0   w0   x1   y1   z1   w1
Register:  r0.2 r0.3 r0.2 r0.3 r1.2 r1.3 r1.2 r1.3

Where r0.2 and r0.3 are r0.z:DF for the first vertex of the SIMD4x2
execution and r1.2 and r1.3 are the same for the second vertex.

Using this to our advantage, we can select r0.z:DF with
r0.2<0,2,1>.xyxy and r0.w:DF with r0.2<0,2,1>.zwzw without needing
to split the instruction.
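
As an illustration only, the table above can be reproduced with a few
lines of C++. The names RegionSlot, slot_for_channel and slot_name are
hypothetical and do not exist in Mesa; the sketch mirrors the table as
written in this message and is not a model of the hardware itself.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: reproduces the Component/Register table given in
// the commit message for "mov(8) r2.x:DF r0.2<0>xyzw:DF".
struct RegionSlot {
    unsigned reg;     // register number as printed in the table (r0 or r1)
    unsigned subreg;  // subregister index as printed in the table (.2 or .3)
};

RegionSlot slot_for_channel(unsigned channel)
{
    assert(channel < 8);  // SIMD4x2: 4 components for each of 2 vertices
    RegionSlot slot;
    slot.reg = channel / 4;           // first vertex reads r0, second reads r1
    slot.subreg = 2 + (channel % 2);  // the .2/.3 pair repeats for x/y and z/w
    return slot;
}

// Formats a slot the way the table prints it, e.g. "r0.2".
std::string slot_name(unsigned channel)
{
    RegionSlot s = slot_for_channel(channel);
    return "r" + std::to_string(s.reg) + "." + std::to_string(s.subreg);
}
```

Channels 0..7 correspond to x0, y0, z0, w0, x1, y1, z1, w1 in the table.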

This patch makes the swizzle translation pass handle Z/W swizzles by
turning them into X/Y respectively and setting subnr to point at the
middle of the register, together with a flag that indicates that we
want to use vstride=0 with them. Then, when we convert to hardware
registers, we check for this flag and set the vstride accordingly.
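
A minimal standalone sketch of that translation, following the logic in
the diff below. SrcSketch and expand_component are hypothetical names
used only for illustration, the struct is a stand-in for the real source
register, and the FIXED_GRF byte scaling of subnr is left out for
brevity.

```cpp
#include <array>
#include <cassert>

// Hypothetical stand-in for the source register fields the patch touches.
struct SrcSketch {
    bool is_uniform = false;       // stands in for file == UNIFORM
    unsigned subnr = 0;
    unsigned reg_offset = 0;
    bool force_vstride0 = false;
    std::array<unsigned, 4> swizzle32{};  // 0=x, 1=y, 2=z, 3=w
};

// component64 is the single 64-bit component being read: 0..3 for X/Y/Z/W.
void expand_component(SrcSketch &src, unsigned component64)
{
    assert(component64 < 4);
    if (component64 >= 2) {
        if (!src.is_uniform) {
            // Point at the second half of the register and request vstride=0.
            src.subnr = 2;
            src.force_vstride0 = true;
        } else {
            // Uniforms are laid out in vec4 units, so step to the next one.
            src.reg_offset += 1;
        }
        component64 -= 2;  // Z/W become X/Y
    }
    // Each 64-bit component maps to a pair of 32-bit swizzle channels.
    src.swizzle32 = {component64 * 2, component64 * 2 + 1,
                     component64 * 2, component64 * 2 + 1};
}
```

With this sketch, reading W from a non-uniform source yields subnr 2,
the vstride flag set, and a .zwzw 32-bit swizzle, which matches the
r0.2<0,2,1>.zwzw example above.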

Of course, this only works for gen7, but that is the only hardware
platform where we implement align16/fp64 at the moment.

v2: Fix subnr for FIXED_GRF (Samuel)

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
---
 src/mesa/drivers/dri/i965/brw_vec4.cpp | 42 +++++++++++++++++++++++++++++++++-
 1 file changed, 41 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index bfbbd96..ea1e530 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
@@ -1861,11 +1861,32 @@  vec4_visitor::convert_to_hw_regs()
             unsigned width = REG_SIZE / 2 / MAX2(4, type_size);
             reg = brw_vecn_grf(width, src.nr + src.reg_offset, 0);
             reg.type = src.type;
+            reg.subnr = src.subnr * type_size;
             reg.swizzle = src.swizzle;
             reg.abs = src.abs;
             reg.negate = src.negate;
             if (type_size == 8) {
-               reg.vstride = BRW_VERTICAL_STRIDE_2;
+               if (src.force_vstride0) {
+                  /* We use subnr to select components Z/W of DF operands using
+                   * X/Y swizzles. To do this we also need to set the vertical
+                   * stride to 0 so we don't violate register region
+                   * restrictions.
+                   *
+                   * In gen7, setting the vertical stride to 0 on compressed
+                   * instructions exploits a gen7 hardware decompression bug
+                   * that allows us to select the second half of a dvec4 for
+                   * both vertices in a SIMD4x2 execution.
+                   *
+                   * FIXME: This only works for gen7. If we ever support
+                   * align16/fp64 in other hardware where we can't exploit this
+                   * bug we would also need to do appropriate SIMD splitting of
+                   * these instructions.
+                   */
+                  assert(devinfo->gen == 7);
+                  reg.vstride = BRW_VERTICAL_STRIDE_0;
+               } else {
+                  reg.vstride = BRW_VERTICAL_STRIDE_2;
+               }
             }
             break;
          }
@@ -2171,7 +2192,26 @@  vec4_visitor::expand_64bit_swizzle_to_32bit()
          /* This pass assumes that we have scalarized all DF instructions */
          assert(brw_is_single_value_swizzle(inst->src[arg].swizzle));
 
+         /* To gain access to Z/W components we need to use subnr to select
+          * the second half of the DF register and then use an X/Y swizzle to
+          * select Z/W respectively.
+          */
          unsigned swizzle = BRW_GET_SWZ(inst->src[arg].swizzle, 0);
+         if (swizzle >= 2) {
+            /* Uniforms work in units of a vec4, so to select the second
+             * half of a dvec3/4 uniform, increase reg_offset by one.
+             */
+            if (inst->src[arg].file != UNIFORM) {
+               inst->src[arg].subnr = 2;
+               /* Subnr must be in units of bytes for FIXED_GRF */
+               if (inst->src[arg].file == FIXED_GRF)
+                  inst->src[arg].subnr *= type_sz(inst->src[arg].type);
+               inst->src[arg].force_vstride0 = true;
+            } else {
+               inst->src[arg].reg_offset += 1;
+            }
+            swizzle -= 2;
+         }
          inst->src[arg].swizzle = BRW_SWIZZLE4(swizzle * 2, swizzle * 2 + 1,
                                                swizzle * 2, swizzle * 2 + 1);
          progress = true;