[Mesa-dev,55/95] i965/vec4: teach register coalescing about 64-bit

Submitted by Iago Toral Quiroga on July 19, 2016, 10:40 a.m.

Details

Message ID 1468924892-6910-56-git-send-email-itoral@igalia.com
State New
Series "i965 Haswell ARB_gpu_shader_fp64 / OpenGL 4.0" ( rev: 2 1 ) in Mesa


Commit Message

Specifically, at least for now, we don't want to deal with the fact that
channels in fp64 instructions are twice as large as in 32-bit instructions,
so prevent coalescing when the two instructions read sources whose types
have different sizes.
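A minimal sketch of why mixing type sizes is a problem and what the new check compares. It assumes a simplified, hypothetical `SimpleInst` with only the source-0 type size in bytes (4 for 32-bit types, 8 for DF). This is not Mesa's `vec4_instruction`; the real patch compares `type_sz(inst->src[0].type)` with `type_sz(scan_inst->src[0].type)`.

```cpp
#include <cassert>

// Hypothetical, simplified stand-in for a vec4 instruction: only the
// size in bytes of the source-0 type is modelled (4 for F/D/UD, 8 for DF).
struct SimpleInst {
   unsigned src0_type_size;
};

// Byte offset of a channel inside a register. Channel N of a 32-bit
// type starts at 4*N, but channel N of a 64-bit type starts at 8*N, so
// the same channel index refers to different bytes for F and DF.
constexpr unsigned channel_byte_offset(unsigned channel, unsigned type_size)
{
   return channel * type_size;
}

// Mirrors the first new check: only attempt coalescing when both
// instructions read source 0 with the same type size.
bool same_type_size(const SimpleInst &inst, const SimpleInst &scan_inst)
{
   assert(inst.src0_type_size > 0 && scan_inst.src0_type_size > 0);
   return inst.src0_type_size == scan_inst.src0_type_size;
}
```

For example, channel 1 of a DF register starts at byte 8, the same byte where channel 2 of an F register starts. Bailing out avoids having to translate writemasks and swizzles between the two layouts.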

Also, when we coalesce a register written by another MOV, we should check
that we read the same amount of data that MOV writes; otherwise it might not
be safe to eliminate it. This can happen, for example, when we have split
fp64 MOVs with an exec size of 4 that write one register each, followed by a
MOV with an exec size of 8 that reads both registers. We want to prevent the
pass from concluding that it can coalesce from the first split MOV alone.
Ideally the pass would see that it can coalesce from both split MOVs instead,
but for now we keep it simple.
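The split-MOV scenario can be worked through numerically. This sketch assumes a 32-byte GRF (the value of `REG_SIZE` on i965) and contiguous, stride-1 regions. The hypothetical `regs_for()` is only a rough stand-in for the register footprint that the real `regs_written` and `regs_read(0)` describe, and it ignores strides and offsets.

```cpp
#include <cassert>

// Assumption: a GRF is 32 bytes, matching REG_SIZE on i965.
constexpr unsigned kRegSize = 32;

constexpr unsigned div_round_up(unsigned a, unsigned b)
{
   return (a + b - 1) / b;
}

// Simplified footprint: registers covered by exec_size channels of
// type_size bytes each, assuming a contiguous (stride 1) region.
constexpr unsigned regs_for(unsigned exec_size, unsigned type_size)
{
   return div_round_up(exec_size * type_size, kRegSize);
}

// Mirrors the second new check: the MOV being coalesced must read
// exactly as many registers as scan_inst wrote.
bool footprints_match(unsigned scan_regs_written, unsigned inst_regs_read)
{
   assert(scan_regs_written > 0);
   return scan_regs_written == inst_regs_read;
}
```

A split fp64 MOV with exec size 4 writes `regs_for(4, 8) == 1` register, while the exec-size-8 fp64 MOV reads `regs_for(8, 8) == 2`. The footprints differ, so the pass bails instead of coalescing from the first split MOV alone.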

Finally, the pass doesn't support coalescing of multiple registers, but
normal SIMD4x2 double-precision instructions naturally write two registers
(one per vertex), and there is no reason to disallow coalescing in that case.
Change the restriction to bail only when an instruction writes more than 8
channels, whether those channels are 32-bit or 64-bit.
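The new bail-out condition counts channels as bytes written divided by the destination type size. Below is a standalone sketch of that arithmetic. It assumes `REG_SIZE` is 32 bytes and re-implements `DIV_ROUND_UP` locally. The function name `writes_too_many_channels` is hypothetical and does not exist in Mesa.

```cpp
#include <cassert>

// Assumption: a GRF is 32 bytes, matching REG_SIZE on i965.
constexpr unsigned kRegSize = 32;

// Local re-implementation of Mesa's DIV_ROUND_UP(a, b).
constexpr unsigned div_round_up(unsigned a, unsigned b)
{
   return (a + b - 1) / b;
}

// Mirrors the patched check:
//   DIV_ROUND_UP(REG_SIZE * scan_inst->regs_written,
//                type_sz(scan_inst->dst.type)) > 8
bool writes_too_many_channels(unsigned regs_written, unsigned dst_type_size)
{
   assert(dst_type_size > 0);
   return div_round_up(kRegSize * regs_written, dst_type_size) > 8;
}
```

One 32-bit register (8 channels) and two DF registers (also 8 channels) both pass. Two 32-bit registers (16 channels) still bail. Under the old `regs_written > 1` test, the two-register DF case would have been rejected as well.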
---
 src/mesa/drivers/dri/i965/brw_vec4.cpp | 20 ++++++++++++++++++--
 1 file changed, 18 insertions(+), 2 deletions(-)


diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index a366548..1b190ab 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
@@ -1181,6 +1181,19 @@  vec4_visitor::opt_register_coalesce()
                   scan_inst->dst.type == scan_inst->src[0].type))
                break;
 
+            /* Only allow coalescing between registers of the same type size.
+             * Otherwise we would need to make the pass aware of the fact that
+             * channel sizes are different for single and double precision.
+             */
+            if (type_sz(inst->src[0].type) != type_sz(scan_inst->src[0].type))
+               break;
+
+            /* Check that scan_inst writes the same amount of data that we read
+             * in the instruction
+             */
+            if (scan_inst->regs_written != inst->regs_read(0))
+               break;
+
             /* If we can't handle the swizzle, bail. */
             if (!scan_inst->can_reswizzle(devinfo, inst->dst.writemask,
                                           inst->src[0].swizzle,
@@ -1188,8 +1201,11 @@  vec4_visitor::opt_register_coalesce()
                break;
             }
 
-            /* This doesn't handle coalescing of multiple registers. */
-            if (scan_inst->regs_written > 1)
+            /* This doesn't handle coalescing writes larger than 8 channels
+             * (1 register for single-precision and two for double-precision)
+             */
+            if (DIV_ROUND_UP(REG_SIZE * scan_inst->regs_written,
+                             type_sz(scan_inst->dst.type)) > 8)
                break;
 
 	    /* Mark which channels we found unconditional writes for. */