[Mesa-dev,2/2] i965/fs: Improve search for the argument source in opt_zero_samples

Submitted by Neil Roberts on Oct. 20, 2015, 9:16 a.m.

Details

Message ID 1445332561-4216-2-git-send-email-neil@linux.intel.com
State New

Commit Message

Neil Roberts Oct. 20, 2015, 9:16 a.m.
The opt_zero_samples optimisation tries to find the corresponding
load_payload instruction for each sample instruction. However, it was
previously only looking at the immediately preceding instruction. This
patch makes it search back within the block for the last instruction
to write to each individual argument of the send message. There are
two reasons to do this:

Firstly, on Gen<=6 load_payload isn't used and there is a separate
message register file. This version of the optimisation also finds
MOVs into the MRF registers, so it now also works on SNB.
Unfortunately this doesn't show up in a shader-db report because the
dead code eliminator doesn't do anything for instructions writing to
MRF registers, so it can't remove the redundant MOVs. However, if I
hack Mesa to report the message lengths instead of the instruction
counts then it shows this:

total mlen in shared programs: 2600373 -> 2574663 (-0.99%)
mlen in affected programs:     237077 -> 211367 (-10.84%)
helped:                        3508
HURT:                          0

I haven't tested whether reducing the message length without
decreasing the instruction count is actually a performance benefit but
it's hard to imagine that it could possibly be a disadvantage. It also
paves the way to reduce the instruction count later if someone
improves the dead code eliminator.

Secondly, it could help on other gens because the load_payload
instruction can sometimes become separated from the corresponding
send instruction, and the old version wouldn't work in those cases.
Currently this doesn't seem to make any difference in practice because
the register coalescer is run after this optimisation. However, this
version seems more robust.
---
 src/mesa/drivers/dri/i965/brw_fs.cpp | 54 ++++++++++++++++++++++++++++--------
 1 file changed, 42 insertions(+), 12 deletions(-)
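
For readers who want the idea without the i965 IR, here is a rough,
standalone sketch of the backward search and the mlen trimming loop
described in the commit message. It is not the Mesa code: Op, Inst,
last_writer_is_zero and trim_mlen are hypothetical names invented for
illustration, only MOV-style writes are modelled (no load_payload),
and it assumes SIMD8 so that each argument occupies exactly one
message register.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified stand-ins for the i965 IR; not Mesa types.
enum class Op { Mov, Other };

struct Inst {
    Op op;
    int dst_reg;      // message register written by this instruction
    bool src_is_zero; // true if the source is an immediate zero
    bool partial;     // true if only part of the register is written
};

// Walk backwards from the instruction just before the send and look at
// the most recent writer of `reg`. Only a full MOV of zero counts; any
// other writer, or no writer in the block, is treated as "not zero".
static bool last_writer_is_zero(const std::vector<Inst> &block,
                                int send_idx, int reg)
{
    for (int i = send_idx - 1; i >= 0; --i) {
        const Inst &inst = block[i];
        if (inst.dst_reg != reg)
            continue;              // does not touch the register
        if (inst.op == Op::Mov && !inst.partial)
            return inst.src_is_zero;
        return false;              // something unknown wrote it
    }
    return false;
}

// Drop trailing zero arguments from the message, but always keep the
// header and the first parameter. One register per argument (SIMD8).
static int trim_mlen(const std::vector<Inst> &block, int send_idx,
                     int base_reg, int mlen, int header_size)
{
    assert(mlen >= header_size);
    while (mlen > header_size + 1) {
        const int last_reg = base_reg + mlen - 1;
        if (!last_writer_is_zero(block, send_idx, last_reg))
            break;
        mlen -= 1;
    }
    return mlen;
}
```

In this model the writers of the trailing arguments do not need to be
adjacent to the send: unrelated instructions in between are skipped,
while an unknown writer to the same register stops the trimming.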

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
index 97d7fd7..f87a5a7 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -2150,6 +2150,41 @@  fs_visitor::opt_algebraic()
    return progress;
 }
 
+static bool
+last_texture_source_is_zero(const fs_inst *send_inst)
+{
+   int reg_offset = send_inst->mlen - send_inst->exec_size / 8;
+   fs_reg src;
+
+   /* Get the last argument of the texture instruction */
+   if (send_inst->is_send_from_grf())
+      src = byte_offset(send_inst->src[0], reg_offset * 32);
+   else
+      src = fs_reg(MRF, send_inst->base_mrf + reg_offset);
+
+   /* Look for the last instruction that writes to the source */
+   foreach_inst_in_block_reverse_starting_from(const fs_inst, inst, send_inst) {
+      if (inst->overwrites_reg(src)) {
+         if (inst->opcode == SHADER_OPCODE_LOAD_PAYLOAD) {
+            const int src_num = ((send_inst->mlen - send_inst->header_size) /
+                                 (inst->exec_size / 8) +
+                                 inst->header_size - 1);
+            return inst->src[src_num].is_zero();
+         } else if (inst->opcode == BRW_OPCODE_MOV) {
+            if (inst->is_partial_write() || !inst->dst.equals(src))
+               return false;
+
+            return inst->src[0].is_zero();
+         }
+
+         /* Something unknown is writing to the src */
+         break;
+      }
+   }
+
+   return false;
+}
+
 /**
  * Optimize sample messages that have constant zero values for the trailing
  * texture coordinates. We can just reduce the message length for these
@@ -2173,12 +2208,6 @@  fs_visitor::opt_zero_samples()
       if (!inst->is_tex())
          continue;
 
-      fs_inst *load_payload = (fs_inst *) inst->prev;
-
-      if (load_payload->is_head_sentinel() ||
-          load_payload->opcode != SHADER_OPCODE_LOAD_PAYLOAD)
-         continue;
-
       /* We don't want to remove the message header or the first parameter.
        * Removing the first parameter is not allowed, see the Haswell PRM
        * volume 7, page 149:
@@ -2186,12 +2215,13 @@  fs_visitor::opt_zero_samples()
        *     "Parameter 0 is required except for the sampleinfo message, which
        *      has no parameter 0"
        */
-      while (inst->mlen > inst->header_size + inst->exec_size / 8 &&
-             load_payload->src[(inst->mlen - inst->header_size) /
-                               (inst->exec_size / 8) +
-                               inst->header_size - 1].is_zero()) {
-         inst->mlen -= inst->exec_size / 8;
-         progress = true;
+      while (inst->mlen > inst->header_size + inst->exec_size / 8) {
+         if (last_texture_source_is_zero(inst)) {
+            inst->mlen -= inst->exec_size / 8;
+            progress = true;
+         } else {
+            break;
+         }
       }
    }