[v3,26/42] intel/compiler: split is_partial_write() into two variants

Submitted by Iago Toral Quiroga on Jan. 15, 2019, 1:53 p.m.

Details

Message ID 20190115135414.2313-27-itoral@igalia.com
State New
Headers show
Series "intel: VK_KHR_shader_float16_int8 implementation" ( rev: 5 4 ) in Mesa

Not browsing as part of any series.

Commit Message

Iago Toral Quiroga Jan. 15, 2019, 1:53 p.m.
This function is used in two different scenarios that for 32-bit
instructions are the same, but for 16-bit instructions are not.

One scenario is that in which we are working at a SIMD8 register
level and we need to know if a register is fully defined or written.
This is useful, for example, in the context of liveness analysis or
register allocation, where we work with units of registers.

The other scenario is that in which we want to know if an instruction
is writing a full scalar component or just some subset of it. This is
useful, for example, in the context of some optimization passes
like copy propagation.

For 32-bit instructions (or larger), a SIMD8 dispatch will always write
at least a full SIMD8 register (32B) if the write is not partial. The
function is_partial_write() checks this to determine if we have a partial
write. However, when we deal with 16-bit instructions, that logic disables
some optimizations that should be safe. For example, a SIMD8 16-bit MOV will
only update half of a SIMD register, but it is still a complete write of the
variable for a SIMD8 dispatch, so we should not prevent copy propagation in
this scenario because we don't write all 32 bytes in the SIMD register
or because the write starts at offset 16B (wehere we pack components Y or
W of 16-bit vectors).

This is a problem for SIMD8 executions (VS, TCS, TES, GS) of 16-bit
instructions, which lose a number of optimizations because of this, most
important of which is copy-propagation.

This patch splits is_partial_write() into is_partial_reg_write(), which
represents the current is_partial_write(), useful for things like
liveness analysis, and is_partial_var_write(), which considers
the dispatch size to check if we are writing a full variable (rather
than a full register) to decide if the write is partial or not, which
is what we really want in many optimization passes.

Then the patch goes on and rewrites all uses of is_partial_write() to use
one or the other version. Specifically, we use is_partial_var_write()
in the following places: copy propagation, cmod propagation, common
subexpression elimination, saturate propagation and sel peephole.

Notice that the semantics of is_partial_var_write() exactly match the
current implementation of is_partial_write() for anything that is
32-bit or larger, so no changes are expected for 32-bit instructions.

Tested against ~5000 tests involving 16-bit instructions in CTS produced
the following changes in instruction counts:

            Patched  |     Master    |    %    |

Patch hide | download patch | download mbox

================================================
SIMD8  |    621,900  |    706,721    | -12.00% |
================================================
SIMD16 |     93,252  |     93,252    |   0.00% |
================================================

As expected, the change only affects SIMD8 dispatches.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
---
 src/intel/compiler/brw_fs.cpp                 | 31 +++++++++++++++----
 .../compiler/brw_fs_cmod_propagation.cpp      | 20 ++++++------
 .../compiler/brw_fs_copy_propagation.cpp      |  8 ++---
 src/intel/compiler/brw_fs_cse.cpp             |  3 +-
 .../compiler/brw_fs_dead_code_eliminate.cpp   |  2 +-
 src/intel/compiler/brw_fs_live_variables.cpp  |  2 +-
 src/intel/compiler/brw_fs_reg_allocate.cpp    |  2 +-
 .../compiler/brw_fs_register_coalesce.cpp     |  2 +-
 .../compiler/brw_fs_saturate_propagation.cpp  |  7 +++--
 src/intel/compiler/brw_fs_sel_peephole.cpp    |  4 +--
 src/intel/compiler/brw_ir_fs.h                |  3 +-
 11 files changed, 54 insertions(+), 30 deletions(-)

diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp
index d6096cd667d..77c955ac435 100644
--- a/src/intel/compiler/brw_fs.cpp
+++ b/src/intel/compiler/brw_fs.cpp
@@ -716,14 +716,33 @@  fs_visitor::limit_dispatch_width(unsigned n, const char *msg)
  * it.
  */
 bool
-fs_inst::is_partial_write() const
+fs_inst::is_partial_reg_write() const
 {
    return ((this->predicate && this->opcode != BRW_OPCODE_SEL) ||
-           (this->exec_size * type_sz(this->dst.type)) < 32 ||
            !this->dst.is_contiguous() ||
+           (this->exec_size * type_sz(this->dst.type)) < REG_SIZE ||
            this->dst.offset % REG_SIZE != 0);
 }
 
+/**
+ * Returns true if the instruction has a flag that means it won't
+ * update an entire variable for the given dispatch width.
+ *
+ * This is only different from is_partial_reg_write() for SIMD8
+ * dispatches of 16-bit (or smaller) instructions.
+ */
+bool
+fs_inst::is_partial_var_write(uint32_t dispatch_width) const
+{
+   const uint32_t type_size = type_sz(this->dst.type);
+   uint32_t var_size = MIN2(REG_SIZE, dispatch_width * type_size);
+
+   return ((this->predicate && this->opcode != BRW_OPCODE_SEL) ||
+           !this->dst.is_contiguous() ||
+           (this->exec_size * type_sz(this->dst.type)) < var_size ||
+           this->dst.offset % var_size != 0);
+}
+
 unsigned
 fs_inst::components_read(unsigned i) const
 {
@@ -2896,7 +2915,7 @@  fs_visitor::opt_register_renaming()
       if (depth == 0 &&
           inst->dst.file == VGRF &&
           alloc.sizes[inst->dst.nr] * REG_SIZE == inst->size_written &&
-          !inst->is_partial_write()) {
+          !inst->is_partial_reg_write()) {
          if (remap[dst] == ~0u) {
             remap[dst] = dst;
          } else {
@@ -3099,7 +3118,7 @@  fs_visitor::compute_to_mrf()
       next_ip++;
 
       if (inst->opcode != BRW_OPCODE_MOV ||
-	  inst->is_partial_write() ||
+	  inst->is_partial_reg_write() ||
 	  inst->dst.file != MRF || inst->src[0].file != VGRF ||
 	  inst->dst.type != inst->src[0].type ||
 	  inst->src[0].abs || inst->src[0].negate ||
@@ -3132,7 +3151,7 @@  fs_visitor::compute_to_mrf()
 	     * that writes that reg, but it would require smarter
 	     * tracking.
 	     */
-	    if (scan_inst->is_partial_write())
+	    if (scan_inst->is_partial_reg_write())
 	       break;
 
             /* Handling things not fully contained in the source of the copy
@@ -3444,7 +3463,7 @@  fs_visitor::remove_duplicate_mrf_writes()
       if (inst->opcode == BRW_OPCODE_MOV &&
 	  inst->dst.file == MRF &&
 	  inst->src[0].file != ARF &&
-	  !inst->is_partial_write()) {
+	  !inst->is_partial_reg_write()) {
          last_mrf_move[inst->dst.nr] = inst;
       }
    }
diff --git a/src/intel/compiler/brw_fs_cmod_propagation.cpp b/src/intel/compiler/brw_fs_cmod_propagation.cpp
index 5fb522f810f..7bb5c9afbc9 100644
--- a/src/intel/compiler/brw_fs_cmod_propagation.cpp
+++ b/src/intel/compiler/brw_fs_cmod_propagation.cpp
@@ -50,13 +50,13 @@ 
 
 static bool
 cmod_propagate_cmp_to_add(const gen_device_info *devinfo, bblock_t *block,
-                          fs_inst *inst)
+                          fs_inst *inst, unsigned dispatch_width)
 {
    bool read_flag = false;
 
    foreach_inst_in_block_reverse_starting_from(fs_inst, scan_inst, inst) {
       if (scan_inst->opcode == BRW_OPCODE_ADD &&
-          !scan_inst->is_partial_write() &&
+          !scan_inst->is_partial_var_write(dispatch_width) &&
           scan_inst->exec_size == inst->exec_size) {
          bool negate;
 
@@ -126,7 +126,7 @@  cmod_propagate_cmp_to_add(const gen_device_info *devinfo, bblock_t *block,
  */
 static bool
 cmod_propagate_not(const gen_device_info *devinfo, bblock_t *block,
-                   fs_inst *inst)
+                   fs_inst *inst, unsigned dispatch_width)
 {
    const enum brw_conditional_mod cond = brw_negate_cmod(inst->conditional_mod);
    bool read_flag = false;
@@ -141,7 +141,7 @@  cmod_propagate_not(const gen_device_info *devinfo, bblock_t *block,
              scan_inst->opcode != BRW_OPCODE_AND)
             break;
 
-         if (scan_inst->is_partial_write() ||
+         if (scan_inst->is_partial_var_write(dispatch_width) ||
              scan_inst->dst.offset != inst->src[0].offset ||
              scan_inst->exec_size != inst->exec_size)
             break;
@@ -166,7 +166,9 @@  cmod_propagate_not(const gen_device_info *devinfo, bblock_t *block,
 }
 
 static bool
-opt_cmod_propagation_local(const gen_device_info *devinfo, bblock_t *block)
+opt_cmod_propagation_local(const gen_device_info *devinfo,
+                           bblock_t *block,
+                           unsigned dispatch_width)
 {
    bool progress = false;
    int ip = block->end_ip + 1;
@@ -219,14 +221,14 @@  opt_cmod_propagation_local(const gen_device_info *devinfo, bblock_t *block)
        */
       if (inst->opcode == BRW_OPCODE_CMP && !inst->src[1].is_zero()) {
          if (brw_reg_type_is_floating_point(inst->src[0].type) &&
-             cmod_propagate_cmp_to_add(devinfo, block, inst))
+             cmod_propagate_cmp_to_add(devinfo, block, inst, dispatch_width))
             progress = true;
 
          continue;
       }
 
       if (inst->opcode == BRW_OPCODE_NOT) {
-         progress = cmod_propagate_not(devinfo, block, inst) || progress;
+         progress = cmod_propagate_not(devinfo, block, inst, dispatch_width) || progress;
          continue;
       }
 
@@ -234,7 +236,7 @@  opt_cmod_propagation_local(const gen_device_info *devinfo, bblock_t *block)
       foreach_inst_in_block_reverse_starting_from(fs_inst, scan_inst, inst) {
          if (regions_overlap(scan_inst->dst, scan_inst->size_written,
                              inst->src[0], inst->size_read(0))) {
-            if (scan_inst->is_partial_write() ||
+            if (scan_inst->is_partial_var_write(dispatch_width) ||
                 scan_inst->dst.offset != inst->src[0].offset ||
                 scan_inst->exec_size != inst->exec_size)
                break;
@@ -342,7 +344,7 @@  fs_visitor::opt_cmod_propagation()
    bool progress = false;
 
    foreach_block_reverse(block, cfg) {
-      progress = opt_cmod_propagation_local(devinfo, block) || progress;
+      progress = opt_cmod_propagation_local(devinfo, block, dispatch_width) || progress;
    }
 
    if (progress)
diff --git a/src/intel/compiler/brw_fs_copy_propagation.cpp b/src/intel/compiler/brw_fs_copy_propagation.cpp
index 77f2749ba04..4e20ddb683a 100644
--- a/src/intel/compiler/brw_fs_copy_propagation.cpp
+++ b/src/intel/compiler/brw_fs_copy_propagation.cpp
@@ -515,7 +515,7 @@  fs_visitor::try_copy_propagate(fs_inst *inst, int arg, acp_entry *entry)
    /* Compute the first component of the copy that the instruction is
     * reading, and the base byte offset within that component.
     */
-   assert(entry->dst.offset % REG_SIZE == 0 && entry->dst.stride == 1);
+   assert(entry->dst.stride == 1);
    const unsigned component = rel_offset / type_sz(entry->dst.type);
    const unsigned suboffset = rel_offset % type_sz(entry->dst.type);
 
@@ -793,7 +793,7 @@  fs_visitor::try_constant_propagate(fs_inst *inst, acp_entry *entry)
 }
 
 static bool
-can_propagate_from(fs_inst *inst)
+can_propagate_from(fs_inst *inst, unsigned dispatch_width)
 {
    return (inst->opcode == BRW_OPCODE_MOV &&
            inst->dst.file == VGRF &&
@@ -804,7 +804,7 @@  can_propagate_from(fs_inst *inst)
             inst->src[0].file == UNIFORM ||
             inst->src[0].file == IMM) &&
            inst->src[0].type == inst->dst.type &&
-           !inst->is_partial_write());
+           !inst->is_partial_var_write(dispatch_width));
 }
 
 /* Walks a basic block and does copy propagation on it using the acp
@@ -856,7 +856,7 @@  fs_visitor::opt_copy_propagation_local(void *copy_prop_ctx, bblock_t *block,
       /* If this instruction's source could potentially be folded into the
        * operand of another instruction, add it to the ACP.
        */
-      if (can_propagate_from(inst)) {
+      if (can_propagate_from(inst, dispatch_width)) {
          acp_entry *entry = ralloc(copy_prop_ctx, acp_entry);
          entry->dst = inst->dst;
          entry->src = inst->src[0];
diff --git a/src/intel/compiler/brw_fs_cse.cpp b/src/intel/compiler/brw_fs_cse.cpp
index 6859733d58c..56813df2d2a 100644
--- a/src/intel/compiler/brw_fs_cse.cpp
+++ b/src/intel/compiler/brw_fs_cse.cpp
@@ -247,7 +247,8 @@  fs_visitor::opt_cse_local(bblock_t *block)
    int ip = block->start_ip;
    foreach_inst_in_block(fs_inst, inst, block) {
       /* Skip some cases. */
-      if (is_expression(this, inst) && !inst->is_partial_write() &&
+      if (is_expression(this, inst) &&
+          !inst->is_partial_var_write(dispatch_width) &&
           ((inst->dst.file != ARF && inst->dst.file != FIXED_GRF) ||
            inst->dst.is_null()))
       {
diff --git a/src/intel/compiler/brw_fs_dead_code_eliminate.cpp b/src/intel/compiler/brw_fs_dead_code_eliminate.cpp
index eeb71dd2b92..f24fd0643b8 100644
--- a/src/intel/compiler/brw_fs_dead_code_eliminate.cpp
+++ b/src/intel/compiler/brw_fs_dead_code_eliminate.cpp
@@ -110,7 +110,7 @@  fs_visitor::dead_code_eliminate()
          }
 
          if (inst->dst.file == VGRF) {
-            if (!inst->is_partial_write()) {
+            if (!inst->is_partial_reg_write()) {
                int var = live_intervals->var_from_reg(inst->dst);
                for (unsigned i = 0; i < regs_written(inst); i++) {
                   BITSET_CLEAR(live, var + i);
diff --git a/src/intel/compiler/brw_fs_live_variables.cpp b/src/intel/compiler/brw_fs_live_variables.cpp
index 059f076fa51..30625aa586a 100644
--- a/src/intel/compiler/brw_fs_live_variables.cpp
+++ b/src/intel/compiler/brw_fs_live_variables.cpp
@@ -84,7 +84,7 @@  fs_live_variables::setup_one_write(struct block_data *bd, fs_inst *inst,
     * screens off previous updates of that variable (VGRF channel).
     */
    if (inst->dst.file == VGRF) {
-      if (!inst->is_partial_write() && !BITSET_TEST(bd->use, var))
+      if (!inst->is_partial_reg_write() && !BITSET_TEST(bd->use, var))
          BITSET_SET(bd->def, var);
 
       BITSET_SET(bd->defout, var);
diff --git a/src/intel/compiler/brw_fs_reg_allocate.cpp b/src/intel/compiler/brw_fs_reg_allocate.cpp
index 678afe6bab4..2228df30fde 100644
--- a/src/intel/compiler/brw_fs_reg_allocate.cpp
+++ b/src/intel/compiler/brw_fs_reg_allocate.cpp
@@ -1026,7 +1026,7 @@  fs_visitor::spill_reg(unsigned spill_reg)
           * write, there should be no need for the unspill since the
           * instruction will be overwriting the whole destination in any case.
 	  */
-         if (inst->is_partial_write() ||
+         if (inst->is_partial_reg_write() ||
              (!inst->force_writemask_all && !per_channel))
             emit_unspill(ubld, spill_src, subset_spill_offset,
                          regs_written(inst));
diff --git a/src/intel/compiler/brw_fs_register_coalesce.cpp b/src/intel/compiler/brw_fs_register_coalesce.cpp
index 4fe6773da54..27234292c30 100644
--- a/src/intel/compiler/brw_fs_register_coalesce.cpp
+++ b/src/intel/compiler/brw_fs_register_coalesce.cpp
@@ -70,7 +70,7 @@  is_coalesce_candidate(const fs_visitor *v, const fs_inst *inst)
 {
    if ((inst->opcode != BRW_OPCODE_MOV &&
         inst->opcode != SHADER_OPCODE_LOAD_PAYLOAD) ||
-       inst->is_partial_write() ||
+       inst->is_partial_reg_write() ||
        inst->saturate ||
        inst->src[0].file != VGRF ||
        inst->src[0].negate ||
diff --git a/src/intel/compiler/brw_fs_saturate_propagation.cpp b/src/intel/compiler/brw_fs_saturate_propagation.cpp
index d6cfa79a618..1e1461063ae 100644
--- a/src/intel/compiler/brw_fs_saturate_propagation.cpp
+++ b/src/intel/compiler/brw_fs_saturate_propagation.cpp
@@ -43,7 +43,8 @@ 
  */
 
 static bool
-opt_saturate_propagation_local(fs_visitor *v, bblock_t *block)
+opt_saturate_propagation_local(fs_visitor *v, bblock_t *block,
+                               unsigned dispatch_width)
 {
    bool progress = false;
    int ip = block->end_ip + 1;
@@ -66,7 +67,7 @@  opt_saturate_propagation_local(fs_visitor *v, bblock_t *block)
       foreach_inst_in_block_reverse_starting_from(fs_inst, scan_inst, inst) {
          if (regions_overlap(scan_inst->dst, scan_inst->size_written,
                              inst->src[0], inst->size_read(0))) {
-            if (scan_inst->is_partial_write() ||
+            if (scan_inst->is_partial_var_write(dispatch_width) ||
                 (scan_inst->dst.type != inst->dst.type &&
                  !scan_inst->can_change_types()))
                break;
@@ -153,7 +154,7 @@  fs_visitor::opt_saturate_propagation()
    calculate_live_intervals();
 
    foreach_block (block, cfg) {
-      progress = opt_saturate_propagation_local(this, block) || progress;
+      progress = opt_saturate_propagation_local(this, block, dispatch_width) || progress;
    }
 
    /* Live intervals are still valid. */
diff --git a/src/intel/compiler/brw_fs_sel_peephole.cpp b/src/intel/compiler/brw_fs_sel_peephole.cpp
index 6395b409b7c..98d640a3bfe 100644
--- a/src/intel/compiler/brw_fs_sel_peephole.cpp
+++ b/src/intel/compiler/brw_fs_sel_peephole.cpp
@@ -167,8 +167,8 @@  fs_visitor::opt_peephole_sel()
              then_mov[i]->exec_size != else_mov[i]->exec_size ||
              then_mov[i]->group != else_mov[i]->group ||
              then_mov[i]->force_writemask_all != else_mov[i]->force_writemask_all ||
-             then_mov[i]->is_partial_write() ||
-             else_mov[i]->is_partial_write() ||
+             then_mov[i]->is_partial_var_write(dispatch_width) ||
+             else_mov[i]->is_partial_var_write(dispatch_width) ||
              then_mov[i]->conditional_mod != BRW_CONDITIONAL_NONE ||
              else_mov[i]->conditional_mod != BRW_CONDITIONAL_NONE) {
             movs = i;
diff --git a/src/intel/compiler/brw_ir_fs.h b/src/intel/compiler/brw_ir_fs.h
index ba4d6a95720..48b66ca5a65 100644
--- a/src/intel/compiler/brw_ir_fs.h
+++ b/src/intel/compiler/brw_ir_fs.h
@@ -349,7 +349,8 @@  public:
 
    bool equals(fs_inst *inst) const;
    bool is_send_from_grf() const;
-   bool is_partial_write() const;
+   bool is_partial_reg_write() const;
+   bool is_partial_var_write(unsigned dispatch_width) const;
    bool is_copy_payload(const brw::simple_allocator &grf_alloc) const;
    unsigned components_read(unsigned i) const;
    unsigned size_read(int arg) const;

Comments

Ugh... I really don't like this...  But I also don't have a better idea
off-hand.  The unfortunate reality is that this IR really isn't designed to
be able to handle this sort of thing.  My #1 concern here is that I don't
think it does good things when we have instructions with exec_size <
dispatch_width such as split instructions in SIMD32.  I think it's *mostly*
a no-op there.  I'll have to think on this one a bit more.  Don't wait to
re-send the v4 until I've come up with something.

On Tue, Jan 15, 2019 at 7:55 AM Iago Toral Quiroga <itoral@igalia.com>
wrote:

> This function is used in two different scenarios that for 32-bit
> instructions are the same, but for 16-bit instructions are not.
>
> One scenario is that in which we are working at a SIMD8 register
> level and we need to know if a register is fully defined or written.
> This is useful, for example, in the context of liveness analysis or
> register allocation, where we work with units of registers.
>
> The other scenario is that in which we want to know if an instruction
> is writing a full scalar component or just some subset of it. This is
> useful, for example, in the context of some optimization passes
> like copy propagation.
>
> For 32-bit instructions (or larger), a SIMD8 dispatch will always write
> at least a full SIMD8 register (32B) if the write is not partial. The
> function is_partial_write() checks this to determine if we have a partial
> write. However, when we deal with 16-bit instructions, that logic disables
> some optimizations that should be safe. For example, a SIMD8 16-bit MOV
> will
> only update half of a SIMD register, but it is still a complete write of
> the
> variable for a SIMD8 dispatch, so we should not prevent copy propagation in
> this scenario because we don't write all 32 bytes in the SIMD register
> or because the write starts at offset 16B (wehere we pack components Y or
> W of 16-bit vectors).
>
> This is a problem for SIMD8 executions (VS, TCS, TES, GS) of 16-bit
> instructions, which lose a number of optimizations because of this, most
> important of which is copy-propagation.
>
> This patch splits is_partial_write() into is_partial_reg_write(), which
> represents the current is_partial_write(), useful for things like
> liveness analysis, and is_partial_var_write(), which considers
> the dispatch size to check if we are writing a full variable (rather
> than a full register) to decide if the write is partial or not, which
> is what we really want in many optimization passes.
>
> Then the patch goes on and rewrites all uses of is_partial_write() to use
> one or the other version. Specifically, we use is_partial_var_write()
> in the following places: copy propagation, cmod propagation, common
> subexpression elimination, saturate propagation and sel peephole.
>
> Notice that the semantics of is_partial_var_write() exactly match the
> current implementation of is_partial_write() for anything that is
> 32-bit or larger, so no changes are expected for 32-bit instructions.
>
> Tested against ~5000 tests involving 16-bit instructions in CTS produced
> the following changes in instruction counts:
>
>             Patched  |     Master    |    %    |
> ================================================
> SIMD8  |    621,900  |    706,721    | -12.00% |
> ================================================
> SIMD16 |     93,252  |     93,252    |   0.00% |
> ================================================
>
> As expected, the change only affects SIMD8 dispatches.
>
> Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
> ---
>  src/intel/compiler/brw_fs.cpp                 | 31 +++++++++++++++----
>  .../compiler/brw_fs_cmod_propagation.cpp      | 20 ++++++------
>  .../compiler/brw_fs_copy_propagation.cpp      |  8 ++---
>  src/intel/compiler/brw_fs_cse.cpp             |  3 +-
>  .../compiler/brw_fs_dead_code_eliminate.cpp   |  2 +-
>  src/intel/compiler/brw_fs_live_variables.cpp  |  2 +-
>  src/intel/compiler/brw_fs_reg_allocate.cpp    |  2 +-
>  .../compiler/brw_fs_register_coalesce.cpp     |  2 +-
>  .../compiler/brw_fs_saturate_propagation.cpp  |  7 +++--
>  src/intel/compiler/brw_fs_sel_peephole.cpp    |  4 +--
>  src/intel/compiler/brw_ir_fs.h                |  3 +-
>  11 files changed, 54 insertions(+), 30 deletions(-)
>
> diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp
> index d6096cd667d..77c955ac435 100644
> --- a/src/intel/compiler/brw_fs.cpp
> +++ b/src/intel/compiler/brw_fs.cpp
> @@ -716,14 +716,33 @@ fs_visitor::limit_dispatch_width(unsigned n, const
> char *msg)
>   * it.
>   */
>  bool
> -fs_inst::is_partial_write() const
> +fs_inst::is_partial_reg_write() const
>  {
>     return ((this->predicate && this->opcode != BRW_OPCODE_SEL) ||
> -           (this->exec_size * type_sz(this->dst.type)) < 32 ||
>             !this->dst.is_contiguous() ||
> +           (this->exec_size * type_sz(this->dst.type)) < REG_SIZE ||
>             this->dst.offset % REG_SIZE != 0);
>  }
>
> +/**
> + * Returns true if the instruction has a flag that means it won't
> + * update an entire variable for the given dispatch width.
> + *
> + * This is only different from is_partial_reg_write() for SIMD8
> + * dispatches of 16-bit (or smaller) instructions.
> + */
> +bool
> +fs_inst::is_partial_var_write(uint32_t dispatch_width) const
> +{
> +   const uint32_t type_size = type_sz(this->dst.type);
> +   uint32_t var_size = MIN2(REG_SIZE, dispatch_width * type_size);
> +
> +   return ((this->predicate && this->opcode != BRW_OPCODE_SEL) ||
> +           !this->dst.is_contiguous() ||
> +           (this->exec_size * type_sz(this->dst.type)) < var_size ||
> +           this->dst.offset % var_size != 0);
> +}
> +
>  unsigned
>  fs_inst::components_read(unsigned i) const
>  {
> @@ -2896,7 +2915,7 @@ fs_visitor::opt_register_renaming()
>        if (depth == 0 &&
>            inst->dst.file == VGRF &&
>            alloc.sizes[inst->dst.nr] * REG_SIZE == inst->size_written &&
> -          !inst->is_partial_write()) {
> +          !inst->is_partial_reg_write()) {
>           if (remap[dst] == ~0u) {
>              remap[dst] = dst;
>           } else {
> @@ -3099,7 +3118,7 @@ fs_visitor::compute_to_mrf()
>        next_ip++;
>
>        if (inst->opcode != BRW_OPCODE_MOV ||
> -         inst->is_partial_write() ||
> +         inst->is_partial_reg_write() ||
>           inst->dst.file != MRF || inst->src[0].file != VGRF ||
>           inst->dst.type != inst->src[0].type ||
>           inst->src[0].abs || inst->src[0].negate ||
> @@ -3132,7 +3151,7 @@ fs_visitor::compute_to_mrf()
>              * that writes that reg, but it would require smarter
>              * tracking.
>              */
> -           if (scan_inst->is_partial_write())
> +           if (scan_inst->is_partial_reg_write())
>                break;
>
>              /* Handling things not fully contained in the source of the
> copy
> @@ -3444,7 +3463,7 @@ fs_visitor::remove_duplicate_mrf_writes()
>        if (inst->opcode == BRW_OPCODE_MOV &&
>           inst->dst.file == MRF &&
>           inst->src[0].file != ARF &&
> -         !inst->is_partial_write()) {
> +         !inst->is_partial_reg_write()) {
>           last_mrf_move[inst->dst.nr] = inst;
>        }
>     }
> diff --git a/src/intel/compiler/brw_fs_cmod_propagation.cpp
> b/src/intel/compiler/brw_fs_cmod_propagation.cpp
> index 5fb522f810f..7bb5c9afbc9 100644
> --- a/src/intel/compiler/brw_fs_cmod_propagation.cpp
> +++ b/src/intel/compiler/brw_fs_cmod_propagation.cpp
> @@ -50,13 +50,13 @@
>
>  static bool
>  cmod_propagate_cmp_to_add(const gen_device_info *devinfo, bblock_t *block,
> -                          fs_inst *inst)
> +                          fs_inst *inst, unsigned dispatch_width)
>  {
>     bool read_flag = false;
>
>     foreach_inst_in_block_reverse_starting_from(fs_inst, scan_inst, inst) {
>        if (scan_inst->opcode == BRW_OPCODE_ADD &&
> -          !scan_inst->is_partial_write() &&
> +          !scan_inst->is_partial_var_write(dispatch_width) &&
>            scan_inst->exec_size == inst->exec_size) {
>           bool negate;
>
> @@ -126,7 +126,7 @@ cmod_propagate_cmp_to_add(const gen_device_info
> *devinfo, bblock_t *block,
>   */
>  static bool
>  cmod_propagate_not(const gen_device_info *devinfo, bblock_t *block,
> -                   fs_inst *inst)
> +                   fs_inst *inst, unsigned dispatch_width)
>  {
>     const enum brw_conditional_mod cond =
> brw_negate_cmod(inst->conditional_mod);
>     bool read_flag = false;
> @@ -141,7 +141,7 @@ cmod_propagate_not(const gen_device_info *devinfo,
> bblock_t *block,
>               scan_inst->opcode != BRW_OPCODE_AND)
>              break;
>
> -         if (scan_inst->is_partial_write() ||
> +         if (scan_inst->is_partial_var_write(dispatch_width) ||
>               scan_inst->dst.offset != inst->src[0].offset ||
>               scan_inst->exec_size != inst->exec_size)
>              break;
> @@ -166,7 +166,9 @@ cmod_propagate_not(const gen_device_info *devinfo,
> bblock_t *block,
>  }
>
>  static bool
> -opt_cmod_propagation_local(const gen_device_info *devinfo, bblock_t
> *block)
> +opt_cmod_propagation_local(const gen_device_info *devinfo,
> +                           bblock_t *block,
> +                           unsigned dispatch_width)
>  {
>     bool progress = false;
>     int ip = block->end_ip + 1;
> @@ -219,14 +221,14 @@ opt_cmod_propagation_local(const gen_device_info
> *devinfo, bblock_t *block)
>         */
>        if (inst->opcode == BRW_OPCODE_CMP && !inst->src[1].is_zero()) {
>           if (brw_reg_type_is_floating_point(inst->src[0].type) &&
> -             cmod_propagate_cmp_to_add(devinfo, block, inst))
> +             cmod_propagate_cmp_to_add(devinfo, block, inst,
> dispatch_width))
>              progress = true;
>
>           continue;
>        }
>
>        if (inst->opcode == BRW_OPCODE_NOT) {
> -         progress = cmod_propagate_not(devinfo, block, inst) || progress;
> +         progress = cmod_propagate_not(devinfo, block, inst,
> dispatch_width) || progress;
>           continue;
>        }
>
> @@ -234,7 +236,7 @@ opt_cmod_propagation_local(const gen_device_info
> *devinfo, bblock_t *block)
>        foreach_inst_in_block_reverse_starting_from(fs_inst, scan_inst,
> inst) {
>           if (regions_overlap(scan_inst->dst, scan_inst->size_written,
>                               inst->src[0], inst->size_read(0))) {
> -            if (scan_inst->is_partial_write() ||
> +            if (scan_inst->is_partial_var_write(dispatch_width) ||
>                  scan_inst->dst.offset != inst->src[0].offset ||
>                  scan_inst->exec_size != inst->exec_size)
>                 break;
> @@ -342,7 +344,7 @@ fs_visitor::opt_cmod_propagation()
>     bool progress = false;
>
>     foreach_block_reverse(block, cfg) {
> -      progress = opt_cmod_propagation_local(devinfo, block) || progress;
> +      progress = opt_cmod_propagation_local(devinfo, block,
> dispatch_width) || progress;
>     }
>
>     if (progress)
> diff --git a/src/intel/compiler/brw_fs_copy_propagation.cpp
> b/src/intel/compiler/brw_fs_copy_propagation.cpp
> index 77f2749ba04..4e20ddb683a 100644
> --- a/src/intel/compiler/brw_fs_copy_propagation.cpp
> +++ b/src/intel/compiler/brw_fs_copy_propagation.cpp
> @@ -515,7 +515,7 @@ fs_visitor::try_copy_propagate(fs_inst *inst, int arg,
> acp_entry *entry)
>     /* Compute the first component of the copy that the instruction is
>      * reading, and the base byte offset within that component.
>      */
> -   assert(entry->dst.offset % REG_SIZE == 0 && entry->dst.stride == 1);
> +   assert(entry->dst.stride == 1);
>     const unsigned component = rel_offset / type_sz(entry->dst.type);
>     const unsigned suboffset = rel_offset % type_sz(entry->dst.type);
>
> @@ -793,7 +793,7 @@ fs_visitor::try_constant_propagate(fs_inst *inst,
> acp_entry *entry)
>  }
>
>  static bool
> -can_propagate_from(fs_inst *inst)
> +can_propagate_from(fs_inst *inst, unsigned dispatch_width)
>  {
>     return (inst->opcode == BRW_OPCODE_MOV &&
>             inst->dst.file == VGRF &&
> @@ -804,7 +804,7 @@ can_propagate_from(fs_inst *inst)
>              inst->src[0].file == UNIFORM ||
>              inst->src[0].file == IMM) &&
>             inst->src[0].type == inst->dst.type &&
> -           !inst->is_partial_write());
> +           !inst->is_partial_var_write(dispatch_width));
>  }
>
>  /* Walks a basic block and does copy propagation on it using the acp
> @@ -856,7 +856,7 @@ fs_visitor::opt_copy_propagation_local(void
> *copy_prop_ctx, bblock_t *block,
>        /* If this instruction's source could potentially be folded into the
>         * operand of another instruction, add it to the ACP.
>         */
> -      if (can_propagate_from(inst)) {
> +      if (can_propagate_from(inst, dispatch_width)) {
>           acp_entry *entry = ralloc(copy_prop_ctx, acp_entry);
>           entry->dst = inst->dst;
>           entry->src = inst->src[0];
> diff --git a/src/intel/compiler/brw_fs_cse.cpp
> b/src/intel/compiler/brw_fs_cse.cpp
> index 6859733d58c..56813df2d2a 100644
> --- a/src/intel/compiler/brw_fs_cse.cpp
> +++ b/src/intel/compiler/brw_fs_cse.cpp
> @@ -247,7 +247,8 @@ fs_visitor::opt_cse_local(bblock_t *block)
>     int ip = block->start_ip;
>     foreach_inst_in_block(fs_inst, inst, block) {
>        /* Skip some cases. */
> -      if (is_expression(this, inst) && !inst->is_partial_write() &&
> +      if (is_expression(this, inst) &&
> +          !inst->is_partial_var_write(dispatch_width) &&
>            ((inst->dst.file != ARF && inst->dst.file != FIXED_GRF) ||
>             inst->dst.is_null()))
>        {
> diff --git a/src/intel/compiler/brw_fs_dead_code_eliminate.cpp
> b/src/intel/compiler/brw_fs_dead_code_eliminate.cpp
> index eeb71dd2b92..f24fd0643b8 100644
> --- a/src/intel/compiler/brw_fs_dead_code_eliminate.cpp
> +++ b/src/intel/compiler/brw_fs_dead_code_eliminate.cpp
> @@ -110,7 +110,7 @@ fs_visitor::dead_code_eliminate()
>           }
>
>           if (inst->dst.file == VGRF) {
> -            if (!inst->is_partial_write()) {
> +            if (!inst->is_partial_reg_write()) {
>                 int var = live_intervals->var_from_reg(inst->dst);
>                 for (unsigned i = 0; i < regs_written(inst); i++) {
>                    BITSET_CLEAR(live, var + i);
> diff --git a/src/intel/compiler/brw_fs_live_variables.cpp
> b/src/intel/compiler/brw_fs_live_variables.cpp
> index 059f076fa51..30625aa586a 100644
> --- a/src/intel/compiler/brw_fs_live_variables.cpp
> +++ b/src/intel/compiler/brw_fs_live_variables.cpp
> @@ -84,7 +84,7 @@ fs_live_variables::setup_one_write(struct block_data
> *bd, fs_inst *inst,
>      * screens off previous updates of that variable (VGRF channel).
>      */
>     if (inst->dst.file == VGRF) {
> -      if (!inst->is_partial_write() && !BITSET_TEST(bd->use, var))
> +      if (!inst->is_partial_reg_write() && !BITSET_TEST(bd->use, var))
>           BITSET_SET(bd->def, var);
>
>        BITSET_SET(bd->defout, var);
> diff --git a/src/intel/compiler/brw_fs_reg_allocate.cpp
> b/src/intel/compiler/brw_fs_reg_allocate.cpp
> index 678afe6bab4..2228df30fde 100644
> --- a/src/intel/compiler/brw_fs_reg_allocate.cpp
> +++ b/src/intel/compiler/brw_fs_reg_allocate.cpp
> @@ -1026,7 +1026,7 @@ fs_visitor::spill_reg(unsigned spill_reg)
>            * write, there should be no need for the unspill since the
>            * instruction will be overwriting the whole destination in any
> case.
>           */
> -         if (inst->is_partial_write() ||
> +         if (inst->is_partial_reg_write() ||
>               (!inst->force_writemask_all && !per_channel))
>              emit_unspill(ubld, spill_src, subset_spill_offset,
>                           regs_written(inst));
> diff --git a/src/intel/compiler/brw_fs_register_coalesce.cpp
> b/src/intel/compiler/brw_fs_register_coalesce.cpp
> index 4fe6773da54..27234292c30 100644
> --- a/src/intel/compiler/brw_fs_register_coalesce.cpp
> +++ b/src/intel/compiler/brw_fs_register_coalesce.cpp
> @@ -70,7 +70,7 @@ is_coalesce_candidate(const fs_visitor *v, const fs_inst
> *inst)
>  {
>     if ((inst->opcode != BRW_OPCODE_MOV &&
>          inst->opcode != SHADER_OPCODE_LOAD_PAYLOAD) ||
> -       inst->is_partial_write() ||
> +       inst->is_partial_reg_write() ||
>         inst->saturate ||
>         inst->src[0].file != VGRF ||
>         inst->src[0].negate ||
> diff --git a/src/intel/compiler/brw_fs_saturate_propagation.cpp
> b/src/intel/compiler/brw_fs_saturate_propagation.cpp
> index d6cfa79a618..1e1461063ae 100644
> --- a/src/intel/compiler/brw_fs_saturate_propagation.cpp
> +++ b/src/intel/compiler/brw_fs_saturate_propagation.cpp
> @@ -43,7 +43,8 @@
>   */
>
>  static bool
> -opt_saturate_propagation_local(fs_visitor *v, bblock_t *block)
> +opt_saturate_propagation_local(fs_visitor *v, bblock_t *block,
> +                               unsigned dispatch_width)
>  {
>     bool progress = false;
>     int ip = block->end_ip + 1;
> @@ -66,7 +67,7 @@ opt_saturate_propagation_local(fs_visitor *v, bblock_t
> *block)
>        foreach_inst_in_block_reverse_starting_from(fs_inst, scan_inst,
> inst) {
>           if (regions_overlap(scan_inst->dst, scan_inst->size_written,
>                               inst->src[0], inst->size_read(0))) {
> -            if (scan_inst->is_partial_write() ||
> +            if (scan_inst->is_partial_var_write(dispatch_width) ||
>                  (scan_inst->dst.type != inst->dst.type &&
>                   !scan_inst->can_change_types()))
>                 break;
> @@ -153,7 +154,7 @@ fs_visitor::opt_saturate_propagation()
>     calculate_live_intervals();
>
>     foreach_block (block, cfg) {
> -      progress = opt_saturate_propagation_local(this, block) || progress;
> +      progress = opt_saturate_propagation_local(this, block,
> dispatch_width) || progress;
>     }
>
>     /* Live intervals are still valid. */
> diff --git a/src/intel/compiler/brw_fs_sel_peephole.cpp
> b/src/intel/compiler/brw_fs_sel_peephole.cpp
> index 6395b409b7c..98d640a3bfe 100644
> --- a/src/intel/compiler/brw_fs_sel_peephole.cpp
> +++ b/src/intel/compiler/brw_fs_sel_peephole.cpp
> @@ -167,8 +167,8 @@ fs_visitor::opt_peephole_sel()
>               then_mov[i]->exec_size != else_mov[i]->exec_size ||
>               then_mov[i]->group != else_mov[i]->group ||
>               then_mov[i]->force_writemask_all !=
> else_mov[i]->force_writemask_all ||
> -             then_mov[i]->is_partial_write() ||
> -             else_mov[i]->is_partial_write() ||
> +             then_mov[i]->is_partial_var_write(dispatch_width) ||
> +             else_mov[i]->is_partial_var_write(dispatch_width) ||
>               then_mov[i]->conditional_mod != BRW_CONDITIONAL_NONE ||
>               else_mov[i]->conditional_mod != BRW_CONDITIONAL_NONE) {
>              movs = i;
> diff --git a/src/intel/compiler/brw_ir_fs.h
> b/src/intel/compiler/brw_ir_fs.h
> index ba4d6a95720..48b66ca5a65 100644
> --- a/src/intel/compiler/brw_ir_fs.h
> +++ b/src/intel/compiler/brw_ir_fs.h
> @@ -349,7 +349,8 @@ public:
>
>     bool equals(fs_inst *inst) const;
>     bool is_send_from_grf() const;
> -   bool is_partial_write() const;
> +   bool is_partial_reg_write() const;
> +   bool is_partial_var_write(unsigned dispatch_width) const;
>     bool is_copy_payload(const brw::simple_allocator &grf_alloc) const;
>     unsigned components_read(unsigned i) const;
>     unsigned size_read(int arg) const;
> --
> 2.17.1
>
> _______________________________________________
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
On Fri, 2019-01-18 at 14:23 -0600, Jason Ekstrand wrote:
> Ugh... I really don't like this...  But I also don't have a better
> idea off-hand.  The unfortunate reality is that this IR really isn't
> designed to be able to handle this sort of thing.  My #1 concern here
> is that I don't think it does good things when we have instructions
> with exec_size < dispatch_width such as split instructions in
> SIMD32.  

Right.... I guess this is the same case as SIMD16 instructions split to
2xSIMD8 really. I think this would consider each split instruction
a  partial write, since that is what each chunk of the instruction is
really doing (writing a subset of the channels of a variable), but of
course, the original unsplit instruction mihgt not be a partial write
which is where we would be losing optimization opportunities, we just
don't have that information available once the instruction has been
split.
Would it make sense to have fs_inst contain information about whether
this is an instruction that has been split from a larger instruction
that was doing a  full variable write? I guess we do have all that info
available when we decide to split instructions in the SIMD
width  lowering pass.
> I think it's *mostly* a no-op there.  I'll have to think on this one
> a bit more.  Don't wait to re-send the v4 until I've come up with
> something.
> 
> On Tue, Jan 15, 2019 at 7:55 AM Iago Toral Quiroga <itoral@igalia.com
> > wrote:
> > This function is used in two different scenarios that for 32-bit
> > 
> > instructions are the same, but for 16-bit instructions are not.
> > 
> > 
> > 
> > One scenario is that in which we are working at a SIMD8 register
> > 
> > level and we need to know if a register is fully defined or
> > written.
> > 
> > This is useful, for example, in the context of liveness analysis or
> > 
> > register allocation, where we work with units of registers.
> > 
> > 
> > 
> > The other scenario is that in which we want to know if an
> > instruction
> > 
> > is writing a full scalar component or just some subset of it. This
> > is
> > 
> > useful, for example, in the context of some optimization passes
> > 
> > like copy propagation.
> > 
> > 
> > 
> > For 32-bit instructions (or larger), a SIMD8 dispatch will always
> > write
> > 
> > at least a full SIMD8 register (32B) if the write is not partial.
> > The
> > 
> > function is_partial_write() checks this to determine if we have a
> > partial
> > 
> > write. However, when we deal with 16-bit instructions, that logic
> > disables
> > 
> > some optimizations that should be safe. For example, a SIMD8 16-bit 
> > MOV will
> > 
> > only update half of a SIMD register, but it is still a complete
> > write of the
> > 
> > variable for a SIMD8 dispatch, so we should not prevent copy
> > propagation in
> > 
> > this scenario because we don't write all 32 bytes in the SIMD
> > register
> > 
> > or because the write starts at offset 16B (wehere we pack
> > components Y or
> > 
> > W of 16-bit vectors).
> > 
> > 
> > 
> > This is a problem for SIMD8 executions (VS, TCS, TES, GS) of 16-bit
> > 
> > instructions, which lose a number of optimizations because of this,
> > most
> > 
> > important of which is copy-propagation.
> > 
> > 
> > 
> > This patch splits is_partial_write() into is_partial_reg_write(),
> > which
> > 
> > represents the current is_partial_write(), useful for things like
> > 
> > liveness analysis, and is_partial_var_write(), which considers
> > 
> > the dispatch size to check if we are writing a full variable
> > (rather
> > 
> > than a full register) to decide if the write is partial or not,
> > which
> > 
> > is what we really want in many optimization passes.
> > 
> > 
> > 
> > Then the patch goes on and rewrites all uses of is_partial_write()
> > to use
> > 
> > one or the other version. Specifically, we use
> > is_partial_var_write()
> > 
> > in the following places: copy propagation, cmod propagation, common
> > 
> > subexpression elimination, saturate propagation and sel peephole.
> > 
> > 
> > 
> > Notice that the semantics of is_partial_var_write() exactly match
> > the
> > 
> > current implementation of is_partial_write() for anything that is
> > 
> > 32-bit or larger, so no changes are expected for 32-bit
> > instructions.
> > 
> > 
> > 
> > Tested against ~5000 tests involving 16-bit instructions in CTS
> > produced
> > 
> > the following changes in instruction counts:
> > 
> > 
> > 
> >             Patched  |     Master    |    %    |
> > 
> > ================================================
> > 
> > SIMD8  |    621,900  |    706,721    | -12.00% |
> > 
> > ================================================
> > 
> > SIMD16 |     93,252  |     93,252    |   0.00% |
> > 
> > ================================================
> > 
> > 
> > 
> > As expected, the change only affects SIMD8 dispatches.
> > 
> > 
> > 
> > Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
> > 
> > ---
> > 
> >  src/intel/compiler/brw_fs.cpp                 | 31
> > +++++++++++++++----
> > 
> >  .../compiler/brw_fs_cmod_propagation.cpp      | 20 ++++++------
> > 
> >  .../compiler/brw_fs_copy_propagation.cpp      |  8 ++---
> > 
> >  src/intel/compiler/brw_fs_cse.cpp             |  3 +-
> > 
> >  .../compiler/brw_fs_dead_code_eliminate.cpp   |  2 +-
> > 
> >  src/intel/compiler/brw_fs_live_variables.cpp  |  2 +-
> > 
> >  src/intel/compiler/brw_fs_reg_allocate.cpp    |  2 +-
> > 
> >  .../compiler/brw_fs_register_coalesce.cpp     |  2 +-
> > 
> >  .../compiler/brw_fs_saturate_propagation.cpp  |  7 +++--
> > 
> >  src/intel/compiler/brw_fs_sel_peephole.cpp    |  4 +--
> > 
> >  src/intel/compiler/brw_ir_fs.h                |  3 +-
> > 
> >  11 files changed, 54 insertions(+), 30 deletions(-)
> > 
> > 
> > 
> > diff --git a/src/intel/compiler/brw_fs.cpp
> > b/src/intel/compiler/brw_fs.cpp
> > 
> > index d6096cd667d..77c955ac435 100644
> > 
> > --- a/src/intel/compiler/brw_fs.cpp
> > 
> > +++ b/src/intel/compiler/brw_fs.cpp
> > 
> > @@ -716,14 +716,33 @@ fs_visitor::limit_dispatch_width(unsigned n,
> > const char *msg)
> > 
> >   * it.
> > 
> >   */
> > 
> >  bool
> > 
> > -fs_inst::is_partial_write() const
> > 
> > +fs_inst::is_partial_reg_write() const
> > 
> >  {
> > 
> >     return ((this->predicate && this->opcode != BRW_OPCODE_SEL) ||
> > 
> > -           (this->exec_size * type_sz(this->dst.type)) < 32 ||
> > 
> >             !this->dst.is_contiguous() ||
> > 
> > +           (this->exec_size * type_sz(this->dst.type)) < REG_SIZE
> > ||
> > 
> >             this->dst.offset % REG_SIZE != 0);
> > 
> >  }
> > 
> > 
> > 
> > +/**
> > 
> > + * Returns true if the instruction has a flag that means it won't
> > 
> > + * update an entire variable for the given dispatch width.
> > 
> > + *
> > 
> > + * This is only different from is_partial_reg_write() for SIMD8
> > 
> > + * dispatches of 16-bit (or smaller) instructions.
> > 
> > + */
> > 
> > +bool
> > 
> > +fs_inst::is_partial_var_write(uint32_t dispatch_width) const
> > 
> > +{
> > 
> > +   const uint32_t type_size = type_sz(this->dst.type);
> > 
> > +   uint32_t var_size = MIN2(REG_SIZE, dispatch_width * type_size);
> > 
> > +
> > 
> > +   return ((this->predicate && this->opcode != BRW_OPCODE_SEL) ||
> > 
> > +           !this->dst.is_contiguous() ||
> > 
> > +           (this->exec_size * type_sz(this->dst.type)) < var_size
> > ||
> > 
> > +           this->dst.offset % var_size != 0);
> > 
> > +}
> > 
> > +
> > 
> >  unsigned
> > 
> >  fs_inst::components_read(unsigned i) const
> > 
> >  {
> > 
> > @@ -2896,7 +2915,7 @@ fs_visitor::opt_register_renaming()
> > 
> >        if (depth == 0 &&
> > 
> >            inst->dst.file == VGRF &&
> > 
> >            alloc.sizes[inst->dst.nr] * REG_SIZE == inst-
> > >size_written &&
> > 
> > -          !inst->is_partial_write()) {
> > 
> > +          !inst->is_partial_reg_write()) {
> > 
> >           if (remap[dst] == ~0u) {
> > 
> >              remap[dst] = dst;
> > 
> >           } else {
> > 
> > @@ -3099,7 +3118,7 @@ fs_visitor::compute_to_mrf()
> > 
> >        next_ip++;
> > 
> > 
> > 
> >        if (inst->opcode != BRW_OPCODE_MOV ||
> > 
> > -         inst->is_partial_write() ||
> > 
> > +         inst->is_partial_reg_write() ||
> > 
> >           inst->dst.file != MRF || inst->src[0].file != VGRF ||
> > 
> >           inst->dst.type != inst->src[0].type ||
> > 
> >           inst->src[0].abs || inst->src[0].negate ||
> > 
> > @@ -3132,7 +3151,7 @@ fs_visitor::compute_to_mrf()
> > 
> >              * that writes that reg, but it would require smarter
> > 
> >              * tracking.
> > 
> >              */
> > 
> > -           if (scan_inst->is_partial_write())
> > 
> > +           if (scan_inst->is_partial_reg_write())
> > 
> >                break;
> > 
> > 
> > 
> >              /* Handling things not fully contained in the source
> > of the copy
> > 
> > @@ -3444,7 +3463,7 @@ fs_visitor::remove_duplicate_mrf_writes()
> > 
> >        if (inst->opcode == BRW_OPCODE_MOV &&
> > 
> >           inst->dst.file == MRF &&
> > 
> >           inst->src[0].file != ARF &&
> > 
> > -         !inst->is_partial_write()) {
> > 
> > +         !inst->is_partial_reg_write()) {
> > 
> >           last_mrf_move[inst->dst.nr] = inst;
> > 
> >        }
> > 
> >     }
> > 
> > diff --git a/src/intel/compiler/brw_fs_cmod_propagation.cpp
> > b/src/intel/compiler/brw_fs_cmod_propagation.cpp
> > 
> > index 5fb522f810f..7bb5c9afbc9 100644
> > 
> > --- a/src/intel/compiler/brw_fs_cmod_propagation.cpp
> > 
> > +++ b/src/intel/compiler/brw_fs_cmod_propagation.cpp
> > 
> > @@ -50,13 +50,13 @@
> > 
> > 
> > 
> >  static bool
> > 
> >  cmod_propagate_cmp_to_add(const gen_device_info *devinfo, bblock_t
> > *block,
> > 
> > -                          fs_inst *inst)
> > 
> > +                          fs_inst *inst, unsigned dispatch_width)
> > 
> >  {
> > 
> >     bool read_flag = false;
> > 
> > 
> > 
> >     foreach_inst_in_block_reverse_starting_from(fs_inst, scan_inst,
> > inst) {
> > 
> >        if (scan_inst->opcode == BRW_OPCODE_ADD &&
> > 
> > -          !scan_inst->is_partial_write() &&
> > 
> > +          !scan_inst->is_partial_var_write(dispatch_width) &&
> > 
> >            scan_inst->exec_size == inst->exec_size) {
> > 
> >           bool negate;
> > 
> > 
> > 
> > @@ -126,7 +126,7 @@ cmod_propagate_cmp_to_add(const gen_device_info
> > *devinfo, bblock_t *block,
> > 
> >   */
> > 
> >  static bool
> > 
> >  cmod_propagate_not(const gen_device_info *devinfo, bblock_t
> > *block,
> > 
> > -                   fs_inst *inst)
> > 
> > +                   fs_inst *inst, unsigned dispatch_width)
> > 
> >  {
> > 
> >     const enum brw_conditional_mod cond = brw_negate_cmod(inst-
> > >conditional_mod);
> > 
> >     bool read_flag = false;
> > 
> > @@ -141,7 +141,7 @@ cmod_propagate_not(const gen_device_info
> > *devinfo, bblock_t *block,
> > 
> >               scan_inst->opcode != BRW_OPCODE_AND)
> > 
> >              break;
> > 
> > 
> > 
> > -         if (scan_inst->is_partial_write() ||
> > 
> > +         if (scan_inst->is_partial_var_write(dispatch_width) ||
> > 
> >               scan_inst->dst.offset != inst->src[0].offset ||
> > 
> >               scan_inst->exec_size != inst->exec_size)
> > 
> >              break;
> > 
> > @@ -166,7 +166,9 @@ cmod_propagate_not(const gen_device_info
> > *devinfo, bblock_t *block,
> > 
> >  }
> > 
> > 
> > 
> >  static bool
> > 
> > -opt_cmod_propagation_local(const gen_device_info *devinfo,
> > bblock_t *block)
> > 
> > +opt_cmod_propagation_local(const gen_device_info *devinfo,
> > 
> > +                           bblock_t *block,
> > 
> > +                           unsigned dispatch_width)
> > 
> >  {
> > 
> >     bool progress = false;
> > 
> >     int ip = block->end_ip + 1;
> > 
> > @@ -219,14 +221,14 @@ opt_cmod_propagation_local(const
> > gen_device_info *devinfo, bblock_t *block)
> > 
> >         */
> > 
> >        if (inst->opcode == BRW_OPCODE_CMP && !inst-
> > >src[1].is_zero()) {
> > 
> >           if (brw_reg_type_is_floating_point(inst->src[0].type) &&
> > 
> > -             cmod_propagate_cmp_to_add(devinfo, block, inst))
> > 
> > +             cmod_propagate_cmp_to_add(devinfo, block, inst,
> > dispatch_width))
> > 
> >              progress = true;
> > 
> > 
> > 
> >           continue;
> > 
> >        }
> > 
> > 
> > 
> >        if (inst->opcode == BRW_OPCODE_NOT) {
> > 
> > -         progress = cmod_propagate_not(devinfo, block, inst) ||
> > progress;
> > 
> > +         progress = cmod_propagate_not(devinfo, block, inst,
> > dispatch_width) || progress;
> > 
> >           continue;
> > 
> >        }
> > 
> > 
> > 
> > @@ -234,7 +236,7 @@ opt_cmod_propagation_local(const
> > gen_device_info *devinfo, bblock_t *block)
> > 
> >        foreach_inst_in_block_reverse_starting_from(fs_inst,
> > scan_inst, inst) {
> > 
> >           if (regions_overlap(scan_inst->dst, scan_inst-
> > >size_written,
> > 
> >                               inst->src[0], inst->size_read(0))) {
> > 
> > -            if (scan_inst->is_partial_write() ||
> > 
> > +            if (scan_inst->is_partial_var_write(dispatch_width) ||
> > 
> >                  scan_inst->dst.offset != inst->src[0].offset ||
> > 
> >                  scan_inst->exec_size != inst->exec_size)
> > 
> >                 break;
> > 
> > @@ -342,7 +344,7 @@ fs_visitor::opt_cmod_propagation()
> > 
> >     bool progress = false;
> > 
> > 
> > 
> >     foreach_block_reverse(block, cfg) {
> > 
> > -      progress = opt_cmod_propagation_local(devinfo, block) ||
> > progress;
> > 
> > +      progress = opt_cmod_propagation_local(devinfo, block,
> > dispatch_width) || progress;
> > 
> >     }
> > 
> > 
> > 
> >     if (progress)
> > 
> > diff --git a/src/intel/compiler/brw_fs_copy_propagation.cpp
> > b/src/intel/compiler/brw_fs_copy_propagation.cpp
> > 
> > index 77f2749ba04..4e20ddb683a 100644
> > 
> > --- a/src/intel/compiler/brw_fs_copy_propagation.cpp
> > 
> > +++ b/src/intel/compiler/brw_fs_copy_propagation.cpp
> > 
> > @@ -515,7 +515,7 @@ fs_visitor::try_copy_propagate(fs_inst *inst,
> > int arg, acp_entry *entry)
> > 
> >     /* Compute the first component of the copy that the instruction
> > is
> > 
> >      * reading, and the base byte offset within that component.
> > 
> >      */
> > 
> > -   assert(entry->dst.offset % REG_SIZE == 0 && entry->dst.stride
> > == 1);
> > 
> > +   assert(entry->dst.stride == 1);
> > 
> >     const unsigned component = rel_offset / type_sz(entry-
> > >dst.type);
> > 
> >     const unsigned suboffset = rel_offset % type_sz(entry-
> > >dst.type);
> > 
> > 
> > 
> > @@ -793,7 +793,7 @@ fs_visitor::try_constant_propagate(fs_inst
> > *inst, acp_entry *entry)
> > 
> >  }
> > 
> > 
> > 
> >  static bool
> > 
> > -can_propagate_from(fs_inst *inst)
> > 
> > +can_propagate_from(fs_inst *inst, unsigned dispatch_width)
> > 
> >  {
> > 
> >     return (inst->opcode == BRW_OPCODE_MOV &&
> > 
> >             inst->dst.file == VGRF &&
> > 
> > @@ -804,7 +804,7 @@ can_propagate_from(fs_inst *inst)
> > 
> >              inst->src[0].file == UNIFORM ||
> > 
> >              inst->src[0].file == IMM) &&
> > 
> >             inst->src[0].type == inst->dst.type &&
> > 
> > -           !inst->is_partial_write());
> > 
> > +           !inst->is_partial_var_write(dispatch_width));
> > 
> >  }
> > 
> > 
> > 
> >  /* Walks a basic block and does copy propagation on it using the
> > acp
> > 
> > @@ -856,7 +856,7 @@ fs_visitor::opt_copy_propagation_local(void
> > *copy_prop_ctx, bblock_t *block,
> > 
> >        /* If this instruction's source could potentially be folded
> > into the
> > 
> >         * operand of another instruction, add it to the ACP.
> > 
> >         */
> > 
> > -      if (can_propagate_from(inst)) {
> > 
> > +      if (can_propagate_from(inst, dispatch_width)) {
> > 
> >           acp_entry *entry = ralloc(copy_prop_ctx, acp_entry);
> > 
> >           entry->dst = inst->dst;
> > 
> >           entry->src = inst->src[0];
> > 
> > diff --git a/src/intel/compiler/brw_fs_cse.cpp
> > b/src/intel/compiler/brw_fs_cse.cpp
> > 
> > index 6859733d58c..56813df2d2a 100644
> > 
> > --- a/src/intel/compiler/brw_fs_cse.cpp
> > 
> > +++ b/src/intel/compiler/brw_fs_cse.cpp
> > 
> > @@ -247,7 +247,8 @@ fs_visitor::opt_cse_local(bblock_t *block)
> > 
> >     int ip = block->start_ip;
> > 
> >     foreach_inst_in_block(fs_inst, inst, block) {
> > 
> >        /* Skip some cases. */
> > 
> > -      if (is_expression(this, inst) && !inst->is_partial_write()
> > &&
> > 
> > +      if (is_expression(this, inst) &&
> > 
> > +          !inst->is_partial_var_write(dispatch_width) &&
> > 
> >            ((inst->dst.file != ARF && inst->dst.file != FIXED_GRF)
> > ||
> > 
> >             inst->dst.is_null()))
> > 
> >        {
> > 
> > diff --git a/src/intel/compiler/brw_fs_dead_code_eliminate.cpp
> > b/src/intel/compiler/brw_fs_dead_code_eliminate.cpp
> > 
> > index eeb71dd2b92..f24fd0643b8 100644
> > 
> > --- a/src/intel/compiler/brw_fs_dead_code_eliminate.cpp
> > 
> > +++ b/src/intel/compiler/brw_fs_dead_code_eliminate.cpp
> > 
> > @@ -110,7 +110,7 @@ fs_visitor::dead_code_eliminate()
> > 
> >           }
> > 
> > 
> > 
> >           if (inst->dst.file == VGRF) {
> > 
> > -            if (!inst->is_partial_write()) {
> > 
> > +            if (!inst->is_partial_reg_write()) {
> > 
> >                 int var = live_intervals->var_from_reg(inst->dst);
> > 
> >                 for (unsigned i = 0; i < regs_written(inst); i++) {
> > 
> >                    BITSET_CLEAR(live, var + i);
> > 
> > diff --git a/src/intel/compiler/brw_fs_live_variables.cpp
> > b/src/intel/compiler/brw_fs_live_variables.cpp
> > 
> > index 059f076fa51..30625aa586a 100644
> > 
> > --- a/src/intel/compiler/brw_fs_live_variables.cpp
> > 
> > +++ b/src/intel/compiler/brw_fs_live_variables.cpp
> > 
> > @@ -84,7 +84,7 @@ fs_live_variables::setup_one_write(struct
> > block_data *bd, fs_inst *inst,
> > 
> >      * screens off previous updates of that variable (VGRF
> > channel).
> > 
> >      */
> > 
> >     if (inst->dst.file == VGRF) {
> > 
> > -      if (!inst->is_partial_write() && !BITSET_TEST(bd->use, var))
> > 
> > +      if (!inst->is_partial_reg_write() && !BITSET_TEST(bd->use,
> > var))
> > 
> >           BITSET_SET(bd->def, var);
> > 
> > 
> > 
> >        BITSET_SET(bd->defout, var);
> > 
> > diff --git a/src/intel/compiler/brw_fs_reg_allocate.cpp
> > b/src/intel/compiler/brw_fs_reg_allocate.cpp
> > 
> > index 678afe6bab4..2228df30fde 100644
> > 
> > --- a/src/intel/compiler/brw_fs_reg_allocate.cpp
> > 
> > +++ b/src/intel/compiler/brw_fs_reg_allocate.cpp
> > 
> > @@ -1026,7 +1026,7 @@ fs_visitor::spill_reg(unsigned spill_reg)
> > 
> >            * write, there should be no need for the unspill since
> > the
> > 
> >            * instruction will be overwriting the whole destination
> > in any case.
> > 
> >           */
> > 
> > -         if (inst->is_partial_write() ||
> > 
> > +         if (inst->is_partial_reg_write() ||
> > 
> >               (!inst->force_writemask_all && !per_channel))
> > 
> >              emit_unspill(ubld, spill_src, subset_spill_offset,
> > 
> >                           regs_written(inst));
> > 
> > diff --git a/src/intel/compiler/brw_fs_register_coalesce.cpp
> > b/src/intel/compiler/brw_fs_register_coalesce.cpp
> > 
> > index 4fe6773da54..27234292c30 100644
> > 
> > --- a/src/intel/compiler/brw_fs_register_coalesce.cpp
> > 
> > +++ b/src/intel/compiler/brw_fs_register_coalesce.cpp
> > 
> > @@ -70,7 +70,7 @@ is_coalesce_candidate(const fs_visitor *v, const
> > fs_inst *inst)
> > 
> >  {
> > 
> >     if ((inst->opcode != BRW_OPCODE_MOV &&
> > 
> >          inst->opcode != SHADER_OPCODE_LOAD_PAYLOAD) ||
> > 
> > -       inst->is_partial_write() ||
> > 
> > +       inst->is_partial_reg_write() ||
> > 
> >         inst->saturate ||
> > 
> >         inst->src[0].file != VGRF ||
> > 
> >         inst->src[0].negate ||
> > 
> > diff --git a/src/intel/compiler/brw_fs_saturate_propagation.cpp
> > b/src/intel/compiler/brw_fs_saturate_propagation.cpp
> > 
> > index d6cfa79a618..1e1461063ae 100644
> > 
> > --- a/src/intel/compiler/brw_fs_saturate_propagation.cpp
> > 
> > +++ b/src/intel/compiler/brw_fs_saturate_propagation.cpp
> > 
> > @@ -43,7 +43,8 @@
> > 
> >   */
> > 
> > 
> > 
> >  static bool
> > 
> > -opt_saturate_propagation_local(fs_visitor *v, bblock_t *block)
> > 
> > +opt_saturate_propagation_local(fs_visitor *v, bblock_t *block,
> > 
> > +                               unsigned dispatch_width)
> > 
> >  {
> > 
> >     bool progress = false;
> > 
> >     int ip = block->end_ip + 1;
> > 
> > @@ -66,7 +67,7 @@ opt_saturate_propagation_local(fs_visitor *v,
> > bblock_t *block)
> > 
> >        foreach_inst_in_block_reverse_starting_from(fs_inst,
> > scan_inst, inst) {
> > 
> >           if (regions_overlap(scan_inst->dst, scan_inst-
> > >size_written,
> > 
> >                               inst->src[0], inst->size_read(0))) {
> > 
> > -            if (scan_inst->is_partial_write() ||
> > 
> > +            if (scan_inst->is_partial_var_write(dispatch_width) ||
> > 
> >                  (scan_inst->dst.type != inst->dst.type &&
> > 
> >                   !scan_inst->can_change_types()))
> > 
> >                 break;
> > 
> > @@ -153,7 +154,7 @@ fs_visitor::opt_saturate_propagation()
> > 
> >     calculate_live_intervals();
> > 
> > 
> > 
> >     foreach_block (block, cfg) {
> > 
> > -      progress = opt_saturate_propagation_local(this, block) ||
> > progress;
> > 
> > +      progress = opt_saturate_propagation_local(this, block,
> > dispatch_width) || progress;
> > 
> >     }
> > 
> > 
> > 
> >     /* Live intervals are still valid. */
> > 
> > diff --git a/src/intel/compiler/brw_fs_sel_peephole.cpp
> > b/src/intel/compiler/brw_fs_sel_peephole.cpp
> > 
> > index 6395b409b7c..98d640a3bfe 100644
> > 
> > --- a/src/intel/compiler/brw_fs_sel_peephole.cpp
> > 
> > +++ b/src/intel/compiler/brw_fs_sel_peephole.cpp
> > 
> > @@ -167,8 +167,8 @@ fs_visitor::opt_peephole_sel()
> > 
> >               then_mov[i]->exec_size != else_mov[i]->exec_size ||
> > 
> >               then_mov[i]->group != else_mov[i]->group ||
> > 
> >               then_mov[i]->force_writemask_all != else_mov[i]-
> > >force_writemask_all ||
> > 
> > -             then_mov[i]->is_partial_write() ||
> > 
> > -             else_mov[i]->is_partial_write() ||
> > 
> > +             then_mov[i]->is_partial_var_write(dispatch_width) ||
> > 
> > +             else_mov[i]->is_partial_var_write(dispatch_width) ||
> > 
> >               then_mov[i]->conditional_mod != BRW_CONDITIONAL_NONE
> > ||
> > 
> >               else_mov[i]->conditional_mod != BRW_CONDITIONAL_NONE)
> > {
> > 
> >              movs = i;
> > 
> > diff --git a/src/intel/compiler/brw_ir_fs.h
> > b/src/intel/compiler/brw_ir_fs.h
> > 
> > index ba4d6a95720..48b66ca5a65 100644
> > 
> > --- a/src/intel/compiler/brw_ir_fs.h
> > 
> > +++ b/src/intel/compiler/brw_ir_fs.h
> > 
> > @@ -349,7 +349,8 @@ public:
> > 
> > 
> > 
> >     bool equals(fs_inst *inst) const;
> > 
> >     bool is_send_from_grf() const;
> > 
> > -   bool is_partial_write() const;
> > 
> > +   bool is_partial_reg_write() const;
> > 
> > +   bool is_partial_var_write(unsigned dispatch_width) const;
> > 
> >     bool is_copy_payload(const brw::simple_allocator &grf_alloc)
> > const;
> > 
> >     unsigned components_read(unsigned i) const;
> > 
> >     unsigned size_read(int arg) const;
> >