[v2,52/53] intel/compiler: fix combine constants for Align16 with half-float prior to gen9

Submitted by Iago Toral Quiroga on Dec. 19, 2018, 11:51 a.m.

Details

Message ID 20181219115121.20815-53-itoral@igalia.com
State New
Series "intel: VK_KHR_shader_float16_int8 implementation" ( rev: 3 ) in Mesa

Commit Message

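There is a hardware restriction prior to gen9 where <0,1,0>:HF in Align16
doesn't replicate a single 16-bit channel, but instead replicates a full
32-bit channel, that is, a pair of half-floats. Work around it by writing the
constant to both HF slots of the pair.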
---
 .../compiler/brw_fs_combine_constants.cpp     | 24 +++++++++++++++++--
 1 file changed, 22 insertions(+), 2 deletions(-)
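
The effect of the restriction can be illustrated with a small standalone
model. This is only a sketch under stated assumptions, not Mesa code: the
helper names (hf_constant_width, write_constant, read_align16_hf) and the
16-slot register layout are hypothetical, and the read model simply assumes
that channels alternate between the two HF slots of a 32-bit pair, as the
quoted restriction suggests.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the width selection in the patch. The real code
// reads devinfo->gen, imm->type and imm->inst; these parameters are
// illustrative only.
static unsigned hf_constant_width(int gen, bool is_hf, bool has_inst,
                                  bool is_3src)
{
   return (gen < 9 && is_hf && (!has_inst || is_3src)) ? 2u : 1u;
}

using Grf = std::array<uint16_t, 16>;   // one 32-byte register as 16 HF slots

// Write 'width' copies of the HF bit pattern starting at a 32-bit aligned slot.
static Grf write_constant(uint16_t bits, unsigned slot, unsigned width)
{
   Grf grf{};                           // remaining slots stay zero
   for (unsigned i = 0; i < width; i++)
      grf[slot + i] = bits;
   return grf;
}

// Toy model (assumption) of the pre-gen9 restriction: the <0,1,0>:HF region
// replicates the whole 32-bit channel, so the channels read alternate
// between the two HF slots of the pair.
static std::array<uint16_t, 8> read_align16_hf(const Grf &grf, unsigned slot)
{
   std::array<uint16_t, 8> out{};
   for (unsigned i = 0; i < out.size(); i++)
      out[i] = grf[(slot & ~1u) + (i & 1u)];
   return out;
}

// True when every channel saw the intended constant.
static bool all_channels_equal(const std::array<uint16_t, 8> &v, uint16_t bits)
{
   for (uint16_t x : v)
      if (x != bits)
         return false;
   return true;
}

// Self-check of the model: one slot is not enough, two slots are.
static void check_model()
{
   const uint16_t one = 0x3C00;         // 1.0 encoded as a half-float
   assert(!all_channels_equal(read_align16_hf(write_constant(one, 2, 1), 2), one));
   assert(all_channels_equal(read_align16_hf(write_constant(one, 2, 2), 2), one));
}
```

In this model, writing the constant with width 1 leaves every odd channel
reading the neighbouring (zero) slot, while width 2 makes all channels read
the intended value, which is the reason for group(width, 0) in the patch.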

diff --git a/src/intel/compiler/brw_fs_combine_constants.cpp b/src/intel/compiler/brw_fs_combine_constants.cpp
index 24307e365ab..082aa536906 100644
--- a/src/intel/compiler/brw_fs_combine_constants.cpp
+++ b/src/intel/compiler/brw_fs_combine_constants.cpp
@@ -301,7 +301,26 @@  fs_visitor::opt_combine_constants()
        */
       exec_node *n = (imm->inst ? imm->inst :
                       imm->block->last_non_control_flow_inst()->next);
-      const fs_builder ibld = bld.at(imm->block, n).exec_all().group(1, 0);
+
+      /* Prior to gen9 we also have to deal with this restriction:
+       *
+       * "In Align16 mode, the channel selects and channel enables apply to a
+       *  pair of half-floats, because these parameters are defined for DWord
+       *  elements ONLY. This is applicable when both source and destination
+       *  are half-floats."
+       *
+       * This means that when we emit a 3-src instruction such as MAD or LRP,
+       * for which we use Align16, if we need to promote an HF constant to a
+       * register we need to be aware that the <0,1,0>:HF region would still
+       * read 2 HF slots and not replicate the single one like we want.
+       * We fix this by populating both HF slots with the constant we need to
+       * read.
+       */
+      const uint32_t width =
+         devinfo->gen < 9 &&
+         imm->type == BRW_REGISTER_TYPE_HF &&
+         (!imm->inst || imm->inst->is_3src(devinfo)) ? 2 : 1;
+      const fs_builder ibld = bld.at(imm->block, n).exec_all().group(width, 0);
 
       reg = retype(reg, imm->type);
       if (imm->type == BRW_REGISTER_TYPE_F) {
@@ -314,7 +333,8 @@  fs_visitor::opt_combine_constants()
       imm->subreg_offset = reg.offset;
 
       /* Keep offsets 32-bit aligned since we are mixing 32-bit and 16-bit
-       * constants into the same register
+       * constants into the same register (and we are writing 32-bit slots
+       * prior to gen9 for HF constants anyway).
        *
        * TODO: try to pack pairs of HF constants into each 32-bit slot
        */