[v4,34/40] intel/compiler: validate region restrictions for half-float conversions

Submitted by Iago Toral Quiroga on Feb. 12, 2019, 11:56 a.m.

Details

Message ID 20190212115607.21467-35-itoral@igalia.com
State New
Headers show
Series "intel: VK_KHR_shader_float16_int8 implementation" ( rev: 10 9 8 7 6 ) in Mesa

Not browsing as part of any series.

Commit Message

Iago Toral Quiroga Feb. 12, 2019, 11:56 a.m.
---
 src/intel/compiler/brw_eu_validate.c    |  64 ++++++++++++-
 src/intel/compiler/test_eu_validate.cpp | 122 ++++++++++++++++++++++++
 2 files changed, 185 insertions(+), 1 deletion(-)

Patch hide | download patch | download mbox

diff --git a/src/intel/compiler/brw_eu_validate.c b/src/intel/compiler/brw_eu_validate.c
index 000a05cb6ac..203641fecb9 100644
--- a/src/intel/compiler/brw_eu_validate.c
+++ b/src/intel/compiler/brw_eu_validate.c
@@ -531,7 +531,69 @@  general_restrictions_based_on_operand_types(const struct gen_device_info *devinf
        exec_type_size == 8 && dst_type_size == 4)
       dst_type_size = 8;
 
-   if (exec_type_size > dst_type_size) {
+   /* From the BDW+ PRM:
+    *
+    *    "There is no direct conversion from HF to DF or DF to HF.
+    *     There is no direct conversion from HF to Q/UQ or Q/UQ to HF."
+    */
+   enum brw_reg_type src0_type = brw_inst_src0_type(devinfo, inst);
+   ERROR_IF(brw_inst_opcode(devinfo, inst) == BRW_OPCODE_MOV &&
+            ((dst_type == BRW_REGISTER_TYPE_HF && type_sz(src0_type) == 8) ||
+             (dst_type_size == 8 && src0_type == BRW_REGISTER_TYPE_HF)),
+            "There are no direct conversion between 64-bit types and HF");
+
+   /* From the BDW+ PRM:
+    *
+    *   "Conversion between Integer and HF (Half Float) must be
+    *    DWord-aligned and strided by a DWord on the destination."
+    *
+    * But this seems to be expanded on CHV and SKL+ by:
+    *
+    *   "There is a relaxed alignment rule for word destinations. When
+    *    the destination type is word (UW, W, HF), destination data types
+    *    can be aligned to either the lowest word or the second lowest
+    *    word of the execution channel. This means the destination data
+    *    words can be either all in the even word locations or all in the
+    *    odd word locations."
+    *
+    * We do not implement the second rule as is though, since empirical testing
+    * shows inconsistencies:
+    *   - It suggests that packed 16-bit is not allowed, which is not true.
+    *   - It suggests that conversions from Q/DF to W (which need to be 64-bit
+    *     aligned on the destination) are not possible, which is not true.
+    *   - It suggests that conversions from 16-bit executions types to W need
+    *     to be 32-bit aligned, which doesn't seem to be necessary.
+    *
+    * So from this rule we only validate the implication that conversion from
+    * F to HF needs to be DWord aligned too (in BDW this is limited to
+    * conversions from integer types).
+    */
+   bool is_half_float_conversion =
+       brw_inst_opcode(devinfo, inst) == BRW_OPCODE_MOV &&
+       dst_type != src0_type &&
+       (dst_type == BRW_REGISTER_TYPE_HF || src0_type == BRW_REGISTER_TYPE_HF);
+
+   if (is_half_float_conversion) {
+      assert(devinfo->gen >= 8);
+
+      if ((dst_type == BRW_REGISTER_TYPE_HF && brw_reg_type_is_integer(src0_type)) ||
+          (brw_reg_type_is_integer(dst_type) && src0_type == BRW_REGISTER_TYPE_HF)) {
+         ERROR_IF(dst_stride * dst_type_size != 4,
+                  "Conversions between integer and half-float must be strided "
+                  "by a DWord on the destination");
+
+         unsigned subreg = brw_inst_dst_da1_subreg_nr(devinfo, inst);
+         ERROR_IF(subreg % 4 != 0,
+                  "Conversions between integer and half-float must be aligned "
+                  "to a DWord on the destination");
+      } else if ((devinfo->is_cherryview || devinfo->gen >= 9) &&
+                 dst_type == BRW_REGISTER_TYPE_HF) {
+         ERROR_IF(dst_stride != 2,
+                  "Conversions to HF must have either all words in even word "
+                  "locations or all words in odd word locations");
+      }
+
+   } else if (exec_type_size > dst_type_size) {
       if (!(dst_type_is_byte && inst_is_raw_move(devinfo, inst))) {
          ERROR_IF(dst_stride * dst_type_size != exec_type_size,
                   "Destination stride must be equal to the ratio of the sizes "
diff --git a/src/intel/compiler/test_eu_validate.cpp b/src/intel/compiler/test_eu_validate.cpp
index 73300b23122..1557b6d2452 100644
--- a/src/intel/compiler/test_eu_validate.cpp
+++ b/src/intel/compiler/test_eu_validate.cpp
@@ -848,6 +848,128 @@  TEST_P(validation_test, byte_destination_relaxed_alignment)
    }
 }
 
+TEST_P(validation_test, half_float_conversion)
+{
+   static const struct {
+      enum brw_reg_type dst_type;
+      enum brw_reg_type src_type;
+      unsigned dst_stride;
+      unsigned dst_subnr;
+      bool expected_result;
+   } inst[] = {
+#define INST(dst_type, src_type, dst_stride, dst_subnr, expected_result)  \
+      {                                                                   \
+         BRW_REGISTER_TYPE_##dst_type,                                    \
+         BRW_REGISTER_TYPE_##src_type,                                    \
+         BRW_HORIZONTAL_STRIDE_##dst_stride,                              \
+         dst_subnr,                                                       \
+         expected_result,                                                 \
+      }
+
+      /* MOV to half-float destination (F source handled later) */
+      INST(HF,  B, 1, 0, false),
+      INST(HF,  W, 1, 0, false),
+      INST(HF, HF, 1, 0, true),
+      INST(HF, HF, 1, 2, true),
+      INST(HF,  D, 1, 0, false),
+      INST(HF,  Q, 1, 0, false),
+      INST(HF, DF, 1, 0, false),
+      INST(HF,  B, 2, 0, true),
+      INST(HF,  B, 2, 2, false),
+      INST(HF,  W, 2, 0, true),
+      INST(HF,  W, 2, 2, false),
+      INST(HF, HF, 2, 0, true),
+      INST(HF, HF, 2, 2, true),
+      INST(HF,  D, 2, 0, true),
+      INST(HF,  D, 2, 2, false),
+      INST(HF,  Q, 2, 0, false),
+      INST(HF, DF, 2, 0, false),
+      INST(HF,  B, 4, 0, false),
+      INST(HF,  W, 4, 0, false),
+      INST(HF, HF, 4, 0, true),
+      INST(HF, HF, 4, 2, true),
+      INST(HF,  D, 4, 0, false),
+      INST(HF,  Q, 4, 0, false),
+      INST(HF, DF, 4, 0, false),
+
+      /* MOV from half-float source */
+      INST( B, HF, 1, 0, false),
+      INST( W, HF, 1, 0, false),
+      INST( D, HF, 1, 0, true),
+      INST( D, HF, 1, 4, true),
+      INST( F, HF, 1, 0, true),
+      INST( F, HF, 1, 4, true),
+      INST( Q, HF, 1, 0, false),
+      INST(DF, HF, 1, 0, false),
+      INST( B, HF, 2, 0, false),
+      INST( W, HF, 2, 0, true),
+      INST( W, HF, 2, 2, false),
+      INST( D, HF, 2, 0, false),
+      INST( F, HF, 2, 0, true),
+      INST( F, HF, 2, 2, true),
+      INST( B, HF, 4, 0, true),
+      INST( B, HF, 4, 1, false),
+      INST( W, HF, 4, 0, false),
+
+#undef INST
+   };
+
+   if (devinfo.gen < 8)
+      return;
+
+   for (unsigned i = 0; i < sizeof(inst) / sizeof(inst[0]); i++) {
+      if (!devinfo.has_64bit_types &&
+          (type_sz(inst[i].src_type) == 8 || type_sz(inst[i].dst_type) == 8)) {
+         continue;
+      }
+
+      brw_MOV(p, retype(g0, inst[i].dst_type), retype(g0, inst[i].src_type));
+
+      brw_inst_set_exec_size(&devinfo, last_inst, BRW_EXECUTE_4);
+
+      brw_inst_set_dst_hstride(&devinfo, last_inst, inst[i].dst_stride);
+      brw_inst_set_dst_da1_subreg_nr(&devinfo, last_inst, inst[i].dst_subnr);
+
+      if (inst[i].src_type == BRW_REGISTER_TYPE_B) {
+         brw_inst_set_src0_vstride(&devinfo, last_inst, BRW_VERTICAL_STRIDE_4);
+         brw_inst_set_src0_width(&devinfo, last_inst, BRW_WIDTH_2);
+         brw_inst_set_src0_hstride(&devinfo, last_inst, BRW_HORIZONTAL_STRIDE_2);
+      } else {
+         brw_inst_set_src0_vstride(&devinfo, last_inst, BRW_VERTICAL_STRIDE_4);
+         brw_inst_set_src0_width(&devinfo, last_inst, BRW_WIDTH_4);
+         brw_inst_set_src0_hstride(&devinfo, last_inst, BRW_HORIZONTAL_STRIDE_1);
+      }
+
+      EXPECT_EQ(inst[i].expected_result, validate(p));
+
+      clear_instructions(p);
+   }
+
+   /* Conversion from F to HF has Dword destination alignment restriction
+    * on CHV and SKL+ only
+    */
+   brw_MOV(p, retype(g0, BRW_REGISTER_TYPE_HF),
+              retype(g0, BRW_REGISTER_TYPE_F));
+
+   brw_inst_set_dst_hstride(&devinfo, last_inst, BRW_HORIZONTAL_STRIDE_1);
+
+   if (devinfo.gen >= 9 || devinfo.is_cherryview) {
+      EXPECT_FALSE(validate(p));
+   } else {
+      assert(devinfo.gen == 8);
+      EXPECT_TRUE(validate(p));
+   }
+   clear_instructions(p);
+
+   brw_MOV(p, retype(g0, BRW_REGISTER_TYPE_HF),
+              retype(g0, BRW_REGISTER_TYPE_F));
+
+   brw_inst_set_dst_hstride(&devinfo, last_inst, BRW_HORIZONTAL_STRIDE_2);
+
+   EXPECT_TRUE(validate(p));
+   clear_instructions(p);
+}
+
 TEST_P(validation_test, vector_immediate_destination_alignment)
 {
    static const struct {

Comments

On Tue, Feb 12, 2019 at 11:53 AM Iago Toral Quiroga <itoral@igalia.com>
wrote:

> ---
>  src/intel/compiler/brw_eu_validate.c    |  64 ++++++++++++-
>  src/intel/compiler/test_eu_validate.cpp | 122 ++++++++++++++++++++++++
>  2 files changed, 185 insertions(+), 1 deletion(-)
>
> diff --git a/src/intel/compiler/brw_eu_validate.c
> b/src/intel/compiler/brw_eu_validate.c
> index 000a05cb6ac..203641fecb9 100644
> --- a/src/intel/compiler/brw_eu_validate.c
> +++ b/src/intel/compiler/brw_eu_validate.c
> @@ -531,7 +531,69 @@ general_restrictions_based_on_operand_types(const
> struct gen_device_info *devinf
>         exec_type_size == 8 && dst_type_size == 4)
>        dst_type_size = 8;
>
> -   if (exec_type_size > dst_type_size) {
> +   /* From the BDW+ PRM:
> +    *
> +    *    "There is no direct conversion from HF to DF or DF to HF.
> +    *     There is no direct conversion from HF to Q/UQ or Q/UQ to HF."
> +    */
> +   enum brw_reg_type src0_type = brw_inst_src0_type(devinfo, inst);
> +   ERROR_IF(brw_inst_opcode(devinfo, inst) == BRW_OPCODE_MOV &&
> +            ((dst_type == BRW_REGISTER_TYPE_HF && type_sz(src0_type) ==
> 8) ||
> +             (dst_type_size == 8 && src0_type == BRW_REGISTER_TYPE_HF)),
> +            "There are no direct conversion between 64-bit types and HF");
> +
> +   /* From the BDW+ PRM:
> +    *
> +    *   "Conversion between Integer and HF (Half Float) must be
> +    *    DWord-aligned and strided by a DWord on the destination."
> +    *
> +    * But this seems to be expanded on CHV and SKL+ by:
> +    *
> +    *   "There is a relaxed alignment rule for word destinations. When
> +    *    the destination type is word (UW, W, HF), destination data types
> +    *    can be aligned to either the lowest word or the second lowest
> +    *    word of the execution channel. This means the destination data
> +    *    words can be either all in the even word locations or all in the
> +    *    odd word locations."
> +    *
> +    * We do not implement the second rule as is though, since empirical
> testing
> +    * shows inconsistencies:
> +    *   - It suggests that packed 16-bit is not allowed, which is not
> true.
> +    *   - It suggests that conversions from Q/DF to W (which need to be
> 64-bit
> +    *     aligned on the destination) are not possible, which is not true.
> +    *   - It suggests that conversions from 16-bit executions types to W
> need
> +    *     to be 32-bit aligned, which doesn't seem to be necessary.
> +    *
> +    * So from this rule we only validate the implication that conversion
> from
> +    * F to HF needs to be DWord aligned too (in BDW this is limited to
> +    * conversions from integer types).
> +    */
> +   bool is_half_float_conversion =
> +       brw_inst_opcode(devinfo, inst) == BRW_OPCODE_MOV &&
> +       dst_type != src0_type &&
> +       (dst_type == BRW_REGISTER_TYPE_HF || src0_type ==
> BRW_REGISTER_TYPE_HF);
> +
> +   if (is_half_float_conversion) {
> +      assert(devinfo->gen >= 8);
> +
> +      if ((dst_type == BRW_REGISTER_TYPE_HF &&
> brw_reg_type_is_integer(src0_type)) ||
> +          (brw_reg_type_is_integer(dst_type) && src0_type ==
> BRW_REGISTER_TYPE_HF)) {
> +         ERROR_IF(dst_stride * dst_type_size != 4,
> +                  "Conversions between integer and half-float must be
> strided "
> +                  "by a DWord on the destination");
>

Does this mean stride must be 4B or does it mean a multiple of 4B?


> +
> +         unsigned subreg = brw_inst_dst_da1_subreg_nr(devinfo, inst);
> +         ERROR_IF(subreg % 4 != 0,
> +                  "Conversions between integer and half-float must be
> aligned "
> +                  "to a DWord on the destination");
> +      } else if ((devinfo->is_cherryview || devinfo->gen >= 9) &&
> +                 dst_type == BRW_REGISTER_TYPE_HF) {
> +         ERROR_IF(dst_stride != 2,
>

Should this be dst_stride != 2 or dst_stride == 1?  If dst_stride were, say
4, that would place them all in even or all in odd locations.  It's only if
dst_stride == 1 that you end up with both even and odd.


> +                  "Conversions to HF must have either all words in even
> word "
> +                  "locations or all words in odd word locations");
> +      }
> +
> +   } else if (exec_type_size > dst_type_size) {
>        if (!(dst_type_is_byte && inst_is_raw_move(devinfo, inst))) {
>           ERROR_IF(dst_stride * dst_type_size != exec_type_size,
>                    "Destination stride must be equal to the ratio of the
> sizes "
> diff --git a/src/intel/compiler/test_eu_validate.cpp
> b/src/intel/compiler/test_eu_validate.cpp
> index 73300b23122..1557b6d2452 100644
> --- a/src/intel/compiler/test_eu_validate.cpp
> +++ b/src/intel/compiler/test_eu_validate.cpp
> @@ -848,6 +848,128 @@ TEST_P(validation_test,
> byte_destination_relaxed_alignment)
>     }
>  }
>
> +TEST_P(validation_test, half_float_conversion)
> +{
> +   static const struct {
> +      enum brw_reg_type dst_type;
> +      enum brw_reg_type src_type;
> +      unsigned dst_stride;
> +      unsigned dst_subnr;
> +      bool expected_result;
> +   } inst[] = {
> +#define INST(dst_type, src_type, dst_stride, dst_subnr, expected_result)
> \
> +      {
>  \
> +         BRW_REGISTER_TYPE_##dst_type,
> \
> +         BRW_REGISTER_TYPE_##src_type,
> \
> +         BRW_HORIZONTAL_STRIDE_##dst_stride,
> \
> +         dst_subnr,
>  \
> +         expected_result,
>  \
> +      }
> +
> +      /* MOV to half-float destination (F source handled later) */
> +      INST(HF,  B, 1, 0, false),
> +      INST(HF,  W, 1, 0, false),
> +      INST(HF, HF, 1, 0, true),
> +      INST(HF, HF, 1, 2, true),
> +      INST(HF,  D, 1, 0, false),
> +      INST(HF,  Q, 1, 0, false),
> +      INST(HF, DF, 1, 0, false),
> +      INST(HF,  B, 2, 0, true),
> +      INST(HF,  B, 2, 2, false),
> +      INST(HF,  W, 2, 0, true),
> +      INST(HF,  W, 2, 2, false),
> +      INST(HF, HF, 2, 0, true),
> +      INST(HF, HF, 2, 2, true),
> +      INST(HF,  D, 2, 0, true),
> +      INST(HF,  D, 2, 2, false),
> +      INST(HF,  Q, 2, 0, false),
> +      INST(HF, DF, 2, 0, false),
> +      INST(HF,  B, 4, 0, false),
> +      INST(HF,  W, 4, 0, false),
> +      INST(HF, HF, 4, 0, true),
> +      INST(HF, HF, 4, 2, true),
> +      INST(HF,  D, 4, 0, false),
> +      INST(HF,  Q, 4, 0, false),
> +      INST(HF, DF, 4, 0, false),
> +
> +      /* MOV from half-float source */
> +      INST( B, HF, 1, 0, false),
> +      INST( W, HF, 1, 0, false),
> +      INST( D, HF, 1, 0, true),
> +      INST( D, HF, 1, 4, true),
> +      INST( F, HF, 1, 0, true),
> +      INST( F, HF, 1, 4, true),
> +      INST( Q, HF, 1, 0, false),
> +      INST(DF, HF, 1, 0, false),
> +      INST( B, HF, 2, 0, false),
> +      INST( W, HF, 2, 0, true),
> +      INST( W, HF, 2, 2, false),
> +      INST( D, HF, 2, 0, false),
> +      INST( F, HF, 2, 0, true),
> +      INST( F, HF, 2, 2, true),
> +      INST( B, HF, 4, 0, true),
> +      INST( B, HF, 4, 1, false),
> +      INST( W, HF, 4, 0, false),
> +
> +#undef INST
> +   };
> +
> +   if (devinfo.gen < 8)
> +      return;
> +
> +   for (unsigned i = 0; i < sizeof(inst) / sizeof(inst[0]); i++) {
> +      if (!devinfo.has_64bit_types &&
> +          (type_sz(inst[i].src_type) == 8 || type_sz(inst[i].dst_type) ==
> 8)) {
> +         continue;
> +      }
> +
> +      brw_MOV(p, retype(g0, inst[i].dst_type), retype(g0,
> inst[i].src_type));
> +
> +      brw_inst_set_exec_size(&devinfo, last_inst, BRW_EXECUTE_4);
> +
> +      brw_inst_set_dst_hstride(&devinfo, last_inst, inst[i].dst_stride);
> +      brw_inst_set_dst_da1_subreg_nr(&devinfo, last_inst,
> inst[i].dst_subnr);
> +
> +      if (inst[i].src_type == BRW_REGISTER_TYPE_B) {
> +         brw_inst_set_src0_vstride(&devinfo, last_inst,
> BRW_VERTICAL_STRIDE_4);
> +         brw_inst_set_src0_width(&devinfo, last_inst, BRW_WIDTH_2);
> +         brw_inst_set_src0_hstride(&devinfo, last_inst,
> BRW_HORIZONTAL_STRIDE_2);
> +      } else {
> +         brw_inst_set_src0_vstride(&devinfo, last_inst,
> BRW_VERTICAL_STRIDE_4);
> +         brw_inst_set_src0_width(&devinfo, last_inst, BRW_WIDTH_4);
> +         brw_inst_set_src0_hstride(&devinfo, last_inst,
> BRW_HORIZONTAL_STRIDE_1);
> +      }
> +
> +      EXPECT_EQ(inst[i].expected_result, validate(p));
> +
> +      clear_instructions(p);
> +   }
> +
> +   /* Conversion from F to HF has Dword destination alignment restriction
> +    * on CHV and SKL+ only
> +    */
> +   brw_MOV(p, retype(g0, BRW_REGISTER_TYPE_HF),
> +              retype(g0, BRW_REGISTER_TYPE_F));
> +
> +   brw_inst_set_dst_hstride(&devinfo, last_inst, BRW_HORIZONTAL_STRIDE_1);
> +
> +   if (devinfo.gen >= 9 || devinfo.is_cherryview) {
> +      EXPECT_FALSE(validate(p));
> +   } else {
> +      assert(devinfo.gen == 8);
> +      EXPECT_TRUE(validate(p));
> +   }
> +   clear_instructions(p);
> +
> +   brw_MOV(p, retype(g0, BRW_REGISTER_TYPE_HF),
> +              retype(g0, BRW_REGISTER_TYPE_F));
> +
> +   brw_inst_set_dst_hstride(&devinfo, last_inst, BRW_HORIZONTAL_STRIDE_2);
> +
> +   EXPECT_TRUE(validate(p));
> +   clear_instructions(p);
> +}
> +
>  TEST_P(validation_test, vector_immediate_destination_alignment)
>  {
>     static const struct {
> --
> 2.17.1
>
> _______________________________________________
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
On Sat, 2019-02-16 at 09:40 -0600, Jason Ekstrand wrote:
> On Tue, Feb 12, 2019 at 11:53 AM Iago Toral Quiroga <
> itoral@igalia.com> wrote:
> > ---
> > 
> >  src/intel/compiler/brw_eu_validate.c    |  64 ++++++++++++-
> > 
> >  src/intel/compiler/test_eu_validate.cpp | 122
> > ++++++++++++++++++++++++
> > 
> >  2 files changed, 185 insertions(+), 1 deletion(-)
> > 
> > 
> > 
> > diff --git a/src/intel/compiler/brw_eu_validate.c
> > b/src/intel/compiler/brw_eu_validate.c
> > 
> > index 000a05cb6ac..203641fecb9 100644
> > 
> > --- a/src/intel/compiler/brw_eu_validate.c
> > 
> > +++ b/src/intel/compiler/brw_eu_validate.c
> > 
> > @@ -531,7 +531,69 @@
> > general_restrictions_based_on_operand_types(const struct
> > gen_device_info *devinf
> > 
> >         exec_type_size == 8 && dst_type_size == 4)
> > 
> >        dst_type_size = 8;
> > 
> > 
> > 
> > -   if (exec_type_size > dst_type_size) {
> > 
> > +   /* From the BDW+ PRM:
> > 
> > +    *
> > 
> > +    *    "There is no direct conversion from HF to DF or DF to HF.
> > 
> > +    *     There is no direct conversion from HF to Q/UQ or Q/UQ to
> > HF."
> > 
> > +    */
> > 
> > +   enum brw_reg_type src0_type = brw_inst_src0_type(devinfo,
> > inst);
> > 
> > +   ERROR_IF(brw_inst_opcode(devinfo, inst) == BRW_OPCODE_MOV &&
> > 
> > +            ((dst_type == BRW_REGISTER_TYPE_HF &&
> > type_sz(src0_type) == 8) ||
> > 
> > +             (dst_type_size == 8 && src0_type ==
> > BRW_REGISTER_TYPE_HF)),
> > 
> > +            "There are no direct conversion between 64-bit types
> > and HF");
> > 
> > +
> > 
> > +   /* From the BDW+ PRM:
> > 
> > +    *
> > 
> > +    *   "Conversion between Integer and HF (Half Float) must be
> > 
> > +    *    DWord-aligned and strided by a DWord on the destination."
> > 
> > +    *
> > 
> > +    * But this seems to be expanded on CHV and SKL+ by:
> > 
> > +    *
> > 
> > +    *   "There is a relaxed alignment rule for word destinations.
> > When
> > 
> > +    *    the destination type is word (UW, W, HF), destination
> > data types
> > 
> > +    *    can be aligned to either the lowest word or the second
> > lowest
> > 
> > +    *    word of the execution channel. This means the destination
> > data
> > 
> > +    *    words can be either all in the even word locations or all
> > in the
> > 
> > +    *    odd word locations."
> > 
> > +    *
> > 
> > +    * We do not implement the second rule as is though, since
> > empirical testing
> > 
> > +    * shows inconsistencies:
> > 
> > +    *   - It suggests that packed 16-bit is not allowed, which is
> > not true.
> > 
> > +    *   - It suggests that conversions from Q/DF to W (which need
> > to be 64-bit
> > 
> > +    *     aligned on the destination) are not possible, which is
> > not true.
> > 
> > +    *   - It suggests that conversions from 16-bit executions
> > types to W need
> > 
> > +    *     to be 32-bit aligned, which doesn't seem to be
> > necessary.
> > 
> > +    *
> > 
> > +    * So from this rule we only validate the implication that
> > conversion from
> > 
> > +    * F to HF needs to be DWord aligned too (in BDW this is
> > limited to
> > 
> > +    * conversions from integer types).
> > 
> > +    */
> > 
> > +   bool is_half_float_conversion =
> > 
> > +       brw_inst_opcode(devinfo, inst) == BRW_OPCODE_MOV &&
> > 
> > +       dst_type != src0_type &&
> > 
> > +       (dst_type == BRW_REGISTER_TYPE_HF || src0_type ==
> > BRW_REGISTER_TYPE_HF);
> > 
> > +
> > 
> > +   if (is_half_float_conversion) {
> > 
> > +      assert(devinfo->gen >= 8);
> > 
> > +
> > 
> > +      if ((dst_type == BRW_REGISTER_TYPE_HF &&
> > brw_reg_type_is_integer(src0_type)) ||
> > 
> > +          (brw_reg_type_is_integer(dst_type) && src0_type ==
> > BRW_REGISTER_TYPE_HF)) {
> > 
> > +         ERROR_IF(dst_stride * dst_type_size != 4,
> > 
> > +                  "Conversions between integer and half-float must
> > be strided "
> > 
> > +                  "by a DWord on the destination");
> 
> Does this mean stride must be 4B or does it mean a multiple of 4B?
>  
> > +
> > 
> > +         unsigned subreg = brw_inst_dst_da1_subreg_nr(devinfo,
> > inst);
> > 
> > +         ERROR_IF(subreg % 4 != 0,
> > 
> > +                  "Conversions between integer and half-float must
> > be aligned "
> > 
> > +                  "to a DWord on the destination");
> > 
> > +      } else if ((devinfo->is_cherryview || devinfo->gen >= 9) &&
> > 
> > +                 dst_type == BRW_REGISTER_TYPE_HF) {
> > 
> > +         ERROR_IF(dst_stride != 2,
> 
> Should this be dst_stride != 2 or dst_stride == 1?  If dst_stride
> were, say 4, that would place them all in even or all in odd
> locations.  It's only if dst_stride == 1 that you end up with both
> even and odd.

I think this needs to be exactly a DWord for both the stride and the
alignment. When Curro explained this he made the case that what is
probably happening under the hood is that there is a promotion of the
exec type to 32-bit, and then the following general restriction
applies:
"When the Execution Data Type is wider than the destination data type,
the destination mustbe aligned as required by the wider execution data
type and specify a HorzStride equal tothe ratio in sizes of the two
data types. For example, a mov with a D source and B destinationmust
use a 4-byte aligned destination and a Dst.HorzStride of 4"
Which is described in the same terms (requiring exact stride and
alignment to match the execution type).
> > +                  "Conversions to HF must have either all words in
> > even word "
> > 
> > +                  "locations or all words in odd word locations");
> > 
> > +      }
> > 
> > +
> > 
> > +   } else if (exec_type_size > dst_type_size) {
> > 
> >        if (!(dst_type_is_byte && inst_is_raw_move(devinfo, inst)))
> > {
> > 
> >           ERROR_IF(dst_stride * dst_type_size != exec_type_size,
> > 
> >                    "Destination stride must be equal to the ratio
> > of the sizes "
> > 
> > diff --git a/src/intel/compiler/test_eu_validate.cpp
> > b/src/intel/compiler/test_eu_validate.cpp
> > 
> > index 73300b23122..1557b6d2452 100644
> > 
> > --- a/src/intel/compiler/test_eu_validate.cpp
> > 
> > +++ b/src/intel/compiler/test_eu_validate.cpp
> > 
> > @@ -848,6 +848,128 @@ TEST_P(validation_test,
> > byte_destination_relaxed_alignment)
> > 
> >     }
> > 
> >  }
> > 
> > 
> > 
> > +TEST_P(validation_test, half_float_conversion)
> > 
> > +{
> > 
> > +   static const struct {
> > 
> > +      enum brw_reg_type dst_type;
> > 
> > +      enum brw_reg_type src_type;
> > 
> > +      unsigned dst_stride;
> > 
> > +      unsigned dst_subnr;
> > 
> > +      bool expected_result;
> > 
> > +   } inst[] = {
> > 
> > +#define INST(dst_type, src_type, dst_stride, dst_subnr,
> > expected_result)  \
> > 
> > +      {                                                           
> >        \
> > 
> > +         BRW_REGISTER_TYPE_##dst_type,                           
> >         \
> > 
> > +         BRW_REGISTER_TYPE_##src_type,                           
> >         \
> > 
> > +         BRW_HORIZONTAL_STRIDE_##dst_stride,                     
> >         \
> > 
> > +         dst_subnr,                                               
> >        \
> > 
> > +         expected_result,                                         
> >        \
> > 
> > +      }
> > 
> > +
> > 
> > +      /* MOV to half-float destination (F source handled later) */
> > 
> > +      INST(HF,  B, 1, 0, false),
> > 
> > +      INST(HF,  W, 1, 0, false),
> > 
> > +      INST(HF, HF, 1, 0, true),
> > 
> > +      INST(HF, HF, 1, 2, true),
> > 
> > +      INST(HF,  D, 1, 0, false),
> > 
> > +      INST(HF,  Q, 1, 0, false),
> > 
> > +      INST(HF, DF, 1, 0, false),
> > 
> > +      INST(HF,  B, 2, 0, true),
> > 
> > +      INST(HF,  B, 2, 2, false),
> > 
> > +      INST(HF,  W, 2, 0, true),
> > 
> > +      INST(HF,  W, 2, 2, false),
> > 
> > +      INST(HF, HF, 2, 0, true),
> > 
> > +      INST(HF, HF, 2, 2, true),
> > 
> > +      INST(HF,  D, 2, 0, true),
> > 
> > +      INST(HF,  D, 2, 2, false),
> > 
> > +      INST(HF,  Q, 2, 0, false),
> > 
> > +      INST(HF, DF, 2, 0, false),
> > 
> > +      INST(HF,  B, 4, 0, false),
> > 
> > +      INST(HF,  W, 4, 0, false),
> > 
> > +      INST(HF, HF, 4, 0, true),
> > 
> > +      INST(HF, HF, 4, 2, true),
> > 
> > +      INST(HF,  D, 4, 0, false),
> > 
> > +      INST(HF,  Q, 4, 0, false),
> > 
> > +      INST(HF, DF, 4, 0, false),
> > 
> > +
> > 
> > +      /* MOV from half-float source */
> > 
> > +      INST( B, HF, 1, 0, false),
> > 
> > +      INST( W, HF, 1, 0, false),
> > 
> > +      INST( D, HF, 1, 0, true),
> > 
> > +      INST( D, HF, 1, 4, true),
> > 
> > +      INST( F, HF, 1, 0, true),
> > 
> > +      INST( F, HF, 1, 4, true),
> > 
> > +      INST( Q, HF, 1, 0, false),
> > 
> > +      INST(DF, HF, 1, 0, false),
> > 
> > +      INST( B, HF, 2, 0, false),
> > 
> > +      INST( W, HF, 2, 0, true),
> > 
> > +      INST( W, HF, 2, 2, false),
> > 
> > +      INST( D, HF, 2, 0, false),
> > 
> > +      INST( F, HF, 2, 0, true),
> > 
> > +      INST( F, HF, 2, 2, true),
> > 
> > +      INST( B, HF, 4, 0, true),
> > 
> > +      INST( B, HF, 4, 1, false),
> > 
> > +      INST( W, HF, 4, 0, false),
> > 
> > +
> > 
> > +#undef INST
> > 
> > +   };
> > 
> > +
> > 
> > +   if (devinfo.gen < 8)
> > 
> > +      return;
> > 
> > +
> > 
> > +   for (unsigned i = 0; i < sizeof(inst) / sizeof(inst[0]); i++) {
> > 
> > +      if (!devinfo.has_64bit_types &&
> > 
> > +          (type_sz(inst[i].src_type) == 8 ||
> > type_sz(inst[i].dst_type) == 8)) {
> > 
> > +         continue;
> > 
> > +      }
> > 
> > +
> > 
> > +      brw_MOV(p, retype(g0, inst[i].dst_type), retype(g0,
> > inst[i].src_type));
> > 
> > +
> > 
> > +      brw_inst_set_exec_size(&devinfo, last_inst, BRW_EXECUTE_4);
> > 
> > +
> > 
> > +      brw_inst_set_dst_hstride(&devinfo, last_inst,
> > inst[i].dst_stride);
> > 
> > +      brw_inst_set_dst_da1_subreg_nr(&devinfo, last_inst,
> > inst[i].dst_subnr);
> > 
> > +
> > 
> > +      if (inst[i].src_type == BRW_REGISTER_TYPE_B) {
> > 
> > +         brw_inst_set_src0_vstride(&devinfo, last_inst,
> > BRW_VERTICAL_STRIDE_4);
> > 
> > +         brw_inst_set_src0_width(&devinfo, last_inst,
> > BRW_WIDTH_2);
> > 
> > +         brw_inst_set_src0_hstride(&devinfo, last_inst,
> > BRW_HORIZONTAL_STRIDE_2);
> > 
> > +      } else {
> > 
> > +         brw_inst_set_src0_vstride(&devinfo, last_inst,
> > BRW_VERTICAL_STRIDE_4);
> > 
> > +         brw_inst_set_src0_width(&devinfo, last_inst,
> > BRW_WIDTH_4);
> > 
> > +         brw_inst_set_src0_hstride(&devinfo, last_inst,
> > BRW_HORIZONTAL_STRIDE_1);
> > 
> > +      }
> > 
> > +
> > 
> > +      EXPECT_EQ(inst[i].expected_result, validate(p));
> > 
> > +
> > 
> > +      clear_instructions(p);
> > 
> > +   }
> > 
> > +
> > 
> > +   /* Conversion from F to HF has Dword destination alignment
> > restriction
> > 
> > +    * on CHV and SKL+ only
> > 
> > +    */
> > 
> > +   brw_MOV(p, retype(g0, BRW_REGISTER_TYPE_HF),
> > 
> > +              retype(g0, BRW_REGISTER_TYPE_F));
> > 
> > +
> > 
> > +   brw_inst_set_dst_hstride(&devinfo, last_inst,
> > BRW_HORIZONTAL_STRIDE_1);
> > 
> > +
> > 
> > +   if (devinfo.gen >= 9 || devinfo.is_cherryview) {
> > 
> > +      EXPECT_FALSE(validate(p));
> > 
> > +   } else {
> > 
> > +      assert(devinfo.gen == 8);
> > 
> > +      EXPECT_TRUE(validate(p));
> > 
> > +   }
> > 
> > +   clear_instructions(p);
> > 
> > +
> > 
> > +   brw_MOV(p, retype(g0, BRW_REGISTER_TYPE_HF),
> > 
> > +              retype(g0, BRW_REGISTER_TYPE_F));
> > 
> > +
> > 
> > +   brw_inst_set_dst_hstride(&devinfo, last_inst,
> > BRW_HORIZONTAL_STRIDE_2);
> > 
> > +
> > 
> > +   EXPECT_TRUE(validate(p));
> > 
> > +   clear_instructions(p);
> > 
> > +}
> > 
> > +
> > 
> >  TEST_P(validation_test, vector_immediate_destination_alignment)
> > 
> >  {
> > 
> >     static const struct {
> >
Iago Toral Quiroga <itoral@igalia.com> writes:

> ---
>  src/intel/compiler/brw_eu_validate.c    |  64 ++++++++++++-
>  src/intel/compiler/test_eu_validate.cpp | 122 ++++++++++++++++++++++++
>  2 files changed, 185 insertions(+), 1 deletion(-)
>
> diff --git a/src/intel/compiler/brw_eu_validate.c b/src/intel/compiler/brw_eu_validate.c
> index 000a05cb6ac..203641fecb9 100644
> --- a/src/intel/compiler/brw_eu_validate.c
> +++ b/src/intel/compiler/brw_eu_validate.c
> @@ -531,7 +531,69 @@ general_restrictions_based_on_operand_types(const struct gen_device_info *devinf
>         exec_type_size == 8 && dst_type_size == 4)
>        dst_type_size = 8;
>  
> -   if (exec_type_size > dst_type_size) {
> +   /* From the BDW+ PRM:
> +    *
> +    *    "There is no direct conversion from HF to DF or DF to HF.
> +    *     There is no direct conversion from HF to Q/UQ or Q/UQ to HF."
> +    */
> +   enum brw_reg_type src0_type = brw_inst_src0_type(devinfo, inst);
> +   ERROR_IF(brw_inst_opcode(devinfo, inst) == BRW_OPCODE_MOV &&

Why is only the MOV instruction handled here and below?  Aren't other
instructions able to do implicit conversions?  Probably means you need
to deal with two sources rather than one.

> +            ((dst_type == BRW_REGISTER_TYPE_HF && type_sz(src0_type) == 8) ||
> +             (dst_type_size == 8 && src0_type == BRW_REGISTER_TYPE_HF)),
> +            "There are no direct conversion between 64-bit types and HF");
> +
> +   /* From the BDW+ PRM:
> +    *
> +    *   "Conversion between Integer and HF (Half Float) must be
> +    *    DWord-aligned and strided by a DWord on the destination."
> +    *
> +    * But this seems to be expanded on CHV and SKL+ by:
> +    *
> +    *   "There is a relaxed alignment rule for word destinations. When
> +    *    the destination type is word (UW, W, HF), destination data types
> +    *    can be aligned to either the lowest word or the second lowest
> +    *    word of the execution channel. This means the destination data
> +    *    words can be either all in the even word locations or all in the
> +    *    odd word locations."
> +    *
> +    * We do not implement the second rule as is though, since empirical testing
> +    * shows inconsistencies:
> +    *   - It suggests that packed 16-bit is not allowed, which is not true.
> +    *   - It suggests that conversions from Q/DF to W (which need to be 64-bit
> +    *     aligned on the destination) are not possible, which is not true.
> +    *   - It suggests that conversions from 16-bit executions types to W need
> +    *     to be 32-bit aligned, which doesn't seem to be necessary.
> +    *
> +    * So from this rule we only validate the implication that conversion from
> +    * F to HF needs to be DWord aligned too (in BDW this is limited to
> +    * conversions from integer types).
> +    */
> +   bool is_half_float_conversion =
> +       brw_inst_opcode(devinfo, inst) == BRW_OPCODE_MOV &&
> +       dst_type != src0_type &&
> +       (dst_type == BRW_REGISTER_TYPE_HF || src0_type == BRW_REGISTER_TYPE_HF);
> +
> +   if (is_half_float_conversion) {
> +      assert(devinfo->gen >= 8);
> +
> +      if ((dst_type == BRW_REGISTER_TYPE_HF && brw_reg_type_is_integer(src0_type)) ||
> +          (brw_reg_type_is_integer(dst_type) && src0_type == BRW_REGISTER_TYPE_HF)) {
> +         ERROR_IF(dst_stride * dst_type_size != 4,
> +                  "Conversions between integer and half-float must be strided "
> +                  "by a DWord on the destination");
> +
> +         unsigned subreg = brw_inst_dst_da1_subreg_nr(devinfo, inst);
> +         ERROR_IF(subreg % 4 != 0,
> +                  "Conversions between integer and half-float must be aligned "
> +                  "to a DWord on the destination");
> +      } else if ((devinfo->is_cherryview || devinfo->gen >= 9) &&
> +                 dst_type == BRW_REGISTER_TYPE_HF) {
> +         ERROR_IF(dst_stride != 2,
> +                  "Conversions to HF must have either all words in even word "
> +                  "locations or all words in odd word locations");
> +      }
> +
> +   } else if (exec_type_size > dst_type_size) {
>        if (!(dst_type_is_byte && inst_is_raw_move(devinfo, inst))) {
>           ERROR_IF(dst_stride * dst_type_size != exec_type_size,
>                    "Destination stride must be equal to the ratio of the sizes "
> diff --git a/src/intel/compiler/test_eu_validate.cpp b/src/intel/compiler/test_eu_validate.cpp
> index 73300b23122..1557b6d2452 100644
> --- a/src/intel/compiler/test_eu_validate.cpp
> +++ b/src/intel/compiler/test_eu_validate.cpp
> @@ -848,6 +848,128 @@ TEST_P(validation_test, byte_destination_relaxed_alignment)
>     }
>  }
>  
> +TEST_P(validation_test, half_float_conversion)
> +{
> +   static const struct {
> +      enum brw_reg_type dst_type;
> +      enum brw_reg_type src_type;
> +      unsigned dst_stride;
> +      unsigned dst_subnr;
> +      bool expected_result;
> +   } inst[] = {
> +#define INST(dst_type, src_type, dst_stride, dst_subnr, expected_result)  \
> +      {                                                                   \
> +         BRW_REGISTER_TYPE_##dst_type,                                    \
> +         BRW_REGISTER_TYPE_##src_type,                                    \
> +         BRW_HORIZONTAL_STRIDE_##dst_stride,                              \
> +         dst_subnr,                                                       \
> +         expected_result,                                                 \
> +      }
> +
> +      /* MOV to half-float destination (F source handled later) */
> +      INST(HF,  B, 1, 0, false),
> +      INST(HF,  W, 1, 0, false),
> +      INST(HF, HF, 1, 0, true),
> +      INST(HF, HF, 1, 2, true),
> +      INST(HF,  D, 1, 0, false),
> +      INST(HF,  Q, 1, 0, false),
> +      INST(HF, DF, 1, 0, false),
> +      INST(HF,  B, 2, 0, true),
> +      INST(HF,  B, 2, 2, false),
> +      INST(HF,  W, 2, 0, true),
> +      INST(HF,  W, 2, 2, false),
> +      INST(HF, HF, 2, 0, true),
> +      INST(HF, HF, 2, 2, true),
> +      INST(HF,  D, 2, 0, true),
> +      INST(HF,  D, 2, 2, false),
> +      INST(HF,  Q, 2, 0, false),
> +      INST(HF, DF, 2, 0, false),
> +      INST(HF,  B, 4, 0, false),
> +      INST(HF,  W, 4, 0, false),
> +      INST(HF, HF, 4, 0, true),
> +      INST(HF, HF, 4, 2, true),
> +      INST(HF,  D, 4, 0, false),
> +      INST(HF,  Q, 4, 0, false),
> +      INST(HF, DF, 4, 0, false),
> +
> +      /* MOV from half-float source */
> +      INST( B, HF, 1, 0, false),
> +      INST( W, HF, 1, 0, false),
> +      INST( D, HF, 1, 0, true),
> +      INST( D, HF, 1, 4, true),
> +      INST( F, HF, 1, 0, true),
> +      INST( F, HF, 1, 4, true),
> +      INST( Q, HF, 1, 0, false),
> +      INST(DF, HF, 1, 0, false),
> +      INST( B, HF, 2, 0, false),
> +      INST( W, HF, 2, 0, true),
> +      INST( W, HF, 2, 2, false),
> +      INST( D, HF, 2, 0, false),
> +      INST( F, HF, 2, 0, true),
> +      INST( F, HF, 2, 2, true),
> +      INST( B, HF, 4, 0, true),
> +      INST( B, HF, 4, 1, false),
> +      INST( W, HF, 4, 0, false),
> +
> +#undef INST
> +   };
> +
> +   if (devinfo.gen < 8)
> +      return;
> +
> +   for (unsigned i = 0; i < sizeof(inst) / sizeof(inst[0]); i++) {
> +      if (!devinfo.has_64bit_types &&
> +          (type_sz(inst[i].src_type) == 8 || type_sz(inst[i].dst_type) == 8)) {
> +         continue;
> +      }
> +
> +      brw_MOV(p, retype(g0, inst[i].dst_type), retype(g0, inst[i].src_type));
> +
> +      brw_inst_set_exec_size(&devinfo, last_inst, BRW_EXECUTE_4);
> +
> +      brw_inst_set_dst_hstride(&devinfo, last_inst, inst[i].dst_stride);
> +      brw_inst_set_dst_da1_subreg_nr(&devinfo, last_inst, inst[i].dst_subnr);
> +
> +      if (inst[i].src_type == BRW_REGISTER_TYPE_B) {
> +         brw_inst_set_src0_vstride(&devinfo, last_inst, BRW_VERTICAL_STRIDE_4);
> +         brw_inst_set_src0_width(&devinfo, last_inst, BRW_WIDTH_2);
> +         brw_inst_set_src0_hstride(&devinfo, last_inst, BRW_HORIZONTAL_STRIDE_2);
> +      } else {
> +         brw_inst_set_src0_vstride(&devinfo, last_inst, BRW_VERTICAL_STRIDE_4);
> +         brw_inst_set_src0_width(&devinfo, last_inst, BRW_WIDTH_4);
> +         brw_inst_set_src0_hstride(&devinfo, last_inst, BRW_HORIZONTAL_STRIDE_1);
> +      }
> +
> +      EXPECT_EQ(inst[i].expected_result, validate(p));
> +
> +      clear_instructions(p);
> +   }
> +
> +   /* Conversion from F to HF has Dword destination alignment restriction
> +    * on CHV and SKL+ only
> +    */
> +   brw_MOV(p, retype(g0, BRW_REGISTER_TYPE_HF),
> +              retype(g0, BRW_REGISTER_TYPE_F));
> +
> +   brw_inst_set_dst_hstride(&devinfo, last_inst, BRW_HORIZONTAL_STRIDE_1);
> +
> +   if (devinfo.gen >= 9 || devinfo.is_cherryview) {
> +      EXPECT_FALSE(validate(p));
> +   } else {
> +      assert(devinfo.gen == 8);
> +      EXPECT_TRUE(validate(p));
> +   }
> +   clear_instructions(p);
> +
> +   brw_MOV(p, retype(g0, BRW_REGISTER_TYPE_HF),
> +              retype(g0, BRW_REGISTER_TYPE_F));
> +
> +   brw_inst_set_dst_hstride(&devinfo, last_inst, BRW_HORIZONTAL_STRIDE_2);
> +
> +   EXPECT_TRUE(validate(p));
> +   clear_instructions(p);
> +}
> +
>  TEST_P(validation_test, vector_immediate_destination_alignment)
>  {
>     static const struct {
> -- 
> 2.17.1
>
> _______________________________________________
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
On Tue, 2019-02-26 at 14:54 -0800, Francisco Jerez wrote:
> Iago Toral Quiroga <itoral@igalia.com> writes:
> 
> > ---
> >  src/intel/compiler/brw_eu_validate.c    |  64 ++++++++++++-
> >  src/intel/compiler/test_eu_validate.cpp | 122
> > ++++++++++++++++++++++++
> >  2 files changed, 185 insertions(+), 1 deletion(-)
> > 
> > diff --git a/src/intel/compiler/brw_eu_validate.c
> > b/src/intel/compiler/brw_eu_validate.c
> > index 000a05cb6ac..203641fecb9 100644
> > --- a/src/intel/compiler/brw_eu_validate.c
> > +++ b/src/intel/compiler/brw_eu_validate.c
> > @@ -531,7 +531,69 @@
> > general_restrictions_based_on_operand_types(const struct
> > gen_device_info *devinf
> >         exec_type_size == 8 && dst_type_size == 4)
> >        dst_type_size = 8;
> >  
> > -   if (exec_type_size > dst_type_size) {
> > +   /* From the BDW+ PRM:
> > +    *
> > +    *    "There is no direct conversion from HF to DF or DF to HF.
> > +    *     There is no direct conversion from HF to Q/UQ or Q/UQ to
> > HF."
> > +    */
> > +   enum brw_reg_type src0_type = brw_inst_src0_type(devinfo,
> > inst);
> > +   ERROR_IF(brw_inst_opcode(devinfo, inst) == BRW_OPCODE_MOV &&
> 
> Why is only the MOV instruction handled here and below?  Aren't other
> instructions able to do implicit conversions?  Probably means you
> need
> to deal with two sources rather than one.

This comes from the programming notes of the MOV instruction (Volume
2a, Command Reference - Instructions - MOV), so it is described
specifically for the MOV instruction. I should probably have made this
clear in the comment.

> > +            ((dst_type == BRW_REGISTER_TYPE_HF &&
> > type_sz(src0_type) == 8) ||
> > +             (dst_type_size == 8 && src0_type ==
> > BRW_REGISTER_TYPE_HF)),
> > +            "There are no direct conversion between 64-bit types
> > and HF");
> > +
> > +   /* From the BDW+ PRM:
> > +    *
> > +    *   "Conversion between Integer and HF (Half Float) must be
> > +    *    DWord-aligned and strided by a DWord on the destination."
> > +    *
> > +    * But this seems to be expanded on CHV and SKL+ by:
> > +    *
> > +    *   "There is a relaxed alignment rule for word destinations.
> > When
> > +    *    the destination type is word (UW, W, HF), destination
> > data types
> > +    *    can be aligned to either the lowest word or the second
> > lowest
> > +    *    word of the execution channel. This means the destination
> > data
> > +    *    words can be either all in the even word locations or all
> > in the
> > +    *    odd word locations."
> > +    *
> > +    * We do not implement the second rule as is though, since
> > empirical testing
> > +    * shows inconsistencies:
> > +    *   - It suggests that packed 16-bit is not allowed, which is
> > not true.
> > +    *   - It suggests that conversions from Q/DF to W (which need
> > to be 64-bit
> > +    *     aligned on the destination) are not possible, which is
> > not true.
> > +    *   - It suggests that conversions from 16-bit executions
> > types to W need
> > +    *     to be 32-bit aligned, which doesn't seem to be
> > necessary.
> > +    *
> > +    * So from this rule we only validate the implication that
> > conversion from
> > +    * F to HF needs to be DWord aligned too (in BDW this is
> > limited to
> > +    * conversions from integer types).
> > +    */
> > +   bool is_half_float_conversion =
> > +       brw_inst_opcode(devinfo, inst) == BRW_OPCODE_MOV &&
> > +       dst_type != src0_type &&
> > +       (dst_type == BRW_REGISTER_TYPE_HF || src0_type ==
> > BRW_REGISTER_TYPE_HF);
> > +
> > +   if (is_half_float_conversion) {
> > +      assert(devinfo->gen >= 8);
> > +
> > +      if ((dst_type == BRW_REGISTER_TYPE_HF &&
> > brw_reg_type_is_integer(src0_type)) ||
> > +          (brw_reg_type_is_integer(dst_type) && src0_type ==
> > BRW_REGISTER_TYPE_HF)) {
> > +         ERROR_IF(dst_stride * dst_type_size != 4,
> > +                  "Conversions between integer and half-float must
> > be strided "
> > +                  "by a DWord on the destination");
> > +
> > +         unsigned subreg = brw_inst_dst_da1_subreg_nr(devinfo,
> > inst);
> > +         ERROR_IF(subreg % 4 != 0,
> > +                  "Conversions between integer and half-float must
> > be aligned "
> > +                  "to a DWord on the destination");
> > +      } else if ((devinfo->is_cherryview || devinfo->gen >= 9) &&
> > +                 dst_type == BRW_REGISTER_TYPE_HF) {
> > +         ERROR_IF(dst_stride != 2,
> > +                  "Conversions to HF must have either all words in
> > even word "
> > +                  "locations or all words in odd word locations");
> > +      }
> > +
> > +   } else if (exec_type_size > dst_type_size) {
> >        if (!(dst_type_is_byte && inst_is_raw_move(devinfo, inst)))
> > {
> >           ERROR_IF(dst_stride * dst_type_size != exec_type_size,
> >                    "Destination stride must be equal to the ratio
> > of the sizes "
> > diff --git a/src/intel/compiler/test_eu_validate.cpp
> > b/src/intel/compiler/test_eu_validate.cpp
> > index 73300b23122..1557b6d2452 100644
> > --- a/src/intel/compiler/test_eu_validate.cpp
> > +++ b/src/intel/compiler/test_eu_validate.cpp
> > @@ -848,6 +848,128 @@ TEST_P(validation_test,
> > byte_destination_relaxed_alignment)
> >     }
> >  }
> >  
> > +TEST_P(validation_test, half_float_conversion)
> > +{
> > +   static const struct {
> > +      enum brw_reg_type dst_type;
> > +      enum brw_reg_type src_type;
> > +      unsigned dst_stride;
> > +      unsigned dst_subnr;
> > +      bool expected_result;
> > +   } inst[] = {
> > +#define INST(dst_type, src_type, dst_stride, dst_subnr,
> > expected_result)  \
> > +      {                                                           
> >         \
> > +         BRW_REGISTER_TYPE_##dst_type,                            
> >         \
> > +         BRW_REGISTER_TYPE_##src_type,                            
> >         \
> > +         BRW_HORIZONTAL_STRIDE_##dst_stride,                      
> >         \
> > +         dst_subnr,                                               
> >         \
> > +         expected_result,                                         
> >         \
> > +      }
> > +
> > +      /* MOV to half-float destination (F source handled later) */
> > +      INST(HF,  B, 1, 0, false),
> > +      INST(HF,  W, 1, 0, false),
> > +      INST(HF, HF, 1, 0, true),
> > +      INST(HF, HF, 1, 2, true),
> > +      INST(HF,  D, 1, 0, false),
> > +      INST(HF,  Q, 1, 0, false),
> > +      INST(HF, DF, 1, 0, false),
> > +      INST(HF,  B, 2, 0, true),
> > +      INST(HF,  B, 2, 2, false),
> > +      INST(HF,  W, 2, 0, true),
> > +      INST(HF,  W, 2, 2, false),
> > +      INST(HF, HF, 2, 0, true),
> > +      INST(HF, HF, 2, 2, true),
> > +      INST(HF,  D, 2, 0, true),
> > +      INST(HF,  D, 2, 2, false),
> > +      INST(HF,  Q, 2, 0, false),
> > +      INST(HF, DF, 2, 0, false),
> > +      INST(HF,  B, 4, 0, false),
> > +      INST(HF,  W, 4, 0, false),
> > +      INST(HF, HF, 4, 0, true),
> > +      INST(HF, HF, 4, 2, true),
> > +      INST(HF,  D, 4, 0, false),
> > +      INST(HF,  Q, 4, 0, false),
> > +      INST(HF, DF, 4, 0, false),
> > +
> > +      /* MOV from half-float source */
> > +      INST( B, HF, 1, 0, false),
> > +      INST( W, HF, 1, 0, false),
> > +      INST( D, HF, 1, 0, true),
> > +      INST( D, HF, 1, 4, true),
> > +      INST( F, HF, 1, 0, true),
> > +      INST( F, HF, 1, 4, true),
> > +      INST( Q, HF, 1, 0, false),
> > +      INST(DF, HF, 1, 0, false),
> > +      INST( B, HF, 2, 0, false),
> > +      INST( W, HF, 2, 0, true),
> > +      INST( W, HF, 2, 2, false),
> > +      INST( D, HF, 2, 0, false),
> > +      INST( F, HF, 2, 0, true),
> > +      INST( F, HF, 2, 2, true),
> > +      INST( B, HF, 4, 0, true),
> > +      INST( B, HF, 4, 1, false),
> > +      INST( W, HF, 4, 0, false),
> > +
> > +#undef INST
> > +   };
> > +
> > +   if (devinfo.gen < 8)
> > +      return;
> > +
> > +   for (unsigned i = 0; i < sizeof(inst) / sizeof(inst[0]); i++) {
> > +      if (!devinfo.has_64bit_types &&
> > +          (type_sz(inst[i].src_type) == 8 ||
> > type_sz(inst[i].dst_type) == 8)) {
> > +         continue;
> > +      }
> > +
> > +      brw_MOV(p, retype(g0, inst[i].dst_type), retype(g0,
> > inst[i].src_type));
> > +
> > +      brw_inst_set_exec_size(&devinfo, last_inst, BRW_EXECUTE_4);
> > +
> > +      brw_inst_set_dst_hstride(&devinfo, last_inst,
> > inst[i].dst_stride);
> > +      brw_inst_set_dst_da1_subreg_nr(&devinfo, last_inst,
> > inst[i].dst_subnr);
> > +
> > +      if (inst[i].src_type == BRW_REGISTER_TYPE_B) {
> > +         brw_inst_set_src0_vstride(&devinfo, last_inst,
> > BRW_VERTICAL_STRIDE_4);
> > +         brw_inst_set_src0_width(&devinfo, last_inst,
> > BRW_WIDTH_2);
> > +         brw_inst_set_src0_hstride(&devinfo, last_inst,
> > BRW_HORIZONTAL_STRIDE_2);
> > +      } else {
> > +         brw_inst_set_src0_vstride(&devinfo, last_inst,
> > BRW_VERTICAL_STRIDE_4);
> > +         brw_inst_set_src0_width(&devinfo, last_inst,
> > BRW_WIDTH_4);
> > +         brw_inst_set_src0_hstride(&devinfo, last_inst,
> > BRW_HORIZONTAL_STRIDE_1);
> > +      }
> > +
> > +      EXPECT_EQ(inst[i].expected_result, validate(p));
> > +
> > +      clear_instructions(p);
> > +   }
> > +
> > +   /* Conversion from F to HF has Dword destination alignment
> > restriction
> > +    * on CHV and SKL+ only
> > +    */
> > +   brw_MOV(p, retype(g0, BRW_REGISTER_TYPE_HF),
> > +              retype(g0, BRW_REGISTER_TYPE_F));
> > +
> > +   brw_inst_set_dst_hstride(&devinfo, last_inst,
> > BRW_HORIZONTAL_STRIDE_1);
> > +
> > +   if (devinfo.gen >= 9 || devinfo.is_cherryview) {
> > +      EXPECT_FALSE(validate(p));
> > +   } else {
> > +      assert(devinfo.gen == 8);
> > +      EXPECT_TRUE(validate(p));
> > +   }
> > +   clear_instructions(p);
> > +
> > +   brw_MOV(p, retype(g0, BRW_REGISTER_TYPE_HF),
> > +              retype(g0, BRW_REGISTER_TYPE_F));
> > +
> > +   brw_inst_set_dst_hstride(&devinfo, last_inst,
> > BRW_HORIZONTAL_STRIDE_2);
> > +
> > +   EXPECT_TRUE(validate(p));
> > +   clear_instructions(p);
> > +}
> > +
> >  TEST_P(validation_test, vector_immediate_destination_alignment)
> >  {
> >     static const struct {
> > -- 
> > 2.17.1
> > 
> > _______________________________________________
> > mesa-dev mailing list
> > mesa-dev@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Iago Toral <itoral@igalia.com> writes:

> On Tue, 2019-02-26 at 14:54 -0800, Francisco Jerez wrote:
>> Iago Toral Quiroga <itoral@igalia.com> writes:
>> 
>> > ---
>> >  src/intel/compiler/brw_eu_validate.c    |  64 ++++++++++++-
>> >  src/intel/compiler/test_eu_validate.cpp | 122
>> > ++++++++++++++++++++++++
>> >  2 files changed, 185 insertions(+), 1 deletion(-)
>> > 
>> > diff --git a/src/intel/compiler/brw_eu_validate.c
>> > b/src/intel/compiler/brw_eu_validate.c
>> > index 000a05cb6ac..203641fecb9 100644
>> > --- a/src/intel/compiler/brw_eu_validate.c
>> > +++ b/src/intel/compiler/brw_eu_validate.c
>> > @@ -531,7 +531,69 @@
>> > general_restrictions_based_on_operand_types(const struct
>> > gen_device_info *devinf
>> >         exec_type_size == 8 && dst_type_size == 4)
>> >        dst_type_size = 8;
>> >  
>> > -   if (exec_type_size > dst_type_size) {
>> > +   /* From the BDW+ PRM:
>> > +    *
>> > +    *    "There is no direct conversion from HF to DF or DF to HF.
>> > +    *     There is no direct conversion from HF to Q/UQ or Q/UQ to
>> > HF."
>> > +    */
>> > +   enum brw_reg_type src0_type = brw_inst_src0_type(devinfo,
>> > inst);
>> > +   ERROR_IF(brw_inst_opcode(devinfo, inst) == BRW_OPCODE_MOV &&
>> 
>> Why is only the MOV instruction handled here and below?  Aren't other
>> instructions able to do implicit conversions?  Probably means you
>> need
>> to deal with two sources rather than one.
>
> This comes from the programming notes of the MOV instruction (Volume
> 2a, Command Reference - Instructions - MOV), so it is described
> specifically for the MOV instruction. I should probably have made this
> clear in the comment.
>

Maybe the one above is specified in the MOV page only, probably due to
an oversight (If these restrictions were really specific to the MOV
instruction, what would prevent you from implementing such conversions
through a different instruction?  E.g. "ADD dst:df, src:hf, 0" which
would be substantially more efficient than what you're doing in PATCH
02), but I see other restriction checks in this patch which are
certainly specified in the generic regioning restrictions page and
you're limiting to the MOV instruction...

>> > +            ((dst_type == BRW_REGISTER_TYPE_HF &&
>> > type_sz(src0_type) == 8) ||
>> > +             (dst_type_size == 8 && src0_type ==
>> > BRW_REGISTER_TYPE_HF)),
>> > +            "There are no direct conversion between 64-bit types
>> > and HF");
>> > +
>> > +   /* From the BDW+ PRM:
>> > +    *
>> > +    *   "Conversion between Integer and HF (Half Float) must be
>> > +    *    DWord-aligned and strided by a DWord on the destination."
>> > +    *
>> > +    * But this seems to be expanded on CHV and SKL+ by:
>> > +    *
>> > +    *   "There is a relaxed alignment rule for word destinations.
>> > When
>> > +    *    the destination type is word (UW, W, HF), destination
>> > data types
>> > +    *    can be aligned to either the lowest word or the second
>> > lowest
>> > +    *    word of the execution channel. This means the destination
>> > data
>> > +    *    words can be either all in the even word locations or all
>> > in the
>> > +    *    odd word locations."
>> > +    *
>> > +    * We do not implement the second rule as is though, since
>> > empirical testing
>> > +    * shows inconsistencies:
>> > +    *   - It suggests that packed 16-bit is not allowed, which is
>> > not true.
>> > +    *   - It suggests that conversions from Q/DF to W (which need
>> > to be 64-bit
>> > +    *     aligned on the destination) are not possible, which is
>> > not true.
>> > +    *   - It suggests that conversions from 16-bit executions
>> > types to W need
>> > +    *     to be 32-bit aligned, which doesn't seem to be
>> > necessary.
>> > +    *
>> > +    * So from this rule we only validate the implication that
>> > conversion from
>> > +    * F to HF needs to be DWord aligned too (in BDW this is
>> > limited to
>> > +    * conversions from integer types).
>> > +    */
>> > +   bool is_half_float_conversion =
>> > +       brw_inst_opcode(devinfo, inst) == BRW_OPCODE_MOV &&
>> > +       dst_type != src0_type &&
>> > +       (dst_type == BRW_REGISTER_TYPE_HF || src0_type ==
>> > BRW_REGISTER_TYPE_HF);
>> > +
>> > +   if (is_half_float_conversion) {
>> > +      assert(devinfo->gen >= 8);
>> > +
>> > +      if ((dst_type == BRW_REGISTER_TYPE_HF &&
>> > brw_reg_type_is_integer(src0_type)) ||
>> > +          (brw_reg_type_is_integer(dst_type) && src0_type ==
>> > BRW_REGISTER_TYPE_HF)) {
>> > +         ERROR_IF(dst_stride * dst_type_size != 4,
>> > +                  "Conversions between integer and half-float must
>> > be strided "
>> > +                  "by a DWord on the destination");
>> > +
>> > +         unsigned subreg = brw_inst_dst_da1_subreg_nr(devinfo,
>> > inst);
>> > +         ERROR_IF(subreg % 4 != 0,
>> > +                  "Conversions between integer and half-float must
>> > be aligned "
>> > +                  "to a DWord on the destination");
>> > +      } else if ((devinfo->is_cherryview || devinfo->gen >= 9) &&
>> > +                 dst_type == BRW_REGISTER_TYPE_HF) {
>> > +         ERROR_IF(dst_stride != 2,
>> > +                  "Conversions to HF must have either all words in
>> > even word "
>> > +                  "locations or all words in odd word locations");
>> > +      }
>> > +
>> > +   } else if (exec_type_size > dst_type_size) {
>> >        if (!(dst_type_is_byte && inst_is_raw_move(devinfo, inst)))
>> > {
>> >           ERROR_IF(dst_stride * dst_type_size != exec_type_size,
>> >                    "Destination stride must be equal to the ratio
>> > of the sizes "
>> > diff --git a/src/intel/compiler/test_eu_validate.cpp
>> > b/src/intel/compiler/test_eu_validate.cpp
>> > index 73300b23122..1557b6d2452 100644
>> > --- a/src/intel/compiler/test_eu_validate.cpp
>> > +++ b/src/intel/compiler/test_eu_validate.cpp
>> > @@ -848,6 +848,128 @@ TEST_P(validation_test,
>> > byte_destination_relaxed_alignment)
>> >     }
>> >  }
>> >  
>> > +TEST_P(validation_test, half_float_conversion)
>> > +{
>> > +   static const struct {
>> > +      enum brw_reg_type dst_type;
>> > +      enum brw_reg_type src_type;
>> > +      unsigned dst_stride;
>> > +      unsigned dst_subnr;
>> > +      bool expected_result;
>> > +   } inst[] = {
>> > +#define INST(dst_type, src_type, dst_stride, dst_subnr,
>> > expected_result)  \
>> > +      {                                                           
>> >         \
>> > +         BRW_REGISTER_TYPE_##dst_type,                            
>> >         \
>> > +         BRW_REGISTER_TYPE_##src_type,                            
>> >         \
>> > +         BRW_HORIZONTAL_STRIDE_##dst_stride,                      
>> >         \
>> > +         dst_subnr,                                               
>> >         \
>> > +         expected_result,                                         
>> >         \
>> > +      }
>> > +
>> > +      /* MOV to half-float destination (F source handled later) */
>> > +      INST(HF,  B, 1, 0, false),
>> > +      INST(HF,  W, 1, 0, false),
>> > +      INST(HF, HF, 1, 0, true),
>> > +      INST(HF, HF, 1, 2, true),
>> > +      INST(HF,  D, 1, 0, false),
>> > +      INST(HF,  Q, 1, 0, false),
>> > +      INST(HF, DF, 1, 0, false),
>> > +      INST(HF,  B, 2, 0, true),
>> > +      INST(HF,  B, 2, 2, false),
>> > +      INST(HF,  W, 2, 0, true),
>> > +      INST(HF,  W, 2, 2, false),
>> > +      INST(HF, HF, 2, 0, true),
>> > +      INST(HF, HF, 2, 2, true),
>> > +      INST(HF,  D, 2, 0, true),
>> > +      INST(HF,  D, 2, 2, false),
>> > +      INST(HF,  Q, 2, 0, false),
>> > +      INST(HF, DF, 2, 0, false),
>> > +      INST(HF,  B, 4, 0, false),
>> > +      INST(HF,  W, 4, 0, false),
>> > +      INST(HF, HF, 4, 0, true),
>> > +      INST(HF, HF, 4, 2, true),
>> > +      INST(HF,  D, 4, 0, false),
>> > +      INST(HF,  Q, 4, 0, false),
>> > +      INST(HF, DF, 4, 0, false),
>> > +
>> > +      /* MOV from half-float source */
>> > +      INST( B, HF, 1, 0, false),
>> > +      INST( W, HF, 1, 0, false),
>> > +      INST( D, HF, 1, 0, true),
>> > +      INST( D, HF, 1, 4, true),
>> > +      INST( F, HF, 1, 0, true),
>> > +      INST( F, HF, 1, 4, true),
>> > +      INST( Q, HF, 1, 0, false),
>> > +      INST(DF, HF, 1, 0, false),
>> > +      INST( B, HF, 2, 0, false),
>> > +      INST( W, HF, 2, 0, true),
>> > +      INST( W, HF, 2, 2, false),
>> > +      INST( D, HF, 2, 0, false),
>> > +      INST( F, HF, 2, 0, true),
>> > +      INST( F, HF, 2, 2, true),
>> > +      INST( B, HF, 4, 0, true),
>> > +      INST( B, HF, 4, 1, false),
>> > +      INST( W, HF, 4, 0, false),
>> > +
>> > +#undef INST
>> > +   };
>> > +
>> > +   if (devinfo.gen < 8)
>> > +      return;
>> > +
>> > +   for (unsigned i = 0; i < sizeof(inst) / sizeof(inst[0]); i++) {
>> > +      if (!devinfo.has_64bit_types &&
>> > +          (type_sz(inst[i].src_type) == 8 ||
>> > type_sz(inst[i].dst_type) == 8)) {
>> > +         continue;
>> > +      }
>> > +
>> > +      brw_MOV(p, retype(g0, inst[i].dst_type), retype(g0,
>> > inst[i].src_type));
>> > +
>> > +      brw_inst_set_exec_size(&devinfo, last_inst, BRW_EXECUTE_4);
>> > +
>> > +      brw_inst_set_dst_hstride(&devinfo, last_inst,
>> > inst[i].dst_stride);
>> > +      brw_inst_set_dst_da1_subreg_nr(&devinfo, last_inst,
>> > inst[i].dst_subnr);
>> > +
>> > +      if (inst[i].src_type == BRW_REGISTER_TYPE_B) {
>> > +         brw_inst_set_src0_vstride(&devinfo, last_inst,
>> > BRW_VERTICAL_STRIDE_4);
>> > +         brw_inst_set_src0_width(&devinfo, last_inst,
>> > BRW_WIDTH_2);
>> > +         brw_inst_set_src0_hstride(&devinfo, last_inst,
>> > BRW_HORIZONTAL_STRIDE_2);
>> > +      } else {
>> > +         brw_inst_set_src0_vstride(&devinfo, last_inst,
>> > BRW_VERTICAL_STRIDE_4);
>> > +         brw_inst_set_src0_width(&devinfo, last_inst,
>> > BRW_WIDTH_4);
>> > +         brw_inst_set_src0_hstride(&devinfo, last_inst,
>> > BRW_HORIZONTAL_STRIDE_1);
>> > +      }
>> > +
>> > +      EXPECT_EQ(inst[i].expected_result, validate(p));
>> > +
>> > +      clear_instructions(p);
>> > +   }
>> > +
>> > +   /* Conversion from F to HF has Dword destination alignment
>> > restriction
>> > +    * on CHV and SKL+ only
>> > +    */
>> > +   brw_MOV(p, retype(g0, BRW_REGISTER_TYPE_HF),
>> > +              retype(g0, BRW_REGISTER_TYPE_F));
>> > +
>> > +   brw_inst_set_dst_hstride(&devinfo, last_inst,
>> > BRW_HORIZONTAL_STRIDE_1);
>> > +
>> > +   if (devinfo.gen >= 9 || devinfo.is_cherryview) {
>> > +      EXPECT_FALSE(validate(p));
>> > +   } else {
>> > +      assert(devinfo.gen == 8);
>> > +      EXPECT_TRUE(validate(p));
>> > +   }
>> > +   clear_instructions(p);
>> > +
>> > +   brw_MOV(p, retype(g0, BRW_REGISTER_TYPE_HF),
>> > +              retype(g0, BRW_REGISTER_TYPE_F));
>> > +
>> > +   brw_inst_set_dst_hstride(&devinfo, last_inst,
>> > BRW_HORIZONTAL_STRIDE_2);
>> > +
>> > +   EXPECT_TRUE(validate(p));
>> > +   clear_instructions(p);
>> > +}
>> > +
>> >  TEST_P(validation_test, vector_immediate_destination_alignment)
>> >  {
>> >     static const struct {
>> > -- 
>> > 2.17.1
>> > 
>> > _______________________________________________
>> > mesa-dev mailing list
>> > mesa-dev@lists.freedesktop.org
>> > https://lists.freedesktop.org/mailman/listinfo/mesa-dev
On Wed, 2019-02-27 at 13:47 -0800, Francisco Jerez wrote:
> Iago Toral <itoral@igalia.com> writes:
> 
> > On Tue, 2019-02-26 at 14:54 -0800, Francisco Jerez wrote:
> > > Iago Toral Quiroga <itoral@igalia.com> writes:
> > > 
> > > > ---
> > > >  src/intel/compiler/brw_eu_validate.c    |  64 ++++++++++++-
> > > >  src/intel/compiler/test_eu_validate.cpp | 122
> > > > ++++++++++++++++++++++++
> > > >  2 files changed, 185 insertions(+), 1 deletion(-)
> > > > 
> > > > diff --git a/src/intel/compiler/brw_eu_validate.c
> > > > b/src/intel/compiler/brw_eu_validate.c
> > > > index 000a05cb6ac..203641fecb9 100644
> > > > --- a/src/intel/compiler/brw_eu_validate.c
> > > > +++ b/src/intel/compiler/brw_eu_validate.c
> > > > @@ -531,7 +531,69 @@
> > > > general_restrictions_based_on_operand_types(const struct
> > > > gen_device_info *devinf
> > > >         exec_type_size == 8 && dst_type_size == 4)
> > > >        dst_type_size = 8;
> > > >  
> > > > -   if (exec_type_size > dst_type_size) {
> > > > +   /* From the BDW+ PRM:
> > > > +    *
> > > > +    *    "There is no direct conversion from HF to DF or DF to
> > > > HF.
> > > > +    *     There is no direct conversion from HF to Q/UQ or
> > > > Q/UQ to
> > > > HF."
> > > > +    */
> > > > +   enum brw_reg_type src0_type = brw_inst_src0_type(devinfo,
> > > > inst);
> > > > +   ERROR_IF(brw_inst_opcode(devinfo, inst) == BRW_OPCODE_MOV
> > > > &&
> > > 
> > > Why is only the MOV instruction handled here and below?  Aren't
> > > other
> > > instructions able to do implicit conversions?  Probably means you
> > > need
> > > to deal with two sources rather than one.
> > 
> > This comes from the programming notes of the MOV instruction
> > (Volume
> > 2a, Command Reference - Instructions - MOV), so it is described
> > specifically for the MOV instruction. I should probably have made
> > this
> > clear in the comment.
> > 
> 
> Maybe the one above is specified in the MOV page only, probably due
> to
> an oversight (If these restrictions were really specific to the MOV
> instruction, what would prevent you from implementing such
> conversions
> through a different instruction?  E.g. "ADD dst:df, src:hf, 0" which
> would be substantially more efficient than what you're doing in PATCH
> 02)

Instructions that take HF can only be strictly HF or mix F and HF
(mixed mode float), with MOV being the only exception. That means that
any instruction like the one you use above are invalid. Maybe we should
validate explicitly that instructions that use HF are strictly HF or
mixed-float mode only?

>  but I see other restriction checks in this patch which are
> certainly specified in the generic regioning restrictions page and
> you're limiting to the MOV instruction...

There are two rules below:

1. The one about conversions between integer and half-float. Again,
these can only happen through MOV for the same reasons, so I think this
one should be fine.

2. The one about word destinations (of which we are only really
implementing conversions from F->HF). Here the rule is more generic and
I agree that expanding this to include any other mixed float mode
instruction would make sense. However, validation for mixed float mode
has its own set rules, some of which are incompatible with the general
region restrictions being validated here, so I think it is inconvenient
to try and do that validation  here (see patch 36 and then patch 37).
What I would propose, if you agree, is that we only implement this for
MOV here, and then for mixed float mode instructions, we implement the
more generic version of this check (that would then go in patch 37).
How does that sound?

> > > > +            ((dst_type == BRW_REGISTER_TYPE_HF &&
> > > > type_sz(src0_type) == 8) ||
> > > > +             (dst_type_size == 8 && src0_type ==
> > > > BRW_REGISTER_TYPE_HF)),
> > > > +            "There are no direct conversion between 64-bit
> > > > types
> > > > and HF");
> > > > +
> > > > +   /* From the BDW+ PRM:
> > > > +    *
> > > > +    *   "Conversion between Integer and HF (Half Float) must
> > > > be
> > > > +    *    DWord-aligned and strided by a DWord on the
> > > > destination."
> > > > +    *
> > > > +    * But this seems to be expanded on CHV and SKL+ by:
> > > > +    *
> > > > +    *   "There is a relaxed alignment rule for word
> > > > destinations.
> > > > When
> > > > +    *    the destination type is word (UW, W, HF), destination
> > > > data types
> > > > +    *    can be aligned to either the lowest word or the
> > > > second
> > > > lowest
> > > > +    *    word of the execution channel. This means the
> > > > destination
> > > > data
> > > > +    *    words can be either all in the even word locations or
> > > > all
> > > > in the
> > > > +    *    odd word locations."
> > > > +    *
> > > > +    * We do not implement the second rule as is though, since
> > > > empirical testing
> > > > +    * shows inconsistencies:
> > > > +    *   - It suggests that packed 16-bit is not allowed, which
> > > > is
> > > > not true.
> > > > +    *   - It suggests that conversions from Q/DF to W (which
> > > > need
> > > > to be 64-bit
> > > > +    *     aligned on the destination) are not possible, which
> > > > is
> > > > not true.
> > > > +    *   - It suggests that conversions from 16-bit executions
> > > > types to W need
> > > > +    *     to be 32-bit aligned, which doesn't seem to be
> > > > necessary.
> > > > +    *
> > > > +    * So from this rule we only validate the implication that
> > > > conversion from
> > > > +    * F to HF needs to be DWord aligned too (in BDW this is
> > > > limited to
> > > > +    * conversions from integer types).
> > > > +    */
> > > > +   bool is_half_float_conversion =
> > > > +       brw_inst_opcode(devinfo, inst) == BRW_OPCODE_MOV &&
> > > > +       dst_type != src0_type &&
> > > > +       (dst_type == BRW_REGISTER_TYPE_HF || src0_type ==
> > > > BRW_REGISTER_TYPE_HF);
> > > > +
> > > > +   if (is_half_float_conversion) {
> > > > +      assert(devinfo->gen >= 8);
> > > > +
> > > > +      if ((dst_type == BRW_REGISTER_TYPE_HF &&
> > > > brw_reg_type_is_integer(src0_type)) ||
> > > > +          (brw_reg_type_is_integer(dst_type) && src0_type ==
> > > > BRW_REGISTER_TYPE_HF)) {
> > > > +         ERROR_IF(dst_stride * dst_type_size != 4,
> > > > +                  "Conversions between integer and half-float
> > > > must
> > > > be strided "
> > > > +                  "by a DWord on the destination");
> > > > +
> > > > +         unsigned subreg = brw_inst_dst_da1_subreg_nr(devinfo,
> > > > inst);
> > > > +         ERROR_IF(subreg % 4 != 0,
> > > > +                  "Conversions between integer and half-float
> > > > must
> > > > be aligned "
> > > > +                  "to a DWord on the destination");
> > > > +      } else if ((devinfo->is_cherryview || devinfo->gen >= 9)
> > > > &&
> > > > +                 dst_type == BRW_REGISTER_TYPE_HF) {
> > > > +         ERROR_IF(dst_stride != 2,
> > > > +                  "Conversions to HF must have either all
> > > > words in
> > > > even word "
> > > > +                  "locations or all words in odd word
> > > > locations");
> > > > +      }
> > > > +
> > > > +   } else if (exec_type_size > dst_type_size) {
> > > >        if (!(dst_type_is_byte && inst_is_raw_move(devinfo,
> > > > inst)))
> > > > {
> > > >           ERROR_IF(dst_stride * dst_type_size !=
> > > > exec_type_size,
> > > >                    "Destination stride must be equal to the
> > > > ratio
> > > > of the sizes "
> > > > diff --git a/src/intel/compiler/test_eu_validate.cpp
> > > > b/src/intel/compiler/test_eu_validate.cpp
> > > > index 73300b23122..1557b6d2452 100644
> > > > --- a/src/intel/compiler/test_eu_validate.cpp
> > > > +++ b/src/intel/compiler/test_eu_validate.cpp
> > > > @@ -848,6 +848,128 @@ TEST_P(validation_test,
> > > > byte_destination_relaxed_alignment)
> > > >     }
> > > >  }
> > > >  
> > > > +TEST_P(validation_test, half_float_conversion)
> > > > +{
> > > > +   static const struct {
> > > > +      enum brw_reg_type dst_type;
> > > > +      enum brw_reg_type src_type;
> > > > +      unsigned dst_stride;
> > > > +      unsigned dst_subnr;
> > > > +      bool expected_result;
> > > > +   } inst[] = {
> > > > +#define INST(dst_type, src_type, dst_stride, dst_subnr,
> > > > expected_result)  \
> > > > +      {                                                       
> > > >     
> > > >         \
> > > > +         BRW_REGISTER_TYPE_##dst_type,                        
> > > >     
> > > >         \
> > > > +         BRW_REGISTER_TYPE_##src_type,                        
> > > >     
> > > >         \
> > > > +         BRW_HORIZONTAL_STRIDE_##dst_stride,                  
> > > >     
> > > >         \
> > > > +         dst_subnr,                                           
> > > >     
> > > >         \
> > > > +         expected_result,                                     
> > > >     
> > > >         \
> > > > +      }
> > > > +
> > > > +      /* MOV to half-float destination (F source handled
> > > > later) */
> > > > +      INST(HF,  B, 1, 0, false),
> > > > +      INST(HF,  W, 1, 0, false),
> > > > +      INST(HF, HF, 1, 0, true),
> > > > +      INST(HF, HF, 1, 2, true),
> > > > +      INST(HF,  D, 1, 0, false),
> > > > +      INST(HF,  Q, 1, 0, false),
> > > > +      INST(HF, DF, 1, 0, false),
> > > > +      INST(HF,  B, 2, 0, true),
> > > > +      INST(HF,  B, 2, 2, false),
> > > > +      INST(HF,  W, 2, 0, true),
> > > > +      INST(HF,  W, 2, 2, false),
> > > > +      INST(HF, HF, 2, 0, true),
> > > > +      INST(HF, HF, 2, 2, true),
> > > > +      INST(HF,  D, 2, 0, true),
> > > > +      INST(HF,  D, 2, 2, false),
> > > > +      INST(HF,  Q, 2, 0, false),
> > > > +      INST(HF, DF, 2, 0, false),
> > > > +      INST(HF,  B, 4, 0, false),
> > > > +      INST(HF,  W, 4, 0, false),
> > > > +      INST(HF, HF, 4, 0, true),
> > > > +      INST(HF, HF, 4, 2, true),
> > > > +      INST(HF,  D, 4, 0, false),
> > > > +      INST(HF,  Q, 4, 0, false),
> > > > +      INST(HF, DF, 4, 0, false),
> > > > +
> > > > +      /* MOV from half-float source */
> > > > +      INST( B, HF, 1, 0, false),
> > > > +      INST( W, HF, 1, 0, false),
> > > > +      INST( D, HF, 1, 0, true),
> > > > +      INST( D, HF, 1, 4, true),
> > > > +      INST( F, HF, 1, 0, true),
> > > > +      INST( F, HF, 1, 4, true),
> > > > +      INST( Q, HF, 1, 0, false),
> > > > +      INST(DF, HF, 1, 0, false),
> > > > +      INST( B, HF, 2, 0, false),
> > > > +      INST( W, HF, 2, 0, true),
> > > > +      INST( W, HF, 2, 2, false),
> > > > +      INST( D, HF, 2, 0, false),
> > > > +      INST( F, HF, 2, 0, true),
> > > > +      INST( F, HF, 2, 2, true),
> > > > +      INST( B, HF, 4, 0, true),
> > > > +      INST( B, HF, 4, 1, false),
> > > > +      INST( W, HF, 4, 0, false),
> > > > +
> > > > +#undef INST
> > > > +   };
> > > > +
> > > > +   if (devinfo.gen < 8)
> > > > +      return;
> > > > +
> > > > +   for (unsigned i = 0; i < sizeof(inst) / sizeof(inst[0]);
> > > > i++) {
> > > > +      if (!devinfo.has_64bit_types &&
> > > > +          (type_sz(inst[i].src_type) == 8 ||
> > > > type_sz(inst[i].dst_type) == 8)) {
> > > > +         continue;
> > > > +      }
> > > > +
> > > > +      brw_MOV(p, retype(g0, inst[i].dst_type), retype(g0,
> > > > inst[i].src_type));
> > > > +
> > > > +      brw_inst_set_exec_size(&devinfo, last_inst,
> > > > BRW_EXECUTE_4);
> > > > +
> > > > +      brw_inst_set_dst_hstride(&devinfo, last_inst,
> > > > inst[i].dst_stride);
> > > > +      brw_inst_set_dst_da1_subreg_nr(&devinfo, last_inst,
> > > > inst[i].dst_subnr);
> > > > +
> > > > +      if (inst[i].src_type == BRW_REGISTER_TYPE_B) {
> > > > +         brw_inst_set_src0_vstride(&devinfo, last_inst,
> > > > BRW_VERTICAL_STRIDE_4);
> > > > +         brw_inst_set_src0_width(&devinfo, last_inst,
> > > > BRW_WIDTH_2);
> > > > +         brw_inst_set_src0_hstride(&devinfo, last_inst,
> > > > BRW_HORIZONTAL_STRIDE_2);
> > > > +      } else {
> > > > +         brw_inst_set_src0_vstride(&devinfo, last_inst,
> > > > BRW_VERTICAL_STRIDE_4);
> > > > +         brw_inst_set_src0_width(&devinfo, last_inst,
> > > > BRW_WIDTH_4);
> > > > +         brw_inst_set_src0_hstride(&devinfo, last_inst,
> > > > BRW_HORIZONTAL_STRIDE_1);
> > > > +      }
> > > > +
> > > > +      EXPECT_EQ(inst[i].expected_result, validate(p));
> > > > +
> > > > +      clear_instructions(p);
> > > > +   }
> > > > +
> > > > +   /* Conversion from F to HF has Dword destination alignment
> > > > restriction
> > > > +    * on CHV and SKL+ only
> > > > +    */
> > > > +   brw_MOV(p, retype(g0, BRW_REGISTER_TYPE_HF),
> > > > +              retype(g0, BRW_REGISTER_TYPE_F));
> > > > +
> > > > +   brw_inst_set_dst_hstride(&devinfo, last_inst,
> > > > BRW_HORIZONTAL_STRIDE_1);
> > > > +
> > > > +   if (devinfo.gen >= 9 || devinfo.is_cherryview) {
> > > > +      EXPECT_FALSE(validate(p));
> > > > +   } else {
> > > > +      assert(devinfo.gen == 8);
> > > > +      EXPECT_TRUE(validate(p));
> > > > +   }
> > > > +   clear_instructions(p);
> > > > +
> > > > +   brw_MOV(p, retype(g0, BRW_REGISTER_TYPE_HF),
> > > > +              retype(g0, BRW_REGISTER_TYPE_F));
> > > > +
> > > > +   brw_inst_set_dst_hstride(&devinfo, last_inst,
> > > > BRW_HORIZONTAL_STRIDE_2);
> > > > +
> > > > +   EXPECT_TRUE(validate(p));
> > > > +   clear_instructions(p);
> > > > +}
> > > > +
> > > >  TEST_P(validation_test,
> > > > vector_immediate_destination_alignment)
> > > >  {
> > > >     static const struct {
> > > > -- 
> > > > 2.17.1
> > > > 
> > > > _______________________________________________
> > > > mesa-dev mailing list
> > > > mesa-dev@lists.freedesktop.org
> > > > https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Iago Toral <itoral@igalia.com> writes:

> On Wed, 2019-02-27 at 13:47 -0800, Francisco Jerez wrote:
>> Iago Toral <itoral@igalia.com> writes:
>> 
>> > On Tue, 2019-02-26 at 14:54 -0800, Francisco Jerez wrote:
>> > > Iago Toral Quiroga <itoral@igalia.com> writes:
>> > > 
>> > > > ---
>> > > >  src/intel/compiler/brw_eu_validate.c    |  64 ++++++++++++-
>> > > >  src/intel/compiler/test_eu_validate.cpp | 122
>> > > > ++++++++++++++++++++++++
>> > > >  2 files changed, 185 insertions(+), 1 deletion(-)
>> > > > 
>> > > > diff --git a/src/intel/compiler/brw_eu_validate.c
>> > > > b/src/intel/compiler/brw_eu_validate.c
>> > > > index 000a05cb6ac..203641fecb9 100644
>> > > > --- a/src/intel/compiler/brw_eu_validate.c
>> > > > +++ b/src/intel/compiler/brw_eu_validate.c
>> > > > @@ -531,7 +531,69 @@
>> > > > general_restrictions_based_on_operand_types(const struct
>> > > > gen_device_info *devinf
>> > > >         exec_type_size == 8 && dst_type_size == 4)
>> > > >        dst_type_size = 8;
>> > > >  
>> > > > -   if (exec_type_size > dst_type_size) {
>> > > > +   /* From the BDW+ PRM:
>> > > > +    *
>> > > > +    *    "There is no direct conversion from HF to DF or DF to
>> > > > HF.
>> > > > +    *     There is no direct conversion from HF to Q/UQ or
>> > > > Q/UQ to
>> > > > HF."
>> > > > +    */
>> > > > +   enum brw_reg_type src0_type = brw_inst_src0_type(devinfo,
>> > > > inst);
>> > > > +   ERROR_IF(brw_inst_opcode(devinfo, inst) == BRW_OPCODE_MOV
>> > > > &&
>> > > 
>> > > Why is only the MOV instruction handled here and below?  Aren't
>> > > other
>> > > instructions able to do implicit conversions?  Probably means you
>> > > need
>> > > to deal with two sources rather than one.
>> > 
>> > This comes from the programming notes of the MOV instruction
>> > (Volume
>> > 2a, Command Reference - Instructions - MOV), so it is described
>> > specifically for the MOV instruction. I should probably have made
>> > this
>> > clear in the comment.
>> > 
>> 
>> Maybe the one above is specified in the MOV page only, probably due
>> to
>> an oversight (If these restrictions were really specific to the MOV
>> instruction, what would prevent you from implementing such
>> conversions
>> through a different instruction?  E.g. "ADD dst:df, src:hf, 0" which
>> would be substantially more efficient than what you're doing in PATCH
>> 02)
>
> Instructions that take HF can only be strictly HF or mix F and HF
> (mixed mode float), with MOV being the only exception. That means that
> any instruction like the one you use above are invalid. Maybe we should
> validate explicitly that instructions that use HF are strictly HF or
> mixed-float mode only?
>

So you're acknowledging that the conversions checked above are illegal
whether the instruction is a MOV or something else.  Where are we
preventing instructions other than MOV with such conversions from being
accepted by the validator?

>>  but I see other restriction checks in this patch which are
>> certainly specified in the generic regioning restrictions page and
>> you're limiting to the MOV instruction...
>
> There are two rules below:
>
> 1. The one about conversions between integer and half-float. Again,
> these can only happen through MOV for the same reasons, so I think this
> one should be fine.
>

Why do you think that can only happen through a MOV instruction?  The
hardware spec lists the following as a valid example in the register
region restrictions page:

| add (8) r10.0<2>:hf r11.0<8;8,1>:w r12.0<8;8,1>:w

> 2. The one about word destinations (of which we are only really
> implementing conversions from F->HF). Here the rule is more generic and
> I agree that expanding this to include any other mixed float mode
> instruction would make sense. However, validation for mixed float mode
> has its own set rules, some of which are incompatible with the general
> region restrictions being validated here, so I think it is inconvenient
> to try and do that validation  here (see patch 36 and then patch 37).
> What I would propose, if you agree, is that we only implement this for
> MOV here, and then for mixed float mode instructions, we implement the
> more generic version of this check (that would then go in patch 37).
> How does that sound?
>

I still don't understand why you want to implement the same restriction
twice, once for MOV and once for all other mixed-mode instructions.  How
is that more convenient?

>> > > > +            ((dst_type == BRW_REGISTER_TYPE_HF &&
>> > > > type_sz(src0_type) == 8) ||
>> > > > +             (dst_type_size == 8 && src0_type ==
>> > > > BRW_REGISTER_TYPE_HF)),
>> > > > +            "There are no direct conversion between 64-bit
>> > > > types
>> > > > and HF");
>> > > > +
>> > > > +   /* From the BDW+ PRM:
>> > > > +    *
>> > > > +    *   "Conversion between Integer and HF (Half Float) must
>> > > > be
>> > > > +    *    DWord-aligned and strided by a DWord on the
>> > > > destination."
>> > > > +    *
>> > > > +    * But this seems to be expanded on CHV and SKL+ by:
>> > > > +    *
>> > > > +    *   "There is a relaxed alignment rule for word
>> > > > destinations.
>> > > > When
>> > > > +    *    the destination type is word (UW, W, HF), destination
>> > > > data types
>> > > > +    *    can be aligned to either the lowest word or the
>> > > > second
>> > > > lowest
>> > > > +    *    word of the execution channel. This means the
>> > > > destination
>> > > > data
>> > > > +    *    words can be either all in the even word locations or
>> > > > all
>> > > > in the
>> > > > +    *    odd word locations."
>> > > > +    *
>> > > > +    * We do not implement the second rule as is though, since
>> > > > empirical testing
>> > > > +    * shows inconsistencies:
>> > > > +    *   - It suggests that packed 16-bit is not allowed, which
>> > > > is
>> > > > not true.
>> > > > +    *   - It suggests that conversions from Q/DF to W (which
>> > > > need
>> > > > to be 64-bit
>> > > > +    *     aligned on the destination) are not possible, which
>> > > > is
>> > > > not true.
>> > > > +    *   - It suggests that conversions from 16-bit executions
>> > > > types to W need
>> > > > +    *     to be 32-bit aligned, which doesn't seem to be
>> > > > necessary.
>> > > > +    *
>> > > > +    * So from this rule we only validate the implication that
>> > > > conversion from
>> > > > +    * F to HF needs to be DWord aligned too (in BDW this is
>> > > > limited to
>> > > > +    * conversions from integer types).
>> > > > +    */
>> > > > +   bool is_half_float_conversion =
>> > > > +       brw_inst_opcode(devinfo, inst) == BRW_OPCODE_MOV &&
>> > > > +       dst_type != src0_type &&
>> > > > +       (dst_type == BRW_REGISTER_TYPE_HF || src0_type ==
>> > > > BRW_REGISTER_TYPE_HF);
>> > > > +
>> > > > +   if (is_half_float_conversion) {
>> > > > +      assert(devinfo->gen >= 8);
>> > > > +
>> > > > +      if ((dst_type == BRW_REGISTER_TYPE_HF &&
>> > > > brw_reg_type_is_integer(src0_type)) ||
>> > > > +          (brw_reg_type_is_integer(dst_type) && src0_type ==
>> > > > BRW_REGISTER_TYPE_HF)) {
>> > > > +         ERROR_IF(dst_stride * dst_type_size != 4,
>> > > > +                  "Conversions between integer and half-float
>> > > > must
>> > > > be strided "
>> > > > +                  "by a DWord on the destination");
>> > > > +
>> > > > +         unsigned subreg = brw_inst_dst_da1_subreg_nr(devinfo,
>> > > > inst);
>> > > > +         ERROR_IF(subreg % 4 != 0,
>> > > > +                  "Conversions between integer and half-float
>> > > > must
>> > > > be aligned "
>> > > > +                  "to a DWord on the destination");
>> > > > +      } else if ((devinfo->is_cherryview || devinfo->gen >= 9)
>> > > > &&
>> > > > +                 dst_type == BRW_REGISTER_TYPE_HF) {
>> > > > +         ERROR_IF(dst_stride != 2,
>> > > > +                  "Conversions to HF must have either all
>> > > > words in
>> > > > even word "
>> > > > +                  "locations or all words in odd word
>> > > > locations");
>> > > > +      }
>> > > > +
>> > > > +   } else if (exec_type_size > dst_type_size) {
>> > > >        if (!(dst_type_is_byte && inst_is_raw_move(devinfo,
>> > > > inst)))
>> > > > {
>> > > >           ERROR_IF(dst_stride * dst_type_size !=
>> > > > exec_type_size,
>> > > >                    "Destination stride must be equal to the
>> > > > ratio
>> > > > of the sizes "
>> > > > diff --git a/src/intel/compiler/test_eu_validate.cpp
>> > > > b/src/intel/compiler/test_eu_validate.cpp
>> > > > index 73300b23122..1557b6d2452 100644
>> > > > --- a/src/intel/compiler/test_eu_validate.cpp
>> > > > +++ b/src/intel/compiler/test_eu_validate.cpp
>> > > > @@ -848,6 +848,128 @@ TEST_P(validation_test,
>> > > > byte_destination_relaxed_alignment)
>> > > >     }
>> > > >  }
>> > > >  
>> > > > +TEST_P(validation_test, half_float_conversion)
>> > > > +{
>> > > > +   static const struct {
>> > > > +      enum brw_reg_type dst_type;
>> > > > +      enum brw_reg_type src_type;
>> > > > +      unsigned dst_stride;
>> > > > +      unsigned dst_subnr;
>> > > > +      bool expected_result;
>> > > > +   } inst[] = {
>> > > > +#define INST(dst_type, src_type, dst_stride, dst_subnr,
>> > > > expected_result)  \
>> > > > +      {                                                       
>> > > >     
>> > > >         \
>> > > > +         BRW_REGISTER_TYPE_##dst_type,                        
>> > > >     
>> > > >         \
>> > > > +         BRW_REGISTER_TYPE_##src_type,                        
>> > > >     
>> > > >         \
>> > > > +         BRW_HORIZONTAL_STRIDE_##dst_stride,                  
>> > > >     
>> > > >         \
>> > > > +         dst_subnr,                                           
>> > > >     
>> > > >         \
>> > > > +         expected_result,                                     
>> > > >     
>> > > >         \
>> > > > +      }
>> > > > +
>> > > > +      /* MOV to half-float destination (F source handled
>> > > > later) */
>> > > > +      INST(HF,  B, 1, 0, false),
>> > > > +      INST(HF,  W, 1, 0, false),
>> > > > +      INST(HF, HF, 1, 0, true),
>> > > > +      INST(HF, HF, 1, 2, true),
>> > > > +      INST(HF,  D, 1, 0, false),
>> > > > +      INST(HF,  Q, 1, 0, false),
>> > > > +      INST(HF, DF, 1, 0, false),
>> > > > +      INST(HF,  B, 2, 0, true),
>> > > > +      INST(HF,  B, 2, 2, false),
>> > > > +      INST(HF,  W, 2, 0, true),
>> > > > +      INST(HF,  W, 2, 2, false),
>> > > > +      INST(HF, HF, 2, 0, true),
>> > > > +      INST(HF, HF, 2, 2, true),
>> > > > +      INST(HF,  D, 2, 0, true),
>> > > > +      INST(HF,  D, 2, 2, false),
>> > > > +      INST(HF,  Q, 2, 0, false),
>> > > > +      INST(HF, DF, 2, 0, false),
>> > > > +      INST(HF,  B, 4, 0, false),
>> > > > +      INST(HF,  W, 4, 0, false),
>> > > > +      INST(HF, HF, 4, 0, true),
>> > > > +      INST(HF, HF, 4, 2, true),
>> > > > +      INST(HF,  D, 4, 0, false),
>> > > > +      INST(HF,  Q, 4, 0, false),
>> > > > +      INST(HF, DF, 4, 0, false),
>> > > > +
>> > > > +      /* MOV from half-float source */
>> > > > +      INST( B, HF, 1, 0, false),
>> > > > +      INST( W, HF, 1, 0, false),
>> > > > +      INST( D, HF, 1, 0, true),
>> > > > +      INST( D, HF, 1, 4, true),
>> > > > +      INST( F, HF, 1, 0, true),
>> > > > +      INST( F, HF, 1, 4, true),
>> > > > +      INST( Q, HF, 1, 0, false),
>> > > > +      INST(DF, HF, 1, 0, false),
>> > > > +      INST( B, HF, 2, 0, false),
>> > > > +      INST( W, HF, 2, 0, true),
>> > > > +      INST( W, HF, 2, 2, false),
>> > > > +      INST( D, HF, 2, 0, false),
>> > > > +      INST( F, HF, 2, 0, true),
>> > > > +      INST( F, HF, 2, 2, true),
>> > > > +      INST( B, HF, 4, 0, true),
>> > > > +      INST( B, HF, 4, 1, false),
>> > > > +      INST( W, HF, 4, 0, false),
>> > > > +
>> > > > +#undef INST
>> > > > +   };
>> > > > +
>> > > > +   if (devinfo.gen < 8)
>> > > > +      return;
>> > > > +
>> > > > +   for (unsigned i = 0; i < sizeof(inst) / sizeof(inst[0]);
>> > > > i++) {
>> > > > +      if (!devinfo.has_64bit_types &&
>> > > > +          (type_sz(inst[i].src_type) == 8 ||
>> > > > type_sz(inst[i].dst_type) == 8)) {
>> > > > +         continue;
>> > > > +      }
>> > > > +
>> > > > +      brw_MOV(p, retype(g0, inst[i].dst_type), retype(g0,
>> > > > inst[i].src_type));
>> > > > +
>> > > > +      brw_inst_set_exec_size(&devinfo, last_inst,
>> > > > BRW_EXECUTE_4);
>> > > > +
>> > > > +      brw_inst_set_dst_hstride(&devinfo, last_inst,
>> > > > inst[i].dst_stride);
>> > > > +      brw_inst_set_dst_da1_subreg_nr(&devinfo, last_inst,
>> > > > inst[i].dst_subnr);
>> > > > +
>> > > > +      if (inst[i].src_type == BRW_REGISTER_TYPE_B) {
>> > > > +         brw_inst_set_src0_vstride(&devinfo, last_inst,
>> > > > BRW_VERTICAL_STRIDE_4);
>> > > > +         brw_inst_set_src0_width(&devinfo, last_inst,
>> > > > BRW_WIDTH_2);
>> > > > +         brw_inst_set_src0_hstride(&devinfo, last_inst,
>> > > > BRW_HORIZONTAL_STRIDE_2);
>> > > > +      } else {
>> > > > +         brw_inst_set_src0_vstride(&devinfo, last_inst,
>> > > > BRW_VERTICAL_STRIDE_4);
>> > > > +         brw_inst_set_src0_width(&devinfo, last_inst,
>> > > > BRW_WIDTH_4);
>> > > > +         brw_inst_set_src0_hstride(&devinfo, last_inst,
>> > > > BRW_HORIZONTAL_STRIDE_1);
>> > > > +      }
>> > > > +
>> > > > +      EXPECT_EQ(inst[i].expected_result, validate(p));
>> > > > +
>> > > > +      clear_instructions(p);
>> > > > +   }
>> > > > +
>> > > > +   /* Conversion from F to HF has Dword destination alignment
>> > > > restriction
>> > > > +    * on CHV and SKL+ only
>> > > > +    */
>> > > > +   brw_MOV(p, retype(g0, BRW_REGISTER_TYPE_HF),
>> > > > +              retype(g0, BRW_REGISTER_TYPE_F));
>> > > > +
>> > > > +   brw_inst_set_dst_hstride(&devinfo, last_inst,
>> > > > BRW_HORIZONTAL_STRIDE_1);
>> > > > +
>> > > > +   if (devinfo.gen >= 9 || devinfo.is_cherryview) {
>> > > > +      EXPECT_FALSE(validate(p));
>> > > > +   } else {
>> > > > +      assert(devinfo.gen == 8);
>> > > > +      EXPECT_TRUE(validate(p));
>> > > > +   }
>> > > > +   clear_instructions(p);
>> > > > +
>> > > > +   brw_MOV(p, retype(g0, BRW_REGISTER_TYPE_HF),
>> > > > +              retype(g0, BRW_REGISTER_TYPE_F));
>> > > > +
>> > > > +   brw_inst_set_dst_hstride(&devinfo, last_inst,
>> > > > BRW_HORIZONTAL_STRIDE_2);
>> > > > +
>> > > > +   EXPECT_TRUE(validate(p));
>> > > > +   clear_instructions(p);
>> > > > +}
>> > > > +
>> > > >  TEST_P(validation_test,
>> > > > vector_immediate_destination_alignment)
>> > > >  {
>> > > >     static const struct {
>> > > > -- 
>> > > > 2.17.1
>> > > > 
>> > > > _______________________________________________
>> > > > mesa-dev mailing list
>> > > > mesa-dev@lists.freedesktop.org
>> > > > https://lists.freedesktop.org/mailman/listinfo/mesa-dev
On Thu, 2019-02-28 at 09:54 -0800, Francisco Jerez wrote:
> Iago Toral <itoral@igalia.com> writes:
> 
> > On Wed, 2019-02-27 at 13:47 -0800, Francisco Jerez wrote:
> > > Iago Toral <itoral@igalia.com> writes:
> > > 
> > > > On Tue, 2019-02-26 at 14:54 -0800, Francisco Jerez wrote:
> > > > > Iago Toral Quiroga <itoral@igalia.com> writes:
> > > > > 
> > > > > > ---
> > > > > >  src/intel/compiler/brw_eu_validate.c    |  64
> > > > > > ++++++++++++-
> > > > > >  src/intel/compiler/test_eu_validate.cpp | 122
> > > > > > ++++++++++++++++++++++++
> > > > > >  2 files changed, 185 insertions(+), 1 deletion(-)
> > > > > > 
> > > > > > diff --git a/src/intel/compiler/brw_eu_validate.c
> > > > > > b/src/intel/compiler/brw_eu_validate.c
> > > > > > index 000a05cb6ac..203641fecb9 100644
> > > > > > --- a/src/intel/compiler/brw_eu_validate.c
> > > > > > +++ b/src/intel/compiler/brw_eu_validate.c
> > > > > > @@ -531,7 +531,69 @@
> > > > > > general_restrictions_based_on_operand_types(const struct
> > > > > > gen_device_info *devinf
> > > > > >         exec_type_size == 8 && dst_type_size == 4)
> > > > > >        dst_type_size = 8;
> > > > > >  
> > > > > > -   if (exec_type_size > dst_type_size) {
> > > > > > +   /* From the BDW+ PRM:
> > > > > > +    *
> > > > > > +    *    "There is no direct conversion from HF to DF or
> > > > > > DF to
> > > > > > HF.
> > > > > > +    *     There is no direct conversion from HF to Q/UQ or
> > > > > > Q/UQ to
> > > > > > HF."
> > > > > > +    */
> > > > > > +   enum brw_reg_type src0_type =
> > > > > > brw_inst_src0_type(devinfo,
> > > > > > inst);
> > > > > > +   ERROR_IF(brw_inst_opcode(devinfo, inst) ==
> > > > > > BRW_OPCODE_MOV
> > > > > > &&
> > > > > 
> > > > > Why is only the MOV instruction handled here and
> > > > > below?  Aren't
> > > > > other
> > > > > instructions able to do implicit conversions?  Probably means
> > > > > you
> > > > > need
> > > > > to deal with two sources rather than one.
> > > > 
> > > > This comes from the programming notes of the MOV instruction
> > > > (Volume
> > > > 2a, Command Reference - Instructions - MOV), so it is described
> > > > specifically for the MOV instruction. I should probably have
> > > > made
> > > > this
> > > > clear in the comment.
> > > > 
> > > 
> > > Maybe the one above is specified in the MOV page only, probably
> > > due
> > > to
> > > an oversight (If these restrictions were really specific to the
> > > MOV
> > > instruction, what would prevent you from implementing such
> > > conversions
> > > through a different instruction?  E.g. "ADD dst:df, src:hf, 0"
> > > which
> > > would be substantially more efficient than what you're doing in
> > > PATCH
> > > 02)
> > 
> > Instructions that take HF can only be strictly HF or mix F and HF
> > (mixed mode float), with MOV being the only exception. That means
> > that
> > any instruction like the one you use above are invalid. Maybe we
> > should
> > validate explicitly that instructions that use HF are strictly HF
> > or
> > mixed-float mode only?
> > 
> 
> So you're acknowledging that the conversions checked above are
> illegal
> whether the instruction is a MOV or something else.  Where are we
> preventing instructions other than MOV with such conversions from
> being
> accepted by the validator?

We aren't, because the validator is not checking, in general, for
accepted type combinations for specific instructions anywhere as far as
I can see. If we want to start doing this with HF conversions
specifically, I guess it is fine, but in case it is not clear, I think
it won't bring any immediate benefits with the
VK_KHR_shader_float16_int8 implementation since this series only ever
emits conversions involving HF operands via MOV instructions, which is
why I thought that validating that no direct MOV conversions from DF/Q
types ever happen was useful, since we have code in the driver to
handle this scenario.

> > >  but I see other restriction checks in this patch which are
> > > certainly specified in the generic regioning restrictions page
> > > and
> > > you're limiting to the MOV instruction...
> > 
> > There are two rules below:
> > 
> > 1. The one about conversions between integer and half-float. Again,
> > these can only happen through MOV for the same reasons, so I think
> > this
> > one should be fine.
> > 
> 
> Why do you think that can only happen through a MOV instruction?  The
> hardware spec lists the following as a valid example in the register
> region restrictions page:
> 
> > add (8) r10.0<2>:hf r11.0<8;8,1>:w r12.0<8;8,1>:w

I thought these instructions were not valid. I looked again at the BDW
PRM and indeed, it lists that HF destination for ADD is only allowed
with HF source. The same for all other instructions that can take HF
except (S)MOV. However, I have just now realized that the SKL+ PRM is
different and it allows ADD to do this, but as far as I can see, ADD is
the only exception: you cannot do this with MUL, MAD or MATH for
example. I am not sure what to make of this, but I guess that if we
want to validate this it doesn't hurt to be safe and assume that we can
have more instructions doing implicit conversions.

If course, there is the fact that we never emit these kind of
conversions anyway, at least not with this series (we don't even use
mixed-float mode) but I guess it is fine to add that validation if only
to be safe in the future if we ever decide to start using the extra
flexibility allowed by the hardware.


> > 2. The one about word destinations (of which we are only really
> > implementing conversions from F->HF). Here the rule is more generic
> > and
> > I agree that expanding this to include any other mixed float mode
> > instruction would make sense. However, validation for mixed float
> > mode
> > has its own set rules, some of which are incompatible with the
> > general
> > region restrictions being validated here, so I think it is
> > inconvenient
> > to try and do that validation  here (see patch 36 and then patch
> > 37).
> > What I would propose, if you agree, is that we only implement this
> > for
> > MOV here, and then for mixed float mode instructions, we implement
> > the
> > more generic version of this check (that would then go in patch
> > 37).
> > How does that sound?
> > 
> 
> I still don't understand why you want to implement the same
> restriction
> twice, once for MOV and once for all other mixed-mode
> instructions.  How
> is that more convenient?

The restrictions based on operand types that are checked in the
validator are specific to Byte or cases where the execution type is
larger than the destination stride, for which mixed float has a
different set of restrictions.

For example, for mixed float we have this rule:

"In Align1, destination stride can be smaller than execution type"

Which overrides this other from "General restrictions based on operand
types":

"Destination stride must be equal to the ratio of the sizes of the
execution data type to the destination type"

So I thought that it would make things easier to keep all restrictions
for mixed float separate and make sure that we skipped mixed float
instructions in general_restrictions_based_on_operand_types() so we
avoid having to add code to skip individual general restrictions that that are overriden for mixed float mode anyway.

Anyway, I'll try to rework the patches to do more generic validation of
HF conversions like you ask and see what kind of code we end up with.

> > > > > > +            ((dst_type == BRW_REGISTER_TYPE_HF &&
> > > > > > type_sz(src0_type) == 8) ||
> > > > > > +             (dst_type_size == 8 && src0_type ==
> > > > > > BRW_REGISTER_TYPE_HF)),
> > > > > > +            "There are no direct conversion between 64-bit
> > > > > > types
> > > > > > and HF");
> > > > > > +
> > > > > > +   /* From the BDW+ PRM:
> > > > > > +    *
> > > > > > +    *   "Conversion between Integer and HF (Half Float)
> > > > > > must
> > > > > > be
> > > > > > +    *    DWord-aligned and strided by a DWord on the
> > > > > > destination."
> > > > > > +    *
> > > > > > +    * But this seems to be expanded on CHV and SKL+ by:
> > > > > > +    *
> > > > > > +    *   "There is a relaxed alignment rule for word
> > > > > > destinations.
> > > > > > When
> > > > > > +    *    the destination type is word (UW, W, HF),
> > > > > > destination
> > > > > > data types
> > > > > > +    *    can be aligned to either the lowest word or the
> > > > > > second
> > > > > > lowest
> > > > > > +    *    word of the execution channel. This means the
> > > > > > destination
> > > > > > data
> > > > > > +    *    words can be either all in the even word
> > > > > > locations or
> > > > > > all
> > > > > > in the
> > > > > > +    *    odd word locations."
> > > > > > +    *
> > > > > > +    * We do not implement the second rule as is though,
> > > > > > since
> > > > > > empirical testing
> > > > > > +    * shows inconsistencies:
> > > > > > +    *   - It suggests that packed 16-bit is not allowed,
> > > > > > which
> > > > > > is
> > > > > > not true.
> > > > > > +    *   - It suggests that conversions from Q/DF to W
> > > > > > (which
> > > > > > need
> > > > > > to be 64-bit
> > > > > > +    *     aligned on the destination) are not possible,
> > > > > > which
> > > > > > is
> > > > > > not true.
> > > > > > +    *   - It suggests that conversions from 16-bit
> > > > > > executions
> > > > > > types to W need
> > > > > > +    *     to be 32-bit aligned, which doesn't seem to be
> > > > > > necessary.
> > > > > > +    *
> > > > > > +    * So from this rule we only validate the implication
> > > > > > that
> > > > > > conversion from
> > > > > > +    * F to HF needs to be DWord aligned too (in BDW this
> > > > > > is
> > > > > > limited to
> > > > > > +    * conversions from integer types).
> > > > > > +    */
> > > > > > +   bool is_half_float_conversion =
> > > > > > +       brw_inst_opcode(devinfo, inst) == BRW_OPCODE_MOV &&
> > > > > > +       dst_type != src0_type &&
> > > > > > +       (dst_type == BRW_REGISTER_TYPE_HF || src0_type ==
> > > > > > BRW_REGISTER_TYPE_HF);
> > > > > > +
> > > > > > +   if (is_half_float_conversion) {
> > > > > > +      assert(devinfo->gen >= 8);
> > > > > > +
> > > > > > +      if ((dst_type == BRW_REGISTER_TYPE_HF &&
> > > > > > brw_reg_type_is_integer(src0_type)) ||
> > > > > > +          (brw_reg_type_is_integer(dst_type) && src0_type
> > > > > > ==
> > > > > > BRW_REGISTER_TYPE_HF)) {
> > > > > > +         ERROR_IF(dst_stride * dst_type_size != 4,
> > > > > > +                  "Conversions between integer and half-
> > > > > > float
> > > > > > must
> > > > > > be strided "
> > > > > > +                  "by a DWord on the destination");
> > > > > > +
> > > > > > +         unsigned subreg =
> > > > > > brw_inst_dst_da1_subreg_nr(devinfo,
> > > > > > inst);
> > > > > > +         ERROR_IF(subreg % 4 != 0,
> > > > > > +                  "Conversions between integer and half-
> > > > > > float
> > > > > > must
> > > > > > be aligned "
> > > > > > +                  "to a DWord on the destination");
> > > > > > +      } else if ((devinfo->is_cherryview || devinfo->gen
> > > > > > >= 9)
> > > > > > &&
> > > > > > +                 dst_type == BRW_REGISTER_TYPE_HF) {
> > > > > > +         ERROR_IF(dst_stride != 2,
> > > > > > +                  "Conversions to HF must have either all
> > > > > > words in
> > > > > > even word "
> > > > > > +                  "locations or all words in odd word
> > > > > > locations");
> > > > > > +      }
> > > > > > +
> > > > > > +   } else if (exec_type_size > dst_type_size) {
> > > > > >        if (!(dst_type_is_byte && inst_is_raw_move(devinfo,
> > > > > > inst)))
> > > > > > {
> > > > > >           ERROR_IF(dst_stride * dst_type_size !=
> > > > > > exec_type_size,
> > > > > >                    "Destination stride must be equal to the
> > > > > > ratio
> > > > > > of the sizes "
> > > > > > diff --git a/src/intel/compiler/test_eu_validate.cpp
> > > > > > b/src/intel/compiler/test_eu_validate.cpp
> > > > > > index 73300b23122..1557b6d2452 100644
> > > > > > --- a/src/intel/compiler/test_eu_validate.cpp
> > > > > > +++ b/src/intel/compiler/test_eu_validate.cpp
> > > > > > @@ -848,6 +848,128 @@ TEST_P(validation_test,
> > > > > > byte_destination_relaxed_alignment)
> > > > > >     }
> > > > > >  }
> > > > > >  
> > > > > > +TEST_P(validation_test, half_float_conversion)
> > > > > > +{
> > > > > > +   static const struct {
> > > > > > +      enum brw_reg_type dst_type;
> > > > > > +      enum brw_reg_type src_type;
> > > > > > +      unsigned dst_stride;
> > > > > > +      unsigned dst_subnr;
> > > > > > +      bool expected_result;
> > > > > > +   } inst[] = {
> > > > > > +#define INST(dst_type, src_type, dst_stride, dst_subnr,
> > > > > > expected_result)  \
> > > > > > +      {                                                   
> > > > > >     
> > > > > >     
> > > > > >         \
> > > > > > +         BRW_REGISTER_TYPE_##dst_type,                    
> > > > > >     
> > > > > >     
> > > > > >         \
> > > > > > +         BRW_REGISTER_TYPE_##src_type,                    
> > > > > >     
> > > > > >     
> > > > > >         \
> > > > > > +         BRW_HORIZONTAL_STRIDE_##dst_stride,              
> > > > > >     
> > > > > >     
> > > > > >         \
> > > > > > +         dst_subnr,                                       
> > > > > >     
> > > > > >     
> > > > > >         \
> > > > > > +         expected_result,                                 
> > > > > >     
> > > > > >     
> > > > > >         \
> > > > > > +      }
> > > > > > +
> > > > > > +      /* MOV to half-float destination (F source handled
> > > > > > later) */
> > > > > > +      INST(HF,  B, 1, 0, false),
> > > > > > +      INST(HF,  W, 1, 0, false),
> > > > > > +      INST(HF, HF, 1, 0, true),
> > > > > > +      INST(HF, HF, 1, 2, true),
> > > > > > +      INST(HF,  D, 1, 0, false),
> > > > > > +      INST(HF,  Q, 1, 0, false),
> > > > > > +      INST(HF, DF, 1, 0, false),
> > > > > > +      INST(HF,  B, 2, 0, true),
> > > > > > +      INST(HF,  B, 2, 2, false),
> > > > > > +      INST(HF,  W, 2, 0, true),
> > > > > > +      INST(HF,  W, 2, 2, false),
> > > > > > +      INST(HF, HF, 2, 0, true),
> > > > > > +      INST(HF, HF, 2, 2, true),
> > > > > > +      INST(HF,  D, 2, 0, true),
> > > > > > +      INST(HF,  D, 2, 2, false),
> > > > > > +      INST(HF,  Q, 2, 0, false),
> > > > > > +      INST(HF, DF, 2, 0, false),
> > > > > > +      INST(HF,  B, 4, 0, false),
> > > > > > +      INST(HF,  W, 4, 0, false),
> > > > > > +      INST(HF, HF, 4, 0, true),
> > > > > > +      INST(HF, HF, 4, 2, true),
> > > > > > +      INST(HF,  D, 4, 0, false),
> > > > > > +      INST(HF,  Q, 4, 0, false),
> > > > > > +      INST(HF, DF, 4, 0, false),
> > > > > > +
> > > > > > +      /* MOV from half-float source */
> > > > > > +      INST( B, HF, 1, 0, false),
> > > > > > +      INST( W, HF, 1, 0, false),
> > > > > > +      INST( D, HF, 1, 0, true),
> > > > > > +      INST( D, HF, 1, 4, true),
> > > > > > +      INST( F, HF, 1, 0, true),
> > > > > > +      INST( F, HF, 1, 4, true),
> > > > > > +      INST( Q, HF, 1, 0, false),
> > > > > > +      INST(DF, HF, 1, 0, false),
> > > > > > +      INST( B, HF, 2, 0, false),
> > > > > > +      INST( W, HF, 2, 0, true),
> > > > > > +      INST( W, HF, 2, 2, false),
> > > > > > +      INST( D, HF, 2, 0, false),
> > > > > > +      INST( F, HF, 2, 0, true),
> > > > > > +      INST( F, HF, 2, 2, true),
> > > > > > +      INST( B, HF, 4, 0, true),
> > > > > > +      INST( B, HF, 4, 1, false),
> > > > > > +      INST( W, HF, 4, 0, false),
> > > > > > +
> > > > > > +#undef INST
> > > > > > +   };
> > > > > > +
> > > > > > +   if (devinfo.gen < 8)
> > > > > > +      return;
> > > > > > +
> > > > > > +   for (unsigned i = 0; i < sizeof(inst) /
> > > > > > sizeof(inst[0]);
> > > > > > i++) {
> > > > > > +      if (!devinfo.has_64bit_types &&
> > > > > > +          (type_sz(inst[i].src_type) == 8 ||
> > > > > > type_sz(inst[i].dst_type) == 8)) {
> > > > > > +         continue;
> > > > > > +      }
> > > > > > +
> > > > > > +      brw_MOV(p, retype(g0, inst[i].dst_type), retype(g0,
> > > > > > inst[i].src_type));
> > > > > > +
> > > > > > +      brw_inst_set_exec_size(&devinfo, last_inst,
> > > > > > BRW_EXECUTE_4);
> > > > > > +
> > > > > > +      brw_inst_set_dst_hstride(&devinfo, last_inst,
> > > > > > inst[i].dst_stride);
> > > > > > +      brw_inst_set_dst_da1_subreg_nr(&devinfo, last_inst,
> > > > > > inst[i].dst_subnr);
> > > > > > +
> > > > > > +      if (inst[i].src_type == BRW_REGISTER_TYPE_B) {
> > > > > > +         brw_inst_set_src0_vstride(&devinfo, last_inst,
> > > > > > BRW_VERTICAL_STRIDE_4);
> > > > > > +         brw_inst_set_src0_width(&devinfo, last_inst,
> > > > > > BRW_WIDTH_2);
> > > > > > +         brw_inst_set_src0_hstride(&devinfo, last_inst,
> > > > > > BRW_HORIZONTAL_STRIDE_2);
> > > > > > +      } else {
> > > > > > +         brw_inst_set_src0_vstride(&devinfo, last_inst,
> > > > > > BRW_VERTICAL_STRIDE_4);
> > > > > > +         brw_inst_set_src0_width(&devinfo, last_inst,
> > > > > > BRW_WIDTH_4);
> > > > > > +         brw_inst_set_src0_hstride(&devinfo, last_inst,
> > > > > > BRW_HORIZONTAL_STRIDE_1);
> > > > > > +      }
> > > > > > +
> > > > > > +      EXPECT_EQ(inst[i].expected_result, validate(p));
> > > > > > +
> > > > > > +      clear_instructions(p);
> > > > > > +   }
> > > > > > +
> > > > > > +   /* Conversion from F to HF has Dword destination
> > > > > > alignment
> > > > > > restriction
> > > > > > +    * on CHV and SKL+ only
> > > > > > +    */
> > > > > > +   brw_MOV(p, retype(g0, BRW_REGISTER_TYPE_HF),
> > > > > > +              retype(g0, BRW_REGISTER_TYPE_F));
> > > > > > +
> > > > > > +   brw_inst_set_dst_hstride(&devinfo, last_inst,
> > > > > > BRW_HORIZONTAL_STRIDE_1);
> > > > > > +
> > > > > > +   if (devinfo.gen >= 9 || devinfo.is_cherryview) {
> > > > > > +      EXPECT_FALSE(validate(p));
> > > > > > +   } else {
> > > > > > +      assert(devinfo.gen == 8);
> > > > > > +      EXPECT_TRUE(validate(p));
> > > > > > +   }
> > > > > > +   clear_instructions(p);
> > > > > > +
> > > > > > +   brw_MOV(p, retype(g0, BRW_REGISTER_TYPE_HF),
> > > > > > +              retype(g0, BRW_REGISTER_TYPE_F));
> > > > > > +
> > > > > > +   brw_inst_set_dst_hstride(&devinfo, last_inst,
> > > > > > BRW_HORIZONTAL_STRIDE_2);
> > > > > > +
> > > > > > +   EXPECT_TRUE(validate(p));
> > > > > > +   clear_instructions(p);
> > > > > > +}
> > > > > > +
> > > > > >  TEST_P(validation_test,
> > > > > > vector_immediate_destination_alignment)
> > > > > >  {
> > > > > >     static const struct {
> > > > > > -- 
> > > > > > 2.17.1
> > > > > > 
> > > > > > _______________________________________________
> > > > > > mesa-dev mailing list
> > > > > > mesa-dev@lists.freedesktop.org
> > > > > > https://lists.freedesktop.org/mailman/listinfo/mesa-dev
On Fri, 2019-03-01 at 09:39 +0100, Iago Toral wrote:
> On Thu, 2019-02-28 at 09:54 -0800, Francisco Jerez wrote:
> > Iago Toral <itoral@igalia.com> writes:
> > 
> > > On Wed, 2019-02-27 at 13:47 -0800, Francisco Jerez wrote:
> > > > Iago Toral <itoral@igalia.com> writes:
> > > > 
> > > > > On Tue, 2019-02-26 at 14:54 -0800, Francisco Jerez wrote:
> > > > > > Iago Toral Quiroga <itoral@igalia.com> writes:
> > > > > > 
> > > > > > > ---
> > > > > > >  src/intel/compiler/brw_eu_validate.c    |  64
> > > > > > > ++++++++++++-
> > > > > > >  src/intel/compiler/test_eu_validate.cpp | 122
> > > > > > > ++++++++++++++++++++++++
> > > > > > >  2 files changed, 185 insertions(+), 1 deletion(-)
> > > > > > > 
> > > > > > > diff --git a/src/intel/compiler/brw_eu_validate.c
> > > > > > > b/src/intel/compiler/brw_eu_validate.c
> > > > > > > index 000a05cb6ac..203641fecb9 100644
> > > > > > > --- a/src/intel/compiler/brw_eu_validate.c
> > > > > > > +++ b/src/intel/compiler/brw_eu_validate.c
> > > > > > > @@ -531,7 +531,69 @@
> > > > > > > general_restrictions_based_on_operand_types(const struct
> > > > > > > gen_device_info *devinf
> > > > > > >         exec_type_size == 8 && dst_type_size == 4)
> > > > > > >        dst_type_size = 8;
> > > > > > >  
> > > > > > > -   if (exec_type_size > dst_type_size) {
> > > > > > > +   /* From the BDW+ PRM:
> > > > > > > +    *
> > > > > > > +    *    "There is no direct conversion from HF to DF or
> > > > > > > DF to
> > > > > > > HF.
> > > > > > > +    *     There is no direct conversion from HF to Q/UQ
> > > > > > > or
> > > > > > > Q/UQ to
> > > > > > > HF."
> > > > > > > +    */
> > > > > > > +   enum brw_reg_type src0_type =
> > > > > > > brw_inst_src0_type(devinfo,
> > > > > > > inst);
> > > > > > > +   ERROR_IF(brw_inst_opcode(devinfo, inst) ==
> > > > > > > BRW_OPCODE_MOV
> > > > > > > &&
> > > > > > 
> > > > > > Why is only the MOV instruction handled here and
> > > > > > below?  Aren't
> > > > > > other
> > > > > > instructions able to do implicit conversions?  Probably
> > > > > > means
> > > > > > you
> > > > > > need
> > > > > > to deal with two sources rather than one.
> > > > > 
> > > > > This comes from the programming notes of the MOV instruction
> > > > > (Volume
> > > > > 2a, Command Reference - Instructions - MOV), so it is
> > > > > described
> > > > > specifically for the MOV instruction. I should probably have
> > > > > made
> > > > > this
> > > > > clear in the comment.
> > > > > 
> > > > 
> > > > Maybe the one above is specified in the MOV page only, probably
> > > > due
> > > > to
> > > > an oversight (If these restrictions were really specific to the
> > > > MOV
> > > > instruction, what would prevent you from implementing such
> > > > conversions
> > > > through a different instruction?  E.g. "ADD dst:df, src:hf, 0"
> > > > which
> > > > would be substantially more efficient than what you're doing in
> > > > PATCH
> > > > 02)
> > > 
> > > Instructions that take HF can only be strictly HF or mix F and HF
> > > (mixed mode float), with MOV being the only exception. That means
> > > that
> > > any instruction like the one you use above are invalid. Maybe we
> > > should
> > > validate explicitly that instructions that use HF are strictly HF
> > > or
> > > mixed-float mode only?
> > > 
> > 
> > So you're acknowledging that the conversions checked above are
> > illegal
> > whether the instruction is a MOV or something else.  Where are we
> > preventing instructions other than MOV with such conversions from
> > being
> > accepted by the validator?
> 
> We aren't, because the validator is not checking, in general, for
> accepted type combinations for specific instructions anywhere as far
> as
> I can see. If we want to start doing this with HF conversions
> specifically, I guess it is fine, but in case it is not clear, I
> think
> it won't bring any immediate benefits with the
> VK_KHR_shader_float16_int8 implementation since this series only ever
> emits conversions involving HF operands via MOV instructions, which
> is
> why I thought that validating that no direct MOV conversions from
> DF/Q
> types ever happen was useful, since we have code in the driver to
> handle this scenario.
> 
> > > >  but I see other restriction checks in this patch which are
> > > > certainly specified in the generic regioning restrictions page
> > > > and
> > > > you're limiting to the MOV instruction...
> > > 
> > > There are two rules below:
> > > 
> > > 1. The one about conversions between integer and half-float.
> > > Again,
> > > these can only happen through MOV for the same reasons, so I
> > > think
> > > this
> > > one should be fine.
> > > 
> > 
> > Why do you think that can only happen through a MOV
> > instruction?  The
> > hardware spec lists the following as a valid example in the
> > register
> > region restrictions page:
> > 
> > > add (8) r10.0<2>:hf r11.0<8;8,1>:w r12.0<8;8,1>:w
> 
> I thought these instructions were not valid. I looked again at the
> BDW
> PRM and indeed, it lists that HF destination for ADD is only allowed
> with HF source. The same for all other instructions that can take HF
> except (S)MOV. However, I have just now realized that the SKL+ PRM is
> different and it allows ADD to do this, but as far as I can see, ADD
> is
> the only exception: you cannot do this with MUL, MAD or MATH for
> example. I am not sure what to make of this, but I guess that if we
> want to validate this it doesn't hurt to be safe and assume that we
> can
> have more instructions doing implicit conversions.
> 
> If course, there is the fact that we never emit these kind of
> conversions anyway, at least not with this series (we don't even use
> mixed-float mode) but I guess it is fine to add that validation if
> only
> to be safe in the future if we ever decide to start using the extra
> flexibility allowed by the hardware.

I am just now realizing that in Align16 mode, we always expect
destination horizontal stride to be exactly 1, no matter the type. This
means that Align16 mixed float instructions with HF destination
(implicit conversion from F->HF) would also require this (actually the
docs make this requirement explicit for mixed float instructions
anyway). This would make it impossible to satisfy the rule for DWord
alignment on F->HF conversions for implicit conversions in mixed float
mode instructions. I guess these conversion restrictions are formulated
specifically for Align1 and should be skipped for Align16, even if that
is not explicitly stated in the docs. Does that make sense?

> 
> > > 2. The one about word destinations (of which we are only really
> > > implementing conversions from F->HF). Here the rule is more
> > > generic
> > > and
> > > I agree that expanding this to include any other mixed float mode
> > > instruction would make sense. However, validation for mixed float
> > > mode
> > > has its own set rules, some of which are incompatible with the
> > > general
> > > region restrictions being validated here, so I think it is
> > > inconvenient
> > > to try and do that validation  here (see patch 36 and then patch
> > > 37).
> > > What I would propose, if you agree, is that we only implement
> > > this
> > > for
> > > MOV here, and then for mixed float mode instructions, we
> > > implement
> > > the
> > > more generic version of this check (that would then go in patch
> > > 37).
> > > How does that sound?
> > > 
> > 
> > I still don't understand why you want to implement the same
> > restriction
> > twice, once for MOV and once for all other mixed-mode
> > instructions.  How
> > is that more convenient?
> 
> The restrictions based on operand types that are checked in the
> validator are specific to Byte or cases where the execution type is
> larger than the destination stride, for which mixed float has a
> different set of restrictions.
> 
> For example, for mixed float we have this rule:
> 
> "In Align1, destination stride can be smaller than execution type"
> 
> Which overrides this other from "General restrictions based on
> operand
> types":
> 
> "Destination stride must be equal to the ratio of the sizes of the
> execution data type to the destination type"
> 
> So I thought that it would make things easier to keep all
> restrictions
> for mixed float separate and make sure that we skipped mixed float
> instructions in general_restrictions_based_on_operand_types() so we
> avoid having to add code to skip individual general restrictions that
> that are overriden for mixed float mode anyway.
> 
> Anyway, I'll try to rework the patches to do more generic validation
> of
> HF conversions like you ask and see what kind of code we end up with.
> 
> > > > > > > +            ((dst_type == BRW_REGISTER_TYPE_HF &&
> > > > > > > type_sz(src0_type) == 8) ||
> > > > > > > +             (dst_type_size == 8 && src0_type ==
> > > > > > > BRW_REGISTER_TYPE_HF)),
> > > > > > > +            "There are no direct conversion between 64-
> > > > > > > bit
> > > > > > > types
> > > > > > > and HF");
> > > > > > > +
> > > > > > > +   /* From the BDW+ PRM:
> > > > > > > +    *
> > > > > > > +    *   "Conversion between Integer and HF (Half Float)
> > > > > > > must
> > > > > > > be
> > > > > > > +    *    DWord-aligned and strided by a DWord on the
> > > > > > > destination."
> > > > > > > +    *
> > > > > > > +    * But this seems to be expanded on CHV and SKL+ by:
> > > > > > > +    *
> > > > > > > +    *   "There is a relaxed alignment rule for word
> > > > > > > destinations.
> > > > > > > When
> > > > > > > +    *    the destination type is word (UW, W, HF),
> > > > > > > destination
> > > > > > > data types
> > > > > > > +    *    can be aligned to either the lowest word or the
> > > > > > > second
> > > > > > > lowest
> > > > > > > +    *    word of the execution channel. This means the
> > > > > > > destination
> > > > > > > data
> > > > > > > +    *    words can be either all in the even word
> > > > > > > locations or
> > > > > > > all
> > > > > > > in the
> > > > > > > +    *    odd word locations."
> > > > > > > +    *
> > > > > > > +    * We do not implement the second rule as is though,
> > > > > > > since
> > > > > > > empirical testing
> > > > > > > +    * shows inconsistencies:
> > > > > > > +    *   - It suggests that packed 16-bit is not allowed,
> > > > > > > which
> > > > > > > is
> > > > > > > not true.
> > > > > > > +    *   - It suggests that conversions from Q/DF to W
> > > > > > > (which
> > > > > > > need
> > > > > > > to be 64-bit
> > > > > > > +    *     aligned on the destination) are not possible,
> > > > > > > which
> > > > > > > is
> > > > > > > not true.
> > > > > > > +    *   - It suggests that conversions from 16-bit
> > > > > > > executions
> > > > > > > types to W need
> > > > > > > +    *     to be 32-bit aligned, which doesn't seem to be
> > > > > > > necessary.
> > > > > > > +    *
> > > > > > > +    * So from this rule we only validate the implication
> > > > > > > that
> > > > > > > conversion from
> > > > > > > +    * F to HF needs to be DWord aligned too (in BDW this
> > > > > > > is
> > > > > > > limited to
> > > > > > > +    * conversions from integer types).
> > > > > > > +    */
> > > > > > > +   bool is_half_float_conversion =
> > > > > > > +       brw_inst_opcode(devinfo, inst) == BRW_OPCODE_MOV
> > > > > > > &&
> > > > > > > +       dst_type != src0_type &&
> > > > > > > +       (dst_type == BRW_REGISTER_TYPE_HF || src0_type ==
> > > > > > > BRW_REGISTER_TYPE_HF);
> > > > > > > +
> > > > > > > +   if (is_half_float_conversion) {
> > > > > > > +      assert(devinfo->gen >= 8);
> > > > > > > +
> > > > > > > +      if ((dst_type == BRW_REGISTER_TYPE_HF &&
> > > > > > > brw_reg_type_is_integer(src0_type)) ||
> > > > > > > +          (brw_reg_type_is_integer(dst_type) &&
> > > > > > > src0_type
> > > > > > > ==
> > > > > > > BRW_REGISTER_TYPE_HF)) {
> > > > > > > +         ERROR_IF(dst_stride * dst_type_size != 4,
> > > > > > > +                  "Conversions between integer and half-
> > > > > > > float
> > > > > > > must
> > > > > > > be strided "
> > > > > > > +                  "by a DWord on the destination");
> > > > > > > +
> > > > > > > +         unsigned subreg =
> > > > > > > brw_inst_dst_da1_subreg_nr(devinfo,
> > > > > > > inst);
> > > > > > > +         ERROR_IF(subreg % 4 != 0,
> > > > > > > +                  "Conversions between integer and half-
> > > > > > > float
> > > > > > > must
> > > > > > > be aligned "
> > > > > > > +                  "to a DWord on the destination");
> > > > > > > +      } else if ((devinfo->is_cherryview || devinfo->gen
> > > > > > > > = 9)
> > > > > > > 
> > > > > > > &&
> > > > > > > +                 dst_type == BRW_REGISTER_TYPE_HF) {
> > > > > > > +         ERROR_IF(dst_stride != 2,
> > > > > > > +                  "Conversions to HF must have either
> > > > > > > all
> > > > > > > words in
> > > > > > > even word "
> > > > > > > +                  "locations or all words in odd word
> > > > > > > locations");
> > > > > > > +      }
> > > > > > > +
> > > > > > > +   } else if (exec_type_size > dst_type_size) {
> > > > > > >        if (!(dst_type_is_byte &&
> > > > > > > inst_is_raw_move(devinfo,
> > > > > > > inst)))
> > > > > > > {
> > > > > > >           ERROR_IF(dst_stride * dst_type_size !=
> > > > > > > exec_type_size,
> > > > > > >                    "Destination stride must be equal to
> > > > > > > the
> > > > > > > ratio
> > > > > > > of the sizes "
> > > > > > > diff --git a/src/intel/compiler/test_eu_validate.cpp
> > > > > > > b/src/intel/compiler/test_eu_validate.cpp
> > > > > > > index 73300b23122..1557b6d2452 100644
> > > > > > > --- a/src/intel/compiler/test_eu_validate.cpp
> > > > > > > +++ b/src/intel/compiler/test_eu_validate.cpp
> > > > > > > @@ -848,6 +848,128 @@ TEST_P(validation_test,
> > > > > > > byte_destination_relaxed_alignment)
> > > > > > >     }
> > > > > > >  }
> > > > > > >  
> > > > > > > +TEST_P(validation_test, half_float_conversion)
> > > > > > > +{
> > > > > > > +   static const struct {
> > > > > > > +      enum brw_reg_type dst_type;
> > > > > > > +      enum brw_reg_type src_type;
> > > > > > > +      unsigned dst_stride;
> > > > > > > +      unsigned dst_subnr;
> > > > > > > +      bool expected_result;
> > > > > > > +   } inst[] = {
> > > > > > > +#define INST(dst_type, src_type, dst_stride, dst_subnr,
> > > > > > > expected_result)  \
> > > > > > > +      {                                                 
> > > > > > >   
> > > > > > >     
> > > > > > >     
> > > > > > >         \
> > > > > > > +         BRW_REGISTER_TYPE_##dst_type,                  
> > > > > > >   
> > > > > > >     
> > > > > > >     
> > > > > > >         \
> > > > > > > +         BRW_REGISTER_TYPE_##src_type,                  
> > > > > > >   
> > > > > > >     
> > > > > > >     
> > > > > > >         \
> > > > > > > +         BRW_HORIZONTAL_STRIDE_##dst_stride,            
> > > > > > >   
> > > > > > >     
> > > > > > >     
> > > > > > >         \
> > > > > > > +         dst_subnr,                                     
> > > > > > >   
> > > > > > >     
> > > > > > >     
> > > > > > >         \
> > > > > > > +         expected_result,                               
> > > > > > >   
> > > > > > >     
> > > > > > >     
> > > > > > >         \
> > > > > > > +      }
> > > > > > > +
> > > > > > > +      /* MOV to half-float destination (F source handled
> > > > > > > later) */
> > > > > > > +      INST(HF,  B, 1, 0, false),
> > > > > > > +      INST(HF,  W, 1, 0, false),
> > > > > > > +      INST(HF, HF, 1, 0, true),
> > > > > > > +      INST(HF, HF, 1, 2, true),
> > > > > > > +      INST(HF,  D, 1, 0, false),
> > > > > > > +      INST(HF,  Q, 1, 0, false),
> > > > > > > +      INST(HF, DF, 1, 0, false),
> > > > > > > +      INST(HF,  B, 2, 0, true),
> > > > > > > +      INST(HF,  B, 2, 2, false),
> > > > > > > +      INST(HF,  W, 2, 0, true),
> > > > > > > +      INST(HF,  W, 2, 2, false),
> > > > > > > +      INST(HF, HF, 2, 0, true),
> > > > > > > +      INST(HF, HF, 2, 2, true),
> > > > > > > +      INST(HF,  D, 2, 0, true),
> > > > > > > +      INST(HF,  D, 2, 2, false),
> > > > > > > +      INST(HF,  Q, 2, 0, false),
> > > > > > > +      INST(HF, DF, 2, 0, false),
> > > > > > > +      INST(HF,  B, 4, 0, false),
> > > > > > > +      INST(HF,  W, 4, 0, false),
> > > > > > > +      INST(HF, HF, 4, 0, true),
> > > > > > > +      INST(HF, HF, 4, 2, true),
> > > > > > > +      INST(HF,  D, 4, 0, false),
> > > > > > > +      INST(HF,  Q, 4, 0, false),
> > > > > > > +      INST(HF, DF, 4, 0, false),
> > > > > > > +
> > > > > > > +      /* MOV from half-float source */
> > > > > > > +      INST( B, HF, 1, 0, false),
> > > > > > > +      INST( W, HF, 1, 0, false),
> > > > > > > +      INST( D, HF, 1, 0, true),
> > > > > > > +      INST( D, HF, 1, 4, true),
> > > > > > > +      INST( F, HF, 1, 0, true),
> > > > > > > +      INST( F, HF, 1, 4, true),
> > > > > > > +      INST( Q, HF, 1, 0, false),
> > > > > > > +      INST(DF, HF, 1, 0, false),
> > > > > > > +      INST( B, HF, 2, 0, false),
> > > > > > > +      INST( W, HF, 2, 0, true),
> > > > > > > +      INST( W, HF, 2, 2, false),
> > > > > > > +      INST( D, HF, 2, 0, false),
> > > > > > > +      INST( F, HF, 2, 0, true),
> > > > > > > +      INST( F, HF, 2, 2, true),
> > > > > > > +      INST( B, HF, 4, 0, true),
> > > > > > > +      INST( B, HF, 4, 1, false),
> > > > > > > +      INST( W, HF, 4, 0, false),
> > > > > > > +
> > > > > > > +#undef INST
> > > > > > > +   };
> > > > > > > +
> > > > > > > +   if (devinfo.gen < 8)
> > > > > > > +      return;
> > > > > > > +
> > > > > > > +   for (unsigned i = 0; i < sizeof(inst) /
> > > > > > > sizeof(inst[0]);
> > > > > > > i++) {
> > > > > > > +      if (!devinfo.has_64bit_types &&
> > > > > > > +          (type_sz(inst[i].src_type) == 8 ||
> > > > > > > type_sz(inst[i].dst_type) == 8)) {
> > > > > > > +         continue;
> > > > > > > +      }
> > > > > > > +
> > > > > > > +      brw_MOV(p, retype(g0, inst[i].dst_type),
> > > > > > > retype(g0,
> > > > > > > inst[i].src_type));
> > > > > > > +
> > > > > > > +      brw_inst_set_exec_size(&devinfo, last_inst,
> > > > > > > BRW_EXECUTE_4);
> > > > > > > +
> > > > > > > +      brw_inst_set_dst_hstride(&devinfo, last_inst,
> > > > > > > inst[i].dst_stride);
> > > > > > > +      brw_inst_set_dst_da1_subreg_nr(&devinfo,
> > > > > > > last_inst,
> > > > > > > inst[i].dst_subnr);
> > > > > > > +
> > > > > > > +      if (inst[i].src_type == BRW_REGISTER_TYPE_B) {
> > > > > > > +         brw_inst_set_src0_vstride(&devinfo, last_inst,
> > > > > > > BRW_VERTICAL_STRIDE_4);
> > > > > > > +         brw_inst_set_src0_width(&devinfo, last_inst,
> > > > > > > BRW_WIDTH_2);
> > > > > > > +         brw_inst_set_src0_hstride(&devinfo, last_inst,
> > > > > > > BRW_HORIZONTAL_STRIDE_2);
> > > > > > > +      } else {
> > > > > > > +         brw_inst_set_src0_vstride(&devinfo, last_inst,
> > > > > > > BRW_VERTICAL_STRIDE_4);
> > > > > > > +         brw_inst_set_src0_width(&devinfo, last_inst,
> > > > > > > BRW_WIDTH_4);
> > > > > > > +         brw_inst_set_src0_hstride(&devinfo, last_inst,
> > > > > > > BRW_HORIZONTAL_STRIDE_1);
> > > > > > > +      }
> > > > > > > +
> > > > > > > +      EXPECT_EQ(inst[i].expected_result, validate(p));
> > > > > > > +
> > > > > > > +      clear_instructions(p);
> > > > > > > +   }
> > > > > > > +
> > > > > > > +   /* Conversion from F to HF has Dword destination
> > > > > > > alignment
> > > > > > > restriction
> > > > > > > +    * on CHV and SKL+ only
> > > > > > > +    */
> > > > > > > +   brw_MOV(p, retype(g0, BRW_REGISTER_TYPE_HF),
> > > > > > > +              retype(g0, BRW_REGISTER_TYPE_F));
> > > > > > > +
> > > > > > > +   brw_inst_set_dst_hstride(&devinfo, last_inst,
> > > > > > > BRW_HORIZONTAL_STRIDE_1);
> > > > > > > +
> > > > > > > +   if (devinfo.gen >= 9 || devinfo.is_cherryview) {
> > > > > > > +      EXPECT_FALSE(validate(p));
> > > > > > > +   } else {
> > > > > > > +      assert(devinfo.gen == 8);
> > > > > > > +      EXPECT_TRUE(validate(p));
> > > > > > > +   }
> > > > > > > +   clear_instructions(p);
> > > > > > > +
> > > > > > > +   brw_MOV(p, retype(g0, BRW_REGISTER_TYPE_HF),
> > > > > > > +              retype(g0, BRW_REGISTER_TYPE_F));
> > > > > > > +
> > > > > > > +   brw_inst_set_dst_hstride(&devinfo, last_inst,
> > > > > > > BRW_HORIZONTAL_STRIDE_2);
> > > > > > > +
> > > > > > > +   EXPECT_TRUE(validate(p));
> > > > > > > +   clear_instructions(p);
> > > > > > > +}
> > > > > > > +
> > > > > > >  TEST_P(validation_test,
> > > > > > > vector_immediate_destination_alignment)
> > > > > > >  {
> > > > > > >     static const struct {
> > > > > > > -- 
> > > > > > > 2.17.1
> > > > > > > 
> > > > > > > _______________________________________________
> > > > > > > mesa-dev mailing list
> > > > > > > mesa-dev@lists.freedesktop.org
> > > > > > > https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Iago Toral <itoral@igalia.com> writes:

> On Thu, 2019-02-28 at 09:54 -0800, Francisco Jerez wrote:
>> Iago Toral <itoral@igalia.com> writes:
>> 
>> > On Wed, 2019-02-27 at 13:47 -0800, Francisco Jerez wrote:
>> > > Iago Toral <itoral@igalia.com> writes:
>> > > 
>> > > > On Tue, 2019-02-26 at 14:54 -0800, Francisco Jerez wrote:
>> > > > > Iago Toral Quiroga <itoral@igalia.com> writes:
>> > > > > 
>> > > > > > ---
>> > > > > >  src/intel/compiler/brw_eu_validate.c    |  64
>> > > > > > ++++++++++++-
>> > > > > >  src/intel/compiler/test_eu_validate.cpp | 122
>> > > > > > ++++++++++++++++++++++++
>> > > > > >  2 files changed, 185 insertions(+), 1 deletion(-)
>> > > > > > 
>> > > > > > diff --git a/src/intel/compiler/brw_eu_validate.c
>> > > > > > b/src/intel/compiler/brw_eu_validate.c
>> > > > > > index 000a05cb6ac..203641fecb9 100644
>> > > > > > --- a/src/intel/compiler/brw_eu_validate.c
>> > > > > > +++ b/src/intel/compiler/brw_eu_validate.c
>> > > > > > @@ -531,7 +531,69 @@
>> > > > > > general_restrictions_based_on_operand_types(const struct
>> > > > > > gen_device_info *devinf
>> > > > > >         exec_type_size == 8 && dst_type_size == 4)
>> > > > > >        dst_type_size = 8;
>> > > > > >  
>> > > > > > -   if (exec_type_size > dst_type_size) {
>> > > > > > +   /* From the BDW+ PRM:
>> > > > > > +    *
>> > > > > > +    *    "There is no direct conversion from HF to DF or
>> > > > > > DF to
>> > > > > > HF.
>> > > > > > +    *     There is no direct conversion from HF to Q/UQ or
>> > > > > > Q/UQ to
>> > > > > > HF."
>> > > > > > +    */
>> > > > > > +   enum brw_reg_type src0_type =
>> > > > > > brw_inst_src0_type(devinfo,
>> > > > > > inst);
>> > > > > > +   ERROR_IF(brw_inst_opcode(devinfo, inst) ==
>> > > > > > BRW_OPCODE_MOV
>> > > > > > &&
>> > > > > 
>> > > > > Why is only the MOV instruction handled here and
>> > > > > below?  Aren't
>> > > > > other
>> > > > > instructions able to do implicit conversions?  Probably means
>> > > > > you
>> > > > > need
>> > > > > to deal with two sources rather than one.
>> > > > 
>> > > > This comes from the programming notes of the MOV instruction
>> > > > (Volume
>> > > > 2a, Command Reference - Instructions - MOV), so it is described
>> > > > specifically for the MOV instruction. I should probably have
>> > > > made
>> > > > this
>> > > > clear in the comment.
>> > > > 
>> > > 
>> > > Maybe the one above is specified in the MOV page only, probably
>> > > due
>> > > to
>> > > an oversight (If these restrictions were really specific to the
>> > > MOV
>> > > instruction, what would prevent you from implementing such
>> > > conversions
>> > > through a different instruction?  E.g. "ADD dst:df, src:hf, 0"
>> > > which
>> > > would be substantially more efficient than what you're doing in
>> > > PATCH
>> > > 02)
>> > 
>> > Instructions that take HF can only be strictly HF or mix F and HF
>> > (mixed mode float), with MOV being the only exception. That means
>> > that
>> > any instruction like the one you use above are invalid. Maybe we
>> > should
>> > validate explicitly that instructions that use HF are strictly HF
>> > or
>> > mixed-float mode only?
>> > 
>> 
>> So you're acknowledging that the conversions checked above are
>> illegal
>> whether the instruction is a MOV or something else.  Where are we
>> preventing instructions other than MOV with such conversions from
>> being
>> accepted by the validator?
>
> We aren't, because the validator is not checking, in general, for
> accepted type combinations for specific instructions anywhere as far as
> I can see.

Luckily these type conversion restrictions aren't really specific to any
instruction AFAICT, even though they only seem to be mentioned
explicitly for the MOV instruction...

> If we want to start doing this with HF conversions specifically, I
> guess it is fine, but in case it is not clear, I think it won't bring
> any immediate benefits with the VK_KHR_shader_float16_int8
> implementation since this series only ever emits conversions involving
> HF operands via MOV instructions,

Yes, I can see that's the intention of this series, but how can you make
sure that nothing in the optimizer is breaking your assumption if you
don't add some validator code to verify the claim of your last
paragraph?

> which is why I thought that validating that no direct MOV conversions
> from DF/Q types ever happen was useful, since we have code in the
> driver to handle this scenario.
>
>[...]
>> 
>> I still don't understand why you want to implement the same
>> restriction
>> twice, once for MOV and once for all other mixed-mode
>> instructions.  How
>> is that more convenient?
>
> The restrictions based on operand types that are checked in the
> validator are specific to Byte or cases where the execution type is
> larger than the destination stride, for which mixed float has a
> different set of restrictions.
>
> For example, for mixed float we have this rule:
>
> "In Align1, destination stride can be smaller than execution type"
>
> Which overrides this other from "General restrictions based on operand
> types":
>
> "Destination stride must be equal to the ratio of the sizes of the
> execution data type to the destination type"
>
> So I thought that it would make things easier to keep all restrictions
> for mixed float separate and make sure that we skipped mixed float
> instructions in general_restrictions_based_on_operand_types() so we
> avoid having to add code to skip individual general restrictions that that are overriden for mixed float mode anyway.
>

I'm fine with that, but it doesn't seem like what this patch is doing...
Isn't it adding code to validate mixed-mode float MOV instructions in
general_restrictions_based_on_operand_types()?

> Anyway, I'll try to rework the patches to do more generic validation of
> HF conversions like you ask and see what kind of code we end up with.
>

Thanks.

>[...]
Iago Toral <itoral@igalia.com> writes:

> On Fri, 2019-03-01 at 09:39 +0100, Iago Toral wrote:
>> On Thu, 2019-02-28 at 09:54 -0800, Francisco Jerez wrote:
>> > Iago Toral <itoral@igalia.com> writes:
>> > 
>> > > On Wed, 2019-02-27 at 13:47 -0800, Francisco Jerez wrote:
>> > > > Iago Toral <itoral@igalia.com> writes:
>> > > > 
>> > > > > On Tue, 2019-02-26 at 14:54 -0800, Francisco Jerez wrote:
>> > > > > > Iago Toral Quiroga <itoral@igalia.com> writes:
>> > > > > > 
>> > > > > > > ---
>> > > > > > >  src/intel/compiler/brw_eu_validate.c    |  64
>> > > > > > > ++++++++++++-
>> > > > > > >  src/intel/compiler/test_eu_validate.cpp | 122
>> > > > > > > ++++++++++++++++++++++++
>> > > > > > >  2 files changed, 185 insertions(+), 1 deletion(-)
>> > > > > > > 
>> > > > > > > diff --git a/src/intel/compiler/brw_eu_validate.c
>> > > > > > > b/src/intel/compiler/brw_eu_validate.c
>> > > > > > > index 000a05cb6ac..203641fecb9 100644
>> > > > > > > --- a/src/intel/compiler/brw_eu_validate.c
>> > > > > > > +++ b/src/intel/compiler/brw_eu_validate.c
>> > > > > > > @@ -531,7 +531,69 @@
>> > > > > > > general_restrictions_based_on_operand_types(const struct
>> > > > > > > gen_device_info *devinf
>> > > > > > >         exec_type_size == 8 && dst_type_size == 4)
>> > > > > > >        dst_type_size = 8;
>> > > > > > >  
>> > > > > > > -   if (exec_type_size > dst_type_size) {
>> > > > > > > +   /* From the BDW+ PRM:
>> > > > > > > +    *
>> > > > > > > +    *    "There is no direct conversion from HF to DF or
>> > > > > > > DF to
>> > > > > > > HF.
>> > > > > > > +    *     There is no direct conversion from HF to Q/UQ
>> > > > > > > or
>> > > > > > > Q/UQ to
>> > > > > > > HF."
>> > > > > > > +    */
>> > > > > > > +   enum brw_reg_type src0_type =
>> > > > > > > brw_inst_src0_type(devinfo,
>> > > > > > > inst);
>> > > > > > > +   ERROR_IF(brw_inst_opcode(devinfo, inst) ==
>> > > > > > > BRW_OPCODE_MOV
>> > > > > > > &&
>> > > > > > 
>> > > > > > Why is only the MOV instruction handled here and
>> > > > > > below?  Aren't
>> > > > > > other
>> > > > > > instructions able to do implicit conversions?  Probably
>> > > > > > means
>> > > > > > you
>> > > > > > need
>> > > > > > to deal with two sources rather than one.
>> > > > > 
>> > > > > This comes from the programming notes of the MOV instruction
>> > > > > (Volume
>> > > > > 2a, Command Reference - Instructions - MOV), so it is
>> > > > > described
>> > > > > specifically for the MOV instruction. I should probably have
>> > > > > made
>> > > > > this
>> > > > > clear in the comment.
>> > > > > 
>> > > > 
>> > > > Maybe the one above is specified in the MOV page only, probably
>> > > > due
>> > > > to
>> > > > an oversight (If these restrictions were really specific to the
>> > > > MOV
>> > > > instruction, what would prevent you from implementing such
>> > > > conversions
>> > > > through a different instruction?  E.g. "ADD dst:df, src:hf, 0"
>> > > > which
>> > > > would be substantially more efficient than what you're doing in
>> > > > PATCH
>> > > > 02)
>> > > 
>> > > Instructions that take HF can only be strictly HF or mix F and HF
>> > > (mixed mode float), with MOV being the only exception. That means
>> > > that
>> > > any instruction like the one you use above are invalid. Maybe we
>> > > should
>> > > validate explicitly that instructions that use HF are strictly HF
>> > > or
>> > > mixed-float mode only?
>> > > 
>> > 
>> > So you're acknowledging that the conversions checked above are
>> > illegal
>> > whether the instruction is a MOV or something else.  Where are we
>> > preventing instructions other than MOV with such conversions from
>> > being
>> > accepted by the validator?
>> 
>> We aren't, because the validator is not checking, in general, for
>> accepted type combinations for specific instructions anywhere as far
>> as
>> I can see. If we want to start doing this with HF conversions
>> specifically, I guess it is fine, but in case it is not clear, I
>> think
>> it won't bring any immediate benefits with the
>> VK_KHR_shader_float16_int8 implementation since this series only ever
>> emits conversions involving HF operands via MOV instructions, which
>> is
>> why I thought that validating that no direct MOV conversions from
>> DF/Q
>> types ever happen was useful, since we have code in the driver to
>> handle this scenario.
>> 
>> > > >  but I see other restriction checks in this patch which are
>> > > > certainly specified in the generic regioning restrictions page
>> > > > and
>> > > > you're limiting to the MOV instruction...
>> > > 
>> > > There are two rules below:
>> > > 
>> > > 1. The one about conversions between integer and half-float.
>> > > Again,
>> > > these can only happen through MOV for the same reasons, so I
>> > > think
>> > > this
>> > > one should be fine.
>> > > 
>> > 
>> > Why do you think that can only happen through a MOV
>> > instruction?  The
>> > hardware spec lists the following as a valid example in the
>> > register
>> > region restrictions page:
>> > 
>> > > add (8) r10.0<2>:hf r11.0<8;8,1>:w r12.0<8;8,1>:w
>> 
>> I thought these instructions were not valid. I looked again at the
>> BDW
>> PRM and indeed, it lists that HF destination for ADD is only allowed
>> with HF source. The same for all other instructions that can take HF
>> except (S)MOV. However, I have just now realized that the SKL+ PRM is
>> different and it allows ADD to do this, but as far as I can see, ADD
>> is
>> the only exception: you cannot do this with MUL, MAD or MATH for
>> example. I am not sure what to make of this, but I guess that if we
>> want to validate this it doesn't hurt to be safe and assume that we
>> can
>> have more instructions doing implicit conversions.
>> 
>> If course, there is the fact that we never emit these kind of
>> conversions anyway, at least not with this series (we don't even use
>> mixed-float mode) but I guess it is fine to add that validation if
>> only
>> to be safe in the future if we ever decide to start using the extra
>> flexibility allowed by the hardware.
>
> I am just now realizing that in Align16 mode, we always expect
> destination horizontal stride to be exactly 1, no matter the type. This
> means that Align16 mixed float instructions with HF destination
> (implicit conversion from F->HF) would also require this (actually the
> docs make this requirement explicit for mixed float instructions
> anyway). This would make it impossible to satisfy the rule for DWord
> alignment on F->HF conversions for implicit conversions in mixed float
> mode instructions. I guess these conversion restrictions are formulated
> specifically for Align1 and should be skipped for Align16, even if that
> is not explicitly stated in the docs. Does that make sense?
>

Yes, that's what I would assume.  The hardware spec is seriously
contradictory indeed...

>> 
>> > > 2. The one about word destinations (of which we are only really
>> > > implementing conversions from F->HF). Here the rule is more
>> > > generic
>> > > and
>> > > I agree that expanding this to include any other mixed float mode
>> > > instruction would make sense. However, validation for mixed float
>> > > mode
>> > > has its own set rules, some of which are incompatible with the
>> > > general
>> > > region restrictions being validated here, so I think it is
>> > > inconvenient
>> > > to try and do that validation  here (see patch 36 and then patch
>> > > 37).
>> > > What I would propose, if you agree, is that we only implement
>> > > this
>> > > for
>> > > MOV here, and then for mixed float mode instructions, we
>> > > implement
>> > > the
>> > > more generic version of this check (that would then go in patch
>> > > 37).
>> > > How does that sound?
>> > > 
>> > 
>> > I still don't understand why you want to implement the same
>> > restriction
>> > twice, once for MOV and once for all other mixed-mode
>> > instructions.  How
>> > is that more convenient?
>> 
>> The restrictions based on operand types that are checked in the
>> validator are specific to Byte or cases where the execution type is
>> larger than the destination stride, for which mixed float has a
>> different set of restrictions.
>> 
>> For example, for mixed float we have this rule:
>> 
>> "In Align1, destination stride can be smaller than execution type"
>> 
>> Which overrides this other from "General restrictions based on
>> operand
>> types":
>> 
>> "Destination stride must be equal to the ratio of the sizes of the
>> execution data type to the destination type"
>> 
>> So I thought that it would make things easier to keep all
>> restrictions
>> for mixed float separate and make sure that we skipped mixed float
>> instructions in general_restrictions_based_on_operand_types() so we
>> avoid having to add code to skip individual general restrictions that
>> that are overriden for mixed float mode anyway.
>> 
>> Anyway, I'll try to rework the patches to do more generic validation
>> of
>> HF conversions like you ask and see what kind of code we end up with.
>> 
>> > > > > > > +            ((dst_type == BRW_REGISTER_TYPE_HF &&
>> > > > > > > type_sz(src0_type) == 8) ||
>> > > > > > > +             (dst_type_size == 8 && src0_type ==
>> > > > > > > BRW_REGISTER_TYPE_HF)),
>> > > > > > > +            "There are no direct conversion between 64-
>> > > > > > > bit
>> > > > > > > types
>> > > > > > > and HF");
>> > > > > > > +
>> > > > > > > +   /* From the BDW+ PRM:
>> > > > > > > +    *
>> > > > > > > +    *   "Conversion between Integer and HF (Half Float)
>> > > > > > > must
>> > > > > > > be
>> > > > > > > +    *    DWord-aligned and strided by a DWord on the
>> > > > > > > destination."
>> > > > > > > +    *
>> > > > > > > +    * But this seems to be expanded on CHV and SKL+ by:
>> > > > > > > +    *
>> > > > > > > +    *   "There is a relaxed alignment rule for word
>> > > > > > > destinations.
>> > > > > > > When
>> > > > > > > +    *    the destination type is word (UW, W, HF),
>> > > > > > > destination
>> > > > > > > data types
>> > > > > > > +    *    can be aligned to either the lowest word or the
>> > > > > > > second
>> > > > > > > lowest
>> > > > > > > +    *    word of the execution channel. This means the
>> > > > > > > destination
>> > > > > > > data
>> > > > > > > +    *    words can be either all in the even word
>> > > > > > > locations or
>> > > > > > > all
>> > > > > > > in the
>> > > > > > > +    *    odd word locations."
>> > > > > > > +    *
>> > > > > > > +    * We do not implement the second rule as is though,
>> > > > > > > since
>> > > > > > > empirical testing
>> > > > > > > +    * shows inconsistencies:
>> > > > > > > +    *   - It suggests that packed 16-bit is not allowed,
>> > > > > > > which
>> > > > > > > is
>> > > > > > > not true.
>> > > > > > > +    *   - It suggests that conversions from Q/DF to W
>> > > > > > > (which
>> > > > > > > need
>> > > > > > > to be 64-bit
>> > > > > > > +    *     aligned on the destination) are not possible,
>> > > > > > > which
>> > > > > > > is
>> > > > > > > not true.
>> > > > > > > +    *   - It suggests that conversions from 16-bit
>> > > > > > > executions
>> > > > > > > types to W need
>> > > > > > > +    *     to be 32-bit aligned, which doesn't seem to be
>> > > > > > > necessary.
>> > > > > > > +    *
>> > > > > > > +    * So from this rule we only validate the implication
>> > > > > > > that
>> > > > > > > conversion from
>> > > > > > > +    * F to HF needs to be DWord aligned too (in BDW this
>> > > > > > > is
>> > > > > > > limited to
>> > > > > > > +    * conversions from integer types).
>> > > > > > > +    */
>> > > > > > > +   bool is_half_float_conversion =
>> > > > > > > +       brw_inst_opcode(devinfo, inst) == BRW_OPCODE_MOV
>> > > > > > > &&
>> > > > > > > +       dst_type != src0_type &&
>> > > > > > > +       (dst_type == BRW_REGISTER_TYPE_HF || src0_type ==
>> > > > > > > BRW_REGISTER_TYPE_HF);
>> > > > > > > +
>> > > > > > > +   if (is_half_float_conversion) {
>> > > > > > > +      assert(devinfo->gen >= 8);
>> > > > > > > +
>> > > > > > > +      if ((dst_type == BRW_REGISTER_TYPE_HF &&
>> > > > > > > brw_reg_type_is_integer(src0_type)) ||
>> > > > > > > +          (brw_reg_type_is_integer(dst_type) &&
>> > > > > > > src0_type
>> > > > > > > ==
>> > > > > > > BRW_REGISTER_TYPE_HF)) {
>> > > > > > > +         ERROR_IF(dst_stride * dst_type_size != 4,
>> > > > > > > +                  "Conversions between integer and half-
>> > > > > > > float
>> > > > > > > must
>> > > > > > > be strided "
>> > > > > > > +                  "by a DWord on the destination");
>> > > > > > > +
>> > > > > > > +         unsigned subreg =
>> > > > > > > brw_inst_dst_da1_subreg_nr(devinfo,
>> > > > > > > inst);
>> > > > > > > +         ERROR_IF(subreg % 4 != 0,
>> > > > > > > +                  "Conversions between integer and half-
>> > > > > > > float
>> > > > > > > must
>> > > > > > > be aligned "
>> > > > > > > +                  "to a DWord on the destination");
>> > > > > > > +      } else if ((devinfo->is_cherryview || devinfo->gen
>> > > > > > > > = 9)
>> > > > > > > 
>> > > > > > > &&
>> > > > > > > +                 dst_type == BRW_REGISTER_TYPE_HF) {
>> > > > > > > +         ERROR_IF(dst_stride != 2,
>> > > > > > > +                  "Conversions to HF must have either
>> > > > > > > all
>> > > > > > > words in
>> > > > > > > even word "
>> > > > > > > +                  "locations or all words in odd word
>> > > > > > > locations");
>> > > > > > > +      }
>> > > > > > > +
>> > > > > > > +   } else if (exec_type_size > dst_type_size) {
>> > > > > > >        if (!(dst_type_is_byte &&
>> > > > > > > inst_is_raw_move(devinfo,
>> > > > > > > inst)))
>> > > > > > > {
>> > > > > > >           ERROR_IF(dst_stride * dst_type_size !=
>> > > > > > > exec_type_size,
>> > > > > > >                    "Destination stride must be equal to
>> > > > > > > the
>> > > > > > > ratio
>> > > > > > > of the sizes "
>> > > > > > > diff --git a/src/intel/compiler/test_eu_validate.cpp
>> > > > > > > b/src/intel/compiler/test_eu_validate.cpp
>> > > > > > > index 73300b23122..1557b6d2452 100644
>> > > > > > > --- a/src/intel/compiler/test_eu_validate.cpp
>> > > > > > > +++ b/src/intel/compiler/test_eu_validate.cpp
>> > > > > > > @@ -848,6 +848,128 @@ TEST_P(validation_test,
>> > > > > > > byte_destination_relaxed_alignment)
>> > > > > > >     }
>> > > > > > >  }
>> > > > > > >  
>> > > > > > > +TEST_P(validation_test, half_float_conversion)
>> > > > > > > +{
>> > > > > > > +   static const struct {
>> > > > > > > +      enum brw_reg_type dst_type;
>> > > > > > > +      enum brw_reg_type src_type;
>> > > > > > > +      unsigned dst_stride;
>> > > > > > > +      unsigned dst_subnr;
>> > > > > > > +      bool expected_result;
>> > > > > > > +   } inst[] = {
>> > > > > > > +#define INST(dst_type, src_type, dst_stride, dst_subnr,
>> > > > > > > expected_result)  \
>> > > > > > > +      {                                                 
>> > > > > > >   
>> > > > > > >     
>> > > > > > >     
>> > > > > > >         \
>> > > > > > > +         BRW_REGISTER_TYPE_##dst_type,                  
>> > > > > > >   
>> > > > > > >     
>> > > > > > >     
>> > > > > > >         \
>> > > > > > > +         BRW_REGISTER_TYPE_##src_type,                  
>> > > > > > >   
>> > > > > > >     
>> > > > > > >     
>> > > > > > >         \
>> > > > > > > +         BRW_HORIZONTAL_STRIDE_##dst_stride,            
>> > > > > > >   
>> > > > > > >     
>> > > > > > >     
>> > > > > > >         \
>> > > > > > > +         dst_subnr,                                     
>> > > > > > >   
>> > > > > > >     
>> > > > > > >     
>> > > > > > >         \
>> > > > > > > +         expected_result,                               
>> > > > > > >   
>> > > > > > >     
>> > > > > > >     
>> > > > > > >         \
>> > > > > > > +      }
>> > > > > > > +
>> > > > > > > +      /* MOV to half-float destination (F source handled
>> > > > > > > later) */
>> > > > > > > +      INST(HF,  B, 1, 0, false),
>> > > > > > > +      INST(HF,  W, 1, 0, false),
>> > > > > > > +      INST(HF, HF, 1, 0, true),
>> > > > > > > +      INST(HF, HF, 1, 2, true),
>> > > > > > > +      INST(HF,  D, 1, 0, false),
>> > > > > > > +      INST(HF,  Q, 1, 0, false),
>> > > > > > > +      INST(HF, DF, 1, 0, false),
>> > > > > > > +      INST(HF,  B, 2, 0, true),
>> > > > > > > +      INST(HF,  B, 2, 2, false),
>> > > > > > > +      INST(HF,  W, 2, 0, true),
>> > > > > > > +      INST(HF,  W, 2, 2, false),
>> > > > > > > +      INST(HF, HF, 2, 0, true),
>> > > > > > > +      INST(HF, HF, 2, 2, true),
>> > > > > > > +      INST(HF,  D, 2, 0, true),
>> > > > > > > +      INST(HF,  D, 2, 2, false),
>> > > > > > > +      INST(HF,  Q, 2, 0, false),
>> > > > > > > +      INST(HF, DF, 2, 0, false),
>> > > > > > > +      INST(HF,  B, 4, 0, false),
>> > > > > > > +      INST(HF,  W, 4, 0, false),
>> > > > > > > +      INST(HF, HF, 4, 0, true),
>> > > > > > > +      INST(HF, HF, 4, 2, true),
>> > > > > > > +      INST(HF,  D, 4, 0, false),
>> > > > > > > +      INST(HF,  Q, 4, 0, false),
>> > > > > > > +      INST(HF, DF, 4, 0, false),
>> > > > > > > +
>> > > > > > > +      /* MOV from half-float source */
>> > > > > > > +      INST( B, HF, 1, 0, false),
>> > > > > > > +      INST( W, HF, 1, 0, false),
>> > > > > > > +      INST( D, HF, 1, 0, true),
>> > > > > > > +      INST( D, HF, 1, 4, true),
>> > > > > > > +      INST( F, HF, 1, 0, true),
>> > > > > > > +      INST( F, HF, 1, 4, true),
>> > > > > > > +      INST( Q, HF, 1, 0, false),
>> > > > > > > +      INST(DF, HF, 1, 0, false),
>> > > > > > > +      INST( B, HF, 2, 0, false),
>> > > > > > > +      INST( W, HF, 2, 0, true),
>> > > > > > > +      INST( W, HF, 2, 2, false),
>> > > > > > > +      INST( D, HF, 2, 0, false),
>> > > > > > > +      INST( F, HF, 2, 0, true),
>> > > > > > > +      INST( F, HF, 2, 2, true),
>> > > > > > > +      INST( B, HF, 4, 0, true),
>> > > > > > > +      INST( B, HF, 4, 1, false),
>> > > > > > > +      INST( W, HF, 4, 0, false),
>> > > > > > > +
>> > > > > > > +#undef INST
>> > > > > > > +   };
>> > > > > > > +
>> > > > > > > +   if (devinfo.gen < 8)
>> > > > > > > +      return;
>> > > > > > > +
>> > > > > > > +   for (unsigned i = 0; i < sizeof(inst) /
>> > > > > > > sizeof(inst[0]);
>> > > > > > > i++) {
>> > > > > > > +      if (!devinfo.has_64bit_types &&
>> > > > > > > +          (type_sz(inst[i].src_type) == 8 ||
>> > > > > > > type_sz(inst[i].dst_type) == 8)) {
>> > > > > > > +         continue;
>> > > > > > > +      }
>> > > > > > > +
>> > > > > > > +      brw_MOV(p, retype(g0, inst[i].dst_type),
>> > > > > > > retype(g0,
>> > > > > > > inst[i].src_type));
>> > > > > > > +
>> > > > > > > +      brw_inst_set_exec_size(&devinfo, last_inst,
>> > > > > > > BRW_EXECUTE_4);
>> > > > > > > +
>> > > > > > > +      brw_inst_set_dst_hstride(&devinfo, last_inst,
>> > > > > > > inst[i].dst_stride);
>> > > > > > > +      brw_inst_set_dst_da1_subreg_nr(&devinfo,
>> > > > > > > last_inst,
>> > > > > > > inst[i].dst_subnr);
>> > > > > > > +
>> > > > > > > +      if (inst[i].src_type == BRW_REGISTER_TYPE_B) {
>> > > > > > > +         brw_inst_set_src0_vstride(&devinfo, last_inst,
>> > > > > > > BRW_VERTICAL_STRIDE_4);
>> > > > > > > +         brw_inst_set_src0_width(&devinfo, last_inst,
>> > > > > > > BRW_WIDTH_2);
>> > > > > > > +         brw_inst_set_src0_hstride(&devinfo, last_inst,
>> > > > > > > BRW_HORIZONTAL_STRIDE_2);
>> > > > > > > +      } else {
>> > > > > > > +         brw_inst_set_src0_vstride(&devinfo, last_inst,
>> > > > > > > BRW_VERTICAL_STRIDE_4);
>> > > > > > > +         brw_inst_set_src0_width(&devinfo, last_inst,
>> > > > > > > BRW_WIDTH_4);
>> > > > > > > +         brw_inst_set_src0_hstride(&devinfo, last_inst,
>> > > > > > > BRW_HORIZONTAL_STRIDE_1);
>> > > > > > > +      }
>> > > > > > > +
>> > > > > > > +      EXPECT_EQ(inst[i].expected_result, validate(p));
>> > > > > > > +
>> > > > > > > +      clear_instructions(p);
>> > > > > > > +   }
>> > > > > > > +
>> > > > > > > +   /* Conversion from F to HF has Dword destination
>> > > > > > > alignment
>> > > > > > > restriction
>> > > > > > > +    * on CHV and SKL+ only
>> > > > > > > +    */
>> > > > > > > +   brw_MOV(p, retype(g0, BRW_REGISTER_TYPE_HF),
>> > > > > > > +              retype(g0, BRW_REGISTER_TYPE_F));
>> > > > > > > +
>> > > > > > > +   brw_inst_set_dst_hstride(&devinfo, last_inst,
>> > > > > > > BRW_HORIZONTAL_STRIDE_1);
>> > > > > > > +
>> > > > > > > +   if (devinfo.gen >= 9 || devinfo.is_cherryview) {
>> > > > > > > +      EXPECT_FALSE(validate(p));
>> > > > > > > +   } else {
>> > > > > > > +      assert(devinfo.gen == 8);
>> > > > > > > +      EXPECT_TRUE(validate(p));
>> > > > > > > +   }
>> > > > > > > +   clear_instructions(p);
>> > > > > > > +
>> > > > > > > +   brw_MOV(p, retype(g0, BRW_REGISTER_TYPE_HF),
>> > > > > > > +              retype(g0, BRW_REGISTER_TYPE_F));
>> > > > > > > +
>> > > > > > > +   brw_inst_set_dst_hstride(&devinfo, last_inst,
>> > > > > > > BRW_HORIZONTAL_STRIDE_2);
>> > > > > > > +
>> > > > > > > +   EXPECT_TRUE(validate(p));
>> > > > > > > +   clear_instructions(p);
>> > > > > > > +}
>> > > > > > > +
>> > > > > > >  TEST_P(validation_test,
>> > > > > > > vector_immediate_destination_alignment)
>> > > > > > >  {
>> > > > > > >     static const struct {
>> > > > > > > -- 
>> > > > > > > 2.17.1
>> > > > > > > 
>> > > > > > > _______________________________________________
>> > > > > > > mesa-dev mailing list
>> > > > > > > mesa-dev@lists.freedesktop.org
>> > > > > > > https://lists.freedesktop.org/mailman/listinfo/mesa-dev
On Fri, 2019-03-01 at 19:04 -0800, Francisco Jerez wrote:
> Iago Toral <itoral@igalia.com> writes:
> 
> > On Thu, 2019-02-28 at 09:54 -0800, Francisco Jerez wrote:
> > > Iago Toral <itoral@igalia.com> writes:
> > > 
> > > > On Wed, 2019-02-27 at 13:47 -0800, Francisco Jerez wrote:
> > > > > Iago Toral <itoral@igalia.com> writes:
> > > > > 
> > > > > > On Tue, 2019-02-26 at 14:54 -0800, Francisco Jerez wrote:
> > > > > > > Iago Toral Quiroga <itoral@igalia.com> writes:
> > > > > > > 
> > > > > > > > ---
> > > > > > > >  src/intel/compiler/brw_eu_validate.c    |  64
> > > > > > > > ++++++++++++-
> > > > > > > >  src/intel/compiler/test_eu_validate.cpp | 122
> > > > > > > > ++++++++++++++++++++++++
> > > > > > > >  2 files changed, 185 insertions(+), 1 deletion(-)
> > > > > > > > 
> > > > > > > > diff --git a/src/intel/compiler/brw_eu_validate.c
> > > > > > > > b/src/intel/compiler/brw_eu_validate.c
> > > > > > > > index 000a05cb6ac..203641fecb9 100644
> > > > > > > > --- a/src/intel/compiler/brw_eu_validate.c
> > > > > > > > +++ b/src/intel/compiler/brw_eu_validate.c
> > > > > > > > @@ -531,7 +531,69 @@
> > > > > > > > general_restrictions_based_on_operand_types(const
> > > > > > > > struct
> > > > > > > > gen_device_info *devinf
> > > > > > > >         exec_type_size == 8 && dst_type_size == 4)
> > > > > > > >        dst_type_size = 8;
> > > > > > > >  
> > > > > > > > -   if (exec_type_size > dst_type_size) {
> > > > > > > > +   /* From the BDW+ PRM:
> > > > > > > > +    *
> > > > > > > > +    *    "There is no direct conversion from HF to DF
> > > > > > > > or
> > > > > > > > DF to
> > > > > > > > HF.
> > > > > > > > +    *     There is no direct conversion from HF to
> > > > > > > > Q/UQ or
> > > > > > > > Q/UQ to
> > > > > > > > HF."
> > > > > > > > +    */
> > > > > > > > +   enum brw_reg_type src0_type =
> > > > > > > > brw_inst_src0_type(devinfo,
> > > > > > > > inst);
> > > > > > > > +   ERROR_IF(brw_inst_opcode(devinfo, inst) ==
> > > > > > > > BRW_OPCODE_MOV
> > > > > > > > &&
> > > > > > > 
> > > > > > > Why is only the MOV instruction handled here and
> > > > > > > below?  Aren't
> > > > > > > other
> > > > > > > instructions able to do implicit conversions?  Probably
> > > > > > > means
> > > > > > > you
> > > > > > > need
> > > > > > > to deal with two sources rather than one.
> > > > > > 
> > > > > > This comes from the programming notes of the MOV
> > > > > > instruction
> > > > > > (Volume
> > > > > > 2a, Command Reference - Instructions - MOV), so it is
> > > > > > described
> > > > > > specifically for the MOV instruction. I should probably
> > > > > > have
> > > > > > made
> > > > > > this
> > > > > > clear in the comment.
> > > > > > 
> > > > > 
> > > > > Maybe the one above is specified in the MOV page only,
> > > > > probably
> > > > > due
> > > > > to
> > > > > an oversight (If these restrictions were really specific to
> > > > > the
> > > > > MOV
> > > > > instruction, what would prevent you from implementing such
> > > > > conversions
> > > > > through a different instruction?  E.g. "ADD dst:df, src:hf,
> > > > > 0"
> > > > > which
> > > > > would be substantially more efficient than what you're doing
> > > > > in
> > > > > PATCH
> > > > > 02)
> > > > 
> > > > Instructions that take HF can only be strictly HF or mix F and
> > > > HF
> > > > (mixed mode float), with MOV being the only exception. That
> > > > means
> > > > that
> > > > any instruction like the one you use above are invalid. Maybe
> > > > we
> > > > should
> > > > validate explicitly that instructions that use HF are strictly
> > > > HF
> > > > or
> > > > mixed-float mode only?
> > > > 
> > > 
> > > So you're acknowledging that the conversions checked above are
> > > illegal
> > > whether the instruction is a MOV or something else.  Where are we
> > > preventing instructions other than MOV with such conversions from
> > > being
> > > accepted by the validator?
> > 
> > We aren't, because the validator is not checking, in general, for
> > accepted type combinations for specific instructions anywhere as
> > far as
> > I can see.
> 
> Luckily these type conversion restrictions aren't really specific to
> any
> instruction AFAICT, even though they only seem to be mentioned
> explicitly for the MOV instruction...
> 
> > If we want to start doing this with HF conversions specifically, I
> > guess it is fine, but in case it is not clear, I think it won't
> > bring
> > any immediate benefits with the VK_KHR_shader_float16_int8
> > implementation since this series only ever emits conversions
> > involving
> > HF operands via MOV instructions,
> 
> Yes, I can see that's the intention of this series, but how can you
> make
> sure that nothing in the optimizer is breaking your assumption if you
> don't add some validator code to verify the claim of your last
> paragraph?
> 
> > which is why I thought that validating that no direct MOV
> > conversions
> > from DF/Q types ever happen was useful, since we have code in the
> > driver to handle this scenario.
> > 
> > [...]
> > > 
> > > I still don't understand why you want to implement the same
> > > restriction
> > > twice, once for MOV and once for all other mixed-mode
> > > instructions.  How
> > > is that more convenient?
> > 
> > The restrictions based on operand types that are checked in the
> > validator are specific to Byte or cases where the execution type is
> > larger than the destination stride, for which mixed float has a
> > different set of restrictions.
> > 
> > For example, for mixed float we have this rule:
> > 
> > "In Align1, destination stride can be smaller than execution type"
> > 
> > Which overrides this other from "General restrictions based on
> > operand
> > types":
> > 
> > "Destination stride must be equal to the ratio of the sizes of the
> > execution data type to the destination type"
> > 
> > So I thought that it would make things easier to keep all
> > restrictions
> > for mixed float separate and make sure that we skipped mixed float
> > instructions in general_restrictions_based_on_operand_types() so we
> > avoid having to add code to skip individual general restrictions
> > that that are overriden for mixed float mode anyway.
> > 
> 
> I'm fine with that, but it doesn't seem like what this patch is
> doing...
> Isn't it adding code to validate mixed-mode float MOV instructions in
> general_restrictions_based_on_operand_types()?

I guess this could be arguable, but I was not considering conversion
MOVs to be mixed-float instructions. There are two reasons for this:

A conversion MOV involving F/HF doesn't seem to be fundamentally
different from any other conversion instruction involving other types,
other than the requirement of aligning the destination to a Dword,
which is not a resriction explictly made for mixed-float mode.

Then, for mixed-float mode, there is this other rule:

"In Align1, destination stride can be smaller than execution type. When
destination is stride of 1, 16 bit packed data
is updated on the destination. However, output packed f16 data must be
oword aligned, no oword crossing in
packed f16."

Which contradicts the other rule that conversions from F to HF need to
be DWord aligned on the destination.

So it seems to me that conversion MOVs are not following the same
principles of mixed-float instructions and we should skip validating
mixed-float restrictions for them. What do you think?

> > Anyway, I'll try to rework the patches to do more generic
> > validation of
> > HF conversions like you ask and see what kind of code we end up
> > with.
> > 
> 
> Thanks.
> 
> > [...]
Iago Toral <itoral@igalia.com> writes:

> On Fri, 2019-03-01 at 19:04 -0800, Francisco Jerez wrote:
>> Iago Toral <itoral@igalia.com> writes:
>> 
>> > On Thu, 2019-02-28 at 09:54 -0800, Francisco Jerez wrote:
>> > > Iago Toral <itoral@igalia.com> writes:
>> > > 
>> > > > On Wed, 2019-02-27 at 13:47 -0800, Francisco Jerez wrote:
>> > > > > Iago Toral <itoral@igalia.com> writes:
>> > > > > 
>> > > > > > On Tue, 2019-02-26 at 14:54 -0800, Francisco Jerez wrote:
>> > > > > > > Iago Toral Quiroga <itoral@igalia.com> writes:
>> > > > > > > 
>> > > > > > > > ---
>> > > > > > > >  src/intel/compiler/brw_eu_validate.c    |  64
>> > > > > > > > ++++++++++++-
>> > > > > > > >  src/intel/compiler/test_eu_validate.cpp | 122
>> > > > > > > > ++++++++++++++++++++++++
>> > > > > > > >  2 files changed, 185 insertions(+), 1 deletion(-)
>> > > > > > > > 
>> > > > > > > > diff --git a/src/intel/compiler/brw_eu_validate.c
>> > > > > > > > b/src/intel/compiler/brw_eu_validate.c
>> > > > > > > > index 000a05cb6ac..203641fecb9 100644
>> > > > > > > > --- a/src/intel/compiler/brw_eu_validate.c
>> > > > > > > > +++ b/src/intel/compiler/brw_eu_validate.c
>> > > > > > > > @@ -531,7 +531,69 @@
>> > > > > > > > general_restrictions_based_on_operand_types(const
>> > > > > > > > struct
>> > > > > > > > gen_device_info *devinf
>> > > > > > > >         exec_type_size == 8 && dst_type_size == 4)
>> > > > > > > >        dst_type_size = 8;
>> > > > > > > >  
>> > > > > > > > -   if (exec_type_size > dst_type_size) {
>> > > > > > > > +   /* From the BDW+ PRM:
>> > > > > > > > +    *
>> > > > > > > > +    *    "There is no direct conversion from HF to DF
>> > > > > > > > or
>> > > > > > > > DF to
>> > > > > > > > HF.
>> > > > > > > > +    *     There is no direct conversion from HF to
>> > > > > > > > Q/UQ or
>> > > > > > > > Q/UQ to
>> > > > > > > > HF."
>> > > > > > > > +    */
>> > > > > > > > +   enum brw_reg_type src0_type =
>> > > > > > > > brw_inst_src0_type(devinfo,
>> > > > > > > > inst);
>> > > > > > > > +   ERROR_IF(brw_inst_opcode(devinfo, inst) ==
>> > > > > > > > BRW_OPCODE_MOV
>> > > > > > > > &&
>> > > > > > > 
>> > > > > > > Why is only the MOV instruction handled here and
>> > > > > > > below?  Aren't
>> > > > > > > other
>> > > > > > > instructions able to do implicit conversions?  Probably
>> > > > > > > means
>> > > > > > > you
>> > > > > > > need
>> > > > > > > to deal with two sources rather than one.
>> > > > > > 
>> > > > > > This comes from the programming notes of the MOV
>> > > > > > instruction
>> > > > > > (Volume
>> > > > > > 2a, Command Reference - Instructions - MOV), so it is
>> > > > > > described
>> > > > > > specifically for the MOV instruction. I should probably
>> > > > > > have
>> > > > > > made
>> > > > > > this
>> > > > > > clear in the comment.
>> > > > > > 
>> > > > > 
>> > > > > Maybe the one above is specified in the MOV page only,
>> > > > > probably
>> > > > > due
>> > > > > to
>> > > > > an oversight (If these restrictions were really specific to
>> > > > > the
>> > > > > MOV
>> > > > > instruction, what would prevent you from implementing such
>> > > > > conversions
>> > > > > through a different instruction?  E.g. "ADD dst:df, src:hf,
>> > > > > 0"
>> > > > > which
>> > > > > would be substantially more efficient than what you're doing
>> > > > > in
>> > > > > PATCH
>> > > > > 02)
>> > > > 
>> > > > Instructions that take HF can only be strictly HF or mix F and
>> > > > HF
>> > > > (mixed mode float), with MOV being the only exception. That
>> > > > means
>> > > > that
>> > > > any instruction like the one you use above are invalid. Maybe
>> > > > we
>> > > > should
>> > > > validate explicitly that instructions that use HF are strictly
>> > > > HF
>> > > > or
>> > > > mixed-float mode only?
>> > > > 
>> > > 
>> > > So you're acknowledging that the conversions checked above are
>> > > illegal
>> > > whether the instruction is a MOV or something else.  Where are we
>> > > preventing instructions other than MOV with such conversions from
>> > > being
>> > > accepted by the validator?
>> > 
>> > We aren't, because the validator is not checking, in general, for
>> > accepted type combinations for specific instructions anywhere as
>> > far as
>> > I can see.
>> 
>> Luckily these type conversion restrictions aren't really specific to
>> any
>> instruction AFAICT, even though they only seem to be mentioned
>> explicitly for the MOV instruction...
>> 
>> > If we want to start doing this with HF conversions specifically, I
>> > guess it is fine, but in case it is not clear, I think it won't
>> > bring
>> > any immediate benefits with the VK_KHR_shader_float16_int8
>> > implementation since this series only ever emits conversions
>> > involving
>> > HF operands via MOV instructions,
>> 
>> Yes, I can see that's the intention of this series, but how can you
>> make
>> sure that nothing in the optimizer is breaking your assumption if you
>> don't add some validator code to verify the claim of your last
>> paragraph?
>> 
>> > which is why I thought that validating that no direct MOV
>> > conversions
>> > from DF/Q types ever happen was useful, since we have code in the
>> > driver to handle this scenario.
>> > 
>> > [...]
>> > > 
>> > > I still don't understand why you want to implement the same
>> > > restriction
>> > > twice, once for MOV and once for all other mixed-mode
>> > > instructions.  How
>> > > is that more convenient?
>> > 
>> > The restrictions based on operand types that are checked in the
>> > validator are specific to Byte or cases where the execution type is
>> > larger than the destination stride, for which mixed float has a
>> > different set of restrictions.
>> > 
>> > For example, for mixed float we have this rule:
>> > 
>> > "In Align1, destination stride can be smaller than execution type"
>> > 
>> > Which overrides this other from "General restrictions based on
>> > operand
>> > types":
>> > 
>> > "Destination stride must be equal to the ratio of the sizes of the
>> > execution data type to the destination type"
>> > 
>> > So I thought that it would make things easier to keep all
>> > restrictions
>> > for mixed float separate and make sure that we skipped mixed float
>> > instructions in general_restrictions_based_on_operand_types() so we
>> > avoid having to add code to skip individual general restrictions
>> > that that are overriden for mixed float mode anyway.
>> > 
>> 
>> I'm fine with that, but it doesn't seem like what this patch is
>> doing...
>> Isn't it adding code to validate mixed-mode float MOV instructions in
>> general_restrictions_based_on_operand_types()?
>
> I guess this could be arguable, but I was not considering conversion
> MOVs to be mixed-float instructions. There are two reasons for this:
>
> A conversion MOV involving F/HF doesn't seem to be fundamentally
> different from any other conversion instruction involving other types,
> other than the requirement of aligning the destination to a Dword,
> which is not a resriction explictly made for mixed-float mode.
>
> Then, for mixed-float mode, there is this other rule:
>
> "In Align1, destination stride can be smaller than execution type. When
> destination is stride of 1, 16 bit packed data
> is updated on the destination. However, output packed f16 data must be
> oword aligned, no oword crossing in
> packed f16."
>
> Which contradicts the other rule that conversions from F to HF need to
> be DWord aligned on the destination.
>
> So it seems to me that conversion MOVs are not following the same
> principles of mixed-float instructions and we should skip validating
> mixed-float restrictions for them. What do you think?
>

That all seems fairly ambiguous...  And the restriction on DWORD
alignment for conversions includes a mixed-mode ADD instruction among
the examples, so I'm skeptical that MOV could be truly fundamentally
different.

>> > Anyway, I'll try to rework the patches to do more generic
>> > validation of
>> > HF conversions like you ask and see what kind of code we end up
>> > with.
>> > 
>> 
>> Thanks.
>> 
>> > [...]
On Mon, 2019-03-04 at 15:36 -0800, Francisco Jerez wrote:
> Iago Toral <itoral@igalia.com> writes:
> 
> > On Fri, 2019-03-01 at 19:04 -0800, Francisco Jerez wrote:
> > > Iago Toral <itoral@igalia.com> writes:
> > > 
> > > > On Thu, 2019-02-28 at 09:54 -0800, Francisco Jerez wrote:
> > > > > Iago Toral <itoral@igalia.com> writes:
> > > > > 
> > > > > > On Wed, 2019-02-27 at 13:47 -0800, Francisco Jerez wrote:
> > > > > > > Iago Toral <itoral@igalia.com> writes:
> > > > > > > 
> > > > > > > > On Tue, 2019-02-26 at 14:54 -0800, Francisco Jerez
> > > > > > > > wrote:
> > > > > > > > > Iago Toral Quiroga <itoral@igalia.com> writes:
> > > > > > > > > 
> > > > > > > > > > ---
> > > > > > > > > >  src/intel/compiler/brw_eu_validate.c    |  64
> > > > > > > > > > ++++++++++++-
> > > > > > > > > >  src/intel/compiler/test_eu_validate.cpp | 122
> > > > > > > > > > ++++++++++++++++++++++++
> > > > > > > > > >  2 files changed, 185 insertions(+), 1 deletion(-)
> > > > > > > > > > 
> > > > > > > > > > diff --git a/src/intel/compiler/brw_eu_validate.c
> > > > > > > > > > b/src/intel/compiler/brw_eu_validate.c
> > > > > > > > > > index 000a05cb6ac..203641fecb9 100644
> > > > > > > > > > --- a/src/intel/compiler/brw_eu_validate.c
> > > > > > > > > > +++ b/src/intel/compiler/brw_eu_validate.c
> > > > > > > > > > @@ -531,7 +531,69 @@
> > > > > > > > > > general_restrictions_based_on_operand_types(const
> > > > > > > > > > struct
> > > > > > > > > > gen_device_info *devinf
> > > > > > > > > >         exec_type_size == 8 && dst_type_size == 4)
> > > > > > > > > >        dst_type_size = 8;
> > > > > > > > > >  
> > > > > > > > > > -   if (exec_type_size > dst_type_size) {
> > > > > > > > > > +   /* From the BDW+ PRM:
> > > > > > > > > > +    *
> > > > > > > > > > +    *    "There is no direct conversion from HF to
> > > > > > > > > > DF
> > > > > > > > > > or
> > > > > > > > > > DF to
> > > > > > > > > > HF.
> > > > > > > > > > +    *     There is no direct conversion from HF to
> > > > > > > > > > Q/UQ or
> > > > > > > > > > Q/UQ to
> > > > > > > > > > HF."
> > > > > > > > > > +    */
> > > > > > > > > > +   enum brw_reg_type src0_type =
> > > > > > > > > > brw_inst_src0_type(devinfo,
> > > > > > > > > > inst);
> > > > > > > > > > +   ERROR_IF(brw_inst_opcode(devinfo, inst) ==
> > > > > > > > > > BRW_OPCODE_MOV
> > > > > > > > > > &&
> > > > > > > > > 
> > > > > > > > > Why is only the MOV instruction handled here and
> > > > > > > > > below?  Aren't
> > > > > > > > > other
> > > > > > > > > instructions able to do implicit
> > > > > > > > > conversions?  Probably
> > > > > > > > > means
> > > > > > > > > you
> > > > > > > > > need
> > > > > > > > > to deal with two sources rather than one.
> > > > > > > > 
> > > > > > > > This comes from the programming notes of the MOV
> > > > > > > > instruction
> > > > > > > > (Volume
> > > > > > > > 2a, Command Reference - Instructions - MOV), so it is
> > > > > > > > described
> > > > > > > > specifically for the MOV instruction. I should probably
> > > > > > > > have
> > > > > > > > made
> > > > > > > > this
> > > > > > > > clear in the comment.
> > > > > > > > 
> > > > > > > 
> > > > > > > Maybe the one above is specified in the MOV page only,
> > > > > > > probably
> > > > > > > due
> > > > > > > to
> > > > > > > an oversight (If these restrictions were really specific
> > > > > > > to
> > > > > > > the
> > > > > > > MOV
> > > > > > > instruction, what would prevent you from implementing
> > > > > > > such
> > > > > > > conversions
> > > > > > > through a different instruction?  E.g. "ADD dst:df,
> > > > > > > src:hf,
> > > > > > > 0"
> > > > > > > which
> > > > > > > would be substantially more efficient than what you're
> > > > > > > doing
> > > > > > > in
> > > > > > > PATCH
> > > > > > > 02)
> > > > > > 
> > > > > > Instructions that take HF can only be strictly HF or mix F
> > > > > > and
> > > > > > HF
> > > > > > (mixed mode float), with MOV being the only exception. That
> > > > > > means
> > > > > > that
> > > > > > any instruction like the one you use above are invalid.
> > > > > > Maybe
> > > > > > we
> > > > > > should
> > > > > > validate explicitly that instructions that use HF are
> > > > > > strictly
> > > > > > HF
> > > > > > or
> > > > > > mixed-float mode only?
> > > > > > 
> > > > > 
> > > > > So you're acknowledging that the conversions checked above
> > > > > are
> > > > > illegal
> > > > > whether the instruction is a MOV or something else.  Where
> > > > > are we
> > > > > preventing instructions other than MOV with such conversions
> > > > > from
> > > > > being
> > > > > accepted by the validator?
> > > > 
> > > > We aren't, because the validator is not checking, in general,
> > > > for
> > > > accepted type combinations for specific instructions anywhere
> > > > as
> > > > far as
> > > > I can see.
> > > 
> > > Luckily these type conversion restrictions aren't really specific
> > > to
> > > any
> > > instruction AFAICT, even though they only seem to be mentioned
> > > explicitly for the MOV instruction...
> > > 
> > > > If we want to start doing this with HF conversions
> > > > specifically, I
> > > > guess it is fine, but in case it is not clear, I think it won't
> > > > bring
> > > > any immediate benefits with the VK_KHR_shader_float16_int8
> > > > implementation since this series only ever emits conversions
> > > > involving
> > > > HF operands via MOV instructions,
> > > 
> > > Yes, I can see that's the intention of this series, but how can
> > > you
> > > make
> > > sure that nothing in the optimizer is breaking your assumption if
> > > you
> > > don't add some validator code to verify the claim of your last
> > > paragraph?
> > > 
> > > > which is why I thought that validating that no direct MOV
> > > > conversions
> > > > from DF/Q types ever happen was useful, since we have code in
> > > > the
> > > > driver to handle this scenario.
> > > > 
> > > > [...]
> > > > > 
> > > > > I still don't understand why you want to implement the same
> > > > > restriction
> > > > > twice, once for MOV and once for all other mixed-mode
> > > > > instructions.  How
> > > > > is that more convenient?
> > > > 
> > > > The restrictions based on operand types that are checked in the
> > > > validator are specific to Byte or cases where the execution
> > > > type is
> > > > larger than the destination stride, for which mixed float has a
> > > > different set of restrictions.
> > > > 
> > > > For example, for mixed float we have this rule:
> > > > 
> > > > "In Align1, destination stride can be smaller than execution
> > > > type"
> > > > 
> > > > Which overrides this other from "General restrictions based on
> > > > operand
> > > > types":
> > > > 
> > > > "Destination stride must be equal to the ratio of the sizes of
> > > > the
> > > > execution data type to the destination type"
> > > > 
> > > > So I thought that it would make things easier to keep all
> > > > restrictions
> > > > for mixed float separate and make sure that we skipped mixed
> > > > float
> > > > instructions in general_restrictions_based_on_operand_types()
> > > > so we
> > > > avoid having to add code to skip individual general
> > > > restrictions
> > > > that that are overriden for mixed float mode anyway.
> > > > 
> > > 
> > > I'm fine with that, but it doesn't seem like what this patch is
> > > doing...
> > > Isn't it adding code to validate mixed-mode float MOV
> > > instructions in
> > > general_restrictions_based_on_operand_types()?
> > 
> > I guess this could be arguable, but I was not considering
> > conversion
> > MOVs to be mixed-float instructions. There are two reasons for
> > this:
> > 
> > A conversion MOV involving F/HF doesn't seem to be fundamentally
> > different from any other conversion instruction involving other
> > types,
> > other than the requirement of aligning the destination to a Dword,
> > which is not a resriction explictly made for mixed-float mode.
> > 
> > Then, for mixed-float mode, there is this other rule:
> > 
> > "In Align1, destination stride can be smaller than execution type.
> > When
> > destination is stride of 1, 16 bit packed data
> > is updated on the destination. However, output packed f16 data must
> > be
> > oword aligned, no oword crossing in
> > packed f16."
> > 
> > Which contradicts the other rule that conversions from F to HF need
> > to
> > be DWord aligned on the destination.
> > 
> > So it seems to me that conversion MOVs are not following the same
> > principles of mixed-float instructions and we should skip
> > validating
> > mixed-float restrictions for them. What do you think?
> > 
> 
> That all seems fairly ambiguous...  And the restriction on DWORD
> alignment for conversions includes a mixed-mode ADD instruction among
> the examples, so I'm skeptical that MOV could be truly fundamentally
> different.

Ok... in that case what do we do about the mixed-float restriction I
quoted above? Since it is incompatible with the mixed-float MOV
conversion I guess we only have two options: ether we don't validate it
at all or we only validate it for mixed-float instructions that are not
MOV. I guess we can do the latter?

> > > > Anyway, I'll try to rework the patches to do more generic
> > > > validation of
> > > > HF conversions like you ask and see what kind of code we end up
> > > > with.
> > > > 
> > > 
> > > Thanks.
> > > 
> > > > [...]
On Tue, 2019-03-05 at 07:35 +0100, Iago Toral wrote:
> On Mon, 2019-03-04 at 15:36 -0800, Francisco Jerez wrote:
> > Iago Toral <itoral@igalia.com> writes:
> > 
> > > On Fri, 2019-03-01 at 19:04 -0800, Francisco Jerez wrote:
> > > > Iago Toral <itoral@igalia.com> writes:
> > > > 
> > > > > On Thu, 2019-02-28 at 09:54 -0800, Francisco Jerez wrote:
> > > > > > Iago Toral <itoral@igalia.com> writes:
> > > > > > 
> > > > > > > On Wed, 2019-02-27 at 13:47 -0800, Francisco Jerez wrote:
> > > > > > > > Iago Toral <itoral@igalia.com> writes:
> > > > > > > > 
> > > > > > > > > On Tue, 2019-02-26 at 14:54 -0800, Francisco Jerez
> > > > > > > > > wrote:
> > > > > > > > > > Iago Toral Quiroga <itoral@igalia.com> writes:
> > > > > > > > > > 
> > > > > > > > > > > ---
> > > > > > > > > > >  src/intel/compiler/brw_eu_validate.c    |  64
> > > > > > > > > > > ++++++++++++-
> > > > > > > > > > >  src/intel/compiler/test_eu_validate.cpp | 122
> > > > > > > > > > > ++++++++++++++++++++++++
> > > > > > > > > > >  2 files changed, 185 insertions(+), 1 deletion(-
> > > > > > > > > > > )
> > > > > > > > > > > 
> > > > > > > > > > > diff --git a/src/intel/compiler/brw_eu_validate.c
> > > > > > > > > > > b/src/intel/compiler/brw_eu_validate.c
> > > > > > > > > > > index 000a05cb6ac..203641fecb9 100644
> > > > > > > > > > > --- a/src/intel/compiler/brw_eu_validate.c
> > > > > > > > > > > +++ b/src/intel/compiler/brw_eu_validate.c
> > > > > > > > > > > @@ -531,7 +531,69 @@
> > > > > > > > > > > general_restrictions_based_on_operand_types(const
> > > > > > > > > > > struct
> > > > > > > > > > > gen_device_info *devinf
> > > > > > > > > > >         exec_type_size == 8 && dst_type_size ==
> > > > > > > > > > > 4)
> > > > > > > > > > >        dst_type_size = 8;
> > > > > > > > > > >  
> > > > > > > > > > > -   if (exec_type_size > dst_type_size) {
> > > > > > > > > > > +   /* From the BDW+ PRM:
> > > > > > > > > > > +    *
> > > > > > > > > > > +    *    "There is no direct conversion from HF
> > > > > > > > > > > to
> > > > > > > > > > > DF
> > > > > > > > > > > or
> > > > > > > > > > > DF to
> > > > > > > > > > > HF.
> > > > > > > > > > > +    *     There is no direct conversion from HF
> > > > > > > > > > > to
> > > > > > > > > > > Q/UQ or
> > > > > > > > > > > Q/UQ to
> > > > > > > > > > > HF."
> > > > > > > > > > > +    */
> > > > > > > > > > > +   enum brw_reg_type src0_type =
> > > > > > > > > > > brw_inst_src0_type(devinfo,
> > > > > > > > > > > inst);
> > > > > > > > > > > +   ERROR_IF(brw_inst_opcode(devinfo, inst) ==
> > > > > > > > > > > BRW_OPCODE_MOV
> > > > > > > > > > > &&
> > > > > > > > > > 
> > > > > > > > > > Why is only the MOV instruction handled here and
> > > > > > > > > > below?  Aren't
> > > > > > > > > > other
> > > > > > > > > > instructions able to do implicit
> > > > > > > > > > conversions?  Probably
> > > > > > > > > > means
> > > > > > > > > > you
> > > > > > > > > > need
> > > > > > > > > > to deal with two sources rather than one.
> > > > > > > > > 
> > > > > > > > > This comes from the programming notes of the MOV
> > > > > > > > > instruction
> > > > > > > > > (Volume
> > > > > > > > > 2a, Command Reference - Instructions - MOV), so it is
> > > > > > > > > described
> > > > > > > > > specifically for the MOV instruction. I should
> > > > > > > > > probably
> > > > > > > > > have
> > > > > > > > > made
> > > > > > > > > this
> > > > > > > > > clear in the comment.
> > > > > > > > > 
> > > > > > > > 
> > > > > > > > Maybe the one above is specified in the MOV page only,
> > > > > > > > probably
> > > > > > > > due
> > > > > > > > to
> > > > > > > > an oversight (If these restrictions were really
> > > > > > > > specific
> > > > > > > > to
> > > > > > > > the
> > > > > > > > MOV
> > > > > > > > instruction, what would prevent you from implementing
> > > > > > > > such
> > > > > > > > conversions
> > > > > > > > through a different instruction?  E.g. "ADD dst:df,
> > > > > > > > src:hf,
> > > > > > > > 0"
> > > > > > > > which
> > > > > > > > would be substantially more efficient than what you're
> > > > > > > > doing
> > > > > > > > in
> > > > > > > > PATCH
> > > > > > > > 02)
> > > > > > > 
> > > > > > > Instructions that take HF can only be strictly HF or mix
> > > > > > > F
> > > > > > > and
> > > > > > > HF
> > > > > > > (mixed mode float), with MOV being the only exception.
> > > > > > > That
> > > > > > > means
> > > > > > > that
> > > > > > > any instruction like the one you use above are invalid.
> > > > > > > Maybe
> > > > > > > we
> > > > > > > should
> > > > > > > validate explicitly that instructions that use HF are
> > > > > > > strictly
> > > > > > > HF
> > > > > > > or
> > > > > > > mixed-float mode only?
> > > > > > > 
> > > > > > 
> > > > > > So you're acknowledging that the conversions checked above
> > > > > > are
> > > > > > illegal
> > > > > > whether the instruction is a MOV or something else.  Where
> > > > > > are we
> > > > > > preventing instructions other than MOV with such
> > > > > > conversions
> > > > > > from
> > > > > > being
> > > > > > accepted by the validator?
> > > > > 
> > > > > We aren't, because the validator is not checking, in general,
> > > > > for
> > > > > accepted type combinations for specific instructions anywhere
> > > > > as
> > > > > far as
> > > > > I can see.
> > > > 
> > > > Luckily these type conversion restrictions aren't really
> > > > specific
> > > > to
> > > > any
> > > > instruction AFAICT, even though they only seem to be mentioned
> > > > explicitly for the MOV instruction...
> > > > 
> > > > > If we want to start doing this with HF conversions
> > > > > specifically, I
> > > > > guess it is fine, but in case it is not clear, I think it
> > > > > won't
> > > > > bring
> > > > > any immediate benefits with the VK_KHR_shader_float16_int8
> > > > > implementation since this series only ever emits conversions
> > > > > involving
> > > > > HF operands via MOV instructions,
> > > > 
> > > > Yes, I can see that's the intention of this series, but how can
> > > > you
> > > > make
> > > > sure that nothing in the optimizer is breaking your assumption
> > > > if
> > > > you
> > > > don't add some validator code to verify the claim of your last
> > > > paragraph?
> > > > 
> > > > > which is why I thought that validating that no direct MOV
> > > > > conversions
> > > > > from DF/Q types ever happen was useful, since we have code in
> > > > > the
> > > > > driver to handle this scenario.
> > > > > 
> > > > > [...]
> > > > > > 
> > > > > > I still don't understand why you want to implement the same
> > > > > > restriction
> > > > > > twice, once for MOV and once for all other mixed-mode
> > > > > > instructions.  How
> > > > > > is that more convenient?
> > > > > 
> > > > > The restrictions based on operand types that are checked in
> > > > > the
> > > > > validator are specific to Byte or cases where the execution
> > > > > type is
> > > > > larger than the destination stride, for which mixed float has
> > > > > a
> > > > > different set of restrictions.
> > > > > 
> > > > > For example, for mixed float we have this rule:
> > > > > 
> > > > > "In Align1, destination stride can be smaller than execution
> > > > > type"
> > > > > 
> > > > > Which overrides this other from "General restrictions based
> > > > > on
> > > > > operand
> > > > > types":
> > > > > 
> > > > > "Destination stride must be equal to the ratio of the sizes
> > > > > of
> > > > > the
> > > > > execution data type to the destination type"
> > > > > 
> > > > > So I thought that it would make things easier to keep all
> > > > > restrictions
> > > > > for mixed float separate and make sure that we skipped mixed
> > > > > float
> > > > > instructions in general_restrictions_based_on_operand_types()
> > > > > so we
> > > > > avoid having to add code to skip individual general
> > > > > restrictions
> > > > > that that are overriden for mixed float mode anyway.
> > > > > 
> > > > 
> > > > I'm fine with that, but it doesn't seem like what this patch is
> > > > doing...
> > > > Isn't it adding code to validate mixed-mode float MOV
> > > > instructions in
> > > > general_restrictions_based_on_operand_types()?
> > > 
> > > I guess this could be arguable, but I was not considering
> > > conversion
> > > MOVs to be mixed-float instructions. There are two reasons for
> > > this:
> > > 
> > > A conversion MOV involving F/HF doesn't seem to be fundamentally
> > > different from any other conversion instruction involving other
> > > types,
> > > other than the requirement of aligning the destination to a
> > > Dword,
> > > which is not a resriction explictly made for mixed-float mode.
> > > 
> > > Then, for mixed-float mode, there is this other rule:
> > > 
> > > "In Align1, destination stride can be smaller than execution
> > > type.
> > > When
> > > destination is stride of 1, 16 bit packed data
> > > is updated on the destination. However, output packed f16 data
> > > must
> > > be
> > > oword aligned, no oword crossing in
> > > packed f16."
> > > 
> > > Which contradicts the other rule that conversions from F to HF
> > > need
> > > to
> > > be DWord aligned on the destination.
> > > 
> > > So it seems to me that conversion MOVs are not following the same
> > > principles of mixed-float instructions and we should skip
> > > validating
> > > mixed-float restrictions for them. What do you think?
> > > 
> > 
> > That all seems fairly ambiguous...  And the restriction on DWORD
> > alignment for conversions includes a mixed-mode ADD instruction
> > among
> > the examples, so I'm skeptical that MOV could be truly
> > fundamentally
> > different.
> 
> Ok... in that case what do we do about the mixed-float restriction I
> quoted above? Since it is incompatible with the mixed-float MOV
> conversion I guess we only have two options: ether we don't validate
> it
> at all or we only validate it for mixed-float instructions that are
> not
> MOV. I guess we can do the latter?

Also, related to this, if we consider implicit conversions in 2-src
instructions to also be a target of the generic rules for conversions
to HF described for CHV and SKL+, then doesn't that imply that all
mixed-float instructions are conversions and subject to those rules?
For example:

ADD dst:hf  src0:f  src1:f
ADD dst:hf  src0:hf src1:f
ADD dst:hf  src0:f  src1:hf

In all 3 instructions above the execution type is F. Should we consider
all of them implicit conversions? That would mean that any mixed-float
instruction with HF destination is an implicit conversion from F to HF
and therefore would be subject to the generic rules for those
conversions, which at least in the case of CHV and SKL+ involves DWord
alignment on the destination, or in other words: no mixed-float
instruction can have packed fp16 destinations. But that seems to
contradict what various mixed-float restrictions present in the docs
that suggest that packed fp16 destinations are possible with mixed-
float mode. Here are some examples:

"No SIMD16 in mixed mode when destination is packed f16 for both Align1
and Align16"

"In Align1, destination stride can be smaller than execution type. When
destination is stride of 1, 16 bit packed data is updated on the
destination. However, output packed f16 data must be oword aligned, no
oword crossing in packed f16."

"When source is float or half float from accumulator register and
destination is half float with a stride of 1, the source must register
aligned. i.e., source must have offset zero."

Since I was only considering that conversion rules only applied to MOV
instructions this was not an issue in my series, but if we consider
that those rules also apply to implicit conversions from other
instructions then it looks like all these restrictions refer to
impossible situations, so I am not sure what to do about them... should
we just ignore them in the validator?

> > > > > Anyway, I'll try to rework the patches to do more generic
> > > > > validation of
> > > > > HF conversions like you ask and see what kind of code we end
> > > > > up
> > > > > with.
> > > > > 
> > > > 
> > > > Thanks.
> > > > 
> > > > > [...]
On Wed, 2019-03-06 at 09:21 +0100, Iago Toral wrote:
> On Tue, 2019-03-05 at 07:35 +0100, Iago Toral wrote:
> > On Mon, 2019-03-04 at 15:36 -0800, Francisco Jerez wrote:
> > > Iago Toral <itoral@igalia.com> writes:
> > > 
> > > > On Fri, 2019-03-01 at 19:04 -0800, Francisco Jerez wrote:
> > > > > Iago Toral <itoral@igalia.com> writes:
> > > > > 
> > > > > > On Thu, 2019-02-28 at 09:54 -0800, Francisco Jerez wrote:
> > > > > > > Iago Toral <itoral@igalia.com> writes:
> > > > > > > 
> > > > > > > > On Wed, 2019-02-27 at 13:47 -0800, Francisco Jerez
> > > > > > > > wrote:
> > > > > > > > > Iago Toral <itoral@igalia.com> writes:
> > > > > > > > > 
> > > > > > > > > > On Tue, 2019-02-26 at 14:54 -0800, Francisco Jerez
> > > > > > > > > > wrote:
> > > > > > > > > > > Iago Toral Quiroga <itoral@igalia.com> writes:
> > > > > > > > > > > 
> > > > > > > > > > > > ---
> > > > > > > > > > > >  src/intel/compiler/brw_eu_validate.c    |  64
> > > > > > > > > > > > ++++++++++++-
> > > > > > > > > > > >  src/intel/compiler/test_eu_validate.cpp | 122
> > > > > > > > > > > > ++++++++++++++++++++++++
> > > > > > > > > > > >  2 files changed, 185 insertions(+), 1
> > > > > > > > > > > > deletion(-
> > > > > > > > > > > > )
> > > > > > > > > > > > 
> > > > > > > > > > > > diff --git
> > > > > > > > > > > > a/src/intel/compiler/brw_eu_validate.c
> > > > > > > > > > > > b/src/intel/compiler/brw_eu_validate.c
> > > > > > > > > > > > index 000a05cb6ac..203641fecb9 100644
> > > > > > > > > > > > --- a/src/intel/compiler/brw_eu_validate.c
> > > > > > > > > > > > +++ b/src/intel/compiler/brw_eu_validate.c
> > > > > > > > > > > > @@ -531,7 +531,69 @@
> > > > > > > > > > > > general_restrictions_based_on_operand_types(con
> > > > > > > > > > > > st
> > > > > > > > > > > > struct
> > > > > > > > > > > > gen_device_info *devinf
> > > > > > > > > > > >         exec_type_size == 8 && dst_type_size ==
> > > > > > > > > > > > 4)
> > > > > > > > > > > >        dst_type_size = 8;
> > > > > > > > > > > >  
> > > > > > > > > > > > -   if (exec_type_size > dst_type_size) {
> > > > > > > > > > > > +   /* From the BDW+ PRM:
> > > > > > > > > > > > +    *
> > > > > > > > > > > > +    *    "There is no direct conversion from
> > > > > > > > > > > > HF
> > > > > > > > > > > > to
> > > > > > > > > > > > DF
> > > > > > > > > > > > or
> > > > > > > > > > > > DF to
> > > > > > > > > > > > HF.
> > > > > > > > > > > > +    *     There is no direct conversion from
> > > > > > > > > > > > HF
> > > > > > > > > > > > to
> > > > > > > > > > > > Q/UQ or
> > > > > > > > > > > > Q/UQ to
> > > > > > > > > > > > HF."
> > > > > > > > > > > > +    */
> > > > > > > > > > > > +   enum brw_reg_type src0_type =
> > > > > > > > > > > > brw_inst_src0_type(devinfo,
> > > > > > > > > > > > inst);
> > > > > > > > > > > > +   ERROR_IF(brw_inst_opcode(devinfo, inst) ==
> > > > > > > > > > > > BRW_OPCODE_MOV
> > > > > > > > > > > > &&
> > > > > > > > > > > 
> > > > > > > > > > > Why is only the MOV instruction handled here and
> > > > > > > > > > > below?  Aren't
> > > > > > > > > > > other
> > > > > > > > > > > instructions able to do implicit
> > > > > > > > > > > conversions?  Probably
> > > > > > > > > > > means
> > > > > > > > > > > you
> > > > > > > > > > > need
> > > > > > > > > > > to deal with two sources rather than one.
> > > > > > > > > > 
> > > > > > > > > > This comes from the programming notes of the MOV
> > > > > > > > > > instruction
> > > > > > > > > > (Volume
> > > > > > > > > > 2a, Command Reference - Instructions - MOV), so it
> > > > > > > > > > is
> > > > > > > > > > described
> > > > > > > > > > specifically for the MOV instruction. I should
> > > > > > > > > > probably
> > > > > > > > > > have
> > > > > > > > > > made
> > > > > > > > > > this
> > > > > > > > > > clear in the comment.
> > > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > Maybe the one above is specified in the MOV page
> > > > > > > > > only,
> > > > > > > > > probably
> > > > > > > > > due
> > > > > > > > > to
> > > > > > > > > an oversight (If these restrictions were really
> > > > > > > > > specific
> > > > > > > > > to
> > > > > > > > > the
> > > > > > > > > MOV
> > > > > > > > > instruction, what would prevent you from implementing
> > > > > > > > > such
> > > > > > > > > conversions
> > > > > > > > > through a different instruction?  E.g. "ADD dst:df,
> > > > > > > > > src:hf,
> > > > > > > > > 0"
> > > > > > > > > which
> > > > > > > > > would be substantially more efficient than what
> > > > > > > > > you're
> > > > > > > > > doing
> > > > > > > > > in
> > > > > > > > > PATCH
> > > > > > > > > 02)
> > > > > > > > 
> > > > > > > > Instructions that take HF can only be strictly HF or
> > > > > > > > mix
> > > > > > > > F
> > > > > > > > and
> > > > > > > > HF
> > > > > > > > (mixed mode float), with MOV being the only exception.
> > > > > > > > That
> > > > > > > > means
> > > > > > > > that
> > > > > > > > any instruction like the one you use above are invalid.
> > > > > > > > Maybe
> > > > > > > > we
> > > > > > > > should
> > > > > > > > validate explicitly that instructions that use HF are
> > > > > > > > strictly
> > > > > > > > HF
> > > > > > > > or
> > > > > > > > mixed-float mode only?
> > > > > > > > 
> > > > > > > 
> > > > > > > So you're acknowledging that the conversions checked
> > > > > > > above
> > > > > > > are
> > > > > > > illegal
> > > > > > > whether the instruction is a MOV or something
> > > > > > > else.  Where
> > > > > > > are we
> > > > > > > preventing instructions other than MOV with such
> > > > > > > conversions
> > > > > > > from
> > > > > > > being
> > > > > > > accepted by the validator?
> > > > > > 
> > > > > > We aren't, because the validator is not checking, in
> > > > > > general,
> > > > > > for
> > > > > > accepted type combinations for specific instructions
> > > > > > anywhere
> > > > > > as
> > > > > > far as
> > > > > > I can see.
> > > > > 
> > > > > Luckily these type conversion restrictions aren't really
> > > > > specific
> > > > > to
> > > > > any
> > > > > instruction AFAICT, even though they only seem to be
> > > > > mentioned
> > > > > explicitly for the MOV instruction...
> > > > > 
> > > > > > If we want to start doing this with HF conversions
> > > > > > specifically, I
> > > > > > guess it is fine, but in case it is not clear, I think it
> > > > > > won't
> > > > > > bring
> > > > > > any immediate benefits with the VK_KHR_shader_float16_int8
> > > > > > implementation since this series only ever emits
> > > > > > conversions
> > > > > > involving
> > > > > > HF operands via MOV instructions,
> > > > > 
> > > > > Yes, I can see that's the intention of this series, but how
> > > > > can
> > > > > you
> > > > > make
> > > > > sure that nothing in the optimizer is breaking your
> > > > > assumption
> > > > > if
> > > > > you
> > > > > don't add some validator code to verify the claim of your
> > > > > last
> > > > > paragraph?
> > > > > 
> > > > > > which is why I thought that validating that no direct MOV
> > > > > > conversions
> > > > > > from DF/Q types ever happen was useful, since we have code
> > > > > > in
> > > > > > the
> > > > > > driver to handle this scenario.
> > > > > > 
> > > > > > [...]
> > > > > > > 
> > > > > > > I still don't understand why you want to implement the
> > > > > > > same
> > > > > > > restriction
> > > > > > > twice, once for MOV and once for all other mixed-mode
> > > > > > > instructions.  How
> > > > > > > is that more convenient?
> > > > > > 
> > > > > > The restrictions based on operand types that are checked in
> > > > > > the
> > > > > > validator are specific to Byte or cases where the execution
> > > > > > type is
> > > > > > larger than the destination stride, for which mixed float
> > > > > > has
> > > > > > a
> > > > > > different set of restrictions.
> > > > > > 
> > > > > > For example, for mixed float we have this rule:
> > > > > > 
> > > > > > "In Align1, destination stride can be smaller than
> > > > > > execution
> > > > > > type"
> > > > > > 
> > > > > > Which overrides this other from "General restrictions based
> > > > > > on
> > > > > > operand
> > > > > > types":
> > > > > > 
> > > > > > "Destination stride must be equal to the ratio of the sizes
> > > > > > of
> > > > > > the
> > > > > > execution data type to the destination type"
> > > > > > 
> > > > > > So I thought that it would make things easier to keep all
> > > > > > restrictions
> > > > > > for mixed float separate and make sure that we skipped
> > > > > > mixed
> > > > > > float
> > > > > > instructions in
> > > > > > general_restrictions_based_on_operand_types()
> > > > > > so we
> > > > > > avoid having to add code to skip individual general
> > > > > > restrictions
> > > > > > that that are overriden for mixed float mode anyway.
> > > > > > 
> > > > > 
> > > > > I'm fine with that, but it doesn't seem like what this patch
> > > > > is
> > > > > doing...
> > > > > Isn't it adding code to validate mixed-mode float MOV
> > > > > instructions in
> > > > > general_restrictions_based_on_operand_types()?
> > > > 
> > > > I guess this could be arguable, but I was not considering
> > > > conversion
> > > > MOVs to be mixed-float instructions. There are two reasons for
> > > > this:
> > > > 
> > > > A conversion MOV involving F/HF doesn't seem to be
> > > > fundamentally
> > > > different from any other conversion instruction involving other
> > > > types,
> > > > other than the requirement of aligning the destination to a
> > > > Dword,
> > > > which is not a resriction explictly made for mixed-float mode.
> > > > 
> > > > Then, for mixed-float mode, there is this other rule:
> > > > 
> > > > "In Align1, destination stride can be smaller than execution
> > > > type.
> > > > When
> > > > destination is stride of 1, 16 bit packed data
> > > > is updated on the destination. However, output packed f16 data
> > > > must
> > > > be
> > > > oword aligned, no oword crossing in
> > > > packed f16."
> > > > 
> > > > Which contradicts the other rule that conversions from F to HF
> > > > need
> > > > to
> > > > be DWord aligned on the destination.
> > > > 
> > > > So it seems to me that conversion MOVs are not following the
> > > > same
> > > > principles of mixed-float instructions and we should skip
> > > > validating
> > > > mixed-float restrictions for them. What do you think?
> > > > 
> > > 
> > > That all seems fairly ambiguous...  And the restriction on DWORD
> > > alignment for conversions includes a mixed-mode ADD instruction
> > > among
> > > the examples, so I'm skeptical that MOV could be truly
> > > fundamentally
> > > different.
> > 
> > Ok... in that case what do we do about the mixed-float restriction
> > I
> > quoted above? Since it is incompatible with the mixed-float MOV
> > conversion I guess we only have two options: ether we don't
> > validate
> > it
> > at all or we only validate it for mixed-float instructions that are
> > not
> > MOV. I guess we can do the latter?
> 
> Also, related to this, if we consider implicit conversions in 2-src
> instructions to also be a target of the generic rules for conversions
> to HF described for CHV and SKL+, then doesn't that imply that all
> mixed-float instructions are conversions and subject to those rules?
> For example:
> 
> ADD dst:hf  src0:f  src1:f
> ADD dst:hf  src0:hf src1:f
> ADD dst:hf  src0:f  src1:hf
> 
> In all 3 instructions above the execution type is F. Should we
> consider
> all of them implicit conversions? That would mean that any mixed-
> float
> instruction with HF destination is an implicit conversion from F to
> HF
> and therefore would be subject to the generic rules for those
> conversions, which at least in the case of CHV and SKL+ involves
> DWord
> alignment on the destination, or in other words: no mixed-float
> instruction can have packed fp16 destinations. But that seems to
> contradict what various mixed-float restrictions present in the docs
> that suggest that packed fp16 destinations are possible with mixed-
> float mode. Here are some examples:
> 
> "No SIMD16 in mixed mode when destination is packed f16 for both
> Align1
> and Align16"
> 
> "In Align1, destination stride can be smaller than execution type.
> When
> destination is stride of 1, 16 bit packed data is updated on the
> destination. However, output packed f16 data must be oword aligned,
> no
> oword crossing in packed f16."
> 
> "When source is float or half float from accumulator register and
> destination is half float with a stride of 1, the source must
> register
> aligned. i.e., source must have offset zero."
> 
> Since I was only considering that conversion rules only applied to
> MOV
> instructions this was not an issue in my series, but if we consider
> that those rules also apply to implicit conversions from other
> instructions then it looks like all these restrictions refer to
> impossible situations, so I am not sure what to do about them...
> should
> we just ignore them in the validator?

And I have just noticed another issue with the idea of considering
direct conversions involving F/HF mixed-float instructions. There is
this other restriction for mixed-float mode:

"No SIMD16 in mixed mode when destination is f32. Instruction Execution
size must be no more than 8."

So if we consider direct HF->F conversions mixed-float instructions
then we would habe to split all of them to be 8-wide. Obviously, I have
not been doing this and I have never seen any issues on any platform
because of this.

Iago



> 
> > > > > > Anyway, I'll try to rework the patches to do more generic
> > > > > > validation of
> > > > > > HF conversions like you ask and see what kind of code we
> > > > > > end
> > > > > > up
> > > > > > with.
> > > > > > 
> > > > > 
> > > > > Thanks.
> > > > > 
> > > > > > [...]




On Tue, 2019-03-12 at 15:44 -0700, Francisco Jerez wrote:
> Iago Toral <itoral@igalia.com> writes:
> 
> > On Tue, 2019-03-05 at 07:35 +0100, Iago Toral wrote:
> > > On Mon, 2019-03-04 at 15:36 -0800, Francisco Jerez wrote:
> > > > Iago Toral <itoral@igalia.com> writes:
> > > > 
> > > > > On Fri, 2019-03-01 at 19:04 -0800, Francisco Jerez wrote:
> > > > > > Iago Toral <itoral@igalia.com> writes:
> > > > > > 
> > > > > > > On Thu, 2019-02-28 at 09:54 -0800, Francisco Jerez wrote:
> > > > > > > > Iago Toral <itoral@igalia.com> writes:
> > > > > > > > 
> > > > > > > > > On Wed, 2019-02-27 at 13:47 -0800, Francisco Jerez
> > > > > > > > > wrote:
> > > > > > > > > > Iago Toral <itoral@igalia.com> writes:
> > > > > > > > > > 
> > > > > > > > > > > On Tue, 2019-02-26 at 14:54 -0800, Francisco
> > > > > > > > > > > Jerez
> > > > > > > > > > > wrote:
> > > > > > > > > > > > Iago Toral Quiroga <itoral@igalia.com> writes:
> > > > > > > > > > > > 
> > > > > > > > > > > > > ---
> > > > > > > > > > > > >  src/intel/compiler/brw_eu_validate.c    |  6
> > > > > > > > > > > > > 4
> > > > > > > > > > > > > ++++++++++++-
> > > > > > > > > > > > >  src/intel/compiler/test_eu_validate.cpp |
> > > > > > > > > > > > > 122
> > > > > > > > > > > > > ++++++++++++++++++++++++
> > > > > > > > > > > > >  2 files changed, 185 insertions(+), 1
> > > > > > > > > > > > > deletion(-
> > > > > > > > > > > > > )
> > > > > > > > > > > > > 
> > > > > > > > > > > > > diff --git
> > > > > > > > > > > > > a/src/intel/compiler/brw_eu_validate.c
> > > > > > > > > > > > > b/src/intel/compiler/brw_eu_validate.c
> > > > > > > > > > > > > index 000a05cb6ac..203641fecb9 100644
> > > > > > > > > > > > > --- a/src/intel/compiler/brw_eu_validate.c
> > > > > > > > > > > > > +++ b/src/intel/compiler/brw_eu_validate.c
> > > > > > > > > > > > > @@ -531,7 +531,69 @@
> > > > > > > > > > > > > general_restrictions_based_on_operand_types(c
> > > > > > > > > > > > > onst
> > > > > > > > > > > > > struct
> > > > > > > > > > > > > gen_device_info *devinf
> > > > > > > > > > > > >         exec_type_size == 8 && dst_type_size
> > > > > > > > > > > > > ==
> > > > > > > > > > > > > 4)
> > > > > > > > > > > > >        dst_type_size = 8;
> > > > > > > > > > > > >  
> > > > > > > > > > > > > -   if (exec_type_size > dst_type_size) {
> > > > > > > > > > > > > +   /* From the BDW+ PRM:
> > > > > > > > > > > > > +    *
> > > > > > > > > > > > > +    *    "There is no direct conversion from
> > > > > > > > > > > > > HF
> > > > > > > > > > > > > to
> > > > > > > > > > > > > DF
> > > > > > > > > > > > > or
> > > > > > > > > > > > > DF to
> > > > > > > > > > > > > HF.
> > > > > > > > > > > > > +    *     There is no direct conversion from
> > > > > > > > > > > > > HF
> > > > > > > > > > > > > to
> > > > > > > > > > > > > Q/UQ or
> > > > > > > > > > > > > Q/UQ to
> > > > > > > > > > > > > HF."
> > > > > > > > > > > > > +    */
> > > > > > > > > > > > > +   enum brw_reg_type src0_type =
> > > > > > > > > > > > > brw_inst_src0_type(devinfo,
> > > > > > > > > > > > > inst);
> > > > > > > > > > > > > +   ERROR_IF(brw_inst_opcode(devinfo, inst)
> > > > > > > > > > > > > ==
> > > > > > > > > > > > > BRW_OPCODE_MOV
> > > > > > > > > > > > > &&
> > > > > > > > > > > > 
> > > > > > > > > > > > Why is only the MOV instruction handled here
> > > > > > > > > > > > and
> > > > > > > > > > > > below?  Aren't
> > > > > > > > > > > > other
> > > > > > > > > > > > instructions able to do implicit
> > > > > > > > > > > > conversions?  Probably
> > > > > > > > > > > > means
> > > > > > > > > > > > you
> > > > > > > > > > > > need
> > > > > > > > > > > > to deal with two sources rather than one.
> > > > > > > > > > > 
> > > > > > > > > > > This comes from the programming notes of the MOV
> > > > > > > > > > > instruction
> > > > > > > > > > > (Volume
> > > > > > > > > > > 2a, Command Reference - Instructions - MOV), so
> > > > > > > > > > > it is
> > > > > > > > > > > described
> > > > > > > > > > > specifically for the MOV instruction. I should
> > > > > > > > > > > probably
> > > > > > > > > > > have
> > > > > > > > > > > made
> > > > > > > > > > > this
> > > > > > > > > > > clear in the comment.
> > > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > Maybe the one above is specified in the MOV page
> > > > > > > > > > only,
> > > > > > > > > > probably
> > > > > > > > > > due
> > > > > > > > > > to
> > > > > > > > > > an oversight (If these restrictions were really
> > > > > > > > > > specific
> > > > > > > > > > to
> > > > > > > > > > the
> > > > > > > > > > MOV
> > > > > > > > > > instruction, what would prevent you from
> > > > > > > > > > implementing
> > > > > > > > > > such
> > > > > > > > > > conversions
> > > > > > > > > > through a different instruction?  E.g. "ADD dst:df,
> > > > > > > > > > src:hf,
> > > > > > > > > > 0"
> > > > > > > > > > which
> > > > > > > > > > would be substantially more efficient than what
> > > > > > > > > > you're
> > > > > > > > > > doing
> > > > > > > > > > in
> > > > > > > > > > PATCH
> > > > > > > > > > 02)
> > > > > > > > > 
> > > > > > > > > Instructions that take HF can only be strictly HF or
> > > > > > > > > mix
> > > > > > > > > F
> > > > > > > > > and
> > > > > > > > > HF
> > > > > > > > > (mixed mode float), with MOV being the only
> > > > > > > > > exception.
> > > > > > > > > That
> > > > > > > > > means
> > > > > > > > > that
> > > > > > > > > any instruction like the one you use above are
> > > > > > > > > invalid.
> > > > > > > > > Maybe
> > > > > > > > > we
> > > > > > > > > should
> > > > > > > > > validate explicitly that instructions that use HF are
> > > > > > > > > strictly
> > > > > > > > > HF
> > > > > > > > > or
> > > > > > > > > mixed-float mode only?
> > > > > > > > > 
> > > > > > > > 
> > > > > > > > So you're acknowledging that the conversions checked
> > > > > > > > above
> > > > > > > > are
> > > > > > > > illegal
> > > > > > > > whether the instruction is a MOV or something
> > > > > > > > else.  Where
> > > > > > > > are we
> > > > > > > > preventing instructions other than MOV with such
> > > > > > > > conversions
> > > > > > > > from
> > > > > > > > being
> > > > > > > > accepted by the validator?
> > > > > > > 
> > > > > > > We aren't, because the validator is not checking, in
> > > > > > > general,
> > > > > > > for
> > > > > > > accepted type combinations for specific instructions
> > > > > > > anywhere
> > > > > > > as
> > > > > > > far as
> > > > > > > I can see.
> > > > > > 
> > > > > > Luckily these type conversion restrictions aren't really
> > > > > > specific
> > > > > > to
> > > > > > any
> > > > > > instruction AFAICT, even though they only seem to be
> > > > > > mentioned
> > > > > > explicitly for the MOV instruction...
> > > > > > 
> > > > > > > If we want to start doing this with HF conversions
> > > > > > > specifically, I
> > > > > > > guess it is fine, but in case it is not clear, I think it
> > > > > > > won't
> > > > > > > bring
> > > > > > > any immediate benefits with the
> > > > > > > VK_KHR_shader_float16_int8
> > > > > > > implementation since this series only ever emits
> > > > > > > conversions
> > > > > > > involving
> > > > > > > HF operands via MOV instructions,
> > > > > > 
> > > > > > Yes, I can see that's the intention of this series, but how
> > > > > > can
> > > > > > you
> > > > > > make
> > > > > > sure that nothing in the optimizer is breaking your
> > > > > > assumption
> > > > > > if
> > > > > > you
> > > > > > don't add some validator code to verify the claim of your
> > > > > > last
> > > > > > paragraph?
> > > > > > 
> > > > > > > which is why I thought that validating that no direct MOV
> > > > > > > conversions
> > > > > > > from DF/Q types ever happen was useful, since we have
> > > > > > > code in
> > > > > > > the
> > > > > > > driver to handle this scenario.
> > > > > > > 
> > > > > > > [...]
> > > > > > > > 
> > > > > > > > I still don't understand why you want to implement the
> > > > > > > > same
> > > > > > > > restriction
> > > > > > > > twice, once for MOV and once for all other mixed-mode
> > > > > > > > instructions.  How
> > > > > > > > is that more convenient?
> > > > > > > 
> > > > > > > The restrictions based on operand types that are checked
> > > > > > > in
> > > > > > > the
> > > > > > > validator are specific to Byte or cases where the
> > > > > > > execution
> > > > > > > type is
> > > > > > > larger than the destination stride, for which mixed float
> > > > > > > has
> > > > > > > a
> > > > > > > different set of restrictions.
> > > > > > > 
> > > > > > > For example, for mixed float we have this rule:
> > > > > > > 
> > > > > > > "In Align1, destination stride can be smaller than
> > > > > > > execution
> > > > > > > type"
> > > > > > > 
> > > > > > > Which overrides this other from "General restrictions
> > > > > > > based
> > > > > > > on
> > > > > > > operand
> > > > > > > types":
> > > > > > > 
> > > > > > > "Destination stride must be equal to the ratio of the
> > > > > > > sizes
> > > > > > > of
> > > > > > > the
> > > > > > > execution data type to the destination type"
> > > > > > > 
> > > > > > > So I thought that it would make things easier to keep all
> > > > > > > restrictions
> > > > > > > for mixed float separate and make sure that we skipped
> > > > > > > mixed
> > > > > > > float
> > > > > > > instructions in
> > > > > > > general_restrictions_based_on_operand_types()
> > > > > > > so we
> > > > > > > avoid having to add code to skip individual general
> > > > > > > restrictions
> > > > > > > that that are overriden for mixed float mode anyway.
> > > > > > > 
> > > > > > 
> > > > > > I'm fine with that, but it doesn't seem like what this
> > > > > > patch is
> > > > > > doing...
> > > > > > Isn't it adding code to validate mixed-mode float MOV
> > > > > > instructions in
> > > > > > general_restrictions_based_on_operand_types()?
> > > > > 
> > > > > I guess this could be arguable, but I was not considering
> > > > > conversion
> > > > > MOVs to be mixed-float instructions. There are two reasons
> > > > > for
> > > > > this:
> > > > > 
> > > > > A conversion MOV involving F/HF doesn't seem to be
> > > > > fundamentally
> > > > > different from any other conversion instruction involving
> > > > > other
> > > > > types,
> > > > > other than the requirement of aligning the destination to a
> > > > > Dword,
> > > > > which is not a resriction explictly made for mixed-float
> > > > > mode.
> > > > > 
> > > > > Then, for mixed-float mode, there is this other rule:
> > > > > 
> > > > > "In Align1, destination stride can be smaller than execution
> > > > > type.
> > > > > When
> > > > > destination is stride of 1, 16 bit packed data
> > > > > is updated on the destination. However, output packed f16
> > > > > data
> > > > > must
> > > > > be
> > > > > oword aligned, no oword crossing in
> > > > > packed f16."
> > > > > 
> > > > > Which contradicts the other rule that conversions from F to
> > > > > HF
> > > > > need
> > > > > to
> > > > > be DWord aligned on the destination.
> > > > > 
> > > > > So it seems to me that conversion MOVs are not following the
> > > > > same
> > > > > principles of mixed-float instructions and we should skip
> > > > > validating
> > > > > mixed-float restrictions for them. What do you think?
> > > > > 
> > > > 
> > > > That all seems fairly ambiguous...  And the restriction on
> > > > DWORD
> > > > alignment for conversions includes a mixed-mode ADD instruction
> > > > among
> > > > the examples, so I'm skeptical that MOV could be truly
> > > > fundamentally
> > > > different.
> > > 
> > > Ok... in that case what do we do about the mixed-float
> > > restriction I
> > > quoted above? Since it is incompatible with the mixed-float MOV
> > > conversion I guess we only have two options: ether we don't
> > > validate
> > > it
> > > at all or we only validate it for mixed-float instructions that
> > > are
> > > not
> > > MOV. I guess we can do the latter?
> > 
> > Also, related to this, if we consider implicit conversions in 2-src
> > instructions to also be a target of the generic rules for
> > conversions
> > to HF described for CHV and SKL+, then doesn't that imply that all
> > mixed-float instructions are conversions and subject to those
> > rules?
> > For example:
> > 
> > ADD dst:hf  src0:f  src1:f
> > ADD dst:hf  src0:hf src1:f
> > ADD dst:hf  src0:f  src1:hf
> > 
> > In all 3 instructions above the execution type is F. Should we
> > consider
> > all of them implicit conversions?
> 
> Yes, I just verified that the simulator considers all of the above to
> fall under the category "Format conversion to or from HF", and
> requires
> their destination components to be DWORD-aligned, but *only* on BDW.

Thanks for testing checking this in the validator! I didn't find
anything in the BDW docs that documents this behavior explicitly, at
least in my copy of the BDW PRM, but it sounds like this behavior would
be in line with the general rule that conversions to smaller types need
to be aligned to the size of the largr type so that's proably why they
didn't include anything specific for these conversions...

> > That would mean that any mixed-float instruction with HF
> > destination
> > is an implicit conversion from F to HF and therefore would be
> > subject
> > to the generic rules for those conversions, which at least in the
> > case
> > of CHV and SKL+ involves DWord alignment on the destination, or in
> > other words: no mixed-float instruction can have packed fp16
> > destinations.
> 
> AFAIUI, your reasoning in the paragraph above is correct for BDW
> *only*,
> not for CHV and SKL+ which have an exception to the DWORD channel
> alignment rule (whether the instruction is doing conversions or not)
> in
> the case where the destination is HF, has stride equal to 1, and has
> sub-register offset aligned to 16B.

You mean this rule listed for mixed float mode, right?:

"In Align1, destination stride can be smaller than execution type. When
destination is stride of 1, 16 bit packed data is updated on the
destination. However, output packed f16 data must be oword aligned, no
oword crossing in packed f16."

That brings up another question: my copy of the BDW PRM doesn't list
any restrictions for mixed float mode at all. It only has this:

"Special Requirements for Handling Mixed Mode Float Operations

There are some special requirements for handling mixed mode float
operations for specific projects, described in the following table. If
you are viewing a version of the BSpec limited to other particular
projects, the table may appear with no data rows."

But then there is no table and no restrictions listed... I have been
assuming that this was just bogus since there are a lot of restrictions
listed in the CHV and SKL+ documentation, but from what you say above,
it looks like my guess was incorrect, since you say that the exception
to the Dword channel alignment rule that I quote above doesn't apply to
BDW, so I guess my question now is:

Is my copy of the BDW PRM correct in that there are no restrictions for
mixed-float mode in that platform?

> > But that seems to contradict what various mixed-float restrictions
> > present in the docs that suggest that packed fp16 destinations are
> > possible with mixed- float mode. Here are some examples:
> > 
> 
> AFAICT the reason for the apparent contradiction is that packed fp16
> destinations are allowed for such instruction on CHV/SKL+ as an
> exception to the more strict BDW rule.
> 
> > "No SIMD16 in mixed mode when destination is packed f16 for both
> > Align1
> > and Align16"
> > 
> > "In Align1, destination stride can be smaller than execution type.
> > When
> > destination is stride of 1, 16 bit packed data is updated on the
> > destination. However, output packed f16 data must be oword aligned,
> > no
> > oword crossing in packed f16."
> > 
> > "When source is float or half float from accumulator register and
> > destination is half float with a stride of 1, the source must
> > register
> > aligned. i.e., source must have offset zero."
> > 
> > Since I was only considering that conversion rules only applied to
> > MOV
> > instructions this was not an issue in my series, but if we consider
> > that those rules also apply to implicit conversions from other
> > instructions then it looks like all these restrictions refer to
> > impossible situations, so I am not sure what to do about them...
> > should
> > we just ignore them in the validator?
> > 
> 
> I don't see any reason to ignore them.  The last three restrictions
> you've quoted above probably apply to MOV conversions in the same way
> that they apply to other instructions.  Unfortunately only the last
> one
> you quoted seems to be enforced by the simulator, and it certainly
> kicks
> in for any instructions with an accumulator source and a packed
> half-float destination, including MOV.

Yes, I agree, all these would be compatible with the exception to the
DWord alignment rule you mentioned for SKL+ and CHV, so I think this is
clear now. Thanks!

> > > > > > > Anyway, I'll try to rework the patches to do more generic
> > > > > > > validation of
> > > > > > > HF conversions like you ask and see what kind of code we
> > > > > > > end
> > > > > > > up
> > > > > > > with.
> > > > > > > 
> > > > > > 
> > > > > > Thanks.
> > > > > > 
> > > > > > > [...]
On Tue, 2019-03-12 at 15:46 -0700, Francisco Jerez wrote:
> Iago Toral <itoral@igalia.com> writes:
> 
> > On Wed, 2019-03-06 at 09:21 +0100, Iago Toral wrote:
> > > On Tue, 2019-03-05 at 07:35 +0100, Iago Toral wrote:
> > > > On Mon, 2019-03-04 at 15:36 -0800, Francisco Jerez wrote:
> > > > > Iago Toral <itoral@igalia.com> writes:
> > > > > 
> > > > > > On Fri, 2019-03-01 at 19:04 -0800, Francisco Jerez wrote:
> > > > > > > Iago Toral <itoral@igalia.com> writes:
> > > > > > > 
> > > > > > > > On Thu, 2019-02-28 at 09:54 -0800, Francisco Jerez
> > > > > > > > wrote:
> > > > > > > > > Iago Toral <itoral@igalia.com> writes:
> > > > > > > > > 
> > > > > > > > > > On Wed, 2019-02-27 at 13:47 -0800, Francisco Jerez
> > > > > > > > > > wrote:
> > > > > > > > > > > Iago Toral <itoral@igalia.com> writes:
> > > > > > > > > > > 
> > > > > > > > > > > > On Tue, 2019-02-26 at 14:54 -0800, Francisco
> > > > > > > > > > > > Jerez
> > > > > > > > > > > > wrote:
> > > > > > > > > > > > > Iago Toral Quiroga <itoral@igalia.com>
> > > > > > > > > > > > > writes:
> > > > > > > > > > > > > 
> > > > > > > > > > > > > > ---
> > > > > > > > > > > > > >  src/intel/compiler/brw_eu_validate.c    | 
> > > > > > > > > > > > > >  64
> > > > > > > > > > > > > > ++++++++++++-
> > > > > > > > > > > > > >  src/intel/compiler/test_eu_validate.cpp |
> > > > > > > > > > > > > > 122
> > > > > > > > > > > > > > ++++++++++++++++++++++++
> > > > > > > > > > > > > >  2 files changed, 185 insertions(+), 1
> > > > > > > > > > > > > > deletion(-
> > > > > > > > > > > > > > )
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > diff --git
> > > > > > > > > > > > > > a/src/intel/compiler/brw_eu_validate.c
> > > > > > > > > > > > > > b/src/intel/compiler/brw_eu_validate.c
> > > > > > > > > > > > > > index 000a05cb6ac..203641fecb9 100644
> > > > > > > > > > > > > > --- a/src/intel/compiler/brw_eu_validate.c
> > > > > > > > > > > > > > +++ b/src/intel/compiler/brw_eu_validate.c
> > > > > > > > > > > > > > @@ -531,7 +531,69 @@
> > > > > > > > > > > > > > general_restrictions_based_on_operand_types
> > > > > > > > > > > > > > (con
> > > > > > > > > > > > > > st
> > > > > > > > > > > > > > struct
> > > > > > > > > > > > > > gen_device_info *devinf
> > > > > > > > > > > > > >         exec_type_size == 8 &&
> > > > > > > > > > > > > > dst_type_size ==
> > > > > > > > > > > > > > 4)
> > > > > > > > > > > > > >        dst_type_size = 8;
> > > > > > > > > > > > > >  
> > > > > > > > > > > > > > -   if (exec_type_size > dst_type_size) {
> > > > > > > > > > > > > > +   /* From the BDW+ PRM:
> > > > > > > > > > > > > > +    *
> > > > > > > > > > > > > > +    *    "There is no direct conversion
> > > > > > > > > > > > > > from
> > > > > > > > > > > > > > HF
> > > > > > > > > > > > > > to
> > > > > > > > > > > > > > DF
> > > > > > > > > > > > > > or
> > > > > > > > > > > > > > DF to
> > > > > > > > > > > > > > HF.
> > > > > > > > > > > > > > +    *     There is no direct conversion
> > > > > > > > > > > > > > from
> > > > > > > > > > > > > > HF
> > > > > > > > > > > > > > to
> > > > > > > > > > > > > > Q/UQ or
> > > > > > > > > > > > > > Q/UQ to
> > > > > > > > > > > > > > HF."
> > > > > > > > > > > > > > +    */
> > > > > > > > > > > > > > +   enum brw_reg_type src0_type =
> > > > > > > > > > > > > > brw_inst_src0_type(devinfo,
> > > > > > > > > > > > > > inst);
> > > > > > > > > > > > > > +   ERROR_IF(brw_inst_opcode(devinfo, inst)
> > > > > > > > > > > > > > ==
> > > > > > > > > > > > > > BRW_OPCODE_MOV
> > > > > > > > > > > > > > &&
> > > > > > > > > > > > > 
> > > > > > > > > > > > > Why is only the MOV instruction handled here
> > > > > > > > > > > > > and
> > > > > > > > > > > > > below?  Aren't
> > > > > > > > > > > > > other
> > > > > > > > > > > > > instructions able to do implicit
> > > > > > > > > > > > > conversions?  Probably
> > > > > > > > > > > > > means
> > > > > > > > > > > > > you
> > > > > > > > > > > > > need
> > > > > > > > > > > > > to deal with two sources rather than one.
> > > > > > > > > > > > 
> > > > > > > > > > > > This comes from the programming notes of the
> > > > > > > > > > > > MOV
> > > > > > > > > > > > instruction
> > > > > > > > > > > > (Volume
> > > > > > > > > > > > 2a, Command Reference - Instructions - MOV), so
> > > > > > > > > > > > it
> > > > > > > > > > > > is
> > > > > > > > > > > > described
> > > > > > > > > > > > specifically for the MOV instruction. I should
> > > > > > > > > > > > probably
> > > > > > > > > > > > have
> > > > > > > > > > > > made
> > > > > > > > > > > > this
> > > > > > > > > > > > clear in the comment.
> > > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > Maybe the one above is specified in the MOV page
> > > > > > > > > > > only,
> > > > > > > > > > > probably
> > > > > > > > > > > due
> > > > > > > > > > > to
> > > > > > > > > > > an oversight (If these restrictions were really
> > > > > > > > > > > specific
> > > > > > > > > > > to
> > > > > > > > > > > the
> > > > > > > > > > > MOV
> > > > > > > > > > > instruction, what would prevent you from
> > > > > > > > > > > implementing
> > > > > > > > > > > such
> > > > > > > > > > > conversions
> > > > > > > > > > > through a different instruction?  E.g. "ADD
> > > > > > > > > > > dst:df,
> > > > > > > > > > > src:hf,
> > > > > > > > > > > 0"
> > > > > > > > > > > which
> > > > > > > > > > > would be substantially more efficient than what
> > > > > > > > > > > you're
> > > > > > > > > > > doing
> > > > > > > > > > > in
> > > > > > > > > > > PATCH
> > > > > > > > > > > 02)
> > > > > > > > > > 
> > > > > > > > > > Instructions that take HF can only be strictly HF
> > > > > > > > > > or
> > > > > > > > > > mix
> > > > > > > > > > F
> > > > > > > > > > and
> > > > > > > > > > HF
> > > > > > > > > > (mixed mode float), with MOV being the only
> > > > > > > > > > exception.
> > > > > > > > > > That
> > > > > > > > > > means
> > > > > > > > > > that
> > > > > > > > > > any instruction like the one you use above are
> > > > > > > > > > invalid.
> > > > > > > > > > Maybe
> > > > > > > > > > we
> > > > > > > > > > should
> > > > > > > > > > validate explicitly that instructions that use HF
> > > > > > > > > > are
> > > > > > > > > > strictly
> > > > > > > > > > HF
> > > > > > > > > > or
> > > > > > > > > > mixed-float mode only?
> > > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > So you're acknowledging that the conversions checked
> > > > > > > > > above
> > > > > > > > > are
> > > > > > > > > illegal
> > > > > > > > > whether the instruction is a MOV or something
> > > > > > > > > else.  Where
> > > > > > > > > are we
> > > > > > > > > preventing instructions other than MOV with such
> > > > > > > > > conversions
> > > > > > > > > from
> > > > > > > > > being
> > > > > > > > > accepted by the validator?
> > > > > > > > 
> > > > > > > > We aren't, because the validator is not checking, in
> > > > > > > > general,
> > > > > > > > for
> > > > > > > > accepted type combinations for specific instructions
> > > > > > > > anywhere
> > > > > > > > as
> > > > > > > > far as
> > > > > > > > I can see.
> > > > > > > 
> > > > > > > Luckily these type conversion restrictions aren't really
> > > > > > > specific
> > > > > > > to
> > > > > > > any
> > > > > > > instruction AFAICT, even though they only seem to be
> > > > > > > mentioned
> > > > > > > explicitly for the MOV instruction...
> > > > > > > 
> > > > > > > > If we want to start doing this with HF conversions
> > > > > > > > specifically, I
> > > > > > > > guess it is fine, but in case it is not clear, I think
> > > > > > > > it
> > > > > > > > won't
> > > > > > > > bring
> > > > > > > > any immediate benefits with the
> > > > > > > > VK_KHR_shader_float16_int8
> > > > > > > > implementation since this series only ever emits
> > > > > > > > conversions
> > > > > > > > involving
> > > > > > > > HF operands via MOV instructions,
> > > > > > > 
> > > > > > > Yes, I can see that's the intention of this series, but
> > > > > > > how
> > > > > > > can
> > > > > > > you
> > > > > > > make
> > > > > > > sure that nothing in the optimizer is breaking your
> > > > > > > assumption
> > > > > > > if
> > > > > > > you
> > > > > > > don't add some validator code to verify the claim of your
> > > > > > > last
> > > > > > > paragraph?
> > > > > > > 
> > > > > > > > which is why I thought that validating that no direct
> > > > > > > > MOV
> > > > > > > > conversions
> > > > > > > > from DF/Q types ever happen was useful, since we have
> > > > > > > > code
> > > > > > > > in
> > > > > > > > the
> > > > > > > > driver to handle this scenario.
> > > > > > > > 
> > > > > > > > [...]
> > > > > > > > > 
> > > > > > > > > I still don't understand why you want to implement
> > > > > > > > > the
> > > > > > > > > same
> > > > > > > > > restriction
> > > > > > > > > twice, once for MOV and once for all other mixed-mode
> > > > > > > > > instructions.  How
> > > > > > > > > is that more convenient?
> > > > > > > > 
> > > > > > > > The restrictions based on operand types that are
> > > > > > > > checked in
> > > > > > > > the
> > > > > > > > validator are specific to Byte or cases where the
> > > > > > > > execution
> > > > > > > > type is
> > > > > > > > larger than the destination stride, for which mixed
> > > > > > > > float
> > > > > > > > has
> > > > > > > > a
> > > > > > > > different set of restrictions.
> > > > > > > > 
> > > > > > > > For example, for mixed float we have this rule:
> > > > > > > > 
> > > > > > > > "In Align1, destination stride can be smaller than
> > > > > > > > execution
> > > > > > > > type"
> > > > > > > > 
> > > > > > > > Which overrides this other from "General restrictions
> > > > > > > > based
> > > > > > > > on
> > > > > > > > operand
> > > > > > > > types":
> > > > > > > > 
> > > > > > > > "Destination stride must be equal to the ratio of the
> > > > > > > > sizes
> > > > > > > > of
> > > > > > > > the
> > > > > > > > execution data type to the destination type"
> > > > > > > > 
> > > > > > > > So I thought that it would make things easier to keep
> > > > > > > > all
> > > > > > > > restrictions
> > > > > > > > for mixed float separate and make sure that we skipped
> > > > > > > > mixed
> > > > > > > > float
> > > > > > > > instructions in
> > > > > > > > general_restrictions_based_on_operand_types()
> > > > > > > > so we
> > > > > > > > avoid having to add code to skip individual general
> > > > > > > > restrictions
> > > > > > > > that that are overriden for mixed float mode anyway.
> > > > > > > > 
> > > > > > > 
> > > > > > > I'm fine with that, but it doesn't seem like what this
> > > > > > > patch
> > > > > > > is
> > > > > > > doing...
> > > > > > > Isn't it adding code to validate mixed-mode float MOV
> > > > > > > instructions in
> > > > > > > general_restrictions_based_on_operand_types()?
> > > > > > 
> > > > > > I guess this could be arguable, but I was not considering
> > > > > > conversion
> > > > > > MOVs to be mixed-float instructions. There are two reasons
> > > > > > for
> > > > > > this:
> > > > > > 
> > > > > > A conversion MOV involving F/HF doesn't seem to be
> > > > > > fundamentally
> > > > > > different from any other conversion instruction involving
> > > > > > other
> > > > > > types,
> > > > > > other than the requirement of aligning the destination to a
> > > > > > Dword,
> > > > > > which is not a resriction explictly made for mixed-float
> > > > > > mode.
> > > > > > 
> > > > > > Then, for mixed-float mode, there is this other rule:
> > > > > > 
> > > > > > "In Align1, destination stride can be smaller than
> > > > > > execution
> > > > > > type.
> > > > > > When
> > > > > > destination is stride of 1, 16 bit packed data
> > > > > > is updated on the destination. However, output packed f16
> > > > > > data
> > > > > > must
> > > > > > be
> > > > > > oword aligned, no oword crossing in
> > > > > > packed f16."
> > > > > > 
> > > > > > Which contradicts the other rule that conversions from F to
> > > > > > HF
> > > > > > need
> > > > > > to
> > > > > > be DWord aligned on the destination.
> > > > > > 
> > > > > > So it seems to me that conversion MOVs are not following
> > > > > > the
> > > > > > same
> > > > > > principles of mixed-float instructions and we should skip
> > > > > > validating
> > > > > > mixed-float restrictions for them. What do you think?
> > > > > > 
> > > > > 
> > > > > That all seems fairly ambiguous...  And the restriction on
> > > > > DWORD
> > > > > alignment for conversions includes a mixed-mode ADD
> > > > > instruction
> > > > > among
> > > > > the examples, so I'm skeptical that MOV could be truly
> > > > > fundamentally
> > > > > different.
> > > > 
> > > > Ok... in that case what do we do about the mixed-float
> > > > restriction
> > > > I
> > > > quoted above? Since it is incompatible with the mixed-float MOV
> > > > conversion I guess we only have two options: ether we don't
> > > > validate
> > > > it
> > > > at all or we only validate it for mixed-float instructions that
> > > > are
> > > > not
> > > > MOV. I guess we can do the latter?
> > > 
> > > Also, related to this, if we consider implicit conversions in 2-
> > > src
> > > instructions to also be a target of the generic rules for
> > > conversions
> > > to HF described for CHV and SKL+, then doesn't that imply that
> > > all
> > > mixed-float instructions are conversions and subject to those
> > > rules?
> > > For example:
> > > 
> > > ADD dst:hf  src0:f  src1:f
> > > ADD dst:hf  src0:hf src1:f
> > > ADD dst:hf  src0:f  src1:hf
> > > 
> > > In all 3 instructions above the execution type is F. Should we
> > > consider
> > > all of them implicit conversions? That would mean that any mixed-
> > > float
> > > instruction with HF destination is an implicit conversion from F
> > > to
> > > HF
> > > and therefore would be subject to the generic rules for those
> > > conversions, which at least in the case of CHV and SKL+ involves
> > > DWord
> > > alignment on the destination, or in other words: no mixed-float
> > > instruction can have packed fp16 destinations. But that seems to
> > > contradict what various mixed-float restrictions present in the
> > > docs
> > > that suggest that packed fp16 destinations are possible with
> > > mixed-
> > > float mode. Here are some examples:
> > > 
> > > "No SIMD16 in mixed mode when destination is packed f16 for both
> > > Align1
> > > and Align16"
> > > 
> > > "In Align1, destination stride can be smaller than execution
> > > type.
> > > When
> > > destination is stride of 1, 16 bit packed data is updated on the
> > > destination. However, output packed f16 data must be oword
> > > aligned,
> > > no
> > > oword crossing in packed f16."
> > > 
> > > "When source is float or half float from accumulator register and
> > > destination is half float with a stride of 1, the source must
> > > register
> > > aligned. i.e., source must have offset zero."
> > > 
> > > Since I was only considering that conversion rules only applied
> > > to
> > > MOV
> > > instructions this was not an issue in my series, but if we
> > > consider
> > > that those rules also apply to implicit conversions from other
> > > instructions then it looks like all these restrictions refer to
> > > impossible situations, so I am not sure what to do about them...
> > > should
> > > we just ignore them in the validator?
> > 
> > And I have just noticed another issue with the idea of considering
> > direct conversions involving F/HF mixed-float instructions. There
> > is
> > this other restriction for mixed-float mode:
> > 
> > "No SIMD16 in mixed mode when destination is f32. Instruction
> > Execution
> > size must be no more than 8."
> > 
> > So if we consider direct HF->F conversions mixed-float instructions
> > then we would habe to split all of them to be 8-wide. Obviously, I
> > have
> > not been doing this and I have never seen any issues on any
> > platform
> > because of this.
> > 
> 
> Hm...  That's kind of scary...  Your code seems to be violating this
> and
> the other no-SIMD16 restriction of mixed-mode instructions to my best
> interpretation of the B-Spec.  Unfortunately the simulator doesn't
> seem
> to enforce either of them so I'm not certain whether it's going to
> blow
> up in practice or not.  Maybe implement the restriction in
> get_fpu_lowered_simd_width() to be on the safe side?

Yeah... this is quite unfortunate. I have not seen any problem with
this on any platform I have tested, and I guess that the fact that the
simulator doesn't complain about them might be another hint that this
is actually not a problem but it is annoying to have the docs and the
testing give us different info :-(.

It is particularly bad in this case, because when you start mixing
half-float and float you inevitably get a lot of conversions between
the two types along the way (for example, there are a bunch of
instructions that do not support half-float so we need to convert to
F), so if we decide to split conversions with F destination I imagine
it is going to have a visible  negative effect on performance, which is
particularly bad if it ends up being that there is not such restriction
for MOV in practice after all. Anyway, I suppose taking the safe
approach could make sense...

> > Iago
> > 
> > 
> > 
> > > 
> > > > > > > > Anyway, I'll try to rework the patches to do more
> > > > > > > > generic
> > > > > > > > validation of
> > > > > > > > HF conversions like you ask and see what kind of code
> > > > > > > > we
> > > > > > > > end
> > > > > > > > up
> > > > > > > > with.
> > > > > > > > 
> > > > > > > 
> > > > > > > Thanks.
> > > > > > > 
> > > > > > > > [...]