[v3,24/42] intel/compiler: fix ddy for half-float in gen8

Submitted by Iago Toral Quiroga on Jan. 15, 2019, 1:53 p.m.

Details

Message ID 20190115135414.2313-25-itoral@igalia.com
State New
Headers show
Series "intel: VK_KHR_shader_float16_int8 implementation" ( rev: 5 4 ) in Mesa

Not browsing as part of any series.

Commit Message

Iago Toral Quiroga Jan. 15, 2019, 1:53 p.m.
We use ALign16 mode for this, since it is more convenient, but the PRM
for Broadwell states in Volume 3D Media GPGPU, Chapter 'Register region
restrictions', Section '1. Special Restrictions':

   "In Align16 mode, the channel selects and channel enables apply to a
    pair of half-floats, because these parameters are defined for DWord
    elements ONLY. This is applicable when both source and destination
    are half-floats."

This means that we cannot select individual HF elements using swizzles
like we do with 32-bit floats so we can't implement the required
regioning for this.

Use the gen11 path for this instead, which uses Align1 mode.

The restriction is not present in gen9 or gen10, where the Align16
implementation seems to work just fine.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
---
 src/intel/compiler/brw_fs_generator.cpp | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

Patch hide | download patch | download mbox

diff --git a/src/intel/compiler/brw_fs_generator.cpp b/src/intel/compiler/brw_fs_generator.cpp
index d0cc4a6d231..4310f0b7fdc 100644
--- a/src/intel/compiler/brw_fs_generator.cpp
+++ b/src/intel/compiler/brw_fs_generator.cpp
@@ -1339,8 +1339,14 @@  fs_generator::generate_ddy(const fs_inst *inst,
    const uint32_t type_size = type_sz(src.type);
 
    if (inst->opcode == FS_OPCODE_DDY_FINE) {
-      /* produce accurate derivatives */
-      if (devinfo->gen >= 11) {
+      /* produce accurate derivatives. We can do this easily in Align16
+       * but this is not supported in gen11+ and gen8 Align16 swizzles
+       * for Half-Float operands work in units of 32-bit and always
+       * select pairs of consecutive half-float elements, so we can't use
+       * use it for this.
+       */
+      if (devinfo->gen >= 11 ||
+          (devinfo->gen == 8 && src.type == BRW_REGISTER_TYPE_HF)) {
          src = stride(src, 0, 2, 1);
          struct brw_reg src_0  = byte_offset(src,  0 * type_size);
          struct brw_reg src_2  = byte_offset(src,  2 * type_size);

Comments

On Tue, Jan 15, 2019 at 5:54 AM Iago Toral Quiroga <itoral@igalia.com> wrote:
>
> We use ALign16 mode for this, since it is more convenient, but the PRM
> for Broadwell states in Volume 3D Media GPGPU, Chapter 'Register region
> restrictions', Section '1. Special Restrictions':
>
>    "In Align16 mode, the channel selects and channel enables apply to a
>     pair of half-floats, because these parameters are defined for DWord
>     elements ONLY. This is applicable when both source and destination
>     are half-floats."
>
> This means that we cannot select individual HF elements using swizzles
> like we do with 32-bit floats so we can't implement the required
> regioning for this.
>
> Use the gen11 path for this instead, which uses Align1 mode.
>
> The restriction is not present in gen9 or gen10, where the Align16
> implementation seems to work just fine.
>
> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
> ---
>  src/intel/compiler/brw_fs_generator.cpp | 10 ++++++++--
>  1 file changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/src/intel/compiler/brw_fs_generator.cpp b/src/intel/compiler/brw_fs_generator.cpp
> index d0cc4a6d231..4310f0b7fdc 100644
> --- a/src/intel/compiler/brw_fs_generator.cpp
> +++ b/src/intel/compiler/brw_fs_generator.cpp
> @@ -1339,8 +1339,14 @@ fs_generator::generate_ddy(const fs_inst *inst,
>     const uint32_t type_size = type_sz(src.type);
>
>     if (inst->opcode == FS_OPCODE_DDY_FINE) {
> -      /* produce accurate derivatives */
> -      if (devinfo->gen >= 11) {
> +      /* produce accurate derivatives. We can do this easily in Align16
> +       * but this is not supported in gen11+ and gen8 Align16 swizzles
> +       * for Half-Float operands work in units of 32-bit and always
> +       * select pairs of consecutive half-float elements, so we can't use
> +       * use it for this.
> +       */

Let's break this comment up and include (or move) the BSpec text from
the commit message here. I wouldn't mention the "this is not supported
in gen11+" because it's slightly unclear whether you're talking about
"accurate derivatives" or "Align16". How about:


      /* produce accurate derivatives.
       *
       * From the Broadwell PRM, Volume 7 (3D-Media-GPGPU)
       * "Register Region Restrictions", Section "1. Special Restrictions":
       *
       *    "In Align16 mode, the channel selects and channel enables apply to
       *     a pair of half-floats, because these parameters are defined for
       *     DWord elements ONLY. This is applicable when both source and
       *     destination are half-floats."
       *
       * So for half-float operations we use the Gen11+ Align1 path. CHV
       * inherits its FP16 hardware from SKL, so it is not affected.
       */


> +      if (devinfo->gen >= 11 ||
> +          (devinfo->gen == 8 && src.type == BRW_REGISTER_TYPE_HF)) {


The docs are bad about telling us whether a BDW restriction applies to
CHV as well, but in this case I suspect the answer is no. CHV seems to
inherit its FP16 hw from SKL, which doesn't have the restriction as
you say.

So I suspect you want devinfo->is_broadwell instead of devinfo->gen == 8.

With that (and confirmation that CHV isn't affected), this patch is

Reviewed-by: Matt Turner <mattst88@gmail.com>
On Tue, 2019-01-22 at 16:36 -0800, Matt Turner wrote:
> On Tue, Jan 15, 2019 at 5:54 AM Iago Toral Quiroga <itoral@igalia.com
> > wrote:
> > 
> > We use ALign16 mode for this, since it is more convenient, but the
> > PRM
> > for Broadwell states in Volume 3D Media GPGPU, Chapter 'Register
> > region
> > restrictions', Section '1. Special Restrictions':
> > 
> >    "In Align16 mode, the channel selects and channel enables apply
> > to a
> >     pair of half-floats, because these parameters are defined for
> > DWord
> >     elements ONLY. This is applicable when both source and
> > destination
> >     are half-floats."
> > 
> > This means that we cannot select individual HF elements using
> > swizzles
> > like we do with 32-bit floats so we can't implement the required
> > regioning for this.
> > 
> > Use the gen11 path for this instead, which uses Align1 mode.
> > 
> > The restriction is not present in gen9 or gen10, where the Align16
> > implementation seems to work just fine.
> > 
> > Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
> > ---
> >  src/intel/compiler/brw_fs_generator.cpp | 10 ++++++++--
> >  1 file changed, 8 insertions(+), 2 deletions(-)
> > 
> > diff --git a/src/intel/compiler/brw_fs_generator.cpp
> > b/src/intel/compiler/brw_fs_generator.cpp
> > index d0cc4a6d231..4310f0b7fdc 100644
> > --- a/src/intel/compiler/brw_fs_generator.cpp
> > +++ b/src/intel/compiler/brw_fs_generator.cpp
> > @@ -1339,8 +1339,14 @@ fs_generator::generate_ddy(const fs_inst
> > *inst,
> >     const uint32_t type_size = type_sz(src.type);
> > 
> >     if (inst->opcode == FS_OPCODE_DDY_FINE) {
> > -      /* produce accurate derivatives */
> > -      if (devinfo->gen >= 11) {
> > +      /* produce accurate derivatives. We can do this easily in
> > Align16
> > +       * but this is not supported in gen11+ and gen8 Align16
> > swizzles
> > +       * for Half-Float operands work in units of 32-bit and
> > always
> > +       * select pairs of consecutive half-float elements, so we
> > can't use
> > +       * use it for this.
> > +       */
> 
> Let's break this comment up and include (or move) the BSpec text from
> the commit message here. I wouldn't mention the "this is not
> supported
> in gen11+" because it's slightly unclear whether you're talking about
> "accurate derivatives" or "Align16". How about:
> 
> 
>       /* produce accurate derivatives.
>        *
>        * From the Broadwell PRM, Volume 7 (3D-Media-GPGPU)
>        * "Register Region Restrictions", Section "1. Special
> Restrictions":
>        *
>        *    "In Align16 mode, the channel selects and channel enables
> apply to
>        *     a pair of half-floats, because these parameters are
> defined for
>        *     DWord elements ONLY. This is applicable when both source
> and
>        *     destination are half-floats."
>        *
>        * So for half-float operations we use the Gen11+ Align1 path.
> CHV
>        * inherits its FP16 hardware from SKL, so it is not affected.
>        */
> 
> 
> > +      if (devinfo->gen >= 11 ||
> > +          (devinfo->gen == 8 && src.type == BRW_REGISTER_TYPE_HF))
> > {
> 
> 
> The docs are bad about telling us whether a BDW restriction applies
> to
> CHV as well, but in this case I suspect the answer is no. CHV seems
> to
> inherit its FP16 hw from SKL, which doesn't have the restriction as
> you say.
> 
> So I suspect you want devinfo->is_broadwell instead of devinfo->gen
> == 8.

I have just confirmed that your suspicion is correct, thanks for
bringing this up!

> With that (and confirmation that CHV isn't affected), this patch is
> 
> Reviewed-by: Matt Turner <mattst88@gmail.com>
>