[Mesa-dev,36/95] i965/vec4: add a helper function to create double immediates

Submitted by Iago Toral Quiroga on July 19, 2016, 10:40 a.m.

Details

Message ID 1468924892-6910-37-git-send-email-itoral@igalia.com
State New
Headers show
Series "i965 Haswell ARB_gpu_shader_fp64 / OpenGL 4.0" ( rev: 2 1 ) in Mesa

Not browsing as part of any series.

Commit Message

Iago Toral Quiroga July 19, 2016, 10:40 a.m.
Gen7 hardware does not support double immediates so these need
to be moved in 32-bit chunks to a regular vgrf instead. Instead
of doing this every time we need to create a DF immediate,
create a helper function that does the right thing depending
on the hardware generation.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
---
 src/mesa/drivers/dri/i965/brw_vec4.h       |  2 ++
 src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 40 ++++++++++++++++++++++++++++++
 2 files changed, 42 insertions(+)

Patch hide | download patch | download mbox

diff --git a/src/mesa/drivers/dri/i965/brw_vec4.h b/src/mesa/drivers/dri/i965/brw_vec4.h
index 4650ae0..cf7cdab 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.h
+++ b/src/mesa/drivers/dri/i965/brw_vec4.h
@@ -320,6 +320,8 @@  public:
    void emit_single_to_double(dst_reg dst, src_reg src, bool saturate,
                               brw_reg_type single_type);
 
+   src_reg setup_imm_df(double v);
+
    virtual void emit_nir_code();
    virtual void nir_setup_uniforms();
    virtual void nir_setup_system_value_intrinsic(nir_intrinsic_instr *instr);
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
index d7c6bf4..8a45fde 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
@@ -1063,6 +1063,46 @@  vec4_visitor::emit_single_to_double(dst_reg dst, src_reg src, bool saturate,
    inst->saturate = saturate;
 }
 
+src_reg
+vec4_visitor::setup_imm_df(double v)
+{
+   assert(devinfo->gen >= 7);
+
+   if (devinfo->gen >= 8)
+      return brw_imm_df(v);
+
+   /* gen7 does not support DF immediates */
+   union {
+      double d;
+      struct {
+         uint32_t i1;
+         uint32_t i2;
+      };
+   } di;
+
+   di.d = v;
+
+   /* Write the low 32-bit of the constant to the X:UD channel and the
+    * high 32-bit to the Y:UD channel to build the constant in a VGRF.
+    * We have to do this twice (offset 0 and offset 1), since a DF VGRF takes
+    * two SIMD8 registers in SIMD4x2 execution. Finally, return a swizzle
+    * XXXX so any access to the VGRF only reads the constant data in these
+    * channels.
+    */
+   dst_reg tmp = dst_reg(VGRF, alloc.allocate(2));
+   tmp.type = BRW_REGISTER_TYPE_UD;
+   for (int n = 0; n < 2; n++) {
+      tmp.writemask = WRITEMASK_X;
+      emit(MOV(offset(tmp, n), brw_imm_ud(di.i1)))->force_writemask_all = true;
+      tmp.writemask = WRITEMASK_Y;
+      emit(MOV(offset(tmp, n), brw_imm_ud(di.i2)))->force_writemask_all = true;
+   }
+
+   src_reg tmp_as_src = src_reg(retype(tmp, BRW_REGISTER_TYPE_DF));
+   tmp_as_src.swizzle = BRW_SWIZZLE_XXXX;
+   return tmp_as_src;
+}
+
 void
 vec4_visitor::nir_emit_alu(nir_alu_instr *instr)
 {

Comments

Iago Toral Quiroga <itoral@igalia.com> writes:

> Gen7 hardware does not support double immediates so these need
> to be moved in 32-bit chunks to a regular vgrf instead. Instead
> of doing this every time we need to create a DF immediate,
> create a helper function that does the right thing depending
> on the hardware generation.
>
> Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
> ---
>  src/mesa/drivers/dri/i965/brw_vec4.h       |  2 ++
>  src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 40 ++++++++++++++++++++++++++++++
>  2 files changed, 42 insertions(+)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_vec4.h b/src/mesa/drivers/dri/i965/brw_vec4.h
> index 4650ae0..cf7cdab 100644
> --- a/src/mesa/drivers/dri/i965/brw_vec4.h
> +++ b/src/mesa/drivers/dri/i965/brw_vec4.h
> @@ -320,6 +320,8 @@ public:
>     void emit_single_to_double(dst_reg dst, src_reg src, bool saturate,
>                                brw_reg_type single_type);
>  
> +   src_reg setup_imm_df(double v);
> +
>     virtual void emit_nir_code();
>     virtual void nir_setup_uniforms();
>     virtual void nir_setup_system_value_intrinsic(nir_intrinsic_instr *instr);
> diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
> index d7c6bf4..8a45fde 100644
> --- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
> @@ -1063,6 +1063,46 @@ vec4_visitor::emit_single_to_double(dst_reg dst, src_reg src, bool saturate,
>     inst->saturate = saturate;
>  }
>  
> +src_reg
> +vec4_visitor::setup_imm_df(double v)
> +{
> +   assert(devinfo->gen >= 7);
> +
> +   if (devinfo->gen >= 8)
> +      return brw_imm_df(v);
> +
> +   /* gen7 does not support DF immediates */
> +   union {
> +      double d;
> +      struct {
> +         uint32_t i1;
> +         uint32_t i2;
> +      };
> +   } di;
> +
> +   di.d = v;
> +
> +   /* Write the low 32-bit of the constant to the X:UD channel and the
> +    * high 32-bit to the Y:UD channel to build the constant in a VGRF.
> +    * We have to do this twice (offset 0 and offset 1), since a DF VGRF takes
> +    * two SIMD8 registers in SIMD4x2 execution. Finally, return a swizzle
> +    * XXXX so any access to the VGRF only reads the constant data in these
> +    * channels.
> +    */
> +   dst_reg tmp = dst_reg(VGRF, alloc.allocate(2));
> +   tmp.type = BRW_REGISTER_TYPE_UD;
> +   for (int n = 0; n < 2; n++) {
> +      tmp.writemask = WRITEMASK_X;

If you used the writemask helper instead the tmp register could be
declared const.

> +      emit(MOV(offset(tmp, n), brw_imm_ud(di.i1)))->force_writemask_all = true;
> +      tmp.writemask = WRITEMASK_Y;
> +      emit(MOV(offset(tmp, n), brw_imm_ud(di.i2)))->force_writemask_all = true;

Because you're adding support for arbitrary exec sizes to the VEC4
back-end, wouldn't it make sense to use SIMD16 instructions here instead
of the loop?  (At least on HSW, on IVB it would end up being lowered
into four uncompressed instructions anyway)

> +   }
> +
> +   src_reg tmp_as_src = src_reg(retype(tmp, BRW_REGISTER_TYPE_DF));
> +   tmp_as_src.swizzle = BRW_SWIZZLE_XXXX;
> +   return tmp_as_src;

You could use the swizzle() helper instead of the last three lines:

| return swizzle(retype(tmp, BRW_REGISTER_TYPE_DF), BRW_SWIZZLE_XXX));

> +}
> +
>  void
>  vec4_visitor::nir_emit_alu(nir_alu_instr *instr)
>  {
> -- 
> 2.7.4
>
> _______________________________________________
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
On Wed, 2016-08-03 at 13:28 -0700, Francisco Jerez wrote:
> Iago Toral Quiroga <itoral@igalia.com> writes:
> 
> > 
> > Gen7 hardware does not support double immediates so these need
> > to be moved in 32-bit chunks to a regular vgrf instead. Instead
> > of doing this every time we need to create a DF immediate,
> > create a helper function that does the right thing depending
> > on the hardware generation.
> > 
> > Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
> > ---
> >  src/mesa/drivers/dri/i965/brw_vec4.h       |  2 ++
> >  src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 40
> > ++++++++++++++++++++++++++++++
> >  2 files changed, 42 insertions(+)
> > 
> > diff --git a/src/mesa/drivers/dri/i965/brw_vec4.h
> > b/src/mesa/drivers/dri/i965/brw_vec4.h
> > index 4650ae0..cf7cdab 100644
> > --- a/src/mesa/drivers/dri/i965/brw_vec4.h
> > +++ b/src/mesa/drivers/dri/i965/brw_vec4.h
> > @@ -320,6 +320,8 @@ public:
> >     void emit_single_to_double(dst_reg dst, src_reg src, bool
> > saturate,
> >                                brw_reg_type single_type);
> >  
> > +   src_reg setup_imm_df(double v);
> > +
> >     virtual void emit_nir_code();
> >     virtual void nir_setup_uniforms();
> >     virtual void
> > nir_setup_system_value_intrinsic(nir_intrinsic_instr *instr);
> > diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
> > b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
> > index d7c6bf4..8a45fde 100644
> > --- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
> > +++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
> > @@ -1063,6 +1063,46 @@ vec4_visitor::emit_single_to_double(dst_reg
> > dst, src_reg src, bool saturate,
> >     inst->saturate = saturate;
> >  }
> >  
> > +src_reg
> > +vec4_visitor::setup_imm_df(double v)
> > +{
> > +   assert(devinfo->gen >= 7);
> > +
> > +   if (devinfo->gen >= 8)
> > +      return brw_imm_df(v);
> > +
> > +   /* gen7 does not support DF immediates */
> > +   union {
> > +      double d;
> > +      struct {
> > +         uint32_t i1;
> > +         uint32_t i2;
> > +      };
> > +   } di;
> > +
> > +   di.d = v;
> > +
> > +   /* Write the low 32-bit of the constant to the X:UD channel and
> > the
> > +    * high 32-bit to the Y:UD channel to build the constant in a
> > VGRF.
> > +    * We have to do this twice (offset 0 and offset 1), since a DF
> > VGRF takes
> > +    * two SIMD8 registers in SIMD4x2 execution. Finally, return a
> > swizzle
> > +    * XXXX so any access to the VGRF only reads the constant data
> > in these
> > +    * channels.
> > +    */
> > +   dst_reg tmp = dst_reg(VGRF, alloc.allocate(2));
> > +   tmp.type = BRW_REGISTER_TYPE_UD;
> > +   for (int n = 0; n < 2; n++) {
> > +      tmp.writemask = WRITEMASK_X;
> If you used the writemask helper instead the tmp register could be
> declared const.

Sure.

> > 
> > +      emit(MOV(offset(tmp, n), brw_imm_ud(di.i1)))-
> > >force_writemask_all = true;
> > +      tmp.writemask = WRITEMASK_Y;
> > +      emit(MOV(offset(tmp, n), brw_imm_ud(di.i2)))-
> > >force_writemask_all = true;
> Because you're adding support for arbitrary exec sizes to the VEC4
> back-end, wouldn't it make sense to use SIMD16 instructions here
> instead
> of the loop?  (At least on HSW, on IVB it would end up being lowered
> into four uncompressed instructions anyway)

Actually, the current implementation in this series was only using 8-
wide and 4-wide instructions and there are probably a few places where
I built on that assumption. I did not think that 16-wide was going to
be a common thing in SIMD4x2 operation so I decided to keep it simple.

That said, and seeing how you also commented on making the simd
lowering pass more flexible to support 16-wide, I'll give it a go.

> > 
> > +   }
> > +
> > +   src_reg tmp_as_src = src_reg(retype(tmp,
> > BRW_REGISTER_TYPE_DF));
> > +   tmp_as_src.swizzle = BRW_SWIZZLE_XXXX;
> > +   return tmp_as_src;
> You could use the swizzle() helper instead of the last three lines:

Yes, I could not do it before because of the writemask assignments to
tmp but with the changes you suggested above it should be fine. Thanks!

> > 
> > return swizzle(retype(tmp, BRW_REGISTER_TYPE_DF),
> > BRW_SWIZZLE_XXX));
> > 
> > +}
> > +
> >  void
> >  vec4_visitor::nir_emit_alu(nir_alu_instr *instr)
> >  {