[Mesa-dev,v2] i965/gen8/cs: Gen8 requires 64 byte alignment for push constant data

Submitted by Jordan Justen on Dec. 16, 2015, 10:48 p.m.

Details

Message ID 145030611094.25955.1216487711015721183@jljusten-ivb
State Accepted
Commit 8326eb13f2b13390a75bc0af2635d7d0385001e8
Headers show
Series "i965/gen8/cs: Gen8 requires 64 byte alignment for push constant data" ( rev: 2 ) in Mesa

Not browsing as part of any series.

Commit Message

Jordan Justen Dec. 16, 2015, 10:48 p.m.
On 2015-12-16 11:39:00, Kenneth Graunke wrote:
> On Wednesday, December 16, 2015 10:02:16 AM Iago Toral Quiroga wrote:
> > The BDW PRM Vol2a: Command Reference: Instructions, section MEDIA_CURBE_LOAD,
> > says that 'CURBE Total Data Length' and 'CURBE Data Start Address' are
> > 64-byte aligned. This is different from previous gens, that were 32-byte
> > aligned.
> > 
> > v2 (Jordan):
> >   - CURBE Data Start Address is also 64-byte aligned.
> >   - The call to brw_state_batch should also use 64-byte alignment.
> >   - Improve PRM reference.
> > 
> > Fixes the following SSBO CTS tests on BDW:
> > ES31-CTS.shader_storage_buffer_object.basic-atomic-case1-cs
> > ES31-CTS.shader_storage_buffer_object.basic-operations-case1-cs
> > ES31-CTS.shader_storage_buffer_object.basic-operations-case2-cs
> > ES31-CTS.shader_storage_buffer_object.basic-stdLayout_UBO_SSBO-case2-cs
> > ES31-CTS.shader_storage_buffer_object.advanced-write-fragment-cs
> > ES31-CTS.shader_storage_buffer_object.advanced-indirectAddressing-case2-cs
> > ES31-CTS.shader_storage_buffer_object.advanced-matrix-cs
> > 
> > And many other CS CTS tests as reported by Marta Lofstedt.
> > ---
> >  src/mesa/drivers/dri/i965/gen7_cs_state.c | 12 ++++++++----
> >  1 file changed, 8 insertions(+), 4 deletions(-)
> > 
> > diff --git a/src/mesa/drivers/dri/i965/gen7_cs_state.c b/src/mesa/drivers/dri/i965/gen7_cs_state.c
> > index 1fde69c..df0f301 100644
> > --- a/src/mesa/drivers/dri/i965/gen7_cs_state.c
> > +++ b/src/mesa/drivers/dri/i965/gen7_cs_state.c
> > @@ -68,7 +68,7 @@ brw_upload_cs_state(struct brw_context *brw)
> >  
> >     uint32_t *bind = (uint32_t*) brw_state_batch(brw, AUB_TRACE_BINDING_TABLE,
> >                                              prog_data->binding_table.size_bytes,
> > -                                            32, &stage_state->bind_bo_offset);
> > +                                            64, &stage_state->bind_bo_offset);
> 
> I don't understand this hunk - binding tables don't have anything to do
> with push constants.  These are for pull constants and UBOs.  At least
> in the 3D pipeline, we only align these to 32B, not 64.

Yeah. I think he wants to update the call you pointed out below in
brw_upload_cs_push_constants.

Also, how about consistently applying the alignment change? Either,
just bump the base and size alignment to 64, or also check the gen to
align the base to 32 on gen7.

How about the attached patch?

-Jordan

> >     unsigned local_id_dwords = 0;
> >  
> > @@ -77,7 +77,8 @@ brw_upload_cs_state(struct brw_context *brw)
> >  
> >     unsigned push_constant_data_size =
> >        (prog_data->nr_params + local_id_dwords) * sizeof(gl_constant_value);
> > -   unsigned reg_aligned_constant_size = ALIGN(push_constant_data_size, 32);
> > +   unsigned reg_aligned_constant_size =
> > +      ALIGN(push_constant_data_size, brw->gen < 8 ? 32 : 64);
> >     unsigned push_constant_regs = reg_aligned_constant_size / 32;
> >     unsigned threads = get_cs_thread_count(cs_prog_data);
> >  
> > @@ -138,11 +139,13 @@ brw_upload_cs_state(struct brw_context *brw)
> >     ADVANCE_BATCH();
> >  
> >     if (reg_aligned_constant_size > 0) {
> > +      const unsigned aligned_push_const_offset =
> > +         ALIGN(stage_state->push_const_offset, brw->gen < 8 ? 32 : 64);
> 
> This is wrong.  What you want is to change:
> 
>       param = (gl_constant_value*)
>          brw_state_batch(brw, type,
>                          reg_aligned_constant_size * threads,
>                          32, &stage_state->push_const_offset);
> 
> to use an alignment of 64 instead of 32 on Gen8+.  That way, it'll
> actually upload the data to a portion of the buffer that starts on
> a 64B aligned boundary.
> 
> As is, you're uploading the data to a 32B aligned section and then
> just fudging the pointer to be 64B aligned, possibly skipping over
> the first 32B.  Probably not what you wanted :)
> 
> Maybe you accidentally changed the wrong brw_state_batch call?
> 
> >        BEGIN_BATCH(4);
> >        OUT_BATCH(MEDIA_CURBE_LOAD << 16 | (4 - 2));
> >        OUT_BATCH(0);
> >        OUT_BATCH(reg_aligned_constant_size * threads);
> > -      OUT_BATCH(stage_state->push_const_offset);
> > +      OUT_BATCH(aligned_push_const_offset);
> >        ADVANCE_BATCH();
> >     }
> >  
> > @@ -241,7 +244,8 @@ brw_upload_cs_push_constants(struct brw_context *brw,
> >  
> >        const unsigned push_constant_data_size =
> >           (local_id_dwords + prog_data->nr_params) * sizeof(gl_constant_value);
> > -      const unsigned reg_aligned_constant_size = ALIGN(push_constant_data_size, 32);
> > +      const unsigned reg_aligned_constant_size =
> > +         ALIGN(push_constant_data_size, brw->gen < 8 ? 32 : 64);
> >        const unsigned param_aligned_count =
> >           reg_aligned_constant_size / sizeof(*param);
> >  
> > 
> 
> 
> _______________________________________________
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Patch hide | download patch | download mbox

diff --git a/src/mesa/drivers/dri/i965/gen7_cs_state.c b/src/mesa/drivers/dri/i965/gen7_cs_state.c
index 1fde69c..a025bb9 100644
--- a/src/mesa/drivers/dri/i965/gen7_cs_state.c
+++ b/src/mesa/drivers/dri/i965/gen7_cs_state.c
@@ -141,7 +141,7 @@  brw_upload_cs_state(struct brw_context *brw)
       BEGIN_BATCH(4);
       OUT_BATCH(MEDIA_CURBE_LOAD << 16 | (4 - 2));
       OUT_BATCH(0);
-      OUT_BATCH(reg_aligned_constant_size * threads);
+      OUT_BATCH(ALIGN(reg_aligned_constant_size * threads, 64));
       OUT_BATCH(stage_state->push_const_offset);
       ADVANCE_BATCH();
    }
@@ -249,8 +249,8 @@  brw_upload_cs_push_constants(struct brw_context *brw,
 
       param = (gl_constant_value*)
          brw_state_batch(brw, type,
-                         reg_aligned_constant_size * threads,
-                         32, &stage_state->push_const_offset);
+                         ALIGN(reg_aligned_constant_size * threads, 64),
+                         64, &stage_state->push_const_offset);
       assert(param);
 
       STATIC_ASSERT(sizeof(gl_constant_value) == sizeof(float));

Comments

On Wed, 2015-12-16 at 14:48 -0800, Jordan Justen wrote:
> On 2015-12-16 11:39:00, Kenneth Graunke wrote:
> > On Wednesday, December 16, 2015 10:02:16 AM Iago Toral Quiroga wrote:
> > > The BDW PRM Vol2a: Command Reference: Instructions, section MEDIA_CURBE_LOAD,
> > > says that 'CURBE Total Data Length' and 'CURBE Data Start Address' are
> > > 64-byte aligned. This is different from previous gens, that were 32-byte
> > > aligned.
> > > 
> > > v2 (Jordan):
> > >   - CURBE Data Start Address is also 64-byte aligned.
> > >   - The call to brw_state_batch should also use 64-byte alignment.
> > >   - Improve PRM reference.
> > > 
> > > Fixes the following SSBO CTS tests on BDW:
> > > ES31-CTS.shader_storage_buffer_object.basic-atomic-case1-cs
> > > ES31-CTS.shader_storage_buffer_object.basic-operations-case1-cs
> > > ES31-CTS.shader_storage_buffer_object.basic-operations-case2-cs
> > > ES31-CTS.shader_storage_buffer_object.basic-stdLayout_UBO_SSBO-case2-cs
> > > ES31-CTS.shader_storage_buffer_object.advanced-write-fragment-cs
> > > ES31-CTS.shader_storage_buffer_object.advanced-indirectAddressing-case2-cs
> > > ES31-CTS.shader_storage_buffer_object.advanced-matrix-cs
> > > 
> > > And many other CS CTS tests as reported by Marta Lofstedt.
> > > ---
> > >  src/mesa/drivers/dri/i965/gen7_cs_state.c | 12 ++++++++----
> > >  1 file changed, 8 insertions(+), 4 deletions(-)
> > > 
> > > diff --git a/src/mesa/drivers/dri/i965/gen7_cs_state.c b/src/mesa/drivers/dri/i965/gen7_cs_state.c
> > > index 1fde69c..df0f301 100644
> > > --- a/src/mesa/drivers/dri/i965/gen7_cs_state.c
> > > +++ b/src/mesa/drivers/dri/i965/gen7_cs_state.c
> > > @@ -68,7 +68,7 @@ brw_upload_cs_state(struct brw_context *brw)
> > >  
> > >     uint32_t *bind = (uint32_t*) brw_state_batch(brw, AUB_TRACE_BINDING_TABLE,
> > >                                              prog_data->binding_table.size_bytes,
> > > -                                            32, &stage_state->bind_bo_offset);
> > > +                                            64, &stage_state->bind_bo_offset);
> > 
> > I don't understand this hunk - binding tables don't have anything to do
> > with push constants.  These are for pull constants and UBOs.  At least
> > in the 3D pipeline, we only align these to 32B, not 64.
> 
> Yeah. I think he wants to update the call you pointed out below in
> brw_upload_cs_push_constants.
> 
> Also, how about consistently applying the alignment change? Either,
> just bump the base and size alignment to 64, or also check the gen to
> align the base to 32 on gen7.
> 
> How about the attached patch?

Yeah, that looks simpler. The patch is:

Tested-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>

Thanks Jordan!

> -Jordan
> 
> > >     unsigned local_id_dwords = 0;
> > >  
> > > @@ -77,7 +77,8 @@ brw_upload_cs_state(struct brw_context *brw)
> > >  
> > >     unsigned push_constant_data_size =
> > >        (prog_data->nr_params + local_id_dwords) * sizeof(gl_constant_value);
> > > -   unsigned reg_aligned_constant_size = ALIGN(push_constant_data_size, 32);
> > > +   unsigned reg_aligned_constant_size =
> > > +      ALIGN(push_constant_data_size, brw->gen < 8 ? 32 : 64);
> > >     unsigned push_constant_regs = reg_aligned_constant_size / 32;
> > >     unsigned threads = get_cs_thread_count(cs_prog_data);
> > >  
> > > @@ -138,11 +139,13 @@ brw_upload_cs_state(struct brw_context *brw)
> > >     ADVANCE_BATCH();
> > >  
> > >     if (reg_aligned_constant_size > 0) {
> > > +      const unsigned aligned_push_const_offset =
> > > +         ALIGN(stage_state->push_const_offset, brw->gen < 8 ? 32 : 64);
> > 
> > This is wrong.  What you want is to change:
> > 
> >       param = (gl_constant_value*)
> >          brw_state_batch(brw, type,
> >                          reg_aligned_constant_size * threads,
> >                          32, &stage_state->push_const_offset);
> > 
> > to use an alignment of 64 instead of 32 on Gen8+.  That way, it'll
> > actually upload the data to a portion of the buffer that starts on
> > a 64B aligned boundary.
> > 
> > As is, you're uploading the data to a 32B aligned section and then
> > just fudging the pointer to be 64B aligned, possibly skipping over
> > the first 32B.  Probably not what you wanted :)
> > 
> > Maybe you accidentally changed the wrong brw_state_batch call?
> > 
> > >        BEGIN_BATCH(4);
> > >        OUT_BATCH(MEDIA_CURBE_LOAD << 16 | (4 - 2));
> > >        OUT_BATCH(0);
> > >        OUT_BATCH(reg_aligned_constant_size * threads);
> > > -      OUT_BATCH(stage_state->push_const_offset);
> > > +      OUT_BATCH(aligned_push_const_offset);
> > >        ADVANCE_BATCH();
> > >     }
> > >  
> > > @@ -241,7 +244,8 @@ brw_upload_cs_push_constants(struct brw_context *brw,
> > >  
> > >        const unsigned push_constant_data_size =
> > >           (local_id_dwords + prog_data->nr_params) * sizeof(gl_constant_value);
> > > -      const unsigned reg_aligned_constant_size = ALIGN(push_constant_data_size, 32);
> > > +      const unsigned reg_aligned_constant_size =
> > > +         ALIGN(push_constant_data_size, brw->gen < 8 ? 32 : 64);
> > >        const unsigned param_aligned_count =
> > >           reg_aligned_constant_size / sizeof(*param);
> > >  
> > > 
> > 
> > 
> > _______________________________________________
> > mesa-dev mailing list
> > mesa-dev@lists.freedesktop.org
> > http://lists.freedesktop.org/mailman/listinfo/mesa-dev