drm/i915/guc: Don't make assumptions while getting the lrca offset

Submitted by Michel Thierry on July 11, 2017, 9:29 p.m.

Details

Message ID 20170711212939.12185-1-michel.thierry@intel.com
State New
Headers show
Series "drm/i915/guc: Don't make assumptions while getting the lrca offset" ( rev: 1 ) in Intel GFX

Not browsing as part of any series.

Commit Message

Michel Thierry July 11, 2017, 9:29 p.m.
Using the HWSP ggtt_offset to get the lrca offset is only correct if the
HWSP happens to be before it (when we reuse the PPHWSP of the kernel
context as the engine HWSP). Instead of making this assumption, get the
lrca offset from the kernel_context engine state.

And while looking at this part of the GuC interaction, it was also
noticed that the firmware expects the size of only the engine context
(context minus the execlist part, i.e. don't include the first 80
dwords), so pass the right size.

Reported-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Oscar Mateo <oscar.mateo@intel.com>
---
 drivers/gpu/drm/i915/i915_guc_submission.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

Patch hide | download patch | download mbox

diff --git a/drivers/gpu/drm/i915/i915_guc_submission.c b/drivers/gpu/drm/i915/i915_guc_submission.c
index 48a1e9349a2c..97ec11a7cc9a 100644
--- a/drivers/gpu/drm/i915/i915_guc_submission.c
+++ b/drivers/gpu/drm/i915/i915_guc_submission.c
@@ -1018,6 +1018,12 @@  static void guc_policies_init(struct guc_policies *policies)
 	policies->is_valid = 1;
 }
 
+/*
+ * The first 80 dwords of the register state context, containing the
+ * execlists and ppgtt registers.
+ */
+#define LR_HW_CONTEXT_SIZE	(80 * sizeof(u32))
+
 static int guc_ads_create(struct intel_guc *guc)
 {
 	struct drm_i915_private *dev_priv = guc_to_i915(guc);
@@ -1033,6 +1039,7 @@  static int guc_ads_create(struct intel_guc *guc)
 	struct intel_engine_cs *engine;
 	enum intel_engine_id id;
 	u32 base;
+	u32 lrca_offset = LRC_PPHWSP_PN * PAGE_SIZE;
 
 	GEM_BUG_ON(guc->ads_vma);
 
@@ -1062,13 +1069,15 @@  static int guc_ads_create(struct intel_guc *guc)
 	 * engines after a reset. Here we use the Render ring default
 	 * context, which must already exist and be pinned in the GGTT,
 	 * so its address won't change after we've told the GuC where
-	 * to find it.
+	 * to find it. GuC will skip the PPHWSP and 'Execlist Context',
+	 * thus reduce the size by 1 page (PPHWSP) and 80 dwords.
 	 */
 	blob->ads.golden_context_lrca =
-		dev_priv->engine[RCS]->status_page.ggtt_offset;
+		guc_ggtt_offset(dev_priv->kernel_context->engine[RCS].state) + lrca_offset;
 
+	/* context_size - PPHWSP - first 80 dwords */
 	for_each_engine(engine, dev_priv, id)
-		blob->ads.eng_state_size[engine->guc_id] = engine->context_size;
+		blob->ads.eng_state_size[engine->guc_id] = engine->context_size - lrca_offset - LR_HW_CONTEXT_SIZE;
 
 	base = guc_ggtt_offset(vma);
 	blob->ads.scheduler_policies = base + ptr_offset(blob, policies);

Comments

Quoting Michel Thierry (2017-07-11 22:29:39)
> Using the HWSP ggtt_offset to get the lrca offset is only correct if the
> HWSP happens to be before it (when we reuse the PPHWSP of the kernel
> context as the engine HWSP). Instead of making this assumption, get the
> lrca offset from the kernel_context engine state.
> 
> And while looking at this part of the GuC interaction, it was also
> noticed that the firmware expects the size of only the engine context
> (context minus the execlist part, i.e. don't include the first 80
> dwords), so pass the right size.
> 
> Reported-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> Signed-off-by: Michel Thierry <michel.thierry@intel.com>
> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
> Cc: Oscar Mateo <oscar.mateo@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_guc_submission.c | 15 ++++++++++++---
>  1 file changed, 12 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_guc_submission.c b/drivers/gpu/drm/i915/i915_guc_submission.c
> index 48a1e9349a2c..97ec11a7cc9a 100644
> --- a/drivers/gpu/drm/i915/i915_guc_submission.c
> +++ b/drivers/gpu/drm/i915/i915_guc_submission.c
> @@ -1018,6 +1018,12 @@ static void guc_policies_init(struct guc_policies *policies)
>         policies->is_valid = 1;
>  }
>  
> +/*
> + * The first 80 dwords of the register state context, containing the
> + * execlists and ppgtt registers.
> + */
> +#define LR_HW_CONTEXT_SIZE     (80 * sizeof(u32))
> +
>  static int guc_ads_create(struct intel_guc *guc)
>  {
>         struct drm_i915_private *dev_priv = guc_to_i915(guc);
> @@ -1033,6 +1039,7 @@ static int guc_ads_create(struct intel_guc *guc)
>         struct intel_engine_cs *engine;
>         enum intel_engine_id id;
>         u32 base;
> +       u32 lrca_offset = LRC_PPHWSP_PN * PAGE_SIZE;

Make it const and try to avoid the visual breaks in line length? (Whilst
also trying not to gratuitously mix types.)
>  
>         GEM_BUG_ON(guc->ads_vma);
>  
> @@ -1062,13 +1069,15 @@ static int guc_ads_create(struct intel_guc *guc)
>          * engines after a reset. Here we use the Render ring default
>          * context, which must already exist and be pinned in the GGTT,
>          * so its address won't change after we've told the GuC where
> -        * to find it.
> +        * to find it. GuC will skip the PPHWSP and 'Execlist Context',
> +        * thus reduce the size by 1 page (PPHWSP) and 80 dwords.
>          */
>         blob->ads.golden_context_lrca =
> -               dev_priv->engine[RCS]->status_page.ggtt_offset;
> +               guc_ggtt_offset(dev_priv->kernel_context->engine[RCS].state) + lrca_offset;
>  
> +       /* context_size - PPHWSP - first 80 dwords */
>         for_each_engine(engine, dev_priv, id)
> -               blob->ads.eng_state_size[engine->guc_id] = engine->context_size;
> +               blob->ads.eng_state_size[engine->guc_id] = engine->context_size - lrca_offset - LR_HW_CONTEXT_SIZE;

The first LRC_PPHWSP_PN pages are not included in engine->context_size,
they are added on by execlist_context_deferred_alloc().

I'm finding it odd that we pass in a pointer to the start of the context
image, but that the guc wants to be told the size of the context minus
the bytes that *it* is choosing to skip at the start. (May be correct,
but looks weird and merits a warning.) The alternative explanation is
that it is not skipping the first 80 dwords, but the last.
-Chris
On 7/11/2017 3:37 PM, Chris Wilson wrote:
> Quoting Michel Thierry (2017-07-11 22:29:39)
>> Using the HWSP ggtt_offset to get the lrca offset is only correct if the
>> HWSP happens to be before it (when we reuse the PPHWSP of the kernel
>> context as the engine HWSP). Instead of making this assumption, get the
>> lrca offset from the kernel_context engine state.
>>
>> And while looking at this part of the GuC interaction, it was also
>> noticed that the firmware expects the size of only the engine context
>> (context minus the execlist part, i.e. don't include the first 80
>> dwords), so pass the right size.
>>
>> Reported-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
>> Signed-off-by: Michel Thierry <michel.thierry@intel.com>
>> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
>> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
>> Cc: Oscar Mateo <oscar.mateo@intel.com>
>> ---
>>   drivers/gpu/drm/i915/i915_guc_submission.c | 15 ++++++++++++---
>>   1 file changed, 12 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_guc_submission.c b/drivers/gpu/drm/i915/i915_guc_submission.c
>> index 48a1e9349a2c..97ec11a7cc9a 100644
>> --- a/drivers/gpu/drm/i915/i915_guc_submission.c
>> +++ b/drivers/gpu/drm/i915/i915_guc_submission.c
>> @@ -1018,6 +1018,12 @@ static void guc_policies_init(struct guc_policies *policies)
>>          policies->is_valid = 1;
>>   }
>>   
>> +/*
>> + * The first 80 dwords of the register state context, containing the
>> + * execlists and ppgtt registers.
>> + */
>> +#define LR_HW_CONTEXT_SIZE     (80 * sizeof(u32))
>> +
>>   static int guc_ads_create(struct intel_guc *guc)
>>   {
>>          struct drm_i915_private *dev_priv = guc_to_i915(guc);
>> @@ -1033,6 +1039,7 @@ static int guc_ads_create(struct intel_guc *guc)
>>          struct intel_engine_cs *engine;
>>          enum intel_engine_id id;
>>          u32 base;
>> +       u32 lrca_offset = LRC_PPHWSP_PN * PAGE_SIZE;
> 
> Make it const and try to avoid the visual breaks in line length? (Whilst
> also trying not to gratuitously mix types.)

ok

>>   
>>          GEM_BUG_ON(guc->ads_vma);
>>   
>> @@ -1062,13 +1069,15 @@ static int guc_ads_create(struct intel_guc *guc)
>>           * engines after a reset. Here we use the Render ring default
>>           * context, which must already exist and be pinned in the GGTT,
>>           * so its address won't change after we've told the GuC where
>> -        * to find it.
>> +        * to find it. GuC will skip the PPHWSP and 'Execlist Context',
>> +        * thus reduce the size by 1 page (PPHWSP) and 80 dwords.
>>           */
>>          blob->ads.golden_context_lrca =
>> -               dev_priv->engine[RCS]->status_page.ggtt_offset;
>> +               guc_ggtt_offset(dev_priv->kernel_context->engine[RCS].state) + lrca_offset;
>>   
>> +       /* context_size - PPHWSP - first 80 dwords */
>>          for_each_engine(engine, dev_priv, id)
>> -               blob->ads.eng_state_size[engine->guc_id] = engine->context_size;
>> +               blob->ads.eng_state_size[engine->guc_id] = engine->context_size - lrca_offset - LR_HW_CONTEXT_SIZE;
> 
> The first LRC_PPHWSP_PN pages are not included in engine->context_size,
> they are added on by execlist_context_deferred_alloc().
> 

I think there is only one extra page added in 
execlists_context_deferred_alloc, and it is the "GuC shared data". And 
yes, we don't include that one in in the context_size.
(also __intel_engine_context_size has a comment saying the size includes 
the HWSP).

> I'm finding it odd that we pass in a pointer to the start of the context
> image, but that the guc wants to be told the size of the context minus
> the bytes that *it* is choosing to skip at the start. (May be correct,
> but looks weird and merits a warning.) The alternative explanation is
> that it is not skipping the first 80 dwords, but the last.

Yes, it's really odd style. Daniele & I looked at it, and guc takes the 
pointer and uses "size of the PPHWSP + the size of the execlist_context 
(those 80 dwords)"  as the starting point of the engine-state... that's 
why ads.eng_state_size needs to be decreased.

Anyway, since we always round up the ctx size to the next page, I
imagine we've been lucky and there are enough 0's at the end of the context.

> -Chris
> 

Thanks!
Quoting Michel Thierry (2017-07-12 00:01:04)
> On 7/11/2017 3:37 PM, Chris Wilson wrote:
> > Quoting Michel Thierry (2017-07-11 22:29:39)
> >> Using the HWSP ggtt_offset to get the lrca offset is only correct if the
> >> HWSP happens to be before it (when we reuse the PPHWSP of the kernel
> >> context as the engine HWSP). Instead of making this assumption, get the
> >> lrca offset from the kernel_context engine state.
> >>
> >> And while looking at this part of the GuC interaction, it was also
> >> noticed that the firmware expects the size of only the engine context
> >> (context minus the execlist part, i.e. don't include the first 80
> >> dwords), so pass the right size.
> >>
> >> Reported-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> >> Signed-off-by: Michel Thierry <michel.thierry@intel.com>
> >> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> >> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
> >> Cc: Oscar Mateo <oscar.mateo@intel.com>
> >> ---
> >>   drivers/gpu/drm/i915/i915_guc_submission.c | 15 ++++++++++++---
> >>   1 file changed, 12 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/drivers/gpu/drm/i915/i915_guc_submission.c b/drivers/gpu/drm/i915/i915_guc_submission.c
> >> index 48a1e9349a2c..97ec11a7cc9a 100644
> >> --- a/drivers/gpu/drm/i915/i915_guc_submission.c
> >> +++ b/drivers/gpu/drm/i915/i915_guc_submission.c
> >> @@ -1018,6 +1018,12 @@ static void guc_policies_init(struct guc_policies *policies)
> >>          policies->is_valid = 1;
> >>   }
> >>   
> >> +/*
> >> + * The first 80 dwords of the register state context, containing the
> >> + * execlists and ppgtt registers.
> >> + */
> >> +#define LR_HW_CONTEXT_SIZE     (80 * sizeof(u32))
> >> +
> >>   static int guc_ads_create(struct intel_guc *guc)
> >>   {
> >>          struct drm_i915_private *dev_priv = guc_to_i915(guc);
> >> @@ -1033,6 +1039,7 @@ static int guc_ads_create(struct intel_guc *guc)
> >>          struct intel_engine_cs *engine;
> >>          enum intel_engine_id id;
> >>          u32 base;
> >> +       u32 lrca_offset = LRC_PPHWSP_PN * PAGE_SIZE;
> > 
> > Make it const and try to avoid the visual breaks in line length? (Whilst
> > also trying not to gratuitously mix types.)
> 
> ok
> 
> >>   
> >>          GEM_BUG_ON(guc->ads_vma);
> >>   
> >> @@ -1062,13 +1069,15 @@ static int guc_ads_create(struct intel_guc *guc)
> >>           * engines after a reset. Here we use the Render ring default
> >>           * context, which must already exist and be pinned in the GGTT,
> >>           * so its address won't change after we've told the GuC where
> >> -        * to find it.
> >> +        * to find it. GuC will skip the PPHWSP and 'Execlist Context',
> >> +        * thus reduce the size by 1 page (PPHWSP) and 80 dwords.
> >>           */
> >>          blob->ads.golden_context_lrca =
> >> -               dev_priv->engine[RCS]->status_page.ggtt_offset;
> >> +               guc_ggtt_offset(dev_priv->kernel_context->engine[RCS].state) + lrca_offset;
> >>   
> >> +       /* context_size - PPHWSP - first 80 dwords */
> >>          for_each_engine(engine, dev_priv, id)
> >> -               blob->ads.eng_state_size[engine->guc_id] = engine->context_size;
> >> +               blob->ads.eng_state_size[engine->guc_id] = engine->context_size - lrca_offset - LR_HW_CONTEXT_SIZE;
> > 
> > The first LRC_PPHWSP_PN pages are not included in engine->context_size,
> > they are added on by execlist_context_deferred_alloc().
> > 
> 
> I think there is only one extra page added in 
> execlists_context_deferred_alloc, and it is the "GuC shared data". And 
> yes, we don't include that one in in the context_size.
> (also __intel_engine_context_size has a comment saying the size includes 
> the HWSP).

Oh, I see what my confusion is. You are using lrca_offset for two very
different meanings just because they are the same value. (And also the
use of LRC_PPHWSP_PN for execlists_context_deferred_alloc() has the same
dubious semantics, which lead me to think that PPHWSP couldn't possibly
mean per-process hwsp anymore!) The first is an offset from the start of
the allocation to the HWSP, which is the correct meaning of LRC_PPHWSP_PN
(that is, its page index). In second case, you just mean PAGE_SIZE to
remove the single page being skipped. Please, I am easily confused!
That does strongly suggest the comment needs an actual sentence and not
just rehashing of the code:

/*
 * The GuC expects us to exclude the portion of the context image that
 * it skips from the size it is to read. It starts reading from after
 * the execlist context (so skipping the first page [PPHWSP] and 80
 * dwords). Weird guc is weird.
 */
for_each_engine(engine) {
	const u32 skipped = PAGE_SIZE + LR_HW_CONTEXT_SIZE;
	blob->ads.eng_state_size[engine->guc_id] = engine->context_size - skipped;
}

Also feel free to try and fixup the other confusion in intel_lrc.c.
Perhaps something like:
Quoting Chris Wilson (2017-07-12 01:00:02)
> Also feel free to try and fixup the other confusion in intel_lrc.c.
> Perhaps something like:
> 
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index b0738d2..f498aa6 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -2031,8 +2031,9 @@ static int execlists_context_deferred_alloc(struct i915_gem_context *ctx,
>  
>         context_size = round_up(engine->context_size, I915_GTT_PAGE_SIZE);
>  
> -       /* One extra page as the sharing data between driver and GuC */
> -       context_size += PAGE_SIZE * LRC_PPHWSP_PN;
> +       /* Add an extra page at the start for sharing data with the GuC */
> +       BUILD_BUG_ON(LRC_PPHWSP_PN != 1);
> +       context_size += PAGE_SIZE;
>  
>         ctx_obj = i915_gem_object_create(ctx->i915, context_size);
>         if (IS_ERR(ctx_obj)) {

Or better


Nowhere do we use LRC_GUCSHR_PN. Having it just gives us the mirage of
flexibility...
-Chris
On 7/11/2017 5:07 PM, Chris Wilson wrote:
> Quoting Chris Wilson (2017-07-12 01:00:02)
>> Also feel free to try and fixup the other confusion in intel_lrc.c.
>> Perhaps something like:
>>
>> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
>> index b0738d2..f498aa6 100644
>> --- a/drivers/gpu/drm/i915/intel_lrc.c
>> +++ b/drivers/gpu/drm/i915/intel_lrc.c
>> @@ -2031,8 +2031,9 @@ static int execlists_context_deferred_alloc(struct i915_gem_context *ctx,
>>   
>>          context_size = round_up(engine->context_size, I915_GTT_PAGE_SIZE);
>>   
>> -       /* One extra page as the sharing data between driver and GuC */
>> -       context_size += PAGE_SIZE * LRC_PPHWSP_PN;
>> +       /* Add an extra page at the start for sharing data with the GuC */
>> +       BUILD_BUG_ON(LRC_PPHWSP_PN != 1);
>> +       context_size += PAGE_SIZE;
>>   
>>          ctx_obj = i915_gem_object_create(ctx->i915, context_size);
>>          if (IS_ERR(ctx_obj)) {
> 
> Or better
> 
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index b0738d2..0b7fb38 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -2031,8 +2031,8 @@ static int execlists_context_deferred_alloc(struct i915_gem_context *ctx,
>   
>          context_size = round_up(engine->context_size, I915_GTT_PAGE_SIZE);
>   
> -       /* One extra page as the sharing data between driver and GuC */
> -       context_size += PAGE_SIZE * LRC_PPHWSP_PN;
> +       /* Add an extra page at the start for sharing data with the GuC */
> +       context_size += LRC_GUCSHR_PAGES * PAGE_SIZE;
>   
>          ctx_obj = i915_gem_object_create(ctx->i915, context_size);
>          if (IS_ERR(ctx_obj)) {
> diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
> index 52b3a1f..848ca21 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.h
> +++ b/drivers/gpu/drm/i915/intel_lrc.h
> @@ -72,7 +72,8 @@ int logical_xcs_ring_init(struct intel_engine_cs *engine);
>   
>   /* One extra page is added before LRC for GuC as shared data */
>   #define LRC_GUCSHR_PN  (0)
> -#define LRC_PPHWSP_PN  (LRC_GUCSHR_PN + 1)
> +#define LRC_GUCSHR_PAGES 1
> +#define LRC_PPHWSP_PN  (LRC_GUCSHR_PN + LRC_GUCSHR_PAGES)
>   #define LRC_STATE_PN   (LRC_PPHWSP_PN + 1)
>   
>   struct drm_i915_private;
> 
> Nowhere do we use LRC_GUCSHR_PN. Having it just gives us the mirage of
> flexibility...

LRC_GUCSHR_PN is not used because people assumed it is always page 0 of 
the kernel ctx :S

One more thing to fix, use LRC_GUCSHR_PN in all the places where this 
_shared_ GuC page is used (guc_ggtt_offset(ctx->engine[RCS].state))

> -Chris
>
Quoting Michel Thierry (2017-07-12 01:14:46)
> On 7/11/2017 5:07 PM, Chris Wilson wrote:
> > Quoting Chris Wilson (2017-07-12 01:00:02)
> >> Also feel free to try and fixup the other confusion in intel_lrc.c.
> >> Perhaps something like:
> >>
> >> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> >> index b0738d2..f498aa6 100644
> >> --- a/drivers/gpu/drm/i915/intel_lrc.c
> >> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> >> @@ -2031,8 +2031,9 @@ static int execlists_context_deferred_alloc(struct i915_gem_context *ctx,
> >>   
> >>          context_size = round_up(engine->context_size, I915_GTT_PAGE_SIZE);
> >>   
> >> -       /* One extra page as the sharing data between driver and GuC */
> >> -       context_size += PAGE_SIZE * LRC_PPHWSP_PN;
> >> +       /* Add an extra page at the start for sharing data with the GuC */
> >> +       BUILD_BUG_ON(LRC_PPHWSP_PN != 1);
> >> +       context_size += PAGE_SIZE;
> >>   
> >>          ctx_obj = i915_gem_object_create(ctx->i915, context_size);
> >>          if (IS_ERR(ctx_obj)) {
> > 
> > Or better
> > 
> > diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> > index b0738d2..0b7fb38 100644
> > --- a/drivers/gpu/drm/i915/intel_lrc.c
> > +++ b/drivers/gpu/drm/i915/intel_lrc.c
> > @@ -2031,8 +2031,8 @@ static int execlists_context_deferred_alloc(struct i915_gem_context *ctx,
> >   
> >          context_size = round_up(engine->context_size, I915_GTT_PAGE_SIZE);
> >   
> > -       /* One extra page as the sharing data between driver and GuC */
> > -       context_size += PAGE_SIZE * LRC_PPHWSP_PN;
> > +       /* Add an extra page at the start for sharing data with the GuC */
> > +       context_size += LRC_GUCSHR_PAGES * PAGE_SIZE;

One more thought about this, and with a little coaxing LRC_PPHWSP_PN may
be the most appropriate constant to use here.

/* Before the actual start of the context image, we insert a few pages
 * for our own use and for sharing with the GuC.
 */
context_size += LRC_HEADER_PAGES * PAGE_SIZE;

> >          ctx_obj = i915_gem_object_create(ctx->i915, context_size);
> >          if (IS_ERR(ctx_obj)) {
> > diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
> > index 52b3a1f..848ca21 100644
> > --- a/drivers/gpu/drm/i915/intel_lrc.h
> > +++ b/drivers/gpu/drm/i915/intel_lrc.h
> > @@ -72,7 +72,8 @@ int logical_xcs_ring_init(struct intel_engine_cs *engine);
> >   
> >   /* One extra page is added before LRC for GuC as shared data */
> >   #define LRC_GUCSHR_PN  (0)
> > -#define LRC_PPHWSP_PN  (LRC_GUCSHR_PN + 1)
> > +#define LRC_GUCSHR_PAGES 1
> > +#define LRC_PPHWSP_PN  (LRC_GUCSHR_PN + LRC_GUCSHR_PAGES)
> >   #define LRC_STATE_PN   (LRC_PPHWSP_PN + 1)

but

/* We allocate a header at the start of the context image for our own
 * use, therefore the actual location of the logical state is offset
 * from the start of the VMA. The layout is
 *
 * | [guc]          | [hwsp] [logical state] |
 * |<- our header ->|<- context image      ->|
 *               
 */

/* One extra page is used for sharing with the GuC */
#define LRC_GUCSHR_PN  (0)
#define LRC_GUCSHR_PAGES 1

/* At the start of the context image is its per-process HWS page */
#define LRC_PPHWSP_PN  (LRC_GUCSHR_PN + LRC_GUCSHR_PAGES)

/* Finally we have the logical state for the context */
#define LRC_STATE_PN   (LRC_PPHWSP_PN + 1)

/* Currently we include the PPHWSP in __intel_engine_context_size() so
 * the size of the header is synonymous with the start of the PPHWSP.
 */
#define LRC_HEADER_PAGES LRC_PPHWSP_PN

> >   
> >   struct drm_i915_private;
> > 
> > Nowhere do we use LRC_GUCSHR_PN. Having it just gives us the mirage of
> > flexibility...
> 
> LRC_GUCSHR_PN is not used because people assumed it is always page 0 of 
> the kernel ctx :S
> 
> One more thing to fix, use LRC_GUCSHR_PN in all the places where this 
> _shared_ GuC page is used (guc_ggtt_offset(ctx->engine[RCS].state))

That may prove very handy.
-Chris
On 7/11/2017 5:54 PM, Chris Wilson wrote:
> Quoting Michel Thierry (2017-07-12 01:14:46)
>> On 7/11/2017 5:07 PM, Chris Wilson wrote:
>>> Quoting Chris Wilson (2017-07-12 01:00:02)
>>>> Also feel free to try and fixup the other confusion in intel_lrc.c.
>>>> Perhaps something like:
>>>>
>>>> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
>>>> index b0738d2..f498aa6 100644
>>>> --- a/drivers/gpu/drm/i915/intel_lrc.c
>>>> +++ b/drivers/gpu/drm/i915/intel_lrc.c
>>>> @@ -2031,8 +2031,9 @@ static int execlists_context_deferred_alloc(struct i915_gem_context *ctx,
>>>>    
>>>>           context_size = round_up(engine->context_size, I915_GTT_PAGE_SIZE);
>>>>    
>>>> -       /* One extra page as the sharing data between driver and GuC */
>>>> -       context_size += PAGE_SIZE * LRC_PPHWSP_PN;
>>>> +       /* Add an extra page at the start for sharing data with the GuC */
>>>> +       BUILD_BUG_ON(LRC_PPHWSP_PN != 1);
>>>> +       context_size += PAGE_SIZE;
>>>>    
>>>>           ctx_obj = i915_gem_object_create(ctx->i915, context_size);
>>>>           if (IS_ERR(ctx_obj)) {
>>>
>>> Or better
>>>
>>> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
>>> index b0738d2..0b7fb38 100644
>>> --- a/drivers/gpu/drm/i915/intel_lrc.c
>>> +++ b/drivers/gpu/drm/i915/intel_lrc.c
>>> @@ -2031,8 +2031,8 @@ static int execlists_context_deferred_alloc(struct i915_gem_context *ctx,
>>>    
>>>           context_size = round_up(engine->context_size, I915_GTT_PAGE_SIZE);
>>>    
>>> -       /* One extra page as the sharing data between driver and GuC */
>>> -       context_size += PAGE_SIZE * LRC_PPHWSP_PN;
>>> +       /* Add an extra page at the start for sharing data with the GuC */
>>> +       context_size += LRC_GUCSHR_PAGES * PAGE_SIZE;
> 
> One more thought about this, and with a little coaxing LRC_PPHWSP_PN may
> be the most appropriate constant to use here.
> 
> /* Before the actual start of the context image, we insert a few pages
>   * for our own use and for sharing with the GuC.
>   */
> context_size += LRC_HEADER_PAGES * PAGE_SIZE;
> 
>>>           ctx_obj = i915_gem_object_create(ctx->i915, context_size);
>>>           if (IS_ERR(ctx_obj)) {
>>> diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
>>> index 52b3a1f..848ca21 100644
>>> --- a/drivers/gpu/drm/i915/intel_lrc.h
>>> +++ b/drivers/gpu/drm/i915/intel_lrc.h
>>> @@ -72,7 +72,8 @@ int logical_xcs_ring_init(struct intel_engine_cs *engine);
>>>    
>>>    /* One extra page is added before LRC for GuC as shared data */
>>>    #define LRC_GUCSHR_PN  (0)
>>> -#define LRC_PPHWSP_PN  (LRC_GUCSHR_PN + 1)
>>> +#define LRC_GUCSHR_PAGES 1
>>> +#define LRC_PPHWSP_PN  (LRC_GUCSHR_PN + LRC_GUCSHR_PAGES)
>>>    #define LRC_STATE_PN   (LRC_PPHWSP_PN + 1)
> 
> but
> 
> /* We allocate a header at the start of the context image for our own
>   * use, therefore the actual location of the logical state is offset
>   * from the start of the VMA. The layout is
>   *
>   * | [guc]          | [hwsp] [logical state] |
>   * |<- our header ->|<- context image      ->|
>   *
>   */
> 
> /* One extra page is used for sharing with the GuC */
> #define LRC_GUCSHR_PN  (0)
> #define LRC_GUCSHR_PAGES 1
> 
> /* At the start of the context image is its per-process HWS page */
> #define LRC_PPHWSP_PN  (LRC_GUCSHR_PN + LRC_GUCSHR_PAGES)
> 
> /* Finally we have the logical state for the context */
> #define LRC_STATE_PN   (LRC_PPHWSP_PN + 1)
> 
> /* Currently we include the PPHWSP in __intel_engine_context_size() so
>   * the size of the header is synonymous with the start of the PPHWSP.
>   */
> #define LRC_HEADER_PAGES LRC_PPHWSP_PN
> 

It cannot be clearer now.

The only extra thing I was thinking about was also defining the 'size' 
(length?) of the guc and pphwsp sections, in case those "+ 1" were not 
clear:


  #define LRC_GUCSHR_PN  (0)
-#define LRC_PPHWSP_PN  (LRC_GUCSHR_PN + 1)
-#define LRC_STATE_PN   (LRC_PPHWSP_PN + 1)
+#define LRC_GUCSHR_SZ  (1)
+#define LRC_PPHWSP_PN  (LRC_GUCSHR_PN + LRC_GUCSHR_SZ)
+#define LRC_PPHWSP_SZ  (1)
+#define LRC_STATE_PN   (LRC_PPHWSP_PN + LRC_PPHWSP_SZ)

>>>    
>>>    struct drm_i915_private;
>>>
>>> Nowhere do we use LRC_GUCSHR_PN. Having it just gives us the mirage of
>>> flexibility...
>>
>> LRC_GUCSHR_PN is not used because people assumed it is always page 0 of
>> the kernel ctx :S
>>
>> One more thing to fix, use LRC_GUCSHR_PN in all the places where this
>> _shared_ GuC page is used (guc_ggtt_offset(ctx->engine[RCS].state))
> 
> That may prove very handy.
> -Chris
>
Quoting Michel Thierry (2017-07-12 02:01:29)
> On 7/11/2017 5:54 PM, Chris Wilson wrote:
> > /* We allocate a header at the start of the context image for our own
> >   * use, therefore the actual location of the logical state is offset
> >   * from the start of the VMA. The layout is
> >   *
> >   * | [guc]          | [hwsp] [logical state] |
> >   * |<- our header ->|<- context image      ->|
> >   *
> >   */
> > 
> > /* One extra page is used for sharing with the GuC */
> > #define LRC_GUCSHR_PN  (0)
> > #define LRC_GUCSHR_PAGES 1
> > 
> > /* At the start of the context image is its per-process HWS page */
> > #define LRC_PPHWSP_PN  (LRC_GUCSHR_PN + LRC_GUCSHR_PAGES)
> > 
> > /* Finally we have the logical state for the context */
> > #define LRC_STATE_PN   (LRC_PPHWSP_PN + 1)
> > 
> > /* Currently we include the PPHWSP in __intel_engine_context_size() so
> >   * the size of the header is synonymous with the start of the PPHWSP.
> >   */
> > #define LRC_HEADER_PAGES LRC_PPHWSP_PN
> > 
> 
> It cannot be clearer now.
> 
> The only extra thing I was thinking about was also defining the 'size' 
> (length?) of the guc and pphwsp sections, in case those "+ 1" were not 
> clear:
> 
> 
>   #define LRC_GUCSHR_PN  (0)
> -#define LRC_PPHWSP_PN  (LRC_GUCSHR_PN + 1)
> -#define LRC_STATE_PN   (LRC_PPHWSP_PN + 1)
> +#define LRC_GUCSHR_SZ  (1)
> +#define LRC_PPHWSP_PN  (LRC_GUCSHR_PN + LRC_GUCSHR_SZ)
> +#define LRC_PPHWSP_SZ  (1)
> +#define LRC_STATE_PN   (LRC_PPHWSP_PN + LRC_PPHWSP_SZ)

Yes, I erred on the side of laziness thinking that this is common enough
that there should be a generator somewhere, but was to lazy to
find/write one.
-Chris