[v5,12/18] drm/i915/guc: Provide register list to be saved/restored during engine reset

Submitted by Michel Thierry on March 25, 2017, 1:30 a.m.

Details

Message ID 20170325013010.36244-13-michel.thierry@intel.com
State New
Headers show
Series "Gen8+ engine-reset" ( rev: 1 ) in Intel GFX

Browsing this patch as part of:
"Gen8+ engine-reset" rev 1 in Intel GFX
<< prev patch [12/18] next patch >>

Commit Message

Michel Thierry March 25, 2017, 1:30 a.m.
From: Arun Siluvery <arun.siluvery@linux.intel.com>

GuC expects a list of registers from the driver which are saved/restored
during engine reset. The type of value to be saved is controlled by
flags. We provide a minimal set of registers that we want GuC to save and
restore. This is not an issue in case of engine reset as driver initializes
most of them following an engine reset, but in case of media reset (aka
watchdog reset) which is completely internal to GuC (including resubmission
of hung workload), it is necessary to provide this list, otherwise GuC won't
be able to schedule further workloads after a reset. This is the minimal
set of registers identified for things to work as expected but if we see
any new issues, this register list can be expanded.

Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
Signed-off-by: Jeff McGee <jeff.mcgee@intel.com>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
---
 drivers/gpu/drm/i915/i915_guc_submission.c | 50 +++++++++++++++++++++++++++++-
 1 file changed, 49 insertions(+), 1 deletion(-)

Patch hide | download patch | download mbox

diff --git a/drivers/gpu/drm/i915/i915_guc_submission.c b/drivers/gpu/drm/i915/i915_guc_submission.c
index 991e76e10f82..2445af96aa71 100644
--- a/drivers/gpu/drm/i915/i915_guc_submission.c
+++ b/drivers/gpu/drm/i915/i915_guc_submission.c
@@ -1015,6 +1015,22 @@  static void guc_policies_init(struct guc_policies *policies)
 	policies->is_valid = 1;
 }
 
+/*
+ * In this macro it is highly unlikely to exceed max value but even if we did
+ * it is not an error so just throw a warning and continue. Only side effect
+ * in continuing further means some registers won't be added to save/restore
+ * list.
+ */
+#define GUC_ADD_MMIO_REG_ADS(node, reg_addr, _flags)			\
+	do {								\
+		u32 __count = node->number_of_registers;		\
+		if (WARN_ON(__count >= GUC_REGSET_MAX_REGISTERS))	\
+			continue;					\
+		node->registers[__count].offset = reg_addr.reg;		\
+		node->registers[__count].flags = (_flags);		\
+		node->number_of_registers++;				\
+	} while (0)
+
 static int guc_ads_create(struct intel_guc *guc)
 {
 	struct drm_i915_private *dev_priv = guc_to_i915(guc);
@@ -1047,10 +1063,42 @@  static int guc_ads_create(struct intel_guc *guc)
 
 	/* MMIO reg state */
 	for_each_engine(engine, dev_priv, id) {
+		u32 flags;
+		struct guc_mmio_regset *eng_reg =
+			&blob->reg_state.engine_reg[engine->guc_id];
+
+		/*
+		 * Provide a list of registers to be saved/restored during gpu
+		 * reset. This is mainly required for Media reset (aka watchdog
+		 * timeout) which is completely under the control of GuC
+		 * (resubmission of hung workload is handled inside GuC).
+		 */
+		flags = GUC_REGSET_POWERCYCLE |
+			GUC_REGSET_ENGINERESET |
+			GUC_REGSET_SAVE_CURRENT_VALUE;
+
+		GUC_ADD_MMIO_REG_ADS(eng_reg, RING_HEAD(engine->mmio_base),
+				     flags);
+
+		GUC_ADD_MMIO_REG_ADS(eng_reg, RING_TAIL(engine->mmio_base),
+				     flags);
+
+		GUC_ADD_MMIO_REG_ADS(eng_reg, RING_HWS_PGA(engine->mmio_base),
+				     flags);
+
+		GUC_ADD_MMIO_REG_ADS(eng_reg, RING_MODE_GEN7(engine),
+				     (flags | GUC_REGSET_MASKED));
+
+		GUC_ADD_MMIO_REG_ADS(eng_reg, RING_IMR(engine->mmio_base),
+				     flags);
+
+		DRM_DEBUG_DRIVER("%s register save/restore count: %u\n",
+				 engine->name, eng_reg->number_of_registers);
+
+		 /* Nothing to be white listed for now. */
 		blob->reg_state.white_list[engine->guc_id].mmio_start =
 			engine->mmio_base + GUC_MMIO_WHITE_LIST_START;
 
-		/* Nothing to be saved or restored for now. */
 		blob->reg_state.white_list[engine->guc_id].count = 0;
 	}
 

Comments

On 24/03/17 18:30, Michel Thierry wrote:
> From: Arun Siluvery <arun.siluvery@linux.intel.com>
>
> GuC expects a list of registers from the driver which are saved/restored
> during engine reset. The type of value to be saved is controlled by
> flags. We provide a minimal set of registers that we want GuC to save and
> restore. This is not an issue in case of engine reset as driver initializes
> most of them following an engine reset, but in case of media reset (aka
> watchdog reset) which is completely internal to GuC (including resubmission
> of hung workload), it is necessary to provide this list, otherwise GuC won't
> be able to schedule further workloads after a reset. This is the minimal
> set of registers identified for things to work as expected but if we see
> any new issues, this register list can be expanded.
>
> Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
> Signed-off-by: Jeff McGee <jeff.mcgee@intel.com>
> Signed-off-by: Michel Thierry <michel.thierry@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_guc_submission.c | 50 +++++++++++++++++++++++++++++-
>  1 file changed, 49 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_guc_submission.c b/drivers/gpu/drm/i915/i915_guc_submission.c
> index 991e76e10f82..2445af96aa71 100644
> --- a/drivers/gpu/drm/i915/i915_guc_submission.c
> +++ b/drivers/gpu/drm/i915/i915_guc_submission.c
> @@ -1015,6 +1015,22 @@ static void guc_policies_init(struct guc_policies *policies)
>  	policies->is_valid = 1;
>  }
>
> +/*
> + * In this macro it is highly unlikely to exceed max value but even if we did
> + * it is not an error so just throw a warning and continue. Only side effect
> + * in continuing further means some registers won't be added to save/restore
> + * list.
> + */
> +#define GUC_ADD_MMIO_REG_ADS(node, reg_addr, _flags)			\
> +	do {								\
> +		u32 __count = node->number_of_registers;		\
> +		if (WARN_ON(__count >= GUC_REGSET_MAX_REGISTERS))	\
> +			continue;					\
> +		node->registers[__count].offset = reg_addr.reg;		\
> +		node->registers[__count].flags = (_flags);		\
> +		node->number_of_registers++;				\
> +	} while (0)
> +
>  static int guc_ads_create(struct intel_guc *guc)
>  {
>  	struct drm_i915_private *dev_priv = guc_to_i915(guc);
> @@ -1047,10 +1063,42 @@ static int guc_ads_create(struct intel_guc *guc)
>
>  	/* MMIO reg state */
>  	for_each_engine(engine, dev_priv, id) {
> +		u32 flags;
> +		struct guc_mmio_regset *eng_reg =
> +			&blob->reg_state.engine_reg[engine->guc_id];
> +
> +		/*
> +		 * Provide a list of registers to be saved/restored during gpu
> +		 * reset. This is mainly required for Media reset (aka watchdog
> +		 * timeout) which is completely under the control of GuC
> +		 * (resubmission of hung workload is handled inside GuC).
> +		 */
> +		flags = GUC_REGSET_POWERCYCLE |

This flag doesn't really do anything inside GuC, I guess it is probably 
a remnant of some functionality that has been removed.

> +			GUC_REGSET_ENGINERESET |
> +			GUC_REGSET_SAVE_CURRENT_VALUE;
> +
> +		GUC_ADD_MMIO_REG_ADS(eng_reg, RING_HEAD(engine->mmio_base),
> +				     flags);
> +
> +		GUC_ADD_MMIO_REG_ADS(eng_reg, RING_TAIL(engine->mmio_base),
> +				     flags);

Aren't head & tail context save/restored? there should be no need to 
have GuC restore them. Also, won't restoring them manually generate 
activity on the engine?

> +
> +		GUC_ADD_MMIO_REG_ADS(eng_reg, RING_HWS_PGA(engine->mmio_base),
> +				     flags);
> +
> +		GUC_ADD_MMIO_REG_ADS(eng_reg, RING_MODE_GEN7(engine),
> +				     (flags | GUC_REGSET_MASKED));

GUC_REGSET_MASKED doesn't do anything either (surprise!). Since this is 
actually something that is required, I've chatted with the GuC guys and 
they said they're going to look into re-adding proper functionality. In 
the meantime the suggestion is to use GUC_REGSET_SAVE_DEFAULT_VALUE, in 
which case the guc after the reset will write whatever we set in 
node->registers[__count].value and we can put an already masked value in 
there.

Thanks,
Daniele

> +
> +		GUC_ADD_MMIO_REG_ADS(eng_reg, RING_IMR(engine->mmio_base),
> +				     flags);
> +
> +		DRM_DEBUG_DRIVER("%s register save/restore count: %u\n",
> +				 engine->name, eng_reg->number_of_registers);
> +
> +		 /* Nothing to be white listed for now. */
>  		blob->reg_state.white_list[engine->guc_id].mmio_start =
>  			engine->mmio_base + GUC_MMIO_WHITE_LIST_START;
>
> -		/* Nothing to be saved or restored for now. */
>  		blob->reg_state.white_list[engine->guc_id].count = 0;
>  	}
>
>
On 13/04/17 17:16, Daniele Ceraolo Spurio wrote:
>
>
> On 24/03/17 18:30, Michel Thierry wrote:
>> From: Arun Siluvery <arun.siluvery@linux.intel.com>
>>
>> GuC expects a list of registers from the driver which are saved/restored
>> during engine reset. The type of value to be saved is controlled by
>> flags. We provide a minimal set of registers that we want GuC to save and
>> restore. This is not an issue in case of engine reset as driver
>> initializes
>> most of them following an engine reset, but in case of media reset (aka
>> watchdog reset) which is completely internal to GuC (including
>> resubmission
>> of hung workload), it is necessary to provide this list, otherwise GuC
>> won't
>> be able to schedule further workloads after a reset. This is the minimal
>> set of registers identified for things to work as expected but if we see
>> any new issues, this register list can be expanded.
>>
>> Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
>> Signed-off-by: Jeff McGee <jeff.mcgee@intel.com>
>> Signed-off-by: Michel Thierry <michel.thierry@intel.com>
>> ---
>>  drivers/gpu/drm/i915/i915_guc_submission.c | 50
>> +++++++++++++++++++++++++++++-
>>  1 file changed, 49 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_guc_submission.c
>> b/drivers/gpu/drm/i915/i915_guc_submission.c
>> index 991e76e10f82..2445af96aa71 100644
>> --- a/drivers/gpu/drm/i915/i915_guc_submission.c
>> +++ b/drivers/gpu/drm/i915/i915_guc_submission.c
>> @@ -1015,6 +1015,22 @@ static void guc_policies_init(struct
>> guc_policies *policies)
>>      policies->is_valid = 1;
>>  }
>>
>> +/*
>> + * In this macro it is highly unlikely to exceed max value but even
>> if we did
>> + * it is not an error so just throw a warning and continue. Only side
>> effect
>> + * in continuing further means some registers won't be added to
>> save/restore
>> + * list.
>> + */
>> +#define GUC_ADD_MMIO_REG_ADS(node, reg_addr, _flags)            \
>> +    do {                                \
>> +        u32 __count = node->number_of_registers;        \
>> +        if (WARN_ON(__count >= GUC_REGSET_MAX_REGISTERS))    \
>> +            continue;                    \
>> +        node->registers[__count].offset = reg_addr.reg;        \
>> +        node->registers[__count].flags = (_flags);        \
>> +        node->number_of_registers++;                \
>> +    } while (0)
>> +
>>  static int guc_ads_create(struct intel_guc *guc)
>>  {
>>      struct drm_i915_private *dev_priv = guc_to_i915(guc);
>> @@ -1047,10 +1063,42 @@ static int guc_ads_create(struct intel_guc *guc)
>>
>>      /* MMIO reg state */
>>      for_each_engine(engine, dev_priv, id) {
>> +        u32 flags;
>> +        struct guc_mmio_regset *eng_reg =
>> +            &blob->reg_state.engine_reg[engine->guc_id];
>> +
>> +        /*
>> +         * Provide a list of registers to be saved/restored during gpu
>> +         * reset. This is mainly required for Media reset (aka watchdog
>> +         * timeout) which is completely under the control of GuC
>> +         * (resubmission of hung workload is handled inside GuC).
>> +         */
>> +        flags = GUC_REGSET_POWERCYCLE |
>
> This flag doesn't really do anything inside GuC, I guess it is probably
> a remnant of some functionality that has been removed.
>
>> +            GUC_REGSET_ENGINERESET |
>> +            GUC_REGSET_SAVE_CURRENT_VALUE;
>> +
>> +        GUC_ADD_MMIO_REG_ADS(eng_reg, RING_HEAD(engine->mmio_base),
>> +                     flags);
>> +
>> +        GUC_ADD_MMIO_REG_ADS(eng_reg, RING_TAIL(engine->mmio_base),
>> +                     flags);
>
> Aren't head & tail context save/restored? there should be no need to
> have GuC restore them. Also, won't restoring them manually generate
> activity on the engine?
>
>> +
>> +        GUC_ADD_MMIO_REG_ADS(eng_reg, RING_HWS_PGA(engine->mmio_base),
>> +                     flags);
>> +
>> +        GUC_ADD_MMIO_REG_ADS(eng_reg, RING_MODE_GEN7(engine),
>> +                     (flags | GUC_REGSET_MASKED));
>
> GUC_REGSET_MASKED doesn't do anything either (surprise!). Since this is

s/doesn't do anything/guc is full of bugs/  :smirk:

> actually something that is required, I've chatted with the GuC guys and
> they said they're going to look into re-adding proper functionality. In
> the meantime the suggestion is to use GUC_REGSET_SAVE_DEFAULT_VALUE, in
> which case the guc after the reset will write whatever we set in
> node->registers[__count].value and we can put an already masked value in
> there.
>
> Thanks,
> Daniele
>

Thanks, I'll add it to the changes in the next version.

>> +
>> +        GUC_ADD_MMIO_REG_ADS(eng_reg, RING_IMR(engine->mmio_base),
>> +                     flags);
>> +
>> +        DRM_DEBUG_DRIVER("%s register save/restore count: %u\n",
>> +                 engine->name, eng_reg->number_of_registers);
>> +
>> +         /* Nothing to be white listed for now. */
>>          blob->reg_state.white_list[engine->guc_id].mmio_start =
>>              engine->mmio_base + GUC_MMIO_WHITE_LIST_START;
>>
>> -        /* Nothing to be saved or restored for now. */
>>          blob->reg_state.white_list[engine->guc_id].count = 0;
>>      }
>>
>>