drm/i915: Restore inhibiting the load of the default context

Submitted by Chris Wilson on Nov. 27, 2015, 10:07 a.m.

Details

Message ID 1448618850-23514-1-git-send-email-chris@chris-wilson.co.uk
State New
Series "drm/i915: Restore inhibiting the load of the default context" ( rev: 1 ) in Intel GFX

Commit Message

Chris Wilson Nov. 27, 2015, 10:07 a.m.
Following a GPU reset, we may leave the context in a poorly defined
state, and reloading from that context will leave the GPU flummoxed. For
secondary contexts, this will lead to that context being banned - but
currently it is also causing the default context to become banned,
leading to turmoil in the shared state.

This is a regression from

commit 6702cf16e0ba8b0129f5aa1b6609d4e9c70bc13b [v4.1]
Author: Ben Widawsky <benjamin.widawsky@intel.com>
Date:   Mon Mar 16 16:00:58 2015 +0000

    drm/i915: Initialize all contexts

which quietly introduced the removal of the MI_RESTORE_INHIBIT on the
default context.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michel Thierry <michel.thierry@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/gpu/drm/i915/i915_gem_context.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Patch

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 43761c5bcaca..1041099d285a 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -708,7 +708,7 @@  static int do_switch(struct drm_i915_gem_request *req)
 	if (ret)
 		goto unpin_out;
 
-	if (!to->legacy_hw_ctx.initialized) {
+	if (!to->legacy_hw_ctx.initialized || i915_gem_context_is_default(to)) {
 		hw_flags |= MI_RESTORE_INHIBIT;
 		/* NB: If we inhibit the restore, the context is not allowed to
 		 * die because future work may end up depending on valid address
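
To make the effect of the one-line change concrete, here is a minimal userspace sketch of the condition before and after the patch. It is not the driver code: `struct model_context`, `switch_flags_before()` and `switch_flags_after()` are hypothetical names, the `is_default` field stands in for whatever `i915_gem_context_is_default()` reports, and the flag value is illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative value only; a stand-in for the driver's own definition. */
#define MI_RESTORE_INHIBIT (1u << 0)

/* Hypothetical, stripped-down stand-in for struct intel_context. */
struct model_context {
	bool initialized; /* mirrors legacy_hw_ctx.initialized */
	bool is_default;  /* stands in for i915_gem_context_is_default() */
};

/* Before the patch: only a never-initialized context skips the restore. */
uint32_t switch_flags_before(const struct model_context *to)
{
	uint32_t hw_flags = 0;

	assert(to != NULL);
	if (!to->initialized)
		hw_flags |= MI_RESTORE_INHIBIT;
	return hw_flags;
}

/* After the patch: the default context also always skips the restore. */
uint32_t switch_flags_after(const struct model_context *to)
{
	uint32_t hw_flags = 0;

	assert(to != NULL);
	if (!to->initialized || to->is_default)
		hw_flags |= MI_RESTORE_INHIBIT;
	return hw_flags;
}
```

In this model, an already-initialized default context yields no flag before the patch and `MI_RESTORE_INHIBIT` after it, while an initialized secondary context is unaffected. That matches the intent stated in the commit message: the default context's saved image is never reloaded, so a poorly defined image left behind by a GPU reset cannot be restored.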

Comments

Chris Wilson <chris@chris-wilson.co.uk> writes:

> Following a GPU reset, we may leave the context in a poorly defined
> state, and reloading from that context will leave the GPU flummoxed. For
> secondary contexts, this will lead to that context being banned - but
> currently it is also causing the default context to become banned,
> leading to turmoil in the shared state.
>
> This is a regression from
>
> commit 6702cf16e0ba8b0129f5aa1b6609d4e9c70bc13b [v4.1]
> Author: Ben Widawsky <benjamin.widawsky@intel.com>
> Date:   Mon Mar 16 16:00:58 2015 +0000
>
>     drm/i915: Initialize all contexts
>
> which quietly introduced the removal of the MI_RESTORE_INHIBIT on the
> default context.
>

As we never submit anything except driver initialization commands
for that context, what would cause this context to become corrupted?

Please consider:

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c
b/drivers/gpu/drm/i915/i915_gem_context.c
index 43761c5..45b9a39 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -332,6 +332,7 @@ void i915_gem_context_reset(struct drm_device *dev)
        for (i = 0; i < I915_NUM_RINGS; i++) {
                struct intel_engine_cs *ring = &dev_priv->ring[i];
                struct intel_context *lctx = ring->last_context;
+               struct intel_context *dctx = ring->default_context;
 
                if (lctx) {
                        if (lctx->legacy_hw_ctx.rcs_state && i == RCS)
@@ -340,6 +341,9 @@ void i915_gem_context_reset(struct drm_device *dev)
                        i915_gem_context_unreference(lctx);
                        ring->last_context = NULL;
                }
+
+               if (dctx)
+                       dctx->legacy_hw_ctx.initialized = false;
        }
 }

To achieve the same effect and as a bonus, get the
same default context (with workarounds) as we
did in driver init.

I also think that we should zero the global
default context in here to gain similarity wrt
module init.

-Mika

> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Michel Thierry <michel.thierry@intel.com>
> Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> ---
>  drivers/gpu/drm/i915/i915_gem_context.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> index 43761c5bcaca..1041099d285a 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -708,7 +708,7 @@ static int do_switch(struct drm_i915_gem_request *req)
>  	if (ret)
>  		goto unpin_out;
>  
> -	if (!to->legacy_hw_ctx.initialized) {
> +	if (!to->legacy_hw_ctx.initialized || i915_gem_context_is_default(to)) {
>  		hw_flags |= MI_RESTORE_INHIBIT;
>  		/* NB: If we inhibit the restore, the context is not allowed to
>  		 * die because future work may end up depending on valid address
> -- 
> 2.6.2

On Fri, Nov 27, 2015 at 01:32:11PM +0200, Mika Kuoppala wrote:
> Chris Wilson <chris@chris-wilson.co.uk> writes:
> 
> > Following a GPU reset, we may leave the context in a poorly defined
> > state, and reloading from that context will leave the GPU flummoxed. For
> > secondary contexts, this will lead to that context being banned - but
> > currently it is also causing the default context to become banned,
> > leading to turmoil in the shared state.
> >
> > This is a regression from
> >
> > commit 6702cf16e0ba8b0129f5aa1b6609d4e9c70bc13b [v4.1]
> > Author: Ben Widawsky <benjamin.widawsky@intel.com>
> > Date:   Mon Mar 16 16:00:58 2015 +0000
> >
> >     drm/i915: Initialize all contexts
> >
> > which quietly introduced the removal of the MI_RESTORE_INHIBIT on the
> > default context.
> >
> 
> As we never submit anything except driver initialization commands
> for that context, what would cause this context to become corrupted?

I can only hazard that the act of resetting the GPU left it invalid. A
bisect pointed to that commit, and partially reverting each chunk left
me with the conclusion that the hang was a direct result of reloading
the context. Closer inspection may reveal something else suspect about
the context, but I object to this sneaky change.

> Please consider:
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c
> b/drivers/gpu/drm/i915/i915_gem_context.c
> index 43761c5..45b9a39 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -332,6 +332,7 @@ void i915_gem_context_reset(struct drm_device *dev)
>         for (i = 0; i < I915_NUM_RINGS; i++) {
>                 struct intel_engine_cs *ring = &dev_priv->ring[i];
>                 struct intel_context *lctx = ring->last_context;
> +               struct intel_context *dctx = ring->default_context;
>  
>                 if (lctx) {
>                         if (lctx->legacy_hw_ctx.rcs_state && i == RCS)
> @@ -340,6 +341,9 @@ void i915_gem_context_reset(struct drm_device *dev)
>                         i915_gem_context_unreference(lctx);
>                         ring->last_context = NULL;
>                 }
> +
> +               if (dctx)
> +                       dctx->legacy_hw_ctx.initialized = false;
>         }
>  }
> 
> To achieve the same effect and as a bonus, get the
> same default context (with workarounds) as we
> did in driver init.

I considered it, and wondered why it wasn't already there. It is a
separate issue imo.
 
> I also think that we should zero the global
> default context in here to gain similarity wrt
> module init.

You mean reallocate it from scratch? We have avoided doing the
reallocations in the past, as they can fail at inopportune times.
-Chris
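
The alternative discussed in this thread, clearing `initialized` on each ring's default context during reset, can be sketched as a small userspace model. Everything here is hypothetical scaffolding (`struct model_context`, `struct model_ring`, `model_context_reset()`, `model_do_switch()`), the flag value is illustrative, and the sketch assumes that a context switch marks the target context as initialized afterwards, which the thread itself does not show.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative value only; a stand-in for the driver's own definition. */
#define MI_RESTORE_INHIBIT (1u << 0)

/* Hypothetical stand-ins for struct intel_context / struct intel_engine_cs. */
struct model_context {
	bool initialized; /* mirrors legacy_hw_ctx.initialized */
};

struct model_ring {
	struct model_context *default_context; /* may be NULL */
};

/* Models the suggested hunk: forget that each default context was set up. */
void model_context_reset(struct model_ring *rings, size_t count)
{
	for (size_t i = 0; i < count; i++) {
		struct model_context *dctx = rings[i].default_context;

		if (dctx)
			dctx->initialized = false;
	}
}

/*
 * Models the unpatched check in do_switch(): inhibit the restore only for
 * an uninitialized context. ASSUMPTION: the switch then marks the context
 * as initialized.
 */
uint32_t model_do_switch(struct model_context *to)
{
	uint32_t hw_flags = 0;

	assert(to != NULL);
	if (!to->initialized)
		hw_flags |= MI_RESTORE_INHIBIT;
	to->initialized = true;
	return hw_flags;
}
```

Under these assumptions, the restore is inhibited only on the first switch to the default context after a reset, whereas the posted patch inhibits it on every switch to the default context. That is one way to read the remark that the two changes address separate issues.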