| Message ID | 20170621124804.4529-1-chris@chris-wilson.co.uk |
|---|---|
| State | Accepted |
| Commit | 4ee056f41807858b2eae263e74ae8b81800c0337 |
| Headers | show |
| Series |
"drm/i915: Cancel pending execlist tasklet upon wedging"
( rev:
1
)
in
Intel GFX |
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index 1353491c1010..ae3ce1314bd1 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -3092,6 +3092,13 @@ static void engine_set_wedged(struct intel_engine_cs *engine) engine->execlist_first = NULL; spin_unlock_irqrestore(&engine->timeline->lock, flags); + + /* The port is checked prior to scheduling a tasklet, but + * just in case we have suspended the tasklet to do the + * wedging make sure that when it wakes, it decides there + * is no work to do by clearing the irq_posted bit. + */ + clear_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted); } }
On 21/06/2017 13:48, Chris Wilson wrote: > Highly unlikely, but if the stop_machine() did suspend the tasklet, we > want to make sure that when it wakes it finds there is nothing to do. > Otherwise, it will loudly complain that the ELSP port tracking no longer > matches the hardware, and we will be mightly confused. > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> > --- > drivers/gpu/drm/i915/i915_gem.c | 7 +++++++ > 1 file changed, 7 insertions(+) > > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c > index 1353491c1010..ae3ce1314bd1 100644 > --- a/drivers/gpu/drm/i915/i915_gem.c > +++ b/drivers/gpu/drm/i915/i915_gem.c > @@ -3092,6 +3092,13 @@ static void engine_set_wedged(struct intel_engine_cs *engine) > engine->execlist_first = NULL; > > spin_unlock_irqrestore(&engine->timeline->lock, flags); > + > + /* The port is checked prior to scheduling a tasklet, but > + * just in case we have suspended the tasklet to do the > + * wedging make sure that when it wakes, it decides there > + * is no work to do by clearing the irq_posted bit. > + */ > + clear_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted); > } > } > > Or tasklet_kill? Actually I am not sure now how the wedging transition works for request which have pasted the submit callback, but are not in the hardware yet (waiting in the priotree). Can't find at the moment that we have anything dealing with those. Regards, Tvrtko
Quoting Tvrtko Ursulin (2017-06-21 14:33:59) > > On 21/06/2017 13:48, Chris Wilson wrote: > > Highly unlikely, but if the stop_machine() did suspend the tasklet, we > > want to make sure that when it wakes it finds there is nothing to do. > > Otherwise, it will loudly complain that the ELSP port tracking no longer > > matches the hardware, and we will be mightly confused. > > > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > > Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> > > --- > > drivers/gpu/drm/i915/i915_gem.c | 7 +++++++ > > 1 file changed, 7 insertions(+) > > > > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c > > index 1353491c1010..ae3ce1314bd1 100644 > > --- a/drivers/gpu/drm/i915/i915_gem.c > > +++ b/drivers/gpu/drm/i915/i915_gem.c > > @@ -3092,6 +3092,13 @@ static void engine_set_wedged(struct intel_engine_cs *engine) > > engine->execlist_first = NULL; > > > > spin_unlock_irqrestore(&engine->timeline->lock, flags); > > + > > + /* The port is checked prior to scheduling a tasklet, but > > + * just in case we have suspended the tasklet to do the > > + * wedging make sure that when it wakes, it decides there > > + * is no work to do by clearing the irq_posted bit. > > + */ > > + clear_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted); > > } > > } > > > > > > Or tasklet_kill? Actually I am not sure now how the wedging transition > works for request which have pasted the submit callback, but are not in > the hardware yet (waiting in the priotree). Can't find at the moment > that we have anything dealing with those. We kill the rbtree and the tree itself doesn't hold a reference to the request (just a reference is held for the ports, as prior to submission there is no way we complete the request and so cancel the in-flight reference count). That's the engine->execlist_queue = RB_ROOT; engine->execlist_first = NULL; above. -Chris
Quoting Tvrtko Ursulin (2017-06-21 14:33:59) > > On 21/06/2017 13:48, Chris Wilson wrote: > > Highly unlikely, but if the stop_machine() did suspend the tasklet, we > > want to make sure that when it wakes it finds there is nothing to do. > > Otherwise, it will loudly complain that the ELSP port tracking no longer > > matches the hardware, and we will be mightly confused. > > > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > > Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> > > --- > > drivers/gpu/drm/i915/i915_gem.c | 7 +++++++ > > 1 file changed, 7 insertions(+) > > > > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c > > index 1353491c1010..ae3ce1314bd1 100644 > > --- a/drivers/gpu/drm/i915/i915_gem.c > > +++ b/drivers/gpu/drm/i915/i915_gem.c > > @@ -3092,6 +3092,13 @@ static void engine_set_wedged(struct intel_engine_cs *engine) > > engine->execlist_first = NULL; > > > > spin_unlock_irqrestore(&engine->timeline->lock, flags); > > + > > + /* The port is checked prior to scheduling a tasklet, but > > + * just in case we have suspended the tasklet to do the > > + * wedging make sure that when it wakes, it decides there > > + * is no work to do by clearing the irq_posted bit. > > + */ > > + clear_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted); > > } > > } > > > > > > Or tasklet_kill? Hmm, technically not permitted: void tasklet_kill(struct tasklet_struct *t) { if (in_interrupt()) pr_notice("Attempt to kill tasklet from interrupt\n"); while (test_and_set_bit(TASKLET_STATE_SCHED, &t->state)) { do { yield(); } while (test_bit(TASKLET_STATE_SCHED, &t->state)); } tasklet_unlock_wait(t); clear_bit(TASKLET_STATE_SCHED, &t->state); } Even if we don't trigger the in_interrupt() simply due to having interrupts disabled, then the potential yield() is illegal from inside stop_machine(). -Chris
On 21/06/2017 14:53, Chris Wilson wrote: > Quoting Tvrtko Ursulin (2017-06-21 14:33:59) >> >> On 21/06/2017 13:48, Chris Wilson wrote: >>> Highly unlikely, but if the stop_machine() did suspend the tasklet, we >>> want to make sure that when it wakes it finds there is nothing to do. >>> Otherwise, it will loudly complain that the ELSP port tracking no longer >>> matches the hardware, and we will be mightly confused. >>> >>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> >>> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> >>> --- >>> drivers/gpu/drm/i915/i915_gem.c | 7 +++++++ >>> 1 file changed, 7 insertions(+) >>> >>> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c >>> index 1353491c1010..ae3ce1314bd1 100644 >>> --- a/drivers/gpu/drm/i915/i915_gem.c >>> +++ b/drivers/gpu/drm/i915/i915_gem.c >>> @@ -3092,6 +3092,13 @@ static void engine_set_wedged(struct intel_engine_cs *engine) >>> engine->execlist_first = NULL; >>> >>> spin_unlock_irqrestore(&engine->timeline->lock, flags); >>> + >>> + /* The port is checked prior to scheduling a tasklet, but >>> + * just in case we have suspended the tasklet to do the >>> + * wedging make sure that when it wakes, it decides there >>> + * is no work to do by clearing the irq_posted bit. >>> + */ >>> + clear_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted); >>> } >>> } >>> >>> >> >> Or tasklet_kill? Actually I am not sure now how the wedging transition >> works for request which have pasted the submit callback, but are not in >> the hardware yet (waiting in the priotree). Can't find at the moment >> that we have anything dealing with those. > > We kill the rbtree and the tree itself doesn't hold a reference to the > request (just a reference is held for the ports, as prior to submission > there is no way we complete the request and so cancel the in-flight > reference count). That's the > > engine->execlist_queue = RB_ROOT; > engine->execlist_first = NULL; > > above. Yes of course, don't know where I was looking. Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Regards, Tvrtko
Quoting Tvrtko Ursulin (2017-06-21 15:06:34) > > On 21/06/2017 14:53, Chris Wilson wrote: > > Quoting Tvrtko Ursulin (2017-06-21 14:33:59) > >> > >> On 21/06/2017 13:48, Chris Wilson wrote: > >>> Highly unlikely, but if the stop_machine() did suspend the tasklet, we > >>> want to make sure that when it wakes it finds there is nothing to do. > >>> Otherwise, it will loudly complain that the ELSP port tracking no longer > >>> matches the hardware, and we will be mightly confused. > >>> > >>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > >>> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> > >>> --- > >>> drivers/gpu/drm/i915/i915_gem.c | 7 +++++++ > >>> 1 file changed, 7 insertions(+) > >>> > >>> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c > >>> index 1353491c1010..ae3ce1314bd1 100644 > >>> --- a/drivers/gpu/drm/i915/i915_gem.c > >>> +++ b/drivers/gpu/drm/i915/i915_gem.c > >>> @@ -3092,6 +3092,13 @@ static void engine_set_wedged(struct intel_engine_cs *engine) > >>> engine->execlist_first = NULL; > >>> > >>> spin_unlock_irqrestore(&engine->timeline->lock, flags); > >>> + > >>> + /* The port is checked prior to scheduling a tasklet, but > >>> + * just in case we have suspended the tasklet to do the > >>> + * wedging make sure that when it wakes, it decides there > >>> + * is no work to do by clearing the irq_posted bit. > >>> + */ > >>> + clear_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted); > >>> } > >>> } > >>> > >>> > >> > >> Or tasklet_kill? Actually I am not sure now how the wedging transition > >> works for request which have pasted the submit callback, but are not in > >> the hardware yet (waiting in the priotree). Can't find at the moment > >> that we have anything dealing with those. > > > > We kill the rbtree and the tree itself doesn't hold a reference to the > > request (just a reference is held for the ports, as prior to submission > > there is no way we complete the request and so cancel the in-flight > > reference count). That's the > > > > engine->execlist_queue = RB_ROOT; > > engine->execlist_first = NULL; > > > > above. > > Yes of course, don't know where I was looking. > > Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Discussion is always worthwhile, we've caught many little bugs today alone just by asking questions. :) Pushed this little patch as it makes reasoning about i915_gem_set_wedged() a smidgen easier. -Chris
Highly unlikely, but if the stop_machine() did suspend the tasklet, we want to make sure that when it wakes it finds there is nothing to do. Otherwise, it will loudly complain that the ELSP port tracking no longer matches the hardware, and we will be mightly confused. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> --- drivers/gpu/drm/i915/i915_gem.c | 7 +++++++ 1 file changed, 7 insertions(+)