[1/3] drm/i915/gt: Flush retire.work timer object on unload

Submitted by Chris Wilson on Nov. 15, 2019, 3:08 p.m.

Details

Message ID 20191115150841.880349-1-chris@chris-wilson.co.uk
State Accepted
Commit dea397e818b18f688734802465e45a35a976d4df
Headers show
Series "Series without cover letter" ( rev: 1 ) in Intel GFX

Not browsing as part of any series.

Commit Message

Chris Wilson Nov. 15, 2019, 3:08 p.m.
We need to wait until the timer object is marked as deactivated before
unloading, so follow up our gentle cancel_delayed_work() with the
synchronous variant to ensure it is flushed off a remote cpu before we
mark the memory as freed.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111994
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_gt.c          | 1 +
 drivers/gpu/drm/i915/gt/intel_gt_requests.c | 6 ++++++
 drivers/gpu/drm/i915/gt/intel_gt_requests.h | 1 +
 3 files changed, 8 insertions(+)

Patch hide | download patch | download mbox

diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
index c39b21c8d328..b5a9b87e4ec9 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt.c
@@ -397,6 +397,7 @@  void intel_gt_driver_release(struct intel_gt *gt)
 void intel_gt_driver_late_release(struct intel_gt *gt)
 {
 	intel_uc_driver_late_release(&gt->uc);
+	intel_gt_fini_requests(gt);
 	intel_gt_fini_reset(gt);
 	intel_gt_fini_timelines(gt);
 }
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_requests.c b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
index ccbddddbbd52..a79e6efb31a2 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_requests.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
@@ -130,3 +130,9 @@  void intel_gt_unpark_requests(struct intel_gt *gt)
 	schedule_delayed_work(&gt->requests.retire_work,
 			      round_jiffies_up_relative(HZ));
 }
+
+void intel_gt_fini_requests(struct intel_gt *gt)
+{
+	/* Wait until the work is marked as finished before unloading! */
+	cancel_delayed_work_sync(&gt->requests.retire_work);
+}
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_requests.h b/drivers/gpu/drm/i915/gt/intel_gt_requests.h
index bd31cbce47e0..fde546424c63 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_requests.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt_requests.h
@@ -20,5 +20,6 @@  int intel_gt_wait_for_idle(struct intel_gt *gt, long timeout);
 void intel_gt_init_requests(struct intel_gt *gt);
 void intel_gt_park_requests(struct intel_gt *gt);
 void intel_gt_unpark_requests(struct intel_gt *gt);
+void intel_gt_fini_requests(struct intel_gt *gt);
 
 #endif /* INTEL_GT_REQUESTS_H */

Comments

On 15/11/2019 15:08, Chris Wilson wrote:
> We need to wait until the timer object is marked as deactivated before
> unloading, so follow up our gentle cancel_delayed_work() with the
> synchronous variant to ensure it is flushed off a remote cpu before we
> mark the memory as freed.
> 
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111994
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>   drivers/gpu/drm/i915/gt/intel_gt.c          | 1 +
>   drivers/gpu/drm/i915/gt/intel_gt_requests.c | 6 ++++++
>   drivers/gpu/drm/i915/gt/intel_gt_requests.h | 1 +
>   3 files changed, 8 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
> index c39b21c8d328..b5a9b87e4ec9 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gt.c
> @@ -397,6 +397,7 @@ void intel_gt_driver_release(struct intel_gt *gt)
>   void intel_gt_driver_late_release(struct intel_gt *gt)
>   {
>   	intel_uc_driver_late_release(&gt->uc);
> +	intel_gt_fini_requests(gt);
>   	intel_gt_fini_reset(gt);
>   	intel_gt_fini_timelines(gt);
>   }
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_requests.c b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
> index ccbddddbbd52..a79e6efb31a2 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt_requests.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
> @@ -130,3 +130,9 @@ void intel_gt_unpark_requests(struct intel_gt *gt)
>   	schedule_delayed_work(&gt->requests.retire_work,
>   			      round_jiffies_up_relative(HZ));
>   }
> +
> +void intel_gt_fini_requests(struct intel_gt *gt)
> +{
> +	/* Wait until the work is marked as finished before unloading! */
> +	cancel_delayed_work_sync(&gt->requests.retire_work);
> +}
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_requests.h b/drivers/gpu/drm/i915/gt/intel_gt_requests.h
> index bd31cbce47e0..fde546424c63 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt_requests.h
> +++ b/drivers/gpu/drm/i915/gt/intel_gt_requests.h
> @@ -20,5 +20,6 @@ int intel_gt_wait_for_idle(struct intel_gt *gt, long timeout);
>   void intel_gt_init_requests(struct intel_gt *gt);
>   void intel_gt_park_requests(struct intel_gt *gt);
>   void intel_gt_unpark_requests(struct intel_gt *gt);
> +void intel_gt_fini_requests(struct intel_gt *gt);
>   
>   #endif /* INTEL_GT_REQUESTS_H */
> 

Sounds plausible. Verified fix or speculative?

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko
Quoting Tvrtko Ursulin (2019-11-15 16:09:00)
> 
> On 15/11/2019 15:08, Chris Wilson wrote:
> > We need to wait until the timer object is marked as deactivated before
> > unloading, so follow up our gentle cancel_delayed_work() with the
> > synchronous variant to ensure it is flushed off a remote cpu before we
> > mark the memory as freed.
> > 
> > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111994
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> > ---
> >   drivers/gpu/drm/i915/gt/intel_gt.c          | 1 +
> >   drivers/gpu/drm/i915/gt/intel_gt_requests.c | 6 ++++++
> >   drivers/gpu/drm/i915/gt/intel_gt_requests.h | 1 +
> >   3 files changed, 8 insertions(+)
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
> > index c39b21c8d328..b5a9b87e4ec9 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_gt.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_gt.c
> > @@ -397,6 +397,7 @@ void intel_gt_driver_release(struct intel_gt *gt)
> >   void intel_gt_driver_late_release(struct intel_gt *gt)
> >   {
> >       intel_uc_driver_late_release(&gt->uc);
> > +     intel_gt_fini_requests(gt);
> >       intel_gt_fini_reset(gt);
> >       intel_gt_fini_timelines(gt);
> >   }
> > diff --git a/drivers/gpu/drm/i915/gt/intel_gt_requests.c b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
> > index ccbddddbbd52..a79e6efb31a2 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_gt_requests.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
> > @@ -130,3 +130,9 @@ void intel_gt_unpark_requests(struct intel_gt *gt)
> >       schedule_delayed_work(&gt->requests.retire_work,
> >                             round_jiffies_up_relative(HZ));
> >   }
> > +
> > +void intel_gt_fini_requests(struct intel_gt *gt)
> > +{
> > +     /* Wait until the work is marked as finished before unloading! */
> > +     cancel_delayed_work_sync(&gt->requests.retire_work);
> > +}
> > diff --git a/drivers/gpu/drm/i915/gt/intel_gt_requests.h b/drivers/gpu/drm/i915/gt/intel_gt_requests.h
> > index bd31cbce47e0..fde546424c63 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_gt_requests.h
> > +++ b/drivers/gpu/drm/i915/gt/intel_gt_requests.h
> > @@ -20,5 +20,6 @@ int intel_gt_wait_for_idle(struct intel_gt *gt, long timeout);
> >   void intel_gt_init_requests(struct intel_gt *gt);
> >   void intel_gt_park_requests(struct intel_gt *gt);
> >   void intel_gt_unpark_requests(struct intel_gt *gt);
> > +void intel_gt_fini_requests(struct intel_gt *gt);
> >   
> >   #endif /* INTEL_GT_REQUESTS_H */
> > 
> 
> Sounds plausible. Verified fix or speculative?

Only verified in the sense that it came and went away again.
It's timing dependent and all the debugobject says is delayed_work of
which we free a few hundred at module unload. But I suspect this is the
only delayed work that doesn't have a sync on shutdown at present.

It sounded plausible, albeit unlikely, to me that we could just about do
the kfree(i915) while the worker was asleep on another cpu before its
debugobject could be marked as deactivated.
-Chris