drm/i915: Defer modeset cleanup to a secondary task

Submitted by Chris Wilson on June 23, 2018, 10:39 a.m.

Details

Message ID 20180623103951.23889-1-chris@chris-wilson.co.uk
State Accepted
Commit 8d52e447807b350b98ffb4e64bc2fcc1f181c5be
Headers show
Series "drm/i915: Defer modeset cleanup to a secondary task" ( rev: 1 ) in Intel GFX

Not browsing as part of any series.

Commit Message

Chris Wilson June 23, 2018, 10:39 a.m.
If we avoid cleaning up the old state immediately in
intel_atomic_commit_tail() and defer it to a second task, we can avoid
taking heavily contended locks when the caller is ready to procede.
Subsequent modesets will wait for the cleanup operation (either directly
via the ordered modeset wq or indirectly through the atomic helperr)
which keeps the number of inflight cleanup tasks in check.

As an example, during reset an immediate modeset is performed to disable
the displays before the HW is reset, which must avoid struct_mutex to
avoid recursion. Moving the cleanup to a separate task, defers acquiring
the struct_mutex to after the GPU is running again, allowing it to
complete. Even in a few patches time (optimist!) when we no longer
require struct_mutex to unpin the framebuffers, it will still be good
practice to minimise the number of contention points along reset. The
mutex dependency still exists (as one modeset flushes the other), but in
the short term it resolves the deadlock for simple reset cases.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/intel_display.c | 30 +++++++++++++++++++++-------
 1 file changed, 23 insertions(+), 7 deletions(-)

Patch hide | download patch | download mbox

diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 3d849ec17f5c..3709fa1b6318 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -12554,6 +12554,19 @@  static void intel_atomic_commit_fence_wait(struct intel_atomic_state *intel_stat
 	finish_wait(&dev_priv->gpu_error.wait_queue, &wait_reset);
 }
 
+static void intel_atomic_cleanup_work(struct work_struct *work)
+{
+	struct drm_atomic_state *state =
+		container_of(work, struct drm_atomic_state, commit_work);
+	struct drm_i915_private *i915 = to_i915(state->dev);
+
+	drm_atomic_helper_cleanup_planes(&i915->drm, state);
+	drm_atomic_helper_commit_cleanup_done(state);
+	drm_atomic_state_put(state);
+
+	intel_atomic_helper_free_state(i915);
+}
+
 static void intel_atomic_commit_tail(struct drm_atomic_state *state)
 {
 	struct drm_device *dev = state->dev;
@@ -12714,13 +12727,16 @@  static void intel_atomic_commit_tail(struct drm_atomic_state *state)
 		intel_display_power_put(dev_priv, POWER_DOMAIN_MODESET);
 	}
 
-	drm_atomic_helper_cleanup_planes(dev, state);
-
-	drm_atomic_helper_commit_cleanup_done(state);
-
-	drm_atomic_state_put(state);
-
-	intel_atomic_helper_free_state(dev_priv);
+	/*
+	 * Defer the cleanup of the old state to a separate worker to not
+	 * impede the current task (userspace for blocking modesets) that
+	 * are executed inline. For out-of-line asynchronous modesets/flips,
+	 * deferring to a new worker seems overkill, but we would place a
+	 * schedule point (cond_resched()) here anyway to keep latencies
+	 * down.
+	 */
+	INIT_WORK(&state->commit_work, intel_atomic_cleanup_work);
+	schedule_work(&state->commit_work);
 }
 
 static void intel_atomic_commit_work(struct work_struct *work)

Comments

Quoting Chris Wilson (2018-06-23 11:39:51)
> If we avoid cleaning up the old state immediately in
> intel_atomic_commit_tail() and defer it to a second task, we can avoid
> taking heavily contended locks when the caller is ready to procede.
> Subsequent modesets will wait for the cleanup operation (either directly
> via the ordered modeset wq or indirectly through the atomic helperr)
> which keeps the number of inflight cleanup tasks in check.
> 
> As an example, during reset an immediate modeset is performed to disable
> the displays before the HW is reset, which must avoid struct_mutex to
> avoid recursion. Moving the cleanup to a separate task, defers acquiring
> the struct_mutex to after the GPU is running again, allowing it to
> complete. Even in a few patches time (optimist!) when we no longer
> require struct_mutex to unpin the framebuffers, it will still be good
> practice to minimise the number of contention points along reset. The
> mutex dependency still exists (as one modeset flushes the other), but in
> the short term it resolves the deadlock for simple reset cases.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

This silences
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101600
which is one of the *four* remaining failures in BAT.

We're almost green, so let's board up this window as well.
-Chris
Quoting Chris Wilson (2018-06-25 11:16:58)
> Quoting Chris Wilson (2018-06-23 11:39:51)
> > If we avoid cleaning up the old state immediately in
> > intel_atomic_commit_tail() and defer it to a second task, we can avoid
> > taking heavily contended locks when the caller is ready to procede.
> > Subsequent modesets will wait for the cleanup operation (either directly
> > via the ordered modeset wq or indirectly through the atomic helperr)
> > which keeps the number of inflight cleanup tasks in check.
> > 
> > As an example, during reset an immediate modeset is performed to disable
> > the displays before the HW is reset, which must avoid struct_mutex to
> > avoid recursion. Moving the cleanup to a separate task, defers acquiring
> > the struct_mutex to after the GPU is running again, allowing it to
> > complete. Even in a few patches time (optimist!) when we no longer
> > require struct_mutex to unpin the framebuffers, it will still be good
> > practice to minimise the number of contention points along reset. The
> > mutex dependency still exists (as one modeset flushes the other), but in
> > the short term it resolves the deadlock for simple reset cases.
> > 
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> 
> This silences
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101600
> which is one of the *four* remaining failures in BAT.

Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>

Maarten said he would ack it if I could dig up the bugzilla link for the
CI fail. This is not the long term solution, just a convenient hack that
adds a schedule point after the modeset (which in imo is beneficial).
-Chris
Quoting Chris Wilson (2018-06-25 12:00:16)
> Quoting Chris Wilson (2018-06-25 11:16:58)
> > Quoting Chris Wilson (2018-06-23 11:39:51)
> > > If we avoid cleaning up the old state immediately in
> > > intel_atomic_commit_tail() and defer it to a second task, we can avoid
> > > taking heavily contended locks when the caller is ready to procede.
> > > Subsequent modesets will wait for the cleanup operation (either directly
> > > via the ordered modeset wq or indirectly through the atomic helperr)
> > > which keeps the number of inflight cleanup tasks in check.
> > > 
> > > As an example, during reset an immediate modeset is performed to disable
> > > the displays before the HW is reset, which must avoid struct_mutex to
> > > avoid recursion. Moving the cleanup to a separate task, defers acquiring
> > > the struct_mutex to after the GPU is running again, allowing it to
> > > complete. Even in a few patches time (optimist!) when we no longer
> > > require struct_mutex to unpin the framebuffers, it will still be good
> > > practice to minimise the number of contention points along reset. The
> > > mutex dependency still exists (as one modeset flushes the other), but in
> > > the short term it resolves the deadlock for simple reset cases.
> > > 
> > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > 
> > This silences
> > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101600
> > which is one of the *four* remaining failures in BAT.
> 
> Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
> 
> Maarten said he would ack it if I could dig up the bugzilla link for the
> CI fail. This is not the long term solution, just a convenient hack that
> adds a schedule point after the modeset (which in imo is beneficial).

Added a further ack from Daniel and pushed. Thanks for checking over the
patch,
-Chris