drm/i915: Recursive i915_reset_trylock() verboten

Submitted by Chris Wilson on Feb. 12, 2019, 10:23 a.m.

Details

Message ID 20190212102354.22284-1-chris@chris-wilson.co.uk
State Accepted
Series "drm/i915: Recursive i915_reset_trylock() verboten"
Commit aeaaa55c7368ea0e7c195baa35dea37b806efb11

Commit Message

Chris Wilson Feb. 12, 2019, 10:23 a.m.
We cannot nest i915_reset_trylock(), as the inner call may wait for
I915_RESET_BACKOFF to be cleared, while the reset that set it is in turn
waiting in sync_srcu for our outermost lock to be released. As we take
the reset srcu around the fence update, we have to defer taking it in
i915_gem_fault() until after we acquire the pin on the fence to avoid
nesting. This is a little ugly, but still works. If a reset occurs
between i915_vma_pin_fence() and the reset lock taken afterwards in
i915_gem_fault(), the reset will restore the fence register to the
pinned value before the reset lock allows us to proceed. Our mmap won't
be revoked, because we haven't yet marked it as a userfault (which
requires us to hold the reset lock), so the pagefault is still
serialised with the revocation in reset.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109605
Fixes: 2caffbf11762 ("drm/i915: Revoke mmaps and prevent access to fence registers across reset")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
---
 drivers/gpu/drm/i915/i915_gem.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)
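To make the cycle described in the commit message concrete, here is a small single-threaded C model. It is only a sketch: `struct reset_model`, `model_trylock()` and the other helpers are hypothetical stand-ins, not i915 functions. `backoff` plays the role of I915_RESET_BACKOFF, `readers` counts holders of the reset srcu, and where the real i915_reset_trylock() would sleep, the model returns -EAGAIN instead. Before this patch, i915_gem_fault() held the reset lock while calling i915_vma_pin_fence(), which, per the commit message, takes the reset srcu again around the fence update; `nested_is_deadlocked()` models that ordering.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical, single-threaded model of the reset lock (not the i915 API):
 *   backoff ~ I915_RESET_BACKOFF, set by the reset path before it syncs
 *   readers ~ read-side sections taken by i915_reset_trylock() callers
 * Where the real code would sleep, the model returns -EAGAIN instead.
 */
struct reset_model {
	bool backoff;
	int readers;
};

static int model_trylock(struct reset_model *m)
{
	if (m->backoff)
		return -EAGAIN; /* real code: wait for backoff to clear */
	m->readers++;
	return 0;
}

static void model_unlock(struct reset_model *m)
{
	m->readers--;
}

/* The reset may only finish (sync_srcu) once every reader has left. */
static bool model_reset_can_finish(const struct reset_model *m)
{
	return m->readers == 0;
}

/* Nested: outer lock held, a reset starts, then the inner lock is tried. */
static bool nested_is_deadlocked(void)
{
	struct reset_model m = { 0 };

	model_trylock(&m);             /* outer lock */
	m.backoff = true;              /* concurrent reset begins */
	bool inner_blocked = model_trylock(&m) == -EAGAIN;
	bool reset_blocked = !model_reset_can_finish(&m);

	return inner_blocked && reset_blocked; /* each waits on the other */
}

/* Single acquisition: dropping the lock lets the reset finish. */
static bool single_makes_progress(void)
{
	struct reset_model m = { 0 };

	model_trylock(&m);
	m.backoff = true;              /* reset begins while we hold the lock */
	model_unlock(&m);              /* fault path finishes and unlocks */
	if (!model_reset_can_finish(&m))
		return false;
	m.backoff = false;             /* reset completes, backoff cleared */
	return model_trylock(&m) == 0;
}
```

In the nested case neither side can move: the inner acquisition waits for the backoff bit, and the reset waits for the outer reader. With a single acquisition, the reader eventually unlocks and the reset completes.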

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index c8c355bec091..ae1467a74a08 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1923,16 +1923,16 @@  vm_fault_t i915_gem_fault(struct vm_fault *vmf)
 	if (ret)
 		goto err_unpin;
 
+	ret = i915_vma_pin_fence(vma);
+	if (ret)
+		goto err_unpin;
+
 	srcu = i915_reset_trylock(dev_priv);
 	if (srcu < 0) {
 		ret = srcu;
-		goto err_unpin;
+		goto err_fence;
 	}
 
-	ret = i915_vma_pin_fence(vma);
-	if (ret)
-		goto err_reset;
-
 	/* Finally, remap it using the new GTT offset */
 	ret = remap_io_mapping(area,
 			       area->vm_start + (vma->ggtt_view.partial.offset << PAGE_SHIFT),
@@ -1940,7 +1940,7 @@  vm_fault_t i915_gem_fault(struct vm_fault *vmf)
 			       min_t(u64, vma->size, area->vm_end - area->vm_start),
 			       &ggtt->iomap);
 	if (ret)
-		goto err_fence;
+		goto err_reset;
 
 	/* Mark as being mmapped into userspace for later revocation */
 	assert_rpm_wakelock_held(dev_priv);
@@ -1950,10 +1950,10 @@  vm_fault_t i915_gem_fault(struct vm_fault *vmf)
 
 	i915_vma_set_ggtt_write(vma);
 
-err_fence:
-	i915_vma_unpin_fence(vma);
 err_reset:
 	i915_reset_unlock(dev_priv, srcu);
+err_fence:
+	i915_vma_unpin_fence(vma);
 err_unpin:
 	__i915_vma_unpin(vma);
 err_unlock:
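The reordered labels follow the usual kernel pattern of unwinding in reverse order of acquisition. The sketch below mimics the new ordering in i915_gem_fault() using hypothetical stubs that only record a trace ('V' vma pin, 'F' fence pin, 'R' reset lock, 'M' remap; lowercase letters are the matching releases) and can be told to fail at a chosen step. It is not the driver code, only a way to check that each failure point releases exactly what was acquired before it. As in the patched function, the success path falls through the labels, and the remap itself is not undone by the unwind.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical step identifiers; the step named in `fail_at` returns an error. */
enum step { STEP_NONE, STEP_PIN_VMA, STEP_PIN_FENCE, STEP_RESET_LOCK, STEP_REMAP };

static char trace[16];
static size_t trace_len;
static enum step fail_at;

static void note(char c)
{
	trace[trace_len++] = c;
	trace[trace_len] = '\0';
}

/* Acquire helper: fails (without recording) if this is the injected failure. */
static int acquire(enum step s, char tag)
{
	if (s == fail_at)
		return -1;
	note(tag);
	return 0;
}

static int fault_sketch(enum step inject)
{
	int ret;

	trace_len = 0;
	trace[0] = '\0';
	fail_at = inject;

	ret = acquire(STEP_PIN_VMA, 'V');
	if (ret)
		goto err;
	ret = acquire(STEP_PIN_FENCE, 'F');  /* fence pinned before the reset lock */
	if (ret)
		goto err_unpin;
	ret = acquire(STEP_RESET_LOCK, 'R');
	if (ret)
		goto err_fence;
	ret = acquire(STEP_REMAP, 'M');
	if (ret)
		goto err_reset;
	/* success falls through the labels, releasing in reverse order */
err_reset:
	note('r');
err_fence:
	note('f');
err_unpin:
	note('v');
err:
	return ret;
}
```

For example, injecting a failure at the reset lock yields the trace "VFfv": the fence pin is dropped before the vma pin, and the reset lock, never taken, is never released.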

Comments

Mika Kuoppala Feb. 12, 2019, 11:12 a.m.
Chris Wilson <chris@chris-wilson.co.uk> writes:

> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index c8c355bec091..ae1467a74a08 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -1923,16 +1923,16 @@ vm_fault_t i915_gem_fault(struct vm_fault *vmf)
>  	if (ret)
>  		goto err_unpin;
>  
> +	ret = i915_vma_pin_fence(vma);
> +	if (ret)
> +		goto err_unpin;
> +

As something this obvious slipped past us, would it
be worthwhile, in retrospect, to build a debug check into
i915_reset_trylock() so that it is vocal about failing
to make progress?

Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>

Chris Wilson Feb. 12, 2019, 11:18 a.m.
Quoting Mika Kuoppala (2019-02-12 11:12:05)
> Chris Wilson <chris@chris-wilson.co.uk> writes:
> 
> As something this obvious slipped past us, would it
> be worthwhile, in retrospect, to build a debug check into
> i915_reset_trylock() so that it is vocal about failing
> to make progress?

If we stick a timeout in there, we just send that error back to
userspace: the deadlock is resolved, just with a sporadic delay.
As it stands, the wait is interruptible, so it's not a complete loss,
and the bug is more obvious if it stalls. That's my thinking for not
sending along the quick conversion to wait_event_interruptible_timeout().

What I think we can do is stick a might_lock() in there, so we get the
lockdep splat before the wait?
-Chris
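Chris's might_lock() suggestion relies on lockdep, which only exists in the kernel. As a rough userspace analogue of Mika's "be vocal about it" idea, the hypothetical helpers below (`debug_reset_trylock()`, `debug_reset_unlock()`) track how many reset read sections the current thread holds and complain before a nested acquisition could block. Unlike might_lock(), which only adds a lockdep annotation (and a splat when lockdep detects a problem) and then lets the caller continue into the wait, this sketch refuses the nested call with -EDEADLK; that behaviour is an assumption chosen to keep the example testable.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/*
 * Hypothetical userspace stand-in for a "complain before waiting" check.
 * This is NOT lockdep or might_lock(); it only counts how many reset
 * read-side sections the current thread already holds.
 */
static _Thread_local int reset_depth;

static int debug_reset_trylock(void)
{
	if (reset_depth > 0) {
		/* Report the nesting before we would block on the backoff flag. */
		fprintf(stderr, "reset lock nested (depth %d): would deadlock\n",
			reset_depth);
		return -EDEADLK;
	}
	reset_depth++;
	return 0;
}

static void debug_reset_unlock(void)
{
	reset_depth--;
}
```

Because the counter is per thread, only recursion within one task is flagged; a different thread taking the lock concurrently is unaffected, which matches the single-task nesting this patch removes.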