[v7,4/8] drm/i915: Fix clean up of file client list on execbuff failure

Submitted by John Harrison on April 20, 2016, 5:09 p.m.

Details

Message ID 1461172195-3959-5-git-send-email-John.C.Harrison@Intel.com
State New
Headers show
Series "Convert requests to use struct fence" ( rev: 4 ) in Intel GFX

Not browsing as part of any series.

Commit Message

John Harrison April 20, 2016, 5:09 p.m.
From: John Harrison <John.C.Harrison@Intel.com>

If an execbuff IOCTL call fails for some reason, it would leave the
request in the client list. The request clean up code would remove
this but only later on and only after the reference count has dropped
to zero. The entire sequence is contained within the driver mutex
lock. However, there is still a hole such that any code which does not
require the mutex lock could still find the request on the client list
and start using it. That would lead to broken reference counts, use of
dangling pointers and all sorts of other nastiness.

The throttle IOCTL in particular does not acquire the mutex and does
process the client list. And the likely situation of the execbuff
IOCTL failing is when the system is busy with lots of work
outstanding. That is exactly the situation where the throttle IOCTL
would try to wait on a request.

Currently, this hole is tiny - the gap between the reference count
dropping to zero and the free function being called in response.
However the next patch in this series enlarges that gap considerably
by deferring the free function (to remove the need for the mutex lock
when unreferencing requests).

v7: New patch in series.

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h            | 1 +
 drivers/gpu/drm/i915/i915_gem.c            | 5 ++---
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 6 +++++-
 3 files changed, 8 insertions(+), 4 deletions(-)

Patch hide | download patch | download mbox

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 1c3a1ca..707bf0f 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2358,6 +2358,7 @@  static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req)
 
 int i915_gem_request_add_to_client(struct drm_i915_gem_request *req,
 				   struct drm_file *file);
+void i915_gem_request_remove_from_client(struct drm_i915_gem_request *request);
 
 static inline uint32_t
 i915_gem_request_get_seqno(struct drm_i915_gem_request *req)
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 99e9ddb..74db6a3 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1399,8 +1399,7 @@  int i915_gem_request_add_to_client(struct drm_i915_gem_request *req,
 	return 0;
 }
 
-static inline void
-i915_gem_request_remove_from_client(struct drm_i915_gem_request *request)
+void i915_gem_request_remove_from_client(struct drm_i915_gem_request *request)
 {
 	struct drm_i915_file_private *file_priv = request->file_priv;
 
@@ -2735,7 +2734,7 @@  static void i915_gem_request_free(struct fence *req_fence)
 
 	WARN_ON(!mutex_is_locked(&req->engine->dev->struct_mutex));
 
-	if (req->file_priv)
+	if (WARN_ON(req->file_priv))
 		i915_gem_request_remove_from_client(req);
 
 	if (ctx) {
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index bb95d27..40d333c 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1659,8 +1659,12 @@  err:
 	 * must be freed again. If it was submitted then it is being tracked
 	 * on the active request list and no clean up is required here.
 	 */
-	if (ret && !IS_ERR_OR_NULL(req))
+	if (ret && !IS_ERR_OR_NULL(req)) {
+		if (req->file_priv)
+			i915_gem_request_remove_from_client(req);
+
 		i915_gem_request_cancel(req);
+	}
 
 	mutex_unlock(&dev->struct_mutex);
 

Comments

Op 20-04-16 om 19:09 schreef John.C.Harrison@Intel.com:
> From: John Harrison <John.C.Harrison@Intel.com>
>
> If an execbuff IOCTL call fails for some reason, it would leave the
> request in the client list. The request clean up code would remove
> this but only later on and only after the reference count has dropped
> to zero. The entire sequence is contained within the driver mutex
> lock. However, there is still a hole such that any code which does not
> require the mutex lock could still find the request on the client list
> and start using it. That would lead to broken reference counts, use of
> dangling pointers and all sorts of other nastiness.
>
> The throttle IOCTL in particular does not acquire the mutex and does
> process the client list. And the likely situation of the execbuff
> IOCTL failing is when the system is busy with lots of work
> outstanding. That is exactly the situation where the throttle IOCTL
> would try to wait on a request.
>
> Currently, this hole is tiny - the gap between the reference count
> dropping to zero and the free function being called in response.
> However the next patch in this series enlarges that gap considerably
> by deferring the free function (to remove the need for the mutex lock
> when unreferencing requests).
>
Seems to already have been fixed by

commit aa9b78104fe3210758fa9e6c644e9a108d371e8b
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed Apr 13 17:35:15 2016 +0100

    drm/i915: Late request cancellations are harmful
On 21/04/2016 08:20, Maarten Lankhorst wrote:
> Op 20-04-16 om 19:09 schreef John.C.Harrison@Intel.com:
>> From: John Harrison <John.C.Harrison@Intel.com>
>>
>> If an execbuff IOCTL call fails for some reason, it would leave the
>> request in the client list. The request clean up code would remove
>> this but only later on and only after the reference count has dropped
>> to zero. The entire sequence is contained within the driver mutex
>> lock. However, there is still a hole such that any code which does not
>> require the mutex lock could still find the request on the client list
>> and start using it. That would lead to broken reference counts, use of
>> dangling pointers and all sorts of other nastiness.
>>
>> The throttle IOCTL in particular does not acquire the mutex and does
>> process the client list. And the likely situation of the execbuff
>> IOCTL failing is when the system is busy with lots of work
>> outstanding. That is exactly the situation where the throttle IOCTL
>> would try to wait on a request.
>>
>> Currently, this hole is tiny - the gap between the reference count
>> dropping to zero and the free function being called in response.
>> However the next patch in this series enlarges that gap considerably
>> by deferring the free function (to remove the need for the mutex lock
>> when unreferencing requests).
>>
> Seems to already have been fixed by
>
> commit aa9b78104fe3210758fa9e6c644e9a108d371e8b
> Author: Chris Wilson <chris@chris-wilson.co.uk>
> Date:   Wed Apr 13 17:35:15 2016 +0100
>
>      drm/i915: Late request cancellations are harmful
>

I don't believe that patch is the best way to solve the problem.