drm/amd/display: Don't cleanup new cursor fb in fast cursor update

Submitted by Kazlauskas, Nicholas on Dec. 14, 2018, 2:48 p.m.

Details

Message ID 20181214144813.7403-1-nicholas.kazlauskas@amd.com
State New
Headers show
Series "drm/amd/display: Don't cleanup new cursor fb in fast cursor update" ( rev: 1 ) in AMD X.Org drivers

Not browsing as part of any series.

Commit Message

Kazlauskas, Nicholas Dec. 14, 2018, 2:48 p.m.
[Why]
The behavior of dm_plane_helper_cleanup_fb differs depending on
whether the commit was asynchronous or not. When it's called from
amdgpu_dm_atomic_commit_tail during a typical atomic commit the
plane state has been swapped so it calls cleanup_fb on the old plane
state.

However, in the asynchronous commit codepath the call to
drm_atomic_helper_commit also calls dm_plane_helper_cleanup_fb after
atomic_async_update has been called. Since the plane state has not
been swapped, the cleanup_fb call affects the new plane state with the
active fb.

The slow, full path is only ever used for the cursor update when the
cursor CRTC changes. If there was previously a fb in the state before
that had gone through the fast asynchronous path then cleanup_fb will
be called a second time on an fb that was previously unpinned/unref'd.

This results in a use after free on the next cursor update.

[How]
Regardless of whether this was intentional behavior in DRM or not,
the fix is to manually call cleanup_fb on the old plane state's fb.

Since the pointer to the old_plane_state is actually the current
plane->state in this function, a backup is needed. This has a minor
change to behavior for handle_cursor_update since it now correctly uses
the old state.

The next step is to prevent the cleanup_fb call from DRM from unpinning
the active new_state->fb. Setting new_state->fb to NULL would be
enough to handle this, but DRM has a warning in the
drm_atomic_helper_async_commit helper if
plane->state->fb != new_state->fb.

A new field was added (cursor_update) to dm_plane_state that's only
ever set to true during the fast path. If it's true, then cleanup_fb
doesn't unpin and unref the fb.

Cc: Leo Li <sunpeng.li@amd.com>
Cc: Harry Wentland <harry.wentland@amd.com>
Cc: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
---
 .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c   | 17 +++++++++++++----
 .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h   |  2 ++
 2 files changed, 15 insertions(+), 4 deletions(-)

Patch hide | download patch | download mbox

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index d01315965af0..b71c834a959d 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -3589,9 +3589,10 @@  static void dm_plane_helper_cleanup_fb(struct drm_plane *plane,
 				       struct drm_plane_state *old_state)
 {
 	struct amdgpu_bo *rbo;
+	struct dm_plane_state *dm_plane_state = to_dm_plane_state(old_state);
 	int r;
 
-	if (!old_state->fb)
+	if (!old_state->fb || dm_plane_state->cursor_update)
 		return;
 
 	rbo = gem_to_amdgpu_bo(old_state->fb->obj[0]);
@@ -3638,8 +3639,8 @@  static int dm_plane_atomic_async_check(struct drm_plane *plane,
 static void dm_plane_atomic_async_update(struct drm_plane *plane,
 					 struct drm_plane_state *new_state)
 {
-	struct drm_plane_state *old_state =
-		drm_atomic_get_old_plane_state(new_state->state, plane);
+	struct drm_plane_state old_state = *plane->state;
+	struct dm_plane_state *dm_plane_state = to_dm_plane_state(new_state);
 
 	if (plane->state->fb != new_state->fb)
 		drm_atomic_set_fb_for_plane(plane->state, new_state->fb);
@@ -3653,7 +3654,15 @@  static void dm_plane_atomic_async_update(struct drm_plane *plane,
 	plane->state->crtc_w = new_state->crtc_w;
 	plane->state->crtc_h = new_state->crtc_h;
 
-	handle_cursor_update(plane, old_state);
+	handle_cursor_update(plane, &old_state);
+
+	/*
+	 * While DRM already takes care of drm_atomic_helper_cleanup_planes,
+	 * it does it on the wrong state (new_state instead of old_state).
+	 * This is a workaround for that behavior.
+	 */
+	dm_plane_helper_cleanup_fb(plane, &old_state);
+	dm_plane_state->cursor_update = true;
 }
 
 static const struct drm_plane_helper_funcs dm_plane_helper_funcs = {
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h
index 25bb91ee80ba..ecac91036864 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h
@@ -254,6 +254,8 @@  struct dc_plane_state;
 struct dm_plane_state {
 	struct drm_plane_state base;
 	struct dc_plane_state *dc_state;
+
+	bool cursor_update;
 };
 
 struct dm_crtc_state {

Comments

On 2018-12-14 3:48 p.m., Nicholas Kazlauskas wrote:
> [Why]
> The behavior of dm_plane_helper_cleanup_fb differs depending on
> whether the commit was asynchronous or not. When it's called from
> amdgpu_dm_atomic_commit_tail during a typical atomic commit the
> plane state has been swapped so it calls cleanup_fb on the old plane
> state.
> 
> However, in the asynchronous commit codepath the call to
> drm_atomic_helper_commit also calls dm_plane_helper_cleanup_fb after
> atomic_async_update has been called. Since the plane state has not
> been swapped, the cleanup_fb call affects the new plane state with the
> active fb.
> 
> The slow, full path is only ever used for the cursor update when the
> cursor CRTC changes. If there was previously a fb in the state before
> that had gone through the fast asynchronous path then cleanup_fb will
> be called a second time on an fb that was previously unpinned/unref'd.
> 
> This results in a use after free on the next cursor update.
> 
> [How]
> Regardless of whether this was intentional behavior in DRM or not,
> the fix is to manually call cleanup_fb on the old plane state's fb.
> 
> Since the pointer to the old_plane_state is actually the current
> plane->state in this function, a backup is needed. This has a minor
> change to behavior for handle_cursor_update since it now correctly uses
> the old state.
> 
> The next step is to prevent the cleanup_fb call from DRM from unpinning
> the active new_state->fb. Setting new_state->fb to NULL would be
> enough to handle this, but DRM has a warning in the
> drm_atomic_helper_async_commit helper if
> plane->state->fb != new_state->fb.
> 
> A new field was added (cursor_update) to dm_plane_state that's only
> ever set to true during the fast path. If it's true, then cleanup_fb
> doesn't unpin and unref the fb.
> 
> Cc: Leo Li <sunpeng.li@amd.com>
> Cc: Harry Wentland <harry.wentland@amd.com>
> Cc: Michel Dänzer <michel.daenzer@amd.com>
> Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
> 
> [...]
>  
> @@ -3653,7 +3654,15 @@ static void dm_plane_atomic_async_update(struct drm_plane *plane,
>  	plane->state->crtc_w = new_state->crtc_w;
>  	plane->state->crtc_h = new_state->crtc_h;
>  
> -	handle_cursor_update(plane, old_state);
> +	handle_cursor_update(plane, &old_state);
> +
> +	/*
> +	 * While DRM already takes care of drm_atomic_helper_cleanup_planes,
> +	 * it does it on the wrong state (new_state instead of old_state).
> +	 * This is a workaround for that behavior.
> +	 */
> +	dm_plane_helper_cleanup_fb(plane, &old_state);
> +	dm_plane_state->cursor_update = true;

We shouldn't work around DRM core code in driver code. Please work on a
solution with the core code developers on the dri-devel list, and revert
the cursor fast path for now.
On 12/14/18 10:01 AM, Michel Dänzer wrote:
> On 2018-12-14 3:48 p.m., Nicholas Kazlauskas wrote:

>> [Why]

>> The behavior of dm_plane_helper_cleanup_fb differs depending on

>> whether the commit was asynchronous or not. When it's called from

>> amdgpu_dm_atomic_commit_tail during a typical atomic commit the

>> plane state has been swapped so it calls cleanup_fb on the old plane

>> state.

>>

>> However, in the asynchronous commit codepath the call to

>> drm_atomic_helper_commit also calls dm_plane_helper_cleanup_fb after

>> atomic_async_update has been called. Since the plane state has not

>> been swapped, the cleanup_fb call affects the new plane state with the

>> active fb.

>>

>> The slow, full path is only ever used for the cursor update when the

>> cursor CRTC changes. If there was previously a fb in the state before

>> that had gone through the fast asynchronous path then cleanup_fb will

>> be called a second time on an fb that was previously unpinned/unref'd.

>>

>> This results in a use after free on the next cursor update.

>>

>> [How]

>> Regardless of whether this was intentional behavior in DRM or not,

>> the fix is to manually call cleanup_fb on the old plane state's fb.

>>

>> Since the pointer to the old_plane_state is actually the current

>> plane->state in this function, a backup is needed. This has a minor

>> change to behavior for handle_cursor_update since it now correctly uses

>> the old state.

>>

>> The next step is to prevent the cleanup_fb call from DRM from unpinning

>> the active new_state->fb. Setting new_state->fb to NULL would be

>> enough to handle this, but DRM has a warning in the

>> drm_atomic_helper_async_commit helper if

>> plane->state->fb != new_state->fb.

>>

>> A new field was added (cursor_update) to dm_plane_state that's only

>> ever set to true during the fast path. If it's true, then cleanup_fb

>> doesn't unpin and unref the fb.

>>

>> Cc: Leo Li <sunpeng.li@amd.com>

>> Cc: Harry Wentland <harry.wentland@amd.com>

>> Cc: Michel Dänzer <michel.daenzer@amd.com>

>> Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>

>>

>> [...]

>>   

>> @@ -3653,7 +3654,15 @@ static void dm_plane_atomic_async_update(struct drm_plane *plane,

>>   	plane->state->crtc_w = new_state->crtc_w;

>>   	plane->state->crtc_h = new_state->crtc_h;

>>   

>> -	handle_cursor_update(plane, old_state);

>> +	handle_cursor_update(plane, &old_state);

>> +

>> +	/*

>> +	 * While DRM already takes care of drm_atomic_helper_cleanup_planes,

>> +	 * it does it on the wrong state (new_state instead of old_state).

>> +	 * This is a workaround for that behavior.

>> +	 */

>> +	dm_plane_helper_cleanup_fb(plane, &old_state);

>> +	dm_plane_state->cursor_update = true;

> 

> We shouldn't work around DRM core code in driver code. Please work on a

> solution with the core code developers on the dri-devel list, and revert

> the cursor fast path for now.

> 

> 


I'd rather not revert the patch given how much it improves the user 
experience. Some compositors (and wine games) are unusable or unplayable 
without it.

In general I agree with not doing core workarounds, but I'm not even 
sure if this is a bug in core DRM to begin with. Since the plane state 
modification is in place if you free the old_state then you're freeing 
what's currently active on plane->state. This works for us since we only 
work on the ->fb during prepare/cleanup, but this isn't true in general.

The only other way to do this is not using 
async_commit_check/async_commit_update and reimplementing atomic commit 
behavior for the fast path. Intel does this with their legacy cursor 
update fast path - it's a lot more code and it involves using 
"deprecated" elements of the atomic API.

Nicholas Kazlauskas
On 12/14/18 10:19 AM, Kazlauskas, Nicholas wrote:
> On 12/14/18 10:01 AM, Michel Dänzer wrote:

>> On 2018-12-14 3:48 p.m., Nicholas Kazlauskas wrote:

>>> [Why]

>>> The behavior of dm_plane_helper_cleanup_fb differs depending on

>>> whether the commit was asynchronous or not. When it's called from

>>> amdgpu_dm_atomic_commit_tail during a typical atomic commit the

>>> plane state has been swapped so it calls cleanup_fb on the old plane

>>> state.

>>>

>>> However, in the asynchronous commit codepath the call to

>>> drm_atomic_helper_commit also calls dm_plane_helper_cleanup_fb after

>>> atomic_async_update has been called. Since the plane state has not

>>> been swapped, the cleanup_fb call affects the new plane state with the

>>> active fb.

>>>

>>> The slow, full path is only ever used for the cursor update when the

>>> cursor CRTC changes. If there was previously a fb in the state before

>>> that had gone through the fast asynchronous path then cleanup_fb will

>>> be called a second time on an fb that was previously unpinned/unref'd.

>>>

>>> This results in a use after free on the next cursor update.

>>>

>>> [How]

>>> Regardless of whether this was intentional behavior in DRM or not,

>>> the fix is to manually call cleanup_fb on the old plane state's fb.

>>>

>>> Since the pointer to the old_plane_state is actually the current

>>> plane->state in this function, a backup is needed. This has a minor

>>> change to behavior for handle_cursor_update since it now correctly uses

>>> the old state.

>>>

>>> The next step is to prevent the cleanup_fb call from DRM from unpinning

>>> the active new_state->fb. Setting new_state->fb to NULL would be

>>> enough to handle this, but DRM has a warning in the

>>> drm_atomic_helper_async_commit helper if

>>> plane->state->fb != new_state->fb.

>>>

>>> A new field was added (cursor_update) to dm_plane_state that's only

>>> ever set to true during the fast path. If it's true, then cleanup_fb

>>> doesn't unpin and unref the fb.

>>>

>>> Cc: Leo Li <sunpeng.li@amd.com>

>>> Cc: Harry Wentland <harry.wentland@amd.com>

>>> Cc: Michel Dänzer <michel.daenzer@amd.com>

>>> Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>

>>>

>>> [...]

>>>    

>>> @@ -3653,7 +3654,15 @@ static void dm_plane_atomic_async_update(struct drm_plane *plane,

>>>    	plane->state->crtc_w = new_state->crtc_w;

>>>    	plane->state->crtc_h = new_state->crtc_h;

>>>    

>>> -	handle_cursor_update(plane, old_state);

>>> +	handle_cursor_update(plane, &old_state);

>>> +

>>> +	/*

>>> +	 * While DRM already takes care of drm_atomic_helper_cleanup_planes,

>>> +	 * it does it on the wrong state (new_state instead of old_state).

>>> +	 * This is a workaround for that behavior.

>>> +	 */

>>> +	dm_plane_helper_cleanup_fb(plane, &old_state);

>>> +	dm_plane_state->cursor_update = true;

>>

>> We shouldn't work around DRM core code in driver code. Please work on a

>> solution with the core code developers on the dri-devel list, and revert

>> the cursor fast path for now.

>>

>>

> 

> I'd rather not revert the patch given how much it improves the user

> experience. Some compositors (and wine games) are unusable or unplayable

> without it.

> 

> In general I agree with not doing core workarounds, but I'm not even

> sure if this is a bug in core DRM to begin with. Since the plane state

> modification is in place if you free the old_state then you're freeing

> what's currently active on plane->state. This works for us since we only

> work on the ->fb during prepare/cleanup, but this isn't true in general.

> 

> The only other way to do this is not using

> async_commit_check/async_commit_update and reimplementing atomic commit

> behavior for the fast path. Intel does this with their legacy cursor

> update fast path - it's a lot more code and it involves using

> "deprecated" elements of the atomic API.

> 

> Nicholas Kazlauskas

> _______________________________________________

> amd-gfx mailing list

> amd-gfx@lists.freedesktop.org

> https://lists.freedesktop.org/mailman/listinfo/amd-gfx

> 


A simpler solution that I would still be fine with is to just disallow 
framebuffer changes in the fast path until DRM behavior is clarified or 
fixed. This would work with DRM freeing either the new_plane_state or 
the old_plane_state - it's the same FB in either case.

This will still resolve most cursor stuttering problems while being 
significantly less hacky and easier to revert.

It unfortunately does affect performance quite a bit for cases where the 
the cursor is changing the framebuffer, but you don't see this in most 
desktop usage. Just in the cursor-vs-flip-toggle and 
cursor-vs-flip-varying-size igt tests.

Nicholas Kazlauskas