[2/2] drm/amdgpu: Fix use of interruptible waiting

Submitted by Xie, AlexBin on April 25, 2017, 9:25 p.m.

Details

Message ID 1493155526-28910-2-git-send-email-AlexBin.Xie@amd.com
State New
Headers show
Series "Series without cover letter" ( rev: 1 ) in AMD X.Org drivers

Not browsing as part of any series.

Commit Message

Xie, AlexBin April 25, 2017, 9:25 p.m.
1. The wait is short. There is not much benefit by
interruptible waiting.
2. In this function and caller functions, the error
handling for such interrupt is complicated and risky.

Change-Id: I289674ecd3f5ef20c93fe63e33df6d668b3c2edc
Signed-off-by: Alex Xie <AlexBin.Xie@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_display.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Patch hide | download patch | download mbox

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
index 43082bf..c006cc4 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
@@ -227,7 +227,7 @@  int amdgpu_crtc_prepare_flip(
 	new_abo = gem_to_amdgpu_bo(obj);
 
 	/* pin the new buffer */
-	r = amdgpu_bo_reserve(new_abo, false);
+	r = amdgpu_bo_reserve(new_abo, true);
 	if (unlikely(r != 0)) {
 		DRM_ERROR("failed to reserve new abo buffer before flip\n");
 		goto cleanup;

Comments

On 26/04/17 06:25 AM, Alex Xie wrote:
> 1. The wait is short. There is not much benefit by
> interruptible waiting.
> 2. In this function and caller functions, the error
> handling for such interrupt is complicated and risky.
> 
> Change-Id: I289674ecd3f5ef20c93fe63e33df6d668b3c2edc
> Signed-off-by: Alex Xie <AlexBin.Xie@amd.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_display.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
> index 43082bf..c006cc4 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
> @@ -227,7 +227,7 @@ int amdgpu_crtc_prepare_flip(
>  	new_abo = gem_to_amdgpu_bo(obj);
>  
>  	/* pin the new buffer */
> -	r = amdgpu_bo_reserve(new_abo, false);
> +	r = amdgpu_bo_reserve(new_abo, true);
>  	if (unlikely(r != 0)) {
>  		DRM_ERROR("failed to reserve new abo buffer before flip\n");
>  		goto cleanup;
> 

I'm afraid we have no choice but to use interruptible here, because this
code runs as part of an ioctl (DRM_IOCTL_MODE_PAGE_FLIP).
Am 26.04.2017 um 03:17 schrieb Michel Dänzer:
> On 26/04/17 06:25 AM, Alex Xie wrote:
>> 1. The wait is short. There is not much benefit by
>> interruptible waiting.
>> 2. In this function and caller functions, the error
>> handling for such interrupt is complicated and risky.
>>
>> Change-Id: I289674ecd3f5ef20c93fe63e33df6d668b3c2edc
>> Signed-off-by: Alex Xie <AlexBin.Xie@amd.com>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_display.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
>> index 43082bf..c006cc4 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
>> @@ -227,7 +227,7 @@ int amdgpu_crtc_prepare_flip(
>>   	new_abo = gem_to_amdgpu_bo(obj);
>>   
>>   	/* pin the new buffer */
>> -	r = amdgpu_bo_reserve(new_abo, false);
>> +	r = amdgpu_bo_reserve(new_abo, true);
>>   	if (unlikely(r != 0)) {
>>   		DRM_ERROR("failed to reserve new abo buffer before flip\n");
>>   		goto cleanup;
>>
> I'm afraid we have no choice but to use interruptible here, because this
> code runs as part of an ioctl (DRM_IOCTL_MODE_PAGE_FLIP).

Yes, agree. But the error message is incorrect here and should be removed.

Christian.
Hi,

I knew this is part of ioctl. And I intended to fix this interruptible 
waiting in an ioctl.

ioctl is one of driver interfaces, where interruptible waiting is good 
in some scenarios. For example, if ioctl performs IO operations in 
atomic way, we don't want ioctl to be interrupted by a signal.

Interruptible waiting is good when we need to wait for longer time, for 
example, waiting for an input data for indefinite time. We don't want 
the process not responding to signals. Interruptible waitings can be 
good in read/write/ioctl interfaces.

For this patch,

1. The wait is short. There is not much benefit by interruptible 
waiting.  The BO is a frame buffer BO. Are there many competitors to 
lock this BO? If not, the wait is very short. This is my main reason to 
change.
2. In this function and caller functions, the error handling for such 
interrupt is complicated and risky. When the waiting is interrupted by a 
signal, the callers of this function need to handle this interrupt 
error. I traced the calling stack all the way back to function 
drm_mode_page_flip_ioctl. I could not find an issue. But I am not sure 
about even upper layer. Is this IOCTL restartable? How about further 
higher layer? Since I didn't see benefit in point 1. So I thought it was 
a good idea to change.

Anyway, it is up to you. A change might bring other unseen risk. If you 
still have concern, I will drop this patch. This IOCTL is used by 
graphic operation. There may not be a lot of signals to a GUI process 
which uses this IOCTL.

Thanks,
Alex Bin


On 17-04-26 04:34 AM, Christian König wrote:
> Am 26.04.2017 um 03:17 schrieb Michel Dänzer:
>> On 26/04/17 06:25 AM, Alex Xie wrote:
>>> 1. The wait is short. There is not much benefit by
>>> interruptible waiting.
>>> 2. In this function and caller functions, the error
>>> handling for such interrupt is complicated and risky.
>>>
>>> Change-Id: I289674ecd3f5ef20c93fe63e33df6d668b3c2edc
>>> Signed-off-by: Alex Xie <AlexBin.Xie@amd.com>
>>> ---
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_display.c | 2 +-
>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c 
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
>>> index 43082bf..c006cc4 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
>>> @@ -227,7 +227,7 @@ int amdgpu_crtc_prepare_flip(
>>>       new_abo = gem_to_amdgpu_bo(obj);
>>>         /* pin the new buffer */
>>> -    r = amdgpu_bo_reserve(new_abo, false);
>>> +    r = amdgpu_bo_reserve(new_abo, true);
>>>       if (unlikely(r != 0)) {
>>>           DRM_ERROR("failed to reserve new abo buffer before flip\n");
>>>           goto cleanup;
>>>
>> I'm afraid we have no choice but to use interruptible here, because this
>> code runs as part of an ioctl (DRM_IOCTL_MODE_PAGE_FLIP).
>
> Yes, agree. But the error message is incorrect here and should be 
> removed.
>
> Christian.
On 27/04/17 04:19 AM, Alex Xie wrote:
> Hi,
> 
> I knew this is part of ioctl. And I intended to fix this interruptible
> waiting in an ioctl.

It's by design, nothing that needs fixing in the case of an ioctl.


> 1. The wait is short. There is not much benefit by interruptible
> waiting.  The BO is a frame buffer BO. Are there many competitors to
> lock this BO? If not, the wait is very short. This is my main reason to
> change.

Irrelevant. If amdgpu_bo_reserve returns -EINTR, so will other functions
down the line, and amdgpu_crtc_page_flip_target will have to propagate
that anyway. So doing an uninterruptible wait will just delay the
inevitable and delivery of the signal to userspace.


> 2. In this function and caller functions, the error handling for such
> interrupt is complicated and risky.

If you're concerned about the cascade of cleanup functions, let's just
revert commit 0169b52ce575 ("drm/amdgpu: Refactor flip into prepare
submit and submit. (v2)"). amdgpu_crtc_prepare_flip was split out to be
used by DC, but DC is now using a DRM core atomic helper for flips and
no longer using amdgpu_crtc_prepare_flip.
> 1. The wait is short. There is not much benefit by interruptible 
> waiting.  The BO is a frame buffer BO. Are there many competitors to 
> lock this BO? If not, the wait is very short. This is my main reason 
> to change.
The problem is that all those waits can block the MM subsystem from 
reclaiming memory in an out of memory situation, e.g. when the process 
is terminated with a SIGKILL.

That's why the usual policy used in the drivers is to wait interruptible 
unless you absolutely need the wait uninterrupted (e.g. in a cleanup 
path for example).

> 2. In this function and caller functions, the error handling for such 
> interrupt is complicated and risky. When the waiting is interrupted by 
> a signal, the callers of this function need to handle this interrupt 
> error. I traced the calling stack all the way back to function 
> drm_mode_page_flip_ioctl. I could not find an issue. But I am not sure 
> about even upper layer. Is this IOCTL restartable? How about further 
> higher layer? Since I didn't see benefit in point 1. So I thought it 
> was a good idea to change.

The page flip IOCTL should be restartable. I think all drm IOCTLs with 
driver callbacks are restartable, but I'm not 100% sure, Michel probably 
knows that better.

There is even the CONFIG_DEBUG_WW_MUTEX_SLOWPATH option to throw random 
backoff cases into reserving the BO for testing. But I think that only 
applies to when multiple BOs are reserved at the same time.

Might be a good idea to create something similar for all interruptible 
lock acquires to test if they are really restartable. That will most 
likely yield quite a bunch of cases where the handling isn't correct.

Regards,
Christian.

Am 26.04.2017 um 21:19 schrieb Alex Xie:
> Hi,
>
> I knew this is part of ioctl. And I intended to fix this interruptible 
> waiting in an ioctl.
>
> ioctl is one of driver interfaces, where interruptible waiting is good 
> in some scenarios. For example, if ioctl performs IO operations in 
> atomic way, we don't want ioctl to be interrupted by a signal.
>
> Interruptible waiting is good when we need to wait for longer time, 
> for example, waiting for an input data for indefinite time. We don't 
> want the process not responding to signals. Interruptible waitings can 
> be good in read/write/ioctl interfaces.
>
> For this patch,
>
> 1. The wait is short. There is not much benefit by interruptible 
> waiting.  The BO is a frame buffer BO. Are there many competitors to 
> lock this BO? If not, the wait is very short. This is my main reason 
> to change.
> 2. In this function and caller functions, the error handling for such 
> interrupt is complicated and risky. When the waiting is interrupted by 
> a signal, the callers of this function need to handle this interrupt 
> error. I traced the calling stack all the way back to function 
> drm_mode_page_flip_ioctl. I could not find an issue. But I am not sure 
> about even upper layer. Is this IOCTL restartable? How about further 
> higher layer? Since I didn't see benefit in point 1. So I thought it 
> was a good idea to change.
>
> Anyway, it is up to you. A change might bring other unseen risk. If 
> you still have concern, I will drop this patch. This IOCTL is used by 
> graphic operation. There may not be a lot of signals to a GUI process 
> which uses this IOCTL.
>
> Thanks,
> Alex Bin
>
>
> On 17-04-26 04:34 AM, Christian König wrote:
>> Am 26.04.2017 um 03:17 schrieb Michel Dänzer:
>>> On 26/04/17 06:25 AM, Alex Xie wrote:
>>>> 1. The wait is short. There is not much benefit by
>>>> interruptible waiting.
>>>> 2. In this function and caller functions, the error
>>>> handling for such interrupt is complicated and risky.
>>>>
>>>> Change-Id: I289674ecd3f5ef20c93fe63e33df6d668b3c2edc
>>>> Signed-off-by: Alex Xie <AlexBin.Xie@amd.com>
>>>> ---
>>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_display.c | 2 +-
>>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c 
>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
>>>> index 43082bf..c006cc4 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
>>>> @@ -227,7 +227,7 @@ int amdgpu_crtc_prepare_flip(
>>>>       new_abo = gem_to_amdgpu_bo(obj);
>>>>         /* pin the new buffer */
>>>> -    r = amdgpu_bo_reserve(new_abo, false);
>>>> +    r = amdgpu_bo_reserve(new_abo, true);
>>>>       if (unlikely(r != 0)) {
>>>>           DRM_ERROR("failed to reserve new abo buffer before flip\n");
>>>>           goto cleanup;
>>>>
>>> I'm afraid we have no choice but to use interruptible here, because 
>>> this
>>> code runs as part of an ioctl (DRM_IOCTL_MODE_PAGE_FLIP).
>>
>> Yes, agree. But the error message is incorrect here and should be 
>> removed.
>>
>> Christian.
>
On 27/04/17 06:05 PM, Christian König wrote:
> 
>> 2. In this function and caller functions, the error handling for such
>> interrupt is complicated and risky. When the waiting is interrupted by
>> a signal, the callers of this function need to handle this interrupt
>> error. I traced the calling stack all the way back to function
>> drm_mode_page_flip_ioctl. I could not find an issue. But I am not sure
>> about even upper layer. Is this IOCTL restartable? How about further
>> higher layer? Since I didn't see benefit in point 1. So I thought it
>> was a good idea to change.
> 
> The page flip IOCTL should be restartable. I think all drm IOCTLs with
> driver callbacks are restartable, but I'm not 100% sure, Michel probably
> knows that better.

Yes, I'm pretty sure it is.
On 27/04/17 04:39 PM, Michel Dänzer wrote:
> On 27/04/17 04:19 AM, Alex Xie wrote:
>> Hi,
>>
>> I knew this is part of ioctl. And I intended to fix this interruptible
>> waiting in an ioctl.
> 
> It's by design, nothing that needs fixing in the case of an ioctl.
> 
> 
>> 1. The wait is short. There is not much benefit by interruptible
>> waiting.  The BO is a frame buffer BO. Are there many competitors to
>> lock this BO? If not, the wait is very short. This is my main reason to
>> change.
> 
> Irrelevant. If amdgpu_bo_reserve returns -EINTR, so will other functions
> down the line, and amdgpu_crtc_page_flip_target will have to propagate
> that anyway. So doing an uninterruptible wait will just delay the
> inevitable and delivery of the signal to userspace.

To be clear, I meant -ERESTARTSYS instead of -EINTR.


>> 2. In this function and caller functions, the error handling for such
>> interrupt is complicated and risky.
> 
> If you're concerned about the cascade of cleanup functions, let's just
> revert commit 0169b52ce575 ("drm/amdgpu: Refactor flip into prepare
> submit and submit. (v2)"). amdgpu_crtc_prepare_flip was split out to be
> used by DC, but DC is now using a DRM core atomic helper for flips and
> no longer using amdgpu_crtc_prepare_flip.
> 
>