drm/amdgpu: fix the memory corruption on S3

Submitted by Huang, Ray on June 29, 2017, 7:03 a.m.

Details

Message ID 1498719798-8435-1-git-send-email-ray.huang@amd.com
State New
Headers show
Series "drm/amdgpu: fix the memory corruption on S3" ( rev: 1 ) in AMD X.Org drivers

Not browsing as part of any series.

Commit Message

Huang, Ray June 29, 2017, 7:03 a.m.
psp->cmd will be used on resume phase, so we can not free it on hw_init.
Otherwise, a memory corruption will be triggered.

Signed-off-by: Huang Rui <ray.huang@amd.com>
---

Alex, Christian,

This is the final fix for vega10 S3. The random memory corruption issue is root
caused.

Thanks,
Ray

---
 drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

Patch hide | download patch | download mbox

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
index 5041073..fcdd542 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
@@ -372,8 +372,6 @@  static int psp_load_fw(struct amdgpu_device *adev)
 	if (ret)
 		goto failed_mem;
 
-	kfree(cmd);
-
 	return 0;
 
 failed_mem:
@@ -384,6 +382,7 @@  static int psp_load_fw(struct amdgpu_device *adev)
 			      &psp->fw_pri_mc_addr, &psp->fw_pri_buf);
 failed:
 	kfree(cmd);
+	cmd = NULL;
 	return ret;
 }
 
@@ -443,6 +442,11 @@  static int psp_hw_fini(void *handle)
 		amdgpu_bo_free_kernel(&psp->fence_buf_bo,
 				      &psp->fence_buf_mc_addr, &psp->fence_buf);
 
+	if (!psp->cmd) {
+		kfree(psp->cmd);
+		psp->cmd = NULL;
+	}
+
 	return 0;
 }
 

Comments

On 29/06/17 04:03 PM, Huang Rui wrote:
> psp->cmd will be used on resume phase, so we can not free it on hw_init.
> Otherwise, a memory corruption will be triggered.
> 
> Signed-off-by: Huang Rui <ray.huang@amd.com>
> ---
> 
> Alex, Christian,
> 
> This is the final fix for vega10 S3. The random memory corruption issue is root
> caused.
> 
> Thanks,
> Ray
> 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> index 5041073..fcdd542 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> @@ -372,8 +372,6 @@ static int psp_load_fw(struct amdgpu_device *adev)
>  	if (ret)
>  		goto failed_mem;
>  
> -	kfree(cmd);
> -
>  	return 0;

This looks like a good catch.


> @@ -384,6 +382,7 @@ static int psp_load_fw(struct amdgpu_device *adev)
>  			      &psp->fw_pri_mc_addr, &psp->fw_pri_buf);
>  failed:
>  	kfree(cmd);
> +	cmd = NULL;

This should probably be

	psp->cmd = NULL;

instead?


> @@ -443,6 +442,11 @@ static int psp_hw_fini(void *handle)
>  		amdgpu_bo_free_kernel(&psp->fence_buf_bo,
>  				      &psp->fence_buf_mc_addr, &psp->fence_buf);
>  
> +	if (!psp->cmd) {
> +		kfree(psp->cmd);
> +		psp->cmd = NULL;
> +	}

This should probably be

	if (psp->cmd) {

? If so, you could skip the if altogether, since kfree(NULL) is safe.
On Thu, Jun 29, 2017 at 03:34:57PM +0800, Michel Dänzer wrote:
> On 29/06/17 04:03 PM, Huang Rui wrote:
> > psp->cmd will be used on resume phase, so we can not free it on hw_init.
> > Otherwise, a memory corruption will be triggered.
> >
> > Signed-off-by: Huang Rui <ray.huang@amd.com>
> > ---
> >
> > Alex, Christian,
> >
> > This is the final fix for vega10 S3. The random memory corruption issue is
> root
> > caused.
> >
> > Thanks,
> > Ray
> >
> > ---
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 8 ++++++--
> >  1 file changed, 6 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c b/drivers/gpu/drm/amd/
> amdgpu/amdgpu_psp.c
> > index 5041073..fcdd542 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> > @@ -372,8 +372,6 @@ static int psp_load_fw(struct amdgpu_device *adev)
> >        if (ret)
> >                goto failed_mem;
> > 
> > -     kfree(cmd);
> > -
> >        return 0;
> 
> This looks like a good catch.
> 
> 
> > @@ -384,6 +382,7 @@ static int psp_load_fw(struct amdgpu_device *adev)
> >                              &psp->fw_pri_mc_addr, &psp->fw_pri_buf);
> >  failed:
> >        kfree(cmd);
> > +     cmd = NULL;
> 
> This should probably be
> 
>         psp->cmd = NULL;
> 
> instead?
> 

Actually, we set psp->cmd = cmd before.

But anyway, we needn't "cmd" member any more.

> 
> > @@ -443,6 +442,11 @@ static int psp_hw_fini(void *handle)
> >                amdgpu_bo_free_kernel(&psp->fence_buf_bo,
> >                                      &psp->fence_buf_mc_addr, &psp->
> fence_buf);
> > 
> > +     if (!psp->cmd) {
> > +             kfree(psp->cmd);
> > +             psp->cmd = NULL;
> > +     }
> 
> This should probably be
> 
>         if (psp->cmd) {
> 
> ? If so, you could skip the if altogether, since kfree(NULL) is safe.

Nice catch. My typo. ;-)

You're right. Will update it in V2.

Thanks,
Rui
Indeed a nice catch.

Just skimming over the PSP code, wouldn't it be simpler to temporary 
allocate the cmd buffer in psp_np_fw_load()?

Doesn't looks like that is used outside that function. Might even be 
possible to just allocate the buffer on the stack.

Regards,
Christian.

Am 29.06.2017 um 09:59 schrieb Huang Rui:
> On Thu, Jun 29, 2017 at 03:34:57PM +0800, Michel Dänzer wrote:
>> On 29/06/17 04:03 PM, Huang Rui wrote:
>>> psp->cmd will be used on resume phase, so we can not free it on hw_init.
>>> Otherwise, a memory corruption will be triggered.
>>>
>>> Signed-off-by: Huang Rui <ray.huang@amd.com>
>>> ---
>>>
>>> Alex, Christian,
>>>
>>> This is the final fix for vega10 S3. The random memory corruption issue is
>> root
>>> caused.
>>>
>>> Thanks,
>>> Ray
>>>
>>> ---
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 8 ++++++--
>>>   1 file changed, 6 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c b/drivers/gpu/drm/amd/
>> amdgpu/amdgpu_psp.c
>>> index 5041073..fcdd542 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
>>> @@ -372,8 +372,6 @@ static int psp_load_fw(struct amdgpu_device *adev)
>>>         if (ret)
>>>                 goto failed_mem;
>>>
>>> -     kfree(cmd);
>>> -
>>>         return 0;
>> This looks like a good catch.
>>
>>
>>> @@ -384,6 +382,7 @@ static int psp_load_fw(struct amdgpu_device *adev)
>>>                               &psp->fw_pri_mc_addr, &psp->fw_pri_buf);
>>>   failed:
>>>         kfree(cmd);
>>> +     cmd = NULL;
>> This should probably be
>>
>>          psp->cmd = NULL;
>>
>> instead?
>>
> Actually, we set psp->cmd = cmd before.
>
> But anyway, we needn't "cmd" member any more.
>
>>> @@ -443,6 +442,11 @@ static int psp_hw_fini(void *handle)
>>>                 amdgpu_bo_free_kernel(&psp->fence_buf_bo,
>>>                                       &psp->fence_buf_mc_addr, &psp->
>> fence_buf);
>>> +     if (!psp->cmd) {
>>> +             kfree(psp->cmd);
>>> +             psp->cmd = NULL;
>>> +     }
>> This should probably be
>>
>>          if (psp->cmd) {
>>
>> ? If so, you could skip the if altogether, since kfree(NULL) is safe.
> Nice catch. My typo. ;-)
>
> You're right. Will update it in V2.
>
> Thanks,
> Rui
On 29/06/17 04:59 PM, Huang Rui wrote:
> On Thu, Jun 29, 2017 at 03:34:57PM +0800, Michel Dänzer wrote:
>> On 29/06/17 04:03 PM, Huang Rui wrote:
>>> psp->cmd will be used on resume phase, so we can not free it on hw_init.
>>> Otherwise, a memory corruption will be triggered.
>>>
>>> Signed-off-by: Huang Rui <ray.huang@amd.com>
>>> ---
>>>
>>> Alex, Christian,
>>>
>>> This is the final fix for vega10 S3. The random memory corruption issue is
>> root
>>> caused.
>>>
>>> Thanks,
>>> Ray
>>>
>>> ---
>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 8 ++++++--
>>>  1 file changed, 6 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c b/drivers/gpu/drm/amd/
>> amdgpu/amdgpu_psp.c
>>> index 5041073..fcdd542 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
>>> @@ -372,8 +372,6 @@ static int psp_load_fw(struct amdgpu_device *adev)
>>>        if (ret)
>>>                goto failed_mem;
>>>
>>> -     kfree(cmd);
>>> -
>>>        return 0;
>>
>> This looks like a good catch.
>>
>>
>>> @@ -384,6 +382,7 @@ static int psp_load_fw(struct amdgpu_device *adev)
>>>                              &psp->fw_pri_mc_addr, &psp->fw_pri_buf);
>>>  failed:
>>>        kfree(cmd);
>>> +     cmd = NULL;
>>
>> This should probably be
>>
>>         psp->cmd = NULL;
>>
>> instead?
>>
> 
> Actually, we set psp->cmd = cmd before.
> 
> But anyway, we needn't "cmd" member any more.

You should probably still set psp->cmd = NULL here, otherwise psp->cmd
still contains the pointer to the memory that is freed here, which could
result in use-after-free somewhere else.
On Thu, Jun 29, 2017 at 04:07:33PM +0800, Michel Dänzer wrote:
> On 29/06/17 04:59 PM, Huang Rui wrote:
> > On Thu, Jun 29, 2017 at 03:34:57PM +0800, Michel Dänzer wrote:
> >> On 29/06/17 04:03 PM, Huang Rui wrote:
> >>> psp->cmd will be used on resume phase, so we can not free it on hw_init.
> >>> Otherwise, a memory corruption will be triggered.
> >>>
> >>> Signed-off-by: Huang Rui <ray.huang@amd.com>
> >>> ---
> >>>
> >>> Alex, Christian,
> >>>
> >>> This is the final fix for vega10 S3. The random memory corruption issue is
> >> root
> >>> caused.
> >>>
> >>> Thanks,
> >>> Ray
> >>>
> >>> ---
> >>>  drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 8 ++++++--
> >>>  1 file changed, 6 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c b/drivers/gpu/drm/amd/
> >> amdgpu/amdgpu_psp.c
> >>> index 5041073..fcdd542 100644
> >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> >>> @@ -372,8 +372,6 @@ static int psp_load_fw(struct amdgpu_device *adev)
> >>>        if (ret)
> >>>                goto failed_mem;
> >>>
> >>> -     kfree(cmd);
> >>> -
> >>>        return 0;
> >>
> >> This looks like a good catch.
> >>
> >>
> >>> @@ -384,6 +382,7 @@ static int psp_load_fw(struct amdgpu_device *adev)
> >>>                              &psp->fw_pri_mc_addr, &psp->fw_pri_buf);
> >>>  failed:
> >>>        kfree(cmd);
> >>> +     cmd = NULL;
> >>
> >> This should probably be
> >>
> >>         psp->cmd = NULL;
> >>
> >> instead?
> >>
> >
> > Actually, we set psp->cmd = cmd before.
> >
> > But anyway, we needn't "cmd" member any more.
> 
> You should probably still set psp->cmd = NULL here, otherwise psp->cmd
> still contains the pointer to the memory that is freed here, which could
> result in use-after-free somewhere else.
> 

Right, I already found it and update it in V2, please take a look.

Thanks,
Ray