[2/2] drm/amdgpu: prevent command submission failures under memory pressure

Submitted by Christian König on Sept. 1, 2016, 2:16 p.m.


Message ID 1472739370-6579-2-git-send-email-deathsimple@vodafone.de
State New
Series: "Series without cover letter" (rev 1) in AMD X.Org drivers


Commit Message

Christian König Sept. 1, 2016, 2:16 p.m.
From: Christian König <christian.koenig@amd.com>

As a last resort, try to evict BOs from the current working set into other
memory domains. This effectively prevents command submission failures when
VM page tables have been swapped out.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h    |  1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 65 +++++++++++++++++++++++++++++++++-
 2 files changed, 65 insertions(+), 1 deletion(-)


diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 03ba035..ee2eeaa 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1225,6 +1225,7 @@  struct amdgpu_cs_parser {
 	struct fence			*fence;
 	uint64_t			bytes_moved_threshold;
 	uint64_t			bytes_moved;
+	struct amdgpu_bo_list_entry	*evictable;
 
 	/* user fence */
 	struct amdgpu_bo_list_entry	uf_entry;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index 23964b8..09adc75 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -378,6 +378,60 @@  retry:
 	}
 
 	return r;
+
+	return -ENOMEM;
+}
+
+/* Last resort try to evict something from the current working set */
+static bool amdgpu_cs_try_evict(struct amdgpu_cs_parser *p,
+				struct amdgpu_bo_list_entry *lobj)
+{
+	uint32_t domain = lobj->robj->allowed_domains;
+	int r;
+
+	if (!p->evictable)
+		return false;
+
+	for (;&p->evictable->tv.head != &p->validated;
+	     p->evictable = list_prev_entry(p->evictable, tv.head)) {
+
+		struct amdgpu_bo_list_entry *candidate = p->evictable;
+		struct amdgpu_bo *bo = candidate->robj;
+		u64 initial_bytes_moved;
+		uint32_t other;
+
+		/* If we reached our current BO we can forget it */
+		if (candidate == lobj)
+			break;
+
+		other = amdgpu_mem_type_to_domain(bo->tbo.mem.mem_type);
+
+		/* Check if this BO is in one of the domains we need space for */
+		if (!(other & domain))
+			continue;
+
+		/* Check if we can move this BO somewhere else */
+		other = bo->allowed_domains & ~domain;
+		if (!other)
+			continue;
+
+		/* Good we can try to move this BO somewhere else */
+		amdgpu_ttm_placement_from_domain(bo, other);
+		initial_bytes_moved = atomic64_read(&bo->adev->num_bytes_moved);
+		r = ttm_bo_validate(&bo->tbo, &bo->placement, true, false);
+		p->bytes_moved += atomic64_read(&bo->adev->num_bytes_moved) -
+			initial_bytes_moved;
+
+		if (unlikely(r))
+			break;
+
+		p->evictable = list_prev_entry(p->evictable, tv.head);
+		list_move(&candidate->tv.head, &p->validated);
+
+		return true;
+	}
+
+	return false;
 }
 
 int amdgpu_cs_list_validate(struct amdgpu_cs_parser *p,
@@ -404,9 +458,15 @@  int amdgpu_cs_list_validate(struct amdgpu_cs_parser *p,
 			binding_userptr = true;
 		}
 
-		r = amdgpu_cs_bo_validate(p, bo);
+		if (p->evictable == lobj)
+			p->evictable = NULL;
+
+		do {
+			r = amdgpu_cs_bo_validate(p, bo);
+		} while (r == -ENOMEM && amdgpu_cs_try_evict(p, lobj));
 		if (r)
 			return r;
+
 		if (bo->shadow) {
 			r = amdgpu_cs_bo_validate(p, bo);
 			if (r)
@@ -534,6 +594,9 @@  static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser *p,
 
 	p->bytes_moved_threshold = amdgpu_cs_get_threshold_for_moves(p->adev);
 	p->bytes_moved = 0;
+	p->evictable = list_last_entry(&p->validated,
+				       struct amdgpu_bo_list_entry,
+				       tv.head);
 
 	r = amdgpu_cs_list_validate(p, &duplicates);
 	if (r) {

Comments

> -----Original Message-----
> From: amd-gfx [mailto:amd-gfx-bounces@lists.freedesktop.org] On Behalf
> Of Christian König
> Sent: Thursday, September 01, 2016 10:16 AM
> To: amd-gfx@lists.freedesktop.org
> Subject: [PATCH 2/2] drm/amdgpu: prevent command submission failures
> under memory pressure
> [...]
> 	return r;
> +
> +	return -ENOMEM;
> +}
Unreachable code.  Other than that looks good.  With that fixed:
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>


Alex

On 01.09.2016 16:33, Deucher, Alexander wrote:
>> 	return r;
>> +
>> +	return -ENOMEM;
>> +}
> Unreachable code. Other than that looks good.

Oops, indeed just a merge error.

> With that fixed: Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Thanks for the review, I've just pushed the resulting patches to 
amd-staging-4.6.

Marek, you might want to try raising the limits now on how much VRAM can 
be used by a single command submission. If I remember correctly, that was 
set rather conservatively.

Regards,
Christian.
On Fri, Sep 2, 2016 at 9:38 AM, Christian König <deathsimple@vodafone.de> wrote:
> [...]
> Marek you might want to try and raise the limits now how much VRAM can be
> used by a single command submission. If I remember correctly that was rather
> conservatively.

I would beg to differ. Just to simplify the calculation, let's assume
there are no allocations in GTT (in practice there are very few, so
it's very close to reality). If you take the memory used by all
VRAM buffers in a CS, the limit for that is "VRAM size + (GTT size *
0.7)".

If you have 4GB VRAM and 4GB GTT, Mesa allows per-CS VRAM usage to be 6.8 GB.

If you have any GTT buffers there, you just subtract from that. For
example, if you have 100MB GTT usage, the VRAM limit is 6.7 GB.

Does that sound conservative to you?

Code:
https://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/drivers/radeon/r600_cs.h#n36

Marek
On 02.09.2016 12:33, Marek Olšák wrote:
> [...]
> If you have 4GB VRAM and 4GB GTT, Mesa allows per-CS VRAM usage to be 6.8 GB.
> [...]
> Does that sound conservative to you?

Not at all :)

But I clearly remember that you noted that you can't allocate more than 
75% of VRAM in one CS without running into problems sometimes. Is that 
already fixed?

When I limit VRAM to 256MB on my Tonga, Valley ran into a whole bunch of 
CS failures because the VM page tables couldn't be swapped in again. 
With that patch nearly all of them are fixed now.

Christian.

On Fri, Sep 2, 2016 at 12:56 PM, Christian König
<deathsimple@vodafone.de> wrote:
> [...]
> But I clearly remember that you noted that you can't allocate more than 75%
> of VRAM in one CS without running into problems sometimes. Is that already
> fixed?

No, I can allocate all of VRAM (actually Mesa can't allocate VRAM
"physically"; only the CS ioctl can reserve it for its time window).

The problem I saw was the following. GEM_CREATE or the CS ioctl (I
don't remember which one) had a high chance of failing if one buffer
had the size of 50% of VRAM. You can have thousands of tiny buffers
together using close to 100% of VRAM, but you can't have one buffer
whose size is only 50%. After that discovery, we speculated that the
number of pinned buffers (=CRTCs) may have had something to do with
it.

Marek