drm/amdgpu: fix amdgpu_vm_handle_moved as well

Submitted by Christian König on Sept. 11, 2017, 10:58 a.m.

Details

Message ID 1505127498-21621-1-git-send-email-deathsimple@vodafone.de
State New
Series "drm/amdgpu: fix amdgpu_vm_handle_moved as well" ( rev: 1 ) in AMD X.Org drivers

Commit Message

Christian König Sept. 11, 2017, 10:58 a.m.
From: Christian König <christian.koenig@amd.com>

There is no guarantee that the last BO_VA actually needed an update.

In addition, all command submissions must wait for moved BOs to be
cleared, not just the first one.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 12 ++++++------
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h |  3 +--
 3 files changed, 8 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index 4681dcc..b59749d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -805,7 +805,7 @@  static int amdgpu_bo_vm_update_pte(struct amdgpu_cs_parser *p)
 
 	}
 
-	r = amdgpu_vm_handle_moved(adev, vm, &p->job->sync);
+	r = amdgpu_vm_handle_moved(adev, vm);
 	if (r)
 		return r;
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 5042f09..ae2a163 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -1997,7 +1997,6 @@  int amdgpu_vm_clear_freed(struct amdgpu_device *adev,
  *
  * @adev: amdgpu_device pointer
  * @vm: requested vm
- * @sync: sync object to add fences to
  *
  * Make sure all BOs which are moved are updated in the PTs.
  * Returns 0 for success.
@@ -2005,8 +2004,7 @@  int amdgpu_vm_clear_freed(struct amdgpu_device *adev,
  * PTs have to be reserved!
  */
 int amdgpu_vm_handle_moved(struct amdgpu_device *adev,
-			   struct amdgpu_vm *vm,
-			   struct amdgpu_sync *sync)
+			   struct amdgpu_vm *vm)
 {
 	struct amdgpu_bo_va *bo_va = NULL;
 	bool clear;
@@ -2025,13 +2023,15 @@  int amdgpu_vm_handle_moved(struct amdgpu_device *adev,
 		if (r)
 			return r;
 
+		if (bo_va->base.bo->tbo.resv != vm->root.base.bo->tbo.resv) {
+			dma_fence_put(vm->last_update);
+			vm->last_update = dma_fence_get(bo_va->last_pt_update);
+		}
+
 		spin_lock(&vm->status_lock);
 	}
 	spin_unlock(&vm->status_lock);
 
-	if (bo_va)
-		r = amdgpu_sync_fence(adev, sync, bo_va->last_pt_update);
-
 	return r;
 }
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
index cb6a622..48c58ae 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
@@ -250,8 +250,7 @@  int amdgpu_vm_clear_freed(struct amdgpu_device *adev,
 			  struct amdgpu_vm *vm,
 			  struct dma_fence **fence);
 int amdgpu_vm_handle_moved(struct amdgpu_device *adev,
-			   struct amdgpu_vm *vm,
-			   struct amdgpu_sync *sync);
+			   struct amdgpu_vm *vm);
 int amdgpu_vm_bo_update(struct amdgpu_device *adev,
 			struct amdgpu_bo_va *bo_va,
 			bool clear);

Comments

On 2017-09-11 18:58, Christian König wrote:
> From: Christian König <christian.koenig@amd.com>
>
> There is no guarantee that the last BO_VA actually needed an update.
Good catch. One comment inline.
>
> In addition, all command submissions must wait for moved BOs to be
> cleared, not just the first one.
>
> Signed-off-by: Christian König <christian.koenig@amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c |  2 +-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 12 ++++++------
>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h |  3 +--
>   3 files changed, 8 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> index 4681dcc..b59749d 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> @@ -805,7 +805,7 @@ static int amdgpu_bo_vm_update_pte(struct amdgpu_cs_parser *p)
>   
>   	}
>   
> -	r = amdgpu_vm_handle_moved(adev, vm, &p->job->sync);
> +	r = amdgpu_vm_handle_moved(adev, vm);
>   	if (r)
>   		return r;
>   
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> index 5042f09..ae2a163 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> @@ -1997,7 +1997,6 @@ int amdgpu_vm_clear_freed(struct amdgpu_device *adev,
>    *
>    * @adev: amdgpu_device pointer
>    * @vm: requested vm
> - * @sync: sync object to add fences to
>    *
>    * Make sure all BOs which are moved are updated in the PTs.
>    * Returns 0 for success.
> @@ -2005,8 +2004,7 @@ int amdgpu_vm_clear_freed(struct amdgpu_device *adev,
>    * PTs have to be reserved!
>    */
>   int amdgpu_vm_handle_moved(struct amdgpu_device *adev,
> -			   struct amdgpu_vm *vm,
> -			   struct amdgpu_sync *sync)
> +			   struct amdgpu_vm *vm)
>   {
>   	struct amdgpu_bo_va *bo_va = NULL;
>   	bool clear;
> @@ -2025,13 +2023,15 @@ int amdgpu_vm_handle_moved(struct amdgpu_device *adev,
>   		if (r)
>   			return r;
>   
> +		if (bo_va->base.bo->tbo.resv != vm->root.base.bo->tbo.resv) {
When we expand the mapping fence, we will sync all moved updates and clears
here, instead of the moved updates in amdgpu_vm_bo_update from the previous patch.
Anyway, this patch is Reviewed-by: Chunming Zhou <david1.zhou@amd.com>
> +			dma_fence_put(vm->last_update);
> +			vm->last_update = dma_fence_get(bo_va->last_pt_update);
> +		}
> +
>   		spin_lock(&vm->status_lock);
>   	}
>   	spin_unlock(&vm->status_lock);
>   
> -	if (bo_va)
> -		r = amdgpu_sync_fence(adev, sync, bo_va->last_pt_update);
> -
>   	return r;
>   }
>   
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
> index cb6a622..48c58ae 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
> @@ -250,8 +250,7 @@ int amdgpu_vm_clear_freed(struct amdgpu_device *adev,
>   			  struct amdgpu_vm *vm,
>   			  struct dma_fence **fence);
>   int amdgpu_vm_handle_moved(struct amdgpu_device *adev,
> -			   struct amdgpu_vm *vm,
> -			   struct amdgpu_sync *sync);
> +			   struct amdgpu_vm *vm);
>   int amdgpu_vm_bo_update(struct amdgpu_device *adev,
>   			struct amdgpu_bo_va *bo_va,
>   			bool clear);
>>   +        if (bo_va->base.bo->tbo.resv != vm->root.base.bo->tbo.resv) {
> When we expand the mapping fence, we will sync all moved updates and clears
> here, instead of the moved updates in amdgpu_vm_bo_update from the previous patch.

Yeah, it turned out this patch didn't actually work as expected, because
the bo_va->last_pt_update fence could be stale and replace the newer
fence in vm->last_update, resulting in VM faults.

I sent a V2 of that patch yesterday, which fixes this (and is quite a
bit cleaner in general).

Please take a look at that one instead.

Thanks,
Christian.
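
The stale-fence problem described in the last reply can be illustrated with
a small userspace model. This is only a sketch under stated assumptions, not
the V2 fix and not kernel code: struct toy_fence and all toy_* helpers are
hypothetical stand-ins, and ordering is modelled with a plain sequence
number, whereas real dma_fence ordering is only meaningful within one fence
context. toy_replace_always() mirrors the shape of the hunk in this patch
(put the old fence, take the new one unconditionally);
toy_replace_if_later() shows one possible guard that keeps the newer fence.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a fence; NOT the kernel's struct dma_fence. */
struct toy_fence {
	uint64_t seqno;  /* assumption: a larger seqno signals later */
	int refcount;    /* simplified reference counter */
};

/* Take a reference, modelled after the get/put pattern in the patch. */
static struct toy_fence *toy_fence_get(struct toy_fence *f)
{
	if (f)
		f->refcount++;
	return f;
}

/* Drop a reference. */
static void toy_fence_put(struct toy_fence *f)
{
	if (f)
		f->refcount--;
}

/*
 * Same shape as the hunk above: the tracked fence is replaced no matter
 * how old the candidate is, so a stale candidate can displace a newer one.
 */
static void toy_replace_always(struct toy_fence **last,
			       struct toy_fence *candidate)
{
	assert(last);
	toy_fence_put(*last);
	*last = toy_fence_get(candidate);
}

/*
 * One possible guard (an assumption, not taken from the thread): only
 * replace the tracked fence when the candidate is strictly newer.
 */
static void toy_replace_if_later(struct toy_fence **last,
				 struct toy_fence *candidate)
{
	assert(last);
	if (!candidate)
		return;
	if (*last && (*last)->seqno >= candidate->seqno)
		return; /* keep the newer fence that is already tracked */
	toy_fence_put(*last);
	*last = toy_fence_get(candidate);
}
```

With a tracked fence of seqno 10 and a candidate of seqno 3,
toy_replace_always() ends up tracking the older fence, which corresponds to
the failure mode described above, while toy_replace_if_later() leaves the
tracked fence untouched.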