amdgpu: fix multi-process hang issue

Submitted by Deng, Emily on Aug. 22, 2018, 12:07 p.m.

Details

Message ID 1534939678-11489-1-git-send-email-Emily.Deng@amd.com
State New
Headers show
Series "amdgpu: fix multi-process hang issue" ( rev: 3 ) in AMD X.Org drivers

Not browsing as part of any series.

Commit Message

Deng, Emily Aug. 22, 2018, 12:07 p.m.
SWDEV-146499: hang during multi vulkan process testing

cause:
the second frame's PREAMBLE_IB have clear-state
and LOAD actions, those actions ruin the pipeline
that is still doing process in the previous frame's
work-load IB.

fix:
need insert pipeline sync if have context switch for
SRIOV (because only SRIOV will report PREEMPTION flag
to UMD)

Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Signed-off-by: Emily Deng <Emily.Deng@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Patch hide | download patch | download mbox

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
index 5c22cfd..47817e0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
@@ -165,8 +165,10 @@  int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned num_ibs,
 		return r;
 	}
 
+	need_ctx_switch = ring->current_ctx != fence_ctx;
 	if (ring->funcs->emit_pipeline_sync && job &&
 	    ((tmp = amdgpu_sync_get_fence(&job->sched_sync, NULL)) ||
+	     (amdgpu_sriov_vf(adev) && need_ctx_switch) ||
 	     amdgpu_vm_need_pipeline_sync(ring, job))) {
 		need_pipe_sync = true;
 
@@ -201,7 +203,6 @@  int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned num_ibs,
 	}
 
 	skip_preamble = ring->current_ctx == fence_ctx;
-	need_ctx_switch = ring->current_ctx != fence_ctx;
 	if (job && ring->funcs->emit_cntxcntl) {
 		if (need_ctx_switch)
 			status |= AMDGPU_HAVE_CTX_SWITCH;

Comments

Am 22.08.2018 um 14:07 schrieb Emily Deng:
> SWDEV-146499: hang during multi vulkan process testing
>
> cause:
> the second frame's PREAMBLE_IB have clear-state
> and LOAD actions, those actions ruin the pipeline
> that is still doing process in the previous frame's
> work-load IB.
>
> fix:
> need insert pipeline sync if have context switch for
> SRIOV (because only SRIOV will report PREEMPTION flag
> to UMD)
>
> Signed-off-by: Monk Liu <Monk.Liu@amd.com>
> Signed-off-by: Emily Deng <Emily.Deng@amd.com>

Much better, patch is Reviewed-by: Christian König 
<christian.koenig@amd.com>

> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c | 3 ++-
>   1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
> index 5c22cfd..47817e0 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
> @@ -165,8 +165,10 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned num_ibs,
>   		return r;
>   	}
>   
> +	need_ctx_switch = ring->current_ctx != fence_ctx;
>   	if (ring->funcs->emit_pipeline_sync && job &&
>   	    ((tmp = amdgpu_sync_get_fence(&job->sched_sync, NULL)) ||
> +	     (amdgpu_sriov_vf(adev) && need_ctx_switch) ||
>   	     amdgpu_vm_need_pipeline_sync(ring, job))) {
>   		need_pipe_sync = true;
>   
> @@ -201,7 +203,6 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned num_ibs,
>   	}
>   
>   	skip_preamble = ring->current_ctx == fence_ctx;
> -	need_ctx_switch = ring->current_ctx != fence_ctx;
>   	if (job && ring->funcs->emit_cntxcntl) {
>   		if (need_ctx_switch)
>   			status |= AMDGPU_HAVE_CTX_SWITCH;