[v2] drm/i915/gvt: Clear d3_entered on elsp cmd submission.

Submitted by Colin Xu on July 7, 2021, 12:45 a.m.

Details

Message ID 20210707004531.4873-1-colin.xu@intel.com
State Accepted
Commit c90b4503ccf42d9d367e843c223df44aa550e82a
Headers show
Series "drm/i915/gvt: Clear d3_entered on elsp cmd submission." ( rev: 2 ) in Intel GVT devel

Not browsing as part of any series.

Commit Message

Colin Xu July 7, 2021, 12:45 a.m.
d3_entered flag is used to mark for vgpu_reset a previous power
transition from D3->D0, typically for VM resume from S3, so that gvt
could skip PPGTT invalidation in current vgpu_reset during resuming.

In case S0ix exit, although there is D3->D0, guest driver continue to
use vgpu as normal, with d3_entered set, until next shutdown/reboot or
power transition.

If a reboot follows a S0ix exit, device power state transite as:
D0->D3->D0->D0(reboot), while system power state transites as:
S0->S0 (reboot). There is no vgpu_reset until D0(reboot), thus
d3_entered won't be cleared, the vgpu_reset will skip PPGTT invalidation
however those PPGTT entries are no longer valid. Err appears like:

gvt: vgpu 2: vfio_pin_pages failed for gfn 0xxxxx, ret -22
gvt: vgpu 2: fail: spt xxxx guest entry 0xxxxx type 2
gvt: vgpu 2: fail: shadow page xxxx guest entry 0xxxxx type 2.

Give gvt a chance to clear d3_entered on elsp cmd submission so that the
states before & after S0ix enter/exit are consistent.

Fixes: ba25d977571e ("drm/i915/gvt: Do not destroy ppgtt_mm during vGPU D3->D0.")

v2: Add inline comment.

Signed-off-by: Colin Xu <colin.xu@intel.com>
---
 drivers/gpu/drm/i915/gvt/handlers.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

Patch hide | download patch | download mbox

diff --git a/drivers/gpu/drm/i915/gvt/handlers.c b/drivers/gpu/drm/i915/gvt/handlers.c
index 98eb48c24c46..06024d321a1a 100644
--- a/drivers/gpu/drm/i915/gvt/handlers.c
+++ b/drivers/gpu/drm/i915/gvt/handlers.c
@@ -1977,6 +1977,21 @@  static int elsp_mmio_write(struct intel_vgpu *vgpu, unsigned int offset,
 	if (drm_WARN_ON(&i915->drm, !engine))
 		return -EINVAL;
 
+	/*
+	 * Due to d3_entered is used to indicate skipping PPGTT invalidation on
+	 * vGPU reset, it's set on D0->D3 on PCI config write, and cleared after
+	 * vGPU reset if in resuming.
+	 * In S0ix exit, the device power state also transite from D3 to D0 as
+	 * S3 resume, but no vGPU reset (triggered by QEMU devic model). After
+	 * S0ix exit, all engines continue to work. However the d3_entered
+	 * remains set which will break next vGPU reset logic (miss the expected
+	 * PPGTT invalidation).
+	 * Engines can only work in D0. Thus the 1st elsp write gives GVT a
+	 * chance to clear d3_entered.
+	 */
+	if (vgpu->d3_entered)
+		vgpu->d3_entered = false;
+
 	execlist = &vgpu->submission.execlist[engine->id];
 
 	execlist->elsp_dwords.data[3 - execlist->elsp_dwords.index] = data;

Comments

On 2021.07.07 08:45:31 +0800, Colin Xu wrote:
> d3_entered flag is used to mark for vgpu_reset a previous power
> transition from D3->D0, typically for VM resume from S3, so that gvt
> could skip PPGTT invalidation in current vgpu_reset during resuming.
> 
> In case S0ix exit, although there is D3->D0, guest driver continue to
> use vgpu as normal, with d3_entered set, until next shutdown/reboot or
> power transition.
> 
> If a reboot follows a S0ix exit, device power state transite as:
> D0->D3->D0->D0(reboot), while system power state transites as:
> S0->S0 (reboot). There is no vgpu_reset until D0(reboot), thus
> d3_entered won't be cleared, the vgpu_reset will skip PPGTT invalidation
> however those PPGTT entries are no longer valid. Err appears like:
> 
> gvt: vgpu 2: vfio_pin_pages failed for gfn 0xxxxx, ret -22
> gvt: vgpu 2: fail: spt xxxx guest entry 0xxxxx type 2
> gvt: vgpu 2: fail: shadow page xxxx guest entry 0xxxxx type 2.
> 
> Give gvt a chance to clear d3_entered on elsp cmd submission so that the
> states before & after S0ix enter/exit are consistent.
> 
> Fixes: ba25d977571e ("drm/i915/gvt: Do not destroy ppgtt_mm during vGPU D3->D0.")
> 
> v2: Add inline comment.
> 
> Signed-off-by: Colin Xu <colin.xu@intel.com>
> ---
>  drivers/gpu/drm/i915/gvt/handlers.c | 15 +++++++++++++++
>  1 file changed, 15 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/gvt/handlers.c b/drivers/gpu/drm/i915/gvt/handlers.c
> index 98eb48c24c46..06024d321a1a 100644
> --- a/drivers/gpu/drm/i915/gvt/handlers.c
> +++ b/drivers/gpu/drm/i915/gvt/handlers.c
> @@ -1977,6 +1977,21 @@ static int elsp_mmio_write(struct intel_vgpu *vgpu, unsigned int offset,
>  	if (drm_WARN_ON(&i915->drm, !engine))
>  		return -EINVAL;
>  
> +	/*
> +	 * Due to d3_entered is used to indicate skipping PPGTT invalidation on
> +	 * vGPU reset, it's set on D0->D3 on PCI config write, and cleared after
> +	 * vGPU reset if in resuming.
> +	 * In S0ix exit, the device power state also transite from D3 to D0 as
> +	 * S3 resume, but no vGPU reset (triggered by QEMU devic model). After
> +	 * S0ix exit, all engines continue to work. However the d3_entered
> +	 * remains set which will break next vGPU reset logic (miss the expected
> +	 * PPGTT invalidation).
> +	 * Engines can only work in D0. Thus the 1st elsp write gives GVT a
> +	 * chance to clear d3_entered.
> +	 */
> +	if (vgpu->d3_entered)
> +		vgpu->d3_entered = false;
> +

Thanks!

Reviewed-by: Zhenyu Wang <zhenyuw@linux.intel.com>

p.s, for such fixes better cc stable as well in future, but you may submit
it anyway after merge.

>  	execlist = &vgpu->submission.execlist[engine->id];
>  
>  	execlist->elsp_dwords.data[3 - execlist->elsp_dwords.index] = data;
> -- 
> 2.32.0
> 
> _______________________________________________
> intel-gvt-dev mailing list
> intel-gvt-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gvt-dev