[v1,3/5] drm/i915/gvt: GVTg support context submission pvmmio optimization

Submitted by Zhang, Xiaolin on Nov. 5, 2018, 9:20 a.m.

Details

Message ID 1541409649-21171-3-git-send-email-xiaolin.zhang@intel.com
State New
Series "Series without cover letter"

Commit Message

Zhang, Xiaolin Nov. 5, 2018, 9:20 a.m.
Implement the context submission pvmmio optimization with GVTg.

GVTg reads the context submission data (elsp_data) directly from the
shared_page, without the trap cost, to improve guest GPU performance.

v1: rebase
v0: RFC

Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Cc: Zhi Wang <zhi.a.wang@intel.com>
Cc: Min He <min.he@intel.com>
Cc: Fei Jiang <fei.jiang@intel.com>
Cc: Zhipeng Gong <zhipeng.gong@intel.com>
Cc: Hang Yuan <hang.yuan@intel.com>
Cc: Zhiyuan Lv <zhiyuan.lv@intel.com>
Signed-off-by: Xiaolin Zhang <xiaolin.zhang@intel.com>
---
 drivers/gpu/drm/i915/gvt/handlers.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

Patch

diff --git a/drivers/gpu/drm/i915/gvt/handlers.c b/drivers/gpu/drm/i915/gvt/handlers.c
index bf14c66..cd3b602 100644
--- a/drivers/gpu/drm/i915/gvt/handlers.c
+++ b/drivers/gpu/drm/i915/gvt/handlers.c
@@ -1667,6 +1667,8 @@  static int elsp_mmio_write(struct intel_vgpu *vgpu, unsigned int offset,
 	int ring_id = intel_gvt_render_mmio_to_ring_id(vgpu->gvt, offset);
 	struct intel_vgpu_execlist *execlist;
 	u32 data = *(u32 *)p_data;
+	u32 elsp_data[4];
+	u32 elsp_data_off;
 	int ret = 0;
 
 	if (WARN_ON(ring_id < 0 || ring_id >= I915_NUM_ENGINES))
@@ -1674,6 +1676,16 @@  static int elsp_mmio_write(struct intel_vgpu *vgpu, unsigned int offset,
 
 	execlist = &vgpu->submission.execlist[ring_id];
 
+	if (VGPU_PVMMIO(vgpu) & PVMMIO_ELSP_SUBMIT) {
+		elsp_data_off = offsetof(struct gvt_shared_page, elsp_data);
+		intel_gvt_read_shared_page(vgpu, elsp_data_off, &elsp_data, 16);
+		execlist->elsp_dwords.data[3] = elsp_data[0];
+		execlist->elsp_dwords.data[2] = elsp_data[1];
+		execlist->elsp_dwords.data[1] = elsp_data[2];
+		execlist->elsp_dwords.data[0] = data;
+		return intel_vgpu_submit_execlist(vgpu, ring_id);
+	}
+
 	execlist->elsp_dwords.data[3 - execlist->elsp_dwords.index] = data;
 	if (execlist->elsp_dwords.index == 3) {
 		ret = intel_vgpu_submit_execlist(vgpu, ring_id);
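
The dword ordering in the hunk above can be checked in isolation. The following is a minimal user-space sketch, not the driver code: `struct elsp_dwords` and `struct shared_page` are hypothetical stand-ins for the GVT-g structures, the index wrap in `elsp_write_trapped()` is an assumption (the hunk does not show it), and the guest is assumed to stage the first three dwords in the shared page before the final register write. Under those assumptions, both paths should assemble the same four dwords.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the GVT-g structures; only the fields used here. */
struct elsp_dwords {
	uint32_t data[4];
	uint32_t index;
};

struct shared_page {
	uint32_t elsp_data[4];
};

/*
 * Trapped path: one MMIO write per dword. Returns 1 once the fourth
 * dword has arrived and the submission is complete. The index wrap is
 * an assumption; the hunk above does not show it.
 */
static int elsp_write_trapped(struct elsp_dwords *e, uint32_t data)
{
	int complete = (e->index == 3);

	e->data[3 - e->index] = data;
	e->index = (e->index + 1) & 0x3;
	return complete;
}

/*
 * pvmmio path: the first three dwords come from the shared page and
 * only the last one arrives through the trapped write.
 */
static void elsp_write_pv(struct elsp_dwords *e, const struct shared_page *sp,
			  uint32_t data)
{
	e->data[3] = sp->elsp_data[0];
	e->data[2] = sp->elsp_data[1];
	e->data[1] = sp->elsp_data[2];
	e->data[0] = data;
}

/* Returns 1 if both paths assemble the same four dwords from dw[0..3]. */
static int elsp_paths_agree(const uint32_t dw[4])
{
	struct elsp_dwords trapped = { {0}, 0 };
	struct elsp_dwords pv = { {0}, 0 };
	struct shared_page sp = { {0} };
	int i, complete = 0;

	for (i = 0; i < 4; i++)
		complete = elsp_write_trapped(&trapped, dw[i]);

	/* The guest is assumed to stage dw[0..2] before the final write. */
	memcpy(sp.elsp_data, dw, 3 * sizeof(dw[0]));
	elsp_write_pv(&pv, &sp, dw[3]);

	return complete &&
	       memcmp(trapped.data, pv.data, sizeof(pv.data)) == 0;
}
```

Calling `elsp_paths_agree()` with any four dwords should return 1. Note that the patch reads 16 bytes from the shared page but uses only the first three dwords, since the fourth comes from the trapped write itself.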

Comments

Zhenyu Wang Nov. 5, 2018, 9:54 a.m.
On 2018.11.05 17:20:47 +0800, Xiaolin Zhang wrote:
> Implement the context submission pvmmio optimization with GVTg.
> 
> GVTg reads the context submission data (elsp_data) directly from the
> shared_page, without the trap cost, to improve guest GPU performance.
> 
> v1: rebase
> v0: RFC
> 
> Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
> Cc: Zhi Wang <zhi.a.wang@intel.com>
> Cc: Min He <min.he@intel.com>
> Cc: Fei Jiang <fei.jiang@intel.com>
> Cc: Zhipeng Gong <zhipeng.gong@intel.com>
> Cc: Hang Yuan <hang.yuan@intel.com>
> Cc: Zhiyuan Lv <zhiyuan.lv@intel.com>
> Signed-off-by: Xiaolin Zhang <xiaolin.zhang@intel.com>
> ---
>  drivers/gpu/drm/i915/gvt/handlers.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/gvt/handlers.c b/drivers/gpu/drm/i915/gvt/handlers.c
> index bf14c66..cd3b602 100644
> --- a/drivers/gpu/drm/i915/gvt/handlers.c
> +++ b/drivers/gpu/drm/i915/gvt/handlers.c
> @@ -1667,6 +1667,8 @@ static int elsp_mmio_write(struct intel_vgpu *vgpu, unsigned int offset,
>  	int ring_id = intel_gvt_render_mmio_to_ring_id(vgpu->gvt, offset);
>  	struct intel_vgpu_execlist *execlist;
>  	u32 data = *(u32 *)p_data;
> +	u32 elsp_data[4];
> +	u32 elsp_data_off;
>  	int ret = 0;
>  
>  	if (WARN_ON(ring_id < 0 || ring_id >= I915_NUM_ENGINES))
> @@ -1674,6 +1676,16 @@ static int elsp_mmio_write(struct intel_vgpu *vgpu, unsigned int offset,
>  
>  	execlist = &vgpu->submission.execlist[ring_id];
>  
> +	if (VGPU_PVMMIO(vgpu) & PVMMIO_ELSP_SUBMIT) {
> +		elsp_data_off = offsetof(struct gvt_shared_page, elsp_data);
> +		intel_gvt_read_shared_page(vgpu, elsp_data_off, &elsp_data, 16);
> +		execlist->elsp_dwords.data[3] = elsp_data[0];
> +		execlist->elsp_dwords.data[2] = elsp_data[1];
> +		execlist->elsp_dwords.data[1] = elsp_data[2];
> +		execlist->elsp_dwords.data[0] = data;
> +		return intel_vgpu_submit_execlist(vgpu, ring_id);
> +	}

I think we still need to do more checks, e.g. whether the ctx address is in a valid vgpu range, etc.?

> +
>  	execlist->elsp_dwords.data[3 - execlist->elsp_dwords.index] = data;
>  	if (execlist->elsp_dwords.index == 3) {
>  		ret = intel_vgpu_submit_execlist(vgpu, ring_id);
> -- 
> 2.7.4
>
Zhang, Xiaolin Nov. 6, 2018, 5:42 a.m.
On 11/05/2018 06:03 PM, Zhenyu Wang wrote:
> On 2018.11.05 17:20:47 +0800, Xiaolin Zhang wrote:
>> Implement the context submission pvmmio optimization with GVTg.
>>
>> GVTg reads the context submission data (elsp_data) directly from the
>> shared_page, without the trap cost, to improve guest GPU performance.
>>
>> v1: rebase
>> v0: RFC
>>
>> Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
>> Cc: Zhi Wang <zhi.a.wang@intel.com>
>> Cc: Min He <min.he@intel.com>
>> Cc: Fei Jiang <fei.jiang@intel.com>
>> Cc: Zhipeng Gong <zhipeng.gong@intel.com>
>> Cc: Hang Yuan <hang.yuan@intel.com>
>> Cc: Zhiyuan Lv <zhiyuan.lv@intel.com>
>> Signed-off-by: Xiaolin Zhang <xiaolin.zhang@intel.com>
>> ---
>>  drivers/gpu/drm/i915/gvt/handlers.c | 12 ++++++++++++
>>  1 file changed, 12 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/i915/gvt/handlers.c b/drivers/gpu/drm/i915/gvt/handlers.c
>> index bf14c66..cd3b602 100644
>> --- a/drivers/gpu/drm/i915/gvt/handlers.c
>> +++ b/drivers/gpu/drm/i915/gvt/handlers.c
>> @@ -1667,6 +1667,8 @@ static int elsp_mmio_write(struct intel_vgpu *vgpu, unsigned int offset,
>>  	int ring_id = intel_gvt_render_mmio_to_ring_id(vgpu->gvt, offset);
>>  	struct intel_vgpu_execlist *execlist;
>>  	u32 data = *(u32 *)p_data;
>> +	u32 elsp_data[4];
>> +	u32 elsp_data_off;
>>  	int ret = 0;
>>  
>>  	if (WARN_ON(ring_id < 0 || ring_id >= I915_NUM_ENGINES))
>> @@ -1674,6 +1676,16 @@ static int elsp_mmio_write(struct intel_vgpu *vgpu, unsigned int offset,
>>  
>>  	execlist = &vgpu->submission.execlist[ring_id];
>>  
>> +	if (VGPU_PVMMIO(vgpu) & PVMMIO_ELSP_SUBMIT) {
>> +		elsp_data_off = offsetof(struct gvt_shared_page, elsp_data);
>> +		intel_gvt_read_shared_page(vgpu, elsp_data_off, &elsp_data, 16);
>> +		execlist->elsp_dwords.data[3] = elsp_data[0];
>> +		execlist->elsp_dwords.data[2] = elsp_data[1];
>> +		execlist->elsp_dwords.data[1] = elsp_data[2];
>> +		execlist->elsp_dwords.data[0] = data;
>> +		return intel_vgpu_submit_execlist(vgpu, ring_id);
>> +	}
> I think we still need to do more checks, e.g. whether the ctx address is in a valid vgpu range, etc.?
I think your concern is about checking the input data elsp_data[4], but
elsp_data[4] stores the context descriptor, not the ctx address, so I am
not sure whether there is any mechanism to validate the context descriptor.
>
>> +
>>  	execlist->elsp_dwords.data[3 - execlist->elsp_dwords.index] = data;
>>  	if (execlist->elsp_dwords.index == 3) {
>>  		ret = intel_vgpu_submit_execlist(vgpu, ring_id);
>> -- 
>> 2.7.4
>>
Zhenyu Wang Nov. 6, 2018, 6:14 a.m.
On 2018.11.06 05:42:02 +0000, Zhang, Xiaolin wrote:
> >> @@ -1674,6 +1676,16 @@ static int elsp_mmio_write(struct intel_vgpu *vgpu, unsigned int offset,
> >>  
> >>  	execlist = &vgpu->submission.execlist[ring_id];
> >>  
> >> +	if (VGPU_PVMMIO(vgpu) & PVMMIO_ELSP_SUBMIT) {
> >> +		elsp_data_off = offsetof(struct gvt_shared_page, elsp_data);
> >> +		intel_gvt_read_shared_page(vgpu, elsp_data_off, &elsp_data, 16);
> >> +		execlist->elsp_dwords.data[3] = elsp_data[0];
> >> +		execlist->elsp_dwords.data[2] = elsp_data[1];
> >> +		execlist->elsp_dwords.data[1] = elsp_data[2];
> >> +		execlist->elsp_dwords.data[0] = data;
> >> +		return intel_vgpu_submit_execlist(vgpu, ring_id);
> >> +	}
> > I think we still need to do more checks, e.g. whether the ctx address is in a valid vgpu range, etc.?
> I think your concern is about checking the input data elsp_data[4], but
> elsp_data[4] stores the context descriptor, not the ctx address, so I am
> not sure whether there is any mechanism to validate the context descriptor.

It looks like intel_vgpu_submit_execlist() does have checks on the elsp
state and validates the guest ctx address in the final submit_context();
I was considering validating that earlier.

> >> +
> >>  	execlist->elsp_dwords.data[3 - execlist->elsp_dwords.index] = data;
> >>  	if (execlist->elsp_dwords.index == 3) {
> >>  		ret = intel_vgpu_submit_execlist(vgpu, ring_id);
> >> -- 
> >> 2.7.4
> >>
>
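
The descriptor-versus-address point in the thread can be illustrated with a short sketch. It assumes the execlist context descriptor layout in which the low dword carries flag bits in bits 11:0 and the logical ring context address (LRCA) in bits 31:12; that layout is an assumption here, not something the patch states. `struct vgpu_gm_range` and both helpers are hypothetical names for illustration only, and this is not what intel_vgpu_submit_execlist() or submit_context() actually does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: LRCA occupies bits 31:12 of the descriptor's low dword. */
#define CTX_DESC_LRCA_SHIFT	12
#define CTX_PAGE_SIZE		4096u

/* Hypothetical description of the graphics memory range owned by a vGPU. */
struct vgpu_gm_range {
	uint64_t base;
	uint64_t size;
};

/* Strip the flag bits and recover the page-aligned context address. */
static uint64_t ctx_desc_to_gma(uint32_t desc_ldw)
{
	return (uint64_t)(desc_ldw >> CTX_DESC_LRCA_SHIFT)
		<< CTX_DESC_LRCA_SHIFT;
}

/* True if the first context page lies entirely inside the vGPU range. */
static bool ctx_desc_in_range(uint32_t desc_ldw,
			      const struct vgpu_gm_range *r)
{
	uint64_t gma = ctx_desc_to_gma(desc_ldw);

	return gma >= r->base &&
	       gma - r->base + CTX_PAGE_SIZE <= r->size;
}
```

A check of this shape could in principle run before the dwords are copied into elsp_dwords, which is roughly what validating earlier would mean. Whether it is worth duplicating the checks already done at submission time is the open question in the thread.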