[RFC,3/4] drm/i915: Use new i915_gem_object_pin_map for LRC

Submitted by Tvrtko Ursulin on April 8, 2016, 1:54 p.m.

Details

Message ID 1460123698-16832-4-git-send-email-tvrtko.ursulin@linux.intel.com
State New
Headers show
Series "Eliminating the execlist retired request queue" ( rev: 1 ) in Intel GFX

Not browsing as part of any series.

Commit Message

Tvrtko Ursulin April 8, 2016, 1:54 p.m.
From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

We can use the new pin/lazy unpin API for more performance
in the execlist submission paths.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/intel_lrc.c | 15 +++++++--------
 1 file changed, 7 insertions(+), 8 deletions(-)

Patch hide | download patch | download mbox

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 3d08dac0a9b0..5e967943ce49 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1111,8 +1111,8 @@  static int intel_lr_context_do_pin(struct intel_context *ctx,
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct drm_i915_gem_object *ctx_obj = ctx->engine[engine->id].state;
 	struct intel_ringbuffer *ringbuf = ctx->engine[engine->id].ringbuf;
-	struct page *lrc_state_page;
-	uint32_t *lrc_reg_state;
+	void *obj_addr;
+	u32 *lrc_reg_state;
 	int ret;
 
 	WARN_ON(!mutex_is_locked(&engine->dev->struct_mutex));
@@ -1122,11 +1122,11 @@  static int intel_lr_context_do_pin(struct intel_context *ctx,
 	if (ret)
 		return ret;
 
-	lrc_state_page = i915_gem_object_get_dirty_page(ctx_obj, LRC_STATE_PN);
-	if (WARN_ON(!lrc_state_page)) {
-		ret = -ENODEV;
+	obj_addr = i915_gem_object_pin_map(ctx_obj);
+	if (!obj_addr)
 		goto unpin_ctx_obj;
-	}
+
+	lrc_reg_state = obj_addr + LRC_STATE_PN * PAGE_SIZE;
 
 	ret = intel_pin_and_map_ringbuffer_obj(engine->dev, ringbuf);
 	if (ret)
@@ -1134,7 +1134,6 @@  static int intel_lr_context_do_pin(struct intel_context *ctx,
 
 	ctx->engine[engine->id].lrc_vma = i915_gem_obj_to_ggtt(ctx_obj);
 	intel_lr_context_descriptor_update(ctx, engine);
-	lrc_reg_state = kmap(lrc_state_page);
 	lrc_reg_state[CTX_RING_BUFFER_START+1] = ringbuf->vma->node.start;
 	ctx->engine[engine->id].lrc_reg_state = lrc_reg_state;
 	ctx_obj->dirty = true;
@@ -1177,7 +1176,7 @@  void intel_lr_context_unpin(struct intel_context *ctx,
 
 	WARN_ON(!mutex_is_locked(&ctx->i915->dev->struct_mutex));
 	if (--ctx->engine[engine->id].pin_count == 0) {
-		kunmap(kmap_to_page(ctx->engine[engine->id].lrc_reg_state));
+		i915_gem_object_unpin_map(ctx_obj);
 		intel_unpin_ringbuffer_obj(ctx->engine[engine->id].ringbuf);
 		i915_gem_object_ggtt_unpin(ctx_obj);
 		ctx->engine[engine->id].lrc_vma = NULL;

Comments

On Fri, Apr 08, 2016 at 02:54:57PM +0100, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> We can use the new pin/lazy unpin API for more performance
> in the execlist submission paths.

The sticking point for me was the ringbuffer and Braswell having to
re-ioremap it every context pin. I hadn't noticed the kmap of the
context, but this does seem like a valid use for pin_map so I'm not
complaining... (Just saying this half the solution ;)

Oh, and you don't have pin_map for your ringbuffer either yet.
-Chris
On 08/04/16 15:40, Chris Wilson wrote:
> On Fri, Apr 08, 2016 at 02:54:57PM +0100, Tvrtko Ursulin wrote:
>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>
>> We can use the new pin/lazy unpin API for more performance
>> in the execlist submission paths.
>
> The sticking point for me was the ringbuffer and Braswell having to
> re-ioremap it every context pin. I hadn't noticed the kmap of the
> context, but this does seem like a valid use for pin_map so I'm not
> complaining... (Just saying this half the solution ;)

Pull in to the series "drm/i915: Move ioremap_wc tracking onto VMA" as 
well then?

> Oh, and you don't have pin_map for your ringbuffer either yet.

Confused, I did base this on top of "drm/i915: Refactor duplicate object 
vmap functions" and the half missing is the ioremap paths. Which you 
commented on above. What other ringbuffer is missing?

Regards,

Tvrtko
On Mon, Apr 11, 2016 at 10:05:34AM +0100, Tvrtko Ursulin wrote:
> 
> On 08/04/16 15:40, Chris Wilson wrote:
> >On Fri, Apr 08, 2016 at 02:54:57PM +0100, Tvrtko Ursulin wrote:
> >>From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >>
> >>We can use the new pin/lazy unpin API for more performance
> >>in the execlist submission paths.
> >
> >The sticking point for me was the ringbuffer and Braswell having to
> >re-ioremap it every context pin. I hadn't noticed the kmap of the
> >context, but this does seem like a valid use for pin_map so I'm not
> >complaining... (Just saying this half the solution ;)
> 
> Pull in to the series "drm/i915: Move ioremap_wc tracking onto VMA"
> as well then?

That depends on "io-mapping: Specify mapping size for io_mapping_map_wc"
and the vma->map_and_fenceable (though you can workaround that) iirc .
 
> >Oh, and you don't have pin_map for your ringbuffer either yet.
> 
> Confused, I did base this on top of "drm/i915: Refactor duplicate
> object vmap functions" and the half missing is the ioremap paths.
> Which you commented on above. What other ringbuffer is missing?

No, I was just thinking it was still pending because I had a fixup in my
tree: drm/i915: Treat ringbuffer writes as write to normal memory

That and a patch to use WC vmaps for Braswell as well (that's still in
preliminary form, but if you are interested
https://cgit.freedesktop.org/~ickle/linux-2.6/commit/?h=tasklet&id=3fa97e044f3b5a2c23e44ec659a8199f134d8fb5
it really helps with context creation stress tests - and not much else!
;)
-Chris