[1/2] drm/i915/execlists: Add a paranoid flush of the CSB pointers upon reset

Submitted by Chris Wilson on Sept. 12, 2019, 7:09 a.m.

Details

Message ID 20190912070925.11526-1-chris@chris-wilson.co.uk
State Accepted
Commit 582a6f90aa0d9103ab834b3a48a5bb7e4d07cac6
Series "Series without cover letter" ( rev: 1 ) in Intel GFX

Commit Message

Chris Wilson Sept. 12, 2019, 7:09 a.m.
After a GPU reset, we need to drain all the CS events so that we have an
accurate picture of the execlists state at the time of the reset. Be
paranoid and force a read of the CSB write pointer from memory.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
---
 drivers/gpu/drm/i915/gt/intel_lrc.c | 4 ++++
 1 file changed, 4 insertions(+)

Patch

diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
index 3d83c7e0d9de..61a38a4ccbca 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -2836,6 +2836,10 @@  static void __execlists_reset(struct intel_engine_cs *engine, bool stalled)
 	struct i915_request *rq;
 	u32 *regs;
 
+	mb(); /* paranoia: read the CSB pointers from after the reset */
+	clflush(execlists->csb_write);
+	mb();
+
 	process_csb(engine); /* drain preemption events */
 
 	/* Following the reset, we need to reload the CSB read/write pointers */

Comments

Chris Wilson <chris@chris-wilson.co.uk> writes:

> After a GPU reset, we need to drain all the CS events so that we have an
> accurate picture of the execlists state at the time of the reset. Be
> paranoid and force a read of the CSB write pointer from memory.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> ---
>  drivers/gpu/drm/i915/gt/intel_lrc.c | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
> index 3d83c7e0d9de..61a38a4ccbca 100644
> --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
> @@ -2836,6 +2836,10 @@ static void __execlists_reset(struct intel_engine_cs *engine, bool stalled)
>  	struct i915_request *rq;
>  	u32 *regs;
>  
> +	mb(); /* paranoia: read the CSB pointers from after the reset */
> +	clflush(execlists->csb_write);
> +	mb();
> +

We know there is always a cost. We do invalidate the csb
on each pass on process_csb.

Add csb_write in to invalidate_csb entries along
with mbs. Rename it to invalidate_csb and use it
always?

By doing so, we could prolly throw out the rmb() at
the start of the process_csb as we would have invalidated
the write pointer along with the entries we read,
on previous pass.

-Mika


>  	process_csb(engine); /* drain preemption events */
>  
>  	/* Following the reset, we need to reload the CSB read/write pointers */
> -- 
> 2.23.0

Quoting Mika Kuoppala (2019-09-12 08:51:38)
> Chris Wilson <chris@chris-wilson.co.uk> writes:
> 
> > After a GPU reset, we need to drain all the CS events so that we have an
> > accurate picture of the execlists state at the time of the reset. Be
> > paranoid and force a read of the CSB write pointer from memory.
> >
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> > ---
> >  drivers/gpu/drm/i915/gt/intel_lrc.c | 4 ++++
> >  1 file changed, 4 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
> > index 3d83c7e0d9de..61a38a4ccbca 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
> > @@ -2836,6 +2836,10 @@ static void __execlists_reset(struct intel_engine_cs *engine, bool stalled)
> >       struct i915_request *rq;
> >       u32 *regs;
> >  
> > +     mb(); /* paranoia: read the CSB pointers from after the reset */
> > +     clflush(execlists->csb_write);
> > +     mb();
> > +
> 
> We know there is always a cost. We do invalidate the csb
> on each pass on process_csb.
> 
> Add csb_write in to invalidate_csb entries along
> with mbs. Rename it to invalidate_csb and use it
> always?
> 
> By doing so, we could prolly throw out the rmb() at
> the start of the process_csb as we would have invalidated
> the write pointer along with the entries we read,
> on previous pass.

No. That rmb is essential for the read ordering at that moment in time.

All I have in mind here is a delay, not really a barrier per se, just
this is a nice way of saying no speculation either.
-Chris
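
The read ordering discussed above can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not the driver's process_csb(): the `csb_ring` struct, the `drain_csb()` helper and the ring size are hypothetical, and a C11 acquire fence stands in for the kernel's rmb(). It shows why the barrier sits between sampling the write pointer and reading the entries: without it, the entry reads could be satisfied before the pointer read, returning stale events.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define CSB_ENTRIES 8u /* hypothetical ring size, not the hardware's */

struct csb_ring {
	_Atomic uint32_t write;        /* index of the last entry the producer wrote */
	uint32_t entries[CSB_ENTRIES]; /* event payloads */
	uint32_t read;                 /* consumer-private index of the last entry consumed */
};

/*
 * Copy every pending event into out[] (at most cap of them) and
 * return how many were copied.
 */
static unsigned int drain_csb(struct csb_ring *ring, uint32_t *out,
			      unsigned int cap)
{
	unsigned int n = 0;
	/* Sample the producer's write pointer once. */
	uint32_t tail = atomic_load_explicit(&ring->write, memory_order_relaxed);

	assert(tail < CSB_ENTRIES);

	/*
	 * Stand-in for rmb(): the entry reads below must not be
	 * performed before the write pointer read above.
	 */
	atomic_thread_fence(memory_order_acquire);

	while (ring->read != tail && n < cap) {
		/* Advance first, then consume, wrapping at the ring size. */
		ring->read = (ring->read + 1) % CSB_ENTRIES;
		out[n++] = ring->entries[ring->read];
	}

	return n;
}
```

In this model a producer would fill `entries[]` and then publish `write` with a release store; the fence in `drain_csb()` pairs with that store. Invalidating cachelines after a pass does not replace this ordering, which is the point made in the reply above.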

Chris Wilson <chris@chris-wilson.co.uk> writes:

> Quoting Mika Kuoppala (2019-09-12 08:51:38)
>> Chris Wilson <chris@chris-wilson.co.uk> writes:
>> 
>> > After a GPU reset, we need to drain all the CS events so that we have an
>> > accurate picture of the execlists state at the time of the reset. Be
>> > paranoid and force a read of the CSB write pointer from memory.
>> >
>> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>> > Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
>> > ---
>> >  drivers/gpu/drm/i915/gt/intel_lrc.c | 4 ++++
>> >  1 file changed, 4 insertions(+)
>> >
>> > diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
>> > index 3d83c7e0d9de..61a38a4ccbca 100644
>> > --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
>> > +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
>> > @@ -2836,6 +2836,10 @@ static void __execlists_reset(struct intel_engine_cs *engine, bool stalled)
>> >       struct i915_request *rq;
>> >       u32 *regs;
>> >  
>> > +     mb(); /* paranoia: read the CSB pointers from after the reset */
>> > +     clflush(execlists->csb_write);
>> > +     mb();
>> > +
>> 
>> We know there is always a cost. We do invalidate the csb
>> on each pass on process_csb.
>> 
>> Add csb_write in to invalidate_csb entries along
>> with mbs. Rename it to invalidate_csb and use it
>> always?
>> 
>> By doing so, we could prolly throw out the rmb() at
>> the start of the process_csb as we would have invalidated
>> the write pointer along with the entries we read,
>> on previous pass.
>
> No. That rmb is essential for the read ordering at that moment in time.

Ah yes indeed it is. head vs entries coherency.

>
> All I have in mind here is a delay, not really a barrier per se, just
> this is a nice way of saying no speculation either.

Forgetting the rmb(), there is similar pattern of mb()+flush
elsewhere. Just saw the proliferation and opportunity to converge.

But syncing with the hardware on moment of reset, this should
do.

Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>

Quoting Mika Kuoppala (2019-09-12 09:27:56)
> Chris Wilson <chris@chris-wilson.co.uk> writes:
> 
> > Quoting Mika Kuoppala (2019-09-12 08:51:38)
> >> Chris Wilson <chris@chris-wilson.co.uk> writes:
> >> 
> >> > After a GPU reset, we need to drain all the CS events so that we have an
> >> > accurate picture of the execlists state at the time of the reset. Be
> >> > paranoid and force a read of the CSB write pointer from memory.
> >> >
> >> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> >> > Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> >> > ---
> >> >  drivers/gpu/drm/i915/gt/intel_lrc.c | 4 ++++
> >> >  1 file changed, 4 insertions(+)
> >> >
> >> > diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
> >> > index 3d83c7e0d9de..61a38a4ccbca 100644
> >> > --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
> >> > +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
> >> > @@ -2836,6 +2836,10 @@ static void __execlists_reset(struct intel_engine_cs *engine, bool stalled)
> >> >       struct i915_request *rq;
> >> >       u32 *regs;
> >> >  
> >> > +     mb(); /* paranoia: read the CSB pointers from after the reset */
> >> > +     clflush(execlists->csb_write);
> >> > +     mb();
> >> > +
> >> 
> >> We know there is always a cost. We do invalidate the csb
> >> on each pass on process_csb.
> >> 
> >> Add csb_write in to invalidate_csb entries along
> >> with mbs. Rename it to invalidate_csb and use it
> >> always?
> >> 
> >> By doing so, we could prolly throw out the rmb() at
> >> the start of the process_csb as we would have invalidated
> >> the write pointer along with the entries we read,
> >> on previous pass.
> >
> > No. That rmb is essential for the read ordering at that moment in time.
> 
> Ah yes indeed it is. head vs entries coherency.
> 
> >
> > All I have in mind here is a delay, not really a barrier per se, just
> > this is a nice way of saying no speculation either.
> 
> Forgetting the rmb(), there is similar pattern of mb()+flush
> elsewhere. Just saw the proliferation and opportunity to converge.

I understood. I think your barrier-less w/a works pretty well and I
haven't yet poked a hole in how I think it works ;)

> But syncing with the hardware on moment of reset, this should
> do.

I looked at reusing invalidate_csb_entries() and I think the key part
here is that we do want to invalidate the execlists->csb_write itself,
so a subtly different location/reason (not sure if it's the same
cacheline or the neighbouring one).
-Chris
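
For readers unfamiliar with the mb(); clflush(); mb() sequence in the patch, the following is a rough userspace approximation, not the kernel code. It assumes an x86 build with SSE2, where `_mm_mfence()` and `_mm_clflush()` from `<emmintrin.h>` play the roles of mb() and clflush(); on other targets the sketch falls back to a plain sequentially consistent fence with no cache flush. The helper name `read_write_pointer_paranoid()` is hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__SSE2__)
#include <emmintrin.h> /* _mm_mfence(), _mm_clflush() */
#define HAVE_CLFLUSH 1
#else
#define HAVE_CLFLUSH 0
#endif

/* Rough userspace analogue of the kernel's mb(). */
static void full_barrier(void)
{
#if HAVE_CLFLUSH
	_mm_mfence();
#else
	atomic_thread_fence(memory_order_seq_cst);
#endif
}

/*
 * Evict the cacheline holding *csb_write, fenced on both sides, so
 * that the load below is served from memory rather than from a copy
 * cached (or speculatively fetched) before the reset.
 */
static uint32_t read_write_pointer_paranoid(const volatile uint32_t *csb_write)
{
	assert(csb_write != NULL);

	full_barrier(); /* complete earlier loads and stores first */
#if HAVE_CLFLUSH
	_mm_clflush((const void *)csb_write); /* write back and invalidate the line */
#endif
	full_barrier(); /* keep the load below after the flush */

	return *csb_write;
}
```

Note that the flush targets the cacheline of the write pointer itself, which matches the distinction drawn above between this and invalidating the CSB entries.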