[1/3] drm/i915/execlists: Force write serialisation into context image vs execution

Submitted by Chris Wilson on Nov. 8, 2018, 8:17 a.m.

Details

Message ID 20181108081740.25615-1-chris@chris-wilson.co.uk
State New
Series "Series without cover letter"
Headers show

Commit Message

Chris Wilson Nov. 8, 2018, 8:17 a.m.
Ensure that the writes into the context image are completed prior to the
register mmio to trigger execution. Although previously we were assured
by the SDM that all writes are flushed before an uncached memory
transaction (our mmio write to submit the context to HW for execution),
we have empirical evidence to believe that this is not actually the
case.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108656
References: https://bugs.freedesktop.org/show_bug.cgi?id=108315
References: https://bugs.freedesktop.org/show_bug.cgi?id=106887
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/intel_lrc.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

Patch hide | download patch | download mbox

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 22b57b8926fc..f7892ddb3f13 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -380,7 +380,8 @@  static u64 execlists_update_context(struct i915_request *rq)
 
 	reg_state[CTX_RING_TAIL+1] = intel_ring_set_tail(rq->ring, rq->tail);
 
-	/* True 32b PPGTT with dynamic page allocation: update PDP
+	/*
+	 * True 32b PPGTT with dynamic page allocation: update PDP
 	 * registers and point the unallocated PDPs to scratch page.
 	 * PML4 is allocated during ppgtt init, so this is not needed
 	 * in 48-bit mode.
@@ -388,6 +389,17 @@  static u64 execlists_update_context(struct i915_request *rq)
 	if (!i915_vm_is_48bit(&ppgtt->vm))
 		execlists_update_context_pdps(ppgtt, reg_state);
 
+	/*
+	 * Make sure the context image is complete before we submit it to HW.
+	 *
+	 * Ostensibly, writes (including the WCB) should be flushed prior to
+	 * an uncached write such as our mmio register access, the empirical
+	 * evidence (esp. on Braswell) suggests that the WC write into memory
+	 * may not be visible to the HW prior to the completion of the UC
+	 * register write and that we may begin execution from the context
+	 * before its image is complete leading to invalid PD chasing.
+	 */
+	wmb();
 	return ce->lrc_desc;
 }
 

Comments

Mika Kuoppala Nov. 8, 2018, noon
Chris Wilson <chris@chris-wilson.co.uk> writes:

> Ensure that the writes into the context image are completed prior to the
> register mmio to trigger execution. Although previously we were assured
> by the SDM that all writes are flushed before an uncached memory
> transaction (our mmio write to submit the context to HW for execution),
> we have empirical evidence to believe that this is not actually the
> case.
>
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108656

I would have marked this also as References.

> References: https://bugs.freedesktop.org/show_bug.cgi?id=108315
> References: https://bugs.freedesktop.org/show_bug.cgi?id=106887
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>  drivers/gpu/drm/i915/intel_lrc.c | 14 +++++++++++++-
>  1 file changed, 13 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 22b57b8926fc..f7892ddb3f13 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -380,7 +380,8 @@ static u64 execlists_update_context(struct i915_request *rq)
>  
>  	reg_state[CTX_RING_TAIL+1] = intel_ring_set_tail(rq->ring, rq->tail);
>  
> -	/* True 32b PPGTT with dynamic page allocation: update PDP
> +	/*
> +	 * True 32b PPGTT with dynamic page allocation: update PDP
>  	 * registers and point the unallocated PDPs to scratch page.
>  	 * PML4 is allocated during ppgtt init, so this is not needed
>  	 * in 48-bit mode.
> @@ -388,6 +389,17 @@ static u64 execlists_update_context(struct i915_request *rq)
>  	if (!i915_vm_is_48bit(&ppgtt->vm))
>  		execlists_update_context_pdps(ppgtt, reg_state);
>  
> +	/*
> +	 * Make sure the context image is complete before we submit it to HW.
> +	 *
> +	 * Ostensibly, writes (including the WCB) should be flushed prior to
> +	 * an uncached write such as our mmio register access, the empirical
> +	 * evidence (esp. on Braswell) suggests that the WC write into memory
> +	 * may not be visible to the HW prior to the completion of the UC
> +	 * register write and that we may begin execution from the context
> +	 * before its image is complete leading to invalid PD chasing.
> +	 */
> +	wmb();

Let's put it into use and gather more evidence.

Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>

>  	return ce->lrc_desc;
>  }
>  
> -- 
> 2.19.1
Chris Wilson Nov. 8, 2018, 12:11 p.m.
Quoting Mika Kuoppala (2018-11-08 12:00:39)
> Chris Wilson <chris@chris-wilson.co.uk> writes:
> 
> > Ensure that the writes into the context image are completed prior to the
> > register mmio to trigger execution. Although previously we were assured
> > by the SDM that all writes are flushed before an uncached memory
> > transaction (our mmio write to submit the context to HW for execution),
> > we have empirical evidence to believe that this is not actually the
> > case.
> >
> > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108656
> 
> I would have marked this also as References.

I'm confident in my local results indicating some success here, albeit
in not exactly the same quick death, but still out-of-bounds execution.
 
> > References: https://bugs.freedesktop.org/show_bug.cgi?id=108315
> > References: https://bugs.freedesktop.org/show_bug.cgi?id=106887
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> > Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> > ---
> >  drivers/gpu/drm/i915/intel_lrc.c | 14 +++++++++++++-
> >  1 file changed, 13 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> > index 22b57b8926fc..f7892ddb3f13 100644
> > --- a/drivers/gpu/drm/i915/intel_lrc.c
> > +++ b/drivers/gpu/drm/i915/intel_lrc.c
> > @@ -380,7 +380,8 @@ static u64 execlists_update_context(struct i915_request *rq)
> >  
> >       reg_state[CTX_RING_TAIL+1] = intel_ring_set_tail(rq->ring, rq->tail);
> >  
> > -     /* True 32b PPGTT with dynamic page allocation: update PDP
> > +     /*
> > +      * True 32b PPGTT with dynamic page allocation: update PDP
> >        * registers and point the unallocated PDPs to scratch page.
> >        * PML4 is allocated during ppgtt init, so this is not needed
> >        * in 48-bit mode.
> > @@ -388,6 +389,17 @@ static u64 execlists_update_context(struct i915_request *rq)
> >       if (!i915_vm_is_48bit(&ppgtt->vm))
> >               execlists_update_context_pdps(ppgtt, reg_state);
> >  
> > +     /*
> > +      * Make sure the context image is complete before we submit it to HW.
> > +      *
> > +      * Ostensibly, writes (including the WCB) should be flushed prior to
> > +      * an uncached write such as our mmio register access, the empirical
> > +      * evidence (esp. on Braswell) suggests that the WC write into memory
> > +      * may not be visible to the HW prior to the completion of the UC
> > +      * register write and that we may begin execution from the context
> > +      * before its image is complete leading to invalid PD chasing.
> > +      */
> > +     wmb();
> 
> Let's put it into use and gather more evidence.
> 
> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>

Aye. Sure about r-b? I'm quite happy to take an a-b since we're just
postulating to gather evidence.
-Chris
Mika Kuoppala Nov. 8, 2018, 12:13 p.m.
Chris Wilson <chris@chris-wilson.co.uk> writes:

> Quoting Mika Kuoppala (2018-11-08 12:00:39)
>> Chris Wilson <chris@chris-wilson.co.uk> writes:
>> 
>> > Ensure that the writes into the context image are completed prior to the
>> > register mmio to trigger execution. Although previously we were assured
>> > by the SDM that all writes are flushed before an uncached memory
>> > transaction (our mmio write to submit the context to HW for execution),
>> > we have empirical evidence to believe that this is not actually the
>> > case.
>> >
>> > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108656
>> 
>> I would have marked this also as References.
>
> I'm confident in my local results indicating some success here, albeit
> in not exactly the same quick death, but still out-of-bounds execution.
>  
>> > References: https://bugs.freedesktop.org/show_bug.cgi?id=108315
>> > References: https://bugs.freedesktop.org/show_bug.cgi?id=106887
>> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>> > Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
>> > Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>> > ---
>> >  drivers/gpu/drm/i915/intel_lrc.c | 14 +++++++++++++-
>> >  1 file changed, 13 insertions(+), 1 deletion(-)
>> >
>> > diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
>> > index 22b57b8926fc..f7892ddb3f13 100644
>> > --- a/drivers/gpu/drm/i915/intel_lrc.c
>> > +++ b/drivers/gpu/drm/i915/intel_lrc.c
>> > @@ -380,7 +380,8 @@ static u64 execlists_update_context(struct i915_request *rq)
>> >  
>> >       reg_state[CTX_RING_TAIL+1] = intel_ring_set_tail(rq->ring, rq->tail);
>> >  
>> > -     /* True 32b PPGTT with dynamic page allocation: update PDP
>> > +     /*
>> > +      * True 32b PPGTT with dynamic page allocation: update PDP
>> >        * registers and point the unallocated PDPs to scratch page.
>> >        * PML4 is allocated during ppgtt init, so this is not needed
>> >        * in 48-bit mode.
>> > @@ -388,6 +389,17 @@ static u64 execlists_update_context(struct i915_request *rq)
>> >       if (!i915_vm_is_48bit(&ppgtt->vm))
>> >               execlists_update_context_pdps(ppgtt, reg_state);
>> >  
>> > +     /*
>> > +      * Make sure the context image is complete before we submit it to HW.
>> > +      *
>> > +      * Ostensibly, writes (including the WCB) should be flushed prior to
>> > +      * an uncached write such as our mmio register access, the empirical
>> > +      * evidence (esp. on Braswell) suggests that the WC write into memory
>> > +      * may not be visible to the HW prior to the completion of the UC
>> > +      * register write and that we may begin execution from the context
>> > +      * before its image is complete leading to invalid PD chasing.
>> > +      */
>> > +     wmb();
>> 
>> Let's put it into use and gather more evidence.
>> 
>> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
>
> Aye. Sure about r-b? I'm quite happy to take an a-b since we're just
> postulating to gather evidence.

Agreed that a-b is more accurate. r-b would indicate I know what the
heck is going on there under the hood.
-Mika
Chris Wilson Nov. 8, 2018, 12:26 p.m.
Quoting Mika Kuoppala (2018-11-08 12:13:42)
> Chris Wilson <chris@chris-wilson.co.uk> writes:
> 
> > Quoting Mika Kuoppala (2018-11-08 12:00:39)
> >> Chris Wilson <chris@chris-wilson.co.uk> writes:
> >> > +     /*
> >> > +      * Make sure the context image is complete before we submit it to HW.
> >> > +      *
> >> > +      * Ostensibly, writes (including the WCB) should be flushed prior to
> >> > +      * an uncached write such as our mmio register access, the empirical
> >> > +      * evidence (esp. on Braswell) suggests that the WC write into memory
> >> > +      * may not be visible to the HW prior to the completion of the UC
> >> > +      * register write and that we may begin execution from the context
> >> > +      * before its image is complete leading to invalid PD chasing.
> >> > +      */
> >> > +     wmb();
> >> 
> >> Let's put it into use and gather more evidence.
> >> 
> >> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> >
> > Aye. Sure about r-b? I'm quite happy to take an a-b since we're just
> > postulating to gather evidence.
> 
> Agreed that a-b is more accurate. r-b would indicate I know what the
> heck is going on there under the hood.

And pushed (this patch) to see if the next few months do quieten down.
-Chris
Chris Wilson Nov. 8, 2018, 1:38 p.m.
Quoting Chris Wilson (2018-11-08 12:11:05)
> Quoting Mika Kuoppala (2018-11-08 12:00:39)
> > Chris Wilson <chris@chris-wilson.co.uk> writes:
> > 
> > > Ensure that the writes into the context image are completed prior to the
> > > register mmio to trigger execution. Although previously we were assured
> > > by the SDM that all writes are flushed before an uncached memory
> > > transaction (our mmio write to submit the context to HW for execution),
> > > we have empirical evidence to believe that this is not actually the
> > > case.
> > >
> > > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108656
> > 
> > I would have marked this also as References.
> 
> I'm confident in my local results indicating some success here, albeit
> in not exactly the same quick death, but still out-of-bounds execution.

For the record, bsw just hung after close to a day, with the same out of
bounds execution (BBADDR after the batch).
/o\
-Chris