[1/3] drm/i915: Prevent concurrent GGTT update and use on Braswell (again)

Submitted by Chris Wilson on Jan. 14, 2019, 9:17 p.m.

Details

Message ID 20190114211729.30352-1-chris@chris-wilson.co.uk
State New
Headers show
Series "Series without cover letter" ( rev: 1 ) in Intel GFX

Not browsing as part of any series.

Commit Message

Chris Wilson Jan. 14, 2019, 9:17 p.m.
On Braswell, under heavy stress, if we update the GGTT while
simultaneously accessing another region inside the GTT, we are returned
the wrong values. To prevent this we stop the machine to update the GGTT
entries so that no memory traffic can occur at the same time.

This was first spotted in

commit 5bab6f60cb4d1417ad7c599166bcfec87529c1a2
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Fri Oct 23 18:43:32 2015 +0100

    drm/i915: Serialise updates to GGTT with access through GGTT on Braswell

but removed again in forlorn hope with

commit 4509276ee824bb967885c095c610767e42345c36
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Mon Feb 20 12:47:18 2017 +0000

    drm/i915: Remove Braswell GGTT update w/a

However, gem_concurrent_blit is once again only stable with the patch
applied and CI is detecting the odd failure in forked gem_mmap_gtt tests
(which smell like the same issue). Fwiw, a wide variety of CPU memory
barriers (around GGTT flushing, fence updates, PTE updates) and GPU
flushes/invalidates (between requests, after PTE updates) were tried as
part of the investigation to find an alternate cause, nothing comes
close to serialised GGTT updates.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105591
Testcase: igt/gem_concurrent_blit
Testcase: igt/gem_mmap_gtt/*forked*
References: 5bab6f60cb4d ("drm/i915: Serialise updates to GGTT with access through GGTT on Braswell")
References: 4509276ee824 ("drm/i915: Remove Braswell GGTT update w/a")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Patch hide | download patch | download mbox

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index dbea14bf67cc..f0d46366fb0b 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -3232,7 +3232,8 @@  static int gen8_gmch_probe(struct i915_ggtt *ggtt)
 	ggtt->vm.insert_entries = gen8_ggtt_insert_entries;
 
 	/* Serialize GTT updates with aperture access on BXT if VT-d is on. */
-	if (intel_ggtt_update_needs_vtd_wa(dev_priv)) {
+	if (intel_ggtt_update_needs_vtd_wa(dev_priv) ||
+	    IS_CHERRYVIEW(dev_priv) /* fails with concurrent use/update */) {
 		ggtt->vm.insert_entries = bxt_vtd_ggtt_insert_entries__BKL;
 		ggtt->vm.insert_page    = bxt_vtd_ggtt_insert_page__BKL;
 		if (ggtt->vm.clear_range != nop_clear_range)

Comments

On 14/01/2019 21:17, Chris Wilson wrote:
> On Braswell, under heavy stress, if we update the GGTT while
> simultaneously accessing another region inside the GTT, we are returned
> the wrong values. To prevent this we stop the machine to update the GGTT
> entries so that no memory traffic can occur at the same time.
> 
> This was first spotted in
> 
> commit 5bab6f60cb4d1417ad7c599166bcfec87529c1a2
> Author: Chris Wilson <chris@chris-wilson.co.uk>
> Date:   Fri Oct 23 18:43:32 2015 +0100
> 
>      drm/i915: Serialise updates to GGTT with access through GGTT on Braswell
> 
> but removed again in forlorn hope with
> 
> commit 4509276ee824bb967885c095c610767e42345c36
> Author: Chris Wilson <chris@chris-wilson.co.uk>
> Date:   Mon Feb 20 12:47:18 2017 +0000
> 
>      drm/i915: Remove Braswell GGTT update w/a
> 
> However, gem_concurrent_blit is once again only stable with the patch
> applied and CI is detecting the odd failure in forked gem_mmap_gtt tests
> (which smell like the same issue). Fwiw, a wide variety of CPU memory
> barriers (around GGTT flushing, fence updates, PTE updates) and GPU
> flushes/invalidates (between requests, after PTE updates) were tried as
> part of the investigation to find an alternate cause, nothing comes
> close to serialised GGTT updates.
> 
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105591
> Testcase: igt/gem_concurrent_blit
> Testcase: igt/gem_mmap_gtt/*forked*
> References: 5bab6f60cb4d ("drm/i915: Serialise updates to GGTT with access through GGTT on Braswell")
> References: 4509276ee824 ("drm/i915: Remove Braswell GGTT update w/a")
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/i915_gem_gtt.c | 3 ++-
>   1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index dbea14bf67cc..f0d46366fb0b 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -3232,7 +3232,8 @@ static int gen8_gmch_probe(struct i915_ggtt *ggtt)
>   	ggtt->vm.insert_entries = gen8_ggtt_insert_entries;
>   
>   	/* Serialize GTT updates with aperture access on BXT if VT-d is on. */
> -	if (intel_ggtt_update_needs_vtd_wa(dev_priv)) {
> +	if (intel_ggtt_update_needs_vtd_wa(dev_priv) ||
> +	    IS_CHERRYVIEW(dev_priv) /* fails with concurrent use/update */) {
>   		ggtt->vm.insert_entries = bxt_vtd_ggtt_insert_entries__BKL;
>   		ggtt->vm.insert_page    = bxt_vtd_ggtt_insert_page__BKL;
>   		if (ggtt->vm.clear_range != nop_clear_range)
> 
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko
Quoting Tvrtko Ursulin (2019-01-15 09:04:47)
> 
> On 14/01/2019 21:17, Chris Wilson wrote:
> > On Braswell, under heavy stress, if we update the GGTT while
> > simultaneously accessing another region inside the GTT, we are returned
> > the wrong values. To prevent this we stop the machine to update the GGTT
> > entries so that no memory traffic can occur at the same time.
> > 
> > This was first spotted in
> > 
> > commit 5bab6f60cb4d1417ad7c599166bcfec87529c1a2
> > Author: Chris Wilson <chris@chris-wilson.co.uk>
> > Date:   Fri Oct 23 18:43:32 2015 +0100
> > 
> >      drm/i915: Serialise updates to GGTT with access through GGTT on Braswell
> > 
> > but removed again in forlorn hope with
> > 
> > commit 4509276ee824bb967885c095c610767e42345c36
> > Author: Chris Wilson <chris@chris-wilson.co.uk>
> > Date:   Mon Feb 20 12:47:18 2017 +0000
> > 
> >      drm/i915: Remove Braswell GGTT update w/a
> > 
> > However, gem_concurrent_blit is once again only stable with the patch
> > applied and CI is detecting the odd failure in forked gem_mmap_gtt tests
> > (which smell like the same issue). Fwiw, a wide variety of CPU memory
> > barriers (around GGTT flushing, fence updates, PTE updates) and GPU
> > flushes/invalidates (between requests, after PTE updates) were tried as
> > part of the investigation to find an alternate cause, nothing comes
> > close to serialised GGTT updates.
> > 
> > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105591
> > Testcase: igt/gem_concurrent_blit
> > Testcase: igt/gem_mmap_gtt/*forked*
> > References: 5bab6f60cb4d ("drm/i915: Serialise updates to GGTT with access through GGTT on Braswell")
> > References: 4509276ee824 ("drm/i915: Remove Braswell GGTT update w/a")
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > ---
> >   drivers/gpu/drm/i915/i915_gem_gtt.c | 3 ++-
> >   1 file changed, 2 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> > index dbea14bf67cc..f0d46366fb0b 100644
> > --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> > +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> > @@ -3232,7 +3232,8 @@ static int gen8_gmch_probe(struct i915_ggtt *ggtt)
> >       ggtt->vm.insert_entries = gen8_ggtt_insert_entries;
> >   
> >       /* Serialize GTT updates with aperture access on BXT if VT-d is on. */
> > -     if (intel_ggtt_update_needs_vtd_wa(dev_priv)) {
> > +     if (intel_ggtt_update_needs_vtd_wa(dev_priv) ||
> > +         IS_CHERRYVIEW(dev_priv) /* fails with concurrent use/update */) {
> >               ggtt->vm.insert_entries = bxt_vtd_ggtt_insert_entries__BKL;
> >               ggtt->vm.insert_page    = bxt_vtd_ggtt_insert_page__BKL;
> >               if (ggtt->vm.clear_range != nop_clear_range)
> > 
> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Thanks, pushed and closed the bugzilla. If I am allowed to be
optimistic, maybe this will be the cause of the fail-on-boot as well?
-Chris