[v3] drm/i915 : Avoid superfluous invalidation of CPU cache lines

Submitted by Akash Goel on Nov. 30, 2015, 7:11 a.m.

Details

Message ID 1448867465-5520-1-git-send-email-akash.goel@intel.com
State New
Headers show
Series "drm/i915 : Avoid superfluous invalidation of CPU cache lines" ( rev: 3 ) in Intel GFX

Not browsing as part of any series.

Commit Message

Akash Goel Nov. 30, 2015, 7:11 a.m.
From: Akash Goel <akash.goel@intel.com>

When the object is moved out of CPU read domain, the cachelines
are not invalidated immediately. The invalidation is deferred till
next time the object is brought back into CPU read domain.
But the invalidation is done unconditionally, i.e. even for the case
where the cachelines were flushed previously, when the object moved out
of CPU write domain. This is avoidable and would lead to some optimization.
Though this is not a hypothetical case, but is unlikely to occur often.
The aim is to detect changes to the backing storage whilst the
data is potentially in the CPU cache, and only clflush in those case.

v2: Made the comment more verbose (Ville/Chris)
    Added doc for 'cache_clean' field (Daniel)

v3: Updated the comment to assuage an apprehension regarding the
    speculative-prefetching behavior of HW (Ville/Chris)

Testcase: igt/gem_concurrent_blit
Testcase: igt/benchmarks/gem_set_domain
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Akash Goel <akash.goel@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h |  9 +++++++++
 drivers/gpu/drm/i915/i915_gem.c | 17 ++++++++++++++++-
 2 files changed, 25 insertions(+), 1 deletion(-)

Patch hide | download patch | download mbox

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 11ae5a5..f97795e 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2100,6 +2100,15 @@  struct drm_i915_gem_object {
 	unsigned int cache_level:3;
 	unsigned int cache_dirty:1;
 
+	/*
+	 * Tracks if the CPU cache has been completely clflushed.
+	 * !cache_clean does not imply cache_dirty (there is some data in the
+	 * CPU cachelines, but has not been dirtied), but cache_clean
+	 * does imply !cache_dirty (no data in cachelines, so not dirty also).
+	 * Actually cache_dirty tracks whether we have been omitting clflushes.
+	 */
+	unsigned int cache_clean:1;
+
 	unsigned int frontbuffer_bits:INTEL_FRONTBUFFER_BITS;
 
 	unsigned int pin_display;
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 33adc8f..7376be8 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3552,6 +3552,7 @@  i915_gem_clflush_object(struct drm_i915_gem_object *obj,
 	trace_i915_gem_object_clflush(obj);
 	drm_clflush_sg(obj->pages);
 	obj->cache_dirty = false;
+	obj->cache_clean = true;
 
 	return true;
 }
@@ -3982,7 +3983,21 @@  i915_gem_object_set_to_cpu_domain(struct drm_i915_gem_object *obj, bool write)
 
 	/* Flush the CPU cache if it's still invalid. */
 	if ((obj->base.read_domains & I915_GEM_DOMAIN_CPU) == 0) {
-		i915_gem_clflush_object(obj, false);
+		/* If an object is moved out of the CPU domain following a
+		 * CPU write and before a GPU or GTT write, we will clflush
+		 * it out of the CPU cache, and mark the cache as clean.
+		 * After clflushing we know that this object cannot be in the
+		 * CPU cache, nor can it be speculatively loaded into the CPU
+		 * cache as our objects are page-aligned (& speculation cannot
+		 * cross page boundaries). Whilst this flag is set, we know
+		 * that any future access to the object's pages will miss the
+		 * stale cache and have to be serviced from main memory, i.e.
+		 * we do not need another clflush to invalidate the CPU cache
+		 * in preparing to read from the object.
+		 */
+		if (!obj->cache_clean)
+			i915_gem_clflush_object(obj, false);
+		obj->cache_clean = false;
 
 		obj->base.read_domains |= I915_GEM_DOMAIN_CPU;
 	}

Comments

On Mon, Nov 30, 2015 at 12:41:05PM +0530, akash.goel@intel.com wrote:
> From: Akash Goel <akash.goel@intel.com>
> 
> When the object is moved out of CPU read domain, the cachelines
> are not invalidated immediately. The invalidation is deferred till
> next time the object is brought back into CPU read domain.
> But the invalidation is done unconditionally, i.e. even for the case
> where the cachelines were flushed previously, when the object moved out
> of CPU write domain. This is avoidable and would lead to some optimization.
> Though this is not a hypothetical case, but is unlikely to occur often.
> The aim is to detect changes to the backing storage whilst the
> data is potentially in the CPU cache, and only clflush in those case.
> 
> v2: Made the comment more verbose (Ville/Chris)
>     Added doc for 'cache_clean' field (Daniel)
> 
> v3: Updated the comment to assuage an apprehension regarding the
>     speculative-prefetching behavior of HW (Ville/Chris)
> 
> Testcase: igt/gem_concurrent_blit
> Testcase: igt/benchmarks/gem_set_domain
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Signed-off-by: Akash Goel <akash.goel@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_drv.h |  9 +++++++++
>  drivers/gpu/drm/i915/i915_gem.c | 17 ++++++++++++++++-
>  2 files changed, 25 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 11ae5a5..f97795e 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2100,6 +2100,15 @@ struct drm_i915_gem_object {
>  	unsigned int cache_level:3;
>  	unsigned int cache_dirty:1;
>  
> +	/*
> +	 * Tracks if the CPU cache has been completely clflushed.
> +	 * !cache_clean does not imply cache_dirty (there is some data in the
> +	 * CPU cachelines, but has not been dirtied), but cache_clean
> +	 * does imply !cache_dirty (no data in cachelines, so not dirty also).
> +	 * Actually cache_dirty tracks whether we have been omitting clflushes.
> +	 */
> +	unsigned int cache_clean:1;

Maybe it should be cache_flushed or something? clean really makes me
think !dirty.

> +
>  	unsigned int frontbuffer_bits:INTEL_FRONTBUFFER_BITS;
>  
>  	unsigned int pin_display;
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 33adc8f..7376be8 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -3552,6 +3552,7 @@ i915_gem_clflush_object(struct drm_i915_gem_object *obj,
>  	trace_i915_gem_object_clflush(obj);
>  	drm_clflush_sg(obj->pages);
>  	obj->cache_dirty = false;
> +	obj->cache_clean = true;
>  
>  	return true;
>  }
> @@ -3982,7 +3983,21 @@ i915_gem_object_set_to_cpu_domain(struct drm_i915_gem_object *obj, bool write)
>  
>  	/* Flush the CPU cache if it's still invalid. */
>  	if ((obj->base.read_domains & I915_GEM_DOMAIN_CPU) == 0) {
> -		i915_gem_clflush_object(obj, false);
> +		/* If an object is moved out of the CPU domain following a
> +		 * CPU write and before a GPU or GTT write, we will clflush
> +		 * it out of the CPU cache, and mark the cache as clean.
> +		 * After clflushing we know that this object cannot be in the
> +		 * CPU cache, nor can it be speculatively loaded into the CPU
> +		 * cache as our objects are page-aligned (& speculation cannot
> +		 * cross page boundaries). Whilst this flag is set, we know
> +		 * that any future access to the object's pages will miss the
> +		 * stale cache and have to be serviced from main memory, i.e.
> +		 * we do not need another clflush to invalidate the CPU cache
> +		 * in preparing to read from the object.
> +		 */
> +		if (!obj->cache_clean)
> +			i915_gem_clflush_object(obj, false);
> +		obj->cache_clean = false;

Having the comment here talk about moving stuff out of the cpu domain
made me think there's a bug here (false vs. true). But actually this
code moves it into the cpu domain so it's actually fine, I wonder if
there's a better place for the comment (eg. where we do set
cache_clean=true)?

>  
>  		obj->base.read_domains |= I915_GEM_DOMAIN_CPU;
>  	}
> -- 
> 1.9.2
On Tue, Dec 01, 2015 at 02:34:41PM +0200, Ville Syrjälä wrote:
> On Mon, Nov 30, 2015 at 12:41:05PM +0530, akash.goel@intel.com wrote:
> > @@ -3982,7 +3983,21 @@ i915_gem_object_set_to_cpu_domain(struct drm_i915_gem_object *obj, bool write)
> >  
> >  	/* Flush the CPU cache if it's still invalid. */
> >  	if ((obj->base.read_domains & I915_GEM_DOMAIN_CPU) == 0) {
> > -		i915_gem_clflush_object(obj, false);
> > +		/* If an object is moved out of the CPU domain following a
> > +		 * CPU write and before a GPU or GTT write, we will clflush
> > +		 * it out of the CPU cache, and mark the cache as clean.
> > +		 * After clflushing we know that this object cannot be in the
> > +		 * CPU cache, nor can it be speculatively loaded into the CPU
> > +		 * cache as our objects are page-aligned (& speculation cannot
> > +		 * cross page boundaries). Whilst this flag is set, we know
> > +		 * that any future access to the object's pages will miss the
> > +		 * stale cache and have to be serviced from main memory, i.e.
> > +		 * we do not need another clflush to invalidate the CPU cache
> > +		 * in preparing to read from the object.
> > +		 */
> > +		if (!obj->cache_clean)
> > +			i915_gem_clflush_object(obj, false);
> > +		obj->cache_clean = false;
> 
> Having the comment here talk about moving stuff out of the cpu domain
> made me think there's a bug here (false vs. true). But actually this
> code moves it into the cpu domain so it's actually fine, I wonder if
> there's a better place for the comment (eg. where we do set
> cache_clean=true)?

I thought it made more sense here because this is where we playing the
trick to avoid the clflush.

Hmm, would s/If an object/When the object/ and
s/cache_clean/cache_flushed/ suffice?
-Chris
On Tue, Dec 01, 2015 at 01:09:33PM +0000, Chris Wilson wrote:
> On Tue, Dec 01, 2015 at 02:34:41PM +0200, Ville Syrjälä wrote:
> > On Mon, Nov 30, 2015 at 12:41:05PM +0530, akash.goel@intel.com wrote:
> > > @@ -3982,7 +3983,21 @@ i915_gem_object_set_to_cpu_domain(struct drm_i915_gem_object *obj, bool write)
> > >  
> > >  	/* Flush the CPU cache if it's still invalid. */
> > >  	if ((obj->base.read_domains & I915_GEM_DOMAIN_CPU) == 0) {
> > > -		i915_gem_clflush_object(obj, false);
> > > +		/* If an object is moved out of the CPU domain following a
> > > +		 * CPU write and before a GPU or GTT write, we will clflush
> > > +		 * it out of the CPU cache, and mark the cache as clean.
> > > +		 * After clflushing we know that this object cannot be in the
> > > +		 * CPU cache, nor can it be speculatively loaded into the CPU
> > > +		 * cache as our objects are page-aligned (& speculation cannot
> > > +		 * cross page boundaries). Whilst this flag is set, we know
> > > +		 * that any future access to the object's pages will miss the
> > > +		 * stale cache and have to be serviced from main memory, i.e.
> > > +		 * we do not need another clflush to invalidate the CPU cache
> > > +		 * in preparing to read from the object.
> > > +		 */
> > > +		if (!obj->cache_clean)
> > > +			i915_gem_clflush_object(obj, false);
> > > +		obj->cache_clean = false;
> > 
> > Having the comment here talk about moving stuff out of the cpu domain
> > made me think there's a bug here (false vs. true). But actually this
> > code moves it into the cpu domain so it's actually fine, I wonder if
> > there's a better place for the comment (eg. where we do set
> > cache_clean=true)?
> 
> I thought it made more sense here because this is where we playing the
> trick to avoid the clflush.
> 
> Hmm, would s/If an object/When the object/ and
> s/cache_clean/cache_flushed/ suffice?

Maybe 'When the object is eventually moved out...' ?

That extra word might convey more clearly that's it's not talking
about moving it out right now.
On Tue, Dec 01, 2015 at 03:28:28PM +0200, Ville Syrjälä wrote:
> On Tue, Dec 01, 2015 at 01:09:33PM +0000, Chris Wilson wrote:
> > On Tue, Dec 01, 2015 at 02:34:41PM +0200, Ville Syrjälä wrote:
> > > On Mon, Nov 30, 2015 at 12:41:05PM +0530, akash.goel@intel.com wrote:
> > > > @@ -3982,7 +3983,21 @@ i915_gem_object_set_to_cpu_domain(struct drm_i915_gem_object *obj, bool write)
> > > >  
> > > >  	/* Flush the CPU cache if it's still invalid. */
> > > >  	if ((obj->base.read_domains & I915_GEM_DOMAIN_CPU) == 0) {
> > > > -		i915_gem_clflush_object(obj, false);
> > > > +		/* If an object is moved out of the CPU domain following a
> > > > +		 * CPU write and before a GPU or GTT write, we will clflush
> > > > +		 * it out of the CPU cache, and mark the cache as clean.
> > > > +		 * After clflushing we know that this object cannot be in the
> > > > +		 * CPU cache, nor can it be speculatively loaded into the CPU
> > > > +		 * cache as our objects are page-aligned (& speculation cannot
> > > > +		 * cross page boundaries). Whilst this flag is set, we know
> > > > +		 * that any future access to the object's pages will miss the
> > > > +		 * stale cache and have to be serviced from main memory, i.e.
> > > > +		 * we do not need another clflush to invalidate the CPU cache
> > > > +		 * in preparing to read from the object.
> > > > +		 */
> > > > +		if (!obj->cache_clean)
> > > > +			i915_gem_clflush_object(obj, false);
> > > > +		obj->cache_clean = false;
> > > 
> > > Having the comment here talk about moving stuff out of the cpu domain
> > > made me think there's a bug here (false vs. true). But actually this
> > > code moves it into the cpu domain so it's actually fine, I wonder if
> > > there's a better place for the comment (eg. where we do set
> > > cache_clean=true)?
> > 
> > I thought it made more sense here because this is where we playing the
> > trick to avoid the clflush.
> > 
> > Hmm, would s/If an object/When the object/ and
> > s/cache_clean/cache_flushed/ suffice?
> 
> Maybe 'When the object is eventually moved out...' ?
> 
> That extra word might convey more clearly that's it's not talking
> about moving it out right now.

Hmm, the change of tense is good. Let's try that again:

When the object was moved out the CPU domain following a CPU write, we
will have flushed it out of the CPU cache (and marked the object as
cache_flushed). After the clflush, we know that this object cannot be in
the CPU cache, nor can it be speculatively loaded into the CPU cache as
our objects are page-aligned and speculation cannot cross page
boundaries. So whilst the cache_flushed flag is set, we know that any
future access to the object's pages will miss the GPU cache and have to
be serviced from main memory (where they will pick up any writes
through the GTT or by the GPU) i.e. we do not need another clflush here
and now to invalidate the CPU cache as we prepare to read from the object.
-Chris
On Tue, Dec 01, 2015 at 01:49:10PM +0000, Chris Wilson wrote:
> On Tue, Dec 01, 2015 at 03:28:28PM +0200, Ville Syrjälä wrote:
> > On Tue, Dec 01, 2015 at 01:09:33PM +0000, Chris Wilson wrote:
> > > On Tue, Dec 01, 2015 at 02:34:41PM +0200, Ville Syrjälä wrote:
> > > > On Mon, Nov 30, 2015 at 12:41:05PM +0530, akash.goel@intel.com wrote:
> > > > > @@ -3982,7 +3983,21 @@ i915_gem_object_set_to_cpu_domain(struct drm_i915_gem_object *obj, bool write)
> > > > >  
> > > > >  	/* Flush the CPU cache if it's still invalid. */
> > > > >  	if ((obj->base.read_domains & I915_GEM_DOMAIN_CPU) == 0) {
> > > > > -		i915_gem_clflush_object(obj, false);
> > > > > +		/* If an object is moved out of the CPU domain following a
> > > > > +		 * CPU write and before a GPU or GTT write, we will clflush
> > > > > +		 * it out of the CPU cache, and mark the cache as clean.
> > > > > +		 * After clflushing we know that this object cannot be in the
> > > > > +		 * CPU cache, nor can it be speculatively loaded into the CPU
> > > > > +		 * cache as our objects are page-aligned (& speculation cannot
> > > > > +		 * cross page boundaries). Whilst this flag is set, we know
> > > > > +		 * that any future access to the object's pages will miss the
> > > > > +		 * stale cache and have to be serviced from main memory, i.e.
> > > > > +		 * we do not need another clflush to invalidate the CPU cache
> > > > > +		 * in preparing to read from the object.
> > > > > +		 */
> > > > > +		if (!obj->cache_clean)
> > > > > +			i915_gem_clflush_object(obj, false);
> > > > > +		obj->cache_clean = false;
> > > > 
> > > > Having the comment here talk about moving stuff out of the cpu domain
> > > > made me think there's a bug here (false vs. true). But actually this
> > > > code moves it into the cpu domain so it's actually fine, I wonder if
> > > > there's a better place for the comment (eg. where we do set
> > > > cache_clean=true)?
> > > 
> > > I thought it made more sense here because this is where we playing the
> > > trick to avoid the clflush.
> > > 
> > > Hmm, would s/If an object/When the object/ and
> > > s/cache_clean/cache_flushed/ suffice?
> > 
> > Maybe 'When the object is eventually moved out...' ?
> > 
> > That extra word might convey more clearly that's it's not talking
> > about moving it out right now.
> 
> Hmm, the change of tense is good. Let's try that again:
> 
> When the object was moved out the CPU domain following a CPU write, we
> will have flushed it out of the CPU cache (and marked the object as
> cache_flushed). After the clflush, we know that this object cannot be in
> the CPU cache, nor can it be speculatively loaded into the CPU cache as
> our objects are page-aligned and speculation cannot cross page
> boundaries. So whilst the cache_flushed flag is set, we know that any
> future access to the object's pages will miss the GPU cache and have to
> be serviced from main memory (where they will pick up any writes
> through the GTT or by the GPU) i.e. we do not need another clflush here
> and now to invalidate the CPU cache as we prepare to read from the object.

Hmm, yeah referring to past events clearly now. That does make more
sense that referring to future events. lgtm
On 12/1/2015 7:30 PM, Ville Syrjälä wrote:
> On Tue, Dec 01, 2015 at 01:49:10PM +0000, Chris Wilson wrote:
>> On Tue, Dec 01, 2015 at 03:28:28PM +0200, Ville Syrjälä wrote:
>>> On Tue, Dec 01, 2015 at 01:09:33PM +0000, Chris Wilson wrote:
>>>> On Tue, Dec 01, 2015 at 02:34:41PM +0200, Ville Syrjälä wrote:
>>>>> On Mon, Nov 30, 2015 at 12:41:05PM +0530, akash.goel@intel.com wrote:
>>>>>> @@ -3982,7 +3983,21 @@ i915_gem_object_set_to_cpu_domain(struct drm_i915_gem_object *obj, bool write)
>>>>>>
>>>>>>   	/* Flush the CPU cache if it's still invalid. */
>>>>>>   	if ((obj->base.read_domains & I915_GEM_DOMAIN_CPU) == 0) {
>>>>>> -		i915_gem_clflush_object(obj, false);
>>>>>> +		/* If an object is moved out of the CPU domain following a
>>>>>> +		 * CPU write and before a GPU or GTT write, we will clflush
>>>>>> +		 * it out of the CPU cache, and mark the cache as clean.
>>>>>> +		 * After clflushing we know that this object cannot be in the
>>>>>> +		 * CPU cache, nor can it be speculatively loaded into the CPU
>>>>>> +		 * cache as our objects are page-aligned (& speculation cannot
>>>>>> +		 * cross page boundaries). Whilst this flag is set, we know
>>>>>> +		 * that any future access to the object's pages will miss the
>>>>>> +		 * stale cache and have to be serviced from main memory, i.e.
>>>>>> +		 * we do not need another clflush to invalidate the CPU cache
>>>>>> +		 * in preparing to read from the object.
>>>>>> +		 */
>>>>>> +		if (!obj->cache_clean)
>>>>>> +			i915_gem_clflush_object(obj, false);
>>>>>> +		obj->cache_clean = false;
>>>>>
>>>>> Having the comment here talk about moving stuff out of the cpu domain
>>>>> made me think there's a bug here (false vs. true). But actually this
>>>>> code moves it into the cpu domain so it's actually fine, I wonder if
>>>>> there's a better place for the comment (eg. where we do set
>>>>> cache_clean=true)?
>>>>
>>>> I thought it made more sense here because this is where we playing the
>>>> trick to avoid the clflush.
>>>>
>>>> Hmm, would s/If an object/When the object/ and
>>>> s/cache_clean/cache_flushed/ suffice?
>>>
>>> Maybe 'When the object is eventually moved out...' ?
>>>
>>> That extra word might convey more clearly that's it's not talking
>>> about moving it out right now.
>>
>> Hmm, the change of tense is good. Let's try that again:
>>
>> When the object was moved out the CPU domain following a CPU write, we
>> will have flushed it out of the CPU cache (and marked the object as
>> cache_flushed). After the clflush, we know that this object cannot be in
>> the CPU cache, nor can it be speculatively loaded into the CPU cache as
>> our objects are page-aligned and speculation cannot cross page
>> boundaries. So whilst the cache_flushed flag is set, we know that any
>> future access to the object's pages will miss the GPU cache and have to
>> be serviced from main memory (where they will pick up any writes
>> through the GTT or by the GPU) i.e. we do not need another clflush here
>> and now to invalidate the CPU cache as we prepare to read from the object.
>
> Hmm, yeah referring to past events clearly now. That does make more
> sense that referring to future events. lgtm

Thanks, will update the 'comments' in the next version.

And will also rename 'cache_clean' flag to 'cache_flushed'.

Best Regards
Akash


>