[v2] drm/i915 : Avoid superfluous invalidation of CPU cache lines

Submitted by Akash Goel on Nov. 25, 2015, 5:29 a.m.

Details

Message ID 1448429366-15316-1-git-send-email-akash.goel@intel.com
State New
Series "drm/i915 : Avoid superfluous invalidation of CPU cache lines" ( rev: 2 ) in Intel GFX

Commit Message

From: Akash Goel <akash.goel@intel.com>

When an object is moved out of the CPU read domain, its cachelines
are not invalidated immediately. The invalidation is deferred until
the next time the object is brought back into the CPU read domain.
However, the invalidation is done unconditionally, i.e. even when
the cachelines were already flushed as the object moved out of the
CPU write domain. That redundant clflush is avoidable, and skipping it
saves some work. This is not a hypothetical case, although it is
unlikely to occur often. The aim is to detect changes to the backing
storage whilst the data is potentially in the CPU cache, and to clflush
only in those cases.
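
The flag handling described above can be illustrated with a small userspace
model. This is only a sketch, not driver code: struct model_obj, the model_*
helpers and the 'patched' switch are hypothetical names invented for the
illustration, and the flush is reduced to a counter.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for drm_i915_gem_object (illustration only). */
struct model_obj {
	bool in_cpu_read_domain;
	bool cache_dirty;
	bool cache_clean;
	unsigned int clflush_count;	/* number of modelled clflushes */
};

/* Models the bookkeeping done at the end of i915_gem_clflush_object(). */
static void model_clflush(struct model_obj *obj)
{
	obj->clflush_count++;
	obj->cache_dirty = false;
	obj->cache_clean = true;
	/* cache_clean must imply !cache_dirty. */
	assert(!(obj->cache_clean && obj->cache_dirty));
}

/* Models leaving the CPU domain, optionally flushing on the way out. */
static void model_leave_cpu_domain(struct model_obj *obj, bool flush_on_exit)
{
	if (flush_on_exit)
		model_clflush(obj);
	obj->in_cpu_read_domain = false;
}

/*
 * Models the branch in i915_gem_object_set_to_cpu_domain().
 * patched == false: always clflush (old behaviour).
 * patched == true:  skip the clflush when the cache is known clean.
 */
static void model_set_to_cpu_domain(struct model_obj *obj, bool patched)
{
	if (!obj->in_cpu_read_domain) {
		if (!patched || !obj->cache_clean)
			model_clflush(obj);
		/* The CPU may now pull lines in, so the cache is no longer clean. */
		obj->cache_clean = false;
		obj->in_cpu_read_domain = true;
	}
}

/* Counts the clflushes for one leave/re-enter round trip. */
static unsigned int model_round_trip_flushes(bool flush_on_exit, bool patched)
{
	struct model_obj obj = { .in_cpu_read_domain = true };

	model_leave_cpu_domain(&obj, flush_on_exit);
	model_set_to_cpu_domain(&obj, patched);
	return obj.clflush_count;
}
```

In this model, a round trip that flushed on the way out costs two clflushes
with the old logic and one with the new logic. A round trip that did not
flush on the way out still costs one clflush either way.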

v2: Made the comment more verbose (Ville/Chris)
    Added doc for 'cache_clean' field (Daniel)

Testcase: igt/gem_concurrent_blit
Testcase: igt/benchmarks/gem_set_domain
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Akash Goel <akash.goel@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h |  9 +++++++++
 drivers/gpu/drm/i915/i915_gem.c | 14 +++++++++++++-
 2 files changed, 22 insertions(+), 1 deletion(-)

Patch

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index df9316f..b63d3ab 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2099,6 +2099,15 @@  struct drm_i915_gem_object {
 	unsigned int cache_level:3;
 	unsigned int cache_dirty:1;
 
+	/*
+	 * Tracks whether the CPU cache has been completely clflushed.
+	 * !cache_clean does not imply cache_dirty (there may be data in the
+	 * CPU cachelines that has not been dirtied), but cache_clean does
+	 * imply !cache_dirty (no data in the cachelines, so nothing dirty).
+	 * cache_dirty itself tracks whether we have been omitting clflushes.
+	 */
+	unsigned int cache_clean:1;
+
 	unsigned int frontbuffer_bits:INTEL_FRONTBUFFER_BITS;
 
 	unsigned int pin_display;
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 19c282b..2b6a38d 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3552,6 +3552,7 @@  i915_gem_clflush_object(struct drm_i915_gem_object *obj,
 	trace_i915_gem_object_clflush(obj);
 	drm_clflush_sg(obj->pages);
 	obj->cache_dirty = false;
+	obj->cache_clean = true;
 
 	return true;
 }
@@ -3982,7 +3983,18 @@  i915_gem_object_set_to_cpu_domain(struct drm_i915_gem_object *obj, bool write)
 
 	/* Flush the CPU cache if it's still invalid. */
 	if ((obj->base.read_domains & I915_GEM_DOMAIN_CPU) == 0) {
-		i915_gem_clflush_object(obj, false);
+		/* If an object is moved out of the CPU domain following a
+		 * CPU write and before a GPU or GTT write, we will clflush
+		 * it out of the CPU cache, and mark the cache as clean.
+		 * As the object has not been accessed on the CPU since
+		 * (i.e. the CPU cache is still clean and it is out of the CPU
+		 * domain), we know that no CPU cache line contains stale data
+		 * and so we can skip invalidating the CPU cache in preparing
+		 * to read from the object.
+		 */
+		if (!obj->cache_clean)
+			i915_gem_clflush_object(obj, false);
+		obj->cache_clean = false;
 
 		obj->base.read_domains |= I915_GEM_DOMAIN_CPU;
 	}