[Mesa-dev,4/4] i965: Fix remaining flush vs invalidate race conditions in brw_emit_pipe_control_flush.

Submitted by Francisco Jerez on July 1, 2016, 5:07 a.m.

Details

Message ID 20160701050744.26228-4-currojerez@riseup.net
State New
Headers show
Series "i965: adds gen7_emit_cs_stall_flush on intel_texture_barrier" ( rev: 7 ) in Mesa

Not browsing as part of any series.

Commit Message

Francisco Jerez July 1, 2016, 5:07 a.m.
This hardware race condition has caused problems several times already
(see "i965: Fix cache pollution race during L3 partitioning set-up.",
"i965: Fix brw_render_cache_set_check_flush's PIPE_CONTROLs." and
"i965: intel_texture_barrier reimplemented").  The problem is that
whenever we attempt to both flush and invalidate multiple caches with
a single pipe control command the flush and invalidation happen in
reverse order, so the contents flushed from the R/W caches aren't
guaranteed to become visible from the invalidated caches after the
PIPE_CONTROL command completes execution if some concurrent rendering
workload happened to pollute any of the invalidated R/O caches in the
short window of time between the invalidation and flush.

This makes sure that brw_emit_pipe_control_flush() has the effect
expected by most callers of making the contents flushed from any R/W
caches visible from the invalidated R/O caches.
---
 src/mesa/drivers/dri/i965/brw_pipe_control.c | 18 ++++++++++++++++++
 src/mesa/drivers/dri/i965/intel_reg.h        |  9 +++++++++
 2 files changed, 27 insertions(+)

Patch hide | download patch | download mbox

diff --git a/src/mesa/drivers/dri/i965/brw_pipe_control.c b/src/mesa/drivers/dri/i965/brw_pipe_control.c
index 14a8f7c..05e8c05 100644
--- a/src/mesa/drivers/dri/i965/brw_pipe_control.c
+++ b/src/mesa/drivers/dri/i965/brw_pipe_control.c
@@ -96,6 +96,24 @@  gen7_cs_stall_every_four_pipe_controls(struct brw_context *brw, uint32_t flags)
 void
 brw_emit_pipe_control_flush(struct brw_context *brw, uint32_t flags)
 {
+   if (brw->gen >= 6 &&
+       (flags & PIPE_CONTROL_CACHE_FLUSH_BITS) &&
+       (flags & PIPE_CONTROL_CACHE_INVALIDATE_BITS)) {
+      /* A pipe control command with flush and invalidate bits set
+       * simultaneously is an inherently racy operation on Gen6+ if the
+       * contents of the flushed caches were intended to become visible from
+       * any of the invalidated caches.  Split it in two PIPE_CONTROLs, the
+       * first one should stall the pipeline to make sure that the flushed R/W
+       * caches are coherent with memory once the specified R/O caches are
+       * invalidated.  On pre-Gen6 hardware the (implicit) R/O cache
+       * invalidation seems to happen at the bottom of the pipeline together
+       * with any write cache flush, so this shouldn't be a concern.
+       */
+      brw_emit_pipe_control_flush(brw, (flags & PIPE_CONTROL_CACHE_FLUSH_BITS) |
+                                       PIPE_CONTROL_CS_STALL);
+      flags &= ~(PIPE_CONTROL_CACHE_FLUSH_BITS | PIPE_CONTROL_CS_STALL);
+   }
+
    if (brw->gen >= 8) {
       if (brw->gen == 8)
          gen8_add_cs_stall_workaround_bits(&flags);
diff --git a/src/mesa/drivers/dri/i965/intel_reg.h b/src/mesa/drivers/dri/i965/intel_reg.h
index 95365fe..7a82be4 100644
--- a/src/mesa/drivers/dri/i965/intel_reg.h
+++ b/src/mesa/drivers/dri/i965/intel_reg.h
@@ -134,6 +134,15 @@ 
 #define PIPE_CONTROL_PPGTT_WRITE	(0 << 2)
 #define PIPE_CONTROL_GLOBAL_GTT_WRITE	(1 << 2)
 
+#define PIPE_CONTROL_CACHE_FLUSH_BITS \
+   (PIPE_CONTROL_DEPTH_CACHE_FLUSH | PIPE_CONTROL_DATA_CACHE_FLUSH | \
+    PIPE_CONTROL_RENDER_TARGET_FLUSH)
+
+#define PIPE_CONTROL_CACHE_INVALIDATE_BITS \
+   (PIPE_CONTROL_STATE_CACHE_INVALIDATE | PIPE_CONTROL_CONST_CACHE_INVALIDATE | \
+    PIPE_CONTROL_VF_CACHE_INVALIDATE | PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE | \
+    PIPE_CONTROL_INSTRUCTION_INVALIDATE)
+
 /** @} */
 
 #define XY_SETUP_BLT_CMD		(CMD_2D | (0x01 << 22))

Comments

3 and 4 are

Cc: "12.0 11.1 11.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

I did look over 3 fairly carefully.

It's worth noting that I think we have some double-pipe-controls that could
probably be put together now.  Not sure that we actually want to do that
though.
--Jason

On Thu, Jun 30, 2016 at 10:07 PM, Francisco Jerez <currojerez@riseup.net>
wrote:

> This hardware race condition has caused problems several times already
> (see "i965: Fix cache pollution race during L3 partitioning set-up.",
> "i965: Fix brw_render_cache_set_check_flush's PIPE_CONTROLs." and
> "i965: intel_texture_barrier reimplemented").  The problem is that
> whenever we attempt to both flush and invalidate multiple caches with
> a single pipe control command the flush and invalidation happen in
> reverse order, so the contents flushed from the R/W caches aren't
> guaranteed to become visible from the invalidated caches after the
> PIPE_CONTROL command completes execution if some concurrent rendering
> workload happened to pollute any of the invalidated R/O caches in the
> short window of time between the invalidation and flush.
>
> This makes sure that brw_emit_pipe_control_flush() has the effect
> expected by most callers of making the contents flushed from any R/W
> caches visible from the invalidated R/O caches.
> ---
>  src/mesa/drivers/dri/i965/brw_pipe_control.c | 18 ++++++++++++++++++
>  src/mesa/drivers/dri/i965/intel_reg.h        |  9 +++++++++
>  2 files changed, 27 insertions(+)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_pipe_control.c
> b/src/mesa/drivers/dri/i965/brw_pipe_control.c
> index 14a8f7c..05e8c05 100644
> --- a/src/mesa/drivers/dri/i965/brw_pipe_control.c
> +++ b/src/mesa/drivers/dri/i965/brw_pipe_control.c
> @@ -96,6 +96,24 @@ gen7_cs_stall_every_four_pipe_controls(struct
> brw_context *brw, uint32_t flags)
>  void
>  brw_emit_pipe_control_flush(struct brw_context *brw, uint32_t flags)
>  {
> +   if (brw->gen >= 6 &&
> +       (flags & PIPE_CONTROL_CACHE_FLUSH_BITS) &&
> +       (flags & PIPE_CONTROL_CACHE_INVALIDATE_BITS)) {
> +      /* A pipe control command with flush and invalidate bits set
> +       * simultaneously is an inherently racy operation on Gen6+ if the
> +       * contents of the flushed caches were intended to become visible
> from
> +       * any of the invalidated caches.  Split it in two PIPE_CONTROLs,
> the
> +       * first one should stall the pipeline to make sure that the
> flushed R/W
> +       * caches are coherent with memory once the specified R/O caches are
> +       * invalidated.  On pre-Gen6 hardware the (implicit) R/O cache
> +       * invalidation seems to happen at the bottom of the pipeline
> together
> +       * with any write cache flush, so this shouldn't be a concern.
> +       */
> +      brw_emit_pipe_control_flush(brw, (flags &
> PIPE_CONTROL_CACHE_FLUSH_BITS) |
> +                                       PIPE_CONTROL_CS_STALL);
> +      flags &= ~(PIPE_CONTROL_CACHE_FLUSH_BITS | PIPE_CONTROL_CS_STALL);
> +   }
> +
>     if (brw->gen >= 8) {
>        if (brw->gen == 8)
>           gen8_add_cs_stall_workaround_bits(&flags);
> diff --git a/src/mesa/drivers/dri/i965/intel_reg.h
> b/src/mesa/drivers/dri/i965/intel_reg.h
> index 95365fe..7a82be4 100644
> --- a/src/mesa/drivers/dri/i965/intel_reg.h
> +++ b/src/mesa/drivers/dri/i965/intel_reg.h
> @@ -134,6 +134,15 @@
>  #define PIPE_CONTROL_PPGTT_WRITE       (0 << 2)
>  #define PIPE_CONTROL_GLOBAL_GTT_WRITE  (1 << 2)
>
> +#define PIPE_CONTROL_CACHE_FLUSH_BITS \
> +   (PIPE_CONTROL_DEPTH_CACHE_FLUSH | PIPE_CONTROL_DATA_CACHE_FLUSH | \
> +    PIPE_CONTROL_RENDER_TARGET_FLUSH)
> +
> +#define PIPE_CONTROL_CACHE_INVALIDATE_BITS \
> +   (PIPE_CONTROL_STATE_CACHE_INVALIDATE |
> PIPE_CONTROL_CONST_CACHE_INVALIDATE | \
> +    PIPE_CONTROL_VF_CACHE_INVALIDATE |
> PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE | \
> +    PIPE_CONTROL_INSTRUCTION_INVALIDATE)
> +
>  /** @} */
>
>  #define XY_SETUP_BLT_CMD               (CMD_2D | (0x01 << 22))
> --
> 2.9.0
>
>
Jason Ekstrand <jason@jlekstrand.net> writes:

> 3 and 4 are
>
> Cc: "12.0 11.1 11.2" <mesa-stable@lists.freedesktop.org>

Hmm, I'll cc PATCH 2 to mesa-stable too since technically it also fixes
a bug.  PATCH 1 shouldn't make much of a difference though.

> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
>
> I did look over 3 fairly carefully.
>
Thanks.

> It's worth noting that I think we have some double-pipe-controls that could
> probably be put together now.  Not sure that we actually want to do that
> though.

I don't think it matters much, usually it's nearly the same amount of
lines of code to do it as one call to brw_emit_pipe_control_flush() or
as two, both approaches are functionally equivalent, and doing it
explicitly as two brw_emit_pipe_control_flush() calls makes it clear to
the reader that the read cache invalidations are supposed to happen
after the write caches are flushed.  In some places though it's really
convenient to be able to put all cache bits for which coherency is
required into a single bitfield and have brw_emit_pipe_control_flush()
figure out whether the command needs to be split (e.g.
brw_memory_barrier()).

> --Jason
>
> On Thu, Jun 30, 2016 at 10:07 PM, Francisco Jerez <currojerez@riseup.net>
> wrote:
>
>> This hardware race condition has caused problems several times already
>> (see "i965: Fix cache pollution race during L3 partitioning set-up.",
>> "i965: Fix brw_render_cache_set_check_flush's PIPE_CONTROLs." and
>> "i965: intel_texture_barrier reimplemented").  The problem is that
>> whenever we attempt to both flush and invalidate multiple caches with
>> a single pipe control command the flush and invalidation happen in
>> reverse order, so the contents flushed from the R/W caches aren't
>> guaranteed to become visible from the invalidated caches after the
>> PIPE_CONTROL command completes execution if some concurrent rendering
>> workload happened to pollute any of the invalidated R/O caches in the
>> short window of time between the invalidation and flush.
>>
>> This makes sure that brw_emit_pipe_control_flush() has the effect
>> expected by most callers of making the contents flushed from any R/W
>> caches visible from the invalidated R/O caches.
>> ---
>>  src/mesa/drivers/dri/i965/brw_pipe_control.c | 18 ++++++++++++++++++
>>  src/mesa/drivers/dri/i965/intel_reg.h        |  9 +++++++++
>>  2 files changed, 27 insertions(+)
>>
>> diff --git a/src/mesa/drivers/dri/i965/brw_pipe_control.c
>> b/src/mesa/drivers/dri/i965/brw_pipe_control.c
>> index 14a8f7c..05e8c05 100644
>> --- a/src/mesa/drivers/dri/i965/brw_pipe_control.c
>> +++ b/src/mesa/drivers/dri/i965/brw_pipe_control.c
>> @@ -96,6 +96,24 @@ gen7_cs_stall_every_four_pipe_controls(struct
>> brw_context *brw, uint32_t flags)
>>  void
>>  brw_emit_pipe_control_flush(struct brw_context *brw, uint32_t flags)
>>  {
>> +   if (brw->gen >= 6 &&
>> +       (flags & PIPE_CONTROL_CACHE_FLUSH_BITS) &&
>> +       (flags & PIPE_CONTROL_CACHE_INVALIDATE_BITS)) {
>> +      /* A pipe control command with flush and invalidate bits set
>> +       * simultaneously is an inherently racy operation on Gen6+ if the
>> +       * contents of the flushed caches were intended to become visible
>> from
>> +       * any of the invalidated caches.  Split it in two PIPE_CONTROLs,
>> the
>> +       * first one should stall the pipeline to make sure that the
>> flushed R/W
>> +       * caches are coherent with memory once the specified R/O caches are
>> +       * invalidated.  On pre-Gen6 hardware the (implicit) R/O cache
>> +       * invalidation seems to happen at the bottom of the pipeline
>> together
>> +       * with any write cache flush, so this shouldn't be a concern.
>> +       */
>> +      brw_emit_pipe_control_flush(brw, (flags &
>> PIPE_CONTROL_CACHE_FLUSH_BITS) |
>> +                                       PIPE_CONTROL_CS_STALL);
>> +      flags &= ~(PIPE_CONTROL_CACHE_FLUSH_BITS | PIPE_CONTROL_CS_STALL);
>> +   }
>> +
>>     if (brw->gen >= 8) {
>>        if (brw->gen == 8)
>>           gen8_add_cs_stall_workaround_bits(&flags);
>> diff --git a/src/mesa/drivers/dri/i965/intel_reg.h
>> b/src/mesa/drivers/dri/i965/intel_reg.h
>> index 95365fe..7a82be4 100644
>> --- a/src/mesa/drivers/dri/i965/intel_reg.h
>> +++ b/src/mesa/drivers/dri/i965/intel_reg.h
>> @@ -134,6 +134,15 @@
>>  #define PIPE_CONTROL_PPGTT_WRITE       (0 << 2)
>>  #define PIPE_CONTROL_GLOBAL_GTT_WRITE  (1 << 2)
>>
>> +#define PIPE_CONTROL_CACHE_FLUSH_BITS \
>> +   (PIPE_CONTROL_DEPTH_CACHE_FLUSH | PIPE_CONTROL_DATA_CACHE_FLUSH | \
>> +    PIPE_CONTROL_RENDER_TARGET_FLUSH)
>> +
>> +#define PIPE_CONTROL_CACHE_INVALIDATE_BITS \
>> +   (PIPE_CONTROL_STATE_CACHE_INVALIDATE |
>> PIPE_CONTROL_CONST_CACHE_INVALIDATE | \
>> +    PIPE_CONTROL_VF_CACHE_INVALIDATE |
>> PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE | \
>> +    PIPE_CONTROL_INSTRUCTION_INVALIDATE)
>> +
>>  /** @} */
>>
>>  #define XY_SETUP_BLT_CMD               (CMD_2D | (0x01 << 22))
>> --
>> 2.9.0
>>
>>
On Fri, Jul 1, 2016 at 5:43 PM, Francisco Jerez <currojerez@riseup.net>
wrote:

> Jason Ekstrand <jason@jlekstrand.net> writes:
>
> > 3 and 4 are
> >
> > Cc: "12.0 11.1 11.2" <mesa-stable@lists.freedesktop.org>
>
> Hmm, I'll cc PATCH 2 to mesa-stable too since technically it also fixes
> a bug.  PATCH 1 shouldn't make much of a difference though.
>
> > Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
> >
> > I did look over 3 fairly carefully.
> >
> Thanks.
>
> > It's worth noting that I think we have some double-pipe-controls that
> could
> > probably be put together now.  Not sure that we actually want to do that
> > though.
>
> I don't think it matters much, usually it's nearly the same amount of
> lines of code to do it as one call to brw_emit_pipe_control_flush() or
> as two, both approaches are functionally equivalent, and doing it
> explicitly as two brw_emit_pipe_control_flush() calls makes it clear to
> the reader that the read cache invalidations are supposed to happen
> after the write caches are flushed.  In some places though it's really
> convenient to be able to put all cache bits for which coherency is
> required into a single bitfield and have brw_emit_pipe_control_flush()
> figure out whether the command needs to be split (e.g.
> brw_memory_barrier()).
>

That sounds reasonable.  I wasn't even really sure about it myself but
thought it would be worth a few words of discussion.  Sounds like we're on
the same page.


> > --Jason
> >
> > On Thu, Jun 30, 2016 at 10:07 PM, Francisco Jerez <currojerez@riseup.net
> >
> > wrote:
> >
> >> This hardware race condition has caused problems several times already
> >> (see "i965: Fix cache pollution race during L3 partitioning set-up.",
> >> "i965: Fix brw_render_cache_set_check_flush's PIPE_CONTROLs." and
> >> "i965: intel_texture_barrier reimplemented").  The problem is that
> >> whenever we attempt to both flush and invalidate multiple caches with
> >> a single pipe control command the flush and invalidation happen in
> >> reverse order, so the contents flushed from the R/W caches aren't
> >> guaranteed to become visible from the invalidated caches after the
> >> PIPE_CONTROL command completes execution if some concurrent rendering
> >> workload happened to pollute any of the invalidated R/O caches in the
> >> short window of time between the invalidation and flush.
> >>
> >> This makes sure that brw_emit_pipe_control_flush() has the effect
> >> expected by most callers of making the contents flushed from any R/W
> >> caches visible from the invalidated R/O caches.
> >> ---
> >>  src/mesa/drivers/dri/i965/brw_pipe_control.c | 18 ++++++++++++++++++
> >>  src/mesa/drivers/dri/i965/intel_reg.h        |  9 +++++++++
> >>  2 files changed, 27 insertions(+)
> >>
> >> diff --git a/src/mesa/drivers/dri/i965/brw_pipe_control.c
> >> b/src/mesa/drivers/dri/i965/brw_pipe_control.c
> >> index 14a8f7c..05e8c05 100644
> >> --- a/src/mesa/drivers/dri/i965/brw_pipe_control.c
> >> +++ b/src/mesa/drivers/dri/i965/brw_pipe_control.c
> >> @@ -96,6 +96,24 @@ gen7_cs_stall_every_four_pipe_controls(struct
> >> brw_context *brw, uint32_t flags)
> >>  void
> >>  brw_emit_pipe_control_flush(struct brw_context *brw, uint32_t flags)
> >>  {
> >> +   if (brw->gen >= 6 &&
> >> +       (flags & PIPE_CONTROL_CACHE_FLUSH_BITS) &&
> >> +       (flags & PIPE_CONTROL_CACHE_INVALIDATE_BITS)) {
> >> +      /* A pipe control command with flush and invalidate bits set
> >> +       * simultaneously is an inherently racy operation on Gen6+ if the
> >> +       * contents of the flushed caches were intended to become visible
> >> from
> >> +       * any of the invalidated caches.  Split it in two PIPE_CONTROLs,
> >> the
> >> +       * first one should stall the pipeline to make sure that the
> >> flushed R/W
> >> +       * caches are coherent with memory once the specified R/O caches
> are
> >> +       * invalidated.  On pre-Gen6 hardware the (implicit) R/O cache
> >> +       * invalidation seems to happen at the bottom of the pipeline
> >> together
> >> +       * with any write cache flush, so this shouldn't be a concern.
> >> +       */
> >> +      brw_emit_pipe_control_flush(brw, (flags &
> >> PIPE_CONTROL_CACHE_FLUSH_BITS) |
> >> +                                       PIPE_CONTROL_CS_STALL);
> >> +      flags &= ~(PIPE_CONTROL_CACHE_FLUSH_BITS |
> PIPE_CONTROL_CS_STALL);
> >> +   }
> >> +
> >>     if (brw->gen >= 8) {
> >>        if (brw->gen == 8)
> >>           gen8_add_cs_stall_workaround_bits(&flags);
> >> diff --git a/src/mesa/drivers/dri/i965/intel_reg.h
> >> b/src/mesa/drivers/dri/i965/intel_reg.h
> >> index 95365fe..7a82be4 100644
> >> --- a/src/mesa/drivers/dri/i965/intel_reg.h
> >> +++ b/src/mesa/drivers/dri/i965/intel_reg.h
> >> @@ -134,6 +134,15 @@
> >>  #define PIPE_CONTROL_PPGTT_WRITE       (0 << 2)
> >>  #define PIPE_CONTROL_GLOBAL_GTT_WRITE  (1 << 2)
> >>
> >> +#define PIPE_CONTROL_CACHE_FLUSH_BITS \
> >> +   (PIPE_CONTROL_DEPTH_CACHE_FLUSH | PIPE_CONTROL_DATA_CACHE_FLUSH | \
> >> +    PIPE_CONTROL_RENDER_TARGET_FLUSH)
> >> +
> >> +#define PIPE_CONTROL_CACHE_INVALIDATE_BITS \
> >> +   (PIPE_CONTROL_STATE_CACHE_INVALIDATE |
> >> PIPE_CONTROL_CONST_CACHE_INVALIDATE | \
> >> +    PIPE_CONTROL_VF_CACHE_INVALIDATE |
> >> PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE | \
> >> +    PIPE_CONTROL_INSTRUCTION_INVALIDATE)
> >> +
> >>  /** @} */
> >>
> >>  #define XY_SETUP_BLT_CMD               (CMD_2D | (0x01 << 22))
> >> --
> >> 2.9.0
> >>
> >>
>