[1/2] i965: add missing rollback of URB requirements

Submitted by andrey simiklit on Nov. 19, 2018, 9:19 p.m.

Details

Message ID 20181119211938.31394-1-asimiklit.work@gmail.com
State New
Headers show
Series "Series without cover letter" ( rev: 1 ) in Mesa

Not browsing as part of any series.

Commit Message

andrey simiklit Nov. 19, 2018, 9:19 p.m.
From: Andrii Simiklit <andrii.simiklit@globallogic.com>

This patch is needed to avoid missing 3DSTATE_URB_* commands in a batch after
rollback operation.

To be able to test easily this issue
the following workaround is necessary:
-if (!brw_batch_has_aperture_space(brw, 0)) {
by:
+if (true) {
in brw_draw_single_prim function.

This workaround forces a rollback for each batch.
The "gl-3.2-adj-prims pv-first -auto -fbo" piglit test with this workaround
led to a hung. The hung investigation helped to find that some batches
didn't have the 3DSTATE_URB_* commands in compare to batches which were
generated without this workaround.

Reported-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Andrii Simiklit <andrii.simiklit@globallogic.com>
---
 src/mesa/drivers/dri/i965/brw_context.h       |  8 ++++++++
 src/mesa/drivers/dri/i965/intel_batchbuffer.c | 12 ++++++++++++
 2 files changed, 20 insertions(+)

Patch hide | download patch | download mbox

diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h
index 7fd15669eb..4e1682ba5c 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -521,6 +521,14 @@  struct intel_batchbuffer {
       int batch_reloc_count;
       int state_reloc_count;
       int exec_count;
+      struct {
+         GLuint vsize;
+         GLuint gsize;
+         GLuint hsize;
+         GLuint dsize;
+         bool gs_present;
+         bool tess_present;
+      } urb;
    } saved;
 
    /** Map from batch offset to brw_state_batch data (with DEBUG_BATCH) */
diff --git a/src/mesa/drivers/dri/i965/intel_batchbuffer.c b/src/mesa/drivers/dri/i965/intel_batchbuffer.c
index 353fcba18f..bae9f2ffed 100644
--- a/src/mesa/drivers/dri/i965/intel_batchbuffer.c
+++ b/src/mesa/drivers/dri/i965/intel_batchbuffer.c
@@ -299,6 +299,12 @@  intel_batchbuffer_save_state(struct brw_context *brw)
    brw->batch.saved.batch_reloc_count = brw->batch.batch_relocs.reloc_count;
    brw->batch.saved.state_reloc_count = brw->batch.state_relocs.reloc_count;
    brw->batch.saved.exec_count = brw->batch.exec_count;
+   brw->batch.saved.urb.vsize = brw->urb.vsize;
+   brw->batch.saved.urb.gs_present = brw->urb.gs_present;
+   brw->batch.saved.urb.gsize = brw->urb.gsize;
+   brw->batch.saved.urb.tess_present = brw->urb.tess_present;
+   brw->batch.saved.urb.hsize = brw->urb.hsize;
+   brw->batch.saved.urb.dsize = brw->urb.dsize;
 }
 
 void
@@ -313,6 +319,12 @@  intel_batchbuffer_reset_to_saved(struct brw_context *brw)
    brw->batch.exec_count = brw->batch.saved.exec_count;
 
    brw->batch.map_next = brw->batch.saved.map_next;
+   brw->urb.vsize = brw->batch.saved.urb.vsize;
+   brw->urb.gs_present = brw->batch.saved.urb.gs_present;
+   brw->urb.gsize = brw->batch.saved.urb.gsize;
+   brw->urb.tess_present = brw->batch.saved.urb.tess_present;
+   brw->urb.hsize = brw->batch.saved.urb.hsize;
+   brw->urb.dsize = brw->batch.saved.urb.dsize;
    if (USED_BATCH(brw->batch) == 0)
       brw_new_batch(brw);
 }

Comments

Hi all,

Could somebody take a look at this patch?

Regards,
Andrii.

On Mon, Nov 19, 2018 at 11:19 PM <asimiklit.work@gmail.com> wrote:

> From: Andrii Simiklit <andrii.simiklit@globallogic.com>
>
> This patch is needed to avoid missing 3DSTATE_URB_* commands in a batch
> after
> rollback operation.
>
> To be able to test easily this issue
> the following workaround is necessary:
> -if (!brw_batch_has_aperture_space(brw, 0)) {
> by:
> +if (true) {
> in brw_draw_single_prim function.
>
> This workaround forces a rollback for each batch.
> The "gl-3.2-adj-prims pv-first -auto -fbo" piglit test with this workaround
> led to a hung. The hung investigation helped to find that some batches
> didn't have the 3DSTATE_URB_* commands in compare to batches which were
> generated without this workaround.
>
> Reported-by: Kenneth Graunke <kenneth@whitecape.org>
> Signed-off-by: Andrii Simiklit <andrii.simiklit@globallogic.com>
> ---
>  src/mesa/drivers/dri/i965/brw_context.h       |  8 ++++++++
>  src/mesa/drivers/dri/i965/intel_batchbuffer.c | 12 ++++++++++++
>  2 files changed, 20 insertions(+)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_context.h
> b/src/mesa/drivers/dri/i965/brw_context.h
> index 7fd15669eb..4e1682ba5c 100644
> --- a/src/mesa/drivers/dri/i965/brw_context.h
> +++ b/src/mesa/drivers/dri/i965/brw_context.h
> @@ -521,6 +521,14 @@ struct intel_batchbuffer {
>        int batch_reloc_count;
>        int state_reloc_count;
>        int exec_count;
> +      struct {
> +         GLuint vsize;
> +         GLuint gsize;
> +         GLuint hsize;
> +         GLuint dsize;
> +         bool gs_present;
> +         bool tess_present;
> +      } urb;
>     } saved;
>
>     /** Map from batch offset to brw_state_batch data (with DEBUG_BATCH) */
> diff --git a/src/mesa/drivers/dri/i965/intel_batchbuffer.c
> b/src/mesa/drivers/dri/i965/intel_batchbuffer.c
> index 353fcba18f..bae9f2ffed 100644
> --- a/src/mesa/drivers/dri/i965/intel_batchbuffer.c
> +++ b/src/mesa/drivers/dri/i965/intel_batchbuffer.c
> @@ -299,6 +299,12 @@ intel_batchbuffer_save_state(struct brw_context *brw)
>     brw->batch.saved.batch_reloc_count =
> brw->batch.batch_relocs.reloc_count;
>     brw->batch.saved.state_reloc_count =
> brw->batch.state_relocs.reloc_count;
>     brw->batch.saved.exec_count = brw->batch.exec_count;
> +   brw->batch.saved.urb.vsize = brw->urb.vsize;
> +   brw->batch.saved.urb.gs_present = brw->urb.gs_present;
> +   brw->batch.saved.urb.gsize = brw->urb.gsize;
> +   brw->batch.saved.urb.tess_present = brw->urb.tess_present;
> +   brw->batch.saved.urb.hsize = brw->urb.hsize;
> +   brw->batch.saved.urb.dsize = brw->urb.dsize;
>  }
>
>  void
> @@ -313,6 +319,12 @@ intel_batchbuffer_reset_to_saved(struct brw_context
> *brw)
>     brw->batch.exec_count = brw->batch.saved.exec_count;
>
>     brw->batch.map_next = brw->batch.saved.map_next;
> +   brw->urb.vsize = brw->batch.saved.urb.vsize;
> +   brw->urb.gs_present = brw->batch.saved.urb.gs_present;
> +   brw->urb.gsize = brw->batch.saved.urb.gsize;
> +   brw->urb.tess_present = brw->batch.saved.urb.tess_present;
> +   brw->urb.hsize = brw->batch.saved.urb.hsize;
> +   brw->urb.dsize = brw->batch.saved.urb.dsize;
>     if (USED_BATCH(brw->batch) == 0)
>        brw_new_batch(brw);
>  }
> --
> 2.17.1
>
>
Hi Andrii,

Although I think what these patches do makes sense, I think it's missing 
the bigger picture.
There is a lot more state that gets lost if we have to revert all of the 
emitted commands.
A quick look at brw_upload_pipeline_state() shows all of the programs 
could be invalid as well, or the number of samples, etc...

To me it seems like we need a much larger state save/restore.

Ken: what do you think?

Cheers,

-
Lionel

On 19/11/2018 21:19, asimiklit.work@gmail.com wrote:
> From: Andrii Simiklit <andrii.simiklit@globallogic.com>
>
> This patch is needed to avoid missing 3DSTATE_URB_* commands in a batch after
> rollback operation.
>
> To be able to test easily this issue
> the following workaround is necessary:
> -if (!brw_batch_has_aperture_space(brw, 0)) {
> by:
> +if (true) {
> in brw_draw_single_prim function.
>
> This workaround forces a rollback for each batch.
> The "gl-3.2-adj-prims pv-first -auto -fbo" piglit test with this workaround
> led to a hung. The hung investigation helped to find that some batches
> didn't have the 3DSTATE_URB_* commands in compare to batches which were
> generated without this workaround.
>
> Reported-by: Kenneth Graunke <kenneth@whitecape.org>
> Signed-off-by: Andrii Simiklit <andrii.simiklit@globallogic.com>
> ---
>   src/mesa/drivers/dri/i965/brw_context.h       |  8 ++++++++
>   src/mesa/drivers/dri/i965/intel_batchbuffer.c | 12 ++++++++++++
>   2 files changed, 20 insertions(+)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h
> index 7fd15669eb..4e1682ba5c 100644
> --- a/src/mesa/drivers/dri/i965/brw_context.h
> +++ b/src/mesa/drivers/dri/i965/brw_context.h
> @@ -521,6 +521,14 @@ struct intel_batchbuffer {
>         int batch_reloc_count;
>         int state_reloc_count;
>         int exec_count;
> +      struct {
> +         GLuint vsize;
> +         GLuint gsize;
> +         GLuint hsize;
> +         GLuint dsize;
> +         bool gs_present;
> +         bool tess_present;
> +      } urb;
>      } saved;
>   
>      /** Map from batch offset to brw_state_batch data (with DEBUG_BATCH) */
> diff --git a/src/mesa/drivers/dri/i965/intel_batchbuffer.c b/src/mesa/drivers/dri/i965/intel_batchbuffer.c
> index 353fcba18f..bae9f2ffed 100644
> --- a/src/mesa/drivers/dri/i965/intel_batchbuffer.c
> +++ b/src/mesa/drivers/dri/i965/intel_batchbuffer.c
> @@ -299,6 +299,12 @@ intel_batchbuffer_save_state(struct brw_context *brw)
>      brw->batch.saved.batch_reloc_count = brw->batch.batch_relocs.reloc_count;
>      brw->batch.saved.state_reloc_count = brw->batch.state_relocs.reloc_count;
>      brw->batch.saved.exec_count = brw->batch.exec_count;
> +   brw->batch.saved.urb.vsize = brw->urb.vsize;
> +   brw->batch.saved.urb.gs_present = brw->urb.gs_present;
> +   brw->batch.saved.urb.gsize = brw->urb.gsize;
> +   brw->batch.saved.urb.tess_present = brw->urb.tess_present;
> +   brw->batch.saved.urb.hsize = brw->urb.hsize;
> +   brw->batch.saved.urb.dsize = brw->urb.dsize;
>   }
>   
>   void
> @@ -313,6 +319,12 @@ intel_batchbuffer_reset_to_saved(struct brw_context *brw)
>      brw->batch.exec_count = brw->batch.saved.exec_count;
>   
>      brw->batch.map_next = brw->batch.saved.map_next;
> +   brw->urb.vsize = brw->batch.saved.urb.vsize;
> +   brw->urb.gs_present = brw->batch.saved.urb.gs_present;
> +   brw->urb.gsize = brw->batch.saved.urb.gsize;
> +   brw->urb.tess_present = brw->batch.saved.urb.tess_present;
> +   brw->urb.hsize = brw->batch.saved.urb.hsize;
> +   brw->urb.dsize = brw->batch.saved.urb.dsize;
>      if (USED_BATCH(brw->batch) == 0)
>         brw_new_batch(brw);
>   }
Quoting Lionel Landwerlin (2019-01-08 11:03:26)
> Hi Andrii,
> 
> Although I think what these patches do makes sense, I think it's missing 
> the bigger picture.
> There is a lot more state that gets lost if we have to revert all of the 
> emitted commands.
> A quick look at brw_upload_pipeline_state() shows all of the programs 
> could be invalid as well, or the number of samples, etc...
> 
> To me it seems like we need a much larger state save/restore.
> 
> Ken: what do you think?

How about not rolling back? If we accept it as currently broken, and just
demand the kernel provide logical contexts for all i965 devices (just ack
some patches!), and then just flush the batch (possibly with a dummy 3D
prim if you want to be sure the 3D state is loaded) and rely on the
context preserving state across batches.
-Chris
On 08/01/2019 11:11, Chris Wilson wrote:
> Quoting Lionel Landwerlin (2019-01-08 11:03:26)
>> Hi Andrii,
>>
>> Although I think what these patches do makes sense, I think it's missing
>> the bigger picture.
>> There is a lot more state that gets lost if we have to revert all of the
>> emitted commands.
>> A quick look at brw_upload_pipeline_state() shows all of the programs
>> could be invalid as well, or the number of samples, etc...
>>
>> To me it seems like we need a much larger state save/restore.
>>
>> Ken: what do you think?
> How about not rolling back? If we accept it as currently broken, and just
> demand the kernel provide logical contexts for all i965 devices (just ack
> some patches!), and then just flush the batch (possibly with a dummy 3D
> prim if you want to be sure the 3D state is loaded) and rely on the
> context preserving state across batches.
> -Chris

Actually I'm not sure why this isn't happening already :|
Unless you're talking about kernel patches? ;)

-
Lionel
On Tue, Jan 8, 2019 at 1:11 PM Chris Wilson <chris@chris-wilson.co.uk>
wrote:

> Quoting Lionel Landwerlin (2019-01-08 11:03:26)
> > Hi Andrii,
> >
> > Although I think what these patches do makes sense, I think it's missing
> > the bigger picture.
> > There is a lot more state that gets lost if we have to revert all of the
> > emitted commands.
> > A quick look at brw_upload_pipeline_state() shows all of the programs
> > could be invalid as well, or the number of samples, etc...
> >
> > To me it seems like we need a much larger state save/restore.
> >
> > Ken: what do you think?
>
> How about not rolling back? If we accept it as currently broken, and just
> demand the kernel provide logical contexts for all i965 devices (just ack
> some patches!), and then just flush the batch (possibly with a dummy 3D
> prim if you want to be sure the 3D state is loaded) and rely on the
> context preserving state across batches.
> -Chris
>

Could you please provide a link to the patches you talking about?

At the moment as far as I understood the rollback system
helps to reduce quantity of the flush operations in some lucky cases.
When there isn't enough space for a batch aperture,
we will rollback the batch to the saved state,
when limit (means aperture_threshold) wasn't reached by it.

pseudo code (simple variant, without error handling):
   save batch state here;
   do
   {
       add primitives to the batch;
       if the batch size limit is exceeded
       {
           rollback to the saved batch state;
           flush the batch;
       }
       else
       {
           break;
       }
   } while(true);

Are you suggesting to flush the batch every time without waiting for a
nearly full filling of it?
Like this:
   add primitives to batch;
   flush batch;

Please correct me if I am wrong.

-Andrii


> _______________________________________________
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
Quoting andrey simiklit (2019-01-08 16:00:45)
> On Tue, Jan 8, 2019 at 1:11 PM Chris Wilson <chris@chris-wilson.co.uk> wrote:
> 
>     Quoting Lionel Landwerlin (2019-01-08 11:03:26)
>     > Hi Andrii,
>     >
>     > Although I think what these patches do makes sense, I think it's missing
>     > the bigger picture.
>     > There is a lot more state that gets lost if we have to revert all of the
>     > emitted commands.
>     > A quick look at brw_upload_pipeline_state() shows all of the programs
>     > could be invalid as well, or the number of samples, etc...
>     >
>     > To me it seems like we need a much larger state save/restore.
>     >
>     > Ken: what do you think?
> 
>     How about not rolling back? If we accept it as currently broken, and just
>     demand the kernel provide logical contexts for all i965 devices (just ack
>     some patches!), and then just flush the batch (possibly with a dummy 3D
>     prim if you want to be sure the 3D state is loaded) and rely on the
>     context preserving state across batches.
>     -Chris
> 
> 
> Could you please provide a link to the patches you talking about?

I just need an ack for the kernel patches to enable context support.
 
> At the moment as far as I understood the rollback system
> helps to reduce quantity of the flush operations in some lucky cases.
> When there isn't enough space for a batch aperture,
> we will rollback the batch to the saved state,
> when limit (means aperture_threshold) wasn't reached by it.
> 
> pseudo code (simple variant, without error handling):
>    save batch state here;
>    do
>    {
>        add primitives to the batch;
>        if the batch size limit is exceeded  
>        {
>            rollback to the saved batch state;
>            flush the batch;
>        }
>        else
>        {
>            break;
>        }
>    } while(true);
> 
> Are you suggesting to flush the batch every time without waiting for a nearly
> full filling of it?
> Like this:
>    add primitives to batch;
>    flush batch;

The suggestion was just thinking about if we detect that this primitive
would exceed the aperture, instead of rolling back the batch to before
the primitive, unroll the objects/relocs (so that we don't trigger
ENOSPC from the kernel) but keep the 3DSTATE without the final PRIM and
submit that.

Basically it's all about removing no_batch_wrap.

Quite icky. All it saves is having to track the incidental details like
URB, but you still have to re-emit all the surface state (but that's
already a given as it is stored alongside the batch). However, that's
all you have to remember.
-Chris
On Tue, Jan 8, 2019 at 6:16 PM Chris Wilson <chris@chris-wilson.co.uk>
wrote:

> Quoting andrey simiklit (2019-01-08 16:00:45)
> > On Tue, Jan 8, 2019 at 1:11 PM Chris Wilson <chris@chris-wilson.co.uk>
> wrote:
> >
> >     Quoting Lionel Landwerlin (2019-01-08 11:03:26)
> >     > Hi Andrii,
> >     >
> >     > Although I think what these patches do makes sense, I think it's
> missing
> >     > the bigger picture.
> >     > There is a lot more state that gets lost if we have to revert all
> of the
> >     > emitted commands.
> >     > A quick look at brw_upload_pipeline_state() shows all of the
> programs
> >     > could be invalid as well, or the number of samples, etc...
> >     >
> >     > To me it seems like we need a much larger state save/restore.
> >     >
> >     > Ken: what do you think?
> >
> >     How about not rolling back? If we accept it as currently broken, and
> just
> >     demand the kernel provide logical contexts for all i965 devices
> (just ack
> >     some patches!), and then just flush the batch (possibly with a dummy
> 3D
> >     prim if you want to be sure the 3D state is loaded) and rely on the
> >     context preserving state across batches.
> >     -Chris
> >
> >
> > Could you please provide a link to the patches you talking about?
>
> I just need an ack for the kernel patches to enable context support.
>
> > At the moment as far as I understood the rollback system
> > helps to reduce quantity of the flush operations in some lucky cases.
> > When there isn't enough space for a batch aperture,
> > we will rollback the batch to the saved state,
> > when limit (means aperture_threshold) wasn't reached by it.
> >
> > pseudo code (simple variant, without error handling):
> >    save batch state here;
> >    do
> >    {
> >        add primitives to the batch;
> >        if the batch size limit is exceeded
> >        {
> >            rollback to the saved batch state;
> >            flush the batch;
> >        }
> >        else
> >        {
> >            break;
> >        }
> >    } while(true);
> >
> > Are you suggesting to flush the batch every time without waiting for a
> nearly
> > full filling of it?
> > Like this:
> >    add primitives to batch;
> >    flush batch;
>
> The suggestion was just thinking about if we detect that this primitive
> would exceed the aperture, instead of rolling back the batch to before
> the primitive, unroll the objects/relocs (so that we don't trigger
> ENOSPC from the kernel) but keep the 3DSTATE without the final PRIM and
> submit that.
>
> Basically it's all about removing no_batch_wrap.
>
> Quite icky. All it saves is having to track the incidental details like
> URB, but you still have to re-emit all the surface state (but that's
> already a given as it is stored alongside the batch). However, that's
> all you have to remember.
> -Chris
>

Thanks a lot for clarifications.
-Andrii


> _______________________________________________
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
On Tuesday, January 8, 2019 3:11:37 AM PST Chris Wilson wrote:
> Quoting Lionel Landwerlin (2019-01-08 11:03:26)
> > Hi Andrii,
> > 
> > Although I think what these patches do makes sense, I think it's missing 
> > the bigger picture.
> > There is a lot more state that gets lost if we have to revert all of the 
> > emitted commands.
> > A quick look at brw_upload_pipeline_state() shows all of the programs 
> > could be invalid as well, or the number of samples, etc...
> > 
> > To me it seems like we need a much larger state save/restore.
> > 
> > Ken: what do you think?
> 
> How about not rolling back? If we accept it as currently broken, and just
> demand the kernel provide logical contexts for all i965 devices (just ack
> some patches!), and then just flush the batch (possibly with a dummy 3D
> prim if you want to be sure the 3D state is loaded) and rely on the
> context preserving state across batches.
> -Chris

I'm not sure precisely what you're proposing, but every scenario I can
think of is breaking in some subtle way.

Option 1: Submit batch at last 3DPRIMITIVE, re-call emit in a new batch.

This is what we do today; it doesn't work since brw_upload_render_state
implicitly sets CPU-side fields for "current state of the GPU", which
become inaccurate at rollback, so the next state upload would do the
wrong thing.  We'd need to roll all those back as well, or only commit
them when we finish uploading state.  Atoms also flag other new atoms,
and that's not getting rolled back, but extra flagging is harmless.

Fixing this means disentangling some of our code, saving and restoring
more values, and so on.  Maybe dropping a few CPU micro-optimizations.

Option 2: Submit batch at last 3DPRIMITIVE (A), memcpy commands for
          for failing primitive (B) into a new batch.

This doesn't work either.  Let's say (A) issued a 3DSTATE_INDEX_BUFFER
command with a pointer to the index buffer.  (B) didn't change index
buffer state, so it doesn't emit one, intending to reuse the old value.

The kernel will process the batch containing (A), and relocate the index
buffer somewhere to some address.  Context save will store that address.
Context load will happen before executing (B)'s batch, and restore the
old pointer.  But batch (B) won't have any relocations for the index
buffer pointer (because there's no command to patch up).  So, we'd
either need relocations to patch the GEM context image...or we'd have
to softpin all inherited buffers.

Maybe softpinning inherited buffers in an otherwise relocation-centric
world is viable?  Does softpin even exist on Gen4?  This doesn't seem
easy.

Option 3: Submit batch as soon as we overflow aperture checks, with
          state for a primitive ending up in two batches.

This is not viable either - has all the problems of option 2, but also
means checking aperture everywhere...and our binding table upload code
means essentially pulling in all surfaces as one big group, which is
dodgy...
Quoting Kenneth Graunke (2019-01-08 20:17:01)
> On Tuesday, January 8, 2019 3:11:37 AM PST Chris Wilson wrote:
> > Quoting Lionel Landwerlin (2019-01-08 11:03:26)
> > > Hi Andrii,
> > > 
> > > Although I think what these patches do makes sense, I think it's missing 
> > > the bigger picture.
> > > There is a lot more state that gets lost if we have to revert all of the 
> > > emitted commands.
> > > A quick look at brw_upload_pipeline_state() shows all of the programs 
> > > could be invalid as well, or the number of samples, etc...
> > > 
> > > To me it seems like we need a much larger state save/restore.
> > > 
> > > Ken: what do you think?
> > 
> > How about not rolling back? If we accept it as currently broken, and just
> > demand the kernel provide logical contexts for all i965 devices (just ack
> > some patches!), and then just flush the batch (possibly with a dummy 3D
> > prim if you want to be sure the 3D state is loaded) and rely on the
> > context preserving state across batches.
> > -Chris
> 
> I'm not sure precisely what you're proposing, but every scenario I can
> think of is breaking in some subtle way.
> 
> Option 1: Submit batch at last 3DPRIMITIVE, re-call emit in a new batch.
> 
> This is what we do today; it doesn't work since brw_upload_render_state
> implicitly sets CPU-side fields for "current state of the GPU", which
> become inaccurate at rollback, so the next state upload would do the
> wrong thing.  We'd need to roll all those back as well, or only commit
> them when we finish uploading state.  Atoms also flag other new atoms,
> and that's not getting rolled back, but extra flagging is harmless.
> 
> Fixing this means disentangling some of our code, saving and restoring
> more values, and so on.  Maybe dropping a few CPU micro-optimizations.
> 
> Option 2: Submit batch at last 3DPRIMITIVE (A), memcpy commands for
>           for failing primitive (B) into a new batch.
> 
> This doesn't work either.  Let's say (A) issued a 3DSTATE_INDEX_BUFFER
> command with a pointer to the index buffer.  (B) didn't change index
> buffer state, so it doesn't emit one, intending to reuse the old value.

This with all relocs with BRW_NEW_BATCH, and iirc the rule is that they
are already tied into that BRW_NEW_BATCH flag. Except you don't need to
memcpy, since you already have the 3DSTATE equivalent to the cpu state
tracker in the batch, emit that, and it will be recorded in the context
for the next batch. (You just don't tell the kernel about the
buffers/relocs that cause the aperture exhaustion and it won't do
anything about them. The GPU may load stale addresses, but never uses
them, exactly like the context state on the next batch anyway.)

It's not a great argument, but the argument is that there's less you
need to fixup, than having to remember to every single comparison in
order to rollback the cpu state tracker. 
-Chris