[v2] i965/gen6/gs: Handle case where a GS doesn't allocate VUE

Submitted by andrey simiklit on June 21, 2018, 1:40 p.m.

Details

Message ID 1529588435-29287-1-git-send-email-andrii.simiklit@globallogic.com
State New
Headers show
Series "i965/gen6/gs: Handle case where a GS doesn't allocate VUE" ( rev: 2 ) in Mesa

Not browsing as part of any series.

Commit Message

andrey simiklit June 21, 2018, 1:40 p.m.
We can not use the VUE Dereference flags combination for EOT
message under ILK and SNB because the threads are not initialized
there with initial VUE handle unlike Pre-IL.
So to avoid GPU hangs on SNB and ILK we need
to avoid usage of the VUE Dereference flags combination.
(Was tested only on SNB but according to the specification
SNB Volume 2 Part 1: 1.6.5.3, 1.6.5.6
the ILK must behave itself in the similar way)

v2: Approach to fix this issue was changed.
Instead of different EOT flags in the program end
we will create VUE every time even if GS produces no output.

Signed-off-by: Andrii Simiklit <andrii.simiklit@globallogic.com>
---
 src/intel/compiler/gen6_gs_visitor.cpp | 88 +++++++++-------------------------
 1 file changed, 23 insertions(+), 65 deletions(-)

Patch hide | download patch | download mbox

diff --git a/src/intel/compiler/gen6_gs_visitor.cpp b/src/intel/compiler/gen6_gs_visitor.cpp
index ac3ba55..b831d33 100644
--- a/src/intel/compiler/gen6_gs_visitor.cpp
+++ b/src/intel/compiler/gen6_gs_visitor.cpp
@@ -300,11 +300,10 @@  gen6_gs_visitor::emit_urb_write_opcode(bool complete, int base_mrf,
       /* Otherwise we always request to allocate a new VUE handle. If this is
        * the last write before the EOT message and the new handle never gets
        * used it will be dereferenced when we send the EOT message. This is
-       * necessary to avoid different setups (under Pre-IL only) for the EOT message (one for the
+       * necessary to avoid different setups for the EOT message (one for the
        * case when there is no output and another for the case when there is)
        * which would require to end the program with an IF/ELSE/ENDIF block,
-       * something we do not want. 
-       * But for ILK and SNB we can not avoid the end the program with an IF/ELSE/ENDIF block.
+       * something we do not want.
        */
       inst = emit(GS_OPCODE_URB_WRITE_ALLOCATE);
       inst->urb_write_flags = BRW_URB_WRITE_COMPLETE;
@@ -351,27 +350,27 @@  gen6_gs_visitor::emit_thread_end()
    int max_usable_mrf = FIRST_SPILL_MRF(devinfo->gen);
 
    /* Issue the FF_SYNC message and obtain the initial VUE handle. */
-   emit(CMP(dst_null_ud(), this->vertex_count, brw_imm_ud(0u), BRW_CONDITIONAL_G));
-   emit(IF(BRW_PREDICATE_NORMAL));
-   {
-      this->current_annotation = "gen6 thread end: ff_sync";
+   this->current_annotation = "gen6 thread end: ff_sync";
 
-      vec4_instruction *inst;
-      if (prog->info.has_transform_feedback_varyings) {
+   vec4_instruction *inst = NULL;
+   if (prog->info.has_transform_feedback_varyings) {
          src_reg sol_temp(this, glsl_type::uvec4_type);
          emit(GS_OPCODE_FF_SYNC_SET_PRIMITIVES,
-              dst_reg(this->svbi),
-              this->vertex_count,
-              this->prim_count,
-              sol_temp);
+               dst_reg(this->svbi),
+               this->vertex_count,
+               this->prim_count,
+               sol_temp);
          inst = emit(GS_OPCODE_FF_SYNC,
                      dst_reg(this->temp), this->prim_count, this->svbi);
-      } else {
+   } else {
          inst = emit(GS_OPCODE_FF_SYNC,
                      dst_reg(this->temp), this->prim_count, brw_imm_ud(0u));
-      }
-      inst->base_mrf = base_mrf;
+   }
+   inst->base_mrf = base_mrf;
 
+   emit(CMP(dst_null_ud(), this->vertex_count, brw_imm_ud(0u), BRW_CONDITIONAL_G));
+   emit(IF(BRW_PREDICATE_NORMAL));
+   {
       /* Loop over all buffered vertices and emit URB write messages */
       this->current_annotation = "gen6 thread end: urb writes init";
       src_reg vertex(this, glsl_type::uint_type);
@@ -415,7 +414,7 @@  gen6_gs_visitor::emit_thread_end()
                dst_reg reg = dst_reg(MRF, mrf);
                reg.type = output_reg[varying][0].type;
                data.type = reg.type;
-               vec4_instruction *inst = emit(MOV(reg, data));
+               inst = emit(MOV(reg, data));
                inst->force_writemask_all = true;
 
                mrf++;
@@ -450,11 +449,8 @@  gen6_gs_visitor::emit_thread_end()
       if (prog->info.has_transform_feedback_varyings)
          xfb_write();
    }
-   const bool common_eot_approach_can_be_used = (devinfo->gen < 5);
-   if(common_eot_approach_can_be_used)
-   {
-      emit(BRW_OPCODE_ENDIF);  
-   }
+   emit(BRW_OPCODE_ENDIF);
+
    /* Finally, emit EOT message.
     *
     * In gen6 we need to end the thread differently depending on whether we have
@@ -464,35 +460,11 @@  gen6_gs_visitor::emit_thread_end()
     *
     * However, this would lead us to end the program with an ENDIF opcode,
     * which we want to avoid, so what we do is that we always request a new
-    * VUE handle every time we do a URB WRITE, even for the last vertex we emit.
+    * VUE handle every time, even if GS produces no output.
     * With this we make sure that whether we have emitted at least one vertex
     * or none at all, we have to finish the thread without writing to the URB,
-    * which works for both cases (but only under Pre-IL) by setting 
-    * the COMPLETE and UNUSED flags in the EOT message.
-    * 
-    * But under ILK or SNB we must not use combination COMPLETE and UNUSED 
-    * because this combination could be used only for already allocated VUE. 
-    * But unlike Pre-IL in the ILK and SNB 
-    * the initial VUE is not passed to threads. 
-    * This behaver mentioned in specification: 
-    * SNB Volume 2 Part 1:
-    *  "1.6.5.3 VUE Allocation (GS, CLIP) [DevIL]"
-    *  "1.6.5.4 VUE Allocation (GS) [DevSNB+]"
-    *     "The threads are not passed an initial handle.  
-    *     Instead, they request a first handle (if any) 
-    *     via the URB shared function’s FF_SYNC message (see Shared Functions). 
-    *     If additional handles are required, 
-    *     the URB_WRITE allocate mechanism (mentioned above) is used."
-    * 
-    * So for ILK and for SNB we must use only UNUSED flag.
-    * This is accepteble combination according to:
-    *    SNB Volume 4 Part 2:
-    *       "2.4.2 Message Descriptor"
-    *          "Table lists the valid and invalid combinations of 
-    *           the Complete, Used, Allocate and EOT bits"
-    *          "Thread terminate non-write of URB"
-    *    SNB Volume 2 Part 1:
-    *       "1.6.5.6 Thread Termination"
+    * which works for both cases by setting the COMPLETE and UNUSED flags in
+    * the EOT message.
     */
    this->current_annotation = "gen6 thread end: EOT";
 
@@ -504,26 +476,12 @@  gen6_gs_visitor::emit_thread_end()
       emit(GS_OPCODE_SET_DWORD_2, dst_reg(MRF, base_mrf), data);
    }
 
-   vec4_instruction *inst = emit(GS_OPCODE_THREAD_END);
+   inst = emit(GS_OPCODE_THREAD_END);
    inst->urb_write_flags = BRW_URB_WRITE_COMPLETE | BRW_URB_WRITE_UNUSED;
    inst->base_mrf = base_mrf;
    inst->mlen = 1;
-   
-   if(!common_eot_approach_can_be_used)
-   {
-      emit(BRW_OPCODE_ELSE);
-      
-      this->current_annotation = "gen6 thread end: EOT";
-
-      vec4_instruction *unused_urb_inst = emit(GS_OPCODE_THREAD_END);
-      unused_urb_inst->urb_write_flags = BRW_URB_WRITE_UNUSED;
-      unused_urb_inst->base_mrf = base_mrf;
-      unused_urb_inst->mlen = 1;
-
-      emit(BRW_OPCODE_ENDIF);  
-   }
 }
-   
+
 void
 gen6_gs_visitor::setup_payload()
 {

Comments

Hello,

Thanks for your feedback.
I made changes which you requested.
I hope that I understood you right.
The solution which you suggested "VUE allocation unconditionally" fixes this bug
and we have avoided the endif in the program end.

> Ok, then we are in agreement.
>>  Section 1.6.5.4 VUE Allocation:
>>   " The following description is applicable only to the GS stage.
>>     The threads are not passed an initial handle.
>>     In stead, they request a first handle (if any) via the URB
>>     shared function’s FF_SYNC message (see Shared Functions).
>>     If additional handles are required,
>>     the URB_WRITE allocate mechanism (mentioned above) is used."If GS doesn't allocate/request VUEs then GS shouldn't use the 
>> Dereference (COMPLETE + UNUSED) message. So when GS produces no 
>> output GS doesn't allocate VUEs at all and GS shouldn't use 
>> Dereference message. 
> Agreed as well. But do notice that none of this is pre-ILK as far as 
> the documentation goes, it is the same across all supported platforms 
> up to SNB.

But do notice that according to

    Section "1.6.5.2 VUE Allocation (GS, CLIP) [Pre-DevIL] " (vol2, part1)
     " The following description is applicable only to the GS, CLIP stages.
       The GS and CLIP threads are passed a single, initial destination VUE handle.
       These threads may be required to output more than one destination VUE, and
       therefore they are provided with a mechanism to
       request additional handles.
       ......"

So if we want to support Pre-ILK we have to implement the different solutions for ILK, SNB (1) and for Pre-ILK (2):
    1. The "make the FF_SYNC happen unconditionally" solution will work correctly
         for ILK, SNB according to "1.6.5.3 VUE Allocation (GS, CLIP) [DevIL]"
         and "1.6.5.4 VUE Allocation (GS) [DevSNB+]" (vol2, part1).
    2. The current GS implementation will work correctly
         for Pre-ILK according to "1.6.5.2 VUE Allocation (GS, CLIP) [Pre-DevIL]"

Must I implement something additional for Pre-ILK?

Regards,
Andrii.

On 21.06.18 16:40, Andrii Simiklit wrote:

> We can not use the VUE Dereference flags combination for EOT
> message under ILK and SNB because the threads are not initialized
> there with initial VUE handle unlike Pre-IL.
> So to avoid GPU hangs on SNB and ILK we need
> to avoid usage of the VUE Dereference flags combination.
> (Was tested only on SNB but according to the specification
> SNB Volume 2 Part 1: 1.6.5.3, 1.6.5.6
> the ILK must behave itself in the similar way)
>
> v2: Approach to fix this issue was changed.
> Instead of different EOT flags in the program end
> we will create VUE every time even if GS produces no output.
>
> Signed-off-by: Andrii Simiklit <andrii.simiklit@globallogic.com>
> ---
>   src/intel/compiler/gen6_gs_visitor.cpp | 88 +++++++++-------------------------
>   1 file changed, 23 insertions(+), 65 deletions(-)
>
> diff --git a/src/intel/compiler/gen6_gs_visitor.cpp b/src/intel/compiler/gen6_gs_visitor.cpp
> index ac3ba55..b831d33 100644
> --- a/src/intel/compiler/gen6_gs_visitor.cpp
> +++ b/src/intel/compiler/gen6_gs_visitor.cpp
> @@ -300,11 +300,10 @@ gen6_gs_visitor::emit_urb_write_opcode(bool complete, int base_mrf,
>         /* Otherwise we always request to allocate a new VUE handle. If this is
>          * the last write before the EOT message and the new handle never gets
>          * used it will be dereferenced when we send the EOT message. This is
> -       * necessary to avoid different setups (under Pre-IL only) for the EOT message (one for the
> +       * necessary to avoid different setups for the EOT message (one for the
>          * case when there is no output and another for the case when there is)
>          * which would require to end the program with an IF/ELSE/ENDIF block,
> -       * something we do not want.
> -       * But for ILK and SNB we can not avoid the end the program with an IF/ELSE/ENDIF block.
> +       * something we do not want.
>          */
>         inst = emit(GS_OPCODE_URB_WRITE_ALLOCATE);
>         inst->urb_write_flags = BRW_URB_WRITE_COMPLETE;
> @@ -351,27 +350,27 @@ gen6_gs_visitor::emit_thread_end()
>      int max_usable_mrf = FIRST_SPILL_MRF(devinfo->gen);
>   
>      /* Issue the FF_SYNC message and obtain the initial VUE handle. */
> -   emit(CMP(dst_null_ud(), this->vertex_count, brw_imm_ud(0u), BRW_CONDITIONAL_G));
> -   emit(IF(BRW_PREDICATE_NORMAL));
> -   {
> -      this->current_annotation = "gen6 thread end: ff_sync";
> +   this->current_annotation = "gen6 thread end: ff_sync";
>   
> -      vec4_instruction *inst;
> -      if (prog->info.has_transform_feedback_varyings) {
> +   vec4_instruction *inst = NULL;
> +   if (prog->info.has_transform_feedback_varyings) {
>            src_reg sol_temp(this, glsl_type::uvec4_type);
>            emit(GS_OPCODE_FF_SYNC_SET_PRIMITIVES,
> -              dst_reg(this->svbi),
> -              this->vertex_count,
> -              this->prim_count,
> -              sol_temp);
> +               dst_reg(this->svbi),
> +               this->vertex_count,
> +               this->prim_count,
> +               sol_temp);
>            inst = emit(GS_OPCODE_FF_SYNC,
>                        dst_reg(this->temp), this->prim_count, this->svbi);
> -      } else {
> +   } else {
>            inst = emit(GS_OPCODE_FF_SYNC,
>                        dst_reg(this->temp), this->prim_count, brw_imm_ud(0u));
> -      }
> -      inst->base_mrf = base_mrf;
> +   }
> +   inst->base_mrf = base_mrf;
>   
> +   emit(CMP(dst_null_ud(), this->vertex_count, brw_imm_ud(0u), BRW_CONDITIONAL_G));
> +   emit(IF(BRW_PREDICATE_NORMAL));
> +   {
>         /* Loop over all buffered vertices and emit URB write messages */
>         this->current_annotation = "gen6 thread end: urb writes init";
>         src_reg vertex(this, glsl_type::uint_type);
> @@ -415,7 +414,7 @@ gen6_gs_visitor::emit_thread_end()
>                  dst_reg reg = dst_reg(MRF, mrf);
>                  reg.type = output_reg[varying][0].type;
>                  data.type = reg.type;
> -               vec4_instruction *inst = emit(MOV(reg, data));
> +               inst = emit(MOV(reg, data));
>                  inst->force_writemask_all = true;
>   
>                  mrf++;
> @@ -450,11 +449,8 @@ gen6_gs_visitor::emit_thread_end()
>         if (prog->info.has_transform_feedback_varyings)
>            xfb_write();
>      }
> -   const bool common_eot_approach_can_be_used = (devinfo->gen < 5);
> -   if(common_eot_approach_can_be_used)
> -   {
> -      emit(BRW_OPCODE_ENDIF);
> -   }
> +   emit(BRW_OPCODE_ENDIF);
> +
>      /* Finally, emit EOT message.
>       *
>       * In gen6 we need to end the thread differently depending on whether we have
> @@ -464,35 +460,11 @@ gen6_gs_visitor::emit_thread_end()
>       *
>       * However, this would lead us to end the program with an ENDIF opcode,
>       * which we want to avoid, so what we do is that we always request a new
> -    * VUE handle every time we do a URB WRITE, even for the last vertex we emit.
> +    * VUE handle every time, even if GS produces no output.
>       * With this we make sure that whether we have emitted at least one vertex
>       * or none at all, we have to finish the thread without writing to the URB,
> -    * which works for both cases (but only under Pre-IL) by setting
> -    * the COMPLETE and UNUSED flags in the EOT message.
> -    *
> -    * But under ILK or SNB we must not use combination COMPLETE and UNUSED
> -    * because this combination could be used only for already allocated VUE.
> -    * But unlike Pre-IL in the ILK and SNB
> -    * the initial VUE is not passed to threads.
> -    * This behaver mentioned in specification:
> -    * SNB Volume 2 Part 1:
> -    *  "1.6.5.3 VUE Allocation (GS, CLIP) [DevIL]"
> -    *  "1.6.5.4 VUE Allocation (GS) [DevSNB+]"
> -    *     "The threads are not passed an initial handle.
> -    *     Instead, they request a first handle (if any)
> -    *     via the URB shared function’s FF_SYNC message (see Shared Functions).
> -    *     If additional handles are required,
> -    *     the URB_WRITE allocate mechanism (mentioned above) is used."
> -    *
> -    * So for ILK and for SNB we must use only UNUSED flag.
> -    * This is accepteble combination according to:
> -    *    SNB Volume 4 Part 2:
> -    *       "2.4.2 Message Descriptor"
> -    *          "Table lists the valid and invalid combinations of
> -    *           the Complete, Used, Allocate and EOT bits"
> -    *          "Thread terminate non-write of URB"
> -    *    SNB Volume 2 Part 1:
> -    *       "1.6.5.6 Thread Termination"
> +    * which works for both cases by setting the COMPLETE and UNUSED flags in
> +    * the EOT message.
>       */
>      this->current_annotation = "gen6 thread end: EOT";
>   
> @@ -504,26 +476,12 @@ gen6_gs_visitor::emit_thread_end()
>         emit(GS_OPCODE_SET_DWORD_2, dst_reg(MRF, base_mrf), data);
>      }
>   
> -   vec4_instruction *inst = emit(GS_OPCODE_THREAD_END);
> +   inst = emit(GS_OPCODE_THREAD_END);
>      inst->urb_write_flags = BRW_URB_WRITE_COMPLETE | BRW_URB_WRITE_UNUSED;
>      inst->base_mrf = base_mrf;
>      inst->mlen = 1;
> -
> -   if(!common_eot_approach_can_be_used)
> -   {
> -      emit(BRW_OPCODE_ELSE);
> -
> -      this->current_annotation = "gen6 thread end: EOT";
> -
> -      vec4_instruction *unused_urb_inst = emit(GS_OPCODE_THREAD_END);
> -      unused_urb_inst->urb_write_flags = BRW_URB_WRITE_UNUSED;
> -      unused_urb_inst->base_mrf = base_mrf;
> -      unused_urb_inst->mlen = 1;
> -
> -      emit(BRW_OPCODE_ENDIF);
> -   }
>   }
> -
> +
>   void
>   gen6_gs_visitor::setup_payload()
>   {
Hi Andrii,


thanks for verifying my suggestion and sending the new patch.

However, this patch is the diff against your previous patch, please
merge both patches into a single patch so we get a single patch with
all the changes against current master. Once we have that I'll run the
resulting patch through Intel's CI system to ensure that this doesn't
break anything in our test suites.

I also add a few addition comments below:

On Thu, 2018-06-21 at 16:40 +0300, Andrii Simiklit wrote:
> We can not use the VUE Dereference flags combination for EOT
> message under ILK and SNB because the threads are not initialized
> there with initial VUE handle unlike Pre-IL.
> So to avoid GPU hangs on SNB and ILK we need
> to avoid usage of the VUE Dereference flags combination.
> (Was tested only on SNB but according to the specification
> SNB Volume 2 Part 1: 1.6.5.3, 1.6.5.6
> the ILK must behave itself in the similar way)
> 
> v2: Approach to fix this issue was changed.
> Instead of different EOT flags in the program end
> we will create VUE every time even if GS produces no output.
> 
> Signed-off-by: Andrii Simiklit <andrii.simiklit@globallogic.com>
> ---
>  src/intel/compiler/gen6_gs_visitor.cpp | 88 +++++++++---------------
> ----------
>  1 file changed, 23 insertions(+), 65 deletions(-)
> 
> diff --git a/src/intel/compiler/gen6_gs_visitor.cpp
> b/src/intel/compiler/gen6_gs_visitor.cpp
> index ac3ba55..b831d33 100644
> --- a/src/intel/compiler/gen6_gs_visitor.cpp
> +++ b/src/intel/compiler/gen6_gs_visitor.cpp
> @@ -300,11 +300,10 @@ gen6_gs_visitor::emit_urb_write_opcode(bool
> complete, int base_mrf,
>        /* Otherwise we always request to allocate a new VUE handle.
> If this is
>         * the last write before the EOT message and the new handle
> never gets
>         * used it will be dereferenced when we send the EOT message.
> This is
> -       * necessary to avoid different setups (under Pre-IL only) for
> the EOT message (one for the
> +       * necessary to avoid different setups for the EOT message
> (one for the
>         * case when there is no output and another for the case when
> there is)
>         * which would require to end the program with an
> IF/ELSE/ENDIF block,
> -       * something we do not want. 
> -       * But for ILK and SNB we can not avoid the end the program
> with an IF/ELSE/ENDIF block.
> +       * something we do not want.
>         */
>        inst = emit(GS_OPCODE_URB_WRITE_ALLOCATE);
>        inst->urb_write_flags = BRW_URB_WRITE_COMPLETE;
> @@ -351,27 +350,27 @@ gen6_gs_visitor::emit_thread_end()
>     int max_usable_mrf = FIRST_SPILL_MRF(devinfo->gen);
>  
>     /* Issue the FF_SYNC message and obtain the initial VUE handle.
> */
> -   emit(CMP(dst_null_ud(), this->vertex_count, brw_imm_ud(0u),
> BRW_CONDITIONAL_G));
> -   emit(IF(BRW_PREDICATE_NORMAL));
> -   {
> -      this->current_annotation = "gen6 thread end: ff_sync";
> +   this->current_annotation = "gen6 thread end: ff_sync";
>  
> -      vec4_instruction *inst;
> -      if (prog->info.has_transform_feedback_varyings) {
> +   vec4_instruction *inst = NULL;
> +   if (prog->info.has_transform_feedback_varyings) {
>           src_reg sol_temp(this, glsl_type::uvec4_type);
>           emit(GS_OPCODE_FF_SYNC_SET_PRIMITIVES,

Since you are removing one level of indentation here, please re-indent
the code inside the if accordingly (the coding style is 3 blank spaces
for each indentation level).

> 
> -              dst_reg(this->svbi),
> -              this->vertex_count,
> -              this->prim_count,
> -              sol_temp);
> +               dst_reg(this->svbi),
> +               this->vertex_count,
> +               this->prim_count,
> +               sol_temp);
>           inst = emit(GS_OPCODE_FF_SYNC,
>                       dst_reg(this->temp), this->prim_count, this-
> >svbi);
> -      } else {
> +   } else {
>           inst = emit(GS_OPCODE_FF_SYNC,
>                       dst_reg(this->temp), this->prim_count,
> brw_imm_ud(0u));

Same inside the else.

> -      }
> -      inst->base_mrf = base_mrf;
> +   }
> +   inst->base_mrf = base_mrf;
>  
> +   emit(CMP(dst_null_ud(), this->vertex_count, brw_imm_ud(0u),
> BRW_CONDITIONAL_G));
> +   emit(IF(BRW_PREDICATE_NORMAL));
> +   {
>        /* Loop over all buffered vertices and emit URB write messages
> */
>        this->current_annotation = "gen6 thread end: urb writes init";
>        src_reg vertex(this, glsl_type::uint_type);
> @@ -415,7 +414,7 @@ gen6_gs_visitor::emit_thread_end()
>                 dst_reg reg = dst_reg(MRF, mrf);
>                 reg.type = output_reg[varying][0].type;
>                 data.type = reg.type;
> -               vec4_instruction *inst = emit(MOV(reg, data));
> +               inst = emit(MOV(reg, data));
>                 inst->force_writemask_all = true;
>  
>                 mrf++;
> @@ -450,11 +449,8 @@ gen6_gs_visitor::emit_thread_end()
>        if (prog->info.has_transform_feedback_varyings)
>           xfb_write();
>     }
> -   const bool common_eot_approach_can_be_used = (devinfo->gen < 5);
> -   if(common_eot_approach_can_be_used)
> -   {
> -      emit(BRW_OPCODE_ENDIF);  
> -   }
> +   emit(BRW_OPCODE_ENDIF);
> +
>     /* Finally, emit EOT message.
>      *
>      * In gen6 we need to end the thread differently depending on
> whether we have
> @@ -464,35 +460,11 @@ gen6_gs_visitor::emit_thread_end()
>      *
>      * However, this would lead us to end the program with an ENDIF
> opcode,
>      * which we want to avoid, so what we do is that we always
> request a new
> -    * VUE handle every time we do a URB WRITE, even for the last
> vertex we emit.
> +    * VUE handle every time, even if GS produces no output.
>      * With this we make sure that whether we have emitted at least
> one vertex
>      * or none at all, we have to finish the thread without writing
> to the URB,
> -    * which works for both cases (but only under Pre-IL) by setting 
> -    * the COMPLETE and UNUSED flags in the EOT message.
> -    * 
> -    * But under ILK or SNB we must not use combination COMPLETE and
> UNUSED 
> -    * because this combination could be used only for already
> allocated VUE. 
> -    * But unlike Pre-IL in the ILK and SNB 
> -    * the initial VUE is not passed to threads. 
> -    * This behaver mentioned in specification: 
> -    * SNB Volume 2 Part 1:
> -    *  "1.6.5.3 VUE Allocation (GS, CLIP) [DevIL]"
> -    *  "1.6.5.4 VUE Allocation (GS) [DevSNB+]"
> -    *     "The threads are not passed an initial handle.  
> -    *     Instead, they request a first handle (if any) 
> -    *     via the URB shared function’s FF_SYNC message (see Shared
> Functions). 
> -    *     If additional handles are required, 
> -    *     the URB_WRITE allocate mechanism (mentioned above) is
> used."
> -    * 
> -    * So for ILK and for SNB we must use only UNUSED flag.
> -    * This is accepteble combination according to:
> -    *    SNB Volume 4 Part 2:
> -    *       "2.4.2 Message Descriptor"
> -    *          "Table lists the valid and invalid combinations of 
> -    *           the Complete, Used, Allocate and EOT bits"
> -    *          "Thread terminate non-write of URB"
> -    *    SNB Volume 2 Part 1:
> -    *       "1.6.5.6 Thread Termination"
> +    * which works for both cases by setting the COMPLETE and UNUSED
> flags in
> +    * the EOT message.
>      */
>     this->current_annotation = "gen6 thread end: EOT";
>  
> @@ -504,26 +476,12 @@ gen6_gs_visitor::emit_thread_end()
>        emit(GS_OPCODE_SET_DWORD_2, dst_reg(MRF, base_mrf), data);
>     }
>  
> -   vec4_instruction *inst = emit(GS_OPCODE_THREAD_END);
> +   inst = emit(GS_OPCODE_THREAD_END);
>     inst->urb_write_flags = BRW_URB_WRITE_COMPLETE |
> BRW_URB_WRITE_UNUSED;
>     inst->base_mrf = base_mrf;
>     inst->mlen = 1;
> -   
> -   if(!common_eot_approach_can_be_used)
> -   {
> -      emit(BRW_OPCODE_ELSE);
> -      
> -      this->current_annotation = "gen6 thread end: EOT";
> -
> -      vec4_instruction *unused_urb_inst =
> emit(GS_OPCODE_THREAD_END);
> -      unused_urb_inst->urb_write_flags = BRW_URB_WRITE_UNUSED;
> -      unused_urb_inst->base_mrf = base_mrf;
> -      unused_urb_inst->mlen = 1;
> -
> -      emit(BRW_OPCODE_ENDIF);  
> -   }
>  }
> -   
> +

Remove this spurious change here.

Iago

>  void
>  gen6_gs_visitor::setup_payload()
>  {