i965/gen6/gs: Handle case where a GS doesn't allocate VUE

Submitted by andrey simiklit on June 19, 2018, 2:06 p.m.

Details

Message ID 1529417184-10175-1-git-send-email-andrii.simiklit@globallogic.com
State New
Headers show
Series "i965/gen6/gs: Handle case where a GS doesn't allocate VUE" ( rev: 1 ) in Mesa

Not browsing as part of any series.

Commit Message

andrey simiklit June 19, 2018, 2:06 p.m.
We can not use the VUE Dereference flags combination for EOT
message under ILK and SNB because the threads are not initialized
there with initial VUE handle unlike Pre-IL.
So to avoid GPU hangs on SNB and ILK we need
to avoid usage of the VUE Dereference flags combination.
(Was tested only on SNB but according to the specification
SNB Volume 2 Part 1: 1.6.5.3, 1.6.5.6
the ILK must behave itself in the similar way)

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105399

Signed-off-by: Andrii Simiklit <andrii.simiklit@globallogic.com>
---
 src/intel/compiler/gen6_gs_visitor.cpp | 56 +++++++++++++++++++++++++++++-----
 1 file changed, 49 insertions(+), 7 deletions(-)

Patch hide | download patch | download mbox

diff --git a/src/intel/compiler/gen6_gs_visitor.cpp b/src/intel/compiler/gen6_gs_visitor.cpp
index 66c69fb..ac3ba55 100644
--- a/src/intel/compiler/gen6_gs_visitor.cpp
+++ b/src/intel/compiler/gen6_gs_visitor.cpp
@@ -300,10 +300,11 @@  gen6_gs_visitor::emit_urb_write_opcode(bool complete, int base_mrf,
       /* Otherwise we always request to allocate a new VUE handle. If this is
        * the last write before the EOT message and the new handle never gets
        * used it will be dereferenced when we send the EOT message. This is
-       * necessary to avoid different setups for the EOT message (one for the
+       * necessary to avoid different setups (under Pre-IL only) for the EOT message (one for the
        * case when there is no output and another for the case when there is)
        * which would require to end the program with an IF/ELSE/ENDIF block,
-       * something we do not want.
+       * something we do not want. 
+       * But for ILK and SNB we can not avoid the end the program with an IF/ELSE/ENDIF block.
        */
       inst = emit(GS_OPCODE_URB_WRITE_ALLOCATE);
       inst->urb_write_flags = BRW_URB_WRITE_COMPLETE;
@@ -449,8 +450,11 @@  gen6_gs_visitor::emit_thread_end()
       if (prog->info.has_transform_feedback_varyings)
          xfb_write();
    }
-   emit(BRW_OPCODE_ENDIF);
-
+   const bool common_eot_approach_can_be_used = (devinfo->gen < 5);
+   if(common_eot_approach_can_be_used)
+   {
+      emit(BRW_OPCODE_ENDIF);  
+   }
    /* Finally, emit EOT message.
     *
     * In gen6 we need to end the thread differently depending on whether we have
@@ -463,8 +467,32 @@  gen6_gs_visitor::emit_thread_end()
     * VUE handle every time we do a URB WRITE, even for the last vertex we emit.
     * With this we make sure that whether we have emitted at least one vertex
     * or none at all, we have to finish the thread without writing to the URB,
-    * which works for both cases by setting the COMPLETE and UNUSED flags in
-    * the EOT message.
+    * which works for both cases (but only under Pre-IL) by setting 
+    * the COMPLETE and UNUSED flags in the EOT message.
+    * 
+    * But under ILK or SNB we must not use combination COMPLETE and UNUSED 
+    * because this combination could be used only for already allocated VUE. 
+    * But unlike Pre-IL in the ILK and SNB 
+    * the initial VUE is not passed to threads. 
+    * This behaver mentioned in specification: 
+    * SNB Volume 2 Part 1:
+    *  "1.6.5.3 VUE Allocation (GS, CLIP) [DevIL]"
+    *  "1.6.5.4 VUE Allocation (GS) [DevSNB+]"
+    *     "The threads are not passed an initial handle.  
+    *     Instead, they request a first handle (if any) 
+    *     via the URB shared function’s FF_SYNC message (see Shared Functions). 
+    *     If additional handles are required, 
+    *     the URB_WRITE allocate mechanism (mentioned above) is used."
+    * 
+    * So for ILK and for SNB we must use only UNUSED flag.
+    * This is accepteble combination according to:
+    *    SNB Volume 4 Part 2:
+    *       "2.4.2 Message Descriptor"
+    *          "Table lists the valid and invalid combinations of 
+    *           the Complete, Used, Allocate and EOT bits"
+    *          "Thread terminate non-write of URB"
+    *    SNB Volume 2 Part 1:
+    *       "1.6.5.6 Thread Termination"
     */
    this->current_annotation = "gen6 thread end: EOT";
 
@@ -480,8 +508,22 @@  gen6_gs_visitor::emit_thread_end()
    inst->urb_write_flags = BRW_URB_WRITE_COMPLETE | BRW_URB_WRITE_UNUSED;
    inst->base_mrf = base_mrf;
    inst->mlen = 1;
-}
+   
+   if(!common_eot_approach_can_be_used)
+   {
+      emit(BRW_OPCODE_ELSE);
+      
+      this->current_annotation = "gen6 thread end: EOT";
+
+      vec4_instruction *unused_urb_inst = emit(GS_OPCODE_THREAD_END);
+      unused_urb_inst->urb_write_flags = BRW_URB_WRITE_UNUSED;
+      unused_urb_inst->base_mrf = base_mrf;
+      unused_urb_inst->mlen = 1;
 
+      emit(BRW_OPCODE_ENDIF);  
+   }
+}
+   
 void
 gen6_gs_visitor::setup_payload()
 {

Comments

On Tue, 2018-06-19 at 17:06 +0300, Andrii Simiklit wrote:
> We can not use the VUE Dereference flags combination for EOT
> message under ILK and SNB because the threads are not initialized
> there with initial VUE handle unlike Pre-IL.
> So to avoid GPU hangs on SNB and ILK we need
> to avoid usage of the VUE Dereference flags combination.
> (Was tested only on SNB but according to the specification
> SNB Volume 2 Part 1: 1.6.5.3, 1.6.5.6
> the ILK must behave itself in the similar way)
> 
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105399
> 
> Signed-off-by: Andrii Simiklit <andrii.simiklit@globallogic.com>
> ---
>  src/intel/compiler/gen6_gs_visitor.cpp | 56
> +++++++++++++++++++++++++++++-----
>  1 file changed, 49 insertions(+), 7 deletions(-)
> 
> diff --git a/src/intel/compiler/gen6_gs_visitor.cpp
> b/src/intel/compiler/gen6_gs_visitor.cpp
> index 66c69fb..ac3ba55 100644
> --- a/src/intel/compiler/gen6_gs_visitor.cpp
> +++ b/src/intel/compiler/gen6_gs_visitor.cpp
> @@ -300,10 +300,11 @@ gen6_gs_visitor::emit_urb_write_opcode(bool
> complete, int base_mrf,
>        /* Otherwise we always request to allocate a new VUE handle.
> If this is
>         * the last write before the EOT message and the new handle
> never gets
>         * used it will be dereferenced when we send the EOT message.
> This is
> -       * necessary to avoid different setups for the EOT message
> (one for the
> +       * necessary to avoid different setups (under Pre-IL only) for
> the EOT message (one for the
>         * case when there is no output and another for the case when
> there is)
>         * which would require to end the program with an
> IF/ELSE/ENDIF block,
> -       * something we do not want.
> +       * something we do not want. 
> +       * But for ILK and SNB we can not avoid the end the program
> with an IF/ELSE/ENDIF block.
>         */
>        inst = emit(GS_OPCODE_URB_WRITE_ALLOCATE);
>        inst->urb_write_flags = BRW_URB_WRITE_COMPLETE;
> @@ -449,8 +450,11 @@ gen6_gs_visitor::emit_thread_end()
>        if (prog->info.has_transform_feedback_varyings)
>           xfb_write();
>     }
> -   emit(BRW_OPCODE_ENDIF);
> -
> +   const bool common_eot_approach_can_be_used = (devinfo->gen < 5);

We don't implement GS before gen6, and I don't think there are plans
for it at this point, so I think we can just simplify the patch by
assuming that devinfo->gen is always going to be 6 here (later gens use
a different implementation of GS).

> +   if(common_eot_approach_can_be_used)
> +   {
> +      emit(BRW_OPCODE_ENDIF);  
> +   }
>     /* Finally, emit EOT message.
>      *
>      * In gen6 we need to end the thread differently depending on
> whether we have
> @@ -463,8 +467,32 @@ gen6_gs_visitor::emit_thread_end()
>      * VUE handle every time we do a URB WRITE, even for the last
> vertex we emit.
>      * With this we make sure that whether we have emitted at least
> one vertex
>      * or none at all, we have to finish the thread without writing
> to the URB,
> -    * which works for both cases by setting the COMPLETE and UNUSED
> flags in
> -    * the EOT message.
> +    * which works for both cases (but only under Pre-IL) by setting 
> +    * the COMPLETE and UNUSED flags in the EOT message.
> +    * 
> +    * But under ILK or SNB we must not use combination COMPLETE and
> UNUSED 
> +    * because this combination could be used only for already
> allocated VUE. 
> +    * But unlike Pre-IL in the ILK and SNB 
> +    * the initial VUE is not passed to threads. 
> +    * This behaver mentioned in specification: 
> +    * SNB Volume 2 Part 1:
> +    *  "1.6.5.3 VUE Allocation (GS, CLIP) [DevIL]"
> +    *  "1.6.5.4 VUE Allocation (GS) [DevSNB+]"
> +    *     "The threads are not passed an initial handle.  
> +    *     Instead, they request a first handle (if any) 
> +    *     via the URB shared function’s FF_SYNC message (see Shared
> Functions). 
> +    *     If additional handles are required, 
> +    *     the URB_WRITE allocate mechanism (mentioned above) is
> used."
> +    * 
> +    * So for ILK and for SNB we must use only UNUSED flag.
> +    * This is accepteble combination according to:
> +    *    SNB Volume 4 Part 2:
> +    *       "2.4.2 Message Descriptor"
> +    *          "Table lists the valid and invalid combinations of 
> +    *           the Complete, Used, Allocate and EOT bits"
> +    *          "Thread terminate non-write of URB"
> +    *    SNB Volume 2 Part 1:
> +    *       "1.6.5.6 Thread Termination"
>      */

I am not sure why you conclude all this from the PRM. This is what I
see:

Section 1.6.5.5 VUE Dereference (GS) (vol2, part1) says:

"It is possible and legal for a thread to produce no output
 or subsequently allocate a destination VUE that 
 was not required (e.g., the thread allocated ahead). 
 Therefore, there is a mechanism by which a thread can “give back”  
 (dereference) an a llocated VUE.  This mechanism must  be used if   
 the  VUE is not written before the thread terminates.  A  kernel can 
 explicitly dereference a VUE by issuing a URB_WRITE message 
 (specifying the to-be-dereference handle) with the Complete 
 bit set and the Used bit clear."

This is explicitly saying that COMPLETE + UNUSED is a valid
combination, and one that is in fact created for this very purpose.
Nothing in that text states that this is Pre-ILK or that this is only
for thread pre-allocated VUEs alone.

Then in 2.4.2 Message Descriptor (vol4, part2), it says:

" Used: 
  If set, this signals that the URB entry(s) referenced by
  the handle(s) are valid outputs of the thread.  In 
  all likelihood this means that that entry(s) contains
  complete & valid data to be subject to further 
  processing by the pipeline.   
  If clear, this signals that the URB entry(s) referenced by
  the handle(s) are not valid outputs of the thread.  
  Use of this setting will result in the handle(s) 
  being immediately dereferenced by the owning FF unit.  
  This setting is to be used by GS or CLIP threads to 
  dereference handles it obtained (either in the initial 
  thread payload or subsequent allocation writebacks) 
  but subsequently determined were not required  (e.g.,
  the object was completely clipped out)."

Again, there is no mention of this being Pre-ILK only and on top of
that, the text explicitly states that this combination is used to
deference handles obtained  either in the initial thread payload or
subsequent allocation writebacks.

And finally, it also says the following:

"Complete: (...)
Programming Notes: 
The following message descriptor fields are only valid when 
Complete is set:  Used"

Which I understand means that 'Used' is only applicable when Complete
is set, or in other words, that the only possible combinations where
Used is accounted for are those in which we we also have Complete set.

So I am not sure why you understand that COMPLETE + UNUSED is pre-ILK
or for thread allocated handles only only. Could you provide a specific
pointer to the exact place in the documentation where such thing is
clearly stated?

Iago

>     this->current_annotation = "gen6 thread end: EOT";
>  
> @@ -480,8 +508,22 @@ gen6_gs_visitor::emit_thread_end()
>     inst->urb_write_flags = BRW_URB_WRITE_COMPLETE |
> BRW_URB_WRITE_UNUSED;
>     inst->base_mrf = base_mrf;
>     inst->mlen = 1;
> -}
> +   
> +   if(!common_eot_approach_can_be_used)
> +   {
> +      emit(BRW_OPCODE_ELSE);
> +      
> +      this->current_annotation = "gen6 thread end: EOT";
> +
> +      vec4_instruction *unused_urb_inst =
> emit(GS_OPCODE_THREAD_END);
> +      unused_urb_inst->urb_write_flags = BRW_URB_WRITE_UNUSED;
> +      unused_urb_inst->base_mrf = base_mrf;
> +      unused_urb_inst->mlen = 1;
>  
> +      emit(BRW_OPCODE_ENDIF);  
> +   }
> +}
> +   
>  void
>  gen6_gs_visitor::setup_payload()
>  {
Hello, Thanks for your feedback.

> We don't implement GS before gen6, and I don't think there are plans
> for it at this point, so I think we can just simplify the patch by
> assuming that devinfo->gen is always going to be 6 here (later gens use
> a different implementation of GS).
Got it. I will fix it as soon as we validate this idea)
> Section 1.6.5.5 VUE Dereference (GS) (vol2, part1) says:
>
> "It is possible and legal for a thread to produce no output
>   or subsequently allocate a destination VUE that
>   was not required*(e.g., the thread allocated ahead)*.
>   *Therefore, there is a mechanism by which a thread can “give back” 
> (dereference) **an allocated VUE*.  This mechanism must  be used if
>   the  VUE is not written before the thread terminates.  A  kernel can
>   explicitly dereference a VUE by issuing a URB_WRITE message
>   (specifying the to-be-dereference handle) with the Complete
>   bit set and the Used bit clear."
>
> This is explicitly saying that COMPLETE + UNUSED is a valid
> combination, and one that is in fact created for this very purpose.
> Nothing in that text states that this is Pre-ILK or that this is only
> for thread pre-allocated VUEs alone.
Yes I agree that it is valid combination but this is valid only for an 
allocated VUE (e.g., the thread allocated ahead). As far as I 
understand, this line explicitly saying that this combination only for 
an allocated VUEs: " Therefore, there is a mechanism by which a thread 
can 'give back' (dereference) *an allocated VUE.* "

So according to that and to following section:  Section 1.6.5.4 VUE Allocation:
  " The following description is applicable only to the GS stage.
    The threads are not passed an initial handle.
    In stead, they request a first handle (if any) via the URB
    shared function’s FF_SYNC message (see Shared Functions).
    If additional handles are required,
    the URB_WRITE allocate mechanism (mentioned above) is used."If GS doesn't allocate/request VUEs then GS shouldn't use the 
Dereference (COMPLETE + UNUSED) message. So when GS produces no output 
GS doesn't allocate VUEs at all and GS shouldn't use Dereference message.
> Then in 2.4.2 Message Descriptor (vol4, part2), it says:
>
> " Used:
>    If set, this signals that the URB entry(s) referenced by
>    the handle(s) are valid outputs of the thread.  In
>    all likelihood this means that that entry(s) contains
>    complete & valid data to be subject to further
>    processing by the pipeline.
>    If clear, this signals that the URB entry(s) referenced by
>    the handle(s) are not valid outputs of the thread.
>    Use of this setting will result in the handle(s)
>    being immediately*dereferenced*  by the owning FF unit.
>    This setting is to be used by GS or CLIP threads to
>    dereference handles it obtained (either in the initial
>    thread payload or subsequent allocation writebacks)
>    but subsequently determined were not required  (e.g.,
>    the object was completely clipped out)."
>
> Again, there is no mention of this being Pre-ILK only and on top of
> that, the text explicitly states that this combination is used to
> deference handles obtained  either in the initial thread payload or
> subsequent allocation writebacks.

So, according to section 1.6.5.5 VUE Dereference (GS) (vol2, part1)
this combination is the Dereference operation and we shouldn't use it
in case GS produces no output (see my first comment above).

> And finally, it also says the following:
>
> "Complete: (...)
> Programming Notes:
> The following message descriptor fields are only valid when
> Complete is set:  Used"
>
> Which I understand means that 'Used' is only applicable when Complete
> is set, or in other words, that the only possible combinations where
> Used is accounted for are those in which we we also have Complete set.

According to
Section 1.6.5.6 Thread Termination (vol2, part1)
  " All threads must explicitly terminate
    by executing a SEND instruction
    with the EOT bit set.  (See EU chapters).
    When a thread spawned by a 3D FF unit terminates,
    the spawning FF unit detects
    this termination as a part of Thread Management.
    This allows the FF units to manage the number of
    concurrent threads it has spawned and also manage
    the resources (e.g., scratch space) allocated to those threads.

    Programming Note:*[Pre-DevIL]*  GS and Clip threads*must terminate ****by sending a URB_WRITE message (with EOT set) with the Complete bit*  also
    set (therein returning a URB handle marked as either used or un-used). "

Only Pre-DevIL architectures must specify the Complete=1.

And finally according to all comments above we shouldn't use
the Dereference operation for the no output case and we know
that we able to set Complete=0 because the Complete=1 value is mandatory only for Pre-DevIL.

I hope, that I was understandable) Could you please let me know if you agree with me)

Regards, Andrii.


On 20.06.18 15:19, Iago Toral wrote:
> On Tue, 2018-06-19 at 17:06 +0300, Andrii Simiklit wrote:
>> We can not use the VUE Dereference flags combination for EOT
>> message under ILK and SNB because the threads are not initialized
>> there with initial VUE handle unlike Pre-IL.
>> So to avoid GPU hangs on SNB and ILK we need
>> to avoid usage of the VUE Dereference flags combination.
>> (Was tested only on SNB but according to the specification
>> SNB Volume 2 Part 1: 1.6.5.3, 1.6.5.6
>> the ILK must behave itself in the similar way)
>>
>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105399
>>
>> Signed-off-by: Andrii Simiklit <andrii.simiklit@globallogic.com>
>> ---
>>   src/intel/compiler/gen6_gs_visitor.cpp | 56
>> +++++++++++++++++++++++++++++-----
>>   1 file changed, 49 insertions(+), 7 deletions(-)
>>
>> diff --git a/src/intel/compiler/gen6_gs_visitor.cpp
>> b/src/intel/compiler/gen6_gs_visitor.cpp
>> index 66c69fb..ac3ba55 100644
>> --- a/src/intel/compiler/gen6_gs_visitor.cpp
>> +++ b/src/intel/compiler/gen6_gs_visitor.cpp
>> @@ -300,10 +300,11 @@ gen6_gs_visitor::emit_urb_write_opcode(bool
>> complete, int base_mrf,
>>         /* Otherwise we always request to allocate a new VUE handle.
>> If this is
>>          * the last write before the EOT message and the new handle
>> never gets
>>          * used it will be dereferenced when we send the EOT message.
>> This is
>> -       * necessary to avoid different setups for the EOT message
>> (one for the
>> +       * necessary to avoid different setups (under Pre-IL only) for
>> the EOT message (one for the
>>          * case when there is no output and another for the case when
>> there is)
>>          * which would require to end the program with an
>> IF/ELSE/ENDIF block,
>> -       * something we do not want.
>> +       * something we do not want.
>> +       * But for ILK and SNB we can not avoid the end the program
>> with an IF/ELSE/ENDIF block.
>>          */
>>         inst = emit(GS_OPCODE_URB_WRITE_ALLOCATE);
>>         inst->urb_write_flags = BRW_URB_WRITE_COMPLETE;
>> @@ -449,8 +450,11 @@ gen6_gs_visitor::emit_thread_end()
>>         if (prog->info.has_transform_feedback_varyings)
>>            xfb_write();
>>      }
>> -   emit(BRW_OPCODE_ENDIF);
>> -
>> +   const bool common_eot_approach_can_be_used = (devinfo->gen < 5);
> We don't implement GS before gen6, and I don't think there are plans
> for it at this point, so I think we can just simplify the patch by
> assuming that devinfo->gen is always going to be 6 here (later gens use
> a different implementation of GS).
>
>> +   if(common_eot_approach_can_be_used)
>> +   {
>> +      emit(BRW_OPCODE_ENDIF);
>> +   }
>>      /* Finally, emit EOT message.
>>       *
>>       * In gen6 we need to end the thread differently depending on
>> whether we have
>> @@ -463,8 +467,32 @@ gen6_gs_visitor::emit_thread_end()
>>       * VUE handle every time we do a URB WRITE, even for the last
>> vertex we emit.
>>       * With this we make sure that whether we have emitted at least
>> one vertex
>>       * or none at all, we have to finish the thread without writing
>> to the URB,
>> -    * which works for both cases by setting the COMPLETE and UNUSED
>> flags in
>> -    * the EOT message.
>> +    * which works for both cases (but only under Pre-IL) by setting
>> +    * the COMPLETE and UNUSED flags in the EOT message.
>> +    *
>> +    * But under ILK or SNB we must not use combination COMPLETE and
>> UNUSED
>> +    * because this combination could be used only for already
>> allocated VUE.
>> +    * But unlike Pre-IL in the ILK and SNB
>> +    * the initial VUE is not passed to threads.
>> +    * This behaver mentioned in specification:
>> +    * SNB Volume 2 Part 1:
>> +    *  "1.6.5.3 VUE Allocation (GS, CLIP) [DevIL]"
>> +    *  "1.6.5.4 VUE Allocation (GS) [DevSNB+]"
>> +    *     "The threads are not passed an initial handle.
>> +    *     Instead, they request a first handle (if any)
>> +    *     via the URB shared function’s FF_SYNC message (see Shared
>> Functions).
>> +    *     If additional handles are required,
>> +    *     the URB_WRITE allocate mechanism (mentioned above) is
>> used."
>> +    *
>> +    * So for ILK and for SNB we must use only UNUSED flag.
>> +    * This is accepteble combination according to:
>> +    *    SNB Volume 4 Part 2:
>> +    *       "2.4.2 Message Descriptor"
>> +    *          "Table lists the valid and invalid combinations of
>> +    *           the Complete, Used, Allocate and EOT bits"
>> +    *          "Thread terminate non-write of URB"
>> +    *    SNB Volume 2 Part 1:
>> +    *       "1.6.5.6 Thread Termination"
>>       */
> I am not sure why you conclude all this from the PRM. This is what I
> see:
>
> Section 1.6.5.5 VUE Dereference (GS) (vol2, part1) says:
>
> "It is possible and legal for a thread to produce no output
>   or subsequently allocate a destination VUE that
>   was not required (e.g., the thread allocated ahead).
>   Therefore, there is a mechanism by which a thread can “give back”
>   (dereference) an a llocated VUE.  This mechanism must  be used if
>   the  VUE is not written before the thread terminates.  A  kernel can
>   explicitly dereference a VUE by issuing a URB_WRITE message
>   (specifying the to-be-dereference handle) with the Complete
>   bit set and the Used bit clear."
>
> This is explicitly saying that COMPLETE + UNUSED is a valid
> combination, and one that is in fact created for this very purpose.
> Nothing in that text states that this is Pre-ILK or that this is only
> for thread pre-allocated VUEs alone.
>
> Then in 2.4.2 Message Descriptor (vol4, part2), it says:
>
> " Used:
>    If set, this signals that the URB entry(s) referenced by
>    the handle(s) are valid outputs of the thread.  In
>    all likelihood this means that that entry(s) contains
>    complete & valid data to be subject to further
>    processing by the pipeline.
>    If clear, this signals that the URB entry(s) referenced by
>    the handle(s) are not valid outputs of the thread.
>    Use of this setting will result in the handle(s)
>    being immediately dereferenced by the owning FF unit.
>    This setting is to be used by GS or CLIP threads to
>    dereference handles it obtained (either in the initial
>    thread payload or subsequent allocation writebacks)
>    but subsequently determined were not required  (e.g.,
>    the object was completely clipped out)."
>
> Again, there is no mention of this being Pre-ILK only and on top of
> that, the text explicitly states that this combination is used to
> deference handles obtained  either in the initial thread payload or
> subsequent allocation writebacks.
>
> And finally, it also says the following:
>
> "Complete: (...)
> Programming Notes:
> The following message descriptor fields are only valid when
> Complete is set:  Used"
>
> Which I understand means that 'Used' is only applicable when Complete
> is set, or in other words, that the only possible combinations where
> Used is accounted for are those in which we we also have Complete set.
>
> So I am not sure why you understand that COMPLETE + UNUSED is pre-ILK
> or for thread allocated handles only only. Could you provide a specific
> pointer to the exact place in the documentation where such thing is
> clearly stated?
>
> Iago
>
>>      this->current_annotation = "gen6 thread end: EOT";
>>   
>> @@ -480,8 +508,22 @@ gen6_gs_visitor::emit_thread_end()
>>      inst->urb_write_flags = BRW_URB_WRITE_COMPLETE |
>> BRW_URB_WRITE_UNUSED;
>>      inst->base_mrf = base_mrf;
>>      inst->mlen = 1;
>> -}
>> +
>> +   if(!common_eot_approach_can_be_used)
>> +   {
>> +      emit(BRW_OPCODE_ELSE);
>> +
>> +      this->current_annotation = "gen6 thread end: EOT";
>> +
>> +      vec4_instruction *unused_urb_inst =
>> emit(GS_OPCODE_THREAD_END);
>> +      unused_urb_inst->urb_write_flags = BRW_URB_WRITE_UNUSED;
>> +      unused_urb_inst->base_mrf = base_mrf;
>> +      unused_urb_inst->mlen = 1;
>>   
>> +      emit(BRW_OPCODE_ENDIF);
>> +   }
>> +}
>> +
>>   void
>>   gen6_gs_visitor::setup_payload()
>>   {
On Wed, 2018-06-20 at 18:23 +0300, andrii.simiklit wrote:
>     Hello,
> 
> Thanks for your feedback.
> 
> 
>     
> > We don't implement GS before gen6, and I don't think there are
> > plans
> > for it at this point, so I think we can just simplify the patch by
> > assuming that devinfo->gen is always going to be 6 here (later gens
> > use
> > a different implementation of GS).
> 
> Got it. I will fix it as soon as we validate this idea)
> 
> > Section 1.6.5.5 VUE Dereference (GS) (vol2, part1) says:
> > 
> > "It is possible and legal for a thread to produce no output
> >  or subsequently allocate a destination VUE that 
> >  was not required (e.g., the thread allocated ahead). 
> >  Therefore, there is a mechanism by which a thread can “give
> > back”  
> >  (dereference) an allocated VUE.  This mechanism must  be used
> > if   
> >  the  VUE is not written before the thread terminates.  A  kernel
> > can 
> >  explicitly dereference a VUE by issuing a URB_WRITE message 
> >  (specifying the to-be-dereference handle) with the Complete 
> >  bit set and the Used bit clear."
> > 
> > This is explicitly saying that COMPLETE + UNUSED is a valid
> > combination, and one that is in fact created for this very purpose.
> > Nothing in that text states that this is Pre-ILK or that this is
> > only
> > for thread pre-allocated VUEs alone.
> 
> Yes I agree that it is valid combination but this is valid only for 
> an allocated VUE (e.g., the thread allocated ahead). 
> As far as I understand, this line explicitly saying that this
> combination 
> only for an allocated VUEs:
>    " Therefore, there is a mechanism by which a thread can 'give
> back' 
>      (dereference) an allocated VUE. "
> So according to that and to following section:

Ok, then we are in agreement.
>  Section 1.6.5.4 VUE Allocation:
>  " The following description is applicable only to the GS stage. 
>    The threads are not passed an initial handle.  
>    In stead, they request a first handle (if any) via the URB 
>    shared function’s FF_SYNC message (see Shared Functions).  
>    If additional handles are required, 
>    the URB_WRITE allocate mechanism (mentioned above) is used."
> 
> If GS doesn't allocate/request VUEs then GS shouldn't use 
> the Dereference (COMPLETE + UNUSED) message. 
> So when GS produces no output GS doesn't allocate VUEs at all
> and GS shouldn't use Dereference message.

Agreed as well. But do notice that none of this is pre-ILK as far as
the documentation goes, it is the same across all supported platforms
up to SNB.
Now, we agree that the issue here is that we should not be
dereferencing handles that have not been allocated, and that the only
situation where this can happen with the existing implementation is
when the GS emits no output at all, rigth?
The problem with your patch is the following: if my memory serves me
right,  the intended behavior for this was that we always allocate a
VUE, *even* when the GS has no output. This is because otherwise we
need to end the GS program withan IF/ELSE/ENDIF block (like you do in
this patch), and that was undesirable at the time we wrote this
according to feedback from Intel. This is clearly explained in this
comment:
   /* Finally, emit EOT message.    *    * In gen6 we need to end the
thread differently depending on whether we have    * emitted at least
one vertex or not. In case we did, the EOT message must    * always
include the COMPLETE flag or else the GPU hangs. If we have not    *
produced any output we can't use the COMPLETE flag.    *    * However,
this would lead us to end the program with an ENDIF opcode,    * which
we want to avoid, so what we do is that we always request a new    *
VUE handle every time we do a URB WRITE, even for the last vertex we
emit.    * With this we make sure that whether we have emitted at least
one vertex    * or none at all, we have to finish the thread without
writing to the URB,    * which works for both cases by setting the
COMPLETE and UNUSED flags in    * the EOT message.    */
So, unless Kenneth believes that ending the program and IF/ELSE/ENDIF
block is alright nowadays, this patch poses a problem.
With that being said, it is true that in the case of the GS emitting no
output, we are not allocating a VUE so we cannot dereference it and we
need to fix that. I think we can try the following and see if that
solves the problem:  Instead of terminating the program with an
IF/ELSE/ENDIF, we can make the FF_SYNC happen unconditionally (right
now it only happens if there is at least one vertex to emit). That
should ensure that at least one VUE is allocated when there is no
output and then we can end the program in the same fashion we have been
doung until now. According to Vol4, part2, section 2.4.4.1
FF_SYNC  Message  Header,  it is valid to specify 0 vertices/primitives
for the FF_SYNC message, so that should in theory work.
Could you see f if that fixes the problem?
Iago
> > Then in 2.4.2 Message Descriptor (vol4, part2), it says:
> > 
> > " Used: 
> >   If set, this signals that the URB entry(s) referenced by
> >   the handle(s) are valid outputs of the thread.  In 
> >   all likelihood this means that that entry(s) contains
> >   complete & valid data to be subject to further 
> >   processing by the pipeline.   
> >   If clear, this signals that the URB entry(s) referenced by
> >   the handle(s) are not valid outputs of the thread.  
> >   Use of this setting will result in the handle(s) 
> >   being immediately dereferenced by the owning FF unit.  
> >   This setting is to be used by GS or CLIP threads to 
> >   dereference handles it obtained (either in the initial 
> >   thread payload or subsequent allocation writebacks) 
> >   but subsequently determined were not required  (e.g.,
> >   the object was completely clipped out)."
> > 
> > Again, there is no mention of this being Pre-ILK only and on top of
> > that, the text explicitly states that this combination is used to
> > deference handles obtained  either in the initial thread payload or
> > subsequent allocation writebacks.
> 
> So, according to section 1.6.5.5 VUE Dereference (GS) (vol2, part1)
> this combination is the Dereference operation and we shouldn't use it
> in case GS produces no output (see my first comment above).  
> 
> > And finally, it also says the following:
> > 
> > "Complete: (...)
> > Programming Notes: 
> > The following message descriptor fields are only valid when 
> > Complete is set:  Used"
> > 
> > Which I understand means that 'Used' is only applicable when
> > Complete
> > is set, or in other words, that the only possible combinations
> > where
> > Used is accounted for are those in which we we also have Complete
> > set.
> 
> According to 
> Section 1.6.5.6 Thread Termination (vol2, part1)
>  " All threads must explicitly terminate 
>    by executing a SEND instruction 
>    with the EOT bit set.  (See EU chapters).  
>    When a thread spawned by a 3D FF unit terminates, 
>    the spawning FF unit detects 
>    this termination as a part of Thread Management.  
>    This allows the FF units to manage the number of 
>    concurrent threads it has spawned and also manage 
>    the resources (e.g., scratch space) allocated to those threads. 
> 
>    Programming Note: [Pre-DevIL] GS and Clip threads must terminate 
>    by sending a URB_WRITE message (with EOT set) with the Complete
> bit also
>    set (therein returning a URB handle marked as either used or un-
> used). "
> 
> Only Pre-DevIL architectures must specify the Complete=1.
> 
> And finally according to all comments above we shouldn't use 
> the Dereference operation for the no output case and we know 
> that we able to set Complete=0 because the Complete=1 value is
> mandatory only for Pre-DevIL.
> 
> I hope, that I was understandable) Could you please let me know if
> you agree with me)
> Regards,
> Andrii.
>     
> 
>     
>     On 20.06.18 15:19, Iago Toral
>         wrote:
> 
>     
>     
> >       On Tue, 2018-06-19 at 17:06 +0300, Andrii Simiklit wrote:
> > 
> >       
> > >         We can not use the VUE Dereference flags combination for
> > > EOT
> > > message under ILK and SNB because the threads are not initialized
> > > there with initial VUE handle unlike Pre-IL.
> > > So to avoid GPU hangs on SNB and ILK we need
> > > to avoid usage of the VUE Dereference flags combination.
> > > (Was tested only on SNB but according to the specification
> > > SNB Volume 2 Part 1: 1.6.5.3, 1.6.5.6
> > > the ILK must behave itself in the similar way)
> > > 
> > > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105399
> > > 
> > > Signed-off-by: Andrii Simiklit <andrii.simiklit@globallogic.com>
> > > ---
> > >  src/intel/compiler/gen6_gs_visitor.cpp | 56
> > > +++++++++++++++++++++++++++++-----
> > >  1 file changed, 49 insertions(+), 7 deletions(-)
> > > 
> > > diff --git a/src/intel/compiler/gen6_gs_visitor.cpp
> > > b/src/intel/compiler/gen6_gs_visitor.cpp
> > > index 66c69fb..ac3ba55 100644
> > > --- a/src/intel/compiler/gen6_gs_visitor.cpp
> > > +++ b/src/intel/compiler/gen6_gs_visitor.cpp
> > > @@ -300,10 +300,11 @@ gen6_gs_visitor::emit_urb_write_opcode(bool
> > > complete, int base_mrf,
> > >        /* Otherwise we always request to allocate a new VUE
> > > handle.
> > > If this is
> > >         * the last write before the EOT message and the new
> > > handle
> > > never gets
> > >         * used it will be dereferenced when we send the EOT
> > > message.
> > > This is
> > > -       * necessary to avoid different setups for the EOT message
> > > (one for the
> > > +       * necessary to avoid different setups (under Pre-IL only)
> > > for
> > > the EOT message (one for the
> > >         * case when there is no output and another for the case
> > > when
> > > there is)
> > >         * which would require to end the program with an
> > > IF/ELSE/ENDIF block,
> > > -       * something we do not want.
> > > +       * something we do not want. 
> > > +       * But for ILK and SNB we can not avoid the end the
> > > program
> > > with an IF/ELSE/ENDIF block.
> > >         */
> > >        inst = emit(GS_OPCODE_URB_WRITE_ALLOCATE);
> > >        inst->urb_write_flags = BRW_URB_WRITE_COMPLETE;
> > > @@ -449,8 +450,11 @@ gen6_gs_visitor::emit_thread_end()
> > >        if (prog->info.has_transform_feedback_varyings)
> > >           xfb_write();
> > >     }
> > > -   emit(BRW_OPCODE_ENDIF);
> > > -
> > > +   const bool common_eot_approach_can_be_used = (devinfo->gen <
> > > 5);
> > > 
> > >       
> > 
> >       We don't implement GS before gen6, and I don't think there
> > are plans
> > for it at this point, so I think we can just simplify the patch by
> > assuming that devinfo->gen is always going to be 6 here (later gens
> > use
> > a different implementation of GS).
> > 
> > 
> >       
> > >         +   if(common_eot_approach_can_be_used)
> > > +   {
> > > +      emit(BRW_OPCODE_ENDIF);  
> > > +   }
> > >     /* Finally, emit EOT message.
> > >      *
> > >      * In gen6 we need to end the thread differently depending on
> > > whether we have
> > > @@ -463,8 +467,32 @@ gen6_gs_visitor::emit_thread_end()
> > >      * VUE handle every time we do a URB WRITE, even for the last
> > > vertex we emit.
> > >      * With this we make sure that whether we have emitted at
> > > least
> > > one vertex
> > >      * or none at all, we have to finish the thread without
> > > writing
> > > to the URB,
> > > -    * which works for both cases by setting the COMPLETE and
> > > UNUSED
> > > flags in
> > > -    * the EOT message.
> > > +    * which works for both cases (but only under Pre-IL) by
> > > setting 
> > > +    * the COMPLETE and UNUSED flags in the EOT message.
> > > +    * 
> > > +    * But under ILK or SNB we must not use combination COMPLETE
> > > and
> > > UNUSED 
> > > +    * because this combination could be used only for already
> > > allocated VUE. 
> > > +    * But unlike Pre-IL in the ILK and SNB 
> > > +    * the initial VUE is not passed to threads. 
> > > +    * This behaver mentioned in specification: 
> > > +    * SNB Volume 2 Part 1:
> > > +    *  "1.6.5.3 VUE Allocation (GS, CLIP) [DevIL]"
> > > +    *  "1.6.5.4 VUE Allocation (GS) [DevSNB+]"
> > > +    *     "The threads are not passed an initial handle.  
> > > +    *     Instead, they request a first handle (if any) 
> > > +    *     via the URB shared function’s FF_SYNC message (see
> > > Shared
> > > Functions). 
> > > +    *     If additional handles are required, 
> > > +    *     the URB_WRITE allocate mechanism (mentioned above) is
> > > used."
> > > +    * 
> > > +    * So for ILK and for SNB we must use only UNUSED flag.
> > > +    * This is accepteble combination according to:
> > > +    *    SNB Volume 4 Part 2:
> > > +    *       "2.4.2 Message Descriptor"
> > > +    *          "Table lists the valid and invalid combinations
> > > of 
> > > +    *           the Complete, Used, Allocate and EOT bits"
> > > +    *          "Thread terminate non-write of URB"
> > > +    *    SNB Volume 2 Part 1:
> > > +    *       "1.6.5.6 Thread Termination"
> > >      */
> > > 
> > >       
> > 
> >       I am not sure why you conclude all this from the PRM. This is
> > what I
> > see:
> > 
> > Section 1.6.5.5 VUE Dereference (GS) (vol2, part1) says:
> > 
> > "It is possible and legal for a thread to produce no output
> >  or subsequently allocate a destination VUE that 
> >  was not required (e.g., the thread allocated ahead). 
> >  Therefore, there is a mechanism by which a thread can “give
> > back”  
> >  (dereference) an a llocated VUE.  This mechanism must  be used
> > if   
> >  the  VUE is not written before the thread terminates.  A  kernel
> > can 
> >  explicitly dereference a VUE by issuing a URB_WRITE message 
> >  (specifying the to-be-dereference handle) with the Complete 
> >  bit set and the Used bit clear."
> > 
> > This is explicitly saying that COMPLETE + UNUSED is a valid
> > combination, and one that is in fact created for this very purpose.
> > Nothing in that text states that this is Pre-ILK or that this is
> > only
> > for thread pre-allocated VUEs alone.
> > 
> > Then in 2.4.2 Message Descriptor (vol4, part2), it says:
> > 
> > " Used: 
> >   If set, this signals that the URB entry(s) referenced by
> >   the handle(s) are valid outputs of the thread.  In 
> >   all likelihood this means that that entry(s) contains
> >   complete & valid data to be subject to further 
> >   processing by the pipeline.   
> >   If clear, this signals that the URB entry(s) referenced by
> >   the handle(s) are not valid outputs of the thread.  
> >   Use of this setting will result in the handle(s) 
> >   being immediately dereferenced by the owning FF unit.  
> >   This setting is to be used by GS or CLIP threads to 
> >   dereference handles it obtained (either in the initial 
> >   thread payload or subsequent allocation writebacks) 
> >   but subsequently determined were not required  (e.g.,
> >   the object was completely clipped out)."
> > 
> > Again, there is no mention of this being Pre-ILK only and on top of
> > that, the text explicitly states that this combination is used to
> > deference handles obtained  either in the initial thread payload or
> > subsequent allocation writebacks.
> > 
> > And finally, it also says the following:
> > 
> > "Complete: (...)
> > Programming Notes: 
> > The following message descriptor fields are only valid when 
> > Complete is set:  Used"
> > 
> > Which I understand means that 'Used' is only applicable when
> > Complete
> > is set, or in other words, that the only possible combinations
> > where
> > Used is accounted for are those in which we we also have Complete
> > set.
> > 
> > So I am not sure why you understand that COMPLETE + UNUSED is pre-
> > ILK
> > or for thread allocated handles only only. Could you provide a
> > specific
> > pointer to the exact place in the documentation where such thing is
> > clearly stated?
> > 
> > Iago
> > 
> > 
> >       
> > >             this->current_annotation = "gen6 thread end: EOT";
> > >  
> > > @@ -480,8 +508,22 @@ gen6_gs_visitor::emit_thread_end()
> > >     inst->urb_write_flags = BRW_URB_WRITE_COMPLETE |
> > > BRW_URB_WRITE_UNUSED;
> > >     inst->base_mrf = base_mrf;
> > >     inst->mlen = 1;
> > > -}
> > > +   
> > > +   if(!common_eot_approach_can_be_used)
> > > +   {
> > > +      emit(BRW_OPCODE_ELSE);
> > > +      
> > > +      this->current_annotation = "gen6 thread end: EOT";
> > > +
> > > +      vec4_instruction *unused_urb_inst =
> > > emit(GS_OPCODE_THREAD_END);
> > > +      unused_urb_inst->urb_write_flags = BRW_URB_WRITE_UNUSED;
> > > +      unused_urb_inst->base_mrf = base_mrf;
> > > +      unused_urb_inst->mlen = 1;
> > >  
> > > +      emit(BRW_OPCODE_ENDIF);  
> > > +   }
> > > +}
> > > +   
> > >  void
> > >  gen6_gs_visitor::setup_payload()
> > >  {
> > > 
> > >       
> > 
> >     
> 
>     
> 
>   
>