[Mesa-dev,09/12] i965/fs: Add support for gl_ViewportIndex input

Submitted by Chris Forbes on Jan. 25, 2014, 6:51 a.m.

Details

Message ID 1390632720-23391-10-git-send-email-chrisf@ijw.co.nz
State New
Headers show

Not browsing as part of any series.

Commit Message

Chris Forbes Jan. 25, 2014, 6:51 a.m.
Same idea as gl_Layer -- this is delivered in part of R0.0.

Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
---
 src/mesa/drivers/dri/i965/brw_fs.cpp         | 22 ++++++++++++++++++++++
 src/mesa/drivers/dri/i965/brw_fs.h           |  1 +
 src/mesa/drivers/dri/i965/brw_fs_visitor.cpp |  2 ++
 3 files changed, 25 insertions(+)

Patch hide | download patch | download mbox

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
index 17d5237..e32133b 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -1294,6 +1294,28 @@  fs_visitor::emit_layer_setup(ir_variable *ir)
    return reg;
 }
 
+fs_reg *
+fs_visitor::emit_viewport_index_setup(ir_variable *ir)
+{
+   /* The value for gl_ViewportIndex is provided in bits 30:27 of R0.0. */
+
+   /* These bits are actually present on all Gen4+ h/w, but until GS is enabled
+    * on earlier platforms we don't expect to get here on anything earlier
+    * than Gen7.
+    */
+   assert(brw->gen >= 7);
+
+   this->current_annotation = "gl_ViewportIndex";
+   fs_reg *reg = new(this->mem_ctx) fs_reg(this, ir->type);
+   fs_reg temp = fs_reg(this, glsl_type::int_type);
+   emit(BRW_OPCODE_SHR, temp,
+         fs_reg(retype(brw_vec1_grf(0, 0), BRW_REGISTER_TYPE_D)),
+         fs_reg(brw_imm_d(27)));
+   emit(BRW_OPCODE_AND, *reg, temp,
+         fs_reg(brw_imm_d(0x0f)));
+   return reg;
+}
+
 fs_reg
 fs_visitor::fix_math_operand(fs_reg src)
 {
diff --git a/src/mesa/drivers/dri/i965/brw_fs.h b/src/mesa/drivers/dri/i965/brw_fs.h
index e04c341..e47fff4 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.h
+++ b/src/mesa/drivers/dri/i965/brw_fs.h
@@ -346,6 +346,7 @@  public:
    fs_reg *emit_sampleid_setup(ir_variable *ir);
    fs_reg *emit_samplemaskin_setup(ir_variable *ir);
    fs_reg *emit_layer_setup(ir_variable *ir);
+   fs_reg *emit_viewport_index_setup(ir_variable *ir);
    fs_reg *emit_general_interpolation(ir_variable *ir);
    void emit_interpolation_setup_gen4();
    void emit_interpolation_setup_gen6();
diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
index e949f4b..8864cf2 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
@@ -63,6 +63,8 @@  fs_visitor::visit(ir_variable *ir)
 	 reg = emit_frontfacing_interpolation(ir);
       } else if (!strcmp(ir->name, "gl_Layer")) {
          reg = emit_layer_setup(ir);
+      } else if (!strcmp(ir->name, "gl_ViewportIndex")) {
+         reg = emit_viewport_index_setup(ir);
       } else {
 	 reg = emit_general_interpolation(ir);
       }

Comments

On 01/24/2014 10:51 PM, Chris Forbes wrote:
> Same idea as gl_Layer -- this is delivered in part of R0.0.

NAK.

The spec says:

    "... the fragment stage will read the same value written by
    the geometry stage, even if that value is out of range."

If the geometry shader passes 0xDEADBEEF, the fragment shader is
supposed to read 0xDEADBEEF.  That won't fit in the 4-bits available in
R0.0.  That's why I didn't implement this when I did the
GL_ARB_viewport_array work. :)

I think I want to provide an Intel extension that provides the value the
hardware will actually use.  I'm thinking take the existing ARB spec and
replace that one sentence with

    "If the value written by the geometry stage is out of range, the
    value read by the fragment stage is undefined."

We would also replace the next sentence with:

    "If a fragment shader contains a static access to gl_ViewportIndex,
    it will NOT count against the implementation defined limit for the
    maximum number of inputs to the fragment stage."

There are probably a couple other little edits too.

I'm also concerned about interactions with this extension and SSO.
Since we have to assign a real slot for gl_ViewportIndex, a separable
geometry shader that writes it will always have to write to the shadow
copy.  If the separable fragment shader doesn't read it, the varying
layouts may not match. :(

> Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
> ---
>  src/mesa/drivers/dri/i965/brw_fs.cpp         | 22 ++++++++++++++++++++++
>  src/mesa/drivers/dri/i965/brw_fs.h           |  1 +
>  src/mesa/drivers/dri/i965/brw_fs_visitor.cpp |  2 ++
>  3 files changed, 25 insertions(+)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
> index 17d5237..e32133b 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
> @@ -1294,6 +1294,28 @@ fs_visitor::emit_layer_setup(ir_variable *ir)
>     return reg;
>  }
>  
> +fs_reg *
> +fs_visitor::emit_viewport_index_setup(ir_variable *ir)
> +{
> +   /* The value for gl_ViewportIndex is provided in bits 30:27 of R0.0. */
> +
> +   /* These bits are actually present on all Gen4+ h/w, but until GS is enabled
> +    * on earlier platforms we don't expect to get here on anything earlier
> +    * than Gen7.
> +    */
> +   assert(brw->gen >= 7);
> +
> +   this->current_annotation = "gl_ViewportIndex";
> +   fs_reg *reg = new(this->mem_ctx) fs_reg(this, ir->type);
> +   fs_reg temp = fs_reg(this, glsl_type::int_type);
> +   emit(BRW_OPCODE_SHR, temp,
> +         fs_reg(retype(brw_vec1_grf(0, 0), BRW_REGISTER_TYPE_D)),
> +         fs_reg(brw_imm_d(27)));
> +   emit(BRW_OPCODE_AND, *reg, temp,
> +         fs_reg(brw_imm_d(0x0f)));
> +   return reg;
> +}
> +
>  fs_reg
>  fs_visitor::fix_math_operand(fs_reg src)
>  {
> diff --git a/src/mesa/drivers/dri/i965/brw_fs.h b/src/mesa/drivers/dri/i965/brw_fs.h
> index e04c341..e47fff4 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs.h
> +++ b/src/mesa/drivers/dri/i965/brw_fs.h
> @@ -346,6 +346,7 @@ public:
>     fs_reg *emit_sampleid_setup(ir_variable *ir);
>     fs_reg *emit_samplemaskin_setup(ir_variable *ir);
>     fs_reg *emit_layer_setup(ir_variable *ir);
> +   fs_reg *emit_viewport_index_setup(ir_variable *ir);
>     fs_reg *emit_general_interpolation(ir_variable *ir);
>     void emit_interpolation_setup_gen4();
>     void emit_interpolation_setup_gen6();
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
> index e949f4b..8864cf2 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
> @@ -63,6 +63,8 @@ fs_visitor::visit(ir_variable *ir)
>  	 reg = emit_frontfacing_interpolation(ir);
>        } else if (!strcmp(ir->name, "gl_Layer")) {
>           reg = emit_layer_setup(ir);
> +      } else if (!strcmp(ir->name, "gl_ViewportIndex")) {
> +         reg = emit_viewport_index_setup(ir);
>        } else {
>  	 reg = emit_general_interpolation(ir);
>        }
>
Ian,

I'd thought about that a bit while building this, and struggled to
find cases where this was observable in a defined fragment shader
execution.

The ARB_viewport_array spec says:

    If the value of the viewport index is outside the range zero to the value
    of MAX_VIEWPORTS minus one, the results of the viewport
    transformation are undefined.

It seems absurd to carry around an extra real slot which only adds any
value in a case where we're not required to be performing any
particular fragment shader invocations at all.

I can see cases where an out-of-range gl_Layer *almost* makes sense --
only interactions with the framebuffer are undefined, so you could
have no framebuffer writes, no fragment tests, and then do something
based on gl_Layer with atomics, images, or shader storage buffers. But
it's still a mad thing to do.

Do you know the rationale for having this language in the spec?

In any case, happy to park this for now -- it just looked like an easy
win, and it turns out it's not quite.

-- Chris

On Mon, Jan 27, 2014 at 5:59 AM, Ian Romanick <idr@freedesktop.org> wrote:
> On 01/24/2014 10:51 PM, Chris Forbes wrote:
>> Same idea as gl_Layer -- this is delivered in part of R0.0.
>
> NAK.
>
> The spec says:
>
>     "... the fragment stage will read the same value written by
>     the geometry stage, even if that value is out of range."
>
> If the geometry shader passes 0xDEADBEEF, the fragment shader is
> supposed to read 0xDEADBEEF.  That won't fit in the 4-bits available in
> R0.0.  That's why I didn't implement this when I did the
> GL_ARB_viewport_array work. :)
>
> I think I want to provide an Intel extension that provides the value the
> hardware will actually use.  I'm thinking take the existing ARB spec and
> replace that one sentence with
>
>     "If the value written by the geometry stage is out of range, the
>     value read by the fragment stage is undefined."
>
> We would also replace the next sentence with:
>
>     "If a fragment shader contains a static access to gl_ViewportIndex,
>     it will NOT count against the implementation defined limit for the
>     maximum number of inputs to the fragment stage."
>
> There are probably a couple other little edits too.
>
> I'm also concerned about interactions with this extension and SSO.
> Since we have to assign a real slot for gl_ViewportIndex, a separable
> geometry shader that writes it will always have to write to the shadow
> copy.  If the separable fragment shader doesn't read it, the varying
> layouts may not match. :(
>
>> Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
>> ---
>>  src/mesa/drivers/dri/i965/brw_fs.cpp         | 22 ++++++++++++++++++++++
>>  src/mesa/drivers/dri/i965/brw_fs.h           |  1 +
>>  src/mesa/drivers/dri/i965/brw_fs_visitor.cpp |  2 ++
>>  3 files changed, 25 insertions(+)
>>
>> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
>> index 17d5237..e32133b 100644
>> --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
>> +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
>> @@ -1294,6 +1294,28 @@ fs_visitor::emit_layer_setup(ir_variable *ir)
>>     return reg;
>>  }
>>
>> +fs_reg *
>> +fs_visitor::emit_viewport_index_setup(ir_variable *ir)
>> +{
>> +   /* The value for gl_ViewportIndex is provided in bits 30:27 of R0.0. */
>> +
>> +   /* These bits are actually present on all Gen4+ h/w, but until GS is enabled
>> +    * on earlier platforms we don't expect to get here on anything earlier
>> +    * than Gen7.
>> +    */
>> +   assert(brw->gen >= 7);
>> +
>> +   this->current_annotation = "gl_ViewportIndex";
>> +   fs_reg *reg = new(this->mem_ctx) fs_reg(this, ir->type);
>> +   fs_reg temp = fs_reg(this, glsl_type::int_type);
>> +   emit(BRW_OPCODE_SHR, temp,
>> +         fs_reg(retype(brw_vec1_grf(0, 0), BRW_REGISTER_TYPE_D)),
>> +         fs_reg(brw_imm_d(27)));
>> +   emit(BRW_OPCODE_AND, *reg, temp,
>> +         fs_reg(brw_imm_d(0x0f)));
>> +   return reg;
>> +}
>> +
>>  fs_reg
>>  fs_visitor::fix_math_operand(fs_reg src)
>>  {
>> diff --git a/src/mesa/drivers/dri/i965/brw_fs.h b/src/mesa/drivers/dri/i965/brw_fs.h
>> index e04c341..e47fff4 100644
>> --- a/src/mesa/drivers/dri/i965/brw_fs.h
>> +++ b/src/mesa/drivers/dri/i965/brw_fs.h
>> @@ -346,6 +346,7 @@ public:
>>     fs_reg *emit_sampleid_setup(ir_variable *ir);
>>     fs_reg *emit_samplemaskin_setup(ir_variable *ir);
>>     fs_reg *emit_layer_setup(ir_variable *ir);
>> +   fs_reg *emit_viewport_index_setup(ir_variable *ir);
>>     fs_reg *emit_general_interpolation(ir_variable *ir);
>>     void emit_interpolation_setup_gen4();
>>     void emit_interpolation_setup_gen6();
>> diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
>> index e949f4b..8864cf2 100644
>> --- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
>> +++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
>> @@ -63,6 +63,8 @@ fs_visitor::visit(ir_variable *ir)
>>        reg = emit_frontfacing_interpolation(ir);
>>        } else if (!strcmp(ir->name, "gl_Layer")) {
>>           reg = emit_layer_setup(ir);
>> +      } else if (!strcmp(ir->name, "gl_ViewportIndex")) {
>> +         reg = emit_viewport_index_setup(ir);
>>        } else {
>>        reg = emit_general_interpolation(ir);
>>        }
>>
>
On 01/26/2014 02:31 PM, Chris Forbes wrote:
> Ian,
> 
> I'd thought about that a bit while building this, and struggled to
> find cases where this was observable in a defined fragment shader
> execution.

Yeah, I've been thinking about it a bit too.

> The ARB_viewport_array spec says:
> 
>     If the value of the viewport index is outside the range zero to the value
>     of MAX_VIEWPORTS minus one, the results of the viewport
>     transformation are undefined.

There is similar language for layer.

> It seems absurd to carry around an extra real slot which only adds any
> value in a case where we're not required to be performing any
> particular fragment shader invocations at all.

That's the rub.  If there are no invocations, then we shouldn't have to
carry it.  If there are invocations...  We can test that with a simple
fragment shader:

    #version 150
    #extension GL_ARB_fragment_layer_viewport: require
    in int viewport_index;
    layout (binding = 0, offset = 0) uniform atomic_uint invocations;
    layout (binding = 0, offset = 4) uniform atomic_uint matches;

    void main()
    {
        atomicCounterIncrement(invocations);
        if (gl_ViewportIndex == viewport_index)
            atomicCounterIncrement(matches);
    }

If invocations and matches match after rendering, we should be good.

> I can see cases where an out-of-range gl_Layer *almost* makes sense --
> only interactions with the framebuffer are undefined, so you could
> have no framebuffer writes, no fragment tests, and then do something
> based on gl_Layer with atomics, images, or shader storage buffers. But
> it's still a mad thing to do.
> 
> Do you know the rationale for having this language in the spec?

I believe DX has a similar requirement.  I'm not sure if there's some
additional method by which out-of-range values can be observed.

> In any case, happy to park this for now -- it just looked like an easy
> win, and it turns out it's not quite.
> 
> -- Chris
> 
> On Mon, Jan 27, 2014 at 5:59 AM, Ian Romanick <idr@freedesktop.org> wrote:
>> On 01/24/2014 10:51 PM, Chris Forbes wrote:
>>> Same idea as gl_Layer -- this is delivered in part of R0.0.
>>
>> NAK.
>>
>> The spec says:
>>
>>     "... the fragment stage will read the same value written by
>>     the geometry stage, even if that value is out of range."
>>
>> If the geometry shader passes 0xDEADBEEF, the fragment shader is
>> supposed to read 0xDEADBEEF.  That won't fit in the 4-bits available in
>> R0.0.  That's why I didn't implement this when I did the
>> GL_ARB_viewport_array work. :)
>>
>> I think I want to provide an Intel extension that provides the value the
>> hardware will actually use.  I'm thinking take the existing ARB spec and
>> replace that one sentence with
>>
>>     "If the value written by the geometry stage is out of range, the
>>     value read by the fragment stage is undefined."
>>
>> We would also replace the next sentence with:
>>
>>     "If a fragment shader contains a static access to gl_ViewportIndex,
>>     it will NOT count against the implementation defined limit for the
>>     maximum number of inputs to the fragment stage."
>>
>> There are probably a couple other little edits too.
>>
>> I'm also concerned about interactions with this extension and SSO.
>> Since we have to assign a real slot for gl_ViewportIndex, a separable
>> geometry shader that writes it will always have to write to the shadow
>> copy.  If the separable fragment shader doesn't read it, the varying
>> layouts may not match. :(
>>
>>> Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
>>> ---
>>>  src/mesa/drivers/dri/i965/brw_fs.cpp         | 22 ++++++++++++++++++++++
>>>  src/mesa/drivers/dri/i965/brw_fs.h           |  1 +
>>>  src/mesa/drivers/dri/i965/brw_fs_visitor.cpp |  2 ++
>>>  3 files changed, 25 insertions(+)
>>>
>>> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
>>> index 17d5237..e32133b 100644
>>> --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
>>> +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
>>> @@ -1294,6 +1294,28 @@ fs_visitor::emit_layer_setup(ir_variable *ir)
>>>     return reg;
>>>  }
>>>
>>> +fs_reg *
>>> +fs_visitor::emit_viewport_index_setup(ir_variable *ir)
>>> +{
>>> +   /* The value for gl_ViewportIndex is provided in bits 30:27 of R0.0. */
>>> +
>>> +   /* These bits are actually present on all Gen4+ h/w, but until GS is enabled
>>> +    * on earlier platforms we don't expect to get here on anything earlier
>>> +    * than Gen7.
>>> +    */
>>> +   assert(brw->gen >= 7);
>>> +
>>> +   this->current_annotation = "gl_ViewportIndex";
>>> +   fs_reg *reg = new(this->mem_ctx) fs_reg(this, ir->type);
>>> +   fs_reg temp = fs_reg(this, glsl_type::int_type);
>>> +   emit(BRW_OPCODE_SHR, temp,
>>> +         fs_reg(retype(brw_vec1_grf(0, 0), BRW_REGISTER_TYPE_D)),
>>> +         fs_reg(brw_imm_d(27)));
>>> +   emit(BRW_OPCODE_AND, *reg, temp,
>>> +         fs_reg(brw_imm_d(0x0f)));
>>> +   return reg;
>>> +}
>>> +
>>>  fs_reg
>>>  fs_visitor::fix_math_operand(fs_reg src)
>>>  {
>>> diff --git a/src/mesa/drivers/dri/i965/brw_fs.h b/src/mesa/drivers/dri/i965/brw_fs.h
>>> index e04c341..e47fff4 100644
>>> --- a/src/mesa/drivers/dri/i965/brw_fs.h
>>> +++ b/src/mesa/drivers/dri/i965/brw_fs.h
>>> @@ -346,6 +346,7 @@ public:
>>>     fs_reg *emit_sampleid_setup(ir_variable *ir);
>>>     fs_reg *emit_samplemaskin_setup(ir_variable *ir);
>>>     fs_reg *emit_layer_setup(ir_variable *ir);
>>> +   fs_reg *emit_viewport_index_setup(ir_variable *ir);
>>>     fs_reg *emit_general_interpolation(ir_variable *ir);
>>>     void emit_interpolation_setup_gen4();
>>>     void emit_interpolation_setup_gen6();
>>> diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
>>> index e949f4b..8864cf2 100644
>>> --- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
>>> +++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
>>> @@ -63,6 +63,8 @@ fs_visitor::visit(ir_variable *ir)
>>>        reg = emit_frontfacing_interpolation(ir);
>>>        } else if (!strcmp(ir->name, "gl_Layer")) {
>>>           reg = emit_layer_setup(ir);
>>> +      } else if (!strcmp(ir->name, "gl_ViewportIndex")) {
>>> +         reg = emit_viewport_index_setup(ir);
>>>        } else {
>>>        reg = emit_general_interpolation(ir);
>>>        }
Am 28.01.2014 21:45, schrieb Ian Romanick:
> On 01/26/2014 02:31 PM, Chris Forbes wrote:
>> Ian,
>>
>> I'd thought about that a bit while building this, and struggled to
>> find cases where this was observable in a defined fragment shader
>> execution.
> 
> Yeah, I've been thinking about it a bit too.
> 
>> The ARB_viewport_array spec says:
>>
>>     If the value of the viewport index is outside the range zero to the value
>>     of MAX_VIEWPORTS minus one, the results of the viewport
>>     transformation are undefined.
> 
> There is similar language for layer.
> 
>> It seems absurd to carry around an extra real slot which only adds any
>> value in a case where we're not required to be performing any
>> particular fragment shader invocations at all.
> 
> That's the rub.  If there are no invocations, then we shouldn't have to
> carry it.  If there are invocations...  We can test that with a simple
> fragment shader:
> 
>     #version 150
>     #extension GL_ARB_fragment_layer_viewport: require
>     in int viewport_index;
>     layout (binding = 0, offset = 0) uniform atomic_uint invocations;
>     layout (binding = 0, offset = 4) uniform atomic_uint matches;
> 
>     void main()
>     {
>         atomicCounterIncrement(invocations);
>         if (gl_ViewportIndex == viewport_index)
>             atomicCounterIncrement(matches);
>     }
> 
> If invocations and matches match after rendering, we should be good.
> 
>> I can see cases where an out-of-range gl_Layer *almost* makes sense --
>> only interactions with the framebuffer are undefined, so you could
>> have no framebuffer writes, no fragment tests, and then do something
>> based on gl_Layer with atomics, images, or shader storage buffers. But
>> it's still a mad thing to do.
>>
>> Do you know the rationale for having this language in the spec?
> 
> I believe DX has a similar requirement.  I'm not sure if there's some
> additional method by which out-of-range values can be observed.

That is very much true and probably why it's in there. That said, d3d10
has a LOT of very well defined error behavior usually going way beyond
what OpenGL requires, to the point of being silly (but yes the tests
will test all of that error behavior).
For viewport, like GL, it also has the fs input value being the same as
gs output. Furthermore however, d3d10 also requires defined behavior for
vp indices out of range for vp transform (index is set to 0) - hence the
behavior of such an input in the fs is well defined and makes a bit more
sense probably with d3d10 than it does in GL. Not that I'd expect any
app outside whck to rely on that.

Roland

>> In any case, happy to park this for now -- it just looked like an easy
>> win, and it turns out it's not quite.
>>