[Mesa-dev,1/2] i965/fs: fix precision of f2b

Submitted by Iago Toral Quiroga on Feb. 25, 2016, 10:15 a.m.

Details

Message ID 1456395349-8923-1-git-send-email-itoral@igalia.com
State New
Headers show
Series "Series without cover letter" ( rev: 1 ) in Mesa

Not browsing as part of any series.

Commit Message

Iago Toral Quiroga Feb. 25, 2016, 10:15 a.m.
From the OpenGL 4.2 spec:

"When a constructor is used to convert any integer or floating-point type to a
 bool, 0 and 0.0 are converted to false, and non-zero values are converted to
 true."

Thus, even the smallest non-zero floating value should be translated to true.
This behavior has been verified also with proprietary NVIDIA drivers.

Currently, we implement this conversion as a cmp.nz operation with floats,
subject to floating-point precision limitations, and as a result, relatively
small non-zero floating point numbers return false instead of true.

This patch fixes the problem by getting rid of the sign bit (to cover the case
of -0.0) and testing the result against 0u using an integer comparison instead.
---
 src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

Patch hide | download patch | download mbox

diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
index db20c71..7d62d7e 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
@@ -913,9 +913,18 @@  fs_visitor::nir_emit_alu(const fs_builder &bld, nir_alu_instr *instr)
       bld.MOV(result, negate(op[0]));
       break;
 
-   case nir_op_f2b:
-      bld.CMP(result, op[0], brw_imm_f(0.0f), BRW_CONDITIONAL_NZ);
-      break;
+   case nir_op_f2b: {
+      /* Because comparing to 0.0f is subject to precision limitations, do the
+       * comparison using integers (we need to get rid of the sign bit for that)
+       */
+      if (devinfo->gen >= 8)
+         op[0] = resolve_source_modifiers(op[0]);
+      op[0] = retype(op[0], BRW_REGISTER_TYPE_UD);
+      bld.AND(op[0], op[0], brw_imm_ud(0x7FFFFFFFu));
+      bld.CMP(result, op[0], brw_imm_ud(0u), BRW_CONDITIONAL_NZ);
+       break;
+   }
+
    case nir_op_i2b:
       bld.CMP(result, op[0], brw_imm_d(0), BRW_CONDITIONAL_NZ);
       break;

Comments

On Thu, Feb 25, 2016 at 2:15 AM, Iago Toral Quiroga <itoral@igalia.com> wrote:
> From the OpenGL 4.2 spec:
>
> "When a constructor is used to convert any integer or floating-point type to a
>  bool, 0 and 0.0 are converted to false, and non-zero values are converted to
>  true."
>
> Thus, even the smallest non-zero floating value should be translated to true.
> This behavior has been verified also with proprietary NVIDIA drivers.

Ooh, interesting.

> Currently, we implement this conversion as a cmp.nz operation with floats,
> subject to floating-point precision limitations, and as a result, relatively
> small non-zero floating point numbers return false instead of true.
>
> This patch fixes the problem by getting rid of the sign bit (to cover the case
> of -0.0) and testing the result against 0u using an integer comparison instead.
> ---
>  src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 15 ++++++++++++---
>  1 file changed, 12 insertions(+), 3 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> index db20c71..7d62d7e 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> @@ -913,9 +913,18 @@ fs_visitor::nir_emit_alu(const fs_builder &bld, nir_alu_instr *instr)
>        bld.MOV(result, negate(op[0]));
>        break;
>
> -   case nir_op_f2b:
> -      bld.CMP(result, op[0], brw_imm_f(0.0f), BRW_CONDITIONAL_NZ);
> -      break;
> +   case nir_op_f2b: {
> +      /* Because comparing to 0.0f is subject to precision limitations, do the
> +       * comparison using integers (we need to get rid of the sign bit for that)
> +       */
> +      if (devinfo->gen >= 8)
> +         op[0] = resolve_source_modifiers(op[0]);

Oh, good thinking.

> +      op[0] = retype(op[0], BRW_REGISTER_TYPE_UD);
> +      bld.AND(op[0], op[0], brw_imm_ud(0x7FFFFFFFu));
> +      bld.CMP(result, op[0], brw_imm_ud(0u), BRW_CONDITIONAL_NZ);

The cmod_propagation is going to turn this into an AND.NZ, which we
may as well just do here:

   set_condmod(BRW_CONDITIONAL_NZ,
                          bld.AND(result, op[0], brw_imm_ud(0x7FFFFFFFu));

> +       break;

Bad indentation.

With those two things changed in both patches, they are

Reviewed-by: Matt Turner <mattst88@gmail.com>
Am 25.02.2016 um 11:15 schrieb Iago Toral Quiroga:
> From the OpenGL 4.2 spec:
> 
> "When a constructor is used to convert any integer or floating-point type to a
>  bool, 0 and 0.0 are converted to false, and non-zero values are converted to
>  true."
> 
> Thus, even the smallest non-zero floating value should be translated to true.
> This behavior has been verified also with proprietary NVIDIA drivers.
> 
> Currently, we implement this conversion as a cmp.nz operation with floats,
> subject to floating-point precision limitations, and as a result, relatively
> small non-zero floating point numbers return false instead of true.
> 
> This patch fixes the problem by getting rid of the sign bit (to cover the case
> of -0.0) and testing the result against 0u using an integer comparison instead.
> ---
>  src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 15 ++++++++++++---
>  1 file changed, 12 insertions(+), 3 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> index db20c71..7d62d7e 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> @@ -913,9 +913,18 @@ fs_visitor::nir_emit_alu(const fs_builder &bld, nir_alu_instr *instr)
>        bld.MOV(result, negate(op[0]));
>        break;
>  
> -   case nir_op_f2b:
> -      bld.CMP(result, op[0], brw_imm_f(0.0f), BRW_CONDITIONAL_NZ);
> -      break;
> +   case nir_op_f2b: {
> +      /* Because comparing to 0.0f is subject to precision limitations, do the
> +       * comparison using integers (we need to get rid of the sign bit for that)
> +       */
> +      if (devinfo->gen >= 8)
> +         op[0] = resolve_source_modifiers(op[0]);
> +      op[0] = retype(op[0], BRW_REGISTER_TYPE_UD);
> +      bld.AND(op[0], op[0], brw_imm_ud(0x7FFFFFFFu));
> +      bld.CMP(result, op[0], brw_imm_ud(0u), BRW_CONDITIONAL_NZ);
> +       break;
> +   }
> +
>     case nir_op_i2b:
>        bld.CMP(result, op[0], brw_imm_d(0), BRW_CONDITIONAL_NZ);
>        break;
> 

Does that fix anything? I don't really see a problem with the existing
logic. Yes any "non-zero value" should be converted to true. But surely
that definition cannot include denorms, which you are always allowed to
flush to zero.
(Albeit I can't tell what the result would be with NaNs with the float
compare, nor what the result actually should be in this case since glsl
doesn't require NaNs neither.)

Roland
On Thu, Feb 25, 2016 at 8:19 AM, Matt Turner <mattst88@gmail.com> wrote:

> On Thu, Feb 25, 2016 at 2:15 AM, Iago Toral Quiroga <itoral@igalia.com>
> wrote:
> > From the OpenGL 4.2 spec:
> >
> > "When a constructor is used to convert any integer or floating-point
> type to a
> >  bool, 0 and 0.0 are converted to false, and non-zero values are
> converted to
> >  true."
> >
> > Thus, even the smallest non-zero floating value should be translated to
> true.
> > This behavior has been verified also with proprietary NVIDIA drivers.
>
> Ooh, interesting.
>
> > Currently, we implement this conversion as a cmp.nz operation with
> floats,
> > subject to floating-point precision limitations, and as a result,
> relatively
> > small non-zero floating point numbers return false instead of true.
> >
> > This patch fixes the problem by getting rid of the sign bit (to cover
> the case
> > of -0.0) and testing the result against 0u using an integer comparison
> instead.
> > ---
> >  src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 15 ++++++++++++---
> >  1 file changed, 12 insertions(+), 3 deletions(-)
> >
> > diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> > index db20c71..7d62d7e 100644
> > --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> > +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> > @@ -913,9 +913,18 @@ fs_visitor::nir_emit_alu(const fs_builder &bld,
> nir_alu_instr *instr)
> >        bld.MOV(result, negate(op[0]));
> >        break;
> >
> > -   case nir_op_f2b:
> > -      bld.CMP(result, op[0], brw_imm_f(0.0f), BRW_CONDITIONAL_NZ);
> > -      break;
> > +   case nir_op_f2b: {
> > +      /* Because comparing to 0.0f is subject to precision limitations,
> do the
> > +       * comparison using integers (we need to get rid of the sign bit
> for that)
> > +       */
> > +      if (devinfo->gen >= 8)
> > +         op[0] = resolve_source_modifiers(op[0]);
>

Hrm... I'm not sure what I think about this.  Neither abs nor neg *should*
affect f2b since zero/non-zero should just pass through.  However,
resolving the source modifiers could affect if they, for instance, flushed
dnorms.  Incidentally, this means we probably want a NIR optimization to
get rid of neg and abs right before != 0 if we can.


>
> Oh, good thinking.
>
> > +      op[0] = retype(op[0], BRW_REGISTER_TYPE_UD);
> > +      bld.AND(op[0], op[0], brw_imm_ud(0x7FFFFFFFu));
> > +      bld.CMP(result, op[0], brw_imm_ud(0u), BRW_CONDITIONAL_NZ);
>
> The cmod_propagation is going to turn this into an AND.NZ, which we
> may as well just do here:
>
>    set_condmod(BRW_CONDITIONAL_NZ,
>                           bld.AND(result, op[0], brw_imm_ud(0x7FFFFFFFu));
>
> > +       break;
>
> Bad indentation.
>
> With those two things changed in both patches, they are
>
> Reviewed-by: Matt Turner <mattst88@gmail.com>
> _______________________________________________
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
On 02/25/2016 08:46 AM, Roland Scheidegger wrote:
> Am 25.02.2016 um 11:15 schrieb Iago Toral Quiroga:
>> From the OpenGL 4.2 spec:
>>
>> "When a constructor is used to convert any integer or floating-point type to a
>>  bool, 0 and 0.0 are converted to false, and non-zero values are converted to
>>  true."
>>
>> Thus, even the smallest non-zero floating value should be translated to true.
>> This behavior has been verified also with proprietary NVIDIA drivers.
>>
>> Currently, we implement this conversion as a cmp.nz operation with floats,
>> subject to floating-point precision limitations, and as a result, relatively
>> small non-zero floating point numbers return false instead of true.
>>
>> This patch fixes the problem by getting rid of the sign bit (to cover the case
>> of -0.0) and testing the result against 0u using an integer comparison instead.
>> ---
>>  src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 15 ++++++++++++---
>>  1 file changed, 12 insertions(+), 3 deletions(-)
>>
>> diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>> index db20c71..7d62d7e 100644
>> --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>> +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>> @@ -913,9 +913,18 @@ fs_visitor::nir_emit_alu(const fs_builder &bld, nir_alu_instr *instr)
>>        bld.MOV(result, negate(op[0]));
>>        break;
>>  
>> -   case nir_op_f2b:
>> -      bld.CMP(result, op[0], brw_imm_f(0.0f), BRW_CONDITIONAL_NZ);
>> -      break;
>> +   case nir_op_f2b: {
>> +      /* Because comparing to 0.0f is subject to precision limitations, do the
>> +       * comparison using integers (we need to get rid of the sign bit for that)
>> +       */
>> +      if (devinfo->gen >= 8)
>> +         op[0] = resolve_source_modifiers(op[0]);
>> +      op[0] = retype(op[0], BRW_REGISTER_TYPE_UD);
>> +      bld.AND(op[0], op[0], brw_imm_ud(0x7FFFFFFFu));
>> +      bld.CMP(result, op[0], brw_imm_ud(0u), BRW_CONDITIONAL_NZ);
>> +       break;
>> +   }
>> +
>>     case nir_op_i2b:
>>        bld.CMP(result, op[0], brw_imm_d(0), BRW_CONDITIONAL_NZ);
>>        break;
>>
> 
> Does that fix anything? I don't really see a problem with the existing
> logic. Yes any "non-zero value" should be converted to true. But surely
> that definition cannot include denorms, which you are always allowed to
> flush to zero.
> (Albeit I can't tell what the result would be with NaNs with the float
> compare, nor what the result actually should be in this case since glsl
> doesn't require NaNs neither.)

Based on this and Jason's comments, I think we need a bunch of new tests.

 - smallest positive normal number
 - abs of smallest positive normal number
 - neg of     "       "        "      "
 - largest positive subnormal number
 - abs of largest positive subnormal number
 - neg of    "        "        "        "
 - all of the above with negative numbers
 - NaN
 - abs of NaN
 - neg of  "

Perhaps others? +/-Inf just for kicks?

> Roland
> 
> _______________________________________________
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Roland Scheidegger <sroland@vmware.com> writes:

> Am 25.02.2016 um 11:15 schrieb Iago Toral Quiroga:
>> From the OpenGL 4.2 spec:
>> 
>> "When a constructor is used to convert any integer or floating-point type to a
>>  bool, 0 and 0.0 are converted to false, and non-zero values are converted to
>>  true."
>> 
>> Thus, even the smallest non-zero floating value should be translated to true.
>> This behavior has been verified also with proprietary NVIDIA drivers.
>> 
>> Currently, we implement this conversion as a cmp.nz operation with floats,
>> subject to floating-point precision limitations, and as a result, relatively

The bool constructor *is* subject to floating-point precision
limitations AFAIK, just like anything else dealing with floating-point
numbers.

>> small non-zero floating point numbers return false instead of true.
>> 
>> This patch fixes the problem by getting rid of the sign bit (to cover the case
>> of -0.0) and testing the result against 0u using an integer comparison instead.
>> ---
>>  src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 15 ++++++++++++---
>>  1 file changed, 12 insertions(+), 3 deletions(-)
>> 
>> diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>> index db20c71..7d62d7e 100644
>> --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>> +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>> @@ -913,9 +913,18 @@ fs_visitor::nir_emit_alu(const fs_builder &bld, nir_alu_instr *instr)
>>        bld.MOV(result, negate(op[0]));
>>        break;
>>  
>> -   case nir_op_f2b:
>> -      bld.CMP(result, op[0], brw_imm_f(0.0f), BRW_CONDITIONAL_NZ);
>> -      break;
>> +   case nir_op_f2b: {
>> +      /* Because comparing to 0.0f is subject to precision limitations, do the
>> +       * comparison using integers (we need to get rid of the sign bit for that)
>> +       */
>> +      if (devinfo->gen >= 8)
>> +         op[0] = resolve_source_modifiers(op[0]);
>> +      op[0] = retype(op[0], BRW_REGISTER_TYPE_UD);
>> +      bld.AND(op[0], op[0], brw_imm_ud(0x7FFFFFFFu));
>> +      bld.CMP(result, op[0], brw_imm_ud(0u), BRW_CONDITIONAL_NZ);
>> +       break;
>> +   }
>> +
>>     case nir_op_i2b:
>>        bld.CMP(result, op[0], brw_imm_d(0), BRW_CONDITIONAL_NZ);
>>        break;
>> 
>
> Does that fix anything? I don't really see a problem with the existing
> logic. Yes any "non-zero value" should be converted to true. But surely
> that definition cannot include denorms, which you are always allowed to
> flush to zero.

Yeah, one is allowed to flush denorms to zero on input to any operation,
including bool(), I don't see any reason in the above why bool()
shouldn't be equivalent to "!= 0".

> (Albeit I can't tell what the result would be with NaNs with the float
> compare, nor what the result actually should be in this case since glsl
> doesn't require NaNs neither.)
>
The hardware CMP instruction considers NaNs to be different from zero,
and AFAICT the implementation in this patch does the same.

> Roland
>
> _______________________________________________
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Ian Romanick <idr@freedesktop.org> writes:

> On 02/25/2016 08:46 AM, Roland Scheidegger wrote:
>> Am 25.02.2016 um 11:15 schrieb Iago Toral Quiroga:
>>> From the OpenGL 4.2 spec:
>>>
>>> "When a constructor is used to convert any integer or floating-point type to a
>>>  bool, 0 and 0.0 are converted to false, and non-zero values are converted to
>>>  true."
>>>
>>> Thus, even the smallest non-zero floating value should be translated to true.
>>> This behavior has been verified also with proprietary NVIDIA drivers.
>>>
>>> Currently, we implement this conversion as a cmp.nz operation with floats,
>>> subject to floating-point precision limitations, and as a result, relatively
>>> small non-zero floating point numbers return false instead of true.
>>>
>>> This patch fixes the problem by getting rid of the sign bit (to cover the case
>>> of -0.0) and testing the result against 0u using an integer comparison instead.
>>> ---
>>>  src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 15 ++++++++++++---
>>>  1 file changed, 12 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>>> index db20c71..7d62d7e 100644
>>> --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>>> +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>>> @@ -913,9 +913,18 @@ fs_visitor::nir_emit_alu(const fs_builder &bld, nir_alu_instr *instr)
>>>        bld.MOV(result, negate(op[0]));
>>>        break;
>>>  
>>> -   case nir_op_f2b:
>>> -      bld.CMP(result, op[0], brw_imm_f(0.0f), BRW_CONDITIONAL_NZ);
>>> -      break;
>>> +   case nir_op_f2b: {
>>> +      /* Because comparing to 0.0f is subject to precision limitations, do the
>>> +       * comparison using integers (we need to get rid of the sign bit for that)
>>> +       */
>>> +      if (devinfo->gen >= 8)
>>> +         op[0] = resolve_source_modifiers(op[0]);
>>> +      op[0] = retype(op[0], BRW_REGISTER_TYPE_UD);
>>> +      bld.AND(op[0], op[0], brw_imm_ud(0x7FFFFFFFu));
>>> +      bld.CMP(result, op[0], brw_imm_ud(0u), BRW_CONDITIONAL_NZ);
>>> +       break;
>>> +   }
>>> +
>>>     case nir_op_i2b:
>>>        bld.CMP(result, op[0], brw_imm_d(0), BRW_CONDITIONAL_NZ);
>>>        break;
>>>
>> 
>> Does that fix anything? I don't really see a problem with the existing
>> logic. Yes any "non-zero value" should be converted to true. But surely
>> that definition cannot include denorms, which you are always allowed to
>> flush to zero.
>> (Albeit I can't tell what the result would be with NaNs with the float
>> compare, nor what the result actually should be in this case since glsl
>> doesn't require NaNs neither.)
>
> Based on this and Jason's comments, I think we need a bunch of new tests.
>
>  - smallest positive normal number
>  - abs of smallest positive normal number
>  - neg of     "       "        "      "
>  - largest positive subnormal number
>  - abs of largest positive subnormal number
>  - neg of    "        "        "        "
>  - all of the above with negative numbers
>  - NaN
>  - abs of NaN
>  - neg of  "
>
> Perhaps others? +/-Inf just for kicks?
>

What's the point?  The result of most of the above (except possibly
bool(NaN)) is undefined by the spec:

"Any denormalized value input into a shader or potentially generated by
 any operation in a shader can be flushed to 0. [...] NaNs are not
 required to be generated. [...] Operations and built-in functions that
 operate on a NaN are not required to return a NaN as the result."


>> Roland
>> 
>> _______________________________________________
>> mesa-dev mailing list
>> mesa-dev@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
> _______________________________________________
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
On 02/25/2016 12:13 PM, Francisco Jerez wrote:
> Ian Romanick <idr@freedesktop.org> writes:
> 
>> On 02/25/2016 08:46 AM, Roland Scheidegger wrote:
>>> Am 25.02.2016 um 11:15 schrieb Iago Toral Quiroga:
>>>> From the OpenGL 4.2 spec:
>>>>
>>>> "When a constructor is used to convert any integer or floating-point type to a
>>>>  bool, 0 and 0.0 are converted to false, and non-zero values are converted to
>>>>  true."
>>>>
>>>> Thus, even the smallest non-zero floating value should be translated to true.
>>>> This behavior has been verified also with proprietary NVIDIA drivers.
>>>>
>>>> Currently, we implement this conversion as a cmp.nz operation with floats,
>>>> subject to floating-point precision limitations, and as a result, relatively
>>>> small non-zero floating point numbers return false instead of true.
>>>>
>>>> This patch fixes the problem by getting rid of the sign bit (to cover the case
>>>> of -0.0) and testing the result against 0u using an integer comparison instead.
>>>> ---
>>>>  src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 15 ++++++++++++---
>>>>  1 file changed, 12 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>>>> index db20c71..7d62d7e 100644
>>>> --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>>>> +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>>>> @@ -913,9 +913,18 @@ fs_visitor::nir_emit_alu(const fs_builder &bld, nir_alu_instr *instr)
>>>>        bld.MOV(result, negate(op[0]));
>>>>        break;
>>>>  
>>>> -   case nir_op_f2b:
>>>> -      bld.CMP(result, op[0], brw_imm_f(0.0f), BRW_CONDITIONAL_NZ);
>>>> -      break;
>>>> +   case nir_op_f2b: {
>>>> +      /* Because comparing to 0.0f is subject to precision limitations, do the
>>>> +       * comparison using integers (we need to get rid of the sign bit for that)
>>>> +       */
>>>> +      if (devinfo->gen >= 8)
>>>> +         op[0] = resolve_source_modifiers(op[0]);
>>>> +      op[0] = retype(op[0], BRW_REGISTER_TYPE_UD);
>>>> +      bld.AND(op[0], op[0], brw_imm_ud(0x7FFFFFFFu));
>>>> +      bld.CMP(result, op[0], brw_imm_ud(0u), BRW_CONDITIONAL_NZ);
>>>> +       break;
>>>> +   }
>>>> +
>>>>     case nir_op_i2b:
>>>>        bld.CMP(result, op[0], brw_imm_d(0), BRW_CONDITIONAL_NZ);
>>>>        break;
>>>>
>>>
>>> Does that fix anything? I don't really see a problem with the existing
>>> logic. Yes any "non-zero value" should be converted to true. But surely
>>> that definition cannot include denorms, which you are always allowed to
>>> flush to zero.
>>> (Albeit I can't tell what the result would be with NaNs with the float
>>> compare, nor what the result actually should be in this case since glsl
>>> doesn't require NaNs neither.)
>>
>> Based on this and Jason's comments, I think we need a bunch of new tests.
>>
>>  - smallest positive normal number
>>  - abs of smallest positive normal number
>>  - neg of     "       "        "      "
>>  - largest positive subnormal number
>>  - abs of largest positive subnormal number
>>  - neg of    "        "        "        "
>>  - all of the above with negative numbers
>>  - NaN
>>  - abs of NaN
>>  - neg of  "
>>
>> Perhaps others? +/-Inf just for kicks?
> 
> What's the point?  The result of most of the above (except possibly
> bool(NaN)) is undefined by the spec:
> 
> "Any denormalized value input into a shader or potentially generated by
>  any operation in a shader can be flushed to 0. [...] NaNs are not
>  required to be generated. [...] Operations and built-in functions that
>  operate on a NaN are not required to return a NaN as the result."

Except that apparently one major OpenGL vendor does something well
defined that's different than what we do.  If we can validate what the
behavior is across multiple implementations, we may find a set of common
behavior that differs from ours.  If other people commonly do a thing
that's different than what we do, there's a very good chance that some
application depends on that behavior.  This certainly would not be the
first time.

>>> Roland
>>>
>>> _______________________________________________
>>> mesa-dev mailing list
>>> mesa-dev@lists.freedesktop.org
>>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>>
>> _______________________________________________
>> mesa-dev mailing list
>> mesa-dev@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Ian Romanick <idr@freedesktop.org> writes:

> On 02/25/2016 12:13 PM, Francisco Jerez wrote:
>> Ian Romanick <idr@freedesktop.org> writes:
>> 
>>> On 02/25/2016 08:46 AM, Roland Scheidegger wrote:
>>>> Am 25.02.2016 um 11:15 schrieb Iago Toral Quiroga:
>>>>> From the OpenGL 4.2 spec:
>>>>>
>>>>> "When a constructor is used to convert any integer or floating-point type to a
>>>>>  bool, 0 and 0.0 are converted to false, and non-zero values are converted to
>>>>>  true."
>>>>>
>>>>> Thus, even the smallest non-zero floating value should be translated to true.
>>>>> This behavior has been verified also with proprietary NVIDIA drivers.
>>>>>
>>>>> Currently, we implement this conversion as a cmp.nz operation with floats,
>>>>> subject to floating-point precision limitations, and as a result, relatively
>>>>> small non-zero floating point numbers return false instead of true.
>>>>>
>>>>> This patch fixes the problem by getting rid of the sign bit (to cover the case
>>>>> of -0.0) and testing the result against 0u using an integer comparison instead.
>>>>> ---
>>>>>  src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 15 ++++++++++++---
>>>>>  1 file changed, 12 insertions(+), 3 deletions(-)
>>>>>
>>>>> diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>>>>> index db20c71..7d62d7e 100644
>>>>> --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>>>>> +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>>>>> @@ -913,9 +913,18 @@ fs_visitor::nir_emit_alu(const fs_builder &bld, nir_alu_instr *instr)
>>>>>        bld.MOV(result, negate(op[0]));
>>>>>        break;
>>>>>  
>>>>> -   case nir_op_f2b:
>>>>> -      bld.CMP(result, op[0], brw_imm_f(0.0f), BRW_CONDITIONAL_NZ);
>>>>> -      break;
>>>>> +   case nir_op_f2b: {
>>>>> +      /* Because comparing to 0.0f is subject to precision limitations, do the
>>>>> +       * comparison using integers (we need to get rid of the sign bit for that)
>>>>> +       */
>>>>> +      if (devinfo->gen >= 8)
>>>>> +         op[0] = resolve_source_modifiers(op[0]);
>>>>> +      op[0] = retype(op[0], BRW_REGISTER_TYPE_UD);
>>>>> +      bld.AND(op[0], op[0], brw_imm_ud(0x7FFFFFFFu));
>>>>> +      bld.CMP(result, op[0], brw_imm_ud(0u), BRW_CONDITIONAL_NZ);
>>>>> +       break;
>>>>> +   }
>>>>> +
>>>>>     case nir_op_i2b:
>>>>>        bld.CMP(result, op[0], brw_imm_d(0), BRW_CONDITIONAL_NZ);
>>>>>        break;
>>>>>
>>>>
>>>> Does that fix anything? I don't really see a problem with the existing
>>>> logic. Yes any "non-zero value" should be converted to true. But surely
>>>> that definition cannot include denorms, which you are always allowed to
>>>> flush to zero.
>>>> (Albeit I can't tell what the result would be with NaNs with the float
>>>> compare, nor what the result actually should be in this case since glsl
>>>> doesn't require NaNs neither.)
>>>
>>> Based on this and Jason's comments, I think we need a bunch of new tests.
>>>
>>>  - smallest positive normal number
>>>  - abs of smallest positive normal number
>>>  - neg of     "       "        "      "
>>>  - largest positive subnormal number
>>>  - abs of largest positive subnormal number
>>>  - neg of    "        "        "        "
>>>  - all of the above with negative numbers
>>>  - NaN
>>>  - abs of NaN
>>>  - neg of  "
>>>
>>> Perhaps others? +/-Inf just for kicks?
>> 
>> What's the point?  The result of most of the above (except possibly
>> bool(NaN)) is undefined by the spec:
>> 
>> "Any denormalized value input into a shader or potentially generated by
>>  any operation in a shader can be flushed to 0. [...] NaNs are not
>>  required to be generated. [...] Operations and built-in functions that
>>  operate on a NaN are not required to return a NaN as the result."
>
> Except that apparently one major OpenGL vendor does something well
> defined that's different than what we do.

I'm skeptical that nVidia would treat single-precision denorms
inconsistently between datatype constructors and other floating-point
arithmetic, but assuming that's the case it would be an argument for
proposing the spec change to Khronos rather than introducing a dubiously
compliant change into the back-end.  I think I would argue against
making such a change in the spec in any case, because even though it's
implementation-defined whether denorms are flushed or not, the following
is guaranteed by the spec AFAIUI:

| if (bool(f))
|    random_arithmetic_on(f /* Guaranteed not to be zero here even if
|                              denorms are flushed */);

With this change in, bool(f) would evaluate to true even if f is a
denorm which is flushed to zero for all subsequent arithmetic in the
block.  For that reason this seems more likely to break stuff than to
fix stuff, it's not like applications can expect denorms to be
representable at all regardless of what nVidia does, but they can expect
the above GLSL source to work.

> If we can validate what the
> behavior is across multiple implementations, we may find a set of common
> behavior that differs from ours.  If other people commonly do a thing
> that's different than what we do, there's a very good chance that some
> application depends on that behavior.  This certainly would not be the
> first time.
>
>>>> Roland
>>>>
>>>> _______________________________________________
>>>> mesa-dev mailing list
>>>> mesa-dev@lists.freedesktop.org
>>>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>>>
>>> _______________________________________________
>>> mesa-dev mailing list
>>> mesa-dev@lists.freedesktop.org
>>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
On Thu, Feb 25, 2016 at 6:16 PM, Francisco Jerez <currojerez@riseup.net> wrote:
> Ian Romanick <idr@freedesktop.org> writes:
>
>> On 02/25/2016 12:13 PM, Francisco Jerez wrote:
>>> Ian Romanick <idr@freedesktop.org> writes:
>>>
>>>> On 02/25/2016 08:46 AM, Roland Scheidegger wrote:
>>>>> Am 25.02.2016 um 11:15 schrieb Iago Toral Quiroga:
>>>>>> From the OpenGL 4.2 spec:
>>>>>>
>>>>>> "When a constructor is used to convert any integer or floating-point type to a
>>>>>>  bool, 0 and 0.0 are converted to false, and non-zero values are converted to
>>>>>>  true."
>>>>>>
>>>>>> Thus, even the smallest non-zero floating value should be translated to true.
>>>>>> This behavior has been verified also with proprietary NVIDIA drivers.
>>>>>>
>>>>>> Currently, we implement this conversion as a cmp.nz operation with floats,
>>>>>> subject to floating-point precision limitations, and as a result, relatively
>>>>>> small non-zero floating point numbers return false instead of true.
>>>>>>
>>>>>> This patch fixes the problem by getting rid of the sign bit (to cover the case
>>>>>> of -0.0) and testing the result against 0u using an integer comparison instead.
>>>>>> ---
>>>>>>  src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 15 ++++++++++++---
>>>>>>  1 file changed, 12 insertions(+), 3 deletions(-)
>>>>>>
>>>>>> diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>>>>>> index db20c71..7d62d7e 100644
>>>>>> --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>>>>>> +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>>>>>> @@ -913,9 +913,18 @@ fs_visitor::nir_emit_alu(const fs_builder &bld, nir_alu_instr *instr)
>>>>>>        bld.MOV(result, negate(op[0]));
>>>>>>        break;
>>>>>>
>>>>>> -   case nir_op_f2b:
>>>>>> -      bld.CMP(result, op[0], brw_imm_f(0.0f), BRW_CONDITIONAL_NZ);
>>>>>> -      break;
>>>>>> +   case nir_op_f2b: {
>>>>>> +      /* Because comparing to 0.0f is subject to precision limitations, do the
>>>>>> +       * comparison using integers (we need to get rid of the sign bit for that)
>>>>>> +       */
>>>>>> +      if (devinfo->gen >= 8)
>>>>>> +         op[0] = resolve_source_modifiers(op[0]);
>>>>>> +      op[0] = retype(op[0], BRW_REGISTER_TYPE_UD);
>>>>>> +      bld.AND(op[0], op[0], brw_imm_ud(0x7FFFFFFFu));
>>>>>> +      bld.CMP(result, op[0], brw_imm_ud(0u), BRW_CONDITIONAL_NZ);
>>>>>> +       break;
>>>>>> +   }
>>>>>> +
>>>>>>     case nir_op_i2b:
>>>>>>        bld.CMP(result, op[0], brw_imm_d(0), BRW_CONDITIONAL_NZ);
>>>>>>        break;
>>>>>>
>>>>>
>>>>> Does that fix anything? I don't really see a problem with the existing
>>>>> logic. Yes any "non-zero value" should be converted to true. But surely
>>>>> that definition cannot include denorms, which you are always allowed to
>>>>> flush to zero.
>>>>> (Albeit I can't tell what the result would be with NaNs with the float
>>>>> compare, nor what the result actually should be in this case since glsl
>>>>> doesn't require NaNs neither.)
>>>>
>>>> Based on this and Jason's comments, I think we need a bunch of new tests.
>>>>
>>>>  - smallest positive normal number
>>>>  - abs of smallest positive normal number
>>>>  - neg of     "       "        "      "
>>>>  - largest positive subnormal number
>>>>  - abs of largest positive subnormal number
>>>>  - neg of    "        "        "        "
>>>>  - all of the above with negative numbers
>>>>  - NaN
>>>>  - abs of NaN
>>>>  - neg of  "
>>>>
>>>> Perhaps others? +/-Inf just for kicks?
>>>
>>> What's the point?  The result of most of the above (except possibly
>>> bool(NaN)) is undefined by the spec:
>>>
>>> "Any denormalized value input into a shader or potentially generated by
>>>  any operation in a shader can be flushed to 0. [...] NaNs are not
>>>  required to be generated. [...] Operations and built-in functions that
>>>  operate on a NaN are not required to return a NaN as the result."
>>
>> Except that apparently one major OpenGL vendor does something well
>> defined that's different than what we do.
>
> I'm skeptical that nVidia would treat single-precision denorms
> inconsistently between datatype constructors and other floating-point
> arithmetic, but assuming that's the case it would be an argument for
> proposing the spec change to Khronos rather than introducing a dubiously
> compliant change into the back-end.  I think I would argue against
> making such a change in the spec in any case, because even though it's
> implementation-defined whether denorms are flushed or not, the following
> is guaranteed by the spec AFAIUI:
>
> | if (bool(f))
> |    random_arithmetic_on(f /* Guaranteed not to be zero here even if
> |                              denorms are flushed */);
>
> With this change in, bool(f) would evaluate to true even if f is a
> denorm which is flushed to zero for all subsequent arithmetic in the
> block.  For that reason this seems more likely to break stuff than to
> fix stuff, it's not like applications can expect denorms to be
> representable at all regardless of what nVidia does, but they can expect
> the above GLSL source to work.

I'd be curious to see a shader test that triggers the relevant
behaviour -- AFAIK nvidia always flushes denorms. The flush is a
per-instruction setting though [since Fermi], so it's technically
feasible not to flush sometimes and to flush other times.

  -ilia
On Thu, 2016-02-25 at 18:20 -0500, Ilia Mirkin wrote:
> On Thu, Feb 25, 2016 at 6:16 PM, Francisco Jerez <currojerez@riseup.net> wrote:
> > Ian Romanick <idr@freedesktop.org> writes:
> >
> >> On 02/25/2016 12:13 PM, Francisco Jerez wrote:
> >>> Ian Romanick <idr@freedesktop.org> writes:
> >>>
> >>>> On 02/25/2016 08:46 AM, Roland Scheidegger wrote:
> >>>>> Am 25.02.2016 um 11:15 schrieb Iago Toral Quiroga:
> >>>>>> From the OpenGL 4.2 spec:
> >>>>>>
> >>>>>> "When a constructor is used to convert any integer or floating-point type to a
> >>>>>>  bool, 0 and 0.0 are converted to false, and non-zero values are converted to
> >>>>>>  true."
> >>>>>>
> >>>>>> Thus, even the smallest non-zero floating value should be translated to true.
> >>>>>> This behavior has been verified also with proprietary NVIDIA drivers.
> >>>>>>
> >>>>>> Currently, we implement this conversion as a cmp.nz operation with floats,
> >>>>>> subject to floating-point precision limitations, and as a result, relatively
> >>>>>> small non-zero floating point numbers return false instead of true.
> >>>>>>
> >>>>>> This patch fixes the problem by getting rid of the sign bit (to cover the case
> >>>>>> of -0.0) and testing the result against 0u using an integer comparison instead.
> >>>>>> ---
> >>>>>>  src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 15 ++++++++++++---
> >>>>>>  1 file changed, 12 insertions(+), 3 deletions(-)
> >>>>>>
> >>>>>> diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> >>>>>> index db20c71..7d62d7e 100644
> >>>>>> --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> >>>>>> +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> >>>>>> @@ -913,9 +913,18 @@ fs_visitor::nir_emit_alu(const fs_builder &bld, nir_alu_instr *instr)
> >>>>>>        bld.MOV(result, negate(op[0]));
> >>>>>>        break;
> >>>>>>
> >>>>>> -   case nir_op_f2b:
> >>>>>> -      bld.CMP(result, op[0], brw_imm_f(0.0f), BRW_CONDITIONAL_NZ);
> >>>>>> -      break;
> >>>>>> +   case nir_op_f2b: {
> >>>>>> +      /* Because comparing to 0.0f is subject to precision limitations, do the
> >>>>>> +       * comparison using integers (we need to get rid of the sign bit for that)
> >>>>>> +       */
> >>>>>> +      if (devinfo->gen >= 8)
> >>>>>> +         op[0] = resolve_source_modifiers(op[0]);
> >>>>>> +      op[0] = retype(op[0], BRW_REGISTER_TYPE_UD);
> >>>>>> +      bld.AND(op[0], op[0], brw_imm_ud(0x7FFFFFFFu));
> >>>>>> +      bld.CMP(result, op[0], brw_imm_ud(0u), BRW_CONDITIONAL_NZ);
> >>>>>> +       break;
> >>>>>> +   }
> >>>>>> +
> >>>>>>     case nir_op_i2b:
> >>>>>>        bld.CMP(result, op[0], brw_imm_d(0), BRW_CONDITIONAL_NZ);
> >>>>>>        break;
> >>>>>>
> >>>>>
> >>>>> Does that fix anything? I don't really see a problem with the existing
> >>>>> logic. Yes any "non-zero value" should be converted to true. But surely
> >>>>> that definition cannot include denorms, which you are always allowed to
> >>>>> flush to zero.
> >>>>> (Albeit I can't tell what the result would be with NaNs with the float
> >>>>> compare, nor what the result actually should be in this case since glsl
> >>>>> doesn't require NaNs neither.)
> >>>>
> >>>> Based on this and Jason's comments, I think we need a bunch of new tests.
> >>>>
> >>>>  - smallest positive normal number
> >>>>  - abs of smallest positive normal number
> >>>>  - neg of     "       "        "      "
> >>>>  - largest positive subnormal number
> >>>>  - abs of largest positive subnormal number
> >>>>  - neg of    "        "        "        "
> >>>>  - all of the above with negative numbers
> >>>>  - NaN
> >>>>  - abs of NaN
> >>>>  - neg of  "
> >>>>
> >>>> Perhaps others? +/-Inf just for kicks?
> >>>
> >>> What's the point?  The result of most of the above (except possibly
> >>> bool(NaN)) is undefined by the spec:
> >>>
> >>> "Any denormalized value input into a shader or potentially generated by
> >>>  any operation in a shader can be flushed to 0. [...] NaNs are not
> >>>  required to be generated. [...] Operations and built-in functions that
> >>>  operate on a NaN are not required to return a NaN as the result."
> >>
> >> Except that apparently one major OpenGL vendor does something well
> >> defined that's different than what we do.
> >
> > I'm skeptical that nVidia would treat single-precision denorms
> > inconsistently between datatype constructors and other floating-point
> > arithmetic, but assuming that's the case it would be an argument for
> > proposing the spec change to Khronos rather than introducing a dubiously
> > compliant change into the back-end.  I think I would argue against
> > making such a change in the spec in any case, because even though it's
> > implementation-defined whether denorms are flushed or not, the following
> > is guaranteed by the spec AFAIUI:
> >
> > | if (bool(f))
> > |    random_arithmetic_on(f /* Guaranteed not to be zero here even if
> > |                              denorms are flushed */);
> >
> > With this change in, bool(f) would evaluate to true even if f is a
> > denorm which is flushed to zero for all subsequent arithmetic in the
> > block.  For that reason this seems more likely to break stuff than to
> > fix stuff, it's not like applications can expect denorms to be
> > representable at all regardless of what nVidia does, but they can expect
> > the above GLSL source to work.
> 
> I'd be curious to see a shader test that triggers the relevant
> behaviour -- AFAIK nvidia always flushes denorms. The flush is a
> per-instruction setting though [since Fermi], so it's technically
> feasible not to flush sometimes and to flush other times.

This one:
https://github.com/Igalia/piglit/blob/arb_gpu_shader_fp64/tests/spec/arb_gpu_shader_fp64/execution/fs-conversion-explicit-double-bool-bad.shader_test

That test is for doubles. The author of the test told me that the
equivalent version for float also passed, but I have just verified that
he made a mistake. In summary: that test passes for double and fails for
float... 

The quote from the spec shared by Curro says:

"Any denormalized value input into a shader or potentially generated by
 any operation in a shader can be flushed to 0"

It says _can_ instead of _must_, so I share Curro's point of view that
this is in fact undefined behavior by the spec and f2b of a denorm could
return true or false and both are valid implementations. I wonder if
nvidia's case of flushing floats but not doubles is also acceptable
though, sounds like a bug to me.

In any case, based on this I think we can drop the patches.

Iago
Am 26.02.2016 um 11:25 schrieb Iago Toral:
> 
> On Thu, 2016-02-25 at 18:20 -0500, Ilia Mirkin wrote:
>> On Thu, Feb 25, 2016 at 6:16 PM, Francisco Jerez <currojerez@riseup.net> wrote:
>>> Ian Romanick <idr@freedesktop.org> writes:
>>>
>>>> On 02/25/2016 12:13 PM, Francisco Jerez wrote:
>>>>> Ian Romanick <idr@freedesktop.org> writes:
>>>>>
>>>>>> On 02/25/2016 08:46 AM, Roland Scheidegger wrote:
>>>>>>> Am 25.02.2016 um 11:15 schrieb Iago Toral Quiroga:
>>>>>>>> From the OpenGL 4.2 spec:
>>>>>>>>
>>>>>>>> "When a constructor is used to convert any integer or floating-point type to a
>>>>>>>>  bool, 0 and 0.0 are converted to false, and non-zero values are converted to
>>>>>>>>  true."
>>>>>>>>
>>>>>>>> Thus, even the smallest non-zero floating value should be translated to true.
>>>>>>>> This behavior has been verified also with proprietary NVIDIA drivers.
>>>>>>>>
>>>>>>>> Currently, we implement this conversion as a cmp.nz operation with floats,
>>>>>>>> subject to floating-point precision limitations, and as a result, relatively
>>>>>>>> small non-zero floating point numbers return false instead of true.
>>>>>>>>
>>>>>>>> This patch fixes the problem by getting rid of the sign bit (to cover the case
>>>>>>>> of -0.0) and testing the result against 0u using an integer comparison instead.
>>>>>>>> ---
>>>>>>>>  src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 15 ++++++++++++---
>>>>>>>>  1 file changed, 12 insertions(+), 3 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>>>>>>>> index db20c71..7d62d7e 100644
>>>>>>>> --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>>>>>>>> +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>>>>>>>> @@ -913,9 +913,18 @@ fs_visitor::nir_emit_alu(const fs_builder &bld, nir_alu_instr *instr)
>>>>>>>>        bld.MOV(result, negate(op[0]));
>>>>>>>>        break;
>>>>>>>>
>>>>>>>> -   case nir_op_f2b:
>>>>>>>> -      bld.CMP(result, op[0], brw_imm_f(0.0f), BRW_CONDITIONAL_NZ);
>>>>>>>> -      break;
>>>>>>>> +   case nir_op_f2b: {
>>>>>>>> +      /* Because comparing to 0.0f is subject to precision limitations, do the
>>>>>>>> +       * comparison using integers (we need to get rid of the sign bit for that)
>>>>>>>> +       */
>>>>>>>> +      if (devinfo->gen >= 8)
>>>>>>>> +         op[0] = resolve_source_modifiers(op[0]);
>>>>>>>> +      op[0] = retype(op[0], BRW_REGISTER_TYPE_UD);
>>>>>>>> +      bld.AND(op[0], op[0], brw_imm_ud(0x7FFFFFFFu));
>>>>>>>> +      bld.CMP(result, op[0], brw_imm_ud(0u), BRW_CONDITIONAL_NZ);
>>>>>>>> +       break;
>>>>>>>> +   }
>>>>>>>> +
>>>>>>>>     case nir_op_i2b:
>>>>>>>>        bld.CMP(result, op[0], brw_imm_d(0), BRW_CONDITIONAL_NZ);
>>>>>>>>        break;
>>>>>>>>
>>>>>>>
>>>>>>> Does that fix anything? I don't really see a problem with the existing
>>>>>>> logic. Yes any "non-zero value" should be converted to true. But surely
>>>>>>> that definition cannot include denorms, which you are always allowed to
>>>>>>> flush to zero.
>>>>>>> (Albeit I can't tell what the result would be with NaNs with the float
>>>>>>> compare, nor what the result actually should be in this case since glsl
>>>>>>> doesn't require NaNs neither.)
>>>>>>
>>>>>> Based on this and Jason's comments, I think we need a bunch of new tests.
>>>>>>
>>>>>>  - smallest positive normal number
>>>>>>  - abs of smallest positive normal number
>>>>>>  - neg of     "       "        "      "
>>>>>>  - largest positive subnormal number
>>>>>>  - abs of largest positive subnormal number
>>>>>>  - neg of    "        "        "        "
>>>>>>  - all of the above with negative numbers
>>>>>>  - NaN
>>>>>>  - abs of NaN
>>>>>>  - neg of  "
>>>>>>
>>>>>> Perhaps others? +/-Inf just for kicks?
>>>>>
>>>>> What's the point?  The result of most of the above (except possibly
>>>>> bool(NaN)) is undefined by the spec:
>>>>>
>>>>> "Any denormalized value input into a shader or potentially generated by
>>>>>  any operation in a shader can be flushed to 0. [...] NaNs are not
>>>>>  required to be generated. [...] Operations and built-in functions that
>>>>>  operate on a NaN are not required to return a NaN as the result."
>>>>
>>>> Except that apparently one major OpenGL vendor does something well
>>>> defined that's different than what we do.
>>>
>>> I'm skeptical that nVidia would treat single-precision denorms
>>> inconsistently between datatype constructors and other floating-point
>>> arithmetic, but assuming that's the case it would be an argument for
>>> proposing the spec change to Khronos rather than introducing a dubiously
>>> compliant change into the back-end.  I think I would argue against
>>> making such a change in the spec in any case, because even though it's
>>> implementation-defined whether denorms are flushed or not, the following
>>> is guaranteed by the spec AFAIUI:
>>>
>>> | if (bool(f))
>>> |    random_arithmetic_on(f /* Guaranteed not to be zero here even if
>>> |                              denorms are flushed */);
>>>
>>> With this change in, bool(f) would evaluate to true even if f is a
>>> denorm which is flushed to zero for all subsequent arithmetic in the
>>> block.  For that reason this seems more likely to break stuff than to
>>> fix stuff, it's not like applications can expect denorms to be
>>> representable at all regardless of what nVidia does, but they can expect
>>> the above GLSL source to work.
>>
>> I'd be curious to see a shader test that triggers the relevant
>> behaviour -- AFAIK nvidia always flushes denorms. The flush is a
>> per-instruction setting though [since Fermi], so it's technically
>> feasible not to flush sometimes and to flush other times.
> 
> This one:
> https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_Igalia_piglit_blob_arb-5Fgpu-5Fshader-5Ffp64_tests_spec_arb-5Fgpu-5Fshader-5Ffp64_execution_fs-2Dconversion-2Dexplicit-2Ddouble-2Dbool-2Dbad.shader-5Ftest&d=BQICaQ&c=Sqcl0Ez6M0X8aeM67LKIiDJAXVeAw-YihVMNtXt-uEs&r=Vjtt0vs_iqoI31UfJxBl7yv9I2FeiaeAYgMTLKRBc_I&m=K5Q85JZWVAJuljMHdoEb9NzJVH29wma5HWxq4K5UNs0&s=knFllQfBVoab6tHDLvgEdZUWb7WW77hBt62DQT6Co3s&e= 
> 
> That test is for doubles. The author of the test told me that the
> equivalent version for float also passed, but I have just verified that
> he made a mistake. In summary: that test passes for double and fails for
> float... 
> 
> The quote from the spec shared by Curro says:
> 
> "Any denormalized value input into a shader or potentially generated by
>  any operation in a shader can be flushed to 0"
> 
> It says _can_ instead of _must_, so I share Curro's point of view that
> this is in fact undefined behavior by the spec and f2b of a denorm could
> return true or false and both are valid implementations. I wonder if
> nvidia's case of flushing floats but not doubles is also acceptable
> though, sounds like a bug to me.

GL is always "can" for such things, and never "must" :-).
(d3d10 would require "must be flushed" for floats.)
Flushing denorms for floats but not for doubles looks perfectly fine to
me too, there's quite some hw which can't do denorms for floats (at
least not without compromises) but can do it for doubles (because
doubles had a low rate and were only useful for HPC, whereas floats need
to favor performance over everything else, essentially).

Roland
On Fri, 2016-02-26 at 16:28 +0100, Roland Scheidegger wrote:
> Am 26.02.2016 um 11:25 schrieb Iago Toral:
> > 
> > On Thu, 2016-02-25 at 18:20 -0500, Ilia Mirkin wrote:
> >> On Thu, Feb 25, 2016 at 6:16 PM, Francisco Jerez <currojerez@riseup.net> wrote:
> >>> Ian Romanick <idr@freedesktop.org> writes:
> >>>
> >>>> On 02/25/2016 12:13 PM, Francisco Jerez wrote:
> >>>>> Ian Romanick <idr@freedesktop.org> writes:
> >>>>>
> >>>>>> On 02/25/2016 08:46 AM, Roland Scheidegger wrote:
> >>>>>>> Am 25.02.2016 um 11:15 schrieb Iago Toral Quiroga:
> >>>>>>>> From the OpenGL 4.2 spec:
> >>>>>>>>
> >>>>>>>> "When a constructor is used to convert any integer or floating-point type to a
> >>>>>>>>  bool, 0 and 0.0 are converted to false, and non-zero values are converted to
> >>>>>>>>  true."
> >>>>>>>>
> >>>>>>>> Thus, even the smallest non-zero floating value should be translated to true.
> >>>>>>>> This behavior has been verified also with proprietary NVIDIA drivers.
> >>>>>>>>
> >>>>>>>> Currently, we implement this conversion as a cmp.nz operation with floats,
> >>>>>>>> subject to floating-point precision limitations, and as a result, relatively
> >>>>>>>> small non-zero floating point numbers return false instead of true.
> >>>>>>>>
> >>>>>>>> This patch fixes the problem by getting rid of the sign bit (to cover the case
> >>>>>>>> of -0.0) and testing the result against 0u using an integer comparison instead.
> >>>>>>>> ---
> >>>>>>>>  src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 15 ++++++++++++---
> >>>>>>>>  1 file changed, 12 insertions(+), 3 deletions(-)
> >>>>>>>>
> >>>>>>>> diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> >>>>>>>> index db20c71..7d62d7e 100644
> >>>>>>>> --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> >>>>>>>> +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> >>>>>>>> @@ -913,9 +913,18 @@ fs_visitor::nir_emit_alu(const fs_builder &bld, nir_alu_instr *instr)
> >>>>>>>>        bld.MOV(result, negate(op[0]));
> >>>>>>>>        break;
> >>>>>>>>
> >>>>>>>> -   case nir_op_f2b:
> >>>>>>>> -      bld.CMP(result, op[0], brw_imm_f(0.0f), BRW_CONDITIONAL_NZ);
> >>>>>>>> -      break;
> >>>>>>>> +   case nir_op_f2b: {
> >>>>>>>> +      /* Because comparing to 0.0f is subject to precision limitations, do the
> >>>>>>>> +       * comparison using integers (we need to get rid of the sign bit for that)
> >>>>>>>> +       */
> >>>>>>>> +      if (devinfo->gen >= 8)
> >>>>>>>> +         op[0] = resolve_source_modifiers(op[0]);
> >>>>>>>> +      op[0] = retype(op[0], BRW_REGISTER_TYPE_UD);
> >>>>>>>> +      bld.AND(op[0], op[0], brw_imm_ud(0x7FFFFFFFu));
> >>>>>>>> +      bld.CMP(result, op[0], brw_imm_ud(0u), BRW_CONDITIONAL_NZ);
> >>>>>>>> +       break;
> >>>>>>>> +   }
> >>>>>>>> +
> >>>>>>>>     case nir_op_i2b:
> >>>>>>>>        bld.CMP(result, op[0], brw_imm_d(0), BRW_CONDITIONAL_NZ);
> >>>>>>>>        break;
> >>>>>>>>
> >>>>>>>
> >>>>>>> Does that fix anything? I don't really see a problem with the existing
> >>>>>>> logic. Yes any "non-zero value" should be converted to true. But surely
> >>>>>>> that definition cannot include denorms, which you are always allowed to
> >>>>>>> flush to zero.
> >>>>>>> (Albeit I can't tell what the result would be with NaNs with the float
> >>>>>>> compare, nor what the result actually should be in this case since glsl
> >>>>>>> doesn't require NaNs neither.)
> >>>>>>
> >>>>>> Based on this and Jason's comments, I think we need a bunch of new tests.
> >>>>>>
> >>>>>>  - smallest positive normal number
> >>>>>>  - abs of smallest positive normal number
> >>>>>>  - neg of     "       "        "      "
> >>>>>>  - largest positive subnormal number
> >>>>>>  - abs of largest positive subnormal number
> >>>>>>  - neg of    "        "        "        "
> >>>>>>  - all of the above with negative numbers
> >>>>>>  - NaN
> >>>>>>  - abs of NaN
> >>>>>>  - neg of  "
> >>>>>>
> >>>>>> Perhaps others? +/-Inf just for kicks?
> >>>>>
> >>>>> What's the point?  The result of most of the above (except possibly
> >>>>> bool(NaN)) is undefined by the spec:
> >>>>>
> >>>>> "Any denormalized value input into a shader or potentially generated by
> >>>>>  any operation in a shader can be flushed to 0. [...] NaNs are not
> >>>>>  required to be generated. [...] Operations and built-in functions that
> >>>>>  operate on a NaN are not required to return a NaN as the result."
> >>>>
> >>>> Except that apparently one major OpenGL vendor does something well
> >>>> defined that's different than what we do.
> >>>
> >>> I'm skeptical that nVidia would treat single-precision denorms
> >>> inconsistently between datatype constructors and other floating-point
> >>> arithmetic, but assuming that's the case it would be an argument for
> >>> proposing the spec change to Khronos rather than introducing a dubiously
> >>> compliant change into the back-end.  I think I would argue against
> >>> making such a change in the spec in any case, because even though it's
> >>> implementation-defined whether denorms are flushed or not, the following
> >>> is guaranteed by the spec AFAIUI:
> >>>
> >>> | if (bool(f))
> >>> |    random_arithmetic_on(f /* Guaranteed not to be zero here even if
> >>> |                              denorms are flushed */);
> >>>
> >>> With this change in, bool(f) would evaluate to true even if f is a
> >>> denorm which is flushed to zero for all subsequent arithmetic in the
> >>> block.  For that reason this seems more likely to break stuff than to
> >>> fix stuff, it's not like applications can expect denorms to be
> >>> representable at all regardless of what nVidia does, but they can expect
> >>> the above GLSL source to work.
> >>
> >> I'd be curious to see a shader test that triggers the relevant
> >> behaviour -- AFAIK nvidia always flushes denorms. The flush is a
> >> per-instruction setting though [since Fermi], so it's technically
> >> feasible not to flush sometimes and to flush other times.
> > 
> > This one:
> > https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_Igalia_piglit_blob_arb-5Fgpu-5Fshader-5Ffp64_tests_spec_arb-5Fgpu-5Fshader-5Ffp64_execution_fs-2Dconversion-2Dexplicit-2Ddouble-2Dbool-2Dbad.shader-5Ftest&d=BQICaQ&c=Sqcl0Ez6M0X8aeM67LKIiDJAXVeAw-YihVMNtXt-uEs&r=Vjtt0vs_iqoI31UfJxBl7yv9I2FeiaeAYgMTLKRBc_I&m=K5Q85JZWVAJuljMHdoEb9NzJVH29wma5HWxq4K5UNs0&s=knFllQfBVoab6tHDLvgEdZUWb7WW77hBt62DQT6Co3s&e= 
> > 
> > That test is for doubles. The author of the test told me that the
> > equivalent version for float also passed, but I have just verified that
> > he made a mistake. In summary: that test passes for double and fails for
> > float... 
> > 
> > The quote from the spec shared by Curro says:
> > 
> > "Any denormalized value input into a shader or potentially generated by
> >  any operation in a shader can be flushed to 0"
> > 
> > It says _can_ instead of _must_, so I share Curro's point of view that
> > this is in fact undefined behavior by the spec and f2b of a denorm could
> > return true or false and both are valid implementations. I wonder if
> > nvidia's case of flushing floats but not doubles is also acceptable
> > though, sounds like a bug to me.
> 
> GL is always "can" for such things, and never "must" :-).
> (d3d10 would require "must be flushed" for floats.)
> Flushing denorms for floats but not for doubles looks perfectly fine to
> me too, there's quite some hw which can't do denorms for floats (at
> least not without compromises) but can do it for doubles (because
> doubles had a low rate and were only useful for HPC, whereas floats need
> to favor performance over everything else, essentially).

Ah, interesting, thanks for sharing this!

In that case I wonder what we should do for doubles in the case of i965.
I have patches for both implementations, so which one do we want?

Iago
On Fri, Feb 26, 2016 at 10:39 AM, Iago Toral <itoral@igalia.com> wrote:
> On Fri, 2016-02-26 at 16:28 +0100, Roland Scheidegger wrote:
>> Am 26.02.2016 um 11:25 schrieb Iago Toral:
>> >
>> > On Thu, 2016-02-25 at 18:20 -0500, Ilia Mirkin wrote:
>> >> On Thu, Feb 25, 2016 at 6:16 PM, Francisco Jerez <currojerez@riseup.net> wrote:
>> >>> Ian Romanick <idr@freedesktop.org> writes:
>> >>>
>> >>>> On 02/25/2016 12:13 PM, Francisco Jerez wrote:
>> >>>>> Ian Romanick <idr@freedesktop.org> writes:
>> >>>>>
>> >>>>>> On 02/25/2016 08:46 AM, Roland Scheidegger wrote:
>> >>>>>>> Am 25.02.2016 um 11:15 schrieb Iago Toral Quiroga:
>> >>>>>>>> From the OpenGL 4.2 spec:
>> >>>>>>>>
>> >>>>>>>> "When a constructor is used to convert any integer or floating-point type to a
>> >>>>>>>>  bool, 0 and 0.0 are converted to false, and non-zero values are converted to
>> >>>>>>>>  true."
>> >>>>>>>>
>> >>>>>>>> Thus, even the smallest non-zero floating value should be translated to true.
>> >>>>>>>> This behavior has been verified also with proprietary NVIDIA drivers.
>> >>>>>>>>
>> >>>>>>>> Currently, we implement this conversion as a cmp.nz operation with floats,
>> >>>>>>>> subject to floating-point precision limitations, and as a result, relatively
>> >>>>>>>> small non-zero floating point numbers return false instead of true.
>> >>>>>>>>
>> >>>>>>>> This patch fixes the problem by getting rid of the sign bit (to cover the case
>> >>>>>>>> of -0.0) and testing the result against 0u using an integer comparison instead.
>> >>>>>>>> ---
>> >>>>>>>>  src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 15 ++++++++++++---
>> >>>>>>>>  1 file changed, 12 insertions(+), 3 deletions(-)
>> >>>>>>>>
>> >>>>>>>> diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>> >>>>>>>> index db20c71..7d62d7e 100644
>> >>>>>>>> --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>> >>>>>>>> +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>> >>>>>>>> @@ -913,9 +913,18 @@ fs_visitor::nir_emit_alu(const fs_builder &bld, nir_alu_instr *instr)
>> >>>>>>>>        bld.MOV(result, negate(op[0]));
>> >>>>>>>>        break;
>> >>>>>>>>
>> >>>>>>>> -   case nir_op_f2b:
>> >>>>>>>> -      bld.CMP(result, op[0], brw_imm_f(0.0f), BRW_CONDITIONAL_NZ);
>> >>>>>>>> -      break;
>> >>>>>>>> +   case nir_op_f2b: {
>> >>>>>>>> +      /* Because comparing to 0.0f is subject to precision limitations, do the
>> >>>>>>>> +       * comparison using integers (we need to get rid of the sign bit for that)
>> >>>>>>>> +       */
>> >>>>>>>> +      if (devinfo->gen >= 8)
>> >>>>>>>> +         op[0] = resolve_source_modifiers(op[0]);
>> >>>>>>>> +      op[0] = retype(op[0], BRW_REGISTER_TYPE_UD);
>> >>>>>>>> +      bld.AND(op[0], op[0], brw_imm_ud(0x7FFFFFFFu));
>> >>>>>>>> +      bld.CMP(result, op[0], brw_imm_ud(0u), BRW_CONDITIONAL_NZ);
>> >>>>>>>> +       break;
>> >>>>>>>> +   }
>> >>>>>>>> +
>> >>>>>>>>     case nir_op_i2b:
>> >>>>>>>>        bld.CMP(result, op[0], brw_imm_d(0), BRW_CONDITIONAL_NZ);
>> >>>>>>>>        break;
>> >>>>>>>>
>> >>>>>>>
>> >>>>>>> Does that fix anything? I don't really see a problem with the existing
>> >>>>>>> logic. Yes any "non-zero value" should be converted to true. But surely
>> >>>>>>> that definition cannot include denorms, which you are always allowed to
>> >>>>>>> flush to zero.
>> >>>>>>> (Albeit I can't tell what the result would be with NaNs with the float
>> >>>>>>> compare, nor what the result actually should be in this case since glsl
>> >>>>>>> doesn't require NaNs neither.)
>> >>>>>>
>> >>>>>> Based on this and Jason's comments, I think we need a bunch of new tests.
>> >>>>>>
>> >>>>>>  - smallest positive normal number
>> >>>>>>  - abs of smallest positive normal number
>> >>>>>>  - neg of     "       "        "      "
>> >>>>>>  - largest positive subnormal number
>> >>>>>>  - abs of largest positive subnormal number
>> >>>>>>  - neg of    "        "        "        "
>> >>>>>>  - all of the above with negative numbers
>> >>>>>>  - NaN
>> >>>>>>  - abs of NaN
>> >>>>>>  - neg of  "
>> >>>>>>
>> >>>>>> Perhaps others? +/-Inf just for kicks?
>> >>>>>
>> >>>>> What's the point?  The result of most of the above (except possibly
>> >>>>> bool(NaN)) is undefined by the spec:
>> >>>>>
>> >>>>> "Any denormalized value input into a shader or potentially generated by
>> >>>>>  any operation in a shader can be flushed to 0. [...] NaNs are not
>> >>>>>  required to be generated. [...] Operations and built-in functions that
>> >>>>>  operate on a NaN are not required to return a NaN as the result."
>> >>>>
>> >>>> Except that apparently one major OpenGL vendor does something well
>> >>>> defined that's different than what we do.
>> >>>
>> >>> I'm skeptical that nVidia would treat single-precision denorms
>> >>> inconsistently between datatype constructors and other floating-point
>> >>> arithmetic, but assuming that's the case it would be an argument for
>> >>> proposing the spec change to Khronos rather than introducing a dubiously
>> >>> compliant change into the back-end.  I think I would argue against
>> >>> making such a change in the spec in any case, because even though it's
>> >>> implementation-defined whether denorms are flushed or not, the following
>> >>> is guaranteed by the spec AFAIUI:
>> >>>
>> >>> | if (bool(f))
>> >>> |    random_arithmetic_on(f /* Guaranteed not to be zero here even if
>> >>> |                              denorms are flushed */);
>> >>>
>> >>> With this change in, bool(f) would evaluate to true even if f is a
>> >>> denorm which is flushed to zero for all subsequent arithmetic in the
>> >>> block.  For that reason this seems more likely to break stuff than to
>> >>> fix stuff, it's not like applications can expect denorms to be
>> >>> representable at all regardless of what nVidia does, but they can expect
>> >>> the above GLSL source to work.
>> >>
>> >> I'd be curious to see a shader test that triggers the relevant
>> >> behaviour -- AFAIK nvidia always flushes denorms. The flush is a
>> >> per-instruction setting though [since Fermi], so it's technically
>> >> feasible not to flush sometimes and to flush other times.
>> >
>> > This one:
>> > https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_Igalia_piglit_blob_arb-5Fgpu-5Fshader-5Ffp64_tests_spec_arb-5Fgpu-5Fshader-5Ffp64_execution_fs-2Dconversion-2Dexplicit-2Ddouble-2Dbool-2Dbad.shader-5Ftest&d=BQICaQ&c=Sqcl0Ez6M0X8aeM67LKIiDJAXVeAw-YihVMNtXt-uEs&r=Vjtt0vs_iqoI31UfJxBl7yv9I2FeiaeAYgMTLKRBc_I&m=K5Q85JZWVAJuljMHdoEb9NzJVH29wma5HWxq4K5UNs0&s=knFllQfBVoab6tHDLvgEdZUWb7WW77hBt62DQT6Co3s&e=
>> >
>> > That test is for doubles. The author of the test told me that the
>> > equivalent version for float also passed, but I have just verified that
>> > he made a mistake. In summary: that test passes for double and fails for
>> > float...
>> >
>> > The quote from the spec shared by Curro says:
>> >
>> > "Any denormalized value input into a shader or potentially generated by
>> >  any operation in a shader can be flushed to 0"
>> >
>> > It says _can_ instead of _must_, so I share Curro's point of view that
>> > this is in fact undefined behavior by the spec and f2b of a denorm could
>> > return true or false and both are valid implementations. I wonder if
>> > nvidia's case of flushing floats but not doubles is also acceptable
>> > though, sounds like a bug to me.
>>
>> GL is always "can" for such things, and never "must" :-).
>> (d3d10 would require "must be flushed" for floats.)
>> Flushing denorms for floats but not for doubles looks perfectly fine to
>> me too, there's quite some hw which can't do denorms for floats (at
>> least not without compromises) but can do it for doubles (because
>> doubles had a low rate and were only useful for HPC, whereas floats need
>> to favor performance over everything else, essentially).
>
> Ah, interesting, thanks for sharing this!
>
> In that case I wonder what we should do for doubles in the case of i965.
> I have patches for both implementations, so which one do we want?

For GLSL it doesn't really matter, I'd go with whatever way is simpler
to implement.

>
> Iago
>
> _______________________________________________
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Iago Toral <itoral@igalia.com> writes:

> On Fri, 2016-02-26 at 16:28 +0100, Roland Scheidegger wrote:
>> Am 26.02.2016 um 11:25 schrieb Iago Toral:
>> > 
>> > On Thu, 2016-02-25 at 18:20 -0500, Ilia Mirkin wrote:
>> >> On Thu, Feb 25, 2016 at 6:16 PM, Francisco Jerez <currojerez@riseup.net> wrote:
>> >>> Ian Romanick <idr@freedesktop.org> writes:
>> >>>
>> >>>> On 02/25/2016 12:13 PM, Francisco Jerez wrote:
>> >>>>> Ian Romanick <idr@freedesktop.org> writes:
>> >>>>>
>> >>>>>> On 02/25/2016 08:46 AM, Roland Scheidegger wrote:
>> >>>>>>> Am 25.02.2016 um 11:15 schrieb Iago Toral Quiroga:
>> >>>>>>>> From the OpenGL 4.2 spec:
>> >>>>>>>>
>> >>>>>>>> "When a constructor is used to convert any integer or floating-point type to a
>> >>>>>>>>  bool, 0 and 0.0 are converted to false, and non-zero values are converted to
>> >>>>>>>>  true."
>> >>>>>>>>
>> >>>>>>>> Thus, even the smallest non-zero floating value should be translated to true.
>> >>>>>>>> This behavior has been verified also with proprietary NVIDIA drivers.
>> >>>>>>>>
>> >>>>>>>> Currently, we implement this conversion as a cmp.nz operation with floats,
>> >>>>>>>> subject to floating-point precision limitations, and as a result, relatively
>> >>>>>>>> small non-zero floating point numbers return false instead of true.
>> >>>>>>>>
>> >>>>>>>> This patch fixes the problem by getting rid of the sign bit (to cover the case
>> >>>>>>>> of -0.0) and testing the result against 0u using an integer comparison instead.
>> >>>>>>>> ---
>> >>>>>>>>  src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 15 ++++++++++++---
>> >>>>>>>>  1 file changed, 12 insertions(+), 3 deletions(-)
>> >>>>>>>>
>> >>>>>>>> diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>> >>>>>>>> index db20c71..7d62d7e 100644
>> >>>>>>>> --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>> >>>>>>>> +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>> >>>>>>>> @@ -913,9 +913,18 @@ fs_visitor::nir_emit_alu(const fs_builder &bld, nir_alu_instr *instr)
>> >>>>>>>>        bld.MOV(result, negate(op[0]));
>> >>>>>>>>        break;
>> >>>>>>>>
>> >>>>>>>> -   case nir_op_f2b:
>> >>>>>>>> -      bld.CMP(result, op[0], brw_imm_f(0.0f), BRW_CONDITIONAL_NZ);
>> >>>>>>>> -      break;
>> >>>>>>>> +   case nir_op_f2b: {
>> >>>>>>>> +      /* Because comparing to 0.0f is subject to precision limitations, do the
>> >>>>>>>> +       * comparison using integers (we need to get rid of the sign bit for that)
>> >>>>>>>> +       */
>> >>>>>>>> +      if (devinfo->gen >= 8)
>> >>>>>>>> +         op[0] = resolve_source_modifiers(op[0]);
>> >>>>>>>> +      op[0] = retype(op[0], BRW_REGISTER_TYPE_UD);
>> >>>>>>>> +      bld.AND(op[0], op[0], brw_imm_ud(0x7FFFFFFFu));
>> >>>>>>>> +      bld.CMP(result, op[0], brw_imm_ud(0u), BRW_CONDITIONAL_NZ);
>> >>>>>>>> +       break;
>> >>>>>>>> +   }
>> >>>>>>>> +
>> >>>>>>>>     case nir_op_i2b:
>> >>>>>>>>        bld.CMP(result, op[0], brw_imm_d(0), BRW_CONDITIONAL_NZ);
>> >>>>>>>>        break;
>> >>>>>>>>
>> >>>>>>>
>> >>>>>>> Does that fix anything? I don't really see a problem with the existing
>> >>>>>>> logic. Yes any "non-zero value" should be converted to true. But surely
>> >>>>>>> that definition cannot include denorms, which you are always allowed to
>> >>>>>>> flush to zero.
>> >>>>>>> (Albeit I can't tell what the result would be with NaNs with the float
>> >>>>>>> compare, nor what the result actually should be in this case since glsl
>> >>>>>>> doesn't require NaNs neither.)
>> >>>>>>
>> >>>>>> Based on this and Jason's comments, I think we need a bunch of new tests.
>> >>>>>>
>> >>>>>>  - smallest positive normal number
>> >>>>>>  - abs of smallest positive normal number
>> >>>>>>  - neg of     "       "        "      "
>> >>>>>>  - largest positive subnormal number
>> >>>>>>  - abs of largest positive subnormal number
>> >>>>>>  - neg of    "        "        "        "
>> >>>>>>  - all of the above with negative numbers
>> >>>>>>  - NaN
>> >>>>>>  - abs of NaN
>> >>>>>>  - neg of  "
>> >>>>>>
>> >>>>>> Perhaps others? +/-Inf just for kicks?
>> >>>>>
>> >>>>> What's the point?  The result of most of the above (except possibly
>> >>>>> bool(NaN)) is undefined by the spec:
>> >>>>>
>> >>>>> "Any denormalized value input into a shader or potentially generated by
>> >>>>>  any operation in a shader can be flushed to 0. [...] NaNs are not
>> >>>>>  required to be generated. [...] Operations and built-in functions that
>> >>>>>  operate on a NaN are not required to return a NaN as the result."
>> >>>>
>> >>>> Except that apparently one major OpenGL vendor does something well
>> >>>> defined that's different than what we do.
>> >>>
>> >>> I'm skeptical that nVidia would treat single-precision denorms
>> >>> inconsistently between datatype constructors and other floating-point
>> >>> arithmetic, but assuming that's the case it would be an argument for
>> >>> proposing the spec change to Khronos rather than introducing a dubiously
>> >>> compliant change into the back-end.  I think I would argue against
>> >>> making such a change in the spec in any case, because even though it's
>> >>> implementation-defined whether denorms are flushed or not, the following
>> >>> is guaranteed by the spec AFAIUI:
>> >>>
>> >>> | if (bool(f))
>> >>> |    random_arithmetic_on(f /* Guaranteed not to be zero here even if
>> >>> |                              denorms are flushed */);
>> >>>
>> >>> With this change in, bool(f) would evaluate to true even if f is a
>> >>> denorm which is flushed to zero for all subsequent arithmetic in the
>> >>> block.  For that reason this seems more likely to break stuff than to
>> >>> fix stuff, it's not like applications can expect denorms to be
>> >>> representable at all regardless of what nVidia does, but they can expect
>> >>> the above GLSL source to work.
>> >>
>> >> I'd be curious to see a shader test that triggers the relevant
>> >> behaviour -- AFAIK nvidia always flushes denorms. The flush is a
>> >> per-instruction setting though [since Fermi], so it's technically
>> >> feasible not to flush sometimes and to flush other times.
>> > 
>> > This one:
>> > https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_Igalia_piglit_blob_arb-5Fgpu-5Fshader-5Ffp64_tests_spec_arb-5Fgpu-5Fshader-5Ffp64_execution_fs-2Dconversion-2Dexplicit-2Ddouble-2Dbool-2Dbad.shader-5Ftest&d=BQICaQ&c=Sqcl0Ez6M0X8aeM67LKIiDJAXVeAw-YihVMNtXt-uEs&r=Vjtt0vs_iqoI31UfJxBl7yv9I2FeiaeAYgMTLKRBc_I&m=K5Q85JZWVAJuljMHdoEb9NzJVH29wma5HWxq4K5UNs0&s=knFllQfBVoab6tHDLvgEdZUWb7WW77hBt62DQT6Co3s&e= 
>> > 
>> > That test is for doubles. The author of the test told me that the
>> > equivalent version for float also passed, but I have just verified that
>> > he made a mistake. In summary: that test passes for double and fails for
>> > float... 
>> > 
>> > The quote from the spec shared by Curro says:
>> > 
>> > "Any denormalized value input into a shader or potentially generated by
>> >  any operation in a shader can be flushed to 0"
>> > 
>> > It says _can_ instead of _must_, so I share Curro's point of view that
>> > this is in fact undefined behavior by the spec and f2b of a denorm could

Strictly speaking it's not undefined but implementation-defined
behavior.  Either way I don't think the test above is correct, because
implementations are free to flush denorms to zero for either double or
single precision floats.

>> > return true or false and both are valid implementations. I wonder if
>> > nvidia's case of flushing floats but not doubles is also acceptable
>> > though, sounds like a bug to me.
>> 
>> GL is always "can" for such things, and never "must" :-).
>> (d3d10 would require "must be flushed" for floats.)
>> Flushing denorms for floats but not for doubles looks perfectly fine to
>> me too, there's quite some hw which can't do denorms for floats (at
>> least not without compromises) but can do it for doubles (because
>> doubles had a low rate and were only useful for HPC, whereas floats need
>> to favor performance over everything else, essentially).
>
> Ah, interesting, thanks for sharing this!
>
Yeah, on nVidia hardware it used to be the case (on older compute
capability 1.3 hardware, earlier Tesla hardware didn't support
double-precision floating point and always flushed denorms AFAIK) that
handling denormalized numbers was basically for free for much of the
double-precision pipeline (for some math instructions it did incur
substantial overhead which might be why you can disable it for some
instructions as Ilia pointed out), but the single-precision pipeline
wouldn't handle denorms at all.  Fermi and up handle denorms for both
single and double precision and a subset of arithmetic instructions
support forcing denorms to zero (the set may be different for single and
double-precision type which may explain why you don't get consistent
results from the test above).

> In that case I wonder what we should do for doubles in the case of i965.
> I have patches for both implementations, so which one do we want?

On Intel hardware the denorm handling mode is controlled globally in the
cr0 register (which in theory you can modify at run-time to mix both
behaviors, but I don't think that's a good idea for now :P).  The mode
can be set independently for each floating point type: Gen7 supports
preserving denorms on DF only, Gen8 supports them on DF and F, and Gen9
supports them on DF, F and HF.  In doubt I'd suggest to just leave them
disabled in all cases for now and until we have a notion of the impact
of handling denorms on performance.  It's not necessary for GLSL anyway.

>
> Iago