gallivm: Improve lp_build_rcp_refine.

Submitted by Jose Fonseca on June 25, 2019, 9:17 a.m.

Details

Message ID 20190625091740.32556-1-jfonseca@vmware.com
State New
Headers show
Series "gallivm: Improve lp_build_rcp_refine." ( rev: 1 ) in Mesa

Not browsing as part of any series.

Commit Message

Jose Fonseca June 25, 2019, 9:17 a.m.
Use the alternative more accurate expression from
https://en.wikipedia.org/wiki/Division_algorithm#Newton%E2%80%93Raphson_division

Tested by enabling this code path, and running gloss mesa demo.
---
 src/gallium/auxiliary/gallivm/lp_bld_arit.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

-- 
2.17.1

Patch hide | download patch | download mbox

diff --git a/src/gallium/auxiliary/gallivm/lp_bld_arit.c b/src/gallium/auxiliary/gallivm/lp_bld_arit.c
index 02fb81afe51..8aa5931eb69 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_arit.c
+++ b/src/gallium/auxiliary/gallivm/lp_bld_arit.c
@@ -2707,11 +2707,11 @@  lp_build_sqrt(struct lp_build_context *bld,
 /**
  * Do one Newton-Raphson step to improve reciprocate precision:
  *
- *   x_{i+1} = x_i * (2 - a * x_i)
+ *   x_{i+1} = x_i + x_i * (1 - a * x_i)
  *
  * XXX: Unfortunately this won't give IEEE-754 conformant results for 0 or
  * +/-Inf, giving NaN instead.  Certain applications rely on this behavior,
- * such as Google Earth, which does RCP(RSQRT(0.0) when drawing the Earth's
+ * such as Google Earth, which does RCP(RSQRT(0.0)) when drawing the Earth's
  * halo. It would be necessary to clamp the argument to prevent this.
  *
  * See also:
@@ -2724,12 +2724,13 @@  lp_build_rcp_refine(struct lp_build_context *bld,
                     LLVMValueRef rcp_a)
 {
    LLVMBuilderRef builder = bld->gallivm->builder;
-   LLVMValueRef two = lp_build_const_vec(bld->gallivm, bld->type, 2.0);
    LLVMValueRef res;
 
    res = LLVMBuildFMul(builder, a, rcp_a, "");
-   res = LLVMBuildFSub(builder, two, res, "");
+   res = LLVMBuildFSub(builder, bld->one, res, "");
+
    res = LLVMBuildFMul(builder, rcp_a, res, "");
+   res = LLVMBuildFAdd(builder, rcp_a, res, "");
 
    return res;
 }

Comments

Looks good to me, albeit it's potentially minimally slower, so I'm
wondering if the higher precision is actually useful?
I guess though the last two steps could use lp_build_fmuladd?
Reviewed-by: Roland Scheidegger <sroland@vmware.com>



Am 25.06.19 um 11:17 schrieb Jose Fonseca:
> Use the alternative more accurate expression from
> https://en.wikipedia.org/wiki/Division_algorithm#Newton%E2%80%93Raphson_division
> 
> Tested by enabling this code path, and running gloss mesa demo.
> ---
>  src/gallium/auxiliary/gallivm/lp_bld_arit.c | 9 +++++----
>  1 file changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/src/gallium/auxiliary/gallivm/lp_bld_arit.c b/src/gallium/auxiliary/gallivm/lp_bld_arit.c
> index 02fb81afe51..8aa5931eb69 100644
> --- a/src/gallium/auxiliary/gallivm/lp_bld_arit.c
> +++ b/src/gallium/auxiliary/gallivm/lp_bld_arit.c
> @@ -2707,11 +2707,11 @@ lp_build_sqrt(struct lp_build_context *bld,
>  /**
>   * Do one Newton-Raphson step to improve reciprocate precision:
>   *
> - *   x_{i+1} = x_i * (2 - a * x_i)
> + *   x_{i+1} = x_i + x_i * (1 - a * x_i)
>   *
>   * XXX: Unfortunately this won't give IEEE-754 conformant results for 0 or
>   * +/-Inf, giving NaN instead.  Certain applications rely on this behavior,
> - * such as Google Earth, which does RCP(RSQRT(0.0) when drawing the Earth's
> + * such as Google Earth, which does RCP(RSQRT(0.0)) when drawing the Earth's
>   * halo. It would be necessary to clamp the argument to prevent this.
>   *
>   * See also:
> @@ -2724,12 +2724,13 @@ lp_build_rcp_refine(struct lp_build_context *bld,
>                      LLVMValueRef rcp_a)
>  {
>     LLVMBuilderRef builder = bld->gallivm->builder;
> -   LLVMValueRef two = lp_build_const_vec(bld->gallivm, bld->type, 2.0);
>     LLVMValueRef res;
>  
>     res = LLVMBuildFMul(builder, a, rcp_a, "");
> -   res = LLVMBuildFSub(builder, two, res, "");
> +   res = LLVMBuildFSub(builder, bld->one, res, "");
> +
>     res = LLVMBuildFMul(builder, rcp_a, res, "");
> +   res = LLVMBuildFAdd(builder, rcp_a, res, "");
>  
>     return res;
>  }
>
On 25/06/2019 16:22, Roland Scheidegger wrote:
> Looks good to me, albeit it's potentially minimally slower, so I'm

> wondering if the higher precision is actually useful?


It gets you an extra bit, and is necessary if you want to reach 0.5 ULP 
(otherwise it never gets there.)

Anyway, it's still disabled.  This change is mostly for reference in 
case we once want to enable this code path.  We could still use, in 
situations where we don't get or don't care for inf.  Like perspective 
correct interpolation.

> I guess though the last two steps could use lp_build_fmuladd?


Good point.  I've updated to only use fmuladd.  It might be faster and 
converge even more quickly:


@@ -2724,12 +2724,12 @@ lp_build_rcp_refine(struct lp_build_context *bld,
                      LLVMValueRef rcp_a)
  {
     LLVMBuilderRef builder = bld->gallivm->builder;
-   LLVMValueRef two = lp_build_const_vec(bld->gallivm, bld->type, 2.0);
+   LLVMValueRef neg_a;
     LLVMValueRef res;

-   res = LLVMBuildFMul(builder, a, rcp_a, "");
-   res = LLVMBuildFSub(builder, two, res, "");
-   res = LLVMBuildFMul(builder, rcp_a, res, "");
+   neg_a = LLVMBuildFNeg(builder, a, "");
+   res = lp_build_fmuladd(builder, neg_a, rcp_a, bld->one);
+   res = lp_build_fmuladd(builder, res, rcp_a, rcp_a);

     return res;
  }


Jose