nv50/ir: Add mul and mod constant optimizations

Submitted by Mark Menzynski on July 23, 2019, 2:19 p.m.

Details

Message ID 20190723141946.29330-1-mmenzyns@redhat.com
State New
Headers show
Series "nv50/ir: Add mul and mod constant optimizations" ( rev: 1 ) in Mesa

Not browsing as part of any series.

Commit Message

Mark Menzynski July 23, 2019, 2:19 p.m.
Optimizations for 0/n, 1/n and 0%n.
No changes in shader db tests, because it is never used here, but it
should become handy.

Signed-off-by: Mark Menzynski <mmenzyns@redhat.com>
---
 .../nouveau/codegen/nv50_ir_peephole.cpp      | 30 +++++++++++++++++--
 1 file changed, 28 insertions(+), 2 deletions(-)

Patch hide | download patch | download mbox

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
index 0b3220903b9..12069e19808 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
@@ -1177,10 +1177,28 @@  ConstantFolding::opnd(Instruction *i, ImmediateValue &imm0, int s)
       break;
 
    case OP_DIV:
-      if (s != 1 || (i->dType != TYPE_S32 && i->dType != TYPE_U32))
+      if (i->dType != TYPE_S32 && i->dType != TYPE_U32)
          break;
+
       bld.setPosition(i, false);
-      if (imm0.reg.data.u32 == 0) {
+      if (s == 0) {
+         if (imm0.reg.data.u32 == 0) {
+            i->op = OP_MOV;
+            i->setSrc(1, NULL);
+         }
+         else if (imm0.reg.data.u32 == 1) {
+            Value *tA, *tB;
+            Instruction *slct;
+
+            tA = bld.mkOp1v(OP_ABS, TYPE_U32, bld.getSSA(), i->getSrc(1));
+            tB = bld.mkOp2v(OP_ADD, TYPE_S32, bld.getSSA(), tA, bld.loadImm(NULL, -1));
+            slct = bld.mkCmp(OP_SLCT, CC_GT, i->dType, bld.getSSA(), TYPE_U32, bld.loadImm(NULL, 0), i->getSrc(1), tB);
+            i->def(0).replace(slct->getDef(0), false);
+         }
+         break;
+      }
+
+      if (s != 1 || imm0.reg.data.u32 == 0) {
          break;
       } else
       if (imm0.reg.data.u32 == 1) {
@@ -1259,6 +1277,14 @@  ConstantFolding::opnd(Instruction *i, ImmediateValue &imm0, int s)
       break;
 
    case OP_MOD:
+      if (s == 0) {
+         if (imm0.reg.data.u32 == 0) {
+            i->op = OP_MOV;
+            i->setSrc(1, NULL);
+         }
+         break;
+      }
+
       if (s == 1 && imm0.isPow2()) {
          bld.setPosition(i, false);
          if (i->sType == TYPE_U32) {

Comments

You handle 1/n but not 1%n? TBH, the 1/n code isn't 100% obvious to
me... 1/n = |n|-1 > 0 ? .... i forget how SLCT works, but I can't
think of a way to finish that expression in terms of |n|-1 and n. And
what about n == 0. I'd just as soon drop that case.

On Tue, Jul 23, 2019 at 10:20 AM Mark Menzynski <mmenzyns@redhat.com> wrote:
>
> Optimizations for 0/n, 1/n and 0%n.
> No changes in shader db tests, because it is never used here, but it
> should become handy.
>
> Signed-off-by: Mark Menzynski <mmenzyns@redhat.com>
> ---
>  .../nouveau/codegen/nv50_ir_peephole.cpp      | 30 +++++++++++++++++--
>  1 file changed, 28 insertions(+), 2 deletions(-)
>
> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
> index 0b3220903b9..12069e19808 100644
> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
> @@ -1177,10 +1177,28 @@ ConstantFolding::opnd(Instruction *i, ImmediateValue &imm0, int s)
>        break;
>
>     case OP_DIV:
> -      if (s != 1 || (i->dType != TYPE_S32 && i->dType != TYPE_U32))
> +      if (i->dType != TYPE_S32 && i->dType != TYPE_U32)
>           break;
> +
>        bld.setPosition(i, false);
> -      if (imm0.reg.data.u32 == 0) {
> +      if (s == 0) {
> +         if (imm0.reg.data.u32 == 0) {
> +            i->op = OP_MOV;
> +            i->setSrc(1, NULL);
> +         }
> +         else if (imm0.reg.data.u32 == 1) {
> +            Value *tA, *tB;
> +            Instruction *slct;
> +
> +            tA = bld.mkOp1v(OP_ABS, TYPE_U32, bld.getSSA(), i->getSrc(1));
> +            tB = bld.mkOp2v(OP_ADD, TYPE_S32, bld.getSSA(), tA, bld.loadImm(NULL, -1));
> +            slct = bld.mkCmp(OP_SLCT, CC_GT, i->dType, bld.getSSA(), TYPE_U32, bld.loadImm(NULL, 0), i->getSrc(1), tB);
> +            i->def(0).replace(slct->getDef(0), false);
> +         }
> +         break;
> +      }
> +
> +      if (s != 1 || imm0.reg.data.u32 == 0) {
>           break;
>        } else
>        if (imm0.reg.data.u32 == 1) {
> @@ -1259,6 +1277,14 @@ ConstantFolding::opnd(Instruction *i, ImmediateValue &imm0, int s)
>        break;
>
>     case OP_MOD:
> +      if (s == 0) {
> +         if (imm0.reg.data.u32 == 0) {
> +            i->op = OP_MOV;
> +            i->setSrc(1, NULL);
> +         }
> +         break;
> +      }
> +
>        if (s == 1 && imm0.isPow2()) {
>           bld.setPosition(i, false);
>           if (i->sType == TYPE_U32) {
> --
> 2.21.0
>
> _______________________________________________
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
On Tue, Jul 23, 2019 at 4:50 PM Ilia Mirkin <imirkin@alum.mit.edu> wrote:
>
> You handle 1/n but not 1%n? TBH, the 1/n code isn't 100% obvious to
> me... 1/n = |n|-1 > 0 ? .... i forget how SLCT works, but I can't
> think of a way to finish that expression in terms of |n|-1 and n. And
> what about n == 0. I'd just as soon drop that case.
>

is 1/0 actually defined in glsl? I thought that the result is
undefined and we can basically do whatever, no? At least intel seems
to return INT_MAX for 1/0

> On Tue, Jul 23, 2019 at 10:20 AM Mark Menzynski <mmenzyns@redhat.com> wrote:
> >
> > Optimizations for 0/n, 1/n and 0%n.
> > No changes in shader db tests, because it is never used here, but it
> > should become handy.
> >
> > Signed-off-by: Mark Menzynski <mmenzyns@redhat.com>
> > ---
> >  .../nouveau/codegen/nv50_ir_peephole.cpp      | 30 +++++++++++++++++--
> >  1 file changed, 28 insertions(+), 2 deletions(-)
> >
> > diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
> > index 0b3220903b9..12069e19808 100644
> > --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
> > +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
> > @@ -1177,10 +1177,28 @@ ConstantFolding::opnd(Instruction *i, ImmediateValue &imm0, int s)
> >        break;
> >
> >     case OP_DIV:
> > -      if (s != 1 || (i->dType != TYPE_S32 && i->dType != TYPE_U32))
> > +      if (i->dType != TYPE_S32 && i->dType != TYPE_U32)
> >           break;
> > +
> >        bld.setPosition(i, false);
> > -      if (imm0.reg.data.u32 == 0) {
> > +      if (s == 0) {
> > +         if (imm0.reg.data.u32 == 0) {
> > +            i->op = OP_MOV;
> > +            i->setSrc(1, NULL);
> > +         }
> > +         else if (imm0.reg.data.u32 == 1) {
> > +            Value *tA, *tB;
> > +            Instruction *slct;
> > +
> > +            tA = bld.mkOp1v(OP_ABS, TYPE_U32, bld.getSSA(), i->getSrc(1));
> > +            tB = bld.mkOp2v(OP_ADD, TYPE_S32, bld.getSSA(), tA, bld.loadImm(NULL, -1));
> > +            slct = bld.mkCmp(OP_SLCT, CC_GT, i->dType, bld.getSSA(), TYPE_U32, bld.loadImm(NULL, 0), i->getSrc(1), tB);
> > +            i->def(0).replace(slct->getDef(0), false);
> > +         }
> > +         break;
> > +      }
> > +
> > +      if (s != 1 || imm0.reg.data.u32 == 0) {
> >           break;
> >        } else
> >        if (imm0.reg.data.u32 == 1) {
> > @@ -1259,6 +1277,14 @@ ConstantFolding::opnd(Instruction *i, ImmediateValue &imm0, int s)
> >        break;
> >
> >     case OP_MOD:
> > +      if (s == 0) {
> > +         if (imm0.reg.data.u32 == 0) {
> > +            i->op = OP_MOV;
> > +            i->setSrc(1, NULL);
> > +         }
> > +         break;
> > +      }
> > +
> >        if (s == 1 && imm0.isPow2()) {
> >           bld.setPosition(i, false);
> >           if (i->sType == TYPE_U32) {
> > --
> > 2.21.0
> >
> > _______________________________________________
> > mesa-dev mailing list
> > mesa-dev@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/mesa-dev
> _______________________________________________
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
On Tue, Jul 23, 2019 at 11:15 AM Karol Herbst <kherbst@redhat.com> wrote:
>
> On Tue, Jul 23, 2019 at 4:50 PM Ilia Mirkin <imirkin@alum.mit.edu> wrote:
> >
> > You handle 1/n but not 1%n? TBH, the 1/n code isn't 100% obvious to
> > me... 1/n = |n|-1 > 0 ? .... i forget how SLCT works, but I can't
> > think of a way to finish that expression in terms of |n|-1 and n. And
> > what about n == 0. I'd just as soon drop that case.
> >
>
> is 1/0 actually defined in glsl? I thought that the result is
> undefined and we can basically do whatever, no? At least intel seems
> to return INT_MAX for 1/0

If you guys really like it, just add more comments that cover my questions.
yeah.. I am not quite sure myself about it. But skipping on the div
emulation seems like a good idea in general. But it's also not common
enough to actually care all that much about it.

On Tue, Jul 23, 2019 at 5:18 PM Ilia Mirkin <imirkin@alum.mit.edu> wrote:
>
> On Tue, Jul 23, 2019 at 11:15 AM Karol Herbst <kherbst@redhat.com> wrote:
> >
> > On Tue, Jul 23, 2019 at 4:50 PM Ilia Mirkin <imirkin@alum.mit.edu> wrote:
> > >
> > > You handle 1/n but not 1%n? TBH, the 1/n code isn't 100% obvious to
> > > me... 1/n = |n|-1 > 0 ? .... i forget how SLCT works, but I can't
> > > think of a way to finish that expression in terms of |n|-1 and n. And
> > > what about n == 0. I'd just as soon drop that case.
> > >
> >
> > is 1/0 actually defined in glsl? I thought that the result is
> > undefined and we can basically do whatever, no? At least intel seems
> > to return INT_MAX for 1/0
>
> If you guys really like it, just add more comments that cover my questions.