[RFC,2/4] nir: Add a new ALU nir_op_imad

Submitted by Eduardo Lima Mitev on Jan. 25, 2019, 3:48 p.m.

Details

Message ID 20190125154833.25757-3-elima@igalia.com
State New
Series "freedreno: Move some compiler offset computations to NIR"
Headers show

Commit Message

Eduardo Lima Mitev Jan. 25, 2019, 3:48 p.m.
ir3 compiler has an integer multiply-add instruction (IMAD_S24)
that is used for different offset calculations in the backend.
Since we intend to move some of these calculations to NIR, we need
a new ALU op that can represent it.
---
 src/compiler/nir/nir_opcodes.py | 1 +
 1 file changed, 1 insertion(+)

Patch hide | download patch | download mbox

diff --git a/src/compiler/nir/nir_opcodes.py b/src/compiler/nir/nir_opcodes.py
index d32005846a6..b61845fd514 100644
--- a/src/compiler/nir/nir_opcodes.py
+++ b/src/compiler/nir/nir_opcodes.py
@@ -754,6 +754,7 @@  def triop_horiz(name, output_size, src1_size, src2_size, src3_size, const_expr):
    [tuint, tuint, tuint], "", const_expr)
 
 triop("ffma", tfloat, "src0 * src1 + src2")
+triop("imad", tint, "src0 * src1 + src2")
 
 triop("flrp", tfloat, "src0 * (1 - src2) + src1 * src2")
 

Comments

Ilia Mirkin Jan. 25, 2019, 3:58 p.m.
IMAD_S24 isn't src0 * src1 + src2 though. I think this could be called
imad24, which I suspect exits on many GPUs (nv50-era NVIDIA definitely
had this, and I think maxwell+ has a variant of this implemented by
XMAD):

(src0 * src1) & 0xffffff + src2

Cheers,

  -ilia

On Fri, Jan 25, 2019 at 10:49 AM Eduardo Lima Mitev <elima@igalia.com> wrote:
>
> ir3 compiler has an integer multiply-add instruction (IMAD_S24)
> that is used for different offset calculations in the backend.
> Since we intend to move some of these calculations to NIR, we need
> a new ALU op that can represent it.
> ---
>  src/compiler/nir/nir_opcodes.py | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/src/compiler/nir/nir_opcodes.py b/src/compiler/nir/nir_opcodes.py
> index d32005846a6..b61845fd514 100644
> --- a/src/compiler/nir/nir_opcodes.py
> +++ b/src/compiler/nir/nir_opcodes.py
> @@ -754,6 +754,7 @@ def triop_horiz(name, output_size, src1_size, src2_size, src3_size, const_expr):
>     [tuint, tuint, tuint], "", const_expr)
>
>  triop("ffma", tfloat, "src0 * src1 + src2")
> +triop("imad", tint, "src0 * src1 + src2")
>
>  triop("flrp", tfloat, "src0 * (1 - src2) + src1 * src2")
>
> --
> 2.20.1
>
> _______________________________________________
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Ilia Mirkin Jan. 25, 2019, 4:01 p.m.
On Fri, Jan 25, 2019 at 10:58 AM Ilia Mirkin <imirkin@alum.mit.edu> wrote:
>
> IMAD_S24 isn't src0 * src1 + src2 though. I think this could be called
> imad24, which I suspect exits on many GPUs (nv50-era NVIDIA definitely
> had this, and I think maxwell+ has a variant of this implemented by
> XMAD):
>
> (src0 * src1) & 0xffffff + src2

And of course even that's wrong... the 24th bit has to get
sign-extended on that. Can express it with shifts.
Eduardo Lima Mitev Jan. 25, 2019, 4:06 p.m.
On 1/25/19 5:01 PM, Ilia Mirkin wrote:
> On Fri, Jan 25, 2019 at 10:58 AM Ilia Mirkin <imirkin@alum.mit.edu> wrote:
>>
>> IMAD_S24 isn't src0 * src1 + src2 though. I think this could be called
>> imad24, which I suspect exits on many GPUs (nv50-era NVIDIA definitely
>> had this, and I think maxwell+ has a variant of this implemented by
>> XMAD):
>>
>> (src0 * src1) & 0xffffff + src2
> 
> And of course even that's wrong... the 24th bit has to get
> sign-extended on that. Can express it with shifts.
>

IMAD_S24 is what is currently used in
ir3_compiler_nir::get_image_offset(), so the pass doesn't change
anything regarding computations.

I agree that the nir opcode should hint at the bit limit, so probably
nir_op_imad24. That is one of the open questions.

Thanks,

Eduardo
Ilia Mirkin Jan. 25, 2019, 4:07 p.m.
The specification in NIR has to be exact. Otherwise it will
constant-fold in a way that doesn't reflect what the hardware would
do, leading to subtle bugs.

On Fri, Jan 25, 2019 at 11:06 AM Eduardo Lima Mitev <elima@igalia.com> wrote:
>
> On 1/25/19 5:01 PM, Ilia Mirkin wrote:
> > On Fri, Jan 25, 2019 at 10:58 AM Ilia Mirkin <imirkin@alum.mit.edu> wrote:
> >>
> >> IMAD_S24 isn't src0 * src1 + src2 though. I think this could be called
> >> imad24, which I suspect exits on many GPUs (nv50-era NVIDIA definitely
> >> had this, and I think maxwell+ has a variant of this implemented by
> >> XMAD):
> >>
> >> (src0 * src1) & 0xffffff + src2
> >
> > And of course even that's wrong... the 24th bit has to get
> > sign-extended on that. Can express it with shifts.
> >
>
> IMAD_S24 is what is currently used in
> ir3_compiler_nir::get_image_offset(), so the pass doesn't change
> anything regarding computations.
>
> I agree that the nir opcode should hint at the bit limit, so probably
> nir_op_imad24. That is one of the open questions.
>
> Thanks,
>
> Eduardo
>
>
Eric Anholt Jan. 25, 2019, 5:46 p.m.
Eduardo Lima Mitev <elima@igalia.com> writes:

> ir3 compiler has an integer multiply-add instruction (IMAD_S24)
> that is used for different offset calculations in the backend.
> Since we intend to move some of these calculations to NIR, we need
> a new ALU op that can represent it.
> ---
>  src/compiler/nir/nir_opcodes.py | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/src/compiler/nir/nir_opcodes.py b/src/compiler/nir/nir_opcodes.py
> index d32005846a6..b61845fd514 100644
> --- a/src/compiler/nir/nir_opcodes.py
> +++ b/src/compiler/nir/nir_opcodes.py
> @@ -754,6 +754,7 @@ def triop_horiz(name, output_size, src1_size, src2_size, src3_size, const_expr):
>     [tuint, tuint, tuint], "", const_expr)
>  
>  triop("ffma", tfloat, "src0 * src1 + src2")
> +triop("imad", tint, "src0 * src1 + src2")

The constant-folding expression should be corrected to what the backend
operation actually does, and you should probably call it imad24 or
something in that case.
Eduardo Lima Mitev Jan. 26, 2019, 8:40 a.m.
On 1/25/19 6:46 PM, Eric Anholt wrote:
> Eduardo Lima Mitev <elima@igalia.com> writes:
> 
>> ir3 compiler has an integer multiply-add instruction (IMAD_S24)
>> that is used for different offset calculations in the backend.
>> Since we intend to move some of these calculations to NIR, we need
>> a new ALU op that can represent it.
>> ---
>>  src/compiler/nir/nir_opcodes.py | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/src/compiler/nir/nir_opcodes.py b/src/compiler/nir/nir_opcodes.py
>> index d32005846a6..b61845fd514 100644
>> --- a/src/compiler/nir/nir_opcodes.py
>> +++ b/src/compiler/nir/nir_opcodes.py
>> @@ -754,6 +754,7 @@ def triop_horiz(name, output_size, src1_size, src2_size, src3_size, const_expr):
>>     [tuint, tuint, tuint], "", const_expr)
>>  
>>  triop("ffma", tfloat, "src0 * src1 + src2")
>> +triop("imad", tint, "src0 * src1 + src2")
> 
> The constant-folding expression should be corrected to what the backend
> operation actually does, and you should probably call it imad24 or
> something in that case.
> 

Roger. Got similar feedback from Ilia. Will fix that.

Eduardo