[Mesa-dev,2/9] radeonsi: use V_BFE for extracting a sample index

Submitted by Marek Olšák on March 2, 2015, 11:54 a.m.

Details

Message ID 1425297263-25991-2-git-send-email-maraeo@gmail.com
State New
Headers show

Not browsing as part of any series.

Commit Message

Marek Olšák March 2, 2015, 11:54 a.m.
From: Marek Olšák <marek.olsak@amd.com>

---
 src/gallium/drivers/radeonsi/si_shader.c | 22 ++++++++++++++++------
 1 file changed, 16 insertions(+), 6 deletions(-)

Patch hide | download patch | download mbox

diff --git a/src/gallium/drivers/radeonsi/si_shader.c b/src/gallium/drivers/radeonsi/si_shader.c
index b0417ed..f125483 100644
--- a/src/gallium/drivers/radeonsi/si_shader.c
+++ b/src/gallium/drivers/radeonsi/si_shader.c
@@ -192,6 +192,20 @@  static int get_param_index(unsigned semantic_name, unsigned index,
 }
 
 /**
+ * BitField Extract: ((value >> rshift) & ((1 << bitwidth) - 1))
+ */
+static LLVMValueRef build_bfe(struct gallivm_state *gallivm,
+			      LLVMValueRef value, LLVMValueRef rshift,
+			      LLVMValueRef bitwidth)
+{
+	LLVMValueRef args[3] = {value, rshift, bitwidth};
+
+	return build_intrinsic(gallivm->builder, "llvm.AMDGPU.bfe.u32",
+			       LLVMInt32TypeInContext(gallivm->context),
+			       args, Elements(args), LLVMReadNoneAttribute);
+}
+
+/**
  * Build an LLVM bytecode indexed load using LLVMBuildGEP + LLVMBuildLoad.
  * It's equivalent to doing a load from &base_ptr[index].
  *
@@ -1721,7 +1735,6 @@  static void tex_fetch_args(
 
 		/* Initialize some constants. */
 		LLVMValueRef four = LLVMConstInt(uint_bld->elem_type, 4, 0);
-		LLVMValueRef F = LLVMConstInt(uint_bld->elem_type, 0xF, 0);
 
 		/* Apply the formula. */
 		LLVMValueRef fmask =
@@ -1734,11 +1747,8 @@  static void tex_fetch_args(
 		LLVMValueRef sample_index4 =
 			LLVMBuildMul(gallivm->builder, address[sample_chan], four, "");
 
-		LLVMValueRef shifted_fmask =
-			LLVMBuildLShr(gallivm->builder, fmask, sample_index4, "");
-
-		LLVMValueRef final_sample =
-			LLVMBuildAnd(gallivm->builder, shifted_fmask, F, "");
+		LLVMValueRef final_sample = build_bfe(gallivm, fmask,
+						      sample_index4, four);
 
 		/* Don't rewrite the sample index if WORD1.DATA_FORMAT of the FMASK
 		 * resource descriptor is 0 (invalid),

Comments

On Mon, Mar 02, 2015 at 12:54:16PM +0100, Marek Olšák wrote:
> From: Marek Olšák <marek.olsak@amd.com>
> 
> ---
>  src/gallium/drivers/radeonsi/si_shader.c | 22 ++++++++++++++++------
>  1 file changed, 16 insertions(+), 6 deletions(-)
> 
> diff --git a/src/gallium/drivers/radeonsi/si_shader.c b/src/gallium/drivers/radeonsi/si_shader.c
> index b0417ed..f125483 100644
> --- a/src/gallium/drivers/radeonsi/si_shader.c
> +++ b/src/gallium/drivers/radeonsi/si_shader.c
> @@ -192,6 +192,20 @@ static int get_param_index(unsigned semantic_name, unsigned index,
>  }
>  
>  /**
> + * BitField Extract: ((value >> rshift) & ((1 << bitwidth) - 1))
> + */

Ideally, we would just add a pattern for this in the backend and emit generic
LLVM IR here.  This would also make it possible to share the code with llvmpipe.

I think the best place to do this would be in AMDGPUTargetLowering::performDAGCombine().

-Tom

> +static LLVMValueRef build_bfe(struct gallivm_state *gallivm,
> +			      LLVMValueRef value, LLVMValueRef rshift,
> +			      LLVMValueRef bitwidth)
> +{
> +	LLVMValueRef args[3] = {value, rshift, bitwidth};
> +
> +	return build_intrinsic(gallivm->builder, "llvm.AMDGPU.bfe.u32",
> +			       LLVMInt32TypeInContext(gallivm->context),
> +			       args, Elements(args), LLVMReadNoneAttribute);
> +}
> +
> +/**
>   * Build an LLVM bytecode indexed load using LLVMBuildGEP + LLVMBuildLoad.
>   * It's equivalent to doing a load from &base_ptr[index].
>   *
> @@ -1721,7 +1735,6 @@ static void tex_fetch_args(
>  
>  		/* Initialize some constants. */
>  		LLVMValueRef four = LLVMConstInt(uint_bld->elem_type, 4, 0);
> -		LLVMValueRef F = LLVMConstInt(uint_bld->elem_type, 0xF, 0);
>  
>  		/* Apply the formula. */
>  		LLVMValueRef fmask =
> @@ -1734,11 +1747,8 @@ static void tex_fetch_args(
>  		LLVMValueRef sample_index4 =
>  			LLVMBuildMul(gallivm->builder, address[sample_chan], four, "");
>  
> -		LLVMValueRef shifted_fmask =
> -			LLVMBuildLShr(gallivm->builder, fmask, sample_index4, "");
> -
> -		LLVMValueRef final_sample =
> -			LLVMBuildAnd(gallivm->builder, shifted_fmask, F, "");
> +		LLVMValueRef final_sample = build_bfe(gallivm, fmask,
> +						      sample_index4, four);
>  
>  		/* Don't rewrite the sample index if WORD1.DATA_FORMAT of the FMASK
>  		 * resource descriptor is 0 (invalid),
> -- 
> 2.1.0
> 
> _______________________________________________
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
On Mon, Mar 2, 2015 at 10:05 PM, Tom Stellard <tom@stellard.net> wrote:
> On Mon, Mar 02, 2015 at 12:54:16PM +0100, Marek Olšák wrote:
>> From: Marek Olšák <marek.olsak@amd.com>
>>
>> ---
>>  src/gallium/drivers/radeonsi/si_shader.c | 22 ++++++++++++++++------
>>  1 file changed, 16 insertions(+), 6 deletions(-)
>>
>> diff --git a/src/gallium/drivers/radeonsi/si_shader.c b/src/gallium/drivers/radeonsi/si_shader.c
>> index b0417ed..f125483 100644
>> --- a/src/gallium/drivers/radeonsi/si_shader.c
>> +++ b/src/gallium/drivers/radeonsi/si_shader.c
>> @@ -192,6 +192,20 @@ static int get_param_index(unsigned semantic_name, unsigned index,
>>  }
>>
>>  /**
>> + * BitField Extract: ((value >> rshift) & ((1 << bitwidth) - 1))
>> + */
>
> Ideally, we would just add a pattern for this in the backend and emit generic
> LLVM IR here.  This would also make it possible to share the code with llvmpipe.
>
> I think the best place to do this would be in AMDGPUTargetLowering::performDAGCombine().

Why not SIInstructions.td?

Marek
On Mon, Mar 02, 2015 at 10:14:00PM +0100, Marek Olšák wrote:
> On Mon, Mar 2, 2015 at 10:05 PM, Tom Stellard <tom@stellard.net> wrote:
> > On Mon, Mar 02, 2015 at 12:54:16PM +0100, Marek Olšák wrote:
> >> From: Marek Olšák <marek.olsak@amd.com>
> >>
> >> ---
> >>  src/gallium/drivers/radeonsi/si_shader.c | 22 ++++++++++++++++------
> >>  1 file changed, 16 insertions(+), 6 deletions(-)
> >>
> >> diff --git a/src/gallium/drivers/radeonsi/si_shader.c b/src/gallium/drivers/radeonsi/si_shader.c
> >> index b0417ed..f125483 100644
> >> --- a/src/gallium/drivers/radeonsi/si_shader.c
> >> +++ b/src/gallium/drivers/radeonsi/si_shader.c
> >> @@ -192,6 +192,20 @@ static int get_param_index(unsigned semantic_name, unsigned index,
> >>  }
> >>
> >>  /**
> >> + * BitField Extract: ((value >> rshift) & ((1 << bitwidth) - 1))
> >> + */
> >
> > Ideally, we would just add a pattern for this in the backend and emit generic
> > LLVM IR here.  This would also make it possible to share the code with llvmpipe.
> >
> > I think the best place to do this would be in AMDGPUTargetLowering::performDAGCombine().
> 
> Why not SIInstructions.td?
> 

Because for patterns like this, I think it is important to match them as
early as possible, because there may be another optimization which reduces
the sequence from 5 to 4 instructions which would cause the pattern not to match.

-Tom

> Marek
> On Mar 2, 2015, at 1:19 PM, Tom Stellard <tom@stellard.net> wrote:
> 
> On Mon, Mar 02, 2015 at 10:14:00PM +0100, Marek Olšák wrote:
>> On Mon, Mar 2, 2015 at 10:05 PM, Tom Stellard <tom@stellard.net> wrote:
>>> On Mon, Mar 02, 2015 at 12:54:16PM +0100, Marek Olšák wrote:
>>>> From: Marek Olšák <marek.olsak@amd.com>
>>>> 
>>>> ---
>>>> src/gallium/drivers/radeonsi/si_shader.c | 22 ++++++++++++++++------
>>>> 1 file changed, 16 insertions(+), 6 deletions(-)
>>>> 
>>>> diff --git a/src/gallium/drivers/radeonsi/si_shader.c b/src/gallium/drivers/radeonsi/si_shader.c
>>>> index b0417ed..f125483 100644
>>>> --- a/src/gallium/drivers/radeonsi/si_shader.c
>>>> +++ b/src/gallium/drivers/radeonsi/si_shader.c
>>>> @@ -192,6 +192,20 @@ static int get_param_index(unsigned semantic_name, unsigned index,
>>>> }
>>>> 
>>>> /**
>>>> + * BitField Extract: ((value >> rshift) & ((1 << bitwidth) - 1))
>>>> + */
>>> 
>>> Ideally, we would just add a pattern for this in the backend and emit generic
>>> LLVM IR here.  This would also make it possible to share the code with llvmpipe.
>>> 
>>> I think the best place to do this would be in AMDGPUTargetLowering::performDAGCombine().
>> 
>> Why not SIInstructions.td?
>> 
> 
> Because for patterns like this, I think it is important to match them as
> early as possible, because there may be another optimization which reduces
> the sequence from 5 to 4 instructions which would cause the pattern not to match.
> 
> -Tom


I think the opposite in this case. The basic bit operations have a lot of existing combines on them, and the computeKnownBits implementations are more complete. The BFE nodes are not as well understood, and trickier to deal with. AArch64 and NVPTX both have essentially the same instruction, and they have a large bit of code to match them in their ISelDAGToDAGs. I’ve wanted to add a generic BFE node to be matched after legalization, but I haven’t had time to do it.

-Matt
On Mon, Mar 02, 2015 at 02:09:29PM -0800, Matt Arsenault wrote:
> 
> > On Mar 2, 2015, at 1:19 PM, Tom Stellard <tom@stellard.net> wrote:
> > 
> > On Mon, Mar 02, 2015 at 10:14:00PM +0100, Marek Olšák wrote:
> >> On Mon, Mar 2, 2015 at 10:05 PM, Tom Stellard <tom@stellard.net> wrote:
> >>> On Mon, Mar 02, 2015 at 12:54:16PM +0100, Marek Olšák wrote:
> >>>> From: Marek Olšák <marek.olsak@amd.com>
> >>>> 
> >>>> ---
> >>>> src/gallium/drivers/radeonsi/si_shader.c | 22 ++++++++++++++++------
> >>>> 1 file changed, 16 insertions(+), 6 deletions(-)
> >>>> 
> >>>> diff --git a/src/gallium/drivers/radeonsi/si_shader.c b/src/gallium/drivers/radeonsi/si_shader.c
> >>>> index b0417ed..f125483 100644
> >>>> --- a/src/gallium/drivers/radeonsi/si_shader.c
> >>>> +++ b/src/gallium/drivers/radeonsi/si_shader.c
> >>>> @@ -192,6 +192,20 @@ static int get_param_index(unsigned semantic_name, unsigned index,
> >>>> }
> >>>> 
> >>>> /**
> >>>> + * BitField Extract: ((value >> rshift) & ((1 << bitwidth) - 1))
> >>>> + */
> >>> 
> >>> Ideally, we would just add a pattern for this in the backend and emit generic
> >>> LLVM IR here.  This would also make it possible to share the code with llvmpipe.
> >>> 
> >>> I think the best place to do this would be in AMDGPUTargetLowering::performDAGCombine().
> >> 
> >> Why not SIInstructions.td?
> >> 
> > 
> > Because for patterns like this, I think it is important to match them as
> > early as possible, because there may be another optimization which reduces
> > the sequence from 5 to 4 instructions which would cause the pattern not to match.
> > 
> > -Tom
> 
> 
> I think the opposite in this case. The basic bit operations have a lot of existing combines on them, and the computeKnownBits implementations are more complete. The BFE nodes are not as well understood, and trickier to deal with. AArch64 and NVPTX both have essentially the same instruction, and they have a large bit of code to match them in their ISelDAGToDAGs. I’ve wanted to add a generic BFE node to be matched after legalization, but I haven’t had time to do it.
> 

Isn't there a potential for a DAG combine on bit operations, to 'break'
the pattern so it can't be recognized?

-Tom

> -Matt
Since you acked patch #3, which depends on this, I assume this has your Rb too?

Marek

On Mon, Mar 2, 2015 at 10:05 PM, Tom Stellard <tom@stellard.net> wrote:
> On Mon, Mar 02, 2015 at 12:54:16PM +0100, Marek Olšák wrote:
>> From: Marek Olšák <marek.olsak@amd.com>
>>
>> ---
>>  src/gallium/drivers/radeonsi/si_shader.c | 22 ++++++++++++++++------
>>  1 file changed, 16 insertions(+), 6 deletions(-)
>>
>> diff --git a/src/gallium/drivers/radeonsi/si_shader.c b/src/gallium/drivers/radeonsi/si_shader.c
>> index b0417ed..f125483 100644
>> --- a/src/gallium/drivers/radeonsi/si_shader.c
>> +++ b/src/gallium/drivers/radeonsi/si_shader.c
>> @@ -192,6 +192,20 @@ static int get_param_index(unsigned semantic_name, unsigned index,
>>  }
>>
>>  /**
>> + * BitField Extract: ((value >> rshift) & ((1 << bitwidth) - 1))
>> + */
>
> Ideally, we would just add a pattern for this in the backend and emit generic
> LLVM IR here.  This would also make it possible to share the code with llvmpipe.
>
> I think the best place to do this would be in AMDGPUTargetLowering::performDAGCombine().
>
> -Tom
>
>> +static LLVMValueRef build_bfe(struct gallivm_state *gallivm,
>> +                           LLVMValueRef value, LLVMValueRef rshift,
>> +                           LLVMValueRef bitwidth)
>> +{
>> +     LLVMValueRef args[3] = {value, rshift, bitwidth};
>> +
>> +     return build_intrinsic(gallivm->builder, "llvm.AMDGPU.bfe.u32",
>> +                            LLVMInt32TypeInContext(gallivm->context),
>> +                            args, Elements(args), LLVMReadNoneAttribute);
>> +}
>> +
>> +/**
>>   * Build an LLVM bytecode indexed load using LLVMBuildGEP + LLVMBuildLoad.
>>   * It's equivalent to doing a load from &base_ptr[index].
>>   *
>> @@ -1721,7 +1735,6 @@ static void tex_fetch_args(
>>
>>               /* Initialize some constants. */
>>               LLVMValueRef four = LLVMConstInt(uint_bld->elem_type, 4, 0);
>> -             LLVMValueRef F = LLVMConstInt(uint_bld->elem_type, 0xF, 0);
>>
>>               /* Apply the formula. */
>>               LLVMValueRef fmask =
>> @@ -1734,11 +1747,8 @@ static void tex_fetch_args(
>>               LLVMValueRef sample_index4 =
>>                       LLVMBuildMul(gallivm->builder, address[sample_chan], four, "");
>>
>> -             LLVMValueRef shifted_fmask =
>> -                     LLVMBuildLShr(gallivm->builder, fmask, sample_index4, "");
>> -
>> -             LLVMValueRef final_sample =
>> -                     LLVMBuildAnd(gallivm->builder, shifted_fmask, F, "");
>> +             LLVMValueRef final_sample = build_bfe(gallivm, fmask,
>> +                                                   sample_index4, four);
>>
>>               /* Don't rewrite the sample index if WORD1.DATA_FORMAT of the FMASK
>>                * resource descriptor is 0 (invalid),
>> --
>> 2.1.0
>>
>> _______________________________________________
>> mesa-dev mailing list
>> mesa-dev@lists.freedesktop.org
>> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
On Thu, Mar 05, 2015 at 05:14:09PM +0100, Marek Olšák wrote:
> Since you acked patch #3, which depends on this, I assume this has your Rb too?
> 

No, I still want to resolved with Matt what to do about intrinsics vs
IR.

-Tom

> Marek
> 
> On Mon, Mar 2, 2015 at 10:05 PM, Tom Stellard <tom@stellard.net> wrote:
> > On Mon, Mar 02, 2015 at 12:54:16PM +0100, Marek Olšák wrote:
> >> From: Marek Olšák <marek.olsak@amd.com>
> >>
> >> ---
> >>  src/gallium/drivers/radeonsi/si_shader.c | 22 ++++++++++++++++------
> >>  1 file changed, 16 insertions(+), 6 deletions(-)
> >>
> >> diff --git a/src/gallium/drivers/radeonsi/si_shader.c b/src/gallium/drivers/radeonsi/si_shader.c
> >> index b0417ed..f125483 100644
> >> --- a/src/gallium/drivers/radeonsi/si_shader.c
> >> +++ b/src/gallium/drivers/radeonsi/si_shader.c
> >> @@ -192,6 +192,20 @@ static int get_param_index(unsigned semantic_name, unsigned index,
> >>  }
> >>
> >>  /**
> >> + * BitField Extract: ((value >> rshift) & ((1 << bitwidth) - 1))
> >> + */
> >
> > Ideally, we would just add a pattern for this in the backend and emit generic
> > LLVM IR here.  This would also make it possible to share the code with llvmpipe.
> >
> > I think the best place to do this would be in AMDGPUTargetLowering::performDAGCombine().
> >
> > -Tom
> >
> >> +static LLVMValueRef build_bfe(struct gallivm_state *gallivm,
> >> +                           LLVMValueRef value, LLVMValueRef rshift,
> >> +                           LLVMValueRef bitwidth)
> >> +{
> >> +     LLVMValueRef args[3] = {value, rshift, bitwidth};
> >> +
> >> +     return build_intrinsic(gallivm->builder, "llvm.AMDGPU.bfe.u32",
> >> +                            LLVMInt32TypeInContext(gallivm->context),
> >> +                            args, Elements(args), LLVMReadNoneAttribute);
> >> +}
> >> +
> >> +/**
> >>   * Build an LLVM bytecode indexed load using LLVMBuildGEP + LLVMBuildLoad.
> >>   * It's equivalent to doing a load from &base_ptr[index].
> >>   *
> >> @@ -1721,7 +1735,6 @@ static void tex_fetch_args(
> >>
> >>               /* Initialize some constants. */
> >>               LLVMValueRef four = LLVMConstInt(uint_bld->elem_type, 4, 0);
> >> -             LLVMValueRef F = LLVMConstInt(uint_bld->elem_type, 0xF, 0);
> >>
> >>               /* Apply the formula. */
> >>               LLVMValueRef fmask =
> >> @@ -1734,11 +1747,8 @@ static void tex_fetch_args(
> >>               LLVMValueRef sample_index4 =
> >>                       LLVMBuildMul(gallivm->builder, address[sample_chan], four, "");
> >>
> >> -             LLVMValueRef shifted_fmask =
> >> -                     LLVMBuildLShr(gallivm->builder, fmask, sample_index4, "");
> >> -
> >> -             LLVMValueRef final_sample =
> >> -                     LLVMBuildAnd(gallivm->builder, shifted_fmask, F, "");
> >> +             LLVMValueRef final_sample = build_bfe(gallivm, fmask,
> >> +                                                   sample_index4, four);
> >>
> >>               /* Don't rewrite the sample index if WORD1.DATA_FORMAT of the FMASK
> >>                * resource descriptor is 0 (invalid),
> >> --
> >> 2.1.0
> >>
> >> _______________________________________________
> >> mesa-dev mailing list
> >> mesa-dev@lists.freedesktop.org
> >> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
> On Mar 5, 2015, at 6:50 AM, Tom Stellard <tom@stellard.net> wrote:
> 
> On Mon, Mar 02, 2015 at 02:09:29PM -0800, Matt Arsenault wrote:
>> 
>>> On Mar 2, 2015, at 1:19 PM, Tom Stellard <tom@stellard.net> wrote:
>>> 
>>> On Mon, Mar 02, 2015 at 10:14:00PM +0100, Marek Olšák wrote:
>>>> On Mon, Mar 2, 2015 at 10:05 PM, Tom Stellard <tom@stellard.net> wrote:
>>>>> On Mon, Mar 02, 2015 at 12:54:16PM +0100, Marek Olšák wrote:
>>>>>> From: Marek Olšák <marek.olsak@amd.com>
>>>>>> 
>>>>>> ---
>>>>>> src/gallium/drivers/radeonsi/si_shader.c | 22 ++++++++++++++++------
>>>>>> 1 file changed, 16 insertions(+), 6 deletions(-)
>>>>>> 
>>>>>> diff --git a/src/gallium/drivers/radeonsi/si_shader.c b/src/gallium/drivers/radeonsi/si_shader.c
>>>>>> index b0417ed..f125483 100644
>>>>>> --- a/src/gallium/drivers/radeonsi/si_shader.c
>>>>>> +++ b/src/gallium/drivers/radeonsi/si_shader.c
>>>>>> @@ -192,6 +192,20 @@ static int get_param_index(unsigned semantic_name, unsigned index,
>>>>>> }
>>>>>> 
>>>>>> /**
>>>>>> + * BitField Extract: ((value >> rshift) & ((1 << bitwidth) - 1))
>>>>>> + */
>>>>> 
>>>>> Ideally, we would just add a pattern for this in the backend and emit generic
>>>>> LLVM IR here.  This would also make it possible to share the code with llvmpipe.
>>>>> 
>>>>> I think the best place to do this would be in AMDGPUTargetLowering::performDAGCombine().
>>>> 
>>>> Why not SIInstructions.td?
>>>> 
>>> 
>>> Because for patterns like this, I think it is important to match them as
>>> early as possible, because there may be another optimization which reduces
>>> the sequence from 5 to 4 instructions which would cause the pattern not to match.
>>> 
>>> -Tom
>> 
>> 
>> I think the opposite in this case. The basic bit operations have a lot of existing combines on them, and the computeKnownBits implementations are more complete. The BFE nodes are not as well understood, and trickier to deal with. AArch64 and NVPTX both have essentially the same instruction, and they have a large bit of code to match them in their ISelDAGToDAGs. I’ve wanted to add a generic BFE node to be matched after legalization, but I haven’t had time to do it.
>> 
> 
> Isn't there a potential for a DAG combine on bit operations, to 'break'
> the pattern so it can't be recognized?
> 
> -Tom
> 

Yes and no. There isn’t really only one pattern for this. Ideally the different ∑ays to use this would each form a single canonical form the pattern would need to match, but that isn’t always possible. The AArch64 code for example tries many different patterns (see isBitfieldExtractOp in AArch64ISelDAGToDAG)