[Mesa-dev,09/32] i965/fs: Fix fs_inst::regs_written calculation for instructions with scalar dst.

Submitted by Francisco Jerez on Feb. 6, 2015, 2:42 p.m.

Details

Message ID 1423233792-11767-9-git-send-email-currojerez@riseup.net
State New
Headers show

Not browsing as part of any series.

Commit Message

Francisco Jerez Feb. 6, 2015, 2:42 p.m.
Scalar registers are required to have zero stride, fix the
regs_written calculation not to assume that the instruction writes
zero registers in that case.
---
 src/mesa/drivers/dri/i965/brw_fs.cpp | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Patch hide | download patch | download mbox

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
index 0a82fb7..bafe438 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -126,7 +126,8 @@  fs_inst::init(enum opcode opcode, uint8_t exec_size, const fs_reg &dst,
    case HW_REG:
    case MRF:
    case ATTR:
-      this->regs_written = (dst.width * dst.stride * type_sz(dst.type) + 31) / 32;
+      this->regs_written =
+         CEILING(MAX2(dst.width * dst.stride, 1) * type_sz(dst.type), 32);
       break;
    case BAD_FILE:
       this->regs_written = 0;

Comments

On Fri, Feb 6, 2015 at 6:42 AM, Francisco Jerez <currojerez@riseup.net> wrote:
> Scalar registers are required to have zero stride, fix the
> regs_written calculation not to assume that the instruction writes
> zero registers in that case.
> ---
>  src/mesa/drivers/dri/i965/brw_fs.cpp | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
> index 0a82fb7..bafe438 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
> @@ -126,7 +126,8 @@ fs_inst::init(enum opcode opcode, uint8_t exec_size, const fs_reg &dst,
>     case HW_REG:
>     case MRF:
>     case ATTR:
> -      this->regs_written = (dst.width * dst.stride * type_sz(dst.type) + 31) / 32;
> +      this->regs_written =
> +         CEILING(MAX2(dst.width * dst.stride, 1) * type_sz(dst.type), 32);

I sent basically the same patch ("[PATCH 06/10] i965/fs: Consider
subregister writes to be writing a register.")

I don't really care which gets committed, but I did have to think a
lot more about yours (and look up what CEILING did) to be sure it did
the same thing. Maybe you like mine better?
Matt Turner <mattst88@gmail.com> writes:

> On Fri, Feb 6, 2015 at 6:42 AM, Francisco Jerez <currojerez@riseup.net> wrote:
>> Scalar registers are required to have zero stride, fix the
>> regs_written calculation not to assume that the instruction writes
>> zero registers in that case.
>> ---
>>  src/mesa/drivers/dri/i965/brw_fs.cpp | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
>> index 0a82fb7..bafe438 100644
>> --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
>> +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
>> @@ -126,7 +126,8 @@ fs_inst::init(enum opcode opcode, uint8_t exec_size, const fs_reg &dst,
>>     case HW_REG:
>>     case MRF:
>>     case ATTR:
>> -      this->regs_written = (dst.width * dst.stride * type_sz(dst.type) + 31) / 32;
>> +      this->regs_written =
>> +         CEILING(MAX2(dst.width * dst.stride, 1) * type_sz(dst.type), 32);
>
> I sent basically the same patch ("[PATCH 06/10] i965/fs: Consider
> subregister writes to be writing a register.")
>
> I don't really care which gets committed, but I did have to think a
> lot more about yours (and look up what CEILING did) to be sure it did
> the same thing. Maybe you like mine better?

I find the helper function easier to understand and less error-prone
than the open-coded version.  I admit though that the "easier to
understand" part may not be the case for someone that is not familiar
with CEILING(), as its meaning is far from obvious from its name.  Maybe
we could come up with a more descriptive name?
On Saturday, February 07, 2015 02:10:19 AM Francisco Jerez wrote:
> Matt Turner <mattst88@gmail.com> writes:
> 
> > On Fri, Feb 6, 2015 at 6:42 AM, Francisco Jerez <currojerez@riseup.net> wrote:
> >> Scalar registers are required to have zero stride, fix the
> >> regs_written calculation not to assume that the instruction writes
> >> zero registers in that case.
> >> ---
> >>  src/mesa/drivers/dri/i965/brw_fs.cpp | 3 ++-
> >>  1 file changed, 2 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
> >> index 0a82fb7..bafe438 100644
> >> --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
> >> +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
> >> @@ -126,7 +126,8 @@ fs_inst::init(enum opcode opcode, uint8_t exec_size, const fs_reg &dst,
> >>     case HW_REG:
> >>     case MRF:
> >>     case ATTR:
> >> -      this->regs_written = (dst.width * dst.stride * type_sz(dst.type) + 31) / 32;
> >> +      this->regs_written =
> >> +         CEILING(MAX2(dst.width * dst.stride, 1) * type_sz(dst.type), 32);
> >
> > I sent basically the same patch ("[PATCH 06/10] i965/fs: Consider
> > subregister writes to be writing a register.")
> >
> > I don't really care which gets committed, but I did have to think a
> > lot more about yours (and look up what CEILING did) to be sure it did
> > the same thing. Maybe you like mine better?
> 
> I find the helper function easier to understand and less error-prone
> than the open-coded version.  I admit though that the "easier to
> understand" part may not be the case for someone that is not familiar
> with CEILING(), as its meaning is far from obvious from its name.  Maybe
> we could come up with a more descriptive name?

Wow, I don't like the name of that macro at all.  I would expect it to
do the well known mathematical function, ceil().  It doesn't.

In most places in the driver, we do:

   ALIGN(value, 32) / 32;

which would be my preference.  I definitely dislike the old open coded
(val + 31) / 32 business.

Beyond the style...Matt put MAX2 around the whole expression, and you
put it around the inside.  Those might differ - is one more correct than
the other?
Kenneth Graunke <kenneth@whitecape.org> writes:

> On Saturday, February 07, 2015 02:10:19 AM Francisco Jerez wrote:
>> Matt Turner <mattst88@gmail.com> writes:
>> 
>> > On Fri, Feb 6, 2015 at 6:42 AM, Francisco Jerez <currojerez@riseup.net> wrote:
>> >> Scalar registers are required to have zero stride, fix the
>> >> regs_written calculation not to assume that the instruction writes
>> >> zero registers in that case.
>> >> ---
>> >>  src/mesa/drivers/dri/i965/brw_fs.cpp | 3 ++-
>> >>  1 file changed, 2 insertions(+), 1 deletion(-)
>> >>
>> >> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
>> >> index 0a82fb7..bafe438 100644
>> >> --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
>> >> +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
>> >> @@ -126,7 +126,8 @@ fs_inst::init(enum opcode opcode, uint8_t exec_size, const fs_reg &dst,
>> >>     case HW_REG:
>> >>     case MRF:
>> >>     case ATTR:
>> >> -      this->regs_written = (dst.width * dst.stride * type_sz(dst.type) + 31) / 32;
>> >> +      this->regs_written =
>> >> +         CEILING(MAX2(dst.width * dst.stride, 1) * type_sz(dst.type), 32);
>> >
>> > I sent basically the same patch ("[PATCH 06/10] i965/fs: Consider
>> > subregister writes to be writing a register.")
>> >
>> > I don't really care which gets committed, but I did have to think a
>> > lot more about yours (and look up what CEILING did) to be sure it did
>> > the same thing. Maybe you like mine better?
>> 
>> I find the helper function easier to understand and less error-prone
>> than the open-coded version.  I admit though that the "easier to
>> understand" part may not be the case for someone that is not familiar
>> with CEILING(), as its meaning is far from obvious from its name.  Maybe
>> we could come up with a more descriptive name?
>
> Wow, I don't like the name of that macro at all.  I would expect it to
> do the well known mathematical function, ceil().  It doesn't.
>
It does, well, sort of...  I'm not sure who came up with that macro, but
it seems that what the author had in mind was indeed taking the ceiling
of the fraction given by p/q for CEILING(p, q).

> In most places in the driver, we do:
>
>    ALIGN(value, 32) / 32;
>
> which would be my preference.  I definitely dislike the old open coded
> (val + 31) / 32 business.
>
That only works for powers of two, and it still forces you to duplicate
the denominator -- in this specific case it's okay but it starts to get
annoying when the denominator is longer than two keystrokes.  How about
we rename CEILING() to DIV_ROUND_UP() as the kernel calls it?

> Beyond the style...Matt put MAX2 around the whole expression, and you
> put it around the inside.  Those might differ - is one more correct than
> the other?

It can only make a difference for type_sz(x) == 0 or for type_sz(x)
larger than the unit regs_written is expressed in.  So right now they
are effectively equivalent but they wouldn't be if at some point we
wanted to keep track of sub-register (e.g. scalar) writes, if we ever do
that we'll have to take the max of width * stride only.
On Sun, Feb 8, 2015 at 4:52 AM, Francisco Jerez <currojerez@riseup.net> wrote:
> Kenneth Graunke <kenneth@whitecape.org> writes:
>
>> On Saturday, February 07, 2015 02:10:19 AM Francisco Jerez wrote:
>>> Matt Turner <mattst88@gmail.com> writes:
>>>
>>> > On Fri, Feb 6, 2015 at 6:42 AM, Francisco Jerez <currojerez@riseup.net> wrote:
>>> >> Scalar registers are required to have zero stride, fix the
>>> >> regs_written calculation not to assume that the instruction writes
>>> >> zero registers in that case.
>>> >> ---
>>> >>  src/mesa/drivers/dri/i965/brw_fs.cpp | 3 ++-
>>> >>  1 file changed, 2 insertions(+), 1 deletion(-)
>>> >>
>>> >> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
>>> >> index 0a82fb7..bafe438 100644
>>> >> --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
>>> >> +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
>>> >> @@ -126,7 +126,8 @@ fs_inst::init(enum opcode opcode, uint8_t exec_size, const fs_reg &dst,
>>> >>     case HW_REG:
>>> >>     case MRF:
>>> >>     case ATTR:
>>> >> -      this->regs_written = (dst.width * dst.stride * type_sz(dst.type) + 31) / 32;
>>> >> +      this->regs_written =
>>> >> +         CEILING(MAX2(dst.width * dst.stride, 1) * type_sz(dst.type), 32);
>>> >
>>> > I sent basically the same patch ("[PATCH 06/10] i965/fs: Consider
>>> > subregister writes to be writing a register.")
>>> >
>>> > I don't really care which gets committed, but I did have to think a
>>> > lot more about yours (and look up what CEILING did) to be sure it did
>>> > the same thing. Maybe you like mine better?
>>>
>>> I find the helper function easier to understand and less error-prone
>>> than the open-coded version.  I admit though that the "easier to
>>> understand" part may not be the case for someone that is not familiar
>>> with CEILING(), as its meaning is far from obvious from its name.  Maybe
>>> we could come up with a more descriptive name?
>>
>> Wow, I don't like the name of that macro at all.  I would expect it to
>> do the well known mathematical function, ceil().  It doesn't.
>>
> It does, well, sort of...  I'm not sure who came up with that macro, but
> it seems that what the author had in mind was indeed taking the ceiling
> of the fraction given by p/q for CEILING(p, q).
>
>> In most places in the driver, we do:
>>
>>    ALIGN(value, 32) / 32;
>>
>> which would be my preference.  I definitely dislike the old open coded
>> (val + 31) / 32 business.
>>
> That only works for powers of two, and it still forces you to duplicate
> the denominator -- in this specific case it's okay but it starts to get
> annoying when the denominator is longer than two keystrokes.  How about
> we rename CEILING() to DIV_ROUND_UP() as the kernel calls it?

I like that suggestion.
On Sunday, February 08, 2015 02:52:40 PM Francisco Jerez wrote:
> Kenneth Graunke <kenneth@whitecape.org> writes:
[snip]
> > Wow, I don't like the name of that macro at all.  I would expect it to
> > do the well known mathematical function, ceil().  It doesn't.
> >
> It does, well, sort of...  I'm not sure who came up with that macro, but
> it seems that what the author had in mind was indeed taking the ceiling
> of the fraction given by p/q for CEILING(p, q).
> 
> > In most places in the driver, we do:
> >
> >    ALIGN(value, 32) / 32;
> >
> > which would be my preference.  I definitely dislike the old open coded
> > (val + 31) / 32 business.
> >
> That only works for powers of two, and it still forces you to duplicate
> the denominator -- in this specific case it's okay but it starts to get
> annoying when the denominator is longer than two keystrokes.  How about
> we rename CEILING() to DIV_ROUND_UP() as the kernel calls it?

Good point.  I like that suggestion.

> 
> > Beyond the style...Matt put MAX2 around the whole expression, and you
> > put it around the inside.  Those might differ - is one more correct than
> > the other?
> 
> It can only make a difference for type_sz(x) == 0 or for type_sz(x)
> larger than the unit regs_written is expressed in.  So right now they
> are effectively equivalent but they wouldn't be if at some point we
> wanted to keep track of sub-register (e.g. scalar) writes, if we ever do
> that we'll have to take the max of width * stride only.

Let's rename CEILING to DIV_ROUND_UP and go with your patch.

With that change,
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>