[3/3,RFC] Revert "drm/i915: use variadic macros and arrays to choose port/pipe based registers"

Submitted by Arnd Bergmann on March 20, 2017, 9:40 a.m.

Details

Message ID 20170320094335.1266306-3-arnd@arndb.de
State New
Headers show
Series "Series without cover letter" ( rev: 1 ) in DRI devel

Not browsing as part of any series.

Commit Message

Arnd Bergmann March 20, 2017, 9:40 a.m.
The varargs macro trick in _PIPE3/_PHY3/_PORT3 was meant as an optimization
to shrink the i915 kernel module by around 1000 bytes. However, the
downside is a size regression with CONFIG_KASAN, as I found from stack size
warnings with gcc-7.0.1:

before:
drivers/gpu/drm/i915/intel_dpll_mgr.c: In function 'bxt_ddi_pll_get_hw_state':
drivers/gpu/drm/i915/intel_dpll_mgr.c:1644:1: error: the frame size of 176 bytes is larger than 100 bytes [-Werror=frame-larger-than=]
drivers/gpu/drm/i915/intel_dpll_mgr.c: In function 'bxt_ddi_pll_enable':
drivers/gpu/drm/i915/intel_dpll_mgr.c:1548:1: error: the frame size of 224 bytes is larger than 100 bytes [-Werror=frame-larger-than=]

after:
drivers/gpu/drm/i915/intel_dpll_mgr.c: In function 'bxt_ddi_pll_get_hw_state':
drivers/gpu/drm/i915/intel_dpll_mgr.c:1644:1: error: the frame size of 1016 bytes is larger than 1000 bytes [-Werror=frame-larger-than=]
drivers/gpu/drm/i915/intel_dpll_mgr.c: In function 'bxt_ddi_pll_enable':
drivers/gpu/drm/i915/intel_dpll_mgr.c:1548:1: error: the frame size of 1960 bytes is larger than 1000 bytes [-Werror=frame-larger-than=]

I also checked the module sizes and got

before:
   text	   data	    bss	    dec	    hex	filename
2704592	 412025	  11104	3127721	 2fb9a9	drivers/gpu/drm/i915/i915.o

after:
   text	   data	    bss	    dec	    hex	filename
2718538	 412025	  11104	3141667	 2ff023	drivers/gpu/drm/i915/i915.o

showing a significant code size growth. This reverts the patch to avoid
the warning and get back to the better code with CONFIG_KASAN. If someone
has another idea to avoid the warning while keeping the more efficient
code for the non-KASAN case, that would obviously be better.

Fixes: ce64645d86ac ("drm/i915: use variadic macros and arrays to choose port/pipe based registers")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
 drivers/gpu/drm/i915/i915_reg.h | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

Patch hide | download patch | download mbox

diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 04c8f69fcc62..aa732eccdeb5 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -48,8 +48,6 @@  static inline bool i915_mmio_reg_valid(i915_reg_t reg)
 	return !i915_mmio_reg_equal(reg, INVALID_MMIO_REG);
 }
 
-#define _PICK(__index, ...) (((const u32 []){ __VA_ARGS__ })[__index])
-
 #define _PIPE(pipe, a, b) ((a) + (pipe)*((b)-(a)))
 #define _MMIO_PIPE(pipe, a, b) _MMIO(_PIPE(pipe, a, b))
 #define _PLANE(plane, a, b) _PIPE(plane, a, b)
@@ -58,11 +56,14 @@  static inline bool i915_mmio_reg_valid(i915_reg_t reg)
 #define _MMIO_TRANS(tran, a, b) _MMIO(_TRANS(tran, a, b))
 #define _PORT(port, a, b) ((a) + (port)*((b)-(a)))
 #define _MMIO_PORT(port, a, b) _MMIO(_PORT(port, a, b))
-#define _PIPE3(pipe, ...) _PICK(pipe, __VA_ARGS__)
+#define _PIPE3(pipe, a, b, c) ((pipe) == PIPE_A ? (a) : \
+			       (pipe) == PIPE_B ? (b) : (c))
 #define _MMIO_PIPE3(pipe, a, b, c) _MMIO(_PIPE3(pipe, a, b, c))
-#define _PORT3(port, ...) _PICK(port, __VA_ARGS__)
+#define _PORT3(port, a, b, c) ((port) == PORT_A ? (a) : \
+			       (port) == PORT_B ? (b) : (c))
 #define _MMIO_PORT3(pipe, a, b, c) _MMIO(_PORT3(pipe, a, b, c))
-#define _PHY3(phy, ...) _PICK(phy, __VA_ARGS__)
+#define _PHY3(phy, a, b, c) ((phy) == DPIO_PHY0 ? (a) : \
+			     (phy) == DPIO_PHY1 ? (b) : (c))
 #define _MMIO_PHY3(phy, a, b, c) _MMIO(_PHY3(phy, a, b, c))
 
 #define _MASKED_FIELD(mask, value) ({					   \

Comments

On Mon, 20 Mar 2017, Arnd Bergmann <arnd@arndb.de> wrote:
> The varargs macro trick in _PIPE3/_PHY3/_PORT3 was meant as an optimization
> to shrink the i915 kernel module by around 1000 bytes.

To be clear, it was not at all intended to be an optimization, nothing
of the sort. The intention was to make it easier and less error prone to
add more parameters to said macros. The text size shring was just a
bonus.

> However, the
> downside is a size regression with CONFIG_KASAN, as I found from stack size
> warnings with gcc-7.0.1:

In his review of the original change, Chris provided this comparison
https://godbolt.org/g/YCK1od

How does CONFIG_KASAN change this? Would be nice to see how the
generated code blows up.


BR,
Jani.


>
> before:
> drivers/gpu/drm/i915/intel_dpll_mgr.c: In function 'bxt_ddi_pll_get_hw_state':
> drivers/gpu/drm/i915/intel_dpll_mgr.c:1644:1: error: the frame size of 176 bytes is larger than 100 bytes [-Werror=frame-larger-than=]
> drivers/gpu/drm/i915/intel_dpll_mgr.c: In function 'bxt_ddi_pll_enable':
> drivers/gpu/drm/i915/intel_dpll_mgr.c:1548:1: error: the frame size of 224 bytes is larger than 100 bytes [-Werror=frame-larger-than=]
>
> after:
> drivers/gpu/drm/i915/intel_dpll_mgr.c: In function 'bxt_ddi_pll_get_hw_state':
> drivers/gpu/drm/i915/intel_dpll_mgr.c:1644:1: error: the frame size of 1016 bytes is larger than 1000 bytes [-Werror=frame-larger-than=]
> drivers/gpu/drm/i915/intel_dpll_mgr.c: In function 'bxt_ddi_pll_enable':
> drivers/gpu/drm/i915/intel_dpll_mgr.c:1548:1: error: the frame size of 1960 bytes is larger than 1000 bytes [-Werror=frame-larger-than=]
>
> I also checked the module sizes and got
>
> before:
>    text	   data	    bss	    dec	    hex	filename
> 2704592	 412025	  11104	3127721	 2fb9a9	drivers/gpu/drm/i915/i915.o
>
> after:
>    text	   data	    bss	    dec	    hex	filename
> 2718538	 412025	  11104	3141667	 2ff023	drivers/gpu/drm/i915/i915.o
>
> showing a significant code size growth. This reverts the patch to avoid
> the warning and get back to the better code with CONFIG_KASAN. If someone
> has another idea to avoid the warning while keeping the more efficient
> code for the non-KASAN case, that would obviously be better.
>
> Fixes: ce64645d86ac ("drm/i915: use variadic macros and arrays to choose port/pipe based registers")
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
> ---
>  drivers/gpu/drm/i915/i915_reg.h | 11 ++++++-----
>  1 file changed, 6 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
> index 04c8f69fcc62..aa732eccdeb5 100644
> --- a/drivers/gpu/drm/i915/i915_reg.h
> +++ b/drivers/gpu/drm/i915/i915_reg.h
> @@ -48,8 +48,6 @@ static inline bool i915_mmio_reg_valid(i915_reg_t reg)
>  	return !i915_mmio_reg_equal(reg, INVALID_MMIO_REG);
>  }
>  
> -#define _PICK(__index, ...) (((const u32 []){ __VA_ARGS__ })[__index])
> -
>  #define _PIPE(pipe, a, b) ((a) + (pipe)*((b)-(a)))
>  #define _MMIO_PIPE(pipe, a, b) _MMIO(_PIPE(pipe, a, b))
>  #define _PLANE(plane, a, b) _PIPE(plane, a, b)
> @@ -58,11 +56,14 @@ static inline bool i915_mmio_reg_valid(i915_reg_t reg)
>  #define _MMIO_TRANS(tran, a, b) _MMIO(_TRANS(tran, a, b))
>  #define _PORT(port, a, b) ((a) + (port)*((b)-(a)))
>  #define _MMIO_PORT(port, a, b) _MMIO(_PORT(port, a, b))
> -#define _PIPE3(pipe, ...) _PICK(pipe, __VA_ARGS__)
> +#define _PIPE3(pipe, a, b, c) ((pipe) == PIPE_A ? (a) : \
> +			       (pipe) == PIPE_B ? (b) : (c))
>  #define _MMIO_PIPE3(pipe, a, b, c) _MMIO(_PIPE3(pipe, a, b, c))
> -#define _PORT3(port, ...) _PICK(port, __VA_ARGS__)
> +#define _PORT3(port, a, b, c) ((port) == PORT_A ? (a) : \
> +			       (port) == PORT_B ? (b) : (c))
>  #define _MMIO_PORT3(pipe, a, b, c) _MMIO(_PORT3(pipe, a, b, c))
> -#define _PHY3(phy, ...) _PICK(phy, __VA_ARGS__)
> +#define _PHY3(phy, a, b, c) ((phy) == DPIO_PHY0 ? (a) : \
> +			     (phy) == DPIO_PHY1 ? (b) : (c))
>  #define _MMIO_PHY3(phy, a, b, c) _MMIO(_PHY3(phy, a, b, c))
>  
>  #define _MASKED_FIELD(mask, value) ({					   \
On Mon, Mar 20, 2017 at 11:08 AM, Jani Nikula
<jani.nikula@linux.intel.com> wrote:
> On Mon, 20 Mar 2017, Arnd Bergmann <arnd@arndb.de> wrote:
>> The varargs macro trick in _PIPE3/_PHY3/_PORT3 was meant as an optimization
>> to shrink the i915 kernel module by around 1000 bytes.
>
> To be clear, it was not at all intended to be an optimization, nothing
> of the sort. The intention was to make it easier and less error prone to
> add more parameters to said macros. The text size shring was just a
> bonus.
>
>> However, the
>> downside is a size regression with CONFIG_KASAN, as I found from stack size
>> warnings with gcc-7.0.1:
>
> In his review of the original change, Chris provided this comparison
> https://godbolt.org/g/YCK1od
>
> How does CONFIG_KASAN change this? Would be nice to see how the
> generated code blows up.
>
I don't know how to generate a URL for it, but after adding this to the
command line for gcc-7,

-fsanitize=kernel-address -fasan-shadow-offset=0xdfff900000000000
--param asan-stack=1 --param asan-globals=1 --param
asan-instrumentation-with-call-threshold=10000
-fsanitize-address-use-after-scope

the code turned from really nice into the log series of checks below.
Without  -fsanitize-address-use-after-scope (which didn't exist before gcc-7),
it's less bad but still exceeds the (arbitrary) 1536 byte limit.

     Arnd

.LC0:
        .string "2 32 4 1 i 96 24 9 <unknown> "
main:
.LASANPC0:
        pushq   %r12
        pushq   %rbp
        movabsq $-2305966154516004864, %rdx
        pushq   %rbx
        subq    $160, %rsp
        movq    %rsp, %rbp
        leaq    96(%rsp), %rbx
        movq    $1102416563, (%rsp)
        shrq    $3, %rbp
        movq    $.LC0, 8(%rsp)
        movq    $.LASANPC0, 16(%rsp)
        leaq    0(%rbp,%rdx), %rax
        movl    $-235802127, (%rax)
        movl    $-218959356, 4(%rax)
        movl    $-218959118, 8(%rax)
        movl    $-234881024, 12(%rax)
        movl    $-202116109, 16(%rax)
        movq    %rbx, %rax
        movl    $1, 32(%rsp)
        shrq    $3, %rax
        movzbl  (%rax,%rdx), %eax
        testb   %al, %al
        je      .L2
        cmpb    $3, %al
        jle     .L53
.L2:
        leaq    4(%rbx), %rdi
        movabsq $-2305966154516004864, %rax
        movl    $0, 96(%rsp)
        movq    %rdi, %rdx
        shrq    $3, %rdx
        movzbl  (%rdx,%rax), %edx
        movq    %rdi, %rax
        andl    $7, %eax
        addl    $3, %eax
        cmpb    %dl, %al
        jl      .L3
        testb   %dl, %dl
        jne     .L54
.L3:
        leaq    8(%rbx), %rdi
        movabsq $-2305966154516004864, %rax
        movl    $1, 100(%rsp)
        movq    %rdi, %rdx
        shrq    $3, %rdx
        movzbl  (%rdx,%rax), %eax
        testb   %al, %al
        je      .L4
        cmpb    $3, %al
        jle     .L55
.L4:
        leaq    12(%rbx), %rdi
        movabsq $-2305966154516004864, %rax
        movl    $2, 104(%rsp)
        movq    %rdi, %rdx
        shrq    $3, %rdx
        movzbl  (%rdx,%rax), %edx
        movq    %rdi, %rax
        andl    $7, %eax
        addl    $3, %eax
        cmpb    %dl, %al
        jl      .L5
        testb   %dl, %dl
        jne     .L56
.L5:
        leaq    16(%rbx), %rdi
        movabsq $-2305966154516004864, %rax
        movl    $3, 108(%rsp)
        movq    %rdi, %rdx
        shrq    $3, %rdx
        movzbl  (%rdx,%rax), %eax
        testb   %al, %al
        je      .L6
        cmpb    $3, %al
        jle     .L57
.L6:
        leaq    20(%rbx), %rdi
        movabsq $-2305966154516004864, %rax
        movl    $4, 112(%rsp)
        movq    %rdi, %rdx
        shrq    $3, %rdx
        movzbl  (%rdx,%rax), %edx
        movq    %rdi, %rax
        andl    $7, %eax
        addl    $3, %eax
        cmpb    %dl, %al
        jl      .L7
        testb   %dl, %dl
        jne     .L58
.L7:
        movslq  32(%rsp), %r12
        movabsq $-2305966154516004864, %rax
        movl    $5, 116(%rsp)
        leaq    (%rbx,%r12,4), %rdi
        movq    %rdi, %rdx
        shrq    $3, %rdx
        movzbl  (%rdx,%rax), %edx
        movq    %rdi, %rax
        andl    $7, %eax
        addl    $3, %eax
        cmpb    %dl, %al
        jl      .L8
        testb   %dl, %dl
        jne     .L59
.L8:
        pxor    %xmm0, %xmm0
        movabsq $-2305966154516004864, %rdx
        movl    96(%rsp,%r12,4), %eax
        movl    $0, 16(%rdx,%rbp)
        movups  %xmm0, 0(%rbp,%rdx)
        addq    $160, %rsp
        popq    %rbx
        popq    %rbp
        popq    %r12
        ret
.L59:
        call    __asan_report_load4_noabort
        jmp     .L8
.L58:
        call    __asan_report_store4_noabort
        jmp     .L7
.L57:
        call    __asan_report_store4_noabort
        jmp     .L6
.L56:
        call    __asan_report_store4_noabort
        jmp     .L5
.L55:
        call    __asan_report_store4_noabort
        jmp     .L4
.L54:
        call    __asan_report_store4_noabort
        jmp     .L3
.L53:
        movq    %rbx, %rdi
        call    __asan_report_store4_noabort
        jmp     .L2
On Mon, 20 Mar 2017, Arnd Bergmann <arnd@arndb.de> wrote:
> I don't know how to generate a URL for it, but after adding this to the
> command line for gcc-7,
>
> -fsanitize=kernel-address -fasan-shadow-offset=0xdfff900000000000
> --param asan-stack=1 --param asan-globals=1 --param
> asan-instrumentation-with-call-threshold=10000
> -fsanitize-address-use-after-scope
>
> the code turned from really nice into the log series of checks below.
> Without  -fsanitize-address-use-after-scope (which didn't exist before gcc-7),
> it's less bad but still exceeds the (arbitrary) 1536 byte limit.

It seems to be the combination of --param asan-stack=1 and
-fsanitize-address-use-after-scope that really blows up the code [1]. I
filed a GCC bug on it, mostly to see what they say [2]. I don't know,
maybe they think it's expected. *shrug*.


BR,
Jani.

[1] https://godbolt.org/g/hgS817
[2] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80114