[1/9,repost] armv7: Coalesce scalar accesses where possible

Submitted by Ben Avison on April 11, 2016, 12:26 p.m.

Details

Message ID: 1460377590-23285-2-git-send-email-bavison@riscosopen.org
State: New
Series "Changes to existing ARMv7 routines" ( rev: 1 ) in Pixman

Commit Message

Where the alignment of a block of elements is known to equal the size of the
block, but the block is smaller than 8 bytes, it is safe to use a larger
element size in a scalar VLD or VST without risking an alignment exception.
Typically the effect of this can be seen when accessing leading or trailing
halfwords or words in the destination buffer for long scanlines.
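
As an illustrative sketch (not the literal macro expansion, and with
placeholder registers), consider a 4-byte leading block in an 8 bpp
destination whose address is known to be 4-byte aligned. Under strict
alignment rules the generic path issues one scalar access per element,
but a single 32-bit scalar access of the same lanes is equally safe,
because the alignment guarantee matches the access size:

    @ before: four 8-bit scalar stores (element size == bpp == 8)
    vst1.8  {d0[0]}, [r0]!
    vst1.8  {d0[1]}, [r0]!
    vst1.8  {d0[2]}, [r0]!
    vst1.8  {d0[3]}, [r0]!

    @ after: one 32-bit scalar store; the :32 alignment hint cannot
    @ fault, since the block is 4-byte aligned by assumption
    vst1.32 {d0[0]}, [r0, :32]!

(On little-endian ARM the byte lanes d0[0]..d0[3] and the word lane
d0[0] cover the same low 32 bits of d0, so the stored bytes are
identical.)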

Sadly, the effect is too small to measure, but it seems like a good
idea anyway.

Signed-off-by: Ben Avison <bavison@riscosopen.org>
---
 pixman/pixman-arm-neon-asm.h |    4 ++++
 1 file changed, 4 insertions(+), 0 deletions(-)


diff --git a/pixman/pixman-arm-neon-asm.h b/pixman/pixman-arm-neon-asm.h
index bdcf6a9..76b3985 100644
--- a/pixman/pixman-arm-neon-asm.h
+++ b/pixman/pixman-arm-neon-asm.h
@@ -183,6 +183,10 @@ 
     pixldst30 vst3, 8, %(basereg+0), %(basereg+1), %(basereg+2), 3, mem_operand
 .elseif (bpp == 24) && (numpix == 1)
     pixldst30 vst3, 8, %(basereg+0), %(basereg+1), %(basereg+2), 1, mem_operand
+.elseif numpix * bpp == 32 && abits == 32
+    pixldst 4, vst1, 32, basereg, mem_operand, abits
+.elseif numpix * bpp == 16 && abits == 16
+    pixldst 2, vst1, 16, basereg, mem_operand, abits
 .else
     pixldst %(numpix * bpp / 8), vst1, %(bpp), basereg, mem_operand, abits
 .endif
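
For reference, here is a similarly hedged sketch of what the two new
branches buy, again with placeholder registers rather than the real
pixldst expansion:

    @ numpix * bpp == 32, abits == 32 (e.g. 2 r5g6b5 pixels):
    vst1.16 {d0[0]}, [r0]!        @ old: two halfword stores
    vst1.16 {d0[1]}, [r0]!
    vst1.32 {d0[0]}, [r0, :32]!   @ new: one word store

    @ numpix * bpp == 16, abits == 16 (e.g. 2 a8 pixels):
    vst1.8  {d0[0]}, [r0]!        @ old: two byte stores
    vst1.8  {d0[1]}, [r0]!
    vst1.16 {d0[0]}, [r0, :16]!   @ new: one halfword store

In both cases the .else fallback would have used element size bpp; the
new branches widen the element to the whole block, which the abits
guarantee makes safe.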