nir, i965/fs: Lower indirect local variables to scratch

Submitted by Jason Ekstrand on Dec. 5, 2016, 7:59 p.m.


Last Updated Dec. 5, 2016, 8 p.m.
Revision 1

Cover Letter(s)

Revision 1
This little series implements lowering of indirectly accessed local
variables larger than some threshold (8 floats?) to scratch space.  This
improves the performance of the CSDof SynMark test by about 45%, because that
test uses a large temporary array that we currently lower first to
if-ladders and then to piles of scratch.
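To make the trade-off concrete, here is a minimal standalone C sketch. It is not NIR code, and `ladder_read` and `scratch_read` are hypothetical names. It contrasts two ways an indirect read `arr[idx]` of a local array can be lowered:

- **If-ladder form.** This selects among register-resident elements by comparing the index. The sketch splits the index range in half at each level. The real nir_lower_indirect_derefs pass may structure its ladder differently.
- **Scratch form.** This computes a byte address and issues a single load.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* If-ladder lowering (sketch): recursively split [start, end) until one
 * element remains, so every element is read with a constant index.  Fully
 * unrolled, code size grows with the array length, and the whole array has
 * to stay live in registers. */
static float
ladder_read(const float *arr, unsigned idx, unsigned start, unsigned end)
{
   assert(start < end);
   if (end - start == 1)
      return arr[start];                 /* direct, constant-index access */

   unsigned mid = start + (end - start) / 2;
   if (idx < mid)
      return ladder_read(arr, idx, start, mid);
   else
      return ladder_read(arr, idx, mid, end);
}

/* Scratch lowering (sketch): the array lives in memory starting at `base`,
 * so an indirect read is one address computation plus one load, no matter
 * how long the array is. */
static float
scratch_read(const unsigned char *scratch, size_t base, unsigned idx)
{
   float value;
   memcpy(&value, scratch + base + (size_t)idx * sizeof(float), sizeof(value));
   return value;
}
```

Both functions return the same element for any in-range index. The difference is that `ladder_read` needs every element held in registers. For a large array, that register pressure is what ends up spilling to scratch anyway.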

The approach I've taken here is to add a new set of NIR intrinsics for
reading and writing scratch.  It's treated like any other form of IO with a
new nir_lower_vars_to_scratch pass that lowers everything over a given size
threshold to scratch space.  Why do this in NIR?  The primary reason is
that this lets us lower to scratch *before* we run nir_lower_indirect_derefs,
so we can still use registers for small indirects, where an if-ladder is
more efficient than scratch space.  Also, after giving it a try, I really
liked how those intrinsics turned out.
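The selection step of such a pass can be sketched in plain C as follows. The `struct local_var` type and all function names below are hypothetical simplifications, not the NIR data structures or the pass from this series. The sketch makes these assumptions:

- A variable is moved to scratch only if it is indirectly accessed and larger than a size threshold.
- Each lowered variable gets an aligned offset in scratch space.
- Accesses to it would then become load_scratch/store_scratch at `offset + index * stride`.
- Everything else is left for the register path.

The 32-byte threshold used below is just the "8 floats?" figure from the cover letter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a function-local variable. */
struct local_var {
   const char *name;
   unsigned size_bytes;      /* total size of the variable */
   unsigned align_bytes;     /* required alignment, a power of two */
   bool has_indirect_access;
   int scratch_offset;       /* -1 if the variable stays in registers */
};

static unsigned
align_up(unsigned value, unsigned alignment)
{
   assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
   return (value + alignment - 1) & ~(alignment - 1);
}

/* Give every indirectly accessed variable larger than the threshold an
 * aligned scratch offset.  The rest are left for the register path (e.g.
 * nir_lower_indirect_derefs).  Returns the total scratch size needed. */
static unsigned
assign_scratch_offsets(struct local_var *vars, size_t count,
                       unsigned size_threshold)
{
   unsigned scratch_size = 0;
   for (size_t i = 0; i < count; i++) {
      struct local_var *var = &vars[i];
      if (!var->has_indirect_access || var->size_bytes <= size_threshold) {
         var->scratch_offset = -1;
         continue;
      }
      scratch_size = align_up(scratch_size, var->align_bytes);
      var->scratch_offset = (int)scratch_size;
      scratch_size += var->size_bytes;
   }
   return scratch_size;
}

/* Byte address a scratch load/store would use for element `index` of a
 * scratch-resident array with the given element stride. */
static unsigned
scratch_element_addr(const struct local_var *var, unsigned index,
                     unsigned stride_bytes)
{
   assert(var->scratch_offset >= 0);
   return (unsigned)var->scratch_offset + index * stride_bytes;
}
```

Doing this selection in NIR, ahead of nir_lower_indirect_derefs, is what lets small indirectly accessed arrays (like a 16-byte one here) keep the cheaper if-ladder path.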

This series is marked RFC because it's still a bit sketchy at the moment.
There are a few things that would need to be finished before it's ready to merge:

 1) I should probably run it through piglit.
 2) The back-end portion doesn't yet handle doubles.
 3) We should use send-from-GRF for non-spill direct scratch reads/writes.
    Right now, it's still using MRFs, which isn't great.

If people like where this series is going, I can probably find some time to
polish it until it's mergeable.

Jason Ekstrand (6):
  nir: Add load/store_scratch intrinsics
  nir: Add a pass for selectively lowering variables to scratch space
  i965/fs: Add a CHANNEL_IDS opcode
  i965/fs: Add DWord scattered read/write opcodes
  i965/fs: Implement the new nir_scratch_load/store opcodes
  i965: Lower large local arrays to scratch

Timothy Arceri (1):
  i965: use nir_lower_indirect_derefs() for GLSL

 src/compiler/Makefile.sources                     |   1 +
 src/compiler/nir/nir.h                            |   8 +-
 src/compiler/nir/nir_clone.c                      |   1 +
 src/compiler/nir/nir_intrinsics.h                 |   6 +-
 src/compiler/nir/nir_lower_scratch.c              | 258 ++++++++++++++++++++++
 src/intel/vulkan/anv_pipeline.c                   |  10 -
 src/mesa/drivers/dri/i965/brw_defines.h           |  10 +
 src/mesa/drivers/dri/i965/brw_fs.cpp              | 113 ++++++++++
 src/mesa/drivers/dri/i965/brw_fs.h                |   6 +
 src/mesa/drivers/dri/i965/brw_fs_cse.cpp          |   1 +
 src/mesa/drivers/dri/i965/brw_fs_generator.cpp    | 170 ++++++++++++++
 src/mesa/drivers/dri/i965/brw_fs_nir.cpp          |  42 +++-
 src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp |   4 +-
 src/mesa/drivers/dri/i965/brw_link.cpp            |  13 --
 src/mesa/drivers/dri/i965/brw_nir.c               |  13 ++
 src/mesa/drivers/dri/i965/brw_shader.cpp          |  12 +
 16 files changed, 631 insertions(+), 37 deletions(-)
 create mode 100644 src/compiler/nir/nir_lower_scratch.c
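The i965 patches listed above add a CHANNEL_IDS opcode and DWord scattered read/write opcodes. One plausible way these fit together is sketched below. This is an assumption for illustration, not necessarily the layout these patches use. Every SIMD channel needs its own copy of a scratch variable. So the per-invocation byte offset seen by load_scratch/store_scratch can be interleaved across channels, with each channel's address derived from its channel ID and the dispatch width. The function name `swizzled_scratch_addr` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-channel address for a DWord scattered scratch message.
 *   nir_offset:     byte offset one invocation would use (dword aligned)
 *   channel:        the per-channel index a CHANNEL_IDS-style opcode yields
 *   dispatch_width: SIMD width, e.g. 8 or 16
 * Dword k of all channels is stored contiguously, so when every channel
 * accesses the same nir_offset, the message touches one contiguous block
 * of dispatch_width dwords. */
static uint32_t
swizzled_scratch_addr(uint32_t nir_offset, uint32_t channel,
                      uint32_t dispatch_width)
{
   assert(nir_offset % 4 == 0);        /* DWord-granular access only */
   assert(channel < dispatch_width);
   return nir_offset * dispatch_width + channel * 4;
}
```

With SIMD8, channels 0 to 7 accessing offset 0 use bytes 0 to 28. Accessing offset 4 uses bytes 32 to 60. Different channels therefore never alias each other's data.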