panfrost: Support batch pipelining

Submitted by Boris Brezillon on Sept. 16, 2019, 9:36 a.m.

Details

Reviewer None
Submitted Sept. 16, 2019, 9:36 a.m.
Last Updated Sept. 18, 2019, 1:25 p.m.
Revision 2

Cover Letter(s)

Revision 1
Hello,

This is the second attempt at supporting batch pipelining. This time I
implemented it using a dependency graph (as suggested by Alyssa and
Steven) so that batch submission can be delayed even more: the only
time we flush batches now is when we have an explicit flush or when
the CPU needs to access a BO (we might want to tweak that a bit to
avoid the extra latency incurred by this solution). With that in place
we hope to increase GPU utilization.
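To illustrate the dependency-graph idea, here is a minimal sketch and not the code from this series. It assumes each batch keeps a small list of the batches it depends on and that submitting a batch first submits its dependencies. All names (struct sketch_batch, sketch_batch_add_dep(), sketch_batch_submit(), struct submit_order) are hypothetical, and the graph is assumed to be acyclic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_MAX_DEPS 8
#define SKETCH_MAX_SUBMITS 16

/* Hypothetical stand-in for a batch, not the driver's real batch struct. */
struct sketch_batch {
        int id;
        bool submitted;
        size_t num_deps;
        struct sketch_batch *deps[SKETCH_MAX_DEPS];
};

/* Records the order in which batches were submitted, for inspection. */
struct submit_order {
        size_t count;
        int ids[SKETCH_MAX_SUBMITS];
};

/* Record that 'batch' must not run before 'dep'. */
static void
sketch_batch_add_dep(struct sketch_batch *batch, struct sketch_batch *dep)
{
        assert(batch && dep && batch != dep);
        assert(batch->num_deps < SKETCH_MAX_DEPS);
        batch->deps[batch->num_deps++] = dep;
}

/* Submit 'batch' after submitting everything it depends on.
 * Already-submitted batches are skipped. Assumes no dependency cycles. */
static void
sketch_batch_submit(struct sketch_batch *batch, struct submit_order *order)
{
        assert(batch && order);

        if (batch->submitted)
                return;

        for (size_t i = 0; i < batch->num_deps; i++)
                sketch_batch_submit(batch->deps[i], order);

        assert(order->count < SKETCH_MAX_SUBMITS);
        order->ids[order->count++] = batch->id;
        batch->submitted = true;
}
```

With this shape, nothing is sent to the GPU until something calls sketch_batch_submit(), and flushing one batch drags in only the batches it actually depends on.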

A few words about the patches in this series:

* Like the previous version, this series is a mix of cleanups and
  functional changes. Most of them should be pretty trivial to review
  and I intend to merge them independently once they have received
  proper review (to avoid having to send another patch bomb like this
  one).

* The "rework BO API" batch has been split to ease review

* Patches 35 and 36 are not mandatory, but I remember reading (I think
  it was Steven who mentioned that) that draw order matters when
  queueing render operations for different frames (frame N should
  ideally be ready before frame N+1). Not sure if enforcing draw call
  order is enough to guarantee that rendering of frame N always
  finishes before frame N+1 though.

Regards,

Boris

Boris Brezillon (37):
  panfrost: Stop exposing internal panfrost_*_batch() functions
  panfrost: Use the correct type for the bo_handle array
  panfrost: Add missing panfrost_batch_add_bo() calls
  panfrost: Add polygon_list to the batch BO set at allocation time
  panfrost: Kill a useless memset(0) in panfrost_create_context()
  panfrost: Stop passing has_draws to panfrost_drm_submit_vs_fs_batch()
  panfrost: Get rid of pan_drm.c
  panfrost: Move panfrost_bo_{reference,unreference}() to pan_bo.c
  panfrost: s/PAN_ALLOCATE_/PAN_BO_/
  panfrost: Move the BO API to its own header
  panfrost: Stop exposing panfrost_bo_cache_{fetch,put}()
  panfrost: Don't check if BO is mmaped before calling
    panfrost_bo_mmap()
  panfrost: Stop passing screen around for BO operations
  panfrost: Stop using panfrost_bo_release() outside of pan_bo.c
  panfrost: Add panfrost_bo_{alloc,free}()
  panfrost: Don't return imported/exported BOs to the cache
  panfrost: Make sure the BO is 'ready' when picked from the cache
  panfrost: Add flags to reflect the BO imported/exported state
  panfrost: Add the panfrost_batch_create_bo() helper
  panfrost: Add FBO BOs to batch->bos earlier
  panfrost: Allocate tiler and scratchpad BOs per-batch
  panfrost: Extend the panfrost_batch_add_bo() API to pass access flags
  panfrost: Make panfrost_batch->bos a hash table
  panfrost: Cache GPU accesses to BOs
  panfrost: Add a batch fence
  panfrost: Use the per-batch fences to wait on the last submitted batch
  panfrost: Add a panfrost_freeze_batch() helper
  panfrost: Start tracking inter-batch dependencies
  panfrost: Prepare panfrost_fence for batch pipelining
  panfrost: Add a panfrost_flush_all_batches() helper
  panfrost: Add a panfrost_flush_batches_accessing_bo() helper
  panfrost: Kill the explicit serialization in panfrost_batch_submit()
  panfrost: Get rid of the flush in panfrost_set_framebuffer_state()
  panfrost: Do fine-grained flushing when preparing BO for CPU accesses
  panfrost: Rename ctx->batches into ctx->fbo_to_batch
  panfrost: Take draw call order into account
  panfrost/ci: New tests are passing

 .../drivers/panfrost/ci/expected-failures.txt |   4 -
 src/gallium/drivers/panfrost/meson.build      |   1 -
 src/gallium/drivers/panfrost/pan_allocate.c   |  22 +-
 src/gallium/drivers/panfrost/pan_allocate.h   |  20 -
 src/gallium/drivers/panfrost/pan_assemble.c   |   3 +-
 src/gallium/drivers/panfrost/pan_blend_cso.c  |  13 +-
 src/gallium/drivers/panfrost/pan_bo.c         | 331 +++++++-
 src/gallium/drivers/panfrost/pan_bo.h         | 130 +++
 src/gallium/drivers/panfrost/pan_compute.c    |   2 +-
 src/gallium/drivers/panfrost/pan_context.c    | 175 ++--
 src/gallium/drivers/panfrost/pan_context.h    |  22 +-
 src/gallium/drivers/panfrost/pan_drm.c        | 394 ---------
 src/gallium/drivers/panfrost/pan_fragment.c   |   3 -
 src/gallium/drivers/panfrost/pan_instancing.c |   6 +-
 src/gallium/drivers/panfrost/pan_job.c        | 760 ++++++++++++++++--
 src/gallium/drivers/panfrost/pan_job.h        |  85 +-
 src/gallium/drivers/panfrost/pan_mfbd.c       |   1 +
 src/gallium/drivers/panfrost/pan_resource.c   |  65 +-
 src/gallium/drivers/panfrost/pan_resource.h   |   6 -
 src/gallium/drivers/panfrost/pan_screen.c     |  91 ++-
 src/gallium/drivers/panfrost/pan_screen.h     |  62 +-
 src/gallium/drivers/panfrost/pan_sfbd.c       |   1 +
 src/gallium/drivers/panfrost/pan_varyings.c   |   6 +-
 23 files changed, 1456 insertions(+), 747 deletions(-)
 create mode 100644 src/gallium/drivers/panfrost/pan_bo.h
 delete mode 100644 src/gallium/drivers/panfrost/pan_drm.c
    
Revision 2
Hello,

This is the third attempt at supporting batch pipelining. This time I
implemented it using a dependency graph (as suggested by Alyssa and
Steven) so that batch submission can be delayed even more: the only
time we flush batches now is when we have an explicit flush or when
the CPU needs to access a BO (we might want to tweak that a bit to
avoid the extra latency incurred by this solution). With that in place
we hope to increase GPU utilization.
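The "flush when the CPU needs to access a BO" case can be sketched as follows. This is only an illustration under simplifying assumptions: each pending batch keeps a flat array of the BOs it references, and a CPU access flushes just the batches that reference the BO in question. The names (struct sketch_bo, struct pending_batch, pending_batch_add_bo(), flush_pending_batches_accessing_bo()) are hypothetical and do not come from the patches.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PENDING_MAX_BOS 8

/* Hypothetical buffer object, identified only by a handle. */
struct sketch_bo {
        int handle;
};

/* Hypothetical pending batch with the set of BOs it references. */
struct pending_batch {
        bool flushed;
        size_t num_bos;
        const struct sketch_bo *bos[PENDING_MAX_BOS];
};

/* Attach a BO to a batch so later CPU accesses know who uses it. */
static void
pending_batch_add_bo(struct pending_batch *batch, const struct sketch_bo *bo)
{
        assert(batch && bo);
        assert(batch->num_bos < PENDING_MAX_BOS);
        batch->bos[batch->num_bos++] = bo;
}

/* Return true if 'batch' references 'bo'. */
static bool
pending_batch_uses_bo(const struct pending_batch *batch,
                      const struct sketch_bo *bo)
{
        for (size_t i = 0; i < batch->num_bos; i++) {
                if (batch->bos[i] == bo)
                        return true;
        }
        return false;
}

/* Flush only the not-yet-flushed batches that reference 'bo'.
 * Returns the number of batches flushed by this call. */
static size_t
flush_pending_batches_accessing_bo(struct pending_batch *batches, size_t count,
                                   const struct sketch_bo *bo)
{
        size_t flushed = 0;

        for (size_t i = 0; i < count; i++) {
                if (batches[i].flushed || !pending_batch_uses_bo(&batches[i], bo))
                        continue;
                batches[i].flushed = true;
                flushed++;
        }
        return flushed;
}
```

Batches that never touch the BO stay queued, which is what lets unrelated work keep piling up while the CPU waits on one resource.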

Patches 15 and 16 are optional, but I remember reading (I think it was
Steven who mentioned that) that draw order matters when queueing render
operations for different frames (frame N should ideally be ready before
frame N+1). Not sure if enforcing draw call order is enough to guarantee
that rendering of frame N always finishes before frame N+1 though.
If that's something you don't want to merge, I can drop them.
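One possible way to take draw call order into account, shown purely as a sketch: stamp each batch with a sequence number when it receives its first draw, then sort the pending batches by that number before flushing. This is an assumption about how such ordering could be done, not a description of the patch, and the names (struct queued_batch, struct draw_counter, queued_batch_record_draw(), sort_queued_batches_by_draw_order()) are hypothetical. As noted above, submitting in this order does not by itself prove that frame N finishes before frame N+1.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical queued batch, keyed by a framebuffer identifier. */
struct queued_batch {
        int fbo;
        uint64_t first_draw; /* sequence number of the first draw, 0 if none */
};

/* Context-wide counter handing out draw sequence numbers. */
struct draw_counter {
        uint64_t next;
};

/* Stamp the batch on its first draw only; later draws keep the stamp. */
static void
queued_batch_record_draw(struct queued_batch *batch,
                         struct draw_counter *counter)
{
        assert(batch && counter);

        if (batch->first_draw == 0)
                batch->first_draw = ++counter->next;
}

/* qsort() comparator: ascending order of first_draw. */
static int
compare_first_draw(const void *a, const void *b)
{
        const struct queued_batch *x = a;
        const struct queued_batch *y = b;

        return (x->first_draw > y->first_draw) -
               (x->first_draw < y->first_draw);
}

/* Reorder pending batches so the earliest-drawn batch comes first. */
static void
sort_queued_batches_by_draw_order(struct queued_batch *batches, size_t count)
{
        qsort(batches, count, sizeof(*batches), compare_first_draw);
}
```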

Regards,

Boris

Boris Brezillon (17):
  panfrost: Extend the panfrost_batch_add_bo() API to pass access flags
  panfrost: Make panfrost_batch->bos a hash table
  panfrost: Add a batch fence
  panfrost: Use the per-batch fences to wait on the last submitted batch
  panfrost: Add a panfrost_freeze_batch() helper
  panfrost: Start tracking inter-batch dependencies
  panfrost: Prepare panfrost_fence for batch pipelining
  panfrost: Add a panfrost_flush_all_batches() helper
  panfrost: Add a panfrost_flush_batches_accessing_bo() helper
  panfrost: Kill the explicit serialization in panfrost_batch_submit()
  panfrost: Get rid of the flush in panfrost_set_framebuffer_state()
  panfrost: Add flags to reflect the BO imported/exported state
  panfrost: Make sure the BO is 'ready' when picked from the cache
  panfrost: Do fine-grained flushing when preparing BO for CPU accesses
  panfrost: Rename ctx->batches into ctx->fbo_to_batch
  panfrost: Take draw call order into account
  panfrost/ci: New tests are passing

 .../drivers/panfrost/ci/expected-failures.txt |   4 -
 src/gallium/drivers/panfrost/pan_allocate.c   |  14 +-
 src/gallium/drivers/panfrost/pan_blend_cso.c  |   6 +-
 src/gallium/drivers/panfrost/pan_bo.c         | 116 ++-
 src/gallium/drivers/panfrost/pan_bo.h         |  44 ++
 src/gallium/drivers/panfrost/pan_compute.c    |   2 +-
 src/gallium/drivers/panfrost/pan_context.c    | 121 ++--
 src/gallium/drivers/panfrost/pan_context.h    |  15 +-
 src/gallium/drivers/panfrost/pan_instancing.c |   5 +-
 src/gallium/drivers/panfrost/pan_job.c        | 668 ++++++++++++++++--
 src/gallium/drivers/panfrost/pan_job.h        |  58 +-
 src/gallium/drivers/panfrost/pan_resource.c   |  27 +-
 src/gallium/drivers/panfrost/pan_screen.c     |  65 +-
 src/gallium/drivers/panfrost/pan_screen.h     |   3 +-
 src/gallium/drivers/panfrost/pan_varyings.c   |  10 +-
 15 files changed, 956 insertions(+), 202 deletions(-)
    

Patches

# Name Submitter State A F R T
[v2,01/37] panfrost: Stop exposing internal panfrost_*_batch() functions Boris Brezillon New 1
[v2,02/37] panfrost: Use the correct type for the bo_handle array Boris Brezillon Accepted 1
[v2,03/37] panfrost: Add missing panfrost_batch_add_bo() calls Boris Brezillon Accepted 1
[v2,04/37] panfrost: Add polygon_list to the batch BO set at allocation time Boris Brezillon Accepted 1
[v2,05/37] panfrost: Kill a useless memset(0) in panfrost_create_context() Boris Brezillon Accepted
[v2,06/37] panfrost: Stop passing has_draws to panfrost_drm_submit_vs_fs_batch() Boris Brezillon Accepted
[v2,07/37] panfrost: Get rid of pan_drm.c Boris Brezillon New
[v2,08/37] panfrost: Move panfrost_bo_{reference,unreference}() to pan_bo.c Boris Brezillon Accepted
[v2,09/37] panfrost: s/PAN_ALLOCATE_/PAN_BO_/ Boris Brezillon Accepted
[v2,10/37] panfrost: Move the BO API to its own header Boris Brezillon Accepted
[v2,11/37] panfrost: Stop exposing panfrost_bo_cache_{fetch,put}() Boris Brezillon Accepted
[v2,12/37] panfrost: Don't check if BO is mmaped before calling panfrost_bo_mmap() Boris Brezillon Accepted
[v2,13/37] panfrost: Stop passing screen around for BO operations Boris Brezillon Accepted
[v2,14/37] panfrost: Stop using panfrost_bo_release() outside of pan_bo.c Boris Brezillon Accepted
[v2,15/37] panfrost: Add panfrost_bo_{alloc,free}() Boris Brezillon New
[v2,16/37] panfrost: Don't return imported/exported BOs to the cache Boris Brezillon Accepted
[v2,17/37] panfrost: Make sure the BO is 'ready' when picked from the cache Boris Brezillon New
[v2,18/37] panfrost: Add flags to reflect the BO imported/exported state Boris Brezillon New
[v2,19/37] panfrost: Add the panfrost_batch_create_bo() helper Boris Brezillon Accepted
[v2,20/37] panfrost: Add FBO BOs to batch->bos earlier Boris Brezillon Accepted
[v2,21/37] panfrost: Allocate tiler and scratchpad BOs per-batch Boris Brezillon New
[v2,22/37] panfrost: Extend the panfrost_batch_add_bo() API to pass access flags Boris Brezillon New
[v2,23/37] panfrost: Make panfrost_batch->bos a hash table Boris Brezillon Accepted
[v2,24/37] panfrost: Cache GPU accesses to BOs Boris Brezillon New
[v2,25/37] panfrost: Add a batch fence Boris Brezillon New
[v2,26/37] panfrost: Use the per-batch fences to wait on the last submitted batch Boris Brezillon New
[v2,27/37] panfrost: Add a panfrost_freeze_batch() helper Boris Brezillon New
[v2,28/37] panfrost: Start tracking inter-batch dependencies Boris Brezillon New
[v2,29/37] panfrost: Prepare panfrost_fence for batch pipelining Boris Brezillon New
[v2,30/37] panfrost: Add a panfrost_flush_all_batches() helper Boris Brezillon New
[v2,31/37] panfrost: Add a panfrost_flush_batches_accessing_bo() helper Boris Brezillon New
[v2,32/37] panfrost: Kill the explicit serialization in panfrost_batch_submit() Boris Brezillon New
[v2,33/37] panfrost: Get rid of the flush in panfrost_set_framebuffer_state() Boris Brezillon Accepted
[v2,34/37] panfrost: Do fine-grained flushing when preparing BO for CPU accesses Boris Brezillon New
[v2,35/37] panfrost: Rename ctx->batches into ctx->fbo_to_batch Boris Brezillon New
[v2,36/37] panfrost: Take draw call order into account Boris Brezillon New
[v2,37/37] panfrost/ci: New tests are passing Boris Brezillon New

Patches

# Name Submitter State A F R T
[v3,01/17] panfrost: Extend the panfrost_batch_add_bo() API to pass access flags Boris Brezillon New
[v3,02/17] panfrost: Make panfrost_batch->bos a hash table Boris Brezillon New
[v3,03/17] panfrost: Add a batch fence Boris Brezillon Accepted
[v3,04/17] panfrost: Use the per-batch fences to wait on the last submitted batch Boris Brezillon Accepted
[v3,05/17] panfrost: Add a panfrost_freeze_batch() helper Boris Brezillon New 1
[v3,06/17] panfrost: Start tracking inter-batch dependencies Boris Brezillon New
[v3,07/17] panfrost: Prepare panfrost_fence for batch pipelining Boris Brezillon New 1
[v3,08/17] panfrost: Add a panfrost_flush_all_batches() helper Boris Brezillon Accepted 1
[v3,09/17] panfrost: Add a panfrost_flush_batches_accessing_bo() helper Boris Brezillon New 1
[v3,10/17] panfrost: Kill the explicit serialization in panfrost_batch_submit() Boris Brezillon Accepted 1
[v3,11/17] panfrost: Get rid of the flush in panfrost_set_framebuffer_state() Boris Brezillon New 1
[v3,12/17] panfrost: Add flags to reflect the BO imported/exported state Boris Brezillon Accepted 1
[v3,13/17] panfrost: Make sure the BO is 'ready' when picked from the cache Boris Brezillon New
[v3,14/17] panfrost: Do fine-grained flushing when preparing BO for CPU accesses Boris Brezillon New 1
[v3,15/17] panfrost: Rename ctx->batches into ctx->fbo_to_batch Boris Brezillon New
[v3,16/17] panfrost: Take draw call order into account Boris Brezillon New
[v3,17/17] panfrost/ci: New tests are passing Boris Brezillon New 1