[v2,22/37] panfrost: Extend the panfrost_batch_add_bo() API to pass access flags

Submitted by Boris Brezillon on Sept. 16, 2019, 9:37 a.m.

Details

Message ID 20190916093715.32203-23-boris.brezillon@collabora.com
State New
Series "panfrost: Support batch pipelining" (rev 1) in Mesa


Commit Message

The type of access being done on a BO impacts job scheduling (shared
resources being written enforce serialization, while those being only
read allow for job parallelization) and BO lifetime (the fragment job
might last longer than the vertex/tiler ones; when we can, it's good
to release BOs earlier so that others can re-use them through the BO
re-use cache).

Let's pass extra access flags to panfrost_batch_add_bo() and
panfrost_batch_create_bo() so the batch submission logic can take the
appropriate action when submitting batches. Note that this information
is not used yet; we're just patching callers to pass the correct flags
here.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
---
 src/gallium/drivers/panfrost/pan_allocate.c   | 14 +++++-
 src/gallium/drivers/panfrost/pan_blend_cso.c  |  6 ++-
 src/gallium/drivers/panfrost/pan_bo.h         | 18 ++++++++
 src/gallium/drivers/panfrost/pan_context.c    | 43 +++++++++++++++----
 src/gallium/drivers/panfrost/pan_instancing.c |  5 ++-
 src/gallium/drivers/panfrost/pan_job.c        | 39 +++++++++++++----
 src/gallium/drivers/panfrost/pan_job.h        |  5 ++-
 src/gallium/drivers/panfrost/pan_varyings.c   |  5 ++-
 8 files changed, 111 insertions(+), 24 deletions(-)


diff --git a/src/gallium/drivers/panfrost/pan_allocate.c b/src/gallium/drivers/panfrost/pan_allocate.c
index 3076c23ab1cc..1ca812c1fbaa 100644
--- a/src/gallium/drivers/panfrost/pan_allocate.c
+++ b/src/gallium/drivers/panfrost/pan_allocate.c
@@ -63,8 +63,18 @@  panfrost_allocate_transient(struct panfrost_batch *batch, size_t sz)
                 size_t bo_sz = sz < TRANSIENT_SLAB_SIZE ?
                                TRANSIENT_SLAB_SIZE : ALIGN_POT(sz, 4096);
 
-                /* We can't reuse the current BO, but we can create a new one. */
-                bo = panfrost_batch_create_bo(batch, bo_sz, 0);
+                /* We can't reuse the current BO, but we can create a new one.
+                 * We don't know what the BO will be used for, so let's flag it
+                 * RW and attach it to both the fragment and vertex/tiler jobs.
+                 * TODO: if we want fine grained BO assignment we should pass
+                 * flags to this function and keep the read/write,
+                 * fragment/vertex+tiler pools separate.
+                 */
+                bo = panfrost_batch_create_bo(batch, bo_sz, 0,
+                                              PAN_BO_GPU_ACCESS_PRIVATE |
+                                              PAN_BO_GPU_ACCESS_RW |
+                                              PAN_BO_GPU_ACCESS_VERTEX_TILER |
+                                              PAN_BO_GPU_ACCESS_FRAGMENT);
 
                 if (sz < TRANSIENT_SLAB_SIZE) {
                         batch->transient_bo = bo;
diff --git a/src/gallium/drivers/panfrost/pan_blend_cso.c b/src/gallium/drivers/panfrost/pan_blend_cso.c
index 6bd6ff71cdc7..f592522a4d26 100644
--- a/src/gallium/drivers/panfrost/pan_blend_cso.c
+++ b/src/gallium/drivers/panfrost/pan_blend_cso.c
@@ -273,7 +273,11 @@  panfrost_get_blend_for_context(struct panfrost_context *ctx, unsigned rti)
 
         /* Upload the shader */
         final.shader.bo = panfrost_batch_create_bo(batch, shader->size,
-                                                   PAN_BO_EXECUTE);
+                                                   PAN_BO_EXECUTE,
+                                                   PAN_BO_GPU_ACCESS_PRIVATE |
+                                                   PAN_BO_GPU_ACCESS_READ |
+                                                   PAN_BO_GPU_ACCESS_VERTEX_TILER |
+                                                   PAN_BO_GPU_ACCESS_FRAGMENT);
         memcpy(final.shader.bo->cpu, shader->buffer, shader->size);
 
         if (shader->patch_index) {
diff --git a/src/gallium/drivers/panfrost/pan_bo.h b/src/gallium/drivers/panfrost/pan_bo.h
index e141a60fc407..a4b4d05b96e9 100644
--- a/src/gallium/drivers/panfrost/pan_bo.h
+++ b/src/gallium/drivers/panfrost/pan_bo.h
@@ -62,6 +62,24 @@  struct panfrost_screen;
 /* BO has been exported */
 #define PAN_BO_EXPORTED           (1 << 7)
 
+/* GPU access flags */
+
+/* BO is either shared (can be accessed by more than one GPU batch) or private
+ * (reserved by a specific GPU job). */
+#define PAN_BO_GPU_ACCESS_PRIVATE         (0 << 0)
+#define PAN_BO_GPU_ACCESS_SHARED          (1 << 0)
+
+/* BO is being read/written by the GPU */
+#define PAN_BO_GPU_ACCESS_READ            (1 << 1)
+#define PAN_BO_GPU_ACCESS_WRITE           (1 << 2)
+#define PAN_BO_GPU_ACCESS_RW              (PAN_BO_GPU_ACCESS_READ | PAN_BO_GPU_ACCESS_WRITE)
+
+/* BO is accessed by the vertex/tiler job. */
+#define PAN_BO_GPU_ACCESS_VERTEX_TILER    (1 << 3)
+
+/* BO is accessed by the fragment job. */
+#define PAN_BO_GPU_ACCESS_FRAGMENT        (1 << 4)
+
 struct panfrost_bo {
         /* Must be first for casting */
         struct list_head link;
diff --git a/src/gallium/drivers/panfrost/pan_context.c b/src/gallium/drivers/panfrost/pan_context.c
index 6cb6cdd1a686..a76caecef0e3 100644
--- a/src/gallium/drivers/panfrost/pan_context.c
+++ b/src/gallium/drivers/panfrost/pan_context.c
@@ -583,6 +583,7 @@  panfrost_layout_for_texture(struct panfrost_resource *rsrc)
 static mali_ptr
 panfrost_upload_tex(
         struct panfrost_context *ctx,
+        enum pipe_shader_type st,
         struct panfrost_sampler_view *view)
 {
         if (!view)
@@ -610,7 +611,11 @@  panfrost_upload_tex(
 
         /* Add the BO to the job so it's retained until the job is done. */
         struct panfrost_batch *batch = panfrost_get_batch_for_fbo(ctx);
-        panfrost_batch_add_bo(batch, rsrc->bo);
+        panfrost_batch_add_bo(batch, rsrc->bo,
+                              PAN_BO_GPU_ACCESS_SHARED | PAN_BO_GPU_ACCESS_READ |
+                              PAN_BO_GPU_ACCESS_VERTEX_TILER |
+                              (st == PIPE_SHADER_FRAGMENT ?
+                               PAN_BO_GPU_ACCESS_FRAGMENT : 0));
 
         /* Add the usage flags in, since they can change across the CSO
          * lifetime due to layout switches */
@@ -653,7 +658,7 @@  panfrost_upload_texture_descriptors(struct panfrost_context *ctx)
 
                         for (int i = 0; i < ctx->sampler_view_count[t]; ++i)
                                 trampolines[i] =
-                                        panfrost_upload_tex(ctx, ctx->sampler_views[t][i]);
+                                        panfrost_upload_tex(ctx, t, ctx->sampler_views[t][i]);
 
                         trampoline = panfrost_upload_transient(batch, trampolines, sizeof(uint64_t) * ctx->sampler_view_count[t]);
                 }
@@ -729,7 +734,11 @@  static void panfrost_upload_ssbo_sysval(
         struct panfrost_batch *batch = panfrost_get_batch_for_fbo(ctx);
         struct panfrost_bo *bo = pan_resource(sb.buffer)->bo;
 
-        panfrost_batch_add_bo(batch, bo);
+        panfrost_batch_add_bo(batch, bo,
+                              PAN_BO_GPU_ACCESS_SHARED | PAN_BO_GPU_ACCESS_RW |
+                              PAN_BO_GPU_ACCESS_VERTEX_TILER |
+                              (st == PIPE_SHADER_FRAGMENT ?
+                               PAN_BO_GPU_ACCESS_FRAGMENT : 0));
 
         /* Upload address and size as sysval */
         uniform->du[0] = bo->gpu + sb.buffer_offset;
@@ -795,6 +804,7 @@  panfrost_map_constant_buffer_cpu(struct panfrost_constant_buffer *buf, unsigned
 static mali_ptr
 panfrost_map_constant_buffer_gpu(
         struct panfrost_context *ctx,
+        enum pipe_shader_type st,
         struct panfrost_constant_buffer *buf,
         unsigned index)
 {
@@ -803,7 +813,12 @@  panfrost_map_constant_buffer_gpu(
         struct panfrost_batch *batch = panfrost_get_batch_for_fbo(ctx);
 
         if (rsrc) {
-                panfrost_batch_add_bo(batch, rsrc->bo);
+                panfrost_batch_add_bo(batch, rsrc->bo,
+                                      PAN_BO_GPU_ACCESS_SHARED |
+                                      PAN_BO_GPU_ACCESS_READ |
+                                      PAN_BO_GPU_ACCESS_VERTEX_TILER |
+                                      (st == PIPE_SHADER_FRAGMENT ?
+                                       PAN_BO_GPU_ACCESS_FRAGMENT : 0));
                 return rsrc->bo->gpu;
 	} else if (cb->user_buffer) {
                 return panfrost_upload_transient(batch, cb->user_buffer, cb->buffer_size);
@@ -936,7 +951,11 @@  panfrost_emit_for_draw(struct panfrost_context *ctx, bool with_vertex_data)
 
                 panfrost_patch_shader_state(ctx, variant, PIPE_SHADER_FRAGMENT, false);
 
-                panfrost_batch_add_bo(batch, variant->bo);
+                panfrost_batch_add_bo(batch, variant->bo,
+                                      PAN_BO_GPU_ACCESS_PRIVATE |
+                                      PAN_BO_GPU_ACCESS_READ |
+                                      PAN_BO_GPU_ACCESS_VERTEX_TILER |
+                                      PAN_BO_GPU_ACCESS_FRAGMENT);
 
 #define COPY(name) ctx->fragment_shader_core.name = variant->tripipe->name
 
@@ -1121,7 +1140,11 @@  panfrost_emit_for_draw(struct panfrost_context *ctx, bool with_vertex_data)
 
                 struct panfrost_shader_state *ss = &all->variants[all->active_variant];
 
-                panfrost_batch_add_bo(batch, ss->bo);
+                panfrost_batch_add_bo(batch, ss->bo,
+                                      PAN_BO_GPU_ACCESS_PRIVATE |
+                                      PAN_BO_GPU_ACCESS_READ |
+                                      PAN_BO_GPU_ACCESS_VERTEX_TILER |
+                                      PAN_BO_GPU_ACCESS_FRAGMENT);
 
                 /* Uniforms are implicitly UBO #0 */
                 bool has_uniforms = buf->enabled_mask & (1 << 0);
@@ -1176,7 +1199,7 @@  panfrost_emit_for_draw(struct panfrost_context *ctx, bool with_vertex_data)
                                 continue;
                         }
 
-                        mali_ptr gpu = panfrost_map_constant_buffer_gpu(ctx, buf, ubo);
+                        mali_ptr gpu = panfrost_map_constant_buffer_gpu(ctx, i, buf, ubo);
 
                         unsigned bytes_per_field = 16;
                         unsigned aligned = ALIGN_POT(usz, bytes_per_field);
@@ -1398,7 +1421,11 @@  panfrost_get_index_buffer_mapped(struct panfrost_context *ctx, const struct pipe
 
         if (!info->has_user_indices) {
                 /* Only resources can be directly mapped */
-                panfrost_batch_add_bo(batch, rsrc->bo);
+                panfrost_batch_add_bo(batch, rsrc->bo,
+                                      PAN_BO_GPU_ACCESS_SHARED |
+                                      PAN_BO_GPU_ACCESS_READ |
+                                      PAN_BO_GPU_ACCESS_VERTEX_TILER |
+                                      PAN_BO_GPU_ACCESS_FRAGMENT);
                 return rsrc->bo->gpu + offset;
         } else {
                 /* Otherwise, we need to upload to transient memory */
diff --git a/src/gallium/drivers/panfrost/pan_instancing.c b/src/gallium/drivers/panfrost/pan_instancing.c
index 806823f0da0d..a8211abbe014 100644
--- a/src/gallium/drivers/panfrost/pan_instancing.c
+++ b/src/gallium/drivers/panfrost/pan_instancing.c
@@ -298,7 +298,10 @@  panfrost_emit_vertex_data(struct panfrost_batch *batch)
                 unsigned chopped_addr = raw_addr - addr;
 
                 /* Add a dependency of the batch on the vertex buffer */
-                panfrost_batch_add_bo(batch, rsrc->bo);
+                panfrost_batch_add_bo(batch, rsrc->bo,
+                                      PAN_BO_GPU_ACCESS_SHARED |
+                                      PAN_BO_GPU_ACCESS_READ |
+                                      PAN_BO_GPU_ACCESS_VERTEX_TILER);
 
                 /* Set common fields */
                 attrs[k].elements = addr;
diff --git a/src/gallium/drivers/panfrost/pan_job.c b/src/gallium/drivers/panfrost/pan_job.c
index 69047ca62a6c..6332529b2f9b 100644
--- a/src/gallium/drivers/panfrost/pan_job.c
+++ b/src/gallium/drivers/panfrost/pan_job.c
@@ -132,7 +132,8 @@  panfrost_get_batch_for_fbo(struct panfrost_context *ctx)
 }
 
 void
-panfrost_batch_add_bo(struct panfrost_batch *batch, struct panfrost_bo *bo)
+panfrost_batch_add_bo(struct panfrost_batch *batch, struct panfrost_bo *bo,
+                      uint32_t flags)
 {
         if (!bo)
                 return;
@@ -146,26 +147,30 @@  panfrost_batch_add_bo(struct panfrost_batch *batch, struct panfrost_bo *bo)
 
 void panfrost_batch_add_fbo_bos(struct panfrost_batch *batch)
 {
+        uint32_t flags = PAN_BO_GPU_ACCESS_SHARED | PAN_BO_GPU_ACCESS_WRITE |
+                         PAN_BO_GPU_ACCESS_VERTEX_TILER |
+                         PAN_BO_GPU_ACCESS_FRAGMENT;
+
         for (unsigned i = 0; i < batch->key.nr_cbufs; ++i) {
                 struct panfrost_resource *rsrc = pan_resource(batch->key.cbufs[i]->texture);
-                panfrost_batch_add_bo(batch, rsrc->bo);
+                panfrost_batch_add_bo(batch, rsrc->bo, flags);
         }
 
         if (batch->key.zsbuf) {
                 struct panfrost_resource *rsrc = pan_resource(batch->key.zsbuf->texture);
-                panfrost_batch_add_bo(batch, rsrc->bo);
+                panfrost_batch_add_bo(batch, rsrc->bo, flags);
         }
 }
 
 struct panfrost_bo *
 panfrost_batch_create_bo(struct panfrost_batch *batch, size_t size,
-                         uint32_t create_flags)
+                         uint32_t create_flags, uint32_t access_flags)
 {
         struct panfrost_bo *bo;
 
         bo = panfrost_bo_create(pan_screen(batch->ctx->base.screen), size,
                                 create_flags);
-        panfrost_batch_add_bo(batch, bo);
+        panfrost_batch_add_bo(batch, bo, access_flags);
 
         /* panfrost_batch_add_bo() has retained a reference and
          * panfrost_bo_create() initialize the refcnt to 1, so let's
@@ -189,7 +194,11 @@  panfrost_batch_get_polygon_list(struct panfrost_batch *batch, unsigned size)
                 /* Create the BO as invisible, as there's no reason to map */
 
                 batch->polygon_list = panfrost_batch_create_bo(batch, size,
-				                               PAN_BO_INVISIBLE);
+                                                               PAN_BO_INVISIBLE,
+                                                               PAN_BO_GPU_ACCESS_PRIVATE |
+                                                               PAN_BO_GPU_ACCESS_RW |
+                                                               PAN_BO_GPU_ACCESS_VERTEX_TILER |
+                                                               PAN_BO_GPU_ACCESS_FRAGMENT);
         }
 
         return batch->polygon_list->gpu;
@@ -202,7 +211,11 @@  panfrost_batch_get_scratchpad(struct panfrost_batch *batch)
                 return batch->scratchpad;
 
         batch->scratchpad = panfrost_batch_create_bo(batch, 64 * 4 * 4096,
-                                                     PAN_BO_INVISIBLE);
+                                                     PAN_BO_INVISIBLE,
+                                                     PAN_BO_GPU_ACCESS_PRIVATE |
+                                                     PAN_BO_GPU_ACCESS_RW |
+                                                     PAN_BO_GPU_ACCESS_VERTEX_TILER |
+                                                     PAN_BO_GPU_ACCESS_FRAGMENT);
         assert(batch->scratchpad);
         return batch->scratchpad;
 }
@@ -215,7 +228,11 @@  panfrost_batch_get_tiler_heap(struct panfrost_batch *batch)
 
         batch->tiler_heap = panfrost_batch_create_bo(batch, 4096 * 4096,
                                                      PAN_BO_INVISIBLE |
-                                                     PAN_BO_GROWABLE);
+                                                     PAN_BO_GROWABLE,
+                                                     PAN_BO_GPU_ACCESS_PRIVATE |
+                                                     PAN_BO_GPU_ACCESS_RW |
+                                                     PAN_BO_GPU_ACCESS_VERTEX_TILER |
+                                                     PAN_BO_GPU_ACCESS_FRAGMENT);
         assert(batch->tiler_heap);
         return batch->tiler_heap;
 }
@@ -227,7 +244,11 @@  panfrost_batch_get_tiler_dummy(struct panfrost_batch *batch)
                 return batch->tiler_dummy;
 
         batch->tiler_dummy = panfrost_batch_create_bo(batch, 4096,
-                                                      PAN_BO_INVISIBLE);
+                                                      PAN_BO_INVISIBLE,
+                                                      PAN_BO_GPU_ACCESS_PRIVATE |
+                                                      PAN_BO_GPU_ACCESS_RW |
+                                                      PAN_BO_GPU_ACCESS_VERTEX_TILER |
+                                                      PAN_BO_GPU_ACCESS_FRAGMENT);
         assert(batch->tiler_dummy);
         return batch->tiler_dummy;
 }
diff --git a/src/gallium/drivers/panfrost/pan_job.h b/src/gallium/drivers/panfrost/pan_job.h
index e5e0e323f2fd..0b37a3131e86 100644
--- a/src/gallium/drivers/panfrost/pan_job.h
+++ b/src/gallium/drivers/panfrost/pan_job.h
@@ -131,13 +131,14 @@  void
 panfrost_batch_init(struct panfrost_context *ctx);
 
 void
-panfrost_batch_add_bo(struct panfrost_batch *batch, struct panfrost_bo *bo);
+panfrost_batch_add_bo(struct panfrost_batch *batch, struct panfrost_bo *bo,
+                      uint32_t flags);
 
 void panfrost_batch_add_fbo_bos(struct panfrost_batch *batch);
 
 struct panfrost_bo *
 panfrost_batch_create_bo(struct panfrost_batch *batch, size_t size,
-                         uint32_t create_flags);
+                         uint32_t create_flags, uint32_t access_flags);
 
 void
 panfrost_batch_submit(struct panfrost_batch *batch);
diff --git a/src/gallium/drivers/panfrost/pan_varyings.c b/src/gallium/drivers/panfrost/pan_varyings.c
index 9dbd1e2ebacf..14f52d3a4889 100644
--- a/src/gallium/drivers/panfrost/pan_varyings.c
+++ b/src/gallium/drivers/panfrost/pan_varyings.c
@@ -69,7 +69,10 @@  panfrost_emit_streamout(
         /* Grab the BO and bind it to the batch */
         struct panfrost_batch *batch = panfrost_get_batch_for_fbo(ctx);
         struct panfrost_bo *bo = pan_resource(target->buffer)->bo;
-        panfrost_batch_add_bo(batch, bo);
+        panfrost_batch_add_bo(batch, bo,
+                              PAN_BO_GPU_ACCESS_SHARED |
+                              PAN_BO_GPU_ACCESS_WRITE |
+                              PAN_BO_GPU_ACCESS_VERTEX_TILER);
 
         mali_ptr addr = bo->gpu + target->buffer_offset + (offset * slot->stride);
         slot->elements = addr;

Comments

PAN_BO_GPU_ACCESS_* is rather wordy. We're a GPU driver, of course it's
GPU access :)

Could we just do PAN_BO_ACCESS_* instead?

>  static mali_ptr
>  panfrost_upload_tex(
>          struct panfrost_context *ctx,
> +        enum pipe_shader_type st,
>          struct panfrost_sampler_view *view)
>  {
>          if (!view)
> @@ -610,7 +611,11 @@ panfrost_upload_tex(
>  
>          /* Add the BO to the job so it's retained until the job is done. */
>          struct panfrost_batch *batch = panfrost_get_batch_for_fbo(ctx);
> -        panfrost_batch_add_bo(batch, rsrc->bo);
> +        panfrost_batch_add_bo(batch, rsrc->bo,
> +                              PAN_BO_GPU_ACCESS_SHARED | PAN_BO_GPU_ACCESS_READ |
> +                              PAN_BO_GPU_ACCESS_VERTEX_TILER |
> +                              (st == PIPE_SHADER_FRAGMENT ?
> +                               PAN_BO_GPU_ACCESS_FRAGMENT : 0));

I'm not sure this is quite right... should it maybe be:

(st == PIPE_SHADER_FRAGMENT ? PAN_BO_ACCESS_FRAGMENT :
PAN_BO_ACCESS_VERTEX_TILER)

I.e., if it's accessed from the fragment shader, is it necessarily
needed for the vertex/tiler part?

> -        panfrost_batch_add_bo(batch, bo);
> +        panfrost_batch_add_bo(batch, bo,
> +                              PAN_BO_GPU_ACCESS_SHARED | PAN_BO_GPU_ACCESS_RW |
> +                              PAN_BO_GPU_ACCESS_VERTEX_TILER |
> +                              (st == PIPE_SHADER_FRAGMENT ?
> +                               PAN_BO_GPU_ACCESS_FRAGMENT : 0));

Ditto. We should maybe have a `pan_bo_access_for_stage(enum
pipe_shader_type)` to abstract this logic.

> @@ -803,7 +813,12 @@ panfrost_map_constant_buffer_gpu(
>          struct panfrost_batch *batch = panfrost_get_batch_for_fbo(ctx);
>  
>          if (rsrc) {
> -                panfrost_batch_add_bo(batch, rsrc->bo);
> +                panfrost_batch_add_bo(batch, rsrc->bo,
> +                                      PAN_BO_GPU_ACCESS_SHARED |
> +                                      PAN_BO_GPU_ACCESS_READ |
> +                                      PAN_BO_GPU_ACCESS_VERTEX_TILER |
> +                                      (st == PIPE_SHADER_FRAGMENT ?
> +                                       PAN_BO_GPU_ACCESS_FRAGMENT : 0));

Ditto.

>          if (!info->has_user_indices) {
>                  /* Only resources can be directly mapped */
> -                panfrost_batch_add_bo(batch, rsrc->bo);
> +                panfrost_batch_add_bo(batch, rsrc->bo,
> +                                      PAN_BO_GPU_ACCESS_FRAGMENT);
>                  return rsrc->bo->gpu + offset;

The index buffer is to determine geometry, so it is definitely accessed
from the vertex/tiler chain.

I'm not sure if it's also accessed by the fragment chain. Also, should
this have ACCESS_SHARED | ACCESS_READ to be consistent with the others?

> @@ -69,7 +69,10 @@ panfrost_emit_streamout(
>          /* Grab the BO and bind it to the batch */
>          struct panfrost_batch *batch = panfrost_get_batch_for_fbo(ctx);
>          struct panfrost_bo *bo = pan_resource(target->buffer)->bo;
> -        panfrost_batch_add_bo(batch, bo);
> +        panfrost_batch_add_bo(batch, bo,
> +                              PAN_BO_GPU_ACCESS_SHARED |
> +                              PAN_BO_GPU_ACCESS_WRITE |
> +                              PAN_BO_GPU_ACCESS_VERTEX_TILER);

We operate somewhat like:

[ Vertices ] -- vertex shader --> [ Varyings ] -- tiler --> [ Geometry ]

So varyings are WRITE from the perspective of the VERTEX but READ from
the perspective of the TILER and FRAGMENT.

Now, this is for streamout. However, streamout does not imply rasterize
discard. Hence, it is legal to have streamout and also render that
geometry with a FRAGMENT job. So it's premature to drop the READ and
FRAGMENT flags (this will presumably regress a bunch of dEQP-GLES3 tests
for streamout).

On Mon, 16 Sep 2019 09:59:15 -0400
Alyssa Rosenzweig <alyssa@rosenzweig.io> wrote:

> PAN_BO_GPU_ACCESS_* is rather wordy. We're a GPU driver, of course it's
> GPU access :)

Well, the driver can also do CPU accesses to the same BOs :P.

> 
> Could we just do PAN_BO_ACCESS_* instead?

I guess that's fine as long as it's documented.

> 
> >  static mali_ptr
> >  panfrost_upload_tex(
> >          struct panfrost_context *ctx,
> > +        enum pipe_shader_type st,
> >          struct panfrost_sampler_view *view)
> >  {
> >          if (!view)
> > @@ -610,7 +611,11 @@ panfrost_upload_tex(
> >  
> >          /* Add the BO to the job so it's retained until the job is done. */
> >          struct panfrost_batch *batch = panfrost_get_batch_for_fbo(ctx);
> > -        panfrost_batch_add_bo(batch, rsrc->bo);
> > +        panfrost_batch_add_bo(batch, rsrc->bo,
> > +                              PAN_BO_GPU_ACCESS_SHARED | PAN_BO_GPU_ACCESS_READ |
> > +                              PAN_BO_GPU_ACCESS_VERTEX_TILER |
> > +                              (st == PIPE_SHADER_FRAGMENT ?
> > +                               PAN_BO_GPU_ACCESS_FRAGMENT : 0));  
> 
> I'm not sure this is quite right... should it maybe be:
> 
> (st == PIPE_SHADER_FRAGMENT ? PAN_BO_ACCESS_FRAGMENT :
> PAN_BO_ACCESS_VERTEX_TILER)

That's a good question. I wasn't sure so I decided to put the
vertex/tiler unconditionally.

> 
> I.e., if it's accessed from the fragment shader, is it necessarily
> needed for the vertex/tiler part?
> 
> > -        panfrost_batch_add_bo(batch, bo);
> > +        panfrost_batch_add_bo(batch, bo,
> > +                              PAN_BO_GPU_ACCESS_SHARED | PAN_BO_GPU_ACCESS_RW |
> > +                              PAN_BO_GPU_ACCESS_VERTEX_TILER |
> > +                              (st == PIPE_SHADER_FRAGMENT ?
> > +                               PAN_BO_GPU_ACCESS_FRAGMENT : 0));  
> 
> Ditto. We should maybe have a `pan_bo_access_for_stage(enum
> pipe_shader_type)` to abstract this logic.

Good idea.

> 
> > @@ -803,7 +813,12 @@ panfrost_map_constant_buffer_gpu(
> >          struct panfrost_batch *batch = panfrost_get_batch_for_fbo(ctx);
> >  
> >          if (rsrc) {
> > -                panfrost_batch_add_bo(batch, rsrc->bo);
> > +                panfrost_batch_add_bo(batch, rsrc->bo,
> > +                                      PAN_BO_GPU_ACCESS_SHARED |
> > +                                      PAN_BO_GPU_ACCESS_READ |
> > +                                      PAN_BO_GPU_ACCESS_VERTEX_TILER |
> > +                                      (st == PIPE_SHADER_FRAGMENT ?
> > +                                       PAN_BO_GPU_ACCESS_FRAGMENT : 0));  
> 
> Ditto.
> 
> >          if (!info->has_user_indices) {
> >                  /* Only resources can be directly mapped */
> > -                panfrost_batch_add_bo(batch, rsrc->bo);
> > +                panfrost_batch_add_bo(batch, rsrc->bo,
> > +                                      PAN_BO_GPU_ACCESS_FRAGMENT);
> >                  return rsrc->bo->gpu + offset;  
> 
> The index buffer is to determine geometry, so it is definitely accessed
> from the vertex/tiler chain.

Oops, that's a mistake. I meant PAN_BO_GPU_ACCESS_VERTEX_TILER here.

> 
> I'm not sure if it's also accessed by the fragment chain. Also, should
> this have ACCESS_SHARED | ACCESS_READ to be consistent with the others?

It should definitely have ACCESS_SHARED | ACCESS_READ.

> 
> > @@ -69,7 +69,10 @@ panfrost_emit_streamout(
> >          /* Grab the BO and bind it to the batch */
> >          struct panfrost_batch *batch = panfrost_get_batch_for_fbo(ctx);
> >          struct panfrost_bo *bo = pan_resource(target->buffer)->bo;
> > -        panfrost_batch_add_bo(batch, bo);
> > +        panfrost_batch_add_bo(batch, bo,
> > +                              PAN_BO_GPU_ACCESS_SHARED |
> > +                              PAN_BO_GPU_ACCESS_WRITE |
> > +                              PAN_BO_GPU_ACCESS_VERTEX_TILER);  
> 
> We operate somewhat like:
> 
> [ Vertices ] -- vertex shader --> [ Varyings ] -- tiler --> [ Geometry ]
> 
> So varyings are WRITE from the perspective of the VERTEX but READ from
> the perspective of the TILER and FRAGMENT.
> 
> Now, this is for streamout. However, streamout does not imply rasterize
> discard. Hence, it is legal to have streamout and also render that
> geometry with a FRAGMENT job. So it's premature to drop the READ and
> FRAGMENT flags (this will presumably regress a bunch of dEQP-GLES3 tests
> for streamout).

Okay, will add PAN_BO_GPU_ACCESS_FRAGMENT and turn
PAN_BO_GPU_ACCESS_WRITE into PAN_BO_GPU_ACCESS_RW.

Thanks for the review.

Boris

> > PAN_BO_GPU_ACCESS_* is rather wordy. We're a GPU driver, of course it's
> > GPU access :)
> 
> Well, the driver can also do CPU accesses to the same BOs :P.

Yes, but we won't be marking them off this way ever, no?

> > > +        panfrost_batch_add_bo(batch, rsrc->bo,
> > > +                              PAN_BO_GPU_ACCESS_SHARED | PAN_BO_GPU_ACCESS_READ |
> > > +                              PAN_BO_GPU_ACCESS_VERTEX_TILER |
> > > +                              (st == PIPE_SHADER_FRAGMENT ?
> > > +                               PAN_BO_GPU_ACCESS_FRAGMENT : 0));  
> > 
> > I'm not sure this is quite right... should it maybe be:
> > 
> > (st == PIPE_SHADER_FRAGMENT ? PAN_BO_ACCESS_FRAGMENT :
> > PAN_BO_ACCESS_VERTEX_TILER)
> 
> That's a good question. I wasn't sure so I decided to put the
> vertex/tiler unconditionally.

It should only be accessed from vertex/tiler if it is used by:

 - Vertex shaders (attributes, varyings, VS uniforms, VS texture)
 - Tiling (varyings, tiler structures)

Fragment use:

 - Fragment shaders (varyings, FS uniforms, FS texture)
 - Tiler structures

> Oops, that's a mistake. I meant PAN_BO_GPU_ACCESS_VERTEX_TILER here.
> It should definitely have ACCESS_SHARED | ACCESS_READ.

+1