[v3,10/25] panfrost: Make sure the BO is 'ready' when picked from the cache

Submitted by Boris Brezillon on Sept. 5, 2019, 7:41 p.m.


Message ID 20190905194150.17608-11-boris.brezillon@collabora.com
State New
Series "panfrost: Rework the batch pipelining logic" ( rev: 1 ) in Mesa


Commit Message

Boris Brezillon Sept. 5, 2019, 7:41 p.m.
This is needed if we want to free the panfrost_batch object at submit
time, so that we don't have to GC the batch on the next job submission.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
---
 src/gallium/drivers/panfrost/pan_bo.c | 68 ++++++++++++++++++---------
 src/gallium/drivers/panfrost/pan_bo.h |  2 +
 2 files changed, 49 insertions(+), 21 deletions(-)


diff --git a/src/gallium/drivers/panfrost/pan_bo.c b/src/gallium/drivers/panfrost/pan_bo.c
index 1f87c18e9ad5..3fd2e179aa72 100644
--- a/src/gallium/drivers/panfrost/pan_bo.c
+++ b/src/gallium/drivers/panfrost/pan_bo.c
@@ -23,6 +23,7 @@ 
  * Authors (Collabora):
  *   Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
  */
+#include <errno.h>
 #include <stdio.h>
 #include <fcntl.h>
 #include <xf86drm.h>
@@ -99,6 +100,23 @@  panfrost_bo_free(struct panfrost_bo *bo)
         ralloc_free(bo);
 }
 
+bool
+panfrost_bo_wait(struct panfrost_bo *bo, int64_t timeout_ns)
+{
+        struct drm_panfrost_wait_bo req = {
+                .handle = bo->gem_handle,
+                .timeout_ns = timeout_ns,
+        };
+        int ret;
+
+        ret = drmIoctl(bo->screen->fd, DRM_IOCTL_PANFROST_WAIT_BO, &req);
+        if (ret != -1)
+                return true;
+
+        assert(errno == ETIMEDOUT || errno == EBUSY);
+        return false;
+}
+
 /* Helper to calculate the bucket index of a BO */
 
 static unsigned
@@ -136,7 +154,7 @@  pan_bucket(struct panfrost_screen *screen, unsigned size)
 
 static struct panfrost_bo *
 panfrost_bo_cache_fetch(struct panfrost_screen *screen,
-                        size_t size, uint32_t flags)
+                        size_t size, uint32_t flags, bool dontwait)
 {
         pthread_mutex_lock(&screen->bo_cache_lock);
         struct list_head *bucket = pan_bucket(screen, size);
@@ -144,27 +162,29 @@  panfrost_bo_cache_fetch(struct panfrost_screen *screen,
 
         /* Iterate the bucket looking for something suitable */
         list_for_each_entry_safe(struct panfrost_bo, entry, bucket, link) {
-                if (entry->size >= size &&
-                    entry->flags == flags) {
-                        int ret;
-                        struct drm_panfrost_madvise madv;
+                if (entry->size < size || entry->flags != flags)
+                        continue;
 
-                        /* This one works, splice it out of the cache */
-                        list_del(&entry->link);
+                if (!panfrost_bo_wait(entry, dontwait ? 0 : INT64_MAX))
+                        continue;
 
-                        madv.handle = entry->gem_handle;
-                        madv.madv = PANFROST_MADV_WILLNEED;
-                        madv.retained = 0;
+                struct drm_panfrost_madvise madv = {
+                        .handle = entry->gem_handle,
+                        .madv = PANFROST_MADV_WILLNEED,
+                };
+                int ret;
 
-                        ret = drmIoctl(screen->fd, DRM_IOCTL_PANFROST_MADVISE, &madv);
-                        if (!ret && !madv.retained) {
-                                panfrost_bo_free(entry);
-                                continue;
-                        }
-                        /* Let's go! */
-                        bo = entry;
-                        break;
+                /* This one works, splice it out of the cache */
+                list_del(&entry->link);
+
+                ret = drmIoctl(screen->fd, DRM_IOCTL_PANFROST_MADVISE, &madv);
+                if (!ret && !madv.retained) {
+                        panfrost_bo_free(entry);
+                        continue;
                 }
+                /* Let's go! */
+                bo = entry;
+                break;
         }
         pthread_mutex_unlock(&screen->bo_cache_lock);
 
@@ -277,12 +297,18 @@  panfrost_bo_create(struct panfrost_screen *screen, size_t size,
         if (flags & PAN_ALLOCATE_GROWABLE)
                 assert(flags & PAN_ALLOCATE_INVISIBLE);
 
-        /* Before creating a BO, we first want to check the cache, otherwise,
-         * the cache misses and we need to allocate a BO fresh from the kernel
+        /* Before creating a BO, we first want to check the cache but without
+         * waiting for BO readiness (BOs in the cache can still be referenced
+         * by jobs that are not finished yet).
+         * If the cache lookup fails, we fall back on a fresh BO allocation,
+         * and if that fails too, we try one more time to allocate from the
+         * cache, but this time we are willing to wait.
          */
-        bo = panfrost_bo_cache_fetch(screen, size, flags);
+        bo = panfrost_bo_cache_fetch(screen, size, flags, true);
         if (!bo)
                 bo = panfrost_bo_alloc(screen, size, flags);
+        if (!bo)
+                bo = panfrost_bo_cache_fetch(screen, size, flags, false);
 
         assert(bo);
 
diff --git a/src/gallium/drivers/panfrost/pan_bo.h b/src/gallium/drivers/panfrost/pan_bo.h
index b4bff645e257..5c0a00d944ec 100644
--- a/src/gallium/drivers/panfrost/pan_bo.h
+++ b/src/gallium/drivers/panfrost/pan_bo.h
@@ -57,6 +57,8 @@  struct panfrost_bo {
         uint32_t flags;
 };
 
+bool
+panfrost_bo_wait(struct panfrost_bo *bo, int64_t timeout_ns);
 void
 panfrost_bo_mmap(struct panfrost_bo *bo);
 void

Comments

> +bool
> +panfrost_bo_wait(struct panfrost_bo *bo, int64_t timeout_ns)
> +{
> +        struct drm_panfrost_wait_bo req = {
> +                .handle = bo->gem_handle,
> +		.timeout_ns = timeout_ns,
> +        };
> +        int ret;
> +
> +        ret = drmIoctl(bo->screen->fd, DRM_IOCTL_PANFROST_WAIT_BO, &req);
> +        if (ret != -1)
> +                return true;
> +
> +        assert(errno == ETIMEDOUT || errno == EBUSY);
> +        return false;
> +}

I would appreciate a comment explaining what the return value of this
ioctl is. `ret != -1` and asserting an errno is... suspicious? Not
wrong, to my knowledge, but hard to decipher without context.

> +        /* Before creating a BO, we first want to check the cache but without
> +         * waiting for BO readiness (BOs in the cache can still be referenced
> +         * by jobs that are not finished yet).
> +         * If the cached allocation fails we fall back on fresh BO allocation,
> +         * and if that fails too, we try one more time to allocate from the
> +         * cache, but this time we accept to wait.
>           */

Conceptually:

We first try a ready BO from the cache. OK.

If that fails, there is no BO in the cache that is currently ready for
use; by definition of BO readiness, this is because another concurrent
job is using it. We then try to create a new BO. Suppose a given job
uses an average of `b` BOs. Then for `j` concurrent jobs, assuming all
of these allocations succeed, we have `j * b` BOs in the cache. This is
an unfortunate bump in memory usage but necessary for pipelining.

If that allocation fails, by definition of memory allocation failures,
we ran out of memory and cannot proceed with the allocation. Either:

 - The BO cache is responsible for this. In this case, continuing to use
   the BO cache (even with the waits) will just dig us deeper into the
   hole. Perhaps we should call bo_evict_all from userspace to handle
   the memory pressure? Or does madvise render this irrelevant?

 - The BO cache is not responsible for this. In this case, we could
   continue to use the BO cache, but then either:

   	- There is a BO we can wait for. Then waiting is okay.
	- There is not. Then that cache fetch fails and we kerplutz.
	  What now? If we need an allocation, cache or no cache, if the
	  kernel says no, no means no. What then?

In short, I'm not convinced this algorithm (specifically the last step)
is ideal.

If there is no memory left for us, is it responsible to continue at all?
Should we just fail the allocation after step 2, and if the caller has a
problem with that, it's their issue? Or we abort here after step 2? I
don't like the robustness implications but low memory behaviour is a
risky subject as it is; I don't want to add more unknowns into it --
aborting it with an assert(0) is something we can recognize immediately.
Strange crashes in random places with no explanation, less so.

CC'ing Rob to see if he has any advice re Panfrost madvise interactions
as well as general kernel OOM policy.
On Thu, 5 Sep 2019 16:43:23 -0400
Alyssa Rosenzweig <alyssa@rosenzweig.io> wrote:

> > +bool
> > +panfrost_bo_wait(struct panfrost_bo *bo, int64_t timeout_ns)
> > +{
> > +        struct drm_panfrost_wait_bo req = {
> > +                .handle = bo->gem_handle,
> > +		.timeout_ns = timeout_ns,
> > +        };
> > +        int ret;
> > +
> > +        ret = drmIoctl(bo->screen->fd, DRM_IOCTL_PANFROST_WAIT_BO, &req);
> > +        if (ret != -1)
> > +                return true;
> > +
> > +        assert(errno == ETIMEDOUT || errno == EBUSY);
> > +        return false;
> > +}  
> 
> I would appreciate a comment explaining what the return value of this
> ioctl is. `ret != -1` and asserting an errno is... suspicious? Not
> wrong, to my knowledge, but hard to decipher without context.

Will document that.

> 
> > +        /* Before creating a BO, we first want to check the cache but without
> > +         * waiting for BO readiness (BOs in the cache can still be referenced
> > +         * by jobs that are not finished yet).
> > +         * If the cached allocation fails we fall back on fresh BO allocation,
> > +         * and if that fails too, we try one more time to allocate from the
> > +         * cache, but this time we accept to wait.
> >           */  
> 
> Conceptually:
> 
> We first try a ready BO from the cache. OK.
> 
> If that fails, there is no BO in the cache that is currently ready for
> use; by definition of BO readiness, this is because another concurrent
> job is using it. We then try to create a new BO. Suppose a given job
> uses an average of `b` BOs. Then for `j` concurrent jobs, assuming all
> of these allocations succeed, we have `j * b` BOs in the cache. This is
> an unfortunate bump in memory usage but necessary for pipelining.
> 
> If that allocation fails, by definition of memory allocation failures,
> we ran out of memory and cannot proceed with the allocation. Either:
> 
>  - The BO cache is responsible for this. In this case, continuing to use
>    the BO cache (even with the waits) will just dig us deeper into the
>    hole. Perhaps we should call bo_evict_all from userspace to handle
>    the memory pressure? Or does madvise render this irrelevant?

Evict won't help here as memory will only be released after the jobs
are done using it. And madvise doesn't help either, for the same reason.

> 
>  - The BO cache is not responsible for this. In this case, we could
>    continue to use the BO cache, but then either:
> 
>    	- There is a BO we can wait for. Then waiting is okay.
> 	- There is not. Then that cache fetch fails and we kerplutz.
> 	  What now? If we need an allocation, cache or no cache, if the
> 	  kernel says no, no means no. What then?

The behavior hasn't changed regarding allocation failures: it's still
an assert(), so the code is no more or less buggy than it was :p. What
happens when assert()s are disabled? Probably a segfault caused by a
NULL pointer dereference. So adding the fprintf() is probably a good
idea as a first step, and then we can see if we can handle the OOM case
gracefully.

> 
> In short, I'm not convinced this algorithm (specifically the last step)
> is ideal.

It really depends on how robust you want to be when the system is under
memory pressure vs how long you are willing to wait. Note that, in the
worst-case scenario, we wouldn't wait any longer than we currently do, as
having each batch wait on the BOs of the previous batch is just like the
serialization we had in panfrost_flush(). I don't see it as a huge
problem, but maybe I'm wrong.

> 
> If there is no memory left for us, is it responsible to continue at all?

It's not exactly no memory, it's no immediately available memory.

> Should we just fail the allocation after step 2, and if the caller has a
> problem with that, it's their issue? Or we abort here after step 2?

I think that one is a separate issue. I mean, that's something we have
to handle even if we go through step 3 and step 3 fails. 

> I
> don't like the robustness implications but low memory behaviour is a
> risky subject as it is; I don't want to add more unknowns into it --
> aborting it with an assert(0) is something we can recognize immediately.
> Strange crashes in random places with no explanation, less so.

And that hasn't changed. We still have an assert after step 3.

> 
> CC'ing Rob to see if he has any advice re Panfrost madvise interactions
> as well as general kernel OOM policy.

> Will document that.
+1

> Evict won't help here as memory will only be released after the jobs
> are done using it. And madvise doesn't help either, for the same reason.

Ah-ha, I understand the distinction; thank you.

> The behavior hasn't changed regarding allocation failures: it's still
> an assert(), so the code is no more or less buggy than it was :p. What
> happens when assert()s are disabled? Probably a segfault caused by a
> NULL pointer dereference. So adding the fprintf() is probably a good
> idea as a first step, and then we can see if we can handle the OOM case
> gracefully.

Haha, that's reasonable. I'm wondering if we should try some assert-less
stress test but maybe that doesn't matter until "productization".

> > In short, I'm not convinced this algorithm (specifically the last step)
> > is ideal.
> 
> It really depends on how robust you want to be when the system is under
> memory pressure vs how long you are willing to wait. Note that, in the
> worst-case scenario, we wouldn't wait any longer than we currently do, as
> having each batch wait on the BOs of the previous batch is just like the
> serialization we had in panfrost_flush(). I don't see it as a huge
> problem, but maybe I'm wrong.

Ya, I don't know; these seem like hard problems to say the least :-(

> > If there is no memory left for us, is it responsible to continue at all?
> 
> It's not exactly no memory, it's no immediately available memory.

'fraid I don't know enough about Linux allocators to grok the
distinction.

> > Should we just fail the allocation after step 2, and if the caller has a
> > problem with that, it's their issue? Or we abort here after step 2?
> 
> I think that one is a separate issue. I mean, that's something we have
> to handle even if we go through step 3 and step 3 fails. 
>
> > I
> > don't like the robustness implications but low memory behaviour is a
> > risky subject as it is; I don't want to add more unknowns into it --
> > aborting it with an assert(0) is something we can recognize immediately.
> > Strange crashes in random places with no explanation, less so.
> 
> And that hasn't changed. We still have an assert after step 3.

OK, let's hold off until after this series is merged :)