util/slab: add slab_free_fast()

Submitted by Rob Clark on Aug. 7, 2019, 10 p.m.

Details

Message ID 20190807220005.14722-1-robdclark@gmail.com
State New
Headers show
Series "util/slab: add slab_free_fast()" ( rev: 1 ) in Mesa

Not browsing as part of any series.

Commit Message

Rob Clark Aug. 7, 2019, 10 p.m.
From: Rob Clark <robdclark@chromium.org>

I noticed slab_free() showing up at the top of perf results in
gl_driver2, due to "streaming" GPU state objects, which are allocated
and destroyed within a single draw call.

In this case, it is guaranteed that we free on the same child pool,
from the same (only) thread accessing the child pool.  So add a faster,
but more restricted version of slab_free() to cut the overhead.

Signed-off-by: Rob Clark <robdclark@chromium.org>
---
 src/util/slab.c | 20 ++++++++++++++++++++
 src/util/slab.h |  1 +
 2 files changed, 21 insertions(+)

Patch hide | download patch | download mbox

diff --git a/src/util/slab.c b/src/util/slab.c
index 62634034fdc..21076a80238 100644
--- a/src/util/slab.c
+++ b/src/util/slab.c
@@ -276,6 +276,26 @@  void slab_free(struct slab_child_pool *pool, void *ptr)
    }
 }
 
+/**
+ * Similar to slab_free(), except that freeing an object in a different pool
+ * from the one it was allocated from is *not* allowed.
+ */
+void slab_free_fast(struct slab_child_pool *pool, void *ptr)
+{
+   struct slab_element_header *elt = ((struct slab_element_header*)ptr - 1);
+
+   CHECK_MAGIC(elt, SLAB_MAGIC_ALLOCATED);
+   SET_MAGIC(elt, SLAB_MAGIC_FREE);
+
+   assert(p_atomic_read(&elt->owner) == (intptr_t)pool);
+
+   /* This is the simple case: The caller guarantees that we can safely
+    * access the free list.
+    */
+   elt->next = pool->free;
+   pool->free = elt;
+}
+
 /**
  * Allocate an object from the slab. Single-threaded (no mutex).
  */
diff --git a/src/util/slab.h b/src/util/slab.h
index 5a25adaf7f4..9748cd95858 100644
--- a/src/util/slab.h
+++ b/src/util/slab.h
@@ -78,6 +78,7 @@  void slab_create_child(struct slab_child_pool *pool,
 void slab_destroy_child(struct slab_child_pool *pool);
 void *slab_alloc(struct slab_child_pool *pool);
 void slab_free(struct slab_child_pool *pool, void *ptr);
+void slab_free_fast(struct slab_child_pool *pool, void *ptr);
 
 struct slab_mempool {
    struct slab_parent_pool parent;

Comments


Do you have any actual numbers for this?

On August 7, 2019 17:02:22 Rob Clark <robdclark@gmail.com> wrote:

> From: Rob Clark <robdclark@chromium.org>
>
> I noticed slab_free() showing up at the top of perf results in
> gl_driver2, due to "streaming" GPU state objects, which are allocated
> and destroyed within a single draw call.
>
> In this case, it is guaranteed that we free on the same child pool,
> from the same (only) thread accessing the child pool.  So add a faster,
> but more restricted version of slab_free() to cut the overhead.
>
> Signed-off-by: Rob Clark <robdclark@chromium.org>
> ---
> src/util/slab.c | 20 ++++++++++++++++++++
> src/util/slab.h |  1 +
> 2 files changed, 21 insertions(+)
>
> diff --git a/src/util/slab.c b/src/util/slab.c
> index 62634034fdc..21076a80238 100644
> --- a/src/util/slab.c
> +++ b/src/util/slab.c
> @@ -276,6 +276,26 @@ void slab_free(struct slab_child_pool *pool, void *ptr)
>    }
> }
>
> +/**
> + * Similar to slab_free(), except that freeing an object in a different pool
> + * from the one it was allocated from is *not* allowed.
> + */
> +void slab_free_fast(struct slab_child_pool *pool, void *ptr)
> +{
> +   struct slab_element_header *elt = ((struct slab_element_header*)ptr - 1);
> +
> +   CHECK_MAGIC(elt, SLAB_MAGIC_ALLOCATED);
> +   SET_MAGIC(elt, SLAB_MAGIC_FREE);
> +
> +   assert(p_atomic_read(&elt->owner) == (intptr_t)pool);
> +
> +   /* This is the simple case: The caller guarantees that we can safely
> +    * access the free list.
> +    */
> +   elt->next = pool->free;
> +   pool->free = elt;
> +}
> +
> /**
>  * Allocate an object from the slab. Single-threaded (no mutex).
>  */
> diff --git a/src/util/slab.h b/src/util/slab.h
> index 5a25adaf7f4..9748cd95858 100644
> --- a/src/util/slab.h
> +++ b/src/util/slab.h
> @@ -78,6 +78,7 @@ void slab_create_child(struct slab_child_pool *pool,
> void slab_destroy_child(struct slab_child_pool *pool);
> void *slab_alloc(struct slab_child_pool *pool);
> void slab_free(struct slab_child_pool *pool, void *ptr);
> +void slab_free_fast(struct slab_child_pool *pool, void *ptr);
>
> struct slab_mempool {
>    struct slab_parent_pool parent;
> --
> 2.21.0
>
> _______________________________________________
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Yes, although not in front of me.. overall for gl_driver2 it was
something along the lines of 2-3%.  Looking at perf-report, slab_free
(+ slab_free_fast) went from ~20% to nearly nothing(ish) iirc..

BR,
-R

On Wed, Aug 7, 2019 at 4:58 PM Jason Ekstrand <jason@jlekstrand.net> wrote:
>
> Do you have any actual numbers for this?
>
> On August 7, 2019 17:02:22 Rob Clark <robdclark@gmail.com> wrote:
>
> > From: Rob Clark <robdclark@chromium.org>
> >
> > I noticed slab_free() showing up at the top of perf results in
> > gl_driver2, due to "streaming" GPU state objects, which are allocated
> > and destroyed within a single draw call.
> >
> > In this case, it is guaranteed that we free on the same child pool,
> > from the same (only) thread accessing the child pool.  So add a faster,
> > but more restricted version of slab_free() to cut the overhead.
> >
> > Signed-off-by: Rob Clark <robdclark@chromium.org>
> > ---
> > src/util/slab.c | 20 ++++++++++++++++++++
> > src/util/slab.h |  1 +
> > 2 files changed, 21 insertions(+)
> >
> > diff --git a/src/util/slab.c b/src/util/slab.c
> > index 62634034fdc..21076a80238 100644
> > --- a/src/util/slab.c
> > +++ b/src/util/slab.c
> > @@ -276,6 +276,26 @@ void slab_free(struct slab_child_pool *pool, void *ptr)
> >    }
> > }
> >
> > +/**
> > + * Similar to slab_free(), except that freeing an object in a different pool
> > + * from the one it was allocated from is *not* allowed.
> > + */
> > +void slab_free_fast(struct slab_child_pool *pool, void *ptr)
> > +{
> > +   struct slab_element_header *elt = ((struct slab_element_header*)ptr - 1);
> > +
> > +   CHECK_MAGIC(elt, SLAB_MAGIC_ALLOCATED);
> > +   SET_MAGIC(elt, SLAB_MAGIC_FREE);
> > +
> > +   assert(p_atomic_read(&elt->owner) == (intptr_t)pool);
> > +
> > +   /* This is the simple case: The caller guarantees that we can safely
> > +    * access the free list.
> > +    */
> > +   elt->next = pool->free;
> > +   pool->free = elt;
> > +}
> > +
> > /**
> >  * Allocate an object from the slab. Single-threaded (no mutex).
> >  */
> > diff --git a/src/util/slab.h b/src/util/slab.h
> > index 5a25adaf7f4..9748cd95858 100644
> > --- a/src/util/slab.h
> > +++ b/src/util/slab.h
> > @@ -78,6 +78,7 @@ void slab_create_child(struct slab_child_pool *pool,
> > void slab_destroy_child(struct slab_child_pool *pool);
> > void *slab_alloc(struct slab_child_pool *pool);
> > void slab_free(struct slab_child_pool *pool, void *ptr);
> > +void slab_free_fast(struct slab_child_pool *pool, void *ptr);
> >
> > struct slab_mempool {
> >    struct slab_parent_pool parent;
> > --
> > 2.21.0
> >
> > _______________________________________________
> > mesa-dev mailing list
> > mesa-dev@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
>
>
It'd be nice to throw some numbers in the donut message before you push. 
"Real" ones of it's convenient but what you have below is probably fine if 
you've already thrown the data away.

On August 7, 2019 19:13:21 Rob Clark <robdclark@chromium.org> wrote:

> Yes, although not in front of me.. overall for gl_driver2 it was
> something along the lines of 2-3%.  Looking at perf-report, slab_free
> (+ slab_free_fast) went from ~20% to nearly nothing(ish) iirc..
>
> BR,
> -R
>
> On Wed, Aug 7, 2019 at 4:58 PM Jason Ekstrand <jason@jlekstrand.net> wrote:
>>
>> Do you have any actual numbers for this?
>>
>> On August 7, 2019 17:02:22 Rob Clark <robdclark@gmail.com> wrote:
>>
>> > From: Rob Clark <robdclark@chromium.org>
>> >
>> > I noticed slab_free() showing up at the top of perf results in
>> > gl_driver2, due to "streaming" GPU state objects, which are allocated
>> > and destroyed within a single draw call.
>> >
>> > In this case, it is guaranteed that we free on the same child pool,
>> > from the same (only) thread accessing the child pool.  So add a faster,
>> > but more restricted version of slab_free() to cut the overhead.
>> >
>> > Signed-off-by: Rob Clark <robdclark@chromium.org>
>> > ---
>> > src/util/slab.c | 20 ++++++++++++++++++++
>> > src/util/slab.h |  1 +
>> > 2 files changed, 21 insertions(+)
>> >
>> > diff --git a/src/util/slab.c b/src/util/slab.c
>> > index 62634034fdc..21076a80238 100644
>> > --- a/src/util/slab.c
>> > +++ b/src/util/slab.c
>> > @@ -276,6 +276,26 @@ void slab_free(struct slab_child_pool *pool, void *ptr)
>> >    }
>> > }
>> >
>> > +/**
>> > + * Similar to slab_free(), except that freeing an object in a different pool
>> > + * from the one it was allocated from is *not* allowed.
>> > + */
>> > +void slab_free_fast(struct slab_child_pool *pool, void *ptr)
>> > +{
>> > +   struct slab_element_header *elt = ((struct slab_element_header*)ptr - 1);
>> > +
>> > +   CHECK_MAGIC(elt, SLAB_MAGIC_ALLOCATED);
>> > +   SET_MAGIC(elt, SLAB_MAGIC_FREE);
>> > +
>> > +   assert(p_atomic_read(&elt->owner) == (intptr_t)pool);
>> > +
>> > +   /* This is the simple case: The caller guarantees that we can safely
>> > +    * access the free list.
>> > +    */
>> > +   elt->next = pool->free;
>> > +   pool->free = elt;
>> > +}
>> > +
>> > /**
>> >  * Allocate an object from the slab. Single-threaded (no mutex).
>> >  */
>> > diff --git a/src/util/slab.h b/src/util/slab.h
>> > index 5a25adaf7f4..9748cd95858 100644
>> > --- a/src/util/slab.h
>> > +++ b/src/util/slab.h
>> > @@ -78,6 +78,7 @@ void slab_create_child(struct slab_child_pool *pool,
>> > void slab_destroy_child(struct slab_child_pool *pool);
>> > void *slab_alloc(struct slab_child_pool *pool);
>> > void slab_free(struct slab_child_pool *pool, void *ptr);
>> > +void slab_free_fast(struct slab_child_pool *pool, void *ptr);
>> >
>> > struct slab_mempool {
>> >    struct slab_parent_pool parent;
>> > --
>> > 2.21.0
>> >
>> > _______________________________________________
>> > mesa-dev mailing list
>> > mesa-dev@lists.freedesktop.org
>> > https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>>
>>
>>
should be easy enough to re-create..  I will note that the perf
results differ a *lot* if I pin things to small (in-order) calls, so
I'm guestimating that what perf is showing me w/ slab_free() is where
reality catches up to mis-predicted branches.

BR,
-R


On Wed, Aug 7, 2019 at 9:44 PM Jason Ekstrand <jason@jlekstrand.net> wrote:
>
> It'd be nice to throw some numbers in the donut message before you push.
> "Real" ones of it's convenient but what you have below is probably fine if
> you've already thrown the data away.
>
> On August 7, 2019 19:13:21 Rob Clark <robdclark@chromium.org> wrote:
>
> > Yes, although not in front of me.. overall for gl_driver2 it was
> > something along the lines of 2-3%.  Looking at perf-report, slab_free
> > (+ slab_free_fast) went from ~20% to nearly nothing(ish) iirc..
> >
> > BR,
> > -R
> >
> > On Wed, Aug 7, 2019 at 4:58 PM Jason Ekstrand <jason@jlekstrand.net> wrote:
> >>
> >> Do you have any actual numbers for this?
> >>
> >> On August 7, 2019 17:02:22 Rob Clark <robdclark@gmail.com> wrote:
> >>
> >> > From: Rob Clark <robdclark@chromium.org>
> >> >
> >> > I noticed slab_free() showing up at the top of perf results in
> >> > gl_driver2, due to "streaming" GPU state objects, which are allocated
> >> > and destroyed within a single draw call.
> >> >
> >> > In this case, it is guaranteed that we free on the same child pool,
> >> > from the same (only) thread accessing the child pool.  So add a faster,
> >> > but more restricted version of slab_free() to cut the overhead.
> >> >
> >> > Signed-off-by: Rob Clark <robdclark@chromium.org>
> >> > ---
> >> > src/util/slab.c | 20 ++++++++++++++++++++
> >> > src/util/slab.h |  1 +
> >> > 2 files changed, 21 insertions(+)
> >> >
> >> > diff --git a/src/util/slab.c b/src/util/slab.c
> >> > index 62634034fdc..21076a80238 100644
> >> > --- a/src/util/slab.c
> >> > +++ b/src/util/slab.c
> >> > @@ -276,6 +276,26 @@ void slab_free(struct slab_child_pool *pool, void *ptr)
> >> >    }
> >> > }
> >> >
> >> > +/**
> >> > + * Similar to slab_free(), except that freeing an object in a different pool
> >> > + * from the one it was allocated from is *not* allowed.
> >> > + */
> >> > +void slab_free_fast(struct slab_child_pool *pool, void *ptr)
> >> > +{
> >> > +   struct slab_element_header *elt = ((struct slab_element_header*)ptr - 1);
> >> > +
> >> > +   CHECK_MAGIC(elt, SLAB_MAGIC_ALLOCATED);
> >> > +   SET_MAGIC(elt, SLAB_MAGIC_FREE);
> >> > +
> >> > +   assert(p_atomic_read(&elt->owner) == (intptr_t)pool);
> >> > +
> >> > +   /* This is the simple case: The caller guarantees that we can safely
> >> > +    * access the free list.
> >> > +    */
> >> > +   elt->next = pool->free;
> >> > +   pool->free = elt;
> >> > +}
> >> > +
> >> > /**
> >> >  * Allocate an object from the slab. Single-threaded (no mutex).
> >> >  */
> >> > diff --git a/src/util/slab.h b/src/util/slab.h
> >> > index 5a25adaf7f4..9748cd95858 100644
> >> > --- a/src/util/slab.h
> >> > +++ b/src/util/slab.h
> >> > @@ -78,6 +78,7 @@ void slab_create_child(struct slab_child_pool *pool,
> >> > void slab_destroy_child(struct slab_child_pool *pool);
> >> > void *slab_alloc(struct slab_child_pool *pool);
> >> > void slab_free(struct slab_child_pool *pool, void *ptr);
> >> > +void slab_free_fast(struct slab_child_pool *pool, void *ptr);
> >> >
> >> > struct slab_mempool {
> >> >    struct slab_parent_pool parent;
> >> > --
> >> > 2.21.0
> >> >
> >> > _______________________________________________
> >> > mesa-dev mailing list
> >> > mesa-dev@lists.freedesktop.org
> >> > https://lists.freedesktop.org/mailman/listinfo/mesa-dev
> >>
> >>
> >>
>
>
>