[Spice-devel,v5,09/20] server: Avoid copying the input frame in the GStreamer encoder.

Submitted by Francois Gouget on Sept. 29, 2015, 11:01 a.m.

Details

Message ID alpine.DEB.2.20.1509281853390.32474@amboise
State New
Headers show

Not browsing as part of any series.

Commit Message

Francois Gouget Sept. 29, 2015, 11:01 a.m.
I attached the new patch to give an idea of what things look like now.


On Tue, 22 Sep 2015, Christophe Fergeau wrote:
[...]
> > +    SpiceChunks *chunks = bitmap->data;
> > +    while (*chunk_offset >= chunks->chunk[*chunk].len) {
> > +        /* Make sure that the chunk is not padded */
> > +        if (chunks->chunk[*chunk].len % bitmap->stride != 0) {
> > +        }
> 
> Empty block?

Fixed. I introduced this bug when moving the code to the zero_copy() 
function.


> > +    int max_mem = gst_buffer_get_max_memory();
> > +    if (chunks->num_chunks - *chunk > max_mem) {
> > +        /* There are more chunks than we can fit memory objects in a
> > +         * buffer. This will cause the buffer to merge memory objects to
> > +         * fit the extra chunks, which means doing wasteful memory copies.
> > +         * So use the zero-copy approach for the first max_mem-1 chunks,
> > +         * and directly copy the remainder to the last memory object.
> > +         */
> 
> The copy of the remainder is done in push_raw_frame if I followed the
> code correctly?

Yes. I tweaked the comment to make that more explicit.


> > +        max_mem = *chunk + max_mem - 1;
> 
> max_mem = max_mem - skipped_chunk_count - 1; ?

No.
Let's say there are 7 chunks, that we skip the first two, and that we 
can put 4 GstMemory objects a GstBuffer.

chunk 0 -> zero_copy() skip
chunk 1 -> zero_copy() skip

Now chunk == 2, max_mem == 4 and what we want to know is when to exit 
the zero-copy loop:
      while (*len && *chunk < max_mem) {
And that's when *chunk == 2 + 4 -1

chunk 2 -> zero_copy() wrap it in a GstMemory
chunk 3 -> zero_copy() wrap it in a GstMemory
chunk 4 -> zero_copy() wrap it in a GstMemory

Now we return to push_raw_frame() when chunk is now 5 so it will pick up 
where we left off and copy the rest in a single GstMemory object.

chunk 5 -> push_raw_frame() copy to the last GstMemory
chunk 6 -> push_raw_frame() copy to the last GstMemory
chunk 7 -> push_raw_frame() copy to the last GstMemory


> > +    } else {
> > +        max_mem = chunks->num_chunks;
> > +    }
> > +
> > +    while (*len && *chunk < max_mem) {
> 
> and here 'chunk' becomes a loop index initialized at 
> skipped_chunk_count, maybe call it 'i' ? or make it a SpiceChunk * 
> initialized to chunks->chunk[skipped_chunk_count], this would make the 
> loop body a bit easier to read.

It is a loop index all along. Also we need to compare it to max_mem so 
keeping it as an integer is clearer. I've renamed it to chunk_index 
though.


[...]
> > @@ -405,15 +484,14 @@ static int push_raw_frame(GstEncoder *encoder, const SpiceBitmap *bitmap,
> >      const uint32_t height = src->bottom - src->top;
> >      const uint32_t stream_stride = (src->right - src->left) * encoder->format->bpp / 8;
> >      uint32_t len = stream_stride * height;
> > -    GstBuffer *buffer = gst_buffer_new_and_alloc(len);
> >  #ifdef HAVE_GSTREAMER_0_10
> > +    GstBuffer *buffer = gst_buffer_new_and_alloc(len);
> >      uint8_t *b = GST_BUFFER_DATA(buffer);
> > +    uint8_t *dst = b;
> 
> Seems like the declaration of uint8_t *dst = GST_BUFFER_DATA(buffer);
> could also be moved close to the 'gst_allocator_alloc()' calls in an
> inner block.

It's used in two sibling blocks so this would requires duplicating that 
line and some #if check. Not really simpler.


[...]
> > +    GstMapInfo map = { .memory = NULL };
> 
> Could be GST_MAP_INFO_INIT

I didn't know about it. However it turns out it was buggy until 
recently. So I added a TODO.

https://bugzilla.gnome.org/show_bug.cgi?id=751881


[...]
> > +#ifndef HAVE_GSTREAMER_0_10
> > +        GstMemory *mem = gst_allocator_alloc(NULL, len, NULL);
> > +        gst_memory_map(mem, &map, GST_MAP_WRITE);
> 
> Apparently it's possible for gst_memory_map() to fail, would be nicer to
> check its return value

The GstMemory object is allocated with the specific intent of mapping 
it. So if the map operation fails it's either because we're out of 
memory or because of a coding error. So I added a spice_assert() on the 
gst_memory_map() and code to check the allocation.


> > +        uint8_t *b = map.data;
> > +        uint8_t *dst = b;
> > +#endif
> 
> Could gst_buffer_new_allocate() be used instead? I don't think the
> temporary 'b' variable is used at all apart for being assigned to 'dst'.

gst_buffer_new_allocate() allocates a new buffer which is not what we 
want. Also note that we cannot use gst_buffer_append() as that would 
cause a copy of all the memory objects of the first buffer, thus 
defeating the purpose of the zero_copy code..

I removed the b variables.


> >          chunk_offset += src->left * encoder->format->bpp / 8;
> >          if (!line_copy(encoder, bitmap, chunk_offset, stream_stride, height, dst)) {
> > +#ifndef HAVE_GSTREAMER_0_10
> > +            gst_memory_unmap(map.memory, &map);
> > +            gst_memory_unref(map.memory);
> > +#endif
> 
> Looks like some gst_buffer_unmap() should have been there before this
> commit.

Yep. Added.

Patch hide | download patch | download mbox

commit 88c975388d6c5c9c9a02eadaad1e2e9bb08bf688
Author: Francois Gouget <fgouget@codeweavers.com>
Date:   Fri Jun 5 18:40:22 2015 +0200

    server: Avoid copying the input frame in the GStreamer encoder
    
    This relies on the GStreamer buffer's lifetime being short enough which
    it is because:
     - We encode frames one by one.
     - For all encoders but MJPEG, the first element of the pipeline will
       convert the bitmap to another image format which entails copying it.
       So by the time the encoder starts its work, this buffer will not be
       needed anymore.
     - The MJPEG encoder does not perform inter-frame compression and thus
       does not need to keep hold of this buffer once it has processed it.
    
    Note that we can only avoid copies for the first 1 Mpixels or so.
    That's because Spice splits larger frames into more chunks than we can
    fit GstMemory fragments in a GStreamer buffer. So if there are more
    pixels we will avoid copies for the first 3840 KB and copy the rest.
    
    Signed-off-by: Francois Gouget <fgouget@codeweavers.com>

diff --git a/server/gstreamer_encoder.c b/server/gstreamer_encoder.c
index d241b20..fe45e0e 100644
--- a/server/gstreamer_encoder.c
+++ b/server/gstreamer_encoder.c
@@ -31,6 +31,10 @@ 
 
 #define GSTE_DEFAULT_FPS 30
 
+#ifndef HAVE_GSTREAMER_0_10
+# define DO_ZERO_COPY
+#endif
+
 
 typedef struct {
     SpiceBitmapFmt spice_format;
@@ -83,6 +87,11 @@  typedef struct SpiceGstEncoder {
     /* The default video buffer */
     SpiceGstVideoBuffer *default_buffer;
 
+#ifdef DO_ZERO_COPY
+    /* Set to TRUE until GStreamer no longer needs the raw bitmap buffer. */
+    gboolean needs_bitmap;
+#endif
+
     /* The frame counter for GStreamer buffers */
     uint32_t frame;
 
@@ -397,6 +406,77 @@  static inline int line_copy(SpiceGstEncoder *encoder, const SpiceBitmap *bitmap,
      return TRUE;
 }
 
+#ifdef DO_ZERO_COPY
+/* A helper for zero_copy() */
+static void unref_bitmap(gpointer mem)
+{
+    SpiceGstEncoder *encoder = (SpiceGstEncoder*)mem;
+    encoder->needs_bitmap = FALSE;
+}
+
+/* A helper for push_raw_frame(). */
+static inline int zero_copy(SpiceGstEncoder *encoder, const SpiceBitmap *bitmap,
+                            GstBuffer *buffer, uint32_t *chunk_index,
+                            uint32_t *chunk_offset, uint32_t *len)
+{
+    /* We cannot control the lifetime of the bitmap but we can wrap it in
+     * the buffer anyway because:
+     * - Before returning from gst_encoder_encode_frame() we wait for the
+     *   pipeline to have converted this frame into a compressed buffer.
+     *   So it has to have gone through the frame at least once.
+     * - For all encoders but MJPEG, the first element of the pipeline will
+     *   convert the bitmap to another image format which entails copying
+     *   it. So by the time the encoder starts its work, this buffer will
+     *   not be needed anymore.
+     * - The MJPEG encoder does not perform inter-frame compression and thus
+     *   does not need to keep hold of this buffer once it has processed it.
+     * encoder->needs_bitmap lets us verify that these conditions still hold
+     * true through an assert.
+     */
+    SpiceChunks *chunks = bitmap->data;
+    while (*chunk_offset >= chunks->chunk[*chunk_index].len) {
+        /* Make sure that the chunk is not padded */
+        if (chunks->chunk[*chunk_index].len % bitmap->stride != 0) {
+            return FALSE;
+        }
+        *chunk_offset -= chunks->chunk[*chunk_index].len;
+        (*chunk_index)++;
+    }
+
+    int max_mem = gst_buffer_get_max_memory();
+    if (chunks->num_chunks - *chunk_index > max_mem) {
+        /* There are more chunks than we can fit memory objects in a
+         * buffer. This will cause the buffer to merge memory objects to
+         * fit the extra chunks, which means doing wasteful memory copies.
+         * So use the zero-copy approach for the first max_mem-1 chunks, and
+         * let push_raw_frame() add another memory object to copy the rest.
+         */
+        max_mem = *chunk_index + max_mem - 1;
+    } else {
+        max_mem = chunks->num_chunks;
+    }
+
+    while (*len && *chunk_index < max_mem) {
+        /* Make sure that the chunk is not padded */
+        if (chunks->chunk[*chunk_index].len % bitmap->stride != 0) {
+            return FALSE;
+        }
+        uint32_t thislen = MIN(chunks->chunk[*chunk_index].len - *chunk_offset, *len);
+        GstMemory *mem = gst_memory_new_wrapped(GST_MEMORY_FLAG_READONLY,
+                                                chunks->chunk[*chunk_index].data,
+                                                chunks->chunk[*chunk_index].len,
+                                                *chunk_offset, thislen,
+                                                encoder, &unref_bitmap);
+        gst_buffer_append_memory(buffer, mem);
+        *len -= thislen;
+        *chunk_offset = 0;
+        (*chunk_index)++;
+    }
+    encoder->needs_bitmap = TRUE;
+    return TRUE;
+}
+#endif
+
 /* A helper for gst_encoder_encode_frame(). */
 static int push_raw_frame(SpiceGstEncoder *encoder, const SpiceBitmap *bitmap,
                           const SpiceRect *src, int top_down)
@@ -404,13 +484,13 @@  static int push_raw_frame(SpiceGstEncoder *encoder, const SpiceBitmap *bitmap,
     const uint32_t height = src->bottom - src->top;
     const uint32_t stream_stride = (src->right - src->left) * encoder->format->bpp / 8;
     uint32_t len = stream_stride * height;
-    GstBuffer *buffer = gst_buffer_new_and_alloc(len);
 #ifdef HAVE_GSTREAMER_0_10
+    GstBuffer *buffer = gst_buffer_new_and_alloc(len);
     uint8_t *dst = GST_BUFFER_DATA(buffer);
 #else
-    GstMapInfo map;
-    gst_buffer_map(buffer, &map, GST_MAP_WRITE);
-    uint8_t *dst = map.data;
+    GstBuffer *buffer = gst_buffer_new();
+    /* TODO Use GST_MAP_INFO_INIT once GStreamer 1.4.5 is no longer relevant */
+    GstMapInfo map = { .memory = NULL };
 #endif
 
     /* Note that we should not reorder the lines, even if top_down is false.
@@ -423,10 +503,21 @@  static int push_raw_frame(SpiceGstEncoder *encoder, const SpiceBitmap *bitmap,
         /* We have to do a line-by-line copy because for each we have to leave
          * out pixels on the left or right.
          */
+#ifndef HAVE_GSTREAMER_0_10
+        GstMemory *mem = gst_allocator_alloc(NULL, len, NULL);
+        if (!mem) {
+            gst_buffer_unref(buffer);
+            return VIDEO_ENCODER_FRAME_UNSUPPORTED;
+        }
+        spice_assert(gst_memory_map(mem, &map, GST_MAP_WRITE));
+        uint8_t *dst = map.data;
+#endif
+
         chunk_offset += src->left * encoder->format->bpp / 8;
         if (!line_copy(encoder, bitmap, chunk_offset, stream_stride, height, dst)) {
 #ifndef HAVE_GSTREAMER_0_10
-            gst_buffer_unmap(buffer, &map);
+            gst_memory_unmap(map.memory, &map);
+            gst_memory_unref(map.memory);
 #endif
             gst_buffer_unref(buffer);
             return VIDEO_ENCODER_FRAME_UNSUPPORTED;
@@ -436,11 +527,35 @@  static int push_raw_frame(SpiceGstEncoder *encoder, const SpiceBitmap *bitmap,
         SpiceChunks *chunks = bitmap->data;
         uint32_t chunk_index = 0;
         /* We can copy the bitmap chunk by chunk */
+#ifdef DO_ZERO_COPY
+        if (!zero_copy(encoder, bitmap, buffer, &chunk_index, &chunk_offset, &len)) {
+            gst_buffer_unref(buffer);
+            return VIDEO_ENCODER_FRAME_UNSUPPORTED;
+        }
+        /* Now we must avoid any write to the GstBuffer object as it would
+         * cause a copy of the read-only memory objects we just added.
+         * Fortunately we can append extra writable memory objects instead.
+         */
+#endif
+
+#ifndef HAVE_GSTREAMER_0_10
+        if (len) {
+            GstMemory *mem = gst_allocator_alloc(NULL, len, NULL);
+            if (!mem) {
+                gst_buffer_unref(buffer);
+                return VIDEO_ENCODER_FRAME_UNSUPPORTED;
+            }
+            spice_assert(gst_memory_map(mem, &map, GST_MAP_WRITE));
+        }
+        uint8_t *dst = map.data;
+#endif
+
         while (len && chunk_index < chunks->num_chunks) {
             /* Make sure that the chunk is not padded */
             if (chunks->chunk[chunk_index].len % bitmap->stride != 0) {
 #ifndef HAVE_GSTREAMER_0_10
-                gst_buffer_unmap(buffer, &map);
+                gst_memory_unmap(map.memory, &map);
+                gst_memory_unref(map.memory);
 #endif
                 gst_buffer_unref(buffer);
                 return VIDEO_ENCODER_FRAME_UNSUPPORTED;
@@ -464,7 +579,10 @@  static int push_raw_frame(SpiceGstEncoder *encoder, const SpiceBitmap *bitmap,
 #ifdef HAVE_GSTREAMER_0_10
     gst_buffer_set_caps(buffer, encoder->src_caps);
 #else
-    gst_buffer_unmap(buffer, &map);
+    if (map.memory) {
+        gst_memory_unmap(map.memory, &map);
+        gst_buffer_append_memory(buffer, map.memory);
+    }
 #endif
     GST_BUFFER_OFFSET(buffer) = encoder->frame++;
 
@@ -564,6 +682,10 @@  static int gst_encoder_encode_frame(VideoEncoder *video_encoder,
     int rc = push_raw_frame(encoder, bitmap, src, top_down);
     if (rc == VIDEO_ENCODER_FRAME_ENCODE_DONE) {
         rc = pull_compressed_buffer(encoder, video_buffer);
+#ifdef DO_ZERO_COPY
+        /* GStreamer should have released the source frame buffer by now */
+        spice_assert(!encoder->needs_bitmap);
+#endif
     }
     return rc;
 }