[Spice-devel,v2] streaming: Always delegate bit rate control to the video encoder

Submitted by Francois Gouget on Oct. 27, 2016, 6:38 p.m.

Details

Message ID alpine.DEB.2.20.1610271925070.6217@amboise
State New
Series "streaming: Always delegate bit rate control to the video encoder" ( rev: 3 ) in Spice

Commit Message

Francois Gouget Oct. 27, 2016, 6:38 p.m.
Here are some pointers on ways to test this patch.


1. Tweaking the available bandwidth at will
-------------------------------------------

The point of the bit rate control algorithm is to adapt the video 
stream's bitrate to the available network bandwidth. So you could test 
how it reacts in various conditions by moving closer to or farther from 
your wireless access point, switching to an Ethernet connection, or 
setting up port forwarding to send the stream around hosts on the 
internet. But to get reproducible conditions the best approach is to use tc.

Here is an example of the commands I use:
  To limit the Spice server->client traffic and not the client->server one:
  tc qdisc add dev lo handle 1:0 root htb direct_qlen 10
  tc class add dev lo parent 1:0 classid 1:11 htb rate 6000kbit #direct_qlen 10
  # Optionally increase latency. Note that a high latency will limit bandwidth.
  # With 150ms we can still achieve >> 20Mbps.
  tc qdisc add dev lo parent 1:11 netem delay 150ms 10ms
  tc filter add dev lo protocol ip prio 1 u32 match ip sport 5903 0xffff flowid 1:11
  # Prevent too much packet queueing, otherwise packets could arrive seconds
  # after being sent which is not realistic and broken (see Bufferbloat).
  ifconfig lo txqueuelen 1

The above assumes the server listens on port 5903 on localhost. It 
limits the server -> client bandwidth to 6 Mbps with a 150 ms RTT (150 
ms in one direction, essentially 0 ms in the other) and 10 ms of jitter.

Initially the client will totally mis-estimate the available bandwidth, 
sometimes coming up with > 20 Gbps. So it may take a dozen seconds for 
the server to drop the bitrate low enough to get a working stream.

Then you can drop the available bitrate to see how it reacts:

  tc class change dev lo parent 1:0 classid 1:11 htb rate 4000kbit

Finally you can check the configuration and whether packets are treated as expected:
  tc -s qdisc ls dev lo
  tc -s class show dev lo
  tc -s filter show dev lo

And clear everything once you are done with your tests:
  tc qdisc del dev lo root

For more details see:
  http://www.insightfullogic.com/blog/2013/jul/25/performance-tests-slow-networks-tc/
  http://luxik.cdi.cz/~devik/qos/htb/manual/userg.htm



2. Testing the current "server-drops-only" bit rate control
-----------------------------------------------------------

It's a good idea to play with the above commands to get an idea of how 
the baseline behaves before modifying the code. If you've done so, 
you've tested the bitrate control of either the MJPEG or the GStreamer 
video encoder.

What we want to test here, however, is the bitrate control implemented 
in stream.c and dcc-send.c. It is only used when two conditions are 
met:

 * dcc_use_video_encoder_rate_control() must return false, which means 
   the client cannot send stream report messages (to summarize, these 
   tell the server how much margin there is at the client between when a 
   frame is received and when it must be displayed; the bigger the 
   frame, the longer it spends in transit and the shorter that margin). 
   This is essentially for compatibility with old clients (like the 
   defunct spicec).

 * dcc_is_low_bandwidth() must return true, which means the initial
   bandwidth estimate was under 10 Mbps. This can be a problem for the 
   tests as, at least when using tc, the initial bandwidth estimate can 
   be wildly off (like >> 20Gbps when the bandwidth is capped at 6Mbps).


This gives us three cases:
 * use_video_encoder_rate_control == true
   -> Then it is the video encoder that adjusts the stream's bitrate.

 * use_video_encoder_rate_control == false && 
   is_low_bandwidth == true
   -> Then stream.c & dcc-send.c adjust the fps to tweak the stream's 
      bitrate. This is what we want.

 * use_video_encoder_rate_control == false &&
   is_low_bandwidth == false
   -> Then there is no bitrate control whatsoever. I'll talk more about 
      this case in the next section.
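
Schematically, the selection between these modes boils down to something 
like the following (a simplified sketch of the conditions above, not the 
actual stream.c / dcc-send.c code):

  #include <stdbool.h>

  /* Simplified sketch of the mode selection described above; the real
   * logic lives in stream.c and dcc-send.c and is more involved. */
  typedef enum {
      RATE_CONTROL_VIDEO_ENCODER, /* the encoder adjusts its own bitrate */
      RATE_CONTROL_GENERIC_FPS,   /* stream.c/dcc-send.c tweak the fps */
      RATE_CONTROL_NONE           /* no bitrate control at all */
  } RateControlMode;

  static RateControlMode pick_rate_control(bool use_video_encoder_rate_control,
                                           bool is_low_bandwidth)
  {
      if (use_video_encoder_rate_control)
          return RATE_CONTROL_VIDEO_ENCODER;
      if (is_low_bandwidth)
          return RATE_CONTROL_GENERIC_FPS;
      return RATE_CONTROL_NONE;
  }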

So to test how the generic bitrate control algorithm behaves and 
establish a baseline, I recommend that you:

 * Apply the attached patch. The first hunk will let you track the 
   dcc-send's frame drops, while the second will let you force 
   is_low_bandwidth to true simply by setting SPICE_CAP_BR=1.

 * And set SPICE_DISABLE_ADAPTIVE_STREAMING=1 when running spicy.

To make sure it worked, check that the server log has no lines 
containing "stream_report" and that it does have lines containing 
"generic frame drop".

It can be useful to tail your server log through:

  egrep "(handle_pong|reattach_stream|marshall_stream|set_bit_rate|: jpeg)"


Testing this you'll notice that reducing the available bandwidth results 
in freezes and recovery times that take a dozen seconds or more. This is 
because it's only after a sufficient backlog has been created (possibly 
a single large video frame in the queue) that the server drops a frame 
which makes the bitrate control aware that something is wrong. This 
issue is common to all algorithms when all they have to rely on is 
server frame drop notifications.

Here are the flaws I see in this generic bitrate control algorithm:
 * The bitrate is adjusted only once per second. So you get drops for 1 
   second before things even have a chance of getting better.

 * Even after being notified of a server frame drop the generic code 
   continues trying to push as much data as before for up to a second.
   But server frame drops indicate the server queue is full and 
   delaying action will delay a return to normal.

 * If you halve the network bandwidth it will take 15 seconds before it 
   halves the framerate from 30 fps down to 15 (one adjustment per 
   second; the toy simulation after this list makes this concrete). By 
   the end of that 15 second delay the network queue is sure to still be 
   full so it overshoots and drops the framerate down to 1 fps before 
   increasing it again.

 * It never stabilizes the bitrate. So as soon as there are no drops 
   for 1 second it increases the framerate until there are drops again, 
   then it decreases the framerate until there are no drops. Rinse, 
   repeat. The result is that you never get smooth video.
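
To make the overshoot concrete, here is a toy simulation of such a 
once-per-second, decrement-by-one controller. It is not the actual 
stream.c code, just a model of the behavior described above, assuming 
the link suddenly drops from 30 to 15 frames' worth of bandwidth per 
second:

  #include <stdio.h>

  /* Toy model: fps is adjusted once per second, by +/- 1, based only on
   * whether any frame was dropped during that second. */
  int main(void)
  {
      int fps = 30;      /* frames produced per second */
      int capacity = 15; /* frames the network can carry per second */
      int backlog = 0;   /* frames stuck in the network queue */

      for (int second = 1; second <= 60; second++) {
          backlog += fps;                                     /* queued */
          backlog -= backlog < capacity ? backlog : capacity; /* sent */
          int drops = backlog > 0;   /* leftover frames arrive late */

          if (drops)
              fps = fps > 1 ? fps - 1 : 1;
          else if (fps < 30)
              fps++;

          printf("t=%2ds fps=%2d backlog=%3d%s\n",
                 second, fps, backlog, drops ? " drops" : "");
      }
      return 0;
  }

Running it shows the framerate taking 15 seconds to get from 30 down to 
15 fps, another ~14 seconds to hit 1 fps while the accumulated backlog 
drains, and afterwards it oscillates around the link capacity instead 
of settling.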



3. Testing the no-bitrate-control-whatsoever case
-------------------------------------------------

To test this mode, don't set SPICE_CAP_BR=1 when starting the server, 
and start the client with SPICE_DISABLE_ADAPTIVE_STREAMING=1.

Then verify in your server log that the initial bitrate estimate is 
10 Mbps or greater (which is very likely; I have seen values as high as 
73 Gbps).

What happens then is that in before_reattach_stream() agent->drops is 
never updated in the first loop because is_low_bandwidth == false. Then 
the second loop never updates agent->fps since drops == 0.

So how can this work?

In the logs this mode results in regular 'generic frame drop' messages 
despite agent->fps being set to 30. This seems to imply frames get 
bunched up. But there are in fact way more dropped frames than these 
messages imply.

My theory is that with the network queue full the server stays blocked 
longer trying to send the frames, causing it to ignore most frames drawn 
by the application (or maybe even preventing the application from 
drawing them).

But having the server block on network traffic does not seem right.



4. Testing the video encoders with server-drops-only
----------------------------------------------------

To test this configuration, simply apply the patch on top of the 
previous one and test as you did for case 2.

In this mode the bitrate control is handled by the video encoder. Since 
the initial bitrate estimate is too high it will first spend some time 
dropping the bitrate before stabilizing on a suitably low value.

Unlike in the previous cases the image quality will be lowered to 
preserve a reasonable framerate.

Patch

diff --git a/server/dcc-send.c b/server/dcc-send.c
index e33f428..b22c1f4 100644
--- a/server/dcc-send.c
+++ b/server/dcc-send.c
@@ -1692,6 +1692,7 @@  static int red_marshall_stream_data(RedChannelClient *rcc,
 
     if (!dcc->priv->use_video_encoder_rate_control) {
         if (time_now - agent->last_send_time < (1000 * 1000 * 1000) / agent->fps) {
+            spice_debug("generic frame drop fps=%d elapsed=%uns < %uns", agent->fps, (unsigned)(time_now - agent->last_send_time), (1000 * 1000 * 1000) / agent->fps);
             agent->frames--;
 #ifdef STREAM_STATS
             agent->stats.num_drops_fps++;
diff --git a/server/main-channel-client.c b/server/main-channel-client.c
index b47b1e0..42f9bfb 100644
--- a/server/main-channel-client.c
+++ b/server/main-channel-client.c
@@ -509,6 +509,11 @@  void main_channel_client_handle_pong(MainChannelClient *mcc, SpiceMsgPing *ping,
                        mcc->priv->bitrate_per_sec,
                        (double)mcc->priv->bitrate_per_sec / 1024 / 1024,
                        main_channel_client_is_low_bandwidth(mcc) ? " LOW BANDWIDTH" : "");
+        /* FIXME Traffic shaping breaks bitrate detection, cap to 9Mbps */
+        if (getenv("SPICE_CAP_BR")) {
+            spice_printerr("-> capping to 9Mbps");
+            mcc->priv->bitrate_per_sec = MIN(9000000, mcc->priv->bitrate_per_sec);
+        }
         red_channel_client_start_connectivity_monitoring(RED_CHANNEL_CLIENT(mcc),
                                                          CLIENT_CONNECTIVITY_TIMEOUT);
         break;

Comments

On Thu, 27 Oct 2016, Francois Gouget wrote:
[...]
> Testing this you'll notice that reducing the available bandwidth results 
> in freezes and recovery times that take a dozen seconds or more. This is 
> because it's only after a sufficient backlog has been created (possibly 
> a single large video frame in the queue) that the server drops a frame 
> which makes the bitrate control aware that something is wrong. This 
> issue is common to all algorithms when all they have to rely on is 
> server frame drop notifications.

I'll add some more thoughts on this because I think the server could do 
better on this.

In some place that I have not precisely identified, the server has to 
write each frame to the network socket. How long this takes depends on 
the state of the network queue.

- If there is enough room in the queue for the frame the write 
  should complete instantly.
- If the queue is full or there is not enough space the write will only 
  complete once enough data has been sent and received by the client. 
  How long this takes will depend on how much space needs freeing and 
  on the network bandwidth.

Dividing the frame size by the time to write gives an upper bound on 
the available bandwidth. That value is probably not directly usable (one 
would have to subtract the bandwidth used by other traffic like audio) 
but its variations could prove informative.
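
Something like the following is what I have in mind (a minimal sketch; 
blocking_send_frame() is a hypothetical stand-in for wherever the server 
actually writes the frame, it is not an existing function):

  #include <stddef.h>
  #include <stdint.h>
  #include <time.h>

  /* Hypothetical stand-in for the server's blocking frame write. */
  void blocking_send_frame(const void *frame, size_t size);

  static uint64_t now_ns(void)
  {
      struct timespec ts;
      clock_gettime(CLOCK_MONOTONIC, &ts);
      return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
  }

  /* Time how long handing one frame to the socket takes and derive an
   * upper bound on the available bandwidth in bits per second. A (near)
   * zero write time means the kernel buffer had room, so the bound is
   * meaningless (effectively infinite). */
  static uint64_t frame_bandwidth_upper_bound(const void *frame, size_t size)
  {
      uint64_t start = now_ns();
      blocking_send_frame(frame, size);
      uint64_t elapsed = now_ns() - start;

      if (elapsed < 1000000)  /* < 1 ms: the write did not block */
          return UINT64_MAX;
      return (uint64_t)size * 8 * 1000000000ull / elapsed;
  }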

Thus notifying the video encoder of the time it took to push each frame 
to the network could provide useful and early information on the network 
state:

* If the time to write is ~0 then it means there is plenty of bandwidth 
  available so the stream bitrate can be increased. This type of 
  information is currently completely unavailable if the client does not 
  send stream reports.

* If the time to write shoots up from ~0 then it means the queue is now 
  full so the stream bitrate should not be increased further.

* If the time to write was already high and the calculated 
  bandwidth dropped, then it means the available network bandwidth 
  dropped. So decrease the stream bitrate.

* Since we have a bandwidth upper bound it should be higher than the 
  stream bitrate. If that's not the case it's another indicator that the 
  stream bitrate may be too high.

* What makes this interesting is that catching congestion conditions 
  early is key to avoid them escalating to frame drops: if you don't 
  then large frames will keep accumulating in the network queue until 
  you get a lag of at least 1 frame interval, or until you get a client 
  report back which you only get once every 166ms (5 frames at 30 fps, 
  plus it's also stale by RTT/2 ms). Here you'd get feedback as soon as 
  the frame is in the network queue, likely even before the client has 
  received it.

* Of course that source of data is going to be quite noisy and it's 
  likely dealing with that noise will reintroduce some lag. But at least 
  the lag is not built-in so it still has the potential of being more 
  reactive.
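
  One simple way to tame that noise (just an illustration, nothing that 
  exists in the current code) would be to feed the per-frame estimates 
  through an exponentially weighted moving average before handing them 
  to the rate controller:

  /* Exponentially weighted moving average of the per-frame bandwidth
   * estimates; an alpha close to 1 reacts faster but is noisier. */
  typedef struct {
      double value;  /* smoothed estimate, in bits per second */
      double alpha;  /* smoothing factor, 0 < alpha <= 1 */
      int initialized;
  } BandwidthEwma;

  static double bandwidth_ewma_update(BandwidthEwma *ewma, double sample_bps)
  {
      if (!ewma->initialized) {
          ewma->value = sample_bps;
          ewma->initialized = 1;
      } else {
          ewma->value += ewma->alpha * (sample_bps - ewma->value);
      }
      return ewma->value;
  }
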
> 
> On Thu, 27 Oct 2016, Francois Gouget wrote:
> [...]
> > Testing this you'll notice that reducing the available bandwidth results
> > in freezes and recovery times that take a dozen seconds or more. This is
> > because it's only after a sufficient backlog has been created (possibly
> > a single large video frame in the queue) that the server drops a frame
> > which makes the bitrate control aware that something is wrong. This
> > issue is common to all algorithms when all they have to rely on is
> > server frame drop notifications.
> 
> I'll add some more thoughts on this because I think the server could do
> better on this.
> 
> In some place that I have not precisely identified, the server has to
> write each frame to the network socket. How long this takes depends on
> the state of the network queue.
> 
> - If there is enough room in the queue for the frame the write
>   should complete instantly.
> - If the queue is full or there is not enough space the write will only
>   complete once enough data has been sent and received by the client.
>   How long this takes will depend on how much space needs freeing and
>   on the network bandwidth.
> 
> Dividing the frame size by the time to write gives an upper bound on
> the available bandwidth. That value is probably not directly usable (one
> would have to subtract the bandwidth used by other traffic like audio)
> but its variations could prove informative.
> 

I don't know how much information the upper bound can give. But
I think that if you continue to see the queue full, the upper bound
should approach the real one, which is much more useful.

> Thus notifying the video encoder of the time it took to push each frame
> to the network could provide useful and early information on the network
> state:
> 
> * If the time to write is ~0 then it means there is plenty of bandwidth
>   available so the stream bitrate can be increased. This type of
>   information is currently completely unavailable if the client does not
>   send stream reports.
> 

I think you are not considering the proxy case. In that case the proxy
provides extra space, potentially reducing the queue.
The queue (in this case the TCP stream one) depends on many aspects, like:
- system settings;
- average network usage (so not only this connection);
- network latency;
- proxy presence.
But if you can send X bytes to the client and the client ACKs them after
S seconds, you had a bandwidth of X/S, and this is a lower bound, not an
upper bound. Here S can be bigger due to a preexisting queue (due to
previous data on this connection or on others sharing the same network
path).
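
(As a concrete illustration with made-up numbers: if the client
acknowledges 750 KB over a 1 second window, the connection carried
750 * 8 = 6000 kbit during that second, so it had at least 6 Mbps of
bandwidth, even if the link could have carried more.)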

> * If the time to write shoots up from ~0 then it means the queue is now
>   full so the stream bitrate should not be increased further.
> 

Agreed, basically you are using more bandwidth than is available.

> * If the time to write was already high and the calculated
>   bandwidth dropped, then it means the available network bandwidth
>   dropped. So decrease the stream bitrate.
> 

Here the problem, I think, is the calculated bandwidth.
We should compute it using more global data so as to include
all possible streams (like sound) and connection usage
(even image and cursor data, for instance).

> * Since we have a bandwidth upper bound it should be higher than the
>   stream bitrate. If that's not the case it's another indicator that the
>   stream bitrate may be too high.
> 

Here you mean basically that you cannot support such a high bitrate
on the stream and you should decrease the bitrate, right?
Maybe there is some terminology confusion: here by bitrate you mean the
configured stream bitrate (the one set on GStreamer if GStreamer is
used) and not the network one (bandwidth).

> * What makes this interesting is that catching congestion conditions
>   early is key to avoid them escalating to frame drops: if you don't
>   then large frames will keep accumulating in the network queue until
>   you get a lag of at least 1 frame interval, or until you get a client
>   report back which you only get once every 166ms (5 frames at 30 fps,
>   plus it's also stale by RTT/2 ms). Here you'd get feedback as soon as
>   the frame is in the network queue, likely even before the client has
>   received it.
> 

Currently we don't use it, but besides the streaming reports there is
a PING/PONG protocol that we could use for better bandwidth computation.
It's used at the beginning to get the low/high bandwidth estimation
and later to do some weird bandwidth limitation (IMHO wrong).

> * Of course that source of data is going to be quite noisy and it's
>   likely dealing with that noise will reintroduce some lag. But at least
>   the lag is not built-in so it still has the potential of being more
>   reactive.
> 

When you test the streaming do you look at the network queues?
I find it very interesting; I usually keep a terminal window open
with a command like "watch 'netstat -anp | grep 5900'".

Frediano

On Wed, 16 Nov 2016, Frediano Ziglio wrote:
[...]
> I don't know how much information the upper bound can give.

When the available bandwidth drops suddenly (e.g. degraded wifi / 3G 
connection or multiple competing network streams starting) it can take 
quite a few iterations before the video stream's bitrate is slashed 
sufficiently to fit. A network bandwidth upper bound could let us 
immediately drop the stream bitrate to a lower value. Of course there's 
no point if the upper bandwidth estimate value takes too long to react 
to network changes, or if it is unreliable.


> But I think that if you continue to see the queue full, the upper bound 
> should approach the real one, which is much more useful.

Yes.


[...]
> > * If the time to write is ~0 then it means there is plenty of bandwidth
> >   available so the stream bitrate can be increased.
[...]
> But if you can send X bytes to the client and the client ACKs them after
> S seconds, you had a bandwidth of X/S, and this is a lower bound, not an
> upper bound.

Note that in my thought experiment the time S would be the time it 
takes to put the data in the kernel's network buffer, not the time 
it takes for the client to acknowledge receipt of that data.

The latter would indeed give a lower bound. But the former gives an 
upper bound because if there is already sufficient space in the kernel 
buffer then that time will essentially be 0 resulting in an infinite 
bandwidth.

It's only when the buffer is already full and we need to wait for the 
TCP stack to receive acks of old data that the calculated value may 
be too low. But in that case I expect it will still be close to the 
right value.
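
(To put made-up numbers on it: writing a 60 KB frame into a socket 
buffer that already has room completes in well under a millisecond, 
which works out to hundreds of Mbps or more and is only a very loose 
upper bound. If instead the write blocks for 80 ms because the buffer 
was full, 60 KB / 80 ms is about 6 Mbps, which should be close to the 
real bandwidth.)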


> > * If the time to write was already high and the calculated
> >   bandwidth dropped, then it means the available network bandwidth
> >   dropped. So decrease the stream bitrate.
> 
> Here the problem, I think, is the calculated bandwidth.
> We should compute it using more global data so as to include
> all possible streams (like sound) and connection usage
> (even image and cursor data, for instance).

It would be nice but my feeling is that image and cursor data network 
usage is pretty bursty and unpredictable. Audio bandwidth is the 
opposite: it should be pretty constant and predictable so knowing that 
would help.

Other video streams are a bit in the middle: if there is lots of 
bandwidth available then their bitrate will be limited by the quantizer 
cap meaning that it will depend a lot on the scene: low bandwidth on 
simple scenes and higher bandwidth on complex ones. If bandwidth is 
limited then all scenes will bump against the bitrate limit we impose on 
the stream meaning it should be more constant and thus known and 
predictable for other streams.


> > * Since we have a bandwidth upper bound it should be higher than the
> >   stream bitrate. If that's not the case it's another indicator that the
> >   stream bitrate may be too high.
> > 
> 
> Here you mean basically that you cannot support such a high bitrate
> on the stream and you should decrease the bitrate, right?

Yes.


> Maybe there is some terminology confusion: here by bitrate you mean the
> configured stream bitrate (the one set on GStreamer if GStreamer is
> used) and not the network one (bandwidth).

Yes.


[...]
> When you test the streaming do you look at the network queues?
> I find it very interesting; I usually keep a terminal window open
> with a command like "watch 'netstat -anp | grep 5900'".

I did not, but this could be interesting. In my tests I just tried to 
limit the interface queue length to avoid excessive queuing in the 
kernel (ifconfig lo txqueuelen 1). But I did not notice a clear impact.

It feels like getting data such as RTT times for TCP-level acks straight 
from the TCP stack could provide valuable information. I don't know if 
other streaming applications have tried that.