[Spice-devel] migration: Don't assert() if MIGRATE_DATA comes before attaching the agent

Submitted by Uri Lublin on July 22, 2014, 9:16 p.m.

Details

Message ID 1406063816-30030-2-git-send-email-uril@redhat.com
State New
Headers show

Not browsing as part of any series.

Commit Message

Uri Lublin July 22, 2014, 9:16 p.m.
During seamless migration, after switching host, if a client was connected
during the migration, it will have data to send back to the new
qemu/spice-server instance. This is handled through MIGRATE_DATA messages.
SPICE char devices use such MIGRATE_DATA messages to restore their state.

However, the MIGRATE_DATA message can arrive any time after the new qemu
instance has started, this can happen before or after the SPICE char
devices have been created. In order to handle this, if the migrate data
arrives early, it's stored in reds->agent_state.mig_data, and
attach_to_red_agent() will restore the agent state as appropriate.

Unfortunately this does not work as expected, for main
channel (agent messages).
If attach_to_red_agent() is called before the MIGRATE_DATA
message reaches the server, all goes well,
but if MIGRATE_DATA reaches the server before
attach_to_red_agent() gets called, then some assert() gets
triggered in spice_char_device_state_restore():

((null):32507): Spice-ERROR **: char_device.c:937:spice_char_device_state_restore: assertion `dev->num_clients == 1 && dev->wait_for_migrate_data' failed
Thread 3 (Thread 0x7f406b543700 (LWP 32543)):
Thread 2 (Thread 0x7f40697ff700 (LWP 32586)):
Thread 1 (Thread 0x7f4079b45a40 (LWP 32507)):

When restoring state, a client must already be added to the
spice-char-device.
What happens is that a client is not being added to the char-device
when when MIGRATE_DATA arrives first, which leaves both
dev->num_clients and dev->wait_for_migrate_data value at 0.

This commit changes the logic in spice_server_char_device_add_interface(),
such that if there is migrate data pending in reds->agent_state.mig_data
but no client was added to the spice-char-device yet,
then first the client is added to the device by calling
spice_char_device_client_add(), and only then the state is restored.

This fixes https://bugzilla.redhat.com/show_bug.cgi?id=1035184

Based-on-a-patch-by: Christophe Fergeau <cfergeau@redhat.com>
---
 server/reds.c | 39 ++++++++++++++++++++++++++++-----------
 1 file changed, 28 insertions(+), 11 deletions(-)

Patch hide | download patch | download mbox

diff --git a/server/reds.c b/server/reds.c
index 2c437ac..ed142ec 100644
--- a/server/reds.c
+++ b/server/reds.c
@@ -1274,6 +1274,7 @@  int reds_handle_migrate_data(MainChannelClient *mcc, SpiceMigrateDataMain *mig_d
 {
     VDIPortState *agent_state = &reds->agent_state;

+    spice_debug("main-channel: got migrate data");
     /*
      * Now that the client has switched to the target server, if main_channel
      * controls the mm-time, we update the client's mm-time.
@@ -1295,15 +1296,18 @@  int reds_handle_migrate_data(MainChannelClient *mcc, SpiceMigrateDataMain *mig_d
                     main_channel_push_agent_disconnected(reds->main_channel);
                     main_channel_push_agent_connected(reds->main_channel);
                 } else {
+                    spice_debug("restoring state from mig_data");
                     return reds_agent_state_restore(mig_data);
                 }
             }
         } else {
             /* restore agent starte when the agent gets attached */
+            spice_debug("saving mig_data");
             spice_assert(agent_state->plug_generation == 0);
             agent_state->mig_data = spice_memdup(mig_data, size);
         }
     } else {
+        spice_debug("agent was not attached on the source host");
         if (vdagent) {
             /* spice_char_device_client_remove disables waiting for migration data */
             spice_char_device_client_remove(agent_state->base,
@@ -2857,17 +2861,15 @@  static SpiceCharDeviceState *attach_to_red_agent(SpiceCharDeviceInstance *sin)
     state->read_filter.discard_all = FALSE;
     reds->agent_state.plug_generation++;

-    if (reds->agent_state.mig_data) {
-        spice_assert(reds->agent_state.plug_generation == 1);
-        reds_agent_state_restore(reds->agent_state.mig_data);
-        free(reds->agent_state.mig_data);
-        reds->agent_state.mig_data = NULL;
-    } else if (!red_channel_waits_for_migrate_data(&reds->main_channel->base)) {
-        /* we will assoicate the client with the char device, upon reds_on_main_agent_start,
-         * in response to MSGC_AGENT_START */
-        main_channel_push_agent_connected(reds->main_channel);
-    } else {
-       spice_debug("waiting for migration data");
+    if (reds->agent_state.mig_data ||
+        red_channel_waits_for_migrate_data(&reds->main_channel->base)) {
+        /* Migration in progress (code is running on the destination host):
+         * 1.  Add the client to spice char device, if it was not already added.
+         * 2.a If this (qemu-kvm state load side of migration) happens first
+         *     then wait for spice migration data to arrive. Otherwise
+         * 2.b If this happens second ==> we already have spice migrate data
+         *     then restore state
+         */
         if (!spice_char_device_client_exists(reds->agent_state.base, reds_get_client())) {
             int client_added;

@@ -2883,9 +2885,24 @@  static SpiceCharDeviceState *attach_to_red_agent(SpiceCharDeviceInstance *sin)
                 spice_warning("failed to add client to agent");
                 reds_disconnect();
             }
+        }

+        if (reds->agent_state.mig_data) {
+            spice_debug("restoring state from stored migration data");
+            spice_assert(reds->agent_state.plug_generation == 1);
+            reds_agent_state_restore(reds->agent_state.mig_data);
+            free(reds->agent_state.mig_data);
+            reds->agent_state.mig_data = NULL;
+        }
+        else {
+            spice_debug("waiting for migration data");
         }
+    } else {
+        /* we will associate the client with the char device, upon reds_on_main_agent_start,
+         * in response to MSGC_AGENT_START */
+        main_channel_push_agent_connected(reds->main_channel);
     }
+
     return state->base;
 }


Comments

This is very close to the last iteration I sent, except that you moved
the dead code path my patch had to a more appropriate place ;) ACK,
though would be nice to have a reproducer in the commit log.

Christophe

On Wed, Jul 23, 2014 at 12:16:56AM +0300, Uri Lublin wrote:
> During seamless migration, after switching host, if a client was connected
> during the migration, it will have data to send back to the new
> qemu/spice-server instance. This is handled through MIGRATE_DATA messages.
> SPICE char devices use such MIGRATE_DATA messages to restore their state.
> 
> However, the MIGRATE_DATA message can arrive any time after the new qemu
> instance has started, this can happen before or after the SPICE char
> devices have been created. In order to handle this, if the migrate data
> arrives early, it's stored in reds->agent_state.mig_data, and
> attach_to_red_agent() will restore the agent state as appropriate.
> 
> Unfortunately this does not work as expected, for main
> channel (agent messages).
> If attach_to_red_agent() is called before the MIGRATE_DATA
> message reaches the server, all goes well,
> but if MIGRATE_DATA reaches the server before
> attach_to_red_agent() gets called, then some assert() gets
> triggered in spice_char_device_state_restore():
> 
> ((null):32507): Spice-ERROR **: char_device.c:937:spice_char_device_state_restore: assertion `dev->num_clients == 1 && dev->wait_for_migrate_data' failed
> Thread 3 (Thread 0x7f406b543700 (LWP 32543)):
> Thread 2 (Thread 0x7f40697ff700 (LWP 32586)):
> Thread 1 (Thread 0x7f4079b45a40 (LWP 32507)):
> 
> When restoring state, a client must already be added to the
> spice-char-device.
> What happens is that a client is not being added to the char-device
> when when MIGRATE_DATA arrives first, which leaves both
> dev->num_clients and dev->wait_for_migrate_data value at 0.
> 
> This commit changes the logic in spice_server_char_device_add_interface(),
> such that if there is migrate data pending in reds->agent_state.mig_data
> but no client was added to the spice-char-device yet,
> then first the client is added to the device by calling
> spice_char_device_client_add(), and only then the state is restored.
> 
> This fixes https://bugzilla.redhat.com/show_bug.cgi?id=1035184
> 
> Based-on-a-patch-by: Christophe Fergeau <cfergeau@redhat.com>
> ---
>  server/reds.c | 39 ++++++++++++++++++++++++++++-----------
>  1 file changed, 28 insertions(+), 11 deletions(-)
> 
> diff --git a/server/reds.c b/server/reds.c
> index 2c437ac..ed142ec 100644
> --- a/server/reds.c
> +++ b/server/reds.c
> @@ -1274,6 +1274,7 @@ int reds_handle_migrate_data(MainChannelClient *mcc, SpiceMigrateDataMain *mig_d
>  {
>      VDIPortState *agent_state = &reds->agent_state;
> 
> +    spice_debug("main-channel: got migrate data");
>      /*
>       * Now that the client has switched to the target server, if main_channel
>       * controls the mm-time, we update the client's mm-time.
> @@ -1295,15 +1296,18 @@ int reds_handle_migrate_data(MainChannelClient *mcc, SpiceMigrateDataMain *mig_d
>                      main_channel_push_agent_disconnected(reds->main_channel);
>                      main_channel_push_agent_connected(reds->main_channel);
>                  } else {
> +                    spice_debug("restoring state from mig_data");
>                      return reds_agent_state_restore(mig_data);
>                  }
>              }
>          } else {
>              /* restore agent starte when the agent gets attached */
> +            spice_debug("saving mig_data");
>              spice_assert(agent_state->plug_generation == 0);
>              agent_state->mig_data = spice_memdup(mig_data, size);
>          }
>      } else {
> +        spice_debug("agent was not attached on the source host");
>          if (vdagent) {
>              /* spice_char_device_client_remove disables waiting for migration data */
>              spice_char_device_client_remove(agent_state->base,
> @@ -2857,17 +2861,15 @@ static SpiceCharDeviceState *attach_to_red_agent(SpiceCharDeviceInstance *sin)
>      state->read_filter.discard_all = FALSE;
>      reds->agent_state.plug_generation++;
> 
> -    if (reds->agent_state.mig_data) {
> -        spice_assert(reds->agent_state.plug_generation == 1);
> -        reds_agent_state_restore(reds->agent_state.mig_data);
> -        free(reds->agent_state.mig_data);
> -        reds->agent_state.mig_data = NULL;
> -    } else if (!red_channel_waits_for_migrate_data(&reds->main_channel->base)) {
> -        /* we will assoicate the client with the char device, upon reds_on_main_agent_start,
> -         * in response to MSGC_AGENT_START */
> -        main_channel_push_agent_connected(reds->main_channel);
> -    } else {
> -       spice_debug("waiting for migration data");
> +    if (reds->agent_state.mig_data ||
> +        red_channel_waits_for_migrate_data(&reds->main_channel->base)) {
> +        /* Migration in progress (code is running on the destination host):
> +         * 1.  Add the client to spice char device, if it was not already added.
> +         * 2.a If this (qemu-kvm state load side of migration) happens first
> +         *     then wait for spice migration data to arrive. Otherwise
> +         * 2.b If this happens second ==> we already have spice migrate data
> +         *     then restore state
> +         */
>          if (!spice_char_device_client_exists(reds->agent_state.base, reds_get_client())) {
>              int client_added;
> 
> @@ -2883,9 +2885,24 @@ static SpiceCharDeviceState *attach_to_red_agent(SpiceCharDeviceInstance *sin)
>                  spice_warning("failed to add client to agent");
>                  reds_disconnect();
>              }
> +        }
> 
> +        if (reds->agent_state.mig_data) {
> +            spice_debug("restoring state from stored migration data");
> +            spice_assert(reds->agent_state.plug_generation == 1);
> +            reds_agent_state_restore(reds->agent_state.mig_data);
> +            free(reds->agent_state.mig_data);
> +            reds->agent_state.mig_data = NULL;
> +        }
> +        else {
> +            spice_debug("waiting for migration data");
>          }
> +    } else {
> +        /* we will associate the client with the char device, upon reds_on_main_agent_start,
> +         * in response to MSGC_AGENT_START */
> +        main_channel_push_agent_connected(reds->main_channel);
>      }
> +
>      return state->base;
>  }
> 
> -- 
> 1.9.3
> 
> _______________________________________________
> Spice-devel mailing list
> Spice-devel@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/spice-devel