[1/3] client: Send errors should be fatal but not abort

Submitted by Lloyd Pique on Sept. 8, 2018, 1:14 a.m.

Details

Message ID 20180908011405.102768-1-lpique@google.com
State New
Headers show
Series "Series without cover letter" ( rev: 1 ) in Wayland

Not browsing as part of any series.

Commit Message

Lloyd Pique Sept. 8, 2018, 1:14 a.m.
If the core-client code encounters an error when doing an internal flush
because the send buffers are full, the previous behavior is to
wl_abort(). This is not very friendly if the client is a nested server
running as a service.

This patch instead sets a fatal display error, which is one step
better, and causes the send logic to do nothing once a fatal display
error is set. The client should eventually fall back to its main loop,
and be able to detect the fatal error there.

The existing display test is extended to exercise the error state.

Signed-off-by: Lloyd Pique <lpique@google.com>
---
 COPYING              |  1 +
 src/wayland-client.c |  9 +++++-
 tests/display-test.c | 77 ++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 86 insertions(+), 1 deletion(-)

Patch hide | download patch | download mbox

diff --git a/COPYING b/COPYING
index eb25a4e..bace5d7 100644
--- a/COPYING
+++ b/COPYING
@@ -2,6 +2,7 @@  Copyright © 2008-2012 Kristian Høgsberg
 Copyright © 2010-2012 Intel Corporation
 Copyright © 2011 Benjamin Franzke
 Copyright © 2012 Collabora, Ltd.
+Copyright © 2018 Google LLC
 
 Permission is hereby granted, free of charge, to any person obtaining a
 copy of this software and associated documentation files (the "Software"),
diff --git a/src/wayland-client.c b/src/wayland-client.c
index 0ccfc66..29ae1e0 100644
--- a/src/wayland-client.c
+++ b/src/wayland-client.c
@@ -736,6 +736,9 @@  wl_proxy_marshal_array_constructor_versioned(struct wl_proxy *proxy,
 			goto err_unlock;
 	}
 
+	if (proxy->display->last_error)
+		goto err_unlock;
+
 	closure = wl_closure_marshal(&proxy->object, opcode, args, message);
 	if (closure == NULL)
 		wl_abort("Error marshalling request: %s\n", strerror(errno));
@@ -744,7 +747,11 @@  wl_proxy_marshal_array_constructor_versioned(struct wl_proxy *proxy,
 		wl_closure_print(closure, &proxy->object, true);
 
 	if (wl_closure_send(closure, proxy->display->connection))
-		wl_abort("Error sending request: %s\n", strerror(errno));
+		// Convert EAGAIN into EPIPE, since EAGAIN is not really
+		// supposed to be fatal, especially when returned by
+		// wl_display_flush().
+		display_fatal_error(proxy->display,
+			errno != EAGAIN ? errno : EPIPE);
 
 	wl_closure_destroy(closure);
 
diff --git a/tests/display-test.c b/tests/display-test.c
index 9b49a0e..23f0635 100644
--- a/tests/display-test.c
+++ b/tests/display-test.c
@@ -1315,3 +1315,80 @@  TEST(zombie_fd_errant_consumption)
 
 	display_destroy(d);
 }
+
+static void
+terminate_on_bind_attempt(struct wl_client *client, void *data,
+			  uint32_t version, uint32_t id)
+{
+	wl_display_terminate(((struct display *)data)->wl_display);
+}
+
+static void
+fill_client_send_buffers(struct wl_display *display,
+	void (*send_check)(struct wl_display *, int *), int *send_check_result)
+{
+	int sync_count = 4096;
+	int socket_buf = 1024;
+
+	/* Use the minimum send buffer size. We want the client to be
+	 * able to fill the buffer easily. */
+	assert(setsockopt(wl_display_get_fd(display), SOL_SOCKET, SO_SNDBUF,
+		         &socket_buf, sizeof(socket_buf)) == 0);
+
+	/* Fill up the client and server buffers With the default error
+	 * handling, this will abort the client. */
+	while(sync_count--) {
+		wl_callback_destroy(wl_display_sync(display));
+		if (send_check != NULL) {
+			send_check(display, send_check_result);
+		}
+	}
+
+	wl_display_roundtrip(display);
+}
+
+static void
+send_error_client(void)
+{
+	struct client *c = client_connect();
+
+	/* Try and bind a wl_set, which isn't used but has a side effect. */
+	wl_proxy_destroy((struct wl_proxy *)client_get_seat(c));
+
+	/* Verify we do not have any errors at this point */
+	assert(wl_display_get_error(c->wl_display) == 0);
+
+	fill_client_send_buffers(c->wl_display, NULL, NULL);
+
+	/* Verify that we see a fatal display error from the send. */
+	assert(wl_display_get_error(c->wl_display) == EPIPE);
+
+	client_disconnect_nocheck(c);
+}
+
+TEST(client_send_error_tst)
+{
+	struct display *d = display_create();
+
+	test_set_timeout(2);
+
+	client_create_noarg(d, send_error_client);
+
+	/* The client must connect before simulating an unresponsive state. Use
+	 * the request for a set to terminate the event loop. */
+	wl_global_create(d->wl_display, &wl_seat_interface,
+			 1, d, terminate_on_bind_attempt);
+
+	/* This handles the initial connect, then returns. */
+	display_run(d);
+
+	/* Not processing any events for a moment should allow the buffers
+	 * to be filled up by the client, and it to enter an error state then
+	 * disconnect. */
+	test_sleep(1);
+
+	/* We need one last dispatch to handle the terminated client. */
+	wl_event_loop_dispatch(wl_display_get_event_loop(d->wl_display), 0);
+
+	display_destroy(d);
+}

Comments

On Fri,  7 Sep 2018 18:14:03 -0700
Lloyd Pique <lpique@google.com> wrote:

> If the core-client code encounters an error when doing an internal flush
> because the send buffers are full, the previous behavior is to
> wl_abort(). This is not very friendly if the client is a nested server
> running as a service.
> 
> This patch instead sets a fatal display error, which is one step
> better, and causes the send logic to do nothing once a fatal display
> error is set. The client should eventually fall back to its main loop,
> and be able to detect the fatal error there.
> 
> The existing display test is extended to exercise the error state.
> 
> Signed-off-by: Lloyd Pique <lpique@google.com>
> ---
>  COPYING              |  1 +
>  src/wayland-client.c |  9 +++++-
>  tests/display-test.c | 77 ++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 86 insertions(+), 1 deletion(-)
> 
> diff --git a/COPYING b/COPYING
> index eb25a4e..bace5d7 100644
> --- a/COPYING
> +++ b/COPYING
> @@ -2,6 +2,7 @@ Copyright © 2008-2012 Kristian Høgsberg
>  Copyright © 2010-2012 Intel Corporation
>  Copyright © 2011 Benjamin Franzke
>  Copyright © 2012 Collabora, Ltd.
> +Copyright © 2018 Google LLC

Hi Lloyd,

would you like to add this in wayland-client.c as well?

The overall logic looks good, but there are details to discuss below.

>  
>  Permission is hereby granted, free of charge, to any person obtaining a
>  copy of this software and associated documentation files (the "Software"),
> diff --git a/src/wayland-client.c b/src/wayland-client.c
> index 0ccfc66..29ae1e0 100644
> --- a/src/wayland-client.c
> +++ b/src/wayland-client.c
> @@ -736,6 +736,9 @@ wl_proxy_marshal_array_constructor_versioned(struct wl_proxy *proxy,
>  			goto err_unlock;
>  	}
>  
> +	if (proxy->display->last_error)
> +		goto err_unlock;
> +
>  	closure = wl_closure_marshal(&proxy->object, opcode, args, message);
>  	if (closure == NULL)
>  		wl_abort("Error marshalling request: %s\n", strerror(errno));
> @@ -744,7 +747,11 @@ wl_proxy_marshal_array_constructor_versioned(struct wl_proxy *proxy,
>  		wl_closure_print(closure, &proxy->object, true);
>  
>  	if (wl_closure_send(closure, proxy->display->connection))
> -		wl_abort("Error sending request: %s\n", strerror(errno));
> +		// Convert EAGAIN into EPIPE, since EAGAIN is not really
> +		// supposed to be fatal, especially when returned by
> +		// wl_display_flush().

The style is to use /* */ comments.

> +		display_fatal_error(proxy->display,
> +			errno != EAGAIN ? errno : EPIPE);
>  
>  	wl_closure_destroy(closure);

The logic in wayland-client.c looks good. I'm just wondering if EPIPE is
a good error code to return. EPIPE can result otherwise as well,
particularly when the server disconnects.

Could we use something else like ENOMEM or ENOSPC that would be more
unique? I see ENOMEM already used, so maybe ENOSPC?

I checked, and libwayland-client does not seem to be checking
last_error against EPIPE or really any other specific value, so I think
the concern is about what the users of the library will do. FWIW,
Weston has no occurrences of EPIPE.

>  
> diff --git a/tests/display-test.c b/tests/display-test.c
> index 9b49a0e..23f0635 100644
> --- a/tests/display-test.c
> +++ b/tests/display-test.c
> @@ -1315,3 +1315,80 @@ TEST(zombie_fd_errant_consumption)
>  
>  	display_destroy(d);
>  }
> +
> +static void
> +terminate_on_bind_attempt(struct wl_client *client, void *data,
> +			  uint32_t version, uint32_t id)
> +{
> +	wl_display_terminate(((struct display *)data)->wl_display);
> +}
> +
> +static void
> +fill_client_send_buffers(struct wl_display *display,
> +	void (*send_check)(struct wl_display *, int *), int *send_check_result)

Align this line with "struct" above and split the last argument to a
new line.

> +{
> +	int sync_count = 4096;
> +	int socket_buf = 1024;
> +
> +	/* Use the minimum send buffer size. We want the client to be
> +	 * able to fill the buffer easily. */
> +	assert(setsockopt(wl_display_get_fd(display), SOL_SOCKET, SO_SNDBUF,
> +		         &socket_buf, sizeof(socket_buf)) == 0);

One more space of alignment to right.

> +
> +	/* Fill up the client and server buffers With the default error
> +	 * handling, this will abort the client. */
> +	while(sync_count--) {

Add space between "while" and "(".

> +		wl_callback_destroy(wl_display_sync(display));

This should consume 12 bytes per call IIRC, so it should easily fill up
the 4096+1024 bytes of wl_buffer and socket send buffer. Ok.

> +		if (send_check != NULL) {
> +			send_check(display, send_check_result);
> +		}
> +	}
> +
> +	wl_display_roundtrip(display);
> +}
> +
> +static void
> +send_error_client(void)
> +{
> +	struct client *c = client_connect();
> +
> +	/* Try and bind a wl_set, which isn't used but has a side effect. */

typo: "wl_seat"

What side-effect are you relying on?

> +	wl_proxy_destroy((struct wl_proxy *)client_get_seat(c));

Why not use wl_seat_destroy()?

> +
> +	/* Verify we do not have any errors at this point */
> +	assert(wl_display_get_error(c->wl_display) == 0);
> +
> +	fill_client_send_buffers(c->wl_display, NULL, NULL);
> +
> +	/* Verify that we see a fatal display error from the send. */
> +	assert(wl_display_get_error(c->wl_display) == EPIPE);
> +
> +	client_disconnect_nocheck(c);

Looks good.

> +}
> +
> +TEST(client_send_error_tst)

How about "client_send_flood_error" or such?

> +{
> +	struct display *d = display_create();
> +
> +	test_set_timeout(2);
> +
> +	client_create_noarg(d, send_error_client);
> +
> +	/* The client must connect before simulating an unresponsive state. Use
> +	 * the request for a set to terminate the event loop. */

Is it a typo "seat"?

> +	wl_global_create(d->wl_display, &wl_seat_interface,
> +			 1, d, terminate_on_bind_attempt);
> +
> +	/* This handles the initial connect, then returns. */
> +	display_run(d);
> +
> +	/* Not processing any events for a moment should allow the buffers
> +	 * to be filled up by the client, and it to enter an error state then
> +	 * disconnect. */
> +	test_sleep(1);
> +
> +	/* We need one last dispatch to handle the terminated client. */
> +	wl_event_loop_dispatch(wl_display_get_event_loop(d->wl_display), 0);
> +
> +	display_destroy(d);
> +}

Yeah, this logic looks correct. I tried to think how to use the "stop
display" mechanism, but that does not seem to fit because you need the
server to stall while the client continues.

I don't like sleeps much, I'd prefer to wait until the client process
exits, but I don't see a good way to implement that. Waiting on the pid
would probably make handle_client_destroy() fail unexpectedly.


Thanks,
pq