_xcb_conn_wait(): Do better at detecting closed sockets

Submitted by Uli Schlachter on Oct. 13, 2016, 6:20 p.m.

Details

Message ID ddad00f8-f577-3b58-49fd-11c3c59d411b@znc.in
State New
Series "_xcb_conn_wait(): Do better at detecting closed sockets"
Headers show

Commit Message

Uli Schlachter Oct. 13, 2016, 6:20 p.m.
On 13.10.2016 17:02, Adam Jackson wrote:
[...]
> @@ -484,6 +485,19 @@ int _xcb_conn_wait(xcb_connection_t *c, pthread_cond_t *cond, struct iovec **vec
>              ret = -1;
>              break;
>          }
> +
> +	/* hangup with no data left to read is an error */

Hangup with data left to read is not an error? Why?

> +	if (fd.revents & POLLHUP)
> +	{
> +	    int unread = -1;
> +	    ioctl(c->fd, FIONREAD, &unread);
> +	    if (unread <= 0)
> +	    {
> +		/* coerce errno to not be EAGAIN */
> +		errno = EPIPE;
> +		ret = -1;
> +	    }
> +	}
>  #else
>          ret = select(c->fd + 1, &rfds, &wfds, 0, 0);
>  #endif
> 

The above might work in 80% of cases, but won't work if some data is
left to read. How about the following idea which will work in 90% of cases?

     return len;
 }
@@ -988,6 +996,7 @@ int _xcb_in_read(xcb_connection_t *c)
      */
     if (msg.msg_flags & (MSG_TRUNC|MSG_CTRUNC)) {
         _xcb_conn_shutdown(c, XCB_CONN_CLOSED_FDPASSING_FAILED);
+        errno = ENO_IDEA_WHAT_TO_DO_HERE;
         return 0;
     }
 #else
@@ -1038,6 +1047,12 @@ int _xcb_in_read(xcb_connection_t *c)
     if((n > 0) || (n < 0 && WSAGetLastError() == WSAEWOULDBLOCK))
 #endif /* !_WIN32 */
         return 1;
+    if (n == 0)
+        /* Xlib assumes that errno is meaningful. That's not going to
+         * work (e.g. if the connection goes into an error state at some
+         * user code using xcb directly), but at least try.
+         */
+        errno = EPIPE;
     _xcb_conn_shutdown(c, XCB_CONN_ERROR);
     return 0;
 }

(Sorry for the line wrapping)
The comments tell you how much I like this idea.

How about fixing this properly in Xlib instead? If it wants to check if
the error was due to "the other end closed the connection", it can do
something like this:

int fd = xcb_get_file_descriptor(conn);
if (fd != -1) {
  call select() or poll() or some magic ioctl to check for hung up
} else {
  /* Either old libxcb where the above returns -1 on error connection or
the connection always has been broken (should not happen) */
}

This patch should always work while errno-magic depends on the call
chain when the hang up happens.

What do you think?
Uli

Patch hide | download patch | download mbox

--- xcb_in.c.orig	2016-10-13 20:07:52.464589712 +0200
+++ xcb_in.c	2016-10-13 20:13:31.177710447 +0200
@@ -400,7 +400,15 @@  static int read_block(const int fd, void
 #endif /* USE_POLL */
         }
         if(ret <= 0)
+        {
+            if (ret == 0)
+                /* Xlib assumes that errno is meaningful. That's not
going to
+                 * work (e.g. if the connection goes into an error
state at some
+                 * user code using xcb directly), but at least try.
+                 */
+                errno = EPIPE;
             return ret;
+        }
     }

Comments

Adam Jackson Oct. 13, 2016, 7:53 p.m.
On Thu, 2016-10-13 at 20:20 +0200, Uli Schlachter wrote:
> On 13.10.2016 17:02, Adam Jackson wrote:
> [...]
> > @@ -484,6 +485,19 @@ int _xcb_conn_wait(xcb_connection_t *c,
> > pthread_cond_t *cond, struct iovec **vec
> >              ret = -1;
> >              break;
> >          }
> > +
> > +	/* hangup with no data left to read is an error */
> 
> Hangup with data left to read is not an error? Why?

Surely from a correctness standpoint every reply or event that we've
already received should be processed. Consider a pair of applications
using ClientMessage events for IPC. If one side does:

    XSendEvent();
    XSync();
    XKillClient();

then it would be entirely reasonable to expect that the event has
actually been sent to the other end before it was disconnected. If we
treated POLLHUP-with-data-remaining as an error, we'd likely throw away
an event already enqueued.

It doesn't end up mattering on Linux, because POLLHUP isn't raised
until there's no data left in the read queue. But I'm told there are
other OSes in the world.

> > +	if (fd.revents & POLLHUP)
> > +	{
> > +	    int unread = -1;
> > +	    ioctl(c->fd, FIONREAD, &unread);
> > +	    if (unread <= 0)
> > +	    {
> > +		/* coerce errno to not be EAGAIN */
> > +		errno = EPIPE;
> > +		ret = -1;
> > +	    }
> > +	}
> >  #else
> >          ret = select(c->fd + 1, &rfds, &wfds, 0, 0);
> >  #endif
> > 
> 
> The above might work in 80% of cases, but won't work if some data is
> left to read.

No. If there's data left to read, the ioctl will return a non-negative
integer in 'unread'. Only if there is nothing left to read will we
tranform this into an error condition.

Although, I am missing a break in the inner if block.  Tsk tsk.

> How about fixing this properly in Xlib instead? If it wants to check
> if the error was due to "the other end closed the connection", it can
> do something like this:

Fair enough. Do note that the existing code already does consider this
case to be an error, due to the stanza above the bit I added:

        ret = poll(&fd, 1, -1);
        /* If poll() returns an event we didn't expect, such as POLLNVAL, treat
         * it as if it failed. */
        if(ret >= 0 && (fd.revents & ~fd.events))
        {
            ret = -1;
            break;
        }

Since we're not presently adding POLLHUP to fd.events...

- ajax
Uli Schlachter Oct. 17, 2016, 12:34 p.m.
On 13.10.2016 21:53, Adam Jackson wrote:
> On Thu, 2016-10-13 at 20:20 +0200, Uli Schlachter wrote:
>> On 13.10.2016 17:02, Adam Jackson wrote:
>> How about fixing this properly in Xlib instead? If it wants to check
>> if the error was due to "the other end closed the connection", it can
>> do something like this:
> 
> Fair enough. Do note that the existing code already does consider this
> case to be an error, due to the stanza above the bit I added:
> 
>         ret = poll(&fd, 1, -1);
>         /* If poll() returns an event we didn't expect, such as POLLNVAL, treat
>          * it as if it failed. */
>         if(ret >= 0 && (fd.revents & ~fd.events))
>         {
>             ret = -1;
>             break;
>         }
> 
> Since we're not presently adding POLLHUP to fd.events...

Oh, thanks for reminding me. "man poll" tells me that setting "POLLHUP"
in "events" has no effect:

       [The field "events"] may be specified as zero, in
       which case the only events that can be returned in
       revents are POLLHUP, POLLERR, and POLLNVAL (see below).

       The field revents is an output parameter, filled by the
       kernel with the events that actually occurred.  The bits
       returned in revents can include any of those specified
       in events, or one of the values POLLERR, POLLHUP, or POLLNVAL.
       (These three bits are meaningless in the events field, and
       will be set in the revents field whenever the corresponding
       condition is true.)

Uli