[libX11,1/2] xcb_io: Fix Xlib 32-bit request number wrapping

Submitted by Jonas Petersen on Nov. 16, 2013, 9:37 p.m.

Details

Message ID 1384637846-5343-2-git-send-email-jnsptrsn1@gmail.com
State Superseded
Headers show

Not browsing as part of any series.

Commit Message

Jonas Petersen Nov. 16, 2013, 9:37 p.m.
By design the Xlib 32-bit internal request sequence numbers may wrap. There
is two locations within xcb_io.c that are not wrap-safe. The value of
last_flushed relies on the request to be sequential all the time. This is
not given when the sequence has just wrapped. Applications may then crash
with a "Fatal IO error 11 (Resource temporarily unavailable)".

This patch fixes this by unwrapping the sequence number when needed to retain
the sequence relative to last_flushed.

Signed-off-by: Jonas Petersen <jnsptrsn1@gmail.com>
---
 src/xcb_io.c |   10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

Patch hide | download patch | download mbox

diff --git a/src/xcb_io.c b/src/xcb_io.c
index 727c6c7..f2978d0 100644
--- a/src/xcb_io.c
+++ b/src/xcb_io.c
@@ -455,7 +455,7 @@  void _XSend(Display *dpy, const char *data, long size)
 	static const xReq dummy_request;
 	static char const pad[3];
 	struct iovec vec[3];
-	uint64_t requests;
+	uint64_t requests, unwrapped_request;
 	_XExtension *ext;
 	xcb_connection_t *c = dpy->xcb->connection;
 	if(dpy->flags & XlibDisplayIOError)
@@ -464,6 +464,10 @@  void _XSend(Display *dpy, const char *data, long size)
 	if(dpy->bufptr == dpy->buffer && !size)
 		return;
 
+	/* Set bit 8 of 'request' when a 32-bit wrap has just happened
+	 * so the sequence stays correct relatively to 'last_flushed'. */
+	unwrapped_request = ((uint64_t)(dpy->request < dpy->xcb->last_flushed) << 32) + dpy->request;
+
 	/* iff we asked XCB to set aside errors, we must pick those up
 	 * eventually. iff there are async handlers, we may have just
 	 * issued requests that will generate replies. in either case,
@@ -471,10 +475,10 @@  void _XSend(Display *dpy, const char *data, long size)
 	if(dpy->xcb->event_owner != XlibOwnsEventQueue || dpy->async_handlers)
 	{
 		uint64_t sequence;
-		for(sequence = dpy->xcb->last_flushed + 1; sequence <= dpy->request; ++sequence)
+		for(sequence = dpy->xcb->last_flushed + 1; sequence <= unwrapped_request; ++sequence)
 			append_pending_request(dpy, sequence);
 	}
-	requests = dpy->request - dpy->xcb->last_flushed;
+	requests = unwrapped_request - dpy->xcb->last_flushed;
 	dpy->xcb->last_flushed = dpy->request;
 
 	vec[0].iov_base = dpy->buffer;

Comments

On 16.11.2013 22:37, Jonas Petersen wrote:
> By design the Xlib 32-bit internal request sequence numbers may wrap. There
> is two locations within xcb_io.c that are not wrap-safe. The value of
> last_flushed relies on the request to be sequential all the time. This is
> not given when the sequence has just wrapped. Applications may then crash
> with a "Fatal IO error 11 (Resource temporarily unavailable)".
> 
> This patch fixes this by unwrapping the sequence number when needed to retain
> the sequence relative to last_flushed.
> 
> Signed-off-by: Jonas Petersen <jnsptrsn1@gmail.com>
> ---
>  src/xcb_io.c |   10 +++++++---
>  1 file changed, 7 insertions(+), 3 deletions(-)
> 
> diff --git a/src/xcb_io.c b/src/xcb_io.c
> index 727c6c7..f2978d0 100644
> --- a/src/xcb_io.c
> +++ b/src/xcb_io.c
> @@ -455,7 +455,7 @@ void _XSend(Display *dpy, const char *data, long size)
>  	static const xReq dummy_request;
>  	static char const pad[3];
>  	struct iovec vec[3];
> -	uint64_t requests;
> +	uint64_t requests, unwrapped_request;
>  	_XExtension *ext;
>  	xcb_connection_t *c = dpy->xcb->connection;
>  	if(dpy->flags & XlibDisplayIOError)
> @@ -464,6 +464,10 @@ void _XSend(Display *dpy, const char *data, long size)
>  	if(dpy->bufptr == dpy->buffer && !size)
>  		return;
>  
> +	/* Set bit 8 of 'request' when a 32-bit wrap has just happened
> +	 * so the sequence stays correct relatively to 'last_flushed'. */

"1 << 32" does not set bit 8 and this doesn't set anything in request, really.

> +	unwrapped_request = ((uint64_t)(dpy->request < dpy->xcb->last_flushed) << 32) + dpy->request;

Also, I don't think that this code is intuitive this way.

I would propose this instead:

  unwrapped_request = dpy->request;
  /* If there was a sequence number wrap since our last flush, make sure we
   * use the correct 64 bit sequence number */
  if (sizeof(uint64_t) > sizeof(unsigned long)
         && dpy->request < dpy->xcb_last_flushed)
     unwrapped_request += UINT64_C(1) << 32;

(I am not sure/convinced if the sizeof comparision is necessary, but I saw
something like this in require_socket() and then thought that this might be
necessary on systems where unsigned long already is a 64 bit type.)

>  	/* iff we asked XCB to set aside errors, we must pick those up
>  	 * eventually. iff there are async handlers, we may have just
>  	 * issued requests that will generate replies. in either case,
> @@ -471,10 +475,10 @@ void _XSend(Display *dpy, const char *data, long size)
>  	if(dpy->xcb->event_owner != XlibOwnsEventQueue || dpy->async_handlers)
>  	{
>  		uint64_t sequence;
> -		for(sequence = dpy->xcb->last_flushed + 1; sequence <= dpy->request; ++sequence)
> +		for(sequence = dpy->xcb->last_flushed + 1; sequence <= unwrapped_request; ++sequence)
>  			append_pending_request(dpy, sequence);
>  	}
> -	requests = dpy->request - dpy->xcb->last_flushed;
> +	requests = unwrapped_request - dpy->xcb->last_flushed;
>  	dpy->xcb->last_flushed = dpy->request;
>  
>  	vec[0].iov_base = dpy->buffer;
>
Am 17.11.2013 11:58, schrieb Uli Schlachter:
> On 16.11.2013 22:37, Jonas Petersen wrote:
>> +	/* Set bit 8 of 'request' when a 32-bit wrap has just happened
>> +	 * so the sequence stays correct relatively to 'last_flushed'. */
> "1 << 32" does not set bit 8 and this doesn't set anything in request, really.

You're right, sorry, "bit 8" is a typo, it's bit 32 of course. Other 
than that it will indeed set something. Not exactly "in request", but it 
will end up in unwrapped_request.

>
>> +	unwrapped_request = ((uint64_t)(dpy->request < dpy->xcb->last_flushed) << 32) + dpy->request;
> Also, I don't think that this code is intuitive this way.

I agree somewhat on that. I thought efficiency would have precedence 
here. I was actually inspired by static void widen() in xcb_io.c where 
it reads:

   *wide = new + ((unsigned long) (new < *wide) << 16 << 16);

The comment says "Treating the comparison as a 1 and shifting it avoids 
a conditional branch".


> I would propose this instead:
>
>    unwrapped_request = dpy->request;
>    /* If there was a sequence number wrap since our last flush, make sure we
>     * use the correct 64 bit sequence number */
>    if (sizeof(uint64_t) > sizeof(unsigned long)
>           && dpy->request < dpy->xcb_last_flushed)
>       unwrapped_request += UINT64_C(1) << 32;
>
> (I am not sure/convinced if the sizeof comparision is necessary, but I saw
> something like this in require_socket() and then thought that this might be
> necessary on systems where unsigned long already is a 64 bit type.)

I admit I haven't tought it all through on other system configurations. 
So thanks for giving your input.
I guess the sizeof comparison would not be necessary since the condition 
should never meet with 64-bit longs. And if it does, something is wrong 
anyway (from my understanding). After all this is the trigger of the bug.

Btw. as you might have guessed, on my system unsigned longs are 32-bit.

- Jonas
>    *wide = new + ((unsigned long) (new < *wide) << 16 << 16);

> The comment says "Treating the comparison as a 1 and shifting it
> avoids a conditional branch".

Only on architectures with conditional moves - and, on those, the
version using ? : is likely to compile down to a conditional move
anyway.  I think that comment should be fixed.

> I guess the sizeof comparison would not be necessary since the
> condition should never meet with 64-bit longs.

Unless it's in a code fragment that's used only on machines with
<64-bit longs, it will; X runs on systems with 64-bit longs.

> And if it does, something is wrong anyway (from my understanding).
> After all this is the trigger of the bug.

Does anyone know whether the bug triggers on systems with 64-bit longs?

/~\ The ASCII				  Mouse
\ / Ribbon Campaign
 X  Against HTML		mouse@rodents-montreal.org
/ \ Email!	     7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B
Am 17.11.2013 20:20, schrieb Mouse:
>> I guess the sizeof comparison would not be necessary since the
>> condition should never meet with 64-bit longs.
> Unless it's in a code fragment that's used only on machines with
> <64-bit longs, it will; X runs on systems with 64-bit longs.
I meant the condition "dpy->request < dpy->xcb_last_flushed" should not 
meet on systems with 64-bit longs (at least not until the 64-bit wrap). 
So the "sizeof(uint64_t) > sizeof(unsigned long)" would not be 
necessary. It would just increase the overhead on <64-bit systems.

>
>> And if it does, something is wrong anyway (from my understanding).
>> After all this is the trigger of the bug.
> Does anyone know whether the bug triggers on systems with 64-bit longs?

I would say no, because there is no 32-bit to wrap. But who knows, I 
have not tried it.

- Jonas
Am 17.11.2013 21:30, schrieb Jonas Petersen:
> Am 17.11.2013 20:20, schrieb Mouse:
>>> And if it does, something is wrong anyway (from my understanding).
>>> After all this is the trigger of the bug.
>> Does anyone know whether the bug triggers on systems with 64-bit longs?
>
> I would say no, because there is no 32-bit to wrap. But who knows, I 
> have not tried it.

I have just run a test on a blank 64-bit Ubuntu 13.10. It shows 
sizeof(unsigned long) == 8. The bug does *not* trigger.

- Jonas
Am 17.11.2013 20:20, schrieb Mouse:
>>     *wide = new + ((unsigned long) (new < *wide) << 16 << 16);
>> The comment says "Treating the comparison as a 1 and shifting it
>> avoids a conditional branch".
> Only on architectures with conditional moves - and, on those, the
> version using ? : is likely to compile down to a conditional move
> anyway.  I think that comment should be fixed.
What do you mean exactly? I thought the comment was saying that using 
"(x < y) << z" is avoiding a conditinal branch introduced by using an 
"if" or "? :" in some way to get the calculation done. Isn't the comment 
true then?

- Jonas
>>>     *wide = new + ((unsigned long) (new < *wide) << 16 << 16);
>>> The comment says "Treating the comparison as a 1 and shifting it
>>> avoids a conditional branch".
>> Only on architectures with conditional moves - and, on those, the
>> version using ? : is likely to compile down to a conditional move
>> anyway.  I think that comment should be fixed.
> What do you mean exactly?

Computing the numeric value of (a < b) - as opposed to using it in a
control-flow construct - is normally implemented much the same way that
((a < b) ? 1 : 0) is.  By definition of the < operator, they are
semantically identical anyway.

> I thought the comment was saying that using "(x < y) << z" is
> avoiding a conditinal branch introduced by using an "if" or "? :" in
> some way to get the calculation done.

That's what I think the comment is saying, yes.  I also think it's
false - or, at least, it's false more often than it's true, and in a
way that's unpredictable from the point of view of anyone who actually
needs the comment.

On architectures without conditional moves, it's difficult[%] to
compute the numerical value of (x < y) without a conditional branch.
And on architectures with conditional moves, ((x < y) ? 1 : 0) will
likely use a conditional move rather than a branch anyway.

> Isn't the comment true then?

It depends on the compiler, the target CPU architecture, and possibly
other things (like the optimization tradeoffs the compiler has been
told to use).  But if the optimizer is even half-decent, (x < y) << z
and ((x < y) ? 1 : 0) << z will compile to exactly the same
instructions.

I think it is likely enough that it's false that leaving the comment
there with that wording is a bad idea; I also see it as a instance of
pushing a very very machine-dependent tradeoff up to the C level,
something that is better left to the optimizer, or at the very least
buried in a machine-dependent preprocessor macro.

[%] Not impossible; it can be done with arithemtic and logical
    operations.  I'd render the core of the idea in 32-bit C as
    ((x-y)>>31)&1 - that's got issues with numbers large enough that
    their high few bits aren't all the same, but they can be fixed with
    a little more code.

/~\ The ASCII				  Mouse
\ / Ribbon Campaign
 X  Against HTML		mouse@rodents-montreal.org
/ \ Email!	     7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B
On Sun, Nov 17, 2013 at 21:30:44 +0100, Jonas Petersen wrote:

> Am 17.11.2013 20:20, schrieb Mouse:
> >>I guess the sizeof comparison would not be necessary since the
> >>condition should never meet with 64-bit longs.
> >Unless it's in a code fragment that's used only on machines with
> ><64-bit longs, it will; X runs on systems with 64-bit longs.
> I meant the condition "dpy->request < dpy->xcb_last_flushed" should
> not meet on systems with 64-bit longs (at least not until the 64-bit
> wrap). So the "sizeof(uint64_t) > sizeof(unsigned long)" would not
> be necessary. It would just increase the overhead on <64-bit
> systems.
> 
The sizeof comparison is a compile-time constant so I'd expect it to be
optimized out by the compiler anyway, so no overhead.

Cheers,
Julien
Am 18.11.2013 19:16, schrieb Julien Cristau:
> On Sun, Nov 17, 2013 at 21:30:44 +0100, Jonas Petersen wrote:
>
>> Am 17.11.2013 20:20, schrieb Mouse:
>>>> I guess the sizeof comparison would not be necessary since the
>>>> condition should never meet with 64-bit longs.
>>> Unless it's in a code fragment that's used only on machines with
>>> <64-bit longs, it will; X runs on systems with 64-bit longs.
>> I meant the condition "dpy->request < dpy->xcb_last_flushed" should
>> not meet on systems with 64-bit longs (at least not until the 64-bit
>> wrap). So the "sizeof(uint64_t) > sizeof(unsigned long)" would not
>> be necessary. It would just increase the overhead on <64-bit
>> systems.
>>
> The sizeof comparison is a compile-time constant so I'd expect it to be
> optimized out by the compiler anyway, so no overhead.

This makes sense. Being curious, I did some tests (with gcc 4.7.2 on 
i686). They showed that this code:

     int i = 0;
     for (;;) {
         ++i;
         if (sizeof(unsigned long) > sizeof(unsigned int) && i < 100) {
             printf("%d\n", i);
         }
     }


on a 32-bit system compiles into the same binary as this code:

     for (;;) {
     }

and on a 64-bit system it compiles into the same binary as this code:

     int i = 0;
     for (;;) {
         ++i;
         if (i < 100) {
             printf("%d\n", i);
         }
     }

All that without any optimization options.

So adding the size comparison seems to not only not increasing overhead, 
but even effectively optimizing for the targets. At least with this 
compiler on these architectures.

- Jonas
Am 18.11.2013 00:33, schrieb Mouse:
>>>>      *wide = new + ((unsigned long) (new < *wide) << 16 << 16);
>>>> The comment says "Treating the comparison as a 1 and shifting it
>>>> avoids a conditional branch".
>>> Only on architectures with conditional moves - and, on those, the
>>> version using ? : is likely to compile down to a conditional move
>>> anyway.  I think that comment should be fixed.
>> What do you mean exactly?
> Computing the numeric value of (a < b) - as opposed to using it in a
> control-flow construct - is normally implemented much the same way that
> ((a < b) ? 1 : 0) is.  By definition of the < operator, they are
> semantically identical anyway.
>
>> I thought the comment was saying that using "(x < y) << z" is
>> avoiding a conditinal branch introduced by using an "if" or "? :" in
>> some way to get the calculation done.
> That's what I think the comment is saying, yes.  I also think it's
> false - or, at least, it's false more often than it's true, and in a
> way that's unpredictable from the point of view of anyone who actually
> needs the comment.
>
> On architectures without conditional moves, it's difficult[%] to
> compute the numerical value of (x < y) without a conditional branch.
> And on architectures with conditional moves, ((x < y) ? 1 : 0) will
> likely use a conditional move rather than a branch anyway.
>
>> Isn't the comment true then?
> It depends on the compiler, the target CPU architecture, and possibly
> other things (like the optimization tradeoffs the compiler has been
> told to use).  But if the optimizer is even half-decent, (x < y) << z
> and ((x < y) ? 1 : 0) << z will compile to exactly the same
> instructions.
>
> I think it is likely enough that it's false that leaving the comment
> there with that wording is a bad idea; I also see it as a instance of
> pushing a very very machine-dependent tradeoff up to the C level,
> something that is better left to the optimizer, or at the very least
> buried in a machine-dependent preprocessor macro.
>
> [%] Not impossible; it can be done with arithemtic and logical
>      operations.  I'd render the core of the idea in 32-bit C as
>      ((x-y)>>31)&1 - that's got issues with numbers large enough that
>      their high few bits aren't all the same, but they can be fixed with
>      a little more code.

Thanks for your explanation. I see what you mean.

I guess it was just from a pure (C) code level point of view then, 
considering "if" or "?:" as "conditional branch" as is, without 
regarding what's happening at the lower levels when compiling. And 
possibly not regarding the meaning of "conditional branch" in other 
contexts.

So I guess I will revoke my inspiration then and go for Ulis proposal 
(using and if including the size comparison).

- Jonas