A continuation, perhaps culmination, of a series that includes rough idling firefox vs rthreads , and browser ktrace browsing

I really like using ktrace to inspect programs. It’s somewhat primitive, to be sure, and unlike source review, it can be difficult to understand the programmer’s intentions. However, my CPU doesn’t execute intentions; it executes instructions.



poll

Browser source code can be nearly impenetrable due to size, but some other programs share similar behaviors. Perhaps they will prove a tractable target for further investigation. xterm contains some code that looks about like this. (Despite the “Abandon All Hope” warning at the entrance to this ifdef maze, I still found it quite navigable.)

void do_xevents ( void ) { if ( xtermAppPending () || GetBytesAvailable ( ConnectionNumber ( screen -> display )) > 0 ) xevents () ; } void xevents ( void ) { while (( input_mask = xtermAppPending ()) != 0 ) { /* ... */ } }

xtermAppPending boils down to poll, and GetBytesAvailable boils down to ioctl(FIONREAD) which is the moral equivalent to poll. Using a sophisticated optimization technique known as hand inlining, we can rewrite the above code as follows.

void xevents ( void ) { if ( poll () || poll ()) while ( poll ()) { /* ... */ } }

Somewhere in chrome I suspect there’s a similar loop, only it probably looks more like this.

void chrome ( void ) { if ( poll () && ( poll () || poll ())) while ( poll ()) switch ( poll ()) { case poll (): poll () ; } }

I think it’s safe to say this design is not intentional, but the many layers of abstraction prevent us from seeing this at a source level.

In libxcb, we find the following code used to read from a connection.

int _xcb_in_read ( xcb_connection_t * c ) { n = recvmsg ( c -> fd , &msg , 0 ) ; /* ... */ if (( n > 0 ) || ( n < 0 && errno == EAGAIN )) return 1 ; return 0 ; }

A successful read and an unavailable read return the same status, without distinction. At a slightly higher level we have some code which appears to assume this indicates success.

static xcb_generic_event_t * poll_for_next_event ( xcb_connection_t * c , int queued ) { ret = get_event ( c ) ; if (! ret && ! queued && c -> in . reading == 0 && _xcb_in_read ( c )) ret = get_event ( c ) ; }

If we failed to get an event from our connection buffer, we read from the socket. If that succeeds, we try to process an event again. But what if we read nothing? No difference, we still try getting an event from that nothing.

I write poll, referring both to the specific function, but without loss of generality also mean select, epoll, kevent, and various other techniques.



rebound

Living in a glass house, I thought it would be beneficial to apply the same treatment to some programs I’ve written recently.

In rebound, the DNS proxy I wrote, I initially started with blocking sockets. Then in a fit of overzealous optimization, I switched them to nonblocking. Yesterday I switched it back.

My first rationale doesn’t hold up under inspection. rebound is pretty quick about processing packets. You need to be really flooding it in order to get two packets into the socket before rebound can handle the first. So the net result was that this optimization really resulted in an extraneous recvfrom call per request, which returned EAGAIN . What was I thinking?



nonblocking

How do so many programs find themselves in this situation? I have two related theories, one concerning excessive polling and one concerning excessive EAGAIN errors from functions like recvfrom.

First, as noted above, we have many layers of abstraction. These interfaces are not always designed in such a way to make information sharing possible. We call poll once, and learn something about the state of the outside world, but we are not in a position to reliably update the internal state of our program. Our knowledge is used locally to make a decision and then discarded, requiring all other code to ask again. And again.

This problem is easily detected. If poll (or kevent or select or similar) is called with a zero timeout, it means we are likely failing to communicate and coordinate. Some previous poll that did have a timeout already knows the answer.

Second, as programs grow larger, their state machines grow more complex. Mistakes are made, and recv is called on a blocking socket. The program stalls and users complain. However, instead of a proper fix, the developer changes the socket to nonblocking. Now recv will return a nice error code which we can ignore. But this does not fix the underlying cause, that the program’s internal state machine has become unsynced with reality. Instead we revert to bumping and feeling our way along.

In other words, EAGAIN does not mean try again soon. EAGAIN means try again when you get it right. (Conveniently, the most important use for nonblocking sockets, connect, returns EINPROGRESS so we don’t need to make any exceptions.)

The combination of these two antipatterns results in fragile software. Once a quick fix is made a few places, the program is harder to revise as more and more of the design comes to depend on these little peeks. A process that occasionally spins waiting for input is hard to detect. Maybe somebody notices the fans are spinning, but possibly not if it’s only until a delayed packet finally arrives.

Imagine for a moment how programs would be different if all polls had timeouts and all sockets were blocking. For a little while, there’d be some unpleasant stalls. But these would not be insurmountable problems. With a little concentration, it’s possible to rearchitect the program with a much more robust design that neither loses events nor requires speculative guesses.

The fact that tcpbench still uses blocking sockets is what allowed us to notice a bug I introduced in libevent. A nonblocking socket could have skipped past accept easily enough, but the mistake would still be there.



mail

Although it’s not directly related to nonblocking sockets, as an appendix of sorts here’s a few notes on what happened after I applied the ktrace treatment to a terminal mail client I use. It’s written in luajit for convenience, but uses some fairly low level bindings for terminal access. If you’ve seen one terminal mail program, you’ve seen them all, but here’s a partial screenshot.

Jonathan Gray ------ CVS: cvs.openbsd.org: src -------- today Ted Unangst CVS: cvs.openbsd.org: src today

Here’s a trace of me pressing the down arrow, shortly after a big write to paint the screen.

26745/1026745 luajit51 0.104709 CALL write(1,0xdb9e9290000,0x184c) 26745/1026745 luajit51 0.104916 RET write 6220/0x184c 26745/1026745 luajit51 0.104936 CALL kevent(4,0,0,0x10066000,1,0) 26745/1026745 luajit51 5.961010 STRU struct kevent { ident=0, filter=EVFILT_READ, flags=0x1<EV_ADD>, fflags=0<>, data=3, udata=0x0 } 26745/1026745 luajit51 5.961032 RET kevent 1 26745/1026745 luajit51 5.961078 CALL read(0,0x1002e898,0x1) 26745/1026745 luajit51 5.961093 RET read 1 26745/1026745 luajit51 5.961109 CALL kevent(4,0,0,0x10066000,1,0x10066098) 26745/1026745 luajit51 5.961117 STRU struct timespec { 0 } 26745/1026745 luajit51 5.961120 STRU struct kevent { ident=0, filter=EVFILT_READ, flags=0x1<EV_ADD>, fflags=0<>, data=2, udata=0x0 } 26745/1026745 luajit51 5.961123 RET kevent 1 26745/1026745 luajit51 5.961131 CALL read(0,0x1002e898,0x1) 26745/1026745 luajit51 5.961139 RET read 1 26745/1026745 luajit51 5.961144 CALL read(0,0x1002e898,0x1) 26745/1026745 luajit51 5.961150 RET read 1 26745/1026745 luajit51 5.961243 CALL gettimeofday(0x7f7ffffc8b00,0) 26745/1026745 luajit51 5.961250 STRU struct timeval { 1464973226<"Jun 3 13:00:26 2016">.599098 } 26745/1026745 luajit51 5.961253 RET gettimeofday 0 26745/1026745 luajit51 5.961335 CALL gettimeofday(0x7f7ffffc8b00,0) 26745/1026745 luajit51 5.961340 STRU struct timeval { 1464973226<"Jun 3 13:00:26 2016">.599189 } 26745/1026745 luajit51 5.961343 RET gettimeofday 0 26745/1026745 luajit51 5.961364 CALL write(1,0xdb9e9290000,0xee) 26745/1026745 luajit51 5.961383 RET write 238/0xee 26745/1026745 luajit51 5.961394 CALL kevent(4,0,0,0x10066000,1,0) 26745/1026745 luajit51 9.298559 STRU struct kevent { ident=0, filter=EVFILT_READ, flags=0x1<EV_ADD>, fflags=0<>, data=3, udata=0x0 } 26745/1026745 luajit51 9.298578 RET kevent 1

First, the good news. In the five seconds between starting up and waiting for input, we didn’t do anything wasteful. No spinning. But I see two problems here. We call kevent again after reading one byte and there’s two of those damnable gettimeofday calls in the middle of the operation.

The second kevent call was obvious from the source. Special keys like arrows, home, end, page down are multibyte sequences that start with byte 27, otherwise indistinguishable from escape. In order to tell if the user pressed down arrow, we need to read ahead, but we don’t want to block if the user pressed escape. So I polled to check. However, this failed to make use of the information provided by kevent. The returned struct indicates how much data is available for reading. As we can see, it’s initially 3 bytes, then 2 bytes after consuming one. If there’s three bytes, then I know it’s safe to go ahead and read again, so I changed the program to do that. I could see this bug in the source, though it was somewhat obscured by an if, and reviewing the ktrace made it obvious that yes, this code does execute.

The gettimeofday calls were less obvious. When printing timestamps in list mode, I shorten them so today’s mail gets listed as “today”. This requires checking the current time. Redraw two emails, recheck the current time twice. There’s really no need to check the current time so frequently however, so I made a few adjustments to provide the pretty printer with a timestamp for comparison. Even though I had written all of the code in question, it was not at all apparent that calling updateline(), which called msg.date:short(), would eventually call gettimeofday. It just looked like a function that returned a string. (An interesting case for a purity enforcing type system.)

I thought about reducing the series of read calls via buffering, but this is an API for realtime interaction. Most key presses will only be single bytes. But maybe another day. As a final note, the write at the end is 238 bytes. My terminal is 106 columns wide, so that’s about exactly what I’d expect to redraw two lines (moving the selection highlight down a line). At least I know I’m not accidentally redrawing the whole screen for small updates.