I spend most of my time digging through software-in-execution rather than software-at-rest (e.g. source code). Sometimes the subject of study is malware hissing like a snake and lashing out at the barriers of a virtual machine; sometimes it is terrible software deserving of an exploit being written; sometimes it is a driver on a far-away device that flips bits when a clock gets skewed or circuits get too hot; most of the time it is something between these extremes.

Many years ago, a friend and I distilled some thoughts on the matter into the free-as-in-beer ‘systemic software debugging’. Maybe one day I will have enough of both structure and incentive to revisit the topic in a less confined setting. Until such a time, it might be useful to document some experiments and experiences along the way, which brings us to the topic of this post: ways of using the display server infrastructure, and its position in the user-space stack, to reason about and bootstrap debugging.

While that may come off as a bit strange, first recall that the “Display server” is a misnomer (hence the “”): the tradition is that it, of course, serves much more than the display. In the majority of cases you would also find that it ‘serves’ user-initiated blob transfers (‘clipboard’ and ‘drag&drop’) as well as a range of input devices (keyboard, mice, …). In that sense a terminal emulator, web browser and virtual machine monitor rightly fall into this category. What I try to refer to is the IPC system or amalgamation of IPC systems that stitch user-space together into the thing you interact with.

The article is structured around three groups of problems and respective solutions:

Activation and Output

Chain of Trust

Process and Identity

Each explaining parts of what goes on in the following video:

The central theme is that the code that comes with the IPC system (xlib, xcb, arcan-shmif, …) can be used to construct debugging tools from inside the targeted process, and piggyback on the same IPC system for activation, inputs and information gathering/output.

The gain is that you get a user friendly, zero-cost until activation, high-bandwidth, variable number of channels to collect process information. These can cooperate with the client and let it provide its higher level view of its debug state, while at the same time adding custom primitives with few to no additional allocations post-instrumentation.

Context

To add some demarcation, the more interesting target here is live inspection of mature software as part of a grander system – far away from the comforts of a debug build with click-to-set breakpoints launched from the safety of an integrated development environment. The culprit is not obvious and the issue might not be reliably repeatable.

Your goal might not be to fix the issue, but to gather evidence and communicate it to someone who can.

Also bear in mind that this is not directly about “the debugger” in the sense of the nuclear powered Swiss Army knife also known as a ‘symbolic debugger’, tools such as ‘gdb’ and ‘lldb’, but rather the whole gamut of tooling needed to understand what role a piece of software fulfils and how it is currently functioning within that role.

Think of what the ‘intervention’ friendly version of this intimidating chart from Brendan Gregg’s post on Linux Performance would look like for the ‘applications block’ and you get closer to the idea:

Activation and Output

This first group of problems covers software that wants to cooperate by adding features now that may be useful for debugging later. Some refer to this as ‘pre-bugging’.

Consider the notion that you are a responsible, pro-active developer. You understand the need for others to inspect what your application or library is doing, and that there are things in the execution environment you simply cannot account for while remaining sane and getting things done. You want to make it easier for the next one in line, and get higher quality feedback about what went wrong out there in the field.

What are your normal practical options?

1. Command-line argument
2. Environment variable
3. Specific user interface toggle

These are all problematic, though in somewhat different ways.

With the first two options you have the problem of communicating that the feature is available (how will the user discover it, how will you remember it is there) – README.md, man pages, FAQ/Wiki, ancient words of wisdom spray painted on a live chicken, and so on – something needs to announce that the option is there.

Options 1 and 2 are also quite static; they get set and evaluated when the program is started, and if you want to activate debug output dynamically, well, tough luck. Your problem then needs to be reproducible both with and without debug output enabled.

The actual output also comes with noticeable system impact – sending strings to STDOUT, STDERR may break (introduce new bugs) other processing steps the user might have, common in the traditional UNIX pipes and filters structure. They are also not necessarily ‘dumb pipes’, isatty() is very much a thing (as is baud rate), as is threading. The combination of these properties makes for an awful communication channel to try and multiplex debug output on.
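To make the ‘not a dumb pipe’ point a bit more concrete, here is a minimal sketch of the kind of check that programs routinely make on their output descriptors, and that changes their behaviour as a result. The helper name `describe_stream` is made up for this illustration; the calls themselves are plain Python standard library.

```python
import os
import stat


def describe_stream(fd: int) -> str:
    """Classify what a file descriptor is connected to.

    Hypothetical helper for illustration: programs commonly branch on
    this kind of check (colours, buffering, progress bars), which is why
    debug output on stdout/stderr behaves differently from run to run.
    """
    if os.isatty(fd):
        return "tty"
    mode = os.fstat(fd).st_mode
    if stat.S_ISFIFO(mode):
        return "pipe"
    if stat.S_ISREG(mode):
        return "file"
    if stat.S_ISSOCK(mode):
        return "socket"
    return "other"
```

Calling `describe_stream(2)` from an interactive shell, then again with stderr redirected to a file or piped into another tool, will typically give three different answers for what is nominally the same debug channel.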

The other option, writing to a log device or file, can clog up storage, wear down flash storage, and inadvertently leave sensitive user information around. Formatting the output itself is also a thing to consider: even ‘simple’ printf has some serious gotchas (read up on locale-defined behaviour), and information that is better presented as changes over time than as a stream of samples will need other tools to be built for post-processing.

Option 3 involves quite a lot more work to implement, as the feature should mesh with other user interface properties. When properly done however, it can add quite a lot of power to the software – look no further than the set of debugging tools built into Chrome or Firefox. Of course, it helps that these are also development environments in and of themselves, which justifies the cost and effort. While often a better option, it still composes poorly with other non-cooperative information collection.

Post-mortem (crash-dump) is a slightly different story and one that calls for a much longer discussion. It is out of scope for this article but a decent follow-up topic, though the primitives that emerge here will work both as a transport for core dumps and as a way of annotating them.

Enough with the problem space description, where does the display server fit in?

In the most trivial of window systems, a client connects somehow, requests/allocates some kind of storage container (window), draws into it either directly as characters or pixels, or indirectly through drawing commands to the server itself or through an intermediary (like the GPU). In return, the client gets input events, clipboard actions and so on back over the same or related channel. This holds true even for a terminal emulator.

These windows may come with some kind of type model attached; indirectly through the combination of properties set (parent window, draw order) and directly through some other type tag (X11, for instance, has a long list of popups, tooltips, menus, dnd, and so on), and arcan has a really (too much actually) long one.

Step 1 – Add a debug type

This allows other client agnostic tools to enumerate windows of this type, compose them together and record/stream/share. Controls for activation are there, as well as a data channel that is high bandwidth and capable of cleanly multiplexing multiple data types.

Step 2 – Two-sided allocation

Now for the more complicated step. Recall the problem of saying when debugging features are needed (command-line or environment). The ideal approach would be initiated at user request during run time, with zero cost when not in use.

This is more complicated to achieve as it taps into the capabilities of the IPC system and its API design. A simplified version would be possible in the context of X11 as a notification message about a debug window being requested, then let the client allocate a window with the right type. This still leaves us with some of the drawbacks of the third option, namely what to do with uncooperative clients, which will be covered in the next section.

Now the first part of the video is explainable – The heart-emoji button in the title-bar simply sends a request to the client that a debug window is wanted. The client in question (the terminal emulator that comes with Arcan) responds by creating a new window, then renders relevant parts of the emulator state machine.

Chain of Trust

After the trace-like outputs of getting debug data out of a client, the more complicated matter comes with proper debug interfaces that also provide process control, resource modification and so on; interfaces that do not rely on the client actually cooperating.

Assume that we have a process identifier for the client in question (also a problem, see the last category, identity, for that). Let’s try and attach a debugger. The normal way of doing that is firing up an IDE or terminal, with something to the effect of:

# gdb -p SWEETPIDOFMINE

Only to be promptly met with something like:

ptrace: Operation not permitted

Low level debugging interfaces tend to be ‘problematic’ in every conceivable way. If you are not convinced, read the horror show of a manpage that belongs to ptrace if you have not done so already; search around a little bit and count the number of vulnerabilities it has had a leading role in. Study the role of JTAG in embedded system exploitation. My point is that these things are not designed as much as they spring into life as an intermittent necessity that evolves into a permanent liability. Originally, the /proc (PDF, Processes as Files) filesystem was proposed as a better design. One can safely say that it did not turn out that way.

So what is at fault in the example above?

To lessen the damage, and to make malware authors work for it a little bit, Linux-land has YAMA (and other, equally intimidating mechanisms) which imposes certain rules on who gets to attach as a tracer depending on what is to be traced and when. You can turn it off universally – a universally bad idea – by poking around in procfs.
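As a small illustration of the ‘poking around in procfs’ part, the sketch below only reads the current Yama policy rather than changing it. The path is the standard sysctl location on kernels built with Yama; the function name and the wording of the table are mine, paraphrased from the kernel documentation.

```python
from pathlib import Path
from typing import Optional

# Standard sysctl location on kernels with Yama enabled.
YAMA_SCOPE = Path("/proc/sys/kernel/yama/ptrace_scope")

# Paraphrased summary of the documented values.
SCOPE_MEANING = {
    0: "classic: same-uid processes may attach to each other",
    1: "restricted: only ancestors or a declared tracer may attach",
    2: "admin-only: attaching requires CAP_SYS_PTRACE",
    3: "no attach: ptrace attach is disabled and cannot be re-enabled",
}


def read_ptrace_scope(path: Path = YAMA_SCOPE) -> Optional[int]:
    """Return the current ptrace_scope, or None if it cannot be read
    (non-Linux system, kernel without Yama, restricted container)."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None
```

A value of 1 is what many distributions ship with, and it is the case that produces the ‘Operation not permitted’ above for an otherwise legitimate same-user attach.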

You can also use the ‘I don’t have time for this’ workaround of prefixing ‘gdb -p’ with the battering ram that is sudo. My regular stance on anything sudo, suid, polkit etc. is that it is an indicator of poor design somewhere or everywhere. Friends don’t let friends sudo. From power-on to regular use, there should not ever be a need or even an implementation for converting a lower privilege context to a higher one. Any privilege manipulation a client should have access to is reducing towards the least amount of privileges. You should, of course, have means to place yourself where you (or your organization) like in the chain of trust and have the controls to reduce from there – but I digress.

The problem with the YAMA design from the ptrace perspective is that you are practically left with no other choice. Escalating your lower privilege client (gdb) to a much higher one, then attaching it to parse super complex data from a process that you, by the very action, indicate that you don’t comprehend or trust, is a fantastically poor idea.

So what to do about the situation? Well there are other rules to modify ptrace relationships. Normally, a parent is allowed to trace its child – and that is how gdb attaches to begin with, but that does not work in the attach case.

Enter the subtleties of prctl, and now we come into the Fantastic (debug) Voyage of Isaac Asimov-fame part of the adventure.

From the last section we got the mechanisms for negotiating a debug control channel initiated from the display server. Now the extension is a bit more complicated, and this is one place where arcan-shmif has the upper hand. Instead of just sending an event saying ‘could you please create a window of type X’, we have the ability to force a window upon a client.

The shorthand form of what happened in the demo was roughly (pseudo-Lua):

on_click:

local buf = alloc_buffer(width, height)

target_alloc(client_handle, buf, event_handler, "debug")

This actually creates a full IPC channel, and sends it to the client over the existing one, with type already locked to DEBUG.

This is a server-side initiated signal of user-intent. The same mechanism is used to request a screen-reader friendly view for accessibility options, and it is used for initiating screenshots, video streams, … (the latter angle is covered in the article on interfacing with a stream deck device).

The client gets it as part of its normal event loop (see also: the handover part in the trayicon article). Pseudo-code wise, the important bit is:

case TARGET_COMMAND_NEWSEGMENT:

if segment_type == "debug" or does_not_match(request):

debug_wnd = arcan_shmif_acquire(...)

But what happens if the client does not actually map it?

Well, if you look at this part of the video, the second window is populated from within the client process, but without cooperation from the client.

The IPC library code knows if the client mapped an incoming window or not. If it didn’t, the library takes it upon itself to provide an implementation. The current one is a bit bare, but there is a lot of fun potential in this one – pending a much needed primitive for ‘suspend all threads except me’. This is quite close to a ‘grey area set’ of techniques known as process parasites.

Now, we can use this to spin off a debugger, though it is neither clean nor easy.

Prctl has an option called ‘PR_SET_PTRACER’. This allows a process to specify another process that will be allowed to attach to it via ptrace. The naive approach here would be to fork out and set the tracer to the pid returned from fork(). It also would not work, for multiple reasons.
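For reference, this is roughly what the call itself looks like when made from inside the process to be traced. It is a sketch through ctypes rather than the C that the real implementation would use; `allow_tracer` is a name made up here, and the constant is the value defined in the Linux prctl header.

```python
import ctypes

# Value of PR_SET_PTRACER in <linux/prctl.h> ("Yama" spelled in ASCII).
PR_SET_PTRACER = 0x59616D61


def allow_tracer(pid: int) -> bool:
    """Declare `pid` as a permitted ptrace tracer of the calling process.

    Passing 0 clears a previously declared tracer. Returns False if the
    call is unavailable or rejected (non-Linux, kernel without Yama,
    or a pid that does not exist).
    """
    try:
        libc = ctypes.CDLL(None, use_errno=True)
        prctl = libc.prctl
    except (OSError, AttributeError):
        return False
    prctl.restype = ctypes.c_int
    prctl.argtypes = [ctypes.c_int] + [ctypes.c_ulong] * 4
    return prctl(PR_SET_PTRACER, pid, 0, 0, 0) == 0
```

Note that the declaration is made by the target and names the tracer, not the other way around, which is what sets up the ordering problem described further down.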

One is that gdb then lacks a way to draw, and behaves differently depending on whether stdin/stdout are TTYs or not. Luckily enough we have a terminal emulator that can inherit an arcan-shmif debug window and use it as its display.

So the hidden debug interface uses the debug window to request another debug window that gets inherited into the terminal and used to set up the drawing and emulation for gdb to work.

The experienced POSIX connoisseur will see the chicken and egg problem here; the process to be debugged needs to wait until it knows the PID of the debugger process, in order to set the permitted tracer, and the debugger needs the PID of the target process in order to attach.

So the current solution is:

1. Create two pipe pairs [parent to debugger, debugger to parent] and inherit the appropriate ends.
2. Have the terminal emulator stop before exec()ing the child process, and write the ‘to become debugger’ PID back over the pipe. Block on read.
3. (In parent/debug target) receive the PID, set prctl, write its own process PID back.
4. The child process receives the trace-target PID, adds it to the arguments and continues to exec() into the debugger.
5. Profit.
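The dance can be sketched in a few lines. This is a simplified model and not the arcan implementation: there is no terminal emulator in between, the function name `bootstrap` and both callbacks are invented for the example, and the actual prctl call is left to whatever the caller passes in as `allow_tracer`.

```python
import os
import struct

PID_FMT = "i"
PID_SIZE = struct.calcsize(PID_FMT)


def _send_pid(fd: int, pid: int) -> None:
    os.write(fd, struct.pack(PID_FMT, pid))


def _recv_pid(fd: int) -> int:
    return struct.unpack(PID_FMT, os.read(fd, PID_SIZE))[0]


def bootstrap(allow_tracer, launch) -> int:
    """Run the PID exchange between debug target (caller) and debugger.

    allow_tracer(pid): called in the target with the future debugger's
        PID; in real use this would wrap prctl(PR_SET_PTRACER, pid).
    launch(target_pid): called in the child once attaching is permitted;
        in real use this would exec() the debugger and never return.
    Returns the PID of the child that becomes the debugger.
    """
    to_child_r, to_child_w = os.pipe()    # target -> debugger
    to_parent_r, to_parent_w = os.pipe()  # debugger -> target

    child = os.fork()
    if child == 0:
        os.close(to_child_w)
        os.close(to_parent_r)
        try:
            _send_pid(to_parent_w, os.getpid())  # announce own PID
            target = _recv_pid(to_child_r)       # block until permitted
            launch(target)
            os._exit(0)
        except BaseException:
            os._exit(1)

    os.close(to_child_r)
    os.close(to_parent_w)
    debugger_pid = _recv_pid(to_parent_r)
    allow_tracer(debugger_pid)                   # must precede the reply
    _send_pid(to_child_w, os.getpid())
    os.close(to_parent_r)
    os.close(to_child_w)
    return debugger_pid
```

With a real tracer declaration in place, `launch` could be something like `lambda pid: os.execvp("gdb", ["gdb", "-p", str(pid)])`. The important property is the ordering: the child does not get the target PID, and therefore cannot try to attach, until the target has already declared it as a permitted tracer.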

The clip below shows it in action:

If the tool to launch this way does not come from a controlled context, because it in turn spawns new processes where some subprocess needs to perform the attach action, it becomes even more masochistic. In that case, one would need to repeat this dance by ptracing the chain of children until a child attempts to call ptrace, and then perform the setup.

Now we have communication channels and bootstrapping ptrace-tools, retaining chain of trust and negotiated over the display server, initiated by the user. The tools have access to APIs that can break free of the terminal emulator shackles. Good building blocks for more advanced instruments.

Process and Identity

Procfs can be used to explore how the operating system perceives a specific process, which resources are currently allocated and so on. In order to do that, some identifier is needed. So the trick question to start this off is: how do you figure out the process identifier of a process tied to an X window?

A quick search around and we get this (stack overflow suggestion):

xprop _NET_WM_PID | cut -d' ' -f3

The problem is just that this does not really work. The atom, _NET_WM_PID is based on the client being nice enough to provide it, and nice enough to provide the real one and not just some random nonsense, like the pid of the X server itself – fun basic anti-debugging.

Even if the process ID retrieved is truthful and correct, that might no longer be the case when you start reading from its proc entries – it is inherently racy.
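One common way of narrowing (not closing) that window from the outside is to compare the start time of the process before and after reading, since a recycled PID will belong to a process with a different start time. The sketch below assumes the documented layout of /proc/<pid>/stat; the function names are made up for the example.

```python
from typing import Optional


def start_time(pid: int) -> Optional[int]:
    """Field 22 of /proc/<pid>/stat: start time in clock ticks since boot."""
    try:
        with open(f"/proc/{pid}/stat", "rb") as f:
            data = f.read()
    except OSError:
        return None
    # Field 2 (comm) is parenthesised and may itself contain spaces or
    # ')', so only parse what follows the last closing parenthesis.
    rest = data[data.rindex(b")") + 2:].split()
    return int(rest[19])  # rest[0] is field 3, so field 22 is rest[19]


def read_if_same_process(pid: int, name: str) -> Optional[bytes]:
    """Read /proc/<pid>/<name>, discarding the result if the PID seems
    to have changed owner while we were reading."""
    before = start_time(pid)
    if before is None:
        return None
    try:
        with open(f"/proc/{pid}/{name}", "rb") as f:
            data = f.read()
    except OSError:
        return None
    return data if start_time(pid) == before else None
```

This still trusts that the PID was the right one to begin with, which is exactly what a lying _NET_WM_PID breaks.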

In modern times, this problem does not get easier when we take containers and other para-virtualization techniques into account where the view of the outer system and its resources from the perspective of a process is vastly different from other processes.

On the other hand, if the code collecting the data runs from within the target process itself, the /proc/self directory can be relied on. There are a number of really interesting avenues to pursue; look at the debug-if implementation in the code-base for some of those ideas, or ping me if you are interested in chatting about these things.
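To give a feel for what is available from the inside, here is a minimal sketch that only uses /proc/self. It is not the debug-if code, just an illustration, and both function names are invented for the example.

```python
import os
from typing import Dict


def open_descriptors() -> Dict[int, str]:
    """Map each open descriptor of the calling process to its target."""
    result = {}
    for name in os.listdir("/proc/self/fd"):
        try:
            result[int(name)] = os.readlink(f"/proc/self/fd/{name}")
        except OSError:
            # The descriptor used for the directory listing itself has
            # already been closed by the time we look it up.
            continue
    return result


def initial_environment() -> Dict[str, str]:
    """Parse the environment block that procfs exposes for this process.

    This is the block set up at exec time; later setenv() style changes
    typically do not show up here.
    """
    with open("/proc/self/environ", "rb") as f:
        raw = f.read()
    entries = (e.split(b"=", 1) for e in raw.split(b"\0") if b"=" in e)
    return {
        key.decode(errors="replace"): value.decode(errors="replace")
        for key, value in entries
    }
```

There is no PID to look up and nothing to race against: /proc/self resolves to whichever process performs the access, regardless of what PID namespace or container it lives in.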

For the remainder of this article, we will settle for the bits implemented thus far, which brings us to the last part of this video.

The bits showcased here are that we can open and modify environment variables, as well as enumerate the currently open file descriptors – and even make a copy of their contents from the current state. The HUD menu that appears in the outer WM is server-side file-picking, and the hex-editor that appears is one of the standard arcan-tui widgets, bufferwnd.

What is not showcased is that we can spawn a shell from the context of this process for normal interactive exploration, and redirect, intercept or man-in-the-middle pipes and sockets (with live editing through the hex editor).

Rounding things off, two powerful avenues around the corner here are:

When combined with dynamic network transparency/redirection, we have an easy to use and lightning fast way of setting up multiple attached troubleshooting tools and ‘one-click’ share them with someone else.

Since these primitives can nest by ‘handing over’ new window primitives, we get ways of building dynamic hierarchies reusing the carrier for multiple types of data – think in the context of whole-system virtualization like Qemu, with all layers covered: [Qemu itself], [Guest OS shell], [Application inside guest].

If you use Arcan, that is 🙂