On Hurd, Linux and the (mis)adventures of cross-compiling a GNU Hurd toolchain

by V.R.

This article is at once a tutorial, a war story and a conceptual introduction to GNU Hurd, in which I set up a cross-toolchain and give a colorful tour through some rough edges of the GNU build system. My host system is Slackware Linux 14.1 (running on -current), i686 – which I find preferable due to its highly vanilla nature, running software almost entirely without distro-specific patching.

Recently, I have found myself more interested in the Hurd – a project that is well known by name yet surprisingly little understood, and that has had very little attention given to it. In fact, I ran GitStats on the Savannah Hurd repo and found a total lifetime contributor count of 51. Only 51 people have touched the code in the 25 years of the Hurd.

The Hurd is not Linux. It’s not even Unix, though it does impart a Unix-like personality. Its lower level semantics are completely unrelated, however, as I shall elaborate later.

My interest in cross-compiling the GNU Hurd on GNU/Linux wasn’t to build software for it from the relative convenience of my GNU/Linux system. It was, in fact, to study the interactions between the various Hurd libraries (including Mach-specific and Hurd-specific glibc interfaces), the ELF contents of the resulting binaries, the generated RPC definitions and headers, all so as to observe the nature of the Hurd’s runtime from a foreign system. This could potentially allow me to begin mapping a framework to write and load stub interfaces for running the Hurd servers and libraries on a monolithic POSIX system. The forensics of the Hurd, if you will.

It is worth noting that the Hurd developers in fact discourage the practice of cross-compilation and largely insist on using a native Hurd distribution, like Debian GNU/Hurd (which is surprisingly stable these days). However, this did not fit my aforementioned motive, and I also enjoy prying into the depths of the lesser known.

I hope this document will be useful to people embarking on a similar enterprise, given that all other references on setting up a cross-Hurd are now outdated. I also hope the madness of a cross-compiled GNU toolchain will entertain you.

Cross-compilation is indeed known to be a dark art. You’ll be seeing all sorts of spooky system interactions from targeting foreign machine ABIs. Because you’ll be testing corner cases in your compilation and build suites, you’ll also see undocumented macros and you’ll be wiring in the proper paths, watching versioned symbols fail, munging headers, modifying build recipes and so on. Furthermore, it’s a matter of bootstrapping. You’ll be building the same things in multiple passes to handle mutual dependencies. Get the sequence wrong and you might have to redo steps, or even manually invoke the tools to get the intended result.

But the educational experience is definitely worthwhile. I’m certainly more appreciative of the Plan 9 toolchain now.

Let’s dive into an intro, and then we can begin the action. Feel free to skip as desired.

Introduction to the Hurd, Mach and RPC

Beginnings

The GNU kernel goes back to the GNU Project’s conception in 1983. For a few years, the initially proposed target was an MIT research kernel dating from 1980 onwards called TRIX. It had an in-kernel RPC mechanism and ran certain services like file systems in userspace, so it was a bit of a proto-microkernel. In most other aspects, it implemented a conventional V7 Unix interface. Serious work in trying to adopt it for the GNU system began in late 1986, and it was ultimately scrapped in 1987 for architectural portability reasons, and because of the growing interest in procuring a license for CMU’s Mach kernel, which was quickly becoming all the rage at that time.

The chief architect of the Hurd for a long time was Thomas (then Michael) Bushnell, who ended up leaving the project around 2001. He presently works for Google and is also an ordained Gregorian friar, along with remaining a Debian package maintainer.

Bushnell initially intended to adapt a monolithic 4.4BSD-Lite kernel, but was overridden by RMS' aspirations for using Mach, which meant waiting for the licensing situation behind it to clear up. Whether or not this was a mistake depends on your point of view: does one value shipping fast, or shipping properly?

The work on what we know as GNU Hurd today began in 1990, as Bushnell’s brainchild. To this day many of the core architectural principles survive from his design.

The misunderstood intentions of the GNU Hurd

The Hurd is a prime example of a technology that’s famous for being famous. Everyone has a vague knowledge of it, but very few have any understanding of it. Moreover, there are some profound misconceptions that frequently float around it (mostly relating to political topics), which I will now debunk.

First of all the question of microkernels. Microkernels aren’t about code size. They’re about breadth of responsibilities and separation of concerns. V6 Unix had a very small kernel, but it was not a microkernel. Memory management, IPC, units of CPU time, units of resource usage (tasks), a representation of the machine host and even some device I/O handling can all constitute a microkernel, even if more recent designs are further “micro”.

The Hurd is not dead. It has never been a thriving community, but it has always managed to persevere by capturing the interest of a few devoted developers who have periodically come and gone. With Debian GNU/Hurd having >80% of the package base building and DDE drivers from Linux 2.6.x, it is also surprisingly usable. One can run Firefox and Xfce on it, for example. Because of the very low (at least relative to Linux) developer count, it has a remarkably different set of priorities, however.

The Hurd is not some symbolic thing that is kept alive as a political statement by GNU or the FSF, contrary to popular belief. In fact, the FSF shifted away from promoting the Hurd a long time ago and have accepted Linux as the canonical GNU target. The people who work on the Hurd are mostly hobbyists who do not have direct relations to GNU or FSF. Actually, the FSF stopped being interventionist stewards of their projects ages ago, these days occasionally stopping by to raise some points in some of their really visible crown jewels, mostly GCC and Emacs at this point. For most other projects, being part of GNU is largely a ceremonial aspect. The real significance is in the use of a GNU toolchain.

The Hurd’s benefits are not just theoretical. You can experience them for yourself by running a Debian GNU/Hurd distro, though it does require some lower level technical appreciation. Among them: Unix process semantics implemented as a userspace server; a permission model that goes beyond Unix; relinquishing capabilities from any process; Berkeley sockets, FIFOs, pipes and TCP/IP in userspace servers; clean separation of disk, storage and VFS; and a generic object server mechanism for translating one data representation to another that may or may not be accessed from a file system node (it uses the FS node as a point of registration and discovery, but that’s it). Of further interest is that page fault handling and page replacement are done by userspace servers as well. Individual applications and libraries can set default memory managers implemented through a common interface distinct from the system-wide one, and this is done e.g. by libdiskfs and ext2fs. Thus there is a separation between managing memory as a resource and memory as application content.

The Hurd is not a kernel. Mach is.

The Hurd developers haven’t really had the intention of replacing or “defeating” Linux for a long while, nor do they hold any grudge against it. People work on the Hurd for its own sake, because they see it as an interesting platform. That’s all.

So what is the Hurd?

The Hurd is a set of userspace servers, libraries and daemons that in combination with a microkernel, libc and binutils form a complete, POSIX-y (but still very distinct from Unix) multiserver operating system. The actual logic that gives a Hurd system its Unix personality exists purely as an abstraction in libc on top of more basic primitives like IPC and tasks.

Strictly, the Hurd has no necessity to run on top of any one given microkernel or even run as a particularly Unix-like personality at all. Several attempts to port the Hurd away from Mach into kernels like L4 or Viengoos have fizzled out due to conceptual non-fits or lack of interest/time/resources.

Key to the Hurd is the idea of the translator. This is a server that registers itself as a node in the file system (but does not have to export a VFS itself - this is important) which through standard library interfaces converts one representation of data to another. Since the basis of programming is input-process-output, translators are used to implement near arbitrary system logic.

It’s important to note that my last paragraph wasn’t exactly precise. Unlike Mach, the Hurd makes a distinction between “translator” and “server”. Translators do export virtual file systems, but servers do not. All translators are servers in Mach parlance, but not all servers are translators. Most services in the Hurd are translators because of the flexibility of a VFS namespace, but there is no obligation of any sort to use it for the actual data you’ll be delivering to a client, beyond registration in the FS namespace for discovery purposes.

There are three general types of translators with corresponding libraries - virtual translators (libnetfs), single-file [“trivial”] translators (libtrivfs) and physical store-backed translators (libdiskfs).

Other services include authentication, Unix process semantics, a VT console, a termios-compatible mode setting subsystem, /dev/random, crash handling, /proc/mtab, binary format registration and execution and so on. Extras outside the main Hurd repo also exist, packaged as hurd-recommended under Debian GNU/Hurd.

The Hurd also ships with some standard userland tools like an RPC tracer, login, su, mount/umount, vmstat, ps, getty, swapon/swapoff and so forth.

What is Mach?

Mach is a first-generation microkernel that originated as a research project at Carnegie Mellon University (CMU), but later sprouted at other places including the Open Software Foundation (OSF – its variant of Mach later becoming part of the XNU kernel used in Darwin/OS X) and the University of Utah (its variant called Mach4 being largely backwards compatible patches to the CMU Mach 3.0 codebase).

The flavor of Mach that GNU Hurd targets is unsurprisingly named GNU Mach. It was based on the CMU Mach 3.0 code, later integrated the Utah Mach4 extensions and is now an established variant on its own.

The key differences between GNU/CMU Mach as used in the Hurd and OSF Mach as used in OS X are that OSF Mach also implements semaphores, lock sets, a resource ledger and extensions to the Mach clock. GNU Mach does not, but on the other hand has a device interface and has since gained extensions specific to itself, like notification on IPC ports and some round-trip optimizations of message transport.

The chief concepts of Mach are ports (IPC channels), messages, tasks, threads, virtual memory and external memory management.

A task is a unit of resources running its own virtual address space with its own port name space (which is a collection of port names – integer descriptors, each of which is associated with some capability [send, send-once, receive…] called a port right).

Tasks by themselves do not do work unless they are backed by threads, which are the actual units of CPU time. A thread may belong to only one task.

The virtual memory interface is implemented as data structures called memory objects which supply VM regions in a virtual address space. Memory objects are controlled by memory managers, called pagers (as I alluded to earlier by mentioning user-level page fault handling). This means the Mach kernel leaves VMM policy up to userspace, though it does supply a default pager from which anonymous memory is paged out whenever other pagers exhaust or time out freeing their memory cache – the actual data structure that maps to physical memory when a thread accesses a page in the controlling task’s virtual address space.

Anonymous memory and paged memory are usually distinct. The former has standard interfaces that are more or less equivalent to standard POSIX semantics (wiring => locking, alloc/dealloc, etc.). However, where Mach and Unix significantly differ is that tasks can access each other’s address spaces with protection boundaries still being enforced.

Ports and RPC

Mach ports are the nucleus of the Mach kernel (Yes.) Ports are the unit of communication. They are unidirectional asynchronous channels, each holding a single fixed-length message queue. They are unnamed and with a single receiver but multiple senders. They are accessible only via capabilities called port rights, which are represented as 32-bit positive integers and usually sent as part of the message body. Just about all Mach resources have ports implicitly associated to them, sans anonymous virtual memory.

A message is a typed collection of data objects. Messages are sent and received through the mach_msg() system call (of which Mach has only about 7 to 11, depending on variant – the rest are library calls exposed by libc, in libmachuser in the Hurd’s case). Messages may be simple or non-simple, i.e. containing only inline data or containing OOL (out-of-line) data. Inline data is directly copied by the receiver from the message structure, whereas OOL data is paged with kernel assistance (OOL data is usually used for sending regions of virtual memory or variable payloads).

There are also port sets, which group ports with receive rights under a single unit for multiplexed I/O. They can’t be sent in messages, but must be recreated by a receiving task. Members of a port set indicate themselves whenever a receive operation is performed on the set, this being done in random order, or FIFO if only one port in the set has a queued message.

There are also so-called special ports, which are implicitly created as part of a thread’s state. These include the bootstrap port for accessing system services like Mach devices and the exception port where the kernel sends software-based interrupts, not unlike Unix signals.

Ports have a reference count which is incremented, decremented or left in stasis depending on port operation – whether a send right is received or deallocated. When refcnt hits 0, the port name is freed. Ports die when their receive right is deallocated, leading to send and send-once rights becoming dead names, triggering a message queue sweep. Furthermore, receive rights are tracked with a make-send count, which holds the number of times a send right has been generated from a receive right, reset to 0 upon creation of a new port or when a receive right is transferred.

The semantics of Mach IPC are complex, but in practice they are abstracted either behind RPC or, in the Hurd specifically, behind libports for lower level port operations beyond send/receive.

RPC is made through a port held by a Hurd translator. All POSIX calls are internally implemented as RPC in glibc, with duties thus shared between glibc, Mach and the Hurd. RPC interfaces are written as .defs files, in a simple header-like configuration language called Matchmaker, and compiled by the GNU MIG (Mach Interface Generator). MIG reads the RPC definition and builds a functional C source/header combination that packs the proper message arguments for the data and the port. This is done at build-time.

For instance, open(2)-ing a file on the rootfs actually translates to calling a dir_lookup RPC from the compiled MIG definition. The server loop of the rootfs driver (e.g. ext2fs) keeps a demultiplexer function which dispatches the proper interface action by matching the RPC ID. Since the rootfs is backed by a physical store, it implements a libdiskfs stub, e.g. diskfs_S_dir_lookup, that checks file consistency, creates a port to store the file handle and structure and crafts a reply buffer to send to the user program.

The benefits of this approach are the separation between interface and implementation, generic interfaces for standard operations that each translator can hook in, and location transparency with dynamic binding (remember, it’s just a message send from the caller’s perspective, but the receiver is an object that may perform complex processing irrespective of how it actually chooses to represent its data or where it even resides). Downsides are round-trip calls, potentially three-way, though in practice the Hurd has some optimizations.

For more details on RPC, see the Hurd wiki article on it.

The unfortunate reputation of Mach (microkernel FUD)

Mach clearly had good ideas, but it took a while for it to take shape and shed its initially heavyweight kernel interfaces. The semantics of IPC are complicated, if general. A variety of reasons, including elaborate port right and type checks on messages, led to disappointing performance results that were later addressed by slimmer microkernel designs like L4, or QNX (though the latter is strictly synchronous and heavily coupled to CPU scheduling).

That said, Mach and microkernels in general, despite the latter actually dominating in real-time, mission-critical and certain embedded (like baseband processors) industries, as well as hypervisors like Xen, have never lived down their reputation as slow or inefficient. The infamous Torvalds-Tanenbaum debate did not help matters. With the popularity of Linux and Linus Torvalds having his various opinions elevated to deity-esque wisdom, there is probably a non-negligible contingent of people whose knowledge of microkernels boils down solely to echoing Linus' dislike of them without even really knowing why. Which is, well:

No matter the communication overhead, the fault tolerance and reliability gains of a microkernel are undeniable. See the MINIX 3 reliability overview for examples.

The Linux microkernel?

It’s also worth noting that several more recent developments in Linux like kdbus (which was called “neutered Mach IPC” by Neal Walfield), the “tinification” effort [http://tiny.wiki.kernel.org], FUSE, NSUSE, the MADV_USERFAULT flag in madvise(2), kmscon and other userland VT subsystems, have demonstrated that there is in fact user demand for microkernel-like features in Linux, whether people realize this directly or not. It is not unthinkable to conceive of Linux (particularly with systemd) growing into a hybrid kernel approach with certain low-level subsystems adequately usable from a user context.

Cross-compiling the Hurd, Part I: Prerequisites

Well, now that our (unexpectedly long) intro is out of the damned way, let’s dive right in!

The architecture we will be targeting is i686-pc-gnu. We will be using Thomas Schwinge’s cross-gnu and cross-gnu-env scripts, but with some modifications to handle more recent developments in the Hurd’s toolchain.

The cross-gnu process is (outdatedly) documented at the Hurd wiki. I will be elaborating on how we will be deviating from its exact steps.

You can either clone the scripts from the Hurd incubator Git repository (which is where cutting-edge developments are stored outside main repos), or just download cross-gnu and cross-gnu-env directly as raw files. They’re just shell scripts.

Component overview

A cross-toolchain for the Hurd consists of six components: binutils, gcc, gnumach, mig, hurd and glibc.

binutils handles assembly, linking, object file analysis and management, library archiving and so on.

gcc is self-explanatory.

gnumach is the GNU Mach kernel with the relevant Mach headers needed for glibc and hurd.

mig is the Mach Interface Generator, used for generating RPC interfaces from .defs files in glibc and hurd.

hurd is all the servers, daemons and libraries that power the system services themselves, along with RPC definitions for all those. We will only be importing the headers in our cross-build root for glibc to compile. We will not be building the Hurd binaries themselves within cross-gnu, as that is beyond its scope. This will be done manually in the end.

glibc is the GNU C Library, implementing the C standard library, POSIX, low-level name resolution and in our specific case the libhurduser (the low-level RPCs to servers – basically hurd/*.defs, including abstractions over ioctl()s, signal handling, file descriptors, path/file lookups, etc. – along with the few Hurd-specific glibc APIs) and the libmachuser (the user interface for the Mach kernel API - the low-level traps, thread and message code which are the base of the MIG-generated interfaces like mach_port_allocate() and vm_wire() – basically mach/*.defs).

Wait, what about libpthread?

You might notice the cross-gnu wiki article mentions libpthread, which I’m omitting.

The Hurd uses its own implementation of POSIX Threads (libpthread) which for a while used to be maintained and packaged as a separate library.

More recently, however, despite still being treated as such for development purposes, it can be used directly as a glibc add-on. This is, in fact, how Debian GNU/Hurd does it (libpthread is inside glibc), so it’s best for us to follow its lead.

Setting up the environment with the proper packages

The cross-gnu page mentions exact versions, but many of these are now no longer relevant.

For instance, it recommends gcc-4.5 with a configure file patch. We will instead be using an unmodified gcc-4.8.0 from GNU’s FTP servers. Older versions of gcc appear to fail on some of the Mach spinlock code in more recent versions of glibc.

We will be using binutils-2.25. cross-gnu recommends binutils-2.20, but does hint 2.22 or later should be fine. It evidently is.

I ended up fetching GNU Mach, GNU Hurd and GNU MIG from tarballs as opposed to cloning from Git. These are, respectively: 1.5, 0.6, 1.5.

The trickiest part by far (as you shall see) is glibc. After much mucking around, I pulled a glibc_source-2.19-.deb package from Richard Braun’s Debian GNU/Hurd ports mirror. This is a build that comes with libpthread integrated as an add-on and various patches applied by the Debian GNU/Hurd maintainers as they keep tracking changes in Hurd development, so it’s the path of least resistance. A .deb is just a structured tarball, so extract the data.tar.xz inside.
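As a minimal sketch of that extraction step: a .deb is an ar(1) archive, and the files live in its data.tar.* member. The helper name extract_deb and the file names in the usage line are my own placeholders, and the sketch assumes the member is called data.tar.xz, as it was for the package I used.

```shell
# Hypothetical helper: unpack the payload of a .deb into a directory.
# Assumes binutils' ar and a tar that can read xz are installed.
extract_deb() {
    deb=$1
    dest=$2
    [ -f "$deb" ] || { echo "no such package: $deb" >&2; return 1; }
    # ar looks up the archive relative to the current directory, so make
    # the path absolute before we cd into the destination.
    case $deb in
        /*) ;;
        *)  deb=$PWD/$deb ;;
    esac
    mkdir -p "$dest" || return 1
    # Pull the members out of the archive, then unpack the payload tarball.
    ( cd "$dest" && ar x "$deb" && tar -xf data.tar.xz )
}
```

Usage would look something like `extract_deb some-glibc-source.deb glibc-deb`, after which the glibc sources sit somewhere under glibc-deb/.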

You can get the direct links as follows, though I encourage experimentation on your part: hurd, gnumach, mig, gcc, binutils, glibc.

After placing cross-gnu, cross-gnu-env and the six GNU toolchain packages somewhere, create a new top-level directory, and inside it another directory called src holding symlinks to the packages under plain unversioned names. The top-level directory is effectively your cross-build root: everything is built and installed beneath it.

Like so:

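    # Assumes the unpacked source trees sit directly inside hurd-cross-build/.
    mkdir -p hurd-cross-build/src && cd hurd-cross-build/src
    # No trailing slash on the link names: ln fails if told to create "name/".
    ln -s ../binutils-2.25 binutils
    ln -s ../gcc-4.8.0 gcc
    ln -s ../glibc-2.19 glibc
    ln -s ../gnumach-1.5 gnumach
    ln -s ../hurd-0.6 hurd
    ln -s ../mig-1.5 mig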

Though not documented in the wiki, you can do the same for a seventh package: gdb. I have not tested this.

Cross-compiling the Hurd, Part II: Modifying cross-gnu

The cross-gnu and cross-gnu-env scripts are highly useful in automating the multi-pass nature of the cross-compilation process, but the former makes old assumptions that we must revise. cross-gnu-env requires no modifications; leave it untouched.

Disable C++ in GCC

We do not require it, and it may complicate the process.

gcc is compiled in two passes. The first pass builds it without threading, shared library support and NLS (Native Language Support), sufficient for a cross-MIG and a first-pass cross-glibc. The second is a full compiler suite.

In both passes, make sure --enable-build-with-cxx is removed and --enable-languages only has a value of c.
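If you would rather not edit by hand, the change can be sketched as a small sed filter. This is only an illustration: patch_gcc_flags is a name I made up, the sample configure call below is not the real contents of cross-gnu, and the filter assumes each flag sits on its own backslash-continued line.

```shell
# Hypothetical helper: print a copy of a script with GCC restricted to C.
patch_gcc_flags() {
    # Drop the build-with-cxx line entirely, and force the language list to c.
    sed -e '/--enable-build-with-cxx/d' \
        -e 's/--enable-languages=[^[:space:]]*/--enable-languages=c/' \
        "$1"
}

# Try it on a stand-in fragment (not the actual cross-gnu text).
sample=$(mktemp)
cat > "$sample" <<'EOF'
"$GCC_SRC"/configure \
  --target="$TARGET" \
  --enable-build-with-cxx \
  --enable-languages=c,c++ \
  --disable-nls
EOF
patched=$(patch_gcc_flags "$sample")
printf '%s\n' "$patched"
rm -f "$sample"
```

Write the output to a new file and diff it against the original before trusting it; if a flag happens to be the last argument on its line group, deleting it would leave a dangling backslash.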

Disable Texinfo for GCC

Pass MAKEINFO=missing as an extra argument on its own line, joined to the rest with a backslash continuation, in both passes. GCC is quite strict about Texinfo and it has been known to break. We do not strictly need docs here, anyway.

Disable nscd for glibc

GNU Hurd seemingly cannot build nscd (name server caching daemon), and Debian GNU/Hurd is not known to ship with it. As of glibc-2.17, there is a --disable-nscd flag you can pass, so add it to both build passes.

Remove the libpthread pass

It is redundant, since libpthread is a glibc add-on. It will also not build right away due to a circular dependency on a Hurd library, which I will mention later.

I am referring to this code block:

    mkdir -p "$LIBPTHREAD_OBJ" &&
    cd "$LIBPTHREAD_OBJ"/ &&
    if ./config.status --version > /dev/null 2>&1; then :; else
      # `$TARGET-gcc' doesn't work yet (to satisfy the Autoconf checks), but isn't
      # needed either.
      CC=gcc \
      "$LIBPTHREAD_SRC"/configure \
        --host="$TARGET" \
        --prefix="$CROSS_GNU_USR" \
        ac_cv_lib_ihash_hurd_ihash_create=yes
    fi &&
    "$MAKE" \
      DESTDIR="$SYS_ROOT" \
      install-data-local-headers &&
    # Below, we will reconfigure for allowing to build libpthread.
    if grep -q '^CC = gcc$' Makefile
    then rm config.status
    else :
    fi &&

Remove the Hurd library pass

Though commented as “GNU Hurd’s core libraries”, this only builds and installs libihash - the Hurd’s generic hash table library.

libpthread uses libihash for maintaining thread-local storage.

From my experience, this pass fails due to a lack of glibc by this point, regardless of whether a relative or absolute compiler name is used. We will later get around this in a rather crafty (read: unremittingly horrifying) way.

The issue is that the Hurd roadblocks us by invoking the undocumented AC_NO_EXECUTABLES autoconf macro when it detects a cross-compilation, which turns off the standard autoconf link tests under the assumption that our linker is not yet bootstrapped in the absence of libc.

Autotools are a landmine as a matter of principle.

The code block is this:

    # Install the GNU Hurd's core libraries.
    cd "$HURD_OBJ"/ &&
    if ./config.status --version > /dev/null 2>&1; then :; else
      "$HURD_SRC"/configure \
        --host="$TARGET" \
        --prefix="$CROSS_GNU_USR" \
        --disable-profile \
        --without-parted
    fi &&
    "$MAKE" \
      libihash &&
    "$MAKE" \
      prefix="$SYS_ROOT""$CROSS_GNU_USR" \
      libihash-install &&

This should be it. On to the first pass!

Cross-compiling the GNU Hurd, Part III: First pass and the great glibc swindle

This is where you put the round in the chamber and fire.

Total disk space taken by the end of the whole process should be ~2GB.

Head into your cross-build directory (e.g. hurd-cross-build) and designate it as ROOT, then invoke cross-gnu-env:

ROOT=. . cross-gnu-env # or wherever it's located

cross-gnu-env will set various build-time variables like $SYS_ROOT, $TARGET, $PROGNAME_OBJ and $PROGNAME_SRC so that the build steps can run with minimal manual configuration on your part (or none for most of the process, as we’ll simply be running cross-gnu and trying to resolve SNAFUs).

Examples:

echo $SYS_ROOT $TARGET $CROSS_GNU_USR

Now fire!

cross-gnu # assuming it's in $PATH, otherwise use absolute directory name

You will be rerunning this a lot after fixing failures. It’s safe.

binutils

The easiest part. It passes without issue.

gcc, first-pass

This should also pass smoothly. It may complain of a missing $SYS_ROOT/include, in which case just create the directory.
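A minimal sketch of pre-creating the directories gcc may complain about (include here, lib in the second pass). SYS_ROOT is normally set by cross-gnu-env; the fallback to a scratch directory is my own addition so the snippet can be tried in isolation.

```shell
# Use the SYS_ROOT from cross-gnu-env if present, else a throwaway directory.
: "${SYS_ROOT:=$(mktemp -d)}"

# Create each directory only if it is not already there.
for d in include lib; do
    [ -d "$SYS_ROOT/$d" ] || mkdir -p "$SYS_ROOT/$d"
done
```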

GNU Mach headers

Needed for glibc later on. This is a simple operation. We never actually build a gzipped Mach kernel, though if you want to, see here.

GNU MIG

For generating the RPC definitions. Again, no qualms.

GNU Hurd headers

Are really just RPC defs that are compiled by MIG, hence the previous step. They will go in $SYS_ROOT/usr/include, along with Mach ones and all others.

glibc, first-pass

“Strange, I never saw your name on my paycheck. Since if that’s not the case you cannot order me around.” – Ulrich Drepper

glibc breaks spectacularly and we will be doing plenty of ugly dissection to get it up and building. This is all in a libpthread-integrated Debian build with patches, so you can only imagine what the actual maintainers must fight against.

Missing RPC definitions

We observe that hurd/Makefile references exec_experimental and fs_experimental in the user-interfaces directive. Except, they’re not actually included with the tarball. Or really in any public release I could find. The former is a more recent API used by the exec server to run #!-scripts, and the latter has a comment explaining its intended use.

I thus had to copy them from mailing list patches and create them, minus licensing info due to laziness.

Remember, this software comes with no warranty. For details, please read ‘WARRANTY’.

exec_experimental.defs:

    subsystem exec_experimental 434242;

    #include <hurd/hurd_types.defs>

    #ifdef EXEC_IMPORTS
    EXEC_IMPORTS
    #endif

    INTR_INTERFACE

    routine exec_exec_file_name (
        execserver: file_t;
        file: mach_port_send_t;
        oldtask: task_t;
        flags: int;
        filename: string_t;
        argv: data_t SCP;
        envp: data_t SCP;
        dtable: portarray_t SCP;
        portarray: portarray_t SCP;
        intarray: intarray_t SCP;
        deallocnames: mach_port_name_array_t;
        destroynames: mach_port_name_array_t);

fs_experimental.defs:

    subsystem fs_experimental 444242;

    #include <hurd/hurd_types.defs>

    #ifdef FILE_IMPORTS
    FILE_IMPORTS
    #endif

    /* Operations supported on all files */

    INTR_INTERFACE

    /* Overlay a task with a file.  Necessary initialization, including
       authentication changes associated with set[ug]id execution must be
       handled by the filesystem.  Filesystems normally implement this by
       using exec_newtask or exec_loadtask as appropriate.  */
    routine file_exec_file_name (
        exec_file: file_t;
        RPT
        exec_task: task_t;
        flags: int;
        filename: string_t;
        argv: data_t SCP;
        envp: data_t SCP;
        fdarray: portarray_t SCP;
        portarray: portarray_t SCP;
        intarray: intarray_t SCP;
        deallocnames: mach_port_name_array_t SCP;
        destroynames: mach_port_name_array_t SCP);

Disable Texinfo from Makefile targets

glibc is even more insistent about installing docs for you, which we can’t actually build in a cross-compile, so the target fails. As there is no switch to turn this off, we need to edit manual/Makefile and reduce the install, install-data, subdir_install and catchall rules to this:

    .PHONY: install-data
    subdir_install:
    ifneq ($(strip $(MAKEINFO)),:)
    install:
    endif
    # Catchall implicit rule for other installation targets from the parent.
    install-%: ;

pthread surgery, Mk I

libpthread has a private, libc-internal mutex lock header libc-lockP.h, as well as pthread-functions.h for condition variables and thread attributes.

For some reason, glibc will also be expecting them in sysdeps/mach/hurd/bits, so you should copy them over from libpthread/sysdeps/pthread/bits and libpthread/sysdeps/pthread/, respectively.
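The copy can be sketched as a small helper, assuming the paths above are relative to the top of the unpacked glibc tree. The function name and its argument are my own; adjust the paths if your tree is laid out differently.

```shell
# Hypothetical helper: $1 is the top of the unpacked glibc source tree.
copy_pthread_headers() {
    glibc_dir=$1
    dest="$glibc_dir/sysdeps/mach/hurd/bits"
    mkdir -p "$dest" &&
    # Private libc-internal lock header.
    cp "$glibc_dir/libpthread/sysdeps/pthread/bits/libc-lockP.h" "$dest"/ &&
    # Condition variable and thread attribute function table.
    cp "$glibc_dir/libpthread/sysdeps/pthread/pthread-functions.h" "$dest"/
}
```

From the cross-build root this would be invoked as something like `copy_pthread_headers src/glibc`.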

pthread surgery, Mk II: Mangling our libhurduser

Again, we come back to the chicken-egg problem of libihash I mentioned earlier. glibc needs libpthread, which needs libihash. libihash is part of Hurd, which needs glibc. We don’t have it, and so link tests are turned off for Hurd, stopping us from configuring a build.

I ultimately settled on a horrible hack, but oddly one that the Hurd developers themselves have considered doing: making libihash a part of libhurduser. By doing this, I could also keep intact the header definitions set in the thread-local storage handling code.

Three things need to be done.

Firstly, take the ihash source file and the ihash header file from hurd-0.6/libihash and paste them into glibc, respectively at hurd/ and hurd/hurd/.

Second, edit hurd/Makefile and add definitions for ihash.h in the headers and inline-headers directives, as well as ihash in the routines directive so they’re registered in the build system:

    headers = hurd.h $(interface-headers) \
              $(addprefix hurd/,fd.h id.h port.h signal.h sigpreempt.h ioctl.h\
                                userlink.h resource.h threadvar.h lookup.h ihash.h)

    inline-headers = hurd.h $(addprefix hurd/,fd.h signal.h \
                                              userlink.h threadvar.h port.h ihash.h)
    [...] # cut off
    routines = hurdstartup hurdinit \
               hurdid hurdpid hurdrlimit hurdprio hurdexec hurdselect \
               ihash \
               [...] # cut off

Finally, the most unsavory part: actually register the hurd_ihash functions as versioned symbols for linker visibility in the hurd/Versions file. Yep, it’s “officially” now part of glibc.

GLIBC_2.13_DEBIAN_19 {
  # functions used by libpthread and
  _hurd_sigstate_set_global_rcv;
  _hurd_sigstate_lock;
  _hurd_sigstate_pending;
  _hurd_sigstate_unlock;
  _hurd_sigstate_delete;
  hurd_ihash_find;
  hurd_ihash_remove;
  hurd_ihash_free;
  hurd_ihash_add;
  hurd_ihash_create;
}

Stop and take a shower, realizing you will never be able to engage with normies. REEEEEEEEEEEEEEEE
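Once glibc has been rebuilt, it may be worth checking that the ihash symbols really got exported. The following is only a sketch: it reads the output of nm -D --defined-only on stdin and prints whichever of the five symbols registered above are missing. The function name is hypothetical, and which library file to inspect depends on your build.

```shell
# Sketch: list the hurd_ihash_* symbols that are NOT exported.
# stdin: output of `nm -D --defined-only <library>`.
missing_ihash_symbols() {
    # Keep only the symbol name, dropping any @VERSION suffix nm may append.
    syms=$(awk '{ sub(/@.*/, "", $NF); print $NF }')
    for want in hurd_ihash_find hurd_ihash_remove hurd_ihash_free \
                hurd_ihash_add hurd_ihash_create; do
        printf '%s\n' "$syms" | grep -qx "$want" || echo "$want"
    done
}
```

For example, nm -D --defined-only "$SYS_ROOT/lib/libc.so.0.3" | missing_ihash_symbols should print nothing if the hack took; the library path here is an assumption about where the install step put things.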

Forcing nscd through, if necessary

If for some reason --disable-nscd doesn’t work (I can’t imagine wh-

“The behavior is correct and wanted. Now stop wasting people’s time.”

Alright, actually I can.

Anyway, such an event can hopefully be mitigated by:

a) Commenting out the thread_info_t overloading in nscd/nscd.c, which conflicts with <mach/thread_info.h>:

/* Structure used by main() thread to keep track of the number of
   active threads.  Used to limit how many threads it will create
   and under a shutdown condition to wait till all in-progress
   requests have finished before "turning off the lights".  */
typedef struct
{
  int num_active;
  pthread_cond_t thread_exit_cv;
  pthread_mutex_t mutex;
} thread_info_t;

thread_info_t thread_info;

b) Redefining the connection database initialization locks in nscd/connections.c from PTHREAD_RWLOCK_WRITER_NONRECURSIVE_INITIALIZER_NP, which is not present in Hurd, to the closely equivalent __PTHREAD_RWLOCK_INITIALIZER.
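Step b) is a plain textual substitution, so a sed one-liner along these lines should do. This is a sketch: the function name is made up, and it keeps a .orig backup next to the file in case the result needs to be undone.

```shell
# Sketch: swap the rwlock initializer that Hurd lacks for the generic one.
# $1 is the path to nscd/connections.c; a backup is written to "$1.orig".
patch_rwlock_initializer() {
    sed -i.orig \
        's/PTHREAD_RWLOCK_WRITER_NONRECURSIVE_INITIALIZER_NP/__PTHREAD_RWLOCK_INITIALIZER/g' \
        "$1"
}
```

Called as patch_rwlock_initializer nscd/connections.c from the glibc source root.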

In general, since nscd is useless in Hurd, anything you do to mangle your way out is likely permissible.

This should hopefully be it.

Cross-compiling the GNU Hurd, Part IV: Second pass

Damn, I hope we’re glad to get that out of the way.

Second pass is mostly smooth sailing.

gcc will build itself with full shared library and threading support modulo what we edited out from cross-gnu. As I mentioned earlier, it might complain of a missing $SYS_ROOT/lib that you might need to touch, either in the first or second pass.

glibc, too, after being smacked with a trout in the first pass, should be fine. You should now have a complete dynamic loader (ld.so.1).

And you’re done!

Almost.

We do have a cross-toolchain targeting i686-pc-gnu and thus the Hurd, but we haven’t built the Hurd itself. Building it this way is not a recommended practice, since the ABI, kernel and libraries all belong to a different platform. But it is what we’re after for research purposes.

Cross-compiling the GNU Hurd, Part V: The actual fucking Hurd

This process is pretty much bound to be fragile, since we’re pushing the purpose of a cross-toolchain (cross-compiling packages) right into actually building the Hurd servers, which belong to a totally alien platform. Nonetheless, I managed to get most of them – more than enough to observe ELF headers, symbols, library interactions and generated RPC definitions to see how Mach RPC is structured.

This one is manual, but still reuses the variables set by cross-gnu-env.

cd over to the Hurd directory.

Configure:

./configure --host="$TARGET" --prefix="$CROSS_GNU_USR" --disable-profile --without-parted

Take it for a spin:

gmake PREFIX="$SYS_ROOT""$CROSS_GNU_USR" all

Several things to note.

The Hurd is structured in a recursive per-directory Makefile build layout, each Makefile sourcing from the root Makeconf. As most servers and libraries use Mach threads, since abstracted by pthreads, they attach the -lpthread linker flag to the HURDLIBS and LDLIBS variables.

However, this actually has the effect of linking to our system-wide /lib/libpthread.so.0, when we really want to link to our cross-compiled, Hurd-specific $SYS_ROOT/lib/libpthread.so.3. To do this, you must comment out or remove all instances of HURDLIBS and LDLIBS that reference -lpthread. Simply grep for it and apply a sed patch.
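The grep-and-sed pass could look roughly like the sketch below. It assumes GNU grep and GNU sed, comments out every HURDLIBS or LDLIBS line that mentions -lpthread, and leaves a Makefile.orig backup beside each file it touches. The function name is hypothetical, and the pattern only catches lines that start with one of the two variables, so check what grep actually reports in your tree first.

```shell
# Sketch: comment out HURDLIBS/LDLIBS lines that pull in -lpthread.
# $1 is the root of the Hurd source tree.
comment_out_lpthread() {
    hurd_src=$1
    # Find the Makefiles that have an offending line...
    grep -rlE --include=Makefile \
        '^[[:space:]]*(HURDLIBS|LDLIBS)[^#]*-lpthread' "$hurd_src" |
    while IFS= read -r mk; do
        # ...and prefix each such line with "# ", keeping a .orig backup.
        sed -i.orig -E \
            's/^([[:space:]]*(HURDLIBS|LDLIBS)[^#]*-lpthread.*)$/# \1/' "$mk"
    done
}
```

Running diff -u against the .orig files afterwards is a cheap way to see exactly what was changed.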

Secondly, the proc server in proc/mgt.c uses a relatively recent RPC added to the Mach kernel called mach_notify_new_task, which may or may not have been compiled by MIG in the Mach header pass. Either head to gnumach/include/mach, invoke i686-pc-gnu-mig on the task_notify.defs file manually and copy the generated header over to $SYS_ROOT/usr/include, or, since it’s a low-level server that may not be of major relevance to this, just define the RPC function signature manually on top of proc/mgt.c, like this:

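/* Empty stand-in so that proc compiles and links without the
   MIG-generated stub; it sends no notification. */
extern kern_return_t
mach_notify_new_task (mach_port_t notify,
                      mach_port_t task,
                      mach_port_t parent)
{
  return KERN_SUCCESS;
}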

The Hurd mount(8) binary uses libblkid, which we don’t have. Either compile it or simply comment out or remove it from utils/Makefile in the targets, special-targets, mountlibs-LDFLAGS and mountlibs-CPPFLAGS variables.

I was unable to build the ext2fs, storeio, pflocal, hello-mt and fatfs translators. There were issues with pthread read-write locks that I put off debugging, since the rest of the Hurd was quite enough for analysis. I do intend on revisiting them later down the road. If you figure it out in the meantime, do contact V.R. at Dark n' Edgy forums.

mach-defpager also failed, but that one is understandable and of little relevance to us. Applications and libraries set their own memory managers/pagers (the Hurd usually basing them on libpager interfaces), with mach-defpager being inherently unportable and intrinsic to Mach. In monolithic kernels, you will be using the page replacement algorithms of your kernel.

I had to comment out the aforementioned programs from hurd/Makefile’s prog-subdirs. hello-mt is just a trivial single-file translator, so that one should be taken out from trans/Makefile.

If you want to install:

gmake DESTDIR="$SYS_ROOT" install

Quickly analyzing the Hurd

Obviously you won’t be able to run the Hurd because you don’t actually have anything remotely resembling a Hurd-compatible runtime on your GNU/Linux box. ldd isn’t of much use, either, but other tools (particularly from binutils) are quite handy, besides looking at the outputs of our $SYS_ROOT.

Displaying the ELF file and program headers of the hello translator:

foo@bar:~/gnuhurd/hurd-cross-build/src$ readelf -h -l hurd/trans/hello
ELF Header:
  Magic:   7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
  Class:                             ELF32
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           Intel 80386
  Version:                           0x1
  Entry point address:               0x8048dfe
  Start of program headers:          52 (bytes into file)
  Start of section headers:          32368 (bytes into file)
  Flags:                             0x0
  Size of this header:               52 (bytes)
  Size of program headers:           32 (bytes)
  Number of program headers:         8
  Size of section headers:           40 (bytes)
  Number of section headers:         36
  Section header string table index: 33

Program Headers:
  Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  PHDR           0x000034 0x08048034 0x08048034 0x00100 0x00100 R E 0x4
  INTERP         0x000134 0x08048134 0x08048134 0x0000b 0x0000b R   0x1
      [Requesting program interpreter: /lib/ld.so]
  LOAD           0x000000 0x08048000 0x08048000 0x01754 0x01754 R E 0x1000
  LOAD           0x001754 0x0804a754 0x0804a754 0x001c0 0x00290 RW  0x1000
  DYNAMIC        0x001768 0x0804a768 0x0804a768 0x00100 0x00100 RW  0x4
  NOTE           0x000140 0x08048140 0x08048140 0x00020 0x00020 R   0x4
  GNU_EH_FRAME   0x0014e0 0x080494e0 0x080494e0 0x0006c 0x0006c R   0x4
  GNU_STACK      0x000000 0x00000000 0x00000000 0x00000 0x00000 RW  0x10

 Section to Segment mapping:
  Segment Sections...
   00
   01     .interp
   02     .interp .note.ABI-tag .hash .dynsym .dynstr .gnu.version .gnu.version_r .rel.dyn .rel.plt .init .plt .text .fini .rodata .eh_frame_hdr .eh_frame
   03     .ctors .dtors .jcr .dynamic .got .got.plt .data .bss
   04     .dynamic
   05     .note.ABI-tag
   06     .eh_frame_hdr
   07

Dumping the symbol table for the isofs/main object file:

foo@bar:~/gnuhurd/hurd-cross-build/src$ objdump -t hurd/isofs/main.o

hurd/isofs/main.o:     file format elf32-i386

SYMBOL TABLE:
00000000 l    df *ABS*  00000000 main.c
00000000 l    d  .text  00000000 .text
00000000 l    d  .data  00000000 .data
00000000 l    d  .bss   00000000 .bss
00000000 l    d  .rodata.str1.1 00000000 .rodata.str1.1
00000000 l    d  .rodata.str1.4 00000000 .rodata.str1.4
00000000 l     F .text  0000016d read_sblock
00000000 l    d  .text.startup  00000000 .text.startup
00000000 l     O .rodata        0000000b __PRETTY_FUNCTION__.11124
00000000 l    d  .rodata        00000000 .rodata
00000000 l    d  .debug_info    00000000 .debug_info
00000000 l    d  .debug_abbrev  00000000 .debug_abbrev
00000000 l    d  .debug_loc     00000000 .debug_loc
00000000 l    d  .debug_aranges 00000000 .debug_aranges
00000000 l    d  .debug_ranges  00000000 .debug_ranges
00000000 l    d  .debug_line    00000000 .debug_line
00000000 l    d  .debug_str     00000000 .debug_str
00000000 l    d  .note.GNU-stack        00000000 .note.GNU-stack
00000000 l    d  .eh_frame      00000000 .eh_frame
00000000 l    d  .comment       00000000 .comment
00000000         *UND*  00000000 _GLOBAL_OFFSET_TABLE_
00000000         *UND*  00000000 diskfs_exception_diu
00000000         *UND*  00000000 _setjmp
00000004       O *COM*  00000004 disk_image
00000000         *UND*  00000000 malloc
00000004       O *COM*  00000004 sblock
00000004       O *COM*  00000004 logical_block_size
00000000         *UND*  00000000 memcmp
00000000         *UND*  00000000 error
00000000         *UND*  00000000 __errno_location
00000170 g     F .text  00000038 diskfs_append_args
00000000         *UND*  00000000 diskfs_append_std_options
00000008 g     O .bss   00000004 store_parsed
00000000         *UND*  00000000 store_parsed_append_args
00000000 g     F .text.startup  000000f0 main
00000000         *UND*  00000000 diskfs_readonly
00000000         *UND*  00000000 diskfs_hard_readonly
00000000         *UND*  00000000 diskfs_init_main
0000000c g     O .bss   00000004 store
00000000         *UND*  00000000 create_disk_pager
00000000         *UND*  00000000 rrip_initialize
00000000         *UND*  00000000 rrip_lookup
00000004       O *COM*  00000004 diskfs_root_node
00000000         *UND*  00000000 load_inode
00000000         *UND*  00000000 pthread_mutex_unlock
00000000         *UND*  00000000 diskfs_startup_diskfs
00000000         *UND*  00000000 pthread_exit
00000000         *UND*  00000000 __assert_perror_fail
000001b0 g     F .text  00000003 diskfs_reload_global_state
000001c0 g     F .text  00000003 diskfs_set_hypermetadata
000001d0 g     F .text  00000008 diskfs_readonly_changed
00000000         *UND*  00000000 abort
00000000 g     O .data  00000004 diskfs_maxsymlinks
00000004 g     O .data  00000004 diskfs_name_max
00000008 g     O .data  00000004 diskfs_link_max
00000000 g     O .bss   00000004 diskfs_synchronous
0000000c g     O .data  00000004 diskfs_extra_version
00000010 g     O .data  00000004 diskfs_server_version
00000014 g     O .data  00000004 diskfs_server_name
00000004 g     O .bss   00000004 diskfs_disk_name
00000004       O *COM*  00000004 mounted_on
00000004       O *COM*  00000004 host_name
00000004       O *COM*  00000004 diskfs_read_symlink_hook
00000004       O *COM*  00000004 diskfs_create_symlink_hook
00000004       O *COM*  00000004 diskfs_shortcut_ifsock
00000004       O *COM*  00000004 diskfs_shortcut_fifo
00000004       O *COM*  00000004 diskfs_shortcut_blkdev
00000004       O *COM*  00000004 diskfs_shortcut_chrdev
00000004       O *COM*  00000004 diskfs_shortcut_symlink

Displaying notes, unwind info, relocations and dynamic section for ftpfs:

foo@bar:~/gnuhurd/hurd-cross-build/src$ readelf -nrud hurd/ftpfs/ftpfs

Dynamic section at offset 0x690c contains 30 entries:
  Tag        Type                         Name/Value
 0x00000001 (NEEDED)                     Shared library: [libhurdbugaddr.so.0.3]
 0x00000001 (NEEDED)                     Shared library: [libnetfs.so.0.3]
 0x00000001 (NEEDED)                     Shared library: [libfshelp.so.0.3]
 0x00000001 (NEEDED)                     Shared library: [libiohelp.so.0.3]
 0x00000001 (NEEDED)                     Shared library: [libports.so.0.3]
 0x00000001 (NEEDED)                     Shared library: [libihash.so.0.3]
 0x00000001 (NEEDED)                     Shared library: [libftpconn.so.0.3]
 0x00000001 (NEEDED)                     Shared library: [libshouldbeinlibc.so.0.3]
 0x00000001 (NEEDED)                     Shared library: [libc.so.0.3]
 0x00000001 (NEEDED)                     Shared library: [libmachuser.so.1]
 0x00000001 (NEEDED)                     Shared library: [libhurduser.so.0.3]
 0x0000000c (INIT)                       0x804989c
 0x0000000d (FINI)                       0x804d45c
 0x00000004 (HASH)                       0x8048160
 0x00000005 (STRTAB)                     0x8048cd8
 0x00000006 (SYMTAB)                     0x80484e8
 0x0000000a (STRSZ)                      2084 (bytes)
 0x0000000b (SYMENT)                     16 (bytes)
 0x00000015 (DEBUG)                      0x0
 0x00000003 (PLTGOT)                     0x804fa28
 0x00000002 (PLTRELSZ)                   576 (bytes)
 0x00000014 (PLTREL)                     REL
 0x00000017 (JMPREL)                     0x804965c
 0x00000011 (REL)                        0x804961c
 0x00000012 (RELSZ)                      64 (bytes)
 0x00000013 (RELENT)                     8 (bytes)
 0x6ffffffe (VERNEED)                    0x80495fc
 0x6fffffff (VERNEEDNUM)                 1
 0x6ffffff0 (VERSYM)                     0x80494fc
 0x00000000 (NULL)                       0x0

Relocation section '.rel.dyn' at offset 0x161c contains 8 entries:
 Offset     Info    Type            Sym.Value  Sym. Name
0804fa24  00003506 R_386_GLOB_DAT    00000000   __gmon_start__
0804fc00  00005305 R_386_COPY        0804fc00   __vm_page_size
0804fc04  00001e05 R_386_COPY        0804fc04   netfs_node_refcnt_lock
0804fc08  00002005 R_386_COPY        0804fc08   __mach_task_self_
0804fc0c  00002705 R_386_COPY        0804fc0c   netfs_std_runtime_argp
0804fc28  00003005 R_386_COPY        0804fc28   netfs_root_node
0804fc2c  00003805 R_386_COPY        0804fc2c   stderr
0804fc30  00005905 R_386_COPY        0804fc30   netfs_std_startup_argp

Relocation section '.rel.plt' at offset 0x165c contains 72 entries:
 Offset     Info    Type            Sym.Value  Sym. Name
0804fa34  00000307 R_386_JUMP_SLOT   00000000   netfs_startup
0804fa38  00000407 R_386_JUMP_SLOT   00000000   pthread_mutex_lock
0804fa3c  00000607 R_386_JUMP_SLOT   00000000   mmap64
0804fa40  00000707 R_386_JUMP_SLOT   00000000   netfs_init
0804fa44  00000807 R_386_JUMP_SLOT   00000000   ftp_conn_start_retriev
0804fa48  00000a07 R_386_JUMP_SLOT   00000000   _pthread_spin_lock
0804fa4c  00000b07 R_386_JUMP_SLOT   00000000   fflush
0804fa50  00000c07 R_386_JUMP_SLOT   00000000   malloc
0804fa54  00000f07 R_386_JUMP_SLOT   00000000   netfs_nput
0804fa58  00001007 R_386_JUMP_SLOT   00000000   fclose
0804fa5c  00001107 R_386_JUMP_SLOT   00000000   pthread_hurd_cond_wait
0804fa60  00001207 R_386_JUMP_SLOT   00000000   asprintf
0804fa64  00001407 R_386_JUMP_SLOT   00000000   hurd_ihash_destroy
0804fa68  00001507 R_386_JUMP_SLOT   00000000   ftp_conn_set_type
0804fa6c  00001607 R_386_JUMP_SLOT   00000000   munmap
0804fa70  00001707 R_386_JUMP_SLOT   00000000   hurd_ihash_init
0804fa74  00001807 R_386_JUMP_SLOT   00000000   bcopy
0804fa78  00001907 R_386_JUMP_SLOT   00000000   netfs_make_node
0804fa7c  00001c07 R_386_JUMP_SLOT   00000000   hstrerror
0804fa80  00001f07 R_386_JUMP_SLOT   00000000   error
0804fa84  00002407 R_386_JUMP_SLOT   00000000   ftp_conn_create
0804fa88  00002507 R_386_JUMP_SLOT   00000000   pthread_mutex_unlock
0804fa8c  00002807 R_386_JUMP_SLOT   00000000   ftp_conn_free
0804fa90  00002907 R_386_JUMP_SLOT   00000000   fshelp_access
0804fa94  00002b07 R_386_JUMP_SLOT   00000000   argp_failure
0804fa98  00003507 R_386_JUMP_SLOT   00000000   __gmon_start__
0804fa9c  00003a07 R_386_JUMP_SLOT   00000000   free
0804faa0  00003c07 R_386_JUMP_SLOT   00000000   __errno_location
0804faa4  00003e07 R_386_JUMP_SLOT   00000000   ftp_conn_get_names
0804faa8  00003f07 R_386_JUMP_SLOT   00000000   fshelp_isowner
0804faac  00004007 R_386_JUMP_SLOT   00000000   strcpy
0804fab0  00004107 R_386_JUMP_SLOT   00000000   pthread_cond_broadcast
0804fab4  00004207 R_386_JUMP_SLOT   00000000   fshelp_touch
0804fab8  00004607 R_386_JUMP_SLOT   00000000   netfs_nrele
0804fabc  00004907 R_386_JUMP_SLOT   00000000   getpid
0804fac0  00004b07 R_386_JUMP_SLOT   00000000   ftp_conn_append_name
0804fac4  00004c07 R_386_JUMP_SLOT   00000000   strchr
0804fac8  00004e07 R_386_JUMP_SLOT   00000000   argp_state_help
0804facc  00005007 R_386_JUMP_SLOT   00000000   hurd_ihash_locp_remove
0804fad0  00005207 R_386_JUMP_SLOT   00000000   netfs_server_loop
0804fad4  00005407 R_386_JUMP_SLOT   00000000   gethostbyname_r
0804fad8  00005507 R_386_JUMP_SLOT   00000000   strlen
0804fadc  00005607 R_386_JUMP_SLOT   00000000   strrchr
0804fae0  00005707 R_386_JUMP_SLOT   00000000   ftp_conn_finish_transf
0804fae4  00005807 R_386_JUMP_SLOT   00000000   stpcpy
0804fae8  00005c07 R_386_JUMP_SLOT   00000000   argp_error
0804faec  00005d07 R_386_JUMP_SLOT   00000000   snprintf
0804faf0  00005e07 R_386_JUMP_SLOT   00000000   maptime_map
0804faf4  00005f07 R_386_JUMP_SLOT   00000000   pthread_mutex_init
0804faf8  00006007 R_386_JUMP_SLOT   00000000   memset
0804fafc  00006307 R_386_JUMP_SLOT   00000000   __assert_fail
0804fb00  00006407 R_386_JUMP_SLOT   00000000   __strdup
0804fb04  00006507 R_386_JUMP_SLOT   00000000   pthread_cond_init
0804fb08  00006707 R_386_JUMP_SLOT   00000000   __libc_start_main
0804fb0c  00006807 R_386_JUMP_SLOT   00000000   strcmp
0804fb10  00006907 R_386_JUMP_SLOT   00000000   netfs_nref
0804fb14  00006a07 R_386_JUMP_SLOT   00000000   vm_allocate
0804fb18  00006b07 R_386_JUMP_SLOT   00000000   close
0804fb1c  00006c07 R_386_JUMP_SLOT   00000000   io_stat
0804fb20  00006e07 R_386_JUMP_SLOT   00000000   argz_add
0804fb24  00006f07 R_386_JUMP_SLOT   00000000   memchr
0804fb28  00007007 R_386_JUMP_SLOT   00000000   argp_parse
0804fb2c  00007207 R_386_JUMP_SLOT   00000000   read
0804fb30  00007407 R_386_JUMP_SLOT   00000000   ftp_conn_get_stats
0804fb34  00007507 R_386_JUMP_SLOT   00000000   calloc
0804fb38  00007607 R_386_JUMP_SLOT   00000000   hurd_ihash_add
0804fb3c  00007807 R_386_JUMP_SLOT   00000000   __strndup
0804fb40  00007907 R_386_JUMP_SLOT   00000000   fopen64
0804fb44  00007b07 R_386_JUMP_SLOT   00000000   task_get_special_port
0804fb48  00007c07 R_386_JUMP_SLOT   00000000   fprintf
0804fb4c  00007d07 R_386_JUMP_SLOT   00000000   strtol
0804fb50  00007e07 R_386_JUMP_SLOT   08049d50   ports_self_interrupted

The decoding of unwind sections for machine type Intel 80386 is not currently supported.

Notes at offset 0x00000140 with length 0x00000020:
  Owner                 Data size       Description
  GNU                  0x00000010       NT_GNU_ABI_TAG (ABI version tag)
    OS: Hurd, ABI: 0.0.0
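Dumps like these get long fast, so for comparing many binaries it can help to boil them down. Here is a minimal sketch that pulls just the library names out of the NEEDED entries of readelf -d output; the function name is made up for illustration.

```shell
# Sketch: print one shared-library name per NEEDED entry.
# stdin: output of `readelf -d <binary>`.
needed_libs() {
    sed -n 's/.*(NEEDED).*Shared library: \[\([^]]*\)].*/\1/p'
}
```

For instance, readelf -d hurd/ftpfs/ftpfs | needed_libs would reduce the dynamic section above to its eleven library names. Note that the ftpfs dump has no NEEDED entry for libpthread even though pthread_* symbols show up among its relocations, which is the kind of thing such a summary makes easy to spot.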

So on and so forth. A possible next step is to load stubs for Hurd libraries and gauge the results.

That’s all from me. Here are my final words.

Closing remarks

The Hurd is severely underrated. This wouldn’t be a problem if people didn’t propagate misconceptions and falsehoods regarding it, though it is evidently an emotionally charged issue for whatever reason.

I recommend people try out Debian GNU/Hurd on QEMU or VirtualBox and browse the wiki to get a taste of the Hurd’s offerings. Educating yourself about other operating systems never hurts.

The cross-toolchain I documented building was motivated by my own interest in peeking at a live, though non-functional, Hurd system that I can excavate and analyze from a foreign platform (GNU/Linux). How I will continue with this, I don’t quite know yet, but I hope this is of help to anyone embarking on a similar endeavor, or that you found this article interesting.