Storage and Identification of Cabalized Packages

Albert Y. C. Lai, trebla [at] vex [dot] net

This article describes where library packages for GHC are stored, how GHC remembers them, and corollaries. Cabal tries to abstract this away from you, but the abstraction leaks. You will run into problems. You may have already run into problems. You will need this information to solve problems. Ignorance is not bliss anymore. You will know. You are forced to know.

This article is Linux-centric. The Windows and MacOS stories will be covered in the future, but the only real difference is directory organization.

The title of this article is deliberately, naughtily contrived to admit the acronym SICP.

Global vs User

You have the choice of installing a package as either global or user; the choice is made per package. In most environments, global means system-wide directories, requiring escalated privilege to install and uninstall, and user means under your home directory, requiring only your own privilege to install and uninstall.

Exceptions are possible by cunning setups. In fact I use one: I own the suitable system-wide directories, and so global requires my privilege, not escalated privilege, to install and uninstall. Here is another exception, though I don't use it: the “system-wide directories” are configurable, and you may configure them to be under your home directory.

A choice is always made, even when you are unconscious. The choice affects storage, identification, and even whether the package is ignored or not. You cannot afford to be unconscious. When you are unconscious, here are the typical automatic choices, depending on how you install:

how                          choice  remarks
comes with GHC               global
comes with Haskell Platform  global  overridable if built from source
Setup.hs or Setup.lhs        global  see how it's different from the next
cabal install                user    see how it's different from the previous
Linux distro                 global
you write package.conf yourself with magnets, bananas, lenses, envelopes, and barbed wire: my little article here has nothing new to offer you

A point of the global vs user distinction is that global packages are not supposed to depend on user packages. (The other direction is fine.) So for example, when building a package to be installed as global, all user packages are momentarily ignored.

Storage

The pathnames of a package's files are derived from how the package is installed, the package name, the version, and which GHC version it is built for.

GHC version is needed because library files are sensitive to it.

Take for example a package called “HUnit”, version 1.2.2.1, built for GHC 6.12.3. Let prefix be the following directory depending on how you install a package:

case                      prefix
user                      $HOME/.cabal
global from Linux distro  /usr
global from GHC           N/A, see below
global otherwise          /usr/local

Then the package's files are stored in:

file type                               directory
library files (*.hi, *.a, *.so, *.lib)  prefix/lib/HUnit-1.2.2.1/ghc-6.12.3
data files                              prefix/share/HUnit-1.2.2.1
license, docs                           prefix/share/doc/HUnit-1.2.2.1
executables                             prefix/bin
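The derivation above can be sketched in shell. Everything below is illustrative: the prefix assumes a user install, and the package name, version, and GHC version are this article's running example.

```shell
# Sketch: derive storage paths for a hypothetical user install of
# HUnit-1.2.2.1 built for GHC 6.12.3 (all values are illustrative).
prefix="$HOME/.cabal"        # user install; a typical global install uses /usr/local
pkg="HUnit-1.2.2.1"
ghcver="ghc-6.12.3"

echo "library files: $prefix/lib/$pkg/$ghcver"
echo "data files:    $prefix/share/$pkg"
echo "license, docs: $prefix/share/doc/$pkg"
echo "executables:   $prefix/bin"
```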

Packages that come with GHC, together with GHC itself, are stored a bit differently. Using GHC 6.12.3, array-0.3.0.1 for example:

file type                               directory
library files (*.hi, *.a, *.so, *.lib)  /usr/local/lib/ghc-6.12.3/array-0.3.0.1
data files                              /usr/local/lib/ghc-6.12.3
license, docs                           /usr/local/share/doc/ghc/html/libraries/array-0.3.0.1
executables                             /usr/local/bin

Change /usr/local to /usr if you obtain GHC from your Linux distro.

Deviations are always possible because there are a million configuration knobs. At the end of the day, it is OK because GHC keeps metadata to record full path names, per package, and per file type. See the next section.

Identification

GHC keeps metadata to identify what packages are installed and where; it does not enumerate directory contents to find packages, contrary to popular belief. If the metadata does not record a package, then the package is not installed, end of story; file existence is irrelevant. And if the metadata does record a package, then the package is installed; deleting files does not make it uninstalled.

Moreover, GHC identifies only packages containing libraries, since GHC needs only the libraries. For example, alex is an executable-only package, and GHC does not identify it.

Cabal does not keep the metadata. Cabal calls GHC to get and set the metadata. (You can too.) As an easy corollary, no one tracks executable-only packages such as alex (unless your Linux distro tracks them).

Use the command ghc-pkg list to see a summary of the metadata. It lists which packages, in which versions, are installed. It is actually two lists: one of packages installed as global, and one of those installed as user.

Use the command ghc-pkg describe to see the detailed record of a package, e.g., ghc-pkg describe network for the package “network”. The breadth of the metadata is impressive. They include the locations of important files of the package (except executables). Of particular interest in this article are:

id: network-2.2.1.7-ea4c90e3415be421311952f340195f0d
depends: base-4.2.0.2-5fc3ebcb886ceae9a06b0bab7e8d4680
         parsec-2.1.0.1-ea096577115d95b0cfde0225e2011564

You may also use ghc-pkg field network id and ghc-pkg field network depends to see just those two pieces.

The id has the package name, the version, and, since GHC 6.12.*, a long hexadecimal number. (Your long hexadecimal numbers may be different.) The depends field also uses such long ids of other packages. The long hexadecimal number is a hash computed from the ABI of the package, which means the *.hi files of exposed modules; this is described in the next section, but to preview: they contain exported things, and more besides, such as, paradoxically, some imported things too.
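Since the id always has the form name-version-hash, the hash is simply the text after the last dash. A minimal shell sketch, using the example id above:

```shell
# Sketch: split an installed-package id into name-version and ABI hash.
id="network-2.2.1.7-ea4c90e3415be421311952f340195f0d"
hash="${id##*-}"      # strip everything up to and including the last dash
namever="${id%-*}"    # strip the last dash and the hash
echo "$namever"       # prints network-2.2.1.7
echo "$hash"          # prints ea4c90e3415be421311952f340195f0d
```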

Lastly, the locations of the metadata are also listed in ghc-pkg list . Typically, assuming GHC version 6.12.3:

The global metadata are in one of (depending on where you got GHC):

/usr/local/lib/ghc-6.12.3/package.conf.d

/usr/lib/ghc-6.12.3/package.conf.d

/var/lib/ghc-6.12.3/package.conf.d

/etc/ghc-6.12.3/package.conf.d

The user metadata are in $HOME/.ghc/arch-6.12.3/package.conf.d, where arch depends on your computer platform, e.g., arch = i386-linux for 32-bit x86 Linux.
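Assembling that path in shell; the arch here is illustrative, and the real path shows up in the output of ghc-pkg list anyway.

```shell
# Sketch: the user database path is built from platform and GHC version.
arch="i386-linux"     # illustrative; yours may differ
ghcver="6.12.3"
userdb="$HOME/.ghc/$arch-$ghcver/package.conf.d"
echo "$userdb"
```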

ABI Hash

Since GHC 6.12.*, every installed package is assigned a long hexadecimal number for unique identification beyond name and version; the three together form the id of the package. The long number is a cryptographic hash of the *.hi files of the exposed modules. The *.hi files define compatibility at the binary level, or ABI level, and therefore the hash reflects it probabilistically.

For example, if you have two instances of package X version 5.0 installed (one global, one user), and their id's are respectively

X-5.0-12cb3a153b7c612a838aff6c3ebaf767
X-5.0-3c7ab2fe167671b185e18f065db9d369

then the two are definitely not interchangeable. This is why when another package Y depends on X, Cabal chooses one X instance only and records the full id of the choice made, so later GHC can use the record for sanity checks.

Conversely, if the two X instances are both

X-5.0-12cb3a153b7c612a838aff6c3ebaf767
X-5.0-12cb3a153b7c612a838aff6c3ebaf767

then they are highly likely interchangeable, and interchanging them usually causes no problems.
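The interchangeability test is therefore exact equality of full ids, hash included. A sketch, with the made-up ids from the first example:

```shell
# Sketch: two instances are interchangeable only if their full ids match.
x1="X-5.0-12cb3a153b7c612a838aff6c3ebaf767"
x2="X-5.0-3c7ab2fe167671b185e18f065db9d369"
if [ "$x1" = "$x2" ]; then
  echo "interchangeable"
else
  echo "not interchangeable"   # same name and version are not enough
fi
```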

The question now is what *.hi files contain, and why the same package X-5.0 — the very same source code, compiled by the same compiler too — can possibly lead to different *.hi files and hashes. It is even more puzzling if you already know that *.hi files say what is exported, and reason that exporting the same names and types should lead to identical *.hi files and hashes.

This understanding of *.hi files is adequate when optimizations are turned off. But things get interesting when optimizations are turned on; indeed Cabal turns on -O by default, and some packages further specify -O2 .

Inlining code across module boundaries and even package boundaries is absolutely necessary to trigger much-needed optimizations such as fusion and deforestation. Famous high-performance packages such as bytestring totally rely on it (it also specifies -O2 ). How do you inline code across modules and still have separate compilation? By putting actual code, not just names and types, into *.hi files.

Now that internal code of modules also appears in *.hi files, we have two slippery slopes.

First slippery slope. Suppose package X depends on W; then by transitive inlining, some W's internal code appears in X's *.hi files. Building the same X version against different versions of W implies different X's hashes. But it gets better. Suppose X depends on W, and W depends on V; then even some V's internal code may appear in X's *.hi files too. So even if you fix X's version and W's version, X's hash may still vary just by varying V's version.

Second slippery slope. Now that code appears in *.hi files, you don't even have to vary package versions. Tweaking individual optimization flags already changes generated code and affects *.hi files and hashes.

Treacherous, eh? Many elusive corollaries ensue and are described in later sections.

Beyond Global vs User

So far I have covered only two metadata databases: global and user. More databases are supported by adding command line options, and it is how sandboxing works. So, you could do sandboxing by hand too.

A database can be in one of two formats:

A text file. You initialize it by

echo '[]' > /database1   # or equivalent

In general, it contains a list literal of package metadata.

A directory. You initialize it by

mkdir /database1   # or other ways to make it exist and empty
ghc-pkg --package-db=/database1 recache

In general, it contains text files of package metadata and a binary file package.cache caching them all. This is the format used by the global and the user databases.
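Both initializations together, sketched with example paths under a temporary directory; the recache step needs a real GHC, so it is left as a comment.

```shell
# Sketch: initialize an extra database in each of the two formats.
base=$(mktemp -d)

# Text-file format: a list literal of package metadata, initially empty.
echo '[]' > "$base/database-text"

# Directory format: an empty directory, then let ghc-pkg build its cache.
mkdir "$base/database-dir"
# ghc-pkg --package-db="$base/database-dir" recache   # requires GHC installed
```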

Suppose you have an extra database /database1 (it can be a text file or a directory, as said), then the command line options to use it are:

cabal install/configure  --package-db=/database1 --prefix=/files1
ghc/ghci                 -package-db=/database1
ghc-pkg                  --package-db=/database1

or

ghc-pkg                  --global --user --package-db=/database1

depending on whether you want to include global and user, but beware of this bug

There is no typo in the table, it is really one dash in one case and two dashes in others. Don't you love gratuitous inconsistencies? (Even better, in GHC versions 7.4 and before, it was package-conf not package-db . This means 3 forms to remember because cabal has always used --package-db . Just be thankful that not all 4 combinations were used.)

When using an extra database, I recommend a custom --prefix to go with it. Without it, cabal would put files in $HOME/.cabal, which is unlikely to be what you want when you use an extra database. But it's your call.

You can stack up more extra databases, like this:

cabal install/configure  --package-db=/database1 --package-db=/database2 --prefix=/files2
ghc/ghci                 -package-db=/database1 -package-db=/database2
ghc-pkg                  --package-db=/database1 --package-db=/database2

or

ghc-pkg                  --global --user --package-db=/database1 --package-db=/database2

Priority goes backwards: /database2 has the highest priority (and is also where cabal registers packages), then /database1, then implicitly user, then implicitly global. (Except that ghc-pkg does not do implicit user or global like everybody else, except that sometimes it does. Don't you love gratuitous inconsistencies? If you are confused here, it is not your fault.) In general, later in the command line means higher priority. This is how overlaps are resolved (e.g., when two databases have HUnit-1.1).
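The overlap rule can be modelled as "last provider wins" over the stack, scanned from lowest to highest priority. The database labels and ids below are made up for illustration:

```shell
# Sketch: resolve an overlap on HUnit-1.1 by scanning the stack in order
# from lowest to highest priority; the last database providing it wins.
stack="global:HUnit-1.1-aaaa /database1:HUnit-1.1-bbbb"
winner=""
for entry in $stack; do
  case "$entry" in
    *:HUnit-1.1-*) winner="$entry" ;;   # later entries overwrite earlier ones
  esac
done
echo "$winner"   # prints /database1:HUnit-1.1-bbbb
```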

If you want a non-standard priority, here is an example: user highest, then /database1, then global:

cabal install/configure  --package-db=clear --package-db=global --package-db=/database1 --package-db=user
ghc/ghci                 -clear-package-db -global-package-db -package-db=/database1 -user-package-db
ghc-pkg                  --global --package-db=/database1 --user

--package-db=clear and -clear-package-db both mean “clear the stack, build it afresh from subsequent options”. Then we can control the stack order explicitly.

Some people want to disable user when sandboxing; apparently all authors of common sandboxers do, not just for themselves but also for everyone. (I don't understand why. Perhaps they assume that everyone makes user a mess, therefore it is to be avoided. (Ah but, how is a messy sandbox any better?) But I assume that you, after reading this article, are judicious with user, so that it is reusable and useful.) If you want to disable, here is an example:

cabal install/configure  --package-db=clear --package-db=global --package-db=/database1
ghc/ghci                 -clear-package-db -global-package-db -package-db=/database1
ghc-pkg                  --global --package-db=/database1

For ghc/ghci there is a shortcut:

ghc/ghci                 -no-user-package-db -package-db=/database1

It is also possible to disable global similarly, but impractical. GHC's foundational packages are there.

All these custom databases and orders could also be specified to GHC by the environment variable GHC_PACKAGE_PATH. However, cabal obstinately refuses to cooperate with it.

You can read more in GHC User's Guide, this section.

Corollary: Removing Packages

To remove a package, the most important step is using ghc-pkg unregister to update the metadata; deleting files is of secondary concern only. If a package should not be removed (yet) because other packages depend on it, ghc-pkg unregister informs you and refuses. So it is really important not to delete files first.

(If other packages depend on the package you want removed, you have the choice of giving up or removing those other packages. The latter requires you to compute and execute the transitive closure by hand.)

Here is an example. I remove a package called “binary-search” version 0.0. It was built for GHC 6.12.3 and installed as user.

ghc-pkg unregister binary-search-0.0

(But beware of this bug.)

rm -rf $HOME/.cabal/lib/binary-search-0.0/ghc-6.12.3

Perhaps you also plan to:

rm -rf $HOME/.cabal/lib/binary-search-0.0
rm -rf $HOME/.cabal/share/doc/binary-search-0.0

You may proceed if you have only one GHC version, or if your other GHC versions don't register binary-search-0.0.
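The same plan, written as a dry-run script that only echoes the commands instead of running them; the package name, GHC version, and paths are this example's.

```shell
# Sketch: a dry-run removal plan; echo each command instead of running it.
pkg="binary-search-0.0"
ghcver="ghc-6.12.3"
echo "ghc-pkg unregister $pkg"
echo "rm -rf \$HOME/.cabal/lib/$pkg/$ghcver"
echo "rm -rf \$HOME/.cabal/lib/$pkg"
echo "rm -rf \$HOME/.cabal/share/doc/$pkg"
```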

Sometimes your metadata are messed up — but thank God it happens that the mess is confined to the user packages — and you want to erase all user packages for a clean restart. Some people think rm -rf $HOME/.cabal will do. This is insufficient and unnecessary: insufficient because it fails to erase the metadata mess, and unnecessary because, if you plan to re-install the same packages anyway, you are not permanently freeing any disk space. The necessary and sufficient condition is

rm -rf $HOME/.ghc/arch-ghcversion

where arch depends on your computer platform and ghcversion depends on your GHC version; but you can use an easy ls to find out.

Corollary: cabal install as root

sudo cabal install and other ways of running cabal install as root do the opposite of what many people presume without checking. To see this, recall that the global/user choice is user because you do not say --global . That user just happens to be an account called “root” with $HOME being “/root”. It follows that:

            presumption                          reality
storage     /usr/local/bin, /usr/local/lib, …    /root/.cabal/bin, /root/.cabal/lib, …
metadata    global list:                         user list:
            /usr/local/lib/ghc-*/package.conf.d  /root/.ghc/*/package.conf.d
conclusion  system-wide                          root-only

Therefore, cabal install as root is pretty useless in practice.

If you want system-wide installs, the desired way is, as non-root, cabal install --global --root-cmd=sudo (or replace sudo by your favourite escalation command):

run as non-root to minimize privilege escalation

--root-cmd=sudo to say how to escalate privilege when cabal needs it

--global to explicitly choose global

Corollary: The Pigeon Drop Con



(by typoclass in IRC #haskell)

Imagine hypothetical packages conman-1.1, moneyholder-1.1, and pigeon-1.1 installed, where pigeon depends on moneyholder, and moneyholder depends on conman.

Some time later, the newer conman-1.2 comes out and you add it ( cabal install conman-1.2 ) because you are cavalier with “upgrading”. (Because of the dependence on ABI hashes, there can be no real upgrading, there can only be piling up more versions and confusion.)

Some more time later, for one reason or another, you re-install moneyholder-1.1. There are usually three reasons: you feel like doing it out of the blue; you do it in desperate hope (and in vain) of solving some package problem; or more normally, you add package swapper ( cabal install swapper ), and it depends on conman and moneyholder.

Then cabal-install reasons like this: you need conman, let's prefer the latest greatest conman-1.2; you also need moneyholder-1.1, let's re-build it against conman-1.2. But the new build is an ABI hash change! Your pigeon-1.1 will be hosed. It depends on moneyholder-1.1-oldhash, which will have vanished. You will only have moneyholder-1.1-newhash around. Or moneyholder-1.1-nocash if you get my analogy (pigeon drop).

Therefore, modern cabal-install aborts the whole operation and warns you that it would break pigeon-1.1. This safety guard was added after this article explicated the problem to the public. Early versions of cabal-install went cavalier with the re-install and broke pigeon quietly.

If you disregard the breakage and hammer on with --force-reinstalls , in most cases you can use cabal install pigeon to repair it. Except…

Except there are two further problems in some cases. The first problem is perpetual version war described in Chris Smith's article The Butterfly Effect. The second problem is that sometimes pigeon-1.1 simply cannot be re-built.

Here is a concrete example, starring GHC 6.12.x, array-0.3.0.1, containers-0.3.0.0, ghc-6.12.x (“GHC API”, “GHC as library”), and QuickCheck>=2.1.0.3.

Since array-0.3.0.1 is “old”, you wantonly upgrade to the “bleeding edge” array-0.3.0.2 like it is a ground-breaking change. (It will be an earth-shattering disaster.) Since containers depends on array, next time you do something related to containers, for example cabal install binary which depends on containers, cabal-install will re-build containers-0.3.0.0 against array-0.3.0.2. ABI hash changes!

Now who are hosed when containers-0.3.0.0 is changed? Answer: Too many to list; but notably ghc-6.12.x depends on containers-0.3.0.0, so it is hosed; and QuickCheck depends on ghc-6.12.x if you use GHC 6.12.x, so it is also hosed. (This does not happen on GHC 7.) When ordinary packages are hosed, you expect some future event to trigger a re-build; but when ghc-6.12.x is hosed, it cannot be re-built: it is intimately tied to GHC, and it is not even on Hackage. There is nothing cabal-install can do about it.

To see the full problem report, use ghc -v ( ghc-pkg check does not notice any problem):

package Cabal-1.8.0.6-0fa5fba8bc5459391e6ec30b2b2ff632 is unusable due to missing or recursive dependencies:
  containers-0.3.0.0-ee442470d8dcc9e45f31677c400c5379
[...]
package ghc-6.12.3-1d98765af6d253e91dfb24129b4e20b4 is unusable due to missing or recursive dependencies:
  Cabal-1.8.0.6-0fa5fba8bc5459391e6ec30b2b2ff632
  bin-package-db-0.0.0.0-0dffb74a73bb78b5dc02ca941bbcbea0
  containers-0.3.0.0-ee442470d8dcc9e45f31677c400c5379
  hpc-0.5.0.5-3f3ed89da2117953d6ef3acc2332a32b
  template-haskell-2.4.0.1-bf08798b1934e4d6a3f903f58e0d5159
[...]

Do not wantonly upgrade packages; not piecemeal. If you upgrade, upgrade a whole fleet of extra packages (those not included in GHC) in sync as one single transaction, but still do not upgrade packages included in GHC unless it is part of upgrading GHC. I personally only upgrade at Haskell Platform release points.

Corollary: GHC 6.12.1 Bug

ABI hashes were introduced in GHC 6.12.* as a safety measure, so that, worst comes to worst, a package is hosed and GHC refuses to produce an executable, rather than agreeing to produce an inconsistent executable that will later crash and launch missiles. Except that the implementation in 6.12.1 has a funny bug. It also involves global vs user.

When you have two instances of the same package and same version installed, say X-5, the correct default behaviour of GHC is to pick the user instance and shadow the global instance. The bug in 6.12.1: pick the instance whose hash value is bigger. Example:

xmonad-contrib-0.9.1-4570c2899a5e9e70faf5054ed78d5702
xmonad-contrib-0.9.1-16a4fe3d427a319d7ad3405c264f50f4

The first instance is picked because its hash is bigger, even if it is global.
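The buggy tie-break amounts to a string comparison of the hash parts. A sketch, modelling it portably with sort (hashes taken from the example above):

```shell
# Sketch: GHC 6.12.1's buggy tie-break, modelled as "pick the bigger hash
# string", ignoring the global vs user distinction entirely.
global_hash="4570c2899a5e9e70faf5054ed78d5702"
user_hash="16a4fe3d427a319d7ad3405c264f50f4"
bigger=$(printf '%s\n%s\n' "$global_hash" "$user_hash" | sort | tail -n 1)
if [ "$bigger" = "$global_hash" ]; then
  echo "picked: global"    # the bigger hash wins, even though user should
else
  echo "picked: user"
fi
```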

What's the problem with it? One single package does not show any problem; the problem always shows with a combination of packages. Below is a true story with true hashes. The victim used cabal install to install xmonad-contrib as user and got this:

id: xmonad-0.9.1-ef38b1d022aeba8679b59386e2bee835

id: xmonad-contrib-0.9.1-16a4fe3d427a319d7ad3405c264f50f4
depends: xmonad-0.9.1-ef38b1d022aeba8679b59386e2bee835 ...

The victim also used apt-get install to get xmonad-contrib from the Ubuntu 10.10 repo (probably out of curiosity or confusion), which became this as global:

id: xmonad-0.9.1-0eef453625fbb4f7d689ad94e41d456e

id: xmonad-contrib-0.9.1-4570c2899a5e9e70faf5054ed78d5702
depends: xmonad-0.9.1-0eef453625fbb4f7d689ad94e41d456e ...

Now GHC 6.12.1 goes for the bigger hashes and picks xmonad-user and xmonad-contrib-global. xmonad-contrib-global needs xmonad-global but that's shadowed. This choice combination is unusable. GHC concludes by declaring xmonad-contrib not found. The output of ghc -v explains:

package xmonad-0.9.1-0eef453625fbb4f7d689ad94e41d456e is shadowed by package xmonad-0.9.1-ef38b1d022aeba8679b59386e2bee835
package xmonad-contrib-0.9.1-16a4fe3d427a319d7ad3405c264f50f4 is shadowed by package xmonad-contrib-0.9.1-4570c2899a5e9e70faf5054ed78d5702
package xmonad-contrib-0.9.1-4570c2899a5e9e70faf5054ed78d5702 is unusable due to missing or recursive dependencies:
  xmonad-0.9.1-0eef453625fbb4f7d689ad94e41d456e

The victim used GHC 6.12.1 because that's also in the Ubuntu 10.10 repo, which is really sad because Ubuntu 10.10 is already current at the time of writing (released just a month ago).

In retrospect, probably a lot of mysterious package problems reported in IRC #haskell in the past were also of this kind, since they also involved apt-get from Ubuntu, and they occurred during the time of Ubuntu 10.04, which also used GHC 6.12.1 (reasonably in this case).

The solution is to switch to GHC 6.12.3 and learn this lesson as one more reason not to use most Linux distro's GHC and related packages. Sadly most Linux distro's update cycles lag far behind GHC's debug cycle, so sticking to the distro does not buy you any reliability — far from it, you get unusability.

Corollary: unsafeInterleaveInstall

A popular piece of advice suggests looking for Haskell packages in your Linux distro first, and using cabal install only for those not there. That is the most harmful advice ever. Even without the bug in the previous section, there are unsafe interleavings of the two kinds of installs.

Example: First, you want maccatcher, which is not in your Linux distro, and you use cabal install . This depends on binary, which you don't have yet, and cabal install grabs that too. The result is like this in user:

id: binary-0.5.0.2-300339c66a688207241e4643a9e17721

id: maccatcher-1.0.0-909ec4708b8344b205cdd15ddd3280f2
depends: binary-0.5.0.2-300339c66a688207241e4643a9e17721 ...

Next, you also want Agda, and (following popular harmful advice) you find and use libghc6-agda-dev by apt-get install . Now this also depends on binary, or more precisely libghc6-binary-dev, and apt-get install happily grabs that too. The result is like this in global:

id: binary-0.5.0.2-32d59ff8fdfc79aa888e82997612374d

id: Agda-2.2.6-8c324824d5e0f9333c0deb2268ef7952
depends: binary-0.5.0.2-32d59ff8fdfc79aa888e82997612374d ...

This renders your shiny new Agda unusable: it needs binary-0.5.0.2-global, but whenever GHC starts, that's ignored and binary-0.5.0.2-user is chosen. The output of ghc -v shows:

package Agda-2.2.6-8c324824d5e0f9333c0deb2268ef7952 is unusable due to missing or recursive dependencies:
  binary-0.5.0.2-32d59ff8fdfc79aa888e82997612374d
package binary-0.5.0.2-32d59ff8fdfc79aa888e82997612374d is shadowed by package binary-0.5.0.2-300339c66a688207241e4643a9e17721

This is the harm of the popular advice. Usually, it gets even better: as you find that Agda doesn't work, you panic and interleave more cabal install and apt-get install, widening your scope to other packages and adding --reinstall and --force-reinstalls to the mix, weaving an ever more convoluted mess.

Exactly the same problem arises if you interleave cabal install --user and cabal install --global , since up to this point the working principle is getting a package installed as user first and then again as global.

Some people try to salvage the harmful advice by reasoning that interleaving cabal install --global and apt-get install avoids the problem and saves the part “try the distro first”. Ideally this is safer — in the sole sense that both installers eventually call ghc-pkg --global register to modify package metadata, which has a safety check against multiple package instances, and so the chronologically second installer aborts rather than adding damage. Except…

Fedora, Debian, and Ubuntu packages for GHC libraries do not call ghc-pkg --global register . They modify the metadata themselves and completely circumvent the safety check. So you can still get two instances of binary-0.5.0.2, and you are just as hosed. This one is particularly treacherous.

There is no hope trying to mix distro installation with cabal installation. The distro installer assumes it has the monopoly; cabal assumes there is no monopoly. They are fundamentally in contradiction.

The only safe ways to use the distro and cabal-install are:

Designate two points of time A,B such that: before A, obtain from the distro exclusively; between A and B, cabal install --global exclusively; after B, cabal install --user exclusively. Never interleave the three.

There is a safe way to interleave. Defeat the whole point of automatic dependency chasing, and chase dependency yourself. You manually examine the list of packages to be brought in transitively to ensure there will be no duplication, and you manually tune packages if you foresee duplication.
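The manual duplication check in that second option can be as simple as comparing the incoming package lists for a common name-version. The lists below are made-up illustrations of what the two installers would bring in:

```shell
# Sketch: detect name-version duplicates between two installers' package
# lists before letting either of them loose (lists are illustrative).
cabal_pkgs="binary-0.5.0.2
maccatcher-1.0.0"
distro_pkgs="binary-0.5.0.2
Agda-2.2.6"
printf '%s\n%s\n' "$cabal_pkgs" "$distro_pkgs" | sort | uniq -d
# prints binary-0.5.0.2: installing it via both routes would leave one
# instance shadowed, exactly the Agda scenario above
```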

I have more Haskell Notes and Examples