When you see a CVS commit message like this:

You really ought to wonder (and drool) about the possible goodies waiting for you in the upcoming OpenBSD 4.7 release. Marc Espie took the time to let us know about some of the changes and enhancements to the package system.

The program that creates "browsable" man pages on www.OpenBSD.org has a small problem with the file/page names of the original perl pod files, so they're not available in the typical "pretty format" for web browsers. Your options are to read the pod files without markup on the web, or better, install -current and read them on your own system.

Marc Espie (espie@)



The pkg_add changes leading to OpenBSD 4.7 started a little bit before 4.6 came out. I was playing with changes at that time, but Theo changed the schedule of the 4.6 freeze to accommodate the network hackathon, and I decided to sit on it, as it was not ready at the time.

Back then, pkg_add handled updates in a very simple way: an old package, and a new package of the same name. To update the old package, it would extract the new package, delete the old files, and move the new files to their final location. I knew this was totally incomplete and did not deal with a lot of weird scenarios.

Adding update capabilities had been done in the OpenBSD way. First, replace the old pkgtools with the new perl version. Provide the required annotations so that eventually, we would be able to deal with updates (mostly shared libs handling), then allow for single package replacement, and finally add a pass that would discover how to replace each package.

Enter the notion of UpdateSets. I knew that in many cases, updating a single package at a time was not enough. I would have to update several packages at the same time. For instance, some older kdelibs update would move files from kdelibs to kdebase. There was no sane way to deal with that without updating both kdelibs and kdebase at the same time. Up to 4.6, pkg_add was cheating, by temporarily removing kdebase, updating kdelibs, and putting kdebase back in.

The second problem was exceptional situations. At times, packages would be renamed, and pkg_add would not be able to find anything without user help. There was a long-term plan for that: a special package, known as quirks, that could get wedged into most of pkg_add's computations. This would be a bit more perl code, with data mixed in. And this would evolve independently from the main tools. (Having perl code meant I could change its internals as I wanted and make it very compact if need be. And I could also easily change the API without breaking anything.)

But in order to do that, I would need to handle quirks first and foremost. Discover there might be a new one, update the old one, and THEN do the rest.

This meant that I had to have incremental updates. Instead of discovering all updates and then proceeding, I would have to do updates on the fly.

So, there were quite a few changes to do to the overall structure of pkg_add: create updatesets with "old packages" for everything I would want to update, and ask the Update module what should get updated to what in an incremental way.
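(Ed: the incremental scheme described above can be pictured roughly as follows. This is a minimal Python sketch, not pkg_add's actual perl code. The names `incremental_update`, `find_update` and `apply_update`, and the dictionary layout of an updateset, are all made up for illustration.)

```python
from collections import deque


def incremental_update(installed, find_update, apply_update):
    """Process one updateset at a time, handling "quirks" first.

    find_update and apply_update are hypothetical callbacks standing in
    for the discovery step and for the actual replacement of a package.
    """
    # sorted() is stable and False sorts before True, so "quirks" moves
    # to the front while every other package keeps its relative order.
    ordered = sorted(installed, key=lambda name: name != "quirks")

    # Each updateset starts with only its "old" side filled in.
    todo = deque({"old": [name], "new": []} for name in ordered)
    done = []
    while todo:
        uset = todo.popleft()
        # Discovery happens per set, right before the set is applied,
        # instead of for every package up front.
        uset["new"] = [find_update(old) for old in uset["old"]]
        apply_update(uset)
        done.append(uset)
    return done
```

The point of the ordering is only that whatever exceptions quirks carries are in place before any other updateset is looked at.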

Eventually, this started working. And then, things got deeply complicated. At that point, I started figuring out that updatesets could get more complex, as some packages would need to have several new packages as a replacement, or as files would move, I would have to pull two updatesets together, and figure out an update for both.

I got something nice "for free": quirks could know about old ports that got folded as part of the base system, and it would automatically remove the old port. That part was real easy to do.

Enter libfam. As you might know, libfam is a file alteration monitoring library, initially from SGI. And it recently got superseded by libgamin. This would be great, except that the name did change, and so it needed a special exception in quirks.

Enter dependencies. In the normal case, you have a package A that depends on libfam. If you update libfam to gamin, we used to cheat a bit (we said "oh, let's update A as well, but we'll deal with it later"). With new and shiny updatesets, the idea was to update both. Oh, old A depends on libfam, so let's create an updateset that contains both old A and libfam, and we'll update it to gamin and new A.

Turned out to be way more fun than anticipated. Because there were lots of packages that depended on libfam. And a lot of them had extra dependencies, and I would end up with endless loops in updatesets.

After a bit of head-scratching, I solved the looping problem. I would have to merge things with weird inter-dependencies... and end up with a jumbo updateset with ~50 packages. Yes, libfam to libgamin was THAT bad. On a machine fully loaded with gnome and kde, you would have to update 50 packages in one go.
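(Ed: one way to picture the merging of updatesets is as a grouping of packages that are tied together. The sketch below uses a standard union-find structure in Python. It is an illustration under that assumption, not the algorithm pkg_add uses, and `merge_updatesets` and its `tied` argument are hypothetical.)

```python
def merge_updatesets(packages, tied):
    """Group packages into updatesets.

    tied is a list of (a, b) pairs meaning "a and b must be updated in
    the same set". Circular ties are harmless: they just end up in the
    same group, so nothing can loop.
    """
    parent = {p: p for p in packages}

    def find(p):
        # Follow parent links to the representative of p's group,
        # shortening the path along the way.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for a, b in tied:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = {}
    for p in packages:
        groups.setdefault(find(p), set()).add(p)
    return sorted(sorted(group) for group in groups.values())
```

With a package that many others depend on, such as libfam, every dependent gets pulled into the same group, which is how a single rename can snowball into one jumbo set.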

All would be well, except it was unbearably slooow. Computing things would take several hours on a fast machine.

So back to the drawing board. By the way, I have to recommend Devel::NYTProf, it is the best profiler I've ever used, and it really helped in there.

Turns out my code was spending a huge amount of time recomputing inter-dependencies. So, I had to revamp the dependency solver, and the pkgspec handler. (Ed: see the man pages for OpenBSD::RequiredBy and OpenBSD::PkgSpec)

As far as pkgspecs go, it was mostly a question of objects with no optimizations. I would create maybe 6000 pkgspecs/pkgnames, most of them duplicates of existing ones. So, applying the standard tricks of recognizing duplicates and creating unique objects would help a great deal. In fact, this is the biggest contribution to making pkg_add go faster in 4.7 (it did not help that I had a bug in my regexp code that would totally disable an optimization in there... the profiler directed me straight to that bug). (Ed: see man page for OpenBSD::PackageName)
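(Ed: the "recognize duplicates, create unique objects" trick is often called interning. Here is a minimal Python sketch of the idea. The `PkgSpec` class below is a made-up stand-in, and its parsing is deliberately naive; it does not reflect how OpenBSD::PkgSpec or OpenBSD::PackageName work.)

```python
class PkgSpec:
    """Hypothetical stand-in for a parsed package name or spec."""

    _cache = {}   # text -> the one shared object for that text
    parsed = 0    # how many times the (expensive) parsing really ran

    def __init__(self, text):
        PkgSpec.parsed += 1
        # Naive parsing for illustration only: split at the last dash.
        self.stem, _, self.version = text.rpartition("-")

    @classmethod
    def intern(cls, text):
        """Return the shared object for text, parsing it at most once."""
        spec = cls._cache.get(text)
        if spec is None:
            spec = cls._cache[text] = cls(text)
        return spec
```

Asking for the same string thousands of times then costs one parse plus dictionary lookups, and the resulting objects can be compared by identity.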

As far as the solver went, I would have to do things incrementally. So the solver was almost entirely rewritten to do things that way. There were some fun bugs during the rewrite, all of them leading to loops in the way pkg_add did things. So, at that point, I had two new features working:

pkg_add would deal with bizarre update scenarios, merging updatesets as needed, and no longer cheating with anything.

updates would be incremental. Instead of spending two hours discovering things, and then starting to work, pkg_add would start updating things almost right away.

The unexpected benefit was that I had to optimize a lot of code for fairly infrequent scenarios (I don't expect a lot of fam -> gamin scenarios to creep in), but this optimization would be useful in every case. pkg_add is a lot faster now. You won't notice it in many cases, since it still needs to actually DO work, like fetching packages and extracting files, but the part where it sits around, apparently doing nothing, is almost totally gone now.

One thing you will definitely notice is tied updates. All database users will see postgresql-client/server updating together (since those are usually tied), and mysql as well. What's very cool is that there is no special-case code for these. The engine can deal with an amazing amount of shit.

That's not all that changed for 4.7. With the incremental updates, a failed pkg_add became less painful, but still a bit annoying. Haven't you ever whined at pkg_add stopping half-way through because of a problem on one single package? Making pkg_add way more fault-tolerant became very important. There are actually yet more changes (mostly to vstat.pm) to transform fatal errors into non-fatal errors. If an update fails, pkg_add will just mark the UpdateSet as failed, and keep going on other stuff it can still deal with (of course, the update failure may trigger other problems, but it will update everything it can, and that's usually very useful).
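(Ed: the "mark it as failed and keep going" behaviour boils down to catching the error per updateset instead of letting it abort the whole run. A minimal Python sketch, with hypothetical names `run_updates` and `apply_update`; the real changes live in pkg_add's perl modules.)

```python
def run_updates(update_sets, apply_update):
    """Apply every updateset, recording failures instead of aborting.

    apply_update is a hypothetical callback that raises on error.
    """
    succeeded, failed = [], []
    for uset in update_sets:
        try:
            apply_update(uset)
        except Exception as err:
            # What used to be a fatal error becomes a note on this set.
            failed.append((uset, str(err)))
            continue
        succeeded.append(uset)
    return succeeded, failed
```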

There was also that old question of version numbering. pkg_add now actually knows about version number orders, and will no longer downgrade anything (unless you explicitly ask for it, of course).
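(Ed: to see why version ordering needs real knowledge of version numbers, here is a small Python sketch. It only handles plain dotted numbers, which is an assumption made for brevity: actual OpenBSD package versions have more parts than that. `version_key` and `should_update` are made-up names.)

```python
def version_key(version):
    """Turn "4.10" into (4, 10) so that comparison is numeric."""
    return tuple(int(part) for part in version.split("."))


def should_update(installed, candidate, allow_downgrade=False):
    """Accept the candidate only if it is newer, unless told otherwise."""
    if allow_downgrade:
        return candidate != installed
    return version_key(candidate) > version_key(installed)
```

Comparing the raw strings would rank "4.10" below "4.6", which is exactly the kind of accidental downgrade a proper ordering avoids.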

So that's it for a summary of what changed between 4.6 and 4.7. There are hundreds more small bugfixes, a lot of them triggered by the big changes.

If we talk about the future, this is not over yet. There are still a few border-case scenarios where pkg_add can go into a loop. Those will get fixed. pkg_delete is now lagging behind a bit. For instance, we have orphaned packages, and we could deal with them (check your pkg installation for @option manual-install: any package that doesn't have it was installed automatically, and could be orphaned when you remove the stuff that requires it).
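(Ed: the orphan idea can be sketched as follows in Python. The data layout is hypothetical: each package carries a `manual` flag, standing in for @option manual-install, and a `required_by` set. This is not what pkg_delete does today, and the check is not transitive.)

```python
def find_orphans(packages, removed):
    """List automatically installed packages that nothing needs anymore.

    packages maps a name to {"manual": bool, "required_by": set of names}.
    removed is the set of packages being deleted.
    """
    removed = set(removed)
    orphans = []
    for name, info in packages.items():
        if name in removed or info["manual"]:
            continue
        # Only requirers that survive the removal still count.
        if not (info["required_by"] - removed):
            orphans.append(name)
    return sorted(orphans)
```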

I will now probably work on better error messages. As I said, pkg_add will keep going after errors. But right now, it doesn't consider that when outputting error messages. So, for instance, if dbus doesn't manage to update, you'll see a flurry of other error messages that say the dbus library doesn't match, with lots of extra details. Those details don't matter. pkg_add will become able to trace that to the failed dbus install, and simply tell you that it can't update a given package because the dbus update failed. (This is what you would deduce from all the noise you will see anyway.)
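(Ed: the planned error reporting amounts to naming the root cause instead of listing its symptoms. A rough Python sketch of that idea, looking at direct dependencies only; `explain_failures` and its arguments are made up, and the message wording is not pkg_add's.)

```python
def explain_failures(failed_directly, depends_on, candidates):
    """Return one short message per package that could not be updated.

    failed_directly is the set of packages whose own update failed.
    depends_on maps a package to the packages it depends on.
    """
    messages = []
    for pkg in candidates:
        if pkg in failed_directly:
            messages.append(f"{pkg}: update failed")
            continue
        blockers = sorted(
            dep for dep in depends_on.get(pkg, ()) if dep in failed_directly
        )
        if blockers:
            # Point at the failed dependency, not at its side effects.
            messages.append(
                f"{pkg}: can't update because the {blockers[0]} update failed"
            )
    return messages
```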

I'm also working on another very exciting project, aka "new dpb". But that's definitely meat for another article. ;-)