What's new in Linux-2.6.33-libre

We don't maintain the Linux-libre source files directly. Instead we maintain “deblobbing” scripts that clean Linux “sources”, thus producing Linux-libre sources. The main improvement in this generation of Linux-libre, the fourth since we got involved, consisted of making the deblobbing scripts more efficient.

As we accumulated thousands of patterns to recognize blobs, sequences that look like blobs but that aren't, requests for non-Free firmware external to Linux, and documentation that induces users to install it, running the GNU sed script generated to locate and remove blobs became too expensive for many users: in recent releases of Linux-libre, GNU sed took some 15 seconds and more than 2GB of RAM to compile all the patterns in the script.

The solution was to rewrite the main script in higher-level scripting languages. GNU awk reduced the start-up time to about 3 seconds, and memory requirements dropped by an order of magnitude, but 3 seconds multiplied by the 260 files that get cleaned up with this script to form Linux-2.6.33-libre comes to about 13 minutes, which is a lot of time to waste. Python and PERL compile our huge collection of patterns in tenths of a second, while reducing memory use by almost another order of magnitude. However, internal limits in PERL's pattern matching algorithm produce incorrect results in deblob-check, so using it with PERL is not recommended for now.
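
To give a feel for the start-up cost discussed above, here is a minimal sketch that joins many patterns into a single alternation, compiles it with Python's `re` module, and times the compilation. The patterns are made-up stand-ins, not the real deblobbing patterns, and the helper names `build_patterns` and `compile_combined` are hypothetical; timings will differ from the figures quoted here.

```python
import re
import time

def build_patterns(n):
    # Hypothetical stand-ins for blob-matching patterns; the real
    # deblobbing patterns are far more varied than these.
    return [r"firmware_blob_%d\s*=\s*\{[0-9a-fx, ]+\}" % i for i in range(n)]

def compile_combined(patterns):
    # Join every pattern into one big alternation and time the compile step,
    # which is the start-up cost paid on every invocation.
    start = time.perf_counter()
    combined = re.compile("|".join("(?:%s)" % p for p in patterns))
    elapsed = time.perf_counter() - start
    return combined, elapsed

patterns = build_patterns(500)
regex, seconds = compile_combined(patterns)

sample = "static u8 firmware_blob_42 = {0x01, 0x02, 0xff};"
match = regex.search(sample)

print("compiled %d patterns in %.3f s" % (len(patterns), seconds))
print("matched:", match.group(0) if match else None)
```

Because the compile step is repeated for every file that gets cleaned, even a small per-run cost is multiplied by the number of files, which is why start-up time matters so much for this workload.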

For deblob-main's cleaning-up of small files in Linux, Python was determined to be fastest, which is why it is the new default. For verifying that a large tarball is clean, Python's and PERL's run-times jump to more than 90 minutes, up from 5 minutes with GNU awk and as little as 3 minutes with GNU sed. GNU awk comes ahead when listing all the blobs in a Linux tarball, now with a long-wished-for feature: printing before each blob the name of the file within the tarball that contains it.

Future releases may be smarter in choosing a suitable backend depending on task and inputs. For now, users of deblob-check should be aware of the new flags: --use-python, --use-awk, --use-perl, and --use-sed, and the corresponding environment variables PYTHON, AWK, PERL, and SED.
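
As an illustration only, the kind of backend choice hinted at above could look like the following sketch. It is not how deblob-check works today: the function `choose_backend`, the task names, and the size threshold are all hypothetical, and the rules simply restate the timings reported in this announcement.

```python
def choose_backend(task, input_bytes, low_memory=False, small_limit=1000000):
    """Pick a deblob-check backend flag for a task (hypothetical heuristic).

    task        -- "clean", "verify" or "list" (made-up task names)
    input_bytes -- size of the input file or tarball
    low_memory  -- True when a multi-gigabyte footprint is unaffordable
    small_limit -- assumed cut-off between "small file" and "large tarball"
    """
    if task == "list":
        # GNU awk comes ahead when listing blobs in a tarball.
        return "--use-awk"
    if input_bytes <= small_limit:
        # Python starts up in tenths of a second, so it wins on small files.
        return "--use-python"
    if low_memory:
        # GNU sed is fastest on large inputs but compiling its patterns
        # needs a lot of RAM; GNU awk is the next best option.
        return "--use-awk"
    return "--use-sed"

print(choose_backend("clean", 20000))
print(choose_backend("verify", 400000000))
print(choose_backend("verify", 400000000, low_memory=True))
print(choose_backend("list", 400000000))
```

The sketch never returns --use-perl, since PERL is not recommended for now.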

The lower memory footprint and CPU requirements for checking and cleaning up individual files mean it is again possible to clean up Linux trees on the fly, which a number of users used to find valuable.

Over the next few days, we'll also roll out Linux-libre, generation 4, for earlier Linux releases, fixing a few deblobbing errors in staging drivers and catching a few more occurrences of non-Free blob names in documentation and error messages.