MP3 Feed | OGG Feed | iTunes Feed | Video Feed | HD Vid Feed | HD Torrent Feed

Finally other developers who have started, motivated and did a lot of work getting us here like Joerg Sonnenberger and Thomas Klausner for their work on reproducible builds, and Todd Vierling and Luke Mewburn for their work on build.sh.

I would also like to acknowledge the work done by the Debian folks who have provided a platform to run, test and analyze reproducible builds. Special mention to the diffoscope tool that gives an excellent overview of what’s different between binary files, by finding out what they are (and if they are containers what they contain) and then running the appropriate formatter and diff program to show what’s different for each file.

I have been working on and off for almost a year trying to get reproducible builds (the same source tree always builds an identical cdrom) on NetBSD. I did not think at the time it would take as long or be so difficult, so I did not keep a log of all the changes I needed to make. I was also not the only one working on this. Other NetBSD developers have been making improvements for the past 6 years. I would like to acknowledge the NetBSD build system (aka build.sh) which is a fully portable cross-build system. This build system has given us a head-start in the reproducible builds work.

Last week I gave a talk for the security class at Notre Dame based on features are faults but with some various commentary added. It was an exciting trip, with the opportunity to meet and talk with the computer vision group as well. Some other highlights include the Indiana skillet I had for breakfast, which came with pickles and was amazing, and explaining the many wonders of cvs to the Linux users group over lunch. After that came the talk, which went a little something like this.

I got started with OpenBSD back about the same time I started college, although I had a slightly different perspective then. I was using OpenBSD because it included so many security features, therefore it must be the most secure system, right? For example, at some point I acquired a second computer. What’s the first thing anybody does when they get a second computer? That’s right, set up a kerberos domain. The idea that more is better was everywhere. This was also around the time that ipsec was getting its final touches, and everybody knew ipsec was going to be the most secure protocol ever because it had more options than any other secure transport. We’ll revisit this in a bit.

There’s been a partial attitude adjustment since then, with more people recognizing that layering complexity doesn’t result in more security. It’s not an additive process. There’s a whole talk there, about the perfect security that people can’t or won’t use. OpenBSD has definitely switched directions, including less code, not more. All the kerberos code was deleted a few years ago.

Let’s assume about one bug per 100 lines of code. That’s probably on the low end. Now say your operating system has 100 million lines of code. If I’ve done the math correctly, that’s literally a million bugs. So that’s one reason to avoid adding features. But that’s a solveable problem. If we pick the right language and the right compiler and the right tooling and with enough eyeballs and effort, we can fix all the bugs. We know how to build mostly correct software, we just don’t care.

As we add features to software, increasing its complexity, new unexpected behaviors start to emerge. What are the bounds? How many features can you add before craziness is inevitable? We can make some guesses. Less than a thousand for sure. Probably less than a hundred? Ten maybe? I’ll argue the answer is quite possibly two. Interesting corollary is that it’s impossible to have a program with exactly two features. Any program with two features has at least a third, but you don’t know what it is

My first example is a bug in the NetBSD ftp client. We had one feature, we added a second feature, and just like that we got a third misfeature

Our story begins long ago. The origins of this bug are probably older than I am. In the dark times before the web, FTP sites used to be a pretty popular way of publishing files. You run an ftp client, connect to a remote site, and then you can browse the remote server somewhat like a local filesystem. List files, change directories, get files. Typically there would be a README file telling you what’s what, but you don’t need to download a copy to keep. Instead we can pipe the output to a program like more. Right there in the ftp client. No need to disconnect.

Fast forward a few decades, and http is the new protocol of choice. http is a much less interactive protocol, but the ftp client has some handy features for batch downloads like progress bars, etc. So let’s add http support to ftp. This works pretty well. Lots of code reused.

http has one quirk however that ftp doesn’t have. Redirects. The server can redirect the client to a different file. So now you’re thinking, what happens if I download http://somefile and the server sends back 302 http://|reboot. ftp reconnects to the server, gets the 200, starts downloading and saves it to a file called |reboot. Except it doesn’t. The function that saves files looks at the first character of the name and if it’s a pipe, runs that command instead. And now you just rebooted your computer. Or worse.

It’s pretty obvious this is not the desired behavior, but where exactly did things go wrong? Arguably, all the pieces were working according to spec. In order to see this bug coming, you needed to know how the save function worked, you needed to know about redirects, and you needed to put all the implications together.