Unused code is untested code, which probably means that it harbors bugs—sometimes significant security bugs. That lesson has been reinforced by the recent OpenSSH "roaming" vulnerability. Leaving a half-finished feature only in the client side of the equation might seem harmless at a cursory glance but, of course, is not. Those who mean harm can run servers that "implement" the feature to tickle the unused code. Given that the OpenSSH project has a strong security focus (and track record), it is truly surprising that a blunder like this could slip through—and keep slipping through for roughly six years.

The first notice of the bug was posted by Theo de Raadt on January 14. He noted that an update was coming soon and that users could turn off the experimental client roaming feature by setting the undocumented UseRoaming configuration variable to "no". The update was announced by Damien Miller later that day. It simply disabled the roaming feature entirely, though it fixed a few other security bugs as well. The problems have been present since the roaming feature was added to the client (but not the server) in OpenSSH 5.4, which was released in March 2010.

The bug was found by Qualys, which put out a detailed advisory that described two separate flaws, both of which were in the roaming code. The first is by far the more dangerous; it is an information leak that can provide the server with a copy of the client system's private SSH keys (CVE-2016-0777). The second is a buffer overflow (CVE-2016-0778) that is "unlikely to have any real-world impact" because it relies on two non-default options being used by the client (ProxyCommand and either ForwardAgent or ForwardX11).

The private keys of an SSH client are, of course, the most important secrets used to authenticate the client to a server where the corresponding public key has been installed. An attacker who has such a private key can authenticate to any of the servers authorized by the user, assuming that no second authentication factor is required, and so can effectively act as that user on the remote host(s). It should be noted that password-protected private keys are leaked in their encrypted form, which would still allow an attacker to try to break the passphrase offline. Also, if an agent such as ssh-agent is used, no key material is leaked.

The Qualys advisory includes patches to the OpenSSH server that implement a proof of concept of what a malicious server could do. The proof of concept is incomplete as there are environment-variable parameters used in the examples in the advisory that are not present in that code (notably, "heap_massaging:linux").

At its core, the problem in the client code (aside from still being present long after the server side was removed) is that it uses a server-supplied length to determine the size of a buffer to allocate—without much in the way of sanity checks. It also allocates the buffer using malloc(), which doesn't clear the memory being allocated.

The roaming feature is meant to handle the case when the SSH connection is lost (due to a transient problem of some sort) and allow the client to reconnect transparently. The client stores data that it has sent but that the server may not yet have received (and that might be lost during the interruption). After the reconnect, the server can request that the client "resend" a certain number of bytes—even if the client never sent that many bytes. The server-controlled offset parameter can be used to trick the client into sending the entire contents of the buffer even though it has not written anything to it, thus leaking the data that was previously stored there.
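The logic of the flaw can be sketched in a few lines. This is a simplified simulation, not OpenSSH code: the class, method names, and the planted "stale" data are all illustrative. The point is that the buggy resend path honors the server's offset and length without capping them at what the client actually wrote:

```python
# Hypothetical sketch of the flaw's logic (not OpenSSH code): the client
# honors a server-supplied offset/length without checking it against how
# much data it actually sent, so stale buffer contents leak.

class RoamingClient:
    def __init__(self, buf_size):
        # Simulate malloc(): the buffer starts out holding whatever data
        # happened to be in the freed memory (here, fake key material).
        stale = b"-----BEGIN RSA PRIVATE KEY----- stale heap data"
        self.out_buf = bytearray(stale.ljust(buf_size, b"\x00"))
        self.bytes_written = 0   # the client never wrote anything

    def resend(self, offset, length):
        # BUGGY: trusts the server's parameters outright.
        return bytes(self.out_buf[offset:offset + length])

    def resend_fixed(self, offset, length):
        # Sanity-checked version: never send more than was written.
        end = min(offset + length, self.bytes_written)
        return bytes(self.out_buf[offset:max(offset, end)])

client = RoamingClient(buf_size=64)
leaked = client.resend(0, 64)   # malicious server asks for the whole buffer
assert b"PRIVATE KEY" in leaked
assert client.resend_fixed(0, 64) == b""   # checked version leaks nothing
```

A real exploit additionally has to arrange for the freed key material to land in that buffer, which is where the heap massaging described below comes in.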

So malicious servers can offer roaming to clients during the key-exchange phase, disconnect the client, then request a whole buffer's worth of data be "resent" after reconnection. There are some conditions that need to be met in order to exploit the flaw that are described in the advisory, such as "heap massaging" to force malloc() to return sensitive data and guessing the client send buffer size. But Qualys was able to extract some private key information from clients running on a number of different systems (including OpenBSD, FreeBSD, CentOS, and Fedora).

Qualys initially believed that the information leak would not actually leak private keys for a few different reasons. For one, the leak is from memory that has been freed, but is recycled in a subsequent allocation, rather than reading data beyond the end of a buffer, such as in a more-typical buffer overflow. In addition, OpenSSH took some pains to clear the sensitive data from memory.

It turns out that some of those attempts to clear sensitive information (like private keys) out of memory using memset() and bzero() were optimized away by some compilers. Clang/LLVM and GCC 5 use an optimization known as "dead store elimination" that gets rid of store operations to memory that is never read again. Some of the changes in the OpenSSH update are to use explicit_bzero() to avoid that optimization in sensitive places.

But a much bigger factor in disclosing the key information is the use of the C library's standard I/O functions—in this case fopen() and friends. The OpenSSH client uses those functions to read in the key files from the user's .ssh directory; they do buffered I/O, which means they have their own internal buffers that are allocated and freed as needed. On Linux, that's not a problem because the GNU C library (Glibc) effectively cleanses the buffers before freeing them. But on BSD-based systems, freed buffers will contain data from previous operations.

It is not entirely clear why Qualys was able to extract key information on Linux systems given the Glibc behavior. The advisory does note that there may be other ways for the key material to leak "as suggested by the CentOS and Fedora examples at the end of this section".

Beyond that, OpenSSH versions from 5.9 onward read() the private key in 1KB chunks into a buffer that is grown using realloc(). Since realloc() may return a newly allocated buffer, that can leave partial copies of the key information in freed memory. Chris Siebenmann has analyzed some of the lessons to be learned from OpenSSH's handling of this sensitive data.

Interactive SSH users who were communicating with a malicious server might well have noticed a problem, though. The OpenSSH client prints a message, "[connection suspended, press return to resume]", whenever a server disconnect is detected. Since causing a disconnect is part of tickling the bug, that message will appear. It would likely cause even a non-savvy user to wonder—and perhaps terminate the connection with Ctrl-C, which would not leak any key information.

But a large number of SSH sessions are not interactive. Various backup scripts and the like use SSH's public-key authentication to authenticate to the server and do their jobs, as does the SSH-based scp command. As Qualys showed, those can be tricked into providing the needed carriage return to resume the connection. Thus they are prime targets for an attack using this vulnerability.

While the bug is quite serious, it is hard to believe it wouldn't have been found if both sides of the roaming feature had been rolled out. Testing and code inspection might have led the OpenSSH developers to discover these problems far earlier. It was presumably overlooked because there was no server code, so it "couldn't hurt" to have the code still present in the client. Enabling an experimental feature by default is a little harder to understand.

For a project that "is developed with the same rigorous security process that the OpenBSD group is famous for", as the OpenSSH security page notes, it is truly a remarkable oversight. It also highlights a lack of community code review. We are sometimes a bit smug in the open-source world because we can examine all of the security-sensitive code running on our systems. But it appears that even for extremely important tools like OpenSSH, the "can" does not always translate into "do". It would serve us well to change that tendency.

Companies and organizations like Qualys are likely to have done multiple code audits on the OpenSSH code over the last six years. Attackers too, of course. The latter are not going to publish what they find, but security researchers generally do. A high-profile bug like this in a security tool that is in widespread use is exactly the kind of bug they are looking for, so it is surprising this was missed (in white hat communities, anyway) for so long. In hindsight, leaving the unused code in the client seems obviously wrong—that's a lesson we can all stand to relearn.


Verifying the authenticity of a digital image is no simple task, given the array of image-manipulation tools available and the difficulty inherent in tracing the provenance of any digital file. But attempting to establish the origin and veracity of a photo is not a lost cause. The non-profit organization Sourcefabric, which produces open-source journalism tools, has developed a web-based utility called Verified Pixel that attempts to make assessing an image's reliability an attainable goal.

The Verified Pixel site notes that news organizations are increasingly struggling with how to verify the authenticity of smartphone footage and other images captured by eyewitnesses' cameras in the field. A lot of news organizations solicit user-contributed images, which results in a glut of input and not enough time to perform forensic analysis of every photo, even though several well-known forensic tools are available.

In April 2015, Sourcefabric and Eyewitness Media received a grant from the Knight Foundation to develop an image-verification web service to address the problem. Verified Pixel is the result of that effort. After teasing the prototype on Twitter in October 2015, Sourcefabric began rolling out access to a test server to beta testers in January 2016. I requested an invite, then spent some time kicking its tires and asking questions of the development team.

The source code is available on GitHub, although the dependencies are likely large enough to ward off casual users. Verified Pixel is implemented as a module for Sourcefabric's newsroom-management tool Superdesk (although the beta is currently a standalone web application, the intent is likely to make Verified Pixel a standard Superdesk component). The server side is written in Python, with an AngularJS client front-end. Like Superdesk, Verified Pixel is designed to be run on an organization's own server; the test server used for the prototype was maintained by Sourcefabric simply to solicit feedback on the program's functionality.

The idea is for Verified Pixel to provide a consistent, multi-user workflow: any user can upload images to the database, and a flexible battery of verification tests will be run to assess each image's reliability. Users can then flag individual images as suspect or potentially valid based on the scores of the validation tests. Authors or editors can subsequently select images to use taking those test results into account.
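The workflow above amounts to running each upload through a set of pluggable checks and storing the scores for human review. A minimal sketch of that shape, in Python like Verified Pixel's server side, might look like the following; the registry mechanism and all names here are illustrative assumptions, not Verified Pixel's actual API:

```python
# Hypothetical sketch of a pluggable image-verification battery; the
# function and field names are illustrative, not Verified Pixel's API.
from dataclasses import dataclass, field

@dataclass
class TestResult:
    name: str
    score: float       # 0.0 (suspect) .. 1.0 (looks valid)
    detail: str = ""

@dataclass
class ImageRecord:
    filename: str
    results: list = field(default_factory=list)
    flag: str = "unreviewed"   # set by a human, not by the tests

# Each verification service is just a callable: image bytes -> TestResult.
REGISTRY = []

def register(test_fn):
    REGISTRY.append(test_fn)
    return test_fn

@register
def duplicate_check(data):
    # Stand-in for a reverse-image search such as TinEye or Google's.
    return TestResult("duplicate-check", score=1.0, detail="no prior copies")

@register
def forensic_check(data):
    # Stand-in for a forensic service such as Izitru.
    return TestResult("forensic", score=0.8, detail="no recompression found")

def run_battery(record, data):
    record.results = [test(data) for test in REGISTRY]
    return record

rec = run_battery(ImageRecord("upload.jpg"), b"\xff\xd8\xff\xe0fake-jpeg")
assert [r.name for r in rec.results] == ["duplicate-check", "forensic"]
```

The appeal of this design is that adding a new verification service is just registering one more callable, which matches the project's stated plan to support additional test modules.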

My pixel is my passport, verify me

The current battery of tests includes three online services: Google's reverse-image search (which will find suspiciously similar images if they exist), TinEye (which will attempt to locate probable duplicates of an image and determine which is the oldest), and Izitru (which runs a set of forensic tests to determine if an image has been edited). In addition, Verified Pixel automatically extracts any Exif data from the image and highlights key fields (for example, plotting GPS locations and camera direction, if available, on a map).
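Plotting the GPS fields mentioned above requires converting Exif's degrees/minutes/seconds rationals into decimal degrees. A small, self-contained conversion might look like this; reading the Exif tags out of the file itself is left to a library, and the function name is just illustrative:

```python
# Convert Exif GPS coordinates (stored as three rationals plus a
# hemisphere reference) to signed decimal degrees for map plotting.
from fractions import Fraction

def exif_gps_to_decimal(dms, ref):
    """dms: three (numerator, denominator) pairs for degrees, minutes,
    seconds, as stored in GPSLatitude/GPSLongitude.
    ref: 'N', 'S', 'E', or 'W' from the matching *Ref tag."""
    deg, minutes, seconds = (Fraction(n, d) for n, d in dms)
    value = float(deg + minutes / 60 + seconds / 3600)
    return -value if ref in ("S", "W") else value

# 52 degrees, 30 minutes, 36 seconds North -> 52.51
lat = exif_gps_to_decimal(((52, 1), (30, 1), (36, 1)), "N")
assert abs(lat - 52.51) < 1e-9
```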

All users can add comments to images in the database, and the image database can be searched on all metadata fields, including the verification-test results. Images can also be grouped into collections several layers deep; the topmost layer is called a "desk," which represents a separate office within a publication's newsroom.

The Google and TinEye search results can help attest to whether or not an image has been published previously, which presumably would not be the case for a user-submitted photo of a current event. Such a test would still have value even for images that are not coming in as part of a breaking news story, such as detecting stock photos that were presented as original coverage.

The Izitru service runs six forensic tests and returns a "trust rating" on a one-to-five scale that corresponds to whether or not the image is in a "direct from camera" state or has been modified after the fact. Unfortunately, the service does not detail what the six tests are, although the Verified Pixel test-results page provides some rough descriptions. They include tests of how JPEG data is packaged within the file (based on differences between how cameras and software package JPEGs), tests to detect the traces of camera sensor patterns, and tests that look for artifacts of recompression. Izitru founder Hany Farid has authored a lengthy list of research papers on image forensics that likely cover the same ground.

These tests also depend, at least in part, on analysis of known camera characteristics. The FAQ page notes that "each digital camera has distinct ways of applying JPEG settings when saving a file. Other differences result from artifacts that are introduced when images are saved multiple times." The site elsewhere says that it "relies on a database of 'signatures' that describe the distinct ways that different camera models store their JPEG files" and that cameras not in the database will likely not receive the highest trust rating.

So Izitru can rate the likelihood that an image is unmodified; if the image has been modified, though, further testing is required to determine what the modifications were. Simple resizing is not particularly bad, while adjusting color curves or editing for content are far more serious issues. Izitru is limited to analyzing JPEG files, which may be of concern to some users who would be interested in support for camera raw files. But the JPEG limitation is in line with what many news organizations ask from user-contributed content; in November 2015, the Reuters news service announced it would only accept image files from freelancers that were shot as JPEG.

Looking forward

For free-software developers, however, the service's proprietary and secret test battery is likely a larger issue. The good news is that Verified Pixel is designed to support pluggable test modules, and Sourcefabric's Douglas Arellanes said support for additional verification services is still to come. In an email, Arellanes said that the project spent a fair amount of time working with both OpenCV and ccv on modification-detection and other tasks using machine vision, but ultimately found that it was not a priority for users:

But for a lot of the basic things verifiers need to answer - like "has this image been published elsewhere on the Internet?" or "is this image altered?" it really made more sense to use APIs to services like Google reverse image search or to Izitru, because we were not going to be able to duplicate those any time soon.

Still, he said that he hopes to attract code contributions to support open-source libraries, since there is a lot that could be applicable. The team has an informal wishlist of additional tests it would like to see, such as automatic image tagging to recognize images containing traumatic content and a method to compare weather data (based on geolocation and timestamps) with the apparent conditions indicated in an image. The traumatic-content issue is one that Verified Pixel developer Sam Dubberley has explored in his own research.

But, for now, the project is focused on getting real-world feedback from news organizations. Arellanes said that "we need to see where the bottlenecks are for image verifiers - is it in workflow and getting the images into newsrooms' CMSes? Is it in making metadata easier? This is what we'd like to glean" from testing. He also added that Verified Pixel may be useful beyond the initial newsroom use case. "We think it has uses in a human rights context as well as in any situation requiring image verification - insurance claims, for example."

The beta runs its test battery on new uploads quickly, and searching and sorting are both painless. Given that the Google and TinEye assessments rely on a massive corpus of published-image data, it is hard to imagine comparable tests that do not dictate the use of a proprietary service. The Izitru modification-detection tests, though, could face stiff competition from other libraries if the project attracts a development community. In the meantime, it is easy to see how automating all such tests and collating the results can help users—in a newsroom or elsewhere—simplify the job of sifting through digital images of unknown trustworthiness.


In 1992 I interviewed Linus Torvalds for my little Linux newsletter, Linux News. That was traumatic enough that I haven't interviewed anyone since, but a few months ago I decided that it would be fun to interview Joey Hess, who kindly agreed to it.

Joey is known for many things. He wrote, alone and with others, debhelper, its dh component, and the debian-installer, as well as other tools like debconf. He wrote the ikiwiki wiki engine; and co-founded (with me) the Branchable hosting service for ikiwiki. He wrote git-annex, and ran a couple of successful crowdfunding campaigns to work on it full time. He lives in the eastern US, off-grid, in a solar-powered cabin, conserving power when necessary. He has retired from Debian.

The interview was done over email. This is a write-up of the messages into a linear form, to make it easier to read. All questions are by me, all answers by Joey, except in some places edited by me. The interview took several months, because I had a lot of other things I was doing, so sometimes it took me weeks to ask my next question. At one point, Joey pointed out that "the interviewee may become a different person than at the beginning".

Most of the credit for this interview goes to Joey. I merely asked some questions and wrote up the answers.

Lars: You were one of the most productive and respected Debian developers for a very long time. What made you want to leave the project?

Joey: A hard question to start with! Probably you didn't mean for it to be a hard question, but I guess you've read my blog post on leaving and my resignation, and it seems they didn't answer the question well enough. Perhaps they hint vaguely at problems I saw without giving enough detail, or suggest I had some ideas to solve them. And so, I guess, you (and others) ask this question, and I feel I should do my best to find an answer to it.

Thing is, I don't know if I can answer it well. Our experience of big problems can seem vague (recall the blind men and the elephant). Where I had good ideas, I had a very long time indeed to try to realize them, and firing all my dud ideas off as parting shots on the way out is not likely to have achieved much.

I do have the perspective now for a different kind of answer, which is that if I'd known how bothersome the process of leaving Debian turns out to be, I might not have bothered to formally leave.

Perhaps it would be easier to stop participating, just let things slide. Easier to not need to worry about my software going unmaintained in Debian; to not worry about users (or DNS registrars) who might try to contact me at my Debian email address and get an ugly "Unrouteable address" bounce; to not feel awkward when I meet old friends from Debian.

But, if I'd have gone that route, I'd lack the perspective I have now, of seeing Debian from the outside. I'd not have even the perspective to give this unsatisfying answer.

Lars: From the blog post, I understand that you prefer to work on smaller projects, where it's easier to make changes. Or perhaps I'm over-interpreting, since that's a feeling I have myself. I have, from time to time, spent a bit of thought on ways to make things better in Debian in this regard. My best idea, mostly untried, is to be able to branch and merge at the distro level: any developer (preferably anyone, not just Debian developers) could do what is effectively "git checkout -b my/feature/branch", make any changes they want in as many packages as they want, have an easy, effective way to build any .debs affected by the changes, and test. If the changes turn out to be useful, there would be a way to merge the source changes back. Do you have any thoughts on that?

Joey: I'm fairly addicted to that point in development of a project where it's all about exploring a vast solution space, and making countless little choices that will hopefully add up to something coherent and well thought out and useful. Or might fail gloriously.

Some projects seem to be able to stay in that state for a long time, or at least re-enter it later; in others it's a one-time thing; and in less fun areas, I hear this may never happen in the whole life cycle of an enterprise thingamajig.

Nothing wrong with the day-to-day work of fixing bugs and generally improving software, but projects that don't sometimes involve that wide-open sense of exploration are much less fun and interesting for me to work on.

Feels like a long time since I got much of that out of working on Debian. It certainly happened back in debian-installer days, and when I added dh to debhelper (though on a smaller scale), but I remember it used to all seem much more wide open.

I don't think this is entirely a social problem; technology is very important too. When I can make changes to data types and a strong type system lets me explore the complete ramifications of my changes, it's easier to do exploratory programming in an established code base than when I'm stumbling over technical debt at every turn. But I feel in the case of Debian, a lot of it does have to do with accumulated non-technical debt.

Lars: You mention a strong type system, and you're known as a Haskell programmer. Previously you used Perl a lot. How would you compare programming in Haskell versus Perl? Especially on non-small programs, such as debhelper and ikiwiki versus git-annex? All are, by now, quite mature programs with a long history.

Joey: It's weird to be known as a Haskell programmer, since I still see myself as a beginner, and certainly not an exemplar. Indeed, I recently overheard someone complaining about some code in git-annex not being a good enough example of Haskell code, to merit whatever visibility it has on GitHub.

And they were right, this code is bad code in at least three ways; it's doing a lot of imperative I/O work, it's complicated by a hack that was put in to improve behavior without breaking backwards compatibility, and it implements half of an ad-hoc protocol, with no connection to the other half. There should be a way to abstract it out to higher level pure code, something more like this code.

So, I can write bad code in either language. But, I couldn't see so many of the problems with my bad Perl code. And, it's a lot more sane to rework bad Haskell code into better code, generally by improving the types to add abstractions and prevent whole classes of problems from happening, and letting that change seep out into the code. And I continue to grow as a Haskell programmer, in ways that just didn't happen when I was writing Perl.

A couple other differences that I've noticed:

When I get a patch to a Haskell program, it's oh so much easier to tell if it's a good patch than when I get a patch in some other language.

My Haskell code often gets up to a high enough level of abstraction that it's generally reusable. Around 15% of the code in git-annex is not specific to it at all, and I hope to break it out into libraries.

For example, here is a library written for the code I linked to, and then reused in two other places in git-annex. Maybe three places if I get around to fixing that bad code I linked to earlier. Debconf contains an implementation of basically the same thing, but being written in Perl, I never thought to abstract it for reuse this way.

Lars: Speaking of Haskell, what got you interested in it initially? What got you to switch?

Joey: I remember reading about it in some blog posts on Planet Debian by John Goerzen and others, eight or nine years ago. There was a lot of mind-blowing stuff, like infinite lists and type inference. And I found some amazing videos of Simon Peyton Jones talking about Haskell. So I started to see that there were these interesting and useful areas that my traditional Unix programming background barely touched on or omitted. And, crucially, John pointed out that ghc can be used to build real world programs that are as fast and solid as C programs, while having all this crazy academic stuff available.

So, I spent around a year learning the basics of Haskell — very slowly. Didn't do much with it for a couple of years because all I could manage were toy programs and xmonad configurations, and I'd get stuck for hours on some stupid type error.

It was actually five years ago this week that I buckled down and wrote a real program in Haskell, because I had recently quit my job and had the time to burn, even though it felt like I could have dashed off in Perl in one day what took me a week to write in Haskell. That turned out to be git-annex.

After around another three years of writing Haskell, I finally felt comfortable enough with it that it seemed easier than using other languages. Although often mind-blowing still.

Lars: Haskell has a strong, powerful type system. Do you feel that does away with the need for unit testing completely? Do you do any unit testing, yourself? How about integration testing of an entire program? If you do that, what kind of tool do you use? Have you heard of my yarn tool and if so, what are your opinions on that?

Joey: It's a myth that strongly typed or functional programs don't need testing. Although they really do sometimes work correctly once you get them to compile, that's a happy accident, and even if they do, so what — some future idiot version of the savant who managed that feat will be in the code later and find a way to mess it up.

Often it's easier to think of a property that some data would have, and write a test for it, than would be to refine the data's type to only allow data with that property. Quickcheck makes short work of such tests, since you can just give it the property and let it find cases where it doesn't hold.

My favorite Quickcheck example is where I have two functions that serialize and deserialize some data type. Write down:

prop_roundtrip val = deserialize (serialize val) == val

and it will automatically find whatever bugs there are in edge cases of the functions. This is good because I'm lazy and not good at checking edge cases. Especially when they involve something like Unicode.
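The same roundtrip property can be hammered with random inputs in other languages too, though without the automatic shrinking of failing cases that QuickCheck provides. A bare-bones Python sketch, using JSON as a stand-in serializer, might look like:

```python
# Property-based roundtrip test, sketched in plain Python with random
# inputs; JSON stands in for whatever serializer is under test.
import json
import random
import string

def serialize(val):
    return json.dumps(val)

def deserialize(data):
    return json.loads(data)

def prop_roundtrip(val):
    return deserialize(serialize(val)) == val

# Generate random nested values, including non-ASCII strings, since
# edge cases like Unicode are exactly what such tests tend to catch.
def random_value(depth=2):
    choices = [
        lambda: random.randint(-10**6, 10**6),
        lambda: "".join(random.choices(string.printable + "éλ☃", k=8)),
    ]
    if depth > 0:
        choices.append(lambda: [random_value(depth - 1) for _ in range(3)])
    return random.choice(choices)()

for _ in range(200):
    assert prop_roundtrip(random_value())
```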

Most of my unit testing is of the Quickcheck variety. I probably do more integration testing overall though. My test infrastructure for git-annex makes temporary git repositories and runs git-annex in them and checks the results. I'm not super happy with the 2000 lines of Haskell code that runs all the tests, and it's too slow, but it does catch problems from time to time and by now checks a lot of weird edge cases due to regression tests.

I generally feel I'm quite poor at testing. I've never written tests that do mocking of interfaces, all that seems like too much work. I don't always write regression tests, even when I don't manage to use the type system to close off any chance of a bug returning. I probably write an average of one to five tests a month. Propellor has twelve thousand lines of code that runs as root on servers and not a single test. I'm not really qualified to talk about testing, am I?

I've read the yarn documentation before, and it's neat how it's an executable human readable specification. I'd worry about bugs in the tests themselves though, without strong types.

The best idea I ever had around testing is: put the test suite in your program, so it can be run at anytime, anywhere. Being able to run "git annex test" or ask users to run it is really useful for testing how well git-annex gets on in foreign environments.

Lars: One of the things you're known for, and which repeatedly is remarked on by Hacker News commenters, is that you live off the grid in the middle of the wilderness, relying on a dial-up modem for Internet. You've blogged about that. What led you on this path? What is your current living situation? Why do you stay there? Do you ever think about going somewhere to live in a more mainstream fashion? What are the best and worst things about that lifestyle?

Joey: I seem to have inverted some typical choices regarding life and work...

Rather than live in a city and take vacations to some rustic place in the country, I live a rustic life and travel to the city only when I want stimulation. This gives me a pleasant working environment with low distractions, and is more economical.

Rather than work for some company on whatever and gain only a paycheck and a resume, I work because I want to make something; the resulting free software is my resume, and the money somehow comes when someone finds my work valuable and wants it to continue. (Dartmouth College at the moment.)

Right now I'm renting a nice house with enough woods surrounding it to feel totally apart, located in a hole in the map that none of the nearby states of Tennessee, Virginia, or Kentucky have much interest in, so it's super cheap. It's got grape arbors and fruit trees, excellent earth-sheltered insulation, ancient solar panels and a spring and phone line and not much else by way of utilities or resources. I haul water, chop firewood, and now in the waning days of the year, have to be very careful about how much electricity I use.

I love it. I'm forced to get out away from keyboard to tend to basic necessities, and I feel in tune with the seasons, with the light, with the water, with everything that comes in and goes out. Even the annoying parts, like a week of clouds that mean super low power budget, or having to hike in food after a blizzard, or not being able to load a bloated web page in under an hour, seem like opportunities to learn and grow and have more intense experiences.

I kind of fell into this, by degrees. Working on free software was a key part, and then keeping working on it until I'd done things that mattered. Also, being willing to try a different lifestyle and keep living it until it became natural. Being willing to take chances and follow through, basically.

I've done this on and off for over ten years, but it still seems it could fall apart any time. I'm enjoying the ride anyway, and I feel super lucky to have been able to experience this.

Lars: What got you started with programming? When? What was your first significant program?

Joey: I bought an Atari computer with 128KB of RAM and BASIC. It came with no interesting programs, so provided motivation to write your own. I think that some of the money to pay for it, probably $50 or so, was earned working on the family tobacco farm. I was ten.

I have a blog post with some other stories about that computer. And I still have the programs I wrote, you can see them at http://joeyh.name/blog/entry/saved_my_atari_programs/.

But "significant" programs? That's subjective. Writing my own Tetris clone seemed significant at the time. The first program that seems significant in retrospect would be something from much later on, like debhelper.

Lars: What got you into free software?

Joey: I got into Linux soon after I got on the Internet at college, and from there learned about the GNU project and free software. I started using the GPL on my software pretty much immediately, mostly because it seemed to be what all the cool kids were doing.

Took me rather longer to really feel free software was super important in its own right. I remember being annoyed in the late 90's to be stereotyped as a Debian guy and thus a free-software fanatic, when I was actually very much on the pragmatic side. Sometime since then, free software has come to seem crucially important to me.

These days feel kind of like when the scientific method was still young and not widely accepted. Crucial foundational stuff is being built thanks to free software, but at the same time we have alchemists claiming to be able to turn their amassed user data into self-driving cars. People are using computers in increasingly constrained ways, so they are cut off from understanding how things work and become increasingly dis-empowered. These are worrying directions when you try to think long-term, and free software seems the only significant force in a better direction.

Lars: What advice would you give to someone new to programming? Or to someone new to free software development?

Joey: Programming can be a delight, a font of inspiration, of making better things and perhaps even making things better. Or it can be just another job. Give it the chance to be more, even if that involves quitting the job and living cheap in a cabin in the woods. Also, learn a few quite different things very deeply; there's too much quick, shallow learning of redundant stuff.
