OpenStreetMap contemplates licensing

Maps are cool; there's no end of applications which can make good use of mapping data. There is plenty of map data around, but it's almost exclusively proprietary in nature. That makes this data hard to use with free applications; it's also inherently annoying. We, as taxpayers, own those streets; why should we have to pay somebody else to know where the streets are?

Your editor likes to grumble about such things; meanwhile, the OpenStreetMap project (OSM) is busily doing something about it. OSM has put together a database and a set of tools making it easy for anybody to enter location data with the intent of producing a free mapping database with global coverage. It is an ambitious project, to say the least, but it's working:

Right now on each and every day, 25,000km of roads gets added to the OpenStreetMap database, on the historical trend that will be over 200,000km per day by the end of 2009. And that doesn't include all the other data that makes OpenStreetMap the richest dataset available online.

OSM data is not limited to roads; just about any point or track of interest can be added to the database. If current trends continue, OSM could well grow into the most extensive geolocation database anywhere - free or proprietary. And those trends could well continue; one of the nice aspects of this kind of project is that no particular expertise is needed to contribute. All you need is a GPS receiver and some time; some OSM local groups have even acquired a set of receivers to lend out to interested volunteers. This is our planet, and we can all help to map it.

All this work raises an interesting question, though: under what license should this accumulated data be distributed? Currently, the OSM database is covered by the Creative Commons Attribution-ShareAlike 2.0 license. It is a copyleft-style license, requiring that derived products be made available under the same license. So, for example, if a GPS navigator manufacturer were to include an enhanced version of the OSM database in its products, it would have to release the enhanced version under the CC by-SA license.

The OSM project is not happy with this license, though, and is looking to make a change. The attribution requirement is ambiguous in this context; do users need to credit every OSM contributor? Does making a plot of OSM data with added data layered on top create a derived product? But the scariest question is a different one: can the CC by-SA license cover the OSM database at all?

Copyright law covers creative expression, not facts. The information in the OSM database is almost entirely factual in nature; one cannot copyright the location of a street corner. So what OSM is trying to protect is not the individual locations, but the database as a whole. Copyright law does allow for the protection of databases, but that law is far more complex than the law for pure creative works, and it varies far more between jurisdictions. Europe has a specific (though much-derided) database right, the US has far weaker database protections, and other parts of the planet lack this protection altogether. So it may well be that, if some evil corporation decides to appropriate the OSM database for its own nefarious, proprietary purposes, there will be nothing that the OSM project can do about it.

So the project is thinking of making a switch to the Open Database License (ODbL), which is still being developed. It, too, is a copyleft-style license, but it is crafted to make use of whatever database protection is available in a given jurisdiction. To that end, the ODbL is explicitly structured as a contract between the database owner and the user. In any jurisdiction where database rights are not recognized under copyright law, the contractual nature of the ODbL should provide a legal basis to go after license violators.

But the use of contract law muddies the water considerably; there are good reasons why free software licenses are carefully written to avoid that path. Contracts are only valid if they are explicitly and voluntarily entered into by all parties. If OSM cannot show that a license violator agreed to abide by the license, it has no case under contract law. The project has a plan to address this problem:

To ensure that potential users are aware of and agree to the contract terms, we are proposing to require a click-through agreement before downloading data. (All registered users would agree to this on signing up so will not need a further click-through on each download.)

Registration and click-through licensing are obnoxious, to say the least. But, in any case, the only people who will go through that process are those who obtain the database directly from OpenStreetMap. The ODbL allows redistribution, naturally, and it does not require that explicit agreement be obtained from recipients of the database. So it is hard to see an outcome where copies of the database lacking a "signed" contract do not proliferate. Additionally, reliance on contract law makes it very hard to get injunctive relief, weakening any enforcement efforts considerably.

The ODbL includes an anti-DRM measure; if a vendor locks down a copy of the database with some sort of DRM scheme, that vendor must also make an unrestricted copy available. The license also tries to distinguish between "collective databases" (which are not derived works) and "derivative databases" (which are). Drawing layers on top of an OSM-based map is a collective work; tracing lines from such a map is a derivative work. The license is, in general, a complex bit of work.

It is complex enough that a number of OSM contributors are wondering if it's all worth it. Jordan Hatcher is one of the authors of the ODbL, and he supports its use with OSM, but even he understands the concerns that some people have:

The [Science Commons] point is that all this sort of stuff can be a real pain, and isn't what you are really doing is wanting to create and manipulate factual data? Why spend all the time on this when the innovation happens in what you can do with the data, and not with trying to protect the data in the first place.

There is an active group within OSM which is opposed to this kind of licensing and would, in fact, rather just get down to the task of collecting and distributing the data. They express themselves in terms like this:

One thing I really love about OSM is the pragmatic, un-political approach: You don't give us your data, fine, then we create our own and you can shove it. Not: You don't give us your data, fine, then we create a complex legal licensing framework that will ultimately get you bogged down in so many requests by prospective users who would like to use our data and yours but cannot and you will sooner or later have to release your data according to the terms we dictate and then we will have won and the world will be a better place.

These contributors would rather that OSM release its data into the public domain - or something very close to that. Rather than put together a complicated license, they prefer to just publish their data for anybody to use as they see fit. There have been all of the usual discussions which resemble any "GPL vs. BSD" licensing flame war one has ever seen - except that the OSM folks appear to be a very polite crowd. It comes down to the usual question: will the OSM database become more complete and useful if those who extend it are forced to contribute back their changes?

The public domain contingent clearly does not believe that any improvements to the database obtained via licensing constraints will be worth the trouble. So it seems likely that there will be some sort of fork involving the creation of a smaller, purely public-domain OSM database. It may well be an in-house fork, with the public domain data being merged into the larger, more restrictively licensed database for distribution. Regardless of how that goes, this split raises issues of its own: how are the two databases to be kept distinct in the face of cooperative additions and edits?

Any relicensing of the database also brings up another interesting question: what to do about all of the existing data, which may or may not be copyrighted by those who contributed or edited it? The license change may well require a process of getting assent from all contributors and purging data obtained from those who do not agree. The project's proposed timeline shows how it is thinking about working through this task. It is hard to imagine this process going entirely smoothly.
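To see why the purge could be painful, consider a rough sketch of the bookkeeping involved. Nothing here reflects OSM's actual schema or tooling; the Element class, the contributor names, and the rule that an object survives only if everybody who ever touched it has agreed are all assumptions made for illustration. The project might well settle on a different rule.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    """Hypothetical map object, with everyone who created or edited it."""
    element_id: int
    contributors: set = field(default_factory=set)


def partition_by_assent(elements, agreed):
    """Split elements into (keep, purge) lists.

    Assumed rule: an element can be relicensed only if every
    contributor in its history is in the 'agreed' set.
    """
    keep, purge = [], []
    for element in elements:
        # Subset test: all contributors must have agreed.
        if element.contributors <= agreed:
            keep.append(element)
        else:
            purge.append(element)
    return keep, purge


elements = [
    Element(1, {"alice"}),
    Element(2, {"alice", "bob"}),   # bob has not agreed
    Element(3, {"carol"}),
]
agreed = {"alice", "carol"}

keep, purge = partition_by_assent(elements, agreed)
print([e.element_id for e in keep])
print([e.element_id for e in purge])
```

Under that assumed rule, a single holdout is enough to take out an object that other, agreeing contributors also worked on, which is one reason the process is unlikely to be painless.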

The OSM community clearly has a set of thorny issues to work out. Given that, it's not surprising that this process has already been dragged out over the better part of a year. How this issue is eventually resolved will certainly serve as an example - not necessarily a good example - for other projects working on free compilations of factual data. Let us hope that OSM can come to a solution which lets this project continue to grow and generate a valuable database that we all will benefit from.

