
Wikidata World Maps June 2015

Wikidata contains a significant amount of geographic data: over 2 million items have a location on Earth. This data can be used to generate maps that show the distribution of these items. This was first done by Denny Vrandecic in the context of the EU Render project. However, the software used for this purpose is no longer functional, and the original maps are quite dated. Therefore, we have rebuilt this tool using Wikidata Toolkit.

The resulting maps are shown below. Following Denny's original concept, we create a map for all data and further maps that are restricted to items that have a Wikipedia article in a particular language. The following images are scaled down: click on each map for the full resolution image.

Wikidata items (2,165,028)

English Wikipedia articles (950,277)

German Wikipedia articles (356,904)

French Wikipedia articles (347,419)

Polish Wikipedia articles (328,891)

Dutch Wikipedia articles (328,162)

Russian Wikipedia articles (298,215)

Spanish Wikipedia articles (261,495)

Italian Wikipedia articles (249,666)

Chinese Wikipedia articles (218,602)

Portuguese Wikipedia articles (185,133)

Ukrainian Wikipedia articles (181,372)

Swedish Wikipedia articles (176,639)

Vietnamese Wikipedia articles (170,164)

Serbian Wikipedia articles (153,675)

Catalan Wikipedia articles (151,013)

Serbo-Croatian Wikipedia articles (149,511)

Malay Wikipedia articles (143,254)

Romanian Wikipedia articles (143,136)

Persian Wikipedia articles (131,844)

Japanese Wikipedia articles (112,344)

Volapük Wikipedia articles (108,071)

Waray-Waray Wikipedia articles (102,505)

Wikimedia Commons pages (89,122)

Arabic Wikipedia articles (87,017)

These maps are based on the data dump of 22 June 2015. The languages picked here are ordered by the number of geolocated articles (shown in parentheses). The colours have the same meaning on all images. The scale at the bottom of each image gives an idea of the absolute magnitude: it is a logarithmic scale that shows a vertical line between numbers 1-9, 20-90, 100-900, and finally 1000-9000.

Finally, there is also a monochrome image of all Wikidata items in super-high resolution (17280x8640 pixels, about 2MB).

Discussion

It's important to understand what the colours mean, at least roughly. The scale is logarithmic. At this resolution, the first few colours mark single steps: blue denotes a single item, green denotes two items, ochre denotes three, and red through rather bright orange still represent numbers below 10; everything above 30 is essentially white. Nevertheless, all maps contain pixels with more than 1000 items. At this high resolution, however, the patterns of nearby bright and dark spots also give an impression of density in a certain area. If one generated much larger maps, a single item would be plain white, but the patterns would still give an impression of brightness for certain areas. On the other hand, if one picks a smaller resolution, the distinguishable colours extend into the 100s or even 1000s of items, but many more items come together in each point.
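To make the effect of a logarithmic scale concrete, here is a minimal Python sketch. It is not the code used to render the maps: the function name, the `saturate_at` parameter and its default of 30 (taken from the remark that everything above 30 is essentially white) are assumptions for illustration, and the actual colour palette is not reproduced.

```python
import math


def log_scale_position(count, saturate_at=30):
    """Map an item count to a position in [0, 1] on a logarithmic scale.

    0.0 stands for an empty pixel and 1.0 for the brightest colour.
    `saturate_at` is a hypothetical parameter: counts at or above it
    are all drawn with the brightest colour.
    """
    if count <= 0:
        return 0.0
    # Using log(count + 1) keeps a single item distinguishable from
    # an empty pixel, since log(1) would be 0.
    position = math.log(count + 1) / math.log(saturate_at + 1)
    return min(1.0, position)


if __name__ == "__main__":
    for n in (1, 2, 3, 9, 30, 1000):
        print(n, round(log_scale_position(n), 3))
```

On such a scale, the step from one to two items changes the colour far more than the step from 20 to 21 items, which is why the first few colours correspond to single items while large counts all look alike.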

We can immediately make some basic observations:

Wikipedia (and Wikidata) has a general bias towards Europe and North America.

All Wikipedias contain far fewer items than Wikidata, with even English having fewer than half of the overall entries (a complete list with numbers of geoitems for every Wikimedia project is at the bottom of this page).

Each Wikipedia has its distinct strengths in coverage, where it clearly surpasses other Wikipedias (even the English).

Items of interest follow real-world structures, e.g., the river Nile, the train lines in Japan, or the shore of Lake Erie (zoom in to see this more clearly).

Administrative regions may receive above- or below-average coverage. In many maps, one can see countries and states within countries whose concentrations differ sharply from those of nearby regions. This can create a rather patchy look in some maps (e.g., Russian Wikipedia).

Wikimedia Commons has relatively few items so far (possibly owing to the fact that links to Commons are a much more recent feature than links to Wikipedias). It seems to have a focus on points of (touristic) interest, rather than on administrative regions.



Some of the data shown will have been imported into Wikipedias from other projects, which can explain the patchiness of some item concentrations (e.g., importing Italian towns would be distinct from importing Austrian towns). Indeed, strong boundaries may suggest an artificial, non-organic growth of articles that does not so much reflect community interests but rather the priorities of individual special-interest groups and bot authors.

Some other random observations are:

Most Wikipedias with many geoitems have good coverage of Europe, especially of France and Italy, while most Wikipedias other than English have very little coverage of the US and the UK. There are some exceptions: Spanish Wikipedia has decent coverage of North America. Russian Wikipedia has many articles about places in Minnesota, but not about any other U.S. state. Polish Wikipedia has a strong interest in England, but not in the rest of the UK.

English and Russian Wikipedia seem to divide Europe: there is a sharp line through Europe where interest seems to end for English Wikipedia and seems to start for Russian Wikipedia.

Japanese Wikipedia is mainly about Japan, and about Italy. Italian Wikipedia is mainly about Europe, and about Japan.

Possible Errors and Sources of Misinterpretation

The maps translate latitudes and longitudes of coordinates directly into coordinates in the image without compensating for the spherical shape of Earth. Therefore, pixels closer to the equator represent a bigger area than pixels closer to the poles. This means that the same level of brightness signifies a higher concentration of items when closer to the poles.
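The direct mapping and its area distortion can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the WorldMapProcessor code: the function names, the image orientation (north at the top, longitude -180 at the left edge) and the rounding are assumptions. The relative area of a pixel follows from spherical geometry: a pixel at latitude φ covers an area proportional to cos(φ).

```python
import math


def to_pixel(lat, lon, width, height):
    """Map latitude/longitude in degrees directly to pixel coordinates.

    Assumes x grows eastward from longitude -180 and y grows
    southward from latitude +90 (hypothetical orientation).
    """
    x = int((lon + 180.0) / 360.0 * width)
    y = int((90.0 - lat) / 180.0 * height)
    # Clamp so that lon = 180 and lat = -90 stay inside the image.
    return min(x, width - 1), min(y, height - 1)


def relative_pixel_area(lat):
    """Ground area of a pixel at this latitude, relative to the equator."""
    return math.cos(math.radians(lat))


if __name__ == "__main__":
    print(to_pixel(0.0, 0.0, 17280, 8640))
    print(relative_pixel_area(60.0))
```

At 60 degrees latitude a pixel covers only about half the area of a pixel at the equator, so the same item count per pixel corresponds to roughly twice the density per unit of area. Dividing a pixel's count by `relative_pixel_area(lat)` would give an area-corrected value.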

Common projections of maps use different ways of mapping coordinates to locations on a map. The most typical approach is Web Mercator, which is used in major mapping services online. It makes shapes closer to the poles appear stretched vertically compared to our naive projection. Images of maps of, say, Europe (which is rather far north) will thus give a slightly different impression of the shapes of some coastlines and borders.
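The vertical stretching can be quantified with the standard Mercator formula, y = ln(tan(π/4 + φ/2)) for latitude φ in radians. The following Python sketch uses that formula; the function names are chosen here for illustration and are not part of any mapping library.

```python
import math


def mercator_y(lat):
    """Mercator vertical coordinate (unit sphere) for a latitude in degrees."""
    phi = math.radians(lat)
    return math.log(math.tan(math.pi / 4 + phi / 2))


def vertical_stretch(lat):
    """Local vertical stretch of Mercator relative to the direct mapping.

    This is the derivative of mercator_y with respect to latitude
    in radians, which equals 1 / cos(latitude).
    """
    return 1.0 / math.cos(math.radians(lat))


if __name__ == "__main__":
    for lat in (0, 30, 50, 60):
        print(lat, round(mercator_y(lat), 4), round(vertical_stretch(lat), 4))
```

At the equator both mappings agree locally, while at 60 degrees north Mercator draws features twice as tall as the direct latitude-to-row mapping used for the maps on this page.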

The logarithmic colour scale may lead to false impressions. In addition, human colour perception is of course individually different and generally unreliable. The total number of items can also be seen on ViziData, and the subjective impression of density may already be different there.

The absolute numbers of items do not reflect the quality of the underlying Wikipedia articles. Some Wikipedias have automatically created large numbers of very short articles from external data, while others have organically gathered detailed descriptions of many places. The maps do not distinguish these cases. It would be interesting to do, e.g., a "words per geographic location" analysis, but this would require further datasets to be combined with the Wikidata dumps.

Create Your Own Maps

The images have been created from the Wikidata JSON dumps of 2015-06-22 using Wikidata Toolkit (see the WorldMapProcessor example in the examples package). Developers can easily download this software and run it with different settings. It is possible to:

Create maps for arbitrary Wikimedia projects or for Wikidata as a whole

Change the size of the resulting image

Change the colour scale (brightness)

Run the code on arbitrary past or future data dumps to get updated maps
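For readers who want to see the underlying idea without the Java toolkit, the following Python sketch counts geolocated items per pixel from lines of a Wikidata JSON dump. It is not the WorldMapProcessor implementation. It assumes the usual dump layout (a JSON array with one entity per line), that coordinates are taken from the "coordinate location" property P625 with Earth (Q2) as the globe, and that every coordinate value of an item is counted; the function names and the `site` parameter are hypothetical.

```python
import json

EARTH = "http://www.wikidata.org/entity/Q2"


def earth_coordinates(entity):
    """Yield (lat, lon) for each P625 value of the entity located on Earth."""
    for statement in entity.get("claims", {}).get("P625", []):
        snak = statement.get("mainsnak", {})
        if snak.get("snaktype") != "value":
            continue  # skip "unknown value" and "no value" statements
        value = snak["datavalue"]["value"]
        if value.get("globe") == EARTH:
            yield value["latitude"], value["longitude"]


def count_grid(lines, width, height, site=None):
    """Count coordinates per pixel; optionally keep only items linked to `site`."""
    grid = [[0] * width for _ in range(height)]
    for line in lines:
        line = line.strip().rstrip(",")
        if not line or line in ("[", "]"):
            continue  # array brackets and blank lines carry no entity
        entity = json.loads(line)
        if site is not None and site not in entity.get("sitelinks", {}):
            continue
        for lat, lon in earth_coordinates(entity):
            # Direct mapping of degrees to pixels, clamped to the image.
            x = max(0, min(int((lon + 180.0) / 360.0 * width), width - 1))
            y = max(0, min(int((90.0 - lat) / 180.0 * height), height - 1))
            grid[y][x] += 1
    return grid
```

Calling `count_grid(open_dump_lines, 1440, 720, site="enwiki")` on an iterable of dump lines (here a hypothetical `open_dump_lines`) would give the per-pixel counts for items with an English Wikipedia article; passing `site=None` counts all Wikidata items.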

License

The images have been created by Markus Krötzsch. The source data is published under Creative Commons CC-0, and Wikidata Toolkit is free software (License: Apache 2.0). Running the software does not constitute an act of creativity, and indeed the images can be recreated by anybody using the same software. The images are therefore released under Creative Commons CC-0 as well. However, if you find them useful, links to this page and to Wikidata Toolkit are still appreciated.

Number of geoitems for all Wikimedia projects

Below is a table that gives the total number of all pages with associated geographic coordinates for each Wikimedia project. See List of Wikipedias for the meaning of the language codes.