Recently I came across an article that used the GitHub API to scrape the number of users in major cities in the US. It gave me the idea to take this a little further and map the number of users in each city across the UK or, perhaps more importantly, the percentage of people in each city with a GitHub account. Before we begin, let me point out the obvious flaws in this methodology:

Populations are estimates (plus the UK census is now 4 years old)

Populations for cities can be difficult to define (city, urban, metro area)

GitHub accounts can be owned by companies as well as people

Not everyone gives a location on their GitHub account, and people may give a false location or not keep it up to date

Having said this, it's still interesting to explore the available data and try to spot and explain any patterns. Plus, it's fun!

Data Scraping

First, to get the data into a format that could be mapped, I created a list of the cities I was interested in and assigned each one its population (taken from Wikipedia).
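A city list like this maps naturally onto an array of objects. The sketch below is illustrative only: the original script is written in Python, whereas this uses JavaScript to match the map code later in the post. The names `cities` and `locationVariants` are hypothetical, the populations are taken from the results table, and the spelling suffixes simply mirror the London example in the next paragraph (whether they suit every city is an assumption).

```javascript
// Hypothetical city list: name, country label and population
// (population figures copied from the results table in this post)
const cities = [
  { name: "Cambridge", country: "England", population: 128515 },
  { name: "London", country: "England", population: 9787426 },
  { name: "Edinburgh", country: "Scotland", population: 782000 },
];

// Assumed location spellings for a city: "<City>, <Country>" plus
// UK-wide variants, mirroring the London example in the post
function locationVariants(city) {
  const suffixes = [city.country, "Great Britain", "United Kingdom", "UK"];
  return suffixes.map((suffix) => `${city.name}, ${suffix}`);
}
```

For example, `locationVariants(cities[1])` yields the four London spellings listed below.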

Then, using Python and the GitHub API, I scraped the number of accounts whose location matched each town name. Here it was necessary to try several different location strings to get accurate data. For example, for London I had to try "London, England", "London, Great Britain", "London, United Kingdom" and "London, UK", as these are all valid ways of describing the same place. You will need a GitHub account and an access token to avoid rate limiting.
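The counting step can be sketched against the GitHub Search API (`GET /search/users` with a `location:` qualifier), whose response includes a `total_count` field. This is a JavaScript approximation rather than the original Python script: the helper names are hypothetical, the token is assumed to be a personal access token, and counts for the different spellings are simply summed, so a user matched by more than one spelling would be counted twice.

```javascript
// Sketch: count GitHub users whose profile location matches any of several
// spellings of a city, via the GitHub Search API (GET /search/users).
const API = "https://api.github.com/search/users";

// Build a search URL for one location string, e.g. 'London, UK'.
// per_page=1 keeps the payload small; total_count is returned regardless.
function buildSearchUrl(location) {
  const q = `location:"${location}"`;
  return `${API}?q=${encodeURIComponent(q)}&per_page=1`;
}

// Sum total_count over every spelling of a city. Assumption: spellings do
// not overlap. fetchJson is injected so this can be tested offline.
async function countUsers(variants, fetchJson) {
  let total = 0;
  for (const variant of variants) {
    const body = await fetchJson(buildSearchUrl(variant));
    total += body.total_count;
  }
  return total;
}

// Real fetcher using the global fetch (Node 18+); token is a personal
// access token, which raises the Search API rate limit.
function makeGitHubFetcher(token) {
  return async (url) => {
    const res = await fetch(url, {
      headers: {
        Accept: "application/vnd.github+json",
        Authorization: `Bearer ${token}`,
      },
    });
    if (!res.ok) throw new Error(`GitHub API returned ${res.status}`);
    return res.json();
  };
}
```

Calling `countUsers(["London, England", "London, UK"], makeGitHubFetcher(token))` returns a promise for the combined count. Injecting the fetcher keeps the counting logic separate from the network and authentication details.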

The Results

City                        GitHub Accounts   City Population   Rate (%)
Cambridge, England          1313              128515            1.022
Brighton, England           588               163000            0.361
Oxford, England             551               171380            0.322
Bath, England               231               88859             0.260
Reading, England            291               160825            0.181
Durham, England             68                48069             0.141
Bristol, England            837               617000            0.136
York, England               260               204439            0.127
Norwich, England            165               140452            0.117
Edinburgh, Scotland         801               782000            0.102
London, England             9291              9787426           0.095
Glasgow, Scotland           558               589900            0.095
Dundee, Scotland            133               153990            0.086
Exeter, England             98                121800            0.080
Belfast, Northern-Ireland   216               276705            0.078
Bangor, Wales               11                16358             0.067
Aberdeen, Scotland          125               189120            0.066
Cardiff, Wales              283               447287            0.063
Bournemouth, England        116               183491            0.063
Sheffield, England          362               640720            0.056
Nottingham, England         389               729977            0.053
Liverpool, England          236               466415            0.051
Manchester, England         1291              2553379           0.051
Plymouth, England           122               256600            0.048
Swansea, Wales              101               239023            0.042
Newcastle, England          351               879996            0.040
Southampton, England        312               855569            0.036
Inverness, Scotland         21                57960             0.036
Leicester, England          143               509000            0.028
Leeds, England              496               1777934           0.028
Gloucester, England         29                125649            0.023
Warwick, England            29                139396            0.021
Birmingham, England         503               2440986           0.021
Newport, Wales              26                145700            0.018
Derry, Northern-Ireland     6                 83652             0.007
Aylesbury, England          13                184560            0.007
Lisburn, Northern-Ireland   4                 71403             0.006

Top of the list is Cambridge, with just over 1% of the population having a GitHub account. Lowest is Lisburn at 0.006%, closely followed by Aylesbury (where I live) and Derry at 0.007%. To put this into perspective, the original Hirily analysis found that 3% of San Francisco's population had a GitHub account!
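The Rate column is the number of accounts as a percentage of the city's population, so Cambridge's 1313 accounts out of 128515 people gives 1.022%. A minimal sketch of that calculation and the ranking, with hypothetical function names and rounding to three decimal places to match the table:

```javascript
// Rate = GitHub accounts as a percentage of population, rounded to 3 dp
function githubRate(accounts, population) {
  if (population <= 0) throw new RangeError("population must be positive");
  return Math.round((accounts / population) * 100 * 1000) / 1000;
}

// Attach a rate to each city and sort highest first, as in the table
function rankCities(cities) {
  return cities
    .map((c) => ({ ...c, rate: githubRate(c.accounts, c.population) }))
    .sort((a, b) => b.rate - a.rate);
}
```

Rounding only when computing the displayed value is a presentation choice; ties such as London and Glasgow (both 0.095%) differ slightly before rounding.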

Making the Map

The script outputs a CSV, which I then uploaded into the ArcGIS Online content pane using a developer account. When uploading the CSV we can choose to geocode the city column, which converts each city name into a latitude and longitude and in turn allows us to map the data.
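One detail worth getting right in the CSV is that city names such as "Cambridge, England" contain a comma, so that field must be quoted or the geocoder will see extra columns. A minimal sketch of RFC 4180-style quoting; the column names `City`, `Accounts` and `Population` are hypothetical (only `Rate` appears in the map code), and this is not the original script's output code.

```javascript
// Quote a CSV field if it contains a comma, double quote or newline,
// doubling any embedded quotes (RFC 4180-style)
function csvField(value) {
  const s = String(value);
  return /[",\n\r]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
}

// Turn an array of row objects into CSV text with a header row
function toCsv(rows, columns) {
  const lines = [columns.map(csvField).join(",")];
  for (const row of rows) {
    lines.push(columns.map((col) => csvField(row[col])).join(","));
  }
  return lines.join("\n");
}
```

With this, the Cambridge row is written as `"Cambridge, England",1313,128515,1.022`, so the whole city name stays in the column chosen for geocoding.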

The upload process then offers to let you review the geocoded locations (worthwhile, as some points can end up in the wrong place). Once this was done, I had a Feature Service of the data (a REST endpoint we can fetch our data from). From there I brought it into an Esri Leaflet map (one of Esri's GitHub projects!). The bulk of the mapping is done by the JavaScript code below:
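Because the Feature Service is just a REST endpoint, its data can also be read directly, outside Leaflet. A minimal sketch, assuming the layer's standard `query` operation accepts `where=1=1` (match every row), `outFields=*` (return every attribute) and `f=geojson` (GeoJSON output); the helper names are hypothetical and the fetcher is injected so the logic can be checked offline.

```javascript
// Build a query URL that returns every feature of a Feature Service layer
// as GeoJSON
function buildQueryUrl(layerUrl) {
  const params = new URLSearchParams({ where: "1=1", outFields: "*", f: "geojson" });
  return `${layerUrl}/query?${params.toString()}`;
}

// Fetch the layer's features; fetchJson could be
// (url) => fetch(url).then((r) => r.json()) in Node 18+ or a browser
async function fetchFeatures(layerUrl, fetchJson) {
  const collection = await fetchJson(buildQueryUrl(layerUrl));
  return collection.features;
}
```

Each returned feature carries the CSV columns in its `properties`, which is where the map code reads `Rate` from.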

var map = L.map('map').setView([54.514, -2.122], 6);

// Grey basemap with a separate labels layer on top
L.esri.basemapLayer("Gray").addTo(map);
L.esri.basemapLayer("GrayLabels").addTo(map);

// Feature Service created from the geocoded CSV
var ukGitHub = "http://services1.arcgis.com/Q6SkXeZHDxVxhXA4/arcgis/rest/services/GitHub_Data/FeatureServer/0";

var gh = L.esri.featureLayer(ukGitHub, {
  // Draw each city as a GitHub icon sized by its Rate (% of population with an account)
  pointToLayer: function (geojson, latlng) {
    var rate = geojson.properties.Rate;
    var size;
    // No upper/lower bounds on the outer classes, so every rate gets a size
    if (rate >= 0.361) {
      size = [65, 63];
    } else if (rate >= 0.181) {
      size = [55, 53];
    } else if (rate >= 0.095) {
      size = [45, 43];
    } else if (rate >= 0.046) {
      size = [35, 33];
    } else {
      size = [25, 23];
    }
    return L.marker(latlng, {
      icon: L.icon({
        iconUrl: 'imgs/github4.png',
        iconSize: size,
        iconAnchor: [size[0] / 2, size[1] / 2], // centre the icon on the city
        popupAnchor: [0, -11]
      })
    });
  }
}).addTo(map);
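The if/else chain in `pointToLayer` can also be written as a small table of class breaks, which keeps the thresholds in one place and lets them be tested without Leaflet. A minimal sketch; `SIZE_CLASSES` and `iconSizeForRate` are hypothetical names, and the thresholds are the ones used in the map code (0.361, 0.181 and 0.095 appear to coincide with the rates of Brighton, Reading and London in the results table).

```javascript
// Class breaks, highest first: [minimum rate in %, icon size in pixels]
const SIZE_CLASSES = [
  [0.361, [65, 63]],
  [0.181, [55, 53]],
  [0.095, [45, 43]],
  [0.046, [35, 33]],
];
const SMALLEST = [25, 23];

// Return the icon size for the first break the rate reaches;
// anything below the last break (or a missing value) gets the smallest icon
function iconSizeForRate(rate) {
  for (const [min, size] of SIZE_CLASSES) {
    if (rate >= min) return size;
  }
  return SMALLEST;
}
```

Inside `pointToLayer`, `var size = iconSizeForRate(geojson.properties.Rate);` would then replace the whole chain, and changing the classification only means editing the array.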

Screenshot and Live Demo

A screenshot of the map can be seen below, and a live demo can be seen here.

Where's the code?

You can find the code on my GitHub account: JamesMilnerUK/github-mapping
