When scraping data from the web, you will quite often get address information but no geo-coordinates, which may be required for further analysis such as clustering. Geocoding is therefore often needed to obtain a location's coordinates from its address.

There are several options, one of the most popular being the Google Geocoding API. It can easily be used from R via the function geocode() from the ggmap package. When used free of charge, it is limited to 2,500 requests per day (see Google's usage limits for details).

To geocode more addresses free of charge, the OpenStreetMap (OSM) Nominatim API can be used instead. OSM allows up to 1 request per second (see the usage policy), i.e. up to 86,400 requests per day, which is about 35 times more than the 2,500 daily calls of the free Google Geocoding API.
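To respect that limit when geocoding many addresses in a loop, the pause between calls can be enforced on the client side. The following is a minimal sketch, not part of the original post: make_throttle() is a hypothetical helper that returns a function which, when called before each request, sleeps just long enough to keep at least min_interval seconds between consecutive calls.

```r
# Hypothetical helper (not from the original post): returns a function that
# blocks until at least `min_interval` seconds have passed since its last call.
make_throttle <- function(min_interval = 1) {
  last_call <- NULL  # time of the previous call; NULL before the first call
  function() {
    if (!is.null(last_call)) {
      elapsed <- as.numeric(difftime(Sys.time(), last_call, units = "secs"))
      # sleep only for whatever remains of the interval
      if (elapsed < min_interval) Sys.sleep(min_interval - elapsed)
    }
    last_call <<- Sys.time()  # remember when this call happened
    invisible(NULL)
  }
}

# usage: create one throttle and call wait() right before every API request
wait <- make_throttle(min_interval = 1)
```

For example, calling wait() right before each geocoding call inside lapply() keeps the request rate at or below one per second. Unlike a fixed Sys.sleep(1), it adds no pause when a request already took longer than the interval.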

Here is one way to use the OSM Nominatim API from R:

```r
## geocoding function using OSM Nominatim API
## details: http://wiki.openstreetmap.org/wiki/Nominatim
## made by: D.Kisler

nominatim_osm <- function(address = NULL)
{
  # no address provided: return an empty data.frame
  if (is.null(address)) return(data.frame())
  # build the search URL; URLencode() escapes spaces, commas and non-ASCII characters
  url <- paste0("https://nominatim.openstreetmap.org/search?q=",
                utils::URLencode(address, reserved = TRUE),
                "&format=json&addressdetails=0&limit=1")
  # query the API; on any error (e.g. network failure) fall back to an empty data.frame
  d <- tryCatch(jsonlite::fromJSON(url), error = function(e) data.frame())
  # no match found for the address
  if (length(d) == 0) return(data.frame())
  return(data.frame(lon = as.numeric(d$lon), lat = as.numeric(d$lat)))
}
```
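Replacing whitespace with %20, as a simple gsub() does, leaves commas and non-ASCII characters (such as the Cyrillic address used in the test) unescaped in the query string. Base R's utils::URLencode() with reserved = TRUE percent-encodes every character outside the unreserved set (letters, digits, ".", "_", "~", "-"). The sketch below compares the two; build_nominatim_url() is a hypothetical helper for illustration, and it assumes the q= free-form query parameter of the Nominatim search endpoint.

```r
# Hypothetical helper (not from the original post): builds a Nominatim search URL
# for a single address string, percent-encoding it with base R's URLencode().
build_nominatim_url <- function(address) {
  query <- utils::URLencode(address, reserved = TRUE)
  paste0("https://nominatim.openstreetmap.org/search?q=", query,
         "&format=json&addressdetails=0&limit=1")
}

# whitespace-only replacement: the comma is left as-is
gsub("\\s+", "%20", "Baker Street 221b, London")
# [1] "Baker%20Street%20221b,%20London"

# full percent-encoding: the comma becomes %2C
utils::URLencode("Baker Street 221b, London", reserved = TRUE)
# [1] "Baker%20Street%20221b%2C%20London"
```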

The function nominatim_osm() requires the jsonlite package.

Function input: the location address as a string.

Function output: a data.frame with lon (longitude) and lat (latitude) of the input location, or an empty data.frame if no address or an invalid address is provided.
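The output format follows from how jsonlite parses the API response. The offline sketch below parses a trimmed, hand-written response body (an illustrative assumption: real Nominatim responses carry more fields, and the values here are simply taken from the test output in this post). It shows why lon and lat are converted with as.numeric(), since Nominatim returns them as JSON strings, and why a no-match result is detected with length(d) == 0.

```r
library(jsonlite)

# hand-written, trimmed example of a response body (illustrative values only)
response <- '[{"lat": "51.52376", "lon": "-0.1584945"}]'
d <- fromJSON(response)
class(d)      # "data.frame"
class(d$lon)  # "character", hence the as.numeric() conversion
coords <- data.frame(lon = as.numeric(d$lon), lat = as.numeric(d$lat))

# no match: the API returns an empty JSON array, which parses to a zero-length object
empty <- fromJSON("[]")
length(empty) == 0  # TRUE
```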

Let’s test the function.

```r
# dplyr is used to stack the list of results into a data.frame and for the pipe operator '%>%'
suppressPackageStartupMessages(library(dplyr))

# input addresses
addresses <- c("Baker Street 221b, London",
               "Brandenburger Tor, Berlin",
               "Platz der Deutschen Einheit 1, Hamburg",
               "Arc de Triomphe de l’Etoile, Paris",
               "Дворцовая пл., Санкт-Петербург, Россия")

d <- suppressWarnings(lapply(addresses, function(address) {
  # start the elapsed-time counter
  t <- Sys.time()
  # call the OSM Nominatim API
  api_output <- nominatim_osm(address)
  # get the elapsed time in seconds
  t <- difftime(Sys.time(), t, units = 'secs')
  # wait 1 second so that consecutive requests respect the usage policy
  Sys.sleep(1)
  # return a data.frame with the input address, the nominatim_osm output and the elapsed time
  return(data.frame(address = address, api_output, elapsed_time = t))
}) %>%
  # stack the list output into a data.frame
  bind_rows() %>%
  data.frame())

# print the data.frame to the console
d
```

```
                                 address        lon      lat   elapsed_time
1              Baker Street 221b, London -0.1584945 51.52376 0.2216313 secs
2              Brandenburger Tor, Berlin 13.3777025 52.51628 0.1038268 secs
3 Platz der Deutschen Einheit 1, Hamburg  9.9842058 53.54129 0.1253307 secs
4     Arc de Triomphe de l’Etoile, Paris  2.2950372 48.87378 0.1097755 secs
5 Дворцовая пл., Санкт-Петербург, Россия 30.3151066 59.93952 0.1000750 secs
```
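When a lookup fails, nominatim_osm() returns a zero-row data.frame, so the number of result rows can stop matching the number of input addresses. One way to keep exactly one row per address is to substitute NA coordinates for empty results. The sketch below is an assumption, not the author's code: geocode_or_na() is a hypothetical wrapper, and the geocoder is passed in as an argument (for example nominatim_osm), which also lets the logic be tried offline with a stub.

```r
# Hypothetical wrapper (not from the original post): always returns exactly
# one row per address, using NA coordinates when the geocoder finds nothing.
geocode_or_na <- function(address, geocoder) {
  res <- geocoder(address)
  if (nrow(res) == 0) res <- data.frame(lon = NA_real_, lat = NA_real_)
  # keep only the first match and the coordinate columns
  data.frame(address = address, res[1, c("lon", "lat")], row.names = NULL)
}

# offline stub geocoder that only "knows" one address (illustration only)
stub <- function(address) {
  if (address == "Brandenburger Tor, Berlin") data.frame(lon = 13.3777025, lat = 52.51628)
  else data.frame()
}

out <- do.call(rbind, lapply(c("Brandenburger Tor, Berlin", "nowhere"),
                             geocode_or_na, geocoder = stub))
out
```

With the real API, the call would be lapply(addresses, geocode_or_na, geocoder = nominatim_osm), ideally combined with a one-second pause between requests.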