Data is spread across the web in different formats and an analyst needs to parse this data to build a local datastore. JSON is one of the most popular data exchange formats on the web today and many API services will return data in this format. Hence, it is important to know how to read and process JSON data.

In this article, we will begin putting together a Fantasy Soccer team for the English Premier League. The inspiration comes from this article by Bill Mill, who has analyzed the data using Python. We will be analyzing the dataset in R.

Quick View

This is a data sample of player data from the Fantasy League. This is the raw data we will be using for our analysis.

Setting Up An R JSON Environment

Using CRAN packages is one of the best ways to read JSON in R. Here, we will install the jsonlite package.

install.packages("jsonlite")

Once installed, we begin using the package by attaching it to the search path.

library(jsonlite)
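If the script is run more than once, reinstalling the package each time is unnecessary. A minimal sketch of a guarded install, using base R's requireNamespace to check whether jsonlite is already available:

```r
# Install jsonlite only if it is not already available, then attach it
if (!requireNamespace("jsonlite", quietly = TRUE)) {
  install.packages("jsonlite")
}
library(jsonlite)
```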

Reading JSON

The jsonlite package provides the fromJSON function to read JSON strings. We will use this function to read the JSON from the web API.

url <- "http://fantasy.premierleague.com/web/api/elements/"
# Concatenate the URL and the player id to fetch that player's data
names(fromJSON(paste0(url, 1)))

This code reads the JSON data and prints the names of its keys, as shown:

TIP: paste0 is shorthand for paste with sep = "". It concatenates strings with no separator, whereas paste by default inserts a single space between them.

url <- "http://fantasy.premierleague.com/web/api/elements/"
## Both paste commands produce the same output
paste(url, 1, sep = "")
paste0(url, 1)
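To see why the separator matters here, the following sketch compares the three calls. The variable names are illustrative only; with the default separator, paste would put a space between the URL and the player id and produce a broken address.

```r
url <- "http://fantasy.premierleague.com/web/api/elements/"

with_space <- paste(url, 1)            # default sep = " " adds a space before the id
no_sep     <- paste(url, 1, sep = "")  # explicit empty separator
short      <- paste0(url, 1)           # same result as the line above

identical(no_sep, short)               # TRUE
endsWith(with_space, " 1")             # TRUE: the unwanted space is present
```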

Each player record in this dataset has 59 fields. To see the output in a formatted layout, run the following and see what you get.

url <- "http://fantasy.premierleague.com/web/api/elements/"
toJSON(fromJSON(paste0(url, 1)), pretty = TRUE)
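Since fromJSON also accepts a JSON string directly, the same calls can be tried offline. A minimal sketch, assuming a made-up player record (the field names are borrowed from this article, the values are hypothetical); auto_unbox = TRUE is an optional toJSON argument that writes length-one vectors as scalars rather than one-element arrays:

```r
library(jsonlite)

# Hypothetical player record: field names from the article, values made up
sample_json <- '{"id": 1, "first_name": "Example", "second_name": "Player", "total_points": 10}'

player <- fromJSON(sample_json)  # a flat JSON object becomes a named list
names(player)                    # the keys, in document order
player$total_points              # access a single field by name

# Formatted output; auto_unbox keeps scalars from being wrapped in arrays
toJSON(player, pretty = TRUE, auto_unbox = TRUE)
```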

We’ve identified that there are 567 players in the player database. Now, we want to build a local R dataset using this API. To do this, we run the following code:

## List of relevant fields we are interested in
relevantFields <- c("points_per_game", "total_points", "type_name",
                    "team_name", "team_code", "team_id",
                    "id", "status", "first_name", "second_name",
                    "now_cost", "value_form", "team",
                    "ep_next", "minutes", "goals_scored",
                    "assists", "clean_sheets", "goals_conceded",
                    "own_goals", "penalties_saved", "penalties_missed",
                    "yellow_cards", "red_cards", "saves",
                    "bonus", "bps", "ea_index",
                    "value_season", "selected_by")
## Fetch one player's record and keep only the relevant fields;
## returns NULL if the request fails
fetchData <- function(i) {
  res <- try(fromJSON(paste0(url, i)))
  if (inherits(res, "try-error")) return(NULL)
  res[names(res) %in% relevantFields]
}
## Fetch all 567 players, then row-bind the records into one data frame
allplayerdata <- lapply(1:567, fetchData)
allplayerdata <- do.call(rbind, lapply(allplayerdata, data.frame, stringsAsFactors = FALSE))

This code fetches each player's data from the web API in turn and combines the results into the data frame allplayerdata. If you wish to track the code's performance, you can save all the code in an R script file (extension: .R) and time it with system.time(), for example by passing it a source() call on that file.
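The parse, filter, and bind pattern can also be tried without network access. The sketch below is illustrative only: the JSON strings are hypothetical stand-ins for API responses (one is deliberately malformed to simulate a failed fetch), parseRecord is a made-up helper name, and the whole step is wrapped in system.time() to show how timing works.

```r
library(jsonlite)

# Hypothetical stand-ins for API responses; the third is malformed on purpose
responses <- c(
  '{"id": 1, "second_name": "Alpha", "total_points": 10, "photo": "a.jpg"}',
  '{"id": 2, "second_name": "Beta", "total_points": 7, "photo": "b.jpg"}',
  '{"id": 3, "second_name": '
)
relevantFields <- c("id", "second_name", "total_points")

# Parse one response and keep the relevant fields; NULL signals a failed parse
parseRecord <- function(txt) {
  res <- try(fromJSON(txt), silent = TRUE)
  if (inherits(res, "try-error")) return(NULL)
  res[names(res) %in% relevantFields]
}

elapsed <- system.time({
  records <- lapply(responses, parseRecord)
  records <- Filter(Negate(is.null), records)  # drop failed records before binding
  players <- do.call(rbind, lapply(records, data.frame, stringsAsFactors = FALSE))
})

nrow(players)       # 2: the malformed record was skipped
elapsed["elapsed"]  # wall-clock seconds for the whole step
```

Dropping the NULL entries explicitly before binding makes it easy to count how many records failed, which is useful when the loop runs over several hundred player ids.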

In the next articles in this series, we will cover data reshaping, visualizations and linear optimization modeling.

Key Takeaways