Dealing with JSON data; pulling it in and reading it was something I strugled with significantly when I began coding. This article is for anyone who is facing that issue or just wants a diffrent point of view.

Recently I was challenged to create a sentiment analysis from Twitter data; so I created this api endpoint and it can be used for this example.

http://www.sportsbookanalysis.com/restapi/tweets/?format=json

To access this data we need json and request libraries or we can use the built in pandas read_json() method. We will go through not using the pd.read_json() method because it is good practice and it is helpful know what is going on when using the data outside of pandas, such as in js. Moving from Python to Javascript learning how to deal with json objects in greater detail will lower make picking up js much easier.

In actual practice it clearly would be more sensible to just use the read_json method and then pass in the appropriate "orient" parameter.https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_json.html

import json, requests response = requests.get("http://www.sportsbookanalysis.com/restapi/tweets/?format=json") print(response) <Response [200]>

This <Response [200]> was one of my first "gothas" because I see that format=json, however this is just giving us a status code 200 which is a good thing to see in our case. However, to access the actual data we need .json()

response = requests.get("http://www.sportsbookanalysis.com/restapi/tweets/?format=json").json() # Dont print this as its fairly large

From here we can see what is going on with the data. It can be hit or miss, but the first thing I try is to just print out the keys of the data. If that fails it means the data is in a list so we can use indexing

print(response.keys()) #AttributeError: 'list' object has no attribute 'keys' print(len(response)) print(response[1250]) #output {'id': 42750, 'team': 'NYG', 'sport': 'NFL', 'tw_time': '2019-07-27T11:00:26Z', #'tweet': "Giants' Saquon Barkley rips Daniel Jones critics: You don't know football #nyg #giants #NFL https://t.co/5T6oz6sRc3", #'score': -0.296, 'followers': 3688} print(response[1250].keys()) #dict_keys(['id', 'team', 'sport', 'tw_time', 'tweet', 'score', 'followers']) #and finally we can do soemthing like for i in response[1250].keys(): print("{}:{}".format(i,response[1250][i])) #output id:42750 #team:NYG #sport:NFL #tw_time:2019-07-27T11:00:26Z #tweet:Giants' Saquon Barkley rips Daniel Jones critics: You don't know football #nyg #giants #NFL https://t.co/5T6oz6sRc3 #score:-0.296 #followers:3688

Going another step further we can extract all of the the team and score data into seperate lists and then assign it to empty dataframes.