Tutorial Outline

Twitter provides two types of APIs for accessing its data:

RESTful API: used to retrieve existing data objects such as statuses ("tweets"), users, etc.

Streaming API: used to receive live statuses ("tweets") as they are posted

Reasons you might want to use the Streaming API:

Capturing large amounts of data, since the RESTful API offers only limited access to older data

Real-time analysis, such as monitoring the social discussion around a live event

In-house archiving, such as archiving the social discussion about your brand(s)

An AI response system for a Twitter account, such as automated replies, filing questions, or providing answers

To follow this tutorial, you will need:

Python 2 or 3

Jupyter with ipywidgets

Pandas

NumPy

Matplotlib

MongoDB installation

PyMongo

Scikit-learn

Tweepy

Twitter account

How does it work?

The Twitter Streaming API delivers data through a streaming HTTP response. This is very similar to downloading a file, where you read a number of bytes, store them to disk, and repeat until the end of the file. The only difference is that this stream is endless. The only things that can stop it are:

You close your connection to the streaming response

Your connection is too slow to keep up with the incoming data, so the server's buffer fills up
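
As a rough illustration of the read-and-repeat loop described above, here is a minimal sketch in plain Python. It does not contact Twitter: `fake_status_stream` is a hypothetical stand-in for the endless HTTP response body, and `consume` and `max_statuses` are names made up for this example. Breaking out of the loop plays the role of closing the connection.

```python
import itertools
import json


def fake_status_stream():
    """Hypothetical stand-in for the endless response body.

    Yields one JSON-encoded status per iteration and never ends on its own.
    """
    for i in itertools.count():
        yield json.dumps({"id": i, "text": "status %d" % i}).encode("utf-8")


def consume(stream, max_statuses):
    """Read statuses one at a time until we decide to stop."""
    received = []
    for raw_line in stream:
        # Decode one chunk of bytes into a status, as you would with a file chunk.
        status = json.loads(raw_line.decode("utf-8"))
        received.append(status)
        if len(received) >= max_statuses:
            # Assumption for this sketch: leaving the loop is our way of
            # "closing the connection". Without it, the loop never ends.
            break
    return received


statuses = consume(fake_status_stream(), max_statuses=3)
print([s["id"] for s in statuses])  # prints [0, 1, 2]
```

Without the `break`, the `for` loop would run forever, which is exactly the behavior of a real streaming response.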

This means the stream will occupy the thread it was launched from until it is stopped. In production, you should always start it in a separate thread or process so that your software does not freeze until you stop the stream.
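
One possible way to keep the main thread free is sketched below using only the standard `threading` module. This is an assumption-laden illustration, not the tutorial's actual streaming code: `blocking_stream` is a hypothetical placeholder for whatever blocking call reads the stream, and `stop_event` and `collected` are names invented for the example.

```python
import threading
import time


def blocking_stream(stop_event, collected):
    """Hypothetical placeholder for a blocking stream reader.

    Loops until stop_event is set, pretending to receive one status per pass.
    """
    counter = 0
    while not stop_event.is_set():
        collected.append(counter)
        counter += 1
        time.sleep(0.01)  # simulate waiting for the next status


stop_event = threading.Event()
collected = []

# Run the blocking reader in a background thread so the caller is not frozen.
worker = threading.Thread(target=blocking_stream, args=(stop_event, collected))
worker.daemon = True  # do not keep the interpreter alive because of this thread
worker.start()

# The main thread stays responsive and can do other work here.
time.sleep(0.1)

# Ask the reader to stop, then wait briefly for it to finish.
stop_event.set()
worker.join(timeout=1.0)
print("worker alive:", worker.is_alive(), "statuses received:", len(collected))
```

The `threading.Event` gives the main thread a clean way to ask the reader to stop, which mirrors closing the connection from your side.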

You will need four credentials from the Twitter developer site to start using the Streaming API: a consumer key, a consumer secret, an access token, and an access token secret. First, let's import some important libraries for working with the Twitter API, data analysis, data storage, etc.