News Headline Analysis¶

In this project we're analyzing news headlines written by two journalists – a finance reporter from the Business Insider, and a celebrity reporter from the Huffington post – to find similarities and differences between the ways that these authors write headlines for their news articles and blog posts. Our selected reporters are:

Akin Oyedele the Business Insider who covers market updates; and

Carly Ledbetter from the Huffington Post who mainly writes about celebrities.

We're initially going to collect and parse news headline from each of the authors, to obtain a parse tree, and then we're going to extract certain information from these parse trees that are indicative of the overall structure of the headline.

Next, we will define a simple sequence similarity metric to compare any pair of headlines quantitatively, and we will apply the same method to all of the headlines we've gathered for each author, to find out how similar each pair of headlines is.

Finally, we're going to use K-Means and tSNE to produce a visual map of all the headlines, where we can see the similarities and the differences between the two authors more clearly.

For this project we've gathered 700 headlines for each author using the AYLIEN News API which we're going to analyze using Python. You can obtain the Pickled data files directly from the GitHub repository, or by using the data collection notebook that we've prepared for this project.