Methodology

Objective

The aim was to quantify bias in the election coverage. I concentrated on the following publications: The Irish Times, RTE, Irish Examiner, Independent, and TheJournal.ie.

Getting the Data

I first looked at doing a keyword search for each political party on each publication’s website, but this didn’t work for several reasons: advanced search was not always available, sorting was difficult, and the results mixed articles, headlines, and links.

Instead, I turned to Google and used the following search terms:

allintitle:“<political party>” OR “<party leader>” site:<media>

This returned only headlines, which made for a stricter and fairer way to count them.

The search was restricted to the period from the date the election was announced to the “moratorium”.

The initial terms for each political party were as follows:

allintitle:“Sinn Fein” OR “Gerry Adams” site:independent.ie

allintitle:“Sinn Fein” OR “Gerry Adams” site:rte.ie

allintitle:“Sinn Fein” OR “Gerry Adams” site:irishtimes.com

allintitle:“Sinn Fein” OR “Gerry Adams” site:irishexaminer.com

allintitle:“Sinn Fein” OR “Gerry Adams” site:thejournal.ie

These searches were then repeated for each political party.
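Typing these out by hand gets tedious, so here is a minimal sketch of how the full list of queries could be generated. The function name `build_query` and the party/leader dictionary are my own illustration, not part of the original workflow; the dictionary only lists the parties and leaders named in this write-up and would need extending for the rest. Straight quotes are used in place of the typographic ones shown above.

```python
from itertools import product

# Party -> leader pairs. Only names that appear in this write-up are listed;
# extend this dict to cover the remaining parties.
PARTIES = {
    "Sinn Fein": "Gerry Adams",
    "Fine Gael": "Enda Kenny",
    "Labour": "Joan Burton",
}

# The five outlets covered by the study.
SITES = [
    "independent.ie",
    "rte.ie",
    "irishtimes.com",
    "irishexaminer.com",
    "thejournal.ie",
]


def build_query(party, leader, site):
    """Return one Google query in the same shape as the examples above."""
    return f'allintitle:"{party}" OR "{leader}" site:{site}'


# One query per (party, outlet) combination.
queries = [
    build_query(party, leader, site)
    for (party, leader), site in product(PARTIES.items(), SITES)
]

for query in queries:
    print(query)
```

Each printed line can be pasted into Google as is; the date restriction still has to be set separately in the search tools.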

Scraping the Search results

Once the page was loaded with results, I needed to save it. I turned to the Data-Miner Chrome extension and a clean “Google SERP Detailed” recipe. This saved the list of results in a nice spreadsheet.

First Sentiment Analysis

After I had search results for all political parties across the 5 media outlets I had chosen, I started conducting a sentiment analysis on the headlines.

I used Semantria, first trying their Mac app and then moving to their Excel add-on on Windows.

While analysing the results, I realised there were some issues with the text: “Fine Gael” was scored as positive (“Fine”) and “Fianna Fail” as negative (“Fail”).

To correct for this, I would have to replace all occurrences of both party names.

I also felt that headlines, although they give some sense of the overall sentiment of an article, might not be the most accurate sampling material.

To extract articles, I used import.io. After showing it what elements I wanted on a page (e.g. the text of an article), I fed it the list of URLs from the initial Google search.

It exported all the articles, and I could save them in Excel with the rest of the results (headlines, URLs, etc.).

To avoid a bad classification, I replaced all “Fine Gael” with “FG” and all “Fianna Fail” with “FF”.
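For anyone wanting to do the same replacement outside Excel, a rough sketch in Python could look like this. The helper names are hypothetical, and the toy word-list scorer is only there to illustrate the “Fine”/“Fail” problem; it is an assumption for demonstration and not how Semantria actually scores text. The pattern also covers the accented spelling “Fianna Fáil”, which shows up in the scraped descriptions.

```python
import re
import unicodedata

# Patterns for both party names; the second also matches the accented "Fáil".
REPLACEMENTS = [
    (re.compile(r"\bFine\s+Gael\b", re.IGNORECASE), "FG"),
    (re.compile(r"\bFianna\s+F[aá]il\b", re.IGNORECASE), "FF"),
]


def neutralise_party_names(text):
    """Swap the two party names for abbreviations that carry no sentiment."""
    # Normalise first so that "á" is a single character, however it was encoded.
    text = unicodedata.normalize("NFC", text)
    for pattern, abbreviation in REPLACEMENTS:
        text = pattern.sub(abbreviation, text)
    return text


# Toy lexicon, purely illustrative: NOT Semantria's scoring method.
TOY_LEXICON = {"fine": 1, "fail": -1}


def toy_score(text):
    """Sum the lexicon values of every word in the text."""
    words = re.findall(r"[a-záéíóú]+", text.lower())
    return sum(TOY_LEXICON.get(word, 0) for word in words)


headline = "Fine Gael’s Tom Barry hopes to be standing tall after vote"
cleaned = neutralise_party_names(headline)

print(cleaned)
print(toy_score(headline), toy_score(cleaned))
```

With this toy scorer the original headline picks up a positive point from “Fine” alone, while the cleaned version scores neutral, which is the effect the replacement is meant to remove.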

Second Sentiment Analysis

Semantria has a 2048-character limit on text to be analysed during the trial, and an account is a grand a month, so I had to find another tool.

After some research, I came across this fantastic (and free!) tool by HP:

HPE Haven OnDemand.

It offers many APIs, such as Document Categorisation, Language Identification, Sentiment Analysis, Entity Extraction…

Getting the results

After collecting documents and analysing them, we get results:

Preliminary research: Headline & Description analysis

As outlined above, these results are incomplete and unsatisfactory, because some terms had not been cleaned up when the sentiment analysis ran, but I’m including them here nonetheless.

Headline analysis was run on scraped headlines like these examples:

-“Fine Gael’s Tom Barry hopes to be standing tall after vote”

-“Enda Kenny in eleventh-hour email snub to Labour”

-“Enda Kenny and Joan Burton share a cuppa and a possible final farewell”

Sentiment Analysis on headlines:

Description analysis was run on scraped samples like these:

-Feb 26, 2016 — Is it a sign Civil War politics are at an end? The leaders of Fine Gael and Fianna Fáil cast their votes today, wearing the colours of the other party.

-Feb 26, 2016 — There has been much talk about the possibility that Fianna Fáil and Fine Gael will enter into a coalition government following the election. The issue can be …

-Feb 25, 2016 — Taoiseach Enda Kenny has been shaking a lot of hands over the past three weeks, including some very recognisable faces.

Sentiment Analysis on descriptions:

Although these results are incomplete and come from a different engine than the one used later for full articles, the same trend emerges across the data sets.