Whilst I usually spend my spare time reading highbrow literature, attending poetry readings and sipping fine wine, of late I have been spending my evenings watching Love Island. For those who have been living under a rock for the last 3 years (or aren’t from the UK!) and haven’t heard of Love Island, I shall quickly fill you in: it’s a reality TV show that films contestants 24 hours a day, 7 days a week, the hypothetical end goal being to find your one true love. Contestants are picked mainly for their looks/image rather than anything else, a topic which has consistently caused controversy over the years. They spend 8 weeks in a villa in Majorca over the summer, forming relationships and battling it out as couples to gain the public’s affection for a chance to win £50,000.

We are now into the 5th series of Love Island; however, it only really gained significant mainstream success during series 3 and, given the main target audience (the Instagram generation), this brought a huge wave of sponsorship and product placement. With all this extra sponsorship, the ad breaks have grown in length, which has caused me to look for something to fill the time that is at least slightly more productive than being advertised suncream. I quickly landed on the idea of seeing what data I could collect on Love Island to start a little “adbreak analysis project”.

The most obvious place to collect data from was Twitter: given the huge social media following the show has and the fact that every episode is full of controversy and gossip, viewers love talking about it with strangers online. Across the 6 days the show was on in the first week of the series, there were on average 63,992.8 tweets per day that explicitly referenced the show via the hashtag #LoveIsland, so I am going to have plenty of data to work with. Love Island is on every night (with the exception of Saturday), which leaves a lot of ad breaks to fill, so I’m going to split this project up a bit; this first part will focus on tweets sent before the programme starts. The first broadcast was scheduled for Monday 3rd June 2019, so we will scrape tweets from two weeks before then (starting from 20th May 2019). In this time the initial line-up of contestants was announced, the tabloids ran a lot of stories on the new series and the hype began to build all over social media.

The first wave of contestants for Love Island 2019

Firstly, before diving in deeper, let’s investigate whether we can see the hype building over time as we move towards the first airing. Plotting the number of tweets per day that mention Love Island, we see a huge spike on 27th/28th May; this was when the new series was officially confirmed, the broadcast date announced and the new cast revealed. In the week after the series was confirmed, each day saw well over double the number of tweets of the same day in the prior week.
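For anyone wanting to reproduce the daily counts and the week-on-week comparison, a minimal sketch might look like the one below. It assumes each scraped tweet is a dictionary with an ISO 8601 `created_at` string; that field name, the helper names and the sample data are all made up for illustration and are not taken from my actual scraper.

```python
from collections import Counter
from datetime import date, datetime, timedelta

def tweets_per_day(tweets):
    """Count tweets per calendar day from ISO 8601 'created_at' strings."""
    return Counter(datetime.fromisoformat(t["created_at"]).date() for t in tweets)

def week_on_week_ratio(daily_counts):
    """Divide each day's count by the count seven days earlier, where one exists."""
    ratios = {}
    for day, count in daily_counts.items():
        previous = daily_counts.get(day - timedelta(days=7))
        if previous:  # skip days with no data (or zero tweets) a week earlier
            ratios[day] = count / previous
    return ratios

# Made-up sample data, purely for illustration.
sample = [
    {"created_at": "2019-05-21T20:15:00"},
    {"created_at": "2019-05-28T09:00:00"},
    {"created_at": "2019-05-28T21:30:00"},
    {"created_at": "2019-05-28T22:05:00"},
]
daily = tweets_per_day(sample)
ratios = week_on_week_ratio(daily)
print(daily[date(2019, 5, 28)], ratios[date(2019, 5, 28)])
```

A ratio above 2 for a given day corresponds to the "well over double" observation above.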

To see if the official launch had an effect on the content of the tweets as well as the number of them, we can search each one individually to see if it explicitly references the name of one of the original cast members. Plotting the percentage of tweets that reference an islander per day, we see a large increase from the 27th May onwards as tweets shift from talking about rumoured contestants to talking about the confirmed ones.
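The percentage of tweets per day that name an islander can be sketched roughly as follows. The name list here is only an illustrative subset, full-name matching is an assumption (matching on first names alone would catch more tweets but also more noise), and the `created_at` and `text` field names are hypothetical.

```python
import re
from collections import defaultdict
from datetime import date, datetime

# Illustrative subset of names only, not the full cast list.
NAMES = ["Tommy Fury", "Curtis Pritchard", "Anna Vakili"]
NAME_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(name) for name in NAMES) + r")\b",
    re.IGNORECASE,
)

def mentions_islander(text):
    """True if the text contains any of the names as whole words."""
    return NAME_PATTERN.search(text) is not None

def daily_mention_share(tweets):
    """Percentage of each day's tweets that mention at least one islander."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for tweet in tweets:
        day = datetime.fromisoformat(tweet["created_at"]).date()
        totals[day] += 1
        if mentions_islander(tweet["text"]):
            hits[day] += 1
    return {day: 100 * hits[day] / totals[day] for day in totals}

# Made-up sample data, purely for illustration.
sample = [
    {"created_at": "2019-05-26T19:00:00", "text": "Any rumours about who is going in? #LoveIsland"},
    {"created_at": "2019-05-28T09:10:00", "text": "Tommy Fury on #LoveIsland, really?"},
    {"created_at": "2019-05-28T12:45:00", "text": "Cannot wait for the new series #LoveIsland"},
]
share = daily_mention_share(sample)
print(share[date(2019, 5, 28)])
```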

Of the original islanders announced, two already had a claim to celebrity status. Curtis Pritchard is a professional ballroom dancer who has appeared on two series of Dancing with the Stars and as a contestant on a celebrity version of Hunted. The other is Tommy Fury, a professional boxer and younger brother of the heavyweight champion Tyson Fury. If we count the number of tweets that mention each islander specifically, we see that these two stand out as generating the most conversation online. In general, the girls have more people tweeting about them than the guys; excluding the two mentioned, the guys average just over 100 tweets each whereas the girls average over 200 each.
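Counting tweets per islander is a small variation on the same idea. The sketch below assumes whole-word, case-insensitive matching on full names and counts a tweet once for every islander it mentions; the function name and the sample tweets are invented for the example.

```python
import re
from collections import Counter

def count_mentions(texts, names):
    """Count how many texts mention each name (a text can count for several names)."""
    patterns = {
        name: re.compile(r"\b" + re.escape(name) + r"\b", re.IGNORECASE)
        for name in names
    }
    counts = Counter({name: 0 for name in names})
    for text in texts:
        for name, pattern in patterns.items():
            if pattern.search(text):
                counts[name] += 1
    return counts

# Made-up sample data, purely for illustration.
texts = [
    "Tommy Fury is Tyson Fury's brother?! #LoveIsland",
    "Curtis Pritchard and Tommy Fury are both going in #LoveIsland",
    "So ready for this #LoveIsland",
]
counts = count_mentions(texts, ["Tommy Fury", "Curtis Pritchard", "Anna Vakili"])
print(counts.most_common())
```

Averaging these counts within each group (with the two outliers removed) gives the kind of guys-versus-girls comparison quoted above.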

To get a very simple insight into what is being spoken about other than the islanders themselves we can pull out bigrams and trigrams from our corpus, as well as the count of their occurrence.

From this list we can again see that ‘tommy_fury’ is generating a lot of tweets, with his name being referenced in full 414 times. We see a number of themes in the 15 most popular bigrams/trigrams; these include general anticipation of the new series (‘not_wait’, ‘start_tomorrow’, ‘new_series’, etc) and tweets about the cast (‘tommy_fury’, ‘caroline_flack’, ‘new_cast’, etc). We also see the theme of body image (‘plus_size’, ‘body_diversity’). As mentioned at the start of this post, the show has often been under fire for a lack of inclusivity, and there has been a push to include more body shapes and be more representative of society rather than pushing the idea of “the perfect body”. When the show’s line-up was revealed there was anger online about Anna Vakili supposedly being the “plus size” token contestant, when in reality she clearly is not. The show’s creative director defended the line-up, and the backlash to this can be seen in how popular the topic is. On top of this, the second most popular bigram is ‘mental_health’, which has been one of the main topics written about by the tabloids in the run-up to the series (on the back of two previous contestants committing suicide), so it is encouraging to see it here.
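Bigram and trigram counts like the ones discussed above can be produced with a few lines of plain Python. This is a sketch using simple sliding windows over already-cleaned tokens, joined with underscores to match labels such as ‘new_series’; it is not necessarily the exact tool used for the list above, and the sample tokens are invented.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return the n-grams of a token list as underscore-joined strings."""
    return ["_".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def count_ngrams(tokenised_tweets, sizes=(2, 3)):
    """Count n-grams of the given sizes across a list of tokenised tweets."""
    counts = Counter()
    for tokens in tokenised_tweets:
        for n in sizes:
            counts.update(ngrams(tokens, n))
    return counts

# Made-up, already-cleaned sample tokens, purely for illustration.
tokenised = [
    ["not", "wait", "new", "series"],
    ["tommy", "fury", "new", "series"],
    ["mental", "health", "matters"],
]
ngram_counts = count_ngrams(tokenised)
print(ngram_counts.most_common(3))
```

Counting within each tweet separately means no n-gram spans the boundary between two tweets.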

Moving away from this very simplistic proxy for a topic model to look at the raw counts of individual words, we don’t find anything unexpected. Having cleaned the tweets, removed stop words* and counted the occurrence of each token, we can plot the top 30 most commonly used words.

*On inspection we might not have been heavy-handed enough during this stage, given some of the words that have got through, so this step might need to be fine-tuned if anything more complex is to be done at a later date.

As mentioned, there is nothing unexpected in this graph. Even without the bigram/trigram analysis, one would be able to tell that these words come from tweets sent prior to the series starting, pulled from a corpus that is anticipating the start of something (in this case the summer of love!).
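The cleaning and counting steps behind a chart like this can be sketched as below. The stop-word list is a tiny stand-in and the cleaning rules (lower-casing, dropping links, mentions and hashtags) are assumptions for the example; as the footnote admits, the real list needs to be more heavy-handed.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list; a real one would be far longer.
STOP_WORDS = {"a", "the", "is", "for", "to", "of", "and", "i", "it", "on", "so"}

def clean_and_tokenise(text):
    """Lower-case, strip links/mentions/hashtags, and drop stop words."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # drop links
    text = re.sub(r"[@#]\w+", " ", text)       # drop mentions and hashtags
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text)
    return [token for token in tokens if token not in STOP_WORDS]

def top_words(texts, n=30):
    """Return the n most common tokens across all texts."""
    counts = Counter()
    for text in texts:
        counts.update(clean_and_tokenise(text))
    return counts.most_common(n)

# Made-up sample data, purely for illustration.
texts = [
    "So excited for the new series of #LoveIsland https://t.co/abc",
    "The new series starts tomorrow @LoveIsland",
]
top = top_words(texts, n=2)
print(top)
```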

Switching our attention to who is doing the tweeting rather than what is being tweeted about gives us some more very unsurprising results; we find that the vast majority of accounts that are tweeting the most are those of tabloid newspapers.
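Ranking accounts by how much they tweet needs nothing more than a counter keyed on the account name. In this sketch the `user` field name and the account names are hypothetical; the share of total volume is included because it shows how concentrated the tweeting is.

```python
from collections import Counter

def most_active_accounts(tweets, n=10):
    """Return (user, tweet count, percentage of all tweets) for the n busiest accounts."""
    counts = Counter(tweet["user"] for tweet in tweets)
    total = sum(counts.values())
    return [(user, count, 100 * count / total) for user, count in counts.most_common(n)]

# Made-up sample data, purely for illustration.
sample = [
    {"user": "tabloid_a"}, {"user": "tabloid_a"}, {"user": "tabloid_a"},
    {"user": "tabloid_b"}, {"user": "tabloid_b"},
    {"user": "fan_123"},
]
top_accounts = most_active_accounts(sample, n=2)
print(top_accounts)
```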

Whilst those tweeting the most are the tabloid press, the users generating the most activity are usually either users who have “got lucky”, with a particular tweet resonating with enough people to go viral, or accounts with a connection to the show whose follower base is interested in it. Plotting the number of favourites for the most favourited tweet each day, overlaid with the user and the tweet itself, we get the following chart.
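Picking out the most favourited tweet for each day is a simple "keep the best seen so far" loop. As before, this is a sketch: the `created_at`, `user`, `text` and `favourites` field names and the sample tweets are invented for illustration.

```python
from datetime import date, datetime

def top_tweet_per_day(tweets):
    """Map each calendar day to the tweet with the most favourites on that day."""
    best = {}
    for tweet in tweets:
        day = datetime.fromisoformat(tweet["created_at"]).date()
        # Keep the first tweet seen for a day, then replace it only if beaten.
        if day not in best or tweet["favourites"] > best[day]["favourites"]:
            best[day] = tweet
    return best

# Made-up sample data, purely for illustration.
sample = [
    {"created_at": "2019-05-27T18:00:00", "user": "fan_123",
     "text": "The line-up is out!", "favourites": 120},
    {"created_at": "2019-05-27T21:00:00", "user": "tabloid_a",
     "text": "Meet the new cast", "favourites": 15},
    {"created_at": "2019-05-28T08:30:00", "user": "show_presenter",
     "text": "One week to go", "favourites": 4300},
]
best = top_tweet_per_day(sample)
for day in sorted(best):
    print(day, best[day]["user"], best[day]["favourites"])
```

The resulting mapping has everything the chart needs: the day for the x-axis, the favourite count for the y-axis, and the user and text for the overlay.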