An interesting article was recently posted in MIT Technology Review about detecting tweets sent by people who have been drinking. What kind of metrics would help detect such tweets? We think the following might be useful:

- Local time (like late at night)
- Whether or not a picture is associated with the tweet
- Whether or not a link is associated with the tweet
- Number of typos in the tweet in question, compared with the average for the user in question
- Frequency of tweets (sudden spike) for the user in question
- Keywords (and misspelled keywords) typically found in such tweets across drunk users
- Time elapsed between successive tweets
- Keyword changes for the user in question
- Replies / re-tweets from other Twitter users (volume, do they contain specific keywords?)
- Gender and age, if available
- Location (different from usual location), assuming you can identify location
- The tone of the tweet
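A few of the metrics above can be computed directly from a tweet's text and timestamp. The following is a minimal Python sketch, not a tested detector: the `Tweet` record, the `extract_features` function, the late-night window (11 p.m. to 5 a.m.) and the keyword list are all assumptions made up for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# Hypothetical keyword list; a real one would be learned from labelled data.
DRUNK_KEYWORDS = {"drunk", "beer", "wine", "shots", "hangover"}


@dataclass
class Tweet:
    """Hypothetical tweet record; sent_at is assumed to be local time."""
    text: str
    sent_at: datetime
    has_picture: bool = False
    has_link: bool = False


def extract_features(tweet: Tweet, previous: Optional[Tweet] = None,
                     late_start: int = 23, late_end: int = 5) -> dict:
    """Compute a handful of the listed metrics for a single tweet."""
    # Crude tokenisation: split on whitespace, strip common punctuation.
    words = [w.strip(".,!?#@\"'").lower() for w in tweet.text.split()]
    hour = tweet.sent_at.hour
    # Time elapsed since the user's previous tweet, if we have one.
    gap = None
    if previous is not None:
        gap = (tweet.sent_at - previous.sent_at).total_seconds()
    return {
        "late_night": hour >= late_start or hour < late_end,
        "has_picture": tweet.has_picture,
        "has_link": tweet.has_link,
        "keyword_hits": sum(w in DRUNK_KEYWORDS for w in words),
        "seconds_since_previous": gap,
    }
```

Metrics such as typo rate, location change or tone would need a per-user baseline or extra tooling, so they are left out of this sketch.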

Which algorithm would you use? I am thinking about tweet indexation. The same NLP (natural language processing) technique can be applied to email messages and other texts produced by users, maybe even to detect whether a piece of code was written while the programmer was drunk. This would require a training set to train the algorithm.

But no matter which ML algorithm is used, you will need a training set anyway. So how do you create one in the first place? With a designed experiment, having 20 subjects write tweets when they are drunk and when they are sober, and comparing the two? I can't imagine how you could successfully do that, let alone draw useful machine learning rules from such an experiment. And what about people who only tweet when they are drunk?
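To make the role of the training set concrete, here is a rough sketch of one possible approach: a tiny bag-of-words naive Bayes classifier written in plain Python. Naive Bayes is my choice for illustration, not a method named in the article, and the function names (`tokenize`, `train`, `predict`) are hypothetical. The quality of the labelled examples, which is the hard part discussed above, is simply assumed.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase, split on whitespace, strip basic punctuation."""
    tokens = (w.strip(".,!?").lower() for w in text.split())
    return [t for t in tokens if t]


def train(examples):
    """Build word counts per label from (text, label) pairs."""
    word_counts = {}
    label_counts = Counter()
    vocab = set()
    for text, label in examples:
        label_counts[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for word in tokenize(text):
            counts[word] += 1
            vocab.add(word)
    return word_counts, label_counts, vocab


def predict(model, text):
    """Return the label with the highest log-probability for the text."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in label_counts.items():
        score = math.log(n / total)  # log prior
        # Add-one (Laplace) smoothing over the training vocabulary.
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in tokenize(text):
            if word in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

Usage would look like `model = train(labelled_tweets)` followed by `predict(model, "some new tweet")`, where `labelled_tweets` is a list of `(text, label)` pairs. The classifier can only be as good as those labels.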

Finally, the same 12 metrics listed above can be used to perform any kind of tweet analysis.

Active drunk Twitter users detected by ML algorithm

Here's the MIT article:

Sending your ex-partner a teary-eyed tweet at 1 a.m. after a bottle of chardonnay isn’t necessarily the best way of achieving reconciliation. We all know that alcohol and tweeting is not always a good combination.

Yet a surprising number of us indulge in this peculiar form of indiscretion. And this practice has given Nabil Hossain and pals at the University of Rochester an interesting idea.

Today, these guys show how they’ve trained a machine to spot alcohol-related tweets. And they also show how to use this data to monitor alcohol-related activity and the way it is distributed throughout society. They say the method could have a significant impact on the way we understand and respond to the public health issues that alcohol and other activities raise.

Hossain and co’s work is based on two breakthroughs. The first is a way to train a machine-learning algorithm to spot tweets that relate to alcohol and those sent by people drinking alcohol at the time. The second is a way to find a Twitter user’s home location with much greater accuracy than has ever been possible and therefore to determine whether they are drinking at home or not.

The team began by collecting geotagged tweets sent during the year up to July 2014 from New York City and from Monroe County on the northern border of the state, which includes the city of Rochester. From this set, they filter all the tweets that mention alcohol or alcohol-related words, such as drunk, beer, party, and so on.
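As an aside to the excerpt, the keyword filtering step it describes could be sketched as follows. This is only an illustration: the three keywords are the ones quoted above, the researchers' full word list is not given here, and the function names are hypothetical.

```python
import re

# Only the words quoted in the article; the real list is assumed to be longer.
ALCOHOL_WORDS = {"drunk", "beer", "party"}


def mentions_alcohol(text, keywords=ALCOHOL_WORDS):
    """True if any whole word in the tweet is in the keyword set."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return any(token in keywords for token in tokens)


def filter_tweets(tweets, keywords=ALCOHOL_WORDS):
    """Keep only the tweets that mention an alcohol-related word."""
    return [t for t in tweets if mentions_alcohol(t, keywords)]
```

Matching on whole words avoids false hits such as "beer" inside an unrelated longer word, but it also misses misspellings, which matter for this particular problem.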

They then used workers on Amazon’s Mechanical Turk crowdsourcing service to analyze the tweets in more detail. For each tweet, they asked three Turkers to decide whether the message referred to alcohol and if so whether it referred to the tweeter drinking alcohol. Finally, they asked whether the tweet was sent at the same time the tweeter was imbibing.
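The excerpt does not say how the three Turkers' answers were combined. One simple possibility, assumed here purely for illustration, is a majority vote per question, with each later question counting only if the earlier ones came out as "yes". The question keys and function names below are hypothetical.

```python
# Hypothetical keys for the three questions, in the order they were asked.
QUESTIONS = ("mentions_alcohol", "user_drinking", "drinking_now")


def majority_yes(answers):
    """True if strictly more than half of the answers are 'yes'."""
    answers = list(answers)
    return sum(answers) * 2 > len(answers)


def aggregate(annotations):
    """Combine per-worker answers (a list of dicts) into one label per question."""
    labels = {}
    earlier_yes = True
    for question in QUESTIONS:
        votes = (a.get(question, False) for a in annotations)
        # A later question only counts if all earlier ones were majority 'yes'.
        earlier_yes = earlier_yes and majority_yes(votes)
        labels[question] = earlier_yes
    return labels
```

With three workers per tweet a strict majority always exists, so no tie-breaking rule is needed in this sketch.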

Read the full article. (The picture is from the article.)

PS: My initial plan was to show a screenshot of a few "drunken tweets". I searched Google Images for "drunken tweets", and there are many to choose from, but I cannot publish any of them in a respected outlet.
