Mahendra Singh Dhoni on Wednesday announced that he was stepping down as captain of limited-overs cricket. As expected, his decision sent his fans, other cricketers, and experts into a tizzy. Many congratulated Dhoni on his glorious captaincy career, with cricket legend Sachin Tendulkar leading from the front, saying “it’s a day to celebrate his successful career and respect the decision.”

After that, we (BDCoE Lab) decided to analyze the tweets and find their sentiment using the Big Data technology Hadoop and the data science tool R.

Data Collection: In the first stage, we fetched data from the Twitter service and stored it in HDFS using Apache Flume.

Store Data in HDFS: The Twitter JSON data is stored in HDFS.

Apache Hive: Using Hive, we transformed the data into a formatted dataset for the data science process.

Data Science using R:

Word Frequencies: A common task in text mining is to look at word frequencies.
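The snippets in this post work on a one-word-per-row table called word_tweets_dhoni, whose construction is not shown. As a rough sketch, assuming the tidytext package was used for tokenization, such a table could be built as follows. The tweets_dhoni data frame and its three sample tweets are made up for illustration and are not the data collected for this post.

```r
library(dplyr)
library(tidytext)

# Hypothetical stand-in for the tweet table exported from Hive
tweets_dhoni <- tibble(
  id = 1:3,
  text = c("Thank you captain Dhoni",
           "Dhoni was a great captain",
           "Great career, great leader")
)

# One row per word; unnest_tokens() lowercases the text and
# strips punctuation by default
word_tweets_dhoni <- tweets_dhoni %>%
  unnest_tokens(word, text)

# Count how often each word occurs, most frequent first
word_counts <- word_tweets_dhoni %>%
  count(word, sort = TRUE)

print(word_counts)
```

On the real dataset the counts run into the thousands, which is why the plot in this section keeps only words with n > 3000.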

    library(dplyr)
    library(ggplot2)

    # Bar chart of words that occur more than 3000 times
    word_tweets_dhoni %>%
      count(word, sort = TRUE) %>%
      filter(n > 3000) %>%
      mutate(word = reorder(word, n)) %>%
      ggplot(aes(word, n)) +
      geom_bar(stat = "identity") +
      xlab(NULL) +
      coord_flip()





WordCloud: An image composed of words used in a particular text or subject, in which the size of each word indicates its frequency or importance.

    library(wordcloud)

    # Drop stop words, count the rest, and draw the 200 most frequent words
    word_tweets_dhoni %>%
      anti_join(stop_words) %>%
      count(word) %>%
      with(wordcloud(word, n, max.words = 200))





Sentiment Analysis:
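The sentiment steps in this post rely on joining each word against a sentiment lexicon. A minimal sketch of that idea, using the Bing lexicon that ships with tidytext, looks like this. The toy_words table is a hypothetical stand-in for word_tweets_dhoni, chosen only to show the mechanics.

```r
library(dplyr)
library(tidytext)

# Hypothetical one-word-per-row table standing in for word_tweets_dhoni
toy_words <- tibble(word = c("great", "great", "happy", "sad", "captain"))

# inner_join() keeps only the words that appear in the lexicon,
# so neutral words such as "captain" drop out
sentiment_counts <- toy_words %>%
  inner_join(get_sentiments("bing"), by = "word") %>%
  count(sentiment)

print(sentiment_counts)
```

Words missing from the lexicon contribute nothing to the result, so the outcome depends heavily on which lexicon is chosen.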

Sentiment wordcloud

    library(reshape2)
    library(wordcloud)

    # Split words into negative and positive and draw them as a comparison cloud
    word_tweets_dhoni %>%
      inner_join(get_sentiments("bing")) %>%
      count(word, sentiment, sort = TRUE) %>%
      acast(word ~ sentiment, value.var = "n", fill = 0) %>%
      comparison.cloud(colors = c("#F8766D", "#00BFC4"), max.words = 200)





Combinations of words using n-grams: Using bigrams to provide context in sentiment analysis.
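The bigram snippets in this section use two tables, tweets_dhoni_bigrams_separated and tweets_dhoni_bigrams_filtered, that are not defined in the post. One plausible way to build them, assuming tidytext and tidyr were used, is sketched here. The two sample tweets are invented for illustration.

```r
library(dplyr)
library(tidyr)
library(tidytext)

# Hypothetical stand-in for the tweet table exported from Hive
tweets_dhoni <- tibble(
  id = 1:2,
  text = c("captain cool says goodbye", "not happy about it")
)

# Tokenize into overlapping word pairs, then split each pair into two columns
tweets_dhoni_bigrams_separated <- tweets_dhoni %>%
  unnest_tokens(bigram, text, token = "ngrams", n = 2) %>%
  separate(bigram, c("word1", "word2"), sep = " ")

# Drop pairs in which either word is a stop word
tweets_dhoni_bigrams_filtered <- tweets_dhoni_bigrams_separated %>%
  filter(!word1 %in% stop_words$word,
         !word2 %in% stop_words$word)

print(tweets_dhoni_bigrams_separated)
```

Keeping word1 and word2 in separate columns is what makes it possible to filter on the first word (for example "captain" or "not") and score the second.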

Words preceded by Captain

    # AFINN lexicon of scored words; recent tidytext releases name the
    # score column `value` instead of `score`
    AFINN <- get_sentiments("afinn")

    # Score the words that follow "captain"
    captain_words <- tweets_dhoni_bigrams_separated %>%
      filter(word1 == "captain") %>%
      inner_join(AFINN, by = c(word2 = "word")) %>%
      count(word2, score, sort = TRUE) %>%
      ungroup()

    # Plot the 20 words with the largest contribution (count times score)
    captain_words %>%
      mutate(contribution = n * score) %>%
      arrange(desc(abs(contribution))) %>%
      head(20) %>%
      mutate(word2 = reorder(word2, contribution)) %>%
      ggplot(aes(word2, n * score, fill = n * score > 0)) +
      geom_bar(stat = "identity", show.legend = FALSE) +
      xlab("Words preceded by \"captain\"") +
      ylab("Sentiment score * number of occurrences") +
      coord_flip()





Words preceded by Negation

    negation_words <- c("not", "no", "never", "without", "like")

    # Score the words that follow each negation word
    negated_words <- tweets_dhoni_bigrams_separated %>%
      filter(word1 %in% negation_words) %>%
      inner_join(AFINN, by = c(word2 = "word")) %>%
      count(word1, word2, score, sort = TRUE) %>%
      ungroup()

    # Plot the top 10 contributing words for each negation word
    negated_words %>%
      mutate(contribution = n * score) %>%
      mutate(word2 = reorder(word2, contribution)) %>%
      group_by(word1) %>%
      top_n(10, abs(contribution)) %>%
      ggplot(aes(word2, contribution, fill = n * score > 0)) +
      geom_bar(stat = "identity", show.legend = FALSE) +
      facet_wrap(~ word1, scales = "free") +
      xlab("Words preceded by negation") +
      ylab("Sentiment score * number of occurrences") +
      coord_flip()





Visualizing a network of bigrams with igraph

    library(igraph)
    library(ggraph)

    # Count each distinct word pair
    tweets_dhoni_bigrams_counts <- tweets_dhoni_bigrams_filtered %>%
      count(word1, word2, sort = TRUE)

    # Keep mid-frequency pairs and turn them into a directed graph
    tweets_dhoni_bigrams_graph <- tweets_dhoni_bigrams_counts %>%
      filter(n > 500 & n < 3000) %>%
      graph_from_data_frame()

    tweets_dhoni_bigrams_graph

    # Fix the seed so the force-directed layout is reproducible
    set.seed(2016)
    a <- grid::arrow(type = "closed", length = unit(.15, "inches"))

    ggraph(tweets_dhoni_bigrams_graph, layout = "fr") +
      geom_edge_link(aes(edge_alpha = n), show.legend = FALSE, arrow = a) +
      geom_node_point(color = "lightblue", size = 5) +
      geom_node_text(aes(label = name), vjust = 1, hjust = 1) +
      theme_void()





Source Code

Thanks....!!



