Sentiment Analysis For Farmers’ Protests in India Using Python

Arivan T
5 min read · Jan 29, 2021

Nowadays, social media plays a prominent role in reflecting the feelings and emotions of the public. People express their opinions on social media, revealing how they feel about issues and services. Private-sector companies therefore use sentiment analysis to measure public opinion of their products and services. In this article, I will share how to do sentiment analysis on tweets about the farmers’ protests happening in Delhi.

Gather Data from Twitter

Twitter is a common platform for gathering such public sentiment, since millions of users share their feelings about current events there. To begin analyzing Twitter data, you need to gather a set of tweets. First, request a Twitter developer account through your Twitter account, stating your details. Access to the developer account is usually granted within 2–3 days. Next, create an App in the developer account. You will then be provided with credentials to access Twitter data.

Using the credentials, you can access the Twitter API and gather Twitter data through the Tweepy library. You first need to authorize your credentials using Tweepy’s authentication functions.
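A minimal sketch of the authorization step, assuming OAuth 1a with the four credentials from your App. The environment variable names and the helper names load_credentials and make_api are hypothetical, and Tweepy is imported inside make_api so the credential helper works without it installed. In recent Tweepy releases OAuthHandler is a deprecated alias of OAuth1UserHandler, so adjust to your installed version.

```python
import os


def load_credentials(env=os.environ):
    """Read the four Twitter credentials from environment variables.

    The variable names below are hypothetical; use whatever names you prefer.
    """
    names = {
        "consumer_key": "TWITTER_CONSUMER_KEY",
        "consumer_secret": "TWITTER_CONSUMER_SECRET",
        "access_token": "TWITTER_ACCESS_TOKEN",
        "access_token_secret": "TWITTER_ACCESS_TOKEN_SECRET",
    }
    return {key: env[var] for key, var in names.items()}


def make_api(creds):
    """Authorize with the given credentials and return a tweepy.API client."""
    import tweepy  # third-party: pip install tweepy

    auth = tweepy.OAuthHandler(creds["consumer_key"], creds["consumer_secret"])
    auth.set_access_token(creds["access_token"], creds["access_token_secret"])
    # wait_on_rate_limit makes Tweepy sleep instead of failing at the rate limit.
    return tweepy.API(auth, wait_on_rate_limit=True)
```

Keeping the keys in environment variables rather than in the script avoids committing them by accident; calling make_api(load_credentials()) then gives you the client used in the next steps.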

Using the Cursor function, you can search for tweets containing particular hashtags or phrases. You need to specify the date from which to search and the total number of tweets you need, as well as other options such as the tweet mode and language.

As mentioned earlier, I searched for tweets about the farmers’ protest using the hashtags ‘#FarmersProtest’, ‘#SpeakUpForFarmers’, and ‘#iamwithfarmers’, which have been trending on Twitter, starting from the date 2021-01-01. You set the number of tweets using .items(number_of_tweets). The Cursor function returns an object in which each item carries attributes such as the number of retweets, the number of favourites, and the user ID. Because retweets return the same text again and again, there is no need to keep them; adding ‘-filter:retweets’ to the query excludes retweets from the dataset you gather. You may also specify ‘extended’ mode, which returns the whole tweet content rather than a truncated version.
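The search described above can be sketched as follows. This is an illustration under assumptions, not the exact code used for the article: the helper names build_query and fetch_tweets are hypothetical, the language filter "en" is an assumption, and the start date is expressed with the since: search operator inside the query. The search method is called search in Tweepy 3.x and search_tweets in 4.x, so the sketch picks whichever exists.

```python
def build_query(hashtags, since):
    """Combine hashtags with OR, exclude retweets, and set the start date."""
    return f"{' OR '.join(hashtags)} -filter:retweets since:{since}"


def fetch_tweets(api, query, number_of_tweets):
    """Return up to number_of_tweets search results as a list."""
    import tweepy  # third-party: pip install tweepy

    # `search` in Tweepy 3.x, `search_tweets` in Tweepy 4.x.
    search = getattr(api, "search_tweets", None) or api.search
    cursor = tweepy.Cursor(
        search,
        q=query,
        lang="en",              # assumption: English tweets only
        tweet_mode="extended",  # return the whole tweet content
    )
    return list(cursor.items(number_of_tweets))


HASHTAGS = ["#FarmersProtest", "#SpeakUpForFarmers", "#iamwithfarmers"]
QUERY = build_query(HASHTAGS, "2021-01-01")
```

With an authorized client named api, fetch_tweets(api, QUERY, 1000) would collect up to 1000 tweets; the count of 1000 is only an example value.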

Now you need to iterate through the tweets object to extract the information about each tweet. You may need to create lists such as tweet, number_of_followers, and number_of_likes in order to create a data frame. The number of likes and the number of retweets are included in the dataset because the favourites count and retweet count show the popularity of a tweet and the size of the like-minded population. Using the to_csv() function, I created a .csv file to save the data.
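One way to sketch this extraction step, assuming the tweet objects expose the usual Twitter fields full_text, user.followers_count, favorite_count, and retweet_count. The helper names tweets_to_rows and save_rows and the number_of_retweets column name are hypothetical, and a list of dictionaries is used here in place of separate lists.

```python
def tweets_to_rows(tweets):
    """Extract the fields of interest from each tweet object."""
    rows = []
    for tweet in tweets:
        rows.append({
            # Extended mode stores the text in full_text; fall back to text.
            "tweet": getattr(tweet, "full_text", None) or tweet.text,
            "number_of_followers": tweet.user.followers_count,
            "number_of_likes": tweet.favorite_count,
            "number_of_retweets": tweet.retweet_count,
        })
    return rows


def save_rows(rows, path):
    """Build a data frame from the rows and write it to a CSV file."""
    import pandas as pd  # third-party: pip install pandas

    df = pd.DataFrame(rows)
    df.to_csv(path, index=False)  # index=False leaves out the row numbers
    return df
```

Calling save_rows(tweets_to_rows(tweets), "farmers_protest_tweets.csv") would produce the data frame and the saved file in one go; the file name is only an example.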

Preprocessing Tweets

We are now ready to work with the Twitter data. Each tweet carries a particular emotion. To analyze the sentiment of the dataset, we need to label the emotion of each tweet; plotting the labels overall will then expose patterns in public emotion. A tweet may be positive, negative, or inconclusive (neutral).

Before proceeding with the sentiment analysis, you need to preprocess the tweets. Tweets contain a lot of @ mentions, hashtags, and URLs. These contribute nothing to the sentiment of a tweet, so it is better to eliminate them with a small cleaning function.
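A cleaning function along these lines could look like the sketch below. The name clean_tweet and the exact regular expressions are assumptions; in particular, this version drops each hashtag entirely rather than keeping the word after the # sign.

```python
import re


def clean_tweet(text):
    """Remove URLs, @ mentions, and hashtags, then tidy up whitespace."""
    text = re.sub(r"https?://\S+|www\.\S+", "", text)  # URLs
    text = re.sub(r"@\w+", "", text)                   # @ mentions
    text = re.sub(r"#\w+", "", text)                   # hashtags
    text = re.sub(r"\s+", " ", text)                   # collapse repeated spaces
    return text.strip()


print(clean_tweet("@user Farmers deserve support! #FarmersProtest https://t.co/abc123"))
# prints: Farmers deserve support!
```

On a data frame, the function can be applied to the whole column with df["tweet"].apply(clean_tweet).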

Analysis of Tweets

The TextBlob Python library provides a simple API for common Natural Language Processing (NLP) tasks such as translation, classification, sentiment analysis, and phrase/word extraction. We will use TextBlob to label each tweet as Positive, Negative, or Neutral.

TextBlob’s sentiment property returns the polarity and subjectivity of a phrase. Polarity measures the sentiment of the sentence. Its values range from -1 to +1: +1 implies a very positive sentence, -1 a very negative one, and 0 a neutral one. Subjectivity measures whether the sentence is fact-based or an opinion, on a scale from 0 to 1: a subjectivity of 1 indicates a highly opinionated sentence, and a subjectivity of 0 indicates a fact-based one. I computed the polarity and subjectivity of each tweet.
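The scoring and labelling can be sketched as below. The helper names are hypothetical, and the labelling rule (above 0 is Positive, below 0 is Negative, exactly 0 is Neutral) is an assumption, since the article does not state the thresholds used.

```python
def score_tweet(text):
    """Return (polarity, subjectivity) for one tweet using TextBlob."""
    from textblob import TextBlob  # third-party: pip install textblob

    sentiment = TextBlob(text).sentiment
    return sentiment.polarity, sentiment.subjectivity


def label_sentiment(polarity):
    """Map a polarity score in [-1, 1] to a label (assumed thresholds)."""
    if polarity > 0:
        return "Positive"
    if polarity < 0:
        return "Negative"
    return "Neutral"


def add_sentiment_columns(df):
    """Add polarity, subjectivity, and sentiment columns to the data frame."""
    scores = df["tweet"].apply(score_tweet)
    df["polarity"] = scores.apply(lambda s: s[0])
    df["subjectivity"] = scores.apply(lambda s: s[1])
    df["sentiment"] = df["polarity"].apply(label_sentiment)
    return df
```

Keeping the labelling rule in its own function makes it easy to try a wider neutral band, for example treating polarities between -0.1 and 0.1 as Neutral.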

Histogram For the Polarity of tweets

The plot clearly shows that the public has mixed emotions regarding the farmers’ protest. In particular, several tweets have a polarity of exactly -1 or +1.

Plot for the subjectivity of the tweets

The plot makes it clear that most of the tweets are opinionated. This suggests that opinionated content dominates social media.
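Both plots can be drawn with a small Matplotlib helper such as the sketch below. The function name plot_histogram, the bin count of 20, and the axis labels are assumptions, not the settings used for the figures in this article.

```python
def plot_histogram(values, xlabel, title, value_range, bins=20):
    """Draw a histogram of the given scores and return the figure and axes."""
    import matplotlib.pyplot as plt  # third-party: pip install matplotlib

    fig, ax = plt.subplots()
    ax.hist(values, bins=bins, range=value_range)
    ax.set_xlabel(xlabel)
    ax.set_ylabel("Number of tweets")
    ax.set_title(title)
    return fig, ax
```

For the polarity column you would pass df["polarity"] with a range of (-1, 1), and for the subjectivity column df["subjectivity"] with a range of (0, 1), followed by plt.show() to display the figures.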

Plotting Word Cloud

Let’s plot the words used most often in the tweets using the WordCloud library.

First, join all the words in the tweet column of the data frame df into a single string. Then call WordCloud’s generate function on that string.
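These two steps can be sketched as follows. The helper names and the width, height, and background colour are assumptions chosen for illustration; removing common English stop words with the library's STOPWORDS set is also an assumption.

```python
def join_tweets(tweets):
    """Join all tweets into one long string of words."""
    return " ".join(tweets)


def make_wordcloud(all_words):
    """Generate a word cloud from the joined text."""
    from wordcloud import STOPWORDS, WordCloud  # third-party: pip install wordcloud

    cloud = WordCloud(
        width=800,
        height=400,
        background_color="white",
        stopwords=STOPWORDS,  # skip very common English words
    )
    return cloud.generate(all_words)


def show_wordcloud(cloud):
    """Display the word cloud without axes."""
    import matplotlib.pyplot as plt  # third-party: pip install matplotlib

    plt.figure(figsize=(10, 5))
    plt.imshow(cloud, interpolation="bilinear")
    plt.axis("off")
    plt.show()
```

With the data frame from earlier, show_wordcloud(make_wordcloud(join_tweets(df["tweet"]))) draws the cloud for the whole dataset.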

WordCloud for Farmer protest analysis

The most common words used in the tweets are plotted in the word cloud above.

Conclusions

Finally, the analysis shows that the public expresses mixed sentiment and that most tweets are opinionated. We could also plot the words users most commonly use.

Reference

https://www.datacamp.com/community/tutorials/wordcloud-python

Arivan T

Computer Science and Engineering student NIT Trichy