Sentiment analysis can provide key insight into how your customers feel about your company, which is why it is becoming an increasingly important part of data analysis.
Building a machine learning model to identify positive and negative sentiments is pretty complex, but luckily for us, there is a Python library that can help us out. It’s called TextBlob.
In this post, we’ll look at how to use TextBlob with Python’s CSV functionality and with Pandas dataframes.
First, let’s bring some data into a Pandas dataframe and take a look at it. These are Amazon reviews – we’re going to be analysing the text column.
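A minimal sketch of this step follows. The inline sample rows, the `id` column and the variable names are illustrative assumptions standing in for the real review file; only the `text` column name comes from the dataset described above. With a real file you would pass its path to `pd.read_csv` instead.

```python
import io
import pandas as pd

# Illustrative stand-in for the review file (not the real dataset).
sample_csv = io.StringIO(
    "id,text\n"
    "1,\"Great product, works exactly as described!\"\n"
    "2,\"Terrible quality. It broke after two days.\"\n"
    "3,\"The parcel arrived on Tuesday.\"\n"
)

# With a real file this would be something like pd.read_csv("your_reviews.csv")
df = pd.read_csv(sample_csv)

# Take a quick look at the first few rows.
print(df.head())
```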
Next, we need to calculate subjectivity and polarity. You’re probably wondering what those are. Polarity is a measure of how positive or negative a statement is, ranging from -1 (very negative) to +1 (very positive). Subjectivity is a measure of how opinionated the comment is, ranging from 0 (very objective, fact-based) to 1 (very subjective, opinionated).
Here we use a lambda function to take the text field of each row and calculate both subjectivity and polarity.
We can take it a step further by cleaning up the input data and adding columns that flag each review with a ‘yes’ if it is positive or negative. In my tests, I ran this across a 5,000-row dataset of Amazon reviews. It achieved 90% accuracy, based on manually checking 500 rows.
The cleaning steps are:
- Remove punctuation from our review text
- Make everything lower case
- Correct incorrect spellings
- Drop stop words (words like ‘the’, ‘an’ and ‘in’, which add nothing when trying to assess sentiment).
We are now in a position to recalculate subjectivity and polarity based on our new, cleaned dataset.
That concludes our simple example of sentiment analysis. We’ll loop back round to sentiment analysis in future articles to talk about more complex applications.