Sentiment Analysis

Twitter sentiment analysis of airlines' customer satisfaction

Identify the sentiment from a tweet dataset to understand airlines' customer satisfaction

Exploratory Data Analysis

  • The majority of people (63%) voiced negative sentiment, and the main reason was ‘Customer Service Issue’. This was especially true for American Airlines during two days in February, when it had a service interruption.
  • Interestingly, the main negative reason in Delta Airlines tweets was not the customer service issue but simply ‘Late Flight’, which suggests that Delta has successfully shifted the blame for negative sentiment to flight problems and is seen as doing everything it can for its customers.
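
The kind of aggregation behind these findings can be sketched in a few lines. This is a minimal illustration only: the rows are toy data, and the field names (airline, sentiment, reason) are assumptions, not the real dataset's column names.

```python
from collections import Counter, defaultdict

# Toy rows for illustration only; they are NOT taken from the real dataset,
# and the field names are hypothetical.
tweets = [
    {"airline": "American", "sentiment": "negative", "reason": "Customer Service Issue"},
    {"airline": "American", "sentiment": "negative", "reason": "Customer Service Issue"},
    {"airline": "American", "sentiment": "positive", "reason": None},
    {"airline": "Delta", "sentiment": "negative", "reason": "Late Flight"},
    {"airline": "Delta", "sentiment": "neutral", "reason": None},
]


def negative_share(rows):
    """Return the fraction of rows labelled as negative."""
    return sum(r["sentiment"] == "negative" for r in rows) / len(rows)


def top_negative_reason(rows):
    """Return the most common negative reason for each airline."""
    reasons = defaultdict(Counter)
    for r in rows:
        if r["sentiment"] == "negative" and r["reason"]:
            reasons[r["airline"]][r["reason"]] += 1
    return {airline: c.most_common(1)[0][0] for airline, c in reasons.items()}


print(negative_share(tweets))
print(top_negative_reason(tweets))
```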

Data Pre-processing

To determine the sentiment of tweets successfully, we had to perform some generic text preprocessing, such as:

  • Removing HTML, special characters, and punctuation, and expanding contractions
  • But also some Twitter-specific processing, such as:
    • removing Twitter handles (@name) and
    • filtering stopwords while ensuring that frequent negative-sentiment stopwords are not lost
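
The steps above could look roughly like the following sketch. It is not the notebook's actual code: the contraction map, the stopword list, and the set of negative words to keep are tiny hypothetical stand-ins, and the function name preprocess is an assumption.

```python
import html
import re
import string

# Hypothetical, deliberately small lists; a real project would use fuller ones.
CONTRACTIONS = {"can't": "cannot", "won't": "will not",
                "didn't": "did not", "isn't": "is not"}
STOPWORDS = {"the", "a", "an", "is", "to", "my", "was", "and", "not", "no"}
NEGATIONS = {"not", "no", "never", "cannot"}  # stopwords we choose to keep

# Map every punctuation character to a space.
PUNCT_TABLE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def preprocess(tweet):
    """Clean one tweet and return its list of tokens."""
    text = html.unescape(tweet)            # decode entities such as &amp;
    text = re.sub(r"<[^>]+>", " ", text)   # drop HTML tags
    text = re.sub(r"@\w+", " ", text)      # drop Twitter handles (@name)
    text = text.lower().replace("’", "'")  # normalise case and apostrophes
    for short, full in CONTRACTIONS.items():
        text = text.replace(short, full)   # expand contractions
    text = text.translate(PUNCT_TABLE)     # remove punctuation
    # Filter stopwords, but keep the negative words that carry sentiment.
    return [t for t in text.split() if t not in STOPWORDS or t in NEGATIONS]


print(preprocess("@united my flight didn't leave &amp; the staff was <b>rude</b>!"))
```

Expanding contractions before stopword filtering matters here: "didn't" becomes "did not", so the negation survives as its own token.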

Model Implementation & Performance

We used two different vectorizers (Count and TF-IDF) to convert text into numeric representations, and we saw that TF-IDF was more successful because it weights each word based on its occurrences across tweets rather than on raw counts alone.
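
To illustrate the difference in weighting, here is a small pure-Python sketch with toy tweets. It uses the smoothed IDF formula that scikit-learn's TfidfVectorizer applies by default, but leaves out the L2 normalisation that scikit-learn also applies by default, so the numbers are illustrative rather than identical to the library's output. The function names are assumptions.

```python
import math
from collections import Counter

# Toy tokenised tweets, for illustration only.
docs = [
    ["flight", "late", "again"],
    ["flight", "cancelled"],
    ["great", "flight", "crew"],
]


def count_vector(doc):
    """Raw term counts: every occurrence weighs the same."""
    return Counter(doc)


def tfidf_vector(doc, corpus):
    """Term frequency scaled by a smoothed inverse document frequency."""
    n_docs = len(corpus)
    weights = {}
    for term, tf in Counter(doc).items():
        df = sum(term in d for d in corpus)           # tweets containing the term
        idf = math.log((1 + n_docs) / (1 + df)) + 1   # smoothed IDF
        weights[term] = tf * idf
    return weights


print(count_vector(docs[0]))
print(tfidf_vector(docs[0], docs))
```

With raw counts, "flight" and "late" get the same weight in the first tweet. With TF-IDF, "flight" is down-weighted because it appears in every tweet, while the rarer "late" scores higher.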

Finally, we used RandomForestClassifier to predict the sentiment of the tweet text.
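
A minimal sketch of how the vectorizer and classifier can be combined with scikit-learn is shown below. The training texts, labels, and hyperparameters are toy assumptions for illustration; the notebook's actual settings and data split are not reproduced here.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

# Toy training data, for illustration only.
texts = [
    "flight delayed again and no help from staff",
    "lost my luggage, terrible customer service",
    "great crew and a smooth flight",
    "thank you for the quick rebooking",
    "what time does check-in open",
    "is there wifi on this route",
]
labels = ["negative", "negative", "positive", "positive", "neutral", "neutral"]

# TF-IDF features feed straight into the random forest.
# n_estimators and random_state are assumed values, not the notebook's.
model = make_pipeline(
    TfidfVectorizer(),
    RandomForestClassifier(n_estimators=100, random_state=42),
)
model.fit(texts, labels)

predictions = model.predict(["my flight was delayed and nobody helped"])
print(predictions)
```

Wrapping both steps in one pipeline ensures the same vocabulary and weights learned during training are applied to new tweets at prediction time.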

Future Model Improvements

As part of continuous improvement efforts, the customer's data science team was advised to try other methods for classifying tweets, such as SVM, RNN, and LSTM, as they are popular in NLP practice.


Detailed Analysis: US Airline Twitter Sentiment Analysis.ipynb

Technologies: Text processing, Count & TF-IDF vectorizer, Sentiment Analysis with Random Forest classifier

Code Snippet

Word clouds are an easy way to convey text analysis to customers, as long as the number of words and the highlighting are chosen carefully. Here is an example of the code we used:

import matplotlib.pyplot as plt
from wordcloud import WordCloud

# top_features is defined earlier in the notebook; WordCloud.generate()
# expects a single string of text.
wordcloud = WordCloud(background_color="white",
                      colormap='viridis',
                      width=2500,
                      height=1500).generate(top_features)

# Create the figure first so the image is drawn on it, then display it:
plt.figure(1, figsize=(14, 11))
plt.imshow(wordcloud, interpolation='bilinear')
plt.title('Top 40 features WordCloud', fontsize=14, pad=15)
plt.axis("off")
plt.show()