Customer Review Sentiment Analysis Using VADER

NLP Project

This project analyzes customer reviews by cleaning raw text, applying VADER sentiment scoring, comparing sentiment labels with star ratings, and visualizing positive and negative language patterns.

About the project: This project works on a 10,000-row sample of Amazon Fine Food Reviews. The goal is to understand whether review text sentiment matches the given star rating. It combines text cleaning, rule-based sentiment scoring, mismatch detection, word cloud generation, boxplot analysis, and frequency-based word insights.

Dataset Sample: Loads only the first 10,000 rows to keep the analysis fast, lightweight, and beginner-friendly.

Text Cleaning: Converts text to lowercase and removes noise such as HTML tags, punctuation, and stopwords.

Sentiment Scoring: Uses VADER compound scores to classify reviews as Positive, Negative, or Neutral.

Business Insight: Finds mismatches where review sentiment does not align with the customer’s star rating.

Project Pipeline

Raw Reviews → Clean Text → VADER Score → Sentiment Label → Insights & Visuals

Steps Performed in the Project

Load 10,000 Reviews

The project begins by loading a manageable sample using pandas.read_csv() with nrows=10000.

Output: Sample dataframe
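
A minimal sketch of the loading step. The file name Reviews.csv and the Score and Text column names are assumptions taken from the public Kaggle release of the dataset, not from this project; load_sample is a hypothetical helper. A tiny in-memory CSV stands in for the real file so the example runs anywhere.

```python
import io

import pandas as pd


def load_sample(source, n_rows=10_000):
    """Read only the first n_rows data rows of a CSV file or file-like object."""
    return pd.read_csv(source, nrows=n_rows)


# In-memory stand-in for the real file (assumed name: "Reviews.csv").
demo_csv = io.StringIO(
    "Score,Text\n"
    "5,Great taste and fast delivery\n"
    "1,Stale and overpriced\n"
    "4,Good but the bag was torn\n"
)

# nrows stops reading after the requested number of rows,
# so the rest of a large file is never parsed.
sample_df = load_sample(demo_csv, n_rows=2)
print(sample_df.shape)
```

With the real dataset, the call would likely look like load_sample("Reviews.csv"), which keeps the default of 10,000 rows.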

Clean Review Text

Review text is converted to lowercase. HTML tags, punctuation, and stopwords are removed using NLTK-based preprocessing.

Output: Cleaned reviews
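
The cleaning step could be sketched roughly as follows. To keep the example runnable offline, a small hard-coded stopword set stands in for the NLTK list (an assumption); in the project, nltk.corpus.stopwords.words("english") would presumably take its place. The clean_text helper and the regular expression for HTML tags are illustrative choices, not the project's confirmed code.

```python
import re
import string

# Small stand-in stopword set so the sketch runs without NLTK data (assumption).
STOPWORDS = {"a", "an", "and", "the", "is", "it", "this", "was", "of", "to", "i"}

# Matches simple HTML tags such as <br /> or <p>.
TAG_RE = re.compile(r"<[^>]+>")

# Maps every ASCII punctuation character to a space.
PUNCT_TABLE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def clean_text(text, stopwords=STOPWORDS):
    """Lowercase the text, then drop HTML tags, punctuation, and stopwords."""
    text = text.lower()
    text = TAG_RE.sub(" ", text)          # remove tags before punctuation
    text = text.translate(PUNCT_TABLE)    # punctuation becomes whitespace
    tokens = [tok for tok in text.split() if tok not in stopwords]
    return " ".join(tokens)


cleaned = clean_text("This coffee is <br />GREAT, and it was cheap!")
print(cleaned)  # coffee great cheap
```

Tags are stripped before punctuation so that the angle brackets are still present when the tag pattern runs.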

Apply VADER Analyzer

VADER SentimentIntensityAnalyzer is applied to each cleaned review to calculate a compound sentiment score.

Output: Compound score
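
One way the scoring step might look, assuming the analyzer comes from NLTK's nltk.sentiment.vader module (the standalone vaderSentiment package exposes a class of the same name). The helpers build_analyzer and compound_scores are hypothetical. Note that VADER also reads cues such as capitalization, punctuation, and negation words, so scores on cleaned text can differ from scores on the raw review.

```python
def build_analyzer():
    """Create NLTK's VADER analyzer, downloading the lexicon if it is missing."""
    import nltk
    from nltk.sentiment.vader import SentimentIntensityAnalyzer

    nltk.download("vader_lexicon", quiet=True)
    return SentimentIntensityAnalyzer()


def compound_scores(texts, analyzer):
    """Return the compound score (between -1 and 1) for each text.

    polarity_scores() returns a dict with the keys
    'neg', 'neu', 'pos', and 'compound'; only 'compound' is kept.
    """
    return [analyzer.polarity_scores(text)["compound"] for text in texts]


# Usage (needs nltk installed and network access for the first download):
# analyzer = build_analyzer()
# scores = compound_scores(["coffee great cheap", "stale overpriced"], analyzer)
```

Passing the analyzer in as an argument means it is built once and reused for all 10,000 reviews rather than recreated per row.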

Classify Sentiment

Reviews are classified as Positive, Negative, or Neutral based on compound score thresholds.

Output: Sentiment label
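
The project does not state its thresholds, so the sketch below assumes the cutoffs commonly used with VADER: a compound score of 0.05 or higher is Positive, -0.05 or lower is Negative, and anything in between is Neutral. The function name label_sentiment is hypothetical.

```python
def label_sentiment(compound, pos_threshold=0.05, neg_threshold=-0.05):
    """Turn a compound score into a label using assumed +/-0.05 cutoffs."""
    if compound >= pos_threshold:
        return "Positive"
    if compound <= neg_threshold:
        return "Negative"
    return "Neutral"


labels = [label_sentiment(score) for score in (0.62, -0.41, 0.0)]
print(labels)  # ['Positive', 'Negative', 'Neutral']
```

Keeping the thresholds as parameters makes it easy to widen or narrow the Neutral band and see how the label counts shift.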

Compare with Star Rating

The predicted sentiment label is compared with the star rating to identify reviews where text and rating do not match.

Output: Mismatch records
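
A possible sketch of the comparison. The rule for turning stars into an expected label (4 to 5 stars Positive, 3 stars Neutral, 1 to 2 stars Negative) is an assumption, as are the column names Score and sentiment and the helper names.

```python
import pandas as pd


def rating_to_label(score):
    """Map a 1-5 star rating to the label it would be expected to carry (assumed rule)."""
    if score >= 4:
        return "Positive"
    if score <= 2:
        return "Negative"
    return "Neutral"


def find_mismatches(df, rating_col="Score", label_col="sentiment"):
    """Return the rows whose text label disagrees with the rating-based label."""
    expected = df[rating_col].map(rating_to_label)
    return df[expected != df[label_col]]


reviews = pd.DataFrame({
    "Score": [5, 1, 5, 3],
    "sentiment": ["Positive", "Negative", "Negative", "Neutral"],
})

mismatches = find_mismatches(reviews)
print(mismatches)  # only the 5-star review labeled Negative remains
```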

Generate Word Clouds

Separate word clouds are created for positive and negative reviews to visually understand dominant language patterns.

Output: Positive & negative word clouds
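
A rough sketch of how the two clouds might be produced with the third-party wordcloud package. The helper names, image size, background color, and output file names are all illustrative assumptions. The import sits inside make_word_cloud so the text-grouping part can run even where the package is not installed.

```python
def text_by_label(cleaned_reviews, labels, wanted):
    """Join every cleaned review that carries the wanted label into one string."""
    return " ".join(
        review for review, label in zip(cleaned_reviews, labels) if label == wanted
    )


def make_word_cloud(text, out_path):
    """Render a word cloud image to out_path (requires the wordcloud package)."""
    from wordcloud import WordCloud

    cloud = WordCloud(width=800, height=400, background_color="white")
    cloud.generate(text).to_file(out_path)


cleaned_reviews = ["great coffee", "stale overpriced", "fresh tasty"]
labels = ["Positive", "Negative", "Positive"]

positive_text = text_by_label(cleaned_reviews, labels, "Positive")
negative_text = text_by_label(cleaned_reviews, labels, "Negative")

# Uncomment once the wordcloud package is installed:
# make_word_cloud(positive_text, "positive_cloud.png")
# make_word_cloud(negative_text, "negative_cloud.png")
```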

Plot Score Distribution

A boxplot is used to compare the distribution of sentiment compound scores across different star ratings.

Output: Sentiment boxplot
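
The boxplot could be drawn along these lines with Matplotlib. The function name, axis labels, and sample values are assumptions for illustration; the Agg backend is selected only so the sketch runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen so no display is needed
import matplotlib.pyplot as plt


def plot_scores_by_rating(ratings, scores, out_path=None):
    """Draw one box of compound scores per star rating."""
    grouped = {}
    for rating, score in zip(ratings, scores):
        grouped.setdefault(rating, []).append(score)

    stars = sorted(grouped)
    fig, ax = plt.subplots()
    result = ax.boxplot([grouped[star] for star in stars])
    ax.set_xticklabels([str(star) for star in stars])
    ax.set_xlabel("Star rating")
    ax.set_ylabel("VADER compound score")
    if out_path:
        fig.savefig(out_path)
    return stars, result


stars, result = plot_scores_by_rating(
    ratings=[1, 1, 3, 5, 5, 5],
    scores=[-0.6, -0.2, 0.1, 0.7, 0.9, 0.4],
)
```

If text sentiment tracks ratings well, the boxes should climb from left to right; heavy overlap between boxes suggests that review text and stars often disagree.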

Find Common Words

Python's collections.Counter class is used to find the 20 most common words in the positive reviews and in the negative reviews after cleaning.

Output: Top word frequency lists
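
A short sketch of the frequency count. It assumes each cleaned review is a whitespace-separated string; top_words is a hypothetical helper, and n is set to 2 here only to keep the demo output small.

```python
from collections import Counter


def top_words(cleaned_reviews, n=20):
    """Return the n most frequent words across the given cleaned reviews."""
    counts = Counter()
    for review in cleaned_reviews:
        counts.update(review.split())
    return counts.most_common(n)


positive_top = top_words(
    ["great coffee", "great taste", "fresh coffee great"], n=2
)
print(positive_top)  # [('great', 3), ('coffee', 2)]
```

Running the same function once on the positive group and once on the negative group would give the two top-20 lists.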

What This Project Produces

Sentiment Labels: Each review receives a clear Positive, Negative, or Neutral label based on its compound score.

Rating Mismatches: The project highlights cases where the written review sentiment conflicts with the star rating.

Word Clouds: Positive and negative review groups are visualized separately to reveal dominant customer language.

Boxplot Analysis: The spread of sentiment scores is compared across star ratings to check how strongly the text reflects the rating.

Top Word Lists: The most frequent positive and negative words are extracted using Counter for quick interpretation.

Practical NLP Workflow: The complete project demonstrates a clean, beginner-friendly pipeline for sentiment analysis.

Tools and Libraries Used

Python, Pandas, NLTK, VADER, Counter, WordCloud, Matplotlib, Boxplot

Project insight: Star ratings alone do not always tell the full story. A customer may give a high rating but still write a complaint, or give a low rating while mentioning a few positive aspects. Comparing text sentiment with rating helps identify these hidden mismatches.