How to Analyze Reviews with Elasticsearch and Amazon Comprehend

Analyzing Customer Reviews with Elasticsearch and Amazon Comprehend

In this tutorial, we will set up a system to analyze customer reviews using Elasticsearch and Amazon Comprehend. We will leverage Amazon Comprehend for sentiment analysis and Elasticsearch for indexing, searching, and visualizing the data using Kibana. Our goal is to index customer reviews, perform sentiment analysis, and visualize insights to understand customer satisfaction and identify key areas for improvement.

Architecture Overview

  1. Amazon Comprehend: Performs sentiment analysis on customer reviews.
  2. Elasticsearch: Indexes and stores reviews, and enables querying and aggregations.
  3. Kibana: Provides visualization capabilities for the data in Elasticsearch.

Set Up Elasticsearch and Kibana

To begin, we’ll set up Elasticsearch and Kibana on our local machine. This local setup is ideal for development and testing purposes.

Step 1: Install Elasticsearch

  1. Download Elasticsearch:
    • Visit the Elasticsearch download page and download the version for your OS.
  2. Install Elasticsearch:
    • Follow the installation instructions provided for your OS.
    • For an archive download, extract the files and navigate to the bin directory.
    • Start Elasticsearch by running ./elasticsearch (Linux/Mac) or elasticsearch.bat (Windows).
  3. Verify Installation:
    • Open a web browser and go to https://localhost:9200. We should see a JSON response indicating that Elasticsearch is running.
  4. Note that Elasticsearch is secure by default in the latest version. You will be prompted to enter a username and password to access Elasticsearch.
  5. Look up the username and password in the console output from the installation.
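
The same check can be run from Python. This is a minimal sketch, assuming the elasticsearch 8.x Python client, the built-in elastic user, and a CA certificate file generated by the installation; connect and cluster_summary are hypothetical helper names, and the password and certificate path are placeholders you supply.

```python
def connect(password, ca_certs):
    """Build a client for the local, security-enabled cluster."""
    # Imported lazily; requires `pip install elasticsearch`.
    from elasticsearch import Elasticsearch

    return Elasticsearch(
        "https://localhost:9200",
        basic_auth=("elastic", password),  # built-in superuser; password from the console
        ca_certs=ca_certs,                 # path to the generated CA certificate (placeholder)
    )


def cluster_summary(es):
    """Return a one-line summary built from the client's info() response."""
    info = es.info()
    return f"{info['cluster_name']} running Elasticsearch {info['version']['number']}"
```

Printing cluster_summary(connect(password, ca_certs)) should report the cluster name and version if the credentials and certificate are accepted.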

Step 2: Install Kibana

  1. Download Kibana:
    • Visit the Kibana download page and download the version that is compatible with the Elasticsearch installation.
  2. Install Kibana:
    • Follow the installation instructions for your OS.
    • For an archive download, extract the files and navigate to the bin directory.
    • Start Kibana by running ./kibana (Linux/Mac) or kibana.bat (Windows).
  3. Access Kibana:
    • Open a web browser and navigate to http://localhost:5601. We should see the Kibana dashboard.
  4. You will be prompted to enter an enrollment token. Look up the token in the Elasticsearch installation console from Step 1.

Create an Elasticsearch Index for Reviews

An Elasticsearch index is like a database table where we store our data. Here, we’ll create an index to store customer reviews. We will use Kibana’s Dev Tools to create the Elasticsearch index as Kibana provides a user-friendly interface for interacting with Elasticsearch.

Step 3: Create an Elasticsearch Index

  1. Open Kibana’s Dev Tools:
    • Go to the Kibana dashboard and navigate to Dev Tools.
  2. Create an Index:
    • In Dev Tools, enter the following command to create an index called customer-reviews:
PUT /customer-reviews
{
  "mappings": {
    "properties": {
      "review_id": { "type": "keyword" },
      "product_id": { "type": "keyword" },
      "category_id": { "type": "keyword" },
      "product_name": { "type": "text" },
      "review_text": { "type": "text" },
      "sentiment_score": { "type": "keyword" },
      "key_phrases": { "type": "keyword" },
      "toxicity_level": { "type": "keyword" },
      "timestamp": { "type": "date" }
    }
  }
}

This index will store all the necessary details about each review, including the product ID, category, review text, sentiment, and more.
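
The index can also be created from Python instead of Dev Tools. The sketch below assumes an elasticsearch 8.x client object is passed in; ensure_index and the two constants are hypothetical names, and the mapping is copied from the command above.

```python
INDEX_NAME = "customer-reviews"

# Same field mappings as the PUT /customer-reviews command.
REVIEW_MAPPINGS = {
    "properties": {
        "review_id": {"type": "keyword"},
        "product_id": {"type": "keyword"},
        "category_id": {"type": "keyword"},
        "product_name": {"type": "text"},
        "review_text": {"type": "text"},
        "sentiment_score": {"type": "keyword"},
        "key_phrases": {"type": "keyword"},
        "toxicity_level": {"type": "keyword"},
        "timestamp": {"type": "date"},
    }
}


def ensure_index(es, name=INDEX_NAME, mappings=REVIEW_MAPPINGS):
    """Create the index if it is missing; return True only when it was created."""
    if es.indices.exists(index=name):
        return False
    es.indices.create(index=name, mappings=mappings)
    return True
```

Checking for the index first makes the call safe to repeat, which is convenient when the setup is scripted.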

Analyzing Reviews with Amazon Comprehend

Next, we’ll use Amazon Comprehend to perform sentiment analysis on the reviews and extract key phrases.

Step 4: Set Up AWS Credentials

To use Amazon Comprehend, we need to configure our AWS credentials. This allows our script to access the Comprehend service.

  1. Install AWS CLI:
    • Follow the AWS CLI installation instructions for your OS.
  2. Configure AWS CLI:
    • Run aws configure in our terminal and enter the AWS Access Key, Secret Key, region, and output format.
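
One way to confirm that the credentials are picked up is to ask AWS STS which identity is calling. This is a small sketch, assuming boto3 is available (it is installed in Step 5); caller_arn and default_sts_client are hypothetical helper names.

```python
def default_sts_client():
    """Build an STS client from the credentials saved by `aws configure`."""
    # Imported lazily; requires `pip install boto3`.
    import boto3

    return boto3.client("sts")


def caller_arn(sts_client):
    """Return the ARN of the identity behind the configured credentials."""
    identity = sts_client.get_caller_identity()
    return identity["Arn"]
```

If caller_arn(default_sts_client()) raises a credentials error, the configuration from aws configure is not being found.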

Step 5: Analyze Reviews

We’ll write a Python script to analyze the reviews, perform sentiment analysis, and then index the enriched data into Elasticsearch.

Navigate to your project directory and create a virtual environment:

cd /path/to/your/project
python3 -m venv path/to/venv

Activate the virtual environment using the following command:

source path/to/venv/bin/activate

Install Required Libraries:

With the virtual environment activated, install the necessary libraries:

pip3 install boto3 elasticsearch mysql-connector-python

Write the Script:

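import mysql.connector
import boto3
from elasticsearch import Elasticsearch

# Initialize clients
# Elasticsearch is secured by default (see Step 1), so connect over HTTPS with
# the username and password from the installation console. The password and
# certificate path below are placeholders.
es = Elasticsearch(
    'https://localhost:9200',
    basic_auth=('elastic', 'your_elastic_password'),
    ca_certs='path/to/http_ca.crt'
)
comprehend = boto3.client('comprehend')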

# Connect to MySQL database
db_connection = mysql.connector.connect(
    host="your_database_host",
    user="your_database_user",
    password="your_database_password",
    database="your_database_name"
)

cursor = db_connection.cursor(dictionary=True)

def fetch_reviews():
    cursor.execute("SELECT * FROM reviews")
    return cursor.fetchall()

def analyze_text(text):
    # Analyze sentiment
    sentiment_response = comprehend.detect_sentiment(Text=text, LanguageCode='en')
    sentiment = sentiment_response['Sentiment']

    # Detect toxic content
    toxic_content_response = comprehend.detect_toxic_content(TextSegments=[{'Text': text}], LanguageCode='en')
    labels = toxic_content_response['ResultList'][0]['Labels']

    # Extract individual scores
    hate_speech_score = next((label['Score'] for label in labels if label['Name'] == 'HATE_SPEECH'), 0)
    harassment_abuse_score = next((label['Score'] for label in labels if label['Name'] == 'HARASSMENT_OR_ABUSE'), 0)
    insult_score = next((label['Score'] for label in labels if label['Name'] == 'INSULT'), 0)
    violence_threat_score = next((label['Score'] for label in labels if label['Name'] == 'VIOLENCE_OR_THREAT'), 0)
    
    # Determine the overall toxicity level
    toxicity_level = 'Non-Toxic'
    if hate_speech_score > 0.5:
        toxicity_level = 'Hate Speech'
    elif harassment_abuse_score > 0.5:
        toxicity_level = 'Harassment or Abuse'
    elif insult_score > 0.5:
        toxicity_level = 'Insult'
    elif violence_threat_score > 0.5:
        toxicity_level = 'Violence or Threat'
    
    # Extract key phrases
    keyphrase_response = comprehend.detect_key_phrases(Text=text, LanguageCode='en')
    keyphrases = [phrase['Text'] for phrase in keyphrase_response['KeyPhrases']]

    result = {
        'sentiment': sentiment,
        'key_phrases': keyphrases,
        'toxicity_level': toxicity_level
    }

    return result

def process_reviews():
    reviews = fetch_reviews()
    for review in reviews:
        sentiment_data = analyze_text(review['review_text'])
        review.update({
            'sentiment_score': sentiment_data['sentiment'],
            'key_phrases': sentiment_data['key_phrases'],
            'toxicity_level': sentiment_data['toxicity_level']
        })

        # Index into Elasticsearch
        es.index(index='customer-reviews', id=review['review_id'], document=review)

# Process and index the reviews
process_reviews()

# Close the database connection
cursor.close()
db_connection.close()
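
The if/elif chain in analyze_text reports the first category above 0.5 in a fixed order, so a review that scores higher for insult than for hate speech is still labelled Hate Speech. A possible variation, not the method used in the script above, picks the highest-scoring category instead. The sketch assumes the same label list returned by detect_toxic_content; classify_toxicity and TOXICITY_NAMES are hypothetical names.

```python
# Comprehend label name -> display name used in the toxicity_level field.
TOXICITY_NAMES = {
    "HATE_SPEECH": "Hate Speech",
    "HARASSMENT_OR_ABUSE": "Harassment or Abuse",
    "INSULT": "Insult",
    "VIOLENCE_OR_THREAT": "Violence or Threat",
}


def classify_toxicity(labels, threshold=0.5):
    """Return the display name of the highest-scoring tracked label above threshold."""
    candidates = [
        label for label in labels
        if label["Name"] in TOXICITY_NAMES and label["Score"] > threshold
    ]
    if not candidates:
        return "Non-Toxic"
    top = max(candidates, key=lambda label: label["Score"])
    return TOXICITY_NAMES[top["Name"]]
```

Labels outside the four tracked categories are ignored, which matches the categories the script already checks.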

We can set up a cron job or a scheduled task to run this script periodically to keep our Elasticsearch index updated with the latest reviews from the database.
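
Because fetch_reviews selects every row, each scheduled run would send all reviews to Comprehend again. A possible refinement is to fetch only rows newer than the last run. The sketch below assumes the reviews table has a timestamp column (as the indexed documents suggest) and that the caller stores the last seen value between runs; both helper names are hypothetical.

```python
def fetch_reviews_since(cursor, last_timestamp):
    """Fetch only reviews newer than last_timestamp, oldest first."""
    # %s is the parameter placeholder style used by mysql-connector-python.
    cursor.execute(
        "SELECT * FROM reviews WHERE `timestamp` > %s ORDER BY `timestamp`",
        (last_timestamp,),
    )
    return cursor.fetchall()


def newest_timestamp(reviews, fallback):
    """Return the latest timestamp in the batch, or fallback when it is empty."""
    return max((review["timestamp"] for review in reviews), default=fallback)
```

After each run, the value from newest_timestamp can be saved (for example in a small file) and passed to the next run.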

The processed review data, now containing sentiment scores, key phrases, and toxicity levels, is indexed in Elasticsearch. The reviews are stored in an index called customer-reviews with defined mappings for each field.
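
The script sends one index request per review. For larger batches, the bulk helper from the Elasticsearch Python client can send many documents per request. This is a sketch under the assumption that the reviews have already been enriched as in process_reviews; build_actions and bulk_index are hypothetical names.

```python
def build_actions(reviews, index="customer-reviews"):
    """Yield one bulk action per enriched review, keyed by review_id."""
    for review in reviews:
        yield {"_index": index, "_id": review["review_id"], "_source": review}


def bulk_index(es, reviews):
    """Index all reviews with the bulk helper; returns (success_count, errors)."""
    # Imported lazily; requires `pip install elasticsearch`.
    from elasticsearch import helpers

    return helpers.bulk(es, build_actions(reviews))
```

Using review_id as the document ID keeps the operation repeatable: indexing the same review again overwrites the existing document rather than creating a duplicate.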

...

{
  "_index": "customer-reviews",
  "_id": "2",
  "_score": 1,
  "_source": {
    "review_id": "2",
    "product_id": "123",
    "category_id": "phones",
    "product_name": "SuperPhone X",
    "review_text": "Not satisfied with the battery life. It doesn't last a full day with heavy usage.",
    "timestamp": "2024-08-02T15:30:00",
    "sentiment_score": "NEGATIVE",
    "key_phrases": [
      "the battery life",
      "a full day",
      "heavy usage"
    ],
    "toxicity_level": "Non-Toxic"
  }
},

...

Visualize Data with Kibana

Integrating Kibana with Elasticsearch to visualize the data is a key part of this setup. Now that our data is indexed in Elasticsearch, we can use Kibana to create visualizations that help us understand customer feedback.

Create an Index Pattern in Kibana:

Navigate to Index Patterns:

  • In Kibana, go to Management > Stack Management > Index Patterns (renamed Data Views in Kibana 8.x).

Create a New Index Pattern:

  • Click on Create index pattern.
  • Enter the name of your Elasticsearch index (e.g., customer-reviews).
  • Click Next step.
  • Choose the timestamp field as the time filter (if applicable) to enable time-based analysis.
  • Click Create index pattern.

Explore Data in Kibana:

  • Go to Discover in the Kibana sidebar.
  • Select your index pattern (customer-reviews) from the dropdown menu.
  • You should see a list of documents (reviews) that you indexed into Elasticsearch.
  • Use the search bar to filter and explore the data, such as searching for specific sentiments, products, or key phrases.

Create Visualizations:

We can create several visualizations to gain insights into the reviews:

1. Pie Chart: Sentiment distribution

Purpose: To see the distribution of sentiment (positive, negative, neutral, mixed) across all reviews.

Create a Pie Chart Visualization:

  • Go to Visualize Library > Create visualization > Aggregation based > Pie.
  • Select our customer-reviews index pattern.
  • In the Buckets section, click on Add Bucket.
  • Select Split Slices to show the aggregation options.
  • Under the Aggregation dropdown, select Terms.
  • In the Field dropdown, select the sentiment field, sentiment_score.
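
The pie chart is driven by a terms aggregation on sentiment_score. The same numbers can be fetched directly, which is handy for checking the chart. This is a minimal sketch, assuming an elasticsearch 8.x client object is passed in; sentiment_distribution and the by_sentiment aggregation name are hypothetical.

```python
def sentiment_distribution(es, index="customer-reviews"):
    """Return {sentiment label: review count} using a terms aggregation."""
    response = es.search(
        index=index,
        size=0,  # only the aggregation is needed, not the matching documents
        aggs={"by_sentiment": {"terms": {"field": "sentiment_score"}}},
    )
    buckets = response["aggregations"]["by_sentiment"]["buckets"]
    return {bucket["key"]: bucket["doc_count"] for bucket in buckets}
```

The terms aggregation works here because sentiment_score is mapped as a keyword field.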

2. Vertical Bar: Top 3 Products

Purpose: To see the top 3 products that have the most positive or negative reviews.

Create a Vertical Bar Chart Visualization:

  • Go to Visualize Library > Create visualization > Aggregation based > Vertical Bar.
  • Select our customer-reviews index pattern.
  • In the Y-Axis, choose a metric like Count.
  • Click on Add Bucket.
  • Select X-Axis.
  • In the X-Axis, choose Terms aggregation and set Field to sentiment_score.
  • Click on Add Bucket.
  • Select Split Series.
  • Under the sub-aggregation dropdown, select Terms.
  • In the Field dropdown, select the product field, product_id.
  • Set the size to 3.
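
The bar chart roughly corresponds to a terms aggregation on sentiment_score with a nested terms aggregation on product_id. A sketch of the equivalent query, assuming an elasticsearch 8.x client object is passed in; the function and aggregation names are hypothetical.

```python
def top_products_by_sentiment(es, size=3, index="customer-reviews"):
    """Return {sentiment: [(product_id, review_count), ...]} with up to `size` products each."""
    response = es.search(
        index=index,
        size=0,
        aggs={
            "by_sentiment": {
                "terms": {"field": "sentiment_score"},
                # Nested aggregation: most-reviewed products within each sentiment.
                "aggs": {
                    "top_products": {"terms": {"field": "product_id", "size": size}}
                },
            }
        },
    )
    return {
        sentiment["key"]: [
            (product["key"], product["doc_count"])
            for product in sentiment["top_products"]["buckets"]
        ]
        for sentiment in response["aggregations"]["by_sentiment"]["buckets"]
    }
```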

Here’s how we can leverage both negative reviews and key phrases to pinpoint and address issues:

Steps to Identify Areas for Improvement

1. Filter Negative Reviews

  1. Navigate to Discover:
    • Go to the Discover section in Kibana to view raw data from your customer-reviews index.
  2. Apply Filters:
    • Add Filter: Click on the “Add filter” button.
    • Configure Filter:
      • Field: Select sentiment_score.
      • Operator: Choose “is”.
      • Value: Enter “NEGATIVE”, the label Amazon Comprehend assigns to negative reviews.
  3. Save and Apply:
    • Save the filter to view only the negative reviews.
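
The filter above corresponds to a term query on sentiment_score. A minimal sketch of the same lookup from Python, assuming an elasticsearch 8.x client object is passed in; negative_reviews is a hypothetical helper name.

```python
def negative_reviews(es, size=10, index="customer-reviews"):
    """Return the _source of up to `size` reviews labelled NEGATIVE."""
    response = es.search(
        index=index,
        # Exact match on the keyword field, like the "is" operator in Kibana.
        query={"term": {"sentiment_score": "NEGATIVE"}},
        size=size,
    )
    return [hit["_source"] for hit in response["hits"]["hits"]]
```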

2. Analyze Key Phrases in Negative Reviews

  1. Create a Key Phrase Visualization for Negative Reviews:
    • Go to Visualize Library and create a new visualization.
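    • Choose a visualization type like a Tag Cloud.
    • Filter: Add a filter where sentiment_score is “NEGATIVE” so that only negative reviews are counted.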
    • Bucket Aggregation:
      • Add Tags, and select Terms as the aggregation.
      • Set the field to key_phrases.
    • Metrics:
      • Set the metric to “Count” to show the frequency of each key phrase in negative reviews.
  2. Customize and Save:
    • Adjust the visualization settings to focus on the most frequent key phrases.
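
The tag cloud counts key phrases across negative reviews. The same ranking can be sketched as a term query combined with a terms aggregation on key_phrases, assuming an elasticsearch 8.x client object is passed in; top_negative_phrases and the phrases aggregation name are hypothetical.

```python
def top_negative_phrases(es, size=10, index="customer-reviews"):
    """Return [(key phrase, count), ...] for reviews labelled NEGATIVE, most frequent first."""
    response = es.search(
        index=index,
        size=0,
        query={"term": {"sentiment_score": "NEGATIVE"}},  # restrict to negative reviews
        aggs={"phrases": {"terms": {"field": "key_phrases", "size": size}}},
    )
    buckets = response["aggregations"]["phrases"]["buckets"]
    return [(bucket["key"], bucket["doc_count"]) for bucket in buckets]
```

Because key_phrases is a keyword field, each phrase is counted as a whole; "the battery life" and "battery life" would appear as separate entries.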

3. Interpret Insights

A. Identify Common Issues:

  • Key Phrases: Look at the key phrases identified in the visualizations. Frequent phrases in negative reviews will highlight common complaints or issues.
  • Sentiment Analysis: Confirm that the sentiment associated with these key phrases is negative to ensure they reflect genuine problems.

B. Understand the Context:

  • Review Text: Click on individual negative reviews to understand the context and specifics of the complaints. This will help you identify if the issues are related to product quality, customer service, delivery, etc.

C. Take Action:

  • Address Issues: Use the insights to address the common issues. For instance, if “battery drains” is frequently mentioned, we may need to improve the products’ battery life.

By focusing on negative reviews and analyzing the associated key phrases, we can gain actionable insights that help improve customer satisfaction and address specific areas that need attention.

Conclusion

In this advanced tutorial, we’ve set up a system to analyze customer reviews using Elasticsearch and Amazon Comprehend. We performed sentiment analysis, extracted key phrases, and detected toxic content. By integrating these analyses into Elasticsearch, we created powerful visualizations in Kibana to gain deep insights into customer satisfaction and identify areas for improvement.
