How to Build a Real-Time Text Content Moderation System Using AWS
In this tutorial, we’ll build a real-time text content moderation system using several AWS services. The system will allow users to submit text content, process it for inappropriate language or harmful content, and provide immediate feedback. We will use AWS services such as Amazon Comprehend, API Gateway, Lambda, SQS, DynamoDB, and SNS to create a scalable and efficient solution.
Introduction
Text content moderation is essential for maintaining community standards and ensuring a safe environment in online platforms. By automating this process, we can efficiently handle large volumes of user-generated content and ensure it meets our guidelines.
In this tutorial, we’ll design a system that processes text content submissions asynchronously. This approach helps manage large amounts of data and provides scalability. Amazon Comprehend, a natural language processing (NLP) service, will play a crucial role in analyzing text content for sentiment and key phrases.
Architecture Design
Our system architecture consists of the following components:
- API Gateway: Receives text submissions from users.
- SQS (Simple Queue Service): Queues text submissions for asynchronous processing.
- Lambda Functions:
  - QueueHandler: places text into the SQS queue.
  - TextModeration: processes text content pulled from the SQS queue, performs moderation using Amazon Comprehend, and updates DynamoDB.
- DynamoDB: Stores text content along with moderation results.
- SNS (Simple Notification Service): Sends notifications about moderation results.
- CloudWatch: Monitors the system’s performance and logs.
Step-by-Step Instructions
1. Create an SQS Queue
SQS helps decouple the text submission process from the moderation process. If the moderation process is slow or experiences errors, it won’t affect the text submission process. By using SQS, we can process text submissions asynchronously. When a user submits text, the API Gateway can simply send the text to the SQS queue and return a response to the user immediately, without waiting for the moderation process to complete.
To create an SQS queue, follow these steps:
- Navigate to the SQS service in the AWS Management Console.
- Choose a name (e.g., TextModerationQueue).
- Configure settings (e.g., default settings for a standard queue).
- Visibility Timeout (30 seconds): This setting determines how long a message is hidden from other consumers after it's been received by a Lambda function or another service. If processing fails or times out, the message becomes visible again, and another consumer can pick it up.
- Message Retention Period (4 days): This setting determines how long a message is stored in the SQS queue before it’s automatically deleted.
- Delivery Delay (0 seconds): This setting determines how long SQS waits before delivering a message to a consumer (like a Lambda function). You might use a delay if you want to introduce a buffer between message submission and processing.
- Maximum Message Size (256 KB): This setting determines the largest size of a message that can be sent to the SQS queue.
- Receive Message Wait Time (0 seconds): This setting determines how long a consumer (like a Lambda function) waits for a message to be available in the queue.
- Click “Create Queue.”
We can use the AWS CLI to create an SQS queue programmatically. Here’s an example command:
aws sqs create-queue --queue-name TextModerationQueue --region ap-southeast-1
2. Set Up the DynamoDB Table
In this step, we’ll create a DynamoDB table to store the text moderation results. Here’s how to do it:
- Navigate to the DynamoDB service.
- Click on “Create Table.”
- Define a table name (e.g., TextModerationTable).
- Set a partition key (e.g., ContentId of type String).
- Select "Default settings" and click "Create Table."
Alternatively, we can create the table using the AWS CLI:
aws dynamodb create-table --table-name TextModerationTable \
--attribute-definitions AttributeName=ContentId,AttributeType=S \
--key-schema AttributeName=ContentId,KeyType=HASH \
--provisioned-throughput ReadCapacityUnits=5,WriteCapacityUnits=5
3. Create an SNS Topic
Next, we create an SNS topic to notify administrators when harmful text is detected. Here’s a summary of the steps:
- Navigate to the SNS service.
- Click on “Create topic.”
- Choose a topic type (e.g., Standard).
- Enter a topic name (e.g., TextModerationNotifications).
- Click "Create topic."
Then click the "Create subscription" button and select Email as the protocol to receive email notifications.
Here is the CLI command for creating the SNS topic:
aws sns create-topic --name TextModerationNotifications
When our text moderation system detects harmful text, it can publish a message to this SNS topic. The topic can then trigger notifications to administrators, such as sending an email or SMS, to alert them to review the content.
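Since SNS message bodies are plain strings, a structured moderation result has to be JSON-encoded before publishing. Here is a minimal sketch of building that payload; all field names and values below are illustrative:

```python
import json

# Illustrative moderation result to be sent as an SNS notification.
result = {
    "ContentId": "example-message-id",
    "ToxicityLevel": "Hate Speech",
    "Toxicity": 0.73,
}

# SNS message bodies are plain strings, so JSON-encode structured data.
message = json.dumps(result)
subject = "Text Moderation Result"

print(subject, message)
```

The TextModeration function in step 5 follows this same pattern when it calls `sns.publish`.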
4. Create the QueueHandler Lambda Function
We need to create a Lambda function to handle incoming text submissions and send them to an SQS queue for processing. When API Gateway receives a text submission, it triggers this Lambda function, passing the request as an event object. The Lambda function then sends the text to the SQS queue for processing.
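The shape of that event object matters: API Gateway delivers the request body as a JSON-encoded string under the `body` key, so the handler has to parse it before reading the `text` field. A small sketch, with an illustrative payload:

```python
import json

# Illustrative example of the event API Gateway passes to the handler;
# the request body arrives as a JSON-encoded string, not a dict.
event = {
    "body": json.dumps({"text": "This is a test submission."})
}

body = json.loads(event["body"])  # decode the string into a dict
text = body.get("text", "")       # fall back to an empty string if absent

print(text)
```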
To create a Lambda function, follow these steps:
- Navigate to the Lambda service.
- Click on “Create function.”
- Choose “Author from scratch.”
- Enter a function name (e.g., QueueHandler).
- Choose a runtime (e.g., Python 3.x).
- Choose to create a new execution role with permissions to access SQS.
- Click “Create function.”
- In the Configuration tab, Permissions section, click the "View the QueueHandler role" link.
- Click the "Add permissions" button.
- Select "Create inline policy."
- Click the JSON tab and enter the policy below:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": "sqs:SendMessage",
            "Resource": "arn:aws:sqs:ap-southeast-1:<YOUR_ACCOUNT_ID>:TextModerationQueue"
        }
    ]
}
Click the "Next" button and name the policy "AllowSQSSendMessage". Go back to the Lambda function code editor, replace the existing code with the following, and click "Deploy".
import json
import boto3

sqs = boto3.client('sqs')

def lambda_handler(event, context):
    # API Gateway delivers the request body as a JSON string
    body = json.loads(event['body'])
    text = body.get('text', '')
    queue_url = 'https://sqs.ap-southeast-1.amazonaws.com/<YOUR_ACCOUNT_ID>/TextModerationQueue'
    # Queue the text for asynchronous moderation
    response = sqs.send_message(
        QueueUrl=queue_url,
        MessageBody=json.dumps({'text': text})
    )
    return {
        'statusCode': 200,
        'body': json.dumps({'message': 'Text submitted successfully!'})
    }
5. Create the TextModeration Lambda Function
Repeat the steps to create a new function and name it TextModeration. This Lambda function uses the Amazon Comprehend API to analyze the sentiment and detect toxic content in the input text. It then stores the results in a DynamoDB table and sends a notification to an SNS topic.
Sentiment Analysis API
Amazon Comprehend's sentiment analysis API, detect_sentiment, is used to determine the emotional tone of a piece of text. This can help in understanding user sentiment and ensuring content aligns with the desired tone.
How it works:
- Detection: The API analyzes the text to classify its overall sentiment into categories such as positive, negative, neutral, or mixed.
- Context: This analysis helps understand whether the text conveys a positive or negative sentiment, or if it is neutral.
Response:
- Sentiment Scores: The API returns the dominant sentiment along with a confidence score indicating how sure it is about the sentiment classification.
Example:
Input: “I absolutely love this product! It’s fantastic.”
Output:
{
    "Sentiment": "POSITIVE",
    "SentimentScore": {
        "Positive": 0.97,
        "Negative": 0.02,
        "Neutral": 0.01,
        "Mixed": 0.00
    }
}
Here, the positive sentiment score suggests that the text is expressing a favorable opinion.
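The SentimentScore map can also be inspected directly: the dominant label is simply the highest-scoring entry. A quick sketch using the score values from the example above:

```python
# SentimentScore map from the example response above.
sentiment_score = {
    "Positive": 0.97,
    "Negative": 0.02,
    "Neutral": 0.01,
    "Mixed": 0.00,
}

# The dominant sentiment is the key with the highest confidence score.
dominant = max(sentiment_score, key=sentiment_score.get)

print(dominant)  # Positive
```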
Toxic Content Detection API
Amazon Comprehend’s Toxic Content Detection API is used to identify harmful or abusive language in text. This is particularly useful for identifying toxic content in user-generated text.
How it works:
- Detection: When we submit a piece of text to the Toxic Content Detection API, it evaluates the content for several categories of toxicity.
- Categories: These categories typically include:
- Hate Speech: Language that targets individuals or groups based on attributes such as race, religion, sexual orientation, or gender.
- Harassment: Language intended to intimidate, harass, or embarrass others.
- Abuse: General abusive language that could include threats or insults.
Response:
- Scores: The API returns a toxicity score for each category. This score represents the likelihood that the text falls into the respective category of toxicity.
- Thresholds: You can use these scores to determine whether the content requires further review or automatic rejection based on predefined thresholds.
Example:
Input: “I hate all people from that country!”
Output:
{
    'Labels': [
        {'Name': 'PROFANITY', 'Score': 0.07029999792575836},
        {'Name': 'HATE_SPEECH', 'Score': 0.11649999767541885},
        {'Name': 'INSULT', 'Score': 0.7645000219345093},
        {'Name': 'GRAPHIC', 'Score': 0.01860000006854534},
        {'Name': 'HARASSMENT_OR_ABUSE', 'Score': 0.14139999449253082},
        {'Name': 'SEXUAL', 'Score': 0.05260000005364418},
        {'Name': 'VIOLENCE_OR_THREAT', 'Score': 0.11829999834299088}
    ],
    'Toxicity': 0.7289000153541565
}
In this example, the highest per-category score belongs to INSULT (about 0.76), and the overall Toxicity score of roughly 0.73 indicates that the text likely contains toxic language.
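These scores can be turned into a moderation decision with a small helper that flags any label above a per-category threshold. The 0.5 cutoff is an assumed default, not an API-mandated value; tune it for your platform:

```python
# Flag labels whose score exceeds a per-category threshold.
# The 0.5 cutoff is an assumed default, not an API-mandated value.
def flagged_categories(labels, threshold=0.5):
    return [label["Name"] for label in labels if label["Score"] > threshold]

# Labels list shaped like a detect_toxic_content result.
labels = [
    {"Name": "HATE_SPEECH", "Score": 0.12},
    {"Name": "INSULT", "Score": 0.76},
    {"Name": "VIOLENCE_OR_THREAT", "Score": 0.12},
]

print(flagged_categories(labels))  # ['INSULT']
```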
Amazon Comprehend toxicity detection is now generally available in four Regions: us-east-1, us-west-2, eu-west-1, and ap-southeast-2. If the rest of your stack runs elsewhere (such as ap-southeast-1, used in this tutorial), you may need to create the Comprehend client with a region_name pointing at one of these four Regions.
By combining the Sentiment Analysis API and the Toxic Content Detection API, we can create a more comprehensive content moderation strategy:
- Initial Screening: Start with Toxic Content Detection to identify and filter out any text that is likely to be harmful or abusive based on predefined categories.
- Sentiment Context: For the remaining content, use Sentiment Analysis to assess the overall emotional tone. This can help identify content that, while not explicitly toxic, may still be inappropriate or misaligned with community standards.
Here is the code for the TextModeration Lambda function:
import json
import boto3
from decimal import Decimal

# Initialize AWS clients
comprehend = boto3.client('comprehend')
dynamodb = boto3.resource('dynamodb')
sns = boto3.client('sns')

def lambda_handler(event, context):
    table = dynamodb.Table('TextModerationTable')
    sns_topic_arn = 'arn:aws:sns:ap-southeast-1:<YOUR_ACCOUNT_ID>:TextModerationNotifications'
    print(event)
    for record in event['Records']:
        print(record)
        try:
            message = json.loads(record['body'])
            text = message['text']
            content_id = record['messageId']

            # Analyze sentiment
            sentiment_response = comprehend.detect_sentiment(Text=text, LanguageCode='en')
            sentiment = sentiment_response['Sentiment']
            sentiment_score = sentiment_response['SentimentScore']
            sentiment_score = {k: Decimal(str(v)) for k, v in sentiment_score.items()}

            # Detect toxic content
            toxic_content_response = comprehend.detect_toxic_content(TextSegments=[{'Text': text}], LanguageCode='en')
            toxicity_scores = toxic_content_response['ResultList'][0]['Toxicity']

            # Extract per-category toxicity scores
            labels = toxic_content_response['ResultList'][0]['Labels']
            labels = [{k: Decimal(str(v)) if k == 'Score' else v for k, v in label.items()} for label in labels]

            # Extract individual scores
            hate_speech_score = next((label['Score'] for label in labels if label['Name'] == 'HATE_SPEECH'), 0)
            harassment_abuse_score = next((label['Score'] for label in labels if label['Name'] == 'HARASSMENT_OR_ABUSE'), 0)
            insult_score = next((label['Score'] for label in labels if label['Name'] == 'INSULT'), 0)
            violence_threat_score = next((label['Score'] for label in labels if label['Name'] == 'VIOLENCE_OR_THREAT'), 0)

            # Determine the overall toxicity level
            toxicity_level = 'Non-Toxic'
            if hate_speech_score > 0.5:
                toxicity_level = 'Hate Speech'
            elif harassment_abuse_score > 0.5:
                toxicity_level = 'Harassment or Abuse'
            elif insult_score > 0.5:
                toxicity_level = 'Insult'
            elif violence_threat_score > 0.5:
                toxicity_level = 'Violence or Threat'

            # Extract key phrases
            keyphrase_response = comprehend.detect_key_phrases(Text=text, LanguageCode='en')
            keyphrases = [phrase['Text'] for phrase in keyphrase_response['KeyPhrases']]

            # Store results in DynamoDB
            table.put_item(
                Item={
                    'ContentId': content_id,
                    'Text': text,
                    'Sentiment': sentiment,
                    'SentimentScore': sentiment_score,
                    'ToxicityLevel': toxicity_level,
                    'ToxicityScores': labels,
                    'KeyPhrases': keyphrases
                }
            )

            if toxicity_level != 'Non-Toxic':
                # Send notification via SNS
                sns.publish(
                    TopicArn=sns_topic_arn,
                    Message=json.dumps({
                        'ContentId': content_id,
                        'Sentiment': sentiment,
                        'SentimentScore': {k: float(v) for k, v in sentiment_score.items()},  # Convert Decimal to float
                        'ToxicityLevel': toxicity_level,
                        'ToxicityScores': [{k: float(v) if k == 'Score' else v for k, v in label.items()} for label in labels],  # Convert Decimal to float
                        'KeyPhrases': keyphrases
                    }),
                    Subject='Text Moderation Result'
                )
        except Exception as e:
            print(f'Error processing record {record["messageId"]}: {e}')
    return {
        'statusCode': 200,
        'body': json.dumps({'message': 'Text processed successfully!'})
    }
Next, we need to attach a policy to the IAM role that this Lambda function uses, following the same inline-policy steps as before. Here is the policy:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "comprehend:DetectSentiment",
                "comprehend:DetectToxicContent",
                "comprehend:DetectKeyPhrases"
            ],
            "Resource": "*"
        },
        {
            "Sid": "VisualEditor1",
            "Effect": "Allow",
            "Action": [
                "sns:Publish",
                "dynamodb:PutItem",
                "sqs:ReceiveMessage",
                "sqs:DeleteMessage",
                "sqs:GetQueueAttributes"
            ],
            "Resource": [
                "arn:aws:sqs:ap-southeast-1:<YOUR_ACCOUNT_ID>:TextModerationQueue",
                "arn:aws:sns:ap-southeast-1:<YOUR_ACCOUNT_ID>:TextModerationNotifications",
                "arn:aws:dynamodb:ap-southeast-1:<YOUR_ACCOUNT_ID>:table/TextModerationTable"
            ]
        }
    ]
}
6. Configure the Lambda Trigger for the TextModerationQueue
We need to configure a Lambda trigger so that Amazon SQS invokes the TextModeration Lambda function whenever a new message arrives in the TextModerationQueue.
- Navigate to the Amazon SQS console and find the TextModerationQueue that you created earlier.
- Click on the "Lambda triggers" tab in the TextModerationQueue details page.
- Click on the "Configure" button next to "Lambda function triggers".
- In the "Configure Lambda function triggers" page, select the TextModeration Lambda function that we created earlier from the dropdown list.
- Click "Save" to save the changes.
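Once the trigger is in place, SQS invokes the function with a batch of records; each record carries the message body as a string plus a unique messageId, which the TextModeration function reuses as the DynamoDB ContentId. An illustrative event, trimmed to the fields the function actually reads:

```python
import json

# Illustrative SQS event as Lambda receives it (trimmed to the fields
# the TextModeration function actually reads).
event = {
    "Records": [
        {
            "messageId": "a1b2c3d4",
            "body": json.dumps({"text": "Sample submission"}),
        }
    ]
}

record = event["Records"][0]
text = json.loads(record["body"])["text"]
content_id = record["messageId"]

print(content_id, text)
```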
7. Create an API Gateway
In this step, we will create an HTTP API using Amazon API Gateway that will receive text submissions from users and trigger the QueueHandler Lambda function to process the text. The API Gateway acts as the entry point for our text moderation system, allowing users to submit text content for moderation.
To create a new API:
- Navigate to the API Gateway service.
- Click on “Create API.”
- Choose “HTTP API” and click “Build.”
- Click on “Add Integration”.
- Choose Lambda as the integration.
- Choose the QueueHandler Lambda.
- Enter a name (e.g., TextModerationAPI).
- Configure the routes by selecting POST as the request method and entering the resource path (e.g., /submit-text).
- Deploy the API to a new or existing stage.
8. Testing
To test the API Gateway, we can send a POST request to the /submit-text resource with a JSON payload containing the text to be moderated.
curl -X POST \
https://hl40679vu7.execute-api.ap-southeast-2.amazonaws.com/prod/submit-text \
-H "Content-Type: application/json" \
-d "{\"text\": \"This product is a complete scam! The company is run by thieves and liars. I demand a full refund and I'll make sure to post negative reviews everywhere to ruin their reputation.\"}"
Then check the inbox of the subscribed email address: if the text is flagged as toxic, you will receive the moderation result as an SNS notification.
Conclusion
In this tutorial, we’ve built a real-time text content moderation system using AWS services. By leveraging API Gateway, SQS, Lambda, DynamoDB, SNS, and Amazon Comprehend, we created a scalable and efficient workflow to process and moderate user-generated content. This architecture ensures that our system can handle large volumes of submissions while providing timely feedback and notifications.