Terraform-based Infrastructure as Code (IaC) to deploy a complete AWS backend for a Retrieval-Augmented Generation (RAG) application. It integrates with Google's free-tier Gemini Pro and Embedding models for AI-powered document querying and includes a Streamlit UI with token-based authentication for interacting with the app.
💰 Estimated cost: ~$3 (~₹250) to experiment without the AWS Free Tier, primarily for RDS and NAT Gateway if active.
👉 Related UI: RAG UI (Streamlit Frontend), a Streamlit-based frontend application designed to interact with the backend infrastructure deployed by this project. It is located in the `rag_ui/` directory of this repository.
🎥 YouTube Video: Walkthrough on setting up the application, building, deploying, and running it end-to-end 👇
This repository contains the complete Terraform codebase for provisioning and managing the AWS infrastructure that powers a RAG application. It allows users to upload documents, which are then processed, embedded, and stored for efficient semantic search and AI-driven querying.
📌 Key features include:
- IaC with Terraform: For consistent and repeatable deployments across environments.
- Serverless Compute: AWS Lambda for backend logic (document processing, querying, uploads, authentication, DB initialization).
- Vector Storage: PostgreSQL RDS with the `pgvector` extension for storing and searching text embeddings.
- AI Integration: Leverages Google's Gemini Pro (for generation) and Gemini Embedding models (for text embeddings).
- Authentication: Secure user management with AWS Cognito.
- CI/CD Workflows: GitHub Actions for automated deployment, testing, and cleanup.
- Multi-Environment Support: Designed for `dev`, `staging`, and `production` environments.
- Comprehensive Testing: Includes unit and integration tests for backend Lambda functions.
- Streamlit UI: Includes a login page, document upload, query interface, and RAG evaluation dashboard.
- User uploads document → API Gateway → `upload_handler` Lambda
- `upload_handler` Lambda → S3 Gateway Endpoint → S3 Bucket
- S3 Event → `document_processor` Lambda (in private subnet)
- `document_processor` Lambda → NAT Gateway → Internet Gateway → Gemini API (for embeddings)
- `document_processor` Lambda → RDS Security Group → PostgreSQL Database (stores chunks/vectors)
- User submits query → API Gateway → `query_processor` Lambda (in private subnet)
- `query_processor` Lambda → RDS Security Group → PostgreSQL Database (vector search)
- `query_processor` Lambda → NAT Gateway → Internet Gateway → Gemini API (for answer generation)
- `query_processor` Lambda → API Gateway → User (returns answer)
This network architecture ensures that sensitive operations and data are processed in a secure environment, while still allowing the necessary external communications through controlled channels.
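For reference, the query flow above can be exercised directly with any HTTP client once the stack is deployed. The snippet below is a minimal sketch using Python's `requests`; the `/query` path and the Cognito JWT requirement come from this README, while the header format, payload fields (`query`, `user_id`), and timeout value are assumptions that may differ from the deployed Lambda's actual contract.

```python
# Minimal sketch of calling the /query endpoint; payload field names are assumptions.
import requests

API_ENDPOINT = "https://your-api-gateway-url.amazonaws.com/dev"   # from Terraform outputs
ID_TOKEN = "<Cognito JWT obtained via the /auth endpoint>"

response = requests.post(
    f"{API_ENDPOINT}/query",
    headers={"Authorization": f"Bearer {ID_TOKEN}"},
    json={"query": "What does the uploaded document say about pricing?",
          "user_id": "test-user"},
    timeout=150,  # aligns with the raised API Gateway timeout (150,000 ms)
)
response.raise_for_status()
print(response.json())
```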
🗺️ Infra Provisioning Lifecycle Flow (Illustrates the Terraform provisioning sequence)
```
.
├── .github/workflows/ # CI/CD via GitHub Actions
│ ├── deploy.yml # Infrastructure deployment workflow
│ └── manual_cleanup.yml # Resource cleanup workflow
├── environments/ # Environment-specific configs (dev, staging, prod)
│ └── dev/ # Example 'dev' environment
│ ├── main.tf # Root Terraform file for the environment
│ ├── providers.tf # Terraform provider configurations
│ └── variables.tf # Environment-specific variable definitions
├── modules/ # Reusable Terraform modules
│ ├── api/ # API Gateway configuration
│ ├── auth/ # Cognito authentication
│ ├── compute/ # Lambda functions & IAM roles
│ ├── database/ # PostgreSQL RDS with pgvector & Secrets Manager
│ ├── monitoring/ # CloudWatch Logs, Alarms & SNS Topic
│ ├── storage/ # S3 Buckets & DynamoDB Table
│ └── vpc/ # VPC, Subnets, NAT, Security Groups, Endpoints
├── rag_ui/ # Streamlit UI application
│ ├── app.py # Main Streamlit application code
│ └── README.md # README specific to the UI
├── scripts/ # Utility shell scripts
│ ├── cleanup.sh # Comprehensive resource cleanup script
│ ├── import_resources.sh # Script to import existing AWS resources into Terraform state
│ └── network-diagnostics.sh # Script for troubleshooting network connectivity (e.g., Lambda to RDS)
├── src/ # Lambda backend source code (Python)
│ ├── auth_handler/ # Lambda for Cognito authentication operations
│ ├── db_init/ # Lambda for database schema and pgvector initialization
│ ├── document_processor/ # Lambda for processing uploaded documents
│ ├── query_processor/ # Lambda for handling user queries and RAG
│ ├── tests/ # Unit and integration tests
│ │ ├── integration/ # Integration tests for deployed services
│ │ │ └── run_integration_tests.py
│ │ ├── unit/ # Unit tests for Lambda functions
│ │ │ ├── conftest.py # Pytest common fixtures and mocks
│ │ │ ├── test_*.py # Individual unit test files
│ │ └── __init__.py
│ ├── upload_handler/ # Lambda for handling file uploads via API
│ └── utils/ # Shared utility code (e.g., db_connectivity_test.py)
├── sonar-project.properties # SonarQube configuration file
└── tox.ini # tox configuration for running tests and linters
```
The infrastructure is modularized using Terraform modules:
- Custom VPC with public, private, and database subnets across multiple Availability Zones.
- Internet Gateway for public subnet access.
- NAT Gateways (configurable for single or multiple AZs) for private subnet outbound access.
- Route Tables for managing traffic flow.
- Security Groups to control access to Lambdas, RDS, and Bastion hosts.
- VPC Endpoints for S3 and DynamoDB, allowing private access from within the VPC.
- Optional VPC Flow Logs for network traffic monitoring (enabled for `prod`).
- All functions are Python 3.11 based.
- Authentication Handler (`auth_handler`): Manages the user authentication lifecycle with Cognito (registration, login, email verification, password reset, token refresh).
- Document Processor (`document_processor`):
  - Triggered by S3 uploads to the `uploads/` prefix in the documents bucket.
  - Downloads the uploaded file (PDF, TXT, CSV, etc.).
  - Loads and chunks the document content.
  - Generates text embeddings for chunks using the Gemini Embedding model.
  - Stores document metadata and text chunks (with embeddings) in the PostgreSQL RDS database.
- Query Processor (`query_processor`):
  - Handles user queries from the API.
  - Generates an embedding for the user's query using the Gemini Embedding model.
  - Performs a vector similarity search in PostgreSQL (using `pgvector`) against stored document chunks (a minimal sketch of this step follows this section).
  - Retrieves relevant chunks and prepares a context.
  - Generates a final answer using the Gemini Pro model with the retrieved context.
  - Optionally performs RAG evaluation (faithfulness, relevancy, context precision).
- Upload Handler (`upload_handler`):
  - API endpoint for initiating file uploads.
  - Receives file content (base64 encoded), name, and user ID.
  - Uploads the raw file to a specific S3 path (`uploads/{user_id}/{document_id}/{file_name}`).
  - Stores initial document metadata in PostgreSQL and DynamoDB.
- DB Initialization (`db_init`):
  - A Lambda function invoked during CI/CD deployment.
  - Connects to the PostgreSQL RDS instance.
  - Creates the necessary database tables (`documents`, `chunks`) if they don't exist.
  - Enables the `pgvector` extension required for vector operations.
- IAM Roles & Policies: Granular permissions for Lambda functions to access S3, DynamoDB, RDS (via Secrets Manager), Secrets Manager, and CloudWatch Logs.
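To make the `query_processor` flow above concrete, here is a minimal sketch of the embed-then-search step using the `google-generativeai` and `psycopg2` libraries. The `chunks` table and `pgvector` usage come from this README; the embedding model name, column names, SQL, and connection details are illustrative assumptions, not the deployed function's actual code.

```python
# Sketch of the query_processor core: embed the query, then run a pgvector
# similarity search. Model name, schema details, and SQL are assumptions.
import google.generativeai as genai
import psycopg2

genai.configure(api_key="<gemini-api-key-from-secrets-manager>")

def retrieve_chunks(question: str, top_k: int = 5):
    # 1. Embed the user query with the Gemini Embedding model.
    embedding = genai.embed_content(
        model="models/text-embedding-004",
        content=question,
        task_type="retrieval_query",
    )["embedding"]

    # 2. Vector similarity search against stored chunks. pgvector's '<=>'
    #    operator is cosine distance, so lower means more similar.
    conn = psycopg2.connect(host="<rds-endpoint>", dbname="<db>",
                            user="<user>", password="<password>")
    with conn, conn.cursor() as cur:
        cur.execute(
            """
            SELECT content
            FROM chunks
            ORDER BY embedding <=> %s::vector
            LIMIT %s
            """,
            (str(embedding), top_k),
        )
        return [row[0] for row in cur.fetchall()]
```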
- S3 Buckets:
  - `{project_name}-{stage}-documents`: Stores uploaded documents. S3 event notifications trigger the `document_processor` Lambda. Configured with CORS and lifecycle rules.
  - `{project_name}-{stage}-lambda-code`: Stores Lambda function deployment packages (ZIP files).
  - `{project_name}-terraform-state`: Central S3 bucket for storing Terraform state files (versioning enabled).
- DynamoDB:
  - `{project_name}-{stage}-metadata`: Stores metadata related to documents (e.g., status, S3 key, user ID). Used by `upload_handler` and `document_processor` (see the sketch after this section). Features Global Secondary Indexes (GSIs) on `user_id` and `document_id`, and Point-in-Time Recovery (PITR).
  - `{project_name}-{stage}-terraform-state-lock`: DynamoDB table for Terraform state locking, ensuring safe concurrent operations.
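As an illustration of how `upload_handler` might record initial metadata in the table above, here is a hedged boto3 sketch. The table-name pattern comes from this README; the item attributes beyond `user_id` and `document_id` are assumptions.

```python
# Sketch: writing initial document metadata to the DynamoDB metadata table.
# Attribute names beyond user_id/document_id are illustrative assumptions.
import uuid
from datetime import datetime, timezone

import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("rag-app-dev-metadata")  # {project_name}-{stage}-metadata

def record_upload(user_id: str, file_name: str, s3_key: str) -> str:
    document_id = str(uuid.uuid4())
    table.put_item(Item={
        "document_id": document_id,
        "user_id": user_id,
        "file_name": file_name,
        "s3_key": s3_key,
        "status": "UPLOADED",
        "created_at": datetime.now(timezone.utc).isoformat(),
    })
    return document_id
```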
- PostgreSQL RDS with `pgvector` (`modules/database`):
  - Managed PostgreSQL database instance.
  - Utilizes the `pgvector` extension for efficient storage and similarity search of text embeddings.
  - Stores structured document information in a `documents` table and text chunks with their corresponding vector embeddings in a `chunks` table (a schema sketch follows this list).
  - Database credentials are securely managed by AWS Secrets Manager.
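For clarity, the schema work described above (and performed by the `db_init` Lambda) boils down to something like the following `psycopg2` sketch. The `documents`/`chunks` table names and the `pgvector` extension come from this README; the column definitions and the 768-dimension embedding size are assumptions.

```python
# Sketch of db_init: enable pgvector and create the documents/chunks tables.
# Column definitions and the 768-dim embedding size are assumptions.
import psycopg2

DDL = """
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE IF NOT EXISTS documents (
    id          UUID PRIMARY KEY,
    user_id     TEXT NOT NULL,
    file_name   TEXT NOT NULL,
    created_at  TIMESTAMPTZ DEFAULT now()
);

CREATE TABLE IF NOT EXISTS chunks (
    id          BIGSERIAL PRIMARY KEY,
    document_id UUID REFERENCES documents(id),
    content     TEXT NOT NULL,
    embedding   vector(768)
);
"""

def init_db(host: str, dbname: str, user: str, password: str) -> None:
    conn = psycopg2.connect(host=host, dbname=dbname, user=user, password=password)
    with conn, conn.cursor() as cur:
        cur.execute(DDL)
    conn.close()
```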
- API Gateway (REST API):
  - Provides public HTTP(S) endpoints for backend Lambda functions.
  - Routes include `/upload`, `/query`, and `/auth`.
  - Configured with CORS for frontend integration.
  - Amazon API Gateway has a default timeout of 30 seconds, but GenAI use cases may require longer processing times. To support this, you can request an increased timeout via the AWS Service Quotas request form; after logging into your AWS account, use the following URL to access it. In this project, the timeout is configured to 150,000 milliseconds (2.5 minutes). Select United States (N. Virginia) as the region, since it is the default in `terraform.tfvars`; if you're using a different region, choose the appropriate one instead. Keep all other settings unchanged.
    https://us-east-1.console.aws.amazon.com/servicequotas/home/template/add
- Cognito User Pools:
  - Manages user identities, including registration, sign-in, email verification, and password reset functionalities.
  - Defines password policies and user attributes.
  - Issues JWTs (JSON Web Tokens) upon successful authentication.
  - Includes an App Client configured for the frontend application.
- JWT-based API Authorization:
  - API Gateway utilizes a Cognito JWT authorizer to protect the `/upload` and `/query` endpoints, ensuring only authenticated users can access them.
  - The `/auth` endpoint is public to allow user registration and login.
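To show how a client obtains the JWT that this authorizer expects, here is a hedged boto3 sketch of a Cognito sign-in using the `USER_PASSWORD_AUTH` flow; the actual `auth_handler` may use a different flow or additional parameters.

```python
# Sketch: logging in against the Cognito User Pool app client and extracting
# the ID token that API Gateway's JWT authorizer validates.
import boto3

cognito = boto3.client("cognito-idp", region_name="us-east-1")

def login(client_id: str, username: str, password: str) -> str:
    resp = cognito.initiate_auth(
        ClientId=client_id,
        AuthFlow="USER_PASSWORD_AUTH",
        AuthParameters={"USERNAME": username, "PASSWORD": password},
    )
    # Send this as the Authorization bearer token on /upload and /query calls.
    return resp["AuthenticationResult"]["IdToken"]
```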
- Secrets Management (`modules/compute`, `modules/database`):
  - AWS Secrets Manager is used to securely store and manage sensitive information:
    - `{project_name}-{stage}-gemini-api-key`: Stores the Google Gemini API key used by `document_processor` and `query_processor`.
    - `{project_name}-{stage}-db-credentials`: Stores the master credentials for the PostgreSQL RDS instance, automatically rotated or managed by Terraform.
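At runtime, the Lambda functions read these values through the Secrets Manager API. Below is a minimal boto3 sketch; the secret name follows the pattern above, but the JSON layout of the secret string is an assumption.

```python
# Sketch: fetching the Gemini API key from AWS Secrets Manager at runtime.
import json

import boto3

def get_gemini_api_key(project_name: str, stage: str) -> str:
    client = boto3.client("secretsmanager")
    resp = client.get_secret_value(SecretId=f"{project_name}-{stage}-gemini-api-key")
    secret = resp["SecretString"]
    # The secret may be stored as plain text or as a small JSON document.
    try:
        data = json.loads(secret)
        if isinstance(data, dict):
            return data.get("api_key", secret)
    except json.JSONDecodeError:
        pass
    return secret
```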
- CloudWatch Logs: Centralized logging for API Gateway requests and all Lambda function executions. Log groups are configured with retention policies.
- CloudWatch Alarms: Monitors key metrics for Lambda functions (e.g., `Errors` for `document_processor` and `query_processor`).
- SNS Topic (`{project_name}-{stage}-alerts`):
  - Acts as a notification channel.
  - CloudWatch Alarms publish messages to this topic when an alarm state is reached.
  - Can be configured with subscriptions (e.g., email) to notify administrators of issues.
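To actually receive those notifications, you can attach an email subscription to the alerts topic. The boto3 sketch below is an illustrative assumption (replace the topic ARN with the one created for your `{project_name}-{stage}-alerts` topic); the same can of course be done in Terraform or the console.

```python
# Sketch: subscribing an email address to the alerts SNS topic so CloudWatch
# alarm notifications reach an administrator. AWS emails a confirmation link
# that must be accepted before messages are delivered.
import boto3

sns = boto3.client("sns", region_name="us-east-1")
sns.subscribe(
    TopicArn="arn:aws:sns:us-east-1:123456789012:rag-app-dev-alerts",  # example ARN
    Protocol="email",
    Endpoint="admin@example.com",
)
```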
- ✅ Python: 3.11+ (for the Streamlit UI).
- ✅ AWS Cloud Account: You'll need an AWS account to build and deploy this end-to-end application (excluding the Streamlit UI, which runs locally on your system).
- ✅ GitHub Account: For forking the repository and using GitHub Actions.
- ✅ Git installed on your local machine: Use Git Bash or any preferred Git client to manage your repository.
- ✅ Google API Key: For accessing Google's free-tier Gemini Pro and Gemini Embedding models.
- ✅ Free SonarCloud Account for code quality checks (optional): Sign up at SonarCloud to enable automated quality gates and static analysis for your codebase.
The repository supports multiple deployment environments, typically:

- `dev`: For development and testing.
- `staging`: For pre-production validation.
- `prod`: For the live production environment.

Configuration for each environment (Terraform variables, backend configuration) is managed within its respective subfolder under the `environments/` directory (e.g., `environments/dev/`, `environments/staging/`).
- Fork the Repository

- Clone to Your Local Machine:

  ```bash
  git clone https://github.com/<your-github-username>/rag-app-on-aws.git
  ```

- Customize Project Configuration:
  Update the following fields in `environments/<env>/terraform.tfvars`.
- AWS Access Keys:
  - Generate an access key for either an IAM user with sufficient permissions or the root user (which has full access) to experiment and create the resources defined in Terraform.
  - If logged in as the root user: open the top-right dropdown menu after login and select "Security Credentials".
  - If using an IAM user: navigate to IAM > Users > [Your User] > Security credentials > Create access key.
  - Add these as GitHub repository secrets:
    - `AWS_ACCESS_KEY_ID`
    - `AWS_SECRET_ACCESS_KEY`
- SonarQube Token (Optional):
  - First, create an organization and import your GitHub project.
  - Then, generate an access token and add it to your GitHub repository secrets as `SONAR_TOKEN`.
  - Also, update the following keys in the `sonar-project.properties` file at the project root:

    ```properties
    sonar.projectKey=<your-sonar-organization-name>_rag-app-on-aws
    sonar.organization=<your-sonar-organization-name>
    ```
- Google API Key:
  - Although the `GEMINI_API_KEY` isn't stored as a GitHub secret for deployment, it is configured post-deployment via AWS Secrets Manager or as a Terraform variable. Terraform will create a placeholder secret in AWS Secrets Manager, which you must update manually: go to the AWS Console, search for "Secrets Manager," and update the secret with your actual Gemini API key (generated from Google's Gemini AI Studio).
  - 🔑 Secret name format: `{project_name}-{stage}-gemini-api-key` (example: `rag-app-dev-gemini-api-key`)
To Add Secrets: Go to your forked GitHub repository → Settings → Secrets and variables → Actions → New repository secret.
The repository includes two primary GitHub Actions workflows:
- Terraform AWS Deployment (`.github/workflows/deploy.yml`):
  - Deploys the infrastructure based on the target environment.
  - Triggered automatically on pushes to `develop` (for the `dev` env), `main` (for the `prod` env), and `staging` (for the `staging` env).
  - Can also be manually triggered from the GitHub Actions tab, allowing selection of the environment and other parameters such as `reset_db_password` or `bastion_allowed_cidr` (keep everything at the defaults if running the pipeline manually).
- Manual AWS Cleanup (`.github/workflows/manual_cleanup.yml`):
  - A manually triggered workflow to tear down all AWS resources for a specified environment.
  - Uses the `scripts/cleanup.sh` script.
📤 Push to trigger CI/CD deployment:
- Dev: `git push origin develop`
- Staging: `git push origin staging`
- Production: `git push origin main`
- It’s recommended to make changes directly in the main branch of your forked repository to deploy AWS resources.
- SonarCloud offers free integration with GitHub’s main branch, and the AWS setup is configured similarly to the dev environment for easy experimentation.
🧑💻 Manually trigger deployment from GitHub:
- Go to your repository on GitHub.
- Click on the "Actions" tab.
- Select "Terraform AWS Deployment" from the list of workflows.
- Click "Run workflow", choose the branch, environment, and fill in any desired input parameters.
👉 UI Readme: https://github.com/genieincodebottle/rag-app-on-aws/rag_ui
Once the AWS resources are deployed via the GitHub Actions pipeline, follow these steps to launch the UI and test the application locally.
- Navigate to the `rag_ui` directory in your cloned repository using the terminal and set up a virtual environment:

  ```bash
  cd rag-app-on-aws/rag_ui
  python -m venv venv
  venv\Scripts\activate        # Linux/macOS: source venv/bin/activate
  pip install -r requirements.txt
  ```
- Configuration

  Create a `.env` file:

  ```env
  # RAG Application API Configuration
  API_ENDPOINT=https://your-api-gateway-url.amazonaws.com/stage
  UPLOAD_ENDPOINT=/upload
  QUERY_ENDPOINT=/query
  AUTH_ENDPOINT=/auth

  # Default user settings
  DEFAULT_USER_ID=test-user

  # Cognito Configuration
  COGNITO_CLIENT_ID=your_cognito_client_id

  # Enabling/disabling evaluation
  ENABLE_EVALUATION="true"
  ```

  Once the GitHub Actions pipeline completes successfully, you can download the zipped environment variables file from the GitHub artifacts. Unzip it, open the file, and copy both `API_ENDPOINT` and `COGNITO_CLIENT_ID` into your `.env` file.
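  If you want to sanity-check the `.env` values before launching the UI, a quick hedged check with `python-dotenv` works; note that how `app.py` itself loads configuration may differ.

  ```python
  # Quick sanity check that the .env values are present before running Streamlit.
  import os
  from dotenv import load_dotenv

  load_dotenv()  # reads .env from the current directory (rag_ui/)

  for key in ("API_ENDPOINT", "COGNITO_CLIENT_ID"):
      value = os.getenv(key)
      print(f"{key} = {value if value else 'MISSING!'}")
  ```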
- Usage

  ```bash
  streamlit run app.py
  ```

  Visit http://localhost:8501, register or log in, upload documents, and start querying.
The `deploy.yml` workflow automates the deployment process with the following key steps:
- Determine Environment: Identifies the target environment (`dev`, `staging`, `prod`) based on the Git branch or manual workflow input.
- Code Quality (SonarQube): (Optional) If a `SONAR_TOKEN` secret is configured, runs SonarQube analysis using `tox` for code quality checks.
- Build Lambda Functions:
  - Sets up Python 3.11.
  - Installs dependencies for each Lambda function.
  - Packages the `auth_handler`, `db_init`, `document_processor`, `query_processor`, and `upload_handler` Lambda functions into ZIP artifacts.
  - Uploads these artifacts to GitHub Actions.
- Terraform Setup & Plan:
  - Configures AWS credentials using GitHub secrets.
  - Dynamically creates `backend.tf` for S3 state storage.
  - Creates the Terraform state S3 bucket (`{PROJECT_NAME}-terraform-state`) and DynamoDB lock table (`{PROJECT_NAME}-{STAGE}-terraform-state-lock`) if they don't already exist.
  - Downloads Lambda artifacts and uploads them to the `{PROJECT_NAME}-{STAGE}-lambda-code` S3 bucket.
  - Initializes Terraform (`terraform init`).
  - Attempts to import existing AWS resources into the Terraform state using `scripts/import_resources.sh` (this helps adopt unmanaged resources).
  - Generates a Terraform plan (`terraform plan`) using environment-specific variables (e.g., `reset_db_password`, `enable_lifecycle_rules`, `bastion_allowed_cidr`).
  - Uploads the `tfplan` file as a GitHub artifact.
- Terraform Apply (conditional; runs on push to specific branches or via manual trigger):
  - Downloads the `tfplan` artifact.
  - Applies the Terraform configuration (`terraform apply -auto-approve tfplan`).
  - Extracts outputs such as `api_endpoint` and `cognito_app_client_id`.
  - Uploads an `env_vars.env` file with these outputs for UI configuration.
- Database Availability & Initialization:
  - (Optional via the `wait_for_db` input) Waits for the RDS instance to become available.
  - If `reset_db_password` was true, updates Lambda environment variables with the new DB secret ARN.
  - Ensures the `db_init` and `auth_handler` Lambda functions are updated with the latest code from S3 (as a safeguard).
  - Invokes the `db_init` Lambda function to set up the PostgreSQL schema and `pgvector` extension. This step includes retries in case the database isn't immediately ready. (A boto3 sketch of this invocation follows this list.)
- Verify Deployment: Makes a health check call to the `upload_handler` Lambda via API Gateway.
- Integration Tests:
  - Sets up Python and installs dependencies.
  - Runs integration tests located in `src/tests/integration/run_integration_tests.py` against the deployed API Gateway endpoint.
  - Uploads test results as a GitHub artifact.
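The database-initialization step above is essentially a synchronous Lambda invocation. The boto3 sketch below mirrors what the workflow does after `terraform apply`; the function name follows the project's naming pattern and the payload shape is an assumption.

```python
# Sketch: invoking the db_init Lambda synchronously, as the CI/CD workflow does
# after apply. The function name and payload shape are assumptions.
import json

import boto3

lambda_client = boto3.client("lambda", region_name="us-east-1")

resp = lambda_client.invoke(
    FunctionName="rag-app-dev-db-init",     # {project_name}-{stage}-db-init (assumed)
    InvocationType="RequestResponse",        # wait for the result
    Payload=json.dumps({"action": "initialize"}).encode(),
)
print(json.loads(resp["Payload"].read()))
```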
The `scripts/` folder contains helpful shell scripts:

- `cleanup.sh`: A comprehensive script to tear down all AWS resources created by Terraform for a specific environment. It requires `jq` to be installed. Use with extreme caution, as it is destructive.
- `import_resources.sh`: Aids in importing existing AWS resources into the Terraform state. This can be useful if some resources were created manually or outside of Terraform initially.
- `network-diagnostics.sh`: A script to help troubleshoot network connectivity issues, particularly between Lambda functions and the RDS database within the VPC. It checks security groups and RDS status, and can test DNS resolution from a Lambda.
To remove all AWS resources created by this project for a specific environment:
- Navigate to your repository's "Actions" tab on GitHub.
- Find the "Manual AWS Cleanup" workflow in the sidebar.
- Click "Run workflow".
- Select the branch (usually your main or develop branch).
- Enter the environment name (e.g., `dev`, `staging`, `prod`) you wish to clean up.
- Click "Run workflow". This will execute the `scripts/cleanup.sh` script with the necessary context.
Warning: This script will delete resources. Ensure you have the correct AWS credentials and region configured for your AWS CLI, and that you are targeting the correct environment.
- Ensure `jq` is installed:

  ```bash
  # On Debian/Ubuntu
  sudo apt-get update && sudo apt-get install -y jq
  # On macOS (using Homebrew)
  brew install jq
  ```

- Navigate to the `scripts` directory:

  ```bash
  cd scripts
  ```

- Make the script executable:

  ```bash
  chmod +x cleanup.sh
  ```

- Run the script, providing the necessary environment variables. The script expects `PROJECT_NAME`, `STAGE`, and `AWS_REGION` to be set; you can set them inline (replace the values with your actual project name, stage, and region). The script has built-in confirmations, but its destructive actions are significant.

  ```bash
  PROJECT_NAME="your-project-name" STAGE="dev" AWS_REGION="us-east-1" ./cleanup.sh
  ```
Contributions are welcome! Please follow these steps:
- Fork the repository.
- Create a new feature branch (e.g., `git checkout -b feature/new-ai-model-integration`).
- Make your changes and commit them with clear messages (e.g., `git commit -m 'feat: Add support for Claude 3 model'`).
- Push your changes to your forked repository (`git push origin feature/new-ai-model-integration`).
- Open a Pull Request to the `develop` branch of the original repository.
Note: Deploying this infrastructure will incur AWS charges. Always review the output of `terraform plan` before applying changes to understand potential costs and resource modifications.

Security Best Practice: Never commit secrets directly to your Git repository. Use GitHub Secrets for CI/CD variables and manage sensitive application configurations (like API keys) securely, for instance through AWS Secrets Manager populated via secure Terraform variables or post-deployment steps.