A serverless data pipeline that processes CSV files containing coordinates to generate maps and manages truck test records using AWS services.
This project implements a serverless data pipeline to process CSV files containing coordinates, generate maps, and store truck test records. It leverages AWS services such as S3, DynamoDB, Lambda, API Gateway, and Cognito for secure and scalable data processing and user authentication.
- CSV Upload: When a CSV file is uploaded to the `IncomingCsv` S3 bucket, a Lambda function is triggered.
- Record Creation: The first Lambda function creates a test record in the `RecordsTable` DynamoDB table.
- Map Generation: A second Lambda function generates a map from the CSV coordinates, saves it to the `Maps` S3 bucket, and updates `RecordsTable` with the map file name.
- Authentication: User authentication is managed via a Cognito User Pool.
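The upload and record-creation steps can be sketched roughly as below. This is an illustration only, not the project's actual `csv_lambda.py`: it assumes the standard S3 event notification payload, and the function names and the `status` attribute are hypothetical. The `table` argument is expected to behave like a boto3 DynamoDB `Table` resource.

```python
from urllib.parse import unquote_plus


def extract_s3_objects(event):
    """Return (bucket, key) pairs from an S3 event notification."""
    objects = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # S3 URL-encodes object keys in event payloads (spaces arrive as "+").
        key = unquote_plus(record["s3"]["object"]["key"])
        objects.append((bucket, key))
    return objects


def build_test_record(key):
    """Build a test record keyed by file name; `status` is an assumed attribute."""
    return {"filename": key, "status": "received"}


def csv_handler(event, context, table=None):
    """Lambda entry point; `table` is injectable so the logic runs without AWS."""
    records = [build_test_record(key) for _, key in extract_s3_objects(event)]
    if table is not None:
        for item in records:
            table.put_item(Item=item)
    return records
```

Keeping the event parsing separate from the DynamoDB write makes the handler easy to exercise locally with a hand-written event.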
- Serverless Processing: Uses AWS Lambda for event-driven CSV processing and map generation.
- Secure Authentication: Integrates Cognito User Pool for user registration and authentication.
- Scalable Storage: Stores truck configurations and test records in DynamoDB tables.
- RESTful API: Provides API endpoints via API Gateway to manage truck records and retrieve data.
- Map Visualization: Generates maps from coordinates using Pandas and Folium libraries.
The application is built using the AWS Cloud Development Kit (CDK) and consists of several stacks:
Manages user authentication and authorization.
- Cognito User Pool: Supports self-sign-up, email verification, and user alias (email/username).
- Cognito User Pool Client: Facilitates authentication flows, including user-password and Secure Remote Password (SRP).
- Cognito Identity Pool: Grants authenticated users read-only access to S3 buckets via an IAM role.
- Outputs: `UserPoolId`, `UserPoolClientId`, `IdentityPoolId`
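Because the User Pool Client supports the user-password flow, a JWT could be obtained along these lines. This is a sketch under assumptions: `get_id_token` is a hypothetical helper, `client` is expected to be a boto3 `cognito-idp` client (for example `boto3.client("cognito-idp")`), and `client_id` would be the `UserPoolClientId` output.

```python
def get_id_token(client, client_id, username, password):
    """Sign in with the USER_PASSWORD_AUTH flow and return the ID token (a JWT)."""
    response = client.initiate_auth(
        ClientId=client_id,
        AuthFlow="USER_PASSWORD_AUTH",
        AuthParameters={"USERNAME": username, "PASSWORD": password},
    )
    # When Cognito answers with a challenge (e.g. NEW_PASSWORD_REQUIRED),
    # the response carries no AuthenticationResult.
    result = response.get("AuthenticationResult")
    if result is None:
        challenge = response.get("ChallengeName")
        raise RuntimeError(f"Sign-in requires a challenge response: {challenge}")
    return result["IdToken"]
```

Passing the client in as an argument keeps the helper free of credentials and region settings, which stay with the caller.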
Handles truck configuration storage.
- DynamoDB Table: `TrucksTable` with `currentVin` as the partition key.
- Lambda Function: `EnterTruckLambda`, triggered by API Gateway to insert truck records.
- IAM Role: Grants the Lambda function write access to `TrucksTable`.
- Outputs: `TrucksTableARN`, `AddTruckLambdaARN`
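A handler for inserting truck records might look roughly like this. It is a sketch, not the contents of `trucksdb_lambda.py`: it assumes an API Gateway proxy event with a JSON body, and the function name, status codes, and error messages are hypothetical. Only `currentVin` comes from the table definition above.

```python
import json
from decimal import Decimal


def enter_truck_handler(event, context, table=None):
    """Insert a truck record taken from an API Gateway proxy event body."""
    try:
        # boto3's DynamoDB resource rejects Python floats, so parse them as Decimal.
        truck = json.loads(event.get("body") or "{}", parse_float=Decimal)
    except json.JSONDecodeError:
        return {"statusCode": 400, "body": json.dumps({"error": "body must be JSON"})}

    # `currentVin` is the partition key, so a record without it cannot be stored.
    if not isinstance(truck, dict) or not truck.get("currentVin"):
        return {"statusCode": 400, "body": json.dumps({"error": "currentVin is required"})}

    if table is not None:
        table.put_item(Item=truck)
    return {"statusCode": 201, "body": json.dumps({"currentVin": truck["currentVin"]})}
```

Validating the key before the write turns a DynamoDB validation error into a clear 400 response for the caller.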
Processes CSV files and stores test records.
- DynamoDB Table: `RecordsTable` with `filename` as the partition key.
- Lambda Function: `CsvLambda`, triggered by S3 to process CSV files and insert data into `RecordsTable`.
- S3 Buckets:
  - `incomingcsvs-`: Stores uploaded CSV files and triggers `CsvLambda`.
  - `maps-`: Stores generated maps.
- Lambda Layer: Includes Pandas and Folium for map generation.
- Outputs: `CsvBucketName`, `MapsBucketName`
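The map-generation step could be sketched as follows. Several things here are assumptions: the column names `latitude` and `longitude`, the function names, and the zoom level are all hypothetical, and the standard-library `csv` module stands in for Pandas so the parsing part runs without the Lambda layer. Folium is imported lazily inside `render_map` for the same reason.

```python
import csv
import io


def load_coordinates(csv_text, lat_col="latitude", lon_col="longitude"):
    """Parse (lat, lon) float pairs from CSV text, skipping rows that fail to parse."""
    coords = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        try:
            coords.append((float(row[lat_col]), float(row[lon_col])))
        except (KeyError, TypeError, ValueError):
            continue
    return coords


def map_center(coords):
    """Return the mean latitude and longitude, used to center the map."""
    if not coords:
        raise ValueError("no valid coordinates found")
    lats = [lat for lat, _ in coords]
    lons = [lon for _, lon in coords]
    return (sum(lats) / len(lats), sum(lons) / len(lons))


def render_map(coords, output_path):
    """Draw the coordinates as a line on a Folium map and save it as HTML."""
    import folium  # assumed to be supplied by the Lambda layer

    route_map = folium.Map(location=list(map_center(coords)), zoom_start=13)
    folium.PolyLine(coords).add_to(route_map)
    route_map.save(output_path)
    return output_path
```

In a Lambda function the output would typically be written under `/tmp` before being uploaded to the maps bucket.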
Provides RESTful API endpoints.
- API Gateway: `RunlogRestApi` serves as the entry point for API requests.
- Cognito Authorizer: Secures API endpoints using the Cognito User Pool.
- API Methods:
  - `POST /addtruck`: Adds truck records to `TrucksTable`.
  - `GET /alltrucks`: Retrieves all truck records.
  - `GET /allrecords`: Retrieves all test records.
- IAM Role: Grants read access to `TrucksTable` and `RecordsTable`.
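The two `GET` endpoints return every item in a table. A single DynamoDB scan call returns at most 1 MB of data, so reading a whole table means following the pagination key. The helper below is a sketch of that pattern, assuming `table` behaves like a boto3 DynamoDB `Table` resource; `scan_all` is a hypothetical name and may not match how the project reads its tables.

```python
def scan_all(table):
    """Return every item in a DynamoDB table, following scan pagination."""
    items = []
    kwargs = {}
    while True:
        page = table.scan(**kwargs)
        items.extend(page.get("Items", []))
        # A LastEvaluatedKey means more pages remain; resume from that key.
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            return items
        kwargs["ExclusiveStartKey"] = last_key
```

A full scan reads the entire table on every request, which is fine for small tables but grows in cost and latency as the table grows.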
Orchestrates stack deployment and manages dependencies using AWS CDK.
- Lambda functions: `csv_lambda.py`, `maps_lambda.py`, `trucksdb_lambda.py`.
- Utility scripts: `createuser.py`, `addtruck.py`, `alltrucks.py`, `allrecords.py`, `getmap.py`.
- Templates for data processing.
- AWS CLI: Installed and configured with appropriate credentials.
- Node.js: Required for AWS CDK (version 14 or higher recommended).
- Python: Version 3.8 or higher.
- AWS CDK: Install via `npm install -g aws-cdk`.
- Clone the repository and change into the project directory: `git clone https://github.com/username/repo.git`, then `cd runlog`.
- Create and activate a virtual environment:
  - macOS/Linux: `python3 -m venv .venv`, then `source .venv/bin/activate`
  - Windows: `python -m venv .venv`, then `.venv\Scripts\activate.bat`
- Install dependencies: `pip install -r requirements.txt`
- Synthesize the CloudFormation template: `cdk synth`
- Deploy the stacks: `cdk deploy --all`. To skip manual approvals, use `cdk deploy --all --require-approval=never`.
- Populate the `variables.py` file with required values (e.g., bucket names, API endpoints).
- Create a user in the Cognito User Pool: `python createuser.py`. Note: All API calls require a JWT token from an authenticated Cognito user.
- Add a truck configuration: `python addtruck.py`
- List all truck configurations: `python alltrucks.py`
- Upload CSV files to the `incomingcsvs-` S3 bucket to trigger processing.
- Retrieve test records: `python allrecords.py`
- Retrieve generated maps: `python getmap.py`
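For orientation, an authenticated call to the API could be built as in the sketch below. It is not taken from the utility scripts: `build_request` and `call_api` are hypothetical names, the base URL is whatever your deployment exposes, and the code assumes the Cognito authorizer reads the JWT from the `Authorization` header, which depends on how the authorizer is configured.

```python
import json
import urllib.request


def build_request(base_url, path, id_token, payload=None):
    """Build a GET request, or a JSON POST when a payload is given."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    request = urllib.request.Request(
        base_url.rstrip("/") + path,
        data=data,
        method="POST" if payload is not None else "GET",
    )
    # Assumption: the authorizer takes the token from the Authorization header.
    request.add_header("Authorization", id_token)
    if data is not None:
        request.add_header("Content-Type", "application/json")
    return request


def call_api(request):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With a token in hand, `call_api(build_request(base_url, "/alltrucks", token))` would list trucks, and passing a dictionary containing `currentVin` as `payload` with the `/addtruck` path would add one.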