Data Pipeline for CSV Files

A serverless data pipeline, built on AWS services, that processes CSV files of coordinates into generated maps and manages truck test records.

Overview

This project implements a serverless data pipeline to process CSV files containing coordinates, generate maps, and store truck test records. It leverages AWS services such as S3, DynamoDB, Lambda, API Gateway, and Cognito for secure and scalable data processing and user authentication.

Workflow

  • CSV Upload: When a CSV file is uploaded to the IncomingCsv S3 bucket, a Lambda function is triggered.
  • Record Creation: The first Lambda function creates a test record in the RecordsTable DynamoDB table.
  • Map Generation: A second Lambda function generates a map from the CSV coordinates, saves it to the Maps S3 bucket, and updates the RecordsTable with the map file name.
  • Authentication: User authentication is managed via a Cognito User Pool.

Workflow Diagram
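
To make the first two steps concrete, here is a minimal sketch of an S3-triggered handler in the style of csv_lambda.py. It is illustrative only: the event shape is the standard S3 notification payload, and the table and partition key (RecordsTable, filename) come from the stack descriptions below; everything else is assumed.

```python
import csv
import io
import urllib.parse

import boto3

s3 = boto3.client("s3")
records_table = boto3.resource("dynamodb").Table("RecordsTable")

def handler(event, context):
    # Triggered by s3:ObjectCreated events on the incoming-CSV bucket.
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])

        # Read and parse the uploaded CSV.
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")
        rows = list(csv.DictReader(io.StringIO(body)))

        # filename is the partition key of RecordsTable.
        records_table.put_item(Item={"filename": key, "rowCount": len(rows)})
```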

Features

  • Serverless Processing: Uses AWS Lambda for event-driven CSV processing and map generation.
  • Secure Authentication: Integrates Cognito User Pool for user registration and authentication.
  • Scalable Storage: Stores truck configurations and test records in DynamoDB tables.
  • RESTful API: Provides API endpoints via API Gateway to manage truck records and retrieve data.
  • Map Visualization: Generates maps from coordinates using the Pandas and Folium libraries (sketched below).
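
As a rough illustration of the last point, a map can be produced from a coordinate CSV in a few lines of Pandas and Folium. The column names lat and lon are assumptions here; the actual schema is defined by the project's CSV templates.

```python
import folium
import pandas as pd

def csv_to_map(csv_path: str, html_path: str) -> None:
    # Load the coordinates; 'lat'/'lon' column names are assumed.
    df = pd.read_csv(csv_path)

    # Center the map on the mean coordinate and trace the route.
    fmap = folium.Map(location=[df["lat"].mean(), df["lon"].mean()], zoom_start=12)
    folium.PolyLine(df[["lat", "lon"]].values.tolist()).add_to(fmap)
    fmap.save(html_path)
```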

Architecture

The application is built using the AWS Cloud Development Kit (CDK) and consists of several stacks:

CognitoStack (cognito_stack.py)

Manages user authentication and authorization.

  • Cognito User Pool: Supports self-sign-up, email verification, and sign-in aliases (email or username).
  • Cognito User Pool Client: Facilitates authentication flows, including user-password and Secure Remote Password (SRP).
  • Cognito Identity Pool: Grants authenticated users read-only access to S3 buckets via an IAM role.
  • Outputs:
    • UserPoolId
    • UserPoolClientId
    • IdentityPoolId
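
Inside the stack's __init__, the user pool and client described above map to roughly the following CDK v2 code. Construct IDs and option values are assumptions; only the features listed above are taken from the actual stack, and the identity pool and outputs are omitted for brevity.

```python
from aws_cdk import aws_cognito as cognito

# User pool with self-sign-up, email verification, and email/username aliases.
user_pool = cognito.UserPool(
    self, "UserPool",
    self_sign_up_enabled=True,
    auto_verify=cognito.AutoVerifiedAttrs(email=True),
    sign_in_aliases=cognito.SignInAliases(email=True, username=True),
)

# App client enabling the user-password and SRP auth flows.
user_pool_client = user_pool.add_client(
    "UserPoolClient",
    auth_flows=cognito.AuthFlow(user_password=True, user_srp=True),
)
```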

TrucksDdbStack (truck_ddb_stack.py)

Handles truck configuration storage.

  • DynamoDB Table: TrucksTable with currentVin as the partition key.
  • Lambda Function: EnterTruckLambda, triggered by API Gateway to insert truck records.
  • IAM Role: Grants the Lambda function write access to TrucksTable.
  • Outputs:
    • TrucksTableARN
    • AddTruckLambdaARN
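
The table-plus-Lambda pattern above might look like the following inside the stack. The runtime, handler name, and asset path are placeholders; the partition key matches the description above.

```python
from aws_cdk import aws_dynamodb as dynamodb
from aws_cdk import aws_lambda as _lambda

# Truck configurations keyed by VIN.
trucks_table = dynamodb.Table(
    self, "TrucksTable",
    partition_key=dynamodb.Attribute(
        name="currentVin", type=dynamodb.AttributeType.STRING
    ),
)

# Lambda invoked by API Gateway to insert truck records.
enter_truck = _lambda.Function(
    self, "EnterTruckLambda",
    runtime=_lambda.Runtime.PYTHON_3_9,     # assumed runtime
    handler="trucksdb_lambda.handler",      # assumed handler name
    code=_lambda.Code.from_asset("lambda"), # assumed asset path
)

# Equivalent to the IAM write grant described above.
trucks_table.grant_write_data(enter_truck)
```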

RecordsDdbStack (records_ddb_stack.py)

Processes CSV files and stores test records.

  • DynamoDB Table: RecordsTable with filename as the partition key.
  • Lambda Function: CsvLambda, triggered by S3 to process CSV files and insert data into RecordsTable.
  • S3 Buckets:
    • incomingcsvs-: Stores uploaded CSV files and triggers CsvLambda.
    • maps-: Stores generated maps.
  • Lambda Layer: Includes Pandas and Folium for map generation.
  • Outputs:
    • CsvBucketName
    • MapsBucketName
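
A condensed sketch of the wiring described above; construct IDs, the runtime, and asset paths are assumptions.

```python
from aws_cdk import aws_lambda as _lambda
from aws_cdk import aws_s3 as s3
from aws_cdk import aws_s3_notifications as s3n

csv_bucket = s3.Bucket(self, "IncomingCsv")
maps_bucket = s3.Bucket(self, "Maps")

# Layer bundling Pandas and Folium for map generation.
data_layer = _lambda.LayerVersion(
    self, "PandasFoliumLayer",
    code=_lambda.Code.from_asset("layers/pandas_folium"),  # assumed path
)

csv_lambda = _lambda.Function(
    self, "CsvLambda",
    runtime=_lambda.Runtime.PYTHON_3_9,
    handler="csv_lambda.handler",
    code=_lambda.Code.from_asset("lambda"),
    layers=[data_layer],
)

# Invoke CsvLambda whenever a CSV lands in the incoming bucket.
csv_bucket.add_event_notification(
    s3.EventType.OBJECT_CREATED, s3n.LambdaDestination(csv_lambda)
)
```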

RestApiGWStack (apigw_stack.py)

Provides RESTful API endpoints.

  • API Gateway: RunlogRestApi serves as the entry point for API requests.
  • Cognito Authorizer: Secures API endpoints using Cognito User Pool.
  • API Methods:
    • POST /addtruck: Adds truck records to TrucksTable.
    • GET /alltrucks: Retrieves all truck records.
    • GET /allrecords: Retrieves all test records.
  • IAM Role: Grants read access to TrucksTable and RecordsTable.
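
Securing a route with the Cognito authorizer follows a standard CDK pattern. In this sketch, user_pool and enter_truck_lambda stand in for references passed between stacks, and only the POST /addtruck method is shown.

```python
from aws_cdk import aws_apigateway as apigw

api = apigw.RestApi(self, "RunlogRestApi")

# Authorizer backed by the user pool from CognitoStack.
authorizer = apigw.CognitoUserPoolsAuthorizer(
    self, "RunlogAuthorizer", cognito_user_pools=[user_pool]
)

# POST /addtruck, protected by the Cognito authorizer.
addtruck = api.root.add_resource("addtruck")
addtruck.add_method(
    "POST",
    apigw.LambdaIntegration(enter_truck_lambda),
    authorizer=authorizer,
    authorization_type=apigw.AuthorizationType.COGNITO,
)
```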

Application (app.py)

Orchestrates stack deployment and manages dependencies using AWS CDK.
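
In outline, app.py instantiates the four stacks and declares deployment order. Constructor arguments are simplified here, since the real stacks likely pass tables and the user pool between one another.

```python
import aws_cdk as cdk

from apigw_stack import RestApiGWStack
from cognito_stack import CognitoStack
from records_ddb_stack import RecordsDdbStack
from truck_ddb_stack import TrucksDdbStack

app = cdk.App()

cognito = CognitoStack(app, "CognitoStack")
trucks = TrucksDdbStack(app, "TrucksDdbStack")
records = RecordsDdbStack(app, "RecordsDdbStack")
api = RestApiGWStack(app, "RestApiGWStack")

# The API stack reads both tables and the user pool, so deploy it last.
api.add_dependency(cognito)
api.add_dependency(trucks)
api.add_dependency(records)

app.synth()
```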

Additional Files

  • Lambda functions: csv_lambda.py, maps_lambda.py, trucksdb_lambda.py.
  • Utility scripts: createuser.py, addtruck.py, alltrucks.py, allrecords.py, getmap.py.
  • Templates for data processing.

CDK App Architecture Diagram

Prerequisites

  • AWS CLI: Installed and configured with appropriate credentials.
  • Node.js: Required for AWS CDK (version 14 or higher recommended).
  • Python: Version 3.8 or higher.
  • AWS CDK: Install via npm install -g aws-cdk.

Installation

  1. Clone the repository:
    git clone https://github.com/sidor2/data-pipeline-for-csv-files.git
    cd data-pipeline-for-csv-files
  2. Create and activate a virtual environment:
    • MacOS/Linux:
      python3 -m venv .venv
      source .venv/bin/activate
    • Windows:
      python -m venv .venv
      .venv\Scripts\activate.bat
  3. Install dependencies:
    pip install -r requirements.txt
  4. Synthesize the CloudFormation template:
    cdk synth
  5. Deploy the stacks:
    cdk deploy --all
    To skip manual approvals:
    cdk deploy --all --require-approval=never

Usage

  1. Populate the variables.py file with required values (e.g., bucket names, API endpoints).
  2. Create a user in the Cognito User Pool:
    python createuser.py
    Note: All API calls require a JWT token from an authenticated Cognito user; a sketch of the token flow follows this list.
  3. Add a truck configuration:
    python addtruck.py
  4. List all truck configurations:
    python alltrucks.py
  5. Upload CSV files to the incomingcsvs- S3 bucket to trigger processing.
  6. Retrieve test records:
    python allrecords.py
  7. Retrieve generated maps:
    python getmap.py
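
As noted in step 2, every call needs a Cognito JWT. The utility scripts presumably follow a flow like the sketch below; the client ID, API URL, credentials, and payload are placeholders for values that, in this project, come from variables.py.

```python
import boto3
import requests

CLIENT_ID = "<user-pool-client-id>"  # placeholder; see variables.py
API_URL = "https://<api-id>.execute-api.<region>.amazonaws.com/prod"

# Authenticate against Cognito and pull out the ID token (JWT).
idp = boto3.client("cognito-idp")
auth = idp.initiate_auth(
    ClientId=CLIENT_ID,
    AuthFlow="USER_PASSWORD_AUTH",
    AuthParameters={"USERNAME": "user@example.com", "PASSWORD": "<password>"},
)
token = auth["AuthenticationResult"]["IdToken"]

# Call a protected endpoint with the token in the Authorization header.
resp = requests.post(
    f"{API_URL}/addtruck",
    json={"currentVin": "EXAMPLEVIN000"},  # payload fields are assumed
    headers={"Authorization": token},
)
print(resp.status_code, resp.text)
```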
