Scan databases and data warehouses for PII data. Tag tables and columns in data catalogs like Amundsen and Datahub
Redshift Python Connector. It supports the Python Database API Specification v2.0.
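A minimal DB-API v2.0 sketch of using the connector; the cluster endpoint, database, and credentials below are placeholders.

```python
# Minimal DB-API v2.0 usage sketch for the Redshift Python connector.
# Host, database, and credentials are placeholders.
import redshift_connector

conn = redshift_connector.connect(
    host="examplecluster.abc123xyz789.us-west-2.redshift.amazonaws.com",
    database="dev",
    port=5439,
    user="awsuser",
    password="my_password",
)

cursor = conn.cursor()
cursor.execute("SELECT table_name FROM information_schema.tables LIMIT 5;")
for row in cursor.fetchall():
    print(row)

cursor.close()
conn.close()
```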
Data pipeline performing ETL to AWS Redshift using Spark, orchestrated with Apache Airflow
Developed a data pipeline to automate data warehouse ETL by building custom airflow operators that handle the extraction, transformation, validation and loading of data from S3 -> Redshift -> S3
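A minimal sketch of what such a custom COPY operator could look like, assuming Airflow 2.x with the Postgres provider installed; the operator name, connection ID, table, IAM role, and default COPY options are placeholders rather than the repository's actual code.

```python
# Hypothetical custom operator: COPY staged S3 data into a Redshift table
# via an Airflow Postgres connection. Names and defaults are placeholders.
from airflow.models import BaseOperator
from airflow.providers.postgres.hooks.postgres import PostgresHook


class StageToRedshiftOperator(BaseOperator):
    """Runs a Redshift COPY from an S3 prefix into a target table."""

    def __init__(self, table, s3_path, iam_role,
                 copy_options="FORMAT AS PARQUET",
                 redshift_conn_id="redshift_default", **kwargs):
        super().__init__(**kwargs)
        self.table = table
        self.s3_path = s3_path
        self.iam_role = iam_role
        self.copy_options = copy_options
        self.redshift_conn_id = redshift_conn_id

    def execute(self, context):
        copy_sql = (
            f"COPY {self.table} FROM '{self.s3_path}' "
            f"IAM_ROLE '{self.iam_role}' {self.copy_options};"
        )
        PostgresHook(postgres_conn_id=self.redshift_conn_id).run(copy_sql)
```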
🔄 🏃 EtLT of my own Strava data using the Strava API, MySQL, Python, S3, Redshift, and Airflow
This project was based on an interest in Data Engineering and ETL pipelines. It also provided a good opportunity to develop skills and experience with a range of tools. As such, the project is more complex than required, utilising dbt, Airflow, Docker and cloud-based storage.
A batch processing data pipeline, using AWS resources (S3, EMR, Redshift, EC2, IAM), provisioned via Terraform, and orchestrated from locally hosted Airflow containers. The end product is a Superset dashboard and a Postgres database, hosted on an EC2 instance at this address (powered down).
A Covid-19 data pipeline on AWS featuring PySpark/Glue, Docker, Great Expectations, Airflow, and Redshift, templated in CloudFormation and CDK, deployable via Github Actions.
This project provides valuable customer sentiment insights for Zomato by tracking and analyzing tweets related to their brand and services.
Example project for consuming an AWS Kinesis stream and saving data to Amazon Redshift using Apache Spark
The goal of this repository is to provide good and clear examples of AWS CLI commands together with the AWS CDK to easily create AWS services and resources
Zero-ETL integrations - Enable near real-time analytics on petabytes of transactional data
Smart City Realtime Data Engineering Project
Project 3 - Data Engineering Nanodegree
Project 5 - Data Engineering Nanodegree
This project designs and implements an ETL pipeline using Apache Airflow (Docker Compose) to ingest, process, and store retail data. AWS S3 acts as the data lake, AWS Redshift as the data warehouse, and Looker Studio for visualization. [Data Engineer]
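A minimal DAG sketch of the S3-to-Redshift load step, assuming a recent Airflow 2.x with the Amazon provider installed; the bucket, key, schema, table, and connection IDs are placeholders.

```python
# Minimal Airflow DAG sketch: load a staged S3 file into a Redshift table.
# Bucket, key, schema, table, and connection IDs are placeholders.
from datetime import datetime

from airflow import DAG
from airflow.providers.amazon.aws.transfers.s3_to_redshift import S3ToRedshiftOperator

with DAG(
    dag_id="retail_s3_to_redshift",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    load_orders = S3ToRedshiftOperator(
        task_id="load_orders",
        s3_bucket="retail-data-lake",
        s3_key="orders/{{ ds }}/orders.csv",
        schema="public",
        table="orders",
        copy_options=["CSV", "IGNOREHEADER 1"],
        redshift_conn_id="redshift_default",
        aws_conn_id="aws_default",
    )
```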
Building ETL pipelines to migrate music JSON data/metadata files (semi-structured data) into a relational database stored in an AWS Redshift cluster
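A hedged sketch of the usual route for landing semi-structured JSON in Redshift, a COPY with the JSON 'auto' option; the table, bucket, IAM role, and region below are placeholders, and the statement can be run via any DB-API cursor such as the connector shown earlier.

```python
# Sketch: COPY semi-structured JSON files from S3 into a Redshift table.
# Table, bucket, IAM role, and region are placeholders.
COPY_SONGS_SQL = """
COPY public.songs
FROM 's3://music-data-bucket/song_data/'
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
FORMAT AS JSON 'auto'
REGION 'us-west-2';
"""
```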
Redshift script to create a MANIFEST file recursively
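A small sketch of one way to build such a manifest, assuming boto3 and placeholder bucket/prefix names: list every object under a prefix and write out the COPY manifest JSON.

```python
# Sketch: recursively list objects under an S3 prefix and write a Redshift
# COPY manifest file. Bucket and prefix names are placeholders.
import json

import boto3

s3 = boto3.client("s3")
bucket, prefix = "my-data-bucket", "exports/2024/"

entries = []
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
    for obj in page.get("Contents", []):
        entries.append({"url": f"s3://{bucket}/{obj['Key']}", "mandatory": True})

s3.put_object(
    Bucket=bucket,
    Key=f"{prefix}manifest.json",
    Body=json.dumps({"entries": entries}).encode("utf-8"),
)
```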
Remove duplicate entries from a Redshift cluster
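A hedged sketch of one common deduplication pattern for exact duplicate rows (stage DISTINCT rows in a temp table, clear the original, reinsert); the connection details and table name are placeholders.

```python
# Sketch of a common Redshift dedup pattern for exact duplicate rows.
# Connection details and the table name are placeholders.
import redshift_connector

conn = redshift_connector.connect(
    host="examplecluster.abc123xyz789.us-west-2.redshift.amazonaws.com",
    database="dev",
    user="awsuser",
    password="my_password",
)
cursor = conn.cursor()
cursor.execute("CREATE TEMP TABLE events_dedup AS SELECT DISTINCT * FROM public.events;")
cursor.execute("DELETE FROM public.events;")
cursor.execute("INSERT INTO public.events SELECT * FROM events_dedup;")
conn.commit()  # all three statements run in one transaction
conn.close()
```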
This project provides a comprehensive data pipeline solution to extract, transform, and load (ETL) Reddit data into a Redshift data warehouse. The pipeline leverages a combination of tools and services including Apache Airflow, Celery, PostgreSQL, Amazon S3, AWS Glue, Amazon Athena, and Amazon Redshift.