pyspark

Here are 13 public repositories matching this topic...

idealo / terraform-emr-pyspark

Quickstart PySpark with Anaconda on AWS/EMR using Terraform

emr aws ecommerce cloud terraform python3 pyspark idealo

Updated Jan 7, 2025
HCL

alero-awani / batch-data-engineering-project

A batch data pipeline that retrieves data from a user purchase table and a movie review table and transforms it into a user behaviour metric table.

docker airflow sql pipeline terraform aws-s3 pyspark data-engineering-pipeline

Updated Aug 14, 2025
HCL

husqvarnagroup / terraform-dlt-public

Deploying Delta Live Tables Pipelines on AWS with Terraform

python aws terraform pyspark infrastructure-as-code databricks delta-live-tables databricks-dlt

Updated Apr 18, 2023
HCL

Data-Bishop / Team5-BuildItAll-Data-Platform

This repository contains the codebase for the BuildItAll Big Data Processing Platform, a case study project designed to manage large volumes of daily data for a hypothetical Belgian client.

aws airflow big-data terraform pyspark airflow-docker emr-cluster airflow-dags

Updated Jun 14, 2025
HCL

syedhassaanahmed / spark-with-engineering-fundamentals

E2E Spark data pipelines with engineering fundamentals

apache-spark docker-compose terraform pyspark apache-kafka smoke-test azure-sql-database confluent-kafka azure-event-hubs azure-databricks azure-sql-server

Updated Oct 31, 2024
HCL

harinik05 / cleanflo-infra

Project that uses Terraform to create AWS infrastructure with S3, Lambda, and DynamoDB tables for ocean and river data 🐢

devops aws-lambda aws-kms terraform pyspark aws-dynamodb aws-glue githubactions

Updated Feb 25, 2024
HCL

HakeemSalaudeen / ETL-Pipeline-with-Amazon-EMR-and-Apache-Spark

Implemented a data processing pipeline using Amazon EMR to transform monthly vendor sales data from CSV format to a clean, analyzed dataset accessible through Amazon Athena.

emr aws athena terraform bigdata s3 pyspark data-engineering iam-role

Updated Apr 10, 2025
HCL

nate-benton90 / emr_containers

Using the AWS resource for EMR on EKS (i.e. emr-containers), I provide more grounded, applicable examples and explanations for using this service.

emr kubernetes aws containers pyspark iaac eks

Updated Apr 20, 2025
HCL

escobarana / twitter_msk_emr

Streaming pipeline using AWS MSK and AWS EMR with Spark, retrieving data from the Twitter Streams API

emr serverless twitter-api amazon pyspark msk streaming-pipeline

Updated Sep 10, 2023
HCL

javi-domi / ON-datalake-poc

Data lake project for a US-based insurance company

terraform pyspark aws-glue redshift-spectrum

Updated Jun 23, 2023
HCL

HakeemSalaudeen / salesproject-batch-processing-on-AWS

This repository contains Terraform code to deploy a serverless batch processing architecture on AWS, designed to replace an on-premises system with a scalable, reliable, and maintainable cloud solution.

aws etl terraform aws-s3 s3 glue pyspark vpc redshift batch-processing iam-role redshiftserverless

Updated Apr 2, 2025
HCL

yandex-cloud-examples / yc-data-proc-spark-pyspark

Running and managing Spark and PySpark applications in the Yandex Data Processing service.

spark pyspark yandex-cloud data-proc yandexcloud

Updated Jun 4, 2025
HCL

guilhermegandolfi / aws_avengers

This repository is for learning AWS resources using Terraform, Python, and Spark, with a focus on the world of data.

python aws terraform pyspark

Updated Apr 19, 2023
HCL
