pyspark
Here are 13 public repositories matching this topic...
Deploying Delta Live Tables Pipelines on AWS with Terraform
Updated Apr 18, 2023 - HCL
This repository contains the codebase for the BuildItAll Big Data Processing Platform, a case study project designed to manage large volumes of daily data for a hypothetical Belgian client.
Updated Jun 14, 2025 - HCL
E2E Spark data pipelines with engineering fundamentals
Updated Oct 31, 2024 - HCL
Project that uses Terraform to create AWS infrastructure with S3, Lambda, and DynamoDB tables for ocean and river data 🐢
Updated Feb 25, 2024 - HCL
Using the AWS resource for EMR on EKS (i.e. emr-containers), I provide more grounded, applicable examples and explanations for using this service
Updated Apr 20, 2025 - HCL
Streaming pipeline using AWS MSK and AWS EMR with Spark, retrieving the data from Twitter Streams API
Updated Sep 10, 2023 - HCL
Data lake project for a US-based insurance company
Updated Jun 23, 2023 - HCL
This repository contains Terraform code to deploy a serverless batch processing architecture on AWS, designed to replace an on-premises system with a scalable, reliable, and maintainable cloud solution.
Updated Apr 2, 2025 - HCL
Launching and managing Spark and PySpark applications in the Yandex Data Processing service.
Updated Jun 4, 2025 - HCL