
dataplatform

Here are 42 public repositories matching this topic...

This ETL project demonstrates how to build a scalable data pipeline for customer sales analysis. It covers every stage, from extracting the data through transforming it and loading it into a database, with Apache Airflow orchestrating the workflow.

  • Updated Aug 19, 2024
  • Python
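The listing only names the pipeline's stages. As a rough illustration, here is a minimal extract, transform, and load flow in plain Python with the standard-library `sqlite3` module. It is not the project's code. The CSV data, the `customer_sales` table, and the functions `extract`, `transform`, and `load` are all hypothetical, and Airflow is left out so the sketch runs on its own.

```python
import csv
import io
import sqlite3
from collections import defaultdict

# Hypothetical raw sales export; the real project's data source is not described here.
RAW_CSV = """order_id,customer,amount
1,alice,120.50
2,bob,80.00
3,alice,39.50
4,carol,
"""


def extract(text: str) -> list[dict[str, str]]:
    """Extract: read raw rows from a CSV source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows: list[dict[str, str]]) -> list[tuple[str, float]]:
    """Transform: drop rows with a missing amount and total sales per customer."""
    totals: dict[str, float] = defaultdict(float)
    for row in rows:
        if not row["amount"]:
            continue  # skip incomplete records
        totals[row["customer"]] += float(row["amount"])
    return sorted(totals.items())


def load(totals: list[tuple[str, float]], conn: sqlite3.Connection) -> None:
    """Load: upsert per-customer totals into a hypothetical customer_sales table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customer_sales (customer TEXT PRIMARY KEY, total REAL)"
    )
    conn.executemany("INSERT OR REPLACE INTO customer_sales VALUES (?, ?)", totals)
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load(transform(extract(RAW_CSV)), conn)
    print(conn.execute("SELECT * FROM customer_sales ORDER BY customer").fetchall())
```

In an Airflow deployment, each of these three functions would typically become a separate task in a DAG, for example through the TaskFlow `@task` decorator or a `PythonOperator`. Airflow then handles scheduling, retries, and the ordering extract → transform → load.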

The Spark Memory Configuration Calculator is designed to help data engineers and Spark developers quickly determine the optimal memory and core configurations for their Spark clusters. With this tool, you can avoid common pitfalls and ensure your cluster resources are used efficiently, leading to better performance and lower costs.

  • Updated Aug 15, 2024
  • Python
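The calculator's actual formulas are not given on this page. For a sense of what such a tool computes, this sketch applies a widely cited rule of thumb for sizing executors on YARN. It assumes about 5 cores per executor, 1 core and 1 GiB reserved per node for the OS and daemons, and one executor slot given up for the YARN ApplicationMaster. It also uses Spark's default overhead of 10% of `spark.executor.memory`, with a 384 MiB minimum. The names `plan_executors` and `ExecutorPlan` are hypothetical, and the tool itself may use different assumptions.

```python
from dataclasses import dataclass

RESERVED_CORES_PER_NODE = 1       # assumption: left for the OS and node daemons
RESERVED_MEM_MIB_PER_NODE = 1024  # assumption: 1 GiB per node for the same purpose
MIN_OVERHEAD_MIB = 384            # Spark's minimum executor memory overhead


@dataclass(frozen=True)
class ExecutorPlan:
    cores_per_executor: int   # maps to spark.executor.cores
    num_executors: int        # maps to spark.executor.instances
    executor_memory_mib: int  # maps to spark.executor.memory
    overhead_mib: int         # maps to spark.executor.memoryOverhead


def heap_for_container(container_mib: int) -> int:
    """Heap (MiB) such that heap + max(384, heap // 10) fits in the container."""
    heap = container_mib - MIN_OVERHEAD_MIB
    if heap // 10 <= MIN_OVERHEAD_MIB:
        return heap  # the 384 MiB floor dominates
    # Otherwise the default 10% factor dominates, so require heap * 1.1 <= container.
    return container_mib * 10 // 11


def plan_executors(nodes: int, cores_per_node: int, mem_per_node_gib: int,
                   cores_per_executor: int = 5) -> ExecutorPlan:
    usable_cores = cores_per_node - RESERVED_CORES_PER_NODE
    usable_mem_mib = mem_per_node_gib * 1024 - RESERVED_MEM_MIB_PER_NODE
    executors_per_node = usable_cores // cores_per_executor
    if executors_per_node < 1:
        raise ValueError("not enough cores per node for a single executor")
    container_mib = usable_mem_mib // executors_per_node  # memory per executor slot
    heap = heap_for_container(container_mib)
    return ExecutorPlan(
        cores_per_executor=cores_per_executor,
        # assumption: one slot across the cluster goes to the YARN ApplicationMaster
        num_executors=nodes * executors_per_node - 1,
        executor_memory_mib=heap,
        overhead_mib=max(MIN_OVERHEAD_MIB, heap // 10),
    )


if __name__ == "__main__":
    print(plan_executors(nodes=10, cores_per_node=16, mem_per_node_gib=64))
```

For 10 nodes with 16 cores and 64 GiB each, this yields 29 executors with 5 cores each, about 19549 MiB of heap, and 1954 MiB of overhead per executor. Fixing cores per executor first and deriving memory from what remains per slot is the usual order in this heuristic. Memory is computed in MiB so the heap plus overhead never exceeds the slot.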
