Stars
Build data pipelines with SQL and Python, ingest data from different sources, add quality checks, and build end-to-end flows.
ingestr is a CLI tool that seamlessly copies data between databases with a single command.
A curated list of awesome Apache Spark packages and resources.
All Algorithms implemented in Python
An extremely fast Python package and project manager, written in Rust.
OpenTofu lets you declaratively manage your cloud infrastructure.
The most intuitive desktop API client. Organize and execute REST, GraphQL, and gRPC requests in a simple and intuitive app.
Lightweight and extensible compatibility layer between dataframe libraries!
A curated list of awesome big data frameworks, resources and other awesomeness.
PySpark methods to enhance developer productivity
A Python module for decorators, wrappers and monkey patching.
Hamilton helps data scientists and engineers define testable, modular, self-documenting dataflows that encode lineage/tracing and metadata. Runs and scales everywhere Python does.
Open-source scientific and technical publishing system built on Pandoc.
Apache DataFusion Ballista Distributed Query Engine
Cookiecutter template featuring the modern and extensible Python project manager hatch
A library that provides useful extensions to Apache Spark and PySpark.
PySpark test helper methods with beautiful error messages
Diagram as Code for prototyping cloud system architectures
Gluten is a middle layer responsible for offloading JVM-based SQL engines' execution to native engines.
The minimal, blazing-fast, and infinitely customizable prompt for any shell!
ZenML: The bridge between ML and Ops. https://zenml.io.