Versatile Data Kit is a data engineering framework that enables Data Engineers to develop, troubleshoot, deploy, run, and manage data processing workloads (referred to as "Data Jobs"). A Data Job lets Data Engineers implement automated pull ingestion (the E in ELT) and batch data transformation into a database (the T in ELT).
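To make the ELT split concrete, here is a minimal sketch of pull ingestion followed by a batch transformation inside a database. It uses only Python's standard `sqlite3` module and does not use the Versatile Data Kit API; the table names, the `SOURCE_ROWS` data, and the function names are hypothetical and chosen for illustration.

```python
import sqlite3

# Hypothetical source records; a real job would pull these from an API or another database.
SOURCE_ROWS = [
    {"order_id": 1, "country": "BG", "amount": 10.0},
    {"order_id": 2, "country": "US", "amount": 25.5},
    {"order_id": 3, "country": "BG", "amount": 4.5},
]


def ingest(conn, rows):
    """Extract and load: copy the raw records into a staging table unchanged."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS raw_orders "
        "(order_id INTEGER, country TEXT, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO raw_orders VALUES (:order_id, :country, :amount)", rows
    )


def transform(conn):
    """Transform: rebuild an aggregate table inside the database using SQL."""
    conn.execute("DROP TABLE IF EXISTS revenue_by_country")
    conn.execute(
        "CREATE TABLE revenue_by_country AS "
        "SELECT country, SUM(amount) AS revenue FROM raw_orders GROUP BY country"
    )


def run_pipeline():
    """Run ingestion, then transformation, and return the aggregated rows."""
    conn = sqlite3.connect(":memory:")
    ingest(conn, SOURCE_ROWS)
    transform(conn)
    result = conn.execute(
        "SELECT country, revenue FROM revenue_by_country ORDER BY country"
    ).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print(run_pipeline())
```

The raw data lands in the database first and is reshaped there afterwards, which is the ordering that distinguishes ELT from ETL.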
Versatile Data Kit provides an abstraction layer that helps solve common data engineering problems. It can be called by a workflow engine, with the goal of making data engineers more efficient: for example, it ensures that data applications are packaged, versioned, and deployed correctly, while dealing with credentials, retries, reconnects, and similar concerns. Everything exposed by Versatile Data Kit comes with built-in monitoring, troubleshooting, and smart notification capabilities. For example, tracking both code and data modifications and the relations between them lets engineers troubleshoot more quickly and makes it easy to revert to a stable version.
Versatile Data Kit consists of:
- A Control Service, which enables creating, deploying, managing, and executing Data Jobs in a Kubernetes runtime environment. It offers multitenancy support, SSO, access control, and auditing capabilities. It exposes a CLI.
- A Development Kit to develop, test, and run Data Jobs on your machine. It comes with common functionality for data ingestion and processing.
    pip install -U pip setuptools wheel
    pip install quickstart-vdk
Note that Versatile Data Kit requires Python 3.7+.
See the Installation page for more details.
    # Show the built-in help to see what you can do
    vdk --help
Check out the Getting Started page to create and run your first Data Job.
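As a rough preview, a Python step in a Data Job might look like the sketch below. It assumes that Versatile Data Kit calls a `run(job_input)` function in each Python step and that the job input offers a `send_object_for_ingestion(payload, destination_table)` method; verify both against the official documentation. The `FakeJobInput` class, the sample records, and the `example_table` name are hypothetical and exist only so the sketch runs without Versatile Data Kit installed.

```python
def run(job_input):
    """Step entry point; assumed to be called by Versatile Data Kit with the job input."""
    # Hypothetical records standing in for data pulled from a source system.
    records = [
        {"id": 1, "name": "alpha"},
        {"id": 2, "name": "beta"},
    ]
    for record in records:
        # Queue each record for ingestion into the (hypothetical) destination table.
        job_input.send_object_for_ingestion(
            payload=record, destination_table="example_table"
        )


class FakeJobInput:
    """Hypothetical test double that records calls instead of ingesting anything."""

    def __init__(self):
        self.sent = []

    def send_object_for_ingestion(self, payload, destination_table=None):
        self.sent.append((destination_table, payload))


if __name__ == "__main__":
    fake = FakeJobInput()
    run(fake)
    print(fake.sent)
```

With the Development Kit installed, a step like this would live in a Data Job directory and be executed through the `vdk` CLI; the Getting Started page covers the exact commands.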
Official documentation for Versatile Data Kit can be found here.
If you are interested in contributing as a developer, visit CONTRIBUTING.md.
Feedback is very welcome via the GitHub site as issues or pull requests.
Join our public Slack workspace or our mailing list, or follow us on Twitter. Subscribe to the Versatile Data Kit YouTube channel.
Everyone involved in working on the project's source code, or participating in any of its issue trackers, Slack channels, or mailing lists, is expected to follow the Code of Conduct.