An open-source ML pipeline development platform

chrisemoulton/sematic

The open-source Continuous Machine Learning Platform

Build ML pipelines with only Python, then run them on your laptop or in the cloud.

Supported Python versions: 3.8, 3.9, 3.10.

Sematic is an open-source ML development platform. It lets ML Engineers and Data Scientists write arbitrarily complex end-to-end pipelines with simple Python and execute them on their local machine, in a cloud VM, or on a Kubernetes cluster to leverage cloud resources.

Sematic is based on learnings gathered at top self-driving car companies. It enables chaining data processing jobs (e.g. Apache Spark), model training (e.g. PyTorch, TensorFlow), and any other Python business logic into type-safe, traceable, reproducible end-to-end pipelines that can be monitored and visualized in a modern web dashboard.
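The idea of chaining steps into type-safe, traceable pipelines can be sketched in plain Python. The toy decorator below is an illustration only, not Sematic's SDK or implementation; the names `step`, `LINEAGE`, `load`, `total`, and `pipeline` are hypothetical. It checks arguments and return values against their type annotations at call time (plain classes only) and appends each call's inputs and output to an in-memory lineage log, whereas a real platform would persist these artifacts.

```python
import functools
import inspect
from typing import Any, Callable, Dict, List, get_type_hints

# Hypothetical in-memory lineage log: one record per step execution.
# A real platform would persist these records and artifacts instead.
LINEAGE: List[Dict[str, Any]] = []


def _check(fn_name: str, label: str, value: Any, expected: Any) -> None:
    # Only plain classes (int, list, str, ...) are checked in this sketch;
    # generic hints such as List[int] are skipped.
    if isinstance(expected, type) and not isinstance(value, expected):
        raise TypeError(
            f"{fn_name}: {label} expected {expected.__name__}, "
            f"got {type(value).__name__}"
        )


def step(fn: Callable) -> Callable:
    """Toy decorator: type-check a step's inputs/output and record lineage."""
    hints = get_type_hints(fn)
    sig = inspect.signature(fn)

    @functools.wraps(fn)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        bound = sig.bind(*args, **kwargs)
        bound.apply_defaults()
        # Fail early: validate inputs before the step body runs.
        for name, value in bound.arguments.items():
            _check(fn.__name__, f"argument {name!r}", value, hints.get(name))
        result = fn(*args, **kwargs)
        _check(fn.__name__, "return value", result, hints.get("return"))
        # Record inputs and output so every value can be traced later.
        LINEAGE.append(
            {"step": fn.__name__, "inputs": dict(bound.arguments), "output": result}
        )
        return result

    return wrapper


@step
def load(n: int) -> list:
    # Stand-in for a data-loading step.
    return list(range(n))


@step
def total(values: list) -> int:
    # Stand-in for a downstream step that consumes load()'s output.
    return sum(values)


def pipeline(n: int) -> int:
    # Steps chain like ordinary Python calls, so loops and branches also work.
    return total(load(n))
```

Calling `pipeline(4)` returns `6` and leaves two records in `LINEAGE`, one for `load` and one for `total`; calling `load("4")` raises a `TypeError` before the step body runs.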

Read our documentation and join our Discord channel.

Why Sematic

  • Easy onboarding – no deployment or infrastructure needed to get started, simply install Sematic locally and start exploring.
  • Local-to-cloud parity – run the same code on your laptop and on your Kubernetes cluster.
  • End-to-end traceability – all pipeline artifacts are persisted, tracked, and visualizable in a web dashboard.
  • Access heterogeneous compute – customize required resources for each pipeline step to optimize your performance and cloud footprint (CPUs, memory, GPUs, Spark cluster, etc.).
  • Reproducibility – rerun your pipelines from the UI with guaranteed reproducibility of results.

Getting Started

To get started locally, simply install Sematic in your Python environment:

$ pip install sematic

Start the local web dashboard:

$ sematic start

Run an example pipeline:

$ sematic run examples/mnist/pytorch

Create a new boilerplate project:

$ sematic new my_new_project

Or from an existing example:

$ sematic new my_new_project --from examples/mnist/pytorch

Then run it with:

$ python3 -m my_new_project
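`python3 -m my_new_project` works because Python executes a package's `__main__.py` module when the package is run with `-m`. The generated file contents are not shown here, so the following is only an assumed, minimal shape for such an entry point: `pipeline`, `main`, and the `--n` flag are hypothetical placeholders, and the pipeline body is a stand-in rather than anything Sematic generates.

```python
"""Hypothetical my_new_project/__main__.py, run via `python3 -m my_new_project`."""
import argparse
from typing import List, Optional


def pipeline(n: int) -> int:
    # Placeholder for the project's real pipeline: sums 0..n-1.
    return sum(range(n))


def main(argv: Optional[List[str]] = None) -> int:
    # argv=None makes argparse read sys.argv[1:], as a real CLI entry point would.
    parser = argparse.ArgumentParser(prog="my_new_project")
    parser.add_argument("--n", type=int, default=10, help="size of the toy input")
    args = parser.parse_args(argv)
    result = pipeline(args.n)
    print(f"pipeline result: {result}")
    return result


if __name__ == "__main__":
    main()
```

Accepting an explicit `argv` list keeps `main` easy to call from tests, while the `__main__` guard preserves the normal command-line behavior.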

To deploy Sematic to Kubernetes and leverage cloud resources, see our documentation.

Features

  • Lightweight Python SDK – define arbitrarily complex end-to-end pipelines
  • Pipeline nesting – arbitrarily nest pipelines into larger pipelines
  • Dynamic graphs – Python-defined graphs allow for iterations, conditional branching, etc.
  • Lineage tracking – all inputs and outputs of all steps are persisted and tracked
  • Runtime type-checking – fail early when step inputs or outputs don't match their type annotations
  • Web dashboard – monitor, track, and visualize pipelines in a modern web UI
  • Artifact visualization – visualize all inputs and outputs of all steps in the web dashboard
  • Local execution – run pipelines on your local machine without any deployment necessary
  • Cloud orchestration – run pipelines on Kubernetes to access GPUs and other cloud resources
  • Heterogeneous compute resources – run different steps on different machines (CPUs, memory, GPUs, Spark, etc.)
  • Helm chart deployment – install Sematic on your Kubernetes cluster
  • Pipeline reruns – rerun pipelines from the UI from an arbitrary point in the graph
  • Step caching – cache expensive pipeline steps for faster iteration
  • Step retry – recover from transient failures with step retries
  • Metadata and collaboration – tags, source code visualization, docstrings, notes, etc.
  • Numerous integrations – see Integrations below
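Step caching and step retries can both be illustrated with a small stdlib-only sketch. This is not how Sematic implements these features; `cached_step`, `CACHE`, `cache_key`, and `flaky_square` are hypothetical names. The cache is keyed on a SHA-256 hash of the step's qualified name plus its pickled inputs, so a repeated call with identical inputs skips the work, and only the exception types listed in `retry_on` trigger a retry.

```python
import functools
import hashlib
import pickle
from typing import Any, Callable, Dict, Tuple, Type

# Hypothetical in-memory cache; a real platform would persist step outputs.
CACHE: Dict[str, Any] = {}


def cache_key(fn: Callable, args: Tuple[Any, ...], kwargs: Dict[str, Any]) -> str:
    # Hash the step's qualified name plus its pickled inputs, so identical
    # calls map to the same key. Adequate for simple values only.
    payload = pickle.dumps((fn.__qualname__, args, sorted(kwargs.items())))
    return hashlib.sha256(payload).hexdigest()


def cached_step(
    retries: int = 0,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
) -> Callable[[Callable], Callable]:
    """Toy decorator combining step caching with retries on listed errors."""

    def decorator(fn: Callable) -> Callable:
        @functools.wraps(fn)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            key = cache_key(fn, args, kwargs)
            if key in CACHE:
                return CACHE[key]  # cache hit: skip the expensive work
            attempt = 0
            while True:
                try:
                    result = fn(*args, **kwargs)
                    break
                except retry_on:
                    attempt += 1
                    if attempt > retries:
                        raise  # retries exhausted: surface the last error
            CACHE[key] = result
            return result

        return wrapper

    return decorator


# Usage: a step that fails twice with a simulated transient error.
calls = {"n": 0}


@cached_step(retries=2, retry_on=(ConnectionError,))
def flaky_square(x: int) -> int:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("simulated transient failure")
    return x * x
```

Here `flaky_square(5)` fails twice, succeeds on the third attempt, and a second `flaky_square(5)` call is served from the cache without running the body again. Keying on pickled inputs is a simplification: pickle output is not guaranteed to be canonical for arbitrary objects, so a production system would want a stable serialization.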

Integrations

  • Apache Spark – on-demand in-cluster Spark clusters
  • Ray – on-demand in-cluster Ray resources
  • Snowflake – easily query your data warehouse (other warehouses supported too)
  • Plotly, Matplotlib – visualize plot artifacts in the web dashboard
  • Pandas – visualize dataframe artifacts in the dashboard
  • Grafana – embed Grafana panels in the web dashboard
  • Bazel – integrate with your Bazel build system
  • Helm chart – deploy to Kubernetes with our Helm chart
  • Git – track git information in the web dashboard

Community and resources

Learn more about Sematic and get in touch with the following resources:

Contribute!

To contribute to Sematic, check out open issues tagged "good first issue", and get in touch with us on Discord. You can find instructions on how to get your development environment set up in our developer docs. If you'd like to add an example, you may also find this guide helpful.

Languages

  • Python 68.4%
  • TypeScript 25.5%
  • Starlark 5.5%
  • JavaScript 0.2%
  • Shell 0.2%
  • HTML 0.1%
  • Other 0.1%