- What is Blurr
- Is Blurr for you?
- Playground
- Tutorial & Docs
- Contribute
- Data Science 'Joel Test'
- Roadmap
Blurr transforms structured, streaming raw data into features for model training and prediction using a high-level expressive YAML-based language called the Blurr Transform Spec (BTS). The BTS merges the schema and computation model for data processing.
The BTS is a data transform definition for structured data. The BTS encapsulates the business logic of data transforms and Blurr orchestrates the execution of data transforms. Blurr is runner-agnostic, so BTSs can be run by event processors such as Spark, Spark Streaming or Flink.
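As a rough illustration of the kind of transform a BTS describes, the sketch below folds raw events into one feature row per user session in plain Python. It does not use Blurr's API or BTS syntax; the event fields and the `aggregate_features` helper are hypothetical and chosen only for this example.

```python
from collections import defaultdict

# Hypothetical raw events; field names are illustrative, not a Blurr schema.
EVENTS = [
    {"user_id": "u1", "session_id": "s1", "event": "game_start", "score": 0},
    {"user_id": "u1", "session_id": "s1", "event": "game_end", "score": 40},
    {"user_id": "u2", "session_id": "s2", "event": "game_start", "score": 0},
    {"user_id": "u1", "session_id": "s3", "event": "game_end", "score": 15},
]


def aggregate_features(events):
    """Fold a stream of events into one feature row per (user, session)."""
    features = defaultdict(
        lambda: {"events": 0, "games_finished": 0, "max_score": 0}
    )
    for e in events:
        row = features[(e["user_id"], e["session_id"])]
        row["events"] += 1  # count every event seen in the session
        if e["event"] == "game_end":
            row["games_finished"] += 1
        row["max_score"] = max(row["max_score"], e["score"])
    return dict(features)


FEATURES = aggregate_features(EVENTS)
```

Because the function only consumes events one at a time and keeps state per key, the same logic could in principle be driven by a batch job or a stream processor, which is the separation between transform definition and runner that the text describes.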
Yes, if you are well on your way along the ML 'curve of enlightenment' and are thinking about how to do online scoring.
Coming up with features is difficult, time-consuming, requires expert knowledge. 'Applied machine learning' is basically feature engineering --- Andrew Ng
Streaming BTS Tutorial | Window BTS Tutorial
Preparing data for specific use cases using Blurr:
Welcome to the Blurr community! We are so glad that you share our passion for building MLOps!
Please create a new issue to begin a discussion. Alternatively, feel free to pick up an existing issue!
Please sign the Contributor License Agreement before raising a pull request.
Inspired by the (old school) Joel Test to rate software teams, here's our version for data science teams. What's your score?
- Data pipelines are versioned and reproducible
- Pipelines (re)build in one step
- Deploying to production needs minimal engineering help
- Successful ML is a long game. You play it like it is
- Kaizen. Experimentation and iterations are a way of life
Blurr is currently in Developer Preview. Stay in touch: star this project or email hello@blurr.ai
- Local transformations only
- Support for custom functions and other Python libraries in the BTS
- Spark runner
- S3 support for data sink
- DynamoDB as an Intermediate Store
- Features server

