A quotation-based Scala DSL for scalable data analysis.
Our goal is to improve developer productivity by hiding parallelism aspects behind a high-level, declarative API which maximises reuse of native Scala syntax and constructs.
Emma supports state-of-the-art dataflow engines such as Apache Flink and Apache Spark as backend co-processors.
DSLs for scalable data analysis are typically embedded through types. In contrast, Emma is based on quotations (similar to Quill). This approach has two benefits.
First, it allows Scala-native, declarative constructs to be reused in the DSL.
Quoted Scala syntax such as for-comprehensions, case classes, and pattern matching is thereby lifted to an intermediate representation called Emma Core.
Second, it allows Emma Core terms to be analyzed and optimized holistically.
Subterms of type DataBag[A] are thereby transformed and off-loaded to a parallel dataflow engine such as Apache Flink or Apache Spark.
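As a rough illustration of the Scala constructs named above, the following plain-Scala sketch combines a for-comprehension, case classes, and pattern matching over an ordinary Seq. It does not use Emma itself: Seq stands in for DataBag, the Author and Paper types and their sample data are hypothetical, and no quotation is applied. In Emma, code of this shape would be quoted and lifted to Emma Core.

```scala
// Plain-Scala sketch only: Seq stands in for Emma's DataBag,
// and the code is NOT quoted or compiled by Emma.
object QuotedSyntaxSketch {
  // Case classes model the records (hypothetical schema).
  final case class Author(id: Int, name: String)
  final case class Paper(authorId: Int, title: String, citations: Int)

  val authors: Seq[Author] = Seq(Author(1, "Ada"), Author(2, "Grace"))
  val papers: Seq[Paper] = Seq(
    Paper(1, "Notes", 120),
    Paper(2, "Compilers", 300),
    Paper(2, "Drafts", 5)
  )

  // The for-comprehension states the join and the filter declaratively;
  // the patterns in the generators destructure each record.
  val cited: Seq[(String, String)] =
    for {
      Author(id, name) <- authors
      Paper(authorId, title, citations) <- papers
      if authorId == id && citations >= 100
    } yield (name, title)

  def main(args: Array[String]): Unit =
    cited.foreach { case (name, title) => println(s"$name: $title") }
}
```

The comprehension says what to compute (a join with a filter) rather than how to execute it, which is the kind of declarative structure that a holistic analysis can inspect as a whole.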
The emma-examples module contains examples from various fields:
- Graph Analysis
- Supervised Learning
- Unsupervised Learning
- Text Processing
Check emma-language.org for further information.
Building Emma requires:
- JDK 7+ (preferably JDK 8)
- Maven 3
Run
mvn clean package -DskipTests
to build Emma without running any tests.
For more advanced build options, including integration tests for the target runtimes, please see the "Building Emma" section in the Wiki.