Apache-Crunch/Tutorial

Apache Crunch is a Java library that provides a framework for writing, testing, and running MapReduce pipelines.
Its goal is to make pipelines that are composed of many user-defined functions simple to write, easy to test, and efficient to run.
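A minimal word-count sketch of such a pipeline follows. It assumes the crunch-core library is on the classpath and uses MemPipeline, Crunch's in-memory pipeline intended for testing, so it runs without a Hadoop cluster. The class name CrunchWordCount, the Tokenizer function, and the sample input are illustrative and are not part of Crunch.

```java
import java.util.Map;
import java.util.TreeMap;

import org.apache.crunch.DoFn;
import org.apache.crunch.Emitter;
import org.apache.crunch.PCollection;
import org.apache.crunch.PTable;
import org.apache.crunch.Pair;
import org.apache.crunch.impl.mem.MemPipeline;
import org.apache.crunch.types.writable.Writables;

public class CrunchWordCount {

  // Illustrative user-defined function: splits each line into words and emits them one by one.
  static class Tokenizer extends DoFn<String, String> {
    @Override
    public void process(String line, Emitter<String> emitter) {
      for (String word : line.split("\\s+")) {
        if (!word.isEmpty()) {
          emitter.emit(word);
        }
      }
    }
  }

  // Builds the pipeline on the in-memory MemPipeline and returns word -> count.
  public static Map<String, Long> countWords(String... lines) {
    PCollection<String> input = MemPipeline.typedCollectionOf(Writables.strings(), lines);
    PCollection<String> words = input.parallelDo(new Tokenizer(), Writables.strings());
    PTable<String, Long> counts = words.count();

    // materialize() pulls the results back into the client as Pair<word, count>.
    Map<String, Long> result = new TreeMap<>();
    for (Pair<String, Long> p : counts.materialize()) {
      result.put(p.first(), p.second());
    }
    return result;
  }

  public static void main(String[] args) {
    System.out.println(countWords("hello crunch", "hello hadoop")); // prints {crunch=1, hadoop=1, hello=2}
  }
}
```

Because the user-defined function is an ordinary class and MemPipeline needs no cluster, this kind of logic can be exercised directly from a unit test.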

Apache Crunch runs on top of Apache Hadoop MapReduce and Apache Spark.
It offers a simple Java API for tasks such as joins and data aggregation, which are tedious to implement in plain MapReduce.
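Aggregation can be sketched with groupByKey followed by combineValues, again assuming crunch-core is on the classpath and using the in-memory MemPipeline. The class CrunchSumByKey and the (customer id, order amount) data are hypothetical examples.

```java
import java.util.Map;
import java.util.TreeMap;

import org.apache.crunch.PTable;
import org.apache.crunch.Pair;
import org.apache.crunch.fn.Aggregators;
import org.apache.crunch.impl.mem.MemPipeline;
import org.apache.crunch.types.writable.Writables;

public class CrunchSumByKey {

  // Sums all values that share a key, e.g. the total order amount per customer id.
  public static Map<String, Long> sumByKey(PTable<String, Long> amounts) {
    PTable<String, Long> totals = amounts
        .groupByKey()                             // shuffle: gather all values for each key
        .combineValues(Aggregators.SUM_LONGS());  // reduce each group to a single sum

    Map<String, Long> result = new TreeMap<>();
    for (Pair<String, Long> p : totals.materialize()) {
      result.put(p.first(), p.second());
    }
    return result;
  }

  // Hypothetical sample data: (customer id, order amount) pairs, with repeated keys.
  public static PTable<String, Long> sampleOrders() {
    return MemPipeline.typedTableOf(
        Writables.tableOf(Writables.strings(), Writables.longs()),
        "c1", 10L, "c2", 5L, "c1", 7L);
  }

  public static void main(String[] args) {
    System.out.println(sumByKey(sampleOrders())); // prints {c1=17, c2=5}
  }
}
```

For joins, the org.apache.crunch.lib.Join class provides helpers such as Join.join(left, right), which performs an inner join on the table keys. Since the logic above is written against PTable rather than a specific execution engine, the same code can run on a cluster when its input is read through an MRPipeline or SparkPipeline instead of MemPipeline.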

The APIs are especially useful when processing data that does not fit naturally into the relational model, such as time series, serialized object formats like Protocol Buffers or Avro records, and HBase rows and columns.

Scrunch is a Scala API built on top of the Java APIs. It includes a REPL (read-eval-print loop) for creating MapReduce pipelines.