Merge pull request #1196 from twitter/klin_execution_tutorial
Add a new ExecutionApp tutorial
johnynek committed Feb 18, 2015
2 parents 117a301 + e6aad51 commit 83eb5c3
Showing 5 changed files with 121 additions and 3 deletions.
6 changes: 6 additions & 0 deletions .travis.yml
@@ -96,3 +96,9 @@ matrix:
      script:
        - "scripts/build_assembly_no_test.sh scalding-core"
        - "scripts/test_typed_tutorials.sh"

    - scala: 2.10.4
      env: BUILD="test execution tutorials"
      script:
        - "scripts/build_assembly_no_test.sh execution-tutorial"
        - "scripts/test_execution_tutorial.sh"
8 changes: 6 additions & 2 deletions README.md
@@ -37,16 +37,17 @@ You can find more example code under [examples/](https://github.com/twitter/scal
## Documentation and Getting Started

* [**Getting Started**](https://github.com/twitter/scalding/wiki/Getting-Started) page on the [Scalding Wiki](https://github.com/twitter/scalding/wiki)
* [Scalding Scaladocs](http://twitter.github.com/scalding) provide details beyond the API References. Prefer using this as it's always up to date.
* [**REPL in Wonderland**](https://gist.github.com/johnynek/a47699caa62f4f38a3e2) a hands-on tour of the
scalding REPL requiring only git and java installed.
* [**Runnable tutorials**](https://github.com/twitter/scalding/tree/master/tutorial) in the source.
* The API Reference, including many example Scalding snippets:
* [Type-safe API Reference](https://github.com/twitter/scalding/wiki/Type-safe-api-reference)
* [Fields-based API Reference](https://github.com/twitter/scalding/wiki/Fields-based-API-Reference)
* [Scalding Scaladocs](http://twitter.github.com/scalding) provide details beyond the API References
* The Matrix Library provides a way of working with key-attribute-value scalding pipes:
* The [Introduction to Matrix Library](https://github.com/twitter/scalding/wiki/Introduction-to-Matrix-Library) contains an overview and a "getting started" example
* The [Matrix API Reference](https://github.com/twitter/scalding/wiki/Matrix-API-Reference) contains the Matrix Library API reference with examples
* [**Introduction to Scalding Execution**](https://github.com/twitter/scalding/wiki/Calling-Scalding-from-inside-your-application) contains general rules and examples of calling Scalding from inside another application.

Please feel free to use the beautiful [Scalding logo](https://drive.google.com/folderview?id=0B3i3pDi3yVgNbm9pMUdDcHFKVEk&usp=sharing) artwork anywhere.

@@ -124,6 +125,10 @@ Thanks for assistance and contributions:

* Sam Ritchie <http://twitter.com/sritchie>
* Aaron Siegel: <http://twitter.com/asiegel>
* Ian O'Connell <http://twitter.com/0x138>
* Alex Levenson <http://twitter.com/THISWILLWORK>
* Jonathan Coveney <http://twitter.com/jco>
* Kevin Lin <http://twitter.com/reconditesea>
* Brad Greenlee: <http://twitter.com/bgreenlee>
* Edwin Chen <http://twitter.com/edchedch>
* Arkajit Dey: <http://twitter.com/arkajit>
@@ -133,7 +138,6 @@ Thanks for assistance and contributions:
* Ning Liang <http://twitter.com/ningliang>
* Dmitriy Ryaboy <http://twitter.com/squarecog>
* Dong Wang <http://twitter.com/dongwang218>
* Kevin Lin <http://twitter.com/reconditesea>
* Josh Attenberg <http://twitter.com/jattenberg>
* Juliet Hougland <https://twitter.com/j_houg>
20 changes: 19 additions & 1 deletion project/Build.scala
@@ -201,7 +201,8 @@ object ScaldingBuild extends Build {
scaldingJdbc,
scaldingHadoopTest,
scaldingMacros,
maple
maple,
executionTutorial
)

lazy val formattingPreferences = {
@@ -430,4 +431,21 @@ object ScaldingBuild extends Build {
)
}
)

  lazy val executionTutorial = Project(
    id = "execution-tutorial",
    base = file("tutorial/execution-tutorial"),
    settings = sharedSettings
  ).settings(
    name := "execution-tutorial",
    libraryDependencies <++= (scalaVersion) { scalaVersion => Seq(
      "org.scala-lang" % "scala-library" % scalaVersion,
      "org.scala-lang" % "scala-reflect" % scalaVersion,
      "org.apache.hadoop" % "hadoop-core" % hadoopVersion,
      "org.slf4j" % "slf4j-api" % slf4jVersion,
      "org.slf4j" % "slf4j-log4j12" % slf4jVersion,
      "cascading" % "cascading-hadoop" % cascadingVersion
    )
    }
  ).dependsOn(scaldingCore)
}
24 changes: 24 additions & 0 deletions scripts/test_execution_tutorial.sh
@@ -0,0 +1,24 @@
set -e # first error should stop execution of this script

# Identify the bin dir in the distribution, and source the common include script
BASE_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )"/.. && pwd )"
source "${BASE_DIR}/scripts/common.sh"
SHORT_SCALA_VERSION=${TRAVIS_SCALA_VERSION%.*}
SCALDING_VERSION=$(cat "${BASE_DIR}/version.sbt")
SCALDING_VERSION=${SCALDING_VERSION#*\"}
SCALDING_VERSION=${SCALDING_VERSION%\"}


# also trap errors, to reenable terminal settings
trap onExit ERR
export CLASSPATH=tutorial/execution-tutorial/target/scala-${SHORT_SCALA_VERSION}/execution-tutorial-assembly-${SCALDING_VERSION}.jar
time java -jar tutorial/execution-tutorial/target/scala-${SHORT_SCALA_VERSION}/execution-tutorial-assembly-${SCALDING_VERSION}.jar \
  com.twitter.scalding.tutorial.MyExecJob --local \
  --input tutorial/data/hello.txt \
  --output tutorial/data/execution_output.txt

# restore stty
SCALA_EXIT_STATUS=0
onExit
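
The jar path used above is assembled with two bash parameter expansions. The following is a minimal sketch of just that step, assuming `version.sbt` contains a single line of the form `version in ThisBuild := "0.13.1"` and that Travis has set `TRAVIS_SCALA_VERSION` to `2.10.4`. Both sample values and the `version_line` and `JAR` variable names are assumptions for illustration and are not part of the script.

```shell
#!/usr/bin/env bash
# Sample inputs (assumptions for illustration only).
TRAVIS_SCALA_VERSION="2.10.4"                     # normally exported by Travis CI
version_line='version in ThisBuild := "0.13.1"'   # assumed content of version.sbt

# "%.*" strips the shortest suffix matching ".*", so 2.10.4 becomes 2.10.
SHORT_SCALA_VERSION=${TRAVIS_SCALA_VERSION%.*}

# "#*\"" strips everything up to and including the first double quote,
# and "%\"" then strips the trailing double quote.
SCALDING_VERSION=${version_line#*\"}
SCALDING_VERSION=${SCALDING_VERSION%\"}

# Same path layout as the script above.
JAR="tutorial/execution-tutorial/target/scala-${SHORT_SCALA_VERSION}/execution-tutorial-assembly-${SCALDING_VERSION}.jar"
echo "$JAR"
```

Outside Travis, `TRAVIS_SCALA_VERSION` is usually unset, in which case `SHORT_SCALA_VERSION` expands to an empty string and the path points at `target/scala-/`. Exporting the variable before running the script locally avoids that.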


66 changes: 66 additions & 0 deletions tutorial/execution-tutorial/ExecutionTutorial.scala
@@ -0,0 +1,66 @@
/*
Copyright 2012 Twitter, Inc.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
package com.twitter.scalding.tutorial

import java.io._
import scala.util.{Failure, Success}

import com.twitter.scalding._

/**
Tutorial for using Execution.
This tutorial gives an example of using Execution to do a MapReduce word count.
Instead of writing the results in the reducers, it writes the data on the submitter node.
To test it, first build the assembly jar from the root directory:
  ./sbt execution-tutorial/assembly
Run:
  scala -classpath tutorial/execution-tutorial/target/execution-tutorial-assembly-0.13.1.jar \
    com.twitter.scalding.tutorial.MyExecJob --local \
    --input tutorial/data/hello.txt \
    --output tutorial/data/execution_output.txt
**/

object MyExecJob extends ExecutionApp {

  override def job = Execution.getConfig.flatMap { config =>
    val args = config.getArgs

    TypedPipe.from(TextLine(args("input")))
      .flatMap(_.split("\\s+"))
      .map((_, 1L))
      .sumByKey
      .toIterableExecution
      // toIterableExecution materializes the output on the submitter node when the job finishes.
      // We can also write the output to HDFS via .writeExecution(TypedTsv(args("output")))
      .onComplete {
        case Success(iter) =>
          val file = new PrintWriter(new File(args("output")))
          // Close the writer even if one of the writes fails.
          try {
            iter.foreach { case (k, v) => file.write(s"$k\t$v\n") }
          } finally {
            file.close()
          }
        case Failure(e) => println("Error: " + e.toString)
      }
      // Use the result and map it to Unit. Otherwise the onComplete call won't happen.
      .unit
  }
}
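
The job above writes one `word<TAB>count` line per key and does not sort the result. A minimal sketch for inspecting such a file from the shell follows. The three sample rows are hypothetical stand-ins for the real contents of `tutorial/data/execution_output.txt`, and the `out` and `top` variable names are illustrative only.

```shell
#!/usr/bin/env bash
# Hypothetical sample in the word<TAB>count shape the job writes.
out=$(mktemp)
printf 'hello\t2\nworld\t1\nscalding\t3\n' > "$out"

# Sort on the second (tab-separated) column numerically, highest count first,
# and keep only the most frequent word.
top=$(sort -t "$(printf '\t')" -k2,2nr "$out" | head -n 1)
echo "$top"

rm -f "$out"
```

Pointing `sort` at the real output file instead of the temporary one gives a quick sanity check after a local run.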

