Publish arlas-proc to cloudsmith #85
laurent-thiebaud-gisaia committed Jul 22, 2019
1 parent b241ece commit 39a7084
Showing 3 changed files with 70 additions and 17 deletions.
55 changes: 49 additions & 6 deletions README.md
@@ -31,18 +31,35 @@ It's very important to check the version of spark being used, here we are using

[Check Spark/ScyllaDB YAML](scripts/tests/docker-compose-standalone.yml)

# Build and deploy application jar
# Build and deploy application JAR

## Build locally
```bash
# Build jar
sbt clean assembly
```

## Deploy JAR to Cloudsmith

You need to set up the following environment variables first:
- CLOUDSMITH_USER
- CLOUDSMITH_API_KEY (see [https://cloudsmith.io/user/settings/api/])
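
As a minimal sketch of the setup described above, the two variables can be exported in the current shell and checked before running `sbt`. The `check_cloudsmith_env` helper and the placeholder values are assumptions made for illustration; they are not part of this commit.

```shell
# Hypothetical helper: report which Cloudsmith variables are missing.
# Returns 0 when both are set and non-empty, 1 otherwise.
check_cloudsmith_env() {
  missing=0
  for var in CLOUDSMITH_USER CLOUDSMITH_API_KEY; do
    # Look up the value of the variable whose name is stored in $var.
    eval "value=\${$var:-}"
    if [ -z "$value" ]; then
      echo "Missing environment variable: $var" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Placeholder values; replace them with your own account and API key.
export CLOUDSMITH_USER="my-user"
export CLOUDSMITH_API_KEY="my-api-key"

check_cloudsmith_env && echo "Cloudsmith credentials are set"
```

Checking early is useful here because `build.sbt` falls back to empty strings when the variables are unset, so a missing credential would otherwise only show up as a failed upload.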

### Deploy thin JAR

```bash
sbt clean publish
```

# Deploy jar
# Ensure to provide your Google Cloud Storage credentials
# @see : https://github.com/Tapad/sbt-gcs#credentials
sbt [-DgcsProject=arlas-lsfp] [-DgcsBucket=arlas-proc] [-DgcsBucketPath=/artifacts] gcs:publish
### Deploy fat JAR

This deploys a fat jar, ready to be used from GCP Dataproc to start processing.

```bash
sbt clean "project arlasProcAssembly" publish
```


# Integration tests

```bash
@@ -51,7 +68,7 @@ sbt [-DgcsProject=arlas-lsfp] [-DgcsBucket=arlas-proc] [-DgcsBucketPath=/artifac

# User guide

### spark-shell example
## Start spark-shell locally

Start ScyllaDB and Elasticsearch clusters. For example :
```bash
@@ -83,6 +100,32 @@ docker run -ti \

$CLOUDSMITH_TOKEN is required when using ML models from cloudsmith. Its value should be asked to a developer.

## Start spark-shell on GCP

Once the cluster is started, open an SSH session to the master node.

First, [Set the CLOUDSMITH_TOKEN](#ARLAS-ML-dependency)

Then copy-paste the following:

```bash
spark-shell \
--packages datastax:spark-cassandra-connector:2.3.1-s_2.11,org.elasticsearch:elasticsearch-spark-20_2.11:6.4.0,org.geotools:gt-referencing:20.1,org.geotools:gt-geometry:20.1,org.geotools:gt-epsg-hsql:20.1 \
--exclude-packages javax.media:jai_core \
--repositories http://repo.boundlessgeo.com/main,http://download.osgeo.org/webdav/geotools/ \
--jars https://dl.cloudsmith.io/$CLOUDSMITH_TOKEN/gisaia/arlas/maven/io/arlas/arlas-proc-assembly_2.11/0.3.0-SNAPSHOT/arlas-proc-assembly_2.11-0.3.0-SNAPSHOT.jar \
--conf spark.es.nodes="gisaia-elasticsearch" \
--conf spark.es.index.auto.create="true" \
--conf spark.cassandra.connection.host="gisaia-scylla-db" \
--conf spark.driver.allowMultipleContexts="true" \
--conf spark.rpc.netty.dispatcher.numThreads="2" \
--conf spark.driver.CLOUDSMITH_TOKEN="$CLOUDSMITH_TOKEN"
```

You may also use a specific hosted JAR, eg. `arlas-proc-assembly_2.11-0.3.0-20190717.101238-7.jar`
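
A sketch of how the `--jars` URL could be assembled for such a specific build. It assumes the usual Maven snapshot layout, in which the timestamped JAR sits in the same `0.3.0-SNAPSHOT` directory as the `-SNAPSHOT` JAR; the variable names are hypothetical and the token is a placeholder.

```shell
# Placeholder token; in practice use the real CLOUDSMITH_TOKEN.
CLOUDSMITH_TOKEN="${CLOUDSMITH_TOKEN:-my-token}"

# Hypothetical helper variables, not defined anywhere in the project.
ARTIFACT="arlas-proc-assembly_2.11"
VERSION_DIR="0.3.0-SNAPSHOT"
# Timestamped build quoted above; swap in the build you need.
JAR_FILE="${ARTIFACT}-0.3.0-20190717.101238-7.jar"

# Assumption: same repository path as the -SNAPSHOT JAR, different file name.
JAR_URL="https://dl.cloudsmith.io/${CLOUDSMITH_TOKEN}/gisaia/arlas/maven/io/arlas/${ARTIFACT}/${VERSION_DIR}/${JAR_FILE}"
echo "$JAR_URL"
```

The resulting value would then replace the URL passed to `--jars` in the `spark-shell` command above.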

## Spark-shell example

Paste (using `:paste`) the following code snippet :
```scala
import io.arlas.data.sql._
29 changes: 20 additions & 9 deletions build.sbt
@@ -27,7 +27,7 @@ val geotools = Seq(gtReferencing, gtGeometry)
val arlasMl = "io.arlas" %% "arlas-ml" % "0.1.0"
val arlas = Seq(arlasMl)

lazy val arlasData = (project in file("."))
lazy val arlasProc = (project in file("."))
.settings(
name := "arlas-proc",
libraryDependencies ++= spark,
@@ -36,14 +36,25 @@ lazy val arlasData = (project in file("."))
libraryDependencies ++= geotools,
libraryDependencies ++= arlas,
libraryDependencies += scalaTest % Test
)

// publish artifact to GCP
enablePlugins(GcsPlugin)
gcsProjectId := sys.props.getOrElse("gcsProject", default = "arlas-lsfp")
gcsBucket := sys.props.getOrElse("gcsBucket", default = "arlas-proc")+sys.props.getOrElse("gcsBucketPath", default = "/artifacts")
)

gcsLocalArtifactPath := (assemblyOutputPath in assembly).value
publish := publish.dependsOn(assembly).value
//publish to external repo
ThisBuild / publishTo := { Some("Cloudsmith API" at "https://maven.cloudsmith.io/gisaia/private/") }
ThisBuild / pomIncludeRepository := { x => false }
ThisBuild / credentials += Credentials("Cloudsmith API", "maven.cloudsmith.io", sys.env.getOrElse("CLOUDSMITH_USER", ""), sys.env.getOrElse("CLOUDSMITH_API_KEY", ""))

//publish also assembly jar
test in assembly := {}
assemblyJarName in assembly := s"${name.value}_${version.value}-assembly.jar"
lazy val arlasProcAssembly = project
.dependsOn(arlasProc)
.settings(
publishArtifact in (Compile, packageBin) := false,
publishArtifact in (Compile, packageDoc) := false,
publishArtifact in (Compile, packageSrc) := false,
name := "arlas-proc-assembly",
artifact in (Compile, assembly) ~= { art =>
art.withClassifier(Some("assembly"))
},
addArtifact(artifact in (Compile, assembly), assembly)
)
3 changes: 1 addition & 2 deletions project/plugins.sbt
@@ -1,2 +1 @@
addSbtPlugin("com.tapad.sbt" % "sbt-gcs" % "0.2.0")
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.9")
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.9")