added scala contributions from colinlouie (yugabyte#4133)
schoudhury authored Apr 2, 2020
1 parent edcba0e commit 65315e1
Showing 19 changed files with 462 additions and 742 deletions.
## Maven

To build your Java application using the YugabyteDB Spark Connector for YCQL, add the following snippet to your `pom.xml` for Scala 2.11:

```xml
<dependency>
  <groupId>com.yugabyte.spark</groupId>
  <artifactId>spark-cassandra-connector_2.11</artifactId>
  <version>2.4-yb</version>
</dependency>
```


## PySpark

To build your Python application using the YugabyteDB Spark Connector for YCQL, start PySpark with the following for Scala 2.11:

```sh
$ pyspark --packages com.yugabyte.spark:spark-cassandra-connector_2.11:2.4-yb
```
## sbt

To build your Scala application using the YugabyteDB Spark Connector for YCQL, add the following sbt dependency to your application:

```sbt
libraryDependencies += "com.yugabyte.spark" %% "spark-cassandra-connector" % "2.4-yb"
```

## Sample application

This tutorial assumes that you have:

- installed YugabyteDB, created a universe and are able to interact with it using the CQL shell. If not, please follow these steps in the [quick start guide](../../../../api/ycql/quick-start/).

- installed Scala version 2.12+ and sbt 1.3.8+

- installed the [`sbt-assembly`](https://github.com/sbt/sbt-assembly) plugin by adding the following line to your project's `project/plugins.sbt`:

```sbt
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.10")
```

### Create the sbt build file

Create an sbt build file named `build.sbt` and add the following content to it.

```sbt
name := "CassandraSparkWordCount"
version := "1.0"
scalaVersion := "2.11.12"
scalacOptions := Seq("-unchecked", "-deprecation")

val sparkVersion = "2.4.4"

// maven repo at https://mvnrepository.com/artifact/com.yugabyte.spark/spark-cassandra-connector
libraryDependencies += "com.yugabyte.spark" %% "spark-cassandra-connector" % "2.4-yb"

// maven repo at https://mvnrepository.com/artifact/org.apache.spark/spark-core
libraryDependencies += "org.apache.spark" %% "spark-core" % sparkVersion % Provided

// maven repo at https://mvnrepository.com/artifact/org.apache.spark/spark-sql
libraryDependencies += "org.apache.spark" %% "spark-sql" % sparkVersion % Provided
```

### Write the sample app

Copy the following contents into a file named `CassandraSparkWordCount.scala`.

```scala
package com.yugabyte.sample.apps

import com.datastax.spark.connector.cql.CassandraConnector
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

object CassandraSparkWordCount {

  val DEFAULT_KEYSPACE = "ybdemo"
  val DEFAULT_INPUT_TABLENAME = "lines"
  val DEFAULT_OUTPUT_TABLENAME = "wordcounts"

  def main(args: Array[String]): Unit = {

    // Set up the local Spark master, with the desired parallelism.
    val conf =
      new SparkConf()
        .setAppName("yb.wordcount")
        .setMaster("local[*]")
        .set("spark.cassandra.connection.host", "127.0.0.1")

    val spark =
      SparkSession
        .builder
        .config(conf)
        .getOrCreate

    // Create the Spark context object.
    val sc = spark.sparkContext

    // Create the Cassandra connector to Spark.
    val connector = CassandraConnector.apply(conf)

    // Create a Cassandra session, and initialize the keyspace.
    val session = connector.openSession

    //------------ Setting input source (Cassandra table only) -------------\\

    val inputTable = DEFAULT_KEYSPACE + "." + DEFAULT_INPUT_TABLENAME

    // Drop the sample table if it already exists.
    session.execute(s"DROP TABLE IF EXISTS ${inputTable};")

    // Create the input table.
    session.execute(
      s"""
      CREATE TABLE IF NOT EXISTS ${inputTable} (
        id INT,
        line VARCHAR,
        PRIMARY KEY(id)
      );
      """
    )

    // Insert some rows.
    val prepared = session.prepare(
      s"""
      INSERT INTO ${inputTable} (id, line) VALUES (?, ?);
      """
    )

    val toInsert = Seq(
      (1, "ten nine eight seven six five four three two one"),
      (2, "ten nine eight seven six five four three two"),
      (3, "ten nine eight seven six five four three"),
      (4, "ten nine eight seven six five four"),
      (5, "ten nine eight seven six five"),
      (6, "ten nine eight seven six"),
      (7, "ten nine eight seven"),
      (8, "ten nine eight"),
      (9, "ten nine"),
      (10, "ten")
    )

    for ((id, line) <- toInsert) {
      // Note: new Integer() is required here to impedance match with Java,
      // since Scala Int != Java Integer.
      session.execute(prepared.bind(new Integer(id), line))
    }

    //------------- Setting output location (Cassandra table) --------------\\

    val outTable = DEFAULT_KEYSPACE + "." + DEFAULT_OUTPUT_TABLENAME

    // Drop the output table if it already exists.
    session.execute(s"DROP TABLE IF EXISTS ${outTable};")

    // Create the output table.
    session.execute(
      s"""
      CREATE TABLE IF NOT EXISTS ${outTable} (
        word VARCHAR PRIMARY KEY,
        count INT
      );
      """
    )

    //--------------------- Read from Cassandra table ----------------------\\

    // Read rows from the table as a DataFrame.
    val df =
      spark
        .read
        .format("org.apache.spark.sql.cassandra")
        .options(
          Map(
            "keyspace" -> DEFAULT_KEYSPACE,     // "ybdemo".
            "table" -> DEFAULT_INPUT_TABLENAME  // "lines".
          )
        )
        .load

    //------------------------ Perform word count --------------------------\\

    import spark.implicits._

    // ----------------------------------------------------------------------
    // Example with RDD.
    val wordCountRdd =
      df.select("line")
        .rdd // reduceByKey() operates on PairRDDs. Start with a simple RDD.
        // Similar to: https://spark.apache.org/examples.html
        // vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
        .flatMap(x => x.getString(0).split(" "))
        .map(word => (word, 1)) // This creates the PairRDD.
        .reduceByKey(_ + _)
        // ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

    // This is not used for saving, but it could be.
    wordCountRdd
      .toDF("word", "count") // Convert to DataFrame for pretty printing.
      .show

    // ----------------------------------------------------------------------
    // Example using DataFrame.
    val wordCountDf =
      df.select("line")
        .flatMap(x => x.getString(0).split(" "))
        .groupBy("value").count // flatMap renames the column to "value".
        .toDF("word", "count")  // Rename the columns.

    wordCountDf.show

    //---------------------- Save to Cassandra table -----------------------\\

    // ----------------------------------------------------------------------
    // Save the output to the CQL table, using the RDD as the source.
    // This has been tested to be fungible with the DataFrame->CQL code block.

    /* Comment this line out to enable this code block.
    // This import (for this example) is only needed for the
    // wordCountRdd.saveToCassandra() call.
    import com.datastax.spark.connector._
    wordCountRdd.saveToCassandra(
      DEFAULT_KEYSPACE,         // "ybdemo".
      DEFAULT_OUTPUT_TABLENAME, // "wordcounts".
      SomeColumns(
        "word",  // First column name.
        "count"  // Second column name.
      )
    )
    // */

    // ----------------------------------------------------------------------
    // Save the output to the CQL table, using the DataFrame as the source.

    // /* Remove the leading "//" on this line to disable this code block.
    wordCountDf
      .write
      .format("org.apache.spark.sql.cassandra")
      .options(
        Map(
          "keyspace" -> DEFAULT_KEYSPACE,      // "ybdemo".
          "table" -> DEFAULT_OUTPUT_TABLENAME  // "wordcounts".
        )
      )
      .save
    // */

    // ----------------------------------------------------------------------
    // Disconnect from Cassandra.
    session.close

    // Stop the Spark session.
    spark.stop
  } // def main

} // object CassandraSparkWordCount
```
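The word-count pipeline in the RDD example above can be understood independently of Spark. A minimal sketch on plain Scala collections (`WordCountSketch` is a hypothetical name, and `groupBy` plus a per-key sum stands in for `reduceByKey`, which exists only on Spark's pair RDDs) shows the same flatMap/map/aggregate shape:

```scala
object WordCountSketch {

  // Split every line into words, pair each word with 1, then sum the
  // counts per word -- the same shape as the RDD pipeline, but on an
  // ordinary in-memory collection.
  def wordCount(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split(" "))               // like rdd.flatMap
      .map(word => (word, 1))              // like rdd.map
      .groupBy { case (word, _) => word }  // stands in for reduceByKey
      .map { case (word, pairs) => (word, pairs.map(_._2).sum) }

  def main(args: Array[String]): Unit = {
    val lines = Seq("ten nine", "ten")
    println(wordCount(lines))
  }
}
```

Note that `groupBy` materializes every (word, 1) pair per key in memory, whereas Spark's `reduceByKey` combines values within each partition before shuffling, which is why the sample app uses it on the RDD.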

### Build and run the application

To build the JAR, run the following command.

```sh
$ sbt assembly
```

To run the program, use the following command.

```sh
$ spark-submit --class com.yugabyte.sample.apps.CassandraSparkWordCount \
target/scala-2.11/CassandraSparkWordCount-assembly-1.0.jar
```

You should see output similar to the following table.

```
+-----+-----+
| word|count|
+-----+-----+
| two| 2|
|eight| 8|
|seven| 7|
| four| 4|
| one| 1|
| six| 6|
| ten| 10|
| nine| 9|
|three| 3|
| five| 5|
+-----+-----+
```



29 changes: 12 additions & 17 deletions docs/content/latest/quick-start/build-apps/_index.html
</a>
</div>



<div class="col-12 col-md-6 col-lg-12 col-xl-6">
  <a class="section-link icon-offset" href="scala/">
    <div class="head">
      <div class="icon">
        <i class="icon-scala"></i>
      </div>
      <div class="title">Scala</div>
    </div>
    <div class="body">
      Build applications using Scala.
    </div>
  </a>
</div>
</div>
2 changes: 1 addition & 1 deletion docs/content/latest/quick-start/build-apps/go/ycql.md

This tutorial assumes that you have:

- installed YugabyteDB, created a universe and are able to interact with it using the CQL shell. If not, please follow these steps in the [Quick start](../../../../api/ycql/quick-start/).
- installed Go version 1.8+

## Install the Go Cassandra driver
3 changes: 2 additions & 1 deletion docs/content/latest/quick-start/build-apps/java/ycql.md
---
title: Use Java to build a YugabyteDB application
headerTitle: Build a Java app
linkTitle: Build a Java app
description: Follow this tutorial to use Java and YCQL to build a simple YugabyteDB application.
menu:

This tutorial assumes that you have:

- installed YugabyteDB, created a universe and are able to interact with it using the CQL shell. If not, please follow these steps in the [quick start guide](../../../../api/ycql/quick-start/).
- installed JDK version 1.8+ and maven 3.3+

### Create the Maven build file
2 changes: 1 addition & 1 deletion docs/content/latest/quick-start/build-apps/nodejs/ycql.md

This tutorial assumes that you have:

- installed YugabyteDB, created a universe and are able to interact with it using the CQL shell. If not, please follow these steps in the [quick start guide](../../../../api/ycql/quick-start/).
- installed a recent version of `node`. If not, you can find install instructions [here](https://nodejs.org/en/download/).

We will be using the [async](https://github.com/caolan/async) JS utility to work with asynchronous Javascript. Install this by running the following command:
