
[SPARK-16294][SQL] Labelling support for the include_example Jekyll plugin #13972

Closed
25 changes: 19 additions & 6 deletions docs/_plugins/include_example.rb
@@ -32,16 +32,26 @@ def render(context)
@code_dir = File.join(site.source, config_dir)

clean_markup = @markup.strip
@file = File.join(@code_dir, clean_markup)
@lang = clean_markup.split('.').last

parts = clean_markup.strip.split(' ')
if parts.length > 1 then
  @snippet_label = ':' + parts[0]
  snippet_file = parts[1]
else
  @snippet_label = ''
  snippet_file = parts[0]
end

@file = File.join(@code_dir, snippet_file)
@lang = snippet_file.split('.').last

code = File.open(@file).read.encode("UTF-8")
code = select_lines(code)

rendered_code = Pygments.highlight(code, :lexer => @lang)

hint = "<div><small>Find full example code at " \
"\"examples/src/main/#{clean_markup}\" in the Spark repo.</small></div>"
"\"examples/src/main/#{snippet_file}\" in the Spark repo.</small></div>"

rendered_code + hint
end
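Note: with this change, `include_example` accepts an optional label ahead of the file path, so a doc page can write `{% include_example init_session scala/org/apache/spark/examples/sql/RDDRelation.scala %}` to pull in only the region delimited by `$example on:init_session$` / `$example off:init_session$` comments, while the plain unlabelled form keeps its old behaviour. A minimal sketch of a tagged Scala source file (the object and file names are made up for illustration):

```scala
// MinimalExample.scala -- illustrative sketch, not part of this PR
import org.apache.spark.sql.SparkSession

object MinimalExample {
  def main(args: Array[String]): Unit = {
    // $example on:init_session$
    val spark = SparkSession
      .builder
      .appName("Spark Examples")
      .getOrCreate()
    // $example off:init_session$

    // Anything outside the labelled region, such as this shutdown call,
    // is not rendered by {% include_example init_session ... %}.
    spark.stop()
  }
}
```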
@@ -66,13 +76,13 @@ def select_lines(code)
# Select the array of start labels from code.
startIndices = lines
.each_with_index
.select { |l, i| l.include? "$example on$" }
.select { |l, i| l.include? "$example on#{@snippet_label}$" }
.map { |l, i| i }

# Select the array of end labels from code.
endIndices = lines
.each_with_index
.select { |l, i| l.include? "$example off$" }
.select { |l, i| l.include? "$example off#{@snippet_label}$" }
.map { |l, i| i }

raise "Start indices amount is not equal to end indices amount, see #{@file}." \
@@ -92,7 +102,10 @@ def select_lines(code)
if start == endline
lastIndex = endline
range = Range.new(start + 1, endline - 1)
result += trim_codeblock(lines[range]).join
trimmed = trim_codeblock(lines[range])
# Filter out possible example tags of overlapped labels.
tags_filtered = trimmed.select { |l| !l.include? '$example ' }
result += tags_filtered.join
result += "\n"
end
result
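The tag filtering above matters when labelled regions overlap or nest: the inner region's `$example on:...$` / `$example off:...$` lines fall inside the outer region's selected range, and the `select { |l| !l.include? '$example ' }` step keeps them out of the rendered snippet. A sketch of such a file (the label and object names here are hypothetical):

```scala
// OverlapSketch.scala -- illustrative sketch, not part of this PR
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

object OverlapSketch {
  // $example on:schema$
  // $example on:name_field$
  val nameField = StructField("name", StringType, nullable = true)
  // $example off:name_field$
  val schema = StructType(Seq(nameField, StructField("age", IntegerType, nullable = true)))
  // $example off:schema$
}
```

Rendering the `schema` label yields only the two `val` lines; the `name_field` tag lines that fall inside the selected range are dropped.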
41 changes: 6 additions & 35 deletions docs/sql-programming-guide.md
@@ -57,52 +57,23 @@ Throughout this document, we will often refer to Scala/Java Datasets of `Row`s a
<div class="codetabs">
<div data-lang="scala" markdown="1">

The entry point into all functionality in Spark is the [`SparkSession`](api/scala/index.html#org.apache.spark.sql.SparkSession) class. To create a basic `SparkSession`, just use `SparkSession.build()`:

{% highlight scala %}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.build()
.master("local")
.appName("Word Count")
.config("spark.some.config.option", "some-value")
.getOrCreate()

// this is used to implicitly convert an RDD to a DataFrame.
import spark.implicits._
{% endhighlight %}
The entry point into all functionality in Spark is the [`SparkSession`](api/scala/index.html#org.apache.spark.sql.SparkSession) class. To create a basic `SparkSession`, just use `SparkSession.builder()`:

{% include_example init_session scala/org/apache/spark/examples/sql/RDDRelation.scala %}
</div>

<div data-lang="java" markdown="1">

The entry point into all functionality in Spark is the [`SparkSession`](api/java/index.html#org.apache.spark.sql.SparkSession) class. To create a basic `SparkSession`, just use `SparkSession.build()`:
The entry point into all functionality in Spark is the [`SparkSession`](api/java/index.html#org.apache.spark.sql.SparkSession) class. To create a basic `SparkSession`, just use `SparkSession.builder()`:

{% highlight java %}
import org.apache.spark.sql.SparkSession

SparkSession spark = SparkSession.build()
.master("local")
.appName("Word Count")
.config("spark.some.config.option", "some-value")
.getOrCreate();
{% endhighlight %}
{% include_example init_session java/org/apache/spark/examples/sql/JavaSparkSQL.java %}
</div>

<div data-lang="python" markdown="1">

The entry point into all functionality in Spark is the [`SparkSession`](api/python/pyspark.sql.html#pyspark.sql.SparkSession) class. To create a basic `SparkSession`, just use `SparkSession.build`:

{% highlight python %}
from pyspark.sql import SparkSession

spark = SparkSession.build \
.master("local") \
.appName("Word Count") \
.config("spark.some.config.option", "some-value") \
.getOrCreate()
{% endhighlight %}
The entry point into all functionality in Spark is the [`SparkSession`](api/python/pyspark.sql.html#pyspark.sql.SparkSession) class. To create a basic `SparkSession`, just use `SparkSession.builder`:

{% include_example init_session python/sql.py %}
</div>

<div data-lang="r" markdown="1">
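For reference, each `include_example init_session` directive above pulls in the region tagged `init_session` in the corresponding example file shown later in this diff (JavaSparkSQL.java, sql.py, RDDRelation.scala). For the Scala tab, the rendered snippet is roughly the following (taken from the labelled region in RDDRelation.scala):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession
  .builder
  .appName("Spark Examples")
  .config("spark.some.config.option", "some-value")
  .getOrCreate()

// Importing the SparkSession gives access to all the SQL functions and implicit conversions.
import spark.implicits._
```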
examples/src/main/java/org/apache/spark/examples/sql/JavaSparkSQL.java
@@ -26,7 +26,9 @@

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
// $example on:init_session$
import org.apache.spark.sql.SparkSession;
// $example off:init_session$

public class JavaSparkSQL {
public static class Person implements Serializable {
@@ -51,10 +53,13 @@ public void setAge(int age) {
}

public static void main(String[] args) throws Exception {
// $example on:init_session$
SparkSession spark = SparkSession
.builder()
.appName("JavaSparkSQL")
.config("spark.some.config.option", "some-value")
.getOrCreate();
// $example off:init_session$

System.out.println("=== Data source: RDD ===");
// Load a text file and convert each line to a Java Bean.
5 changes: 5 additions & 0 deletions examples/src/main/python/sql.py
@@ -20,15 +20,20 @@
import os
import sys

# $example on:init_session$
from pyspark.sql import SparkSession
# $example off:init_session$
from pyspark.sql.types import Row, StructField, StructType, StringType, IntegerType


if __name__ == "__main__":
# $example on:init_session$
spark = SparkSession\
.builder\
.appName("PythonSQL")\
.config("spark.some.config.option", "some-value")\
.getOrCreate()
# $example off:init_session$

# A list of Rows. Infer schema from the first row, create a DataFrame and print the schema
rows = [Row(name="John", age=19), Row(name="Smith", age=23), Row(name="Sarah", age=18)]
examples/src/main/scala/org/apache/spark/examples/sql/RDDRelation.scala
@@ -18,21 +18,27 @@
// scalastyle:off println
package org.apache.spark.examples.sql

import org.apache.spark.sql.{SaveMode, SparkSession}
import org.apache.spark.sql.SaveMode
// $example on:init_session$
import org.apache.spark.sql.SparkSession
// $example off:init_session$

// One method for defining the schema of an RDD is to make a case class with the desired column
// names and types.
case class Record(key: Int, value: String)

object RDDRelation {
def main(args: Array[String]) {
// $example on:init_session$
val spark = SparkSession
.builder
.appName("RDDRelation")
.appName("Spark Examples")
.config("spark.some.config.option", "some-value")
.getOrCreate()

// Importing the SparkSession gives access to all the SQL functions and implicit conversions.
import spark.implicits._
// $example off:init_session$

val df = spark.createDataFrame((1 to 100).map(i => Record(i, s"val_$i")))
// Any RDD containing case classes can be used to create a temporary view. The schema of the