
Support Spark 1.3 #384

Merged
merged 11 commits into master from spark_1.3 on Mar 16, 2015

Conversation

Leemoonsoo
Contributor

Spark 1.3 is released.
This PR makes Zeppelin work with Spark 1.3.

  • Add profile
  • Make Zeppelin build with Spark 1.3
  • Take care of SchemaRDD -> DataFrame (see the sketch after this list)
  • Test on cluster environment
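
For context, a hedged sketch of what the SchemaRDD -> DataFrame change looks like at the API level (not the actual Zeppelin diff; the sqlContext and the bank table name are just example assumptions):

// Spark 1.2.x: sqlContext.sql() returned a SchemaRDD
// val result: org.apache.spark.sql.SchemaRDD = sqlContext.sql("select * from bank")

// Spark 1.3.0: the same call returns a DataFrame (SchemaRDD is replaced)
val result: org.apache.spark.sql.DataFrame = sqlContext.sql("select * from bank")
result.take(10).foreach(println)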

@swkimme
Contributor

swkimme commented Mar 13, 2015

You're moving so fast!

@Leemoonsoo
Contributor Author

Ready to be merged!

[image: https://cloud.githubusercontent.com/assets/1540981/6649957/bda0b1d4-ca3d-11e4-908e-da6ad1bd172d.png]

Note that the implicit conversion from RDD -> DataFrame is not working,
i.e. the following code fails:

case class Person(name:String)
val person = sc.parallelize(List(Person("hello"), Person("world")))
person.registerTempTable("person")  // fails

The same problem exists in spark-shell, too.
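
A hedged sketch of the explicit workaround that comes up later in this thread (assumes sqlContext and its implicits are in scope, as the Spark 1.3 shell provides):

// Explicitly convert the RDD with toDF before registering, instead of
// relying on the implicit RDD -> DataFrame conversion that fails here.
case class Person(name: String)
val person = sc.parallelize(List(Person("hello"), Person("world")))

// import sqlContext.implicits._  // needed if not already imported by the shell
person.toDF().registerTempTable("person")

sqlContext.sql("select name from person").collect().foreach(println)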

@swkimme
Contributor

swkimme commented Mar 14, 2015

Great demo! +1 for merge.


@syepes

syepes commented Mar 14, 2015

+1, I will be test driving 1.3 this week.

@Leemoonsoo
Contributor Author

According to https://spark.apache.org/docs/latest/sql-programming-guide.html#starting-point-sqlcontext, HiveContext is preferable to the plain SQLContext.

I pushed one more change: if HiveContext is available (i.e. the Hive-related dependencies are loaded), use it instead of SQLContext.
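
A minimal sketch of that fallback idea (not the actual Zeppelin code; the helper name simply mirrors Spark's createSQLContext): construct a HiveContext reflectively when the Hive classes are on the classpath, otherwise fall back to a plain SQLContext.

import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext

// Sketch: use HiveContext when the hive-related dependencies are loaded,
// otherwise fall back to SQLContext.
def createSQLContext(sc: SparkContext): SQLContext =
  try {
    Class.forName("org.apache.spark.sql.hive.HiveContext")
      .getConstructor(classOf[SparkContext])
      .newInstance(sc)
      .asInstanceOf[SQLContext]
  } catch {
    case _: Throwable => new SQLContext(sc)
  }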

@syepes

syepes commented Mar 14, 2015

@Leemoonsoo have you tried:

person.toDF.registerTempTable("person") 

@syepes

syepes commented Mar 14, 2015

@felixcheung I am with you; it would be better to leave the choice up to the user. I personally use the CassandraSQLContext.

@Leemoonsoo
Contributor Author

@syepes Thanks for letting me know a way to registerTempTable.

https://github.com/apache/spark/blob/v1.3.0/repl/scala-2.10/src/main/scala/org/apache/spark/repl/SparkILoop.scala#L1022
Spark's createSQLContext() API always tries to create a HiveContext and falls back to SQLContext when that fails,
so I tried to do the same in Zeppelin. But I agree on giving the user an option.

Then, @felixcheung, @syepes, how about bringing the zeppelin.spark.useHiveContext property back with a default value of 'true'? It was just removed in this pull request; previously, the default value was 'false'.
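
A hedged sketch of how that property could gate the choice (illustrative only: Zeppelin reads its interpreter properties rather than system properties, and sc plus the createSQLContext helper sketched above are assumed to be in scope):

// Illustrative only: pick the context based on zeppelin.spark.useHiveContext,
// defaulting to 'true' as proposed here.
val useHiveContext =
  sys.props.getOrElse("zeppelin.spark.useHiveContext", "true").toBoolean

val sqlc: SQLContext =
  if (useHiveContext) createSQLContext(sc)  // HiveContext when available
  else new SQLContext(sc)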

@syepes

syepes commented Mar 15, 2015

@Leemoonsoo No problem, using useHiveContext is a good alternative.
I will be updating my fork, which has the useCassandraSqlContext option, with your changes.

Thanks for the work on 1.3

@felixcheung
Contributor

@Leemoonsoo sounds good to me too.

@geekflyer

@Leemoonsoo I've tried to execute your exact same example; however, it appears the df val is not passed along to the SQL editor and I get the message no such table List(df).

I'm running Spark standalone with 2 workers, Spark version 1.3.0, on Ubuntu 14.04 LTS.
Zeppelin was built using mvn clean package -Pspark-1.3 -Dhadoop.version=2.2.0 -Phadoop-2.2 -DskipTests from the spark_1.3 branch.

Any idea what causes the problem?

image

@Leemoonsoo
Contributor Author

@geekflyer
Thanks for trying this branch. I missed one statement in my screenshot. You need to register the DataFrame as a table before running a SQL query, like

df.toDF.registerTempTable("df")

@Leemoonsoo
Contributor Author

The zeppelin.spark.useHiveContext property is restored with a default value of 'true'.
I'll merge if there are no more issues on this branch!

@swkimme
Contributor

swkimme commented Mar 16, 2015

+1 for merge

Leemoonsoo added a commit that referenced this pull request Mar 16, 2015
Leemoonsoo merged commit c84347d into master on Mar 16, 2015
Leemoonsoo deleted the spark_1.3 branch on March 16, 2015 08:18
@geekflyer

@Leemoonsoo Thanks for your help. Now it works completely fine :-)
