
Commit 7c89a8f

zhzhan authored and marmbrus committed
[SPARK-2706][SQL] Enable Spark to support Hive 0.13
Given that many users are trying to use Hive 0.13 in Spark, and that Hive 0.12 and Hive 0.13 are incompatible at the API level, this patch proposes an approach that has no or minimal impact on the existing Hive 0.12 support while jump-starting development for Hive 0.13 and future versions.

Approach: introduce a "hive-version" property and adjust the pom.xml files so that different Hive versions can be supported at compile time through a shim layer, e.g., hive-0.12.0 and hive-0.13.1. More specifically:

1. For each Hive version there is a very light layer of shim code that handles the API differences, sitting in sql/hive/hive-version, e.g., sql/hive/v0.12.0 or sql/hive/v0.13.1.
2. A new profile, hive-default, is active by default; it picks up all existing configuration and the hive-0.12.0 shim (v0.12.0) if no hive.version is specified.
3. If the user specifies a different version (currently only 0.13.1, via -Dhive.version=0.13.1), the hive-versions profile is activated, which picks up the version-specific shim layer and configuration, mainly the Hive jars and the version shim, e.g., v0.13.1.
4. With this approach, nothing changes in the current Hive 0.12 support.

No change by default:

    sbt/sbt -Phive

For example:

    sbt/sbt -Phive -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 assembly

To enable hive-0.13:

    sbt/sbt -Dhive.version=0.13.1

For example:

    sbt/sbt -Dhive.version=0.13.1 -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 assembly

Note that with hive-0.13 the hive-thriftserver is not enabled; that should be addressed in a separate JIRA. Also, -Phive is not needed together with -Dhive.version when building (we should probably switch to -Phive -Dhive.version=xxx once the Thrift server is also supported on hive-0.13.1).

Author: Zhan Zhang <zhazhan@gmail.com>
Author: zhzhan <zhazhan@gmail.com>
Author: Patrick Wendell <pwendell@gmail.com>

Closes #2241 from zhzhan/spark-2706 and squashes the following commits:

3ece905 [Zhan Zhang] minor fix
410b668 [Zhan Zhang] solve review comments
cbb4691 [Zhan Zhang] change run-test for new options
0d4d2ed [Zhan Zhang] rebase
497b0f4 [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
8fad1cf [Zhan Zhang] change the pom file and make hive-0.13.1 as the default
ab028d1 [Zhan Zhang] rebase
4a2e36d [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
4cb1b93 [zhzhan] Merge pull request #1 from pwendell/pr-2241
b0478c0 [Patrick Wendell] Changes to simplify the build of SPARK-2706
2b50502 [Zhan Zhang] rebase
a72c0d4 [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
cb22863 [Zhan Zhang] correct the typo
20f6cf7 [Zhan Zhang] solve compatability issue
f7912a9 [Zhan Zhang] rebase and solve review feedback
301eb4a [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
10c3565 [Zhan Zhang] address review comments
6bc9204 [Zhan Zhang] rebase and remove temparory repo
d3aa3f2 [Zhan Zhang] Merge branch 'master' into spark-2706
cedcc6f [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
3ced0d7 [Zhan Zhang] rebase
d9b981d [Zhan Zhang] rebase and fix error due to rollback
adf4924 [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
3dd50e8 [Zhan Zhang] solve conflicts and remove unnecessary implicts
d10bf00 [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
dc7bdb3 [Zhan Zhang] solve conflicts
7e0cc36 [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
d7c3e1e [Zhan Zhang] Merge branch 'master' into spark-2706
68deb11 [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
d48bd18 [Zhan Zhang] address review comments
3ee3b2b [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
57ea52e [Zhan Zhang] Merge branch 'master' into spark-2706
2b0d513 [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
9412d24 [Zhan Zhang] address review comments
f4af934 [Zhan Zhang] rebase
1ccd7cc [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
128b60b [Zhan Zhang] ignore 0.12.0 test cases for the time being
af9feb9 [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
5f5619f [Zhan Zhang] restructure the directory and different hive version support
05d3683 [Zhan Zhang] solve conflicts
e4c1982 [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
94b4fdc [Zhan Zhang] Spark-2706: hive-0.13.1 support on spark
87ebf3b [Zhan Zhang] Merge branch 'master' into spark-2706
921e914 [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
f896b2a [Zhan Zhang] Merge branch 'master' into spark-2706
789ea21 [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
cb53a2c [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
f6a8a40 [Zhan Zhang] revert
ba14f28 [Zhan Zhang] test
dbedff3 [Zhan Zhang] Merge remote-tracking branch 'upstream/master'
70964fe [Zhan Zhang] revert
fe0f379 [Zhan Zhang] Merge branch 'master' of https://github.com/zhzhan/spark
70ffd93 [Zhan Zhang] revert
42585ec [Zhan Zhang] test
7d5fce2 [Zhan Zhang] test
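To make the shim idea concrete, here is a minimal, hypothetical sketch of the Hive 0.13.1 flavour of such a shim object. The member names mirror the HiveShim call sites in the diffs below, but the imports and bodies are illustrative assumptions, not the code added by this commit; only one such source tree is compiled, selected by hive.version.short.

    // Hypothetical sketch of a v0.13.1 shim (sql/hive/v0.13.1/src/main/scala/...).
    package org.apache.spark.sql.hive

    import java.util.{ArrayList => JArrayList}

    import scala.collection.JavaConversions._

    import org.apache.hadoop.hive.common.StatsSetupConst  // assumed new location on 0.13
    import org.apache.hadoop.hive.conf.HiveConf
    import org.apache.hadoop.hive.ql.processors.{CommandProcessor, CommandProcessorFactory}

    private[hive] object HiveShim {
      val version = "0.13.1"

      // Hive 0.13 moved StatsSetupConst, so callers ask the shim for the key name.
      def getStatsSetupConstTotalSize: String = StatsSetupConst.TOTAL_SIZE

      // Hive 0.13's CommandProcessorFactory.get takes the command tokens as an array.
      def getCommandProcessor(cmd: Array[String], conf: HiveConf): CommandProcessor =
        CommandProcessorFactory.get(cmd, conf)

      // Driver.getResults expects a different element type on 0.13, so the container
      // and the conversion back to Seq[String] both live behind the shim.
      def createDriverResultsArray: JArrayList[Object] = new JArrayList[Object]
      def processResults(results: JArrayList[Object]): Seq[String] =
        results.map(_.toString).toSeq
    }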
1 parent 0e88661 commit 7c89a8f

19 files changed (+406, -63 lines)

assembly/pom.xml

Lines changed: 6 additions & 0 deletions
@@ -197,6 +197,12 @@
  <artifactId>spark-hive_${scala.binary.version}</artifactId>
  <version>${project.version}</version>
  </dependency>
+ </dependencies>
+ </profile>
+ <profile>
+ <!-- TODO: Move this to "hive" profile once 0.13 JDBC is supported -->
+ <id>hive-0.12.0</id>
+ <dependencies>
  <dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-hive-thriftserver_${scala.binary.version}</artifactId>

dev/run-tests

Lines changed: 2 additions & 2 deletions
@@ -140,7 +140,7 @@ CURRENT_BLOCK=$BLOCK_BUILD

  {
  # We always build with Hive because the PySpark Spark SQL tests need it.
- BUILD_MVN_PROFILE_ARGS="$SBT_MAVEN_PROFILES_ARGS -Phive"
+ BUILD_MVN_PROFILE_ARGS="$SBT_MAVEN_PROFILES_ARGS -Phive -Phive-0.12.0"

  echo "[info] Building Spark with these arguments: $BUILD_MVN_PROFILE_ARGS"

@@ -167,7 +167,7 @@ CURRENT_BLOCK=$BLOCK_SPARK_UNIT_TESTS
  # If the Spark SQL tests are enabled, run the tests with the Hive profiles enabled.
  # This must be a single argument, as it is.
  if [ -n "$_RUN_SQL_TESTS" ]; then
- SBT_MAVEN_PROFILES_ARGS="$SBT_MAVEN_PROFILES_ARGS -Phive"
+ SBT_MAVEN_PROFILES_ARGS="$SBT_MAVEN_PROFILES_ARGS -Phive -Phive-0.12.0"
  fi

  if [ -n "$_SQL_TESTS_ONLY" ]; then

docs/building-spark.md

Lines changed: 17 additions & 9 deletions
@@ -97,12 +97,20 @@ mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -DskipTests clean package
  mvn -Pyarn-alpha -Phadoop-2.3 -Dhadoop.version=2.3.0 -Dyarn.version=0.23.7 -DskipTests clean package
  {% endhighlight %}

+ <!--- TODO: Update this when Hive 0.13 JDBC is added -->
+
  # Building With Hive and JDBC Support
  To enable Hive integration for Spark SQL along with its JDBC server and CLI,
- add the `-Phive` profile to your existing build options.
+ add the `-Phive` profile to your existing build options. By default Spark
+ will build with Hive 0.13.1 bindings. You can also build for Hive 0.12.0 using
+ the `-Phive-0.12.0` profile. NOTE: currently the JDBC server is only
+ supported for Hive 0.12.0.
  {% highlight bash %}
- # Apache Hadoop 2.4.X with Hive support
+ # Apache Hadoop 2.4.X with Hive 13 support
  mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -Phive -DskipTests clean package
+
+ # Apache Hadoop 2.4.X with Hive 12 support
+ mvn -Pyarn -Phive-0.12.0 -Phadoop-2.4 -Dhadoop.version=2.4.0 -Phive -DskipTests clean package
  {% endhighlight %}

  # Spark Tests in Maven
@@ -111,8 +119,8 @@ Tests are run by default via the [ScalaTest Maven plugin](http://www.scalatest.o

  Some of the tests require Spark to be packaged first, so always run `mvn package` with `-DskipTests` the first time. The following is an example of a correct (build, test) sequence:

- mvn -Pyarn -Phadoop-2.3 -DskipTests -Phive clean package
- mvn -Pyarn -Phadoop-2.3 -Phive test
+ mvn -Pyarn -Phadoop-2.3 -DskipTests -Phive -Phive-0.12.0 clean package
+ mvn -Pyarn -Phadoop-2.3 -Phive -Phive-0.12.0 test

  The ScalaTest plugin also supports running only a specific test suite as follows:

@@ -175,21 +183,21 @@ can be set to control the SBT build. For example:

  Some of the tests require Spark to be packaged first, so always run `sbt/sbt assembly` the first time. The following is an example of a correct (build, test) sequence:

- sbt/sbt -Pyarn -Phadoop-2.3 -Phive assembly
- sbt/sbt -Pyarn -Phadoop-2.3 -Phive test
+ sbt/sbt -Pyarn -Phadoop-2.3 -Phive -Phive-0.12.0 assembly
+ sbt/sbt -Pyarn -Phadoop-2.3 -Phive -Phive-0.12.0 test

  To run only a specific test suite as follows:

- sbt/sbt -Pyarn -Phadoop-2.3 -Phive "test-only org.apache.spark.repl.ReplSuite"
+ sbt/sbt -Pyarn -Phadoop-2.3 -Phive -Phive-0.12.0 "test-only org.apache.spark.repl.ReplSuite"

  To run test suites of a specific sub project as follows:

- sbt/sbt -Pyarn -Phadoop-2.3 -Phive core/test
+ sbt/sbt -Pyarn -Phadoop-2.3 -Phive -Phive-0.12.0 core/test

  # Speeding up Compilation with Zinc

  [Zinc](https://github.com/typesafehub/zinc) is a long-running server version of SBT's incremental
  compiler. When run locally as a background process, it speeds up builds of Scala-based projects
  like Spark. Developers who regularly recompile Spark with Maven will be the most interested in
  Zinc. The project site gives instructions for building and running `zinc`; OS X users can
- install it using `brew install zinc`.
+ install it using `brew install zinc`.

pom.xml

Lines changed: 24 additions & 5 deletions
@@ -127,7 +127,11 @@
  <hbase.version>0.94.6</hbase.version>
  <flume.version>1.4.0</flume.version>
  <zookeeper.version>3.4.5</zookeeper.version>
- <hive.version>0.12.0-protobuf-2.5</hive.version>
+ <!-- Version used in Maven Hive dependency -->
+ <hive.version>0.13.1</hive.version>
+ <!-- Version used for internal directory structure -->
+ <hive.version.short>0.13.1</hive.version.short>
+ <derby.version>10.10.1.1</derby.version>
  <parquet.version>1.4.3</parquet.version>
  <jblas.version>1.2.3</jblas.version>
  <jetty.version>8.1.14.v20131031</jetty.version>
@@ -456,7 +460,7 @@
  <dependency>
  <groupId>org.apache.derby</groupId>
  <artifactId>derby</artifactId>
- <version>10.4.2.0</version>
+ <version>${derby.version}</version>
  </dependency>
  <dependency>
  <groupId>com.codahale.metrics</groupId>
@@ -1308,16 +1312,31 @@
  </dependency>
  </dependencies>
  </profile>
-
  <profile>
- <id>hive</id>
+ <id>hive-0.12.0</id>
  <activation>
  <activeByDefault>false</activeByDefault>
  </activation>
+ <!-- TODO: Move this to "hive" profile once 0.13 JDBC is supported -->
  <modules>
  <module>sql/hive-thriftserver</module>
  </modules>
+ <properties>
+ <hive.version>0.12.0-protobuf-2.5</hive.version>
+ <hive.version.short>0.12.0</hive.version.short>
+ <derby.version>10.4.2.0</derby.version>
+ </properties>
+ </profile>
+ <profile>
+ <id>hive-0.13.1</id>
+ <activation>
+ <activeByDefault>false</activeByDefault>
+ </activation>
+ <properties>
+ <hive.version>0.13.1</hive.version>
+ <hive.version.short>0.13.1</hive.version.short>
+ <derby.version>10.10.1.1</derby.version>
+ </properties>
  </profile>
-
  </profiles>
  </project>

sql/hive/pom.xml

Lines changed: 31 additions & 6 deletions
@@ -36,11 +36,6 @@
  </properties>

  <dependencies>
- <dependency>
- <groupId>com.twitter</groupId>
- <artifactId>parquet-hive-bundle</artifactId>
- <version>1.5.0</version>
- </dependency>
  <dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_${scala.binary.version}</artifactId>
@@ -116,7 +111,6 @@
  <scope>test</scope>
  </dependency>
  </dependencies>
-
  <profiles>
  <profile>
  <id>hive</id>
@@ -144,6 +138,19 @@
  </plugins>
  </build>
  </profile>
+ <profile>
+ <id>hive-0.12.0</id>
+ <activation>
+ <activeByDefault>false</activeByDefault>
+ </activation>
+ <dependencies>
+ <dependency>
+ <groupId>com.twitter</groupId>
+ <artifactId>parquet-hive-bundle</artifactId>
+ <version>1.5.0</version>
+ </dependency>
+ </dependencies>
+ </profile>
  </profiles>

  <build>
@@ -154,6 +161,24 @@
  <groupId>org.scalatest</groupId>
  <artifactId>scalatest-maven-plugin</artifactId>
  </plugin>
+ <plugin>
+ <groupId>org.codehaus.mojo</groupId>
+ <artifactId>build-helper-maven-plugin</artifactId>
+ <executions>
+ <execution>
+ <id>add-default-sources</id>
+ <phase>generate-sources</phase>
+ <goals>
+ <goal>add-source</goal>
+ </goals>
+ <configuration>
+ <sources>
+ <source>v${hive.version.short}/src/main/scala</source>
+ </sources>
+ </configuration>
+ </execution>
+ </executions>
+ </plugin>

  <!-- Deploy datanucleus jars to the spark/lib_managed/jars directory -->
  <plugin>

sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala

Lines changed: 10 additions & 13 deletions
@@ -32,7 +32,6 @@ import org.apache.hadoop.hive.ql.Driver
  import org.apache.hadoop.hive.ql.metadata.Table
  import org.apache.hadoop.hive.ql.processors._
  import org.apache.hadoop.hive.ql.session.SessionState
- import org.apache.hadoop.hive.ql.stats.StatsSetupConst
  import org.apache.hadoop.hive.serde2.io.TimestampWritable
  import org.apache.hadoop.hive.serde2.io.DateWritable

@@ -47,6 +46,7 @@ import org.apache.spark.sql.execution.ExtractPythonUdfs
  import org.apache.spark.sql.execution.QueryExecutionException
  import org.apache.spark.sql.execution.{Command => PhysicalCommand}
  import org.apache.spark.sql.hive.execution.DescribeHiveTableCommand
+ import org.apache.spark.sql.hive.HiveShim

  /**
  * DEPRECATED: Use HiveContext instead.
@@ -171,13 +171,15 @@ class HiveContext(sc: SparkContext) extends SQLContext(sc) {

  val tableParameters = relation.hiveQlTable.getParameters
  val oldTotalSize =
- Option(tableParameters.get(StatsSetupConst.TOTAL_SIZE)).map(_.toLong).getOrElse(0L)
+ Option(tableParameters.get(HiveShim.getStatsSetupConstTotalSize))
+ .map(_.toLong)
+ .getOrElse(0L)
  val newTotalSize = getFileSizeForTable(hiveconf, relation.hiveQlTable)
  // Update the Hive metastore if the total size of the table is different than the size
  // recorded in the Hive metastore.
  // This logic is based on org.apache.hadoop.hive.ql.exec.StatsTask.aggregateStats().
  if (newTotalSize > 0 && newTotalSize != oldTotalSize) {
- tableParameters.put(StatsSetupConst.TOTAL_SIZE, newTotalSize.toString)
+ tableParameters.put(HiveShim.getStatsSetupConstTotalSize, newTotalSize.toString)
  val hiveTTable = relation.hiveQlTable.getTTable
  hiveTTable.setParameters(tableParameters)
  val tableFullName =
@@ -282,29 +284,24 @@ class HiveContext(sc: SparkContext) extends SQLContext(sc) {
  */
  protected def runHive(cmd: String, maxRows: Int = 1000): Seq[String] = {
  try {
- // Session state must be initilized before the CommandProcessor is created .
- SessionState.start(sessionState)
-
  val cmd_trimmed: String = cmd.trim()
  val tokens: Array[String] = cmd_trimmed.split("\\s+")
  val cmd_1: String = cmd_trimmed.substring(tokens(0).length()).trim()
- val proc: CommandProcessor = CommandProcessorFactory.get(tokens(0), hiveconf)
+ val proc: CommandProcessor = HiveShim.getCommandProcessor(Array(tokens(0)), hiveconf)

  proc match {
  case driver: Driver =>
- driver.init()
-
- val results = new JArrayList[String]
+ val results = HiveShim.createDriverResultsArray
  val response: CommandProcessorResponse = driver.run(cmd)
  // Throw an exception if there is an error in query processing.
  if (response.getResponseCode != 0) {
- driver.destroy()
+ driver.close()
  throw new QueryExecutionException(response.getErrorMessage)
  }
  driver.setMaxRows(maxRows)
  driver.getResults(results)
- driver.destroy()
- results
+ driver.close()
+ HiveShim.processResults(results)
  case _ =>
  sessionState.out.println(tokens(0) + " " + cmd_1)
  Seq(proc.run(cmd_1).getResponseCode.toString)
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveInspectors.scala

Lines changed: 2 additions & 1 deletion
@@ -26,6 +26,7 @@ import org.apache.hadoop.{io => hadoopIo}
  import org.apache.spark.sql.catalyst.expressions._
  import org.apache.spark.sql.catalyst.types
  import org.apache.spark.sql.catalyst.types._
+ import org.apache.spark.sql.hive.HiveShim

  /* Implicit conversions */
  import scala.collection.JavaConversions._
@@ -149,7 +150,7 @@
  case l: Long => l: java.lang.Long
  case l: Short => l: java.lang.Short
  case l: Byte => l: java.lang.Byte
- case b: BigDecimal => new HiveDecimal(b.underlying())
+ case b: BigDecimal => HiveShim.createDecimal(b.underlying())
  case b: Array[Byte] => b
  case d: java.sql.Date => d
  case t: java.sql.Timestamp => t
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala

Lines changed: 5 additions & 5 deletions
@@ -22,7 +22,6 @@ import scala.util.parsing.combinator.RegexParsers
  import org.apache.hadoop.hive.metastore.api.{FieldSchema, SerDeInfo, StorageDescriptor, Partition => TPartition, Table => TTable}
  import org.apache.hadoop.hive.ql.metadata.{Hive, Partition, Table}
  import org.apache.hadoop.hive.ql.plan.TableDesc
- import org.apache.hadoop.hive.ql.stats.StatsSetupConst
  import org.apache.hadoop.hive.serde2.Deserializer

  import org.apache.spark.Logging
@@ -34,6 +33,7 @@ import org.apache.spark.sql.catalyst.plans.logical
  import org.apache.spark.sql.catalyst.plans.logical._
  import org.apache.spark.sql.catalyst.rules._
  import org.apache.spark.sql.catalyst.types._
+ import org.apache.spark.sql.hive.HiveShim
  import org.apache.spark.util.Utils

  /* Implicit conversions */
@@ -56,7 +56,7 @@ private[hive] class HiveMetastoreCatalog(hive: HiveContext) extends Catalog with
  val table = client.getTable(databaseName, tblName)
  val partitions: Seq[Partition] =
  if (table.isPartitioned) {
- client.getAllPartitionsForPruner(table).toSeq
+ HiveShim.getAllPartitionsOf(client, table).toSeq
  } else {
  Nil
  }
@@ -185,7 +185,7 @@
  "bigint" ^^^ LongType |
  "binary" ^^^ BinaryType |
  "boolean" ^^^ BooleanType |
- "decimal" ^^^ DecimalType |
+ HiveShim.metastoreDecimal ^^^ DecimalType |
  "date" ^^^ DateType |
  "timestamp" ^^^ TimestampType |
  "varchar\\((\\d+)\\)".r ^^^ StringType
@@ -272,13 +272,13 @@ private[hive] case class MetastoreRelation
  // of RPCs are involved. Besides `totalSize`, there are also `numFiles`, `numRows`,
  // `rawDataSize` keys (see StatsSetupConst in Hive) that we can look at in the future.
  BigInt(
- Option(hiveQlTable.getParameters.get(StatsSetupConst.TOTAL_SIZE))
+ Option(hiveQlTable.getParameters.get(HiveShim.getStatsSetupConstTotalSize))
  .map(_.toLong)
  .getOrElse(sqlContext.defaultSizeInBytes))
  }
  )

- val tableDesc = new TableDesc(
+ val tableDesc = HiveShim.getTableDesc(
  Class.forName(
  hiveQlTable.getSerializationLib,
  true,
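The metastoreDecimal change follows the same pattern: Hive 0.12 stores the bare type name decimal in the metastore, while Hive 0.13 records precision and scale (for example decimal(10,0)), so each shim supplies the token that HiveMetastoreTypes should match. A hypothetical sketch; the 0.13 pattern shown here is an assumption about the stored form, and the object names are invented:

    import scala.util.matching.Regex

    // v0.12.0-style: the plain literal is enough.
    object Hive12TypeShim {
      val metastoreDecimal: String = "decimal"
    }

    // v0.13.1-style: match "decimal(precision,scale)" as assumed to be written by the 0.13 metastore.
    object Hive13TypeShim {
      val metastoreDecimal: Regex = """decimal\((\d+),(\d+)\)""".r
    }

Inside HiveMetastoreTypes, which extends RegexParsers, either a String literal or a Regex is implicitly lifted to a Parser, so HiveShim.metastoreDecimal ^^^ DecimalType works with both shapes.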

sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala

Lines changed: 14 additions & 2 deletions
@@ -18,7 +18,8 @@
  package org.apache.spark.sql.hive

  import java.sql.Date
-
+ import org.apache.hadoop.hive.conf.HiveConf
+ import org.apache.hadoop.hive.ql.Context
  import org.apache.hadoop.hive.ql.lib.Node
  import org.apache.hadoop.hive.ql.parse._
  import org.apache.hadoop.hive.ql.plan.PlanUtils
@@ -216,7 +217,18 @@ private[hive] object HiveQl {
  /**
  * Returns the AST for the given SQL string.
  */
- def getAst(sql: String): ASTNode = ParseUtils.findRootNonNullToken((new ParseDriver).parse(sql))
+ def getAst(sql: String): ASTNode = {
+ /*
+ * Context has to be passed in hive0.13.1.
+ * Otherwise, there will be Null pointer exception,
+ * when retrieving properties form HiveConf.
+ */
+ val hContext = new Context(new HiveConf())
+ val node = ParseUtils.findRootNonNullToken((new ParseDriver).parse(sql, hContext))
+ hContext.clear()
+ node
+ }
+

  /** Returns a LogicalPlan for a given HiveQL string. */
  def parseSql(sql: String): LogicalPlan = hqlParser(sql)
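A small, hypothetical usage sketch of the new getAst. HiveQl is private[hive], so the caller must live in the same package, and a Hive-enabled assembly must be on the classpath; the object name and query here are invented for illustration:

    package org.apache.spark.sql.hive   // HiveQl is private[hive], so the caller sits here

    object GetAstExample {
      def main(args: Array[String]): Unit = {
        // Parsing now goes through a fresh Hive Context, so HiveConf-backed
        // properties resolve correctly on Hive 0.13.1.
        val ast = HiveQl.getAst("SELECT key FROM src")
        println(ast.toStringTree())  // prints the Hive AST, e.g. TOK_QUERY, TOK_SELECT, ...
      }
    }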

sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala

Lines changed: 2 additions & 1 deletion
@@ -34,6 +34,7 @@ import org.apache.spark.SerializableWritable
  import org.apache.spark.broadcast.Broadcast
  import org.apache.spark.rdd.{EmptyRDD, HadoopRDD, RDD, UnionRDD}
  import org.apache.spark.sql.catalyst.expressions._
+ import org.apache.spark.sql.hive.HiveShim

  /**
  * A trait for subclasses that handle table scans.
@@ -138,7 +139,7 @@ class HadoopTableReader(
  filterOpt: Option[PathFilter]): RDD[Row] = {
  val hivePartitionRDDs = partitionToDeserializer.map { case (partition, partDeserializer) =>
  val partDesc = Utilities.getPartitionDesc(partition)
- val partPath = partition.getPartitionPath
+ val partPath = HiveShim.getDataLocationPath(partition)
  val inputPathStr = applyFilterIfNeeded(partPath, filterOpt)
  val ifc = partDesc.getInputFileFormatClass
  .asInstanceOf[java.lang.Class[InputFormat[Writable, Writable]]]
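The partition-path change is the same pattern once more: Hive 0.13 removed Partition.getPartitionPath, so each shim resolves the partition's Path in the way its Hive version allows. A hypothetical sketch; the assumption that getDataLocation returns a Path on 0.13 is mine, and the object names are invented:

    import org.apache.hadoop.fs.Path
    import org.apache.hadoop.hive.ql.metadata.Partition

    // v0.12.0-style: the old accessor still exists.
    object Hive12PartitionShim {
      def getDataLocationPath(p: Partition): Path = p.getPartitionPath
    }

    // v0.13.1-style: getPartitionPath is gone; getDataLocation is assumed to return a Path.
    object Hive13PartitionShim {
      def getDataLocationPath(p: Partition): Path = p.getDataLocation
    }

As with the other shim members, each object compiles only against its own Hive version, which is why the sources are split per version.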
