Hive Installation using Spark Engine
Make sure the below environment variables exist in your `~/.bashrc` file. `JAVA_HOME` should point to your Java installation directory.
```bash
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export JRE_HOME=$JAVA_HOME/jre
export PATH=$PATH:$JAVA_HOME/bin

export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop
export YARN_CONF_DIR=$HADOOP_CONF_DIR
export HADOOP_CLASSPATH=/usr/lib/jvm/java-8-openjdk-amd64/lib/tools.jar
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_YARN_HOME=$HADOOP_HOME

#HIVE
export HIVE_HOME=/usr/lib/hive/apache-hive-2.3.0-bin
export PATH=$PATH:$HIVE_HOME/bin
export HIVE_CONF_DIR=$HIVE_HOME/conf

#SPARK
export SPARK_HOME=/usr/lib/spark/spark-2.2.0-bin-hadoop2.7
export PATH=$PATH:$SPARK_HOME/bin
```
Reload the environment variables:

```bash
source ~/.bashrc
```
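To sanity-check the setup, you can confirm the binaries resolve on the updated `PATH` (an optional check, not part of the original steps):

```bash
# Each command should print a version rather than "command not found"
java -version
hadoop version
hive --version
spark-submit --version

# Should print /usr/local/hadoop/etc/hadoop
echo $HADOOP_CONF_DIR
```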
Link the Scala and Spark jars in the Hive lib folder:

```bash
cd $HIVE_HOME/lib
ln -s $SPARK_HOME/jars/scala-library*.jar
ln -s $SPARK_HOME/jars/spark-core*.jar
ln -s $SPARK_HOME/jars/spark-network-common*.jar
```
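If the links were created correctly, they show up as symlinks pointing into `$SPARK_HOME/jars` (another quick optional check):

```bash
# The linked jars should appear as symlinks (->) into $SPARK_HOME/jars
ls -l $HIVE_HOME/lib | grep ' -> '
```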
Add the below configuration in `hive-site.xml` to use Spark as the execution engine:

```bash
vi $HIVE_HOME/conf/hive-site.xml
```

```xml
<property>
  <name>hive.execution.engine</name>
  <value>spark</value>
  <description>Use Spark as the default execution engine</description>
</property>
<property>
  <name>spark.master</name>
  <value>spark://localhost:7077</value>
</property>
<property>
  <name>spark.eventLog.enabled</name>
  <value>true</value>
</property>
<property>
  <name>spark.eventLog.dir</name>
  <value>/tmp</value>
</property>
<property>
  <name>spark.serializer</name>
  <value>org.apache.spark.serializer.KryoSerializer</value>
</property>
<property>
  <name>spark.yarn.jars</name>
  <value>hdfs://localhost:54310/spark-jars/*</value>
</property>
```
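Once `hive-site.xml` is saved, you can confirm that Hive picked up the new engine setting. This only reads the configuration, so it works even before Spark is running (optional check):

```bash
# Should print hive.execution.engine=spark
hive -e "set hive.execution.engine;"
```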
Make sure the below properties exist in `yarn-site.xml`; if not, add them. These jar paths are needed when using Spark as the execution engine for Hive. I had to use absolute paths instead of environment variables in this configuration; for some reason environment variables did not work. Make sure these paths refer to your Hadoop installation directories.

```bash
vi $HADOOP_CONF_DIR/yarn-site.xml
```

```xml
<property>
  <name>yarn.application.classpath</name>
  <value>/usr/local/hadoop/share/hadoop/mapreduce/*,/usr/local/hadoop/share/hadoop/mapreduce/lib/*,/usr/local/hadoop/share/hadoop/hdfs/*,/usr/local/hadoop/share/hadoop/hdfs/lib/*,/usr/local/hadoop/share/hadoop/common/lib/*,/usr/local/hadoop/share/hadoop/common/*,/usr/local/hadoop/share/hadoop/yarn/lib/*,/usr/local/hadoop/share/hadoop/yarn/*</value>
</property>
<property>
  <name>mapreduce.application.classpath</name>
  <value>/usr/local/hadoop/share/hadoop/mapreduce/*,/usr/local/hadoop/share/hadoop/mapreduce/lib/*,/usr/local/hadoop/share/hadoop/hdfs/*,/usr/local/hadoop/share/hadoop/hdfs/lib/*,/usr/local/hadoop/share/hadoop/common/lib/*,/usr/local/hadoop/share/hadoop/common/*,/usr/local/hadoop/share/hadoop/yarn/lib/*,/usr/local/hadoop/share/hadoop/yarn/*</value>
</property>
```
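Classpath changes in `yarn-site.xml` only take effect after YARN is restarted. With the Hadoop `sbin` scripts on the `PATH` from `~/.bashrc` above, a restart looks like this:

```bash
stop-yarn.sh
start-yarn.sh

# ResourceManager and NodeManager should show up again
jps
```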
Remove the old version of the Hive jars from the Spark jars folder. Adjust this step to match the version of the Hive jars in your Spark folder; you can determine the version by listing the contents of `$SPARK_HOME/jars` with the below command:

```bash
ls $SPARK_HOME/jars/hive*.jar
```
In my case those jars were version 1.2.1, so I removed them with the below command:

```bash
rm $SPARK_HOME/jars/hive*1.2.1*
```
Run the below command to copy the new version of the Hive jars into the Spark jars folder. These jars are necessary for Hive to run with the Spark engine we just configured:

```bash
cp $HIVE_HOME/lib/hive*.jar $SPARK_HOME/jars/
```
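A quick listing can confirm the swap (optional; note that Spark's own `spark-hive` jars will also match):

```bash
# The 1.2.1 jars should be gone and the copied 2.3.0 jars present
ls $SPARK_HOME/jars | grep hive
```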
Run the below commands to copy the Spark jars to the `/spark-jars` folder on HDFS. This is the location referenced by `spark.yarn.jars` in `hive-site.xml` above:

```bash
hadoop fs -mkdir /spark-jars
hadoop fs -put $SPARK_HOME/jars/*.jar /spark-jars/
```
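With everything in place, a simple end-to-end check is to start the standalone Spark master and a worker (matching the `spark://localhost:7077` master URL configured above) and run a query that launches a Spark job. This is a sketch; `my_table` is a placeholder for any existing Hive table:

```bash
# Start the standalone Spark master and one worker (Spark 2.2.0 script names)
$SPARK_HOME/sbin/start-master.sh
$SPARK_HOME/sbin/start-slave.sh spark://localhost:7077

# A COUNT(*) forces a Spark job; replace my_table with one of your tables
hive -e "SELECT COUNT(*) FROM my_table;"
```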