New demo: ingest data to hbase #71
Description
We want to test the HBase Spark connector and create a demo:
https://github.com/apache/hbase-connectors/tree/master/spark
To be defined in more detail today (15.08.22):
- Create HDFS, HBase, ZooKeeper
- Clone the repo and build it locally (https://github.com/apache/hbase-connectors/tree/master/spark) for Spark 3.3.0
Versions:
-- spark: 3.3.0
-- hadoop: 3.3.2
-- scala: 2.12.14
-- hbase: 2.4.12
`mvn -Dspark.version=3.3.0 -Dscala.version=2.12.10 -Dhadoop-three.version=3.3.2 -Dscala.binary.version=2.12 -Dhbase.version=2.4.12 clean install`
- Client-side (Spark) configuration:
  The HBase configuration file `hbase-site.xml` should be made available to Spark; it can be copied to `$SPARK_CONF_DIR` (default is `$SPARK_HOME/conf`). See the sketch below.
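A minimal sketch of how the client-side configuration gets picked up, assuming the connector jars are on the driver classpath (the app name is a placeholder):

```scala
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.spark.HBaseContext
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("hbase-conf-check").getOrCreate()

// HBaseConfiguration.create() reads hbase-site.xml from the classpath;
// without it, defaults (e.g. a localhost ZooKeeper quorum) are used.
val hbaseConf = HBaseConfiguration.create()
println(hbaseConf.get("hbase.zookeeper.quorum"))

// Creating the HBaseContext caches it for later use by the connector.
new HBaseContext(spark.sparkContext, hbaseConf)
```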
- Server-side (HBase region servers) configuration:
  The following jars need to be on the CLASSPATH of the HBase region servers: scala-library, hbase-spark, and hbase-spark-protocol-shaded.
  The server-side configuration is needed for column filter pushdown.
  If you cannot perform the server-side configuration, consider using `.option("hbase.spark.pushdown.columnfilter", false)`, as in the sketch below.
  The Scala library version must match the Scala version (2.11 or 2.12) used for compiling the connector.
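Continuing from the sketch above, a read with pushdown disabled; the table name `person` and the column mapping are made up for illustration:

```scala
val df = spark.read
  .format("org.apache.hadoop.hbase.spark")
  .option("hbase.table", "person") // hypothetical table name
  .option("hbase.columns.mapping", "name STRING :key, email STRING c:email")
  // disable column filter pushdown when the region servers lack the connector jars
  .option("hbase.spark.pushdown.columnfilter", false)
  .load()
```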
- Upload the .jars to Nexus --> https://repo.stackable.tech/service/rest/repository/browse/misc/hbase-spark-k8s/
- Use a PVC to add the app + dependencies
  sparkConf:
  "spark.hadoop.fs.s3a.aws.credentials.provider": "org.apache.hadoop.fs.s3a.AnonymousAWSCredentialsProvider"
  "spark.driver.extraClassPath": "/dependencies/jars/"
  "spark.executor.extraClassPath": "/dependencies/jars/"
- Create an application to load data from HDFS to HBase (sketch below)
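A minimal sketch of the ingest job, assuming a made-up CSV input path on HDFS and the same hypothetical table/mapping as above:

```scala
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.spark.HBaseContext
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("hdfs-to-hbase").getOrCreate()

// The connector's write path uses the most recently created HBaseContext.
new HBaseContext(spark.sparkContext, HBaseConfiguration.create())

// Hypothetical input: a CSV with "name" and "email" columns on HDFS.
val input = spark.read
  .option("header", "true")
  .csv("hdfs:///demo/input/persons.csv")

// Map "name" to the HBase row key and "email" to column family "c".
input.write
  .format("org.apache.hadoop.hbase.spark")
  .option("hbase.table", "person")
  .option("hbase.columns.mapping", "name STRING :key, email STRING c:email")
  .save()
```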
- Create an application to access HBase data from Spark (sketch below)
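Correspondingly, a sketch of reading the data back and querying it with Spark SQL (same hypothetical table and mapping):

```scala
val df = spark.read
  .format("org.apache.hadoop.hbase.spark")
  .option("hbase.table", "person")
  .option("hbase.columns.mapping", "name STRING :key, email STRING c:email")
  // build the HBase configuration from the classpath instead of a cached HBaseContext
  .option("hbase.spark.use.hbasecontext", false)
  .load()

df.createOrReplaceTempView("person")
spark.sql("SELECT name, email FROM person WHERE name = 'alice'").show()
```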