
HBase Spark Demo #19

Open

Description

Loading data into HBase is not trivial. We want the demo to show how this can be done and to provide some guidance and best practices.

## Aims

## Tasks

  • Load data into HDFS from S3
  • Parse CSV and create HFiles (see the bulk-load sketch after this list)
  • Load the incremental HFiles into HBase
  • Load a streaming data source into HBase (see the streaming sketch after this list)
  • Stackable cluster configuration
  • Verify the data is there (sanity check) using the HBase shell
  • Create a Phoenix view over the table (see the Phoenix sketch after this list)
  • Configure Phoenix as a data source in Superset
  • Create a visualisation using Phoenix JDBC and Superset
  • Query HBase using the Spark HBase connector (see the connector sketch after this list)
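
The HFile route is the part that usually trips people up, so here is a rough sketch of how the CSV-to-HFiles and incremental-load tasks could look with Spark and the HBase 2.x classes (`HFileOutputFormat2` plus `LoadIncrementalHFiles`; newer releases also offer `BulkLoadHFiles`). The paths, table name (`events`), column family (`d`) and CSV columns below are placeholders for the demo, not decisions this issue has made.

```scala
import org.apache.hadoop.fs.Path
import org.apache.hadoop.hbase.{HBaseConfiguration, KeyValue, TableName}
import org.apache.hadoop.hbase.client.ConnectionFactory
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2
import org.apache.hadoop.hbase.tool.LoadIncrementalHFiles
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.sql.SparkSession

object CsvToHFiles {
  def main(args: Array[String]): Unit = {
    val spark     = SparkSession.builder().appName("csv-to-hfiles").getOrCreate()
    val hbaseConf = HBaseConfiguration.create()

    // Placeholder locations and schema: a CSV with "id" and "value" columns,
    // loaded into table "events", column family "d".
    val csvPath   = "hdfs:///demo/input/events.csv"
    val hfilePath = "hdfs:///demo/staging/hfiles"
    val table     = TableName.valueOf("events")
    val family    = Bytes.toBytes("d")

    // HFileOutputFormat2 expects cells in row-key order. Sorting on the string
    // key is enough here because the demo keys are plain ASCII.
    val cells = spark.read.option("header", "true").csv(csvPath)
      .rdd
      .map(row => (row.getAs[String]("id"), row.getAs[String]("value")))
      .sortByKey()
      .map { case (id, value) =>
        val rowKey = Bytes.toBytes(id)
        val kv = new KeyValue(rowKey, family, Bytes.toBytes("value"), Bytes.toBytes(value))
        (new ImmutableBytesWritable(rowKey), kv)
      }

    // Write HFiles to a staging directory. For a table with many regions the
    // RDD should also be range-partitioned on the region boundaries.
    cells.saveAsNewAPIHadoopFile(
      hfilePath,
      classOf[ImmutableBytesWritable],
      classOf[KeyValue],
      classOf[HFileOutputFormat2],
      hbaseConf)

    // Hand the staged HFiles to the region servers (the "incremental load" step).
    val conn = ConnectionFactory.createConnection(hbaseConf)
    try {
      new LoadIncrementalHFiles(hbaseConf).doBulkLoad(
        new Path(hfilePath), conn.getAdmin, conn.getTable(table), conn.getRegionLocator(table))
    } finally conn.close()
  }
}
```

For the streaming task, one low-ceremony option is Spark Structured Streaming with `foreachBatch`, writing `Put`s through the plain HBase client. Everything named below (Kafka topic, bootstrap servers, "id,value" record format, checkpoint path) is assumed for illustration; a production job would reuse connections and batch writes with a `BufferedMutator`.

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.sql.{DataFrame, SparkSession}

object StreamToHBase {

  // Write one micro-batch to HBase. Assumes records of the form "id,value".
  def writeBatch(batch: DataFrame, batchId: Long): Unit = {
    batch.rdd.foreachPartition { rows =>
      val conn  = ConnectionFactory.createConnection(HBaseConfiguration.create())
      val table = conn.getTable(TableName.valueOf("events"))
      try {
        rows.foreach { row =>
          val Array(id, value) = row.getAs[String]("line").split(",", 2)
          val put = new Put(Bytes.toBytes(id))
          put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("value"), Bytes.toBytes(value))
          table.put(put)
        }
      } finally { table.close(); conn.close() }
    }
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("stream-to-hbase").getOrCreate()

    // Placeholder source: a Kafka topic carrying "id,value" lines.
    val lines = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "kafka:9092")
      .option("subscribe", "events")
      .load()
      .selectExpr("CAST(value AS STRING) AS line")

    lines.writeStream
      .foreachBatch(writeBatch _)
      .option("checkpointLocation", "hdfs:///demo/checkpoints/stream-to-hbase")
      .start()
      .awaitTermination()
  }
}
```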

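Once the table is populated, the Phoenix view is a single piece of DDL; the sketch below runs it over Phoenix JDBC so it stays in the same language as the other snippets (the ZooKeeper quorum, view name and column mapping are assumptions). Note that Superset itself is Python, so the "Phoenix JDBC" step in practice usually means the Phoenix Query Server plus the `phoenixdb` SQLAlchemy dialect on the Superset side.

```scala
import java.sql.DriverManager

object CreatePhoenixView {
  def main(args: Array[String]): Unit = {
    // Placeholder ZooKeeper quorum; Phoenix thick-client JDBC URLs take the
    // form jdbc:phoenix:<zookeeper quorum>.
    val conn = DriverManager.getConnection("jdbc:phoenix:zookeeper:2181")
    val stmt = conn.createStatement()
    try {
      // Expose the existing HBase table "events" (column family "d") as a
      // read-only Phoenix view; the row key becomes the primary key column.
      stmt.execute(
        """CREATE VIEW IF NOT EXISTS "events" (
          |  pk VARCHAR PRIMARY KEY,
          |  "d"."value" VARCHAR
          |)""".stripMargin)

      // Sanity check through SQL rather than the HBase shell.
      val rs = stmt.executeQuery("SELECT COUNT(*) FROM \"events\"")
      rs.next()
      println(s"Rows visible through Phoenix: ${rs.getLong(1)}")
    } finally conn.close()
  }
}
```

Querying back from Spark could then go through the Apache hbase-connectors `hbase-spark` module, roughly like this (same assumed table and column mapping as above):

```scala
import org.apache.spark.sql.SparkSession

object QueryWithConnector {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("hbase-connector-query").getOrCreate()

    // DataFrame view of the HBase table via the hbase-spark connector:
    // ":key" maps the row key, "d:value" maps family "d", qualifier "value".
    val events = spark.read
      .format("org.apache.hadoop.hbase.spark")
      .option("hbase.columns.mapping", "pk STRING :key, value STRING d:value")
      .option("hbase.table", "events")
      .option("hbase.spark.use.hbasecontext", false)
      .load()

    events.createOrReplaceTempView("events")
    spark.sql("SELECT pk, value FROM events LIMIT 10").show()
  }
}
```
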
## Learning Points and Challenges

  • Where do DistCp and the HBase bulk load run, given there is no YARN cluster?
  • Are these jobs scalable?
  • Can we drive near-real-time dashboards in Grafana and see instant updates?
  • Stress testing
  • Test HBase region management - can we watch it in real time as part of the demo? (see the sketch below)
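
For the region-management point, a small poller against the Admin API is probably enough to make splits visible live while the load runs; a minimal sketch, assuming the same `events` table and the HBase 2.x `Admin.getRegions` call:

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.ConnectionFactory

object WatchRegions {
  def main(args: Array[String]): Unit = {
    val conn  = ConnectionFactory.createConnection(HBaseConfiguration.create())
    val admin = conn.getAdmin
    val table = TableName.valueOf("events")
    try {
      // Poll the region list so splits show up as they happen during the load.
      while (true) {
        val regions = admin.getRegions(table)
        println(s"${regions.size()} region(s) for ${table.getNameAsString}")
        regions.forEach(r => println(s"  ${r.getRegionNameAsString}"))
        Thread.sleep(5000)
      }
    } finally { admin.close(); conn.close() }
  }
}
```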
