HBase Spark Demo #19

Open
@Jimvin

Description

Loading data into HBase is not trivial. We want the demo to show how this can be done and to provide some guidance and best practice.

Aims

Tasks

  • Load data into HDFS from S3
  • Parse CSV and create HFiles
  • Load incremental HFiles into HBase
  • Load a streaming data source into HBase
  • Stackable cluster configuration
  • Verify the data is there (sanity check) using HBase shell
  • Create Phoenix view over table
  • Configure Phoenix as a data source in Superset
  • Create a visualisation using Phoenix JDBC and Superset
  • Query HBase using Spark HBase connector
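
As a rough illustration of the "Parse CSV and create HFiles" task, the sketch below shows the one constraint that tends to trip people up: cells must be written to an HFile in sorted order (row key, then column family, then qualifier, compared as raw bytes). This is not the demo's implementation. In a real job the work would run in Spark or MapReduce and the files would be written through HBase's `HFileOutputFormat2`; here plain Python stands in for that step. The column family `cf`, the helper name `csv_to_sorted_cells`, and the sample data are all assumptions made for illustration.

```python
import csv
import io

# Hypothetical column family name; the real table schema is not defined in this issue.
COLUMN_FAMILY = b"cf"


def csv_to_sorted_cells(csv_text, key_column):
    """Parse CSV text into (row_key, family, qualifier, value) tuples.

    The result is sorted by row key, family and qualifier as raw bytes,
    which mirrors the ordering an HFile writer expects. Timestamps and
    cell types are left out to keep the sketch small.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    cells = []
    for record in reader:
        row_key = record[key_column].encode("utf-8")
        for column, value in record.items():
            # Skip the key column itself and any empty or missing values.
            if column == key_column or not value:
                continue
            cells.append(
                (row_key, COLUMN_FAMILY, column.encode("utf-8"), value.encode("utf-8"))
            )
    # Python compares bytes lexicographically by unsigned byte value,
    # the same comparison HBase uses for row keys.
    cells.sort(key=lambda cell: (cell[0], cell[1], cell[2]))
    return cells


# Made-up sample input, deliberately out of row-key order.
sample = "id,city,temp\nrow2,Berlin,21\nrow1,Hamburg,18\n"
cells = csv_to_sorted_cells(sample, "id")
for cell in cells:
    print(cell)
```

Sorting in memory only works for small inputs. A distributed job would instead partition rows by region boundaries and sort within each partition, which is one of the scalability questions the demo needs to answer.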

Learning Points and Challenges

  • Where do DistCp and the HBase bulk load run, given there is no YARN cluster?
  • Are these jobs scalable?
  • Can we build near-real-time dashboards in Grafana and see updates instantly?
  • Stress testing
  • Test HBase region management - can we watch this in real time as part of a demo?
