## Description
Loading data into HBase is not trivial. We want the demo to show how this can be done and to provide some guidance and best practices.
## Aims
- Load data row by row (NiFi); a minimal row-by-row sketch follows this list
- Batch processing CSV files (MapReduce)
- Direct load of HFiles
- Test HBase Spark connector (New demo: ingest data to hbase stackablectl#71)
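
In the demo itself the row-by-row path is a NiFi flow, which is configured in the NiFi UI rather than in code. As a rough stand-in for what each record write amounts to, here is a minimal Python sketch using happybase against the HBase Thrift server; the hostname, table name and column family are assumptions, not demo fixtures.

```python
# Illustrative row-by-row writes via the HBase Thrift server (happybase).
# Hostname, table name and schema below are placeholders.
import csv

import happybase

connection = happybase.Connection("hbase-thrift.default.svc.cluster.local", port=9090)
table = connection.table("demo_table")

with open("rows.csv", newline="") as f:
    for record in csv.DictReader(f):
        # One Put per record: row key plus a couple of columns in family "cf".
        table.put(
            record["id"].encode("utf-8"),
            {
                b"cf:timestamp": record["timestamp"].encode("utf-8"),
                b"cf:value": record["value"].encode("utf-8"),
            },
        )

connection.close()
```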
## Tasks
- Load data into HDFS from S3
- Parse CSV and create HFiles
- Load incremental HFiles into HBase (these three steps are sketched together in the bulk-load example below the list)
- Load a streaming data source into HBase
- Stackable cluster configuration
- Verify the data is there (sanity check) using the HBase shell (see the scan snippet below the list)
- Create a Phoenix view over the table (see the Phoenix sketch below the list)
- Configure Phoenix as a data source in Superset
- Create a visualisation using Phoenix JDBC and Superset
- Query HBase using the Spark HBase connector (see the PySpark sketch below the list)
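
The first three tasks map onto three standard tools: DistCp for the S3-to-HDFS copy, ImportTsv with a bulk-output directory for HFile generation, and completebulkload (LoadIncrementalHFiles) for handing the HFiles to the region servers. The sketch below simply wraps those CLIs from Python; the bucket, paths, table name and column mapping are assumptions, the target table is assumed to exist, and the hadoop/hbase binaries must be on the PATH of wherever this runs.

```python
# Hypothetical driver for the bulk-load path; all bucket, path and table
# names are placeholders. Assumes the hadoop and hbase CLIs are available.
import subprocess


def run(cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)


# 1. Copy the raw CSV files from S3 into HDFS with DistCp.
run([
    "hadoop", "distcp",
    "s3a://demo-bucket/raw/",            # assumed source bucket/prefix
    "hdfs:///data/raw/",
])

# 2. Parse the CSVs and write HFiles (no direct puts) with ImportTsv.
run([
    "hbase", "org.apache.hadoop.hbase.mapreduce.ImportTsv",
    "-Dimporttsv.separator=,",
    "-Dimporttsv.columns=HBASE_ROW_KEY,cf:timestamp,cf:value",  # assumed schema
    "-Dimporttsv.bulk.output=hdfs:///data/hfiles/",
    "demo_table",
    "hdfs:///data/raw/",
])

# 3. Hand the generated HFiles to the region servers (LoadIncrementalHFiles).
run([
    "hbase", "completebulkload",
    "hdfs:///data/hfiles/",
    "demo_table",
])
```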
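For the sanity check, a non-interactive scan through the HBase shell is enough; the table name is again a placeholder.

```python
# Non-interactive sanity check: scan a few rows through the HBase shell.
# "demo_table" is a placeholder.
import subprocess

commands = "scan 'demo_table', {LIMIT => 5}\ncount 'demo_table'\n"
subprocess.run(["hbase", "shell", "-n"], input=commands, text=True, check=True)
```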
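The Phoenix view over the existing table is a single DDL statement. Below is a sketch using the phoenixdb driver against the Phoenix Query Server; the host, table and column names are assumptions.

```python
# Hypothetical Phoenix view over an existing HBase table, created through the
# Phoenix Query Server with the phoenixdb driver. Names are placeholders.
import phoenixdb

conn = phoenixdb.connect("http://phoenix-queryserver:8765/", autocommit=True)
cursor = conn.cursor()

# Map the HBase row key and the "cf" column family onto SQL columns.
cursor.execute(
    """
    CREATE VIEW IF NOT EXISTS "demo_table" (
        "id"             VARCHAR PRIMARY KEY,
        "cf"."timestamp" VARCHAR,
        "cf"."value"     VARCHAR
    )
    """
)

# Quick read back through SQL as a second sanity check.
cursor.execute('SELECT COUNT(*) FROM "demo_table"')
print(cursor.fetchone())
conn.close()
```

For Superset, the same Query Server would then be registered as a database with a SQLAlchemy URI along the lines of `phoenix://phoenix-queryserver:8765/` (the phoenixdb package provides the dialect), after which the view can back charts like any other table.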
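For the last task, a PySpark read through the Apache hbase-connectors data source could look like the sketch below; the table name and column mapping are assumptions, and the connector jar plus hbase-site.xml need to be on the Spark job's classpath.

```python
# Hypothetical PySpark read through the hbase-connectors data source
# (org.apache.hadoop.hbase.spark). Table and column mapping are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("hbase-connector-check").getOrCreate()

df = (
    spark.read.format("org.apache.hadoop.hbase.spark")
    .option("hbase.table", "demo_table")
    .option(
        "hbase.columns.mapping",
        "id STRING :key, timestamp STRING cf:timestamp, value STRING cf:value",
    )
    # Build the HBase connection from the DataFrame options rather than a
    # pre-created HBaseContext.
    .option("hbase.spark.use.hbasecontext", "false")
    .load()
)

df.show(5)
print(df.count())
```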
## Learning Points and Challenges
- Where do DistCp and the HBase bulk load run, given there is no YARN cluster?
- Are these jobs scalable?
- Can we get near-real-time dashboards in Grafana and see instant updates?
- Stress testing
- Test HBase region management: can we watch this in real time as part of a demo?