Building the Druid Index for the TPCH Dataset using the Local Indexing Service
This assumes that you have set up Druid and have created the denormalized dataset.
The following describes the procedure for using the Druid Indexing Service in local mode. Use this procedure when indexing a small dataset in a development environment; for a production environment, use the HadoopDruidIndexer.
Ensure that the Druid overlord service is running. Then issue a POST like the following:
```shell
curl -X 'POST' -H 'Content-Type:application/json' \
  -d @/Users/hbutani/sparkline/tpch-spark-druid/druid/tpch_index_task.json \
  localhost:8090/druid/indexer/v1/task
```
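As a rough guide to what a local index task file contains, here is a minimal sketch in the shape of a Druid `index` task spec. All field values below (data source name, columns, paths, intervals) are illustrative assumptions, not the contents of the actual `tpch_index_task.json` in the repo; consult that file for the real spec.

```json
{
  "type": "index",
  "spec": {
    "dataSchema": {
      "dataSource": "tpch",
      "parser": {
        "type": "string",
        "parseSpec": {
          "format": "csv",
          "timestampSpec": { "column": "l_shipdate" },
          "columns": ["l_shipdate", "l_quantity", "l_extendedprice"],
          "dimensionsSpec": { "dimensions": [] }
        }
      },
      "metricsSpec": [
        { "type": "count", "name": "count" }
      ],
      "granularitySpec": {
        "type": "uniform",
        "segmentGranularity": "MONTH",
        "queryGranularity": "DAY",
        "intervals": ["1992-01-01/1999-01-01"]
      }
    },
    "ioConfig": {
      "type": "index",
      "firehose": {
        "type": "local",
        "baseDir": "/path/to/denormalized/data",
        "filter": "*.csv"
      }
    }
  }
}
```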
The overlord listens on port 8090, and indexing commands can be POSTed to it. The index JSON in this case points to the TPCH datascale1 denormalized dataset.
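Besides the console, the overlord also exposes a task status endpoint you can poll from the command line. The task id below is a hypothetical placeholder; the POST above returns the real id in its response as `{"task":"<id>"}`. A minimal sketch:

```shell
# Hypothetical task id; substitute the id returned by the submit call above
TASK_ID="index_tpch_datascale1"
STATUS_URL="localhost:8090/druid/indexer/v1/task/${TASK_ID}/status"

# With the overlord running, poll progress with:
#   curl -s "$STATUS_URL"
echo "$STATUS_URL"
```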
The status of the indexing task can be viewed on the overlord console. Note that the local indexing service takes several hours to index even the datascale1 TPCH dataset. For development purposes, consider indexing only a small sample/subset of the datascale1 dataset.
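One simple way to produce such a subset is to keep only the first N rows of the denormalized file and point the index task at the truncated copy. The file paths below are stand-ins created for illustration, not the actual TPCH files:

```shell
# Stand-in for the denormalized TPCH file (illustration only)
printf 'row1\nrow2\nrow3\nrow4\n' > /tmp/tpch_full.csv

# Keep the first N rows as a development sample (use e.g. N=100000 on real data)
head -n 2 /tmp/tpch_full.csv > /tmp/tpch_sample.csv

# The sample now has 2 rows
wc -l < /tmp/tpch_sample.csv
```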