A data ingestion pipeline for Polkadot-based blockchains that combines Substrate API Sidecar with a custom block ingest service.
dotlake-community enables comprehensive data extraction and processing from Polkadot-based networks through three key components:
- Substrate API Sidecar: REST service for blockchain data access
- Custom Block Ingest Service: Data processing and storage pipeline
- Apache Superset: Data visualization and analytics
- Docker and Docker Compose
- Access to a Substrate-based blockchain node (WSS endpoint)
- Sufficient storage space for blockchain data
- Clone the repository:
git clone https://github.com/your-org/dotlake-community.git
cd dotlake-community
- Configure your settings in config.yaml:
The config.yaml file contains the following configuration options:
- relay_chain: Name of the relay chain (e.g., "Polkadot", "Kusama", "solo"). Defaults to "solo" if not specified
- chain: Name of the chain to index (e.g., "Polkadot", "Kusama", "substrate_chain"). Defaults to "substrate_chain" if not specified
- wss: WebSocket endpoint URL for the chain node
- ingest_mode: Mode of operation ("live" or "historical")
- start_block: Starting block number for ingestion (only applies for historical mode)
- end_block: Ending block number for ingestion (only applies for historical mode)
- create_db: Set to true to create a new local PostgreSQL database, false to use existing database
- retain_db: Set to true to keep the database after cleanup, false to remove it (only applies when create_db is true)
- databases: Database connection details (required if create_db is false)

    databases:
      - type: postgres      # Database type (postgres/mysql)
        host: 0.0.0.0       # Database host address
        port: 5432          # Database port
        name: dotlake       # Database name
        user: username      # Database username
        password: password  # Database password
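The rules described for these options can be checked before starting the pipeline. The following is a minimal sketch, not the project's own validation code: the function name `normalize_config` is hypothetical, it works on an already-parsed dictionary rather than reading config.yaml itself, and it assumes that historical mode needs both block bounds and that a missing create_db means false, neither of which is stated above.

```python
from __future__ import annotations

from typing import Any

VALID_INGEST_MODES = {"live", "historical"}


def normalize_config(raw: dict[str, Any]) -> dict[str, Any]:
    """Apply the documented defaults and check the documented constraints."""
    cfg = dict(raw)
    # Defaults stated in the option list.
    cfg.setdefault("relay_chain", "solo")
    cfg.setdefault("chain", "substrate_chain")

    # Assumption: the endpoint must use a WebSocket scheme.
    if not str(cfg.get("wss", "")).startswith(("ws://", "wss://")):
        raise ValueError("wss must be a WebSocket URL (ws:// or wss://)")

    mode = cfg.get("ingest_mode")
    if mode not in VALID_INGEST_MODES:
        raise ValueError(f"ingest_mode must be one of {sorted(VALID_INGEST_MODES)}")

    # Assumption: historical mode requires an ordered, non-negative block range.
    if mode == "historical":
        start, end = cfg.get("start_block"), cfg.get("end_block")
        if not isinstance(start, int) or not isinstance(end, int):
            raise ValueError("historical mode needs integer start_block and end_block")
        if start < 0 or end < start:
            raise ValueError("expected 0 <= start_block <= end_block")

    # Connection details are required when no local database is created.
    # Assumption: a missing create_db is treated as false.
    if not cfg.get("create_db", False) and not cfg.get("databases"):
        raise ValueError("databases is required when create_db is false")

    return cfg


if __name__ == "__main__":
    example = {
        "wss": "wss://rpc.example.invalid",  # placeholder endpoint
        "ingest_mode": "historical",
        "start_block": 100,
        "end_block": 200,
        "create_db": True,
    }
    print(normalize_config(example))
```

Failing early on a bad configuration is cheaper than discovering it after the containers have started.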
- Start the ingestion pipeline:
bash dotlakeIngest.sh
- To stop the ingestion and clean up resources:
bash cleanup.sh
- Connects to blockchain node via WebSocket
- Exposes REST API on port 8080
- Provides standardized access to blockchain data
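Once the pipeline is running, the REST API on port 8080 can be queried directly. The sketch below assumes Sidecar is reachable at `http://localhost:8080` from where the script runs and uses Sidecar's `/blocks/{blockId}` and `/blocks/head` routes; the helper names `block_url` and `fetch_block` are hypothetical and not part of this repository.

```python
from __future__ import annotations

import json
import urllib.request

# Assumption: Sidecar is published on localhost; adjust for your Docker setup.
SIDECAR_BASE_URL = "http://localhost:8080"


def block_url(block: int | str = "head", base_url: str = SIDECAR_BASE_URL) -> str:
    """Build the Sidecar URL for a block number, block hash, or "head"."""
    return f"{base_url.rstrip('/')}/blocks/{block}"


def fetch_block(
    block: int | str = "head",
    base_url: str = SIDECAR_BASE_URL,
    timeout: float = 10.0,
) -> dict:
    """Fetch one block as parsed JSON. Requires a running Sidecar instance."""
    with urllib.request.urlopen(block_url(block, base_url), timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    latest = fetch_block()
    print(latest["number"], latest["hash"])
```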
Processes blockchain data through multiple stages:
- Data extraction from Sidecar API
- Transformation and enrichment
- Storage in PostgreSQL
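The transformation and storage stages can be illustrated with a small sketch. It is not the ingest service's actual code: the function names, the `blocks` table, and its columns are hypothetical, the input is assumed to look like a Sidecar block response (with `number` as a string plus `hash` and `extrinsics` fields), and SQLite from the Python standard library stands in for PostgreSQL so the example runs without a database server.

```python
from __future__ import annotations

import sqlite3
from typing import Any


def transform_block(block: dict[str, Any], relay_chain: str, chain: str) -> dict[str, Any]:
    """Flatten one block payload into a row and enrich it with chain labels."""
    return {
        "relay_chain": relay_chain,
        "chain": chain,
        "number": int(block["number"]),  # assumed to arrive as a string
        "hash": block["hash"],
        "extrinsic_count": len(block.get("extrinsics", [])),
    }


def store_block(conn: sqlite3.Connection, row: dict[str, Any]) -> None:
    """Upsert the row so that re-ingesting a block does not duplicate it."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS blocks ("
        "relay_chain TEXT, chain TEXT, number INTEGER, hash TEXT, "
        "extrinsic_count INTEGER, PRIMARY KEY (chain, number))"
    )
    conn.execute(
        "INSERT OR REPLACE INTO blocks VALUES "
        "(:relay_chain, :chain, :number, :hash, :extrinsic_count)",
        row,
    )
    conn.commit()


if __name__ == "__main__":
    # Hand-written sample payload, not real chain data.
    sample = {"number": "100", "hash": "0xabc", "extrinsics": [{}, {}]}
    conn = sqlite3.connect(":memory:")
    store_block(conn, transform_block(sample, "solo", "substrate_chain"))
    print(conn.execute("SELECT * FROM blocks").fetchall())
```

Keying the table on chain and block number makes the write idempotent, which matters if a historical range is re-run over blocks that were already ingested.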
- Custom visualization capabilities
- Direct connection to stored data
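Superset connects to databases through SQLAlchemy URIs, so pointing it at the stored data comes down to building one from the same fields used in the databases entry of config.yaml. The helper below is a hypothetical sketch under that assumption; whether this project registers the connection in Superset automatically, and which database drivers its Superset image ships with, is not stated here.

```python
from __future__ import annotations

from urllib.parse import quote

# Map the config's database type to a SQLAlchemy dialect name.
DIALECTS = {"postgres": "postgresql", "mysql": "mysql"}


def sqlalchemy_uri(db: dict) -> str:
    """Build a SQLAlchemy URI from one entry of the databases list."""
    dialect = DIALECTS[db["type"]]
    # Percent-encode credentials so characters like "@" or ":" stay unambiguous.
    user = quote(str(db["user"]), safe="")
    password = quote(str(db["password"]), safe="")
    return f"{dialect}://{user}:{password}@{db['host']}:{db['port']}/{db['name']}"


if __name__ == "__main__":
    entry = {
        "type": "postgres",
        "host": "0.0.0.0",
        "port": 5432,
        "name": "dotlake",
        "user": "username",
        "password": "password",
    }
    print(sqlalchemy_uri(entry))
```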
To contribute or modify:
- Fork the repository
- Create a feature branch
- Submit a pull request