Commit 5628fa0: First iteration
pranaydotparity committed Sep 6, 2024
1 parent 298efad
Showing 11 changed files with 807 additions and 1 deletion.
50 changes: 49 additions & 1 deletion README.md
# dotlake-sidecar

This repository contains the necessary components to set up and run a data ingestion pipeline for Polkadot-based blockchains using Substrate API Sidecar and a custom block ingest service.

## Overview

The dotlake-sidecar project facilitates the extraction and processing of blockchain data from Polkadot-based networks. It uses two main components:

1. Substrate API Sidecar
2. Custom Block Ingest Service

These components work together to provide a robust solution for capturing and processing blockchain data.

## Components

### 1. Substrate API Sidecar

[Substrate API Sidecar](https://github.com/paritytech/substrate-api-sidecar) is an open-source REST service that runs alongside Substrate nodes. It provides a way to access blockchain data through a RESTful API, making it easier to query and retrieve information from the blockchain.
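For example, a running Sidecar instance can be queried over plain HTTP; `/blocks/head` is one of its endpoints. The small helper below is an illustrative sketch, not part of this repository:

```python
import json
from urllib.request import urlopen

# Assumption: Sidecar listening on its default port, as this repo configures it.
SIDECAR_BASE = "http://localhost:8080"

def sidecar_url(path, base=SIDECAR_BASE):
    """Build a full URL for a Sidecar REST endpoint (hypothetical helper)."""
    return f"{base.rstrip('/')}/{path.lstrip('/')}"

def fetch_block(path="blocks/head"):
    """Fetch a block as a dict from a running Sidecar instance."""
    with urlopen(sidecar_url(path)) as resp:
        return json.load(resp)

# Usage (requires a running Sidecar):
# head = fetch_block()
# print(head["number"], head["hash"])
```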

### 2. Custom Block Ingest Service

The custom block ingest service is a Docker container that processes and stores blockchain data. It is designed to work in conjunction with the Substrate API Sidecar to ingest blocks and related information from the specified blockchain. The service utilizes DuckDB, an embedded analytical database, as part of its data flow:

1. The service ingests blockchain data from the Substrate API Sidecar.
2. The ingested data is then processed and transformed as needed.
3. The processed data is stored in DuckDB.
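A minimal sketch of that flow, assuming a simplified row layout (the real ingest service's schema and helper names are not shown in this commit):

```python
# Step 1: a block as returned by Sidecar (truncated sample shape; values are made up).
sample_block = {"number": "21000000", "hash": "0xabc123",
                "extrinsics": [{}, {}], "finalized": True}

# Step 2: transform the block JSON into a flat row (illustrative schema only).
def block_to_row(block):
    """Flatten the handful of fields this sketch keeps; the real service stores more."""
    return (
        int(block["number"]),
        block["hash"],
        len(block.get("extrinsics", [])),
        bool(block.get("finalized", False)),
    )

# Step 3: store rows in DuckDB (requires the `duckdb` package):
# import duckdb
# con = duckdb.connect("blocks.db")
# con.execute("CREATE TABLE IF NOT EXISTS blocks "
#             "(number BIGINT, hash TEXT, n_extrinsics INT, finalized BOOLEAN)")
# con.execute("INSERT INTO blocks VALUES (?, ?, ?, ?)", block_to_row(sample_block))
```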

## How It Works

The system operates using the following workflow:

1. The Substrate API Sidecar connects to a specified WebSocket endpoint (WSS) of a Substrate-based blockchain node.
2. The Sidecar exposes a RESTful API on port 8080, allowing easy access to blockchain data.
3. The custom block ingest service begins pulling blocks and related data through the Sidecar's REST API and processing them.
4. The ingest service stores the processed data, making it available for further analysis or querying.
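As an illustration of step 4, the per-block event total shown by the bundled explorer can be computed with a small pure function over a Sidecar-style block (the function name and the sample shape are assumptions):

```python
def count_events(block):
    """Total events in a block: lifecycle hooks plus per-extrinsic events."""
    return (
        len(block["onInitialize"]["events"])
        + len(block["onFinalize"]["events"])
        + sum(len(ex["events"]) for ex in block["extrinsics"])
    )

# A made-up block with 1 onInitialize event, 0 onFinalize events,
# and extrinsics carrying 2 and 1 events respectively:
sample = {
    "onInitialize": {"events": [{}]},
    "onFinalize": {"events": []},
    "extrinsics": [{"events": [{}, {}]}, {"events": [{}]}],
}
# count_events(sample) -> 4
```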

## Usage

To run the dotlake-sidecar, use the provided `dotlakeIngest.sh` script. This script sets up both the Substrate API Sidecar and the custom block ingest service as Docker containers.

For example, to start the ingest service for Polkadot:

```sh
sh dotlakeIngest.sh --chain polkadot --relay-chain polkadot --wss wss://rpc.ibp.network/polkadot
```
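Once the ingest service has written some blocks, the resulting DuckDB database can be queried directly. The bundled explorer names tables `blocks_<relay_chain>_<chain>`, so a sketch of such a query (the helper name is an assumption) looks like:

```python
def recent_blocks_query(relay_chain, chain, limit=50):
    """SQL for the most recent blocks, matching the explorer's table naming."""
    return (
        f"SELECT number, hash, finalized FROM blocks_{relay_chain}_{chain} "
        f"ORDER BY number DESC LIMIT {int(limit)}"
    )

# With the `duckdb` package and an existing database file:
# import duckdb
# con = duckdb.connect("blocks.db", read_only=True)
# print(con.execute(recent_blocks_query("polkadot", "polkadot")).fetchdf())
```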


55 changes: 55 additions & 0 deletions dotlakeIngest.sh
#!/bin/bash

# Parse command line arguments
while [[ $# -gt 0 ]]; do
  key="$1"
  case $key in
    --chain)
      CHAIN="$2"
      shift
      shift
      ;;
    --relay-chain)
      RELAY_CHAIN="$2"
      shift
      shift
      ;;
    --wss)
      WSS="$2"
      shift
      shift
      ;;
    *)
      echo "Unknown option: $1"
      exit 1
      ;;
  esac
done

# Check if required arguments are provided
if [ -z "$CHAIN" ] || [ -z "$RELAY_CHAIN" ] || [ -z "$WSS" ]; then
  echo "Usage: $0 --chain <chain> --relay-chain <relay_chain> --wss <wss_endpoint>"
  exit 1
fi

# Start Substrate API Sidecar
echo "Starting Substrate API Sidecar..."
docker run -d --rm --read-only -e SAS_SUBSTRATE_URL="$WSS" -p 8080:8080 parity/substrate-api-sidecar:latest
if [ $? -eq 0 ]; then
  echo "Substrate API Sidecar started successfully."
else
  echo "Failed to start Substrate API Sidecar."
  exit 1
fi

# Start Block Ingest Service
echo "Starting Block Ingest Service..."
docker run -d --rm -e CHAIN="$CHAIN" -e RELAY_CHAIN="$RELAY_CHAIN" -e WSS="$WSS" -p 8501:8501 eu.gcr.io/parity-data-infra-evaluation/block-ingest:0.1
if [ $? -eq 0 ]; then
  echo "Block Ingest Service started successfully."
else
  echo "Failed to start Block Ingest Service."
  exit 1
fi

echo "Both services are now running. You can access Substrate API Sidecar at http://localhost:8080 and Block Ingest Service at http://localhost:8501"
18 changes: 18 additions & 0 deletions ingest/Dockerfile
# Use an official Python runtime as a parent image
FROM python:3.9-slim

# Set the working directory in the container
WORKDIR /app

# Copy the current directory contents into the container at /app
COPY . /app

# Install any needed packages specified in requirements.txt
COPY requirements.txt /app/
RUN pip install -r requirements.txt

# Make sure start-ingest.sh is executable
RUN chmod +x start-ingest.sh

# Run start-ingest.sh when the container launches
CMD ["./start-ingest.sh"]
96 changes: 96 additions & 0 deletions ingest/Home.py
import streamlit as st
import argparse
import duckdb
import pandas as pd
import time
from datetime import datetime
from streamlit_autorefresh import st_autorefresh

# Function to connect to DuckDB
def connect_to_db(db_path='/Users/pranaypatil/data_eng_repos/data-applications/blocks.db'):
    return duckdb.connect(db_path, read_only=True)

def parse_arguments():
    parser = argparse.ArgumentParser(description="Block ingestion script for Substrate-based chains")
    parser.add_argument("--chain", required=True)
    parser.add_argument("--relay_chain", required=True)
    parser.add_argument("--db_path", required=True)
    return parser.parse_args()


args = parse_arguments()

st.set_page_config(
    page_title="Dotlake Explorer",
    page_icon="🚀",
)

# Run the autorefresh about every 2000 milliseconds (2 seconds) and stop
# after it's been refreshed 100 times.
# count = st_autorefresh(interval=30000, limit=100, key="fizzbuzzcounter")

placeholder = st.empty()

chain = args.chain
relay_chain = args.relay_chain

with placeholder.container():

    try:
        # Connect to DuckDB
        conn = connect_to_db(args.db_path)

        query = f"""
        SELECT *
        FROM blocks_{relay_chain}_{chain}
        ORDER BY number DESC
        LIMIT 50
        """
        recent_blocks = conn.execute(query).fetchdf()
        recent_blocks['timestamp'] = recent_blocks['timestamp'].apply(
            lambda x: datetime.fromtimestamp(x).strftime("%Y-%m-%d %H:%M:%S")
        )

        # Streamlit app
        st.title("Dotlake Explorer")
        st.sidebar.header("Home")

        # Display the chain and relay chain names
        st.markdown(f"**Chain**: *{chain}* **Relay Chain**: *{relay_chain}*")

        # Display the most recent block
        st.subheader("Most Recent Block")

        bn, ts = st.columns(2)
        latest_block = recent_blocks['number'].iloc[0]
        bn.metric("Block Number", latest_block)

        # Get and display the timestamp of the last block
        last_timestamp = recent_blocks['timestamp'].iloc[0]
        ts.metric("Block Timestamp", f"{last_timestamp} (+UTC)")

        # Create two columns for displaying extrinsics and events metrics
        extrinsics_col, events_col = st.columns(2)

        # Display the number of extrinsics in the most recent block
        num_extrinsics = len(recent_blocks['extrinsics'].iloc[0])
        extrinsics_col.metric("Extrinsics", num_extrinsics)

        # Calculate the total number of events in the most recent block
        num_events = (
            len(recent_blocks['onFinalize'].iloc[0]['events']) +
            len(recent_blocks['onInitialize'].iloc[0]['events']) +
            sum(len(ex['events']) for ex in recent_blocks['extrinsics'].iloc[0])
        )

        # Display the total number of events
        events_col.metric("Events", num_events)

        # Display a table of the most recent blocks
        st.header("Recent Blocks")

        st.dataframe(recent_blocks[['number', 'timestamp', 'hash', 'finalized']])

        # Close the connection
        conn.close()
    except Exception:
        st.error("Oops...something went wrong, please refresh the page")

