| title | emoji | colorFrom | colorTo | sdk | pinned | license | short_description |
|---|---|---|---|---|---|---|---|
| Cluster Protocol | 🔥 | indigo | red | docker | false | mit | Behavioral clustering engine for Web3 wallets |
An unsupervised machine learning project that segments cryptocurrency wallets into behavioral personas (e.g., "Whales", "NFT Flippers", "Dormant") based on on-chain transaction data.
In the Web3 ecosystem, users are anonymous by default. A wallet address (0x123...) gives no indication of whether the user is a high-value institution, a retail trader, a bot, or an NFT collector.
- Marketing is blind: Projects cannot target specific users effectively.
- Risk is opaque: Protocols cannot easily distinguish between organic users and sybil attackers.
- Data is noisy: Raw transaction logs are massive and unreadable without advanced processing.
Cluster Protocol is an AI-powered engine that "fingerprints" wallets based on their behavior, not their identity.
- Ingest: Pulls raw on-chain data (Gas spent, NFT volume, DEX trades, etc.) via Dune Analytics.
- Process: Normalizes skewed financial data using Yeo-Johnson Power Transformations.
- Cluster: Uses K-Means Clustering to mathematically group similar wallets.
- Label: Assigns a human-readable persona (e.g., "Active Retail", "High-Frequency Bot") with a confidence score.
- Robust Preprocessing: Handles extreme data skewness (common in financial data) using Yeo-Johnson Power Transformation.
- Smart Filtering: Heuristic detection to separate Smart Contracts from EOAs (Externally Owned Accounts).
- Model Selection: Benchmarked K-Means, DBSCAN, and GMM. K-Means (K=4) was selected as the production model.
- Inference with Confidence: Predicts personas for new wallets and provides probability scores (e.g., "85% Whale, 15% Trader").
- Automated Retraining: GitHub Actions workflow automatically fetches new data and retrains the model weekly to handle data drift.
- End-to-End API: Fetch data from Dune and classify a wallet in a single API call.
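The preprocessing and clustering steps listed above can be sketched with scikit-learn. This is a minimal illustration on synthetic data, not the project's `train.py`: the three feature columns, the random seed, and the `n_init` value are assumptions, and only the choice of Yeo-Johnson, K-Means, and K=4 comes from the text.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PowerTransformer

# Synthetic, heavily right-skewed stand-in for wallet features
# (hypothetical columns: gas spent, NFT volume, DEX trade count).
rng = np.random.default_rng(42)
X = rng.lognormal(mean=0.0, sigma=2.0, size=(500, 3))

# Yeo-Johnson reduces the skew (and standardizes by default),
# then K-Means groups the transformed wallets into 4 clusters.
pipeline = make_pipeline(
    PowerTransformer(method="yeo-johnson"),
    KMeans(n_clusters=4, n_init=10, random_state=42),
)
labels = pipeline.fit_predict(X)

# Number of wallets assigned to each cluster ID.
print(np.bincount(labels, minlength=4))
```

Chaining the transformer and the clusterer in one pipeline means new wallets are transformed with the same fitted parameters before they are assigned to a cluster.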
Cluster Protocol currently supports Ethereum Mainnet (L1) only.
- Supported: Ethereum (0x...).
- Not Supported: L2s (Arbitrum, Optimism, Base), Sidechains (Polygon), or Non-EVM chains (Solana, Bitcoin).
- Note: The engine analyzes only the last 2 years of DeFi/NFT history, for relevance and speed.
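Because only Ethereum addresses are supported, a caller may want to reject malformed input before querying. The helper below is a hypothetical sketch (the name `is_ethereum_address` is not from the project). It checks only the `0x` plus 40 hex characters format; it does not verify the EIP-55 checksum, and it cannot tell Mainnet apart from other EVM chains that share the same address format.

```python
import re

# "0x" followed by exactly 40 hexadecimal characters.
_ETH_ADDRESS = re.compile(r"0x[0-9a-fA-F]{40}")


def is_ethereum_address(value: str) -> bool:
    """Return True if `value` has the shape of an Ethereum address."""
    return _ETH_ADDRESS.fullmatch(value) is not None
```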
- Python 3.10+
- Pandas & NumPy (Data manipulation)
- Scikit-Learn (Clustering & Preprocessing)
- Matplotlib & Seaborn (Visualization)
- FastAPI (Inference API)
- Dune API (Data ingestion)
- GitHub Actions (CI/CD & Automation)
cluster/
├── data/ # Dataset storage
├── docs/ # Visualizations & Images
├── notebooks/ # Jupyter notebooks for EDA and modeling
├── src/ # Core logic (Inference Engine)
├── .github/workflows/ # Automated retraining workflows
├── app.py # FastAPI Endpoint
├── predict.py # CLI Inference Tool
├── train.py # Production training pipeline
├── request.py # Script to fetch data from Dune
├── README.md # Project documentation
└── PROJECT_LOG.md # Engineering log & decision records
The model identified 4 distinct behavioral clusters:
- Ultra-Whales / Institutional & Exchange Wallets (Cluster 3)
  - Characteristics: Massive volume, extremely high transaction counts.
- Active Retail Users / Everyday Traders (Cluster 2)
  - Characteristics: Consistent daily activity, moderate volume.
- High-Frequency Bots / Automated Traders (Cluster 1)
  - Characteristics: High transaction count but low human-like variety.
- High-Value NFT & Crypto Traders (Degen Whales) (Cluster 0)
  - Characteristics: High risk, high NFT volume, specialized activity.
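K-Means itself produces only hard assignments, so per-persona confidence scores have to be derived from the fitted model. The source does not say how this is done. The sketch below assumes one common approach, a softmax over negative distances to the centroids, and the function name `persona_confidences` is hypothetical. The toy model is fitted on random data, so its cluster numbering does not correspond to the real personas.

```python
import numpy as np
from sklearn.cluster import KMeans

# Persona names taken from the cluster list above.
PERSONAS = {
    0: "High-Value NFT & Crypto Traders (Degen Whales)",
    1: "High-Frequency Bots / Automated Traders",
    2: "Active Retail Users / Everyday Traders",
    3: "Ultra-Whales / Institutional & Exchange Wallets",
}


def persona_confidences(model: KMeans, x: np.ndarray) -> dict[str, float]:
    """Convert centroid distances into pseudo-probabilities (assumed method)."""
    # transform() returns the distance from the sample to each centroid.
    distances = model.transform(x.reshape(1, -1))[0]
    # Softmax over negative distances: a closer centroid gets a higher score.
    weights = np.exp(-(distances - distances.min()))
    probs = weights / weights.sum()
    return {PERSONAS[i]: float(p) for i, p in enumerate(probs)}


# Toy model on random data, for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
model = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)

scores = persona_confidences(model, X[0])
best = max(scores, key=scores.get)
print(best, round(scores[best], 2))
```

The scores sum to 1, and the highest one always belongs to the cluster that `model.predict` would return for the same wallet.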
- Python 3.10+
- uv (recommended)
- Dune Analytics API Key (for fetching new data)
git clone <repo-url>
cd cluster
uv sync
Create a .env file with your API key:
DUNE_API_KEY=your_key_here
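How the project reads this file is not shown here. As a hedged sketch, the key can be resolved with the standard library alone: check the process environment first, then fall back to parsing `.env`. The helper names `load_env_file` and `get_dune_api_key` are hypothetical, and the parser handles only simple `KEY=VALUE` lines.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict[str, str]:
    """Parse simple KEY=VALUE lines; skip blanks, comments, and malformed lines."""
    values: dict[str, str] = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for line in env_path.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_dune_api_key(path: str = ".env") -> str:
    """Prefer the process environment, then fall back to the .env file."""
    key = os.environ.get("DUNE_API_KEY") or load_env_file(path).get("DUNE_API_KEY", "")
    if not key:
        raise RuntimeError("DUNE_API_KEY is not set in the environment or in .env")
    return key
```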
Run the production pipeline to train K-Means and save artifacts (kmeans_model.pkl, wallet_power_transformer.pkl).
uv run train.py
Classify a specific wallet (or row from the dataset) and see confidence scores.
uv run predict.py --row 0
# Output:
# Cluster: 3
# Persona: Ultra-Whales / Institutional
# Confidence: Ultra-Whales: 0.52, Retail: 0.26...
Start the FastAPI server for real-time inference.
uv run uvicorn app:app --reload
Analyze a specific wallet (Fetch + Predict):
curl "http://localhost:8000/analyze/0x123...abc"
Generate fresh t-SNE and Radar charts.
uv run visualize_clusters.py
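The `curl` call to the `/analyze/` endpoint shown earlier can also be made from Python. The client below is a minimal sketch using only the standard library. It assumes the server runs at `http://localhost:8000` and that the endpoint returns a JSON object; the response schema is not documented here, so the return value is treated as an opaque dictionary. The function names are hypothetical.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

BASE_URL = "http://localhost:8000"  # assumed local uvicorn address


def analyze_url(address: str, base_url: str = BASE_URL) -> str:
    """Build the /analyze/<address> URL for a wallet."""
    return f"{base_url.rstrip('/')}/analyze/{quote(address, safe='')}"


def analyze_wallet(address: str, base_url: str = BASE_URL, timeout: float = 30.0) -> dict:
    """Call the API and decode the JSON body (response schema is an assumption)."""
    with urlopen(analyze_url(address, base_url), timeout=timeout) as response:
        return json.load(response)
```

With the server running, `analyze_wallet("<wallet address>")` would return whatever JSON the endpoint produces for that wallet.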
