Below is a README for HodlIntel, a crypto price prediction project aimed at providing 3-month outlooks on new and existing coins based on a rich set of fundamental and community-driven metrics.
HodlIntel is a predictive analytics platform designed to forecast cryptocurrency prices over a 3-month horizon. By unifying multiple data sources—tokenomics, social sentiment, exchange distribution, developer activity, market conditions, liquidity, and whale activity—this project aims to deliver actionable insights for both new and seasoned crypto enthusiasts.
- Project Overview
- Database Schema
- Data Flow & Architecture
- Modeling Approach
- Setup & Installation
- Usage
- Roadmap
- License
HodlIntel provides a medium-term (3 months) forecast of crypto prices. While short-term predictions can be overly reactive to daily market noise, and long-term predictions can be fraught with broad uncertainties, a 3-month window strikes a balance, capturing fundamental growth patterns without being overwhelmed by extreme market cycles.
- Unified Data Model: Consolidates crypto-related metrics across multiple dimensions (e.g., tokenomics, community activity, whale activity).
- Interactive "What If" Dashboard: Users can adjust factors (like community size or developer activity) to see how the predicted price might change.
- Multi-Coin Support: Easily extendable to handle multiple coins, each stored under a unique coin_id.
HodlIntel uses a relational database with the following tables:
- Coins: Central reference for each coin, including id (PK), full_name, handler, unique_id.
- Tokenomics: Market cap, circulating supply, etc.
  coin_id (FK to coins.id)
- ExchangeDistribution: Number of exchanges listing the coin, coverage on major exchanges, etc.
  coin_id (FK to coins.id)
- SocialSentiment: Social media hype, influencer support, etc.
  coin_id (FK to coins.id)
- CommunityActivity: Reddit size, Discord members, Twitter size, etc.
  coin_id (FK to coins.id)
- Development: Developer activity, partnerships, roadmap clarity, etc.
  coin_id (FK to coins.id)
- MarketConditions: Macro-economic factors, regulatory climate, etc.
  coin_id (FK to coins.id)
- Liquidity: Daily trading volume, liquidity on exchanges.
  coin_id (FK to coins.id)
- Whalecomics: Whale statistics, ratio of whale holdings to total market.
  coin_id (FK to coins.id)
- Price: Historical daily price data per coin.
  coin_id (FK to coins.id)
ER Diagram (Conceptual)
Coins ---< Tokenomics
      \--< ExchangeDistribution
      \--< SocialSentiment
      \--< CommunityActivity
      \--< Development
      \--< MarketConditions
      \--< Liquidity
      \--< Whalecomics
      \--< Price
For convenience, each table has a date column to align daily metrics.
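As a minimal sketch of the schema above, the following creates the Coins table and two of the metric tables in an in-memory SQLite database. It uses Python's built-in sqlite3 module rather than the project's own models, and the metric column names (market_cap, circulating_supply, close) and column types are assumptions chosen for illustration; only id, full_name, handler, unique_id, coin_id, and date come from the description.

```python
import sqlite3

# Column names other than id, full_name, handler, unique_id, coin_id and date
# are illustrative assumptions, not the project's actual schema.
SCHEMA = """
CREATE TABLE coins (
    id        INTEGER PRIMARY KEY,
    full_name TEXT NOT NULL,
    handler   TEXT NOT NULL,
    unique_id TEXT NOT NULL UNIQUE
);
CREATE TABLE tokenomics (
    coin_id            INTEGER NOT NULL REFERENCES coins(id),
    date               TEXT    NOT NULL,  -- ISO date, e.g. 2024-01-31
    market_cap         REAL,
    circulating_supply REAL,
    PRIMARY KEY (coin_id, date)
);
CREATE TABLE price (
    coin_id INTEGER NOT NULL REFERENCES coins(id),
    date    TEXT    NOT NULL,
    close   REAL    NOT NULL,
    PRIMARY KEY (coin_id, date)
);
"""


def create_schema(conn: sqlite3.Connection) -> None:
    """Create the sketch tables and turn on foreign-key enforcement."""
    # SQLite leaves foreign-key checks off unless asked for per connection.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)


conn = sqlite3.connect(":memory:")
create_schema(conn)
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)
```

Using (coin_id, date) as the primary key of each metric table is one way to guarantee a single row per coin per day, which keeps the later merge on those two columns unambiguous.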
- Data Ingestion
  - Each metric table (e.g. Tokenomics, SocialSentiment) receives daily or periodic updates for each coin via an ETL pipeline or direct data entry.
  - The Price table is updated with daily closing prices, or intraday intervals if needed.
- Data Consolidation
  - A unifying script or model queries all tables by (coin_id, date) to merge features into a single dataset. This dataset becomes the basis for training or inference.
- Model Training
  - Data is split by time (e.g., train on historical data prior to a cutoff date, then validate on the subsequent 3-month period).
  - A regression or advanced ML model is trained to forecast price 3 months out.
- Prediction & Dashboard
  - Once the model is trained, new or hypothetical data points (like "If the Reddit community size doubles...") can be fed in to get a predicted price in 3 months.
  - A dashboard or web app (e.g., Streamlit or Dash) provides a user-friendly interface for these “what-if” scenarios.
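The consolidation and time-based split steps can be sketched as follows. This is an illustration under assumptions, not the project's pipeline: the table name community_activity, the columns market_cap, reddit_members, and close, and the sample rows are all hypothetical, and only two metric tables are joined to keep the query short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables and toy rows, for illustration only.
conn.executescript("""
CREATE TABLE tokenomics (coin_id INTEGER, date TEXT, market_cap REAL);
CREATE TABLE community_activity (coin_id INTEGER, date TEXT, reddit_members INTEGER);
CREATE TABLE price (coin_id INTEGER, date TEXT, close REAL);
INSERT INTO tokenomics VALUES (1, '2024-01-01', 1.0e9), (1, '2024-06-01', 1.5e9);
INSERT INTO community_activity VALUES (1, '2024-01-01', 50000), (1, '2024-06-01', 80000);
INSERT INTO price VALUES (1, '2024-01-01', 2.0), (1, '2024-06-01', 3.0);
""")

# One output row per (coin_id, date) that is present in every joined table.
MERGE_SQL = """
SELECT p.coin_id, p.date, t.market_cap, c.reddit_members, p.close
FROM price AS p
JOIN tokenomics AS t ON t.coin_id = p.coin_id AND t.date = p.date
JOIN community_activity AS c ON c.coin_id = p.coin_id AND c.date = p.date
ORDER BY p.coin_id, p.date
"""


def load_features(conn):
    """Merge the metric tables into a single list of feature rows."""
    return conn.execute(MERGE_SQL).fetchall()


def split_by_time(rows, cutoff):
    """Rows dated before `cutoff` go to training, the rest to validation."""
    # ISO-formatted date strings sort chronologically, so string comparison works.
    train = [r for r in rows if r[1] < cutoff]
    valid = [r for r in rows if r[1] >= cutoff]
    return train, valid


rows = load_features(conn)
train, valid = split_by_time(rows, "2024-04-01")
print(len(train), len(valid))
```

The inner joins drop any date that is missing from one of the tables. If sparse metrics are expected, a LEFT JOIN from price with explicit handling of missing values may be a better fit.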
- Linear Regression or Tree-Based Models: For interpretability and robust handling of mixed data, you might use:
  - Linear Regression: Yields a direct coefficient for each feature, which makes the model easy to interpret and easy to use for “what-if” scenarios.
  - Random Forest / Gradient Boosting: Can capture nonlinear interactions. Tools like SHAP can help interpret feature contributions.
- Time Series: If strictly forecasting future prices, methods like ARIMA or Prophet could incorporate time-series aspects. Here, though, the data is closer to tabular regressors plus historical price, so a regression approach often works well.
Target Variable: The coin price 3 months from t.
Feature Set: Values from all metric tables at time t (or an average over a lookback window).
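To make the target and feature definitions concrete, here is a small sketch of how the training pairs might be built for one coin. It assumes that "3 months" is approximated as 90 days and that metrics are held in plain dictionaries keyed by date; the function names build_targets and lookback_mean are hypothetical.

```python
from datetime import date, timedelta

HORIZON = timedelta(days=90)  # assumption: "3 months" approximated as 90 days


def build_targets(prices, horizon=HORIZON):
    """Map each date t to (price at t, price at t + horizon).

    `prices` is a dict {date: closing price} for a single coin. Dates whose
    future price is not known yet are skipped, so they never become
    training rows with a missing target.
    """
    pairs = {}
    for t, price_now in prices.items():
        future = prices.get(t + horizon)
        if future is not None:
            pairs[t] = (price_now, future)
    return pairs


def lookback_mean(series, t, window_days=7):
    """Average of the values observed in the `window_days` days ending at t."""
    days = (t - timedelta(days=k) for k in range(window_days))
    values = [series[d] for d in days if d in series]
    return sum(values) / len(values) if values else None


prices = {
    date(2024, 1, 1): 2.0,
    date(2024, 3, 31): 3.0,  # exactly 90 days after 2024-01-01
    date(2024, 4, 1): 3.5,
}
targets = build_targets(prices)

reddit_members = {date(2024, 1, 1): 10, date(2024, 1, 2): 20, date(2024, 1, 3): 30}
avg = lookback_mean(reddit_members, date(2024, 1, 3), window_days=2)
```

Only 2024-01-01 produces a target here, because the other two dates have no price recorded 90 days later.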
- Python 3.8+
- Virtual Environment (recommended)
- Database: SQLite, PostgreSQL, or any SQLAlchemy-compatible database.
- Clone the Repository:
    git clone https://github.com/yourusername/hodlintel.git
    cd hodlintel
- Create a Virtual Environment (optional but recommended):
    python3 -m venv venv
    source venv/bin/activate  # macOS / Linux
    # or venv\Scripts\activate on Windows
- Install Dependencies:
    pip install -r requirements.txt
- Database Setup:
  - Update the DATABASE_URL in your code to match your local or remote DB settings.
  - Create tables:
      from your_app.models import Base, engine
      Base.metadata.create_all(engine)
- Insert or update rows in the relevant tables (tokenomics, price, etc.) for each coin.
- Ensure all tables include the correct coin_id linking back to the coins table.
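One way to handle "insert or update" is an upsert keyed on (coin_id, date), sketched below with the built-in sqlite3 module. The column name close, the helper upsert_price, and the sample coin are hypothetical, and the ON CONFLICT clause assumes SQLite 3.24 or newer; other databases have their own upsert syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforce the coin_id link to coins
conn.executescript("""
CREATE TABLE coins (id INTEGER PRIMARY KEY, full_name TEXT, handler TEXT);
CREATE TABLE price (
    coin_id INTEGER NOT NULL REFERENCES coins(id),
    date    TEXT    NOT NULL,
    close   REAL    NOT NULL,
    PRIMARY KEY (coin_id, date)
);
""")


def upsert_price(conn, coin_id, day, close):
    """Insert a daily price, or overwrite it if that (coin_id, date) exists."""
    conn.execute(
        """INSERT INTO price (coin_id, date, close) VALUES (?, ?, ?)
           ON CONFLICT (coin_id, date) DO UPDATE SET close = excluded.close""",
        (coin_id, day, close),
    )


conn.execute("INSERT INTO coins VALUES (1, 'Example Coin', 'EXM')")
upsert_price(conn, 1, "2024-01-01", 2.0)
upsert_price(conn, 1, "2024-01-01", 2.5)  # same day again: updates the row
stored = conn.execute("SELECT coin_id, date, close FROM price").fetchall()

try:
    upsert_price(conn, 42, "2024-01-01", 1.0)  # coin 42 is not in coins
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```

With foreign keys enforced, a row whose coin_id does not exist in coins is rejected at write time instead of silently becoming an orphan.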
- Run a script (e.g., train_model.py) that:
  - Joins all tables on (coin_id, date).
  - Splits data into training and validation sets based on time.
  - Trains a model (Linear Regression, XGBoost, etc.).
  - Saves the trained model to disk (pickle, joblib, etc.).
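The train-and-save steps might look roughly like the sketch below. To stay dependency-free it fits a one-feature least-squares line by hand as a stand-in for a library model such as the Linear Regression or XGBoost options listed above; the function names, the toy data, and the model.pkl file name are all assumptions.

```python
import pickle
import tempfile
from pathlib import Path


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (one feature)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return {"intercept": mean_y - slope * mean_x, "slope": slope}


def predict(model, x):
    """Apply the fitted line to a single feature value."""
    return model["intercept"] + model["slope"] * x


# Toy training data: feature value at time t, price 3 months later.
train_x = [1.0, 2.0, 3.0, 4.0]
train_y = [1.5, 2.5, 3.5, 4.5]
model = fit_line(train_x, train_y)

# Save the trained model to disk and load it back, as a dashboard would.
model_path = Path(tempfile.mkdtemp()) / "model.pkl"
with model_path.open("wb") as fh:
    pickle.dump(model, fh)
with model_path.open("rb") as fh:
    restored = pickle.load(fh)

print(predict(restored, 5.0))
```

Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.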
- Run streamlit run app.py or the equivalent for your chosen framework.
- Interactively set features like Reddit size, daily trading volume, etc., and get a 3-month predicted price.
- Optionally compare predictions across different “what-if” scenarios.
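Behind the dashboard, a "what-if" comparison for a linear model reduces to re-evaluating the prediction with some features overridden. The sketch below assumes a linear model with made-up coefficients; the feature names, units, and the helpers predict_price and what_if are hypothetical.

```python
def predict_price(coefs, intercept, features):
    """Linear prediction: intercept + sum(coefficient * feature value)."""
    return intercept + sum(coefs[name] * value for name, value in features.items())


def what_if(coefs, intercept, baseline, **overrides):
    """Return (baseline prediction, scenario prediction, difference)."""
    scenario = {**baseline, **overrides}  # overrides replace baseline values
    base = predict_price(coefs, intercept, baseline)
    new = predict_price(coefs, intercept, scenario)
    return base, new, new - base


# Made-up coefficients: price units per thousand Reddit members and
# per million USD of daily volume.
coefs = {"reddit_members_k": 0.02, "daily_volume_musd": 0.1}
intercept = 1.0
baseline = {"reddit_members_k": 50, "daily_volume_musd": 10}

# Scenario: the Reddit community doubles while volume stays the same.
base, new, delta = what_if(coefs, intercept, baseline, reddit_members_k=100)
print(base, new, delta)
```

For a linear model the difference is simply the coefficient times the change in the feature, which is why such scenarios are easy to explain; a tree-based model would need the full prediction re-run for each scenario.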
- API Integration: Automate data ingestion from popular crypto APIs (e.g., CoinGecko, CoinMarketCap).
- Real-Time Updates: Add near real-time ingestion and model updates for fast-evolving new coins.
- Advanced Analytics: Integrate sentiment from Twitter, Reddit posts, or Discord chat logs.
- Multi-Coin Dashboard: Extend the “what-if” dashboard to easily compare predictions across multiple coins side-by-side.
- Model Optimization: Experiment with neural networks or advanced time-series models.
Choose a license that suits your project’s requirements, e.g., MIT or Apache 2.0. Include a LICENSE file in your repository.
If you have questions, suggestions, or want to contribute to HodlIntel, please open an issue or submit a pull request on GitHub.
Happy HODLing!