This project identifies anomalies in the Bitcoin price using a forecasting-based approach. An LSTM model learns patterns in historical prices and predicts the highest transaction price within the next hour. Any significant deviation of the observed price from the predicted price, measured by Mean Absolute Error (MAE), is flagged as a potential anomaly.
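The flagging rule described here can be sketched in a few lines. This is a minimal illustration, not the project's actual code: the function names, the idea of deriving the threshold from a baseline MAE on held-out data, and the multiplier are all assumptions, and the numbers are placeholders rather than real prices.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between observed and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def flag_anomalies(actual, predicted, threshold):
    """Return the indices whose absolute error exceeds the threshold."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return [
        i
        for i, (a, p) in enumerate(zip(actual, predicted))
        if abs(a - p) > threshold
    ]


# Assumption: the threshold is a multiple of the MAE on held-out data.
baseline_mae = mean_absolute_error([100.0, 102.0, 101.0], [101.0, 101.0, 100.0])
threshold = 3 * baseline_mae  # hypothetical multiplier

# Placeholder values: only the second point deviates strongly.
anomalies = flag_anomalies([100.5, 150.0, 99.0], [101.0, 102.0, 100.0], threshold)
```

A larger multiplier makes the detector more conservative; the right value depends on how noisy the forecasts are.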
The data_generator.py module drives the real-time simulation. It creates a database table if one doesn't already exist and populates it with historical Bitcoin transaction data from 2021.
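The "create the table if it is missing, then load the data" step might look like the sketch below. To keep it runnable without a database server, it uses Python's built-in sqlite3 as a stand-in for PostgreSQL. The table name, column names, and sample rows are hypothetical placeholders, not the schema or data used by data_generator.py.

```python
import sqlite3


def ensure_table_and_load(conn, rows):
    """Create the table if it is missing and load rows only when it is empty."""
    # CREATE TABLE IF NOT EXISTS is also valid PostgreSQL syntax.
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS btc_hourly (
            ts TEXT PRIMARY KEY,
            high REAL NOT NULL
        )
        """
    )
    (count,) = conn.execute("SELECT COUNT(*) FROM btc_hourly").fetchone()
    if count == 0:
        conn.executemany(
            "INSERT INTO btc_hourly (ts, high) VALUES (?, ?)", rows
        )
    conn.commit()
    (count,) = conn.execute("SELECT COUNT(*) FROM btc_hourly").fetchone()
    return count


# Placeholder rows for illustration only, not real prices.
sample_rows = [("2021-01-01 00:00", 1.0), ("2021-01-01 01:00", 2.0)]
connection = sqlite3.connect(":memory:")
first_count = ensure_table_and_load(connection, sample_rows)
second_count = ensure_table_and_load(connection, sample_rows)  # no duplicates
```

Calling the function twice leaves the row count unchanged, which is the property that makes restarting the generator safe.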
The core objective is to forecast the peak Bitcoin value within each hour and subsequently identify any market anomalies. To mimic a continuous data stream, the generator transmits data to the /predict_anomaly endpoint (established by the Predictor module) at a frequency specified in the config.py file.
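The streaming loop can be sketched as follows, using only the standard library. This is an assumption-laden illustration: the function names, the payload shape, the host and port, and the SEND_INTERVAL_SECONDS setting name are hypothetical. Only the /predict_anomaly path comes from the description above.

```python
import json
import time
import urllib.request


def post_json(url, payload):
    """POST a JSON payload and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status


def stream_rows(rows, send, interval_seconds):
    """Send rows one at a time, waiting after each to mimic a live feed."""
    sent = 0
    for row in rows:
        send(row)
        sent += 1
        time.sleep(interval_seconds)
    return sent


# Hypothetical usage against a locally running Predictor:
# url = "http://localhost:5000/predict_anomaly"
# stream_rows(rows, lambda row: post_json(url, row), SEND_INTERVAL_SECONDS)
```

Passing the sender in as a function keeps the loop independent of the transport, so it can be exercised without a running server.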
Features:
- Real-time Anomaly Detection: Simulates real-time data streaming to detect anomalies as they occur.
- LSTM-Based Forecasting: Employs a Long Short-Term Memory (LSTM) neural network to capture complex patterns in Bitcoin price data.
- Data-Driven Insights: Leverages historical Bitcoin transaction data for model training and anomaly identification.
- Streamlit Visualization: Provides an interactive Streamlit application to display the price chart with highlighted anomalies.
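LSTM forecasters are commonly trained on fixed-length windows of past values, each paired with the value that follows. The sketch below shows that windowing step in plain Python. The function name and window length are hypothetical, and the preprocessing in the project's notebook may differ.

```python
def make_windows(series, window_size):
    """Split a series into (input window, next value) training pairs."""
    if window_size < 1:
        raise ValueError("window_size must be at least 1")
    inputs, targets = [], []
    for start in range(len(series) - window_size):
        inputs.append(series[start:start + window_size])
        targets.append(series[start + window_size])
    return inputs, targets


# Placeholder series: each window of 3 values predicts the 4th.
windows, next_values = make_windows([1, 2, 3, 4, 5], window_size=3)
```

A series shorter than or equal to the window length yields no pairs, so callers should check for empty output before training.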
Technologies:
- Machine Learning: PyTorch Lightning
- Web Framework: Flask
- Data Visualization: Streamlit
- Database: PostgreSQL
Prerequisites:
- PostgreSQL 16: Install PostgreSQL 16 from the official website: https://www.postgresql.org/
- Python 3.12.2: Ensure you have Python 3.12.2 installed.
Installation and Setup:
- Clone the repository:
      git clone https://github.com/Cripry/Data-Stream-Anomaly-Detection.git
- Create a virtual environment (recommended):
      python -m venv anomaly_env
      source anomaly_env/bin/activate  # On Windows, use anomaly_env\Scripts\activate
- Install dependencies:
      pip install -r requirements.txt
- Configure Database:
  - Create a PostgreSQL database, user, and password.
  - Update the config.py file with your database credentials:
        DB_NAME = "your_database_name"
        DB_USER = "your_database_user"
        DB_PASSWORD = "your_database_password"
        DB_HOST = "your_database_host"
        DB_PORT = "your_database_port"
- Run the Streamlit app:
  After 3-5 minutes, you will be able to see the anomalies.
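The DB_* settings from the configuration step can be combined into a single PostgreSQL connection URI, which most Python database drivers accept. The helper below is a sketch: the function name is hypothetical, and the project may pass the settings to its driver individually instead. The user, password, and database name are percent-encoded so that characters such as @ or / do not break the URI.

```python
from urllib.parse import quote


def build_postgres_uri(name, user, password, host, port):
    """Build a postgresql:// URI from individual connection settings."""
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{quote(name, safe='')}"
    )


# Placeholder credentials for illustration only.
uri = build_postgres_uri("btc", "alice", "p@ss/word", "localhost", "5432")
```

Keeping the real values out of version control (for example, by reading them from environment variables in config.py) avoids committing credentials by accident.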
Project structure:
- anomaly_detection: Core project code for data handling, model, and prediction
- data: Contains the Bitcoin transaction dataset (BTC_Hourly_2021.csv)
- Jupyter Notebooks: Jupyter notebook used for model training and development
- utils: Stores model weights (BTC_Model.pth) and scalers
The project uses historical hourly Bitcoin transaction data from 2021, available on Kaggle.
This project is licensed under the Apache License 2.0.
For any questions or feedback, feel free to reach out to cristianpreguza@gmail.com.

