## Table of Contents

- Overview
- Features
- Prerequisites
- Installation
- Configuration
- API Credentials
- Usage
- Project Structure
- Extending the Project
- Testing
- Monitoring and Logging
- Security Notes
- Performance Optimization
- Contribution Guidelines
- Troubleshooting
- FAQ
- Internal Resources
- Support
- License
## Overview

This project implements a sophisticated sentiment analysis system that combines market data with social media sentiment to predict cryptocurrency price movements. It's designed to give our exchange a competitive edge by incorporating real-time public sentiment into our trading strategies. By leveraging advanced natural language processing techniques and deep learning models, we aim to capture market sentiment and its impact on cryptocurrency prices more accurately than traditional technical analysis alone.
## Features

- Data Collection:
  - Integration with Twitter API for real-time tweet collection
  - Reddit data scraping for cryptocurrency-related posts
  - Historical and real-time market data retrieval from major exchanges
- Sentiment Analysis:
  - Advanced NLP model fine-tuned for crypto-specific sentiment analysis
  - Multi-lingual support for global market sentiment capture
- Market Prediction:
  - Multi-input neural network combining market data and sentiment scores (a minimal sketch follows this list)
  - Real-time prediction system for multiple cryptocurrencies
  - Customizable prediction timeframes
- Visualization:
  - Interactive dashboard for visualizing predictions and market trends
  - Detailed charts and metrics for in-depth analysis
- Extensibility:
  - Modular design for easy integration of new data sources and models
  - Configurable parameters for fine-tuning the system
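The multi-input network's actual architecture lives in `models/multi_input_nn.py` and is configured via `model_params`, so the sketch below is only an illustration of the general two-branch pattern; the layer sizes, input names, and binary "direction" head are assumptions, not the project's real design.

```python
# Illustrative two-branch model in the spirit of models/multi_input_nn.py.
# Layer sizes and the sigmoid "direction" head are assumptions.
from tensorflow.keras import Model, layers

def build_multi_input_model(n_market: int, n_sentiment: int) -> Model:
    market_in = layers.Input(shape=(n_market,), name="market")          # OHLCV/indicators
    sentiment_in = layers.Input(shape=(n_sentiment,), name="sentiment")  # aggregated scores

    m = layers.Dense(64, activation="relu")(market_in)
    s = layers.Dense(16, activation="relu")(sentiment_in)

    merged = layers.concatenate([m, s])  # fuse both branches
    out = layers.Dense(1, activation="sigmoid", name="direction")(merged)
    return Model(inputs=[market_in, sentiment_in], outputs=out)

model = build_multi_input_model(n_market=10, n_sentiment=3)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```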
## Prerequisites

- Python 3.8+
- TensorFlow 2.4+
- PyTorch 1.9+
- CUDA-capable GPU (recommended for faster training and inference)
- Access to Twitter and Reddit APIs
- Account with a supported cryptocurrency exchange (e.g., Binance)
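If you plan to use GPU acceleration, a quick check like the following (not part of the repository) confirms that both frameworks can see your CUDA device:

```python
# Quick environment check: prints framework versions and visible GPUs.
import tensorflow as tf
import torch

print("TensorFlow", tf.__version__, "GPUs:", tf.config.list_physical_devices("GPU"))
print("PyTorch", torch.__version__, "CUDA available:", torch.cuda.is_available())
```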
## Installation

1. Clone the repository:

   ```bash
   git clone https://github.com/selimozten/mmsag.git
   ```

2. Navigate to the project directory:

   ```bash
   cd mmsag
   ```

3. Create and activate a virtual environment:

   ```bash
   python -m venv venv
   source venv/bin/activate  # On Windows, use `venv\Scripts\activate`
   ```

4. Install the required packages:

   ```bash
   pip install -r requirements.txt
   ```

5. (Optional) Install GPU-enabled builds for faster training and inference. The correct packages depend on your CUDA version, so follow the official TensorFlow and PyTorch installation guides; for example:

   ```bash
   pip install tensorflow-gpu
   pip install torch --index-url https://download.pytorch.org/whl/cu118
   ```
## Configuration

1. Copy the example configuration file:

   ```bash
   cp example.config.yml config.yml
   ```

2. Edit `config.yml` to customize the project settings:

   - `symbols`: List of cryptocurrency pairs to analyze
   - `timeframe`: Data timeframe for market analysis
   - `sentiment_window`: Time window for sentiment analysis
   - `model_params`: Neural network architecture parameters
   - `training`: Parameters for model training
   - `prediction`: Settings for real-time predictions
   - `logging`: Configuration for application logs
   - `dashboard`: Settings for the Streamlit dashboard

Refer to the comments in `example.config.yml` for detailed explanations of each setting.
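For reference, reading the file from your own scripts is straightforward with PyYAML; the example values in the comments below are assumptions, since the authoritative schema is `example.config.yml`:

```python
# Minimal sketch of loading config.yml; example values are assumptions.
import yaml

with open("config.yml") as f:
    config = yaml.safe_load(f)

symbols = config["symbols"]                    # e.g. ["BTC/USDT", "ETH/USDT"]
timeframe = config["timeframe"]                # e.g. "1h"
sentiment_window = config["sentiment_window"]  # e.g. "24h"
```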
## API Credentials

1. Copy the example credentials file:

   ```bash
   cp example.credentials.yml credentials.yml
   ```

2. Edit `credentials.yml` and fill in your API credentials for:

   - Twitter API (consumer key, consumer secret, access token, access token secret)
   - Reddit API (client ID, client secret, user agent)
   - Exchange API (API key, secret key)
   - Sentiment Analysis API (if using a paid service)

**Important:** Never commit your `credentials.yml` file to version control. It is listed in `.gitignore` to prevent accidental commits.
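As an illustration of how these credentials are typically consumed, the sketch below loads `credentials.yml` and builds clients with `tweepy`, `praw`, and `ccxt`; the project's own data modules may use different libraries, and the YAML field names here are assumptions:

```python
# Illustrative only: the libraries and YAML field names are assumptions,
# not necessarily what the data_processing/ modules actually use.
import yaml
import tweepy, praw, ccxt

with open("credentials.yml") as f:
    creds = yaml.safe_load(f)

twitter = tweepy.API(tweepy.OAuth1UserHandler(
    creds["twitter"]["consumer_key"],
    creds["twitter"]["consumer_secret"],
    creds["twitter"]["access_token"],
    creds["twitter"]["access_token_secret"],
))
reddit = praw.Reddit(
    client_id=creds["reddit"]["client_id"],
    client_secret=creds["reddit"]["client_secret"],
    user_agent=creds["reddit"]["user_agent"],
)
exchange = ccxt.binance({
    "apiKey": creds["exchange"]["api_key"],
    "secret": creds["exchange"]["secret_key"],
})
```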
## Usage

1. Collect and analyze sentiment data:

   ```bash
   python collect_sentiment.py
   ```

2. Train the prediction model:

   ```bash
   python train_model.py
   ```

3. Make real-time predictions:

   ```bash
   python predict.py
   ```

4. Run the prediction dashboard:

   ```bash
   streamlit run dashboards/prediction_dashboard.py
   ```

For scheduled runs, consider using cron jobs or a task scheduler appropriate for your operating system.
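If you would rather stay in Python than maintain crontab entries, a scheduler library such as APScheduler can drive the same scripts; the hourly interval below is an arbitrary example, not a project recommendation:

```python
# Hypothetical alternative to cron: run the collection script on a
# schedule with APScheduler (pip install apscheduler).
import subprocess
from apscheduler.schedulers.blocking import BlockingScheduler

def collect():
    subprocess.run(["python", "collect_sentiment.py"], check=True)

scheduler = BlockingScheduler()
scheduler.add_job(collect, "interval", hours=1)  # interval is arbitrary
scheduler.start()
```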
## Project Structure

- `collect_sentiment.py`: Script for collecting and analyzing sentiment data
- `train_model.py`: Script for training the multi-input neural network
- `predict.py`: Script for making real-time predictions
- `models/`: Directory containing model definitions
  - `multi_input_nn.py`: Implementation of the multi-input neural network
- `data_processing/`: Modules for data fetching and preprocessing
  - `twitter_data.py`: Twitter data collection module
  - `reddit_data.py`: Reddit data collection module
  - `market_data.py`: Cryptocurrency market data retrieval module
- `sentiment_analysis/`: Modules for sentiment analysis
  - `model.py`: Sentiment analysis model implementation
- `utils/`: Utility functions and helpers
- `config.yml`: Project configuration file
- `credentials.yml`: API credentials (do not commit this file)
- `tests/`: Directory containing unit tests
- `dashboards/`: Directory containing Streamlit dashboard files
- `logs/`: Directory for log files (created at runtime)
- `example.config.yml`: Example configuration file
- `example.credentials.yml`: Example credentials file
- `.gitignore`: Specifies intentionally untracked files to ignore
- `requirements.txt`: List of Python package dependencies
- `README.md`: This file, containing project documentation
## Extending the Project

To add a new data source:

- Create a new module in `data_processing/` for the new source (a skeleton sketch follows this list)
- Implement the required API calls and data parsing
- Update `collect_sentiment.py` to incorporate the new source
- Modify the model in `models/multi_input_nn.py` if necessary
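As a starting point, a new source module might expose an interface like the following; the file name, function signature, and field names are hypothetical, so match whatever `twitter_data.py` and `reddit_data.py` actually expose:

```python
# data_processing/telegram_data.py -- hypothetical skeleton for a new source.
# Mirror the real interface of twitter_data.py / reddit_data.py in practice.
from datetime import datetime
from typing import Dict, List

def fetch_posts(symbol: str, since: datetime) -> List[Dict]:
    """Return raw posts mentioning `symbol` published after `since`.

    Each dict should carry the fields the sentiment pipeline expects,
    e.g. {"text": ..., "created_at": ..., "source": "telegram"}.
    """
    raise NotImplementedError("Call the new source's API here")
```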
To improve the sentiment analysis:

- Fine-tune the sentiment model on crypto-specific data:

  ```bash
  python fine_tune_sentiment.py
  ```

- Experiment with different NLP models in `sentiment_analysis/model.py` (an illustrative swap follows)
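For example, an alternative Hugging Face model can be tried out through the `transformers` pipeline API; the model name below is a public example rather than the project's default:

```python
# Illustrative only: trying an off-the-shelf Hugging Face model. The model
# name is a public example, not what sentiment_analysis/model.py ships with.
from transformers import pipeline

analyzer = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)
print(analyzer("BTC just broke its all-time high!"))
# [{'label': 'POSITIVE', 'score': ...}]
```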
## Testing

Run the test suite with:

```bash
python -m unittest discover tests
```

Ensure all tests pass before deploying any changes to production. To run the tests with a coverage report:

```bash
coverage run -m unittest discover tests
coverage report -m
```
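For orientation, a test the runner would discover looks like the following; the file name and assertion are hypothetical stand-ins for real checks against the project modules:

```python
# tests/test_sentiment_scores.py -- hypothetical example of a discoverable test.
import unittest

class TestSentimentScores(unittest.TestCase):
    def test_score_in_range(self):
        score = 0.42  # stand-in: replace with a real call into sentiment_analysis
        self.assertGreaterEqual(score, -1.0)
        self.assertLessEqual(score, 1.0)

if __name__ == "__main__":
    unittest.main()
```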
## Monitoring and Logging

- Logs are stored in the `logs/` directory
- Use the `logging` module for consistent log formatting (see the snippet below)
- Monitor model performance using TensorBoard:

  ```bash
  tensorboard --logdir=./logs/tensorboard
  ```
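A consistent baseline setup with the standard `logging` module might look like this; the file name and format string are suggestions, not values the project reads from `config.yml`:

```python
# Suggested baseline logging setup; file name and format are illustrative.
import logging
import os

os.makedirs("logs", exist_ok=True)  # logs/ is created at runtime
logging.basicConfig(
    filename="logs/app.log",
    level=logging.INFO,
    format="%(asctime)s %(name)s %(levelname)s %(message)s",
)
logger = logging.getLogger(__name__)
logger.info("Sentiment collection started")
```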
## Security Notes

- Never commit `credentials.yml` or any file containing API keys
- Rotate API keys regularly and update them in our secure key management system
- Ensure all data processing follows our data protection guidelines
- Implement proper input validation and sanitization, especially for user inputs in the dashboard (a small example follows this list)
- Regularly update dependencies to patch security vulnerabilities
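For the dashboard, validation can be as simple as whitelisting the expected shape of a trading pair before the input reaches any query or API call; the pattern below is illustrative:

```python
# Illustrative input validation for a user-supplied trading pair.
import re

SYMBOL_RE = re.compile(r"^[A-Z0-9]{2,10}/[A-Z0-9]{2,10}$")

def validate_symbol(user_input: str) -> str:
    symbol = user_input.strip().upper()
    if not SYMBOL_RE.fullmatch(symbol):
        raise ValueError(f"Invalid trading pair: {user_input!r}")
    return symbol
```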
## Performance Optimization

- Use batch processing for large datasets
- Implement caching mechanisms for frequently accessed data (a minimal sketch follows this list)
- Optimize database queries and indexes
- Consider distributing workloads across multiple machines for scalability
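For in-process caching, the standard library already covers the common case; `cached_ohlcv` below is a stand-in for whatever `market_data.py` actually exposes:

```python
# Minimal caching sketch with functools.lru_cache; cached_ohlcv is a
# stand-in for the real market-data call, not the project's API.
from functools import lru_cache
import time

@lru_cache(maxsize=256)
def cached_ohlcv(symbol: str, timeframe: str) -> tuple:
    # Simulates a slow exchange call; repeated (symbol, timeframe)
    # requests are served from the in-process cache.
    time.sleep(1)
    return (symbol, timeframe, "ohlcv-data")
```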
## Contribution Guidelines

1. Fork the repository
2. Create a feature branch (`git checkout -b feature/AmazingFeature`)
3. Commit your changes (`git commit -m 'Add some AmazingFeature'`)
4. Push to the branch (`git push origin feature/AmazingFeature`)
5. Open a Pull Request

Please ensure your code adheres to our coding standards and includes appropriate tests.
## Troubleshooting

- Check the `logs/` directory for detailed error messages
- Ensure all API credentials are correct and have the necessary permissions
- Verify that your system meets all the prerequisites, including GPU drivers if using GPU acceleration
- For common issues, refer to the FAQ section
## FAQ

**Q: How often should I retrain the model?**
A: We recommend retraining the model weekly, or whenever there is a significant change in market conditions.

**Q: Can I use this system for high-frequency trading?**
A: The current implementation is not optimized for high-frequency trading; it is designed for short- to medium-term predictions.
## Internal Resources

- For a detailed explanation of the sentiment analysis algorithm, see `docs/sentiment_algorithm.md`
- For guidelines on model deployment, refer to our MLOps playbook in the company wiki
- API documentation can be found in `docs/api_reference.md`
## Support

For issues or feature requests, please create an issue in this repository or contact the Data Science team directly at ozten@inpocket.ai.
## License

This project is licensed under the MIT License - see the LICENSE.md file for details.