This project aims to mitigate the volatility risk of cryptocurrency investment by incorporating media hype, measured through Google Trends, into its price prediction models.
- Clone the repository
    $ git clone https://github.com/crystalcheong/crypto-genie.git
- Install dependencies with pip
    $ pip install -r requirements.txt
📂 Project Structure
📦crypto-genie
┣ 📂data
┃ ┣ 📂searchTrends
┃ ┣ 📜BTC-SearchTrend.csv
┃ ┗ 📜README.md
┣ 📂metrics
┃ ┗ 📜README.md
┣ 📂models
┣ 📜0_DataScraper.ipynb
┣ 📜1_DataAnalysis.ipynb
┣ 📜2_UnivariateForecast.ipynb
┣ 📜3_MultivariateForecast.ipynb
┣ 📜README.md
┗ 📜requirements.txt
/data
- Stores all collected data used by the notebooks
/metrics
- Contains the exported accuracy and efficacy measurements
/models
- Contains the exported pre-trained models
- Financial Data - Yahoo Finance / yfinance
- Search Trends Data - Google Trends / pytrends
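The two sources above are combined by date before modelling. The following is a minimal sketch of that join, assuming both sources have already been loaded into pandas DataFrames indexed by date. The values are synthetic, and the `SearchTrend` column name is a hypothetical placeholder rather than the repository's actual schema.

```python
import pandas as pd

# Synthetic stand-ins for the two sources. In practice the price frame would
# come from yfinance and the trend frame from pytrends.
dates = pd.date_range("2021-01-01", periods=5, freq="D")
prices = pd.DataFrame(
    {
        "Open": [29000.0, 29400.0, 32100.0, 33000.0, 32000.0],
        "Close": [29400.0, 32100.0, 33000.0, 32000.0, 34000.0],
    },
    index=dates,
)
prices.index.name = "Date"

# The trend series deliberately misses one day (2021-01-04) to show how the
# join behaves. "SearchTrend" is a hypothetical column name.
trends = pd.DataFrame({"SearchTrend": [55, 61, 78, 90]}, index=dates.delete(3))
trends.index.name = "Date"

# An inner join on the date index keeps only the days present in both sources.
merged = prices.join(trends, how="inner")

# Check for null values before exporting.
has_nulls = bool(merged.isnull().values.any())

# Without a path, to_csv returns the CSV text; pass a file path to write to disk.
csv_text = merged.to_csv()
```

An inner join is only one option. A left join followed by forward-filling would keep every trading day, at the cost of repeating the most recent trend value.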
📚 Notebooks Overview
Each notebook's filename is prefixed with its position in the analysis pipeline, and each notebook can be run on its own.
- 0_DataScraper.ipynb
  - Retrieves price information on Bitcoin (BTC-USD) from Yahoo Finance
  - Retrieves an annual archive of Google search trends using the pytrends package
  - Merges the two datasets and exports the result as BTC-SearchTrend.csv
- 1_DataAnalysis.ipynb
  - Displays an overview of the dataset and checks for null values
  - Performs exploratory data analysis on the BTC-SearchTrend.csv dataset
- 2_UnivariateForecast.ipynb
  - Uses one stock ticker variable (OPEN / CLOSE / HIGH / LOW) as the model data
  - Initialises, trains and predicts Bitcoin (BTC-USD) valuation with the following machine learning models:
    - Rolling-Forecast ARIMA
    - XGBRegressor
    - LSTM
  - Summarises and compares the performance of all previously run models
- 3_MultivariateForecast.ipynb
  - Uses all stock ticker and search trend variables as the model data
  - Initialises, trains and predicts Bitcoin (BTC-USD) valuation with the following machine learning model:
    - LSTM
  - Summarises and compares the performance of all previously run models
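The rolling-forecast idea named in the univariate notebook can be sketched as a walk-forward loop: fit on the history so far, predict one step ahead, then add the observed value to the history and repeat. The sketch below is illustrative only. It swaps in a simple least-squares AR(1) fit as a stand-in for an ARIMA model, uses synthetic prices, and all function names are hypothetical rather than taken from the notebooks.

```python
import math


def ar1_forecast(history):
    """One-step forecast from a least-squares AR(1) fit: y[t] = a + b * y[t-1]."""
    if len(history) < 3:  # too little data to fit: fall back to the last value
        return history[-1]
    x, y = history[:-1], history[1:]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    if var_x == 0:  # constant history: fall back to the last value
        return history[-1]
    b = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / var_x
    a = mean_y - b * mean_x
    return a + b * history[-1]


def rolling_forecast(train, test, forecast_fn=ar1_forecast):
    """Walk forward through `test`, refitting on all data seen so far at each step."""
    history = list(train)
    predictions = []
    for actual in test:
        predictions.append(forecast_fn(history))
        history.append(actual)  # the observed value is available for the next step
    return predictions


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )


# Synthetic closing prices, for illustration only.
closes = [100.0, 102.0, 101.0, 105.0, 107.0, 106.0, 110.0, 112.0, 111.0, 115.0]
train, test = closes[:7], closes[7:]
predictions = rolling_forecast(train, test)
score = rmse(test, predictions)
```

Because `forecast_fn` is a parameter, any one-step forecaster can be evaluated with the same loop and the same error metric, which keeps comparisons between models consistent.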
In conclusion, including media hype from Google Trends resulted in more accurate multivariate forecasts. That said, the Google Trends search percentile is not the only measure of media hype, so the accuracy and efficacy of the prediction models could be improved further by integrating data from other sources, such as user sentiment on Reddit and Twitter.
- Learning Outcomes
  - Interpret stock information
  - Develop and evaluate time series forecasting machine learning models such as
    - Rolling-Forecast ARIMA
    - XGBRegressor
    - LSTM
  - Use external Python libraries such as plotly, statsmodels and tensorflow
  - Collaborate on Google Colab and manage a git repository with GitHub
Crystal Cheong | Yue Chong | Jared Chan
💡 References
This repository is submitted as project work for Nanyang Technological University's SC1015 - Data Science and Artificial Intelligence course.