This is the final assignment of the Data Acquisition and Processing Systems (DaPS) course ELEC0136 at UCL.
The objective of this assignment is to simulate a real-life data-science situation that can be approached using the process described in class: i) finding a source of data, ii) acquiring and storing it, iii) cleaning and preprocessing it, iv) extracting meaningful visualisations, and v) building a model for inference. You are also free to use any additional methods you find well suited to the problem.
The environment and requirements are provided in this repository, and the cleaned data is also included. You can run the main function of the Jupyter notebook directly.
Microsoft (MSFT) stock prices from April 2017 to April 2021 are acquired from the Internet. Two external datasets are collected to help predict the stock price: Microsoft's annual income and US GDP.
All data are stored as CSV files.
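As an illustration of the acquisition and storage steps, the minimal sketch below assumes the `yfinance` package as the download source and a local file name `MSFT.csv`; the actual source and file names used in this repository may differ.

```python
# Minimal acquisition sketch (assumption: data are fetched with yfinance).
import yfinance as yf

# Daily MSFT prices over the study period described above.
msft = yf.download("MSFT", start="2017-04-01", end="2021-05-01")

# Store the raw data as a CSV file, matching the storage format above.
msft.to_csv("MSFT.csv")
```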
Data cleaning: handling missing data and outliers.
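The exact cleaning rules live in the notebook; the sketch below shows one common approach with pandas (linear interpolation for missing values, an IQR rule for outliers). The file name `MSFT.csv` and the column name `Close` are assumptions.

```python
import pandas as pd

df = pd.read_csv("MSFT.csv", index_col=0, parse_dates=True)

# Fill missing values by linear interpolation (one possible strategy).
df["Close"] = df["Close"].interpolate(method="linear")

# Mark values outside 1.5 * IQR as outliers and re-interpolate them.
q1, q3 = df["Close"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = (df["Close"] < q1 - 1.5 * iqr) | (df["Close"] > q3 + 1.5 * iqr)
df.loc[outliers, "Close"] = None
df["Close"] = df["Close"].interpolate(method="linear")
```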
Data visualisation: the general trends of all three datasets are shown in diagrams.
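The actual diagrams are produced in the notebook; a minimal matplotlib sketch of the stock-price trend, reusing the `df` from the cleaning sketch above, could look like this:

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(10, 4))
ax.plot(df.index, df["Close"], label="MSFT close")
ax.set_xlabel("Date")
ax.set_ylabel("Price (USD)")
ax.set_title("MSFT closing price, April 2017 to April 2021")
ax.legend()
plt.show()
```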
Data transformation: including data normalisation.
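Normalisation can be done in several ways; the sketch below assumes scikit-learn's min-max scaling to [0, 1], which keeps the fitted scaler around so predictions can later be mapped back to prices.

```python
from sklearn.preprocessing import MinMaxScaler

# Min-max normalisation of the closing price (one possible scheme).
scaler = MinMaxScaler()
df["Close_scaled"] = scaler.fit_transform(df[["Close"]])

# scaler.inverse_transform(...) recovers the original price scale later.
```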
EDA and hypothesis testing: supporting diagrams are provided.
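The specific hypotheses tested are defined in the notebook; one example of a test that fits this kind of time series is an Augmented Dickey-Fuller test for stationarity (statsmodels). The cleaned file name `MSFT_clean.csv` is an assumption.

```python
import pandas as pd
from statsmodels.tsa.stattools import adfuller

df = pd.read_csv("MSFT_clean.csv", index_col=0, parse_dates=True)  # hypothetical cleaned file

# ADF test: H0 = the price series has a unit root (is non-stationary).
stat, p_value, *_ = adfuller(df["Close"].dropna())
print(f"ADF statistic = {stat:.3f}, p-value = {p_value:.4f}")
```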
Two models are built to predict the MSFT stock price for May 2021 using data from April 2017 to April 2021 (a minimal modelling sketch follows the list):
A model using only the previous stock prices
A model using the stock prices together with the two external data sources
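The actual model architectures are defined in the notebook; as a stand-in for the first (prices-only) model, the sketch below fits a linear regression on lagged closing prices. The file name, column name, and number of lags are assumptions.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

df = pd.read_csv("MSFT_clean.csv", index_col=0, parse_dates=True)  # hypothetical cleaned file

# Lag features: predict today's close from the previous 5 closes.
n_lags = 5
features = pd.concat(
    {f"lag_{i}": df["Close"].shift(i) for i in range(1, n_lags + 1)}, axis=1
)
features["target"] = df["Close"]
features = features.dropna()

X, y = features.drop(columns="target"), features["target"]
model = LinearRegression().fit(X, y)

# The second model would extend X with the two external datasets
# (annual income and GDP) aligned to the same dates.
```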
The trained models have already been uploaded to GitHub.
The true stock prices for May 2021 are stored in 'MSFT_21May.csv'. Visualisations and evaluations of the results are provided.
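The evaluation metric used in the notebook is not restated here; a simple RMSE comparison against the ground-truth file could look like the sketch below, where the column name `Close` and the alignment of `y_pred` with the May 2021 trading days are assumptions.

```python
import numpy as np
import pandas as pd

def rmse(y_true, y_pred):
    """Root-mean-square error between true and predicted prices."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Ground truth for May 2021 (column name "Close" is an assumption).
truth = pd.read_csv("MSFT_21May.csv", index_col=0, parse_dates=True)

# `y_pred` would be the model output aligned to the same trading days:
# print(rmse(truth["Close"], y_pred))
```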