Repository for the capstone project I completed at General Assembly, London, as part of the Data Science Immersive course (June-Sept 2017).

This capstone project consists of five notebooks:
- Part 1: Initial analysis of the data and modelling (Logistic Regression, Random Forest) on a subset of 500 tornadoes.
- Part 2 / Web Scraping: Importing a larger dataset from the web and saving it to a local PostgreSQL database.
- Part 3: Work on the dataset of tornadoes whose magnitude was measured on the EF scale (2007-2017). Final EDA, more modelling (Recurrent Neural Network, Stochastic Gradient Descent), NLP and time series analysis.
- Part 4 / Pre-Processing class: Gathering all the pre-processing steps of Part 3 into a single class.
- Part 5: Upsampling, further NLP testing and a first look at predicting damage costs.
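
For readers unfamiliar with the upsampling mentioned in Part 5, here is a minimal sketch of the general idea: rows from under-represented classes are resampled with replacement until every class is as large as the biggest one. It uses only the Python standard library and is not taken from the notebooks, which may well rely on a library routine instead. The function name `upsample_to_largest_class`, the column names `ef_scale` and `width`, and the toy rows are all hypothetical.

```python
import random
from collections import Counter


def upsample_to_largest_class(rows, label_key, seed=0):
    """Resample with replacement so every class matches the largest class."""
    rng = random.Random(seed)

    # Group the rows by their class label.
    by_class = {}
    for row in rows:
        by_class.setdefault(row[label_key], []).append(row)

    target = max(len(group) for group in by_class.values())

    balanced = []
    for group in by_class.values():
        balanced.extend(group)
        # Draw the missing rows with replacement from the same class
        # (draws nothing when the class is already at the target size).
        balanced.extend(rng.choices(group, k=target - len(group)))

    rng.shuffle(balanced)
    return balanced


# Hypothetical toy data: six weak tornadoes and two strong ones.
rows = [{"ef_scale": 0, "width": 50}] * 6 + [{"ef_scale": 3, "width": 400}] * 2
balanced = upsample_to_largest_class(rows, "ef_scale")
counts = Counter(row["ef_scale"] for row in balanced)
```

In practice, upsampling is usually applied to the training split only, so that duplicated rows do not leak into the evaluation data.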