This document outlines the structure of this repo and the key aspects of the project.

    |-- Src
    |   |-- LibraryInstaller.R
    |   |-- DefineFunctions.R
    |-- data
    |   |-- raw
    |   |-- processed
    |-- References
    |-- README.md

## Project outline
Group members: Mario Saraiva, Lizhizi Cui
Start date: March 01, 2018
End date: May 10, 2018
This project is based on the Kaggle competition on "House Prices: Advanced Regression Techniques".
The data is available at: https://www.kaggle.com/c/house-prices-advanced-regression-techniques.
Outcomes:
- Executive report with findings, including but not limited to:
  - Different predictive models
  - The pros and cons of each model
  - Reflections and recommendations
### Phase 0: Project Setup
- Create the repo and folders, and outline tasks.
### Phase 1: Exploratory data analysis
- Understand how the data is distributed
  - Histograms
  - Scatter plots
- Produce descriptive statistics / summaries
- Extract important input variables for the analysis
- Identify outliers
- Identify patterns (if any)
- Make a ranked list of important input variables for the analysis
- Get a sense of the robustness of the conclusions (possible sample bias)
- Conclude whether individual factors are statistically significant
- Quantify uncertainties for important estimates
- Define the problem/purpose of the project (assumptions)
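
A few of the exploratory steps above can be sketched in base R. This is only an illustration under stated assumptions: the helpers `summarise_numeric` and `rank_by_correlation` are hypothetical names (they are not taken from `DefineFunctions.R`), the 1.5 * IQR outlier rule and the correlation-based ranking are choices made for the example, and the small `houses` data frame is made-up data that merely borrows column names from the Kaggle training file.

```r
# Hypothetical helper: descriptive statistics for one numeric column, plus
# the row indices flagged by the 1.5 * IQR rule (an assumed outlier rule).
summarise_numeric <- function(x) {
  q <- stats::quantile(x, probs = c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  iqr <- q[2] - q[1]
  lower <- q[1] - 1.5 * iqr
  upper <- q[2] + 1.5 * iqr
  list(
    summary  = summary(x),
    outliers = which(x < lower | x > upper)
  )
}

# Hypothetical helper: rank numeric predictors by the absolute value of
# their Pearson correlation with the target column.
rank_by_correlation <- function(df, target) {
  numeric_cols <- names(df)[vapply(df, is.numeric, logical(1))]
  predictors <- setdiff(numeric_cols, target)
  cors <- vapply(
    predictors,
    function(p) stats::cor(df[[p]], df[[target]], use = "complete.obs"),
    numeric(1)
  )
  sort(abs(cors), decreasing = TRUE)
}

# Made-up stand-in for the training data; the real file would be loaded
# with read.csv() from wherever it is stored.
houses <- data.frame(
  GrLivArea   = c(850, 1200, 1500, 1710, 2100, 2400, 5600),
  OverallQual = c(4, 5, 6, 7, 7, 8, 5),
  SalePrice   = c(95000, 130000, 160000, 208500, 250000, 310000, 160000)
)

area_stats <- summarise_numeric(houses$GrLivArea)
print(area_stats$summary)
print(houses[area_stats$outliers, ])

ranking <- rank_by_correlation(houses, "SalePrice")
print(ranking)

# Distribution and relationship plots from the list above.
hist(houses$SalePrice, main = "SalePrice", xlab = "Sale price (USD)")
plot(houses$GrLivArea, houses$SalePrice,
     xlab = "GrLivArea", ylab = "SalePrice")
```

Correlation only captures linear association between numeric columns, so a ranking like this is a starting point rather than a verdict on variable importance; categorical inputs would need a different treatment.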
### Phase 2: Design different models to be tested
### Phase 3: Test models
### Phase 4: Compare models
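
One possible way to compare models is to score each one's held-out predictions with the root-mean-squared error of the log prices, which, to the best of our understanding, is the metric the Kaggle competition uses for its leaderboard. The sketch below is illustrative only: `log_rmse` is a hypothetical helper, and `model_a` and `model_b` are made-up prediction vectors, not results from this project.

```r
# Hypothetical helper: RMSE between log(predicted) and log(actual).
log_rmse <- function(actual, predicted) {
  stopifnot(
    length(actual) == length(predicted),
    all(actual > 0),
    all(predicted > 0)
  )
  sqrt(mean((log(predicted) - log(actual))^2))
}

# Made-up held-out prices and predictions from two placeholder models.
actual <- c(100000, 200000, 300000)
predictions <- list(
  model_a = c(110000, 190000, 310000),
  model_b = c(150000, 150000, 400000)
)

# Lower is better; sort so the best-scoring model comes first.
scores <- sort(vapply(predictions, log_rmse, numeric(1), actual = actual))
print(scores)
```

Working on the log scale means a 10% miss on a cheap house counts about as much as a 10% miss on an expensive one.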
### Phase 5: Compile findings into a report