This Exploratory Data Analysis (EDA) was a three-day task in my Data Science Bootcamp at neue fische.
Project time
2023-08-09 - 2023-08-11
My client, Amy Williams, sells houses. She "works" as an Italian mafiosa and has been selling several central houses (top 10%) over time. Now she is looking for average houses in the outskirts over time to hide from the FBI.
- Find central top 10% houses, show prices over time
- Find outskirt average houses, show prices over time
- Find recommendations for my client
- clean EDA-notebook
- presentation for "stakeholders" (bootcamp participants and coaches)
The King County Housing dataset contains information about roughly 22,000 home sales in King County (USA) from May 2014 to May 2015. The column names are described in the files column_names.md and feature_description.md.
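The task of finding central top 10% houses could be approached roughly as follows. This is a minimal sketch, not the code from the notebook: it assumes pandas and NumPy are installed and that the data has `price`, `lat`, and `long` columns. The centre coordinates (roughly downtown Seattle), the 10 km radius, the function names, and the sample rows are all illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Assumed city centre (roughly downtown Seattle); not taken from the project.
CENTER_LAT, CENTER_LONG = 47.6062, -122.3321
EARTH_RADIUS_KM = 6371.0


def distance_to_center_km(lat, long, center_lat=CENTER_LAT, center_long=CENTER_LONG):
    """Great-circle (haversine) distance in km from each point to the centre."""
    lat1, lon1 = np.radians(lat), np.radians(long)
    lat2, lon2 = np.radians(center_lat), np.radians(center_long)
    a = (
        np.sin((lat2 - lat1) / 2) ** 2
        + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


def central_top_share(df, radius_km=10.0, share=0.10):
    """Houses within radius_km of the centre whose price is in the top `share`."""
    dist = distance_to_center_km(df["lat"], df["long"])
    central = df[dist <= radius_km]
    threshold = central["price"].quantile(1 - share)
    return central[central["price"] >= threshold]


# Made-up sample rows, NOT real King County data:
# ten houses at the centre and two expensive houses far outside the radius.
houses = pd.DataFrame({
    "price": [100_000 * i for i in range(1, 11)] + [5_000_000, 5_000_000],
    "lat": [CENTER_LAT] * 10 + [47.2, 47.2],
    "long": [CENTER_LONG] * 12,
})
selected = central_top_share(houses)
print(selected)
```

Here the price threshold is computed among central houses only. Taking the top 10% of the whole dataset first and then filtering by distance would be an equally plausible reading of the task and can give a different selection.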
- Top 10% houses in the center
  - sell in autumn
  - don’t sell in spring
  - median price ($): 2,500,000
- Average houses in the outskirts
  - buy in December or January
  - sell in April
  - median price ($): 419,980
- Average houses in the north
  - cheaper than average outskirts
  - median price ($): 405,000
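Buy and sell recommendations like the ones above can be read off the median price per month. The following is a hedged sketch of that idea, assuming pandas and `date` and `price` columns. The helper names and the sample rows are made up for illustration and do not reproduce the project's figures.

```python
import pandas as pd


def monthly_median_price(df, date_col="date", price_col="price"):
    """Median sale price per calendar month, indexed by 'YYYY-MM' strings."""
    month = pd.to_datetime(df[date_col]).dt.strftime("%Y-%m")
    return df.groupby(month)[price_col].median()


def cheapest_and_priciest_month(df):
    """Return (month with lowest median price, month with highest median price)."""
    medians = monthly_median_price(df)
    return medians.idxmin(), medians.idxmax()


# Made-up sample rows, NOT real King County data.
sales = pd.DataFrame({
    "date": ["2014-12-03", "2014-12-18", "2015-01-09",
             "2015-04-02", "2015-04-21", "2015-04-28"],
    "price": [400_000, 410_000, 402_000, 430_000, 450_000, 440_000],
})

buy_month, sell_month = cheapest_and_priciest_month(sales)
print("cheapest month:", buy_month)
print("priciest month:", sell_month)
```

The median is used rather than the mean so that a few very expensive sales in one month do not dominate the comparison. With only about one year of data, each calendar month is observed once (May twice), so any seasonal pattern found this way is an indication rather than a robust result.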
- EDA deep dive:
  - select fitting houses
  - refine customer needs
- clean notebook and repo
- optional:
  - write Python scripts
  - unit testing
- Jupyter Notebooks for EDA
- Google Slides for presentation
- pyenv
- python==3.11.3
This repo contains a requirements.txt file listing all the packages and dependencies you will need. To set up the environment, run the following commands:
pyenv local 3.11.3
python -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt
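After installation, a quick check from inside the activated environment can confirm that the packages are present. The script below is an optional sketch and not part of the repo: the function names are hypothetical, and the parser only handles simple lines such as `name==version`, skipping comments, blank lines, and pip options.

```python
import re
import sys
from importlib import metadata
from pathlib import Path


def requirement_name(line):
    """Extract the package name from one requirements.txt line, or None."""
    line = line.split("#", 1)[0].strip()  # drop trailing comments
    if not line or line.startswith("-"):  # skip blanks and pip options
        return None
    match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
    return match.group(0) if match else None


def missing_packages(lines):
    """Return the names from `lines` that are not installed in this environment."""
    missing = []
    for line in lines:
        name = requirement_name(line)
        if name is None:
            continue
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


print("Python", ".".join(map(str, sys.version_info[:3])))
requirements = Path("requirements.txt")
if requirements.exists():
    print("Missing packages:", missing_packages(requirements.read_text().splitlines()))
```

An empty list means every listed package was found. The check compares names only, not pinned versions.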