Codeup repository for 'Regression' project using 'zillow' dataset

Jared-Wood135/zillow_project

Zillow Project - README

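Table of Contents:

  • Project Description
  • Project Goals
  • Hypothesis/Questions
  • Data Dictionary
  • Planning
  • Instructions To Replicate
  • Takeaways
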
Project Description:

Back to 'Table of Contents'

Using the 'zillow' dataset from a SQL database, acquire properties that have a 2017 transaction date and are single family or inferred single family homes, then use them to predict each home's value.
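
To make the acquisition filter concrete, here is a minimal sketch that runs against a tiny in-memory SQLite database. The table names (`properties`, `transactions`), the column names, and the demo rows are all assumptions made up for illustration; the real 'zillow' schema and SQL dialect may differ.

```python
import sqlite3

# Hypothetical schema: the real table and column names in the 'zillow'
# database may differ from the ones used here.
QUERY = """
SELECT p.parcel_id, p.bedrooms, p.home_sqft, p.value
FROM properties AS p
JOIN transactions AS t ON t.parcel_id = p.parcel_id
WHERE t.transaction_date BETWEEN '2017-01-01' AND '2017-12-31'
  AND p.land_use IN ('Single Family Residential',
                     'Inferred Single Family Residential')
ORDER BY p.parcel_id
"""


def acquire_single_family_2017(conn):
    """Return rows for single family homes with a 2017 transaction date."""
    return conn.execute(QUERY).fetchall()


def build_demo_db():
    """Create a small in-memory database with made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE properties (
            parcel_id INTEGER PRIMARY KEY,
            bedrooms REAL,
            home_sqft REAL,
            value REAL,
            land_use TEXT
        );
        CREATE TABLE transactions (
            parcel_id INTEGER,
            transaction_date TEXT
        );
        """
    )
    conn.executemany(
        "INSERT INTO properties VALUES (?, ?, ?, ?, ?)",
        [
            (1, 3.0, 2444.0, 689354.0, "Single Family Residential"),
            (2, 2.0, 1100.0, 350000.0, "Condominium"),
            (3, 4.0, 3000.0, 900000.0, "Inferred Single Family Residential"),
            (4, 3.0, 1500.0, 400000.0, "Single Family Residential"),
        ],
    )
    conn.executemany(
        "INSERT INTO transactions VALUES (?, ?)",
        [(1, "2017-03-15"), (2, "2017-06-01"), (3, "2017-11-30"), (4, "2016-12-31")],
    )
    return conn


rows = acquire_single_family_2017(build_demo_db())
```

In this demo, parcel 2 is dropped because it is not a single family home and parcel 4 is dropped because its transaction date falls in 2016.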



Project Goals:

Back to 'Table of Contents'

  • Implement Data Science Pipeline
  • Acquire 'zillow' Dataset
  • Prepare 'zillow' Dataset
  • Explore 'zillow' Dataset
  • Model 'zillow' Dataset
  • Deliver 'zillow' Dataset



  • Hypothesis/Questions:

    Back to 'Table of Contents'

    Hypothesis:

    Given the 'zillow' dataset at hand, I believe that the location of the home, the contents of the home, and its proximity to key features in the area will be strong determining factors for accurately predicting the home's value.

    Questions:

  • Do values correlate with location (state, county, city, neighborhood)?
  • Do values correlate with the number of bedrooms and bathrooms?
  • Do values correlate with the size of the home and its overall property?
  • Does the ratio of home sqft to overall lot sqft matter?
  • Does proximity to key features like the city center, leisure, stores, etc. matter?
  • Does the local crime rate matter?
  • Does the population density matter?
  • Does proximity to job opportunity density matter?
  • Does proximity to major roads matter?
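
Several of the questions above ask whether a feature correlates with value. One common way to check that is the Pearson correlation coefficient; the sketch below computes it in plain Python. It is only an illustration of the idea, not necessarily the statistical test used in the project's notebooks, and the sample numbers are invented.

```python
import math


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient between two sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("xs and ys must have the same length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Made-up sample: home square footage against home value.
home_sqft = [1000.0, 1500.0, 2000.0, 2500.0]
value = [200000.0, 310000.0, 390000.0, 500000.0]
r = pearson_r(home_sqft, value)
```

A coefficient near 1 or -1 suggests a strong linear relationship; a coefficient near 0 suggests little linear relationship.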



  • Data Dictionary:

    Back to 'Table of Contents'

    | Field Name | Data Type | Data Format | Description | Example |
    | --- | --- | --- | --- | --- |
    | | object | str | Defines gender of customer | 'Male' |
    | bedrooms | float | #.# | Defines the number of bedrooms in the home | 3.0 |
    | home_sqft | float | #.# | Defines the total square footage of the home | 2444.0 |
    | full_bathrooms | int | # | Defines the number of full bathrooms in the home | 3 |
    | lotsize_sqft | float | #.# | Defines the total square footage of the lot the home resides on | 10200.0 |
    | home_age | int | # | Defines how old the home is from when it was built to 2017 | 76 |
    | value | float | #.# | TARGET VALUE - Defines the home's value | 689354.0 |
    | home_lot_ratio | float | #.# | Defines the ratio of the home size to the lot size | 0.24 |
    | DUMMY COLS | uint | 0, 1 | Binary (True, False) column for specific column name | 0 |
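
The two derived fields, `home_age` and `home_lot_ratio`, follow directly from their descriptions above. The sketch below shows one way they could be computed. The `year_built` input name and the rounding to two decimals are assumptions for illustration; the actual logic lives in the project's prepare step and may differ.

```python
def add_engineered_features(home, reference_year=2017):
    """Return a copy of `home` with home_age and home_lot_ratio added.

    `home` is a dict with 'year_built' (hypothetical column name),
    'home_sqft', and 'lotsize_sqft' keys.
    """
    out = dict(home)
    # Age of the home from the year it was built to the reference year.
    out["home_age"] = reference_year - home["year_built"]
    # Ratio of home size to lot size, rounded for readability (assumption).
    out["home_lot_ratio"] = round(home["home_sqft"] / home["lotsize_sqft"], 2)
    return out


example = add_engineered_features(
    {"year_built": 1941, "home_sqft": 2444.0, "lotsize_sqft": 10200.0}
)
```

With the example sizes from the data dictionary (2444.0 and 10200.0 sq. ft.), the ratio comes out to 0.24, and a home built in 1941 has a `home_age` of 76.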




    Planning:

    Back to 'Table of Contents'

    • Planning
      • This README
    • Acquire
      • env.py
      • SQL query
      • acquire.py
      • acquire.ipynb
    • Prepare
      • Remove unwanted cols
      • Aggregate cols
      • Confirm veracity of cols
      • prepare.py
      • prepare.ipynb
    • Explore
      • Determine best cols
      • Visualize regression lines to target variable
      • explore.py
      • explore.ipynb
    • Modeling
      • Linear Regression
      • LassoLars
      • TweedieRegressor
      • Polynomial Regression
      • Top Model
      • modeling.py
      • modeling.ipynb
    • Delivery
      • Final notebook (.ipynb)
      • final.py
      • Readme.md
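
For the Modeling stage above, a regression model is usually judged against a simple baseline. The sketch below compares a mean-value baseline with a single-feature ordinary least squares fit using RMSE. It is a plain-Python stand-in for the scikit-learn estimators listed in the plan, the choice of RMSE as the metric is an assumption, and the data is made up.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def fit_simple_ols(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up illustration data: home square footage and home value.
home_sqft = [1000.0, 1500.0, 2000.0, 2500.0]
value = [200000.0, 310000.0, 390000.0, 500000.0]

# Baseline: always predict the mean home value.
mean_value = sum(value) / len(value)
baseline_rmse = rmse(value, [mean_value] * len(value))

# Model: single-feature linear regression on home_sqft.
slope, intercept = fit_simple_ols(home_sqft, value)
model_rmse = rmse(value, [slope * x + intercept for x in home_sqft])
```

A model is only worth keeping if its RMSE is lower than the baseline's, ideally on data held out from training.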



  • Instructions To Replicate:

    Back to 'Table of Contents'

    1. Clone this repo
    2. Create an 'env.py' file with the credentials needed to connect to the SQL database
    3. Run desired .ipynb
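
As a rough guide for step 2, an 'env.py' file might look like the sketch below. The variable names, the `get_db_url` helper, and the MySQL/PyMySQL URL format are assumptions; check acquire.py for the names it actually imports, and use the driver that matches your database.

```python
# env.py -- keep this file out of version control (e.g. list it in .gitignore).
# All values below are placeholders, not real credentials.
host = "your.database.host"
user = "your_username"
password = "your_password"


def get_db_url(database, user=user, password=password, host=host):
    """Build a SQLAlchemy-style MySQL connection URL for `database`.

    Hypothetical helper: assumes a MySQL server reached through PyMySQL.
    """
    return f"mysql+pymysql://{user}:{password}@{host}/{database}"
```

A URL built this way can be passed to a SQL client such as `pandas.read_sql` together with the acquisition query.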


    Takeaways:

    Back to 'Table of Contents'

    Key Findings:

    • Features that add predictive value to the regression model
      • Number of bedrooms
      • Total sq. ft. of home
      • Number of full bathrooms
      • Total sq. ft. of lot the home is on
      • The age of the home
      • The ratio of the home and lot sq. ft.

    Recommendations:

    • Increases value
      • Number of bedrooms
      • Total sq. ft. of home
      • Number of full bathrooms
      • The ratio of the home and lot sq. ft.
    • Decreases value
      • The age of the home

    Takeaways:

    • Focus for higher value
      • More bedrooms
      • More full bathrooms
      • Larger home
      • Larger home to lot ratio

    Next Steps:

    • Location specific
      • Differences between states
      • Differences between counties
      • Differences between cities
      • Differences between neighborhoods
    • Proximity specific
      • Density of schools
      • Density of entertainment/recreation
      • Density of landmarks/parks
      • Density of retail
      • Density of job opportunity
      • Accessibility
    • Community specific
      • Density of population
      • Type of religion
      • Type of residents (Young, middle, old)
      • Type of family structures
      • Ethnic distribution
      • Gender distribution
    • Hazard specific
      • Natural disaster risk (Tornado, flood, hurricane, etc.)
      • Crime rate density
      • Type of crime
