This project is an exploratory data analysis (EDA) of the California Housing dataset. The main goal is to understand the factors that influence housing prices in California by examining trends, distributions, and relationships between variables. The insights are intended to identify key predictors for potential future modeling.
- Data Cleaning: Addressed missing values and handled outliers to prepare a clean dataset for analysis.
- Exploratory Analysis: Analyzed feature distributions, correlations between variables, and trends within the dataset.
- Visualizations: Created plots to illustrate relationships among variables and highlight influential factors.
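
The cleaning and exploration steps above can be sketched as follows. This is a minimal illustration, not the project's actual code: the data is synthetic, the column names (`median_income`, `total_bedrooms`, `median_house_value`) are assumptions, and median imputation with IQR clipping is just one common way to handle missing values and outliers.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the housing data; column names are assumptions.
rng = np.random.default_rng(0)
n = 200
income = rng.uniform(1.0, 10.0, size=n)
df = pd.DataFrame({
    "median_income": income,
    "total_bedrooms": rng.integers(100, 1000, size=n).astype(float),
    "median_house_value": 40_000 * income + rng.normal(0, 20_000, size=n),
})
df.loc[::25, "total_bedrooms"] = np.nan       # inject some missing values
df.loc[0, "median_house_value"] = 5_000_000   # inject one extreme outlier


def clean(frame: pd.DataFrame) -> pd.DataFrame:
    """Fill missing numeric values with the column median, then clip
    each numeric column to its 1.5 * IQR fences (illustrative choices)."""
    out = frame.copy()
    num_cols = out.select_dtypes(include="number").columns
    out[num_cols] = out[num_cols].fillna(out[num_cols].median())
    q1 = out[num_cols].quantile(0.25)
    q3 = out[num_cols].quantile(0.75)
    iqr = q3 - q1
    out[num_cols] = out[num_cols].clip(
        lower=q1 - 1.5 * iqr, upper=q3 + 1.5 * iqr, axis=1
    )
    return out


cleaned = clean(df)
print(cleaned.isna().sum())   # missing values remaining per column
print(cleaned.describe())     # distribution summary of each feature
print(cleaned.corr())         # pairwise Pearson correlations
```

Clipping keeps every row and only caps extreme values; dropping outlier rows instead would be an equally reasonable choice, depending on how the analysis treats them.
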
- Income and Prices: A strong positive correlation was identified between median income and housing prices, suggesting that income is a significant factor in price variations.
- Regional Factors: Certain regional attributes were found to have substantial effects on housing prices across different areas in California.
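
A rough sketch of how the two findings above could be checked is shown below. It assumes hypothetical column names (`median_income`, `median_house_value`, `ocean_proximity`) and uses synthetic data with a built-in income effect and regional premium, so the printed numbers say nothing about the real dataset; only the method is illustrated.

```python
import numpy as np
import pandas as pd

# Synthetic data; column names and region labels are assumptions.
rng = np.random.default_rng(42)
n = 300
regions = rng.choice(["INLAND", "NEAR OCEAN", "NEAR BAY"], size=n)
premium = (
    pd.Series(regions)
    .map({"INLAND": 0, "NEAR OCEAN": 80_000, "NEAR BAY": 100_000})
    .to_numpy()
)
income = rng.uniform(1.0, 10.0, size=n)
df = pd.DataFrame({
    "median_income": income,
    "ocean_proximity": regions,
    "median_house_value": 40_000 * income + premium + rng.normal(0, 30_000, size=n),
})

# Pearson correlation between income and house value.
r = df["median_income"].corr(df["median_house_value"])

# Median house value per region, sorted from lowest to highest.
by_region = (
    df.groupby("ocean_proximity")["median_house_value"].median().sort_values()
)

print(f"Pearson r = {r:.2f}")
print(by_region)
```

The median is used for the regional comparison because it is less sensitive to a few very expensive districts than the mean.
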
- Python: Version 3.8 or above
- Libraries:
  - Pandas
  - NumPy
  - Matplotlib
  - Seaborn
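
With the libraries listed above installed, a plot of the kind the project describes could be drawn roughly as follows. This is a sketch under stated assumptions: the data is synthetic, the column names and the output file name `income_vs_value.png` are hypothetical, and only Matplotlib is used to keep the example short.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the script runs without a display

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic data; column names are assumptions.
rng = np.random.default_rng(1)
income = rng.uniform(1.0, 10.0, size=150)
df = pd.DataFrame({
    "median_income": income,
    "median_house_value": 40_000 * income + rng.normal(0, 30_000, size=150),
})

# Scatter plot of income against house value.
fig, ax = plt.subplots(figsize=(6, 4))
ax.scatter(df["median_income"], df["median_house_value"], s=10, alpha=0.6)
ax.set_xlabel("median_income")
ax.set_ylabel("median_house_value")
ax.set_title("Income vs. house value (synthetic data)")
fig.tight_layout()
fig.savefig("income_vs_value.png", dpi=100)  # hypothetical output file name
```
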

Feedback, suggestions, and reviews for this project are welcome.
If you find a problem or have an idea for improvement, feel free to open an issue!
- 🐛 Open an Issue: Click here to report a problem