A from-scratch Linear Regression model optimized via Gradient Descent for house price prediction.

Zer0-Bug/House_Price_Prediction

House Price Prediction

A from-scratch implementation of Linear Regression trained with batch Gradient Descent. The project predicts house prices by optimizing the model weights through iterative gradient updates. It includes data standardization and a comparative analysis against the pseudo-inverse (Normal Equation) solution.



Technical Architecture

The project follows a standard supervised regression workflow: standardize the feature, fit the weights by gradient descent, track the error on training and test data, and benchmark the result against a closed-form solution:

  1. Feature Standardization: Implements Z-score normalization (via scaling.py), which centers each feature at zero mean and rescales it to unit variance. This is critical for the convergence speed of Gradient Descent.
  2. Gradient Descent Optimization: An iterative approach that calculates the direction of steepest descent to minimize the Mean Squared Error (MSE).
  3. Model Validation: Continuous monitoring of Root Mean Squared Error (RMSE) across both Training and Test datasets to detect potential overfitting.
  4. Benchmarking: Includes a direct comparison with the np.linalg.pinv solution, providing a reference for the accuracy of the iterative gradient descent method.
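
The standardization step can be sketched as follows. The interface of scaling.py is not shown in this README, so the function names `fit_standardizer` and `standardize`, the sample values, and the choice to reuse training statistics on the test data are illustrative assumptions rather than the project's actual code.

```python
import numpy as np

def fit_standardizer(x_train):
    """Compute the mean and standard deviation of the training feature."""
    mean = x_train.mean(axis=0)
    std = x_train.std(axis=0)
    # Guard against a constant feature, which would cause division by zero.
    std = np.where(std == 0, 1.0, std)
    return mean, std

def standardize(x, mean, std):
    """Apply the Z-score transform z = (x - mean) / std."""
    return (x - mean) / std

# Hypothetical feature values (for example, house sizes).
x_train = np.array([50.0, 80.0, 120.0, 200.0])
x_test = np.array([65.0, 150.0])

# Statistics come from the training data only and are reused for the test data.
mean, std = fit_standardizer(x_train)
x_train_s = standardize(x_train, mean, std)
x_test_s = standardize(x_test, mean, std)

print(x_train_s.mean(), x_train_s.std())
```

After this transform the training feature has zero mean and unit standard deviation, while the test feature is expressed on the same scale.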


Project Structure

House_Price_Prediction/
├── LICENSE                                   # MIT License
├── README.md                                 # Project documentation
├── .gitattributes                            # Git configuration
├── Project Report.pdf                        # Technical research report
│
└── Code/                                     # Implementation scripts
    ├── simple_regression.py                  # Core logic (Training, GD, Plots)
    ├── scaling.py                            # Data normalization utilities
    ├── train.txt                             # Training dataset (House Prices)
    └── test.txt                              # Test dataset for validation


Mathematical Foundations

1. Hypothesis Function

The linear model used to predict the house price given input feature x.

h_w(x) = w₀ + w₁x
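
A minimal sketch of the hypothesis in vectorized form, assuming the intercept is handled by prepending a column of ones to the feature. The helper names `add_bias_column` and `predict` are hypothetical and not taken from simple_regression.py.

```python
import numpy as np

def add_bias_column(x):
    """Turn a 1-D feature into an n x 2 design matrix [1, x]."""
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    return np.hstack([np.ones_like(x), x])

def predict(X, w):
    """Evaluate h_w(x) = w0 + w1 * x for every row of X."""
    return X @ w

X = add_bias_column([0.0, 1.0, 2.0])
w = np.array([1.0, 2.0])      # w0 = 1, w1 = 2
predictions = predict(X, w)   # 1 + 2x for x = 0, 1, 2
print(predictions)
```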

2. Objective Function (MSE)

The "Least Squares" cost function: half the mean squared error over the n training examples. The factor 1/2 cancels when differentiating and does not change the minimizer.

J(w) = (1 / (2n)) * Σ [h_w(x⁽ⁱ⁾) - y⁽ⁱ⁾]²
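
The cost can be computed directly from the residual vector. This is a sketch with a hypothetical function name `cost` and toy data chosen so the values are easy to check by hand.

```python
import numpy as np

def cost(X, w, y):
    """J(w) = (1 / (2n)) * sum of squared residuals."""
    residuals = X @ w - y
    return (residuals @ residuals) / (2 * len(y))

# Toy data generated by y = 1 + 2x.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([1.0, 3.0, 5.0])

cost_at_truth = cost(X, np.array([1.0, 2.0]), y)  # residuals are all zero
cost_at_zero = cost(X, np.zeros(2), y)            # (1 + 9 + 25) / 6
print(cost_at_truth, cost_at_zero)
```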

3. Gradient Calculation

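The gradient of the cost function with respect to the weight vector w = (w₀, w₁), used for the updates w ← w - λ∇J(w). Here X is the n × 2 design matrix whose first column is all ones (for the intercept w₀) and y is the vector of target prices.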

∇J(w) = (1 / n) * Xᵀ(Xw - y)
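
One way to gain confidence in the gradient formula is to compare it against a central finite-difference approximation of the cost. The sketch below uses hypothetical helper names and toy data. It is a sanity check, not part of the project's code.

```python
import numpy as np

def cost(X, w, y):
    residuals = X @ w - y
    return (residuals @ residuals) / (2 * len(y))

def gradient(X, w, y):
    """Analytic gradient: (1 / n) * X^T (Xw - y)."""
    return X.T @ (X @ w - y) / len(y)

def numerical_gradient(X, w, y, eps=1e-6):
    """Central differences, one weight at a time."""
    grad = np.zeros_like(w)
    for j in range(len(w)):
        step = np.zeros_like(w)
        step[j] = eps
        grad[j] = (cost(X, w + step, y) - cost(X, w - step, y)) / (2 * eps)
    return grad

X = np.array([[1.0, -1.0], [1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([2.0, 3.0, 5.0, 4.0])
w = np.array([0.5, -1.0])

analytic = gradient(X, w, y)
numeric = numerical_gradient(X, w, y)
print(analytic, numeric)
```

Because the cost is quadratic in w, the two results should agree up to floating-point rounding.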

4. Root Mean Squared Error (RMSE)

Used as the primary metric for evaluating model performance. Unlike the MSE, it is expressed in the same units as the target price.

RMSE = sqrt( (1 / n) * Σ [h_w(x⁽ⁱ⁾) - y⁽ⁱ⁾]² )
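
A short sketch of the metric, using a hypothetical `rmse` helper and made-up numbers. The same function would be applied to both the training and the test predictions.

```python
import numpy as np

def rmse(y_pred, y_true):
    """Root of the mean squared difference between predictions and targets."""
    y_pred = np.asarray(y_pred, dtype=float)
    y_true = np.asarray(y_true, dtype=float)
    return np.sqrt(np.mean((y_pred - y_true) ** 2))

# Errors of 2 and 4 give sqrt((4 + 16) / 2) = sqrt(10).
value = rmse([2.0, 4.0], [0.0, 0.0])
print(value)
```

With the cost J(w) defined above, RMSE equals sqrt(2 * J(w)) on the same data, so minimizing one minimizes the other.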


Technical Specifications

Parameter            Configuration Details
-------------------  ----------------------------------
Algorithm            Linear Regression
Optimization         Batch Gradient Descent
Learning Rate (λ)    0.1 (Optimized for convergence)
Epochs               500 Iterations
Scaling              Z-Score (Standardization)
Evaluation           RMSE comparison vs. Pseudo-Inverse
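
The configuration above can be exercised end to end with a short sketch. The data here is synthetic (the format of train.txt is not described in this README), and the variable and function names are illustrative assumptions rather than the contents of simple_regression.py.

```python
import numpy as np

# Synthetic stand-in for the training set: one feature, one target.
rng = np.random.default_rng(0)
x = rng.uniform(50.0, 250.0, size=100)
y = 30.0 + 1.5 * x + rng.normal(0.0, 10.0, size=100)

# Z-score scaling, then a column of ones for the intercept.
x_scaled = (x - x.mean()) / x.std()
X = np.column_stack([np.ones_like(x_scaled), x_scaled])

learning_rate = 0.1
epochs = 500

# Batch gradient descent on J(w) = (1 / (2n)) * ||Xw - y||^2.
w = np.zeros(X.shape[1])
for _ in range(epochs):
    grad = X.T @ (X @ w - y) / len(y)
    w = w - learning_rate * grad

# Closed-form reference via the Moore-Penrose pseudo-inverse.
w_pinv = np.linalg.pinv(X) @ y

def rmse(X, w, y):
    return np.sqrt(np.mean((X @ w - y) ** 2))

print("GD weights:  ", w)
print("pinv weights:", w_pinv)
print("RMSE gap:    ", abs(rmse(X, w, y) - rmse(X, w_pinv, y)))
```

With a single standardized feature, the matrix (1 / n) * XᵀX is the identity, so each step with a learning rate of 0.1 shrinks the distance to the optimum by a factor of 0.9. After 500 epochs the two weight vectors should agree to within floating-point precision.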


Deployment & Installation

Repository Acquisition

Clone the repository and enter the project directory:

git clone https://github.com/Zer0-Bug/House_Price_Prediction.git
cd House_Price_Prediction

Environment Setup

Create a virtual environment and install the required numerical libraries:

# Environment initialization
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate

# Install dependencies
pip install numpy matplotlib

Running the Project

The main implementation script will process the data, perform training, and generate visualization plots:

python Code/simple_regression.py


Contribution

Contributions are always appreciated. Open-source projects grow through collaboration, and any improvement is valuable, whether it is a bug fix, a new feature, a documentation update, or a suggestion.

To contribute, please follow the steps below:

  1. Fork the repository.
  2. Create a new branch for your change:
    git checkout -b feature/your-feature-name
  3. Commit your changes with a clear and descriptive message:
    git commit -m "Add: brief description of the change"
  4. Push your branch to your fork:
    git push origin feature/your-feature-name
  5. Open a Pull Request describing the changes made.

All contributions are reviewed before being merged. Please ensure that your changes follow the existing code style and include relevant documentation or tests where applicable.

