A high-performance implementation of Linear Regression using Gradient Descent. This project focuses on predicting house prices by optimizing model weights through iterative gradient updates, featuring data standardization and a comparative analysis against the pseudo-inverse (Normal Equation) solution.
The system implements a classic machine learning workflow optimized for regression tasks. It emphasizes numerical stability and performance benchmarking:
- Feature Standardization: Z-score normalization (via `scaling.py`) centers each feature and scales it to unit variance, which is critical for the convergence speed of Gradient Descent.
- Gradient Descent Optimization: An iterative approach that follows the direction of steepest descent to minimize the Mean Squared Error (MSE).
- Model Validation: Continuous monitoring of Root Mean Squared Error (RMSE) across both Training and Test datasets to detect potential overfitting.
- Benchmarking: Includes a direct comparison with the `np.linalg.pinv` (pseudo-inverse) solution, providing a reference for the accuracy of the iterative gradient descent method.
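The standardization step above can be sketched as follows. This is a minimal illustration, not the actual `scaling.py` API; the key point is that the mean and standard deviation are computed on the training split and reused for the test split:

```python
import numpy as np

def standardize(X, mean=None, std=None):
    """Z-score normalization: center each feature and scale to unit variance.

    When mean/std are omitted they are computed from X (training mode);
    passing them in applies the training statistics to new data (test mode).
    """
    if mean is None:
        mean = X.mean(axis=0)
        std = X.std(axis=0)
    return (X - mean) / std, mean, std

# Toy house sizes in square feet (hypothetical values)
X_train = np.array([[1200.0], [1500.0], [1800.0], [2100.0]])
X_train_s, mu, sigma = standardize(X_train)
X_test_s, _, _ = standardize(np.array([[1650.0]]), mu, sigma)
```

Reusing the training statistics on the test set keeps both splits on the same scale, so the learned weights remain valid at evaluation time.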
```
House_Price_Prediction/
├── LICENSE                    # MIT License
├── README.md                  # Project documentation
├── .gitattributes             # Git configuration
├── Project Report.pdf         # Technical research report
│
└── Code/                      # Implementation scripts
    ├── simple_regression.py   # Core logic (Training, GD, Plots)
    ├── scaling.py             # Data normalization utilities
    ├── train.txt              # Training dataset (House Prices)
    └── test.txt               # Test dataset for validation
```
The linear model used to predict the house price given an input feature x:
h_w(x) = w₀ + w₁x
The least-squares cost function: half the mean of the squared errors (the factor of 1/2 simplifies the gradient).
J(w) = (1 / 2n) * Σ [h_w(x⁽ⁱ⁾) - y⁽ⁱ⁾]²
The partial derivative of the cost function with respect to weights, used for updates.
∇J(w) = (1 / n) * Xᵀ(Xw - y)
The primary metric for evaluating model performance, expressed in the original units of the target.
RMSE = sqrt( (1 / n) * Σ [h_w(x⁽ⁱ⁾) - y⁽ⁱ⁾]² )
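Putting the formulas above together, the training loop can be sketched as a compact batch gradient descent routine. This is an illustrative version; the names below are not taken from `simple_regression.py`:

```python
import numpy as np

def gradient_descent(X, y, lr=0.1, epochs=500):
    """Batch gradient descent on MSE, using ∇J(w) = (1/n) Xᵀ(Xw − y)."""
    n = X.shape[0]
    Xb = np.c_[np.ones(n), X]          # prepend a bias column for w0
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        grad = Xb.T @ (Xb @ w - y) / n  # gradient of the cost J(w)
        w -= lr * grad                  # step against the gradient
    return w

def rmse(X, y, w):
    """Root Mean Squared Error of the linear model h_w(x) on (X, y)."""
    Xb = np.c_[np.ones(X.shape[0]), X]
    return np.sqrt(np.mean((Xb @ w - y) ** 2))
```

Calling `rmse` on both the training and test sets after each run reproduces the overfitting check described above.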
| Parameter | Configuration Details |
|---|---|
| Algorithm | Linear Regression |
| Optimization | Batch Gradient Descent |
| Learning Rate (λ) | 0.1 (Optimized for convergence) |
| Epochs | 500 Iterations |
| Scaling | Z-Score (Standardization) |
| Evaluation | RMSE comparison vs. Pseudo-Inverse |
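The pseudo-inverse reference used in the evaluation row can be computed in one line with `np.linalg.pinv`; comparing its RMSE against the gradient descent result gives the benchmark. The data below is a hypothetical stand-in, not the project's dataset:

```python
import numpy as np

# Closed-form least squares via the pseudo-inverse: w = X⁺ y
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([2.0, 5.0, 8.0, 11.0])

Xb = np.c_[np.ones(X.shape[0]), X]   # bias column for w0
w_pinv = np.linalg.pinv(Xb) @ y      # exact minimizer of the MSE
```

Because the pseudo-inverse solves the normal equations directly, its weights serve as the ground truth that the iterative gradient descent solution should approach.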
Clone the repository and enter the project directory:

```shell
git clone https://github.com/Zer0-Bug/House_Price_Prediction.git
cd House_Price_Prediction
```

Create a virtual environment and install the required numerical libraries:
```shell
# Environment initialization
python -m venv venv
source venv/bin/activate   # Windows: venv\Scripts\activate

# Install dependencies
pip install numpy matplotlib
```

The main implementation script will process the data, perform training, and generate visualization plots:
```shell
python Code/simple_regression.py
```

Contributions are always appreciated. Open-source projects grow through collaboration, and any improvement, whether a bug fix, new feature, documentation update, or suggestion, is valuable.
To contribute, please follow the steps below:
- Fork the repository.
- Create a new branch for your change:

  ```shell
  git checkout -b feature/your-feature-name
  ```

- Commit your changes with a clear and descriptive message:

  ```shell
  git commit -m "Add: brief description of the change"
  ```

- Push your branch to your fork:

  ```shell
  git push origin feature/your-feature-name
  ```

- Open a Pull Request describing the changes made.
All contributions are reviewed before being merged. Please ensure that your changes follow the existing code style and include relevant documentation or tests where applicable.
- Breiman, L. (2001). Statistical Modeling: The Two Cultures. Statistical Science.
- Hastie, T., Tibshirani, R., & Friedman, J. (2009). The Elements of Statistical Learning. Springer.
- Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning. MIT Press.