This project applies Logistic Regression to a housing dataset to classify whether a house is high-priced or not based on features like area, number of bedrooms, bathrooms, and stories.
The model is trained on both unscaled and scaled data to compare performance and demonstrate the importance of feature scaling in classification problems.
- File: `Housing.csv`
- Target Variable: `HighPrice` (binary: 1 = high-priced, 0 = not high-priced)
- Features Used: `area`, `bedrooms`, `bathrooms`, `stories`

If `HighPrice` is not already binary, it is generated by comparing prices to the median: `df['HighPrice'] = (df['price'] > df['price'].median()).astype(int)`
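The target construction above can be sketched as follows. This is a minimal illustration on a tiny made-up DataFrame, not the notebook's actual code: the helper name `ensure_high_price` and the sample values are assumptions, and the helper treats "column not present" as the trigger for generating the label. The column names `price`, `HighPrice`, and the four features come from the description above.

```python
import pandas as pd

FEATURES = ["area", "bedrooms", "bathrooms", "stories"]


def ensure_high_price(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with a binary HighPrice column (hypothetical helper)."""
    out = df.copy()
    if "HighPrice" not in out.columns:
        # 1 if the price is strictly above the median price, else 0
        out["HighPrice"] = (out["price"] > out["price"].median()).astype(int)
    return out


# Made-up rows for illustration only; the real data would be loaded
# with pd.read_csv("Housing.csv").
toy = pd.DataFrame(
    {
        "price": [100, 200, 300, 400],
        "area": [1000, 1500, 2000, 2500],
        "bedrooms": [2, 3, 3, 4],
        "bathrooms": [1, 1, 2, 2],
        "stories": [1, 1, 2, 2],
    }
)

labeled = ensure_high_price(toy)
X = labeled[FEATURES]      # feature matrix
y = labeled["HighPrice"]   # binary target
print(y.tolist())
```

Because the comparison is strict (`>`), a house priced exactly at the median is labeled 0.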
- Model: Logistic Regression (`scikit-learn`)
- Scaler Used: `StandardScaler` (for scaled version)
- Data Split: 80% training, 20% testing
- Evaluation:
  - Accuracy score
  - Classification report (Precision, Recall, F1-Score)
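The setup listed above could look roughly like the sketch below. It runs on synthetic data so that it is self-contained, which means its numbers will not match the results reported for `Housing.csv`. The random seed, the synthetic feature ranges, `max_iter=1000`, and `random_state=42` are all assumptions for illustration. The scaler is fitted on the training split only, a common practice that avoids leaking test-set statistics into training.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for area, bedrooms, bathrooms, stories (assumed ranges)
rng = np.random.default_rng(0)
n = 400
X = np.column_stack(
    [
        rng.uniform(1500, 16000, n),  # area: much larger scale than the rest
        rng.integers(1, 7, n),        # bedrooms
        rng.integers(1, 5, n),        # bathrooms
        rng.integers(1, 5, n),        # stories
    ]
)
# Made-up "price-like" score, thresholded at its median to get a binary label
score = X[:, 0] / 1000 + X[:, 1] + 2 * X[:, 2] + X[:, 3] + rng.normal(0, 1, n)
y = (score > np.median(score)).astype(int)

# 80% training, 20% testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Model 1: raw (unscaled) features
unscaled = LogisticRegression(max_iter=1000)
unscaled.fit(X_train, y_train)
acc_unscaled = accuracy_score(y_test, unscaled.predict(X_test))

# Model 2: standardized features (scaler fitted on the training split only)
scaler = StandardScaler()
X_train_s = scaler.fit_transform(X_train)
X_test_s = scaler.transform(X_test)
scaled = LogisticRegression(max_iter=1000)
scaled.fit(X_train_s, y_train)
pred_scaled = scaled.predict(X_test_s)
acc_scaled = accuracy_score(y_test, pred_scaled)

print(f"Unscaled accuracy: {acc_unscaled:.2f}")
print(f"Scaled accuracy:   {acc_scaled:.2f}")
print(classification_report(y_test, pred_scaled))
```

On unscaled data the solver may need more iterations to converge, since `area` is several orders of magnitude larger than the other features.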
| Metric | Unscaled Model | Scaled Model |
|---|---|---|
| Accuracy | 0.76 | 0.77 |
| Precision (Class 0) | 0.74 | 0.75 |
| Recall (Class 0) | 0.92 | 0.91 |
| F1-Score (Class 0) | 0.82 | 0.82 |
| Precision (Class 1) | 0.83 | 0.81 |
| Recall (Class 1) | 0.53 | 0.58 |
| F1-Score (Class 1) | 0.65 | 0.68 |
| Macro Avg F1 | 0.73 | 0.75 |
| Weighted Avg F1 | 0.75 | 0.76 |
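Observation: Scaling slightly improved overall performance: accuracy rose from 0.76 to 0.77 and macro-average F1 from 0.73 to 0.75. The gain comes mainly from better recall on high-priced properties (Class 1), which rose from 0.53 to 0.58, at the cost of a small drop in Class 1 precision (0.83 to 0.81).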
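As a sanity check on how the F1 rows relate to the precision and recall rows, the sketch below recomputes per-class F1 (the harmonic mean of precision and recall) and the macro average from the table's values. The helper name `f1_from` is an assumption for illustration. The inputs are already rounded to two decimals, so recomputed values can in general differ from reported ones in the last digit. Weighted-average F1 is not recomputed because it needs per-class support counts, which the table does not give.

```python
def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (hypothetical helper)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs copied from the results table
reported = {
    "unscaled": {"class_0": (0.74, 0.92), "class_1": (0.83, 0.53)},
    "scaled": {"class_0": (0.75, 0.91), "class_1": (0.81, 0.58)},
}

summary = {}
for model, classes in reported.items():
    f1s = {name: f1_from(p, r) for name, (p, r) in classes.items()}
    macro = sum(f1s.values()) / len(f1s)  # unweighted mean over classes
    summary[model] = {"f1": f1s, "macro_f1": macro}
    rounded = {name: round(value, 2) for name, value in f1s.items()}
    print(model, rounded, "macro F1:", round(macro, 2))
```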
- Open the Jupyter Notebook: `housing_model.ipynb`
- Run each cell in order to:
  - Load and preprocess the data
  - Train logistic regression models (scaled and unscaled)
  - Evaluate and compare the results
You can run the notebook in:
- Jupyter Lab
- Jupyter Notebook
- VS Code (with Python + Jupyter extensions)
Install required Python libraries before running the notebook:
- pandas
- numpy
- scikit-learn
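One way to confirm the libraries above are installed before opening the notebook is a small check like the following. It is a sketch using only the standard library; the helper name `missing_packages` is an assumption. Note that the `scikit-learn` package is imported under the module name `sklearn`.

```python
from importlib.util import find_spec

# pip package name -> import name
REQUIRED = {"pandas": "pandas", "numpy": "numpy", "scikit-learn": "sklearn"}


def missing_packages(required: dict[str, str]) -> list[str]:
    """Return the pip names whose import name cannot be found (hypothetical helper)."""
    return [
        pip_name
        for pip_name, module_name in required.items()
        if find_spec(module_name) is None
    ]


missing = missing_packages(REQUIRED)
if missing:
    print("Missing:", ", ".join(missing))
    print("Try: pip install " + " ".join(missing))
else:
    print("All required libraries are available.")
```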
Tags: `logistic-regression` `housing-data` `machine-learning` `binary-classification` `scikit-learn` `feature-scaling` `standardscaler` `data-analysis` `predictive-modeling` `python` `real-estate` `classification-model`