This project trains a Ridge regression model to estimate house prices. It was developed for a Kaggle competition.
- The code reads in the training and testing data from CSV files.
- It encodes the categorical features using OneHotEncoder from scikit-learn.
- The encoded data is combined with the numerical features, and any missing values are filled with 0.
- The training data is split into features (X_train) and target variable (y_train).
- A Ridge regression model is created with an alpha value of 10.
- The model is trained on the training data using the fit() method.
- The trained model is saved to a file named "model.joblib" using joblib.
- The model's performance is evaluated on the training data using cross-validation with two metrics: negative root mean squared error (RMSE negated so that higher scores are better) and R-squared.
- Predictions are made on the testing data using the predict() method.
- The predictions are written to a CSV file named "submission.csv".
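
The steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual script: the data is synthetic, the column names (`Id`, `LotArea`, `YearBuilt`, `Neighborhood`, `SalePrice`), the input file names (`train.csv`, `test.csv`), the 5-fold setting, and `handle_unknown="ignore"` are all hypothetical choices made for the example. Only the Ridge `alpha` of 10, the two metrics, and the output file names come from the description.

```python
import os
import tempfile

import joblib
import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_validate
from sklearn.preprocessing import OneHotEncoder

# Hypothetical column names; the real competition data will differ.
CAT_COLS = ["Neighborhood"]
NUM_COLS = ["LotArea", "YearBuilt"]
TARGET = "SalePrice"

work_dir = tempfile.mkdtemp()
rng = np.random.default_rng(0)


def write_synthetic_csv(path, n_rows, with_target):
    """Write a small synthetic stand-in for a competition CSV file."""
    lot_area = rng.uniform(1000, 10000, n_rows)
    df = pd.DataFrame({
        "Id": np.arange(n_rows),
        "LotArea": lot_area,
        "YearBuilt": rng.integers(1950, 2010, n_rows),
        "Neighborhood": rng.choice(["North", "South", "East"], n_rows),
    })
    if with_target:
        df[TARGET] = 50_000 + 20 * lot_area + rng.normal(0, 5_000, n_rows)
    df.loc[df.index[::10], "LotArea"] = np.nan  # simulate missing values
    df.to_csv(path, index=False)


def build_features(df, encoder):
    """Combine numeric columns with one-hot columns and fill missing values with 0."""
    encoded = encoder.transform(df[CAT_COLS]).toarray()
    numeric = df[NUM_COLS].to_numpy(dtype=float)
    combined = np.hstack([numeric, encoded])
    return np.where(np.isnan(combined), 0.0, combined)


# Read the training and testing data from CSV files.
train_path = os.path.join(work_dir, "train.csv")
test_path = os.path.join(work_dir, "test.csv")
write_synthetic_csv(train_path, 200, with_target=True)
write_synthetic_csv(test_path, 50, with_target=False)
train = pd.read_csv(train_path)
test = pd.read_csv(test_path)

# Fit the encoder on the training categories only, then reuse it for the test set.
encoder = OneHotEncoder(handle_unknown="ignore")
encoder.fit(train[CAT_COLS])

X_train = build_features(train, encoder)
y_train = train[TARGET].to_numpy()
X_test = build_features(test, encoder)

# Train the Ridge model and save it.
model = Ridge(alpha=10)
model.fit(X_train, y_train)
model_path = os.path.join(work_dir, "model.joblib")
joblib.dump(model, model_path)

# Cross-validate on the training data (cross_validate clones the estimator per fold).
scores = cross_validate(
    model, X_train, y_train, cv=5,
    scoring=("neg_root_mean_squared_error", "r2"),
)
print("Mean negative RMSE:", scores["test_neg_root_mean_squared_error"].mean())
print("Mean R-squared:", scores["test_r2"].mean())

# Predict on the test set and write the submission file.
predictions = model.predict(X_test)
submission_path = os.path.join(work_dir, "submission.csv")
pd.DataFrame({"Id": test["Id"], TARGET: predictions}).to_csv(submission_path, index=False)
```

The sketch writes its files to a temporary directory so it can be run without the competition data. Fitting the encoder on the training set and reusing it for the test set keeps the feature columns aligned between the two, which is one reasonable way to handle categories that appear in only one file.
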

This project is licensed under the MIT License. See the LICENSE file for details.