Submit this code as a pull request back to GitHub Classroom by the date and time listed in Canvas.
For this assignment, use the dataset called radar_parameters.csv
provided in the GitHub repository in the folder homework
.
The provided data has
The training data consists of polarimetric radar parameters calculated from a disdrometer (an instrument that measures rain drop sizes, shapes, and rainfall rate) measurements from several years in Huntsville, Alabama. A model called pytmatrix
is used to calculate polarimetric radar parameters from the droplet observations, which can be used as a way to compare what a remote sensing instrument would see and rainfall.
Features (radar measurements):
Zh
- radar reflectivity factor (dBZ) - use the formula
Zdr
- differential reflectivity
Ldr
- linear depolarization ratio
Kdp
- specific differential phase
Ah
- specific attenuation
Adp
- differential attenuation
Target :
R
- rain rate
-
Split the data into a 70-30 split for training and testing data.
-
Using the split created in (1), train a multiple linear regression dataset using the training dataset, and validate it using the testing dataset. Compare the
$R^2$ and root mean square errors of model on the training and testing sets to a baseline prediction of rain rate using the formula$Z = 200 R^{1.6}$ . -
Repeat 1 doing a grid search over polynomial orders, using a grid search over orders 0-21, and use cross-validation of 7 folds. For the best polynomial model in terms of
$R^2$ , does it outperform the baseline and the linear regression model in terms of$R^2$ and root mean square error? -
Repeat 1 with a Random Forest Regressor, and perform a grid_search on the following parameters:
{'bootstrap': [True, False], 'max_depth': [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, None], 'max_features': ['auto', 'sqrt'], 'min_samples_leaf': [1, 2, 4], 'min_samples_split': [2, 5, 10], 'n_estimators': [200, 400, 600, 800, 1000, 1200, 1400, 1600, 1800, 2000]}
Can you beat the baseline, or the linear regression, or best polynomial model with the best optimized Random Forest Regressor in terms of