ATMS 523

Module 4 Project

Submit this code as a pull request back to GitHub Classroom by the date and time listed in Canvas.

For this assignment, use the dataset called radar_parameters.csv provided in the GitHub repository in the folder homework.

The provided data are described below.

Dataset Description

The training data consist of polarimetric radar parameters calculated from disdrometer measurements (a disdrometer is an instrument that measures raindrop sizes, shapes, and rainfall rate) collected over several years in Huntsville, Alabama. A model called pytmatrix is used to calculate polarimetric radar parameters from the droplet observations, providing a way to relate what a remote sensing instrument would observe to the measured rainfall.

Data columns

Features (radar measurements):

Zh - radar reflectivity factor (dBZ) - convert between linear and logarithmic units with $dBZ = 10\log_{10}(Z)$ (see the loading sketch after this list)

Zdr - differential reflectivity

Ldr - linear depolarization ratio

Kdp - specific differential phase

Ah - specific attenuation

Adp - differential attenuation

Target:

R - rain rate
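
The sketch below shows one way to load the dataset and separate the features from the target; the relative path homework/radar_parameters.csv and the exact column names are assumptions based on the description above. The last line inverts the dBZ relation to recover linear reflectivity, which the $Z$-$R$ baseline in step 2 of the assignment needs.

```python
# Minimal loading sketch; the path and column names below are assumed from the
# assignment description and may need adjusting to match the actual file.
import pandas as pd

df = pd.read_csv("homework/radar_parameters.csv")

feature_cols = ["Zh", "Zdr", "Ldr", "Kdp", "Ah", "Adp"]  # radar measurements
X = df[feature_cols]
y = df["R"]                                              # rain rate target

# Zh is given in dBZ; invert dBZ = 10*log10(Z) to get linear reflectivity Z,
# which the Z = 200 R^1.6 baseline works with.
Z_linear = 10.0 ** (df["Zh"] / 10.0)
```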

Your assignment

  1. Split the data 70-30 into training and testing sets.

  2. Using the split created in (1), train a multiple linear regression model on the training dataset and validate it with the testing dataset. Compare the $R^2$ and root mean square errors of the model on the training and testing sets to a baseline prediction of rain rate from the relation $Z = 200 R^{1.6}$ (a sketch covering steps 1-2 follows this list).

  3. Repeat (2), this time doing a grid search over polynomial orders 0-21 with 7-fold cross-validation (see the polynomial sketch after this list). Does the best polynomial model in terms of $R^2$ outperform the baseline and the linear regression model in terms of $R^2$ and root mean square error?

  4. Repeat (2) with a Random Forest Regressor, performing a grid search over the following parameters (see the Random Forest sketch after this list):

    {'bootstrap': [True, False],  
    'max_depth': [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, None],  
    'max_features': ['auto', 'sqrt'],  
    'min_samples_leaf': [1, 2, 4],  
    'min_samples_split': [2, 5, 10],  
    'n_estimators': [200, 400, 600, 800, 1000, 1200, 1400, 1600, 1800, 2000]}

Can you beat the baseline, the linear regression model, or the best polynomial model with the optimized Random Forest Regressor in terms of $R^2$ and root mean square error?
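
A minimal sketch of steps 1 and 2, assuming scikit-learn and the `X`, `y`, and `Z_linear` variables from the loading sketch above. The baseline inverts $Z = 200 R^{1.6}$ to $R = (Z/200)^{1/1.6}$ and predicts rain rate directly from reflectivity; the choice of `random_state=42` is arbitrary.

```python
# Sketch for steps 1-2: 70-30 split, multiple linear regression, and the Z-R baseline.
# Assumes X, y, and Z_linear from the loading sketch earlier in this document.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Multiple linear regression fit on the training split only.
mlr = LinearRegression().fit(X_train, y_train)

for name, X_, y_ in [("train", X_train, y_train), ("test", X_test, y_test)]:
    pred = mlr.predict(X_)
    rmse = np.sqrt(mean_squared_error(y_, pred))
    print(f"MLR {name}: R^2 = {r2_score(y_, pred):.3f}, RMSE = {rmse:.3f}")

# Baseline: invert Z = 200 R^1.6  ->  R = (Z / 200) ** (1 / 1.6), with Z in linear units,
# evaluated on the same test rows as the regression model.
R_baseline = (Z_linear.loc[X_test.index] / 200.0) ** (1.0 / 1.6)
rmse_base = np.sqrt(mean_squared_error(y_test, R_baseline))
print(f"Z-R baseline (test): R^2 = {r2_score(y_test, R_baseline):.3f}, RMSE = {rmse_base:.3f}")
```

Fixing `random_state` keeps the 70-30 split reproducible, so the same test set can be reused when comparing the polynomial and Random Forest models in steps 3 and 4.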
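For step 3, one reasonable reading is a scikit-learn pipeline of `PolynomialFeatures` followed by linear regression, with `GridSearchCV` selecting the order by cross-validated $R^2$; the pipeline structure and scoring choice are assumptions, not the only way to set this up.

```python
# Sketch for step 3: grid search over polynomial orders 0-21 with 7-fold cross-validation.
# Assumes X_train, X_test, y_train, y_test from the steps 1-2 sketch above.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures

poly_pipe = Pipeline([
    ("poly", PolynomialFeatures()),   # expands the features to the given polynomial order
    ("linreg", LinearRegression()),
])

# Note: with all six features, high orders generate very many polynomial terms,
# so this search can be slow and memory-hungry.
param_grid = {"poly__degree": list(range(0, 22))}
search = GridSearchCV(poly_pipe, param_grid, cv=7, scoring="r2")
search.fit(X_train, y_train)

best_poly = search.best_estimator_
pred = best_poly.predict(X_test)
rmse = np.sqrt(mean_squared_error(y_test, pred))
print(f"Best polynomial order: {search.best_params_['poly__degree']}")
print(f"Polynomial test: R^2 = {r2_score(y_test, pred):.3f}, RMSE = {rmse:.3f}")
```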
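For step 4, a sketch of the grid search using the parameter grid given above; reusing 7-fold cross-validation and $R^2$ scoring are assumptions carried over from step 3. Recent scikit-learn releases no longer accept 'auto' for max_features, so the sketch spells it as 1.0 (the regression equivalent), and the full grid has 3,960 combinations, so expect a long run time or consider RandomizedSearchCV.

```python
# Sketch for step 4: grid search over the Random Forest hyperparameters from the assignment.
# Assumes X_train, X_test, y_train, y_test from the steps 1-2 sketch above.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import GridSearchCV

param_grid = {
    "bootstrap": [True, False],
    "max_depth": [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, None],
    # The assignment lists 'auto', which recent scikit-learn removed; for regression
    # 'auto' meant all features, i.e. max_features=1.0.
    "max_features": [1.0, "sqrt"],
    "min_samples_leaf": [1, 2, 4],
    "min_samples_split": [2, 5, 10],
    "n_estimators": [200, 400, 600, 800, 1000, 1200, 1400, 1600, 1800, 2000],
}

# 3,960 parameter combinations times 7 folds is a large search; n_jobs=-1 parallelizes it.
search = GridSearchCV(RandomForestRegressor(random_state=42), param_grid,
                      cv=7, scoring="r2", n_jobs=-1)
search.fit(X_train, y_train)

best_rf = search.best_estimator_
pred = best_rf.predict(X_test)
rmse = np.sqrt(mean_squared_error(y_test, pred))
print(f"Best params: {search.best_params_}")
print(f"Random Forest test: R^2 = {r2_score(y_test, pred):.3f}, RMSE = {rmse:.3f}")
```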