|
61 | 61 | cv_results[columns].sort_values(by="rank_test_score") |
62 | 62 |
|
63 | 63 | # %% [markdown] |
64 | | -# We can observe that in our grid-search, the largest `max_depth` together |
65 | | -# with the largest `n_estimators` led to the best generalization performance. |
66 | | -# |
| 64 | +# We can observe that in our grid-search, the largest `max_depth` together with |
| 65 | +# the largest `n_estimators` led, on average, to the best performance on the |
| 66 | +# validation sets. Now we will estimate the generalization performance of the |
| 67 | +# best model by refitting it with the full training set and using the test set |
| 68 | +# for scoring on unseen data. The refit on the full training set is done by |
| 69 | +# default (`refit=True`) when calling the `.fit` method of `GridSearchCV`. |
| 70 | + |
| 71 | +# %% |
| 72 | +error = -grid_search.score(data_test, target_test) |
| 73 | +print(f"On average, our random forest regressor makes an error of {error:.2f} k$") |
| 74 | + |
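| | +# %% [markdown] |
| | +# If needed, we can also inspect which hyperparameter combination was selected |
| | +# and used for this refitted model. This is a minimal sketch assuming |
| | +# `grid_search` is the fitted `GridSearchCV` instance from the previous cells. |
| | + |
| | +# %% |
| | +grid_search.best_params_ |
| | + |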
| 75 | +# %% [markdown] |
67 | 76 | # ## Gradient-boosting decision trees |
68 | 77 | # |
69 | | -# For gradient-boosting, parameters are coupled, so we cannot set the |
70 | | -# parameters one after the other anymore. The important parameters are |
71 | | -# `n_estimators`, `max_depth`, and `learning_rate`. |
| 78 | +# For gradient-boosting, parameters are coupled, so we cannot set the parameters |
| 79 | +# one after the other anymore. The important parameters are `n_estimators`, |
| 80 | +# `max_depth`, and `learning_rate`. |
72 | 81 | # |
73 | | -# Let's first discuss the `max_depth` parameter. |
74 | | -# We saw in the section on gradient-boosting that the algorithm fits the error |
75 | | -# of the previous tree in the ensemble. Thus, fitting fully grown trees will |
76 | | -# be detrimental. |
77 | | -# Indeed, the first tree of the ensemble would perfectly fit (overfit) the data |
78 | | -# and thus no subsequent tree would be required, since there would be no |
79 | | -# residuals. |
| 82 | +# Let's first discuss the `max_depth` parameter. We saw in the section on |
| 83 | +# gradient-boosting that the algorithm fits the error of the previous tree in |
| 84 | +# the ensemble. Thus, fitting fully grown trees will be detrimental. Indeed, the |
| 85 | +# first tree of the ensemble would perfectly fit (overfit) the data and thus no |
| 86 | +# subsequent tree would be required, since there would be no residuals. |
80 | 87 | # Therefore, the tree used in gradient-boosting should have a low depth, |
81 | 88 | # typically between 3 and 8 levels. Having very weak learners at each step |
82 | 89 | # helps reduce overfitting. |
|
85 | 92 | # residuals will be corrected and fewer learners are required. Therefore, |
86 | 93 | # `n_estimators` should be increased if `max_depth` is lower. |
87 | 94 | # |
88 | | -# Finally, we have overlooked the impact of the `learning_rate` parameter |
89 | | -# until now. When fitting the residuals, we would like the tree |
90 | | -# to try to correct all possible errors or only a fraction of them. |
91 | | -# The learning-rate allows you to control this behaviour. |
92 | | -# A small learning-rate value would only correct the residuals of very few |
93 | | -# samples. If a large learning-rate is set (e.g., 1), we would fit the |
94 | | -# residuals of all samples. So, with a very low learning-rate, we will need |
95 | | -# more estimators to correct the overall error. However, a too large |
96 | | -# learning-rate tends to obtain an overfitted ensemble, |
97 | | -# similar to having a too large tree depth. |
| 95 | +# Finally, we have overlooked the impact of the `learning_rate` parameter until |
| 96 | +# now. When fitting the residuals, we would like each tree to correct either |
| 97 | +# all of the remaining error or only a fraction of it. The learning-rate |
| 98 | +# controls this behaviour: the contribution of each new tree is scaled by this |
| 99 | +# factor. A small learning-rate only corrects a small fraction of the residual |
| 100 | +# error at each iteration, while a learning-rate of 1 tries to fit the |
| 101 | +# residuals entirely. So, with a very low learning-rate, we will need more |
| 102 | +# estimators to correct the overall error. However, a too large learning-rate |
| 103 | +# tends to produce an overfitted ensemble, similar to a too large tree depth. |
98 | 104 |
|
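| | +# %% [markdown] |
| | +# The small sketch below illustrates the learning-rate trade-off described |
| | +# above. It assumes the `data_train`, `target_train`, `data_test` and |
| | +# `target_test` variables created by the train/test split earlier in this |
| | +# notebook. We fit two models built from shallow trees that differ only in |
| | +# their `learning_rate` and compare their errors. |
| | + |
| | +# %% |
| | +from sklearn.ensemble import GradientBoostingRegressor |
| | +from sklearn.metrics import mean_absolute_error |
| | + |
| | +for learning_rate in (0.05, 1.0): |
| | +    # Keep the tree depth and the number of trees fixed so that only the |
| | +    # learning-rate changes between the two models. |
| | +    gbdt = GradientBoostingRegressor( |
| | +        max_depth=3, n_estimators=100, learning_rate=learning_rate, random_state=0 |
| | +    ) |
| | +    gbdt.fit(data_train, target_train) |
| | +    train_error = mean_absolute_error(target_train, gbdt.predict(data_train)) |
| | +    test_error = mean_absolute_error(target_test, gbdt.predict(data_test)) |
| | +    print( |
| | +        f"learning_rate={learning_rate}: " |
| | +        f"train error {train_error:.2f} k$, test error {test_error:.2f} k$" |
| | +    ) |
| | + |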
99 | 105 | # %% |
100 | 106 | from sklearn.ensemble import GradientBoostingRegressor |
|
121 | 127 | # Here, we tune `n_estimators` but be aware that it is better to use |
122 | 128 | # early-stopping as in the previous exercise (a sketch follows this note). |
123 | 129 | # ``` |
| 130 | + |
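| | +# %% [markdown] |
| | +# For reference, the sketch below shows one possible way of relying on |
| | +# early-stopping instead of tuning `n_estimators` explicitly, using the |
| | +# `n_iter_no_change` and `validation_fraction` parameters of |
| | +# `GradientBoostingRegressor`. It assumes the `data_train` and `target_train` |
| | +# variables from the train/test split earlier in this notebook; the exact setup |
| | +# used in the previous exercise may differ. |
| | + |
| | +# %% |
| | +from sklearn.ensemble import GradientBoostingRegressor |
| | + |
| | +gbdt_early_stopping = GradientBoostingRegressor( |
| | +    max_depth=3, |
| | +    n_estimators=1_000,  # large upper bound; early-stopping decides when to stop |
| | +    learning_rate=0.1, |
| | +    n_iter_no_change=5,  # stop when the validation score stops improving |
| | +    validation_fraction=0.1,  # fraction of the training set used for validation |
| | +    random_state=0, |
| | +) |
| | +gbdt_early_stopping.fit(data_train, target_train) |
| | +gbdt_early_stopping.n_estimators_  # number of trees actually fitted |
| | + |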
| 131 | +# %% [markdown] |
| 132 | +# Now we estimate the generalization performance of the best model |
| 133 | +# using the test set. |
| 134 | + |
| 135 | +# %% |
| 136 | +error = -grid_search.score(data_test, target_test) |
| 137 | +print(f"On average, our GBDT regressor makes an error of {error:.2f} k$") |
| 138 | + |
| 139 | +# %% [markdown] |
| 140 | +# The score on the held-out test set is slightly better than the mean |
| 141 | +# cross-validated score of the best model. The reason is that the final model |
| 142 | +# is refitted on the whole training set and is therefore trained on more data |
| 143 | +# than the inner cross-validated models of the grid search procedure. |
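| | + |
| | +# %% [markdown] |
| | +# As a rough check of this explanation, we can compare the mean cross-validated |
| | +# error of the best parameter combination with the error measured on the test |
| | +# set. This minimal sketch assumes that, as suggested by the negation used in |
| | +# the scoring cells above, the grid-search was set up with a negated-error |
| | +# scorer such as `"neg_mean_absolute_error"`. |
| | + |
| | +# %% |
| | +best_cv_error = -grid_search.best_score_ |
| | +print(f"Best mean cross-validated error: {best_cv_error:.2f} k$") |
| | +print(f"Error on the held-out test set: {error:.2f} k$") |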