mlpWeightDecayML: problem with random search and adaptive resampling #1354

Open · Rek27 opened this issue Feb 5, 2024 · 1 comment

Rek27 commented Feb 5, 2024

I am trying to do hyperparameter tuning using the random search algorithm. When trying out the mlpWeightDecayML model, there seems to be an underlying problem with how the search algorithm interacts with this model.

Note: I have noticed this only with this model; every other model I tried worked fine. I have also been experimenting with adaptive resampling, and the same problem occurs there.
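
For reference, an adaptive-resampling setup that should exercise the same code path would look roughly like this (a sketch; the adaptive parameters are illustrative, not the exact values I used):

data(iris)
set.seed(6)
caretModel = caret::train(
    x = iris[, -ncol(iris)],
    y = iris[, ncol(iris)],
    method = "mlpWeightDecayML",
    tuneLength = 5,
    trControl = caret::trainControl(
        method = "adaptive_cv",
        search = "random",
        number = 5,
        adaptive = list(min = 3, alpha = 0.05, method = "gls", complete = TRUE),
        verboseIter = TRUE
    )
)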

If I set the seed to 5 and take a look at the output, everything looks normal:

data(iris)
set.seed(5)
caretModel = caret::train(
    x = iris[, -ncol(iris)],
    y = iris[, ncol(iris)],
    method = "mlpWeightDecayML",
    tuneLength = 5,
    trControl = caret::trainControl(method="cv", search="random", number=2, verboseIter = TRUE)
)
# + Fold1: layer1=16, layer2= 3, layer3= 5, decay=1.538e-05 
# - Fold1: layer1=16, layer2= 3, layer3= 5, decay=1.538e-05 
# + Fold1: layer1=12, layer2= 6, layer3= 9, decay=1.222e-05
# - Fold1: layer1=12, layer2= 6, layer3= 9, decay=1.222e-05 
# + Fold1: layer1=10, layer2=15, layer3=10, decay=8.376e-03
# - Fold1: layer1=10, layer2=15, layer3=10, decay=8.376e-03 
# + Fold1: layer1= 8, layer2=12, layer3=19, decay=3.723e-02
# - Fold1: layer1= 8, layer2=12, layer3=19, decay=3.723e-02 
# + Fold1: layer1=20, layer2=16, layer3=17, decay=3.865e-02
# - Fold1: layer1=20, layer2=16, layer3=17, decay=3.865e-02 
# + Fold2: layer1=16, layer2= 3, layer3= 5, decay=1.538e-05
# - Fold2: layer1=16, layer2= 3, layer3= 5, decay=1.538e-05 
# + Fold2: layer1=12, layer2= 6, layer3= 9, decay=1.222e-05
# - Fold2: layer1=12, layer2= 6, layer3= 9, decay=1.222e-05 
# + Fold2: layer1=10, layer2=15, layer3=10, decay=8.376e-03
# - Fold2: layer1=10, layer2=15, layer3=10, decay=8.376e-03 
# + Fold2: layer1= 8, layer2=12, layer3=19, decay=3.723e-02
# - Fold2: layer1= 8, layer2=12, layer3=19, decay=3.723e-02 
# + Fold2: layer1=20, layer2=16, layer3=17, decay=3.865e-02
# - Fold2: layer1=20, layer2=16, layer3=17, decay=3.865e-02 
# Aggregating results
# Selecting tuning parameters
# Fitting layer1 = 12, layer2 = 6, layer3 = 9, decay = 1.22e-05 on full training set

If I try a different seed, for example 6, it doesn't seem to work:

data(iris)
set.seed(6)
caretModel = caret::train(
    x = iris[, -ncol(iris)],
    y = iris[, ncol(iris)],
    method = "mlpWeightDecayML",
    tuneLength = 5,
    trControl = caret::trainControl(method="cv", search="random", number=2, verboseIter = TRUE)
)
# + Fold1: layer1=14, layer2= 2, layer3= 0, decay=3.063e+00 
# + Fold1: layer1=15, layer2= 2, layer3=10, decay=4.430e-04 
# - Fold1: layer1=15, layer2= 2, layer3=10, decay=4.430e-04 
# + Fold1: layer1= 5, layer2= 3, layer3= 7, decay=3.981e-03
# - Fold1: layer1= 5, layer2= 3, layer3= 7, decay=3.981e-03 
# + Fold1: layer1= 4, layer2= 8, layer3= 2, decay=1.351e-02
# - Fold1: layer1= 4, layer2= 8, layer3= 2, decay=1.351e-02 
# + Fold1: layer1= 3, layer2=17, layer3=18, decay=5.715e-05 
# - Fold1: layer1= 3, layer2=17, layer3=18, decay=5.715e-05 
# + Fold2: layer1=14, layer2= 2, layer3= 0, decay=3.063e+00
# + Fold2: layer1=15, layer2= 2, layer3=10, decay=4.430e-04 
# - Fold2: layer1=15, layer2= 2, layer3=10, decay=4.430e-04 
# + Fold2: layer1= 5, layer2= 3, layer3= 7, decay=3.981e-03
# - Fold2: layer1= 5, layer2= 3, layer3= 7, decay=3.981e-03
# + Fold2: layer1= 4, layer2= 8, layer3= 2, decay=1.351e-02
# - Fold2: layer1= 4, layer2= 8, layer3= 2, decay=1.351e-02
# + Fold2: layer1= 3, layer2=17, layer3=18, decay=5.715e-05
# - Fold2: layer1= 3, layer2=17, layer3=18, decay=5.715e-05
# Error in { :
#   task 1 failed - "arguments imply differing number of rows: 0, 75"
# In addition: Warning messages:
# 1: At least one layer had zero units and were removed. The new structure is 14->2   
# 2: At least one layer had zero units and were removed. The new structure is 14->2 

After inspecting the output, I noticed something strange. In the first example the tuning iterations look normal: each one starts and then finishes ('+' and then '-' at the beginning of the line). In the second example, however, the output is not chronological: the first two lines are the starts of two iterations (the first one never finished), and the same thing happens at the beginning of Fold2. I am not sure what the underlying problem is. I have tried different datasets and different trainControl parameters, but the pattern holds: whenever this error happens, the iteration output is out of order.
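
If my reading of the log is right, the failing combination is the one with layer3 = 0 (note the warning about a zero-unit layer being removed). Under that assumption, pinning the tuning grid to the failing values copied from the log should reproduce the error deterministically, with no dependence on the seed of the random search:

data(iris)
grid = expand.grid(layer1 = 14, layer2 = 2, layer3 = 0, decay = 3.063)
caretModel = caret::train(
    x = iris[, -ncol(iris)],
    y = iris[, ncol(iris)],
    method = "mlpWeightDecayML",
    tuneGrid = grid,
    trControl = caret::trainControl(method = "cv", number = 2, verboseIter = TRUE)
)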

Session Info:

> sessionInfo()
R version 4.2.2 (2022-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.utf8
[2] LC_CTYPE=English_United States.utf8
[3] LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.utf8    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] caret_6.0-94    lattice_0.20-45 ggplot2_3.4.4   reprex_2.1.0

loaded via a namespace (and not attached):
 [1] httr_1.4.7           jsonlite_1.8.8       splines_4.2.2
 [4] foreach_1.5.2        R.utils_2.12.3       prodlim_2023.08.28
 [7] stats4_4.2.2         yaml_2.3.8           globals_0.16.2      
[10] ipred_0.9-14         RSNNS_0.4-17         pillar_1.9.0
[13] glue_1.6.2           pROC_1.18.5          digest_0.6.33
[16] hardhat_1.3.0        colorspace_2.1-0     recipes_1.0.9
[19] htmltools_0.5.7      Matrix_1.5-1         R.oo_1.25.0
[22] plyr_1.8.9           timeDate_4032.109    clipr_0.8.0
[25] pkgconfig_2.0.3      listenv_0.9.0        purrr_1.0.2
[28] scales_1.3.0         processx_3.8.2       gower_1.0.1
[31] lava_1.7.3           proxy_0.4-27         timechange_0.2.0    
[34] tibble_3.2.1         styler_1.10.2        generics_0.1.3
[37] withr_2.5.2          nnet_7.3-18          cli_3.6.1
[40] survival_3.4-0       magrittr_2.0.3       evaluate_0.23
[43] ps_1.7.5             R.methodsS3_1.8.2    fs_1.6.3
[46] fansi_1.0.5          future_1.33.1        parallelly_1.36.0
[49] R.cache_0.16.0       nlme_3.1-160         MASS_7.3-58.1
[52] class_7.3-20         tools_4.2.2          data.table_1.14.8   
[55] lifecycle_1.0.4      stringr_1.5.1        munsell_0.5.0
[58] callr_3.7.3          e1071_1.7-13         compiler_4.2.2
[61] rlang_1.1.2          grid_4.2.2           iterators_1.0.14
[64] rstudioapi_0.15.0    rmarkdown_2.25       gtable_0.3.4
[67] ModelMetrics_1.2.2.2 codetools_0.2-18     reshape2_1.4.4
[70] R6_2.5.1             lubridate_1.9.3      knitr_1.45
[73] dplyr_1.1.4          fastmap_1.1.1        future.apply_1.11.1
[76] utf8_1.2.4           stringi_1.8.2        parallel_4.2.2
[79] Rcpp_1.0.11          vctrs_0.6.5          rpart_4.1.19
[82] tidyselect_1.2.0     xfun_0.41
Rek27 (Author) commented Feb 5, 2024

In addition, it seems that the problem happens when decay is very large; that could be the cause of the issue.
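
A quick way to test that hypothesis (hedged; I have not confirmed that decay alone is the trigger) is to pin nonzero layer sizes and sweep only decay, catching errors per value:

data(iris)
set.seed(6)
for (d in c(1e-4, 1e-2, 1, 3.063)) {
    grid = expand.grid(layer1 = 14, layer2 = 2, layer3 = 5, decay = d)
    fit = tryCatch(
        caret::train(
            x = iris[, -ncol(iris)],
            y = iris[, ncol(iris)],
            method = "mlpWeightDecayML",
            tuneGrid = grid,
            trControl = caret::trainControl(method = "cv", number = 2)
        ),
        error = function(e) conditionMessage(e)
    )
    # failures print the captured error message; successful fits print "ok"
    cat("decay =", d, "->", if (is.character(fit)) fit else "ok", "\n")
}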
