
Commit e4258f6

merge master
Merge branch 'master' into fs-ensemble

# Conflicts:
#	docs/articles/tutorial/create_filter_files/figure-html/unnamed-chunk-5-1.png
#	docs/articles/tutorial/feature_selection_files/figure-html/unnamed-chunk-4-1.png
2 parents 2562572 + 0fd1c73 commit e4258f6


52 files changed, with 143 additions and 224 deletions

.github/PULL_REQUEST_TEMPLATE.md

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 We are always happy to receive pull requests.

 Please make sure you have read our coding guidelines:
-https://github.com/mlr-org/mlr/wiki/mlr-Coding-Guidelines
+https://www.notion.so/mlrorg/Style-Guide-740bc663207a4bbb9a457987bda6fd91

 This especially means that you have understood:

NEWS.md

Lines changed: 7 additions & 4 deletions
@@ -11,10 +11,7 @@
 See `?regr.randomForest` for more details.
 `regr.ranger` relies on the functions provided by the package ("jackknife" and "infjackknife" (default))
 (@jakob-r, #1784)
-
-## functions - general
-- `getClassWeightParam()` now also works for Wrapper* Models and ensemble models (@ja-thomas, #891)
-- added `getLearnerNote()` to query the "Note" slot of a learner (@alona-sydorova, #2086)
+- `e1071::svm()` now only uses the formula interface if factors are present. This change is supposed to prevent from "stack overflow" issues some users encountered when using large datasets. See #1738 for more information. (@mb706, #1740)

 ## learners - new
 - add learner `cluster.MiniBatchKmeans` from package _ClusterR_ (@Prasiddhi, #2554)
@@ -23,6 +20,12 @@
 - `plotHyperParsEffect()` now supports facet visualization of hyperparam effects for nested cv (@MasonGallo, #1653)
 - fixed a bug that caused an incorrect aggregation of probabilities in some cases. The bug existed since quite some time and was exposed due to the change of `data.table`s default in `rbindlist()`. See #2578 for more information. (@mllg, #2579)
 - fixed a bug in which `options(on.learner.error)` was not respected in `benchmark()`. This caused `benchmark()` to stop even if it should have continued including `FailureModels` in the result (@dagola, #1984)
+- `getClassWeightParam()` now also works for Wrapper* Models and ensemble models (@ja-thomas, #891)
+- added `getLearnerNote()` to query the "Note" slot of a learner (@alona-sydorova, #2086)
+
+## filters - general
+
+- Filter `praznik_mrmr` also supports `regr` and `surv` tasks

 # mlr 2.14.0

R/Filter.R

Lines changed: 1 addition & 1 deletion
@@ -1003,7 +1003,7 @@ makeFilter(
   name = "praznik_MRMR",
   desc = "Minimum redundancy maximal relevancy filter",
   pkg = "praznik",
-  supported.tasks = "classif",
+  supported.tasks = c("classif", "regr", "surv"),
   supported.features = c("numerics", "factors", "integer", "character", "logical"),
   fun = praznik_filter("MRMR")
 )

R/RLearner_classif_cforest.R

Lines changed: 3 additions & 0 deletions
@@ -40,6 +40,8 @@ trainLearner.classif.cforest = function(.learner, .task, .subset,

   f = getTaskFormula(.task)
   d = getTaskData(.task, .subset)
+
+  # default handling necessary because the default of controls is `cforest_unbiased()` which does not allow all parameters (e.g. replace)
   defaults = getDefaults(getParamSet(.learner))
   if (missing(teststat)) teststat = defaults$teststat
   if (missing(testtype)) testtype = defaults$testtype
@@ -50,6 +52,7 @@ trainLearner.classif.cforest = function(.learner, .task, .subset,
     fraction, trace, teststat, testtype, mincriterion,
     minsplit, minbucket, stump, nresample, maxsurrogate,
     maxdepth, savesplitstats)
+
   party::cforest(f, data = d, controls = ctrl, weights = .weights, ...)
 }
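The `missing()` checks in this hunk can be illustrated in isolation. The following is a minimal base-R sketch, not mlr code: `train_sketch` and the `defaults` list are hypothetical names, and the values are made up for the example rather than taken from the learner's parameter set.

```r
# Hypothetical stand-in for a train function: arguments the caller omits
# are filled from an explicit list of defaults.
train_sketch = function(defaults, teststat, testtype) {
  if (missing(teststat)) teststat = defaults$teststat
  if (missing(testtype)) testtype = defaults$testtype
  list(teststat = teststat, testtype = testtype)
}

# Illustrative values only; not claimed to be the learner's real defaults.
defaults = list(teststat = "quad", testtype = "Univariate")

res_default = train_sketch(defaults)                   # both values come from `defaults`
res_custom = train_sketch(defaults, teststat = "max")  # only `testtype` comes from `defaults`
```

Filling the values explicitly means the control object always receives the defaults declared by the wrapper, whatever the wrapped function would otherwise assume.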

R/RLearner_classif_svm.R

Lines changed: 10 additions & 3 deletions
@@ -28,9 +28,16 @@ makeRLearner.classif.svm = function() {
 }

 #' @export
-trainLearner.classif.svm = function(.learner, .task, .subset, .weights = NULL, ...) {
-  f = getTaskFormula(.task)
-  e1071::svm(f, data = getTaskData(.task, .subset), probability = .learner$predict.type == "prob", ...)
+trainLearner.classif.svm = function(.learner, .task, .subset, .weights = NULL, ...) {
+  if (sum(getTaskDesc(.task)$n.feat[c("factors", "ordered")]) > 0) {
+    # use formula interface if factors are present
+    f = getTaskFormula(.task)
+    e1071::svm(f, data = getTaskData(.task, .subset), probability = .learner$predict.type == "prob", ...)
+  } else {
+    # use the "data.frame" approach if no factors are present to prevent issues like https://github.com/mlr-org/mlr/issues/1738
+    d = getTaskData(.task, .subset, target.extra = TRUE)
+    e1071::svm(d$data, d$target, probability = .learner$predict.type == "prob", ...)
+  }
 }

 #' @export
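The branch condition in this hunk can be sketched outside mlr. This is a rough base-R illustration under stated assumptions: `has_factor_features` is a hypothetical helper (it is not part of mlr), and plain data frames stand in for tasks. The two `e1071::svm()` calls at the end only run if the e1071 package is installed.

```r
# Hypothetical helper: TRUE if any feature column is a factor.
# is.factor() is also TRUE for ordered factors, so both kinds are covered.
has_factor_features = function(data, target) {
  feats = data[, setdiff(names(data), target), drop = FALSE]
  any(vapply(feats, is.factor, logical(1)))
}

use_formula_iris = has_factor_features(iris, "Species")      # four numeric features
use_formula_warp = has_factor_features(warpbreaks, "breaks") # wool and tension are factors

# The two interfaces the diff switches between.
if (requireNamespace("e1071", quietly = TRUE)) {
  m_formula = e1071::svm(Species ~ ., data = iris)           # formula interface
  m_xy = e1071::svm(as.matrix(iris[, 1:4]), iris$Species)    # x/y interface
}
```

With only numeric features, the x/y call avoids building a formula over every column, which is the case the commit message links to issue #1738.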

R/RLearner_regr_cforest.R

Lines changed: 3 additions & 0 deletions
@@ -39,6 +39,8 @@ trainLearner.regr.cforest = function(.learner, .task, .subset, .weights = NULL,

   f = getTaskFormula(.task)
   d = getTaskData(.task, .subset)
+
+  # default handling necessary because the default of controls is `cforest_unbiased()` which does not allow all parameters (e.g. replace)
   defaults = getDefaults(getParamSet(.learner))
   if (missing(teststat)) teststat = defaults$teststat
   if (missing(testtype)) testtype = defaults$testtype
@@ -49,6 +51,7 @@ trainLearner.regr.cforest = function(.learner, .task, .subset, .weights = NULL,
     trace, teststat, testtype, mincriterion,
     minsplit, minbucket, stump,
     nresample, maxsurrogate, maxdepth, savesplitstats)
+
   party::cforest(f, data = d, controls = ctrl, weights = .weights, ...)
 }

R/RLearner_regr_crs.R

Lines changed: 1 addition & 2 deletions
@@ -64,8 +64,7 @@ predictLearner.regr.crs = function(.learner, .model, .newdata, ...) {
     lwr = attr(pred, "lwr")
     attr(pred, "lwr") = NULL
     attr(pred, "upr") = NULL
-    # FIXME: make sure that this is correct, ask Daniel
-    se = (pred - lwr) * sqrt(.model$task.desc$size) / qnorm(0.95)
+    se = (pred - lwr) / qnorm(0.95)
     cbind(pred, se)
   } else {
     pred = predict(.model$learner.model, newdata = .newdata, ...)
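The arithmetic behind the changed `se` line can be checked with made-up numbers. The sketch below assumes the lower bound is built as `pred - qnorm(0.95) * se`; the diff does not say how crs constructs its bounds, so this is an assumption made for illustration only. All values are invented.

```r
z = qnorm(0.95)
pred = c(10, 12)
se_true = c(0.5, 1.0)

# Assumed construction of the lower bound (not confirmed by the diff).
lwr = pred - z * se_true

# New formula from the diff: recovers the standard error under that assumption.
se_new = (pred - lwr) / z

# Old formula from the diff: additionally scales by sqrt(n), n being the task size.
n = 100
se_old = (pred - lwr) * sqrt(n) / z
```

Under this assumption the new expression returns `se_true` exactly, while the old one inflates it by a factor of `sqrt(n)`.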

R/RLearner_regr_svm.R

Lines changed: 8 additions & 3 deletions
@@ -27,9 +27,14 @@ makeRLearner.regr.svm = function() {
 }

 #' @export
-trainLearner.regr.svm = function(.learner, .task, .subset, .weights = NULL, ...) {
-  f = getTaskFormula(.task)
-  e1071::svm(f, data = getTaskData(.task, .subset), ...)
+trainLearner.regr.svm = function(.learner, .task, .subset, .weights = NULL, ...) {
+  if (sum(getTaskDesc(.task)$n.feat[c("factors", "ordered")]) > 0) {
+    f = getTaskFormula(.task)
+    e1071::svm(f, data = getTaskData(.task, .subset), ...)
+  } else {
+    d = getTaskData(.task, .subset, target.extra = TRUE)
+    e1071::svm(d$data, d$target, ...)
+  }
 }

 #' @export

docs/PULL_REQUEST_TEMPLATE.html

Lines changed: 1 addition & 1 deletion
Generated file; diff not rendered by default.

docs/articles/tutorial/create_filter.html

Lines changed: 7 additions & 13 deletions
Generated file; diff not rendered by default.

0 commit comments
