
Commit 0facddd

mb706 authored and vrodriguezf committed
e1071::svm(): Use formula interface only if factors are present (mlr-org#1740)
* Testing svm with many features task
* svm use data.frame instead of formula
* spaces around match operator
* Only use svm data.frame interface if task is all numeric
* Deploy from Travis build 13884 [ci skip]
  Build URL: https://travis-ci.org/mlr-org/mlr/builds/542175846
  Commit: 5565287
* add NEWS entry
* Deploy from Travis build 13922 [ci skip]
  Build URL: https://travis-ci.org/mlr-org/mlr/builds/546742364
  Commit: de67d1a
1 parent 516d916 commit 0facddd

6 files changed, +38 -6 lines changed


NEWS.md

+1
@@ -11,6 +11,7 @@
 See `?regr.randomForest` for more details.
 `regr.ranger` relies on the functions provided by the package ("jackknife" and "infjackknife" (default))
 (@jakob-r, #1784)
+- `e1071::svm()` now only uses the formula interface if factors are present. This change is intended to prevent the "stack overflow" errors some users encountered with large datasets. See #1738 for more information. (@mb706, #1740)

 ## functions - general
 - `getClassWeightParam()` now also works for Wrapper* Models and ensemble models (@ja-thomas, #891)

R/RLearner_classif_svm.R

+10 -3
@@ -28,9 +28,16 @@ makeRLearner.classif.svm = function() {
 }

 #' @export
-trainLearner.classif.svm = function(.learner, .task, .subset, .weights = NULL, ...) {
-  f = getTaskFormula(.task)
-  e1071::svm(f, data = getTaskData(.task, .subset), probability = .learner$predict.type == "prob", ...)
+trainLearner.classif.svm = function(.learner, .task, .subset, .weights = NULL, ...) {
+  if (sum(getTaskDesc(.task)$n.feat[c("factors", "ordered")]) > 0) {
+    # use formula interface if factors are present
+    f = getTaskFormula(.task)
+    e1071::svm(f, data = getTaskData(.task, .subset), probability = .learner$predict.type == "prob", ...)
+  } else {
+    # use the "data.frame" approach if no factors are present to prevent issues like https://github.com/mlr-org/mlr/issues/1738
+    d = getTaskData(.task, .subset, target.extra = TRUE)
+    e1071::svm(d$data, d$target, probability = .learner$predict.type == "prob", ...)
+  }
 }

 #' @export
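
The branch added above can be illustrated without mlr. The following is a minimal sketch in base R, assuming a plain data.frame instead of an mlr task; `chooseSvmInterface` is a hypothetical helper, not part of mlr or e1071, and it only reports which interface would be picked rather than fitting a model.

```r
# Hypothetical helper (not part of mlr or e1071): report which interface
# would be used, based on whether any feature column is a factor.
chooseSvmInterface = function(data, target) {
  features = data[, setdiff(colnames(data), target), drop = FALSE]
  # is.factor() is TRUE for both unordered and ordered factors
  n.factors = sum(vapply(features, is.factor, logical(1)))
  if (n.factors > 0) "formula" else "data.frame"
}

set.seed(1)
numeric.df = data.frame(a = rnorm(4), b = rnorm(4),
  y = factor(c("u", "v", "u", "v")))
mixed.df = data.frame(a = rnorm(4), b = factor(c("p", "q", "p", "q")),
  y = factor(c("u", "v", "u", "v")))

print(chooseSvmInterface(numeric.df, "y"))  # all-numeric features
print(chooseSvmInterface(mixed.df, "y"))    # one factor feature
```

The target column is excluded before counting, so a factor target in a classification task does not by itself force the formula interface. This mirrors the commit, which counts only entries of `n.feat`.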

R/RLearner_regr_svm.R

+8 -3
@@ -27,9 +27,14 @@ makeRLearner.regr.svm = function() {
 }

 #' @export
-trainLearner.regr.svm = function(.learner, .task, .subset, .weights = NULL, ...) {
-  f = getTaskFormula(.task)
-  e1071::svm(f, data = getTaskData(.task, .subset), ...)
+trainLearner.regr.svm = function(.learner, .task, .subset, .weights = NULL, ...) {
+  if (sum(getTaskDesc(.task)$n.feat[c("factors", "ordered")]) > 0) {
+    f = getTaskFormula(.task)
+    e1071::svm(f, data = getTaskData(.task, .subset), ...)
+  } else {
+    d = getTaskData(.task, .subset, target.extra = TRUE)
+    e1071::svm(d$data, d$target, ...)
+  }
 }

 #' @export
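
In the new `else` branch, the result of `getTaskData(..., target.extra = TRUE)` is accessed as `d$data` and `d$target`. A rough base R sketch of that split is shown below, assuming a plain data.frame in place of a task; `splitTarget` is a hypothetical stand-in and makes no claim about how mlr implements it internally.

```r
# Hypothetical stand-in for getTaskData(..., target.extra = TRUE).
# It works on a plain data.frame, whereas the real function takes an mlr task.
splitTarget = function(data, target, subset = seq_len(nrow(data))) {
  list(
    # feature columns only, restricted to the requested rows
    data = data[subset, setdiff(colnames(data), target), drop = FALSE],
    # target values for the same rows, as a plain vector
    target = data[subset, target]
  )
}

df = data.frame(x1 = c(0.1, 0.4, 0.9), x2 = c(1.5, 1.1, 0.3), y = c(10, 20, 30))
d = splitTarget(df, "y", subset = c(1, 3))

str(d$data)      # two rows, columns x1 and x2
print(d$target)  # target values of rows 1 and 3
```

Keeping features and target separate is what allows the call `e1071::svm(d$data, d$target, ...)` to skip formula handling entirely.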

docs/news/index.html

+2
Generated file, diff not rendered.

tests/testthat/test_classif_svm.R

+8
@@ -54,3 +54,11 @@ test_that("classif_svm", {
   preds = predict(model, multiclass.task)
   expect_lt(performance(preds), 0.3)
 })
+
+test_that("classif_svm with many features", {
+  set.seed(8008135)
+  xt = cbind(as.data.frame(matrix(rnorm(4e4), ncol = 2e4)), x = as.factor(c("a", "b")))
+  xt.task = makeClassifTask("xt", xt, "x")
+  # the given task has many features, the formula interface fails
+  train("classif.svm", xt.task)
+})

tests/testthat/test_regr_svm.R

+9
@@ -30,3 +30,12 @@ test_that("regr_svm", {

   testCVParsets("regr.svm", regr.df, regr.target, tune.train = tt, tune.predict = tp, parset.list = parset.list)
 })
+
+test_that("classif_svm with many features", {
+  set.seed(8008135)
+  xt = cbind(as.data.frame(matrix(rnorm(4e4), ncol = 2e4)), x = 1:2)
+  xt.task = makeRegrTask("xt", xt, "x")
+  # the given task has many features, the formula interface fails
+  train("regr.svm", xt.task)
+})
+
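
To make the shape of the test data concrete, the sketch below rebuilds the classification data set outside of mlr and, if e1071 happens to be installed, fits it through the non-formula interface `svm(x, y)`. The variable names `features`, `target` and `model` are illustrative only, and the `requireNamespace()` guard is an assumption made so that the snippet also runs without e1071.

```r
set.seed(8008135)
# Same construction as in the test: 4e4 values in 2e4 columns gives 2 rows.
xt = cbind(as.data.frame(matrix(rnorm(4e4), ncol = 2e4)),
  x = as.factor(c("a", "b")))
print(dim(xt))  # 2 rows, 20001 columns (20000 features plus the target)

features = xt[, colnames(xt) != "x", drop = FALSE]
target = xt$x

# Fit only if e1071 is available; otherwise model stays NULL.
model = NULL
if (requireNamespace("e1071", quietly = TRUE)) {
  model = e1071::svm(features, target)
}
```

With only two observations the fitted model is not meaningful. As in the tests above, the point is only that training completes on a task with this many features.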
