-
-
Notifications
You must be signed in to change notification settings - Fork 405
e1071::svm(): Use formula interface only if factors are present #1740
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Note 1: You guys could systematically go through the learners and look for learners that use the formula interface but could conceivably use a superior data.frame interface. Anecdotically, I have seen the shim that is Note 2: I don't really like the test I implemented, it takes a relatively large amount of time and memory while really only testing for a design decision. Feel free to leave it out. Note 3: If you think the test for many-feature-dataframes are a good idea, it would be possible to systematically apply that test to all the learners that are supposed to handle many-features-situations. |
This turns out to be harder than I thought, see my comment in the issue |
i would not advise to merge this. see what i also posted in the issue. we should create a clean issue out of this. |
The latest changes (only using formula if factors are present) looks good to me. |
Yep, looks good to me. |
how about commenting on the general issue i raised? this is more then "hey code looks good here". also does anybody see any PROBLEM that thi might introduce? or is it just a clear enhancement? |
i mean, just not the possible side effects this has with this PR #1763 what happens if we merge that too? |
As I've said in the other thread, an alternative would be to have two separate learners for this. That would avoid issues with being able to specify formulas. |
where exactly please? |
i am not a big fan of this. it is |
I have to agree with bernd. We should not try to autodetect this and internally do a if else here that changes the behaivour. I thought that our convention was to always use the formula interface whenever possible (e.g. randomForest). I think the user should be able to choose this himself. |
What cases do we have?
Concrete suggestion:
Consequences:
|
My opinion: I suggest not to complicate the UI but instead make Different solution: Use the formula if a user specified formula was attached to the task (would add another condition to the |
What's the status here? Have we decided on a course of action? |
I think we haven't made a concrete decision yet. Do we even need this PR here which only solves this issue for SVM? Or do we want to solve this on an abstract level? |
pushing this discussion as it sets the base for some PRs now (#1899, #1763, #1740) @giuseppec is again working on #1763, specifying the formula within the task. There must always be one place where the formula is specified manually. Why is the ParamSet of a learner (as in #1899) a bad place? |
|
This would be ok if the same formula notation is accepted by multiple learners. However, in the case of @berndbischl |
cf3db65
to
57f099e
Compare
Since the discussion about formula handling was ported to mlr3 and will probably not solved anymore here (which blocked the merging of this PR), is there something that would still speak against merging this SVM fix here? |
@larskotthoff ready to merge? |
…org#1740) * Testing svm with many features task * svm use data.frame instead of formula * spaces around match operator * Only use svm data.frame interface if task is all numeric * Deploy from Travis build 13884 [ci skip] Build URL: https://travis-ci.org/mlr-org/mlr/builds/542175846 Commit: 5565287 * add NEWS entry * Deploy from Travis build 13922 [ci skip] Build URL: https://travis-ci.org/mlr-org/mlr/builds/546742364 Commit: de67d1a
See if this fixes #1738