Apparently we get some overhead when mlr3pipelines builds tasks with many BackendCbinds. One way to fix this would be if there were an option to "flatten" cbinded tasks.
Suggested interface:
Task$flatten(force=FALSE) # default
creates a task with a single BackendDataTable, unless this is for some reason a bad idea, e.g. when a backend is a database backend. (A Backend class would need to report whether flattening is a "bad idea", possibly with an active binding, e.g. a database backend could say flattening is okay if the size is less than X MB)
Setting force = TRUE should, on the other hand, always flatten the task, which is equivalent to creating a new task from task$data().
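To make the force = TRUE case concrete, here is a minimal sketch of what it would amount to with the existing mlr3 API. The helper name flatten_forced is hypothetical and not part of mlr3; the sketch assumes a classification task and simply rebuilds it from task$data().

```r
library(mlr3)

# Build a task whose backend is a chain of cbinds.
task = tsk("iris")
task$cbind(data.frame(extra_1 = seq_len(task$nrow)))
task$cbind(data.frame(extra_2 = rev(seq_len(task$nrow))))

# Hypothetical helper (not an mlr3 function): rebuild the task from its
# materialized data, so the new task sits on a single in-memory backend.
flatten_forced = function(task) {
  as_task_classif(task$data(), target = task$target_names, id = task$id)
}

flat = flatten_forced(task)
class(flat$backend)  # a single in-memory backend instead of nested cbinds
```

Because task$data() only returns columns that have a target or feature role, this sketch drops everything else, and the new task does not carry over the original row ids. A real $flatten() would probably have to handle both.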
Example: a TaskClassif that consists of two data.tables that were cbinded with a database backend (abbreviating DataBackend as DB):
We could think whether it is a good idea if mlr3pipelines does this with all its output tasks by default.
Another question is whether that should be an in-place operation that swaps out a task's data backend, or whether this should create a new task.
Another question is what to do with columns that do not have any column role. Maybe a good default would be to drop backends that do not provide columns that have a role (and are therefore ignored in many cases).
Maybe we would want to have a DataBackendMultiCBind that can cbind multiple sources, so that even a task with many different database backends is at most one level deep after flattening. The $flatten(force = FALSE) operation would have to check, for each column, whether it comes from a data backend that reports it does not want to be flattened. There should be a method in DataBackend that does this recursively. $flatten() would then construct the desired DataBackendMultiCBind.
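The recursive check could be sketched as follows. This is a toy model in base R, not mlr3 code: backends are plain lists, the flatten_ok flag stands in for the proposed "flattening is a bad idea" report, and the multi_cbind list stands in for the proposed DataBackendMultiCBind. All of these names are assumptions for illustration, and column names are assumed to be unique across sources.

```r
# Toy stand-ins for data backends (plain lists, not mlr3 classes).
mem_backend = function(data) list(kind = "memory", data = data, flatten_ok = TRUE)
db_backend = function(data) list(kind = "database", data = data, flatten_ok = FALSE)
cbind_backend = function(b1, b2) list(kind = "cbind", b1 = b1, b2 = b2)

# Recursively collect the leaf backends of a nested cbind.
leaves = function(b) {
  if (b$kind == "cbind") c(leaves(b$b1), leaves(b$b2)) else list(b)
}

# Merge every leaf that allows flattening (or all leaves if force = TRUE)
# into one in-memory backend; keep the remaining leaves side by side.
flatten = function(b, force = FALSE) {
  parts = leaves(b)
  mergeable = vapply(parts, function(p) force || p$flatten_ok, logical(1))
  sources = list()
  if (any(mergeable)) {
    merged = do.call(cbind, lapply(parts[mergeable], function(p) p$data))
    sources = list(mem_backend(merged))
  }
  sources = c(sources, parts[!mergeable])
  if (length(sources) == 1L) {
    sources[[1L]]
  } else {
    list(kind = "multi_cbind", sources = sources)
  }
}

# A database backend nested two levels deep under in-memory data.
nested = cbind_backend(
  cbind_backend(db_backend(data.frame(x1 = 1:3)), mem_backend(data.frame(x2 = 4:6))),
  mem_backend(data.frame(x3 = 7:9))
)

soft = flatten(nested)                # database leaf is kept separate
hard = flatten(nested, force = TRUE)  # everything ends up in memory
```

In this sketch the result is never more than one level deep: either a single in-memory backend, or one multi_cbind holding the merged in-memory data next to the backends that declined flattening. Note that the merged columns are moved to the front, so column order is not preserved.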