-
-
Notifications
You must be signed in to change notification settings - Fork 50
Add tabular learning methods #141
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add tabular learning methods #141
Conversation
Some notes on integration with the data block API Ideally the table-specific data processing functionality would be encapsulated in an # tabular classification
BlockMethod(
(TableRow(catcols, contcols), Label(classes)),
(
TabularPreprocessing(), # only transforms `TableRow`
OneHot(), # only transforms `Label`
)
)
# tabular regression
BlockMethod(
(TableRow(catcols, contcols), Continuous(n)),
(
TabularPreprocessing(), # only transforms `TableRow`
)
) This would require the following changes:
|
26bb2c4
to
9b92a28
Compare
src/datablock/encoding.jl
Outdated
@@ -246,7 +246,7 @@ function testencoding(encoding, block, data = mockblock(block)) | |||
@test checkblock(outblock, outdata) | |||
|
|||
# Test decoding (if supported) works correctly | |||
inblock = decodedblock(encoding, outblock, true) | |||
inblock = decodedblock(encoding, outblock) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this changed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the encodings for which decoding isn't supported yet, like TabularTransforms
(I'll rename this to TabularPreprocessing), it will give incorrect results I think.
The encoded block after applying TabularTransforms
on a TableRow
is EncodedTableRow
, and if decoding was supported, then the decoded block should be a TableRow
again. But as it isn't supported, using the fill option would make it so that EncodedTableRow
is returned again (instead of nothing
through the fallback method) which wouldn't be correct.
The tests after this could fail because inblock
(which would be the encoded block in this situation), doesn't have to be the same as the original block. Also I'm not sure if the if
condition would ever be false
as of now.
catcols, contcols | ||
end | ||
|
||
function gettransforms(td::Datasets.TableDataset) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use this in a constructor for TabularTransform
so you can call it like TabularTransform(TableDataset(...))
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, so something like this?
TabularTransform(td::TableDataset) = TabularTransform(gettransforms(td))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes look good 👍
Anything else that is missing before this can be merged? Only thing I can think of are simple |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some more small nitpicks
a5294a5
to
f95149c
Compare
I think the tests are failing because the model PR isn't merged yet. |
Is there a reason that encodedblock(::Encoding, block::Block) = block
decodedblock(::Encoding, block::Block) = block This seems intuitive to me. Unless we use the |
That's the only way to know that data was not changed without running the encoding. |
This looks good. Are the tests failing due to the issue with the decoding not being implemented? In that case, we could add a definition for just |
I think the tests might be failing because |
Co-authored-by: lorenzoh <lorenz.ohly@gmail.com>
Co-authored-by: lorenzoh <lorenz.ohly@gmail.com> Co-authored-by: Kyle Daruwalla <daruwalla.k.public@icloud.com>
Co-authored-by: Kyle Daruwalla <daruwalla.k.public@icloud.com>
9ff64ae
to
d4abcc1
Compare
maybe some minor doc improvements could still be made, but let's do that in another PR that also integrates the notebook with the documentation. We still have time for that before the next release. |
Adds the implementation of DLPipelines.jl
LearningMethod
interface for tabular regression and tabular classification tasks.