Add tabular learning methods #141

manikyabard · 2021-07-16T12:43:17Z

Adds implementations of the DLPipelines.jl LearningMethod interface for tabular regression and tabular classification tasks.

src/methods/tabularclassification.jl

lorenzoh · 2021-08-10T14:08:43Z

Some notes on integration with the data block API

Ideally the table-specific data processing functionality would be encapsulated in an Encoding called TabularPreprocessing that works on a TableRow Block. If the target variable is separated from the row, we could then write the learning tasks as:

```
# tabular classification
BlockMethod(
    (TableRow(catcols, contcols), Label(classes)),
    (
        TabularPreprocessing(),  # only transforms `TableRow`
        OneHot(),  # only transforms `Label`
    )
)

# tabular regression
BlockMethod(
    (TableRow(catcols, contcols), Continuous(n)),
    (
        TabularPreprocessing(),  # only transforms `TableRow`
    )
)
```

This would require the following changes:

- Instead of using data containers whose observations are rows, separate out the target variable so that one observation is (row, target).
- Move preprocessing into an encoding TabularPreprocessing.
- Implement the block TableRow. The block needs to hold information on which columns are categorical and which are continuous, so something like:
```
struct TableRow <: Block
    catcols
    contcols
end
```
- Implement a block Continuous for regression targets.
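To make the two proposed blocks concrete, here is a minimal, self-contained sketch. It is not the code from this PR: the abstract `Block` type is a local stand-in, the field types are assumptions, and `checkblock` here is a simplified illustration of a validity check rather than the package's actual method.

```julia
# Local stand-in for the package's abstract block type (assumption).
abstract type Block end

# Describes one table row: which columns are categorical and which are continuous.
struct TableRow{C<:Tuple,D<:Tuple} <: Block
    catcols::C
    contcols::D
end

# Describes a regression target made of `n` continuous values.
struct Continuous <: Block
    n::Int
end

# Simplified validity checks (assumed semantics):
# a row is valid if it contains every column the block names.
checkblock(block::TableRow, row::NamedTuple) =
    all(col -> haskey(row, col), (block.catcols..., block.contcols...))

# a target is valid if it has exactly `n` values.
checkblock(block::Continuous, target::AbstractVector{<:Real}) =
    length(target) == block.n

rowblock = TableRow((:workclass, :education), (:age,))
row = (workclass = "Private", education = "HS-grad", age = 39.0)
```

With the target split off, one observation would then be a pair such as `(row, [1.0])` checked against `(rowblock, Continuous(1))`.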

src/datablock/block.jl

lorenzoh · 2021-08-15T14:25:25Z

src/datablock/encoding.jl

```diff
@@ -246,7 +246,7 @@ function testencoding(encoding, block, data = mockblock(block))
        @test checkblock(outblock, outdata)

        # Test decoding (if supported) works correctly
-        inblock = decodedblock(encoding, outblock, true)
+        inblock = decodedblock(encoding, outblock)
```


Why is this changed?

manikyabard · 2021-08-16

For encodings that don't support decoding yet, like TabularTransforms (I'll rename this to TabularPreprocessing), I think it would give incorrect results.
The encoded block after applying TabularTransforms to a TableRow is EncodedTableRow, and if decoding were supported, the decoded block should be a TableRow again. Since it isn't supported, using the fill option means EncodedTableRow is returned again (instead of nothing through the fallback method), which wouldn't be correct.
The tests after this could then fail because inblock (which would be the encoded block in this situation) doesn't have to be the same as the original block. Also, I'm not sure the if condition would ever be false as of now.
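The difference between the two calls in the diff can be illustrated with a toy version of the block functions. This is only a sketch of the behaviour described in this thread: all types are local stand-ins, and the three-argument method reflects the assumed meaning of the fill option (return the input block when the result would be nothing).

```julia
# Local stand-ins for the package types (assumptions).
abstract type Block end
abstract type Encoding end

struct TableRow <: Block end
struct EncodedTableRow <: Block end
struct TabularPreprocessing <: Encoding end

# Fallbacks: `nothing` means "this encoding does not handle this block".
encodedblock(::Encoding, ::Block) = nothing
decodedblock(::Encoding, ::Block) = nothing

# Encoding a table row is supported; decoding is not defined yet.
encodedblock(::TabularPreprocessing, ::TableRow) = EncodedTableRow()

# Assumed semantics of `fill`: substitute the input block for `nothing`.
function decodedblock(enc::Encoding, block::Block, fill::Bool)
    out = decodedblock(enc, block)
    return (fill && out === nothing) ? block : out
end

enc = TabularPreprocessing()
outblock = encodedblock(enc, TableRow())

unfilled = decodedblock(enc, outblock)        # falls back to `nothing`
filled = decodedblock(enc, outblock, true)    # the encoded block comes back
```

With fill, the test would compare an EncodedTableRow against the original TableRow, which is the mismatch described above; without it, the nothing result can be used to skip the decoding checks.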

src/datablock/models.jl

src/encodings/tabularpreprocessing.jl

lorenzoh · 2021-08-15T14:34:09Z

src/encodings/tabularpreprocessing.jl

+    catcols, contcols
+end
+
+function gettransforms(td::Datasets.TableDataset)


Can we use this in a constructor for TabularTransform so you can call it like TabularTransform(TableDataset(...))?

manikyabard · 2021-08-16

Sure, so something like this?

TabularTransform(td::TableDataset) = TabularTransform(gettransforms(td))
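As a self-contained illustration of that forwarding constructor, the sketch below uses toy stand-ins: `TableDataset` just wraps a NamedTuple of columns, and `gettransforms` only computes a per-column mean. Both are assumptions for the example and are not the definitions in this PR.

```julia
# Toy dataset wrapper: holds a NamedTuple of column vectors (assumption).
struct TableDataset{T}
    table::T
end

# Toy encoding: stores whatever per-column statistics it is given.
struct TabularTransform{T}
    transforms::T
end

# Hypothetical stand-in for `gettransforms`: one mean per column.
function gettransforms(td::TableDataset)
    return map(col -> (mean = sum(col) / length(col),), td.table)
end

# Convenience constructor: derive the statistics from the dataset itself.
TabularTransform(td::TableDataset) = TabularTransform(gettransforms(td))

td = TableDataset((age = [20.0, 40.0], hours = [30.0, 50.0]))
tt = TabularTransform(td)
```

Because the new method is more specific than the default constructor, passing the computed statistics directly still works as before.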

src/encodings/tabularpreprocessing.jl

src/encodings/onehot.jl

src/datablock/block.jl

src/datablock/models.jl

src/encodings/tabularpreprocessing.jl

lorenzoh

Changes look good 👍

lorenzoh · 2021-08-16T08:30:40Z

Anything else that is missing before this can be merged?

The only thing I can think of is simple BlockMethod wrappers like those in src/fasterai/learningmethods.jl. Each should have a constructor that takes just the blocks, i.e. ::Tuple{<:TableRow, <:Continuous}, and one convenience constructor. They can then be registered with some block types so you can find them using findlearningmethods (see the same file for how).
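A rough sketch of what such a wrapper pair could look like for the regression case follows. Everything here is a local stand-in so the snippet runs on its own: `BlockMethod` is reduced to a plain struct, the keyword `n` is an assumed parameter, and registration with findlearningmethods is left out because its interface is not shown in this thread.

```julia
# Local stand-ins for the package types (assumptions).
abstract type Block end

struct TableRow <: Block
    catcols::Vector{Symbol}
    contcols::Vector{Symbol}
end

struct Continuous <: Block
    n::Int
end

struct TabularPreprocessing end

# Reduced stand-in for BlockMethod: just blocks plus encodings.
struct BlockMethod{B<:Tuple,E<:Tuple}
    blocks::B
    encodings::E
end

# Constructor that takes just the blocks.
function TabularRegression(blocks::Tuple{<:TableRow,<:Continuous})
    return BlockMethod(blocks, (TabularPreprocessing(),))
end

# Convenience constructor that builds the blocks from column names.
function TabularRegression(catcols, contcols; n = 1)
    return TabularRegression((TableRow(catcols, contcols), Continuous(n)))
end

method = TabularRegression([:workclass], [:age, :hours])
```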

darsnack

Some more small nitpicks

src/datablock/block.jl

src/encodings/tabularpreprocessing.jl

src/datablock/models.jl

manikyabard · 2021-08-18T10:27:15Z

I think the tests are failing because the model PR isn't merged yet.

darsnack · 2021-08-20T15:54:47Z

Is there a reason that decode(::Encoding, ::Block) defaults to nothing (same for encode)? I get that it allows us to distinguish between when a block is modified but happens to be the same type vs. when it is not affected by Encoding. But is there value in tracking that information? Could we just default to an identity map like

```
encodedblock(::Encoding, block::Block) = block
decodedblock(::Encoding, block::Block) = block
```

This seems intuitive to me. Unless we use the nothing information later in the pipeline.

lorenzoh · 2021-08-22T17:08:59Z

> Is there a reason that decode(::Encoding, ::Block) defaults to nothing (same for encode)? I get that it allows us to distinguish between when a block is modified but happens to be the same type vs. when it is not affected by Encoding. But is there value in tracking that information? Could we just default to an identity map like
> encodedblock(::Encoding, block::Block) = block
> decodedblock(::Encoding, block::Block) = block
> This seems intuitive to me. Unless we use the nothing information later in the pipeline.

That's the only way to know that data was not changed without running the encoding. encode and decode themselves do have an identity as the default, encodedblock and decodedblock just handle the block info.
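The split described here can be sketched with simplified signatures. The types and the three-argument `encode` below are assumptions made for the example (the real functions take more arguments); the point is only that the data-level default is the identity while the block-level default is nothing.

```julia
# Local stand-ins for the package types (assumptions).
abstract type Block end
abstract type Encoding end

struct Image <: Block end
struct Label <: Block
    classes::Vector{String}
end
struct OneHotLabel <: Block
    classes::Vector{String}
end
struct OneHot <: Encoding end

# Data level: unaffected data passes through unchanged.
encode(::Encoding, ::Block, data) = data
# Block level: `nothing` signals "this encoding does not touch this block".
encodedblock(::Encoding, ::Block) = nothing

# OneHot only handles labels.
encode(::OneHot, block::Label, data) = block.classes .== data
encodedblock(::OneHot, block::Label) = OneHotLabel(block.classes)

# Whether an encoding affects a block can be decided without touching any data.
isaffected(enc::Encoding, block::Block) = encodedblock(enc, block) !== nothing

label = Label(["cat", "dog"])
```

An identity default for `encodedblock` would make `isaffected` return true for every block, which is the information loss this comment refers to.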

lorenzoh · 2021-08-22T17:17:59Z

This looks good.

Are the tests failing because decoding is not implemented? In that case, we could add a definition for just decodedblock without implementing decode. Or add a decode = true kwarg to testencoding so that it only runs the decoding tests if decode == true.
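The second option could look roughly like the sketch below. `testencoding_sketch` is a hypothetical, heavily reduced helper that takes plain functions instead of an encoding and a block; only the `decode` keyword mirrors the suggestion.

```julia
using Test

# Hypothetical reduced test helper: `encodefn` and `decodefn` are plain functions.
function testencoding_sketch(encodefn, decodefn, data; decode = true)
    @testset "encoding round trip" begin
        outdata = encodefn(data)
        @test outdata !== nothing

        # Only exercise decoding when the caller says it is supported.
        if decode
            @test decodefn(outdata) == data
        end
    end
end

# An encoding with working decoding runs both checks.
testencoding_sketch(x -> 2x, y -> y ÷ 2, 3)

# An encoding without decoding skips the second check, so `decodefn` is never called.
testencoding_sketch(x -> 2x, _ -> error("decoding not implemented"), 3; decode = false)
```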

manikyabard · 2021-08-22T17:36:44Z

> This looks good.
>
> Are the tests failing because decoding is not implemented? In that case, we could add a definition for just decodedblock without implementing decode. Or add a decode = true kwarg to testencoding so that it only runs the decoding tests if decode == true.

I think the tests might be failing because methodmodel is being tested as well and the model code isn't present in this branch yet.

lorenzoh · 2021-08-22T19:01:41Z

Maybe some minor doc improvements could still be made, but let's do that in another PR that also integrates the notebook with the documentation. We still have time for that before the next release.

lorenzoh reviewed Jul 16, 2021

manikyabard force-pushed the manikyabard/tabularmethods branch from 26bb2c4 to 9b92a28 Compare August 12, 2021 06:00

lorenzoh reviewed Aug 12, 2021

lorenzoh requested changes Aug 15, 2021

darsnack requested changes Aug 15, 2021

lorenzoh reviewed Aug 16, 2021

darsnack requested changes Aug 16, 2021

manikyabard force-pushed the manikyabard/tabularmethods branch from a5294a5 to f95149c Compare August 18, 2021 05:45

manikyabard marked this pull request as ready for review August 19, 2021 04:37

darsnack requested a review from lorenzoh August 19, 2021 13:51

manikyabard added 13 commits August 22, 2021 23:14

add tabular regression method

c15984a

add TabularRegression docstring and update encode

d03cc5c

add helper methods for creating transform dicts

a3f75bd

added docstring for TabularTransforms

6d1d07d

minor tabular regression changes

91314cc

add tabular classification method

c4973e9

updated tabular methods and added notebooks

157233d

added tabular data blocks and encodings

db243d1

refactored table blocks

d58d138

removed pre datablock methods

2c21f80

added docstrings and helper methods for preprocessing statistics

5d06b6a

remove old exports

76ae61f

update methods to use fill

4fb2f48

manikyabard and others added 15 commits August 22, 2021 23:14

update Continuous Block

5704dfe

Co-authored-by: lorenzoh <lorenz.ohly@gmail.com>

fix Continuous block references

04218a6

account for missing vals and cols in TableRow

63cc460

added tablepreprocessing test, updated testencoding

cfb77d7

updated blockmodel

590f6b5

update tabular classification notebook

28e0d5f

added convenience funcs, and updated tabularclassification notebook

de86860

minor changes and docstring updates

1434dd1

Co-authored-by: lorenzoh <lorenz.ohly@gmail.com> Co-authored-by: Kyle Daruwalla <daruwalla.k.public@icloud.com>

use concrete field types

e23e905

Co-authored-by: Kyle Daruwalla <daruwalla.k.public@icloud.com>

block model updates

f6e47a0

minor fixes

77251c8

added tabular task wrappers and testcases

b1bbdb7

update testencoding

73b1ac3

added adult dataset checksum and size

1d18440

model fixes

d4abcc1

manikyabard force-pushed the manikyabard/tabularmethods branch from 9ff64ae to d4abcc1 Compare August 22, 2021 18:05

lorenzoh approved these changes Aug 22, 2021

lorenzoh merged commit 63bdc3a into FluxML:master Aug 22, 2021

manikyabard mentioned this pull request Aug 23, 2021

FastAI.jl tabular development GSoC tracking FluxML/FluxML-Community-Call-Minutes#34
