Column purposes should have Sampling Key #1873

Open

Description

Is your feature request related to a problem? Please describe.
At the moment, the Advanced Data Options dialog box supports only three column purposes: Feature, Label, and Ignore.

Some datasets lead to overfitted models when validation/test data is chosen randomly, because closely related rows land on both sides of the split. Examples include datasets with several rows about the same entity (for example, blood results of the same patient at different times, or a credit card fraud dataset where one person has committed multiple frauds) and time-based series for related items (for example, predicting whether it will rain tomorrow at a specific location when the dataset covers only one country). Running these kinds of experiments with Model Builder often produces overfitted models, which blocks the use of Model Builder for these purposes.

Describe the solution you'd like
Add a "Sampling Key" column purpose to Advanced Data Options.
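
To illustrate the behaviour I have in mind, here is a minimal Python sketch of a split driven by a sampling key. It is not Model Builder or ML.NET code; the function name `split_by_sampling_key`, the `patient_id` column, and the sample rows are all made up for the example. The idea is that each row's split is derived from a hash of its key value, so all rows sharing a key end up on the same side.

```python
import hashlib


def split_by_sampling_key(rows, key, test_fraction=0.2):
    """Split rows so that all rows sharing the same key value land in the same set.

    Hypothetical helper for illustration only; not part of any library.
    """
    train, test = [], []
    for row in rows:
        # Hash the key value (not the row), so the assignment depends only on the key.
        digest = hashlib.sha256(str(row[key]).encode("utf-8")).digest()
        # Map the first 8 bytes of the hash to a number in [0, 1).
        bucket = int.from_bytes(digest[:8], "big") / 2**64
        (test if bucket < test_fraction else train).append(row)
    return train, test


# Made-up sample data: several blood results per patient.
rows = [
    {"patient_id": "p1", "day": 1, "glucose": 5.1},
    {"patient_id": "p1", "day": 7, "glucose": 5.4},
    {"patient_id": "p2", "day": 1, "glucose": 6.8},
    {"patient_id": "p2", "day": 3, "glucose": 7.0},
    {"patient_id": "p3", "day": 2, "glucose": 4.9},
    {"patient_id": "p4", "day": 5, "glucose": 5.6},
]

train, test = split_by_sampling_key(rows, key="patient_id", test_fraction=0.3)

# No patient appears in both sets.
train_ids = {r["patient_id"] for r in train}
test_ids = {r["patient_id"] for r in test}
assert train_ids.isdisjoint(test_ids)
```

Because the assignment is made per key rather than per row, the actual test share only approximates `test_fraction`, especially with few distinct keys. That seems an acceptable trade-off for avoiding leakage between the sets.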

Additional context
This was suggested earlier by justinormont in 2019 in #75 (comment), but I did not see any discussion about it being unsuitable for Model Builder, so maybe it could be reconsidered now?
