Description
opened on Oct 26, 2021
Is your feature request related to a problem? Please describe.
At the moment, the Advanced Data Options dialog box supports only three column purposes: Feature / Label / Ignore.
Some datasets produce overfitted models when validation/test data is chosen randomly, because closely related rows end up on both sides of the split. Examples include datasets with several rows about the same entity (for example, blood results of the same patient at different times, or a credit card fraud dataset where one person has committed multiple frauds) and time-based series for related items (for example, predicting whether it will rain tomorrow in a specific location when the dataset covers only one country). Running these kinds of experiments with Model Builder often leads to overfitted models, which blocks the use of Model Builder for these purposes.
Describe the solution you'd like
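Add a "Sampling Key" column purpose to Advanced Data Options, so that rows sharing the same key value stay on the same side of the train/validation split.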
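To illustrate the behavior I have in mind, here is a minimal Python sketch of a split driven by a sampling key. It is not how Model Builder or ML.NET implements it. The function name `split_by_sampling_key`, the `patient_id` column and the sample rows are all made up for the example. The only point is that every row with the same key value lands on the same side of the split.

```python
import hashlib


def split_by_sampling_key(rows, key, test_fraction=0.2):
    """Split rows so that all rows sharing a key value go to the same side.

    Hypothetical helper for illustration only; not an ML.NET or Model Builder API.
    """
    train, test = [], []
    for row in rows:
        # Hash the key value so the assignment depends only on the key,
        # not on the row's position or its other columns.
        digest = hashlib.sha256(str(row[key]).encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:8], "big") / 2**64  # value in [0, 1)
        if bucket < test_fraction:
            test.append(row)
        else:
            train.append(row)
    return train, test


# Made-up data: several blood results per patient.
rows = [
    {"patient_id": "p1", "day": 1, "value": 5.1},
    {"patient_id": "p1", "day": 7, "value": 5.4},
    {"patient_id": "p2", "day": 1, "value": 6.0},
    {"patient_id": "p2", "day": 9, "value": 6.3},
    {"patient_id": "p3", "day": 2, "value": 4.8},
    {"patient_id": "p4", "day": 3, "value": 5.9},
]

train, test = split_by_sampling_key(rows, key="patient_id", test_fraction=0.5)
train_ids = {r["patient_id"] for r in train}
test_ids = {r["patient_id"] for r in test}
```

Because the split is decided per key rather than per row, no patient appears in both `train_ids` and `test_ids`. The trade-off is that the actual test fraction only approximates `test_fraction` when there are few distinct keys.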
Additional context
This was suggested earlier by justinormont in 2019 in #75 (comment), but I did not see any discussion concluding that it is unsuitable for Model Builder, so maybe it could be reconsidered now?