Description
Let's make the error message more actionable.
I would recommend adding the similarly named column(s) to the message:
- $"Provided {columnPurpose} column '{columnName}' not found in training data."
+ $"Provided {columnPurpose} column '{columnName}' not found in training data. Did you mean '{closestNamed}'?"
For my current example, this would print: Provided ignored column 'tagMaxTotalItem' not found in training data. Did you mean 'tagMaxTotalItems'?
I'd recommend using Levenshtein distance to find the closest named column (code).
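A minimal sketch of the idea in C#, for illustration only. The class `ColumnNameSuggester`, its methods, and the cutoff `maxDistance = 3` are hypothetical and not part of Microsoft.ML.AutoML. In the real validation code the candidate names would presumably come from the training data's schema. The cutoff keeps the message from suggesting an unrelated column when nothing is close.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not an existing ML.NET API.
public static class ColumnNameSuggester
{
    // Levenshtein edit distance (insert, delete, substitute each cost 1),
    // computed with two rolling rows of the dynamic-programming table.
    public static int Levenshtein(string a, string b)
    {
        var prev = new int[b.Length + 1];
        var curr = new int[b.Length + 1];
        for (int j = 0; j <= b.Length; j++)
            prev[j] = j; // distance from "" to b[0..j)

        for (int i = 1; i <= a.Length; i++)
        {
            curr[0] = i; // distance from a[0..i) to ""
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                curr[j] = Math.Min(Math.Min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            (prev, curr) = (curr, prev); // the row just filled becomes "previous"
        }
        return prev[b.Length];
    }

    // Returns the candidate closest to `name` (first one wins on ties),
    // or null when even the best match is more than maxDistance edits away.
    public static string? FindClosest(string name, IEnumerable<string> candidates, int maxDistance = 3)
    {
        string? best = null;
        int bestDistance = int.MaxValue;
        foreach (var candidate in candidates)
        {
            int d = Levenshtein(name, candidate);
            if (d < bestDistance)
            {
                bestDistance = d;
                best = candidate;
            }
        }
        return bestDistance <= maxDistance ? best : null;
    }

    // Builds the error message, appending a suggestion only when one was found.
    public static string BuildMessage(string columnPurpose, string columnName, IEnumerable<string> availableColumns)
    {
        var message = $"Provided {columnPurpose} column '{columnName}' not found in training data.";
        var closest = FindClosest(columnName, availableColumns);
        return closest is null ? message : $"{message} Did you mean '{closest}'?";
    }
}
```

For example, `ColumnNameSuggester.BuildMessage("ignored", "tagMaxTotalItem", new[] { "tagMaxTotalItems", "label" })` returns `Provided ignored column 'tagMaxTotalItem' not found in training data. Did you mean 'tagMaxTotalItems'?`, since the two names differ by a single edit.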
Code location: machinelearning/src/Microsoft.ML.AutoML/Utils/UserInputValidationUtil.cs, lines 248 to 252 (commit 5dbfd8a).
Background:
It took me about 20 minutes to debug why this error was occurring (obvious in retrospect). My column existed in the dataset, in my loader function, in my IDataView, ...; it was simply misspelt ("tagMaxTotalItem" instead of "tagMaxTotalItems").
Improving the usability of this error message will save future users' time.