Error message for non-parsable datasets in AutoML #5130

Merged 2 commits on May 18, 2020

justinormont
Contributor

Fixes: #5129

I'd recommend improving the current error message:

if (!splitInference.IsSuccess)
{
    throw new InferenceException(InferenceExceptionType.ColumnSplit, "Unable to split the file provided into multiple, consistent columns.");
}

It currently says, "Unable to split the file provided into multiple, consistent columns.", which is rather uninformative and non-actionable.

Perhaps, as I think @briacht is suggesting, it should list the file formats we can parse: "Unable to split the file provided into multiple, consistent columns. Readable formats include delimited files such as CSV/TSV. Check for a consistent number of columns and proper escaping and quoting."

This message now states both the problem and the next steps for the user.

I say "delimited" because AutoML supports more than CSV/TSV: it tries tab, comma, space, and semicolon as the separator (src). If we run into other common separators, we can trivially add them to this list. One candidate is the vertical bar |.
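
To make the separator probing concrete, here is a minimal sketch of how trying each candidate in turn might look. It is not the AutoML implementation: SeparatorSniffer and TryInferSeparator are hypothetical names, the consistency check is deliberately naive (it ignores quoting and escaping), and InvalidOperationException stands in for the InferenceException used in the snippet above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration only; not the ML.NET AutoML code.
public static class SeparatorSniffer
{
    // Candidates named in this PR description: tab, comma, space, semicolon.
    // Supporting the vertical bar would mean appending '|' to this array.
    private static readonly char[] Candidates = { '\t', ',', ' ', ';' };

    // Returns the first candidate that splits every line into the same
    // number of columns (at least two), or null if no candidate does.
    // Naive on purpose: quoted or escaped separators are not handled.
    public static char? TryInferSeparator(IReadOnlyList<string> lines)
    {
        if (lines.Count == 0)
            return null;

        foreach (char sep in Candidates)
        {
            // Collect the distinct column counts this separator produces.
            int[] counts = lines.Select(l => l.Split(sep).Length).Distinct().ToArray();
            if (counts.Length == 1 && counts[0] > 1)
                return sep;
        }
        return null;
    }
}

public static class Program
{
    public static void Main()
    {
        // The second row has fewer columns, so no candidate splits it consistently.
        var ragged = new[] { "a,b,c", "1,2" };

        try
        {
            char? sep = SeparatorSniffer.TryInferSeparator(ragged);
            if (sep == null)
            {
                // Stand-in exception type; the message is the one proposed in this PR.
                throw new InvalidOperationException(
                    "Unable to split the file provided into multiple, consistent columns. " +
                    "Readable formats include delimited files such as CSV/TSV. " +
                    "Check for a consistent number of columns and proper escaping and quoting.");
            }
            Console.WriteLine($"Inferred separator: '{sep}'");
        }
        catch (InvalidOperationException ex)
        {
            Console.WriteLine(ex.Message);
        }
    }
}
```

With the ragged input, the user sees the actionable message rather than only a statement that the split failed. A file such as "a;b;c" / "1;2;3" would instead resolve to the semicolon.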

@justinormont requested a review from a team as a code owner on May 14, 2020 20:02

codecov bot commented May 14, 2020

Codecov Report

❗ No coverage uploaded for pull request base (master@a2406f6).
The diff coverage is 0.00%.

@@            Coverage Diff            @@
##             master    #5130   +/-   ##
=========================================
  Coverage          ?   75.60%           
=========================================
  Files             ?      990           
  Lines             ?   179207           
  Branches          ?    19287           
=========================================
  Hits              ?   135481           
  Misses            ?    38465           
  Partials          ?     5261           
Flag          Coverage Δ
#Debug        75.60% <0.00%> (?)
#production   71.50% <0.00%> (?)
#test         88.85% <ø> (?)

Impacted Files                                          Coverage Δ
...ft.ML.AutoML/ColumnInference/ColumnInferenceApi.cs   83.00% <0.00%> (ø)

Contributor

@harishsk left a comment

:shipit:

@mstfbl merged commit c025270 into dotnet:master on May 18, 2020
@ghost locked as resolved and limited conversation to collaborators on Mar 18, 2022
Development

Successfully merging this pull request may close these issues.

Improve error messaging for non-parsable datasets
4 participants