Better error for columns with incorrect type prediction #1442

Open

Description

System Information (please complete the following information):

  • Model Builder Version: 16.5.0.2115505
  • Visual Studio Version : 16.9.4

Describe the bug
This happens while training a regression model: the system cannot parse the data in some columns correctly and throws errors.

2021-05-06 16:01:51.7496 DEBUG Parsing failed with an exception: Could not parse value P in line 99, column PavedDrive
at Microsoft.ML.Data.TextLoader.Cursor.d__33.MoveNext()
at Microsoft.ML.Data.TextLoader.Cursor.MoveNextCore()
at Microsoft.ML.Data.RootCursorBase.MoveNext()
at Microsoft.ML.Data.IO.BinarySaver.RowsPerBlockHeuristic(IDataView data, ColumnCodec[] actives)
at Microsoft.ML.Data.IO.BinarySaver.SaveData(Stream stream, IDataView data, Int32[] colIndices)
at Microsoft.ML.Data.DataSaverUtils.SaveDataView(IChannel ch, IDataSaver saver, IDataView view, Stream stream, Boolean keepHidden)
at Microsoft.ML.BinaryLoaderSaverCatalog.SaveAsBinary(DataOperationsCatalog catalog, IDataView data, Stream stream, Boolean keepHidden)
at Microsoft.ML.ModelBuilder.AutoMLService.Experiments.LocalAutoMLExperiment.d__21.MoveNext() in //src/Microsoft.ML.ModelBuilder.AutoMLService/Experiments/LocalAutoMLExperiment.cs:line 120
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Microsoft.ML.ModelBuilder.AutoMLEngine.d__23.MoveNext() in //src/Microsoft.ML.ModelBuilder.AutoMLService/AutoMLEngineService/AutoMLEngine.cs:line 155 (Mic

To Reproduce
Steps to reproduce the behavior:

  • Use this CSV file: https://github.com/mdfarragher/DSC/blob/master/Regression/HousePricePrediction/data.csv
  • Set the column to predict to SalePrice and the training time to 10 s.
  • Start training. The exception shown in the trace above is thrown.

Expected behavior
I expect the tool to parse the column without raising an exception. Or is this behavior normal? Is the dataset expected to be already clean and usable for prediction?
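
One way to check whether the dataset itself is at fault is to list the distinct values of the failing column. The sketch below is plain Python using only the standard library, not ML.NET or Model Builder code. The helper name `distinct_values` and the inline sample rows are hypothetical. It also assumes, which the trace does not confirm, that PavedDrive was inferred as a boolean column because most of its values are Y or N.

```python
import csv
import io
from collections import Counter


def distinct_values(csv_text, column):
    """Count how often each distinct value occurs in one CSV column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row[column] for row in reader)


# Hypothetical sample rows for illustration, not copied from the real data.csv.
SAMPLE = """Id,PavedDrive,SalePrice
1,Y,208500
2,Y,181500
3,N,140000
4,P,129900
"""

counts = distinct_values(SAMPLE, "PavedDrive")

# Any value other than Y/N suggests the column is categorical text,
# so a boolean type guess (an assumption here) would fail on it.
unexpected = sorted(v for v in counts if v not in {"Y", "N"})

print(counts)
print(unexpected)
```

To run it against the real file, read the file contents into a string and pass that string in place of `SAMPLE`. If the column does contain a third value such as P, the data is valid categorical text, and the problem would be the inferred column type rather than a dirty dataset.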
