Description
Opened on May 6, 2021
System Information:
- Model Builder Version: 16.5.0.2115505
- Visual Studio Version: 16.9.4
Describe the bug
This happens while training a regression model: Model Builder fails to parse the values in some columns correctly and throws errors such as the following:
2021-05-06 16:01:51.7496 DEBUG Parsing failed with an exception: Could not parse value P in line 99, column PavedDrive
at Microsoft.ML.Data.TextLoader.Cursor.d__33.MoveNext()
at Microsoft.ML.Data.TextLoader.Cursor.MoveNextCore()
at Microsoft.ML.Data.RootCursorBase.MoveNext()
at Microsoft.ML.Data.IO.BinarySaver.RowsPerBlockHeuristic(IDataView data, ColumnCodec[] actives)
at Microsoft.ML.Data.IO.BinarySaver.SaveData(Stream stream, IDataView data, Int32[] colIndices)
at Microsoft.ML.Data.DataSaverUtils.SaveDataView(IChannel ch, IDataSaver saver, IDataView view, Stream stream, Boolean keepHidden)
at Microsoft.ML.BinaryLoaderSaverCatalog.SaveAsBinary(DataOperationsCatalog catalog, IDataView data, Stream stream, Boolean keepHidden)
at Microsoft.ML.ModelBuilder.AutoMLService.Experiments.LocalAutoMLExperiment.d__21.MoveNext() in //src/Microsoft.ML.ModelBuilder.AutoMLService/Experiments/LocalAutoMLExperiment.cs:line 120
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Microsoft.ML.ModelBuilder.AutoMLEngine.d__23.MoveNext() in //src/Microsoft.ML.ModelBuilder.AutoMLService/AutoMLEngineService/AutoMLEngine.cs:line 155 (Mic
To Reproduce
I used this CSV file: https://github.com/mdfarragher/DSC/blob/master/Regression/HousePricePrediction/data.csv
The column to predict is SalePrice, and the training time is 10 seconds.
When I start training, I get the exception whose trace is shown above.
Expected behavior
I expect the tool to parse the column without raising an exception. Or is this normal, i.e., is the dataset expected to be cleaned up before it can be used for prediction?
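One way to check whether a CSV column contains a value that a type guessed from its first rows would reject (such as the value P reported for PavedDrive) is to pre-scan the file. The Python code below is a minimal sketch under stated assumptions: it guesses a column's kind from the first `sample_rows` rows with a simplified, hypothetical bool/number/text heuristic, which is not Model Builder's or ML.NET's actual type-inference logic. The names `fits`, `infer_kind`, `find_first_mismatch` and `BOOL_TOKENS` are illustrative only, and the line-number mapping assumes no quoted field spans several lines.

```python
import csv
import io
from typing import Iterable, Optional, TextIO, Tuple

# Assumption: a simplified set of boolean-like tokens for illustration only;
# this is not Model Builder's or ML.NET's actual type-inference logic.
BOOL_TOKENS = {"true", "false", "yes", "no", "y", "n", "t", "f", "0", "1"}


def fits(value: str, kind: str) -> bool:
    """Return True if `value` can be read as the given kind."""
    v = value.strip()
    if kind == "bool":
        return v.lower() in BOOL_TOKENS
    if kind == "number":
        try:
            float(v)
            return True
        except ValueError:
            return False
    return True  # "text" accepts any value


def infer_kind(values: Iterable[str]) -> str:
    """Guess the narrowest kind ("bool", "number", "text") fitting all values."""
    sample = list(values)
    for kind in ("bool", "number"):
        if all(fits(v, kind) for v in sample):
            return kind
    return "text"


def find_first_mismatch(
    csv_file: TextIO, column: str, sample_rows: int = 50
) -> Optional[Tuple[int, str, str]]:
    """Infer `column`'s kind from the first `sample_rows` data rows, then
    return (file_line, value, kind) for the first value that does not fit,
    or None if every value fits."""
    values = [row[column] for row in csv.DictReader(csv_file)]
    kind = infer_kind(values[:sample_rows])
    for index, value in enumerate(values):
        if not fits(value, kind):
            # +2: line 1 is the header and file lines are 1-based.
            # Assumes no quoted field spans multiple lines.
            return index + 2, value, kind
    return None


# Usage on a tiny made-up sample: the first three rows only contain Y/N,
# so the column is guessed as "bool" and the later "P" is flagged.
sample = "Id,PavedDrive\n1,Y\n2,N\n3,Y\n4,P\n"
print(find_first_mismatch(io.StringIO(sample), "PavedDrive", sample_rows=3))
# → (5, 'P', 'bool')
```

Against the real file it could be called as `find_first_mismatch(open("data.csv", newline=""), "PavedDrive")`. What it reports depends on the assumed heuristic and on `sample_rows`, so it points to a likely inconsistency in the data rather than reproducing Model Builder's error exactly.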