"Input string was not in a correct format." exception when executing experiment with ML.AutoML #5428

Closed
patricia-ikosoft opened this issue Oct 12, 2020 · 18 comments · Fixed by #5163
Labels
AutoML.NET Automating various steps of the machine learning process bug Something isn't working

Comments

@patricia-ikosoft

patricia-ikosoft commented Oct 12, 2020

System information

  • Microsoft Windows 7 Professional, Version 6.1.7601 Service Pack 1 Build 7601
  • .Net Core 3.1
  • Microsoft.ML.AutoML 0.17.2

Issue

  • I am executing an experiment with ML.AutoML, using the data from test2.csv
  • I consistently get a parsing exception on experiment.Execute(data, labelProperty.Name),
    no matter which label I choose, and whether I load the data directly from the file or read and parse it myself.

at System.Number.ThrowOverflowOrFormatException(ParsingStatus status, TypeCode type)
at Microsoft.ML.AutoML.SweeperProbabilityUtils.ParameterSetAsFloatArray(IValueGenerator[] sweepParams, ParameterSet ps, Boolean expandCategoricals)
at Microsoft.ML.AutoML.SmacSweeper.FitModel(IEnumerable`1 previousRuns)
at Microsoft.ML.AutoML.SmacSweeper.ProposeSweeps(Int32 maxSweeps, IEnumerable`1 previousRuns)
at Microsoft.ML.AutoML.PipelineSuggester.SampleHyperparameters(MLContext context, SuggestedTrainer trainer, IEnumerable`1 history, Boolean isMaximizingMetric)
at Microsoft.ML.AutoML.PipelineSuggester.GetNextInferredPipeline(MLContext context, IEnumerable`1 history, DatasetColumnInfo[] columns, TaskKind task, Boolean isMaximizingMetric, CacheBeforeTrainer cacheBeforeTrainer, IEnumerable`1 trainerAllowList)
at Microsoft.ML.AutoML.Experiment`2.Execute()
at Microsoft.ML.AutoML.ExperimentBase`2.Execute(ColumnInformation columnInfo, DatasetColumnInfo[] columns, IEstimator`1 preFeaturizer, IProgress`1 progressHandler, IRunner`1 runner)
at Microsoft.ML.AutoML.ExperimentBase`2.ExecuteCrossValSummary(IDataView[] trainDatasets, ColumnInformation columnInfo, IDataView[] validationDatasets, IEstimator`1 preFeaturizer, IProgress`1 progressHandler)
at Microsoft.ML.AutoML.ExperimentBase`2.Execute(IDataView trainData, ColumnInformation columnInformation, IEstimator`1 preFeaturizer, IProgress`1 progressHandler)
at Microsoft.ML.AutoML.ExperimentBase`2.Execute(IDataView trainData, String labelColumnName, String samplingKeyColumn, IEstimator`1 preFeaturizer, IProgress`1 progressHandler)
at MLPoc.Services.LoadDataService.TrainDataAndCreateModel(List`1 properties, DynamicTypeProperty labelProperty, List`1 lineValues) in C:\DevITPAzurePatricia\Ikosoft\MLPoc\Services\LoadDataService.cs:line 81

Source code / logs

code.zip

@frank-dong-ms-zz frank-dong-ms-zz added AutoML.NET Automating various steps of the machine learning process bug Something isn't working labels Oct 19, 2020
@frank-dong-ms-zz frank-dong-ms-zz self-assigned this Oct 20, 2020
@frank-dong-ms-zz
Contributor

@patricia-ikosoft Sorry for the late response. Could you please share a complete repro project? I downloaded your code, and some definitions are missing: MLPoc.Data, DynamicTypeProperty, DataModel and LoadDataHelper.

@frank-dong-ms-zz frank-dong-ms-zz added the Awaiting User Input Awaiting author to supply further info (data, model, repro). Will close issue if no more info given. label Oct 20, 2020
@patricia-ikosoft
Author

complete_code.zip

@patricia-ikosoft
Author

You can test it by running the app, selecting the .csv file, selecting "HistorySalesPrice" in the dropdown, and clicking the "Click me" button to run the experiment.

@frank-dong-ms-zz
Contributor

@patricia-ikosoft I tried running your sample and got the error message below, with its call stack:
All instances skipped due to missing features.

call stack:
at Microsoft.ML.Trainers.FastTree.DataConverter.MemImpl.MakeBoundariesAndCheckLabels(Int64& missingInstances, Int64& totalInstances)
at Microsoft.ML.Trainers.FastTree.DataConverter.MemImpl..ctor(RoleMappedData data, IHost host, Double[][] binUpperBounds, Single maxLabel, Boolean dummy, Boolean noFlocks, PredictionKind kind, Int32[] categoricalFeatureIndices, Boolean categoricalSplit)
at Microsoft.ML.Trainers.FastTree.DataConverter.Create(RoleMappedData data, IHost host, Int32 maxBins, Single maxLabel, Boolean diskTranspose, Boolean noFlocks, Int32 minDocsPerLeaf, PredictionKind kind, IParallelTraining parallelTraining, Int32[] categoricalFeatureIndices, Boolean categoricalSplit)
at Microsoft.ML.Trainers.FastTree.ExamplesToFastTreeBins.FindBinsAndReturnDataset(RoleMappedData data, PredictionKind kind, IParallelTraining parallelTraining, Int32[] categoricalFeaturIndices, Boolean categoricalSplit)
at Microsoft.ML.Trainers.FastTree.FastTreeTrainerBase`3.ConvertData(RoleMappedData trainData)
at Microsoft.ML.Trainers.FastTree.FastForestRegressionTrainer.TrainModelCore(TrainContext context)
at Microsoft.ML.Trainers.TrainerEstimatorBase`2.TrainTransformer(IDataView trainSet, IDataView validationSet, IPredictor initPredictor)
at Microsoft.ML.AutoML.SmacSweeper.FitModel(IEnumerable`1 previousRuns)
at Microsoft.ML.AutoML.SmacSweeper.ProposeSweeps(Int32 maxSweeps, IEnumerable`1 previousRuns)
at Microsoft.ML.AutoML.PipelineSuggester.SampleHyperparameters(MLContext context, SuggestedTrainer trainer, IEnumerable`1 history, Boolean isMaximizingMetric)
at Microsoft.ML.AutoML.PipelineSuggester.GetNextInferredPipeline(MLContext context, IEnumerable`1 history, DatasetColumnInfo[] columns, TaskKind task, Boolean isMaximizingMetric, CacheBeforeTrainer cacheBeforeTrainer, IEnumerable`1 trainerAllowList)
at Microsoft.ML.AutoML.Experiment`2.Execute()
at Microsoft.ML.AutoML.ExperimentBase`2.Execute(ColumnInformation columnInfo, DatasetColumnInfo[] columns, IEstimator`1 preFeaturizer, IProgress`1 progressHandler, IRunner`1 runner)
at Microsoft.ML.AutoML.ExperimentBase`2.ExecuteCrossValSummary(IDataView[] trainDatasets, ColumnInformation columnInfo, IDataView[] validationDatasets, IEstimator`1 preFeaturizer, IProgress`1 progressHandler)
at Microsoft.ML.AutoML.ExperimentBase`2.Execute(IDataView trainData, ColumnInformation columnInformation, IEstimator`1 preFeaturizer, IProgress`1 progressHandler)
at Microsoft.ML.AutoML.ExperimentBase`2.Execute(IDataView trainData, String labelColumnName, String samplingKeyColumn, IEstimator`1 preFeaturizer, IProgress`1 progressHandler)
at MLPoc.Services.LoadDataService.RunExperiment(MLContext mlContext, IDataView data, DynamicTypeProperty labelProperty) in C:\code\complete_code\complete_code\MLPoc\Services\LoadDataService.cs:line 130

@frank-dong-ms-zz
Contributor

It looks like the error I got earlier is not stable: sometimes I can run the experiment without issue, and sometimes I get that error.
Looking at the call stacks and error messages, I can tell that both complain about the data not being in the right format, or about there being no data at all. So @patricia-ikosoft, it looks like the exception you got is related to the data loading process. Could you please try mocking some data instead of loading it from a file, and see whether you can still reproduce the exception?
Also, ML.NET has its own data loading functionality; you could refer to this API: https://docs.microsoft.com/en-us/dotnet/api/microsoft.ml.textloadersavercatalog.loadfromtextfile?view=ml-dotnet

@patricia-ikosoft
Author

patricia-ikosoft commented Oct 21, 2020

I can always reproduce the error, even when I use LoadFromTextFile or LoadFromEnumerable (with a predefined data model rather than a dynamic one, and parsing the values from the file myself). It is systematic on my machine (Windows 7). On the other hand, I noticed that if I lower the experiment time (to 60 seconds, for example), I no longer get the error.

If that helps, I could modify the code to use the LoadFromTextFile method with a predefined model, but on my machine I always got the same error, no matter which loading method I used.

@mstfbl mstfbl removed the Awaiting User Input Awaiting author to supply further info (data, model, repro). Will close issue if no more info given. label Oct 21, 2020
@frank-dong-ms-zz
Contributor

@patricia-ikosoft Thanks, please do create a repro code sample using LoadFromTextFile. I will try to create an Azure VM with Windows 7 to see if I can repro your issue.

@patricia-ikosoft
Author

patricia-ikosoft commented Oct 22, 2020

complete_code2.zip
I just uploaded the repro for you. Thanks.

@frank-dong-ms-zz
Contributor

@patricia-ikosoft Thanks. I created an Azure VM with Windows 7 (Windows 7 Enterprise, the only Windows 7 image available on Azure) and still could not repro your issue.

Reading through your code, I found one issue in the DataModel class: LoadColumn indices should always start from 0 instead of 1. Could you please fix that and see whether the error still occurs?

@patricia-ikosoft
Author

I modified the index, but I still get the error. Could it be a culture problem? I've seen this other issue, which seems to be identical:
#5162

@frank-dong-ms-zz
Contributor

frank-dong-ms-zz commented Oct 23, 2020

Yeah, it looks like the same issue. Could you try on en-US? @patricia-ikosoft
At the same time, I did find another issue with the AutoML experiment: after about 70 rounds, FastTree complains about its input data and throws an exception. I think we should either handle this exception properly or stop early in this case. I'm working on a PR for this issue.

The error message and call stack of the issue I mentioned before:

All instances skipped due to missing features.

call stack:
at Microsoft.ML.Trainers.FastTree.DataConverter.MemImpl.MakeBoundariesAndCheckLabels(Int64& missingInstances, Int64& totalInstances)
at Microsoft.ML.Trainers.FastTree.DataConverter.MemImpl..ctor(RoleMappedData data, IHost host, Double[][] binUpperBounds, Single maxLabel, Boolean dummy, Boolean noFlocks, PredictionKind kind, Int32[] categoricalFeatureIndices, Boolean categoricalSplit)
at Microsoft.ML.Trainers.FastTree.DataConverter.Create(RoleMappedData data, IHost host, Int32 maxBins, Single maxLabel, Boolean diskTranspose, Boolean noFlocks, Int32 minDocsPerLeaf, PredictionKind kind, IParallelTraining parallelTraining, Int32[] categoricalFeatureIndices, Boolean categoricalSplit)
at Microsoft.ML.Trainers.FastTree.ExamplesToFastTreeBins.FindBinsAndReturnDataset(RoleMappedData data, PredictionKind kind, IParallelTraining parallelTraining, Int32[] categoricalFeaturIndices, Boolean categoricalSplit)
at Microsoft.ML.Trainers.FastTree.FastTreeTrainerBase`3.ConvertData(RoleMappedData trainData)
at Microsoft.ML.Trainers.FastTree.FastForestRegressionTrainer.TrainModelCore(TrainContext context)
at Microsoft.ML.Trainers.TrainerEstimatorBase`2.TrainTransformer(IDataView trainSet, IDataView validationSet, IPredictor initPredictor)
at Microsoft.ML.AutoML.SmacSweeper.FitModel(IEnumerable`1 previousRuns)
at Microsoft.ML.AutoML.SmacSweeper.ProposeSweeps(Int32 maxSweeps, IEnumerable`1 previousRuns)
at Microsoft.ML.AutoML.PipelineSuggester.SampleHyperparameters(MLContext context, SuggestedTrainer trainer, IEnumerable`1 history, Boolean isMaximizingMetric)
at Microsoft.ML.AutoML.PipelineSuggester.GetNextInferredPipeline(MLContext context, IEnumerable`1 history, DatasetColumnInfo[] columns, TaskKind task, Boolean isMaximizingMetric, CacheBeforeTrainer cacheBeforeTrainer, IEnumerable`1 trainerAllowList)
at Microsoft.ML.AutoML.Experiment`2.Execute()
at Microsoft.ML.AutoML.ExperimentBase`2.Execute(ColumnInformation columnInfo, DatasetColumnInfo[] columns, IEstimator`1 preFeaturizer, IProgress`1 progressHandler, IRunner`1 runner)
at Microsoft.ML.AutoML.ExperimentBase`2.ExecuteCrossValSummary(IDataView[] trainDatasets, ColumnInformation columnInfo, IDataView[] validationDatasets, IEstimator`1 preFeaturizer, IProgress`1 progressHandler)
at Microsoft.ML.AutoML.ExperimentBase`2.Execute(IDataView trainData, ColumnInformation columnInformation, IEstimator`1 preFeaturizer, IProgress`1 progressHandler)
at Microsoft.ML.AutoML.ExperimentBase`2.Execute(IDataView trainData, String labelColumnName, String samplingKeyColumn, IEstimator`1 preFeaturizer, IProgress`1 progressHandler)
at MLPoc.Services.LoadDataService.RunExperiment(MLContext mlContext, IDataView data, DynamicTypeProperty labelProperty) in C:\code\complete_code\complete_code\MLPoc\Services\LoadDataService.cs:line 130

@patricia-ikosoft
Author

How can I change the culture that ML.NET uses internally?

@frank-dong-ms-zz
Contributor

@patricia-ikosoft just change the system culture to en-US
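
As an in-process alternative to changing the system setting, the culture can also be overridden from application code before the experiment starts. The following is only a minimal sketch using standard .NET APIs; the class name `CultureOverride` is hypothetical, and nobody in this thread confirmed that doing this resolves the exception.

```csharp
using System;
using System.Globalization;

// Hypothetical helper, not part of ML.NET or of the code attached to this issue.
public static class CultureOverride
{
    // Makes the invariant culture (which uses '.' as the decimal separator)
    // the culture of the current thread and the default for other threads
    // in this process.
    public static void UseInvariantCulture()
    {
        CultureInfo invariant = CultureInfo.InvariantCulture;
        CultureInfo.DefaultThreadCurrentCulture = invariant;
        CultureInfo.DefaultThreadCurrentUICulture = invariant;
        CultureInfo.CurrentCulture = invariant;
        CultureInfo.CurrentUICulture = invariant;
    }
}
```

Calling `CultureOverride.UseInvariantCulture()` once at application startup, before `experiment.Execute(...)`, would be the intended usage under these assumptions.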

@patricia-ikosoft
Author

patricia-ikosoft commented Oct 28, 2020

Changing the culture to en-US does not seem to solve the error.

@frank-dong-ms-zz
Contributor

@patricia-ikosoft What was your original culture? Did you restart the machine after changing the culture?

@patricia-ikosoft
Author

patricia-ikosoft commented Nov 3, 2020

It was French initially, and yes, I restarted the machine.

@jakub-sedlacek

Hello, I encountered the same error and resolved it by changing the system's decimal separator from ',' to '.' (my culture is cs-CZ). After a restart, everything worked OK.
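
Why the decimal separator matters can be illustrated with plain .NET parsing. This is a minimal sketch, not the AutoML code itself: `DecimalSeparatorDemo`, `CommaDecimalFormat` and `CanParse` are hypothetical names, and the custom `NumberFormatInfo` is only a stand-in for a comma-decimal culture such as cs-CZ or fr-FR. Whether the sweeper round-trips values through strings in exactly this way is an assumption; the thread only shows the stack trace through `ParameterSetAsFloatArray` and the workaround above.

```csharp
using System;
using System.Globalization;

// Hypothetical demo class, not part of ML.NET.
public static class DecimalSeparatorDemo
{
    // Stand-in for a culture whose decimal separator is ',' and whose
    // group separator is a no-break space.
    public static NumberFormatInfo CommaDecimalFormat()
    {
        return new NumberFormatInfo
        {
            NumberDecimalSeparator = ",",
            NumberGroupSeparator = "\u00A0"
        };
    }

    // Returns true when 'text' parses as a float under 'provider', using the
    // same number styles that float.Parse applies by default.
    public static bool CanParse(string text, IFormatProvider provider)
    {
        return float.TryParse(
            text,
            NumberStyles.Float | NumberStyles.AllowThousands,
            provider,
            out _);
    }
}
```

With these helpers, `CanParse("0.5", CultureInfo.InvariantCulture)` succeeds while `CanParse("0.5", DecimalSeparatorDemo.CommaDecimalFormat())` does not; in the second case `float.Parse` would throw a `FormatException`, which is the exception type behind the message in the issue title.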

@luisquintanilla
Contributor

Thanks all for the comments and feedback. We've updated the implementation of AutoML, so previous API patterns are no longer applicable. Closing this issue for now. Please feel free to open a new issue if it continues to be a problem.

@ghost ghost locked as resolved and limited conversation to collaborators Aug 20, 2022

5 participants