Description
Opened on Jul 10, 2019
System Information (please complete the following information):
- Model Builder Version: July Release (latest mlnet CLI)
- Visual Studio Version: Any
- OS Culture: Finnish (suomi)
Describe the bug
Our price prediction (regression) dataset has numbers that are formatted for the US/UK culture. In US/UK English the decimal separator is the "." character, while in many other cultures, including Finnish, it is ",". Since the dataset has numbers like 17.5, the mlnet CLI (and AutoML) infers those columns as strings, not numbers.
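To illustrate the mismatch, here is a minimal Python sketch of a parser that accepts only one decimal separator. It is a simplified model, not the actual ML.NET/AutoML parsing code, and the `parse_number` helper and its `decimal_sep` parameter are hypothetical names introduced for this example.

```python
from typing import Optional


def parse_number(text: str, decimal_sep: str) -> Optional[float]:
    """Hypothetical strict parser: only `decimal_sep` is accepted as the decimal mark.

    Returns None when the text is not a number under that convention.
    """
    other_sep = "," if decimal_sep == "." else "."
    text = text.strip()
    if other_sep in text:
        # The other culture's separator is treated as invalid input.
        return None
    try:
        return float(text.replace(decimal_sep, "."))
    except ValueError:
        return None


us = parse_number("17.5", ".")         # US/UK convention: parses as a number
fi = parse_number("17.5", ",")         # Finnish convention: rejected
fi_native = parse_number("17,5", ",")  # Finnish-formatted value: parses as a number
```

Under this model, the value 17.5 from the dataset is numeric only when the parser expects "." as the decimal separator, which matches the symptom described above.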
To Reproduce
Steps to reproduce the behavior:
- Change OS Regional format to Finnish (download the language pack called "suomi")
- Run Model Builder or the mlnet CLI on the taxi fare dataset with the price prediction (regression) task, using fare_amount as the column to predict.
Expected behavior
A model should be trained for the dataset.
Actual behavior
Training fails with the following error:
Exception occured while exploring pipelines:
Provided label column 'fare_amount' was of type String, but only type Single is allowed.
Additional context
In the past, this was not an issue because AutoML/CLI did not take the user's OS culture into account when reading the file. Now that it parses the file with the user's culture, it no longer recognizes the numbers in US-formatted datasets as numeric.
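The behavior change can be sketched as column type inference run with two different decimal separators. This Python snippet is an assumption-laden illustration, not the AutoML implementation: `is_numeric` and `infer_column_type` are hypothetical helpers, the type names "Single" and "String" are borrowed from the error message, and the sample values other than 17.5 are made up.

```python
from typing import List


def is_numeric(text: str, decimal_sep: str) -> bool:
    """Hypothetical check: is `text` a number when only `decimal_sep` is the decimal mark?"""
    other_sep = "," if decimal_sep == "." else "."
    if other_sep in text:
        return False
    try:
        float(text.strip().replace(decimal_sep, "."))
        return True
    except ValueError:
        return False


def infer_column_type(values: List[str], decimal_sep: str) -> str:
    """Infer "Single" only if every value parses as a number; otherwise fall back to "String"."""
    return "Single" if all(is_numeric(v, decimal_sep) for v in values) else "String"


fare_amount = ["17.5", "4.5", "52"]  # made-up sample values in US format

before = infer_column_type(fare_amount, ".")  # culture-independent reading (old behavior)
after = infer_column_type(fare_amount, ",")   # reading with a Finnish OS culture (new behavior)
```

In this sketch, the same column is inferred as Single when "." is always used and as String when the separator follows the Finnish culture, which would explain why the label column is rejected.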