Regression (price prediction) tutorial dataset is only formatted for US/UK culture #149

Closed

Description

System Information (please complete the following information):

  • Model Builder Version: July Release (latest mlnet CLI)
  • Visual Studio Version: Any
  • OS Culture: Finnish (suomi)

Describe the bug
The dataset for our price prediction (regression) tutorial formats its numbers for the US/UK culture. In US/UK English the decimal separator is the . character, while many other cultures, including Finnish, use , instead. Because the dataset contains numbers like 17.5, the mlnet CLI (and AutoML) infers those columns as strings rather than numbers.

To Reproduce
Steps to reproduce the behavior:

  1. Change OS Regional format to Finnish (download the language pack called "suomi")
  2. Run Model Builder or the mlnet CLI on the taxi fare dataset, using the price prediction (regression) task. Use fare_amount as the column to predict.

Expected behavior
A model should be trained for the dataset.

Actual behavior
Training fails with the following error:

Exception occured while exploring pipelines:
Provided label column 'fare_amount' was of type String, but only type Single is allowed.

Additional context
In the past, this was not an issue, because AutoML/CLI did not take the user's OS culture into account when reading the file. Now that it parses the file with the user's culture, it no longer recognizes numbers in US-formatted datasets as numeric.
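
To illustrate the mechanism, here is a minimal Python sketch of culture-sensitive column type inference. It is not the AutoML/CLI implementation: `parse_number` and `infer_column_type` are hypothetical helpers, the sample values are made up, and the only assumption is that a value counts as numeric only when it matches the active culture's decimal separator.

```python
import re


def parse_number(text, decimal_sep):
    """Parse text as a float using decimal_sep; return None if it is not numeric."""
    text = text.strip()
    # Optional sign, digits, then an optional fraction introduced by decimal_sep.
    pattern = r"[+-]?\d+(?:" + re.escape(decimal_sep) + r"\d+)?"
    if re.fullmatch(pattern, text) is None:
        return None
    return float(text.replace(decimal_sep, "."))


def infer_column_type(values, decimal_sep):
    """Report 'Single' only if every value parses as a number, else 'String'."""
    if all(parse_number(v, decimal_sep) is not None for v in values):
        return "Single"
    return "String"


# Hypothetical sample of US-formatted values for a fare_amount column.
fares = ["17.5", "4.5", "52"]

print(infer_column_type(fares, "."))  # US/UK separator: Single
print(infer_column_type(fares, ","))  # Finnish separator: String
```

Under these assumptions, the same column is numeric with the US/UK separator and falls back to a string with the Finnish one, which matches the error reported above. Reading the dataset with a fixed, culture-independent number format would avoid the dependency on the OS culture.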
