
Enabling different settings for the TextLoader on train and test data might lead to incorrect metrics with no error #5

Closed
@GalOshri

Description


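The train and test data can be read with separate TextLoader instances, which makes it easy to introduce bugs. If their settings differ (for example, the column separator or the header flag), the test data is parsed differently from the training data and the reported metrics are likely to be wrong, yet no error is raised.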
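To see why a settings mismatch can go unnoticed, here is a minimal plain-C# sketch (not the ML.NET API) of a lenient parser that, by assumption, substitutes a default value when a column is missing instead of failing. `LenientParser`, `ParseRow`, and the two-column text/label layout are hypothetical names chosen for illustration.

```csharp
using System;

// Hypothetical stand-in for a lenient text loader (not the ML.NET API):
// it splits a line on the given separator and, instead of failing when a
// column is missing, falls back to a default value for that column.
public static class LenientParser
{
    // Parses a row laid out as "<text><separator><label>" into (Text, Label).
    public static (string Text, bool Label) ParseRow(string line, char separator)
    {
        string[] fields = line.Split(separator);

        // Column 0: the review text (Split always returns at least one field).
        string text = fields[0];

        // Column 1: the label. A missing or unparsable value silently becomes
        // false, so a wrong separator raises no exception.
        bool label = fields.Length > 1 && fields[1].Trim() == "1";

        return (text, label);
    }
}
```

For example, `ParseRow("Great movie\t1", '\t')` returns `("Great movie", true)`, but `ParseRow("Great movie\t1", ',')` returns the whole line as the text and `false` as the label, without any exception.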
Example:

```csharp
string trainDataPath = "sentiment_data.tsv";
pipeline.Add(new TextLoader<SentimentData>(trainDataPath, useHeader: true));

// Later in the file, after completing the pipeline and training the model
string testDataPath = "sentiment_test.tsv";
// This loader uses a different separator than the training loader above
var testData = new TextLoader<SentimentData>(testDataPath, useHeader: true, sep: ',');

var evaluator = new BinaryClassificationEvaluator();
BinaryClassificationMetrics metrics = evaluator.Evaluate(model, testData);
```

Because the test loader splits on ',' while the training loader uses the default separator, the tab-separated test file is parsed incorrectly and evaluating on it produces incorrect metrics. However, the experiment still runs to completion without any error, so the problem is difficult to detect.
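One possible way to guard against this, sketched under assumptions: define the parsing options once, reuse that single object for both the training and the test data, and have the loader reject rows whose column count does not match rather than silently filling in defaults. `LoaderSettings`, `StrictLoader`, and `LoadRows` are hypothetical plain-C# names, not part of the ML.NET API.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical settings object: define the parsing options once and pass the
// same instance when loading both the training and the test data.
public sealed record LoaderSettings(bool HasHeader, char Separator, int ExpectedColumns);

public static class StrictLoader
{
    // Splits each line with the shared settings and fails fast when a row does
    // not have the expected number of columns, instead of filling defaults.
    public static List<string[]> LoadRows(IEnumerable<string> lines, LoaderSettings settings)
    {
        var rows = new List<string[]>();
        bool skipHeader = settings.HasHeader;
        int lineNumber = 0;

        foreach (string line in lines)
        {
            lineNumber++;
            if (skipHeader)
            {
                skipHeader = false; // drop only the first line
                continue;
            }

            string[] fields = line.Split(settings.Separator);
            if (fields.Length != settings.ExpectedColumns)
            {
                throw new FormatException(
                    $"Line {lineNumber}: expected {settings.ExpectedColumns} columns " +
                    $"but found {fields.Length}; check the separator and header settings.");
            }
            rows.Add(fields);
        }
        return rows;
    }
}
```

With `new LoaderSettings(HasHeader: true, Separator: '\t', ExpectedColumns: 2)`, a two-column tab-separated file loads normally, while loading the same file with `Separator = ','` throws a `FormatException` on the first data row instead of quietly producing wrong metrics.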
