Using PlattCalibratorTransformer with custom name for Score Column #4700

Closed
antoniovs1029 opened this issue Jan 24, 2020 · 2 comments · Fixed by #5261
Labels: bug, loadsave, P1


antoniovs1029 commented Jan 24, 2020

Issue

  • What did you do?
    I tried to create a model with a PlattCalibratorEstimator that uses a scoreColumnName with a name different from "Score" (as done through this API)

  • What happened?
    After fitting the estimator, and while trying to transform the input dataview, the following exception is thrown:
    System.InvalidOperationException: 'The data to calibrate contains no 'Score' column'

  • What did you expect?
    The model to work the same way as if I had used the name "Score" for my score column

Furthermore, I couldn't find any sample or test that actually uses the optional scoreColumnName parameter of PlattCalibratorEstimator, or its other parameters (such as labelColumnName), so adding such tests might also be necessary. (If my PR #4700 gets in, then fixing this issue would also require adding ONNX tests to check that a PlattCalibrator with a custom scoreColumnName is saved correctly to ONNX.) It would also be worth checking whether this problem occurs in the other CalibratorTransformers.

Notice that a simple workaround for this would be to copy the column that holds the score into a new column called Score, and specify Score as the scoreColumnName.
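
A minimal sketch of that workaround, assuming the Microsoft.ML package is referenced. The class names ScoredRow and PlattWorkaround are made up for this illustration. It copies "ScoreX" into a column named "Score" and then calibrates with the default column names:

```csharp
using System;
using System.Linq;
using Microsoft.ML;

// Hypothetical input type for this sketch; the score lives in "ScoreX".
public class ScoredRow
{
    public bool Label { get; set; }
    public float ScoreX { get; set; }
}

public static class PlattWorkaround
{
    // Fits the pipeline and returns the column names of the calibrated data.
    public static string[] CalibratedColumnNames()
    {
        var mlContext = new MLContext(seed: 0);

        IDataView data = mlContext.Data.LoadFromEnumerable(new[]
        {
            new ScoredRow { ScoreX = 10, Label = true },
            new ScoredRow { ScoreX = 15, Label = false },
        });

        // Copy the custom score column into one named "Score", then calibrate
        // with the default names so the transformer finds the column it expects.
        var pipeline = mlContext.Transforms
            .CopyColumns(outputColumnName: "Score", inputColumnName: "ScoreX")
            .Append(mlContext.BinaryClassification.Calibrators.Platt());

        var model = pipeline.Fit(data);
        IDataView calibrated = model.Transform(data);

        return calibrated.Schema.Select(column => column.Name).ToArray();
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", CalibratedColumnNames()));
    }
}
```

Because the copied "Score" column is appended after the existing columns, it is not the first column in the schema.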

Source code / logs

EXAMPLE 1 shows that it works if my score column is named "Score". EXAMPLE 2 shows that it fails once the column has a different name.

using Microsoft.ML;

namespace Platt2
{
    public static class Platt2
    {

        class ModelInput
        {
            public bool Label { get; set; }
            public float Score { get; set; }
        }

        class ModelInput2
        {
            public bool Label { get; set; }
            public float ScoreX { get; set; }
        }

        public static void Main()
        {
            var mlContext = new MLContext(seed: 0);

            // EXAMPLE 1 - Works
            IDataView data = mlContext.Data.LoadFromEnumerable<ModelInput>(
                new ModelInput[]
                {
                    new ModelInput { Score = 10, Label = true },
                    new ModelInput { Score = 15, Label = false },
                }
            );

            var calibratorEstimator = mlContext.BinaryClassification.Calibrators
                .Platt();

            var calibratorTransformer = calibratorEstimator.Fit(data);
            var finalData = calibratorTransformer.Transform(data);
            var prev = finalData.Preview();


            // EXAMPLE 2 - Doesn't Work
            IDataView data2 = mlContext.Data.LoadFromEnumerable<ModelInput2>(
                new ModelInput2[]
                {
                    new ModelInput2 { ScoreX = 10, Label = true },
                    new ModelInput2 { ScoreX = 15, Label = false },
                }
            );

            calibratorEstimator = mlContext.BinaryClassification.Calibrators
                .Platt(scoreColumnName: "ScoreX");

            calibratorTransformer = calibratorEstimator.Fit(data2);
            finalData = calibratorTransformer.Transform(data2); // Throws exception
            prev = finalData.Preview();

        }

    }
}

antoniovs1029 commented Jan 24, 2020

The exception is thrown after training because, it seems, the PlattCalibratorEstimator trains correctly, but the CalibratorTransformer has a bug.

When the desired scoreColumnName is passed into the API, that name is used to train the calibrator, so that part is done correctly:

var calibrator = (TICalibrator)CalibratorUtils.TrainCalibrator(Host, ch,
_calibratorTrainer, input, LabelColumn.Name, ScoreColumn.Name, WeightColumn.Name);
return Create(Host, calibrator);

The problem is that when the transformer is created, it never learns what the scoreColumnName was (i.e. that name isn't passed to or stored in the transformer in any way):

private protected abstract CalibratorTransformer<TICalibrator> Create(IHostEnvironment env, TICalibrator calibrator);

Then, when a mapper is created from the transformer, it doesn't know what the scoreColumnName was, and it is hardcoded to use the default name (which in this case is "Score"):

_scoreColIndex = inputSchema.GetColumnOrNull(DefaultColumnNames.Score)?.Index ?? -1;
parent.Host.Check(_scoreColIndex > 0, "The data to calibrate contains no 'Score' column");

It's in that line of code that the exception gets thrown.

To fix this issue, I think it would be necessary to pass the scoreColumnName to the Create method that creates the transformer, and to add a field in the transformer that holds the name so the mapper can use it. I am not sure whether the same should be done for LabelColumnName and WeightColumnName, but my guess is that it isn't necessary, since the mapper only needs the score column to work. These changes should also be checked against the other CalibratorTransformers (Isotonic, Naïve and Fixed).
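
The proposed fix can be sketched in plain C# without any ML.NET types. Everything in the block below is a hypothetical stand-in (SketchCalibratorTransformer, ResolveScoreColumnIndex, and a schema modelled as a string array); it only illustrates the idea of storing the score column name on the transformer and using it in the lookup, and is not the actual change made in the library:

```csharp
using System;

// Hypothetical stand-in for a calibrator transformer; not the ML.NET type.
public sealed class SketchCalibratorTransformer
{
    // The score column name chosen at training time, kept for transform time.
    public string ScoreColumnName { get; }

    public SketchCalibratorTransformer(string scoreColumnName = "Score")
    {
        ScoreColumnName = scoreColumnName
            ?? throw new ArgumentNullException(nameof(scoreColumnName));
    }

    // Plays the role of the mapper: looks up the stored name in the input
    // schema (modelled here as an array of column names) instead of a
    // hardcoded default.
    public int ResolveScoreColumnIndex(string[] inputSchema)
    {
        int index = Array.IndexOf(inputSchema, ScoreColumnName);
        if (index < 0)
        {
            throw new InvalidOperationException(
                $"The data to calibrate contains no '{ScoreColumnName}' column");
        }
        return index;
    }
}

public static class FixSketch
{
    public static void Main()
    {
        var transformer = new SketchCalibratorTransformer("ScoreX");
        int index = transformer.ResolveScoreColumnIndex(new[] { "Label", "ScoreX" });
        Console.WriteLine(index); // prints 1
    }
}
```

In this sketch the error message also reports the configured name, so a missing custom column is not described as a missing 'Score' column.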

antoniovs1029 self-assigned this Jan 24, 2020
antoniovs1029 commented

Another minor bug I found in the PlattCalibratorTransformer is that even when using a score column named "Score", if the column is the first one in the schema, then the Transformer throws this exception when transforming the input data view:

System.InvalidOperationException: 'The data to calibrate contains no 'Score' column'

To reproduce it, take the sample code I left in the first post of this issue, and simply change the ModelInput class to be like this:

        class ModelInput
        {
            public float Score { get; set; } // If this is declared first, it becomes the first column in the Schema
            public bool Label { get; set; }
        }

With that minor change alone (i.e. declaring Score before Label), EXAMPLE 1 in my sample also throws the exception I've mentioned.

This happens because of the check in CalibratorTransformer's Mapper:

parent.Host.Check(_scoreColIndex > 0, "The data to calibrate contains no 'Score' column");

For some reason, the check requires the index to be greater than 0. Since index 0 refers to the first column in the schema, the check doesn't allow the Score column to be the first one, and the exception is thrown.
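
A small self-contained illustration of why the strict comparison rejects a valid column. The helper names are hypothetical, and the schema is modelled as a plain string array, with -1 meaning "not found" as in the lookup quoted earlier. The inclusive check is an assumed correction, not necessarily what was merged:

```csharp
using System;

public static class ScoreIndexCheck
{
    // Returns the zero-based position of the column, or -1 when it is missing.
    public static int FindColumn(string[] schema, string name) =>
        Array.IndexOf(schema, name);

    // The comparison as quoted in this issue: rejects index 0.
    public static bool PassesStrictCheck(int index) => index > 0;

    // Assumed correction: index 0 is valid, only -1 signals a missing column.
    public static bool PassesInclusiveCheck(int index) => index >= 0;

    public static void Main()
    {
        int index = FindColumn(new[] { "Score", "Label" }, "Score");
        Console.WriteLine(index);                       // prints 0
        Console.WriteLine(PassesStrictCheck(index));    // prints False
        Console.WriteLine(PassesInclusiveCheck(index)); // prints True
    }
}
```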

antoniovs1029 added the bug, P1 and P0 labels and removed the P1 and P0 labels Jan 24, 2020
harishsk added the onnx and loadsave labels Apr 29, 2020
antoniovs1029 removed the onnx label Jun 2, 2020
antoniovs1029 removed their assignment Jun 16, 2020
mstfbl self-assigned this Jun 18, 2020
ghost locked as resolved and limited conversation to collaborators Mar 19, 2022