Warning messages when using types not supporting missing values as labels #1059

Closed

Description

System information

  • OS version/distro: Windows 10
  • .NET Version:
.NET Core SDK (reflecting any global.json):
 Version:   2.1.402
 Commit:    3599f217f4

Runtime Environment:
 OS Name:     Windows
 OS Version:  10.0.17134
 OS Platform: Windows
 RID:         win10-x64
 Base Path:   C:\Program Files\dotnet\sdk\2.1.402\

Host (useful for support):
  Version: 2.1.4
  Commit:  85255dde3e

Issue

  • What did you do?
    I updated this tutorial to the new API (see code below), and ran it.

  • What happened?
    I got the following output and warning:

Auto-tuning parameters: UseCat = False
Auto-tuning parameters: LearningRate = 0.2
Auto-tuning parameters: NumLeaves = 20
Auto-tuning parameters: MinDataPerLeaf = 5
Auto-tuning parameters: UseSoftmax = False
LightGBM objective=multiclassova
Warning: There is no NA value for type 'I4'. The missing key value will be mapped to the default value of 'I4'
Warning: There is no NA value for type 'I4'. The missing key value will be mapped to the default value of 'I4'
Warning: There is no NA value for type 'I4'. The missing key value will be mapped to the default value of 'I4'
Predicted flower type is: 2
  • What did you expect?
    I did not expect any warning for using integers as labels. It is a warning from KeyToValueTransform. The same warning appears when using string labels instead of integer labels.

I think the warning is emitted because KeyToValue can be lossy: integer types have no missing-value (NA) representation, so a missing key is mapped to the default value of int, which is 0. As a result, we warn the user every time integer labels are used. If 0 is itself used as a label, which is a very reasonable thing to do with int labels, missing labels are mapped to an existing label.

We don't want this warning displayed every time, since integer labels are reasonable to have. A possible solution is to warn only when a missing integer label would actually be mapped to an existing label, rather than whenever integer labels are used.
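
The proposal above could be sketched roughly as follows. This is only an illustration of the idea, not the actual KeyToValueTransform code: the class `KeyToIntMapper`, its members, and the warning text are all hypothetical, and it assumes that keys are 1-based indices into the label values with key 0 standing for a missing key.

```csharp
using System;

// Hypothetical stand-in for the key-to-value step; NOT the ML.NET implementation.
// Assumption: key 0 means "missing", keys 1..N index into the label values.
public sealed class KeyToIntMapper
{
    private readonly int[] _values;

    // True when default(int) (that is, 0) is also one of the real labels.
    public bool DefaultCollidesWithLabel { get; }

    public KeyToIntMapper(int[] values, Action<string> warn)
    {
        if (values == null) throw new ArgumentNullException(nameof(values));
        if (warn == null) throw new ArgumentNullException(nameof(warn));

        _values = values;
        DefaultCollidesWithLabel = Array.IndexOf(values, default(int)) >= 0;

        // Warn once, and only if a missing key would become
        // indistinguishable from an existing label.
        if (DefaultCollidesWithLabel)
            warn("Missing keys map to 0, which is also an existing label.");
    }

    // Maps a key back to its label; missing or out-of-range keys fall back to 0.
    public int Map(uint key)
    {
        if (key >= 1 && key <= (uint)_values.Length)
            return _values[key - 1];
        return default(int);
    }
}
```

With labels { 1, 2, 3 } this sketch stays silent, because a missing key maps to 0 and 0 is not a label. With labels { 0, 1, 2 } it warns once at construction, because the mapping really is lossy in that case.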


Code that I ran:

using Microsoft.ML.Runtime.Api;
using Microsoft.ML.Runtime.Data;
using Microsoft.ML.Runtime.LightGBM;
using System;

namespace myApp
{
    class Program
    {
        // STEP 1: Define your data structures

        // IrisData is used to provide training data, and as 
        // input for prediction operations
        // - First 4 properties are inputs/features used to predict the label
        // - Label is what you are predicting, and is only set when training
        public class IrisData
        {
            [Column("0")]
            public float SepalLength;

            [Column("1")]
            public float SepalWidth;

            [Column("2")]
            public float PetalLength;

            [Column("3")]
            public float PetalWidth;

            [Column("4")]
            [ColumnName("Label")]
            public int Label;
        }

        // IrisPrediction is the result returned from prediction operations
        public class IrisPrediction
        {
            [ColumnName("PredictedLabel")]
            public int PredictedLabels;
        }

        static TextLoader.Arguments GetIrisLoaderArgs()
        {
            return new TextLoader.Arguments()
            {
                Separator = "comma",
                HasHeader = true,
                Column = new[]
                {
                    new TextLoader.Column("SepalLength", DataKind.R4, 0),
                    new TextLoader.Column("SepalWidth", DataKind.R4, 1),
                    new TextLoader.Column("PetalLength", DataKind.R4, 2),
                    new TextLoader.Column("PetalWidth", DataKind.R4, 3),
                    new TextLoader.Column("Label", DataKind.I4, 4)
                }
            };
        }

        static void Main(string[] args)
        {
            // STEP 2: Create a pipeline and load your data
            //var pipeline = new LearningPipeline();
            var env = new ConsoleEnvironment();

            // If working in Visual Studio, make sure the 'Copy to Output Directory' 
            // property of iris-data.txt is set to 'Copy always'
            string dataPath = "iris-data.txt";
            var data = new TextLoader(env, GetIrisLoaderArgs()).Read(new MultiFileSource(dataPath));

            // STEP 3: Transform your data
            // Assign numeric values to text in the "Label" column, because only
            // numbers can be processed during model training
            var pipeline = new TermEstimator(env, "Label")
                // Puts all features into a vector
                .Append(new ConcatEstimator(env, "Features", new string[] { "SepalLength", "SepalWidth", "PetalLength", "PetalWidth" }))
                // STEP 4: Add learner
                // Add a learning algorithm to the pipeline. 
                // This is a classification scenario (What type of iris is this?)
                .Append(new LightGbmMulticlassTrainer(env, "Label", "Features"))
                // Convert the Label back into original text (after converting to number in step 3)
                .Append(new KeyToValueEstimator(env, "PredictedLabel"));

            // STEP 5: Train your model based on the data set
            var model = pipeline.Fit(data);
            var engine = model.MakePredictionFunction<IrisData, IrisPrediction>(env);

            // STEP 6: Use your model to make a prediction
            // You can change these numbers to test different predictions
            var prediction = engine.Predict(new IrisData()
            {
                SepalLength = 3.3f,
                SepalWidth = 1.6f,
                PetalLength = 0.2f,
                PetalWidth = 5.1f,
            });

            Console.WriteLine($"Predicted flower type is: {prediction.PredictedLabels}");
            Console.ReadLine();
        }
    }
}