Description
opened on Sep 26, 2018
System information
- OS version/distro: Windows 10
- .NET Version:
.NET Core SDK (reflecting any global.json):
Version: 2.1.402
Commit: 3599f217f4
Runtime Environment:
OS Name: Windows
OS Version: 10.0.17134
OS Platform: Windows
RID: win10-x64
Base Path: C:\Program Files\dotnet\sdk\2.1.402\
Host (useful for support):
Version: 2.1.4
Commit: 85255dde3e
Issue
- What did you do?
I updated this tutorial to the new API (see code below) and ran it.
- What happened?
I got the following output and warnings:
Auto-tuning parameters: UseCat = False
Auto-tuning parameters: LearningRate = 0.2
Auto-tuning parameters: NumLeaves = 20
Auto-tuning parameters: MinDataPerLeaf = 5
Auto-tuning parameters: UseSoftmax = False
LightGBM objective=multiclassova
Warning: There is no NA value for type 'I4'. The missing key value will be mapped to the default value of 'I4'
Warning: There is no NA value for type 'I4'. The missing key value will be mapped to the default value of 'I4'
Warning: There is no NA value for type 'I4'. The missing key value will be mapped to the default value of 'I4'
Predicted flower type is: 2
- What did you expect?
I did not expect any warning for using integers as labels. It is a warning from KeyToValueTransform. The same warning appears when using string labels instead of integer labels.
I think the warning is emitted because KeyToValue can be lossy: integers have no missing (NA) value, so a missing key is mapped to the default value of int, which is 0. We currently warn the user every time integer labels are used. If 0 is also used as a label, which is a very reasonable thing to do with int labels, missing labels end up mapped to an existing label.
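To make the suspected lossiness concrete, here is a minimal sketch of a key-to-value lookup. It is not the ML.NET implementation: `KeyToValueSketch` and `MapKeyToValue` are hypothetical names, and the convention that key 0 means "missing" while keys 1..N index the original values is an assumption made for illustration.

```csharp
using System;

// Illustrative only: a simplified stand-in for a key-to-value lookup,
// not the actual ML.NET KeyToValue code.
public static class KeyToValueSketch
{
    // Assumed convention: key 0 means "missing"; keys 1..N index into keyValues.
    public static int MapKeyToValue(uint key, int[] keyValues)
    {
        // int has no NA value, so a missing or out-of-range key
        // falls back to default(int), which is 0.
        if (key == 0 || key > keyValues.Length)
            return default(int);

        return keyValues[key - 1];
    }
}
```

With key values `{ 0, 1, 2 }`, both `MapKeyToValue(1, ...)` (the real label 0) and `MapKeyToValue(0, ...)` (a missing key) return 0, so the two cases cannot be told apart after the mapping.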
We don't want this warning to be displayed every time, since integer labels are reasonable to have. A possible solution is to warn only when missing integer labels would be mapped to an existing label, rather than whenever integer labels are used.
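The proposed rule could look roughly like the sketch below. Everything in it is hypothetical (the class `MissingKeyWarningSketch`, both method names, and the reworded warning text); it only shows the check "does the default value collide with a real label?" and does not reflect how KeyToValueTransform is structured internally.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of ML.NET: it only illustrates the proposed rule.
public static class MissingKeyWarningSketch
{
    // True when mapping a missing key to default(T) would be indistinguishable
    // from a real label, i.e. default(T) is itself one of the key values.
    public static bool DefaultCollidesWithLabel<T>(IEnumerable<T> keyValues)
    {
        return keyValues.Contains(default(T));
    }

    // Returns the warning text to emit, or null when no warning is needed.
    public static string GetWarningOrNull<T>(IEnumerable<T> keyValues, string typeName)
    {
        if (!DefaultCollidesWithLabel(keyValues))
            return null;

        return $"Warning: There is no NA value for type '{typeName}'. " +
               $"Missing keys will be mapped to '{default(T)}', which is also an existing label.";
    }
}
```

Under this rule, labels `{ 1, 2, 3 }` would produce no warning, while labels `{ 0, 1, 2 }` would still warn, because a missing label would be reported as the existing label 0.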
Code that I ran:
using Microsoft.ML.Runtime.Api;
using Microsoft.ML.Runtime.Data;
using Microsoft.ML.Runtime.LightGBM;
using System;

namespace myApp
{
    class Program
    {
        // STEP 1: Define your data structures
        // IrisData is used to provide training data, and as
        // input for prediction operations
        // - First 4 properties are inputs/features used to predict the label
        // - Label is what you are predicting, and is only set when training
        public class IrisData
        {
            [Column("0")]
            public float SepalLength;

            [Column("1")]
            public float SepalWidth;

            [Column("2")]
            public float PetalLength;

            [Column("3")]
            public float PetalWidth;

            [Column("4")]
            [ColumnName("Label")]
            public int Label;
        }

        // IrisPrediction is the result returned from prediction operations
        public class IrisPrediction
        {
            [ColumnName("PredictedLabel")]
            public int PredictedLabels;
        }

        static TextLoader.Arguments GetIrisLoaderArgs()
        {
            return new TextLoader.Arguments()
            {
                Separator = "comma",
                HasHeader = true,
                Column = new[]
                {
                    new TextLoader.Column("SepalLength", DataKind.R4, 0),
                    new TextLoader.Column("SepalWidth", DataKind.R4, 1),
                    new TextLoader.Column("PetalLength", DataKind.R4, 2),
                    new TextLoader.Column("PetalWidth", DataKind.R4, 3),
                    new TextLoader.Column("Label", DataKind.I4, 4)
                }
            };
        }

        static void Main(string[] args)
        {
            // STEP 2: Create a pipeline and load your data
            //var pipeline = new LearningPipeline();
            var env = new ConsoleEnvironment();

            // If working in Visual Studio, make sure the 'Copy to Output Directory'
            // property of iris-data.txt is set to 'Copy always'
            string dataPath = "iris-data.txt";
            var data = new TextLoader(env, GetIrisLoaderArgs()).Read(new MultiFileSource(dataPath));

            // STEP 3: Transform your data
            // Assign numeric values to text in the "Label" column, because only
            // numbers can be processed during model training
            var pipeline = new TermEstimator(env, "Label")
                // Puts all features into a vector
                .Append(new ConcatEstimator(env, "Features", new string[] { "SepalLength", "SepalWidth", "PetalLength", "PetalWidth" }))
                // STEP 4: Add learner
                // Add a learning algorithm to the pipeline.
                // This is a classification scenario (What type of iris is this?)
                .Append(new LightGbmMulticlassTrainer(env, "Label", "Features"))
                // Convert the Label back into original text (after converting to number in step 3)
                .Append(new KeyToValueEstimator(env, "PredictedLabel"));

            // STEP 5: Train your model based on the data set
            var model = pipeline.Fit(data);
            var engine = model.MakePredictionFunction<IrisData, IrisPrediction>(env);

            // STEP 6: Use your model to make a prediction
            // You can change these numbers to test different predictions
            var prediction = engine.Predict(new IrisData()
            {
                SepalLength = 3.3f,
                SepalWidth = 1.6f,
                PetalLength = 0.2f,
                PetalWidth = 5.1f,
            });

            Console.WriteLine($"Predicted flower type is: {prediction.PredictedLabels}");
            Console.ReadLine();
        }
    }
}