Closed
Description
System information
- OS version/distro: all
- .NET Version (e.g., dotnet --info): all
Issue
- What did you do?
I'm trying to port https://www.microsoft.com/net/learn/machine-learning-and-ai/get-started-with-ml-dotnet-tutorial to the “direct access” API.
public class IrisData
{
    [Column("0")]
    public float SepalLength;

    [Column("1")]
    public float SepalWidth;

    [Column("2")]
    public float PetalLength;

    [Column("3")]
    public float PetalWidth;

    [Column("4")]
    [ColumnName("Label")]
    public string Label;
}

public class IrisPrediction
{
    [ColumnName("PredictedLabel")]
    [KeyType]
    public uint PredictedLabels;
}

static void Main(string[] args)
{
    using (var env = new TlcEnvironment(seed: 0))
    {
        string dataPath = "iris-data.txt";
        var loader = new TextLoader(env, new TextLoader.Arguments()
        {
            HasHeader = false,
            SeparatorChars = new char[] { ',' },
            Column = new[] {
                ScalarCol("SepalLength", 0),
                ScalarCol("SepalWidth", 1),
                ScalarCol("PetalLength", 2),
                ScalarCol("PetalWidth", 3),
                ScalarCol("Label", 4, DataKind.Text),
            }
        }, new MultiFileSource(dataPath));

        IDataTransform trans = new TermTransform(env, loader, "Label");
        trans = new ConcatTransform(env, trans, "Features",
            "SepalLength", "SepalWidth", "PetalLength", "PetalWidth");

        var trainer = new SdcaMultiClassTrainer(env, new SdcaMultiClassTrainer.Arguments());
        var cached = new CacheDataView(env, trans, prefetch: null);
        var trainRoles = new RoleMappedData(cached, label: "Label", feature: "Features");
        var pred = trainer.Train(trainRoles);

        // Score.
        IDataView scoredData = ScoreUtils.GetScorer(pred, trainRoles, env, trainRoles.Schema);

        // Do a simple prediction.
        var engine = env.CreatePredictionEngine<IrisData, IrisPrediction>(scoredData);
        var prediction = engine.Predict(new IrisData()
        {
            SepalLength = 3.3f,
            SepalWidth = 1.6f,
            PetalLength = 0.2f,
            PetalWidth = 5.1f,
        });
        Console.WriteLine($"Predicted flower type is: {prediction.PredictedLabels}");
    }
}
- What happened?
Unhandled Exception: System.ArgumentOutOfRangeException: Feature column 'Features' not found
Parameter name: name
at Microsoft.ML.Runtime.Data.ColumnInfo.CreateFromName(ISchema schema, String name, String descName)
at Microsoft.ML.Runtime.Data.RoleMappedSchema.MapFromNames(ISchema schema, IEnumerable`1 roles, Boolean opt)
at Microsoft.ML.Runtime.Data.RoleMappedSchema..ctor(ISchema schema, IEnumerable`1 roles, Boolean opt)
at Microsoft.ML.Runtime.Data.PredictedLabelScorerBase.BindingsImpl.ApplyToSchema(ISchema input, ISchemaBindableMapper bindable, IHostEnvironment env)
at Microsoft.ML.Runtime.Data.PredictedLabelScorerBase..ctor(IHostEnvironment env, PredictedLabelScorerBase transform, IDataView newSource, String registrationName)
at Microsoft.ML.Runtime.Data.MultiClassClassifierScorer..ctor(IHostEnvironment env, MultiClassClassifierScorer transform, IDataView newSource)
at Microsoft.ML.Runtime.Data.MultiClassClassifierScorer.ApplyToData(IHostEnvironment env, IDataView newSource)
at Microsoft.ML.Runtime.Data.ApplyTransformUtils.ApplyTransformToData(IHostEnvironment env, IDataTransform transform, IDataView newSource)
at Microsoft.ML.Runtime.Data.ApplyTransformUtils.ApplyAllTransformsToData(IHostEnvironment env, IDataView chain, IDataView newSource, IDataView oldSource)
at Microsoft.ML.Runtime.Api.BatchPredictionEngine`2..ctor(IHostEnvironment env, IDataView dataPipeline, Boolean ignoreMissingColumns, SchemaDefinition inputSchemaDefinition, SchemaDefinition outputSchemaDefinition)
at Microsoft.ML.Runtime.Api.PredictionEngine`2..ctor(IHostEnvironment env, IDataView dataPipe, Boolean ignoreMissingColumns, SchemaDefinition inputSchemaDefinition, SchemaDefinition outputSchemaDefinition)
at Microsoft.ML.Runtime.Api.ComponentCreation.CreatePredictionEngine[TSrc,TDst](IHostEnvironment env, IDataView dataPipe, Boolean ignoreMissingColumns, SchemaDefinition inputSchemaDefinition, SchemaDefinition outputSchemaDefinition)
at myApp.Program.Main(String[] args) in C:\Users\eerhardt\source\repos\MLNetCore30Test\Program.cs:line 182
- What did you expect?
I expected it to work.
Notes
The reason (AFAICT) is the CacheDataView usage. When PredictionEngine tries to apply all the transforms (ApplyTransformUtils.ApplyAllTransformsToData in the stack trace above), it hits that CacheDataView, which isn’t an IDataTransform, and it escapes out. Thus, the only transform that gets applied is the Scorer transform, and none of the transforms used before it (like the one adding the “Features” column).
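To make the suspected failure mode concrete, here is a toy model of it. This is only a sketch under my own assumptions: IView, ITransformView, LoaderView, TransformView, CacheView, and ChainWalker are hypothetical stand-ins I made up for illustration, not the ML.NET types, and the walk below is not the actual ApplyAllTransformsToData code. It only shows how a backward walk that stops at the first non-transform node ends up keeping just the scorer when a cache sits in the middle of the chain.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-ins for illustration only; these are NOT ML.NET types.
interface IView { string Name { get; } }

// A view that knows the source it was built on, so a chain can be walked.
interface ITransformView : IView { IView Source { get; } }

sealed class LoaderView : IView
{
    public string Name => "Loader";
}

sealed class TransformView : ITransformView
{
    public TransformView(string name, IView source) { Name = name; Source = source; }
    public string Name { get; }
    public IView Source { get; }
}

// Wraps another view but is not itself a transform, like a cache.
sealed class CacheView : IView
{
    public CacheView(IView inner) { Inner = inner; }
    public IView Inner { get; }
    public string Name => "Cache";
}

static class ChainWalker
{
    // Walks from the end of the chain back toward the source and stops
    // at the first node that is not a transform.
    public static List<string> CollectTransforms(IView chain)
    {
        var names = new List<string>();
        IView current = chain;
        while (current is ITransformView t)
        {
            names.Add(t.Name);
            current = t.Source;
        }
        names.Reverse(); // report in application order
        return names;
    }
}

static class Demo
{
    static void Main()
    {
        IView loader = new LoaderView();
        IView term = new TransformView("Term", loader);
        IView concat = new TransformView("Concat", term);

        IView scorerOverCache = new TransformView("Scorer", new CacheView(concat));
        IView scorerDirect = new TransformView("Scorer", concat);

        // prints: Scorer
        Console.WriteLine(string.Join(", ", ChainWalker.CollectTransforms(scorerOverCache)));
        // prints: Term, Concat, Scorer
        Console.WriteLine(string.Join(", ", ChainWalker.CollectTransforms(scorerDirect)));
    }
}
```

In this toy version, the "Concat" step that would produce the “Features” column is never collected once the cache is in the chain, which matches the "Feature column 'Features' not found" symptom.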
We work around this in the tests by serializing the IDV out and then reading it back in:
private IDataScorerTransform GetScorer(IHostEnvironment env, IDataView transforms, IPredictor pred, string testDataPath = null)
{
    using (var ch = env.Start("Saving model"))
    using (var memoryStream = new MemoryStream())
    {
        var trainRoles = new RoleMappedData(transforms, label: "Label", feature: "Features");

        // Model cannot be saved with CacheDataView
        TrainUtils.SaveModel(env, ch, memoryStream, pred, trainRoles);
        memoryStream.Position = 0;
        using (var rep = RepositoryReader.Open(memoryStream, ch))
        {
            IDataLoader testPipe = ModelFileUtils.LoadLoader(env, rep, new MultiFileSource(testDataPath), true);
            RoleMappedData testRoles = new RoleMappedData(testPipe, label: "Label", feature: "Features");
            return ScoreUtils.GetScorer(pred, testRoles, env, testRoles.Schema);
        }
    }
}
I would not expect a user to have to do this. Any thoughts on how to make this better?
I tried removing the CacheDataView from my pipeline, which makes the code work, but training gets super slow. So that seems to be a non-starter.
/cc @TomFinley @Zruty0