Scores to Label mapping #239

Merged
merged 7 commits into from
May 25, 2018
Changes from 5 commits
7 changes: 7 additions & 0 deletions src/Microsoft.ML.Core/Data/ITransformModel.cs
@@ -25,6 +25,13 @@ public interface ITransformModel
/// </summary>
ISchema InputSchema { get; }

/// <summary>
/// The resulting schema once applied to this model. The <see cref="InputSchema"/> might have
@TomFinley (Contributor), May 25, 2018

"The resulting schema once applied to this model"

That is not correct. Indeed, the point of writing this documentation is to clarify to users that this will not necessarily be the schema when applied. :/ #Closed

/// columns that are not needed by this transform and these columns will be seen in the
/// <see cref="OutputSchema"/> produced by this transform.
@TomFinley (Contributor), May 25, 2018

This is also not correct. Consider, e.g., the choose columns transform. #Closed
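
A minimal sketch of the point raised in the two comments above, using plain lists of column names rather than ML.NET types. `SchemaSketch`, `AddColumn`, `ChooseColumns`, and `OutputSchema` are hypothetical helpers written only for this illustration; they are not the library's API. The sketch assumes a choose-columns step keeps only the named columns, which would mean the output schema cannot be described as "the input columns plus whatever the transform adds".

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for illustration only; these are not ML.NET types.
public static class SchemaSketch
{
    // A "transform" is modeled as a function from input column names to output column names.
    public static Func<IReadOnlyList<string>, IReadOnlyList<string>> AddColumn(string name)
    {
        return input => input.Concat(new[] { name }).ToList();
    }

    // Keeps only the listed columns (assumed behavior of a choose-columns step).
    public static Func<IReadOnlyList<string>, IReadOnlyList<string>> ChooseColumns(params string[] keep)
    {
        var wanted = new HashSet<string>(keep);
        return input => input.Where(c => wanted.Contains(c)).ToList();
    }

    // Applies each transform in order and returns the final list of column names.
    public static IReadOnlyList<string> OutputSchema(
        IReadOnlyList<string> inputSchema,
        params Func<IReadOnlyList<string>, IReadOnlyList<string>>[] chain)
    {
        IReadOnlyList<string> schema = inputSchema;
        foreach (var transform in chain)
            schema = transform(schema);
        return schema;
    }
}
```

Under these assumptions, an input of `Label`, `Features`, `Unused` that goes through `AddColumn("Score")` and then `ChooseColumns("Score")` ends with a single `Score` column, so none of the unused input columns survive into the output.
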

/// </summary>
ISchema OutputSchema { get; }

/// <summary>
/// Apply the transform(s) in the model to the given input data.
/// </summary>
12 changes: 8 additions & 4 deletions src/Microsoft.ML.Data/EntryPoints/TransformModel.cs
@@ -39,10 +39,14 @@ public sealed class TransformModel : ITransformModel
/// if transform model A needs column X and model B needs Y, that is NOT produced by A,
/// then trimming A's input schema would cause composition to fail.
/// </summary>
-public ISchema InputSchema
-{
-get { return _schemaRoot; }
-}
+public ISchema InputSchema => _schemaRoot;
+
+/// <summary>
+/// The resulting schema once applied to this model. The <see cref="InputSchema"/> might have
+/// columns that are not needed by this transform and these columns will be seen in the
+/// <see cref="OutputSchema"/> produced by this transform.
+/// </summary>
+public ISchema OutputSchema => _chain.Schema;

@TomFinley (Contributor), May 25, 2018

Remember we now have C# 7.x's niceties available to us. #Closed

/// <summary>
/// Create a TransformModel containing the transforms from "result" back to "input".
34 changes: 34 additions & 0 deletions src/Microsoft.ML/PredictionModel.cs
@@ -6,6 +6,7 @@
using Microsoft.ML.Runtime.Api;
using Microsoft.ML.Runtime.Data;
using Microsoft.ML.Runtime.EntryPoints;
using Microsoft.ML.Runtime.Internal.Utilities;
using System;
using System.Collections.Generic;
using System.IO;
@@ -29,6 +30,39 @@ internal Runtime.EntryPoints.TransformModel PredictorModel
get { return _predictorModel; }
}

/// <summary>
/// Returns labels that correspond to indices of the score array in the case of
/// multi-class classification problem.
/// </summary>
/// <param name="mapping">Label to score mapping</param>
/// <param name="scoreColumnName">Name of the score column</param>
/// <returns></returns>
public bool TryGetScoreLabelMapping(out string[] mapping, string scoreColumnName = DefaultColumnNames.Score)
{
mapping = null;
ISchema schema = _predictorModel.OutputSchema;
int colIndex = -1;
if (!schema.TryGetColumnIndex(scoreColumnName, out colIndex))
return false;

int expectedLabelCount = schema.GetColumnType(colIndex).AsVector.ValueCount;
@TomFinley (Contributor), May 25, 2018

"AsVector"

AsVector can be null in the case where the type is not a vector. Replace AsVector.ValueCount with VectorSize. Below this you will need to return false if VectorSize is not a positive number. #Closed
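
The check requested above might look roughly like the following sketch. `ColumnTypeSketch` and `VectorTypeSketch` are hypothetical stand-ins that only mimic the shape described in the comment (an `AsVector` that is null for non-vector types, and a `VectorSize` that is not positive when there is no known vector size); they are not the real ML.NET type classes, and `TryGetLabelCount` is an illustrative helper name.

```csharp
using System;

// Hypothetical stand-ins; the real column type classes are not reproduced here.
public class ColumnTypeSketch
{
    // Null when this type is not a vector, as the review comment warns.
    public VectorTypeSketch AsVector => this as VectorTypeSketch;

    // Assumed convention: 0 for non-vector types and for vectors of unknown size.
    public virtual int VectorSize => 0;
}

public sealed class VectorTypeSketch : ColumnTypeSketch
{
    private readonly int _size;

    public VectorTypeSketch(int size)
    {
        _size = size;
    }

    public override int VectorSize => _size;
}

public static class VectorSizeCheck
{
    // Reads the size without dereferencing AsVector, and rejects non-positive sizes.
    public static bool TryGetLabelCount(ColumnTypeSketch type, out int count)
    {
        count = type.VectorSize;
        if (count > 0)
            return true;

        count = 0;
        return false;
    }
}
```

With these stand-ins, calling `type.AsVector.VectorSize` on a plain `ColumnTypeSketch` would throw a `NullReferenceException`, whereas `TryGetLabelCount` simply returns false.
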

if (!schema.HasSlotNames(colIndex, expectedLabelCount))
return false;

VBuffer<DvText> labels = default;
schema.GetMetadata(MetadataUtils.Kinds.SlotNames, colIndex, ref labels);
VBufferUtils.Densify(ref labels);
@TomFinley (Contributor), May 25, 2018

"VBufferUtils.Densify(ref labels);"

Not Densify, DenseValues. There is no need for you to materialize a dense array in the sparse case -- DenseValues returns an IEnumerable<DvText> that you can process. #Closed


if (labels.Length != expectedLabelCount)
return false;

mapping = new string[labels.Length];
for (int index = 0; index < labels.Count; index++)
mapping[index] = labels.Values[index].ToString();
@TomFinley (Contributor), May 25, 2018

Rephrase as enumeration over DenseValues here, in case it is not clear. #Closed
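
To make the suggestion concrete, here is a small sketch of enumerating dense values from a sparse buffer without first materializing a dense copy. `SparseBufferSketch<T>` is a hypothetical, simplified type written for this illustration; it is not the real `VBuffer<T>`, and its `DenseValues` only imitates the behavior the comments describe (one value per logical slot, produced lazily). `ToLabels` is likewise an illustrative helper, and mapping missing slots to an empty string is an assumption of this sketch.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, simplified sparse buffer; not the real VBuffer<T>.
public sealed class SparseBufferSketch<T>
{
    private readonly int[] _indices; // sorted logical positions of the stored values
    private readonly T[] _values;    // explicitly stored values, parallel to _indices

    public SparseBufferSketch(int length, int[] indices, T[] values)
    {
        Length = length;
        _indices = indices;
        _values = values;
    }

    // Logical length of the buffer.
    public int Length { get; }

    // Number of explicitly stored values; smaller than Length when sparse.
    public int Count => _values.Length;

    // Lazily yields one value per logical slot, default(T) where nothing is stored.
    public IEnumerable<T> DenseValues()
    {
        int next = 0;
        for (int slot = 0; slot < Length; slot++)
        {
            if (next < _indices.Length && _indices[next] == slot)
                yield return _values[next++];
            else
                yield return default(T);
        }
    }
}

public static class LabelMappingSketch
{
    // Builds the label array by enumerating dense values; no intermediate dense buffer.
    public static string[] ToLabels(SparseBufferSketch<string> names)
    {
        var mapping = new string[names.Length];
        int index = 0;
        foreach (string name in names.DenseValues())
            mapping[index++] = name ?? string.Empty;
        return mapping;
    }
}
```

The only array allocated is the result itself, which is the saving the review comment is after.
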


@Ivanidzo4ka (Contributor), May 25, 2018

nit: extra lines #Resolved

return true;
}

/// <summary>
/// Read model from file asynchronously.
/// </summary>
@@ -30,6 +30,14 @@ public void TrainAndPredictIrisModelWithStringLabelTest()
pipeline.Add(new StochasticDualCoordinateAscentClassifier());

PredictionModel<IrisDataWithStringLabel, IrisPrediction> model = pipeline.Train<IrisDataWithStringLabel, IrisPrediction>();
string[] scoreLabels;
model.TryGetScoreLabelMapping(out scoreLabels);
@Ivanidzo4ka (Contributor), May 24, 2018

"model.TryGetScoreLabelMapping(out scoreLabels);"

Why can't I just define

    public class IrisPrediction
    {
        [ColumnName("Score")]
        public float[] PredictedScores;

        [ColumnName("OriginalLabels")]
        public string[] OriginalLabels;
    }

and have it filled automatically, if that is possible? #Resolved

Contributor

IrisPrediction is a per-row structure. This is metadata, which is a property of the schema itself.


In reply to: 190754719
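
One way to read the distinction drawn in this reply: the scores change with every row, while the label names are fetched once from the model and reused. The sketch below combines the two; `ScoreLabelSketch.TopLabel` is a hypothetical helper written for this illustration (it is not part of the PR), and it assumes `labels[i]` names the class scored by `scores[i]`, which is what the mapping is described as providing.

```csharp
using System;

public static class ScoreLabelSketch
{
    // Returns the label whose score is highest. The labels array is schema-level
    // data obtained once; the scores array comes from a single predicted row.
    public static string TopLabel(float[] scores, string[] labels)
    {
        if (scores == null || labels == null)
            throw new ArgumentNullException(scores == null ? nameof(scores) : nameof(labels));
        if (scores.Length == 0 || scores.Length != labels.Length)
            throw new ArgumentException("scores and labels must be non-empty and of equal length");

        int best = 0;
        for (int i = 1; i < scores.Length; i++)
        {
            if (scores[i] > scores[best])
                best = i;
        }
        return labels[best];
    }
}
```

In the test that follows, the array filled by `TryGetScoreLabelMapping` would play the role of `labels`, and the score array of each prediction would be passed as `scores`.
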


Assert.NotNull(scoreLabels);
Assert.Equal(3, scoreLabels.Length);
Assert.Equal("Iris-setosa", scoreLabels[0]);
Assert.Equal("Iris-versicolor", scoreLabels[1]);
Assert.Equal("Iris-virginica", scoreLabels[2]);

IrisPrediction prediction = model.Predict(new IrisDataWithStringLabel()
{