Adding a sample for LightGbm Ranking #2650

Closed
27 commits
b572614
Adding a sample for LightGbm Ranking
najeeb-kazmi Feb 20, 2019
a25763e
Textloader internalizations, documentation, and Arguments refactoring…
artidoro Feb 20, 2019
eb7c121
Address PR feedback from #2579 (#2611)
eerhardt Feb 20, 2019
3cdc716
typo in comment: shoudl changed to should (#2666)
elbruno Feb 20, 2019
bd64b88
Metadata utils internalization, migration of few useful methods (#2651)
TomFinley Feb 20, 2019
bd00c1e
Microsoft.ML.Transforms assembly lockdown (#2648)
artidoro Feb 20, 2019
b0a1701
Introduce SimpleColumnInfo class (#2641)
yaeldMS Feb 20, 2019
4d4cfbb
Internalize Microsoft.ML.Data Evaluators folder. (#2635)
codemzs Feb 20, 2019
30df61a
Internalization of OneToOne and ManyToOne Column classes (#2632)
codemzs Feb 20, 2019
a1b66ac
Hide much of Microsoft.ML.Model namespace. (#2649)
TomFinley Feb 20, 2019
96ec842
Microsoft.ML.Internal.Internallearn namespace requires certain intern…
codemzs Feb 20, 2019
fb6ce54
Fixing renmants of argument keyword in public API (#2636)
abgoswam Feb 20, 2019
eb60021
Reorder MatrixFactorizationTrainer parameters (#2561)
najeeb-kazmi Feb 21, 2019
ed7f706
VectorToImageTransform conversion to estimator/transformer (#2580)
yaeldMS Feb 21, 2019
9e8f100
Increase build timeout for code coverage CI. (#2647)
codemzs Feb 21, 2019
01a362b
Fix the build (#2682)
yaeldMS Feb 21, 2019
512493a
Adding functional tests for explainability (#2584)
rogancarr Feb 21, 2019
412e1f9
Stop using System.ComponentModel.Composition (#2569)
eerhardt Feb 21, 2019
ec418e4
Change Default Settings in TextLoader (#2630)
wschin Feb 22, 2019
b604b07
Mark EntryPoints classes and APIs as internal (#2674)
ganik Feb 22, 2019
eb959c3
Adding defaults for labelColumn and groupIdColumn to Ranking evaluato…
rogancarr Feb 22, 2019
44c3113
Removes the learning rate parameter from RandomForest as this paramet…
singlis Feb 23, 2019
f3d5d82
PR feedback + cleaning up namespaces in Microsoft.ML.Samples project
najeeb-kazmi Feb 23, 2019
ba14a9d
Adding a sample for LightGbm Ranking
najeeb-kazmi Feb 20, 2019
f20d7bf
PR feedback + cleaning up namespaces in Microsoft.ML.Samples project
najeeb-kazmi Feb 23, 2019
d862c3b
nit
najeeb-kazmi Feb 23, 2019
269619f
merge conflicts
najeeb-kazmi Feb 23, 2019

1 change: 0 additions & 1 deletion build/Dependencies.props
@@ -8,7 +8,6 @@
<SystemMemoryVersion>4.5.1</SystemMemoryVersion>
<SystemReflectionEmitLightweightPackageVersion>4.3.0</SystemReflectionEmitLightweightPackageVersion>
<SystemThreadingTasksDataflowPackageVersion>4.8.0</SystemThreadingTasksDataflowPackageVersion>
<SystemComponentModelCompositionVersion>4.5.0</SystemComponentModelCompositionVersion>
</PropertyGroup>

<!-- Other/Non-Core Product Dependencies -->
5 changes: 4 additions & 1 deletion build/ci/phase-template.yml
@@ -14,7 +14,10 @@ phases:
_arch: ${{ parameters.architecture }}
_codeCoverage: ${{ parameters.codeCoverage }}
queue:
timeoutInMinutes: 45
${{ if eq(variables._codeCoverage, 'false') }}:
timeoutInMinutes: 30
${{ if eq(variables._codeCoverage, 'true') }}:
timeoutInMinutes: 60
parallel: 99
matrix:
${{ if eq(parameters.customMatrixes, '') }}:
39 changes: 20 additions & 19 deletions docs/code/MlNetCookBook.md
@@ -219,10 +219,10 @@ private class AdultData

// Read the data into a data view.
var trainData = mlContext.Data.ReadFromTextFile<AdultData>(trainDataPath,
// First line of the file is a header, not a data row.
hasHeader: true,
// Default separator is tab, but we need a semicolon.
separatorChar: ';'
separatorChar: ';',
// First line of the file is a header, not a data row.
hasHeader: true
);

```
@@ -328,7 +328,7 @@ In the file above, the last column (12th) is label that we predict, and all the
// First, we define the reader: specify the data columns and where to find them in the text file.
// Read the data into a data view. Remember though, readers are lazy, so the actual reading will happen when the data is accessed.
var trainData = mlContext.Data.ReadFromTextFile<AdultData>(dataPath,
// First line of the file is a header, not a data row.
// Default separator is tab, but the dataset has comma.
separatorChar: ','
);

@@ -372,7 +372,7 @@ Assuming the example above was used to train the model, here's how you calculate
```csharp
// Read the test dataset.
var testData = mlContext.Data.ReadFromTextFile<AdultData>(testDataPath,
// First line of the file is a header, not a data row.
// Default separator is tab, but the dataset has comma.
separatorChar: ','
);
// Calculate metrics of the model on the test data.
@@ -970,27 +970,27 @@ Please note that you need to make your `mapping` operation into a 'pure function
- It should not have side effects (we may call it arbitrarily at any time, or omit the call)

One important caveat is: if you want your custom transformation to be part of your saved model, you will need to provide a `contractName` for it.
At loading time, you will need to reconstruct the custom transformer and inject it into MLContext.
At loading time, you will need to register the custom transformer with the MLContext.

Here is a complete example that saves and loads a model with a custom mapping.
```csharp
/// <summary>
/// One class that contains all custom mappings that we need for our model.
/// One class that contains the custom mapping functionality that we need for our model.
///
/// It has a <see cref="CustomMappingFactoryAttributeAttribute"/> on it and
/// derives from <see cref="CustomMappingFactory{TSrc, TDst}"/>.
/// </summary>
public class CustomMappings
[CustomMappingFactoryAttribute(nameof(CustomMappings.IncomeMapping))]
public class CustomMappings : CustomMappingFactory<InputRow, OutputRow>
{
// This is the custom mapping. We now separate it into a method, so that we can use it both in training and in loading.
public static void IncomeMapping(InputRow input, OutputRow output) => output.Label = input.Income > 50000;

// MLContext is needed to create a new transformer. We are using 'Import' to have ML.NET populate
// this property.
[Import]
public MLContext MLContext { get; set; }

// We are exporting the custom transformer by the name 'IncomeMapping'.
[Export(nameof(IncomeMapping))]
public ITransformer MyCustomTransformer
=> MLContext.Transforms.CustomMappingTransformer<InputRow, OutputRow>(IncomeMapping, nameof(IncomeMapping));
// This factory method will be called when loading the model to get the mapping operation.
public override Action<InputRow, OutputRow> GetMapping()
{
return IncomeMapping;
}
}
```

@@ -1013,8 +1013,9 @@ using (var fs = File.Create(modelPath))

// Now pretend we are in a different process.

// Create a custom composition container for all our custom mapping actions.
newContext.CompositionContainer = new CompositionContainer(new TypeCatalog(typeof(CustomMappings)));
// Register the assembly that contains 'CustomMappings' with the ComponentCatalog
// so it can be found when loading the model.
newContext.ComponentCatalog.RegisterAssembly(typeof(CustomMappings).Assembly);

// Now we can load the model.
ITransformer loadedModel;
4 changes: 2 additions & 2 deletions docs/samples/Microsoft.ML.Samples/Dynamic/NgramExtraction.cs
@@ -52,7 +52,7 @@ public static void NgramTransform()
};
// Preview of the CharsUnigrams column obtained after processing the input.
VBuffer<ReadOnlyMemory<char>> slotNames = default;
transformedData_onechars.Schema["CharsUnigrams"].Metadata.GetValue(MetadataUtils.Kinds.SlotNames, ref slotNames);
transformedData_onechars.Schema["CharsUnigrams"].GetSlotNames(ref slotNames);
var charsOneGramColumn = transformedData_onechars.GetColumn<VBuffer<float>>(ml, "CharsUnigrams");
printHelper("CharsUnigrams", charsOneGramColumn, slotNames);

@@ -62,7 +62,7 @@ public static void NgramTransform()
// 'B' - 0 'e' - 6 's' - 3 't' - 6 '<?>' - 9 'g' - 2 'a' - 2 'm' - 2 'I' - 0 ''' - 0 'v' - 0 ...
// Preview of the CharsTwoGrams column obtained after processing the input.
var charsTwoGramColumn = transformedData_twochars.GetColumn<VBuffer<float>>(ml, "CharsTwograms");
transformedData_twochars.Schema["CharsTwograms"].Metadata.GetValue(MetadataUtils.Kinds.SlotNames, ref slotNames);
transformedData_twochars.Schema["CharsTwograms"].GetSlotNames(ref slotNames);
printHelper("CharsTwograms", charsTwoGramColumn, slotNames);

// CharsTwograms column obtained post-transformation.
2 changes: 1 addition & 1 deletion docs/samples/Microsoft.ML.Samples/Dynamic/Normalizer.cs
@@ -66,7 +66,7 @@ public static void Example()

// Composing a different pipeline if we wanted to normalize more than one column at a time.
// Using log scale as the normalization mode.
var multiColPipeline = ml.Transforms.Normalize(NormalizingEstimator.NormalizerMode.LogMeanVariance, new[] { ("LogInduced", "Induced"), ("LogSpontaneous", "Spontaneous") });
var multiColPipeline = ml.Transforms.Normalize(NormalizingEstimator.NormalizerMode.LogMeanVariance, new SimpleColumnInfo[] { ("LogInduced", "Induced"), ("LogSpontaneous", "Spontaneous") });
// The transformed data.
var multiColtransformer = multiColPipeline.Fit(trainData);
var multiColtransformedData = multiColtransformer.Transform(trainData);
Original file line number Diff line number Diff line change
@@ -34,12 +34,12 @@ public static void Example()

// This is the dictionary to convert words into the integer indexes.
var lookupMap = mlContext.Data.ReadFromTextFile(Path.Combine(modelLocation, "imdb_word_index.csv"),
columns: new[]
columns: new[]
{
new TextLoader.Column("Words", DataKind.TX, 0),
new TextLoader.Column("Ids", DataKind.I4, 1),
},
separatorChar: ','
separatorChar: ','
);

// Load the TensorFlow model once.
@@ -70,7 +70,7 @@ public static void Example()
};

var engine = mlContext.Transforms.Text.TokenizeWords("TokenizedWords", "Sentiment_Text")
.Append(mlContext.Transforms.Conversion.ValueMap(lookupMap, "Words", "Ids", new[] { ("VariableLenghtFeatures", "TokenizedWords") }))
.Append(mlContext.Transforms.Conversion.ValueMap(lookupMap, "Words", "Ids", new SimpleColumnInfo[] { ("VariableLenghtFeatures", "TokenizedWords") }))
.Append(mlContext.Transforms.CustomMapping(ResizeFeaturesAction, "Resize"))
.Append(mlContext.Transforms.ScoreTensorFlowModel(modelInfo, new[] { "Prediction/Softmax" }, new[] { "Features" }))
.Append(mlContext.Transforms.CopyColumns(("Prediction", "Prediction/Softmax")))
Original file line number Diff line number Diff line change
@@ -1,8 +1,8 @@
using Microsoft.ML.Transforms.Categorical;

namespace Microsoft.ML.Samples.Dynamic
namespace Microsoft.ML.Samples.Dynamic.Trainers.BinaryClassification
{
public class LightGbmBinaryClassification
public class LightGbm
{
// This example requires installation of additional nuget package <a href="https://www.nuget.org/packages/Microsoft.ML.LightGBM/">Microsoft.ML.LightGBM</a>.
public static void Example()
@@ -17,25 +17,25 @@ public static void Example()
var split = mlContext.BinaryClassification.TrainTestSplit(dataview, testFraction: 0.1);

// Create the Estimator.
var pipeline = mlContext.BinaryClassification.Trainers.LightGbm("IsOver50K", "Features");
var pipeline = mlContext.BinaryClassification.Trainers.LightGbm();

// Fit this Pipeline to the Training Data.
var model = pipeline.Fit(split.TrainSet);

// Evaluate how the model is doing on the test data.
var dataWithPredictions = model.Transform(split.TestSet);

var metrics = mlContext.BinaryClassification.Evaluate(dataWithPredictions, "IsOver50K");
var metrics = mlContext.BinaryClassification.Evaluate(dataWithPredictions);
SamplesUtils.ConsoleUtils.PrintMetrics(metrics);

// Output:
// Accuracy: 0.88
// AUC: 0.93
// F1 Score: 0.71
// Negative Precision: 0.90
// Negative Recall: 0.94
// Positive Precision: 0.76
// Positive Recall: 0.66
// Expected output:
// Accuracy: 0.88
// AUC: 0.93
// F1 Score: 0.71
// Negative Precision: 0.90
// Negative Recall: 0.94
// Positive Precision: 0.76
// Positive Recall: 0.66
}
}
}
Original file line number Diff line number Diff line change
@@ -1,10 +1,9 @@
using Microsoft.ML.LightGBM;
using Microsoft.ML.Transforms.Categorical;
using static Microsoft.ML.LightGBM.Options;

namespace Microsoft.ML.Samples.Dynamic
namespace Microsoft.ML.Samples.Dynamic.Trainers.BinaryClassification
{
class LightGbmBinaryClassificationWithOptions
class LightGbmWithOptions
{
// This example requires installation of additional nuget package <a href="https://www.nuget.org/packages/Microsoft.ML.LightGBM/">Microsoft.ML.LightGBM</a>.
public static void Example()
@@ -22,8 +21,6 @@ public static void Example()
var pipeline = mlContext.BinaryClassification.Trainers.LightGbm(
new Options
{
LabelColumn = "IsOver50K",
FeatureColumn = "Features",
Booster = new GossBooster.Options
{
TopRate = 0.3,
@@ -37,17 +34,17 @@ public static void Example()
// Evaluate how the model is doing on the test data.
var dataWithPredictions = model.Transform(split.TestSet);

var metrics = mlContext.BinaryClassification.Evaluate(dataWithPredictions, "IsOver50K");
var metrics = mlContext.BinaryClassification.Evaluate(dataWithPredictions);
SamplesUtils.ConsoleUtils.PrintMetrics(metrics);

// Output:
// Accuracy: 0.88
// AUC: 0.93
// F1 Score: 0.71
// Negative Precision: 0.90
// Negative Recall: 0.94
// Positive Precision: 0.76
// Positive Recall: 0.67
// Expected output:
// Accuracy: 0.88
// AUC: 0.93
// F1 Score: 0.71
// Negative Precision: 0.90
// Negative Recall: 0.94
// Positive Precision: 0.76
// Positive Recall: 0.67
}
}
}
Original file line number Diff line number Diff line change
@@ -3,7 +3,7 @@
using Microsoft.ML.Data;
using Microsoft.ML.Trainers;

namespace Microsoft.ML.Samples.Dynamic
namespace Microsoft.ML.Samples.Dynamic.Trainers.BinaryClassification
{
public static class SDCALogisticRegression
{
Original file line number Diff line number Diff line change
@@ -2,7 +2,7 @@
using System.Linq;
using Microsoft.ML.Data;

namespace Microsoft.ML.Samples.Dynamic
namespace Microsoft.ML.Samples.Dynamic.Trainers.BinaryClassification
{
public static class SDCASupportVectorMachine
{
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
namespace Microsoft.ML.Samples.Dynamic
namespace Microsoft.ML.Samples.Dynamic.Trainers.BinaryClassification
{
public static class SymbolicStochasticGradientDescent
{
@@ -24,15 +24,17 @@ public static void Example()

// Evaluate how the model is doing on the test data.
var dataWithPredictions = model.Transform(split.TestSet);
var metrics = mlContext.BinaryClassification.EvaluateNonCalibrated(dataWithPredictions, "IsOver50K");
var metrics = mlContext.BinaryClassification.EvaluateNonCalibrated(dataWithPredictions);
SamplesUtils.ConsoleUtils.PrintMetrics(metrics);
// Accuracy: 0.85
// AUC: 0.90
// F1 Score: 0.64
// Negative Precision: 0.88
// Negative Recall: 0.93
// Positive Precision: 0.72
// Positive Recall: 0.58

// Expected output:
// Accuracy: 0.85
// AUC: 0.90
// F1 Score: 0.64
// Negative Precision: 0.88
// Negative Recall: 0.93
// Positive Precision: 0.72
// Positive Recall: 0.58
}
}
}
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
namespace Microsoft.ML.Samples.Dynamic
namespace Microsoft.ML.Samples.Dynamic.Trainers.BinaryClassification
{
public static class SymbolicStochasticGradientDescentWithOptions
{
@@ -22,7 +22,6 @@ public static void Example()
var pipeline = mlContext.BinaryClassification.Trainers.SymbolicStochasticGradientDescent(
new ML.Trainers.HalLearners.SymSgdClassificationTrainer.Options()
{
LabelColumn = "IsOver50K",
LearningRate = 0.2f,
NumberOfIterations = 10,
NumberOfThreads = 1,
@@ -33,15 +32,17 @@ public static void Example()

// Evaluate how the model is doing on the test data.
var dataWithPredictions = model.Transform(split.TestSet);
var metrics = mlContext.BinaryClassification.EvaluateNonCalibrated(dataWithPredictions, "IsOver50K");
var metrics = mlContext.BinaryClassification.EvaluateNonCalibrated(dataWithPredictions);
SamplesUtils.ConsoleUtils.PrintMetrics(metrics);
// Accuracy: 0.84
// AUC: 0.88
// F1 Score: 0.60
// Negative Precision: 0.87
// Negative Recall: 0.93
// Positive Precision: 0.69
// Positive Recall: 0.53

// Expected output:
// Accuracy: 0.84
// AUC: 0.88
// F1 Score: 0.60
// Negative Precision: 0.87
// Negative Recall: 0.93
// Positive Precision: 0.69
// Positive Recall: 0.53
}
}
}
Original file line number Diff line number Diff line change
@@ -3,9 +3,9 @@
using Microsoft.ML.Data;
using Microsoft.ML.SamplesUtils;

namespace Microsoft.ML.Samples.Dynamic
namespace Microsoft.ML.Samples.Dynamic.Trainers.MulticlassClassification
{
class LightGbmMulticlassClassification
class LightGbm
{
// This example requires installation of additional nuget package <a href="https://www.nuget.org/packages/Microsoft.ML.LightGBM/">Microsoft.ML.LightGBM</a>.
public static void Example()
Original file line number Diff line number Diff line change
@@ -5,9 +5,9 @@
using Microsoft.ML.SamplesUtils;
using static Microsoft.ML.LightGBM.Options;

namespace Microsoft.ML.Samples.Dynamic
namespace Microsoft.ML.Samples.Dynamic.Trainers.MulticlassClassification
{
class LightGbmMulticlassClassificationWithOptions
class LightGbmWithOptions
{
// This example requires installation of additional nuget package <a href="https://www.nuget.org/packages/Microsoft.ML.LightGBM/">Microsoft.ML.LightGBM</a>.
public static void Example()