AutoML Add Recommendation Task #4246

Merged
merged 33 commits into from
Oct 17, 2019
Changes from 10 commits
6f3d26c
[AutoML] Pull out Code Gen as separate library plus some changes in C…
LittleLittleCloud Aug 25, 2019
7e0f6d0
pack codegen into mlnet
LittleLittleCloud Sep 5, 2019
22edabb
pack codegen into mlnet (#4179)
LittleLittleCloud Sep 9, 2019
50e0dcd
Merge branch 'features/automl' of https://github.com/dotnet/machinele…
LittleLittleCloud Sep 23, 2019
09c56f7
add MatrixFactorization Trainer
LittleLittleCloud Sep 23, 2019
15c58f1
add RecommendationExperiment and other functions
LittleLittleCloud Sep 23, 2019
ac57d9a
some refactor in MatrixFactorization, plus fix small bugs
LittleLittleCloud Sep 24, 2019
c07948f
add LabelFeautre ColumnPurpose and some update
LittleLittleCloud Sep 25, 2019
f182a20
Merge branch 'u/xiaoyun/recommendation'
LittleLittleCloud Sep 25, 2019
9695ffe
add missing Native dll
LittleLittleCloud Sep 25, 2019
b54de14
remove mlnet project
LittleLittleCloud Sep 25, 2019
913b4af
update based on comment
LittleLittleCloud Sep 25, 2019
3fc520c
update example
LittleLittleCloud Sep 26, 2019
2f47c02
Merge branch 'master' into u/xiaoyun/recommendation
maryamariyan Sep 26, 2019
c78efbf
nit: code style
maryamariyan Sep 26, 2019
5864b78
- Rename RecommendationExperimentScenario.MF to RecommendationExperim…
maryamariyan Sep 26, 2019
4010d90
nit: code style/ add space between if and (
maryamariyan Sep 26, 2019
fef926e
Fix compile error
maryamariyan Sep 26, 2019
9c4852c
minor fixes
maryamariyan Oct 7, 2019
74cbc5c
First stage changes
maryamariyan Oct 14, 2019
7e7c272
change signature for ITrainerEstimator
maryamariyan Oct 15, 2019
17500cf
Adding tests, checking code coverage
maryamariyan Oct 16, 2019
b882ee1
cleanup + improve SweepParams, taken from MatrixFactorizationTrainer
maryamariyan Oct 16, 2019
d7a272d
Address PR feedback - part1
maryamariyan Oct 16, 2019
b69d9c3
Apply PR feedbacks - Part 2
maryamariyan Oct 16, 2019
f9c6abb
Update test to reflect change made to sweep params
maryamariyan Oct 16, 2019
7d856c8
Apply PR feedbacks: Part 3
maryamariyan Oct 16, 2019
7852c5e
Adds more sweepable params and test
maryamariyan Oct 16, 2019
f889fa5
Rename to UserId/ItemId
maryamariyan Oct 16, 2019
2ec0649
Rename User/Item ID: part 2
maryamariyan Oct 16, 2019
c39ae94
- Removing SamplingKey for first iteration
maryamariyan Oct 16, 2019
7186280
Apply review comments
maryamariyan Oct 17, 2019
d3d6b4a
Minor rename
maryamariyan Oct 17, 2019
22 changes: 22 additions & 0 deletions docs/samples/Microsoft.ML.AutoML.Samples/DataStructures/Movie.cs
@@ -0,0 +1,22 @@
using System;
using System.Collections.Generic;
using System.Text;
using Microsoft.ML.Data;

namespace Microsoft.ML.AutoML.Samples.DataStructures
{
public class Movie
{
[LoadColumn(0)]
public string userId;

[LoadColumn(1)]
public string movieId;

[LoadColumn(2)]
public float rating;

[LoadColumn(3)]
public float timestamp;
}
}
@@ -1,4 +1,4 @@
<Project Sdk="Microsoft.NET.Sdk">
<Project Sdk="Microsoft.NET.Sdk">

<PropertyGroup>
<OutputType>Exe</OutputType>
@@ -7,6 +7,7 @@

<ItemGroup>
<ProjectReference Include="..\..\..\src\Microsoft.ML.AutoML\Microsoft.ML.AutoML.csproj" />
<NativeAssemblyReference Include="MatrixFactorizationNative" />
</ItemGroup>

</Project>
3 changes: 3 additions & 0 deletions docs/samples/Microsoft.ML.AutoML.Samples/Program.cs
@@ -8,6 +8,9 @@ public static void Main(string[] args)
{
try
{
RecommendationExperiment.Run();
Console.Clear();

RegressionExperiment.Run();
Console.Clear();

@@ -0,0 +1,79 @@
using System;
using System.IO;
using System.Linq;
using Microsoft.ML.AutoML;
using Microsoft.ML.AutoML.Samples.DataStructures;
using Microsoft.ML.Data;

namespace Microsoft.ML.AutoML.Samples
{
public static class RecommendationExperiment
{
private static string TrainDataPath = @"C:\Users\xiaoyuz\Desktop\machinelearning-samples\datasets\recommendation-ratings-train.csv";
private static string TestDataPath = @"C:\Users\xiaoyuz\Desktop\machinelearning-samples\datasets\recommendation-ratings-test.csv";
private static string ModelPath = @"C:\Users\xiaoyuz\source\test\recommendation.zip";
private static string LabelColumnName = "rating";
private static uint ExperimentTime = 60;

public static void Run()
{
MLContext mlContext = new MLContext();

// STEP 1: Load data
IDataView trainDataView = mlContext.Data.LoadFromTextFile<Movie>(TrainDataPath, hasHeader: true, separatorChar: ',');
IDataView testDataView = mlContext.Data.LoadFromTextFile<Movie>(TestDataPath, hasHeader: true, separatorChar: ',');

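var settings = new RecommendationExperimentSettings(RecommendationExperimentScenario.MF, "userId", "movieId");
// Bound the run time; otherwise the experiment falls back to the default timeout.
settings.MaxExperimentTimeInSeconds = ExperimentTime;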
var inputColumnInformation = new ColumnInformation();
inputColumnInformation.LabelCategoricalColumnNames.Add("movieId");
inputColumnInformation.LabelCategoricalColumnNames.Add("userId");
inputColumnInformation.LabelColumnName = "rating";

// STEP 2: Run AutoML experiment
Console.WriteLine($"Running AutoML recommendation experiment for {ExperimentTime} seconds...");
ExperimentResult<RegressionMetrics> experimentResult = mlContext.Auto()
.CreateRecommendationExperiment(settings)
.Execute(trainDataView, inputColumnInformation);

// STEP 3: Print metric from best model
RunDetail<RegressionMetrics> bestRun = experimentResult.BestRun;
Console.WriteLine($"Total models produced: {experimentResult.RunDetails.Count()}");
Console.WriteLine($"Best model's trainer: {bestRun.TrainerName}");
Console.WriteLine($"Metrics of best model from validation data --");
PrintMetrics(bestRun.ValidationMetrics);

// STEP 4: Evaluate test data
IDataView testDataViewWithBestScore = bestRun.Model.Transform(testDataView);
RegressionMetrics testMetrics = mlContext.Regression.Evaluate(testDataViewWithBestScore, labelColumnName: LabelColumnName);
Console.WriteLine($"Metrics of best model on test data --");
PrintMetrics(testMetrics);

// STEP 5: Save the best model for later deployment and inferencing
using (FileStream fs = File.Create(ModelPath))
mlContext.Model.Save(bestRun.Model, trainDataView.Schema, fs);

// STEP 6: Create prediction engine from the best trained model
// NOTE: TaxiTripFarePrediction is reused from the regression sample; its FareAmount field is assumed to map to the Score column.
var predictionEngine = mlContext.Model.CreatePredictionEngine<Movie, TaxiTripFarePrediction>(bestRun.Model);

// STEP 7: Initialize a test user/movie pair and get the predicted rating
var testMovie = new Movie
{
userId = "1",
movieId = "1097",
};
var prediction = predictionEngine.Predict(testMovie);
Console.WriteLine($"Predicted rating for test movie: {prediction.FareAmount}");

Console.WriteLine("Press any key to continue...");
Console.ReadKey();
}

private static void PrintMetrics(RegressionMetrics metrics)
{
Console.WriteLine($"MeanAbsoluteError: {metrics.MeanAbsoluteError}");
Console.WriteLine($"MeanSquaredError: {metrics.MeanSquaredError}");
Console.WriteLine($"RootMeanSquaredError: {metrics.RootMeanSquaredError}");
Console.WriteLine($"RSquared: {metrics.RSquared}");
}
}
}
11 changes: 11 additions & 0 deletions src/Microsoft.ML.AutoML/API/AutoCatalog.cs
@@ -2,6 +2,7 @@
// The .NET Foundation licenses this file to you under the MIT license.
// See the LICENSE file in the project root for more information.

using System.Collections.Generic;
using Microsoft.ML.Data;

namespace Microsoft.ML.AutoML
@@ -13,6 +14,11 @@ public sealed class AutoCatalog
{
private readonly MLContext _context;

/// <summary>
/// Stores intermediate values (such as the matrix index column names) shared between experiment settings and trainer extensions.
/// </summary>
public static Dictionary<string, object> ValuePairs { get; set; } = new Dictionary<string, object>();

internal AutoCatalog(MLContext context)
{
_context = context;
@@ -123,6 +129,11 @@ public MulticlassClassificationExperiment CreateMulticlassClassificationExperime
return new MulticlassClassificationExperiment(_context, experimentSettings);
}

public RecommendationExperiment CreateRecommendationExperiment(RecommendationExperimentSettings experimentSettings)
{
return new RecommendationExperiment(_context, experimentSettings);
}

/// <summary>
/// Infers information about the columns of a dataset in a file located at <paramref name="path"/>.
/// </summary>
11 changes: 11 additions & 0 deletions src/Microsoft.ML.AutoML/API/ColumnInference.cs
@@ -2,6 +2,7 @@
// The .NET Foundation licenses this file to you under the MIT license.
// See the LICENSE file in the project root for more information.

using System.Collections;
using System.Collections.Generic;
using System.Collections.ObjectModel;
using Microsoft.ML.Data;
@@ -76,6 +77,15 @@ public sealed class ColumnInformation
/// </remarks>
public ICollection<string> CategoricalColumnNames { get; }

/// <summary>
/// The dataset columns that are label features.
/// </summary>
/// <remarks>
/// Similar to categorical features, but these columns require a ValueToKey converter instead of OneHotEncoding.
/// This column purpose can only be set explicitly here; it is never inferred.
/// </remarks>
public ICollection<string> LabelCategoricalColumnNames { get; }

/// <summary>
/// The dataset columns that are numeric.
/// </summary>
@@ -98,6 +108,7 @@ public ColumnInformation()
{
LabelColumnName = DefaultColumnNames.Label;
CategoricalColumnNames = new Collection<string>();
LabelCategoricalColumnNames = new Collection<string>();
NumericColumnNames = new Collection<string>();
TextColumnNames = new Collection<string>();
IgnoredColumnNames = new Collection<string>();
74 changes: 74 additions & 0 deletions src/Microsoft.ML.AutoML/API/RecommendationExperiment.cs
Original file line number Diff line number Diff line change
@@ -0,0 +1,74 @@
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using Microsoft.ML.Data;
using Microsoft.ML.Trainers;

namespace Microsoft.ML.AutoML
{
public enum RecommendationExperimentScenario
{
MF,
}

public sealed class RecommendationExperimentSettings : ExperimentSettings
{
public RecommendationExperimentScenario Scenario { get; set; }

public string MatrixColumnIndexColumnName { get; set; }

public string MatrixRowIndexColumnName { get; set; }

// We can use RegressionMetric as evaluation Metric
public RegressionMetric OptimizingMetric { get; set; }

public ICollection<RecommendationTrainer> Trainers { get; }

public RecommendationExperimentSettings(RecommendationExperimentScenario scenario, string columnIndexName, string rowIndexName)
: this()
{
if (scenario == RecommendationExperimentScenario.MF)
{
AutoCatalog.ValuePairs[nameof(MatrixFactorizationTrainer.Options.MatrixColumnIndexColumnName)] = columnIndexName;
AutoCatalog.ValuePairs[nameof(MatrixFactorizationTrainer.Options.MatrixRowIndexColumnName)] = rowIndexName;
return;
}
throw new NotImplementedException();
}

private RecommendationExperimentSettings()
{
OptimizingMetric = RegressionMetric.RSquared;
Trainers = Enum.GetValues(typeof(RecommendationTrainer)).OfType<RecommendationTrainer>().ToList();
}
}

public enum RecommendationTrainer
{
MatrixFactorization,
}

public sealed class RecommendationExperiment : ExperimentBase<RegressionMetrics, RecommendationExperimentSettings>
{
internal RecommendationExperiment(MLContext context, RecommendationExperimentSettings settings)
: base(context,
new RegressionMetricsAgent(context, settings.OptimizingMetric),
new OptimizingMetricInfo(settings.OptimizingMetric),
settings,
TaskKind.Recommendation,
TrainerExtensionUtil.GetTrainerNames(settings.Trainers))
{
}
private protected override CrossValidationRunDetail<RegressionMetrics> GetBestCrossValRun(IEnumerable<CrossValidationRunDetail<RegressionMetrics>> results)
{
return BestResultUtil.GetBestRun(results, MetricsAgent, OptimizingMetricInfo.IsMaximizing);
}

private protected override RunDetail<RegressionMetrics> GetBestRun(IEnumerable<RunDetail<RegressionMetrics>> results)
{
return BestResultUtil.GetBestRun(results, MetricsAgent, OptimizingMetricInfo.IsMaximizing);
}
}
}
2 changes: 2 additions & 0 deletions src/Microsoft.ML.AutoML/Assembly.cs
@@ -11,3 +11,5 @@
[assembly: InternalsVisibleTo("Benchmark, PublicKey=00240000048000009400000006020000002400005253413100040000010001004b86c4cb78549b34bab61a3b1800e23bfeb5b3ec390074041536a7e3cbd97f5f04cf0f857155a8928eaa29ebfd11cfbbad3ba70efea7bda3226c6a8d370a4cd303f714486b6ebc225985a638471e6ef571cc92a4613c00b8fa65d61ccee0cbe5f36330c9a01f4183559f1bef24cc2917c6d913e3a541333a1d05d9bed22b38cb")]

[assembly: WantsToBeBestFriends]
[assembly: InternalsVisibleTo("Microsoft.ML.CodeGenerator, PublicKey=00240000048000009400000006020000002400005253413100040000010001004b86c4cb78549b34bab61a3b1800e23bfeb5b3ec390074041536a7e3cbd97f5f04cf0f857155a8928eaa29ebfd11cfbbad3ba70efea7bda3226c6a8d370a4cd303f714486b6ebc225985a638471e6ef571cc92a4613c00b8fa65d61ccee0cbe5f36330c9a01f4183559f1bef24cc2917c6d913e3a541333a1d05d9bed22b38cb")]
[assembly: InternalsVisibleTo("Microsoft.ML.ModelBuilder, PublicKey=002400000480000094000000060200000024000052534131000400000100010007d1fa57c4aed9f0a32e84aa0faefd0de9e8fd6aec8f87fb03766c834c99921eb23be79ad9d5dcc1dd9ad236132102900b723cf980957fc4e177108fc607774f29e8320e92ea05ece4e821c0a5efe8f1645c4c0c93c1ab99285d622caa652c1dfad63d745d6f2de5f17e5eaf0fc4963d261c8a12436518206dc093344d5ad293")]
@@ -47,6 +47,11 @@ internal static class ColumnInformationUtil
return ColumnPurpose.Ignore;
}

if (columnInfo.LabelCategoricalColumnNames.Contains(columnName))
{
return ColumnPurpose.LabelFeature;
}

return null;
}

@@ -76,6 +81,9 @@ internal static ColumnInformation BuildColumnInfo(IEnumerable<(string name, Colu
case ColumnPurpose.NumericFeature:
columnInfo.NumericColumnNames.Add(column.name);
break;
case ColumnPurpose.LabelFeature:
columnInfo.LabelCategoricalColumnNames.Add(column.name);
break;
case ColumnPurpose.TextFeature:
columnInfo.TextColumnNames.Add(column.name);
break;
@@ -104,6 +112,7 @@ public static IEnumerable<string> GetColumnNames(ColumnInformation columnInforma
AddStringsToListIfNotNull(columnNames, columnInformation.IgnoredColumnNames);
AddStringsToListIfNotNull(columnNames, columnInformation.NumericColumnNames);
AddStringsToListIfNotNull(columnNames, columnInformation.TextColumnNames);
AddStringsToListIfNotNull(columnNames, columnInformation.LabelCategoricalColumnNames);
return columnNames;
}

3 changes: 2 additions & 1 deletion src/Microsoft.ML.AutoML/ColumnInference/ColumnPurpose.cs
@@ -13,6 +13,7 @@ internal enum ColumnPurpose
TextFeature = 4,
Weight = 5,
ImagePath = 6,
SamplingKey = 7
SamplingKey = 7,
LabelFeature, // CategoricalFeature that requires ValueToKey converter, better naming?
Contributor:
I think we'd want to use HashToKey (the name may be off) instead of the mentioned ValueToKey: ValueToKey will map future unseen values to NA in your test dataset and, as a lesser issue, is slow because it takes a full pass over the dataset.
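
The trade-off described in this comment can be illustrated with a minimal sketch in plain C#. It is a simplified stand-in, not ML.NET's implementation: `FitValueToKey`, `LookupKey`, and `HashKey` are hypothetical helpers, the FNV-1a hash is an arbitrary choice, and key 0 is assumed to stand for a missing value.

```csharp
using System;
using System.Collections.Generic;

public static class KeyEncodingSketch
{
    // Dictionary-based encoding (hypothetical helper): keys are assigned from the
    // values seen during a full pass over the training data.
    public static Dictionary<string, uint> FitValueToKey(IEnumerable<string> trainingValues)
    {
        var map = new Dictionary<string, uint>();
        foreach (var value in trainingValues)
        {
            if (!map.ContainsKey(value))
                map[value] = (uint)map.Count + 1;
        }
        return map;
    }

    // Values that were not seen in training fall back to 0, assumed here to mean "missing".
    public static uint LookupKey(Dictionary<string, uint> map, string value)
        => map.TryGetValue(value, out var key) ? key : 0u;

    // Hash-based encoding (hypothetical helper): no pass over the data is needed,
    // and every value, seen or unseen, gets a key in [1, 2^bits].
    public static uint HashKey(string value, int bits)
    {
        uint hash = 2166136261; // FNV-1a offset basis
        foreach (char c in value)
        {
            hash ^= c;
            hash *= 16777619; // FNV-1a prime
        }
        return (hash & ((1u << bits) - 1)) + 1;
    }

    public static void Main()
    {
        var map = FitValueToKey(new[] { "1", "2", "1097" });
        Console.WriteLine(LookupKey(map, "1097")); // seen in training: non-zero key
        Console.WriteLine(LookupKey(map, "4242")); // unseen in training: 0 (missing)
        Console.WriteLine(HashKey("4242", 10));    // unseen values still get a key
    }
}
```

The cost of hashing is that two distinct IDs can collide in the same bucket, so the number of bits would need to be chosen with the number of users and items in mind.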

}
}
1 change: 1 addition & 0 deletions src/Microsoft.ML.AutoML/Microsoft.ML.AutoML.csproj
@@ -10,6 +10,7 @@
<ProjectReference Include="..\Microsoft.ML.CpuMath\Microsoft.ML.CpuMath.csproj" />
<ProjectReference Include="..\Microsoft.ML.LightGbm\Microsoft.ML.LightGbm.csproj" />
<ProjectReference Include="..\Microsoft.ML.Mkl.Components\Microsoft.ML.Mkl.Components.csproj" />
<ProjectReference Include="..\Microsoft.ML.Recommender\Microsoft.ML.Recommender.csproj" />
<ProjectReference Include="..\Microsoft.ML.Transforms\Microsoft.ML.Transforms.csproj" />
</ItemGroup>

1 change: 1 addition & 0 deletions src/Microsoft.ML.AutoML/TaskKind.cs
@@ -9,5 +9,6 @@ internal enum TaskKind
BinaryClassification,
MulticlassClassification,
Regression,
Recommendation,
}
}
@@ -0,0 +1,37 @@
using System;
using System.Collections.Generic;
using System.Text;
using Microsoft.ML.Data;
using Microsoft.ML.Trainers;
using Microsoft.ML.Trainers.Recommender;

namespace Microsoft.ML.AutoML
{
using ITrainerEsitmator = ITrainerEstimator<ISingleFeaturePredictionTransformer<object>, object>;

internal class MatrixFactorizationExtension : ITrainerExtension
{
public ITrainerEsitmator CreateInstance(MLContext mlContext, IEnumerable<SweepableParam> sweepParams, ColumnInformation columnInfo)
{
// TODO: MatrixFactorizationTrainer.Options should inherit from the abstract base class TrainerInputBaseWithGroupId.
var options = TrainerExtensionUtil.CreateOptions<MatrixFactorizationTrainer.Options>(sweepParams, columnInfo.LabelColumnName);
options.MatrixColumnIndexColumnName = (string)AutoCatalog.ValuePairs[nameof(options.MatrixColumnIndexColumnName)];
options.MatrixRowIndexColumnName = (string)AutoCatalog.ValuePairs[nameof(options.MatrixRowIndexColumnName)];
return mlContext.Recommendation().Trainers.MatrixFactorization(options);
}

public PipelineNode CreatePipelineNode(IEnumerable<SweepableParam> sweepParams, ColumnInformation columnInfo)
{
var property = new Dictionary<string, object>();
property.Add(nameof(MatrixFactorizationTrainer.Options.MatrixColumnIndexColumnName), AutoCatalog.ValuePairs[nameof(MatrixFactorizationTrainer.Options.MatrixColumnIndexColumnName)]);
property.Add(nameof(MatrixFactorizationTrainer.Options.MatrixRowIndexColumnName), AutoCatalog.ValuePairs[nameof(MatrixFactorizationTrainer.Options.MatrixRowIndexColumnName)]);
return TrainerExtensionUtil.BuildPipelineNode(TrainerExtensionCatalog.GetTrainerName(this), sweepParams, columnInfo.LabelColumnName, additionalProperties:property);
}

public IEnumerable<SweepableParam> GetHyperparamSweepRanges()
{
return SweepableParams.BuildMatrixFactorizationParmas();
}
}
}
11 changes: 11 additions & 0 deletions src/Microsoft.ML.AutoML/TrainerExtensions/SweepableParams.cs
@@ -2,8 +2,10 @@
// The .NET Foundation licenses this file to you under the MIT license.
// See the LICENSE file in the project root for more information.

using System.Collections;
using System.Collections.Generic;
using System.Linq;
using Microsoft.ML.Trainers;

namespace Microsoft.ML.AutoML
{
@@ -115,6 +117,15 @@ public static IEnumerable<SweepableParam> BuildLightGbmParams()
};
}

public static IEnumerable<SweepableParam> BuildMatrixFactorizationParmas()
{
return new SweepableParam[]
{
new SweepableDiscreteParam(nameof(MatrixFactorizationTrainer.Options.NumberOfIterations), new object[] { 10, 20 }),
new SweepableFloatParam(nameof(MatrixFactorizationTrainer.Options.LearningRate), 0.025f, 0.04f, numSteps:2),
new SweepableDiscreteParam(nameof(MatrixFactorizationTrainer.Options.ApproximationRank), new object[] { 10, 20 }),
Contributor:
Please copy sweep ranges & hyperparameters from:

/// <summary>
/// Regularization parameter.
/// </summary>
/// <remarks>
/// It's the weight of factor matrices Frobenius norms in the objective function minimized by matrix factorization's algorithm. A small value could cause over-fitting.
/// </remarks>
[Argument(ArgumentType.AtMostOnce, HelpText = "Regularization parameter. " +
"It's the weight of factor matrices Frobenius norms in the objective function minimized by matrix factorization's algorithm. " +
"A small value could cause over-fitting.")]
[TGUI(SuggestedSweeps = "0.01,0.05,0.1,0.5,1")]
[TlcModule.SweepableDiscreteParam("Lambda", new object[] { 0.01f, 0.05f, 0.1f, 0.5f, 1f })]
public double Lambda = Defaults.Lambda;
/// <summary>
/// Rank of approximation matrices.
/// </summary>
/// <remarks>
/// If input data has size of m-by-n we would build two approximation matrices m-by-k and k-by-n where k is approximation rank.
/// </remarks>
[Argument(ArgumentType.AtMostOnce, HelpText = "Latent space dimension (denoted by k). If the factorized matrix is m-by-n, " +
"two factor matrices found by matrix factorization are m-by-k and k-by-n, respectively. " +
"This value is also known as the rank of matrix factorization because k is generally much smaller than m and n.", ShortName = "K")]
[TGUI(SuggestedSweeps = "8,16,64,128")]
[TlcModule.SweepableDiscreteParam("K", new object[] { 8, 16, 64, 128 })]
public int ApproximationRank = Defaults.ApproximationRank;
/// <summary>
/// Number of training iterations.
/// </summary>
[Argument(ArgumentType.AtMostOnce, HelpText = "Training iterations; that is, the times that the training algorithm iterates through the whole training data once.", ShortName = "iter,numiterations")]
[TGUI(SuggestedSweeps = "10,20,40")]
[TlcModule.SweepableDiscreteParam("NumIterations", new object[] { 10, 20, 40 })]
public int NumberOfIterations = Defaults.NumIterations;
///<summary>
/// Initial learning rate. It specifies the speed of the training algorithm.
///</summary>
///<remarks>
/// Small value may increase the number of iterations needed to achieve a reasonable result.
/// Large value may lead to numerical difficulty such as a infinity value.
///</remarks>
[Argument(ArgumentType.AtMostOnce, HelpText = "Initial learning rate. It specifies the speed of the training algorithm. " +
"Small value may increase the number of iterations needed to achieve a reasonable result. Large value may lead to numerical difficulty such as a infinity value.", ShortName = "Eta")]
[TGUI(SuggestedSweeps = "0.001,0.01,0.1")]
[TlcModule.SweepableDiscreteParam("Eta", new object[] { 0.001f, 0.01f, 0.1f })]
public double LearningRate = Defaults.LearningRate;
/// <summary>
/// Importance of unobserved entries' loss in one-class matrix factorization. Applicable if <see cref="LossFunction"/> set to <see cref="LossFunctionType.SquareLossOneClass"/>
/// </summary>
/// <remarks>
/// Importance of unobserved (i.e., negative) entries' loss in one-class matrix factorization.
/// In general, only a few of matrix entries (e.g., less than 1%) in the training are observed (i.e., positive).
/// To balance the contributions from unobserved and observed in the overall loss function, this parameter is
/// usually a small value so that the solver is able to find a factorization equally good to unobserved and observed
/// entries. If only 10000 observed entries present in a 200000-by-300000 training matrix, one can try Alpha = 10000 / (200000*300000 - 10000).
/// When most entries in the training matrix are observed, one can use Alpha >> 1; for example, if only 10000 in previous
/// matrix is not observed, one can try Alpha = (200000 * 300000 - 10000) / 10000. Consequently,
/// Alpha = (# of observed entries) / (# of unobserved entries) can make observed and unobserved entries equally important
/// in the minimized loss function. However, the best setting in machine learning is always data-dependent so user still needs to
/// try multiple values.
/// </remarks>
[Argument(ArgumentType.AtMostOnce, HelpText = "Importance of unobserved entries' loss in one-class matrix factorization.")]
[TGUI(SuggestedSweeps = "1,0.01,0.0001,0.000001")]
[TlcModule.SweepableDiscreteParam("Alpha", new object[] { 1f, 0.01f, 0.0001f, 0.000001f })]
public double Alpha = Defaults.Alpha;
/// <summary>
/// Desired negative entries value in one-class matrix factorization. Applicable if <see cref="LossFunction"/> set to <see cref="LossFunctionType.SquareLossOneClass"/>
/// </summary>
/// <remarks>
/// In one-class matrix factorization, all matrix values observed are one (which can be viewed as positive cases in binary classification)
/// while unobserved values (which can be viewed as negative cases in binary classification) need to be specified manually using this option.
/// </remarks>
[Argument(ArgumentType.AtMostOnce, HelpText = "Desired negative entries' value in one-class matrix factorization")]
[TGUI(SuggestedSweeps = "0.000001,0,0001,0.01")]
[TlcModule.SweepableDiscreteParam("C", new object[] { 0.000001f, 0.0001f, 0.01f })]
public double C = Defaults.C;

Ideally, in the future, we should access them directly from the learner instead of having our copy in AutoML.

LittleLittleCloud (Contributor Author), Sep 26, 2019:
It seems that AutoML will try 5 * 4 * 3 * 3 * ... parameter combinations for MatrixFactorization. Will that take too much time?
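
For scale, the size of the full grid implied by the six `SweepableDiscreteParam` ranges quoted above can be counted directly. This is only a back-of-the-envelope sketch: `SweepGridSize` and `CountCombinations` are hypothetical names, and the per-parameter counts are copied by hand from the quoted attributes.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SweepGridSize
{
    // Multiplies the number of candidate values per hyperparameter
    // to get the size of the exhaustive grid.
    public static long CountCombinations(IEnumerable<int> valuesPerParam)
        => valuesPerParam.Aggregate(1L, (acc, n) => acc * n);

    public static void Main()
    {
        // Number of candidate values per parameter, taken from the quoted sweep ranges.
        var counts = new Dictionary<string, int>
        {
            ["Lambda"] = 5,
            ["ApproximationRank"] = 4,
            ["NumberOfIterations"] = 3,
            ["LearningRate"] = 3,
            ["Alpha"] = 4,
            ["C"] = 3,
        };

        Console.WriteLine(CountCombinations(counts.Values)); // 2160
    }
}
```

An exhaustive pass over a grid of that size would be expensive, which is why the question matters when the sweeper is not bounded by a time limit.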

justinormont (Contributor), Sep 26, 2019:
tldr; SMAC focuses our search efforts, and we are bounded by our total runtime.


This is expected (and should be greater). The Bayesian hyperparameter optimization (SMAC) focuses on the useful areas of the search space.

For selected trainers, we first do 20 iterations of random sweeping to warm up the search space. Then SMAC uses the results of those 20 iterations to predict which areas of the hyperparameter space are best to explore next. The choice of what to explore next is based on which areas are doing best and which areas are unexplored or uncertain.

Beyond SMAC we have additional hyperparameter optimization algos we should be using like KDO.

Contributor:
For your comment in the PR description:

figuring out how to do sweepParams for MatrixFactorization, the current one seems to have some bugs and it's never stop! (maybe SMACSweeper does not converge?)

Is AutoML not stopping at the specified timeout? If you set it to 60s of runtime, it should stop soon after this limit.

AutoML is designed to do round robin between three trainers culled from 8-11 depending on the task. Since this has only one trainer to choose from, perhaps the AutoML code needs updating?

Contributor Author:
Turns out I forgot to set the experiment time, so it defaulted to 16400... My bad.

};
}
public static IEnumerable<SweepableParam> BuildLinearSvmParams()
{
return new SweepableParam[] {