Skip to content

Creation of components through MLContext: advanced options and other feedback #1798

Closed
@TomFinley

Description

@TomFinley

In our estimators and other similar components often have advanced settings, because, sometimes people have unusual circumstances. At the same time, there is a 95% or 99% scenario for "simple" usage that most people will be happy with. For this reason we have often made a distinction between common and advanced settings, as we see here.

public static FastTreeRegressionTrainer FastTree(this RegressionContext.RegressionTrainers ctx,
string labelColumn = DefaultColumnNames.Label,
string featureColumn = DefaultColumnNames.Features,
string weights = null,
int numLeaves = Defaults.NumLeaves,
int numTrees = Defaults.NumTrees,
int minDatapointsInLeaves = Defaults.MinDocumentsInLeaves,
double learningRate = Defaults.LearningRates,
Action<FastTreeRegressionTrainer.Arguments> advancedSettings = null)
{

There are some possible things that excited feedback:

  1. Echoing feedback seen in Rename mlContext.Data.TextReader() to mlContext.Data.CreateTextLoader() #1690, these things where we're making something should have the prefix Create, even in situations where this a catalog where we are always creating. Note: Create preferred to Make.

  2. The worth of ASP.NET style configuration was questioned (seen above as Action<FastTreeRegressionTrainer.Arguments> advancedSettings), e.g., there may not be much purpose in having a delegate. The older style where it just takes the Arguments period was preferred.

  3. Having this Arguments object as a nested class the component being created was viewed as positive, but this would be more idiomatically called Options -- Arguments was a holdover name from when these were exclusively for command line arguments, but for the API this is not a great name. So while keeping the general structure of how they are placed currently, they should probably be renamed to Options.

  4. It is good to have the convenience for the simple arguments, however, if we have both simple and advanced settings, we should not mix them but have instead two distinct constructors/extension methods. (E.g., in the above, we would have two methods, one that took the advanced options.) To do otherwise is to invite confusion about which "wins" if we have the setting set in both.

    • Note that phase setting "set in both," which suggests that these settings object should retain the "simpler" settings in them. This reinforces feedback elsewhere as seen here.
  5. If the simple arguments are totally sufficient, then there is no need to expose this Arguments class in hte public API. (For practical reasons relating to command line and entry-point usage, we still need to always have these Arguments objects, but if they serve no purpose for the API the class can simply be made internal.)

/cc @KrzysztofCwalina, @terrajobst , on whose feedback this list is primarily based, and who can correct me and provide clarification in case I misspoke.

Metadata

Metadata

Labels

APIIssues pertaining the friendly API

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions