Text loader vs. in-memory data structure in API reference samples #2726

@wschin

Description

We often start our trainer examples with a text loader, but recently I feel that loading text into an IDataView is not directly related to the actual training procedure. If we use

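using Microsoft.ML.Data;

/// <summary>
/// Example with one binary label and 10 feature values.
/// </summary>
public class BinaryLabelFloatFeatureVectorSample
{
    // Length of the Features vector; matches the 10 values in the summary.
    private const int _simpleBinaryClassSampleFeatureLength = 10;

    public bool Label;

    [VectorType(_simpleBinaryClassSampleFeatureLength)]
    public float[] Features;
}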

as our in-memory example, we can create more flexible examples like scikit-learn's (where the data matrix is a float matrix) and make ML.NET's learning curve smoother, because users don't need to learn the text loader, the loaded data, and the trainer at the same time.
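As a rough illustration of the scikit-learn-style data described above, here is a minimal sketch in plain C# that builds each sample from one row of a float data matrix plus a bool label. The `InMemorySamples.Generate` helper and the +1 shift applied to positive rows are hypothetical choices for illustration, and the `[VectorType]` attribute is left out so the snippet compiles without ML.NET.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Plain C# version of the sample class; [VectorType] is omitted so this
// compiles without referencing ML.NET.
public class BinaryLabelFloatFeatureVectorSample
{
    public bool Label;
    public float[] Features;
}

public static class InMemorySamples
{
    // Hypothetical helper: each sample is one row of a float data matrix plus
    // a bool label, in the spirit of scikit-learn's (X, y) arrays. Features of
    // positive rows are shifted by +1 so the two classes are easy to separate.
    public static List<BinaryLabelFloatFeatureVectorSample> Generate(
        int count, int featureLength = 10, int seed = 0)
    {
        var random = new Random(seed); // fixed seed => reproducible data
        var samples = new List<BinaryLabelFloatFeatureVectorSample>(count);
        for (int i = 0; i < count; i++)
        {
            bool label = random.NextDouble() > 0.5;
            float[] row = Enumerable.Range(0, featureLength)
                .Select(_ => (float)random.NextDouble() + (label ? 1f : 0f))
                .ToArray();
            samples.Add(new BinaryLabelFloatFeatureVectorSample { Label = label, Features = row });
        }
        return samples;
    }
}
```

Changing `featureLength` or the shift produces a different data set without touching a text file, which is the kind of flexibility an in-memory sample buys.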

cc @shmoradims, @rogancarr, @sfilipi, @shauheen

#2780 shows a scikit-learn-style example for ML.NET. It is:

  • Self-contained --- To understand it, the user doesn't need to look at another document or use Visual Studio to look up the functions it calls. We can't rely on Visual Studio because not everyone uses it (1st-party and 3rd-party experiences should be the same!).
  • End-to-end for C# developers --- It trains a model over a C# List and gets the predictions back as a C# List. Neither end is an IDataView, so the user doesn't need to learn IDataView to play with the API.
  • Independent of external packages --- We shouldn't expect a user who is reading the documentation to know SamplesUtils.
  • Production-friendly --- It includes making predictions with a C# data structure, which is how a trained model is used in production.
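A minimal sketch of the List-in, List-out pattern from the bullets above, assuming the Microsoft.ML NuGet package with its 1.x API (`MLContext`, `Data.LoadFromEnumerable`, `BinaryClassification.Trainers.SdcaLogisticRegression`, `Data.CreateEnumerable`). The `DataPoint` and `Prediction` classes, the synthetic data, and the choice of SDCA logistic regression are illustrative assumptions, not the code from #2780.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;

// Hypothetical input row: a bool label and a fixed-length float vector.
public class DataPoint
{
    public bool Label { get; set; }

    [VectorType(10)]
    public float[] Features { get; set; }
}

// Hypothetical output row: ML.NET fills these from the scored IDataView's columns.
public class Prediction
{
    public bool Label { get; set; }
    public bool PredictedLabel { get; set; }
}

public static class EndToEndSample
{
    public static List<Prediction> TrainAndPredict(int count = 500, int seed = 0)
    {
        var mlContext = new MLContext(seed: seed);

        // C# List in: synthetic, linearly separable data, no text file involved.
        var random = new Random(seed);
        List<DataPoint> points = Enumerable.Range(0, count).Select(_ =>
        {
            bool label = random.NextDouble() > 0.5;
            return new DataPoint
            {
                Label = label,
                Features = Enumerable.Range(0, 10)
                    .Select(__ => (float)random.NextDouble() + (label ? 1f : 0f))
                    .ToArray()
            };
        }).ToList();

        // The IDataView is created for the user; it never has to be built by hand.
        IDataView trainingData = mlContext.Data.LoadFromEnumerable(points);

        // Default column names "Label" and "Features" match the DataPoint properties.
        var trainer = mlContext.BinaryClassification.Trainers.SdcaLogisticRegression();
        ITransformer model = trainer.Fit(trainingData);

        // C# List out: score the rows and read the results back as plain objects.
        IDataView scored = model.Transform(trainingData);
        return mlContext.Data.CreateEnumerable<Prediction>(scored, reuseRowObject: false).ToList();
    }
}
```

For scoring one object at a time, as a production service usually does, `mlContext.Model.CreatePredictionEngine<DataPoint, Prediction>(model)` returns an engine whose `Predict` method takes and returns these same C# classes.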

Metadata

Assignees: No one assigned
Labels: documentation (Related to documentation of ML.NET)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
