Closed
Description
We need to add samples showing how to use the new transformers and estimators, then reference those samples from the XML documentation so that users on docs.microsoft.com can copy and paste a sample and get a head start.
Most of the tests that were added as part of the transformer work are a good starting point for creating a sample.
MLContext Catalogs
Catalog | Total APIs | Samples Owner | Samples Status / ETA |
---|---|---|---|
MLContext.Transforms (root) | 19 | Senja | Remaining: 4 overrides for the normalizer multicolumn examples |
MLContext.Transforms.Categorical | 2 | ZeeshanA | Done v1 |
MLContext.Transforms.Conversion | 6 | Senja | Done v1 |
MLContext.Transforms.FeatureSelection | 4 | ZeeshanA | Done v1 |
MLContext.Transforms.TimeSeries | 4 | Senja | Done v1 |
MLContext.Transforms.Text | 29 | ZeeshanA | Done v1 |
MLContext.Data | 10 | Senja | Done v1 |
MLContext.Model (root) | 4 | ZeeshanS | Done v1 |
P0+P1 Public API (extension methods) per Catalog
MLContext.Transforms (root) | Num Overloads | Documentation | Sample | API Owner |
---|---|---|---|---|
CopyColumns | 2 | Yes | 2 - Can remove dependency on DatasetUtils. | Zeeshan |
Concatenate | 1 | Yes, needs improvement. | 1 - Can remove dependency on DatasetUtils. | Zeeshan |
DropColumns | 1 | Yes | 1 - Can remove dependency on DatasetUtils. | Zeeshan |
SelectColumns | 2 | Yes, needs improvement. | 2 - Can remove dependency on DatasetUtils. | Zeeshan |
Normalize | 1 | Done. | 1 #3244 | Ivan |
CustomMapping | 1 | Yes, needs improvement. | Done-v1 #3275 | Artidoro |
IndicateMissingValues | 2 | | Done-v1 #3275 | Artidoro |
ReplaceMissingValues | 2 | | Done-v1 #3275 | Artidoro |
ConvertToGrayscale | 1 | Yes, needs fixes. Example not displaying. | 1 #3165 | Abhishek |
LoadImages | 1 | Yes, needs fixes. Example not displaying. | 1 #3165 | Abhishek |
ExtractPixels | 2 | Yes, needs fixes. Example not displaying. | 1 #3165 | Abhishek |
ResizeImages | 2 | Yes. Example not displaying. | 1 #3165 | Abhishek |
ConvertToImage | 2 | Yes. | 1 #3165 | Abhishek |
IidChangePointEstimator | 1 | | 1 - Done | Senja |
IidSpikeEstimator | 1 | | 1 - Done | Senja |
SsaChangePointEstimator | 1 | | 1 - Done | Senja |
SsaSpikeEstimator | 1 | | 1 - Done | Senja |
ApplyOnnxModel | 3 | DoneV1 | #3349 | Gani |
DnnFeaturizeImage | 1 | Yes, needs improvement. | 1 - Done | Senja |
NormalizeGlobalContrast | 1 | Done | 0 #3232 | Ivan |
NormalizeLpNorm | 1 | Done. | 0 #3232 | Ivan |
ApproximatedKernelMap | 1 | Done | 0 #3232 | Ivan |
mlContext.Transforms.CalculateFeatureContribution | 1 | Yes, needs improvement. | | Rogan |
MLContext.Transforms.Categorical | Num Overloads | Documentation | Sample | API Owner |
---|---|---|---|---|
OneHotEncoding | 2 | | 2 #3179 | Abhishek |
OneHotHashEncoding | 2 | | 2 #3179 | Abhishek |
MLContext.Transforms.Conversion | Num Overloads | Documentation | Sample | API Owner |
---|---|---|---|---|
Hash | 2 | can't find the API | Done | Senja |
ConvertType | 2 | Yes, needs improvement. | Done | Senja |
MapKeyToValue | 2 | Yes, needs improvement. | Done | Senja |
MapKeyToVector | 2 | Yes, needs improvement. | Done | Senja |
MapValueToKey | 2 | Yes. | Done | Senja |
MapKeyToBinaryVector | 2 | Yes, needs improvement. | Done | Senja |
MLContext.Transforms.FeatureSelection | Num Overloads | Documentation | Sample | API Owner |
---|---|---|---|---|
SelectFeaturesBasedOnMutualInformation | 2 | Needs a better example showing the mutual information (MI) computation, something like this | 2 #3184 | Abhishek |
SelectFeaturesBasedOnCount | 2 | | 2 #3184 | Abhishek |
MLContext.Transforms.Text | Num Overloads | Documentation | Sample | API Owner |
---|---|---|---|---|
FeaturizeText | 2 | | #3120 | Zeeshan |
TokenizeCharacters | 1 | | #3123 | Zeeshan |
NormalizeText | 1 | | #3133 | Zeeshan |
ExtractWordEmbeddings | 1 | | #3142 | Zeeshan |
TokenizeWords | 1 | | #3156 | Zeeshan |
ProduceNgrams | 3 | | #3177 | Zeeshan |
RemoveDefaultStopWords | 2 | | #3156 | Zeeshan |
RemoveStopWords | 2 | | #3156 | Zeeshan |
ProduceWordBags | 3 | | #3183 | Zeeshan |
ProduceHashedWordBags | 3 | | #3183 | Zeeshan |
ProduceHashedNgrams | 3 | | #3177 | Zeeshan |
LatentDirichletAllocation | 2 | | #3191 | Zeeshan |
For the Data catalog, the documentation for every API needs to be augmented with guidance on when to use that API.
MLContext.Data | Num Overloads | Documentation | Sample | API Owner |
---|---|---|---|---|
LoadFromEnumerable | 1 | Done. | 1 - Done. | Senja |
CreateEnumerable | 2 | Done. The second overload of this API is a P4 scenario. Its use case: a user has a model that preserves slot names for the features; when they load the model, they also get the schema from the loaded model and pass that schema, together with the TRow type they want to load the data into, to this API. The API then populates the Annotations (formerly metadata) for the feature column. | 1 | Senja |
BootstrapSample | 1 | Done. | 1 - Done. | Senja |
Cache | 1 | Done. | 1 - Done. | Senja |
FilterRowsByColumn | 1 | Done. | 1 - Done. | Senja |
FilterRowsByKeyColumnFraction | 1 | Done. | 1 - Done. | Senja |
FilterRowsByMissingValues | 1 | Done. | 1 - Done. | Senja |
ShuffleRows | 1 | Done. | 1 - Done. | Senja |
SkipRows | 1 | Done. | 1 - Done. | Senja |
TakeRows | 1 | Done. | 1 - Done. | Senja |
Other | Num Overloads | Documentation | Sample | API Owner |
---|---|---|---|---|
Permutation Feature Importance | 4 | Yes, but needs work | Yes, but needs work | Rogan |