Removing featurizers now in ML.Net #5209

Merged on Jun 10, 2020 (2 commits)
881 changes: 0 additions & 881 deletions src/Microsoft.ML.Featurizers/CategoricalImputer.cs

This file was deleted.

1,673 changes: 0 additions & 1,673 deletions src/Microsoft.ML.Featurizers/RobustScaler.cs

This file was deleted.

1,579 changes: 0 additions & 1,579 deletions src/Microsoft.ML.Featurizers/ToStringTransformer.cs

This file was deleted.

3 changes: 0 additions & 3 deletions test/BaselineOutput/Common/EntryPoints/core_ep-list.tsv
@@ -77,7 +77,6 @@ Transforms.BinaryPredictionScoreColumnsRenamer For binary prediction, it renames
Transforms.BinNormalizer The values are assigned into equidensity bins and a value is mapped to its bin_number/number_of_bins. Microsoft.ML.Data.Normalize Bin Microsoft.ML.Transforms.NormalizeTransform+BinArguments Microsoft.ML.EntryPoints.CommonOutputs+TransformOutput
Transforms.CategoricalHashOneHotVectorizer Converts the categorical value into an indicator array by hashing the value and using the hash as an index in the bag. If the input column is a vector, a single indicator bag is returned for it. Microsoft.ML.Transforms.Categorical CatTransformHash Microsoft.ML.Transforms.OneHotHashEncodingTransformer+Options Microsoft.ML.EntryPoints.CommonOutputs+TransformOutput
Transforms.CategoricalOneHotVectorizer Converts the categorical value into an indicator array by building a dictionary of categories based on the data and using the id in the dictionary as the index in the array. Microsoft.ML.Transforms.Categorical CatTransformDict Microsoft.ML.Transforms.OneHotEncodingTransformer+Options Microsoft.ML.EntryPoints.CommonOutputs+TransformOutput
Transforms.CategoryImputer Fills in missing values in a column based on the most frequent value Microsoft.ML.Featurizers.CategoryImputerEntrypoint ImputeToKey Microsoft.ML.Featurizers.CategoricalImputerEstimator+Options Microsoft.ML.EntryPoints.CommonOutputs+TransformOutput
Transforms.CharacterTokenizer Character-oriented tokenizer where text is considered a sequence of characters. Microsoft.ML.Transforms.Text.TextAnalytics CharTokenize Microsoft.ML.Transforms.Text.TokenizingByCharactersTransformer+Options Microsoft.ML.EntryPoints.CommonOutputs+TransformOutput
Transforms.ColumnConcatenator Concatenates one or more columns of the same item type. Microsoft.ML.EntryPoints.SchemaManipulation ConcatColumns Microsoft.ML.Data.ColumnConcatenatingTransformer+Options Microsoft.ML.EntryPoints.CommonOutputs+TransformOutput
Transforms.ColumnCopier Duplicates columns from the dataset Microsoft.ML.EntryPoints.SchemaManipulation CopyColumns Microsoft.ML.Transforms.ColumnCopyingTransformer+Options Microsoft.ML.EntryPoints.CommonOutputs+TransformOutput
@@ -124,7 +123,6 @@ Transforms.PcaCalculator PCA is a dimensionality-reduction transform which compu
Transforms.PermutationFeatureImportance Permutation Feature Importance (PFI) Microsoft.ML.Transforms.PermutationFeatureImportanceEntryPoints PermutationFeatureImportance Microsoft.ML.Transforms.PermutationFeatureImportanceArguments Microsoft.ML.Transforms.PermutationFeatureImportanceOutput
Transforms.PredictedLabelColumnOriginalValueConverter Transforms a predicted label column to its original values, unless it is of type bool. Microsoft.ML.EntryPoints.FeatureCombiner ConvertPredictedLabel Microsoft.ML.EntryPoints.FeatureCombiner+PredictedLabelInput Microsoft.ML.EntryPoints.CommonOutputs+TransformOutput
Transforms.RandomNumberGenerator Adds a column with a generated number sequence. Microsoft.ML.Transforms.RandomNumberGenerator Generate Microsoft.ML.Transforms.GenerateNumberTransform+Options Microsoft.ML.EntryPoints.CommonOutputs+TransformOutput
Transforms.RobustScaler Removes the median and scales the data according to the quantile range. Microsoft.ML.Featurizers.RobustScalerEntrypoint RobustScaler Microsoft.ML.Featurizers.RobustScalerEstimator+Options Microsoft.ML.EntryPoints.CommonOutputs+TransformOutput
Transforms.RowRangeFilter Filters a dataview on a column of type Single, Double or Key (contiguous). Keeps the values that are in the specified min/max range. NaNs are always filtered out. If the input is a Key type, the min/max are considered percentages of the number of values. Microsoft.ML.EntryPoints.SelectRows FilterByRange Microsoft.ML.Transforms.RangeFilter+Options Microsoft.ML.EntryPoints.CommonOutputs+TransformOutput
Transforms.RowSkipAndTakeFilter Allows limiting input to a subset of rows at an optional offset. Can be used to implement data paging. Microsoft.ML.EntryPoints.SelectRows SkipAndTakeFilter Microsoft.ML.Transforms.SkipTakeFilter+Options Microsoft.ML.EntryPoints.CommonOutputs+TransformOutput
Transforms.RowSkipFilter Allows limiting input to a subset of rows by skipping a number of rows. Microsoft.ML.EntryPoints.SelectRows SkipFilter Microsoft.ML.Transforms.SkipTakeFilter+SkipOptions Microsoft.ML.EntryPoints.CommonOutputs+TransformOutput
@@ -137,7 +135,6 @@ Transforms.TensorFlowScorer Transforms the data using the TensorFlow model. Micr
Transforms.TextFeaturizer A transform that turns a collection of text documents into numerical feature vectors. The feature vectors are normalized counts of (word and/or character) n-grams in a given tokenized text. Microsoft.ML.Transforms.Text.TextAnalytics TextTransform Microsoft.ML.Transforms.Text.TextFeaturizingEstimator+Options Microsoft.ML.EntryPoints.CommonOutputs+TransformOutput
Transforms.TextToKeyConverter Converts input values (words, numbers, etc.) to index in a dictionary. Microsoft.ML.Transforms.Categorical TextToKey Microsoft.ML.Transforms.ValueToKeyMappingTransformer+Options Microsoft.ML.EntryPoints.CommonOutputs+TransformOutput
Transforms.TimeSeriesImputer Fills in missing row and values Microsoft.ML.Featurizers.TimeSeriesTransformerEntrypoint TimeSeriesImputer Microsoft.ML.Featurizers.TimeSeriesImputerEstimator+Options Microsoft.ML.EntryPoints.CommonOutputs+TransformOutput
Transforms.ToString Turns the given column into a column of its string representation Microsoft.ML.Featurizers.ToStringTransformerEntrypoint ToString Microsoft.ML.Featurizers.ToStringTransformerEstimator+Options Microsoft.ML.EntryPoints.CommonOutputs+TransformOutput
Transforms.TrainTestDatasetSplitter Split the dataset into train and test sets Microsoft.ML.EntryPoints.TrainTestSplit Split Microsoft.ML.EntryPoints.TrainTestSplit+Input Microsoft.ML.EntryPoints.TrainTestSplit+Output
Transforms.TreeLeafFeaturizer Trains a tree ensemble, or loads it from a file, then maps a numeric feature vector to three outputs: 1. A vector containing the individual tree outputs of the tree ensemble. 2. A vector indicating the leaves that the feature vector falls on in the tree ensemble. 3. A vector indicating the paths that the feature vector falls on in the tree ensemble. If a both a model file and a trainer are specified - will use the model file. If neither are specified, will train a default FastTree model. This can handle key labels by training a regression model towards their optionally permuted indices. Microsoft.ML.Data.TreeFeaturize Featurizer Microsoft.ML.Data.TreeEnsembleFeaturizerTransform+ArgumentsForEntryPoint Microsoft.ML.EntryPoints.CommonOutputs+TransformOutput
Transforms.TwoHeterogeneousModelCombiner Combines a TransformModel and a PredictorModel into a single PredictorModel. Microsoft.ML.EntryPoints.ModelOperations CombineTwoModels Microsoft.ML.EntryPoints.ModelOperations+SimplePredictorModelInput Microsoft.ML.EntryPoints.ModelOperations+PredictorModelOutput
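For reference, the removed `Transforms.RobustScaler` entry point was described as removing the median and scaling the data according to a quantile range; its deleted manifest entry lists the defaults `Center=true`, `Scale=true`, `QuantileMin=25` and `QuantileMax=75`. The Python sketch below illustrates that computation under two assumptions not confirmed by this diff: percentiles use linear interpolation between sorted values (NumPy's default rule), and a zero-width quantile range falls back to a divisor of 1. The deleted C# featurizer may differ on both points, and `percentile` / `robust_scale` are hypothetical helper names, not ML.NET APIs.

```python
from typing import List, Sequence


def percentile(sorted_vals: Sequence[float], q: float) -> float:
    """Linear-interpolation percentile of an already sorted sequence, q in [0, 100].

    Assumption: same interpolation rule as NumPy's default ("linear").
    """
    if not sorted_vals:
        raise ValueError("percentile of empty sequence")
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo = int(pos)  # floor, since pos is non-negative
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * frac


def robust_scale(
    values: Sequence[float],
    center: bool = True,
    scale: bool = True,
    quantile_min: float = 25.0,
    quantile_max: float = 75.0,
) -> List[float]:
    """Subtract the median, then divide by the quantile range.

    Parameter names mirror the deleted manifest fields (Center, Scale,
    QuantileMin, QuantileMax); the exact semantics are assumptions.
    """
    s = sorted(values)
    median = percentile(s, 50.0)
    spread = percentile(s, quantile_max) - percentile(s, quantile_min)
    offset = median if center else 0.0
    # Assumed fallback: a constant column (zero spread) is not rescaled.
    divisor = spread if (scale and spread != 0) else 1.0
    return [(v - offset) / divisor for v in values]


if __name__ == "__main__":
    # Median is 3, the 25th/75th percentiles are 2 and 4, so the spread is 2.
    print(robust_scale([1, 2, 3, 4, 5]))  # [-1.0, -0.5, 0.0, 0.5, 1.0]
```

Using the interquartile range instead of the standard deviation is what makes this kind of scaling less sensitive to outliers: a single extreme value moves the mean and variance a lot but barely shifts the 25th and 75th percentiles.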
276 changes: 0 additions & 276 deletions test/BaselineOutput/Common/EntryPoints/core_manifest.json
@@ -17707,82 +17707,6 @@
"ITransformOutput"
]
},
{
"Name": "Transforms.CategoryImputer",
"Desc": "Fills in missing values in a column based on the most frequent value",
"FriendlyName": "CategoryImputer",
"ShortName": "CategoryImputer",
"Inputs": [
{
"Name": "Column",
"Type": {
"Kind": "Array",
"ItemType": {
"Kind": "Struct",
"Fields": [
{
"Name": "Name",
"Type": "String",
"Desc": "Name of the new column",
"Aliases": [
"name"
],
"Required": false,
"SortOrder": 150.0,
"IsNullable": false,
"Default": null
},
{
"Name": "Source",
"Type": "String",
"Desc": "Name of the source column",
"Aliases": [
"src"
],
"Required": false,
"SortOrder": 150.0,
"IsNullable": false,
"Default": null
}
]
}
},
"Desc": "New column definition (optional form: name:src)",
"Aliases": [
"col"
],
"Required": true,
"SortOrder": 1.0,
"IsNullable": false
},
{
"Name": "Data",
"Type": "DataView",
"Desc": "Input dataset",
"Required": true,
"SortOrder": 1.0,
"IsNullable": false
}
],
"Outputs": [
{
"Name": "OutputData",
"Type": "DataView",
"Desc": "Transformed dataset"
},
{
"Name": "Model",
"Type": "TransformModel",
"Desc": "Transform model"
}
],
"InputKind": [
"ITransformInput"
],
"OutputKind": [
"ITransformOutput"
]
},
{
"Name": "Transforms.CharacterTokenizer",
"Desc": "Character-oriented tokenizer where text is considered a sequence of characters.",
@@ -22997,130 +22921,6 @@
"ITransformOutput"
]
},
{
"Name": "Transforms.RobustScaler",
"Desc": "Removes the median and scales the data according to the quantile range.",
"FriendlyName": "RobustScalerTransformer",
"ShortName": "RobScalT",
"Inputs": [
{
"Name": "Column",
"Type": {
"Kind": "Array",
"ItemType": {
"Kind": "Struct",
"Fields": [
{
"Name": "Name",
"Type": "String",
"Desc": "Name of the new column",
"Aliases": [
"name"
],
"Required": false,
"SortOrder": 150.0,
"IsNullable": false,
"Default": null
},
{
"Name": "Source",
"Type": "String",
"Desc": "Name of the source column",
"Aliases": [
"src"
],
"Required": false,
"SortOrder": 150.0,
"IsNullable": false,
"Default": null
}
]
}
},
"Desc": "New column definition (optional form: name:src)",
"Aliases": [
"col"
],
"Required": true,
"SortOrder": 1.0,
"IsNullable": false
},
{
"Name": "Data",
"Type": "DataView",
"Desc": "Input dataset",
"Required": true,
"SortOrder": 1.0,
"IsNullable": false
},
{
"Name": "Center",
"Type": "Bool",
"Desc": "If True, center the data before scaling.",
"Aliases": [
"ctr"
],
"Required": false,
"SortOrder": 2.0,
"IsNullable": false,
"Default": true
},
{
"Name": "Scale",
"Type": "Bool",
"Desc": "If True, scale the data to interquartile range.",
"Aliases": [
"sc"
],
"Required": false,
"SortOrder": 3.0,
"IsNullable": false,
"Default": true
},
{
"Name": "QuantileMin",
"Type": "Float",
"Desc": "Min for the quantile range used to calculate scale.",
"Aliases": [
"min"
],
"Required": false,
"SortOrder": 4.0,
"IsNullable": false,
"Default": 25.0
},
{
"Name": "QuantileMax",
"Type": "Float",
"Desc": "Max for the quantile range used to calculate scale.",
"Aliases": [
"max"
],
"Required": false,
"SortOrder": 5.0,
"IsNullable": false,
"Default": 75.0
}
],
"Outputs": [
{
"Name": "OutputData",
"Type": "DataView",
"Desc": "Transformed dataset"
},
{
"Name": "Model",
"Type": "TransformModel",
"Desc": "Transform model"
}
],
"InputKind": [
"ITransformInput"
],
"OutputKind": [
"ITransformOutput"
]
},
{
"Name": "Transforms.RowRangeFilter",
"Desc": "Filters a dataview on a column of type Single, Double or Key (contiguous). Keeps the values that are in the specified min/max range. NaNs are always filtered out. If the input is a Key type, the min/max are considered percentages of the number of values.",
@@ -24255,82 +24055,6 @@
"ITransformOutput"
]
},
{
"Name": "Transforms.ToString",
"Desc": "Turns the given column into a column of its string representation",
"FriendlyName": "ToString Transform",
"ShortName": "tostr",
"Inputs": [
{
"Name": "Column",
"Type": {
"Kind": "Array",
"ItemType": {
"Kind": "Struct",
"Fields": [
{
"Name": "Name",
"Type": "String",
"Desc": "Name of the new column",
"Aliases": [
"name"
],
"Required": false,
"SortOrder": 150.0,
"IsNullable": false,
"Default": null
},
{
"Name": "Source",
"Type": "String",
"Desc": "Name of the source column",
"Aliases": [
"src"
],
"Required": false,
"SortOrder": 150.0,
"IsNullable": false,
"Default": null
}
]
}
},
"Desc": "New column definition (optional form: name:src)",
"Aliases": [
"col"
],
"Required": true,
"SortOrder": 1.0,
"IsNullable": false
},
{
"Name": "Data",
"Type": "DataView",
"Desc": "Input dataset",
"Required": true,
"SortOrder": 1.0,
"IsNullable": false
}
],
"Outputs": [
{
"Name": "OutputData",
"Type": "DataView",
"Desc": "Transformed dataset"
},
{
"Name": "Model",
"Type": "TransformModel",
"Desc": "Transform model"
}
],
"InputKind": [
"ITransformInput"
],
"OutputKind": [
"ITransformOutput"
]
},
{
"Name": "Transforms.TrainTestDatasetSplitter",
"Desc": "Split the dataset into train and test sets",
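The removed `Transforms.CategoryImputer` entry point was described as filling in missing values in a column based on the most frequent value. A minimal Python sketch of that rule follows. It assumes missing entries are represented as `None` and that ties are broken in favor of the value encountered first (which is how `collections.Counter.most_common` orders equal counts); neither detail is confirmed by this diff, and `impute_most_frequent` is a hypothetical name rather than an ML.NET API.

```python
from collections import Counter
from typing import Hashable, List, Optional, TypeVar

T = TypeVar("T", bound=Hashable)


def impute_most_frequent(values: List[Optional[T]]) -> List[T]:
    """Replace missing entries (assumed to be None) with the column's mode."""
    # Count only the observed, non-missing values.
    counts = Counter(v for v in values if v is not None)
    if not counts:
        raise ValueError("cannot impute: column has no non-missing values")
    # most_common(1) returns [(value, count)]; equal counts keep first-seen order.
    mode, _ = counts.most_common(1)[0]
    return [mode if v is None else v for v in values]


if __name__ == "__main__":
    print(impute_most_frequent(["red", None, "blue", "red", None]))
    # ['red', 'red', 'blue', 'red', 'red']
```

The mode is computed once over the training column and then reused for every missing entry, which mirrors the fit-then-transform split of an estimator and its transformer.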