Description
Hi,
Although CategoricalColumnNames returns the correct number of categorical columns with their correct names, NumericColumnNames returns the correct count and column name only when the dataset has a single numeric column. If the dataset has more than one numeric column, it always returns a count of 1, and the only column name it returns is "Features", for some reason!
For example, consider the following dataset:
x1, x2, x3, x4
1, T, 3, A
2, T, 4, A
3, L, 4, A
4, L, 4, B
CategoricalColumnNames returns 2 categorical columns, named x2 and x4. However, NumericColumnNames returns a count of 1 instead of 2, and a single column name, "Features", instead of x1 and x3.
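To make the expected result concrete, here is a minimal sketch that splits the example dataset into numeric and categorical columns. It assumes a naive rule of our own (a column is numeric if every value parses as a number, otherwise categorical); this is not ML.NET's column inference, and the `ColumnSketch` class and `ClassifyColumns` helper are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

public static class ColumnSketch
{
    // Hypothetical helper (not part of ML.NET): classifies each header name as
    // numeric or categorical using a naive "every value parses as a double" rule.
    public static (List<string> Numeric, List<string> Categorical) ClassifyColumns(string[] lines)
    {
        string[] header = lines[0].Split(',').Select(h => h.Trim()).ToArray();
        string[][] rows = lines.Skip(1)
            .Select(l => l.Split(',').Select(v => v.Trim()).ToArray())
            .ToArray();

        var numeric = new List<string>();
        var categorical = new List<string>();
        for (int i = 0; i < header.Length; i++)
        {
            bool allNumeric = rows.All(r => double.TryParse(
                r[i], NumberStyles.Float, CultureInfo.InvariantCulture, out _));
            (allNumeric ? numeric : categorical).Add(header[i]);
        }
        return (numeric, categorical);
    }

    public static void Main()
    {
        // The example dataset from the description above.
        string[] data =
        {
            "x1, x2, x3, x4",
            "1, T, 3, A",
            "2, T, 4, A",
            "3, L, 4, A",
            "4, L, 4, B",
        };
        var (numeric, categorical) = ClassifyColumns(data);
        // One entry per numeric column, rather than a single "Features" entry.
        Console.WriteLine($"Numeric ({numeric.Count}): {string.Join(", ", numeric)}");
        Console.WriteLine($"Categorical ({categorical.Count}): {string.Join(", ", categorical)}");
    }
}
```

Under this rule the numeric list is x1, x3 and the categorical list is x2, x4, which is the result the report expects from NumericColumnNames and CategoricalColumnNames.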
This is how I call them:
```csharp
using System.Collections.Generic;
using Microsoft.ML;
using Microsoft.ML.AutoML;
var mlContext = new MLContext();
ColumnInferenceResults columnInference = mlContext.Auto().InferColumns(TrainingDataPath, labelColumnIndex: 4, hasHeader: true);
ColumnInformation columnInformation = columnInference.ColumnInformation;
ICollection<string> catCols = columnInformation.CategoricalColumnNames;
ICollection<string> numCols = columnInformation.NumericColumnNames;
```
Please help. Thanks.
Document Details
⚠ Do not edit this section. It is required for docs.microsoft.com ➟ GitHub issue linking.
- ID: b000cf86-79fe-a677-3b39-f834e1c4b959
- Version Independent ID: af43d324-d6c1-c104-c16b-81580a638de2
- Content: ColumnInformation.NumericColumnNames Property (Microsoft.ML.AutoML)
- Content Source: dotnet/xml/Microsoft.ML.AutoML/ColumnInformation.xml
- Product: dotnet-ml-api
- GitHub Login: @natke
- Microsoft Alias: nakersha