michaelgsharp
Contributor

Adds Mode as a replacement method for missing values. It replaces missing values with the most frequent value in the column. If multiple values share the highest count, the first one encountered is used.

This also moves a test helper method from OnnxConversionTest.cs into the BaseTestBaseline class so that every test class can use it.
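To make the tie-breaking rule above concrete, here is a minimal standalone sketch. It is not the code from this PR and does not use ML.NET. The class and method names (`ModeImputationSketch`, `FindMode`, `ReplaceMissingWithMode`) are hypothetical, and it assumes that missing values are encoded as `NaN` and that "first encountered" means the tied value that appears earliest in the column.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration only; not the ML.NET implementation.
public static class ModeImputationSketch
{
    // Returns the most frequent non-NaN value. On a tie, the value that
    // appears earliest in the column wins. Returns NaN if all values are missing.
    public static float FindMode(IReadOnlyList<float> column)
    {
        var counts = new Dictionary<float, int>();
        foreach (float v in column)
        {
            if (float.IsNaN(v)) continue;          // assumption: NaN marks a missing value
            counts.TryGetValue(v, out int c);      // c stays 0 when v has not been seen yet
            counts[v] = c + 1;
        }

        float mode = float.NaN;
        int best = 0;
        // Walk the column in its original order; the strict '>' keeps the
        // earliest value among those sharing the highest count.
        foreach (float v in column)
        {
            if (float.IsNaN(v)) continue;
            if (counts[v] > best)
            {
                best = counts[v];
                mode = v;
            }
        }
        return mode;
    }

    // Returns a copy of the column with every NaN replaced by the mode.
    public static float[] ReplaceMissingWithMode(IReadOnlyList<float> column)
    {
        float mode = FindMode(column);
        var result = new float[column.Count];
        for (int i = 0; i < column.Count; i++)
            result[i] = float.IsNaN(column[i]) ? mode : column[i];
        return result;
    }

    public static void Main()
    {
        // 3 and 7 both occur twice; 3 appears first, so it fills the gaps.
        float[] column = { 3f, float.NaN, 7f, 7f, 3f, float.NaN };
        Console.WriteLine(string.Join(", ", ReplaceMissingWithMode(column)));
    }
}
```

The second pass over the column keeps the tie-break deterministic without relying on dictionary enumeration order.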

@michaelgsharp michaelgsharp requested review from a team and harishsk June 3, 2020 22:27
@michaelgsharp michaelgsharp self-assigned this Jun 3, 2020

codecov bot commented Jun 4, 2020

Codecov Report

Merging #5205 into master will increase coverage by 0.49%.
The diff coverage is 96.12%.

@@            Coverage Diff             @@
##           master    #5205      +/-   ##
==========================================
+ Coverage   73.08%   73.57%   +0.49%     
==========================================
  Files        1004     1016      +12     
  Lines      187398   190214    +2816     
  Branches    20212    20456     +244     
==========================================
+ Hits       136952   139952    +3000     
+ Misses      44929    44687     -242     
- Partials     5517     5575      +58     
| Flag | Coverage Δ |
| --- | --- |
| #Debug | 73.57% <96.12%> (+0.49%) ⬆️ |
| #production | 69.37% <91.73%> (+0.49%) ⬆️ |
| #test | 87.53% <100.00%> (+0.30%) ⬆️ |

| Impacted Files | Coverage Δ |
| --- | --- |
| ...c/Microsoft.ML.Transforms/MissingValueReplacing.cs | 77.53% <ø> (+0.17%) ⬆️ |
| ...rosoft.ML.Transforms/MissingValueReplacingUtils.cs | 54.15% <91.73%> (+15.20%) ⬆️ |
| ...est/Microsoft.ML.TestFramework/BaseTestBaseline.cs | 77.23% <100.00%> (+4.53%) ⬆️ |
| test/Microsoft.ML.Tests/OnnxConversionTest.cs | 96.62% <100.00%> (-0.19%) ⬇️ |
| .../Microsoft.ML.Tests/Transformers/NAReplaceTests.cs | 100.00% <100.00%> (ø) |
| ....ML.AutoML/PipelineSuggesters/PipelineSuggester.cs | 83.19% <0.00%> (-3.37%) ⬇️ |
| src/Microsoft.ML.AutoML/Sweepers/Parameters.cs | 84.32% <0.00%> (-0.85%) ⬇️ |
| ...c/Microsoft.ML.SamplesUtils/SamplesDatasetUtils.cs | 40.00% <0.00%> (-0.68%) ⬇️ |
| ...soft.ML.Data/DataLoadSave/Text/TextLoaderCursor.cs | 89.29% <0.00%> (-0.16%) ⬇️ |
| ....ML.Tests/Transformers/CountTargetEncodingTests.cs | 100.00% <0.00%> (ø) |

... and 39 more

Contributor

@harishsk harishsk left a comment

:shipit:

Contributor

@harishsk harishsk left a comment

🕐

Contributor

harishsk commented Jun 5, 2020

            Append(mlContext.Transforms.NormalizeMinMax("Features")).

Can you please add a separate ONNX test for ReplaceMissingValues that covers all the supported replacement modes?


Refers to: test/Microsoft.ML.Tests/OnnxConversionTest.cs:581 in 701d9d8.

Contributor

@harishsk harishsk left a comment

:shipit:

/// <summary>
/// Replace with the most frequent value of the column.
/// </summary>
Mode = 5
Member

Why did we skip 4 here? It went from 0, 1, 2, 3 and then jumped to 5.

@ghost ghost locked as resolved and limited conversation to collaborators Mar 18, 2022

3 participants