Skip to content

Conversation

@kere-nel
Copy link
Contributor

@kere-nel kere-nel commented Apr 4, 2020

Issue: The GatherElements ONNX operator used by SlotsDroppingTransformer (transformer for CountFeatureSelectingEstimator) has issues when there's nothing selected, so mlnet needs to handle this case separately.

This PR

  • Fixes the issue mentioned above
  • Modifies the estimator's test to be more robust

Other issues:

  • There's no support for string initializing in ONNX, which prevents us from handling the case where no string columns get selected.
  • The documentation for CountFeatureSelectingEstimator is ambiguous. It states that it supports input as "vector or scalar of numeric, text or key data types". However, only strings, doubles, and singles are supported.

@kere-nel kere-nel requested a review from a team as a code owner April 4, 2020 00:04
string opType;
if (_srcTypes[iinfo] is VectorDataViewType)
var slots = _slotDropper[iinfo].GetPreservedSlots();
if (_srcTypes[iinfo] is VectorDataViewType && slots.Count() > 0)
Copy link
Contributor

@frank-dong-ms-zz frank-dong-ms-zz Apr 6, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_srcTypes[iinfo] is VectorDataViewType && slots.Count() > 0 [](start = 20, length = 59)

what if _srcTypes[iinfo] is VectorDataViewType && slots.Count() == 0? should we return true or false?
in current implementation seems it will goes into the "else" condition in line 917 and which should handling double numeric data type #Resolved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If slots.Count == 0, then the column get's "suppressed". It's not handled well by the GatherElements operator, so we create an empty vector in the else clause. Does that make sense?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, but we better to make the logic clear like below, don't let different cases fall into same condition which will make code harder to read and will cause potential issue in future:

if (_srcTypes[iinfo] is VectorDataViewType)
{
if (count > 0) { ... }
else { ... }
}
else { ... }


In reply to: 404404988 [](ancestors = 404404988)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed.

Copy link
Contributor

@frank-dong-ms-zz frank-dong-ms-zz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@kere-nel kere-nel merged commit 8660ecc into dotnet:master Apr 10, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Mar 18, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants