[SPARK-12217] [ML] Document invalid handling for StringIndexer #10257
Test build #47545 has finished for PR 10257 at commit
Pinging @holdenk and @jkbradley
That looks good to me, I don't think a full code example is necessary.
@@ -459,6 +459,42 @@ column, we should get the following:
"a" gets index `0` because it is the most frequent, followed by "c" with index `1` and "b" with
index `2`.

Additionally, there are two strategies regarding how `StringIndexer` will handle
unseen labels when you have set up a `StringIndexer` on a dataset which you want
"set up" --> "fit"
"on a dataset which you want to reuse on another" --> "on one dataset and then use it to transform another dataset"
@BenFradet Thanks! I agree you didn't have to write a full example, but it's nice that it explains it very clearly, so I'd keep it. I just had small phrasing comments.
@jkbradley thanks for the comments.
LGTM pending tests
Test build #47600 has finished for PR 10257 at commit
Test build #2209 has finished for PR 10257 at commit
Merging with master and branch-1.6
Added a paragraph regarding StringIndexer#setHandleInvalid to the ml-features documentation.

I wonder if I should also add a snippet to the code example, input welcome.

Author: BenFradet <benjamin.fradet@gmail.com>

Closes #10257 from BenFradet/SPARK-12217.

(cherry picked from commit aea676c)
Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
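
For readers following the thread without the full diff, here is a rough sketch of the behaviour the new paragraph documents. It is a plain-Python model, not Spark code: `fit_string_indexer` and `transform` are hypothetical helpers written for illustration, indices are plain integers rather than Spark's doubles, and the alphabetical tie-break is an assumption of this sketch. The two `handle_invalid` values mirror the "error" (default) and "skip" strategies of `StringIndexer#setHandleInvalid`.

```python
from collections import Counter


def fit_string_indexer(labels):
    """Map each label to an index, most frequent label first (hypothetical helper)."""
    counts = Counter(labels)
    # Descending frequency; ties broken alphabetically (an assumption of this
    # sketch, chosen only to make the result deterministic).
    ordered = sorted(counts, key=lambda label: (-counts[label], label))
    return {label: index for index, label in enumerate(ordered)}


def transform(label_to_index, labels, handle_invalid="error"):
    """Index labels, either failing on or dropping labels not seen while fitting."""
    if handle_invalid not in ("error", "skip"):
        raise ValueError("handle_invalid must be 'error' or 'skip'")
    indexed = []
    for label in labels:
        if label in label_to_index:
            indexed.append((label, label_to_index[label]))
        elif handle_invalid == "error":
            # Default strategy: an unseen label aborts the transform.
            raise KeyError(f"Unseen label: {label}")
        # "skip" strategy: the row with the unseen label is dropped.
    return indexed


# Fit on one dataset, then transform another that contains the unseen label "d".
model = fit_string_indexer(["a", "b", "c", "a", "a", "c"])
skipped = transform(model, ["a", "b", "c", "d"], handle_invalid="skip")
```

With the fitting data from the documentation example, "a" maps to `0`, "c" to `1` and "b" to `2`; under "skip" the row containing "d" is dropped, while the default "error" setting raises instead.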