-
Notifications
You must be signed in to change notification settings - Fork 28.6k
[SPARK-5885][MLLIB] Add VectorAssembler as a feature transformer #5196
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Test build #29199 has finished for PR 5196 at commit
|
override lazy val resolved: Boolean = childrenResolved | ||
|
||
override def dataType: StructType = { | ||
assert(resolved, s"CreateStruct is called with unresolved children: $children.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The error message is confusing. CreateStruct
can be called with unresolved children (and you actually did this here), but CreateStruct.dataType
can't, because child types may be unknown.
The build failure was caused by codegen. When the |
Actually, the failed test suites are |
* A feature transformer than merge multiple columns into a vector column. | ||
*/ | ||
@AlphaComponent | ||
class VectorAssembler extends Transformer with HasInputCols with HasOutputCol { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe call it FeatureUnion to keep the same semantic with sklearn
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1. I like the idea of trying to keep naming and architecture as similar to sklearn as possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not exactly sklearn's FeatureUnion that takes transformers. VectorAssembler
merges arbitrary columns. I'm a little worried that if we use the same name people would expect the same behavior.
Test build #30018 has finished for PR 5196 at commit
|
Merged into master. Will send a follow-up PR to handle attributes. |
VectorAssembler merges multiple columns into a vector column. This PR contains content from #5195.
carry ML attributes(moved to a follow-up PR)