val leftOutputColumns = leftView.columns.map{columnName => col(s"datasetA.${columnName}")}
val rightOutputColumns = rightView.columns.map{columnName => col(s"datasetB.${columnName}")}

pipelineModel.stages(3).asInstanceOf[MinHashLSHModel]
Consider looking up the MinHashLSHModel stage by type rather than by index, in case its position changes in the future.
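A minimal sketch of the type-based lookup, using stand-in classes so it runs without Spark on the classpath. `Tokenizer`, `HashingTF`, the local `MinHashLSHModel`, and the helper `findLshModel` are all hypothetical names for illustration. With a real `PipelineModel`, the same `collectFirst` call should apply to `pipelineModel.stages`, which is an `Array[Transformer]`.

```scala
object StageLookup {
  // Stand-ins for the Spark ML types (assumption: not the real classes).
  trait Transformer
  class Tokenizer extends Transformer
  class HashingTF extends Transformer
  class MinHashLSHModel extends Transformer

  // Return the first stage of the wanted type, wherever it sits in the pipeline.
  def findLshModel(stages: Array[Transformer]): Option[MinHashLSHModel] =
    stages.collectFirst { case model: MinHashLSHModel => model }

  def main(args: Array[String]): Unit = {
    val stages: Array[Transformer] =
      Array(new Tokenizer, new HashingTF, new MinHashLSHModel)

    // Fail with a clear message instead of a ClassCastException on a wrong index.
    val lshModel = findLshModel(stages).getOrElse(
      throw new IllegalStateException("pipeline contains no MinHashLSHModel stage")
    )
    println(lshModel.getClass.getSimpleName)
  }
}
```

Returning an `Option` leaves the missing-stage case explicit for the caller, which a positional `stages(3).asInstanceOf[...]` cannot do.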
}

// build locality-sensitive hashing model
val minHashLSH = { new MinHashLSH()
Minor style suggestion: remove the surrounding braces, as this doesn't require a block (same for the one above).

pipelineModel.stages(3).asInstanceOf[MinHashLSHModel]
  .approxSimilarityJoin(datasetA, datasetB, (1.0-stage.threshold))
  .select((leftOutputColumns ++ rightOutputColumns ++ List((lit(1.0)-col("distCol")).alias("similarity"))):_*)
Do you specifically need a list here or can it just be a generic Seq?
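To illustrate the question, here is a small sketch showing that a `Seq` can be appended and expanded into varargs just like a `List`. The `select` function below is a hypothetical stand-in for a varargs method such as `Dataset.select(cols: Column*)`, and plain strings stand in for `Column` values.

```scala
object VarargsDemo {
  // Hypothetical stand-in for a varargs method like Dataset.select(cols: Column*).
  def select(cols: String*): String = cols.mkString(", ")

  val left: Array[String] = Array("datasetA.id", "datasetA.text")
  val right: Array[String] = Array("datasetB.id", "datasetB.text")

  // Appending a Seq to the arrays works the same way as appending a List.
  val all: Array[String] = left ++ right ++ Seq("similarity")

  def main(args: Array[String]): Unit =
    println(select(all: _*))
}
```

Because the concatenation starts from an `Array`, the result is an `Array` either way, so the concrete collection type of the last operand does not affect the `: _*` expansion.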
before(head)
val result = processStage(head)
after(head, result, true)
val stage = head._1
It should be possible to do this in the match:
case (stage, index) :: Nil
val result = processStage(head)
after(head, result, false)
val stage = head._1
val index = head._2
It should be possible to do this in the match:
case (stage, index) :: tail
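A minimal sketch of destructuring the tuple directly in the match, assuming the stages are held as a `List` of `(stage, index)` pairs. The `Stage` case class and the `run` function are hypothetical; the real stage type and the `before`/`processStage`/`after` hooks are not reproduced here.

```scala
import scala.annotation.tailrec

object StageRunner {
  // Hypothetical stage type; the real pipeline stage class is not shown in the diff.
  final case class Stage(name: String)

  // Walk the (stage, index) pairs, binding both names in the pattern
  // instead of reading head._1 and head._2 afterwards.
  @tailrec
  def run(stages: List[(Stage, Int)], log: List[String] = Nil): List[String] =
    stages match {
      case Nil =>
        log.reverse
      case (stage, index) :: Nil =>
        // Final stage: no tail left to recurse into.
        (s"last stage ${stage.name} at $index" :: log).reverse
      case (stage, index) :: tail =>
        run(tail, s"stage ${stage.name} at $index" :: log)
    }
}
```

If the whole pair is still needed, for example to pass to `processStage(head)`, a pattern binder such as `case (head @ (stage, index)) :: tail` keeps `head` available alongside the destructured names.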
Add change to Lifecycle Hook API and add additional config reader types