Commit ce6deb1

adrian-wang authored and marmbrus committed
[SQL] Code Cleanup: Left Semi Hash Join
Some improvement for PR #837, add another case to white list and use `filter` to build result iterator.

Author: Daoyuan <daoyuan.wang@intel.com>

Closes #1049 from adrian-wang/clean-LeftSemiJoinHash and squashes the following commits:

b314d5a [Daoyuan] change hashSet name
27579a9 [Daoyuan] add semijoin to white list and use filter to create new iterator in LeftSemiJoinBNL

Signed-off-by: Michael Armbrust <michael@databricks.com>
1 parent 4107cce commit ce6deb1

52 files changed: +374 -33 lines changed


sql/core/src/main/scala/org/apache/spark/sql/execution/joins.scala

Lines changed: 7 additions & 33 deletions
@@ -169,51 +169,25 @@ case class LeftSemiJoinHash(
   def execute() = {
 
     buildPlan.execute().zipPartitions(streamedPlan.execute()) { (buildIter, streamIter) =>
-      val hashTable = new java.util.HashSet[Row]()
+      val hashSet = new java.util.HashSet[Row]()
       var currentRow: Row = null
 
       // Create a Hash set of buildKeys
       while (buildIter.hasNext) {
         currentRow = buildIter.next()
         val rowKey = buildSideKeyGenerator(currentRow)
         if(!rowKey.anyNull) {
-          val keyExists = hashTable.contains(rowKey)
+          val keyExists = hashSet.contains(rowKey)
           if (!keyExists) {
-            hashTable.add(rowKey)
+            hashSet.add(rowKey)
           }
         }
       }
 
-      new Iterator[Row] {
-        private[this] var currentStreamedRow: Row = _
-        private[this] var currentHashMatched: Boolean = false
-
-        private[this] val joinKeys = streamSideKeyGenerator()
-
-        override final def hasNext: Boolean =
-          streamIter.hasNext && fetchNext()
-
-        override final def next() = {
-          currentStreamedRow
-        }
-
-        /**
-         * Searches the streamed iterator for the next row that has at least one match in hashtable.
-         *
-         * @return true if the search is successful, and false the streamed iterator runs out of
-         *         tuples.
-         */
-        private final def fetchNext(): Boolean = {
-          currentHashMatched = false
-          while (!currentHashMatched && streamIter.hasNext) {
-            currentStreamedRow = streamIter.next()
-            if (!joinKeys(currentStreamedRow).anyNull) {
-              currentHashMatched = hashTable.contains(joinKeys.currentValue)
-            }
-          }
-          currentHashMatched
-        }
-      }
+      val joinKeys = streamSideKeyGenerator()
+      streamIter.filter(current => {
+        !joinKeys(current).anyNull && hashSet.contains(joinKeys.currentValue)
+      })
     }
   }
 }
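
The hunk above replaces a hand-written `Iterator[Row]` with a single `filter` over the streamed side. The sketch below illustrates the same build-then-probe pattern on plain Scala collections so it can be run without Spark. It is not the Spark implementation: `SemiJoinSketch`, `leftSemiJoin`, the `Row = Seq[Any]` alias, and the `buildKey`/`streamKey` functions are hypothetical stand-ins for Spark's `Row` and the key generators in the diff.

```scala
import java.util.{HashSet => JHashSet}

object SemiJoinSketch {
  // Hypothetical stand-in for Spark's Row: a sequence of nullable values.
  type Row = Seq[Any]

  /** Left semi join: keep each streamed row whose key appears on the build side. */
  def leftSemiJoin(
      build: Iterator[Row],
      streamed: Iterator[Row],
      buildKey: Row => Row,
      streamKey: Row => Row): Iterator[Row] = {
    // Build phase: collect the distinct keys of the build side, skipping keys that contain null.
    val hashSet = new JHashSet[Row]()
    build.foreach { row =>
      val key = buildKey(row)
      if (!key.contains(null)) hashSet.add(key)
    }
    // Probe phase: Iterator.filter is lazy, so streamed rows are still consumed one at a time.
    streamed.filter { row =>
      val key = streamKey(row)
      !key.contains(null) && hashSet.contains(key)
    }
  }

  def main(args: Array[String]): Unit = {
    val build = Iterator[Row](Seq(0, "a"), Seq(2, "b"), Seq(null, "c"))
    val streamed = Iterator[Row](
      Seq(0, "val_0"), Seq(0, "val_0"), Seq(1, "val_1"), Seq(null, "x"), Seq(2, "val_2"))
    // Keeps both key-0 rows and the key-2 row; drops key 1 (no match) and the null key.
    leftSemiJoin(build, streamed, r => Seq(r.head), r => Seq(r.head)).foreach(println)
  }
}
```

One observation on the design choice, based only on the removed code shown above: the old `hasNext` advanced the streamed iterator on every call, so calling `hasNext` twice without an intervening `next()` could skip a matching row. `Iterator.filter` keeps its own lookahead, so repeated `hasNext` calls do not consume further rows.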

sql/hive/src/test/resources/golden/semijoin-0-1631b71327abf75b96116036b977b26c

Whitespace-only changes.
Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
+0 val_0
+0 val_0
+0 val_0
+2 val_2
+4 val_4
+5 val_5
+5 val_5
+5 val_5
+8 val_8
+9 val_9
+10 val_10

sql/hive/src/test/resources/golden/semijoin-10-ffd4fb3a903a6725ccb97d5451a3fec6

Whitespace-only changes.
Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
+0 val_0
+0 val_0
+0 val_0
+4 val_2
+8 val_4
+10 val_5
+10 val_5
+10 val_5

sql/hive/src/test/resources/golden/semijoin-12-6d93a9d332ba490835b17f261a5467df

Whitespace-only changes.

sql/hive/src/test/resources/golden/semijoin-13-18282d38b6efc0017089ab89b661764f

Whitespace-only changes.

sql/hive/src/test/resources/golden/semijoin-14-19cfcefb10e1972bec0ffd421cd79de7

Whitespace-only changes.
Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
+val_0
+val_0
+val_0
+val_10
+val_2
+val_4
+val_5
+val_5
+val_5
+val_8
+val_9

sql/hive/src/test/resources/golden/semijoin-16-d3a72a90515ac4a8d8e9ac923bcda3d

Whitespace-only changes.
