Closed
Labels
enhancement (New feature or request)
Description
Is your feature request related to a problem? Please describe.
Currently, hash joins build a single monolithic in-memory hash table, which may cause an OOM when off-heap memory is small.
Describe the solution you'd like
Add a row/memory limit for building the hash table. When the limit is exceeded, switch to a spill-merge method:
- Shuffle the build-side data into N buckets (say N=1024).
- Build the buckets into separate hash tables; small buckets can be coalesced.
- Shuffle the probe side into the same N partitions.
- Read each partition and join it with the corresponding hash table.
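
To make the steps above concrete, here is a minimal sketch in Python. It is an illustration under assumptions, not the project's implementation: rows are plain dicts, the in-memory lists stand in for spilled bucket files, and the names `partition`, `coalesce`, `partitioned_hash_join`, and the `max_rows` limit are all hypothetical.

```python
from collections import defaultdict


def partition(rows, key, n):
    """Shuffle rows into n buckets by hashing the join key.
    (Hypothetical helper; a real engine would spill each bucket to disk.)"""
    buckets = [[] for _ in range(n)]
    for row in rows:
        buckets[hash(row[key]) % n].append(row)
    return buckets


def coalesce(sizes, max_rows):
    """Group adjacent bucket ids so that each group holds at most max_rows
    build rows. A single oversized bucket still forms its own group."""
    groups, current, total = [], [], 0
    for i, size in enumerate(sizes):
        if current and total + size > max_rows:
            groups.append(current)
            current, total = [], 0
        current.append(i)
        total += size
    if current:
        groups.append(current)
    return groups


def partitioned_hash_join(build, probe, key, n=1024, max_rows=4):
    """Inner join that keeps only one group's hash table resident at a time."""
    build_buckets = partition(build, key, n)
    probe_buckets = partition(probe, key, n)  # same hash, same n
    out = []
    for group in coalesce([len(b) for b in build_buckets], max_rows):
        # Build one hash table for this group of (possibly coalesced) buckets.
        table = defaultdict(list)
        for i in group:
            for row in build_buckets[i]:
                table[row[key]].append(row)
        # Probe only the partitions that correspond to this group.
        for i in group:
            for p in probe_buckets[i]:
                for b in table.get(p[key], ()):
                    out.append((b, p))
    return out
```

Because both sides are partitioned with the same hash function and the same N, matching keys always land in the same bucket index, so each probe partition only needs the hash table built from the corresponding build buckets.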
Describe alternatives you've considered
This solves the OOM problem in most cases. However, when the data is skewed, the shuffle does not help, because rows sharing a hot key all land in the same bucket. In that situation we may fall back to sort-based joining.
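
As a rough illustration of the fallback, the sketch below pairs a simple skew check with a sort-based inner join. Everything here is assumed for the example: `is_skewed`, `sort_merge_join`, and the `max_rows` threshold are hypothetical names, rows are plain dicts, and the in-memory `sorted` calls stand in for what would need to be an external (spilling) sort in practice.

```python
def is_skewed(bucket_sizes, max_rows):
    """Report skew when any single bucket alone exceeds the row limit,
    since re-shuffling cannot split rows that share one key."""
    return max(bucket_sizes, default=0) > max_rows


def sort_merge_join(build, probe, key):
    """Inner join: sort both sides by key, then merge runs of equal keys."""
    b = sorted(build, key=lambda r: r[key])
    p = sorted(probe, key=lambda r: r[key])
    out, i, j = [], 0, 0
    while i < len(b) and j < len(p):
        if b[i][key] < p[j][key]:
            i += 1
        elif b[i][key] > p[j][key]:
            j += 1
        else:
            k = b[i][key]
            # Find the end of the run of rows with key k on each side.
            i_end = i
            while i_end < len(b) and b[i_end][key] == k:
                i_end += 1
            j_end = j
            while j_end < len(p) and p[j_end][key] == k:
                j_end += 1
            # Emit the cross product of the two runs.
            for br in b[i:i_end]:
                for pr in p[j:j_end]:
                    out.append((br, pr))
            i, j = i_end, j_end
    return out
```

The merge phase only needs the current run of equal keys from each side, rather than a hash table over the whole build side, which is why it tolerates a hot key better than a per-bucket hash table.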