Implement grace hash join when build input is bigger than the available memory #14675
Open
Description
Feature Request
Is your feature request related to a problem? Please describe:
What we already done before is using temporary disk for build side of Hash Join.
- But when build input is bigger enough, the memory usage of hash table may still be larger than the available memory.
- It spills all build input into disk and abandons the usage of memory for build input. aka, when probing, the reading of each build row is a random I/O.
Describe the feature you'd like:
By implementing GRACE hash join or hybrid hash join, we can spill both build input and probe input when memory limit is exceeded, and for each partition we can join in memory.
Describe alternatives you've considered:
Teachability, Documentation, Adoption, Migration Strategy:
- CMU course for hash join (include grace hash join), youtube video available, https://15445.courses.cs.cmu.edu/fall2019/schedule.html#oct-02-2019
- https://en.wikipedia.org/wiki/Hash_join
- MySQL blog about how the hash join spills into disk, https://mysqlserverteam.com/hash-join-in-mysql-8/
- Consider using shuffle executor to get partitions, see docs: propose a Shuffle operator #14453
Activity