Three fundamental algorithms for performing a binary join operation exist:
- hash join
- Grace Hash Join [DONE]
- Naive Main Memory Hash Join
- External Hash Table [DONE]
- sort-merge join [DONE]
- symmetric hash join (streaming) [DONE]
- Join Schema which doesn't have Pipeline Breakers
- Send output as soon as you get input rather than waiting for the whole join to end.
- There is a Out-of-core Symmetric Hash Join (X-Join Paper). Not used in practical systems.
- nested loop join
- Double Buffering: CoW Btree Page 50, External Algorithms
- Sampling for Partitioning/Sorting: Used in parallel merge join, IVF Flat using K-means, Distributed KD Tree
- Hashing vs Sorting
- Hashing is Divide, Hash, and Manage
- Sorting is Divide, Sort, and Merge
- Grace Hash Joins 1
- Grace Hash Joins 2
- Reference Projects
- https://en.wikipedia.org/wiki/Hash_join
- https://en.wikipedia.org/wiki/Nested_loop_join
- https://en.wikipedia.org/wiki/Sort-merge_join
- https://cgi.cse.unsw.edu.au/~cs9315/21T1/lectures/join-hash/slides.html#s2
- https://pages.cs.wisc.edu/~travitch/notes/cs764-notes-midterm1.pdf
- Loop Join Animation
- https://faculty.cc.gatech.edu/~jarulraj/courses/4420-f20/slides/20-joins.pdf
- https://courses.cs.washington.edu/courses/cse444/10au/lectures/lecture20.pdf
- https://pages.cs.wisc.edu/~yxy/cs764-f21/slides/L2.pdf