Fine grained manipulation on hashing partitioned data

## Enhancement

We know that some operators (agg, join, etc.) could benefit from prehashing. If each inputstream belongs to one hash bucket:
1. For agg, we do not need the merging phase, each thread could directly generate the final blocks.
2. For build side of join, we do not need any lock and the resizing of hash table could only has influence to one thread.
3. For probe side of join, it could start as soon as its corresponding build job ends, not the whole build phase.
4. For exchange sender, we do not need to partition the data again, just directly send them.

Note that all the potential enhancements requires:
1. Use the same (or compatible) hash function.
1. The prehash key is the super set of the sink hash key.

So how to do prehashing? Here are some ideas:
1. For a hash partition table, just let each inputstream corresponds to one partition (N:1 or 1:1 relation). The key is how to carry hash info along with inputstream.
2. For large data volume, the agg will merge data into 256 buckets, however it then discards the hashing info, merging them into one inputstream. The upper operators could benefit from disclosing the hashing info from agg.
3. We could insert some virtual hash partition operators near to TableScan and Exchange operators (which are leaves of a MPP task) to generate inputstreams with hashing info.

Especially, for Exchange, ExchangeSender is a better place to partition data than ExchangeReceiver: partitioning is already part of ExchangeSender's job. 

Note: blindly split data at ExchangeSender could make the packet smaller and weaken the effect of vectorization. A previous demo showed that blindly splitting will decrease the performance (using current implementation), while well-designed batching will increase the performance a lot.

Subtasks for **window function**:
- [x] tiflash code: #5142 
- [x] kvproto: https://github.com/pingcap/kvproto/pull/921
- [x] enable fine grained shuffle when dispatching MPPTasks: https://github.com/pingcap/tidb/issues/35342
- [x] microbenchmark for fine grained shuffle: https://github.com/pingcap/tiflash/issues/5138
- [x] design doc: https://github.com/pingcap/tiflash/pull/5149

Subtasks for **HashAgg/Join**:
- [ ] HashAgg + fine-grained shuffle

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fine grained manipulation on hashing partitioned data #4631

Enhancement

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development