
Fine grained manipulation on hashing partitioned data #4631

Closed
@fuzhe1989

Description

Enhancement

Some operators (agg, join, etc.) can benefit from prehashing. If each inputstream belongs to exactly one hash bucket:

  1. For agg, the merging phase is unnecessary: each thread can generate the final blocks directly.
  2. For the build side of a join, no lock is needed, and resizing a hash table affects only one thread.
  3. For the probe side of a join, probing a bucket can start as soon as the corresponding build job ends, without waiting for the whole build phase.
  4. For the exchange sender, the data does not need to be partitioned again and can be sent directly.
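
To make point 1 concrete, here is a minimal sketch of aggregating prehashed inputstreams without a merge phase. It is an illustration only: `bucket_of`, `prehash`, `aggregate_stream`, and `aggregate_prehashed` are hypothetical names, CRC32 stands in for whatever hash function the engine actually uses, and the streams are processed sequentially where a real engine would use one thread per stream.

```python
import zlib
from collections import defaultdict


def bucket_of(key: str, num_buckets: int) -> int:
    # Assumption: any deterministic hash works for the illustration; CRC32 is
    # used only because it is stable across runs.
    return zlib.crc32(key.encode("utf-8")) % num_buckets


def prehash(rows, num_buckets):
    # Split (key, value) rows so that each stream holds exactly one hash bucket.
    streams = [[] for _ in range(num_buckets)]
    for key, value in rows:
        streams[bucket_of(key, num_buckets)].append((key, value))
    return streams


def aggregate_stream(stream):
    # SUM(value) GROUP BY key over a single stream.
    sums = defaultdict(int)
    for key, value in stream:
        sums[key] += value
    return dict(sums)


def aggregate_prehashed(streams):
    # No merge phase: a key can appear in only one stream, so each per-stream
    # result is already final and the outputs are simply concatenated.
    final = {}
    for stream in streams:
        partial = aggregate_stream(stream)
        assert final.keys().isdisjoint(partial.keys())
        final.update(partial)
    return final


rows = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5)]
result = aggregate_prehashed(prehash(rows, 4))
```

The disjointness assertion is the property that removes the merge step: without prehashing, the same key could show up in several per-thread hash tables and would have to be combined afterwards.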

Note that all of these potential enhancements require:

  1. Use the same (or compatible) hash function.
  2. The prehash key is a superset of the sink hash key.
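
One possible reading of "compatible" in requirement 1, sketched under assumptions: if both sides use the same underlying hash and the sink bucket count divides the prehash bucket count, the sink bucket can be derived from the prehash bucket without touching the key again. The names `stable_hash`, `pre_bucket`, `sink_bucket`, and `sink_bucket_from_pre` are hypothetical, and CRC32 is a stand-in hash.

```python
import zlib


def stable_hash(key: str) -> int:
    # Assumption: CRC32 stands in for the engine's real hash function.
    return zlib.crc32(key.encode("utf-8"))


def pre_bucket(key: str, num_pre_buckets: int) -> int:
    # Bucket assigned when the data was prehashed.
    return stable_hash(key) % num_pre_buckets


def sink_bucket(key: str, num_sink_buckets: int) -> int:
    # Bucket the sink operator would compute from scratch.
    return stable_hash(key) % num_sink_buckets


def is_compatible(num_pre_buckets: int, num_sink_buckets: int) -> bool:
    # With modulo bucketing, (h % N) % M == h % M holds whenever M divides N.
    return num_pre_buckets % num_sink_buckets == 0


def sink_bucket_from_pre(pre: int, num_sink_buckets: int) -> int:
    # Derive the sink bucket from the prehash bucket alone (N:1 mapping).
    return pre % num_sink_buckets


keys = ["alpha", "beta", "gamma", "delta", "epsilon"]
derived = [sink_bucket_from_pre(pre_bucket(k, 8), 4) for k in keys]
direct = [sink_bucket(k, 4) for k in keys]
```

Here `derived` and `direct` agree for every key because 4 divides 8. With a different hash function on either side, or bucket counts that do not divide, the prehash bucket says nothing reliable about the sink bucket.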

So how to do prehashing? Here are some ideas:

  1. For a hash-partitioned table, let each inputstream correspond to one partition (an N:1 or 1:1 relation). The key question is how to carry the hash info along with the inputstream.
  2. For large data volumes, agg merges data into 256 buckets, but it then discards the hashing info and merges the buckets into one inputstream. Upper operators could benefit if agg exposed this hashing info.
  3. We could insert virtual hash partition operators near the TableScan and Exchange operators (which are the leaves of an MPP task) to generate inputstreams that carry hashing info.
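
The question of how to carry hash info along with an inputstream could be approached roughly as follows. This is a sketch, not the actual design: `HashInfo`, `InputStream`, `merge_streams`, and `can_reuse` are hypothetical names, and the reuse check is deliberately conservative (it only accepts identical keys and the same hash function).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class HashInfo:
    keys: tuple            # columns the data was hashed on
    hash_name: str         # identifies the hash function
    num_buckets: int       # total number of buckets
    buckets: frozenset     # bucket ids contained in this stream


@dataclass
class InputStream:
    rows: list
    hash_info: Optional[HashInfo] = None  # None means "no hashing info"


def merge_streams(streams):
    # N:1 merge that keeps the hashing info when all inputs agree on it,
    # instead of discarding it.
    rows = [r for s in streams for r in s.rows]
    infos = [s.hash_info for s in streams]
    first = infos[0] if infos else None
    if first is None:
        return InputStream(rows)
    for info in infos:
        if info is None or (info.keys, info.hash_name, info.num_buckets) != (
            first.keys, first.hash_name, first.num_buckets
        ):
            return InputStream(rows)  # incompatible inputs: drop the info
    buckets = frozenset().union(*(i.buckets for i in infos))
    return InputStream(
        rows, HashInfo(first.keys, first.hash_name, first.num_buckets, buckets)
    )


def can_reuse(stream, sink_keys, hash_name):
    # Conservative check: same hash function and identical key columns.
    info = stream.hash_info
    return (
        info is not None
        and info.hash_name == hash_name
        and info.keys == tuple(sink_keys)
    )


a = InputStream([1], HashInfo(("k",), "crc32", 4, frozenset({0})))
b = InputStream([2], HashInfo(("k",), "crc32", 4, frozenset({2})))
merged = merge_streams([a, b])
```

Making the info optional lets operators that know nothing about hashing keep working unchanged, while an operator such as agg or join can call something like `can_reuse` and fall back to its normal path when the check fails.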

For Exchange in particular, ExchangeSender is a better place to partition data than ExchangeReceiver, because partitioning is already part of ExchangeSender's job.

Note: blindly splitting data at ExchangeSender can make packets smaller and weaken the effect of vectorization. A previous demo showed that blind splitting decreases performance (with the current implementation), while well-designed batching improves it considerably.
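
The issue does not describe what the well-designed batching looked like. One assumed strategy, sketched here, is to buffer rows per bucket on the sender and emit a packet only once a bucket has accumulated a minimum number of rows, so that downstream operators still receive reasonably large batches. `BatchingSender` and its parameters are hypothetical.

```python
class BatchingSender:
    """Buffers rows per bucket and sends a packet only when it is big enough."""

    def __init__(self, num_buckets, min_batch_rows):
        self.buffers = [[] for _ in range(num_buckets)]
        self.min_batch_rows = min_batch_rows
        self.sent = []  # (bucket, rows) packets, standing in for network sends

    def write(self, block, bucket_of):
        # Route each row to its bucket buffer; flush a buffer once it is full.
        for row in block:
            bucket = bucket_of(row)
            self.buffers[bucket].append(row)
            if len(self.buffers[bucket]) >= self.min_batch_rows:
                self._flush(bucket)

    def _flush(self, bucket):
        if self.buffers[bucket]:
            self.sent.append((bucket, self.buffers[bucket]))
            self.buffers[bucket] = []

    def finish(self):
        # Send whatever is left at end of stream, even if below the threshold.
        for bucket in range(len(self.buffers)):
            self._flush(bucket)


sender = BatchingSender(num_buckets=2, min_batch_rows=3)
sender.write(range(4), lambda row: row % 2)     # nothing sent yet
packets_after_first_block = list(sender.sent)
sender.write(range(4, 8), lambda row: row % 2)  # both buckets reach 3 rows
sender.finish()
```

A blind split would have sent two 2-row packets per input block. With buffering, the first block produces no packets at all, and rows from consecutive blocks are combined into larger ones.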

Subtasks for window function:

Subtasks for HashAgg/Join:

  • HashAgg + fine-grained shuffle
