
Vectorized hashing for hash aggregation code #26

Closed as duplicate
@Dandandan

Description


Updating the hash aggregate implementation to use vectorized hashing should give a decent speedup for queries that depend on a fast hash aggregate implementation.

Currently, keys are generated as values of type Vec<u8> and are hashed row by row, which causes:

  • more memory usage
  • slow re-hashing of the backing hashmap
  • type-unaware hashing, even for simple primitive values
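
To illustrate the idea, here is a minimal sketch of column-at-a-time hashing for primitive keys. It is not the WIP code and not an existing API: `mix`, `hash_i64_column`, and `hash_columns` are hypothetical names, columns are modeled as plain `&[i64]` slices rather than Arrow arrays, and the mixing function (the SplitMix64 finalizer) is just one possible choice.

```rust
/// Illustrative 64-bit mixing function (the SplitMix64 finalizer).
/// Any good integer mixer could be used here; this one is an assumption.
fn mix(mut x: u64) -> u64 {
    x ^= x >> 30;
    x = x.wrapping_mul(0xbf58476d1ce4e5b9);
    x ^= x >> 27;
    x = x.wrapping_mul(0x94d049bb133111eb);
    x ^ (x >> 31)
}

/// Fold one i64 column into the per-row hash buffer in a single pass.
/// No per-row Vec<u8> key is built; the values are hashed as integers.
fn hash_i64_column(values: &[i64], hashes: &mut [u64]) {
    assert_eq!(values.len(), hashes.len());
    for (h, v) in hashes.iter_mut().zip(values) {
        *h = mix(*h ^ (*v as u64));
    }
}

/// Hash all group-by columns, one column at a time, into one u64 per row.
fn hash_columns(columns: &[&[i64]], num_rows: usize) -> Vec<u64> {
    let mut hashes = vec![0u64; num_rows];
    for col in columns {
        hash_i64_column(col, &mut hashes);
    }
    hashes
}
```

Rows with equal key values always receive equal hashes, and the only allocation is a single `Vec<u64>` per batch instead of one byte vector per row.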

The implementation should also handle hash collisions, so the original key values must remain available for comparison whenever two hashes match.
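
One way this could look, as a rough sketch only: the map is keyed by the precomputed hash, each entry holds the indices of the groups sharing that hash, and a row joins a group only if its original key value matches. The function `group_rows` and the single `i64` key column are assumptions made for illustration, not the design of the WIP code.

```rust
use std::collections::HashMap;

/// Assign a group index to every row, given precomputed row hashes.
/// Groups that share a hash are told apart by comparing the original keys.
fn group_rows(keys: &[i64], hashes: &[u64]) -> Vec<usize> {
    assert_eq!(keys.len(), hashes.len());
    // Key value of each group, indexed by group index.
    let mut group_keys: Vec<i64> = Vec::new();
    // Hash -> indices of all groups whose key has that hash.
    let mut buckets: HashMap<u64, Vec<usize>> = HashMap::new();
    let mut row_groups = Vec::with_capacity(keys.len());

    for (&key, &hash) in keys.iter().zip(hashes) {
        let candidates = buckets.entry(hash).or_default();
        // An equal hash is not enough: confirm against the stored key.
        let found = candidates.iter().copied().find(|&g| group_keys[g] == key);
        let group = match found {
            Some(g) => g,
            None => {
                // New key (possibly colliding with an existing one).
                group_keys.push(key);
                let g = group_keys.len() - 1;
                candidates.push(g);
                g
            }
        };
        row_groups.push(group);
    }
    row_groups
}
```

Even if every row is given the same hash, distinct keys still end up in distinct groups; the collisions only cost extra comparisons.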

There is some WIP code in apache/arrow#9213 that can be used as a starting point or continued from.
