Consider using a disk-based hash table for hash join avoiding OOM #11607

Open
@SunRunAway

Description

Feature Request

Is your feature request related to a problem? Please describe:

Consider using a disk-based hash table for hash join to avoid OOM.

HashJoinExecutor uses a hash table that maps join keys to inner table rows.

TiDB's hash join is implemented with innerResult and mvmap.MVMap. innerResult stores all the rows of the inner table, and mvmap.MVMap stores the mapping (join key, inner table row pointer). Together, these two structures give us a map from join keys to inner table rows.
When the inner table is particularly large, innerResult takes up a lot of memory; when the join keys are particularly large, mvmap.MVMap also takes up a lot of memory. In either case the query can run out of memory (OOM).
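
To make the two-structure layout concrete, here is a minimal sketch in Go. It is not TiDB's actual code: `Row`, `rowPtr`, `hashJoinTable`, `build`, and `probe` are hypothetical names, a plain Go map stands in for mvmap.MVMap, and a slice stands in for innerResult.

```go
package main

import "fmt"

// Row is a simplified inner-table row (hypothetical stand-in for TiDB's rows).
type Row struct {
	Key   string
	Value string
}

// rowPtr is the position of a row inside innerResult.
type rowPtr int

// hashJoinTable pairs the row store with a multi-value index on the join key.
type hashJoinTable struct {
	innerResult []Row               // all rows of the inner table
	index       map[string][]rowPtr // join key -> pointers into innerResult
}

// build indexes every inner row by its join key.
func build(inner []Row) *hashJoinTable {
	t := &hashJoinTable{innerResult: inner, index: make(map[string][]rowPtr)}
	for i, r := range inner {
		t.index[r.Key] = append(t.index[r.Key], rowPtr(i))
	}
	return t
}

// probe returns every inner row whose join key equals key.
func (t *hashJoinTable) probe(key string) []Row {
	var out []Row
	for _, p := range t.index[key] {
		out = append(out, t.innerResult[p])
	}
	return out
}

func main() {
	tbl := build([]Row{{"a", "1"}, {"b", "2"}, {"a", "3"}})
	fmt.Println(tbl.probe("a"))
}
```

In this sketch both `innerResult` and `index` live entirely in memory, which is exactly why a large inner table or large join keys can exhaust it.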

Describe the feature you'd like:

  1. We already have a config mem-quota-query, which sets the memory quota for a query in bytes.
  2. Introduce a new config oom-use-tmp-storage, defaulting to true. When it is true, some executors (in this issue, hash join) may use temporary disk storage once mem-quota-query is exceeded.
  3. Show the disk usage of an executor in explain analyze.
  4. Show the disk usage of a query in SELECT * FROM information_schema.processlist;
  5. Consider disk usage in the cost model.

Describe alternatives you've considered:

Teachability, Documentation, Adoption, Migration Strategy:

tasks:

  1. Improve mvmap.MVMap
  2. Disk-based innerResult
  3. Cost model, explain analyze, and disk usage control
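
As a rough illustration of what a disk-based innerResult might involve, the sketch below stores length-prefixed rows in a temporary file and keeps only their offsets in memory, so a row pointer becomes a file offset. This is an assumption-laden sketch, not a proposed design: `diskRows`, `newDiskRows`, `Append`, `Get`, and `Close` are hypothetical names, and rows are treated as opaque byte slices.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"os"
)

// diskRows is a hypothetical append-only row store backed by a temp file.
type diskRows struct {
	f       *os.File
	offsets []int64 // offsets[i] is where row i starts in the file
	size    int64   // total bytes written so far
}

func newDiskRows() (*diskRows, error) {
	f, err := os.CreateTemp("", "hashjoin-inner-*")
	if err != nil {
		return nil, err
	}
	return &diskRows{f: f}, nil
}

// Append writes a 4-byte length prefix followed by the row, and returns the row index.
func (d *diskRows) Append(row []byte) (int, error) {
	var hdr [4]byte
	binary.LittleEndian.PutUint32(hdr[:], uint32(len(row)))
	if _, err := d.f.WriteAt(hdr[:], d.size); err != nil {
		return 0, err
	}
	if _, err := d.f.WriteAt(row, d.size+4); err != nil {
		return 0, err
	}
	d.offsets = append(d.offsets, d.size)
	d.size += 4 + int64(len(row))
	return len(d.offsets) - 1, nil
}

// Get reads row i back from disk using its recorded offset.
func (d *diskRows) Get(i int) ([]byte, error) {
	if i < 0 || i >= len(d.offsets) {
		return nil, fmt.Errorf("row %d out of range", i)
	}
	var hdr [4]byte
	if _, err := d.f.ReadAt(hdr[:], d.offsets[i]); err != nil {
		return nil, err
	}
	row := make([]byte, binary.LittleEndian.Uint32(hdr[:]))
	if _, err := d.f.ReadAt(row, d.offsets[i]+4); err != nil {
		return nil, err
	}
	return row, nil
}

// Close closes and deletes the temporary file.
func (d *diskRows) Close() error {
	name := d.f.Name()
	if err := d.f.Close(); err != nil {
		return err
	}
	return os.Remove(name)
}

func main() {
	d, err := newDiskRows()
	if err != nil {
		panic(err)
	}
	defer d.Close()
	i, _ := d.Append([]byte("inner row"))
	row, _ := d.Get(i)
	fmt.Println(string(row), "disk bytes:", d.size)
}
```

Only the offset slice stays in memory here; the index from join key to row (the MVMap side) would still need its own treatment, which is what task 1 covers.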

Some tiny issues

Metadata

Labels

epic/memory-management, help wanted, sig/execution, type/enhancement