
Optimize performance for UnionScanExec and MemBuffer #43249

Open
lcwangchao opened this issue Apr 20, 2023 · 3 comments
Labels
sig/execution SIG execution sig/sql-infra SIG: SQL Infra type/enhancement The issue or PR belongs to an enhancement.

Comments

@lcwangchao
Collaborator

Enhancement

UnionScan is not very efficient. For example:

  • UnionScanExec filters all rows/indexes in Open instead of in Next. This wastes time when the SQL has a LIMIT, because Open does not take the limit into account.
  • Maybe we can implement MemBuffer more efficiently.
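To make the first point concrete, here is a minimal Go sketch on toy types. `Row`, `iterator`, `eagerScan`, `lazyScan`, and `limit` are hypothetical illustrations, not TiDB's executor API. The eager variant filters every row up front, as described above for Open; the lazy variant filters inside Next, so a LIMIT above it stops the work once it has enough rows.

```go
package main

import "fmt"

// Row is a hypothetical, simplified decoded row (not a TiDB type).
type Row struct {
	Handle int64
	Value  int64
}

// iterator is a hypothetical pull interface shared by both scans.
type iterator interface {
	Next() (Row, bool)
}

// eagerScan filters the whole input up front, mimicking filtering in Open:
// every row is checked even if the consumer stops early.
type eagerScan struct {
	buf    []Row
	pos    int
	checks int // number of predicate evaluations
}

func newEagerScan(src []Row, pred func(Row) bool) *eagerScan {
	s := &eagerScan{}
	for _, r := range src {
		s.checks++
		if pred(r) {
			s.buf = append(s.buf, r)
		}
	}
	return s
}

// Next just hands out rows that were already filtered.
func (s *eagerScan) Next() (Row, bool) {
	if s.pos >= len(s.buf) {
		return Row{}, false
	}
	r := s.buf[s.pos]
	s.pos++
	return r, true
}

// lazyScan filters on demand inside Next, so work stops when the consumer stops.
type lazyScan struct {
	src    []Row
	pred   func(Row) bool
	pos    int
	checks int // number of predicate evaluations
}

func (s *lazyScan) Next() (Row, bool) {
	for s.pos < len(s.src) {
		r := s.src[s.pos]
		s.pos++
		s.checks++
		if s.pred(r) {
			return r, true
		}
	}
	return Row{}, false
}

// limit pulls at most n rows, like a LIMIT operator on top of the scan.
func limit(it iterator, n int) []Row {
	var out []Row
	for len(out) < n {
		r, ok := it.Next()
		if !ok {
			break
		}
		out = append(out, r)
	}
	return out
}

func main() {
	src := make([]Row, 1000)
	for i := range src {
		src[i] = Row{Handle: int64(i), Value: int64(i % 10)}
	}
	even := func(r Row) bool { return r.Value%2 == 0 }

	e := newEagerScan(src, even)
	eagerRows := limit(e, 5)
	fmt.Println(len(eagerRows), e.checks) // 5 1000

	l := &lazyScan{src: src, pred: even}
	lazyRows := limit(l, 5)
	fmt.Println(len(lazyRows), l.checks) // 5 9
}
```

With 1,000 input rows and a limit of 5, the eager scan evaluates the filter 1,000 times, while the lazy scan stops after 9 evaluations (rows 0 through 8).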

UnionScan's performance affects temporary tables, cached tables, and queries in a transaction with many uncommitted rows.

lcwangchao added the type/enhancement, sig/execution, and sig/sql-infra labels on Apr 20, 2023
tiancaiamao self-assigned this on Jun 19, 2023
@tiancaiamao
Contributor

cd executor
go test -run XXX -bench BenchmarkUnionScan -cpuprofile cpu.out -benchtime 45s

[Image: CPU profile flame graph from the benchmark run]

The inefficiency comes from two parts:

  1. Open() always drains all the data, even though much of it may be useless in the LIMIT scenario. It would be better to make it a streaming API.
  2. The cost of decoding/encoding and row-format translation. When loading, we decode KV into rows in []Datum representation; when merging, we go from []Datum to []Datum; and finally the output is translated from the row representation into a chunk, i.e. []Datum -> chunk.
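Both points can be illustrated together with a small Go sketch. It assumes, as a simplification, that snapshot rows and the transaction's uncommitted (dirty) rows are both sorted by handle, and that a dirty row overrides or deletes the snapshot row with the same handle. All types here (`Row`, `DirtyRow`, `Chunk`, `mergeIter`) are hypothetical stand-ins, not TiDB's real chunk or memdb types. The merge is pulled on demand one chunk at a time, and each visible row is appended straight into columnar slices instead of first being materialized as an intermediate []Datum row.

```go
package main

import "fmt"

// Row is a hypothetical, simplified row keyed by handle.
type Row struct {
	Handle int64
	Val    int64
}

// DirtyRow is a hypothetical uncommitted change from the memory buffer.
type DirtyRow struct {
	Row
	Deleted bool
}

// Chunk is a toy columnar batch: one slice per column.
type Chunk struct {
	Handles []int64
	Vals    []int64
}

// Append writes a row directly into the column slices.
func (c *Chunk) Append(r Row) {
	c.Handles = append(c.Handles, r.Handle)
	c.Vals = append(c.Vals, r.Val)
}

func (c *Chunk) NumRows() int { return len(c.Handles) }

// mergeIter merges snapshot rows and dirty rows, both sorted by handle.
type mergeIter struct {
	snap  []Row
	dirty []DirtyRow
	i, j  int
}

// next returns the next visible row in handle order, or false when done.
func (m *mergeIter) next() (Row, bool) {
	for m.i < len(m.snap) || m.j < len(m.dirty) {
		if m.j >= len(m.dirty) || (m.i < len(m.snap) && m.snap[m.i].Handle < m.dirty[m.j].Handle) {
			// Only the snapshot has this handle.
			r := m.snap[m.i]
			m.i++
			return r, true
		}
		d := m.dirty[m.j]
		m.j++
		if m.i < len(m.snap) && m.snap[m.i].Handle == d.Handle {
			m.i++ // the dirty version replaces the snapshot version
		}
		if !d.Deleted {
			return d.Row, true
		}
		// A deleted dirty row hides the handle; keep looping.
	}
	return Row{}, false
}

// fillChunk pulls at most maxRows merged rows straight into chk.
func (m *mergeIter) fillChunk(chk *Chunk, maxRows int) {
	for chk.NumRows() < maxRows {
		r, ok := m.next()
		if !ok {
			return
		}
		chk.Append(r)
	}
}

func main() {
	m := &mergeIter{
		snap:  []Row{{1, 10}, {2, 20}, {3, 30}, {5, 50}},
		dirty: []DirtyRow{{Row{2, 200}, false}, {Row{3, 0}, true}, {Row{4, 40}, false}},
	}
	for {
		chk := &Chunk{}
		m.fillChunk(chk, 2)
		if chk.NumRows() == 0 {
			break
		}
		fmt.Println(chk.Handles, chk.Vals)
	}
	// [1 2] [10 200]
	// [4 5] [40 50]
}
```

Because the merge only advances when a chunk asks for more rows, a consumer that stops early (for example, a LIMIT) never pays for the remaining input.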

@ekexium
Contributor

ekexium commented May 14, 2024

Hey @lcwangchao, could you elaborate on "Maybe we can implement MemBuffer more efficiently."? The flame graph shows that only a small share of time is spent in memdb. Did you mean codec-related work?

@tiancaiamao
Contributor

> Hey @lcwangchao, could you elaborate on "Maybe we can implement MemBuffer more efficiently."? The flame graph shows that only a small share of time is spent in memdb. Did you mean codec-related work?

After my previous optimizations, much of that has already been improved.

Projects
None yet
Development

No branches or pull requests

3 participants