[improvement](orc)improve hdfs scan performance #11501

Merged Aug 4, 2022 (1 commit)

@dujl dujl commented Aug 4, 2022

Proposed changes

Issue Number: close #11498

Problem summary

When running the SSB test suite against a Hive ORC table, Doris is much slower than Trino:
10.9s (Doris) vs 5.61s (Trino).

Describe your changes.
Fix the libhdfs3 seek performance issue:
apache/doris-thirdparty#2
Performance improvement on SSB 30G with the ORC file format:

Query       After (s)   Before (s)
Q1.1           3.05       10.944
Q1.2           2.99       10.65
Q1.3           3.47       10.88
Q2.1           5.39       51.97
Q2.2           5.56       51.02
Q2.3           5.45       52.52
Q3.1          42          82.34
Q3.2          34.75       77.48
Q3.3          33.2        71.16
Q3.4          32.81       72.98
Q4.1         101.34      143.61
Q4.2          43.27       96.59
Q4.3          47.65      111.22
Total        360.93      843.364
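
To make the gain easier to read, the timings in the table can be turned into speedup ratios. The snippet below is only an illustrative sketch and not part of the PR: the `timings` dictionary and the `speedup` helper are names introduced here, and the numbers are copied from the table as (after, before) in seconds.

```python
# Timings in seconds copied from the PR table: query -> (after, before).
# `timings` and `speedup` are hypothetical names used only for this sketch.
timings = {
    "Q1.1": (3.05, 10.944),
    "Q1.2": (2.99, 10.65),
    "Q1.3": (3.47, 10.88),
    "Q2.1": (5.39, 51.97),
    "Q2.2": (5.56, 51.02),
    "Q2.3": (5.45, 52.52),
    "Q3.1": (42.0, 82.34),
    "Q3.2": (34.75, 77.48),
    "Q3.3": (33.2, 71.16),
    "Q3.4": (32.81, 72.98),
    "Q4.1": (101.34, 143.61),
    "Q4.2": (43.27, 96.59),
    "Q4.3": (47.65, 111.22),
}


def speedup(after: float, before: float) -> float:
    """Return how many times faster the 'after' run is than the 'before' run."""
    return before / after


# Sum each column to cross-check the totals row of the table.
total_after = sum(after for after, _ in timings.values())
total_before = sum(before for _, before in timings.values())

for query, (after, before) in timings.items():
    print(f"{query}: {speedup(after, before):.2f}x")

print(
    f"total: {total_after:.2f}s vs {total_before:.3f}s, "
    f"{speedup(total_after, total_before):.2f}x"
)
```

The column sums reproduce the totals row (360.93 s and 843.364 s), which corresponds to an overall speedup of roughly 2.3x, with the Q2.x queries gaining the most.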

Checklist(Required)

  1. Does it affect the original behavior:
    • [Y] Yes
    • No
    • I don't know
  2. Have unit tests been added:
    • Yes
    • No
    • [N] No Need
  3. Has documentation been added or modified:
    • Yes
    • No
    • [N] No Need
  4. Does it need to update dependencies:
    • Yes
    • [N] No
  5. Are there any changes that cannot be rolled back:
    • Yes (If Yes, please explain WHY)
    • [N] No

@SaintBacchus

Good improvement for HMS tables.

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Aug 4, 2022

github-actions bot commented Aug 4, 2022

PR approved by at least one committer and no changes requested.


github-actions bot commented Aug 4, 2022

PR approved by anyone and no changes requested.

@yangzhg yangzhg merged commit c176ff5 into apache:master Aug 4, 2022
yiguolei pushed a commit to yiguolei/incubator-doris that referenced this pull request Aug 9, 2022
@yiguolei yiguolei mentioned this pull request Sep 1, 2022
Henry2SS pushed a commit to Henry2SS/incubator-doris that referenced this pull request Sep 14, 2022
@yiguolei yiguolei mentioned this pull request Oct 9, 2022
Labels: approved, area/multi-catalog, dev/merge-1.1.2, reviewed
Development

Successfully merging this pull request may close these issues.

[Enhancement] [multi-catalog] scan orc file from hdfs is slow
5 participants