[Enhancement] Support use local file to accelerate the broker load #11196

Open
wants to merge 4 commits into
base: main

mchades
Contributor

@mchades mchades commented Sep 14, 2022

What type of PR is this:

  • Enhancement

Which issues does this PR fix:

Fixes #

Problem Summary(Required) :

When there is noticeable network latency between the BE and the files read through the broker (e.g. a ping round-trip time greater than 40 ms, such as when they are in different IDCs), broker load becomes slow.

This PR adds a use_local_cache option to the broker load PROPERTIES to accelerate it.

LOAD LABEL test_db.label1
(
    xxx
)
WITH BROKER "mybroker"
(
    xxx
)
PROPERTIES
(
    "timeout" = "3600",
    "use_local_cache" = "true"
);
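
For illustration, the idea behind a local cache for random access can be sketched as follows. This is a minimal Python sketch, not the code in this PR: `CountingRemoteFile` and `LocalCachedFile` are hypothetical names, and the assumption is that the file is copied to local disk once with large sequential reads, after which every random read is served locally.

```python
import tempfile


class CountingRemoteFile:
    """Hypothetical stand-in for a file read through a broker; counts round trips."""

    def __init__(self, data: bytes):
        self._data = data
        self.round_trips = 0

    def size(self) -> int:
        return len(self._data)

    def read_at(self, offset: int, length: int) -> bytes:
        # Every call models one high-latency request to the remote side.
        self.round_trips += 1
        return self._data[offset:offset + length]


class LocalCachedFile:
    """Copies the remote file to a local temp file once, then reads locally."""

    def __init__(self, remote: CountingRemoteFile, chunk_size: int = 4):
        self._local = tempfile.TemporaryFile()
        offset = 0
        # Sequential download in large chunks: few round trips in total.
        while offset < remote.size():
            chunk = remote.read_at(offset, chunk_size)
            self._local.write(chunk)
            offset += len(chunk)
        self._local.flush()

    def read_at(self, offset: int, length: int) -> bytes:
        # Random reads no longer touch the network.
        self._local.seek(offset)
        return self._local.read(length)

    def close(self) -> None:
        self._local.close()
```

With this layout, the number of remote round trips depends only on the file size and chunk size, not on how many small random reads the ORC reader issues afterwards.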

Test result:

ORC broker load (file count: 101, total data size: 13 GB, table field count: 25)

| BE and broker file in different IDCs | Uses "Optimize orc load random I/O" | Uses local cache | Elapsed time |
|---|---|---|---|
| NO | NO | NO | 7 min |
| NO | YES | NO | 6.5 min |
| NO | NO | YES | 6.5 min |
| NO | YES | YES | 6.5 min |
| YES | NO | NO | 30 min (timeout) |
| YES | YES | NO | 30 min (timeout) |
| YES | NO | YES | 11 min |
| YES | YES | YES | 11 min |
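
To put the numbers above in perspective, the speedup can be computed directly from the table. This is a small sketch using only the reported elapsed times; since the cross-IDC runs without the local cache timed out at 30 minutes, the cross-IDC figure is only a lower bound.

```python
def speedup(baseline_min: float, cached_min: float) -> float:
    """Ratio of elapsed time without the local cache to elapsed time with it."""
    return baseline_min / cached_min


# Cross-IDC: the baseline hit the 30 min timeout, so this is a lower bound.
cross_idc_lower_bound = speedup(30, 11)

# Same IDC: 7 min without the cache, 6.5 min with it.
same_idc = speedup(7, 6.5)

print(f"cross-IDC speedup: at least {cross_idc_lower_bound:.2f}x")
print(f"same-IDC speedup: {same_idc:.2f}x")
```

In other words, the option matters mostly when the BE and the files are far apart; within one IDC the gain is small.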

Checklist:

  • I have added test cases for my bug fix or my new feature
  • I have added user document for my new feature or new function

@mchades mchades force-pushed the fast-broker-load branch 2 times, most recently from 90c78d8 to 0f37a48 Compare September 14, 2022 12:39
@mchades mchades changed the title Support use local file to accelerate the broker load [Enhancement] Support use local file to accelerate the broker load Sep 14, 2022
@mchades mchades force-pushed the fast-broker-load branch 6 times, most recently from 0e766a5 to a74f256 Compare September 14, 2022 15:43
@xiaoyong-z
Contributor

Maybe exposing config::use_local_filecache_for_broker_random_access_file in the broker load's PROPERTIES would be a better choice? Like the following:

LOAD LABEL test_db.label1
(
    xxx
)
WITH BROKER "mybroker"
(
    xxx
)
PROPERTIES
(
    "timeout" = "3600",
    "use_local_filecache_for_broker_random_access_file" = "true"
);

Then users don't need to restart the BE when they want to change use_local_filecache_for_broker_random_access_file.
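
As a rough illustration of that suggestion, a per-job property can override a process-wide default so that changing the behavior does not require a restart. The sketch below is hypothetical Python, not StarRocks code: `resolve_use_local_cache` and the `be_default` parameter are assumed names, and the property key follows the shorter `use_local_cache` spelling from the PR description.

```python
def resolve_use_local_cache(job_properties: dict, be_default: bool = False) -> bool:
    """Return the effective setting: the job property wins over the BE-wide default."""
    raw = job_properties.get("use_local_cache")
    if raw is None:
        # Property not given in the load statement: fall back to the default.
        return be_default
    value = raw.strip().lower()
    if value not in ("true", "false"):
        raise ValueError(f"use_local_cache must be 'true' or 'false', got {raw!r}")
    return value == "true"


# The PROPERTIES clause of a load statement, modeled as a plain dict.
print(resolve_use_local_cache({"timeout": "3600", "use_local_cache": "true"}))
print(resolve_use_local_cache({"timeout": "3600"}))
```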

@mchades
Contributor Author

mchades commented Oct 9, 2022

OK, but use_local_filecache_for_broker_random_access_file seems too long to use, so I went with use_local_cache instead.

@mchades
Contributor Author

mchades commented Oct 9, 2022

run starrocks_fe_unittest

be/src/fs/fs.h (outdated, resolved)
@sonarqubecloud

sonarqubecloud bot commented Oct 9, 2022

Kudos, SonarCloud Quality Gate passed!

0 Bugs (rating A)
0 Vulnerabilities (rating A)
0 Security Hotspots (rating A)
1 Code Smell (rating A)

0.0% Coverage
0.0% Duplication

@wanpengfei-git
Collaborator

[FE PR Coverage Check]

😞 fail : 14 / 19 (73.68%)

file detail

| path | covered_line | new_line | coverage | not_covered_line_detail |
|---|---|---|---|---|
| 🔵 com/starrocks/load/loadv2/LoadJob.java | 2 | 4 | 50.00% | [234, 329] |
| 🔵 com/starrocks/analysis/LoadStmt.java | 3 | 6 | 50.00% | [265, 266, 267] |
| 🔵 com/starrocks/planner/FileScanNode.java | 4 | 4 | 100.00% | [] |
| 🔵 com/starrocks/load/loadv2/LoadingTaskPlanner.java | 4 | 4 | 100.00% | [] |
| 🔵 com/starrocks/load/loadv2/LoadLoadingTask.java | 1 | 1 | 100.00% | [] |

@@ -218,6 +218,8 @@ struct TBrokerScanRangeParams {
15: optional i32 hdfs_read_buffer_size_kb = 0
// properties from hdfs-site.xml, core-site.xml and load_properties
16: THdfsProperties hdfs_properties
// If use_local_cache is set, we will use local file for broker random access
17: optional bool use_local_cache = false
Contributor

If an ORC file is split into multiple tasks that run in parallel, will the same file be downloaded many times?
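
One possible way to avoid repeated downloads in that situation is to share a single local copy per remote path across tasks. The thread does not say how the PR handles this, so the following is only a sketch of one approach under that assumption; `DownloadOnceCache` and the `download` callback are hypothetical names.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class DownloadOnceCache:
    """Ensures each remote path is downloaded at most once, even under concurrency."""

    def __init__(self, download):
        self._download = download      # callable: remote path -> local path
        self._lock = threading.Lock()  # guards the entries dict
        self._entries = {}             # remote path -> {"lock", "local"}

    def get(self, path: str) -> str:
        # Find or create the entry for this path under the global lock.
        with self._lock:
            entry = self._entries.setdefault(
                path, {"lock": threading.Lock(), "local": None}
            )
        # Only the first task downloads; the others wait and reuse the result.
        with entry["lock"]:
            if entry["local"] is None:
                entry["local"] = self._download(path)
            return entry["local"]
```

A per-path lock keeps tasks that read different files from blocking each other while the download of one file is in progress.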

@mergify mergify bot assigned mchades Mar 2, 2023
5 participants