[INLONG-7072][Manager][Sort] Resource adaptive adjustment for Hudi #7077

Draft: wants to merge 3 commits into master

@featzhang (Member) commented Dec 27, 2022

Prepare a Pull Request

[INLONG-7072][Manager][Sort] Resource adaptive adjustment for Hudi

Motivation

Hudi Flink jobs are often given poorly sized resources. Over-allocation wastes resources, while under-allocation leads to back pressure or OOM.

When allocating resources, first determine the source parallelism so that no data backlog builds up upstream while reading. As a general reference point, a table partitioned by day with about 15 billion records per day needs a source parallelism of about 50. Other data volumes can be scaled proportionally.
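The proportional conversion from the reference point can be sketched as follows. This is a minimal illustration, not code from this PR: the function name `scale_source_parallelism` and the constants are hypothetical, and it assumes the required parallelism grows linearly with daily volume.

```python
# Reference point quoted in the PR description:
# about 15 billion records per day -> about 50 source subtasks.
REFERENCE_RECORDS_PER_DAY = 15_000_000_000
REFERENCE_SOURCE_PARALLELISM = 50


def scale_source_parallelism(records_per_day: int) -> int:
    """Scale the reference parallelism linearly with daily volume (hypothetical helper)."""
    if records_per_day <= 0:
        raise ValueError("records_per_day must be positive")
    # Integer ceiling division avoids floating-point rounding surprises.
    numerator = records_per_day * REFERENCE_SOURCE_PARALLELISM
    scaled = (numerator + REFERENCE_RECORDS_PER_DAY - 1) // REFERENCE_RECORDS_PER_DAY
    return max(1, scaled)


if __name__ == "__main__":
    print(scale_source_parallelism(3_000_000_000))  # 10
```

Rounding up is a deliberate choice here: under-provisioning the source is what causes the upstream backlog the description warns about.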

Once the source parallelism is determined, configure the write parallelism at a source-to-write ratio of 1:1.5 or 1:2.

If the write operator runs into OOM at runtime, increase the write parallelism and the TaskManager (TM) memory as appropriate.

If back pressure occurs as shown below, adjust the write parallelism according to the difference in consumption between source and write. In the example below the difference is about 50W (500,000) records, that is, 500,000 records that the write operator has not kept up with. From the number of records written successfully and the elapsed running time, you can calculate how much write parallelism is needed to absorb that 500,000-record difference.

image

image
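One way to read the back-pressure calculation above is sketched below. It is an interpretation, not the PR's implementation: `required_write_parallelism` is a hypothetical name, and the sketch assumes that write throughput scales linearly with parallelism and that the written count and the backlog were measured over the same running time (so the elapsed time cancels out of the ratio).

```python
def required_write_parallelism(
    written_records: int,
    lag_records: int,
    current_parallelism: int,
) -> int:
    """Estimate the write parallelism that would have kept up with the source.

    written_records: records successfully written during the observed run
    lag_records: source-minus-write difference over the same run (e.g. 500_000)
    current_parallelism: write parallelism used during that run
    """
    if written_records <= 0 or current_parallelism <= 0:
        raise ValueError("written_records and current_parallelism must be positive")
    if lag_records < 0:
        raise ValueError("lag_records must not be negative")
    # Needed = current * (written + lag) / written, rounded up.
    total_records = written_records + lag_records
    numerator = current_parallelism * total_records
    return (numerator + written_records - 1) // written_records


if __name__ == "__main__":
    # 2,000,000 written and 500,000 behind with 8 write subtasks.
    print(required_write_parallelism(2_000_000, 500_000, 8))  # 10
```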

Modifications

  1. Estimate the source node parallelism from the user-supplied estimate of daily data volume, assuming a throughput of 1,000 records per second per core.
  2. Configure the write parallelism at a source-to-write ratio of 1:1.5 or 1:2.
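The two modifications can be illustrated with a short sketch. The function names are hypothetical and do not come from this PR. The sketch assumes one core per parallel subtask and a data volume spread evenly across the day.

```python
import math

SECONDS_PER_DAY = 86_400
RECORDS_PER_SECOND_PER_CORE = 1_000  # rate stated in the PR description


def estimate_source_parallelism(records_per_day: int) -> int:
    """Source parallelism from daily volume at 1,000 records/s per core."""
    if records_per_day <= 0:
        raise ValueError("records_per_day must be positive")
    capacity_per_core_per_day = RECORDS_PER_SECOND_PER_CORE * SECONDS_PER_DAY
    # Integer ceiling division: always round up to a whole subtask.
    cores = (records_per_day + capacity_per_core_per_day - 1) // capacity_per_core_per_day
    return max(1, cores)


def estimate_write_parallelism(source_parallelism: int, ratio: float = 1.5) -> int:
    """Write parallelism at a source-to-write ratio of 1:ratio (1.5 or 2)."""
    if source_parallelism <= 0 or ratio <= 0:
        raise ValueError("source_parallelism and ratio must be positive")
    return max(1, math.ceil(source_parallelism * ratio))


if __name__ == "__main__":
    source = estimate_source_parallelism(864_000_000)
    print(source)                                   # 10
    print(estimate_write_parallelism(source, 1.5))  # 15
    print(estimate_write_parallelism(source, 2.0))  # 20
```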

Verifying this change

(Please pick either of the following options)

  • This change is a trivial rework/code cleanup without any test coverage.

  • This change is already covered by existing tests, such as:
    (please describe tests)

  • This change added tests and can be verified as follows:

    (example:)

    • Added integration tests for end-to-end deployment with large payloads (10MB)
    • Extended integration test for recovery after broker failure

Documentation

  • Does this pull request introduce a new feature? (yes / no)
  • If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)
  • If a feature is not applicable for documentation, explain why?
  • If a feature is not documented yet in this PR, please create a follow-up issue for adding the documentation

/inlong-sort/connectors
/plugins
/inlong-sort/sort-dist.jar
/inlong-manager/plugins
Contributor

please remove these lines

@@ -209,6 +209,10 @@
"meta.Sinks.Hudi.PartitionFieldListHelp": "If the field type is timestamp, you must set the format of the field value, support MICROSECONDS, MILLISECONDS, SECONDS, SQL, ISO_8601, and custom, such as: yyyy-MM-dd HH:mm:ss, etc.",
"meta.Sinks.Hudi.FieldFormat": "FieldFormat",
"meta.Sinks.Hudi.ExtListHelper": "The DDL attribute of the hudi table needs to be prefixed with 'ddl.'",
"meta.Sinks.Hudi.RecordPreDayUnit": "row",
"meta.Sinks.Hudi.RecordPreDay": "RecordPreDay",
Member

RecordPerDay, not PreDay.


-->

## Apache InLong dev toolkit
Member

Please do not commit code unrelated to this issue.

@featzhang featzhang marked this pull request as draft January 31, 2023 11:02

github-actions bot commented Apr 4, 2023

This PR is stale because it has been open for 60 days with no activity.

@github-actions github-actions bot added the stage/stale label Apr 4, 2023
@github-actions github-actions bot removed the stage/stale label Aug 25, 2023

This PR is stale because it has been open for 60 days with no activity.

@github-actions github-actions bot added the stage/stale label Oct 24, 2023
@github-actions github-actions bot removed the stage/stale label Jul 19, 2024

This PR is stale because it has been open for 60 days with no activity.

@github-actions github-actions bot added the stage/stale label Sep 18, 2024
Labels
component/dashboard, component/manager, component/sort, stage/stale
Development

Successfully merging this pull request may close these issues.

[Feature][Manager][Sort] Resource adaptive adjustment for Hudi
3 participants