add rate limiting for offline batch jobs, set default bulk size to 500 #3116

Merged
Zhangxunmt merged 2 commits into opensearch-project:main on Oct 16, 2024

Conversation

Collaborator

@Zhangxunmt Zhangxunmt commented Oct 15, 2024

Description

Add rate limiting to offline batch inference and ingestion. Set the default bulk size for batch ingestion to 500 (the optimal value from benchmarking). If the number of batch job tasks exceeds the limits defined in the settings, an ml_limit_exceeded exception is thrown (serialized as m_l_limit_exceeded_exception, with HTTP status 429).

POST /_plugins/_ml/_batch_ingestion
{...}

Response when the rate limit is exceeded:
{
  "error": {
    "root_cause": [
      {
        "type": "m_l_limit_exceeded_exception",
        "reason": "Exceeded maximum limit for BATCH_INGEST tasks. To increase the limit, update the plugins.ml_commons.max_batch_ingestion_tasks setting."
      }
    ],
    "type": "m_l_limit_exceeded_exception",
    "reason": "Exceeded maximum limit for BATCH_INGEST tasks. To increase the limit, update the plugins.ml_commons.max_batch_ingestion_tasks setting."
  },
  "status": 429
}
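
To illustrate the check described above, here is a minimal sketch of a per-task-type limiter that rejects a new task once the configured maximum is reached. It is not the code from this PR: `BatchTaskLimiter`, `LimitExceededException`, and the `acquire`/`release` methods are hypothetical names chosen for the example. Only the error message format and the 429 status are taken from the response shown above.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: counts running tasks per task type and rejects new ones over the limit.
class BatchTaskLimiter {

    // Hypothetical stand-in for the plugin's limit-exceeded exception; carries the 429 status.
    static class LimitExceededException extends RuntimeException {
        final int status = 429;

        LimitExceededException(String message) {
            super(message);
        }
    }

    private final Map<String, AtomicInteger> running = new ConcurrentHashMap<>();

    // Reserves one slot for taskType, or throws if maxTasks slots are already taken.
    void acquire(String taskType, String settingKey, int maxTasks) {
        AtomicInteger count = running.computeIfAbsent(taskType, k -> new AtomicInteger());
        // Increment first and roll back on failure so concurrent callers cannot overshoot the limit.
        if (count.incrementAndGet() > maxTasks) {
            count.decrementAndGet();
            throw new LimitExceededException(
                "Exceeded maximum limit for " + taskType + " tasks. To increase the limit, update the "
                    + settingKey + " setting.");
        }
    }

    // Frees one slot when a task finishes; never drops below zero.
    void release(String taskType) {
        AtomicInteger count = running.get(taskType);
        if (count != null) {
            count.updateAndGet(v -> Math.max(0, v - 1));
        }
    }
}
```

With a limit of 1, a second `acquire("BATCH_INGEST", "plugins.ml_commons.max_batch_ingestion_tasks", 1)` call fails until `release("BATCH_INGEST")` is called. A limit of 0 rejects every task in this sketch.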

Update settings for rate limits

PUT _cluster/settings
{
  "persistent": {
    "plugins.ml_commons.offline_batch_inference_enabled": true,
    "plugins.ml_commons.offline_batch_ingestion_enabled": true,
    "plugins.ml_commons.max_batch_inference_tasks": 2,
    "plugins.ml_commons.max_batch_ingestion_tasks": 1,
    "plugins.ml_commons.batch_ingestion_bulk_size": 500
  }
}
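
The same settings update could be issued from Java with the JDK's built-in HTTP client. The sketch below only builds the request. `BatchLimitSettings`, its method names, and the `http://localhost:9200` base URL in the usage note are assumptions for illustration. The setting keys are the ones listed above, with the two `*_enabled` flags omitted for brevity.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper that prepares a PUT _cluster/settings request for the batch limits.
class BatchLimitSettings {

    // Builds the JSON body with the three numeric limits as persistent settings.
    static String settingsBody(int maxInferenceTasks, int maxIngestionTasks, int bulkSize) {
        return "{\"persistent\":{"
            + "\"plugins.ml_commons.max_batch_inference_tasks\":" + maxInferenceTasks + ","
            + "\"plugins.ml_commons.max_batch_ingestion_tasks\":" + maxIngestionTasks + ","
            + "\"plugins.ml_commons.batch_ingestion_bulk_size\":" + bulkSize
            + "}}";
    }

    // Wraps the body in a PUT request against <baseUrl>/_cluster/settings (nothing is sent here).
    static HttpRequest settingsRequest(String baseUrl, String body) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/_cluster/settings"))
            .header("Content-Type", "application/json")
            .PUT(HttpRequest.BodyPublishers.ofString(body))
            .build();
    }
}
```

To send it, pass the result of `BatchLimitSettings.settingsRequest("http://localhost:9200", BatchLimitSettings.settingsBody(2, 1, 500))` to `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`. Authentication and TLS are left out of the sketch.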

Related Issues

Resolves #[Issue number to be closed when this PR is merged]

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Xun Zhang <xunzh@amazon.com>
.intSetting("plugins.ml_commons.max_batch_ingestion_tasks", 10, 0, 500, Setting.Property.NodeScope, Setting.Property.Dynamic);

public static final Setting<Integer> ML_COMMONS_BATCH_INGESTION_BULK_SIZE = Setting
.intSetting("plugins.ml_commons.batch_ingestion_bulk_size", 500, 100, 100000, Setting.Property.NodeScope, Setting.Property.Dynamic);
Collaborator

We should add these new settings to netty/jetty layers on AOS 2.17

Collaborator Author

@Zhangxunmt Zhangxunmt Oct 16, 2024

Good callout. They will be handled separately when this is backported into the AOS branch.

Collaborator

Do we allow users to set a negative number?

Collaborator Author

No, the minimum values are actually 0, as defined in the settings.
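
As a rough illustration of how a bounded integer setting rejects out-of-range values such as negative numbers, consider the sketch below. `BoundedIntSetting` is a hypothetical class written for this example and is not OpenSearch's `Setting` implementation. The default, minimum, and maximum values in the usage note are the ones in the `intSetting` calls quoted in the diff above.

```java
// Hypothetical sketch of an integer setting with a default and inclusive min/max bounds.
class BoundedIntSetting {
    final String key;
    final int defaultValue;
    final int min;
    final int max;

    BoundedIntSetting(String key, int defaultValue, int min, int max) {
        if (min > max || defaultValue < min || defaultValue > max) {
            throw new IllegalArgumentException("Inconsistent bounds for setting [" + key + "]");
        }
        this.key = key;
        this.defaultValue = defaultValue;
        this.min = min;
        this.max = max;
    }

    // Returns the default when no value is supplied; otherwise parses and range-checks the value.
    int parse(String raw) {
        if (raw == null) {
            return defaultValue;
        }
        int value = Integer.parseInt(raw.trim());
        if (value < min || value > max) {
            throw new IllegalArgumentException(
                "Value [" + value + "] for setting [" + key + "] must be between " + min + " and " + max);
        }
        return value;
    }
}
```

Under these assumptions, `new BoundedIntSetting("plugins.ml_commons.max_batch_ingestion_tasks", 10, 0, 500).parse("-1")` throws an `IllegalArgumentException`, while `parse("0")` is accepted. For the bulk size setting, with bounds of 100 and 100000, a value of 50 is rejected in the same way.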

@Zhangxunmt Zhangxunmt merged commit 9a4166e into opensearch-project:main Oct 16, 2024
8 checks passed
opensearch-trigger-bot bot pushed a commit that referenced this pull request Oct 16, 2024
#3116)

* add rate limiting for offline batch jobs, set default bulk size to 500

Signed-off-by: Xun Zhang <xunzh@amazon.com>

* update error code to 429 for rate limiting and update logs

Signed-off-by: Xun Zhang <xunzh@amazon.com>

---------

Signed-off-by: Xun Zhang <xunzh@amazon.com>
(cherry picked from commit 9a4166e)
@Zhangxunmt Zhangxunmt temporarily deployed to ml-commons-cicd-env October 16, 2024 22:29 — with GitHub Actions Inactive
Zhangxunmt added a commit that referenced this pull request Oct 16, 2024
#3116) (#3121)

(cherry picked from commit 9a4166e)

Co-authored-by: Xun Zhang <xunzh@amazon.com>
Zhangxunmt added a commit to Zhangxunmt/ml-commons that referenced this pull request Oct 16, 2024
opensearch-project#3116)

Zhangxunmt added a commit that referenced this pull request Oct 17, 2024
#3116) (#3122)
