[8.x](backport #3986) Add timeout to bulker flush, add default case in failQueue #4172

Merged on Dec 9, 2024 (1 commit)

@mergify mergify bot commented Dec 3, 2024

What is the problem this PR solves?

Scale tests often get stuck while flushing the bulker queue.

How does this PR solve the problem?

Adds a context timeout to the bulker flush so that a flush is aborted once it runs past the deadline. Also adds a default case in failQueue.

How to test this PR locally

Ran scale tests here: https://buildkite.com/elastic/observability-perf/builds?branch=increase-poll-action-retries-for-agent-upgrades

Design Checklist

  • I have ensured my design is stateless and will work when multiple fleet-server instances are behind a load balancer.
  • I have or intend to scale test my changes, ensuring it will work reliably with 100K+ agents connected.
  • I have included fail safe mechanisms to limit the load on fleet-server: rate limiting, circuit breakers, caching, load shedding, etc.

Checklist

  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding changes to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in ./changelog/fragments using the changelog tool

Related issues

Relates https://github.com/elastic/ingest-dev/issues/3783


This is an automatic backport of pull request #3986 done by Mergify.

* use poll timeout in es ctx

* Add some SCALEDEBUG logs

* add agent id to logs

* debug logs in fleet.go

* add default case, log deadline

* 5m timeout

* cleanup logs, move deadline to doFlush

* move timeout before doFlush

* cleanup logs

* extracted const

* remove log

* exit bulker on checkin error

* update to latest stack snapshot

* revert break LOOP

* move deadline inside doFlush

* fix cancel

* remove doFlush param

* separate context

* added changelog

---------

Co-authored-by: Jill Guyonnet <jill.guyonnet@gmail.com>
(cherry picked from commit 6b29ab4)
@mergify mergify bot requested a review from a team as a code owner December 3, 2024 09:01
@mergify mergify bot added the backport label Dec 3, 2024

mergify bot commented Dec 9, 2024

This pull request has not been merged yet. Could you please review and merge it @juliaElastic? 🙏

@juliaElastic juliaElastic merged commit 480bc0a into 8.x Dec 9, 2024
8 checks passed
@juliaElastic juliaElastic deleted the mergify/bp/8.x/pr-3986 branch December 9, 2024 13:38