Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ingest: add long polling to TestConcurrentFetchers #9971

Conversation

dimitarvdimitrov
Copy link
Contributor

What this PR does

Some of the tests assume that we'd fetch all produced records in one go. The Kafka protocol doesn't guarantee that and sometimes kfake also returns less than that. This means that sometimes the tests would finish early before we've fetched all records.

This PR introduced logPollFetches which loops over fetchers.PollFetches multiple times until a timeout or until we fetch the expected number of records.

I also changed all places which had this assumption and all places which already did some form of log polling.

2s timeout

I set the timeout to 2s because of the MinBytesMaxWait of 1s in tests (code):

  • fetcher receives 1 record (1st request)
  • fetcher sends the fetchResult to fetchWant.result
  • fetcher receives 2 records (2nd request)
  • fetcher can't send the result to fetchWant.result because start() isn't receiving on the channel; start() is trying to send to orderedFetches
  • fetcher start another (3rd) attempt; the attempt takes 1s (MaxWaitMillis in the Fetch request)
  • client calls PollFetches, reads from orderedFetches; receives the first Fetches with 1 records; start() goes back to waiting on fetchWant.result
  • client calls PollFetches, start() is still waiting on fetchWant.result
  • fetcher receives the response to 3rd request with no records; sends the buffered 2 records from 2nd request
  • client receives 2 records

Another solution to waiting for 2s is to make the merging of results in start() for example.

Which issue(s) this PR fixes or relates to

Fixes #9970

Checklist

  • Tests updated.
  • Documentation added.
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX].
  • about-versioning.md updated with experimental features.

Some of the tests assume that we'd fetch all produced records in one go. The Kafka protocol doesn't guarantee that and sometimes kfake also returns less than that. This means that sometimes the tests would finish early before we've fetched all records.

This PR introduced `logPollFetches` which loops over `fetchers.PollFetches` multiple times until a timeout or until we fetch the expected number of records.

I set the timeout to 2s because of the MinBytesMaxWait of 1s in tests ([code](https://github.com/grafana/mimir/blob/96c92c99ef5599d4507c2aa9f6d7ddf42d5beec2/pkg/storage/ingest/fetcher_test.go#L1373)):

* fetcher receives 1 record (1st request)
* fetcher sends the `fetchResult` to `fetchWant.result`
* fetcher receives 2 records (2nd request)
* fetcher can't send the result to `fetchWant.result` because `start()` isn't receiving on the channel; `start()` is trying to send to `orderedFetches`
* fetcher start another (3rd) attempt; the attempt takes 1s (`MaxWaitMillis` in the Fetch request)
* client calls `PollFetches`, reads from orderedFetches; receives the first `Fetches` with 1 records; `start()` goes back to waiting on `fetchWant.result`
* client calls `PollFetches`, `start()` is still waiting on `fetchWant.result`
* fetcher receives the response to 3rd request with no records; sends the buffered 2 records from 2nd request
* client receives 2 records

Another solution to waiting for 2s is to make the merging of results in `start()` for example.

Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
@dimitarvdimitrov dimitarvdimitrov requested a review from a team as a code owner November 21, 2024 08:55
Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
@dimitarvdimitrov dimitarvdimitrov force-pushed the dimitar/ingest/add-long-polling-to-TestConcurrentFetchers branch from 4303e05 to ce89479 Compare November 21, 2024 09:08
Copy link
Collaborator

@pracucci pracucci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work! 👍

Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
@dimitarvdimitrov dimitarvdimitrov merged commit d9c223f into main Nov 21, 2024
29 checks passed
@dimitarvdimitrov dimitarvdimitrov deleted the dimitar/ingest/add-long-polling-to-TestConcurrentFetchers branch November 21, 2024 11:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

flaky TestConcurrentFetchers/fetch_records_produced_after_startup
2 participants