Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ingest: fix flaky TestConcurrentFetchers #9926

Merged

Conversation

dimitarvdimitrov
Copy link
Contributor

@dimitarvdimitrov dimitarvdimitrov commented Nov 17, 2024

There's a race between produceCtx being cancelled and producing the record with it. So we increment the number of produced records, but then producing the record fails. In the end we expect to receive that record because we counted it.

This is the error that I saw in CI

--- FAIL: TestConcurrentFetchers (1.22s)
    --- FAIL: TestConcurrentFetchers/update_concurrency_with_continuous_production (10.01s)
        logging.go:33: level info component kafka_client msg immediate metadata update triggered why from user ForceMetadataRefresh
        logging.go:33: level info component kafka_client msg producing to a new topic for the first time, fetching metadata to learn its partitions topic test-topic
        logging.go:33: level info component kafka_client msg immediate metadata update triggered why forced load because we are producing to a topic for the first time
        logging.go:33: level info component kafka_client msg done waiting for metadata for new topic topic test-topic
        logging.go:33: level info component kafka_client msg initializing producer id
        logging.go:33: level info component kafka_client msg producer id initialization success id 2916317827932012471 epoch 0
        reader_test.go:2339:
            	Error Trace:	/__w/mimir/mimir/pkg/storage/ingest/reader_test.go:2339
            	            				/__w/mimir/mimir/pkg/storage/ingest/fetcher_test.go:650
            	            				/usr/local/go/src/runtime/asm_amd64.s:1700
            	Error:      	Received unexpected error:
            	            	context canceled
            	Test:       	TestConcurrentFetchers/update_concurrency_with_continuous_production
        fetcher_test.go:692:
            	Error Trace:	/__w/mimir/mimir/pkg/storage/ingest/fetcher_test.go:692
            	Error:      	Not equal:
            	            	expected: 782
            	            	actual  : 783
            	Test:       	TestConcurrentFetchers/update_concurrency_with_continuous_production
            	Messages:   	Should not fetch more records than produced
        fetcher_test.go:706: Total produced: 783, Total fetched: 782
        fetcher_test.go:707: Fetched with initial concurrency: 201
        fetcher_test.go:708: Fetched with high concurrency: 290
        fetcher_test.go:709: Fetched with low concurrency: 291
FAIL

What this PR does

Which issue(s) this PR fixes or relates to

Fixes #9916

Checklist

  • Tests updated.
  • Documentation added.
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX].
  • about-versioning.md updated with experimental features.

There's a race between produceCtx being cancelled and producing the record with it.

This is the error that I saw in CI

```
--- FAIL: TestConcurrentFetchers (1.22s)
    --- FAIL: TestConcurrentFetchers/update_concurrency_with_continuous_production (10.01s)
        logging.go:33: level info component kafka_client msg immediate metadata update triggered why from user ForceMetadataRefresh
        logging.go:33: level info component kafka_client msg producing to a new topic for the first time, fetching metadata to learn its partitions topic test-topic
        logging.go:33: level info component kafka_client msg immediate metadata update triggered why forced load because we are producing to a topic for the first time
        logging.go:33: level info component kafka_client msg done waiting for metadata for new topic topic test-topic
        logging.go:33: level info component kafka_client msg initializing producer id
        logging.go:33: level info component kafka_client msg producer id initialization success id 2916317827932012471 epoch 0
        reader_test.go:2339:
            	Error Trace:	/__w/mimir/mimir/pkg/storage/ingest/reader_test.go:2339
            	            				/__w/mimir/mimir/pkg/storage/ingest/fetcher_test.go:650
            	            				/usr/local/go/src/runtime/asm_amd64.s:1700
            	Error:      	Received unexpected error:
            	            	context canceled
            	Test:       	TestConcurrentFetchers/update_concurrency_with_continuous_production
        fetcher_test.go:692:
            	Error Trace:	/__w/mimir/mimir/pkg/storage/ingest/fetcher_test.go:692
            	Error:      	Not equal:
            	            	expected: 782
            	            	actual  : 783
            	Test:       	TestConcurrentFetchers/update_concurrency_with_continuous_production
            	Messages:   	Should not fetch more records than produced
        fetcher_test.go:706: Total produced: 783, Total fetched: 782
        fetcher_test.go:707: Fetched with initial concurrency: 201
        fetcher_test.go:708: Fetched with high concurrency: 290
        fetcher_test.go:709: Fetched with low concurrency: 291
FAIL
```

Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
@dimitarvdimitrov dimitarvdimitrov requested a review from a team as a code owner November 17, 2024 18:12
@dimitarvdimitrov dimitarvdimitrov merged commit 60e15f1 into main Nov 18, 2024
29 checks passed
@dimitarvdimitrov dimitarvdimitrov deleted the dimitar/ingest/fix-flaky-TestConcurrentFetchers branch November 18, 2024 09:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Flaky TestConcurrentFetchers
3 participants