[processor/spanmetrics] Fix flaky test #18024

Merged: 3 commits, Jan 26, 2023

albertteoh
Contributor

Description: Fixes a flaky test caused by a race condition between the WaitGroup being signaled that all work is "done" and the subsequent writing and flushing of the logs that the test asserts against.

The flaky test failure can be reliably reproduced by inserting a sleep in the error-handling block, just before the log statement, like so:

	if err := p.metricsExporter.ConsumeMetrics(ctx, m); err != nil {
		time.Sleep(100 * time.Millisecond) // Reproduce flaky test failure
		p.logger.Error("Failed ConsumeMetrics", zap.Error(err))
		return
	}

This is solved with assert.Eventually, which polls the observer for the presence of logs every 10 ms, for up to 1 second.

Link to tracking Issue: Fixes #18014

Testing: Ran the tests locally and confirmed they pass, even with the additional sleep that reproduces the test failure.

@albertteoh albertteoh requested review from a team and dmitryax January 25, 2023 11:55
@github-actions github-actions bot added the processor/spanmetrics Span Metrics processor label Jan 25, 2023

runforesight bot commented Jan 25, 2023

Foresight Summary

    
Major Impacts

TestReceiveLogs ❌ failed 1 times in 4 runs (25% fail rate).
TestReceiveLogs/1_log_event_per_payload_(configured_max_content_length_is_same_as_event_size) ❌ failed 1 times in 4 runs (25% fail rate).
build-and-test-windows duration(9 seconds) has decreased 41 minutes 6 seconds compared to main branch avg(41 minutes 15 seconds).

⭕  build-and-test-windows workflow has finished in 9 seconds (41 minutes 6 seconds less than main branch avg.) and finished at 25th Jan, 2023.


Job Failed Steps Tests
windows-unittest-matrix -     🔗  N/A See Details
windows-unittest -     🔗  N/A See Details

✅  check-links workflow has finished in 53 seconds (58 seconds less than main branch avg.) and finished at 25th Jan, 2023.


Job Failed Steps Tests
changed files -     🔗  N/A See Details
check-links -     🔗  N/A See Details

✅  telemetrygen workflow has finished in 1 minute 1 second (1 minute 27 seconds less than main branch avg.) and finished at 25th Jan, 2023.


Job Failed Steps Tests
build-dev -     🔗  N/A See Details
publish-latest -     🔗  N/A See Details
publish-stable -     🔗  N/A See Details

✅  tracegen workflow has finished in 1 minute 3 seconds (1 minute 25 seconds less than main branch avg.) and finished at 25th Jan, 2023.


Job Failed Steps Tests
publish-stable -     🔗  N/A See Details
build-dev -     🔗  N/A See Details
publish-latest -     🔗  N/A See Details

✅  changelog workflow has finished in 2 minutes 16 seconds and finished at 25th Jan, 2023.


Job Failed Steps Tests
changelog -     🔗  N/A See Details

✅  load-tests workflow has finished in 7 minutes 34 seconds (6 minutes 49 seconds less than main branch avg.) and finished at 25th Jan, 2023.


Job Failed Steps Tests
loadtest (TestTraceAttributesProcessor) -     🔗  ✅ 3  ❌ 0  ⏭ 0    🔗 See Details
loadtest (TestIdleMode) -     🔗  ✅ 1  ❌ 0  ⏭ 0    🔗 See Details
loadtest (TestMetric10kDPS|TestMetricsFromFile) -     🔗  ✅ 6  ❌ 0  ⏭ 0    🔗 See Details
loadtest (TestTraceBallast1kSPSWithAttrs|TestTraceBallast1kSPSAddAttrs) -     🔗  ✅ 10  ❌ 0  ⏭ 0    🔗 See Details
loadtest (TestMetricResourceProcessor|TestTrace10kSPS) -     🔗  ✅ 12  ❌ 0  ⏭ 0    🔗 See Details
loadtest (TestTraceNoBackend10kSPS|TestTrace1kSPSWithAttrs) -     🔗  ✅ 8  ❌ 0  ⏭ 0    🔗 See Details
loadtest (TestBallastMemory|TestLog10kDPS) -     🔗  ✅ 19  ❌ 0  ⏭ 0    🔗 See Details
setup-environment -     🔗  N/A See Details

✅  prometheus-compliance-tests workflow has finished in 14 minutes 15 seconds (⚠️ 6 minutes 42 seconds more than main branch avg.) and finished at 25th Jan, 2023.


Job Failed Steps Tests
prometheus-compliance-tests -     🔗  ✅ 21  ❌ 0  ⏭ 0    🔗 See Details

 build-and-test workflow has finished in 23 seconds (51 minutes 49 seconds less than main branch avg.) and finished at 26th Jan, 2023.


Job Failed Steps Tests
unittest-matrix (1.19, exporter) N/A  ✅ 2429  ❌ 0  ⏭ 0    🔗 See Details
unittest-matrix (1.18, exporter) N/A  ✅ 2429  ❌ 0  ⏭ 0    🔗 See Details
unittest-matrix (1.19, receiver-1) N/A  ✅ 1896  ❌ 0  ⏭ 0    🔗 See Details
unittest-matrix (1.18, receiver-1) N/A  ✅ 1896  ❌ 0  ⏭ 0    🔗 See Details
unittest-matrix (1.19, other) N/A  ✅ 4629  ❌ 0  ⏭ 0    🔗 See Details
unittest-matrix (1.18, other) N/A  ✅ 4629  ❌ 0  ⏭ 0    🔗 See Details

@albertteoh
Contributor Author

The CI test failure appears to be unrelated to this PR:

=== RUN   TestReceiveLogs/1_log_event_per_payload_(configured_max_content_length_is_same_as_event_size)
    client_test.go:658: 
        	Error Trace:	/home/runner/work/opentelemetry-collector-contrib/opentelemetry-collector-contrib/exporter/splunkhecexporter/client_test.go:658
        	Error:      	"{\"time\":0.002,\"host\":\"myhost\",\"source\":\"myapp\",\"sourcetype\":\"myapp-type\",\"index\":\"myindex\",\"event\":\"mylog\",\"fields\":{\"custom\":\"custom\",\"otel.log.name\":\"0_0_2\"}}" does not contain "\"otel.log.name\":\"0_0_1\""

Member

@dmitryax dmitryax left a comment

Thanks @albertteoh

@dmitryax dmitryax merged commit 02aa38d into open-telemetry:main Jan 26, 2023
@albertteoh albertteoh deleted the fix-flaky-test branch January 26, 2023 04:13
Labels
processor/spanmetrics Span Metrics processor
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Flaky test TestProcessorConsumeMetricsErrors
3 participants