
[processor/resourcedetection] The ec2 detector does not retry on initial failure #35936

Open
atoulme opened this issue Oct 22, 2024 · 5 comments
Labels
bug Something isn't working processor/resourcedetection Resource detection processor Stale

Comments

atoulme (Contributor) commented Oct 22, 2024

Component(s)

processor/resourcedetection/internal/aws/ec2

What happened?

Description

The ec2 resource detection processor attempts to connect to the AWS metadata endpoint on collector startup to retrieve EC2 machine information. If that connection fails, the detector gives up permanently. In some cases the collector starts before the machine's network is ready, so the detector never obtains the metadata.

Steps to Reproduce

1. Shut down the network interface of the AWS EC2 instance
2. Start the collector on the EC2 instance
3. Notice the ec2 detector reports an error in the logs
4. Bring the network interface of the EC2 instance back up

Expected Result

The collector should eventually add resource attributes with ec2 metadata

Actual Result

The collector never changes the output

Collector version

0.112.0

Environment information

Environment

OS: (e.g., "Ubuntu 20.04")
Compiler(if manually compiled): (e.g., "go 14.2")

OpenTelemetry Collector configuration

No response

Log output

No response

Additional context

No response

@atoulme atoulme added bug Something isn't working needs triage New item requiring triage and removed needs triage New item requiring triage labels Oct 22, 2024
@github-actions github-actions bot added the processor/resourcedetection Resource detection processor label Oct 22, 2024
github-actions bot commented:
Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

dashpole (Contributor) commented:

Is this specific to ec2, or does this apply to other resource detectors as well?

dmitryax (Member) commented:

We have only seen this in ec2, but I assume it's something that potentially can be seen in other detectors.

The problem is that we assume not every configured detector will provide data: detectors that don't apply to the current environment are allowed to fail silently at startup.

I believe the solution is an additional backoff-retry configuration option. When enabled, the processor would keep retrying detection. We could also consider blocking collector startup until detection succeeds or the retries are exhausted.

atoulme (Contributor, Author) commented Nov 3, 2024

Specifically in this case, the code triggered is:

```go
if _, err = d.metadataProvider.InstanceID(ctx); err != nil {
	d.logger.Debug("EC2 metadata unavailable", zap.Error(err))
	return pcommon.NewResource(), "", nil
}
```
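Note that the snippet returns a nil error on failure, which makes the detector look "not applicable" rather than failed, so no outer layer can know to retry. A minimal sketch of the distinction (the types and function names below are hypothetical, not the processor's code):

```go
package main

import (
	"errors"
	"fmt"
)

// resource is a hypothetical stand-in for pcommon.Resource.
type resource struct{ attrs map[string]string }

var errUnreachable = errors.New("EC2 metadata unavailable")

// detectSwallow mirrors the current behavior: on failure it returns an
// empty resource with a nil error, so callers cannot distinguish
// "endpoint temporarily down" from "not running on EC2".
func detectSwallow(ok bool) (resource, error) {
	if !ok {
		return resource{}, nil // error swallowed
	}
	return resource{attrs: map[string]string{"host.id": "i-abc"}}, nil
}

// detectPropagate returns the error, letting an outer retry loop decide
// whether to try again.
func detectPropagate(ok bool) (resource, error) {
	if !ok {
		return resource{}, errUnreachable
	}
	return resource{attrs: map[string]string{"host.id": "i-abc"}}, nil
}

func main() {
	_, err1 := detectSwallow(false)
	_, err2 := detectPropagate(false)
	fmt.Println(err1 == nil, errors.Is(err2, errUnreachable)) // prints: true true
}
```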

github-actions bot commented Jan 3, 2025

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.


@github-actions github-actions bot added the Stale label Jan 3, 2025
3 participants