
OTLP receivers fail with latest docker image (0.104.0) #33896

Closed
honestegg opened this issue Jul 3, 2024 · 4 comments
Labels: question (Further information is requested)

Comments

honestegg commented Jul 3, 2024

Component(s)

No response

What happened?

Description

Unable to connect to the OTLP receiver in version 0.104.0 of the Collector when using the Docker image. This happens for both gRPC and HTTP.

Potential Fix

This can be fixed by setting the receiver endpoints to 0.0.0.0. It's frustrating that this basic setup doesn't work "out of the box".

Collector version

v0.104.0

Environment information

Environment

Docker compose file:

services:
  otel-collector01:
    image: otel/opentelemetry-collector-contrib:0.104.0
    volumes:
      - ./otel-collector-config.yaml:/etc/otelcol-contrib/config.yaml
    ports:
      - "4317:4317"
      - "4318:4318"
  aspire-dashboard:
    image: mcr.microsoft.com/dotnet/aspire-dashboard:latest
    environment:
      - DOTNET_DASHBOARD_UNSECURED_ALLOW_ANONYMOUS=true
    ports:
      - "18888:18888"
      - "18889:18889"

OpenTelemetry Collector configuration

receivers:
  otlp:
    protocols:
      grpc:
      http:

processors:
  batch:

exporters:
  otlp:
    endpoint: "aspire-dashboard:18889"
    tls:
      insecure: true

service:  
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp]
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp]

Log output

2024-07-03T21:46:07.8688781Z:Exporter failed send data to collector to {0} endpoint. Data will not be sent. Exception: {1}{http://localhost:4317/}{Grpc.Core.RpcException: Status(StatusCode="Unavailable", Detail="Error starting gRPC call. HttpRequestException: An HTTP/2 connection could not be established because the server did not complete the HTTP/2 handshake. (InvalidResponse) HttpIOException: An HTTP/2 connection could not be established because the server did not complete the HTTP/2 handshake. (InvalidResponse) HttpIOException: The response ended prematurely while waiting for the next frame from the server. (ResponseEnded)", DebugException="System.Net.Http.HttpRequestException: An HTTP/2 connection could not be established because the server did not complete the HTTP/2 handshake. (InvalidResponse)")
 ---> System.Net.Http.HttpRequestException: An HTTP/2 connection could not be established because the server did not complete the HTTP/2 handshake. (InvalidResponse)
 ---> System.Net.Http.HttpIOException: An HTTP/2 connection could not be established because the server did not complete the HTTP/2 handshake. (InvalidResponse)
 ---> System.Net.Http.HttpIOException: The response ended prematurely while waiting for the next frame from the server. (ResponseEnded)
   at System.Net.Http.Http2Connection.<ReadFrameAsync>g__ThrowMissingFrame|61_1()
   at System.Net.Http.Http2Connection.ReadFrameAsync(Boolean initialFrame)
   at System.Net.Http.Http2Connection.ProcessIncomingFramesAsync()
   --- End of inner exception stack trace ---
   at System.Net.Http.Http2Connection.ProcessIncomingFramesAsync()
   at System.Net.Http.Http2Connection.SendHeadersAsync(HttpRequestMessage request, CancellationToken cancellationToken, Boolean mustFlush)
   at System.Net.Http.Http2Connection.SendAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
   --- End of inner exception stack trace ---
   at System.Net.Http.Http2Connection.SendAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
   at System.Net.Http.HttpConnectionPool.SendWithVersionDetectionAndRetryAsync(HttpRequestMessage request, Boolean async, Boolean doRequestAuth, CancellationToken cancellationToken)
   at System.Net.Http.DiagnosticsHandler.SendAsyncCore(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
   at System.Net.Http.RedirectHandler.SendAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
   at Grpc.Net.Client.Balancer.Internal.BalancerHttpHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
   at Grpc.Net.Client.Internal.GrpcCall`2.RunCall(HttpRequestMessage request, Nullable`1 timeout)
   --- End of inner exception stack trace ---
   at Grpc.Net.Client.Internal.HttpClientCallInvoker.BlockingUnaryCall[TRequest,TResponse](Method`2 method, String host, CallOptions options, TRequest request)
   at Grpc.Core.Interceptors.InterceptingCallInvoker.<BlockingUnaryCall>b__3_0[TRequest,TResponse](TRequest req, ClientInterceptorContext`2 ctx)
   at Grpc.Core.ClientBase.ClientBaseConfiguration.ClientBaseConfigurationInterceptor.BlockingUnaryCall[TRequest,TResponse](TRequest request, ClientInterceptorContext`2 context, BlockingUnaryCallContinuation`2 continuation)
   at Grpc.Core.Interceptors.InterceptingCallInvoker.BlockingUnaryCall[TRequest,TResponse](Method`2 method, String host, CallOptions options, TRequest request)
   at OpenTelemetry.Proto.Collector.Trace.V1.TraceService.TraceServiceClient.Export(ExportTraceServiceRequest request, CallOptions options)
   at OpenTelemetry.Proto.Collector.Trace.V1.TraceService.TraceServiceClient.Export(ExportTraceServiceRequest request, Metadata headers, Nullable`1 deadline, CancellationToken cancellationToken)
   at OpenTelemetry.Exporter.OpenTelemetryProtocol.Implementation.ExportClient.OtlpGrpcTraceExportClient.SendExportRequest(ExportTraceServiceRequest request, DateTime deadlineUtc, CancellationToken cancellationToken)}

Additional context

No response

@honestegg added the "bug" and "needs triage" labels on Jul 3, 2024
mx-psi (Member) commented Jul 5, 2024

To follow security best practices, we made a breaking change in v0.104.0, listed in our release changelogs (core, contrib, releases), that changes the default bind address from 0.0.0.0 to localhost. You can see this in your Collector's logs:

otel-collector01-1  | 2024-07-05T11:47:36.977Z	info	otlpreceiver@v0.104.0/otlp.go:102	Starting GRPC server	{"kind": "receiver", "name": "otlp", "data_type": "traces", "endpoint": "localhost:4317"}
otel-collector01-1  | 2024-07-05T11:47:36.978Z	info	otlpreceiver@v0.104.0/otlp.go:152	Starting HTTP server	{"kind": "receiver", "name": "otlp", "data_type": "traces", "endpoint": "localhost:4318"}

There has been a warning about this in some form since v0.63.0, and a more specific one since v0.94.0. For example, if I change your Docker Compose configuration to point to v0.103.0, I can see this warning:

otel-collector01-1  | 2024-07-05T11:51:39.685Z	warn	localhostgate/featuregate.go:63	The default endpoints for all servers in components will change to use localhost instead of 0.0.0.0 in a future version. Use the feature gate to preview the new default.	{"feature gate ID": "component.UseLocalHostAsDefaultHost"}
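
If you want to preview the new default while still on v0.103.0, here is a minimal sketch (assuming the Collector's --feature-gates command-line flag and the contrib image's default config path, which your Compose file already mounts to):

services:
  otel-collector01:
    image: otel/opentelemetry-collector-contrib:0.103.0
    # Enable the gate to preview the localhost default before upgrading
    command: ["--config=/etc/otelcol-contrib/config.yaml", "--feature-gates=component.UseLocalHostAsDefaultHost"]
    volumes:
      - ./otel-collector-config.yaml:/etc/otelcol-contrib/config.yaml
    ports:
      - "4317:4317"
      - "4318:4318"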

You can read more about why we did this and how to address the change in this blogpost.

In short, since v0.104.0 you need to set your receiver endpoints explicitly, since they now default to localhost. In your particular example you can set the endpoint host to 0.0.0.0 or to otel-collector01:

Example using 0.0.0.0 (potentially unsafe)

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317 # Exposed on all network interfaces; may be unsafe depending on network configuration
      http:
        endpoint: 0.0.0.0:4318 # Exposed on all network interfaces; may be unsafe depending on network configuration
Example using the container IP address (specific to Docker Compose)

As documented here, you can use the service name to resolve the container's IP address in Docker Compose, so this is also a valid solution. In this case the receiver is only exposed on a single network interface (it is still reachable from outside Docker Compose since you publish the ports).

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: otel-collector01:4317 # Using the service name from your Docker Compose file
      http:
        endpoint: otel-collector01:4318 # Using the service name from your Docker Compose file

You can also temporarily revert to the previous behavior by disabling the component.UseLocalHostAsDefaultHost feature gate, but note that, like any other feature gate, it will be removed in a future release.
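
As a sketch of that temporary revert (again assuming the --feature-gates flag, where a leading "-" disables a gate, and the config path from the original Compose file), the Compose service could look like:

services:
  otel-collector01:
    image: otel/opentelemetry-collector-contrib:0.104.0
    # Disable the gate to restore the old 0.0.0.0 default (temporary; the gate will be removed in a future release)
    command: ["--config=/etc/otelcol-contrib/config.yaml", "--feature-gates=-component.UseLocalHostAsDefaultHost"]
    volumes:
      - ./otel-collector-config.yaml:/etc/otelcol-contrib/config.yaml
    ports:
      - "4317:4317"
      - "4318:4318"

This only buys time; setting the receiver endpoints explicitly, as shown above, is the long-term fix.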

@mx-psi added the "question" label and removed the "needs triage" and "bug" labels on Jul 5, 2024
mx-psi (Member) commented Jul 5, 2024

Adding to this: if you see any examples that don't work out of the box in our official documentation or in any of the repositories in the open-telemetry org, please report them! We made an effort to keep our documentation updated for this change (see e.g. open-telemetry/opentelemetry.io/pull/3847), but we may have missed some. Also, if you have any ideas on how we could have communicated this change better, please let us know. We used all the tools we had (logs, changelog, the blog...), but I recognize that this is a big change and not all users may check these.

honestegg (Author) commented

Thanks so much for the detailed response! I knew about the change to localhost, but wasn't sure about the right way to configure the collector to work with it. I was frustrated because I had just set up a collector for the first time last week and when it didn't work this week I was pulling my hair out trying to figure out why.

mx-psi (Member) commented Jul 5, 2024

Sorry about that, the timing was unfortunate for sure in your case 😅

About

wasn't sure about the right way to configure the collector to work with it

I filed open-telemetry/opentelemetry-collector/issues/10548 to improve our documentation in that regard. I am also going to pin this issue so other people can find it more easily.
