
Fix TA in clusters with global proxy #3187

Merged: 3 commits into open-telemetry:main on Aug 5, 2024

Conversation

@pavolloffay (Member) commented on Jul 31, 2024:

Operator image: docker.io/pavolloffay/opentelemetry-operator:dev-b67d063e-1722432504

Description:

See the changelog for more info.

On clusters with a global proxy, the following error was happening on the collector:

2024-07-22T21:11:58.189Z       info    service@v0.102.1/service.go:113       Setting up own telemetry...
2024-07-22T21:11:58.189Z       info    service@v0.102.1/telemetry.go:96      Serving metrics {"address": ":8888", "level": "Normal"}
2024-07-22T21:11:58.189Z       info    exporter@v0.102.1/exporter.go:275     Development component. May change in the future.  {"kind": "exporter", "data_type": "metrics", "name": "debug"}
2024-07-22T21:11:58.191Z       info    service@v0.102.1/service.go:180       Starting otelcol...     {"Version": "0.102.1", "NumCPU": 12}
2024-07-22T21:11:58.191Z       info    extensions/extensions.go:34    Starting extensions...
2024-07-22T21:11:58.191Z       info    prometheusreceiver@v0.102.0/metrics_receiver.go:279   Starting discovery manager        {"kind": "receiver", "name": "prometheus", "data_type": "metrics"}
2024-07-22T21:11:58.192Z       info    prometheusreceiver@v0.102.0/metrics_receiver.go:121   Starting target allocator discovery      {"kind": "receiver", "name": "prometheus", "data_type": "metrics"}
2024-07-22T21:11:58.265Z       error   prometheusreceiver@v0.102.0/metrics_receiver.go:154   Failed to retrieve job list        {"kind": "receiver", "name": "prometheus", "data_type": "metrics", "error": "yaml: line 6: mapping values are not allowed in this context"}
github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusreceiver.(*pReceiver).syncTargetAllocator
        github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusreceiver@v0.102.0/metrics_receiver.go:154
github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusreceiver.(*pReceiver).startTargetAllocator
        github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusreceiver@v0.102.0/metrics_receiver.go:123
github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusreceiver.(*pReceiver).Start
        github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusreceiver@v0.102.0/metrics_receiver.go:107
go.opentelemetry.io/collector/service/internal/graph.(*Graph).StartAll
        go.opentelemetry.io/collector/service@v0.102.1/internal/graph/graph.go:421
go.opentelemetry.io/collector/service.(*Service).Start
        go.opentelemetry.io/collector/service@v0.102.1/service.go:198
go.opentelemetry.io/collector/otelcol.(*Collector).setupConfigurationComponents
        go.opentelemetry.io/collector/otelcol@v0.102.1/collector.go:223
go.opentelemetry.io/collector/otelcol.(*Collector).Run
        go.opentelemetry.io/collector/otelcol@v0.102.1/collector.go:277
go.opentelemetry.io/collector/otelcol.NewCommand.func1
        go.opentelemetry.io/collector/otelcol@v0.102.1/command.go:35
github.com/spf13/cobra.(*Command).execute
        github.com/spf13/cobra@v1.8.0/command.go:983
github.com/spf13/cobra.(*Command).ExecuteC
        github.com/spf13/cobra@v1.8.0/command.go:1115
github.com/spf13/cobra.(*Command).Execute
        github.com/spf13/cobra@v1.8.0/command.go:1039
main.runInteractive
        github.com/os-observability/redhat-opentelemetry-collector/main.go:53
main.run
        github.com/os-observability/redhat-opentelemetry-collector/main_others.go:10
main.main
        github.com/os-observability/redhat-opentelemetry-collector/main.go:46
runtime.main
        runtime/proc.go:271
2024-07-22T21:11:58.265Z       info    service@v0.102.1/service.go:243       Starting shutdown...
2024-07-22T21:11:58.265Z       info    extensions/extensions.go:59    Stopping extensions...
2024-07-22T21:11:58.265Z       info    service@v0.102.1/service.go:257       Shutdown complete.
Error: cannot start pipelines: yaml: line 6: mapping values are not allowed in this context
2024/07/22 21:11:58 collector server run finished with error: cannot start pipelines: yaml: line 6: mapping values are not allowed in this context
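
My reading of the failure (an interpretation, not spelled out in the log): with only the short Service name in the target allocator URL, the prometheus receiver's request is routed through the cluster-wide proxy instead of going straight to the in-cluster Service, and the proxy's response is not the job list the receiver expects, hence the YAML parse error. This PR renders the endpoint as a fully qualified in-cluster DNS name, which typical proxy exclusions for `.svc` / `.cluster.local` domains match. A sketch of the rendered `http_sd_configs` entry, reconstructed from the test change later in this thread; the Service name `ta-service` and namespace `default` are placeholders:

```yaml
# Sketch of the scrape config the operator injects for the target allocator,
# based on the test expectation changed in this PR (placeholder names).
scrape_configs:
  - job_name: test_job
    http_sd_configs:
      # before this PR (short Service name, intercepted by a global proxy):
      #   url: http://ta-service:80/jobs/test_job/targets?collector_id=$POD_NAME
      # after this PR (fully qualified in-cluster name, matched by NO_PROXY):
      - url: http://ta-service.default.svc.cluster.local:80/jobs/test_job/targets?collector_id=$POD_NAME
```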

Link to tracking Issue(s):

  • Resolves: #issue-number

Testing:

Documentation:

Signed-off-by: Pavol Loffay <p.loffay@gmail.com>
@pavolloffay pavolloffay requested a review from a team July 31, 2024 13:17
Signed-off-by: Pavol Loffay <p.loffay@gmail.com>
Signed-off-by: Pavol Loffay <p.loffay@gmail.com>
@@ -5,6 +5,7 @@ metadata:
   creationTimestamp: null
   name: targetallocator-kubernetessd
 spec:
+  namespace: chainsaw-targetallocator-kubernetessd
pavolloffay (Member, Author) commented:

Switching to static namespaces for the tests that assert the collector ConfigMap with the TA endpoint (the endpoint now contains the namespace).

We will be able to get rid of this once we migrate to v1beta1 and use the namespace substitution provided by chainsaw.
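
For illustration only, a rough sketch of what a namespace-substituted chainsaw assertion could look like; the `($namespace)` binding and the exact templating usage are assumptions about chainsaw, not something introduced by this PR:

```yaml
# Hypothetical templated chainsaw assertion file: instead of hardcoding
# chainsaw-targetallocator-kubernetessd, the test namespace would be injected
# via chainsaw's built-in ($namespace) binding (assumed usage; the ConfigMap
# name below is a placeholder).
apiVersion: v1
kind: ConfigMap
metadata:
  name: kubernetessd-collector
  namespace: ($namespace)
```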

@@ -279,15 +279,15 @@ func TestAddHTTPSDConfigToPromConfig(t *testing.T) {
 				"job_name": "test_job",
 				"http_sd_configs": []interface{}{
 					map[string]interface{}{
-						"url": fmt.Sprintf("http://%s:80/jobs/%s/targets?collector_id=$POD_NAME", taServiceName, url.QueryEscape("test_job")),
+						"url": fmt.Sprintf("http://%s.default.svc.cluster.local:80/jobs/%s/targets?collector_id=$POD_NAME", taServiceName, url.QueryEscape("test_job")),
Contributor:

I think this should always work... Do we have any concerns around custom DNS resolvers? I guess this would probably fail today for those use cases, but I want to be sure this couldn't possibly break a user.

pavolloffay (Member, Author):

I am not sure I follow. Could you please explain how this fix could break users with a custom DNS resolver?

Using default.svc.cluster.local is pretty standard on k8s.

Contributor:

Yes, I guess my concern is for users with custom DNS policies set (https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/#pod-dns-config). I think I'm being overly cautious.
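
For context on that concern, the linked page describes pods that override cluster DNS entirely. A minimal sketch of such a spec (all names and IPs are placeholders); with a resolver like this, the `*.svc.cluster.local` name rendered by the operator would only resolve if the custom nameserver also serves the cluster domain:

```yaml
# Pod that opts out of cluster DNS via dnsPolicy: "None" (placeholder values).
apiVersion: v1
kind: Pod
metadata:
  name: custom-dns-example
spec:
  containers:
    - name: app
      image: registry.example.com/app:latest
  dnsPolicy: "None"
  dnsConfig:
    nameservers:
      - 192.0.2.10        # custom resolver outside the cluster DNS
    searches:
      - example.internal  # custom search domain, no .svc.cluster.local
```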

@pavolloffay pavolloffay merged commit b20fbd5 into open-telemetry:main Aug 5, 2024
33 checks passed
@foolusion (Contributor):

I believe this change breaks us. Why is this hardcoded as svc.cluster.local? I assume many people use a custom cluster domain.
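
For illustration of that point, the cluster DNS suffix is configurable at the kubelet level, so clusters do exist where Service FQDNs do not end in cluster.local. A minimal sketch with a placeholder domain:

```yaml
# KubeletConfiguration with a non-default cluster domain (placeholder values).
# On such a cluster, Service FQDNs end in .svc.example.internal, so a
# hardcoded .svc.cluster.local suffix would not resolve.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
clusterDomain: example.internal
clusterDNS:
  - 10.96.0.10
```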

@jaronoff97 (Contributor):

@foolusion Sorry to hear that. Would you be able to open a new issue explaining the bug you're running into?
