
Why is 0.0.0.0 the default host? #8510

Closed

yurishkuro opened this issue Sep 23, 2023 · 28 comments

@yurishkuro
Member

I am running OTLP receivers with default settings:

receivers:
  otlp:
    protocols:
      grpc:
      http:

The start up logs are littered with these warnings:

internal@v0.85.0/warning.go:40 Using the 0.0.0.0 address exposes this server to every network interface, which may facilitate Denial of Service attacks {"kind": "receiver", "name": "otlp", "data_type": "traces", "documentation": "https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/security-best-practices.md#safeguards-against-denial-of-service-attacks"}

Why isn't the default config for those receivers such that it follows the mentioned best practices?
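
For readers hitting the same warning: binding the endpoints explicitly to a loopback address makes it go away. A minimal sketch, assuming the standard OTLP ports:

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: localhost:4317   # or 127.0.0.1:4317
      http:
        endpoint: localhost:4318   # or 127.0.0.1:4318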

@mx-psi
Member

mx-psi commented Sep 25, 2023

Previous discussion can be found in #6151. If I recall correctly, we ultimately decided in a SIG meeting around that time not to change the default, for backwards-compatibility reasons.

@mx-psi
Member

mx-psi commented Sep 25, 2023

If we were to change it, what default value would you suggest? localhost? Another problem here is that there is no good default value that works in all cases. Specifically, the containerized cases usually depend on the network configuration; we are able to provide a good default in the Helm chart but not in other cases.

@yurishkuro
Member Author

If we believe the default value is unsafe, it should be changed (despite backwards compatibility concerns). If it is safe (and appropriate in some scenarios) it should not log the warning.

The current situation is a bad user experience.

@atoulme
Contributor

atoulme commented Sep 25, 2023

I believe 127.0.0.1 should be the default. There is no good reason to have warnings show up with the default settings.
We should have an info message indicating we're using 127.0.0.1.
For our distros and Docker image, the default config should have a way to override the default by using an environment variable. That should be documented. The env var should be a well-known name that we standardize on.

@mx-psi
Member

mx-psi commented Sep 26, 2023

the default config should have a way to override the default by using an environment variable

We don't support this (maybe we should), see #4384 for a related feature request.

@mx-psi
Member

mx-psi commented Sep 26, 2023

Filed open-telemetry/opentelemetry-collector-releases#408 so we can address the Docker case independently from the general default.

A quick search on contrib reveals that at least the following components in that repository are affected:

But there may be others, e.g. the sapm receiver, which sets the endpoint to :7276, also a valid syntax for an unspecified address.

@Aneurysm9
Member

the default config should have a way to override the default by using an environment variable

We don't support this (maybe we should), see #4384 for a related feature request.

While I think that would be a nice addition to the env confmap provider, I don't think it's necessary (or sufficient) here. All default configuration can be overridden by explicit configuration and all explicit configuration can be provided through the environment. Allowing fallbacks to be specified for the env provider would still require changes to the input configuration to specify that those fields should come from that provider and would need to specify the fallback at point of use.

Making some configuration fields take input from the environment in the absence of explicit configuration would be a much more significant change and one that I worry would make it even more difficult to reason about the effective configuration that would result from any given collector invocation.

@yurishkuro
Member Author

I agree, a fallback env var seems like overkill, and a slippery slope. Hopefully most people manage configurations as code / programmatically, so they can always do this themselves by reusing some base var={desired-ip} in their configs.

This issue is about the warning: we can either change the default or remove the warning. I personally would be fine with just removing the warning, because 0.0.0.0 is a reasonable default in many cases (e.g. the :8080 string in Go means 0.0.0.0:8080, and nobody bats an eye at that).
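
As a rough illustration of that point, the bind host can already be parameterized through environment-variable expansion in the config file. OTLP_HOST below is just an example name, the exact ${env:...} syntax depends on the collector version, and there is still no built-in fallback when the variable is unset (which is what #4384 asks for):

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: ${env:OTLP_HOST}:4317
      http:
        endpoint: ${env:OTLP_HOST}:4318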

@mx-psi
Member

mx-psi commented Sep 26, 2023

For context (see the issue linked in my first message), the main rationale for the warning was that this is a known security weakness (CWE-1327) and that we had a specific vulnerability (GHSA-69cg-p879-7622) where the current default made the security implications worse than they could have been.

Since this is ultimately a security-related decision I would be interested in what the @open-telemetry/sig-security-maintainers think.

@yurishkuro
Member Author

GHSA-69cg-p879-7622 doesn't look like it's related at all: "HTTP/2 connection can hang during closing if shutdown were preempted by a fatal error"

CWE-1327 - just because it exists doesn't make it a valid one. Starting a server that does not require TLS is also not the best practice, but it's important for dev usability. 0.0.0.0 doesn't seem any different. Do we log a warning when starting servers with Insecure settings?

@atoulme
Contributor

atoulme commented Sep 26, 2023

CWE-1327 - just because it exists doesn't make it a valid one. Starting a server that does not require TLS is also not the best practice, but it's important for dev usability. 0.0.0.0 doesn't seem any different. Do we log a warning when starting servers with Insecure settings?

We could and should :)

@mx-psi
Member

mx-psi commented Sep 27, 2023

GHSA-69cg-p879-7622 doesn't look like it's related at all: "HTTP/2 connection can hang during closing if shutdown were preempted by a fatal error"

This is described in more detail in #6151 and was discussed on the SIG meeting at the time. Copying from the issue:

While some level of impact was unavoidable on this receiver since exposing an HTTP/2 server is part of its core functionality, the OTLP receiver default endpoints (0.0.0.0:4317 and 0.0.0.0:4318) made it so that this vulnerability could be leveraged more easily than it could have been.

So the relationship is that because of binding to all network interfaces the vulnerability was easier to exploit.

CWE-1327 - just because it exists doesn't make it a valid one.

The fact that it is listed as a CWE should count as some evidence that this is important; it's fair to argue against it, but you should provide equivalent evidence against it.

Starting a server that does not require TLS is also not the best practice, but it's important for dev usability. 0.0.0.0 doesn't seem any different. Do we log a warning when starting servers with Insecure settings?

Actually this is a good case for doing something for 0.0.0.0. We enable TLS by default (see here), so I don't think that's a good argument against nudging users to be secure by default when it comes to choosing what network interfaces to bind to.

The equivalent to the approach we took for TLS would be to change the default to localhost. Here instead we chose to warn people about it and to change the defaults in downstream tools (e.g. the Helm chart) because of backwards-compatibility concerns. The only difference is that we got to the TLS case sooner, when we had fewer users and were making breaking changes more frequently.

@jpkrohling
Member

I think we have warned users long enough already; we could indeed consider using localhost as the default host to bind the port to.

@mx-psi
Member

mx-psi commented Sep 27, 2023

Inspired by #7769 (comment), I want to hold a vote on what we should do here.

What should we do with this issue?

🚀 Do nothing, leave the warning as is, don't change the default
❤️ Change default to localhost on all receivers and remove the warning
🎉 Change default to localhost on all receivers but don't remove the warning
👀 Remove the warning, make no changes to the defaults

Vote for all options that are acceptable to you

@codeboten
Contributor

@mx-psi another option would be to make this a feature-gate to give people a chance to change over more smoothly

@mx-psi
Member

mx-psi commented Sep 27, 2023

@mx-psi another option would be to make this a feature-gate to give people a chance to change over more smoothly

I'd say we can use the poll to decide what the end result should be and we can discuss the process to get there once we know that. Does that sound okay?

@hughesjj

hughesjj commented Oct 2, 2023

Well, if we're voting, so far "change default to localhost" is winning, leaning towards additionally removing the warning.

@mx-psi
Member

mx-psi commented Oct 3, 2023

Well, if we're voting, so far "change default to localhost" is winning, leaning towards additionally removing the warning.

Indeed. I think doing this with a feature gate as @codeboten suggested makes sense. There is one slight inconvenience, which is that this feature gate must be shared across the core and contrib modules. One way to work around this is, in contrib, to VisitAll the gates in the global registry until finding the one with the matching ID(). Another way is to make it part of the public API of core. I will open a couple of PRs with the first approach by the end of the week.
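
A rough sketch of that VisitAll approach, assuming the go.opentelemetry.io/collector/featuregate API; lookupGate is a hypothetical helper, not the actual implementation:

package main

import (
	"fmt"

	"go.opentelemetry.io/collector/featuregate"
)

// lookupGate scans the global registry for a gate with the given ID, so that
// contrib can reuse a gate that core already registered instead of
// registering a second copy.
func lookupGate(id string) (*featuregate.Gate, bool) {
	var found *featuregate.Gate
	featuregate.GlobalRegistry().VisitAll(func(g *featuregate.Gate) {
		if g.ID() == id {
			found = g
		}
	})
	return found, found != nil
}

func main() {
	if gate, ok := lookupGate("component.UseLocalHostAsDefaultHost"); ok {
		fmt.Println("localhost-default gate enabled:", gate.IsEnabled())
	} else {
		fmt.Println("gate not registered in this binary")
	}
}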

@mx-psi
Member

mx-psi commented Oct 5, 2023

PTAL at #8622 as a first step; I am working on the contrib PR but would want to validate the design here first. I ended up exposing this as part of the public API since otherwise it is hard to reason about initialization and registration order.

@bboreham
Contributor

Hi, just wanted to suggest that you add a warning somewhere in the docs about the case where localhost gets resolved via DNS to some real address and things go very weird. This has happened enough times to me that I reflexively type 127.0.0.1 instead, but I recognize that IPv6 is a thing.

@mx-psi
Member

mx-psi commented Jan 22, 2024

@bboreham moved this to #9338, thanks for the suggestion :)

mx-psi added a commit that referenced this issue Jan 24, 2024
…ocalhost defaults for server-like components (#8622)

**Description:** 

- Define `component.UseLocalHostAsDefaultHost` in the
`internal/localhostgate` package.
- Define `featuregate.ErrIsAlreadyRegistered` error, returned by
`Register` when a gate is already registered.
- Adds support for the localhost gate on the OTLP receiver.

This PR does not remove the current warning in any way; we can remove
it separately.

**Link to tracking Issue:** Updates #8510

**Testing:** Adds unit tests

**Documentation:** Document on OTLP receiver template and add related
logging.
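
A minimal sketch of how a server-like component might use such a gate to pick its default endpoint; the gate ID and helper names below are illustrative, not the actual OTLP receiver code:

package main

import (
	"fmt"

	"go.opentelemetry.io/collector/featuregate"
)

// useLocalHost is an illustrative registration; the real gate lives in the
// collector's internal/localhostgate package.
var useLocalHost = featuregate.GlobalRegistry().MustRegister(
	"example.UseLocalHostAsDefaultHost",
	featuregate.StageAlpha,
	featuregate.WithRegisterDescription("controls whether server-like components default to localhost"),
)

// defaultEndpoint returns the default bind address for a port, depending on
// whether the localhost gate is enabled.
func defaultEndpoint(port int) string {
	host := "0.0.0.0"
	if useLocalHost.IsEnabled() {
		host = "localhost"
	}
	return fmt.Sprintf("%s:%d", host, port)
}

func main() {
	fmt.Println(defaultEndpoint(4317)) // prints 0.0.0.0:4317 while the gate is disabled
}
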
mx-psi added a commit that referenced this issue Jun 27, 2024
…#10352)

#### Description

Promotes `component.UseLocalHostAsDefaultHost` feature gate to beta.

#### Link to tracking issue

Updates #8510
mx-psi added a commit to open-telemetry/opentelemetry-collector-contrib that referenced this issue Jun 27, 2024
…eta (#33658)

**Description:**

Counterpart to open-telemetry/opentelemetry-collector/pull/10352.

**Link to tracking Issue:**
open-telemetry/opentelemetry-collector/issues/8510
@yurishkuro
Member Author

Odd. v0.104.0 release notes say:

otlpreceiver: Switch to localhost as the default for all endpoints

But when I upgrade Jaeger to this version I am still getting warnings:

2024-07-01T19:38:52.041-0400	info	otlpreceiver@v0.104.0/otlp.go:102	Starting GRPC server	{"kind": "receiver", "name": "otlp", "data_type": "traces", "endpoint": "0.0.0.0:4317"}
2024-07-01T19:38:52.041-0400	warn	internal@v0.104.0/warning.go:42	Using the 0.0.0.0 address exposes this server to every network interface, which may facilitate Denial of Service attacks. Enable the feature gate to change the default and remove this warning.	{"kind": "receiver", "name": "otlp", "data_type": "traces", "documentation": "https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/security-best-practices.md#safeguards-against-denial-of-service-attacks", "feature gate ID": "component.UseLocalHostAsDefaultHost"}
2024-07-01T19:38:52.041-0400	info	otlpreceiver@v0.104.0/otlp.go:152	Starting HTTP server	{"kind": "receiver", "name": "otlp", "data_type": "traces", "endpoint": "0.0.0.0:4318"}

We use the default config:

receivers:
  otlp:
    protocols:
      grpc:
      http:

@TylerHelmuth
Member

TylerHelmuth commented Jul 2, 2024

I haven't been able to reproduce this yet; the otelcorecol build in the repo produces:

❯ ./bin/otelcorecol_darwin_arm64 --config ./bin/config.yaml
2024-07-01T19:58:27.751-0600    info    service@v0.104.0/service.go:115 Setting up own telemetry...
2024-07-01T19:58:27.751-0600    info    service@v0.104.0/telemetry.go:96        Serving metrics {"address": ":8888", "level": "Normal"}
2024-07-01T19:58:27.752-0600    info    exporter@v0.104.0/exporter.go:280       Development component. May change in the future.        {"kind": "exporter", "data_type": "traces", "name": "debug"}
2024-07-01T19:58:27.753-0600    info    service@v0.104.0/service.go:193 Starting otelcorecol... {"Version": "0.104.0-dev", "NumCPU": 10}
2024-07-01T19:58:27.753-0600    info    extensions/extensions.go:34     Starting extensions...
2024-07-01T19:58:27.753-0600    info    otlpreceiver@v0.104.0/otlp.go:102       Starting GRPC server    {"kind": "receiver", "name": "otlp", "data_type": "traces", "endpoint": "localhost:4317"}
2024-07-01T19:58:27.757-0600    info    otlpreceiver@v0.104.0/otlp.go:152       Starting HTTP server    {"kind": "receiver", "name": "otlp", "data_type": "traces", "endpoint": "localhost:4318"}
2024-07-01T19:58:27.757-0600    info    service@v0.104.0/service.go:219 Everything is ready. Begin running and processing data.

with this config

receivers:
  otlp:
    protocols:
      grpc:
      http:

exporters:
  debug:

service:
  pipelines:
    traces:
      receivers:
        - otlp
      exporters:
        - debug

Also tested with otelcontribcol in contrib and the k8s distribution image.

All of those have main.go files that are generated by ocb and Jaeger's isn't; I wonder if that is related to the issue.

@yurishkuro
Member Author

Is the feature gate UseLocalHostAsDefaultHost being set in those generated main.go files? Maybe we could set it manually in Jaeger. I am not too familiar with feature gates - can I override their defaults while still letting the user provide their own overrides?

@mx-psi
Member

mx-psi commented Jul 2, 2024

It seems like the issue is specific to how Jaeger uses the Collector APIs, but I can't tell exactly what's going wrong.

@yurishkuro

The default value is set here:

var UseLocalHostAsDefaultHostfeatureGate = mustRegisterOrLoad(

The override is passed here:

flagSet := flags(featuregate.GlobalRegistry())

and eventually Cobra calls the Set method on the flag, which then updates the gate's value through the registry's Set method.

This should work the same on Jaeger (you are calling NewCommand here).

If you want to override the value in Jaeger using Go code, you can call featuregate.GlobalRegistry().Set(id, val). It also seems like the --feature-gates CLI flag should work (I haven't tested that).
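
A minimal sketch of that programmatic override; it assumes the collector components that register the gate are linked into the binary, since Set returns an error for an unknown gate ID:

package main

import (
	"log"

	"go.opentelemetry.io/collector/featuregate"
)

func main() {
	// Flip the gate before constructing and starting the collector service.
	// Set fails if nothing linked into the binary has registered a gate with
	// this ID, so do it after the relevant collector packages are imported.
	if err := featuregate.GlobalRegistry().Set("component.UseLocalHostAsDefaultHost", true); err != nil {
		log.Printf("could not enable feature gate: %v", err)
	}
	// ... build and run the collector as usual (e.g. via otelcol.NewCommand).
}

The --feature-gates=component.UseLocalHostAsDefaultHost CLI flag mentioned above should achieve the same result without code changes.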

@yurishkuro
Member Author

@mx-psi I figured it out - it turns out I was trying with v0.104 of the collector but still v0.103 of contrib. And it turns out the same feature gate is defined in both core and contrib, with the same ID, and which one wins depends on module loading order.

@marcalff
Member

marcalff commented Jul 3, 2024

The release notes for the collector only mention this issue, but not the PR number that changes the endpoint to localhost by default.

A couple of questions:

  • Could anyone link the associated PR?
  • Can this issue be closed then?

@mx-psi
Member

mx-psi commented Jul 3, 2024

@yurishkuro Ah, right, that makes sense indeed. I would argue we don't support using core and contrib versions that are not aligned, but for future feature gates it may make sense to do this differently.

@marcalff

could anyone link the associated PR ?

The PRs that should have closed this are #10352 and open-telemetry/opentelemetry-collector-contrib/pull/33658.

There is also a long tail of older PRs working on this, see #6267, #8622, #6959, all the PRs linked on open-telemetry/opentelemetry-collector-contrib#30702, and the updates to other projects, default configs or docs made by open-telemetry/opentelemetry.io/pull/3847, open-telemetry/opentelemetry-helm-charts/pull/1012 and open-telemetry/opentelemetry-collector-releases/pull/408.

can this issue be closed then ?

Yes! Closing :)

@mx-psi mx-psi closed this as completed Jul 3, 2024
codeboten pushed a commit that referenced this issue Jul 3, 2024
…10529)

Fixes a log that was intended to be emitted when the feature gate is enabled, not
disabled.

#### Link to tracking issue

Relates to #8510, updates #10352

---------

Co-authored-by: Yang Song <songy23@users.noreply.github.com>
codeboten pushed a commit to open-telemetry/opentelemetry-collector-contrib that referenced this issue Jul 5, 2024