[cmd/opampsupervisor] RemoteConfigStatus is not populated with failed on invalid config #34785

Open
Asarew opened this issue Aug 21, 2024 · 2 comments
Asarew commented Aug 21, 2024

Component(s)

cmd/opampsupervisor

Is your feature request related to a problem? Please describe.

When the OTel controller pushes "invalid" remote configuration down to the supervisor, the supervisor does not report `Status == Failed` in its `RemoteConfigStatus`. It does report `Healthy == false` in `ComponentHealth` with a `LastError`, but relying on that appears to break the OpAMP specification, and the health message carries no details about what went wrong with the configuration.

What is happening:

  1. Pushed down valid YAML, but with an invalid collector config:
    &protobufs.AgentRemoteConfig{
        Config: &protobufs.AgentConfigMap{
            ConfigHash: []byte("abc123"),
            ConfigMap: map[string]*protobufs.AgentConfigFile{
                "": {
                    ContentType: "text/yaml",
                    Body: []byte(`
                      receivers:
                        nop:
                      exporters:
                        nop:
                      service:
                        pipelines:
                          traces/3:
                            receivers: [nop]
                            exporters: [nop]
                      force_invalid:
                        config:
                          because: "of unknown fields"
                    `),
                },
            },
        },
    }
  2. The first message sent by the supervisor has this RemoteConfigStatus (with the corresponding LastRemoteConfigHash):
    &protobufs.RemoteConfigStatus{
        LastRemoteConfigHash: []byte("abc123"),
        Status:               protobufs.RemoteConfigStatuses_RemoteConfigStatuses_APPLIED,
    }
  3. The supervisor reports ComponentHealth.Healthy == false every 5 seconds, with ComponentHealth.LastError:
    Agent process PID={*} exited unexpectedly, exit code=1. Will restart in a bit...
    
  4. The agent.log file gets rewritten every 5 seconds with:
    Error: failed to get config: cannot unmarshal the configuration: decoding failed due to the following error(s):
    
    '' has invalid keys: force_invalid
    2024/08/21 13:01:42 collector server run finished with error: failed to get config: cannot unmarshal the configuration: decoding failed due to the following error(s):
    
    '' has invalid keys: force_invalid
    
    

Describe the solution you'd like

Run the collector's `validate` command before starting the agent. If validation fails, report the error message back in `RemoteConfigStatus.ErrorMessage` with the status correctly set to Failed.

Describe alternatives you've considered

"Reuse" the ComponentHealth as the RemoteConfigStatus for now, but in my opinion that is a poor implementation of the OpAMP spec on both the controller and the supervisor side.

Additional context

No response

@Asarew added the enhancement (New feature or request) and needs triage (New item requiring triage) labels on Aug 21, 2024
Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@BinaryFissionGames (Contributor) commented:
Yep, this is absolutely something that's missing right now. It's tracked here:
#21079

Looks like there was a PR opened for this but it slipped through the cracks somehow.
