Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Embedded Otel Collector into Agent #3800

Merged
merged 62 commits into from
Dec 19, 2023
Merged

Conversation

michalpristas
Copy link
Contributor

@michalpristas michalpristas commented Nov 22, 2023

What does this PR do?

This PR adds an ability to start agent in collector mode.
So far logic is simple. When specifying config using -c flag it checks the name of the config.
If config is otel, otlp, otlpcol then it starts in otel mode.

When name does not match but config contains service and exporters, importers then we assume collector config and start in otel mode.

Otherwise we start in a standard way.

On top of that, we provide custom config provider which serves a default value for logging output == stdout instead of otel default stderr.

Tested on darwin and windows. when changing elastic-agent.yml to contain otel config even service is started in otel mode

DoD from issue

  • A log message clearly states we're starting in OTel mode and its experimental status
  • We should not open any ports on the system for the coordinator
  • Log messages from OTel collector and agent should go to stdout

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in ./changelog/fragments using the changelog tool
  • I have added an integration test or an E2E test

@michalpristas michalpristas added enhancement New feature or request Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team Team:Elastic-Agent Label for the Agent team backport-skip labels Nov 22, 2023
@michalpristas michalpristas self-assigned this Nov 22, 2023
@cmacknz
Copy link
Member

cmacknz commented Nov 23, 2023

Duplicating the test requirements here so we know when this is done, at least the first test should exist as part of this PR. The second could be a follow up.

Test requirements

We should start with some very basic e2e tests that ensure that we don't break OTel mode:

  • One test that uses an otlpreceiver and the file exporter to verify we can receive OTLP logs
  • One test that uses the filelogreceiver to ingest a very simple log file and exports it via OTLP to APM Server on ESS. Query the logs-apm* indices to verify the logs were delivered.

// not otel continue as usual
return runElasticAgent(ctx, cancel, override, stop, testingMode, fleetInitTimeout, modifiers...)

}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would really like to see otel integrated into the coordinator and runtime system. That way we get all the status reporting, and the ability to run both at the same time.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

requirement for now was not to do that: https://github.com/elastic/ingest-dev/issues/2612#issue-1976319527
as i understood it.
if we decide to run both at the same time we need to figure out/revisit requirements regarding control ports, logging (you cannot pass logger to collector so far) and making sure otel config is embeddable into our config

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@joshdover what was your idea of the scope for the initial integration?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even if we didn't want to allow both to be ran at the same time, by not using the coordinator and runtime means that none of the status reporting will be correct. Don't we want elastic-agent status to work?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My idea was to keep it as simple as possible for now to allow us to experiment hands on with it. We don't have short-term plans to integrate OTel Collector into Agent mode, it will come later.

I think we can discuss how elastic-agent status should work - not against supporting that and using the coordinator to do it. But I still think we can merge this PR with this gap, aiming for very small chunks of work here.

@ycombinator
Copy link
Contributor

@michalpristas Left a couple of very minor comments/questions, mostly looks like some missing cleanup.

Copy link
Contributor

@ycombinator ycombinator left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM — we now have a foundation to build on — thanks!

@michalpristas
Copy link
Contributor Author

michalpristas commented Dec 18, 2023

e2e failing on endpoint not being able to connect to ES
@cmacknz @pierrehilbert could we force push? sonar is complaining about coverage but this PR is mostly tests (code lines wise besides config) not sure why it's so pessimistic.
e2e failures not related

internal/pkg/otel/run.go Outdated Show resolved Hide resolved
@cmacknz
Copy link
Member

cmacknz commented Dec 18, 2023

For reference this PR increases the size of the elastic-agent binary by 22 MB on my Mac.

du -sh data/elastic-agent-1af429/elastic-agent.app/Contents/MacOS/elastic-agent
 74M    data/elastic-agent-1af429/elastic-agent.app/Contents/MacOS/elastic-agent

du -sh ../elastic-agent-8.11.3-darwin-aarch64/data/elastic-agent-f4f6fb/elastic-agent.app/Contents/MacOS/elastic-agent
 52M    ../elastic-agent-8.11.3-darwin-aarch64/data/elastic-agent-f4f6fb/elastic-agent.app/Contents/MacOS/elastic-agent

Copy link

Quality Gate failed Quality Gate failed

Failed conditions

26.7% 26.7% Coverage on New Code (required ≥ 40%)

See analysis details on SonarQube

@michalpristas
Copy link
Contributor Author

failing on unrelated e2e again

@cmacknz
Copy link
Member

cmacknz commented Dec 19, 2023

Both test failures are known issues. I'm going to force merge this one.

@cmacknz cmacknz merged commit 1410d2a into elastic:main Dec 19, 2023
6 of 7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-skip enhancement New feature or request skip-changelog Team:Elastic-Agent Label for the Agent team Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants