Skip to content

Status subcommand reporting agent status for otel mode - Phase 2#4047

Merged
cmacknz merged 11 commits intoelastic:mainfrom
michalpristas:otel/status
Jan 22, 2024
Merged

Status subcommand reporting agent status for otel mode - Phase 2#4047
cmacknz merged 11 commits intoelastic:mainfrom
michalpristas:otel/status

Conversation

@michalpristas
Copy link
Copy Markdown
Contributor

@michalpristas michalpristas commented Jan 9, 2024

This is a phase 2 of making status for otel work properly

Bolds are differences with previous Phase

Phase 1:

  • agent capable of running in normal OR otel mode
  • coordinator not running at all
  • otel collector running in a separate thread
  • otel agent not able to report state

Phase 2: We are here

  • agent capable of running in normal OR otel mode
  • coordinator running with dumb config provider (no config)
  • otel collector running in a separate thread
  • status reports only agent status (otel status not reported)

Fixes partially: #3872

@michalpristas michalpristas added enhancement New feature or request Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team Team:Elastic-Agent Label for the Agent team backport-skip skip-changelog labels Jan 9, 2024
@michalpristas michalpristas self-assigned this Jan 9, 2024
@michalpristas michalpristas marked this pull request as ready for review January 10, 2024 08:00
@michalpristas michalpristas requested a review from a team as a code owner January 10, 2024 08:00
@elasticmachine
Copy link
Copy Markdown
Contributor

Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)

@pierrehilbert
Copy link
Copy Markdown
Contributor

Failure related to #3342

@michalpristas michalpristas removed the request for review from faec January 10, 2024 08:23
@ycombinator
Copy link
Copy Markdown
Contributor

@michalpristas Just to clarify, phase 1 was implemented as part of #3800, right?

@ycombinator
Copy link
Copy Markdown
Contributor

ycombinator commented Jan 10, 2024

Also, I'm confused by this:

Bolds are differences with previous Phase

Phase 1
otel collector running in a separate thread

Phase 2
otel collector running in a separate thread

What is the difference (indicated by the bolding in Phase 2)?

@michalpristas
Copy link
Copy Markdown
Contributor Author

my bad

@ycombinator
Copy link
Copy Markdown
Contributor

I built Agent from this PR's branch, extracted it, and ran $ ./elastic-agent -c otel.yml status. I'm getting the following output:

Error: failed to communicate with Elastic Agent daemon: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial unix /private/tmp/elastic-agent-8.12.0-SNAPSHOT-darwin-aarch64/elastic-agent.sock: connect: no such file or directory"
For help, please see our troubleshooting guide at https://www.elastic.co/guide/en/fleet/8.12/fleet-troubleshooting.html

Is this expected behavior, even with the changes in this PR?

Please ignore the 8.12 references above; that's because we're currently having an issue with building Agent from main with EXTERNAL=true; the workaround is to also specify BEAT_VERSION=8.12.0.


BTW, I re-wrote the issue that this PR is supposed to resolve to include more details: #3872. Not sure if that changes the requirements for this PR or if a follow up PR will be needed to completely resolve that issue. Apologies if the original issue description was not clear enough.

@michalpristas
Copy link
Copy Markdown
Contributor Author

michalpristas commented Jan 12, 2024

the workflow i tested

  • mage package agent
  • sudo ./elastic-agent run -c otel.yml (alt test without -c to see it works with normal agent behavior)
  • keep it running
  • other terminal sudo ./elastic-agent status it should show output -c flag is not needed

what are your expectations of running ./elastic-agent -c otel.yml status on its own?

@ycombinator
Copy link
Copy Markdown
Contributor

ycombinator commented Jan 12, 2024

what are your expectations of running ./elastic-agent -c otel.yml status on its own?

None, I had a brain fade 😊. Of course this cannot expected to work as we need a running Agent + GRPC server for status to talk to. 🤦

Your workflow makes sense for running Agent in OTel mode. I will test it with this PR. Thanks.

@ycombinator
Copy link
Copy Markdown
Contributor

Confirmed that the steps listed in #4047 (comment) work as expected. Reviewing the implementation now...

Comment thread internal/pkg/agent/cmd/run.go Outdated
Comment thread internal/pkg/agent/cmd/run.go Outdated
@cmacknz
Copy link
Copy Markdown
Member

cmacknz commented Jan 18, 2024

I think this is probably mostly covered by integration tests and the SonarQube complaint can be bypassed.

status reports only agent status (otel status not reported)

It seems to me like the integration test coverage could be improved to cover at least this, by running elastic-agent status in one of the otel tests and confirming it succeeds. Before this change this command would unconditionally fail wouldn't it?

Right now you have no net new test coverage at all, but there is actually added functionality to cover here.

@elastic-sonarqube
Copy link
Copy Markdown

Quality Gate failed Quality Gate failed

Failed conditions

0.0% 0.0% Coverage on New Code (required ≥ 40%)

See analysis details on SonarQube

@michalpristas
Copy link
Copy Markdown
Contributor Author

michalpristas commented Jan 19, 2024

status check added to TestOtelFileProcessing test

Copy link
Copy Markdown
Member

@cmacknz cmacknz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the test!

@cmacknz
Copy link
Copy Markdown
Member

cmacknz commented Jan 22, 2024

Force merging, ignoring the code coverage complaint because this is now covered in the integration tests.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backport-skip enhancement New feature or request skip-changelog Team:Elastic-Agent Label for the Agent team Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants