Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[chore] Update prometheus to match OTel v0.120.0 #42748

Open
wants to merge 16 commits into
base: main
Choose a base branch
from

Conversation

andrzej-stencel
Copy link
Contributor

@andrzej-stencel andrzej-stencel commented Feb 18, 2025

Proposed commit message

Update github.com/prometheus/prometheus dependency to v0.300.1
by running go get github.com/prometheus/prometheus@v0.300.1.
This version matches the version used in OTel v0.120.0,
and this update is needed to include OTel v0.120.0 in Elastic Agent.
See https://github.com/elastic/elastic-agent/pull/6912/files#diff-33ef32bf6c23acb95f5902d7097b7a1d5128ca061167ec0716715b0b9eeaa5f6R496.

Checklist

  • My code follows the style guidelines of this project
  • [ ] I have commented my code, particularly in hard-to-understand areas
  • [ ] I have made corresponding changes to the documentation
  • [ ] I have made corresponding change to the default configuration files
  • [ ] I have added tests that prove my fix is effective or that my feature works
  • [ ] I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

@andrzej-stencel andrzej-stencel requested a review from a team as a code owner February 18, 2025 13:47
@botelastic botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label Feb 18, 2025
Copy link
Contributor

mergify bot commented Feb 18, 2025

This pull request does not have a backport label.
If this is a bug or security fix, could you label this PR @andrzej-stencel? 🙏.
For such, you'll need to label your PR with:

  • The upcoming major version of the Elastic Stack
  • The upcoming minor version of the Elastic Stack (if you're not pushing a breaking change)

To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-8./d is the label to automatically backport to the 8./d branch. /d is the digit
  • backport-active-all is the label that automatically backports to all active branches.
  • backport-active-8 is the label that automatically backports to all active minor branches for the 8 major.
  • backport-active-9 is the label that automatically backports to all active minor branches for the 9 major.

@andrzej-stencel andrzej-stencel added Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team backport-8.x Automated backport to the 8.x branch with mergify labels Feb 18, 2025
@botelastic botelastic bot removed the needs_team Indicates that the issue/PR needs a Team:* label label Feb 18, 2025
@elasticmachine
Copy link
Collaborator

Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)

@andrzej-stencel
Copy link
Contributor Author

cc @blakerouse for 👀 as you created a similar PR before.

Added `fallbackType == ""`  and `skipOMCTSeries == false`.
OMCT series are OpenMetrics CreatedTimestamp timeseries,
and I believe the default for this is false.
@andrzej-stencel
Copy link
Contributor Author

andrzej-stencel commented Feb 18, 2025

I've pushed a commit to fix the compilation error by adding missing parameters to the textparse.New call.

See textparse.New@v0.54.1 (before this change) and textparse.New@v0.300.1 (after this change).

I've set fallbackType argument to empty string "" and skipOMCTSeries to false.
OMCT series are OpenMetrics CreatedTimestamp timeseries,
and I believe the default for this is false.

@andrzej-stencel andrzej-stencel marked this pull request as ready for review February 18, 2025 14:47
@andrzej-stencel andrzej-stencel requested a review from a team as a code owner February 18, 2025 14:47
@pierrehilbert pierrehilbert added the Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team label Feb 18, 2025
@elasticmachine
Copy link
Collaborator

Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)

@cmacknz cmacknz requested a review from a team February 18, 2025 15:48
@cmacknz
Copy link
Member

cmacknz commented Feb 18, 2025

Pinging @elastic/obs-infraobs-integrations as they own the prometheus integration that depends on this and are most likely to know how to test any impact this might have. https://github.com/elastic/integrations/blob/119156a21359b7a56756e91743b18d6970b9c643/packages/prometheus/manifest.yml#L35

@MichaelKatsoulis
Copy link
Contributor

In https://github.com/elastic/beats/blob/26fecc126fe80b648471e401a7b175cf2165ab1b/metricbeat/mb/testing/testdata.go#L119C6-L119C23 the default content type is application/json which in new textParse

https://github.com/prometheus/prometheus/blob/1f56e8492c31a558ccea833027db4bd7f8b6d0e9/model/textparse/interface.go#L128

func New(b []byte, contentType, fallbackType string, parseClassicHistograms, skipOMCTSeries bool, st *labels.SymbolTable) (Parser, error) {
	mediaType, err := extractMediaType(contentType, fallbackType)
	// err may be nil or something we want to warn about.

	switch mediaType {
	case "application/openmetrics-text":
		return NewOpenMetricsParser(b, st, func(o *openMetricsParserOptions) {
			o.SkipCTSeries = skipOMCTSeries
		}), err
	case "application/vnd.google.protobuf":
		return NewProtobufParser(b, parseClassicHistograms, st), err
	case "text/plain":
		return NewPromParser(b, st), err
	default:
		return nil, err
	}
}

is not supported as mediaType. For the same behaviour as in previous versions we could use text/plain instead.

@MichaelKatsoulis MichaelKatsoulis requested a review from a team as a code owner February 19, 2025 13:44
@MichaelKatsoulis
Copy link
Contributor

@pkoutsovasilis before we merge this, I have a concern regarding one test that is Flaky.
It is related to the leaderlection process.
Here is the test:
https://github.com/elastic/beats/blame/e26baff8300e552174c1027db79a3cc0fc067ea9/libbeat/autodiscover/providers/kubernetes/kubernetes_test.go#L102

The test timeouts and it is something that is never happening in the main branch. The test uses the k8s.io library that was also updated as dependency in this PR.
They have changed some stuff in the LeaderElection process between the old version 0.29.5 and 0.31.1.
Maybe it is just that something that the test was taking for granted in old version is not happening in the new.

The test breaks when one node loses the leader lock but stopLeading is not executed.
Because of that when a second nodes becomes the leader, and test enters in lines , then checkLoosingLeaders is called that waits in these lines for the previous node to signal that it stopped leading. That never happens and it timeouts.

That is weird and I tested multiple times and this signal is never received.

@pkoutsovasilis
Copy link
Contributor

Thanks for the heads up @MichaelKatsoulis! I saw the CI green and the code changes made sense to me thus I didn't assume that there isn't any Flaky tests in it 🙂 Please hold your horses people on merging this one. @MichaelKatsoulis please let us know when you are confident about this one

applicationJson = "application/json"
textPlain = "text/plain"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a big deal because it's a test, but why was the content type changed?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#42748 (comment) looks like it explains this

@swiatekm
Copy link
Contributor

@MichaelKatsoulis have you been able to reproduce this test failure locally? I tried on main after bumping the k8s libraries, and it didn't fail once in 100 runs.

@MichaelKatsoulis
Copy link
Contributor

MichaelKatsoulis commented Feb 21, 2025

@MichaelKatsoulis have you been able to reproduce this test failure locally? I tried on main after bumping the k8s libraries, and it didn't fail once in 100 runs.

@swiatekm
Yes it is failing many times locally. Did you clean the go test cache before running the tests?
I was using:

go clean -testcache && /usr/local/go/bin/go test -timeout 30s -run ^TestNewLeaderElectionManager$ github.com/elastic/beats/v7/libbeat/autodiscover/providers/kubernetes -v

Also the PR ci was failing almost every time. Check one buildkite output that shows exactly that.

I updated the test logic because I think it was faulty in the part that it was forcing the lease renewals.
I believe now it is better.
Also I tested in a real cluster and there was no problem with the new library. So there was no bug.

@MichaelKatsoulis
Copy link
Contributor

For me the PR is good to be merged!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-8.x Automated backport to the 8.x branch with mergify Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants