Skip to content

Conversation

@GreyLilac09
Copy link
Contributor

@GreyLilac09 GreyLilac09 commented Jun 27, 2025

Summary

  • Add a TTL-based cache for metrics sets
  • Add a expire_metrics_secs config for Prometheus remote write sink which uses the TTL-based cache
  • This fixes an issue where incremental metrics are preserved for the lifetime of Vector's runtime, which causes indefinite memory growth

Vector configuration

How did you test this PR?

Example showing vector_component_allocated_bytes running for 12 hours before and after the change setting expire_metrics_secs to 120:

Screenshot 2025-07-02 at 11 23 06 AM Screenshot 2025-07-02 at 11 22 38 AM

with the following sink:

sinks:
  prometheus:
    type: prometheus_remote_write
    inputs:
      - remap_metrics_to_prom
    endpoint: <...>
    batch:
      max_events: 5000
    buffer:
      type: memory
      max_events: 100000
      when_full: drop_newest
    **expire_metrics_secs: 120**
    healthcheck:
      enabled: false

Change Type

  • Bug fix
  • New feature
  • Non-functional (chore, refactoring, docs)
  • Performance

Is this a breaking change?

  • Yes
  • No

Does this PR include user facing changes?

  • Yes. Please add a changelog fragment based on our guidelines.
  • No. A maintainer will apply the no-changelog label to this PR.

References

Notes

  • Please read our Vector contributor resources.
  • Do not hesitate to use @vectordotdev/vector to reach out to us regarding this PR.
  • Some CI checks run only after we manually approve them.
    • We recommend adding a pre-push hook, please see this template.
    • Alternatively, we recommend running the following locally before pushing to the remote branch:
      • cargo fmt --all
      • cargo clippy --workspace --all-targets -- -D warnings
      • cargo nextest run --workspace (alternatively, you can run cargo test --all)
  • After a review is requested, please avoid force pushes to help us review incrementally.
    • Feel free to push as many commits as you want. They will be squashed into one before merging.
    • For example, you can run git merge origin master and git push.
  • If this PR introduces changes Vector dependencies (modifies Cargo.lock), please
    run cargo vdev build licenses to regenerate the license inventory and commit the changes (if any). More details here.

@GreyLilac09 GreyLilac09 requested review from a team as code owners June 27, 2025 19:56
@bits-bot
Copy link

bits-bot commented Jun 27, 2025

CLA assistant check
All committers have signed the CLA.

@github-actions github-actions bot added domain: sinks Anything related to the Vector's sinks domain: external docs Anything related to Vector's external, public documentation labels Jun 27, 2025
@GreyLilac09 GreyLilac09 changed the title feat(metrics): add TTL-based cache for metric sets and add expire_metrics_secs for Prometheus remote write sink #1 feat(metrics): add TTL-based cache for metric sets and add expire_metrics_secs for Prometheus remote write sink Jun 27, 2025
@GreyLilac09
Copy link
Contributor Author

FYI I haven't tested this E2E as I can't get it to build on my M4 Mac.. would love if someone could get a Dockerfile up (alpine) so I can test it on my Kube cluster

@pront

This comment was marked as outdated.

@GreyLilac09

This comment was marked as outdated.

Copy link
Member

@pront pront left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, great PR

@GreyLilac09
Copy link
Contributor Author

@pront would you be able to run/upload a build of it so I can do a smoke test on my kube cluster?

@github-actions github-actions bot removed the meta: awaiting author Pull requests that are awaiting their author. label Jul 1, 2025
@pront
Copy link
Member

pront commented Jul 1, 2025

@pront would you be able to run/upload a build of it so I can do a smoke test on my kube cluster?

Not sure what is the ask here, do you want a custom docker image from this PR? You should be able to run scripts/build-docker.sh with a custom target repo.

@GreyLilac09
Copy link
Contributor Author

@pront Was able to test it and it looks good (see the screenshots in the PR description), should be good to merge

@pront pront enabled auto-merge July 2, 2025 17:35
Copy link
Member

@pront pront left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

auto-merge was automatically disabled July 2, 2025 20:34

Head branch was pushed to by a user without write access

@GreyLilac09
Copy link
Contributor Author

@pront hey sorry can you approve it again? not sure how this process works but it looks like the auto-merge disabled itself

@pront pront enabled auto-merge July 3, 2025 15:24
@GreyLilac09
Copy link
Contributor Author

@pront why are the tests not running when you've approved it and set to auto-merge?

@pront
Copy link
Member

pront commented Jul 7, 2025

@pront why are the tests not running when you've approved it and set to auto-merge?

Each time you push a commit it requires manual approval from a maintainer. It is a security measure. It should be merged today.

@pront pront added this pull request to the merge queue Jul 7, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Jul 7, 2025
@pront pront added this pull request to the merge queue Jul 7, 2025
Merged via the queue into vectordotdev:master with commit a2c6c44 Jul 7, 2025
42 checks passed
@GreyLilac09
Copy link
Contributor Author

@pront This issue also makes this issue pretty easy to resolve: #23018 (comment)

Options:

  1. adding an optional incremental -> absolute conversion with expiring interval in statsd
  2. on the file output plugin by adding a new codec that does incremental -> absolute conversions, or adding the option to existing codecs
  3. adding a new transform entirely that just does incremental -> absolute conversions with optional expire_metrics_secs

Curious what you think what be the best approach here

@GreyLilac09
Copy link
Contributor Author

^ will move convo to the other thread

GreyLilac09 added a commit to GreyLilac09/vector that referenced this pull request Jul 11, 2025
…rics_secs for Prometheus remote write sink (vectordotdev#23286)

* add ttl policy to metricset

* rust fmt

* rust

* update cue

* fix secs format

* fix float

* component docs

* update comments

* fix fmt

* update comment

* add cleanup and fix timestamp

* remove unnecessary function

* add MetricEntry as struct

* fix lint issues

* use is_some_and

* revert greptime_logs.cue
@pront
Copy link
Member

pront commented Aug 13, 2025

Adding this to the release highlights: #23586

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

domain: external docs Anything related to Vector's external, public documentation domain: sinks Anything related to the Vector's sinks

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Prometheus remote write sink causes indefinite memory growth because it converts incremental to absolute values prometheus_remote_write memory leak

4 participants