feat(apm): aggregate and flush data accordingly for DSM, Profiling, LLMObs, and Live Debugger #737

duncanista · 2025-07-10T15:13:02Z

What?

Creates an aggregator and flusher for APM proxy data.

Motivation

With the introduction of #684, the Extension continues invocations much faster. I think this could have affected how data is aggregated for these APM products, because we were sending each request to intake as soon as it arrived.

Now, we aggregate it and respect the flushing pattern and algorithm.

Testing

Profiling:

LLMObs:
DSM:
Live Debugger:

duncanista · 2025-07-11T18:13:31Z

bottlecap/src/traces/proxy_aggregator.rs

I don't like how simple this aggregator is; it's just a wrapper around a queue. Still, I think for now we should follow the pattern the other aggregators use. Maybe we can create a trait later and consolidate how we create aggregators.
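A minimal sketch of the trait idea, assuming a hypothetical `Aggregator<T>` trait with `add` and `get_batch` methods (neither name comes from bottlecap) and a `ProxyRequest` whose fields are invented for illustration. The aggregator stays a thin wrapper around a `VecDeque`, so requests leave in the order they arrived.

```rust
use std::collections::VecDeque;

/// Hypothetical trait for consolidating aggregators (not a bottlecap API).
pub trait Aggregator<T> {
    /// Buffers one item until the next flush.
    fn add(&mut self, item: T);
    /// Removes and returns everything buffered so far, oldest first.
    fn get_batch(&mut self) -> Vec<T>;
}

/// Illustrative request shape; these fields are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub struct ProxyRequest {
    pub target_url: String,
    pub body: Vec<u8>,
}

/// Queue-backed aggregator: a thin wrapper around a FIFO `VecDeque`.
#[derive(Default)]
pub struct ProxyAggregator {
    queue: VecDeque<ProxyRequest>,
}

impl Aggregator<ProxyRequest> for ProxyAggregator {
    fn add(&mut self, item: ProxyRequest) {
        self.queue.push_back(item);
    }

    fn get_batch(&mut self) -> Vec<ProxyRequest> {
        // `drain(..)` empties the queue and yields items front to back.
        self.queue.drain(..).collect()
    }
}

fn main() {
    let mut aggregator = ProxyAggregator::default();
    aggregator.add(ProxyRequest {
        target_url: "https://intake.example/profiling".to_string(),
        body: b"profile".to_vec(),
    });
    aggregator.add(ProxyRequest {
        target_url: "https://intake.example/dsm".to_string(),
        body: b"pathway".to_vec(),
    });

    let batch = aggregator.get_batch();
    for request in &batch {
        println!("{} ({} bytes)", request.target_url, request.body.len());
    }
    // A second drain finds nothing: the first one emptied the queue.
    assert!(aggregator.get_batch().is_empty());
}
```

A trait like this would let other buffers share the same add-then-drain shape; real aggregators may need extra methods (for example, size limits) that this sketch leaves out.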

Copilot

Pull Request Overview

This PR adds a buffering layer for APM proxy requests in the Trace Agent, introducing an in-memory aggregator and a flusher that batches and retries HTTP calls for DSM, Profiling, LLMObs, and Live Debugger.

Introduce proxy_aggregator and proxy_flusher modules to buffer and periodically flush requests.
Update TraceAgent endpoints to enqueue proxy requests instead of forwarding immediately.
Adjust get_client signature to take a reference and propagate that change to related flushers.
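The enqueue-instead-of-forward flow can be sketched with standard-library threads and a `Mutex`-guarded queue. This is a simplification under stated assumptions: bottlecap's `TraceAgent` is async and its real types differ, and `SharedQueue`, `handle_proxy_request`, `drain_for_flush`, and the `202` status are hypothetical. The handler only buffers the payload and acknowledges it; the flusher later takes everything buffered in one lock acquisition.

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical shared buffer of raw payloads.
type SharedQueue = Arc<Mutex<VecDeque<Vec<u8>>>>;

/// Endpoint side: buffer the payload and acknowledge right away
/// instead of forwarding it to intake inline.
fn handle_proxy_request(queue: &SharedQueue, body: Vec<u8>) -> u16 {
    queue.lock().expect("queue poisoned").push_back(body);
    202 // assumed "accepted" status; the real endpoint may respond differently
}

/// Flusher side: swap the buffered queue for an empty one under a single lock.
fn drain_for_flush(queue: &SharedQueue) -> Vec<Vec<u8>> {
    let buffered = std::mem::take(&mut *queue.lock().expect("queue poisoned"));
    Vec::from(buffered)
}

fn main() {
    let queue: SharedQueue = Arc::new(Mutex::new(VecDeque::new()));

    // Several "invocations" hand data to the endpoint concurrently.
    let handles: Vec<_> = (0..4u8)
        .map(|i| {
            let q = Arc::clone(&queue);
            thread::spawn(move || handle_proxy_request(&q, vec![i]))
        })
        .collect();
    for handle in handles {
        assert_eq!(handle.join().unwrap(), 202);
    }

    // Later, at a flush point, the flusher sends the whole batch.
    let batch = drain_for_flush(&queue);
    println!("flushing {} buffered payloads", batch.len());
}
```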

Reviewed Changes

Copilot reviewed 6 out of 7 changed files in this pull request and generated 2 comments.


File	Description
bottlecap/src/traces/trace_agent.rs	Switched proxy endpoints to enqueue into `proxy_aggregator`
bottlecap/src/traces/proxy_aggregator.rs	Added `ProxyRequest` struct and `Aggregator` for request buffering
bottlecap/src/traces/proxy_flusher.rs	Added `Flusher` to batch, send, and retry buffered proxy requests
bottlecap/src/traces/mod.rs	Registered new modules and added `DD_ADDITIONAL_TAGS_HEADER` const
bottlecap/src/logs/flusher.rs	Updated `get_client` call to pass a reference
bottlecap/src/http.rs	Changed `get_client` and `build_client` to accept `&Arc<Config>`
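The `get_client`/`build_client` change is about ownership: taking `&Arc<Config>` lets a caller lend its existing handle instead of cloning the `Arc` for every call. A minimal sketch, assuming a hypothetical one-field `Config` and a `build_client` that returns a description string in place of a real HTTP client:

```rust
use std::sync::Arc;

/// Hypothetical config; the real bottlecap `Config` has many more fields.
struct Config {
    proxy_https: Option<String>,
}

/// Hypothetical builder. Borrowing `&Arc<Config>` means the caller keeps
/// its handle and the reference count is not touched.
fn build_client(config: &Arc<Config>) -> String {
    match &config.proxy_https {
        Some(proxy) => format!("client via {proxy}"),
        None => "direct client".to_string(),
    }
}

fn main() {
    let config = Arc::new(Config { proxy_https: None });
    let before = Arc::strong_count(&config);
    // No `Arc::clone` needed at the call site.
    let client = build_client(&config);
    assert_eq!(Arc::strong_count(&config), before);
    println!("{client}");
}
```

Taking `Arc<Config>` by value would make every caller that keeps its own handle write `Arc::clone(&config)`: cheap (an atomic increment), but unnecessary when the callee only reads the config.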

Comments suppressed due to low confidence (1)

bottlecap/src/traces/proxy_flusher.rs:26

[nitpick] There are no unit tests for the new Flusher behavior. Please add tests covering header merging, batch flushing, retry logic, and failure scenarios.

pub struct Flusher {

bottlecap/src/traces/proxy_flusher.rs

lym953

LGTM. Feel free to request additional review from others if needed.

bottlecap/src/traces/proxy_aggregator.rs

lym953 · 2025-07-11T19:00:41Z

bottlecap/src/traces/proxy_flusher.rs

+
+    pub async fn flush(
+        &self,
+        retry_requests: Option<Vec<reqwest::RequestBuilder>>,


For my learning: is there a limit on the number of retries? I assumed the retry count in send() is a different thing.

The retry count in the send method is the number of times we retry a payload within a single flush.

I think there's no limit on the number of failed payloads we carry over for retry, mainly because we could hit a network error, which we need to handle.

For 400/500 errors we don't retry. Maybe we should at some point, but we want to make sure we don't time out the Lambda or fill the buffer with data that will never arrive.
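The retry policy described in this thread can be sketched as follows. `Attempt`, `Request`, `send_with_retries`, and `flush` are hypothetical stand-ins (not bottlecap or `reqwest` APIs), and the per-flush attempt limit is just a parameter. A request is retried within a flush only when no response arrives at all; any HTTP status, including 4xx/5xx, ends its attempts. Requests that never got a response are returned so the next flush can pass them back in as `retry_requests`.

```rust
/// Outcome of one send attempt; a hypothetical stand-in for
/// reqwest's `Result<Response, Error>`.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Attempt {
    /// The intake answered with this HTTP status (2xx, 4xx, 5xx, ...).
    Status(u16),
    /// No response at all (timeout, connection reset, DNS failure, ...).
    NetworkError,
}

/// Hypothetical buffered request; only an id is needed for the sketch.
#[derive(Debug, Clone, PartialEq)]
struct Request {
    id: u32,
}

/// Tries one request up to `max_attempts` times within a single flush.
/// Returns `Some(request)` only if every attempt hit a network error.
fn send_with_retries(
    request: Request,
    max_attempts: u32,
    send: &mut impl FnMut(&Request) -> Attempt,
) -> Option<Request> {
    for _ in 0..max_attempts {
        match send(&request) {
            Attempt::Status(code) => {
                // Any HTTP response ends the attempts; 4xx/5xx payloads are
                // dropped instead of retried so they cannot clog the buffer.
                if !(200..300).contains(&code) {
                    eprintln!("dropping request {}: HTTP {}", request.id, code);
                }
                return None;
            }
            Attempt::NetworkError => continue,
        }
    }
    Some(request)
}

/// One flush: previously failed requests first, then the new batch.
/// The returned list has no size cap and feeds the next flush's `retry_requests`.
fn flush(
    batch: Vec<Request>,
    retry_requests: Option<Vec<Request>>,
    max_attempts: u32,
    mut send: impl FnMut(&Request) -> Attempt,
) -> Vec<Request> {
    let mut carry_over = Vec::new();
    for request in retry_requests.unwrap_or_default().into_iter().chain(batch) {
        if let Some(failed) = send_with_retries(request, max_attempts, &mut send) {
            carry_over.push(failed);
        }
    }
    carry_over
}

fn main() {
    // Request 1 succeeds, request 2 gets a 500 (dropped), request 3 never reaches intake.
    let simulated_intake = |request: &Request| match request.id {
        1 => Attempt::Status(200),
        2 => Attempt::Status(500),
        _ => Attempt::NetworkError,
    };
    let batch = vec![Request { id: 1 }, Request { id: 2 }, Request { id: 3 }];
    let carry_over = flush(batch, None, 3, simulated_intake);
    println!("retry on next flush: {:?}", carry_over);
}
```

Because the carried-over list has no cap, a long network outage lets it grow from one flush to the next, which is the trade-off behind the question about a retry limit.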


duncanista requested a review from a team as a code owner July 10, 2025 15:13

duncanista commented Jul 11, 2025


duncanista requested a review from Copilot July 11, 2025 18:17

Copilot AI reviewed Jul 11, 2025


bottlecap/src/traces/proxy_flusher.rs (outdated, resolved)

bottlecap/src/traces/proxy_flusher.rs (resolved)

lym953 approved these changes Jul 11, 2025


astuyve approved these changes Jul 14, 2025


duncanista added 9 commits July 14, 2025 15:11

pass reference of config in get_client

bc399c4

no need to pass a whole clone here

add simple aggregator for APM proxy data

7405e4b

add APM proxy flusher with retry mechanism

a00a5c5

make modules public

b490973

use APM proxy in main

7b219f4

update trace agent to use new proxy aggregator

2ea562b

only acknowledges data and lets the flusher take care of the rest

fmt

39ac154

make sure we send right headers

1c994a7

typo

07aa341

duncanista force-pushed the jordan.gonzalez/trace-agent/aggregation-for-proxy branch from 4c1162d to 07aa341 Compare July 14, 2025 19:11
