Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[main][stats beat] implement stats beat in application insights #2489

Open
wants to merge 29 commits into
base: main
Choose a base branch
from

Conversation

siyuniu-ms
Copy link
Contributor

No description provided.

@siyuniu-ms siyuniu-ms requested a review from a team as a code owner March 5, 2025 02:16
@siyuniu-ms
Copy link
Contributor Author

Current StatsBeat Logic
Definition & Creation in Core: StatsBeat is defined and instantiated in the core without a channel, making it compatible with both AI and 1DS.
Initialization in Sender: The sender initializes StatsBeat and assigns the appropriate channel for sending telemetry.
Pros
✔ Prevents infinite loops: Eliminates the risk of recursive creation between StatsBeat and Sender.
✔ Unified implementation: Both AI and 1DS share the same StatsBeat class, reducing duplication.

Cons
❌ Always included in the codebase: Since StatsBeat is created in the core, it remains part of the code even if the user disables it.

StatsBeat Data Handling
StatsBeat data is stored within the payload.
When the final result is ready to be sent, the data is extracted before transmission.

@siyuniu-ms siyuniu-ms requested review from lzchen and a team March 6, 2025 00:38
_networkCounter.totalRequest++;
_networkCounter.requestDuration += utcNow() - payloadData["statsBeatData"]["startTime"];
}
let retryArray = [401, 403, 408, 429, 500, 503];
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I followed the status code from https://github.com/aep-health-and-standards/Telemetry-Collection-Spec/blob/main/ApplicationInsights/statsbeat/statsbeat.md, I could update the code there and specify its for js use.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not specifically for js use, all languages should be doing so. Let's merge the spec change and then you can make the change here too.

} else if (throttleArray.includes(status)) {
_networkCounter.throttle++;
} else if (status !== 307 && status !== 308) {
_networkCounter.failure[status] = (_networkCounter.failure[status] || 0) + 1;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing logic for partial success 206?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in the new update, I am treating 206 neither a success or a failure (it would not be counted in any category except totalReuqeustCounter)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting, Python is treating 206 as success, I guess we did not explicitly specify what this is. @JacksonWeber @TimothyMothra @jeanbisutti do we know how each language treats 206 in terms of success/failure?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lzchen we treat 206 as retriable and persist the failed envelopes to disk.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would expect a 206 to be treated as a success as it most likely means that breeze has either

  • applied server side sampling
  • As our data is going to be mixed in with theirs (the single transport) we may be getting throttled, so when we extend this to client side details, we would also want to treat this as success -- unless we get told which ones failed and we add code to "parse" out the dropped events

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be a little more simplistic, I would expect any 200 (200-299) response to be treated as a success, as a 204 response (no content) is a valid response that we get from the 1DS Collector.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it looks like Node.js is out of spec treating these as retriable. I'll create a work item.




_self.countException = (endpoint: string, exceptionType: string) => {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not super clear on this logic, should this be only incremented on exception?


function _setupTimer() {
if (!_timeoutHandle) {
_timeoutHandle = scheduleTimeout(() => {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MSNev

Confirm that scheduleTimeout returns ms?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, based on the comment in ts-utils
* @param timeout - The time, in milliseconds that the timer should wait before the specified

_cikey = ikey;
_language = STATSBEAT_LANGUAGE;
_os = STATSBEAT_TYPE;
_runTimeVersion = STATSBEAT_TYPE;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a more useful value we can track for runtime version for web?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be able to parse the userAgent to get this information. This might be good default information though if none can be retrieved from that header.

Copy link
Collaborator

@MSNev MSNev Mar 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As this component is in the core we don't really have access to the existing code that parses out the user-agent, we do a bunch of stuff in the 1DS code and some additional in the OsPlugin.

We could look at enhancing this in a later release -- probably as part of adding the OsPlugin by default (which we are planning in a future release) - not sure of the timing now though.

"runtimeVersion": _runTimeVersion,
"os": _os,
"language": _language,
"version": _sdkVersion,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the other sdks, we do not populate this with sdkVersion but rather JUST the version of the SDK.

iKey: INSTRUMENTATION_KEY,
name: name,
baseData: {
name: name,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the correct way to build the TelemetryItem? In Python we have type(TelemetryItem.baseData) = MetricsData and type(MetricsData.metrics) = List[MetricDataPoint].

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, as we will do some envelop transform before we sent out telemetry, I will post some screenshot of how the telemetry will look like when go out of door here later

@siyuniu-ms
Copy link
Contributor Author

image
here is the example of request_duration and request_success_count telemetry that send out

Copy link

@JacksonWeber JacksonWeber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

let _os: string;
let _runTimeVersion: string;
dynamicProto(Statsbeat, this, (_self, _base) => {
_self.initialize = (core: IAppInsightsCore, ikey: string, endpoint: string, version?: string) => {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets update the signature to pass the values as a config object (iKey, endpoint, version) as I'm feeling that we might want to update this in the future.

We could also then make these dynamic, so we could "update" then for things like passing in the browser version etc. So basically follow the normal type of "onConfigChange()" with setting defaults -- the code "supports" defining this for internal types (not part of the global config object (IConfiguration)), but it may need some signature updates or we fudge it.

For now just create an interface with these 3 and pass them in (ie. don't treat as dynamic or use the onConfigChange)

@@ -2,8 +2,8 @@
"name": "Telemetry Viewer - M3",
"short_name": "Telemetry Viewer M3",
"description": "A browser extension that provides a real time view of what's happening in Application Insights including what telemetry is being logged by the web application",
"version": "0.7.5",
"version_name": "0.7.5",
"version": "0.7.6",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes to this file really shouldn't be in this PR.

@@ -283,6 +288,11 @@ export class Sender extends BaseTelemetryPlugin implements IChannelControls {
}
}

if (!config.disableStatsBeat && _statsBeat && !_statsBeat.isInitialized()) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to "set" a default otherwise disableStatsBeat will be undefined and not dynamic.

I'm also in 2 minds on this as we currently don't have any throttling in place, so once someone turns this one (by looking at the code or the tests), this will spam (overload) our statsbeat subscription with no option to stop it....

So we "might" want to check this code in with all of the "enable" commented out so there is zero chance of someone accidently enabling this. And then as part of the follow up PR to add the throttling we then add the config and support for "enabling" / "disabling".

Basically, we don't want someone turning this on a killing ALL of our statsbeat metrics. As the end-user (customers) subscription won't see any of this data.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants