Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(instrumentation-runtime-node)!: add prom-client-metrics #2136

Conversation

pikalovArtemN
Copy link
Contributor

@pikalovArtemN pikalovArtemN commented Apr 22, 2024

Which problem is this PR solving?

Short description of the changes

  • Add implementation for collecting event loop lag, garbage collector, heap size and heap space

@pikalovArtemN pikalovArtemN requested a review from a team April 22, 2024 14:29
Copy link

linux-foundation-easycla bot commented Apr 22, 2024

CLA Signed

The committers listed above are authorized under a signed CLA.

@pikalovArtemN pikalovArtemN changed the title Feat/node/prom client implementation feat(instrumentation-runtime-node): add prom-client-metrics Apr 22, 2024
* @default 5000
*/
eventLoopUtilizationMeasurementInterval?: number;
monitoringPrecision?: number;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

curious about this change, can you give more context why the rename?

Copy link
Contributor Author

@pikalovArtemN pikalovArtemN Apr 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because right now i use it for every metrics, not only for eventLoopUtilization. You think I needed to add parameters for every new metrics group ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need

@JCMais
Copy link

JCMais commented May 1, 2024

I am super excited for this one, I think it was the main thing missing from OpenTelemetry's Node.js instrumentation! Thanks for opening this PR.

Copy link
Member

@dyladan dyladan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please take a look at open-telemetry/semantic-conventions#991 and give your input there? It looks like you made some different decisions than what was made there and I want to make sure we're all on the same page before merging something.

@pikalovArtemN
Copy link
Contributor Author

pikalovArtemN commented May 2, 2024

Can you please take a look at open-telemetry/semantic-conventions#991 and give your input there? It looks like you made some different decisions than what was made there and I want to make sure we're all on the same page before merging something.

Thanks. Yes, i have some mismatches.

@maryliag
Copy link
Contributor

maryliag commented May 2, 2024

@pikalovArtemN I would love to hear your opinion on the mismatches. Let me know what you think makes more sense 😄

@pikalovArtemN
Copy link
Contributor Author

pikalovArtemN commented May 3, 2024

@pikalovArtemN I would love to hear your opinion on the mismatches. Let me know what you think makes more sense 😄

I have mismatches in names like eventloop => event_loop, and i need to fix units, i totaly forgot to set it properly XD. And fix attributes.

I decided to devide the metrics not by label, but names like it was in the prom-client library, but i think you do a better solution in the convension.

I think we need to colobarate, not all of metrics and attributes that i added exists in convention such as:

  1. attribute nodejs.eventloop.lag.type -> stddev and mean not exists
  2. nodejs.eventloop.utilization metric does't exist in specification
  3. nodejs.active_handles.count -> does't exist in runtime-node, do i need to add it ?
  4. nodejs.memory.size -> i have 2 splited metrics heap size and heap space, i think we need to split it by memory, heap size, and heap space in different metrics with attributes nodejs.memory.state: (total and used), nodejs.heapsize.state (total and used), nodejs.heapspace.state( total, used, available) and i add nodejs.memory.size in instumentation-runtime-node
  5. nodejs.active_libuv_requests.count -> does't exist in runtime-node, do i need to add it ?

@maryliag
Copy link
Contributor

maryliag commented May 3, 2024

attribute nodejs.eventloop.lag.type -> stddev and mean not exists

I can add that

nodejs.eventloop.utilization metric does't exist in specification

I can add that

nodejs.active_handles.count -> does't exist in runtime-node, do i need to add it ?

I think it would be good, I got that from the prom-client metrics list, so should be possible

nodejs.memory.size -> i have 2 splited metrics heap size and heap space, i think we need to split it by memory, heap size, and heap space in different metrics with attributes nodejs.memory.state: (total and used), nodejs.heapsize.state (total and used), nodejs.heapspace.state( total, used, available) and i add nodejs.memory.size in instumentation-runtime-node

I can make those changes on the convention

nodejs.active_libuv_requests.count -> -> does't exist in runtime-node, do i need to add it ?

I think it would be good, I also got that from prom-client, but it doesn't have libuv in the name, I added on the convention because of a feedback to make it more clear what it is

@maryliag
Copy link
Contributor

maryliag commented May 3, 2024

heads up: because the semantic convention could be for all JS runtime (not just nodejs, but also others such as denojs or bunjs), it was suggested to not use nodejs on the metric name, so I replaced with jsruntime. I'm still waiting on more feedback if people think that is a good name, but that means you would also probably need to make the same update here

@pikalovArtemN
Copy link
Contributor Author

heads up: because the semantic convention could be for all JS runtime (not just nodejs, but also others such as denojs or bunjs), it was suggested to not use nodejs on the metric name, so I replaced with jsruntime. I'm still waiting on more feedback if people think that is a good name, but that means you would also probably need to make the same update here

Ok

@Flarna
Copy link
Member

Flarna commented May 3, 2024

If it is about any js runtime the node.js specific parts should be removed/renamed/...

  • anything coming from v8 is not automatically applicable to js engine in firefox or safari
  • browsers do not use libuv as far as I know (not sure about bun either)

@maryliag
Copy link
Contributor

maryliag commented May 3, 2024

anything coming from v8

I don't understand what you mean here.

For things that are node specific, we can still create the metrics for it, such as the active libuv requests, I just need remove that from the semantic convention, but this PR can still keep it because can be useful.
This way we can move comments about the semantic convention to that PR, to keep things in one place

@Flarna
Copy link
Member

Flarna commented May 3, 2024

There are JS engines (e.g. quickjs) which use a different heap structure/gc then google v8 (the JS engine used by node.js or deno).
The proposed gc/heap metrics match fine to node.js because it uses v8 but they might not fit well to other JS engines.

As a result I'm not sure if GC metrics should have v8 in the name to allow to distinguish them from other engines.
Alternative would be to create universal gc metrics for all types of runtimes using a gc. But I doubt that will be easy.

@pikalovArtemN
Copy link
Contributor Author

I think right now we need to concentrate on nodejs realization only, because Node is most popular and in the next iteration adopt instrumentation to work with different engines. Right now it's so much work to do

@david-luna
Copy link
Contributor

@pikalovArtemN

now I've noticed the package-lock.json file seems to be out of sync with too many changes although you made a small one in you package.json. That explains the installation issues.

Please get the latest version of package-lock.json file from main branch and run npm i from the root folder to add your small update. Push the changes and I'll run the CI again :)

Copy link

codecov bot commented Oct 17, 2024

Codecov Report

Attention: Patch coverage is 87.70950% with 22 lines in your changes missing coverage. Please review.

Project coverage is 90.79%. Comparing base (6234918) to head (4941407).
Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
...umentation-runtime-node/src/metrics/gcCollector.ts 72.00% 7 Missing ⚠️
...entation-runtime-node/src/metrics/baseCollector.ts 66.66% 4 Missing ⚠️
...untime-node/src/metrics/eventLoopDelayCollector.ts 93.47% 3 Missing ⚠️
...nstrumentation-runtime-node/src/instrumentation.ts 90.00% 2 Missing ⚠️
...runtime-node/src/metrics/eventLoopTimeCollector.ts 90.47% 2 Missing ⚠️
...-node/src/metrics/eventLoopUtilizationCollector.ts 88.23% 2 Missing ⚠️
...node/src/metrics/heapSpacesSizeAndUsedCollector.ts 94.11% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2136      +/-   ##
==========================================
- Coverage   90.86%   90.79%   -0.07%     
==========================================
  Files         161      169       +8     
  Lines        7858     8009     +151     
  Branches     1612     1632      +20     
==========================================
+ Hits         7140     7272     +132     
- Misses        718      737      +19     
Files with missing lines Coverage Δ
...trumentation-runtime-node/src/consts/attributes.ts 100.00% <100.00%> (ø)
...n-runtime-node/src/types/ConventionalNamePrefix.ts 100.00% <100.00%> (ø)
...nstrumentation-runtime-node/src/instrumentation.ts 89.65% <90.00%> (+0.46%) ⬆️
...runtime-node/src/metrics/eventLoopTimeCollector.ts 90.47% <90.47%> (ø)
...-node/src/metrics/eventLoopUtilizationCollector.ts 88.23% <88.23%> (ø)
...node/src/metrics/heapSpacesSizeAndUsedCollector.ts 94.11% <94.11%> (ø)
...untime-node/src/metrics/eventLoopDelayCollector.ts 93.47% <93.47%> (ø)
...entation-runtime-node/src/metrics/baseCollector.ts 66.66% <66.66%> (ø)
...umentation-runtime-node/src/metrics/gcCollector.ts 72.00% <72.00%> (ø)

@maryliag
Copy link
Contributor

@dyladan you still have a request here, which was already addressed, can you take a look again?

@david-luna
Copy link
Contributor

@dyladan is seems you need to approve since previously requested changes in this PR.

* See the License for the specific language governing permissions and
* limitations under the License.
*/
export const V8_HEAP_SIZE_NAME_ATTRIBUTE = 'heap.space.name';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume these are just temporary until the next semconv release?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually, they already exist on the current release (they didn't when the PR was created), so @pikalovArtemN you can replace for the final values. For example, this one is ATTR_V8JS_HEAP_SPACE_NAME

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fix

@david-luna
Copy link
Contributor

@pikalovArtemN this is almost done :)

could you fix the conflicts and address the last comment from @maryliag? Thank you

@@ -62,7 +62,7 @@ nodejs_performance_event_loop_utilization 0.010140079547955264

| name | type | unit | default | description |
|---|---|---|---|---|
| [`eventLoopUtilizationMeasurementInterval`](./src/types.ts#L25) | `int` | millisecond | `5000` | The approximate number of milliseconds for which to calculate event loop utilization averages. A larger value will result in more accurate averages at the expense of less granular data. Should be set to below the scrape interval of your metrics collector to avoid duplicated data points. |
| [`monitoringPrecision`](./src/types.ts#L25) | `int` | millisecond | `5000` | The approximate number of milliseconds for which to calculate event loop utilization averages. A larger value will result in more accurate averages at the expense of less granular data. Should be set to below the scrape interval of your metrics collector to avoid duplicated data points. |

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion(non-blocking): update the default

You changed the default to 10 so we should update the README too

Suggested change
| [`monitoringPrecision`](./src/types.ts#L25) | `int` | millisecond | `5000` | The approximate number of milliseconds for which to calculate event loop utilization averages. A larger value will result in more accurate averages at the expense of less granular data. Should be set to below the scrape interval of your metrics collector to avoid duplicated data points. |
| [`monitoringPrecision`](./src/types.ts#L25) | `int` | millisecond | `10` | The approximate number of milliseconds for which to calculate event loop utilization averages. A larger value will result in more accurate averages at the expense of less granular data. Should be set to below the scrape interval of your metrics collector to avoid duplicated data points. |

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 6b5fc8b

@kristian240
Copy link

@david-luna any idea how fast this can be released after it is merged?

@maryliag
Copy link
Contributor

maryliag commented Nov 3, 2024

@kristian240 after this gets merged, we can aim for a new relase in a couple of days 😄

@david-luna david-luna merged commit 80d0c74 into open-telemetry:main Nov 5, 2024
25 checks passed
@dyladan dyladan mentioned this pull request Nov 5, 2024
@pikalovArtemN pikalovArtemN deleted the feat/node/prom-client-implementation branch November 6, 2024 18:27
@@ -32,7 +32,7 @@ const prometheusExporter = new PrometheusExporter({
const sdk = new NodeSDK({
metricReader: prometheusExporter,
instrumentations: [new RuntimeNodeInstrumentation({
eventLoopUtilizationMeasurementInterval: 5000,
monitoringPrecision: 5000,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor Q: It looks like the default is 10 ms (from lower down in this README). Should the example here show 10 rather than 5000?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.