Skip to content

Conversation

@last-genius
Copy link
Contributor

Previously, the transform function was not serialized in the plugin-server protocol (it was only used in xcp_rrdd itself, not in plugins). This issue was revealed by the previous work around splitting cpu metrics into a separate plugin.

Instead of allowing arbitrary functions (which would be difficult to serialize), for 'fun x' offer just two options:

  • Inverse (1.0 - x), and
  • Identity (x)

Default (if the parameter is not provided, either in OCaml or JSON), is Identity.

@last-genius
Copy link
Contributor Author

Will test tomorrow to see if this was indeed the cause of currently misreported CPU metrics and if I haven't missed anything else.

@last-genius last-genius requested a review from psafont October 7, 2024 15:05
type ds_transform_function = Inverse | Identity

let apply_transform_function f x =
match f with Inverse -> 1.0 -. x | Identity -> x
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This works with the current sources, we might want to use max instead of 1.0 (but that has problems when the max is infinite, so we need more considerations here)

It's probably a better idea to change how xen provides stats to provide busy time instead, or the plugin report idle time. I'd prefer the former

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or we just transform the data in the new plugin, where we collect the data from xen? Then we don't need complicated generic transform functionality.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or we just transform the data in the new plugin, where we collect the data from xen? Then we don't need complicated generic transform functionality.

This is worth trying, but it needs precise tracking of time to calculate the difference between the previous read and the current one to be able to calculate the busy time, from the idle time in the current period.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that this is an awkward workaround (even though it's implementing the actual specified parameters), just increasing the size of the memory-mapped files (I'm not sure if we could omit transform: Identity, taking it as the default, for example, to optimize this). Once I get the timestamps of the plugin to be reliably taken into account, I can drop this and move the transform logic into the plugins.

For now, to accommodate taking inverse from different values, perhaps the variant could take a parameter it would take the inverse from: Inverse x?

@last-genius last-genius force-pushed the private/asultanov/rrdd-fix branch from 2d948ff to d7e679f Compare October 8, 2024 07:23
@last-genius last-genius changed the title rrd: Serialize transform parameter for data sources CA-400124 - rrd: Serialize transform parameter for data sources Oct 8, 2024
@last-genius last-genius marked this pull request as ready for review October 8, 2024 08:03
@last-genius
Copy link
Contributor Author

last-genius commented Oct 8, 2024

This passes BST+BVT, and also shows sensible CPU load metrics, compare xentop's output (5.3% in the screenshot) with the running rrd2csv percentages (out of bounds values are a different, larger issue):

{230F80B4-1E68-41DF-AB21-A8EFB5689A80}

Percentages go up to 100% under full load too.

exception Invalid_data_source of string

(** Inverse is (fun x -> 1.0 - x) *)
type ds_transform_function = Inverse | Identity
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about

type transform = Inv of float -> float | Id of float -> float

that is, make the actual transformation part of the type?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd be wary of adding more information in the pages, we're filling the memory pages, for not a big benefit. I'm even thinking whether serializing it as a number to reduce the amount of bytes added would be worth it (0 being Identity, 1 the inverse)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

then it'd need to be serialized. the intention is to make this as simple as possible to transfer between plugin and server

@psafont
Copy link
Member

psafont commented Oct 8, 2024

This adds data to the memory-mapped files, can you check that it's not causing problems by running a boot storm or configuration limits test suite?

@last-genius
Copy link
Contributor Author

Running the config limits test suite, but also pushed an improvement where default transform = Identity is not serialized, so files will not grow where they don't need to.

@last-genius
Copy link
Contributor Author

make check doesn't fail for me locally, has the dependency version been incremented somewhere I missed? this should be affecting other PRs too

@psafont
Copy link
Member

psafont commented Oct 8, 2024

You're right, we recently updated the uuidm library, will take a look at it

@psafont
Copy link
Member

psafont commented Oct 8, 2024

Fix is in #6045

@last-genius
Copy link
Contributor Author

Tested the optimization, works as intended, cpu metrics are still correctly reported and:

$ grep transform /dev/shm/metrics -r
Binary file /dev/shm/metrics/xcp-rrdd-cpu matches

Andrii Sultanov added 2 commits October 9, 2024 08:18
Previously, the transform function was not serialized in the plugin-server
protocol (it was only used in xcp_rrdd itself, not in plugins). This issue was
revealed by the previous work around splitting cpu metrics into a separate plugin.

Instead of allowing arbitrary functions (which would be difficult to
serialize), for 'fun x' offer just two options:
* Inverse (1.0 - x), and
* Identity (x)

Default (if the parameter is not provided, either in OCaml or JSON), is
Identity.

Signed-off-by: Andrii Sultanov <andrii.sultanov@cloud.com>
This saves on space, since only cpuX and cpu_avg datasources currently have
transform=inverse, all the others have the default identity function specified.

Signed-off-by: Andrii Sultanov <andrii.sultanov@cloud.com>
@last-genius last-genius force-pushed the private/asultanov/rrdd-fix branch from b04e5b5 to fe79b0f Compare October 9, 2024 07:18
@last-genius
Copy link
Contributor Author

last-genius commented Oct 9, 2024

Rebased on top of master to get the fix for CI failure

@last-genius
Copy link
Contributor Author

config limits test suite passed too, even without the optimization.

@psafont
Copy link
Member

psafont commented Oct 9, 2024

config limits test suite passed too, even without the optimization.

The tests will pass even if there are failing metrics, could you check those in the logs? (They might log to daemon.log rather than xenserver.log)

@last-genius
Copy link
Contributor Author

last-genius commented Oct 9, 2024

The tests will pass even if there are failing metrics, could you check those in the logs? (They might log to daemon.log rather than xenserver.log)

All the rrdd plugin daemons seem to have started and not produced any errors (Run #206094)

@last-genius last-genius added this pull request to the merge queue Oct 9, 2024
Merged via the queue into xapi-project:master with commit 8e3cb5f Oct 9, 2024
15 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants