LDMS Connector: update to LDMS 4.5.1 #697
Draft
Work in progress -- do not merge (I'm still testing locally on a sample application, and we should consider adding some unit testing for the connector)
These changes are required to compile Caliper with LDMS 4.5.1. In addition to general updates to the API, I encountered several bugs in the implementation of the pipeline.
API Updates
ldms_xprt_put() now requires a name. As a placeholder, I'm using "tst".
ldms_xprt_new_with_auth has a new function signature. I removed the second nullptr from the args.
Summary of Bug Fixes
1. CMake
The LDMS_PREFIX CMake variable was not used anywhere, so Caliper could not link against a custom LDMS installation. I added a few lines to FindLDMS.cmake so that it uses LDMS_PREFIX.
2. Env Variables
In LdmsForwarder.cpp, if any one of the env variables was not set, all of them were set to the defaults. I broke up that if-block so that I can set just one variable and have the rest fall back to their defaults.
3. JSON
In LdmsForwarder.cpp (write_ldms_record()), the JSON string was missing a key for the "caliper-perf-data" value. I added the "stream" key so that this value is now keyed.
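To illustrate the JSON fix, here is a minimal sketch of a record builder that emits the "stream" key. This is not the code from LdmsForwarder.cpp: the helper name make_record_json and the "rank" and "timestamp" fields are hypothetical, and only the "stream" key and the "caliper-perf-data" value come from this PR.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper (not Caliper's actual code): builds a small JSON record.
// Only "stream" and "caliper-perf-data" are taken from the PR description;
// the "rank" and "timestamp" fields are illustrative assumptions.
std::string make_record_json(int rank, double timestamp)
{
    assert(rank >= 0);  // ranks are non-negative in this sketch

    std::ostringstream os;
    os << "{"
       // Every value needs a key. A bare "caliper-perf-data" string inside
       // an object would not be valid JSON.
       << "\"stream\":\"caliper-perf-data\","
       << "\"rank\":" << rank << ","
       << "\"timestamp\":" << timestamp
       << "}";
    return os.str();
}
```

Calling make_record_json(0, 1.5) yields {"stream":"caliper-perf-data","rank":0,"timestamp":1.5}, which a JSON parser on the receiving side can read by key.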
4. Connection
I was getting errors when running on more than 6 ranks. I think the issue was that write_ldms_record() was calling caliper_ldms_connector_initialize() on every write, so as the number of ranks increased I was creating too many connections.
As a possible fix, I made the LDMS connection a member variable of the LdmsForwarder class, initializing it once per rank and reusing it for every write on that rank. I'm not sure whether this has consequences beyond my narrow use case, but it did seem to resolve the problem for me.
FYI @ppebay
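For reference, the connection change in item 4 can be sketched roughly as follows. This is a simplified illustration and not the code in this PR: FakeConnection is a hypothetical stand-in for the LDMS transport handle, Forwarder stands in for LdmsForwarder, and the lazy initialization on first write is my assumption about one way to get "once per rank" behavior.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical stand-in for an LDMS connection. The real connector would
// hold an LDMS transport handle; this one only counts opens and sends.
struct FakeConnection {
    static int open_count;  // how many connections were created in total
    int sent = 0;           // how many records went through this connection

    FakeConnection() { ++open_count; }
    void publish(const std::string& /*json*/) { ++sent; }
};
int FakeConnection::open_count = 0;

// Simplified stand-in for the forwarder: the connection is a member that is
// created on the first write and reused for every later write.
class Forwarder {
public:
    void write_record(const std::string& json)
    {
        if (!conn_)
            conn_.reset(new FakeConnection());  // opened once per Forwarder
        assert(conn_ != nullptr);
        conn_->publish(json);
    }

    int records_sent() const { return conn_ ? conn_->sent : 0; }

private:
    std::unique_ptr<FakeConnection> conn_;
};
```

With one Forwarder per rank, any number of write_record() calls opens a single connection, so the total number of connections grows with the number of ranks and not with the number of writes.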