Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(registration): registration checks and re-registration flow #86

Merged
merged 6 commits into from
Mar 30, 2023

Conversation

andrewazores
Copy link
Member

  • fix(registration): check if credentials already exist before submitting
  • fix(registration): check and retry registration on slow cadence
  • extract registration (re)check config property

Fixes #85
Depends on https://github.com/cryostatio/cryostat/pull/1425

Copy link
Member

@maxcao13 maxcao13 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the intended result if the user deletes the jvmId==hash credential within Cryostat? After a non-200 GET response from the check, the agent will repeatedly try to POST another registration request many many times in quick succession until something happens in the end that I can't understand.

Here is the attached logs, but it seems both agent worker threads POST to the discovery endpoint during the same time and the retries stop. Then we must sigkill that agent.

t.log

@andrewazores
Copy link
Member Author

Hmm, I'll try to figure out what's happening there. If the stored credentials get deleted then that shouldn't actually do anything to the agent right away - its registration information and token are still valid, so it should still be able to check its registration status periodically. If it needed to it should still be able to publish updates about its discovery subtree (just itself) as well, but the timing there is very unlikely.

However, once the credentials are deleted, then when Cryostat tries to ping the agent instance after some time, the ping should fail because Cryostat is unable to look up the stored credentials that the agent expects, resulting in the agent rejecting Cryostat's ping. This should lead to Cryostat deregistering the agent. The next time the agent tries to check its registration state it will see this and should re-enter its initial registration flow, which should start with attempting to re-submit new stored credentials.

@andrewazores
Copy link
Member Author

I think I've solved it with the last commit. @maxcao13 give it another test run please

@maxcao13
Copy link
Member

maxcao13 commented Mar 28, 2023

I tried the pr and the issue with the looping registration is fixed now, but I notice something else weird now.

I tried lowering the ping period of the discovery storage pinging the plugins CRYOSTAT_DISCOVERY_PING_PERIOD="10000" for 10 seconds, and it seems like the harvester for quarkus-test-agent-1 runs into some issues with not being able to upload the exit dump file.

@andrewazores
Copy link
Member Author

Ah yes, I think that makes sense with the way the agent's internal state machinery works. The discovery ping request tells the agent to re-register itself, which really means de-register and start the registration flow over. But, the harvester also intentionally stops itself when the agent has become deregistered, and starts on successful registration. So if the discovery ping period is shorter than the harvester period you'll end up with the agent probably never even attempting to push harvested files, or if it does then perhaps it gets interrupted part way through sometimes.

There's probably some approach I could take like adding an internal state for "re-registering" to allow this to smoothly transition over, but I'm not sure how worthwhile it is.

This does point out an interesting side-effect of the discovery ping though, because it means that the harvester period will result in files being pushed with that periodicity on one interval, but the interval will reset at the next re-registration time. If the discovery ping period is relatively long and the harvester period is relatively short then this won't be very noticeable, but as these two periods become closer in value then the skew could become noticeable, up to the point you've identified where the skew actually can prevent any harvesting from happening at all.

I think that's a separate issue to work on, maybe for next development cycle.

maxcao13
maxcao13 previously approved these changes Mar 29, 2023
Copy link
Member

@maxcao13 maxcao13 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me!

Signed-off-by: Andrew Azores <aazores@redhat.com>
Signed-off-by: Andrew Azores <aazores@redhat.com>
Signed-off-by: Andrew Azores <aazores@redhat.com>
remote credentials deleted externally

Signed-off-by: Andrew Azores <aazores@redhat.com>
@andrewazores
Copy link
Member Author

andrewazores commented Mar 30, 2023

Rebased + signed

@andrewazores andrewazores merged commit 41483f9 into cryostatio:main Mar 30, 2023
@andrewazores andrewazores deleted the registration-flow branch March 31, 2023 00:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
No open projects
Status: Done
Development

Successfully merging this pull request may close these issues.

[Bug] Agent registration flow after Cryostat restarts
2 participants