HWP Network Monitor agent #756

Open
jlashner opened this issue Sep 17, 2024 · 2 comments

Assignees: jlashner
Labels: new agent (New OCS agent needs to be created)

Comments

@jlashner
Collaborator

One part of the HWP shutdown process that we discussed is having an agent in charge of managing the HWP iBootBars in the event of extended network outages. I will start implementing this, but I have a couple of related questions...

@ykyohei, @bbixler500, do you happen to know which iBootBar outlets are being used for the PMX and the LED driver board on each telescope, or where that's recorded? Also, do you have a sense of roughly how long we can wait during a network outage before each of these needs to be turned off?

@BrianJKoopman is there a summary anywhere with info on the work you did about agent zombie processes, and trying to get agent processes to keep running after a crossbar disconnect?

Thanks!

@jlashner self-assigned this Sep 17, 2024

@bbixler500
Contributor

The outlet information is on the general HWP Confluence page here. The driver board has two Acopian power supplies, +5V and -10V. As for a standard time to wait during network outages, I don't really have a number in mind. We have had minor outages while running scans in the past, and the HWP kept rotating through them, so I wouldn't want the threshold to be too short.
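
To make the threshold idea concrete, here is a minimal sketch of how a monitor could tolerate brief outages and only flag a shutdown once the network has been down continuously for a grace period. The class name, method names, and the 600 second value are all hypothetical; nothing here is taken from OCS or from an agreed number in this thread.

```python
import time
from typing import Callable, Optional


class OutageTimer:
    """Track how long the network has been continuously unreachable.

    Hypothetical helper for illustration only; not part of OCS.
    """

    def __init__(self, grace_period_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.grace_period_s = grace_period_s
        self._clock = clock
        self._down_since: Optional[float] = None

    def update(self, network_ok: bool) -> bool:
        """Record one connectivity check; return True if shutdown is due."""
        if network_ok:
            # Any successful check ends the outage and resets the timer.
            self._down_since = None
            return False
        now = self._clock()
        if self._down_since is None:
            # First failed check marks the start of the outage.
            self._down_since = now
        return (now - self._down_since) >= self.grace_period_s


# Example: a placeholder 10 minute grace period for one outlet.
timer = OutageTimer(grace_period_s=600)
shutdown_due = timer.update(network_ok=True)  # False while the network is up
```

Because a single successful check resets the timer, short dropouts like the ones seen during past scans would never reach the threshold. Separate `OutageTimer` instances with different grace periods could be kept per outlet if the PMX and the driver board need different limits.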

@BrianJKoopman
Member

> @BrianJKoopman is there a summary anywhere with info on the work you did about agent zombie processes, and trying to get agent processes to keep running after a crossbar disconnect?

Yup! Documentation for the connection timeout is in the ocs site config docs.

This can also be passed as the environment variable CROSSBAR_TIMEOUT, which is useful for the Docker containers. That's a bit hidden, but it is in the ocs-agent-cli docs.
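
As a rough illustration of how a timeout might be resolved from either an explicit argument or the CROSSBAR_TIMEOUT environment variable, here is a small sketch. The function name, the precedence order, and the fallback value are assumptions made for this example; this is not the actual ocs implementation, so check the ocs docs for the real behavior and default.

```python
import os
from typing import Mapping, Optional

# Placeholder fallback for this sketch; not confirmed as the ocs default.
ASSUMED_DEFAULT_TIMEOUT_S = 10


def resolve_crossbar_timeout(cli_value: Optional[int] = None,
                             env: Mapping[str, str] = os.environ) -> int:
    """Pick a timeout in seconds (hypothetical helper).

    Assumed precedence: explicit argument, then CROSSBAR_TIMEOUT,
    then the placeholder fallback above.
    """
    if cli_value is not None:
        return cli_value
    raw = env.get("CROSSBAR_TIMEOUT")
    if raw is None or raw.strip() == "":
        return ASSUMED_DEFAULT_TIMEOUT_S
    # int() raises ValueError on non-numeric input, surfacing config typos.
    return int(raw)


# Example: a container environment that sets CROSSBAR_TIMEOUT=30.
timeout_s = resolve_crossbar_timeout(env={"CROSSBAR_TIMEOUT": "30"})
```

Passing the environment in as a mapping keeps the helper easy to exercise without touching the real process environment.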

@BrianJKoopman added the "new agent" label (New OCS agent needs to be created) Sep 26, 2024