System Health Monitoring Daemon #2134
This should include the (backend) functionality of what is described in #6.
Are there any plans to simplify the existing sys-net/firewall/etc. domains by using unikernels? I know it was discussed briefly a few months back and might be a bit off-topic, but unikernels boot far faster than a full Linux system, and they might mesh rather well with a System Health Monitor (what Minix calls a "resurrection server").
Looks like the Mirage net/fw-vm was adopted for GSoC, which is awesome. Could something like Monit be adapted to work as a system monitoring daemon?
Most likely not. We don't want to introduce a centralized daemon that exposes a large attack surface, which any VM could attack in a dozen ways and then use as a stepping stone to attack other VMs.
I'm imagining something like a Prometheus exporter running in each VM, listening on a local UNIX socket and made accessible from the Manager VM via a qrexec service. A Prometheus server could then collect this data (with minimal retention), and queries against it could be used to trigger actions. This would also mesh well with running the node exporter in the VMs. Exporters can be written to use very little RAM, so their overhead would be negligible.
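A minimal Python sketch of the exporter side of this idea, assuming a single-snapshot protocol: the exporter listens on a UNIX socket and writes Prometheus text exposition format to every client without parsing any request, so a caller has almost nothing to attack. The metric names (`vm_up`, `vm_load1`) and the functions `serve()` and `fetch()` are hypothetical. `fetch()` is what a qrexec service script inside the VM could run to copy the snapshot to stdout; the service name and qrexec policy are not shown. A real Prometheus server scrapes over HTTP, so the Manager VM would still need a small adapter between qrexec and Prometheus; that part is an open assumption.

```python
import os
import socket
import socketserver
import sys


def render_metrics(metrics):
    """Render {name: (help_text, value)} as gauges in Prometheus text exposition format."""
    lines = []
    for name, (help_text, value) in sorted(metrics.items()):
        lines.append(f"# HELP {name} {help_text}")
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


def collect_metrics():
    # Hypothetical metric set; a real exporter would also read /proc, systemd state, etc.
    load1, _, _ = os.getloadavg()
    return {
        "vm_up": ("1 while the exporter is running", 1),
        "vm_load1": ("1-minute load average", load1),
    }


class SnapshotHandler(socketserver.StreamRequestHandler):
    def handle(self):
        # No request parsing: every client receives one snapshot, then the
        # server closes the connection.
        self.wfile.write(render_metrics(collect_metrics()).encode("ascii"))


def serve(socket_path):
    """Bind the exporter to a local UNIX socket; call serve_forever() on the result."""
    if os.path.exists(socket_path):
        os.unlink(socket_path)  # remove a stale socket left by a previous run
    return socketserver.UnixStreamServer(socket_path, SnapshotHandler)


def fetch(socket_path):
    """Read one snapshot; a qrexec service script could write this to stdout."""
    chunks = []
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(socket_path)
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)


if __name__ == "__main__":
    # Hypothetical usage: `exporter.py serve PATH` in the VM,
    # `exporter.py fetch PATH` as the body of the qrexec service.
    mode, path = sys.argv[1], sys.argv[2]
    if mode == "serve":
        serve(path).serve_forever()
    else:
        sys.stdout.buffer.write(fetch(path))
```

Keeping the exporter request-free is a deliberate choice given the attack-surface concern above: the only input it ever handles is an incoming connection.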
As part of the effort to "hide as much Qubes infrastructure from the user as possible" that we would like to embrace for the upcoming Qubes 4.x, we will need a global system health monitoring daemon. This is necessary because system VMs, such as net- and USB-holding VMs, do crash from time to time. If the user is not to be concerned with such system VMs, we need to detect their crashes automatically (easy, via a qrexec service from Dom0) and restart them automatically (currently not so easy, due to difficulties with reconnecting the Xen network frontend/backend).
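The detect-and-restart policy described above could look roughly like the following Python sketch. `probe` and `restart` are hypothetical callables: in Dom0, `probe` might wrap a qrexec call into the system VM and `restart` might kill and start it again (for example with `qvm-kill` followed by `qvm-start`), but that wiring is an assumption and is not shown, and the sketch does nothing about the Xen net front/backend reconnection problem. It only illustrates the policy: a VM is restarted after `max_failures` consecutive failed probes, and repeated restarts are spaced out with exponential backoff so that a VM which crashes during boot is not restarted in a tight loop.

```python
import time
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SupervisedVM:
    name: str
    probe: Callable[[], bool]     # hypothetical: True if the VM answered a health check
    restart: Callable[[], None]   # hypothetical: kills and restarts the VM
    failures: int = 0             # consecutive failed probes
    restarts: int = 0             # restarts since the VM was last considered stable
    next_restart_at: float = 0.0  # earliest time another restart is allowed


def check_once(vms: List[SupervisedVM], now: float, max_failures: int = 3,
               base_backoff: float = 5.0, max_backoff: float = 300.0) -> List[str]:
    """Run one supervision pass and return the names of the VMs restarted in it."""
    restarted = []
    for vm in vms:
        try:
            healthy = vm.probe()
        except Exception:
            healthy = False  # a probe that errors out counts as a failure
        if healthy:
            vm.failures = 0
            if now >= vm.next_restart_at:
                vm.restarts = 0  # healthy past its backoff window: reset the backoff
            continue
        vm.failures += 1
        if vm.failures < max_failures or now < vm.next_restart_at:
            continue  # not enough evidence yet, or still backing off
        vm.restart()
        vm.restarts += 1
        vm.failures = 0
        # Exponential backoff: 5 s, 10 s, 20 s, ... capped at max_backoff.
        delay = min(max_backoff, base_backoff * 2 ** (vm.restarts - 1))
        vm.next_restart_at = now + delay
        restarted.append(vm.name)
    return restarted


def run_forever(vms: List[SupervisedVM], interval: float = 10.0) -> None:
    # time.monotonic() is unaffected by wall-clock changes.
    while True:
        check_once(vms, time.monotonic())
        time.sleep(interval)
```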