Add datacentre discovery #789

Open
XAMPPRocky opened this issue Sep 15, 2023 · 4 comments
Labels
kind/feature New feature or request

Comments

@XAMPPRocky
Collaborator

To add more accurate latency measurement between a proxy and a datacentre, the proxy needs to know which datacentres are available to measure.

As a solution, we were thinking of adding a new xDS resource type, called Datacentre or similar, which would contain a datacentre's IP address and QCMP port. The proxy can then use that address for QCMP latency measurements.

For relay deployments, it would send all agents that are connected to it; for single cluster control plane deployments, it would return its own IP and QCMP port.
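As a rough illustration of the idea above, here is a minimal sketch of what such a resource and the two deployment behaviours might look like. The names `Datacentre`, `ControlPlane`, `qcmp_addr`, and `datacentres` are all hypothetical and not part of Quilkin's actual API; the real resource would be defined as an xDS type rather than a plain Rust struct.

```rust
use std::net::{IpAddr, SocketAddr};

/// Hypothetical shape of the proposed `Datacentre` resource: where a proxy
/// should send QCMP pings. Field names are assumptions for illustration.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct Datacentre {
    pub address: IpAddr,
    pub qcmp_port: u16,
}

impl Datacentre {
    /// The socket address a proxy would target for QCMP latency measurement.
    pub fn qcmp_addr(&self) -> SocketAddr {
        SocketAddr::new(self.address, self.qcmp_port)
    }
}

/// The two deployment modes described in this issue (hypothetical type).
pub enum ControlPlane {
    /// A relay knows about every agent connected to it.
    Relay { agents: Vec<Datacentre> },
    /// A single cluster control plane only knows about itself.
    SingleCluster { own: Datacentre },
}

impl ControlPlane {
    /// Datacentres to advertise to a proxy in a discovery response.
    pub fn datacentres(&self) -> Vec<Datacentre> {
        match self {
            // Relay: send all agents that are connected to it.
            ControlPlane::Relay { agents } => agents.clone(),
            // Single cluster: return its own IP and QCMP port.
            ControlPlane::SingleCluster { own } => vec![own.clone()],
        }
    }
}
```

A proxy receiving this list would call `qcmp_addr()` on each entry to get the address to ping.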

@XAMPPRocky XAMPPRocky added the kind/feature New feature or request label Sep 15, 2023
@markmandel
Member

> For relay deployments, it would send all agents that are connected to it; for single cluster control plane deployments, it would return its own IP and QCMP port.

It could also be an optional element: if it's not there, then the proxy won't check latency or keep a metric of it (for a single cluster, and also if people just don't care 😃, for example if they have separate installs in the same Cloud Region/data centre).

@XAMPPRocky
Collaborator Author

XAMPPRocky commented Sep 18, 2023

I'm not sure I see the value of it being optional. Even if you're hosting in the same datacentre, understanding the latency between hops is important, as latency isn't only dictated by distance. If there is an intra-datacentre issue causing latency spikes (as opposed to inter-datacentre), then this would provide that information, whereas if it's optional then you would be in the dark.

If you don't want that metric, it's easier to just add a filter to your grafana_agent to remove it. Having this information is important for Quilkin to be able to build a network topology on top of it, so that we can accurately assign players to the cluster that is closest to their proxy.
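To make the "closest cluster" idea concrete, here is a minimal sketch, assuming the proxy keeps its latest measured latency per datacentre QCMP address. The function name `closest_datacentre` and the map layout are hypothetical, and a real topology would likely smooth measurements over time rather than use a single sample.

```rust
use std::collections::HashMap;
use std::net::SocketAddr;
use std::time::Duration;

/// Returns the datacentre with the lowest measured latency, or `None` if no
/// measurements exist yet. `measurements` maps a datacentre's QCMP address
/// to the latest latency observed from this proxy (a hypothetical structure).
pub fn closest_datacentre(
    measurements: &HashMap<SocketAddr, Duration>,
) -> Option<SocketAddr> {
    measurements
        .iter()
        // Compare entries by their measured latency.
        .min_by_key(|(_, latency)| **latency)
        .map(|(addr, _)| *addr)
}
```

Without the discovered datacentres there would be nothing to populate this map with, which is why the measurement needs to be on for the topology to work.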

@markmandel
Member

> I'm not sure I see the value of it being optional. Even if you're hosting in the same datacentre, understanding the latency between hops is important, as latency isn't only dictated by distance. If there is an intra-datacentre issue causing latency spikes (as opposed to inter-datacentre), then this would provide that information, whereas if it's optional then you would be in the dark.

That is true. But I also wonder if some people won't want the extra traffic (even though it's minimal).

I tend to err on the side of flexibility. Not a super strong opinion, but just something to consider.

@XAMPPRocky
Collaborator Author

I can understand that. I'm always wary of adding something as an option without a compelling reason to do so, as it adds another variation to test and adds cognitive overhead (you have to know that the feature exists, and how to turn it on).

If someone comes along with a good reason, or we find it adds too much overhead, we should provide a way to disable it. Without that, though, I think it should be included without an option, as it gives you more insight, and having this work done for you makes Quilkin a more compelling product for operators.


2 participants