Add config option to optionally not hit all nodes in a server_group #3

Closed
jacksontj opened this issue Mar 8, 2018 · 3 comments

Comments

@jacksontj
Owner

Separately from the scatter-gather, promxy hits each node in a given server group and merges their results together. In the case where the first result has no "holes" (as defined by the anti-affinity config), it doesn't look at the second response. Right now the query is sent to both for consistent load/performance, but if there were more hosts in the server_group (>2 -- something like 10) then hitting all nodes would be excessive. It seems prudent to add some configs:

  1. parallel server fetch count -- how many servers to send the initial request to
  2. max server fetch count -- the maximum number of servers we'll keep sending the request to while the result still has holes

With these a user could control (1) how many servers to hit and (2) if promxy should hold the request trying to get the data from more servers.
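
To make the two proposed knobs concrete, here is a minimal sketch in Go of how they might interact. Everything in it is an assumption for illustration: `FetchOptions`, `ParallelFetchCount`, `MaxFetchCount`, `Result`, and `fetchWithLimits` are hypothetical names, not promxy options or APIs, and the `complete` callback merely stands in for the "no holes" check described above.

```go
package main

import "sync"

// FetchOptions holds the two knobs from the proposal. Both field names are
// hypothetical; promxy has no such options.
type FetchOptions struct {
	ParallelFetchCount int // servers queried concurrently in the first batch
	MaxFetchCount      int // upper bound on servers queried in total
}

// Result is a stand-in for a query response: timestamp -> value.
type Result map[int64]float64

// clamp restricts v to the range [lo, hi].
func clamp(v, lo, hi int) int {
	if v < lo {
		return lo
	}
	if v > hi {
		return hi
	}
	return v
}

// mergeInto copies points from src into dst, keeping dst's value on conflict.
func mergeInto(dst, src Result) {
	for ts, v := range src {
		if _, ok := dst[ts]; !ok {
			dst[ts] = v
		}
	}
}

// fetchWithLimits queries the first ParallelFetchCount servers concurrently,
// then queries further servers one at a time while the merged result is
// still incomplete and fewer than MaxFetchCount servers have been asked.
// It returns the merged result and the number of servers queried.
func fetchWithLimits(servers []func() Result, opts FetchOptions, complete func(Result) bool) (Result, int) {
	limit := clamp(opts.MaxFetchCount, 0, len(servers))
	initial := clamp(opts.ParallelFetchCount, 0, limit)

	// Initial batch: one goroutine per server, each writing its own slot.
	responses := make([]Result, initial)
	var wg sync.WaitGroup
	for i := 0; i < initial; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			responses[i] = servers[i]()
		}(i)
	}
	wg.Wait()

	merged := Result{}
	for _, r := range responses {
		mergeInto(merged, r)
	}

	// Follow-up requests: only while holes remain and the cap allows.
	queried := initial
	for queried < limit && !complete(merged) {
		mergeInto(merged, servers[queried]())
		queried++
	}
	return merged, queried
}
```

With `ParallelFetchCount: 1` and `MaxFetchCount: 3`, a group of three servers is queried one at a time until `complete` reports true. Setting both to the group size reproduces the current behaviour of hitting every node.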

@linuxtechie

@jacksontj it's not clear from the documentation, but is the option available?

@darcydai

This feature is important for improving query concurrency. A large monitoring system receives a large number of queries from the alerting system and other clients, and we would like to treat the duplicated data as replicas to improve read performance, so I think the hosts in a server_group should not exist only for HA.
We could set initial_request_count > 2 to guarantee HA for queries, and add more duplicates to improve query performance.

@jacksontj
Owner Author

So as of today there are a number of features (e.g. #560) which reduce the servergroups that a given query needs to hit. This issue is specifically about reducing the number of requests within a servergroup.

The main complexity here is that promxy has no idea what "correct" looks like. The merge logic today will basically hit all nodes within a servergroup and merge data (merging the series as well as merging the points within a series if there are holes).
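
As a rough illustration of the two levels of merging mentioned above (series across nodes, and points within a series), here is a simplified sketch in Go. It is not promxy's merge code: it takes a plain union by timestamp rather than using the anti-affinity setting to detect holes, and `Point`, `SeriesSet`, `mergePoints`, and `mergeSets` are hypothetical names.

```go
package main

// Point is a single sample; T is a timestamp, V the value.
type Point struct {
	T int64
	V float64
}

// SeriesSet maps a labelset (rendered as a string) to its points,
// which are assumed to be sorted by timestamp.
type SeriesSet map[string][]Point

// mergePoints merges two timestamp-sorted slices into a new sorted slice.
// Points present in only one input fill the holes of the other; when both
// inputs have the same timestamp, the point from a is kept.
func mergePoints(a, b []Point) []Point {
	out := make([]Point, 0, len(a)+len(b))
	i, j := 0, 0
	for i < len(a) && j < len(b) {
		switch {
		case a[i].T < b[j].T:
			out = append(out, a[i])
			i++
		case a[i].T > b[j].T:
			out = append(out, b[j])
			j++
		default:
			out = append(out, a[i])
			i++
			j++
		}
	}
	out = append(out, a[i:]...)
	out = append(out, b[j:]...)
	return out
}

// mergeSets merges the responses of every node: a labelset seen on any node
// appears in the output, and its points are merged across nodes.
func mergeSets(sets ...SeriesSet) SeriesSet {
	out := SeriesSet{}
	for _, s := range sets {
		for labels, pts := range s {
			out[labels] = mergePoints(out[labels], pts)
		}
	}
	return out
}
```

Note that the merged output is only as complete as the inputs passed to `mergeSets`; nothing in the result indicates whether another node would have contributed more.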

So to highlight the level of issues, here are a couple of scenarios that we'd need to cover:

  • a server has a hole for a given labelset
  • a server is missing a labelset and data

As there is no knowledge of what a complete dataset looks like -- it seems impossible to guarantee that the result is complete/correct without hitting all of the nodes within the group (as "a server" could be the last one we query -- as such missing it would make for an incomplete dataset). Given that promxy is a monitoring/alerting tool it definitely leans on correctness over all else.
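
The argument above can be seen in a toy example. The sketch below, in Go, uses made-up data and hypothetical helper names (`unionLabelsets`, `missingFrom`); it only shows that a result built from a subset of nodes carries no signal that a labelset is absent.

```go
package main

import "sort"

// unionLabelsets returns every labelset reported by at least one node.
func unionLabelsets(nodes ...[]string) map[string]bool {
	seen := map[string]bool{}
	for _, node := range nodes {
		for _, ls := range node {
			seen[ls] = true
		}
	}
	return seen
}

// missingFrom lists, in sorted order, the labelsets present in full but
// absent from partial. In practice full is unknown without querying
// every node, which is exactly the problem.
func missingFrom(partial, full map[string]bool) []string {
	var out []string
	for ls := range full {
		if !partial[ls] {
			out = append(out, ls)
		}
	}
	sort.Strings(out)
	return out
}
```

If nodes A and B both report `x` and `y` while node C also reports `z`, the union of A and B looks complete on its own, and `z` only shows up once C has been queried.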

So, with all of that context I think I'm going to close out this issue as I don't see a way forward with this that doesn't fundamentally compromise the correctness of the subsequent data.

If anyone has other ideas/suggestions feel free to chime in, but for now I'll consider this "won't do".

@jacksontj closed this as not planned on Aug 19, 2023
3 participants