Commit 65718db

query-audit docs
Signed-off-by: Owen Diehl <ow.diehl@gmail.com>
1 parent 1a159f0 commit 65718db

2 files changed: 141 additions, 1 deletion

docs/configuration/config-file-reference.md

Lines changed: 2 additions & 1 deletion
@@ -642,7 +642,8 @@ results_cache:
 # CLI flag: -querier.max-retries-per-request
 [max_retries: <int> | default = 5]
 
-# Perform query parallelisations based on storage sharding configuration and query ASTs.
+# Perform query parallelisations based on storage sharding configuration and
+# query ASTs.
 # CLI flag: -querier.parallelise-shardable-queries
 [parallelise_shardable_queries: <boolean> | default = false]
 ```

docs/operations/query-auditor.md

Lines changed: 139 additions & 0 deletions
@@ -0,0 +1,139 @@
1+
---
2+
title: "Query Auditor (tool)"
3+
linkTitle: "query auditor (tool)"
4+
weight: 1
5+
slug: query-auditor
6+
---
7+
8+
The query auditor is a tool bundled in the Cortex repo but **not** included in the Docker images; it must be built from source. It is primarily useful to those _developing_ Cortex, but it can also help operators in certain scenarios, such as backend migrations.
9+
10+
## How it works
11+
12+
The `query-audit` tool runs a set of queries against two backends that expose the Prometheus read API, generally the `query-frontend` components of two Cortex deployments. It then compares the responses to determine the average difference for each query. It does this by:
13+
- Ensuring the resulting label sets match
14+
- For each label set, ensuring it contains the same number of samples as its pair from the other backend
15+
- For each sample, calculating its difference against its pair from the other backend/label set
16+
- Calculating the average diff per query from the above diffs
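
The four steps above can be sketched roughly as follows. This is an illustrative Python sketch, not the tool's actual Go code. The `Matrix` representation and the helper names `percent_diff` and `average_diff` are hypothetical, and the relative-difference formula is an assumption: the source only says that a per-sample difference is calculated and reported as a percentage.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: a matrix maps a label set (a tuple of
# (name, value) pairs) to its list of (timestamp, value) samples.
Sample = Tuple[float, float]
LabelSet = Tuple[Tuple[str, str], ...]
Matrix = Dict[LabelSet, List[Sample]]


def percent_diff(control: float, test: float) -> float:
    """Relative difference in percent (assumed formula, not taken from the tool)."""
    if control == test:
        return 0.0
    if control == 0:
        return float("inf")
    return abs(test - control) / abs(control) * 100.0


def average_diff(control: Matrix, test: Matrix) -> float:
    # 1. The resulting label sets must match.
    if set(control) != set(test):
        raise ValueError("label sets differ between backends")

    diffs: List[float] = []
    for labels, control_samples in control.items():
        test_samples = test[labels]
        # 2. Each series must have as many samples as its pair.
        if len(control_samples) != len(test_samples):
            raise ValueError(f"sample count differs for {labels}")
        # 3. Diff each sample against its pair from the other backend.
        for (_, c), (_, t) in zip(control_samples, test_samples):
            diffs.append(percent_diff(c, t))

    # 4. Average the per-sample diffs for the query.
    return sum(diffs) / len(diffs) if diffs else 0.0
```

In this sketch a mismatch in label sets or sample counts is treated as a hard error; how the real tool reports such mismatches is not stated here.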
17+
18+
### Limitations
19+
20+
It currently supports only queries with `Matrix` response types, but extending it to `Vector`s should be simple, should the need arise.
21+
22+
### Use cases
23+
24+
- Correctness testing when working on the read path.
25+
- Comparing results from different backends.
26+
27+
### Example Configuration
28+
```yaml
control:
  host: http://localhost:8080/api/prom
  headers:
    "X-Scope-OrgID": 1234

test:
  host: http://localhost:8081/api/prom
  headers:
    "X-Scope-OrgID": 1234

queries:
  - query: 'sum(rate(container_cpu_usage_seconds_total[5m]))'
    start: 2019-11-25T00:00:00Z
    end: 2019-11-28T00:00:00Z
    step_size: 15m
  - query: 'sum(rate(container_cpu_usage_seconds_total[5m])) by (container_name)'
    start: 2019-11-25T00:00:00Z
    end: 2019-11-28T00:00:00Z
    step_size: 15m
  - query: 'sum(rate(container_cpu_usage_seconds_total[5m])) without (container_name)'
    start: 2019-11-25T00:00:00Z
    end: 2019-11-26T00:00:00Z
    step_size: 15m
  - query: 'histogram_quantile(0.9, sum(rate(cortex_cache_value_size_bytes_bucket[5m])) by (le, job))'
    start: 2019-11-25T00:00:00Z
    end: 2019-11-25T06:00:00Z
    step_size: 15m
  # two shardable legs
  - query: 'sum without (instance, job) (rate(cortex_query_frontend_queue_length[5m])) or sum by (job) (rate(cortex_query_frontend_queue_length[5m]))'
    start: 2019-11-25T00:00:00Z
    end: 2019-11-25T06:00:00Z
    step_size: 15m
  # one shardable leg
  - query: 'sum without (instance, job) (rate(cortex_cache_request_duration_seconds_count[5m])) or rate(cortex_cache_request_duration_seconds_count[5m])'
    start: 2019-11-25T00:00:00Z
    end: 2019-11-25T06:00:00Z
    step_size: 15m
```
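
To make the configuration more concrete, here is a rough Python sketch of how one `queries` entry could translate into a Prometheus range query against a configured `host`. It assumes the tool issues standard `/api/v1/query_range` requests relative to `host`; the helper name `query_range_url` is hypothetical, and the tool's actual request construction may differ.

```python
from urllib.parse import urlencode


def query_range_url(host: str, query: str, start: str, end: str, step: str) -> str:
    """Build a Prometheus range-query URL for one config entry.

    Assumption: the backend serves the standard Prometheus HTTP API
    under `host`, so the path is `<host>/api/v1/query_range`.
    """
    params = urlencode({"query": query, "start": start, "end": end, "step": step})
    return f"{host.rstrip('/')}/api/v1/query_range?{params}"


# First entry of the example configuration, sent to the `control` backend.
url = query_range_url(
    "http://localhost:8080/api/prom",
    "sum(rate(container_cpu_usage_seconds_total[5m]))",
    "2019-11-25T00:00:00Z",
    "2019-11-28T00:00:00Z",
    "15m",
)
print(url)
```

The same request would be sent to the `test` host, with each backend's configured `headers` (here `X-Scope-OrgID`) attached.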
68+
69+
### Example Output
70+
71+
Under ideal circumstances, you'll see output like the following:
72+
```
$ go install ./tools/query-audit/ && query-audit -f ~/grafana/tmp/equivalence-config.yaml

0.000000% avg diff for:
    query: sum(rate(container_cpu_usage_seconds_total[5m]))
    series: 1
    samples: 289
    start: 2019-11-25 00:00:00 +0000 UTC
    end: 2019-11-28 00:00:00 +0000 UTC
    step: 15m0s

0.000000% avg diff for:
    query: sum(rate(container_cpu_usage_seconds_total[5m])) by (container_name)
    series: 95
    samples: 25877
    start: 2019-11-25 00:00:00 +0000 UTC
    end: 2019-11-28 00:00:00 +0000 UTC
    step: 15m0s

0.000000% avg diff for:
    query: sum(rate(container_cpu_usage_seconds_total[5m])) without (container_name)
    series: 4308
    samples: 374989
    start: 2019-11-25 00:00:00 +0000 UTC
    end: 2019-11-26 00:00:00 +0000 UTC
    step: 15m0s

0.000000% avg diff for:
    query: histogram_quantile(0.9, sum(rate(cortex_cache_value_size_bytes_bucket[5m])) by (le, job))
    series: 13
    samples: 325
    start: 2019-11-25 00:00:00 +0000 UTC
    end: 2019-11-25 06:00:00 +0000 UTC
    step: 15m0s

0.000000% avg diff for:
    query: sum without (instance, job) (rate(cortex_query_frontend_queue_length[5m])) or sum by (job) (rate(cortex_query_frontend_queue_length[5m]))
    series: 21
    samples: 525
    start: 2019-11-25 00:00:00 +0000 UTC
    end: 2019-11-25 06:00:00 +0000 UTC
    step: 15m0s

0.000000% avg diff for:
    query: sum without (instance, job) (rate(cortex_cache_request_duration_seconds_count[5m])) or rate(cortex_cache_request_duration_seconds_count[5m])
    series: 942
    samples: 23550
    start: 2019-11-25 00:00:00 +0000 UTC
    end: 2019-11-25 06:00:00 +0000 UTC
    step: 15m0s

0.000000% avg diff for:
    query: sum by (namespace) (predict_linear(container_cpu_usage_seconds_total[5m], 10))
    series: 16
    samples: 400
    start: 2019-11-25 00:00:00 +0000 UTC
    end: 2019-11-25 06:00:00 +0000 UTC
    step: 15m0s

0.000000% avg diff for:
    query: sum by (namespace) (avg_over_time((rate(container_cpu_usage_seconds_total[5m]))[10m:]) > 1)
    series: 4
    samples: 52
    start: 2019-11-25 00:00:00 +0000 UTC
    end: 2019-11-25 01:00:00 +0000 UTC
    step: 5m0s
```
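
The `samples` figures above can be sanity-checked by hand. A range query is evaluated at `start`, `start + step`, and so on up to `end`, so a series with data at every step has `(end - start) / step + 1` samples. The Python sketch below shows that arithmetic; the function name `samples_per_series` is hypothetical and not part of the tool.

```python
from datetime import datetime, timedelta


def samples_per_series(start: datetime, end: datetime, step: timedelta) -> int:
    """Evaluation points in [start, end] at a fixed step, both ends inclusive."""
    return (end - start) // step + 1


# Three days at a 15m step: matches `samples: 289` for the single-series query.
three_days = samples_per_series(
    datetime(2019, 11, 25), datetime(2019, 11, 28), timedelta(minutes=15)
)

# Six hours at a 15m step: 13 series at this count match `samples: 325`.
six_hours = samples_per_series(
    datetime(2019, 11, 25, 0), datetime(2019, 11, 25, 6), timedelta(minutes=15)
)

print(three_days, six_hours, 13 * six_hours)
```

Totals below `series` times this figure, such as 25877 for 95 series over three days, suggest that some series lack data at some steps.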