Description
What
The current Cloud ingestion service receives metrics on the CLOUD_URL/v1/metrics/<TEST_REF_ID>
endpoint. Each HTTP request contains a JSON payload with an array of Samples at the root; each Sample contains a data
field whose type is one of the types defined below.
We want to replace this with a new HTTP body format that uses a binary encoding.
Current JSON payload
```json
[
  {
    "type": "<TYPE>",
    "metric": "<NAME>",
    "data": {
      ...
    }
  },
  {
    ...
  }
]
```
Single point
```json
{
  "type": "Point",
  "metric": "vus",
  "data": {
    "time": "%d",
    "type": "gauge",
    "tags": {
      "aaa": "bbb",
      "ccc": "123"
    },
    "value": 999
  }
}
```
Multi points
```json
{
  "type": "Points",
  "metric": "iter_li_all",
  "data": {
    "time": "%d",
    "type": "counter",
    "tags": {
      "test": "mest"
    },
    "values": {
      "data_received": 6789.1,
      "data_sent": 1234.5,
      "iteration_duration": 10000
    }
  }
}
```
Aggregated points
```json
{
  "type": "AggregatedPoints",
  "metric": "http_req_li_all",
  "data": {
    "time": "%d",
    "type": "aggregated_trend",
    "count": 2,
    "tags": {
      "test": "mest"
    },
    "values": {
      "http_req_duration":        {"min": 0.013, "max": 0.123, "avg": 0.068},
      "http_req_blocked":         {"min": 0.001, "max": 0.003, "avg": 0.002},
      "http_req_connecting":      {"min": 0.001, "max": 0.002, "avg": 0.0015},
      "http_req_tls_handshaking": {"min": 0.003, "max": 0.004, "avg": 0.0035},
      "http_req_sending":         {"min": 0.004, "max": 0.005, "avg": 0.0045},
      "http_req_waiting":         {"min": 0.005, "max": 0.008, "avg": 0.0065},
      "http_req_receiving":       {"min": 0.006, "max": 0.008, "avg": 0.007}
    }
  }
}
```
Why
Better efficiency is required at scale. A binary encoding format would reduce the payload size and the hardware requirements of encoding/decoding operations, both on the cloud and on clients.
Non-Goals
- An aggregation algorithm for reducing the dataset of flushed data.
How / Proposals
Create a new Cloud output (v2) that flushes metrics via HTTP requests whose body is serialized with Protobuf.
In summary, an example of the HTTP request:
```http
POST CLOUD_URL/v2/metrics/<TEST_REF_ID> HTTP/1.1
Host: www.example.com
User-Agent: k6
Content-Type: application/x-protobuf
Content-Encoding: snappy
K6-Metrics-Protocol-Version: 2.0
```
To stay closer to the Prometheus implementation, the output has to compress the body using the Snappy algorithm.
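The request construction described above can be sketched as follows. The URL pattern and header names come from the example request in this document; the function name is hypothetical, and the Snappy compression itself is left to the caller (in a real implementation the body would be the Protobuf message compressed with something like github.com/golang/snappy, which is outside this stdlib-only sketch).

```go
package main

import (
	"bytes"
	"fmt"
	"net/http"
)

// newMetricsRequest builds the v2 flush request. compressed is assumed to
// already be a Snappy-compressed Protobuf payload prepared by the caller.
func newMetricsRequest(cloudURL, testRefID string, compressed []byte) (*http.Request, error) {
	url := fmt.Sprintf("%s/v2/metrics/%s", cloudURL, testRefID)
	req, err := http.NewRequest(http.MethodPost, url, bytes.NewReader(compressed))
	if err != nil {
		return nil, err
	}
	// Headers taken from the example request above.
	req.Header.Set("User-Agent", "k6")
	req.Header.Set("Content-Type", "application/x-protobuf")
	req.Header.Set("Content-Encoding", "snappy")
	req.Header.Set("K6-Metrics-Protocol-Version", "2.0")
	return req, nil
}

func main() {
	req, err := newMetricsRequest("https://www.example.com", "1234", []byte{})
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.String())
}
```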
A Protobuf schema inspired by OpenMetrics is used for encoding the request body.
EDIT: the schema, after several iterations: https://github.com/grafana/k6/blob/0cddc417243fd152f0a2e532b1870fa6d8635d03/output/cloud/expv2/pbcloud/metric.proto
TODO: Use an HDR histogram implementation for mapping the Trend type.
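The idea behind the HDR mapping is to trade exact Trend values for bounded-error counts per exponential bucket. The toy sketch below illustrates only that idea with power-of-two buckets; it is not the bucketing scheme k6 ended up using, and both function names are hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// bucketIndex maps a value to a coarse exponential (power-of-two) bucket.
// Illustrative only: real HDR histograms use finer sub-bucketing to bound
// the relative error.
func bucketIndex(v float64) int {
	if v < 1 {
		return 0
	}
	return 1 + int(math.Log2(v))
}

// histogram counts observations per bucket, which is what would be
// serialized instead of the raw Trend samples.
func histogram(values []float64) map[int]uint64 {
	counts := make(map[int]uint64)
	for _, v := range values {
		counts[bucketIndex(v)]++
	}
	return counts
}

func main() {
	fmt.Println(histogram([]float64{0.5, 1, 2, 3, 100}))
}
```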
It is a requirement to add the metric name and the test run ID as part of the tag set. The output has to add:
```
metrics.<metric>.tags["__name__"] = "<metric-name>"
metrics.<metric>.tags["test_run_id"] = "<test-ref-id>"
```
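The tag-injection rule above can be sketched as a small helper. The tag keys `__name__` and `test_run_id` come from this document; the function name and copy-on-write behavior are illustrative assumptions, not the k6 implementation.

```go
package main

import "fmt"

// withRequiredTags returns a copy of tags extended with the two entries the
// cloud backend requires. It copies so the caller's map is left untouched.
func withRequiredTags(tags map[string]string, metricName, testRefID string) map[string]string {
	out := make(map[string]string, len(tags)+2)
	for k, v := range tags {
		out[k] = v
	}
	out["__name__"] = metricName
	out["test_run_id"] = testRefID
	return out
}

func main() {
	tags := withRequiredTags(map[string]string{"test": "mest"}, "vus", "1234")
	fmt.Println(tags["__name__"], tags["test_run_id"])
}
```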
Additional implementation details
Gate the new Cloud output's startup behind a config option, keeping it separate from the current Cloud output. This way we can switch the used output at runtime and fall back to the previous logic when required.
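The config-gated fallback can be sketched as a runtime selection between the two implementations. The interface and names below are hypothetical simplifications; the real k6 output interface is richer and the flag name is an assumption.

```go
package main

import "fmt"

// Output is a minimal stand-in for the surface both versions share.
type Output interface{ Description() string }

type v1Output struct{}

func (v1Output) Description() string { return "cloud (v1)" }

type v2Output struct{}

func (v2Output) Description() string { return "cloud (v2)" }

// newCloudOutput picks the implementation from a config flag at runtime,
// falling back to v1 when the flag is unset.
func newCloudOutput(useV2 bool) Output {
	if useV2 {
		return v2Output{}
	}
	return v1Output{}
}

func main() {
	fmt.Println(newCloudOutput(false).Description())
	fmt.Println(newCloudOutput(true).Description())
}
```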
Action Plan
- Quick and dirty implementation of a basic Cloud output v2
- Config option for using v2, and the related fallback logic in v1
- Ability to flush metric samples encoded as defined by the new protocol
  - Initially without a Trend implementation
  - Then Trend implemented as an HDR histogram
- Reiterate for polish and stability
Future
- Better metrics aggregation (e.g. Cloud Aggregation for Counter, Gauge and Rate #1700)
- Consider a full Prometheus Remote-write implementation
- Add the Origin (e.g. Builtin or Custom) of the metrics (at the moment the cloud backend has a fixed list of the Builtin metrics)
Open Questions
- HDR format in protobuf (done)
Work log
This collects all the tasks required for the new cloud output, which includes a substantial refactor, a new binary format for the metrics requests' payload, samples aggregation, and HDR histogram generation on the client.
It depends on the following PRs as a prerequisite:
- cloud/v1: Aggregation in a dedicated pkg #3063
- output/cloud: Versioning #3041 Support multiple versions of the output
The following PRs are expected to be merged to have the final working output:
- cloud: New output v2 #3072 Experimental v2 Output foundations
- output/cloudv2: Aggregation #3071 Samples aggregation
- output/cloudv2: Binary-based payload #2963 Protobuf models and client
- output/cloudv2: Error handling for flush #3082 Handle errors on flush
- output/cloudv2: Flush the aggregated metrics #3083 Flushing of the aggregates
- output/cloudv2: Trend as Histogram #3027 HDR Histogram
- output/cloudv2: Optimized metric sinks #3085 Dedicated and optimized sinks
- output/cloudv2: Use unix nano as bucket time #3098 Bucket time as unix nano