
Agent remote configuration #76

Closed
@jalvz

Overview

Following #4, agents need to be able to poll apm-server for configuration changes received upstream from Kibana, apply them, and log the result with status (success | failure), failure cause (if any), timestamp, setting name and value.

We will start by supporting TRANSACTION_SAMPLE_RATE.

Requirements

At minimum, agents need to agree on:

  • name and default value for config polling interval setting (RUM can have a different default).
  • exact message pattern and log level.
  • top 2-5 settings that should follow sampling rate, in order of priority.

APM Server API

  • Server will expose a /config/v1/agents endpoint, for agents to GET with a service.name URL query parameter (required), and service.environment (optional)

  • Agents may send an If-None-Match header with the request. The Server will respond either with 304 Not Modified, or with 200, a response body containing the configuration, and an Etag header.

  • Example
    curl -v -H "If-None-Match:1" "http://localhost:8200/config/v1/agents?service.environment=prod&service.name=opbeans"

*   Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to localhost (127.0.0.1) port 8200 (#0)
> GET /config/v1/agents?service.environment=prod&service.name=opbeans HTTP/1.1
> Host: localhost:8200
> User-Agent: curl/7.61.0
> Accept: */*
> If-None-Match:1
> 
< HTTP/1.1 200 OK
< Cache-Control: max-age=0
< Content-Type: application/json
< Etag: 2
< Date: Thu, 11 Apr 2019 14:08:32 GMT
< Content-Length: 27
< 
{
  "transaction_sample_rate": 0.7
}
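
The exchange above can be sketched from the agent's side as follows. This is only an illustration of the If-None-Match / Etag flow described in this issue, not any agent's actual implementation: the function names are hypothetical, headers are assumed to arrive as a plain dict keyed exactly as "Etag", and keeping the previous state on any other status code is an assumption the issue does not settle.

```python
import json
from urllib.parse import urlencode


def build_config_url(server_url, service_name, environment=None):
    """Build the GET URL for the /config/v1/agents endpoint."""
    params = {"service.name": service_name}  # required
    if environment is not None:
        params["service.environment"] = environment  # optional
    return f"{server_url.rstrip('/')}/config/v1/agents?{urlencode(params)}"


def handle_config_response(status, headers, body, current_config, current_etag):
    """Return the (config, etag) pair an agent would hold after one poll."""
    if status == 304:
        # Not modified: keep the configuration and Etag we already have.
        return current_config, current_etag
    if status == 200:
        # New configuration: parse the body and remember the new Etag,
        # to be sent as If-None-Match on the next request.
        return json.loads(body), headers.get("Etag")
    # Any other status: keep the last known state (assumption).
    return current_config, current_etag
```

An agent would send the stored Etag value in the If-None-Match header of its next request, so an unchanged configuration costs only a 304.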

Update 23/04

Configuration setting names are derived from the environment variable names: drop the ELASTIC_APM_ prefix and lowercase the rest (for example, ELASTIC_APM_TRANSACTION_SAMPLE_RATE becomes transaction_sample_rate).
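
A minimal sketch of that naming rule, with a hypothetical helper name:

```python
PREFIX = "ELASTIC_APM_"


def to_config_key(env_var_name):
    """Map an environment variable name to its remote config setting name."""
    if not env_var_name.startswith(PREFIX):
        raise ValueError(f"{env_var_name!r} does not start with {PREFIX!r}")
    # Strip the prefix, then lowercase what is left.
    return env_var_name[len(PREFIX):].lower()
```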

Update 06/05

As per #76 (comment), the Server will accept query parameters both in the URL and in the body of a POST request.
If different values for the same attribute are provided in the body and in the URL, the request will be rejected with a 400; different attributes will be merged.
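
The merge rule can be illustrated like this. It is a sketch of the behaviour described above, not the Server's code: both parameter sources are assumed to be already decoded into flat dicts, and the exception name is hypothetical (a real handler would translate it into the 400 response).

```python
class ConflictingParams(Exception):
    """Raised when URL and body disagree on an attribute (maps to a 400)."""


def merge_params(url_params, body_params):
    """Merge URL and POST-body parameters, rejecting conflicting values."""
    merged = dict(url_params)
    for key, value in body_params.items():
        if key in merged and merged[key] != value:
            # Same attribute, different values: reject the request.
            raise ConflictingParams(key)
        # New attribute, or the same value repeated: keep it.
        merged[key] = value
    return merged
```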

Other notes that slipped in the initial description:

  • As pointed out in Agent remote configuration #76 (comment), agents should also align their error-handling behaviour. For instance, if a config update can't be applied, should the agent fall back to the last good value, to the agent's default value, or to the value that the process started with?

  • Regarding service.environment, if none is passed in the query, only config updates without service environment will match. Likewise, if one is passed, only config updates with that value will match (and not config updates without value). In other words, a missing service environment is treated like any other (with a value of "" if you want to see it that way).
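
The service.environment matching rule amounts to an exact comparison in which a missing value counts as the empty string. A small sketch, with a hypothetical function name:

```python
def environment_matches(config_env, requested_env):
    """Exact match, treating a missing environment as the empty string."""
    return (config_env or "") == (requested_env or "")
```

So a config saved without an environment matches only requests that pass none, and a config saved for "prod" matches only requests that pass "prod".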

Update 29/05

  • The Server will send all attributes as strings.
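
Since values arrive as strings, agents have to parse and validate them before applying. A sketch for the sample rate, assuming a valid range of 0.0 to 1.0 (the function name and the choice to raise ValueError are illustrative; what the agent does after a failure is the open error-handling question above):

```python
def parse_sample_rate(raw):
    """Parse a string such as "0.7" into a float between 0.0 and 1.0."""
    value = float(raw)  # raises ValueError for non-numeric input
    if not 0.0 <= value <= 1.0:
        # Also rejects NaN, because comparisons with NaN are False.
        raise ValueError(f"sample rate out of range: {raw!r}")
    return value
```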

Update 01/07

  • The Server caches the agent configuration for 30s by default (configurable in the apm-server config) and sets the expiration time via a Cache-Control: max-age header in every successful response. For failing agent requests the header is set to max-age=300 (5 mins), since querying again after 30s doesn't make sense. We chose 5 mins rather than omitting the header or setting it to 0 so that agents don't need their own retry logic and can tell a server that doesn't support remote config apart from a failed request. More details in [ACM] Optimization / caching apm-server#2220.
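
An agent could derive its next polling delay from that header roughly as follows. This is a sketch: the function name is hypothetical, and the fallback used when the header is missing is an assumption rather than agreed behaviour.

```python
import re


def next_poll_delay(cache_control, default_seconds=30.0):
    """Return the seconds to wait before the next poll.

    Uses max-age from the Cache-Control header when present; otherwise
    falls back to default_seconds (an assumed default).
    """
    if cache_control:
        match = re.search(r"max-age\s*=\s*(\d+)", cache_control, re.IGNORECASE)
        if match:
            return float(match.group(1))
    return default_seconds
```

With this, a successful response carrying max-age=30 is re-polled after 30 seconds and a failing one carrying max-age=300 after 5 minutes, without extra backoff logic in the agent. A header with max-age=0 yields a delay of 0, so a real agent would probably want a lower bound.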

Updates for 7.3

Status

@elastic/apm-agent-devs please link your implementation issues

Let me know if you have any questions.
