
Config folder with many json definition files: Health checks are not correctly registered #3554

Closed
javicrespo opened this issue Oct 8, 2017 · 4 comments
javicrespo commented Oct 8, 2017

consul version for both Client and Server

Client: 1.0.0-beta2
Server: 1.0.0-beta2

consul info for both Client and Server

Client:

agent:
        check_monitors = 0
        check_ttls = 0
        checks = 1
        services = 14
build:
        prerelease = beta2
        revision = edf7a16
        version = 1.0.0
consul:
        known_servers = 1
        server = false
runtime:
        arch = amd64
        cpu_count = 4
        goroutines = 41
        max_procs = 4
        os = windows
        version = go1.9.1
serf_lan:
        coordinate_resets = 0
        encrypted = true
        event_queue = 0
        event_time = 3
        failed = 0
        health_score = 0
        intent_queue = 0
        left = 0
        member_time = 67
        members = 4
        query_queue = 0
        query_time = 1

Server:

agent:
        check_monitors = 0
        check_ttls = 0
        checks = 0
        services = 0
build:
        prerelease = beta2
        revision = edf7a16
        version = 1.0.0
consul:
        bootstrap = true
        known_datacenters = 1
        leader = true
        leader_addr = 10.7.18.50:8300
        server = true
raft:
        applied_index = 67206
        commit_index = 67206
        fsm_pending = 0
        last_contact = 0
        last_log_index = 67206
        last_log_term = 2
        last_snapshot_index = 65569
        last_snapshot_term = 2
        latest_configuration = [{Suffrage:Voter ID:09d7375c-fc34-6008-602e-303e37a8f9c8 Address:10.7.18.50:8300}]
        latest_configuration_index = 1
        num_peers = 0
        protocol_version = 3
        protocol_version_max = 3
        protocol_version_min = 0
        snapshot_version_max = 1
        snapshot_version_min = 0
        state = Leader
        term = 2
runtime:
        arch = amd64
        cpu_count = 4
        goroutines = 89
        max_procs = 4
        os = windows
        version = go1.9.1
serf_lan:
        coordinate_resets = 0
        encrypted = true
        event_queue = 0
        event_time = 3
        failed = 0
        health_score = 0
        intent_queue = 0
        left = 0
        member_time = 73
        members = 4
        query_queue = 0
        query_time = 1
serf_wan:
        coordinate_resets = 0
        encrypted = true
        event_queue = 0
        event_time = 1
        failed = 0
        health_score = 0
        intent_queue = 0
        left = 0
        member_time = 1
        members = 1
        query_queue = 0
        query_time = 1

Operating system and Environment details

Windows 2012 R2

Description of the Issue (and unexpected/desired result)

We use a folder with several JSON files to configure a Consul agent. On one of our DEV servers we have one Consul agent and several services on the same machine. So in the config folder we have a _basic-config.json for the agent configuration, and a JSON file for each service, each one defining its own HTTP API health check.
While testing 1.0.0-beta2, we've realized that only the HTTP API health check of the last (alphabetically ordered) service is registered. All the other services are included in Consul, but the only health check defined for them is Serf's.
So for example:

  • _basic-config.json
  • Service1.json
  • Service2.json

While both Service1 and Service2 are registered in Consul, only Service2.json's health checks are.
With version 0.9.3 all the API health checks were correctly registered for all the services. Using 1.0.0-beta2 for the server and 0.9.3 for the client works as expected.
I've just noticed that all the checks have the same id="api", so I suspect they are being overwritten somewhere here: https://github.com/hashicorp/consul/blob/master/agent/config/builder.go#L211
As the check is defined as a nested entity within the service definition, I'd expect the id to be unique within the service definition, not across the agent(?)

Reproduction steps

Use 1.0.0-beta2 for the Consul client. Create a config folder with at least 2 JSON files, each of them defining a different service with its own HTTP API health check. Only the "last" health check is registered in Consul.
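A minimal pair of such files would look roughly like this (a simplified sketch; service names, ports and check URLs are placeholders for our real ones), with both checks using the same "id": "api":

Service1.json:

{
	"service": {
		"name": "Service1",
		"address": "127.0.0.1",
		"port": 8081,
		"check": {
			"id": "api",
			"http": "http://localhost:8081/health",
			"interval": "10s"
		}
	}
}

Service2.json:

{
	"service": {
		"name": "Service2",
		"address": "127.0.0.1",
		"port": 8082,
		"check": {
			"id": "api",
			"http": "http://localhost:8082/health",
			"interval": "10s"
		}
	}
}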

@magiconair magiconair self-assigned this Oct 8, 2017
@magiconair
Contributor

@javicrespo Can you provide the two service files?

@magiconair
Contributor

I can reproduce this now. I think this is, unfortunately, a consequence of #3179.

Setting an id for a check requires a different field depending on whether you define the check outside of a service or as part of a service. In the first case you use the id field and in the second case you use the CheckID or checkid field.

When the ID of the check is not set, the agent registers the check with the id service:<svcid>[:<checknum>], and my guess is that your checks aren't registered with your id but under service:yourservicename. That way all checks and services get registered.

However, the new config parser is more strict and at the same time provides a stepping stone to fix #3179. It accepts both id and check_id in both cases, which now sets the id and prevents the generation of a unique check id. The fact that only one check gets registered probably also answers the question of whether check ids have to be unique per agent (they do). I'll update the docs for that.

You can reproduce the same behavior with 0.9.3 by using checkid instead of id and setting the same value:

{
	"service": {
		"name": "svc1",
		"address": "127.0.0.1",
		"port": 80,
		"check": {
			"checkid": "api",
			"http": "/path",
			"interval": "10s"
		}
	}
}

{
	"service": {
		"name": "svc2",
		"address": "127.0.0.1",
		"port": 90,
		"check": {
			"checkid": "api",
			"http": "/path",
			"interval": "10s"
		}
	}
}

Therefore, if you provide check ids they have to be unique per agent. I'll update the changelog to make sure that this is clear.
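For example (a sketch based on the two files above; the exact id values are arbitrary as long as they are unique per agent), the check ids could be namespaced with the service name:

{
	"service": {
		"name": "svc1",
		"address": "127.0.0.1",
		"port": 80,
		"check": {
			"checkid": "svc1-api",
			"http": "/path",
			"interval": "10s"
		}
	}
}

with "checkid": "svc2-api" in the second file. Alternatively, leave the id out entirely and let the agent generate service:<svcid>[:<checknum>].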

@javicrespo
Contributor Author

Thanks for the info!
I dropped the check "Id" field and now everything works as expected. We don't need to reference individual checks, so an autogenerated Id is fine.
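For reference, the working definition simply leaves the id out (a simplified sketch of one of our files; name, port and URL are placeholders), so the agent generates service:<svcid> on its own:

{
	"service": {
		"name": "Service1",
		"address": "127.0.0.1",
		"port": 8081,
		"check": {
			"http": "http://localhost:8081/health",
			"interval": "10s"
		}
	}
}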

This paragraph from the doc should be corrected though:

Each type of definition must include a name and may optionally provide an id and notes field. The id is set to the name if not provided. It is required that all checks have a unique ID per node: if names might conflict, unique IDs should be provided.

https://www.consul.io/docs/agent/checks.html

It doesn't quite match what you said above:

When the ID of the check is not set then the agent will register the check with the id service:<svcid>[:<checknum>]

@slackpad
Contributor

Fixed in #3557, including a documentation update.
