Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

service did not deregistered properly #3122

Open
waterdudu opened this issue Jun 8, 2017 · 6 comments
Open

service did not deregistered properly #3122

waterdudu opened this issue Jun 8, 2017 · 6 comments
Labels
theme/consul-nomad Consul & Nomad shared usability theme/internal-cleanup Used to identify tech debt, testing improvements, code refactoring, and non-impactful optimization type/bug Feature does not function as expected type/docs Documentation needs to be created/updated/clarified

Comments

@waterdudu
Copy link

waterdudu commented Jun 8, 2017

I have both consul cluster and nomad cluster running in the same 3 nodes(node-0, node-1, node-2) and a service call goddess-of-harvest running with count of 2, deployed by nomad.

After releasing several versions of that service, I noticed a zombie service occured.

After checking the consul log file, I noticed the service was synced and deregistered infinitely. Below is the consul log.

2017/06/07 17:24:57 [INFO] agent: Synced service '_nomad-executor-4f37671b-4e02-88cd-e445-94fc626f1f8c-goddess-of-harvest-fe-goddess-of-harvest-fe-goddess-of-harvest-fe-http-service'
2017/06/07 17:25:01 [INFO] agent: Deregistered service '_nomad-executor-4f37671b-4e02-88cd-e445-94fc626f1f8c-goddess-of-harvest-fe-goddess-of-harvest-fe-goddess-of-harvest-fe-http-service'
2017/06/07 17:25:02 [INFO] agent: Synced service '_nomad-executor-4f37671b-4e02-88cd-e445-94fc626f1f8c-goddess-of-harvest-fe-goddess-of-harvest-fe-goddess-of-harvest-fe-http-service'
2017/06/07 17:25:06 [INFO] agent: Deregistered service '_nomad-executor-4f37671b-4e02-88cd-e445-94fc626f1f8c-goddess-of-harvest-fe-goddess-of-harvest-fe-goddess-of-harvest-fe-http-service'
2017/06/07 17:25:07 [INFO] agent: Synced service '_nomad-executor-4f37671b-4e02-88cd-e445-94fc626f1f8c-goddess-of-harvest-fe-goddess-of-harvest-fe-goddess-of-harvest-fe-http-service'
2017/06/07 17:25:11 [INFO] agent: Deregistered service '_nomad-executor-4f37671b-4e02-88cd-e445-94fc626f1f8c-goddess-of-harvest-fe-goddess-of-harvest-fe-goddess-of-harvest-fe-http-service'
2017/06/07 17:25:12 [INFO] agent: Synced service '_nomad-executor-4f37671b-4e02-88cd-e445-94fc626f1f8c-goddess-of-harvest-fe-goddess-of-harvest-fe-goddess-of-harvest-fe-http-service'
2017/06/07 17:25:16 [INFO] agent: Deregistered service '_nomad-executor-4f37671b-4e02-88cd-e445-94fc626f1f8c-goddess-of-harvest-fe-goddess-of-harvest-fe-goddess-of-harvest-fe-http-service'
2017/06/07 17:25:17 [INFO] agent: Synced service '_nomad-executor-4f37671b-4e02-88cd-e445-94fc626f1f8c-goddess-of-harvest-fe-goddess-of-harvest-fe-goddess-of-harvest-fe-http-service'
2017/06/07 17:25:21 [INFO] agent: Deregistered service '_nomad-executor-4f37671b-4e02-88cd-e445-94fc626f1f8c-goddess-of-harvest-fe-goddess-of-harvest-fe-goddess-of-harvest-fe-http-service'
2017/06/07 17:25:22 [INFO] agent: Synced service '_nomad-executor-4f37671b-4e02-88cd-e445-94fc626f1f8c-goddess-of-harvest-fe-goddess-of-harvest-fe-goddess-of-harvest-fe-http-service'
2017/06/07 17:25:26 [INFO] agent: Deregistered service '_nomad-executor-4f37671b-4e02-88cd-e445-94fc626f1f8c-goddess-of-harvest-fe-goddess-of-harvest-fe-goddess-of-harvest-fe-http-service'
2017/06/07 17:25:27 [INFO] agent: Synced service '_nomad-executor-4f37671b-4e02-88cd-e445-94fc626f1f8c-goddess-of-harvest-fe-goddess-of-harvest-fe-goddess-of-harvest-fe-http-service'
2017/06/07 17:25:31 [INFO] agent: Deregistered service '_nomad-executor-4f37671b-4e02-88cd-e445-94fc626f1f8c-goddess-of-harvest-fe-goddess-of-harvest-fe-goddess-of-harvest-fe-http-service'
2017/06/07 17:25:32 [INFO] agent: Synced service '_nomad-executor-4f37671b-4e02-88cd-e445-94fc626f1f8c-goddess-of-harvest-fe-goddess-of-harvest-fe-goddess-of-harvest-fe-http-service'
2017/06/07 17:25:36 [INFO] agent: Deregistered service '_nomad-executor-4f37671b-4e02-88cd-e445-94fc626f1f8c-goddess-of-harvest-fe-goddess-of-harvest-fe-goddess-of-harvest-fe-http-service'
2017/06/07 17:25:37 [INFO] agent: Synced service '_nomad-executor-4f37671b-4e02-88cd-e445-94fc626f1f8c-goddess-of-harvest-fe-goddess-of-harvest-fe-goddess-of-harvest-fe-http-service'

I tried to call consul deregister api curl -v --request PUT "http://127.0.0.1:8500/v1/agent/service/deregister/goddess-of-harvest-fe". Nothing happened and the service can still be seen in v1/catalog/service api response.

What was strange, however, the consul log said service does not exist.

2017/06/07 16:59:47 [WARN] agent: Failed to deregister service "goddess-of-harvest-fe": Service does not exist

I also tried to call deregister api curl --request PUT --data @payload.json http://127.0.0.1:8500/v1/catalog/deregister with payload file to deregister the service.

{
  "Datacenter": "dc1",
  "Node": "node-2",
  "ServiceID": "_nomad-executor-4f37671b-4e02-88cd-e445-94fc626f1f8c-goddess-of-harvest-fe-goddess-of-harvest-fe-goddess-of-harvest-fe-http-service"
}

Althought that PUT request returned true, consul catalog api told me the service was still available.

curl -s localhost:8500/v1/catalog/service/goddess-of-harvest-fe | jq .

[
  {
    "ModifyIndex": 28094261,
    "CreateIndex": 28094261,
    "ServiceEnableTagOverride": false,
    "Node": "node-2",
    "Address": "10.0.1.10",
    "TaggedAddresses": {
      "wan": "10.0.1.10",
      "lan": "10.0.1.10"
    },
    "ServiceID": "_nomad-executor-4f37671b-4e02-88cd-e445-94fc626f1f8c-goddess-of-harvest-fe-goddess-of-harvest-fe-goddess-of-harvest-fe-http-service",
    "ServiceName": "demeter-fe",
    "ServiceTags": [
      "demeter-fe",
      "http-service"
    ],
    "ServiceAddress": "10.0.1.10",
    "ServicePort": 38846
  }
]

consul info for both Client and Server

Client:

agent:
	check_monitors = 0
	check_ttls = 0
	checks = 3
	services = 4
build:
	prerelease =
	revision = '21f2d5a
	version = 0.7.5
consul:
	known_servers = 3
	server = false
runtime:
	arch = amd64
	cpu_count = 1
	goroutines = 43
	max_procs = 1
	os = linux
	version = go1.7.5
serf_lan:
	encrypted = false
	event_queue = 0
	event_time = 39
	failed = 0
	health_score = 0
	intent_queue = 0
	left = 0
	member_time = 2039
	members = 35
	query_queue = 0

Server:

agent:
	check_monitors = 0
	check_ttls = 0
	checks = 3
	services = 5
build:
	prerelease =
	revision = 'a9455cd
	version = 0.7.1
consul:
	bootstrap = false
	known_datacenters = 1
	leader = false
	leader_addr = 10.0.1.3:8300
	server = true
raft:
	applied_index = 28168266
	commit_index = 28168266
	fsm_pending = 0
	last_contact = 50.741295ms
	last_log_index = 28168266
	last_log_term = 43
	last_snapshot_index = 28167119
	last_snapshot_term = 43
	latest_configuration = [{Suffrage:Voter ID:10.0.1.2:8300 Address:10.0.1.2:8300} {Suffrage:Voter ID:10.0.1.3:8300 Address:10.0.1.3:8300} {Suffrage:Voter ID:10.0.1.4:8300 Address:10.0.1.4:8300}]
	latest_configuration_index = 4247902
	num_peers = 2
	protocol_version = 1
	protocol_version_max = 3
	protocol_version_min = 0
	snapshot_version_max = 1
	snapshot_version_min = 0
	state = Follower
	term = 43
runtime:
	arch = amd64
	cpu_count = 2
	goroutines = 209
	max_procs = 2
	os = linux
	version = go1.7.3
serf_lan:
	encrypted = false
	event_queue = 0
	event_time = 39
	failed = 0
	health_score = 0
	intent_queue = 0
	left = 0
	member_time = 2039
	members = 35
	query_queue = 0
	query_time = 2459
serf_wan:
	encrypted = false
	event_queue = 0
	event_time = 1
	failed = 0
	health_score = 0
	intent_queue = 0
	left = 0
	member_time = 1
	members = 1
	query_queue = 0
	query_time = 1

My consul version is Consul v0.7.1 and noamd version is Nomad v0.5.4.

Thanks

@slackpad
Copy link
Contributor

slackpad commented Jun 8, 2017

Hi @waterdudu thanks for opening an issue. Are there any messages in the logs on your Consul servers when this is happening on the agent (the dereg/sync loop and when you try to remove it from the catalog manually)?

@slackpad slackpad added type/bug Feature does not function as expected theme/internal-cleanup Used to identify tech debt, testing improvements, code refactoring, and non-impactful optimization labels Jun 8, 2017
@waterdudu
Copy link
Author

Hi @slackpad

Since log level is INFO, there are only a few synced node lines except for the Deregistered service/Synced servcie lines.

2017/06/07 11:36:27 [INFO] agent: Synced node info

@cstruct
Copy link

cstruct commented Oct 10, 2017

Ran into the same issue, only way for me to get rid of the zombies was to stop the consul agent, nuke its data directory and then start it again.

@daghb
Copy link

daghb commented Jun 25, 2019

Looks like this is still an issue - at least on the 1.4.3 version. Also, deregistering a service makes a rendom number of other services turn critical for some seconds until they have been run through the next health check. Deleting the data directory fixes the problem.

@jsosulska jsosulska added the theme/consul-nomad Consul & Nomad shared usability label Apr 17, 2020
@leoowu
Copy link

leoowu commented Sep 29, 2020

Same issue on the version 1.8.3

@jkirschner-hashicorp
Copy link
Contributor

Hello everyone,

My apologies that this is being revisited after so long. Without more information than was available in the provided logs at the INFO level, our ability to guide in troubleshooting was limited.

If any of you are still actively facing issues related to this, please reply. We'd like to understand whether the different commenters were experiencing the same or different issues.

On getting the deregistration to work

One problem I noticed in the original post is that the call to the deregister service endpoint uses the ServiceName (goddess_of_harvest_fe) instead of the ServiceID (_nomad-executor-4f37671b-4e02-88cd-e445-94fc626f1f8c-goddess-of-harvest-fe-goddess-of-harvest-fe-goddess-of-harvest-fe-http-service).

Deregister service unfortunately only works with a service ID, and I can see how that error message would be quite confusing (because a service with that name does exist, just not one with that id).

So...

curl -v --request PUT "http://127.0.0.1:8500/v1/agent/service/deregister/goddess-of-harvest-fe"

... should be ...

curl -v --request PUT "http://127.0.0.1:8500/v1/agent/service/deregister/_nomad-executor-4f37671b-4e02-88cd-e445-94fc626f1f8c-goddess-of-harvest-fe-goddess-of-harvest-fe-goddess-of-harvest-fe-http-service"

It looks like service ID was correctly used in the call to the /v1/catalog/deregister endpoint, but that endpoint isn't recommended. The documentation hints that the agent endpoints for deregistration (/v1/agent/service/deregister) should be used instead, but this isn't very clear. Consul treats the state of the agent as authoritative. Though you removed the service from the catalog, because the service is still registered with the agent, the service will be re-added to the catalog by anti-entropy syncs.

I've marked this issue with the type/docs tag as a reminder to consider clarifying some documentation.

On the origins of the sync/deregister loop

We are not aware of a mechanism purely within Consul that would cause the sync/deregister loop. It is our understanding that nomad must be the cause of re-registering of the service.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
theme/consul-nomad Consul & Nomad shared usability theme/internal-cleanup Used to identify tech debt, testing improvements, code refactoring, and non-impactful optimization type/bug Feature does not function as expected type/docs Documentation needs to be created/updated/clarified
Projects
None yet
Development

No branches or pull requests

7 participants