Provide clearer warning when Name is used when ID is needed instead #10861

jkirschner-hashicorp · 2021-08-17T00:07:21Z

Feature Description

Some entities within Consul have an ID and a Name (e.g., services). When the CLI or an API endpoint for such an entity specifically requires an ID, and will not work with a Name, Consul should provide an error message that makes this mistake clear to the user.

Example taken from #3122: let's say a user has a service with ID web-service-id and Name web-service-name. If the user executes /v1/agent/service/deregister/web-service-name, the following is output to the log:

2017/06/07 16:59:47 [WARN] agent: Failed to deregister service "web-service-name": Service does not exist

That error message is very misleading, as a service with that name does exist... just not a service with that id. The user might reasonably conclude something is wrong with the Consul catalog / state, when the actual problem is that the wrong field was used in the API call. Perhaps if Consul fails to lookup a service with that id, it could lookup whether any services exist with that name and, if some do, output a clearer warning message... such as:

2017/06/07 16:59:47 [WARN] agent: Failed to deregister service "web-service-name": a service ID must be used to deregister a service, not a service name.

This description will be updated to include relevant cases in the list below.

API:

GET /agent/service/:service_id: requires service ID, not name
PUT /agent/service/deregister/:service_id: requires service ID, not name (potentially related issues: Unable to delete unwanted services #9861, service did not deregistered properly #3122, Unable to deregister service if multiple nodes available #9415). See PR Clarify service and health check not found error messages #10894.
PUT /catalog/deregister: requires node ID, not name (user has tried to deregister using the node name in this issue and in the past Zombie agent trapped in catalog #10848)... EDIT: the docs are wrong, node name is required, not ID. Need to update the docs.

Use Case(s)

Ease troubleshooting of any CLI or API endpoints that accept an ID but not a name.

The text was updated successfully, but these errors were encountered:

jkirschner-hashicorp · 2021-08-21T03:40:53Z

@dnephin : I'm not actually sure this is accurate

PUT /catalog/deregister: requires node ID, not name

The docs for deregister do say that the node argument must be "the ID of the node", but I think the code suggests that it's the node name that is used, not node ID. Node name is supposed to be unique within the cluster, so that seems reasonable?

Q: What's your read on whether node name or node ID is used for /catalog/deregister? I can update the docs if needed.

Below is my attempt to trace through the code, though I feel like I went wrong somewhere... (because I ended up in a services table, not a node table)

I think the Node arg from the request gets passed to catalog_endpoint Deregister:

consul/agent/consul/catalog_endpoint.go

Lines 334 to 337 in 997547b

    
           func (c *Catalog) Deregister(args *structs.DeregisterRequest, reply *struct{}) error { 
        
           	if done, err := c.srv.ForwardRPC("Catalog.Deregister", args, reply); done { 
        
           		return err 
        
           	}

then State.NodeService:

consul/agent/consul/catalog_endpoint.go

Line 360 in 997547b

_, ns, err = state.NodeService(args.Node, args.ServiceID, &args.EnterpriseMeta)

That eventually ends up within getNodeServiceTxn and performs a query using node name:

consul/agent/consul/state/catalog.go

Lines 1252 to 1256 in 997547b

    
           service, err := tx.First(tableServices, indexID, NodeServiceQuery{ 
        
           	EnterpriseMeta: *entMeta, 
        
           	Node:           nodeName, 
        
           	Service:        serviceID, 
        
           })

And the query above seems to lead to a table of services:

consul/agent/consul/state/catalog_schema.go

Lines 133 to 135 in 997547b

    
           func servicesTableSchema() *memdb.TableSchema { 
        
           	return &memdb.TableSchema{ 
        
           		Name: tableServices,

dnephin · 2021-09-30T18:34:14Z

Sorry, just catching up on this now. I think the second line you link (state.NodeService) is only when ServiceID != "", and for a node deregister that should be the empty string, so we'd skip that line.

The first place I see the Node field being referenced is in checking the ACL permissions here:

consul/agent/consul/catalog_endpoint.go

Lines 403 to 405 in 01e9740

    
           if authz.NodeWrite(subj.Node, &authzContext) == acl.Allow { 
        
           	return nil 
        
           }

Our acl docs for node rule say it is expecting a node name, and I see other callers passing a node name, so definitely the ACL authorization appears to expect node name (not ID).

After that it goes into raftApply, which ends up in the FSM here:

consul/agent/consul/fsm/commands_oss.go

Line 172 in 01e9740

    
           if err := c.state.DeleteNode(index, req.Node, &req.EnterpriseMeta); err != nil {

And into the state store here:

consul/agent/consul/state/catalog.go

Lines 546 to 548 in 01e9740

    
           // Look up the node. 
        
           node, err := tx.First(tableNodes, indexID, Query{ 
        
           	Value:          nodeName,

This query for tableNodes, indexID is matching the structs.Node.Node field, which is indeed the name.

So it does seem our docs are wrong, this API uses node name, not node ID.

jkirschner-hashicorp · 2021-12-10T16:06:38Z

Regarding:

GET /agent/service/:service_id: requires service ID, not name

The current error message is "Unknown service ID: <service_id>". This makes it clear that an ID is needed, not name.

jkirschner-hashicorp added type/enhancement Proposed improvement or new feature theme/operator-usability Replaces UX. Anything related to making things easier for the practitioner labels Aug 17, 2021

jkirschner-hashicorp mentioned this issue Aug 17, 2021

Unable to delete unwanted services #9861

Closed

jkirschner-hashicorp added the type/umbrella-☂️ Makes issue the "source of truth" for multiple requests relating to the same topic label Aug 19, 2021

jkirschner-hashicorp mentioned this issue Aug 22, 2021

Clarify service and health check not found error messages #10894

Merged

jkirschner-hashicorp mentioned this issue Aug 31, 2021

Unable to deregister service if multiple nodes available #9415

Closed

jkirschner-hashicorp mentioned this issue Oct 1, 2021

Zombie agent trapped in catalog #10848

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Provide clearer warning when Name is used when ID is needed instead #10861

Provide clearer warning when Name is used when ID is needed instead #10861

jkirschner-hashicorp commented Aug 17, 2021 •

edited

Loading

jkirschner-hashicorp commented Aug 21, 2021

dnephin commented Sep 30, 2021 •

edited

Loading

jkirschner-hashicorp commented Dec 10, 2021

Provide clearer warning when Name is used when ID is needed instead #10861

Provide clearer warning when Name is used when ID is needed instead #10861

Comments

jkirschner-hashicorp commented Aug 17, 2021 • edited Loading

Feature Description

Use Case(s)

jkirschner-hashicorp commented Aug 21, 2021

dnephin commented Sep 30, 2021 • edited Loading

jkirschner-hashicorp commented Dec 10, 2021

jkirschner-hashicorp commented Aug 17, 2021 •

edited

Loading

dnephin commented Sep 30, 2021 •

edited

Loading