Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[mdns]: Prevent deadlock when deleting a browse request (IDFGH-13947) #676

Merged

Conversation

arex-ebee
Copy link
Contributor

Description

With v1.3.0 the mDNS component supports continuous browsing for services by calling mdns_browse_new(). In v1.4.0 whenever one wants to stop browsing for a specific service via mdns_browse_delete() there is a deadlock resulting in underlying mdns task to block forever, caused by the API locking introduced with the "Fix API race" commits for v1.4.0.

Background:
API requests are generally serialized into the mdns task - this also applies to the mdns_browse_... calls. The request to stop browsing for a service is put into the mdns task queue and is actually handled by the task via _mdns_execute_action() -> _mdns_browse_finish(). There it performs a linear search through the queue of pending browse requests to find a matching mDNS service/protocol tuple. Upon match it deletes the item via _mdns_browse_item_free() what calls mdns_query_results_free(). The latter is a public API call that tries to acquire the lock semaphore. Unfortunately the initial call to _mdns_execute_action() in the mdns task is already within a locked section. So in result the mdns_query_results_free() method waits endlessly for the semaphore getting released.


Checklist

Before submitting a Pull Request, please ensure the following:

  • 🚨 This PR does not introduce breaking changes.
  • All CI checks (GH Actions) pass.
  • Documentation is updated as needed. (no functional change)
  • Tests are updated or added as necessary. (no functional change)
  • Code is well-commented, especially in complex areas.
  • Git history is clean — commits are squashed to the minimum necessary.

@CLAassistant
Copy link

CLAassistant commented Oct 25, 2024

CLA assistant check
All committers have signed the CLA.

@espressif-bot espressif-bot added the Status: Opened Issue is new label Oct 25, 2024
@github-actions github-actions bot changed the title [mdns]: Prevent deadlock when deleting a browse request [mdns]: Prevent deadlock when deleting a browse request (IDFGH-13947) Oct 25, 2024
@david-cermak david-cermak self-assigned this Oct 25, 2024
Copy link
Collaborator

@david-cermak david-cermak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix!

@david-cermak
Copy link
Collaborator

@gytxxsy PTAL, too

@gytxxsy
Copy link
Contributor

gytxxsy commented Oct 25, 2024

@gytxxsy PTAL, too

I think this is a necessary fix. However, I noticed some behavior changes introduced by the modification. For example, in the API mdns_browse_new, it might directly call _mdns_browse_item_free. This change could lead to the following sequence when other tasks call mdns_browse_new:

mdns_browse_new --> _mdns_browse_item_free --> _mdns_query_results_free

Throughout this process, the mdns lock is not acquired by the task. I'm not sure if this behavior could potentially cause some issues.

@arex-ebee
Copy link
Contributor Author

@gytxxsy PTAL, too

I think this is a necessary fix. However, I noticed some behavior changes introduced by the modification. For example, in the API mdns_browse_new, it might directly call _mdns_browse_item_free. This change could lead to the following sequence when other tasks call mdns_browse_new:

mdns_browse_new --> _mdns_browse_item_free --> _mdns_query_results_free

Throughout this process, the mdns lock is not acquired by the task. I'm not sure if this behavior could potentially cause some issues.

I must admit that I do not fully overview the architecture of the mdns component, but the mdns_browse_new() method doesn't seem to operate on shared data, hence a locking is not necessary in my opinion.

@david-cermak
Copy link
Collaborator

I'm not sure if this behavior could potentially cause some issues.

No issues here, we're just freeing some local data (which weren't posted to the action queue) on exit path.

but the mdns_browse_new() method doesn't seem to operate on shared data, hence a locking is not necessary in my opinion.

Agree.

@gytxxsy Thanks for the comments!
@arex-ebee Thanks for the fix!

@david-cermak david-cermak merged commit a5b0b9d into espressif:master Oct 25, 2024
95 checks passed
@arex-ebee arex-ebee deleted the deadlock-on-mdns_browse_delete-call branch October 28, 2024 06:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants