Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Directly reset callbacks without going through lwip functions (fix #30) #31

Merged
merged 1 commit into from
Feb 25, 2025

Conversation

mathieucarbou
Copy link
Member

fix #30

@mathieucarbou mathieucarbou self-assigned this Feb 17, 2025
@mathieucarbou mathieucarbou requested a review from a team February 17, 2025 13:48
@willmmiles
Copy link

At least as I understood the documentation, LwIP frees the tcp_pcb when _tcp_error returns. Any dereferencing of _pcb in AsyncClient::_error, or any other event callback after that point, is a use-after-free and will result in corrupting memory. This was the major impetus behind the redesign in #21.

@mathieucarbou
Copy link
Member Author

At least as I understood the documentation, LwIP frees the tcp_pcb when _tcp_error returns. Any dereferencing of _pcb in AsyncClient::_error, or any other event callback after that point, is a use-after-free and will result in corrupting memory. This was the major impetus behind the redesign in #21.

@me-no-dev suggested also another approach I will try, which is to immediately cleanup and after send the event in the queue (for the user callback only)

@willmmiles
Copy link

You can skip most of the API cleanup as the pcb is about to be freed; none of the other callbacks will get called again anyways. The key cleanup is to invalidate the close_slot so that any in-flight operation on the async thread won't accidentally try to dereference the now-released pcb.

@mathieucarbou
Copy link
Member Author

mathieucarbou commented Feb 24, 2025

@willmmiles @me-no-dev : just a heads up for this issue / PR:

We had 2 users having the issue (@hosseinaghaie and @moreArti - see #30).

@moreArti was able to test and confirm that the initial fix works. The initial fix was just about cleaning up states without calling the lwip functions.

void AsyncClient::_error(int8_t err) {
  if (_pcb) {
    TCP_MUTEX_LOCK();
    tcp_arg(_pcb, NULL);
    if (_pcb->state == LISTEN) {
      // tcp_sent(_pcb, NULL);
      _pcb->sent = NULL;
      // tcp_recv(_pcb, NULL);
      _pcb->recv = NULL;
      // tcp_err(_pcb, NULL);
      _pcb->errf = NULL;
      // tcp_poll(_pcb, NULL, 0);
      _pcb->poll = NULL;
      _pcb->pollinterval = 0;
    }
    TCP_MUTEX_UNLOCK();
    _free_closed_slot();
    _pcb = NULL;
  }

I have update this PR yesterday to update the fix with something we discussed with @me-no-dev : cleanup earlier within the lwip thread and only do the callbacks from the async_tcp task.

I am not sure about the _free_closed_slot(), if can be done earlier too, or if it has to be executed from the async_tcp task.

Do you know ?

Also, I have asked @moreArti to test again with this new fix.

Thanks everybody!

@mathieucarbou mathieucarbou merged commit 93c6ccd into main Feb 25, 2025
13 checks passed
@mathieucarbou mathieucarbou deleted the issue-30 branch February 25, 2025 21:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Panic in AsyncTCP due to sudden WiFi disconnection
3 participants