Skip to content

feat(websocket): add ability to reconnect after close (IDFGH-15455) #752

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

bryghtlabs-richard
Copy link
Contributor

@bryghtlabs-richard bryghtlabs-richard commented Jan 30, 2025

Description

Add a flag to client configuration to enable reconnection after websocket closed cleanly. Needed to support protocols like Socket.IO, where some websocket closes issued by the server are reconnectable.

Testing

Testing consists of issuing a websocket close from the server, and the client reconnecting.

Checklist

Before submitting a Pull Request, please ensure the following:

  • 🚨 This PR does not introduce breaking changes.
  • All CI checks (GH Actions) pass.
  • Documentation is updated as needed.
  • Tests are updated or added as necessary.
  • Code is well-commented, especially in complex areas.
  • Git history is clean — commits are squashed to the minimum necessary.

Needed to support protocols like Socket.IO, where some
websocket closes issued by the server are reconnectable.
@euripedesrocha
Copy link
Collaborator

Hi @bryghtlabs-richard thanks for your contribution.

Could you reduce the scope of the semaphore take? The reason for that is that we should state a clear behavior in case of the failure of taking the semaphore and with this we isolate the new feature.

@github-actions github-actions bot changed the title feat(websocket): add ability to reconnect after close feat(websocket): add ability to reconnect after close (IDFGH-15455) Jun 11, 2025
@espressif-bot espressif-bot added the Status: Opened Issue is new label Jun 11, 2025
@bryghtlabs-richard
Copy link
Contributor Author

Hi @euripedesrocha ,

Could you reduce the scope of the semaphore take? The reason for that is that we should state a clear behavior in case of the failure of taking the semaphore and with this we isolate the new feature.

I had thought about it, but never found a better way to handle failing to take the semaphore other than letting the websocket close and firing the WEBSOCKET_EVENT_CLOSED hook. I had kept them shared since the !client->config->close_reconnect path and failing to take the semaphore have the same handling, but I can split them up if it's preferred. Something like this?

            if (client->config->close_reconnect) {
                if(xSemaphoreTakeRecursive(client->lock, lock_timeout) == pdPASS) {
                    client->state = WEBSOCKET_STATE_WAIT_TIMEOUT;
                    client->error_handle.error_type = WEBSOCKET_ERROR_TYPE_SERVER_CLOSE;
                    esp_transport_close(client->transport);
                    client->reconnect_tick_ms = _tick_get_ms();
                    ESP_LOGI(TAG, "Reconnect after %d ms", client->wait_timeout_ms);
                    xEventGroupClearBits(client->status_bits, STOPPED_BIT | CLOSE_FRAME_SENT_BIT);
                    xSemaphoreGiveRecursive(client->lock);
                    esp_websocket_client_dispatch_event(client, WEBSOCKET_EVENT_CLOSED, NULL, 0);
                } else {
                    client->run = false;
                    client->state = WEBSOCKET_STATE_UNKNOW;
                    esp_websocket_client_dispatch_event(client, WEBSOCKET_EVENT_CLOSED, NULL, 0);
                    break;
                }
            } else {
                client->run = false;
                client->state = WEBSOCKET_STATE_UNKNOW;
                esp_websocket_client_dispatch_event(client, WEBSOCKET_EVENT_CLOSED, NULL, 0);
                break;
            }

-Richard

@euripedesrocha
Copy link
Collaborator

@bryghtlabs-richard I agree that the repetition doesn't look great, but it opens space for at least log to inform the error. This way we introduce the feature and can differentiate between both options. An alternative would be also introduced a log.
Reconsidering the goal, we can also just introduce a log in the failure scenario. You can leave as is for now, I can inroduce the log later.

Let me take a look on the Ci failures, they seems to be on our side.

@bryghtlabs-richard
Copy link
Contributor Author

Thanks @euripedesrocha , I agree - a failure log in the console would be helpful if we ever cannot take the semaphore.

I also think we should move sending the WEBSOCKET_EVENT_CLOSED so it doesn't happen within the semaphore. Sending WEBSOCKET_EVENT_DATA with the semaphore held had caused a lot of deadlocks for me, and I worry about WEBSOCKET_EVENT_CLOSED also causing them less often.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants