Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WiFiClientSecure doesn't respect timeout in send_ssl_data #7356

Closed
cziter15 opened this issue Oct 14, 2022 · 15 comments
Closed

WiFiClientSecure doesn't respect timeout in send_ssl_data #7356

cziter15 opened this issue Oct 14, 2022 · 15 comments
Labels
Area: BT&Wifi BT & Wifi related issues Status: In Progress Issue is in progress

Comments

@cziter15
Copy link
Contributor

cziter15 commented Oct 14, 2022

No details about the device required as whole ecosystem is affected.

Affected component: WiFiClientSecure, ssl_client

Description

Function send_ssl_data, called by WiFiClientSecure doesn't respect timeout set on socket.

After disconnecting the network (not WiFi directly, just turn off LTE or disconnect network cable), having some small piece of code sending periodically few messages to MQTT broker via SSL, I've encountered freeze inside send_ssl_data function.

After some investigation, it appears to hang inside while loop.

Temporary working solution, but 5000 is a magic constant:

int send_ssl_data(sslclient_context *ssl_client, const uint8_t *data, size_t len)
{
    log_v("Writing HTTP request with % bytes...", len); //for low level debug
    int ret = -1;

    unsigned long send_start_time=millis();
    while ((ret = mbedtls_ssl_write(&ssl_client->ssl_ctx, data, len)) <= 0) {
        if((millis()-send_start_time)>5000)
            return -1;

        if (ret != MBEDTLS_ERR_SSL_WANT_READ && ret != MBEDTLS_ERR_SSL_WANT_WRITE && ret < 0) {
            log_v("Handling error %d", ret); //for low level debug
            return handle_error(ret);
        }
        //wait for space to become available
        vTaskDelay(2);
    }

    return ret;
}

The best way to fix this will be to introduce send_timeout inside sslclient_context and set it to socket timeout (passed in connect method by WiFiClientSecure).

@cziter15 cziter15 added the Status: Awaiting triage Issue is waiting for triage label Oct 14, 2022
@cziter15 cziter15 changed the title WiFiClientSecure don't respect timeout in ssl_send WiFiClientSecure don't respect timeout in send_ssl_data Oct 14, 2022
@cziter15 cziter15 changed the title WiFiClientSecure don't respect timeout in send_ssl_data WiFiClientSecure doesn't respect timeout in send_ssl_data Oct 16, 2022
@cziter15
Copy link
Contributor Author

I've submitted PR which adds socket_timeout to ssl_context and copies timeout value there. Then this socket_timeout variable is used in send_ssl_data as timeout.

@VojtechBartoska VojtechBartoska added Area: BT&Wifi BT & Wifi related issues Status: In Progress Issue is in progress and removed Status: Awaiting triage Issue is waiting for triage labels Oct 21, 2022
@VojtechBartoska
Copy link
Collaborator

@cziter15 Thank you for contribution, we will review your PR soon.

@TD-er
Copy link
Contributor

TD-er commented Nov 7, 2022

Could the timeout (also) be related to this one?
#6676

@cziter15
Copy link
Contributor Author

cziter15 commented Nov 9, 2022

This timeout here is something specific, not related to ms vs second interpretation.

@dsilletti
Copy link

dsilletti commented Nov 20, 2022

@cziter15 could you please provide your fix (socket_timeout)? This is probably the reason why my ESP32 remain stuck if I disconnect the WAN interface of my router, while connected via Wi-Fi to the MQTT broker (using PubSubClient with WiFiClientSecure and publishing every second).

Both the PubSubClient setSocketTimeout(1) and the setKeepAlive(15) do not help, the ESP32 remain stuck for 60s till the watchdog get triggered.

Thanks

@dsilletti
Copy link

dsilletti commented Nov 20, 2022

temporary fixed limiting the loop to 100 retries, finally the MQTT get disconnected and reconnects properly when the WAN is back.
The timer based fix you used (I tried with 3000ms) was not working properly for me, the MQTT was disconnecting but not reconnecting.

int send_ssl_data(sslclient_context *ssl_client, const uint8_t *data, size_t len)
{
    log_v("Writing HTTP request with %d bytes...", len); //for low level debug
    int ret = -1;

    int retry = 0; 

    while ((ret = mbedtls_ssl_write(&ssl_client->ssl_ctx, data, len)) <= 0) {
        
        if (retry > 100) {
            return -1;
        }

        if (ret != MBEDTLS_ERR_SSL_WANT_READ && ret != MBEDTLS_ERR_SSL_WANT_WRITE && ret < 0) {
            log_v("Handling error %d", ret); //for low level debug
            return handle_error(ret);
        }
        //wait for space to become available
        vTaskDelay(2);
       
        retry = retry + 1;
    }

    return ret;
}

@cziter15
Copy link
Contributor Author

cziter15 commented Nov 20, 2022 via email

@dsilletti
Copy link

dsilletti commented Nov 20, 2022

Thanks @cziter15, I had to tune my timers, it works now properly also configuring the timeout in seconds, according to your code.

It should definitely be fixed, probably using the value configured with:

int WiFiClientSecure::setTimeout(uint32_t seconds)

because it's not a timeout in the connection but while sending the data.

@VojtechBartoska could you please prioritize this? All ESP32 devices using WiFiClientSecure get stuck in a while loop if they lose internet connectivity while sending data (if Wi-Fi stays connected). Thanks

@cziter15
Copy link
Contributor Author

cziter15 commented Nov 20, 2022

Yup, I can confirm that behavior. Device is geting stuck inside task and it's not triggering watchdog. Network stack is working as you can still ping the device. Afaik, my PR has been reviewed by @me-no-dev, but @SuGlider has still to review it. Anyway, handshake timeout might be probably tweaked as now it has really huge timeout.

@dsilletti
Copy link

dsilletti commented Nov 20, 2022

my workaround to have a timeout configurable in the sketch using the handshake timeout:

set in the sketch the timeout and the handshake timeout to a reasonable value (in seconds):

WiFiClientSecure espClientSecure;
espClientSecure.setTimeout(3);
espClientSecure.setHandshakeTimeout(3);

then add the timeout in ssl_client.cpp in the send_ssl_data() function like this:

int send_ssl_data(sslclient_context *ssl_client, const uint8_t *data, size_t len)
{
    log_v("Writing HTTP request with %d bytes...", len); //for low level debug
    int ret = -1;

    unsigned long send_start_time=millis(); //start time sending data
    while ((ret = mbedtls_ssl_write(&ssl_client->ssl_ctx, data, len)) <= 0) {   
        if (ret != MBEDTLS_ERR_SSL_WANT_READ && ret != MBEDTLS_ERR_SSL_WANT_WRITE && ret < 0) {
            log_v("Handling error %d", ret); //for low level debug
            return handle_error(ret);
        }
        if ((millis()-send_start_time)>ssl_client->handshake_timeout) //timeout configured to handshake timeout
            return -1;
        //wait for space to become available
        vTaskDelay(2);
    }

    return ret;
}

at least the handshake timeout is accessible directly using ssl_client->handshake_timeout with no other changes required

@cziter15
Copy link
Contributor Author

cziter15 commented Nov 20, 2022

There's setHandshakeTimeout function if you want to manipulate that value in ssl client ctx. Anyway, i've issued PR which uses socket timeout (new field in ssl client struct). Don't know if handshake needs specific timeout, maybe all these things might be handled using socket timeout. But that requires to be discussed consciously.

@dsilletti
Copy link

dsilletti commented Nov 20, 2022

Sure, mine it's just a temporary workaround.

@cziter15
Copy link
Contributor Author

cziter15 commented Nov 20, 2022

Check out my fork, I'll keep it up to date until this issue is fixed on master of arduino-esp32. I've issued PR with these changes.

If you use platformio you can easily replace framework source to use my github fork. It will automatically clone repository and will work smoothly in github actions.

There's also another fix inside, for getHostByName function, which checks if passed string is an IP address and instead of triggering dns flow in that case it just converts passed string ip to IPAddress structure.

https://github.com/cziter15/arduino-esp32

@dsilletti
Copy link

cool, thanks!

@cziter15
Copy link
Contributor Author

cziter15 commented Dec 30, 2022

Closing issue. Fixed in 2.0.6 release by PR #7351

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area: BT&Wifi BT & Wifi related issues Status: In Progress Issue is in progress
Projects
None yet
Development

No branches or pull requests

4 participants