
WiFi not recovering from wait for available TCP send buffer #118

Closed
@GregTerrell

Description


While developing a solution based on Azure IoT Hub, I detected a problem where the WiFi driver would appear locked and the WINC activity light would stay on solid. For testing purposes I switched to the remote_monitoring example project (from Microsoft, in the Azure-Samples/iot-hub-c-m0wifi-getstartedkit GitHub repository). For reference, I was using an Adafruit Feather M0 WiFi development board. The issue typically took 5-8 hours of running to appear, but occasionally happened shortly after the test started.

The issue was triggered by the azure-iot-arduino library's event send function. When an event is sent to IoT Hub, azure-iot-arduino performs 18 sends to deliver the event over HTTP (consisting of the HTTP request, the headers and the body; this can be improved, see the workaround below). After 16 of the 18 sends, the WINC code in hif_send() (in m2m_hif.c) will typically start to loop waiting for dma_addr != 0. The loop exits when a send callback is received or after 1000 requests for a dma_addr. So effectively this should implement a TCP sliding window of 16 frames. If hif_send() times out (the for-loop reaches 1000 dma_addr requests), the layer above (WiFiClient::write() in WiFiClient.cpp) tries to send again in an infinite loop, waiting for a buffer (aka dma_addr). I found that once the buffer wait starts, it takes about 160-180 loops before send callbacks start arriving and buffers open up for sending. In my test the events were 10 seconds apart, so all 16 send buffers would be available by the next event.
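To make the flow above easier to follow, here is a small stand-alone model in C. It is only a sketch of the behaviour as I have described it, not the WINC driver code: every name in it (model_window, model_hif_send, model_write, model_send_event) is made up for illustration, the cap on retries exists only so the model terminates (the real retry loop has no cap), and it assumes for simplicity that all outstanding callbacks are processed at the same moment.

```c
#include <stdbool.h>

#define MODEL_WINDOW_SLOTS 16    /* send buffers, as observed in this issue */
#define MODEL_HIF_POLLS    1000  /* dma_addr requests per hif_send() attempt */

/* Hypothetical model state; not a structure from the WINC driver. */
typedef struct {
    int  in_flight;       /* sends whose callback has not been processed yet */
    long polls_until_cb;  /* full-window polls before callbacks arrive; negative = never */
} model_window;

/* Models one hif_send() attempt: poll for a buffer up to MODEL_HIF_POLLS times. */
static bool model_hif_send(model_window *w)
{
    for (int poll = 0; poll < MODEL_HIF_POLLS; poll++) {
        if (w->in_flight < MODEL_WINDOW_SLOTS) {
            w->in_flight++;          /* buffer obtained; this send is now outstanding */
            return true;
        }
        if (w->polls_until_cb > 0) {
            w->polls_until_cb--;     /* still waiting for the send callbacks */
        }
        if (w->polls_until_cb == 0) {
            w->in_flight = 0;        /* simplification: all callbacks processed at once */
        }
    }
    return false;                    /* timed out without getting a buffer */
}

/* Models the retry loop in the layer above hif_send(). The loop described in
 * this issue has no upper bound; max_attempts exists only so the model stops.
 * Returns the number of attempts used, or -1 if the cap was reached. */
static int model_write(model_window *w, int max_attempts)
{
    for (int attempt = 1; attempt <= max_attempts; attempt++) {
        if (model_hif_send(w)) {
            return attempt;
        }
    }
    return -1;
}

/* Sends `count` chunks back to back; returns how many went out before a
 * write gave up. */
static int model_send_event(model_window *w, int count, int max_attempts)
{
    int sent = 0;
    while (sent < count && model_write(w, max_attempts) > 0) {
        sent++;
    }
    return sent;
}
```

Starting from `{0, 170}` (a delay picked from the 160-180 range above), an 18-send event completes in this model. Starting from `{0, -1}`, where the callbacks never arrive, it stops after 16 sends with the window still full, which is the point where the uncapped real loop would spin forever.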

All good, most of the time. But it seems that under some condition (usually hours into the test) the send callback for any of the 16 outstanding TCP sends is never processed, so a needed buffer is never freed and consequently the send window is never re-opened. The WINC appears locked up with a solid activity light while the WiFi code endlessly loops waiting for a buffer.

Does this appear to be a WINC firmware/code issue to report to Atmel (and if so, how does one report this to Atmel)? The WiFi code at this point is pure Atmel reference code, based on their Atmel-42420-WINC1500-Software-Design-Guide_UserGuide document.

Workaround (implemented in azure-iot-hub):
Within the azure-iot-hub library (adapter\httpapi_compact.c, SendHeadsToXIO() ), consolidate the separate TCP sends for each header and its trailing CRLF into a single send of one string containing the header text and the terminating CRLF. This reduces the number of TCP sends for an event posting from 18 to 11 (this could be reduced to 10 with further refactoring). This keeps the WINC from ever hitting the 16 buffer limit. Prior to implementing this workaround, I could never exceed 8 hours of run time before lockup; my test has now been executing for 40 hours.

The workaround above seems to have fixed my use case (now running for 36 hours; it never made it past 8 hours before the workaround), but other applications that send more frequently are likely to fail without such an easily implemented workaround.
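For anyone wanting to try the same idea, the header consolidation can be sketched roughly as follows. This is not the actual change to SendHeadsToXIO(); the function names, the 256-byte line buffer and the demo_send() stub (which only records calls instead of touching the network) are all assumptions made for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the transport send: records the call instead of
 * sending anything. */
static int  g_send_calls = 0;
static char g_last_sent[256];

static int demo_send(const char *data, size_t len)
{
    if (len >= sizeof g_last_sent) {
        return -1;                       /* too large for the demo recorder */
    }
    memcpy(g_last_sent, data, len);
    g_last_sent[len] = '\0';
    g_send_calls++;
    return 0;
}

/* Before: the header text and its CRLF go out as two separate sends. */
static int send_header_split(const char *header)
{
    if (demo_send(header, strlen(header)) != 0) {
        return -1;
    }
    return demo_send("\r\n", 2);
}

/* After: the header and its CRLF are joined first, then sent once. */
static int send_header_joined(const char *header)
{
    char line[256];                      /* assumed upper bound on one header line */
    int n = snprintf(line, sizeof line, "%s\r\n", header);
    if (n < 0 || (size_t)n >= sizeof line) {
        return -1;                       /* header would not fit in the buffer */
    }
    return demo_send(line, (size_t)n);
}
```

Calling send_header_split() costs two sends per header, while send_header_joined() costs one, which is where the saving in sends per event comes from. The price is a temporary buffer large enough for the longest header line.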

Thanks,
Greg

Labels: type: imperfection (Perceived defect in any part of project)
