Long Running Futures Eventually Stall #3

Open
@fko-kuptec

I am currently developing firmware for the ESP that needs to serve the camera feed as an HTTP stream. For testing purposes, I reduced that further to serving a simple JPEG animation. I initially used edge-executor::LocalExecutor to run my custom async HTTP server, handling every connection in a separate future spawned onto the same thread-local executor. One stream to one client is therefore one (possibly infinitely) running async task.

Unfortunately, I ran into some weird issues: when running multiple streams in parallel, any stream might freeze randomly and never come back to life. Sometimes, the whole ESP crashes with an unhandled exception. It does not seem to be the same exception every time, but here is one:

Guru Meditation Error: Core  0 panic'ed (Unhandled debug exception). 
Debug exception reason: BREAK instr 
Core  0 register dump:
PC      : 0x420171d7  PS      : 0x00060b36  A0      : 0x82010e96  A1      : 0x3fccf060  
0x420171d7 - alloc::task::raw_waker::clone_waker
    at ??:??
A2      : 0x3fcd1154  A3      : 0x3fcd1154  A4      : 0x00000000  A5      : 0x3fccf3a0  
A6      : 0x3fc9ade8  A7      : 0x3fca12dc  A8      : 0xfefefefe  A9      : 0xffffffff  
0x3fc9ade8 - xKernelLock
    at ??:??
0x3fca12dc - pxReadyTasksLists
    at ??:??
A10     : 0xfefefefe  A11     : 0xfefefefe  A12     : 0xfefefefe  A13     : 0x00060b23  
A14     : 0xfffffffe  A15     : 0x0000cdcd  SAR     : 0x00000020  EXCCAUSE: 0x00000001  
EXCVADDR: 0x00000000  LBEG    : 0x40056f5c  LEND    : 0x40056f72  LCOUNT  : 0xffffffff  


Backtrace: 0x420171d4:0x3fccf060 0x42010e93:0x3fccf080 0x4201800a:0x3fccf9c0 0x4202bb7f:0x3fccf9f0 0x42030838:0x3fccfa10
0x420171d4 - alloc::task::raw_waker::clone_waker
    at ??:??
0x42010e93 - std::sys_common::backtrace::__rust_begin_short_backtrace
    at ??:??
0x4201800a - core::ops::function::FnOnce::call_once{{vtable.shim}}
    at ??:??
0x4202bb7f - std::sys::pal::unix::thread::Thread::new::thread_start
    at ??:??
0x42030838 - pthread_task_func
    at ??:??

At some point I tried a different executor. After replacing edge-executor with futures-executor::LocalPool, I could no longer reproduce these issues. The streams seem to keep running without problems. Then I wrote my own version of LocalExecutor from scratch without using any third-party crates, and this also seems to work fine.
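For readers who want to picture the setup, here is a minimal sketch of a single-threaded executor built only on the standard library, with one task per simulated client. It is not the from-scratch executor mentioned above, nor the edge-executor or futures-executor API: `TinyLocalExecutor`, `YieldNow`, and `run_streams` are hypothetical names, and the "stream" is just a counter that is incremented once per frame.

```rust
use std::cell::RefCell;
use std::collections::VecDeque;
use std::future::Future;
use std::pin::Pin;
use std::rc::Rc;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};

// Queue of task ids that are ready to be polled. It sits behind Arc<Mutex<..>>
// because a Waker must be Send + Sync even when all tasks stay on one thread.
type RunQueue = Arc<Mutex<VecDeque<usize>>>;

struct TaskWaker {
    id: usize,
    queue: RunQueue,
}

impl Wake for TaskWaker {
    fn wake(self: Arc<Self>) {
        // Waking a task means putting its id back on the run queue.
        self.queue.lock().unwrap().push_back(self.id);
    }
}

// Hypothetical name; an illustration only, not a real crate's API.
pub struct TinyLocalExecutor {
    tasks: Vec<Option<Pin<Box<dyn Future<Output = ()>>>>>,
    queue: RunQueue,
}

impl TinyLocalExecutor {
    pub fn new() -> Self {
        Self { tasks: Vec::new(), queue: Arc::new(Mutex::new(VecDeque::new())) }
    }

    // Futures need not be Send because they never leave this thread.
    pub fn spawn(&mut self, fut: impl Future<Output = ()> + 'static) {
        let id = self.tasks.len();
        self.tasks.push(Some(Box::pin(fut)));
        self.queue.lock().unwrap().push_back(id);
    }

    // Polls ready tasks until the queue is empty; returns the number of unfinished tasks.
    pub fn run_until_stalled(&mut self) -> usize {
        loop {
            let next = self.queue.lock().unwrap().pop_front();
            let Some(id) = next else { break };
            let Some(fut) = self.tasks[id].as_mut() else { continue };
            let waker = Waker::from(Arc::new(TaskWaker { id, queue: self.queue.clone() }));
            let mut cx = Context::from_waker(&waker);
            if fut.as_mut().poll(&mut cx).is_ready() {
                self.tasks[id] = None; // drop the finished future
            }
        }
        self.tasks.iter().filter(|t| t.is_some()).count()
    }
}

// A future that yields to the executor once, standing in for "wait for the next frame".
struct YieldNow(bool);

impl Future for YieldNow {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            return Poll::Ready(());
        }
        self.0 = true;
        cx.waker().wake_by_ref(); // ask to be polled again
        Poll::Pending
    }
}

// One task per simulated client; each "sends" `frames` frames and yields in between.
pub fn run_streams(clients: usize, frames: usize) -> Vec<usize> {
    let sent = Rc::new(RefCell::new(vec![0usize; clients]));
    let mut exec = TinyLocalExecutor::new();
    for client in 0..clients {
        let sent = Rc::clone(&sent);
        exec.spawn(async move {
            for _ in 0..frames {
                sent.borrow_mut()[client] += 1;
                YieldNow(false).await;
            }
        });
    }
    let unfinished = exec.run_until_stalled();
    assert_eq!(unfinished, 0);
    let counts = sent.borrow().to_vec();
    counts
}
```

Calling `run_streams(3, 5)` drives three interleaved tasks to completion on one thread. A real firmware executor would block on an interrupt or notification when the queue is empty instead of returning, which this sketch leaves out.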

This makes me believe that edge-executor has a bug... somewhere. I don't really have a clue where it is. I just found this open issue in async-task that talks about tasks randomly not getting rescheduled. That would fit my observations, at least.
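To make the "not rescheduled" symptom concrete: an executor normally polls a task again only after that task's waker has fired, so a wake-up that gets lost anywhere along the way leaves the task pending forever, which from the outside looks like a frozen stream. The sketch below only illustrates that mechanism with the standard library. It does not reproduce the async-task issue and makes no claim about where a bug in edge-executor might be. `CountingWaker`, `AlwaysPending`, and `wakeups_after_one_poll` are hypothetical names.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// A waker that only counts how often it was asked to reschedule the task.
struct CountingWaker(AtomicUsize);

impl Wake for CountingWaker {
    fn wake(self: Arc<Self>) {
        self.0.fetch_add(1, Ordering::SeqCst);
    }
}

// A future that is never ready. With `registers_waker == false` it models a
// lost wake-up: it returns Pending but nobody will ever wake the task.
struct AlwaysPending {
    registers_waker: bool,
}

impl Future for AlwaysPending {
    type Output = ();
    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.registers_waker {
            cx.waker().wake_by_ref();
        }
        Poll::Pending
    }
}

// Polls the future once and reports how many wake-ups the waker observed.
// Zero wake-ups means an executor would have no reason to poll the task again.
pub fn wakeups_after_one_poll(registers_waker: bool) -> usize {
    let counter = Arc::new(CountingWaker(AtomicUsize::new(0)));
    let waker = Waker::from(Arc::clone(&counter));
    let mut cx = Context::from_waker(&waker);
    let mut fut = AlwaysPending { registers_waker };
    let _ = Pin::new(&mut fut).poll(&mut cx);
    counter.0.load(Ordering::SeqCst)
}
```

Counting wake-ups per task in this way can also serve as a cheap diagnostic: a stream whose counter stops increasing while the connection is still open has most likely lost a wake-up.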

Sorry for not providing sample code. I am developing the firmware as an employee and therefore cannot just share our product's firmware. If you are interested, I can, however, try to build a minimal example when I find the time.
