-
Notifications
You must be signed in to change notification settings - Fork 13.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Hardware WDT Stack Dump Tool #7010
Conversation
This Sketch demonstrates the use of a tool to print a stack dump at reboot after a Hardware WDT event. The module hwdt_app_entry.cpp writes a stack dump to the serial interface after a Hardware Watchdog Timer has struck and a new boot cycle has begun. The sketch must properly initialized the Serial port before the crash. hwdt_app_entry.cpp is the core file that does the work.
Improved comments. Added option to match the UART speed used by the sketch. Added option to print greeting at the start to indicate the HWDT stack dump code is active. Isolated logic for handling strings: one assuming they are not inited at the time the code is running and one that does. The later appears to be the case.
Fix issue with HWDT reason detection when sketch crashes too fast. Added sample sketch menu option for crashing with a function defined with a weak attribute via prototype, but never actually defined in full function form. eg. `void trouble(void){return;}`
Adjusted configuration option order for most likely to be used to the top. Tried to improve comments. Replace numbers with enum values.
Regular global strings have worked reliably. Tweaked reset reason detection. Reordered elements in global structure. Improve #if test for debug option Always improving comments.
data problem that occurs when printing after a flash upload using esptool. Curiously, esptool stub also uses similar delays. Word choices and description improvements.
…into pr-hwdt-stack-dump
no mixed up stack frame was created. Also removed no longer needed extra level of function calling. Use existing macros from uart_register,h to handle getting current UART speed. Added some missing `const`.
I haven't tried yet but I read the comments. |
@d-a-v, Sorry for the confusion, In order to capture and print the stack, I am running as an eboot.elf's stack usage, with just a flash boot, (w/o running the uncompress OTA logic) appears to be down around 720 bytes. At this time I have a conservative 1024 reserved for ROM and eboot. I think that can come down. Just waiting to see if anything grows. |
Serial speed is generally 74880 at boot time (115200*26/40). |
@d-a-v, ahh, I tend to over detail things. I think I went too far the other way. I have a 26MHz crystal. I understand that the boot ROM has initially set the UART clock divisor to support 115200 bps with the assumption that it has a 40MHz crystal. For a 26MHz crystal, the resulting error gives us a real rate of 74880 bps and prints jibberish at first until the CPU clock frequency is recalibrated. While CH_PD and EST_RST bring the CPU back to this state of underclocking with a 26MHz crystal, the Hardware WDT reset does not appear to do the same. The UART appears to continues to be clocked at a rate to support a device connected at 115200 bps. And, this allows us to see the famous and cryptic WDT message. I'll come up with a clearer description so people that are aware of the crystal issue are not distracted by my too brief description. On the delay for FIFO to clear, as I was thinking this through again, I remembered another factor in just using the delay was to save on IRAM. It only runs at startup, so I opted for keeping code smaller vs faster in this area. |
Added a few newlines to printing. Decreased the settling delay after the uart_div_modify call.
Print caution message when stack integrity checks failed.
…into pr-hwdt-stack-dump
Comment improvements Added missing ";"
Code cleanup. Comments.
Cache_Read_Enable working. Free 1K of IRAM and 200 bytes of DRAM.
Findings from exploring the Extra 4K heap option crash further: For the possible SYS stack memory range 0x3FFFE000 up to 0x40000000, there is hole at 0x3FFFEA80 up to 0x3FFFEB30 Looking at one of my projects I see the SYS stack high watermark at 0x3FFFE720. This places the hole within the active stack range and yet it does not crash. I assume when the WiFi connection is being done, the actively used stack space is low at that moment. However, when HWDT (with ROM Stack set to 1024) is used with Extra 4K Heap I do crash. My assumption is that the "actively used stack space" is overlapping the hole. In contrast to when it works with reserved ROM Stack set to 752. |
Improves the SYS stack space available when using the extra 4K Heap option in conjunction with HWDT. Replaces the ROM AES buffer at 0x3FFFEA80 with one provided by malloc().
Updated boards
Check if cont stack is yielding to SYS, use g_pcont->sp_yield to limit the amount of the cont stack dumped. Generalized dev logic path to create a generalized debug function hwdt_pre_sdk_init_icache.
Some HWDT crashes are improved by #7704 (comment). That PR moves them to crashes postmortem can report with much-improved context. |
@mhightower83 there are conflicts in this one. Could you please resolve them? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Our debug menu is enormous already, adding a few more entries is no problem. The replacement startup looks good (if massively over-engineered and over-configurable, but it's mostly hidden if you don't care about it), and only comes into effect in specific debug modes.
Thx again, @mhightower83 !
Yea, I either under-plan or over-plan ahead. Also, with the discovery of how to turn on Flash execution, this PR transitioned from running mostly in IRAM to running mostly from Flash. Assuming we have no need for the IRAM build path, I should go back and trim that out. I don't think features in that build path have kept up with the Flash side. Assuming yes, to avoid introducing new bugs, it might be best to do in a new PR. On the over-plan side, I do have a generalization in the startup, that I think will allow for a possible user-supplied function ( Anytime you think I have gone too far, feel free to encourage me to come back 😄 |
A Hardware WDT Reset stack dump tool by Michael Hightower
This Sketch demonstrates the use of a tool that prints a stack dump after a Hardware WDT reset. After a Hardware Watchdog Timer reset, the module
hwdt_app_entry.cpp
writes a stack dump to the serial interface.The goal was to have a file (
hwdt_app_entry.cpp
) that could be dropped into a sketch directory, then open the sketch project, build, upload, and debug.When the ESP8266 restarts because of a Hardware WDT reset, the serial port speed defaults to 115200 bps. For support of other speeds, review the comments on option
HWDT_UART_SPEED
inwdt_app_entry.cpp
.See
hwdt_app_entry.cpp
for more options and details.