-
Notifications
You must be signed in to change notification settings - Fork 51
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Using this code renders I2C unstable on ESP32/arduino? #4
Comments
Do you have any sample code? What pins are you assigning for I2C and for the ws2812 output? I won't be able to test this til after this weekend, but more info will allow me to see if I can reproduce the problem. If I2C and RMT.dont play well together that'll be disappointing... I have enough trouble with I2C on the ESP32 as it is (in terms of timing and reliability). |
Yeah, I've been trying to narrow it down to something simple that causes a crash on top of your demo1 code, but so far no luck. And yet, the more I move the LED code inside my code (that does a lot more), I get all kinds of Wire/I2C crashes. 0x40083a21: xQueueGenericReceive at /Users/ficeto/Desktop/ESP32/ESP32/esp-idf-public/components/freertos/./queue.c line 1998 0x400d61c8: i2cWrite at /home/merlin/Arduino/hardware/espressif/esp32/cores/esp32/esp32-hal-i2c.c line 379 0x400d65b3: i2cSetFrequency at /home/merlin/Arduino/hardware/espressif/esp32/cores/esp32/esp32-hal-i2c.c line 379 |
@MartyMacGyver so I think I'm hitting a compiler bug or something, adding your code causes crashes in I2C later reliably. Those crashes go away if I disable your code, or if I comment out 1 lines of simple code: |
Will be discussing this on https://gitter.im/espressif/arduino-esp32 and will report back |
I'm not sure what constrain() is but subtle problems like this sometimes point to code bugs like uninitialized pointers (in your code, my code, or perhaps the base Arduino-ESP32 code...). A compiler bug is possible but IMHO less likely. This code seems pretty straightforward. |
Thanks. Good point on the uninitialized pointer suggestion. |
I've never tried that on this platform. |
Ok, so I did more debugging and basically my code crashes soon after I run ws2812_setColors(NUMPIXELS, pixels); |
I can't test this on hardware til next week, but on what schedule are you reading your I2C device? Polled or interrupt driven? The RMT scheme uses interrupts and I wonder if that is a factor here. |
I'm talking to the ESP32 Wire library, stuff like this: |
It doesn't sound like you're doing interrupt processing there (e.g. a sensor that signals data readiness by triggering an interrupt pin), just normal polling and writing. But it might be happening at a lower level in the HAL. Let us know what gpio pins you have things wired to. It might matter. |
Right, I'm not having ISRs on top to talk to I2C. |
Ok, did a bit more narrowing down and basically it got down to 3 lines of code in this function void ws2812_setColors(uint16_t length, rgbVal *array) { |
I'm just confused as to why even the single xSemaphoreTake(ws2812_sem, portMAX_DELAY); line causes the code to hang when the RMT lines are also commented out. Also, with the pixel line above commented out (and LEDs working) if I comment out the 2 RMT lines, then the code hangs in your setColors function on cannot make sense of that either :( |
Sounds like a race condition... |
between which components? (assuming you have a guess) |
I did another round of tests, because this is driving me crazy :) I've done 3 tests to figure out what's going on, and maybe with the results, you can venture a guess better than me. Your code either works, hangs on the semaphore, or succeeds but causes I2C in my code to crash later, or even Serial.println to hang half way through a print. Here are the 3 combinations. Does that give you a better hint of what's going on? ---------------- Test #1 Before colors ---------------- Test #2 Before colors ---------------- Test #3 Before colors |
This code may be the problem (or part of it): I'm guessing the intention is that ws2812_buffer is not freed until after the RMT interrupt has fully copied it into the RMT peripheral memory? As implemented, it has at least one race condition causing a use-after-free. This may may lead to the kind of crashes you're seeing. The race is - if the RMT interrupt happens after vSemaphoreDelete, but before ws2812_sem is nulled out, then xQueueGiveFromISR in the ISR will access freed memory. Rather than create and delete the semaphore each time, I'd suggest creating it once for the life of the driver, and use Although you may also want to think about whether you can structure this code without the semaphore at all - using a queue of RMT items or something similar. |
Interesting - this is where I'm leaning on @FozzTexx's original code (https://github.com/FozzTexx/ws2812-demo) so it's a latent defect. I will look at how to fix this. At the very least getting rid of the race condition will be the first step. When you say a queue of RMT items, do you have an example or reference for how that might work? |
@projectgus a huge thank you for the analysis and finding the issue |
Thank @FozzTexx... he wrote it, I just took it and improved upon it. (That said, whatever I do to make it work right will need to be fed back to his as well, for completeness.) First I need to be able to reproduce the issue on my own setup, so I can at least see if I can make the problem go away with said changes. Will work on it some tonight. * And there is no small irony that your code is what led me to find this code which in turn you ended up using. I thought I recognized your name here... :-D |
@MartyMacGyver yes I just realized 2 days ago that you were the same person on the adafruit issue saying that my patch wasn't so good (which indeed is true) :) |
@marcmerlin Yeah, I didn't mean to hate on your code - it's just that bit-banging becomes bang-head-on-wall-ing with the ESP32. If it wasn't for @FozzTexx's original code to use the RMT peripheral I might've given up. (As it is, I'm dealing with fundamental I2C issues on this platform that are really vexing, but that's another matter - out of curiosity, have you had any issues with I2C devices dropping offline, glitching/sending corrupt data, or otherwise acting up versus other devices like the Teensy?) |
@MartyMacGyver no worries, no offence taken, you and me-no-dev were both right that it was a dead end route, just the quickest one I could take for now, and still the one I'm using until I can use your code again. |
@marcmerlin While I can't reproduce the errant behavior on my setup (probably because I'm not doing I2C in an ISR), I did make the changes suggested by @projectgus (but I'm still using semaphores for now). If you try to take the semaphore before the Give branch bugfix_i2c_crash a try and let me know if it works for you - if it does, I'll integrate that change as well as a small LED timing fix (better timing compliance to the spec for the WS2812B, though I suspect it won't have any visible effect whatsoever). |
Thanks for the attempt. Sadly I get the same result:. Decoding 9 results Then I go back to the adafruit bit bang, and the I2C crash goes away |
Attached is a greatly reduced example that demonstrates the problem. If The crash is slightly different now, though strategic placement of extra superfluous blocks of code can change the characteristics of the error slightly it's always connected to i2cWrite() in the hal.
At this point it's either a compiler bug, a code bug, or an interaction with system semaphores. @marcmerlin If you get a moment, build this code as-is and see if it causes a crash for you too. It should. Then remove the |
@MartyMacGyver good job taking the code way down still. Yes, I can confirm it crashes for me still. |
I kept sprinkling in I decided to create a crashing and non-crashing build and disassemble each with Here's what I changed, along with the relevant dump: Version 1 (latest version from earlier upload, crashes)Relevant code:
} Disassembly of
Version 2 (slight code change, doesn't crash)Relevant Code:
Disassembly of
That said, most of the disassembly is different. A lot is due to small shifts in the addresses of things, as would be expected. But as we see above, a small change has a surprisingly big effect, though that's the nature of compilers and optimizers... Regardless, somewhere in all of this something is going very wrong and I'd really like to know what it is. |
Thanks, 2 thoughts:
|
By the way, you should probably hang in https://gitter.im/espressif/arduino-esp32 so that we can @ you on relevant stuff :) |
I'll give the no-optimization thing a try again, just to see if it's more informative. I'd like to reduce this further, get rid of the redundant unused functions... but it doesn't want to break then. I'm in the channel on gitter now, but responses there may be a little delayed. |
Right, I think you can leave the other code around though if you're diffing the assembly output. If you can reduce the crash vs no crash to a simple y = y +1 being commented out, or not, that should give a much smaller diff hopefully. |
Well, I'm also trying to tighten it up some. It's not clear to me exactly why dead code is having any effect, but obviously it is. I'll post more info later, but (optimization unchanged for now) after making those constrain() changes and trimming things a bit I still get the same xQueueGenericReceive() crash stemming from i2c. But... rgbLedFade() (renamed to asubfoo4() now - renaming had no effect, just cleaning up) certainly changes things if you remove it. It crashes... in the UART hal.
That's new - and that suggests that this may not be about I2C after all. Investigating... |
So I used -O0 again instead of -Os. While it makes the objdump output more readable (along with the "-d -s -S" option set) it's still not clear why any of this should affect things. One observation though: "asubfoo1()" gets converted to code. None of the others unused functions do that.... so that's weird. The .ino is attached (notations within assume -Os, the "remove one line and it crashes differently" thing doesn't seem to occur with -O0 - it crashes just the same as with that extra one line. Remove the block and it stops crashing. Actually, one other observation - the UART and I2C code is later in the elf file than the setup and loop code. If something is stomping on memory or otherwise leading it to get corrupted, this could be important. One last thing: all things being equal, if you remove the "Wire.h" import it doesn't crash (at least, I'm not yet seeing it crash in that config). But clearly I'm never calling anything from Wire.h.... so it's back to being either my code or some compiler/linking issue. |
I removed all reference to semaphores in my code and it's all still crashing on the UART hal (but only when the optimization option is |
Reverted semaphore removal mostly. If I comment out esp_intr_alloc() (and xSemaphoreTake() since it will pause forever otherwise) this doesn't crash no matter -O0 or -Os. Is there a problem with |
@projectgus , @me-no-dev Do you have any thoughts on this? Similarly random problem in #7 and I'm beginning to wonder if it's interrupt-related rather than semaphores, given the tests I've done so far, but I have no idea what might be going on. Is there any possibility the RMT has a problem in itself, something like the "random data loss" bug in the errata that requires changing the base address of the registers? |
@projectgus The buffer won't be freed until after the ISR is done with it. ws2812_setColors will stop and wait at xSemaphoreTake until the semaphore is released which doesn't happen until after all of the data has been sent to the RMT. |
Thank you for the comment (and the original code) @FozzTexx! Even ripping out most of the semaphore stuff didn't change the problem, but it's good to know the code is solid (but if it ends up needing the changes @projectgus described or similar, I'll get on that. I doubt this is the cause of the problem though - I suspect it's all triggering some latent core defect.) And as we discussed elsewhere, this problem appeared before for you, then disappeared. So it's probably not any of the code I added/changed either. Could be there's some weird bug in the code, but it may well be symptomatic of something in the SDK/chipset. One new thought, especially in light of the I2C thing recently fixed on the Arduino side, is that maybe some critical part of Arduino-ESP32 is out of sync versus esp-idf, leading to this. That all said, I still wonder if there's something wonky with interrupts on the ESP32 in multiplicity, or if it's a manifestation of a chipset bug known or unknown (like the base register thing). I await Espressif's input here. |
@FozzTexx Thanks, this is good to know. I apologise for only glancing at the code before I first replied. The check for "&& ws2812_sem", and setting ws2812_sem to null after freeing it is what made me suspicious that this interrupt may happen multiple times. However I see now that the whole series of events is intended to happen within the context of a single ws2812_setColors() call, which makes this a non-issue and maybe the null-check & null are unnecessary. It may still be worth trying an Another potential weirdness that just occured to me may be because Arduino ESP32 HAL functions don't use the esp_intr_alloc() interrupt allocation API yet. Which means that the same interrupt number that is assigned dynamically via esp_intr_alloc() may also be one which is assigned statically to a peripheral in the Arduino HAL. This may be worth checking on, although I'd expect the RMT functionality to never work at all if this was the case.
Everything you're describing (LoadStoreErrors randomly coming and going depending on moving things around or on the compiler optimisation level) sounds to me like a memory corruption bug. Something is writing to memory in a place that it shouldn't, sometimes this matters and sometimes it doesn't depending on combinations of execution flow, timing and how things are laid out in memory by the compiler, linker, and runtime order. Specifically, the LoadStoreErrors with the low EXCVADDR (0x38) that @marcmerlin describes above are an indication of a null pointer dereference - something is pointing to NULL which shouldn't, and the calling code is dereferencing the null pointer as it tries to read a field of a pointed-to-structure at offset 0x38. I would love to look at this more closely but I haven't had time yet, sorry. If you have any strong indication that it's an IDF-specific bug then I can prioritise it in my work schedule, but at the moment that seems like an open question. Angus |
@projectgus I appreciate the feedback - it would help if the compiler didn't include random bits of dead code as it's doing (further muddying the isue) - but yes, it does sort-of seem like a pointer problem somewhere. It'll be nice if it's in my code here, but if not then it could be anywhere. However... if that was the case I'd expect the crashes to be far more random in nature: deterministic perhaps for a given build, but random in how they manifest. Here, the kind of crash is always the same even if it's in different subsystems. I2C crashes on write... UART crashes on write. It's fully repeatable but it's always something that's dealing with interrupts or mutexes, as if an interrupt is returning to the wrong place or interrupts are interfering with each other.. If interrupts are clobbering each other, that would explain one helluva lot. It was unclear to me how many interrupts - UART, I2C, RMT, timers, etc. - can actually co-exist per the specs... but it's probably time to bring Arduino-ESP32 into parity with esp-idf in this important regard. |
@projectgus note that when I use the RMT code, it also seems to break analog read by shifting values by 2x as per #7 but then, sadly, if I try this in a short constrained example, analogread continues to work ok after I enable RMT. |
@MartyMacGyver were the replies from FozzTexx and projectgus helpful to you, or you're still stuck with my example code that crashes when I2C is called? |
I was busy with other things this week but hope to look at this and more tonight and tomorrow. If it's the interrupts clobbering each other then I'm not sure yet what the next step should be. And of course it's not really an I2C issue at this point... |
@marcmerlin I see the debug functionality in my code wasn't working, so I've been fixing that tonight. It leads me to wonder if this is some level of Extern Hell wherein the externs necessary to make this thing work with C++ are causing issues. Probably not but who knows? If I had the hardware and the ESP32 handling for it was reliable, JTAG debugging would be an option. @projectgus As I search the thread for EXCVADDR with a value of 0x38 I'm not coming up with anything in the original dumps. It's not clear how much that would tell me, given that this bug is wandering anyway, but I'm curious where you saw that particular value. Edit: I refactored away the constant malloc/free stuff, and made sure to only call Edit2: Actually, even that didn't help... one thing that catches my eye is the use of |
Sorry, I miscopied - it was 0x18 not 38 in your post here. It's not particularly useful except that these very low EXCVADDR values for LoadStoreError almost always indicate a null pointer dereference - it's saying "load the field at PTR + 0x18" and PTR is unexpectedly zero. |
@projectgus Thanks for the clarification. It's still not clear what the hell is going on or how this is walking on memory or the stack or whatever it's doing to cause this weird behavior. The current code is attached - I've enabled more debugging messages to help trace my way through the code. As it is, it crashes in
|
@MartyMacGyver I'm not able to reproduce this crash. With current arduino-esp32 and the code you attached above, I get this output: https://gist.githubusercontent.com/projectgus/fc3c595b79b3363a62e35a97fa6b57c5/raw/dbf215084a12fe2f602bf683d4a5def6c3430291/rmt_output.log Can you confirm the code is the same? If so, can you please attach the ELF & BIN files from the Arduino build directory, that correspond to the crashing code? |
Attaching the code (the same as in the comment above) as well as the bin and elf. It crashes here. |
I get the same crash you're reporting when building with arduino-esp32 commit 90322ae7, but not with current master (2b075f3204). After poking around a bit more I don't think this is coincidence, as a fix was just merged in IDF where the CPU1 ISR stack space was located incorrectly. Arduino sketches run on CPU1, and I think this is what was clobbering memory. Will confirm with @me-no-dev that this fix is in the latest arduino-esp32. |
So, I just upgraded to arduino-esp32 master, and the crash is gone for me too. |
Yesterday there were just a few commits since my last update, none related. Today... lots more. I'll update and give it all a try later tonight. I'll be a happy camper if this is working now. :-) |
Fingers crossed, but the IDF update (espressif/arduino-esp32@8032231) appears to have resolved this! Thanks to @marcmerlin for reporting this, @projectgus and @me-no-dev for help debugging this, and @me-no-dev for updating Arduino-ESP32. |
Thanks for the code, it works great, except that as soon as I use it on my project, talking to I2C causes a crash:
0x400d65ee: i2cRead at /home/merlin/Arduino/hardware/espressif/esp32/cores/esp32/esp32-hal-i2c.c line 379
0x400d3e3e: TwoWire::requestFrom(unsigned char, unsigned int, bool) at /home/merlin/Arduino/hardware/espressif/esp32/libraries/Wire/src/Wire.cpp line 143
0x400d3e98: TwoWire::requestFrom(int, int) at /home/merlin/Arduino/hardware/espressif/esp32/libraries/Wire/src/Wire.cpp line 143
0x400d44e6: IoTuz::i2cexp_read() at /home/merlin/Arduino/libraries/IoTuz/IoTuz.cpp line 130
If I disable your code, then my I2C read which has been working for over a month, works again.
Are you using the same part of the chip, or is there some kind of incompatibility between using RMT and I2C?
The text was updated successfully, but these errors were encountered: