Broken jpegs - fixed ??? - 1.0.5rc6, config.xclk_freq_hz = 20000000, ov2640 and ov5640 jpeg i2s problem #244
Comments
Having studied a couple of examples, it seems there is 1 bit wrong, and the decoder cannot find the end-of-block on one table of one MCU, which swallows up the entire next MCU, and then the relative colors are all wrong when you finally get synchronized again. And there are no resets (restart markers) used in this camera, so the rest of the picture is bad. I wonder if there is code to do a "jpeg decode check" rather than a full decode? https://github.com/ImpulseAdventure/JPEGsnoop/blob/master/source/ImgDecode.cpp
I tried this code - removed the parts to decompress the jpeg image -- but it still took about 150 ms to parse a 1280x720 jpeg image to check it was perfectly valid. So that is not going to work at 20+ frames per second. Although on a time-lapse, where a bad image is a bigger problem, it might be useful. https://github.com/lvgl/lv_lib_split_jpg/blob/master/tjpgd.c
So running the original code to search for the 64-byte patterns, which takes about 8 ms, as well as the code to parse the entire jpeg, which takes about 150 ms for a regular hd frame from a 5640 camera ... after many hours of running both at about 5 fps, in a static non-complex scene, I have not got any jpeg with more than 40 of the 64 elements (my guess at a bit error that destroys the jpeg after swallowing the next mcu), but still many of the 64-byte pattern problems, at maybe 0.05%. The camera must be creating that data with the extra 64 bytes ??? Although I am now available for jpeg internals consulting. 😄
James,
I noticed this broken jpeg problem getting worse in bright sun with a complex scene - sometimes 10% of jpegs were broken using FRAMESIZE_QSXGA and quality 12, with the buffers set for 6, which should give 983,040 bytes for the picture ... but it would hit the FB_GET_TIMEOUT of 4 seconds I think. The jpeg sizes would get to 600,000, but I didn't see anything near 900,000. But many would have a bit or 2 wrong that would break the jpeg decode. As soon as I pulled it out of the sunny window, the frames would start working again. Setting the buffers to quality 5, meaning I had to use non-continuous (count=1), which should give 2,457,600 bytes per image, and lowering the quality down to 20, cut the average jpeg size, and things worked much better. But I noticed this strange effect - the jpeg sizes are quite consistent (logical as the scene was not changing much), but the time to transfer the image in non-continuous mode varied widely. This is 1800 frames - one per second. Maybe the camera is getting too hot - it is grinding away at these big images - and sitting in bright sun indoors.
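The two buffer sizes quoted above (983,040 and 2,457,600 bytes) are consistent with a 2560x1920 frame at 2 bytes per pixel, divided by 10 and by 4 respectively. The small helper below just reproduces that arithmetic. The frame dimensions, the 2 bytes per pixel and both divisors are assumptions inferred from the numbers in this thread, not values read from the driver source, and `jpeg_fb_bytes` is a hypothetical name.

```c
#include <stdint.h>

/* Bytes reserved for one JPEG frame buffer, ASSUMING the buffer is sized as
 * width * height * bytes_per_pixel / divisor. The divisor is passed in
 * because the mapping from quality setting to divisor is only inferred
 * from the figures quoted in this thread. */
uint32_t jpeg_fb_bytes(uint32_t width, uint32_t height,
                       uint32_t bytes_per_pixel, uint32_t divisor)
{
    if (divisor == 0) {
        return 0;   /* avoid dividing by zero on bad input */
    }
    return (width * height * bytes_per_pixel) / divisor;
}
```

Under those assumptions, `jpeg_fb_bytes(2560, 1920, 2, 10)` matches the "buffers set for 6" figure and `jpeg_fb_bytes(2560, 1920, 2, 4)` matches the "quality 5" figure, so a 600,000 byte jpeg would fill roughly 60% of the smaller buffer.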
James, my problem is similar - heat does not go well with the OV2640 module. I was taking CIF images at 20FPS with my OV2640 in a plastic enclosure, and the data would come back corrupted or not at all. I ended up using a small piece of metal on the back of my camera module. The time to get frames unfortunately does vary wildly. I have found this to be the case in both continuous and non-continuous mode, with CIF resolution and low quality (20FPS). I have a solution which might help reduce your image corruption. sdmmc_read_sector and sdmmc_write_sector in sdmmc_cmd are poorly written - each time you write to / read from SD, it allocates / deallocates 512 bytes of memory, over and over and over again. The line
Always runs with a block size of 512. What I do is simply run malloc once, permanently allocating a buffer of 512 bytes. I do this for both sdmmc_read_sector and sdmmc_write_sector (two separate buffers). I think this may have lowered the amount of image corruption I receive. The top of my sdmmc_cmd is now
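The snippet from this comment is not preserved in this copy of the thread. As a generic illustration of the "allocate once, reuse forever" idea described above, and not the commenter's actual patch, a minimal sketch could look like this. `get_sector_buf` and `SECTOR_SIZE` are hypothetical names, and plain `malloc` is used so the sketch runs anywhere; on the ESP32 the buffer would presumably have to come from a DMA-capable allocator instead.

```c
#include <stddef.h>
#include <stdlib.h>

#define SECTOR_SIZE 512   /* block size mentioned in the comment above */

/* Scratch buffer that lives for the whole program run. */
static void *s_sector_buf = NULL;

/* Returns the shared 512-byte scratch buffer, allocating it only on the
 * first call. Returns NULL if that first allocation fails. Not thread-safe:
 * separate buffers (or a lock) would be needed for concurrent reads and
 * writes, which is why the comment mentions two separate buffers. */
void *get_sector_buf(void)
{
    if (s_sector_buf == NULL) {
        s_sector_buf = malloc(SECTOR_SIZE);
    }
    return s_sector_buf;
}
```

The trade-off is 512 bytes held permanently in exchange for no allocator traffic on each sector transfer.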
Hi, that is interesting. I don't think I use the sdmmc, but rather the more basic calls (I think) fseek and fwrite.
fwrite calls sdmmc under the hood, so you are likely executing huge numbers of 512 byte writes, allocating and then deallocating. You are right in transferring to RAM as a way station between the SD card and PSRAM, since DMA can speed data transfer between SD and RAM or PSRAM and RAM, but not directly between PSRAM and SD card. The glitchiness of the camera is a definite pain. I've noticed that glitches increase as the duration of video recording increases, regardless of the camera module heatsink, but that might be another continuation of the heat issue. One thing I might try is really digging into the camera driver and seeing whether the thing is constantly allocating/deallocating memory for frames. It might speed the process up if space for the frame was allocated once for the whole program execution. Just a brainstorm, I need to look at the camera driver again.
I experimented a bit trying to find a way to keep the ov2640 cool, but neither longer cables and mounting the cam to a heat sink nor active cooling helped with any of the symptoms. I bet the problem is completely on the driver side.
So you would make the change in this code, replacing it with the modified version of the sd_mmc package??? I'll give it a try.
James, did changing the sdmmc file produce any improved results? Congrats on getting the error rate down to less than ten flaws with regular recording per 30 minutes on HD at such a high framerate. Is this while devoting all resources to recording video (both cores, DMA, etc)? What do you mean by avoid the middle of the PSRAM, and how would that be implemented? In addition, I was looking through your code for the ESP32CAM junior. One thing I noticed was that your code for another_save_avi utilizes an array called framebuffer_static, which takes around 64KB of global memory. Instead of copying the result of esp_camera_fb_get() to this array, it might make sense to simply edit the frame directly at the pointer given by esp_camera_fb_get(). Should get a speedup that way, as well as saving 64K of heap memory. At the moment I'm dealing with the broken jpegs serverside.
Howdy, I have not got to that yet. I was making the junior version faster by adding mutexes and studying exactly how long things are taking. I think a V10 sd card can keep up with the camera on HD recording on both the ov2640 (12 fps) and ov5640 (25 fps) cameras, while just using core 1 for camera and sd, and core 0 is free for wifi, and a streaming task. "Avoid the middle of PSRAM" is that advice from schaggo in the issue #249 PIXFORMAT_RAW support missing? ... , where he said he observed jpeg errors if the jpeg crossed the mid-point of the psram, so I started checking the address of the framebuffers in psram to see where they were compared to the middle at 3FA0 0000. I had been allocating giant framesize / quality from old problems with the jpeg exceeding the buffers, but my simple solution is to just allocate HD, 3 buffers, medium quality, which takes much less than half the psram. Not sure if it improved things or not. Long run my plan was just to load a jpeg into the midpoint buffer, and just leave it there, so I'm using buffers in the top and bottom halves, but not crossing the middle. The 64K static ram was an attempt to solve the psram -> sd slowness problem. As 1.05 needs more ram than 1.04, I think I have reduced this to 4k or 8k, which is just as good. I think there is advice to write the sd with 32kb writes for speed, but if sdmmc breaks them into 512 byte blocks, that 32kb objective is not achieved. But am I correct that running through the full jpeg accessing each byte from psram would be much slower than copying 4k chunks over to sram and scanning it there?
Hey, I am loosely following this and other threads and am stoked to see what you guys are coming up with.
I don't know which of this is still relevant in the latest driver, but especially the last point would be where I'd pick up the project again in a couple of months.
Howdy, the 10MHz/20MHz versus others is interesting. In the 1.04 version 10MHz got you fast performance (with a clock divider or something), but 1.05 is now 20MHz. The wire is only half an inch long, but dirty connectors or electrical issues might cause problems at that speed??? I'll give that a try. The midpoint bug I thought is plausible as well, with some memory mapping idea I read about somewhere ... but so far I am still getting some broken jpegs while avoiding the midpoint. Not rigorously measured. The sdmmc alloc/de-alloc issue I thought might explain the 64 bytes of patterned data that started this thread. And there is the old ffd9 bug that the driver needs a 0 after the ffd9 in certain cases, which may confuse the camera, which thought it was done transmitting -- fixing that again means re-compiling the camera software, which opens a can of worms with the i2s system I think (not spi as I said in the title!). It is all very low error rates, so it doesn't point to a clear bug somewhere, but occasional heat/electrical issues. I have a "sense" that my videos done in 1.04 code are better quality with fewer broken jpegs. But those cameras are set up in comfortable surroundings with the ov2640 camera, while some of the very bad videos were done with the higher current (hotter) ov5640 camera in the hot sunshine. In the last few days I have been testing outside at zero celsius, which might improve things with the cooling. Thanks for the info 😄
Anybody have an IEEE account? https://ieeexplore.ieee.org/document/664106
I don't have an IEEE account, unfortunately. Out of curiosity, James, do you primarily code in ESPIDF or with Arduino? The error rate for videos does seem to be determined by external factors. If a scene is complicated (or noisy, like with high gain ceilings such as GAINCEILING_16X), corrupt JPEGs will occur more frequently. If the temperature is hot, corrupt JPEGs will occur. I don't think it is a frame size issue - tiny little CIF JPEGs can suffer from corruption (15fps in my case, while uploading and writing to SD simultaneously). I don't think corruption is a RAM vs PSRAM issue. I lowered my resolution to CIF, and allocated all frame buffers in RAM instead of PSRAM. Still ended up with corruption. To be fair, I am simultaneously transmitting via WiFi. A hardware engineer I spoke with mentioned that it could be because of WiFi transmissions causing interference with the CMOS sensor. Apparently CMOS sensors are vulnerable to RF interference. Schaggo,
I need to dig into the OV2640 documentation to figure out exactly what setting this register does.
All of the phenomena also occur on ov3660s - if that helps
Hi, I got the article from a teenager with a powerful library card! Haven't read it yet. I'm using Arduino mostly - I've tried PlatformIO and ESPIDF, but haven't switched yet. I think it would be nice to find the problem, but an acceptable alternative would be an easy way to find and dispose of the bad jpegs. I was trying full-HD at 2 fps, bright sun, and 0 degrees C, and got a 30 minute video with only 1 error. But the interesting thing was that the error happened in a jpeg that hit the 64 units of the huffman block, which I had been checking for in the ESP32, and discarding those frames (assuming the bit error caused us to miss the EOB if we exceed 60). I might have a bug in that code, or the error happened after the check. The jpeg travels from psram -> sram for the jpeg check, and then again for the SD write. So that points to the SD writer system. Another oddity of that experiment was that I was using an old dollarstore circle 10, 16 GB sd card, while normally I'm using a 64GB V30 card. So the slower write speed might be avoiding errors in the ram -> sd process. I also spent some time on the 5640 registers and code - haven't got into the clockspeed business, but I thought I might be able to turn on "jpeg restart" but it doesn't seem to be an option. I thought "Scalado mode" might be something - but it didn't work.
James, I strongly recommend you switch to using ESPIDF. I started with Arduino, and ended up making the change when I realized I couldn't accomplish what I needed without access to menuconfig. The difference in power is striking - mostly because of menuconfig, but also because of the ability to painlessly make edits to the drivers. The debugging cycle is also slightly faster. Don't get me wrong - switching is a pain. A lot of the arduino functions do not work with ESPIDF (especially as relates to Wifi). When I record video, it starts with a low error rate. It then picks up as the camera heats up. Eventually, esp_camera_fb_get fails to return, and I simply restart the ESP32 - allowing the camera to cool down and continue the cycle. To be fair, I am operating it at around 20C. I'm working on having a custom heat-sink made for the OV2640 module of the ESP32-CAM - if it works, I could potentially shoot one your way. Unfortunately, I need to learn more about image compression - Huffman blocks and whatnot. One way of testing your theory would be to save the image to SD, and then run it through your bad jpeg filter algorithm a second time, seeing if it is caught. My bet is that it is a code issue with the bad jpeg filter algorithm, or corrupted images somehow having valid huffman blocks. I use Lexar 32 / 16 GB SDs, and they work well for me. The ESP32 is 32bit, so can not use SD storage above 16/32GB (I forget which). My guess is that would explain why your 64GB card is not performing as well as the cheap 16GB card.
So, working my theory that the error occurred on the ram -> sd journey, I tried writing to the sd twice - once as an avi file, and once as a mjpeg file, and after it had already passed the test that it was not corrupt by the tjpgd.c huffman analysis missing EOB theory. Then I observed a bad frame - the identical bad frame in the avi and the mjpeg - so it made one journey psram -> ram for the tjpgd.c analysis and passed, then another journey psram -> ram, and two journeys ram -> sd, which came out identical, and wrong. And then I selected that bad jpeg and resubmitted it to tjpgd.c, and it failed. So the first psram -> ram for tjpgd.c was good, but the second psram -> ram introduced the error. So the psram -> ram is the culprit. So now I'm running a program to catch bad frames from a continuous stream. A regular HD frame takes about 150ms to parse, so that's about 8fps - just a little slower than the ov2640 camera can produce on 1.05. And if you are doing a timelapse, or a single-frame application, the 150ms would be worth not having the occasional broken jpeg. And no post-processing when playing the video. The only problem is that you cannot get a large jpeg into ram, check it, and move to sd, so you would have to merge the tjpgd.c check and sd-write, and abort after part was already written to the sd, and start over. It finds all the bad frames I have studied so far. Unable to find any bad frames indoors so need sunshine and heat to test it. So it's a vast amount of computing to find the 1/1000 error, but what else has the esp32 got to do?
James, this is very cool - it is interesting that the PSRAM to RAM might be causing the error. Something fishy is going on with PSRAM... This thread indicates as much. The issue is fixed in hardware by revision 3, but the ESP32 chip on the ESP32Cam is revision one. You will see a post stating - "That said, for new designs it is recommended to use ESP32 silicon revision 3 as it fixes the PSRAM cache issue in hardware." Apparently half of the PSRAM can be used by core one, and the other half by core two - or something like that. My background in the lower level of design is unfortunately limited. However, I will spend some time on this - hopefully I will have an update for you. My application is a multicore stress test - simultaneous record to SD (15FPS CIF), 250KByte wifi transmission and warm temperatures. Also, you are correct in routing the frames from PSRAM through RAM before hitting the SD card - PSRAM <-> RAM or SD Card <-> RAM is done by fast DMA under the hood, but DMA can not be used for SD CARD <-> PSRAM, unfortunately. At the time I was experimenting with simply storing the frame buffers in RAM (CIF resolution makes this somewhat reasonable), and had blanked on PSRAM usually being used to store frame buffers.
That psram thread is interesting. And talks about this issue with the upper and lower 2MB banks. And igrr is saying Nov 15, 2020 that corrections are being added to ESP-IDF 3.3, which I believe is the root of arduino-esp32 1.0.5. I don't understand the problem or correction, but maybe it didn't make it into 1.0.5 ??? Or maybe this is a genuine data error -- not sure what the expectation of 1-bit errors should be in the camera -> psram -> ram sequence. My new idea is to copy a jpeg from psram -> ram (in blocks), do a checksum, then copy it again in blocks, and start the SD writer application, while re-doing the checksum, and if the checksums match at the end, then we likely have a good jpeg, or if not, then abandon that jpeg, and move the file pointer back to the start of this frame. It will not work for streaming, as you cannot back up there. The psram -> ram only takes 20-30 microseconds or so, so that will not slow things down, and the 200-400 milliseconds of jpeg decoding and checking might be unnecessary. Or it might catch errors from the camera, or from the camera -> psram transfer. So I'll give that a try.
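The two-pass checksum idea described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the thread does not say which checksum is used, so 32-bit FNV-1a is swapped in here, and `copy_pass`, `frame_passes_agree`, `block_sink_fn` and `count_sink` are hypothetical names. The sink callback stands in for the SD writer.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Callback that receives each block on the second pass (e.g. an SD writer). */
typedef void (*block_sink_fn)(const uint8_t *block, size_t n, void *ctx);

/* Stand-in sink for experiments: just counts the bytes it is handed. */
void count_sink(const uint8_t *block, size_t n, void *ctx)
{
    (void)block;
    *(size_t *)ctx += n;
}

/* Copies src through a small scratch buffer in blocks and returns the
 * 32-bit FNV-1a hash of the bytes as they arrived in scratch.
 * Precondition: block > 0 and scratch holds at least `block` bytes. */
uint32_t copy_pass(const uint8_t *src, size_t len, uint8_t *scratch,
                   size_t block, block_sink_fn sink, void *ctx)
{
    uint32_t h = 2166136261u;                 /* FNV-1a offset basis */
    for (size_t off = 0; off < len; off += block) {
        size_t n = (len - off < block) ? (len - off) : block;
        memcpy(scratch, src + off, n);        /* the psram -> ram copy */
        for (size_t i = 0; i < n; i++) {
            h = (h ^ scratch[i]) * 16777619u; /* FNV-1a prime */
        }
        if (sink != NULL) {
            sink(scratch, n, ctx);
        }
    }
    return h;
}

/* Pass 1 only hashes; pass 2 hashes again while feeding the sink.
 * Returns 1 if both passes saw identical data, 0 otherwise. On 0 the
 * caller would move the file pointer back to the start of the frame. */
int frame_passes_agree(const uint8_t *src, size_t len, uint8_t *scratch,
                       size_t block, block_sink_fn sink, void *ctx)
{
    if (block == 0) {
        return 0;
    }
    uint32_t first = copy_pass(src, len, scratch, block, NULL, NULL);
    uint32_t second = copy_pass(src, len, scratch, block, sink, ctx);
    return first == second;
}
```

Note that matching hashes only show that the two reads agreed with each other. An error already present in the frame before the first read (from the camera, or from the camera -> psram transfer) would hash identically both times and pass.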
I'm using 64GB & 128GB cards (not tested 256)
The only problem is formatting them fat32 -- it's all the fault of Dave from Dave's Garage youtube channel who wrote windows format. 😄
That 30 microseconds remark above was wrong -- that is the time to get the address of a frame that is already in the psram. The simplified jpeg decoder "eob checker" takes about 100-150ms on regular hd. Cannot find any failures between multiple checksums with indoor light and heat.
@jameszah If the checksums before PSRAM processing match those which occur after PSRAM processing, it is likely that PSRAM corruption is not your issue. I ended up solving the vast majority of my corruption issues with the clock speed adjustment recommended by @Schaggo. Interesting that performing a checksum on an image takes as long in SRAM as in PSRAM. I suppose it does need to be done by the CPU either way, but that still does not quite explain the equality in run-time. Memcopy is written in ASM, which is beyond my skillset - would need to look at the register definitions and schematic and whatnot. I suppose your camera example could be sped up by using DMA instead of memcopy in a few cases. An example of implementing DMA can be found in the camera driver for the ESP32Cam, or in other locations. I believe that you would have to mess with interrupts. The ESP32Cam only has two DMA channels and 2 CPU cores, so it may not make sense if DMA is already performing two tasks. @lunadm What framesize allocation raised frame-rate most effectively, and with what frame resolution being used to record?
Hi James, I found I had to manage the allocations so I could read the sd cards on both Win and Mac. I’ve recently tried the 5640 in hd but I was only getting about 12fps (outside sunshine) and the extra work reduced the battery life.
One thing I have noticed with 1.0.5 - (changed xclk and fbcount) When the time is set / or times out, the captive portal closes and wifi is switched off - but I noticed a reduction in framerates -
So with the wifi shut off completely (never started), I get about 4 hours of ov5640 regular hd at 14fps, and ov2640 regular hd at 9fps, and xclk_freq_hz = 16500000 (I meant to change that back to 200...), and not a bad frame in sight. I wonder if this could be the problem:
So after another 10 or 12 hours of full speed, bright outdoor recording at 1280x720 ~14fps, with xclk switched back to 20000000, generating about 5GB files per hour, with the 1.05 software, ... I cannot find any broken jpegs. I boot the esp32, start the wifi and get the time, then a simple WiFi.disconnect(); and everything is fine. That NeoPixel issue with the "bitbanging" is not exactly relevant as I think it is i2s for the camera, and spi for the psram and the sd card -- so that would not be bitbanging. The WiFi events can be sent to either core -- I assumed that was just the events the "user" wants to handle, rather than the overhead type events/packets that the user never sees. espressif/arduino-esp32#4762 (comment) The frequency of the problem might be some combination of that xclk speed and the jpeg size, that is interrupted by the wifi events that disturb things enough to create an error in the jpeg - I guess in the camera -> ram transfer. So my conclusion is the WiFi is disturbing things -- even if I am not actively using the wifi, but it is just sitting there waiting for events. Not a solution if you are primarily using streaming, or broadcasting events, but if you are primarily recording to sd, that would avoid the problem.
Using YUV or RGB puts a lot of strain on the chip because writing to PSRAM is not particularly fast. The result is that image data might be missing. This is particularly true if WiFi is enabled. Maybe that said it all in the intro to esp32-camera. The wifi must step on i2s at times.
I find the simple WiFi.disconnect(); will solve the problem, without deinit(). I had a theory on the bmp vs jpeg, and also the 1.04 vs 1.05. The bmp was sending much more data, so would hit the problem more often (wifi vs i2s), and 1.05 seems to run the camera more efficiently so you can get 12.5 fps on uxga where it used to be 6.5 I think -- so again, more data, and more chances for the wifi vs i2s problem.
Another strange observation - using 1.06 with UXGA quality 15 and fb_count = 1, ... almost every jpeg is broken with wifi on or off, and with the buffers set up with quality=5, so the buffers are enormous (900kb ?) and can handle any frame of the ov2640. So maybe an opportunity to search for the problem.
Another possibility: I ran into this post https://www.esp32.com/viewtopic.php?t=15193#p62169 looking for some other issue. It says that the solution is to set the wifi modem to power save WIFI_PS_NONE, from the default of power save WIFI_PS_MIN_MODEM. It seems to work on initial tests. 😄 It also ramps up the speed of the wifi. My esp32's always showed 6000kbps on my phone company router, but after switching to WIFI_PS_NONE, it bounces around, but gets up to 72222kbps ... which is the speed of many 2.4G devices operating on my router -- just one old laptop is 2.4G and 130000kbps. So the hopeful thinking is that the modem is half asleep, it wakes up and causes a disturbance that disrupts the i2s interface to the camera. I assume it burns more power - so maybe not for battery powered systems. There could be heat issues too. So this code after the normal wifi setup does it:
Serial.printf("Set power save to %d\n", WIFI_PS_NONE);
Hi, trying to sort out some glitches in the esp32 video recorder. Using 1.0.5-rc6, and xclk 20000000, with both the ov2640 and ov5640 cameras, I occasionally get a jump or blotch on the video, and when you track down the frame, you get something like attached below, good frame and the next jpeg is damaged. The headers and the end-of-image ffd9 are in the correct place, but there is a flaw in the start-of-scan --> end-of-image zone. These are two consecutive jpegs of a 30 min avi containing about 40,000 jpegs at about 20-25 frames per second totaling 1.5 GB to 2 GB of data -- these were svga from ov2640, but I have similar examples from ov5640 at regular hd at about 22 frames per second.
I was looking for a way to find these bad frames, and noticed that in the start-of-scan --> end-of-image zone, there are these blocks of 64 bytes of patterned data, where there should be these variable size huffman codes, etc. If there is a flaw in the spi data, with a bit wrong, then the jpeg decoder could be thrown off, but this 64 bytes of bad data may suggest that the sender or receiver of the spi is getting it wrong.
I tried searching for these patterns to drop those frames with this bit of code, and got about 0.03% to 0.3% of frames dropped, but still had a few problems. It basically looks through the latter half of the frame, to see if there is a repeating series of 4-byte patterns, and then marks that frame as bad, where x is the frame length
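The original snippet is not reproduced in this copy of the post. As a rough sketch of the same idea only, and not the code that was actually used, the function below scans the second half of a frame for a run of bytes that repeats with a period of 4. The name `has_period4_run` and the `min_run` parameter (64 to match the blocks described above) are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true if the second half of the frame contains at least `min_run`
 * consecutive bytes that repeat with a period of 4 (i.e. the same 4-byte
 * pattern over and over). `len` plays the role of x, the frame length. */
bool has_period4_run(const uint8_t *buf, size_t len, size_t min_run)
{
    if (buf == NULL || min_run <= 4 || len < min_run) {
        return false;
    }
    size_t run = 0;   /* consecutive positions where buf[i] == buf[i + 4] */
    for (size_t i = len / 2; i + 4 < len; i++) {
        if (buf[i] == buf[i + 4]) {
            run++;
            /* `run` matches in a row cover run + 4 periodic bytes */
            if (run + 4 >= min_run) {
                return true;
            }
        } else {
            run = 0;
        }
    }
    return false;
}
```

A call such as `has_period4_run(fb_buf, fb_len, 64)` would flag the frame for dropping, where `fb_buf` and `fb_len` are placeholders for the frame pointer and length. Legitimate image data could occasionally repeat this way too, so some false positives should be expected.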
Wondering if anyone has any advice? Maybe slow down the xclk a little? Or is that block of 64 bytes normal somehow?
Is there some other way to validate this start-of-scan --> end-of-image zone of a jpeg? It seems very unstructured, and 2 GB of data travelling over an SPI could have a bit wrong here and there, that could slip through, but how do you find it?
The good news is that the esp32 can do this processing with no slowdown in the camera/sd speed. 😄
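On the question of validating the start-of-scan --> end-of-image zone: one cheap structural check that does not need Huffman decoding is to verify the marker rules inside the scan data. In a single-scan baseline jpeg, every 0xFF byte in the entropy-coded data should be followed by 0x00 (byte stuffing), by a restart marker 0xD0 to 0xD7, by another 0xFF (fill), or by 0xD9 (end of image). The sketch below checks just that. `jpeg_markers_look_sane` is a hypothetical name, and the single-scan assumption is mine: progressive or multi-scan files legitimately contain other markers after the first scan.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if the marker structure looks sane, 0 otherwise.
 * Assumes a single-scan baseline jpeg. */
int jpeg_markers_look_sane(const uint8_t *b, size_t n)
{
    if (b == NULL || n < 4 || b[0] != 0xFF || b[1] != 0xD8) {
        return 0;                               /* no start-of-image */
    }
    size_t i = 2;
    /* Walk the header segments up to and including start-of-scan (FF DA). */
    for (;;) {
        if (i + 4 > n || b[i] != 0xFF) {
            return 0;
        }
        uint8_t m = b[i + 1];
        if (m == 0x00 || m == 0xFF || m == 0xD8 || m == 0xD9) {
            return 0;                           /* not a length-carrying segment */
        }
        size_t seg = ((size_t)b[i + 2] << 8) | b[i + 3];  /* includes its own 2 bytes */
        if (seg < 2 || i + 2 + seg > n) {
            return 0;
        }
        i += 2 + seg;
        if (m == 0xDA) {
            break;                              /* entropy-coded data starts here */
        }
    }
    /* Scan the entropy-coded data for illegal 0xFF sequences. */
    while (i + 1 < n) {
        if (b[i] != 0xFF) {
            i++;
            continue;
        }
        uint8_t m = b[i + 1];
        if (m == 0x00 || (m >= 0xD0 && m <= 0xD7)) {
            i += 2;                             /* stuffed byte or restart marker */
            continue;
        }
        if (m == 0xFF) {
            i++;                                /* fill byte before a marker */
            continue;
        }
        return m == 0xD9;                       /* EOI is fine, anything else is not */
    }
    return 0;                                   /* ran out of data without EOI */
}
```

This is a single linear pass, so it should be far cheaper than a Huffman-level parse, but it only catches bit errors that create or destroy a 0xFF byte. Most single-bit errors in the Huffman codes would pass this check, so it complements the pattern search rather than replacing a real decode check.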