
Broken jpegs - fixed ??? - 1.0.5rc6, config.xclk_freq_hz = 20000000, ov2640 and ov5640 jpeg i2s problem #244

Closed
jameszah opened this issue Feb 10, 2021 · 38 comments

Comments

@jameszah

Hi, I'm trying to sort out some glitches in the esp32 video recorder. Using 1.0.5-rc6 and xclk 20000000, with both the ov2640 and ov5640 cameras, I occasionally get a jump or blotch in the video. When you track down the frame, you get something like the pair attached below: a good frame, and then a damaged next jpeg. The headers and the end-of-image marker (ffd9) are in the correct place, but there is a flaw in the zone between start-of-scan and end-of-image. These are two consecutive jpegs from a 30 min avi containing about 40,000 jpegs at about 20-25 frames per second, totaling 1.5 GB to 2 GB of data. These were svga from the ov2640, but I have similar examples from the ov5640 at regular hd at about 22 frames per second.

I was looking for a way to find these bad frames, and noticed that in the start-of-scan --> end-of-image zone there are blocks of 64 bytes of patterned data, where there should be variable-size huffman codes. If there is a flaw in the spi data, with one bit wrong, the jpeg decoder could be thrown off, but these 64 bytes of bad data may suggest that the sender or receiver of the spi is getting it wrong.

I tried searching for these patterns to drop those frames with the bit of code below, and got about 0.03% to 0.3% of frames dropped, but still had a few problems. It looks through the latter part of the frame (from 50% to 90% of its length x) for a 4-byte pattern that repeats three times in a row (only the first 3 bytes of each period are compared), and then marks that frame as bad.

Wondering if anyone has any advice? Maybe slow down the xclk a little? Or is that block of 64 bytes somehow normal?
Is there some other way to validate the start-of-scan --> end-of-image zone of a jpeg? It seems very unstructured, and with 2 GB of data travelling over an SPI bus, a bit could be wrong here and there and slip through, but how do you find it?

The good news is that the esp32 can do this processing with no slowdown in the camera/sd speed. 😄

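    // Scan the middle-to-late part of the frame (50%..90% of its length x)
    // for a 4-byte period that repeats three times in a row; only bytes 0..2
    // of each period are compared. Such runs showed up in damaged frames.
    size_t start = x / 2;
    size_t stop  = (x * 9) / 10;              // integer bounds, no float math
    for (size_t j = start; j < stop && j + 10 < x; j++) {
      const uint8_t *p = fb->buf + j;
      if (p[0] == p[4] && p[0] == p[8] &&
          p[1] == p[5] && p[1] == p[9] &&
          p[2] == p[6] && p[2] == p[10]) {
        Serial.printf("Bad omen at frame %d, byte %u\n", frame_cnt, (unsigned)j);
        bad_frame = 1;                        // caller drops this frame
        break;
      }
    }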

[Images: consecutive frames 5951 and 5952]

@jameszah
Author

Having studied a couple of examples, it seems there is 1 bit wrong, and the decoder cannot find the end-of-block (EOB) in one block of one MCU, so it swallows the entire next MCU, and then the relative colors (DC values are coded as differences from the previous block) are all wrong when it finally gets resynchronized. And this camera does not use restart markers, so the rest of the picture is bad.
So my idea is to race through the jpeg and look for huffman-coded blocks with all 64 elements filled, and assume the picture is bad. Or continue to the end of the jpeg and count the total number of MCUs, which should be wrong as well. And then drop that jpeg.
It is not a full decode, as you don't need to do the floating point math (the inverse DCT), just count the blocks.
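Comparing against the MCU count requires knowing how many MCUs the header implies. A minimal sketch, assuming a baseline (SOF0) jpeg with one interleaved scan and no restart markers, where each MCU covers 8·Hmax × 8·Vmax pixels (Hmax and Vmax being the largest sampling factors in the SOF0 header). `expected_mcu_count` and `mcu_count_mismatch` are hypothetical helpers; the decoded tally would have to come from a Huffman walk such as the one in tjpgd.c, which is not shown.

```c
#include <stdint.h>

/* Expected number of MCUs in an interleaved baseline JPEG scan.
 * h_max/v_max are the largest horizontal/vertical sampling factors
 * from the SOF0 header (e.g. 2 and 1 for 4:2:2 chroma subsampling). */
uint32_t expected_mcu_count(uint32_t width, uint32_t height,
                            uint32_t h_max, uint32_t v_max)
{
    uint32_t mcu_w = 8u * h_max;                    /* MCU width in pixels  */
    uint32_t mcu_h = 8u * v_max;                    /* MCU height in pixels */
    uint32_t cols = (width  + mcu_w - 1u) / mcu_w;  /* partial MCUs count as whole */
    uint32_t rows = (height + mcu_h - 1u) / mcu_h;
    return cols * rows;
}

/* Nonzero when the decoder's MCU tally disagrees with the header. */
int mcu_count_mismatch(uint32_t decoded_mcus, uint32_t width, uint32_t height,
                       uint32_t h_max, uint32_t v_max)
{
    return decoded_mcus != expected_mcu_count(width, height, h_max, v_max);
}
```

For example, assuming 4:2:2 sampling (Hmax = 2, Vmax = 1, so 16x8-pixel MCUs), a 1280x720 frame should contain 80 × 90 = 7200 MCUs. A flipped bit usually desynchronizes the Huffman stream, so the tally at the ffd9 marker is likely, though not guaranteed, to differ.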

I wonder if there is code to "jpeg decode check" rather than full decode?

https://github.com/ImpulseAdventure/JPEGsnoop/blob/master/source/ImgDecode.cpp
https://github.com/espressif/esp32-camera/blob/master/conversions/jpge.cpp

@jameszah
Author

jameszah commented Feb 15, 2021

I tried this code - removed the parts to decompress the jpeg image -- but it still took about 150 ms to parse a 1280x720 jpeg image to check it was perfectly valid. So that is not going to work at 20+ frames per second. ☹️

Although on a time-lapse, where a bad image is a bigger problem, it might be useful.

https://github.com/lvgl/lv_lib_split_jpg/blob/master/tjpgd.c

@jameszah
Author

So I ran both the original code that searches for the 64-byte patterns (about 8 ms per frame) and the code that parses the entire jpeg (about 150 ms for a regular hd frame from a 5640 camera) for many hours at about 5 fps on a static, non-complex scene. I have not got any jpeg with more than 40 of the 64 elements filled (my guess at the signature of a bit error that destroys the jpeg after swallowing the next MCU), but I still see many of the 64-byte pattern problems, at maybe 0.05%.

So the camera itself must be creating the data with the extra 64 bytes???
Maybe I need complex outdoor scenes to stress the jpeg compressor enough to create a bad jpeg.
I haven't yet worked through the jpeg parsing of one of those MCUs containing the 64 patterned bytes above.

Although I am now available for jpeg internals consulting. 😄

@SWillSZ

SWillSZ commented Feb 27, 2021

James,
Your work on the ESP32CAM is pretty incredible. I also appreciate your passion for sticking with the project of recording & uploading video for so long. I am working on a similar problem myself - luckily, the ESP32 has two cores and DMA and such to work with. Getting everything to work together well is the tough part. A quick note which might help you on this is that OpenCV can process video with a few bad frames with no issue. It might make sense for you to perform post processing serverside, and ignore the very few bad frames using OpenCV (while loop combined with cap.read()). That being said, those bad frames are irritating when you expect clean video.

@jameszah
Copy link
Author

I noticed this broken jpeg problem getting worse in bright sun with a complex scene: sometimes 10% of the jpegs were broken using FRAMESIZE_QSXGA and quality 12, with the buffers sized for quality 6, which should give 983,040 bytes for the picture ... but it would hit the FB_GET_TIMEOUT of 4 seconds, I think. The jpeg sizes would get to 600,000 bytes, and I didn't see anything near 900,000, but many would have a bit or 2 wrong that would break the jpeg decode.

As soon as I pulled it out of the sunny window, the frames would start working again. Sizing the buffers for quality 5 (which meant I had to use non-continuous mode, fb_count=1, and should give 2,457,600 bytes per image) and lowering the quality to 20 (which cut the average jpeg size) made things work much better.
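Those byte counts follow from how the driver sizes JPEG frame buffers: width × height × 2 bytes, divided by a `compression_ratio_bound` chosen from the JPEG quality (the magic number Schaggo mentions later in this thread). A sketch of that arithmetic, assuming the 1.0.5-era thresholds as we understand them: a bound of 4 for quality ≤ 5, 10 for 6 to 10, and 16 above 10. The two figures quoted here match the first two; the third threshold is an assumption to check against the driver source. `jpeg_fb_size` is a hypothetical name.

```c
#include <stddef.h>

/* Sketch of the JPEG frame-buffer sizing rule as we understand the
 * esp32-camera driver of the arduino-esp32 1.0.5 era. A lower quality
 * number means better quality, bigger JPEGs, and so a smaller bound. */
size_t jpeg_fb_size(size_t width, size_t height, int jpeg_quality)
{
    size_t compression_ratio_bound;
    if (jpeg_quality > 10) {
        compression_ratio_bound = 16;   /* assumption: not confirmed in this thread */
    } else if (jpeg_quality > 5) {
        compression_ratio_bound = 10;
    } else {
        compression_ratio_bound = 4;
    }
    /* 2 bytes per pixel (YUV422 sensor output) before the bound is applied */
    return (width * height * 2u) / compression_ratio_bound;
}
```

For QSXGA (2560x1920), `jpeg_fb_size(2560, 1920, 6)` gives 983,040 and `jpeg_fb_size(2560, 1920, 5)` gives 2,457,600, the two figures above.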

But I noticed this strange effect - the jpeg sizes are quite consistent (logical as the scene was not changing much), but the time to transfer the image in non-continuous mode varied widely. This is 1800 frames - one per second.

Maybe the camera is getting too hot - it is grinding away at these big images - and sitting in bright sun indoors.

[Image: transfer time for each of the 1800 frames]

@SWillSZ
Copy link

SWillSZ commented Mar 1, 2021

James,

My problem is similar: heat does not go well with the OV2640 module. I was taking CIF images at 20FPS with my OV2640 in a plastic enclosure, and the data would come back corrupted or not at all. I ended up putting a small piece of metal on the back of my camera module as a heat sink.

The time to get frames unfortunately does vary wildly. I have found this to be the case in both continuous and non-continuous mode, at CIF resolution with a low JPEG quality setting (20FPS).

I have a solution which might help reduce your image corruption. sdmmc_read_sectors and sdmmc_write_sectors in sdmmc_cmd.c are poorly written: each time you write to or read from SD, they allocate and deallocate 512 bytes of memory, over and over and over again. The line

    tmp_buf = heap_caps_malloc(block_size, MALLOC_CAP_DMA);

always runs with a block size of 512. What I do is simply run malloc once, permanently allocating a buffer of 512 bytes. I do this for both sdmmc_read_sectors and sdmmc_write_sectors (two separate buffers). I think this may have lowered the amount of image corruption I receive.

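The top of my sdmmc_cmd.c is now:

    #include "sdmmc_common.h"

    static const char* TAG = "sdmmc_cmd";

    // Persistent 512-byte DMA bounce buffers (one for reads, one for writes),
    // allocated once on first use instead of on every call.
    void* tmp_buf_read;
    void* tmp_buf_write;
    bool tmp_buf_read_init = false;   // set once tmp_buf_read has been allocated
    bool tmp_buf_write_init = false;  // set once tmp_buf_write has been allocated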
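The change above amounts to allocating the bounce buffer once and reusing it. A portable sketch of that allocate-on-first-use pattern, with plain `malloc` standing in for `heap_caps_malloc(block_size, MALLOC_CAP_DMA)` so it compiles off-target; `get_sector_scratch` is a hypothetical name. The sketch is not thread-safe: concurrent readers and writers would need the separate read/write buffers described above, or a mutex.

```c
#include <stdlib.h>
#include <stddef.h>

#define SECTOR_SIZE 512u

/* Allocated on first use and never freed, so later calls reuse it.
 * On the ESP32 the malloc below would be
 * heap_caps_malloc(SECTOR_SIZE, MALLOC_CAP_DMA) so DMA can reach it. */
static void *scratch_buf = NULL;

void *get_sector_scratch(void)
{
    if (scratch_buf == NULL) {
        scratch_buf = malloc(SECTOR_SIZE);  /* stays NULL if allocation fails */
    }
    return scratch_buf;
}
```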

@jameszah
Author

jameszah commented Mar 2, 2021

Hi, that is interesting. I don't think I use sdmmc directly, but rather the more basic calls (I think), fseek and fwrite.
I noticed a significant bottleneck when writing from psram -> sd with fwrite, which I think went over the spi bus twice, and I assumed it was 512-byte blocks starting and stopping that slowed things down.
I switched to a memcpy of 32 KB blocks from psram to regular ram, followed by an fwrite of each 32 KB block to the sd. It dramatically increased the throughput.
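A minimal sketch of that psram -> ram -> sd pattern: copy the frame in fixed-size chunks into a small bounce buffer and `fwrite` each chunk. The 32 KB chunk size comes from the comment above; `write_via_bounce` is a hypothetical helper. On the ESP32 the bounce buffer is assumed to be in DMA-capable internal RAM (for example allocated with `heap_caps_malloc(CHUNK_SIZE, MALLOC_CAP_DMA)`), which is an assumption about the target setup.

```c
#include <stdio.h>
#include <string.h>
#include <stddef.h>

#define CHUNK_SIZE (32u * 1024u)

/* Copies `len` bytes from `src` (e.g. a PSRAM frame buffer) through the
 * caller-supplied `bounce` buffer (CHUNK_SIZE bytes of internal RAM) and
 * writes each chunk to `f`. Returns 0 on success, -1 on a short write. */
int write_via_bounce(FILE *f, const unsigned char *src, size_t len,
                     unsigned char *bounce)
{
    size_t done = 0;
    while (done < len) {
        size_t n = len - done;
        if (n > CHUNK_SIZE) {
            n = CHUNK_SIZE;
        }
        memcpy(bounce, src + done, n);          /* PSRAM -> internal RAM */
        if (fwrite(bounce, 1, n, f) != n) {     /* internal RAM -> SD    */
            return -1;
        }
        done += n;
    }
    return 0;
}
```

The bounce buffer is passed in rather than allocated inside so that it can be allocated once at startup and reused for every frame.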
I'm not sure where the errors are coming from. It could be in the camera, or camera->esp32, or esp32->sd, ... There is lots of mild criticism of the esp32-cam system for glitchiness, but it seems to be mostly excused as fine for a $10 camera-cpu combo: just take another picture and it will be fine. But that is annoying in a video.
I'm leaning toward "crazy from the heat" problems in the camera lately. It works much better at -10C outside than at +20C in the bright sun indoors.

@SWillSZ

SWillSZ commented Mar 3, 2021

fwrite calls sdmmc under the hood, so you are likely executing huge numbers of 512-byte writes, allocating and then deallocating each time. You are right to use RAM as a way station between the SD card and PSRAM, since DMA can speed data transfer between SD and RAM or between PSRAM and RAM, but not directly between PSRAM and the SD card. The glitchiness of the camera is a definite pain. I've noticed that glitches increase as the duration of video recording increases, regardless of the camera module heatsink, but that might be another manifestation of the heat issue.

One thing I might try is really digging into the camera driver and seeing whether the thing is constantly allocating/deallocating memory for frames. It might speed the process up if space for the frame was allocated once throughout program execution. Just a brainstorm, I need to look at the camera driver again.

@Schaggo
Copy link

Schaggo commented Mar 3, 2021

I experimented a bit trying to find a way to keep the ov2640 cool, but neither longer cables and mounting the cam to a heat sink nor active cooling helped with any of the symptoms. I bet the problem is completely on the driver side.

@jameszah
Author

jameszah commented Mar 3, 2021

    tmp_buf = heap_caps_malloc(block_size, MALLOC_CAP_DMA);

So you would make the change in this code
https://github.com/espressif/esp-idf/blob/master/components/sdmmc/sdmmc_cmd.c

which would replace
C:\ArduinoPortable\arduino-1.8.13\portable\packages\esp32\hardware\esp32\1.0.5\tools\sdk\lib\libsdmmc.a

with the modified version of the sd_mmc package??? I'll give it a try.

@jameszah
Author

jameszah commented Mar 3, 2021

So I'm getting fewer than 10 flaws in a 30 min video at 25 fps regular HD, about 45,000 jpegs, using the "avoid the middle of the psram" rule.
[Image]
This looks like the culprit, with the non-triangular table. It is much easier for the "player" on the big computer to locate and skip these than to do it on the esp32.
[Image]

@jameszah
Copy link
Author

jameszah commented Mar 5, 2021

For the broken jpeg students out there ... this one missed the EOB, and started swallowing the next MCU ... I think you could drop any MCU with an EOB64.
[Images (3)]
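The "EOB64" rule can be expressed on the decoded AC symbols of one 8x8 block. In baseline JPEG each AC Huffman symbol is a byte RS: the high nibble R is a run of zero coefficients and the low nibble S is the size of the next nonzero one; 0x00 is EOB and 0xF0 (ZRL) skips 16 zeros. A sketch, assuming the symbols have already been Huffman-decoded (e.g. by a stripped-down tjpgd.c) into an array; `ac_positions_used` and `looks_like_eob64` are hypothetical helpers. A genuine block may legitimately fill all 63 AC positions without an EOB, so this is a heuristic that works best at moderate quality settings.

```c
#include <stddef.h>
#include <stdint.h>

/* Walks the decoded AC symbols of one block. Returns the index just past
 * the last coefficient position consumed (1 = only the DC term, 64 = full),
 * or -1 for an invalid symbol; *eob_seen is set if an EOB closed the block. */
int ac_positions_used(const uint8_t *rs, size_t n_symbols, int *eob_seen)
{
    int k = 1;                      /* position 0 is the DC coefficient */
    *eob_seen = 0;
    for (size_t i = 0; i < n_symbols && k < 64; i++) {
        uint8_t r = rs[i] >> 4;     /* run of zero coefficients */
        uint8_t s = rs[i] & 0x0F;   /* size of the next nonzero coefficient */
        if (s == 0) {
            if (r == 15) {          /* ZRL: sixteen zeros, no coefficient */
                k += 16;
                continue;
            }
            if (r == 0) {           /* EOB: rest of the block is zero */
                *eob_seen = 1;
                break;
            }
            return -1;              /* other R/0 symbols are invalid in baseline */
        }
        k += r + 1;                 /* skip r zeros, then one coefficient */
    }
    return k;
}

/* Flags a block that ran to (or past) 64 positions without an EOB, or that
 * contained an invalid symbol: both can mean a flipped bit hid the real EOB. */
int looks_like_eob64(const uint8_t *rs, size_t n_symbols)
{
    int eob_seen;
    int k = ac_positions_used(rs, n_symbols, &eob_seen);
    return k < 0 || (!eob_seen && k >= 64);
}
```

The threshold could be lowered (the thread mentions 60) to catch blocks that merely come close to full, at the cost of more false positives.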

@SWillSZ

SWillSZ commented Mar 7, 2021

James, did changing the sdmmc file produce any improvement? Congrats on getting the error rate down to fewer than ten flaws per 30 minutes of HD recording at such a high framerate. Is this while devoting all resources to recording video (both cores, DMA, etc.)? What do you mean by avoiding the middle of the PSRAM, and how would that be implemented?

In addition, I was looking through your code for the ESP32CAM junior. One thing I noticed was that your code for another_save_avi uses an array called framebuffer_static, which takes around 64KB of global memory. Instead of copying the result of esp_camera_fb_get() into this array, it might make sense to edit the frame directly at the pointer returned by esp_camera_fb_get(). You should get a speedup that way, as well as saving 64K of memory.

At the moment I'm dealing with the broken jpegs serverside.

@jameszah
Author

jameszah commented Mar 9, 2021

Howdy, I have not got to that yet. I was making the junior version faster by adding mutexes and studying exactly how long things take. I think a V10 sd card can keep up with the camera on HD recording on both the ov2640 (12 fps) and ov5640 (25 fps) cameras, while using just core 1 for the camera and sd, leaving core 0 free for wifi and a streaming task.

"Avoid the middle of PSRAM" is advice from Schaggo in issue #249 "PIXFORMAT_RAW support missing? ...", where he said he observed jpeg errors if the jpeg crossed the midpoint of the psram, so I started checking the addresses of the framebuffers in psram to see where they fell relative to the middle at 0x3FA00000. I had been allocating a giant framesize / quality because of old problems with the jpeg exceeding the buffers, but my simple solution now is to allocate for HD, 3 buffers, medium quality, which takes much less than half the psram. Not sure if it improved things or not. Long run, my plan was to load a jpeg into the buffer that straddles the midpoint and just leave it there, so I'm only using buffers in the top and bottom halves, never crossing the middle.
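The "avoid the middle" check boils down to asking whether a frame buffer's address range crosses the 0x3FA00000 boundary. A sketch, assuming 4 MB of PSRAM mapped from 0x3F800000 so that the midpoint is 0x3FA00000 as stated above; `crosses_psram_midpoint` is a hypothetical helper. On the device the start would come from `fb->buf`, and the length should be the allocated buffer size rather than just `fb->len`, since a later JPEG can grow to fill the buffer.

```c
#include <stdint.h>
#include <stddef.h>

#define PSRAM_MIDPOINT 0x3FA00000u   /* boundary between the lower and upper 2 MB */

/* Nonzero when [start, start + len) has bytes on both sides of the midpoint. */
int crosses_psram_midpoint(uintptr_t start, size_t len)
{
    if (len == 0) {
        return 0;
    }
    uintptr_t last = start + len - 1;   /* address of the final byte */
    return start < PSRAM_MIDPOINT && last >= PSRAM_MIDPOINT;
}
```

A buffer that ends exactly at 0x3FA00000, or starts there, is treated as safe; only a range with bytes on both sides counts as crossing.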

The 64K static ram was an attempt to solve the psram -> sd slowness problem. As 1.05 needs more ram than 1.04, I have reduced this to 4k or 8k, I think, which is just as good. I think there is advice to write to the sd in 32kb writes for speed, but if sdmmc breaks them into 512-byte blocks, that 32kb objective is not achieved. But am I correct that running through the full jpeg accessing each byte from psram would be much slower than copying 4k chunks over to sram and scanning them there?

@Schaggo

Schaggo commented Mar 9, 2021

Hey, I am loosely following this and other threads and am stoked to see what you guys are coming up with.
To rule out that I might have given you bad advice to avoid the midpoint of PSRAM, I think it could be helpful to explain how I got there.
I was trying to achieve quality 0 at a max resolution (stills) for days until I started manipulating all dials that are available to us. Everything was tested on 72 cams simultaneously. I have taken literally thousands of images/hundreds of image sets to get to a point where stuff worked reliably.
Here are the things I found out:

  • If you don't care about fps, setting the clock speed to an arbitrary value between 10MHz and 20MHz improves reliability and max file size. Why ...? I have no idea. The weird thing is that both 10MHz and 20MHz cause trouble.
  • data corruption at the midpoint of PSRAM seems to be a rather old issue that I found in the Espressif forum and was supposed to be fixed by an update but still seems to be present.
  • there is a hard-coded buffer size limit in the driver set by a magic number compression_ratio_bound. I wanted to recompile said driver with the magic number available through the header but never came around to do it.

I don't know what of this is still relevant in the latest driver, but especially the last point would be where I'd pick up the project again in a couple of months.

@jameszah
Author

jameszah commented Mar 9, 2021

Howdy, the 10MHz/20MHz versus other speeds result is interesting. In the 1.04 version 10MHz got you fast performance (with a clock divider or something), but 1.05 now uses 20MHz. The wire is only half an inch long, but dirty connectors or similar electrical issues might cause problems at that speed??? I'll give that a try.

The midpoint bug I thought is plausible as well, with some memory mapping idea I read about somewhere ... but so far I am still getting some broken jpegs while avoiding the midpoint. Not rigorously measured.

The sdmmc alloc/de-alloc issue I thought might explain the 64bytes of patterned data that started this thread.

And there is the old ffd9 bug, where the driver needs a 0 after the ffd9 in certain cases, which may confuse the camera into thinking it was done transmitting. Fixing that again needs recompiling the camera software, which opens a can of worms with the i2s system, I think (not spi as I said in the title!).

It is all very low error rates, so it doesn't point to a clear bug somewhere, but occasional heat/electrical issues.

I have a "sense" that my videos done with 1.04 code are better quality with fewer broken jpegs. But those cameras are set up in comfortable surroundings with the ov2640 camera, while some of the very bad videos were made with the higher-current (hotter) ov5640 camera in hot sunshine. In the last few days I have been testing outside at zero celsius, which might improve things through cooling.

Thanks for the info 😄

@jameszah
Author

Anybody have an IEEE account?

https://ieeexplore.ieee.org/document/664106
Detection and correction of transmission errors in JPEG images
Published in: IEEE Transactions on Circuits and Systems for Video Technology ( Volume: 8, Issue: 2, Apr 1998)

@SWillSZ

SWillSZ commented Mar 12, 2021

I don't have an IEEE account, unfortunately. Out of curiosity, James, do you primarily code in ESPIDF or with Arduino?

The error rate for videos does seem to be determined by external factors. If a scene is complicated (or noisy, like with high gain ceilings such as GAINCEILING_16X), corrupt JPEGs will occur more frequently. If the temperature is hot, corrupt JPEGs will occur. I don't think it is a frame size issue - tiny little CIF JPEGs can suffer from corruption (15fps in my case, while uploading and writing to SD simultaneously).

I don't think corruption is a RAM vs PSRAM issue. I lowered my resolution to CIF and allocated all frame buffers in RAM instead of PSRAM, and still ended up with corruption. To be fair, I am simultaneously transmitting via WiFi. A hardware engineer I spoke with mentioned that WiFi transmissions could be causing interference with the CMOS sensor. Apparently CMOS sensors are vulnerable to RF interference.

Schaggo,
The 10 vs 20MHz issue is likely caused by the camera driver. For instance, in OV2640.c, we have:

    if (sensor->xclk_freq_hz == 10000000) {
        if (framesize <= FRAMESIZE_CIF) {
            WRITE_REG_OR_RETURN(BANK_SENSOR, CLKRC, CLKRC_2X_CIF);
        } else if (framesize <= FRAMESIZE_SVGA) {
            WRITE_REG_OR_RETURN(BANK_SENSOR, CLKRC, CLKRC_2X_SVGA);
        } else {
            WRITE_REG_OR_RETURN(BANK_SENSOR, CLKRC, CLKRC_2X_UXGA);
        }
    }

I need to dig into the OV2640 documentation to figure out exactly what setting this register does.

@Schaggo

Schaggo commented Mar 12, 2021

All of the phenomena also occur on ov3660s - if that helps

@jameszah
Author

Hi, I got the article from a teenager with a powerful library card! Haven't read it yet.

I'm using Arduino mostly - I've tried the PlatformIO and ESPIDF, but haven't switched yet.

I think it would be nice to find the problem, but an acceptable alternative would be an easy way to find and dispose of the bad jpegs. I was trying full-HD at 2 fps, bright sun, and 0 degrees C, and got a 30 minute video with only 1 error. The interesting thing was that the error happened in a jpeg whose huffman block hit 64 units, which I had been checking for in the ESP32, discarding those frames (assuming the bit error caused us to miss the EOB if we exceed 60). I might have a bug in that code, or the error happened after the check. The jpeg travels from psram -> sram for the jpeg check, and then again for the SD write, so that points to the SD writer system. Another oddity of that experiment was that I was using an old dollar-store class 10, 16 GB sd card, while normally I use a 64GB V30 card. So the slower write speed might be avoiding errors in the ram -> sd process.

I also spent some time on the 5640 registers and code. I haven't got into the clock speed business yet, but I thought I might be able to turn on "jpeg restart" (restart markers), though it doesn't seem to be an option. I thought "Scalado mode" might be something, but it didn't work.

[Image]

@SWillSZ

SWillSZ commented Mar 12, 2021

James,

I strongly recommend you switch to using ESPIDF. I started with Arduino, and ended up making the change when I realized I couldn't accomplish what I needed without access to menuconfig. The difference in power is striking - mostly because of menuconfig, but also because of the ability to painlessly make edits to the drivers. The debugging cycle is also slightly faster.

Don't get me wrong - switching is a pain. A lot of the arduino functions do not work with ESPIDF (especially as relates to Wifi).

When I record video, it starts with a low error rate. It then picks up as the camera heats up. Eventually, esp_camera_fb_get() fails to return, and I simply restart the ESP32, allowing the camera to cool down and continue the cycle. To be fair, I am operating it at around 20C. I'm working on having a custom heat sink made for the OV2640 module of the ESP32-CAM; if it works, I could potentially send one your way.

Unfortunately, I need to learn more about image compression (Huffman blocks and whatnot). One way of testing your theory would be to save the image to SD, and then run it through your bad-jpeg filter algorithm a second time to see if it is caught. My bet is that it is a code issue with the bad-jpeg filter algorithm, or corrupted images somehow having valid huffman blocks.

I use Lexar 32 / 16 GB SDs, and they work well for me. The ESP32 is 32bit, so it cannot use SD storage above 16/32GB (I forget which). My guess is that would explain why your 64GB card is not performing as well as the cheap 16GB card.

@jameszah
Author

So, working on my theory that the error occurred on the ram -> sd journey, I tried writing to the sd twice, once as an avi file and once as an mjpeg file, after the frame had already passed the tjpgd.c huffman analysis (the missing-EOB test) showing it was not corrupt.

Then I observed a bad frame - the identical bad frame in the avi and the mjpeg - so it made one journey psram - > ram for the tjpgd.c analysis and passed, then another journey psram -> ram, and two journeys ram -> sd, which came out identical, and wrong. And then I selected that bad jpeg and resubmitted it to tjpgd.c, and it failed. So the first psram -> ram for tjpgd.c was good, but the second psram->ram introduced the error.

So psram->ram is the culprit. So now I'm running a program to catch bad frames from a continuous stream. A regular HD frame takes about 150ms to parse, so that's about 8fps, just a little slower than the ov2640 camera can produce on 1.05. And if you are doing a timelapse or a single-frame application, the 150ms would be worth it to avoid the occasional broken jpeg. And no post-processing is needed when playing the video.

The only problem is that you cannot get a large jpeg into ram, check it, and move to sd, so you would have to merge the tjpgd.c check and sd-write, and abort after part was already written to the sd, and start over.

It finds all the bad-frames I have studied so far. Unable to find any bad-frames indoors so need sunshine and heat to test it.

So its a vast amount of computing to find the 1/1000 error, but what else has the esp32 got to do?

@SWillSZ

SWillSZ commented Mar 15, 2021

James,

This is very cool - it is interesting that the PSRAM to RAM might be causing the error. Something fishy is going on with PSRAM... This thread indicates as much. The issue is fixed in hardware by revision 3, but the ESP32 chip on the ESP32Cam is revision one. You will see a post stating - "That said, for new designs it is recommended to use ESP32 silicon revision 3 as it fixes the PSRAM cache issue in hardware." Apparently half of the PSRAM can be used by core one, and the other half by core two - or something like that. My background in the lower level of design is unfortunately limited. However, I will spend some time on this - hopefully I will have an update for you. My application is a multicore stress test - simultaneous record to SD (15FPS CIF), 250KByte wifi transmission and warm temperatures.

Also, you are correct in routing the frames from PSRAM through RAM before hitting the SD card: PSRAM <-> RAM or SD card <-> RAM is done by fast DMA under the hood, but DMA cannot be used for SD card <-> PSRAM, unfortunately. At the time I was experimenting with simply storing the frame buffers in RAM (CIF resolution makes this somewhat reasonable), and had blanked on PSRAM usually being used to store frame buffers.

@jameszah
Author

That psram thread is interesting. And talks about this issue with the upper and lower 2MB banks. And igrr is saying Nov 15, 2020 that corrections are being added to ESP-IDF 3.3, which I believe is the root of arduino-esp32 1.0.5. I don't understand the problem or correction, but maybe it didn't make it into 1.0.5 ??? Or maybe this is a genuine data error -- not sure what the expectation of 1-bit errors should be in the camera->psram->ram sequence.

My new idea is to copy a jpeg from psram -> ram (in blocks), do a checksum, then copy it again in blocks, and start the SD writer application, while re-doing the checksum, and if the checksums match at the end, then we likely have a good jpeg, or if not, then abandon that jpeg, and move the file pointer back to the start of this frame. It will not work for streaming, as you cannot backup there. The psram->ram only takes 20-30 micro-seconds or so, so that will not slow things down, and the 200-400 milli-seconds of jpeg decoding and checking might be unnecessary. Or it might catch errors from the camera, or from the camera->psram transfer.
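A sketch of the "checksum, then write while re-checksumming, and back up on mismatch" idea, assuming a simple 32-bit additive checksum (any single flipped bit changes it, though it can miss some multi-byte errors; a CRC would be stronger) and 4 KB bounce chunks. `write_frame_verified` is a hypothetical helper; it uses standard `ftell`/`fseek` to move back to the start of the frame, which, as noted, only works for a file, not a stream.

```c
#include <stdio.h>
#include <string.h>
#include <stdint.h>
#include <stddef.h>

#define CHUNK 4096u

/* Sums all bytes, reading them through the internal-RAM bounce buffer. */
static uint32_t sum_via_bounce(const uint8_t *src, size_t len, uint8_t *bounce)
{
    uint32_t sum = 0;
    for (size_t off = 0; off < len; off += CHUNK) {
        size_t n = (len - off < CHUNK) ? len - off : CHUNK;
        memcpy(bounce, src + off, n);
        for (size_t i = 0; i < n; i++) {
            sum += bounce[i];
        }
    }
    return sum;
}

/* Pass 1 checksums the frame; pass 2 copies and writes it chunk by chunk,
 * summing the bytes actually written. On a mismatch or write error the file
 * position is moved back to the frame start so the next frame overwrites it.
 * Returns 0 if the frame was kept, -1 if it was abandoned. */
int write_frame_verified(FILE *f, const uint8_t *src, size_t len, uint8_t *bounce)
{
    long frame_start = ftell(f);
    if (frame_start < 0) {
        return -1;
    }
    uint32_t first = sum_via_bounce(src, len, bounce);
    uint32_t second = 0;
    int ok = 1;
    for (size_t off = 0; off < len; off += CHUNK) {
        size_t n = (len - off < CHUNK) ? len - off : CHUNK;
        memcpy(bounce, src + off, n);
        for (size_t i = 0; i < n; i++) {
            second += bounce[i];
        }
        if (fwrite(bounce, 1, n, f) != n) {
            ok = 0;
            break;
        }
    }
    if (!ok || second != first) {
        fseek(f, frame_start, SEEK_SET);   /* back up; the next frame overwrites */
        return -1;
    }
    return 0;
}
```

If the abandoned frame happens to be the last one before the file is closed, its stale bytes remain past the final file position, so the AVI index should only ever point at frames that returned 0.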

So I'll give that a try.

@lunadm

lunadm commented Mar 16, 2021

> I use Lexar 32 / 16 GB SDs, and they work well for me. The ESP32 is 32bit, so can not use SD storage above 16/32GB (I forget which). My guess is that would explain why your 64GB card is not performing as well as cheap 16GB card.

I'm using 64 GB and 128 GB cards (not tested 256 GB).
I'm currently detecting the card size and changing .allocation_unit_size to 64 * 1024 or 128 * 1024 etc., which improves the framerates.

@jameszah
Author

> Im using 64g & 128gb cards (not tested 256)

The only problem is formatting them fat32 -- its all the fault of Dave from Dave's Garage youtube channel who wrote windows format. 😄

@jameszah
Author

jameszah commented Mar 17, 2021

That 30 microseconds remark above was wrong; that is the time to get the address of a frame that is already in the psram.
Running a simple checksum on a regular hd jpeg takes about 5-10 ms, and about the same whether you scan through it in psram, or memcpy it to sram in blocks and sum it there. I thought memcpy might have a dma feature, but it looks like there is a separate call for that, and it is on the critical path, so there is nothing else to do while the dma is running. It is still a little confusing that summing the bytes in psram and copying them to sram in blocks before summing them end up taking the same time.

The simplified jpeg decoder "eob checker" takes about 100-150ms on regular hd. Cannot find any failures between multiple checksums with indoor light and heat.

@SWillSZ

SWillSZ commented Mar 17, 2021

@jameszah
After spending some time on my end, I realized that PSRAM was not responsible for my corrupted images. All of my frame buffers were being allocated in the first half of PSRAM by the first core, with no potential of any of the frame buffers straddling both halves of the PSRAM.

If the checksums before PSRAM processing match those which occur after PSRAM processing, it is likely that PSRAM corruption is not your issue.

I ended up solving the vast majority of my corruption issues with the clock speed adjustment recommended by @Schaggo
    xclk_freq_hz = 16500000
taken from Github Post

Interesting that performing a checksum on an image takes as long in SRAM as in PSRAM. I suppose it does need to be done by the CPU either way, but that still does not quite explain the equal run times. memcpy is written in assembly, which is beyond my skillset; I would need to look at the register definitions and schematic and whatnot.
Memcpy Source. I think memcpy does not use DMA. The linked memcpy code does not once mention DMA in its comments. In addition, forums for some other products (not ESP32) mention that memcpy does not use DMA https://forums.xilinx.com/t5/Embedded-Linux/memcpy-to-programmable-logic-not-using-DMA/td-p/680507. From looking at driver code, I am 100% confident that ESP32Camera frames are written to memory through DMA, data is written to / read from SD via DMA, and parts of the wifi stack are performed via DMA.

I suppose your camera example could be sped up by using DMA instead of memcpy in a few cases. An example of implementing DMA can be found in the camera driver for the ESP32Cam, or in other locations. I believe that you would have to mess with interrupts. The ESP32Cam only has two DMA channels and 2 CPU cores, so it may not make sense if DMA is already performing two tasks.

@lunadm What framesize allocation raised frame-rate most effectively, and with what frame resolution being used to record?

@lunadm

lunadm commented Mar 18, 2021

Hi James
Great work btw- I should have doffed my cap in my first post

I found I had to manage the allocation unit size so I could read the sd cards on both Windows and Mac.
If I left it to configure automatically, the Mac would try to reinitialise the card; on digging, it appears to be a common usability problem with drone cams.
Some sites suggest 128 works on both platforms, but I found 128 doesn't work on the Mac.
The sweet spot for both Mac and Windows appears to be 64.
On Windows 128 was slightly faster, but with less optimal space usage; I imagine you could tune the recording time.
I was recording SVGA on the ov2640.

I’ve recently tried the 5640 in hd but I was only getting about 12fps (outside sunshine) and the extra work reduced the battery life.

@lunadm

lunadm commented Mar 18, 2021

One thing I have noticed with 1.0.5 - (changed xclk and fbcount)
On my version, before recording, I have the wifi start up and launch a captive portal - this only happens with a Poweron_Reset (Vbat power on reset)

When the time is set / or timeouts the captive portal closes and wifi is switched off - but I noticed a reduction in framerates -
(If I remove the wifi its fine)
moving back to 1.04 and all is well. I need to do some digging.

@jameszah
Author

re: WiFi

I had pretty much given up on jpeg testing -- it is easy (if very cpu-intensive) to find a missing EOB, but can't think of a way to find an error that creates a premature EOB. Like this one
[Image]

So I was running the camera without jpeg testing, and my router rebooted about an hour in. The 30-minute windy outdoor 13fps regular HD video, has a few errors, but after the router rebooted, the esp32 did not reconnect, and the video after that looked error-free. I've looked through about an hour - about 4.4GB, 40,000 frames - without any problems.

So maybe 1.05 did something with wifi that is creating an error with the camera->psram transfer (or camera->ram transfer). I was running all the camera and sd writer on core 1, and streaming on core 0 (with the wifi/rtos as I understand it), and the http handler being assigned to either core. For streaming, I copy the next frame about to be written to sd. It is not mutex'ed but an entire fb_get would have to be completed before it is over-written. And the defect is in the sd-write, not the streaming which I don't save and cannot obsess over defects.

So more testing with WiFi shut off.

@jameszah
Author

jameszah commented Mar 21, 2021

So with the WiFi shut off completely (never started), I get about 4 hours of OV5640 regular HD at 14 fps and OV2640 regular HD at 9 fps, with xclk_freq_hz = 16500000 (I meant to change that back to 200...), and not a bad frame in sight.

I wonder if this could be the problem:

adafruit/Adafruit_NeoPixel#139 (comment)

@jameszah
Author

So after another 10 or 12 hours of full-speed, bright outdoor recording at 1280x720, ~14 fps, with xclk switched back to 20000000, generating about 5 GB of files per hour with the 1.0.5 software... I cannot find any broken jpegs.
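
To put those numbers in perspective, here is a rough back-of-envelope sketch (assuming decimal units, 1 GB = 10^9 bytes, and a steady 14 fps; the helper names are illustrative only) of the average JPEG size and the sustained rate the SD card has to absorb:

```cpp
#include <cstdio>

// Average size of one frame, given total bytes recorded per hour and the frame rate.
double avgFrameBytes(double bytesPerHour, double fps) {
  return bytesPerHour / (fps * 3600.0);  // bytes per hour / frames per hour
}

// Average rate the storage must sustain, in bytes per second.
double sustainedWriteBytesPerSec(double bytesPerHour) {
  return bytesPerHour / 3600.0;
}

// Prints both figures for a given recording budget.
void printRecordingBudget(double bytesPerHour, double fps) {
  std::printf("average frame:   %.0f bytes\n", avgFrameBytes(bytesPerHour, fps));
  std::printf("sustained write: %.2f MB/s\n",
              sustainedWriteBytesPerSec(bytesPerHour) / 1e6);
}
```

With 5e9 bytes/hour at 14 fps this works out to roughly 99 KB per frame and about 1.39 MB/s, all of which has to pass through the camera->RAM transfer first.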

I boot the ESP32, start the WiFi and get the time, then call WiFi.disconnect(), and everything is fine.

That NeoPixel issue with "bitbanging" is probably not relevant: I think the camera uses I2S, and the PSRAM and the SD card use SPI, so none of that would be bit-banged.

The WiFi events can be sent to either core. I assumed that was just the events the "user" wants to handle, rather than the overhead-type events/packets that the user never sees. espressif/arduino-esp32#4762 (comment)

The frequency of the problem might be some combination of the xclk speed and the jpeg size, interrupted by WiFi events that disturb things enough to create an error in the jpeg, I guess in the camera->RAM transfer.

So my conclusion is that the WiFi is disturbing things, even if I am not actively using it and it is just sitting there waiting for events.

This is not a solution if you are mainly streaming or broadcasting events, but if you are mainly recording to SD, it avoids the problem.

@jameszah jameszah reopened this Mar 24, 2021
@jameszah
Author

"Using YUV or RGB puts a lot of strain on the chip because writing to PSRAM is not particularly fast. The result is that image data might be missing. This is particularly true if WiFi is enabled."

Maybe that, from the intro to esp32-camera, said it all: the WiFi must step on I2S at times.

@lunadm

lunadm commented Apr 2, 2021

Re "WiFi is disturbing things":
just a thought, not tested: I wonder if it's the Arduino WiFi layer.
What if you #include "esp_wifi.h" and then, after WiFi.mode(WIFI_OFF), call esp_wifi_deinit()? That should unload the driver.
https://docs.espressif.com/projects/esp-idf/en/latest/esp32/api-guides/wifi.html#wi-fi-deinit-phase

@jameszah
Author

jameszah commented Apr 7, 2021

I find that a simple WiFi.disconnect(); solves the problem, without esp_wifi_deinit().
So I've set up a system to connect and disconnect WiFi under various circumstances (physical switch, timeouts, etc.), and switching WiFi on and then off eliminates these small errors (maybe 1 in 1000 frames, after you have discarded the frames with no ffd9).
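
The "no ffd9" filter mentioned above can be sketched roughly as follows. This is a minimal illustration, not the recorder's actual code: `hasJpegMarkers` is a hypothetical helper that accepts a frame only if it starts with the JPEG start-of-image marker (FF D8) and an end-of-image marker (FF D9) appears near the end. The 16-byte tail window is an assumption, in case a buffer carries a little padding after FF D9. It only catches truncated frames, not the corruption inside the scan data that this thread is chasing.

```cpp
#include <cstddef>
#include <cstdint>

// Returns true if buf looks like a complete JPEG: SOI (FF D8) at the start and
// EOI (FF D9) somewhere in the last tailWindow bytes.
bool hasJpegMarkers(const uint8_t* buf, size_t len, size_t tailWindow = 16) {
  if (buf == nullptr || len < 4) return false;
  if (buf[0] != 0xFF || buf[1] != 0xD8) return false;        // missing SOI
  size_t start = (len > tailWindow) ? len - tailWindow : 2;  // search the tail only
  for (size_t i = len - 1; i > start; --i) {
    if (buf[i - 1] == 0xFF && buf[i] == 0xD9) return true;   // found EOI
  }
  return false;                                              // no EOI: drop frame
}
```

In an esp32-camera sketch it could be called as `hasJpegMarkers(fb->buf, fb->len)` before writing the frame, returning the buffer with `esp_camera_fb_return(fb)` when the check fails. Searching only the tail keeps the cost small compared with scanning the whole frame.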

I had a theory on bmp vs jpeg, and also on 1.0.4 vs 1.0.5. The bmp was sending much more data, so it would hit the problem (WiFi vs I2S) more often, and 1.0.5 seems to run the camera more efficiently, so you can get 12.5 fps at UXGA where it used to be about 6.5, I think. So again: more data, and more chances for the WiFi vs I2S problem.
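
The "more data, more chances" reasoning can be made concrete with a toy model. This is purely an assumption for illustration, not a measurement: treat WiFi disturbances as random (Poisson) events arriving at an average rate, and assume a frame is corrupted if any event lands while that frame is being transferred. `probFrameHit` is a hypothetical helper.

```cpp
#include <cmath>

// Toy model: probability that at least one random disturbance (average rate
// eventsPerSec) lands during the time it takes to move frameBytes at bytesPerSec.
double probFrameHit(double frameBytes, double bytesPerSec, double eventsPerSec) {
  double transferSec = frameBytes / bytesPerSec;       // how long the frame is "exposed"
  return 1.0 - std::exp(-eventsPerSec * transferSec);  // P(at least one event)
}
```

Under this model, when the per-frame probability is small it grows roughly in proportion to the frame size, so a BMP several times larger than the equivalent jpeg, or a faster frame rate pushing more bytes per second, would show proportionally more broken frames.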
I thought another good experiment would be to get that new single-core ESP32 and see if it can reproduce the problem. Or maybe run everything on core 0. Camera code and user WiFi handlers would usually be on core 1, and I think some WiFi driver work runs on core 0, so either way the multi-processor issues would be excluded. But I cannot assign all WiFi activity to core 0 without switching from the Arduino framework to the IDF framework.

@jameszah
Author

Another strange observation: using 1.0.6 with UXGA, quality 15 and fb_count = 1, almost every jpeg is broken, with WiFi on or off, even though the buffers were set up for quality = 5, so they are enormous (900 KB?) and can handle any frame from the OV2640. So maybe that is an opportunity to track down the problem.

@jameszah
Author

jameszah commented May 3, 2021

Another possibility:

I ran into this post https://www.esp32.com/viewtopic.php?t=15193#p62169 while looking for some other issue.

It says that the solution is to set the WiFi modem power save to WIFI_PS_NONE, instead of the default WIFI_PS_MIN_MODEM.

It seems to work on initial tests. 😄

It also ramps up the speed of the WiFi. My ESP32s always showed 6000 kbps on my phone-company router, but after switching to WIFI_PS_NONE the rate bounces around and gets up to 72222 kbps, which is the speed of many 2.4 GHz devices on my router; just one old 2.4 GHz laptop gets 130000 kbps.

So the hopeful theory is that while the modem is half asleep, each time it wakes up it causes a disturbance that disrupts the I2S interface to the camera.

I assume it burns more power, so maybe it is not for battery-powered systems. There could be heat issues too.

This code, placed after the normal WiFi setup, does it:

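  #include "esp_wifi.h"   // at the top of the sketch: esp_wifi_get_ps(), esp_wifi_set_ps()

  // The esp_wifi_* calls below require the WiFi driver to be initialized first.
  wifi_ps_type_t the_type;

  esp_err_t get_ps = esp_wifi_get_ps(&the_type);     // default is WIFI_PS_MIN_MODEM
  Serial.printf("The power save was: %d\n", the_type);

  Serial.printf("Set power save to %d\n", WIFI_PS_NONE);
  esp_err_t set_ps = esp_wifi_set_ps(WIFI_PS_NONE);  // keep the modem awake
  if (set_ps != ESP_OK) {
    Serial.printf("esp_wifi_set_ps failed: %s\n", esp_err_to_name(set_ps));
  }

  esp_err_t new_ps = esp_wifi_get_ps(&the_type);     // read back to confirm
  Serial.printf("The power save is: %d\n", the_type);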

https://docs.espressif.com/projects/esp-idf/en/latest/esp32/api-reference/network/esp_wifi.html#_CPPv415esp_wifi_set_ps14wifi_ps_type_t

typedef enum {
    WIFI_PS_NONE,        /**< No power save */
    WIFI_PS_MIN_MODEM,   /**< Minimum modem power saving. In this mode, station wakes up to receive beacon every DTIM period */
    WIFI_PS_MAX_MODEM,   /**< Maximum modem power saving. In this mode, interval to receive beacons is determined by the listen_interval 
                              parameter in wifi_sta_config_t. 
                              Attention: Using this option may cause ping failures. Not recommended */
} wifi_ps_type_t;
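
The printf lines above print the power-save mode as a bare number. A small hypothetical helper like `psTypeName` (not part of ESP-IDF) can turn it into a name. It assumes the values follow the declaration order listed above (WIFI_PS_NONE = 0, WIFI_PS_MIN_MODEM = 1, WIFI_PS_MAX_MODEM = 2), which is what an enum without explicit values gives. It takes a plain `int` so it compiles without the ESP-IDF headers.

```cpp
// Maps a wifi_ps_type_t value (passed as int) to its enumerator name.
const char* psTypeName(int psType) {
  switch (psType) {
    case 0: return "WIFI_PS_NONE";       // no power save
    case 1: return "WIFI_PS_MIN_MODEM";  // wake every DTIM period (default)
    case 2: return "WIFI_PS_MAX_MODEM";  // wake per listen_interval
    default: return "unknown";
  }
}
```

For example: `Serial.printf("The power save is: %s\n", psTypeName(the_type));`. In a sketch that already includes esp_wifi.h, switching on the WIFI_PS_* constants directly would be the more robust choice.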

@jameszah jameszah reopened this May 3, 2021
@jameszah jameszah changed the title 1.0.5rc6, config.xclk_freq_hz = 20000000, ov2640 and ov5640 jpeg spi problem ? Broken jpegs - fixed ??? - 1.0.5rc6, config.xclk_freq_hz = 20000000, ov2640 and ov5640 jpeg i2s problem May 3, 2021
@jameszah jameszah closed this as completed May 8, 2021
jameszah added a commit to jameszah/ESP32-CAM-Video-Recorder-junior that referenced this issue May 14, 2021