[BUG] Printcount breaks eeprom on SKR E3 mini v1.2 512k #15982

Closed
chestwood96 opened this issue Nov 23, 2019 · 34 comments

Comments

@chestwood96
Contributor

Description

If I have print count enabled in my configuration the eeprom seems to get corrupted every time the board restarts.

Saving and loading settings and meshes works until the board restarts.

Steps to Reproduce

Marlin.zip

  1. Build Marlin with printcount enabled
  2. Flash the new version of Marlin
  3. Change a setting (I set the I of the PID to 66 as an easily recognizable number)
  4. Store the settings to EEPROM (M500)
  5. Verify eeprom (M504) --> should be fine
  6. Load settings (M501) --> changed setting should still be there
  7. Restart the board (I power cycled it)

Expected behavior: The EEPROM can still be loaded, M504 still reports it as valid, and I=66 in the PID settings is still there

Actual behavior: "EEPROM version mismatch (EEPROM=? Marlin=V72)"

Additional Information

There seem to be other things occasionally corrupting the eeprom but at least without printcount it survives a restart.

@anlupat

anlupat commented Nov 23, 2019

I have an SKR DIP, and I can confirm printcount also breaks my EEPROM.

@Masanetz

With emulated eeprom on sd-card it still works with enabled printcounter on my E3-DIP RET6...

@boelle boelle changed the title [BUG] Printcount breaks eeprom on SKR E3 mini v1.2 512k [BUG] [Bugfix 2.0.x] Printcount breaks eeprom on SKR E3 mini v1.2 512k Nov 24, 2019
@boelle
Contributor

boelle commented Nov 24, 2019

tried

M502
M500

??

@Masanetz

With emulated eeprom on flash, it always leads to "EEPROM version mismatch (EEPROM=? Marlin=V72)", with eeprom on sd-card, it just works...

@boelle
Contributor

boelle commented Nov 24, 2019

and doing

M502
M500

does not matter???

@Masanetz

M502
M500
is the first thing I do after an EEPROM mismatch error, of course.

But running ABL with 100 points after every power-cycle is not really fun ;-)

@boelle
Contributor

boelle commented Nov 24, 2019

is the first thing I do after an EEPROM mismatch error, of course.

and the mismatch error remains?

But running ABL with 100 points after every power-cycle is not really fun ;-)

i assume you use restore G29 after G28 and M420 S1 to enable it after power cycle?

@Masanetz

The mismatch error remains for every power cycle (if using flash).

My regular way is to issue
G28 ; Home
M420 S1 ; Enable bed leveling
as my cura start g-code.

So as a workaround I disabled FLASH_EEPROM_EMULATION in pins_BTT_SKR_E3_DIP.h and put an eeprom.dat on the SD card in the printer...

@chestwood96
Contributor Author

Yes I tried M502 M500 multiple times, I did not mention it in the steps (between 2 and 3) as it makes no difference.

@tatusah
Copy link

tatusah commented Nov 24, 2019

I finally got rid of these flash EEPROM issues on the 512K RCT6 mini E3 v1.2 and got UBL working. I disabled the print counter, lowered the grid size to 9x9 and disabled UBL_SAVE_ACTIVE_ON_M500 (just to have full control of the saving process). With the 10x10 grid I couldn't get M504 to return OK after going through the UBL setup and saving the EEPROM.

Seems like these different EEPROM related issues are so far occurring on BTT E3 boards. I was going to try on my RCT6 based board if the problem with the EEPROM and UBL was also happening when using the official 256K flash to rule out if everything was caused by using the undocumented larger flash. Unfortunately, whatever features I disabled, I couldn't get the firmware small enough for the original size limit. But I did then use st-link to verify if I could use the full 512K. Just like Alex Kenis's video showed, I was also able to verify that my board reported 64K RAM (making it more likely that these came from the same manufacturing line as the RET6 with official 512K/64K), and thus I've customized my build env to include the board_upload.maximum_ram_size=65536 definition.

st-info

But anyway, we are in uncharted territory when trying to use the undocumented 512K flash, and I'm not sure if the problems are caused by that or by something else. Still, if you are experiencing EEPROM issues, it might be worth trying to disable the print counter and, if using UBL meshes, to lower the size of the grid. This worked in my case. A smaller grid uses smaller chunks of the memory reserved for meshes, so I also got 1 additional mesh slot when going from a 10x10 to a 9x9 grid.
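
The slot arithmetic can be sketched roughly as follows. This is only an illustration: it assumes each mesh point is stored as a 4-byte float, and the `free_bytes` figure of 2000 is a made-up number, not a value measured on this board.

```python
# Rough sketch of why a smaller grid yields more mesh slots.
# ASSUMPTIONS (not from the thread): 4 bytes per mesh point,
# and a hypothetical 2000 bytes left over for mesh storage.
BYTES_PER_POINT = 4


def mesh_bytes(grid: int) -> int:
    """Storage needed for one grid x grid mesh."""
    return grid * grid * BYTES_PER_POINT


def mesh_slots(free_bytes: int, grid: int) -> int:
    """Number of whole meshes that fit into the free region."""
    return free_bytes // mesh_bytes(grid)


free_bytes = 2000  # hypothetical leftover space
for grid in (10, 9):
    print(grid, mesh_bytes(grid), mesh_slots(free_bytes, grid))
```

With these assumed numbers the 9x9 grid fits one more slot than the 10x10 grid; the real count depends on how much space the configuration actually leaves free.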

@chestwood96
Contributor Author

Without printcount 10x10 grids work for me.

@chestwood96
Contributor Author

Saving a second 10x10 grid seems to wreck the EEPROM.

@tatusah

tatusah commented Nov 24, 2019

So is the conclusion that the unofficial upper region of memory has some bad locations in the areas where either print counter statistics or mesh slots are stored and this wrecks the eeprom? Or is there actually something wrong with the code? I'm starting to believe the former. Especially if these mini&dip boards with 512K RCT6 chips were manufactured as RET6 but didn't pass the quality control for full memory. Maybe also the boards that came with RET6 chip but had the bootloader locked to 256K were also second quality. Just guessing, only BTT knows.

Or what does @boelle say, have there been any problems with either the print counter or bigger mesh sizes on boards that use only certified memory? Is there a possibility that the print counter or bigger mesh slots overflow into the area of other settings in the emulated flash EEPROM?

@boelle
Contributor

boelle commented Nov 24, 2019

i dont say anything as i'm not the maintainer of marlin, nor do i write code for marlin

but if i were to say something it would be not to use china junk boards

@chestwood96
Contributor Author

I am not completely discounting the possibility of flash just being bad but the printcounter was a bit too reliable at wrecking the eeprom at boot so I am pretty sure it has something to do with software. My guess is that different parts of the software are messing with each others memory.

I do have an RGT6 (1M/96kb) lying around that I plan to put on the board the next time I take it out of the machine (because I can't leave anything I own stock XD) so that would fix it if it was just bad flash.

Junk board is a bit harsh, the price/performance on those things is pretty great.

@boelle
Contributor

boelle commented Nov 24, 2019

Junk board is a bit harsh

maybe

price/performance

what about quality?

@chestwood96
Contributor Author

What about the quality? Except for this issue (which likely is not a problem with the board) I have had no problems with it. The solder joints look nice and the schematics are available, so I really cannot complain on the quality front. It is chinesium but pretty good chinesium.

The only thing that kind of annoys me about the v1.2 is that they wasted 3 pins (and software serial) on TMC2208 compatibility on a board with soldered TMC2209s. But I guess they'll make an E3 mini light with 2208s at some point, so it is probably better to have as few differences as possible.

@boelle
Contributor

boelle commented Nov 24, 2019

it just seems that these boards have doubts about what eeprom size they actually have

and there are so many issues with SKR that it can't all be marlin error or user error

@chestwood96
Contributor Author

Well that is a bit unfair to the board though, the 512k thing did not come from BTT but was discovered by the community.

You can get the whole stock functionality and more within 256k, people just found out you can go higher. A more or less clean option to use 512k was just merged a couple of days ago.

It also does not help that most tutorials for building marlin for it are outdated now but that technically counts as user error. I am pretty sure it will get better at some point, this board is rather new.

@boelle boelle changed the title [BUG] [Bugfix 2.0.x] Printcount breaks eeprom on SKR E3 mini v1.2 512k [BUG] Printcount breaks eeprom on SKR E3 mini v1.2 512k Nov 26, 2019
@brew99

brew99 commented Dec 4, 2019

Any more updates on this issue? I have the same problem on a V1.0 E3 mini (RCT6) using Bugfix-2.0.x from Dec 3rd. Disabling PRINTCOUNTER fixes the issue. There is also a discussion of this issue on the BTT GitHub.

@kafie1980

kafie1980 commented Dec 4, 2019

EEPROM was broken for me too in the Marlin 2.0 official release, and I also tested it with the latest compilation from last night using the SKR mini E3 v1.2 512k USB configuration. After a few power cycles, the EEPROM settings for PID Autotuning would be lost and revert back to firmware defaults. Disabling printcount in Configuration.h fixes the issue for me too.

@tatusah

tatusah commented Dec 4, 2019

Has anyone had time to check whether the print counter corrupts the EEPROM also when using the 256K firmware on SKR E3 mini v1.2 or if the same problem can be reproduced on other boards?

@sjasonsmith
Contributor

I have only briefly looked at this code, to see whether the flash is being handled in a safe way to prevent corruption. If I am interpreting things correctly, the flash will be corrupted every time print statistics are saved, which by default is every hour.

A few issues I see (although I may be missing something):

  1. The default store time of 1 hour is going to exhaust the 10 kcycle endurance of the flash in approximately 416 days.
  2. Writing statistics erases the entire EEPROM-emulation flash-sectors without restoring the prior configuration contents.
  3. There is no protection against corrupting the contents if a power-off occurs during a write. This is true even without the print counter, and would still be an issue even if #2 is fixed.

Strategies exist which could minimize the impact of all of these issues. It looks like the LPC1768 HAL implements several mechanisms to help with this, but the STM32F1 HAL does not.
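
The endurance estimate in point 1 is simple arithmetic and can be checked with a short sketch. The 10,000 cycle and 1 hour figures come from the list above; the `duty` parameter is a hypothetical extra, added only to show how the lifetime stretches if saves happen for just part of each day.

```python
# Rough flash-endurance estimate using the figures quoted in the list above.
ENDURANCE_CYCLES = 10_000   # erase/write cycles quoted above
SAVE_INTERVAL_HOURS = 1     # default statistics save interval quoted above


def days_until_worn(cycles: int, interval_hours: float, duty: float = 1.0) -> float:
    """Days until `cycles` erase cycles are used up.

    `duty` is a hypothetical fraction of each day during which saves
    actually happen (1.0 means one save per interval, around the clock).
    """
    hours = cycles * interval_hours / duty
    return hours / 24


print(round(days_until_worn(ENDURANCE_CYCLES, SAVE_INTERVAL_HOURS), 1))
print(round(days_until_worn(ENDURANCE_CYCLES, SAVE_INTERVAL_HOURS, duty=0.25), 1))
```

This treats every save as one full erase cycle of the same sectors, which matches the behaviour described in point 2 but ignores any wear leveling.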

@brew99

brew99 commented Dec 4, 2019

@tatusah, I've compiled with 256K and PRINTCOUNTER enabled, and the same result happens. There is definitely something going on with PRINTCOUNTER.

@sjasonsmith
Contributor

If my assessment of the code is correct, PRINTCOUNTER should probably be blocked in the STM32F1 sanity checks until improvements are made.

@brew99

brew99 commented Dec 4, 2019

Maybe also comment out PRINTCOUNTER in the example configs under BigtreeTech, so that general users don't run into this continually. IMO, not really needed in an example config

@thisiskeithb
Member

Sorry about that everyone. On the STM32 front, I only have an SKR Mini 1.1 to bench test with, but I have some E3 (and other BigTreeTech) boards arriving soon that I can install and run real prints with for testing/debugging.

@sjasonsmith
Contributor

Another concern we had is whether the PRINTCOUNTER feature cooperates nicely with probing meshes which are saved at the end of the EEPROM. We have some suspicion that they might be overlapping, but haven't actually verified yet. If that is true then it would likely cause problems even when using an SD Card rather than flash.

@randellhodges
Contributor

They shouldn't overlap. PRINTCOUNTER is written before configuration, then mesh data.
PRINTCOUNTER sits at an offset of 0x32 or 0x40 (50 or 64) bytes and is only 16 bytes long; the configuration starts at an offset of 100 bytes; then, after some magic math, come the meshes.
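
As a quick sanity check of those numbers, here is a small sketch that verifies the print counter block ends before the configuration starts. Only the offsets and the 16-byte size come from the comment above; the helper name `fits_before` is made up for illustration, and the mesh region is left out because its start depends on math that is not spelled out here.

```python
# Offsets and size quoted in the comment above.
PRINTCOUNTER_OFFSETS = (0x32, 0x40)  # 50 or 64 bytes
PRINTCOUNTER_SIZE = 16
CONFIG_OFFSET = 100


def fits_before(offset: int, size: int, next_offset: int) -> bool:
    """True if the block [offset, offset + size) ends at or before next_offset."""
    return offset + size <= next_offset


for offset in PRINTCOUNTER_OFFSETS:
    end = offset + PRINTCOUNTER_SIZE
    print(hex(offset), end, fits_before(offset, PRINTCOUNTER_SIZE, CONFIG_OFFSET))
```

In both cases the print counter ends well short of byte 100, so the layout itself does not explain the corruption.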

@randellhodges
Contributor

randellhodges commented Dec 5, 2019

What @sjasonsmith and I found was that, whenever a save operation happened, such as printcount or configuration or a ubl mesh save with G29 SX, the existing code would erase both pages of flash and then only write the data for that operation.

Meaning, a printcount write would eliminate your configuration and ubl mesh. A G29 SX would eliminate your printcount and configuration, etc, etc.

With his help, we have a PR that mimics some elements of how the LPC1768 handles it (minus the wear leveling/slots that platform implements) by buffering the 4K, letting the write update the buffer, then flushing the entire buffer.
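
The difference between the two behaviours can be illustrated with a toy simulation. This is not the Marlin code or the actual PR: the class and function names are invented, and the 2048-byte page size is an assumption chosen so that two pages add up to the 4K mentioned above.

```python
# Toy model of flash-backed EEPROM emulation (illustrative names only).
PAGE_SIZE = 2048            # ASSUMED page size; two pages give the 4K buffer
FLASH_SIZE = 2 * PAGE_SIZE
ERASED = 0xFF               # erased flash reads back as all ones


class FakeFlash:
    """Two flash pages modelled as one bytearray."""

    def __init__(self):
        self.cells = bytearray([ERASED] * FLASH_SIZE)

    def erase_all(self):
        self.cells = bytearray([ERASED] * FLASH_SIZE)

    def program(self, offset, data):
        assert offset + len(data) <= FLASH_SIZE, "write past end of flash"
        self.cells[offset:offset + len(data)] = data


def write_erase_first(flash, offset, data):
    """Behaviour described above: erase both pages, write only the new data."""
    flash.erase_all()
    flash.program(offset, data)


def write_buffered(flash, offset, data):
    """Read-modify-write: copy flash to RAM, patch the copy, flush it all."""
    buffer = bytearray(flash.cells)
    buffer[offset:offset + len(data)] = data
    flash.erase_all()
    flash.program(0, buffer)


flash = FakeFlash()
write_buffered(flash, 100, b"CONFIG")
write_buffered(flash, 64, b"STATS")      # config survives this write
print(bytes(flash.cells[100:106]))
write_erase_first(flash, 64, b"STATS")   # config is wiped by this write
print(bytes(flash.cells[100:106]))
```

The buffered version still erases on every save, so it does nothing for flash wear or for power loss in the middle of a write; it only stops one save from wiping the data of another.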

There is room for improvement (as always) but at least it should eliminate the immediate issues.

As @sjasonsmith said, flash only has a fixed number of writes. The printcount feature does flush to storage once an hour, but only while a print is active. This does cut down on the number of writes, but, I would agree that feature shouldn't be enabled by default for this platform/flash combination.

No one pays much attention to warnings during a compile, but I'd think a warning for any platform that has it enabled and is backing configuration into a limited write medium might be a good idea for the future.

@tatusah

tatusah commented Dec 5, 2019

I tested out your pull request @randellhodges with SKR e3 mini v1.2 512K USB (RCT6 chip) build. Now saving multiple mesh slots works without problems. Awesome job!

@boelle
Contributor

boelle commented Dec 7, 2019

will close this one since the PR from @randellhodges has been merged

@TerawattX

Not sure this is 100% resolved.
I performed a pull this evening that should include the fixes, then built for the SKR Mini E3 v1.2 with 512K, print count off, and writing to flash. I can create a mesh and save it with G29 S0, load it with L0, and it's fine. However, if I do an M500 and M501 the mesh is gone.

@github-actions

github-actions bot commented Jul 3, 2020

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions github-actions bot locked and limited conversation to collaborators Jul 3, 2020