Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Fire hazard from SD Card fault #22585

Closed
tdcox opened this issue Aug 17, 2021 · 18 comments
Closed

[BUG] Fire hazard from SD Card fault #22585

tdcox opened this issue Aug 17, 2021 · 18 comments

Comments

@tdcox
Copy link

tdcox commented Aug 17, 2021

Did you test the latest bugfix-2.0.x code?

No, but I will test it now!

Bug Description

JGMaker Artist-D Pro with firmware release v2.3.3

A corrupted SD card caused a print to fail part way through. The failure manifested as a halt to printing at a specific and repeatable layer. The nozzle was driven down into the part and remained heating. More worryingly, the bed continued to heat up, without any thermal control and when I discovered the problem, the metal chassis of the machine was hot. This could easily have led to a fire had the temperature gone much higher.

When I powered off and on the machine, it only then detected a problem and set off an alarm, refusing to boot until the temperature was back within the safe range.

The UI was still responsive during the fault, so this is a failure to monitor safety parameters during an unexpected print buffer under-flow rather than a system hang.

Bug Timeline

New machine. Fault occurred on prints #3 and #4 with faulty SD card.

Expected behavior

I expected safety protocols to be functional at all times.

Actual behavior

Over temperature checks were not being carried out and no alarm was raised.

Steps to Reproduce

The fault occurs when the printer is reading a file that is corrupted on SD card. Upon subsequent investigation, Windows is unable to read the file at all, reporting it as corrupted, so this looks like a combination of two bugs, firstly a failure to validate the print file before starting a print and then secondly a failure of the temperature monitoring routine in the face of an unexpected empty buffer.

The print failure mode is repeatable using the same card and file, but the thermal state is sensitive to initial conditions, so it may fail with the heaters on or off.

Version of Marlin Firmware

V2.3.3

Printer model

JGMaker Artist-D Pro

Electronics

Stock

Add-ons

None

Bed Leveling

No response

Your Slicer

IdeaMaker

Host Software

SD Card (headless)

Additional information & file uploads

No response

@ellensp
Copy link
Contributor

ellensp commented Aug 17, 2021

That is not a valid version of Marlin. reall official Marlin is only u to 2.0.9.1
It is probably JGMaker specific version, and as such we cant help

@ellensp
Copy link
Contributor

ellensp commented Aug 17, 2021

Here it is https://github.com/JGMaker3dofficial/ArtistD-Pro
This is based on really old marlin 2.0.5
You need to submit this report on JGMaker's github. As we cannot help with 3rd party forks of Marlin.

@ellensp ellensp closed this as completed Aug 17, 2021
@tdcox
Copy link
Author

tdcox commented Aug 17, 2021

With respect, you still have an active issue with an undiagnosed thermal runaway problem in the current release and the buffer underrun condition may provide pertinent information to help pinpoint that problem.

In Open Source communities, it is important not to discourage people from reporting safety-related issues, even on older releases, since these are likely to be versions that are prevalent in the community and which represent something that we need to react to and mitigate as responsible software maintainers.

A year old, is not ‘really old’ from the perspective of a piece of firmware in commercial equipment. I regularly deal with safety-related systems that are expected to have a lifespan of a decade or more so washing your hands of something that was only written a year ago reflects badly upon the maturity of the Open Source community.

@cp2004
Copy link

cp2004 commented Aug 17, 2021

With respect as well, you are asking the open source community to fix an issue with firmware that came with your printer which is not in this repository. You've been told where to send your bug report, this community is not responsible for supporting commercial printers' firmware and only maintains the code that is contained within this repository.

Feel free to use this repository to rebuild your own version of Marlin, and come back with a new report based on the current code and not a 3rd party fork.

@tdcox
Copy link
Author

tdcox commented Aug 17, 2021

No, what I am saying is that as Open Source maintainers, the onus is on us to act responsibly when it comes to reports of safety-related issues and not dismiss them as ‘not our problem’ for spurious reasons. At least, there should be a desire to perform a diff on the JGMaker branch against the point it forked to see if they have modified anything relating to card reading or temperature safety monitoring. If not, you would obviously want to inspect the commit history of your current version to see if it was still relying upon the same code and therefore at risk of similar failures.

If a safety incident happens, it doesn’t matter if it occurs in the main project, or in a fork, because in the public perception, it is your brand that is impacted negatively, so we should always care about potential issues in this space. More importantly, however, we represent Open Source as a whole and if we are seen to be passing the buck on safety issues, that reflects badly upon all Open Source projects from the perspective of regulators.

Some possible better responses might be to reach out to the fork maintainer to offer to help them identify a safety risk, or to ask the submitter of the issue if they are able to build a rebased version to test to see if the issue is still live in main. Personally, I would also want to obtain a copy of the corrupted file, so that it could be included in automated testing for future releases to mitigate this problem ever occurring in my project subsequently.

It is probably not a good idea to contact fork maintainers and tell them not to report bugs upstream because that is bad for all of us. We need to know about problems before we can fix them, so we should always encourage feedback. Somebody may be exercising something in a fork that you haven’t got test coverage for in your project.

@thinkyhead
Copy link
Member

Please test the bugfix-2.0.x branch to see where it stands. If the problem has been resolved then we can close this issue. If the issue isn't resolved yet, then we should investigate further.

@thinkyhead thinkyhead reopened this Aug 17, 2021
@WillisJM
Copy link

WillisJM commented Aug 17, 2021 via email

@Tannoo
Copy link
Contributor

Tannoo commented Aug 19, 2021

What he is asking is to download the latest bugfix branch code, extract it, configure it, burn it, and run it.
Test THAT code against the crash scenario that was played out before.

I am running bugfix-2.0.x (07-10-2021).

I cannot reproduce this issue. I also do not have a corrupt SD card. Thermal runaway has been working just fine for me.

This was also stress tested during my SKR 1.4 upgrade and all of the bad over-crimps (tighter is not always better).

@Tannoo
Copy link
Contributor

Tannoo commented Aug 19, 2021

The Marlin maintainers cannot control and also are not responsible for what is done with any modification of downloaded code, commercial or otherwise.

And being that the commercial entity has given it's own version numbers, they have modified the code outside of this Marlin repo. So, it is that entity's responsibility to handle the issue of that variation of Marlin. The official Marlin repo maintainers cannot possibly know everything that has been done to that code. This was stated before and not be rude, but to be honest.

If a safety incident happens, it doesn’t matter if it occurs in the main project, or in a fork, because in the public perception, it is your brand that is impacted negatively, so we should always care about potential issues in this space.

So, if I modify the latest Chevy Silverado to run a drag race in the low 9's and my engine runs away potentially blowing up, is Chevrolet supposed to correct the runaway? I think not. Same story.

@Tannoo
Copy link
Contributor

Tannoo commented Aug 19, 2021

When looking over the release from JGMaker, it can easily be upgraded to the latest Marlin.
Backup your firmware copy.

Replace all of the Marlin files from the Marlin folder from the bugfix branch.

Then update the config files with the correct values and options.

Upload to the printer and it should be good to go.

The bugfix branch has all the latest fixes to all kinds of things. Sometimes, an unknown issue get's corrected from something seemingly unrelated.

@Tannoo
Copy link
Contributor

Tannoo commented Aug 19, 2021

Once you do this upgrade, JGMaker may not help with software issues.

@thisiskeithb thisiskeithb added Bug: Potential ? C: Safety Needs: More Data We need more data in order to proceed labels Aug 19, 2021
@thinkyhead
Copy link
Member

I cannot reproduce this issue. I also do not have a corrupt SD card.

Yanking the card in the middle of a print is now handled, so that should be a safe scenario at this time. Maybe I can add something under MARLIN_DEV_MODE that allows simulating a corrupted SD card, just to see what transpires next.

@Tannoo
Copy link
Contributor

Tannoo commented Aug 20, 2021

Yanking the card in the middle of a print is now handled, so that should be a safe scenario at this time. Maybe I can add something under MARLIN_DEV_MODE that allows simulating a corrupted SD card, just to see what transpires next.

Randomly inject a string of garbage ASCII characters? lol

@Tannoo
Copy link
Contributor

Tannoo commented Aug 22, 2021

Yanking the SD card wouldn't quite be the same as a corrupt one as the SD detect is "un-triggered". That would activate a routine rather quickly, right?

Injecting garbage in the print buffer or completly stop the data flow to the buffer would be more like it.

@Sebazzz
Copy link
Contributor

Sebazzz commented Sep 3, 2021

You should have the watchdog (USE_WATCHDOG) enabled - I have seen similar symptoms on Creality v4.5.x boards, where reading the SD card through SDIO can cause hangs. With the watchdog this results in a reboot obviously, without watchdog it can indeed result in unlimited heating.

@thinkyhead thinkyhead changed the title [BUG] SAFETY ISSUE - thermal runaway on bed heater [BUG] Fire hazard from SD Card fault Oct 17, 2021
@thinkyhead
Copy link
Member

Good point. I've added a build warning about the watchdog being disabled, along with warnings about the thermal runaway protection features being disabled.

@github-actions
Copy link

github-actions bot commented Apr 1, 2022

This issue has had no activity in the last 60 days. Please add a reply if you want to keep this issue active, otherwise it will be automatically closed within 10 days.

@github-actions
Copy link

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions github-actions bot locked and limited conversation to collaborators Jun 11, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

8 participants