Description
opened on Sep 26, 2022
LightGBM should handle CRLF (\r\n) line endings, or at the very least fail gracefully with a clear error message.
Summary
I have examples of lightgbm model files (.lgb files) which crash Python when I try to load them in lightgbm. If you change the line endings from CRLF (\r\n) to LF (\n), the same files load without a problem.
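As a stopgap, the line-ending conversion described above can be scripted. This is a minimal sketch, not part of lightgbm: the helper name `normalize_line_endings` is hypothetical, and it assumes the model file is a plain text file small enough to read into memory.

```python
from pathlib import Path


def normalize_line_endings(src, dst=None):
    """Rewrite CRLF (\\r\\n) as LF (\\n) in a text model file.

    Hypothetical helper, not a lightgbm API. If dst is None the file
    is rewritten in place. Returns the path that was written.
    """
    src = Path(src)
    dst = Path(dst) if dst is not None else src
    # Work on raw bytes so Python's own newline translation cannot interfere.
    data = src.read_bytes()
    dst.write_bytes(data.replace(b"\r\n", b"\n"))
    return dst
```

The returned path can then be passed on, for example as `lightgbm.Booster(model_file=str(path))`.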
According to a comment in #3589 this is expected behavior since lightgbm only supports LF (\n) line endings. That is why I am putting it here as a feature request rather than a bug.
Ideally lightgbm should support CRLF line endings. But even if failing on them is the expected behavior, the way it currently fails is far from ideal:
- Error message: The error message is
[LightGBM] [Fatal] Model format error, expect a tree here. met 200 1298 1149 12880 ...
which does not state what the actual problem is.
- Crashing: It does not handle the error. And since the error occurs in low-level code (not in Python), it completely crashes Python with the message
*** buffer overflow detected ***: python terminated
- Documentation: If it is expected to fail for CRLF line endings then it should be documented somewhere. I have not found that documentation.
- Consistency: I have many examples of lightgbm model files with CRLF line endings which can be loaded just fine. Only certain ones crash python.
I think lightgbm should either add support for CRLF line endings or at least gracefully handle failures caused by line endings -- returning a useful error message and not crashing python.
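Until something like this exists in lightgbm itself, a caller can approximate the graceful failure with a pre-check on the Python side. The sketch below is an assumption-laden workaround, not lightgbm behavior: `check_line_endings` is a hypothetical name, and because some CRLF files reportedly load fine, the check is deliberately conservative and rejects any file containing CRLF.

```python
from pathlib import Path


def check_line_endings(path):
    """Raise ValueError with a descriptive message if the file contains CRLF.

    Hypothetical pre-check, not a lightgbm API. Returns the path unchanged
    when no CRLF sequence is found, so it can wrap a load call.
    """
    data = Path(path).read_bytes()
    count = data.count(b"\r\n")
    if count:
        raise ValueError(
            f"{path}: found {count} CRLF line ending(s); "
            "convert the file to LF (\\n) before loading it"
        )
    return path
```

Calling this before handing the file to lightgbm turns the hard crash into an ordinary Python exception that names the likely cause.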
Motivation
Many people on Windows use lightgbm. Git can also convert line endings on checkout (for example via the core.autocrlf setting), so even if the model is checked in with \n line endings, it may still fail on Windows machines.