Clarity around the MLFlow logger checkpoint issue #21281
Unanswered
Northo
asked this question in
Lightning Trainer API: Trainer, LightningModule, LightningDataModule
Replies: 1 comment 1 reply
-
I'd be happy to contribute to or review any fix for this. I am aware that my previous fix (unintentionally) introduced bugs of its own, which I believe is why @Borda reverted it, so we do need to be very careful with any new fix. I believe that the core of that issue was that MLflow does not (yet) support |
-
There is a long-standing bug (#20664) that was fixed in #20669, but the fix was subsequently reverted. The bug causes saving checkpoints to fail when using the MLFlow logger.
Background
There is a relatively long list of issues and PRs related to this, and there seems to be some confusion on the matter:
invalid artifact path
mlflow/mlflow#15111
The current workaround is to pin lightning==2.5.0.
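For anyone relying on the pin in the meantime, here is a minimal sketch of a startup guard that checks the installed version against it. This is my own illustration, not something from the Lightning docs: the helper names are hypothetical, and it only assumes the distribution is installed under the name lightning, as in the pin above.

```python
from importlib import metadata
from typing import Optional

# Version taken from the workaround in this thread.
PINNED_VERSION = "2.5.0"


def installed_lightning_version() -> Optional[str]:
    """Return the installed version of the 'lightning' distribution, or None if absent."""
    try:
        return metadata.version("lightning")
    except metadata.PackageNotFoundError:
        return None


def matches_pin(installed: Optional[str], pinned: str = PINNED_VERSION) -> bool:
    """True only when a version is installed and equals the pinned one exactly."""
    return installed is not None and installed == pinned


if __name__ == "__main__":
    version = installed_lightning_version()
    if not matches_pin(version):
        print(f"lightning version is {version!r}; this thread suggests pinning {PINNED_VERSION}")
```

Calling matches_pin(installed_lightning_version()) early in a training script would flag an accidental upgrade before any checkpoint is written.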
Question
After some digging, I was unable to find out why the issue is still not fixed, though it seems the proposed fix introduced another bug.
This blocks those using MLFlow from upgrading lightning, and the relatively long list of related issues and PRs makes it confusing for anyone stumbling upon this issue. It would be great to get some clarity on when this will be fixed and what is currently holding it up.
Really love lightning ❤️ and want to keep using it for our research! Thanks for the help!
P.S. Tagging those who seem most involved:
@niander, @yxtay, @Borda, @benglewis