Description
Bug description
Error when training with MLFlowLogger and log_model="all", running on Windows:
mlflow.exceptions.MlflowException: Invalid artifact path: 'epoch=0-step=43654'. Names may be treated as files in certain cases, and must not resolve to other names when treated as such. This name would resolve to 'epoch=0-step=43654'.
I was able to find the cause of this error: MLFlowLogger calls MlflowClient.log_artifact(...) internally, inside MLFlowLogger._scan_and_log_checkpoints(...), and passes the artifact path as a pathlib.Path object. However, MLflow expects artifact paths in POSIX format, whereas pathlib.Path follows the conventions of the current platform (on Windows, backslash separators).
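To illustrate the format difference, the snippet below builds the same artifact path with PureWindowsPath and PurePosixPath. These are pure path classes, so the Windows formatting can be reproduced on any OS. The "checkpoints" prefix is only an illustrative assumption; in the error above the artifact path is just the checkpoint name.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Illustrative artifact path: a hypothetical prefix plus the checkpoint name.
win_path = PureWindowsPath("checkpoints") / "epoch=0-step=43654"
posix_path = PurePosixPath("checkpoints") / "epoch=0-step=43654"

print(str(win_path))        # checkpoints\epoch=0-step=43654
print(str(posix_path))      # checkpoints/epoch=0-step=43654
print(win_path.as_posix())  # checkpoints/epoch=0-step=43654
```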
The error itself is raised by an internal check in MLflow, which verifies that the path is already "normalized", that is, that it doesn't contain "." or ".." segments. It does this by calling posixpath.normpath(...) and checking whether the result is the same as the original path. Because posixpath.normpath(...) returns a string while the original path is a pathlib.Path object, the two do not match. I am assuming that on a platform other than Windows the comparison would match, because the string representation of pathlib.Path would be the same.
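The comparison described above can be reproduced with the standard library alone. The sketch below is a simplified, hypothetical re-implementation of the check as described in this issue (passes_normalization_check is a made-up name, not MLflow's actual code). It also shows that converting the path to a POSIX string with .as_posix() before the comparison makes it pass; that is one possible workaround, not necessarily the fix adopted upstream.

```python
import posixpath
from pathlib import PureWindowsPath


def passes_normalization_check(name) -> bool:
    """Hypothetical simplification of the check described above:
    the name is accepted only if normalizing it leaves it unchanged."""
    # normpath() accepts os.PathLike objects but always returns a str,
    # so a Path object can never compare equal to the result.
    return posixpath.normpath(name) == name


stem = "epoch=0-step=43654"
print(passes_normalization_check(stem))                             # True
print(passes_normalization_check(PureWindowsPath(stem)))            # False
print(passes_normalization_check(PureWindowsPath(stem).as_posix())) # True
```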
What version are you seeing the problem on?
v2.5
How to reproduce the bug
Error messages and logs
Environment
Current environment
#- PyTorch Lightning Version (e.g., 2.5.0): 2.5.1
#- PyTorch Version (e.g., 2.5): 2.6.0
#- Python version (e.g., 3.12): 3.12.9
#- OS (e.g., Linux): Windows
#- CUDA/cuDNN version: 12.6
#- GPU models and configuration:
#- How you installed Lightning(`conda`, `pip`, source): pip
More info
No response