Description
Bug description
When using MLFlowLogger with log_model=True, an error occurs during training when attempting to log checkpoints:
mlflow.exceptions.MlflowException: Invalid artifact path: 'model/checkpoints/epoch=0-step=151'. Names may be treated as files in certain cases, and must not resolve to other names when treated as such. This name would resolve to 'model/checkpoints/epoch=0-step=151'
Environment
- Python 3.9
- PyTorch Lightning 2.5.2
- MLflow 3.1.0
Error Analysis
Error Origin:
The error is raised at site-packages/mlflow/store/artifact/artifact_repo.py, line 462, in verify_artifact_path:
def verify_artifact_path(artifact_path):
    if artifact_path and path_not_unique(artifact_path):
        raise MlflowException(
            f"Invalid artifact path: '{artifact_path}'. {bad_path_message(artifact_path)}"
        )
Validation Failure:
The path_not_unique function (from site-packages/mlflow/utils/validation.py, line 164) fails validation:
def path_not_unique(name):
    norm = posixpath.normpath(name)
    return norm != name or norm == "." or norm.startswith("..") or norm.startswith("/")
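Specifically, norm != name evaluates to True because posixpath.normpath returns a str while name is a Path object. A str never compares equal to a Path, even when both spell the same path, so the check fails. This also explains why the error message shows the same path as both the given name and the name it "would resolve to".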
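The comparison can be checked in isolation with only the standard library. The sketch below copies the path_not_unique logic quoted above and calls it with the same artifact path as a string and as a path object. It uses PurePosixPath rather than Path so the result does not depend on the operating system; that substitution is an assumption made for the illustration, not what Lightning does.

```python
import posixpath
from pathlib import PurePosixPath


def path_not_unique(name):
    # Same logic as the MLflow function quoted in this report.
    norm = posixpath.normpath(name)
    return norm != name or norm == "." or norm.startswith("..") or norm.startswith("/")


as_str = "model/checkpoints/epoch=0-step=151"
as_path = PurePosixPath(as_str)  # stand-in for the Path object Lightning passes

# normpath accepts path-like objects but always returns a plain str.
print(type(posixpath.normpath(as_path)).__name__)

# The string passes validation; the path object does not,
# because a str and a path object never compare equal.
print(path_not_unique(as_str))
print(path_not_unique(as_path))
```

The first call to path_not_unique returns False and the second returns True, although both arguments describe the same location.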
Root Cause:
In Lightning 2.5.2 (site-packages/lightning/pytorch/loggers/mlflow.py, line 366), artifact_path is constructed as a Path object:
artifact_path = Path(self._checkpoint_path_prefix) / Path(p).stem # Returns Path object
This Path object is passed to MLflow's log_artifact(), ultimately triggering the validation error.
Historical Context:
In Lightning 2.2.4, the same location used a string (no error):
artifact_path = f"model/checkpoints/{Path(p).stem}" # Returns string
In MLflow 2.12.2, the path_not_unique logic was identical, confirming the issue stems from Lightning’s Path usage.
Proposed Fix
Modify the Lightning code to explicitly convert artifact_path to a POSIX string:
artifact_path = (Path(self._checkpoint_path_prefix) / Path(p).stem).as_posix() # Convert to string
After applying this change, the error no longer occurs.
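As a rough illustration of why the conversion helps, the sketch below wraps the patched expression in a small helper and confirms that the result is a plain string that survives posixpath.normpath unchanged. The helper name build_artifact_path and the sample checkpoint file name are hypothetical and exist only for this example; in Lightning the expression sits inline in _scan_and_log_checkpoints.

```python
import posixpath
from pathlib import Path


def build_artifact_path(checkpoint_path_prefix, checkpoint_file):
    """Hypothetical helper mirroring the patched line in the logger."""
    # .as_posix() returns a str with forward slashes instead of a Path object.
    return (Path(checkpoint_path_prefix) / Path(checkpoint_file).stem).as_posix()


# Sample inputs are assumptions chosen to match the path in the error message.
artifact_path = build_artifact_path("model/checkpoints", "epoch=0-step=151.ckpt")

print(artifact_path)
print(isinstance(artifact_path, str))
# The condition MLflow checks: normalizing the name must not change it.
print(posixpath.normpath(artifact_path) == artifact_path)
```

Using as_posix() rather than str() keeps the separators as forward slashes on Windows as well, which matches the POSIX-style normalization MLflow applies.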
Recommendation
Update the mlflow.py logger in PyTorch Lightning to ensure artifact_path is passed as a string (not Path). This aligns with MLflow’s API expectations and resolves the path normalization issue.
What version are you seeing the problem on?
v2.5
Reproduced in studio
No response
How to reproduce the bug
Just use MLFlowLogger with log_model=True as the logger to perform training.
Error messages and logs
Traceback (most recent call last):
File "/home/joshua/miniforge3/envs/PyTorch250/lib/python3.9/site-packages/lightning/pytorch/trainer/call.py", line 48, in _call_and_handle_interrupt
return trainer_fn(*args, **kwargs)
File "/home/joshua/miniforge3/envs/PyTorch250/lib/python3.9/site-packages/lightning/pytorch/trainer/trainer.py", line 599, in _fit_impl
self._run(model, ckpt_path=ckpt_path)
File "/home/joshua/miniforge3/envs/PyTorch250/lib/python3.9/site-packages/lightning/pytorch/trainer/trainer.py", line 1025, in _run
call._call_teardown_hook(self)
File "/home/joshua/miniforge3/envs/PyTorch250/lib/python3.9/site-packages/lightning/pytorch/trainer/call.py", line 148, in _call_teardown_hook
logger.finalize("success")
File "/home/joshua/miniforge3/envs/PyTorch250/lib/python3.9/site-packages/lightning_utilities/core/rank_zero.py", line 41, in wrapped_fn
return fn(*args, **kwargs)
File "/home/joshua/miniforge3/envs/PyTorch250/lib/python3.9/site-packages/lightning/pytorch/loggers/mlflow.py", line 289, in finalize
self._scan_and_log_checkpoints(self._checkpoint_callback)
File "/home/joshua/miniforge3/envs/PyTorch250/lib/python3.9/site-packages/lightning/pytorch/loggers/mlflow.py", line 369, in _scan_and_log_checkpoints
self.experiment.log_artifact(self._run_id, p, artifact_path)
File "/home/joshua/miniforge3/envs/PyTorch250/lib/python3.9/site-packages/mlflow/tracking/client.py", line 2433, in log_artifact
self._tracking_client.log_artifact(run_id, local_path, artifact_path)
File "/home/joshua/miniforge3/envs/PyTorch250/lib/python3.9/site-packages/mlflow/tracking/_tracking_service/client.py", line 639, in log_artifact
artifact_repo.log_artifact(local_path, artifact_path)
File "/home/joshua/miniforge3/envs/PyTorch250/lib/python3.9/site-packages/mlflow/store/artifact/local_artifact_repo.py", line 33, in log_artifact
verify_artifact_path(artifact_path)
File "/home/joshua/miniforge3/envs/PyTorch250/lib/python3.9/site-packages/mlflow/store/artifact/artifact_repo.py", line 464, in verify_artifact_path
raise MlflowException(
mlflow.exceptions.MlflowException: Invalid artifact path: 'model/checkpoints/epoch=0-step=151'. Names may be treated as files in certain cases, and must not resolve to other names when treated as such. This name would resolve to 'model/checkpoints/epoch=0-step=151'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/joshua/miniforge3/envs/PyTorch250/lib/python3.9/site-packages/lightning/pytorch/trainer/call.py", line 68, in _call_and_handle_interrupt
_interrupt(trainer, exception)
File "/home/joshua/miniforge3/envs/PyTorch250/lib/python3.9/site-packages/lightning/pytorch/trainer/call.py", line 82, in _interrupt
logger.finalize("failed")
File "/home/joshua/miniforge3/envs/PyTorch250/lib/python3.9/site-packages/lightning_utilities/core/rank_zero.py", line 41, in wrapped_fn
return fn(*args, **kwargs)
File "/home/joshua/miniforge3/envs/PyTorch250/lib/python3.9/site-packages/lightning/pytorch/loggers/mlflow.py", line 289, in finalize
self._scan_and_log_checkpoints(self._checkpoint_callback)
File "/home/joshua/miniforge3/envs/PyTorch250/lib/python3.9/site-packages/lightning/pytorch/loggers/mlflow.py", line 369, in _scan_and_log_checkpoints
self.experiment.log_artifact(self._run_id, p, artifact_path)
File "/home/joshua/miniforge3/envs/PyTorch250/lib/python3.9/site-packages/mlflow/tracking/client.py", line 2433, in log_artifact
self._tracking_client.log_artifact(run_id, local_path, artifact_path)
File "/home/joshua/miniforge3/envs/PyTorch250/lib/python3.9/site-packages/mlflow/tracking/_tracking_service/client.py", line 639, in log_artifact
artifact_repo.log_artifact(local_path, artifact_path)
File "/home/joshua/miniforge3/envs/PyTorch250/lib/python3.9/site-packages/mlflow/store/artifact/local_artifact_repo.py", line 33, in log_artifact
verify_artifact_path(artifact_path)
File "/home/joshua/miniforge3/envs/PyTorch250/lib/python3.9/site-packages/mlflow/store/artifact/artifact_repo.py", line 464, in verify_artifact_path
raise MlflowException(
mlflow.exceptions.MlflowException: Invalid artifact path: 'model/checkpoints/epoch=0-step=151'. Names may be treated as files in certain cases, and must not resolve to other names when treated as such. This name would resolve to 'model/checkpoints/epoch=0-step=151'
More info
No response