Hi, I'm testing the ZoeDepth pretrained models on images captured by highway-side cameras. These images have ground truth depth values that span beyond 300 meters, which I understand exceeds the model's training range (max depth: 80.0 meters).
However, when running the pretrained KITTI model, I consistently get a maximum estimated depth of approximately 5 meters. Below is the code snippet I used:
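import torch
from PIL import Image
zoe = torch.hub.load("isl-org/ZoeDepth", "ZoeD_K", pretrained=True)
image = Image.open("highway_frame.jpg").convert("RGB")  # placeholder path for one of my camera frames
predicted_depth = zoe.infer_pil(image, pad_input=False) # Better 'metric' accuracy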
For comparison, I also tested the NK model. It gives a more reasonable maximum depth estimate, between 30 and 50 meters, which is closer to what I would expect from the model. Additionally, I tried using the KITTI model weights from Hugging Face, but the results were similar.
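To make the comparison between models easier to reproduce, here is a minimal sketch of how the range of a predicted depth map could be summarized. The helper name `summarize_depth` and the synthetic array are my own illustrative assumptions; in practice the input would be the array returned by `infer_pil`, and 80.0 is the nominal KITTI maximum mentioned above.

```python
import numpy as np

def summarize_depth(depth, nominal_max):
    """Return basic range statistics for a depth map given in meters.

    `nominal_max` is the maximum depth the model is assumed to support.
    """
    depth = np.asarray(depth, dtype=np.float64)
    valid = depth[np.isfinite(depth)]  # ignore NaN/inf pixels, if any
    return {
        "min": float(valid.min()),
        "max": float(valid.max()),
        "p99": float(np.percentile(valid, 99)),
        # Fraction of the nominal range that the prediction actually reaches
        "max_over_nominal": float(valid.max() / nominal_max),
    }

# Synthetic stand-in for a prediction that tops out near 5 m (not real model output)
fake_prediction = np.linspace(0.5, 5.0, num=100).reshape(10, 10)
stats = summarize_depth(fake_prediction, nominal_max=80.0)
print(stats)
```

A `max_over_nominal` value far below 1 on scenes that clearly contain distant objects would point to a range problem rather than to individual bad frames.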
Upon inspecting the model configurations, I noticed potential discrepancies in the uploaded weights:
"bin_configurations": [
{
"max_depth": 10.0,
"min_depth": 0.001,
"n_bins": 64,
"name": "nyu"
}
This configuration suggests a maximum depth of 10.0 meters, which might explain the observed behavior.
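As a quick check, the bin ranges in a config file can be compared against the depth range one expects. The sketch below is only an illustration: the function name `find_bin_range_mismatches` is hypothetical, the JSON string is the excerpt above with closing brackets added so that it parses, and 80.0 is the KITTI maximum I assumed from the training range.

```python
import json

def find_bin_range_mismatches(config, expected_max_depth):
    """List (name, max_depth) for every bin configuration whose
    max_depth differs from the expected value."""
    mismatches = []
    for entry in config.get("bin_configurations", []):
        if entry.get("max_depth") != expected_max_depth:
            mismatches.append((entry.get("name"), entry.get("max_depth")))
    return mismatches

# Excerpt from the post; the closing brackets are added here only to make it valid JSON
config_text = """
{
  "bin_configurations": [
    {"max_depth": 10.0, "min_depth": 0.001, "n_bins": 64, "name": "nyu"}
  ]
}
"""

config = json.loads(config_text)
mismatches = find_bin_range_mismatches(config, expected_max_depth=80.0)
print(mismatches)
```

For the excerpt above, the single `nyu` entry is reported because its `max_depth` is 10.0 rather than the 80.0 I expected for a KITTI model.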
Questions/Concerns:
- Is the issue related to incorrect configurations in the uploaded weights?
- Could you reupload the weights of the KITTI model with a max depth of 80 meters?
- Can you suggest any datasets with ground-truth depth beyond 100 meters?