Dear Guo:
I have a question regarding the FoundationStereo weights you trained on the SceneFlow dataset. In your training setup for FoundationStereo, you froze the entire DepthAnythingV2 backbone, which is consistent with the official FoundationStereo implementation. However, I was surprised to find that in your released weights (Foundation_Sceneflow.pt), the parameters of this backbone are not the same as those in the official FoundationStereo weights (nor do they match the official DepthAnythingV2 weights).
In practice, when I run inference with your released weights, the output disp becomes NaN at the DPTHead stage. This does not happen when I use the official FoundationStereo weights. Could you therefore clarify where the DepthAnythingV2 backbone weights in your released model come from?
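For reference, a comparison of the kind described above can be sketched roughly as follows. This is only a minimal sketch under stated assumptions: the key prefix for the backbone, the path of the official checkpoint, and the idea that the weights may be nested under a "model" key are all placeholders of mine, not details taken from either repository. The helper names (flat_values, compare_backbone, load_state_dict) are hypothetical as well.

```python
import math


def flat_values(value):
    """Return the entries of a tensor-like value as a flat list of floats."""
    # torch.Tensor and numpy.ndarray both provide flatten() and tolist().
    if hasattr(value, "flatten"):
        return [float(v) for v in value.flatten().tolist()]
    return [float(v) for v in value]


def compare_backbone(sd_a, sd_b, prefix, atol=0.0):
    """Compare the entries of two state dicts whose keys start with `prefix`.

    Values are compared element by element after flattening, so shape
    differences with the same element count are not detected here.
    """
    keys_a = {k for k in sd_a if k.startswith(prefix)}
    keys_b = {k for k in sd_b if k.startswith(prefix)}
    report = {
        "only_in_a": sorted(keys_a - keys_b),
        "only_in_b": sorted(keys_b - keys_a),
        "differing": {},        # key -> largest absolute difference
        "non_finite_in_a": [],  # keys holding NaN or Inf
        "non_finite_in_b": [],
    }

    # Flag parameters that already contain NaN/Inf in the checkpoint itself.
    for name, sd, keys in (("a", sd_a, keys_a), ("b", sd_b, keys_b)):
        for key in sorted(keys):
            if not all(math.isfinite(v) for v in flat_values(sd[key])):
                report["non_finite_in_" + name].append(key)

    # Compare the parameters that both checkpoints share.
    for key in sorted(keys_a & keys_b):
        va, vb = flat_values(sd_a[key]), flat_values(sd_b[key])
        if len(va) != len(vb):
            report["differing"][key] = math.inf
            continue
        diffs = [abs(x - y) for x, y in zip(va, vb)]
        finite = [d for d in diffs if math.isfinite(d)]
        # Any NaN/Inf difference is treated as an unbounded mismatch.
        worst = math.inf if len(finite) < len(diffs) else max(finite, default=0.0)
        if worst > atol:
            report["differing"][key] = worst
    return report


def load_state_dict(path):
    """Load a checkpoint on the CPU (requires PyTorch)."""
    import torch

    ckpt = torch.load(path, map_location="cpu")
    # Assumption: the weights may be nested under a "model" key.
    if isinstance(ckpt, dict) and "model" in ckpt:
        return ckpt["model"]
    return ckpt


# Example usage (the second path and the prefix are placeholders):
# released = load_state_dict("Foundation_Sceneflow.pt")
# official = load_state_dict("path/to/official_checkpoint.pth")
# print(compare_backbone(released, official, prefix="backbone."))
```

A non-empty "differing" entry would indicate that the frozen backbone parameters were changed or re-initialized, and a non-empty "non_finite_in_a" entry would point to NaN or Inf values stored in the checkpoint itself, which could explain NaN outputs downstream.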
Thank you!