Lingbot-Depth is truly excellent work. This is the result I got using images from an L515 camera as input. The only minor drawback is that the current model is quite large. Are there plans to distill smaller models in the future, such as ViT-Small or ViT-Base versions, to make deployment on robots easier?


