ThinkJEPA: Empowering Latent World Models with Large Vision-Language Reasoning Model
computer-vision robotics representation-learning video-understanding vla reasoning multimodal self-supervised-learning video-representation video-modeling dexterous-robotic-hand semantic-guidance dexterous-manipulation large-language-model vision-language-model jepa world-model multimodal-reasoning latent-world-model thinkjepa
Updated Apr 4, 2026 - Python