[ICML 2026] TIC-VLA: A Think-in-Control Vision-Language-Action Model for Robot Navigation in Dynamic Environments
Zhiyu Huang†, Yun Zhang†, Johnson Liu, Rui Song, Chen Tang, Jiaqi Ma
University of California, Los Angeles (UCLA)
† Equal contribution
TIC-VLA introduces a latency-aware Think-in-Control (TIC) architecture for vision-language-action (VLA) models, targeting robot navigation in dynamic, human-centric environments.
- 🧠 Think-in-Control Architecture: Decouples slow vision-language reasoning from fast reactive control through an explicit delayed semantic–control interface.
- ⏱️ Latency-Aware Action Generation: Conditions control on current observations, cached VLM hidden states, and explicit delay metadata to mitigate stale semantics.
- 🧪 Latency-Consistent Training Pipeline: Combines vision-language reasoning distillation, latency-induced imitation learning, and online reinforcement learning.
- 🚶 Dynamic, Human-Centric Navigation: Evaluated in physics-accurate, photo-realistic environments with human-robot interactions and long-horizon instructions.
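The idea of conditioning fast control on cached, delayed semantics can be illustrated with a toy simulation. The sketch below is not the TIC-VLA implementation: `slow_reasoner`, `fast_policy`, `CachedSemantics`, and the trust-weighting rule are all hypothetical stand-ins, and observations are reduced to single floats. It only shows the data flow: the reasoner's output arrives several control steps late, and the controller receives the live observation, the cached state, and the explicit delay.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class CachedSemantics:
    """Reasoner output, stamped with the step of the observation it was computed from."""
    hidden_state: List[float]
    source_step: int


def slow_reasoner(observation: float, step: int) -> CachedSemantics:
    # Hypothetical stand-in for a VLM forward pass: it simply echoes the observation.
    return CachedSemantics(hidden_state=[observation], source_step=step)


def fast_policy(observation: float, cache: CachedSemantics, delay: int) -> float:
    # Hypothetical stand-in for the action head: the older the cached semantics,
    # the less weight they get relative to the live observation.
    trust = 1.0 / (1.0 + delay)
    return trust * cache.hidden_state[0] + (1.0 - trust) * observation


def run(observations: List[float], vlm_latency_steps: int) -> List[Tuple]:
    """Run a control loop in which each reasoning result arrives `vlm_latency_steps` steps late."""
    cache: Optional[CachedSemantics] = None
    pending: Optional[CachedSemantics] = None
    ready_at = 0
    log = []
    for step, obs in enumerate(observations):
        # Deliver a reasoning result once its simulated latency has elapsed.
        if pending is not None and step >= ready_at:
            cache, pending = pending, None
        # Launch a new reasoning pass whenever the reasoner is idle.
        if pending is None:
            pending = slow_reasoner(obs, step)
            ready_at = step + vlm_latency_steps
        if cache is None:
            log.append((step, None, None))  # no semantics available yet
            continue
        # Explicit delay metadata: age of the cached semantics in control steps.
        delay = step - cache.source_step
        log.append((step, delay, fast_policy(obs, cache, delay)))
    return log


if __name__ == "__main__":
    for entry in run([0.0, 1.0, 2.0, 3.0, 4.0, 5.0], vlm_latency_steps=2):
        print(entry)
```

The controller runs at every step regardless of whether the reasoner has finished, which is the decoupling the first bullet describes. Passing `delay` as an input lets the policy account for how stale the cached state is, rather than treating it as current.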
We are currently organizing the project for public release.
- 📦 Code Release: June 2026
- 🗂️ Dataset and Benchmark Release: June 2026
- 🤖 Trained checkpoints will also be released. Stay tuned for updates!
We introduce DynaNav, a language-conditioned navigation benchmark designed to test VLA systems under realistic scenarios.
- 85 task configurations across Hospital, Office, Warehouse, and Outdoor scenes
- Varying crowd density, navigation distance, and scene layout
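As a rough illustration of how the axes of variation above could be represented, the sketch below defines a task record and a simple filter. The schema is an assumption made for illustration only: `TaskConfig`, `Scene`, `select`, and every field name and unit are hypothetical and do not reflect the released DynaNav format.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Scene(Enum):
    HOSPITAL = "hospital"
    OFFICE = "office"
    WAREHOUSE = "warehouse"
    OUTDOOR = "outdoor"


@dataclass(frozen=True)
class TaskConfig:
    # All field names and units are hypothetical, not the released schema.
    scene: Scene
    instruction: str        # natural-language navigation instruction
    num_pedestrians: int    # proxy for crowd density
    path_length_m: float    # navigation distance
    layout_id: int          # scene layout variant


def select(tasks: List[TaskConfig], scene: Scene, max_pedestrians: int) -> List[TaskConfig]:
    """Return the tasks in `scene` that have at most `max_pedestrians` pedestrians."""
    return [t for t in tasks if t.scene is scene and t.num_pedestrians <= max_pedestrians]
```

A record of this shape would let an evaluation script slice results by scene or crowd density without parsing scenario files by hand.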
If you find this repository useful for your research, please consider giving us a star 🌟 and citing our paper.
@inproceedings{huang2026ticvla,
title={TIC-VLA: A Think-in-Control Vision-Language-Action Model for Robot Navigation in Dynamic Environments},
author={Zhiyu Huang and Yun Zhang and Johnson Liu and Rui Song and Chen Tang and Jiaqi Ma},
booktitle={Proceedings of the International Conference on Machine Learning (ICML)},
year={2026}
}
