👋 Hi, I'm Kevin Qinghong Lin.
I’m a third-year Ph.D. student at Show Lab, National University of Singapore.
I work on Vision+Language, Video Understanding, and AI Agents.
🌐 Homepage: qhlin.me
📧 Email: kevin.qh.lin@gmail.com
Open-source, End-to-end, Vision-Language-Action model for GUI Agent & Computer Use.
[NeurIPS2022] Egocentric Video-Language Pretraining
Out-of-the-box (OOTB) GUI Agent for Windows and macOS
[ICCV2023] UniVTG: Towards Unified Video-Language Temporal Grounding
💻 A curated list of papers and resources for multi-modal Graphical User Interface (GUI) agents.
Transform a video into a document with ChatGPT, CLIP, BLIP2, GRIT, Whisper, LangChain.