I'm Orr Zohar 👋
My research focuses on Large Multi-Modal Models, especially Large Image/Video + Language models, with the goal of making these models capable of evaluating the quality of actions in video. Recent relevant work:
- 💫 Video-STAR: A method that allows any labeled video dataset to be used for instruction tuning.
- 🤖 VideoAgent: A novel agent-based system that uses a large language model to iteratively identify and compile crucial information from long-form videos.