Currently, I am an algorithm engineer.
🔭 Research-wise, I mainly focus on:
- Multi-modal Large Language Models
- Video Understanding
📫 Contact me by:
- Email: zhanghuaxin@bytedance.com
💬 News:
- 2025-02-27: Holmes-VAU is accepted to CVPR 2025.
- 2024-07-01: We release the code and model for "Holmes-VAD: Towards Unbiased and Explainable Video Anomaly Detection via Multi-modal LLM". [project page]
- 2024-06-10: We release the code and model for "Arcana: Improving Multi-modal Large Language Model through Boosting Vision Capabilities". [project page]
- 2024-01-29: I started my internship at Baidu VIS, doing research on Multi-modal Large Language Models (MLLMs).
- 2023-12-09: One paper on point-supervised temporal action localization is accepted to AAAI 2024.