- Sun Yat-sen University
- Canton, China
- https://jianjieluo.github.io/
Stars
[MM 2025] SynC: Synthetic Image Caption Dataset Refinement with One-to-many Mapping for Zero-shot Image Captioning
DeepResearch Bench: A Comprehensive Benchmark for Deep Research Agents
[CVPR 2025 Best Paper Award] VGGT: Visual Geometry Grounded Transformer
[Up-to-date] Large Language Model Agent: A Survey on Methodology, Applications and Challenges
[CVPR'24] HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(ision), LLaVA-1.5, and Other Multi-modality Models
Official implementation of the OneDiffusion paper (CVPR 2025)
Stable Diffusion web UI
"Interpretable Machine Learning: A Guide to Understanding the Interpretability of Black-Box Models", the Chinese edition of "Interpretable Machine Learning"
Align Anything: Training All-modality Model with Feedback
[ICLR 2025 Spotlight] An open-sourced LLM judge for evaluating LLM-generated answers.
A Python library for adversarial machine learning focusing on benchmarking adversarial robustness.
[CVPR 2024 Highlight] Mitigating Object Hallucinations in Large Vision-Language Models through Visual Contrastive Decoding
A data augmentations library for audio, image, text, and video.
A deep learning library for video understanding research.
Effective Video Augmentation Techniques for Training Convolutional Neural Networks
Unleashing Hour-Scale Video Training for Long Video-Language Understanding
Phantom: Subject-Consistent Video Generation via Cross-Modal Alignment
Concat-ID: Towards Universal Identity-Preserving Video Synthesis
[Embodied-AI-Survey-2025] Paper List and Resource Repository for Embodied AI
Materials for the Hugging Face Diffusion Models Course
✨✨Latest Advances on Multimodal Large Language Models
Sparrow: Data-Efficient Video-LLM with Text-to-Image Augmentation
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)