amusi/CVPR2025-Papers-with-Code

CVPR 2024 Papers and Open-Source Projects Collection (Papers with Code)

CVPR 2024 decisions are now available on OpenReview!

Note 1: Everyone is welcome to submit issues to share CVPR 2024 papers and open-source projects!

Note 2: For top computer vision conference papers from previous years, other high-quality CV papers, and roundups, see: https://github.com/amusi/daily-paper-computer-vision

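Scan the QR code to join the CVer Academic Exchange Group, the largest computer vision and AI community on Knowledge Planet! It is updated daily and shares the latest learning materials on computer vision, AI painting, image processing, deep learning, autonomous driving, medical imaging, AIGC, and related topics as soon as they appear.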

[CVPR 2024 Open-Source Paper Directory]

3DGS

Scaffold-GS: Structured 3D Gaussians for View-Adaptive Rendering

GPS-Gaussian: Generalizable Pixel-wise 3D Gaussian Splatting for Real-time Human Novel View Synthesis

GaussianAvatar: Towards Realistic Human Avatar Modeling from a Single Video via Animatable 3D Gaussians

GaussianEditor: Swift and Controllable 3D Editing with Gaussian Splatting

Avatars

GaussianAvatar: Towards Realistic Human Avatar Modeling from a Single Video via Animatable 3D Gaussians

Backbone

CLIP

MAE

Embodied AI

EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI

GAN

OCR

An Empirical Study of Scaling Law for OCR

NeRF

DETR

DETRs Beat YOLOs on Real-time Object Detection

Prompt

Multimodal Large Language Models (MLLM)

mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration

NAS

ReID (Re-Identification)

Diffusion Models

InstanceDiffusion: Instance-level Control for Image Generation

Residual Denoising Diffusion Models

Vision Transformer

TransNeXt: Robust Foveal Visual Perception for Vision Transformers

Vision-Language

Object Detection

DETRs Beat YOLOs on Real-time Object Detection

Object Tracking

Semantic Segmentation

Stronger, Fewer, & Superior: Harnessing Vision Foundation Models for Domain Generalized Semantic Segmentation

Medical Image Segmentation

3D Point Cloud

3D Object Detection

PTT: Point-Trajectory Transformer for Efficient Temporal 3D Object Detection

3D Semantic Segmentation

Image Editing

Edit One for All: Interactive Batch Image Editing

Low-level Vision

Residual Denoising Diffusion Models

Video Super-Resolution

Denoising

Image Denoising

Image Generation

InstanceDiffusion: Instance-level Control for Image Generation

ECLIPSE: A Resource-Efficient Text-to-Image Prior for Image Generations

Instruct-Imagen: Image Generation with Multi-modal Instruction

Residual Denoising Diffusion Models

UniGS: Unified Representation for Image Generation and Segmentation

Video Generation

Vlogger: Make Your Dream A Vlog

Video Understanding

Others

Object Recognition as Next Token Prediction

ParameterNet: Parameters Are All You Need for Large-scale Visual Pretraining of Mobile Networks

Seamless Human Motion Composition with Blended Positional Encodings

LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding, Reasoning, and Planning

CLOVA: A Closed-LOop Visual Assistant with Tool Usage and Update