Stars
Official repository of "Visual-RFT: Visual Reinforcement Fine-Tuning"
EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL
A fork to add multimodal model training to open-r1
Solve Visual Understanding with Reinforced VLMs
Easily turn large sets of image URLs into an image dataset. Can download, resize and package 100M URLs in 20h on one machine.
Code for the paper "Multi-modal Latent Space Learning for Chain-of-Thought Reasoning in Language Models"
A replication of DeepSeek-R1-Zero and DeepSeek-R1 training on small models with limited data
[NeurIPS'24 Spotlight] Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought Reasoning
Codebase for the AAAI 2024 conference paper "Visual Chain-of-Thought Prompting for Knowledge-based Visual Reasoning"
Our solution for the ARC challenge 2024
Fully open reproduction of DeepSeek-R1
Qwen2.5-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
Mortise AI PC Community Project
The official implementation of SAGA (Segment Any 3D GAussians)
[ICLR 2025] Point-SAM: Promptable 3D Segmentation Model for Point Clouds
RTG-SLAM: Real-time 3D Reconstruction at Scale Using Gaussian Splatting (ACM SIGGRAPH 2024)
mstampa / lsd_slam
Forked from tyunist/LSD-SLAM. This is a revised version of LSD-SLAM to work with Ubuntu 20.04 and ROS Noetic.
[CVPR 2024 - Oral, Best Paper Award Candidate] Marigold: Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation
[RAL 2024] RANSAC Back to SOTA: A Two-Stage Consensus Filtering for Real-Time 3D Registration
[NeurIPS 2024] Binocular3DGS: Binocular-Guided 3D Gaussian Splatting with View Consistency for Sparse View Synthesis
CoTracker is a model for tracking any point (pixel) on a video.
Multi-Object Tracking with Uncertain Detections [ECCV 2024 UnCV]
Tightly coupled GNSS-Visual-Inertial system for locally smooth and globally consistent state estimation in complex environments.
Lightweight stereo matching network based on MobileNet blocks
GaussianOcc: Fully Self-supervised and Efficient 3D Occupancy Estimation with Gaussian Splatting