Official repository of "Visual-RFT: Visual Reinforcement Fine-Tuning"

Python 933 34 Updated Mar 6, 2025

EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL

Python 1,237 68 Updated Mar 7, 2025

A fork to add multimodal model training to open-r1

Python 998 51 Updated Feb 8, 2025

Solve Visual Understanding with Reinforced VLMs

Python 3,938 241 Updated Mar 9, 2025

Chain_of_Thoughts_3D_Visual_Grounding

Python 17 1 Updated Apr 20, 2024

Easily turn large sets of image URLs into an image dataset. Can download, resize and package 100M URLs in 20h on one machine.

Python 3,935 348 Updated Aug 7, 2024

Code for the paper "Multi-modal Latent Space Learning for Chain-of-Thought Reasoning in Language Models"

Python 5 2 Updated Mar 20, 2024

This is a replication of DeepSeek-R1-Zero and DeepSeek-R1 training on small models with limited data

Python 3,094 228 Updated Feb 19, 2025

[NeurIPS'24 Spotlight] Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought Reasoning

Python 257 10 Updated Dec 22, 2024

Codebase for the AAAI 2024 conference paper "Visual Chain-of-Thought Prompting for Knowledge-based Visual Reasoning"

Python 26 1 Updated Jul 21, 2024

Our solution for the ARC challenge 2024

Jupyter Notebook 106 14 Updated Mar 1, 2025

Fully open reproduction of DeepSeek-R1

Python 22,416 2,011 Updated Mar 9, 2025

Qwen2.5-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.

Jupyter Notebook 8,489 599 Updated Mar 7, 2025

Mortise AI PC Community Project

TypeScript 50 4 Updated Feb 26, 2025

The official implementation of SAGA (Segment Any 3D GAussians)

Jupyter Notebook 682 48 Updated Feb 18, 2025

[ICLR 2025] Point-SAM: Promptable 3D Segmentation Model for Point Clouds

Python 189 11 Updated Dec 18, 2024

RTG-SLAM: Real-time 3D Reconstruction at Scale Using Gaussian Splatting (ACM SIGGRAPH 2024)

Python 363 40 Updated Nov 21, 2024

LSD-SLAM

C++ 2,638 1,233 Updated Mar 23, 2023

This is a revised version of LSD-SLAM to work with Ubuntu 20.04 and ROS Noetic.

C++ 17 12 Updated Jan 26, 2021

[CVPR 2024 - Oral, Best Paper Award Candidate] Marigold: Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation

Python 2,581 149 Updated Dec 14, 2024

Semi-direct Visual Odometry

C++ 2,131 861 Updated Aug 22, 2019

[RAL 2024] RANSAC Back to SOTA: A Two-Stage Consensus Filtering for Real-Time 3D Registration

C++ 100 11 Updated Jan 13, 2025

[NeurIPS 2024] Binocular3DGS: Binocular-Guided 3D Gaussian Splatting with View Consistency for Sparse View Synthesis

Python 46 Updated Oct 28, 2024

CoTracker is a model for tracking any point (pixel) on a video.

Jupyter Notebook 4,163 285 Updated Jan 21, 2025

Multi-Object Tracking with Uncertain Detections [ECCV 2024 UnCV]

Python 56 1 Updated Oct 14, 2024

Tightly coupled GNSS-Visual-Inertial system for locally smooth and globally consistent state estimation in complex environments.

C++ 949 241 Updated Sep 11, 2021

Lightweight stereo matching network based on MobileNet blocks

Python 253 47 Updated Mar 15, 2022

GaussianOcc: Fully Self-supervised and Efficient 3D Occupancy Estimation with Gaussian Splatting

Python 252 12 Updated Feb 10, 2025