Starred repositories
[NeurIPS 2024] Behavioral Topology (BeTop), a multi-agent behavior formulation for interactive motion prediction and planning
[IJCAI 2022] FQ-ViT: Post-Training Quantization for Fully Quantized Vision Transformer
Official code for the paper "SR-FoT: A Syllogistic-Reasoning Framework of Thought for Large Language Models Tackling Knowledge-based Reasoning tasks"
Official code implementation of Slow Perception: Let's Perceive Geometric Figures Step-by-step
Witness the aha moment of VLM with less than $3.
(T-IV) Dream to Drive with Predictive Individual World Model
Centurio: On Drivers of Multilingual Ability of Large Vision-Language Model
A curated list of resources on graph-based retrieval-augmented generation (GraphRAG) for customized large language models.
Vchitect-2.0: Parallel Transformer for Scaling Up Video Diffusion Models
Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation
20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale.
Advances in Multimodal Adaptation and Generalization: From Traditional Approaches to Foundation Models
Code, results and other artifacts from the paper introducing the WildChat-50m dataset and the Re-Wild model family.
HERMES: A Unified Self-Driving World Model for Simultaneous 3D Scene Understanding and Generation
Official code of "MakeAnything: Harnessing Diffusion Transformers for Multi-Domain Procedural Sequence Generation"
[ECCV 2024] Official GitHub repository for the paper "LingoQA: Visual Question Answering for Autonomous Driving"
StreamDiffusion: A Pipeline-Level Solution for Real-Time Interactive Generation
Code and data for "SeMLaPS: Real-time Semantic Mapping with Latent Prior Networks and Quasi-Planar Segmentation"
A collection and survey of the most cutting-edge vision-language model papers and models
Official code for the paper "Beyond Sole Strength: Customized Ensembles for Generalized Vision-Language Models" (ICML 2024)
RLHF experiments on a single A100 40G GPU. Supports PPO, GRPO, REINFORCE, RAFT, RLOO, ReMax, and DeepSeek R1-Zero reproduction.
DeepSeek R1 and Claude 3.5 Sonnet make the best combination, fully unleashing the power of the strongest models. Supports OpenAI streaming output and can be used in your favorite ChatBox!
Mozha-R1 is an AI-powered application built on a DeepSeek R1 distilled model. It is designed to run locally on Windows and Linux (AMD64 & ARM64) and provides an API interface.
DeepSeek-R1: Retrieval-Augmented Generation for Document Q&A