Stars
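Provides a practical interaction interface for GPT/GLM and other large language models (LLMs), specially optimized for reading, polishing, and writing papers. Modular design; supports custom shortcut buttons and function plugins; project analysis and self-explanation for Python, C++, and other codebases; PDF/LaTeX paper translation and summarization; parallel queries to multiple LLMs; and local models such as chatglm3. Integrates Tongyi Qianwen, deepseekcoder, iFlytek Spark, ERNIE Bot (Wenxin Yiyan), llama2, rwkv, claude2, m…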
OpenMMLab Detection Toolbox and Benchmark
🤗 Diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch and FLAX.
Deep learning for image processing, including classification, object detection, etc.
Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in PyTorch
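Use ChatGPT to summarize arXiv papers. Accelerates the entire research workflow by using ChatGPT for full-paper summarization, professional translation, polishing, peer review, and review responses.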
The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.
Use PEFT or Full-parameter to finetune 400+ LLMs (Qwen2.5, InternLM3, GLM4, Llama3.3, Mistral, Yi1.5, Baichuan2, DeepSeek-R1, ...) and 150+ MLLMs (Qwen2-VL, Qwen2-Audio, Llama3.2-Vision, Llava, Int…
OpenPCDet Toolbox for LiDAR-based 3D Object Detection.
Edit anything in images powered by segment-anything, ControlNet, StableDiffusion, etc. (ACM MM)
Visual tracking library based on PyTorch.
EVA Series: Visual Representation Fantasies from BAAI
Let ChatGPT truly learn how to go online and call APIs! 'EX-ChatGPT' can rival and even surpass NewBing
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
OMG-LLaVA and OMG-Seg codebase [CVPR-24 and NeurIPS-24]
Official PyTorch implementation of ODISE: Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models [CVPR 2023 Highlight]
[CVPR2024] The code for "Osprey: Pixel Understanding with Visual Instruction Tuning"
Official code implementation of Vary-toy (Small Language Model Meets with Reinforced Vision Vocabulary)
PyTorch Implementation of "V* : Guided Visual Search as a Core Mechanism in Multimodal LLMs"
[CVPR 2024] Aligning and Prompting Everything All at Once for Universal Visual Perception
[NeurIPS 2023] DatasetDM: Synthesizing Data with Perception Annotations Using Diffusion Models
[CVPR 2024] LAMP: Learn a Motion Pattern for Few-Shot Based Video Generation
An official PyTorch implementation of the CRIS paper
Official code for "SRFormer: Permuted Self-Attention for Single Image Super-Resolution" (ICCV 2023) and SRFormerV2
[NeurIPS'24 Spotlight] Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought Reasoning
JSeg is a semantic segmentation toolbox based on MMSegmentation and Jittor