Stars
A high-throughput and memory-efficient inference and serving engine for LLMs
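A minimal offline-inference sketch against vLLM's documented Python API; the checkpoint name is illustrative and any Hugging Face model vLLM supports would work:

```python
from vllm import LLM, SamplingParams

prompts = ["The capital of France is"]
sampling_params = SamplingParams(temperature=0.8, max_tokens=64)

# Loads the model and sets up vLLM's paged-attention KV cache.
llm = LLM(model="facebook/opt-125m")  # small checkpoint, chosen only for illustration

outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    # Each RequestOutput pairs the prompt with its generated completions.
    print(output.prompt, output.outputs[0].text)
```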
Official implementation of "ViewCrafter: Taming Video Diffusion Models for High-fidelity Novel View Synthesis"
[NeurIPS 2024] Mitigating Object Hallucination via Concentric Causal Attention
[CVPR 2024 Highlight] OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation
[CVPRW 2024] TrafficVLM: A Controllable Visual Language Model for Traffic Video Captioning. Official code for the 3rd place solution of the AI City Challenge 2024 Track 2.
User-friendly AI Interface (Supports Ollama, OpenAI API, ...)
[ACL'24] Selective Reflection-Tuning: Student-Selected Data Recycling for LLM Instruction-Tuning
🤗 Diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch and FLAX.
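A short text-to-image sketch using the Diffusers pipeline API (assumes `diffusers`, `transformers`, and a CUDA GPU; the checkpoint name is illustrative):

```python
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # illustrative checkpoint
    torch_dtype=torch.float16,
).to("cuda")

# One denoising run; .images is a list of PIL images.
image = pipe("an astronaut riding a horse on the moon").images[0]
image.save("astronaut.png")
```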
Get up and running with Llama 3.3, DeepSeek-R1, Phi-4, Gemma 2, and other large language models.
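A sketch of querying a locally running Ollama server over its REST API, assuming `ollama serve` is up and the model has been pulled (e.g. `ollama pull llama3.3`):

```python
import json
import urllib.request

payload = json.dumps({
    "model": "llama3.3",
    "prompt": "Why is the sky blue?",
    "stream": False,  # return a single JSON object instead of a token stream
}).encode("utf-8")

req = urllib.request.Request(
    "http://localhost:11434/api/generate",  # Ollama's default local endpoint
    data=payload,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```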
ProtTrans provides state-of-the-art pretrained language models for proteins. It was trained on thousands of GPUs on the Summit supercomputer and hundreds of Google TPUs, using transformer models.
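A sketch of per-residue feature extraction with a ProtTrans checkpoint through Hugging Face Transformers; the ProtBert checkpoint used here is one of several published under the Rostlab namespace:

```python
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("Rostlab/prot_bert", do_lower_case=False)
model = BertModel.from_pretrained("Rostlab/prot_bert").eval()

# ProtTrans tokenizers expect amino acids separated by single spaces.
sequence = " ".join("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ")
inputs = tokenizer(sequence, return_tensors="pt")

with torch.no_grad():
    # (1, seq_len, hidden_dim) per-residue embeddings
    embeddings = model(**inputs).last_hidden_state
print(embeddings.shape)
```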
A Flexible and Powerful Parameter Server for large-scale machine learning
MiniCPM-o 2.6: A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming on Your Phone
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
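A single-image inference sketch following the repo's image-predictor interface; the config and checkpoint file names below are assumptions, to be replaced with the files actually downloaded:

```python
import numpy as np
import torch
from sam2.build_sam import build_sam2
from sam2.sam2_image_predictor import SAM2ImagePredictor

# Config/checkpoint names are assumptions; substitute your downloaded files.
predictor = SAM2ImagePredictor(
    build_sam2("sam2_hiera_l.yaml", "checkpoints/sam2_hiera_large.pt")
)

image = np.zeros((512, 512, 3), dtype=np.uint8)  # stand-in for a real RGB image
with torch.inference_mode():
    predictor.set_image(image)
    # One positive click at the image center prompts the mask decoder.
    masks, scores, _ = predictor.predict(
        point_coords=np.array([[256, 256]]),
        point_labels=np.array([1]),
    )
print(masks.shape, scores)
```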
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal chat model with performance approaching GPT-4o.
Official Pytorch Implementation for “DINO-Tracker: Taming DINO for Self-Supervised Point Tracking in a Single Video”
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
Usable Implementation of "Bootstrap Your Own Latent" self-supervised learning, from Deepmind, in Pytorch
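A training-step sketch mirroring the byol-pytorch README: wrap any backbone, feed unlabeled images, and update the EMA target network after each optimizer step:

```python
import torch
from torchvision import models
from byol_pytorch import BYOL

resnet = models.resnet50(pretrained=True)

learner = BYOL(
    resnet,
    image_size=256,
    hidden_layer="avgpool",  # layer whose output is used as the representation
)
opt = torch.optim.Adam(learner.parameters(), lr=3e-4)

images = torch.randn(4, 3, 256, 256)  # stand-in for an unlabeled batch
loss = learner(images)  # BYOL loss between online and target projections
opt.zero_grad()
loss.backward()
opt.step()
learner.update_moving_average()  # EMA update of the target encoder
```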
Making large AI models cheaper, faster and more accessible
PyTorch code and models for the DINOv2 self-supervised learning method.
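A feature-extraction sketch using the torch.hub entry points the DINOv2 repo exposes (the first call downloads weights):

```python
import torch

# ViT-S/14 backbone; larger variants follow the same naming scheme.
model = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14")
model.eval()

# Input sides should be multiples of the 14-pixel patch size.
x = torch.randn(1, 3, 224, 224)
with torch.no_grad():
    features = model(x)  # CLS-token embedding, (1, 384) for ViT-S/14
print(features.shape)
```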
[CVPR'24] HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(ision), LLaVA-1.5, and Other Multi-modality Models
The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.
The official repo of Qwen (通义千问) chat & pretrained large language model proposed by Alibaba Cloud.
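A chat sketch following the Qwen README's `model.chat` interface (loaded with `trust_remote_code`; the 7B chat checkpoint is illustrative):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-7B-Chat", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-7B-Chat", device_map="auto", trust_remote_code=True
).eval()

# chat() manages the conversation history and returns (response, history).
response, history = model.chat(tokenizer, "Hello, who are you?", history=None)
print(response)
```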
[CVPR 2024 Highlight] Monkey (LMM): Image Resolution and Text Label Are Important Things for Large Multi-modal Models
Radiology Report Generation with Frozen LLMs
A method to increase the speed and lower the memory footprint of existing vision transformers.
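If this entry refers to Token Merging (ToMe) from facebookresearch/ToMe, usage is a one-line patch of an existing ViT; the sketch below assumes that repo's `tome.patch.timm` helper and a timm model:

```python
import timm
import tome  # assumption: the ToMe package from facebookresearch/ToMe

model = timm.create_model("vit_base_patch16_224", pretrained=True)

# Patch the ViT in place so its blocks merge redundant tokens at inference.
tome.patch.timm(model)
model.r = 16  # tokens merged per layer; higher = faster, slightly less accurate
```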