Stars
The repository provides code for running inference with the Segment Anything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
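A minimal point-prompt inference sketch, assuming the segment_anything package is installed and a ViT-H checkpoint has been downloaded from the repo's links (the checkpoint filename below is an assumption):

```python
import numpy as np
from segment_anything import SamPredictor, sam_model_registry

# Load the ViT-H SAM weights; the checkpoint filename is assumed here.
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)

# Stand-in for a real HxWx3 RGB image.
image = np.zeros((512, 512, 3), dtype=np.uint8)
predictor.set_image(image)

# One foreground click at the image centre; SAM returns candidate masks with scores.
masks, scores, logits = predictor.predict(
    point_coords=np.array([[256, 256]]),
    point_labels=np.array([1]),
    multimask_output=True,
)
```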
Google Research
pytorch handbook is an open-source book that aims to help readers who want to use PyTorch for deep learning development and research get started quickly; all of the PyTorch tutorials it contains have been tested and are guaranteed to run.
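In the spirit of the handbook's quick-start tutorials, a minimal PyTorch training loop on toy data (not an example taken from the book):

```python
import torch
from torch import nn

# Tiny regression model and synthetic data, purely for illustration.
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

x, y = torch.randn(256, 10), torch.randn(256, 1)
for step in range(100):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
```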
Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything
Implementation of paper - YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors
PyTorch tutorials and fun projects including NeuralTalk, neural style, poem writing, and anime generation (《深度学习框架PyTorch:入门与实战》, "Deep Learning Framework PyTorch: Introduction and Practice")
LAVIS - A One-stop Library for Language-Vision Intelligence
Official inference library for Mistral models
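The repo ships its own mistral-inference package; as a hedged alternative sketch, here is how a Mistral checkpoint can be loaded through Hugging Face transformers instead (the model id and generation settings are assumptions):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed model id; device_map="auto" additionally requires the accelerate package.
name = "mistralai/Mistral-7B-Instruct-v0.2"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name, device_map="auto")

inputs = tokenizer("Summarise what a vision transformer does.", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```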
PyTorch code and models for the DINOv2 self-supervised learning method.
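A minimal feature-extraction sketch via torch.hub, assuming network access to fetch the pretrained weights (the entry-point name follows the repo's hub listing):

```python
import torch

# Load the ViT-S/14 backbone; larger entry points (vitb14, vitl14, vitg14) follow the same pattern.
model = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14")
model.eval()

# Inputs must have spatial sizes divisible by the patch size (14); 224 = 16 * 14.
dummy = torch.randn(1, 3, 224, 224)
with torch.no_grad():
    features = model(dummy)  # global image embedding, 384-dim for ViT-S/14
print(features.shape)
```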
Implementation of Dreambooth (https://arxiv.org/abs/2208.12242) by way of Textual Inversion (https://arxiv.org/abs/2208.01618) for Stable Diffusion (https://arxiv.org/abs/2112.10752). Tweaks focuse…
PyTorch implementations of various deep NLP models from CS224n (Stanford University)
Benchmarking large language models' complex reasoning ability with chain-of-thought prompting
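An illustrative few-shot chain-of-thought prompt (a standard example from the CoT literature, not taken from this benchmark's data); the exemplar spells out intermediate reasoning before the final answer, which is the core of CoT prompting:

```python
prompt = """Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. Each can has 3 tennis balls. How many tennis balls does he have now?
A: Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 balls. 5 + 6 = 11. The answer is 11.

Q: The cafeteria had 23 apples. They used 20 to make lunch and bought 6 more. How many apples do they have?
A:"""
```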
Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads
EfficientSAM: Leveraged Masked Image Pretraining for Efficient Segment Anything
Code for "Chameleon: Plug-and-Play Compositional Reasoning with Large Language Models".
Code for paper "Which Training Methods for GANs do actually Converge? (ICML 2018)"
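The paper argues for zero-centred gradient penalties such as R1 to stabilise GAN training; below is a small PyTorch sketch of the R1 term (the discriminator output, real batch, and gamma value are placeholders, not the repo's code):

```python
import torch

def r1_penalty(d_real: torch.Tensor, real: torch.Tensor, gamma: float = 10.0) -> torch.Tensor:
    """R1 = (gamma / 2) * E[ ||grad_x D(x)||^2 ] evaluated on real samples.

    `real` must have requires_grad=True before the discriminator forward pass.
    """
    grad, = torch.autograd.grad(outputs=d_real.sum(), inputs=real, create_graph=True)
    return (gamma / 2.0) * grad.pow(2).reshape(grad.size(0), -1).sum(dim=1).mean()
```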
Two time-scale update rule for training GANs
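The paper that proposed the two time-scale update rule also introduced the Fréchet Inception Distance; a NumPy/SciPy sketch of the distance itself, where the means and covariances would normally be computed from Inception features of real and generated images:

```python
import numpy as np
from scipy import linalg

def frechet_distance(mu1, cov1, mu2, cov2):
    """FID = ||mu1 - mu2||^2 + Tr(C1 + C2 - 2 * (C1 @ C2)^(1/2))."""
    diff = mu1 - mu2
    covmean, _ = linalg.sqrtm(cov1 @ cov2, disp=False)
    covmean = covmean.real  # discard tiny imaginary parts from numerical error
    return float(diff @ diff + np.trace(cov1 + cov2 - 2.0 * covmean))
```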
RepViT: Revisiting Mobile CNN From ViT Perspective [CVPR 2024] and RepViT-SAM: Towards Real-Time Segmenting Anything
This repository lets you train neural network models for end-to-end full-page handwriting recognition on the IAM dataset using the Apache MXNet deep learning framework.
🐟 Code and models for the NeurIPS 2023 paper "Generating Images with Multimodal Language Models".
Implementation of various topic models
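As a generic illustration of the kind of model the repo covers (not its own API), a minimal latent Dirichlet allocation example with scikit-learn:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "the cat sat on the mat",
    "dogs and cats make friendly pets",
    "stock markets fell sharply today",
    "investors sold shares amid market fears",
]
# Bag-of-words counts, then a 2-topic LDA fit.
counts = CountVectorizer(stop_words="english").fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts)
print(lda.transform(counts))  # per-document topic proportions
```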
[ICLR 2024 Spotlight] A benchmark for multi-modality LLMs (MLLMs) on low-level vision and visual quality assessment, covering GPT-4V/Gemini-Pro/Qwen-VL-Plus and 16 open-source MLLMs.
Official implementation of 'CLIP-DINOiser: Teaching CLIP a few DINO tricks' paper.
PyTorch code for hierarchical k-means -- a data curation method for self-supervised learning
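A toy sketch of hierarchical (recursive) k-means as a generic illustration of the idea, not the repo's curation pipeline: cluster the data, then re-cluster each cluster until a depth limit is reached:

```python
import numpy as np
from sklearn.cluster import KMeans

def hierarchical_kmeans(x, k=4, depth=2):
    """Return the leaf clusters after `depth` levels of k-means splitting."""
    if depth == 0 or len(x) < k:
        return [x]
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(x)
    leaves = []
    for c in range(k):
        leaves.extend(hierarchical_kmeans(x[labels == c], k, depth - 1))
    return leaves

leaves = hierarchical_kmeans(np.random.randn(1000, 16))
print(len(leaves), [len(leaf) for leaf in leaves[:4]])
```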
[CVPR 2024] LION: Empowering Multimodal Large Language Model with Dual-Level Visual Knowledge
This repo contains the code for our paper Towards Open-Ended Visual Recognition with Large Language Model
Codebase for the Recognize Anything Model (RAM)
MoCLE: the first MLLM with a mixture-of-experts (MoE) design for instruction customization and generalization (https://arxiv.org/abs/2312.12379)