- Czech Technical University, Carnegie Mellon University, IIIT Hyderabad
- Prague, Czech Republic
- https://yash0307.github.io/
Stars
Accelerating the development of large multimodal models (LMMs) with the one-click evaluation module lmms-eval.
[CVPR 2024 Highlight🔥] Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding
LaVIT: Empower the Large Language Model to Understand and Generate Visual Content
Hackable and optimized Transformers building blocks, supporting a composable construction.
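As a minimal sketch of what those building blocks look like in use, here is xformers' memory-efficient attention op; the tensor shapes, dtype, and device below are illustrative assumptions, not requirements of the library beyond its documented (batch, sequence, heads, head_dim) layout:

```python
import torch
from xformers.ops import memory_efficient_attention

# Illustrative shapes: (batch, sequence, heads, head_dim).
q = torch.randn(2, 1024, 8, 64, device="cuda", dtype=torch.float16)
k = torch.randn(2, 1024, 8, 64, device="cuda", dtype=torch.float16)
v = torch.randn(2, 1024, 8, 64, device="cuda", dtype=torch.float16)

# Computes softmax(q @ k^T / sqrt(d)) @ v without materializing the full
# attention matrix, keeping memory use linear in sequence length.
out = memory_efficient_attention(q, k, v)  # (2, 1024, 8, 64)
```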
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
MLCD & UNICOM: Large-Scale Visual Representation Model
LAVIS - A One-stop Library for Language-Vision Intelligence
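A quick sketch of the "one-stop" interface, captioning an image with BLIP; the model and processor names follow the LAVIS model zoo, and the image path is a placeholder:

```python
import torch
from PIL import Image
from lavis.models import load_model_and_preprocess

device = "cuda" if torch.cuda.is_available() else "cpu"

# Loads a BLIP captioning model together with its matching image preprocessor.
model, vis_processors, _ = load_model_and_preprocess(
    name="blip_caption", model_type="base_coco", is_eval=True, device=device
)

raw_image = Image.open("photo.jpg").convert("RGB")
image = vis_processors["eval"](raw_image).unsqueeze(0).to(device)

# Generates a natural-language caption for the image.
print(model.generate({"image": image}))
```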
PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
The repository provides code for running inference with the SegmentAnything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
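For reference, prompt-based inference with a released checkpoint looks roughly like this (the checkpoint and image paths are assumptions; the repo's notebooks cover the full workflow):

```python
import numpy as np
from PIL import Image
from segment_anything import sam_model_registry, SamPredictor

# "vit_h" is the largest released backbone; checkpoint path is assumed local.
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)

# SAM expects an HxWx3 uint8 RGB array.
image = np.array(Image.open("photo.jpg").convert("RGB"))
predictor.set_image(image)

# Prompt with a single foreground point (label 1 = foreground).
masks, scores, logits = predictor.predict(
    point_coords=np.array([[500, 375]]),
    point_labels=np.array([1]),
    multimask_output=True,  # return several candidate masks with scores
)
```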
This repo contains documentation and code needed to use PACO dataset: data loaders and training and evaluation scripts for objects, parts, and attributes prediction models, query evaluation scripts…
PyTorch Implementation of the ICCV 2023 paper: Generalized Differentiable RANSAC ($\nabla$-RANSAC).
SuperGlue: Learning Feature Matching with Graph Neural Networks (CVPR 2020, Oral)
Easily turn large sets of image urls to an image dataset. Can download, resize and package 100M urls in 20h on one machine.
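The downloader is also exposed as a Python function; a minimal sketch, where the file names, shard format, and worker counts are placeholder choices:

```python
from img2dataset import download

# Reads one URL per line from urls.txt, downloads and resizes the images,
# and packages them into webdataset-style tar shards under ./images.
download(
    url_list="urls.txt",
    output_folder="images",
    output_format="webdataset",
    image_size=256,
    processes_count=8,
    thread_count=32,
)
```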
Easily compute CLIP embeddings and build a CLIP retrieval system with them
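A hedged sketch of the query side using the package's ClipClient against a hosted LAION index; the service URL and index name are assumptions that may change over time:

```python
from clip_retrieval.clip_client import ClipClient

# Queries a hosted knn index of LAION images by text.
client = ClipClient(
    url="https://knn.laion.ai/knn-service",  # assumed public endpoint
    indice_name="laion5B-L-14",              # assumed index name
    num_images=10,
)

results = client.query(text="an orange tabby cat")
for r in results:
    print(r["url"], r["similarity"])
```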
An open source implementation of CLIP.
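Typical zero-shot classification with OpenCLIP, close to its README example; the pretrained tag names a LAION-2B checkpoint and the image path is a placeholder:

```python
import torch
import open_clip
from PIL import Image

model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="laion2b_s34b_b79k"
)
tokenizer = open_clip.get_tokenizer("ViT-B-32")

image = preprocess(Image.open("photo.jpg")).unsqueeze(0)
text = tokenizer(["a diagram", "a dog", "a cat"])

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    # Normalize, then score the image against each caption.
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

print(probs)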
Filtering, Distillation, and Hard Negatives for Vision-Language Pre-Training
Implementation of DocFormer: End-to-End Transformer for Document Understanding, a multi-modal transformer-based architecture for the task of Visual Document Understanding (VDU)
DocILE: Document Information Localization and Extraction Benchmark
QuadTree Attention for Vision Transformers (ICLR 2022)
Code for "LoFTR: Detector-Free Local Feature Matching with Transformers", CVPR 2021, T-PAMI 2022
Training and evaluating NBM and SPAM for interpretable machine learning.
🐍 Geometric Computer Vision Library for Spatial AI
Official repository for "Revisiting Weakly Supervised Pre-Training of Visual Perception Models". https://arxiv.org/abs/2201.08371.
Code for Recall@k Surrogate Loss with Large Batches and Similarity Mixup, CVPR 2022.
Code for TCL: Vision-Language Pre-Training with Triple Contrastive Learning, CVPR 2022
Code for DisCo: Remedy Self-supervised Learning on Lightweight Models with Distilled Contrastive Learning
Implementation of [Understanding and Improving Kernel Local Descriptors](https://arxiv.org/abs/1811.11147) using PyTorch.
Official repository of OFA (ICML 2022). Paper: OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework
OpenMMLab Semantic Segmentation Toolbox and Benchmark.
Fast and memory-efficient exact attention
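The core FlashAttention op is exposed as a single function; a minimal sketch, assuming a CUDA GPU and half-precision inputs (shapes here are illustrative):

```python
import torch
from flash_attn import flash_attn_func

# FlashAttention expects (batch, seqlen, num_heads, head_dim) tensors
# in fp16/bf16 on the GPU.
q = torch.randn(2, 2048, 16, 64, device="cuda", dtype=torch.bfloat16)
k = torch.randn(2, 2048, 16, 64, device="cuda", dtype=torch.bfloat16)
v = torch.randn(2, 2048, 16, 64, device="cuda", dtype=torch.bfloat16)

# Exact attention computed tile-by-tile in on-chip SRAM: same output as
# naive softmax(QK^T)V, without storing the full attention matrix.
out = flash_attn_func(q, k, v, causal=True)  # (2, 2048, 16, 64)
```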