Stars
A latent text-to-image diffusion model
Google Research
CLIP (Contrastive Language-Image Pretraining): predict the most relevant text snippet given an image
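At inference time, "predict the most relevant text snippet" reduces to scoring each candidate caption's embedding against the image embedding by cosine similarity. A minimal NumPy sketch of that scoring step, using hand-made toy vectors in place of real CLIP encoder outputs (the repo's actual API is not shown here):

```python
import numpy as np

def most_relevant_text(image_emb, text_embs):
    """Return the index of the text embedding closest to the image
    embedding under cosine similarity (CLIP-style scoring)."""
    img = image_emb / np.linalg.norm(image_emb)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    return int(np.argmax(txt @ img))

# Toy 3-d embeddings standing in for real CLIP outputs (illustrative only).
image = np.array([0.9, 0.1, 0.0])
texts = np.array([
    [0.0, 1.0, 0.0],  # e.g. "a photo of a dog"
    [1.0, 0.2, 0.0],  # e.g. "a photo of a cat" (closest to the image vector)
    [0.0, 0.0, 1.0],  # e.g. "a diagram"
])
print(most_relevant_text(image, texts))  # → 1
```

In the real model the embeddings come from CLIP's image and text encoders; the comparison itself is exactly this normalized dot product.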
This repository contains implementations and illustrative code to accompany DeepMind publications
PRML algorithms implemented in Python
LAVIS - A One-stop Library for Language-Vision Intelligence
The image prompt adapter is designed to enable a pretrained text-to-image diffusion model to generate images from an image prompt.
Links to conference publications in graph-based deep learning
PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Single Shot MultiBox Detector in TensorFlow
VISSL is FAIR's library of extensible, modular and scalable components for SOTA Self-Supervised Learning with images.
Easily compute clip embeddings and build a clip retrieval system with them
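The retrieval system this entry describes boils down to nearest-neighbor search over pre-normalized embeddings. A minimal NumPy sketch under that assumption (the repo itself uses a proper ANN index; the function names here are illustrative, not its API):

```python
import numpy as np

def build_index(embeddings):
    """Pre-normalize database embeddings so cosine-similarity
    retrieval becomes a single matrix-vector product."""
    return embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)

def retrieve(index, query_emb, k=2):
    """Return the indices of the k most similar items to the query."""
    q = query_emb / np.linalg.norm(query_emb)
    scores = index @ q
    return [int(i) for i in np.argsort(-scores)[:k]]

# Toy 2-d "CLIP embeddings" for three indexed items.
index = build_index(np.array([[1.0, 0.0], [0.8, 0.6], [0.0, 1.0]]))
print(retrieve(index, np.array([1.0, 0.1])))  # → [0, 1]
```

At scale the brute-force matmul is replaced by an approximate index, but the normalize-then-dot-product logic is the same.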
[CVPR 2021] Official PyTorch implementation for Transformer Interpretability Beyond Attention Visualization, a novel method to visualize classifications by Transformer based networks.
Text To Video Synthesis Colab
Simple image captioning model
Unofficial implementation of "Prompt-to-Prompt Image Editing with Cross Attention Control" with Stable Diffusion
[ICCV 2021] Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet
[ICCV 2023 Oral] "FateZero: Fusing Attentions for Zero-shot Text-based Video Editing"
This repository is intended to host tools and demos for ActivityNet
deforum / stable-diffusion (forked from CompVis/stable-diffusion)
OpenAI CLIP text encoders for multiple languages!
Code for ICLR 2020 paper "VL-BERT: Pre-training of Generic Visual-Linguistic Representations".
The source code for the paper "Deep Image Spatial Transformation for Person Image Generation"
Code for the CVPR'19 paper "Linkage-based Face Clustering via GCN"