Too lazy to organize my desktop, make gpt + BLIP-2 do it
Updated Oct 24, 2023 · Python
Caption generator using LAVIS and Argos Translate
Caption images across your datasets with state-of-the-art models from Hugging Face and Replicate!
This repository is for profiling, extracting, visualizing, and reusing generative AI weights, with the goal of building more accurate AI models and auditing/scanning weights at rest to identify knowledge domains for risk.
[ACM MM 2024] Improving Composed Image Retrieval via Contrastive Learning with Scaling Positives and Negatives
Modifying LAVIS' BLIP2 Q-former with models pretrained on Japanese datasets.
The Multimodal Model for Vietnamese Visual Question Answering (ViVQA)
Implementation of the Q-Former from BLIP-2 in Zeta Lego blocks.
Official implementation and dataset for the NAACL 2024 paper "ComCLIP: Training-Free Compositional Image and Text Matching"
A true multimodal LLaMA derivative, on Discord!
(AAAI 2024) BLIVA: A Simple Multimodal LLM for Better Handling of Text-rich Visual Questions
[EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding