DetGPT: Detect What You Need via Reasoning


News

  • [2023-05-09] We have launched our project website.
  • [2023-05-08] The first version of DetGPT is available now! Try our demo.

Online Demo

Due to high website traffic, we have created multiple online services. If one link is not working, please use another one. Thank you for your support!

Demo1

Demo2

Demo3

Demo4

Examples

(Example images of DetGPT's detection results.)

Features

  • DetGPT locates target objects, rather than merely describing the image.
  • DetGPT understands complex instructions, such as "Find blood-pressure-reducing foods in the image."
  • DetGPT accurately localizes target objects via LLM reasoning. For example, it can identify bananas as a potassium-rich food that helps alleviate high blood pressure.
  • DetGPT provides answers beyond human common sense, such as identifying unfamiliar fruits that are rich in potassium.

Setup

1. Installation

git clone https://github.com/OptimalScale/DetGPT.git
cd DetGPT
conda create -n detgpt python=3.9 -y
conda activate detgpt
pip install -e .

2. Install GroundingDINO

python -m pip install -e GroundingDINO

3. Download the pretrained checkpoint

Our model is based on pretrained language model checkpoints. In our experiments, we use Robin from the LMFlow team as well as Vicuna, and find that both perform competitively. You can run the following script to download the Robin checkpoint:

cd output_models
bash download.sh all
cd -

Merge the Robin LoRA model with the original LLaMA model and save the merged model to output_models/robin-7b; the corresponding model path is specified in the config file.

To obtain the original LLaMA model, one may refer to this doc. To merge a LoRA model with a base model, one may refer to PEFT or use the merge script provided by LMFlow.
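As a rough illustration, merging a LoRA adapter into its base model with PEFT might look like the sketch below. The function name and all paths are hypothetical placeholders, not part of this repository; the repo's own merge script or LMFlow's may differ in detail.

```python
# Minimal sketch of a LoRA merge using Hugging Face PEFT.
# All paths are placeholders -- point them at your own checkpoints.
def merge_lora_into_base(base_path, lora_path, out_path):
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    # Load the base LLaMA weights, then attach the LoRA adapter on top.
    base = AutoModelForCausalLM.from_pretrained(base_path)
    model = PeftModel.from_pretrained(base, lora_path)

    # merge_and_unload() folds the low-rank updates into the base weights
    # and returns a plain transformers model with no PEFT wrappers.
    merged = model.merge_and_unload()
    merged.save_pretrained(out_path)

    # Keep the tokenizer alongside the merged weights.
    AutoTokenizer.from_pretrained(base_path).save_pretrained(out_path)
```

Called, for instance, as `merge_lora_into_base("path/to/llama-7b", "path/to/robin-7b-lora", "output_models/robin-7b")`, this writes the merged checkpoint to the directory that the config file expects.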

Training

The code will be released soon.

Deploy Demo Locally

Run the demo by executing the following command. Replace 'path/to/pretrained_linear_weights' in the config file with the actual path. We currently release linear weights based on Vicuna-13B-v1.1 and will release other weights later. The demo runs on two GPUs by default: one for the language model and the other for GroundingDINO.

CUDA_VISIBLE_DEVICES=0,1 python demo_detgpt.py --cfg-path configs/detgpt_tasktune_13b_coco.yaml
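For orientation, the line to edit in the config might look something like the fragment below; the key name and nesting here are illustrative only, so check the actual YAML in configs/ for the exact field:

```yaml
# Hypothetical excerpt of configs/detgpt_tasktune_13b_coco.yaml.
# Only the value needs changing; the key name is illustrative.
model:
  linear_weights: path/to/pretrained_linear_weights  # replace with your real path
```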

Acknowledgement

The project is built on top of the amazing open-vocabulary detector GroundingDINO and the multimodal conversation model MiniGPT-4, which in turn builds on BLIP-2 and LAVIS. Thanks for their great work!

If you're using DetGPT in your research or applications, please cite using this BibTeX:

 @misc{detgpt2023,
    title = {DetGPT: Detect What You Need via Reasoning},
    url = {to be finished},
    author = {Pi, Renjie and Gao, Jiahui and Diao, Shizhe and Pan, Rui and Dong, Hanze and Zhang, Jipeng and Yao, Lewei and Kong, Lingpeng and Zhang, Tong},
    month = {May},
    year = {2023}
}

License

This repository is released under the BSD 3-Clause License.
