Scripts for running inference with Oscar on Image Captioning and VQA tasks.
For single-image inference:
Minimum VRAM: 6 GB. You must call `torch.cuda.empty_cache()` to flush the GPU cache after every inference.
Recommended VRAM: 7 GB or more.
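As a minimal sketch of how the cache flush fits into a per-image loop (the names `model` and `image_tensor` are placeholders, not the actual API of this repo's scripts):

```python
import torch

# Minimal sketch of a per-image inference helper. `model` and `image_tensor`
# are placeholders for the Oscar/VinVL objects prepared as shown in the notebooks.
def infer_one(model, image_tensor):
    with torch.no_grad():
        output = model(image_tensor)  # captioning / VQA forward pass
    # Flush cached GPU memory after every inference to stay within the ~6 GB minimum.
    torch.cuda.empty_cache()
    return output
```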
To learn how to use the Oscar models, see the *.ipynb notebooks.
Because of the scene_graph_benchmark repo, the VinVL encoder only supports the default CUDA device. If you want to use a different CUDA device, you must change which GPU is exposed as the default by inserting the code below into your script.
```python
import os

# Select GPU 1; must be set before torch initializes CUDA.
os.environ['CUDA_VISIBLE_DEVICES'] = '1'
```
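Note that `CUDA_VISIBLE_DEVICES` must be set before PyTorch initializes CUDA. Once set, the selected physical GPU (GPU 1 in this example) is exposed to the process as the default device `cuda:0`, which is the device the VinVL encoder uses.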
Image Captioning

Model | BLEU-1 | BLEU-2 | BLEU-3 | BLEU-4 | CIDEr |
---|---|---|---|---|---|
Ours+B(XE) | 72.7 | 54.6 | 36.9 | 23.0 | 118.0 |
Ours+L(XE) | 72.9 | 54.92 | 37.4 | 23.7 | 118.0 |
Ours+B(CIDEr) | 76.9 | 59.7 | 41.6 | 25.6 | 128.6 |
Ours+L(CIDEr) | 76.8 | 59.7 | 41.8 | 25.9 | 128.6 |
Oscar+ | - | - | - | 41.0 | 140.9 |
VQA

Model | Accuracy (%) |
---|---|
Ours+ | 58.1 |
Oscar+ | 64.7 |