Zero-shot performance of YOLOWorldPromptDetector #154
Hi @taofuyu, you need to freeze all parameters (backbone, head, and neck) except the embeddings. However, I need to double-check whether all layers are frozen.
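The freezing rule described above can be sketched as follows. This is a minimal sketch: the parameter names below are illustrative stand-ins, not the exact names in a YOLO-World checkpoint, so check them against your own model.

```python
# Minimal sketch of prompt tuning's freezing rule: keep gradients only
# for parameters whose name mentions "embeddings". Parameter names here
# are illustrative placeholders.
def trainable_mask(param_names, keep="embeddings"):
    """Map each parameter name to whether it should stay trainable."""
    return {name: (keep in name) for name in param_names}

# In PyTorch this would be applied as:
#   for name, p in model.named_parameters():
#       p.requires_grad = keep in name
names = [
    "backbone.stage1.conv.weight",    # frozen
    "neck.top_down_layers.0.weight",  # frozen
    "bbox_head.cls_pred.0.weight",    # frozen
    "embeddings",                     # the only trainable tensor
]
mask = trainable_mask(names)
```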
OK, I will give it a try and report back with the result.
You can evaluate the 4-category detection and the 3-category detection separately, and then perform the joint evaluation.
But the parameters of the backbone, head, and neck are all frozen, and the only updated parameters, the embeddings, are not saved to disk (during inference, the pre-computed embedding file is still used), so it seems nothing in the model actually changes?
This seems to confirm my suspicion. After running 10 epochs, the model can only detect 'car', which appears in the pre-training datasets; the other new categories cannot be detected (they can be detected when the model is not frozen).
@taofuyu Do you know the difference between all fine-tuning and prompt tuning? I'm not clear about the config file for all fine-tuning.
You can compare the two config files with VSCode or a similar diff tool.
@Hudaodao99 It's my fault, I should have started a separate branch to avoid misleading anyone. Prompt tuning only optimizes the embeddings, while all fine-tuning updates all parameters.
|
Thanks for your answer! |
@taofuyu I'll check it. |
@taofuyu I met the same problem. But when prompt tuning on my custom dataset (10 classes), I find that if I provide fewer than 10 text prompts, it raises an error, like this (I provide just 2 text prompts, neither of which is in my dataset, yet the predicted classes go beyond 2): class = [1 2 4 4 3 6] Have you met the same issue?
The detection results still follow the embeddings/num_classes set in the config, while the texts are whatever you typed on the command line; when the two counts differ, the dimensions no longer match.
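The mismatch described above can be guarded against with a quick shape check before inference. This is a sketch; `check_prompts` is a hypothetical helper, not part of the repo:

```python
def check_prompts(texts, embeddings, num_classes):
    """Hypothetical guard: the number of command-line text prompts must
    match both num_classes in the config and the number of rows in the
    pre-computed embedding file, otherwise the head's dimensions mismatch."""
    if not (len(texts) == num_classes == len(embeddings)):
        raise ValueError(
            f"{len(texts)} prompts vs {len(embeddings)} embedding rows "
            f"vs num_classes={num_classes}: these must all agree")

emb_10 = [[0.0] * 512 for _ in range(10)]  # stand-in for a 10-row .npy file
try:
    # 2 prompts against a 10-class config reproduces the reported error
    check_prompts(["cat", "dog"], emb_10, num_classes=10)
    mismatch_caught = False
except ValueError:
    mismatch_caught = True
```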
Thanks! |
I am trying to find a way out of this issue, so I am learning more about OVD algorithms. MM-Grounding-DINO mentions that closed-set fine-tuning loses OVD generality.
Furthermore, it mentions that, |
We did not expect this; the original intention of prompt tuning is to retain the zero-shot capability and generalization while achieving stronger performance on custom datasets.
Thanks, I have already changed the learning rate to 2e-4 for my fine-tuning.
@wondervictor Hi! I'm not quite sure what the difference is between the purpose of |
@taofuyu Hello, how did fine-tuning go after you adjusted the learning rate to 2e-4? Did it solve the problem of losing open-set detection ability after fine-tuning?
I have the same problem. After fine-tuning on my local custom dataset (20 classes, each with a different text prompt), I want to retain the zero-shot ability of the original pretrained CLIP weights, but that does not seem to happen: common prompts like 'person', 'people', and 'human' are still detected, but for my own dataset, the other text prompts are not.
@mio410 No |
Hi @taofuyu, @xiyangyang99, @Hudaodao99, and @mio410, sorry for the delay. I'll check it and provide solutions asap. Please stay tuned and please let me know if you have any updates. |
Can separate inference solve the problem? It occurs to me that interference between the prompts may be the cause. @taofuyu
Sorry, could you please explain this in detail?
One text prompt may interfere with the inference of another; you can refer to the text-guided CSPLayer in the paper. I would also like to use the prompt tuning technique and hope to solve this issue, as mentioned in:
@taofuyu Any update? In case you didn't notice the answer above.
@Yindong-Zhang, ongoing |
I think just tuning on custom data together with GoldG is fine. The model can detect the custom categories and retain its OVD ability at the same time.
Adding VG (or GoldG) to fine-tuning does maintain the zero-shot performance. I'm now seeking more efficient approaches, such as regularization, for efficient fine-tuning.
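The mixing described above could look like the following sketch of an mmengine-style config. Dataset type names follow those used in the YOLO-World configs (e.g. `YOLOv5MixedGroundingDataset`); every path and the empty pipeline here are placeholders you would replace with values from your own base config.

```python
train_pipeline = []  # placeholder: reuse the train_pipeline from your base config

# Custom detection data with per-class text prompts
custom_dataset = dict(
    type='MultiModalDataset',
    dataset=dict(type='YOLOv5CocoDataset',
                 data_root='data/custom/',
                 ann_file='annotations/train.json'),
    class_text_path='data/texts/custom_classes.json',
    pipeline=train_pipeline)

# GoldG-style grounding data (GQA + Flickr) to help preserve zero-shot ability
goldg_dataset = dict(
    type='YOLOv5MixedGroundingDataset',
    data_root='data/mixed_grounding/',
    ann_file='annotations/final_mixed_train_no_coco.json',
    pipeline=train_pipeline)

# Concatenate both sources so every epoch sees custom and grounding data
train_dataloader = dict(
    dataset=dict(type='ConcatDataset',
                 datasets=[custom_dataset, goldg_dataset]))
```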
Hi all, Thank you! |
tuning custom data with GoldG |
Does this mean the grounding dataset is key to building open-vocabulary/zero-shot ability?
Yes, I think so |
Hello, I fine-tuned with COCO+GQA but ran into a problem: no matter how I set the parameters, after a few epochs grad_norm becomes very large, the loss blows up as well, and then stays at 0. Could you advise what causes this?
The config I used is as follows (the comment body is truncated, so most entries are cut off mid-definition):

```python
_base_ = ('../../third_party/mmyolo/configs/yolov8/'
# hyper-parameters
num_classes = 80
load_from = '/mnt/sdc/lishen/yolo-world-model/yolo_world_v2_l_obj365v1_goldg_cc3mlite_pretrain-ca93cd1f.pth'
model = dict(
text_transform = [
train_pipeline = [
mg_train_dataset = dict(type='YOLOv5MixedGroundingDataset',
coco_train_dataset = dict(
    type='MultiModalDataset',
train_dataloader = dict(batch_size=train_batch_size_per_gpu,
test_pipeline = [
val_dataloader = dict(dataset=coco_val_dataset)
val_evaluator = dict(_delete_=True,
default_hooks = dict(
optim_wrapper = dict(optimizer=dict(
```
Hello, do you mean that training the custom dataset with just a GoldG-containing config, such as yolo_world_v2_l_vlpan_bn_2e-3_100e_4x8gpus_obj365v1_goldg_train_1280ft_lvis_minival.py, is enough, without mixing in other datasets such as Flickr or GQA?
GoldG is the collective name for Flickr and those other grounding datasets.
The config looks fine to me; I'm not sure of the exact cause.
Hello, do the embeddings here refer to the text embeddings? How are they updated? Through the I-Pooling Attention?
@taofuyu Hello, I also want to add the GoldG dataset to keep zero-shot ability, but I don't know how to set it up. Could you share your relevant config? If you can, please send it to this email: mr.pengc@foxmail.com. Thank you very much for sharing.
|
I ran into the same issue as before: #71, #78.
I modified the config in configs/prompt_tuning_coco/ and generated a custom embedding file to fine-tune on my dataset, which has 4 categories.
At inference time, I generate a new embedding file with 7 categories (the 4 old classes seen in training plus 3 new classes) and replace the old embedding file in the config.
These 3 new classes CANNOT be detected, even with the score threshold set to 0.01.
It seems the model has lost its open-vocabulary/zero-shot ability.
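For reference, regenerating the embedding file for the 7-category setup could look like the sketch below. Only the normalize-and-save logic is shown; `fake_encode` is a random stand-in for the real CLIP text encoder, which is the part the repo provides, and the output path is a placeholder.

```python
import numpy as np

def build_embedding_file(class_names, encode, out_path):
    """Encode each class name, L2-normalize row-wise, save as float32 .npy."""
    embs = np.stack([encode(name) for name in class_names])    # shape (N, dim)
    embs = embs / np.linalg.norm(embs, axis=1, keepdims=True)  # unit-norm rows
    np.save(out_path, embs.astype(np.float32))
    return embs.shape

# 4 classes seen during prompt tuning plus 3 unseen ones
classes = ["car", "truck", "bus", "van", "bicycle", "scooter", "tricycle"]
# deterministic random stand-in for the CLIP text encoder (512-d features)
fake_encode = lambda n: np.random.default_rng(sum(map(ord, n))).normal(size=512)
shape = build_embedding_file(classes, fake_encode, "/tmp/custom_7cls.npy")
```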