scan loss stuck, performs worse than pretext #113
Since I don't know any details about your dataset (e.g. imbalance), I can't help much. We only provide results for CIFAR, STL and ImageNet. Try experimenting with these datasets first to see how the loss should behave. Play around with the entropy weight, and also check whether updating only the clustering head helps.
@wvangansbeke, really appreciate you keeping up with the repository and replying. Great paper and very clear code. My dataset is quite imbalanced: ~250,000 samples across 5 classes, where the largest class has ~150,000 samples and the smallest about 10,000. The images are of athletes in different uniforms, so the main difference between classes is the clothes they wear. I changed the augmentations to skip color augmentation, since color is crucial here. By your intuition, what should the entropy weight be for such an imbalanced dataset? The pretext part worked like a charm: really high accuracy pretty quickly. Good idea about updating only the clustering head; I'll write an update here after I do some more experiments.
You might want to reduce the entropy weight in this case. Set it to 1 or 2. Note that we applied stronger augmentations for the SCAN part (strong color augmentations etc., see randaugment.py). Not sure if that breaks things for your dataset. Also note that you can't really compare knn accuracy (top20) and classification accuracy directly. Due to the heavy imbalance you will have to try out a few things. If the knn's are good, try overclustering.
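For intuition on what the entropy weight does, here is a minimal NumPy sketch of the SCAN objective from the paper (a consistency term plus a weighted entropy term); the function and variable names are mine, not the repo's:

```python
import numpy as np

def scan_loss(probs, neighbor_probs, entropy_weight=2.0):
    """Rough sketch of the SCAN objective. `probs` and `neighbor_probs`
    are softmax outputs of shape (batch, n_clusters) for anchors and one
    sampled neighbor each."""
    eps = 1e-8
    # Consistency: maximize <p(anchor), p(neighbor)> via its negative log.
    dot = np.sum(probs * neighbor_probs, axis=1)
    consistency = -np.mean(np.log(dot + eps))
    # Entropy of the mean assignment over the batch; a larger weight
    # pushes harder towards equally-sized clusters (hence reducing it
    # for imbalanced data).
    mean_p = probs.mean(axis=0)
    entropy = -np.sum(mean_p * np.log(mean_p + eps))
    return consistency - entropy_weight * entropy
```

Note that when the weighted entropy term dominates the consistency term, the total loss can be negative, which is why the loss sometimes starts below zero.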
@wvangansbeke, what do you mean by stronger augmentations for the SCAN part? Is there any other place where augmentations are happening?
Yes, I meant the strong augmentations in randaugment.py.
Thanks @wvangansbeke! One more question: I was doing some more debugging and found the following. Some of my classes are somewhat similar. So let's say, out of five classes, Class 1 is the biggest one, Class 4 is similar to Class 2, and Class 5 is similar to Class 3. What happens is that the model lumps each similar pair into one supercluster. So all of this is making sense to me now. I wonder how I can "encourage" the model to not create superclusters and instead separate the similar classes from each other. If you have any tips or suggestions, I would appreciate it. Thanks for all your help!
My suggestion is to overcluster (e.g. go to 10 or 20 clusters) and then merge them manually.
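For example, after training with overclustering, the manual merge can be just a lookup table. A sketch (the cluster-to-class mapping below is made up; in practice it comes from inspecting sample images from each cluster):

```python
import numpy as np

# Hypothetical mapping from 10 over-clusters down to 5 merged classes,
# decided by eyeballing samples from each over-cluster.
cluster_to_class = {0: 0, 1: 0, 2: 1, 3: 1, 4: 2,
                    5: 2, 6: 3, 7: 3, 8: 4, 9: 4}

def merge_clusters(predictions, mapping):
    """Remap predicted over-cluster ids to merged class ids."""
    return np.array([mapping[int(c)] for c in predictions])
```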
Thanks @wvangansbeke! In your paper you mention that overclustering can help. I couldn't find more details on the implementation, so I wanted to check how you approached overclustering on CIFAR10 or STL10. Let's say you try to cluster CIFAR10 into 20 clusters. Do you remove all validation in this scenario, since you do not have true labels for 20 clusters? (In that case you wouldn't be able to pick the best model, so I'm surprised the performance doesn't decrease.) Or do you randomly split every true cluster into 2 fake clusters? Say you have 100 images of the cluster "dog": do you split it into 2 clusters "dog-1" and "dog-2" randomly 50/50, or just cluster without any validation checks? Another question: at which point did you merge the classes manually? It seems like the best approach would be to merge them manually after the SCAN step but before self-labeling.
Hi @wvangansbeke, I've been experimenting with your suggestions. I printed out the number of high-probability samples per class after every epoch:
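A per-cluster count of confident predictions like this can be computed in a few lines; a sketch, assuming `probs` holds the (n_samples, n_clusters) softmax output:

```python
import numpy as np

def confident_counts(probs, threshold=0.99):
    """Count samples whose max softmax probability exceeds `threshold`,
    broken down per predicted cluster."""
    preds = probs.argmax(axis=1)          # predicted cluster per sample
    confident = probs.max(axis=1) >= threshold
    # bincount with minlength so empty clusters still show up as 0
    return np.bincount(preds[confident], minlength=probs.shape[1])
```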
Hey @mazatov, I'm curious to know what worked for you. I am facing a similar issue of the SCAN loss plateauing, and sometimes starting from a negative value and plateauing at 0. My dataset's classes are balanced, so I'm not sure what the problem could be.
Also, @wvangansbeke, I loved this work and your implementation equally! I feel it's done beautifully; it took me quite some time to understand the work. I have a couple of questions related to the SCAN loss (including but not limited to the scope of OP's problem). While we are trying to maximize the dot product of a sample and its neighbor, why not also try to minimize the dot product of an image and another image not in the neighbor set? Basically, like contrastive learning, also trying to push apart images from different classes. I have another question about the SCAN loss as well: I'm not sure if this is a problem, so do let me know what you think.
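The push-apart idea above (not part of the paper) could be sketched as an extra term on top of the consistency loss; a rough NumPy sketch, with made-up names and a sampled non-neighbor per anchor:

```python
import numpy as np

def scan_loss_with_negatives(probs, neighbor_probs, negative_probs,
                             neg_weight=1.0):
    """Sketch of the proposed modification: pull anchor/neighbor soft
    assignments together and push the anchor away from a non-neighbor.
    All inputs are (batch, n_clusters) softmax outputs."""
    eps = 1e-8
    # Pull: maximize <p(anchor), p(neighbor)>.
    pull = -np.mean(np.log(np.sum(probs * neighbor_probs, axis=1) + eps))
    # Push: penalize agreement with the negative via log(1 - <p, p_neg>).
    push = -np.mean(np.log(1.0 - np.sum(probs * negative_probs, axis=1) + eps))
    return pull + neg_weight * push
```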
@akshay-iyer, I couldn't get it working, unfortunately.
Hi @mazatov! Did you ever figure this out?
Sorry, no. For me, I was purposely clustering into more clusters to get the small clusters working, so before the self-labeling step I needed to merge classes manually. I never figured out how to run the last step once the number of classes changed.
Hello,
I'm trying to train the model on my own dataset. I successfully trained the pretext model with very good top-20 accuracy (95%; the dataset is pretty simple). However, when I run scan.py the loss gets stuck without any improvement, and the final performance is pretty bad (56%). I wonder what could go wrong in scan.py for the loss to get stuck like that. The only things I changed in the config file were the number of clusters and the crop size. I also wonder if I should be changing anything else here.