RegNet in torchvision? #2655
@blefaudeux thanks for the suggestion! Which tasks do you have in mind for this model: at least classification, right?
Could you detail which use cases the ClassyVision implementation does not cover? Would you like to draft a PR for that? Otherwise, I or someone else can do that. |
Oh, I just meant that not everyone is using ClassyVision, obviously; I have, for instance, come across users telling me that they were sticking to EfficientNets or ResNets because they were only willing to consider Torchvision.
I'm not sure what a model needs in order to be supported by torchvision, apart from the raw code (which I can handle indeed, or anybody else, no preference). Are there some licence pre-requisites, pre-trained models, authorship constraints (validation from the original authors?), things like that? I don't have that much time right now, so if the requirements are clear (or minimal :)) I can handle that starting from the implementation in ClassyVision; otherwise, if there's some know-how required, I would gladly stay around to assist but not do it myself |
Excellent question. I'd say we have to provide the model's implementation and ImageNet pretrained weights. In the docstring we provide some information about the model, a link to the paper, etc. For example, MNasNet: vision/torchvision/models/mnasnet.py Lines 204 to 208 in 190a5f8
ImageNet pretrained weights often come from retraining, but there are cases when they were converted, let's say, from TF weights, etc. If torchvision's implementation will be a copy from ClassyVision, maybe we can reuse their weights if there are any... @fmassa can you also comment on this question, please?
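The convention described above (a documented builder function plus downloadable pretrained weights) can be sketched roughly as follows. Everything here is illustrative: the tiny `RegNet` stand-in, the URL, and the builder name are placeholders for this discussion, not the eventual torchvision code.

```python
import torch
from torch import nn


# Hypothetical stand-in for the real RegNet implementation; the actual
# class would live in torchvision/models/regnet.py.
class RegNet(nn.Module):
    def __init__(self, num_classes: int = 1000) -> None:
        super().__init__()
        self.stem = nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1)
        self.head = nn.Linear(32, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Global-average-pool the feature map, then classify.
        x = self.stem(x).mean(dim=(2, 3))
        return self.head(x)


# Placeholder URL; real weights would be hosted on download.pytorch.org.
model_urls = {"regnet_x_400mf": "https://example.com/regnet_x_400mf.pth"}


def regnet_x_400mf(pretrained: bool = False, progress: bool = True, **kwargs) -> RegNet:
    r"""RegNetX-400MF model from
    `"Designing Network Design Spaces" <https://arxiv.org/abs/2003.13678>`_.

    Args:
        pretrained (bool): If True, returns a model pre-trained on ImageNet
        progress (bool): If True, displays a progress bar of the download to stderr
    """
    model = RegNet(**kwargs)
    if pretrained:
        state_dict = torch.hub.load_state_dict_from_url(
            model_urls["regnet_x_400mf"], progress=progress
        )
        model.load_state_dict(state_dict)
    return model
```

The docstring mirrors the MNasNet example referenced above: model name, paper link, and the `pretrained`/`progress` arguments.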
No worries. I'll send a PR where I copy and adapt the implementation from ClassyVision, and you can check whether we got it right, etc. |
I could provide reference weights from a ClassyVision ImageNet training, which is fairly easy to reproduce (if we're OK with limiting this to some members of the RegNet family, probably not all of them :)). Another option is to translate the weights in the pycls repo/model zoo, but that's some work because the model definition is not exactly the same (even if the actual underlying architecture is, of course). CC @mannatsingh from Classy |
Yes, all model weights are available in pycls; not sure how easy it is to convert them to Classy Vision format. Link: https://github.com/facebookresearch/pycls/blob/master/MODEL_ZOO.md |
@blefaudeux @pdollar thanks for the details! So, we prefer to have certain families of RegNet implemented here as in Classy rather than in the pycls format, right? I'll let you define the families you think are most interesting for users. It would probably be helpful for the future to add some info, here in torchvision, on why we preferred one implementation (classy) over another (pycls) if they are a bit different... |
Sorry for the imprecision; I forgot that the context is clearly not trivial. Trying to address that:
|
@blefaudeux thanks for the explanation! I see the context better now. Yes, this sounds good!
If you have an idea of which RegNetX and RegNetY variants to pick, it would be good to provide them with pretrained weights. Maybe those from the paper's Tables 5 and 6:
What do you think? |
Looks good to me! It's probably worth writing a small tool to transcribe the weights between pycls and this more streamlined implementation; that's something I could do. Not sure whether @mannatsingh has something handy around to help with that? |
@blefaudeux I don't have anything available to convert pycls weights to classy, unfortunately. |
What is the motivation for only including a subset of the models? Is it because they have to be retrained? The reason I ask is that the benefit of RegNets is that they give good-accuracy models across a wide range of flop regimes, as opposed to, say, ResNet, which is typically only optimized for a narrow range of 4GF-12GF (ResNet50-ResNet152). On the other hand, RegNets can be good at very small sizes (200MF) and very large sizes (32GF). The very small and very large models are potentially the most interesting (for, say, mobile and state-of-the-art research). So I would advocate including the full range of models if possible. |
It may be better to figure out how to convert pycls weights to Classy Vision weights. I don't know Classy Vision well, but I can't imagine it being that hard?!? (famous last words :P) |
@pdollar it's not so much an issue with Classy, actually; it's just that some names have changed between the pycls and Classy implementations of the RegNets. I should be able to fix that by loading, mapping the names, and saving again. I'm just a bit wary of something really subtle there, but from a distance it should not be too hard indeed, and probably the best thing to do |
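The load/map/save approach mentioned above can be sketched as a key-renaming pass over a checkpoint's state dict. The prefix pairs below are made up for illustration; the real pycls-to-Classy name changes would have to be read off from the two implementations.

```python
from collections import OrderedDict

# Illustrative prefix mapping only; not the actual pycls/Classy naming.
PREFIX_MAP = {
    "stem.conv.": "stem.block.conv.",
    "s1.b1.": "trunk.block1-0.",
}


def remap_state_dict(state_dict):
    """Return a new state dict with source-style keys renamed.

    Keys matching no known prefix are kept unchanged, so a subtle
    mismatch surfaces later as a load_state_dict error rather than
    being silently dropped.
    """
    remapped = OrderedDict()
    for key, value in state_dict.items():
        new_key = key
        for old_prefix, new_prefix in PREFIX_MAP.items():
            if key.startswith(old_prefix):
                new_key = new_prefix + key[len(old_prefix):]
                break
        remapped[new_key] = value
    return remapped
```

With such a function, the conversion is just `torch.save(remap_state_dict(torch.load("pycls.pth")), "classy.pth")`, followed by a strict `load_state_dict` on the target model to catch any key that slipped through.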
Hi, I'm not sure we would be ready to add RegNet to torchvision yet. We generally have a requirement on the number of citations of the paper containing the model before we include it in torchvision, similarly to what we do for PyTorch. We can reconsider this decision in 6 months. Users can obtain RegNet variants by using PyTorch Image Models https://github.com/rwightman/pytorch-image-models. |
Thanks for the context @fmassa. @blefaudeux one more thing to clarify: in your original note, you mention -
From what I understand, torchvision's implementations are even more strict (even fewer configuration parameters allowed, if any) - @fmassa can correct me if I'm wrong. Also, you should be able to generate any RegNet using the Classy implementation, so I wouldn't want anyone reading this issue to get the wrong impression :) |
@mannatsingh ah ok, that's not what I meant; I did not know about
We try to keep the model implementations fairly simple -- the space of configurations for a model is potentially infinite, and trying to expose too many options can make things very hard to understand for users.
I agree that citations per se are not a perfect metric. But given the amount of research and activity around computer vision nowadays, with 100s of papers every year claiming SOTA, we need some metric to be able to define what should be in torchvision or not -- otherwise we will end up having 100s of models which were all SOTA at their respective submission times, and being SOTA doesn't involve only architectural changes to the model but also the training recipe. We will be adding more information about the criteria for a model / op to be included in torchvision in the CONTRIBUTING.md file, and we have an issue tracking it in #2651. Thanks for the discussion; let us know if there is anything you disagree with / would like to add to the discussion. |
Let's keep this issue open for now to track RegNets |
ResNets are heavy in terms of compute. MobileNets have less compute, but are heavy in terms of memory access (depthwise layers). RegNets provide a good balance between these extremes, especially RegNetX. Highly recommend them. Wish to see them as part of torchvision. |
@mathmanu we are going to be adding RegNets in torchvision in the coming months |
Thank you. Note that RegNetY models are memory intensive due to the Squeeze-and-Excitation layers. Some of the advantage that RegNet provides will be removed by those layers. Doubling the memory transfer requirement to get a very small lift in accuracy is not a good tradeoff, especially for embedded devices. Hence I am also looking forward to torchvision having models without Squeeze-and-Excitation, say RegNetX. If possible, lite versions of MobileNetV3 as well; as a reference, tensorflow/models provides such models, which they call either minimalistic or lite. References:
It's straightforward to extend the existing implementation to support the lite/minimalist versions of MobileNetV3. I think the most appealing reason to support them is that the minimalist version uses ReLU instead of hard swish, and this plays nicely with quantization. Based on the aforementioned references, the minimalistic version exchanges 5.6 accuracy points for roughly a 23% speed improvement, which is definitely non-trivial. On the other hand, the specific versions are not described in the paper, so the only reference for them is the official repo. To answer the question of whether we should add them or not: I think if there is enough demand from the community to support them, we can do it. I personally don't have a use case where it is required, and I think there are other architectures we should offer before considering the minimalist MobileNetV3 models, but I could be wrong. |
Another major industry player using RegNets :-D |
In the coming weeks, I will be working on upstreaming RegNet from Classy Vision to TorchVision. |
Great to hear everyone, thank you :) |
@kazhang please feel free to get in touch with me for any details / code reviews :) |
Not done yet. I'm still training the rest of the models. I will add the pretrained weights in the following days. |
Thank you @kazhang , great work ! |
Converting an existing model to a lite model seems to be pretty easy. Once the model is created, just search through the model and replace torchvision.ops.misc.SqueezeExcitation with torch.nn.Identity. We should also replace torch.nn.Hardswish with torch.nn.ReLU. In addition, if we want to replace torch.nn.ReLU6 with torch.nn.ReLU, that's also easy by the same method. It should be possible to create a utility function to transform any given model into a "lite" model. @datumbox @fmassa What do you think? That would make several embedded-friendly models available in torchvision. We could also think about other transformations (for example, replacing heavy 3x3 non-grouped convolutions with 3x3 depthwise or grouped convolutions), but we don't have to go that far in the initial implementation. |
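The "search through the model and replace" idea can be sketched as a small recursive utility; this is a generic sketch, not torchvision code, and note that swapping out modules this way changes the computation, so the resulting model would need retraining or fine-tuning.

```python
import torch
from torch import nn


def replace_modules(model: nn.Module, replacements) -> nn.Module:
    """Recursively swap submodules whose type appears in `replacements`.

    `replacements` maps a module class to a zero-argument factory for
    the new module, e.g. {nn.Hardswish: nn.ReLU}.
    """
    for name, child in model.named_children():
        swapped = False
        for old_cls, new_factory in replacements.items():
            if isinstance(child, old_cls):
                # Replace the child in place on its parent.
                setattr(model, name, new_factory())
                swapped = True
                break
        if not swapped:
            # Recurse into children that were not replaced.
            replace_modules(child, replacements)
    return model
```

For the "lite" conversion described above, one would call it with something like `{nn.Hardswish: nn.ReLU, nn.ReLU6: nn.ReLU, torchvision.ops.misc.SqueezeExcitation: nn.Identity}` (the SqueezeExcitation entry assumes the torchvision class is importable in your version).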
@mathmanu If you were to do something like that, you would have to retrain from scratch. You might be able to get away with a Personally, I think such model surgeries are best served by writing custom code which meets your exact needs and utilizing PyTorch FX's replace_pattern to reduce the amount of code you write. |
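The FX route mentioned above can be sketched as follows. Note that `replace_pattern` matches graph nodes, so this works for functional calls like `F.hardswish`; a module-form `nn.Hardswish` shows up as a `call_module` node and would instead need the module-swapping approach. The `TinyBlock` model is purely illustrative.

```python
import torch
import torch.nn.functional as F
from torch.fx import symbolic_trace, replace_pattern


class TinyBlock(torch.nn.Module):
    def forward(self, x):
        # Functional call, so it appears as a call_function node in the graph.
        return F.hardswish(x) + 1.0


def pattern(x):
    return F.hardswish(x)


def replacement(x):
    return F.relu(x)


# Trace the model into a GraphModule, then rewrite the matching subgraph.
traced = symbolic_trace(TinyBlock())
replace_pattern(traced, pattern, replacement)
# `traced` now computes relu(x) + 1.0
```

The rewritten `GraphModule` is still an `nn.Module`, so it can be trained or scripted like the original.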
🚀 Feature
Add RegNet trunks in torchvision
Motivation
RegNets were proposed in this paper https://arxiv.org/pdf/2003.13678.pdf; they show very interesting performance and speed. They have been open-sourced already, but are not usable in a straightforward way for people used to having reference models in torchvision. Another implementation is available in ClassyVision (I'm a co-author of that one), but it does not cover all use cases.
Pitch
Start from the ClassyVision RegNet support and implement RegNets in torchvision.
Alternatives
Let users use RegNets from external implementations
Additional context
This has been discussed with @pdollar, one of the RegNet authors. CC @fmassa
cc @vfdev-5