Implementation of `Inceptionv4`, `InceptionResNetv2` and `Xception` #170
Conversation
1. Transition to `Inceptionv3` instead of `Inception3` for standardisation of names
2. Formatting
Looks nearly ready right off the bat, nice job! Just a couple initial comments.
CI times are through the roof... we might have to disable gradtests again (or maybe enable them only for select models).
...how on earth are the second CI times so much lower than the first?
AIUI GHA runners are containerized but not otherwise isolated, so times can vary drastically depending on what else is running on the host.
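On the "enable gradtests only for select models" idea, one way to gate them is behind an opt-in allow-list plus a CI environment variable. The snippet below is only a sketch with hypothetical names (`GRADTEST_MODELS`, `METALHEAD_GRADTESTS`), not the repository's actual test setup.

```julia
# Minimal sketch: run gradient tests only for an opt-in subset of models, gated by an
# environment variable set in CI. The names below are hypothetical.
using Zygote

const GRADTEST_MODELS = ["Inceptionv3", "ResNet"]  # hypothetical allow-list

gradtest_enabled(name) =
    get(ENV, "METALHEAD_GRADTESTS", "false") == "true" && name in GRADTEST_MODELS

function maybe_gradtest(name, model, x)
    gradtest_enabled(name) || return nothing
    # sum(model(x)) produces a scalar loss so Zygote can differentiate w.r.t. the input
    return Zygote.gradient(p -> sum(model(p)), x)
end
```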
(Title changed from `Inceptionv4` and `InceptionResNetv2` to `Inceptionv4`, `InceptionResNetv2` and `Xception`)
Xception seems to have an even worse gradient benchmark:
julia> using Metalhead, Zygote, BenchmarkTools
julia> x = rand(Float32, 299, 299, 3, 2);
julia> model = Xception();
julia> @benchmark Zygote.gradient(p -> sum($model(p)), $x)
BenchmarkTools.Trial: 1 sample with 1 evaluation.
Single result which took 9.255 s (78.29% GC) to evaluate,
with a memory estimate of 10.65 GiB, over 1553316 allocations.
Force-pushed from baf311c to 55565d8
1. Support `pretrain` for the Inception model APIs
2. Group deprecations in a single source file to make stuff more organised
3. Random formatting nitpicks
4. Use a plain broadcast instead of `applyactivation` (see the sketch below)
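Point 4 just means applying the activation with ordinary dot-broadcast syntax instead of a dedicated helper; here is a self-contained sketch (the `layers` vector is purely illustrative, not the PR's actual builder code):

```julia
using Flux

layers = Any[]
# Apply the activation as a plain broadcast rather than via a dedicated helper.
push!(layers, x -> relu.(x))
model = Chain(layers...)

model(rand(Float32, 4, 4))  # applies relu element-wise to the input
```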
This should be ready now, although I'm not sure what exactly to make of the gradient times. I suppose they shouldn't block this PR, but is there anything that could be contributing that I could change?
The memory problems return 😬
Not sure there's much we can do beyond what's already been tried in Metalhead. We just need to do something about
Force-pushed from 418800b to af504d3
Only some minor doc stuff
push!(layers, x -> relu.(x))
append!(layers,
        depthwise_sep_conv_bn((3, 3), inc, outc; pad = 1, bias = false,
                              use_bn = (false, false)))
Is there a use case for `(true, false)` or `(false, true)`? If not, I was thinking it is kinda silly to use `conv_bn` with the keyword `use_bn = false`, because that's just `Conv`. It makes sense for consistency with `depthwise_sep_conv_bn`, but it might be cleaner to just introduce `depthwise_sep_conv` instead of more keywords in multiple places?
I can't say I see one yet, but I think this can be kept for some time... with all the churn over the next two months it might be handy. If not, I'll simplify it later.
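For anyone following the `use_bn` discussion, here is an illustrative implementation (not Metalhead's actual code) showing what the tuple toggles: a BatchNorm after the depthwise stage and/or after the pointwise stage of a depthwise-separable convolution.

```julia
using Flux

# use_bn[1] controls BatchNorm on the depthwise stage, use_bn[2] on the pointwise stage.
function depthwise_sep_conv_bn(kernelsize, inplanes, outplanes;
                               pad = 0, bias = true, use_bn = (true, true))
    layers = Any[Conv(kernelsize, inplanes => inplanes;
                      groups = inplanes, pad = pad, bias = bias)]   # depthwise conv
    use_bn[1] && push!(layers, BatchNorm(inplanes, relu))
    push!(layers, Conv((1, 1), inplanes => outplanes; bias = bias)) # pointwise conv
    use_bn[2] && push!(layers, BatchNorm(outplanes, relu))
    return layers
end

# With use_bn = (false, false) the block reduces to two plain Conv layers, which is
# what the review comment is getting at.
block = depthwise_sep_conv_bn((3, 3), 32, 64; pad = 1, bias = false,
                              use_bn = (false, false))
```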
Co-authored-by: Kyle Daruwalla <daruwalla.k.public@icloud.com>
Ah sorry, I keep missing these minor doc tweaks 🤦🏽 I'll be a bit more thorough next time
No problem! Thanks and nice job!
This PR is my first official contribution as a GSoC student 😁. It does a couple of things:

- Adds two new models (`Inceptionv4` and `InceptionResNetv2`). Internal docs for the layers of these models are left out because they are repetitive and don't really explain much - but if that's a concern I can always fill them in.
- Deprecates `Inception3` in favour of `Inceptionv3`, since the vX notation is being used for models in other libraries (as well as in Metalhead for `MobileNetv3`) and so it makes sense to be explicit about it.

Other things that I happened to fill in because I found them:

- `cat_channels` is now type-stable as it uses `Val(3)` for `dims` - thanks to @ToucheSir pointing that out in some conversations (a sketch follows below).

Notes:

- The first gradient call for `InceptionResNetv2` is absolutely insane, sometimes taking minutes. Subsequent gradients are quite slow as well (in comparison, ViTs, which are heavier models, seem to be faster) and also take a lot of memory. This might be helped by using `Chain(::Vector)`, but I haven't used that yet because I'm not sure if it is causing stability issues while training.

Edit: added the `Xception` model as well
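As a small illustration of the `Val(3)` point above (the actual Metalhead definition may differ in details), a type-stable channel concatenation can be written as:

```julia
# Passing dims as Val(3) encodes the concatenation dimension in the type domain,
# so the output type of cat is inferable.
cat_channels(xs...) = cat(xs...; dims = Val(3))

a = rand(Float32, 7, 7, 8, 1);
b = rand(Float32, 7, 7, 16, 1);
size(cat_channels(a, b))  # (7, 7, 24, 1)
```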