Switch to (Re)TestItems #262

ToucheSir · 2023-12-11T06:50:03Z

The impetus for this PR was twofold:

Make use of the dedicated Julia test action as mentioned in use julia-actions/cache in CI #260 (comment).
Migrate off ReTest, which has not seen any activity in 2 years.

Along the way, I found some additional changes which could either be tackled here or in a follow-up PR:

We don't actually test any CUDA code on GPU CI! Buildkite merely runs the CPU test suite end-to-end, which feels wasteful
We don't have any tests which involve GPU arrays
We weren't testing WideResNet on GHA (fixed)
We aren't testing 1.6 outside of doctests

My feeling is that we'd want to set aside a subset of faster tests for 1.6/nightly/GPU CI. Maybe the smallest variant of each model. Then we can decrease our overall runtime while expanding our version matrix to cover everything we probably should've been covering.

PR Checklist

Tests are added
Documentation, if applicable

test/Project.toml

test/convnet_tests.jl

We can guarantee these test images will always be available, which is not the case for the current sample image.

ToucheSir · 2023-12-13T04:29:56Z

Still needs a bit of work for GPU CI on 1.6 and nightly (possibly disabling the latter for now), but this is mostly good to go.

Some timings:

Group	Model	GHA time (PR)	Buildkite time (master)
AlexNet\|VGG	AlexNet	47.5s	47.3s
AlexNet\|VGG	VGG	2m56.7s	6m54.1s
GoogLeNet\|SqueezeNet\|MobileNet\|MNASNet	GoogLeNet	5m03.4s	9m56.8s
	SqueezeNet	32.3s	46.9s
	MobileNet	32.4s + 52.1s + 2m32.0s	5m29.0s + 57.7s + 13.2s + 18.1s
	MNASNet	1m38.3s	5m29.0s + 57.7s + 13.2s + 18.1s
EfficientNet	EfficientNet	8m00.6s + 6m46.4s	6m51.6s + 4m42.0s
ResNet\|WideResNet	ResNet	16m36.8s	16m41.3s
ResNet\|WideResNet	WideResNet	1m49.0s	2m41.8s
ResNeXt	ResNeXt	8m37.2s	4m40.0s
SEResNet\|SEResNeXt	SEResNet	7m25.8s	6m25.1s
SEResNet\|SEResNeXt	SEResNeXt	2m20.6s	2m35.5s
Res2Net\|Res2NeXt	Res2Net	8m49.7s	7m53.6s
Res2Net\|Res2NeXt	Res2NeXt	34.3s	23.7s
Inception	Inception	7m31.9s = 2m12.7s + 1m34.7s + 2m26.1s + 1m10.6s	6m52.9s
DenseNet	DenseNet	8m56.4s	8m48.7s
Unet	Unet	3m44.7s	3m21.8s
ConvNeXt\|ConvMixer	ConvNeXt	4m40.1s	6m06.1s
ConvNeXt\|ConvMixer	ConvMixer	3m43.5s	4m16.3s
MLP-Mixer\|ResMLP\|gMLP	MLP-Mixer	3m10.9s	3m39.8s
	ResMLP	1m58.4s	2m31.7s
	gMLP	1m58.9s	2m19.5s
ViT	ViT	1m34.4s	1m02.0s

It appears we spend a lot of time compiling, as evidenced by the large time savings when similar models are run one after another. ViTs are an outlier despite their relative runtime slowness because they use the (type unstable under AD) Vector Chain. I wonder if we should explore expanding that to more models or having a serious look into other ideas for reducing compile times, but that's a discussion for another PR.

theabhirath · 2023-12-14T03:33:26Z

It appears we spend a lot of time compiling, as evidenced by the large time savings when similar models are run one after another. ViTs are an outlier despite their relative runtime slowness because they use the (type unstable under AD) Vector Chain. I wonder if we should explore expanding that to more models or having a serious look into other ideas for reducing compile times, but that's a discussion for another PR.

During my GSoC, we explored this and I had noticed that when training, the Vector Chain gave me extremely bumpy loss curves – one of the reason we removed them from 0.7 to 0.8. A lot of this can come back slowly if we train more to isolate the exact problem, I think.

ToucheSir · 2023-12-15T00:25:01Z

With the renewed interest in #198 (comment), now may be the time to revisit what's causing these mysterious instabilities during training. Shall we continue the discussion there?

test/model_tests.jl

Co-authored-by: Kyle Daruwalla <daruwalla.k.public@icloud.com>

`reclaim` to load the CUDA driver and fails otherwise

50% per worker so we avoid

ToucheSir · 2023-12-23T01:04:17Z

Ok, Buildkite is happy and so am I. This should be good to go. We now should have a pretty good picture of what works and doesn't on GPU too!

Switch to (Re)TestItems

29ab7f8

darsnack reviewed Dec 11, 2023

View reviewed changes

test/Project.toml Outdated Show resolved Hide resolved

Fixup deps and use TestItemRunner on 1.6

9904b41

ToucheSir force-pushed the bc/testitems branch from f3dba17 to 68b31c6 Compare December 11, 2023 23:49

Remove incompatible dep on 1.6 CI

61abd23

ToucheSir force-pushed the bc/testitems branch from 68b31c6 to 61abd23 Compare December 11, 2023 23:57

darsnack reviewed Dec 12, 2023

View reviewed changes

test/convnet_tests.jl Outdated Show resolved Hide resolved

darsnack and others added 2 commits December 11, 2023 19:47

Update convnet_tests.jl

7b67124

Add GPU path, fast path, parallelism and 1.6

2eac7f4

ToucheSir force-pushed the bc/testitems branch from 29397bf to 2eac7f4 Compare December 12, 2023 04:33

fixup test item name groups

4dfdbd8

darsnack closed this Dec 12, 2023

darsnack reopened this Dec 12, 2023

ToucheSir added 3 commits December 12, 2023 17:09

Use TestImages to avoid flaky download

0a57226

We can guarantee these test images will always be available, which is not the case for the current sample image.

fixup gradient call

046603d

Fixup tests and tweak env var handling

da5be3f

Proper test group regexes

096025c

darsnack reviewed Dec 15, 2023

View reviewed changes

test/model_tests.jl Outdated Show resolved Hide resolved

ToucheSir and others added 2 commits December 14, 2023 19:33

wrong env var name

13b0d24

Co-authored-by: Kyle Daruwalla <daruwalla.k.public@icloud.com>

Mark tests broken on 1.6 + CUDA

eee59a9

ToucheSir force-pushed the bc/testitems branch from b00c6c6 to eee59a9 Compare December 15, 2023 04:10

More broken GPU tests and better GPU memory cleanup

96bbc70

ToucheSir mentioned this pull request Dec 22, 2023

Add training benchmarking script #264

Draft

ToucheSir added 4 commits December 21, 2023 20:42

Don't reclaim in tests on non-GPU systems

b1ed20d

`reclaim` to load the CUDA driver and fails otherwise

missed broken test

99ca13a

1.6 GPU works better than 1.7+???

6112dc7

another missed test

574c55e

ToucheSir added 4 commits December 22, 2023 07:59

reduce worker count

64418b2

unbroken test

672d579

try memory limit

fc4e3a7

50% per worker so we avoid

maybe the memory limit doesn't work

929aca0

darsnack approved these changes Dec 23, 2023

View reviewed changes

ToucheSir merged commit b2ec1d6 into master Dec 23, 2023
41 of 42 checks passed

ToucheSir deleted the bc/testitems branch December 23, 2023 01:25

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Switch to (Re)TestItems #262

Switch to (Re)TestItems #262

ToucheSir commented Dec 11, 2023

ToucheSir commented Dec 13, 2023

theabhirath commented Dec 14, 2023

ToucheSir commented Dec 15, 2023

ToucheSir commented Dec 23, 2023

Switch to (Re)TestItems #262

Switch to (Re)TestItems #262

Conversation

ToucheSir commented Dec 11, 2023

PR Checklist

ToucheSir commented Dec 13, 2023

theabhirath commented Dec 14, 2023

ToucheSir commented Dec 15, 2023

ToucheSir commented Dec 23, 2023