No shuffling of examples in introduction notebook #54
Labels: bug
jfb54 added a commit to cambridge-mlg/cnaps that referenced this issue on Dec 22, 2020
jfb54 added a commit to cambridge-mlg/adv-fsl that referenced this issue on Dec 23, 2020
jfb54 added a commit to cambridge-mlg/melloo that referenced this issue on Dec 29, 2020
We realized that in the introduction notebook, the usage examples given for `make_multisource_episode_pipeline` did not set the `shuffle_buffer_size` parameter, which defaults to not shuffling examples within each class (a corrected call is sketched after the list below).

Two unfortunate consequences we identified in code that would not shuffle examples are:

- Evaluation results on the `traffic_sign` dataset were overly optimistic, since the examples were organized as 30-image sequences of pictures of the same physical sign (successive frames from the same video), leading to support and query examples frequently being very close to each other.
- The validation procedure on unshuffled examples may also produce biased results, depending on how it was carried out, which could lead to sub-optimal results.
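For concreteness, here is a hedged sketch of what the corrected notebook call could look like. Only `make_multisource_episode_pipeline` and `shuffle_buffer_size` are taken from this issue; the other argument names and values (`all_dataset_specs`, the ontology lists, the `EpisodeDescriptionConfig` arguments, the buffer size of 300) are assumptions modeled on the notebook's usage and may differ from the actual code:

```python
from meta_dataset.data import config
from meta_dataset.data import pipeline

# Assumed to have been built earlier in the notebook (hypothetical names):
# all_dataset_specs: list of dataset specifications
# use_dag_ontology_list / use_bilevel_ontology_list: per-dataset booleans
# split: the split to sample episodes from
dataset_episodic = pipeline.make_multisource_episode_pipeline(
    dataset_spec_list=all_dataset_specs,
    use_dag_ontology_list=use_dag_ontology_list,
    use_bilevel_ontology_list=use_bilevel_ontology_list,
    episode_descr_config=config.EpisodeDescriptionConfig(
        num_ways=None, num_support=None, num_query=None),
    split=split,
    image_size=84,
    # The point of this issue: without an explicit value, examples within
    # each class are not shuffled, so e.g. the 30-frame traffic_sign
    # sequences land intact in both support and query sets.
    shuffle_buffer_size=300)
```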
Code using the training loop of Meta-Dataset was not affected, since it gets its `shuffle_buffer_size` value from a `DataConfig` object set from a gin configuration that is explicitly passed to the `Trainer`'s constructor (in `all.gin` and `imagenet.gin`).
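The gin path therefore does the right thing by construction. As a sketch of how such a binding is consumed, using a stand-in class rather than Meta-Dataset's real `DataConfig` (the binding text is an assumption about what `all.gin` / `imagenet.gin` contain, and the value 1000 is illustrative):

```python
import gin


@gin.configurable
class DataConfig:
    """Stand-in for Meta-Dataset's DataConfig, for illustration only."""

    def __init__(self, shuffle_buffer_size=None):
        self.shuffle_buffer_size = shuffle_buffer_size


# The kind of binding the gin files are described as providing; the real
# files may use a different value.
gin.parse_config("DataConfig.shuffle_buffer_size = 1000")

assert DataConfig().shuffle_buffer_size == 1000
```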
We have mitigated the first point by updating the dataset conversion code to shuffle the `traffic_sign` images once (3512a82) and by updating the notebook to show a better practice (c3f62a1), but existing converted datasets, as well as code inspired by the notebook (outside of this repository), are still impacted.
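The conversion-side fix amounts to breaking up the 30-frame sequences once, at record-writing time. A minimal sketch of the idea (not the actual code from 3512a82; `writer` and `serialize_example` are hypothetical stand-ins for the converter's record-writing machinery):

```python
import random


def write_class_records(image_paths, writer, serialize_example, seed=0):
    """Write one class's examples in a randomized order, so that consecutive
    records are no longer consecutive frames from the same video."""
    rng = random.Random(seed)  # fixed seed keeps the conversion reproducible
    shuffled = list(image_paths)
    rng.shuffle(shuffled)
    for path in shuffled:
        writer.write(serialize_example(path))
```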
Similarly, `make_multisource_batch_pipeline` does not pass a `shuffle_buffer_size`, but the impact seems much smaller: batch training should be less sensitive to the order of examples, and the random mixing of different classes already adds randomness.
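Even so, passing the parameter there is cheap. The same caveats as the episodic sketch above apply: only the function name and `shuffle_buffer_size` come from this issue, and the remaining argument names and values are assumptions:

```python
# Hypothetical batch-pipeline call with explicit within-class shuffling;
# `pipeline`, `all_dataset_specs`, and `split` as in the episodic sketch.
dataset_batch = pipeline.make_multisource_batch_pipeline(
    dataset_spec_list=all_dataset_specs,
    split=split,
    image_size=84,
    batch_size=256,  # illustrative value
    shuffle_buffer_size=300)
```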