-
Notifications
You must be signed in to change notification settings - Fork 2.1k
ChunkTeacher no longer requires num_examples/num_episodes to be correct #3681
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
awesome. should we require a user to implement num_examples
now? (setting it as an abstractmethod requires the implementation)
# (This isn't relevant for the training loop, which loops for ever and | ||
# never "epochs"). | ||
retval, fake_epoch_done = super().next_example() | ||
real_epoch_done = retval.is_padding() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume this works OK with dynamic batch padding, since that occurs at a level higher than the teacher?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this has been tested with all variants of batching and ensured that all examples are seen.
You still have to implement num_examples, as that determines the behavior of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
awesome
Patch description
For as long as we've had ChunkTeacher, it's required that the number of episodes and examples be exactly reported correctly, or else it would cause a hang. With this patch, we loosen that requirement.
Methodology:
train:ordered
was previously cycling, when it shouldn't.Testing steps
Manual testing.
New CI tests that stand up chunk teachers that lie about their number examples/episodes, and check that the number of unique messages seen is exactly correct.