Description
Issue encountered
I noticed that the `greedy_until` function in `TransformersModel` uses excessive padding. In my case, I have a test set where the largest input has 27k tokens but most of the inputs are under 8k tokens. The current implementation passes `max_context_continuation_size_allowed` as the `max_length` to the tokenizer, which corresponds to the number of tokens of the largest sample in the entire dataset plus the maximum number of output tokens. This unnecessarily increases the evaluation time.
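To make the cost concrete, here is a rough back-of-the-envelope sketch (the batch size, lengths, and generation budget below are illustrative, not measured): a batch that contains only short samples still gets padded out to roughly the dataset-wide maximum per row.

```python
# Illustrative only: rough count of padded tokens per batch under both strategies.
dataset_lengths = [27_000] + [8_000] * 31   # one long outlier, the rest short
max_new_tokens = 256                        # assumed generation budget
batch = dataset_lengths[1:9]                # a batch made up of short samples only

pad_to_dataset_max = max(dataset_lengths) + max_new_tokens  # current behaviour
pad_to_batch_max = max(batch) + max_new_tokens              # proposed behaviour

print(len(batch) * pad_to_dataset_max)  # 218_048 padded tokens in the batch
print(len(batch) * pad_to_batch_max)    # 66_048 padded tokens, ~3.3x fewer
```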
Solution/Feature
Instead of using `max_context_continuation_size_allowed` when tokenizing the batch contexts, it would be better to use something like this (untested):
```python
largest_sample_in_batch = len(batch[0].tokenized_context)  # longest context in this batch, not in the whole dataset
max_generation_size = batch[0].generation_size if batch[0].generation_size else self.max_length - largest_sample_in_batch
max_length = min(largest_sample_in_batch + max_generation_size, self.max_length)
tokenized = self.tokenizer(
    ...
    max_length=max_length,  # only this needs to change
    ...
).to(self.device)
```
The calculations are essentially the same as the ones already done in the code; the only difference is that `max_length` is determined from the first sample in the batch rather than the first sample in the entire dataset.
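For clarity, here is a small self-contained sketch of that per-batch computation (the `Request` container and `model_max_length` argument are stand-ins for whatever lighteval actually passes around, so treat it as illustrative rather than a drop-in patch). It takes the max over the whole batch instead of relying on `batch[0]` being the longest sample, which may be redundant if batches are already sorted by length:

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Request:
    tokenized_context: List[int]    # token ids of the prompt
    generation_size: Optional[int]  # per-request generation budget, if any

def batch_max_length(batch: List[Request], model_max_length: int) -> int:
    # Longest context in this batch rather than in the whole dataset.
    largest_sample_in_batch = max(len(r.tokenized_context) for r in batch)
    max_generation_size = (
        batch[0].generation_size
        if batch[0].generation_size
        else model_max_length - largest_sample_in_batch
    )
    return min(largest_sample_in_batch + max_generation_size, model_max_length)

# A batch of short prompts is padded to 1_300 tokens instead of the dataset maximum.
batch = [Request(list(range(1_200)), 100), Request(list(range(900)), 100)]
print(batch_max_length(batch, model_max_length=4_096))  # 1300
```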
If you think this makes sense, I could open a pull request.