Description
While examining the source code of `TokenCountBatchingStrategy`, I found a problem with how the `batch()` method splits documents into batches and saves them.
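When the running token count exceeds `maxInputTokenCount`, the method saves the current batch, creates a new `currentBatch` instance, and resets `currentSize` to 0. However, the `Document` being processed in that same iteration is then added to the new `currentBatch`, and its token count is not carried over.

As a result, every time `currentSize` is reset, the token count of the document that triggered the reset is lost. `currentSize` therefore underestimates the size of the new batch, and that batch can end up exceeding `maxInputTokenCount`.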
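To illustrate the accounting error, the sketch below replays the same loop logic on plain token counts instead of `Document` objects. The class name `BuggyBatchTrace`, the integer-only simplification, and the final flush of the last partial batch are assumptions for illustration; the snippet in this issue does not show how the last batch is handled.

```java
import java.util.ArrayList;
import java.util.List;

public class BuggyBatchTrace {

    // Hypothetical replay of the loop from the issue: each Document is replaced by its token count.
    static List<List<Integer>> batch(List<Integer> tokenCounts, int maxInputTokenCount) {
        List<List<Integer>> batches = new ArrayList<>();
        List<Integer> currentBatch = new ArrayList<>();
        int currentSize = 0;
        for (int tokens : tokenCounts) {
            currentSize += tokens;
            if (currentSize > maxInputTokenCount) {
                batches.add(currentBatch);
                currentBatch = new ArrayList<>();
                currentSize = 0; // the current item's tokens are dropped here...
            }
            currentBatch.add(tokens); // ...even though the item goes into the new batch
        }
        // Assumed tail handling: flush the last partial batch.
        if (!currentBatch.isEmpty()) {
            batches.add(currentBatch);
        }
        return batches;
    }

    // Actual number of tokens in a batch.
    static int sum(List<Integer> batch) {
        return batch.stream().mapToInt(Integer::intValue).sum();
    }

    public static void main(String[] args) {
        for (List<Integer> b : batch(List.of(60, 60, 60, 60), 100)) {
            System.out.println(b + " -> " + sum(b) + " tokens");
        }
    }
}
```

With four documents of 60 tokens each and a limit of 100, this produces the batches `[60]`, `[60, 60]` and `[60]`. The second batch holds 120 tokens even though the limit is 100, because the first document in it was never counted.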
Code snippet:
```java
for (Map.Entry<Document, Integer> entry : documentTokens.entrySet()) {
    Document document = entry.getKey();
    currentSize += entry.getValue();
    if (currentSize > this.maxInputTokenCount) {
        batches.add(currentBatch);
        currentBatch = new ArrayList<>();
        currentSize = 0; // the tokens of `document` are discarded here...
    }
    currentBatch.add(document); // ...but `document` is still added to the new batch
}
```
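One possible fix, sketched below under the same simplification (token counts instead of `Document` objects; `FixedBatchTrace` is a hypothetical name, not the upstream code), is to check whether adding the current item would overflow the batch *before* adding it, and then always count its tokens toward the batch it actually lands in. The `!currentBatch.isEmpty()` guard avoids saving an empty batch when a single item is larger than the limit; such an item still ends up alone in its own oversized batch.

```java
import java.util.ArrayList;
import java.util.List;

public class FixedBatchTrace {

    // Hypothetical corrected loop: decide on overflow first, then add the item and its tokens together.
    static List<List<Integer>> batch(List<Integer> tokenCounts, int maxInputTokenCount) {
        List<List<Integer>> batches = new ArrayList<>();
        List<Integer> currentBatch = new ArrayList<>();
        int currentSize = 0;
        for (int tokens : tokenCounts) {
            // Start a new batch only if this item would push the current one over the limit.
            if (currentSize + tokens > maxInputTokenCount && !currentBatch.isEmpty()) {
                batches.add(currentBatch);
                currentBatch = new ArrayList<>();
                currentSize = 0;
            }
            currentBatch.add(tokens);
            currentSize += tokens; // tokens are always counted in the batch that holds the item
        }
        if (!currentBatch.isEmpty()) {
            batches.add(currentBatch);
        }
        return batches;
    }

    public static void main(String[] args) {
        System.out.println(batch(List.of(60, 60, 60, 60), 100));
        System.out.println(batch(List.of(40, 50, 30, 80), 100));
    }
}
```

With this ordering, `[60, 60, 60, 60]` splits into four single-item batches, and `[40, 50, 30, 80]` splits into `[40, 50]`, `[30]` and `[80]`, so no batch exceeds the limit of 100.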
I apologize for my poor English. This bug description was generated through machine translation.
