Fix issue #4791: alloc causes compute_size to be calculated incorrectly in train-text-from-scratch, end result core dump #5033
Between releases b1618 and b1680, train-text-from-scratch broke: compute_size comes back exactly equal to ggml_allocr_max_size, which of course is not possible. I ran into this when trying to train and saw that it had already been reported in a few places, one of them being #4791.
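For context, the compute buffer in this example is sized with a measure allocator: the graph is built once against a measuring ggml_allocr, and the real buffer is then allocated from ggml_allocr_max_size plus alignment. Below is a minimal sketch of that pattern, assuming the ggml-alloc API from around b1618 (ggml_allocr_new_measure / ggml_allocr_alloc_graph / ggml_allocr_max_size); build_train_graph is a hypothetical stand-in for the example's real graph construction:

```cpp
#include "ggml.h"
#include "ggml-alloc.h"
#include <cstddef>

// Hypothetical stand-in for the example's graph construction.
struct ggml_cgraph * build_train_graph(struct ggml_context * ctx);

static const size_t tensor_alignment = 32;

size_t measure_compute_size(struct ggml_context * ctx) {
    // Measuring allocator: records peak usage instead of allocating real memory.
    struct ggml_allocr * alloc = ggml_allocr_new_measure(tensor_alignment);
    struct ggml_cgraph * gf = build_train_graph(ctx);
    ggml_allocr_alloc_graph(alloc, gf);
    // The size must be read from *this* allocator while it is still alive;
    // querying the wrong or stale allocator yields garbage like the
    // 140477909098560-byte value shown below.
    size_t max_compute_size = ggml_allocr_max_size(alloc) + tensor_alignment;
    ggml_allocr_free(alloc);
    return max_compute_size;
}
```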
Current master as of Jan 18, 2024:
main: compute_size = 140477909098560 bytes (133970176.0 MB)
main: evaluation order = RIGHT_TO_LEFT
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
Aborted (core dumped)
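The printed number is the tell: 140477909098560 bytes is roughly 128 TiB, far beyond any plausible compute buffer, and the crash follows as soon as the example tries to back it with host memory. A sketch of that failure mode, assuming the buffer is a std::vector<uint8_t> resized to compute_size (the variable name mem_compute_data is illustrative):

```cpp
#include <cstdint>
#include <cstdio>
#include <vector>

int main() {
    // The bogus size reported by the broken build.
    const size_t compute_size = 140477909098560ULL;  // ~128 TiB
    std::vector<uint8_t> mem_compute_data;
    // resize() must actually commit the storage, so a request this large
    // throws std::bad_alloc, which is exactly the abort seen above.
    mem_compute_data.resize(compute_size);
    printf("never reached\n");
    return 0;
}
```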
After the fix, compute_size is back to a normal, reasonable value:
main: input_size = 131076128 bytes (125.0 MB)
main: compute_size = 701759840 bytes (669.3 MB)
main: evaluation order = LEFT_TO_RIGHT
main: tokenize training data
tokenize_file: total number of samples: 27520
main: number of training tokens: 27584
main: begin training
main: work_size = 768376 bytes (0.7 MB)
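Not part of this fix, but a cheap guard of this shape would have turned the bogus measurement into a readable error instead of a std::bad_alloc abort; the check and its threshold are purely illustrative:

```cpp
#include <cstdio>
#include <cstdlib>

// Illustrative sanity check (hypothetical, not in the PR): fail loudly if the
// measured compute size is absurd before trying to allocate it.
void check_compute_size(size_t compute_size) {
    const size_t sane_limit = 1ULL << 40;  // 1 TiB, arbitrary upper bound
    if (compute_size == 0 || compute_size > sane_limit) {
        fprintf(stderr, "invalid compute_size = %zu bytes\n", compute_size);
        exit(1);
    }
}
```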