Fix issue #4791 alloc causes compute_size to be calculated incorrectly in train-text-from-scratch, end result core dump #5033

Merged (2 commits into ggerganov:master on Jan 19, 2024)

Conversation

@bzuzo (Contributor) commented Jan 19, 2024

Between releases b1618 and b1680, train-text-from-scratch broke: compute_size comes back as exactly the same size as ggml_allocr_max_size, which of course is not possible.

I ran into this when trying to train, and it has been reported in a few places, one of them being #4791.

Current master as of Jan 18, 2024:
main: compute_size = 140477909098560 bytes (133970176.0 MB)
main: evaluation order = RIGHT_TO_LEFT
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
Aborted (core dumped)

After the fix, the sizes are back to normal and reasonable:

main: input_size = 131076128 bytes (125.0 MB)
main: compute_size = 701759840 bytes (669.3 MB)
main: evaluation order = LEFT_TO_RIGHT
main: tokenize training data
tokenize_file: total number of samples: 27520
main: number of training tokens: 27584
main: begin training
main: work_size = 768376 bytes (0.7 MB)
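
For illustration, a minimal standalone sketch of one way a measured size can come out as garbage like the 140477909098560 bytes above: querying a "measure" allocator's maximum size is only meaningful while the allocator is still alive. The sketch uses a hypothetical mock measure_alloc type, not ggml's real allocator API, and the exact interaction inside ggml-alloc is not spelled out in this thread; the merged change (see the commit messages further down) simply removes a ggml_allocr_free call.

```cpp
// Illustrative only: a hypothetical mock "measure" allocator, not ggml's API.
#include <cstdio>
#include <cstddef>

struct measure_alloc {
    size_t max_size; // running maximum of all sizes recorded while measuring
};

static measure_alloc * measure_new()                                { return new measure_alloc{0}; }
static void            measure_record(measure_alloc * a, size_t n)  { if (n > a->max_size) a->max_size = n; }
static size_t          measure_max_size(const measure_alloc * a)    { return a->max_size; }
static void            measure_free(measure_alloc * a)              { delete a; }

int main() {
    measure_alloc * alloc = measure_new();

    // Pretend to measure the compute graph of one evaluation order.
    measure_record(alloc, 701759840); // ~669 MB, matching the fixed run above

    // Correct order: read the measured size first, free the allocator afterwards.
    const size_t compute_size = measure_max_size(alloc);
    measure_free(alloc);

    // Reading measure_max_size(alloc) *after* measure_free(alloc) would be a
    // use-after-free: undefined behaviour that can yield a huge garbage size
    // and a later std::bad_alloc, consistent with the broken run above.
    printf("compute_size = %zu bytes (%.1f MB)\n",
           compute_size, compute_size / (1024.0 * 1024.0));
    return 0;
}
```

Built with `g++ -std=c++11`, this prints `compute_size = 701759840 bytes (669.3 MB)`, matching the fixed run.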

@bzuzo changed the title from "Fix issue with alloc causing max_compute_size to be calculated" to "Fix issue #4791 with alloc causing max_compute_size to be calculated" on Jan 19, 2024
@bzuzo changed the title from "Fix issue #4791 with alloc causing max_compute_size to be calculated" to "Fix issue #4791 alloc causes compute_size to be calculated incorrectly in train-text-from-scratch, end result core dump" on Jan 19, 2024
@ggerganov (Owner)

Instead of this, let's implement the fix from this comment: #4791 (comment)

If you can confirm it works in your case, we can merge

@bzuzo (Contributor, Author) commented Jan 19, 2024

Okay, I'll try that suggestion.

> Instead of this, let's implement the fix from this comment: #4791 (comment)
>
> If you can confirm it works in your case, we can merge

I implemented the fix as suggested, and it works.

main: compute_size = 701759840 bytes (669.3 MB)
main: evaluation order = LEFT_TO_RIGHT
main: tokenize training data
tokenize_file: total number of samples: 755165
main: number of training tokens: 755229
main: begin training
main: work_size = 768376 bytes (0.7 MB)
train_opt_callback: iter= 256 sample=4113/755165 sched=0.862144 loss=0.000000 |->
train_opt_callback: iter= 257 sample=4129/755165 sched=0.861124 loss=5.805915 dt=00:00:04 eta=00:19:10 |->

@ggerganov merged commit 381ee19 into ggerganov:master on Jan 19, 2024
jordankanter pushed a commit to jordankanter/llama.cpp that referenced this pull request on Feb 3, 2024:
* Fix issue with alloc causing max_compute_size to be calculated
* remove ggml_allocr_free as suggested in issue ggerganov#4791

hodlen pushed a commit to hodlen/llama.cpp that referenced this pull request on Apr 1, 2024:
* Fix issue with alloc causing max_compute_size to be calculated
* remove ggml_allocr_free as suggested in issue ggerganov#4791