
Conversation

winglian (Collaborator)

fixes #1753

@winglian force-pushed the llama-pretrain-fix branch from 6e80490 to 9466e68 on July 17, 2024 at 11:18
@winglian merged commit 976f851 into main on July 17, 2024
@winglian deleted the llama-pretrain-fix branch on July 17, 2024 at 14:58
djsaunde pushed a commit that referenced this pull request on Dec 17, 2024
* fixes to accelerator so that iterable pretraining datasets work

* fix the pretraining test params

* split batches, not dispatch batches needs to be set

* update c4 datasets

* set epochs in pretrain config test

* need to set both split_batches and dispatch_batches to false for pretraining

* fix bool val in comment
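
The commits above center on disabling batch splitting and batch dispatching in the Accelerator so that iterable (streaming) pretraining datasets work. A minimal sketch of that configuration is below, assuming a recent `accelerate` release that exposes `DataLoaderConfiguration`; the dataset choice and batch size are illustrative placeholders, not axolotl's actual wiring.

```python
# Sketch: configuring Accelerate for an iterable (streaming) pretraining dataset.
# Assumes accelerate >= 0.28, where the dataloader options live on
# DataLoaderConfiguration; older releases accept the same flags directly on
# Accelerator(...).
from accelerate import Accelerator
from accelerate.utils import DataLoaderConfiguration
from datasets import load_dataset
from torch.utils.data import DataLoader

# Both flags must be False for pretraining: an IterableDataset has no known
# length to split on, and dispatching batches from rank 0 can desynchronize
# the other processes.
dataloader_config = DataLoaderConfiguration(
    split_batches=False,
    dispatch_batches=False,
)
accelerator = Accelerator(dataloader_config=dataloader_config)

# Streaming C4; the "update c4 datasets" commit points the tests at allenai/c4.
dataset = load_dataset("allenai/c4", "en", split="train", streaming=True)
dataloader = DataLoader(dataset, batch_size=8)

# prepare() now leaves batch preparation to each process instead of rank 0.
dataloader = accelerator.prepare(dataloader)
```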
Successfully merging this pull request may close these issues.

TinyLlama pretrain fails, but SFT works -- CUDA error: an illegal memory access was encountered