Open
Description
I hope you are doing well. I came across a reference to the "Web Pipeline" in the paper "Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research" and I am very interested in exploring it further. However, it seems that the pipeline is still in preparation. I would like to kindly inquire about the availability of the "Web Pipeline". Is there any information on when it might be released for public use?
Metadata
Metadata
Assignees
Labels
No labels