Question about preparing pre-training data
#511
-
The pre-training corpus does not need any special processing. You can separate documents with a single blank line.
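As a rough illustration of the blank-line convention described above, here is a minimal Python sketch that writes a list of articles into one txt file. The function names (`normalize_document`, `build_corpus`, `write_corpus`) are hypothetical and not part of this project. Dropping blank lines inside an article is an assumption made here so that a blank line only ever marks a document boundary; the reply does not state this requirement. Single line breaks between paragraphs are left as they are.

```python
from pathlib import Path
from typing import Iterable


def normalize_document(text: str) -> str:
    """Strip every line and drop empty ones.

    Assumption (not from the reply): blank lines inside an article are
    removed so that a blank line only appears between documents.
    Single newlines between paragraphs are preserved.
    """
    lines = [line.strip() for line in text.splitlines()]
    return "\n".join(line for line in lines if line)


def build_corpus(documents: Iterable[str]) -> str:
    """Join documents with exactly one blank line between them."""
    cleaned = [normalize_document(doc) for doc in documents]
    # Skip documents that are empty after normalization.
    return "\n\n".join(doc for doc in cleaned if doc) + "\n"


def write_corpus(documents: Iterable[str], path: str) -> None:
    """Write the joined corpus to a UTF-8 text file."""
    Path(path).write_text(build_corpus(documents), encoding="utf-8")
```

For example, `write_corpus(articles, "corpus.txt")` with a list of 10 article strings would produce a single file in which each article is followed by one blank line before the next begins.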
-
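I would like to ask how a pre-training dataset should be prepared. I want to pre-train on wiki data. Suppose I have 10 articles: can I simply write all 10 of them into a txt file, or do I need some symbol to separate the first article from the second?
My second question: a single article may contain line breaks between paragraphs. Will these line breaks affect model training?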