Commit 3b0b6ae (parent 45c4220)
fix llama1-7b and llama1-13b readme
[LLM] add llama1-13b pretrain
[LLM] llama1-7b pretrain with callback
Showing 26 changed files with 633 additions and 637 deletions.
@@ -1,4 +1,3 @@
from .base import Driver
from .callback_paddle import PaddleCallback
from .event import Event
from .log_event import LogEventManager
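The imports above point at an event-driven callback setup: a driver emits named events, an event manager dispatches them, and a framework-specific callback (here Paddle) reacts to them. The following is a minimal, self-contained sketch of that pattern only; it is not FlagPerf's actual implementation, and the class and method names (`Event`, `LogEventManager`, `PaddleCallback`, `on_event`) are hypothetical stand-ins chosen to mirror the imported identifiers.

```python
# Minimal sketch of an event/callback pattern (hypothetical, not FlagPerf's code).
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Event:
    """A named training event, e.g. 'step_end', with an optional payload."""
    name: str
    payload: dict = field(default_factory=dict)


class LogEventManager:
    """Dispatches events to whatever handlers were registered for them."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Callable[[Event], None]]] = {}

    def register(self, name: str, handler: Callable[[Event], None]) -> None:
        self._handlers.setdefault(name, []).append(handler)

    def emit(self, event: Event) -> None:
        for handler in self._handlers.get(event.name, []):
            handler(event)


class PaddleCallback:
    """Example callback that simply logs training progress."""

    def on_event(self, event: Event) -> None:
        print(f"[{event.name}] {event.payload}")


if __name__ == "__main__":
    manager = LogEventManager()
    callback = PaddleCallback()
    manager.register("step_end", callback.on_event)
    manager.emit(Event("step_end", {"step": 100, "loss": 2.31}))
```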
This file was deleted.
@@ -0,0 +1,35 @@
### Model Information
#### Model Introduction
We introduce LLaMA, a collection of foundation language models ranging from 7B to 65B parameters. We train our models on trillions of tokens, and show that it is possible to train state-of-the-art models using publicly available datasets exclusively, without resorting to proprietary and inaccessible datasets. In particular, LLaMA-13B outperforms GPT-3 (175B) on most benchmarks, and LLaMA-65B is competitive with the best models, Chinchilla-70B and PaLM-540B. We release all our models to the research community.

Please refer to this paper for a detailed description of LLaMA1:
[LLaMA: Open and Efficient Foundation Language Models](https://arxiv.org/abs/2302.13971)

#### Model Code Source
The Paddle case code comes from:
https://github.com/PaddlePaddle/PaddleNLP/tree/develop/llm/llama, licensed under the Apache License, Version 2.0.

#### Dataset
##### Test Dataset Download
The test dataset provides preprocessed training samples covering 100k openwebtext documents:
```
wget https://bj.bcebos.com/paddlenlp/models/transformers/llama/data/llama_openwebtext_100k_ids.npy
wget https://bj.bcebos.com/paddlenlp/models/transformers/llama/data/llama_openwebtext_100k_idx.npz
```
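The two downloaded files are the pre-tokenized corpus: an `.npy` array of token ids and an `.npz` index archive. Their internal layout is defined by PaddleNLP's data pipeline and is not documented here, so the following is only a minimal inspection sketch, assuming the files sit in the current working directory; it just reports the shapes and key names actually present.

```python
# Minimal sketch: inspect the downloaded pre-tokenized OpenWebText files.
# Assumes the two files from the wget commands above are in the current directory.
import numpy as np

# Flat array of token ids for the whole corpus.
ids = np.load("llama_openwebtext_100k_ids.npy", mmap_mode="r")
print("token ids:", ids.shape, ids.dtype)

# Index archive; its key names and meaning depend on PaddleNLP's data format,
# so we just list whatever arrays it contains.
idx = np.load("llama_openwebtext_100k_idx.npz")
for key in idx.files:
    print(key, idx[key].shape, idx[key].dtype)
```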

##### Preprocessing
> No preprocessing required
#### Model Implementation
* Loaded automatically at runtime

#### Model Checkpoint
* Downloaded automatically at runtime; parameter count: 13B
* Use of the weights of Paddle's LLaMA model must comply with the [License](../../paddlenlp/transformers/llama/LICENSE).

### Framework and Chip Support
|            | PyTorch | Paddle | TensorFlow2 |
| ---------- | ------- | ------ | ----------- |
| Nvidia GPU | N/A     | ✅      | N/A         |
@@ -0,0 +1 @@
/ssd2/laixinyi/projects/FlagPerf/training/benchmarks/llama1_7B/paddle
@@ -0,0 +1,34 @@
### Model Information
#### Model Introduction
We introduce LLaMA, a collection of foundation language models ranging from 7B to 65B parameters. We train our models on trillions of tokens, and show that it is possible to train state-of-the-art models using publicly available datasets exclusively, without resorting to proprietary and inaccessible datasets. In particular, LLaMA-13B outperforms GPT-3 (175B) on most benchmarks, and LLaMA-65B is competitive with the best models, Chinchilla-70B and PaLM-540B. We release all our models to the research community.

Please refer to this paper for a detailed description of LLaMA1:
[LLaMA: Open and Efficient Foundation Language Models](https://arxiv.org/abs/2302.13971)

#### Model Code Source
The Paddle case code comes from:
https://github.com/PaddlePaddle/PaddleNLP/tree/develop/llm/llama, licensed under the Apache License, Version 2.0.

#### Dataset
##### Test Dataset Download
The test dataset provides preprocessed training samples covering 100k openwebtext documents:
```
wget https://bj.bcebos.com/paddlenlp/models/transformers/llama/data/llama_openwebtext_100k_ids.npy
wget https://bj.bcebos.com/paddlenlp/models/transformers/llama/data/llama_openwebtext_100k_idx.npz
```

##### Preprocessing
> No preprocessing required
#### Model Implementation
* Loaded automatically at runtime

#### Model Checkpoint
* Downloaded automatically at runtime; parameter count: 7B (a rough size estimate is sketched after this list)
* Use of the weights of Paddle's LLaMA model must comply with the [License](../../paddlenlp/transformers/llama/LICENSE).
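As a back-of-the-envelope illustration (not an official figure from this repo): 7 billion parameters stored in 16-bit precision occupy roughly 14 GB of weights alone, and Adam-style optimizer state during training roughly adds another 12 bytes per parameter.

```python
# Back-of-the-envelope size estimate for a 7B-parameter checkpoint (illustrative only).
params = 7e9

for name, bytes_per_param in [("fp32", 4), ("fp16/bf16", 2)]:
    gib = params * bytes_per_param / 1024**3
    print(f"{name}: ~{gib:.1f} GiB just for the weights")

# Adam-style optimizer state (fp32 master weights + two moments) roughly adds
# another 12 bytes per parameter during training.
print(f"optimizer state (~12 B/param): ~{params * 12 / 1024**3:.1f} GiB")
```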

### Framework and Chip Support
|            | PyTorch | Paddle | TensorFlow2 |
| ---------- | ------- | ------ | ----------- |
| Nvidia GPU | N/A     | ✅      | N/A         |