Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Showing 24 changed files with 574 additions and 616 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,4 +1,3 @@ | ||
from .base import Driver | ||
from .callback_paddle import PaddleCallback | ||
from .event import Event | ||
from .log_event import LogEventManager |
This file was deleted.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,40 @@ | ||
### Model Information
#### Model Introduction
We introduce LLaMA, a collection of foundation language models ranging from 7B to 65B parameters. We train our models on trillions of tokens, and show that it is possible to train state-of-the-art models using publicly available datasets exclusively, without resorting to proprietary and inaccessible datasets. In particular, LLaMA-13B outperforms GPT-3 (175B) on most benchmarks, and LLaMA-65B is competitive with the best models, Chinchilla-70B and PaLM-540B. We release all our models to the research community.

Please refer to this paper for a detailed description of LLaMA1:
[LLaMA: Open and Efficient Foundation Language Models](https://arxiv.org/abs/2302.13971)

#### Model Code Source
The Paddle case code comes from:
https://github.com/PaddlePaddle/PaddleNLP/tree/develop/llm/llama (licensed under the Apache License, Version 2.0).

#### Dataset
##### Test Dataset Download
The test dataset provides preprocessed training samples built from 100k documents:
```
wget https://bj.bcebos.com/paddlenlp/models/transformers/llama/data/llama_openwebtext_100k_ids.npy
wget https://bj.bcebos.com/paddlenlp/models/transformers/llama/data/llama_openwebtext_100k_idx.npz
```

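Where `wget` is not available, the same two files can be fetched from Python. The following is a minimal sketch using only the standard library; the URLs are the ones listed in this README, while the `data` target directory and the function names are hypothetical choices made for illustration.

```python
import os
import urllib.request

# Base URL and file names come from the wget commands in this README.
BASE_URL = "https://bj.bcebos.com/paddlenlp/models/transformers/llama/data"
FILES = [
    "llama_openwebtext_100k_ids.npy",
    "llama_openwebtext_100k_idx.npz",
]


def dataset_urls():
    """Return the full download URL for each dataset file."""
    return [f"{BASE_URL}/{name}" for name in FILES]


def download_dataset(data_dir="data"):
    """Download each file into data_dir, skipping files that already exist.

    data_dir is a hypothetical target directory chosen for this sketch.
    """
    os.makedirs(data_dir, exist_ok=True)
    paths = []
    for url in dataset_urls():
        dest = os.path.join(data_dir, os.path.basename(url))
        if not os.path.exists(dest):
            urllib.request.urlretrieve(url, dest)
        paths.append(dest)
    return paths
```

Calling `download_dataset()` returns the local paths of both files. Because existing files are skipped, rerunning it after a completed download does not touch the network.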
##### Preprocessing
> No preprocessing is required.
#### Model Checkpoint
Paddle automatically loads the llama1-13b model parameters via `model_name_or_path = "facebook/llama-13b"`. Parameter count: 13B.
Use of the LLaMA model weights in the Paddle case must comply with the [License](../../paddlenlp/transformers/llama/LICENSE).

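As a rough sanity check on the 13B figure, the parameter count can be derived from the architecture. The sketch below assumes the published LLaMA-13B hyperparameters (hidden size 5120, 40 layers, feed-forward size 13824, vocabulary size 32000), untied input and output embeddings, and no bias terms. None of these values are stated in this README, and the function name is hypothetical.

```python
def llama_param_count(hidden, layers, ffn, vocab):
    """Count parameters of a LLaMA-style decoder (no biases, untied embeddings)."""
    attention = 4 * hidden * hidden   # q, k, v and output projections
    mlp = 3 * hidden * ffn            # gate, up and down projections
    norms = 2 * hidden                # two RMSNorm weight vectors per layer
    per_layer = attention + mlp + norms
    embeddings = 2 * vocab * hidden   # input embedding plus output head
    final_norm = hidden               # RMSNorm before the output head
    return layers * per_layer + embeddings + final_norm


# Assumed LLaMA-13B hyperparameters (not taken from this README).
total = llama_param_count(hidden=5120, layers=40, ffn=13824, vocab=32000)
print(f"{total:,} parameters")  # prints "13,015,864,320 parameters"
```

Under these assumptions the total is about 13.0 billion, consistent with the 13B label.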
### Framework and Chip Support
| | PyTorch | Paddle | TensorFlow2 |
| ---- | ---- | ---- | ---- |
| Nvidia GPU | N/A | ✅ | N/A |
| Iluvatar CoreX (天数智芯) | N/A | N/A | N/A |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
/ssd2/laixinyi/projects/FlagPerf/training/benchmarks/llama1_7B/paddle |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,40 @@ | ||
### Model Information
#### Model Introduction
We introduce LLaMA, a collection of foundation language models ranging from 7B to 65B parameters. We train our models on trillions of tokens, and show that it is possible to train state-of-the-art models using publicly available datasets exclusively, without resorting to proprietary and inaccessible datasets. In particular, LLaMA-13B outperforms GPT-3 (175B) on most benchmarks, and LLaMA-65B is competitive with the best models, Chinchilla-70B and PaLM-540B. We release all our models to the research community.

Please refer to this paper for a detailed description of LLaMA1:
[LLaMA: Open and Efficient Foundation Language Models](https://arxiv.org/abs/2302.13971)

#### Model Code Source
The Paddle case code comes from:
https://github.com/PaddlePaddle/PaddleNLP/tree/develop/llm/llama (licensed under the Apache License, Version 2.0).

#### Dataset
##### Test Dataset Download
The test dataset provides preprocessed training samples built from 100k documents:
```
wget https://bj.bcebos.com/paddlenlp/models/transformers/llama/data/llama_openwebtext_100k_ids.npy
wget https://bj.bcebos.com/paddlenlp/models/transformers/llama/data/llama_openwebtext_100k_idx.npz
```

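After downloading, it can be worth confirming that the files are intact before starting a training run. The sketch below is one possible check using only the standard library. It relies on two properties of the NumPy file formats: a `.npy` file starts with the magic bytes `\x93NUMPY`, and a `.npz` file is a zip archive of `.npy` members. The helper names are hypothetical, and the names of the arrays stored in the index file are not documented here, so the sketch only lists them.

```python
import zipfile

NPY_MAGIC = b"\x93NUMPY"  # every .npy file starts with these six bytes


def is_npy(path):
    """Return True if the file begins with the NumPy .npy magic string."""
    with open(path, "rb") as f:
        return f.read(len(NPY_MAGIC)) == NPY_MAGIC


def npz_members(path):
    """List the members of a .npz file, which is a zip archive of .npy files."""
    with zipfile.ZipFile(path) as archive:
        return archive.namelist()
```

For example, `is_npy("llama_openwebtext_100k_ids.npy")` should return `True` for a complete download, and `npz_members("llama_openwebtext_100k_idx.npz")` raises `zipfile.BadZipFile` if the archive is not a valid zip file, as can happen when a download was interrupted.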
##### Preprocessing
> No preprocessing is required.
#### Model Checkpoint
Paddle automatically downloads and loads the llama1-7b model parameters via `model_name_or_path = "facebook/llama-7b"`. Parameter count: 7B.
Use of the LLaMA model weights in the Paddle case must comply with the [License](../../paddlenlp/transformers/llama/LICENSE).

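To illustrate what loading by `model_name_or_path` can look like, here is a minimal sketch using PaddleNLP's `AutoTokenizer` and `AutoModelForCausalLM` classes. It assumes `paddlepaddle` and `paddlenlp` are installed, and it is not necessarily how the benchmark code in this commit loads the model. The `load_llama` helper is a hypothetical name introduced for this example.

```python
MODEL_NAME_OR_PATH = "facebook/llama-7b"  # value quoted in this README


def load_llama(model_name_or_path=MODEL_NAME_OR_PATH):
    """Load the tokenizer and model through PaddleNLP's Auto classes.

    Assumes paddlepaddle and paddlenlp are installed. The import is kept
    inside the function so this module can be imported without them.
    """
    from paddlenlp.transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
    model = AutoModelForCausalLM.from_pretrained(model_name_or_path)
    return tokenizer, model
```

Calling `load_llama()` triggers the weight download on first use, so the license terms referenced above apply before running it.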
### Framework and Chip Support
| | PyTorch | Paddle | TensorFlow2 |
| ---- | ---- | ---- | ---- |
| Nvidia GPU | N/A | ✅ | N/A |
| Iluvatar CoreX (天数智芯) | N/A | N/A | N/A |