
Commit b08765a

Merge pull request #282 from FlagAI-Open/master
update master into gpm_dev
2 parents 1aef572 + 590d178 commit b08765a

69 files changed (+734 / -432 lines)


.gitignore

Lines changed: 2 additions & 1 deletion
@@ -27,8 +27,9 @@ datasets
 qqp
 glm_large_qqp_pytorch
 wandb
+clip_benchmark_datasets
 examples/AltCLIP/clip_benchmark_datasets
 examples/glm_pretrain/data.lazy
 examples/glm_pretrain/examples/glm_pretrain/data.lazy
 examples/vit_cifar100/cifar100
-examples/vit_cifar100/data
+examples/vit_cifar100/data

README.md

Lines changed: 3 additions & 2 deletions
@@ -15,11 +15,12 @@ FlagAI (Fast LArge-scale General AI models) is a fast, easy-to-use and extensibl
 
 * These models can be applied to (Chinese/English) Text, for tasks like text classification, information extraction, question answering, summarization, and text generation.
 
-* FlagAI is backed by the three most popular data/model parallel libraries — [PyTorch](https://pytorch.org/)/[Deepspeed](https://www.deepspeed.ai/)/[Megatron-LM](https://github.com/NVIDIA/Megatron-LM)/[BMTrain](https://github.com/OpenBMB/BMTrain) — with seamless integration between them. Users can parallel their training/testing process with less than ten lines of code.
+* FlagAI is backed by the four most popular data/model parallel libraries — [PyTorch](https://pytorch.org/)/[Deepspeed](https://www.deepspeed.ai/)/[Megatron-LM](https://github.com/NVIDIA/Megatron-LM)/[BMTrain](https://github.com/OpenBMB/BMTrain) — with seamless integration between them. Users can parallel their training/testing process with less than ten lines of code.
 
 The code is partially based on [GLM](https://github.com/THUDM/GLM), [Transformers](https://github.com/huggingface/transformers)[timm](https://github.com/rwightman/pytorch-image-models) and [DeepSpeedExamples](https://github.com/microsoft/DeepSpeedExamples/tree/master/Megatron-LM).
 
 ## News
+- [2 Mar 2023] release v1.6.1, Support Galactica model [#234](https://github.com/FlagAI-Open/FlagAI/pull/234); BMInf, a low-resource inference package [#238](https://github.com/FlagAI-Open/FlagAI/pull/238), and examples for p-tuning [#227](https://github.com/FlagAI-Open/FlagAI/pull/238)
 - [12 Jan 2023] release v1.6.0, support a new parallel lib called [**BMTrain**](https://github.com/OpenBMB/BMTrain) and integate [**Flash Attention**](https://github.com/HazyResearch/flash-attention) to speedup training of Bert and Vit models, examples in [FlashAttentionBERT](https://github.com/FlagAI-Open/FlagAI/blob/master/examples/bert_title_generation_english/train_flash_atten.py) and [FlashAttentionViT](https://github.com/FlagAI-Open/FlagAI/blob/master/examples/vit_cifar100/train_single_gpu_flash_atten.py). Also add the contrastive search based text generation method [**SimCTG**](https://github.com/yxuansu/SimCTG) and DreamBooth finetuning based on AltDiffusion, examples in [AltDiffusionNaruto](https://github.com/FlagAI-Open/FlagAI/blob/master/examples/AltDiffusion/dreambooth.py).
 - [28 Nov 2022] release v1.5.0, support 1.1B [**EVA-CLIP**](https://github.com/FlagAI-Open/FlagAI/tree/master/examples/EVA_CLIP) and [ALM: A large Arabic Language Model based on GLM], examples in [**ALM**](https://github.com/FlagAI-Open/FlagAI/tree/master/examples/ALM)
 - [10 Nov 2022] release v1.4.0, support [AltCLIP: Altering the Language Encoder in CLIP for Extended Language Capabilities](https://arxiv.org/abs/2211.06679v1), examples in [**AltCLIP**](https://github.com/FlagAI-Open/FlagAI/tree/master/examples/AltCLIP) and [**AltDiffusion**](https://github.com/FlagAI-Open/FlagAI/tree/master/examples/AltDiffusion)
@@ -259,6 +260,6 @@ The majority of FlagAI is licensed under the [Apache 2.0 license](LICENSE), howe
 ### ↳ Star History
 <div align="center">
 
-[![Star History Chart](https://api.star-history.com/svg?repos=FlagAI-Open/FlagAI&type=Date)](https://star-history.com/#baaivision/EVA&Date)
+![Star History Chart](https://api.star-history.com/svg?repos=FlagAI-Open/FlagAI&type=Date)]
 
 </div>

README_zh.md

Lines changed: 2 additions & 1 deletion
@@ -15,12 +15,13 @@
 
 * These models can be applied to text (especially Chinese) for tasks such as text classification, information extraction, question answering, summarization, and text generation.
 
-* FlagAI is backed by the three most popular data/model parallel libraries ([PyTorch](https://pytorch.org/)/[Deepspeed](https://www.deepspeed.ai/)/[Megatron-LM](https://github.com/NVIDIA/Megatron-LM)/[BMTrain](https://github.com/OpenBMB/BMTrain)) with seamless integration between them. You can parallelize your training/testing process with less than ten lines of code.
+* FlagAI is backed by the four most popular data/model parallel libraries ([PyTorch](https://pytorch.org/)/[Deepspeed](https://www.deepspeed.ai/)/[Megatron-LM](https://github.com/NVIDIA/Megatron-LM)/[BMTrain](https://github.com/OpenBMB/BMTrain)) with seamless integration between them. You can parallelize your training/testing process with less than ten lines of code.
 
 
 Part of this project's code is based on [GLM](https://github.com/THUDM/GLM), [Transformers](https://github.com/huggingface/transformers), [timm](https://github.com/rwightman/pytorch-image-models) and [DeepSpeedExamples](https://github.com/microsoft/DeepSpeedExamples/tree/master/Megatron-LM).
 
 ## News
+- [2 Mar 2023] Release v1.6.1: add the Galactica model [#234](https://github.com/FlagAI-Open/FlagAI/pull/234), BMInf, a low-resource toolkit for large-model inference [#238](https://github.com/FlagAI-Open/FlagAI/pull/238), and P-tuning examples [#227](https://github.com/FlagAI-Open/FlagAI/pull/238)
 - [12 Jan 2023] Release v1.6.0: add support for the parallel training library [**BMTrain**](https://github.com/OpenBMB/BMTrain) and integrate [**Flash Attention**](https://github.com/HazyResearch/flash-attention) into the Bert and Vit models to speed up end-to-end training, see [FlashAttentionBERT](https://github.com/FlagAI-Open/FlagAI/blob/master/examples/bert_title_generation_english/train_flash_atten.py) and [FlashAttentionViT](https://github.com/FlagAI-Open/FlagAI/blob/master/examples/vit_cifar100/train_single_gpu_flash_atten.py). Also add the contrastive-search text generation method [**SimCTG**](https://github.com/yxuansu/SimCTG) and DreamBooth personalized fine-tuning based on AltDiffusion, see [AltDiffusionNaruto](https://github.com/FlagAI-Open/FlagAI/blob/master/examples/AltDiffusion/dreambooth.py).
 - [28 Nov 2022] Release v1.5.0: support the 1.1B-parameter [**EVA-CLIP**](https://github.com/FlagAI-Open/FlagAI/tree/master/examples/EVA_CLIP) and [ALM: a large Arabic language model based on GLM], see [**ALM**](https://github.com/FlagAI-Open/FlagAI/tree/master/examples/ALM)
 - [10 Nov 2022] Release v1.4.0: support [AltCLIP: Altering the Language Encoder in CLIP for Extended Language Capabilities](https://arxiv.org/abs/2211.06679v1), see [**AltCLIP**](https://github.com/FlagAI-Open/FlagAI/tree/master/examples/AltCLIP) and [**AltDiffusion**](https://github.com/FlagAI-Open/FlagAI/tree/master/examples/AltDiffusion)

doc_zh/TUTORIAL_15_BERT_EXAMPLE_TITLE_GENERATION.md

Lines changed: 1 addition & 1 deletion
@@ -24,7 +24,7 @@
 ### 1. Data loading
 The sample data is located in /examples/bert_title_generation/data/
 
-The data reading procedure needs to be defined in ```trianer.py```, for example:
+The data reading procedure needs to be defined in ```trainer.py```, for example:
 ```python
 def read_file():
     src = []

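For orientation, a minimal sketch of what such a `read_file` helper could look like follows; the file path and the tab-separated source/target format are assumptions for illustration only, not the actual layout of the repository's sample data:

```python
def read_file():
    # Hypothetical sketch: collect parallel source/target lists for title
    # generation from a tab-separated file (path and format are assumed).
    src, tgt = [], []
    with open("data/train.tsv", encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip("\n").split("\t")
            if len(parts) == 2:
                src.append(parts[0])
                tgt.append(parts[1])
    return src, tgt
```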
doc_zh/TUTORIAL_21_OPTIMIZER.md

Lines changed: 54 additions & 0 deletions
@@ -0,0 +1,54 @@
+# How to use the optimizer
+
+## What is an optimizer?
+In the context of machine learning and deep learning,
+an optimizer is an algorithm or method used to update the parameters of a model in order to minimize the error between the predicted output and the actual output.
+
+The goal of an optimizer is to find the optimal set of parameters that achieves the best performance on a given task.
+This process is typically performed during the training phase of a machine learning model.
+
+An optimizer computes the gradients of the loss function with respect to the model parameters and uses this information to update the parameters so as to reduce the loss.
+There are many optimization algorithms available, such as stochastic gradient descent (SGD), Adagrad, Adam, RMSprop and more, each with its own advantages and disadvantages.
+
+The choice of optimizer depends on the specific problem, the size of the dataset, the complexity of the model, and other factors.
+A good optimizer can significantly improve the training speed and accuracy of a model.
+
+
+
+
+## Loading the optimizer
+
+### Dependencies
+#### adan
+```
+python3 -m pip install git+https://github.com/sail-sg/Adan.git
+```
+#### lion
+```
+$ pip install lion-pytorch
+```
+#### lamb
+```
+$ pip install torch_optimizer
+```
+#### Example
+```python
+>>> # currently FlagAI support adam, adamw, lion, adan, adafactor and lamb, which can be defined by setting optimizer_type when defining Trainer
+>>> trainer = Trainer(env_type='pytorch',
+>>>                   epochs=1,
+>>>                   batch_size=2,
+>>>                   eval_interval=100,
+>>>                   log_interval=10,
+>>>                   experiment_name='glm_large_bmtrain',
+>>>                   pytorch_device='cuda',
+>>>                   load_dir=None,
+>>>                   lr=1e-4,
+>>>                   num_gpus = 1,
+>>>                   weight_decay=1e-2,
+>>>                   save_interval=1000,
+>>>                   hostfile='./hostfile',
+>>>                   training_script=__file__,
+>>>                   deepspeed_config='./deepspeed.json',
+>>>                   optimizer_type='lion') #load optimizer
+```
+

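The description in the tutorial above (compute the gradients of the loss with respect to the parameters, then update the parameters to reduce the loss) can be made concrete with a few lines of plain PyTorch. This sketch is purely illustrative and independent of FlagAI's Trainer:

```python
import torch

# Toy setup: fit y = 2x with a single linear layer.
model = torch.nn.Linear(1, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
x = torch.tensor([[1.0], [2.0], [3.0]])
y = 2 * x

for _ in range(100):
    optimizer.zero_grad()                              # clear old gradients
    loss = torch.nn.functional.mse_loss(model(x), y)   # measure prediction error
    loss.backward()                                    # gradients of the loss w.r.t. parameters
    optimizer.step()                                   # update parameters to reduce the loss
```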
doc_zh/TUTORIAL_3_MODEL.md

Lines changed: 1 addition & 1 deletion
@@ -20,7 +20,7 @@
 ## From_pretrain
 
 The `From_pretrain` function is used to load models. Models sharing the same architecture can be loaded with the same class; for example, the `BERT-base-ch` and `Roberta-base-ch` models can both be loaded with the `BertModel` class. `From_pretrain` is optimized for loading models under data/model parallelism, avoiding the wasted resources of repeated downloads.
-Load a model by calling `ClassName.from_pretrian()`.
+Load a model by calling `ClassName.from_pretrain()`.
 ### Loading from the modelhub
 We now support downloading [commonly used models](#所有支持模型) from the modelhub; `from_pretrain` can directly download the model configuration file `config.json`, the model weights `pytorch_model.bin`, and the dictionary file `vocab.txt`. Example:
 ```python

docs/TUTORIAL_21_OPTIMIZER.md

Lines changed: 57 additions & 0 deletions
@@ -0,0 +1,57 @@
+# How to use Optimizer
+
+## What is Optimizer?
+In the context of machine learning and deep learning,
+an optimizer is an algorithm or method used to update the parameters of a model in order to minimize the error between the predicted output and the actual output.
+
+The goal of an optimizer is to find the optimal set of parameters that can achieve the best performance on a given task.
+This process is typically performed during the training phase of a machine learning model.
+
+Optimizers work by computing the gradients of the loss function with respect to the model parameters,
+and using this information to update the parameters in the direction that reduces the loss.
+There are various optimization algorithms available,
+such as stochastic gradient descent (SGD), Adagrad, Adam, RMSprop, and more, each with their own advantages and disadvantages.
+
+The choice of optimizer depends on the specific problem, the size of the dataset,
+the complexity of the model, and other factors.
+A good optimizer can significantly improve the training speed and accuracy of a model.
+
+
+
+
+## Loading optimizer
+
+### dependencies
+#### adan
+```
+python3 -m pip install git+https://github.com/sail-sg/Adan.git
+```
+#### lion
+```
+$ pip install lion-pytorch
+```
+#### lamb
+```
+$ pip install torch_optimizer
+```
+#### example
+```python
+>>> # currently FlagAI support adam, adamw, lion, adan, adafactor and lamb, which can be defined by setting optimizer_type when defining Trainer
+>>> trainer = Trainer(env_type='pytorch',
+>>>                   epochs=1,
+>>>                   batch_size=2,
+>>>                   eval_interval=100,
+>>>                   log_interval=10,
+>>>                   experiment_name='glm_large_bmtrain',
+>>>                   pytorch_device='cuda',
+>>>                   load_dir=None,
+>>>                   lr=1e-4,
+>>>                   num_gpus = 1,
+>>>                   weight_decay=1e-2,
+>>>                   save_interval=1000,
+>>>                   hostfile='./hostfile',
+>>>                   training_script=__file__,
+>>>                   deepspeed_config='./deepspeed.json',
+>>>                   optimizer_type='lion') #load optimizer
+```
+

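As a side note, the packages installed above can also be used directly in a plain PyTorch training loop, outside FlagAI's Trainer. A minimal sketch, assuming lion-pytorch and torch_optimizer expose the Lion and Lamb classes as their documentation describes; the model and hyperparameters are placeholders:

```python
import torch
from lion_pytorch import Lion     # installed via `pip install lion-pytorch`
import torch_optimizer            # installed via `pip install torch_optimizer`

model = torch.nn.Linear(16, 4)    # stand-in for a real model

# Lion optimizer from the lion-pytorch package
lion_opt = Lion(model.parameters(), lr=1e-4, weight_decay=1e-2)

# LAMB optimizer from the torch-optimizer package
lamb_opt = torch_optimizer.Lamb(model.parameters(), lr=1e-3)
```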
docs/TUTORIAL_3_MODEL.md

Lines changed: 1 addition & 1 deletion
@@ -25,7 +25,7 @@ All supported models now support the three most common model types [encoder, dec
 
 ### load model from modelhub
 
-By calling `ClassName.from_pretrian()` to load following [supported models](#all-supported-models), it will automatically download the model configuration file `config.json`, model weights `pytorch_model.bin`, and dictionary files `vocab .txt`.
+By calling `ClassName.from_pretrain()` to load following [supported models](#all-supported-models), it will automatically download the model configuration file `config.json`, model weights `pytorch_model.bin`, and dictionary files `vocab .txt`.
 
 ```python
 >>> # Downloading GLM-large-ch from modelhub

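As a rough illustration of the call described above, here is a sketch of loading GLM-large-ch with `from_pretrain`; the class and argument names follow FlagAI's examples but may differ between versions, so treat them as assumptions:

```python
# Assumed usage: from_pretrain downloads config.json, pytorch_model.bin and
# vocab.txt into download_path on first use, then loads from the local copy.
from flagai.model.glm_model import GLMModel

model = GLMModel.from_pretrain(model_name="GLM-large-ch",
                               download_path="./checkpoints/")
```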
examples/bert_title_generation_english/generate.py

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@
 
 device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
 
-model_dir = "../state_dict/"
+model_dir = "./checkpoints/"
 
 # Note "./checkpoints_seq2seq/{}/mp_rank_00_model_states.pt", {} is a directory in the checkpoints_seq2seq.
 model_save_path = "./checkpoints_seq2seq/7079/mp_rank_00_model_states.pt"

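For context, loading the checkpoint referenced above usually looks like the following sketch; the 'module' key reflects the DeepSpeed-style layout implied by mp_rank_00_model_states.pt, and the model object is assumed to have been constructed earlier in the script:

```python
# Hypothetical continuation of the script above (model built elsewhere).
checkpoint = torch.load(model_save_path, map_location=device)
state_dict = checkpoint.get("module", checkpoint)   # unwrap DeepSpeed-style dicts
model.load_state_dict(state_dict, strict=False)
model.to(device).eval()
```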
examples/bert_title_generation_english/train.py

Lines changed: 0 additions & 1 deletion
@@ -1,7 +1,6 @@
 # Copyright © 2022 BAAI. All rights reserved.
 #
 # Licensed under the Apache License, Version 2.0 (the "License")
-import sys
 import os
 import torch
 from torch.utils.data import Dataset
