
[CherryPick] Donot save optimizer #8001

Closed
wants to merge 86 commits

Conversation

JunnYu (Member) commented Feb 22, 2024

PR types

#7978

PR changes

Description

Adds an option to skip saving the lr scheduler and optimizer state. It defaults to False, matching the previous behavior; enable it explicitly only when you want to use it.
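A minimal usage sketch, assuming the option surfaces on `TrainingArguments` under a name like `ignore_save_lr_and_optim` (the flag name is inferred from the description, not confirmed against the merged code):

```python
# Sketch under assumptions: `ignore_save_lr_and_optim` is an assumed flag name;
# check the merged TrainingArguments for the actual argument.
from paddlenlp.trainer import TrainingArguments

args = TrainingArguments(
    output_dir="./checkpoints",
    save_steps=500,
    ignore_save_lr_and_optim=True,  # assumed name; default False saves lr/optimizer state as before
)
```

With the flag left at its default, checkpoints continue to include optimizer and lr scheduler state, so resume-from-checkpoint behavior is unchanged.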

DesmonDay and others added 30 commits January 2, 2024 17:24
* [AutoParallel] Auto Trans PP to VPP

* update pp scheduler config

* add comment
* [CI] set codecov status check

* update

* [CI] adjust codecov target
* Update trainer.md

---------

Co-authored-by: DrownFish19 <DrownFish19@gmail.com>
* update mem from B to MB

* fix ft

* fix pretrain

* Revert "update mem from B to MB"

This reverts commit 044a88c.
* Update release.yml to release tags

* Update release.yml

* Update release.yml

* sp for static llama

* sp for static llama1

* code style

* change script
* support dynamic src_length

* revert max_position_embedding

* update doc

* update flask_server

* update max_length control

* update request flask_server

* fix max-position-embeddings

* update error message

* update predictor length init
* fix use_unified_checkpoint definition

* add n1c2 test into ci_case.sh

* move test_unified_checkpoint to tests

* add tests/trainer into testpaths

* remove unifiedcheckpoint case from ci_case.sh
* try fix

* fix hf download bug ...

* update config download bug

* fix

* add subfolder

* update

* Priority order: local first, then builtin, then aistudio, then hf hub, then bos (see the sketch after this commit group)

* Update the chat template file search paths

* update

* fix subfolder && add tests

* fix

* update

* fix tokenizer_config_file_dir_list

* subfolder test

* fix from_pretrained() loading of hf sharded models

* Update the logic

* update use_safetensors

* update

* fix resolve_weight_file_from_hf_hub

* Update the legacy bos download method

* update download from hf hub

* update logging

* update

* Disable the proxy

* update

* update

* fix image process

---------

Co-authored-by: CrazyBoyM <ai-lab@foxmail.com>
Co-authored-by: Ke Bai <35400185+CrazyBoyM@users.noreply.github.com>
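As context for the download-priority commit above ("local first, then builtin, then aistudio, then hf hub, then bos"), a self-contained sketch of a first-match fallback; every name here is an illustrative stand-in, not a PaddleNLP API:

```python
# Illustrative only: walk the sources in the stated priority order and
# return the first one that can serve the requested file.
def resolve_source(name_or_path, checkers):
    for source, has_file in checkers:
        if has_file(name_or_path):
            return source
    raise FileNotFoundError(name_or_path)

checkers = [
    ("local", lambda p: False),     # placeholder predicates standing in
    ("builtin", lambda p: False),   # for real per-source existence checks
    ("aistudio", lambda p: False),
    ("hf_hub", lambda p: True),
    ("bos", lambda p: True),
]
print(resolve_source("config.json", checkers))  # -> "hf_hub"
```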
* update faiss

* update faiss

* update faiss

* init qwen inference model

* fix name

* fix hidden dim

* fix dtype

* fix length

* fix attention_mask

* fix up & gate dtype bug

* fix ffn1 weight

* modify codes

* remove unused variable

* remove unused code

* add qwen weight only

* format with black

* format with isort

* fix dtype

* add qwen inference model in static graph

* add qwen unittest

* format with black

* print log

* remove print

* set safetensors usage to False

* remove tests

* Empty-Commit
* pipeline parallel benchmark

* add seed setting

* fixed
* support qlora pp

* fix scale dtype
…addlePaddle#7768)

* add parse_json_file_and_cmd_lines (see the usage sketch after this commit group)

* change unit test file path

* Change the way the JSON file is determined

* Merge parameter parsing judgment branches and add comments.

* remove the special handling of output_dir

* Add remaining_args warning
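A hedged usage sketch for the parser entry point named in this commit group, treating the exact signature as an assumption drawn from the commit titles (JSON file first, remaining command-line flags overriding it):

```python
# Assumed usage of parse_json_file_and_cmd_lines; verify against the merged code.
# Invocation like: python train.py config.json --output_dir ./out
from paddlenlp.trainer import PdArgumentParser, TrainingArguments

parser = PdArgumentParser(TrainingArguments)
(training_args,) = parser.parse_json_file_and_cmd_lines()
print(training_args.output_dir)  # CLI flag overrides the JSON value
```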

* support blha and cache kv quant

* lint

* fix unit test

* fix infer when blha is on

* code refine

* add docs and fix ops

* merge blha read res in predictor

* finish docs

* add docs and unittest

* add unittest

* migrate read res
ziangqin-baidu and others added 27 commits January 26, 2024 14:24
* add qwen & baichuan into CE

* add Qwen & Baichuan into CE, cleaned for PR

* add only Qwen into CE, cleaned for PR

* add only Qwen into CE, open switch_ir_optim, cleaned for PR

* add only Qwen into CE, keep switch_ir_optim open, add ce script
* Hackathon TASK73 ToT

1. finish meta/llama2 version

* update readme tutorial

* modify according to Lint

* modify according to Lint

1. resolve one unused variable

* Delete LICENSE

* Update LICENSE

* black format

* isort format

* Update search_crosswords-dfs.ipynb

* update files formats

* Update LICENSE

* Update LICENSE

* Update LICENSE

* Update LICENSE

* delete test data

* delete some unnecessary files

1. delete some unnecessary files according to comments.

* add paddlenlp-llama2

1. add llama2 in paddlenlp

* fix one bug

* fix outputs bug

1. format data structure

* delete meta/llama2

* modify according to comments

1. add acknowledgements into readme
2. change png into url in readme
3. add all the models supported by paddlenlp

* change according to comments

* Delete .gitignore

* Create .gitignore

* Move directory

* Add tree of thoughts scripts

* add first dir

* add note

* Update README.md

add test results of facebook/llama-2-7b-chat and llama-2-13b-chat

* Update requirements.txt

delete unnecessary packages

* Update demo.py

add Ernie

* Update .gitignore

delete pyproject.toml

* Update run.py

add Ernie

* Update __init__.py

add Ernie

* chat templates

* add Ernie

* Update llama.py

Make compatible with Ernie

* Update bfs.py

Make compatible with Ernie

* Update models.py

Make compatible with Ernie

* Update run.py

* format style

* format style

* format style

* format style

* format style

* format style

* format style

* format style

* Remove the duplicated "test results" section

* Remove the hard-coded Ernie token; read it from an environment variable instead

* format style

* format style

* Remove commented-out code

---------

Co-authored-by: root <root@tutu-win.localdomain>
* add auto_tuner

* fix

* update log_file

* update json

* close eval/predict

* fix run_mode

* update

* fix

* Revert "fix"

This reverts commit e526c86.

* Revert "update"

This reverts commit 9cbd773.

* update prepare

* Revert "Revert "update""

This reverts commit 811b6a4.

* Revert "Revert "fix""

This reverts commit 32cc005.

* update finetune prepare

* update

* add

* update sft/lora steps

* update json

* update

* add benchmark

* update years

* update a100
* add qwen benchmark

* update qwen benchmark scripts

* qwen 7b benchmark

* arg change

* fix wrong args

* fix args

* update
* add sharding_v2 case

* update run_mode to device_num

* fix

* fix
* fix logger level

* fix training args logger level
* RuntimeTimer for the toolkit

* RuntimeTimer for the toolkit

* reformat

* fix timer and load checkpoints

* remove reset
…ddlePaddle#7885)

* support semi-auto trainer and fit Llama2 training

* support shard_dataloader in dynamic semi-auto

* rewrite training loop

* refactor training loop

* refine args of auto trainer

* broadcast loss

* add auto ci cases
* gqa fuse attention qkv

* add annotation for the fusion
* rename files and add readme for llama auto_parallel

* rename files and add readme for llama auto_parallel

* fix ci
…2static.utils_helper` (PaddlePaddle#7989)

* fix bugs

* add try import to support develop and release
* add semi-autoparallel amp

* support amp in semi-auto

* change loss base

* polish

paddle-bot bot commented Feb 22, 2024

Thanks for your contribution!

JunnYu closed this Feb 22, 2024
JunnYu deleted the donot_save_optimizer branch February 22, 2024 04:16