
[CherryPick] Donot save optimizer #8001

Closed
wants to merge 86 commits

Conversation

JunnYu (Member) commented Feb 22, 2024

PR types

#7978

PR changes

Description

Adds an option to skip saving the lr scheduler and optimizer state. It defaults to False, matching the previous behavior; enable it explicitly only when you want to use it.
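A minimal usage sketch, assuming the option surfaces on `TrainingArguments` under a name like `ignore_save_lr_and_optim` (the flag name is inferred from the description, not confirmed against the merged code):

```python
# Sketch under assumptions: `ignore_save_lr_and_optim` is an assumed flag name;
# check the merged TrainingArguments for the actual argument.
from paddlenlp.trainer import TrainingArguments

args = TrainingArguments(
    output_dir="./checkpoints",
    save_steps=500,
    ignore_save_lr_and_optim=True,  # assumed name; default False saves lr/optimizer state as before
)
```

With the flag left at its default, checkpoints continue to include optimizer and lr scheduler state, so resume-from-checkpoint behavior is unchanged.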

DesmonDay and others added 30 commits January 2, 2024 17:24
* [AutoParallel] Auto Trans PP to VPP

* update pp scheduler config

* add comment
* [CI] set codecov status check

* update

* [CI] adjust codecov target
* Update trainer.md

---------

Co-authored-by: DrownFish19 <DrownFish19@gmail.com>
* update mem from B to MB

* fix ft

* fix pretrain

* Revert "update mem from B to MB"

This reverts commit 044a88c.
* Update release.yml to release tags

* Update release.yml

* Update release.yml

* sp for static llama

* sp for static llama1

* code style

* change script
* support dynamic src_length

* revert max_position_embedding

* update doc

* update flask_server

* update max_length control

* update request flask_server

* fix max-position-embeddings

* update error message

* update predictor length init
* fix use_unified_checkpoint definition

* add n1c2 test into ci_case.sh

* move test_unified_checkpoint to tests

* add tests/trainer into testpaths

* remove unifiedcheckpoint case from ci_case.sh
* try fix

* fix hf download bug ...

* update config download bug

* fix

* add subfolder

* update

* Priority order: local first, then builtin, then aistudio, then hf hub, then bos (see the sketch after this commit group)

* Update the chat template file search paths

* update

* fix subfolder && add tests

* fix

* update

* fix tokenizer_config_file_dir_list

* subfolder test

* fix from_pretrained() loading of hf sharded models

* Update the logic

* update use_safetensors

* update

* fix resolve_weight_file_from_hf_hub

* Update the legacy bos download method

* update download from hf hub

* update logging

* update

* Disable the proxy

* update

* update

* fix image process

---------

Co-authored-by: CrazyBoyM <ai-lab@foxmail.com>
Co-authored-by: Ke Bai <35400185+CrazyBoyM@users.noreply.github.com>
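As context for the download-priority commit above ("local first, then builtin, then aistudio, then hf hub, then bos"), a self-contained sketch of a first-match fallback; every name here is an illustrative stand-in, not a PaddleNLP API:

```python
# Illustrative only: walk the sources in the stated priority order and
# return the first one that can serve the requested file.
def resolve_source(name_or_path, checkers):
    for source, has_file in checkers:
        if has_file(name_or_path):
            return source
    raise FileNotFoundError(name_or_path)

checkers = [
    ("local", lambda p: False),     # placeholder predicates standing in
    ("builtin", lambda p: False),   # for real per-source existence checks
    ("aistudio", lambda p: False),
    ("hf_hub", lambda p: True),
    ("bos", lambda p: True),
]
print(resolve_source("config.json", checkers))  # -> "hf_hub"
```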
* update faiss

* update faiss

* update faiss

* init qwen inference model

* fix name

* fix hidden dim

* fix dtype

* fix length

* fix attention_mask

* fix up & gate dtype bug

* fix ffn1 weight

* modify codes

* remove unused variable

* remove unused code

* add qwen weight only

* format with black

* format with isort

* fix dtype

* add qwen inference model in static graph

* add qwen unittest

* format with black

* print log

* remove print

* set safetensors usage to False

* remove tests

* Empty-Commit
* pipeline parallel benchmark

* add seed setting

* fixed
* support qlora pp

* fix scale dtype
…addlePaddle#7768)

* add parse_json_file_and_cmd_lines (see the usage sketch after this commit group)

* change unit test file path

* Change the way the JSON file is determined

* Merge parameter parsing judgment branches and add comments.

* remove the special handling of output_dir

* Add remaining_args warning
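A hedged usage sketch for the parser entry point named in this commit group, treating the exact signature as an assumption drawn from the commit titles (JSON file first, remaining command-line flags overriding it):

```python
# Assumed usage of parse_json_file_and_cmd_lines; verify against the merged code.
# Invocation like: python train.py config.json --output_dir ./out
from paddlenlp.trainer import PdArgumentParser, TrainingArguments

parser = PdArgumentParser(TrainingArguments)
(training_args,) = parser.parse_json_file_and_cmd_lines()
print(training_args.output_dir)  # CLI flag overrides the JSON value
```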

* support blha and cache kv quant

* lint

* fix unit test

* fix infer when blha is on

* code refine

* add docs and fix ops

* merge blha read res in predictor

* finish docs

* add docs and unittest

* add unittest

* migrate read res
ziangqin-baidu and others added 27 commits January 26, 2024 14:24
* add qwen & baichuan into CE

* add Qwen & Baichuan into CE, cleaned for PR

* add only Qwen into CE, cleaned for PR

* add only Qwen into CE, open switch_ir_optim, cleaned for PR

* add only Qwen into CE, keep switch_ir_optim open, add ce script
* Hackathon TASK73 ToT

1. finish meta/llama2 version

* update readme tutorial

* modify according to Lint

* modify according to Lint

1. resolve one unused variable

* Delete LICENSE

* Update LICENSE

* black format

* isort format

* Update search_crosswords-dfs.ipynb

* update files formats

* Update LICENSE

* Update LICENSE

* Update LICENSE

* Update LICENSE

* delete test data

* delete some unnecessary files

1. delete some unnecessary files according to comments.

* add paddlenlp-llama2

1. add llama2 in paddlenlp

* fix one bug

* fix outputs bug

1. format data structure

* delete meta/llama2

* modify according to comments

1. add acknowledgements into readme
2. change png into url in readme
3. add all the models supported by paddlenlp

* change according to comments

* Delete .gitignore

* Create .gitignore

* Move directory

* Add tree of thoughts scripts

* add first dir

* add note

* Update README.md

add test results of facebook/llama-2-7b-chat and llama-2-13b-chat

* Update requirements.txt

delete unnecessary packages

* Update demo.py

add Ernie

* Update .gitignore

delete pyproject.toml

* Update run.py

add Ernie

* Update __init__.py

add Ernie

* chat templates

* add Ernie

* Update llama.py

Make compatible with Ernie

* Update bfs.py

Make compatible with Ernie

* Update models.py

Make compatible with Ernie

* Update run.py

* format style

* format style

* format style

* format style

* format style

* format style

* format style

* format style

* Remove the duplicated "test results" section

* Remove the hard-coded Ernie token; read it from an environment variable instead

* format style

* format style

* Remove commented-out code

---------

Co-authored-by: root <root@tutu-win.localdomain>
* add auto_tuner

* fix

* update log_file

* update json

* close eval/predict

* fix run_mode

* update

* fix

* Revert "fix"

This reverts commit e526c86.

* Revert "update"

This reverts commit 9cbd773.

* update prepare

* Revert "Revert "update""

This reverts commit 811b6a4.

* Revert "Revert "fix""

This reverts commit 32cc005.

* update finetune prepare

* update

* add

* update sft/lora steps

* update json

* update

* add benchmark

* update years

* update a100
* add qwen benchmark

* update qwen benchmark scripts

* qwen 7b benchmark

* arg change

* fix wrong args

* fix args

* update
* add sharding_v2 case

* update run_mode to device_num

* fix

* fix
* fix logger level

* fix training args logger level
* RuntimeTimer for the toolkit

* RuntimeTimer for the toolkit

* reformat

* fix timer and load checkpoints

* remove reset
…ddlePaddle#7885)

* support semi-auto trainer and fit Llama2 training

* support shard_dataloader in dynamic semi-auto

* rewrite training loop

* refactor training loop

* refine args of auto trainer

* broadcast loss

* add auto ci cases
* gqa fuse attention qkv

* add annotation for the fusion
* rename files and add readme for llama auto_parallel

* rename files and add readme for llama auto_parallel

* fix ci
…2static.utils_helper` (PaddlePaddle#7989)

* fix bugs

* add try import to support develop and release
* add semi-autoparallel amp

* support amp in semi-auto

* change loss base

* polish

paddle-bot bot commented Feb 22, 2024

Thanks for your contribution!

JunnYu closed this Feb 22, 2024
JunnYu deleted the donot_save_optimizer branch February 22, 2024 04:16