This repository was archived by the owner on Jul 18, 2024. It is now read-only.

[v1.0] refine Dockerfile and trigger DockerHub automatic publish #128

Merged
merged 2 commits into from
Dec 18, 2022


zigzagcai
Contributor

No description provided.

zigzagcai merged commit 6471f64 into intel:main Dec 18, 2022
zigzagcai deleted the refine-dockerfile branch December 18, 2022 12:13
Peach-He added a commit to Peach-He/e2eAIOK that referenced this pull request Jan 9, 2023
…d test for DeNas (intel#154)

* fix dataset path issue for wnd (intel#105)

* fix model_path issue for dlrm (intel#109)

* add aidk cicd integrated test scripts (intel#108)

* add aidk cicd integrated test scripts

* refine aidk cicd scripts

* refine aidk cicd scripts

* refine aidk cicd integrated test scripts, add test for spark and recdp

* refine aidk cicd integrated test scripts for codestyle check

* refine cicd scripts and make dien full run with small dataset

* refine cicd scripts, save dlrm/dien/wnd full run results

* refine cicd scripts, make wnd full run with small dataset

* refine cicd scripts, add Docker file for package install and conf

* refine cicd test scripts and make dlrm full run with small dataset

* Update README.md

* refine cicd test scripts to enable dlrm full run with small dataset

* refine cicd scripts

* refine cicd scripts

* Create README.md

* modify README.md for CICD (Test PR without Tag) (intel#111)

* modify README.md for CICD

* Update README.md

* Jenkins CICD server has been linked with Github PR (intel#110)

* Update README.md for CICD

* Opensource code security related fix (intel#117)

* fix for rnnt

* command injection fix

* fix exclude model zoo

* fix exclude model zoo amend

* fix bandit

* revert model zoo checkmarx

* add rnnt patch

* Revert "add rnnt patch"

This reverts commit 07ec51396b0d0972dd4ab7b9227d4af781edd573.

* Revert "fix bandit"

This reverts commit 09ec4d8851fc5b922ac99813d31662fe7758ab7b.

* Use read-only in docker run (intel#113)

Signed-off-by: Xue, Chendi <chendi.xue@intel.com>

* Add CICD test for DLRM/WnD distributed training test (intel#118)

* add cicd test for DLRM/WnD distributed training test

* refine cicd for dlrm train script

* add Jenkinsfile for CI/CD distributed training

* refine jenkinsfile

* Update Jenkinsfile

* refine Jenkinsfile

* update jenkinsfile

* update cicd config file

* refine CICD scripts for distributed train

* Update Jenkinsfile

* add sigopt switch for single-node dlrm/wnd/dien

* update jenkinsfile

* update jenkinsfile

* refine cicd scripts

* refine cicd scripts

* refine cicd scripts

* refine cicd scripts

* add dlrm torch1.10 test script

* refine dlrm torch1.10 test script

* update spark hostname

* refine cicd scripts

* add dlrm torch110 distributed training test script

* dlrm torch110 single test and distributed test scripts finished

* update cicd scripts

* update jenkinsfile, remove sensitive token info

* refine cicd scripts

* use Jenkinsfile for test config of single node training

* update jenkinsfile

* update test scripts and jenkinsfile

* update jenkinsfile

* update jenkinsfile, do not use sigopt by default

* update cicd scripts

* update JenkinsfileDistributed

* update jenkins file

* update jenkins file

* update jenkinsfiledistributed

* update jenkins file

* update jenkinsdistributed

* update jenkinsfile

* update jenkinsfile

* update cicd scripts

* update cicd scripts

* Refine CI/CD scripts (intel#125)

* reformat cicd scripts

* update Jenkinsfile

* update cicd scripts

* fix hydro interactive input bug

* update model reload script

* auto build and load docker

* update test dockerfile

* update jenkinsfile distributed

* update jenkinsfile distributed

* update jenkinsfile distributed

* refine cicd scripts

* update jenkinsfile distributed

* update jenkinsfile

* Update README.md

* refine cicd scripts

* update jenkins file

* add bert function (intel#120)

* add bert function

* add data and pre-trained models introduction in README

* add Bert Advisor function

* add jenkins_bert

* solve advisor model save path

* use save_path as model output_dir for storing model parameters

* add model evaluation function in the run_squad.py

* fix the predict file issues

* fix the f1 score save issues

* fix bert jenkins test issues

* add bert automated jenkins test function

* add jenkins test label

* add ignorefile and architecture picture

* fix test_path error in the jenkins test

* renew the gitignore file

* add cicd bert conf

* refine bert jenkins test python path issues

* add result dir to save different models and remove rm ops

* readd architecture figure

* fix the bert multi-saved result path issues

* readd architecture.jpg

* change configure file for single process

* Remove credentials in burgurking (intel#128)

Signed-off-by: Xue, Chendi <chendi.xue@intel.com>

* [rnnt] add rnnt cicd (intel#126)

* add rnnt to model zoo

* add rnnt advisor

* launch rnnt with ipex launch script

* apply sigopt params

* add running script

* update dataset path

* persist training result

* add SDA example

* update conda env

* update launch scripts and disable ipex

* add training time threshold to stop training

* single node inference

* distribute inference

* training move model config to cmd args

* update rnnt advisor for training

* add inference

* move stack time factor to arguments

* save ckpt at last, save trace to output dir

* [inference] add ema, move model config to args

* [train] add config for rnnt advisor

* add gru support

* fix distribute log issue

* add gru to config

* add cicd

* revert changes

* update model save path

* add pytorch profiling check

* add log check for cicd

* add modelzoo for minigo (intel#116)

* add modelzoo for minigo workload

* Update dual_net.py

* Update build_gcc.sh

* update minigo README for performance dashboard

* Delete run_docker.sh

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Create build_env_script.sh

* Update build_env_script.sh

* Update README.md

* refine minigo rm operation, check dir before rm dir

* Update README.md

* Update README.md

* Update README.md

* add gitignore file for modelzoo/minigo

* clean up tmp files when train stopped

* compute winrate at each training loop

* [refined] compute winrate at each training iteration

* [fix] minigo can run with specified winrate requirement

* redefine max iteration number

* merge bert code and resolve conflicts

* Revert "merge bert code and resolve conflicts"

This reverts commit e2f31e434f506d8b639d471d7af919dee30b2b3a.

* remove untracked files

* add modeladvisor for minigo

* update minigo advisor

* update minigo modeladvisor

* refine minigo model advisor, can run w/wo sigopt

* remove minigo original code, use git patch instead

* refine patch file for minigo

* update minigo model advisor and patch file

* add jenkinsfile for minigo auto-test config

* hydroai_defaults_minigo_example.conf

* update jenkinsfile

* update minigo cicd test script and jenkinsfile

* update jenkinsfile

* update jenkinsfile for minigo

* update jenkinsfile

* refine minigo patch file

* update jenkinsfile

* update minigo test script

* [fix] cleanWs before checkout scm

* [refine] use modelzoo/third_party/mlperf_v1.0 as submodule and apply minigo patch

* update minigo patch file

* Update Jenkinsfile

* Remove useless code for TwitterRecSys (intel#129)

* add dockerfile for base/tensorflow/pytorch1.10/pytorch envs (intel#121)

* add dockerfile for pytorch_1.10 env

* refine pytorch dockerfile

* dockerfile for pytorch_1.10 finished

* add dockerfile for base env, pytorch_1.10 env, torchccl env

* refine dockerfile for torchccl

* dockerfile for torchccl finished

* refine torchccl dockerfile, validated in dlrm workload

* refine base/pytorch dockerfile

* dockerfile for base/tensorflow/pytorch1.10/torchccl

* update dockerfile for tensorflow

* dockerfile for tensorflow has been finished

* refine aidk dockerfiles

* refine aidk dockerfiles

* update tensorflow dockerfile

* refined dockerfiles, test with wnd/dien/dlrm passed

* reset entrypoint value

* add dockerfile README.md

* add ssh service for dockerfiles

* update jenkinsfile

* trigger test for dockerfile integration

* trigger test for rnnt and minigo

* update dockerfile ssh keys

* update jenkinsfile and minigo test script

* update jenkinsfile

* update README.md

* Move AIDK to second-level folder for python setup

Signed-off-by: Xue, Chendi <chendi.xue@intel.com>

* Reorg AIDK folder for pip install and add a development API

Signed-off-by: Xue, Chendi <chendi.xue@intel.com>

* clean modelzoo for v0.2

Signed-off-by: Xue, Chendi <chendi.xue@intel.com>

* Clean SDA codes for removed models

Signed-off-by: Xue, Chendi <chendi.xue@intel.com>

* Remove conf and run scripts for v0.2 models

Signed-off-by: Xue, Chendi <chendi.xue@intel.com>

* [Minigo]third_party separation refine (intel#132)

* refine minigo patch script

* Update patch_minigo.sh

* refine minigo.patch

* [DIEN] make dien as third party + patch (intel#135)

* dien patch

Signed-off-by: Xue, Chendi <chendi.xue@intel.com>

* [dlrm]Add DLRM  patch for v0.1  (intel#134)

* Add dlrm model to v0.1 modelzoo

* remove dlrm folder

* modify dlrm advisor and dlrm model, test passed

* change the patch order

Co-authored-by: Chendi.Xue <chendi.xue@intel.com>

* [wnd] patch for v0.1 (intel#133)

* separate wnd third party code

* fix format issue in patch

* pass model path

Co-authored-by: Chendi.Xue <chendi.xue@intel.com>

* [Bert] patch for v0.2 (intel#136)

* add IntelAI model into modelzoo/thirdparty

* add bert patch

* fix the ip address used in run_bert

* refine patch

* refine bert patch shell

Co-authored-by: tianyi1 <liutianyi@intel.com>
Co-authored-by: Chendi.Xue <chendi.xue@intel.com>

* add rnnt patch (intel#140)

* [v0.2 branch]Refine Dockerfile and CICD (intel#139)

* add IntelAI model into modelzoo/thirdparty

* add AIDK CLI custom_result_path support;
refactor CICD and Jenkinsfile;
integrate dockerfile with jenkins and enable all models;
refine Jenkins README.md and Dockerfile README.md

* add bert patch

* update wnd test conf

* [Minigo]third_party separation refine (intel#132)

* refine minigo patch script

* Update patch_minigo.sh

* refine minigo.patch

* fix the ip address used in run_bert

* refine patch

* refine bert patch shell

* remove untracked models such as dlrm_torch110

* remove sensitive information in Dockerfiles

* JenkinsfileDis

* update jenkinsfile and fix dockerfile

* update jenkinsfile distributed

* fix the patch issues by using a stable intel models version

* add the newly updated bert.patch

* Update README.md

* Update README.md

* update README.md

Co-authored-by: tianyi1 <liutianyi@intel.com>
Co-authored-by: tianyi1 <tianyi.liu@intel.com>
Co-authored-by: Chendi.Xue <chendi.xue@intel.com>

* add cicd test file for denas cnn/vit

* Update DeNas README.md

* Update DeNas README.md

* Update README.md

* [rnnt] add normalization (intel#157)

* add normalization and merge config

* add readme

* update model advisor

* fix cicd issue

* trigger cicd

* [v0.2 branch]Add ResNet model (intel#158)

* [v0.2 branch]Add ResNet model

* refine code

* direct conda environment

* Refine tensorflow dockerfile and run shell

Co-authored-by: Tao He <tao1.he@intel.com>
Co-authored-by: Chendi.Xue <chendi.xue@intel.com>
Co-authored-by: Tianyi Liu <tianyi.liu@intel.com>
Co-authored-by: XinyaoWa <xinyao.wang@intel.com>
Co-authored-by: csdingbin <bin.ding@intel.com>
Co-authored-by: tianyi1 <liutianyi@intel.com>
Peach-He added a commit to Peach-He/e2eAIOK that referenced this pull request Jan 9, 2023
* fix dataset path issue for wnd (intel#105)

* fix model_path issue for dlrm (intel#109)

* add aidk cicd integrated test scripts (intel#108)

* add aidk cicd integrated test scripts

* refine aidk cicd scripts

* refine aidk cicd scripts

* refine aidk cicd integrated test scripts, add test for spark and recdp

* refine aidk cicd integrated test scripts for codestyle check

* refine cicd scripts and make dien full run with small dataset

* refine cicd scripts, save dlrm/dien/wnd full run results

* refine cicd scripts, make wnd full run with small dataset

* refine cicd scripts, add Docker file for package install and conf

* refine cicd test scripts and make dlrm full run with small dataset

* Update README.md

* refine cicd test scripts to enable dlrm full run with small dataset

* refine cicd scripts

* refine cicd scripts

* Create README.md

* modify README.md for CICD (Test PR without Tag) (intel#111)

* modify README.md for CICD

* Update README.md

* Jenkins CICD server has been linked with Github PR (intel#110)

* Update README.md for CICD

* Opensource code security related fix (intel#117)

* fix for rnnt

* command injection fix

* fix exclude model zoo

* fix exclude model zoo amend

* fix bandit

* revert model zoo checkmarx

* add rnnt patch

* Revert "add rnnt patch"

This reverts commit 07ec51396b0d0972dd4ab7b9227d4af781edd573.

* Revert "fix bandit"

This reverts commit 09ec4d8851fc5b922ac99813d31662fe7758ab7b.

* Use read-only in docker run (intel#113)

Signed-off-by: Xue, Chendi <chendi.xue@intel.com>

* Add CICD test for DLRM/WnD distributed training test (intel#118)

* add cicd test for DLRM/WnD distributed training test

* refine cicd for dlrm train script

* add Jenkinsfile for CI/CD distributed training

* refine jenkinsfile

* Update Jenkinsfile

* refine Jenkinsfile

* update jenkinsfile

* update cicd config file

* refine CICD scripts for distributed train

* Update Jenkinsfile

* add sigopt switch for single-node dlrm/wnd/dien

* update jenkinsfile

* update jenkinsfile

* refine cicd scripts

* refine cicd scripts

* refine cicd scripts

* refine cicd scripts

* add dlrm torch1.10 test script

* refine dlrm torch1.10 test script

* update spark hostname

* refine cicd scripts

* add dlrm torch110 distributed training test script

* dlrm torch110 single test and distributed test scripts finished

* update cicd scripts

* update jenkinsfile, remove sensitive token info

* refine cicd scripts

* use Jenkinsfile for test config of single node training

* update jenkinsfile

* update test scripts and jenkinsfile

* update jenkinsfile

* update jenkinsfile, do not use sigopt by default

* update cicd scripts

* update JenkinsfileDistributed

* update jenkins file

* update jenkins file

* update jenkinsfiledistributed

* update jenkins file

* update jenkinsdistributed

* update jenkinsfile

* update jenkinsfile

* update cicd scripts

* update cicd scripts

* Refine CI/CD scripts (intel#125)

* reformat cicd scripts

* update Jenkinsfile

* update cicd scripts

* fix hydro interactive input bug

* update model reload script

* auto build and load docker

* update test dockerfile

* update jenkinsfile distributed

* update jenkinsfile distributed

* update jenkinsfile distributed

* refine cicd scripts

* update jenkinsfile distributed

* update jenkinsfile

* Update README.md

* refine cicd scripts

* update jenkins file

* add bert function (intel#120)

* add bert function

* add data and pre-trained models introduction in README

* add Bert Advisor function

* add jenkins_bert

* solve advisor model save path

* use save_path as model output_dir for storing model parameters

* add model evaluation function in the run_squad.py

* fix the predict file issues

* fix the f1 score save issues

* fix bert jenkins test issues

* add bert automated jenkins test function

* add jenkins test label

* add ignorefile and architecture picture

* fix test_path error in the jenkins test

* renew the gitignore file

* add cicd bert conf

* refine bert jenkins test python path issues

* add result dir to save different models and remove rm ops

* readd architecture figure

* fix the bert multi-saved result path issues

* readd architecture.jpg

* change configure file for single process

* Remove credentials in burgurking (intel#128)

Signed-off-by: Xue, Chendi <chendi.xue@intel.com>

* [rnnt] add rnnt cicd (intel#126)

* add rnnt to model zoo

* add rnnt advisor

* launch rnnt with ipex launch script

* apply sigopt params

* add running script

* update dataset path

* persist training result

* add SDA example

* update conda env

* update launch scripts and disable ipex

* add training time threshold to stop training

* single node inference

* distribute inference

* training move model config to cmd args

* update rnnt advisor for training

* add inference

* move stack time factor to arguments

* save ckpt at last, save trace to output dir

* [inference] add ema, move model config to args

* [train] add config for rnnt advisor

* add gru support

* fix distribute log issue

* add gru to config

* add cicd

* revert changes

* update model save path

* add pytorch profiling check

* add log check for cicd

* add modelzoo for minigo (intel#116)

* add modelzoo for minigo workload

* Update dual_net.py

* Update build_gcc.sh

* update minigo README for performance dashboard

* Delete run_docker.sh

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Create build_env_script.sh

* Update build_env_script.sh

* Update README.md

* refine minigo rm operation, check dir before rm dir

* Update README.md

* Update README.md

* Update README.md

* add gitignore file for modelzoo/minigo

* clean up tmp files when train stopped

* compute winrate at each training loop

* [refined] compute winrate at each training iteration

* [fix] minigo can run with specified winrate requirement

* redefine max iteration number

* merge bert code and resolve conflicts

* Revert "merge bert code and resolve conflicts"

This reverts commit e2f31e434f506d8b639d471d7af919dee30b2b3a.

* remove untracked files

* add modeladvisor for minigo

* update minigo advisor

* update minigo modeladvisor

* refine minigo model advisor, can run w/wo sigopt

* remove minigo original code, use git patch instead

* refine patch file for minigo

* update minigo model advisor and patch file

* add jenkinsfile for minigo auto-test config

* hydroai_defaults_minigo_example.conf

* update jenkinsfile

* update minigo cicd test script and jenkinsfile

* update jenkinsfile

* update jenkinsfile for minigo

* update jenkinsfile

* refine minigo patch file

* update jenkinsfile

* update minigo test script

* [fix] cleanWs before checkout scm

* [refine] use modelzoo/third_party/mlperf_v1.0 as submodule and apply minigo patch

* update minigo patch file

* Update Jenkinsfile

* Remove useless code for TwitterRecSys (intel#129)

* add dockerfile for base/tensorflow/pytorch1.10/pytorch envs (intel#121)

* add dockerfile for pytorch_1.10 env

* refine pytorch dockerfile

* dockerfile for pytorch_1.10 finished

* add dockerfile for base env, pytorch_1.10 env, torchccl env

* refine dockerfile for torchccl

* dockerfile for torchccl finished

* refine torchccl dockerfile, validated in dlrm workload

* refine base/pytorch dockerfile

* dockerfile for base/tensorflow/pytorch1.10/torchccl

* update dockerfile for tensorflow

* dockerfile for tensorflow has been finished

* refine aidk dockerfiles

* refine aidk dockerfiles

* update tensorflow dockerfile

* refined dockerfiles, test with wnd/dien/dlrm passed

* reset entrypoint value

* add dockerfile README.md

* add ssh service for dockerfiles

* update jenkinsfile

* trigger test for dockerfile integration

* trigger test for rnnt and minigo

* update dockerfile ssh keys

* update jenkinsfile and minigo test script

* update jenkinsfile

* update README.md

* Move AIDK to second-level folder for python setup

Signed-off-by: Xue, Chendi <chendi.xue@intel.com>

* Reorg AIDK folder for pip install and add a development API

Signed-off-by: Xue, Chendi <chendi.xue@intel.com>

* clean modelzoo for v0.2

Signed-off-by: Xue, Chendi <chendi.xue@intel.com>

* Clean SDA codes for removed models

Signed-off-by: Xue, Chendi <chendi.xue@intel.com>

* Remove conf and run scripts for v0.2 models

Signed-off-by: Xue, Chendi <chendi.xue@intel.com>

* [Minigo]third_party separation refine (intel#132)

* refine minigo patch script

* Update patch_minigo.sh

* refine minigo.patch

* [DIEN] make dien as third party + patch (intel#135)

* dien patch

Signed-off-by: Xue, Chendi <chendi.xue@intel.com>

* [dlrm]Add DLRM  patch for v0.1  (intel#134)

* Add dlrm model to v0.1 modelzoo

* remove dlrm folder

* modify dlrm advisor and dlrm model, test passed

* change the patch order

Co-authored-by: Chendi.Xue <chendi.xue@intel.com>

* [wnd] patch for v0.1 (intel#133)

* separate wnd third party code

* fix format issue in patch

* pass model path

Co-authored-by: Chendi.Xue <chendi.xue@intel.com>

* [Bert] patch for v0.2 (intel#136)

* add IntelAI model into modelzoo/thirdparty

* add bert patch

* fix the ip address used in run_bert

* refine patch

* refine bert patch shell

Co-authored-by: tianyi1 <liutianyi@intel.com>
Co-authored-by: Chendi.Xue <chendi.xue@intel.com>

* add rnnt patch (intel#140)

* [v0.2 branch]Refine Dockerfile and CICD (intel#139)

* add IntelAI model into modelzoo/thirdparty

* add AIDK CLI custom_result_path support;
refactor CICD and Jenkinsfile;
integrate dockerfile with jenkins and enable all models;
refine Jenkins README.md and Dockerfile README.md

* add bert patch

* update wnd test conf

* [Minigo]third_party separation refine (intel#132)

* refine minigo patch script

* Update patch_minigo.sh

* refine minigo.patch

* fix the ip address used in run_bert

* refine patch

* refine bert patch shell

* remove untracked models such as dlrm_torch110

* remove sensitive information in Dockerfiles

* JenkinsfileDis

* update jenkinsfile and fix dockerfile

* update jenkinsfile distributed

* fix the patch issues by using a stable intel models version

* add the newly updated bert.patch

* Update README.md

* Update README.md

* update README.md

Co-authored-by: tianyi1 <liutianyi@intel.com>
Co-authored-by: tianyi1 <tianyi.liu@intel.com>
Co-authored-by: Chendi.Xue <chendi.xue@intel.com>

* [rnnt] add normalization (intel#157)

* add normalization and merge config

* add readme

* update model advisor

* fix cicd issue

* trigger cicd

* [v0.2 branch]Add ResNet model (intel#158)

* [v0.2 branch]Add ResNet model

* refine code

* direct conda environment

* Refine tensorflow dockerfile and run shell

* fix log init issue (intel#163)

* [v0.2 branch] Apply latest MiniGo optimizations (intel#164)

* minigo devel initial config

* enable multiple selfplay games on single physical core

* apply minigo latest optimization

* apply latest minigo optimization

* add example script and toggle jenkins

* fix minigo

* [v0.2 branch] fix the Bert patch issues (intel#147)

* add IntelAI model into modelzoo/thirdparty

* add bert patch

* fix the ip address used in run_bert

* refine patch

* refine bert patch shell

* fix the patch issues by using a stable intel models version

* add the newly updated bert.patch

* refine early stop for bert

* add one blank line in script for triggering the jenkins

Co-authored-by: tianyi1 <liutianyi@intel.com>
Co-authored-by: Chendi.Xue <chendi.xue@intel.com>

* add early stop into the bert patch (intel#166)

* [v0.2 branch] Add the lamb optimizations into the Bert patch (intel#171)

* add early stop into the bert patch

* refine the inconsistent loss and add lamb optimization

* [v0.2] optimize hyperparam for minigo fastplay and readout (intel#170)

* tune minigo fastplay and readout hyperparam

* small fix for minigo

* [v0.2 branch] Bump pytorch1.10 dockerfile base image to latest (intel#179)

* Aligned Denas DockerFile to latest aitoolkit docker (intel#177)

Signed-off-by: Xue, Chendi <chendi.xue@intel.com>

* bump Pytorch110 dockerfile base oneapi-aikit image

* refine jenkins file

* apply v0.1 latest dockerfile changes to v0.2

* fix test script

* fix cicd scripts

* refine and format dockerfiles

* refine jenkinsfile

* fix TF dockerfile

* refine PyTorch dockerfiles

* Update README.md

* refine jenkins file

* refine pytorch dockerfile

* refine jenkins file

* fix patch_bert.sh patch file

* fix jenkins file

* fix jenkins config

* fix jenkins file

* fix bert patch issue, trigger new jenkins build

* trigger new build and abort previous build, test robustness

* fix jenkins abort stability issue

Co-authored-by: Chendi.Xue <chendi.xue@intel.com>

* [v0.2]add attention block layer to ResNet model (intel#181)

* add attention block layer to resnet module

* test

* remove bf16

* [v0.2 branch] Optimize minigo mcts hyperparams (intel#190)

* optimize minigo mcts hyperparams

* update jenkinsfile for minigo

* refine the early stop in the distributed training (intel#194)

* [v0.2 branch] refix the inconsistent early stop issues by using step threshold (intel#205)

* refix the inconsistent early stop issues by using step threshold

* refine patch_bert.sh

* fix the unexpected stop issues and add bf16 optimization (intel#210)

* reduce bs for cicd with dummy dataset (intel#218)

* [v0.2 branch] add cicd distributed test for v0.2 workloads (intel#214)

* add rnnt distributed test

* fix rnnt distributed test script

* add resnet distributed test scripts

* add bert distributed test

* fix bert conf file

* add minigo distributed test

* fix minigo distributed test script

* fix minigo distributed test scripts

* update conf file for distributed train

* update jenkinsfile

* update jenkinsfile

* fix JenkinsfileDistributed

* fix JenkinsfileDistributed

* fix JenkinsfileDistributed

* fix JenkinsfileDistributed

* fix JenkinsfileDistributed

* fix JenkinsfileDistributed

* update JenkinsfileDistributed

* fix distributed test scripts

* fix distributed test scripts

* add distributed test for minigo

* fix rnnt distributed test scripts

* fix JenkinsfileDistributed

* update rnnt conf for cicd distributed train

* update JenkinsfileDistributed

* refine cicd scripts for v0.2 branch (intel#222)

* refine cicd scripts for v0.2 branch

* fix bert cicd distributed training config

* refine denas cicd test scripts

* minor fix for denas cicd test scripts

Signed-off-by: Xue, Chendi <chendi.xue@intel.com>
Co-authored-by: Tao He <tao1.he@intel.com>
Co-authored-by: Chendi.Xue <chendi.xue@intel.com>
Co-authored-by: Tianyi Liu <tianyi.liu@intel.com>
Co-authored-by: XinyaoWa <xinyao.wang@intel.com>
Co-authored-by: csdingbin <bin.ding@intel.com>
Co-authored-by: tianyi1 <liutianyi@intel.com>

1 participant