ytaous commented May 14, 2020

Description: Include the missing CPU/memory usage metrics in training runs.

Motivation and Context

  • Leverage the same utilities from the perf test project.
  • Ingest the new data into MySQL to generate new reports.

ytaous added the training label (issues related to ONNX Runtime training; typically submitted using template) May 14, 2020
ytaous requested review from SherlockNoMad and edgchen1 May 14, 2020 19:47
ytaous requested a review from a team as a code owner May 14, 2020 19:47
edgchen1 previously approved these changes May 15, 2020
ytaous requested a review from edgchen1 May 15, 2020 04:15
ytaous merged commit bc441b7 into master May 15, 2020
ytaous deleted the ettao/mem-cpu branch May 15, 2020 19:29
fs-eire added a commit that referenced this pull request May 18, 2020
commit a3120b3
Merge: c919f84 0d11649
Author: Yulong Wang <f.s@qq.com>
Date:   Sun May 17 16:35:27 2020 -0700

    Merge remote-tracking branch 'origin/master' into fs-eire/nodejs-cmake-debug

commit 0d11649
Author: Wei-Sheng Chin <wschin@outlook.com>
Date:   Sun May 17 14:08:33 2020 -0700

    Address comments from #3823 and polish code (#3964)

    * Address comments from #3823 and polish code

    * One line

commit c919f84
Author: Yulong Wang <yulongw@microsoft.com>
Date:   Sun May 17 13:26:52 2020 -0700

    string strip ort_version

commit 40a9dd7
Author: Yulong Wang <yulongw@microsoft.com>
Date:   Sun May 17 10:16:58 2020 -0700

    fix target dir in cmakefile

commit 4ff73d0
Author: Prabhat <prabhat.roy@microsoft.com>
Date:   Sun May 17 14:06:55 2020 +0530

    Fix python pkg permission issue (#3957)

    * Fix python pkg permission issue

    * Run chown with sudo

    * Add workspace clean to arm pipeline

    * Run docker as current user

commit 07e9a4c
Author: Tianlei Wu <tlwu@microsoft.com>
Date:   Sat May 16 20:17:40 2020 -0700

    Update benchmark to reflect those used in our latest results (#3967)

    Update optimizer for GPT2 models exported from PyTorch 1.5.
    Update benchmark to use GPT2 models without Past State inputs/outputs
    Update bert_perf_test to allow setting omp_num_threads etc to test only one setting

commit 56700be
Author: Tianlei Wu <tlwu@microsoft.com>
Date:   Sat May 16 20:13:24 2020 -0700

    Add example of python code to readme of transformers tools (#3966)

    * Use shorter name for tools
    * Use optimizer_cli
    * Add comments about -i parameter

commit 9bc9d08
Author: Yulong Wang <yulongw@microsoft.com>
Date:   Sat May 16 12:29:24 2020 -0700

    try fix build_dir

commit 769c11f
Author: Tianlei Wu <tlwu@microsoft.com>
Date:   Sat May 16 11:13:34 2020 -0700

    Update doc for transformers tools (#3963)

    * update readme for onnxruntime-tools package
    * update license section in benchmark

commit a296b16
Author: M. Zeeshan Siddiqui <mzs@microsoft.com>
Date:   Sat May 16 00:33:25 2020 -0700

    Prevent divide by zero in CUDA implementation of SoftmaxCrossEntropyLossGrad. (#3962)

commit 132ce3a
Author: KeDengMS <kedeng@microsoft.com>
Date:   Fri May 15 23:41:29 2020 -0700

    Fixes for quantizing a BERT from HuggingFace (#3939)

    * Fixes for quantizing a BERT from HuggingFace

    * Address CR and some other minor fixes

commit 33208c9
Author: Wei-Sheng Chin <wschin@outlook.com>
Date:   Fri May 15 18:27:19 2020 -0700

    Modify Pipeline Facilities to Fix PipeDream Deadlock (#3823)

    * Prepare utils for adding Wait's and Record's

    * Have a running PipeDream

    * Add comments

    * Polish comments

    * Clean code

    * Fix test

    * Polish names

    * Polish names

    * Remove debug headers

    * Fix a shape inference bug (not related to pipeline code)

    * Fix a warning

    * Address some comments

    * Address comments

    * Only touch consumers of outputs when re-wire edges

commit 560603c
Author: Yulong Wang <yulongw@microsoft.com>
Date:   Fri May 15 15:19:13 2020 -0700

    fix build cmd

commit 999554c
Author: edgchen1 <18449977+edgchen1@users.noreply.github.com>
Date:   Fri May 15 13:34:18 2020 -0700

    CGManifest - add training entries and generate entries for submodules. (#3933)

    Add cgmanifest.json entries for training dependencies.
    Add script to generate git submodule cgmanifest.json entries.

commit bc441b7
Author: ytaous <4484531+ytaous@users.noreply.github.com>
Date:   Fri May 15 12:29:40 2020 -0700

    Add cpu/mem usage for perf metrics (#3947)

    * add cpu/mem usage

    * on comments

    * on comments

    * renaming

    Co-authored-by: Ethan Tao <ettao@microsoft.com>

commit 4150466
Author: Yulong Wang <yulongw@microsoft.com>
Date:   Fri May 15 12:00:14 2020 -0700

    fix NPM_CLI

commit be003db
Author: Tracy Sharpe <42477615+tracysh@users.noreply.github.com>
Date:   Thu May 14 23:14:15 2020 -0700

    Fix ACL build break (#3952)

commit 47ae969
Author: Pranav Sharma <prs@microsoft.com>
Date:   Thu May 14 21:27:46 2020 -0700

    Fix ordering of APIs. (#3951)

commit 9ef3768
Author: Hariharan Seshadri <shariharan91@gmail.com>
Date:   Thu May 14 19:21:40 2020 -0700

    Add test for If node with conditional branches only containing Constant nodes (#3949)

commit 38467f8
Author: Ryan Lai <rylai@microsoft.com>
Date:   Thu May 14 18:52:08 2020 -0700

    DirectML Nuget package has different time stamp than Native and Managed Nuget (#3950)

    * Fix DirectML nuget creation in Nuget pipeline

    * DirectML Nuget package has different timestamp

    * remove accidentally changed file

commit e6da594
Author: Jeff Bloomfield <38966965+jeffbloo@users.noreply.github.com>
Date:   Thu May 14 17:33:46 2020 -0700

    Update DML Nuget version and DML EP Doc (#3945)

    Update DML Nuget version and DML EP Doc

commit 782c6c2
Author: Tianlei Wu <tlwu@microsoft.com>
Date:   Thu May 14 15:32:59 2020 -0700

    Rename bert to transformers (#3946)

    * rename folder bert to transformers
    * rename bert_model_optimization.py to optimizer.py
    * update URL links in notebooks

commit 3c4f3d0
Author: Zhang Lei <zhang.huanning@hotmail.com>
Date:   Thu May 14 14:52:55 2020 -0700

    Implement QLinearLeakyRelu (#3648)

    * Implement QLinearRelu and its unit test.
    * Add logic to compute table during constructor when all parameters are constant.
    * Fix test case rounding result related with rounding mode.

commit 5e0928a
Author: Scott McKay <skottmckay@gmail.com>
Date:   Fri May 15 07:15:06 2020 +1000

    Enable running PEP8 on python scripts using flake8 (#3928)

    * Enable running PEP8 checks via flake8 as part of the build if flake8 is installed.
    Update scripts in \tools and \onnxruntime\python. Excluding \onnxruntime\python\tools which needs a lot more work to be PEP8 compliant. Also excluding orttraining\tools for the same reason.
    Install flake8 as part of the static_analysis build task in the Win-CPU CI so the checks are run in one CI build.
    Update coding standards doc.

commit 3981431
Author: Yulong Wang <yulongw@microsoft.com>
Date:   Thu May 14 12:37:32 2020 -0700

    [Node.js binding] fix linux build (#3927)

commit 50f798d
Author: Yufeng Li <liyufeng1987@gmail.com>
Date:   Thu May 14 12:02:28 2020 -0700

    support non-zero zero point for matmulinteger u8s8 (#3883)

    * support non-zero zero point for matmulinteger u8s8

commit 9c989c8
Author: Changming Sun <chasun@microsoft.com>
Date:   Thu May 14 11:43:06 2020 -0700

    Update build doc for cross-compiling (#3672)

commit cab2122
Author: manashgoswami <magoswam@microsoft.com>
Date:   Thu May 14 11:42:44 2020 -0700

    Updated TPN for OpenMPI and cleanup (#3932)

    * Update README.md

    * Update ReleaseManagement.md

    * Updated Third Party Notice for training feature

    Added Open MPI license

commit cba8bdc
Author: gwang-msft <62914304+gwang-msft@users.noreply.github.com>
Date:   Thu May 14 10:53:37 2020 -0700

    Make some compile change for Android NNAPI provider using DNNLibrary (#3935)

    * Change compile settings for NNAPI with DNNLib

    * update build.py

    * update build readme

commit 84c108a
Author: Prasanth Pulavarthi <prasantp@microsoft.com>
Date:   Thu May 14 07:35:23 2020 -0700

    link to folder instead of READMEs inside folder (#3938)

    otherwise hard to find the source code

commit 48f69cf
Author: Ryan Lai <rylai@microsoft.com>
Date:   Wed May 13 19:34:38 2020 -0700

    Fix DirectML nuget creation in Nuget pipeline (#3929)

commit f380460
Author: Chi Lo <54722500+chilo-ms@users.noreply.github.com>
Date:   Wed May 13 19:30:27 2020 -0700

    Update the build steps to support ORT on Jetson (#3869)

    * Update BUILD doc for ARM64 build for TensorRT support on Jetson device

    * minor revision

    * JetPack 4.4 is in developer preview stage, so we suggest using JetPack 4.3

commit 93eb9bc
Author: ytaous <4484531+ytaous@users.noreply.github.com>
Date:   Wed May 13 14:15:17 2020 -0700

    Add yaml/perf scripts for new perf test pipeline (#3909)

    * yaml/perf scripts for new pipeline

    * yaml/perf scripts for new pipeline

    * remove unused imports

    * testing some comments change

    * testing some comments change

    * testing jdbc

    * testing jdbc

    * testing jdbc

    * exclude pwd from jdbc properties

    * exclude pwd from jdbc properties

    * namedtuple

    * on comments

    Co-authored-by: Ethan Tao <ettao@microsoft.com>

commit e86214e
Author: Changming Sun <chasun@microsoft.com>
Date:   Wed May 13 11:52:59 2020 -0700

    Fix the tensorflow performance test (#3847)

commit 7c774e9
Author: Yufeng Li <liyufeng1987@gmail.com>
Date:   Wed May 13 11:16:37 2020 -0700

    support quantization of optimized model with ir<4 (#3853)

commit 25257a6
Author: Prabhat <prabhat.roy@microsoft.com>
Date:   Wed May 13 23:20:29 2020 +0530

    Added onnxruntime aarch64 wheel to pypi publishing pipeline (#3903)

    * Added onnxruntime aarch64 wheel to pypi publishing pipeline

    * Support nightly build flag

    * Add support for nightly build

commit 1c1685a
Author: Jeff Bloomfield <38966965+jeffbloo@users.noreply.github.com>
Date:   Wed May 13 10:13:16 2020 -0700

    Fix error handling in LearningModelSession.cpp (#3920)

commit 385073e
Author: Tiago Koji Castro Shibata <ticastro@microsoft.com>
Date:   Wed May 13 09:14:55 2020 -0700

    Fix DmlCopyTensor test (#3923)

    * Fix heap corruption

    * Cleanup

commit eab61e8
Author: Zhang Lei <zhang.huanning@hotmail.com>
Date:   Tue May 12 20:38:26 2020 -0700

    Fix quantization tool bugs when model nodes have no name. (#3854)

    Fix bugs when model nodes have no name.

commit 9b5daa2
Author: liqunfu <liqfu@microsoft.com>
Date:   Tue May 12 18:11:25 2020 -0700

    patch torch onnx opset 10 (#3910)

    patch pytorch to export onnx nll_loss opset version 10. add mnist test to covert onnx opset version 10.

commit 7b858d6
Author: Ori Levari <ori.levari@microsoft.com>
Date:   Tue May 12 17:22:47 2020 -0700

    Various changes for automated downlevel test pipeline (#3901)

    Co-authored-by: Ori Levari <orlevari@microsoft.com>

commit 3065219
Author: Hariharan Seshadri <shariharan91@gmail.com>
Date:   Tue May 12 17:07:06 2020 -0700

    Changes related to the release binaries requiring Visual C++ 2019 runtime (#3871)

commit bccbdd0
Author: Xiang Zhang <xianz@microsoft.com>
Date:   Tue May 12 15:46:46 2020 -0700

    User/xianz/enable batch tests (#3914)

    * enable batch tests in winml_image_test

    * copy batchGroundTruth folder

    * skip GPU tests when GPU is unavailable

commit 18dc0ec
Author: Dmitri Smirnov <yuslepukhin@users.noreply.github.com>
Date:   Tue May 12 15:31:20 2020 -0700

    Rework jar by creating os-arch folders (#3849)

    Detect os and arch and move the artifacts to a new folder.
    Remove unnecessary jars so we can focus on those we publish.
    Add signing
    Make signature simpler.
    Fix indent.
    Halt on 32-bit arch.
    Credits: @Craigacp

commit c00945a
Author: Hariharan Seshadri <shariharan91@gmail.com>
Date:   Tue May 12 14:43:32 2020 -0700

    Build ORT by default for Mac OS X versions 10.12+ (#3626)

commit 99415f0
Author: Scott McKay <skottmckay@gmail.com>
Date:   Wed May 13 07:18:32 2020 +1000

    Fix bug where linear_output_ is not cleared when linear_before_reset is true and no bias input is provided. Requires a batch size of 3 or more to trigger if initial_h is not provided. (#3893)

commit 475ea38
Author: Andrews548 <32704142+Andrews548@users.noreply.github.com>
Date:   Wed May 13 00:06:48 2020 +0300

    Fix ACL EP convolution-activation fusion optimization (#3896)

    Co-authored-by: Andrei-Alexandru <andrei-alexandru.avram@nxp.com>

commit f170f31
Author: Jeff Bloomfield <38966965+jeffbloo@users.noreply.github.com>
Date:   Tue May 12 14:04:09 2020 -0700

    Extend workaround with input name matching in DML fused graph kernel (#3918)

commit 0f82b42
Author: Bowen Bao <bowbao@microsoft.com>
Date:   Tue May 12 13:32:27 2020 -0700

    Ensure pt model is set to cpu in ort_trainer (#3867)

    * Ensure pt model is set to cpu in ort_trainer

    * add note comment

commit 2949617
Author: Tianlei Wu <tlwu@microsoft.com>
Date:   Tue May 12 12:26:22 2020 -0700

    Add Benchmark Script for Bert Models (#3829)

    Add benchmark script for Transformer models
    * Set intra_op_num_threads=1 for cpu (version <= 1.2.0)
    * Add percentiles for latency
    * torch.set_num_threads (for intra op) to get fair comparison
    * Allow export ONNX model with specified number of inputs
    * Add fusion statistics
    * Install transformers from source