Add benchmarks to CI #479

Summary:

## Types of changes

- [ ] Bug fix (non-breaking change which fixes an issue)
- [X] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)
- [ ] Docs change / refactoring / dependency upgrade

## Motivation and Context / Related issue

There's a task pytorch#368 for committing benchmark code. In this change I add these benchmarks into CI integration tests.

To choose thresholds I ran the benchmarks locally on all the layers with (batch size: 16, num_runs: 100, num_repeats: 20, forward_only: False), and generated the following report:

| base_layer | memory* control | memory* dp | memory* dp/control | memory* gsm | memory* gsm/control | runtime control | runtime dp | runtime dp/control | runtime gsm | runtime gsm/control |
|--------------|---------|--------|------------|--------|-------------|------------------------|----------------------|--------------------|------------------------|--------------------|
| conv | 0.0 | | | 0.0 | | 2.021756922606001 | | | 3.2889059911645036 | 1.6267563891534373 |
| embedding | 0.0 | | | 0.0 | | 0.002484286398502263 | | | 0.013664713416999803 | 5.5004581698946 |
| groupnorm | 0.0 | | | 0.0 | | 0.0001871487290072764 | | | 0.00043170701800136156 | 2.306759016165034 |
| gru | 0.0 | 0.0 | | 0.0 | | 0.045029744959007065 | 0.057370035271503174 | 1.2740475284443677 | 0.2402042072270033 | 5.334345274344187 |
| instancenorm | 0.0 | | | 0.0 | | 0.004493124293996517 | | | 0.006058429501005777 | 1.3483779002287433 |
| layernorm | 0.0 | | | 0.0 | | 0.00011227587499979562 | | | 0.0002241125804985131 | 1.9960884784814286 |
| linear | 0.0 | | | 0.0 | | 0.001010556231000001 | | | 0.003052972127999998 | 3.021080900148341 |
| lstm | 0.0 | 0.0 | | 0.0 | | 0.052634652085002925 | 0.06508583683050075 | 1.2365586975931682 | 0.2982182763324963 | 5.665816425477371 |
| mha | 0.0 | 0.0 | | 0.0 | | 0.018872260358001765 | 0.01870937360499738 | 0.9913689854890476 | 0.02688384014700477 | 1.424516175435558 |
| rnn | 0.0 | 0.0 | | 0.0 | | 0.01576623683249454 | 0.02184348723049516 | 1.3854597937711604 | 0.10178373254250346 | 6.455803856296582 |

(*) This report wasn't generated on a machine with CUDA so the memory wasn't measured. Will update later when it runs in CI on a GPU machine.

Using the report and section 3 in the [paper](https://arxiv.org/pdf/2109.12298.pdf), I parameterised the runtime and memory thresholds for different layers.

## How Has This Been Tested (if it applies)

I ran the jobs locally and generated reports.

## Checklist

- [X] The documentation is up-to-date with the changes I made.
- [X] I have read the **CONTRIBUTING** document and completed the CLA (see **CONTRIBUTING**).
- [ ] All tests passed, and additional code has been covered with new tests.

Pull Request resolved: pytorch#479

Differential Revision: D38999201

Pulled By: moaradwan

fbshipit-source-id: 3d02931970e39ea331674c9f0676db9e22c5edaa
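For readers who want to see how a per-layer threshold check like the one described under "Motivation and Context" might work, here is a minimal sketch. It is not the code added by this PR: the function names and the threshold values are hypothetical, chosen only for illustration. The runtimes are copied from the `control` and `gsm` runtime columns of the report above.

```python
import math

# Runtimes copied from the report above: layer -> (control, gsm).
REPORT_RUNTIMES = {
    "conv": (2.021756922606001, 3.2889059911645036),
    "linear": (0.001010556231000001, 0.003052972127999998),
    "mha": (0.018872260358001765, 0.02688384014700477),
}

# Hypothetical per-layer limits on gsm/control; NOT the values chosen in the PR.
HYPOTHETICAL_MAX_RATIO = {"conv": 2.0, "linear": 4.0, "mha": 2.0}


def runtime_ratio(control: float, treated: float) -> float:
    """Return treated/control, the slowdown factor relative to the baseline."""
    if control <= 0:
        raise ValueError("control runtime must be positive")
    return treated / control


def check_thresholds(runtimes, max_ratio):
    """Return {layer: ratio} for every layer whose ratio exceeds its limit."""
    failures = {}
    for layer, (control, treated) in runtimes.items():
        ratio = runtime_ratio(control, treated)
        if ratio > max_ratio[layer]:
            failures[layer] = ratio
    return failures


if __name__ == "__main__":
    failures = check_thresholds(REPORT_RUNTIMES, HYPOTHETICAL_MAX_RATIO)
    if failures:
        raise SystemExit(f"Runtime thresholds exceeded: {failures}")
    print("All layers within the (hypothetical) runtime thresholds.")
```

Parameterising the limit per layer matters because the ratios in the report differ widely: roughly 1.4 for `mha` versus roughly 6.5 for `rnn`, so a single global limit would be either too loose or too strict.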

Commits on Aug 25, 2022