Remove flatten params #2761

mvpatel2000 · 2023-12-05T12:42:17Z

What does this PR do?

Remove flatten params, since it was removed from torch.

Along for the ride:

Update codeowners to simplify
Speed up ICL tests
Disable the MosaicML logger, as it causes flaky tests

dakinggg

Was there any logic to the new max seq lens you set in tests? They seem a bit random

tests/fixtures/autouse_fixtures.py

mvpatel2000 · 2023-12-05T23:39:09Z

Was there any logic to the new max seq lens you set in tests? They seem a bit random

I picked 64 to start and then raised the values as required when tests failed because the seq len was too small. It should be principled, and we should prune datasets to support a fixed size, but unblocking CI/CD is more important.
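For reference, a minimal sketch of what "pruning datasets to support a fixed size" could look like. This is not code from this PR: the helper name `prune_to_max_seq_len`, the `input_ids` key, and the toy data are all assumptions made for illustration, and only the starting value of 64 comes from the comment above.

```python
from typing import Dict, List

# Starting value mentioned in the comment above; the PR raised it per test as needed.
MAX_SEQ_LEN = 64


def prune_to_max_seq_len(examples: List[Dict[str, List[int]]],
                         max_seq_len: int) -> List[Dict[str, List[int]]]:
    """Keep only the examples whose token count fits within max_seq_len.

    Hypothetical helper: assumes each example stores its tokens under "input_ids".
    """
    return [ex for ex in examples if len(ex["input_ids"]) <= max_seq_len]


# Toy tokenized dataset (made-up data): sequences of 10, 64, and 100 tokens.
toy_dataset = [
    {"input_ids": list(range(10))},
    {"input_ids": list(range(64))},
    {"input_ids": list(range(100))},
]

pruned = prune_to_max_seq_len(toy_dataset, MAX_SEQ_LEN)
print(f"kept {len(pruned)} of {len(toy_dataset)} examples")
```

With a filter like this applied to the test datasets, every test could share one sequence length instead of each test's value being raised until it stops failing.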

* Bump torch to 2.1.1 version (#2717) * Add more info when run doesnt complete (#2751) * Lower sequence generation length on code gen to be dependent on max canonical solution length (#2682) * sequentialize generations_per_sample * fix bug * lower generation length * lower generation length * lower generation length * fix gen len * restore * restore * restore * fix tests * fix test * Remove flatten params (#2761) * remove flatten params * simplify tests * simplify tests * clean * fix more tests * rerun tests * speed up icl * fix tests * fix cpu tests * add more fixtures * fix tests * token count * fix vocab size * remove logger * remove clears * fix mosaicml logger * change codeowners * clean up codeowners * rerun tests * shrink dataset * fix tests * fix test * rerun tests * fix tests * fix tests * fix seed * set to 0 * rerun tests * rerun tests * change threshold * rerun tests * rerun tests * logs * remove changes * logs * logs * remove logs * rerun tests * rerun tests * logs * rerun * logs * rerun * rerun * rerun tests * many more logs * rerun tests * strip logs * enable tests * remove opt * rerun tests * add test * lint * rerun tests * fix lint * lint * filter warnings * rerun tests * fixture * add fixture * change * logs * rerun tests * add logs * rerun tests * fixture * lint * lint * rerun tests * fix ignore warning * logs * regex * regex * regex * fix * logs * reformat * fix lint (#2767) * lint (#2768) * Use time.tokens for speedmonitor instead of dataset length (#2762) * change token math * tokens * add test * fix tests * remove exception (#2759) * time to clean up time parsing 😉 (#2770) * time to clean up time parsing * fix type error * updates * Upgrade RunConfig compute specification (#2772) * Upgrade RunConfig compute specification * extra cluster * Use async logging in MLflowLogger (#2693) * async mlflow logging Signed-off-by: chenmoneygithub <chen.qian@databricks.com> * small fix Signed-off-by: chenmoneygithub <chen.qian@databricks.com> * clean up * 
fix test * fix tests * deflake * pin mlflow --------- Signed-off-by: chenmoneygithub <chen.qian@databricks.com> * Fix FSDP _param_init_fn to not reinit parameters multiple times (#2765) * Gate FSDP param init test on torch 2.1 (#2774) * Parallelize OCI multipart download (#2750) * [UCVolumes] Add support for list API (#2769) * Add the memory timeline profiling support through the PyTorch profiler. (#2771) * v1 * fix issues * add logs * change names * comment * add device * uncomment original trace * add custome plot * fix pyright * Update composer/profiler/torch_profiler.py Co-authored-by: Charles Tang <j316chuck@users.noreply.github.com> * address comments * fix code check * fix formatting * address comments * add unit test * fix check * fix check * fix check * fix check * fix print * add test comment * add test comment --------- Co-authored-by: Mihir Patel <mihir.v.patel7@gmail.com> Co-authored-by: Charles Tang <j316chuck@users.noreply.github.com> * Improve torch memory profiling arguments processing (#2777) * improve torch profile args * improve torch profile args * change default torch_prof_memory_filename * add memory profiling arg test * fix check * fix check * fix check * fix check * fix check * fix check * Add platform AWS and bump aws ofi nccl version (#2776) * Extend checkpoint loading to accept a validation function (#2726) * Fix checkpoint validation tests for torch 1.13 (#2779) * fix checkpoint validation tests for torch 1.13 * Fix * Bump version to 0.17.2 (#2780) * bump version * 0.17.2 * update matrix * bump transformers version (#2781) * Bump sphinxext-opengraph from 0.9.0 to 0.9.1 (#2784) Bumps [sphinxext-opengraph](https://github.com/wpilibsuite/sphinxext-opengraph) from 0.9.0 to 0.9.1. 
- [Release notes](https://github.com/wpilibsuite/sphinxext-opengraph/releases) - [Commits](wpilibsuite/sphinxext-opengraph@v0.9.0...v0.9.1) --- updated-dependencies: - dependency-name: sphinxext-opengraph dependency-type: direct:development update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Bump coverage[toml] from 7.3.0 to 7.3.3 (#2783) Bumps [coverage[toml]](https://github.com/nedbat/coveragepy) from 7.3.0 to 7.3.3. - [Release notes](https://github.com/nedbat/coveragepy/releases) - [Changelog](https://github.com/nedbat/coveragepy/blob/master/CHANGES.rst) - [Commits](nedbat/coveragepy@7.3.0...7.3.3) --- updated-dependencies: - dependency-name: coverage[toml] dependency-type: direct:development update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Update torch requirement from <2.1.2,>=1.13.1 to >=1.13.1,<2.1.3 (#2785) Updates the requirements on [torch](https://github.com/pytorch/pytorch) to permit the latest version. - [Release notes](https://github.com/pytorch/pytorch/releases) - [Changelog](https://github.com/pytorch/pytorch/blob/main/RELEASE.md) - [Commits](pytorch/pytorch@v1.13.1...v2.1.2) --- updated-dependencies: - dependency-name: torch dependency-type: direct:production ... 
Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * [UCVolumes] Rely on databricks-sdk auth for the right requirements (#2789) * Enable system metrics in mosaic mlflow logger (#2775) * Enable system metrics in mosaic mlflow logger * remove fixture * Update composer/loggers/mlflow_logger.py Co-authored-by: Mihir Patel <mihir.v.patel7@gmail.com> * Update composer/loggers/mlflow_logger.py Co-authored-by: Mihir Patel <mihir.v.patel7@gmail.com> * Update composer/loggers/mlflow_logger.py Co-authored-by: Mihir Patel <mihir.v.patel7@gmail.com> --------- Co-authored-by: Mihir Patel <mihir.v.patel7@gmail.com> Co-authored-by: Daniel King <43149077+dakinggg@users.noreply.github.com> * Update parse_uri (#2787) * default-no-memory-timeline (#2790) * Add eot token to ICL generate kwargs (#2782) * add custome gen kwargs and stopping on eos token * modify test * modify test * finish * finish * finish * finish * Add nightly image for torch 2.2.0 12-20-23 (#2791) * Add torch nightly 12-13 (#2792) * Add process group as arg to FSDP (#2794) * add test * only cast if PG is specified * add to docstring * filter warning * filter warning * docs * support lists * remove warnings * lint * hsdp monkeypatch * logs * change log * fix patch * typo * clean up logs * Bump coverage[toml] from 7.3.3 to 7.3.4 (#2798) Bumps [coverage[toml]](https://github.com/nedbat/coveragepy) from 7.3.3 to 7.3.4. - [Release notes](https://github.com/nedbat/coveragepy/releases) - [Changelog](https://github.com/nedbat/coveragepy/blob/master/CHANGES.rst) - [Commits](nedbat/coveragepy@7.3.3...7.3.4) --- updated-dependencies: - dependency-name: coverage[toml] dependency-type: direct:development update-type: version-update:semver-patch ... 
Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Fix load_ignore_keys with rng (#2803) * fix rng load * lint * Bump ipykernel from 6.26.0 to 6.28.0 (#2806) Bumps [ipykernel](https://github.com/ipython/ipykernel) from 6.26.0 to 6.28.0. - [Release notes](https://github.com/ipython/ipykernel/releases) - [Changelog](https://github.com/ipython/ipykernel/blob/main/CHANGELOG.md) - [Commits](ipython/ipykernel@v6.26.0...v6.28.0) --- updated-dependencies: - dependency-name: ipykernel dependency-type: direct:development update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Bump junitparser from 3.1.0 to 3.1.1 (#2805) Bumps [junitparser](https://github.com/weiwei/junitparser) from 3.1.0 to 3.1.1. - [Changelog](https://github.com/weiwei/junitparser/blob/master/CHANGELOG.md) - [Commits](weiwei/junitparser@3.1.0...3.1.1) --- updated-dependencies: - dependency-name: junitparser dependency-type: direct:development update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Bump pytest from 7.4.3 to 7.4.4 (#2807) Bumps [pytest](https://github.com/pytest-dev/pytest) from 7.4.3 to 7.4.4. - [Release notes](https://github.com/pytest-dev/pytest/releases) - [Changelog](https://github.com/pytest-dev/pytest/blob/main/CHANGELOG.rst) - [Commits](pytest-dev/pytest@7.4.3...7.4.4) --- updated-dependencies: - dependency-name: pytest dependency-type: direct:development update-type: version-update:semver-patch ... 
Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Avoid futures on close for MosaicML logger (#2804) * avoid futures on close * typo * logs * logs * check (#2812) * Better communication computation overlap (#2811) * patched torch * fixed torch imports * fixed torch imports * fixed torch imports * patching through composer * patching through composer * patching typingr * comment added * don't patch torch 2.1.0 * patch torch 2.1.1 and 2.2.0 * linting fix * Improve error message for speed monitor (#2801) * fix flops * stacklevel * bump torch version (#2814) * bump vision (#2815) * fix rng load (#2816) * Correct multi-unshard stream patching for torch 2.2.0dev, and stream waiting correctness. (#2817) * patched torch * fixed torch imports * fixed torch imports * fixed torch imports * patching through composer * patching through composer * patching typingr * comment added * don't patch torch 2.1.0 * patch torch 2.1.1 and 2.2.0 * linting fix * waiting on computation stream from unshard stream * waiting on computation stream from unshard stream * less waiting * no waiting * all unshard streams wait on computation stream now * 2.2.0 dev change * fix profiler (#2818) * Bump traitlets from 5.13.0 to 5.14.1 (#2822) Bumps [traitlets](https://github.com/ipython/traitlets) from 5.13.0 to 5.14.1. - [Release notes](https://github.com/ipython/traitlets/releases) - [Changelog](https://github.com/ipython/traitlets/blob/main/CHANGELOG.md) - [Commits](ipython/traitlets@v5.13.0...v5.14.1) --- updated-dependencies: - dependency-name: traitlets dependency-type: direct:development update-type: version-update:semver-minor ... 
Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * All unshard streams wait on computation every step (#2823) * patched torch * fixed torch imports * fixed torch imports * fixed torch imports * patching through composer * patching through composer * patching typingr * comment added * don't patch torch 2.1.0 * patch torch 2.1.1 and 2.2.0 * linting fix * waiting on computation stream from unshard stream * waiting on computation stream from unshard stream * less waiting * no waiting * all unshard streams wait on computation stream now * 2.2.0 dev change * correct waiting on computation stream * fsdp state typiung * patching root pre forward * patching root pre forward * fsdp state typing * patch forward * correct waiting * linting * Add encoding=utf-8 (#2824) * Fix import for daily test (#2826) * patched torch * fixed torch imports * fixed torch imports * fixed torch imports * patching through composer * patching through composer * patching typingr * comment added * don't patch torch 2.1.0 * patch torch 2.1.1 and 2.2.0 * linting fix * waiting on computation stream from unshard stream * waiting on computation stream from unshard stream * less waiting * no waiting * all unshard streams wait on computation stream now * 2.2.0 dev change * correct waiting on computation stream * fsdp state typiung * patching root pre forward * patching root pre forward * fsdp state typing * patch forward * correct waiting * linting * daily test change * daily test fix * [MLFlowObjectStore] [1/2] Base implementation for MLFlowObjectStore (#2802) * Implementation of MLFlowObjectStore * Update object store test settings * Import mlflow dependencies inline * Fix tests and ignore some pyright * Bugfix * Enforce experiment and run in get_artifact_path * Update placeholders * Make logs debug instead of info * Minor PR comments * MLflow casing * tracking_uri fixes * Update comments * Update placeholders * Fix 
tests * Fix pyright * Use tempfile for temp dirs * Read tracking uri env var directly * Remove dist from MLFlowObjectStore --------- Co-authored-by: Daniel King <43149077+dakinggg@users.noreply.github.com> * Remove fused layernorm (already deprecated for 2 versions) (#2827) * remove fused layernorm * remove import * remove import * remove * fix * remove docs * all * fix * filter warnings * norm * lint * refactor --------- Co-authored-by: Your Name <you@example.com> * checkpoint saver tracks all checkpoints/intervals in state (#2819) * checkpoint tracking state * fix some tests * Update tests/callbacks/test_checkpoint_saver.py * Checkpoint itself should be included in state, dont pickle timestamp object * patch the key error (doesnt fix the bug though :sad:) * avoid slashes in state, adjust tests * fix gpu test, probably * formatting * feedback * add a comment * Apply suggestions from code review Co-authored-by: Mihir Patel <mihir.v.patel7@gmail.com> --------- Co-authored-by: Mihir Patel <mihir.v.patel7@gmail.com> * code-quality timeout update (#2830) Timed out after 10 minutes here https://github.com/mosaicml/composer/actions/runs/7465107219/job/20313553654?pr=2819 Bumps runtime up to 15min * [S] Fix how single value tensors are logged (#2831) Co-authored-by: Daniel King <43149077+dakinggg@users.noreply.github.com> * Adds DTensor Support (#2821) * fixes to get dtensor to work * more fixes * Change state dict materialization for new version of torch * get load working for new set_state_dict api * use device_mesh * Add fsdp init monkeypatch for DTensor * Add checkpoint profiling logs * attempt * working single node * fix optimizer * allow 3d device mesh * attempt to use different pg during 3d mesh save * undo 3d mesh changes * load_state_dict -> load * allow parent mesh in FSDP init * allow override of force_sync_module_states * remove unnecessary exit * ignore _validate_and_get_shard_state() * save/load hsdp-moe working * remove prints * v1 * v2 * lint * add more 
tests * switch to PRs * ignore warning * fix lint * version error * fix version * fix state dict * update versions * lint * lint * disable lint for mosaic fsdp utils * remove bad line * move around for legacy * device mesh * ignore warning * fix import * always init * fix error * fix load planner * remove * fix lint * lint * delay state dict * test checkpoint * checkpoint * fix cpu tests * fix rotate tests * fix precision * lint * fix alibi * cleanup * cleanup * remove force sync * fix type * merge * lint * fix gpt * comment * fix test * lint * minor optimizations * Update composer/core/state.py Co-authored-by: Evan Racah <evan@mosaicml.com> * revert tests --------- Co-authored-by: Evan Racah <ejracah@gmail.com> Co-authored-by: Abhinav Venigalla <abhi.venigalla@databricks.com> Co-authored-by: root <23239305+b-chu@users.noreply.github.com> Co-authored-by: Abhinav Venigalla <abhi@mosaicml.com> Co-authored-by: Your Name <you@example.com> Co-authored-by: Evan Racah <evan@mosaicml.com> * Remove duplicate checkpoint verifications (#2828) * Fix seed for FSDP wrap (#2833) * first try * add context * lint * more lint * remove comment --------- Co-authored-by: Daniel King <daniel@mosaicml.com> Co-authored-by: Your Name <you@example.com> * Remove fsdp patch for comm overlap (#2836) * allow hsdp (#2838) * Bump torch 2.1.2 (#2840) * bump torch * bump * bump * Upgrade pyright to 1.1.310 (#2841) * [MLFlowObjectStore] [2/2] Support checkpointing with MLFlow (#2810) * Support checkpoint uploads to MLFlow (untested) Use MLFlow run tag for autoresume Add MLFlowLogger test for existing composer run tag * Try formatting mlflow save folder after INIT Make MLFlow experiment and run ID available on all ranks Fix path issue Format mlflow placeholders in remote filenames * Unit tests for partial_format * Log mlflow info as hyperparams * partial_format doc update * Fix formatting * Pull distributed logic out of MLFlowObjectStore Add debug tracebacks Bugfix Add path to debug info Try fixing 
RUD object store init Pyright * Partial format in format_name helpers * Fix import * Add extra partial_format test * Fix mlflow RUD check * Fix test pyright No longer expect KeyError for format_with_dist using partial_format Refactor partial_format for readability * Max iters on partial_format * Fix partial_format * Clean up * fix test import * Fix test * update nightly to torch 2.3 (#2842) * update nightly to torch 2.3 * tighten --------- Co-authored-by: Mihir Patel <mihir.v.patel7@gmail.com> * Pin sphinxcontrib applehelp (#2854) * pin release * bump * break pypi * tighter pin * pin * pin * pin * Update setup.py (#2855) * Torch 2.3 patch (#2849) * add monkeypatch for verify_options * patch * fix * fix * partial precommit * bit of cleanup * doc * debug * fix version pinning * precommit * checkdown * lint --------- Co-authored-by: Evan Racah <ejracah@gmail.com> Co-authored-by: Mihir Patel <mihir.v.patel7@gmail.com> * Update mosaicml-cli requirement from <0.6,>=0.5.25 to >=0.5.25,<0.7 (#2866) Updates the requirements on [mosaicml-cli](https://github.com/mosaicml/mosaicml-cli) to permit the latest version. - [Commits](https://github.com/mosaicml/mosaicml-cli/commits) --- updated-dependencies: - dependency-name: mosaicml-cli dependency-type: direct:production ... 
Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Rewrite to use individual state functions (#2860) * checkdown * checkdown * lint * fix * load ignore keys * fix * resolve comments * fix load ignore keys * offload * fix gate * merge * lint * use flag * force trye * Add custom stopping criteria to ICL generate tasks (#2800) * add custome gen kwargs and stopping on eos token * modify test * modify test * finish * finish * finish * finish * finish pr * implement early stop * add tesT * fix bug * bug fix * add keys * diff split * fix typo * fix precommit * fix precommit * fix precommit * fix precommit * fix precommit * fix precommit * fix conditional import * add nlp metrics * remove code gen changes * fix nits --------- Co-authored-by: Daniel King <43149077+dakinggg@users.noreply.github.com> * Add save_ignore_keys (#2868) * comment * add it * debug * add the keys * debug * debug * remove print statement * docs and tests * fix tests --------- Co-authored-by: Daniel King <daniel@mosaicml.com> --------- Signed-off-by: chenmoneygithub <chen.qian@databricks.com> Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: Charles Tang <j316chuck@users.noreply.github.com> Co-authored-by: Anna <anna@mosaicml.com> Co-authored-by: Jeremy D <115047575+bmosaicml@users.noreply.github.com> Co-authored-by: Mihir Patel <mihir.v.patel7@gmail.com> Co-authored-by: Chen Qian <chenmoney@google.com> Co-authored-by: Daniel King <43149077+dakinggg@users.noreply.github.com> Co-authored-by: coryMosaicML <83666378+coryMosaicML@users.noreply.github.com> Co-authored-by: Harsh Panchal <68880048+panchalhp-db@users.noreply.github.com> Co-authored-by: willgleich <22464726+willgleich@users.noreply.github.com> Co-authored-by: Irene Dea <deaairene@gmail.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: snarayan21 <saaketh@mosaicml.com> Co-authored-by: 
Jerry Chen <jerry.chen@databricks.com> Co-authored-by: Your Name <you@example.com> Co-authored-by: Evan Racah <ejracah@gmail.com> Co-authored-by: Abhinav Venigalla <abhi.venigalla@databricks.com> Co-authored-by: root <23239305+b-chu@users.noreply.github.com> Co-authored-by: Abhinav Venigalla <abhi@mosaicml.com> Co-authored-by: Evan Racah <evan@mosaicml.com> Co-authored-by: Daniel King <daniel@mosaicml.com>

remove flatten params

3d74f6f

mvpatel2000 requested a review from vchiley December 5, 2023 12:42

Merge branch 'dev' into mvpatel2000/remove-flatten-params

8b2ef02

vchiley approved these changes Dec 5, 2023

mvpatel2000 added 2 commits December 5, 2023 11:47

simplify tests

bf06e25

Merge branch 'mvpatel2000/remove-flatten-params' of github.com:mvpate…

ae7cbf2

…l2000/composer into mvpatel2000/remove-flatten-params

mvpatel2000 requested review from dskhudia and nik-mosaic as code owners December 5, 2023 16:50

mvpatel2000 added 8 commits December 5, 2023 13:10

simplify tests

7d1bcce

clean

a3be4cc

fix more tests

114c30d

rerun tests

810037d

speed up icl

179b1b8

fix tests

52151b2

fix cpu tests

193fbea

add more fixtures

6af7161

mvpatel2000 requested a review from dakinggg December 5, 2023 20:50

mvpatel2000 added 6 commits December 5, 2023 15:59

fix tests

b01e483

token count

c85311a

fix vocab size

cdb9c5e

remove logger

9cffa18

remove clears

4fa7196

fix mosaicml logger

e84a67c

dakinggg approved these changes Dec 5, 2023

tests/fixtures/autouse_fixtures.py

change codeowners

21a44ee

mvpatel2000 requested a review from a team as a code owner December 5, 2023 23:39

mvpatel2000 added 2 commits December 5, 2023 18:40

clean up codeowners

bc8a89e

rerun tests

e18c64b

dskhudia approved these changes Dec 6, 2023

mvpatel2000 added 9 commits December 6, 2023 23:59

rerun tests

004d8d3

fix lint

15343f4

lint

8ad4654

filter warnings

1e8873d

rerun tests

c6911f2

fixture

f81a907

add fixture

c93d0dc

change

db58919

logs

4d20f1e

mvpatel2000 requested a review from eracah as a code owner December 7, 2023 16:46

mvpatel2000 added 13 commits December 7, 2023 12:02

rerun tests

687cd2f

add logs

655bd36

rerun tests

0dc1320

fixture

4d22628

lint

d240088

lint

d088a4f

rerun tests

9bbc2ee

fix ignore warning

ad2adf0

logs

4381b95

regex

8cfdefa

regex

b64bdf9

regex

62298c7

fix

23cd426

mvpatel2000 force-pushed the mvpatel2000/remove-flatten-params branch 2 times, most recently from 27987a1 to 23cd426 Compare December 7, 2023 20:51

mvpatel2000 added 2 commits December 7, 2023 15:51

logs

7cd6113

reformat

ee8f2be

mvpatel2000 merged commit 8957550 into mosaicml:dev Dec 7, 2023
11 of 13 checks passed

mvpatel2000 deleted the mvpatel2000/remove-flatten-params branch December 7, 2023 21:00
