
Abra merge test #2870

Merged: 73 commits merged on Jan 17, 2024
Commits (73)
1dccb14
Bump torch to 2.1.1 version (#2717)
j316chuck Nov 30, 2023
b11f7b6
Add more info when run doesnt complete (#2751)
aspfohl Dec 1, 2023
5d20db1
Lower sequence generation length on code gen to be dependent on max c…
bmosaicml Dec 4, 2023
8957550
Remove flatten params (#2761)
mvpatel2000 Dec 7, 2023
1a8a664
fix lint (#2767)
mvpatel2000 Dec 7, 2023
e87c06d
lint (#2768)
mvpatel2000 Dec 7, 2023
7f55b7a
Use time.tokens for speedmonitor instead of dataset length (#2762)
mvpatel2000 Dec 7, 2023
cb8f937
remove exception (#2759)
mvpatel2000 Dec 8, 2023
f097fd7
time to clean up time parsing 😉 (#2770)
aspfohl Dec 9, 2023
236b738
Upgrade RunConfig compute specification (#2772)
aspfohl Dec 11, 2023
39d6df4
Use async logging in MLflowLogger (#2693)
chenmoneygithub Dec 11, 2023
c04405e
Fix FSDP _param_init_fn to not reinit parameters multiple times (#2765)
dakinggg Dec 11, 2023
bc50049
Gate FSDP param init test on torch 2.1 (#2774)
dakinggg Dec 11, 2023
aad8901
Parallelize OCI multipart download (#2750)
coryMosaicML Dec 12, 2023
f497e60
[UCVolumes] Add support for list API (#2769)
panchalhp-db Dec 12, 2023
a7cad7c
Add the memory timeline profiling support through the PyTorch profile…
cli99 Dec 12, 2023
db3d187
Improve torch memory profiling arguments processing (#2777)
cli99 Dec 13, 2023
0d61164
Add platform AWS and bump aws ofi nccl version (#2776)
willgleich Dec 13, 2023
776d172
Extend checkpoint loading to accept a validation function (#2726)
irenedea Dec 14, 2023
09f4580
Fix checkpoint validation tests for torch 1.13 (#2779)
irenedea Dec 14, 2023
7e0e40a
Bump version to 0.17.2 (#2780)
mvpatel2000 Dec 14, 2023
45bb135
bump transformers version (#2781)
dakinggg Dec 15, 2023
84059b6
Bump sphinxext-opengraph from 0.9.0 to 0.9.1 (#2784)
dependabot[bot] Dec 18, 2023
15324c7
Bump coverage[toml] from 7.3.0 to 7.3.3 (#2783)
dependabot[bot] Dec 18, 2023
b8363bb
Update torch requirement from <2.1.2,>=1.13.1 to >=1.13.1,<2.1.3 (#2785)
dependabot[bot] Dec 18, 2023
af8797d
[UCVolumes] Rely on databricks-sdk auth for the right requirements (#…
panchalhp-db Dec 19, 2023
420cb07
Enable system metrics in mosaic mlflow logger (#2775)
chenmoneygithub Dec 19, 2023
f24f43c
Update parse_uri (#2787)
irenedea Dec 20, 2023
96df92d
default-no-memory-timeline (#2790)
cli99 Dec 20, 2023
a8a261b
Add eot token to ICL generate kwargs (#2782)
bmosaicml Dec 20, 2023
ff145d3
Add nightly image for torch 2.2.0 12-20-23 (#2791)
j316chuck Dec 21, 2023
a3ea7a4
Add torch nightly 12-13 (#2792)
j316chuck Dec 21, 2023
2aa50e7
Add process group as arg to FSDP (#2794)
mvpatel2000 Dec 26, 2023
910223e
Bump coverage[toml] from 7.3.3 to 7.3.4 (#2798)
dependabot[bot] Dec 28, 2023
db424e5
Fix load_ignore_keys with rng (#2803)
mvpatel2000 Jan 2, 2024
070095e
Bump ipykernel from 6.26.0 to 6.28.0 (#2806)
dependabot[bot] Jan 2, 2024
e274ca0
Bump junitparser from 3.1.0 to 3.1.1 (#2805)
dependabot[bot] Jan 2, 2024
9110c57
Bump pytest from 7.4.3 to 7.4.4 (#2807)
dependabot[bot] Jan 2, 2024
ed4e07c
Avoid futures on close for MosaicML logger (#2804)
mvpatel2000 Jan 2, 2024
ee7cb69
check (#2812)
mvpatel2000 Jan 2, 2024
52ac18c
Better communication computation overlap (#2811)
snarayan21 Jan 2, 2024
80b35a7
Improve error message for speed monitor (#2801)
mvpatel2000 Jan 4, 2024
f6ca956
bump torch version (#2814)
mvpatel2000 Jan 4, 2024
4af5076
bump vision (#2815)
mvpatel2000 Jan 4, 2024
206a9ea
fix rng load (#2816)
mvpatel2000 Jan 4, 2024
e5240d2
Correct multi-unshard stream patching for torch 2.2.0dev, and stream …
snarayan21 Jan 4, 2024
2deccf2
fix profiler (#2818)
mvpatel2000 Jan 4, 2024
5592e41
Bump traitlets from 5.13.0 to 5.14.1 (#2822)
dependabot[bot] Jan 8, 2024
c22c61a
All unshard streams wait on computation every step (#2823)
snarayan21 Jan 8, 2024
23bc6fb
Add encoding=utf-8 (#2824)
dakinggg Jan 8, 2024
a36fb74
Fix import for daily test (#2826)
snarayan21 Jan 8, 2024
f50dcaf
[MLFlowObjectStore] [1/2] Base implementation for MLFlowObjectStore (…
jerrychen109 Jan 8, 2024
0aa95e0
Remove fused layernorm (already deprecated for 2 versions) (#2827)
mvpatel2000 Jan 9, 2024
737b462
checkpoint saver tracks all checkpoints/intervals in state (#2819)
aspfohl Jan 9, 2024
c82dcc4
code-quality timeout update (#2830)
aspfohl Jan 9, 2024
7b70dde
[S] Fix how single value tensors are logged (#2831)
aspfohl Jan 9, 2024
94e0386
Adds DTensor Support (#2821)
mvpatel2000 Jan 9, 2024
eb4fbd0
Remove duplicate checkpoint verifications (#2828)
eracah Jan 10, 2024
c48e6fe
Fix seed for FSDP wrap (#2833)
mvpatel2000 Jan 10, 2024
6c63f2e
Remove fsdp patch for comm overlap (#2836)
mvpatel2000 Jan 11, 2024
83fb295
allow hsdp (#2838)
mvpatel2000 Jan 11, 2024
55341aa
Bump torch 2.1.2 (#2840)
mvpatel2000 Jan 12, 2024
2ff7c27
Upgrade pyright to 1.1.310 (#2841)
b-chu Jan 12, 2024
56fa4bd
[MLFlowObjectStore] [2/2] Support checkpointing with MLFlow (#2810)
jerrychen109 Jan 12, 2024
c9f0c21
update nightly to torch 2.3 (#2842)
j316chuck Jan 13, 2024
c19fd36
Pin sphinxcontrib applehelp (#2854)
mvpatel2000 Jan 13, 2024
027c3d0
Update setup.py (#2855)
j316chuck Jan 13, 2024
a2ae299
Torch 2.3 patch (#2849)
dakinggg Jan 14, 2024
abff6de
Update mosaicml-cli requirement from <0.6,>=0.5.25 to >=0.5.25,<0.7 (…
dependabot[bot] Jan 15, 2024
d497d8f
Rewrite to use individual state functions (#2860)
mvpatel2000 Jan 15, 2024
1bc8d0a
Add custom stopping criteria to ICL generate tasks (#2800)
bmosaicml Jan 15, 2024
31ea664
Add save_ignore_keys (#2868)
mvpatel2000 Jan 16, 2024
f5978a8
fix conflicts
cli99 Jan 16, 2024
Changes from 1 commit
Gate FSDP param init test on torch 2.1 (#2774)
dakinggg authored Dec 11, 2023
commit bc50049b165a91d575c3a07768be42bc08097c6f
tests/trainer/test_fsdp.py: 4 changes (2 additions, 2 deletions)
@@ -85,8 +85,8 @@ def test_fsdp_device_initialization(model: ComposerClassifier, mixed_precision:
 @pytest.mark.parametrize('device', _INIT_DEVICES)
 @world_size(2)
 @pytest.mark.gpu
-@pytest.mark.skipif(version.parse(torch.__version__) < version.parse('1.13.0'),
-                    reason='FSDP requires PyTorch 1.13 or higher')
+@pytest.mark.skipif(version.parse(torch.__version__) < version.parse('2.1.0'),
+                    reason='This has only been fixed and tested starting with torch 2.1.0')
 def test_fsdp_inits_params_once(model: ComposerClassifier, device: str, world_size: int, expected_param_inits: int):
     resolved_device = device
     if device == 'mixed':
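The version gate in this diff can be sketched as a small standalone test module. This is a minimal sketch, assuming `version` in the diff refers to `packaging.version` (its import is not shown here); the helper `torch_at_least` and the test name `test_param_init_runs_once` are hypothetical and not part of Composer.

```python
# Sketch of gating a pytest test on a minimum torch version.
# Assumption: `version` is packaging.version; names below are hypothetical.
import pytest
from packaging import version

try:
    import torch
    TORCH_VERSION = torch.__version__
except ImportError:  # lets the sketch import cleanly without torch
    TORCH_VERSION = '0.0.0'


def torch_at_least(minimum: str, current: str = TORCH_VERSION) -> bool:
    """Hypothetical helper: True if `current` >= `minimum` under PEP 440 ordering."""
    return version.parse(current) >= version.parse(minimum)


# Same condition and reason as the added skipif in the diff, written via the helper.
@pytest.mark.skipif(not torch_at_least('2.1.0'),
                    reason='This has only been fixed and tested starting with torch 2.1.0')
def test_param_init_runs_once():
    ...  # placeholder body; the real test checks FSDP parameter init counts
```

Comparing parsed versions rather than raw strings matters here: as plain strings, '2.10.0' sorts before '2.2.0'. PEP 440 ordering also places pre-releases such as '2.1.0a0' below '2.1.0', so those builds would be skipped, while local builds such as '2.1.0+cu121' and nightlies such as '2.2.0.dev20231220' would run.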