Releases: microsoft/DeepSpeed
v0.7.1: Patch release
What's Changed
- Fix for distributed tests on pytorch>=1.12 by @mrwyattii in #2141
- delay torch import for inference compatibility check by @jeffra in #2167
- Fix wrong unit of latency in flops-profiler (#2090) by @zionwu in #2095
- [docs] adoption updates by @jeffra in #2173
- Update for AMD CI workflow by @mrwyattii in #2172
- [docs] update offload docs to include stage 1 by @jeffra in #2178
- Fixing model partitioning without injection by @RezaYazdaniAminabadi in #2179
- Match compute and reduce dtype by @tjruwase in #2145
- Enable fused_lamb_cuda_kernel on ROCm by @rraminen in #2148
- Update README to latest Composer version by @hanlint in #2177
- [deepspeed/autotuner] Missing hjson import by @rahilbathwal5 in #2175
- [docs] add more models to adoption by @jeffra in #2189
- [CI] fix lightning tests by @jeffra in #2190
- Fix typos on README.md by @gasparitiago in #2192
- Fix the layer-past for GPT based models by @RezaYazdaniAminabadi in #2196
- Add gradient_average flag support for sparse grads by @Dipet in #2188
- Adding the compression tutorial on GPT distillation and quantization by @minjiaz in #2197
- Log user config exactly by @tjruwase in #2201
- Fix the tensor-slicing copy for qkv parameters by @RezaYazdaniAminabadi in #2198
- Refactor Distributed Tests by @mrwyattii in #2180
- fix table syntax by @kamalkraj in #2204
- Correctly detect offload configuration by @tjruwase in #2208
- add cuda 11.7 by @jeffra in #2211
- use torch 1.9 in accelerate tests by @jeffra in #2215
- [zero-3] print warning once and support torch parameter by @awan-10 in #2127
- Add support of OPT models by @arashb in #2205
- fix typos in readme. by @zhjohnchan in #2218
- Fix regression w. dist_init_required by @jeffra in #2225
- add doc for new bert example by @conglongli in #2224
- Remove the random-generator from context during inference by @RezaYazdaniAminabadi in #2228
- allow saving ckpt w/o ckpt json + bloom copy fix by @jeffra in #2237
- Correctly detect zero_offload by @tjruwase in #2213
- [docs] update community videos by @jeffra in #2249
- Refactor dist tests: Checkpointing by @tjruwase in #2202
- Make OPT policy backward compatible with pre-OPT transformers versions by @arashb in #2254
- fix ds-inference without policy by @RezaYazdaniAminabadi in #2247
New Contributors
- @zionwu made their first contribution in #2095
- @hanlint made their first contribution in #2177
- @rahilbathwal5 made their first contribution in #2175
- @gasparitiago made their first contribution in #2192
- @arashb made their first contribution in #2205
- @zhjohnchan made their first contribution in #2218
Full Changelog: v0.7.0...v0.7.1
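Several entries in this release touch ZeRO offload configuration (#2178 documents optimizer offload with stage 1; #2208 and #2213 fix offload detection). As a hedged illustration only — not taken from the release itself, and with field names following the DeepSpeed config schema as best understood, so verify against the official configuration docs — a minimal stage-1 config with CPU optimizer offload might look like this:

```python
# Illustrative DeepSpeed config (assumed schema; check the official
# configuration docs): ZeRO stage 1 with optimizer states offloaded
# to CPU, the combination documented in PR #2178.
ds_config = {
    "train_batch_size": 8,
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 1,
        "offload_optimizer": {
            "device": "cpu",     # keep optimizer states in host memory
            "pin_memory": True,  # pinned buffers speed up host<->device copies
        },
    },
}
```

Such a dict would typically be passed as the config argument to `deepspeed.initialize`.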
DeepSpeed v0.7.0
New features
- DeepSpeed Compression: https://www.microsoft.com/en-us/research/blog/deepspeed-compression-a-composable-library-for-extreme-compression-and-zero-cost-quantization/
What's Changed
- Adding DeepSpeed Compression Composer by @yaozhewei in #2105
- Remove hardcoded ROCm install path by @mrwyattii in #2093
- Fix softmax dim of Residual MoE implementation in moe/layer.py by @hero007feng in #2110
- reduce ds-inference log verbosity by @jeffra in #2111
- DeepSpeed Compression announcement by @conglongli in #2114
- Checkpoint reshaping by @tjruwase in #1953
- Fix init_process_group by @Quentin-Anthony in #2121
- DS Benchmarks QoL Improvements by @Quentin-Anthony in #2120
- [ROCm] Wrong command broke ROCm build. by @jpvillam-amd in #2118
- DeepSpeed Communication Profiling and Logging by @Quentin-Anthony in #2012
- Add flake8 to pre-commit checks by @aphedges in #2051
- Fix conflict between Tutel and top-2 gate in MoE layer by @yetiansh in #2053
- adding HF Accelerate+DS tests workflow by @pacman100 in #2134
- [inference tests] turn off time check for now by @jeffra in #2142
- Allow turning off loss scaling wrt GAS + update tput calculator by @jeffra in #2140
- Refactor ZeRO configs to use Pydantic by @mrwyattii in #2004
- Add purely-local sliding window sparse attention config by @Quentin-Anthony in #1962
- Trajepl/nebula ckpt engine by @trajepl in #2085
- Graceful exit on failures for multi-node runs by @jerrymannil in #2008
- fix: fix BF16_Optimizer compatibility issue by @shjwudp in #2152
- Fix random token-generation issue + MP-checkpoint loading/saving by @RezaYazdaniAminabadi in #2132
- Added retain_graph as a kwarg to the main engine backward function by @ncilfone in #1149
- Elastic Training support in DeepSpeed by @aj-prime in #2156
- prevent cuda 10 builds of inference kernels on ampere by @jeffra in #2157
- [zero-3] shutdown zero.Init from within ds.init by @jeffra in #2150
- enable fp16 input autocasting by @jeffra in #2158
- Release swap buffers for persisted params by @tjruwase in #2089
- Tensor parallelism for Mixture of Experts by @siddharth9820 in #2074
New Contributors
- @hero007feng made their first contribution in #2110
- @jpvillam-amd made their first contribution in #2118
- @yetiansh made their first contribution in #2053
- @pacman100 made their first contribution in #2134
- @jimwu6 made their first contribution in #2144
- @trajepl made their first contribution in #2085
- @ncilfone made their first contribution in #1149
Full Changelog: v0.6.7...v0.7.0
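This release adds Elastic Training support (#2156), which lets DeepSpeed pick a valid batch-size/GPU-count combination as resources change. As a sketch only — the field names below are assumptions based on the DeepSpeed elasticity documentation and may not match the shipped schema exactly — an elasticity section of the config could look like:

```python
# Hypothetical elasticity config section (PR #2156). All field names
# are assumptions for illustration; consult the DeepSpeed docs for
# the authoritative schema.
elastic_config = {
    "elasticity": {
        "enabled": True,
        "max_train_batch_size": 2000,    # ceiling for the global batch size
        "micro_batch_sizes": [2, 4, 6],  # candidate per-GPU batch sizes
        "min_gpus": 1,                   # resource range the job may scale across
        "max_gpus": 64,
        "version": 0.1,
    }
}
```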
v0.6.7: Patch release
What's Changed
- Add Inference support for running the BigScience-BLOOM Architecture by @RezaYazdaniAminabadi in #2083
- [ds-inference] checkpoint loading => tqdm by @stas00 in #2107
- Don't overwrite hook handles in flop profiler by @Sanger2000 in #2106
- Support HuggingFace NeoX injection policy by @mrwyattii in #2087
Full Changelog: v0.6.6...v0.6.7
v0.6.6: Patch release
What's Changed
- [docs] add 530b paper by @jeffra in #1979
- small fix for the HF Bert models by @RezaYazdaniAminabadi in #1984
- Add unit test for various model families and inference tasks by @mrwyattii in #1981
- Fix for lightning tests by @mrwyattii in #1988
- fix typo when getting kernel dim in conv calculation by @cli99 in #1989
- Add torch-latest and torch-nightly CI workflows by @mrwyattii in #1990
- [bug] Add user-defined launcher args for MPI launcher by @mrwyattii in #1933
- Propagate max errorcode to deepspeed when using PDSH launcher by @jerrymannil in #1994
- [docs] add new build badges to landing page by @jeffra in #1998
- DeepSpeed Comm. Backend v1 by @awan-10 in #1985
- Relax DeepSpeed MoE ZeRO-1 Assertion by @Quentin-Anthony in #2007
- update CODEOWNERS by @conglongli in #2017
- [CI] force upgrade HF dependencies & output py env by @jeffra in #2015
- [inference] test suite for ds-kernels (bert, roberta, gpt2, gpt-neo, gpt-j) by @jeffra in #1992
- DeepSpeed examples refresh by @jeffra in #2021
- Fix transformer API for training-evaluation pipeline by @RezaYazdaniAminabadi in #2018
- DataLoader Length Fix by @Sanger2000 in #1718
- DeepSpeed Monitor Module (Master) by @Quentin-Anthony in #2013
- Use partition numel by @tjruwase in #2011
- fix import errors by @KMFODA in #2026
- Fix inference unit test import error catching by @mrwyattii in #2024
- Retain available params until last use by @tjruwase in #2016
- Split parameter offload from z3 by @tjruwase in #2009
- Fix flops profiler print statements by @mrwyattii in #2038
- Add compression papers by @conglongli in #2042
- Fix the half-precision version of CPU-Adam by @RezaYazdaniAminabadi in #2032
- Fix for AMD unit tests by @mrwyattii in #2047
- Wrong partition_id while copying fp32_params -> fp16 params in Z2 for MoE by @siddharth9820 in #2058
- Fix missing import in replace_module.py by @aphedges in #2050
- Comms Benchmarks by @Quentin-Anthony in #2040
- add ds inference paper by @jeffra in #2072
- Comments for better understanding of zero stage1_2 by @kisseternity in #2027
- [docs] fix broken read-the-docs build by @jeffra in #2075
- Fix building package without a GPU by @aphedges in #2049
- Fix partition id in the fp32->fp16 param copying step for z2+cpu-offload by @siddharth9820 in #2059
- Codeowner addendum and fix to small model debugging script by @samadejacobs in #2076
- remove require grad in params count by @cli99 in #2065
- Add missing newline for ZeroOneAdam parameter table by @manuelciosici in #2088
- fixed "None type has no len()" by @xiazeyu in #2091
- Improving memory utilization of Z2+MoE by @siddharth9820 in #2079
New Contributors
- @jerrymannil made their first contribution in #1994
- @Sanger2000 made their first contribution in #1718
- @KMFODA made their first contribution in #2026
- @siddharth9820 made their first contribution in #2058
- @samadejacobs made their first contribution in #2076
- @xiazeyu made their first contribution in #2091
Full Changelog: v0.6.5...v0.6.6
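Among the fixes above, #1989 corrects a kernel-dimension typo in the flops profiler's convolution count. To make the quantity concrete, here is a standalone illustration of the arithmetic — not the profiler's actual code — using the standard count of c_in · k_h · k_w multiply-accumulates (two FLOPs each) per output element:

```python
def conv2d_flops(c_in, c_out, k_h, k_w, h_out, w_out, bias=True):
    """FLOPs of a plain dense 2-D convolution (no groups or dilation)."""
    # Each of the c_out * h_out * w_out output elements needs
    # c_in * k_h * k_w multiply-accumulates; count 2 FLOPs per MAC.
    macs = c_in * k_h * k_w * c_out * h_out * w_out
    flops = 2 * macs
    if bias:
        flops += c_out * h_out * w_out  # one add per output element
    return flops

# 3 -> 16 channels, 3x3 kernel, 32x32 output map:
print(conv2d_flops(3, 16, 3, 3, 32, 32, bias=False))  # 884736
```

Mistaking a kernel dimension here (e.g. using the input size instead of k_h) skews every reported conv FLOP count, which is why the fix matters.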
v0.6.5: Patch release
What's Changed
- GatheredParameters - accept a tuple of params by @stas00 in #1941
- Update partition_parameters.py by @manuelciosici in #1943
- fix step in adam by @szhengac in #1823
- [pipe] prevent deadlock with multiple evals sequence by @stas00 in #1944
- Fairseq support by @jeffra in #1915
- DeepSpeed needs to start cleaning up by @tjruwase in #1947
- trivial fix by @kisseternity in #1954
- Enabling CUDA-graph for the bert-type models by @RezaYazdaniAminabadi in #1952
- Add loss scale guard to avoid inf loop by @Quentin-Anthony in #1958
- [launcher] add option to bypass ssh check by @liamcli in #1957
- Bump nokogiri from 1.13.4 to 1.13.6 in /docs by @dependabot in #1965
- Fix typo in timer.py by @Quentin-Anthony in #1964
- [docs] fix dependabot version issue by @jeffra in #1966
- Don't add curand on rocm by @jeffra in #1968
- Add Unidirectional Sparse Attention Type to BigBird and BSLongformer by @Quentin-Anthony in #1959
- Fix: Sparse tensors not updating by @Dipet in #1914
- Fixing several bugs in the inference-api and the kernels by @RezaYazdaniAminabadi in #1951
New Contributors
- @Quentin-Anthony made their first contribution in #1958
Full Changelog: v0.6.4...v0.6.5
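The loss scale guard in #1958 stops dynamic loss scaling from halving forever when every step overflows. A toy sketch of the idea — not DeepSpeed's actual optimizer code, and the minimum-scale and growth behavior here are simplifying assumptions:

```python
def next_loss_scale(cur_scale, overflow, min_scale=1.0, factor=2.0):
    """Return the next dynamic loss scale; raise instead of looping forever."""
    if overflow:
        new_scale = cur_scale / factor
        if new_scale < min_scale:
            # Without this guard, repeated overflows shrink the scale
            # toward zero and training spins without making progress.
            raise RuntimeError("loss scale fell below the minimum; aborting")
        return new_scale
    # Real implementations grow only after a window of overflow-free
    # steps; growing on every clean step keeps the sketch short.
    return cur_scale * factor
```

For example, `next_loss_scale(1024.0, overflow=True)` halves the scale to 512.0, while a scale already at the minimum raises instead of entering an infinite retry loop.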
v0.6.4: Patch release
What's Changed
- [fix] Windows installs cannot import fcntl by @mrwyattii in #1921
- [build] explicitly add op_builder to manifest by @jeffra in #1920
- Enable DeepSpeed inference on ROCm by @rraminen in #1922
- bf16 inference by @tjruwase in #1917
- spell err by @kisseternity in #1929
- [ZeRO-3] Rename confusing log message by @jeffra in #1932
- [bug] Fix time log error in PipelineEngine by @Codle in #1934
- Improve z3 trace management by @tjruwase in #1916
New Contributors
- @kisseternity made their first contribution in #1929
- @Codle made their first contribution in #1934
Full Changelog: v0.6.3...v0.6.4
v0.6.3: Patch release
What's Changed
- Fix setup.py crash when torch is not installed. by @PaperclipBadger in #1866
- Add support for AWS SageMaker. by @matherit in #1868
- Fix broken links by @tjruwase in #1873
- [docs] add amd blog to website by @jeffra in #1874
- [docs] add moe paper by @jeffra in #1875
- Supporting multiple modules injection with a single policy when they … by @samyam in #1869
- [docs] fix dead links by @jeffra in #1877
- add now required `-lcurand` to solve `undefined symbol: curandCreateGenerator` by @stas00 in #1879
- Bug fix for flops profilers output by @VisionTheta in #1885
- Bump nokogiri from 1.13.3 to 1.13.4 in /docs by @dependabot in #1889
- [docs] fix commonmarker security issue by @jeffra in #1892
- bf16+pipeline parallelism by @tjruwase in #1801
- fix file ordering by @szhengac in #1822
- Use f-strings where possible by @manuelciosici in #1900
- [partition_parameters.py] better diagnostics by @stas00 in #1887
- comm backend: cast bool when not supported by torch2cupy by @conglongli in #1894
- Use cuda events to improve timing for multi-stream execution by @tjruwase in #1881
- Fix multiple zero 3 tracing errors by @tjruwase in #1901
- Improve ds_report output for HIP/ROCm by @mrwyattii in #1906
- Fix launcher for reading env vars by @szhengac in #1907
- Fix OOM and type mismatch by @tjruwase in #1884
New Contributors
- @PaperclipBadger made their first contribution in #1866
- @matherit made their first contribution in #1868
- @VisionTheta made their first contribution in #1885
- @szhengac made their first contribution in #1822
Misc
- v0.6.2 was skipped due to a build/deploy issue with that release
Full Changelog: v0.6.1...v0.6.3
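One cleanup in this release, #1900, converts string formatting to f-strings where possible. A before/after of the kind of change involved (the variable names are made up for illustration):

```python
rank, loss = 3, 0.25

# Before: .format() with positional placeholders
old_style = "rank {} loss {:.2f}".format(rank, loss)

# After: an f-string, same output with the values inline
new_style = f"rank {rank} loss {loss:.2f}"

print(new_style)  # rank 3 loss 0.25
```

Beyond readability, f-strings avoid the mismatch bugs that positional `.format()` placeholders invite.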
v0.6.1: Patch release
qkv_out can be a single tensor or a list. Handling these cases separe…
DeepSpeed v0.6.0
Release notes
New features
- Advancing MoE inference and training to power next-generation AI scale
- MoE inference
- PR-MoE model support
- AMD support (#1430)
- Various ZeRO Stage3 Optimizations + Improvements (#1453)
Special thanks to our contributors in this release
@stas00, @jithunnair-amd, @rraminen, @jeffdaily, @okakarpa, @jfc4050, @raamjad, @aphedges, @SeanNaren, @liamcli, @andriyor, @manuelciosici
v0.5.10: Patch release
[ZeRO] Default disable elastic ckpt in stage 1+2 and reduce CPU memor…