Releases: microsoft/DeepSpeed
v0.7.1: Patch release
What's Changed
- Fix for distributed tests on pytorch>=1.12 by @mrwyattii in #2141
- delay torch import for inference compatibility check by @jeffra in #2167
- Fix wrong unit of latency in flops-profiler (#2090) by @zionwu in #2095
- [docs] adoption updates by @jeffra in #2173
- Update for AMD CI workflow by @mrwyattii in #2172
- [docs] update offload docs to include stage 1 by @jeffra in #2178
- Fixing model partitioning without injection by @RezaYazdaniAminabadi in #2179
- Match compute and reduce dtype by @tjruwase in #2145
- Enable fused_lamb_cuda_kernel on ROCm by @rraminen in #2148
- Update README to latest Composer version by @hanlint in #2177
- [deepspeed/autotuner] Missing hjson import by @rahilbathwal5 in #2175
- [docs] add more models to adoption by @jeffra in #2189
- [CI] fix lightning tests by @jeffra in #2190
- Fix typos on README.md by @gasparitiago in #2192
- Fix the layer-past for GPT based models by @RezaYazdaniAminabadi in #2196
- Add gradient_average flag support for sparse grads by @Dipet in #2188
- Adding the compression tutorial on GPT distillation and quantization by @minjiaz in #2197
- Log user config exactly by @tjruwase in #2201
- Fix the tensor-slicing copy for qkv parameters by @RezaYazdaniAminabadi in #2198
- Refactor Distributed Tests by @mrwyattii in #2180
- fix table syntax by @kamalkraj in #2204
- Correctly detect offload configuration by @tjruwase in #2208
- add cuda 11.7 by @jeffra in #2211
- use torch 1.9 in accelerate tests by @jeffra in #2215
- [zero-3] print warning once and support torch parameter by @awan-10 in #2127
- Add support of OPT models by @arashb in #2205
- fix typos in readme. by @zhjohnchan in #2218
- Fix regression w. dist_init_required by @jeffra in #2225
- add doc for new bert example by @conglongli in #2224
- Remove the random-generator from context during inference by @RezaYazdaniAminabadi in #2228
- allow saving ckpt w/o ckpt json + bloom copy fix by @jeffra in #2237
- Correctly detect zero_offload by @tjruwase in #2213
- [docs] update community videos by @jeffra in #2249
- Refactor dist tests: Checkpointing by @tjruwase in #2202
- Make OPT policy backward compatible with pre-OPT transformers versions by @arashb in #2254
- fix ds-inference without policy by @RezaYazdaniAminabadi in #2247
New Contributors
- @zionwu made their first contribution in #2095
- @hanlint made their first contribution in #2177
- @rahilbathwal5 made their first contribution in #2175
- @gasparitiago made their first contribution in #2192
- @arashb made their first contribution in #2205
- @zhjohnchan made their first contribution in #2218
Full Changelog: v0.7.0...v0.7.1
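Several entries in this release touch ZeRO offload configuration (#2178 documents optimizer offload with stage 1; #2208 and #2213 fix offload detection). As a hedged illustration only — not taken from the release itself, and with field names following the DeepSpeed config schema as best understood, so verify against the official configuration docs — a minimal stage-1 config with CPU optimizer offload might look like this:

```python
# Illustrative DeepSpeed config (assumed schema; check the official
# configuration docs): ZeRO stage 1 with optimizer states offloaded
# to CPU, the combination documented in PR #2178.
ds_config = {
    "train_batch_size": 8,
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 1,
        "offload_optimizer": {
            "device": "cpu",     # keep optimizer states in host memory
            "pin_memory": True,  # pinned buffers speed up host<->device copies
        },
    },
}
```

Such a dict would typically be passed as the config argument to `deepspeed.initialize`.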
DeepSpeed v0.7.0
New features
- DeepSpeed Compression: https://www.microsoft.com/en-us/research/blog/deepspeed-compression-a-composable-library-for-extreme-compression-and-zero-cost-quantization/
What's Changed
- Adding DeepSpeed Compression Composer by @yaozhewei in #2105
- Remove hardcoded ROCm install path by @mrwyattii in #2093
- Fix softmax dim of Residual MoE implementation in moe/layer.py by @hero007feng in #2110
- reduce ds-inference log verbosity by @jeffra in #2111
- DeepSpeed Compression announcement by @conglongli in #2114
- Checkpoint reshaping by @tjruwase in #1953
- Fix init_process_group by @Quentin-Anthony in #2121
- DS Benchmarks QoL Improvements by @Quentin-Anthony in #2120
- [ROCm] Wrong command broke ROCm build. by @jpvillam-amd in #2118
- DeepSpeed Communication Profiling and Logging by @Quentin-Anthony in #2012
- Add flake8 to pre-commit checks by @aphedges in #2051
- Fix conflict between Tutel and top-2 gate in MoE layer by @yetiansh in #2053
- adding HF Accelerate+DS tests workflow by @pacman100 in #2134
- [inference tests] turn off time check for now by @jeffra in #2142
- Allow turning off loss scaling wrt GAS + update tput calculator by @jeffra in #2140
- Refactor ZeRO configs to use Pydantic by @mrwyattii in #2004
- Add purely-local sliding window sparse attention config by @Quentin-Anthony in #1962
- Trajepl/nebula ckpt engine by @trajepl in #2085
- Graceful exit on failures for multi-node runs by @jerrymannil in #2008
- fix: fix BF16_Optimizer compatibility issue by @shjwudp in #2152
- Fix random token-generation issue + MP-checkpoint loading/saving by @RezaYazdaniAminabadi in #2132
- Added retain_graph as a kwarg to the main engine backward function by @ncilfone in #1149
- Elastic Training support in DeepSpeed by @aj-prime in #2156
- prevent cuda 10 builds of inference kernels on ampere by @jeffra in #2157
- [zero-3] shutdown zero.Init from within ds.init by @jeffra in #2150
- enable fp16 input autocasting by @jeffra in #2158
- Release swap buffers for persisted params by @tjruwase in #2089
- Tensor parallelism for Mixture of Experts by @siddharth9820 in #2074
New Contributors
- @hero007feng made their first contribution in #2110
- @jpvillam-amd made their first contribution in #2118
- @yetiansh made their first contribution in #2053
- @pacman100 made their first contribution in #2134
- @jimwu6 made their first contribution in #2144
- @trajepl made their first contribution in #2085
- @ncilfone made their first contribution in #1149
Full Changelog: v0.6.7...v0.7.0
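This release adds Elastic Training support (#2156), which lets DeepSpeed pick a valid batch-size/GPU-count combination as resources change. As a sketch only — the field names below are assumptions based on the DeepSpeed elasticity documentation and may not match the shipped schema exactly — an elasticity section of the config could look like:

```python
# Hypothetical elasticity config section (PR #2156). All field names
# are assumptions for illustration; consult the DeepSpeed docs for
# the authoritative schema.
elastic_config = {
    "elasticity": {
        "enabled": True,
        "max_train_batch_size": 2000,    # ceiling for the global batch size
        "micro_batch_sizes": [2, 4, 6],  # candidate per-GPU batch sizes
        "min_gpus": 1,                   # resource range the job may scale across
        "max_gpus": 64,
        "version": 0.1,
    }
}
```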
v0.6.7: Patch release
What's Changed
- Add Inference support for running the BigScience-BLOOM Architecture by @RezaYazdaniAminabadi in #2083
- [ds-inference] checkpoint loading => tqdm by @stas00 in #2107
- Don't overwrite hook handles in flop profiler by @Sanger2000 in #2106
- Support HuggingFace NeoX injection policy by @mrwyattii in #2087
Full Changelog: v0.6.6...v0.6.7
v0.6.6: Patch release
What's Changed
- [docs] add 530b paper by @jeffra in #1979
- small fix for the HF Bert models by @RezaYazdaniAminabadi in #1984
- Add unit test for various model families and inference tasks by @mrwyattii in #1981
- Fix for lightning tests by @mrwyattii in #1988
- fix typo when getting kernel dim in conv calculation by @cli99 in #1989
- Add torch-latest and torch-nightly CI workflows by @mrwyattii in #1990
- [bug] Add user-defined launcher args for MPI launcher by @mrwyattii in #1933
- Propagate max errorcode to deepspeed when using PDSH launcher by @jerrymannil in #1994
- [docs] add new build badges to landing page by @jeffra in #1998
- DeepSpeed Comm. Backend v1 by @awan-10 in #1985
- Relax DeepSpeed MoE ZeRO-1 Assertion by @Quentin-Anthony in #2007
- update CODEOWNERS by @conglongli in #2017
- [CI] force upgrade HF dependencies & output py env by @jeffra in #2015
- [inference] test suite for ds-kernels (bert, roberta, gpt2, gpt-neo, gpt-j) by @jeffra in #1992
- DeepSpeed examples refresh by @jeffra in #2021
- Fix transformer API for training-evaluation pipeline by @RezaYazdaniAminabadi in #2018
- DataLoader Length Fix by @Sanger2000 in #1718
- DeepSpeed Monitor Module (Master) by @Quentin-Anthony in #2013
- Use partition numel by @tjruwase in #2011
- fix import errors by @KMFODA in #2026
- Fix inference unit test import error catching by @mrwyattii in #2024
- Retain available params until last use by @tjruwase in #2016
- Split parameter offload from z3 by @tjruwase in #2009
- Fix flops profiler print statements by @mrwyattii in #2038
- Add compression papers by @conglongli in #2042
- Fix the half-precision version of CPU-Adam by @RezaYazdaniAminabadi in #2032
- Fix for AMD unit tests by @mrwyattii in #2047
- Wrong partition_id while copying fp32_params -> fp16 params in Z2 for MoE by @siddharth9820 in #2058
- Fix missing import in replace_module.py by @aphedges in #2050
- Comms Benchmarks by @Quentin-Anthony in #2040
- add ds inference paper by @jeffra in #2072
- Comments for better understanding of zero stage1_2 by @kisseternity in #2027
- [docs] fix broken read-the-docs build by @jeffra in #2075
- Fix building package without a GPU by @aphedges in #2049
- Fix partition id in the fp32->fp16 param copying step for z2+cpu-offload by @siddharth9820 in #2059
- Codeowner addendum and fix to small model debugging script by @samadejacobs in #2076
- remove require grad in params count by @cli99 in #2065
- Add missing newline for ZeroOneAdam parameter table by @manuelciosici in #2088
- fixed "None type has no len()" by @xiazeyu in #2091
- Improving memory utilization of Z2+MoE by @siddharth9820 in #2079
New Contributors
- @jerrymannil made their first contribution in #1994
- @Sanger2000 made their first contribution in #1718
- @KMFODA made their first contribution in #2026
- @siddharth9820 made their first contribution in #2058
- @samadejacobs made their first contribution in #2076
- @xiazeyu made their first contribution in #2091
Full Changelog: v0.6.5...v0.6.6
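Among the fixes above, #1989 corrects a kernel-dimension typo in the flops profiler's convolution count. To make the quantity concrete, here is a standalone illustration of the arithmetic — not the profiler's actual code — using the standard count of c_in · k_h · k_w multiply-accumulates (two FLOPs each) per output element:

```python
def conv2d_flops(c_in, c_out, k_h, k_w, h_out, w_out, bias=True):
    """FLOPs of a plain dense 2-D convolution (no groups or dilation)."""
    # Each of the c_out * h_out * w_out output elements needs
    # c_in * k_h * k_w multiply-accumulates; count 2 FLOPs per MAC.
    macs = c_in * k_h * k_w * c_out * h_out * w_out
    flops = 2 * macs
    if bias:
        flops += c_out * h_out * w_out  # one add per output element
    return flops

# 3 -> 16 channels, 3x3 kernel, 32x32 output map:
print(conv2d_flops(3, 16, 3, 3, 32, 32, bias=False))  # 884736
```

Mistaking a kernel dimension here (e.g. using the input size instead of k_h) skews every reported conv FLOP count, which is why the fix matters.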
v0.6.5: Patch release
What's Changed
- GatheredParameters - accept a tuple of params by @stas00 in #1941
- Update partition_parameters.py by @manuelciosici in #1943
- fix step in adam by @szhengac in #1823
- [pipe] prevent deadlock with multiple evals sequence by @stas00 in #1944
- Fairseq support by @jeffra in #1915
- DeepSpeed needs to start cleaning up by @tjruwase in #1947
- trivial fix by @kisseternity in #1954
- Enabling CUDA-graph for the bert-type models by @RezaYazdaniAminabadi in #1952
- Add loss scale guard to avoid inf loop by @Quentin-Anthony in #1958
- [launcher] add option to bypass ssh check by @liamcli in #1957
- Bump nokogiri from 1.13.4 to 1.13.6 in /docs by @dependabot in #1965
- Fix typo in timer.py by @Quentin-Anthony in #1964
- [docs] fix dependabot version issue by @jeffra in #1966
- Don't add curand on rocm by @jeffra in #1968
- Add Unidirectional Sparse Attention Type to BigBird and BSLongformer by @Quentin-Anthony in #1959
- Fix: Sparse tensors not updating by @Dipet in #1914
- Fixing several bugs in the inference-api and the kernels by @RezaYazdaniAminabadi in #1951
New Contributors
- @Quentin-Anthony made their first contribution in #1958
Full Changelog: v0.6.4...v0.6.5
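The loss scale guard in #1958 stops dynamic loss scaling from halving forever when every step overflows. A toy sketch of the idea — not DeepSpeed's actual optimizer code, and the minimum-scale and growth behavior here are simplifying assumptions:

```python
def next_loss_scale(cur_scale, overflow, min_scale=1.0, factor=2.0):
    """Return the next dynamic loss scale; raise instead of looping forever."""
    if overflow:
        new_scale = cur_scale / factor
        if new_scale < min_scale:
            # Without this guard, repeated overflows shrink the scale
            # toward zero and training spins without making progress.
            raise RuntimeError("loss scale fell below the minimum; aborting")
        return new_scale
    # Real implementations grow only after a window of overflow-free
    # steps; growing on every clean step keeps the sketch short.
    return cur_scale * factor
```

For example, `next_loss_scale(1024.0, overflow=True)` halves the scale to 512.0, while a scale already at the minimum raises instead of entering an infinite retry loop.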
v0.6.4: Patch release
What's Changed
- [fix] Windows installs cannot import fcntl by @mrwyattii in #1921
- [build] explicitly add op_builder to manifest by @jeffra in #1920
- Enable DeepSpeed inference on ROCm by @rraminen in #1922
- bf16 inference by @tjruwase in #1917
- spell err by @kisseternity in #1929
- [ZeRO-3] Rename confusing log message by @jeffra in #1932
- [bug] Fix time log error in PipelineEngine by @Codle in #1934
- Improve z3 trace management by @tjruwase in #1916
New Contributors
- @kisseternity made their first contribution in #1929
- @Codle made their first contribution in #1934
Full Changelog: v0.6.3...v0.6.4
v0.6.3: Patch release
What's Changed
- Fix setup.py crash when torch is not installed. by @PaperclipBadger in #1866
- Add support for AWS SageMaker. by @matherit in #1868
- Fix broken links by @tjruwase in #1873
- [docs] add amd blog to website by @jeffra in #1874
- [docs] add moe paper by @jeffra in #1875
- Supporting multiple modules injection with a single policy when they … by @samyam in #1869
- [docs] fix dead links by @jeffra in #1877
- add now required `-lcurand` to solve `undefined symbol: curandCreateGenerator` by @stas00 in #1879
- Bug fix for flops profilers output by @VisionTheta in #1885
- Bump nokogiri from 1.13.3 to 1.13.4 in /docs by @dependabot in #1889
- [docs] fix commonmarker security issue by @jeffra in #1892
- bf16+pipeline parallelism by @tjruwase in #1801
- fix file ordering by @szhengac in #1822
- Use f-strings where possible by @manuelciosici in #1900
- [partition_parameters.py] better diagnostics by @stas00 in #1887
- comm backend: cast bool when not supported by torch2cupy by @conglongli in #1894
- Use cuda events to improve timing for multi-stream execution by @tjruwase in #1881
- Fix multiple zero 3 tracing errors by @tjruwase in #1901
- Improve ds_report output for HIP/ROCm by @mrwyattii in #1906
- Fix launcher for reading env vars by @szhengac in #1907
- Fix OOM and type mismatch by @tjruwase in #1884
New Contributors
- @PaperclipBadger made their first contribution in #1866
- @matherit made their first contribution in #1868
- @VisionTheta made their first contribution in #1885
- @szhengac made their first contribution in #1822
Misc
- v0.6.2 was skipped due to a build/deploy issue with that release
Full Changelog: v0.6.1...v0.6.3
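One cleanup in this release, #1900, converts string formatting to f-strings where possible. A before/after of the kind of change involved (the variable names are made up for illustration):

```python
rank, loss = 3, 0.25

# Before: .format() with positional placeholders
old_style = "rank {} loss {:.2f}".format(rank, loss)

# After: an f-string, same output with the values inline
new_style = f"rank {rank} loss {loss:.2f}"

print(new_style)  # rank 3 loss 0.25
```

Beyond readability, f-strings avoid the mismatch bugs that positional `.format()` placeholders invite.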
v0.6.1: Patch release
qkv_out can be a single tensor or a list. Handling these cases separe…
DeepSpeed v0.6.0
Release notes
New features
- Advancing MoE inference and training to power next-generation AI scale
- MoE inference
- PR-MoE model support
- AMD support (#1430)
- Various ZeRO Stage3 Optimizations + Improvements (#1453)
Special thanks to our contributors in this release
@stas00, @jithunnair-amd, @rraminen, @jeffdaily, @okakarpa, @jfc4050, @raamjad, @aphedges, @SeanNaren, @liamcli, @andriyor, @manuelciosici
v0.5.10: Patch release
[ZeRO] Default disable elastic ckpt in stage 1+2 and reduce CPU memor…