
Commit 8575200

add links (deepspeedai#56)
1 parent 010f6dc commit 8575200

File tree

1 file changed: 11 additions, 1 deletion


README.md

Lines changed: 11 additions & 1 deletion
@@ -10,7 +10,11 @@ efficient, and effective.
 
 DeepSpeed can train DL models with over a hundred billion parameters on current
 generation of GPU clusters, while achieving over 5x in system performance
-compared to the state-of-art.
+compared to the state-of-art. Early adopters of DeepSpeed have already produced
+a language model (LM) with over 17B parameters called
+[Turing-NLG](https://www.microsoft.com/en-us/research/blog/turing-nlg-a-17-billion-parameter-language-model-by-microsoft),
+establishing a new SOTA in the LM category.
+
 
 # Table of Contents
 

@@ -84,6 +88,12 @@ replicated across data-parallel processes, ZeRO partitions model states to save
 significant memory. The current implementation (stage 1 of ZeRO) reduces memory by up to
 4x relative to the state-of-art. You can read more about ZeRO in our [paper](https://arxiv.org/abs/1910.02054).
 
+With this impressive memory reduction, early adopters of DeepSpeed have already
+produced a language model (LM) with over 17B parameters called
+[Turing-NLG](https://www.microsoft.com/en-us/research/blog/turing-nlg-a-17-billion-parameter-language-model-by-microsoft),
+establishing a new SOTA in the LM category.
+
+
 ## Scalability
 DeepSpeed supports efficient data parallelism, model parallelism, and their
 combination. ZeRO boosts the scaling capability and efficiency further.
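
For context on the ZeRO stage 1 text added above: the paragraph says model states are partitioned across data-parallel processes to save memory. The sketch below, which is illustrative only and not part of this commit, shows one way ZeRO stage 1 can be enabled through a DeepSpeed config dictionary. The config keys (`zero_optimization`, `train_batch_size`, `fp16`, `optimizer`) and the `deepspeed.initialize` call follow DeepSpeed's documented API; the tiny model is a placeholder.

```python
# Illustrative sketch (not from this commit): enabling ZeRO stage 1 via a
# DeepSpeed config. Intended to run under the `deepspeed` launcher so that
# data-parallel process groups are set up; the model below is a placeholder.
import torch
import deepspeed

model = torch.nn.Linear(4096, 4096)

ds_config = {
    "train_batch_size": 8,
    "fp16": {"enabled": True},
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
    # Stage 1 shards optimizer states across data-parallel ranks,
    # which is the memory reduction described in the README text above.
    "zero_optimization": {"stage": 1},
}

# deepspeed.initialize returns (engine, optimizer, dataloader, lr_scheduler);
# the engine manages the partitioned optimizer states transparently.
model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)
```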

0 commit comments