Curriculum learning #1307

Merged · 5 commits · Aug 16, 2021
Changes from 1 commit
add doc/tutorial/assertion
conglongli committed Aug 16, 2021
commit 7c8c93438db04458d3bb51a497c831cf9dfcf86a
8 changes: 7 additions & 1 deletion README.md
@@ -33,6 +33,7 @@ information [here](https://innovation.microsoft.com/en-us/exploring-ai-at-scale)


# News
* [2021/08/16] [Curriculum learning: a regularization method for stable and 2.6x faster GPT-2 pre-training with 8x/4x larger batch size/learning rate](https://www.deepspeed.ai/tutorials/curriculum-learning/)
* [2021/05/24] [DeepSpeed: Accelerating large-scale model inference and training via system optimizations and compression](https://www.microsoft.com/en-us/research/blog/deepspeed-accelerating-large-scale-model-inference-and-training-via-system-optimizations-and-compression/)
* [2021/04/20] [1-bit LAMB: up to 4.6x less communication and 2.8x faster training, together with LAMB's convergence speed at large batch sizes](https://www.deepspeed.ai/tutorials/onebit-lamb/)
* [2021/04/19] [ZeRO-Infinity unlocks unprecedented model scale for deep learning training](https://www.microsoft.com/en-us/research/blog/zero-infinity-and-deepspeed-unlocking-unprecedented-model-scale-for-deep-learning-training/)
@@ -148,6 +149,10 @@ overview](https://www.deepspeed.ai/features/) for descriptions and usage.
* Learning Rate Range Test
* 1Cycle Learning Rate Schedule
* [Simplified Data Loader](https://www.deepspeed.ai/features/#simplified-data-loader)
* [Curriculum Learning](https://www.deepspeed.ai/tutorials/curriculum-learning/)
* A curriculum learning-based data pipeline that presents easier or simpler examples earlier during training
* Stable and 2.6x faster GPT-2 pre-training with 8x/4x larger batch size/learning rate while maintaining token-wise convergence speed
* Complementary to many other DeepSpeed features
* [Performance Analysis and Debugging](https://www.deepspeed.ai/features/#performance-analysis-and-debugging)


@@ -198,9 +203,10 @@ Conduct](https://opensource.microsoft.com/codeofconduct/). For more information
2. Jeff Rasley, Samyam Rajbhandari, Olatunji Ruwase, and Yuxiong He. (2020) DeepSpeed: System Optimizations Enable Training Deep Learning Models with Over 100 Billion Parameters. [In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD '20, Tutorial)](https://dl.acm.org/doi/10.1145/3394486.3406703).
3. Minjia Zhang, Yuxiong He. (2020) Accelerating Training of Transformer-Based Language Models with Progressive Layer Dropping. [arXiv:2010.13369](https://arxiv.org/abs/2010.13369) and [NeurIPS 2020](https://proceedings.neurips.cc/paper/2020/hash/a1140a3d0df1c81e24ae954d935e8926-Abstract.html).
4. Jie Ren, Samyam Rajbhandari, Reza Yazdani Aminabadi, Olatunji Ruwase, Shuangyan Yang, Minjia Zhang, Dong Li, Yuxiong He. (2021) ZeRO-Offload: Democratizing Billion-Scale Model Training. [arXiv:2101.06840](https://arxiv.org/abs/2101.06840).
5. Hanlin Tang, Shaoduo Gan, Ammar Ahmad Awan, Samyam Rajbhandari, Conglong Li, Xiangru Lian, Ji Liu, Ce Zhang, Yuxiong He. (2021) 1-bit Adam: Communication Efficient Large-Scale Training with Adam's Convergence Speed. [arXiv:2102.02888](https://arxiv.org/abs/2102.02888).
5. Hanlin Tang, Shaoduo Gan, Ammar Ahmad Awan, Samyam Rajbhandari, Conglong Li, Xiangru Lian, Ji Liu, Ce Zhang, Yuxiong He. (2021) 1-bit Adam: Communication Efficient Large-Scale Training with Adam's Convergence Speed. [arXiv:2102.02888](https://arxiv.org/abs/2102.02888) and [ICML 2021](http://proceedings.mlr.press/v139/tang21a.html).
6. Samyam Rajbhandari, Olatunji Ruwase, Jeff Rasley, Shaden Smith, Yuxiong He. (2021) ZeRO-Infinity: Breaking the GPU Memory Wall for Extreme Scale Deep Learning. [arXiv:2104.07857](https://arxiv.org/abs/2104.07857).
7. Conglong Li, Ammar Ahmad Awan, Hanlin Tang, Samyam Rajbhandari, Yuxiong He. (2021) 1-bit LAMB: Communication Efficient Large-Scale Large-Batch Training with LAMB's Convergence Speed. [arXiv:2104.06069](https://arxiv.org/abs/2104.06069).
8. Conglong Li, Minjia Zhang, Yuxiong He. (2021) Curriculum Learning: A Regularization Method for Efficient and Stable Billion-Scale GPT Model Pre-Training. [arXiv:2108.06084](https://arxiv.org/abs/2108.06084).

# Videos
1. DeepSpeed KDD 2020 Tutorial
15 changes: 15 additions & 0 deletions deepspeed/runtime/data_pipeline/curriculum_scheduler.py
@@ -8,6 +8,10 @@ class CurriculumScheduler(object):
def __init__(self, config):
super().__init__()
self.state = {}
assert "curriculum_type" in config, "Curriculum learning requires the config 'curriculum_type'"
assert "min_difficulty" in config, "Curriculum learning requires the config 'min_difficulty'"
assert "max_difficulty" in config, "Curriculum learning requires the config 'max_difficulty'"
assert "schedule_type" in config, "Curriculum learning requires the config 'schedule_type'"
self.state['min_difficulty'] = config['min_difficulty']
self.state['max_difficulty'] = config['max_difficulty']
self.state['current_difficulty'] = config['min_difficulty']
@@ -25,6 +29,12 @@ def __init__(self, config):
The self.state['schedule'] is a dictionary of
difficulty : [max step for this difficulty, next difficulty].
"""
assert "difficulty" in config['schedule_config'], "Curriculum learning with fixed_discrete schedule requires the schedule_config 'difficulty'"
assert "max_step" in config['schedule_config'], "Curriculum learning with fixed_discrete schedule requires the schedule_config 'max_step'"
assert len(config['schedule_config']['max_step']) > 0
assert len(config['schedule_config']['difficulty']) > 0
assert len(config['schedule_config']['difficulty']) == len(
config['schedule_config']['max_step']) + 1
self.state['schedule'] = {}
for i in range(len(config['schedule_config']['max_step'])):
self.state['schedule'][config['schedule_config']['difficulty'][i]] = \
@@ -49,6 +59,9 @@ def __init__(self, config):
"root_degree": 2
}
"""
assert "total_step" in config['schedule_config'], "Curriculum learning with fixed_root schedule requires the schedule_config 'total_step'"
assert "difficulty_step" in config['schedule_config'], "Curriculum learning with fixed_root schedule requires the schedule_config 'difficulty_step'"
assert "root_degree" in config['schedule_config'], "Curriculum learning with fixed_root schedule requires the schedule_config 'root_degree'"
self.state['schedule'] = config['schedule_config']
elif config['schedule_type'] == 'fixed_linear':
"""
@@ -59,6 +72,8 @@ def __init__(self, config):
"difficulty_step": 8
}
"""
assert "total_step" in config['schedule_config'], "Curriculum learning with fixed_linear schedule requires the schedule_config 'total_step'"
assert "difficulty_step" in config['schedule_config'], "Curriculum learning with fixed_linear schedule requires the schedule_config 'difficulty_step'"
self.state['schedule'] = config['schedule_config']
else:
raise RuntimeError('Unsupported curriculum schedule type')
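As a reading aid for the diff above, here is a minimal, illustrative sketch of what the `fixed_linear` and `fixed_root` schedules amount to. It assumes the linear/root interpolation described in the configuration documentation added later in this PR and rounds down to a multiple of `difficulty_step`; it is not a copy of the DeepSpeed implementation, whose rounding details may differ.

```python
def sketch_difficulty(step, min_difficulty, max_difficulty, schedule_type, schedule_config):
    """Illustrative only: map a global training step to a curriculum difficulty
    for the fixed_linear / fixed_root schedule types."""
    if schedule_type == 'fixed_linear':
        root_degree = 1
    elif schedule_type == 'fixed_root':
        root_degree = schedule_config['root_degree']
    else:
        raise RuntimeError('Unsupported curriculum schedule type')
    # Fraction of the schedule completed; root_degree=1 recovers the linear case,
    # while root_degree>1 raises the difficulty faster early in training.
    progress = min(1.0, step / schedule_config['total_step']) ** (1.0 / root_degree)
    difficulty = min_difficulty + progress * (max_difficulty - min_difficulty)
    # Round down to a multiple of difficulty_step (e.g. 8 for FP16 Tensor Cores).
    difficulty = int(difficulty) // schedule_config['difficulty_step'] * schedule_config['difficulty_step']
    return max(min_difficulty, min(difficulty, max_difficulty))

# The fixed_discrete type is a plain lookup instead: e.g. difficulty [64, 128, 256]
# with max_step [1000, 2000] means difficulty 64 until step 1000, 128 until step 2000,
# then 256 afterwards -- hence len(difficulty) == len(max_step) + 1 in the asserts above.
```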
1 change: 1 addition & 0 deletions docs/_config.yml
@@ -36,6 +36,7 @@ collections:
- bert-finetuning.md
- bert-pretraining.md
- cifar-10.md
- curriculum-learning.md
- flops-profiler.md
- gan.md
- lrrt.md
2 changes: 2 additions & 0 deletions docs/_data/navigation.yml
@@ -70,6 +70,8 @@ lnav:
url: /tutorials/bert-pretraining/
- title: "CIFAR-10"
url: /tutorials/cifar-10/
- title: "Curriculum Learning"
url: /tutorials/curriculum-learning/
- title: "Flops Profiler"
url: /tutorials/flops-profiler/
- title: "GAN"
76 changes: 76 additions & 0 deletions docs/_pages/config-json.md
@@ -716,3 +716,79 @@ Configuring the asynchronous I/O module for offloading parameter and optimizer s
"num_sliding_window_blocks": 3
}
```

### Curriculum Learning
```json
"curriculum_learning": {
"enabled": true,
"curriculum_type": "seqlen",
"min_difficulty": 8,
"max_difficulty": 1024,
"schedule_type": "fixed_linear",
"schedule_config": {
"total_step": 40000,
"difficulty_step": 8
}
}
```
<i>**enabled**</i>: [boolean]

| Description | Default |
| ----------------------------------------- | ------- |
| Set to true to enable curriculum learning | `false` |

<i>**curriculum_type**</i>: [string]

| Description | Default |
| ----------------------------------------------------------------- | ------- |
| Type of curriculum difficulty metric. Currently supports `seqlen`. | N/A |


<i>**min_difficulty**</i>: [integer]

| Description | Default |
| ----------------------------- | ------- |
| The starting difficulty level | N/A |

<i>**max_difficulty**</i>: [integer]

| Description | Default |
| --------------------------- | ------- |
| The ending difficulty level | N/A |

<i>**schedule_type**</i>: [string]

| Description | Default |
| -------------------------------------------------------------------------------------------------- | ------- |
| Type of curriculum schedule. Currently supports `fixed_linear`, `fixed_root`, and `fixed_discrete`. | N/A |


<i>**total_step**</i>: [integer]

| Description | Default |
| --------------------------------------------------------------- | ------- |
| Total number of steps of the curriculum schedule. Part of `schedule_config` when the `fixed_linear` or `fixed_root` schedule_type is used. | N/A |

<i>**difficulty_step**</i>: [integer]

| Description | Default |
| --------------------------------------------------------------- | ------- |
| At any time, the curriculum learning difficulty must be a multiple of this `difficulty_step`. Set this to a multiple of 8 (for FP16 data) or 16 (for INT8 data) to enable NVIDIA Tensor Core acceleration. Part of `schedule_config` when the `fixed_linear` or `fixed_root` schedule_type is used. | N/A |

<i>**root_degree**</i>: [integer]

| Description | Default |
| --------------------------------------------------------------- | ------- |
| Root degree of the curriculum schedule function. Part of `schedule_config` when the `fixed_root` schedule_type is used. | N/A |

<i>**difficulty**</i>: [list of integer]

| Description | Default |
| --------------------------------------------------------------- | ------- |
| List of difficulty levels to be used during the schedule. Part of `schedule_config` when the `fixed_discrete` schedule_type is used. | N/A |

<i>**max_step**</i>: [list of integer]

| Description | Default |
| --------------------------------------------------------------- | ------- |
| List of steps at which to change the difficulty level. Part of `schedule_config` when the `fixed_discrete` schedule_type is used. | N/A |
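
As a sanity check of the sample `fixed_linear` configuration at the top of this section (`min_difficulty` 8, `max_difficulty` 1024, `total_step` 40000, `difficulty_step` 8), a plain linear interpolation rounded down to a multiple of `difficulty_step` evolves roughly as below; the exact values produced by DeepSpeed may differ slightly depending on rounding.

```python
min_d, max_d, total_step, diff_step = 8, 1024, 40000, 8

for step in (0, 10000, 20000, 40000):
    progress = min(1.0, step / total_step)
    d = min_d + progress * (max_d - min_d)            # linear interpolation
    d = max(min_d, int(d) // diff_step * diff_step)   # round down to a multiple of 8
    print(step, d)  # 0 -> 8, 10000 -> 256, 20000 -> 512, 40000 -> 1024
```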
3 changes: 3 additions & 0 deletions docs/_pages/features.md
@@ -241,6 +241,9 @@ DeepSpeed abstracts away data parallelism and model parallelism from the user wh
comes to data loading. Users simply provide a PyTorch dataset, and the DeepSpeed data loader
can automatically handle batch creation appropriately.

## Curriculum Learning
Please refer to the [Curriculum Learning](/tutorials/curriculum-learning/) tutorial.
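
As a rough, framework-agnostic sketch of the idea behind a `seqlen` curriculum (not DeepSpeed's actual data pipeline), each batch can simply be truncated to the current curriculum difficulty before the forward pass, so that early steps train on shorter sequences:

```python
def apply_seqlen_curriculum(batch_tokens, current_seqlen):
    """Illustrative only: truncate a [batch, seq_len] token tensor to the
    current curriculum sequence length."""
    return batch_tokens[:, :current_seqlen]
```

The tutorial linked above describes how this is integrated into GPT-2 pre-training in practice.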

## Performance Analysis and Debugging

DeepSpeed provides a set of tools for performance analysis and debugging.