Sparse attn triton v1.0 support + torch1.8 test runner #1374

Merged
merged 23 commits into from
Sep 21, 2021
Changes from 3 commits
Commits (23):
- 18a8dff: let the sparse tests run (Jun 10, 2021)
- 80dbe2f: fixing the sparse-APIs to use the latest triton version (Jun 29, 2021)
- be55c85: update with assert and some fixes (jeffra, Sep 15, 2021)
- 9e2dee4: add torch18 tests and fix sparse-attn checks (jeffra, Sep 15, 2021)
- e702866: turn back on tests (jeffra, Sep 16, 2021)
- 54752d0: use relative paths for megatron jsons (jeffra, Sep 14, 2021)
- 741f8b5: factor out relative path for unit test files (jeffra, Sep 14, 2021)
- 626a51e: set test path (jeffra, Sep 16, 2021)
- 318ae95: refactor sparse attn imports (jeffra, Sep 16, 2021)
- 9963435: Merge branch 'master' into reyazda/test-sparse-v2 (jeffra, Sep 16, 2021)
- 49514bd: fix relative import (jeffra, Sep 16, 2021)
- 61bf986: rename test_path so pytest doesn't think its a test (jeffra, Sep 16, 2021)
- 35245fe: skip test_configurable_parallel for now until fixed (jeffra, Sep 16, 2021)
- 0d28040: moe fix (jeffra, Sep 17, 2021)
- d1615ff: fixes random connection reset test failures for some unit tests (jeffra, Sep 17, 2021)
- 940abc6: resolve the TK with correct setting based on dtype and block size (Sep 17, 2021)
- bd13efd: Merge branch 'reyazda/test-sparse-v2' of github.com:microsoft/DeepSpe… (Sep 17, 2021)
- 7d9c5fb: fix megatron regression, add moe unit test, fix moe ckpt comparison (jeffra, Sep 17, 2021)
- 7733f00: add sparse-attn skip if not compatible (jeffra, Sep 17, 2021)
- 8001124: Merge branch 'master' into reyazda/test-sparse-v2 (jeffra, Sep 18, 2021)
- 08695e2: skip moe ckpt test if old torch (jeffra, Sep 20, 2021)
- 1324589: turn back on test_configurable_parallel (jeffra, Sep 20, 2021)
- 3beb9fd: tear down torch dist pg when test completes (jeffra, Sep 20, 2021)
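Several commits gate tests on the installed environment (for example "skip moe ckpt test if old torch" and "add sparse-attn skip if not compatible"). One common way to express such a gate is a numeric (major, minor) comparison on the version string, because plain string comparison ranks "1.10" below "1.8". The following is a minimal sketch, assuming torch-style version strings of the form major.minor.patch with an optional +local suffix; the helper names are hypothetical and the PR's actual checks may differ.

```python
def parse_major_minor(version):
    """Return (major, minor) from a version string such as '1.8.1+cu111'."""
    # Drop any local build suffix ('+cu111', '+git1234') before splitting.
    public = version.split("+", 1)[0]
    parts = public.split(".")
    # Only the first two components are needed; a patch like '0a0' is ignored.
    return int(parts[0]), int(parts[1])


def torch_at_least(version, major, minor):
    """Hypothetical helper: True if `version` is at least major.minor."""
    # Tuple comparison is numeric per component, so (1, 10) > (1, 8).
    return parse_major_minor(version) >= (major, minor)


print(torch_at_least("1.8.1+cu111", 1, 8))
print(torch_at_least("1.7.1", 1, 8))
```

In a pytest suite the result could feed pytest.mark.skipif(not torch_at_least(torch.__version__, 1, 8), reason="requires torch >= 1.8"); the 1.8 threshold here is only illustrative.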
tests/unit/common.py: 3 changes (3 additions, 0 deletions)
@@ -49,6 +49,9 @@ def dist_init(local_rank, num_procs, *func_args, **func_kwargs):
 
             run_func(*func_args, **func_kwargs)
 
+            # make sure all ranks finish at the same time
+            torch.distributed.barrier()
+
         def dist_launcher(num_procs, *func_args, **func_kwargs):
             """Launch processes and gracefully handle failures. """
 
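The comment on the added torch.distributed.barrier() call states its purpose: every rank finishes at the same time. The sketch below illustrates that guarantee without GPUs or torch, using Python's standard-library threading.Barrier as a single-process stand-in for the collective barrier. The helper names run_ranks and uneven_work are hypothetical and not part of DeepSpeed's test harness.

```python
import threading
import time


def run_ranks(num_ranks, run_func):
    """Run run_func once per simulated rank (one thread each).

    Each rank waits at a shared barrier after run_func returns, so no rank
    can move on (e.g. to tear down) until every rank has finished its work.
    """
    barrier = threading.Barrier(num_ranks)
    events = []  # (rank, phase) tuples in the order they were recorded
    lock = threading.Lock()

    def worker(rank):
        run_func(rank)
        with lock:
            events.append((rank, "ran"))
        # Stand-in for torch.distributed.barrier(): blocks until all ranks arrive.
        barrier.wait()
        with lock:
            events.append((rank, "exit"))

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(num_ranks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return events


def uneven_work(rank):
    # Higher ranks take longer, so without the barrier rank 0 would exit first.
    time.sleep(0.01 * rank)


events = run_ranks(4, uneven_work)
phases = [phase for _, phase in events]
print(phases)  # every "ran" entry comes before any "exit" entry
```

Because each thread records its "ran" event before calling barrier.wait(), and no thread is released until all of them have called it, all "ran" entries precede any "exit" entry regardless of how long each rank's work takes.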
tests/unit/simple_model.py: 6 changes (1 addition, 5 deletions)
@@ -37,11 +37,7 @@ def __init__(self, hidden_dim):
         super(SimpleMoEModel, self).__init__()
         self.linear = torch.nn.Linear(hidden_dim, hidden_dim)
         linear2 = torch.nn.Linear(hidden_dim, hidden_dim)
-        self.linear2 = MoE(hidden_size=hidden_dim,
-                           output_dropout_prob=0.0,
-                           expert=linear2,
-                           num_experts=4,
-                           k=1)
+        self.linear2 = MoE(hidden_size=hidden_dim, expert=linear2, num_experts=4, k=1)
         self.cross_entropy_loss = torch.nn.CrossEntropyLoss()
 
     def forward(self, x, y):