[Operator] Add repeat_interleave_self_tensor #230
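For readers unfamiliar with the operator, here is a minimal pure-Python sketch of the semantics the name suggests. It assumes the operator corresponds to `torch.repeat_interleave(input, repeats)` with a tensor-valued `repeats` and a flat (1-D) input. The helper name `repeat_interleave_ref` is hypothetical and is not taken from this PR.

```python
def repeat_interleave_ref(values, repeats):
    """Reference semantics for a flat input: values[i] is emitted repeats[i] times.

    Hypothetical helper for illustration only; not the kernel added by this PR.
    """
    if len(values) != len(repeats):
        raise ValueError("repeats must have one entry per input element")
    out = []
    for value, count in zip(values, repeats):
        if count < 0:
            raise ValueError("repeats must be non-negative")
        out.extend([value] * count)
    return out


print(repeat_interleave_ref([10, 20, 30], [1, 0, 2]))  # [10, 30, 30]
```

The sketch does not cover the `dim` argument or a single-element `repeats` that is broadcast over the input.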

Merged
zhzhcookie merged 1 commit into master from dev_repeat_interleave_self_tensor on Oct 31, 2024

@zfu82 (Collaborator) commented Sep 27, 2024

Performance

Tested on NV-A100

Operator repeat_interleave_self_tensor Performance Test (dtype=torch.float16, mode=cuda)
Size    Torch Latency (ms)    Gems Latency (ms)    Gems Speedup
---------------------------------------------------------------
1024              0.336896              20.0387          0.0168
6144               1.53498              20.9459          0.0733
11264               2.7648              20.4554           0.135
16384              4.12979              21.6965            0.19
21504                5.376              21.4784            0.25
26624              7.02874              22.4543           0.313
31744              8.03123              22.8055           0.352
36864              8.09677              22.5853           0.358
41984              10.2134              23.1731           0.441
47104              11.3715              23.2704           0.489
52224              12.6669              24.6088           0.515
57344              13.7267              25.2928           0.543
62464              15.0774              25.3972           0.594
67584              15.2904              24.5217           0.624
72704              16.7752              24.8955           0.674
77824              17.6722              26.4264           0.669
Operator repeat_interleave_self_tensor Performance Test (dtype=torch.float32, mode=cuda)
Size    Torch Latency (ms)    Gems Latency (ms)    Gems Speedup
---------------------------------------------------------------
1024              0.338944              19.7806          0.0171
6144               1.54419              20.1861          0.0765
11264              3.11194              21.5511           0.144
16384              4.07859              21.1241           0.193
21504              5.98528               22.484           0.266
26624              7.27859              22.9478           0.317
31744              8.11418              22.4348           0.362
36864              8.45619              23.6575           0.357
41984              10.6609              23.8254           0.447
47104               11.732              24.4019           0.481
52224              13.4359              25.1873           0.533
57344              13.9284               25.385           0.549
62464              15.7348              26.5933           0.592
67584              15.9037               26.751           0.595
72704               17.792              27.5722           0.645
77824              19.1754              28.2491           0.679
Operator repeat_interleave_self_tensor Performance Test (dtype=torch.bfloat16, mode=cuda)
Size    Torch Latency (ms)    Gems Latency (ms)    Gems Speedup
---------------------------------------------------------------
1024                0.3328              20.1001          0.0166
6144               1.49504              20.1185          0.0743
11264              2.75968              20.3551           0.136
16384              3.95776              20.4063           0.194
21504               5.3545              21.1671           0.253
26624              6.84954              21.8993           0.313
31744              7.75066              21.7375           0.357
36864              8.07424              22.0928           0.365
41984              10.1786              22.6376            0.45
47104              11.1913              23.1301           0.484
52224              12.1201               23.682           0.512
57344              13.2219              24.4797            0.54
62464              14.9258               24.705           0.604
67584              14.8818              25.1003           0.593
72704              16.7035              25.7403           0.649
77824              17.9456              27.0879           0.662
PASSED
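The "Gems Speedup" column appears to be the Torch latency divided by the Gems latency. That is an inference from the numbers, not something the PR states. The sketch below recomputes the ratio for three rows copied from the float16 table. Values below 1 mean the Gems kernel was slower than Torch in these runs.

```python
# (size, torch_ms, gems_ms, reported_speedup) copied from the float16 table above.
rows = [
    (1024, 0.336896, 20.0387, 0.0168),
    (16384, 4.12979, 21.6965, 0.19),
    (77824, 17.6722, 26.4264, 0.669),
]


def speedup(torch_ms, gems_ms):
    """Assumed definition: baseline latency over Gems latency."""
    return torch_ms / gems_ms


for size, torch_ms, gems_ms, reported in rows:
    print(f"{size:>6}  {speedup(torch_ms, gems_ms):.3g}  (reported {reported})")
```

In these rows the Gems latency stays in the 20 to 27 ms range while the Torch latency grows with size, which is why the ratio improves as the input grows.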

@zfu82 force-pushed the dev_repeat_interleave_self_tensor branch from 9366504 to 5675cf4 on September 27, 2024 09:37

@zhzhcookie (Collaborator) left a comment

LGTM

@zhzhcookie merged commit 7ecdd12 into master on Oct 31, 2024 (3 of 4 checks passed)
@zhzhcookie deleted the dev_repeat_interleave_self_tensor branch on October 31, 2024 08:40