Exponential added. #138
Conversation
tests/test_special_ops.py
Outdated
x = torch.randn(size=shape, dtype=dtype, device="cuda")
with flag_gems.use_gems():
    res_out = x.exponential_(lambd=0.5)
assert res_out.min() > 0
Maybe add a K-S (Kolmogorov–Smirnov) test for the distribution? Testing only for positiveness is too minimal.
https://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test
Thanks, more substantial tests are on the way.
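For reference, a minimal sketch of such a K-S check, assuming scipy is available in the test environment; the sample size, lambd value and p-value threshold are illustrative, not taken from the PR:

import torch
import scipy.stats
import flag_gems


def test_exponential_ks():
    lambd = 0.5
    x = torch.empty(100_000, dtype=torch.float32, device="cuda")
    with flag_gems.use_gems():
        x.exponential_(lambd=lambd)
    # Exponential(lambd) corresponds to scipy's "expon" with loc=0, scale=1/lambd.
    _, pvalue = scipy.stats.kstest(x.cpu().numpy(), "expon", args=(0.0, 1.0 / lambd))
    assert pvalue > 0.05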
Benchmark still to do.
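A rough benchmark sketch, assuming CUDA-event timing; the tensor size, lambd and iteration count are arbitrary placeholders, not the project's benchmark harness:

import torch
import flag_gems


def bench_exponential(n=1 << 24, iters=100, lambd=0.5):
    x = torch.empty(n, device="cuda")

    def time_ms(fn):
        fn()  # warm-up
        torch.cuda.synchronize()
        start = torch.cuda.Event(enable_timing=True)
        end = torch.cuda.Event(enable_timing=True)
        start.record()
        for _ in range(iters):
            fn()
        end.record()
        torch.cuda.synchronize()
        return start.elapsed_time(end) / iters  # average ms per call

    eager_ms = time_ms(lambda: x.exponential_(lambd=lambd))
    with flag_gems.use_gems():
        gems_ms = time_ms(lambda: x.exponential_(lambd=lambd))
    print(f"torch eager: {eager_ms:.3f} ms, flag_gems: {gems_ms:.3f} ms")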
tests/test_special_ops.py
Outdated
x = torch.randn(size=shape, dtype=dtype, device="cuda")
with flag_gems.use_gems():
    res_out = x.exponential_(lambd=0.5)
assert res_out.min() > 0
Is this enough to ensure accuracy? I think we need more checks.
More tests will be added.
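One cheap additional check, purely as a suggestion, is comparing the empirical mean and variance against the theoretical values for Exponential(lambd) (mean 1/lambd, variance 1/lambd**2); the tolerances below are loose, illustrative choices:

import torch
import flag_gems


def test_exponential_moments():
    lambd = 0.5
    x = torch.empty(1_000_000, device="cuda")
    with flag_gems.use_gems():
        x.exponential_(lambd=lambd)
    # Mean and variance of Exponential(lambd) are 1/lambd and 1/lambd**2.
    assert abs(x.mean().item() - 1.0 / lambd) < 0.05
    assert abs(x.var().item() - 1.0 / lambd**2) < 0.2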
def transform_exponential(u, lambd, eps):
    eps1 = -0.5 * eps
    is_min = u >= 1.0 + eps1
    log = tl.where(is_min, eps1, tl.math.log(u))
Is this really needed? What about just using log(u) or log(1-u)?
This is to enforce compatibility with PyTorch.
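For context: the kernel appears to implement the usual inverse-CDF transform, sample = -log(u) / lambd for u uniform in (0, 1], and the eps clamp keeps log(u) strictly negative even when u rounds to 1.0, so results stay in the open interval (0, +∞), matching PyTorch. A plain-Python sketch of that logic (not the actual Triton kernel; the float32 eps default is my assumption):

import math
import random


def exponential_sample(lambd: float, eps: float = 2.0 ** -23) -> float:
    u = 1.0 - random.random()  # uniform in (0, 1]
    eps1 = -0.5 * eps
    # If u is within eps/2 of 1.0, log(u) could round to exactly 0; substitute
    # -eps/2 (approximately log(1 - eps/2)) so the log stays strictly negative.
    log_u = eps1 if u >= 1.0 + eps1 else math.log(u)
    return -log_u / lambd  # strictly positive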
LGTM. Added scipy to the UT environment.
logging.debug("GEMS EXPONENTIAL_")
dtype = x.dtype
device = x.device
inplace = x.is_contiguous()
Performing an in-place operation on a tensor with internal overlap should raise a RuntimeError.
RuntimeError: unsupported operation: more than one element of the written-to tensor refers to a single memory location. Please clone() the tensor before performing the operation.
Currently it raises a RuntimeError when copying the data back:
import torch
import flag_gems

flag_gems.enable()
x = torch.ones(2, device="cuda")
# broadcast_to returns a view with stride 0, so multiple elements share memory
x = torch.broadcast_to(x, (3, 2))
x.exponential_()  # in-place write into overlapping memory
I'll fix it.
PyTorch throws exactly the same error, so we'll just keep the current behavior.
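For illustration only (not what the PR implements): a simple guard that catches the broadcast case above is to reject in-place writes into views that have a zero stride on a dimension of size > 1; note this is not a complete internal-overlap check.

import torch


def assert_no_internal_overlap(t: torch.Tensor) -> None:
    # A zero stride on a dimension with more than one element means several
    # logical elements alias the same memory location.
    for size, stride in zip(t.shape, t.stride()):
        if size > 1 and stride == 0:
            raise RuntimeError(
                "unsupported operation: more than one element of the written-to "
                "tensor refers to a single memory location. Please clone() the "
                "tensor before performing the operation."
            )


x = torch.broadcast_to(torch.ones(2), (3, 2))
assert_no_internal_overlap(x)  # raises RuntimeError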
* exponential added.
* Added K-S tests to exponential_, fp64 corrected.
* aligned with aten prototype
* Exponential_ uses uint64 offsets in Triton kernel.
* Update pyproject config for new test dependencies.
* WIP: multinomial
* add Ops & UT & Bench
* add full zero ones Ops & UT & Bench
* split normal op
* Adding multinomial.
* fixed one off error in binary search
* Added multinomial tests without replacement.
* PR comment
* split test_special_ops
* updated with_replacement tests
* add K-S test
* split special perf
* Update to a more reliable without-replacement test
* Exponential added. (#138)
  * exponential added.
  * Added K-S tests to exponential_, fp64 corrected.
  * aligned with aten prototype
  * Exponential_ uses uint64 offsets in Triton kernel.
  * Update pyproject config for new test dependencies.
* resolve conflict
* Use int64 indexing when needed & fix argmax (#146)
  1. fix amax, armax and triu, use int64 indexing when the largest tensor's size_in_bytes exceed int32's max;
  2. change the tiling scheme for argmax to loop in the reduction dimension, instead of data-size-dependent-tile-size
* test for op
* test for op
* Added multinomial perf tests.
* Making libentry thread safe (#136)
  * libentry now is lock protected.
  * Add multithreading tests for libentry.
  * polish code.
* add argparse
* fix desc
* fix num
* Update test_specific_ops.py
* split UT files
* fix
* fix
* resolved conflicts with master.
* fixing multinomial, working in progress.
* Multinomial passes tests.
* Enhance multinomial tests and benchmarks.
* [bugfix] keepdim when samples one
* [bugfix] fix accu test
* fix anomaly behavior in fused_renorm_cumsum
* Polish multinomial tests.
* remove garbage files.
* bfloat16 added for multinomial, polish without replacement test.
* Enable two-pass normed cumsum.
* cumsum updated
* normed cumsum complete.
* Fixed multinomial binary search boundary bug
* fix normed_cumsum bugs.
* quick fix dim check.

---------

Co-authored-by: Bowen12992 <zhangbluestars@gmail.com>
Co-authored-by: Clement Chan <iclementine@outlook.com>
Co-authored-by: Bowen <81504862+Bowen12992@users.noreply.github.com>
Co-authored-by: StrongSpoon <35829812+StrongSpoon@users.noreply.github.com>
Co-authored-by: StrongSpoon <strongspoon@outlook.com>
* add Ops & UT & Bench
* add full zero ones Ops & UT & Bench
* split normal op
* [Operator] init slice&select scatter
* code format
* PR comment
* split test_special_ops
* add K-S test
* split special perf
* Exponential added. (#138)
  * exponential added.
  * Added K-S tests to exponential_, fp64 corrected.
  * aligned with aten prototype
  * Exponential_ uses uint64 offsets in Triton kernel.
  * Update pyproject config for new test dependencies.
* resolve conflict
* Use int64 indexing when needed & fix argmax (#146)
  1. fix amax, armax and triu, use int64 indexing when the largest tensor's size_in_bytes exceed int32's max;
  2. change the tiling scheme for argmax to loop in the reduction dimension, instead of data-size-dependent-tile-size
* test for op
* test for op
* Making libentry thread safe (#136)
  * libentry now is lock protected.
  * Add multithreading tests for libentry.
  * polish code.
* add argparse
* fix desc
* fix num
* Update test_specific_ops.py
* split UT files
* fix
* fix
* [Operator] Optimize CrossEntropyLoss (#131)
  reimplement cross_entropy_loss forward and backward support; indices/probabilities/weight/reduction/ignore_index/label_smoothing; perform better than torch eager on large scale tensors
* Exponential added. (#138)
  * exponential added.
  * Added K-S tests to exponential_, fp64 corrected.
  * aligned with aten prototype
  * Exponential_ uses uint64 offsets in Triton kernel.
  * Update pyproject config for new test dependencies.
* Use int64 indexing when needed & fix argmax (#146)
  1. fix amax, armax and triu, use int64 indexing when the largest tensor's size_in_bytes exceed int32's max;
  2. change the tiling scheme for argmax to loop in the reduction dimension, instead of data-size-dependent-tile-size
* Making libentry thread safe (#136)
  * libentry now is lock protected.
  * Add multithreading tests for libentry.
  * polish code.
* [Test] Test for op (#151)
* [chore] solve slice&select scatter's test cases
* [fix] fix slice&select scatter's test cases
* [chore] remove out-of-range indices in select_scatter's test cases
* [chore] simplify slice_scatter's test cases
* [fix] Added range that is deleted by mistake
* Merge branch 'master' into slice&select_scatter
* [chore] reformat
* [fix] typo
* [chore] Considering perf, pause the replacement of some aTen operators
  * slice_scatter
  * select_scatter
  * index_select
* [fix] Add libentry in op.cumsum
* [fix] Del slice&select scatter's perf tests
* [Chore] Add pytest mark for slice&select scatter's test
* [Fix] Correct slice_scatter test
* [Fix] Replace CPU Tensor

---------

Co-authored-by: Bowen12992 <zhangbluestars@gmail.com>
Co-authored-by: Tongxin Bai <waffle.bai@gmail.com>
Co-authored-by: Clement Chan <iclementine@outlook.com>
Co-authored-by: Bowen <81504862+Bowen12992@users.noreply.github.com>
Co-authored-by: StrongSpoon <35829812+StrongSpoon@users.noreply.github.com>
Added a Triton function for the in-place operator tensor.exponential_.
The output values lie in the open interval (0, +∞).
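A minimal usage sketch, assuming a CUDA device and the flag_gems context manager used in the tests above:

import torch
import flag_gems

x = torch.empty(4, 4, device="cuda")
with flag_gems.use_gems():
    x.exponential_(lambd=1.0)  # fills x in place with Exponential(1.0) samples
assert x.min() > 0  # every sample lies in the open interval (0, +inf)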