Commit bf5abbf
Update base for Update on "Reduce allocation overhead in quantized sdpa"

For small models, dequantizing portions of the v cache causes extra allocation overhead. A better way to handle this is probably to dequantize the entire v cache outside the model. There isn't a significant perf advantage from this yet, but subsequent diffs will use a caching allocator, where this refactor helps. Differential Revision: [D85532077](https://our.internmc.facebook.com/intern/diff/D85532077/) [ghstack-poisoned]
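The allocation pattern described above can be illustrated with a minimal, self-contained sketch (plain Python, not the ExecuTorch implementation; all names here are hypothetical): dequantizing the v cache chunk by chunk allocates one temporary float buffer per chunk, while dequantizing the whole cache once needs a single allocation.

```python
# Hypothetical sketch, not ExecuTorch code: compare per-chunk dequantization
# of a quantized value (v) cache against one whole-cache dequantization,
# counting temporary buffer allocations.

def dequantize(chunk, scale, zero_point):
    """Dequantize one int8 chunk to float: (q - zero_point) * scale."""
    return [(q - zero_point) * scale for q in chunk]

def attend_per_chunk(v_cache_q, scale, zero_point, chunk_size):
    """Dequantize each chunk of the v cache on demand:
    one temporary float buffer allocation per chunk."""
    allocations = 0
    out = []
    for start in range(0, len(v_cache_q), chunk_size):
        out.extend(dequantize(v_cache_q[start:start + chunk_size],
                              scale, zero_point))
        allocations += 1
    return out, allocations

def attend_whole_cache(v_cache_q, scale, zero_point):
    """Dequantize the entire v cache once, outside the per-chunk loop:
    a single temporary allocation regardless of chunk count."""
    return dequantize(v_cache_q, scale, zero_point), 1

v_cache_q = list(range(16))  # toy stand-in for an int8 v cache
full, n_full = attend_whole_cache(v_cache_q, 0.5, 8)
chunked, n_chunked = attend_per_chunk(v_cache_q, 0.5, 8, chunk_size=4)
assert full == chunked       # same dequantized values either way
print(n_full, n_chunked)     # 1 allocation vs 4
```

With a caching allocator the per-chunk temporaries become cheaper to reuse, which is why the diff notes the benefit shows up only in subsequent changes.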
1 parent 5f15c76 commit bf5abbf

File tree

2 files changed: +7 −0 lines changed
extension/llm/custom_ops/TARGETS

Lines changed: 1 addition & 0 deletions
```diff
@@ -60,5 +60,6 @@ runtime.python_test(
     ],
     deps = [
         "//caffe2:torch",
+        "//executorch/extension/pybindings:portable_lib",
     ],
 )
```

extension/llm/custom_ops/test_quantized_sdpa.py

Lines changed: 6 additions & 0 deletions
```diff
@@ -12,6 +12,7 @@
 import torch.nn.functional as F

 from executorch.extension.llm.custom_ops import custom_ops  # noqa
+from executorch.extension.pybindings.portable_lib import _unsafe_reset_threadpool


 def is_fbcode():
@@ -40,6 +41,11 @@ def setUp(self):
         self.q_shape = None
         self.kv_shape = None
         self.is_seq_at_dim_2 = True
+        # For some reason 4 threads doesn't work.
+        # This setting is needed to make this test not flaky due to the OMP
+        # error "OMP: Error #131: Thread identifier invalid".
+        # It is not clear why that happens, but a smaller threadpool resolves it.
+        _unsafe_reset_threadpool(3)

     def _scale_tensor(self, tensor, min_value, max_value, scale=True):
         normalized_tensor = (tensor - tensor.min()) / (tensor.max() - tensor.min())
```

0 commit comments

Comments
 (0)