Unit Tests for On Device Sampling #463

quic-sanising · 2025-06-18T18:40:48Z

This PR adds the following Unit Tests for On Device Sampling:

test_sampler_transform: Tests whether SamplerTransform adds nodes at the output of a QEffForCausalLM model so that the next tokens are sampled on the device (instead of the host), and whether the transformed model returns the next tokens and/or the probability distributions.
test_greedy_sampling: Tests greedy sampling with a QPC compiled with and without On Device Sampling.
test_random_sampling: Tests random sampling with a QPC compiled with and without On Device Sampling.
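For reviewers less familiar with the two modes under test, the sketch below shows what greedy and random sampling compute from a row of logits, as a host-side reference in plain Python. It is an illustration under assumptions, not the SamplerTransform implementation: the `temperature` and `top_k` parameters and the helper names are hypothetical, and the real on-device sampler may support a different set of parameters.

```python
import math
import random
from typing import List, Optional


def greedy_sample(logits: List[float]) -> int:
    # Greedy decoding: pick the index of the largest logit (ties -> lowest index).
    return max(range(len(logits)), key=lambda i: logits[i])


def random_sample(
    logits: List[float],
    temperature: float = 1.0,
    top_k: Optional[int] = None,
    rng: Optional[random.Random] = None,
) -> int:
    # Random sampling: scale logits by temperature, keep the top-k candidates,
    # softmax them, then draw one token index from that distribution.
    if rng is None:
        rng = random.Random()
    scaled = [x / temperature for x in logits]
    candidates = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)
    if top_k is not None:
        candidates = candidates[:top_k]
    peak = max(scaled[i] for i in candidates)  # subtract the max for numerical stability
    weights = [math.exp(scaled[i] - peak) for i in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]


logits = [0.1, 2.5, -1.0, 2.4]
print(greedy_sample(logits))                                    # always token 1
print(random_sample(logits, top_k=2, rng=random.Random(42)))    # token 1 or 3
```

With `top_k=1`, random sampling collapses to greedy decoding, which gives a test a simple way to cross-check the two code paths.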

Signed-off-by: quic-sanising <quic_sanising@quicinc.com>

ochougul · 2025-06-24T08:24:00Z

tests/transformers/sampler/test_sampler.py

Can we also run the two sessions with a fixed prompt and make sure their outputs don't match each other, but do match a golden output that we know also matches the PyTorch execution?

Please refer to the test test_random_sampling below.
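The requested check has two halves: the sessions must diverge from each other, yet each must be exactly reproducible against a recorded golden output. A minimal sketch of that assertion pattern follows. `fake_session_generate` is a hypothetical stand-in for a QPC session doing seeded random sampling; in the real test the golden tokens would be hard-coded values verified against PyTorch, not regenerated as they are here.

```python
import random
from typing import List, Sequence


def fake_session_generate(seed: int, vocab_size: int = 32, num_tokens: int = 16) -> List[int]:
    # Hypothetical stand-in for one QPC session running random sampling on a
    # fixed prompt; the seed plays the role of the session's random state.
    rng = random.Random(seed)
    return [rng.randrange(vocab_size) for _ in range(num_tokens)]


def check_random_sampling(outputs: Sequence[List[int]], golden_outputs: Sequence[List[int]]) -> None:
    # 1) The two sessions must not produce the same generation ...
    assert outputs[0] != outputs[1], "sessions unexpectedly produced identical tokens"
    # 2) ... yet each must reproduce its golden (reference-verified) output exactly.
    for generated, golden in zip(outputs, golden_outputs):
        assert generated == golden, "generation drifted from the golden output"


seeds = (0, 1)
# In a real test these would be hard-coded tokens known to match PyTorch.
golden = [fake_session_generate(s) for s in seeds]
check_random_sampling([fake_session_generate(s) for s in seeds], golden)
```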

quic-rishinr · 2025-06-25T11:58:04Z

@quic-sanising, can you add a short feature description to the supported features section of /docs/source/quick_start.md? Please also include the link to the example script in the description.

Signed-off-by: sanising <sanising@qti.qualcomm.com>

quic-sanising · 2025-06-30T23:36:23Z

> @quic-sanising can you add a small feature description under /docs/source/quick_start.md supported feature section? also provide the example script link in the description

Done

Signed-off-by: sanising <sanising@qti.qualcomm.com>

quic-amitraj

Please fix the lint error.

quic-amitraj · 2025-07-02T05:48:07Z

docs/source/quick_start.md

@@ -19,7 +19,8 @@ To achieve this, we have 2 levels of APIs, with different levels of abstraction.
 | [Vision Language Model](QEFFAutoModelForImageTextToText) | Provides support for the AutoModelForImageTextToText class from the transformers library, enabling advanced vision-language tasks. Refer [sample script](https://github.com/quic/efficient-transformers/blob/main/examples/image_text_to_text_inference.py) for more **details**. |
 | [Speech Sequence to Sequence Model](QEFFAutoModelForSpeechSeq2Seq) | Provides support for the QEFFAutoModelForSpeechSeq2Seq Facilitates speech-to-text sequence models. Refer [sample script](https://github.com/quic/efficient-transformers/blob/main/examples/speech_to_text/run_whisper_speech_to_text.py) for more **details**. |
 | Support for FP8 Execution | Enables execution with FP8 precision, significantly improving performance and reducing memory usage for computational tasks. |
-| Prefill caching  | Enhances inference speed by caching key-value pairs for shared prefixes, reducing redundant computations and improving efficiency. |
+| Prefix caching  | Enhances inference speed by caching key-value pairs for shared prefixes, reducing redundant computations and improving efficiency. |
+| On Device Sampling | Enables sampling operations to be executed directly on the QAIC device rather than the host CPU for QEffForCausalLM models. This enhancement significantly reduces host-device communication overhead and improves inference throughput and scalability. Refer [sample script](https://github.com/quic/efficient-transformers/blob/main/examples/on_device_sampling.py) for more **details**. |


The link seems broken; please fix it.

The link points to an example file that this PR adds, so it will resolve once the PR is merged.
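To put the "reduces host-device communication overhead" claim in the On Device Sampling row in concrete terms, here is a back-of-the-envelope sketch of the per-decode-step output size. The element sizes (4-byte fp32 logits, 8-byte int64 token ids) and the helper name are assumptions for illustration; the saving also only applies when the model returns next tokens rather than full probability distributions.

```python
def bytes_per_decode_step(
    batch_size: int,
    vocab_size: int,
    on_device_sampling: bool,
    logit_bytes: int = 4,  # assumed fp32 logits
    token_bytes: int = 8,  # assumed int64 token ids
) -> int:
    # Without On Device Sampling the host receives a full logits row per sequence;
    # with it (and no probability output requested) only one token id per sequence.
    if on_device_sampling:
        return batch_size * token_bytes
    return batch_size * vocab_size * logit_bytes


print(bytes_per_decode_step(4, 32000, on_device_sampling=False))  # 512000
print(bytes_per_decode_step(4, 32000, on_device_sampling=True))   # 32
```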


Signed-off-by: sanising <sanising@qti.qualcomm.com>

quic-hemagnih · 2025-07-03T02:12:48Z

QEfficient/generation/text_generation_inference.py

+        elif count < len(sampler_inputs):
+            raise ValueError(
+                "The provided QPC does not have the required number of inputs to run sampling "
+                f"on the QAIC device (only {count}/{len(sampler_inputs)} inputs provided). Partial "


I think we should do count % sampler_inputs here. If we divide count by len(sampler_inputs) then it would return 0.

This is only the text of the error message; we are not actually dividing here. For example, if count = 5 and len(sampler_inputs) = 10, the message would read "(only 5/10 inputs provided)".
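A quick way to confirm this: inside an f-string, only the expressions in braces are evaluated, so the `/` between them is literal text. The helper name below is hypothetical and only reproduces the relevant fragment of the message.

```python
def missing_inputs_message(count: int, total: int) -> str:
    # The "/" is literal text inside the f-string; only {count} and {total}
    # are interpolated, so nothing is divided.
    return f"only {count}/{total} inputs provided"


print(missing_inputs_message(5, 10))  # only 5/10 inputs provided
```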

quic-hemagnih · 2025-07-03T02:15:11Z

QEfficient/generation/text_generation_inference.py

+        count = 0
+        for session_input_name in self._session.input_names:
+            if session_input_name in sampler_inputs:
+                count += 1


Can there be a case where the user provides the same session_input_names multiple times? If so, how would this code catch it? The count variable would keep incrementing and could satisfy the condition.

self._session.input_names comes from the exported ONNX file. If there are duplicate names, say abc, the ONNX export converts them to something like abc_0, abc_1, and so on, so we would never get the same name twice.

However, if correctness matters more than performance here, I could use a set(), at the cost of a slight O(n) overhead.
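The difference between the two approaches shows up only when a name repeats. The sketch below contrasts a loop mirroring the one under review with a set-based count; the function names and the input names `abc`/`xyz` are hypothetical.

```python
from typing import Iterable, Sequence


def count_with_loop(session_input_names: Iterable[str], sampler_inputs: Sequence[str]) -> int:
    # Mirrors the loop under review: every matching name increments the
    # counter, so a duplicated name would be counted more than once.
    count = 0
    for name in session_input_names:
        if name in sampler_inputs:
            count += 1
            if count == len(sampler_inputs):
                break
    return count


def count_with_set(session_input_names: Iterable[str], sampler_inputs: Sequence[str]) -> int:
    # Set intersection counts each distinct sampler input at most once,
    # at the cost of building two sets (O(n) extra work and memory).
    return len(set(session_input_names) & set(sampler_inputs))


sampler_inputs = ["abc", "xyz"]
duplicated = ["abc", "abc", "input_ids"]
print(count_with_loop(duplicated, sampler_inputs))  # 2: looks "complete" although xyz is missing
print(count_with_set(duplicated, sampler_inputs))   # 1
```

With unique names, as the ONNX export guarantees per the reply above, both functions return the same count.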

quic-hemagnih · 2025-07-03T02:19:00Z

QEfficient/generation/text_generation_inference.py

+                count += 1
+                if count == len(sampler_inputs):
+                    break
+        if count == 0:


I think we can avoid this if...else block:
At line 455, set self.include_sampler = False by default.
Then, at line 458, set it to True before the break.
At line 462, just check for the error condition.

If the user provides include_sampler as an input, self.include_sampler is not set to False. That is why we need the check on line 460.

We can only avoid the else block on line 468.

I'll make the change.
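The control flow discussed above can be sketched as a standalone function in which early returns and raises make the trailing else unnecessary. This is a hypothetical reconstruction for illustration only: `resolve_on_device_sampling` is not a QEfficient function, it assumes a missing sampler is an error only when include_sampler was requested, and the real logic in `text_generation_inference.py` operates on `self.include_sampler` and may differ in detail.

```python
from typing import Iterable, Sequence


def resolve_on_device_sampling(
    session_input_names: Iterable[str],
    sampler_inputs: Sequence[str],
    include_sampler: bool,
) -> bool:
    # Hypothetical helper: decides whether On Device Sampling can be used for a QPC.
    count = len(set(session_input_names) & set(sampler_inputs))
    if count == 0:
        # QPC compiled without the sampler: an error only if the caller asked for it.
        if include_sampler:
            raise ValueError("The provided QPC was not compiled with On Device Sampling.")
        return False
    if count < len(sampler_inputs):
        # Only some sampler inputs are present: sampling cannot run on the device.
        raise ValueError(
            f"The provided QPC has only {count}/{len(sampler_inputs)} of the inputs "
            "required for On Device Sampling."
        )
    # Every sampler input is present; the early return/raise paths above
    # make a trailing else block unnecessary.
    return True
```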

Signed-off-by: sanising <sanising@qti.qualcomm.com>

quic-sanising · 2025-07-03T19:21:59Z

> Please fix lint error.

@quic-amitraj The lint failures are happening because the lint job installs ruff v0.12.2, whereas the .pre-commit-config.yaml file pins the older v0.5.2.

To fix the errors, we need to either install ruff v0.5.2 in the lint job or update .pre-commit-config.yaml to v0.12.2.
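One way to catch this kind of drift is to read the pinned `rev` out of the pre-commit config and compare it numerically with the installed ruff version. The sketch assumes the hook comes from the `astral-sh/ruff-pre-commit` repository; `SAMPLE_CONFIG` is an illustrative fragment, not the project's actual `.pre-commit-config.yaml`, and the helper names are hypothetical.

```python
import re
from typing import Optional, Tuple


def pinned_ruff_rev(pre_commit_config: str) -> Optional[str]:
    # Find the ruff-pre-commit hook repo and capture the version on its "rev:" line.
    match = re.search(r"astral-sh/ruff-pre-commit\s+rev:\s*v?([0-9][0-9.]*)", pre_commit_config)
    return match.group(1) if match else None


def version_tuple(version: str) -> Tuple[int, ...]:
    # Compare versions numerically; as plain strings, "0.5.2" > "0.12.2".
    return tuple(int(part) for part in version.split("."))


SAMPLE_CONFIG = """\
repos:
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.5.2
    hooks:
      - id: ruff
"""

pinned = pinned_ruff_rev(SAMPLE_CONFIG)
installed = "0.12.2"  # e.g. parsed from the output of `ruff --version`
print(pinned, version_tuple(pinned) < version_tuple(installed))  # 0.5.2 True
```

`ruff --version` prints the installed version with a `ruff ` prefix, which would need to be stripped before calling `version_tuple`.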

Add sampler transform test

8417d8f

Signed-off-by: quic-sanising <quic_sanising@quicinc.com>

quic-rishinr added the 1.20.0 label Jun 24, 2025

ochougul requested changes Jun 24, 2025


sanising added 3 commits June 30, 2025 13:20

Merge branch 'main' into ods-unit-tests

27d8dd5

Add example script

067f9b5

Signed-off-by: sanising <sanising@qti.qualcomm.com>

Update docs

931860f

Signed-off-by: sanising <sanising@qti.qualcomm.com>

sanising added 3 commits June 30, 2025 18:40

Enable On Device Sampling for _continuous_batching_execution()

79b6c95

Signed-off-by: sanising <sanising@qti.qualcomm.com>

Disable On Device Sampling for _regular_model_execution()

75eac30

Signed-off-by: sanising <sanising@qti.qualcomm.com>

Use same sampling parameters for each sequence in a batch

eb6e2eb

Signed-off-by: sanising <sanising@qti.qualcomm.com>

quic-amitraj requested changes Jul 2, 2025


sanising added 2 commits July 2, 2025 18:43

Enable On Device Sampling for _regular_model_execution()

48b35e3

Signed-off-by: sanising <sanising@qti.qualcomm.com>

Add test for greedy sampling

c83a631

Signed-off-by: sanising <sanising@qti.qualcomm.com>

quic-hemagnih reviewed Jul 3, 2025


sanising added 3 commits July 3, 2025 13:44

Add test for random sampling

f698a24

Signed-off-by: sanising <sanising@qti.qualcomm.com>

Remove else block

7b34a07

Signed-off-by: sanising <sanising@qti.qualcomm.com>

Merge branch 'main' into ods-unit-tests

5fa7269

Signed-off-by: sanising <sanising@qti.qualcomm.com>

quic-sanising marked this pull request as ready for review July 3, 2025 19:08

quic-sanising requested a review from quic-rishinr as a code owner July 3, 2025 19:08

quic-sanising requested review from quic-hemagnih, ochougul and quic-amitraj July 3, 2025 19:08

Reformat code

0ee201a

Signed-off-by: sanising <sanising@qti.qualcomm.com>
