Conversation

@glistening
Contributor

  • Created runtime/ggma/examples/generate_text/tinyllama.md with step‑by‑step guide.
  • Includes prerequisites, model generation commands, full processing pipeline, and a summary.

ONE-DCO-1.0-Signed-off-by: Sanggyu Lee <sg5.lee@samsung.com>

@glistening
Contributor Author

I will append how to prepare the ggma package, build ggma, and run it.

- Created `runtime/ggma/examples/generate_text/tinyllama.md` with step‑by‑step guide.
- Includes prerequisites, model generation commands, full processing pipeline, and a summary.

ONE-DCO-1.0-Signed-off-by: Sanggyu Lee <sg5.lee@samsung.com>
@glistening glistening force-pushed the ggma_example branch 3 times, most recently from 4234213 to a1219ae Compare November 21, 2025 09:48

model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()
circle_model = tico.convert(model, captured_input)
Contributor

FOR OTHER REVIEWERS,

You may encounter an export error related to `vmap_impl`. It occurs because `sdpa_mask_recent_torch` is no longer torch-exportable in transformers 4.54.0 through 4.57.1 (possibly in other versions too; I checked only 4.54.0 and 4.57.1).

It can be resolved by using `transformers==4.50.3`, as the author specified in requirements.txt.
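A minimal sketch of how an export script could fail fast on the broken range, assuming the whole 4.54.0–4.57.1 span is affected (only 4.54.0 and 4.57.1 were confirmed above; the function name and pure-tuple version parsing are illustrative, not part of the PR):

```python
def _v(s: str) -> tuple:
    # Parse "4.54.0" into (4, 54, 0) for simple tuple comparison.
    return tuple(int(p) for p in s.split("."))

def check_transformers_version(installed: str) -> None:
    # Reject versions in the range reported as not torch-exportable.
    if _v("4.54.0") <= _v(installed) <= _v("4.57.1"):
        raise RuntimeError(
            f"transformers {installed} is not torch-exportable here; "
            "pin transformers==4.50.3 as in requirements.txt"
        )

check_transformers_version("4.50.3")  # OK: outside the broken range
```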

@glistening glistening force-pushed the ggma_example branch 2 times, most recently from b24f78c to 71f6721 Compare November 23, 2025 09:57
@glistening glistening force-pushed the ggma_example branch 3 times, most recently from edf7864 to cb3b36a Compare November 24, 2025 01:48
Comment on lines 5 to 7
PR_WORKTREE = "_pr_16233"
PR_BRANCH = "pr-16233"
PR_REF = "refs/pull/16233/head"
Contributor Author

It will be removed once 16233 is merged.

@@ -0,0 +1,10 @@
decode: |
fuse.attention.py < decode_.circle
| reshape.io.py input --by_shape [1,16,30,4] [1,16,32,4]
Contributor Author

Later, kv_cache's shape will be determined automatically based on config.json.
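A sketch of how that automatic shape derivation might look, reading the usual Hugging Face config fields. The `[batch, kv_heads, seq_len, head_dim]` layout is the common HF convention; whether ggma orders the axes the same way (the hard-coded shapes above use a different order) is an assumption, and the function name is hypothetical:

```python
def kv_cache_shape(config: dict, batch: int, seq_len: int) -> list:
    # Fall back to MHA if the model has no grouped-query attention field.
    kv_heads = config.get("num_key_value_heads", config["num_attention_heads"])
    # head_dim is sometimes explicit in config.json; otherwise derive it.
    head_dim = config.get(
        "head_dim", config["hidden_size"] // config["num_attention_heads"]
    )
    return [batch, kv_heads, seq_len, head_dim]

# TinyLlama-like config values, shown for illustration only.
tinyllama_like = {
    "num_attention_heads": 32,
    "num_key_value_heads": 4,
    "hidden_size": 2048,
}
print(kv_cache_shape(tinyllama_like, batch=1, seq_len=30))  # [1, 4, 30, 64]
```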


merge: |
merge.circles.py prefill.circle decode.circle
| fuse.bmm_lhs_const.py
Contributor Author

onert does not allow a constant LHS for BatchMatMul.
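The identity a pass like `fuse.bmm_lhs_const.py` can exploit: a matmul with a constant LHS, `C @ X`, equals `transpose(transpose(X) @ transpose(C))`, which moves the constant to the RHS where onert accepts it. The real pass rewrites circle graph nodes; this is only a plain-matrix demonstration of the equivalence:

```python
def matmul(a, b):
    # Naive matrix multiply over nested lists.
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(m):
    return [list(row) for row in zip(*m)]

C = [[1, 2], [3, 4]]  # constant LHS weight
X = [[5, 6], [7, 8]]  # runtime activation

lhs_const = matmul(C, X)                                   # original form
rhs_const = transpose(matmul(transpose(X), transpose(C)))  # rewritten form
print(lhs_const == rhs_const)  # True
```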

merge: |
merge.circles.py prefill.circle decode.circle
| fuse.bmm_lhs_const.py
| downcast.input_ids.py
Contributor Author

I will use int32 instead of int64 (the default type in TICO-generated models) for input_ids, which is consumed by gather.
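Conceptually, the downcast is safe because token ids are bounded by the vocabulary size, which is far below 2**31. A sketch of that check, using the stdlib `array` module's signed-int type code to stand in for an int32 tensor (function name is illustrative, not the actual script's API):

```python
from array import array

INT32_MAX = 2**31 - 1

def downcast_input_ids(ids):
    # Token ids used as gather indices must fit in int32; vocab sizes
    # (e.g. 32000 for Llama-family tokenizers) are far below the limit.
    assert all(0 <= t <= INT32_MAX for t in ids), "token id out of int32 range"
    return array("i", ids)  # "i": signed int, 32-bit on common platforms

print(list(downcast_input_ids([1, 529, 32000])))  # [1, 529, 32000]
```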

merge.circles.py prefill.circle decode.circle
| fuse.bmm_lhs_const.py
| downcast.input_ids.py
| gc.py > model.circle
Contributor Author

It removes unreachable entries ({input/output, tensor, buffer, ...}).
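The idea behind such a GC pass, sketched as a backward reachability walk from the graph outputs: anything no operator chain connects to an output is dropped. The `producers` encoding here is a hypothetical simplification of the circle schema, not the script's real data model:

```python
def collect_reachable(outputs, producers):
    # producers: tensor name -> input tensors of the op that produces it.
    reachable, stack = set(), list(outputs)
    while stack:
        t = stack.pop()
        if t in reachable:
            continue
        reachable.add(t)
        stack.extend(producers.get(t, []))
    return reachable

producers = {
    "logits": ["hidden"],
    "hidden": ["input_ids"],
    "dead": ["also_dead"],  # feeds nothing reachable from the outputs
}
live = collect_reachable(["logits"], producers)
print(sorted(live))  # ['hidden', 'input_ids', 'logits']
```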

| transpose.io.kvcache.py > decode.circle

merge: |
merge.circles.py prefill.circle decode.circle
Contributor Author

It merges the two circles into one circle.
In this phase, weight sharing is handled by pointing to the same buffer index for weights with identical content.
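A sketch of that deduplication step: buffers with byte-identical content collapse to a single index, and every old reference is remapped to the shared one. The flat list-of-bytes structure is a hypothetical stand-in for the circle flatbuffer, not the script's actual representation:

```python
def merge_buffers(buffers):
    # Map each distinct buffer content to one shared index.
    merged, index_of, remap = [], {}, []
    for data in buffers:
        if data not in index_of:          # first time this content is seen
            index_of[data] = len(merged)
            merged.append(data)
        remap.append(index_of[data])      # old index -> shared new index
    return merged, remap

prefill = [b"\x00\x01", b"\xaa\xbb"]
decode = [b"\xaa\xbb", b"\xcc"]           # shares the second prefill weight
merged, remap = merge_buffers(prefill + decode)
print(len(merged), remap)  # 3 [0, 1, 1, 2]
```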

@glistening glistening force-pushed the ggma_example branch 2 times, most recently from cd293c9 to 0b8bd39 Compare November 24, 2025 06:03
@glistening glistening force-pushed the ggma_example branch 3 times, most recently from b93d59c to c86b5cd Compare November 25, 2025 04:59
@glistening glistening force-pushed the ggma_example branch 3 times, most recently from 3c8d290 to 2816c7f Compare November 26, 2025 04:28