[ggma] Add documentation for TinyLlama example #16283
base: master
Conversation
I will add instructions on how to prepare the ggma package, build ggma, and run it.
- Created `runtime/ggma/examples/generate_text/tinyllama.md` with a step-by-step guide.
- Includes prerequisites, model generation commands, the full processing pipeline, and a summary.

ONE-DCO-1.0-Signed-off-by: Sanggyu Lee <sg5.lee@samsung.com>
```python
import tico  # TICO: Torch module -> circle converter
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()
circle_model = tico.convert(model, captured_input)
```
For other reviewers: you may encounter an export error related to vmap_impl, caused because sdpa_mask_recent_torch is no longer torch-exportable in transformers 4.54.0 through 4.57.1 (possibly lower versions too; I checked only 4.54.0 and 4.57.1). It can be resolved by using transformers==4.50.3, as the author pinned in requirements.txt.
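A version guard along these lines could catch the mismatch early; this is a minimal sketch, and the hard pin to 4.50.3 is an assumption taken from the comment above rather than a tested compatibility range:

```python
# Hypothetical guard: fail fast when the installed transformers version is one
# the review comment above identifies as not torch-exportable.
import transformers

if transformers.__version__ != "4.50.3":
    raise RuntimeError(
        f"transformers {transformers.__version__} may fail during torch.export; "
        "pin transformers==4.50.3 as in requirements.txt"
    )
```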
```python
PR_WORKTREE = "_pr_16233"
PR_BRANCH = "pr-16233"
PR_REF = "refs/pull/16233/head"
```
This workaround will be removed once #16233 is merged.
```yaml
decode: |
  fuse.attention.py < decode_.circle
  | reshape.io.py input --by_shape [1,16,30,4] [1,16,32,4]
```
Later, the kv_cache shape will be determined automatically from config.json.
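A minimal sketch of what that automation could look like, assuming a HuggingFace-style config.json; the helper name and the [batch, kv_heads, seq, head_dim] layout are illustrative assumptions, not the pipeline's actual convention:

```python
import json

def kv_cache_shape(config_path: str, batch: int = 1, cache_len: int = 32):
    """Hypothetical helper: derive a per-layer KV-cache shape from config.json."""
    with open(config_path) as f:
        cfg = json.load(f)
    # GQA models such as TinyLlama keep fewer KV heads than attention heads.
    kv_heads = cfg.get("num_key_value_heads", cfg["num_attention_heads"])
    head_dim = cfg.get("head_dim", cfg["hidden_size"] // cfg["num_attention_heads"])
    # The layout below is an assumption for illustration only.
    return [batch, kv_heads, cache_len, head_dim]
```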
```yaml
merge: |
  merge.circles.py prefill.circle decode.circle
  | fuse.bmm_lhs_const.py
```
onert does not allow a constant LHS for BatchMatMul.
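For context, the usual workaround for this constraint is the transpose identity A·B = (Bᵀ·Aᵀ)ᵀ, which moves the constant operand to the RHS. Below is a NumPy sketch of the equivalence; whether fuse.bmm_lhs_const.py applies exactly this rewrite is an assumption:

```python
import numpy as np

A = np.random.rand(2, 3, 4)  # constant LHS that onert would reject
B = np.random.rand(2, 4, 5)  # runtime RHS

# A @ B == swap(swap(B) @ swap(A)), where swap transposes the last two axes.
direct = A @ B
rewritten = np.swapaxes(np.swapaxes(B, -1, -2) @ np.swapaxes(A, -1, -2), -1, -2)
assert np.allclose(direct, rewritten)  # the constant now sits on the RHS
```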
```yaml
merge: |
  merge.circles.py prefill.circle decode.circle
  | fuse.bmm_lhs_const.py
  | downcast.input_ids.py
```
I will use int32 instead of int64 (the default type TICO generates) for input_ids, which feeds a Gather op.
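The downcast is lossless in practice because token ids are bounded by the vocabulary size (~32k for TinyLlama), far below the int32 limit. A NumPy sketch of the intent; downcast.input_ids.py's actual circle-level rewrite is not shown in this PR excerpt:

```python
import numpy as np

def downcast_input_ids(ids: np.ndarray) -> np.ndarray:
    """Illustrative int64 -> int32 downcast for token ids feeding a Gather."""
    assert ids.dtype == np.int64
    # Token ids index the embedding table, so they never exceed the vocab size.
    assert ids.size == 0 or ids.max() <= np.iinfo(np.int32).max
    return ids.astype(np.int32)
```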
```yaml
merge: |
  merge.circles.py prefill.circle decode.circle
  | fuse.bmm_lhs_const.py
  | downcast.input_ids.py
  | gc.py > model.circle
```
gc.py removes unreachable entities: inputs/outputs, tensors, buffers, and so on.
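Conceptually this is a mark-and-sweep over the graph: keep everything reachable from the outputs, drop the rest. A self-contained sketch; the tuple-based graph encoding is hypothetical, and circle's real schema is richer:

```python
def reachable_tensors(ops, graph_outputs):
    """ops: list of (input_names, output_names) pairs; returns live tensors."""
    live, worklist = set(graph_outputs), list(graph_outputs)
    while worklist:
        tensor = worklist.pop()
        for op_inputs, op_outputs in ops:
            if tensor in op_outputs:  # this op produces a live tensor
                for src in op_inputs:
                    if src not in live:
                        live.add(src)
                        worklist.append(src)
    return live

ops = [(["x"], ["y"]), (["y"], ["z"]), (["dead_in"], ["dead_out"])]
print(reachable_tensors(ops, ["z"]))  # {'x', 'y', 'z'}; dead_* would be swept
```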
```yaml
  | transpose.io.kvcache.py > decode.circle

merge: |
  merge.circles.py prefill.circle decode.circle
```
It merges the two circles into one. In this phase, weight sharing is handled by pointing weights with identical content at the same buffer index.
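A sketch of that deduplication, assuming buffers are raw byte blobs; the function name and the remap structure are illustrative, not the script's actual API:

```python
import hashlib

def merge_buffers(prefill_buffers, decode_buffers):
    """Give identical weight contents the same buffer index in the merged model."""
    merged, index_by_hash, remaps = [], {}, []
    for buffers in (prefill_buffers, decode_buffers):
        remap = {}
        for old_index, data in enumerate(buffers):
            key = hashlib.sha256(data).hexdigest()
            if key not in index_by_hash:  # first time we see this content
                index_by_hash[key] = len(merged)
                merged.append(data)
            remap[old_index] = index_by_hash[key]
        remaps.append(remap)
    return merged, remaps

merged, remaps = merge_buffers([b"w0", b"w1"], [b"w1", b"w2"])
print(len(merged), remaps)  # 3 [{0: 0, 1: 1}, {0: 1, 1: 2}] -- w1 stored once
```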