Smoldocling support #14597


Merged
merged 38 commits into master on Jul 10, 2025

Conversation

ryan-mangeno
Contributor

Added an end-of-generation token check for SmolDocling (<end_of_utterance>), added the tensor names to tensor_mappings.py after dumping tensors with llama-eval-callback, and added the regex for the pre-tokenizer defined in the Hugging Face config file. The SmolDocling text model architecture is based on llama, so there was no need to implement a separate architecture. I compared the Hugging Face tensors against the GGUF tensors and they matched.
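For context, the conversion tooling identifies a pre-tokenizer by fingerprinting how the tokenizer splits a probe string; a minimal sketch of that idea (the helper name and hashing details are hypothetical, not the actual convert_hf_to_gguf.py code):

```python
import hashlib

def pretokenizer_fingerprint(token_ids: list[int]) -> str:
    # Hash the token ids produced for a fixed probe string; two tokenizers
    # that pre-tokenize identically yield the same fingerprint, which can
    # then be mapped to a known pre-tokenizer regex.
    return hashlib.sha256(",".join(map(str, token_ids)).encode()).hexdigest()
```

A checkpoint whose fingerprint is not yet known would then need a new regex entry, which is what this PR adds for SmolDocling's Hugging Face tokenizer config.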

Contributor

@gabe-l-hart left a comment

Thank you for tackling this one @ryan-mangeno! A couple of whitespace nits, and one request since you're coming out of IBM with me: our OSS policy requires that we sign our commits when contributing to open source. To do this for your branch, you should be able to simply run:

git fetch origin && git rebase -i --signoff --rebase-merges origin/master

@gabe-l-hart
Contributor

cc @PeterStaar-IBM

@github-actions github-actions bot added the python python script changes label Jul 9, 2025
ryan-mangeno and others added 15 commits July 9, 2025 12:10
Co-authored-by: Gabe Goodhart <gabe.l.hart@gmail.com>
Co-authored-by: Gabe Goodhart <gabe.l.hart@gmail.com>
Signed-off-by: ryan-mangeno <ryanmangeno@gmail.com>
* v1

* push more fixes

* another fix

* fix

* more fixes

* minor fix

* more cleaning on python code

* python fixes

* changed precision for multipliers float 32->64

* fixes

* another fix

* fix

* pre-norm -> norm

* fix

* Revert "fix"

This reverts commit 243e4d1.

* fix

* small fix ffn_norm

* try

* mix instead of max

* fix vocab size

* conflict solve

* fixed multipliers

* falcon-h1 specific vocab resolved

* read arch from gguf.MODEL_ARCH

* mamba_d_ssm added to d_inner find_hparam

* remove unused functions from gguf_writer.py

* override modify_tensors instead of get_tensors

* fix conversion and d_inner

* added some cb functions for debugging purposes

* inp_out_ids moved outside of layers loop

* mup_vec create as float64

* fix rope_theta

* injected mup

* clean ups

* rm extra space

* rm unused MAMBA_CHUNK_SIZE

* rm unused key

* add bos False

* changed ROPE_TYPE

* cleaning debugging stuff

* cleaning debug quant

* fix comment

* some cleanups

* some cleanups

* Update src/llama-model-loader.cpp

* more cleanups

* moe cleanups

* d_ssm -> d_inner;

* cleaning unused hparams

* cleanup

* more cleanups

* more cleanups on python conversion;

* minor cleanups

* Apply suggestions from code review

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* remove todo

* added falcon-h1

* tensor not required

* clean

* remove unneeded attributes

* more cleanups and fixed conversion

* remove final_norm

* flake8 fixes

* Update src/llama-model.cpp

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* flake8 fixes

* Update src/llama-hparams.cpp

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* Update src/llama-model.cpp

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* Update src/llama-model.cpp

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* Update src/llama-arch.cpp

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* Update convert_hf_to_gguf.py

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* added hashes

* Update src/llama-arch.cpp

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* Update src/llama-vocab.cpp

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* update the update file

* Revert "update the update file"

This reverts commit 082ab4a.

* fix: address suggestions

* fix: update convert_hf_to_gguf.py

* Update gguf-py/gguf/constants.py

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* Update src/llama-model-loader.cpp

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* d_inner fixed

* Update src/llama-model.cpp

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* reshaping ssm_norm for 34B

* removing generate_mup

* remove duplicates metadata keys

* rm comment

* final comment

* fix unused args

* fix constants

* fix bad merge

* Update src/llama-model.cpp

Co-authored-by: compilade <git@compilade.net>

* falcon-h1: remove unused ssm_in_b and bad merge

* Update src/llama-model.cpp

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* falcon-h1: fix last comment

* Update convert_hf_to_gguf.py

Co-authored-by: compilade <git@compilade.net>

* falcon-h1: revert add_add_bos(False)

* falcon-h1: fix tied weights

* falcon-h1: remove whitespace

* falcon-h1: fix wrong size param

* falcon-h1: fix whitespace issues

---------

Co-authored-by: younesbelkada <younes.belkada@tii.ae>
Co-authored-by: Younes B <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
Co-authored-by: compilade <git@compilade.net>
Signed-off-by: ryan-mangeno <ryanmangeno@gmail.com>
Collaborator

@ngxson left a comment

Unless I missed something, SmolDocling-256M-preview is fine-tuned from SmolVLM, so it should work out of the box. Are you 100% sure that we need to modify the conversion script?

ryan-mangeno and others added 5 commits July 9, 2025 19:14
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
Signed-off-by: ryan-mangeno <ryanmangeno@gmail.com>
@ryan-mangeno
Contributor Author

I changed the tensor mappings based on the safetensors file on Hugging Face (https://huggingface.co/ds4sd/SmolDocling-256M-preview?show_file_info=model.safetensors). All the tensors are the same for SmolVLM and SmolDocling, but I noticed there weren't tensor mappings for the SmolVLM text model. I also noticed that without || t.first == "<end_of_utterance>" // smoldocling the model would just keep generating.
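The quoted condition is C++ from llama.cpp's vocab handling; as a rough Python sketch of why the extra token matters (the loop structure is hypothetical, and the other token texts in the set are illustrative):

```python
# Tokens whose text marks end-of-generation (EOG); without
# "<end_of_utterance>" in this set, SmolDocling output never terminates.
EOG_TOKEN_TEXTS = {"</s>", "<|endoftext|>", "<end_of_utterance>"}

def generate(next_token, max_tokens=64):
    # next_token() yields the token text sampled at each decoding step
    out = []
    for _ in range(max_tokens):
        tok = next_token()
        if tok in EOG_TOKEN_TEXTS:  # stop at any EOG token
            break
        out.append(tok)
    return out
```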

ryan-mangeno and others added 3 commits July 10, 2025 09:28
ryan-mangeno and others added 7 commits July 10, 2025 10:27
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
@CISC
Collaborator

CISC commented Jul 10, 2025

Please re-check that conversion and inference is ok.

@ryan-mangeno
Contributor Author

Please re-check that conversion and inference is ok.

will do right now 👍

@CISC
Collaborator

CISC commented Jul 10, 2025

Please re-check that conversion and inference is ok.

will do right now 👍

Actually, according to this, you should not need to add most of the tensor mappings either:

elif name.startswith("model.text_model"):
name = name.replace("text_model.", "") # for SmolVLM

@ryan-mangeno
Contributor Author

Please re-check that conversion and inference is ok.

will do right now 👍

Actually, according to this, you should not need to add most of the tensor mappings either:

elif name.startswith("model.text_model"):
name = name.replace("text_model.", "") # for SmolVLM

Ah, OK, that makes sense. The tensor names were the same for the text model, so I'm fairly sure those mappings can be removed.
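The remapping CISC points to can be sketched as a small helper (a hypothetical wrapper around the quoted lines from convert_hf_to_gguf.py):

```python
def remap_smolvlm_name(name: str) -> str:
    # SmolVLM (and SmolDocling) checkpoints prefix language-model tensors
    # with "model.text_model."; stripping "text_model." lets the existing
    # llama tensor mappings apply without any new entries.
    if name.startswith("model.text_model"):
        name = name.replace("text_model.", "")
    return name
```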

ryan-mangeno and others added 3 commits July 10, 2025 10:51
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
@ryan-mangeno
Contributor Author

I reran conversion and inference and the output looks the same.

@CISC
Collaborator

CISC commented Jul 10, 2025

Thank you, hope you enjoyed the ride, will merge when CI goes green. :)

@ryan-mangeno
Contributor Author

Thank you too! I had a lot of fun making my first contribution to llama.cpp and hope to make more :)

@CISC CISC merged commit 4bb625b into ggml-org:master Jul 10, 2025
89 of 90 checks passed
gabe-l-hart added a commit to gabe-l-hart/llama.cpp that referenced this pull request Jul 10, 2025
* origin/master:
Smoldocling support (ggml-org#14597)
Docs: script to auto-generate ggml operations docs (ggml-org#14598)
Labels
python python script changes

8 participants