Add deepseek ocr #41797
Conversation
Implementation works. The processor remains to be optimized, but results are similar to those from the original repository.
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
[For maintainers] Suggested jobs to run (before merge): run-slow: auto, deepseek_ocr
What's missing here is a solid mapping between the checkpoint state dict and the canonical one 😅 once it's solid, it'll be a good step toward using that model in vLLM/SGLang etc. with the
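For illustration only, the kind of key mapping being discussed typically ends up in a conversion script along these lines — every pattern below is a made-up placeholder, not the actual DeepSeek-OCR state dict layout:

```python
# Purely illustrative sketch of a checkpoint -> canonical key mapping; the patterns
# below are placeholders, not the real DeepSeek-OCR keys.
import re

ORIGINAL_TO_CONVERTED_KEY_MAPPING = {
    r"^vision_model\.": "model.vision_tower.",      # placeholder pattern
    r"^sam_model\.": "model.sam_tower.",            # placeholder pattern
    r"^language_model\.": "model.language_model.",  # placeholder pattern
}

def convert_state_dict(state_dict):
    """Rename original checkpoint keys to the canonical transformers layout."""
    converted = {}
    for key, value in state_dict.items():
        new_key = key
        for pattern, replacement in ORIGINAL_TO_CONVERTED_KEY_MAPPING.items():
            new_key = re.sub(pattern, replacement, new_key)
        converted[new_key] = value
    return converted
```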
View the CircleCI Test Summary for this PR: https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=41797&sha=6b3375
What does this PR do?
As per title. Architecturally, Llava-Next is used as the skeleton, with a modified SamModel and a modified CLIPVisionModel; the DeepseekV2 decoder is kept untouched (loaded via AutoModel) and adapted through the config only.
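As a rough sketch of that decoder reuse (the `deepseek_v2` model type and the config wiring below are assumptions, not taken from the PR diff), the text side can be selected purely through the config and instantiated via AutoModel:

```python
# Hypothetical sketch: the DeepseekV2 decoder is not modified, it is simply chosen
# via its config and instantiated through AutoModel — no new decoder class needed.
from transformers import AutoConfig, AutoModel

# Build a default DeepseekV2 config; in the composite DeepSeek-OCR config this would
# live as the text/decoder sub-config.
text_config = AutoConfig.for_model("deepseek_v2")

# AutoModel resolves the right decoder class from the config alone.
decoder = AutoModel.from_config(text_config)
```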
The current branch is functional. You don't need to convert the weights; just run the following on your image and you'll get a nice OCR output.
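The original snippet isn't reproduced here; below is a hedged sketch of what such a call could look like, assuming the hub id `deepseek-ai/DeepSeek-OCR`, the `AutoModelForImageTextToText` entry point, and a simple `<image>` + instruction prompt (none of these names are confirmed by the PR text):

```python
# Minimal usage sketch for OCR with this branch. The checkpoint id, auto class, and
# prompt format are assumptions and may differ from what the PR finally registers.
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForImageTextToText

model_id = "deepseek-ai/DeepSeek-OCR"  # assumed checkpoint id
device = "cuda" if torch.cuda.is_available() else "cpu"

processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(model_id).to(device)

image = Image.open("page.png")    # the image you want to OCR
prompt = "<image>\nFree OCR."     # assumed prompt template

inputs = processor(images=image, text=prompt, return_tensors="pt").to(device)
output_ids = model.generate(**inputs, max_new_tokens=512)
print(processor.batch_decode(output_ids, skip_special_tokens=True)[0])
```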