
feat(ocr): add LiteLLM provider for vision LLM-based OCR #21

Merged
willgriffin merged 1 commit into main from feat/litellm-provider
Dec 19, 2025

Conversation

@willgriffin
Contributor

Summary

Add a new OCR provider that uses vision-capable LLMs (like DeepSeek, GPT-4o) via LiteLLM-compatible API endpoints. This enables OCR through any OpenAI-compatible vision model API.

Features:

  • Support for both simple (text-only) and structured (JSON with confidence) output modes
  • Environment variable configuration (HAVE_OCR_LITELLM_*)
  • Constructor options with precedence over environment variables
  • Automatic image format detection (PNG, JPEG, GIF, WebP, BMP)
  • Base64 encoding for API transmission
  • Comprehensive language support via vision LLM capabilities (60+ languages)
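The image format detection mentioned above can be done by inspecting magic bytes. A minimal sketch of the idea, assuming a hypothetical `detectImageFormat` helper (this is illustrative, not the actual implementation in `src/node/litellm.ts`):

```typescript
// Magic-byte detection for the formats the provider supports.
// Returns a MIME type string, or null if the format is unrecognized.
function detectImageFormat(buf: Uint8Array): string | null {
  const at = (i: number, b: number) => buf.length > i && buf[i] === b;
  if (at(0, 0x89) && at(1, 0x50) && at(2, 0x4e) && at(3, 0x47)) return "image/png"; // \x89PNG
  if (at(0, 0xff) && at(1, 0xd8) && at(2, 0xff)) return "image/jpeg";               // JPEG SOI marker
  if (at(0, 0x47) && at(1, 0x49) && at(2, 0x46)) return "image/gif";                // "GIF"
  if (at(0, 0x52) && at(1, 0x49) && at(2, 0x46) && at(3, 0x46) &&                   // "RIFF"
      at(8, 0x57) && at(9, 0x45) && at(10, 0x42) && at(11, 0x50)) return "image/webp"; // "WEBP"
  if (at(0, 0x42) && at(1, 0x4d)) return "image/bmp";                               // "BM"
  return null;
}
```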

Configuration:

  • HAVE_OCR_LITELLM_BASE_URL: API endpoint (default: http://localhost:4000/v1)
  • HAVE_OCR_LITELLM_API_KEY: API key for authentication
  • HAVE_OCR_LITELLM_MODEL: Model name (default: deepseek-chat)
  • HAVE_OCR_LITELLM_OUTPUT_MODE: Output mode (simple | structured)
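A sketch of the environment configuration, using the documented variable names (the API key value is a placeholder):

```shell
# LiteLLM OCR provider configuration; values mirror the documented defaults.
export HAVE_OCR_LITELLM_BASE_URL="http://localhost:4000/v1"
export HAVE_OCR_LITELLM_API_KEY="sk-example"   # placeholder, not a real key
export HAVE_OCR_LITELLM_MODEL="deepseek-chat"
export HAVE_OCR_LITELLM_OUTPUT_MODE="structured"
```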

Provider priority: onnx > tesseract > litellm (API-based providers last)
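Since the provider targets OpenAI-compatible vision APIs, the request it sends plausibly looks like a standard chat completion with a base64 data URL. A hedged sketch (function name and prompt text are illustrative; the model default matches HAVE_OCR_LITELLM_MODEL's documented default):

```typescript
// Build an OpenAI-compatible vision chat request carrying the image as a
// base64 data URL, which is how the image reaches the API over JSON.
function buildOcrRequest(imageBytes: Uint8Array, mime = "image/png", model = "deepseek-chat") {
  const b64 = Buffer.from(imageBytes).toString("base64");
  return {
    model,
    messages: [
      {
        role: "user" as const,
        content: [
          { type: "text", text: "Extract all text from this image." },
          { type: "image_url", image_url: { url: `data:${mime};base64,${b64}` } },
        ],
      },
    ],
  };
}
```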

Changes

  • New files:
    • src/node/litellm.ts - LiteLLM provider implementation (~500 lines)
    • src/node/litellm.spec.ts - Comprehensive unit tests (25 tests)
  • Modified files:
    • src/shared/factory.ts - Added provider loading and priority
    • src/shared/types.ts - Added LiteLLMProviderOptions interface
    • src/index.ts - Added exports for new provider
    • package.json - Added @happyvertical/ai dependency

Notes

  • Uses local ContentPart type definitions since @happyvertical/ai doesn't export vision types yet
  • The AI package will need a separate PR to add proper vision support
  • Uses type assertion (contentParts as any) for compatibility until AI package is updated
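The PR does not show the local ContentPart definition; a plausible shape, following the OpenAI-compatible message format the provider targets (all names here are assumptions, not the actual code):

```typescript
// Hypothetical local ContentPart union, stand-in for the vision types
// that @happyvertical/ai does not yet export.
type TextPart = { type: "text"; text: string };
type ImagePart = {
  type: "image_url";
  image_url: { url: string; detail?: "low" | "high" | "auto" };
};
type ContentPart = TextPart | ImagePart;

// Example value: a text prompt plus a base64-encoded image.
const parts: ContentPart[] = [
  { type: "text", text: "Extract all text from this image." },
  { type: "image_url", image_url: { url: "data:image/png;base64,AQID" } },
];
```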

Test plan

  • All existing tests pass (27 tests)
  • New unit tests pass (25 tests)
  • Total: 52 tests passing
  • Manual testing with actual LiteLLM/DeepSeek endpoint

@willgriffin merged commit c324add into main on Dec 19, 2025
2 checks passed
@willgriffin deleted the feat/litellm-provider branch on December 19, 2025 at 02:03