
feat(ocr): add LiteLLM provider for vision LLM-based OCR #21

Merged
willgriffin merged 1 commit into main from feat/litellm-provider
Dec 19, 2025

Conversation

@willgriffin
Contributor

Summary

Add a new OCR provider that uses vision-capable LLMs (like DeepSeek, GPT-4o) via LiteLLM-compatible API endpoints. This enables OCR through any OpenAI-compatible vision model API.

Features:

  • Support for both simple (text-only) and structured (JSON with confidence) output modes
  • Environment variable configuration (HAVE_OCR_LITELLM_*)
  • Constructor options with precedence over environment variables
  • Automatic image format detection (PNG, JPEG, GIF, WebP, BMP)
  • Base64 encoding for API transmission
  • Comprehensive language support via vision LLM capabilities (60+ languages)
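The image format detection mentioned above can be done by inspecting magic bytes. A minimal sketch of the idea, assuming a hypothetical `detectImageFormat` helper (this is illustrative, not the actual implementation in `src/node/litellm.ts`):

```typescript
// Magic-byte detection for the formats the provider supports.
// Returns a MIME type string, or null if the format is unrecognized.
function detectImageFormat(buf: Uint8Array): string | null {
  const at = (i: number, b: number) => buf.length > i && buf[i] === b;
  if (at(0, 0x89) && at(1, 0x50) && at(2, 0x4e) && at(3, 0x47)) return "image/png"; // \x89PNG
  if (at(0, 0xff) && at(1, 0xd8) && at(2, 0xff)) return "image/jpeg";               // JPEG SOI marker
  if (at(0, 0x47) && at(1, 0x49) && at(2, 0x46)) return "image/gif";                // "GIF"
  if (at(0, 0x52) && at(1, 0x49) && at(2, 0x46) && at(3, 0x46) &&                   // "RIFF"
      at(8, 0x57) && at(9, 0x45) && at(10, 0x42) && at(11, 0x50)) return "image/webp"; // "WEBP"
  if (at(0, 0x42) && at(1, 0x4d)) return "image/bmp";                               // "BM"
  return null;
}
```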

Configuration:

  • HAVE_OCR_LITELLM_BASE_URL: API endpoint (default: http://localhost:4000/v1)
  • HAVE_OCR_LITELLM_API_KEY: API key for authentication
  • HAVE_OCR_LITELLM_MODEL: Model name (default: deepseek-chat)
  • HAVE_OCR_LITELLM_OUTPUT_MODE: Output mode (simple | structured)
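A sketch of the environment configuration, using the documented variable names (the API key value is a placeholder):

```shell
# LiteLLM OCR provider configuration; values mirror the documented defaults.
export HAVE_OCR_LITELLM_BASE_URL="http://localhost:4000/v1"
export HAVE_OCR_LITELLM_API_KEY="sk-example"   # placeholder, not a real key
export HAVE_OCR_LITELLM_MODEL="deepseek-chat"
export HAVE_OCR_LITELLM_OUTPUT_MODE="structured"
```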

Provider priority: onnx > tesseract > litellm (API-based providers last)
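Since the provider targets OpenAI-compatible vision APIs, the request it sends plausibly looks like a standard chat completion with a base64 data URL. A hedged sketch (function name and prompt text are illustrative; the model default matches HAVE_OCR_LITELLM_MODEL's documented default):

```typescript
// Build an OpenAI-compatible vision chat request carrying the image as a
// base64 data URL, which is how the image reaches the API over JSON.
function buildOcrRequest(imageBytes: Uint8Array, mime = "image/png", model = "deepseek-chat") {
  const b64 = Buffer.from(imageBytes).toString("base64");
  return {
    model,
    messages: [
      {
        role: "user" as const,
        content: [
          { type: "text", text: "Extract all text from this image." },
          { type: "image_url", image_url: { url: `data:${mime};base64,${b64}` } },
        ],
      },
    ],
  };
}
```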

Changes

  • New files:
    • src/node/litellm.ts - LiteLLM provider implementation (~500 lines)
    • src/node/litellm.spec.ts - Comprehensive unit tests (25 tests)
  • Modified files:
    • src/shared/factory.ts - Added provider loading and priority
    • src/shared/types.ts - Added LiteLLMProviderOptions interface
    • src/index.ts - Added exports for new provider
    • package.json - Added @happyvertical/ai dependency

Notes

  • Uses local ContentPart type definitions since @happyvertical/ai doesn't export vision types yet
  • The AI package will need a separate PR to add proper vision support
  • Uses type assertion (contentParts as any) for compatibility until AI package is updated
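The PR does not show the local ContentPart definition; a plausible shape, following the OpenAI-compatible message format the provider targets (all names here are assumptions, not the actual code):

```typescript
// Hypothetical local ContentPart union, stand-in for the vision types
// that @happyvertical/ai does not yet export.
type TextPart = { type: "text"; text: string };
type ImagePart = {
  type: "image_url";
  image_url: { url: string; detail?: "low" | "high" | "auto" };
};
type ContentPart = TextPart | ImagePart;

// Example value: a text prompt plus a base64-encoded image.
const parts: ContentPart[] = [
  { type: "text", text: "Extract all text from this image." },
  { type: "image_url", image_url: { url: "data:image/png;base64,AQID" } },
];
```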

Test plan

  • All existing tests pass (27 tests)
  • New unit tests pass (25 tests)
  • Total: 52 tests passing
  • Manual testing with actual LiteLLM/DeepSeek endpoint

@willgriffin merged commit c324add into main on Dec 19, 2025
2 checks passed
@willgriffin deleted the feat/litellm-provider branch on December 19, 2025 at 02:03