@zhjuzi zhjuzi commented Nov 1, 2025

PR: Add image URL support for multimodal LLM

Overview

This PR adds image URL support to the Agent messaging system. Messages can now carry a direct image link instead of a base64-encoded image, which reduces request payload size and improves performance in multimodal scenarios.

Motivation

Problem:

  • Currently only base64-encoded images are supported, which causes:
    • Image data grows by ~33% after base64 encoding
    • Longer network transmission time, which slows responses
    • High memory consumption

Solution:

  • Support direct image URL transmission
  • Maintain backward compatibility: the base64 method continues to work
  • URL priority strategy: when both a URL and base64 data are provided, the URL is used

Benefits:

  • 100KB image: request payload reduced by 99.96% (134KB base64 → 50B URL)
  • 1MB image: request payload reduced by 99.996% (1.33MB base64 → 50B URL)
  • Lower network latency and memory usage on the Agent side
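
The figures above can be reproduced with a short calculation. The sketch below is illustrative only and not part of the PR: the class name, the example URL, and the helper methods are assumptions. It relies on the fact that padded base64 emits 4 characters for every 3 input bytes.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper, not part of the PR: compares the payload size of a
// base64-encoded image with the size of a URL pointing at the same image.
final class PayloadSizeDemo {

    // Size in bytes of the base64 text for a raw image of rawBytes bytes.
    // Standard padded base64 emits 4 characters per 3 input bytes.
    static long base64Size(long rawBytes) {
        return 4 * ((rawBytes + 2) / 3);
    }

    // Percentage saved by sending urlBytes instead of encodedBytes.
    static double reductionPercent(long encodedBytes, long urlBytes) {
        return 100.0 * (encodedBytes - urlBytes) / encodedBytes;
    }

    public static void main(String[] args) {
        // Assumed example URL; only its length matters here.
        String url = "https://example.com/images/cat.jpg";
        long urlBytes = url.getBytes(StandardCharsets.UTF_8).length;

        long raw = 100 * 1024; // a 100KB image
        long encoded = base64Size(raw);

        // Cross-check the formula against the JDK encoder.
        long jdk = Base64.getEncoder().encode(new byte[(int) raw]).length;

        System.out.println("raw=" + raw + " encoded=" + encoded + " jdk=" + jdk);
        System.out.printf("reduction=%.2f%%%n", reductionPercent(encoded, urlBytes));
    }
}
```

Note that the saving applies to the request sent by the Agent; the model provider still has to fetch the image from the URL.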

Core Changes

1. Message.java - Add imageUrl Field

File: genie-backend/src/main/java/com/jd/genie/agent/dto/Message.java

Changes:

  • Add imageUrl field
  • Add factory methods:
    • userMessageWithImageUrl(content, imageUrl)
    • assistantMessageWithImageUrl(content, imageUrl)
  • Add utility methods:
    • hasImage() - Checks whether the message contains an image (base64 or URL)
    • isImageUrl() - Checks whether the image is provided as a URL

Design Consideration:

  • No imageUrl support for the SYSTEM role: SYSTEM messages are typically used to constrain Agent behavior, and attaching images to them is not good practice
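
A minimal sketch of how these additions might fit together is shown below. It is a simplified stand-in, not the actual Message.java: the class name, the Role enum, the constructor, and the signature of userMessage are assumptions, and treating an empty string as "no image" is also an assumption about the edge-case behavior.

```java
// Simplified stand-in for Message.java. Only imageUrl, base64Image,
// hasImage(), isImageUrl() and the two *WithImageUrl factory names come
// from the PR description; everything else is assumed for illustration.
final class MessageSketch {
    enum Role { USER, ASSISTANT, SYSTEM }

    final Role role;
    final String content;
    final String base64Image; // legacy field, may be null
    final String imageUrl;    // new field, may be null

    private MessageSketch(Role role, String content, String base64Image, String imageUrl) {
        this.role = role;
        this.content = content;
        this.base64Image = base64Image;
        this.imageUrl = imageUrl;
    }

    // Existing-style factory (signature assumed): base64 only.
    static MessageSketch userMessage(String content, String base64Image) {
        return new MessageSketch(Role.USER, content, base64Image, null);
    }

    // New factories: URL only. No SYSTEM variant is offered.
    static MessageSketch userMessageWithImageUrl(String content, String imageUrl) {
        return new MessageSketch(Role.USER, content, null, imageUrl);
    }

    static MessageSketch assistantMessageWithImageUrl(String content, String imageUrl) {
        return new MessageSketch(Role.ASSISTANT, content, null, imageUrl);
    }

    // Assumption: null and empty strings both count as "absent".
    private static boolean isPresent(String s) {
        return s != null && !s.isEmpty();
    }

    // True if the message carries an image in either form.
    boolean hasImage() {
        return isPresent(imageUrl) || isPresent(base64Image);
    }

    // True only if the image is supplied as a URL.
    boolean isImageUrl() {
        return isPresent(imageUrl);
    }
}
```

With this shape, `MessageSketch.userMessageWithImageUrl("Describe this", "https://example.com/a.png")` reports both `hasImage()` and `isImageUrl()` as true, while a base64-only message reports `hasImage()` as true and `isImageUrl()` as false.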

2. BaseAgent.java - Update Memory Management

File: genie-backend/src/main/java/com/jd/genie/agent/agent/BaseAgent.java

Changes:

  • Add updateMemoryWithImage(role, content, base64Image, imageUrl, args) method
  • Priority strategy: URL takes precedence over base64
  • Keep old updateMemory() method for backward compatibility
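
The priority strategy can be sketched roughly as follows. This is not the BaseAgent code: the class and Entry type are hypothetical, roles are plain strings, the `args` parameter is omitted, and both the delegation from updateMemory() and the choice to drop base64 data when a URL is present are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the memory handling in BaseAgent.java.
final class MemorySketch {

    // Minimal message record for this sketch.
    static final class Entry {
        final String role;
        final String content;
        final String base64Image;
        final String imageUrl;

        Entry(String role, String content, String base64Image, String imageUrl) {
            this.role = role;
            this.content = content;
            this.base64Image = base64Image;
            this.imageUrl = imageUrl;
        }
    }

    private final List<Entry> messages = new ArrayList<>();

    // Old entry point stays available; here it simply delegates (assumed).
    void updateMemory(String role, String content, String base64Image) {
        updateMemoryWithImage(role, content, base64Image, null);
    }

    // URL takes precedence: if a non-empty URL is given, base64 is ignored.
    void updateMemoryWithImage(String role, String content,
                               String base64Image, String imageUrl) {
        boolean hasUrl = imageUrl != null && !imageUrl.isEmpty();
        if (hasUrl) {
            messages.add(new Entry(role, content, null, imageUrl));
        } else {
            messages.add(new Entry(role, content, base64Image, null));
        }
    }

    List<Entry> messages() {
        return messages;
    }
}
```

Keeping a single stored image per message means downstream formatting code never has to decide between two sources.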

3. LLM.java - Optimize Formatting Logic

File: genie-backend/src/main/java/com/jd/genie/agent/llm/LLM.java

Changes:

  • Format imageUrl into the structure required by the LLM API
  • Selection logic: use imageUrl when present, otherwise fall back to base64Image
  • Multimodal content assembly remains compatible with existing messages
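
As an illustration of the selection and formatting step, the sketch below builds the content array used by OpenAI-compatible chat APIs, where an image part has the form `{"type": "image_url", "image_url": {"url": ...}}`. The class and method names are hypothetical, the real LLM.java may assemble content differently, and the `image/jpeg` MIME type in the data URL is an assumption.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical formatter, not the actual LLM.java implementation.
final class ContentFormatSketch {

    // Builds an OpenAI-compatible "content" array for one message.
    static List<Map<String, Object>> buildContent(String text,
                                                  String base64Image,
                                                  String imageUrl) {
        List<Map<String, Object>> parts = new ArrayList<>();

        Map<String, Object> textPart = new LinkedHashMap<>();
        textPart.put("type", "text");
        textPart.put("text", text);
        parts.add(textPart);

        // Prefer the URL; fall back to base64 wrapped in a data URL.
        String url = null;
        if (imageUrl != null && !imageUrl.isEmpty()) {
            url = imageUrl;
        } else if (base64Image != null && !base64Image.isEmpty()) {
            // Assumed MIME type; a real implementation should detect it.
            url = "data:image/jpeg;base64," + base64Image;
        }

        if (url != null) {
            Map<String, Object> inner = new LinkedHashMap<>();
            inner.put("url", url);
            Map<String, Object> imagePart = new LinkedHashMap<>();
            imagePart.put("type", "image_url");
            imagePart.put("image_url", inner);
            parts.add(imagePart);
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(buildContent("What is in this image?",
                null, "https://example.com/a.png"));
    }
}
```

Because both branches end in the same `image_url` part, the rest of the request assembly does not need to know which source the image came from.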

Test Coverage

Unit Tests

File: MessageImageUrlTest.java
Coverage: 11 test cases

  • ✅ Factory method tests (USER/ASSISTANT)
  • ✅ Logic judgment tests (hasImage/isImageUrl)
  • ✅ Backward compatibility tests
  • ✅ Edge case tests (null/empty values)
[Screenshot 2025-11-02 01 29 19]

Integration Tests

File: ImageUrlIntegrationTest.java
Coverage: 2 test cases

  • ✅ Real LLM image analysis test (requires API configuration)
  • ✅ Memory message management test
[Screenshot 2025-11-02 01 30 00]

Configuration Example:

llm:
  default:
    base_url: 'https://dashscope.aliyuncs.com/compatible-mode/v1'
    apikey: 'sk-xxxxx'
    model: qwen3-vl-plus  # Vision-capable model

Backward Compatibility

Fully compatible: existing code requires no modification

  • Old updateMemory() method continues to work
  • Old Message.userMessage() and other methods continue to work
  • Messages with only base64Image work normally

Review Checklist

  1. ✅ Code quality: Follows project standards, complete comments added
  2. ✅ Test coverage: Unit tests + integration tests
  3. ✅ Backward compatibility: No impact on existing features
  4. ✅ Performance optimization: Significantly reduced network overhead
  5. ✅ Documentation: Clear comments, configuration instructions provided

@zhjuzi zhjuzi changed the title feat: Add image URL support for multimodal LLM feature: Add image URL support for multimodal LLM Nov 1, 2025
@zhjuzi zhjuzi changed the title feature: Add image URL support for multimodal LLM feature: Add image url support for multimodal LLM Nov 1, 2025