Support multimodal tool #241

Merged
AlbumenJ merged 9 commits into agentscope-ai:main from guanxuc:tool-multimodal
Dec 19, 2025

Conversation

@guanxuc
Contributor

@guanxuc guanxuc commented Dec 18, 2025

AgentScope-Java Version

1.0.4-SNAPSHOT

Description

Checklist

Please check the following items before code is ready to be reviewed.

  • Code has been formatted with mvn spotless:apply
  • All tests are passing (mvn test)
  • Javadoc comments are complete and follow project conventions
  • Related documentation has been updated (e.g. links, examples, etc.)
  • Code is ready for review

@guanxuc guanxuc requested a review from a team December 18, 2025 16:29
@codecov

codecov bot commented Dec 18, 2025

Contributor

Copilot AI left a comment

Pull request overview

This PR adds comprehensive multimodal tool support for both OpenAI and DashScope platforms, enabling text-to-image, image-to-text, text-to-audio, and audio-to-text conversions.

Key Changes:

  • Implements OpenAI multimodal tools (DALL-E, GPT-4 Vision, Whisper, TTS)
  • Implements DashScope multimodal tools (Wanx, Qwen-VL, Paraformer, Sambert)
  • Extends MediaUtils with new utility methods for handling file/URL conversions and image processing
  • Adds comprehensive unit and E2E tests for both implementations
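The file/URL conversions mentioned above can be illustrated with a minimal sketch. It is not the PR's MediaUtils code: the class name DataUrlSketch and the methods toDataUrl and guessMimeType are hypothetical, and the sketch only assumes that a local file sometimes has to be embedded as a base64 data URL before it is sent to a model API.

```java
import java.util.Base64;
import java.util.Locale;

// Hypothetical helper, not the MediaUtils API from this PR.
final class DataUrlSketch {
    private DataUrlSketch() {}

    /** Encodes raw bytes as a data URL of the form data:<mime>;base64,<payload>. */
    static String toDataUrl(byte[] content, String mimeType) {
        String encoded = Base64.getEncoder().encodeToString(content);
        return "data:" + mimeType + ";base64," + encoded;
    }

    /** Guesses a MIME type from the file extension; falls back to a generic binary type. */
    static String guessMimeType(String fileName) {
        String lower = fileName.toLowerCase(Locale.ROOT);
        if (lower.endsWith(".png")) {
            return "image/png";
        }
        if (lower.endsWith(".jpg") || lower.endsWith(".jpeg")) {
            return "image/jpeg";
        }
        if (lower.endsWith(".wav")) {
            return "audio/wav";
        }
        if (lower.endsWith(".mp3")) {
            return "audio/mpeg";
        }
        return "application/octet-stream";
    }

    public static void main(String[] args) {
        byte[] bytes = {0x68, 0x69}; // the ASCII bytes of "hi"
        System.out.println(toDataUrl(bytes, guessMimeType("note.bin")));
    }
}
```

In practice the bytes would come from java.nio.file.Files.readAllBytes on the local path; the extension table here is deliberately small and would need to match whatever formats the target API accepts.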

Reviewed changes

Copilot reviewed 10 out of 12 changed files in this pull request and generated 10 comments.

| File | Description |
| --- | --- |
| pom.xml | Adds mockito-junit-jupiter dependency for enhanced testing capabilities |
| agentscope-dependencies-bom/pom.xml | Defines mockito-junit-jupiter version in dependency management |
| OpenAIMultiModalTool.java | Implements OpenAI multimodal conversions with comprehensive error handling |
| DashScopeMultiModalTool.java | Implements DashScope multimodal conversions with streaming audio support |
| MediaUtils.java | Adds URL/file handling utilities, RGBA image conversion, and protocol URL methods |
| OpenAIMultiModalToolTest.java | Provides comprehensive unit tests with mocking for OpenAI tools |
| DashScopeMultiModalToolTest.java | Provides comprehensive unit tests with mocking for DashScope tools |
| OpenAIMultiModalToolE2ETest.java | Provides E2E tests for OpenAI API integration |
| DashScopeMultiModalToolE2ETest.java | Provides E2E tests for DashScope API integration |
| MediaUtilsTest.java | Adds tests for new MediaUtils methods |
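
The RGBA image conversion listed for MediaUtils.java could look roughly like the sketch below. This is an assumption about the intent (re-encoding an image into a pixel format with an alpha channel), not the code from this PR; the class RgbaSketch and the method toRgba are hypothetical names, and only standard java.awt classes are used.

```java
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

// Hypothetical helper, not the MediaUtils API from this PR.
final class RgbaSketch {
    private RgbaSketch() {}

    /** Returns the image in a format with an alpha channel, copying pixels if needed. */
    static BufferedImage toRgba(BufferedImage source) {
        if (source.getType() == BufferedImage.TYPE_INT_ARGB) {
            return source; // already has the target pixel layout
        }
        BufferedImage rgba = new BufferedImage(
                source.getWidth(), source.getHeight(), BufferedImage.TYPE_INT_ARGB);
        Graphics2D graphics = rgba.createGraphics();
        try {
            // Drawing an opaque source onto the transparent canvas keeps its colors
            // and yields fully opaque pixels.
            graphics.drawImage(source, 0, 0, null);
        } finally {
            graphics.dispose();
        }
        return rgba;
    }

    public static void main(String[] args) {
        BufferedImage rgb = new BufferedImage(2, 2, BufferedImage.TYPE_INT_RGB);
        rgb.setRGB(0, 0, 0xFF0000); // one red pixel
        BufferedImage converted = toRgba(rgb);
        System.out.println(converted.getColorModel().hasAlpha());
    }
}
```

The converted image can then be written out with javax.imageio.ImageIO using the "png" format, which preserves the alpha channel.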

Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 12 out of 14 changed files in this pull request and generated 8 comments.

@AlbumenJ AlbumenJ merged commit 4f2eecc into agentscope-ai:main Dec 19, 2025
4 checks passed
@guanxuc guanxuc deleted the tool-multimodal branch December 20, 2025 03:07
JGoP-L pushed a commit to JGoP-L/agentscope-java that referenced this pull request Dec 29, 2025

Development

Successfully merging this pull request may close these issues.

[Feature] Text to Multi-Modal Tool Support

2 participants