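A multimodal large-model system built on the Qwen Agent framework, integrating a JAKA robotic arm, visual detection, speech recognition and synthesis, and an MCP database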
Updated May 26, 2025 - Python
Gaze-Guided Learning: Avoiding Shortcut Bias in Visual Classification
A private, local OCR solution using Meta's Llama 3.2 Vision model with a Streamlit interface. Processes images entirely offline, supporting formats like JPEG, PNG, and BMP.
Gemini 2 Pro app for Image, Audio, and Document understanding + Code Execution.
QD-RetNet: Efficient Retinal Disease Classification via Quantized Knowledge Distillation [MIUA-2025]