
qwen-vl-utils

Here are 2 public repositories matching this topic...

A Gradio-based demo of Microsoft's Fara-7B model, a computer-use agent. Users upload UI screenshots (e.g., desktop or app interfaces) and provide task instructions (e.g., "Click on the search bar"); the demo returns the parsed actions and overlays visual indicators for them on the image.

  • Updated Dec 8, 2025
  • Python

A Gradio-based demo for end-to-end vision-to-speech inference: it extracts text or descriptions from images using Qwen2.5-VL-7B-Instruct, then converts them to natural-sounding speech audio via Microsoft VibeVoice-Realtime-0.5B.

  • Updated Dec 8, 2025
  • Python
