qwen-vl-utils

Here are 2 public repositories matching this topic...

PRITHIVSAKTHIUR / Fara-7B-GUI-Operator

A Gradio-based demonstration for the Microsoft Fara-7B model, designed as a computer use agent. Users upload UI screenshots (e.g., desktop or app interfaces), provide task instructions (e.g., "Click on the search bar"), and receive parsed actions with visualized indicators overlaid on the image.

Updated Dec 8, 2025
Python

PRITHIVSAKTHIUR / Vision-to-VibeVoice-en

A Gradio-based demo for end-to-end vision-to-speech inference: it extracts text or descriptions from images using Qwen2.5-VL-7B-Instruct, then converts them to natural speech audio via Microsoft VibeVoice-Realtime-0.5B.

Topics: opencv, text-to-speech, cuda, pillow, torch, python3, accelerate, matplotlib, gradio, opencv-python, nvidia-gpu, torchvision, huggingface-transformers, huggingface-spaces, huggingface-hub, vision-to-audio, vibevoice, vibevoice-microsoft, qwen-vl-utils

Updated Dec 8, 2025
Python
