# Multimodal AI Inference
This repository is a growing collection of scripts for running inference with cutting-edge multimodal AI models, in particular large vision models (LVMs), over secure SSH connections. It covers both vision and language tasks.
## Features

- **Qwen Inference**: Leverages Qwen, a robust language model, to process multimodal input through `qwen_inference.py` (illustrative sketch under Files below).
- **Llava Next Integration**: Adds advanced visual understanding with Llava Next via `llava_next_inference.py` (also sketched under Files below).
- **Continuous Expansion**: As more large vision models are explored and integrated, the repository will grow with additional inference scripts.
- **SSH-based Inference**: All inference runs remotely over SSH, providing scalable and secure access to compute resources; a minimal connection sketch follows this list.
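
To make the SSH workflow concrete, here is a minimal sketch using `paramiko`. The host name, username, and command-line flags are illustrative assumptions, not this repository's actual interface, and key-based authentication is assumed.

```python
# Minimal sketch of SSH-based remote inference (assumptions noted above).
import paramiko

def run_remote_inference(host: str, user: str, command: str) -> str:
    """Run an inference command on a remote GPU host and return its stdout."""
    client = paramiko.SSHClient()
    # For a sketch we auto-accept unknown hosts; prefer a known_hosts file in practice.
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    client.connect(hostname=host, username=user)  # key-based auth assumed
    try:
        _, stdout, stderr = client.exec_command(command)
        err = stderr.read().decode()
        if err:
            raise RuntimeError(f"Remote inference failed: {err}")
        return stdout.read().decode()
    finally:
        client.close()

# Hypothetical invocation; the script's real CLI flags may differ.
print(run_remote_inference(
    host="gpu-server.example.com",
    user="researcher",
    command="python qwen_inference.py --image /data/cat.jpg --prompt 'Describe this image.'",
))
```

Running the script on the remote host keeps model weights and GPUs server-side; only the prompt and the generated text cross the wire.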
## Files

- `qwen_inference.py`: Script for running inference tasks using the Qwen model (sketch below).
- `llava_next_inference.py`: Inference script for Llava Next, aimed at advanced visual understanding (sketch below).
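
For orientation, here is a hedged sketch of what `qwen_inference.py` might contain, following the published Qwen-VL-Chat usage from its Hugging Face model card. The checkpoint name, image path, and prompt are illustrative assumptions, not this repository's actual code.

```python
# Hypothetical qwen_inference.py core, per the Qwen-VL-Chat model card.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-VL-Chat", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-VL-Chat",
    device_map="auto",       # spread layers across available GPUs
    trust_remote_code=True,  # Qwen-VL ships custom modeling code
).eval()

# Qwen-VL interleaves image references and text in a single query.
query = tokenizer.from_list_format([
    {"image": "/data/cat.jpg"},        # illustrative path
    {"text": "Describe this image."},
])
response, _history = model.chat(tokenizer, query=query, history=None)
print(response)
```

Likewise, a sketch of what `llava_next_inference.py` could look like, based on the `transformers` LLaVA-NeXT API; the checkpoint, prompt template, and image path are again assumptions.

```python
# Hypothetical llava_next_inference.py core, per the transformers LLaVA-NeXT docs.
import torch
from PIL import Image
from transformers import LlavaNextProcessor, LlavaNextForConditionalGeneration

model_id = "llava-hf/llava-v1.6-mistral-7b-hf"
processor = LlavaNextProcessor.from_pretrained(model_id)
model = LlavaNextForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

image = Image.open("/data/cat.jpg")  # illustrative path
# Mistral-style chat template with an <image> placeholder, per the model card.
prompt = "[INST] <image>\nDescribe this image. [/INST]"

inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(processor.decode(output[0], skip_special_tokens=True))
```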
This repository is designed for AI researchers and developers working with large vision models. It supports remote deployment and inference of state-of-the-art vision and multimodal models through secure SSH access.
## Future Plans

- Integration of additional large vision models for comprehensive multimodal tasks.
- Support for larger datasets and batch processing capabilities.
- Performance benchmarking and optimization for inference tasks.