Simple, scalable AI model deployment on GPU clusters
Analyze and generate unstructured data using LLMs, from quick experiments to billion-token jobs.
Source code of the paper "Private Collaborative Edge Inference via Over-the-Air Computation".
Official implementation of the ACM MM paper "Identity-Aware Attribute Recognition via Real-Time Distributed Inference in Mobile Edge Clouds": a distributed inference model for pedestrian attribute recognition with re-ID in an MEC-enabled camera monitoring system, with joint training of pedestrian attribute recognition and Re-ID.
Super Ollama Load Balancer - Performance-aware routing for distributed Ollama deployments with Ray, Dask, and adaptive metrics