Welcome to the official documentation for Winbay's AI Inference API. We are a globally distributed team providing high-performance, secure, and cost-effective inference for premier open-source models.
Website: https://winbay.io
Contact: info@winbay.io
Winbay's team spans the United States, Europe, and Singapore. We focus on delivering an unparalleled user experience built on our core advantages: aggressive pricing, exceptional speed, unlimited concurrency, robust security, and strategic server locations.
We build our service around the features that matter most to our users.
- Aggressive Pricing: We offer highly competitive pricing and generous volume discounts (30-50%) to ensure you receive the best possible value.
- Exceptional Speed: Our stack, optimized with enterprise-grade GPUs, delivers extremely low latency and high throughput for demanding applications.
- Unlimited Concurrency: We do not impose default rate limits. Our infrastructure is built to handle high concurrency, allowing your services to scale without restriction.
- Ironclad Security: We enforce a strict zero-retention policy. No prompt or completion data is ever stored, ensuring maximum privacy and security.
- Strategic Server Locations: With servers located in the United States and Singapore, we guarantee optimal performance and low latency for users across the North American and Asia-Pacific (APAC) regions.
Our API is fully compatible with the OpenAI standard.
We provide optimized inference for a wide range of the latest high-performance models.
| Model Family | Models |
|---|---|
| Anthropic | claude-sonnet-4-20250514, claude-sonnet-4-20250514-thinking |
| Mistral AI | codestral-latest, ministral-3b-latest, ministral-8b-latest, mistral-large-latest, mistral-small, mistral-small-2501, mistral-small-2503, mistral-small-3.1-24b, mistral-small-latest, mistral-tiny-latest, open-mistral-nemo, open-mixtral-8x7b, pixtral-12b-latest, pixtral-large-latest, magistral-medium-latest, magistral-small-latest |
| DeepSeek | deepseek-r1, deepseek-r1-0528, deepseek-r1-search, deepseek-v3, deepseek-v3-0324, deepseek-v3-search |
| Google | gemini-2.0-flash, gemini-2.0-flash-exp-image-generation, gemini-2.0-flash-lite, gemini-2.0-flash-preview-image-generation, gemini-2.5-flash, gemini-2.5-flash-lite-preview-06-17, gemini-2.5-flash-preview-04-17, gemini-2.5-flash-preview-04-17-thinking, gemini-2.5-flash-preview-05-20, gemini-2.5-pro, gemini-2.5-pro-preview-05-06, gemini-2.5-pro-preview-06-05, imagen-3.0-generate-002, gemma-3-27b-it |
| OpenAI | gpt-4.1, gpt-4.1-mini, gpt-4.1-nano, gpt-4o-mini-search-preview, o3 |
| xAI | grok-3-mini |
| Meta | meta/llama-4-maverick-17b-128e-instruct |
| Qwen | qwen3-235b-a22b, qwen3-30b-a3b, qwen3-32b |
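Because the API follows the OpenAI chat-completions format, a request can be sketched with nothing but the standard library. The base URL below (`https://api.winbay.io/v1`) is an assumption for illustration; substitute the endpoint and API key from your dashboard, and any model name from the table above.

```python
import json

# Assumed base URL -- replace with the endpoint from your Winbay dashboard.
BASE_URL = "https://api.winbay.io/v1"

def build_chat_request(api_key: str, model: str, user_message: str):
    """Build an OpenAI-style chat-completions request: (url, headers, JSON body)."""
    url = f"{BASE_URL}/chat/completions"
    headers = {
        "Authorization": f"Bearer {api_key}",  # standard OpenAI-style bearer auth
        "Content-Type": "application/json",
    }
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    })
    return url, headers, body

# Example: prepare a request against one of the listed models.
url, headers, body = build_chat_request("YOUR_API_KEY", "deepseek-v3", "Hello!")
```

Because the request shape is the OpenAI standard, existing OpenAI-compatible client libraries should also work by pointing their base URL at the Winbay endpoint.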