---
title: Overview
description: How to deploy your Pipecat bot online
---
You've created your Pipecat bot, had a good chat with it locally, and are eager to share it with the world. Let’s explore how to approach deployment.
We're continually adding more deployment example projects to the Pipecat repo, which you can find [here](https://github.com/pipecat-ai/pipecat/tree/main/examples/deployment). You have several options for deploying your Pipecat bot:
- Pipecat Cloud - A managed service purpose-built for Pipecat deployments
- Self-managed cloud deployment - Deploy to providers like Fly.io, AWS, Google Cloud Run, etc.
- Custom infrastructure - Run on your own servers or specialized AI infrastructure
The best choice depends on your specific requirements, scale, and expertise.
Whichever option you choose, there are a few things you'll likely need:

- **Transport service** - Pipecat has existing services for various media transport modes, such as WebRTC or WebSockets. If you're not using a third-party service for handling media transport, you'll want to make sure that infrastructure is hosted and ready to receive connections.
- **Deployment target** - You can deploy and run Pipecat bots anywhere that can run Python code: Google Cloud Run, AWS, Fly.io, etc. We recommend providers that offer APIs, so you can programmatically spawn new bot agents on-demand.
- **Docker** - If you're targeting cloud architecture / VMs, they will most often expect a containerized app. It's worth having Docker installed and set up to run builds. We'll step through creating a `Dockerfile` in this documentation; a minimal sketch follows this list.
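To make that concrete, here's a minimal sketch of what such a `Dockerfile` might look like. The `bot.py` entrypoint and `requirements.txt` filename are illustrative assumptions, not fixed Pipecat conventions:

```dockerfile
# Minimal illustrative Dockerfile; adjust base image, files, and entrypoint
FROM python:3.11-slim

WORKDIR /app

# Install dependencies first so Docker can cache this layer between builds
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the bot source
COPY . .

# Launch the bot process (hypothetical entrypoint)
CMD ["python", "bot.py"]
```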
In local development, things often work great because you're testing under controlled, stable network conditions. In real-world use, however, your users will likely interact with your bot across a variety of devices and network conditions.
WebSockets are fine for server-to-server communication or for initial development. But for production use, you'll likely want client-server audio that uses a protocol designed for real-time media transport. For an explanation of the difference between WebSockets and WebRTC, see this post.
If you're targeting scalable, client-server interactions, we recommend you use WebRTC for the best results.
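As a rough sketch of what a WebRTC transport looks like in Pipecat, the snippet below constructs a Daily transport. Import paths and parameter names vary between Pipecat versions, and the room URL and token here are placeholders, so treat this as illustrative rather than definitive:

```python
import os

from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.transports.services.daily import DailyParams, DailyTransport

# Placeholders: supply a real Daily room URL and token for your deployment
room_url = os.environ["DAILY_ROOM_URL"]
token = os.environ["DAILY_TOKEN"]

transport = DailyTransport(
    room_url,
    token,
    "My Pipecat Bot",
    DailyParams(
        audio_in_enabled=True,
        audio_out_enabled=True,
        vad_analyzer=SileroVADAnalyzer(),  # voice activity detection
    ),
)
```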
Most chatbots require very little in the way of system resources, but if you are making use of custom models or require GPU-powered infrastructure, it's important to consider how to pre-cache local resources so that they are not downloaded at runtime. Your bot processes / VMs should aim to launch and connect as quickly as possible, so the user is not left waiting.
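One way to keep startup fast is to verify that required model files are already on disk (baked into the image or mounted from a volume) before the bot starts serving. A hypothetical sketch; the `MODEL_DIR` variable and layout are illustrative, not a Pipecat convention:

```python
import os
from pathlib import Path

# Hypothetical convention: MODEL_DIR points at models baked into the
# image or mounted from a network volume
MODEL_DIR = Path(os.environ.get("MODEL_DIR", "/models"))

def require_model(name: str) -> Path:
    """Fail fast at startup rather than downloading a model at runtime."""
    path = MODEL_DIR / name
    if not path.exists():
        raise FileNotFoundError(f"Expected pre-cached model at {path}")
    return path
```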
Designing and operating a pool of workers is out of scope for our documentation, but we'll highlight best practices in all of our examples.
As an example of a supporting model, most Pipecat examples make use of Silero VAD, which we recommend including as part of your Docker image (so it's cached and readily available when your bot runs). Since the Silero model is quite small, this doesn't inflate the size of the container too much. You may, however, want to consider making large models available via a network volume and ensuring your bot knows where to find them.
For Silero specifically, you can read more about how to download it directly here.
```python
# Run at build time so the model is baked into the image
import torch

torch.hub.load(
    repo_or_dir="snakers4/silero-vad",
    model="silero_vad",
    force_reload=True,
)
```
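At runtime, the same call with `force_reload=False` loads the cached copy instead of downloading it again. A small sketch (the `model, utils` tuple is Silero's standard torch hub return value):

```python
# At startup: load Silero VAD from the local torch hub cache, no re-download
import torch

model, utils = torch.hub.load(
    repo_or_dir="snakers4/silero-vad",
    model="silero_vad",
    force_reload=False,
)
```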
<Card title="Basic deployment pattern" icon="draw-square" iconType="duotone" href="/deployment/pattern">
  Introduction to a model for deploying Pipecat bots
</Card>
Once you've familiarized yourself with the Pipecat deployment pattern, here are some guides that walk you through the process for various deployment options. Remember, your Pipecat bots are simply Python processes, so you can host them on whichever infrastructure or service best suits your project.
- **Pipecat Cloud** - Managed service purpose-built for Pipecat deployments
- **Self-managed cloud (e.g., Fly.io)** - For service-driven / CPU bots
- **Custom infrastructure** - For GPU-accelerated models & specialized AI infrastructure