Cortex is a cloud native model serving platform for machine learning engineering teams.
- Realtime machine learning - build NLP, computer vision, and other APIs and integrate them into any application.
- Large-scale inference - scale realtime or batch inference workloads across hundreds or thousands of instances.
- Consistent MLOps workflows - create streamlined and reproducible MLOps workflows for any machine learning team.
- Deploy TensorFlow, PyTorch, ONNX, and other models using a simple CLI or Python client.
- Run realtime inference, batch inference, asynchronous inference, and training jobs.
- Define preprocessing and postprocessing steps in Python and chain workloads seamlessly.
```text
$ cortex deploy apis.yaml

• creating text-generator (realtime API)
• creating image-classifier (batch API)
• creating video-analyzer (async API)

all APIs are ready!
```
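An `apis.yaml` like the one deployed above might look like the following sketch. The API names and kinds come from the CLI output; the predictor paths, model references, and compute values are illustrative assumptions, not a definitive spec:

```yaml
# Hypothetical apis.yaml matching the deploy output above.
# Predictor paths and compute requests are illustrative assumptions.
- name: text-generator
  kind: RealtimeAPI
  predictor:
    type: python
    path: text_generator.py   # assumed handler module
  compute:
    gpu: 1

- name: image-classifier
  kind: BatchAPI
  predictor:
    type: python
    path: image_classifier.py
  compute:
    cpu: 4

- name: video-analyzer
  kind: AsyncAPI
  predictor:
    type: python
    path: video_analyzer.py
  compute:
    gpu: 1
```

Each entry pairs a workload kind with a Python predictor, which is where the preprocessing and postprocessing steps mentioned above would live.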
- Create A/B tests and shadow pipelines with configurable traffic splitting.
- Automatically stream logs from every workload to your favorite log management tool.
- Monitor your workloads with pre-built Grafana dashboards and add your own custom dashboards.
```text
$ cortex get

API               TYPE      GPUs
text-generator    realtime  32
image-classifier  batch     64
video-analyzer    async     16
```
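The A/B tests and shadow pipelines mentioned above are driven by a traffic-splitting spec. A minimal sketch, assuming two hypothetical variants of the text generator and illustrative weights:

```yaml
# Hypothetical traffic-splitting spec; the variant names, weights,
# and shadow flag are illustrative assumptions.
- name: text-generator
  kind: TrafficSplitter
  apis:
    - name: text-generator-a     # assumed variant A
      weight: 80
    - name: text-generator-b     # assumed variant B
      weight: 20
    - name: text-generator-shadow
      weight: 10
      shadow: true               # assumed: mirrors traffic without serving responses
```

Adjusting the weights shifts live traffic between variants without redeploying the underlying APIs.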
- Configure workload and cluster autoscaling to efficiently handle large-scale production workloads.
- Create clusters with different types of instances for different types of workloads.
- Spend less on cloud infrastructure by letting Cortex manage spot instances.
```text
$ cortex cluster info

region: us-east-1
instance_types: [c5.xlarge, g4dn.xlarge]
spot_instances: true
min_instances: 10
max_instances: 100
```
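A cluster configuration corresponding to the info output above might be sketched as follows. The region, instance types, spot setting, and instance bounds come from the CLI output; the node-group names and per-group bounds are illustrative assumptions:

```yaml
# Hypothetical cluster config; group names and per-group instance
# bounds are illustrative assumptions.
cluster_name: cortex
region: us-east-1
node_groups:
  - name: cpu-workloads        # assumed group for CPU-bound APIs
    instance_type: c5.xlarge
    min_instances: 10
    max_instances: 100
    spot: true
  - name: gpu-workloads        # assumed group for GPU inference
    instance_type: g4dn.xlarge
    min_instances: 0
    max_instances: 50
    spot: true
```

Separate groups let CPU-bound and GPU-bound workloads scale independently, while spot instances reduce the cost of both.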