Run inference at scale

Cortex is an open source platform for large-scale inference workloads.

Model serving infrastructure

Supports deploying TensorFlow, PyTorch, scikit-learn, and other models as realtime or batch APIs.
Ensures high availability with availability zones and automated instance restarts.
Runs inference on on-demand instances or spot instances with on-demand backups.
Autoscales to handle production workloads with support for overprovisioning.

Configure a cluster

# cluster.yaml

region: us-east-1
instance_type: g4dn.xlarge
min_instances: 10
max_instances: 100
spot: true
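
Before spinning up a cluster, it can help to sanity-check the values in the configuration. The sketch below is not part of Cortex: `check_cluster_config` is a hypothetical helper, the set of required fields is an assumption made for illustration, and the dictionary simply mirrors the cluster.yaml values above.

```python
# Hypothetical pre-flight check; not part of the Cortex CLI or Python client.
def check_cluster_config(config):
    """Return a list of human-readable problems found in a cluster config dict."""
    problems = []
    # Fields assumed to be required for this sketch.
    for key in ("region", "instance_type", "min_instances", "max_instances"):
        if key not in config:
            problems.append(f"missing required field: {key}")
    if problems:
        return problems
    if config["min_instances"] < 0:
        problems.append("min_instances must not be negative")
    if config["max_instances"] < config["min_instances"]:
        problems.append("max_instances must be >= min_instances")
    if not isinstance(config.get("spot", False), bool):
        problems.append("spot must be true or false")
    return problems


# Same values as the cluster.yaml example above.
cluster_config = {
    "region": "us-east-1",
    "instance_type": "g4dn.xlarge",
    "min_instances": 10,
    "max_instances": 100,
    "spot": True,
}

print(check_cluster_config(cluster_config) or "config looks consistent")
```

An empty list means none of the checks in this sketch failed; it says nothing about whether the cloud account can actually provision the requested instances.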

Spin up on your AWS or GCP account

$ cortex cluster up --config cluster.yaml

￮ configuring autoscaling ✓
￮ configuring networking ✓
￮ configuring logging ✓

cortex is ready!

Reproducible deployments

Package dependencies, code, and configuration for reproducible deployments.
Configure compute, autoscaling, and networking for each API.
Integrate with your data science platform or CI/CD system.
Deploy custom Docker images or use the pre-built defaults.
Test locally before deploying to a cluster.

Define an API

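class PythonPredictor:
  def __init__(self, config):
    from transformers import pipeline

    # Build the text-generation pipeline once, when the predictor is constructed.
    self.model = pipeline(task="text-generation")

  def predict(self, payload):
    # Generate text from the request's "text" field and return the first result.
    return self.model(payload["text"])[0]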

requirements = ["tensorflow", "transformers"]
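
Because a predictor is an ordinary Python class, one way to exercise its interface before deploying is to construct it and call `predict` directly. The sketch below does not use any Cortex tooling; `EchoPredictor` and `run_locally` are hypothetical names, and the stand-in predictor replaces the transformers pipeline so the example runs without downloading a model.

```python
# Stand-in predictor with the same interface as PythonPredictor above:
# __init__(self, config) and predict(self, payload).
class EchoPredictor:
    def __init__(self, config):
        # "prefix" is an illustrative config key, not a Cortex setting.
        self.prefix = config.get("prefix", "")

    def predict(self, payload):
        return {"generated_text": self.prefix + payload["text"]}


def run_locally(predictor_cls, config, payload):
    """Construct the predictor once and run a single prediction."""
    predictor = predictor_cls(config)
    return predictor.predict(payload)


result = run_locally(EchoPredictor, {"prefix": "echo: "}, {"text": "hello world"})
print(result)
```

Swapping `EchoPredictor` for `PythonPredictor` would run the real model, provided the packages in `requirements` are installed in the local environment.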

Configure an API

api_spec = {
  "name": "text-generator",
  "kind": "RealtimeAPI",
  "compute": {
    "gpu": 1,
    "mem": "8Gi"
  },
  "autoscaling": {
    "min_replicas": 1,
    "max_replicas": 10
  }
}

Scalable machine learning APIs

Scale to handle production workloads with request-based autoscaling.
Stream performance metrics and logs to any monitoring tool.
Serve many models efficiently with multi-model caching.
Use rolling updates to update APIs without downtime.
Configure traffic splitting for A/B testing.
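
To make two of these ideas concrete, here is a minimal sketch of request-based autoscaling and weighted traffic splitting. It is an illustration under stated assumptions, not Cortex's actual algorithm: `desired_replicas`, `pick_variant`, and the `target_per_replica` parameter are hypothetical names, and only `min_replicas` and `max_replicas` correspond to fields in the API configuration above.

```python
import math
import random


def desired_replicas(in_flight, target_per_replica, min_replicas, max_replicas):
    """Replicas needed to keep in-flight requests per replica at or below the target."""
    if target_per_replica <= 0:
        raise ValueError("target_per_replica must be positive")
    wanted = math.ceil(in_flight / target_per_replica)
    # Clamp to the configured bounds.
    return max(min_replicas, min(max_replicas, wanted))


def pick_variant(weights, u=None):
    """Choose an API variant with probability proportional to its weight.

    weights maps a variant name to a non-negative weight; u is an optional
    number in [0, 1) that makes the choice deterministic for testing.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    if u is None:
        u = random.random()
    threshold = u * total
    cumulative = 0
    for name, weight in weights.items():
        cumulative += weight
        if threshold < cumulative:
            return name
    return name  # guards against floating-point rounding at the upper edge


print(desired_replicas(in_flight=45, target_per_replica=10, min_replicas=1, max_replicas=10))
print(pick_variant({"model-a": 90, "model-b": 10}, u=0.5))
```

With 45 in-flight requests and a target of 10 per replica, the sketch asks for 5 replicas; with `u=0.5` the 90/10 split routes the request to `model-a`.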

Deploy to your cluster

import cortex

cx = cortex.client("aws")
cx.create_api(api_spec, predictor=PythonPredictor, requirements=requirements)

# creating https://example.com/text-generator

Consume your API

$ curl https://example.com/text-generator -X POST -H "Content-Type: application/json" -d '{"text": "hello world"}'
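
The same request can be made from Python using only the standard library. This is a sketch of the curl call above: `build_request` is a hypothetical helper, and https://example.com/text-generator is the placeholder endpoint from the example, so the actual send is left commented out.

```python
import json
import urllib.request


def build_request(url, text):
    """Build a JSON POST request equivalent to the curl command above."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_request("https://example.com/text-generator", "hello world")

# Sending is commented out because example.com is a placeholder endpoint.
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```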
