Volcano is a batch system built on Kubernetes. It provides a suite of mechanisms currently missing from Kubernetes that are commonly required by many classes of batch and elastic workloads, including:
- machine learning/deep learning
- bioinformatics/genomics
- other "big data" applications
These types of applications typically run on generalized domain frameworks like TensorFlow, Spark, PyTorch, MPI, etc., which Volcano integrates with.
Some examples of the mechanisms and features that Volcano adds to Kubernetes are:
- Job management extensions and improvements, e.g.:
  - Multi-pod jobs
  - Lifecycle management extensions, including suspend/resume and restart
  - Improved error handling
  - Indexed jobs
  - Task dependencies
- Scheduling extensions, e.g.:
  - Co-scheduling
  - Fair-share scheduling
  - Queue scheduling
  - Preemption and reclaims
  - Reservations and backfills
  - Topology-based scheduling
- Runtime extensions, e.g.:
  - Support for specialized container runtimes like Singularity, with GPU accelerator extensions and enhanced security features
- Other:
  - Data locality awareness and intelligent scheduling
  - Optimizations for data throughput, round-trip latency, etc.
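Co-scheduling (often called gang scheduling) means that a multi-pod job is started only when a minimum number of its pods can run at the same time; otherwise none of them are placed, so a half-started MPI or distributed-training job does not hold resources while it waits for its peers. The idea corresponds to the `minAvailable` field of a Volcano Job. The toy sketch below illustrates only the all-or-nothing check, assuming a single CPU resource and first-fit placement; `GangJob` and `try_gang_schedule` are hypothetical names, and this is not Volcano's scheduler code.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class GangJob:
    """Hypothetical job: `replicas` pods, each needing `cpu_per_pod` millicores,
    of which at least `min_available` must be able to start together."""
    name: str
    replicas: int
    min_available: int
    cpu_per_pod: int


def try_gang_schedule(job: GangJob, free_cpu: list[int]) -> dict[int, int] | None:
    """Tentatively place pods first-fit and return {node_index: pod_count} only
    if at least `min_available` pods fit; otherwise return None.

    `free_cpu` is never modified; the caller commits the returned placement."""
    free = list(free_cpu)              # scratch copy for tentative placement
    placement: dict[int, int] = {}
    placed = 0
    for _ in range(job.replicas):
        for node, cap in enumerate(free):
            if cap >= job.cpu_per_pod:  # first node with enough room wins
                free[node] -= job.cpu_per_pod
                placement[node] = placement.get(node, 0) + 1
                placed += 1
                break
        else:
            break                      # no node can take another pod
    if placed < job.min_available:
        return None                    # all-or-nothing: discard the partial gang
    return placement
```

For example, with two nodes that each have 4000 millicores free and pods of 1500 millicores, four pods fit (two per node), so `GangJob("mpi", replicas=4, min_available=4, cpu_per_pod=1500)` is admitted as `{0: 2, 1: 2}`, whereas a five-pod gang with `min_available=5` gets `None` instead of four stranded pods. Setting `min_available` below `replicas` gives the elastic case: the job starts as long as the minimum fits.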
Volcano builds upon a decade and a half of experience running a wide variety of high-performance workloads at scale on several systems and platforms, combined with best-of-breed ideas and practices from the open source community.
NOTE: the scheduler is based on kube-batch; refer to #241 and #288 for more details.
You can watch industry experts talk about Volcano at various international conferences here.
Prerequisites:
- Kubernetes 1.12+ with CRD support
You can try Volcano in one of the following two ways.
Install Volcano on an existing Kubernetes cluster:
kubectl apply -f https://raw.githubusercontent.com/volcano-sh/volcano/master/installer/volcano-development.yaml
Enjoy! Volcano will create the following resources in the volcano-system namespace:
NAME                                       READY   STATUS      RESTARTS   AGE
pod/volcano-admission-5bd5756f79-dnr4l     1/1     Running     0          96s
pod/volcano-admission-init-4hjpx           0/1     Completed   0          96s
pod/volcano-controllers-687948d9c8-nw4b4   1/1     Running     0          96s
pod/volcano-scheduler-94998fc64-4z8kh      1/1     Running     0          96s

NAME                                TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)   AGE
service/volcano-admission-service   ClusterIP   10.98.152.108   <none>        443/TCP   96s

NAME                                  READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/volcano-admission     1/1     1            1           96s
deployment.apps/volcano-controllers   1/1     1            1           96s
deployment.apps/volcano-scheduler     1/1     1            1           96s

NAME                                             DESIRED   CURRENT   READY   AGE
replicaset.apps/volcano-admission-5bd5756f79     1         1         1       96s
replicaset.apps/volcano-controllers-687948d9c8   1         1         1       96s
replicaset.apps/volcano-scheduler-94998fc64      1         1         1       96s

NAME                               COMPLETIONS   DURATION   AGE
job.batch/volcano-admission-init   1/1           48s        96s
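The listing above has the shape of `kubectl get all -n volcano-system` output. To check the install from a script rather than by eye, the following is a minimal Python sketch, assuming `kubectl` is on the PATH and configured for the target cluster; `fetch_pods` and `volcano_pods_healthy` are hypothetical helper names. It treats a pod as healthy when it is `Running` with all containers ready, or `Succeeded`, which is the phase of a finished one-shot pod such as volcano-admission-init (shown as `Completed` by kubectl).

```python
import json
import subprocess


def volcano_pods_healthy(pod_list: dict) -> bool:
    """Return True if every pod in a `kubectl get pods -o json` result is
    Running with all containers ready, or has Succeeded (one-shot job pods)."""
    items = pod_list.get("items", [])
    if not items:
        return False                   # nothing installed (yet)
    for pod in items:
        status = pod.get("status", {})
        phase = status.get("phase")
        if phase == "Succeeded":
            continue                   # e.g. the admission-init job's pod
        if phase != "Running":
            return False               # Pending, Failed, Unknown, ...
        containers = status.get("containerStatuses", [])
        if not containers or not all(c.get("ready") for c in containers):
            return False
    return True


def fetch_pods(namespace: str = "volcano-system") -> dict:
    """Fetch the pod list for `namespace` as parsed JSON via kubectl."""
    out = subprocess.run(
        ["kubectl", "get", "pods", "-n", namespace, "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)
```

Against a live cluster, `volcano_pods_healthy(fetch_pods())` combines the two. Keeping the check separate from the kubectl call means it can be tested on canned JSON without a cluster.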
If you don't have a Kubernetes cluster, try the one-click install from the code base:
./hack/local-up-volcano.sh
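Once Volcano is running, the multi-pod job and co-scheduling features described earlier are used by submitting a Volcano Job. The sketch below builds such a manifest as a Python dict and prints it as JSON, which `kubectl apply -f -` accepts. The field names (`minAvailable`, `schedulerName`, `queue`, `tasks`) follow the `batch.volcano.sh/v1alpha1` Job API as we understand it; the helper name `volcano_job`, the busybox image, the command, and the use of a queue named `default` are illustrative assumptions, so check them against your installed version.

```python
import json


def volcano_job(name: str, replicas: int, min_available: int,
                image: str = "busybox", queue: str = "default") -> dict:
    """Build a minimal (hypothetical) Volcano Job manifest with one task."""
    if not 0 < min_available <= replicas:
        raise ValueError("min_available must be between 1 and replicas")
    return {
        "apiVersion": "batch.volcano.sh/v1alpha1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            "minAvailable": min_available,  # gang size for co-scheduling
            "schedulerName": "volcano",     # let the Volcano scheduler place the pods
            "queue": queue,                 # assumed to exist in the cluster
            "tasks": [{
                "name": "worker",
                "replicas": replicas,
                "template": {"spec": {
                    "restartPolicy": "Never",
                    "containers": [{
                        "name": "worker",
                        "image": image,
                        "command": ["sh", "-c", "echo hello from volcano; sleep 30"],
                    }],
                }},
            }],
        },
    }


if __name__ == "__main__":
    # e.g.  python volcano_job.py | kubectl apply -f -
    print(json.dumps(volcano_job("demo", replicas=3, min_available=3), indent=2))
```

With `min_available` equal to `replicas`, all three pods must be placeable before any of them starts; lowering it lets the job start with a partial set of workers.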
You can reach the maintainers of this project at:
- Slack channel: https://volcano-sh.slack.com (sign up here)
- Mailing list: https://groups.google.com/forum/#!forum/volcano-sh