A distributed cron with cli and web ui
Prerequisites:
- a running kubernetes installation
- kubectl on the local machine connected to the remote k8s cluster
- ingress configured on remote cluster
Steps:
- edit configuration at end of k8s/app.yaml
- run k8s/install.sh
- Full cli
- Web-based UI with real-time status updates
- Runs k8s jobs
- Supports nodeSelector
- No downtime deploys
- Up and running in 15 minutes.
Tilloo has been a great tool for distributed cron at my current startup for the last five years. We are now moving to containers and want to add the ability to schedule runs in containers. As we thought through this we decided to radically change the implementation of Tilloo. I have created a v1.0 branch for folks who want to continue to use the older version. Master will become the containerized implementation. I plan on making the following changes:
- Eliminate worker. Scheduler will instead schedule container execution against AWS EKS / Kubernetes
- Web interface will change to allow you to specify parameters to launch containers using Kubernetes Jobs.
- Scheduler will run in a Kubernetes deployment
- Web will run in a Kubernetes deployment
- Simplify jobs interface to remove JobId and add a tooltip with jobId and description.
- Allow filtering of jobs
- Moved config.json into a configMap. This removes the requirement to build your own containers
- Removed disqueue and replaced with rabbitmq
I have been a long-time user of Sooner.io https://github.com/seven1m/sooner.io but have hit issues around zero-downtime deployments and worker scale-out. We looked at extending Sooner.io, but the original author is no longer maintaining it and it has some fundamental architectural issues that would require a significant overhaul to meet our goals.
We evaluated Chronos https://mesos.github.io/chronos/ but found it to be quite a bit more complex than what we needed. That complexity came with overhead in getting it set up and running.
I wrote this to help my startup get past these issues.
We evaluated the built-in cron support in k8s 1.10 but found it lacking in terms of tracking what each job is doing, and we liked the real-time nature of this solution. We liked the concept of Jobs and the ability to schedule them across the k8s cluster. We also liked the deployment flexibility of containers. This motivated us to update Tilloo to focus on containerized workloads on k8s.
We leverage:
- mongodb https://www.mongodb.com/ for storage
- rabbitmq https://www.rabbitmq.com/ for communication
Mongodb will be installed using helm with the default install script. The app.yaml will run a rabbitmq pod.
The default configuration is:
- StatefulSet running mongodb in the tilloo-services namespace on port 27017
- Deployment running rabbitmq on port 5672
- Deployment running scheduler listening on port 80 with an ingress configured
- Deployment running web listening on port 80 with an ingress configured
- DaemonSet running a logger service on each node
If your environment satisfies the prerequisites and the ports above are available, you are good to go.
The typical Tilloo environment consists of:
- 1 Scheduler
- 1 Web UI
- 1 logger on each k8s node
Once everything is installed in k8s open a web browser to http://.
Enjoy!
In a k8s environment the cli is best run from an interactive shell started on the k8s cluster inside the tilloo-services namespace.
Running npm link will put symlinks to the tilloo-cli into your /usr/local/bin directory.
Adds a job to the system.
Arguments
- schedule - cron style schedule of the form * * * * * *, e.g. 0 0 */1 * * * to run once an hour
- path - Path to the executable to run. The path is relative to the worker directory. Absolute paths are allowed.
Options
- --jobname <name> - Friendly name of job. If not specified defaults to the path.
- --path <path> - Optional path to executable to run inside container
- --timeout <seconds> - Max time to allow job to run before it is killed.
- --nodeselector <nodeselector> - Node selector expression to tie run of job to a subset of nodes
- --jobargs <args> - Ordered comma separated list of job arguments, e.g. --jobargs "300,test"
- --jobdescription <description> - Notes about job
- --mutex <true|false> - If set to true only a single instance of the job is allowed to run at a time. Defaults to true.
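As a rough illustration of the schedule and --jobargs formats described above, the following sketch checks that a schedule has six fields and splits a job argument string into an ordered list. The function names `hasSixCronFields` and `parseJobArgs` are hypothetical and are not part of Tilloo; the sketch only counts fields and does not validate their contents.

```javascript
// Hypothetical helpers for illustration; not part of the Tilloo codebase.

// Returns true when the schedule has exactly six whitespace-separated
// fields, matching the "* * * * * *" form. Field contents are not checked.
function hasSixCronFields(schedule) {
  return schedule.trim().split(/\s+/).length === 6;
}

// Splits a --jobargs value such as "300,test" into an ordered list.
// A missing or empty value yields an empty list.
function parseJobArgs(jobargs) {
  if (!jobargs) {
    return [];
  }
  return jobargs.split(',');
}

console.log(hasSixCronFields('0 0 */1 * * *')); // true
console.log(hasSixCronFields('0 */1 * * *'));   // false (only five fields)
console.log(parseJobArgs('300,test'));          // [ '300', 'test' ]
```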
Deletes a job from the system.
Arguments
- jobId - The id of the job to delete.
Each run is stored as a document in mongodb. Each log line is also a document. If you have jobs that run frequently this can impact the performance of mongodb. This command expires runs and their logs from mongodb based on the creation date of the run. The days argument specifies how many days' worth of data to keep. If you specify 7 days, any runs created more than 7 days ago will be expired.
Arguments
- days - The number of days' worth of run data to keep.
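The retention rule above can be sketched as a simple date comparison. This is a minimal sketch under stated assumptions, not Tilloo's actual expiry code: `expiryCutoff`, `isExpired`, and the `createdAt` parameter are hypothetical names, and a day is treated as exactly 24 hours.

```javascript
// Hypothetical helpers for illustration; not part of the Tilloo codebase.
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Returns the cutoff Date. Runs created before it fall outside the
// retention window of `days` days.
function expiryCutoff(days, now = new Date()) {
  return new Date(now.getTime() - days * MS_PER_DAY);
}

// A run is expired when it was created earlier than the cutoff.
function isExpired(createdAt, days, now = new Date()) {
  return createdAt.getTime() < expiryCutoff(days, now).getTime();
}

const today = new Date('2024-01-15T00:00:00Z');
console.log(isExpired(new Date('2024-01-07T00:00:00Z'), 7, today)); // true (8 days old)
console.log(isExpired(new Date('2024-01-10T00:00:00Z'), 7, today)); // false (5 days old)
```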
Get a json description of the job. Includes job details not shown elsewhere. Useful for debugging.
Arguments
- jobId - The id of the job to get details for.
Arguments
- runId - The id of the run of a job to kill. The runId is not the same as the jobId. The runId is associated with a particular run of a job.
Options
- --force - If a worker is killed, a job can be left in a busy state and never complete. If the job has mutex = true, this can block its next scheduled execution. These zombie jobs are cleaned up every 5 minutes by default. Use this option to force the cleanup immediately.
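To make the zombie idea concrete, here is a minimal sketch of how a cleanup pass might pick out runs whose heartbeat has gone stale. It is not Tilloo's implementation: `isZombie`, `findZombies`, and the `lastHeartbeat` field are hypothetical names, and the threshold in minutes is modeled on the zombieAge setting.

```javascript
// Hypothetical helpers for illustration; not part of the Tilloo codebase.

// Returns true when the last heartbeat is older than zombieAgeMinutes.
function isZombie(lastHeartbeat, zombieAgeMinutes, now = new Date()) {
  const ageMs = now.getTime() - lastHeartbeat.getTime();
  return ageMs > zombieAgeMinutes * 60 * 1000;
}

// Selects the runs that a cleanup pass would treat as zombies.
function findZombies(runs, zombieAgeMinutes, now = new Date()) {
  return runs.filter((run) => isZombie(run.lastHeartbeat, zombieAgeMinutes, now));
}

const checkTime = new Date('2024-01-15T12:00:00Z');
const sampleRuns = [
  { runId: 'a', lastHeartbeat: new Date('2024-01-15T11:59:00Z') }, // 1 minute ago
  { runId: 'b', lastHeartbeat: new Date('2024-01-15T11:40:00Z') }, // 20 minutes ago
];
console.log(findZombies(sampleRuns, 5, checkTime).map((run) => run.runId)); // [ 'b' ]
```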
Lists all jobs
Lists all runs for a jobId in chronological order
Arguments
- jobId - The id of the job to list all runs for
Get a json description of the run. Includes run details not shown elsewhere. Useful for debugging.
Arguments
- runId - The id of the run to get details for.
Gets the stdout/stderr from the run of a job.
Arguments
- runId - The id of the run to get output for.
The config file has sensible defaults filled in. All keys present in the shipped config file must remain. Removing settings will cause tilloo to fail to run.
Settings
- db - Mongodb database connection string. This is passed directly to mongoose under the covers and supports any mongodb options.
- rabbitmq - Settings for rabbitmq. Specify the host and the port.
- scheduler - Settings pertaining to the scheduler
- host - The host the scheduler resides on. This is used by tilloo-web to connect to the web sockets interface the scheduler exposes.
- port - The port the web sockets interface is exposed on.
- zombieAge - If a job hasn't seen a heartbeat in this many minutes it is marked as failed.
- zombieFrequency - How frequently the zombie garbage collector should start in minutes
- runHistoryDays - How long to keep logs of each run around. No default. If not set no expiration is done. Recommended setting of 90 days depending on volume of jobs.
- web - Settings pertaining to the web interface
- port - The port to start the web interface on
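Because removing settings causes Tilloo to fail, it can help to check a config object before deploying it. The sketch below is only an illustration: the nesting of keys is an assumption inferred from the settings list above (the shipped config file is authoritative), `missingKeys` is a hypothetical helper, and the sample values are placeholders rather than Tilloo defaults. runHistoryDays is left out of the check because it is described as optional.

```javascript
// Assumed nesting of the settings listed above; the shipped config
// file is the authoritative reference for the real layout.
const REQUIRED_KEYS = [
  'db',
  'rabbitmq.host',
  'rabbitmq.port',
  'scheduler.host',
  'scheduler.port',
  'scheduler.zombieAge',
  'scheduler.zombieFrequency',
  'web.port',
];

// Returns the dotted paths from REQUIRED_KEYS that are missing from config.
function missingKeys(config) {
  return REQUIRED_KEYS.filter((path) => {
    let node = config;
    for (const key of path.split('.')) {
      if (node === null || typeof node !== 'object' || !(key in node)) {
        return true; // path is broken at this key
      }
      node = node[key];
    }
    return false;
  });
}

// Placeholder values for illustration only.
const sampleConfig = {
  db: 'mongodb://localhost:27017/tilloo',
  rabbitmq: { host: 'localhost', port: 5672 },
  scheduler: { host: 'localhost', port: 80, zombieAge: 5, zombieFrequency: 5 },
};
console.log(missingKeys(sampleConfig)); // [ 'web.port' ]
```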
There are notifiers included that can notify of job failures via Mandrill, AWS SNS, or AWS SES. Notifiers are easily added for other destinations. If you add one please submit a pull request and share it.
- Running tilloo on Raspberry Pi/k3s - [tilloo-k3s](https://github.com/chriskinsman/tilloo-k3s)
- Running tilloo on Raspberry Pi/k8s - [tilloo-pi](https://github.com/chriskinsman/tilloo-pi)
- Running tilloo on AWS EKS - [tilloo-aws](https://github.com/chriskinsman/tilloo-aws)
- Running tilloo on Azure AKS - [tilloo-azure](https://github.com/chriskinsman/tilloo-azure)
The author is Chris Kinsman