Curator is a tool that we use at MongoDB to generate our package
repositories (e.g. repo.mongodb.org
and
repo.mongodb.com
). Additionally, curator provides tooling to
support the automated publication and management of build artifacts as
part of our continuous integration system.
Please refer to the --help
output for curator
and its
sub-commands. The following sections provides overviews of these
components and their use.
Curator includes a basic set of AWS S3 operations on the
command line. Although the get, put, and delete operations are faily
basic the sync-to
and sync-from
operations provide parallel
directory tree sync operations between S3 buckets and the local file
system. These operations compare objects using MD5 checksums, do
multi-threaded file uploads, and retry failed operations with
exponential backoff, for efficient and robust transfers.
The repobuilder is a helper package to submit Barque jobs to build RPM and DEB package repositories in S3. These jobs: sync files from an existing repository, add packages from the local filesystem to the repository, sign packages (for RPM), regenerate package metadata, sign package metadata, generate html pages for web-based display, and sync the changed files to the remote repository.
The current implementation of the repobuilder process depends on
external repository generation tools (e.g. createrepo
and
apt
tools.) Additionally, the repobuilder currently depends on
MongoDB's internal signing service.
The artifacts functionality uses the release metadata feeds
(e.g. https://downloads.mongodb.org/full.json
) to fetch and
extract release build artifacts for local use. It is particularly
useful for fetching artifacts for and maintaining local caches of
MongoDB builds for multiple releases. Set the
CURATOR_ARTIFACTS_DIRECTORY
environment variable or pass the
--path
option to a flag, and then use the curator artifacts
download
command to download files.
The artifacts
command also includes two exploration subcommands
for discovering available builds: Use the list-map
for specific
lists of available edition, target, and architecture combinations and
list-all
for a list of available target and architectures. Both
list operations are specific to a single version.
Combine the artifacts tool with the prune tool to avoid unbounded cache growth.
Prune is based on the lru library, and takes a file system tree and removes files, based on age, until the total size of the files is less than a specified maximum size. Prune uses modified time for age, in an attempt to have consistent behavior indepenent of operation system and file system.
There are two modes of operation, a recursive mode which removes objects from the tree recursively, but skips directory objects, and directory mode, which does not collect objects recursively, but tracks the size for the contents--recursively--of top-level directories.
- All operations in the continuous integration environment should be easily reproducible outside of the environment. In practice, curator exists to build repositories inside of Evergreen tasks; however, it is possible to run all stages of the repository building process by hand. The manual abilities are useful and required for publishing package revisions and repairing corrupt repositories.
- Leverage, as possible, third party libraries and tools. For example, the cache pruning and artifact management functionality is entirely derived from third-party repositories maintained separately from curator.
- Major functionality is implemented and executed in terms of amboy jobs and queues. Not only does this provide a framework for task execution, but leaves the door open to provide curator functionality as a highly available service with minimal modification.
See the godoc API documentation for more information about curator interfaces and internals.
Please file all issues in the EVG project in the MongoDB Jira instance.