The Yari Deployer does two things. First, it's used to upload pre-built document pages, static files (e.g. JS, CSS, and image files), and sitemap files into an existing AWS S3 bucket. Since we serve MDN document pages from an S3 bucket via a CloudFront CDN, this is the way we upload a new version of the site.
Second, it is used to update and publish changes to existing AWS Lambda functions. For example, we use it to update and publish new versions of a Lambda function that transforms incoming document URLs into their corresponding S3 keys.
You can install it globally or in a virtualenv environment, whichever you prefer.

```sh
cd deployer
poetry install
poetry run deployer --help
```
Please refer to the `boto3` documentation regarding configuring AWS access credentials.
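As a minimal sketch (the values are placeholders, not real credentials), `boto3` will pick up the standard AWS environment variables, or a named profile from `~/.aws/credentials`:

```sh
# Option 1: standard AWS environment variables (read by boto3)
export AWS_ACCESS_KEY_ID=<your-access-key-id>
export AWS_SECRET_ACCESS_KEY=<your-secret-access-key>

# Option 2: a named profile configured in ~/.aws/credentials
export AWS_PROFILE=<your-profile-name>
```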
The `poetry run deployer upload DIRECTORY` command uploads files into an existing S3 bucket. Currently, we have three S3 buckets that we upload into: `mdn-content-dev` (for variations or experimental versions of the site), `mdn-content-stage`, and `mdn-content-prod`.
As input, the `upload` command takes a directory which contains the files that should be uploaded. The files are uploaded into a sub-folder (a.k.a. prefix) of the S3 bucket's root. The prefix (`--prefix` option) defaults to `main`, which is most likely what you'll want for uploads to the `mdn-content-stage` and `mdn-content-prod` S3 buckets. However, for uploads to the `mdn-content-dev` bucket, the prefix is often used to specify a different folder for each variation of the site that is being reviewed/considered, as in the sketch below.
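For instance, uploading a build for a hypothetical variation called `experiment1` (the prefix name here is just an example):

```sh
cd deployer
# Each review variation gets its own prefix in the dev bucket
poetry run deployer upload --bucket mdn-content-dev --prefix experiment1 ../client/build
```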
When uploading files, the Deployer is intelligent about what it uploads. It only uploads files whose content has changed, skipping the rest. However, since the `cache-control` attribute of a file is not considered part of its content, if you'd like to change the `cache-control` from what's in S3, it's important to use the `--force-refresh` option to ensure that all files are uploaded with fresh `cache-control` attributes.
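As a hedged sketch, suppose you want to change the non-hashed files' `cache-control` to one day (`86400` is just an example value; `DEPLOYER_CACHE_CONTROL` is described under the environment variables below):

```sh
cd deployer
# New cache-control for non-hashed files (example value: one day)
export DEPLOYER_CACHE_CONTROL=86400
# --force-refresh re-uploads everything so the new cache-control takes effect
poetry run deployer upload --force-refresh ../client/build
```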
A complete example, uploading a build to the `mdn-content-dev` bucket under the prefix `pr1234`:

```sh
export CONTENT_ROOT=/path/to/content/files
export CONTENT_TRANSLATED_ROOT=/path/to/translated-content/files
cd deployer
poetry run deployer upload --bucket mdn-content-dev --prefix pr1234 ../client/build
```
Or, equivalently, using environment variables instead of command-line options:

```sh
export CONTENT_ROOT=/path/to/content/files
export CONTENT_TRANSLATED_ROOT=/path/to/translated-content/files
export DEPLOYER_BUCKET_NAME=mdn-content-dev
export DEPLOYER_BUCKET_PREFIX=pr1234
cd deployer
poetry run deployer upload ../client/build
```
The command:

```sh
cd deployer
poetry run deployer update-lambda-functions
```

will discover every folder that contains a Lambda function, create a deployment package (zip file) for each one by running:

```sh
yarn make-package
```

and, if the deployment package is different from what is already in AWS, it will upload and publish a new version.
You just need a URL (or host name) for an Elasticsearch server and the root of the build directory. The command will trawl all `index.json` files and extract all metadata and blocks of prose, which get their HTML stripped. The command is:

```sh
cd deployer
poetry run deployer search-index --help
```
If you have built the whole site (or part of it), you simply point to it with the first argument:

```sh
poetry run deployer search-index ../client/build
```
But by default, it does not specify the Elasticsearch URL/host. You can either use:

```sh
export DEPLOYER_ELASTICSEARCH_URL=http://localhost:9200
poetry run deployer search-index ../client/build
```

...or...

```sh
poetry run deployer search-index ../client/build --url http://localhost:9200
```
Note! If you don't specify either the environment variable or the `--url` option, the script will not fail (i.e. it will exit zero). This is to make it convenient in GitHub Actions to control the execution purely based on the presence of the environment variable.
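That way, a CI step can run the command unconditionally; whether anything is indexed is governed entirely by the environment. A minimal sketch:

```sh
# With DEPLOYER_ELASTICSEARCH_URL unset and no --url, this is a no-op
# that still exits 0, so it will not fail a GitHub Actions job.
unset DEPLOYER_ELASTICSEARCH_URL
poetry run deployer search-index ../client/build
```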
The default behavior is that each day you get a different index name, e.g. `mdn_docs_20210331093714`. And then there's an alias with a more "generic" name, e.g. `mdn_docs`. It's the alias name that Kuma uses to send search queries to.
The way indexing works is that we leave the existing index and its alias in place, then we fill up a new index and once that works, we atomically "move the alias" and delete the old index. To demonstrate, consider this example timeline:
- Yesterday: index `mdn_docs_20210330093714` and alias `mdn_docs --> mdn_docs_20210330093714`
- Today:
  - create new index `mdn_docs_20210331094500`
  - populate `mdn_docs_20210331094500` (could take a long time)
  - atomically re-assign the alias `mdn_docs --> mdn_docs_20210331094500`
  - delete the old index `mdn_docs_20210330093714`
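The "atomic move" corresponds to a single call to Elasticsearch's `_aliases` API, which applies all of its actions as one atomic operation. A minimal sketch, expressed as a raw API call using the index names from the example timeline and assuming Elasticsearch on `localhost:9200`:

```sh
curl -X POST http://localhost:9200/_aliases \
  -H 'Content-Type: application/json' \
  -d '{
    "actions": [
      {"remove": {"index": "mdn_docs_20210330093714", "alias": "mdn_docs"}},
      {"add": {"index": "mdn_docs_20210331094500", "alias": "mdn_docs"}}
    ]
  }'
```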
Note, this only applies if you don't use `--update`. If you use `--update`, it will just keep adding to the existing index, whose name is based on today's date.
What this means is that there is zero downtime for the search queries. Nothing needs to be reconfigured on the Kuma side.
The default behavior is that it deletes the index first and immediately creates it again. You can switch this off by using the `--update` option. Then it will "cake on" the documents, so if something has been deleted since the last build, you would still have that "stuck" in Elasticsearch.
Deleting and re-creating the index is fast, so it's relatively safe to do often. But the indexing can take many seconds, and while indexing, Elasticsearch can only search what's been indexed so far.
An interesting pattern would be to use `--update` most of the time and only from time to time omit it for a fresh new start.
But note, if you omit the `--update` (i.e. recreate the index), search will still work. It just may find less than it will find once it's fully indexed.
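A sketch of that pattern (the schedule is only illustrative):

```sh
# Most runs: incrementally add/update documents in today's index
poetry run deployer search-index ../client/build --update

# Occasionally: omit --update to delete and re-create the index,
# which flushes any documents deleted since the last full build
poetry run deployer search-index ../client/build
```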
When you've built the files, you can analyze them to produce a Markdown comment that you can post as a PR issue comment. To do that, run:

```sh
poetry run deployer analyze-pr-build ../client/build
```
But the actions are controlled by various options. You can mix and match these:
The `--analyze-flaws` option will open each built `index.json`, look through the `.flaws`, and try to convert each flaw into a list.
The `--analyze-dangerous-content` option will analyze all the content and look for content that could be "dangerous". For example, it will list all external URLs found in the content.
The `--prefix` option refers to a prefix in the Deployer upload, i.e. what you set when you run `poetry run deployer upload --prefix=THIS`. The prefix is used to specify the proper Dev subdomain (`{prefix}.content.dev.mdn.mozit.cloud`) for the URLs of the built documents. For example, if `--prefix experiment1` is specified, it will list:

```md
## Preview URLs

- <https://experiment1.content.dev.mdn.mozit.cloud/en-US/docs/MDN/Kitchensink>
```

...assuming the only page that was built was `build/en-us/docs/mdn/kitchensink`.
Note that this assumes the PR build has been deployed to the Dev server.
The `--repo` option is useful for debugging when the PR you made wasn't on `mdn/content`. For example:

```sh
poetry run deployer analyze-pr-build ../client/build --repo peterbe/content ...
```
By default, the GitHub token is picked up from the `$GITHUB_TOKEN` environment variable, but with the `--github-token` option you can override it.
The `--pr-number` option is needed to be able to find the PR (on https://github.com/mdn/content/pulls) to post the comment to.
The `--dry-run` option is mostly useful for local development or when debugging. It determines whether to print to `stdout` what it would post as a PR issue comment.
The `--verbose` option, just like `--dry-run`, is technically part of the `deployer` command and not the `analyze-pr-build` sub-command. So put it before the `analyze-pr-build`.
This example demonstrates all the options:

```sh
poetry run deployer --verbose --dry-run analyze-pr-build ../client/build \
  --analyze-flaws --analyze-dangerous-content --github-token="xxx" \
  --repo=peterbe/content --pr-number=3
```
An important part of the `analyze-pr-build` command is that it must be easy to debug and develop further without having to rely on landing code in `main` and seeing how it worked.
The first thing you need to do is to download a `build` artifact or to simply run `yarn build` and use the `../client/build` directory. To download the artifact, go to a finished "PR Test" workflow, like https://github.com/mdn/content/pull/3381/checks?check_run_id=2169672013 for example. Near the upper right-hand corner of the content (near the "Re-run jobs" button) it says "Artifacts (1)". Download that `build.zip` file somewhere and unpack it. Now you can run:

```sh
poetry run deployer --verbose analyze-pr-build ~/Downloads/build ...
```
You can even go and get a personal access token, set `$GITHUB_TOKEN` (assuming it has the right scopes), and have it actually post the comment.
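A sketch of that end-to-end debugging flow, assuming the artifact was unpacked to `~/Downloads/build` and reusing the PR number `3381` from the example workflow URL above (the token value is a placeholder):

```sh
# A personal access token with the right scopes lets it actually post
export GITHUB_TOKEN=<personal-access-token>
poetry run deployer --verbose analyze-pr-build ~/Downloads/build \
  --analyze-flaws --analyze-dangerous-content \
  --repo=mdn/content --pr-number=3381
```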
The following environment variables are supported:

- `DEPLOYER_BUCKET_NAME` is equivalent to using `--bucket` (the default is `mdn-content-dev`)
- `DEPLOYER_BUCKET_PREFIX` is equivalent to using `--prefix` (the default is `main`)
- `DEPLOYER_NO_PROGRESSBAR` is equivalent to using `--no-progressbar` (the default is `true` if not run from a terminal or if the `CI` environment variable is `true`, like it is for GitHub Actions; otherwise the default is `false`)
- `DEPLOYER_CACHE_CONTROL` can be used to specify the `cache-control` header for all non-hashed files that are uploaded (the default is `3600`, or one hour)
- `DEPLOYER_HASHED_CACHE_CONTROL` can be used to specify the `cache-control` header for all hashed files (e.g., `main.3c12da89.chunk.js`) that are uploaded (the default is `31536000`, or one year)
- `DEPLOYER_MAX_WORKERS_PARALLEL_UPLOADS` controls the number of worker threads used when uploading (the default is `50`)
- `DEPLOYER_LOG_EACH_SUCCESSFUL_UPLOAD` will print each successful upload task to `stdout` (the default is `False`)
- `DEPLOYER_ELASTICSEARCH_URL` is used by the `search-index` command
- `CONTENT_ROOT` is equivalent to using `--content-root` (there is no default)
- `CONTENT_TRANSLATED_ROOT` is equivalent to using `--content-translated-root` (there is no default)
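As an illustration only, here is a sketch of variables one might set for a CI upload job (the values, and the `true`/`false` spelling for the booleans, are assumptions based on the defaults listed above):

```sh
# Suppress the interactive progress bar and log each upload instead
export DEPLOYER_NO_PROGRESSBAR=true
export DEPLOYER_LOG_EACH_SUCCESSFUL_UPLOAD=true
# Dial parallelism down from the default of 50, e.g. on a small runner
export DEPLOYER_MAX_WORKERS_PARALLEL_UPLOADS=10
poetry run deployer upload ../client/build
```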
You need to have `poetry` installed on your system. Now run:

```sh
cd deployer
poetry install
```

That should have installed the CLI:

```sh
cd deployer
poetry run deployer --help
```
If you want to make a PR, make sure it's formatted with `black` and passes `flake8`.

You can check that all files are `flake8` fine by running:

```sh
cd deployer
poetry run flake8 .
```

And to format all files with `black`, run:

```sh
cd deployer
poetry run black .
```