-
Notifications
You must be signed in to change notification settings - Fork 238
Use multi-stage build and replace CMD with ENTRYPOINT #125
Conversation
We need to deal with two, rather conflicting requirements #90 and For k8s and other orchestration systems, there is no direct equivalent to docker native volumes[1] but instead volumes are owned by root by default. The GID at least can be influenced with `securityContext`[2], but for lots of people it's easier to have an easier mechanism. One solution is described in [3]. However the existing setup doesn't work if the container is executed with a different `--user` or in Openshift where the `USER` in the `Dockerfile` is overriden and the ENTRYPOINT/CMD is executed by a random UID>=1024, member of 0 group. This commit switches away from the esdocker CMD to an ENTRYPOINT. The ENTRYPOINT (`/usr/local/bin/docker-entrypoint.sh`) combines Docker + Kubernetes + Openshift best practices. The group ownership of `/usr/share/elasticsearch` is now 0, so in the case of Openshift, or forcing `--user <other_uid>`, elasticsearch can still run and write to bind mounted directories. The root group doesn't grant any additional special permissions, so should somehow the elasticsearch process in the container be compromised, it can't access/mutate critical files from the host os, even if `/` is mounted in the container. Optionally, a `MUTATE_MOUNTS` env var can be explicitly used to override the UID:GID of the `data` and `logs` dirs as well, when the ENTRYPOINT runs as the default root user (i.e. not in Openshift). It's still possible to run another CMD, through the ENTRYPOINT, such as `/bin/bash` to inspect things or directly `elasticsearch -Ex.y=z`[4]. Finally, CTRL-C is now handled more appropriately, when using the image interactively. [1]: https://success.docker.com/KBase/Different_Types_of_Volumes [2]: kubernetes/kubernetes#2630 (comment) [3]: #114 (comment) [4]: https://www.elastic.co/guide/en/elasticsearch/reference/current/docker.html#_d_override_the_image_8217_s_default_ulink_url_https_docs_docker_com_engine_reference_run_cmd_default_command_or_options_cmd_ulink
Ensuring all files under /usr/share/elasticsearch have GID=0 **and** the same group permissions as those of the user is tricky. `-R` options of `chmod`/`chgrp` increase the image size with a large layer and introduce slowness in the build process. `tar` doesn't give us any option to force `gid` while extracting and of course we'd still need to match the group permissions. One solution is to prepare the needed files using multi-stage builds[1] in a separate stage. Unfortunately `COPY` will always use 0:0[2] overriding `USER`, so even if prepare the files at another stage, we'll still need to mangle ownership+perms or COPY and then extract an archive, which again increases image size. docker-ce 17.09 introduced the `--chown`[3] parameter, which solves this problem. Move code to extract elasticsearch, adjust ownership + group permissions, install x-pack and plugins to a stage `prep_es_files`. This allows have a very clean stage to build the final elasticsearch image, with a clean history and reasonable size. IMPORTANT: With this commit, the minimum docker-ce version on the building machine needs to be `17.09`. [1] https://docs.docker.com/engine/userguide/eng-image/multistage-build/ [2] moby/moby#7537 (comment) [3] moby/moby#34263
As required by Openshift Origin best practices[1] all files+directories under our application path should have group set to 0. [1] https://docs.openshift.com/enterprise/3.0/creating_images/guidelines.html
In order to support tests for bind-mounts, we need to make the, up to now hardcoded, `volume: ` definitions in docker-compose configurable. Allow overriding the default named volumes via the `--mount-datavolume1` and `--mount-datavolume2` pytest parameters.
Now that we can configure the data volume definitions in the docker-compose file through env vars and testinfra[1], test that elasticsearch can start and write to the data dir. [1] 9bdbf3a
For test_datadirs.py we also need to ensure that bind-mounted host dirs are empty before starting tests. Clean up local datadir1,2 directories on the host, where tests get executed, using privilege escalation through `docker run`.
Adjust the default umask in the entrypoint so that new files are group writable too. This is particularly important for Openshift, where a container with a random uid may start writing to a bind-mount, then get restarted with a new user with a different uid.
Add helper functions in toolbelt.py to abstract the tasks of creating, deleting and chown'ing directories via bind-mounts and docker run. Generalize fixture asserting that the bind-mounted data dir is writable and add test to try elasticsearch access to a bind-mounted data dir with a different uid + gid=0. Do not store empty directories in git, tests can now created/delete test directories as required.
Also FYI @tylerjl |
@dliappis That's awesome, thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A bold new era!
# https://docs.openshift.org/latest/creating_images/guidelines.html | ||
if ! whoami &> /dev/null; then | ||
if [ -w /etc/passwd ]; then | ||
echo "${USER_NAME:-default}:x:$(id -u):0:${USER_NAME:-default} user:/usr/share/elasticsearch:/sbin/nologin" >> /etc/passwd |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this user should be called default
. The default user is elasticsearch
. This user is the opposite of default. It's an unusual, unexpected and custom user. Let's call it custom_user
or something.
From a different angle, did you consider changing the UID of the existing elasticsearch
user instead? Is that a bad idea?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From a different angle, did you consider changing the UID of the existing elasticsearch user instead? Is that a bad idea?
That's a good suggestion. One of my concerns was for people who want to inspect a running container and a new user (with something like --user <random> -ti <container_id> /bin/bash
) .
But this is very much a corner case I think and testing it seems to work fine. Therefore I suggest an even bolder move: remove this entire section. I introduced it as suggested in the Openshift docs for fear of applications that want to lookup the associated user from /etc/passwd
. I tried building things again now without it and every combination I tried doesn't give problems, including entering the container with an interactive shell. I also tried some of the binaries under bin/
and bin/x-pack
and didn't see issues.
Adding one test case that attempts to run elasticsearch with --user=<random
will cover us somewhat, but naturally we can't write test cases to execute every binary in this scenario.
I vote for removing it altogether, WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 to that. Less code is less complexity, and I'm always in favour of that.
Also, I think having an anonymous user that's identified only by UID makes it very clear that is a local, ephemeral user.
# or simply to run /bin/bash to check the image | ||
if [[ "$1" != "eswrapper" ]]; then | ||
if [[ "$(id -u)" == "0" ]] && [[ "$1" == *elasticsearch* ]]; then | ||
exec chroot --userspec=1000 / "$@" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we get a comment about this very unusual use of chroot? (It's very clever but very weird).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed in 9e3d43d
|
||
WORKDIR /usr/share/elasticsearch | ||
|
||
COPY --from=prep_es_files --chown=1000:0 /usr/share/elasticsearch /usr/share/elasticsearch |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yay!!!!!!!
exec bin/elasticsearch "${es_opts[@]}" | ||
if [[ "$(id -u)" == "0" ]]; then | ||
# If requested and running as root, mutate the ownership of bind-mounts | ||
if [[ -n "$MUTATE_MOUNTS" ]]; then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
$TAKE_FILE_OWNERSHIP
? It's not obvious what kind of "mutation" is going to happen.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed in 74ca02a
To simulate edge cases like Openshift Open, simulate running Elasticsearch with a random high uid while bound-mounting a host directory that is owned by a different uid having gid=0.
This code block was added to prevent errors that there's no matching user entry in /etc/passwd when the container runs with a random `--user`. However all current tests show nothing (including inspecting the image with `-ti <container-name> /bin/bash`) requires a matching entry in `/etc/passwd` for the current uid, so less code is better.
The chroot that comes with the coreutils version of the centos:7 base image doesn't have the --skip-chdir parameter, thus changes the CWD to /. Using the full path `/usr/share/elasticsearch/bin/elasticsearch` is perfectly fine, but if a user wants to override the CMD to execute elasticsearch directly as documented currently as configuration option D[1], `elasticsearch -E x.y=z` would work, but not `bin/elasticsearch -E x.y=z`. Overwrite $1 argument of a custom CMD to `elasticsearch` when using configuration option D. so that we retain backwards compatibility with the docs. [1] https://www.elastic.co/guide/en/elasticsearch/reference/5.6/docker.html#_d_override_the_image_8217_s_default_ulink_url_https_docs_docker_com_engine_reference_run_cmd_default_command_or_options_cmd_ulink
This PR moves away from running the
es-docker
wrapper script viaCMD
to a newENTRYPOINT
script.The
ENTRYPOINT
runs as root, on vanilla docker and most orchestration frameworks, and switches to the pre-provisionedelasticsearch
user with uid=gid=1000 before exec'ingelasticsearch
.It's still possible to run another
CMD
, through theENTRYPOINT
, suchas
/bin/bash
to inspect things or directlyelasticsearch -Ex.y=z
[4].As there is at least one docker orchestration framework, Openshift Open[1], that overrides the Dockerfile
USER
to something with random uid and gid=0, the entrypoint script conditionally skips the privilege de-escalation in this case. To achieve this, theelasticsearch.tar.gz
needs to be extracted so that all files+dirs are gid=0 and have the same group permissions as the owner. This is according to RedHat's best practices described in [1].We avoid generating another large layer due to
chown -R
andchgrp -R
, by utilizing multi-stage builds and the newCOPY --chown=
feature[2], available starting withdocker-ce >=17.09
. This is only a build requirement, that doesn't affect the portability of the image. As a side effect we are also able to significantly clean up the Dockerfile jinja2 template.We also introduce the optional
TAKE_FILE_OWNERSHIP
flag which will. in most orchestration environments, forcefully change the uid and gid of the data and logdir. This can help users bind-mounting host directories that don't provide write access either to the default user 1000 or to gid 0. This flag is not expected to receive much use in orchestration frameworks as k8s has the securityContext option[3], elsewhere volumes are typically gid=0 and g=rwx and docker+docker swarm have named volumes and storage drivers.The PR also includes three new tests:
/usr/share/elasticsearch
have the right gid.There are a number of more tests that we can add here (e.g. for the
MUTATE_MOUNTS
flag, testing that elasticsearch can start with--user
forced to a different value), but I wanted to keep the size of this PR under control for now and add tests, if needed, in an additional PR.[1]: https://docs.openshift.org/latest/creating_images/guidelines.html
[2]: moby/moby#34263
[3]: kubernetes/kubernetes#2630 (comment)
[4]: https://www.elastic.co/guide/en/elasticsearch/reference/current/docker.html#_d_override_the_image_8217_s_default_ulink_url_https_docs_docker_com_engine_reference_run_cmd_default_command_or_options_cmd_ulink
f674bf9
Closes: #86, #90, #114,