Skip to content
This repository was archived by the owner on Apr 12, 2022. It is now read-only.

Use multi-stage build and replace CMD with ENTRYPOINT #125

Merged
merged 18 commits into from
Oct 12, 2017

Conversation

dliappis
Copy link
Contributor

@dliappis dliappis commented Oct 10, 2017

This PR moves away from running the es-docker wrapper script via CMD to a new ENTRYPOINT script.

The ENTRYPOINT runs as root, on vanilla docker and most orchestration frameworks, and switches to the pre-provisioned elasticsearch user with uid=gid=1000 before exec'ing elasticsearch.

It's still possible to run another CMD, through the ENTRYPOINT, such
as /bin/bash to inspect things or directly elasticsearch -Ex.y=z[4].

As there is at least one docker orchestration framework, Openshift Open[1], that overrides the Dockerfile USER to something with random uid and gid=0, the entrypoint script conditionally skips the privilege de-escalation in this case. To achieve this, the elasticsearch.tar.gz needs to be extracted so that all files+dirs are gid=0 and have the same group permissions as the owner. This is according to RedHat's best practices described in [1].

We avoid generating another large layer due to chown -R and chgrp -R, by utilizing multi-stage builds and the new COPY --chown= feature[2], available starting with docker-ce >=17.09. This is only a build requirement, that doesn't affect the portability of the image. As a side effect we are also able to significantly clean up the Dockerfile jinja2 template.

We also introduce the optional TAKE_FILE_OWNERSHIP flag which will. in most orchestration environments, forcefully change the uid and gid of the data and logdir. This can help users bind-mounting host directories that don't provide write access either to the default user 1000 or to gid 0. This flag is not expected to receive much use in orchestration frameworks as k8s has the securityContext option[3], elsewhere volumes are typically gid=0 and g=rwx and docker+docker swarm have named volumes and storage drivers.

The PR also includes three new tests:

  1. ensure all extracted files under /usr/share/elasticsearch have the right gid.
  2. tests that elasticsearch works ok when using a bind-mounted host directory for data.
  3. tests that elasticsearch works ok also when a bind-mounted host dir has a random uid (but gid=0).

There are a number of more tests that we can add here (e.g. for the MUTATE_MOUNTS flag, testing that elasticsearch can start with --user forced to a different value), but I wanted to keep the size of this PR under control for now and add tests, if needed, in an additional PR.

[1]: https://docs.openshift.org/latest/creating_images/guidelines.html
[2]: moby/moby#34263
[3]: kubernetes/kubernetes#2630 (comment)
[4]: https://www.elastic.co/guide/en/elasticsearch/reference/current/docker.html#_d_override_the_image_8217_s_default_ulink_url_https_docs_docker_com_engine_reference_run_cmd_default_command_or_options_cmd_ulink
f674bf9

Closes: #86, #90, #114,

We need to deal with two, rather conflicting requirements #90 and

For k8s and other orchestration systems, there is no direct equivalent
to docker native volumes[1] but instead volumes are owned by root by
default. The GID at least can be influenced with `securityContext`[2],
but for lots of people it's easier to have an easier mechanism. One
solution is described in [3].

However the existing setup doesn't work if the container is executed
with a different `--user` or in Openshift where the `USER` in the
`Dockerfile` is overriden and the ENTRYPOINT/CMD is executed by a
random UID>=1024, member of 0 group.

This commit switches away from the esdocker CMD to an ENTRYPOINT.
The ENTRYPOINT (`/usr/local/bin/docker-entrypoint.sh`) combines Docker
+ Kubernetes + Openshift best practices.

The group ownership of `/usr/share/elasticsearch` is now 0, so in the
case of Openshift, or forcing `--user <other_uid>`, elasticsearch can
still run and write to bind mounted directories. The root group
doesn't grant any additional special permissions, so should somehow the
elasticsearch process in the container be compromised, it can't access/mutate
critical files from the host os, even if `/` is mounted in the container.

Optionally, a `MUTATE_MOUNTS` env var can be explicitly used to
override the UID:GID of the `data` and `logs` dirs as well, when the
ENTRYPOINT runs as the default root user (i.e. not in Openshift).

It's still possible to run another CMD, through the ENTRYPOINT, such
as `/bin/bash` to inspect things or directly `elasticsearch -Ex.y=z`[4].

Finally, CTRL-C is now handled more appropriately, when using the
image interactively.

[1]: https://success.docker.com/KBase/Different_Types_of_Volumes
[2]: kubernetes/kubernetes#2630 (comment)
[3]: #114 (comment)
[4]: https://www.elastic.co/guide/en/elasticsearch/reference/current/docker.html#_d_override_the_image_8217_s_default_ulink_url_https_docs_docker_com_engine_reference_run_cmd_default_command_or_options_cmd_ulink
Ensuring all files under /usr/share/elasticsearch have GID=0 **and**
the same group permissions as those of the user is tricky. `-R`
options of `chmod`/`chgrp` increase the image size with a large layer
and introduce slowness in the build process. `tar` doesn't give us any
option to force `gid` while extracting and of course we'd still need to
match the group permissions.

One solution is to prepare the needed files using multi-stage
builds[1] in a separate stage. Unfortunately `COPY` will always use
0:0[2] overriding `USER`, so even if prepare the files at another
stage, we'll still need to mangle ownership+perms or COPY and then
extract an archive, which again increases image size.

docker-ce 17.09 introduced the `--chown`[3] parameter, which solves
this problem.

Move code to extract elasticsearch, adjust ownership + group
permissions, install x-pack and plugins to a stage
`prep_es_files`. This allows have a very clean stage to build the
final elasticsearch image, with a clean history and reasonable size.

IMPORTANT: With this commit, the minimum docker-ce version on the building
machine needs to be `17.09`.

[1]
https://docs.docker.com/engine/userguide/eng-image/multistage-build/
[2] moby/moby#7537 (comment)
[3] moby/moby#34263
As required by Openshift Origin best practices[1] all files+directories
under our application path should have group set to 0.

[1] https://docs.openshift.com/enterprise/3.0/creating_images/guidelines.html
In order to support tests for bind-mounts, we need to make the, up to
now hardcoded, `volume: ` definitions in docker-compose configurable.

Allow overriding the default named volumes via the
`--mount-datavolume1` and `--mount-datavolume2` pytest parameters.
Now that we can configure the data volume definitions in the
docker-compose file through env vars and testinfra[1], test that
elasticsearch can start and write to the data dir.

[1] 9bdbf3a
For test_datadirs.py we also need to ensure that bind-mounted host
dirs are empty before starting tests.

Clean up local datadir1,2 directories on the host, where tests get
executed, using privilege escalation through `docker run`.
Adjust the default umask in the entrypoint so that new files are group
writable too. This is particularly important for Openshift, where a
container with a random uid may start writing to a bind-mount, then
get restarted with a new user with a different uid.
Add helper functions in toolbelt.py to abstract the tasks of creating,
deleting and chown'ing directories via bind-mounts and docker run.

Generalize fixture asserting that the bind-mounted data dir is
writable and add test to try elasticsearch access to a bind-mounted
data dir with a different uid + gid=0.

Do not store empty directories in git, tests can now created/delete
test directories as required.
@dliappis dliappis requested a review from a user October 10, 2017 14:22
@dliappis
Copy link
Contributor Author

@drewr FYI as discussed

cc @xeraa

@agc93 This PR should resolve the issue #114 you reported using Openshift Open.

@dliappis
Copy link
Contributor Author

Also FYI @tylerjl

@agc93
Copy link

agc93 commented Oct 10, 2017

@dliappis That's awesome, thanks!

Copy link

@ghost ghost left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A bold new era!

# https://docs.openshift.org/latest/creating_images/guidelines.html
if ! whoami &> /dev/null; then
if [ -w /etc/passwd ]; then
echo "${USER_NAME:-default}:x:$(id -u):0:${USER_NAME:-default} user:/usr/share/elasticsearch:/sbin/nologin" >> /etc/passwd
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this user should be called default. The default user is elasticsearch. This user is the opposite of default. It's an unusual, unexpected and custom user. Let's call it custom_user or something.

From a different angle, did you consider changing the UID of the existing elasticsearch user instead? Is that a bad idea?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From a different angle, did you consider changing the UID of the existing elasticsearch user instead? Is that a bad idea?

That's a good suggestion. One of my concerns was for people who want to inspect a running container and a new user (with something like --user <random> -ti <container_id> /bin/bash) .

But this is very much a corner case I think and testing it seems to work fine. Therefore I suggest an even bolder move: remove this entire section. I introduced it as suggested in the Openshift docs for fear of applications that want to lookup the associated user from /etc/passwd. I tried building things again now without it and every combination I tried doesn't give problems, including entering the container with an interactive shell. I also tried some of the binaries under bin/ and bin/x-pack and didn't see issues.

Adding one test case that attempts to run elasticsearch with --user=<random will cover us somewhat, but naturally we can't write test cases to execute every binary in this scenario.

I vote for removing it altogether, WDYT?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 to that. Less code is less complexity, and I'm always in favour of that.

Also, I think having an anonymous user that's identified only by UID makes it very clear that is a local, ephemeral user.

# or simply to run /bin/bash to check the image
if [[ "$1" != "eswrapper" ]]; then
if [[ "$(id -u)" == "0" ]] && [[ "$1" == *elasticsearch* ]]; then
exec chroot --userspec=1000 / "$@"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we get a comment about this very unusual use of chroot? (It's very clever but very weird).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in 9e3d43d


WORKDIR /usr/share/elasticsearch

COPY --from=prep_es_files --chown=1000:0 /usr/share/elasticsearch /usr/share/elasticsearch
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yay!!!!!!!

exec bin/elasticsearch "${es_opts[@]}"
if [[ "$(id -u)" == "0" ]]; then
# If requested and running as root, mutate the ownership of bind-mounts
if [[ -n "$MUTATE_MOUNTS" ]]; then
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

$TAKE_FILE_OWNERSHIP? It's not obvious what kind of "mutation" is going to happen.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in 74ca02a

To simulate edge cases like Openshift Open, simulate running
Elasticsearch with a random high uid while bound-mounting a host
directory that is owned by a different uid having gid=0.
This code block was added to prevent errors that there's no matching
user entry in /etc/passwd when the container runs with a random
`--user`. However all current tests show nothing (including inspecting
the image with `-ti <container-name> /bin/bash`) requires a matching
entry in `/etc/passwd` for the current uid, so less code is better.
The chroot that comes with the coreutils version of the centos:7 base
image doesn't have the --skip-chdir parameter, thus changes the CWD to
/. Using the full path `/usr/share/elasticsearch/bin/elasticsearch` is
perfectly fine, but if a user wants to override the CMD to execute
elasticsearch directly as documented currently as configuration option
D[1], `elasticsearch -E x.y=z` would work, but not `bin/elasticsearch -E
x.y=z`.

Overwrite $1 argument of a custom CMD to `elasticsearch` when using
configuration option D. so that we retain backwards compatibility with
the docs.

[1] https://www.elastic.co/guide/en/elasticsearch/reference/5.6/docker.html#_d_override_the_image_8217_s_default_ulink_url_https_docs_docker_com_engine_reference_run_cmd_default_command_or_options_cmd_ulink
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants