Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial commitment of clickhouse official image #15846

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

Felixoid
Copy link
Contributor

@Felixoid Felixoid commented Dec 6, 2023

Hello, dear official library team. Here's our attempt to create an official image clickhouse in the scope of #14136 and ClickHouse/ClickHouse#31473

Both the docs and official images PRs are created simultaneously.

docker-library/docs#2397

Checklist for Review

NOTE: This checklist is intended for the use of the Official Images maintainers both to track the status of your PR and to help inform you and others of where we're at. As such, please leave the "checking" of items to the repository maintainers. If there is a point below for which you would like to provide additional information or note completion, please do so by commenting on the PR. Thanks! (and thanks for staying patient with us ❤️)

  • associated with or contacted upstream?
  • available under an OSI-approved license?
  • does it fit into one of the common categories? ("service", "language stack", "base distribution")
  • is it reasonably popular, or does it solve a particular use case well?
  • does a documentation PR exist? (should be reviewed and merged at roughly the same time so that we don't have an empty image page on the Hub for very long) Initial commitment of documentation for clickhouse official image docs#2397
  • official-images maintainer dockerization review for best practices and cache gotchas/improvements (ala the official review guidelines)?
  • 2+ official-images maintainer dockerization review?
  • existing official images have been considered as a base? (ie, if foobar needs Node.js, has FROM node:... instead of grabbing node via other means been considered?)
  • if FROM scratch, tarballs only exist in a single commit within the associated history?
  • passes current tests? any simple new tests that might be appropriate to add? (https://github.com/docker-library/official-images/tree/master/test)

@Felixoid
Copy link
Contributor Author

Felixoid commented Dec 6, 2023

This PR is still in draft mode to adjust the automation in ClickHouse/ClickHouse#57203

The branch now is in the WIP mode, so it will be regenerated to the refs/heads/master before the merge

@Felixoid
Copy link
Contributor Author

Felixoid commented Dec 6, 2023

Interesting case, I am addressing it in our master, which will regenerate the LDF

@tianon
Copy link
Member

tianon commented Dec 7, 2023

Unfortunately having the Dockerfile in the root of the upstream repository (while useful for local development) doesn't work well as the target for official-images. Since the Dockerfile uses COPY . ..., it essentially means the entire upstream source code is now part of the Docker build context, and thus becomes part of the reviewed artifacts (see diff-pr.sh for how we generate our diff; it is basically taking all docker build contexts from the starting library/foo file and diffing them to the new contexts from the PR).

This is why the "Diff Comment" CI job is failing 🙈

@tianon
Copy link
Member

tianon commented Dec 7, 2023

Apologies, I misread the PR -- I think that might actually be failing because it's backfilling too many versions 😅

@tianon
Copy link
Member

tianon commented Dec 7, 2023

Are all of these major.minor combinations still actively supported? (By which I mean that if they had a problem, there would be a new release/update?)

@Felixoid
Copy link
Contributor Author

Felixoid commented Dec 8, 2023

Hi Tianon, thanks for the review!

Indeed, the failure of Diff Comment is about the POST body.

Are all of these major.minor combinations still actively supported? (By which I mean that if they had a problem, there would be a new release/update?)

No, the file for currently supported versions is the following:

# The file is generated by https://github.com/ClickHouse/ClickHouse/blob/e2f425011c589c5e6d8b1d4f67ccd7a8730158a4/tests/ci/official_docker.py

Maintainers: Misha f. Shiryaev <felixoid@clickhouse.com> (@Felixoid),
             Max Kainov <max.kainov@clickhouse.com> (@mkaynov),
             Alexander Sapin <alesapin@clickhouse.com> (@alesapin)
GitRepo: https://github.com/ClickHouse/ClickHouse.git
GitFetch: refs/heads/docker-library
GitCommit: 43208ec354e4873f59508c45b81204b9f7bf439e

Tags: alpine, 23-alpine, 23.11-alpine, 23.11.1-alpine, 23.11.1.2711-alpine
Architectures: amd64, arm64v8
Directory: docker/official/server/23.11.1.2711
File: Dockerfile.alpine

Tags: latest, focal, 23, 23-focal, 23.11, 23.11-focal, 23.11.1, 23.11.1-focal, 23.11.1.2711, 23.11.1.2711-focal
Architectures: amd64, arm64v8
Directory: docker/official/server/23.11.1.2711
File: Dockerfile.ubuntu

Tags: 23.10-alpine, 23.10.5-alpine, 23.10.5.20-alpine
Architectures: amd64, arm64v8
Directory: docker/official/server/23.10.5.20
File: Dockerfile.alpine

Tags: 23.10, 23.10-focal, 23.10.5, 23.10.5-focal, 23.10.5.20, 23.10.5.20-focal
Architectures: amd64, arm64v8
Directory: docker/official/server/23.10.5.20
File: Dockerfile.ubuntu

Tags: 23.9-alpine, 23.9.6-alpine, 23.9.6.20-alpine
Architectures: amd64, arm64v8
Directory: docker/official/server/23.9.6.20
File: Dockerfile.alpine

Tags: 23.9, 23.9-focal, 23.9.6, 23.9.6-focal, 23.9.6.20, 23.9.6.20-focal
Architectures: amd64, arm64v8
Directory: docker/official/server/23.9.6.20
File: Dockerfile.ubuntu

Tags: alpine-lts, 23.8-alpine, 23.8.8-alpine, 23.8.8.20-alpine
Architectures: amd64, arm64v8
Directory: docker/official/server/23.8.8.20
File: Dockerfile.alpine

Tags: lts, focal-lts, 23.8, 23.8-focal, 23.8.8, 23.8.8-focal, 23.8.8.20, 23.8.8.20-focal
Architectures: amd64, arm64v8
Directory: docker/official/server/23.8.8.20
File: Dockerfile.ubuntu

Tags: 23.3-alpine, 23.3.18-alpine, 23.3.18.15-alpine
Architectures: amd64, arm64v8
Directory: docker/official/server/23.3.18.15
File: Dockerfile.alpine

Tags: 23.3, 23.3-focal, 23.3.18, 23.3.18-focal, 23.3.18.15, 23.3.18.15-focal
Architectures: amd64, arm64v8
Directory: docker/official/server/23.3.18.15
File: Dockerfile.ubuntu

I've generated all the versions that could be built at the moment, so following the next sentence from README:

When a new repository is proposed, it is common to include some older unsupported versions in the initial pull request with the agreement to remove them right after acceptance.

It's pretty easy to rebuild the file to any version, so I am very open to advice on what to use as the minimal one.

@tianon
Copy link
Member

tianon commented Dec 11, 2023

Ah, let's start with just 23.11 (unless there's a significant difference in the Dockerization for older versions) and see if that gets the diff small enough. If we start there and the delta between that and other versions is small, then future diffs will be small too because they'll all diff against the merged version instead of all versions being considered brand new code.

@tianon
Copy link
Member

tianon commented Dec 11, 2023

For a small bit of initial Dockerization review, you'll want to take a look at #10794 (Alpine + glibc is going to be a non-starter).

Additionally, having /bin/sh or /bin/bash as PID1 while the server is running (speaking of the while true loop I see in the entrypoint which is effectively acting as a poor-man's process supervisor) is going to be a very steep uphill battle to convince us it's stable and reliable, especially in the face of receiving signals and error handling / process death edge cases.

Beyond that, please be patient -- new image review requires much higher maintainer bandwidth than update review does, so we do have a backlog.

@Felixoid
Copy link
Contributor Author

Felixoid commented Dec 11, 2023

Ah, let's start with just 23.11

Yep, all versions have the same content now

For a small bit of initial Dockerization review, you'll want to take a look at #10794 (Alpine + glibc is going to be a non-starter).

I think it's related to the part COPY --from=glibc-donor? So, it looks like we won't be able to build alpine images. Looks fair to me, we can take them off the discussion

Additionally, having /bin/sh or /bin/bash as PID1 while the server is running (speaking of the while true loop I see in the entrypoint which is effectively acting as a poor-man's process supervisor) is going to be a very steep uphill battle to convince us it's stable and reliable, especially in the face of receiving signals and error handling / process death edge cases.

I feel you are referencing https://github.com/ClickHouse/ClickHouse/blob/master/docker/server/entrypoint.sh#L132-L139. It's the bringing the clickhouse-server up for the initial setup of DB. And it has the fuse to check that the server is actually finished after it. The real server works as the only process in the container after exec command. It's similar to docker_temp_server_start, docker_process_init_files, docker_temp_server_stop

Beyond that, please be patient -- new image review requires much higher maintainer bandwidth than update review does, so we do have a backlog.

Thank you! It's very nice of you to have a transparent process 🙏

This comment has been minimized.

@Felixoid Felixoid marked this pull request as ready for review January 8, 2024 11:48
@Felixoid
Copy link
Contributor Author

Felixoid commented Jan 8, 2024

Happy New Year, dear colleagues!

Is there anything we could do from our site to boost the review, maybe?

@Felixoid
Copy link
Contributor Author

Felixoid commented Feb 9, 2024

Hello, dear @tianon @yosifkit. It's already two months since the latest update. Any feedback?

@Felixoid
Copy link
Contributor Author

Colleagues? It's already three months since the last review round. Is it even possible to add the image to the library?

@melvynator
Copy link

@tianon Is it possible to have an estimate of when this PR can be reviewed?

@whalelines
Copy link
Contributor

We cannot give an estimate, but we will review this PR as soon as we can. Reviewing new DOI submissions is quite time intensive and we have to balance that commitment with ensuring the normal flow of updates to existing DOI.

It can help to speed things up if you ensure your PR adheres to our contribution guidelines, https://github.com/docker-library/official-images?tab=readme-ov-file#contributing-to-the-standard-library .

We apologize for the delay and truly appreciate your patience through this process.

@Felixoid
Copy link
Contributor Author

Hi David,
Thanks. I reviewed the link in advance and tried my best to meet all requirements.

That's why we're waiting for feedback if there's something particular that remains necessary to improve.

@yosifkit
Copy link
Member

yosifkit commented May 2, 2024

Here is a first pass review. There might be other required changes, but I wanted to share these to lessen the wall of text.


&& if [ -n "${deb_location_url}" ]; then \
...
RUN if [ -n "${single_binary_location_url}" ]; then \
  • Since these would unused by the Docker Official Image, these essentially empty layers should probably live elsewhere so as to simplify the Dockerfile for DOI review and for users reviewing the image.

  • DEBIAN_FRONTEND is usually not necessary. In newer versions of Ubuntu and Debian, the problem is completely fixed. (ENV DEBIAN_FRONTEND noninteractive moby/moby#4032 (comment)) If there is a specific install that fails without it, it should only be applied for that install, like apt-get update; DEBIAN_FRONTEND=noninteractive apt-get install -y ....

apt-get upgrade -yq
  • We recommend against using blanket package upgrades (apt-get upgrade/apk upgrade/yum upgrade/yum update) for official-images. When package upgrades are applied in a dependent image, it duplicates content of the base image, making the image larger than necessary. It also only delays the inevitable "there are outdated packages". The Official Images build pipeline only rebuilds on a update to the Dockerfile or a base image update, so we make periodic base image updates to then fully rebuild all dependent images (e.g., the Debian and Ubuntu images are updated a least every 30 day).

    We strive to publish updated images at least monthly for Debian. We also rebuild earlier if there is a critical security need, e.g. docker-library/official-images#2171. Many Official Images are maintained by the community or their respective upstream projects, like Ubuntu, Alpine, and Oracle Linux, and are subject to their own maintenance schedule. These refreshed base images also means that any other image in the Official Images program that is FROM them will also be rebuilt (as described in the project README.md file).

    - https://github.com/docker-library/faq/tree/0ad5fd60288109c875a54a37f6581b2deaa836db#why-does-my-security-scanner-show-that-an-image-has-cves


ARG TARGETARCH
  • We do not recommend relying on TARGETARCH - if it is needed, then use dpkg -- print-architecture instead, but it looks like that is part of the "unused for Docker Official Images" block

  • gpg invocations should always include --batch (which effectively puts it into "API mode" instead of "UI mode").

  • gpg --recv-keys should use a full fingerprint (not merely short or long).

  • RUN chmod +x ... is going to duplicate the file added by COPY, so +x should be applied in Git instead.

Somewhat harmless, but good to improve:

@Felixoid
Copy link
Contributor Author

Felixoid commented May 3, 2024

Thank you @yosifkit, I'll improve it in our repository to address the points.

@Felixoid
Copy link
Contributor Author

Felixoid commented May 6, 2024

So, here is what I've come up with https://github.com/ClickHouse/ClickHouse/pull/63400/files


I'll try to isolate our specific layers so they won't be in the official Dockerfile. The TARGETARCH will also be filtered.


We use ubuntu:20.04. ARG DEBIAN_FRONTEND=noninteractive is build-specific, so it doesn't affect the final image. We follow the one-before-the-last comment in the issue you gave.

Since the image we use is affected, I'd like to leave it as is if there's no strict requirement to remove it.


apt-get upgrade -yq will go away with no regrets.


gpg parts will be fixed


COPY --chmod=0755 should be a valid replacement. Not sure since when it's available, but should work for both us and the official build system, right?


  • Nice catch of ca-certificates, I thought I cleaned everything already
  • Didn't know about the apt-get clean. But I am not sure I got your link. Is it happen automatically on every build step? Because we run apt update, the repositories indices are downloaded in the very same step. I think it can be a noop because of rm -rf /var/lib/apt/lists/* /var/cache/debconf /tmp/*.
  • It relates to our build system to easily switch on another repository. If the building process moves somewhere, we can switch it by a single argument instead of editing many places. I hope, it really doesn't harm.

@LaurentGoderre
Copy link
Member

Would it be possible to have an orphan branch that doesn't have all the rest of the code. Cloning even the docker branch is several hundred of meg and takes a while to download.

@Felixoid
Copy link
Contributor Author

Felixoid commented May 6, 2024

Hi Laurent. Do you mean for the review or regularly?

@LaurentGoderre
Copy link
Member

I mean regularly not to slow down the builds

@Felixoid
Copy link
Contributor Author

Felixoid commented May 6, 2024

Doesn't the build system use --depth=0? Despite the topic, it's a good practice to use it when the history is unnecessary.

With the argument the download is "only" 80M and 16 seconds on my poor German DSL

> git clone --depth=1 git@github.com:ClickHouse/ClickHouse.git ClickHouse
Cloning into 'ClickHouse'...
remote: Enumerating objects: 26569, done.
remote: Counting objects: 100% (26569/26569), done.
remote: Compressing objects: 100% (21855/21855), done.
remote: Total 26569 (delta 3428), reused 12207 (delta 2301), pack-reused 0
Receiving objects: 100% (26569/26569), 77.09 MiB | 12.43 MiB/s, done.
Resolving deltas: 100% (3428/3428), done.
Updating files: 100% (27655/27655), done.

I am not sure how safe will be an orphan branch in the repo. And having another repo for the docker-exclusive staff adds another layer of complexity to manage

@LaurentGoderre
Copy link
Member

ooh no., if there is a separate repo then we don't need the orphan branch

@Felixoid Felixoid requested a review from a team as a code owner May 6, 2024 15:09
@Felixoid
Copy link
Contributor Author

Felixoid commented May 6, 2024

Ok, I hope, the current state won't be merged. COPY --chmod didn't work, it requires build-kit to work

@Felixoid
Copy link
Contributor Author

Felixoid commented May 7, 2024

I've found how to patch bashbrew to fetch only necessary commits. It improves the build time by an order of magnitude for library/clickhouse docker-library/bashbrew#95

Does it make sense to temporarily use a patch from ClickHouse@2a6a967 to speed-up the builds until the bashbrew is fixed?

Copy link

github-actions bot commented May 7, 2024

Diff for 715f3b2:
diff --git a/_bashbrew-arches b/_bashbrew-arches
index 8b13789..e85a97f 100644
--- a/_bashbrew-arches
+++ b/_bashbrew-arches
@@ -1 +1,2 @@
-
+amd64
+arm64v8
diff --git a/_bashbrew-cat b/_bashbrew-cat
index bdfae4a..1593678 100644
--- a/_bashbrew-cat
+++ b/_bashbrew-cat
@@ -1 +1,9 @@
-Maintainers: New Image! :D (@docker-library-bot)
+Maintainers: Misha f. Shiryaev <felixoid@clickhouse.com> (@Felixoid), Max Kainov <max.kainov@clickhouse.com> (@mkaynov), Alexander Sapin <alesapin@clickhouse.com> (@alesapin)
+GitRepo: https://github.com/ClickHouse/ClickHouse.git
+GitFetch: refs/heads/docker-library
+GitCommit: cea7d758d3535e6026276ab24736fdd89baf5435
+
+Tags: latest, focal, 24, 24-focal, 24.4, 24.4-focal, 24.4.1, 24.4.1-focal, 24.4.1.2088, 24.4.1.2088-focal
+Architectures: amd64, arm64v8
+Directory: docker/official/server/24.4.1.2088
+File: Dockerfile.ubuntu
diff --git a/_bashbrew-list b/_bashbrew-list
index e69de29..9c19829 100644
--- a/_bashbrew-list
+++ b/_bashbrew-list
@@ -0,0 +1,10 @@
+clickhouse:24
+clickhouse:24-focal
+clickhouse:24.4
+clickhouse:24.4-focal
+clickhouse:24.4.1
+clickhouse:24.4.1-focal
+clickhouse:24.4.1.2088
+clickhouse:24.4.1.2088-focal
+clickhouse:focal
+clickhouse:latest
diff --git a/clickhouse_24.4.1.2088-focal/Dockerfile.ubuntu b/clickhouse_24.4.1.2088-focal/Dockerfile.ubuntu
new file mode 100644
index 0000000..16fccee
--- /dev/null
+++ b/clickhouse_24.4.1.2088-focal/Dockerfile.ubuntu
@@ -0,0 +1,88 @@
+FROM ubuntu:20.04
+
+# see https://github.com/moby/moby/issues/4032#issuecomment-192327844
+# It could be removed after we move on a version 23:04+
+ARG DEBIAN_FRONTEND=noninteractive
+
+# ARG for quick switch to a given ubuntu mirror
+ARG apt_archive="http://archive.ubuntu.com"
+
+# We shouldn't use `apt upgrade` to not change the upstream image. It's updated biweekly
+
+# user/group precreated explicitly with fixed uid/gid on purpose.
+# It is especially important for rootless containers: in that case entrypoint
+# can't do chown and owners of mounted volumes should be configured externally.
+# We do that in advance at the begining of Dockerfile before any packages will be
+# installed to prevent picking those uid / gid by some unrelated software.
+# The same uid / gid (101) is used both for alpine and ubuntu.
+RUN sed -i "s|http://archive.ubuntu.com|${apt_archive}|g" /etc/apt/sources.list \
+    && groupadd -r clickhouse --gid=101 \
+    && useradd -r -g clickhouse --uid=101 --home-dir=/var/lib/clickhouse --shell=/bin/bash clickhouse \
+    && apt-get update \
+    && apt-get install --yes --no-install-recommends \
+        ca-certificates \
+        locales \
+        tzdata \
+        wget \
+    && rm -rf /var/lib/apt/lists/* /var/cache/debconf /tmp/*
+
+ARG REPO_CHANNEL="stable"
+ARG REPOSITORY="deb [signed-by=/usr/share/keyrings/clickhouse-keyring.gpg] https://packages.clickhouse.com/deb ${REPO_CHANNEL} main"
+ARG VERSION="24.4.1.2088"
+ARG PACKAGES="clickhouse-client clickhouse-server clickhouse-common-static"
+
+
+# A fallback to installation from ClickHouse repository
+RUN if ! clickhouse local -q "SELECT ''" > /dev/null 2>&1; then \
+        apt-get update \
+        && apt-get install --yes --no-install-recommends \
+            apt-transport-https \
+            dirmngr \
+            gnupg2 \
+        && mkdir -p /etc/apt/sources.list.d \
+        && GNUPGHOME=$(mktemp -d) \
+        && GNUPGHOME="$GNUPGHOME" gpg --batch --no-default-keyring \
+            --keyring /usr/share/keyrings/clickhouse-keyring.gpg \
+            --keyserver hkp://keyserver.ubuntu.com:80 \
+            --recv-keys 3a9ea1193a97b548be1457d48919f6bd2b48d754 \
+        && rm -rf "$GNUPGHOME" \
+        && chmod +r /usr/share/keyrings/clickhouse-keyring.gpg \
+        && echo "${REPOSITORY}" > /etc/apt/sources.list.d/clickhouse.list \
+        && echo "installing from repository: ${REPOSITORY}" \
+        && apt-get update \
+        && for package in ${PACKAGES}; do \
+            packages="${packages} ${package}=${VERSION}" \
+        ; done \
+        && apt-get install --allow-unauthenticated --yes --no-install-recommends ${packages} || exit 1 \
+        && rm -rf \
+            /var/lib/apt/lists/* \
+            /var/cache/debconf \
+            /tmp/* \
+        && apt-get autoremove --purge -yq libksba8 \
+        && apt-get autoremove -yq \
+    ; fi
+
+# post install
+# we need to allow "others" access to clickhouse folder, because docker container
+# can be started with arbitrary uid (openshift usecase)
+RUN clickhouse-local -q 'SELECT * FROM system.build_options' \
+    && mkdir -p /var/lib/clickhouse /var/log/clickhouse-server /etc/clickhouse-server /etc/clickhouse-client \
+    && chmod ugo+Xrw -R /var/lib/clickhouse /var/log/clickhouse-server /etc/clickhouse-server /etc/clickhouse-client
+
+RUN locale-gen en_US.UTF-8
+ENV LANG en_US.UTF-8
+ENV LANGUAGE en_US:en
+ENV LC_ALL en_US.UTF-8
+ENV TZ UTC
+
+RUN mkdir /docker-entrypoint-initdb.d
+
+COPY docker_related_config.xml /etc/clickhouse-server/config.d/
+COPY entrypoint.sh /entrypoint.sh
+
+EXPOSE 9000 8123 9009
+VOLUME /var/lib/clickhouse
+
+ENV CLICKHOUSE_CONFIG /etc/clickhouse-server/config.xml
+
+ENTRYPOINT ["/entrypoint.sh"]
diff --git a/clickhouse_24.4.1.2088-focal/docker_related_config.xml b/clickhouse_24.4.1.2088-focal/docker_related_config.xml
new file mode 100644
index 0000000..3025dc2
--- /dev/null
+++ b/clickhouse_24.4.1.2088-focal/docker_related_config.xml
@@ -0,0 +1,12 @@
+<clickhouse>
+     <!-- Listen wildcard address to allow accepting connections from other containers and host network. -->
+    <listen_host>::</listen_host>
+    <listen_host>0.0.0.0</listen_host>
+    <listen_try>1</listen_try>
+
+    <!--
+    <logger>
+        <console>1</console>
+    </logger>
+    -->
+</clickhouse>
diff --git a/clickhouse_24.4.1.2088-focal/entrypoint.sh b/clickhouse_24.4.1.2088-focal/entrypoint.sh
new file mode 100755
index 0000000..79e809e
--- /dev/null
+++ b/clickhouse_24.4.1.2088-focal/entrypoint.sh
@@ -0,0 +1,221 @@
+#!/bin/bash
+
+set -eo pipefail
+shopt -s nullglob
+
+DO_CHOWN=1
+if [ "${CLICKHOUSE_DO_NOT_CHOWN:-0}" = "1" ]; then
+    DO_CHOWN=0
+fi
+
+CLICKHOUSE_UID="${CLICKHOUSE_UID:-"$(id -u clickhouse)"}"
+CLICKHOUSE_GID="${CLICKHOUSE_GID:-"$(id -g clickhouse)"}"
+
+# support --user
+if [ "$(id -u)" = "0" ]; then
+    USER=$CLICKHOUSE_UID
+    GROUP=$CLICKHOUSE_GID
+else
+    USER="$(id -u)"
+    GROUP="$(id -g)"
+    DO_CHOWN=0
+fi
+
+# set some vars
+CLICKHOUSE_CONFIG="${CLICKHOUSE_CONFIG:-/etc/clickhouse-server/config.xml}"
+
+# get CH directories locations
+DATA_DIR="$(clickhouse extract-from-config --config-file "$CLICKHOUSE_CONFIG" --key=path || true)"
+TMP_DIR="$(clickhouse extract-from-config --config-file "$CLICKHOUSE_CONFIG" --key=tmp_path || true)"
+USER_PATH="$(clickhouse extract-from-config --config-file "$CLICKHOUSE_CONFIG" --key=user_files_path || true)"
+LOG_PATH="$(clickhouse extract-from-config --config-file "$CLICKHOUSE_CONFIG" --key=logger.log || true)"
+LOG_DIR=""
+if [ -n "$LOG_PATH" ]; then LOG_DIR="$(dirname "$LOG_PATH")"; fi
+ERROR_LOG_PATH="$(clickhouse extract-from-config --config-file "$CLICKHOUSE_CONFIG" --key=logger.errorlog || true)"
+ERROR_LOG_DIR=""
+if [ -n "$ERROR_LOG_PATH" ]; then ERROR_LOG_DIR="$(dirname "$ERROR_LOG_PATH")"; fi
+FORMAT_SCHEMA_PATH="$(clickhouse extract-from-config --config-file "$CLICKHOUSE_CONFIG" --key=format_schema_path || true)"
+
+# There could be many disks declared in config
+readarray -t DISKS_PATHS < <(clickhouse extract-from-config --config-file "$CLICKHOUSE_CONFIG" --key='storage_configuration.disks.*.path' || true)
+readarray -t DISKS_METADATA_PATHS < <(clickhouse extract-from-config --config-file "$CLICKHOUSE_CONFIG" --key='storage_configuration.disks.*.metadata_path' || true)
+
+CLICKHOUSE_USER="${CLICKHOUSE_USER:-default}"
+CLICKHOUSE_PASSWORD_FILE="${CLICKHOUSE_PASSWORD_FILE:-}"
+if [[ -n "${CLICKHOUSE_PASSWORD_FILE}" && -f "${CLICKHOUSE_PASSWORD_FILE}" ]]; then
+  CLICKHOUSE_PASSWORD="$(cat "${CLICKHOUSE_PASSWORD_FILE}")"
+fi
+CLICKHOUSE_PASSWORD="${CLICKHOUSE_PASSWORD:-}"
+CLICKHOUSE_DB="${CLICKHOUSE_DB:-}"
+CLICKHOUSE_ACCESS_MANAGEMENT="${CLICKHOUSE_DEFAULT_ACCESS_MANAGEMENT:-0}"
+
+function create_directory_and_do_chown() {
+    local dir=$1
+    # check if variable not empty
+    [ -z "$dir" ] && return
+    # ensure directories exist
+    if [ "$DO_CHOWN" = "1" ]; then
+        mkdir="mkdir"
+    else
+        # if DO_CHOWN=0 it means that the system does not map root user to "admin" permissions
+        # it mainly happens on NFS mounts where root==nobody for security reasons
+        # thus mkdir MUST run with user id/gid and not from nobody that has zero permissions
+        mkdir="/usr/bin/clickhouse su "${USER}:${GROUP}" mkdir"
+    fi
+    if ! $mkdir -p "$dir"; then
+        echo "Couldn't create necessary directory: $dir"
+        exit 1
+    fi
+
+    if [ "$DO_CHOWN" = "1" ]; then
+        # ensure proper directories permissions
+        # but skip it for if directory already has proper premissions, cause recursive chown may be slow
+        if [ "$(stat -c %u "$dir")" != "$USER" ] || [ "$(stat -c %g "$dir")" != "$GROUP" ]; then
+            chown -R "$USER:$GROUP" "$dir"
+        fi
+    fi
+}
+
+create_directory_and_do_chown "$DATA_DIR"
+
+# Change working directory to $DATA_DIR in case there're paths relative to $DATA_DIR, also avoids running
+# clickhouse-server at root directory.
+cd "$DATA_DIR"
+
+for dir in "$ERROR_LOG_DIR" \
+  "$LOG_DIR" \
+  "$TMP_DIR" \
+  "$USER_PATH" \
+  "$FORMAT_SCHEMA_PATH" \
+  "${DISKS_PATHS[@]}" \
+  "${DISKS_METADATA_PATHS[@]}"
+do
+    create_directory_and_do_chown "$dir"
+done
+
+# if clickhouse user is defined - create it (user "default" already exists out of box)
+if [ -n "$CLICKHOUSE_USER" ] && [ "$CLICKHOUSE_USER" != "default" ] || [ -n "$CLICKHOUSE_PASSWORD" ] || [ "$CLICKHOUSE_ACCESS_MANAGEMENT" != "0" ]; then
+    echo "$0: create new user '$CLICKHOUSE_USER' instead 'default'"
+    cat <<EOT > /etc/clickhouse-server/users.d/default-user.xml
+    <clickhouse>
+      <!-- Docs: <https://clickhouse.com/docs/en/operations/settings/settings_users/> -->
+      <users>
+        <!-- Remove default user -->
+        <default remove="remove">
+        </default>
+
+        <${CLICKHOUSE_USER}>
+          <profile>default</profile>
+          <networks>
+            <ip>::/0</ip>
+          </networks>
+          <password>${CLICKHOUSE_PASSWORD}</password>
+          <quota>default</quota>
+          <access_management>${CLICKHOUSE_ACCESS_MANAGEMENT}</access_management>
+        </${CLICKHOUSE_USER}>
+      </users>
+    </clickhouse>
+EOT
+fi
+
+CLICKHOUSE_ALWAYS_RUN_INITDB_SCRIPTS="${CLICKHOUSE_ALWAYS_RUN_INITDB_SCRIPTS:-}"
+
+# checking $DATA_DIR for initialization
+if [ -d "${DATA_DIR%/}/data" ]; then
+    DATABASE_ALREADY_EXISTS='true'
+fi
+
+# run initialization if flag CLICKHOUSE_ALWAYS_RUN_INITDB_SCRIPTS is not empty or data directory is empty
+if [[ -n "${CLICKHOUSE_ALWAYS_RUN_INITDB_SCRIPTS}" || -z "${DATABASE_ALREADY_EXISTS}" ]]; then
+  RUN_INITDB_SCRIPTS='true'
+fi
+
+if [ -n "${RUN_INITDB_SCRIPTS}" ]; then
+    if [ -n "$(ls /docker-entrypoint-initdb.d/)" ] || [ -n "$CLICKHOUSE_DB" ]; then
+        # port is needed to check if clickhouse-server is ready for connections
+        HTTP_PORT="$(clickhouse extract-from-config --config-file "$CLICKHOUSE_CONFIG" --key=http_port --try)"
+        HTTPS_PORT="$(clickhouse extract-from-config --config-file "$CLICKHOUSE_CONFIG" --key=https_port --try)"
+
+        if [ -n "$HTTP_PORT" ]; then
+            URL="http://127.0.0.1:$HTTP_PORT/ping"
+        else
+            URL="https://127.0.0.1:$HTTPS_PORT/ping"
+        fi
+
+        # Listen only on localhost until the initialization is done
+        /usr/bin/clickhouse su "${USER}:${GROUP}" /usr/bin/clickhouse-server --config-file="$CLICKHOUSE_CONFIG" -- --listen_host=127.0.0.1 &
+        pid="$!"
+
+        # check if clickhouse is ready to accept connections
+        # will try to send ping clickhouse via http_port (max 1000 retries by default, with 1 sec timeout and 1 sec delay between retries)
+        tries=${CLICKHOUSE_INIT_TIMEOUT:-1000}
+        while ! wget --spider --no-check-certificate -T 1 -q "$URL" 2>/dev/null; do
+            if [ "$tries" -le "0" ]; then
+                echo >&2 'ClickHouse init process failed.'
+                exit 1
+            fi
+            tries=$(( tries-1 ))
+            sleep 1
+        done
+
+        clickhouseclient=( clickhouse-client --multiquery --host "127.0.0.1" -u "$CLICKHOUSE_USER" --password "$CLICKHOUSE_PASSWORD" )
+
+        echo
+
+        # create default database, if defined
+        if [ -n "$CLICKHOUSE_DB" ]; then
+            echo "$0: create database '$CLICKHOUSE_DB'"
+            "${clickhouseclient[@]}" -q "CREATE DATABASE IF NOT EXISTS $CLICKHOUSE_DB";
+        fi
+
+        for f in /docker-entrypoint-initdb.d/*; do
+            case "$f" in
+                *.sh)
+                    if [ -x "$f" ]; then
+                        echo "$0: running $f"
+                        "$f"
+                    else
+                        echo "$0: sourcing $f"
+                        # shellcheck source=/dev/null
+                        . "$f"
+                    fi
+                    ;;
+                *.sql)    echo "$0: running $f"; "${clickhouseclient[@]}" < "$f" ; echo ;;
+                *.sql.gz) echo "$0: running $f"; gunzip -c "$f" | "${clickhouseclient[@]}"; echo ;;
+                *)        echo "$0: ignoring $f" ;;
+            esac
+            echo
+        done
+
+        if ! kill -s TERM "$pid" || ! wait "$pid"; then
+            echo >&2 'Finishing of ClickHouse init process failed.'
+            exit 1
+        fi
+    fi
+else
+    echo "ClickHouse Database directory appears to contain a database; Skipping initialization"
+fi
+
+# if no args passed to `docker run` or first argument start with `--`, then the user is passing clickhouse-server arguments
+if [[ $# -lt 1 ]] || [[ "$1" == "--"* ]]; then
+    # Watchdog is launched by default, but does not send SIGINT to the main process,
+    # so the container can't be finished by ctrl+c
+    CLICKHOUSE_WATCHDOG_ENABLE=${CLICKHOUSE_WATCHDOG_ENABLE:-0}
+    export CLICKHOUSE_WATCHDOG_ENABLE
+
+    # An option for easy restarting and replacing clickhouse-server in a container, especially in Kubernetes.
+    # For example, you can replace the clickhouse-server binary to another and restart it while keeping the container running.
+    if [[ "${CLICKHOUSE_DOCKER_RESTART_ON_EXIT:-0}" -eq "1" ]]; then
+        while true; do
+            # This runs the server as a child process of the shell script:
+            /usr/bin/clickhouse su "${USER}:${GROUP}" /usr/bin/clickhouse-server --config-file="$CLICKHOUSE_CONFIG" "$@" ||:
+            echo >&2 'ClickHouse Server exited, and the environment variable CLICKHOUSE_DOCKER_RESTART_ON_EXIT is set to 1. Restarting the server.'
+        done
+    else
+        # This replaces the shell script with the server:
+        exec /usr/bin/clickhouse su "${USER}:${GROUP}" /usr/bin/clickhouse-server --config-file="$CLICKHOUSE_CONFIG" "$@"
+    fi
+fi
+
+# Otherwise, we assume the user want to run his own process, for example a `bash` shell to explore this image
+exec "$@"

@Felixoid
Copy link
Contributor Author

Kindly hope for the next review round

@tbragin
Copy link

tbragin commented Jun 17, 2024

Docker Team - please let us know if there are any updates, and thank you in advance!

@whalelines
Copy link
Contributor

Performing another review on the current changes is still in our queue. Unfortunately, our queue is quite long at present and reviews of new DOI are quite time consuming. We will get to this as soon as we possibly can. Thanks for being patient!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants