[REVIEW] Initial version of devel images with BlazingSQL added #118

Closed
16 commits
6 changes: 3 additions & 3 deletions README.md
@@ -11,7 +11,7 @@ This repository contains the source files for [rapidsai Docker images](https://h

## Image Types

There are currently three different types of Docker images, which follow the same conventions provided by the [NVIDIA CUDA Docker images](https://github.com/NVIDIA/nvidia-docker/wiki/CUDA), and allow users to use the RAPIDS images a drop-in replacements for their CUDA images. Each type is supported on a combination of OS, Python version, and CUDA version which produces a variety of available image types. The different types are described below:
There are currently three different types of Docker images, which follow the same conventions provided by the [NVIDIA CUDA Docker images](https://github.com/NVIDIA/nvidia-docker/wiki/CUDA), and allow users to use the RAPIDS images as drop-in replacements for their CUDA images. Each type is supported on a combination of OS, Python version, and CUDA version which produces a variety of available image types. The different types are described below:

Type | Description | Target Audience
---|---|---
@@ -37,7 +37,7 @@ Like any Docker image, the RAPIDS images can be extended to suit the needs of in

### Custom Token Example

For example, the `runtime` and `devel` images use an empty token for securing the Jupyter notebook server. While this is a fast easy solution for dev and exploratory environments, those in production environments may need more security.
For example, the `runtime` and `devel` images use an empty token for securing the Jupyter notebook server. While this is a fast, easy solution for dev and exploratory environments, those in production environments may need more security.

Using the following short `Dockerfile` users can leverage the existing RAPIDS images and build a custom secure image:

@@ -46,7 +46,7 @@ FROM rapidsai/rapidsai-nightly:cuda10.2-runtime-ubuntu18.04-py3.7
RUN sed -i "s/NotebookApp.token=''/NotebookApp.token='secure-token-here'/g" /rapids/utils/start_jupyter.sh
```

Once built, the resulting image will be secured with the new token.
Once built, the resulting image will be secured with the new token.

This example can be repurposed by replacing the `sed` command with other commands for custom libraries or settings.

Binary file removed context/libm.so.6
Binary file not shown.
37 changes: 25 additions & 12 deletions context/test.sh
@@ -4,6 +4,7 @@ RAPIDS_DIR=/rapids
NBTEST=${RAPIDS_DIR}/utils/nbtest.sh
LIBCUDF_KERNEL_CACHE_PATH=${WORKSPACE}/.jitcache
NOTEBOOKS_DIR=${RAPIDS_DIR}/notebooks
BLAZING_NOTEBOOKS_DIR=/blazing/Welcome_to_BlazingSQL_Notebooks

# Add notebooks that should be skipped here
# (space-separated list of filenames without paths)
@@ -14,28 +15,40 @@ env

EXITCODE=0

# Always run nbtest in NOTEBOOKS_DIR, set EXITCODE to failure if any run fails
cd ${NOTEBOOKS_DIR}
# Find all notebooks to run in both the RAPIDS notebook repo and the Blazing
# "welcome" NB repo.
NOTEBOOKS="$(find ${NOTEBOOKS_DIR}/repos/*/notebooks/* -name '*.ipynb') \
${BLAZING_NOTEBOOKS_DIR}/welcome.ipynb"

# Special case: cugraph notebooks need specific datasets downloaded (this script
# only downloads them if they do not exist, so it's safe to run multiple times)
(cd cugraph; ./cugraph_benchmarks/dataPrep.sh)
(cd ${NOTEBOOKS_DIR}/cugraph; ./cugraph_benchmarks/dataPrep.sh)

# Every repo is submoduled into "repos/<repo>" and notebooks have been stored
# into a "notebooks" dir, this loop finds all notebooks specifically added to CI
for nb in $(find repos/*/notebooks/* -name *.ipynb); do
for nb in ${NOTEBOOKS}; do
nbBasename=$(basename ${nb})
# Output of find command looks like this: ./repos/<repo>/notebooks/<notebook> -name
# This grabs the <repo> element, skip CLX notebooks as they are not part of the runtime images yet
nbRepo=$(echo ${nb} | awk -F/ '{print $2}')

# Notebook paths can look like the following:
# "/rapids/notebooks/repos/<repo>/notebooks/<notebook>" or
# "/blazing/Welcome_to_BlazingSQL_Notebooks/welcome.ipynb"
# The repo name is extracted by pulling out a specific field in the path.
# For Blazing, simply use the blazing "root" dir.
if [[ ${nb:0:8} == "/blazing" ]]; then
nbRepo="blazing"
else
nbRepo=$(echo ${nb} | awk -F/ '{print $5}')
fi

# Output the name of the repo. This is needed for the nbtestlog2junitxml
# script, as well as to improve output readability.
echo "========================================"
echo "REPO: ${nbRepo}"
echo "========================================"

# Skip all NBs that use dask (in the code or even in their name)
if ((echo ${nb}|grep -qi dask) || \
(grep -q dask ${nb})); then
# Skip all NBs that use dask (in the code or even in their name).
# Blazing has a comment that mentions dask, so allow blazing to run
if [[ ${nbRepo} != "blazing" ]] \
&& ((echo ${nb}|grep -qi dask) \
|| (grep -q dask ${nb})); then
echo "--------------------------------------------------------------------------------"
echo "SKIPPING: ${nb} (suspected Dask usage, not currently automatable)"
echo "--------------------------------------------------------------------------------"
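The repo-name extraction in the `test.sh` hunk above depends on a fixed field position in the notebook path. A minimal sketch of the same rule, written in Python to match `generate_dockerfiles.py`, makes the indexing explicit. The function name `repo_name` is hypothetical and not part of this PR; the sketch assumes `NOTEBOOKS_DIR` resolves to `/rapids/notebooks`, as the `RAPIDS_DIR=/rapids` setting in the script suggests.

```python
def repo_name(nb_path):
    """Return the repo label for a notebook path, mirroring the test.sh rule.

    Hypothetical helper for illustration only; test.sh does this in bash.
    """
    # Same check as the bash test ${nb:0:8} == "/blazing".
    if nb_path.startswith("/blazing"):
        return "blazing"
    # Splitting "/rapids/notebooks/repos/<repo>/notebooks/<nb>" on "/" yields an
    # empty first element, so <repo> sits at index 4 (awk's 1-based field $5).
    return nb_path.split("/")[4]
```

If `NOTEBOOKS_DIR` ever moves to a path of different depth, the field index in the `awk` call would need to change with it. Unlike `awk`, which prints an empty string for a missing field, this sketch raises `IndexError` on a path that is too short.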
22 changes: 14 additions & 8 deletions generate_dockerfiles.py
@@ -27,17 +27,14 @@ def load_settings():


def initialize_output_dir():
"""Creates or empties the OUTPUT_DIR directory"""
"""Creates the OUTPUT_DIR directory"""
if not os.path.exists(OUTPUT_DIR):
os.makedirs(OUTPUT_DIR)
return
filelist = [f for f in os.listdir(OUTPUT_DIR) if f.endswith(".Dockerfile")]
for dockerfile in filelist:
os.remove(os.path.join(OUTPUT_DIR, dockerfile))
return


def main():
def main(verbose=False):
"""Generates Dockerfiles using Jinja2"""
initialize_output_dir()
settings = load_settings()
@@ -48,10 +45,19 @@ def main():
output = template.render(
os=docker_os, image_type=image_type, now=datetime.utcnow(), **settings,
)
with open(f"{OUTPUT_DIR}/{dockerfile_name}", "w") as dockerfile:
dockerfile.write(output)
output_dockerfile_path = f"{OUTPUT_DIR}/{dockerfile_name}"
if not(os.path.exists(output_dockerfile_path)) \
or (open(output_dockerfile_path).read() != output):

with open(output_dockerfile_path, "w") as dockerfile:
dockerfile.write(output)
if verbose:
print(f"Updated: {output_dockerfile_path}")

print(f"Dockerfiles successfully written to the '{OUTPUT_DIR}' directory.")


if __name__ == "__main__":
main()
# FIXME: use argparse
import sys
main(verbose=("-v" in sys.argv))
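The `FIXME: use argparse` note and the write-only-when-changed check in this file could be sketched as below. This is one possible shape, not the PR's implementation: the helper names `write_if_changed` and `parse_args` and the long flag `--verbose` are assumptions introduced for illustration. Only the `-v` flag and the compare-then-write behavior come from the diff.

```python
import argparse
import os


def write_if_changed(path, content):
    """Write content to path only if the file is missing or differs.

    Hypothetical helper. Returns True when the file was (re)written and
    False when the existing file already matched.
    """
    if os.path.exists(path):
        # A context manager closes the handle, unlike a bare open().read().
        with open(path) as existing:
            if existing.read() == content:
                return False
    with open(path, "w") as out:
        out.write(content)
    return True


def parse_args(argv=None):
    """Parse command-line flags; argv=None means argparse reads sys.argv[1:]."""
    parser = argparse.ArgumentParser(
        description="Generates Dockerfiles using Jinja2"
    )
    parser.add_argument(
        "-v",
        "--verbose",  # assumed long form; the PR only checks for "-v"
        action="store_true",
        help="print the path of each Dockerfile that is updated",
    )
    return parser.parse_args(argv)
```

With helpers like these, the entry point could call `main(verbose=parse_args().verbose)`, and the loop body could print `Updated: ...` only when `write_if_changed` returns `True`. Skipping the write for identical output leaves the modification time of unchanged Dockerfiles alone.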
32 changes: 29 additions & 3 deletions generated-dockerfiles/centos7-devel.Dockerfile
@@ -48,7 +48,7 @@ RUN source activate rapids \
&& conda info \
&& conda config --show-sources \
&& conda list --show-channel-urls
RUN gpuci_retry conda install -y -n rapids \
RUN gpuci_conda_retry install -y -n rapids \
rapids-build-env=${RAPIDS_VER} \
rapids-doc-env=${RAPIDS_VER} \
libcumlprims=${RAPIDS_VER} \
@@ -88,8 +88,6 @@ EXPOSE 8787
EXPOSE 8786

COPY .start_jupyter_run_in_rapids.sh /.run_in_rapids
COPY libm.so.6 ${GCC7_DIR}/lib64

RUN cd ${RAPIDS_DIR} \
&& source activate rapids \
&& git clone -b branch-0.15 --depth 1 --single-branch https://github.com/rapidsai/cudf.git \
@@ -208,6 +206,34 @@ RUN cd ${RAPIDS_DIR}/dask-cuda && \

ENV LD_LIBRARY_PATH=${LD_LIBRARY_PATH_PREBUILD}

ENV BLAZING_DIR=/blazing

RUN gpuci_conda_retry install -y -n rapids \
google-cloud-cpp \
ninja \
gtest \
gmock \
cppzmq \
openjdk=8.0 \
maven \
thrift=0.13.0 \
jpype1 \
netifaces \
pyhive

ENV CUDF_HOME=/rapids/cudf

RUN mkdir -p ${BLAZING_DIR} \
&& cd ${BLAZING_DIR} \
&& git clone https://github.com/BlazingDB/blazingsql.git

RUN source activate rapids \
&& ccache -s \
&& cd ${BLAZING_DIR}/blazingsql \
&& ./build.sh
RUN mkdir -p ${BLAZING_DIR} \
&& cd ${BLAZING_DIR} \
&& git clone https://github.com/BlazingDB/Welcome_to_BlazingSQL_Notebooks.git
RUN ccache -s \
&& ccache -c \
&& chmod -R ugo+w /ccache \
31 changes: 29 additions & 2 deletions generated-dockerfiles/ubuntu18.04-devel.Dockerfile
@@ -51,7 +51,7 @@ RUN source activate rapids \
&& conda info \
&& conda config --show-sources \
&& conda list --show-channel-urls
RUN gpuci_retry conda install -y -n rapids \
RUN gpuci_conda_retry install -y -n rapids \
rapids-build-env=${RAPIDS_VER} \
rapids-doc-env=${RAPIDS_VER} \
libcumlprims=${RAPIDS_VER} \
@@ -91,7 +91,6 @@ EXPOSE 8787
EXPOSE 8786

COPY .start_jupyter_run_in_rapids.sh /.run_in_rapids

RUN cd ${RAPIDS_DIR} \
&& source activate rapids \
&& git clone -b branch-0.15 --depth 1 --single-branch https://github.com/rapidsai/cudf.git \
@@ -205,6 +204,34 @@ RUN cd ${RAPIDS_DIR}/dask-cuda && \



ENV BLAZING_DIR=/blazing

RUN gpuci_conda_retry install -y -n rapids \
google-cloud-cpp \
ninja \
gtest \
gmock \
cppzmq \
openjdk=8.0 \
maven \
thrift=0.13.0 \
jpype1 \
netifaces \
pyhive

ENV CUDF_HOME=/rapids/cudf

RUN mkdir -p ${BLAZING_DIR} \
&& cd ${BLAZING_DIR} \
&& git clone https://github.com/BlazingDB/blazingsql.git

RUN source activate rapids \
&& ccache -s \
&& cd ${BLAZING_DIR}/blazingsql \
&& ./build.sh
RUN mkdir -p ${BLAZING_DIR} \
&& cd ${BLAZING_DIR} \
&& git clone https://github.com/BlazingDB/Welcome_to_BlazingSQL_Notebooks.git
RUN ccache -s \
&& ccache -c \
&& chmod -R ugo+w /ccache \
12 changes: 6 additions & 6 deletions templates/Devel.dockerfile.j2
@@ -60,7 +60,7 @@ between packages. #}
{% include 'partials/env_debug.dockerfile.j2' %}

{# Install rapids-build-env and rapids-doc-env from conda meta-pkg #}
RUN gpuci_retry conda install -y -n rapids \
RUN gpuci_conda_retry install -y -n rapids \
rapids-build-env=${RAPIDS_VER} \
rapids-doc-env=${RAPIDS_VER} \
libcumlprims=${RAPIDS_VER} \
@@ -73,11 +73,6 @@ RUN gpuci_retry conda install -y -n rapids \

{% include 'partials/install_notebooks.dockerfile.j2' %}

{% if "centos" in os %}
{# Add compatible libm #}
COPY libm.so.6 ${GCC7_DIR}/lib64
{% endif %}

{# Clone RAPIDS libraries #}
RUN cd ${RAPIDS_DIR} \
&& source activate rapids \
@@ -107,6 +102,11 @@ numba.cuda cannot load that, and instead have it load
ENV LD_LIBRARY_PATH=${LD_LIBRARY_PATH_PREBUILD}
{% endif %}

{# Additions for BlazingSQL #}
ENV BLAZING_DIR=/blazing
{% include 'partials/clone_and_build_blazing.j2' %}
{% include 'partials/clone_blazing_notebooks.j2' %}

{# Report ccache stats and cleanup; also fix #}
RUN ccache -s \
&& ccache -c \
27 changes: 27 additions & 0 deletions templates/partials/clone_and_build_blazing.j2
@@ -0,0 +1,27 @@
{# This partial clones the BlazingSQL repo, then builds and installs it into the rapids conda env. #}

{# Install build prerequisites #}
RUN gpuci_conda_retry install -y -n rapids \
google-cloud-cpp \
ninja \
gtest \
gmock \
cppzmq \
openjdk=8.0 \
maven \
thrift=0.13.0 \
jpype1 \
netifaces \
pyhive

ENV CUDF_HOME=/rapids/cudf

{# Clone, build, install. Note: This uses the current default branch instead of main. #}
RUN mkdir -p ${BLAZING_DIR} \
&& cd ${BLAZING_DIR} \
&& git clone https://github.com/BlazingDB/blazingsql.git

RUN source activate rapids \
&& ccache -s \
&& cd ${BLAZING_DIR}/blazingsql \
&& ./build.sh
6 changes: 6 additions & 0 deletions templates/partials/clone_blazing_notebooks.j2
@@ -0,0 +1,6 @@
{# This partial clones the "Welcome to BlazingSQL" repo containing example notebooks. #}

{# Clone only; the notebooks repo needs no build or install step #}
RUN mkdir -p ${BLAZING_DIR} \
&& cd ${BLAZING_DIR} \
&& git clone https://github.com/BlazingDB/Welcome_to_BlazingSQL_Notebooks.git