Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure HPCX dependencies found in container #5922

Merged
merged 6 commits into from
Jun 11, 2023
Merged
Changes from 5 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
17 changes: 17 additions & 0 deletions build.py
Original file line number Diff line number Diff line change
Expand Up @@ -1102,6 +1102,16 @@ def dockerfile_prepare_container_linux(argmap, backends, enable_gpu,
if 'onnxruntime' in backends:
df += '''
ENV LD_LIBRARY_PATH /opt/tritonserver/backends/onnxruntime:${LD_LIBRARY_PATH}
'''

# Necessary for libtorch.so to find correct HPCX libraries
if ('pytorch' in backends):
if not enable_gpu:
df += '''
ENV LD_LIBRARY_PATH /opt/hpcx/ucc/lib/:${LD_LIBRARY_PATH}
'''
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we remove this part? Otherwise looks good.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We cannot, or the CPU-only image will not be able to find libucc.so.1 (in the /opt/hpcx/ucc/lib folder). With the other version of these changes, it wasn't necessary since the library was added to the backend directory, which was on the RPATH. Since this is now copying over to the /opt/hpcx/ folder, the CPU-only build needs to know where to find library or it will have the previous failure to load.

If easier to read, I can combine this with the other LD_LIBRARY_PATH. I broke it up like this though, because I didn't want to modify the LD_LIBRARY_PATH in GPU images since we don't have to.

Copy link
Member

@Tabrizian Tabrizian Jun 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we only include ucc but not ucx in cpu-only build. Is that a typo?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's why I'm a bit confused.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We include both. There's no else statement. For the CPU-only build, we need to both the ucc and ucx on the path. For the GPU-only build, we only need the ucx path (since the ucc is found automatically).

df += '''
ENV LD_LIBRARY_PATH /opt/hpcx/ucx/lib/:${LD_LIBRARY_PATH}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wondering whether opencv is ok with this order. I think this means that opencv would pick up the newer hpcx version.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think so. It looks like the tests that failed in the pipeline were failing before. I'm unable to find an executable to check for opencv_dev to make sure it's linking against the right, but the apt repo isn't part of this variable anyway so perhaps it's set as part of the apt install.

'''

backend_dependencies = ""
Expand Down Expand Up @@ -1208,6 +1218,13 @@ def dockerfile_prepare_container_linux(argmap, backends, enable_gpu,
COPY --from=min_container /usr/local/cuda-12.1/targets/{cuda_arch}-linux/lib/libnvToolsExt.so.1 /usr/local/cuda/targets/{cuda_arch}-linux/lib/.
COPY --from=min_container /usr/local/cuda-12.1/targets/{cuda_arch}-linux/lib/libnvJitLink.so.12 /usr/local/cuda/targets/{cuda_arch}-linux/lib/.

RUN mkdir -p /opt/hpcx/ucc/lib/ /opt/hpcx/ucx/lib/
COPY --from=min_container /opt/hpcx/ucc/lib/libucc.so.1 /opt/hpcx/ucc/lib/libucc.so.1
COPY --from=min_container /opt/hpcx/ucx/lib/libucm.so.0 /opt/hpcx/ucx/lib/libucm.so.0
COPY --from=min_container /opt/hpcx/ucx/lib/libucp.so.0 /opt/hpcx/ucx/lib/libucp.so.0
COPY --from=min_container /opt/hpcx/ucx/lib/libucs.so.0 /opt/hpcx/ucx/lib/libucs.so.0
COPY --from=min_container /opt/hpcx/ucx/lib/libuct.so.0 /opt/hpcx/ucx/lib/libuct.so.0

COPY --from=min_container /usr/lib/{libs_arch}-linux-gnu/libcudnn.so.8 /usr/lib/{libs_arch}-linux-gnu/libcudnn.so.8

# patchelf is needed to add deps of libcublasLt.so.12 to libtorch_cuda.so
Expand Down