
GPU inference in Docker container fails due to missing libdevice directory #2201

Closed
@MartijnVanbiervliet

Description


Bug Report

System information

  • OS Platform and Distribution: RockyLinux 9.2
  • TensorFlow Serving installed from (source or binary): Docker image
  • TensorFlow Serving version: tensorflow/serving:2.14.1-gpu
  • Docker version: 24.0.5
  • NVIDIA driver: R535 (535.86.10)
  • NVIDIA Container Toolkit: 1.13.5

Describe the problem

With the latest Docker image, tensorflow/serving:2.14.1-gpu, I cannot run GPU inference of my model. The following error appears in the logs:

2024-01-30 10:15:19.247458: W external/org_tensorflow/tensorflow/compiler/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:521] Can't find libdevice directory ${CUDA_DIR}/nvvm/libdevice. This may result in compilation or runtime failures, if the program we try to run uses routines from libdevice.
Searched for CUDA in the following directories:
  ./cuda_sdk_lib
  /usr/local/cuda-11.8
  /usr/local/cuda
  /usr/bin/../nvidia/cuda_nvcc
  /usr/bin/../../nvidia/cuda_nvcc
  .
You can choose the search directory by setting xla_gpu_cuda_data_dir in HloModule's DebugOptions.  For most apps, setting the environment variable XLA_FLAGS=--xla_gpu_cuda_data_dir=/path/to/cuda will work.
2024-01-30 10:15:19.259311: I external/org_tensorflow/tensorflow/tsl/platform/default/subprocess.cc:304] Start cannot spawn child process: No such file or directory
2024-01-30 10:15:19.260050: I external/org_tensorflow/tensorflow/tsl/platform/default/subprocess.cc:304] Start cannot spawn child process: No such file or directory
2024-01-30 10:15:19.260095: W external/org_tensorflow/tensorflow/compiler/xla/stream_executor/gpu/asm_compiler.cc:109] Couldn't get ptxas version : FAILED_PRECONDITION: Couldn't get ptxas/nvlink version string: INTERNAL: Couldn't invoke ptxas --version
...
2024-01-30 10:15:19.770155: W external/org_tensorflow/tensorflow/compiler/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:559] libdevice is required by this HLO module but was not found at ./libdevice.10.bc
error: libdevice not found at ./libdevice.10.bc

It appears that the CUDA toolkit is only partially installed: the libdevice directory does not exist in the Docker image. I expected the image to ship a complete enough CUDA installation to serve models on GPU.

I encounter no problems with tensorflow/serving:2.11.0-gpu.

I considered the following solutions before raising this issue:

Workaround

Install the cuda-toolkit package in the Docker image.

FROM tensorflow/serving:2.14.1-gpu
RUN apt-get update && apt-get install -y cuda-toolkit-11-8

This increases the size of the Docker image by ~4GB (uncompressed).
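A smaller middle ground might be to install only the nvcc component rather than the full toolkit. This is a sketch I have not verified: it assumes the NVIDIA apt repository is already configured in the base image and that the cuda-nvcc-11-8 package ships both ptxas and the nvvm/libdevice directory (which matches my understanding of the CUDA 11.8 apt packaging, but please double-check).

```Dockerfile
# Unverified sketch: assumes cuda-nvcc-11-8 provides ptxas and
# nvvm/libdevice, and that the NVIDIA apt repo is configured in the base image.
FROM tensorflow/serving:2.14.1-gpu
RUN apt-get update && \
    apt-get install -y --no-install-recommends cuda-nvcc-11-8 && \
    rm -rf /var/lib/apt/lists/*
```

If that package does contain both pieces, the resulting image should be considerably smaller than one with the full cuda-toolkit-11-8 installed.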

Alternatively, the tensorflow/serving:2.14.1-devel-gpu Docker image also works, but that image is even larger.
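Another option suggested by the log's XLA_FLAGS hint would be to bind-mount a host CUDA installation into the container. This is an untested sketch; the host path /usr/local/cuda-11.8 and the model name/path are hypothetical placeholders that depend on the host setup:

```shell
# Untested sketch: /usr/local/cuda-11.8 is a hypothetical host path; point it
# at wherever the host's CUDA toolkit (containing nvvm/libdevice) is installed.
docker run --gpus all -p 8501:8501 \
  -v /usr/local/cuda-11.8/nvvm:/usr/local/cuda/nvvm:ro \
  -e XLA_FLAGS=--xla_gpu_cuda_data_dir=/usr/local/cuda \
  -e MODEL_NAME=my_model \
  -v /path/to/my_model:/models/my_model \
  tensorflow/serving:2.14.1-gpu
```

Note that this only addresses the libdevice lookup; the "Couldn't invoke ptxas" warnings suggest the ptxas binary is also missing from the image, so it would still need to be provided some other way (e.g. via the cuda-toolkit install above).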

Exact Steps to Reproduce

docker run -u root:root -ti --entrypoint bash tensorflow/serving:2.14.1-gpu

ptxas is not available:

$ ptxas --version
bash: ptxas: command not found

Searching for a directory nvvm or libdevice returns nothing:

find / -type d -name nvvm 2>/dev/null
find / -type d -name libdevice 2>/dev/null

When using 2.11.0, it does work:

docker run -u root:root -ti --entrypoint bash tensorflow/serving:2.11.0-gpu

ptxas is available:

$ ptxas --version
ptxas: NVIDIA (R) Ptx optimizing assembler
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Sun_Feb_14_21:12:21_PST_2021
Cuda compilation tools, release 11.2, V11.2.152
Build cuda_11.2.r11.2/compiler.29618528_0

Searching for a directory nvvm returns the directory in the cuda installation directory:

$ find / -type d -name nvvm 2>/dev/null
/usr/local/cuda-11.2/nvvm

Labels

stale, stat:awaiting response, type:bug
