Merlin offers three containers (NVIDIA-Merlin#379)
In 22.05 and earlier, the NGC catalog provided one
container for training and a separate container for
inference.

For example, we had a `merlin-tensorflow-training` and
a `merlin-tensorflow-inference` container.

For 22.06, the training and inference software is combined
in a single container. For example, the previous
capabilities are now provided in a `merlin-tensorflow`
container.
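With this change, one image serves both training and inference. As a minimal sketch (assuming the 22.06 images are published to the NGC catalog under the names shown in the `docker/README.md` table below), pulling the combined TensorFlow image looks like:

```shell
# Hypothetical pull of the combined 22.06 image; tag availability depends on the NGC catalog.
docker pull nvcr.io/nvidia/merlin/merlin-tensorflow:22.06
```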
mikemckiernan authored Jun 14, 2022
1 parent 027587c commit a0fa064
Showing 32 changed files with 238 additions and 665 deletions.
10 changes: 5 additions & 5 deletions .pre-commit-config.yaml
```diff
@@ -1,20 +1,20 @@
 repos:
 - repo: https://github.com/timothycrosley/isort
-  rev: 5.9.3
+  rev: 5.10.1
   hooks:
   - id: isort
     additional_dependencies: [toml]
     exclude: examples/*
 - repo: https://github.com/python/black
-  rev: 21.7b0
+  rev: 22.3.0
   hooks:
   - id: black
 - repo: https://gitlab.com/pycqa/flake8
   rev: 3.9.2
   hooks:
   - id: flake8
 - repo: https://github.com/pycqa/pylint
-  rev: pylint-2.7.4
+  rev: v2.14.1
   hooks:
   - id: pylint
 #- repo: https://github.com/econchick/interrogate
@@ -28,12 +28,12 @@ repos:
   hooks:
   - id: codespell
 - repo: https://github.com/PyCQA/bandit
-  rev: 1.7.0
+  rev: 1.7.4
   hooks:
   - id: bandit
     args: [--verbose, -ll, -x, tests,examples,bench]
 - repo: https://github.com/s-weigand/flake8-nb
-  rev: v0.3.0
+  rev: v0.4.0
   hooks:
   - id: flake8-nb
     files: \.ipynb$
```
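After bumping these hook revisions, the updated linters can be exercised locally to confirm they still pass; assuming `pre-commit` is installed, a typical check is:

```shell
# Run every configured hook against the full repository, not just staged files.
pre-commit run --all-files
```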
8 changes: 0 additions & 8 deletions Release.md

This file was deleted.

149 changes: 0 additions & 149 deletions ci/versions.py

This file was deleted.

15 changes: 7 additions & 8 deletions docker/README.md
```diff
@@ -4,11 +4,10 @@ All NVIDIA Merlin components are available as open source projects. However, a m
 
 Containers allow you to package your software application, libraries, dependencies, and runtime compilers in a self-contained environment. These containers can be pulled and launched right out of the box. You can clone and adjust these containers if necessary.
 
-The table below provides a list of Dockerfiles that can be used to build the corresponding Docker container:
-
-| Container Name | Dockerfile | Container Location | Functionality |
-|----------------------------|------------------|--------------------------------------------------------------------------------|-------------------------------------------------------|
-| Merlin-training | dockerfile.ctr | https://ngc.nvidia.com/containers/nvidia:merlin:merlin-training | NVTabular and HugeCTR |
-| Merlin-tensorflow-training | dockerfile.tf | https://ngc.nvidia.com/containers/nvidia:merlin:merlin-tensorflow-training | NVTabular, TensorFlow, and HugeCTR Tensorflow Embedding plugin |
-| Merlin-pytorch-training | dockerfile.torch | https://ngc.nvidia.com/containers/nvidia:merlin:merlin-pytorch-training | NVTabular and PyTorch |
-| Merlin-inference | dockerfile.tri | https://ngc.nvidia.com/containers/nvidia:merlin:merlin-inference | NVTabular, HugeCTR, and Triton Inference |
+The following table provides a list of Dockerfiles that you can use to build the corresponding Docker container:
+
+| Container Name | Dockerfile | Container Location | Functionality |
+|----------------------|--------------------|----------------------------------------------------------------------------------------|----------------------------------------------------------------|
+| `merlin-hugectr` | `dockerfile.ctr` | <https://catalog.ngc.nvidia.com/orgs/nvidia/teams/merlin/containers/merlin-hugectr> | NVTabular and HugeCTR |
+| `merlin-tensorflow` | `dockerfile.tf` | <https://catalog.ngc.nvidia.com/orgs/nvidia/teams/merlin/containers/merlin-tensorflow> | NVTabular, TensorFlow, and HugeCTR Tensorflow Embedding plugin |
+| `merlin-pytorch` | `dockerfile.torch` | <https://catalog.ngc.nvidia.com/orgs/nvidia/teams/merlin/containers/merlin-pytorch> | NVTabular and PyTorch |
```
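Each of these images is meant to launch right out of the box; a minimal sketch, assuming a GPU-enabled Docker installation and that a 22.06 tag is available:

```shell
# Start an interactive shell in the combined TensorFlow container (tag assumed).
docker run --gpus all --rm -it \
  nvcr.io/nvidia/merlin/merlin-tensorflow:22.06 /bin/bash
```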
2 changes: 1 addition & 1 deletion docs/README.md
````diff
@@ -84,7 +84,7 @@ jq 'walk(if type == "object" then del(.cuparse) else . end)' < data.json > x
 ### View a container for a release
 
 ```shell
-jq '.["nvcr.io/nvidia/merlin/merlin-inference"]["22.03"]' < ../docs/source/data.json
+jq '.["nvcr.io/nvidia/merlin/merlin-hugectr"]["22.03"]' < ../docs/source/data.json
 ```
 
 ### List the containers and releases
````
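The same query pattern applies to the combined containers that this commit adds to `data.json`, for example:

```shell
# View the 22.05 metadata recorded for the merlin-tensorflow container.
jq '.["nvcr.io/nvidia/merlin/merlin-tensorflow"]["22.05"]' < ../docs/source/data.json
```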
119 changes: 118 additions & 1 deletion docs/data.json
```diff
@@ -1,4 +1,121 @@
 {
+  "nvcr.io/nvidia/merlin/merlin-tensorflow": {
+    "22.05": {
+      "base_container": "Triton version 22.03",
+      "compressedSize": "5.01 GB",
+      "cublas": "11.8.1.74",
+      "cuda": "11.6.1.005",
+      "cudf": "22.2.0",
+      "cudnn": "8.3.3.40+cuda11.5",
+      "cufft": "10.7.1.112",
+      "curand": "10.2.9.55",
+      "cusolver": "11.3.3.112",
+      "cusparse": "11.7.2.112",
+      "cutensor": "1.5.0.1",
+      "dgx_system": "* DGX-1\n* DGX-2\n* DGX A100\n* DGX Station",
+      "gpu_model": "* `NVIDIA Ampere GPU Architecture <https://www.nvidia.com/en-us/geforce/turing>`_\n* `Turing <https://www.nvidia.com/en-us/geforce/turing/>`_\n* `Volta <https://www.nvidia.com/en-us/data-center/volta-gpu-architecture/>`_\n* `Pascal <https://www.nvidia.com/en-us/data-center/pascal-gpu-architecture/>`_",
+      "hugectr": "Not applicable",
+      "hugectr2onnx": "Not applicable",
+      "merlin.core": "0.3.0",
+      "merlin.models": "0.4.0",
+      "merlin.systems": "0.2.0",
+      "nvidia_driver": "NVIDIA Driver version 465.19.01\nor later is required. However,\nif you're running on Data Center\nGPUs (formerly Tesla) such as T4,\nyou can use any of the following\nNVIDIA Driver versions:\n\n* 418.40 (or later R418)\n* 440.33 (or later R440)\n* 450.51 (or later R450)\n* 460.27 (or later R460)\n\n**Note**: The CUDA Driver\nCompatibility Package does not\nsupport all drivers.",
+      "nvidia_pytorch": "Not applicable",
+      "nvidia_tensorflow": "Not applicable",
+      "nvtabular": "1.1.1",
+      "openmpi": "4.1.2rc4",
+      "os": "Ubuntu 20.04.4 LTS",
+      "python_major": "3",
+      "pytorch": "Not applicable",
+      "release": "22.05",
+      "rmm": "21.12.0",
+      "size": "11.05 GB",
+      "sm": "Not applicable",
+      "sparse_operation_kit": "Not applicable",
+      "tensorrt": "8.2.3.0+cuda11.4.2.006",
+      "tf": "Not applicable",
+      "transformers4rec": "0.1.8",
+      "triton": "2.20.0"
+    }
+  },
+  "nvcr.io/nvidia/merlin/merlin-pytorch": {
+    "22.05": {
+      "base_container": "Triton version 22.04",
+      "compressedSize": "6.63 GB",
+      "cublas": "11.9.3.115",
+      "cuda": "11.6.2.010",
+      "cudf": "22.2.0",
+      "cudnn": "8.4.0.27",
+      "cufft": "10.7.2.124",
+      "curand": "10.2.9.124",
+      "cusolver": "11.3.4.124",
+      "cusparse": "11.7.2.124",
+      "cutensor": "1.5.0.3",
+      "dgx_system": "* DGX-1\n* DGX-2\n* DGX A100\n* DGX Station",
+      "gpu_model": "* `NVIDIA Ampere GPU Architecture <https://www.nvidia.com/en-us/geforce/turing>`_\n* `Turing <https://www.nvidia.com/en-us/geforce/turing/>`_\n* `Volta <https://www.nvidia.com/en-us/data-center/volta-gpu-architecture/>`_\n* `Pascal <https://www.nvidia.com/en-us/data-center/pascal-gpu-architecture/>`_",
+      "hugectr": "Not applicable",
+      "hugectr2onnx": "Not applicable",
+      "merlin.core": "0.3.0",
+      "merlin.models": "0.4.0",
+      "merlin.systems": "0.2.0",
+      "nvidia_driver": "NVIDIA Driver version 465.19.01\nor later is required. However,\nif you're running on Data Center\nGPUs (formerly Tesla) such as T4,\nyou can use any of the following\nNVIDIA Driver versions:\n\n* 418.40 (or later R418)\n* 440.33 (or later R440)\n* 450.51 (or later R450)\n* 460.27 (or later R460)\n\n**Note**: The CUDA Driver\nCompatibility Package does not\nsupport all drivers.",
+      "nvidia_pytorch": "Not applicable",
+      "nvidia_tensorflow": "Not applicable",
+      "nvtabular": "1.1.1",
+      "openmpi": "4.1.2rc4",
+      "os": "Ubuntu 20.04.4 LTS",
+      "python_major": "3",
+      "pytorch": "1.11.0+cu113",
+      "release": "22.05",
+      "rmm": "21.12.0",
+      "size": "14.37 GB",
+      "sm": "Not applicable",
+      "sparse_operation_kit": "Not applicable",
+      "tensorrt": "8.2.4.2+cuda11.4.2.006",
+      "tf": "Not applicable",
+      "transformers4rec": "0.1.8",
+      "triton": "2.21.0"
+    }
+  },
+  "nvcr.io/nvidia/merlin/merlin-hugectr": {
+    "22.05": {
+      "base_container": "Triton version 22.03",
+      "compressedSize": "5.54 GB",
+      "cublas": "11.8.1.74",
+      "cuda": "11.6.1.005",
+      "cudf": "22.2.0",
+      "cudnn": "8.3.3.40+cuda11.5",
+      "cufft": "10.7.1.112",
+      "curand": "10.2.9.55",
+      "cusolver": "11.3.3.112",
+      "cusparse": "11.7.2.112",
+      "cutensor": "1.5.0.1",
+      "dgx_system": "* DGX-1\n* DGX-2\n* DGX A100\n* DGX Station",
+      "gpu_model": "* `NVIDIA Ampere GPU Architecture <https://www.nvidia.com/en-us/geforce/turing>`_\n* `Turing <https://www.nvidia.com/en-us/geforce/turing/>`_\n* `Volta <https://www.nvidia.com/en-us/data-center/volta-gpu-architecture/>`_\n* `Pascal <https://www.nvidia.com/en-us/data-center/pascal-gpu-architecture/>`_",
+      "hugectr": "Not applicable",
+      "hugectr2onnx": "Not applicable",
+      "merlin.core": "0.3.0",
+      "merlin.models": "0.4.0",
+      "merlin.systems": "0.2.0",
+      "nvidia_driver": "NVIDIA Driver version 465.19.01\nor later is required. However,\nif you're running on Data Center\nGPUs (formerly Tesla) such as T4,\nyou can use any of the following\nNVIDIA Driver versions:\n\n* 418.40 (or later R418)\n* 440.33 (or later R440)\n* 450.51 (or later R450)\n* 460.27 (or later R460)\n\n**Note**: The CUDA Driver\nCompatibility Package does not\nsupport all drivers.",
+      "nvidia_pytorch": "Not applicable",
+      "nvidia_tensorflow": "Not applicable",
+      "nvtabular": "1.1.1",
+      "openmpi": "4.1.2rc4",
+      "os": "Ubuntu 20.04.4 LTS",
+      "python_major": "3",
+      "pytorch": "Not applicable",
+      "release": "22.05",
+      "rmm": "21.12.0",
+      "size": "12.04 GB",
+      "sm": "Not applicable",
+      "sparse_operation_kit": "Not applicable",
+      "tensorrt": "8.2.3.0+cuda11.4.2.006",
+      "tf": "Not applicable",
+      "transformers4rec": "0.1.8",
+      "triton": "2.20.0"
+    }
+  },
   "nvcr.io/nvidia/merlin/merlin-inference": {
     "21.09": {
       "base_container": "Triton version 21.07",
@@ -1417,4 +1534,4 @@
       "triton": "Not applicable"
     }
   }
-}
+}
```
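With these entries in place, the combined containers appear as top-level keys alongside the legacy ones; assuming `jq` is available, you can list every recorded container with:

```shell
# Print the top-level container keys from the data file.
jq 'keys' docs/data.json
```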
4 changes: 2 additions & 2 deletions docs/smx2rst.py
```diff
@@ -76,7 +76,7 @@ def to_rst(self, path: str):
 
         each container.
         The implementation is to iterate over the containers from
-        the JSON file and create one file for each container.
+        the `table_config.yaml` file and create one file for each container.
 
         Parameters
         ----------
@@ -91,7 +91,7 @@ def to_rst(self, path: str):
         outdir.mkdir(parents=True, exist_ok=True)
         logger.info(" ...done.")
 
-        for container in self.data.keys():
+        for container in self.table_config.keys():
            years = [
                self.release_pattern.search(x).group(1)
                for x in self.data[container].keys()
```