Merlin offers three containers (NVIDIA-Merlin#379)
In 22.05 and earlier, the NGC catalog provided one
container for training and a separate container for
inference.

For example, we had a `merlin-tensorflow-training` and
a `merlin-tensorflow-inference` container.

For 22.06, the training and inference software is combined
in a single container. For example, the previous
capabilities are now provided in a `merlin-tensorflow`
container.
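With this change, one image serves both training and inference. As a minimal sketch (assuming the 22.06 images are published to the NGC catalog under the names shown in the `docker/README.md` table below), pulling the combined TensorFlow image looks like:

```shell
# Hypothetical pull of the combined 22.06 image; tag availability depends on the NGC catalog.
docker pull nvcr.io/nvidia/merlin/merlin-tensorflow:22.06
```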
mikemckiernan authored Jun 14, 2022
1 parent 027587c commit a0fa064
Showing 32 changed files with 238 additions and 665 deletions.
10 changes: 5 additions & 5 deletions .pre-commit-config.yaml
```diff
@@ -1,20 +1,20 @@
 repos:
 - repo: https://github.com/timothycrosley/isort
-  rev: 5.9.3
+  rev: 5.10.1
   hooks:
   - id: isort
     additional_dependencies: [toml]
     exclude: examples/*
 - repo: https://github.com/python/black
-  rev: 21.7b0
+  rev: 22.3.0
   hooks:
   - id: black
 - repo: https://gitlab.com/pycqa/flake8
   rev: 3.9.2
   hooks:
   - id: flake8
 - repo: https://github.com/pycqa/pylint
-  rev: pylint-2.7.4
+  rev: v2.14.1
   hooks:
   - id: pylint
 #- repo: https://github.com/econchick/interrogate
@@ -28,12 +28,12 @@ repos:
   hooks:
   - id: codespell
 - repo: https://github.com/PyCQA/bandit
-  rev: 1.7.0
+  rev: 1.7.4
   hooks:
   - id: bandit
     args: [--verbose, -ll, -x, tests,examples,bench]
 - repo: https://github.com/s-weigand/flake8-nb
-  rev: v0.3.0
+  rev: v0.4.0
   hooks:
   - id: flake8-nb
     files: \.ipynb$
```
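After bumping these hook revisions, the updated linters can be exercised locally to confirm they still pass; assuming `pre-commit` is installed, a typical check is:

```shell
# Run every configured hook against the full repository, not just staged files.
pre-commit run --all-files
```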
8 changes: 0 additions & 8 deletions Release.md

This file was deleted.

149 changes: 0 additions & 149 deletions ci/versions.py

This file was deleted.

15 changes: 7 additions & 8 deletions docker/README.md
```diff
@@ -4,11 +4,10 @@ All NVIDIA Merlin components are available as open source projects. However, a m
 
 Containers allow you to package your software application, libraries, dependencies, and runtime compilers in a self-contained environment. These containers can be pulled and launched right out of the box. You can clone and adjust these containers if necessary.
 
-The table below provides a list of Dockerfiles that can be used to build the corresponding Docker container:
-
-| Container Name | Dockerfile | Container Location | Functionality |
-|----------------------------|------------------|--------------------------------------------------------------------------------|-------------------------------------------------------|
-| Merlin-training | dockerfile.ctr | https://ngc.nvidia.com/containers/nvidia:merlin:merlin-training | NVTabular and HugeCTR |
-| Merlin-tensorflow-training | dockerfile.tf | https://ngc.nvidia.com/containers/nvidia:merlin:merlin-tensorflow-training | NVTabular, TensorFlow, and HugeCTR Tensorflow Embedding plugin |
-| Merlin-pytorch-training | dockerfile.torch | https://ngc.nvidia.com/containers/nvidia:merlin:merlin-pytorch-training | NVTabular and PyTorch |
-| Merlin-inference | dockerfile.tri | https://ngc.nvidia.com/containers/nvidia:merlin:merlin-inference | NVTabular, HugeCTR, and Triton Inference |
+The following table provides a list of Dockerfiles that you can use to build the corresponding Docker container:
+
+| Container Name | Dockerfile | Container Location | Functionality |
+|----------------------|--------------------|----------------------------------------------------------------------------------------|----------------------------------------------------------------|
+| `merlin-hugectr` | `dockerfile.ctr` | <https://catalog.ngc.nvidia.com/orgs/nvidia/teams/merlin/containers/merlin-hugectr> | NVTabular and HugeCTR |
+| `merlin-tensorflow` | `dockerfile.tf` | <https://catalog.ngc.nvidia.com/orgs/nvidia/teams/merlin/containers/merlin-tensorflow> | NVTabular, TensorFlow, and HugeCTR Tensorflow Embedding plugin |
+| `merlin-pytorch` | `dockerfile.torch` | <https://catalog.ngc.nvidia.com/orgs/nvidia/teams/merlin/containers/merlin-pytorch> | NVTabular and PyTorch |
```
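Each of these images is meant to launch right out of the box; a minimal sketch, assuming a GPU-enabled Docker installation and that a 22.06 tag is available:

```shell
# Start an interactive shell in the combined TensorFlow container (tag assumed).
docker run --gpus all --rm -it \
  nvcr.io/nvidia/merlin/merlin-tensorflow:22.06 /bin/bash
```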
2 changes: 1 addition & 1 deletion docs/README.md
````diff
@@ -84,7 +84,7 @@ jq 'walk(if type == "object" then del(.cuparse) else . end)' < data.json > x
 ### View a container for a release
 
 ```shell
-jq '.["nvcr.io/nvidia/merlin/merlin-inference"]["22.03"]' < ../docs/source/data.json
+jq '.["nvcr.io/nvidia/merlin/merlin-hugectr"]["22.03"]' < ../docs/source/data.json
 ```
 
 ### List the containers and releases
````
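The same query pattern applies to the combined containers that this commit adds to `data.json`, for example:

```shell
# View the 22.05 metadata recorded for the merlin-tensorflow container.
jq '.["nvcr.io/nvidia/merlin/merlin-tensorflow"]["22.05"]' < ../docs/source/data.json
```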
119 changes: 118 additions & 1 deletion docs/data.json
```diff
@@ -1,4 +1,121 @@
 {
+  "nvcr.io/nvidia/merlin/merlin-tensorflow": {
+    "22.05": {
+      "base_container": "Triton version 22.03",
+      "compressedSize": "5.01 GB",
+      "cublas": "11.8.1.74",
+      "cuda": "11.6.1.005",
+      "cudf": "22.2.0",
+      "cudnn": "8.3.3.40+cuda11.5",
+      "cufft": "10.7.1.112",
+      "curand": "10.2.9.55",
+      "cusolver": "11.3.3.112",
+      "cusparse": "11.7.2.112",
+      "cutensor": "1.5.0.1",
+      "dgx_system": "* DGX-1\n* DGX-2\n* DGX A100\n* DGX Station",
+      "gpu_model": "* `NVIDIA Ampere GPU Architecture <https://www.nvidia.com/en-us/geforce/turing>`_\n* `Turing <https://www.nvidia.com/en-us/geforce/turing/>`_\n* `Volta <https://www.nvidia.com/en-us/data-center/volta-gpu-architecture/>`_\n* `Pascal <https://www.nvidia.com/en-us/data-center/pascal-gpu-architecture/>`_",
+      "hugectr": "Not applicable",
+      "hugectr2onnx": "Not applicable",
+      "merlin.core": "0.3.0",
+      "merlin.models": "0.4.0",
+      "merlin.systems": "0.2.0",
+      "nvidia_driver": "NVIDIA Driver version 465.19.01\nor later is required. However,\nif you're running on Data Center\nGPUs (formerly Tesla) such as T4,\nyou can use any of the following\nNVIDIA Driver versions:\n\n* 418.40 (or later R418)\n* 440.33 (or later R440)\n* 450.51 (or later R450)\n* 460.27 (or later R460)\n\n**Note**: The CUDA Driver\nCompatibility Package does not\nsupport all drivers.",
+      "nvidia_pytorch": "Not applicable",
+      "nvidia_tensorflow": "Not applicable",
+      "nvtabular": "1.1.1",
+      "openmpi": "4.1.2rc4",
+      "os": "Ubuntu 20.04.4 LTS",
+      "python_major": "3",
+      "pytorch": "Not applicable",
+      "release": "22.05",
+      "rmm": "21.12.0",
+      "size": "11.05 GB",
+      "sm": "Not applicable",
+      "sparse_operation_kit": "Not applicable",
+      "tensorrt": "8.2.3.0+cuda11.4.2.006",
+      "tf": "Not applicable",
+      "transformers4rec": "0.1.8",
+      "triton": "2.20.0"
+    }
+  },
+  "nvcr.io/nvidia/merlin/merlin-pytorch": {
+    "22.05": {
+      "base_container": "Triton version 22.04",
+      "compressedSize": "6.63 GB",
+      "cublas": "11.9.3.115",
+      "cuda": "11.6.2.010",
+      "cudf": "22.2.0",
+      "cudnn": "8.4.0.27",
+      "cufft": "10.7.2.124",
+      "curand": "10.2.9.124",
+      "cusolver": "11.3.4.124",
+      "cusparse": "11.7.2.124",
+      "cutensor": "1.5.0.3",
+      "dgx_system": "* DGX-1\n* DGX-2\n* DGX A100\n* DGX Station",
+      "gpu_model": "* `NVIDIA Ampere GPU Architecture <https://www.nvidia.com/en-us/geforce/turing>`_\n* `Turing <https://www.nvidia.com/en-us/geforce/turing/>`_\n* `Volta <https://www.nvidia.com/en-us/data-center/volta-gpu-architecture/>`_\n* `Pascal <https://www.nvidia.com/en-us/data-center/pascal-gpu-architecture/>`_",
+      "hugectr": "Not applicable",
+      "hugectr2onnx": "Not applicable",
+      "merlin.core": "0.3.0",
+      "merlin.models": "0.4.0",
+      "merlin.systems": "0.2.0",
+      "nvidia_driver": "NVIDIA Driver version 465.19.01\nor later is required. However,\nif you're running on Data Center\nGPUs (formerly Tesla) such as T4,\nyou can use any of the following\nNVIDIA Driver versions:\n\n* 418.40 (or later R418)\n* 440.33 (or later R440)\n* 450.51 (or later R450)\n* 460.27 (or later R460)\n\n**Note**: The CUDA Driver\nCompatibility Package does not\nsupport all drivers.",
+      "nvidia_pytorch": "Not applicable",
+      "nvidia_tensorflow": "Not applicable",
+      "nvtabular": "1.1.1",
+      "openmpi": "4.1.2rc4",
+      "os": "Ubuntu 20.04.4 LTS",
+      "python_major": "3",
+      "pytorch": "1.11.0+cu113",
+      "release": "22.05",
+      "rmm": "21.12.0",
+      "size": "14.37 GB",
+      "sm": "Not applicable",
+      "sparse_operation_kit": "Not applicable",
+      "tensorrt": "8.2.4.2+cuda11.4.2.006",
+      "tf": "Not applicable",
+      "transformers4rec": "0.1.8",
+      "triton": "2.21.0"
+    }
+  },
+  "nvcr.io/nvidia/merlin/merlin-hugectr": {
+    "22.05": {
+      "base_container": "Triton version 22.03",
+      "compressedSize": "5.54 GB",
+      "cublas": "11.8.1.74",
+      "cuda": "11.6.1.005",
+      "cudf": "22.2.0",
+      "cudnn": "8.3.3.40+cuda11.5",
+      "cufft": "10.7.1.112",
+      "curand": "10.2.9.55",
+      "cusolver": "11.3.3.112",
+      "cusparse": "11.7.2.112",
+      "cutensor": "1.5.0.1",
+      "dgx_system": "* DGX-1\n* DGX-2\n* DGX A100\n* DGX Station",
+      "gpu_model": "* `NVIDIA Ampere GPU Architecture <https://www.nvidia.com/en-us/geforce/turing>`_\n* `Turing <https://www.nvidia.com/en-us/geforce/turing/>`_\n* `Volta <https://www.nvidia.com/en-us/data-center/volta-gpu-architecture/>`_\n* `Pascal <https://www.nvidia.com/en-us/data-center/pascal-gpu-architecture/>`_",
+      "hugectr": "Not applicable",
+      "hugectr2onnx": "Not applicable",
+      "merlin.core": "0.3.0",
+      "merlin.models": "0.4.0",
+      "merlin.systems": "0.2.0",
+      "nvidia_driver": "NVIDIA Driver version 465.19.01\nor later is required. However,\nif you're running on Data Center\nGPUs (formerly Tesla) such as T4,\nyou can use any of the following\nNVIDIA Driver versions:\n\n* 418.40 (or later R418)\n* 440.33 (or later R440)\n* 450.51 (or later R450)\n* 460.27 (or later R460)\n\n**Note**: The CUDA Driver\nCompatibility Package does not\nsupport all drivers.",
+      "nvidia_pytorch": "Not applicable",
+      "nvidia_tensorflow": "Not applicable",
+      "nvtabular": "1.1.1",
+      "openmpi": "4.1.2rc4",
+      "os": "Ubuntu 20.04.4 LTS",
+      "python_major": "3",
+      "pytorch": "Not applicable",
+      "release": "22.05",
+      "rmm": "21.12.0",
+      "size": "12.04 GB",
+      "sm": "Not applicable",
+      "sparse_operation_kit": "Not applicable",
+      "tensorrt": "8.2.3.0+cuda11.4.2.006",
+      "tf": "Not applicable",
+      "transformers4rec": "0.1.8",
+      "triton": "2.20.0"
+    }
+  },
   "nvcr.io/nvidia/merlin/merlin-inference": {
     "21.09": {
       "base_container": "Triton version 21.07",
@@ -1417,4 +1534,4 @@
       "triton": "Not applicable"
     }
   }
-}
+}
```
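With these entries in place, the combined containers appear as top-level keys alongside the legacy ones; assuming `jq` is available, you can list every recorded container with:

```shell
# Print the top-level container keys from the data file.
jq 'keys' docs/data.json
```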
4 changes: 2 additions & 2 deletions docs/smx2rst.py
```diff
@@ -76,7 +76,7 @@ def to_rst(self, path: str):
 
         each container.
         The implementation is to iterate over the containers from
-        the JSON file and create one file for each container.
+        the `table_config.yaml` file and create one file for each container.
 
         Parameters
         ----------
@@ -91,7 +91,7 @@ def to_rst(self, path: str):
         outdir.mkdir(parents=True, exist_ok=True)
         logger.info(" ...done.")
 
-        for container in self.data.keys():
+        for container in self.table_config.keys():
            years = [
                self.release_pattern.search(x).group(1)
                for x in self.data[container].keys()
```