diff --git a/docs/.gitignore b/docs/.gitignore new file mode 100644 index 0000000000..42afabfd2a --- /dev/null +++ b/docs/.gitignore @@ -0,0 +1 @@ +/build \ No newline at end of file diff --git a/docs/Dockerfile.docs b/docs/Dockerfile.docs new file mode 100644 index 0000000000..52cc014577 --- /dev/null +++ b/docs/Dockerfile.docs @@ -0,0 +1,54 @@ +# Copyright 2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved. +# +# Redistribution and use in source and binary forms, with or without +# modification, are permitted provided that the following conditions +# are met: +# * Redistributions of source code must retain the above copyright +# notice, this list of conditions and the following disclaimer. +# * Redistributions in binary form must reproduce the above copyright +# notice, this list of conditions and the following disclaimer in the +# documentation and/or other materials provided with the distribution. +# * Neither the name of NVIDIA CORPORATION nor the names of its +# contributors may be used to endorse or promote products derived +# from this software without specific prior written permission. +# +# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS ``AS IS'' AND ANY +# EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR +# PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR +# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, +# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, +# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR +# PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY +# OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT +# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +FROM ubuntu:20.04 + +# various documentation dependencies +RUN apt-get update && \ + apt-get install -y --no-install-recommends python3-pip python3-dev git \ + git-lfs zip doxygen build-essential unzip wget pandoc ssh && \ + rm -rf /var/lib/apt/lists/* + +# install protobuf +ADD https://github.com/google/protobuf/releases/download/v3.6.1/protoc-3.6.1-linux-x86_64.zip ./ +RUN unzip protoc-3.6.1-linux-x86_64.zip -d ./usr/local && \ + rm protoc-3.6.1-linux-x86_64.zip + +# install pseudomuto/protoc-gen-doc +RUN wget https://github.com/pseudomuto/protoc-gen-doc/releases/download/v1.3.2/protoc-gen-doc-1.3.2.linux-amd64.go1.12.6.tar.gz && \ + tar xzf protoc-gen-doc-1.3.2.linux-amd64.go1.12.6.tar.gz && \ + mv protoc-gen-doc-1.3.2.linux-amd64.go1.12.6/protoc-gen-doc /usr/local/bin/ + +# install sphinx et al +RUN pip3 install --upgrade "sphinx>=3.5,<5" "nbclient<0.6,>=0.2" \ + docutils==0.16 ablog myst-nb rst-to-myst nbsphinx==0.8.8 \ + sphinx-book-theme==0.3.2 sphinx-copybutton sphinx-design sphinx-prompt \ + sphinxcontrib-bibtex sphinx-tabs==3.2.0 \ + exhale==0.2.3 breathe==4.14.1 sphinx-sitemap ipython + +# Set visitor script to be included on every HTML page +ENV VISITS_COUNTING_SCRIPT="//assets.adobedtm.com/b92787824f2e0e9b68dc2e993f9bd995339fe417/satelliteLib-7ba51e58dc61bcb0e9311aadd02a0108ab24cc6c.js" + diff --git a/docs/Makefile b/docs/Makefile new file mode 100644 index 0000000000..9a2abe880c --- /dev/null +++ b/docs/Makefile @@ -0,0 +1,53 @@ +# Copyright 2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved. 
+# +# Redistribution and use in source and binary forms, with or without +# modification, are permitted provided that the following conditions +# are met: +# * Redistributions of source code must retain the above copyright +# notice, this list of conditions and the following disclaimer. +# * Redistributions in binary form must reproduce the above copyright +# notice, this list of conditions and the following disclaimer in the +# documentation and/or other materials provided with the distribution. +# * Neither the name of NVIDIA CORPORATION nor the names of its +# contributors may be used to endorse or promote products derived +# from this software without specific prior written permission. +# +# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS ``AS IS'' AND ANY +# EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR +# PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR +# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, +# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, +# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR +# PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY +# OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT +# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +# Minimal makefile for Sphinx documentation +# + +# You can set these variables from the command line, and also +# from the environment for the first two. +SPHINXOPTS ?= +SPHINXBUILD ?= sphinx-build +SOURCEDIR = . +BUILDDIR = build + +#PROTOBUFFILES = $(wildcard ../triton/proto/*.proto) + +# Put it first so that "make" without argument is like "make help". +help: + @$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O) + +.PHONY: help Makefile + +# protobuf: source/reference/protos/gen_proto_doc.sh +# cd source/reference/protos && \ +# rm -f *.proto.rst && \ +# bash -x ./gen_proto_doc.sh $(PROTOBUFFILES:%=../%) + +# Catch-all target: route all unknown targets to Sphinx using the new +# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS). 
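+#
+# For example, with the Sphinx toolchain from Dockerfile.docs installed, a
+# typical invocation might look like this (a sketch, not an exhaustive list
+# of targets):
+#
+#   make html     # render the documentation into build/html via sphinx-build -M
+#   make clean    # remove everything under the build directory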
+%: + @$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O) diff --git a/docs/_static/.gitattributes b/docs/_static/.gitattributes new file mode 100644 index 0000000000..04865f126a --- /dev/null +++ b/docs/_static/.gitattributes @@ -0,0 +1,2 @@ +nvidia-logo-horiz-rgb-blk-for-screen.png filter=lfs diff=lfs merge=lfs -text +nvidia-logo-vert-rgb-blk-for-screen.png filter=lfs diff=lfs merge=lfs -text diff --git a/docs/_static/custom.css b/docs/_static/custom.css new file mode 100644 index 0000000000..4b9d6dc8ec --- /dev/null +++ b/docs/_static/custom.css @@ -0,0 +1,292 @@ +@font-face { + font-family: "NVIDIA Sans"; + src: url(https://aws1.discourse-cdn.com/nvidia/original/3X/5/2/52891dda673228d54e5d57bf1e4a3880d4b22405.woff2) format("woff2"), + url(https://aws1.discourse-cdn.com/nvidia/original/3X/e/0/e090b7dda7a582522c7f9045c6ce949cce60134f.woff) format("woff"); + font-weight: 300; + font-style: normal; +} +@font-face { + font-family: "NVIDIA Sans"; + src: url(https://aws1.discourse-cdn.com/nvidia/original/3X/a/1/a107baabcbf6b241099122336bce7429bcfd377a.woff2) format("woff2"), + url(https://aws1.discourse-cdn.com/nvidia/original/3X/3/a/3a6060a4e3bce70e5552ba0de8af4b22c6cf9144.woff) format("woff"); + font-weight: 300; + font-style: italic; +} +@font-face { + font-family: "NVIDIA Sans"; + src: url(https://aws1.discourse-cdn.com/nvidia/original/3X/9/9/9920d2b172b01d92fc9c1c0e521dcf45b59c47c3.woff2) format("woff2"), + url(https://aws1.discourse-cdn.com/nvidia/original/3X/6/c/6c7d947928a7e4ef3e80ed409bef6c243f2148cb.woff) format("woff"); + font-weight: 400; + font-style: normal; +} +@font-face { + font-family: "NVIDIA Sans"; + src: url(https://aws1.discourse-cdn.com/nvidia/original/3X/e/8/e8e63fe1244372cd942d957f44a5616a1eba0644.woff2) format("woff2"), + url(https://aws1.discourse-cdn.com/nvidia/original/3X/0/f/0f1fb2af0283ab09d36e7097bb07d895c3228f12.woff) format("woff"); + font-weight: 400; + font-style: italic; +} +@font-face { + font-family: "NVIDIA Sans"; + src: url(https://aws1.discourse-cdn.com/nvidia/original/3X/7/9/79d3c513a9cd72c59f65354f39f89ca52dc17dd2.woff2) format("woff2"), + url(https://aws1.discourse-cdn.com/nvidia/original/3X/2/5/2581ac533f5d01f4985d8a7245b0766b4630ced8.woff) format("woff"); + font-weight: 500; + font-style: normal; +} +@font-face { + font-family: "NVIDIA Sans"; + src: url(https://aws1.discourse-cdn.com/nvidia/original/3X/3/9/39d9ef1ee9770dd503f19bb2ace2fdb4eff3bb50.woff2) format("woff2"), + url(https://aws1.discourse-cdn.com/nvidia/original/3X/7/b/7bb5d5e2e71b2e13c8098b2e67c0a0ed9258e6c7.woff) format("woff"); + font-weight: 500; + font-style: italic; +} +@font-face { + font-family: "NVIDIA Sans"; + src: url(https://aws1.discourse-cdn.com/nvidia/original/3X/0/5/05276a55a43eb3f74981ec1e93252727afcd9d16.woff2) format("woff2"), + url(https://aws1.discourse-cdn.com/nvidia/original/3X/9/c/9cfec7ed941b06564aa4d5ca14610e81542d070f.woff) format("woff"); + font-weight: 700; + font-style: normal; +} +@font-face { + font-family: "NVIDIA Sans"; + src: url(https://aws1.discourse-cdn.com/nvidia/original/3X/a/e/aebd14d09ba56f541e1b8735fb051e33710f9ae7.woff2) format("woff2"), + url(https://aws1.discourse-cdn.com/nvidia/original/3X/e/d/edbdabef43acc5c12e84a94baaa5542c9404cfeb.woff) format("woff"); + font-weight: 700; + font-style: italic; +} + +/* Custom Styles */ +:root { +--pst-font-size-base: none; +--pst-color-primary: 0, 133, 197; +--pst-color-admonition-note: var(--pst-color-primary); +--pst-color-admonition-default: var(--pst-color-primary); 
+--pst-color-info: 255, 193, 7; +--pst-color-admonition-tip: var(--pst-color-info); +--pst-color-admonition-hint: var(--pst-color-info); +--pst-color-admonition-important: var(--pst-color-info); +--pst-color-warning: 245, 162, 82; +--pst-color-danger: 230, 101, 129; +--pst-color-admonition-warning: var(--pst-color-danger); +--pst-color-link: 118, 185, 0; +--pst-color-inline-code: 92, 22, 130; +--font-family-sans-serif: NVIDIA Sans, Helvetica, Arial, Sans-serif; +--pst-font-family-base-system: NVIDIA Sans, Helvetica, Arial, Sans-serif; +font-family: NVIDIA Sans, Helvetica, Arial, Sans-serif; +} + +.prev-next-area { + font-size: small; +} + +.docutils caption { + caption-side: top; +} + +#site-navigation h1.site-logo { + font-size: 0.85em; +} + +/* colors +nv green 118,185,0 +black 0, 0, 0 +light gray 205, 205, 205 +medium gray 140, 140, 140 +dark gray 94, 94, 94 + +emerald 0, 133, 100 +emerald #008564 +amethyst 92, 22, 130 +amethyst #5C1682 +cpu blue 0, 133, 197 +cpu blue #0085C5 +garnet 137, 12, 88 +garnet 890C58 +fluorite 250, 194, 0 +fluorite FAC200 +*/ + +:root { + --nv-green: #76b900; + --nv-green-darken: #6ead00; + --emerald: #008564; + --emerald-darken: #017c5d; + --amethyst: #5d1682; + --amethyst-darken: #4c116b; + --cpu-blue: #0071c5; + --cpu-blue-darken: #0062ad; + --garnet: #890c58; + --garnet-darken: #7a0c4e; + --fluorite: #fac200; + --fluorite-darken: #e4b301; + --dark-gray: #5e5e5e; + --light-gray: #cdcdcd; + --medium-gray: #8c8c8c; + --medium-gray-darken: #8c8c8cde; + --primary: #76b900; + --secondary: #008564; + --success: #5d1682; + --info: #0071c5; + --warning: #fac200; + --danger: #890c58; +} + +/* Riva TBYB (ASR and TTS) Styling */ +.demo-box { + background-color: rgb(245,245,245); +} +a:link { text-decoration: none; } +.scrollable { + height: 125px; + overflow-y: auto; + font-size: 1.3rem; +} +.dot { + height: 8px; + width: 8px; + background-color: rgb(228, 77, 77); + border-radius: 50%; + display: inline-block; +} +.timer { + font-size: 80%; + text-transform: uppercase; + white-space: nowrap; +} +.form-select { + border-radius: 0%; + font-size: 80%; +} +.form-control { + border-radius: 0%; +} +.input-group-text { + border-radius: 0%; + font-size: 80%; + text-transform: uppercase; + background-color: rgb(245,245,245); +} +.card { + border-radius: 0%; +} +.speech-control { + border-top-width: 0px; +} +.btn { + border-radius: 0%; + font-size: 80%; + text-transform: uppercase; + white-space: nowrap; + min-width: 125px; +} +.btn-primary { + background-color: var(--nv-green); + border-color: var(--nv-green); +} +.btn-primary:hover { + background-color: var(--nv-green-darken); + border-color: var(--nv-green-darken); +} +.btn-primary:focus, .btn-primary.focus { + background-color: var(--nv-green-darken); + border-color: var(--nv-green-darken); + -webkit-box-shadow: 0 0 0 0.2rem rgba(147, 173, 102, 0.5); + box-shadow: 0 0 0 0.2rem rgba(147, 173, 102, 0.5); +} +.btn-primary.disabled, .btn-primary:disabled { + background-color: var(--nv-green); + border-color: var(--nv-green); +} +.btn-primary:not(:disabled):not(.disabled):active, .btn-primary:not(:disabled):not(.disabled).active, +.show > .btn-primary.dropdown-toggle { + background-color: var(--nv-green-darken); + border-color: var(--nv-green-darken); +} +.btn-primary:not(:disabled):not(.disabled):active:focus, .btn-primary:not(:disabled):not(.disabled).active:focus, +.show > .btn-primary.dropdown-toggle:focus { + -webkit-box-shadow: 0 0 0 0.2rem rgba(147, 173, 102, 0.5); + box-shadow: 0 0 0 0.2rem rgba(147, 173, 102, 0.5); +} 
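+/* The .btn-secondary states below reuse the medium-gray values declared in
+   the :root palette above (--medium-gray, --medium-gray-darken). */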
+.btn-secondary { + background-color: var(--medium-gray); + border-color: var(--medium-gray); +} +.btn-secondary:hover { + background-color: var(--medium-gray-darken); + border-color: var(--medium-gray-darken); +} +.btn-secondary:focus, .btn-secondary.focus { + background-color: var(--medium-gray-darken); + border-color: var(--medium-gray-darken); + -webkit-box-shadow: 0 0 0 0.2rem rgba(140, 140, 140, 0.5); + box-shadow: 0 0 0 0.2rem rgba(140, 140, 140, 0.5); +} +.btn-secondary.disabled, .btn-secondary:disabled { + background-color: var(--medium-gray); + border-color: var(--medium-gray); +} +.btn-secondary:not(:disabled):not(.disabled):active, .btn-secondary:not(:disabled):not(.disabled).active, +.show > .btn-secondary.dropdown-toggle { + background-color: var(--medium-gray-darken); + border-color: var(--medium-gray-darken); +} +.btn-secondary:not(:disabled):not(.disabled):active:focus, .btn-secondary:not(:disabled):not(.disabled).active:focus, +.show > .btn-secondary.dropdown-toggle:focus { + -webkit-box-shadow: 0 0 0 0.2rem rgba(140, 140, 140, 0.5); + box-shadow: 0 0 0 0.2rem rgba(140, 140, 140, 0.5); +} +.btn-link { + color: var(--nv-green); + text-decoration-line: none; +} +.btn-link:hover { + color: var(--nv-green-darken); +} +.btn-link:focus, .btn-link.focus { + color: var(--nv-green-darken); + -webkit-box-shadow: 0 0 0 0.2rem rgba(147, 173, 102, 0.5); + box-shadow: 0 0 0 0.2rem rgba(147, 173, 102, 0.5); +} +.link-primary { + color: var(--nv-green); +} +.link-primary:hover { + color: var(--nv-green-darken); +} + +/* Riva ASR Styles */ +#riva-upload-label { + margin-top: 0.5rem; +} + +/* Riva TTS Styles */ +.tts-control { + justify-content: space-between; + align-items: center; +} + +.tts-control > p { + margin: unset; +} + +#riva-tts-field { + resize: none; + border: unset; + padding: 0; + height: 100%; + font-size: 1.0rem; +} + +#riva-terms-of-use p { + max-width: 620px; +} + +/* Media Queries */ +@media (max-width: 1024px) { + + /* Riva TTS and ASR */ + .scrollable { + height: 250px; + } +} + diff --git a/docs/_static/logo_2color_horizontal.svg b/docs/_static/logo_2color_horizontal.svg new file mode 100644 index 0000000000..5ab0442d32 --- /dev/null +++ b/docs/_static/logo_2color_horizontal.svg @@ -0,0 +1,2 @@ + + diff --git a/docs/_static/logo_2color_vertical.svg b/docs/_static/logo_2color_vertical.svg new file mode 100644 index 0000000000..69e64b7001 --- /dev/null +++ b/docs/_static/logo_2color_vertical.svg @@ -0,0 +1,2 @@ + + diff --git a/docs/_static/nvidia-logo-horiz-rgb-blk-for-screen.png b/docs/_static/nvidia-logo-horiz-rgb-blk-for-screen.png new file mode 100644 index 0000000000..6316a9340f --- /dev/null +++ b/docs/_static/nvidia-logo-horiz-rgb-blk-for-screen.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:dd57ffce985e08c97c6af5fdadd2a28e4a92996455edc2d0598dd964cca51eae +size 48928 diff --git a/docs/_static/nvidia-logo-vert-rgb-blk-for-screen.png b/docs/_static/nvidia-logo-vert-rgb-blk-for-screen.png new file mode 100644 index 0000000000..5546c1b57d --- /dev/null +++ b/docs/_static/nvidia-logo-vert-rgb-blk-for-screen.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:17a25111e145aa52b77ec5a89eb3b0c7d9a2a90dea25a0bb867a937514fc783c +size 63541 diff --git a/docs/_static/rtd-data.js b/docs/_static/rtd-data.js new file mode 100644 index 0000000000..3f3468eb01 --- /dev/null +++ b/docs/_static/rtd-data.js @@ -0,0 +1,8 @@ +// Dummy data for testing ReadTheDocs footer insertion +// This mimics RTD data for a project that uses both 
versions + languages +var READTHEDOCS_DATA = { + project: "frc-docs", + version: "latest", + language: "en", + proxied_api_host: "https://readthedocs.org", +}; diff --git a/docs/_templates/layout.html b/docs/_templates/layout.html new file mode 100644 index 0000000000..35cde06fb9 --- /dev/null +++ b/docs/_templates/layout.html @@ -0,0 +1,4 @@ +{% extends "!layout.html" %} +{%- block footer %} + +{%- endblock %} diff --git a/docs/conf.py b/docs/conf.py new file mode 100644 index 0000000000..9b4ea426d5 --- /dev/null +++ b/docs/conf.py @@ -0,0 +1,248 @@ +# Copyright 2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved. +# +# Redistribution and use in source and binary forms, with or without +# modification, are permitted provided that the following conditions +# are met: +# * Redistributions of source code must retain the above copyright +# notice, this list of conditions and the following disclaimer. +# * Redistributions in binary form must reproduce the above copyright +# notice, this list of conditions and the following disclaimer in the +# documentation and/or other materials provided with the distribution. +# * Neither the name of NVIDIA CORPORATION nor the names of its +# contributors may be used to endorse or promote products derived +# from this software without specific prior written permission. +# +# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS ``AS IS'' AND ANY +# EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR +# PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR +# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, +# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, +# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR +# PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY +# OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT +# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +# Configuration file for the Sphinx documentation builder. +# +# This file only contains a selection of the most common options. For a full +# list see the documentation: +# https://www.sphinx-doc.org/en/master/usage/configuration.html + +# -- Path setup -------------------------------------------------------------- + +# If extensions (or modules to document with autodoc) are in another directory, +# add these directories to sys.path here. If the directory is relative to the +# documentation root, use os.path.abspath to make it absolute, like shown here. +# +import os + +from docutils import nodes +from sphinx import search + +# import sys +# sys.path.insert(0, os.path.abspath('.')) + + +# -- Project information ----------------------------------------------------- + +project = 'NVIDIA Triton Inference Server' +copyright = '2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved' +author = 'NVIDIA' + +# The full version, including alpha/beta/rc tags +# Env only set during riva-release process, otherwise keep as dev for all internal builds +release = os.getenv("TRITON_VERSION", "dev") + +# maintain left-side bar toctrees in `contents` file +# so it doesn't show up needlessly in the index page +master_doc = "contents" + +# -- General configuration --------------------------------------------------- + +# Add any Sphinx extension module names here, as strings. 
They can be +# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom +# ones. +extensions = [ + "ablog", + "myst_nb", + "sphinx_copybutton", + "sphinx_design", + "sphinx-prompt", + #"sphinxcontrib.bibtex", + "sphinx_tabs.tabs", + "sphinx_sitemap", +] + +suppress_warnings = ["myst.domains", "ref.ref"] + +numfig = True + +# final location of docs for seo/sitemap +html_baseurl = 'https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/' + +myst_enable_extensions = [ + "dollarmath", + "amsmath", + "deflist", + # "html_admonition", + # "html_image", + "colon_fence", + # "smartquotes", + "replacements", + # "linkify", + "substitution", +] +myst_heading_anchors = 5 + +# Add any paths that contain templates here, relative to this directory. +templates_path = ['_templates'] + +# List of patterns, relative to source directory, that match files and +# directories to ignore when looking for source files. +# This pattern also affects html_static_path and html_extra_path. +exclude_patterns = [ + "README.md" +] + + +# -- Options for HTML output ------------------------------------------------- + +# The theme to use for HTML and HTML Help pages. See the documentation for +# a list of builtin themes. +# +html_theme = "sphinx_book_theme" +html_logo = "_static/nvidia-logo-horiz-rgb-blk-for-screen.png" +html_title = "NVIDIA Triton Inference Server" +html_short_title = "Triton" +html_copy_source = True +html_sourcelink_suffix = "" +html_favicon = "_static/nvidia-logo-vert-rgb-blk-for-screen.png" +html_last_updated_fmt = "" +html_additional_files = ["index.html"] + +# Add any paths that contain custom static files (such as style sheets) here, +# relative to this directory. They are copied after the builtin static files, +# so a file named "default.css" will overwrite the builtin "default.css". 
+html_static_path = ['_static'] +html_css_files = ["custom.css"] + +html_theme_options = { + "path_to_docs": "docs", + # "launch_buttons": { + # "binderhub_url": "https://mybinder.org", + # "colab_url": "https://colab.research.google.com/", + # "deepnote_url": "https://deepnote.com/", + # "notebook_interface": "jupyterlab", + # "thebe": True, + # # "jupyterhub_url": "https://datahub.berkeley.edu", # For testing + # }, + "use_edit_page_button": False, + "use_issues_button": True, + "use_repository_button": True, + "use_download_button": False, + "logo_only": False, + "show_toc_level": 2, + "extra_navbar": "", + "extra_footer": "", + "repository_url": "https://github.com/triton-inference-server/server", + "use_repository_button": True, +} + +version_short = release +deploy_ngc_org = "nvidia" +deploy_ngc_team = "triton" +myst_substitutions = { + "VersionNum": version_short, + "deploy_ngc_org_team": f"{deploy_ngc_org}/{deploy_ngc_team}" if deploy_ngc_team else deploy_ngc_org, +} + + +def ultimateReplace(app, docname, source): + result = source[0] + for key in app.config.ultimate_replacements: + result = result.replace(key, app.config.ultimate_replacements[key]) + source[0] = result + + +# this is a necessary hack to allow us to fill in variables that exist in code blocks +ultimate_replacements = { + "{VersionNum}": version_short, + "{SamplesVersionNum}": version_short, + "{NgcOrgTeam}": f"{deploy_ngc_org}/{deploy_ngc_team}" if deploy_ngc_team else deploy_ngc_org, +} + +#bibtex_bibfiles = ["references.bib"] +# To test that style looks good with common bibtex config +#bibtex_reference_style = "author_year" +#bibtex_default_style = "plain" + +### We currrently use Myst: https://myst-nb.readthedocs.io/en/latest/use/execute.html +jupyter_execute_notebooks = "off" # Global execution disable +# execution_excludepatterns = ['tutorials/tts-python-basics.ipynb'] # Individual notebook disable + + +def setup(app): + app.add_config_value('ultimate_replacements', {}, True) + app.connect('source-read', ultimateReplace) + app.add_js_file("https://js.hcaptcha.com/1/api.js") + + visitor_script = "//assets.adobedtm.com/5d4962a43b79/c1061d2c5e7b/launch-191c2462b890.min.js" + + if visitor_script: + app.add_js_file(visitor_script) + + # if not os.environ.get("READTHEDOCS") and not os.environ.get("GITHUB_ACTIONS"): + # app.add_css_file( + # "https://assets.readthedocs.org/static/css/readthedocs-doc-embed.css" + # ) + # app.add_css_file("https://assets.readthedocs.org/static/css/badge_only.css") + + # # Create the dummy data file so we can link it + # # ref: https://github.com/readthedocs/readthedocs.org/blob/bc3e147770e5740314a8e8c33fec5d111c850498/readthedocs/core/static-src/core/js/doc-embed/footer.js # noqa: E501 + # app.add_js_file("rtd-data.js") + # app.add_js_file( + # "https://assets.readthedocs.org/static/javascript/readthedocs-doc-embed.js", + # priority=501, + # ) + + +# Patch for sphinx.search stemming short terms (i.e. 
tts -> tt) +# https://github.com/sphinx-doc/sphinx/blob/4.5.x/sphinx/search/__init__.py#L380 +def sphinxSearchIndexFeed(self, docname: str, filename: str, title: str, doctree: nodes.document): + """Feed a doctree to the index.""" + self._titles[docname] = title + self._filenames[docname] = filename + + visitor = search.WordCollector(doctree, self.lang) + doctree.walk(visitor) + + # memoize self.lang.stem + def stem(word: str) -> str: + try: + return self._stem_cache[word] + except KeyError: + self._stem_cache[word] = self.lang.stem(word).lower() + return self._stem_cache[word] + + _filter = self.lang.word_filter + + for word in visitor.found_title_words: + stemmed_word = stem(word) + if len(stemmed_word) > 3 and _filter(stemmed_word): + self._title_mapping.setdefault(stemmed_word, set()).add(docname) + elif _filter(word): # stemmer must not remove words from search index + self._title_mapping.setdefault(word.lower(), set()).add(docname) + + for word in visitor.found_words: + stemmed_word = stem(word) + # again, stemmer must not remove words from search index + if len(stemmed_word) <= 3 or not _filter(stemmed_word) and _filter(word): + stemmed_word = word.lower() + already_indexed = docname in self._title_mapping.get(stemmed_word, set()) + if _filter(stemmed_word) and not already_indexed: + self._mapping.setdefault(stemmed_word, set()).add(docname) + + +search.IndexBuilder.feed = sphinxSearchIndexFeed diff --git a/docs/contents.md b/docs/contents.md new file mode 100644 index 0000000000..5741a9fcc5 --- /dev/null +++ b/docs/contents.md @@ -0,0 +1,99 @@ + + +```{toctree} +:maxdepth: 1 +:caption: Getting Started + +getting_started/quickstart +``` + +```{toctree} +:maxdepth: 1 +:caption: User Guide + +user_guide/performance_tuning +user_guide/architecture +user_guide/model_repository +customization_guide/repository_agents +user_guide/model_configuration +user_guide/optimization +user_guide/ragged_batching +user_guide/rate_limiter +user_guide/model_analyzer +user_guide/perf_analyzer +user_guide/model_management +user_guide/custom_operations +user_guide/decoupled_models +user_guide/response_cache +user_guide/metrics +user_guide/trace +user_guide/jetson +user_guide/v1_to_v2 +``` + +```{toctree} +:maxdepth: 1 + +user_guide/faq +``` + +```{toctree} +:maxdepth: 1 +:caption: Protocol Guides + +protocol/README.md +customization_guide/inference_protocols +protocol/extension_binary_data +protocol/extension_classification +protocol/extension_logging +protocol/extension_model_configuration +protocol/extension_model_repository +protocol/extension_schedule_policy +protocol/extension_sequence +protocol/extension_shared_memory +protocol/extension_statistics +protocol/extension_trace +``` + +```{toctree} +:maxdepth: 1 +:caption: Customization Guide + +customization_guide/build +customization_guide/compose +customization_guide/test +``` + +```{toctree} +:maxdepth: 1 +:caption: Examples + +examples/jetson/README +examples/jetson/concurrency_and_dynamic_batching/README +``` \ No newline at end of file diff --git a/docs/customization_guide/build.md b/docs/customization_guide/build.md index cf9ac119e4..cb01f7a662 100644 --- a/docs/customization_guide/build.md +++ b/docs/customization_guide/build.md @@ -47,37 +47,37 @@ The Triton source is distributed across multiple GitHub repositories that together can be built and installed to create a complete Triton installation. Triton server is built using CMake and (optionally) Docker. To simplify the build process, Triton provides a -[build.py](../../build.py) script. 
The build.py script will generate the -CMake and Docker build steps required to build Triton, and will -optionally invoke those steps or leave the invocation to you, as -described below. +[build.py](https://github.com/triton-inference-server/server/blob/main/build.py) script. +The build.py script will generate the CMake and Docker build steps required to +build Triton, and will optionally invoke those steps or leave the invocation to +you, as described below. The build.py script currently supports building Triton for the following platforms. See [Building on Unsupported Platforms](#building-on-unsupported-platforms) if you are attempting to build Triton on a platform that is not listed here. -* [Ubuntu 20.04, x86-64](#ubuntu) +* [Ubuntu 20.04, x86-64](#building-for-ubuntu-2004) -* [Jetpack 4.x, NVIDIA Jetson (Xavier, Nano, TX2)](#jetpack) +* [Jetpack 4.x, NVIDIA Jetson (Xavier, Nano, TX2)](#building-for-jetpack-4x) -* [Windows 10, x86-64](#windows) +* [Windows 10, x86-64](#building-for-windows-10) If you are developing or debugging Triton, see [Development and Incremental Builds](#development-and-incremental-builds) for information on how to perform incremental build. -## Building for Ubuntu 20.04 +## Building for Ubuntu 20.04 For Ubuntu-20.04, build.py supports both a Docker build and a non-Docker build. -* [Build using Docker](#ubuntu-docker) and the TensorFlow and PyTorch +* [Build using Docker](#building-with-docker) and the TensorFlow and PyTorch Docker images from [NVIDIA GPU Cloud (NGC)](https://ngc.nvidia.com). -* [Build without Docker](#ubuntu-without-docker). +* [Build without Docker](#building-without-docker). -### Building With Docker +### Building With Docker The easiest way to build Triton is to use Docker. The result of the build will be a Docker image called *tritonserver* that will contain @@ -202,7 +202,7 @@ flag. This is needed since the CPU-only builds of the TensorFlow and PyTorch backends require some CUDA stubs and runtime dependencies that are not present in the CPU-only base container. -### Building Without Docker +### Building Without Docker To build Triton without using Docker you must install the build dependencies that are handled automatically when building with Docker. @@ -239,7 +239,7 @@ Triton. $ ./build.py -v --no-container-build --build-dir=`pwd`/build --enable-all ``` -See [Building with Docker](#ubuntu-docker) for more details on how the +See [Building with Docker](#building-with-docker) for more details on how the cmake_build script is used to perform the build. #### CUDA, cuBLAS, cuDNN @@ -267,18 +267,18 @@ For a given version of Triton you can attempt to build with non-supported versions of TensorRT but you may have build or execution issues since non-supported versions are not tested. -## Building for JetPack 4.x +## Building for JetPack 4.x *Under Construction* -## Building for Windows 10 +## Building for Windows 10 For Windows 10, build.py supports both a Docker build and a non-Docker -build in a similar way as described for [Ubuntu](#ubuntu). The primary +build in a similar way as described for [Ubuntu](#building-for-ubuntu-2004). The primary difference is that the minimal/base image used as the base of Dockerfile.buildbase image can be built from the provided -[Dockerfile.win10.min](../../Dockerfile.win10.min) file as described in -[Windows 10 "Min" Image](#windows-10-min-image). 
When running build.py +[Dockerfile.win10.min](https://github.com/triton-inference-server/server/blob/main/Dockerfile.win10.min) +file as described in [Windows 10 "Min" Image](#windows-10-min-image). When running build.py use the --image flag to specify the tag that you assigned to this image. For example, --image=base,win10-py3-min. @@ -296,7 +296,7 @@ step. The "min" container describes the base dependencies needed to perform the Windows build. The Windows min container is -[Dockerfile.win10.min](../../Dockerfile.win10.min). +[Dockerfile.win10.min](https://github.com/triton-inference-server/server/blob/main/Dockerfile.win10.min). Before building the min container you must download the appropriate cuDNN and TensorRT versions and place them in the same directory as @@ -360,7 +360,7 @@ dependencies that were used for the build. Building for an unsupported OS and/or hardware platform is possible. All of the build scripting, Dockerfiles and CMake invocations are included in the public repos or are generated by -build.py as described in [Building with Docker](#ubuntu-docker). From +build.py as described in [Building with Docker](#building-with-docker). From these files you can find the required dependencies and CMake invocations. However, due to differences in compilers, libraries, package management, etc. you may have to make changes in the build @@ -428,14 +428,14 @@ and cmake_build or the equivalent commands to perform a build. ### Development Builds Without Docker -If you are [building without Docker](#ubuntu-without-docker) use the +If you are [building without Docker](#building-without-docker) use the CMake invocation steps in cmake_build to invoke CMake to set-up a build environment where you can invoke make/msbuild.exe to incremental build the Triton core, a backend, or a repository agent. ### Development Builds With Docker -If you are [building with Docker](#ubuntu-docker), the generated +If you are [building with Docker](#building-with-docker), the generated *tritonserver_buildbase* image contains all the dependencies needed to perform a full or incremental build. Within *tritonserver_buildbase*, /workspace/build/cmake_build contains the CMake invocations that are diff --git a/docs/customization_guide/inference_protocols.md b/docs/customization_guide/inference_protocols.md index 71f0745910..03b9d31e68 100644 --- a/docs/customization_guide/inference_protocols.md +++ b/docs/customization_guide/inference_protocols.md @@ -135,13 +135,13 @@ All capabilities of Triton server are encapsulated in the shared library and are exposed via the Server API. The `tritonserver` executable implements HTTP/REST and GRPC endpoints and uses the Server API to communicate with core Triton logic. The primary source files -for the endpoints are [grpc_server.cc](../../src/grpc_server.cc) and -[http_server.cc](../../src/http_server.cc). In these source files you can +for the endpoints are [grpc_server.cc](https://github.com/triton-inference-server/server/blob/main/src/grpc_server.cc) and +[http_server.cc](https://github.com/triton-inference-server/server/blob/main/src/http_server.cc). In these source files you can see the Server API being used. You can use the Server API in your own application as well. A simple example using the Server API can be found in -[simple.cc](../../src/simple.cc). +[simple.cc](https://github.com/triton-inference-server/server/blob/main/src/simple.cc). ### API Description @@ -166,7 +166,7 @@ all of the features and capabilities of Triton. 
A `TRITONSERVER_Server` object is created by calling `TRITONSERVER_ServerNew` with a set of options that indicate how the object should be initialized. Use of `TRITONSERVER_ServerNew` is -demonstrated in [simple.cc](../../src/simple.cc). Once you have created a +demonstrated in [simple.cc](https://github.com/triton-inference-server/server/blob/main/src/simple.cc). Once you have created a `TRITONSERVER_Server` object, you can begin using the rest of the Server API as described below. @@ -185,12 +185,12 @@ the Server API function. As a result, your application is responsible for managing the lifecycle of the returned `TRITONSERVER_Error` object. You must delete the error object using `TRITONSERVER_ErrorDelete` when you are done using it. Macros such as -`FAIL_IF_ERR` shown in [common.h](../../src/common.h) are useful for +`FAIL_IF_ERR` shown in [common.h](https://github.com/triton-inference-server/server/blob/main/src/common.h) are useful for managing error object lifetimes. #### Versioning and Backwards Compatibility -A typical pattern, demonstrated in [simple.cc](../../src/simple.cc) and +A typical pattern, demonstrated in [simple.cc](https://github.com/triton-inference-server/server/blob/main/src/simple.cc) and shown below, shows how you can compare the Server API version provided by the shared library against the Server API version that you compiled your application against. The Server API is backwards compatible, so @@ -218,14 +218,14 @@ The Server API contains functions for checking health and readiness, getting model information, getting model statistics and metrics, loading and unloading models, etc. The use of these functions is straightforward and some of these functions are demonstrated in -[simple.cc](../../src/simple.cc) and all are documented in +[simple.cc](https://github.com/triton-inference-server/server/blob/main/src/simple.cc) and all are documented in [tritonserver.h](https://github.com/triton-inference-server/core/blob/main/include/triton/core/tritonserver.h). #### Inference APIs Performing an inference request requires the use of many Server API functions and objects, as demonstrated in -[simple.cc](../../src/simple.cc). The general usage requires the +[simple.cc](https://github.com/triton-inference-server/server/blob/main/src/simple.cc). The general usage requires the following steps. * Create a `TRITONSERVER_ResponseAllocator` using @@ -242,7 +242,7 @@ following steps. `TRITONSERVER_ResponseAllocatorAllocFn_t` and `TRITONSERVER_ResponseAllocatorReleaseFn_t` as defined in [tritonserver.h](https://github.com/triton-inference-server/core/blob/main/include/triton/core/tritonserver.h). In - [simple.cc](../../src/simple.cc), these callback functions are + [simple.cc](https://github.com/triton-inference-server/server/blob/main/src/simple.cc), these callback functions are implemented as `ResponseAlloc` and `ResponseRelease`. * Create an inference request as a `TRITONSERVER_InferenceRequest` @@ -277,7 +277,7 @@ following steps. You can reuse an existing `TRITONSERVER_InferenceRequest` object for a new inference request. A couple of examples of how this is done - and why it is useful are shown in [simple.cc](../../src/simple.cc). + and why it is useful are shown in [simple.cc](https://github.com/triton-inference-server/server/blob/main/src/simple.cc). * Ask Triton to execute the inference request using `TRITONSERVER_ServerInferAsync`. `TRITONSERVER_ServerInferAsync` is @@ -285,7 +285,7 @@ following steps. is returned via a callback into your application. 
You register this callback using `TRITONSERVER_InferenceRequestSetResponseCallback` before you invoke `TRITONSERVER_ServerInferAsync`. In - [simple.cc](../../src/simple.cc) this callback is + [simple.cc](https://github.com/triton-inference-server/server/blob/main/src/simple.cc) this callback is `InferResponseComplete`. When you invoke `TRITONSERVER_ServerInferAsync` and it returns @@ -311,7 +311,7 @@ following steps. output tensors, and `TRITONSERVER_InferenceResponseOutput` to get information about each output tensor. - Note that the [simple.cc](../../src/simple.cc) example uses a + Note that the [simple.cc](https://github.com/triton-inference-server/server/blob/main/src/simple.cc) example uses a std::promise to simply wait for the response, but synchronizing response handling in this way is not required. You can have multiple inference requests in flight at the same time and can issue @@ -322,12 +322,12 @@ is documented in [tritonserver.h](https://github.com/triton-inference-server/core/blob/main/include/triton/core/tritonserver.h). A simple example using the C API can be found in -[simple.cc](../../src/simple.cc). A more complicated example can be +[simple.cc](https://github.com/triton-inference-server/server/blob/main/src/simple.cc). A more complicated example can be found in the source that implements the HTTP/REST and GRPC endpoints for Triton. These endpoints use the C API to communicate with the core of Triton. The primary source files for the endpoints are -[grpc_server.cc](../../src/grpc_server.cc) and -[http_server.cc](../../src/http_server.cc). +[grpc_server.cc](https://github.com/triton-inference-server/server/blob/main/src/grpc_server.cc) and +[http_server.cc](https://github.com/triton-inference-server/server/blob/main/src/http_server.cc). ## Java bindings for In-Process Triton Server API @@ -341,14 +341,14 @@ generated from `tritonserver.java`. A simple example using the Java API can be found in [Samples folder](https://github.com/bytedeco/javacpp-presets/tree/master/tritonserver/samples) which includes `Simple.java` which is similar to -[`simple.cc`](../../src/simple.cc). +[`simple.cc`](https://github.com/triton-inference-server/server/blob/main/src/simple.cc). Please refer to [sample usage documentation](https://github.com/bytedeco/javacpp-presets/tree/master/tritonserver#sample-usage) to learn about how to build and run `Simple.java`. -In the [QA folder](../../qa), folders starting with L0_java include Java API tests. +In the [QA folder](https://github.com/triton-inference-server/server/blob/main/qa), folders starting with L0_java include Java API tests. These can be useful references for getting started, such as the -[ResNet50 test](../../qa/L0_java_resnet). +[ResNet50 test](https://github.com/triton-inference-server/server/blob/main/qa/L0_java_resnet). ### Java API setup instructions diff --git a/docs/examples/jetson/README.md b/docs/examples/jetson/README.md index 57a5649fd1..61831259d5 100644 --- a/docs/examples/jetson/README.md +++ b/docs/examples/jetson/README.md @@ -28,6 +28,7 @@ # Using Triton Inference Server as a shared library for execution on Jetson +## Overview This project demonstrates how to run C API applications using Triton Inference Server as a shared library. We also show how to build and execute such applications on Jetson. ### Prerequisites @@ -44,7 +45,7 @@ In our example, we placed the contents of downloaded release directory under `/o ## Part 1. 
Concurrent inference and dynamic batching -The purpose of the sample located under [concurrency_and_dynamic_batching](concurrency_and_dynamic_batching) +The purpose of the sample located under [concurrency_and_dynamic_batching](concurrency_and_dynamic_batching/README.md) is to demonstrate the important features of Triton Inference Server such as concurrent model execution and dynamic batching. In order to do that, we implemented a people detection application using C API and Triton Inference Server as a shared library. diff --git a/docs/getting_started/quickstart.md b/docs/getting_started/quickstart.md index 4abb9646d4..c7c0050355 100644 --- a/docs/getting_started/quickstart.md +++ b/docs/getting_started/quickstart.md @@ -44,8 +44,8 @@ Launching and maintaining Triton Inference Server revolves around the use of bui The [model repository](../user_guide/model_repository.md) is the directory where you place the models that you want Triton to serve. An example model repository is included in the -[docs/examples/model_repository](../examples/model_repository). Before -using the repository, you must fetch any missing model definition +[docs/examples/model_repository](https://github.com/triton-inference-server/server/blob/main/examples/model_repository). +Before using the repository, you must fetch any missing model definition files from their public model zoos via the provided script. ``` diff --git a/docs/index.md b/docs/index.md new file mode 100644 index 0000000000..16a1461ced --- /dev/null +++ b/docs/index.md @@ -0,0 +1,70 @@ +--- +title: Triton Inference Server +--- + + +::::{grid} +:reverse: +:gutter: 2 1 1 1 +:margin: 4 4 1 1 + +:::{grid-item} +:columns: 4 + +```{image} ./_static/nvidia-logo-vert-rgb-blk-for-screen.png +:width: 300px +``` +::: +:::{grid-item} +:columns: 8 +:class: sd-fs-3 + +NVIDIA Triton Inference Server + +::: +:::: + +Triton Inference Server is an open source inference serving software that streamlines AI inferencing. + + + +
+ +
+ +# Triton + +Triton enables teams to deploy any AI model from multiple deep learning and machine learning frameworks, including TensorRT, TensorFlow, PyTorch, ONNX, OpenVINO, Python, RAPIDS FIL, and more. Triton supports inference across cloud, data center,edge and embedded devices on NVIDIA GPUs, x86 and ARM CPU, or AWS Inferentia. Triton delivers optimized performance for many query types, including real time, batched, ensembles and audio/video streaming. + +Major features include: + +- [Supports multiple deep learning + frameworks](https://github.com/triton-inference-server/backend#where-can-i-find-all-the-backends-that-are-available-for-triton) +- [Supports multiple machine learning + frameworks](https://github.com/triton-inference-server/fil_backend) +- [Concurrent model + execution](user_guide/architecture.md#concurrent-model-execution) +- [Dynamic batching](user_guide/model_configuration.md#dynamic-batcher) +- [Sequence batching](user_guide/model_configuration.md#sequence-batcher) and + [implicit state management](user_guide/architecture.md#implicit-state-management) + for stateful models +- Provides [Backend API](https://github.com/triton-inference-server/backend) that + allows adding custom backends and pre/post processing operations +- Model pipelines using + [Ensembling](user_guide/architecture.md#ensemble-models) or [Business + Logic Scripting + (BLS)](https://github.com/triton-inference-server/python_backend#business-logic-scripting) +- [HTTP/REST and GRPC inference + protocols](customization_guide/inference_protocols.md) based on the community + developed [KServe + protocol](https://github.com/kserve/kserve/tree/master/docs/predict-api/v2) +- A [C API](customization_guide/inference_protocols.md#in-process-triton-server-api) and + [Java API](customization_guide/inference_protocols.md#java-bindings-for-in-process-triton-server-api) + allow Triton to link directly into your application for edge and other in-process use cases +- [Metrics](user_guide/metrics.md) indicating GPU utilization, server + throughput, server latency, and more + +Join the Triton and TensorRT community and stay current on the latest product updates, bug fixes, content, best practices, and more. Need enterprise support? NVIDIA global support is available for Triton Inference Server with the NVIDIA AI Enterprise software suite. diff --git a/docs/protocol/extension_binary_data.md b/docs/protocol/extension_binary_data.md index 90c0962e98..d04edda28b 100644 --- a/docs/protocol/extension_binary_data.md +++ b/docs/protocol/extension_binary_data.md @@ -47,13 +47,13 @@ delivered in the HTTP body after the JSON object (see Examples). The binary tensor data extension uses parameters to indicate that an input or output tensor is communicated as binary data. The first -parameter is used in $request_input and $response_output to indicate +parameter is used in `$request_input` and `$response_output` to indicate that the input or output tensor is communicated as binary data: - "binary_data_size" : int64 parameter indicating the size of the tensor binary data, in bytes. -The second parameter is used in $request_output to indicate that the +The second parameter is used in `$request_output` to indicate that the output should be returned from Triton as binary data. 
- "binary_data" : bool parameter that is true if the output should be diff --git a/docs/protocol/extension_classification.md b/docs/protocol/extension_classification.md index 9a63e2c748..5c481e16a7 100644 --- a/docs/protocol/extension_classification.md +++ b/docs/protocol/extension_classification.md @@ -62,15 +62,15 @@ indices, the returned tensor will be [ “10:2:apple”, “5:1:pickle” ]. ## HTTP/REST -In all JSON schemas shown in this document $number, $string, $boolean, -$object and $array refer to the fundamental JSON types. #optional +In all JSON schemas shown in this document `$number`, `$string`, `$boolean`, +`$object` and `$array` refer to the fundamental JSON types. #optional indicates an optional JSON field. The classification extension requires that the “classification” parameter, when applied to a requested inference output, be recognized by Triton as follows: -- “classification” : $number indicating the number of classes that +- “classification” : `$number` indicating the number of classes that should be returned for the output. The following example shows how the classification parameter is used diff --git a/docs/protocol/extension_logging.md b/docs/protocol/extension_logging.md index dbce2fda72..2b31863f0f 100644 --- a/docs/protocol/extension_logging.md +++ b/docs/protocol/extension_logging.md @@ -34,8 +34,8 @@ in the extensions field of its Server Metadata. ## HTTP/REST -In all JSON schemas shown in this document $number, $string, $boolean, -$object and $array refer to the fundamental JSON types. #optional +In all JSON schemas shown in this document `$number`, `$string`, `$boolean`, +`$object` and `$array` refer to the fundamental JSON types. #optional indicates an optional JSON field. Triton exposes the logging endpoint at the following URL. The client may use @@ -52,7 +52,7 @@ POST v2/logging ### Log Setting Response JSON Object A successful log setting request is indicated by a 200 HTTP status -code. The response object, identified as $log_setting_response, is +code. The response object, identified as `$log_setting_response`, is returned in the HTTP body for every successful log setting request. ``` @@ -64,34 +64,33 @@ $log_setting_response = $log_setting = $string : $string | $boolean | $number ``` -Each $log_setting JSON describes a “name”/”value” pair, where the “name” is -the $string representation of the log setting and the “value” is a $string, $bool, or $number representation of the -setting value. Currently, the following log settings are defined: +Each `$log_setting` JSON describes a “name”/”value” pair, where the “name” is +the `$string` representation of the log setting and the “value” is a `$string`, +`$bool`, or `$number` representation of the setting value. Currently, the +following log settings are defined: -- "log_file" : a $string parameter defining the file where the log outputs will be saved. If an empty string is specified, -log outputs will stream to the console. +- "log_file" : a `$string` parameter defining the file where the log outputs will be saved. If an empty string is specified, log outputs will stream to the console. -- "log_info" : a $boolean parameter that controls whether the Triton server logs INFO level messages. +- "log_info" : a `$boolean` parameter that controls whether the Triton server logs INFO level messages. -- "log_warning" : a $boolean parameter that controls whether the Triton server logs WARNING level messages. +- "log_warning" : a `$boolean` parameter that controls whether the Triton server logs WARNING level messages. 
-- "log_error" : a $boolean parameter that controls whether the Triton server logs ERROR level messages. +- "log_error" : a `$boolean` parameter that controls whether the Triton server logs ERROR level messages. -- "log_verbose_level" : a $number parameter that controls whether the Triton server outputs verbose messages +- "log_verbose_level" : a `$number` parameter that controls whether the Triton server outputs verbose messages of varying degrees. This value can be any integer >= 0. If "log_verbose_level" is 0, verbose logging will be disabled, and no verbose messages will be output by the Triton server. If "log_verbose_level" is 1, level 1 verbose messages will be output by the Triton server. If "log_verbose_level" is 2, the Triton server will output all verbose messages of level <= 2, etc. Attempting to set "log_verbose_level" to a number < 0 will result in an error. -- "log_format" : a $string parameter that controls the format of Triton server log messages. There are currently +- "log_format" : a `$string` parameter that controls the format of Triton server log messages. There are currently 2 formats: "default" and "ISO8601". ### Log Setting Response JSON Error Object A failed log setting request will be indicated by an HTTP error status -(typically 400). The HTTP body will contain a -$log_setting_error_response object. +(typically 400). The HTTP body will contain a `$log_setting_error_response` object. ``` $log_setting_error_response = @@ -108,7 +107,7 @@ A log setting request is made with a HTTP POST to the logging endpoint. In the corresponding response, the HTTP body contains the response JSON. A successful request is indicated by a 200 HTTP status code. -The request object, identified as $log_setting_request must be provided in the HTTP +The request object, identified as `$log_setting_request` must be provided in the HTTP body. ``` @@ -118,7 +117,7 @@ $log_setting_request = } ``` -When a $log_setting JSON is received (defined above), only the specified +When a `$log_setting` JSON is received (defined above), only the specified settings will be updated. ### Example Usage @@ -127,7 +126,7 @@ a Triton server is running at `localhost:8000`): ``` curl -s -w '\n%{http_code}\n' -d '{"log_verbose_level":1}' -X POST localhost:8000/v2/logging ``` -This command should return a $log_setting_response JSON object with the following format: +This command should return a `$log_setting_response` JSON object with the following format: ``` {"log_file":"","log_info":true,"log_warnings":true,"log_errors":true,"log_verbose_level":1,"log_format":"default"} 200 diff --git a/docs/protocol/extension_model_configuration.md b/docs/protocol/extension_model_configuration.md index 6e995cf77c..07ecc63e94 100644 --- a/docs/protocol/extension_model_configuration.md +++ b/docs/protocol/extension_model_configuration.md @@ -35,8 +35,8 @@ information. Because this extension is supported, Triton reports ## HTTP/REST -In all JSON schemas shown in this document $number, $string, $boolean, -$object and $array refer to the fundamental JSON types. #optional +In all JSON schemas shown in this document `$number`, `$string`, `$boolean`, +`$object` and `$array` refer to the fundamental JSON types. #optional indicates an optional JSON field. 
Triton exposes the model configuation endpoint at the following @@ -51,7 +51,7 @@ GET v2/models/${MODEL_NAME}[/versions/${MODEL_VERSION}]/config A model configuration request is made with an HTTP GET to the model configuration endpoint.A successful model configuration request is indicated by a 200 HTTP status code. The model configuration response -object, identified as $model_configuration_response, is returned in +object, identified as `$model_configuration_response`, is returned in the HTTP body for every successful request. ``` @@ -67,7 +67,7 @@ model_config.proto](https://github.com/triton-inference-server/common/blob/main/ A failed model configuration request must be indicated by an HTTP error status (typically 400). The HTTP body must contain the -$model_configuration_error_response object. +`$model_configuration_error_response` object. ``` $model_configuration_error_response = diff --git a/docs/protocol/extension_model_repository.md b/docs/protocol/extension_model_repository.md index 90d3158ea5..44d271d2dc 100644 --- a/docs/protocol/extension_model_repository.md +++ b/docs/protocol/extension_model_repository.md @@ -41,8 +41,8 @@ Server Metadata. ## HTTP/REST -In all JSON schemas shown in this document $number, $string, $boolean, -$object and $array refer to the fundamental JSON types. #optional +In all JSON schemas shown in this document `$number`, `$string`, `$boolean`, +`$object` and `$array` refer to the fundamental JSON types. `#optional` indicates an optional JSON field. The model-repository extension requires Index, Load and Unload @@ -65,7 +65,7 @@ loaded by the Load API. A model-repository index request is made with an HTTP POST to the index endpoint. In the corresponding response the HTTP body contains the JSON response. -The index request object, identified as $repository_index_request, is +The index request object, identified as `$repository_index_request`, is required in the HTTP body of the POST request. ``` @@ -78,7 +78,7 @@ $repository_index_request = - "ready" : Optional, default is false. If true return only models ready for inferencing. A successful index request is indicated by a 200 HTTP status code. The -response object, identified as $repository_index_response, is returned +response object, identified as `$repository_index_response`, is returned in the HTTP body for every successful request. ``` @@ -101,7 +101,7 @@ $repository_index_response = A failed index request must be indicated by an HTTP error status (typically 400). The HTTP body must contain the -$repository_index_error_response object. +`$repository_index_error_response` object. ``` $repository_index_error_response = @@ -117,7 +117,7 @@ $repository_index_error_response = The load API requests that a model be loaded into Triton, or reloaded if the model is already loaded. A load request is made with an HTTP POST to a load endpoint. The HTTP body may be empty or may contain -the load request object, identified as $repository_load_request. +the load request object, identified as `$repository_load_request`. A successful load request is indicated by a 200 HTTP status. @@ -153,7 +153,7 @@ override model directory. A failed load request must be indicated by an HTTP error status (typically 400). The HTTP body must contain the -$repository_load_error_response object. +`$repository_load_error_response` object. ``` $repository_load_error_response = @@ -200,7 +200,7 @@ Host: localhost:8000 The unload API requests that a model be unloaded from Triton. 
An unload request is made with an HTTP POST to an unload endpoint. The HTTP body may be empty or may contain the unload request object, -identified as $repository_unload_request. A successful unload request +identified as `$repository_unload_request`. A successful unload request is indicated by a 200 HTTP status. ``` @@ -224,7 +224,7 @@ The unload API accepts the following parameters: A failed unload request must be indicated by an HTTP error status (typically 400). The HTTP body must contain the -$repository_unload_error_response object. +`$repository_unload_error_response` object. ``` $repository_unload_error_response = diff --git a/docs/protocol/extension_shared_memory.md b/docs/protocol/extension_shared_memory.md index c01938ca61..984dd46b6f 100644 --- a/docs/protocol/extension_shared_memory.md +++ b/docs/protocol/extension_shared_memory.md @@ -57,20 +57,20 @@ given Triton will return an error. ## HTTP/REST -In all JSON schemas shown in this document $number, $string, $boolean, -$object and $array refer to the fundamental JSON types. #optional +In all JSON schemas shown in this document `$number`, `$string`, `$boolean`, +`$object` and `$array` refer to the fundamental JSON types. #optional indicates an optional JSON field. -The shared-memory parameters may be used in the $request_input +The shared-memory parameters may be used in the `$request_input` parameters to indicate that the corresponding input is being communicated via shared memory. The parameters may be used in the -$request_output parameters to indicate that the requested output +`$request_output` parameters to indicate that the requested output should be communicated via shared memory. When these parameters are set for an input tensor the “data” field of -$request_input must not be set. If the “data” field is set Triton will +`$request_input` must not be set. If the “data” field is set Triton will return an error. When these parameters are set for a requested output -tensor the returned $response_output must not define the “data” field. +tensor the returned `$response_output` must not define the “data” field. Shared memory regions must be created by the client and then registered with Triton before they can be referenced with a @@ -105,7 +105,7 @@ registered regions. A successful status request is indicated by a 200 HTTP status code. The response object, identified as -$system_shared_memory_status_response, is returned in the HTTP body +`$system_shared_memory_status_response`, is returned in the HTTP body for every successful request. ``` @@ -133,7 +133,7 @@ $system_shared_memory_status_response = A failed status request must be indicated by an HTTP error status (typically 400). The HTTP body must contain the -$system_shared_memory_status_error_response object. +`$system_shared_memory_status_error_response` object. ``` $system_shared_memory_status_error_response = @@ -152,7 +152,7 @@ contains the response JSON. A successful register request is indicated by a 200 HTTP status code. The request object, identified as -$system_shared_memory_register_request must be provided in the HTTP +`$system_shared_memory_register_request` must be provided in the HTTP body. ``` @@ -174,7 +174,7 @@ $system_shared_memory_register_request = A failed register request must be indicated by an HTTP error status (typically 400). The HTTP body must contain the -$system_shared_memory_register_error_response object. +`$system_shared_memory_register_error_response` object. 
``` $system_shared_memory_register_error_response = @@ -196,7 +196,7 @@ are unregisered. A failed unregister request must be indicated by an HTTP error status (typically 400). The HTTP body must contain the -$system_shared_memory_unregister_error_response object. +`$system_shared_memory_unregister_error_response` object. ``` $system_shared_memory_unregister_error_response = @@ -234,7 +234,7 @@ registered regions. A successful status request is indicated by a 200 HTTP status code. The response object, identified as -$cuda_shared_memory_status_response, is returned in the HTTP body +`$cuda_shared_memory_status_response`, is returned in the HTTP body for every successful request. ``` @@ -258,7 +258,7 @@ $cuda_shared_memory_status_response = A failed status request must be indicated by an HTTP error status (typically 400). The HTTP body must contain the -$cuda_shared_memory_status_error_response object. +`$cuda_shared_memory_status_error_response` object. ``` $cuda_shared_memory_status_error_response = @@ -277,7 +277,7 @@ contains the response JSON. A successful register request is indicated by a 200 HTTP status code. The request object, identified as -$cuda_shared_memory_register_request must be provided in the HTTP +`$cuda_shared_memory_register_request` must be provided in the HTTP body. ``` @@ -298,7 +298,7 @@ $cuda_shared_memory_register_request = A failed register request must be indicated by an HTTP error status (typically 400). The HTTP body must contain the -$cuda_shared_memory_register_error_response object. +`$cuda_shared_memory_register_error_response` object. ``` $cuda_shared_memory_register_error_response = @@ -321,7 +321,7 @@ are unregisered. A failed unregister request must be indicated by an HTTP error status (typically 400). The HTTP body must contain the -$cuda_shared_memory_unregister_error_response object. +`$cuda_shared_memory_unregister_error_response` object. ``` $cuda_shared_memory_unregister_error_response = diff --git a/docs/protocol/extension_statistics.md b/docs/protocol/extension_statistics.md index a7e1ebcb05..6e11f3623e 100644 --- a/docs/protocol/extension_statistics.md +++ b/docs/protocol/extension_statistics.md @@ -37,8 +37,8 @@ its Server Metadata. ## HTTP/REST -In all JSON schemas shown in this document $number, $string, $boolean, -$object and $array refer to the fundamental JSON types. #optional +In all JSON schemas shown in this document `$number`, `$string`, `$boolean`, +`$object` and `$array` refer to the fundamental JSON types. #optional indicates an optional JSON field. Triton exposes the statistics endpoint at the following URL. The @@ -55,7 +55,7 @@ GET v2/models[/${MODEL_NAME}[/versions/${MODEL_VERSION}]]/stats ### Statistics Response JSON Object A successful statistics request is indicated by a 200 HTTP status -code. The response object, identified as $stats_model_response, is +code. The response object, identified as `$stats_model_response`, is returned in the HTTP body for every successful statistics request. ``` @@ -65,8 +65,8 @@ $stats_model_response = } ``` -Each $model_stat object gives the statistics for a specific model and -version. The $version field is optional for servers that do not +Each `$model_stat` object gives the statistics for a specific model and +version. The `$version` field is optional for servers that do not support versions. ``` @@ -201,7 +201,7 @@ $batch_stats = the given batch size. For example, this duration should include the time to copy output tensor data from the GPU. 
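A short Python sketch of a statistics request against the endpoint shown above (a Triton server at localhost:8000 and the `requests` package are assumed; the model name is a hypothetical placeholder, and "model_stats" is assumed to be the top-level field holding the `$model_stat` list):

```
import requests

BASE = "http://localhost:8000"   # assumed local Triton HTTP endpoint
MODEL = "densenet_onnx"          # hypothetical model name

# GET statistics for one model across all of its versions; drop the model
# segment entirely to get statistics for every model.
resp = requests.get(f"{BASE}/v2/models/{MODEL}/stats")
resp.raise_for_status()

stats = resp.json()   # a $stats_model_response object
for model_stat in stats.get("model_stats", []):
    # Each $model_stat entry covers one model (and version, when reported).
    print(model_stat.get("name"), model_stat.get("version"))
```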
-The $duration_stat object reports a count and a total time. This +The `$duration_stat` object reports a count and a total time. This format can be sampled to determine not only long-running averages but also incremental averages between sample points. @@ -221,7 +221,7 @@ $duration_stat = A failed statistics request will be indicated by an HTTP error status (typically 400). The HTTP body must contain the -$repository_statistics_error_response object. +`$repository_statistics_error_response` object. ``` $repository_statistics_error_response = diff --git a/docs/protocol/extension_trace.md b/docs/protocol/extension_trace.md index ff03bc7d3a..35905b6bef 100644 --- a/docs/protocol/extension_trace.md +++ b/docs/protocol/extension_trace.md @@ -35,8 +35,8 @@ its Server Metadata. ## HTTP/REST -In all JSON schemas shown in this document $number, $string, $boolean, -$object and $array refer to the fundamental JSON types. #optional +In all JSON schemas shown in this document `$number`, `$string`, `$boolean`, +`$object` and `$array` refer to the fundamental JSON types. `#optional` indicates an optional JSON field. Triton exposes the trace endpoint at the following URL. The client may use @@ -54,7 +54,7 @@ POST v2[/models/${MODEL_NAME}]/trace/setting ### Trace Setting Response JSON Object A successful trace setting request is indicated by a 200 HTTP status -code. The response object, identified as $trace_setting_response, is +code. The response object, identified as `$trace_setting_response`, is returned in the HTTP body for every successful trace setting request. ``` @@ -66,9 +66,9 @@ $trace_setting_response = $trace_setting = $string : $string | [ $string, ...] ``` -Each $trace_setting JSON describes a “name”/”value” pair, where the “name” is -the name of the trace setting and the “value” is a $string representation of the -setting value, or an array of $string for some settings. Currently the following +Each `$trace_setting` JSON describes a “name”/”value” pair, where the “name” is +the name of the trace setting and the “value” is a `$string` representation of the +setting value, or an array of `$string` for some settings. Currently the following trace settings are defined: - "trace_file" : the file where the trace output will be saved. If @@ -89,12 +89,12 @@ in "log_frequency", regardless of the "log_frequency" status. If the value is "-1", the number of traces to be sampled will not be limited. - "log_frequency" : the frequency that Triton will log the trace output to the files. If the value is "0", Triton will only log -the trace output to ${trace_file} when shutting down. Otherwise, Triton will log +the trace output to `${trace_file}` when shutting down. Otherwise, Triton will log the trace output to `${trace_file}.${idx}` when it collects the specified number of traces. For example, if the log frequency is "100", when Triton collects the 100-th trace, it logs the traces to file -"${trace_file}.0", and when it collects the 200-th trace, it logs the 101-th to -the 200-th traces to file "${trace_file}.1". Note that the file index will be +`"${trace_file}.0"`, and when it collects the 200-th trace, it logs the 101-th to +the 200-th traces to file `"${trace_file}.1"`. Note that the file index will be reset to 0 when "trace_file" setting is updated. @@ -102,7 +102,7 @@ reset to 0 when "trace_file" setting is updated. A failed trace setting request will be indicated by an HTTP error status (typically 400). The HTTP body must contain the -$trace_setting_error_response object.
+`$trace_setting_error_response` object. ``` $trace_setting_error_response = @@ -119,7 +119,7 @@ A trace setting request is made with a HTTP POST to the trace endpoint. In the corresponding response the HTTP body contains the response JSON. A successful request is indicated by a 200 HTTP status code. -The request object, identified as $trace_setting_request must be provided in the HTTP +The request object, identified as `$trace_setting_request` must be provided in the HTTP body. ``` @@ -129,7 +129,7 @@ $trace_setting_request = } ``` -The $trace_setting JSON is defined in +The `$trace_setting` JSON is defined in [Trace Setting Response JSON Object](#trace-setting-response-json-object), only the specified settings will be updated. In addition to the values mentioned in response JSON object, JSON null value may be used to remove the specification of diff --git a/docs/user_guide/architecture.md b/docs/user_guide/architecture.md index 094eb8fe0c..90e84ab164 100644 --- a/docs/user_guide/architecture.md +++ b/docs/user_guide/architecture.md @@ -313,7 +313,7 @@ non-starting requests in the sequence, the input state is the output state of the previous request in the sequence. For an example ONNX model that uses implicit state you can refer to this onnx model generated from the `create_onnx_modelfile_wo_initial_state()` -[from this generation script](../../qa/common/gen_qa_implicit_models.py). +[from this generation script](https://github.com/triton-inference-server/server/blob/main/qa/common/gen_qa_implicit_models.py). This is a simple accumulator model that stores the partial sum of the requests in a sequence in Triton using implicit state. For state initialization, if the request is starting, the model sets the "OUTPUT\_STATE" to be equal to the diff --git a/docs/user_guide/decoupled_models.md b/docs/user_guide/decoupled_models.md index 855015d089..566d429102 100644 --- a/docs/user_guide/decoupled_models.md +++ b/docs/user_guide/decoupled_models.md @@ -87,10 +87,10 @@ exactly one response per request. Even standard ModelInfer RPC in the GRPC endpo does not support decoupled responses. In order to run inference on a decoupled model, the client must use the bi-directional streaming RPC. See [here](https://github.com/triton-inference-server/common/blob/main/protobuf/grpc_service.proto) -for more details. The [decoupled_test.py](../../qa/L0_decoupled/decoupled_test.py) demonstrates +for more details. The [decoupled_test.py](https://github.com/triton-inference-server/server/blob/main/qa/L0_decoupled/decoupled_test.py) demonstrates how the gRPC streaming can be used to infer decoupled models. If using [Triton's in-process C API](../customization_guide/inference_protocols.md#in-process-triton-server-api), your application should be cognizant that the callback function you registered with `TRITONSERVER_InferenceRequestSetResponseCallback` can be invoked any number of times, -each time with a new response. You can take a look at [grpc_server.cc](../../src/grpc_server.cc) +each time with a new response. 
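A minimal Python sketch of the bi-directional gRPC streaming pattern that decoupled models require, along the lines of what decoupled_test.py demonstrates (the tritonclient package is assumed to be installed, and the model and tensor names are hypothetical placeholders):

```
import queue

import numpy as np
import tritonclient.grpc as grpcclient   # assumed installed: pip install tritonclient[grpc]

responses = queue.Queue()

def callback(result, error):
    # Invoked once per response; a decoupled model may produce zero, one,
    # or many responses for a single request.
    responses.put(error if error is not None else result)

client = grpcclient.InferenceServerClient(url="localhost:8001")

# Hypothetical decoupled model with a single INT32 input named "IN".
inp = grpcclient.InferInput("IN", [1], "INT32")
inp.set_data_from_numpy(np.array([3], dtype=np.int32))

client.start_stream(callback=callback)
client.async_stream_infer(model_name="repeat_int32", inputs=[inp])

# A real client would wait until it has received all of the responses it
# expects from this particular model before closing the stream.
print(responses.get(timeout=10))
client.stop_stream()
```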
You can take a look at [grpc_server.cc](https://github.com/triton-inference-server/server/blob/main/src/grpc_server.cc) diff --git a/docs/user_guide/jetson.md b/docs/user_guide/jetson.md index 0aee6487ff..cdd95e4b47 100644 --- a/docs/user_guide/jetson.md +++ b/docs/user_guide/jetson.md @@ -206,4 +206,4 @@ perf_analyzer -m graphdef_int32_int32_int32 --service-kind=triton_c_api \ --model-repository=/workspace/qa/L0_perf_analyzer_capi/models ``` -Refer to these [examples](../examples/jetson) that demonstrate how to use Triton Inference Server on Jetson. +Refer to these [examples](../examples/jetson/README.md) that demonstrate how to use Triton Inference Server on Jetson. diff --git a/docs/user_guide/performance_tuning.md b/docs/user_guide/performance_tuning.md index 35ac2a214d..729b6b0e18 100644 --- a/docs/user_guide/performance_tuning.md +++ b/docs/user_guide/performance_tuning.md @@ -277,7 +277,7 @@ cp ./results/densenet_onnx_config_3/config.pbtxt /mnt/models/densenet_onnx/ ``` Now that we have an optimized Model Configuration, we are ready to take our model to deployment. -For further manual tuning, read the [Model Configuration](model_configuration.md) and [Optimization](optimization.md#) docs to learn more about Triton's complete set of capabilities. +For further manual tuning, read the [Model Configuration](model_configuration.md) and [Optimization](optimization.md) docs to learn more about Triton's complete set of capabilities. In this example, we happened to get both the highest throughput and almost lowest latency from the same configuration, but in some cases this is a tradeoff that must be made. Certain models or configurations may achieve a higher throughput but also incur a higher latency in return. diff --git a/docs/user_guide/trace.md b/docs/user_guide/trace.md index 0637abf022..11432a0437 100644 --- a/docs/user_guide/trace.md +++ b/docs/user_guide/trace.md @@ -137,7 +137,7 @@ including its "name", "data" and "dtype". For example: ## Trace Summary Tool -An example [trace summary tool](../../qa/common/trace_summary.py) can be +An example [trace summary tool](https://github.com/triton-inference-server/server/blob/main/qa/common/trace_summary.py) can be used to summarize a set of traces collected from Triton. Basic usage is: