Suraj/update triton main #1

suraj-vathsa · 2023-12-15T18:53:38Z

No description provided.

* Add HTTP client plugin test * Add testing for HTTP asyncio * Add async plugin support * Fix qa container for L0_grpc * Add testing for grpc client plugin * Remove unused imports * Fix up * Fix L0_grpc models QA folder * Update the test based on review feedback * Remove unused import * Add testing for .plugin method

…RL (triton-inference-server#5686)
* MLFlow Triton Plugin: Add support for s3 prefix and custom endpoint URL
* Update the function order of config.py and use os.path.join to replace filtering a list of strings then joining
* Update onnx flavor to support s3 prefix and custom endpoint URL
* Fix two typos in MLFlow Triton plugin README.md
* Address review comments (replace => strip)
* Address review comments (init regex only for s3)
* Remove unused local variable: slash_locations
---------
Signed-off-by: Xiaodong Ye <yeahdongcn@gmail.com>

…triton-inference-server#6620)
* Extend request objects lifetime
* Remove explicit TRITONSERVER_InferenceRequestDelete
* Format fix
* Include the inference_request_ initialization to cover RequestNew
---------
Co-authored-by: Neelay Shah <neelays@nvidia.com>

* Update README and versions for 2.40.0 / 23.11 (triton-inference-server#6544)
* Removing path construction to use SymLink alternatives
* Update version for PyTorch
* Update windows Dockerfile configuration
* Update triton version to 23.11
* Update README and versions for 2.40.0 / 23.11
* Fix typo
* Adding 'ldconfig' to configure dynamic linking in container (triton-inference-server#6602)
* Point to tekit_backend (triton-inference-server#6616)
* Point to tekit_backend
* Update version
* Revert tekit changes (triton-inference-server#6640)
---------
Co-authored-by: Kris Hung <krish@nvidia.com>

* Unify iGPU test build with x86 ARM
* adding TRITON_IGPU_BUILD to core build definition; adding logic to skip caffe2plan test if TRITON_IGPU_BUILD=1
* re-organizing some copies in Dockerfile.QA to fix igpu devel build
* Pre-commit fix
---------
Co-authored-by: kyle <kmcgill@kmcgill-ubuntu.nvidia.com>

triton-inference-server#6639)
* Add ability to configure GRPC max connection age and max connection age grace
* Allow pass GRPC connection age args when they are set from command
----------
Co-authored-by: Katherine Yang <80359429+jbkyang-nvi@users.noreply.github.com>

docs/conf.py

+    # },
+    "use_edit_page_button": False,
+    "use_issues_button": True,
+    "use_repository_button": True,


qa/L0_backend_python/decoupled/decoupled_test.py

+        # Test combinations of BLS and decoupled API in Python backend.
+        model_name = "decoupled_bls_stream"
+        in_values = [4, 2, 0, 1]
+        shape = [1]


deploy/mlflow-triton-plugin/mlflow_triton/config.py


+class Config(dict):
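
The diff excerpt shows only the class header. As a rough sketch of how a dict-backed config of this kind might handle an S3 prefix and a custom endpoint URL (the two features named in the MLFlow plugin commit), the example below uses a hypothetical class name `ExampleConfig` and hypothetical environment variable names `TRITON_MODEL_REPO` and `S3_ENDPOINT_URL`. None of these names are confirmed by this page, and the plugin's actual code may differ.

```python
import os
import re


class ExampleConfig(dict):
    """Hypothetical dict-backed config; not the plugin's actual class."""

    def __init__(self, environ=None):
        super().__init__()
        env = os.environ if environ is None else environ
        # Assumed variable names, chosen for illustration only.
        self["model_repo"] = env.get("TRITON_MODEL_REPO", "")
        self["s3_endpoint_url"] = env.get("S3_ENDPOINT_URL")
        self["s3_bucket"] = None
        self["s3_prefix"] = None
        # Run the regex only when the repository is an S3 URL.
        if self["model_repo"].startswith("s3://"):
            match = re.match(r"^s3://([^/]+)/?(.*)$", self["model_repo"])
            if match:
                self["s3_bucket"] = match.group(1)
                # strip("/") drops leading and trailing slashes from the prefix.
                self["s3_prefix"] = match.group(2).strip("/") or None


cfg = ExampleConfig({"TRITON_MODEL_REPO": "s3://my-bucket/models/prod/"})
print(cfg["s3_bucket"], cfg["s3_prefix"])
```

Passing the environment as an argument keeps the sketch testable without touching the real process environment.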


oandreeva-nv and others added 30 commits April 27, 2023 10:52

Changed copyright (triton-inference-server#5705)

9eedbc8

Modify timeout test in L0_sequence_batcher to use portable backend (t…

ed26916

…riton-inference-server#5696) * Modify timeout test in L0_sequence_batcher to use portable backend * Use identity backend that is built by default on Windows

updated upstream container name (triton-inference-server#5713)

6e9f726

Fix triton container version (triton-inference-server#5714)

876dc0c

Update the L0_model_config test expected error message (triton-infere…

e6eda20

…nce-server#5684)

Use better value in timeout test L0_sequence_batcher (triton-inferenc…

b795c01

…e-server#5716) * Use better value in timeout test L0_sequence_batcher * Format

Update JAX install (triton-inference-server#5613)

10f12c6

Add notes about socket usage to L0_client_memory_growth test (triton-…

23172b2

…inference-server#5710)

Check TensorRT error message more granularly (triton-inference-server…

5b4bbe9

…#5719) * Check TRT err msg more granularly * Clarify source of error messages * Consolidate tests for message parts

Pin Python Package Versions for HTML Document Generation (triton-infe…

19a3686

…rence-server#5727) * updating with pinned versions for python dependencies * updated with pinned sphinx and nbclient versions

Test full error returned when custom batcher init fails (triton-infer…

e7ddaf1

…ence-server#5729) * Add testing for batcher init failure, add wait for status check * Formatting * Change search string

Add fastertransformer test (triton-inference-server#5500)

049ea02

Add fastertransformer test that uses 1GPU.

Fix L0_backend_python on Jetson (triton-inference-server#5728)

0a51f7e

* Don't use mem probe in Jetson * Clarify failure messages in L0_backend_python * Update copyright * Add JIRA ref, fix _test_jetson

Add testing for Python custom metrics API (triton-inference-server#5669)

734363f

* Add testing for python custom metrics API * Add custom metrics example to the test * Fix for CodeQL report * Fix test name * Address comment * Add logger and change the enum usage

Install jemalloc (triton-inference-server#5738)

c7df57a

Add --metrics-address and testing (triton-inference-server#5737)

1046b0f

* Add --metrics-address, add tests to L0_socket, re-order CLI options for consistency * Use non-localhost address

Add testing for basic auth plugin for HTTP/gRPC clients (triton-infer…

151376e

…ence-server#5739) * Add HTTP basic auth test * Add testing for gRPC basic auth * Fix up * Remove unused imports

Add multi-gpu, multi-stream testing for dlpack tensors (triton-infere…

be4493f

…nce-server#5550) * Add multi-gpu, multi-stream testing for dlpack tensors

Update note on SageMaker MME support for ensemble (triton-inference-s…

1b12110

…erver#5723)

Run L0_backend_python subtests with virtual environment (triton-infer…

bd8f4a7

…ence-server#5753)

Update 'main' to track development of 2.35.0 / r23.06 (triton-inferen…

e343bfc

…ce-server#5764)

Include jemalloc into the documentation (triton-inference-server#5760)

c8b1b66

Enhance tests in L0_model_update (triton-inference-server#5724)

9786c75

* Add model instance name update test * Add gap for timestamp to update * Add some tests with dynamic batching * Extend supported test on rate limit off * Continue test if off mode failed

Fix L0_memory_growth (triton-inference-server#5795)

26553e1

(1) reduce MAX_ALLOWED_ALLOC to be more strict for bounded tests, and generous for unbounded tests. (2) allow unstable measurement from PA. (3) improve logging for future triage

Add note on --metrics-address (triton-inference-server#5800)

96226c9

* Add note on --metrics-address * Copyright

Minor fix for running "mlflow deployments create -t triton --flavor t…

c7254d3

…riton ..." (triton-inference-server#5658) UnboundLocalError: local variable 'meta_dict' referenced before assignment The above error shows in listing models in Triton model repository
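
The `UnboundLocalError` quoted in this commit message is the usual symptom of a local variable that is assigned on only some code paths. A minimal sketch of that pattern and one common fix follows; the function names and sample data are hypothetical and are not taken from the plugin's code.

```python
def list_models_buggy(entries):
    """Reproduces the error class: meta_dict is set only inside the if-branch."""
    result = []
    for entry in entries:
        if "meta" in entry:
            meta_dict = dict(entry["meta"])
        # Raises UnboundLocalError when the first entry has no "meta" key.
        result.append({"name": entry["name"], **meta_dict})
    return result


def list_models_fixed(entries):
    """Initializes meta_dict on every iteration before it is read."""
    result = []
    for entry in entries:
        meta_dict = {}
        if "meta" in entry:
            meta_dict = dict(entry["meta"])
        result.append({"name": entry["name"], **meta_dict})
    return result


models = [{"name": "resnet"}, {"name": "bert", "meta": {"flavor": "triton"}}]
try:
    list_models_buggy(models)
except UnboundLocalError as err:
    print("buggy:", err)
print("fixed:", list_models_fixed(models))
```

Resetting the variable inside the loop also prevents metadata from one entry leaking into the next, which the buggy version would do silently whenever an earlier entry had set it.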

Adding test for new sequence mode (triton-inference-server#5771)

725655e

* Adding test for new sequence mode * Update option name * Clean up testing spacing and new lines

Fix client script (triton-inference-server#5806)

b99ddd2

oandreeva-nv and others added 24 commits November 21, 2023 11:17

Bumped vLLM version to v0.2.2 (triton-inference-server#6623)

9647526

Upgrade ORT version (triton-inference-server#6618)

18ee5ac

Use compliant preprocessor (triton-inference-server#6626)

92214f7

Update README.md (triton-inference-server#6627)

738996f

Update protobuf after python update for testing (triton-inference-ser…

b96ae5f

…ver#6638) This fixes the issue where the Python client raises `AttributeError: 'NoneType' object has no attribute 'enum_types_by_name'` errors after the Python version is updated.

PYBE Timeout Tests (triton-inference-server#6483)

b44ee7c

* New testing to confirm large request timeout values can be passed and retrieved within Python BLS models.

Add note on lack of ensemble support (triton-inference-server#6648)

4ac7f37

Added request id to span attributes (triton-inference-server#6667)

817428a

Add test for optional internal tensor within an ensemble (triton-infe…

8afdad2

…rence-server#6663) * Add test for optional internal tensor within an ensemble * Fix up

Set CMake version to 3.27.7 (triton-inference-server#6675)

a34770b

* Set CMake version to 3.27.7 * Set CMake version to 3.27.7 * Fix double slash typo

restore typo (triton-inference-server#6680)

cbe58e7

Update 'main' to track development of 2.42.0 / 24.01 (triton-inferenc…

f5717c6

…e-server#6673)

iGPU build refactor (triton-inference-server#6684) (triton-inference-…

e6c300d

…server#6691)

Mlflow Plugin Fix (triton-inference-server#6685)

f2cd999

* Mlflow plugin fix

Fix extra content-type headers in HTTP server (triton-inference-serve…

8165ca7

…r#6678)

adding default value for TRITON_IGPU_BUILD=OFF (triton-inference-serv…

9c56e19

…er#6705)
* adding default value for TRITON_IGPU_BUILD=OFF
* fix newline
---------
Co-authored-by: kyle <kmcgill@kmcgill-ubuntu.nvidia.com>

Add test case for decoupled model raising exception (triton-inference…

d6bd668

…-server#6686) * Add test case for decoupled model raising exception * Remove unused import * Address comment

Escape special characters in general docs (triton-inference-server#6697)

13dd37e

vLLM Benchmarking Test (triton-inference-server#6631)

2df7b25

* vLLM Benchmarking Test

Merge remote-tracking branch 'upstream/main' into main

488de2e

github-advanced-security bot found potential problems Dec 15, 2023

nitish-verkada approved these changes Dec 15, 2023

suraj-vathsa merged commit 7b98b8b into main Dec 15, 2023
3 checks passed

babakbehzad added a commit that referenced this pull request Dec 19, 2023

Revert "Suraj/update triton main (#1)"

3d957ea

This reverts commit 7b98b8b.

babakbehzad mentioned this pull request Dec 19, 2023

Revert "Suraj/update triton main" #2

Merged

suraj-vathsa pushed a commit that referenced this pull request Dec 19, 2023

Revert "Suraj/update triton main (#1)" (#2)

dc4e2c5

This reverts commit 7b98b8b.
