Suraj/update triton main #1
…riton-inference-server#5696) * Modify timeout test in L0_sequence_batcher to use portable backend * Use identity backend that is built by default on Windows
…e-server#5716) * Use better value in timeout test L0_sequence_batcher * Format
…#5719) * Check TRT err msg more granularly * Clarify source of error messages * Consolidate tests for message parts
…rence-server#5727) * updating with pinned versions for python dependencies * updated with pinned sphinx and nbclient versions
…ence-server#5729) * Add testing for batcher init failure, add wait for status check * Formatting * Change search string
Add fastertransformer test that uses 1GPU.
* Don't use mem probe in Jetson * Clarify failure messages in L0_backend_python * Update copyright * Add JIRA ref, fix _test_jetson
* Add testing for python custom metrics API * Add custom metrics example to the test * Fix for CodeQL report * Fix test name * Address comment * Add logger and change the enum usage
* Add HTTP client plugin test * Add testing for HTTP asyncio * Add async plugin support * Fix qa container for L0_grpc * Add testing for grpc client plugin * Remove unused imports * Fix up * Fix L0_grpc models QA folder * Update the test based on review feedback * Remove unused import * Add testing for .plugin method
* Add --metrics-address, add tests to L0_socket, re-order CLI options for consistency * Use non-localhost address
…ence-server#5739) * Add HTTP basic auth test * Add testing for gRPC basic auth * Fix up * Remove unused imports
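For context, HTTP Basic authentication (RFC 7617) sends credentials in an `Authorization: Basic <base64(user:password)>` request header. Below is a minimal sketch of building that header with the standard library. `basic_auth_header` is a hypothetical helper for illustration. It is not the Triton client's API, and it assumes the username contains no colon.

```python
import base64


def basic_auth_header(username: str, password: str) -> dict:
    """Hypothetical helper: build an HTTP Basic auth header (RFC 7617)."""
    # RFC 7617 forbids ':' in the user-id because ':' separates it from the password.
    if ":" in username:
        raise ValueError("username must not contain ':'")
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return {"Authorization": f"Basic {token}"}


print(basic_auth_header("username", "password"))
# {'Authorization': 'Basic dXNlcm5hbWU6cGFzc3dvcmQ='}
```

Basic auth only base64-encodes the credentials and does not encrypt them, so in practice it is paired with TLS.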
…nce-server#5550) * Add multi-gpu, multi-stream testing for dlpack tensors
* Add model instance name update test * Add gap for timestamp to update * Add some tests with dynamic batching * Extend supported test on rate limit off * Continue test if off mode failed
(1) reduce MAX_ALLOWED_ALLOC to be more strict for bounded tests, and generous for unbounded tests. (2) allow unstable measurement from PA. (3) improve logging for future triage
* Add note on --metrics-address * Copyright
…riton ..." (triton-inference-server#5658) UnboundLocalError: local variable 'meta_dict' referenced before assignment. The above error occurs when listing models in the Triton model repository.
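The `UnboundLocalError` above is the classic symptom of a local variable that is assigned on only some code paths. Below is a minimal sketch of that pattern and the usual fix. The function names and the JSON metadata are hypothetical and do not reproduce the plugin's actual code.

```python
import json


def read_meta_buggy(meta_text):
    if meta_text:
        meta_dict = json.loads(meta_text)
    # When meta_text is empty, meta_dict is never bound on this path, so the
    # next line raises UnboundLocalError (the message wording varies by Python version).
    return meta_dict


def read_meta_fixed(meta_text):
    # Bind a default first so every code path defines meta_dict.
    meta_dict = {}
    if meta_text:
        meta_dict = json.loads(meta_text)
    return meta_dict


try:
    read_meta_buggy("")
except UnboundLocalError as err:
    print(type(err).__name__)  # UnboundLocalError

print(read_meta_fixed(""))               # {}
print(read_meta_fixed('{"name": "m"}'))  # {'name': 'm'}
```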
* Adding test for new sequence mode * Update option name * Clean up testing spacing and new lines
…RL (triton-inference-server#5686) * MLFlow Triton Plugin: Add support for s3 prefix and custom endpoint URL Signed-off-by: Xiaodong Ye <yeahdongcn@gmail.com> * Update the function order of config.py and use os.path.join to replace filtering a list of strings then joining Signed-off-by: Xiaodong Ye <yeahdongcn@gmail.com> * Update onnx flavor to support s3 prefix and custom endpoint URL Signed-off-by: Xiaodong Ye <yeahdongcn@gmail.com> * Fix two typos in MLFlow Triton plugin README.md Signed-off-by: Xiaodong Ye <yeahdongcn@gmail.com> * Address review comments (replace => strip) Signed-off-by: Xiaodong Ye <yeahdongcn@gmail.com> * Address review comments (init regex only for s3) Signed-off-by: Xiaodong Ye <yeahdongcn@gmail.com> * Remove unused local variable: slash_locations Signed-off-by: Xiaodong Ye <yeahdongcn@gmail.com> --------- Signed-off-by: Xiaodong Ye <yeahdongcn@gmail.com>
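The commits above add support for an S3 prefix and a custom endpoint URL. As an illustration only, the following is a minimal sketch that splits a model-repository URI into endpoint, bucket, and prefix. It assumes that a plain `s3://bucket/prefix` targets the default AWS endpoint and that an optional `[http(s)://]host:port/` segment after `s3://` selects a custom endpoint. `parse_s3_uri`, `S3Location`, the regex, and the `http://` default scheme are all hypothetical and are not the plugin's code.

```python
import re
from typing import NamedTuple, Optional


class S3Location(NamedTuple):
    endpoint_url: Optional[str]  # None means the default AWS endpoint
    bucket: str
    prefix: str  # path inside the bucket, without leading/trailing '/'


# Hypothetical pattern: an optional "[http(s)://]host:port/" segment before the bucket.
_S3_URI = re.compile(
    r"^s3://"
    r"(?:(?P<scheme>https?://)?(?P<host>[0-9a-zA-Z.\-]+):(?P<port>[0-9]+)/)?"
    r"(?P<bucket>[0-9a-z.\-]+)"
    r"(?:/(?P<prefix>.*))?$"
)


def parse_s3_uri(uri: str) -> S3Location:
    match = _S3_URI.match(uri)
    if match is None:
        raise ValueError(f"not a recognised s3 URI: {uri!r}")
    endpoint_url = None
    if match.group("host"):
        # Assumption: default to plain HTTP when no scheme is given.
        scheme = match.group("scheme") or "http://"
        endpoint_url = f"{scheme}{match.group('host')}:{match.group('port')}"
    # strip("/") removes only leading/trailing slashes, never inner ones.
    prefix = (match.group("prefix") or "").strip("/")
    return S3Location(endpoint_url, match.group("bucket"), prefix)


print(parse_s3_uri("s3://my-bucket/models/triton"))
print(parse_s3_uri("s3://https://minio.local:9000/models/triton/"))
```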
…triton-inference-server#6620) * Extend request objects lifetime * Remove explicit TRITONSERVER_InferenceRequestDelete * Format fix * Include the inference_request_ initialization to cover RequestNew --------- Co-authored-by: Neelay Shah <neelays@nvidia.com>
…ver#6638) This fixes the issue where the Python client fails with `AttributeError: 'NoneType' object has no attribute 'enum_types_by_name'` after the Python version is updated.
* Update README and versions for 2.40.0 / 23.11 (triton-inference-server#6544) * Removing path construction to use SymLink alternatives * Update version for PyTorch * Update windows Dockerfile configuration * Update triton version to 23.11 * Update README and versions for 2.40.0 / 23.11 * Fix typo * Adding 'ldconfig' to configure dynamic linking in container (triton-inference-server#6602) * Point to tekit_backend (triton-inference-server#6616) * Point to tekit_backend * Update version * Revert tekit changes (triton-inference-server#6640) --------- Co-authored-by: Kris Hung <krish@nvidia.com>
* New testing to confirm large request timeout values can be passed and retrieved within Python BLS models.
…rence-server#6663) * Add test for optional internal tensor within an ensemble * Fix up
* Set CMake version to 3.27.7 * Set CMake version to 3.27.7 * Fix double slash typo
* Mlflow plugin fix
* Unify iGPU test build with x86 ARM * adding TRITON_IGPU_BUILD to core build definition; adding logic to skip caffe2plan test if TRITON_IGPU_BUILD=1 * re-organizing some copies in Dockerfile.QA to fix igpu devel build * Pre-commit fix --------- Co-authored-by: kyle <kmcgill@kmcgill-ubuntu.nvidia.com>
…er#6705) * adding default value for TRITON_IGPU_BUILD=OFF * fix newline --------- Co-authored-by: kyle <kmcgill@kmcgill-ubuntu.nvidia.com>
…-server#6686) * Add test case for decoupled model raising exception * Remove unused import * Address comment
* vLLM Benchmarking Test
triton-inference-server#6639) * Add ability to configure gRPC max connection age and max connection age grace * Allow passing gRPC connection age args when they are set from the command line ---------- Co-authored-by: Katherine Yang <80359429+jbkyang-nvi@users.noreply.github.com>
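The gRPC core library exposes these two settings as the channel arguments `grpc.max_connection_age_ms` and `grpc.max_connection_age_grace_ms`. Below is a minimal sketch of forwarding them only when the user actually set them, which matches the behavior the commit describes. `connection_age_options` is a hypothetical helper. Triton's own C++ implementation and its command-line flag names are not reproduced here.

```python
from typing import List, Optional, Tuple


def connection_age_options(
    max_connection_age_ms: Optional[int] = None,
    max_connection_age_grace_ms: Optional[int] = None,
) -> List[Tuple[str, int]]:
    """Hypothetical helper: emit gRPC channel args only for values that were set."""
    options = []
    if max_connection_age_ms is not None:
        options.append(("grpc.max_connection_age_ms", max_connection_age_ms))
    if max_connection_age_grace_ms is not None:
        options.append(("grpc.max_connection_age_grace_ms", max_connection_age_grace_ms))
    return options


print(connection_age_options())              # []
print(connection_age_options(60_000, 5_000))
```

In the Python `grpcio` package, a list like this can be passed as the `options` argument of `grpc.server(...)`. Omitting an argument keeps gRPC's default, which imposes no maximum connection age.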
# },
"use_edit_page_button": False,
"use_issues_button": True,
"use_repository_button": True,
CodeQL code scanning warning: Duplicate key in dict literal. The earlier value for the key is overwritten.
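The excerpt does not show which key CodeQL flagged, so the keys below are illustrative. Python accepts a repeated key in a dict literal without error and silently keeps only the last value. The following minimal sketch shows that overwrite and a small `ast`-based check that detects it. `duplicate_dict_keys` is a hypothetical helper and is not part of CodeQL.

```python
import ast
from collections import Counter

# Legal Python: the second "use_repository_button" silently replaces the first.
options = {
    "use_issues_button": True,
    "use_repository_button": True,
    "use_repository_button": False,  # overwrites the entry above
}


def duplicate_dict_keys(source: str) -> list:
    """Return constant keys repeated within any single dict literal in source."""
    duplicates = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Dict):
            # node.keys holds None for `**mapping` entries; those are skipped.
            keys = [k.value for k in node.keys if isinstance(k, ast.Constant)]
            duplicates.extend(k for k, n in Counter(keys).items() if n > 1)
    return duplicates


print(options)  # {'use_issues_button': True, 'use_repository_button': False}
print(duplicate_dict_keys('d = {"a": 1, "b": 2, "a": 3}'))  # ['a']
```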
# Test combinations of BLS and decoupled API in Python backend.
model_name = "decoupled_bls_stream"
in_values = [4, 2, 0, 1]
shape = [1]
CodeQL code scanning notice: Unused local variable.
class Config(dict):
CodeQL code scanning warning: `__eq__` not overridden when adding attributes.
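CodeQL raises this warning when a subclass adds instance attributes but inherits an `__eq__` that ignores them. In that case, two objects can compare equal even though their extra state differs. Below is a minimal sketch of the fix. It assumes, as the nearby `s3_regex` fragment suggests, that `Config` keeps extra state such as `s3_regex` as an attribute. The constructor signature here is hypothetical. Because `dict` defines its own `__ne__`, that method must be overridden too, or `!=` would still ignore the attribute.

```python
class Config(dict):
    """Hypothetical stand-in: a dict that also carries an extra attribute."""

    def __init__(self, *args, s3_regex=None, **kwargs):
        super().__init__(*args, **kwargs)
        self.s3_regex = s3_regex  # not part of the dict contents

    def __eq__(self, other):
        if not isinstance(other, Config):
            return NotImplemented
        # Compare the dict contents and the extra attribute.
        return dict.__eq__(self, other) and self.s3_regex == other.s3_regex

    def __ne__(self, other):
        # dict provides its own __ne__, so invert our __eq__ explicitly.
        result = self.__eq__(other)
        return result if result is NotImplemented else not result


a = Config(x=1, s3_regex="a")
b = Config(x=1, s3_regex="b")
print(dict.__eq__(a, b))  # True: plain dict comparison ignores s3_regex
print(a == b, a != b)     # False True
```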
s3_regex
This reverts commit 7b98b8b.
No description provided.