Insights: ai-dynamo/dynamo
Overview
136 Pull requests merged by 39 people
-
docs: Simplify sphinx build and table of contents on webpage (#2519)
#2703 merged
Aug 26, 2025 -
docs: Simplify sphinx build and table of contents on webpage
#2519 merged
Aug 25, 2025 -
feat: enable --dyn-reasoning-parser flag to set reasoning parser for …
#2700 merged
Aug 25, 2025 -
refactor: move uptime tracking from system_status_server(HTTP) to DRT level
#2587 merged
Aug 25, 2025 -
feat: python bindings for the entire KvPushRouter
#2658 merged
Aug 25, 2025 -
chore: vllm 0.10.1.1
#2691 merged
Aug 25, 2025 -
refactor: Switch ModelManager locks from `std::sync::Mutex` to `parking_lot::Mutex`
#2696 merged
Aug 25, 2025 -
feat: FT Python Context and Unit Tests
#2677 merged
Aug 25, 2025 -
feat: support HF_HOME/_ENDPOINT env for Hugging Face models
#2642 merged
Aug 25, 2025 -
fix: do not fail if backendFramework cannot be detected (#2655)
#2690 merged
Aug 25, 2025 -
feat: add prometheus to the runtime image for sglang (#2689)
#2692 merged
Aug 25, 2025 -
feat: add prometheus to the runtime image for sglang
#2689 merged
Aug 25, 2025 -
fix: fix env vars override (#2640)
#2688 merged
Aug 25, 2025 -
fix: increase shm default size and make it configurable (#2616)
#2687 merged
Aug 25, 2025 -
chore: vllm 0.10.1.1
#2641 merged
Aug 25, 2025 -
feat: add gpt oss reasoning parser through harmony
#2656 merged
Aug 25, 2025 -
feat: add initial batch of KVBM metrics on match, offload and onboard
#2673 merged
Aug 25, 2025 -
fix: correct planner test example after tokenizer fix
#2674 merged
Aug 25, 2025 -
fix: fix manual helm chart (#2648)
#2686 merged
Aug 25, 2025 -
DYN-952: Implement BLAKE3 content hashing and 32MiB size validation
#2685 merged
Aug 25, 2025 -
feat: add multimodal deployment example for llava based on vllm v1 #2628
#2672 merged
Aug 25, 2025 -
fix: prevent crash looping hello world (#2625)
#2671 merged
Aug 25, 2025 -
fix: pytest robustness and parsing error
#2676 merged
Aug 24, 2025 -
fix: increase shm default size and make it configurable
#2616 merged
Aug 24, 2025 -
fix: fix env vars override
#2640 merged
Aug 24, 2025 -
fix: do not fail if backendFramework cannot be detected
#2655 merged
Aug 24, 2025 -
fix: fix manual helm chart
#2648 merged
Aug 24, 2025 -
fix: Skip checksum tests in release mode since they're not computed
#2669 merged
Aug 23, 2025 -
fix: move metrics registration to service creation
#2664 merged
Aug 22, 2025 -
fix: hello world deployment liveness probe and match message responses with readme
#2634 merged
Aug 22, 2025 -
chore: Rust to 1.89 and edition 2024
#2659 merged
Aug 22, 2025 -
fix: missing tokenizer args in sla_planner.py
#2667 merged
Aug 22, 2025 -
fix: handle missing span_name in logging test assertions
#2665 merged
Aug 22, 2025 -
feat: [vLLM] implement cli args for tool and reasoning parsers
#2619 merged
Aug 22, 2025 -
feat: enable dynamo metrics on KVBM
#2626 merged
Aug 22, 2025 -
docs: Update supported model in readme for multimodal
#2651 merged
Aug 22, 2025 -
fix: Tests now pass with RUST_BACKTRACE set
#2647 merged
Aug 22, 2025 -
feat: add BaseLogitsProcessor core interface
#2613 merged
Aug 22, 2025 -
feat: Remove Duplicate Multimodal Nixl Connect Example
#2622 merged
Aug 22, 2025 -
docs: update trtllm know issue message (#2639)
#2643 merged
Aug 22, 2025 -
docs: update trtllm know issue message
#2639 merged
Aug 22, 2025 -
fix: Update pytest markers for deepep test.
#2561 merged
Aug 22, 2025 -
chore(llm): Rename protocols::Endpoint to EndpointId
#2615 merged
Aug 22, 2025 -
feat: add multimodal deployment example for llava based on vllm v1
#2628 merged
Aug 22, 2025 -
fix: 0.4.1 disable kvbm tests (CP #2611)
#2635 merged
Aug 22, 2025 -
fix: functional tests to use qwen (CP #2617)
#2618 merged
Aug 22, 2025 -
fix: set functional tests to use qwen
#2617 merged
Aug 22, 2025 -
fix: Update tensorrt_llm to 1.0.0rc6 (#2606)
#2630 merged
Aug 22, 2025 -
fix: prevent crash looping hello world
#2625 merged
Aug 22, 2025 -
fix: Update tensorrt_llm to 1.0.0rc6
#2606 merged
Aug 22, 2025 -
fix: --kv-event-config now respects command line
#2627 merged
Aug 21, 2025 -
feat: improve devcontainer setup and documentation
#2578 merged
Aug 21, 2025 -
fix: guard inflight_requests and request_duration from early returns.
#2576 merged
Aug 21, 2025 -
fix: Httpengine sync-enable-endpoint
#2591 merged
Aug 21, 2025 -
feat: improve dynamo_check.py messaging & instructions
#2453 merged
Aug 21, 2025 -
docs: fix doc for kvbm
#2621 merged
Aug 21, 2025 -
chore: Remove async-openai-macros (#2554)
#2609 merged
Aug 21, 2025 -
docs: add trtllm known issue for al2023 (#2604)
#2612 merged
Aug 21, 2025 -
docs: add trtllm known issue for al2023 (#2604)
#2614 merged
Aug 21, 2025 -
feat: use consistent small models across all deploy examples (#2573)
#2607 merged
Aug 21, 2025 -
docs: add trtllm known issue for al2023
#2604 merged
Aug 21, 2025 -
chore: remove circus dependency
#2602 merged
Aug 21, 2025 -
test: Temporary exclude kvbm tests from vllm, nightly, gpu1...
#2611 merged
Aug 21, 2025 -
feat: enable basic reasoning parsing of <think> </think> tokens
#2555 merged
Aug 21, 2025 -
fix: Remove metric_labels from multimodal codes.
#2608 merged
Aug 21, 2025 -
chore: Update UCX to 1.19.0 release (#2551)
#2610 merged
Aug 21, 2025 -
chore: Update UCX to 1.19.0 release
#2551 merged
Aug 21, 2025 -
chore: fix `skip-tokenizer-init`
#2605 merged
Aug 21, 2025 -
chore: fix wideep h100
#2568 merged
Aug 21, 2025 -
feat: h100 wideep instructions fix
#2594 merged
Aug 21, 2025 -
fix: `skip-tokenizer-init` by default in sglang
#2595 merged
Aug 21, 2025 -
feat: Add model label for vllm backend metrics
#2474 merged
Aug 21, 2025 -
fix: add prometheus to the runtime image (#2565)
#2603 merged
Aug 21, 2025 -
fix: limit Support for HTTP Body limit in axum server
#2581 merged
Aug 21, 2025 -
fix: ensure nats fails fast with jetstream failure
#2590 merged
Aug 21, 2025 -
fix: turn off kvbm for al2023 support
#2533 merged
Aug 21, 2025 -
feat: register Kv router instance into etcd
#2548 merged
Aug 21, 2025 -
chore: Remove Clone / Sync from DeltaGenerator
#2598 merged
Aug 21, 2025 -
fix: removed legacy kustomization reference
#2043 merged
Aug 21, 2025 -
refactor: share common drt test functions
#2583 merged
Aug 21, 2025 -
fix: profiling script missing tests when kv cache is tight
#2567 merged
Aug 21, 2025 -
chore: add planner code owners
#2570 merged
Aug 21, 2025 -
docs: change sglang hicache example to use hicache-ratio
#2588 merged
Aug 21, 2025 -
fix: add prometheus to the runtime image
#2565 merged
Aug 21, 2025 -
feat: use consistent small models across all deploy examples
#2573 merged
Aug 21, 2025 -
docs: change sglang hicache example to use hicache-ratio
#2582 merged
Aug 21, 2025 -
docs: Add pinned install for cuda-python in pip install path (#2553)
#2569 merged
Aug 20, 2025 -
feat: allow user to adjust kv_transfer_config
#2517 merged
Aug 20, 2025 -
fix: Increase timeout for deepep test (#2572)
#2575 merged
Aug 20, 2025 -
feat(sglang): allow for multi worker spin up in slurm scripts
#2328 merged
Aug 20, 2025 -
fix: Increase timeout for deepep test
#2572 merged
Aug 20, 2025 -
docs: Add health check section to GPT OSS guide (#2556)
#2571 merged
Aug 20, 2025 -
fix(hub): Download faster from Hugging Face
#2566 merged
Aug 20, 2025 -
feat: Added CudaEvent interface
#2536 merged
Aug 20, 2025 -
test: add dryrun mode for sla planner
#2557 merged
Aug 20, 2025 -
feat: cuda prototype v2
#2563 merged
Aug 20, 2025 -
fix: resources in wrong location in vllm disag_router example (#2558)
#2560 merged
Aug 20, 2025 -
feat(request cancellation): pycontext, propagating the `is_stopped` into python land.
#2158 merged
Aug 20, 2025 -
chore: Remove async-openai-macros
#2554 merged
Aug 20, 2025 -
fix: resources in wrong location in vllm disag_router example
#2558 merged
Aug 20, 2025 -
docs: Add pinned install for cuda-python in pip install path
#2553 merged
Aug 20, 2025 -
feat: upload/download rust structs directly through NATs object store
#2540 merged
Aug 20, 2025 -
docs: Add health check section to GPT OSS guide
#2556 merged
Aug 20, 2025 -
feat: add --tokenizer_path to profile_endpoint.py. close #2652
#2550 merged
Aug 20, 2025 -
feat: enhance devcontainer configuration and documentation
#2255 merged
Aug 20, 2025 -
test: lychee offline in pre-merge CI (full check only on push to main)
#2549 merged
Aug 20, 2025 -
feat: added parsers lib
#2542 merged
Aug 20, 2025 -
test: Add deepep test for vllm
#2534 merged
Aug 20, 2025 -
chore: Bumped Dynamo version to 0.4.1
#2545 merged
Aug 20, 2025 -
fix: Dockerfile.sglang - Fixed sglang and dynamo wheels installaiton in ru…
#2537 merged
Aug 20, 2025 -
chore: remove flatten for chat response types, add reasoning_content
#2543 merged
Aug 20, 2025 -
docs: update grove ref to new url
#2538 merged
Aug 19, 2025 -
fix: Fix KVBM Guide
#2539 merged
Aug 19, 2025 -
feat: Use health check and improve instructions for perf sweeps
#2423 merged
Aug 19, 2025 -
feat: Rename dynamo_component_concurrent_requests
#2515 merged
Aug 19, 2025 -
chore: Bring async-openai into repo as request starter
#2520 merged
Aug 19, 2025 -
chore: add hhzhang16 to py codeowner
#2529 merged
Aug 19, 2025 -
docs: Add note for LMCache ARM support
#2535 merged
Aug 19, 2025 -
fix: Correct the metric name and labels.
#2508 merged
Aug 19, 2025 -
chore: Finish vllm upgrade to 0.10.1 + cleanup
#2528 merged
Aug 19, 2025 -
feat: kvbm + connector
#2258 merged
Aug 19, 2025 -
fix: add --enable-kvbm doc to build.sh help
#2530 merged
Aug 19, 2025 -
fix: Implement scaling Grove subresources
#2531 merged
Aug 19, 2025 -
ci: run gitlab triggers on runner group
#2532 merged
Aug 19, 2025 -
feat: task scheduler
#2406 merged
Aug 19, 2025 -
docs: Consolidate documentation and fix redundant headings
#2518 merged
Aug 19, 2025 -
feat: add a knob to turn off correction factor in sla planner
#2511 merged
Aug 19, 2025 -
docs: update support matrix with links to ngc
#2524 merged
Aug 19, 2025 -
feat: add mistral and phi4 tool parser
#2510 merged
Aug 19, 2025 -
feat(frontend): support setting HTTP host via CLI (--http-host)
#2523 merged
Aug 19, 2025 -
feat: skip router when worker id is pre-determined
#2450 merged
Aug 19, 2025 -
fix: Use `/usr` dir prefix when freeing disk space
#2514 merged
Aug 19, 2025 -
feat: router-level request rejection
#2465 merged
Aug 19, 2025
40 Pull requests opened by 30 people
-
test: add tests for replica calculation and planner scaling
#2525 opened
Aug 19, 2025 -
ci: add vllm job
#2526 opened
Aug 19, 2025 -
feat: add vllm aggregated multinode deployment example
#2541 opened
Aug 19, 2025 -
feat: DIS-373 dynamo KVBM connector API integration with TRTLLM
#2544 opened
Aug 19, 2025 -
Feat: Remove batch_token_ids and replace token_ids with nested Vec
#2546 opened
Aug 20, 2025 -
feat: Prevent double-tokenization when EPP picks worker
#2559 opened
Aug 20, 2025 -
fix: revisit grove and LWS selection
#2564 opened
Aug 20, 2025 -
feat: Integrate Model Express Client into Dynamo Model Downloads
#2574 opened
Aug 20, 2025 -
fix: container/Dockerfile.trtllm - use pytorch 2.8.0a0+5228986c39.nv25.5
#2579 opened
Aug 20, 2025 -
ci: Use --release in pre-merge-rust builds to reduce disk space usage
#2585 opened
Aug 21, 2025 -
feat: delay python stream until yield
#2592 opened
Aug 21, 2025 -
fix: TRTLLM container -copy NVIDIA torch lib into uv venv location
#2597 opened
Aug 21, 2025 -
feat: cuda update
#2599 opened
Aug 21, 2025 -
feat: cuda updates
#2600 opened
Aug 21, 2025 -
feat: add benchmarking guide
#2620 opened
Aug 21, 2025 -
[WIP] slurm deployment
#2623 opened
Aug 21, 2025 -
docs: add mermaid graph to .devcontainer/README.md
#2632 opened
Aug 22, 2025 -
feat: Deployment for Dynamo EPP - aware gateway
#2633 opened
Aug 22, 2025 -
[DO NOT MERGE] deploy 1+ frontends using nginx
#2636 opened
Aug 22, 2025 -
feat: HF_ENDPOINT addition
#2637 opened
Aug 22, 2025 -
feat: KServe gRPC support
#2638 opened
Aug 22, 2025 -
fix: deploy readme changes based on customer feedback
#2645 opened
Aug 22, 2025 -
fix: fix #2653: links of h100_prefill_performance.png and h100_decode_performance.png
#2650 opened
Aug 22, 2025 -
Add '_set_dev_version.py' file
#2657 opened
Aug 22, 2025 -
Initial shot at migrating to aiperf
#2663 opened
Aug 22, 2025 -
feat: add model metric_labels to trtllm component
#2666 opened
Aug 22, 2025 -
feat: Sglang metrics labels.
#2679 opened
Aug 23, 2025 -
fix: block_manager/disk - fix `fallocate` failure on some file systems
#2680 opened
Aug 23, 2025 -
Add SLURM deployment guide with sbatch
#2681 opened
Aug 24, 2025 -
fix: removed hardcoded metadata name "vllm-agg" for profiling job
#2682 opened
Aug 24, 2025 -
fix: [trtllm] add wait_for_instance before register_llm
#2683 opened
Aug 25, 2025 -
feat: Set up Oscar Rust workspace with dynamo-runtime integration (DYN-951)
#2684 opened
Aug 25, 2025 -
feat: Add vllm multimodal qwen aggregated support
#2694 opened
Aug 25, 2025 -
feat: align OpenAI response IDs with distributed trace IDs
#2695 opened
Aug 25, 2025 -
feat: Shutdown DRT when vLLM engine fails
#2698 opened
Aug 25, 2025 -
feat: add reference setup for dynamo logging in k8s with loki
#2699 opened
Aug 25, 2025 -
fix: serve endpoint before model registration
#2701 opened
Aug 25, 2025 -
feat: add logits processor support for trtllm backend
#2702 opened
Aug 25, 2025 -
feat: add Prometheus metrics integration for KvStats
#2704 opened
Aug 26, 2025
28 Issues closed by 13 people
-
[FEATURE]: Allow passing in router_config_override for each request
#2697 closed
Aug 25, 2025 -
[FEATURE]: Make Python bindings for KvPushRouter
#2662 closed
Aug 25, 2025 -
[FEATURE]: Structured Output Support for all backends
#2220 closed
Aug 25, 2025 -
[FEATURE]: Pass user_data to register_llm for LoRA support
#2268 closed
Aug 25, 2025 -
[FEATURE]: Add deprecation warning for nvext
#2412 closed
Aug 25, 2025 -
[BUG]: OSError: libdynamo_llm_capi.so: cannot open shared object file: No such file or directory
#1500 closed
Aug 25, 2025 -
[FEATURE]: Implement automatic rate limiting when all KV router workers are busy
#2165 closed
Aug 25, 2025 -
[BUG]: Link doesn't work in the readme of example multimodal
#2162 closed
Aug 25, 2025 -
[BUG]: dynamo 0.2.1, "/examples/vllm_v1/ " Failed to generate completions
#1372 closed
Aug 25, 2025 -
[bug] intermittent logging::tests::test_json_log_capture failures
#2586 closed
Aug 22, 2025 -
[BUG]: Remove duplicate nixl_connect implementations
#2481 closed
Aug 22, 2025 -
[Question]: Why choose Rust as the primary development language?
#2589 closed
Aug 22, 2025 -
[FEATURE]: Allow `decode TP < prefill TP` in disaggregated Prefill/Decode
#2295 closed
Aug 22, 2025 -
[FEATURE]: Make HTTP server max body size limit configurable
#2584 closed
Aug 22, 2025 -
[BUG]: Large embedding or chat requests over 2MB are dropped.
#2580 closed
Aug 22, 2025 -
[bug]: missing KV Cache metrics
#2370 closed
Aug 21, 2025 -
[BUG]: Custom app=FastAPI() Configuration Not Taking Effect Due to Missing Assignment
#1855 closed
Aug 21, 2025 -
BUG: Subprocess logs with dynamo-run are confusing
#1568 closed
Aug 20, 2025 -
[FEATURE]: NIXL prepare/make xfer list rust bindings - missing API (moved to next PR)
#935 closed
Aug 20, 2025 -
Unable to build Dynamo Docker image due to inaccessible BASE_IMAGE in Dockerfile.tensorrt_llm
#460 closed
Aug 19, 2025 -
[BUG]: TypeError("unsupported operand type(s) for +=: 'float' and 'NoneType'")
#2139 closed
Aug 19, 2025 -
[BUG]: vLLM V1 Deepseek R1 Multinode Inference fails with NIXL Error
#1519 closed
Aug 19, 2025 -
[BUG]: UCX error when using dynamo vllm_v1 on IL1/GAIA
#1758 closed
Aug 19, 2025 -
[BUG]: Default password for nats
#1699 closed
Aug 19, 2025
13 Issues opened by 11 people
-
[BUG]: run large model on multiple node by vllm with nixl connector
#2706 opened
Aug 26, 2025 -
[BUG]: dynamo v0.4.0 Multi-node Disaggregated Serving not work
#2705 opened
Aug 26, 2025 -
[BUG]: links of h100_prefill_performance.png and h100_decode_performance.png are incorrect
#2653 opened
Aug 22, 2025 -
[Roadmap]: 0.4.1 - 0.5.0 roadmap and key dates
#2649 opened
Aug 22, 2025 -
[FEATURE]: Dynamo should support `HF_ENDPOINT` while connects to huggingface.
#2631 opened
Aug 22, 2025 -
[BUG]: TP size overwritten by engine config without warning
#2624 opened
Aug 21, 2025 -
[BUG]: reasoning parser not yet effective
#2596 opened
Aug 21, 2025 -
[BUG]: No error on some components when running NATS without `--jetstream`
#2577 opened
Aug 20, 2025 -
[BUG]: Dynamo+vLLM pd performance is slower than vllm
#2552 opened
Aug 20, 2025 -
[BUG]: Wrong NixTransfer in KVBM Device to Disk
#2547 opened
Aug 20, 2025 -
[Question]: When will tool function calling be supported?
#2522 opened
Aug 19, 2025 -
[Question]: vllm connector with dynamo
#2521 opened
Aug 19, 2025
62 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
-
feat: Add Encode Worker and NIXL support to trtllm multimodal flow
#2452 commented on
Aug 22, 2025 • 41 new comments -
feat: align OpenAI response IDs with distributed trace IDs
#2496 commented on
Aug 25, 2025 • 6 new comments -
feat: dynamo namespace isolation for frontend component
#2394 commented on
Aug 19, 2025 • 3 new comments -
feat: dynamo namespace isolation for backend component
#2475 commented on
Aug 19, 2025 • 2 new comments -
Add AWS ECS deployment example for Dynamo vLLM
#2415 commented on
Aug 19, 2025 • 2 new comments -
refactor: update devcontainer
#2025 commented on
Aug 21, 2025 • 0 new comments -
feat: Add multimodal support fields to NVIDIA extensions (nvext) inside request
#1995 commented on
Aug 21, 2025 • 0 new comments -
feat: add rate limiter logic to dynamo's openai api compatible http service (v1)
#1949 commented on
Aug 21, 2025 • 0 new comments -
Update worker.py for the TensorRT-LLM example to account for chat_template.jinja file
#1537 commented on
Aug 23, 2025 • 0 new comments -
refactor: create descriptor and entity separation
#1489 commented on
Aug 25, 2025 • 0 new comments -
fix: rename deploy related entities
#1226 commented on
Aug 25, 2025 • 0 new comments -
[BUG]: Inconsistency in Dynamo Namespace environment keys
#2477 commented on
Aug 25, 2025 • 0 new comments -
[FEATURE]: Support for whisper-large-v3-turbo (ASR)
#2374 commented on
Aug 25, 2025 • 0 new comments -
[FEATURE]: Request Migration when Decode Worker Failed/Shutdown
#2310 commented on
Aug 25, 2025 • 0 new comments -
[BUG]: failed to join writer task: I/O error: Broken pipe (os error 32)
#1594 commented on
Aug 25, 2025 • 0 new comments -
[FEATURE]: Revisit aligning `chatcmpl-<ID>` with other notions of request ID throughout
#2248 commented on
Aug 25, 2025 • 0 new comments -
[Question]: How to integrate pipeline parallel in PD disaggregation?
#2513 commented on
Aug 25, 2025 • 0 new comments -
[BUG]: K8s Operator Won't Start Pods in DGD if resources/limits is not set
#2227 commented on
Aug 25, 2025 • 0 new comments -
ci: build runtime target in github vllm build
#2036 commented on
Aug 22, 2025 • 0 new comments -
refactor: entity descriptors
#2047 commented on
Aug 21, 2025 • 0 new comments -
feat: NVTX tracing for block manager
#2081 commented on
Aug 23, 2025 • 0 new comments -
feat: hello world deploy example
#2094 commented on
Aug 24, 2025 • 0 new comments -
chore: Add codeowners for tensorrtllm backend
#2229 commented on
Aug 22, 2025 • 0 new comments -
Adding determinism pytest
#2329 commented on
Aug 19, 2025 • 0 new comments -
test: HTTP Request Cancellation E2E Testing
#2350 commented on
Aug 25, 2025 • 0 new comments -
feat: Request Migration when Decode Worker Failed/Shutdown (sglang)
#2352 commented on
Aug 25, 2025 • 0 new comments -
Not processed inflight request handling
#2365 commented on
Aug 19, 2025 • 0 new comments -
feat: Pass CancelledError into Python asynchronous generator object upon stream cancellation
#2385 commented on
Aug 19, 2025 • 0 new comments -
ci: Improve the stability of CI tests
#2467 commented on
Aug 20, 2025 • 0 new comments -
fix: Standardize namespace environment variable to DYN_NAMESPACE
#2485 commented on
Aug 19, 2025 • 0 new comments -
feat: FT Request Cancellation feature and test for 0.5.0
#2500 commented on
Aug 25, 2025 • 0 new comments -
Error processing video request in multimodal example
#1946 commented on
Aug 19, 2025 • 0 new comments -
[SGLANG]: Backend Roadmap/Improvements
#2261 commented on
Aug 19, 2025 • 0 new comments -
[FEATURE]: request_template should be applied to completion endpoint as well
#2494 commented on
Aug 19, 2025 • 0 new comments -
[BUG]: Looking forward to see the example of deploying large model with vLLM
#2479 commented on
Aug 19, 2025 • 0 new comments -
[FEATURE]: Add a knob to include LLMMetricAnnotation in SSE stream
#1516 commented on
Aug 19, 2025 • 0 new comments -
[BUG]: No module named 'vllm.remote_prefill'
#1509 commented on
Aug 19, 2025 • 0 new comments -
[BUG]: Dynamo operator can't be built due to legacy kustomization referencing
#2042 commented on
Aug 19, 2025 • 0 new comments -
[BUG]: NIXL is not available
#2092 commented on
Aug 19, 2025 • 0 new comments -
How to run TRT engine with TRT-LLM backend?
#336 commented on
Aug 19, 2025 • 0 new comments -
[FEATURE]: Cancellation support with trtllm
#676 commented on
Aug 19, 2025 • 0 new comments -
[CLEANUP]: Remove `batch_token_ids` and replace `token_ids` with nested `Vec` for cleaner processing
#1648 commented on
Aug 20, 2025 • 0 new comments -
[BUG]: Failed uploading to bucket / object store gpt-oss-120b/tokenizer.json: failed publishing object chunks: failed getting chunk ack: timed out: didn't receive ack in time
#2443 commented on
Aug 20, 2025 • 0 new comments -
[FEATURE]: Migrate from Earthly to docker
#755 commented on
Aug 20, 2025 • 0 new comments -
[BUG]: Dynamo+vLMM: performance degradation in D/A mode
#2028 commented on
Aug 22, 2025 • 0 new comments -
[BUG]: libcublas_static.a is not installed in the base dev image
#1791 commented on
Aug 22, 2025 • 0 new comments -
[FEATURE]: Enable WideEP sweep on DSR1 SGLang with profiler
#1665 commented on
Aug 22, 2025 • 0 new comments -
[BUG]: For genai-perf benchmarking, use completions mode to replace chat mode, and use batch_size to replace concurrency.
#2013 commented on
Aug 23, 2025 • 0 new comments -
[Roadmap]: H2 timelines and key focus areas
#2486 commented on
Aug 24, 2025 • 0 new comments -
[BUG]: dynamo performace not scaling properly
#1495 commented on
Aug 25, 2025 • 0 new comments -
[BUG]: Dynamo currently requires GPU 0 to be available and cannot run without it
#1492 commented on
Aug 25, 2025 • 0 new comments -
[BUG]: Race condition in KV router cause stuck at fault
#1419 commented on
Aug 25, 2025 • 0 new comments -
[DOCS] Documentation/Steps to reproduce KV cache offloading
#2147 commented on
Aug 25, 2025 • 0 new comments -
[FEATURE]: Resource selection
#2110 commented on
Aug 25, 2025 • 0 new comments -
It cannot be used in k8s
#2106 commented on
Aug 25, 2025 • 0 new comments -
[FEATURE]: How to replace UCX with Mooncake for nixl when starting Disaggregated Serving?
#2093 commented on
Aug 25, 2025 • 0 new comments -
[BUG]: dynamo operator lws worker not serving while occupy GPUs
#2214 commented on
Aug 25, 2025 • 0 new comments -
[BUG]: Frontend is not using DYN_NAMESPACE nor be aware of Dynamo Namespace
#2478 commented on
Aug 25, 2025 • 0 new comments -
[FEATURE]: VLLM PD supports skipping the prefill process when the cache hit rate is high
#2483 commented on
Aug 25, 2025 • 0 new comments -
[DOCS]: Bring back benchmarking guide
#2031 commented on
Aug 25, 2025 • 0 new comments -
[BUG]: Rust backend does not use HF_HOME
#2491 commented on
Aug 25, 2025 • 0 new comments -
[BUG]: Incorrect document in disagg-multinode.yaml with sglang
#2512 commented on
Aug 25, 2025 • 0 new comments