Open
Labels
needs-triage: Indicates an issue or PR lacks a `triage/foo` label and requires one.
Description
What would you like to be added:
Add WeightedRandomPicker as a new picker plugin alongside the existing max-score-picker to provide
intelligent load balancing through weighted random sampling.
Core Features:
- Weighted Random Sampling: Distributes traffic probabilistically based on pod scores instead of always selecting the highest-scoring pod
- Optional Score Normalization: Four normalization types (None, Square Root, Capping, Logarithmic) for enhanced load distribution when needed
- Flexible API: Variadic parameters supporting everything from simple usage (NewWeightedRandomPicker(5)) to advanced configuration (NewWeightedRandomPicker(5, NormalizationCapping, 2.5))
- YAML Configuration Support: Standard picker configuration format with optional normalization parameters
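To make the sampling step concrete, here is a minimal Go sketch of score-proportional selection. It is not the proposed implementation: `ScoredPod` and `pickWeighted` are hypothetical names, the sketch picks a single pod rather than up to N, and the uniform fallback for all-zero scores is an assumption.

```go
package main

import (
	"fmt"
	"math/rand"
)

// ScoredPod is a hypothetical stand-in for the scheduler's scored-pod type.
type ScoredPod struct {
	Name  string
	Score float64
}

// pickWeighted returns the index of one pod, chosen with probability
// proportional to its score. r must be a uniform draw in [0, 1).
// It returns -1 for an empty slice. If no pod has a positive score,
// it falls back to a uniform choice (an assumption of this sketch).
func pickWeighted(pods []ScoredPod, r float64) int {
	if len(pods) == 0 {
		return -1
	}
	total := 0.0
	for _, p := range pods {
		if p.Score > 0 {
			total += p.Score
		}
	}
	if total == 0 {
		return int(r * float64(len(pods)))
	}
	// Walk the cumulative sum until it passes the target.
	target := r * total
	cum := 0.0
	last := -1
	for i, p := range pods {
		if p.Score <= 0 {
			continue // non-positive scores never get picked
		}
		cum += p.Score
		last = i
		if target < cum {
			return i
		}
	}
	return last // guards against floating-point rounding at the upper edge
}

func main() {
	pods := []ScoredPod{{"pod-a", 100}, {"pod-b", 20}, {"pod-c", 10}}
	counts := make([]int, len(pods))
	rng := rand.New(rand.NewSource(1))
	const trials = 10000
	for i := 0; i < trials; i++ {
		counts[pickWeighted(pods, rng.Float64())]++
	}
	for i, p := range pods {
		fmt.Printf("%s: %.1f%%\n", p.Name, 100*float64(counts[i])/trials)
	}
}
```

Passing the random draw in as a parameter keeps the selection logic deterministic and easy to unit test; a real picker would more likely own its random source.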
Implementation:
- New weighted-random-picker type in the scheduler framework
- Progressive normalization options for extreme score variations
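As a rough illustration of how the four normalization types could reshape scores before sampling, consider the Go sketch below. Only the name `NormalizationCapping` appears in this issue; the other constant names, the `normalize` helper, and the exact formulas are assumptions. In particular, the sketch interprets the capping parameter (2.5 in the example above) as a multiple of the smallest positive score, which may not match the actual proposal.

```go
package main

import (
	"fmt"
	"math"
)

// Normalization selects how raw scores are reshaped before sampling.
// Only NormalizationCapping is named in the issue; the rest are hypothetical.
type Normalization int

const (
	NormalizationNone Normalization = iota
	NormalizationSqrt
	NormalizationCapping
	NormalizationLog
)

// normalize returns a reshaped copy of scores; the input is not modified.
// capRatio is only used by NormalizationCapping, where each score is limited
// to capRatio times the smallest positive score (an assumed interpretation).
func normalize(scores []float64, kind Normalization, capRatio float64) []float64 {
	out := make([]float64, len(scores))
	copy(out, scores)
	switch kind {
	case NormalizationSqrt:
		for i, s := range out {
			out[i] = math.Sqrt(math.Max(s, 0))
		}
	case NormalizationLog:
		for i, s := range out {
			out[i] = math.Log1p(math.Max(s, 0)) // log(1+s) keeps a zero score at zero
		}
	case NormalizationCapping:
		minPos := math.Inf(1)
		for _, s := range out {
			if s > 0 && s < minPos {
				minPos = s
			}
		}
		if math.IsInf(minPos, 1) {
			return out // no positive scores, nothing to cap against
		}
		limit := capRatio * minPos
		for i, s := range out {
			if s > limit {
				out[i] = limit
			}
		}
	}
	return out
}

func main() {
	scores := []float64{100, 20, 10}
	fmt.Println("none:   ", normalize(scores, NormalizationNone, 0))
	fmt.Println("sqrt:   ", normalize(scores, NormalizationSqrt, 0))
	fmt.Println("capping:", normalize(scores, NormalizationCapping, 2.5))
	fmt.Println("log:    ", normalize(scores, NormalizationLog, 0))
}
```

Each option compresses the gap between the top score and the rest to a different degree, which is presumably what "progressive" refers to: the flatter the normalized weights, the more evenly traffic spreads.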
Why is this needed:
Critical Problem: Hot-spotting with max-score-picker
llm-d/llm-d-inference-scheduler#298
The max-score-picker scheduler always routes requests to the highest-scoring pod, which creates severe production issues in high-traffic environments:
Current Behavior (Problematic):
High-traffic scenario with max-score-picker:
Pod Scores: Pod A=100, Pod B=20, Pod C=10
Traffic Distribution:
Pod A: 100% traffic → Queue length: 25-30
Pod B: 0% traffic → Completely idle → Wasted resources
Pod C: 0% traffic → Completely idle → Wasted resources
Result: System bottleneck at single "best" pod
Business Impact:
- Performance Degradation: Queue buildup causes 3-5x latency increases
- Resource Waste: High-quality pods sit idle while one pod is overwhelmed
- Poor Scalability: Adding more pods doesn't help since traffic still goes to one pod
- Unreliable Service: Single point of failure at the highest-scoring pod
Solution Impact with WeightedRandomPicker:
Same scenario with weighted-random-picker:
Pod Scores: Pod A=100, Pod B=20, Pod C=10
Traffic Distribution:
Pod A: 77% traffic → Queue length: 15-20
Pod B: 15% traffic → Queue length: 2-3 → Actually utilized
Pod C: 8% traffic → Queue length: 1 → Actually utilized
Result: Better performance + resource utilization + system resilience
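The 77% / 15% / 8% split above is each score divided by the sum of all scores (100 + 20 + 10 = 130). A small Go sketch of that arithmetic, assuming no normalization is applied; `trafficShares` is a hypothetical helper, not part of the proposal:

```go
package main

import "fmt"

// trafficShares returns the expected fraction of traffic each pod receives
// under plain score-proportional sampling. It returns nil if the scores
// do not sum to a positive value.
func trafficShares(scores []float64) []float64 {
	total := 0.0
	for _, s := range scores {
		total += s
	}
	if total <= 0 {
		return nil
	}
	shares := make([]float64, len(scores))
	for i, s := range scores {
		shares[i] = s / total
	}
	return shares
}

func main() {
	names := []string{"Pod A", "Pod B", "Pod C"}
	for i, share := range trafficShares([]float64{100, 20, 10}) {
		// 100/130, 20/130 and 10/130 round to 77%, 15% and 8%.
		fmt.Printf("%s: %.0f%%\n", names[i], 100*share)
	}
}
```

These are expected long-run shares; over a short window the observed split will vary around them.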
Why Weighted Random Sampling:
- Maintains Intelligence: Still prefers higher-scoring pods (77% vs 15% vs 8%)
- Prevents Hot-spotting: No single pod gets 100% traffic
- Improves Throughput: Utilizes cluster capacity instead of creating idle resources
- Zero Overhead: Basic weighted sampling has the same performance as max-score-picker