[Task] improve gray release logic #6071

Description

@Aias00

An API gateway is a central entry point for all client requests, making it an ideal place to manage traffic and implement deployment strategies like grayscale (or canary) releases. The core idea is to route a small subset of users to a new version of a service while the majority continue to use the stable version. This minimizes the risk of introducing a bug to the entire user base.

Here’s how a gateway typically achieves this:

  1. Traffic Routing Based on Request Attributes
    The gateway inspects incoming requests and decides where to route them based on specific rules. Common attributes used for routing include:

HTTP Headers: This is a common method. A specific header, like X-Canary: true or X-Version: v2, is added to requests. The gateway reads this header and routes the request to the new (canary) version of the service. This is often used for internal testing or for letting specific users opt in to new features.

User Identity / JWT Claims: If a user is authenticated, the gateway can inspect their JWT (JSON Web Token) or session information. You can roll out the new version to users with specific roles (e.g., role: "beta-tester") or to a list of specific user IDs.

IP Address / Geolocation: You can route traffic based on the source IP address. This is useful for rolling out a feature to a specific office, region, or country first.

Query Parameters: A URL query parameter, like ?use_canary=true, can also be used to direct traffic, which is useful for debugging or temporary access.
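The attribute checks above can be sketched as a single routing function. This is a minimal illustration, not the gateway's actual implementation: the `Request` shape, `choose_upstream`, the `10.20.0.0/16` network, and the assumption that JWT claims arrive already verified are all hypothetical.

```python
from dataclasses import dataclass, field
from ipaddress import ip_address, ip_network


@dataclass
class Request:
    """Simplified view of an incoming request (hypothetical shape)."""
    headers: dict = field(default_factory=dict)
    claims: dict = field(default_factory=dict)  # assumed to be verified JWT claims
    source_ip: str = "0.0.0.0"
    query: dict = field(default_factory=dict)


STABLE = "service-v1"
CANARY = "service-v2"

# Assumed values, for illustration only.
BETA_ROLE = "beta-tester"
CANARY_NETWORKS = [ip_network("10.20.0.0/16")]  # e.g. a single office network


def choose_upstream(req: Request) -> str:
    """Return the canary service if any attribute rule matches, else the stable one."""
    # HTTP header names are case-insensitive, so normalize before matching.
    headers = {name.lower(): value for name, value in req.headers.items()}
    if headers.get("x-canary") == "true" or headers.get("x-version") == "v2":
        return CANARY
    # User identity: route users holding the beta role.
    if req.claims.get("role") == BETA_ROLE:
        return CANARY
    # Source IP: route requests coming from a selected network.
    addr = ip_address(req.source_ip)
    if any(addr in net for net in CANARY_NETWORKS):
        return CANARY
    # Query parameter: temporary or debugging access.
    if req.query.get("use_canary") == "true":
        return CANARY
    return STABLE
```

For example, `choose_upstream(Request(headers={"X-Version": "v2"}))` selects `service-v2`, while a request with no matching attribute falls through to `service-v1`.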

  2. Weighted Traffic Splitting
    For a gradual rollout, the gateway can be configured to split traffic by percentage. For example:

Phase 1: 99% of traffic goes to the stable version (v1), and 1% goes to the new canary version (v2).
Phase 2: If monitoring shows no issues, you increase the percentage to 10% for v2.
Phase 3: Continue increasing the percentage until 100% of the traffic is on v2.
Phase 4: Once v2 is stable, it becomes the new v1, and the old v1 is decommissioned.

This process is often automated and tied to monitoring and alerting systems. If error rates spike for the canary version, the gateway can automatically roll back by routing 100% of traffic back to the stable version.
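The phased split and automatic rollback described above might look roughly like the following sketch. The class and function names, and the 5% error-rate threshold, are assumptions made for illustration; a real gateway would read these values from its configuration and monitoring system.

```python
import random


class WeightedSplitter:
    """Sends roughly `canary_percent` percent of requests to v2, the rest to v1."""

    def __init__(self, canary_percent: float, rng=None):
        self.canary_percent = canary_percent
        self._rng = rng or random.Random()

    def pick(self) -> str:
        # random() is uniform in [0, 1); scale it to [0, 100) and compare.
        return "v2" if self._rng.random() * 100 < self.canary_percent else "v1"

    def advance(self, next_percent: float) -> None:
        """Move to the next rollout phase, capped at 100%."""
        self.canary_percent = min(100.0, next_percent)

    def roll_back(self) -> None:
        """Route all traffic back to the stable version."""
        self.canary_percent = 0.0


def evaluate_canary(splitter, canary_errors, canary_requests, max_error_rate=0.05):
    """Roll back if the observed canary error rate exceeds the threshold.

    Returns True when a rollback was triggered.
    """
    if canary_requests == 0:
        return False  # no data yet, nothing to decide
    if canary_errors / canary_requests > max_error_rate:
        splitter.roll_back()
        return True
    return False
```

A rollout would start with `WeightedSplitter(1)`, call `advance(10)` once monitoring looks healthy, and rely on `evaluate_canary` to reset the split to 0% if the canary misbehaves.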

Example Workflow:

  1. Deployment: Both service-v1 (stable) and service-v2 (canary) are running.
  2. Gateway Configuration: A rule is set up in the API gateway.
    Rule: "IF Header['X-Version'] == 'v2' THEN route to service-v2."
    Default Rule: "ELSE, route to service-v1."
    Alternatively, for weighted routing: "Send 5% of all traffic to service-v2 and 95% to service-v1."
  3. Request Flow:
    A regular user's request (no header) arrives at the gateway and is routed to service-v1.
    A developer or QA tester sends a request with the header X-Version: v2. The gateway sees this and routes their request to service-v2.
  4. Monitoring: The team closely monitors the performance (latency, error rate, resource usage) of service-v2.
  5. Rollout/Rollback: Based on the monitoring data, the team either gradually increases the traffic to service-v2 or rolls back the change by updating the gateway rule to send all traffic back to service-v1.

This approach allows teams to test new features in a live production environment with real users, but in a controlled and low-risk way. Many gateways and proxies (such as Kong, NGINX, Istio, Traefik, and managed offerings from AWS, Google Cloud, and Azure) support these routing strategies.

Task List

  1. Add a metadata map to the upstream model.
  2. Make LoadBalancer support header parameters.
  3. Determine feature flagging from request headers.
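One way the three tasks could fit together is sketched below. The issue does not specify the model classes or the load balancer interface, so `Upstream`, `select_upstream`, and the `gray_header`, `metadata_key`, and `default_version` parameters are hypothetical names chosen for illustration, not the project's API.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Upstream:
    """An upstream instance; `metadata` is the free-form map from task 1."""
    url: str
    weight: int = 1
    metadata: dict = field(default_factory=dict)


def select_upstream(upstreams, headers, gray_header="X-Version",
                    metadata_key="version", default_version="v1", rng=random):
    """Pick an upstream whose metadata matches the version named in the header.

    Falls back to `default_version` upstreams when the header is absent or
    matches nothing, and to the full list if no upstream carries that version.
    """
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    wanted = lowered.get(gray_header.lower())

    def with_version(version):
        return [u for u in upstreams if u.metadata.get(metadata_key) == version]

    candidates = with_version(wanted) if wanted else []
    if not candidates:
        candidates = with_version(default_version) or list(upstreams)
    # Weighted random choice among the remaining candidates.
    weights = [u.weight for u in candidates]
    return rng.choices(candidates, weights=weights)[0]
```

With upstreams tagged `{"version": "v1"}` and `{"version": "v2"}`, a request carrying `X-Version: v2` reaches only the v2 instances, and every other request stays on v1. Falling back to the default version for unknown header values is a design choice in this sketch; rejecting such requests would be an equally valid policy.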
