
win5923 commented Jan 17, 2026

Why are these changes needed?

  • Add an E2E test verifying that the history server can proxy API requests to live Ray clusters
  • Refactor the test support code into a support package

Live clusters test:

  • Verifies that the /clusters/ endpoint returns live clusters with sessionName=live
  • Tests that the /enter_cluster/{ns}/{name}/live endpoint sets the cluster context correctly
  • Validates that all implemented history server endpoints are proxied successfully (status < 500)

Related issue number

Closes #4377

Checks

  • I've made sure the tests are passing.
  • Testing Strategy
    • Unit tests
    • Manual tests
    • This PR is not tested :(

win5923 force-pushed the historyserver-live-cluster branch 6 times, most recently from 88d8a21 to 1b2f5c4 on January 17, 2026 10:02
win5923 marked this pull request as ready for review on January 17, 2026 10:05
win5923 force-pushed the historyserver-live-cluster branch 2 times, most recently from 570ddd8 to 947900f on January 17, 2026 10:08
Comment on lines +55 to +89
sa := &corev1.ServiceAccount{
	ObjectMeta: metav1.ObjectMeta{
		Name:      "historyserver",
		Namespace: namespace.Name,
	},
}
clusterRole := &rbacv1.ClusterRole{
	ObjectMeta: metav1.ObjectMeta{
		Name: "raycluster-reader",
	},
	Rules: []rbacv1.PolicyRule{
		{
			APIGroups: []string{"ray.io"},
			Resources: []string{"rayclusters"},
			Verbs:     []string{"list", "get"},
		},
	},
}
clusterRoleBinding := &rbacv1.ClusterRoleBinding{
	ObjectMeta: metav1.ObjectMeta{
		Name: fmt.Sprintf("historyserver-%s", namespace.Name),
	},
	Subjects: []rbacv1.Subject{
		{
			Kind:      "ServiceAccount",
			Name:      "historyserver",
			Namespace: namespace.Name, // Use the test namespace
		},
	},
	RoleRef: rbacv1.RoleRef{
		APIGroup: "rbac.authorization.k8s.io",
		Kind:     "ClusterRole",
		Name:     "raycluster-reader",
	},
}
Member Author

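The service_account.yaml manifest hardcodes the default namespace, which doesn't work with dynamically created test namespaces, so the test now builds these RBAC objects in code using the test namespace. For reference, the hardcoded binding in the manifest: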

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: historyserver
  namespace: default
subjects:
- kind: ServiceAccount
  name: historyserver
  namespace: default
roleRef:
  kind: ClusterRole
  name: raycluster-reader


win5923 force-pushed the historyserver-live-cluster branch 3 times, most recently from 3a355b6 to 16d4f08 on January 17, 2026 10:46
win5923 marked this pull request as draft on January 17, 2026 12:00
win5923 force-pushed the historyserver-live-cluster branch from 16d4f08 to f0d2142 on January 17, 2026 12:09
win5923 marked this pull request as ready for review on January 17, 2026 12:09
win5923 force-pushed the historyserver-live-cluster branch from f0d2142 to 18e8fb4 on January 17, 2026 12:10

win5923 force-pushed the historyserver-live-cluster branch 4 times, most recently from e06809f to b1ec511 on January 17, 2026 13:05
win5923 marked this pull request as draft on January 17, 2026 13:15
Signed-off-by: win5923 <ken89@kimo.com>
win5923 force-pushed the historyserver-live-cluster branch from b1ec511 to 9c1bd43 on January 17, 2026 13:17
win5923 marked this pull request as ready for review on January 17, 2026 13:17

.PHONY: localimage-historyserver
localimage-historyserver: dockerbuilder_instance
localimage-historyserver:
Member Author

win5923 commented Jan 17, 2026

Signed-off-by: win5923 <ken89@kimo.com>
Comment on lines +35 to +44
// Excluded endpoints that are not yet implemented:
// - /events
// - /api/cluster_status
// - /api/grafana_health
// - /api/prometheus_health
// - /api/data/datasets/{job_id}
// - /api/jobs
// - /api/serve/applications
// - /api/v0/placement_groups
// - /api/v0/logs/file
Member Author

win5923 commented Jan 18, 2026

Some APIs work for live clusters but are not yet implemented for dead clusters.
For now, I have excluded those endpoints from the test list.
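One way to keep such exclusions explicit and easy to re-enable as dead-cluster support lands is a named exclusion set that filters the candidate list. This is a minimal sketch. filterEndpoints and notYetImplementedForDead are hypothetical names, not part of the PR, and the set simply copies the excluded paths from the comment above, matched as literal strings.

```go
package main

import "fmt"

// notYetImplementedForDead copies the exclusion comment: endpoints that work for
// live clusters but are not yet served for dead ones (HYPOTHETICAL name).
var notYetImplementedForDead = map[string]bool{
	"/events":                     true,
	"/api/cluster_status":         true,
	"/api/grafana_health":         true,
	"/api/prometheus_health":      true,
	"/api/data/datasets/{job_id}": true,
	"/api/jobs":                   true,
	"/api/serve/applications":     true,
	"/api/v0/placement_groups":    true,
	"/api/v0/logs/file":           true,
}

// filterEndpoints returns the candidates not in excluded, preserving their order.
func filterEndpoints(candidates []string, excluded map[string]bool) []string {
	kept := make([]string, 0, len(candidates))
	for _, ep := range candidates {
		if !excluded[ep] {
			kept = append(kept, ep)
		}
	}
	return kept
}

func main() {
	candidates := []string{"/nodes?view=summary", "/events", "/api/jobs"}
	fmt.Println(filterEndpoints(candidates, notYetImplementedForDead))
}
```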

Signed-off-by: win5923 <ken89@kimo.com>
win5923 force-pushed the historyserver-live-cluster branch from 958a8d3 to 5fcfc62 on January 21, 2026 18:06
win5923 force-pushed the historyserver-live-cluster branch from 5fcfc62 to 1601f74 on January 21, 2026 18:31
win5923 force-pushed the historyserver-live-cluster branch from 1601f74 to bcce969 on January 21, 2026 18:34
// - /api/v0/placement_groups
// - /api/v0/logs/file
var HistoryServerEndpoints = []string{
	"/nodes?view=summary",
Member Author

Endpoint /nodes should not return server error, got 500: {"result": false, "msg": "Unknown view None", "data": {}}
        Expected
            <int>: 500
        to be <
            <int>: 500

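This is just a workaround for live clusters: /nodes without a view parameter fails with "Unknown view None", so the test requests /nodes?view=summary instead. Dead clusters still need another way to support the /nodes endpoint.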

cursor[bot] left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Signed-off-by: win5923 <ken89@kimo.com>

Development

Successfully merging this pull request may close these issues.

[Feature][history server] E2E test for live clusters

1 participant