Skip to content

Conversation

@omer-vishlitzky
Copy link
Contributor

Summary

Adds GPU information to Agent CR status.inventory so users in kubeAPI mode can see GPU details for discovered hosts.

Changes:

  • Add HostGpu struct and Gpus field to HostInventory
  • Copy GPU data from internal inventory to Agent CR status
  • Add unit test

Jira: https://issues.redhat.com/browse/MGMT-22320

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Dec 8, 2025
@openshift-ci-robot
Copy link

openshift-ci-robot commented Dec 8, 2025

@omer-vishlitzky: This pull request references MGMT-22320 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the bug to target the "4.21.0" version, but no target version was set.

In response to this:

Summary

Adds GPU information to Agent CR status.inventory so users in kubeAPI mode can see GPU details for discovered hosts.

Changes:

  • Add HostGpu struct and Gpus field to HostInventory
  • Copy GPU data from internal inventory to Agent CR status
  • Add unit test

Jira: https://issues.redhat.com/browse/MGMT-22320

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@coderabbitai
Copy link

coderabbitai bot commented Dec 8, 2025

Walkthrough

Adds GPU support to host inventory: defines a new public HostGpu type, adds Gpus []HostGpu to HostInventory, updates deepcopy generation, maps GPU data in the agent controller, updates tests, and extends CRD/openAPI schemas and manifests to include status.inventory.gpus.

Changes

Cohort / File(s) Change Summary
Type definitions
api/v1beta1/agent_types.go
Adds new public HostGpu struct (Address, DeviceID, Name, Vendor, VendorID) and a Gpus []HostGpu field to HostInventory (JSON: gpus).
Generated deepcopy methods
api/v1beta1/zz_generated.deepcopy.go
Adds DeepCopyInto and DeepCopy for HostGpu. Updates HostInventory.DeepCopyInto to allocate and copy the Gpus slice when non-nil.
Controller logic
internal/controller/controllers/agent_controller.go
updateInventoryAndLabels updated to copy GPU entries from incoming inventory into agent.Status.Inventory.Gpus, mapping all GPU fields.
Tests
internal/controller/controllers/agent_controller_test.go
Test data now includes a GPU entry; assertions added to verify agent.Status.Inventory.Gpus is populated and fields match expected values.
CRD / OpenAPI schemas
config/crd/bases/agent-install.openshift.io_agents.yaml
config/crd/resources.yaml
deploy/olm-catalog/manifests/agent-install.openshift.io_agents.yaml
Adds status.inventory.gpus (array of objects) with string properties: address, deviceId, name, vendor, deviceId/vendorId (as per schema) to Agent and InfraEnv/OpenAPI schemas and manifests.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • Check GPU mapping logic in internal/controller/controllers/agent_controller.go for nil handling and field correspondence.
  • Verify deepcopy behavior in api/v1beta1/zz_generated.deepcopy.go correctly handles nil and non-nil Gpus slices (deep copy semantics).
  • Confirm CRD/openAPI schema additions in config/crd/... and deploy/... match the Go types and JSON tags.
  • Review updated tests in agent_controller_test.go for completeness and stability.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between 6490554 and 59517b4.

⛔ Files ignored due to path filters (2)
  • vendor/github.com/openshift/assisted-service/api/v1beta1/agent_types.go is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/openshift/assisted-service/api/v1beta1/zz_generated.deepcopy.go is excluded by !vendor/**, !**/vendor/**
📒 Files selected for processing (7)
  • api/v1beta1/agent_types.go (2 hunks)
  • api/v1beta1/zz_generated.deepcopy.go (2 hunks)
  • config/crd/bases/agent-install.openshift.io_agents.yaml (1 hunks)
  • config/crd/resources.yaml (1 hunks)
  • deploy/olm-catalog/manifests/agent-install.openshift.io_agents.yaml (1 hunks)
  • internal/controller/controllers/agent_controller.go (1 hunks)
  • internal/controller/controllers/agent_controller_test.go (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • internal/controller/controllers/agent_controller.go
🧰 Additional context used
📓 Path-based instructions (1)
**

⚙️ CodeRabbit configuration file

-Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity.

Files:

  • api/v1beta1/zz_generated.deepcopy.go
  • deploy/olm-catalog/manifests/agent-install.openshift.io_agents.yaml
  • config/crd/resources.yaml
  • config/crd/bases/agent-install.openshift.io_agents.yaml
  • api/v1beta1/agent_types.go
  • internal/controller/controllers/agent_controller_test.go
🧬 Code graph analysis (1)
api/v1beta1/zz_generated.deepcopy.go (1)
api/v1beta1/agent_types.go (1)
  • HostGpu (168-179)
🔇 Additional comments (6)
config/crd/resources.yaml (1)

1166-1187: GPU schema addition is well-structured.

The gpus field is properly integrated into status.inventory with clear descriptions and practical examples for each property. Optional fields appropriately accommodate hardware discovery scenarios where some attributes may not be available.

config/crd/bases/agent-install.openshift.io_agents.yaml (1)

333-354: GPU schema in base CRD matches resources definition.

The gpus field schema is consistent with config/crd/resources.yaml and properly positioned within status.inventory.

deploy/olm-catalog/manifests/agent-install.openshift.io_agents.yaml (1)

343-364: GPU schema in OLM manifest is consistent across all CRD definitions.

The gpus field is properly replicated in the OLM-packaged manifest with identical schema to base and resources definitions, ensuring consistency across deployment methods.

Please verify that the agent controller (internal/controller/controllers/agent_controller.go) correctly populates all five GPU fields (address, deviceId, name, vendor, vendorId) when copying data from the internal inventory to the Agent CR status.inventory.gpus. Given that these CRDs will be published as the source of truth, we should confirm the mapping is complete and the controller doesn't inadvertently omit any properties.

internal/controller/controllers/agent_controller_test.go (1)

2277-2285: GPU inventory test coverage looks correct

The test inventory’s Gpus initialization and the subsequent assertions on agent.Status.Inventory.Gpus[0] fields give good end‑to‑end coverage that GPU data flows from backend inventory into the Agent status as intended. No issues from a correctness or maintainability standpoint.

Also applies to: 2331-2336

api/v1beta1/zz_generated.deepcopy.go (1)

589-602: Deep‑copy implementation for HostGpu and HostInventory.Gpus is appropriate

HostGpu only contains value‑type fields, so using *out = *in in DeepCopyInto and copy(*out, *in) for the Gpus []HostGpu slice in HostInventory.DeepCopyInto provides correct deep‑copy semantics and matches existing patterns in this generated file.

Also applies to: 694-698

api/v1beta1/agent_types.go (1)

168-179: GPU fields on HostInventory are well‑modeled

The new HostGpu type and Gpus []HostGpu field on HostInventory provide a clear, minimal GPU representation (address, IDs, name, vendor) and integrate cleanly with existing inventory fields and JSON conventions. This aligns with the intended API surface without introducing compatibility or maintainability concerns.

Also applies to: 193-193


Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci openshift-ci bot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Dec 8, 2025
@openshift-ci openshift-ci bot requested review from maorfr and yoavsc0302 December 8, 2025 11:12
@openshift-ci openshift-ci bot added the api-review Categorizes an issue or PR as actively needing an API review. label Dec 8, 2025
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
internal/controller/controllers/agent_controller.go (1)

1541-1551: GPU inventory propagation mirrors existing patterns and looks correct

The new block preallocates the slice and copies all relevant GPU fields, consistent with how disks and interfaces are handled. Only optional improvement (for future robustness) would be to explicitly clear agent.Status.Inventory.Gpus when inventory.Gpus is nil, if you ever need to reflect GPUs being removed; current behavior matches the existing “leave last known data” pattern used elsewhere.

internal/controller/controllers/agent_controller_test.go (1)

2277-2285: GPU inventory propagation test looks solid

The test correctly seeds models.Inventory.Gpus and then asserts that all GPU fields are present and mapped as expected into agent.Status.Inventory.Gpus[0]. This gives good coverage for the new HostGpu wiring.

If you later expand behavior (e.g., multiple GPUs, optional fields), consider adding a second GPU to the inventory and asserting slice length to guard against partial copies, but this is not required for the current change.

Also applies to: 2331-2336

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between a5bca97 and 6490554.

⛔ Files ignored due to path filters (2)
  • vendor/github.com/openshift/assisted-service/api/v1beta1/agent_types.go is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/openshift/assisted-service/api/v1beta1/zz_generated.deepcopy.go is excluded by !vendor/**, !**/vendor/**
📒 Files selected for processing (4)
  • api/v1beta1/agent_types.go (2 hunks)
  • api/v1beta1/zz_generated.deepcopy.go (2 hunks)
  • internal/controller/controllers/agent_controller.go (1 hunks)
  • internal/controller/controllers/agent_controller_test.go (2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**

⚙️ CodeRabbit configuration file

-Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity.

Files:

  • api/v1beta1/agent_types.go
  • internal/controller/controllers/agent_controller.go
  • internal/controller/controllers/agent_controller_test.go
  • api/v1beta1/zz_generated.deepcopy.go
🧬 Code graph analysis (2)
internal/controller/controllers/agent_controller.go (1)
api/v1beta1/agent_types.go (1)
  • HostGpu (168-179)
api/v1beta1/zz_generated.deepcopy.go (1)
api/v1beta1/agent_types.go (1)
  • HostGpu (168-179)
🔇 Additional comments (4)
api/v1beta1/zz_generated.deepcopy.go (2)

589-602: HostGpu DeepCopy helpers are consistent with other Host types*

Implementation mirrors other scalar-only structs (e.g., HostBoot, HostIOPerf); shallow copy is sufficient and nil-safe via DeepCopy.


694-698: HostInventory deepcopy correctly adds GPU slice handling

The new Gpus slice handling uses the same make + copy pattern as other value-only slices, so GPU data will be faithfully preserved on deep copies.

api/v1beta1/agent_types.go (2)

168-179: HostGpu struct definition is clear and appropriately scoped

Fields and JSON tags line up with the inventory GPU attributes and keep everything optional; no functional or compatibility concerns here.


181-194: Extending HostInventory with Gpus is a safe, backwards-compatible addition

Adding Gpus []HostGpu to HostInventory cleanly exposes GPU data on the Agent status without impacting existing fields or wire formats.

@maorfr
Copy link
Member

maorfr commented Dec 8, 2025

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Dec 8, 2025
@maorfr
Copy link
Member

maorfr commented Dec 8, 2025

/approve

@openshift-ci
Copy link

openshift-ci bot commented Dec 8, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: maorfr, omer-vishlitzky

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 8, 2025
@maorfr
Copy link
Member

maorfr commented Dec 8, 2025

/hold

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Dec 8, 2025
@codecov
Copy link

codecov bot commented Dec 8, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 43.50%. Comparing base (a5bca97) to head (59517b4).
⚠️ Report is 2 commits behind head on master.

Additional details and impacted files

Impacted file tree graph

@@           Coverage Diff           @@
##           master    #8495   +/-   ##
=======================================
  Coverage   43.50%   43.50%           
=======================================
  Files         411      411           
  Lines       71080    71089    +9     
=======================================
+ Hits        30920    30926    +6     
- Misses      37406    37408    +2     
- Partials     2754     2755    +1     
Files with missing lines Coverage Δ
...nternal/controller/controllers/agent_controller.go 76.59% <100.00%> (+0.15%) ⬆️

... and 1 file with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

GPU data is collected by the agent from hosts but was not exposed in
the Agent CR status inventory. This adds GPU information to the Agent
CR so users can see GPU details when using kubeAPI mode (MCE, ACM,
infrastructure-operator).

Changes:
- Add HostGpu struct to agent_types.go with fields: Address, DeviceID,
  Name, Vendor, VendorID
- Add Gpus field to HostInventory struct
- Update agent_controller to copy GPU data from host inventory to
  Agent CR status
- Add DeepCopy functions for HostGpu type
- Add unit test verifying GPU data is copied to Agent CR
@omer-vishlitzky omer-vishlitzky force-pushed the MGMT-22320-gpu-inventory branch from 6490554 to 59517b4 Compare December 8, 2025 12:42
@openshift-ci openshift-ci bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed lgtm Indicates that a PR is ready to be merged. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Dec 8, 2025
}
}

return r.updateLabels(log, ctx, agent)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do the labels we update here get aggregated to a ManagedCluster CR?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, we don't integrate with ManagedCluster at all really.

@omer-vishlitzky
Copy link
Contributor Author

/retest

@openshift-ci
Copy link

openshift-ci bot commented Dec 8, 2025

@omer-vishlitzky: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@carbonin
Copy link
Member

carbonin commented Dec 8, 2025

/lgtm

If we want labels we could always add them later. Otherwise just unhold this when you are done testing

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Dec 8, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

api-review Categorizes an issue or PR as actively needing an API review. approved Indicates a PR has been approved by an approver from all required OWNERS files. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants