@pratik-mahalle
Contributor

pratik-mahalle commented Jul 26, 2025

Add Metal3 Custom Resources Documentation

Summary

This PR adds documentation for Metal3 custom resources to the CAPM3 (Cluster API Provider Metal3) user guide. It adds three pages: an overview of the custom resources, plus reference pages for Metal3Data and Metal3DataTemplate, which are used to manage bare metal infrastructure in Kubernetes clusters.

Changes Made

New Documentation Files Added

  1. custom_resources.md - Overview and introduction to Metal3 custom resources
  2. metal3data.md - Documentation for the Metal3Data resource
  3. metal3datatemplate.md - Comprehensive guide for Metal3DataTemplate resource

Updated Files

  • SUMMARY.md - Added navigation links for the new documentation sections

Documentation Coverage

Metal3DataTemplate

  • Complete API reference with YAML examples
  • Metadata specifications (strings, object names, indexes, IP pool references)
  • Network data specifications following Nova network_data.json format
  • Template reference management for versioning
  • Usage guidelines and examples

Metal3Data

  • Resource lifecycle and management
  • Template rendering process
  • Integration with Metal3DataTemplate

Custom Resources Overview

  • Introduction to Metal3 custom resources
  • Relationship between different resources
  • Common use cases and patterns

Benefits

  • Provides comprehensive reference for Metal3 users
  • Includes practical examples for real-world scenarios
  • Improves discoverability through proper navigation structure

Related Issue:
Fixes #398

@metal3-io-bot
Contributor

Hi @pratik-mahalle. Thanks for your PR.

I'm waiting for a metal3-io member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@metal3-io-bot added the needs-ok-to-test and size/XL labels Jul 26, 2025
@lentzi90
Member

/ok-to-test

@metal3-io-bot added the ok-to-test label and removed the needs-ok-to-test label Jul 30, 2025
Member

@lentzi90 left a comment

Thank you for the PR. I have gone through it and added comments below. Unfortunately it feels quite AI generated with redundant information, useless links to the front page and confident "best practices" out of nowhere. There is still a lot of good here, but you will need to work on it before we merge.
This was supposed to be a pretty straightforward task: just copy and paste from the links in the issue description and clean it up. Now it is both harder to review (because it is hard to compare with the old docs) and more work to clean up because of all the extra material that I assume came with the AI.

Comment on lines 20 to 43
## Key Concepts

### Data Templates and Instances

CAPM3 uses a template-based approach for generating host-specific configuration
data:

1. **Metal3DataTemplate**: Contains templates for metadata and network
configuration that will be rendered for each host
2. **Metal3Data**: Represents the actual rendered data for a specific host,
created from a template

### Metadata and Network Data

- **Metadata**: Contains host-specific information like hostnames, labels, and
custom key-value pairs
- **Network Data**: Defines the network configuration including interfaces, IP
addresses, routes, and DNS settings

### Index Management

CAPM3 automatically manages indexes for hosts to ensure unique identification and
proper resource allocation. Each Metal3Data instance gets a unique index that is
used in naming and resource allocation.
Member

This feels redundant to me (everything from key concepts). Some of this is already in the overview and the details are better covered on the separate pages.

resource
- [Metal3DataTemplate](metal3datatemplate.md) - Detailed documentation for the
Metal3DataTemplate resource
- [Cluster API Documentation](https://cluster-api.sigs.k8s.io/) - General
Member

I think we can leave this out. It is not really relevant here, and we are so deep in the docs that any user getting here should already have come across CAPI.

Comment on lines 260 to 262
- [Metal3Machine](../introduction.md) - Machine management
- [BareMetalHost](../bmo/introduction.md) - Host provisioning
- [IP Address Manager](../ipam/introduction.md) - IP pool management
Member

Delete these. The Metal3Machine link just goes to the front page... BareMetalHost goes to BMO intro...

Comment on lines 193 to 255
## Error Handling

### Common Error Scenarios

1. **Template Not Found**: The referenced `Metal3DataTemplate` doesn't exist
2. **Index Conflicts**: Multiple controllers try to use the same index
3. **Secret Creation Failure**: Unable to create the required secrets
4. **Template Rendering Errors**: Invalid template configuration

### Error Status

When errors occur, the status fields are updated:

```yaml
status:
  ready: false
  error: true
  errorMessage: "Failed to create metadata secret: template validation failed"
```

## Best Practices

1. **Monitor Status**: Check the `ready` and `error` status fields
2. **Handle Errors**: Implement proper error handling for failed data
generation
3. **Use Indexing**: Leverage the automatic indexing for consistent naming
4. **Template Validation**: Validate templates before deployment
5. **Secret Management**: Ensure proper RBAC for secret access

## Troubleshooting

### Common Issues

**Issue**: Metal3Data stuck in "not ready" state

- **Check**: Template configuration and validation
- **Solution**: Verify template syntax and referenced resources

**Issue**: Index conflicts

- **Check**: Existing Metal3Data objects and their indexes
- **Solution**: Clean up orphaned objects or adjust indexing strategy

**Issue**: Secret creation failures

- **Check**: RBAC permissions and namespace access
- **Solution**: Ensure proper permissions for secret creation

### Debugging Commands

```bash
# Check Metal3Data status
kubectl get metal3data -n <namespace>
# View Metal3Data details
kubectl describe metal3data <name> -n <namespace>
# Check generated secrets
kubectl get secrets -n <namespace> | grep <machine-name>
# View secret content
kubectl get secret <secret-name> -n <namespace> -o yaml
```
Member

Delete this (from Error handling). This does not seem useful to me and it should anyway not be on this page. We have a separate troubleshooting page where it may fit. But even then, we should not include things like index conflicts... that is handled by the controllers and not something users should have to think about.

errorMessage: "<error-message>"
```
## Lifecycle
Member

This part is missing an important thing:

If the Metal3DataTemplate object is updated, the generated secrets will not be updated, to allow for reprovisioning of the nodes in the exact same state as they were initially provisioned. Hence, to do an update, it is necessary to do a rolling upgrade of all nodes.

https://github.com/metal3-io/cluster-api-provider-metal3/blob/686d7e69531b21eb9370edaf0f72d017d3e037e9/docs/api.md?plain=1#L1001-L1004
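
As an illustration of the rolling upgrade described above, here is a minimal sketch. It assumes the `infrastructure.cluster.x-k8s.io/v1beta1` API version and a worker `MachineDeployment`; all object names (`workers-data-v2`, `workers-v2`, `test1`) are hypothetical, and fields unrelated to the data template are omitted. The idea is to leave the existing template untouched, create a new `Metal3DataTemplate`, reference it from a new `Metal3MachineTemplate`, and then point the `MachineDeployment` at that new machine template so that Cluster API replaces the nodes.

```yaml
# Hypothetical names throughout; only the kinds and field paths matter.
# 1. A new data template carrying the changed configuration.
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: Metal3DataTemplate
metadata:
  name: workers-data-v2
  namespace: metal3
spec:
  clusterName: test1
  metaData:
    objectNames:
      - key: name
        object: machine
---
# 2. A new machine template that references the new data template.
#    Image and other required fields are omitted from this sketch.
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: Metal3MachineTemplate
metadata:
  name: workers-v2
  namespace: metal3
spec:
  template:
    spec:
      dataTemplate:
        name: workers-data-v2
---
# 3. Fragment of the MachineDeployment spec: changing infrastructureRef
#    to the new machine template is what starts the rolling upgrade.
spec:
  template:
    spec:
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
        kind: Metal3MachineTemplate
        name: workers-v2
```

Nodes created after the switch get secrets rendered from the new template, while the secrets of existing nodes stay as they were until those nodes are replaced.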

Member

This is not addressed either?

Comment on lines 343 to 351
## Best Practices

1. **Use Template References**: Always set `templateReference` for better
template management
2. **Plan Indexing**: Consider your indexing strategy for consistent naming
3. **Validate Network Config**: Test network configurations before deployment
4. **Use IP Pools**: Leverage IP pools for dynamic address allocation
5. **Document Templates**: Keep templates well-documented for team
collaboration
Member

Remove this section. Especially the template reference is going to be removed and should not be used.

- [Metal3Data](metal3data.md) - The rendered data instances
- [IP Address Manager](https://github.com/metal3-io/ip-address-manager) - IP pool
management
- [Nova Network Data Format](https://docs.openstack.org/nova/latest/) - Network
Member

Use the same link as earlier or remove it.

Comment on lines 260 to 278
## Template Reference Management

The `templateReference` field enables template versioning and updates:

- **Immutable Templates**: Data template parts are immutable since BareMetalHost
references the secrets
- **Update Process**: Updates require creating a new template and referencing it
in the Metal3MachineTemplate
- **Backward Compatibility**: Supports transition from old templates without
`templateReference` to new ones

### Template Linking

Metal3Data objects are linked to Metal3DataTemplate by:

1. Direct reference in the `template` field
2. Matching `templateReference` key
3. Template's `templateReference` matching the Metal3Data's template name
(backward compatibility)
Member

I suggest removing everything about template references as we are going to remove that field soon anyway.

- `fromAnnotation`: MAC from object annotation
- `fromHostInterface`: MAC from BareMetalHost interface

### Networks Configuration
Member

This section could benefit from some sub-sections, like Bond Modes and Ethernet Types above. For example, the fields under networks and routes.
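
For reference while structuring those sub-sections, a minimal sketch of the `networks` and `routes` fields follows. It is a fragment of a `Metal3DataTemplate` `spec`, written from memory of the CAPM3 API documentation, so the field names should be checked against the current API before being copied into the docs. The ids, link name, pool name and addresses are placeholders.

```yaml
# Fragment of a Metal3DataTemplate spec; all values are placeholders.
networkData:
  links:
    ethernets:
      - type: phy
        id: enp1s0
        macAddress:
          fromHostInterface: eth0
  networks:
    ipv4:
      - id: baremetal            # name of this network entry
        link: enp1s0             # must match the id of a link above
        ipAddressFromIPPool: pool-1
        routes:
          - network: "0.0.0.0"   # destination network
            prefix: 0            # destination prefix length
            gateway:
              string: "192.168.0.1"
```

Each entry under `networks.ipv4` names the `link` it is attached to and the pool its address comes from, and each route takes a `network`, a `prefix` and a `gateway`.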

@metal3-io-bot added the needs-rebase label Jul 31, 2025
@tuminoid
Member

tuminoid commented Aug 5, 2025

/retitle Add Metal3 Custom Resources Documentation
More descriptive title for the PR.

@metal3-io-bot changed the title from "add a section in metal3 docs" to "Add Metal3 Custom Resources Documentation" Aug 5, 2025
@metal3-io-bot removed the needs-rebase label Aug 10, 2025
@pratik-mahalle
Contributor Author

Hey @lentzi90, I have updated and improved it. Please let me know if anything needs to be revised or checked.

Member

@lentzi90 left a comment

Thank you, it looks better!
I still have a few comments below. Could you also check the test failures?
We are using a markdown linter that requires a specific format for some things. For example it is complaining if code blocks or titles are not surrounded by blank lines.

Comment on lines 123 to 130
## Template Reference Management

The `templateReference` field enables linking to specific template versions:

- **Backward Compatibility**: If not set, the controller matches by template
name
- **Version Control**: When set, enables template versioning and updates
- **Transition Support**: Allows migration from old templates to new ones
Member

Please remove this.

kind: Metal3DataTemplate
name: worker-template
spec:
templateReference: worker-v1
Member

Suggested change
templateReference: worker-v1

Comment on lines 52 to 60
## Template Update Behavior
**Important**: If the `Metal3DataTemplate` object is updated, the generated
secrets will not be updated automatically. This behavior is intentional to
allow for reprovisioning of the nodes in the exact same state as they were
initially provisioned.

To apply template updates to existing nodes, it is necessary to perform a
rolling upgrade of all nodes that reference the updated template.
Member

Could you move this down and make it a subsection under "Lifecycle"?

Comment on lines 20 to 43
## Key Concepts

### Data Templates and Instances

CAPM3 uses a template-based approach for generating host-specific configuration
data:

1. **Metal3DataTemplate**: Contains templates for metadata and network
configuration that will be rendered for each host
2. **Metal3Data**: Represents the actual rendered data for a specific host,
created from a template

### Metadata and Network Data

- **Metadata**: Contains host-specific information like hostnames, labels, and
custom key-value pairs
- **Network Data**: Defines the network configuration including interfaces, IP
addresses, routes, and DNS settings

### Index Management

CAPM3 automatically manages indexes for hosts to ensure unique identification and
proper resource allocation. Each Metal3Data instance gets a unique index that is
used in naming and resource allocation.
Member

Suggested change
## Key Concepts
### Data Templates and Instances
CAPM3 uses a template-based approach for generating host-specific configuration
data:
1. **Metal3DataTemplate**: Contains templates for metadata and network
configuration that will be rendered for each host
2. **Metal3Data**: Represents the actual rendered data for a specific host,
created from a template
### Metadata and Network Data
- **Metadata**: Contains host-specific information like hostnames, labels, and
custom key-value pairs
- **Network Data**: Defines the network configuration including interfaces, IP
addresses, routes, and DNS settings
### Index Management
CAPM3 automatically manages indexes for hosts to ensure unique identification and
proper resource allocation. Each Metal3Data instance gets a unique index that is
used in naming and resource allocation.

@pratik-mahalle
Contributor Author

Hey @lentzi90, I have made the changes you suggested. Please have a look when you have time.

Member

@lentzi90 left a comment

Ok I think this is good enough to merge now.
Could you squash the commits? Then I am happy to approve

@pratik-mahalle force-pushed the doc branch 2 times, most recently from 137982a to aba03ce August 19, 2025 12:30
@pratik-mahalle
Contributor Author

Ok I think this is good enough to merge now. Could you squash the commits? Then I am happy to approve

Hey @lentzi90, I have squashed the commits. Please go ahead.

@lentzi90
Member

One more thing. Could you include some more details in the commit description? It would also be nice to update the PR description. It still includes things like best practices and what tests have been done (which is quite irrelevant since the automated tests run on the PR). Could you instead please link to the issue (#398) you are fixing?

@pratik-mahalle
Contributor Author

pratik-mahalle commented Aug 20, 2025

Hey @lentzi90, please check. I have updated that

Member

@lentzi90 left a comment

Thanks!
/approve

kind: KubeadmControlPlane
name: test1
namespace: metal3
````
Member

There are four ticks here, which breaks the formatting.

The `Cluster` resource is **CAPI resource** and includes a reference to the control
plane via the `controlPlaneRef` field:

``` yaml
Member

Suggested change
``` yaml
```yaml

No spaces

an infrastructure provider for cluster API (CAPI), it necessarily references
also other CAPI resources, however, this document focuses on metal3 resources.

For more details about CAPI resources and to get a big picture, refer to
Member

Suggested change
For more details about CAPI resources and to get a big picture, refer to
For more details about CAPI resources and to get the big picture, refer to

Comment on lines 38 to 42
1. Create a Metal3DataTemplate with your desired metadata and network
configuration
2. Reference the template in your Metal3MachineTemplate
3. CAPM3 automatically creates Metal3Data instances and renders the
configuration
Member

Use backticks around CRD names like elsewhere in this PR for consistency.

Contributor Author

Right, Done

Member

I don't see any backticks?


### Advanced Usage

- Use IP pools for dynamic IP address allocation
Member

Specific resource names mentioned here, preferably with links, would go a long way toward user-friendly documentation.
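
For example, the resource behind "IP pools" is the `IPPool` from the Metal3 IP Address Manager. A minimal sketch of how the two resources could be named side by side, assuming the `ipam.metal3.io/v1alpha1` API version; the object names and the address range are hypothetical:

```yaml
# Hypothetical pool: addresses 192.168.0.10-30 in a /24 network.
apiVersion: ipam.metal3.io/v1alpha1
kind: IPPool
metadata:
  name: pool-1
  namespace: metal3
spec:
  clusterName: test1
  namePrefix: test1-prov
  pools:
    - start: 192.168.0.10
      end: 192.168.0.30
  prefix: 24
  gateway: 192.168.0.1
---
# Fragment of a Metal3DataTemplate spec that takes an address from the pool.
spec:
  networkData:
    networks:
      ipv4:
        - id: provisioning
          link: enp1s0
          ipAddressFromIPPool: pool-1   # name of the IPPool above
```

Naming the `IPPool` kind explicitly, and linking to the IP Address Manager documentation, lets a reader go straight from the bullet point to the resource they need to create.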

Member

This is still open.

Member

@tuminoid left a comment

Hi, thanks for putting in the work. Some feedback:

  1. Can you please fix all of the misspellings of metal3 to Metal3, also cluster API to Cluster API, etc.
  2. These new files, besides data_sources.md, are not linked from anywhere. How is a user supposed to find them in the user guide?
  3. Let's not use backticks in titles. It is not enforced, but it is not good practice.
  4. Maybe move the design doc URL fix into a separate PR, as it requires a different set of approvers than the rest and is not related to the CRD documentation.
  5. Some comments left here and there.

With some polishing, this will make a nice addition to our user guide.

/cc @elfosardo @dtantsur
for some more eyes

@metal3-io-bot requested a review from dtantsur August 20, 2025 08:14
@pratik-mahalle
Contributor Author

pratik-mahalle commented Aug 21, 2025

Hi, thanks for putting in the work. Some feedback:

1. Can you please fix all of the misspellings of `metal3` to `Metal3`, also `cluster API` to `Cluster API`, etc.
2. These new files, besides data_sources.md, are not linked from anywhere. How is a user supposed to find them in the user guide?
3. Let's not use backticks in titles. It is not enforced, but it is not good practice.
4. Maybe move the design doc URL fix into a separate PR, as it requires a different set of approvers than the rest and is not related to the CRD documentation.
5. Some comments left here and there.

With some polishing, this will make a nice addition to our user guide.

/cc @elfosardo @dtantsur for some more eyes

Hey @tuminoid, I need to clarify a couple of things.
You mentioned that the new files, besides data_sources.md, are not linked from anywhere, and asked how a user is supposed to find them in the user guide.
Where should I link them?

You also suggested moving the design doc URL fix into a separate PR, as it requires a different set of approvers than the rest and is not related to the CRD documentation.
What should I do about this now?

@tuminoid
Member

Hey @tuminoid, I need to clarify a couple of things. You mentioned that the new files, besides data_sources.md, are not linked from anywhere, and asked how a user is supposed to find them in the user guide. Where should I link them?

These new documents should be linked somewhere in the main menu, or from related existing documentation, so that users can reach them.

You also suggested moving the design doc URL fix into a separate PR, as it requires a different set of approvers than the rest and is not related to the CRD documentation. What should I do about this now?

Remove that part from this PR, and open a separate PR for the URL fix.

@metal3-io-bot added the approved and needs-rebase labels and removed the needs-rebase label Aug 22, 2025
@pratik-mahalle
Contributor Author

Hey @tuminoid, I have made the updates you mentioned. Please let me know if any further changes are needed.

@tuminoid
Member

/Cc @lentzi90
also to take another look

@metal3-io-bot requested a review from lentzi90 August 22, 2025 07:25
Member

@tuminoid left a comment

There are some open comments left.

Also, squash the commits when done fixing them up.

Visualization of relationships between metal3 resources can be found for example
[here](https://github.com/metal3-io/cluster-api-provider-metal3/issues/1358).
Visualization of relationships between Metal3 resources can be found for example
[cluster provider Metal3](https://github.com/metal3-io/cluster-api-provider-metal3/issues/1358).
Member

Suggested change
[cluster provider Metal3](https://github.com/metal3-io/cluster-api-provider-metal3/issues/1358).
in this [CAPM3 issue](https://github.com/metal3-io/cluster-api-provider-metal3/issues/1358).

Comment on lines 38 to 42
1. Create a Metal3DataTemplate with your desired metadata and network
configuration
2. Reference the template in your Metal3MachineTemplate
3. CAPM3 automatically creates Metal3Data instances and renders the
configuration
Member

I don't see any backticks?


### Advanced Usage

- Use IP pools for dynamic IP address allocation
Member

This is still open.

errorMessage: "<error-message>"
```
## Lifecycle
Member

This is not addressed either?

@pratik-mahalle
Contributor Author

Hey @lentzi90 @tuminoid Any news on this??

@tuminoid
Member

tuminoid commented Sep 29, 2025

There are some open comments left.

Also, squash the commits when done fixing them up.

I don't see updates on the PR after these comments, so they're still open on your side.

@lentzi90
Member

I don't see much progress here. Are you still working on this?
There are now again multiple commits that should be squashed. Please address the comments!
/approve cancel

@metal3-io-bot added the needs-rebase label and removed the approved label Oct 24, 2025
@pratik-mahalle
Contributor Author

Hey @lentzi90, sorry for the delay. I will update the PR and squash the commits.

@metal3-io-bot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign kashifest for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@metal3-io-bot removed the needs-rebase label Nov 24, 2025
@metal3-io-bot
Contributor

@pratik-mahalle: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| spellcheck | c08f7d3 | link | true | `/test spellcheck` |
| markdownlint | c08f7d3 | link | true | `/test markdownlint` |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@pratik-mahalle
Contributor Author

pratik-mahalle commented Nov 24, 2025

Hey @lentzi90, as this PR has grown large, I am opening a new PR so that the changes are easier to follow. I would really appreciate your feedback there:
#598

Labels

ok-to-test: Indicates a non-member PR verified by an org member that is safe to test.
size/XL: Denotes a PR that changes 500-999 lines, ignoring generated files.

Development

Successfully merging this pull request may close these issues.

Document Metal3Data and Metal3DataTemplate

4 participants