Skip to content

Add metadata scraper#27

Merged
ulucinar merged 3 commits into
crossplane:mainfrom
ulucinar:fix-providers-19
Jul 8, 2022
Merged

Add metadata scraper#27
ulucinar merged 3 commits into
crossplane:mainfrom
ulucinar:fix-providers-19

Conversation

@ulucinar

@ulucinar ulucinar commented Jun 29, 2022

Copy link
Copy Markdown
Collaborator

Description of your changes

Related to https://github.com/upbound/official-providers/issues/19
Depends on #19

Adds a metadata scraper, which is invoked during provider code generation to generate a provider metadata file.

I have:

  • Read and followed Crossplane's contribution process.
  • Run make reviewable to ensure this PR is ready for review.
  • Added backport release-x.y labels to auto-backport this PR if necessary.

How has this code been tested

Enabled the example generation pipelines in this PR.

@ulucinar ulucinar force-pushed the fix-providers-19 branch 3 times, most recently from 6276f9a to af6b041 Compare June 30, 2022 08:43

@muvaf muvaf left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for this PR @ulucinar ! It's quite challenging to follow recursive traversal like most functions here, would you mind adding unit tests so that we don't break it accidentally and see immediately what we expect them to do?

Comment thread pkg/registry/resource.go Outdated

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be intended as per-resource but in practice, we set it to the same for all resources. I wonder how it'd look if we had ScrapeConfiguration include existing fields as well as XPath configs and used as input only for the ScrapeRepo function call. This way registry.ProviderMetadata and registry.Resource would be free of scraping-methodology-specific details. What do you think?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed scraper configuration fields from registry.Resource and registry.ProviderMetadata. Thanks @muvaf!

@ulucinar ulucinar force-pushed the fix-providers-19 branch 2 times, most recently from 8ca4bc4 to d5b8bb2 Compare July 3, 2022 22:55
@ulucinar ulucinar force-pushed the fix-providers-19 branch from d5b8bb2 to 4f865fb Compare July 3, 2022 23:17
ulucinar added 2 commits July 4, 2022 02:21
Signed-off-by: Alper Rifat Ulucinar <ulucinar@users.noreply.github.com>
…etadata}

Signed-off-by: Alper Rifat Ulucinar <ulucinar@users.noreply.github.com>

@muvaf muvaf left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Opened a couple issues for stuff that may require refactoring since we need to wrap up metadata work given our timelines and we can iterate on it later. Dropped a few comments that are non-blocking but LGTM! Thanks!

Comment thread go.mod Outdated

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this separate require block?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Looks like go mod tidy populated those dependencies in a separate block. Manually merged those two blocks for required dependencies and ran go mod tidy again.

Comment thread pkg/registry/meta.go Outdated

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like all scrape functions require the doc and an x-path string. Do you think there is a room for a common interface with Scrape(r *Resource, doc, xpath string) error function which would let us write unit tests focused on single units?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have not included unit tests for the scrape functions as they are not exported (only the registry.ProviderMetadata.ScrapeRepo and registry.ProviderMetadata.Store methods are currently exported). Let's consider this as a future enhancement.

Comment thread pkg/registry/meta.go Outdated

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: seems like a chance to return early here.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment thread pkg/registry/meta.go Outdated

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doc seems to be already used.

@ulucinar ulucinar Jul 8, 2022

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Renamed as docStr.

Comment thread pkg/registry/meta.go Outdated

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: does this deserve its own function given it's used only here?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Embedded the function.

Comment thread pkg/registry/meta.go Outdated

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still think that the metadata document should be as raw as it can be and we can do everything references-related in a single place, which is Upjet itself in this PR. But changing it won't have much effect on authoring experience at the moment even though I believe it'd have been simpler to reason about how Upjet works. Opened this issue to have a place to drop feedback about it.

Comment thread pkg/registry/meta.go Outdated

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From our chat, I remember that having a separate dependencies array doesn't make much difference for the later stages so we could possibly remove it to reduce the complexity here. Opened this issue to track the discussion.

Comment thread pkg/registry/meta.go Outdated

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In https://github.com/upbound/official-providers/pull/312 , I see that all entries have the same value for name, such as aws_acm_certificate_validation in provider-aws. Is that intentional?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the markdown files, we scrape the resource name first from the title of the documentation page, and then if any example manifests are available, use what's available in that HCL configuration's block label. I think in the past I have seen some cases where the title did not correctly reflect the resource name. But for most resources title name and the extracted resource name match. And when/if we generate per-resource metadata files, we can get rid of the duplicated map keys.

@ulucinar ulucinar force-pushed the fix-providers-19 branch from d479ba1 to d1ddadf Compare July 8, 2022 15:27
Signed-off-by: Alper Rifat Ulucinar <ulucinar@users.noreply.github.com>
@ulucinar ulucinar force-pushed the fix-providers-19 branch from d1ddadf to 3c03dec Compare July 8, 2022 15:52
@ulucinar

ulucinar commented Jul 8, 2022

Copy link
Copy Markdown
Collaborator Author

Thanks @muvaf.

@ulucinar ulucinar merged commit 7d3d113 into crossplane:main Jul 8, 2022
@ulucinar ulucinar deleted the fix-providers-19 branch July 8, 2022 16:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants