Add metadata scraper#27
Conversation
6276f9a to
af6b041
Compare
muvaf
left a comment
There was a problem hiding this comment.
Thanks a lot for this PR @ulucinar ! It's quite challenging to follow recursive traversal like most functions here, would you mind adding unit tests so that we don't break it accidentally and see immediately what we expect them to do?
There was a problem hiding this comment.
This seems to be intended as per-resource but in practice, we set it to the same for all resources. I wonder how it'd look if we had ScrapeConfiguration include existing fields as well as XPath configs and used as input only for the ScrapeRepo function call. This way registry.ProviderMetadata and registry.Resource would be free of scraping-methodology-specific details. What do you think?
There was a problem hiding this comment.
Removed scraper configuration fields from registry.Resource and registry.ProviderMetadata. Thanks @muvaf!
8ca4bc4 to
d5b8bb2
Compare
d5b8bb2 to
4f865fb
Compare
Signed-off-by: Alper Rifat Ulucinar <ulucinar@users.noreply.github.com>
…etadata} Signed-off-by: Alper Rifat Ulucinar <ulucinar@users.noreply.github.com>
4f865fb to
d479ba1
Compare
muvaf
left a comment
There was a problem hiding this comment.
Opened a couple issues for stuff that may require refactoring since we need to wrap up metadata work given our timelines and we can iterate on it later. Dropped a few comments that are non-blocking but LGTM! Thanks!
There was a problem hiding this comment.
Do we need this separate require block?
There was a problem hiding this comment.
Done. Looks like go mod tidy populated those dependencies in a separate block. Manually merged those two blocks for required dependencies and ran go mod tidy again.
There was a problem hiding this comment.
Seems like all scrape functions require the doc and an x-path string. Do you think there is a room for a common interface with Scrape(r *Resource, doc, xpath string) error function which would let us write unit tests focused on single units?
There was a problem hiding this comment.
I have not included unit tests for the scrape functions as they are not exported (only the registry.ProviderMetadata.ScrapeRepo and registry.ProviderMetadata.Store methods are currently exported). Let's consider this as a future enhancement.
There was a problem hiding this comment.
Done. Renamed as docStr.
There was a problem hiding this comment.
nitpick: does this deserve its own function given it's used only here?
There was a problem hiding this comment.
Done. Embedded the function.
There was a problem hiding this comment.
I still think that the metadata document should be as raw as it can be and we can do everything references-related in a single place, which is Upjet itself in this PR. But changing it won't have much effect on authoring experience at the moment even though I believe it'd have been simpler to reason about how Upjet works. Opened this issue to have a place to drop feedback about it.
There was a problem hiding this comment.
From our chat, I remember that having a separate dependencies array doesn't make much difference for the later stages so we could possibly remove it to reduce the complexity here. Opened this issue to track the discussion.
There was a problem hiding this comment.
In https://github.com/upbound/official-providers/pull/312 , I see that all entries have the same value for name, such as aws_acm_certificate_validation in provider-aws. Is that intentional?
There was a problem hiding this comment.
From the markdown files, we scrape the resource name first from the title of the documentation page, and then if any example manifests are available, use what's available in that HCL configuration's block label. I think in the past I have seen some cases where the title did not correctly reflect the resource name. But for most resources title name and the extracted resource name match. And when/if we generate per-resource metadata files, we can get rid of the duplicated map keys.
d479ba1 to
d1ddadf
Compare
Signed-off-by: Alper Rifat Ulucinar <ulucinar@users.noreply.github.com>
d1ddadf to
3c03dec
Compare
|
Thanks @muvaf. |
Description of your changes
Related to https://github.com/upbound/official-providers/issues/19
Depends on #19
Adds a metadata scraper, which is invoked during provider code generation to generate a provider metadata file.
I have:
make reviewableto ensure this PR is ready for review.backport release-x.ylabels to auto-backport this PR if necessary.How has this code been tested
Enabled the example generation pipelines in this PR.