feat: add ItemList structured data for comparison table pages #18477
alexleventer wants to merge 2 commits into master
Emits JSON-LD ItemList schema on /comparisons/ pages by parsing the feature-comparison Markdown table, improving search-engine visibility.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
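As a rough illustration of the technique, here is a minimal Go sketch that turns a Markdown comparison table into a schema.org ItemList. It is not the PR's Hugo partial: the Go types and the function names splitRow and buildItemList are hypothetical, using the first column as each item's name is an assumption, and the sample row is illustrative. The "Product: value | Competitor: value" description shape follows the Terraform JSON-LD quoted later in the thread, and the labels are read from the header row, which is the approach the review ends up recommending.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

// ListItem and ItemList are illustrative Go mirrors of the schema.org types;
// the PR itself builds these as Hugo dicts inside a partial.
type ListItem struct {
	Type        string `json:"@type"`
	Position    int    `json:"position"`
	Name        string `json:"name"`
	Description string `json:"description"`
}

type ItemList struct {
	Context  string     `json:"@context"`
	Type     string     `json:"@type"`
	Elements []ListItem `json:"itemListElement"`
}

// splitRow turns "| a | b |" into ["a", "b"], dropping the empty strings
// produced by the leading and trailing pipes.
func splitRow(row string) []string {
	parts := strings.Split(strings.TrimSpace(row), "|")
	if len(parts) < 3 {
		return nil
	}
	cells := make([]string, 0, len(parts)-2)
	for _, p := range parts[1 : len(parts)-1] {
		cells = append(cells, strings.TrimSpace(p))
	}
	return cells
}

// buildItemList expects a header row, a separator row, then data rows.
func buildItemList(table string) (ItemList, error) {
	var rows [][]string
	for _, line := range strings.Split(table, "\n") {
		if strings.HasPrefix(strings.TrimSpace(line), "|") {
			rows = append(rows, splitRow(line))
		}
	}
	if len(rows) < 3 || len(rows[0]) < 3 {
		return ItemList{}, errors.New("expected a 3-column header, a separator row, and at least one data row")
	}
	// Labels come from the header row, so they always follow the column order.
	product, competitor := rows[0][1], rows[0][2]
	list := ItemList{Context: "https://schema.org", Type: "ItemList"}
	for _, row := range rows[2:] { // rows[1] is the |---| separator
		if len(row) < 3 {
			continue // skip malformed rows rather than mislabel them
		}
		list.Elements = append(list.Elements, ListItem{
			Type:        "ListItem",
			Position:    len(list.Elements) + 1,
			Name:        row[0],
			Description: fmt.Sprintf("%s: %s | %s: %s", product, row[1], competitor, row[2]),
		})
	}
	return list, nil
}

func main() {
	table := "| Feature | Pulumi | Terraform |\n|---|---|---|\n| Languages | Python, Go | HCL |"
	list, err := buildItemList(table)
	if err != nil {
		panic(err)
	}
	out, err := json.MarshalIndent(list, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Because both labels are taken from the same header row as the cell values, a description cannot attribute one column's values to the other product.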
Docs Review

Scope: Reviewed all changes in this PR: the new …

Overall, this is a well-structured addition that follows the existing collector patterns (matches …).

Issues

1. Bug: Product/competitor label mismatch on 2 of 7 pages (…)

The title parsing assumes the order of names in the page title matches the order of columns in the table. This breaks on pages where the title order is reversed relative to the table columns. Affected pages: …

For example, on the Terraform page, the … The other 5 pages with …

Suggested fix: Extract column names from the table header row itself rather than parsing the page title. The header row (e.g., …)

```
{{/* Extract column names from the table header row */}}
{{ $headerCells := split (index $allTableLines 0) "|" }}
{{ $productName := index $headerCells 2 | strings.TrimSpace }}
{{ $competitorName := index $headerCells 3 | strings.TrimSpace }}
```

This requires restructuring slightly (moving the header extraction before the name check), but it eliminates the title/column mismatch entirely and is more robust against future title changes.

2. Minor: Trailing " IaC" in competitor name (…)

On the Terraform page (h1: "Terraform vs. Pulumi IaC"), the fallback title parser produces …

Looks good
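The index arithmetic in the suggested fix is easy to misread. Hugo's split behaves like Go's strings.Split here, so a row with leading and trailing pipes starts with an empty cell: cell 1 is the feature column, and cells 2 and 3 are the two products. The Go sketch below reproduces that. headerNames is a hypothetical helper and the "Feature" label in the sample headers is a guess; the Pulumi-first column order for the Terraform and Crossplane pages comes from the later commit message in this thread.

```go
package main

import (
	"fmt"
	"strings"
)

// headerNames mirrors the suggested template logic: split the header row on
// "|" and read cells 2 and 3. Cell 0 is the empty string before the leading
// pipe, and cell 1 is the feature column.
func headerNames(header string) (product, competitor string, ok bool) {
	cells := strings.Split(header, "|")
	if len(cells) < 4 {
		return "", "", false // too few cells to have columns 2 and 3
	}
	return strings.TrimSpace(cells[2]), strings.TrimSpace(cells[3]), true
}

func main() {
	// The Terraform page's h1 names Terraform first, but its table header
	// (per the later commit message) lists Pulumi first.
	for _, h := range []string{
		"| Feature | Pulumi | Terraform |",
		"| Feature | Pulumi | Crossplane |",
	} {
		p, c, ok := headerNames(h)
		fmt.Printf("ok=%v product=%q competitor=%q\n", ok, p, c)
	}
}
```

Reading positions from the header makes the h1 wording irrelevant, which is why the same change also fixes the trailing " IaC" case.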
Mention me (@claude) if you'd like additional reviews or want me to help implement the fix.
Your site preview for commit 6f552f9 is ready! 🎉 http://www-testing-pulumi-docs-origin-pr-18477-6f552f98.s3-website.us-west-2.amazonaws.com

Lighthouse Performance Report (commit 6f552f9)
@claude - review again pls

Claude finished @alexleventer's task in 2m 36s

Docs Review
Re-reviewed the full diff (…).

Bug 1 (Critical, unchanged): Product/competitor labels are swapped on 2 of 7 pages

The title-parsing logic assigns …

The root cause: …

Suggested fix: extract column names from the table header itself, which is always in the correct order:

```
{{/* Replace lines 7-18 with: */}}
{{ $allTableLines := findRE `(?m)^\|.+\|` $afterHeader }}
{{ if ge (len $allTableLines) 3 }}
{{ $headerCells := split (index $allTableLines 0) "|" }}
{{ $productName := index $headerCells 2 | strings.TrimSpace }}
{{ $competitorName := index $headerCells 3 | strings.TrimSpace }}
```

This ties the label directly to the column position and is immune to h1 wording.

Bug 2 (Moderate, new): Verbose competitor name on 2 more pages

For pages where the h1 reads "Pulumi vs. <Long Name>", …

The data is attributed to the right product (columns aren't swapped), but the entity …

Issue 3 (Minor, new): …
CamSoper left a comment
Doing a human maintainer pass and landing on the same conclusions @claude did — including the column inversion, which I confirmed on the preview site. The Terraform page's JSON-LD currently reads:
"Terraform: Python, TypeScript, JavaScript, Go, C#, F#, Java, YAML | Pulumi IaC: HashiCorp Configuration Language (HCL)"
…which attributes Pulumi's languages to Terraform and HCL to Pulumi. Definitely want that fixed before this ships — structured data that's confidently wrong is worse than no structured data at all.
The header-row extraction approach @claude suggested resolves both the column swap and the verbose-name issue on aws-cdk / cdktf in one shot. Worth also picking up the hasPart wiring from WebPage to the ItemList while you're in there.
Ping me when it's ready for another look.
Claude finished @CamSoper's task in 2m 49s

Fixes implemented

What changed

```
{{/* Extract column names from the table header row (always in correct column order) */}}
{{ $headerCells := split (index $allTableLines 0) "|" }}
{{ $productName := index $headerCells 2 | strings.TrimSpace }}
{{ $competitorName := index $headerCells 3 | strings.TrimSpace }}
```

Column names now come from the actual table (…)

The two top-level guards (…)

```
{{ $comparisonTable = partial "schema/collectors/comparison-table-entity.html" . }}
{{ if and $comparisonTable (ne $comparisonTable (dict)) }}
{{ $webpage = merge $webpage (dict "hasPart" (dict "@id" "#comparison-table")) }}
{{ end }}
```

The …
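The graph-builder change can be sketched in Go with plain maps: the collector result is available before the WebPage node is added, and hasPart is set only for a non-empty ItemList. This is a minimal sketch under assumptions: buildGraph and the node type are hypothetical, the "#webpage" id is made up, and appending the ItemList node to the same graph is assumed rather than shown in the thread. Only the "#comparison-table" reference comes from the PR.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// node is a JSON-LD node; a hypothetical stand-in for the Hugo dicts.
type node = map[string]any

// buildGraph follows the restructured ordering: the comparison-table
// collector's result is known before the WebPage node is appended, so
// hasPart is added only when an ItemList entity exists.
func buildGraph(webpage, comparisonTable node) []node {
	// Copy rather than mutate the caller's map; the template likewise
	// reassigns $webpage to the result of merge.
	page := node{}
	for k, v := range webpage {
		page[k] = v
	}
	hasTable := len(comparisonTable) > 0 // an empty dict means "no table"
	if hasTable {
		page["hasPart"] = node{"@id": "#comparison-table"}
	}
	graph := []node{page}
	if hasTable {
		graph = append(graph, comparisonTable)
	}
	return graph
}

func main() {
	// Hypothetical nodes; only "#comparison-table" comes from the PR.
	webpage := node{"@type": "WebPage", "@id": "#webpage"}
	table := node{"@type": "ItemList", "@id": "#comparison-table"}
	out, err := json.MarshalIndent(node{
		"@context": "https://schema.org",
		"@graph":   buildGraph(webpage, table),
	}, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Guarding on an empty result keeps pages without a comparison table from emitting a hasPart reference to a node that does not exist.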
- Replace title-parsing heuristic in comparison-table-entity.html with
header-row extraction: split the first table row on "|" and read cells
2 and 3 for productName/competitorName. This fixes the label swap on
terraform (h1 "Terraform vs. Pulumi IaC", cols "Pulumi | Terraform")
and crossplane (h1 "Crossplane vs Pulumi", cols "Pulumi | Crossplane"),
and also produces shorter canonical names for aws-cdk ("AWS CDK") and
cdktf ("CDKTF") instead of the verbose h1 forms.
- Restructure graph-builder.html so the comparison table collector runs
before $webpage is appended to $graph, allowing a hasPart link to be
wired into the WebPage node when an ItemList entity is present.
Co-authored-by: Cam Soper <CamSoper@users.noreply.github.com>
CamSoper left a comment
Sorry, @alexleventer! It seems I accidentally triggered Claude and he just came along and fixed it. Feel free to revert that commit if you need to!
Summary
- … (comparison-table-entity.html) that parses Markdown feature-comparison tables on /comparisons/ pages and emits JSON-LD ItemList structured data
- … graph-builder.html so it's included in the page's schema graph

Test plan

- make build succeeds
- … ItemList JSON-LD block appears in the page source

🤖 Generated with Claude Code