
feat: add ItemList structured data for comparison table pages #18477

Open

alexleventer wants to merge 2 commits into master from feat/comparison-table-schema

@alexleventer
Contributor

Summary

  • Adds a new schema collector (comparison-table-entity.html) that parses Markdown feature-comparison tables on /comparisons/ pages and emits JSON-LD ItemList structured data
  • Wires the collector into graph-builder.html so it's included in the page's schema graph
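For reviewers unfamiliar with the target shape, here is a minimal Python sketch (not the PR's Hugo template) of the kind of ItemList node the collector is meant to emit. The @id value "#comparison-table" matches the hasPart target used later in this thread. The name and description formats are modeled on the examples quoted in the reviews and should be treated as assumptions, and the row data is hypothetical.

```python
import json


def build_item_list(product, competitor, rows):
    """Build a schema.org ItemList node from (feature, product_value, competitor_value) rows.

    The name/description formats mirror examples quoted in this PR thread;
    they are assumptions, not the collector's exact output.
    """
    items = []
    for position, (feature, val1, val2) in enumerate(rows, start=1):
        items.append({
            "@type": "ListItem",
            "position": position,  # ListItem positions conventionally start at 1
            "name": feature,
            # Each value is labeled with the product name of the column it came from.
            "description": f"{product}: {val1} | {competitor}: {val2}",
        })
    return {
        "@type": "ItemList",
        "@id": "#comparison-table",
        "name": f"{product} vs. {competitor} Feature Comparison",
        "description": (
            f"Side-by-side feature comparison of {product} and {competitor} "
            "for infrastructure as code."
        ),
        "itemListElement": items,
    }


# Hypothetical row, for illustration only.
node = build_item_list("Pulumi", "Terraform", [("Language support", "Python, TypeScript, Go", "HCL")])
print(json.dumps(node, indent=2))
```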

Test plan

  • Verify make build succeeds
  • Spot-check a comparison page (e.g., Pulumi vs Terraform) and confirm the ItemList JSON-LD block appears in the page source
  • Confirm non-comparison pages are unaffected
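The spot-check in the test plan can be scripted. This is a hedged Python sketch, assuming you have the rendered HTML of a page as a string; it pulls every application/ld+json script block and returns any ItemList nodes, whether they sit at the top level or inside an @graph array. The regex also accepts an unquoted type attribute, since an HTML minifier may drop attribute quotes.

```python
import json
import re

# Matches <script type="application/ld+json">...</script>, with or without quotes on the type value.
LD_JSON_RE = re.compile(
    r"""<script[^>]*type=["']?application/ld\+json["']?[^>]*>(.*?)</script>""",
    re.DOTALL | re.IGNORECASE,
)


def find_item_lists(html):
    """Return every ItemList node found in the page's JSON-LD blocks."""
    found = []
    for block in LD_JSON_RE.findall(html):
        data = json.loads(block)
        # A block may hold a single node, a list of nodes, or a {"@graph": [...]} wrapper.
        if isinstance(data, dict):
            nodes = data.get("@graph", [data])
        else:
            nodes = data
        found.extend(n for n in nodes if isinstance(n, dict) and n.get("@type") == "ItemList")
    return found
```

Feed it the built HTML for a comparison page (the output path depends on the site's build configuration) and expect one result; a non-comparison page should return an empty list.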

🤖 Generated with Claude Code

Emits JSON-LD ItemList schema on /comparisons/ pages by parsing the
feature comparison Markdown table, improving search-engine visibility.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@claude
Contributor

claude bot commented Apr 10, 2026

Docs Review

Scope: Reviewed all changes in this PR — the new layouts/partials/schema/collectors/comparison-table-entity.html (90 lines added) and the integration into layouts/partials/schema/graph-builder.html (8 lines added).

Overall this is a well-structured addition that follows the existing collector patterns (matches video-entity.html integration style, proper Scratch cleanup, correct return pattern). The JSON-LD schema.org types (ItemList, ListItem) are valid. However, there is one significant semantic bug in the title-to-column mapping.


Issues

1. Bug — Product/competitor label mismatch on 2 of 7 pages (comparison-table-entity.html:10-18)

The title parsing assumes the order of names in the page title matches the order of columns in the table. This breaks on pages where the title order is reversed relative to the table columns.

Affected pages:

| Page | h1 (title) | Table columns (col 2, col 3) | Result |
|---|---|---|---|
| terraform/_index.md | "Terraform vs. Pulumi IaC" | Pulumi, Terraform | Labels swapped: the Terraform label is paired with Pulumi's values |
| crossplane.md | "Crossplane vs Pulumi" | Pulumi, Crossplane | Labels swapped: the Crossplane label is paired with Pulumi's values |

For example, on the Terraform page, the ListItem descriptions would read:
"Terraform: Python, TypeScript, JavaScript, Go, C#, F#, Java, YAML | Pulumi IaC: HashiCorp Configuration Language (HCL)"
...which is backwards.

The other 5 pages with | Feature | tables produce correct output because their title order matches column order.
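To make the mismatch concrete, here is a Python reconstruction of the title-order heuristic as the review describes it. The split on "vs"/"vs." is an assumption about how names are pulled from the h1, not the template's actual code; the point is that the names come from the title while the values always come from cells 2 and 3.

```python
import re


def names_from_title(title):
    """Title-order heuristic: the name before "vs"/"vs." is the product, the one after is the competitor."""
    left, right = re.split(r"\s+vs\.?\s+", title, maxsplit=1)
    return left.strip(), right.strip()


def describe_row_from_title(title, row):
    """Pair title-derived names with the fixed table cells 2 and 3 (the mismatch described above)."""
    cells = row.split("|")
    product, competitor = names_from_title(title)
    return f"{product}: {cells[2].strip()} | {competitor}: {cells[3].strip()}"


# Table columns are Pulumi first, Terraform second, but the h1 lists Terraform first.
print(describe_row_from_title("Terraform vs. Pulumi IaC", "| Languages | Python, Go | HCL |"))
# prints "Terraform: Python, Go | Pulumi IaC: HCL"
```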

Suggested fix: Extract column names from the table header row itself rather than parsing the page title. The header row (e.g., | Feature | Pulumi | Terraform |) already contains the product names in the correct column order. This would replace lines 7–18 with something like:

{{/* Extract column names from the table header row */}}
{{ $headerCells := split (index $allTableLines 0) "|" }}
{{ $productName := index $headerCells 2 | strings.TrimSpace }}
{{ $competitorName := index $headerCells 3 | strings.TrimSpace }}

This requires restructuring slightly (moving the header extraction before the name check), but it eliminates the title/column mismatch entirely and is more robust against future title changes.
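A Python sketch of the header-row approach, using the same cell indices as the Hugo snippet above. The names now come from the same split-by-pipe structure as the values, so the h1 wording no longer matters. The helper names are hypothetical.

```python
def names_from_header(header_line):
    """Read product/competitor names from cells 2 and 3 of the table header row.

    Splitting "| Feature | Pulumi | Terraform |" on "|" gives
    ["", " Feature ", " Pulumi ", " Terraform ", ""], so index 2 is the first
    product column and index 3 the second, the same indices the Hugo snippet uses.
    """
    cells = header_line.split("|")
    # Leading and trailing pipes add empty first/last entries, so 3 columns -> 5 entries.
    if len(cells) < 5:
        raise ValueError(f"expected a Feature column plus two product columns: {header_line!r}")
    return cells[2].strip(), cells[3].strip()


def describe_row(header_line, row):
    """Label each value with the header cell that sits above it."""
    product, competitor = names_from_header(header_line)
    cells = row.split("|")
    return f"{product}: {cells[2].strip()} | {competitor}: {cells[3].strip()}"
```

The cell-count check is an addition in this sketch; in the template, indexing past the end of a short header would presumably fail the build, so a similar guard may be worth considering.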

2. Minor — Trailing " IaC" in competitor name (comparison-table-entity.html:14-17)

On the Terraform page (h1: "Terraform vs. Pulumi IaC"), the fallback title parser produces $competitorName = "Pulumi IaC". This flows into the entity name as "Terraform vs. Pulumi IaC Feature Comparison" and the description as "Side-by-side feature comparison of Terraform and Pulumi IaC for infrastructure as code." The trailing "IaC" is a bit awkward in structured data. This would also be resolved by extracting names from the table header.


Looks good

  • Graph-builder integration (graph-builder.html:174-180): Follows the exact same guard pattern as video-entity.html. Clean.
  • Table parsing logic: skipping the header and separator rows with after 2, indexing the cells produced by split ... "|", the separator-row detection regex, Markdown link stripping, and HTML tag removal all look correct.
  • Scratch usage and cleanup: Properly sets, accumulates, reads, and deletes both compRows and compItems.
  • URL scoping: The strings.Contains .RelPermalink "/comparisons/" guard correctly limits this to comparison pages only, and returning an empty dict for non-matches avoids polluting the graph on other pages.
  • Schema.org validity: ItemList with ListItem elements is valid structured data. The @id, position, name, and description fields are appropriate.
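For readers who want to see the parsing steps listed above side by side, here is a Python approximation: skip two rows (the equivalent of Hugo's after 2), ignore separator rows, then strip Markdown links and HTML tags from each cell. The three regexes are plausible stand-ins written for this sketch; the PR's actual patterns may differ.

```python
import re

# Approximations of the checks the review lists; the PR's actual regexes may differ.
SEPARATOR_RE = re.compile(r"^\|[\s:|-]+\|$")        # e.g. "|---|:---:|---|"
MD_LINK_RE = re.compile(r"\[([^\]]+)\]\([^)]*\)")   # "[text](url)" -> "text"
HTML_TAG_RE = re.compile(r"<[^>]+>")                # "<strong>x</strong>" -> "x"


def clean_cell(text):
    """Strip Markdown links and HTML tags, keeping the visible text."""
    text = MD_LINK_RE.sub(r"\1", text)
    text = HTML_TAG_RE.sub("", text)
    return text.strip()


def parse_rows(table_lines):
    """Return (feature, val1, val2) tuples from a list of Markdown table lines."""
    rows = []
    # Skip the header and separator rows, like Hugo's `after 2`.
    for line in table_lines[2:]:
        if SEPARATOR_RE.match(line.strip()):
            continue  # ignore any stray separator row
        cells = line.split("|")
        if len(cells) < 5:
            continue  # not a "| feature | val1 | val2 |" row
        rows.append(tuple(clean_cell(c) for c in cells[1:4]))
    return rows
```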

Mention me (@claude) if you'd like additional reviews or want me to help implement the fix.

@pulumi-bot
Collaborator

Lighthouse Performance Report

Commit: 6f552f9 | Metric definitions

| Page | Device | Score | FCP | LCP | TBT | CLS | SI |
|---|---|---|---|---|---|---|---|
| Homepage | Mobile | 🟡 75 | 2.4s | 2.4s | 546ms | 0.014 | 6.7s |
| Homepage | Desktop | 🟡 79 | 0.7s | 0.9s | 26ms | 0.271 | 2.6s |
| Install Pulumi | Mobile | 🔴 39 | 4.5s | 6.0s | 236ms | 0.431 | 7.6s |
| Install Pulumi | Desktop | 🟢 91 | 1.2s | 1.4s | 20ms | 0.023 | 1.7s |
| AWS Get Started | Mobile | 🟡 62 | 4.4s | 5.9s | 229ms | 0.085 | 4.4s |
| AWS Get Started | Desktop | 🟡 86 | 1.2s | 1.5s | 20ms | 0.030 | 2.9s |

@alexleventer
Contributor Author

@claude - review again pls

@claude
Contributor

claude bot commented Apr 10, 2026

Claude finished @alexleventer's task in 2m 36s


Docs Review

  • Read changed files
  • Review comparison pages for column/title alignment
  • Assess schema correctness
  • Post findings

Re-reviewed the full diff (comparison-table-entity.html + graph-builder.html). The previous critical bug remains unfixed, and I found two additional issues.


Bug 1 (Critical, unchanged) — Product/competitor labels are swapped on 2 of 7 pages

comparison-table-entity.html:10-18

The title-parsing logic assigns $productName / $competitorName from the h1 text, but the table columns are always ordered Pulumi first, competitor second — regardless of how the h1 is worded. On two pages the h1 has the competitor name first, so the labels are assigned backwards:

| Page | h1 | Table col 2 (val1) | Table col 3 (val2) | Parsed $productName | Parsed $competitorName | Effect |
|---|---|---|---|---|---|---|
| terraform/_index.md | "Terraform vs. Pulumi IaC" | Pulumi data | Terraform data | "Terraform" | "Pulumi IaC" | Description reads "Terraform: Python, TS … \| Pulumi IaC: HCL" (backwards) |
| crossplane.md | "Crossplane vs Pulumi" | Pulumi data | Crossplane data | "Crossplane" | "Pulumi" | Description reads "Crossplane: Python, TS … \| Pulumi: Kubernetes CRDs" (backwards) |

The root cause: $productName and $competitorName are derived from the h1 word order, but $val1/$val2 are always fixed at cell index 2 and 3 in the table — the two sources are independent and can disagree.

Suggested fix — extract column names from the table header itself, which is always in the correct order:

{{/* Replace lines 7-18 with: */}}
{{ $allTableLines := findRE `(?m)^\|.+\|` $afterHeader }}

{{ if ge (len $allTableLines) 3 }}
  {{ $headerCells := split (index $allTableLines 0) "|" }}
  {{ $productName   := index $headerCells 2 | strings.TrimSpace }}
  {{ $competitorName := index $headerCells 3 | strings.TrimSpace }}

This ties each label directly to its column position and is immune to h1 wording.


Bug 2 (Moderate, new) — Verbose competitor name on 2 more pages

comparison-table-entity.html:11-12

For pages where the h1 reads "Pulumi vs. <Long Name>", $competitorName gets the full h1 wording, not the shorter column header name:

| Page | h1 (parsed $competitorName) | Table col 3 header |
|---|---|---|
| aws-cdk/_index.md | "AWS Cloud Development Kit (CDK)" | "AWS CDK" |
| cdktf/_index.md | "CDK for Terraform (CDKTF)" | "CDKTF" |

The data is attributed to the right product (columns aren't swapped), but the entity name and description fields will use the long form: e.g., "Pulumi vs. CDK for Terraform (CDKTF) Feature Comparison". Using the table column header would produce the shorter, canonical form actually used in the comparison. This too is resolved by the fix above.


Issue 3 (Minor, new) — ItemList is not linked from the WebPage node

graph-builder.html:174-180

The ItemList entity is appended to $graph as a standalone node, but the WebPage entity has no outgoing reference to it. Contrast with how mainEntity and breadcrumb are wired:

{{ $webpage = merge $webpage (dict "mainEntity" (dict "@id" "#main-content")) }}

Without an analogous hasPart (or mainEntity) reference from $webpage, the ItemList floats unconnected in the graph. This is valid JSON-LD and Google's Rich Results Test will still parse it, but explicitly connecting it strengthens the semantic relationship and is more consistent with the rest of the collector pattern.
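One way to spot this kind of dangling node is to check which @ids nothing else in the graph references. The Python sketch below does a single-level check (it does not walk references transitively from the root), and the "#webpage" @id used in the test data is hypothetical; the function names are illustrative only.

```python
def referenced_ids(node):
    """Collect every nested {"@id": ...} reference inside a node, excluding the node's own @id."""
    refs = set()

    def walk(value, is_root):
        if isinstance(value, dict):
            if not is_root and "@id" in value:
                refs.add(value["@id"])
            for child in value.values():
                walk(child, False)
        elif isinstance(value, list):
            for child in value:
                walk(child, False)

    walk(node, True)
    return refs


def unlinked_nodes(graph, root_id):
    """Return @ids of nodes that no other node in the graph points to (the root is exempt)."""
    referenced = set()
    for node in graph:
        referenced |= referenced_ids(node)
    return [
        node["@id"]
        for node in graph
        if "@id" in node and node["@id"] != root_id and node["@id"] not in referenced
    ]
```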


Looks good (unchanged)

  • Graph-builder integration: follows video-entity.html guard pattern exactly. Clean.
  • Table parsing: after 2, separator-row regex, Markdown link stripping, HTML tag removal all correct.
  • Scratch usage: proper set/accumulate/read/delete lifecycle on both compRows and compItems.
  • URL scoping: strings.Contains .RelPermalink "/comparisons/" guard is correct.
  • Schema.org types: ItemList + ListItem with @id, position, name, description is valid.
  • Empty-dict return and guard in graph-builder: consistent with other collectors.

The title-parsing approach is the root cause of both bugs 1 and 2; switching to header-row extraction resolves both in one change.
Branch: feat/comparison-table-schema

@alexleventer alexleventer requested a review from sicarul April 10, 2026 19:59
@sicarul sicarul requested a review from CamSoper April 10, 2026 21:20
Contributor

@CamSoper CamSoper left a comment

Doing a human maintainer pass and landing on the same conclusions @claude did — including the column inversion, which I confirmed on the preview site. The Terraform page's JSON-LD currently reads:

"Terraform: Python, TypeScript, JavaScript, Go, C#, F#, Java, YAML | Pulumi IaC: HashiCorp Configuration Language (HCL)"

…which attributes Pulumi's languages to Terraform and HCL to Pulumi. Definitely want that fixed before this ships — structured data that's confidently wrong is worse than no structured data at all.

The header-row extraction approach @claude suggested resolves both the column swap and the verbose-name issue on aws-cdk / cdktf in one shot. Worth also picking up the hasPart wiring from WebPage to the ItemList while you're in there.

Ping me when it's ready for another look.

@claude
Contributor

claude bot commented Apr 13, 2026

Claude finished @CamSoper's task in 2m 49s


Fixes implemented

  • Gather context and understand request
  • Read current file contents
  • Fix 1: Replace title-parsing with header-row extraction for product/competitor names
  • Fix 2: Add hasPart wiring from WebPage to ItemList in graph-builder.html
  • Commit and push

What changed

comparison-table-entity.html — replaced the title-parsing block (lines 7–19) with header-row extraction:

{{/* Extract column names from the table header row (always in correct column order) */}}
{{ $headerCells := split (index $allTableLines 0) "|" }}
{{ $productName := index $headerCells 2 | strings.TrimSpace }}
{{ $competitorName := index $headerCells 3 | strings.TrimSpace }}

Column names now come from the actual table (| Feature | Pulumi | Terraform |) rather than the h1 title. This fixes:

  • Terraform page: was "Terraform: Python… | Pulumi IaC: HCL" → now correctly "Pulumi: Python… | Terraform: HCL"
  • Crossplane page: labels were swapped → now correct
  • AWS CDK / CDKTF: verbose names ("CDK for Terraform (CDKTF)") → now canonical short forms ("AWS CDK", "CDKTF")

The two top-level guards (the /comparisons/ path check and the | Feature | content check) are now combined into a single if-and condition at line 5, and the title-parsing block has been removed entirely.
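The combined guard plus the table-line lookup can be approximated in Python as follows. The regex mirrors the findRE pattern shown earlier in the thread; where the scan starts is an assumption, since the definition of $afterHeader is not shown in this PR conversation. The function name and the example paths are hypothetical.

```python
import re

# Same pattern as the template's findRE: a line starting with a pipe, up to the last pipe on that line.
TABLE_LINE_RE = re.compile(r"(?m)^\|.+\|")


def eligible_table_lines(rel_permalink, raw_content):
    """Apply the combined guard, then return the feature table's lines (or [] if none)."""
    if "/comparisons/" not in rel_permalink or "| Feature |" not in raw_content:
        return []
    # Assumption: scanning starts at the "| Feature |" header; the template's
    # $afterHeader may be derived differently.
    start = raw_content.index("| Feature |")
    lines = TABLE_LINE_RE.findall(raw_content[start:])
    # Header + separator + at least one data row, like `ge (len $allTableLines) 3`.
    return lines if len(lines) >= 3 else []
```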

graph-builder.html — moved the comparison table collector to run before $webpage is appended to $graph. This allows a hasPart link to be wired into the WebPage node when an ItemList is present:

{{ $comparisonTable = partial "schema/collectors/comparison-table-entity.html" . }}
{{ if and $comparisonTable (ne $comparisonTable (dict)) }}
  {{ $webpage = merge $webpage (dict "hasPart" (dict "@id" "#comparison-table")) }}
{{ end }}

The $comparisonTable variable is then re-used at the original insertion point to append the entity to $graph — no double-invocation of the partial.
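The ordering change can be sketched in Python, with a callable standing in for the partial call. The collector runs once, its result is stored, the WebPage node gets a hasPart reference when the result is non-empty, and the stored result is appended later. An empty dict is falsy in Python just as an empty map is in Go templates, so the emptiness checks line up. The function names here are hypothetical.

```python
def build_graph(webpage, collect_comparison_table):
    """Run the comparison collector once, link it from WebPage via hasPart, then append both."""
    graph = []
    # Invoke the collector before WebPage is appended; it returns {} on non-comparison pages.
    comparison_table = collect_comparison_table()
    if comparison_table:
        webpage = {**webpage, "hasPart": {"@id": "#comparison-table"}}
    graph.append(webpage)
    # ... other collectors would append their nodes here ...
    if comparison_table:
        graph.append(comparison_table)  # reuse the stored result; no second call
    return graph
```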

Branch: feat/comparison-table-schema

- Replace title-parsing heuristic in comparison-table-entity.html with
  header-row extraction: split the first table row on "|" and read cells
  2 and 3 for productName/competitorName. This fixes the label swap on
  terraform (h1 "Terraform vs. Pulumi IaC", cols "Pulumi | Terraform")
  and crossplane (h1 "Crossplane vs Pulumi", cols "Pulumi | Crossplane"),
  and also produces shorter canonical names for aws-cdk ("AWS CDK") and
  cdktf ("CDKTF") instead of the verbose h1 forms.
- Restructure graph-builder.html so the comparison table collector runs
  before $webpage is appended to $graph, allowing a hasPart link to be
  wired into the WebPage node when an ItemList entity is present.

Co-authored-by: Cam Soper <CamSoper@users.noreply.github.com>
Contributor

@CamSoper CamSoper left a comment

Sorry, @alexleventer! It seems I accidentally triggered Claude and he just came along and fixed it. Feel free to revert that commit if you need to!
