fix: add OG images for Twitter/social card previews #4

Merged
NameetP merged 1 commit into main from fix/og-images on Mar 12, 2026

@NameetP NameetP commented Mar 12, 2026

Owner

Summary

  • Twitter card was showing a generic link preview because no og:image or twitter:image meta tags existed
  • Created 1200x630 OG images: benchmark-specific + default fallback
  • Added og:image, og:image:width, og:image:height, twitter:image to Hugo base template
  • Per-post og_image frontmatter param with automatic fallback
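
The per-post fallback described above can be sketched in a few lines. This is not the Hugo template from the commit (which is not shown here); it is a Python illustration of the same selection logic. The default image path, the base URL, and the function name are hypothetical.

```python
from html import escape
from urllib.parse import urljoin

# Hypothetical path; the actual default image location is not shown in this PR.
DEFAULT_OG_IMAGE = "/images/og-default.png"


def social_image_tags(base_url: str, frontmatter: dict) -> list[str]:
    """Return the four image meta tags, preferring the post's og_image."""
    # Use the per-post image when the frontmatter sets one, else the default.
    image = frontmatter.get("og_image") or DEFAULT_OG_IMAGE
    # Crawlers generally expect an absolute URL, so resolve against the site root.
    url = escape(urljoin(base_url, image), quote=True)
    return [
        f'<meta property="og:image" content="{url}">',
        '<meta property="og:image:width" content="1200">',
        '<meta property="og:image:height" content="630">',
        f'<meta name="twitter:image" content="{url}">',
    ]


if __name__ == "__main__":
    for tag in social_image_tags("https://example.com/", {"og_image": "/images/og-benchmark.png"}):
        print(tag)
```

A post without og_image in its frontmatter gets the default image in both og:image and twitter:image.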

Test plan

  • Hugo builds clean (25 pages, 3 static files)
  • Verified og:image and twitter:image tags in built HTML
  • Validate with Twitter Card Validator after deploy
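
The "verified tags in built HTML" step could be automated roughly as follows. This is a minimal sketch using only the standard library; the sample HTML and the helper names are assumptions for illustration, not part of the repository.

```python
from html.parser import HTMLParser

REQUIRED = ("og:image", "og:image:width", "og:image:height", "twitter:image")


class MetaCollector(HTMLParser):
    """Collect <meta> content values keyed by their property= or name= attribute."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        key = attr.get("property") or attr.get("name")
        if key and "content" in attr:
            self.meta[key] = attr["content"]


def missing_social_tags(html: str) -> list[str]:
    """Return the required tags that are absent or have empty content."""
    parser = MetaCollector()
    parser.feed(html)
    return [key for key in REQUIRED if not parser.meta.get(key)]


# Hypothetical stand-in for one page of the built site.
SAMPLE = """<html><head>
<meta property="og:image" content="https://example.com/images/og-default.png">
<meta property="og:image:width" content="1200">
<meta property="og:image:height" content="630">
<meta name="twitter:image" content="https://example.com/images/og-default.png">
</head><body></body></html>"""

if __name__ == "__main__":
    print(missing_social_tags(SAMPLE))
```

Running the same check over every HTML file in the build output would flag any page that lost its image tags.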

🤖 Generated with Claude Code

- Created OG images (1200x630) for benchmark blog post and default fallback
- Added og:image + twitter:image meta tags to Hugo base template
- Per-post og_image frontmatter param with fallback to default

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@NameetP NameetP merged commit cacd0bc into main Mar 12, 2026
1 check failed
@NameetP NameetP deleted the fix/og-images branch March 12, 2026 13:47
NameetP pushed a commit that referenced this pull request Mar 18, 2026
Font-size-based heading detection (headings.py, ~220 lines):
- Analyzes PyMuPDF font metadata to identify heading spans
- Maps distinct font sizes to h1/h2/h3 (relative to body size)
- Detects bold-at-same-size headings common in academic PDFs
- Promotes short bold-only lines to ### as fallback
- Early exit when pymupdf4llm already detected headings
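
A minimal sketch of the heading rules listed above, assuming each line has already been reduced to its text, font size, and bold flag. The Line type, the 60-character limit, and the function names are hypothetical; the actual headings.py reads this metadata from PyMuPDF and may differ in every detail.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Line:
    text: str
    size: float
    bold: bool = False


def body_size(lines: list[Line]) -> float:
    """Treat the font size that covers the most characters as body text."""
    weights = Counter()
    for ln in lines:
        weights[round(ln.size, 1)] += len(ln.text)
    return weights.most_common(1)[0][0]


def tag_headings(lines: list[Line], max_bold_chars: int = 60) -> list[str]:
    """Prefix heading lines with Markdown markers; other lines pass through."""
    if not lines:
        return []
    # Early exit: headings were already detected upstream.
    if any(ln.text.lstrip().startswith("#") for ln in lines):
        return [ln.text for ln in lines]
    body = body_size(lines)
    # Distinct sizes above body size, largest first, map to h1/h2/h3.
    larger = sorted({round(ln.size, 1) for ln in lines if round(ln.size, 1) > body},
                    reverse=True)
    level_for = {size: min(i + 1, 3) for i, size in enumerate(larger)}
    out = []
    for ln in lines:
        size = round(ln.size, 1)
        if size in level_for:
            out.append("#" * level_for[size] + " " + ln.text)
        elif ln.bold and len(ln.text) <= max_bold_chars:
            # Fallback: a short bold line at body size becomes ###.
            out.append("### " + ln.text)
        else:
            out.append(ln.text)
    return out


if __name__ == "__main__":
    page = [
        Line("Parsing PDFs", 20.0),
        Line("Introduction", 14.0),
        Line("Body text that is clearly the most common size on the page.", 10.0),
        Line("Related work", 10.0, bold=True),
    ]
    print("\n".join(tag_headings(page)))
```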

Borderless table fallback (table_fallback.py, ~200 lines):
- Whitespace column detection for tables missed by find_tables()
- Validates: 3+ rows, 2+ columns, numeric column required
- Returns ExtractedTable objects matching existing type
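
The whitespace-column idea can be illustrated on plain text lines. This sketch works on character columns and returns a list of rows; the real table_fallback.py presumably works from PDF word positions and returns ExtractedTable objects. The min_gap value and the "at least half the cells are numbers" test are assumptions.

```python
import re

NUMERIC = re.compile(r"[-+]?\d[\d,]*(\.\d+)?%?")


def column_spans(rows: list[str], min_gap: int = 2) -> list[tuple[int, int]]:
    """Find (start, end) character spans separated by gaps blank in every row."""
    width = max(len(r) for r in rows)
    padded = [r.ljust(width) for r in rows]
    blank = [all(r[i] == " " for r in padded) for i in range(width)]
    spans, start, gap = [], None, 0
    for i, is_blank in enumerate(blank):
        if is_blank:
            gap += 1
            # Close the current column once the blank run is wide enough.
            if start is not None and gap == min_gap:
                spans.append((start, i - min_gap + 1))
                start = None
        else:
            if start is None:
                start = i
            gap = 0
    if start is not None:
        spans.append((start, width))
    return spans


def extract_borderless_table(rows: list[str], min_gap: int = 2):
    """Return rows of cells, or None if the block does not look like a table."""
    rows = [r.rstrip() for r in rows if r.strip()]
    if len(rows) < 3:  # 3+ rows required
        return None
    spans = column_spans(rows, min_gap)
    if len(spans) < 2:  # 2+ columns required
        return None
    table = [[r[a:b].strip() for a, b in spans] for r in rows]
    # Require one column where at least half of the cells are numbers.
    has_numeric = any(
        2 * sum(bool(NUMERIC.fullmatch(row[c])) for row in table) >= len(table)
        for c in range(len(spans))
    )
    return table if has_numeric else None


if __name__ == "__main__":
    block = [
        "Model      Score  Rank",
        "alpha      0.91   1",
        "beta max   0.85   2",
    ]
    print(extract_borderless_table(block))
```

Requiring a gap of at least two blank columns keeps multi-word cells such as "beta max" together.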

Integration:
- fast.py: always opens fitz doc, injects headings per page
- audit.py: injects headings in multipass/standard quality path

Benchmark results (opendataloader-bench, 200 PDFs):
  Overall: 0.792 → 0.853 (+0.061)
  MHS:     0.500 → 0.740 (+0.240)
  NID:     0.911 → 0.911 (unchanged)
  TEDS:    0.704 → 0.704 (unchanged)

Leaderboard: #6 → #4 (ahead of opendataloader local, mineru)

21 new tests, 246 total passing, zero new dependencies.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
NameetP added a commit that referenced this pull request Mar 18, 2026
Co-authored-by: Nameet Potnis <nameetpotnis@Nameets-MacBook-Pro.local>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
NameetP pushed a commit that referenced this pull request Mar 18, 2026
- Added benchmark leaderboard (opendataloader-bench, 200 PDFs)
- pdfmux #4 overall (0.853), #2 reading order (0.911)
- Heading detection in pipeline diagram and multi-pass description
- Updated project structure with headings.py, table_fallback.py

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>