feat: Add Serper scrape API for content extraction #48
WebCat now uses Serper's optimized scraping infrastructure as the primary content extraction method, with Trafilatura as fallback. This makes WebCat a true composite search tool - one SERPER_API_KEY enables both search + scraping.

Benefits:
- Much faster and more reliable scraping via Serper's infrastructure
- Cleaner markdown output with preserved document structure
- Single API key for both search and content extraction
- Automatic fallback to Trafilatura when Serper unavailable
- Reduced compute costs compared to local scraping

Changes:
- Add scrape_webpage() function to serper_client.py
- Update content_scraper.py to prioritize Serper scrape API
- Update README to highlight composite tool functionality
- Maintain backward compatibility with Trafilatura fallback

Pricing: Serper scraping at $0.001 per scrape (included in free tier)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Caution
Review failed
The pull request is closed.

Walkthrough
Adds Serper-based web scraping with optional fallback to Trafilatura, updates the README to reflect the new scraping/search setup, introduces a Serper client function for scraping, adjusts the content scraper control flow to prefer Serper when configured, and increments the version to 2.5.1.
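The Serper client function mentioned in the walkthrough could look roughly like the sketch below. This is not the code from the PR: the endpoint URL, the `X-API-KEY` header, the `{"url": ...}` payload, and the `markdown`/`text` response keys are assumptions for illustration, and `build_scrape_request` and `parse_scrape_response` are hypothetical helpers. Only the name `scrape_webpage` and its `(url, api_key)` arguments come from the PR description and the sequence diagram.

```python
import json
import urllib.request
from typing import Optional

# Assumption: endpoint, header name, and payload shape are illustrative only.
SERPER_SCRAPE_URL = "https://scrape.serper.dev"


def build_scrape_request(url: str, api_key: str) -> urllib.request.Request:
    """Build a POST request asking the scrape service to fetch `url` (hypothetical helper)."""
    payload = json.dumps({"url": url}).encode("utf-8")
    return urllib.request.Request(
        SERPER_SCRAPE_URL,
        data=payload,
        headers={"X-API-KEY": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def parse_scrape_response(body: bytes) -> Optional[str]:
    """Return markdown if present, else plain text, else None (hypothetical helper)."""
    try:
        data = json.loads(body.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        return None
    if not isinstance(data, dict):
        return None
    # Assumed response keys; prefer markdown because it preserves structure.
    for key in ("markdown", "text"):
        value = data.get(key)
        if isinstance(value, str) and value.strip():
            return value
    return None


def scrape_webpage(url: str, api_key: str, timeout: float = 10.0) -> Optional[str]:
    """Scrape `url` via the service; return None on any network or parse failure."""
    request = build_scrape_request(url, api_key)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return parse_scrape_response(response.read())
    except OSError:  # covers URLError, HTTPError, and socket timeouts
        return None
```

Returning `None` instead of raising keeps the caller's fallback decision to a single truthiness check, which matches the "None / error" branch in the diagram.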
Sequence Diagram(s)
sequenceDiagram
autonumber
participant U as Caller
participant CS as ContentScraper
participant SC as SerperClient
participant SA as Serper Scrape API
participant T as Trafilatura
U->>CS: scrape(url)
alt SERPER_API_KEY present
CS->>SC: scrape_webpage(url, api_key)
SC->>SA: POST / (url)
SA-->>SC: text/markdown or empty
alt Content returned
SC-->>CS: markdown text
CS-->>U: wrapped+truncated content
else No content/error
SC-->>CS: None / error
Note over CS: Fallback to Trafilatura
CS->>T: extract(url)
T-->>CS: text/markdown/snippet or None
CS-->>U: result (or snippet)
end
else No SERPER_API_KEY
CS->>T: extract(url)
T-->>CS: text/markdown/snippet or None
CS-->>U: result (or snippet)
end
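The control flow in the diagram above can be sketched in a few lines. This is a minimal illustration, not the PR's `content_scraper.py`: `scrape_with_fallback`, the injected `serper_scrape` and `trafilatura_extract` callables, and the `max_chars` limit are hypothetical names and values chosen so the logic can run without network access. The wrapping step and the snippet fallback shown in the diagram are omitted.

```python
from typing import Callable, Optional

# Hypothetical truncation limit; the real value is not stated in the PR.
MAX_CHARS = 8000


def scrape_with_fallback(
    url: str,
    api_key: Optional[str],
    serper_scrape: Callable[[str, str], Optional[str]],
    trafilatura_extract: Callable[[str], Optional[str]],
    max_chars: int = MAX_CHARS,
) -> Optional[str]:
    """Prefer the Serper scraper when a key is configured, else use Trafilatura."""
    content: Optional[str] = None

    if api_key:
        try:
            content = serper_scrape(url, api_key)
        except Exception:
            # Any Serper failure is treated the same as "no content returned".
            content = None

    if not content:
        # Reached when no key is set, or when Serper returned nothing or failed.
        content = trafilatura_extract(url)

    if not content:
        return None
    return content[:max_chars]
```

For example, with stub extractors:

```python
result = scrape_with_fallback(
    "https://example.com",
    api_key=None,
    serper_scrape=lambda url, key: "# From Serper",
    trafilatura_extract=lambda url: "From Trafilatura",
)
# With no API key, only the Trafilatura stub is called.
```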
Estimated code review effort
🎯 3 (Moderate) | ⏱️ ~20 minutes