Description
crawl4ai version
0.4.248
Expected Behavior
JsonCssExtractionStrategy should return results. In particular, the example in "Pattern-Based with JsonCssExtractionStrategy" should not return an empty list.
Current Behavior
I was trying to configure JsonCssExtractionStrategy for my use case, and I consistently got no results, even with a very simple schema. So I went back to the example from the docs, pasted it into a script, and ran it. It returned nothing. See screenshot. (I tried changing baseSelector to "tr.athing submission", because that is the class ycombinator currently shows on its table rows, but no variation worked.)
The last line of output is: "Sample extracted items: []"
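A likely reason "tr.athing submission" matched nothing: in CSS, a space is the descendant combinator, so that selector looks for a `<submission>` element nested inside `tr.athing`. To require both classes on the same row you write `tr.athing.submission`. A class selector also matches any element whose class list contains that class, so plain `tr.athing` should already match a row with `class="athing submission"`. The stdlib-only sketch below checks this. It uses a hypothetical one-row fragment (`SAMPLE_ROW`) and a toy matcher (`count_matches`, which handles only tag and class parts and no combinators). It is not how crawl4ai evaluates selectors.

```python
from html.parser import HTMLParser

# Hypothetical row, modeled on the "athing submission" class reported above.
SAMPLE_ROW = '<table><tr class="athing submission" id="1"><td>story</td></tr></table>'


class TagCollector(HTMLParser):
    """Record (tag, set of classes) for every start tag in a fragment."""

    def __init__(self):
        super().__init__()
        self.elements = []

    def handle_starttag(self, tag, attrs):
        classes = set((dict(attrs).get("class") or "").split())
        self.elements.append((tag, classes))


def count_matches(html, compound):
    """Count elements matching ONE compound selector such as 'tr.athing.submission'.

    Toy matcher: tag and class parts only; spaces, '>' and attributes are not supported.
    """
    tag, *classes = compound.split(".")
    collector = TagCollector()
    collector.feed(html)
    return sum(
        1
        for el_tag, el_classes in collector.elements
        if (not tag or el_tag == tag) and set(classes) <= el_classes
    )


print(count_matches(SAMPLE_ROW, "tr.athing"))             # → 1, class list contains "athing"
print(count_matches(SAMPLE_ROW, "tr.athing.submission"))  # → 1, both classes on the same row
print(count_matches(SAMPLE_ROW, "submission"))            # → 0, no <submission> element for "tr.athing submission" to find
```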

Is this reproducible?
Yes
Inputs Causing the Bug
Steps to Reproduce
Run the sample script as-is
Code snippets
Copied exactly from https://docs.crawl4ai.com/core/content-selection/, section 4.1:
import asyncio
import json

from crawl4ai import AsyncWebCrawler, CrawlerRunConfig, CacheMode
from crawl4ai.extraction_strategy import JsonCssExtractionStrategy


async def main():
    # Minimal schema for repeated items
    schema = {
        "name": "News Items",
        "baseSelector": "tr.athing",
        "fields": [
            {"name": "title", "selector": "a.storylink", "type": "text"},
            {
                "name": "link",
                "selector": "a.storylink",
                "type": "attribute",
                "attribute": "href"
            }
        ]
    }

    config = CrawlerRunConfig(
        # Content filtering
        excluded_tags=["form", "header"],
        exclude_domains=["adsite.com"],
        # CSS selection or entire page
        css_selector="table.itemlist",
        # No caching for demonstration
        cache_mode=CacheMode.BYPASS,
        # Extraction strategy
        extraction_strategy=JsonCssExtractionStrategy(schema)
    )

    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun(
            url="https://news.ycombinator.com/newest",
            config=config
        )
        data = json.loads(result.extracted_content)
        print("Sample extracted item:", data[:1])  # Show first item


if __name__ == "__main__":
    asyncio.run(main())
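One possible explanation, not confirmed in this report, is that the docs example targets older Hacker News markup. The `a.storylink` field selector may no longer match anything inside `tr.athing`, so every row yields no fields. The sketch below tests that idea offline with a small stdlib-only stand-in for schema extraction. It is not crawl4ai's implementation. Several parts are assumptions: the row markup in `SAMPLE_ROW` is hypothetical, the replacement selector `span.titleline a` is a guess about the current page and should be verified in the browser's developer tools, and rows where no field matched are dropped because that reproduces the `[]` reported above. The sketch also does not model `css_selector="table.itemlist"`, so it may be worth confirming that this class still exists on the page as well.

```python
from html.parser import HTMLParser

# Hypothetical row shaped like what current Hacker News markup may look like (assumption).
SAMPLE_ROW = """<table>
<tr class="athing submission" id="1">
  <td class="title"><span class="titleline"><a href="https://example.com/a">Example story</a></span></td>
</tr>
</table>"""


def make_schema(sel):
    return {
        "baseSelector": "tr.athing",
        "fields": [
            {"name": "title", "selector": sel, "type": "text"},
            {"name": "link", "selector": sel, "type": "attribute", "attribute": "href"},
        ],
    }


OLD_SCHEMA = make_schema("a.storylink")       # field selector from the docs example
NEW_SCHEMA = make_schema("span.titleline a")  # hypothetical replacement, verify on the live page


class Node:
    def __init__(self, tag, attrs):
        self.tag = tag
        self.attrs = dict(attrs)
        self.classes = set((self.attrs.get("class") or "").split())
        self.content = []  # child Nodes and text strings, in document order

    def text(self):
        return "".join(c if isinstance(c, str) else c.text() for c in self.content)


class TreeBuilder(HTMLParser):
    """Build a tiny DOM from a well-formed fragment (no void-tag or error recovery)."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root", [])
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs)
        self.stack[-1].content.append(node)
        self.stack.append(node)

    def handle_endtag(self, tag):
        # Close the innermost open element with this tag, if any.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        self.stack[-1].content.append(data)


def descendants(node):
    for child in node.content:
        if isinstance(child, Node):
            yield child
            yield from descendants(child)


def matches(compound, node):
    tag, *classes = compound.split(".")
    return (not tag or node.tag == tag) and set(classes) <= node.classes


def select(node, selector):
    # Descendant combinator only ('span.titleline a'); no '>' or attribute selectors.
    current = [node]
    for part in selector.split():
        current = [d for n in current for d in descendants(n) if matches(part, d)]
    return current


def extract(html, schema):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    results = []
    for base in select(builder.root, schema["baseSelector"]):
        item = {}
        for field in schema["fields"]:
            hits = select(base, field["selector"])
            if not hits:
                continue  # assumption: unmatched fields are left out of the item
            if field["type"] == "text":
                item[field["name"]] = hits[0].text().strip()
            else:  # "attribute"
                item[field["name"]] = hits[0].attrs.get(field["attribute"])
        if item:  # assumption: rows with no matched fields are dropped
            results.append(item)
    return results


print(extract(SAMPLE_ROW, OLD_SCHEMA))  # → []
print(extract(SAMPLE_ROW, NEW_SCHEMA))  # → [{'title': 'Example story', 'link': 'https://example.com/a'}]
```

If the real page behaves like this toy, the fix would be to update the field selectors (and possibly `css_selector`) in the docs schema rather than `baseSelector`, because `tr.athing` still matches rows whose class is "athing submission".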
OS
MacOS
Python version
3.12.8
Browser
No response
Browser version
No response
Error logs & Screenshots (if applicable)
No response