[Bug]: JsonCssExtractionStrategy not returning results (even with doc example) #651

Open
@encoded-evolution

Description

crawl4ai version

0.4.248

Expected Behavior

JsonCssExtractionStrategy should return results, and the example in the "Pattern-Based with JsonCssExtractionStrategy" section of the docs should not return an empty list.

Current Behavior

I was trying to configure JsonCssExtractionStrategy for my own use and kept getting no results, even with a very simple schema. So I went back to the example from the docs, pasted it into a script, and ran it; it also returned nothing (see screenshot). I also tried changing baseSelector to "tr.athing submission", because that is the class ycombinator currently shows on its table rows, but no variation worked.

See bottom: "Sample extracted items: []"

[Screenshot of the script output]
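
A side note on the selector variation tried above: in CSS, whitespace is the descendant combinator, so "tr.athing submission" asks for a `<submission>` element nested inside `tr.athing`, while a row carrying both classes would be written "tr.athing.submission". The sketch below only illustrates how the two spellings break apart; `split_selector` is a hypothetical helper covering a tiny subset of CSS, not part of crawl4ai, and it does not show that the dotted form alone makes the docs example work.

```python
def split_selector(selector):
    """Split a simple CSS selector into compound parts.

    Handles only tag names, .class tokens, and the whitespace (descendant)
    combinator: a deliberately tiny subset of CSS, for illustration only.
    """
    parts = []
    for compound in selector.split():
        tag, *classes = compound.split(".")
        parts.append({"tag": tag or "*", "classes": classes})
    return parts


# Space: two compounds, i.e. a <submission> element somewhere inside tr.athing.
print(split_selector("tr.athing submission"))

# Dot: one compound, i.e. a single <tr> that carries both classes.
print(split_selector("tr.athing.submission"))
```

The first call yields two parts (the second one being a tag named `submission` with no classes), the second call yields a single `tr` part with both class tokens.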

Is this reproducible?

Yes

Inputs Causing the Bug

Steps to Reproduce

Run the sample script as-is

Code snippets

Copied exactly from section 4.1 of https://docs.crawl4ai.com/core/content-selection/

import asyncio
import json
from crawl4ai import AsyncWebCrawler, CrawlerRunConfig, CacheMode
from crawl4ai.extraction_strategy import JsonCssExtractionStrategy

async def main():
    # Minimal schema for repeated items
    schema = {
        "name": "News Items",
        "baseSelector": "tr.athing",
        "fields": [
            {"name": "title", "selector": "a.storylink", "type": "text"},
            {
                "name": "link", 
                "selector": "a.storylink", 
                "type": "attribute", 
                "attribute": "href"
            }
        ]
    }

    config = CrawlerRunConfig(
        # Content filtering
        excluded_tags=["form", "header"],
        exclude_domains=["adsite.com"],

        # CSS selection or entire page
        css_selector="table.itemlist",

        # No caching for demonstration
        cache_mode=CacheMode.BYPASS,

        # Extraction strategy
        extraction_strategy=JsonCssExtractionStrategy(schema)
    )

    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun(
            url="https://news.ycombinator.com/newest", 
            config=config
        )
        data = json.loads(result.extracted_content)
        print("Sample extracted item:", data[:1])  # Show first item

if __name__ == "__main__":
    asyncio.run(main())
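
One way to narrow this down might be to check whether the classes the script selects on (`tr.athing`, `a.storylink`, `table.itemlist`) appear in the fetched HTML at all. The following is a stdlib-only sketch, not crawl4ai API: `ClassInventory`, `count_class`, and `SAMPLE_HTML` are hypothetical names, and the sample markup is invented for illustration, modeled only on the "athing submission" row class mentioned above. Its results say nothing about the live page.

```python
from collections import Counter
from html.parser import HTMLParser

# Hypothetical sample markup (an assumption, not a copy of the live page).
SAMPLE_HTML = """
<table>
  <tr class="athing submission" id="1">
    <td class="title"><a href="https://example.com/a">Story A</a></td>
  </tr>
  <tr class="athing submission" id="2">
    <td class="title"><a href="https://example.com/b">Story B</a></td>
  </tr>
</table>
"""


class ClassInventory(HTMLParser):
    """Count (tag, class token) pairs seen in an HTML document."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        # A class attribute may hold several whitespace-separated tokens.
        class_value = dict(attrs).get("class") or ""
        for token in class_value.split():
            self.counts[(tag, token)] += 1


def count_class(html, tag, class_name):
    """Return how many <tag> elements carry class_name among their classes."""
    parser = ClassInventory()
    parser.feed(html)
    return parser.counts[(tag, class_name)]


print(count_class(SAMPLE_HTML, "tr", "athing"))       # 2 in this sample
print(count_class(SAMPLE_HTML, "a", "storylink"))     # 0 in this sample
print(count_class(SAMPLE_HTML, "table", "itemlist"))  # 0 in this sample
```

Feeding the function the HTML that was actually fetched (for example `result.html`, assuming the crawl result exposes the raw page under that attribute) instead of `SAMPLE_HTML` would show which of the three selectors still have anything to match.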

OS

macOS

Python version

3.12.8

Browser

No response

Browser version

No response

Error logs & Screenshots (if applicable)

No response
