Pulse · unclecode/crawl4ai · GitHub

January 19, 2025 – February 19, 2025

Overview

19 Active pull requests

173 Active issues

10 Pull requests merged by 8 people

2025 feb alpha 1
#685 merged Feb 19, 2025
spelling change in prompt and support to gpt-4o-mini
#128 merged Feb 12, 2025
Remove leading Y before here
#129 merged Feb 12, 2025
(Docs) Fix numbered list end-of-line formatting
#609 merged Feb 12, 2025
Fix Markdown Incorrect Spacing #599
#658 merged Feb 12, 2025
fix: access downloads_path through browser_config in _handle_download…
#612 merged Feb 11, 2025
Next
#657 merged Feb 11, 2025
base-config structure is changed
#618 merged Feb 7, 2025
Update README.md
#562 merged Jan 26, 2025
Scraper uc
#496 merged Jan 20, 2025

9 Pull requests opened by 9 people

Update docker-deploymeny.md
#605 opened Feb 3, 2025
Fix cdp_url import into the managed browser
#640 opened Feb 8, 2025
Update README.md
#671 opened Feb 14, 2025
Update how pydantic models load configurations to comply with pydantic 2
#679 opened Feb 14, 2025
docs(README): add installation steps and create new Jupyter notebook for Legat4me
#680 opened Feb 14, 2025
executing JS should happen after waiting
#681 opened Feb 15, 2025
Fix `raw://` URL parsing logic
#687 opened Feb 15, 2025
[WIP] mvp for async extraction strategies
#706 opened Feb 17, 2025
Fix/robots.txt parsing
#708 opened Feb 17, 2025

126 Issues closed by 8 people

[Bug]: Caching Issue: Crawler Ignores Updated CrawlerRunConfig Despite disable_cache=True in crawl4ai
#714 closed Feb 19, 2025
[Bug]: excluded_tags and excluded_selector arguments for CrawlerRunConfig don't work
#693 closed Feb 18, 2025
[Bug]: Double Appended screenshot image
#697 closed Feb 17, 2025
After using CacheMode.ENABLE, the markdown value in the result of scaling repeated pages is changed to '2204056abe8e6198', which looks like an id. How to retrieve its original content?
#338 closed Feb 17, 2025
[Bug]: Execution context was destroyed, most likely because of a navigation
#690 closed Feb 16, 2025
[Bug]: PruningContentFilter takes no effect
#673 closed Feb 14, 2025
[Bug]: How can I accurately wait for all files to finish downloading?
#672 closed Feb 14, 2025
cannot access local variable 'filtered_html"
#290 closed Feb 14, 2025
[Bug]: I am unable to get text content for all the pages of the provided Website URL
#661 closed Feb 12, 2025
[Bug]: Failed to scroll down and handle dynamic content loading when crawling app comments from Google Play store
#654 closed Feb 12, 2025
pip install fails Mac M4
#291 closed Feb 12, 2025
[Bug]: Cannot scrape the full page (lazy loading)
#634 closed Feb 11, 2025
[Bug]: Quickstart Example 4: result.markdown is a string instead of MarkdownGenerationResult
#639 closed Feb 11, 2025
[Bug]: Cant save a pdf using run_many()
#636 closed Feb 11, 2025
[Bug]: `ignore_links` does not work unless `cache_mode` is set to `CacheMode.BYPASS`
#638 closed Feb 11, 2025
[Bug]: custom headers are ignored
#633 closed Feb 11, 2025
[Bug]: CrawlRunConfig doesn't work
#642 closed Feb 10, 2025
[Bug]: Cannot run crawl4ai in conda env
#637 closed Feb 10, 2025
[Bug]: The TAB opened using CRAWL4AI come across CORS errors
#630 closed Feb 10, 2025
[Bug]: CORS (Cross-Origin Resource Sharing) error when trying to use Crawl4AI to connect to Twitter.
#641 closed Feb 10, 2025
[Bug]: Fonts not showing as expected
#644 closed Feb 9, 2025
[Bug]: RateLimitConfig Class is missing in 0.4.3
#573 closed Feb 9, 2025
[Bug]: PruningContentFilter strips out <a> and <strong> tags completely
#582 closed Feb 7, 2025
[Bug]: cannot import name 'Crawler' from 'crawl4ai'
#620 closed Feb 5, 2025
Code blocks lose formatting when converting from HTML to markdown
#325 closed Jan 31, 2025
AttributeError: __aenter__
#333 closed Jan 31, 2025
what version of python should I use?
#334 closed Jan 31, 2025
Version 0.3.74 - Output of scraped website to markdown returns an error
#287 closed Jan 31, 2025
Timeout error, wait_for_selector
#219 closed Jan 31, 2025
cannot modify the timeout
#217 closed Jan 31, 2025
cannot bypass cache db
#216 closed Jan 31, 2025
[Bug]: No such file or directory: 'google-chrome'
#571 closed Jan 31, 2025
Version 0.4.247: Persistent `fit_markdown` Issue
#453 closed Jan 31, 2025
[Bug]: LLMContentFilter ImportError
#588 closed Jan 31, 2025
[Bug]: javascript added to url
#584 closed Jan 31, 2025
how to pass raw html to LLMExtractionStrategy?
#591 closed Jan 31, 2025
[Bug]: llm_strategy not working on arun_many
#557 closed Jan 28, 2025
[Bug]: RuntimeError: asyncio.run() cannot be called from a running event loop
#563 closed Jan 27, 2025
[Bug]: Last released version misses async_dispatcher
#567 closed Jan 27, 2025
lxml.parser error
#466 closed Jan 27, 2025
deployed crawl4AI tool not working - exception=AttributeError('`copy` is not supported.')>
#349 closed Jan 25, 2025
Bug Report for Crawl4A multiple async
#143 closed Jan 25, 2025
[Bug]: Anti bot detection not working for artnet.com
#504 closed Jan 25, 2025
uvloop error
#439 closed Jan 24, 2025
scripts from the js_snippets folder are not installed via pip
#348 closed Jan 24, 2025
JsonCssExtractionStrategy Fails to Handle Lists of Elements
#433 closed Jan 24, 2025
Request: Please tag your git repository to match the release history on PyPI
#549 closed Jan 24, 2025
Bug with "rotating proxies"
#460 closed Jan 23, 2025
Prevent Crawl4AI from Crawling After Link Failure – Only Extract Content
#237 closed Jan 23, 2025
Quick Start cant run
#342 closed Jan 22, 2025
Don’t print or give option to not print in __init__
#250 closed Jan 22, 2025
Crawl4AI Error: This page is not fully supported.
#281 closed Jan 22, 2025
Use of set_hook in docker container
#247 closed Jan 22, 2025
Ability to add browser extensions
#261 closed Jan 22, 2025
How to "track" async calls?
#319 closed Jan 22, 2025
Storing cache in a custom directory
#252 closed Jan 22, 2025
user data crawling opens two windows, unable to control correct user browser
#236 closed Jan 22, 2025
Feature Request: Filtering for Small and Invisible Text
#274 closed Jan 22, 2025
Define extraction strategy schema typings
#230 closed Jan 22, 2025
Screenshot must be taken after wait_for condition is met
#120 closed Jan 22, 2025
Temperature
#321 closed Jan 22, 2025
Can you make a llms.txt file for latest .md files
#326 closed Jan 22, 2025
Slow performance of crawl4AI in Docker compared to pip installation outside Docker environment
#329 closed Jan 22, 2025
Incorrect Conversion of Relative to Absolute Paths for href in Web Pages
#231 closed Jan 22, 2025
Why some of the options in the documentation are not actually available?
#331 closed Jan 22, 2025
Playwright is to support the intercept Request
#330 closed Jan 22, 2025
google anti-bot detection
#301 closed Jan 22, 2025
unstructured data download
#283 closed Jan 22, 2025
GitHub issues not scraped fully
#412 closed Jan 22, 2025
page_timout does not work for crawler.arun_many
#436 closed Jan 22, 2025
How to retrieve only the markdown format data of a specified page through Docker deployed APIs and return it
#415 closed Jan 22, 2025
Regarding scrpaping of Dynamic website like Skyscanner.net
#341 closed Jan 22, 2025
Adding batching feature for openAI
#140 closed Jan 22, 2025
Version 0.3.71 is more stable than 0.3.72
#212 closed Jan 22, 2025
Reliable and easy to setup way to deploy Crawl4ai
#180 closed Jan 22, 2025
Facing error in using open source LLM
#209 closed Jan 22, 2025
Input value to search
#190 closed Jan 22, 2025
Base64 image format not parsed
#182 closed Jan 22, 2025
Cache Optionality
#137 closed Jan 22, 2025
Improve discoverability in chatGPT and other coding assistants
#126 closed Jan 22, 2025
Can Docker API use all functions
#318 closed Jan 22, 2025
Issue with website with anti-bot detection.
#238 closed Jan 22, 2025
What hook is needed to trigger browser clicks using Python?
#347 closed Jan 22, 2025
"detail": "Not authenticated" - when I use API
#365 closed Jan 22, 2025
When running in docker container
#354 closed Jan 22, 2025
Why does the extracted-content field only return a maximum of 8 pieces of data after structuring the large model data
#373 closed Jan 22, 2025
Is there a room version that automatically categorizes list pages and detail pages
#385 closed Jan 22, 2025
LLMExtractionStrategy Extracting Irrelevant Data from Infinite Scrolling Pages
#386 closed Jan 22, 2025
Add a push to hub method
#257 closed Jan 21, 2025
Add Timestamp to result.links
#327 closed Jan 21, 2025
Can you create an option so we can install on pinokio?
#113 closed Jan 21, 2025
Not able to crawl github repo recursively
#408 closed Jan 21, 2025
Is their a way to scrape an already opened playwright webpage?
#429 closed Jan 21, 2025
Expose completion tokens, total tokens, cost, etc. on OpenAI
#210 closed Jan 21, 2025
Documentation Fixes (2025 - JAN)
#435 closed Jan 21, 2025
How to make the crawler wait 2.5 seconds before getting markdown?
#306 closed Jan 21, 2025
how to use ollama corectly
#280 closed Jan 21, 2025
exec /usr/local/bin/uvicorn: exec format error
#273 closed Jan 21, 2025
No response
#272 closed Jan 21, 2025
How to crawl current webpage only?
#242 closed Jan 21, 2025
Input Length Exceeds Maximum Limit in LLama:8B Model API (Deep Infra)
#395 closed Jan 21, 2025
ERROR: No matching distribution found for pillow~=10.4 (from crawl4ai[all])
#323 closed Jan 21, 2025
How can I implement this with websites that perform cookie consent?
#223 closed Jan 21, 2025
! crawl4ai The requested image's platform (linux/arm64/v8) does not match the detected host platform (linux/amd64/v3) and no specific platform was requested 0.0s
#240 closed Jan 21, 2025
system diagnostics tool
#275 closed Jan 21, 2025
PermissionError when running crawl4ai in Docker: [Errno 13] Permission denied: '/nonexistent'
#222 closed Jan 21, 2025
Cannot get response headers
#220 closed Jan 21, 2025
AsyncWebCrawler returns arrays of JSON objects instead of single objects per scrape
#205 closed Jan 21, 2025
When I start using Docker, where can I find all the environment variables and configurations that I can modify?
#195 closed Jan 21, 2025
Issue rendering images
#285 closed Jan 21, 2025
IE 11 is not supported. For an optimal experience visit our site on another browser
#208 closed Jan 21, 2025
Remove Headers, Footers, External Links and their related data
#181 closed Jan 21, 2025
how can i extract text from the CrawlResult?
#171 closed Jan 21, 2025
Async error: unable to create subprocess
#175 closed Jan 21, 2025
Bypassing automated crawler detection by Firewalls
#136 closed Jan 21, 2025
Timeout setting
#123 closed Jan 21, 2025
cannot import name 'WebCrawler' from 'crawl4ai'
#122 closed Jan 21, 2025
Language Support
#118 closed Jan 21, 2025
Using Proxy
#116 closed Jan 21, 2025
[DOUBT] Performance expectations
#115 closed Jan 21, 2025
Documentation commands fail with 404 error (missing llm.txt)
#451 closed Jan 20, 2025
Ignore links not working
#468 closed Jan 20, 2025
[Bug]: status_code is always None
#499 closed Jan 20, 2025
Extracting image or video links
#469 closed Jan 20, 2025
Receive "No authenticated" error when CRAWL4AI_API_TOKEN unset
#470 closed Jan 20, 2025
Neither screenshot nor PDF creation is working.
#477 closed Jan 20, 2025

47 Issues opened by 39 people

[Bug]: Extractors should be able to receive cleaned_html
#720 opened Feb 19, 2025
[Bug]: class 'crawl4ai.models.CrawlResult' object has no attribute 'raw_markdown'
#719 opened Feb 19, 2025
[Bug]: `AttributeError: 'AsyncPlaywrightCrawlerStrategy' object has no attribute 'config'` in `_crawl_web` when body is hidden (e.g., in `<frame>`-based sites)
#717 opened Feb 19, 2025
[Bug]: when use crawl4ai docker ver,got garbled text when fetch site other than with english content
#716 opened Feb 19, 2025
[Bug]: Typo in AsyncWebCrawler constructor
#715 opened Feb 18, 2025
[Bug]: JsonCssExtractionStrategy.generate_schema returns XPath
#713 opened Feb 18, 2025
[Bug]: arun_many and LLMExtractionStrategy with two URLs lead to 8 hallucinating requests
#712 opened Feb 18, 2025
[Bug]: the `LINK_PATTERN` used for extracting citations does not handle nested brackets
#711 opened Feb 18, 2025
[Bug]: Update current PIL Version to enable use with smolagents
#709 opened Feb 17, 2025
[Bug]: llm_strategy not working
#707 opened Feb 17, 2025
[Bug]: AsyncWebCrawler only scrapes input ulr, does not crawl links
#705 opened Feb 17, 2025
[Bug]: extraction strategies are not async
#704 opened Feb 17, 2025
[Bug]: SemaphoreDispatcher does not work with `stream=True`
#703 opened Feb 17, 2025
[Bug]: screenshot does not work with raw/file url's
#702 opened Feb 17, 2025
[Bug]: remove_overlay_elements is not working
#701 opened Feb 17, 2025
[Bug]: check_robots_txt not working
#699 opened Feb 17, 2025
[Bug]: Access to XMLHttpRequest has been blocked by CORS policy
#695 opened Feb 17, 2025
[Bug]: Setting MAGIC parameters causes [ERROR]... × Error updating image dimensions: Page.evaluate: Execution context was destroyed, most likely because of a navigation
#692 opened Feb 17, 2025
[Bug]: Forward slashes of `raw://` are not removed when converting raw URLs to HTML
#686 opened Feb 15, 2025
[Bug]: XHR requests not going through with managed browser
#684 opened Feb 15, 2025
[Bug]: ImportError: cannot import name 'CrawlerRunConfig' from 'crawl4ai' (/app/crawl4ai/__init__.py)
#682 opened Feb 15, 2025
[Bug]: Deprecation Warning: Replace Config with ConfigDict for Pydantic v2 Compatibility
#678 opened Feb 14, 2025
[Bug]: Unable to scrape Cloudfare protected sites
#677 opened Feb 14, 2025
[Bug]: PDF doesn't get parsed
#675 opened Feb 14, 2025
[Bug]: result.link (links extraction results empty lists) not working when using raw_html_url = f"raw:{raw_html}" as input
#668 opened Feb 13, 2025
[Bug]: CrawlerRunConfig is not consistent across systems/environments
#665 opened Feb 12, 2025
[Bug]: Status code for redirect URLs is not correct
#660 opened Feb 12, 2025
[Bug]: 'NoneType' object has no attribute 'new_context'
#653 opened Feb 10, 2025
[Bug]: JsonCssExtractionStrategy not returning results (even with doc example)
#651 opened Feb 10, 2025
[Bug]: 429 Unprocessable entity when trying to process raw html with docker crawl4ai
#650 opened Feb 10, 2025
[Bug]: OpenAI authentication error onsetting gemini-2.0-flash or ollama as LLM Provider when generating extraction schema
#649 opened Feb 10, 2025
[Bug]: Help me to crawl all the links from Apple VisionOS Documentation? Vue Recycler
#625 opened Feb 6, 2025
[Bug]: Infinite Scroll Page isn't loading new content
#616 opened Feb 4, 2025
[Bug]: Issue with `screenshot=True` — Capturing Screenshot Twice and Increasing Image Size
#615 opened Feb 4, 2025
[Bug]: Proxy Not Working with proxy_config option
#604 opened Feb 3, 2025
[Bug]: LLMContentFilter is ignored
#603 opened Feb 2, 2025
[Bug]: Markdown output has incorect spacing.
#599 opened Feb 1, 2025
[Bug]: scan_full_page not working on facebook
#592 opened Jan 30, 2025
[Bug]: Failed to handle download: 'AsyncPlaywrightCrawlerStrategy' object has no attribute 'downloads_path'
#585 opened Jan 29, 2025
[Bug]: Incorrect rendering of inline code inside of links
#583 opened Jan 29, 2025
[Bug]: Relative Urls in the webpage not extracted properly
#570 opened Jan 28, 2025
[Bug]: `arun_many` doesn't parallelize tasks when using `raw://`
#560 opened Jan 25, 2025
[Bug]: Docker compose is not working
#554 opened Jan 23, 2025
[Bug]: Incomplete Extraction
#505 opened Jan 21, 2025
[Bug]: Browser path detection failing in Windmill.dev with crawl4ai
#503 opened Jan 20, 2025
[Bug]: Title: Browser context becomes contaminated after failed scrapes, returning "no results" pages
#501 opened Jan 20, 2025
[Bug]: [Errno 30] Read-only file system error when using AWS Lambda
#497 opened Jan 20, 2025

11 Unresolved conversations

Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.

Error code: 422 - {'detail': [{'loc': ['body', 'user'], 'msg': 'field required', 'type': 'value_error.missing'}
#391 commented on Jan 26, 2025 • 0 new comments
Multiple JS Code execution in a single run
#226 commented on Jan 28, 2025 • 0 new comments
Crawl4AI in Streamlit
#437 commented on Jan 30, 2025 • 0 new comments
Browser context error
#443 commented on Jan 30, 2025 • 0 new comments
LiteLLM does not find my openai key
#481 commented on Jan 30, 2025 • 0 new comments
Unable to share login state across multiple crawler
#449 commented on Feb 10, 2025 • 0 new comments
Adding save to HF support for async webcrawler
#312 commented on Jan 22, 2025 • 0 new comments
feat: Add remove_invisible_texts method to AsyncPlaywrightCrawlerStr…
#332 commented on Jan 22, 2025 • 0 new comments
[Docs]: Add Documentation for Monitoring with OpenTelemetry
#335 commented on Jan 22, 2025 • 0 new comments
addition keep-aria-label-attribute option
#416 commented on Jan 25, 2025 • 0 new comments
fix: Add newline before pre codeblock start
#462 commented on Jan 25, 2025 • 0 new comments