Insights: unclecode/crawl4ai
Overview
10 Pull requests merged by 8 people
-
2025 feb alpha 1
#685 merged
Feb 19, 2025 -
spelling change in prompt and support to gpt-4o-mini
#128 merged
Feb 12, 2025 -
Remove leading Y before here
#129 merged
Feb 12, 2025 -
(Docs) Fix numbered list end-of-line formatting
#609 merged
Feb 12, 2025 -
Fix Markdown Incorrect Spacing #599
#658 merged
Feb 12, 2025 -
fix: access downloads_path through browser_config in _handle_download…
#612 merged
Feb 11, 2025 -
Next
#657 merged
Feb 11, 2025 -
base-config structure is changed
#618 merged
Feb 7, 2025 -
Update README.md
#562 merged
Jan 26, 2025 -
Scraper uc
#496 merged
Jan 20, 2025
9 Pull requests opened by 9 people
-
Update docker-deploymeny.md
#605 opened
Feb 3, 2025 -
Fix cdp_url import into the managed browser
#640 opened
Feb 8, 2025 -
Update README.md
#671 opened
Feb 14, 2025 -
Update how pydantic models load configurations to comply with pydantic 2
#679 opened
Feb 14, 2025 -
docs(README): add installation steps and create new Jupyter notebook for Legat4me
#680 opened
Feb 14, 2025 -
executing JS should happen after waiting
#681 opened
Feb 15, 2025 -
Fix `raw://` URL parsing logic
#687 opened
Feb 15, 2025 -
[WIP] mvp for async extraction strategies
#706 opened
Feb 17, 2025 -
Fix/robots.txt parsing
#708 opened
Feb 17, 2025
126 Issues closed by 8 people
-
[Bug]: excluded_tags and excluded_selector arguments for CrawlerRunConfig don't work
#693 closed
Feb 18, 2025 -
[Bug]: Double Appended screenshot image
#697 closed
Feb 17, 2025 -
[Bug]: Execution context was destroyed, most likely because of a navigation
#690 closed
Feb 16, 2025 -
[Bug]: PruningContentFilter takes no effect
#673 closed
Feb 14, 2025 -
[Bug]: How can I accurately wait for all files to finish downloading?
#672 closed
Feb 14, 2025 -
cannot access local variable 'filtered_html"
#290 closed
Feb 14, 2025 -
[Bug]: I am unable to get text content for all the pages of the provided Website URL
#661 closed
Feb 12, 2025 -
pip install fails Mac M4
#291 closed
Feb 12, 2025 -
[Bug]: Cannot scrape the full page (lazy loading)
#634 closed
Feb 11, 2025 -
[Bug]: Quickstart Example 4: result.markdown is a string instead of MarkdownGenerationResult
#639 closed
Feb 11, 2025 -
[Bug]: Cant save a pdf using run_many()
#636 closed
Feb 11, 2025 -
[Bug]: `ignore_links` does not work unless `cache_mode` is set to `CacheMode.BYPASS`
#638 closed
Feb 11, 2025 -
[Bug]: custom headers are ignored
#633 closed
Feb 11, 2025 -
[Bug]: CrawlRunConfig doesn't work
#642 closed
Feb 10, 2025 -
[Bug]: Cannot run crawl4ai in conda env
#637 closed
Feb 10, 2025 -
[Bug]: The TAB opened using CRAWL4AI come across CORS errors
#630 closed
Feb 10, 2025 -
[Bug]: CORS (Cross-Origin Resource Sharing) error when trying to use Crawl4AI to connect to Twitter.
#641 closed
Feb 10, 2025 -
[Bug]: Fonts not showing as expected
#644 closed
Feb 9, 2025 -
[Bug]: RateLimitConfig Class is missing in 0.4.3
#573 closed
Feb 9, 2025 -
[Bug]: PruningContentFilter strips out <a> and <strong> tags completely
#582 closed
Feb 7, 2025 -
[Bug]: cannot import name 'Crawler' from 'crawl4ai'
#620 closed
Feb 5, 2025 -
Code blocks lose formatting when converting from HTML to markdown
#325 closed
Jan 31, 2025 -
AttributeError: __aenter__
#333 closed
Jan 31, 2025 -
what version of python should I use?
#334 closed
Jan 31, 2025 -
Version 0.3.74 - Output of scraped website to markdown returns an error
#287 closed
Jan 31, 2025 -
Timeout error, wait_for_selector
#219 closed
Jan 31, 2025 -
cannot modify the timeout
#217 closed
Jan 31, 2025 -
cannot bypass cache db
#216 closed
Jan 31, 2025 -
[Bug]: No such file or directory: 'google-chrome'
#571 closed
Jan 31, 2025 -
Version 0.4.247: Persistent `fit_markdown` Issue
#453 closed
Jan 31, 2025 -
[Bug]: LLMContentFilter ImportError
#588 closed
Jan 31, 2025 -
[Bug]: javascript added to url
#584 closed
Jan 31, 2025 -
how to pass raw html to LLMExtractionStrategy?
#591 closed
Jan 31, 2025 -
[Bug]: llm_strategy not working on arun_many
#557 closed
Jan 28, 2025 -
[Bug]: RuntimeError: asyncio.run() cannot be called from a running event loop
#563 closed
Jan 27, 2025 -
[Bug]: Last released version misses async_dispatcher
#567 closed
Jan 27, 2025 -
lxml.parser error
#466 closed
Jan 27, 2025 -
deployed crawl4AI tool not working - exception=AttributeError('`copy` is not supported.')>
#349 closed
Jan 25, 2025 -
Bug Report for Crawl4A multiple async
#143 closed
Jan 25, 2025 -
[Bug]: Anti bot detection not working for artnet.com
#504 closed
Jan 25, 2025 -
uvloop error
#439 closed
Jan 24, 2025 -
scripts from the js_snippets folder are not installed via pip
#348 closed
Jan 24, 2025 -
JsonCssExtractionStrategy Fails to Handle Lists of Elements
#433 closed
Jan 24, 2025 -
Request: Please tag your git repository to match the release history on PyPI
#549 closed
Jan 24, 2025 -
Bug with "rotating proxies"
#460 closed
Jan 23, 2025 -
Prevent Crawl4AI from Crawling After Link Failure – Only Extract Content
#237 closed
Jan 23, 2025 -
Quick Start cant run
#342 closed
Jan 22, 2025 -
Don’t print or give option to not print in __init__
#250 closed
Jan 22, 2025 -
Crawl4AI Error: This page is not fully supported.
#281 closed
Jan 22, 2025 -
Use of set_hook in docker container
#247 closed
Jan 22, 2025 -
Ability to add browser extensions
#261 closed
Jan 22, 2025 -
How to "track" async calls?
#319 closed
Jan 22, 2025 -
Storing cache in a custom directory
#252 closed
Jan 22, 2025 -
user data crawling opens two windows, unable to control correct user browser
#236 closed
Jan 22, 2025 -
Feature Request: Filtering for Small and Invisible Text
#274 closed
Jan 22, 2025 -
Define extraction strategy schema typings
#230 closed
Jan 22, 2025 -
Screenshot must be taken after wait_for condition is met
#120 closed
Jan 22, 2025 -
Temperature
#321 closed
Jan 22, 2025 -
Can you make a llms.txt file for latest .md files
#326 closed
Jan 22, 2025 -
Slow performance of crawl4AI in Docker compared to pip installation outside Docker environment
#329 closed
Jan 22, 2025 -
Incorrect Conversion of Relative to Absolute Paths for href in Web Pages
#231 closed
Jan 22, 2025 -
Why some of the options in the documentation are not actually available?
#331 closed
Jan 22, 2025 -
Playwright is to support the intercept Request
#330 closed
Jan 22, 2025 -
google anti-bot detection
#301 closed
Jan 22, 2025 -
unstructured data download
#283 closed
Jan 22, 2025 -
GitHub issues not scraped fully
#412 closed
Jan 22, 2025 -
page_timout does not work for crawler.arun_many
#436 closed
Jan 22, 2025 -
Regarding scrpaping of Dynamic website like Skyscanner.net
#341 closed
Jan 22, 2025 -
Adding batching feature for openAI
#140 closed
Jan 22, 2025 -
Version 0.3.71 is more stable than 0.3.72
#212 closed
Jan 22, 2025 -
Reliable and easy to setup way to deploy Crawl4ai
#180 closed
Jan 22, 2025 -
Facing error in using open source LLM
#209 closed
Jan 22, 2025 -
Input value to search
#190 closed
Jan 22, 2025 -
Base64 image format not parsed
#182 closed
Jan 22, 2025 -
Cache Optionality
#137 closed
Jan 22, 2025 -
Improve discoverability in chatGPT and other coding assistants
#126 closed
Jan 22, 2025 -
Can Docker API use all functions
#318 closed
Jan 22, 2025 -
Issue with website with anti-bot detection.
#238 closed
Jan 22, 2025 -
What hook is needed to trigger browser clicks using Python?
#347 closed
Jan 22, 2025 -
"detail": "Not authenticated" - when I use API
#365 closed
Jan 22, 2025 -
When running in docker container
#354 closed
Jan 22, 2025 -
Is there a room version that automatically categorizes list pages and detail pages
#385 closed
Jan 22, 2025 -
LLMExtractionStrategy Extracting Irrelevant Data from Infinite Scrolling Pages
#386 closed
Jan 22, 2025 -
Add a push to hub method
#257 closed
Jan 21, 2025 -
Add Timestamp to result.links
#327 closed
Jan 21, 2025 -
Can you create an option so we can install on pinokio?
#113 closed
Jan 21, 2025 -
Not able to crawl github repo recursively
#408 closed
Jan 21, 2025 -
Is their a way to scrape an already opened playwright webpage?
#429 closed
Jan 21, 2025 -
Expose completion tokens, total tokens, cost, etc. on OpenAI
#210 closed
Jan 21, 2025 -
Documentation Fixes (2025 - JAN)
#435 closed
Jan 21, 2025 -
How to make the crawler wait 2.5 seconds before getting markdown?
#306 closed
Jan 21, 2025 -
how to use ollama corectly
#280 closed
Jan 21, 2025 -
exec /usr/local/bin/uvicorn: exec format error
#273 closed
Jan 21, 2025 -
No response
#272 closed
Jan 21, 2025 -
How to crawl current webpage only?
#242 closed
Jan 21, 2025 -
Input Length Exceeds Maximum Limit in LLama:8B Model API (Deep Infra)
#395 closed
Jan 21, 2025 -
ERROR: No matching distribution found for pillow~=10.4 (from crawl4ai[all])
#323 closed
Jan 21, 2025 -
How can I implement this with websites that perform cookie consent?
#223 closed
Jan 21, 2025 -
system diagnostics tool
#275 closed
Jan 21, 2025 -
PermissionError when running crawl4ai in Docker: [Errno 13] Permission denied: '/nonexistent'
#222 closed
Jan 21, 2025 -
Cannot get response headers
#220 closed
Jan 21, 2025 -
AsyncWebCrawler returns arrays of JSON objects instead of single objects per scrape
#205 closed
Jan 21, 2025 -
Issue rendering images
#285 closed
Jan 21, 2025 -
IE 11 is not supported. For an optimal experience visit our site on another browser
#208 closed
Jan 21, 2025 -
Remove Headers, Footers, External Links and their related data
#181 closed
Jan 21, 2025 -
how can i extract text from the CrawlResult?
#171 closed
Jan 21, 2025 -
Async error: unable to create child process
#175 closed
Jan 21, 2025 -
Bypassing automated crawler detection by Firewalls
#136 closed
Jan 21, 2025 -
Timeout setting
#123 closed
Jan 21, 2025 -
cannot import name 'WebCrawler' from 'crawl4ai'
#122 closed
Jan 21, 2025 -
Language Support
#118 closed
Jan 21, 2025 -
Using Proxy
#116 closed
Jan 21, 2025 -
[DOUBT] Performance expectations
#115 closed
Jan 21, 2025 -
Documentation commands fail with 404 error (missing llm.txt)
#451 closed
Jan 20, 2025 -
Ignore links not working
#468 closed
Jan 20, 2025 -
[Bug]: status_code is always None
#499 closed
Jan 20, 2025 -
Extracting image or video links
#469 closed
Jan 20, 2025 -
Receive "No authenticated" error when CRAWL4AI_API_TOKEN unset
#470 closed
Jan 20, 2025 -
Neither screenshot nor PDF creation is working.
#477 closed
Jan 20, 2025
47 Issues opened by 39 people
-
[Bug]: Extractors should be able to receive cleaned_html
#720 opened
Feb 19, 2025 -
[Bug]: class 'crawl4ai.models.CrawlResult' object has no attribute 'raw_markdown'
#719 opened
Feb 19, 2025 -
[Bug]: when use crawl4ai docker ver,got garbled text when fetch site other than with english content
#716 opened
Feb 19, 2025 -
[Bug]: Typo in AsyncWebCrawler constructor
#715 opened
Feb 18, 2025 -
[Bug]: JsonCssExtractionStrategy.generate_schema returns XPath
#713 opened
Feb 18, 2025 -
[Bug]: arun_many and LLMExtractionStrategy with two URLs lead to 8 hallucinating requests
#712 opened
Feb 18, 2025 -
[Bug]: the `LINK_PATTERN` used for extracting citations does not handle nested brackets
#711 opened
Feb 18, 2025 -
[Bug]: Update current PIL Version to enable use with smolagents
#709 opened
Feb 17, 2025 -
[Bug]: llm_strategy not working
#707 opened
Feb 17, 2025 -
[Bug]: AsyncWebCrawler only scrapes input ulr, does not crawl links
#705 opened
Feb 17, 2025 -
[Bug]: extraction strategies are not async
#704 opened
Feb 17, 2025 -
[Bug]: SemaphoreDispatcher does not work with `stream=True`
#703 opened
Feb 17, 2025 -
[Bug]: screenshot does not work with raw/file url's
#702 opened
Feb 17, 2025 -
[Bug]: remove_overlay_elements is not working
#701 opened
Feb 17, 2025 -
[Bug]: check_robots_txt not working
#699 opened
Feb 17, 2025 -
[Bug]: Access to XMLHttpRequest has been blocked by CORS policy
#695 opened
Feb 17, 2025 -
[Bug]: Forward slashes of `raw://` are not removed when converting raw URLs to HTML
#686 opened
Feb 15, 2025 -
[Bug]: XHR requests not going through with managed browser
#684 opened
Feb 15, 2025 -
[Bug]: Deprecation Warning: Replace Config with ConfigDict for Pydantic v2 Compatibility
#678 opened
Feb 14, 2025 -
[Bug]: Unable to scrape Cloudfare protected sites
#677 opened
Feb 14, 2025 -
[Bug]: PDF doesn't get parsed
#675 opened
Feb 14, 2025 -
[Bug]: CrawlerRunConfig is not consistent across systems/environments
#665 opened
Feb 12, 2025 -
[Bug]: Status code for redirect URLs is not correct
#660 opened
Feb 12, 2025 -
[Bug]: 'NoneType' object has no attribute 'new_context'
#653 opened
Feb 10, 2025 -
[Bug]: JsonCssExtractionStrategy not returning results (even with doc example)
#651 opened
Feb 10, 2025 -
[Bug]: 429 Unprocessable entity when trying to process raw html with docker crawl4ai
#650 opened
Feb 10, 2025 -
[Bug]: Help me to crawl all the links from Apple VisionOS Documentation? Vue Recycler
#625 opened
Feb 6, 2025 -
[Bug]: Infinite Scroll Page isn't loading new content
#616 opened
Feb 4, 2025 -
[Bug]: Issue with `screenshot=True` — Capturing Screenshot Twice and Increasing Image Size
#615 opened
Feb 4, 2025 -
[Bug]: Proxy Not Working with proxy_config option
#604 opened
Feb 3, 2025 -
[Bug]: LLMContentFilter is ignored
#603 opened
Feb 2, 2025 -
[Bug]: Markdown output has incorect spacing.
#599 opened
Feb 1, 2025 -
[Bug]: scan_full_page not working on facebook
#592 opened
Jan 30, 2025 -
[Bug]: Incorrect rendering of inline code inside of links
#583 opened
Jan 29, 2025 -
[Bug]: Relative Urls in the webpage not extracted properly
#570 opened
Jan 28, 2025 -
[Bug]: `arun_many` doesn't parallelize tasks when using `raw://`
#560 opened
Jan 25, 2025 -
[Bug]: Docker compose is not working
#554 opened
Jan 23, 2025 -
[Bug]: Incomplete Extraction
#505 opened
Jan 21, 2025 -
[Bug]: Browser path detection failing in Windmill.dev with crawl4ai
#503 opened
Jan 20, 2025 -
[Bug]: [Errno 30] Read-only file system error when using AWS Lambda
#497 opened
Jan 20, 2025
11 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
-
Error code: 422 - {'detail': [{'loc': ['body', 'user'], 'msg': 'field required', 'type': 'value_error.missing'}
#391 commented on
Jan 26, 2025 • 0 new comments -
Multiple JS Code execution in a single run
#226 commented on
Jan 28, 2025 • 0 new comments -
Crawl4AI in Streamlit
#437 commented on
Jan 30, 2025 • 0 new comments -
Browser context error
#443 commented on
Jan 30, 2025 • 0 new comments -
LiteLLM does not find my openai key
#481 commented on
Jan 30, 2025 • 0 new comments -
Unable to share login state across multiple crawler
#449 commented on
Feb 10, 2025 • 0 new comments -
Adding save to HF support for async webcrawler
#312 commented on
Jan 22, 2025 • 0 new comments -
feat: Add remove_invisible_texts method to AsyncPlaywrightCrawlerStr…
#332 commented on
Jan 22, 2025 • 0 new comments -
[Docs]: Add Documentation for Monitoring with OpenTelemetry
#335 commented on
Jan 22, 2025 • 0 new comments -
addition keep-aria-label-attribute option
#416 commented on
Jan 25, 2025 • 0 new comments -
fix: Add newline before pre codeblock start
#462 commented on
Jan 25, 2025 • 0 new comments