web-crawler

Here are 687 public repositories matching this topic...

ScrapeGraphAI / Scrapegraph-ai

Python scraper based on AI

Updated Jun 25, 2026
Python

apify / crawlee-python

Crawlee—A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Parsel, BeautifulSoup, Playwright, and raw HTTP. Both headful and headless mode. With proxy rotation.

python crawler scraper automation web-crawler headless scraping crawling selenium pip web-scraping beautifulsoup web-crawling headless-chrome apify parsel playwright

Updated Jul 7, 2026
Python

adithya-s-k / omniparse

Ingest, parse, and optimize any data format ➡️ from documents to multimedia ➡️ for enhanced compatibility with GenAI frameworks

ocr parser-library web-crawler parse-server whisper-api ingestion-api vision-transformer omniparser

Updated Dec 12, 2025
Python

jasonxtn / Argus

The Ultimate Information Gathering Toolkit

osint web-crawler whois-lookup virustotal information-gathering server-info dns-lookup reconnaissance cms-detection recon-tools email-harvester ssl-analitcs directory-finder txt-records pastebin-monitoring

Updated Dec 10, 2025
Python

xianhu / PSpider

A simple, easy-to-use Python crawler framework. QQ discussion group: 597510560

python crawler multi-threading spider multiprocessing web-crawler proxies python-spider web-spider

Updated Jun 10, 2022
Python

scrapfly / scrapfly-scrapers

Scalable Python web scraping scripts for 40+ popular domains

Updated Jul 7, 2026
Python

Algebra-FUN / WeReadScan

A crawler that scans purchased books on WeRead (微信读书) and downloads them as local PDFs

web-crawler selenium weread book-downloader

Updated Sep 19, 2023
Python

PhialsBasement / LibreCrawl

Free desktop SEO crawler - open source alternative to Screaming Frog and similar tools. Crawl websites, analyze links, extract SEO data, and export results without subscription fees. Fully customizable and extensible!

desktop-app python open-source flask seo web-crawler website-auditing free seo-analysis

Updated Jun 1, 2026
Python

cxcscmu / Craw4LLM

Official repository for "Craw4LLM: Efficient Web Crawling for LLM Pretraining"

crawler web-crawler crawling web-crawling pre-training pretraining large-language-models llm

Updated Feb 24, 2025
Python

lefterisloukas / edgar-crawler

The only open-source toolkit that can download SEC EDGAR financial reports and extract textual data from specific item sections into nice & clean structured JSON files. Presented at WWW 2025 @ Sydney, Australia (https://dl.acm.org/doi/10.1145/3701716.3715289)

python nlp finance natural-language-processing business data-mining web-crawler sec edgar edgar-crawler

Updated Jul 18, 2025
Python

jpjacobpadilla / Stealth-Requests

Undetected web-scraping & seamless HTML parsing in Python!

python data web-crawler http-client http-requests requests web-scraping xpath data-extraction html-parsing webscraping python-web-scraper python-scraping

Updated Apr 4, 2026
Python

hyunwoongko / kochat

Open-source Korean chatbot framework

deep-learning web-crawler chatbot korean deeplearning sentence-classification korean-chatbot sequance-tagging

Updated May 22, 2023
Python

rivermont / spidy

The simple, easy-to-use command-line web crawler.

python crawler web-crawler crawling python3 web-spider

Updated Aug 8, 2024
Python

lucasxlu / LagouJob

Data Analysis & Mining for lagou.com

nlp machine-learning data-mining web-crawler python3 data-analysis lagou

Updated Apr 19, 2019
Python

Madi-S / Lead-Generation

A Python script that empowers people with no programming background to generate robust leads at scale. This repo will compile various versatile techniques used in lead generation.

python parser scraper web-crawler leads chromedriver lead-generation leadscanner playwright

Updated Jun 26, 2026
Python

xiayouran / Musicer

Aims to bring together NetEase Cloud Music, KuGou, QQ Music, Kuwo, and other music platforms in one place

python music-player web-crawler web-spider music-downloader music-download-script qq-music wangyiyunmusic kugou-music kuwo-music music-robot

Updated Nov 26, 2022
Python

Hecate2 / Ignareo-ISML-auto-voter

Ignareo the Carillon, a web crawler/spider template of ultimate high concurrency built for leprechauns. Carillons as the best web spiders; Long live the golden years of leprechauns! (ISML=international saimoe; 2022 ISML is last ISML)

python http microservice high-performance web-crawler concurrency distributed asyncio gevent web-spider isml sukasuka chtholly sukamoka ignareo tiat

Updated Jun 19, 2026
Python

elliotxx / zhihu-crawler-people

A simple distributed crawler for zhihu && data analysis

python crawler spider web-crawler python-crawler web-spider

Updated Dec 7, 2022
Python

abaykan / CrawlBox

An easy way to brute-force web directories.

python crawler web-crawler wordlist admin-finder

Updated Jun 2, 2019
Python

skytruine / OSpider

An open-source tool for acquiring and preprocessing vector geographic data (POI/AOI/administrative divisions/road networks/land use)

download ad web-crawler free poi building street aoi land-use

Updated May 23, 2023
Python
