Stars

Crawlers

25 repositories

Google Drive Public File Downloader when Curl/Wget Fails

Python 4,300 350 Updated Aug 12, 2024

Media Downloader is a Qt/C++ front end to yt-dlp, youtube-dl, gallery-dl, lux, you-get, svtplay-dl, aria2c, wget and safari books.

C++ 1,667 128 Updated Nov 4, 2024

Google Drive direct download of big files

Perl 937 196 Updated May 12, 2023

Got: Simple golang package and CLI tool to download large files faster 🏃 than cURL and Wget!

Go 723 46 Updated Jan 16, 2024

Command line tool to download and extract data from HTML/XML pages or JSON-APIs, using CSS, XPath 3.0, XQuery 3.0, JSONiq or pattern matching. It can also create new or transformed XML/HTML/JSON do…

Pascal 681 42 Updated Apr 20, 2024

tget is wget for torrents

JavaScript 622 51 Updated Dec 11, 2020

Wget Git mirror

C 393 132 Updated Sep 25, 2024

The successor of GNU Wget. Contributions preferred at https://gitlab.com/gnuwget/wget2. But accepted here as well 😍

C 562 76 Updated Nov 1, 2024

Wget-compatible web downloader and crawler.

HTML 555 77 Updated Apr 29, 2024

The archivist's web crawler: WARC output, dashboard for all crawls, dynamic ignore patterns

Python 1,395 135 Updated Jul 7, 2024

🗃 Open source self-hosted web archiving. Takes URLs/browser history/bookmarks/Pocket/Pinboard/etc., saves HTML, JS, PDFs, media, and more...

Python 22,243 1,178 Updated Nov 4, 2024

🎭 An introduction to the Internet Archiving ecosystem, tooling, and some of the ethical dilemmas that the community faces.

JavaScript 50 5 Updated Aug 15, 2024

An Awesome List for getting started with web archiving

2,040 156 Updated Nov 6, 2024

A curated list of awesome tools for website diffing and change monitoring.

494 31 Updated Aug 9, 2022

📚 A compilation of research relevant to Data Together's efforts tackling the general problem of data resilience & interactivity

91 11 Updated Sep 27, 2018

List of libraries, tools and APIs for web scraping and data processing.

Makefile 6,671 787 Updated Oct 27, 2024

A curated list of awesome puppeteer resources.

2,403 161 Updated Jul 19, 2024

List of libraries, tools and APIs for web scraping and data processing.

Makefile 240 33 Updated Apr 5, 2024

List of data-hoarding related tools

1,084 83 Updated Sep 14, 2023

πŸ‹ Web Archiving Integration Layer: One-Click User Instigated Preservation

Roff 350 35 Updated Oct 4, 2024

Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project.

Java 2,829 763 Updated Nov 7, 2024

Webrecorder Player for Desktop (OSX/Windows/Linux). (Built with Electron + Webrecorder)

JavaScript 437 38 Updated Sep 17, 2020

simple script to convert web resources to a single warc file

Python 18 2 Updated May 11, 2023

wallabag is a self-hostable application for saving web pages: Save and classify articles. Read them later. Freely.

PHP 10,448 768 Updated Nov 8, 2024

A server to collect & archive websites that also supports video downloads

TypeScript 78 10 Updated Feb 11, 2023