Releases: CryShana/CryCrawler
Hotfix for URL matching bug
Global crawl delay and small improvements
You can now set a global crawl delay in seconds in the configuration file.
Small improvements include:
- Package version changed so `--version` shows the correct version now
- More logs in DEBUG mode to display why some URLs were skipped
- Seed URLs will be reloaded regardless of whether they were already crawled, as long as the backlog is empty. This should fix the annoyance of having to delete the cache every time a crawl fails or the backlog isn't properly saved.
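A sketch of what the global crawl delay might look like in the configuration file. The key name `crawlDelaySeconds` is a guess for illustration only, not necessarily the actual key CryCrawler uses:

```json
{
  "crawlDelaySeconds": 2
}
```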
Blacklisted URL patterns
You can now define a list of URL patterns to be blacklisted, similar to the file URL pattern matching introduced in v1.0.3.
Unlike file URL pattern matching, where only file URLs are compared to the patterns and accepted if they match any of them (effectively a whitelist), this blacklist applies to all URLs, not only file URLs. The pattern rules are the same.
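A sketch of how such a blacklist might look in the configuration file. The key name `urlBlacklist` and the patterns are illustrative assumptions, not CryCrawler's actual configuration schema:

```json
{
  "urlBlacklist": [
    "ads.example.com/*",
    "/tracking/*"
  ]
}
```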
File URL pattern matching
Implemented new file criteria - ability to filter out files based on file URL.
You can now define a list of URL patterns in the configuration file or using Web GUI.
Example URL patterns include:
`somedomain.com/image/*`
`/original/*`
`/page/*/anotherpage/*`
Beware that `/image/` will only match URLs ending with `/image/` and not `/image/somethingelse` or `/image`. This is why it is recommended to always add a `*` at the end.
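Under one plausible reading of these rules (`*` matches any run of characters, a pattern may start anywhere in the URL but must match through to its end), the matching could be sketched in Python; this is an illustration, not CryCrawler's actual implementation:

```python
import re

def matches_pattern(url: str, pattern: str) -> bool:
    """Check a URL against a '*'-wildcard pattern as described above."""
    # Translate each '*' into regex '.*' and escape the literal parts.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    # Anchor at the end of the URL: "/image/" then matches only URLs
    # ending with "/image/", while "/image/*" matches anything below it.
    return re.search(regex + "$", url) is not None
```

This explains the warning above: without a trailing `*`, the pattern must coincide with the very end of the URL.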
Blacklist fixed
This release contains a hotfix for the blacklist issue where blacklisted subdomains were not skipped if their main domain was whitelisted. The blacklist is now checked before the whitelist.
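The corrected check order can be sketched as follows. This is a simplified illustration using substring domain checks; `is_url_allowed` is a hypothetical helper, not CryCrawler's actual code:

```python
def is_url_allowed(url: str, blacklist: list[str], whitelist: list[str]) -> bool:
    """Decide whether a URL may be crawled, checking the blacklist first."""
    # Blacklist wins: a blacklisted subdomain is skipped even when its
    # main domain appears on the whitelist.
    if any(domain in url for domain in blacklist):
        return False
    # If a whitelist is configured, the URL must match it.
    if whitelist:
        return any(domain in url for domain in whitelist)
    return True
```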
Robots.txt functionality
- Added option to set the User-Agent from `config.js` and the WebGUI
- Added option to respect `robots.txt` on websites
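A minimal sketch of what respecting `robots.txt` means in practice: before fetching a URL, the crawler checks it against the site's rules under the configured User-Agent. CryCrawler itself is written in C#; Python's standard-library `robotparser` is used here only to illustrate the idea:

```python
from urllib import robotparser

# The User-Agent string is now configurable (from config.js or the WebGUI).
user_agent = "CryCrawler"

parser = robotparser.RobotFileParser()
# Example robots.txt content; a crawler would normally fetch this
# from https://<site>/robots.txt before crawling the site.
parser.parse([
    "User-agent: *",
    "Disallow: /private/",
])

print(parser.can_fetch(user_agent, "https://example.com/private/page"))  # False
print(parser.can_fetch(user_agent, "https://example.com/public/page"))   # True
```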
Initial release
v1.0 function rename