Broken Link Finder Change Log

v0.0.0 (TEMPLATE - DO NOT EDIT)

Added

  • ...

Changed/Removed

  • ...

Fixed

  • ...

v0.12.2

Added

  • Updated to Ruby 3.3 and updated the production dependencies, including Wgit (v0.11).
  • Added --js and --js-delay flag options to the executable, allowing JS parsing to update a page's DOM before it gets crawled.
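
A hedged usage sketch; the exact invocation form and the delay unit (assumed to be seconds) should be verified against the executable's help output:

```sh
# Enable JS parsing and wait (an assumed) 2 seconds for the DOM to settle
# before the page is crawled.
broken_link_finder crawl http://example.com --js --js-delay 2
```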

Changed/Removed

  • ...

Fixed

  • ...

v0.12.1

Added

  • Support for Ruby 3.

Changed/Removed

  • Removed support for Ruby 2.5 (as it's too old).

Fixed

  • ...

v0.12.0

Added

  • BrokenLinkFinder::link_xpath and link_xpath= methods so you can customise how links are extracted from each crawled page using the API.
  • An --xpath (or just -x) command line flag so you can customise how links are extracted when using the command line.
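
A minimal API sketch; constructing the finder via BrokenLinkFinder.new is an assumption from typical usage, and the xpath value is purely illustrative:

```ruby
require 'broken_link_finder'

# Customise the xpath used to extract links from each crawled page.
# This value is illustrative, not the gem's default.
BrokenLinkFinder::link_xpath = '//body//a/@href'

finder = BrokenLinkFinder.new
finder.crawl_site 'http://example.com'
```

The same customisation is available on the command line via --xpath (or -x).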

Changed/Removed

  • Changed the default way in which links are extracted from a page. Previously, any element with a href or src attribute was extracted and checked; now only links inside the <body> are extracted and checked, ignoring the <head> section entirely. You can restore the old behaviour with BrokenLinkFinder::link_xpath = '//*/@href | //*/@src' before you perform a crawl. Alternatively, on the command line, use the --xpath '//*/@href | //*/@src' option (quoted so the shell doesn't interpret the |).

Fixed

  • ...

v0.11.1

Added

  • ...

Changed/Removed

  • Updated the wgit gem to version 0.9.0, which contains improvements and bug fixes.

Fixed

  • ...

v0.11.0

Added

  • Additional crawl statistics.
  • Exit code handling to the executable: 0 for success, 1 for an error scenario.
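
A small shell sketch of inspecting the exit code; whether finding broken links alone counts as an error scenario isn't stated here, so treat that as an assumption:

```sh
broken_link_finder crawl http://example.com
echo "exit code: $?"   # 0 for success, 1 for an error scenario
```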

Changed/Removed

  • Updated the report formats slightly, bringing various improvements such as reporting the total number of links crawled.

Fixed

  • Bug in the HTML report; the summary URL is now an <a> link.
  • Bug affecting Finder@broken_link_map URLs and the Finder#crawl_stats[:url] URL during redirects.
  • Bug causing an error when crawling unparsable/invalid URLs.

v0.10.0

Added

  • A --html flag for the executable's crawl command, which produces an HTML report (instead of text).
  • A 'retry' mechanism for any broken links found; this is essentially a verification step before a report is generated.
  • Finder#crawl_stats for information such as the crawl duration, total links crawled etc.
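
A hedged API sketch; BrokenLinkFinder.new is assumed from typical usage, and apart from the :url key (mentioned elsewhere in this log) the crawl_stats key names are not specified here, hence the full dump:

```ruby
require 'broken_link_finder'

finder = BrokenLinkFinder.new
finder.crawl_site 'http://example.com'

# :url is mentioned elsewhere in this log; other key names (e.g. for the
# crawl duration or total links crawled) are unspecified, so dump them all.
puts finder.crawl_stats[:url]
puts finder.crawl_stats.inspect
```

On the command line, the HTML report would come from something like broken_link_finder crawl http://example.com --html.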

Changed/Removed

  • The API has changed somewhat. See the docs for the up-to-date code signatures if you're using broken_link_finder outside of its executable.

Fixed

  • ...

v0.9.5

Added

  • ...

Changed/Removed

  • Now using optimistic dependency versioning.
  • Updated wgit to version 0.5.1, which contains improvements and bug fixes.

Fixed

  • ...

v0.9.4

Added

  • ...

Changed/Removed

  • Updated the wgit gem to version 0.5.0, which contains improvements and bug fixes.

Fixed

  • ...

v0.9.3

Added

  • ...

Changed/Removed

  • ...

Fixed

  • A bug resulting in some servers dropping crawl requests from broken_link_finder.

v0.9.2

Added

  • ...

Changed/Removed

  • Updated wgit gem to version 0.4.0 which brings a speed boost to crawls.

Fixed

  • ...

v0.9.1

Added

  • A crawl_r alias for BrokenLinkFinder::Finder.crawl_site.
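
The alias in use (building the finder with BrokenLinkFinder.new is an assumption from typical usage):

```ruby
require 'broken_link_finder'

finder = BrokenLinkFinder.new
finder.crawl_r 'http://example.com' # equivalent to finder.crawl_site
```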

Changed/Removed

  • Upgraded wgit to v0.2.0.
  • Refactored the code base (no breaking changes).

Fixed

  • ...

v0.9.0

Added

  • The version command to the executable.
  • The --threads (aka -t) option for the executable's crawl command, to control crawl speed vs. resource usage.
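
Hedged command-line sketches of both additions (exact invocation forms assumed):

```sh
broken_link_finder version                                 # print the gem's version
broken_link_finder crawl http://example.com --threads 30   # tune speed vs. resource usage
```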

Changed/Removed

  • Changed the default maximum number of threads for a recursive crawl from 30 to 100. Users will see a speed boost, with increased resource usage, as a result. This is configurable using the new crawl command option, e.g. --threads 30.

Fixed

  • Several bugs by updating the wgit dependency.
  • A bug in the report logic causing an incorrect link count.

v0.8.1

Added

  • ...

Changed/Removed

  • ...

Fixed

  • Updated the wgit dependency, which contains bug fixes.

v0.8.0

Added

  • Logic to prevent re-crawling links for more efficiency.

Changed/Removed

  • Updated the wgit gem, which fixes a bug in crawl_site and adds support for IRIs.

Fixed

  • Bug where an error from the executable wasn't being rescued.

v0.7.0

Added

  • The --verbose flag for the executable, displaying all ignored links.
  • The --concise flag for the executable, displaying the broken links in summary form.
  • The --sort-by-link flag for the executable, displaying each broken link found alongside the pages containing it (as opposed to the default of sorting by page).
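
A hedged example invocation; whether these flags combine freely is an assumption:

```sh
# Summarise the broken links and sort them by link rather than by page,
# displaying all ignored links too.
broken_link_finder crawl http://example.com --concise --sort-by-link --verbose
```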

Changed/Removed

  • Changed the default sorting (format) for ignored links to be summarised (much more concise), reducing noise in the reports.
  • Updated the README.md to reflect the new changes.

Fixed

  • Bug where the broken/ignored links weren't being ordered consistently between runs. Now, all links are reported alphabetically. This will change existing report formats.
  • Bug where an anchor of # was being returned as broken when it shouldn't be.

v0.6.0

Added

  • Support for ignored links (e.g. mailto and tel links). The README has been updated.

Changed/Removed

  • Only HTML files now have their links verified; JS files, for example, do not have their contents checked. This also boosts crawl speed.
  • Links are now reported exactly as they appear in the HTML (for easier location after reading the reports).

Fixed

  • Links with anchors are no longer regarded as separate pages during a crawl, removing duplicate reports.

v0.5.0

Added

  • Anchor support is now included, meaning the response HTML must include an element with an ID matching the anchor in the link's URL; otherwise, the link is regarded as broken. Previously, there was no anchor support.
  • The README now includes a How It Works section detailing what constitutes a broken link; see it for more information.
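
To illustrate with a hypothetical page: the link below only counts as working if the crawled response for /about contains an element whose ID matches the URL's anchor:

```html
<!-- On the page being crawled: -->
<a href="/about#team">Meet the team</a>

<!-- The response for /about must contain a matching ID, e.g.: -->
<h2 id="team">Our Team</h2>
<!-- Otherwise the link is regarded as broken. -->
```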

Changed/Removed

  • Any element with a href or src attribute is now regarded as a link; previously, only <a> elements were.

Fixed

  • ...