All notable changes to `spatie/crawler` will be documented in this file.
- always add links to pool if robots shouldn't be respected
- refactor of internals
- make it possible to override `$defaultClientOptions`
- Bump minimum required version of `spatie/robots-txt` to `1.0.1`.
- Respect robots.txt
- improved extensibility by removing PHP native type hinting of url, queue and crawler pool Closures
- do not follow links that have attribute `rel` set to `nofollow`
- Support both `Illuminate`'s and `Tighten`'s `Collection`.
- fix bugs when installing into a Laravel app
- the `CrawlObserver` and `CrawlProfile` are upgraded from interfaces to abstract classes
- don't crawl `tel:` links
- fix endless loop
- add `setCrawlObservers`, `addCrawlObserver`
- fix `setMaximumResponseSize` (someday we'll get this right)
CONTAINS BUGS, DO NOT USE THIS VERSION
- fix `setMaximumResponseSize`
CONTAINS BUGS, DO NOT USE THIS VERSION
- fix `setMaximumResponseSize`
CONTAINS BUGS, DO NOT USE THIS VERSION
- add `setMaximumResponseSize`
- fix for exception being thrown when encountering a malformed url
- use `\Psr\Http\Message\UriInterface` for all urls
- use Puppeteer
- drop support for PHP 7.0
- allow symfony 4 crawler
- added the ability to change the crawl queue
- more performance improvements
- performance improvements
- add `CrawlSubdomains` profile
- add crawl count limit
- add depth limit
- add JavaScript execution
- fix deps for PHP 7.2
- add `EmptyCrawlObserver`
- refactor to make use of Symfony Crawler's `link` function
- fix bugs around relative urls
- add `CrawlInternalUrls`
- make sure the passed client options are being used
- second attempt to fix detection of redirects
- fix detection of redirects
- fix the default timeout of 5 seconds
- set a default timeout of 5 seconds
- fix for non responding hosts
- fix for the accidental crawling of mailto-links
- improve performance by concurrent crawling
- make it possible to determine on which url a url was found
- Ignore `tel:` links when crawling
- Added `path`, `segment` and `segments` functions to `Url`
- Updated the required version of Guzzle to a secure version
- Fixed a bug where the crawler would not take query strings into account
- Fixed a bug where the crawler tries to follow JavaScript links
- Add support for DomCrawler 3.x
- Fix for normalizing relative links when using non-80 ports
- Add support for custom ports
- Lower required php version to 5.5
- Make urls case sensitive
- First release