A web scraping library for PHP with a nice fluent interface.
A fork of laravel/browser-kit-testing, repurposed for use with real HTTP requests.
Developed for a project I worked on at Sainsbury's.
Requires PHP 7.1+ and Goutte 3.1+.
The recommended way to install the library is through Composer.
Add rebelinblue/fluent-web-crawler as a require dependency in your composer.json file:
composer require rebelinblue/fluent-web-crawler
Create an instance of the Crawler
use REBELinBLUE\Crawler;
$crawler = new Crawler();
Visit a URL
$crawler->visit('http://www.example.com');
Interact with the page
$crawler->type('username', 'admin')
        ->type('password', 'password')
        ->press('Login');
// This can also be written as the following
$crawler->submitForm('Login', [
    'username' => 'admin',
    'password' => 'password',
]);
Check the response is as expected
if ($crawler->dontSeeText('Hello World')) {
    throw new \Exception('The page does not contain the expected text');
}
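The visit, interact, and check steps can be pulled together into one reusable function. The following is only a sketch: `loginAndCheck` is a hypothetical helper name, not part of the library, and it assumes that `dontSeeText()` returns a boolean, as the `if` statement above suggests. The `$crawler` parameter is deliberately left untyped so the sketch does not rely on the exact class layout.

```php
<?php

/**
 * Hypothetical helper (not part of the library): visit a page, submit the
 * login form, and report whether the expected text appeared afterwards.
 *
 * $crawler is expected to be a Crawler instance as created above.
 */
function loginAndCheck($crawler, string $url, array $fields, string $expectedText): bool
{
    $crawler->visit($url);

    // 'Login' is the button label used in the examples above.
    $crawler->submitForm('Login', $fields);

    // Assumption: dontSeeText() returns true when the text is absent,
    // so its negation means "the text was found".
    return !$crawler->dontSeeText($expectedText);
}
```

With a real crawler this might be called as `loginAndCheck(new Crawler(), 'http://www.example.com', ['username' => 'admin', 'password' => 'password'], 'Hello World')`, leaving the caller to decide whether a `false` result should throw.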
For a full list of the available actions see api.md.
If you wish to customize the instance of Goutte that is used (or, more likely, the instance of Guzzle), you can inject your own instance when constructing the class. For example, you may want to increase Guzzle's timeout:
use Goutte\Client as GoutteClient;
use GuzzleHttp\Client as GuzzleClient;
$goutteClient = new GoutteClient();
$guzzleClient = new GuzzleClient([
    'timeout' => 60,
]);
$goutteClient->setClient($guzzleClient);
$crawler = new Crawler($goutteClient);
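If several scripts need the same customized client, the wiring above could be wrapped in a small factory. This is a sketch under assumptions: `guzzleOptions` and `makeCrawler` are hypothetical names, and the `connect_timeout` value is an arbitrary illustration. Only `timeout` and `connect_timeout` are set here because Goutte supplies some request options itself (headers and redirect handling, for instance), so defaults for those may not take effect.

```php
<?php

use Goutte\Client as GoutteClient;
use GuzzleHttp\Client as GuzzleClient;
use REBELinBLUE\Crawler;

/**
 * Hypothetical helper: keep the Guzzle request options in one place.
 */
function guzzleOptions(int $timeout = 60, int $connectTimeout = 10): array
{
    return [
        'timeout'         => $timeout,        // total time allowed per request, in seconds
        'connect_timeout' => $connectTimeout, // time allowed to establish the connection, in seconds
    ];
}

/**
 * Hypothetical factory: build a Crawler backed by a Guzzle client
 * configured with the given options.
 */
function makeCrawler(array $options): Crawler
{
    $goutteClient = new GoutteClient();
    $goutteClient->setClient(new GuzzleClient($options));

    return new Crawler($goutteClient);
}
```

A script would then call `makeCrawler(guzzleOptions(120))` instead of repeating the client setup.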
Fluent Crawler is a wrapper around the following PHP libraries.
- Goutte web scraper.
- Symfony BrowserKit, CssSelector and DomCrawler.
- Guzzle HTTP client.