Snippet Sniffer allows you to extract code snippets from any website.
This library allows you:
- To get code snippets using a search engine API (Google)
- To get code snippets from any web page by crawling URL seeds
$ composer require snippetify/snippet-sniffer
use Snippetify\SnippetSniffer\SnippetSniffer;
// Configurations
$config = [
    // Required
    // Search engine API configuration keys
    'provider' => [
        'cx'   => 'your google Search engine ID',
        'key'  => 'your google API key',
        'name' => 'provider name (google)',
    ],
    // Optional
    // Useful for adding meta information to each snippet
    'app' => [
        'name'    => 'your App name',
        'version' => 'your App version',
    ],
    // Optional
    // Useful for logging
    'logger' => [
        'name' => 'logger name',
        'file' => 'logger file path',
    ],
];
// Required
// Your query
$query = "your query";
// Optional
// Meta params
$meta = [
    'page'  => 1,
    'limit' => 10,
];
// Fetch snippets
// @return Snippetify\SnippetSniffer\Common\Snippet[]
$snippets = SnippetSniffer::create($config)->fetch($query, $meta);
/*
* Snippet object public attributes [
* title: string,
* code: string,
* description: string,
* tags: array, // Array of string, also contains the snippet language
* meta: array
*]
*/
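For example, you can loop over the returned snippets and read those public attributes directly (a minimal sketch based on the attribute list above):

// Print each snippet's title, tags and code
foreach ($snippets as $snippet) {
    echo $snippet->title, PHP_EOL;
    echo implode(', ', $snippet->tags), PHP_EOL; // tags also contain the snippet language
    echo $snippet->code, PHP_EOL, PHP_EOL;
}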
Providers allow you to get a stack of seeds (URLs to scrape) from a search engine API. Only the Google search engine API is supported at this time, but you can create your own.
use Snippetify\SnippetSniffer\Providers\GoogleProvider;
// Search engine api configuration keys
$config = [
    'cx'  => 'your google Search engine ID',
    'key' => 'your google API key',
];
// Your query
$query = "your query";
// Meta params
$meta = [
    'page'  => 1,
    'limit' => 10,
];
// url seeds
// @return GuzzleHttp\Psr7\Uri[]
$urlSeeds = GoogleProvider::create($config)->fetch($query, $meta);
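The seeds are PSR-7 URIs, so you can inspect or filter them before scraping. A small sketch:

// For example, keep only Stack Overflow seeds before scraping them
$stackoverflowSeeds = array_filter($urlSeeds, function ($uri) {
    return $uri->getHost() === 'stackoverflow.com';
});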
- Git clone the project
- Create your new class in the Snippetify\SnippetSniffer\Providers folder
- Each provider must implement Snippetify\SnippetSniffer\Providers\ProviderInterface
- Take a look at Snippetify\SnippetSniffer\Providers\GoogleProvider for guidance
- Your fetch method must return an array of Psr\Http\Message\UriInterface
- Add it to the providers stack in Snippetify\SnippetSniffer\Core.php
- Write tests. Take a look at Snippetify\SnippetSniffer\Tests\Providers\GoogleProviderTest for guidance
- Send us a pull request
- Your provider must implement Snippetify\SnippetSniffer\Providers\ProviderInterface
- Take a look at Snippetify\SnippetSniffer\Providers\GoogleProvider for guidance
- Your fetch method must return an array of Psr\Http\Message\UriInterface
- Pass your new provider in the configuration parameter or use the addProvider method, as shown below (a provider class sketch follows the registration example)
use Snippetify\SnippetSniffer\SnippetSniffer;
// Use the configuration
$config = [
    'providers' => [
        'provider_name'   => ProviderClass::class,
        'provider_2_name' => Provider2Class::class, // You can add as many as you want
    ],
];
// Or use the addProvider method as follows
SnippetSniffer::create(...)
    ->addProvider('provider_name', ProviderClass::class)
    ->addProvider('provider_2_name', Provider2Class::class) // You can add as many as you want
    ...
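For illustration, here is a minimal sketch of what a custom provider could look like. The class name and the create()/fetch() signatures are assumptions inferred from the GoogleProvider usage above; the authoritative contract is Snippetify\SnippetSniffer\Providers\ProviderInterface.

use GuzzleHttp\Psr7\Uri;
use Snippetify\SnippetSniffer\Providers\ProviderInterface;

// Hypothetical provider backed by another search API (sketch only)
class MySearchProvider implements ProviderInterface
{
    private $config;

    public function __construct(array $config)
    {
        $this->config = $config;
    }

    public static function create(array $config): self
    {
        return new self($config);
    }

    // Must return an array of Psr\Http\Message\UriInterface
    public function fetch(string $query, array $meta = []): array
    {
        // Call your search API with $query and $meta here,
        // then map each result URL to a PSR-7 Uri
        $urls = []; // e.g. URLs returned by your search API

        return array_map(function ($url) {
            return new Uri($url);
        }, $urls);
    }
}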
Scrapers allow you to scrape an HTML page and extract its snippets.
use GuzzleHttp\Psr7\Uri;
use Snippetify\SnippetSniffer\Scrapers\DefaultScraper;
// Configurations
$config = [
    // Optional
    // Useful for adding meta information to each snippet
    'app' => [
        'name'    => 'your App name',
        'version' => 'your App version',
    ],
    // Optional
    // Useful for logging
    'logger' => [
        'name' => 'logger name',
        'file' => 'logger file path',
    ],
];
// Your url
$urlSeed = "website url to scrape";
// Fetch snippets
// @return Snippetify\SnippetSniffer\Common\Snippet[]
$snippets = (new DefaultScraper($config))->fetch(new Uri($urlSeed));
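Putting providers and scrapers together, you can fetch seeds from a provider and scrape each one yourself. A sketch, where $providerConfig and $scraperConfig are hypothetical names for the provider and scraper configurations shown above:

use Snippetify\SnippetSniffer\Providers\GoogleProvider;
use Snippetify\SnippetSniffer\Scrapers\DefaultScraper;

// Collect snippets from every seed returned by the provider
$seeds    = GoogleProvider::create($providerConfig)->fetch($query, $meta);
$scraper  = new DefaultScraper($scraperConfig);
$snippets = [];

foreach ($seeds as $seed) {
    // Each call returns Snippetify\SnippetSniffer\Common\Snippet[]
    $snippets = array_merge($snippets, $scraper->fetch($seed));
}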
- Git clone the project
- Create your new class in the Snippetify\SnippetSniffer\Scrapers folder
- Each scraper must implement Snippetify\SnippetSniffer\Scrapers\ScraperInterface
- Take a look at Snippetify\SnippetSniffer\Scrapers\StackoverflowScraper for guidance
- Your fetch method must return an array of Snippetify\SnippetSniffer\Common\Snippet
- Add it to the scrapers stack in Snippetify\SnippetSniffer\Core.php
- Write tests. Take a look at Snippetify\SnippetSniffer\Tests\Scrapers\StackoverflowScraperTest for guidance
- Send us a pull request
- Your scraper must implement Snippetify\SnippetSniffer\Scrapers\ScraperInterface
- Take a look at Snippetify\SnippetSniffer\Scrapers\StackoverflowScraper for guidance
- Your fetch method must return an array of Snippetify\SnippetSniffer\Common\Snippet
- Pass your new scraper in the configuration parameter or use the addScraper method, as shown below (a scraper class sketch follows the registration example)
use Snippetify\SnippetSniffer\SnippetSniffer;
// Important: the scraper's name must be the website host without the scheme, e.g. vuejs.org
// Configurations
$config = [
    'scrapers' => [
        'scraper_name'   => ScraperClass::class,
        'scraper_2_name' => Scraper2Class::class, // You can add as many as you want
    ],
];
// Or use the addScraper method as follows
SnippetSniffer::create(...)
    ->addScraper('scraper_name', ScraperClass::class)
    ->addScraper('scraper_2_name', Scraper2Class::class) // You can add as many as you want
    ...
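As an illustration, a custom scraper could look like the sketch below. The class name and the fetch() signature are assumptions based on the DefaultScraper usage above; the authoritative contract is Snippetify\SnippetSniffer\Scrapers\ScraperInterface, and the construction of Snippet objects is left as a placeholder.

use Psr\Http\Message\UriInterface;
use Snippetify\SnippetSniffer\Scrapers\ScraperInterface;

// Hypothetical scraper registered under the name "vuejs.org" (sketch only)
class VuejsScraper implements ScraperInterface
{
    private $config;

    public function __construct(array $config = [])
    {
        $this->config = $config;
    }

    // Must return an array of Snippetify\SnippetSniffer\Common\Snippet
    public function fetch(UriInterface $uri): array
    {
        $snippets = [];

        // Download $uri, parse the HTML, and build a
        // Snippetify\SnippetSniffer\Common\Snippet for each code block found

        return $snippets;
    }
}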
Snippet crawler allows you to extract all snippets from a website by crawling it.
use Snippetify\SnippetSniffer\WebCrawler;
// Optional
$config = [...];
// @return Snippetify\SnippetSniffer\Common\MetaSnippetCollection[]
$snippets = WebCrawler::create($config)->fetch(['your uri']);
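For example, you could bound the crawl with a couple of the optional crawler settings from the configuration reference below. A sketch, where $crawlerConfig is a hypothetical name and the values are placeholders:

use Snippetify\SnippetSniffer\WebCrawler;

// Crawl a documentation site with a limited number of pages
$crawlerConfig = [
    'crawler' => [
        'maximum_crawl_count'    => 100,
        'delay_between_requests' => 250,
    ],
];

$collections = WebCrawler::create($crawlerConfig)->fetch(['https://example.com/docs']);

The complete list of configuration keys follows.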
$config = [
    // Required
    // Search engine API configuration keys
    // https://developers.google.com/custom-search/v1/introduction
    'provider' => [
        'cx'   => 'your google Search engine ID',
        'key'  => 'your google API key',
        'name' => 'provider name (google)',
    ],
    // Optional
    // Useful for adding meta information to each snippet
    'app' => [
        'name'    => 'your App name',
        'version' => 'your App version',
    ],
    // Optional
    // Useful for logging
    'logger' => [
        'name' => 'logger name',
        'file' => 'logger file path',
    ],
    // Optional
    // Useful for scraping
    'html_tags' => [
        'snippet' => 'pre[class] code, div[class] code, .highlight pre, code[class]', // Tags to fetch snippets from
        'index'   => 'h1, h2, h3, h4, h5, h6, p, li', // Tags to index
    ],
    // Optional
    // Useful for adding new scrapers
    // The name must be the website host without the scheme, e.g. foo.com rather than https://foo.com
    'scrapers' => [
        'scraper_name'   => ScraperClass::class,
        'scraper_2_name' => Scraper2Class::class, // You can add as many as you want
    ],
    // Optional
    // Useful for adding new providers
    'providers' => [
        'provider_name'   => ProviderClass::class,
        'provider_2_name' => Provider2Class::class, // You can add as many as you want
    ],
    // Optional
    // Useful for web crawling
    // Please follow the link below for more information, as we use Spatie's crawler
    // https://github.com/spatie/crawler
    'crawler' => [
        'langs'                  => ['en'],
        'profile'                => CrawlSubdomainsAndUniqueUri::class,
        'user_agent'             => 'your user agent',
        'concurrency'            => 10,
        'ignore_robots'          => false,
        'maximum_depth'          => null,
        'execute_javascript'     => false,
        'maximum_crawl_count'    => null,
        'parseable_mime_types'   => 'text/html',
        'maximum_response_size'  => 1024 * 1024 * 3,
        'delay_between_requests' => 250,
    ],
];
Please see CHANGELOG for more information on what has changed recently.
You must set the PROVIDER_NAME, PROVIDER_CX, PROVIDER_KEY, CRAWLER_URI, DEFAULT_SCRAPER_URI, and STACKOVERFLOW_SCRAPER_URI keys in the phpunit.xml file before running the tests.
Important: those links must contain at least one snippet, otherwise the tests will fail. The Stack Overflow URI must be a question link with an accepted answer, otherwise the tests will fail.
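These keys are usually defined as environment variables in phpunit.xml; a minimal sketch (the exact file layout depends on the project's phpunit.xml.dist, and all values below are placeholders):

<php>
    <env name="PROVIDER_NAME" value="google"/>
    <env name="PROVIDER_CX" value="your-search-engine-id"/>
    <env name="PROVIDER_KEY" value="your-api-key"/>
    <env name="CRAWLER_URI" value="https://example.com/docs"/>
    <env name="DEFAULT_SCRAPER_URI" value="https://example.com/page-with-a-snippet"/>
    <env name="STACKOVERFLOW_SCRAPER_URI" value="https://stackoverflow.com/questions/a-question-with-an-accepted-answer"/>
</php>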
composer test
Please see CONTRIBUTING for details.
The MIT License (MIT). Please see License File for more information.