Your preferred open-source focused crawler for the Deep Web.
Our aim is to create a blazing-fast, fully customizable, and robust crawler that is simple and handy to use.
- Multi-threaded out of the box
- Structured crawling with JSoup integration
- Page Validation
- Automatic Retries
- Proxy support
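To make "Page Validation" and "Automatic Retries" concrete, here is a minimal, library-independent sketch of the idea: fetch a page, check it with a validator, and try again if the fetch fails or the page is rejected. It does not use Venom's API. The names `RetrySketch`, `fetchWithRetries`, `fetch`, `isValid`, and `maxAttempts` are hypothetical and exist only for this illustration, and the fetcher is a stand-in that never touches the network.

```java
import java.util.Optional;
import java.util.function.Function;
import java.util.function.Predicate;

// Illustration only: hypothetical names, not part of Venom.
public class RetrySketch {

  /**
   * Calls fetch up to maxAttempts times and returns the first body that
   * passes the validator, or an empty Optional if every attempt fails.
   */
  public static Optional<String> fetchWithRetries(String url,
                                                  Function<String, String> fetch,
                                                  Predicate<String> isValid,
                                                  int maxAttempts) {
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        String body = fetch.apply(url);
        if (body != null && isValid.test(body)) {
          return Optional.of(body);
        }
      } catch (RuntimeException e) {
        // Treat a failed fetch like an invalid page and try again.
      }
    }
    return Optional.empty();
  }

  public static void main(String[] args) {
    // Stand-in fetcher: throws once, returns an empty page once, then succeeds.
    int[] calls = {0};
    Function<String, String> flaky = url -> {
      calls[0]++;
      if (calls[0] == 1) {
        throw new RuntimeException("timeout");
      }
      if (calls[0] == 2) {
        return ""; // empty page, rejected by the validator
      }
      return "<html><p class=\"about\">ok</p></html>";
    };

    Optional<String> page = fetchWithRetries(
        "https://venom.preferred.ai", flaky, body -> body.contains("about"), 5);
    System.out.println(page.isPresent() + " after " + calls[0] + " attempts");
    // prints "true after 3 attempts"
  }
}
```

A real crawler would typically also wait between attempts and might switch proxies. The sketch leaves both out to keep the control flow visible.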
Getting started with Venom is quick and easy. There are two ways to get started.
If you are starting out in a new project, you can consider cloning our Examples:
git clone https://github.com/PreferredAI/venom-examples.git
Or, if you would prefer a more guided introduction, you can check out our Tutorial:
git clone https://github.com/PreferredAI/venom-tutorial.git
If you already have a project, add Venom as a dependency in your pom.xml (the range [4.0,4.1) accepts any 4.0.x release and excludes 4.1):
<dependency>
  <!-- Venom: A focused crawler framework @ https://venom.preferred.ai/ -->
  <groupId>ai.preferred</groupId>
  <artifactId>venom</artifactId>
  <version>[4.0,4.1)</version>
</dependency>
If you are new to Venom, we have created a set of exercises to get you up and sprinting. The exercises are bundled in our venom-tutorial. More information can be found on this page.
Think you are beyond exercises? Get started quickly with these few lines of code.
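// Package names below are assumed from the Venom 4.x layout; check the API Reference.
import ai.preferred.venom.Crawler;
import ai.preferred.venom.Handler;
import ai.preferred.venom.Session;
import ai.preferred.venom.Worker;
import ai.preferred.venom.job.Scheduler;
import ai.preferred.venom.request.Request;
import ai.preferred.venom.request.VRequest;
import ai.preferred.venom.response.VResponse;

public class Example {

  // Handles each fetched page: extracts and prints the text under ".about p".
  private static class VenomHandler implements Handler {
    @Override
    public void handle(Request request,
                       VResponse response,
                       Scheduler scheduler,
                       Session session,
                       Worker worker) {
      // getJsoup() exposes the parsed page for CSS-selector queries.
      String about = response.getJsoup().select(".about p").text();
      System.out.println("ABOUT: " + about);
    }
  }

  public static void main(String[] args) throws Exception {
    // try-with-resources closes the crawler when the block exits.
    try (Crawler c = Crawler.buildDefault().start()) {
      Request r = new VRequest("https://venom.preferred.ai");
      c.getScheduler().add(r, new VenomHandler());
    }
  }
}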
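The example above queues a request and then immediately leaves the try block, so it presumably relies on the crawler finishing queued work while it is being closed. The sketch below illustrates that pattern using only the Java standard library. It is not Venom's implementation: `MiniCrawler`, its `add` method, and the `fetcher` function are hypothetical names for this illustration, and whether Venom's own `close()` behaves the same way is an assumption to verify against the API Reference.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.BiConsumer;
import java.util.function.Function;

// Illustration only: a hypothetical stand-in, not part of Venom.
public class MiniCrawler implements AutoCloseable {

  private final ExecutorService pool;
  private final Function<String, String> fetcher;

  public MiniCrawler(int threads, Function<String, String> fetcher) {
    this.pool = Executors.newFixedThreadPool(threads);
    this.fetcher = fetcher;
  }

  /** Queues a URL; the handler receives (url, body) on a worker thread. */
  public void add(String url, BiConsumer<String, String> handler) {
    pool.submit(() -> handler.accept(url, fetcher.apply(url)));
  }

  /** Stops accepting work, then blocks until queued requests are handled. */
  @Override
  public void close() {
    pool.shutdown();
    try {
      if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
        pool.shutdownNow();
      }
    } catch (InterruptedException e) {
      pool.shutdownNow();
      Thread.currentThread().interrupt(); // preserve the interrupt status
    }
  }

  public static void main(String[] args) {
    List<String> seen = new CopyOnWriteArrayList<>();
    // The fetcher is a stub that fabricates a body instead of using the network.
    try (MiniCrawler c = new MiniCrawler(2, url -> "body of " + url)) {
      c.add("https://venom.preferred.ai", (url, body) -> seen.add(body));
    } // close() runs here and waits for the handler to finish
    System.out.println(seen);
    // prints "[body of https://venom.preferred.ai]"
  }
}
```

Because handlers run on worker threads, any state they share, such as the `seen` list here, needs to be thread-safe.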