PreferredAI/venom

Venom

Your preferred open-source focused crawler for the Deep Web.

Overview

Our aim is to create a fast, fully customizable, and robust crawler that is simple to use.

Quick links

Website | API Reference | Examples | Tutorial | PreferredAI

Features

  • Multi-threaded out of the box
  • Structured crawling with JSoup integration
  • Page Validation
  • Automatic Retries
  • Proxy support
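
To make the page validation and automatic retry features more concrete, here is a minimal, library-independent sketch of the general idea: fetch, validate the result, and retry when validation fails or an error is thrown. It does not use Venom's API and is not how Venom implements these features; the names `RetrySketch` and `fetchWithRetries` are hypothetical and exist only for illustration.

```java
import java.util.concurrent.Callable;
import java.util.function.Predicate;

public class RetrySketch {

    /**
     * Calls fetch up to maxAttempts times and returns the first result
     * that passes the validator. Throws if every attempt fails.
     * (Hypothetical helper, not part of Venom.)
     */
    static <T> T fetchWithRetries(Callable<T> fetch,
                                  Predicate<T> validator,
                                  int maxAttempts) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        Exception lastError = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                T result = fetch.call();
                if (validator.test(result)) {
                    return result; // valid page, stop retrying
                }
                lastError = new IllegalStateException(
                        "Validation failed on attempt " + attempt);
            } catch (Exception e) {
                lastError = e; // remember the failure and try again
            }
        }
        throw new Exception("Gave up after " + maxAttempts + " attempts", lastError);
    }

    public static void main(String[] args) throws Exception {
        // Simulated fetcher: the first two "pages" are empty, the third is valid.
        int[] calls = {0};
        Callable<String> fetch = () -> ++calls[0] < 3 ? "" : "<p>about</p>";
        Predicate<String> notEmpty = body -> !body.isEmpty();

        String page = fetchWithRetries(fetch, notEmpty, 5);
        System.out.println(page + " after " + calls[0] + " attempts");
    }
}
```

In a real crawler the fetcher would perform an HTTP request and the validator would check things such as the status code or the presence of expected content. Venom handles this for you, so consult the API Reference for its actual validator and retry settings.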

Getting started

Getting started with Venom is quick and easy. There are two ways to get started.

Clone our examples or tutorial

If you are starting a new project, consider cloning our Examples:

git clone https://github.com/PreferredAI/venom-examples.git

or, if you would like a more guided package, you can check out our Tutorial:

git clone https://github.com/PreferredAI/venom-tutorial.git

Add a dependency

If you already have a project, add Venom as a dependency in your pom.xml:

<dependency>
    <!-- Venom: A focused crawler framework @ https://venom.preferred.ai/ -->
    <groupId>ai.preferred</groupId>
    <artifactId>venom</artifactId>
    <version>[4.0,4.1)</version>
</dependency>

The version range [4.0,4.1) tells Maven to resolve the newest available version that is at least 4.0 and below 4.1.

Tutorial

If you are new to Venom, we have created a set of exercises to get you up and sprinting. The exercises are bundled in our venom-tutorial. More information can be found on this page.

Example

Think you are beyond exercises? Get started quickly with these few lines of code.

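// Package paths below assume the Venom 4.x layout; check the API Reference if they do not resolve.
import ai.preferred.venom.Crawler;
import ai.preferred.venom.Handler;
import ai.preferred.venom.Session;
import ai.preferred.venom.Worker;
import ai.preferred.venom.job.Scheduler;
import ai.preferred.venom.request.Request;
import ai.preferred.venom.request.VRequest;
import ai.preferred.venom.response.VResponse;

public class Example {

    // Handler that is called with the response for each scheduled request
    private static class VenomHandler implements Handler {

        @Override
        public void handle(Request request,
                           VResponse response,
                           Scheduler scheduler,
                           Session session,
                           Worker worker) {

            // Parse the page with JSoup and collect the text of every <p> inside ".about"
            String about = response.getJsoup().select(".about p").text();
            System.out.println("ABOUT: " + about);

        }

    }

    public static void main(String[] args) throws Exception {
        // try-with-resources closes the crawler when the block exits
        try (Crawler c = Crawler.buildDefault().start()) {
            // Schedule a single page and attach the handler that will process it
            Request r = new VRequest("https://venom.preferred.ai");
            c.getScheduler().add(r, new VenomHandler());
        }
    }

}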

License

Apache License 2.0