Create README.md
lukaszkujawa committed Dec 30, 2013
1 parent 7e59e0d commit 9d730d8
Showing 1 changed file with 41 additions and 0 deletions.
41 changes: 41 additions & 0 deletions README.md
@@ -0,0 +1,41 @@
node-web-crawler
================
Easy and fast Web Crawler

## why NodeJs?

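A web crawler spends most of its time reading from and writing to the network, databases and files. NodeJs implements a non-blocking I/O model, so it can keep many of these operations in flight at once instead of waiting for each to finish, which makes it a good fit for the job.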
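
To illustrate the point, here is a minimal sketch of non-blocking I/O. It is not code from the crawler: `fakeFetch` and `fetchAll` are hypothetical helpers, and a timer stands in for a real network request. The three simulated requests are started back to back and overlap, so the total time is close to that of a single request rather than the sum of all three.

```javascript
// Hypothetical stand-in for a network call: calls back after delayMs
// without blocking the event loop in the meantime.
function fakeFetch(url, delayMs, callback) {
  setTimeout(function () {
    callback(null, { url: url, body: 'content of ' + url });
  }, delayMs);
}

// Start every request immediately and call done() once all have finished
// (or as soon as the first one fails).
function fetchAll(urls, delayMs, done) {
  var results = [];
  var pending = urls.length;
  var failed = false;

  if (pending === 0) {
    return done(null, results);
  }

  urls.forEach(function (url, i) {
    fakeFetch(url, delayMs, function (err, page) {
      if (failed) { return; }
      if (err) {
        failed = true;
        return done(err);
      }
      results[i] = page;      // keep results in request order
      pending -= 1;
      if (pending === 0) {
        done(null, results);
      }
    });
  });
}

var started = Date.now();
fetchAll(['/a', '/b', '/c'], 100, function (err, pages) {
  if (err) { return console.error(err); }
  console.log(pages.length + ' pages in ~' + (Date.now() - started) + ' ms');
});
```

With blocking I/O the same three requests would run one after another, so the elapsed time would grow with the number of pages.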

## requirements

- NodeJs >= 0.10.21
- CouchDB

## installation

If you don't have Node installed, or apt-get only offers an old version, build it from source:
```
$ curl -O http://nodejs.org/dist/v0.10.24/node-v0.10.24.tar.gz
$ tar -xzf node-v0.10.24.tar.gz
$ cd node-v0.10.24
$ ./configure
$ make
$ sudo make install
```


Then install CouchDB and the crawler:
```
$ sudo apt-get install couchdb
$ git clone https://github.com/lukaszkujawa/node-web-crawler.git
$ cd node-web-crawler/
$ npm install
```

## run
```
$ node crawler.js conf.example.json
```

The crawler will scrape your local copy of the CouchDB manual and save it to the "example" database. You can browse the results at http://127.0.0.1:5984/_utils/database.html?example/_design/documents/_view/all
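
The same results can probably also be read without the web UI. The sketch below assumes the `all` view of the `documents` design document (both names taken from the URL above) is reachable through CouchDB's standard view endpoint, `/{db}/_design/{ddoc}/_view/{view}`. `viewUrl` and `fetchView` are hypothetical helpers, not part of the crawler.

```javascript
var http = require('http');

// Build the URL of a CouchDB view from its parts.
function viewUrl(base, db, designDoc, view) {
  return base + '/' + encodeURIComponent(db) +
    '/_design/' + encodeURIComponent(designDoc) +
    '/_view/' + encodeURIComponent(view);
}

// GET the view and pass the parsed JSON response to callback(err, result).
function fetchView(url, callback) {
  http.get(url, function (res) {
    var body = '';
    res.setEncoding('utf8');
    res.on('data', function (chunk) { body += chunk; });
    res.on('end', function () {
      var parsed;
      try {
        parsed = JSON.parse(body);
      } catch (e) {
        return callback(e);
      }
      callback(null, parsed);
    });
  }).on('error', callback);
}
```

Calling `fetchView(viewUrl('http://127.0.0.1:5984', 'example', 'documents', 'all'), callback)` against a running CouchDB should hand the callback an object with `total_rows`, `offset` and `rows` fields, which is the usual shape of a CouchDB view response.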
