These rake tasks create new JSON files or overwrite existing ones, so back up any old data before running them. Depending on the task, the results are written to boards.json, pins.json, or users.json.
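One way to back up old results before a crawl is a small Ruby helper like the sketch below. It is not part of this repo: the method name `backup_results` and the `backup-<timestamp>` directory name are assumptions made for illustration; only the three file names come from the text above.

```ruby
require "fileutils"

# Result files named in this README.
RESULT_FILES = %w[boards.json pins.json users.json].freeze

# Copy whichever result files exist in `dir` into a timestamped
# backup directory and return the paths of the copies.
# The "backup-<stamp>" naming is an assumption, not a repo convention.
def backup_results(dir = ".", stamp = Time.now.strftime("%Y%m%d-%H%M%S"))
  existing = RESULT_FILES.map { |name| File.join(dir, name) }
                         .select { |path| File.file?(path) }
  return [] if existing.empty?

  backup_dir = File.join(dir, "backup-#{stamp}")
  FileUtils.mkdir_p(backup_dir)
  existing.map do |path|
    target = File.join(backup_dir, File.basename(path))
    FileUtils.cp(path, target)
    target
  end
end

if __FILE__ == $PROGRAM_NAME
  backup_results.each { |path| puts "backed up to #{path}" }
end
```

Run it from the directory that holds the JSON files before starting a crawl; if none of the files exist, it does nothing.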
Get all the boards and pins from user mdoroudi. Replace the username mdoroudi with whatever username you want.
This creates two result files: pins.json and boards.json
$ rake crawl:pins_boards:from_seed seed=mdoroudi
Get the first 50 pins of the main page.
This creates one result file: pins.json
$ rake crawl:pins_boards:pins_from_homepage
From the first page, get the users of the first 50 pins and crawl their boards and pins.
This creates two result files: pins.json and boards.json
$ rake crawl:pins_boards:from_homepage_deep
Given a user slug, get all its followers and followings, and for each of those get their followers and followings. The limit right now is 500 users.
$ rake crawl:users:from_seed seed=mdoroudi
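The user crawl described above can be read as a breadth-first walk over the follow graph, two levels deep and capped at 500 users. The sketch below illustrates that reading only; it is not the repo's user_crawler code. The method name `crawl_users`, the `max_depth` parameter, and the `neighbors` block (which stands in for fetching a user's followers and followings) are all hypothetical, and counting the seed toward the limit is an assumption.

```ruby
require "set"

# Breadth-first walk from `seed`, stopping once `limit` users are seen.
# `neighbors` is a hypothetical block: slug -> array of follower and
# following slugs. Users at `max_depth` are recorded but not expanded.
def crawl_users(seed, limit: 500, max_depth: 2, &neighbors)
  seen = Set.new([seed])
  queue = [[seed, 0]]
  until queue.empty?
    slug, depth = queue.shift
    next if depth >= max_depth

    neighbors.call(slug).each do |other|
      return seen.to_a if seen.size >= limit
      next if seen.include?(other)

      seen << other
      queue << [other, depth + 1]
    end
  end
  seen.to_a
end
```

With an in-memory graph such as `graph = { "a" => ["b", "c"], "b" => ["d"] }`, calling `crawl_users("a") { |slug| graph.fetch(slug, []) }` visits the seed, its neighbors, and their neighbors, and never collects more than `limit` users.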
To analyze the data further, you might want to load it into a MySQL database (right now this only covers pins and boards).
Before creating tables, make sure you have a config/database.yml file that looks roughly like this, with your own info in it:
adapter: mysql2
encoding: utf8
host: localhost
database: pinterest
user: root
password:
Also create your database; in my case it's called pinterest:
> create database pinterest
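To catch a missing or incomplete config/database.yml before running the table tasks, a check along the following lines may help. This is a sketch, not code from the repo: the method name `load_db_config` and the choice of which keys to treat as required are assumptions; the key names themselves are the ones shown in the sample file above.

```ruby
require "yaml"

# Keys from the sample config that this sketch treats as mandatory
# (an assumption; the rake tasks may need more or fewer).
REQUIRED_KEYS = %w[adapter host database user].freeze

# Read the YAML config and fail early if a required key is absent.
def load_db_config(path = "config/database.yml")
  config = YAML.safe_load(File.read(path))
  raise ArgumentError, "#{path} is not a key/value mapping" unless config.is_a?(Hash)

  missing = REQUIRED_KEYS.reject { |key| config.key?(key) }
  raise ArgumentError, "#{path} is missing: #{missing.join(', ')}" unless missing.empty?

  config
end
```

An empty `password:` entry, as in the sample, is read as `nil`, so it does not trip the check.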
This process creates the following three tables:
- users
- pins
- boards
$ rake create_tables:all
Load the JSON data into the corresponding tables:
$ rake load_data:all
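If a load does not behave as expected, a quick look at the result files can narrow things down. The sketch below assumes each result file holds a single top-level JSON array of records, which this README does not actually state; adjust it if the crawler writes a different shape. The method name `record_count` is hypothetical.

```ruby
require "json"

# Count the records in a result file.
# Assumes the file is one top-level JSON array (an unverified assumption).
def record_count(path)
  data = JSON.parse(File.read(path))
  raise ArgumentError, "#{path} does not contain a JSON array" unless data.is_a?(Array)

  data.size
end

if __FILE__ == $PROGRAM_NAME
  %w[pins.json boards.json].each do |name|
    next unless File.file?(name)

    puts "#{name}: #{record_count(name)} records"
  end
end
```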
- User: create the following/follower relationship and change the user_crawler code to respect it (following_followers_rel branch)
- User: add has_many pins & has_many boards
- User: add a rake task to load them into the database, just like pins and boards
- Pins & Boards: add belongs_to user, remove username from table (following_followers_rel branch)
- Pins: work on is_video
- Bring code/datastructures/graph.rb here so it can be used