Instagram Crawler

This crawler was made because most of the crawlers out there seem to require either a browser or a developer account. This Instagram crawler uses a private Instagram API, so no developer account is required. However, it needs your Instagram account credentials because it uses your user endpoints.

Instagram may or may not approve of this method. It is known to regularly shut down user accounts that are suspected of traffic hoarding. Use at your own risk.

This README assumes some familiarity with graphs and graph search algorithms, but it should be understandable without it.

Installation

First install Instagram Private API. Kudos for a great project!

$ pip install git+https://github.com/ping/instagram_private_api.git@1.2.7

Then download or clone this project into a folder.

$ git clone https://github.com/simonseo/instacrawler-privateapi.git

Sign up to Instagram if you don't have an account. Take note of your username and password.

Get Crawlin'

Now if you run __init__.py in the project folder from a shell, it prints the command options. If this shows up, everything probably works. Also try python __init__.py -h for more information about the options.

$ python __init__.py
usage: __init__.py [-h] -u USERNAME -p PASSWORD [-f TARGETFILE] [-t TARGET]

To get crawlin', you need to provide your

  1. Instagram username
  2. Instagram password
  3. either a single Instagram ID (target, -t) or a text file with one Instagram ID per row (targetfile, -f)
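
The options above could be declared with argparse roughly as follows. This is a minimal sketch, not the project's actual code: the long option names (--username, --password, --targetfile, --target) and the helper names build_parser and parse_args are assumptions made for illustration, since only the short flags appear in the usage line.

```python
import argparse


def build_parser():
    """Build a parser that accepts the options shown in the usage line."""
    parser = argparse.ArgumentParser(prog="__init__.py")
    # Long option names are assumed for illustration; only the short flags
    # (-u, -p, -f, -t) appear in the usage line.
    parser.add_argument("-u", "--username", required=True,
                        help="your Instagram username")
    parser.add_argument("-p", "--password", required=True,
                        help="your Instagram password")
    parser.add_argument("-f", "--targetfile",
                        help="text file with one Instagram ID per row")
    parser.add_argument("-t", "--target",
                        help="a single Instagram ID to start from")
    return parser


def parse_args(argv=None):
    """Parse the command line and require at least one root node."""
    parser = build_parser()
    args = parser.parse_args(argv)
    # The crawl needs somewhere to start, so insist on -t or -f.
    if args.target is None and args.targetfile is None:
        parser.error("provide a target with -t or a target file with -f")
    return args
```

Calling parse_args(["-u", "me", "-p", "secret", "-t", "selenagomez"]) would return a namespace with target set and targetfile left as None.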

Examples

Single Root Node

If you want to start at one specific user node, provide the ID/username/handle with the -t option. selenagomez is a good place to start because it is one of the most followed accounts.

$ python __init__.py -u <yourUsername> -p <yourPassword> -t selenagomez

Multiple Root Nodes

If you want to crawl from multiple user nodes, list the IDs in a separate file, one per row, and pass the filename with the -f option. Example:

$ python __init__.py -u <yourUsername> -p <yourPassword> -f "people I stalk.txt"

In people I stalk.txt you should have accounts that you want to start at:

instagram
selenagomez
realdonaldtrump
president_vladimir_putin
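
A target file in this format can be turned into a list of root nodes with a few lines of Python. The sketch below is only an illustration of the file format; read_targets is a hypothetical helper name, and skipping blank rows is an assumption rather than documented behavior.

```python
from pathlib import Path


def read_targets(path):
    """Return the Instagram IDs listed in a text file, one per row."""
    targets = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        name = line.strip()
        if name:  # skip blank rows (assumed behavior)
            targets.append(name)
    return targets
```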

Wait a bit and a folder will be created containing the crawled profiles as JSON files.
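
Once the crawl has finished, the output can be read back for analysis. The following is a hedged sketch that assumes each file in the output folder is a standalone JSON document with a .json extension; the default path matches the profile_path config value, load_profiles is a hypothetical helper name, and nothing is assumed about the fields inside each profile.

```python
import json
from pathlib import Path


def load_profiles(profile_path="./profiles"):
    """Load every .json file in the output folder, keyed by file name."""
    profiles = {}
    for file in sorted(Path(profile_path).glob("*.json")):
        with file.open(encoding="utf-8") as fh:
            profiles[file.stem] = json.load(fh)
    return profiles
```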

Config

Inside __init__.py, there is a config dictionary. Each config option is explained in the comments. Note that min_collect_media and max_collect_media are overridden if min_timestamp is provided as a number.

config = {
	'search_algorithm' : 'BFS',                             # Possible values: BFS, DFS
	'profile_path' : './profiles',                          # Path where output data gets saved
	'max_followers' : 10,                                   # How many followers to collect per user
	'max_following' : 15,                                   # How many followed accounts to collect per user
	'min_collect_media' : 10,                               # Minimum number of media items to collect per user. Ignored if min_timestamp is set
	'max_collect_media' : 10,                               # Maximum number of media items to collect per user. Ignored if min_timestamp is set
	'max_collect_users' : 1000,                             # How many users to collect in total
	# 'min_timestamp' : int(time() - 60*60*24*30*2)           # Collect posts no older than this Unix timestamp in seconds (here, about 60 days ago; needs `from time import time`)
	'min_timestamp' : None                                  # None disables the time filter
}
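
To illustrate what the search_algorithm and max_collect_users options control, here is a minimal sketch of a graph traversal that switches between BFS and DFS. It is not the crawler's actual implementation: crawl_order is a hypothetical name, and the neighbors callback stands in for the API calls that fetch a user's followers and followings. The toy graph uses made-up account names.

```python
from collections import deque


def crawl_order(roots, neighbors, search_algorithm="BFS", max_collect_users=1000):
    """Return the order in which user nodes would be visited.

    `neighbors` is a hypothetical stand-in for the API calls that fetch
    a user's followers and followings.
    """
    if search_algorithm not in ("BFS", "DFS"):
        raise ValueError("search_algorithm must be 'BFS' or 'DFS'")
    frontier = deque(dict.fromkeys(roots))  # drop duplicate roots, keep order
    seen = set(frontier)
    visited = []
    while frontier and len(visited) < max_collect_users:
        # BFS takes the oldest queued user (FIFO); DFS takes the newest (LIFO).
        user = frontier.popleft() if search_algorithm == "BFS" else frontier.pop()
        visited.append(user)
        for other in neighbors(user):
            if other not in seen:
                seen.add(other)
                frontier.append(other)
    return visited


# Toy graph with made-up account names.
graph = {"a": ["b", "c"], "b": ["d"], "c": ["e"], "d": [], "e": []}
bfs = crawl_order(["a"], lambda u: graph.get(u, []), "BFS")
dfs = crawl_order(["a"], lambda u: graph.get(u, []), "DFS")
```

With BFS the crawl stays close to the root nodes and widens gradually; with DFS it follows one chain of accounts as far as it can before backtracking. In both cases the loop stops once max_collect_users nodes have been visited.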
