A Python script that fetches Instagram posts from a specified account using a GraphQL endpoint. It supports batching, pagination, and resumable downloads, and saves the results to a JSON file. It also handles text-encoding issues when loading JSON files.
- Batch Fetching: Pull a specified number of Instagram posts in batches.
- Pagination: Uses after_cursor to fetch subsequent pages automatically.
- Resumable: Keeps track of the last fetched cursor in a state file (resume_file) so you can resume fetching later.
- Flexible Encoding: Includes error handling for different text encodings (utf-8, latin-1, etc.).
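The batching and pagination features above can be illustrated with a minimal sketch of a cursor-driven loop. This is not the script's actual code: `collect_all_posts`, `fake_fetch_page`, and the `(posts, next_cursor)` return shape are hypothetical names and assumptions chosen for illustration, and the fake fetcher stands in for the real network call so the example runs offline.

```python
from typing import Callable, List, Optional, Tuple

# Assumed page shape: (list of posts, cursor for the next page or None).
Page = Tuple[List[dict], Optional[str]]


def collect_all_posts(
    fetch_page: Callable[[Optional[str], int], Page],
    batch_size: int = 12,
    start_cursor: Optional[str] = None,
    max_pages: int = 1000,
) -> List[dict]:
    """Follow cursors page by page until no further page is reported."""
    posts: List[dict] = []
    cursor = start_cursor
    for _ in range(max_pages):  # hard cap guards against an endless loop
        batch, next_cursor = fetch_page(cursor, batch_size)
        posts.extend(batch)
        if not batch or next_cursor is None:
            break
        cursor = next_cursor
    return posts


# Hypothetical in-memory data source used instead of a real request.
FAKE_POSTS = [{"id": str(i)} for i in range(30)]


def fake_fetch_page(after_cursor: Optional[str], post_count: int) -> Page:
    """Return one slice of FAKE_POSTS; the cursor is simply the next index."""
    start = int(after_cursor) if after_cursor else 0
    batch = FAKE_POSTS[start:start + post_count]
    end = start + len(batch)
    next_cursor = str(end) if end < len(FAKE_POSTS) else None
    return batch, next_cursor


if __name__ == "__main__":
    print(len(collect_all_posts(fake_fetch_page, batch_size=12)))
```

Passing a saved cursor as `start_cursor` picks up from that page instead of the beginning, which is the idea behind the resumable behavior.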
- Python 3.7 or later
- Requests library:
  pip install requests
- A valid Instagram GraphQL doc_id or endpoint (the example uses doc_id="7898261xxxxxxxxxxx", but this can change).
- Clone the repository (or download the script files):
  git clone https://github.com/hitthecodelabs/InstagramPostsFetcher.git
- Install Dependencies:
  cd InstagramPostsFetcher
  pip install -r requirements.txt
  If you don't have a requirements file, just ensure requests is installed:
  pip install requests
- Update the Code:
  - Replace YOUR_ACCESS_TOKEN with your token if needed (or remove the Authorization header if you’re not using it).
  - Make sure your doc_id is correct or aligned with the currently valid Instagram GraphQL endpoints.
- Run the Script:
  python ig_posts_fecther.py
  By default, it will:
  - Fetch posts from the specified username (username).
  - Save them to a JSON file (output_file, e.g., instagram_posts_<username>.json).
  - Keep track of pagination state in a resume file (resume_file, e.g., resume_state_<username>.json).
- Check the Output:
  - After running, open or parse the JSON file to verify the fetched posts.
  - If the script stops, whether you interrupt it or it hits an error, just run it again to resume where it left off.
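The resume behavior described in the steps above could work roughly as follows. This is a sketch under stated assumptions, not the repository's implementation: the function names mirror the ones documented in this README, but the JSON layout of the state file (`after_cursor` and `post_count` keys) and the `(None, 0)` default for a missing file are guesses made for illustration.

```python
import json
import os


def save_resume_state(resume_file, after_cursor, post_count):
    """Persist the last cursor and running post count (assumed layout)."""
    state = {"after_cursor": after_cursor, "post_count": post_count}
    tmp_path = resume_file + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as f:
        json.dump(state, f)
    # Write to a temp file first, then swap it in, so an interrupted
    # run does not leave a half-written state file behind.
    os.replace(tmp_path, resume_file)


def load_resume_state(resume_file):
    """Return (after_cursor, post_count); (None, 0) means start fresh."""
    if not os.path.exists(resume_file):
        return None, 0
    with open(resume_file, "r", encoding="utf-8") as f:
        state = json.load(f)
    return state.get("after_cursor"), state.get("post_count", 0)
```

Saving the state after every successful batch means a rerun loses at most the batch that was in flight when the script stopped.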
- fetch_instagram_posts(username, after_cursor, post_count)
  Makes a GET request to the Instagram GraphQL endpoint to fetch a specific batch of posts.
- load_resume_state(resume_file)
  Loads the last pagination state from a JSON file so you can resume from the last cursor.
- save_resume_state(resume_file, after_cursor, post_count)
  Saves the pagination state after each successful fetch.
- load_existing_posts(output_file)
  Loads previously saved posts so you can append newly fetched data.
- iterate_and_save_posts(username, output_file, resume_file, batch_size)
  Orchestrates fetching in batches until there are no more posts or an error occurs.
- load_json_file(file_path)
  Demonstrates safe reading of JSON files with fallback encodings.
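Reading JSON with fallback encodings, as the last function in the list does, might look like the sketch below. The name `load_json_with_fallback` and the particular encoding order are assumptions for illustration; the script's own `load_json_file` may differ. Note that latin-1 can decode any byte sequence, so it only makes sense as the final fallback.

```python
import json


def load_json_with_fallback(file_path, encodings=("utf-8", "utf-8-sig", "latin-1")):
    """Try each encoding in order and return the first successful parse."""
    last_error = None
    for encoding in encodings:
        try:
            with open(file_path, "r", encoding=encoding) as f:
                return json.load(f)
        except (UnicodeDecodeError, json.JSONDecodeError) as exc:
            # Wrong encoding (or a BOM that plain utf-8 does not strip):
            # remember the error and move on to the next candidate.
            last_error = exc
    raise ValueError(
        f"Could not parse {file_path} with encodings {encodings}"
    ) from last_error
```

A file that is not valid JSON under any of the listed encodings ends in a `ValueError` that chains the last underlying error, rather than failing silently.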
- If Invalid Token errors occur, check your token or authorization method.
- If the doc_id is invalid or outdated, you may see errors from Instagram's GraphQL endpoint. Make sure to update to a valid endpoint or doc_id.
- Fork the project.
- Create your feature branch (git checkout -b feature/NewFeature).
- Commit your changes (git commit -m 'Add a new feature').
- Push to the branch (git push origin feature/NewFeature).
- Open a Pull Request.
This project is licensed under the MIT License. You are free to modify and use the code. Refer to the license file for details.