Error While Scraping Comments: json.decoder.JSONDecodeError #1097

mscarl opened this issue Apr 11, 2024 · 1 comment
mscarl commented Apr 11, 2024

Hi, I'm relatively new to Python and I've been trying to scrape some Facebook comments using your code. When scraping a couple of posts I've gotten the following error:

Traceback (most recent call last):
  File "C:\Users\Me\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.12_qbz5n2kfra8p0\LocalCache\local-packages\Python312\site-packages\facebook_scraper\utils.py", line 279, in safe_consume
    for item in generator:
  File "C:\Users\Me\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.12_qbz5n2kfra8p0\LocalCache\local-packages\Python312\site-packages\facebook_scraper\extractors.py", line 1139, in extract_comment_replies
    data = json.loads(response.text[prefix_length:])  # Strip 'for (;;);'
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.12_3.12.1008.0_x64__qbz5n2kfra8p0\Lib\json\__init__.py", line 346, in loads
    return _default_decoder.decode(s)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
json.decoder.JSONDecodeError: Extra data: line 1 column 31109 (char 31108)
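For context, "Extra data" means the decoder found a complete JSON value followed by additional characters it was not asked to parse; a minimal reproduction (illustrative only, not taken from facebook_scraper):

import json

# A valid JSON object followed by trailing bytes triggers "Extra data"
json.loads('{"ok": true} <html>blocked</html>')
# raises json.decoder.JSONDecodeError: Extra data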

Here's the code I'm using to scrape comments:

from facebook_scraper import get_posts, set_cookies
import pandas as pd

# Post URL(s) to scrape and the exported browser cookies file
post_ids = ["https://www.facebook.com/EurovisionSongContest/posts/pfbid02uj7TGbujB8KrytL9H1qnTqYGV2xFnDnLzvN4ntGXopcusdzcawPzwT78NUziMQoql"]
cookies = "www.facebook.com_cookies.txt"
set_cookies(cookies)

options = {"comments": True, "progress": True, "allow_extra_requests": True}

def format_comment(c):
    # Keep only the fields we want in the CSV
    return {
        "comment_id": c["comment_id"],
        "comment_text": c["comment_text"],
    }

fb_comments = []
post = next(get_posts(post_urls=post_ids, options=options))
for comment in post["comments_full"]:
    fb_comments.append(format_comment(comment))
    for reply in comment["replies"]:
        fb_comments.append(format_comment(reply))
pd.DataFrame(fb_comments).to_csv("Winner_2022.csv", index=False)

Any help would be greatly appreciated.
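Not a fix for the malformed response itself, but a defensive sketch of the same loop (assuming the decode error can propagate out of the comment generator; depending on the version, the library may instead only log it and hand back empty replies), so that whatever was collected before the failure still reaches the CSV:

import json

fb_comments = []
try:
    post = next(get_posts(post_urls=post_ids, options=options))
    for comment in post["comments_full"]:
        fb_comments.append(format_comment(comment))
        # "replies" can be missing or empty when reply fetching fails
        for reply in comment.get("replies", []):
            fb_comments.append(format_comment(reply))
except json.JSONDecodeError as exc:
    print(f"Stopped early on a malformed response: {exc}")

# Write whatever was collected, even if the scrape stopped early
pd.DataFrame(fb_comments).to_csv("Winner_2022.csv", index=False)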

@conventoangelo

Just had this error as well. My guess is that it happens when you've scraped too much and Facebook blocks your IP address, possibly returning a JSON error message that is longer than expected. I haven't printed the raw JSON in the terminal yet to check whether that's true; I'm just inferring from decoder.py. What worked for me was simply connecting through a server in a different country, as switching to a different server within my VPN alone did not work. Hope this helps!
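One way to test that hypothesis (a diagnostic sketch, not part of facebook_scraper; response_text is a hypothetical stand-in for the raw body the library sees after stripping the 'for (;;);' prefix) is to parse only the first JSON value and inspect whatever trails it:

import json

def parse_first_json(response_text):
    # raw_decode parses the leading JSON value and reports where it stopped,
    # so trailing content (e.g. a block or rate-limit notice) doesn't raise "Extra data"
    decoder = json.JSONDecoder()
    data, end = decoder.raw_decode(response_text)
    trailing = response_text[end:].strip()
    if trailing:
        print("Trailing bytes after the JSON payload:", trailing[:200])
    return data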
