Large blocks of Crawled (200) #343

Open
@Macedonialapadian

Description

@Macedonialapadian

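As the screenshot shows, when comments are crawled normally, every log line has the form `DEBUG: Scraped from <200 URL>{content}`. In the screenshot, however, there are large blocks of lines of the form `DEBUG: Crawled (200) (referer: None)` instead. Once this happens, comment.py usually finishes very quickly (possibly because it simply skips the Weibo posts it cannot crawl).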
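For context, Scrapy logs `Crawled (200)` for each downloaded response and `Scraped from` for each item a callback yields, so a block of `Crawled` lines with no `Scraped` lines means the callback produced no items from those responses. Below is a minimal sketch of how such empty responses could be made visible. It assumes the comment endpoint returns JSON with a `data` list; the function name `extract_comments` and the field names are hypothetical and not taken from comment.py.

```python
import json
import logging

logger = logging.getLogger("comment_debug")


def extract_comments(body: str, url: str) -> list:
    """Return the comment list from a JSON body, warning when it is empty.

    Assumption: the payload looks like {"data": [...]}; adjust the key to
    whatever the real endpoint returns.
    """
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        # A 200 response that is not JSON may be a login or error page.
        logger.warning("Non-JSON 200 response from %s: %r", url, body[:200])
        return []
    comments = payload.get("data") if isinstance(payload, dict) else None
    if not isinstance(comments, list) or not comments:
        # Status 200 but nothing to yield: this is the "Crawled only" case.
        logger.warning("200 response with no comments from %s: %r", url, body[:200])
        return []
    return comments
```

Calling a helper like this from the parse callback and logging the first part of the body would show whether the affected responses are empty lists, error payloads, or non-JSON pages.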

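I modified comment.py so that tweet_id is added to the record for each corresponding comment (see attachment).
I also changed the concurrency in setting.py from 16 to 8, and raised the upper bound of the random request delay from 1 to 5.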
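The tweet_id change described above could look roughly like the following. This is only a sketch, since the actual edit is in the attached comment.py; `attach_tweet_id` and the sample field names are hypothetical.

```python
def attach_tweet_id(comments, tweet_id):
    """Yield a copy of each comment dict with the parent tweet_id added."""
    for comment in comments:
        item = dict(comment)  # copy so the parsed payload is not mutated
        item["tweet_id"] = tweet_id
        yield item


# Usage with made-up sample data:
for item in attach_tweet_id([{"content": "example"}], "1234567890"):
    print(item)
```

A helper of this shape only yields items when the comment list is non-empty, so on its own it would not explain or fix responses that contain no comments.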

Screenshot 2024-09-22 15 02 55 [comment.py.zip](https://github.com/user-attachments/files/17088611/comment.py.zip)
