
FmKnight/Selenium-Twitter-Scraper


Selenium Twitter Scraper

This Twitter scraper uses Selenium to crawl data from Twitter without authentication.

Features

  • Uses Redis to avoid crawling duplicate tweets
  • Uses MySQL to store data
  • Uses the Python ORM SQLAlchemy for database access
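The project stores crawled data in MySQL through SQLAlchemy, but its models are not shown in this README. As a rough sketch of the storage step, the snippet below swaps in Python's built-in `sqlite3` for MySQL/SQLAlchemy so it runs without a database server. The table name `tweets`, the column `like_count`, and the helpers `open_db` and `save_tweet` are hypothetical; the columns simply mirror the tweet fields listed under Run.

```python
import sqlite3

# Stand-in for the project's MySQL + SQLAlchemy storage (assumption: sqlite3
# is used here only so the sketch runs without a database server).
TWEET_FIELDS = ("user_name", "user_id", "date", "content", "reply", "retweet", "like")


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open a database and create a hypothetical `tweets` table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS tweets (
               id         INTEGER PRIMARY KEY AUTOINCREMENT,
               user_name  TEXT,
               user_id    TEXT,
               date       TEXT,
               content    TEXT,
               reply      INTEGER,
               retweet    INTEGER,
               like_count INTEGER  -- "like" is an SQL keyword, so the column is renamed
           )"""
    )
    return conn


def save_tweet(conn: sqlite3.Connection, tweet: dict) -> None:
    """Insert one tweet dict whose keys match TWEET_FIELDS, using ? placeholders."""
    conn.execute(
        "INSERT INTO tweets (user_name, user_id, date, content, reply, retweet, like_count) "
        "VALUES (?, ?, ?, ?, ?, ?, ?)",
        tuple(tweet[field] for field in TWEET_FIELDS),
    )
    conn.commit()
```

Usage: `db = open_db()` followed by `save_tweet(db, {...})` with one dict per crawled tweet.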

Install

To use the latest version, install from source. To install Selenium-Twitter-Scraper from source, follow these steps:

Linux and macOS:

git clone git@github.com:FmKnight/Selenium-Twitter-Scraper.git
cd Selenium-Twitter-Scraper
pip3 install -r requirements.txt

Run

1. Crawl tweets

tweet_craweler.py: run this file to crawl tweets matching specific keywords. Each tweet record contains the following fields:

  • user_name
  • user_id
  • date
  • content
  • reply
  • retweet
  • like
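Twitter's web UI typically displays reply, retweet, and like counts as abbreviated strings such as `1.2K` or `3M`. Assuming the scraper reads those display strings from the page, a hypothetical helper like `parse_count` below could normalize them to integers before storage. It uses `Decimal` rather than `float` so the scaled value is computed exactly before truncation to `int`.

```python
from decimal import Decimal

# Multipliers for abbreviated suffixes shown in Twitter's UI (assumed set).
_SUFFIXES = {"K": 1_000, "M": 1_000_000, "B": 1_000_000_000}


def parse_count(text: str) -> int:
    """Convert a display count such as '1.2K', '3M' or '1,024' to an int (hypothetical helper)."""
    text = text.strip().replace(",", "")
    if not text:
        return 0  # assumption: an empty count element means zero
    suffix = text[-1].upper()
    if suffix in _SUFFIXES:
        # Decimal keeps the scaling exact, avoiding binary floating-point error
        return int(Decimal(text[:-1]) * _SUFFIXES[suffix])
    return int(text)
```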

2. Crawl user info

user_info_craweler.py: run this file to crawl a specific user's profile info. The record contains the following fields:

  • following
  • followers
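A minimal sketch of a record for these two fields, plus the profile URL a Selenium driver would open to read them. The `UserInfo` class, the `profile_url` helper, and the assumption that `user_id` is an `@`-prefixed screen name are illustrative, not taken from `user_info_craweler.py`.

```python
from dataclasses import dataclass


@dataclass
class UserInfo:
    """Hypothetical record for the fields crawled by user_info_craweler.py."""
    user_id: str    # assumed to be the @-prefixed screen name
    following: int  # number of accounts the user follows
    followers: int  # number of accounts following the user


def profile_url(user_id: str) -> str:
    """Build the profile page URL a Selenium driver would open (assumed URL scheme)."""
    return "https://twitter.com/" + user_id.lstrip("@")
```

With a Selenium WebDriver instance, the page would then be loaded with `driver.get(profile_url("@example"))` before reading the two counts.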


Change Log

v0.6.3 (2021/04/22 15:50)

  • Changed tweet duplicate detection from user_id + time to a SHA-256 digest of the tweet content
  • Added logging to monitor the running process
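The v0.6.3 change keys duplicate detection on a SHA-256 digest of the tweet content. The sketch below shows that idea with `hashlib`; an in-memory `set` stands in for the Redis set the project uses, since redis-py's `sadd(key, member)` likewise reports whether a member was newly added. `tweet_digest` and `SeenTweets` are hypothetical names.

```python
import hashlib


def tweet_digest(content: str) -> str:
    """Return the hex SHA-256 digest of the tweet text (UTF-8 encoded)."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


class SeenTweets:
    """In-memory stand-in for a Redis set of digests (hypothetical class).

    add() returns True only the first time a given content is seen, similar to
    redis-py's sadd(key, member) returning 1 for a newly added member.
    """

    def __init__(self) -> None:
        self._digests = set()  # hex digests seen so far

    def add(self, content: str) -> bool:
        digest = tweet_digest(content)
        if digest in self._digests:
            return False  # duplicate: skip storing this tweet
        self._digests.add(digest)
        return True
```

One consequence of hashing the content alone: identical text posted by two different accounts produces the same digest and is treated as a duplicate.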

v0.6.2 (2021/04/21 21:50)

  • Changed crawling from a single one-time pass to time-span-based crawling
  • Refactored the run process and added more condition checks
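The README does not say how the time spans are chosen. Assuming one-day windows and Twitter's `since:`/`until:` search operators, a time-span-based crawl could iterate over date ranges as sketched below. `daily_spans` and `search_url` are hypothetical, and the `f=live` parameter (the "Latest" results tab) is an assumption about the search URL format.

```python
from datetime import date, timedelta
from typing import Iterator, Tuple
from urllib.parse import quote


def daily_spans(start: date, end: date) -> Iterator[Tuple[date, date]]:
    """Yield (since, until) pairs covering [start, end) one day at a time."""
    day = start
    while day < end:
        nxt = min(day + timedelta(days=1), end)
        yield day, nxt
        day = nxt


def search_url(keyword: str, since: date, until: date) -> str:
    """Build a keyword search URL restricted to one span (assumed URL format)."""
    query = f"{keyword} since:{since.isoformat()} until:{until.isoformat()}"
    return "https://twitter.com/search?q=" + quote(query) + "&f=live"
```

Usage: `for since, until in daily_spans(date(2021, 4, 20), date(2021, 4, 22)): driver.get(search_url("selenium", since, until))`, where `driver` is a Selenium WebDriver.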

v0.6.1 (2021/04/20 21:50)

  • Crawl tweets matching specific keywords
  • Crawl a specific user's info

About

Use Selenium to get data from Twitter
