A 'smol' program that crawls following/followers/statuses count data from a Twitter account profile page using Selenium, and puts the crawled data into a MySQL database using PyMySQL.
The purpose of this program is to record the follower count daily and see how it changes over time. MAYBE THIS IS NOT PRODUCTION-READY, so use this with caution!
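Since the counts are scraped from the profile page rather than an API, they arrive as display strings (e.g. "1,234" or abbreviated forms like "12.5K"), which need normalizing before they can go into unsigned int columns. The helper below is an illustrative sketch, not the project's actual code:

```python
# Hedged sketch: normalize a displayed Twitter count string to an int.
# The "K"/"M" abbreviations are an assumption about how the profile
# page renders large counts; this is not taken from the project source.
def parse_count(text: str) -> int:
    """Convert a displayed count such as '1,234' or '12.5K' to an int."""
    text = text.strip().replace(",", "")
    multipliers = {"K": 1_000, "M": 1_000_000}
    suffix = text[-1].upper() if text else ""
    if suffix in multipliers:
        return int(float(text[:-1]) * multipliers[suffix])
    return int(text)

# parse_count("1,234")  -> 1234
# parse_count("12.5K")  -> 12500
```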
YES, I HAD. But one day Twitter suspended my API application, even though I didn't overuse or abuse it! This is probably an Elon thing.
The source code of the original implementation, which accesses the Twitter API through python-twitter, is stored in the `old` branch.
A `Dockerfile` is ready in both the current and the old (original) source tree.
To build:
$ cd <root-directory-of-source>
$ docker build -t twitter-account-data-crawler:latest .
After the build, run:
$ docker run -d \
--name twitter-account-data-crawler \
-v <path-of-config.yaml>:/app/config/config.yaml \
twitter-account-data-crawler
You have to prepare a configuration file (`config.yaml`). Please refer to the example config file and create your own.
If you're using Podman, just replace `docker` with `podman` in the command line.
You may still run the program without Docker or other OCI-compliant runtimes.
To get this working:
$ cd <root-directory-of-source>
# Install requirements
$ pip install -r requirements.txt
# and run!
$ python index.py
The configuration file (`config.yaml`) should exist in the `config` folder.
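The bundled example config defines the real keys this project expects; purely as an illustration of what a daily crawler like this typically needs, a config might look something like the sketch below (all keys here are hypothetical, not the project's actual schema):

```yaml
# Hypothetical illustration only -- check the example config file
# shipped with the repository for the real keys.
database:
  host: localhost
  user: crawler
  password: secret
  database: twitter_stats
targets:
  - screen_name: some_account
    table: account_track_table
```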
Currently only MySQL (and probably MySQL-based DBMSes like MariaDB) is supported.
Creating one table per target account is recommended.
The table should have at least these columns:
- `date`: type date
- `following_count`: type int, unsigned
- `follower_count`: type int, unsigned
- `tweet_count`: type int, unsigned
An example SQL statement that creates a table with these columns:
CREATE TABLE `account_track_table` (
`date` date NOT NULL,
`following_count` int UNSIGNED NOT NULL,
`follower_count` int UNSIGNED NOT NULL,
`tweet_count` int UNSIGNED NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
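For reference, here is a minimal sketch (not the project's actual code) of how one daily snapshot could be written into the example table with PyMySQL. The table and column names match the `CREATE TABLE` statement above; the connection parameters in the usage comment are placeholders:

```python
# Hedged sketch of inserting one daily row via PyMySQL.
# Requires a live MySQL server to actually run store_counts().
import datetime

INSERT_SQL = (
    "INSERT INTO `account_track_table` "
    "(`date`, `following_count`, `follower_count`, `tweet_count`) "
    "VALUES (%s, %s, %s, %s)"
)

def build_row(following, followers, tweets, day=None):
    """Build the parameter tuple for one daily snapshot."""
    day = day or datetime.date.today()
    return (day, following, followers, tweets)

def store_counts(connection, following, followers, tweets):
    """Insert one day's counts using an open PyMySQL connection."""
    with connection.cursor() as cursor:
        cursor.execute(INSERT_SQL, build_row(following, followers, tweets))
    connection.commit()

# Usage (placeholder credentials, needs a reachable MySQL server):
# import pymysql
# conn = pymysql.connect(host="localhost", user="user",
#                        password="secret", database="twitter_stats")
# store_counts(conn, 123, 456, 7890)
```

Using parameterized `%s` placeholders (rather than string formatting) lets PyMySQL escape the values and keeps the insert safe.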