Skip to content

Sample project for using stylometry to deanonymize Twitter account author.

License

Notifications You must be signed in to change notification settings

salhasalman/stylometry

 
 

Repository files navigation

Stylometry

Sample project for using stylometry to deanonymize Twitter account author.

Instructions

1. Install the dependencies and run Python:

    $ pip install tweepy numpy unidecode nltk scipy sklearn
    $ python3

In Python, import nltk and download Model punkt. ​

    >>> import nltk
    >>> nltk.download()

2. Download files:

    $ git clone https://github.com/ViliamV/stylometry.git
    $ cd stylometry/

3. Get Twitter API credentials

  • Follow these steps.
  • Input credentials into twitter-API.txt

4. Download tweets

  • Create accounts.txt in main directory and put there account's names to download, one in each line. Put the unknown author's account last.
  • Create directory data in main directory.
  • Run tweet-downloader.py and wait. Due to Twitter API speed, it might take a while.
  • Verify if data contains downloaded tweets.

5. Run stylometry

  • Edit classification.py and change value UNKNOWN (line 28) to unknown author's account.
    UNKNOWN="example_account"
  • Run classification.py.

About classification

This code uses Bag of Words model for extracting features from the text. A great introduction for implementing this model can be found here.

The code also uses Czech stopwords and Czech tokenizer, however, it is quite simple to change it.

About

Sample project for using stylometry to deanonymize Twitter account author.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 100.0%