Install the required packages using pip:
pip install -r requirements.txt
The application additionally needs the following NLTK data packages:
punkt
- Punkt Tokenizer Models
For installation instructions please visit http://www.nltk.org/data.html.
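Alternatively, the models can be fetched from the command line with NLTK's standard downloader, for example:

# Download the Punkt tokenizer models into the default NLTK data directory
python -c "import nltk; nltk.download('punkt')"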
python cli.py [options]
The first argument switches between the crawl, analyze, and evaluate CLIs:

crawl
- Crawls tweets from the most popular Twitter users (default: 100) and stores them on disk.
analyze
- Analyzes social media status updates in order to determine whether an account was compromised or not.
evaluate
- Evaluates the anomaly detection approach with cross-validation.
-o / --output-path OUTPUT_PATH
- The output path of the generated dataset.
--user-limit USER_LIMIT
- The maximum number of accounts to crawl.
--limit LIMIT
- The maximum number of status updates per user to crawl (default: 100).
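For example (a sketch combining the flags above, using the `-a crawl` form from the examples below; the output path is a placeholder):

# Crawl the 100 most popular users, fetching up to 200 status updates each
python cli.py -a crawl -o data/tweets.csv --user-limit 100 --limit 200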
-ut / --user-data-source USER_DATA_SOURCE
- The data source for tweets of the user to be analyzed. Possible values are fth, mp, and twitter.
-uu / --user-twitter-id USER_TWITTER_ID
- The ID of the Twitter user whose status updates should be analyzed.
-up / --user-dataset-path USER_DATASET_PATH
- The path of the dataset for the user data source.
-et / --ext-data-source EXT_DATA_SOURCE
- The data source for external tweets not written by the user. Possible values are fth, mp, and twitter.
-ep / --ext-dataset-path EXT_DATASET_PATH
- The path of the dataset for the external data source.
-c / --classifier-type CLASSIFIER_TYPE
- The type of classifier to be trained. Possible values are decision_tree, one_class_svm, isolation_forest, and perceptron.
--no-scaling
- Disable feature scaling.
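For example, a user from a local dataset can be checked against the 'Follow the Hashtag' corpus like this (a sketch only; both dataset paths are placeholders and the exact flag combination may vary):

# Analyze a user from a local 'mp' dataset against external 'fth' tweets using a decision tree
python cli.py analyze -ut mp -up data/mp_dataset.csv -et fth -ep data/follow_the_hashtag_usa.csv -c decision_tree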
-t / --data-source DATA_SOURCE
- The data source for tweets to be used for cross-validation. Possible values are fth, mp, and twitter.
-p / --dataset-path DATASET_PATH
- The path of the dataset to be used for cross-validation.
-c / --classifier-type CLASSIFIER_TYPE
- The type of classifier to be trained. Possible values are decision_tree, one_class_svm, isolation_forest, and perceptron.
--no-scaling
- Disable feature scaling.
# Crawl 50 most popular users
python cli.py -a crawl -o output.csv --user-limit 50
# Analyze the Twitter account of sebastian_kliem against status updates from the 'Follow the Hashtag' dataset using the perceptron classifier
python cli.py analyze -ut twitter -uu sebastian_kliem -et fth -ep data/follow_the_hashtag_usa.csv -c perceptron
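# Evaluate the isolation forest classifier with cross-validation on the 'Follow the Hashtag' dataset
# (a sketch using the documented evaluate flags; the dataset path mirrors the example above)
python cli.py evaluate -t fth -p data/follow_the_hashtag_usa.csv -c isolation_forest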
# Run the app locally (DO NOT use this in production)
./run_app_dev.sh
The app takes a Twitter URL of a specific user as input and uses the perceptron classifier.