Pinpoint

⚠️ This repository is based on PhD research that seeks to identify radicalisation on online platforms. As a result, text, themes, and content relating to far-right extremism are present in this repository. Please continue with care. ⚠️ Samaritans - Call 116 123 | ACT Early | actearly.uk | Prevent advice line 0800 011 3764

📍 Pinpoint is a suite of functionality for building and using a binary classifier for the identification of extremist content. 💻

Pinpoint is a suite of functionality for building a Gaussian classifier for the identification of far-right extremist content. This tooling builds on the methodology in the paper "Understanding the Radical Mind: Identifying Signals to Detect Extremist Content on Twitter" by Mariam Nouh, Jason R.C. Nurse, and Michael Goldsmith.

Installation

python -m pip install git+https://github.com/CartographerLabs/Pinpoint.git

Datasets

Parler dataset

A dataset was acquired from "A Large Open Dataset from the Parler Social Network". This dataset was split into two separate datasets using the Log-Likelihood tooling from the Parler Toolbox repository. To do this, 100 posts in the dataset were manually labelled as either violent extremist or non-extremist, and the tooling was then used to identify the top 30 keywords relating to violent far-right extremism. A subset of these is shown below:

genocidal
fire
destroyers
democraticnazi
fucker
tribunals
invoke
squad
punch
tyrannical
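
As an illustration of how log-likelihood keyword ranking can work, the sketch below scores each word by how much more often it appears in the manually labelled extremist posts than in the non-extremist posts. It uses a common log-likelihood (G2) keyness formulation; the Parler Toolbox tooling may compute its scores differently. The function names `log_likelihood` and `top_keywords` and the whitespace tokenisation are assumptions made for this example, and both corpora are assumed to be non-empty.

```python
import math
from collections import Counter


def log_likelihood(freq_target, freq_reference, total_target, total_reference):
    """Return the G2 keyness score of one word across two corpora."""
    combined = freq_target + freq_reference
    total = total_target + total_reference
    expected_target = total_target * combined / total
    expected_reference = total_reference * combined / total
    score = 0.0
    # A zero observed frequency contributes nothing to the sum.
    if freq_target > 0:
        score += freq_target * math.log(freq_target / expected_target)
    if freq_reference > 0:
        score += freq_reference * math.log(freq_reference / expected_reference)
    return 2.0 * score


def top_keywords(target_posts, reference_posts, limit=30):
    """Rank words that are over-represented in target_posts (hypothetical helper)."""
    target_counts = Counter(w for p in target_posts for w in p.lower().split())
    reference_counts = Counter(w for p in reference_posts for w in p.lower().split())
    total_target = sum(target_counts.values())
    total_reference = sum(reference_counts.values())

    scored = []
    for word, freq in target_counts.items():
        ref_freq = reference_counts.get(word, 0)
        # Keep only words that are relatively more frequent in the target corpus.
        if freq * total_reference > ref_freq * total_target:
            score = log_likelihood(freq, ref_freq, total_target, total_reference)
            scored.append((word, score))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]
```

Calling `top_keywords(extremist_posts, non_extremist_posts, limit=30)` returns up to 30 `(word, score)` pairs, highest score first.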

Once these violent-extremist keywords had been aggregated, the dataset was split: text posts containing the keywords were labelled violent-far-right-extremist, and those without were labelled as the baseline. The text posts were then converted to CSV and marked up with the LIWC Text Analysis Engine.
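
The keyword-based split described above could be sketched roughly as follows. This is not the code used to build the dataset: the helper name `split_posts` is hypothetical, and the sketch assumes that a post counts as extremist when it contains at least one keyword as a whole, case-insensitive token.

```python
import re


def split_posts(posts, keywords):
    """Split posts into (extremist, baseline) lists by keyword presence."""
    keyword_set = {keyword.lower() for keyword in keywords}
    extremist, baseline = [], []
    for post in posts:
        # Tokenise on letters, digits, and apostrophes so that "punchline"
        # does not match the keyword "punch".
        tokens = set(re.findall(r"[a-z0-9']+", post.lower()))
        if tokens & keyword_set:
            extremist.append(post)
        else:
            baseline.append(post)
    return extremist, baseline
```

Matching whole tokens rather than substrings avoids false positives from longer words that happen to contain a keyword, at the cost of missing inflected or hashtagged forms.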

Stormfront dataset

The second dataset, used to develop a known radical corpus, was extracted from "Hate speech dataset from a white supremacist forum" and converted to CSV format.

Example Usage

from Pinpoint.FeatureExtraction import *
from Pinpoint.RandomForest import *

# Performs feature extraction from the provided Extremist, Counterpoise, and Baseline datasets.
extractor = feature_extraction(violent_words_dataset_location=r"datasets/swears",
                               baseline_training_dataset_location=r"datasets/far-right/LIWC2015 Results (Storm_Front_Posts).csv")

extractor.MAX_RECORD_SIZE = 250000

extractor.dump_training_data_features(
    feature_file_path_to_save_to=r"outputs/training_features.json",
    extremist_data_location=r"datasets/far-right/LIWC2015 Results (extreamist-messages.csv).csv",
    baseline_data_location=r"datasets/far-right/LIWC2015 Results (non-extreamist-messages.csv).csv")

# Trains a model off the features file created in the previous stage
model = random_forest()

model.RADICAL_LANGUAGE_ENABLED = True
model.BEHAVIOURAL_FEATURES_ENABLED = True
model.PSYCHOLOGICAL_SIGNALS_ENABLED = True

model.train_model(features_file=r"outputs/training_features.json",
                  force_new_dataset=True,
                  model_location=r"outputs/far-right-baseline.model")

model.create_model_info_output_file(location_of_output_file="outputs/far-right-baseline-output.txt",
                                    training_data_csv_location=r"outputs/training_features.json.csv")

Outputs

Once a model has been trained, it is pickled and saved as a re-loadable file in the tooling's output directory for future use. A text file is also created detailing the specifications and accuracy scores of the model. Examples of both are provided with the repository.
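
For readers unfamiliar with pickling, the sketch below shows the general save and reload round trip using Python's standard `pickle` module. It assumes the saved model is an ordinary pickle file; Pinpoint may wrap this step in its own methods, which are not shown here. The helper names `save_model` and `load_model` are hypothetical, and a plain dictionary stands in for a trained model.

```python
import pickle
import tempfile
from pathlib import Path


def save_model(model, path):
    """Serialise a model object to disk with pickle."""
    with open(path, "wb") as handle:
        pickle.dump(model, handle)


def load_model(path):
    """Reload a previously pickled model object."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


# Round trip with a stand-in object in a temporary directory.
with tempfile.TemporaryDirectory() as directory:
    model_path = Path(directory) / "example.model"
    save_model({"n_estimators": 100}, model_path)
    restored = load_model(model_path)
```

Unpickling a real model requires the classes it was built from (Pinpoint and its dependencies) to be importable, and pickle files should only be loaded from trusted sources because unpickling can execute arbitrary code.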
