This is a moderation bot that uses a Logistic Regression model in NLTK to classify posts as bannable, then writes/updates a JSON file to include an "is_bannable" label for each post.
- Create and activate a virtual environment containing the necessary modules by running `python -m venv .venv` and then `.venv/Scripts/activate` (Windows) or `source .venv/bin/activate` (macOS/Linux).
- To connect to your PostgreSQL database, add the connection details in the query.py file (a hypothetical connection sketch appears after this list).
- If you would like to change the automod model to cover different languages or to use less sample data, make those changes in automod_model.py and run the file to regenerate toxic_comment_classifier.pkl (a retraining sketch appears after this list). Note: this file ships fully trained on all of the data (which is why it is so large!). Re-downloading it would be a pain, so think carefully before doing so if you have not changed the model parameters.
- Then, with the necessary input JSON files added (and the output JSON path updated) in query.py, run `sh run_all.sh`.
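For reference, here is a minimal sketch of what the connection details in query.py might look like, assuming psycopg2 as the driver, a `post` table with `id` and `content` columns, and an output file named posts.json. Every name and credential below is a placeholder, not the repository's actual code:

```python
# Hypothetical sketch of query.py's export step (placeholder names throughout).
import json

import psycopg2  # assumed PostgreSQL driver

# Fill in your own connection details here.
conn = psycopg2.connect(
    host="localhost",
    port=5432,
    dbname="forum",
    user="moderator",
    password="change-me",
)

with conn, conn.cursor() as cur:
    # Assumes a "post" table with id and content columns; adjust to your schema.
    cur.execute("SELECT id, content FROM post;")
    posts = [{"post_id": pid, "content": text} for pid, text in cur.fetchall()]

with open("posts.json", "w", encoding="utf-8") as f:
    json.dump(posts, f, ensure_ascii=False, indent=2)
```

And a minimal retraining sketch for automod_model.py. The README describes the model as Logistic Regression in NLTK; for brevity this sketch calls scikit-learn's LogisticRegression directly (NLTK's SklearnClassifier is a thin wrapper around the same estimator). The CSV name, column names, and vectorizer settings are all assumptions:

```python
# Hypothetical retraining sketch; the real automod_model.py may differ.
import pickle

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Assumed training data: a CSV with "comment_text" and "toxic" columns.
data = pd.read_csv("train.csv")

# To use less sample data, train on a subset, for example:
# data = data.sample(n=50_000, random_state=0)

model = make_pipeline(
    TfidfVectorizer(max_features=50_000),
    LogisticRegression(max_iter=1000),
)
model.fit(data["comment_text"], data["toxic"])

# Overwrite the bundled model file.
with open("toxic_comment_classifier.pkl", "wb") as f:
    pickle.dump(model, f)
```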
This program takes connection details for a PostgreSQL database that has "post" fields and creates a JSON file of the post IDs and post content. The classifier reads this JSON for the post content and declares each post "bannable" or not, recording the result under a new "is_bannable" key for each post in the same JSON file!
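To make the labeling step concrete, here is a hypothetical sketch of it, assuming a scikit-learn-style pickled model and the posts.json name used in the earlier sketch; the file and field names are placeholders:

```python
# Hypothetical labeling step: classify each post and add "is_bannable".
import json
import pickle

with open("toxic_comment_classifier.pkl", "rb") as f:
    model = pickle.load(f)

with open("posts.json", "r", encoding="utf-8") as f:
    posts = json.load(f)

# Predict a bannable/not-bannable label for every post's content.
predictions = model.predict([post["content"] for post in posts])

for post, label in zip(posts, predictions):
    post["is_bannable"] = bool(label)  # assumes 0/1 class labels

# Write the labels back into the same JSON file.
with open("posts.json", "w", encoding="utf-8") as f:
    json.dump(posts, f, ensure_ascii=False, indent=2)
```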
The training data is Larry Freeman's multi-lingual set of 478,713 data points!