deep-learning-content-moderation

A curated list of resources for deep learning based content moderation: sensitive content detection, scene genre classification, and nudity, violence, and substance detection from text, audio, video, and image inputs.

citation

If you find this source useful, please consider citing it in your work as:

@INPROCEEDINGS{10193621,
  author={Akyon, Fatih Cagatay and Temizel, Alptekin},
  booktitle={2023 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW)}, 
  title={State-of-the-Art in Nudity Classification: A Comparative Analysis}, 
  year={2023},
  pages={1-5},
  keywords={Analytical models;Convolution;Conferences;Transfer learning;Benchmark testing;Transformers;Safety;content moderation;nudity detection;safety;transformers},
  doi={10.1109/ICASSPW59220.2023.10193621}}

@article{akyon2022contentmoderation,
  title={Deep Architectures for Content Moderation and Movie Content Rating},
  author={Akyon, Fatih Cagatay and Temizel, Alptekin},
  journal={arXiv},
  doi={10.48550/arXiv.2212.04533},
  year={2022}
}

datasets

movie and content moderation datasets

name	paper	year	url	input modality	task	labels
LSPD	pdf	2022	page	image, video	image/video classification, instance segmentation	porn, normal, sexy, hentai, drawings, female/male genital, female breast, anus
MM-Trailer	pdf	2021	page	video	video classification	age rating
Movienet	scholar	2021	page	image, video, text	object detection, video classification	scene level actions and places, character bboxes
Movie script severity dataset	pdf	2021	github	text	text classification	frightening, mild, moderate, severe
LVU	pdf	2021	page	video	video classification	relationship, place, like ratio, view count, genre, writer, year per movie scene
Violence detection dataset	scholar	2020	github	video	video classification	violent, not-violent
Movie script dataset	pdf	2019	github	text	text classification	violent or not
Nudenet	github	2019	archive.org	image	image classification	nude or not
Adult content dataset	pdf	2017	contact	image	image classification	nude or not
Substance use dataset	pdf	2017	first author	image	image classification	drug related or not
NDPI2k dataset	pdf	2016	contact	video	video classification	porn or not
Violent Scenes Dataset	springer	2014	page	video	video classification	blood, fire, gun, gore, fight
VSD2014	pdf	2014	download	video	video classification	blood, fire, gun, gore, fight
AIIA-PID4	pdf	2013	-	image	image classification	bikini, porn, skin, non-skin
NDPI800 dataset	scholar	2013	page	video	video classification	porn or not
HMDB-51	scholar	2011	page	video	video classification	smoke, drink

techniques

sensitive content detection

movie content rating

name	paper	year	model	features	datasets	tasks	context
Movies2Scenes: Learning Scene Representations Using Movie Similarities	scholar	2022	ViT-like video encoder + MLP	ViT-like video encoder embeddings	Private, Movienet, LVU	movie scene representation learning, video classification (sex, violence, drug-use)	movie scene content rating
Detection and Classification of Sensitive Audio-Visual Content for Automated Film Censorship and Rating	pdf	2022	CNN + GRU + MLP	CNN embeddings from video frames	Violence detection dataset	violent/non-violent classification from videos	movie scene content rating
Automatic parental guide ratings for short movies	page	2021	separate model for each task: concat + LSTM, object detector, one-class CNN embeddings	video frame pixel values, image embeddings, text	Nudenet, private dataset	profanity, violence, nudity, drug classification	movie content rating
From None to Severe: Predicting Severity in Movie Scripts	scholar	2021	multi-task pairwise ranking-classification network	GloVe, Bert and TextCNN text embeddings	Movie script severity dataset	rating classification (frightening, mild, moderate, severe)	movie content rating
A Case Study of Deep Learning-Based Multi-Modal Methods for Labeling the Presence of Questionable Content in Movie Trailers	scholar	2021	multi-modal + multi output concat+MLP	CNN+LSTM video features, Bert and DeepMoji text embeddings, MFCC audio features	MM-Trailer	rating classification (red, yellow, green)	movie trailer content rating
Automatic Parental Guide Scene Classification Menggunakan Metode Deep Convolutional Neural Network Dan Lstm	scholar	2020	3 CNN models for 3 modalities, multi-label dataset	CNN video and audio embeddings, LSTM text (subtitle) embeddings	private dataset	gore, nudity, drug, profanity classification from video and subtitle	movie scene content rating
Multimodal data fusion for sensitive scene localization	scholar	2019	meta-learning with Naive Bayes, SVM	MFCC and prosodic features from audio, HOG and TRoF features from images	Pornography-2k dataset, VSD2014	violent and pornographic scene localization from video	movie scene content rating
A Deep Learning approach for the Motion Picture Content Rating	scholar	2019	MLP + rule-based decision	InceptionV3 image embeddings	Violent Scenes Dataset, private dataset	violence (shooting, blood, fire, weapon) classification from video	movie scene content rating
Hybrid System for MPAA Ratings of Movie Clips Using Support Vector Machine	springer	2019	SVM	DCT features from image	private dataset	movie content rating classification from images	movie content rating
Inappropriate scene detection in a video stream	page	2017	SVM classifier + Lenet image classifier + rules-based decision	HoG and CNN features for image	private dataset	image classification: no/mild/high violence, safe/unsafe/pornography	movie frame content rating
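
Several entries above pair a frame-level classifier with a rule-based decision step. The sketch below illustrates that general pattern only: it aggregates per-frame violence probabilities into a scene-level label. The threshold values, the label names, and the `rate_scene` / `rate_movie` helpers are hypothetical and are not taken from any of the listed papers.

```python
from typing import Dict, List

# Hypothetical thresholds, chosen only for illustration.
MILD_THRESHOLD = 0.3
SEVERE_THRESHOLD = 0.7
MIN_FLAGGED_RATIO = 0.2  # fraction of frames that must reach a threshold


def rate_scene(frame_scores: List[float]) -> str:
    """Map per-frame violence probabilities to a scene-level label."""
    if not frame_scores:
        return "none"
    n = len(frame_scores)
    severe_ratio = sum(s >= SEVERE_THRESHOLD for s in frame_scores) / n
    mild_ratio = sum(s >= MILD_THRESHOLD for s in frame_scores) / n
    # Rules are checked from most to least severe.
    if severe_ratio >= MIN_FLAGGED_RATIO:
        return "severe"
    if mild_ratio >= MIN_FLAGGED_RATIO:
        return "mild"
    return "none"


def rate_movie(scenes: Dict[str, List[float]]) -> Dict[str, str]:
    """Apply the scene rule to every scene of a movie."""
    return {name: rate_scene(scores) for name, scores in scenes.items()}
```

Requiring a minimum ratio of flagged frames, rather than reacting to a single frame, is one simple way to reduce the effect of isolated false positives; the right ratio would have to be tuned on validation data.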

content moderation

name	paper	year	model	features	datasets	tasks	context
State-of-the-Art in Nudity Classification: A Comparative Analysis	ieee	2023	CNN, Transformers	EfficientNet, ViT, ConvNeXT image embeddings	LSPD, Nudenet, NDPI2k	nudity classification from images	general content moderation
Reliable Decision from Multiple Subtasks through Threshold Optimization: Content Moderation in the Wild	scholar	2022	novel threshold optimization tech. (TruSThresh)	prediction scores	UnSmile (Korean hatespeech dataset)	optimum threshold prediction	social media content moderation
On-Device Content Moderation	scholar	2021	mobilenet v3 + SSD object detector	mobilenet v3 image embeddings	private dataset	object detection + nudity classification from images	on-device content moderation
Gore Classification and Censoring in Images	scholar	2021	ensemble of CNN + MLP	mobilenet v2, densenet, vgg16 image embeddings	private dataset	gore classification from images	general content moderation
Automated Censoring of Cigarettes in Videos Using Deep Learning Techniques	scholar	2020	CNN + MLP	inception v3 image embeddings	private dataset	cigarette classification from video	general content moderation
A Multimodal CNN-based Tool to Censure Inappropriate Video Scenes	scholar	2019	CNN + SVM	InceptionV3 image embeddings, AudioVGG audio embeddings	private dataset	inappropriate (nudity+gore) classification from video	general video content moderation
A baseline for NSFW video detection in e-learning environments	scholar	2019	concat + SVM, MLP	InceptionV3 image embeddings, AudioVGG audio embeddings	YouTube8M, NDPI, Cholec80	nudity classification from video	e-learning content moderation
Bringing the kid back into youtube kids: Detecting inappropriate content on video streaming platforms	scholar	2019	CNN + LSTM (late fusion)	CNN based encoder for image, video and audio spectrograms	private dataset	video classification: original, fake explicit, fake violent	social media content moderation
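
Some of the multimodal entries above combine per-modality predictions at the decision level (late fusion). A minimal sketch of that idea follows, assuming each modality-specific model already outputs a probability in [0, 1]. The `late_fusion` and `is_inappropriate` helpers, the weights, and the 0.5 threshold are illustrative assumptions, not the fusion scheme of any specific paper in the table.

```python
from typing import Dict


def late_fusion(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of per-modality probabilities.

    Modalities without a weight are ignored, so a missing audio track,
    for example, does not break the decision.
    """
    used = [m for m in scores if m in weights]
    total = sum(weights[m] for m in used)
    if total == 0:
        raise ValueError("no overlapping modalities with non-zero weight")
    return sum(scores[m] * weights[m] for m in used) / total


def is_inappropriate(
    scores: Dict[str, float],
    weights: Dict[str, float],
    threshold: float = 0.5,  # hypothetical operating point
) -> bool:
    """Flag content when the fused score reaches the threshold."""
    return late_fusion(scores, weights) >= threshold
```

For example, `late_fusion({"image": 0.8, "audio": 0.4}, {"image": 3.0, "audio": 1.0})` weights the image model three times as heavily as the audio model. Early fusion, which the "concat" entries in the table point to, would instead concatenate the embeddings before a single classifier.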

movie/scene genre classification

name	paper	year	model	features	datasets	tasks
Effectively leveraging Multi-modal Features for Movie Genre Classification	scholar	2022	embeddings + fusion + MLP	CLIP image embeddings, PANNs audio embeddings, CLIP text embeddings	MovieNet	movie genre classification
OS-MSL: One Stage Multimodal Sequential Link Framework for Scene Segmentation and Classification	scholar	2022	embeddings + novel transformer	ResNet-18 image embeddings, ResNet-VLAD audio embeddings	TI-News	news scene segmentation/classification (studio, outdoor, interview)
Detection of Animated Scenes Among Movie Trailers	scholar	2022	CNN + GRU	EfficientNet image embeddings	Private dataset	genre classification from movie trailer scenes
A multi-label movie genre classification scheme based on the movie's subtitles	springer	2022	KNN	text frequency vectors	Private dataset	genre classification from movie subtitle text
A multimodal approach for multi-label movie genre classification	scholar	2020	CNN + LSTM	MFCCs/SSD/LBP from audio, LBP/3DCNN from video frames, Inception-v3 from poster, TFIDF from text	Private dataset	genre classification from movie trailers
Genre classification of movie trailers using 3d convolutional neural networks	ieee	2020	3D CNN	images	Private dataset	genre classification from movie trailer scenes
A unified framework of deep networks for genre classification using movie trailer	scholar	2020	CNN + LSTM	Inception V4 image embeddings	EmoGDB	genre classification from movie trailer scenes
Towards story-based classification of movie scenes	scholar	2020	logistic regression	manually extracted categorical features	Flintstones Scene Dataset	scene classification (Obstacle, Midpoint, Climax of Act 1)

multimodal architectures

synchronous multimodal architectures

name	paper	year	model	features	datasets	tasks	modalities
M&M Mix: A Multimodal Multiview Transformer Ensemble	scholar	2022	transformer with 2 cls heads	ViT image embeddings from audio spect., frame image, optical flow	Epic-Kitchens	video/action classification	image + audio + optical flow
MultiMAE: Multi-modal Multi-task Masked Autoencoders	scholar	2022	transformer with 3 decoder + cls heads	ViT-like image enc. patch embeddings (optional modalities)	ImageNet: Pseudo labeled multi-task training dataset (depth, segm)	image cls., semantic segm., depth est.	image + depth map
Data2vec: A general framework for self-supervised learning in speech, vision and language	scholar	2022	single encoder	transformer based audio, text, image encoder embeddings	ImageNet, Librispeech	masked pretraining	image + audio + text
VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text	scholar	2022	1 encoder per modality	transformer based audio, text, image encoder embeddings	AudioSet, HowTo100M	pretraining + video/audio classification	image + audio + text
Expanding Language-Image Pretrained Models for General Video Recognition	scholar	2022	1 encoder per modality	transformer based video, text encoder embeddings	HMDB-51, UCF-101	contrastive pretraining	video + text
Audio-Visual Instance Discrimination with Cross-Modal Agreement	scholar	2021	1 encoder per modality	CNN based audio, video encoder embeddings	HMDB-51, UCF-101	video/audio classification	video + audio
Robust Audio-Visual Instance Discrimination	scholar	2021	1 encoder per modality	CNN based audio, video encoder embeddings	HMDB-51, UCF-101	video/audio classification	video + audio
Learning transferable visual models from natural language supervision	scholar	2021	1 encoder per modality	transformer based image, text encoder embeddings	WIT (WebImageText)	contrastive pretraining	image + text
Self-supervised multimodal versatile networks	scholar	2020	multiple encoders	CNN based image/audio embeddings, word2vec text embeddings	UCF101, Kinetics, AudioSet	contrastive pretraining + classification	image + audio + text
Uniter: Universal image-text representation learning	scholar	2020	multimodal encoder	combined embeddings	COCO, Visual Genome, Conceptual Captions	qa/image-text retrieval	image + text
12-in-1: Multi-task vision and language representation learning	scholar	2020	multimodal encoder	combined embeddings	COCO, Flickr30k	qa/image-text retrieval	image + text
Two-stream convolutional networks for action recognition in videos	scholar	2014	1 encoder per modality	CNN based image, optical flow encoder embeddings	HMDB-51, UCF-101	video classification	video + optical flow

asynchronous multimodal architectures

name	paper	year	model	features	datasets	tasks	modalities
OmniMAE: Single Model Masked Pretraining on Images and Videos	scholar	2022	transformer with 1 cls. head	ViT-like image/video enc. patch embeddings	ImageNet, SSv2	video/action classification	image + video
OMNIVORE: A Single Model for Many Visual Modalities	scholar	2022	transformer with 3 cls. heads	ViT-like image/video enc. patch embeddings	ImageNet, Kinetics, SSv2, SUN RGB-D	image cls., action recog., depth est.	image + video + depth map
Polyvit: Co-training vision transformers on images, videos and audio	scholar	2021	transformer with 9 cls. heads	ViT-like image/video/audio enc. embeddings	ImageNet, CIFAR, Kinetics, Moments in Time, AudioSet, VGGSound	image cls., video cls., audio cls.	image + video + audio

action recognition

with transformers

name	paper	year	model	features	datasets	tasks
Frozen CLIP Models are Efficient Video Learners	scholar	2022	transformer with 1 cls head	CLIP image embeddings	ImageNet, Kinetics, SSv2	action recognition
Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training	scholar	2022	transformer with 1 cls head	ViT-like video enc. patch embeddings	Kinetics, SSv2	action recognition
Bevt: Bert pretraining of video transformers	scholar	2022	encoder-decoder transformer	VideoSwin image/video enc. embeddings	Kinetics, SSv2	action recognition
Video swin transformer	scholar	2022	Swin trans. with cls.head	Swin video enc. embeddings	Kinetics, SSv2	action recognition
Is space-time attention all you need for video understanding?	scholar	2021	transformer with cls. head	ViT-like video enc. patch embeddings	Kinetics, SSv2	action recognition

with 3D CNNs

name	paper	year	model	features	datasets	tasks
X3d: Expanding architectures for efficient video recognition	scholar	2020	CNN with cls. head	3D CNN based video enc. embeddings	Kinetics, SSv2	action recognition
Slowfast networks for video recognition	scholar	2019	CNN with cls. head	3D CNN based video enc. embeddings	Kinetics, SSv2	action recognition
A closer look at spatiotemporal convolutions for action recognition (R2+1D)	scholar	2018	CNN with cls. head	3D CNN based video enc. embeddings	Kinetics, HMDB-51, UCF-101	action recognition
Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset (I3D)	scholar	2017	CNN with cls. head	3D CNN based video enc. embeddings	Kinetics, HMDB-51, UCF-101	action recognition

contrastive representation learning

name	paper	year
Vatt: Transformers for multimodal self-supervised learning from raw video, audio and text	scholar	2021
Supervised contrastive learning	scholar	2020
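
As a rough illustration of what a contrastive objective computes, the sketch below implements a generic InfoNCE-style loss in plain Python. It is not the exact loss of either paper listed above (the supervised variant, for instance, uses label-defined positives). The `cosine` and `info_nce` helpers and the temperature of 0.1 are assumptions made for the example, and all embeddings are assumed to be non-zero vectors.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def info_nce(
    anchors: Sequence[Sequence[float]],
    positives: Sequence[Sequence[float]],
    temperature: float = 0.1,  # hypothetical value
) -> float:
    """Mean InfoNCE loss over a batch.

    positives[i] is the match for anchors[i]; every other row of
    `positives` acts as a negative for that anchor.
    """
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)
```

The loss is small when each anchor is most similar to its own positive and grows when a negative is more similar, which is what pulls matching pairs (for example a video clip and its audio) together in embedding space.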

review papers

name	paper	year
Machine Learning Models for Content Classification in Film Censorship and Rating	pdf	2022
A survey of artificial intelligence strategies for automatic detection of sexually explicit videos	scholar	2022
A survey on video content rating: taxonomy, challenges and open issues	pdf	2021
Multimodal Learning with Transformers: A Survey	scholar	2022
A Survey Paper on Movie Trailer Genre Detection	scholar	2020

tools

name	url	description
safetext	github	multilingual swear word detection and filtering from strings
PySceneDetect	github	Python and OpenCV-based scene cut/transition detection program & library
LAION safety toolkit	github	NSFW detector trained on LAION dataset
pysrt	github	Python parser for SubRip (srt) files
ffsubsync	github	Automagically synchronize subtitles with video.
MoviePy	github	Video editing with Python
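
To show how subtitle parsing and word filtering can fit together in a text moderation step, here is a standard-library-only sketch. It is a stand-in for the idea, not the API of pysrt or safetext: the `Cue` type, the `parse_srt` and `flag_cues` helpers, and the blocklist are all hypothetical, and the parser handles only the basic SubRip layout (index line, timing line, text lines).

```python
import re
from typing import List, NamedTuple, Set


class Cue(NamedTuple):
    start: str
    end: str
    text: str


# Matches a SubRip timing line such as "00:00:01,000 --> 00:00:02,500".
TIMING = re.compile(r"(\d{2}:\d{2}:\d{2},\d{3}) --> (\d{2}:\d{2}:\d{2},\d{3})")


def parse_srt(content: str) -> List[Cue]:
    """Parse SubRip text into cues; blocks without a timing line are skipped."""
    cues = []
    for block in re.split(r"\n\s*\n", content.strip()):
        lines = block.splitlines()
        for i, line in enumerate(lines):
            match = TIMING.search(line)
            if match:
                # Everything after the timing line is the cue text.
                text = " ".join(l.strip() for l in lines[i + 1:])
                cues.append(Cue(match.group(1), match.group(2), text))
                break
    return cues


def flag_cues(cues: List[Cue], blocklist: Set[str]) -> List[Cue]:
    """Return cues containing at least one lowercase blocklisted word."""
    flagged = []
    for cue in cues:
        words = set(re.findall(r"[a-z']+", cue.text.lower()))
        if words & blocklist:
            flagged.append(cue)
    return flagged
```

The timestamps of the flagged cues could then be passed to a video editing step to mute or cut the corresponding segments. Exact word matching misses misspellings and obfuscated words, which is one reason to prefer a dedicated library in practice.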
