frank-bridges/profanity-checker

Profanity Checker

Profanity Checker is a lightweight and efficient text moderation tool designed to detect and clean offensive language at scale. It helps teams maintain content quality by identifying profanity and transforming unsafe text into clean, publish-ready content. Ideal for platforms that value content safety and user trust.

Created by Bitbash, built to showcase our approach to Scraping and Automation!
If you are looking for profanity-checker, you've just found your team. Let's chat.

Introduction

This project analyzes text inputs to detect profanity, obscenity, and unwanted language, then replaces them based on configurable rules. It solves the challenge of maintaining clean and compliant text across user-generated and automated content. It is built for developers, moderators, and data teams handling large volumes of text.

Text Moderation & Sanitization Engine

  • Detects offensive words using a built-in profanity dictionary
  • Supports custom word lists to match domain-specific needs
  • Replaces profanity with configurable text or masked characters
  • Processes multiple text inputs in a single run
  • Produces structured, audit-friendly output
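The capabilities listed above can be sketched in a few lines of Python. This is a minimal illustration, not the repository's actual code: the `DEFAULT_WORDS` set, the `check_texts` function, and its parameters are hypothetical names, and mild placeholder words stand in for the real dictionary. Only the output field names (`originalText`, `containsProfanity`, `newText`) come from this README.

```python
import re

# Hypothetical stand-in for the built-in dictionary (assumption, not the real list).
DEFAULT_WORDS = {"darn", "heck"}


def check_texts(texts, extra_words=(), mask_char="*"):
    """Return one record per input text, using the README's output field names."""
    # Merge the built-in list with any caller-supplied custom words.
    words = DEFAULT_WORDS | {w.lower() for w in extra_words}
    # Match whole words only, ignoring case.
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(w) for w in sorted(words)) + r")\b",
        re.IGNORECASE,
    )
    records = []
    for text in texts:
        # Mask each hit with mask_char repeated to the matched word's length.
        cleaned, hits = pattern.subn(lambda m: mask_char * len(m.group()), text)
        records.append(
            {
                "originalText": text,
                "containsProfanity": hits > 0,
                "newText": cleaned,
            }
        )
    return records
```

Calling `check_texts(["What the heck", "All good"])` would flag only the first entry and leave the second unchanged.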

Features

| Feature | Description |
|---|---|
| Bulk Text Processing | Analyze and clean multiple text entries in one execution. |
| Built-in Profanity List | Uses a predefined list of common offensive terms. |
| Custom Word Support | Allows adding extra words for stricter moderation. |
| Flexible Replacement | Replace profanity with custom text or masked characters. |
| Detailed Results | Returns original text, cleaned text, and detection status. |
| Character Normalization | Detects obfuscated profanity using character alternates. |

What Data This Tool Returns

| Field Name | Field Description |
|---|---|
| originalText | The raw text provided for analysis. |
| containsProfanity | Boolean flag indicating if profanity was detected. |
| newText | Sanitized version of the text after filtering. |

Example Output

[
    {
        "originalText": "This is a piece of text",
        "containsProfanity": false,
        "newText": "This is a piece of text"
    },
    {
        "originalText": "This is another piece of shit",
        "containsProfanity": true,
        "newText": "This is another piece of ****"
    }
]
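A downstream consumer might load this output and act only on the flagged entries. The sketch below parses the example above from an inline string; the variable names are illustrative, and nothing here is taken from the project's own code.

```python
import json

# The example output from this README, inlined so the snippet is self-contained.
raw = """
[
    {
        "originalText": "This is a piece of text",
        "containsProfanity": false,
        "newText": "This is a piece of text"
    },
    {
        "originalText": "This is another piece of shit",
        "containsProfanity": true,
        "newText": "This is another piece of ****"
    }
]
"""

records = json.loads(raw)

# Keep only the entries where profanity was detected.
flagged = [r for r in records if r["containsProfanity"]]

for r in flagged:
    print(r["newText"])
```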

Directory Structure Tree

profanity-checker/
├── src/
│   ├── main.py
│   ├── processor.py
│   ├── profanity/
│   │   ├── default_list.txt
│   │   └── custom_loader.py
│   └── utils/
│       └── normalizer.py
├── data/
│   ├── input.sample.json
│   └── output.sample.json
├── requirements.txt
└── README.md

Use Cases

  • Content platforms use it to moderate user submissions, so they can enforce community guidelines automatically.
  • Data teams use it to sanitize text datasets, so analytics and models remain clean and reliable.
  • Comment systems use it to filter offensive language, so discussions stay respectful.
  • Publishers use it to clean articles and reviews, so content remains brand-safe.
  • Developers use it in pipelines to preprocess text, so downstream systems receive safe input.

FAQs

Can I add my own profanity words? Yes, you can include a custom list of additional words to extend the default detection rules.

How does replacement work? You can replace detected profanity with a fixed text string or a masking character that matches word length.
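The two replacement modes described in this answer could look roughly like the following. This is a sketch under assumptions: `replace_word` and its parameters are hypothetical names, and the project's real option names are not documented here.

```python
import re


def replace_word(text, word, replacement=None, mask_char="*"):
    """Replace whole-word, case-insensitive matches of `word`.

    If `replacement` is given it is used as fixed text; otherwise each match
    is masked with `mask_char` repeated to the matched word's length.
    """
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    if replacement is not None:
        # A function is used so backslashes in `replacement` are taken literally.
        return pattern.sub(lambda m: replacement, text)
    return pattern.sub(lambda m: mask_char * len(m.group()), text)
```

With a placeholder word, `replace_word("Oh heck no", "heck")` masks the word to four asterisks, while passing `replacement="[removed]"` substitutes the fixed string instead.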

Does it handle obfuscated profanity? Yes, character alternates such as symbols replacing letters are normalized during detection.
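One common way to normalize character alternates is a translation table applied before matching. The mapping below is a hypothetical example, not the contents of the project's `normalizer.py`, and the function names are illustrative.

```python
import re

# Hypothetical mapping of look-alike symbols to letters (assumption).
ALTERNATES = str.maketrans(
    {"@": "a", "4": "a", "3": "e", "1": "i", "0": "o", "$": "s", "5": "s", "7": "t"}
)


def normalize(text):
    """Lowercase the text and map look-alike symbols to letters."""
    return text.lower().translate(ALTERNATES)


def contains_profanity(text, words):
    """Check the normalized text against a non-empty collection of lowercase words."""
    pattern = re.compile(r"\b(?:" + "|".join(re.escape(w) for w in words) + r")\b")
    return pattern.search(normalize(text)) is not None
```

A mapping like this trades recall for precision: the more symbols it covers, the more ordinary punctuation and digits risk being read as letters, so the table is kept deliberately small here.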

Is there a limit on custom words? Custom additions are limited to a small, controlled set to maintain performance and accuracy.


Performance Benchmarks and Results

Primary Metric: Processes thousands of text entries per minute with consistent detection accuracy.

Reliability Metric: Maintains a high success rate across varied text formats and input sizes.

Efficiency Metric: Minimal memory footprint with fast string processing and low overhead.

Quality Metric: High precision in profanity detection while minimizing false positives through safe-word handling.
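The README does not describe how safe-word handling works. One plausible reading is an allowlist of benign words that happen to contain a blocked term, which matters when matching is done on substrings. The sketch below illustrates that idea only; `BLOCKED`, `SAFE_WORDS`, and both functions are hypothetical, and the placeholder term is deliberately mild.

```python
import re

# Placeholder blocked term and allowlist (assumptions for illustration).
BLOCKED = {"heck"}
SAFE_WORDS = {"check", "checker", "checkmate"}


def is_flagged(token):
    """Flag a token that contains a blocked term unless it is a known safe word."""
    token = token.lower()
    if token in SAFE_WORDS:
        return False
    return any(term in token for term in BLOCKED)


def find_flagged(text):
    """Return the tokens in `text` that would be flagged."""
    return [t for t in re.findall(r"[A-Za-z']+", text) if is_flagged(t)]
```

Here "checker" passes because it is on the allowlist, while "hecking" is still caught by the substring match.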

Review 1

"Bitbash is a top-tier automation partner, innovative, reliable, and dedicated to delivering real results every time."

Nathan Pennington
Marketer
★★★★★

Review 2

"Bitbash delivers outstanding quality, speed, and professionalism, truly a team you can rely on."

Eliza
SEO Affiliate Expert
★★★★★

Review 3

"Exceptional results, clear communication, and flawless delivery. Bitbash nailed it."

Syed
Digital Strategist
★★★★★
