A fast, accurate, and lightweight browser extension that blocks AI-generated pages using pattern matching.

ExtinctLLM/extinctllm

ExtinctLLM

An ultra-lightweight AI content blocker browser extension powered by regex-based pattern matching and designed with both efficiency and accuracy in mind. It has an average precision of about 80-90% in correctly classifying human text, and an average precision of around 70% in detecting an AI-generated corpus.

The Problem

Traditional AI content detectors rely on heavy machine learning (ML), intensive pretraining of a basic classifier model, or both, none of which is ideal for a mere browser extension. They usually measure values such as perplexity, burstiness, and token probability patterns to determine whether a text was generated by AI. Regex-based detection admittedly cannot be as precise as ML-based detection, but it can reach satisfactory levels when certain exceptions and limitations are implemented.

How It Works

This section explains ExtinctLLM's algorithm and how it classifies text as either human- or AI-written.

Point-Based Pattern Matching

ExtinctLLM uses a precompiled list of regular expressions, with each regex assigned a certain "point" value depending on how commonly it appears in typical AI-generated content.

  • Some regexes have a negative score. A negative score indicates that the pattern is more common in human text than in AI text.
    • This helps increase accuracy by limiting the impact of outliers in the content.
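As a rough illustration of the idea, the sketch below groups regex sources by point value and compiles them once. The patterns, the scores, and the names `SAMPLE_TABLE` and `compilePatterns` are hypothetical and are not taken from the project's patterns.yaml.

```javascript
// Hypothetical table: keys are point values, values are regex sources.
// Positive scores lean "AI", negative scores lean "human".
// These entries are made up for illustration only.
const SAMPLE_TABLE = {
  "7.5": ["\\bdelve into\\b", "\\brich tapestry\\b"],
  "2": ["\\bfurthermore\\b"],
  "-3": ["\\blol\\b", "\\bidk\\b"],
};

// Compile every source once up front so that matching a chunk
// does not have to rebuild the regexes each time.
function compilePatterns(table) {
  return Object.entries(table).map(([score, sources]) => ({
    score: Number(score),
    regexes: sources.map((source) => new RegExp(source, "gimu")),
  }));
}

const compiled = compilePatterns(SAMPLE_TABLE);
```

Keeping the score on the group rather than on each regex mirrors the "grouped by score" layout described for patterns.yaml, but the exact in-memory shape used by the extension may differ.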

Corpus Analysis

The TextClassifier.analyze function is used to analyze a given article using a chunkSize-length sliding window.

  • The match map is an object whose keys are scores and whose values are the number of matches recorded for each score.

  • The alpha is a parameter that is later used to scale the score during normalization.

  • The step, or stride, of the window is defined by chunkSize / 1.25, meaning that the windows/chunks overlap by about 20%.

    • This overlap is used to account for patterns that span across chunk boundaries.

    • In each step, the following operations occur:

      1. Loop through the scores in the patterns.yaml file.

        • The patterns are grouped by score. For example, a score of 7.5 contains seven regex patterns.
      2. Under each score, loop through the regex patterns and match each one against the chunk with the flags /gimu.

      3. Count the number of matches for each regex.

      4. If the number of matches is greater than 0, do the following:

        • Add Math.log1p(numberOfMatches) * regexScore to the alpha.
        • Add the number of matches to the score in the match map.
  • The windows continue to slide until the end of the corpus is reached. At the end, a two-item array containing the match map and the resulting alpha is returned.
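The steps above can be sketched roughly as follows. This is a minimal reading of the description, not the extension's actual source: the function signature, the `SAMPLE_PATTERNS` data, the default `chunkSize` of 500, and the rounding of the stride are all assumptions.

```javascript
// Hypothetical compiled patterns, grouped by score (illustrative only).
const SAMPLE_PATTERNS = [
  { score: 7.5, regexes: [/\bdelve into\b/gimu] },
  { score: -3, regexes: [/\blol\b/gimu] },
];

// Slide a chunkSize-length window over the text and accumulate
// the match map and alpha as described in the list above.
function analyze(text, patterns, chunkSize = 500) {
  const matchMap = {};
  let alpha = 0;
  // Stride of chunkSize / 1.25, so consecutive windows overlap by about 20%.
  const step = Math.max(1, Math.floor(chunkSize / 1.25));

  for (let start = 0; start < text.length; start += step) {
    const chunk = text.slice(start, start + chunkSize);
    for (const { score, regexes } of patterns) {
      for (const regex of regexes) {
        // With the g flag, match() returns every match, or null if none.
        const count = (chunk.match(regex) || []).length;
        if (count > 0) {
          alpha += Math.log1p(count) * score;
          matchMap[score] = (matchMap[score] || 0) + count;
        }
      }
    }
  }
  return [matchMap, alpha];
}

const [sampleMap, sampleAlpha] = analyze("We delve into it. lol", SAMPLE_PATTERNS);
console.log(sampleMap, sampleAlpha);
```

In this sketch, a match that lies entirely inside the overlapping region is counted in both windows. Whether the extension deduplicates such matches is not stated here.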

Score Calculation

The TextClassifier.calculateScore function calculates the score by iterating through the entries in the match map and summing the products of the keys and values. In pseudocode:

totalScore = Σ (score * matches) for each (score, matches) in matchMap
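A minimal JavaScript rendering of that sum might look like the following. It assumes the match map is a plain object, so its keys arrive as strings and have to be converted back to numbers. The standalone function shape is also an assumption.

```javascript
// Sum score * matches over every entry in the match map.
function calculateScore(matchMap) {
  let totalScore = 0;
  for (const [score, matches] of Object.entries(matchMap)) {
    // Object keys are strings, so convert the score back to a number.
    totalScore += Number(score) * matches;
  }
  return totalScore;
}

// Two matches at 7.5 points and one match at -3 points.
console.log(calculateScore({ "7.5": 2, "-3": 1 })); // 12
```

Negative-score entries subtract from the total, which is how human-leaning patterns pull the score down.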

Normalization

The TextClassifier.normalizeScore function adjusts a raw score to account for both the size of the corpus (in characters) and the pattern intensity (alpha). The function follows these steps:

  1. Scale the alpha exponentially:

    scaledAlpha = |alpha| ^ scale
    

    The equation takes the absolute value of alpha (to avoid negative scaling) and raises it to the scale power.

  2. Combine with the raw score and corpus length:

    exponent = (-scaledAlpha * score) / corpusLength
    

    In this step, we multiply the score by the scaled alpha to weight the score by the detected pattern intensity. We then divide by the length of the corpus so longer texts do not automatically get higher scores.

  3. Run an exponential transform:

    normalizedScore = 1 - exp(exponent)
    

    This maps the weighted score into a number between 0 and 1 (ideally, if the score is not negative). This reduces the sway of outliers and allows the score to be compared with other scores.
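The three steps can be put together in a short sketch. The parameter order and the default `scale` of 0.5 are assumptions chosen for illustration, since the actual value of scale is not given here.

```javascript
// Normalize a raw score using the corpus length and alpha.
// The default scale of 0.5 is a placeholder, not the project's value.
function normalizeScore(score, corpusLength, alpha, scale = 0.5) {
  // Step 1: scale the magnitude of alpha.
  const scaledAlpha = Math.pow(Math.abs(alpha), scale);
  // Step 2: weight the score by scaledAlpha and divide by corpus length.
  const exponent = (-scaledAlpha * score) / corpusLength;
  // Step 3: exponential transform.
  return 1 - Math.exp(exponent);
}

// With alpha = 4 and scale = 0.5, scaledAlpha is 2, so a score of 50
// over 100 characters gives exponent = -1 and a result of 1 - e^-1.
console.log(normalizeScore(50, 100, 4));
// A negative raw score falls below 0, outside the ideal range.
console.log(normalizeScore(-50, 100, 4));
```

A corpus length of 0 would divide by zero in this sketch, so a caller would probably want to guard against empty input before normalizing.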
