An ultra-lightweight AI content blocker browser extension powered by regex-based pattern matching and designed with both efficiency and accuracy in mind. It classifies human-written text correctly with an average precision of about 80-90%, and detects AI-generated text with an average precision of around 70%.
Traditional AI content detectors rely on heavy machine learning (ML) models and/or intensive pretraining of a basic classifier, which is far from ideal for a mere browser extension. They typically measure values such as perplexity, burstiness, and token probability patterns to determine whether a text is AI-generated. Although regex-based detection admittedly cannot be as precise as ML-based detection, it can reach satisfactory accuracy when certain exceptions and limitations are implemented.
This section explains ExtinctLLM's algorithm and how it classifies text as either human- or AI-written.
ExtinctLLM uses a precompiled list of regular expressions, with each regex assigned a "point" value depending on how commonly the pattern appears in typical AI-generated content; an illustrative example is sketched after the list below.
- Some regexes have a negative score. A negative score indicates that the pattern is more common in human text than in AI text.
- This helps increase accuracy by limiting the impact of outliers in the content.
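
As a rough illustration, the pattern list might look like the following once `patterns.yaml` is loaded into memory. The regexes and scores below are made-up examples for illustration only, not the actual contents of the file:

```js
// Hypothetical in-memory shape of the pattern list once patterns.yaml is loaded.
// The regexes and scores below are made-up examples, not the real rules.
const patterns = {
  "7.5": ["\\bdelve into\\b", "\\bit is important to note that\\b"],
  "3":   ["\\bin conclusion\\b"],
  "-2.5": ["\\blol\\b", "\\bbtw\\b"], // negative score: more common in human text
};
```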
The `TextClassifier.analyze` function analyzes a given article using a `chunkSize`-length sliding window.

- The match map is an object whose keys are scores and whose values are the number of matches for that score.
- The alpha is a parameter that is later used to scale the score for normalization.
- The step, or stride, of the window is defined by `chunkSize / 1.25`, meaning that the windows/chunks overlap by about 20%. This overlap accounts for patterns that span chunk boundaries.

In each step, the following operations occur:

- Loop through the scores in the `patterns.yaml` file.
  - The patterns are grouped by score. For example, a score of `7.5` contains seven regex patterns.
- Under each score, loop through the regex patterns and match each one against the chunk with the flags `/gimu`.
- Count the number of matches for each regex.
- If the number of matches is greater than 0, do the following:
  - Add `Math.log1p(numberOfMatches) * regexScore` to the alpha.
  - Add the number of matches to the score's entry in the match map.
The windows continue to slide until the end of the corpus is reached. At the end, a two-item array containing the match map and the resulting alpha is returned.
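
A minimal sketch of this pass, assuming the score-to-regexes shape shown earlier (the function signature, parameter names, and default chunk size are illustrative, not the actual implementation):

```js
// Sliding-window analysis sketch; returns [matchMap, alpha].
function analyze(text, patterns, chunkSize = 1000) {
  const matchMap = {};                        // score -> total match count
  let alpha = 0;                              // pattern-intensity accumulator
  const step = Math.floor(chunkSize / 1.25);  // stride; chunks overlap by ~20%

  for (let start = 0; start < text.length; start += step) {
    const chunk = text.slice(start, start + chunkSize);

    for (const [score, regexes] of Object.entries(patterns)) {
      for (const source of regexes) {
        const matches = chunk.match(new RegExp(source, "gimu"));
        const count = matches ? matches.length : 0;
        if (count > 0) {
          alpha += Math.log1p(count) * Number(score); // log1p dampens repeated hits
          matchMap[score] = (matchMap[score] || 0) + count;
        }
      }
    }
  }

  return [matchMap, alpha];
}
```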
The `TextClassifier.calculateScore` function calculates the raw score by iterating through the entries in the match map and summing the products of the keys and values. In pseudocode:
totalScore = Σ (score * matches) for each (score, matches) in matchMap
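
In JavaScript this reduces to a few lines (a sketch, assuming the match map shape returned by the analysis pass above):

```js
// Sum of (score * matches) over every entry in the match map.
function calculateScore(matchMap) {
  let totalScore = 0;
  for (const [score, matches] of Object.entries(matchMap)) {
    totalScore += Number(score) * matches;
  }
  return totalScore;
}
```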
The `TextClassifier.normalizeScore` function adjusts a raw score to account for both the size of the corpus (in characters) and the pattern intensity (alpha). The function follows these steps (see the combined sketch after the list):

- Scale the alpha exponentially: `scaledAlpha = |alpha| ^ scale`. This takes the absolute value of `alpha` (to avoid negative scaling) and raises it to the `scale` power.
- Combine with the raw score and corpus length: `exponent = (-scaledAlpha * score) / corpusLength`. In this step, the score is multiplied by the scaled alpha to weight it by the detected pattern intensity, then divided by the length of the corpus so longer texts do not automatically get higher scores.
- Run an exponential transform: `normalizedScore = 1 - exp(exponent)`. This maps the weighted score into a number between 0 and 1 (ideally, when the score is not negative), which reduces the sway of outliers and allows the score to be compared with other scores.
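
Put together, the normalization can be sketched as follows (the `scale` default here is only a placeholder; the actual constant is not stated above):

```js
// Normalizes a raw score using corpus length and pattern intensity (alpha).
// `scale` is a tuning constant; 1.5 here is only a placeholder.
function normalizeScore(score, alpha, corpusLength, scale = 1.5) {
  const scaledAlpha = Math.abs(alpha) ** scale;            // |alpha| ^ scale, avoids negative scaling
  const exponent = (-scaledAlpha * score) / corpusLength;  // weight by intensity, dampen by length
  return 1 - Math.exp(exponent);                           // exponential transform into roughly [0, 1)
}
```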