fix(english-preset): don't include skip-non-alphabetic transformer
For #23, #46.

BREAKING CHANGE: Using the default English preset, Obscenity will no longer strip non-alphabetic characters from the input text before matching.

This addresses a class of egregious false negatives in previous versions (see #23), but introduces a regression where cases such as 'f u c k' (with the space) will no longer be detected by default. We expect to provide a more comprehensive fix in the next minor release.

If desired, it remains possible to revert to the previous behavior by providing a custom set of transformers to the matcher.
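
The effect of adding or removing this step from a transformer chain can be illustrated with a minimal, self-contained sketch. Everything below (`CharTransformer`, `applyTransformers`, `matches`, and the two chains) is a hypothetical stand-in written for illustration, not Obscenity's actual implementation or API. With the real library, reverting would presumably mean passing the matcher a transformer list that includes `skipNonAlphabeticTransformer()` again.

```typescript
// Hypothetical stand-in types; NOT Obscenity's real implementation.
// A transformer maps one character to a replacement, or to undefined to drop it.
type CharTransformer = (ch: string) => string | undefined;

// Stand-in for a lowercasing step.
const toLowerCase: CharTransformer = (ch) => ch.toLowerCase();

// Stand-in for the skip-non-alphabetic step: drop anything outside a-z / A-Z.
const skipNonAlphabetic: CharTransformer = (ch) =>
	/[a-zA-Z]/.test(ch) ? ch : undefined;

// Run every character of the input through the chain, in order.
function applyTransformers(input: string, transformers: CharTransformer[]): string {
	let out = '';
	for (const ch of input) {
		let current: string | undefined = ch;
		for (const transform of transformers) {
			if (current === undefined) break; // an earlier step dropped this character
			current = transform(current);
		}
		if (current !== undefined) out += current;
	}
	return out;
}

// Naive substring matching on the transformed text (assumption: real matching is richer).
function matches(input: string, pattern: string, transformers: CharTransformer[]): boolean {
	return applyTransformers(input, transformers).indexOf(pattern) !== -1;
}

// Chain without the skipping step (the new default behavior in spirit).
const defaultChain: CharTransformer[] = [toLowerCase];
// Chain with the skipping step restored near the end (the previous behavior in spirit).
const previousChain: CharTransformer[] = [toLowerCase, skipNonAlphabetic];

console.log(matches('H.e.l.l.o', 'hello', defaultChain)); // false
console.log(matches('H.e.l.l.o', 'hello', previousChain)); // true
```

In this sketch the skipping step is placed last, so earlier steps still see the original characters before anything is dropped.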
jo3-l committed Jan 5, 2024
1 parent b0d90aa commit 620c721
Showing 3 changed files with 12 additions and 5 deletions.
3 changes: 1 addition & 2 deletions README.md
@@ -109,8 +109,7 @@ With the English preset, Obscenity (correctly) finds matches in all of the follo
 - **fk** you
 - **ffuk** you
 - i like **a$$es**
-- **ʃ𝐟ʃὗƈ k** ỹоứ
-- **f .... !!! uuuuuuuuu ccc k**
+- <!-- prettier-ignore --> ʃ𝐟ʃὗƈk ỹоứ

 ...and it **does not match** on the following:

4 changes: 2 additions & 2 deletions src/preset/english.ts
@@ -4,7 +4,6 @@ import { pattern } from '../pattern/Pattern';
 import { collapseDuplicatesTransformer } from '../transformer/collapse-duplicates';
 import { resolveConfusablesTransformer } from '../transformer/resolve-confusables';
 import { resolveLeetSpeakTransformer } from '../transformer/resolve-leetspeak';
-import { skipNonAlphabeticTransformer } from '../transformer/skip-non-alphabetic';
 import { toAsciiLowerCaseTransformer } from '../transformer/to-ascii-lowercase';

 /**
@@ -15,7 +14,8 @@ export const englishRecommendedBlacklistMatcherTransformers = [
 	resolveConfusablesTransformer(),
 	resolveLeetSpeakTransformer(),
 	toAsciiLowerCaseTransformer(),
-	skipNonAlphabeticTransformer(),
+	// See #23 and #46.
+	// skipNonAlphabeticTransformer(),
 	collapseDuplicatesTransformer({
 		defaultThreshold: 1,
 		customThresholds: new Map([
10 changes: 9 additions & 1 deletion src/transformer/skip-non-alphabetic/index.ts
@@ -7,10 +7,18 @@ import { createSimpleTransformer } from '../Transformers';
  * comprised of alphabetic characters (the pattern `hello` does not match
  * `h.e.l.l.o` by default, but does with this transformer).
  *
+ * **Warning**
+ *
+ * This transformation is not part of the default set of transformations, as
+ * there are some known rough edges with false negatives; see
+ * [#23](https://github.com/jo3-l/obscenity/issues/23) and
+ * [#46](https://github.com/jo3-l/obscenity/issues/46) on the GitHub issue
+ * tracker.
+ *
  * **Application order**
  *
  * It is recommended that this transformer be applied near the end of the
- * transformer chain.
+ * transformer chain, if at all.
  *
  * @example
  * ```typescript
