Obscenity

Robust, extensible profanity filter for NodeJS.

Why Obscenity?

  • Accurate: Though Obscenity is far from perfect (as with all profanity filters), it makes reducing false positives as simple as possible: adding whitelisted phrases is as easy as adding a new string to an array, and using word boundaries is equally simple.
  • Robust: Obscenity's transformer-based design lets it match variants of phrases that other libraries typically miss, e.g. fuuuuuuuckkk, ʃṳ𝒸𝗄, wordsbeforefuckandafter and so on. There's no need to write out all the variants manually either: just adding the pattern fuck will match all of the cases above by default.
  • Extensible: With Obscenity, you aren't locked into anything: removing phrases that you don't agree with from the default set of words is trivial, as is disabling any transformations you don't like (perhaps you feel that leet-speak decoding is too error-prone for you).
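
To make the idea of a transformer-based design more concrete, here is a minimal, self-contained sketch. It is not Obscenity's implementation and does not use its API: the helper names (toLowerCase, collapseDuplicates, normalize, containsPattern) are hypothetical, and only two simple normalization steps are modeled.

```javascript
// Illustrative sketch only: hypothetical helpers, not Obscenity's code or API.

// A "transformer" here is just a function from text to normalized text.
const toLowerCase = (text) => text.toLowerCase();

// Collapse runs of the same character, e.g. "fuuuuckkk" -> "fuck".
const collapseDuplicates = (text) => text.replace(/(.)\1+/g, '$1');

// Apply the transformers in order, feeding each one the previous result.
const normalize = (text, transformers) =>
	transformers.reduce((current, transform) => transform(current), text);

// Look for the pattern anywhere in the normalized text (no word boundaries).
const containsPattern = (text, pattern) =>
	normalize(text, [toLowerCase, collapseDuplicates]).includes(pattern);

console.log(containsPattern('FUUUUUUCKKK', 'fuck')); // true
console.log(containsPattern('wordsbeforefuckandafter', 'fuck')); // true
console.log(containsPattern('hello world', 'fuck')); // false
```

A naive collapse like this one would also break patterns that legitimately contain doubled letters, which is one reason to leave this kind of normalization to the library rather than hand-rolling it.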

Installation

$ npm install obscenity
$ yarn add obscenity
$ pnpm add obscenity

Example usage

First, import Obscenity:

const {
	RegExpMatcher,
	TextCensor,
	englishDataset,
	englishRecommendedTransformers,
} = require('obscenity');

Or, in TypeScript/ESM:

import {
	RegExpMatcher,
	TextCensor,
	englishDataset,
	englishRecommendedTransformers,
} from 'obscenity';

Now, we can create a new matcher using the English preset.

const matcher = new RegExpMatcher({
	...englishDataset.build(),
	...englishRecommendedTransformers,
});

Next, we can use our matcher to search for profanities in the text. Here are two examples of what you can do:

Check if there are any matches in some text:

if (matcher.hasMatch('fuck you')) {
	console.log('The input text contains profanities.');
}
// The input text contains profanities.

Output the positions of all matches along with the original word used:

// Pass "true" as the "sorted" parameter so the matches are sorted by their position.
const matches = matcher.getAllMatches('ʃ𝐟ʃὗƈk ỹоứ 𝔟ⁱẗ𝙘ɦ', true);
for (const match of matches) {
	const { phraseMetadata, startIndex, endIndex } =
		englishDataset.getPayloadWithPhraseMetadata(match);
	console.log(
		`Match for word ${phraseMetadata.originalWord} found between ${startIndex} and ${endIndex}.`,
	);
}
// Match for word fuck found between 0 and 6.
// Match for word bitch found between 12 and 18.
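
The reported positions can be used to recover the matched text. The sketch below needs no library: it hard-codes the input and the two index pairs from the output above. It assumes that endIndex is inclusive and that indices count UTF-16 code units (as in ordinary JavaScript string indexing), which is what the sample output suggests; the variable names are made up for the illustration.

```javascript
// Input string and index pairs copied from the example output above.
const sampleInput = 'ʃ𝐟ʃὗƈk ỹоứ 𝔟ⁱẗ𝙘ɦ';
const reportedMatches = [
	{ startIndex: 0, endIndex: 6 },
	{ startIndex: 12, endIndex: 18 },
];

// Assumption: endIndex is inclusive, so add 1 because slice() excludes its end.
const matchedText = reportedMatches.map(({ startIndex, endIndex }) =>
	sampleInput.slice(startIndex, endIndex + 1),
);

// Prints the two matched substrings of the original, untransformed input.
console.log(matchedText);
```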

Censoring matched text:

To censor text, we'll use another class: TextCensor. The other imports and the creation of the matcher are elided for brevity.

const { TextCensor /* , ...other imports */ } = require('obscenity');
// ...
const censor = new TextCensor();
const input = 'fuck you little bitch';
const matches = matcher.getAllMatches(input);
console.log(censor.applyTo(input, matches));
// %@$% you little **%@%

This is just a small slice of what Obscenity can do: for more, check out the documentation.

Accuracy

Note: As with all swear filters, Obscenity is not perfect (nor will it ever be). Use its output as a heuristic, and not as the sole judge of whether some content is appropriate or not.

With the English preset, Obscenity (correctly) finds matches in all of the following texts:

  • you are a little fucker
  • fk you
  • ffuk you
  • i like a$$es
  • ʃ𝐟ʃὗƈk ỹоứ

...and it does not match on the following:

  • the pen is mightier than the sword
  • i love bananas so yeah
  • this song seems really banal
  • grapes are really yummy

Documentation

For a step-by-step guide on how to use Obscenity, check out the guide.

Otherwise, refer to the auto-generated API documentation.

Contributing

Issues can be reported using the issue tracker. If you'd like to submit a pull request, please read the contribution guide first.

Author

Obscenity © Joe L. under the MIT license. Authored and maintained by Joe L.

GitHub @jo3-l