Lightning-fast PII detection and anonymization library with 190x performance advantage - detect emails, SSNs, names, and more in <2MB package

DataFog/datafog-python

Open-source PII Detection for Retrieval Systems.
Scan, redact, and manage PII in your documents before they get uploaded to a Retrieval Augmented Generation (RAG) system.

Overview

DataFog scans files for PII and redacts it before the files are uploaded to a RAG system.

How it works

DataFog Overview
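
The scan-then-redact flow described above can be illustrated with a minimal, standard-library-only sketch. This is not DataFog's API or its detection logic: the `PII_PATTERNS` table and the `scan` and `redact` helpers are hypothetical names introduced here, and the two regular expressions are simplified assumptions that only cover basic email and US SSN formats.

```python
import re

# Hypothetical patterns for illustration only; these are NOT DataFog's rules.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def scan(text):
    """Return (label, matched_text) pairs for every pattern hit in text."""
    findings = []
    for label, pattern in PII_PATTERNS.items():
        for match in pattern.finditer(text):
            findings.append((label, match.group()))
    return findings


def redact(text):
    """Replace every pattern hit with a bracketed label such as [EMAIL]."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


doc = "Contact jane.doe@example.com, SSN 123-45-6789."
print(scan(doc))    # findings to review before upload
print(redact(doc))  # text that is safe to pass on to the RAG system
```

In a pipeline like the one the overview describes, the redacted text rather than the original would be passed on to the retrieval system, and the scan results could be logged for review.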

Installation

DataFog can be installed via pip:

pip install datafog # python client

Dev Notes

  • Clone the repo.
  • Run 'poetry install' to install dependencies (working inside a poetry shell is recommended so the installed dependencies stay isolated).
  • Justfile commands:
    • just format to apply formatting.
    • just lint to check formatting and style.
    • just tag to tag your project in git.
    • just upload to publish to PyPI.

Testing

To run the datafog unit tests, check out this repository and run:

tox

License

This software is published under the MIT license.
