Remove personally identifiable information from free text. Sometimes we have additional metadata about the people we wish to anonymize; other times we don't. This package makes it easy to scrub personal information from free text without compromising the privacy of the people we are trying to protect.

scrubadub currently supports removing:

Quick start

Getting started with scrubadub is as easy as running pip install scrubadub and incorporating it into your Python scripts like this:

>>> import scrubadub

# John may be a cat, but he doesn't want other people to know it.
>>> text = u"John is a cat"

# Replace names with {{NAME}} placeholder. This is the scrubadub default
# because it maximally omits any information about people.
>>> scrubadub.clean(text)
u"{{NAME}} is a cat"

# Replace names with {{NAME-ID}} anonymous, but consistent IDs.
>>> scrubadub.clean(text, replace_with='identifier')
u"{{NAME-0}} is a cat"
>>> scrubadub.clean("John spoke with Doug.", replace_with='identifier')
u"{{NAME-0}} spoke with {{NAME-1}}."
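
To make the difference between the two replacement styles concrete, here is a minimal, self-contained sketch of the idea. It is not scrubadub's implementation and does not call scrubadub at all: the function scrub_names and its known_names argument are hypothetical, and it assumes the names to remove are already known from metadata, whereas scrubadub detects them itself.

```python
import re


def scrub_names(text, known_names, replace_with="placeholder"):
    """Toy illustration only (hypothetical helper, not part of scrubadub).

    Replaces every occurrence of a name in known_names with either a
    generic {{NAME}} placeholder or a consistent {{NAME-<id>}} identifier.
    """
    if not known_names:
        return text

    # Maps each name to an integer id, assigned in order of first appearance.
    ids = {}

    def substitute(match):
        name = match.group(0)
        if replace_with == "identifier":
            # The same name always receives the same id within this call.
            ident = ids.setdefault(name, len(ids))
            return "{{NAME-%d}}" % ident
        return "{{NAME}}"

    # Match any known name as a whole word.
    pattern = re.compile(
        r"\b(?:%s)\b" % "|".join(re.escape(name) for name in known_names)
    )
    return pattern.sub(substitute, text)


print(scrub_names("John is a cat", ["John"]))
print(scrub_names("John spoke with Doug. John left.", ["John", "Doug"],
                  replace_with="identifier"))
```

The plain placeholder reveals nothing about who is who, while the identifier style keeps track of which mentions refer to the same person. In this sketch the ids are only consistent within a single call.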