Redacting sensitive information from doctors’ patient notes

COVID-19 public dataset on GCP from cases in Italy (part 2)

This article is the second in a series about the release of a public dataset of doctors’ medical notes about patients affected by COVID-19. It covers what I learned about the Google Cloud DLP API, which was used to redact sensitive information from the medical notes. If you haven’t seen it already, you can find the first article here.

To reiterate my commitment to the community, I will keep the public database up to date with the latest cases published by the Italian Society of Medical and Interventional Radiology (ISMIR). And if, as a side effect, you can learn a thing or two about GCP, then this series will exceed my expectations 💪 🙏. The code used in this pipeline is available in my GitHub repo.

By the way, I’m very proud to see that the community is thinking of ways to leverage this data. One avenue raised by Jérôme MASSOT is to use this dataset to conduct a: