Presidio integration #94

Open
wants to merge 3 commits into
base: main
from

Conversation

drewfurgdb

This PR adds a new notebook (number 08-) with example code that uses Microsoft Presidio in Databricks, together with the pixels library, to redact and anonymize both DICOM metadata and DICOM images. It also adds a Python class in an external .py file so that Spark UDFs can use it for parallel processing.

Collaborator

@dmoore247 dmoore247 left a comment


We'd like to get the UDFs in the form of SparkML Transformer classes (so we can pipeline all of these). Please start with PHI image detection, then PHI image redaction.

Thanks so much for your submission.

"broadcasted_engine= sc.broadcast(engine)\n",
"\n",
"# define a pandas UDF function and a series function over it.\n",
"def redact_dicom_image(path: str) -> str:\n",
Collaborator


redact_dicom_image is the one we're most interested in.
We'd like to integrate it into a SparkML Transformer class and add it to the Python package dbx.pixels.

Our next POC is actually about detecting PHI. Could we get that first?
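
For reference, a minimal sketch of what the requested shape might look like, starting with detection. Everything named here is hypothetical and not part of this PR or of dbx.pixels: `detect_phi`, `make_phi_detector_transformer`, `PhiDetector`, the `detector` callable (assumed to take a DICOM path and return a list of bounding-box dicts, for example by wrapping a Presidio image analyzer), and the `path`/`phi` column names. The only Spark API assumed is that a custom stage subclasses `pyspark.ml.Transformer` and implements `_transform`.

```python
import json
from typing import Callable, Dict, List

# Hypothetical detector contract: DICOM path -> list of PHI bounding boxes,
# e.g. [{"entity_type": "PERSON", "left": 0, "top": 0, "width": 10, "height": 5}].
Detector = Callable[[str], List[Dict]]


def detect_phi(path: str, detector: Detector) -> str:
    """Run the detector on one path and return a JSON string (easy to use as a UDF result)."""
    try:
        return json.dumps({"path": path, "phi": detector(path), "error": None})
    except Exception as exc:
        # Record the failure in the row instead of failing the whole Spark task.
        return json.dumps({"path": path, "phi": [], "error": str(exc)})


def make_phi_detector_transformer(detector: Detector,
                                  input_col: str = "path",
                                  output_col: str = "phi"):
    """Wrap detect_phi in a SparkML Transformer so it can be used as a Pipeline stage."""
    # Spark imports are deferred so this module can be imported without a Spark runtime.
    from pyspark.ml import Transformer
    from pyspark.sql.functions import col, udf
    from pyspark.sql.types import StringType

    class PhiDetector(Transformer):
        def _transform(self, dataset):
            # The detector is captured in the UDF closure, so it must be picklable.
            detect = udf(lambda p: detect_phi(p, detector), StringType())
            return dataset.withColumn(output_col, detect(col(input_col)))

    return PhiDetector()
```

A redaction stage could follow the same pattern, and the two could then be chained with `pyspark.ml.Pipeline(stages=[...])`, detection first and redaction second. Whether the engine is captured in the closure, as here, or broadcast as in the notebook is a design choice this sketch leaves open.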
