feat(weave): Implement PresidioEntityRecognitionGuardrail #3575

Open
wants to merge 3 commits into
base: master
from

soumik12345
Contributor

Description

Implement the PresidioEntityRecognitionGuardrail, based on the PresidioEntityRecognitionGuardrail from the safeguards library originally contributed by @ash0ts.

Sample Trace

@soumik12345 soumik12345 self-assigned this Feb 3, 2025
@soumik12345 soumik12345 requested a review from a team as a code owner February 3, 2025 11:33

socket-security bot commented Feb 3, 2025

New dependencies detected.

| Package | New capabilities | Transitives | Size | Publisher |
|---|---|---|---|---|
| pypi/[email protected] | Transitive: eval, filesystem, network | +227 | 374 MB | avbalter, microsoft, omri374, ...2 more |
| pypi/[email protected] | Transitive: environment, eval, filesystem, network, shell, unsafe | +2 | 2.23 MB | microsoft, omri374, omrimendels, ...1 more |
| pypi/[email protected] | None | 0 | 1.26 kB | thunder_007 |

circle-job-mirror bot commented Feb 3, 2025

@soumik12345
Contributor Author

Hi @tssweeney
Can you please review this PR?

from presidio_anonymizer import AnonymizerEngine


class PresidioEntityRecognitionResponse(BaseModel):
Collaborator

@tcapelle @soumik12345 are you going with TypedDict or BaseModel? Either is fine, but I would pick one and be consistent across all the scorers. Maybe you should also add a test to enforce this property.

Collaborator

nit: Prefer the new annotations syntax here

_analyzer: "AnalyzerEngine"
_anonymizer: "AnonymizerEngine"

def __init__(
Collaborator

nit: Isn't Scorer a BaseModel? Can you use the pydantic-style init, or do you need to do it this way?
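If Scorer is a pydantic BaseModel (assumed here, along with pydantic v2), the pydantic-style alternative is to declare the constructor arguments as fields and do engine setup in `model_post_init`. In this sketch the class, `DEFAULT_ENTITIES`, and `_resolved_entities` are hypothetical. Building the Presidio engines is only indicated by a comment so the example runs without presidio installed.

```python
from typing import Any, Optional

from pydantic import BaseModel, PrivateAttr

# Hypothetical default entity list, for illustration only.
DEFAULT_ENTITIES = ["EMAIL_ADDRESS", "PERSON", "PHONE_NUMBER"]


class EntityGuardrailSketch(BaseModel):
    """Hypothetical stand-in for a Scorer subclass, assuming Scorer is a BaseModel."""

    # Former __init__ parameters become validated fields.
    language: str = "en"
    selected_entities: Optional[list[str]] = None
    deny_lists: Optional[dict[str, list[str]]] = None

    # Private attributes are neither validated nor included in model_dump().
    _resolved_entities: list[str] = PrivateAttr(default_factory=list)

    def model_post_init(self, context: Any, /) -> None:
        # Runs after field validation. The real guardrail would construct
        # its AnalyzerEngine / AnonymizerEngine here instead of in __init__.
        self._resolved_entities = list(self.selected_entities or DEFAULT_ENTITIES)
```

Usage would then just be `EntityGuardrailSketch(selected_entities=["PERSON"])`, with no custom `__init__` to keep in sync with the fields.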

deny_lists: Optional[dict[str, list[str]]] = None,
regex_patterns: Optional[dict[str, list[dict[str, str]]]] = None,
custom_recognizers: Optional[list[Any]] = None,
show_available_entities: bool = False,
Collaborator

This seems weird. Why not just have this as a classmethod or a docs page?
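The classmethod route could look roughly like this. It is a sketch that assumes the analyzer exposes presidio's `AnalyzerEngine.get_supported_entities(language=...)`; the class name and the Protocol are hypothetical. With this, listing entities no longer depends on a constructor flag.

```python
from typing import Optional, Protocol


class SupportsSupportedEntities(Protocol):
    """Structural type for objects with a get_supported_entities method."""

    def get_supported_entities(self, language: Optional[str] = None) -> list[str]: ...


class PresidioGuardrailSketch:
    """Hypothetical class; only the classmethod pattern matters here."""

    @classmethod
    def available_entities(
        cls, analyzer: SupportsSupportedEntities, language: str = "en"
    ) -> list[str]:
        # Callable without constructing the guardrail, so no
        # show_available_entities flag is needed on __init__.
        return sorted(analyzer.get_supported_entities(language=language))
```

With presidio installed, a caller would presumably write `PresidioGuardrailSketch.available_entities(AnalyzerEngine())`.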

selected_entities = self.get_available_entities()

# Get available entities dynamically
available_entities = self.get_available_entities()
Collaborator

There's some duplication here: get_available_entities() is already called above to set selected_entities.
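One way to remove the duplication is to fetch the available entities once and derive the selection from that single list. The helper name is hypothetical. Raising on unknown entities is a design choice in this sketch; the PR may prefer to warn and filter instead.

```python
from typing import Optional


def resolve_selected_entities(
    available: list[str], selected: Optional[list[str]] = None
) -> list[str]:
    """Use one precomputed `available` list: default to all of it, else validate."""
    if selected is None:
        return list(available)
    available_set = set(available)
    unknown = [entity for entity in selected if entity not in available_set]
    if unknown:
        raise ValueError(f"Unsupported entities: {unknown}")
    return list(selected)
```

The constructor would then call `get_available_entities()` once and pass the result in.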

self._anonymizer = anonymizer

@weave.op
def group_analyzer_results_by_entity_type(
Collaborator

Is this something that should be an op?
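If the grouping stays an internal helper instead of an op, it can be a plain function. `SpanResult` below is a hypothetical stand-in that mirrors the `entity_type`, `start`, `end`, and `score` attributes of presidio's `RecognizerResult`, so the sketch runs without presidio.

```python
from collections import defaultdict
from typing import NamedTuple


class SpanResult(NamedTuple):
    """Hypothetical stand-in for presidio's RecognizerResult."""

    entity_type: str
    start: int
    end: int
    score: float


def group_spans_by_entity_type(text: str, results: list[SpanResult]) -> dict[str, list[str]]:
    """Plain (non-op) helper: map each entity type to the substrings it matched."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for result in results:
        grouped[result.entity_type].append(text[result.start : result.end])
    return dict(grouped)
```

Only `score` would then carry `@weave.op`, which keeps each trace to one call per scored output.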

return "\n".join(explanation_parts)

@weave.op
def anonymize_text(
Collaborator

Not sure these should all be ops. They seem to be internal helpers?
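To illustrate the anonymization step as a plain helper, here is a simplified stand-in. It does not use presidio's AnonymizerEngine: the `<ENTITY_TYPE>` placeholder format and the `SpanResult` type are assumptions for this sketch, and spans are assumed not to overlap.

```python
from typing import NamedTuple


class SpanResult(NamedTuple):
    """Hypothetical stand-in for presidio's RecognizerResult."""

    entity_type: str
    start: int
    end: int
    score: float


def replace_entities(text: str, results: list[SpanResult]) -> str:
    """Plain helper: replace each detected span with an <ENTITY_TYPE> placeholder.

    Spans are processed from the end of the string backwards, so replacing a
    later span never shifts the start/end offsets of an earlier one.
    """
    for result in sorted(results, key=lambda r: r.start, reverse=True):
        text = text[: result.start] + f"<{result.entity_type}>" + text[result.end :]
    return text
```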

return anonymized_text

@weave.op
def score(self, output: str) -> PresidioEntityRecognitionResponse:
Collaborator

I don't think this signature is correct if you are returning the model dump of it.
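A sketch of keeping the annotation honest. If the response type were a TypedDict, the value returned would be exactly what the annotation says. The type name, its fields, and `build_response` are hypothetical. If the scorer keeps a BaseModel and returns `model_dump()`, the accurate annotation is `dict[str, Any]` instead.

```python
from typing import TypedDict


class EntityRecognitionResult(TypedDict):
    """Hypothetical response shape; field names are illustrative only."""

    flagged: bool
    detected_entities: dict[str, list[str]]


def build_response(detected_entities: dict[str, list[str]]) -> EntityRecognitionResult:
    # Calling a TypedDict produces a plain dict, so the declared return type
    # and the returned value agree without any model_dump() step.
    return EntityRecognitionResult(
        flagged=bool(detected_entities),
        detected_entities=detected_entities,
    )
```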

@weave.op
def score(self, output: str) -> PresidioEntityRecognitionResponse:
analyzer_results = self._analyzer.analyze(
text=str(output), entities=self.selected_entities, language=self.language
Collaborator

Isn't output already a str?


2 participants