-
Notifications
You must be signed in to change notification settings - Fork 75
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(weave): Implement PresidioEntityRecognitionGuardrail
#3575
base: master
Are you sure you want to change the base?
Conversation
New dependencies detected. Learn more about Socket for GitHub ↗︎
|
Preview this PR with FeatureBee: https://beta.wandb.ai/?betaVersion=344b359812816bcf5cdb671a02248483b8c5706b |
Hi @tssweeney |
from presidio_anonymizer import AnonymizerEngine | ||
|
||
|
||
class PresidioEntityRecognitionResponse(BaseModel): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tcapelle @soumik12345 are you going with TypedDict or BaseModel? Either is fine, but I would pick 1 and be consistent for all the scorers. Maybe you should also have a test to enforce this property
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Prefer the new annotations syntax here
_analyzer: "AnalyzerEngine" | ||
_anonymizer: "AnonymizerEngine" | ||
|
||
def __init__( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Isn't Scorer
a BaseModel? Can you use the pydantic-style init, or do you need to do it this way?
deny_lists: Optional[dict[str, list[str]]] = None, | ||
regex_patterns: Optional[dict[str, list[dict[str, str]]]] = None, | ||
custom_recognizers: Optional[list[Any]] = None, | ||
show_available_entities: bool = False, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems weird. Why not just have this as a classmethod or a docs page?
selected_entities = self.get_available_entities() | ||
|
||
# Get available entities dynamically | ||
available_entities = self.get_available_entities() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there's some duplication with the above available_entities
self._anonymizer = anonymizer | ||
|
||
@weave.op | ||
def group_analyzer_results_by_entity_type( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this something that should be an op?
return "\n".join(explanation_parts) | ||
|
||
@weave.op | ||
def anonymize_text( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure if these should all be ops. They seem to be internal helpers?
return anonymized_text | ||
|
||
@weave.op | ||
def score(self, output: str) -> PresidioEntityRecognitionResponse: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this signature is correct if you are returning the model dump of it.
@weave.op | ||
def score(self, output: str) -> PresidioEntityRecognitionResponse: | ||
analyzer_results = self._analyzer.analyze( | ||
text=str(output), entities=self.selected_entities, language=self.language |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isn't output already str
?
Description
Implement the
PresidioEntityRecognitionGuardrail
based on the PresidioEntityRecognitionGuardrail from the safeguards library originally contributed by @ash0ts .Sample Trace