-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix metadata processing in chat adapter #2040
base: main
Are you sure you want to change the base?
Fix metadata processing in chat adapter #2040
Conversation
… the prompt and added docstring throughtout the file
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for the PR!
Both the metadata formatting and docstring improvements are pretty useful, but could you split them into 2 PRs? We also need to add the metadata formatting to the json adapter for consistency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work here! I think some functions that are duplicated between the chat and json adapter should move to Base in order to reduce code duplication. Otherwise, most of my comments are formatting nits (mostly in favor of type annotations, which should really help developers that have built in IDE tools that use these).
dspy/adapters/chat_adapter.py
Outdated
def __call__(self, lm: LM, lm_kwargs: dict[str, Any], signature: Type[Signature], demos: list[dict[str, Any]], inputs: dict[str, Any]) -> list[dict[str, Any]]: | ||
def __call__(self, lm: LM, lm_kwargs: dict[str, Any], signature: Type[Signature], demos: list[dict[str, Any]], | ||
inputs: dict[str, Any]) -> list[dict[str, Any]]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: format as follows:
def __call__(
self,
lm: LM,
lm_kwargs: Dict[str, Any],
...
) -> List[Dict[str, Any]]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
dspy/adapters/chat_adapter.py
Outdated
def format(self, signature: Type[Signature], demos: list[dict[str, Any]], inputs: dict[str, Any]) -> list[ | ||
dict[str, Any]]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apply format I suggested in nit above.
Docstring I presume will go in a separate PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
dspy/adapters/chat_adapter.py
Outdated
@@ -99,7 +85,7 @@ def format(self, signature: Type[Signature], demos: list[dict[str, Any]], inputs | |||
messages = try_expand_image_tags(messages) | |||
return messages | |||
|
|||
def parse(self, signature: Signature, completion: str, _parse_values: bool = True): | |||
def parse(self, signature: Type[Signature], completion: str) -> dict[str, Any]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Dict instead of dict. Apply here and elsewhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
dspy/adapters/chat_adapter.py
Outdated
def format_finetune_data(self, signature: Type[Signature], demos: list[dict[str, Any]], inputs: dict[str, Any], | ||
outputs: dict[str, Any]) -> dict[str, list[Any]]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Revert to previous formatting of the function definition signature (multi-line and in the style I suggested in my nits above). Presumably docstring will go in separate PR. Note that we should use "Parameters" instead of "Args" (also in other function definitions where you'd added docstrings).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
dspy/adapters/chat_adapter.py
Outdated
def format_turn(self, signature: Type[Signature], values: dict[str, Any], role: str, incomplete: bool = False, | ||
is_conversation_history: bool = False) -> dict[str, Any]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See suggested format for function definitions in my first nit in this file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
dspy/adapters/chat_adapter.py
Outdated
return "\n".join(parts).strip() | ||
|
||
|
||
def move_type_to_front(d: Union[Dict, List, Any]) -> Union[Dict, List, Any]: | ||
"""Moves the 'type' key to the front of the dictionary, recursively, for LLM readability/adherence.""" | ||
def move_type_to_front(d): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Keep the type annotations here IMO. Makes the function easier to parse.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
dspy/adapters/chat_adapter.py
Outdated
def prepare_schema(type_: Type) -> Dict[str, Any]: | ||
"""Prepares a JSON schema for a given type.""" | ||
schema: Dict[str, Any] = pydantic.TypeAdapter(type_).json_schema() | ||
def prepare_schema(field_type): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Keep type annotations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
dspy/adapters/chat_adapter.py
Outdated
parts = [] | ||
parts.append("Your input fields are:\n" + enumerate_fields(signature.input_fields)) | ||
parts.append("Your output fields are:\n" + enumerate_fields(signature.output_fields)) | ||
parts.append("All interactions will be structured in the following way, with the appropriate values filled in.") | ||
|
||
def field_metadata(field_name: str, field_info: FieldInfo) -> str: | ||
"""Creates a formatted representation of a field's information and metadata.""" | ||
def field_metadata(field_name, field_info): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Keep type annotations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
def _format_constraint(name: str, value: Union[str, float]) -> str: | ||
constraints = { | ||
'gt': f"greater than {value}", | ||
'lt': f"less than {value}", | ||
'ge': f"greater than or equal to {value}", | ||
'le': f"less than or equal to {value}", | ||
'multiple_of': f"a multiple of {value}", | ||
'allow_inf_nan': "allows infinite and NaN values" if value else "no infinite or NaN values allowed" | ||
} | ||
return constraints.get(name, f"{name}={value}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Worth moving these into the base class for adapter for the sake of inheritance.
dspy/adapters/chat_adapter.py
Outdated
""" | ||
Formats the constraints for a field. | ||
|
||
Args: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Replace with Parameters:
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR!
Two high level feedbacks:
- We should move the constraint formatting logic into dspy/signatures/field.py since it is irrelevant to adapter types. We don't want to introduce new complexity to adapter, which is already unnecessarily complex now.
- We are in the middle of an effort to standardize the style, and we use Google style guide as the guideline: https://google.github.io/styleguide/pyguide.html#383-functions-and-methods. Most of formatting in the PR doesn't follow the rules there.
lm_kwargs: Dict[str, Any], | ||
signature: Type[Signature], | ||
demos: List[Dict[str, Any]], | ||
inputs: Dict[str, Any] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
missing comma at the end, also don't use capital case Dict
, using primitive types like dict
is preferred: https://google.github.io/styleguide/pyguide.html#221-type-annotated-code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
signature: Type[Signature], | ||
demos: List[Dict[str, Any]], | ||
inputs: Dict[str, Any] | ||
) -> List[ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
@@ -118,10 +140,15 @@ def format_finetune_data(self, signature: Type[Signature], demos: list[dict[str, | |||
assistant_message = self.format_turn(signature, outputs, role, incomplete) | |||
messages.append(assistant_message) | |||
|
|||
# Wrap the messages in a dictionary with a "messages" key | |||
# Wrap the messages in a Dictionary with a "messages" key |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no need to capitalize this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
Args: | ||
fields_with_values: A dictionary mapping information about a field to its corresponding | ||
value. | ||
Parameters: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Revert this, Args is the standard: https://google.github.io/styleguide/pyguide.html#383-functions-and-methods
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
@@ -228,26 +267,96 @@ def type_info(v): | |||
return {"role": role, "content": joined_messages} | |||
|
|||
|
|||
def enumerate_fields(fields: dict) -> str: | |||
def _format_constraint(name: str, value: Union[str, float]) -> str: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can move this function and other related ones into dspy/signatures/field
and import in the adapter to keep adapter code simple.
This PR enhances the ChatAdapter implementation with two improvements:
1. Added Field Metadata Processing:
example:
relevance_and_significance (float): The score of the argument based on its relevance and significance.
When evaluating relevance and significance, consider the following questions:
The score should be a floating-point number between the specified lower and upper bounds. [Metadata: Ge(ge=0.0); Le(le=1.0)] (this is formatted using format_metadata_summary())
[[ ## relevance_and_significance ## ]]
{relevance_and_significance} # note: the value you produce must be a single float value that is greater than or equal to 0.0 and less than or equal to 1.0. (this is formatted using format_metadata_constraints())
2. Improved Code Documentation and Type Hints: