Fix metadata processing in chat adapter #2040

Open
wants to merge 7 commits into
base: main

Conversation

gilad12-coder
Contributor

This PR enhances the ChatAdapter implementation with two improvements:

1. Added Field Metadata Processing:

  • Introduced new metadata handling capabilities for numeric field constraints with PERMITTED_CONSTRAINTS
  • Added new utility functions to process field metadata:
    • _format_constraint(): Formats numeric constraints (gt, lt, ge, le, multiple_of, allow_inf_nan) based on the constraints that can be passed in Pydantic's fields class (https://docs.pydantic.dev/latest/concepts/fields/#field-aliases)
    • format_metadata_summary(): Creates human-readable metadata summaries for each field
    • format_metadata_constraints(): Formats field constraints for prompt generation
  • Enhanced field descriptions in prompts to include metadata information, both as a bracketed summary appended to the description and as a sentence-style note on the output placeholder.

Example:
relevance_and_significance (float): The score of the argument based on its relevance and significance.
When evaluating relevance and significance, consider the following questions:

  • How significant is the central claim the argument is trying to establish?
  • How relevant is the claim to the topic?

The score should be a floating-point number between the specified lower and upper bounds. [Metadata: Ge(ge=0.0); Le(le=1.0)] (this is formatted using format_metadata_summary())
[[ ## relevance_and_significance ## ]]
{relevance_and_significance} # note: the value you produce must be a single float value that is greater than or equal to 0.0 and less than or equal to 1.0. (this is formatted using format_metadata_constraints())
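The sentence-style note in the example above could be assembled roughly as follows. This is a minimal sketch, assuming the constraints have already been extracted into a plain dict; the signature of format_metadata_constraints() and the PHRASES table are illustrative assumptions, not the PR's actual code. Only the function name and the constraint phrases come from the PR.

```python
from typing import Any

# Phrases for the numeric constraints named in the PR description.
# The table name and layout are assumptions made for this sketch.
PHRASES = {
    "gt": "greater than {}",
    "lt": "less than {}",
    "ge": "greater than or equal to {}",
    "le": "less than or equal to {}",
    "multiple_of": "a multiple of {}",
}


def format_metadata_constraints(type_name: str, constraints: dict[str, Any]) -> str:
    """Builds the note text from a type name and a constraint dict (assumed signature)."""
    # Keep only constraints we know how to phrase, in the order they were given.
    clauses = [
        PHRASES[name].format(value)
        for name, value in constraints.items()
        if name in PHRASES
    ]
    if not clauses:
        return ""
    return (
        f"the value you produce must be a single {type_name} value that is "
        + " and ".join(clauses)
        + "."
    )


note = format_metadata_constraints("float", {"ge": 0.0, "le": 1.0})
print(note)
```

With the bounds from the example (ge=0.0, le=1.0), the sketch reproduces the wording of the note shown above; a field with no recognized constraints yields an empty string, so nothing would be appended to the placeholder.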

2. Improved Code Documentation and Type Hints:

  • Added comprehensive docstrings to all functions and classes
  • Added proper type hints throughout the file
  • Improved function signatures with explicit type annotations
  • Enhanced error messages for better debugging
  • Added descriptive comments for complex logic sections

Collaborator

@chenmoneygithub left a comment

thanks for the PR!

Both the metadata formatting and docstring improvements are pretty useful, but could you split them into 2 PRs? We also need to add the metadata formatting to the json adapter for consistency.

Contributor

@zbambergerNLP left a comment

Nice work here! I think the functions that are duplicated between the chat and JSON adapters should move to Base to reduce code duplication. Otherwise, most of my comments are formatting nits (mostly in favor of type annotations, which should really help developers whose built-in IDE tools rely on them).

Comment on lines 46 to 44
def __call__(self, lm: LM, lm_kwargs: dict[str, Any], signature: Type[Signature], demos: list[dict[str, Any]], inputs: dict[str, Any]) -> list[dict[str, Any]]:
def __call__(self, lm: LM, lm_kwargs: dict[str, Any], signature: Type[Signature], demos: list[dict[str, Any]],
inputs: dict[str, Any]) -> list[dict[str, Any]]:
Contributor

nit: format as follows:

def __call__(
  self,
  lm: LM,
  lm_kwargs: Dict[str, Any],
  ...
) -> List[Dict[str, Any]]:

Contributor Author

Done

Comment on lines 54 to 55
def format(self, signature: Type[Signature], demos: list[dict[str, Any]], inputs: dict[str, Any]) -> list[
dict[str, Any]]:
Contributor

Apply the format I suggested in the nit above.
I presume the docstring will go in a separate PR.

Contributor Author

Done

@@ -99,7 +85,7 @@ def format(self, signature: Type[Signature], demos: list[dict[str, Any]], inputs
messages = try_expand_image_tags(messages)
return messages

- def parse(self, signature: Signature, completion: str, _parse_values: bool = True):
+ def parse(self, signature: Type[Signature], completion: str) -> dict[str, Any]:
Contributor

nit: Dict instead of dict. Apply here and elsewhere.

Contributor Author

Done

Comment on lines 116 to 117
def format_finetune_data(self, signature: Type[Signature], demos: list[dict[str, Any]], inputs: dict[str, Any],
outputs: dict[str, Any]) -> dict[str, list[Any]]:
Contributor

Revert to the previous formatting of the function signature (multi-line, in the style I suggested in my nits above). Presumably the docstring will go in a separate PR. Note that we should use "Parameters" instead of "Args" (also in the other function definitions where you've added docstrings).

Contributor Author

Done

Comment on lines 140 to 141
def format_turn(self, signature: Type[Signature], values: dict[str, Any], role: str, incomplete: bool = False,
is_conversation_history: bool = False) -> dict[str, Any]:
Contributor

See suggested format for function definitions in my first nit in this file.

Contributor Author

Done

return "\n".join(parts).strip()


- def move_type_to_front(d: Union[Dict, List, Any]) -> Union[Dict, List, Any]:
-     """Moves the 'type' key to the front of the dictionary, recursively, for LLM readability/adherence."""
+ def move_type_to_front(d):
Contributor

Keep the type annotations here IMO. Makes the function easier to parse.
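For readers who have not seen the helper, here is a minimal sketch of what an annotated move_type_to_front could look like. Only the name and the docstring's description come from the diff; the body is an assumption written to match that description, and the annotation is simplified to Any.

```python
from typing import Any


def move_type_to_front(d: Any) -> Any:
    """Returns a copy of `d` with any 'type' key first, applied recursively."""
    if isinstance(d, dict):
        # Emit 'type' first (if present), then the remaining keys in their original order.
        ordered = {"type": d["type"]} if "type" in d else {}
        ordered.update({k: v for k, v in d.items() if k != "type"})
        return {k: move_type_to_front(v) for k, v in ordered.items()}
    if isinstance(d, list):
        return [move_type_to_front(item) for item in d]
    # Scalars (strings, numbers, None) are returned unchanged.
    return d


schema = {"properties": {"score": {"minimum": 0, "type": "number"}}, "type": "object"}
reordered = move_type_to_front(schema)
print(list(reordered))  # ['type', 'properties']
```

The annotations make it visible at the call site that the function accepts nested dicts, lists, or scalars, which is the readability benefit the comment above is asking to keep.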

Contributor Author

Done

- def prepare_schema(type_: Type) -> Dict[str, Any]:
-     """Prepares a JSON schema for a given type."""
-     schema: Dict[str, Any] = pydantic.TypeAdapter(type_).json_schema()
+ def prepare_schema(field_type):
Contributor

Keep type annotations.

Contributor Author

Done

parts = []
parts.append("Your input fields are:\n" + enumerate_fields(signature.input_fields))
parts.append("Your output fields are:\n" + enumerate_fields(signature.output_fields))
parts.append("All interactions will be structured in the following way, with the appropriate values filled in.")

- def field_metadata(field_name: str, field_info: FieldInfo) -> str:
-     """Creates a formatted representation of a field's information and metadata."""
+ def field_metadata(field_name, field_info):
Contributor

Keep type annotations.

Contributor Author

done

Comment on lines +153 to +162
def _format_constraint(name: str, value: Union[str, float]) -> str:
    constraints = {
        'gt': f"greater than {value}",
        'lt': f"less than {value}",
        'ge': f"greater than or equal to {value}",
        'le': f"less than or equal to {value}",
        'multiple_of': f"a multiple of {value}",
        'allow_inf_nan': "allows infinite and NaN values" if value else "no infinite or NaN values allowed"
    }
    return constraints.get(name, f"{name}={value}")
Contributor

Worth moving these into the adapter base class so that subclasses can inherit them.
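The inheritance idea in this comment can be sketched as follows. All class and method names here are hypothetical stand-ins rather than the project's real classes; the helper is trimmed to two constraints and keeps the lookup-with-fallback behaviour of _format_constraint from the diff.

```python
class AdapterBaseSketch:
    """Hypothetical base class; stands in for whatever the adapters share."""

    @staticmethod
    def format_constraint(name: str, value: object) -> str:
        # Same lookup-with-fallback idea as _format_constraint in the diff,
        # trimmed to two constraints for brevity.
        phrases = {
            "ge": f"greater than or equal to {value}",
            "le": f"less than or equal to {value}",
        }
        return phrases.get(name, f"{name}={value}")


class ChatAdapterSketch(AdapterBaseSketch):
    def note(self, name: str, value: object) -> str:
        # Chat-style rendering: a trailing comment on the placeholder line.
        return f"# note: the value must be {self.format_constraint(name, value)}."


class JSONAdapterSketch(AdapterBaseSketch):
    def note(self, name: str, value: object) -> str:
        # JSON-style rendering: the same phrase inside a key/value pair.
        return f'"note": "the value must be {self.format_constraint(name, value)}"'


print(ChatAdapterSketch().note("ge", 0.0))
print(JSONAdapterSketch().note("le", 1.0))
```

Both subclasses reuse one phrasing routine, so a wording change only has to be made in one place.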

"""
Formats the constraints for a field.

Args:
Contributor

nit: Replace with Parameters:.

Contributor Author

Done

Collaborator

@chenmoneygithub left a comment

Thanks for the PR!

Two pieces of high-level feedback:

  • We should move the constraint formatting logic into dspy/signatures/field.py, since it does not depend on the adapter type. We don't want to add new complexity to the adapter, which is already unnecessarily complex.
  • We are in the middle of an effort to standardize the style, and we use the Google style guide as the guideline: https://google.github.io/styleguide/pyguide.html#383-functions-and-methods. Most of the formatting in this PR doesn't follow the rules there.
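For reference, a function documented in the style the linked guide describes might look like the sketch below. The function itself is a hypothetical illustration tied to the 0.0 to 1.0 bounds from the PR description, not code from the PR; the point is the docstring layout, with a one-line summary followed by Args, Returns, and Raises sections.

```python
def is_within_bounds(value: float, lower: float = 0.0, upper: float = 1.0) -> bool:
    """Checks whether a score lies in the closed interval [lower, upper].

    Args:
        value: The score to check.
        lower: Inclusive lower bound (the `ge` constraint).
        upper: Inclusive upper bound (the `le` constraint).

    Returns:
        True if lower <= value <= upper, otherwise False.

    Raises:
        ValueError: If lower is greater than upper.
    """
    if lower > upper:
        raise ValueError(f"lower ({lower}) must not exceed upper ({upper})")
    return lower <= value <= upper


print(is_within_bounds(0.42))
print(is_within_bounds(1.5))
```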

lm_kwargs: Dict[str, Any],
signature: Type[Signature],
demos: List[Dict[str, Any]],
inputs: Dict[str, Any]
Collaborator

Missing trailing comma at the end. Also, don't use the capitalized Dict; built-in types like dict are preferred: https://google.github.io/styleguide/pyguide.html#221-type-annotated-code
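Applied to the signature under discussion, the two requests (a trailing comma after the last parameter, and built-in generics instead of typing.Dict and typing.List) could look like this sketch. LM and Signature are empty placeholder classes so the block runs on its own, AdapterSketch is a hypothetical name, and type[Signature] is used as the built-in counterpart of Type[Signature]; built-in generics in annotations require Python 3.9 or later.

```python
from typing import Any


class LM:
    """Placeholder for the real language-model class."""


class Signature:
    """Placeholder for the real signature class."""


class AdapterSketch:
    def __call__(
        self,
        lm: LM,
        lm_kwargs: dict[str, Any],
        signature: type[Signature],
        demos: list[dict[str, Any]],
        inputs: dict[str, Any],  # trailing comma keeps future diffs to one line
    ) -> list[dict[str, Any]]:
        """Returns an empty result; only the signature layout matters here."""
        return []


result = AdapterSketch()(LM(), {}, Signature, [], {})
print(result)
```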

Contributor Author

Done

signature: Type[Signature],
demos: List[Dict[str, Any]],
inputs: Dict[str, Any]
) -> List[
Collaborator

Same here

Contributor Author

Done

@@ -118,10 +140,15 @@ def format_finetune_data(self, signature: Type[Signature], demos: list[dict[str,
assistant_message = self.format_turn(signature, outputs, role, incomplete)
messages.append(assistant_message)

- # Wrap the messages in a dictionary with a "messages" key
+ # Wrap the messages in a Dictionary with a "messages" key
Collaborator

no need to capitalize this

Contributor Author

Done

Args:
fields_with_values: A dictionary mapping information about a field to its corresponding
value.
Parameters:
Collaborator

Contributor Author

Done

@@ -228,26 +267,96 @@ def type_info(v):
return {"role": role, "content": joined_messages}


def enumerate_fields(fields: dict) -> str:
def _format_constraint(name: str, value: Union[str, float]) -> str:
Collaborator

We can move this function and other related ones into dspy/signatures/field and import in the adapter to keep adapter code simple.

3 participants