
Enhance dspy.Refine with support for Hard/Soft Constraint Handling #8031


Open
wants to merge 11 commits into base: main

Conversation

gilad12-coder
Contributor

Description:

This PR introduces a significantly enhanced dspy.Refine module, replacing the previous implementation with a more robust, flexible, and controllable system for iteratively improving module predictions.

Motivation:

The previous Refine module primarily focused on retrying a module with varying temperatures and generating LLM-based feedback when a reward_fn score fell below a threshold. While useful, it lacked:

  1. Support for Hard Constraints: No built-in mechanism to enforce programmatic validation rules (e.g., output format, specific content requirements) beyond the soft constraints of a reward function.
  2. Granular Feedback: Feedback was solely LLM-generated based on reward scores, lacking the ability to provide direct, programmatic feedback for specific validation failures.
  3. Configurability: Limited options to control behavior beyond N attempts and the reward threshold.

Changes Introduced:

The new Refine module addresses these limitations by introducing several key features (a usage sketch follows this list):

  1. Validators (Hard Constraints):

    • Accepts an optional validators argument: a list of functions (Prediction) -> (bool, str).
    • Each validator checks the prediction. If it returns False, the accompanying string provides specific feedback on the failure.
    • Failed validations trigger retry attempts with the collected feedback messages incorporated into the prompt context (as the previous_attempts input field within a signature).
  2. Distinct Handling of Constraints:

    • Hard Constraints (validators): Must pass for a prediction to be considered valid. Failure triggers retries with specific error messages.
    • Soft Constraints (reward_fn, threshold): Evaluated only if validators pass (or if no validators are provided). Falling below the threshold can trigger retries aimed at improving quality, potentially using LLM-generated feedback (OfferFeedback).
  3. Improved Retry Logic & State Management:

    • Retries up to N times with varying temperatures.
    • Tracks the best_prediction based on a clear hierarchy:
      • Prefers predictions passing all validators.
      • Among those passing validators, prefers higher reward_fn scores (if applicable).
      • Among those failing validators, prefers predictions with fewer validation errors.
    • Uses an internal dspy.Predict or dspy.ChainOfThought (controlled by use_cot) to handle retries when validators are used, incorporating the previous_attempts log directly into the signature.
  4. Enhanced Configuration & Control:

    • verbose: Enables detailed logging of the refinement process (validation checks, reward scores, feedback).
    • fail_on_invalid: If True, raises a ValueError if no prediction meets all constraints (validators and reward threshold) after N attempts. If False (default), returns the best prediction found according to the hierarchy above.
    • use_cot: Allows using Chain-of-Thought for the internal prediction steps during refinement when validator feedback is being processed.
  5. Observability:

    • The returned Prediction object includes a Refine_metadata attribute containing details about the refinement process (iterations, success status, final reward, validation status, attempts log).
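
For illustration, here is a minimal usage sketch based on the parameters described above. Argument names and ordering are taken from this description and may differ slightly from the final implementation; the reward function shape follows the existing Refine API.

```python
import dspy

# Hypothetical example of the enhanced Refine described above.
# Assumes an LM is already configured, e.g. dspy.configure(lm=dspy.LM(...)).

def has_citation(pred) -> tuple[bool, str]:
    """Hard constraint: the answer must include a bracketed citation."""
    ok = "[" in pred.answer and "]" in pred.answer
    return ok, "" if ok else "Answer is missing a bracketed citation such as [1]."

def conciseness_reward(args, pred) -> float:
    """Soft constraint: shorter answers score higher (kept within [0, 1])."""
    return max(0.0, 1.0 - len(pred.answer.split()) / 100)

qa = dspy.ChainOfThought("question -> answer")

refined = dspy.Refine(
    module=qa,
    N=3,                        # retry up to 3 times with varying temperatures
    reward_fn=conciseness_reward,
    threshold=0.5,
    validators=[has_citation],  # hard constraints, checked before the reward
    use_cot=True,               # Chain-of-Thought for the internal retry steps
    verbose=True,               # log validation checks, rewards, and feedback
    fail_on_invalid=False,      # return the best attempt instead of raising
)

result = refined(question="Who introduced the transformer architecture?")
print(result.answer)
print(result.Refine_metadata)   # iterations, success status, attempts log, ...
```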

Breaking Changes:

  • The constructor signature has changed significantly. Users migrating will need to update their initialization calls (see the sketch below).
  • The core behavior is different due to the introduction of validators and the prioritized execution flow (validators first, then reward).
  • The fail_count parameter is removed, replaced by fail_on_invalid.
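
A rough before/after of the constructor call, to give migrating users a sense of the change. The "before" keywords follow the current release; the "after" keywords follow the description above and should be treated as tentative. `qa`, `reward`, and `my_validator` are placeholders.

```python
# Before: the current dspy.Refine, driven purely by reward_fn/threshold.
refine = dspy.Refine(module=qa, N=3, reward_fn=reward, threshold=0.7, fail_count=1)

# After (this PR, tentative): fail_count is removed; validators and
# fail_on_invalid cover hard constraints and failure behavior instead.
refine = dspy.Refine(
    module=qa,
    N=3,
    reward_fn=reward,
    threshold=0.7,
    validators=[my_validator],   # my_validator: (Prediction) -> (bool, str)
    fail_on_invalid=True,        # raise ValueError if nothing passes after N attempts
)
```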

@gilad12-coder gilad12-coder changed the title Enhanced dspy.Refine with Validators, Granular Control, and Improved Feedback Enhance dspy.Refine with Validators for Flexible Hard/Soft Constraint Handling Mar 30, 2025
@gilad12-coder gilad12-coder changed the title Enhance dspy.Refine with Validators for Flexible Hard/Soft Constraint Handling Enhance dspy.Refine with support for Hard/Soft Constraint Handling Mar 30, 2025
@gilad12-coder
Contributor Author

Heads up: the LLM sometimes struggles to reliably format the complex advice dictionary requested by the OfferFeedback signature (used in Refine). There's a try-except fallback to "N/A", but it means the generated feedback might not always be effective. See _get_feedback in Refine.
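
The fallback is roughly of this shape (a paraphrased sketch, not the PR's exact code; attribute and field names here are illustrative):

```python
def _get_feedback(self, **feedback_inputs):
    # Paraphrased sketch only: the real _get_feedback in this PR may differ.
    try:
        # Ask the OfferFeedback-backed module for structured advice.
        advice = self.feedback_module(**feedback_inputs).advice
        return str(advice)
    except Exception:
        # The LM sometimes fails to produce the structured advice reliably,
        # so degrade to "N/A" rather than aborting the refinement loop.
        return "N/A"
```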

Contributor

@zbambergerNLP left a comment


Looks excellent! Though it's worth retaining the per-submodule approach of OfferFeedback rather than globalizing its scope to the entire program. Perhaps add a TODO to support a global-level feedback signature and let users select which they'd prefer. Also worth supporting Boolean thresholds/rewards, and clearly specifying that rewards/verifier scores are bounded in the range [0, 1].

Comment on lines 38 to 44
feedback: str = OutputField(
    desc=(
        "Provide concrete and actionable feedback for the module to improve"
        " its output on similar inputs in the future. Focus on how to"
        " achieve a score >= the threshold. If the module met the"
        " threshold, write N/A."
    )
Contributor


We may want this to remain a dict[str, str] and maintain the functionality of advice -- notably the breakdown of feedback by specific sub-module within the program. The existing description seems good for eliciting feedback from the LLM, but we should ask for this type of feedback for every sub-module, not the program as a whole.

Contributor


When calling a module that wraps the OfferFeedback signature, you'll likely need to format the dictionary into a single string, which is then in turn formatted into the previous_attempts field in Refine.
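
Something along these lines should work (illustrative helper, not code from this PR):

```python
def format_advice(advice: dict[str, str]) -> str:
    """Flatten per-module advice into one string for the `previous_attempts` field."""
    return "\n".join(f"[{module}] {feedback}" for module, feedback in advice.items())

# format_advice({"retrieve": "Cite the passage id.", "answer": "Be more concise."})
# -> "[retrieve] Cite the passage id.\n[answer] Be more concise."
```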

Contributor Author


Done

Comment on lines 17 to 22
In the discussion, analyze why the module failed to meet the desired score threshold based on its inputs and outputs.
Consider the trajectory, reward function, and target score. Assign blame if applicable.
Then, prescribe concrete, actionable advice for how the module should modify its behavior on its future input
when we retry the process, especially if it receives the same or similar inputs.
The module will not see its own history directly, only the feedback you provide.
Focus on specific examples from the trajectory and clear instructions for improvement.
Contributor


We should still aim to offer feedback at the per-submodule level of the program. I would retain the existing functionality of OfferFeedback wherever possible, perhaps slightly adjusting the wording of the descriptions for how the LM should provide feedback.

Contributor Author


Done

Comment on lines 204 to 206
assert isinstance(
    threshold, (float, int)
), "`threshold` must be a numerical value."
Contributor


Nit: can likely fit on one line. Worth supporting booleans as well.

Contributor Author


Done

Contributor Author


Done

history of previous outputs, scores, and feedback via the `previous_attempts` input field.

Example:
>>> import dspy
Collaborator


This style breaks on mkdocs (our doc site); let's use the same style as:

Contributor Author


Done

result = best_of_3(question="What is the capital of Belgium?").answer
# Returns: Brussels
```
signature (Signature): The DSPy signature defining inputs/outputs for the module being refined.
Collaborator


nit: just a 4-space indent after the line break instead of vertical alignment.

Contributor Author


Done

@AriMKatz

AriMKatz commented May 11, 2025

@gilad12-coder is this gonna go into 3.0? Anything blocking it? Thanks for your work!
