Enhance dspy.Refine with support for Hard/Soft Constraint Handling #8031
base: main
Conversation
Heads Up: Be aware that the LLM sometimes struggles to reliably format the complex
Looks excellent! Though it's worth retaining the per-submodule approach of OfferFeedback rather than globalizing its scope to the entire program. Perhaps worth adding a TODO to add support for a global-level feedback signature and allowing users to select which they'd prefer. Also worth allowing support for boolean thresholds/rewards, and specifying clearly that rewards/verifier scores are bounded in the range [0, 1].
dspy/predict/refine.py
Outdated
```python
feedback: str = OutputField(
    desc=(
        "Provide concrete and actionable feedback for the module to improve"
        " its output on similar inputs in the future. Focus on how to"
        " achieve a score >= the threshold. If the module met the"
        " threshold, write N/A."
    )
)
```
We may want this to remain a `dict[str, str]` and maintain the functionality of `advice` -- notably the breakdown of feedback to specific sub-modules within the program. The existing description seems good for evoking feedback from the LLM, but we should ask for this type of feedback for every sub-module, not the program as a whole.
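A sketch of the dict-typed field this suggests (the field description wording is illustrative, not the final text):

```python
import dspy

class OfferFeedback(dspy.Signature):
    # ... other input/output fields elided ...
    feedback: dict[str, str] = dspy.OutputField(
        desc=(
            "Mapping from each sub-module name to concrete, actionable feedback"
            " for improving that sub-module's output on similar inputs."
        )
    )
```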
When calling a module that wraps the `OfferFeedback` signature, you'll likely need to format the dictionary into a single string, which will in turn be formatted into the `previous_attempts` field in `Refine`.
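A minimal sketch of that flattening step (function name and formatting are illustrative):

```python
def format_feedback(feedback: dict[str, str]) -> str:
    # Flatten per-sub-module feedback into one string for the `previous_attempts` field.
    return "\n".join(f"[{name}] {advice}" for name, advice in feedback.items())

print(format_feedback({
    "retrieve": "Narrow the search query to the entity in the question.",
    "generate_answer": "Cite the retrieved passage explicitly.",
}))
```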
Done
dspy/predict/refine.py
Outdated
```
In the discussion, analyze why the module failed to meet the desired score threshold based on its inputs and outputs.
Consider the trajectory, reward function, and target score. Assign blame if applicable.
Then, prescribe concrete, actionable advice for how the module should modify its behavior on its future input
when we retry the process, especially if it receives the same or similar inputs.
The module will not see its own history directly, only the feedback you provide.
Focus on specific examples from the trajectory and clear instructions for improvement.
```
We should still aim to offer feedback at the per-submodule level of the program. I would retain the existing functionality of `OfferFeedback` wherever possible, perhaps adjusting the wording of the descriptions slightly to guide how the LM should provide feedback.
Done
dspy/predict/refine.py
Outdated
```python
assert isinstance(
    threshold, (float, int)
), "`threshold` must be a numerical value."
```
Nit: can likely fit on one line. Worth supporting booleans as well.
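For instance (a sketch; note that `bool` is a subclass of `int` in Python, so booleans already pass the numeric check, but naming the type makes the intent explicit):

```python
assert isinstance(threshold, (bool, int, float)), "`threshold` must be a boolean or numerical value."
```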
Done
Done
dspy/predict/refine.py
Outdated
```
history of previous outputs, scores, and feedback via the `previous_attempts` input field.

Example:
>>> import dspy
```
This style breaks on mkdocs (our doc site); let's use the same style as dspy/dspy/streaming/streamify.py, line 51 in 5cd355b (`Example:`).
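A sketch of the mkdocs-friendly pattern presumably intended here (a fenced block under `Example:` rather than doctest-style `>>>` prompts; check the referenced file for the exact convention):

````python
import dspy

class Refine(dspy.Module):
    """Refines a module's prediction over multiple attempts.

    Example:

    ```python
    import dspy
    best_of_3 = dspy.Refine(module=dspy.Predict("question -> answer"), N=3)
    ```
    """
````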
Done
````
result = best_of_3(question="What is the capital of Belgium?").answer
# Returns: Brussels
```
signature (Signature): The DSPy signature defining inputs/outputs for the module being refined.
````
nit: just 4 spaces of indentation after the line break instead of vertical alignment.
Done
@gilad12-coder is this gonna go into 3.0? Anything blocking it? Thanks for your work!
Description:

This PR introduces a significantly enhanced `dspy.Refine` module, replacing the previous implementation with a more robust, flexible, and controllable system for iteratively improving module predictions.

Motivation:

The previous `Refine` module primarily focused on retrying a module with varying temperatures and generating LLM-based feedback when a `reward_fn` score fell below a `threshold`. While useful, it offered little control beyond the number of attempts `N` and the reward `threshold`.
Changes Introduced:

The new `Refine` module addresses these limitations by introducing several key features:

Validators (Hard Constraints):

- New `validators` argument: a list of functions `(Prediction) -> (bool, str)`.
- When a validator returns `False`, the accompanying string provides specific feedback on the failure. This feedback is fed back to the module on retries (via a `previous_attempts` input field within a signature); see the sketch below.
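A minimal sketch of how validators might be supplied (the exact constructor wiring is an assumption based on this description, not the final API):

```python
import dspy

# Hypothetical hard constraints: each returns (passed, feedback_on_failure).
def non_empty_answer(pred) -> tuple[bool, str]:
    return bool(pred.answer.strip()), "The `answer` field must not be empty."

def under_twenty_words(pred) -> tuple[bool, str]:
    return len(pred.answer.split()) <= 20, "Keep the answer under 20 words."

refined = dspy.Refine(
    module=dspy.Predict("question -> answer"),
    N=3,
    validators=[non_empty_answer, under_twenty_words],
)
```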
Distinct Handling of Constraints:

- Hard constraints (`validators`): must pass for a prediction to be considered valid. Failure triggers retries with specific error messages.
- Soft constraints (`reward_fn`, `threshold`): evaluated only if validators pass (or if no validators are provided). Falling below the threshold can trigger retries aimed at improving quality, potentially using LLM-generated feedback (`OfferFeedback`).
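A self-contained sketch of this hard-then-soft evaluation order (all names are illustrative; a plain dict stands in for `Prediction`):

```python
from typing import Callable

def evaluate(pred: dict,
             validators: list[Callable[[dict], tuple[bool, str]]],
             reward_fn: Callable[[dict], float],
             threshold: float) -> tuple[bool, list[str]]:
    # Hard constraints first: collect feedback from every failing validator.
    feedback: list[str] = []
    for validator in validators:
        ok, message = validator(pred)
        if not ok:
            feedback.append(message)
    if feedback:
        return False, feedback  # retry, carrying the validator error messages
    # Soft constraint, checked only after all validators pass.
    score = reward_fn(pred)
    if score < threshold:
        return False, [f"Score {score:.2f} fell below threshold {threshold:.2f}."]
    return True, []
```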
Improved Retry Logic & State Management:

- Retries up to `N` times with varying temperatures.
- Tracks `best_prediction` based on a clear hierarchy, including `reward_fn` scores (if applicable).
- Uses `dspy.Predict` or `dspy.ChainOfThought` (controlled by `use_cot`) to handle retries when validators are used, incorporating the `previous_attempts` log directly into the signature (a sketch follows below).
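A sketch of what a retry signature carrying that log might look like (field names and wording are illustrative, not the PR's actual signature):

```python
import dspy

class RetryQA(dspy.Signature):
    """Answer the question, improving on the previous failed attempts."""

    question: str = dspy.InputField()
    previous_attempts: str = dspy.InputField(
        desc="Log of prior outputs, their scores, and the feedback they received."
    )
    answer: str = dspy.OutputField()
```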
Enhanced Configuration & Control:

- `verbose`: enables detailed logging of the refinement process (validation checks, reward scores, feedback).
- `fail_on_invalid`: if `True`, raises a `ValueError` if no prediction meets all constraints (validators and reward threshold) after `N` attempts. If `False` (default), returns the best prediction found according to the hierarchy above.
- `use_cot`: allows using Chain-of-Thought for the internal prediction steps during refinement when validator feedback is being processed.
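Putting the knobs together (a sketch; parameter names follow this description, and the reward function is hypothetical):

```python
import dspy

def concise_reward(inputs, pred) -> float:
    # Hypothetical reward bounded in [0, 1]: prefer answers of ten words or fewer.
    return 1.0 if len(pred.answer.split()) <= 10 else 0.3

refined = dspy.Refine(
    module=dspy.Predict("question -> answer"),
    N=5,
    reward_fn=concise_reward,
    threshold=0.8,
    verbose=True,          # log validation checks, reward scores, and feedback
    fail_on_invalid=True,  # raise ValueError if nothing passes after N attempts
    use_cot=True,          # use Chain-of-Thought for internal retry predictions
)
```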
Observability:

- The returned `Prediction` object includes a `Refine_metadata` attribute containing details about the refinement process (iterations, success status, final reward, validation status, attempts log).
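For example, continuing the sketch above and assuming `Refine_metadata` on the returned prediction (its exact shape isn't specified here):

```python
pred = refined(question="What is the capital of Belgium?")
print(pred.answer)           # e.g. "Brussels"
print(pred.Refine_metadata)  # iterations, success status, final reward, validation status, attempts log
```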
Breaking Changes:

- The `fail_count` parameter is removed, replaced by `fail_on_invalid`.