Change the output interface of evaluate #8003
Conversation
chenmoneygithub left a comment
Solid work! LGTM with one minor comment.
Let's talk offline about the potential breakage and align on the release schedule.
Please merge this PR so that we can get the individual example-level evaluation scores. This will be useful for MLflow tracing.

Hi @okhat, could you review this PR when you have a moment? Thanks!
dspy/primitives/prediction.py (Outdated)
```python
def __eq__(self, other):
    if isinstance(other, (float, int)):
        return self.__float__() == other
    elif isinstance(other, Prediction):
```
Hmm, this is really dangerous! It's a really bad idea. Why did we add this?
This was added mainly for two reasons:
- To support the existing logic that compares the equality of evaluation outputs. Users may also write `eval(program_1) == eval(program_2)` in their code.
- Completeness of the comparison operators on `dspy.Prediction`. It is inconsistent if `>=` works for `dspy.Prediction` but `==` does not.
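
As an illustration of the comparison pattern described above (a minimal sketch, not the code added in this PR; the `ScoredResult` class and its `_score` attribute are placeholders), score-backed equality and ordering can both delegate to `__float__`:

```python
class ScoredResult:
    """Toy stand-in for a Prediction-like object backed by a numeric score."""

    def __init__(self, score):
        self._score = score  # placeholder attribute; the real class derives its score differently

    def __float__(self):
        return float(self._score)

    def __eq__(self, other):
        # Support both `result == 0.8` and `eval(program_1) == eval(program_2)`.
        if isinstance(other, (float, int)):
            return float(self) == other
        if isinstance(other, ScoredResult):
            return float(self) == float(other)
        return NotImplemented

    def __ge__(self, other):
        # Keep the operators consistent: if `>=` works, `==` should too.
        if isinstance(other, (float, int)):
            return float(self) >= other
        if isinstance(other, ScoredResult):
            return float(self) >= float(other)
        return NotImplemented


a, b = ScoredResult(0.82), ScoredResult(0.82)
assert a == b and a == 0.82 and a >= 0.8
```

The reviewer's concern presumably stems from overriding equality on a general-purpose container: delegating strictly to the numeric score means two results with different outputs but equal scores compare equal.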
In this PR, we change the output interface of `Evaluate.__call__`. Instead of returning either `score`, `(score, outputs)`, or `(score, scores, outputs)` depending on the arguments, it will always return an `EvaluationResult` that bundles the overall score together with the per-example outputs and scores.
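
As a rough sketch of the new calling convention (the `EvaluationResult` stand-in below and its `score`/`results` fields are assumptions for illustration, not necessarily the exact fields shipped in this PR):

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class EvaluationResult:
    """Stand-in for the new return type; the field names here are assumptions."""
    score: float                            # aggregate metric over the devset
    results: list[tuple[Any, Any, float]]   # per-example (example, prediction, score) records


def evaluate_program() -> EvaluationResult:
    """Placeholder for `Evaluate.__call__(program)` under the new interface."""
    per_example = [("example_1", "prediction_1", 1.0), ("example_2", "prediction_2", 0.0)]
    aggregate = sum(s for *_, s in per_example) / len(per_example)
    return EvaluationResult(score=aggregate, results=per_example)


result = evaluate_program()
# Previously, callers had to branch on whether they received a float or a tuple.
# Now a single object always exposes both the aggregate and the per-example scores.
print(result.score)         # 0.5
print(len(result.results))  # 2
```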
Since this is a breaking change, it should ship in the next minor release rather than a patch release.
Resolve mlflow/mlflow#15476