Log requests and responses to/from OpenAI's API #25

Merged on Sep 16, 2023 (1 commit)

@yonromai yonromai commented Sep 15, 2023

This PR adds persistence on disk of all actual requests and responses to/from OpenAI's API by default. (I don't know why I didn't implement this earlier - it's been bothering me for a while)
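For readers skimming the thread, here is a minimal sketch of what this kind of on-disk logging can look like. It is not the code merged in this PR: `log_payload`, `with_logging`, and the one-JSON-file-per-call layout are assumptions made for illustration, and `call_api` stands in for whatever function actually talks to the API.

```python
import json
import time
import uuid
from pathlib import Path
from typing import Any, Callable

Payload = dict[str, Any]


def log_payload(log_dir: Path, request: Payload, response: Payload) -> Path:
    """Write one request/response pair to its own JSON file and return its path."""
    log_dir.mkdir(parents=True, exist_ok=True)
    record = {"timestamp": time.time(), "request": request, "response": response}
    # A random file name avoids collisions between concurrent calls (assumed layout).
    path = log_dir / f"{uuid.uuid4().hex}.json"
    path.write_text(json.dumps(record, indent=2, ensure_ascii=False), encoding="utf-8")
    return path


def with_logging(call_api: Callable[[Payload], Payload], log_dir: Path) -> Callable[[Payload], Payload]:
    """Wrap an API-calling function so each call is persisted before returning."""

    def wrapped(request: Payload) -> Payload:
        response = call_api(request)
        log_payload(log_dir, request, response)
        return response

    return wrapped
```

Wrapping the call site rather than the caller keeps the logging on by default, which matches the intent described above.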

@ravwojdyla

it would be worth to take a look at the actual explanations from the current CoT prompts and look for patterns that we could improve, at least that worked well for me in the past. Otherwise we would be doing a "shotgun prompt tuning" ™️ ?

I fetched 50 more samples with the CoT prompt and used the new logging code to extract the prompt and response choices (#1, #2)

Note: If you want the explanations for some specific nodes (e.g. based on worst errors), I can nuke the cache and rerun the tagging for all 500 nodes so we have the CoT explanation for all of them.

(cc: @eric-czech @dhimmel )

@ravwojdyla

@yonromai 🔥 🙏 I was curious about that specific example in #24 (comment)

'efo_definition': 'A viral infectious disease that results_in infection in '
'sheep and rarely humans, has_material_basis_in Louping '
'ill virus, which is transmitted_by sheep tick, Ixodes '
'ricinus. The infection has_symptom lethargy, has_symptom '
'muscle pains, has_symptom fever, and has_symptom focal '
'neurological signs.',
'efo_id': 'EFO:0007348',

But feel free to ignore this request if it's complicated to get that.

@yonromai

@ravwojdyla

But feel free to ignore this request if it's complicated to get that.

No, just had to nuke that puppy from the cache.

(In case useful: link to prompt).

Here's completion 1:

The record describes the term "louping ill". The description indicates that this is a specific viral infectious disease that has defined symptoms and a specific mode of transmission. This disease appears to have a distinct clinical profile. Therefore, I would categorize it as a high precision term. High precision terms tend to represent specific, well-defined conditions with distinguishing clinical characteristics.

<END_OF_COT>
id|precision
EFO:0007348|high

and completion 2:

The disease term given is "louping ill", a viral infectious disease found in sheep and rarely humans. It is caused by the Louping ill virus, which is transmitted by the sheep tick, Ixodes ricinus. The infection can result in symptoms such as lethargy, muscle pains, fever, and focal neurological signs. Due to its specificity in terms of causing organism, transmission vector and symptoms, alongside its limited host range (primarily affecting sheep and very rarely humans), it represents a more definite and specific group of infected individuals. Thereby, it meets the criteria of a high precision term. 

<END_OF_COT>
id|precision
EFO:0007348|high
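Both completions share the same shape: free-text reasoning, an `<END_OF_COT>` marker, then a pipe-delimited `id|precision` table. A minimal sketch of splitting such a completion, assuming that layout holds for every response (`parse_completion` is a hypothetical helper, not something from this repo):

```python
COT_MARKER = "<END_OF_COT>"
EXPECTED_HEADER = "id|precision"


def parse_completion(text: str) -> tuple[str, dict[str, str]]:
    """Split a completion into its explanation and an {id: precision} mapping."""
    explanation, marker, table = text.partition(COT_MARKER)
    if not marker:
        raise ValueError("completion has no <END_OF_COT> marker")
    # Drop blank lines so trailing whitespace in the completion is harmless.
    lines = [line.strip() for line in table.splitlines() if line.strip()]
    if not lines or lines[0] != EXPECTED_HEADER:
        raise ValueError("expected an 'id|precision' header after the marker")
    labels: dict[str, str] = {}
    for line in lines[1:]:
        term_id, precision = line.split("|", 1)
        labels[term_id.strip()] = precision.strip()
    return explanation.strip(), labels


completion = """This disease appears to have a distinct clinical profile.

<END_OF_COT>
id|precision
EFO:0007348|high
"""
explanation, labels = parse_completion(completion)
print(labels)  # {'EFO:0007348': 'high'}
```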

Obvious note: the outcome may differ from the last execution, since the model's output is non-deterministic.

@ravwojdyla

@yonromai nice 🙏! Man, the post-GPT world is going to be very hard to debug... 🤣 Now, is this dramatic change from all low to all high due to non-determinism, CoT, or something else? 🤷‍♂️

@yonromai

😭

I wonder if it's a sign that we should lower the model temperature
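As a rough illustration of why a lower temperature would help here: in softmax-based sampling, dividing the logits by the temperature before normalizing concentrates probability on the top candidate as the temperature drops. The sketch below is generic and makes no claim about the API's internals; the two logits are made-up values standing in for a near tie between "low" and "high".

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Turn logits into sampling probabilities at the given temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive in this sketch")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for the labels "low" and "high" (illustrative values only).
logits = [1.0, 1.2]
for temperature in (1.0, 0.5, 0.1):
    probs = softmax_with_temperature(logits, temperature)
    print(temperature, [round(p, 3) for p in probs])
```

With a near tie like this, repeated runs at a high temperature can plausibly flip between labels, while a low temperature makes the leading label dominate. Lowering it reduces run-to-run variation but may not remove it entirely.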

@ravwojdyla

Important

Please don't let my comments distract you from #13. Feel free to ignore my comments/requests or respond in a week or two :)

@yonromai actually, looking at the notebook where this example came from in #24, it was the original prompt that classified it as all low; we don't know what the CoT prompt produced for this example there. At least here it seems to have done better.

I fetched 50 more samples with the CoT prompt and used the new logging code to extract the prompt and response choices (#1, #2)

It would be cool if this was sorted by the distance from the true label, such that we could focus on the problematic examples.
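One way this sorting could be sketched, assuming the precision labels are ordinal. Only "low" and "high" appear in this thread, so the "medium" level, the record field names, and the placeholder IDs below are all assumptions for illustration:

```python
# Ordinal ranks for the labels; "medium" is an assumed intermediate level.
PRECISION_RANK = {"low": 0, "medium": 1, "high": 2}


def label_distance(predicted: str, true: str) -> int:
    """Absolute number of ordinal steps between predicted and true labels."""
    return abs(PRECISION_RANK[predicted] - PRECISION_RANK[true])


def sort_by_error(records: list[dict[str, str]]) -> list[dict[str, str]]:
    """Return records ordered from worst prediction to best."""
    return sorted(
        records,
        key=lambda r: label_distance(r["predicted"], r["true"]),
        reverse=True,
    )


# Made-up records with placeholder IDs, purely to show the ordering.
records = [
    {"id": "EXAMPLE:1", "predicted": "high", "true": "high"},
    {"id": "EXAMPLE:2", "predicted": "low", "true": "high"},
    {"id": "EXAMPLE:3", "predicted": "medium", "true": "high"},
]
for record in sort_by_error(records):
    print(record["id"], label_distance(record["predicted"], record["true"]))
```

Putting the largest distances first surfaces the problematic examples without reading all of the logged explanations.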

@yonromai

Sounds good. It'd be quick to check, but it's probably wise to postpone it until after the workshop. I'll merge this PR for now.

@yonromai yonromai merged commit cb634ec into main Sep 16, 2023
@yonromai yonromai deleted the romain/log-openai-payloads branch September 16, 2023 00:49
2 participants