refactored debug log for ai-collab #23565
base: main
Conversation
… a few other small adjustments
…from a single initial api call
packages/framework/ai-collab/src/explicit-strategy/debugEvents.ts
…ug log tests since they are integ tests
```diff
 /**
  * {@inheritDoc TokenUsage}
  */
-readonly tokensUsed: TokenUsage;
+tokensUsed: TokenUsage;
```
This feels like it should stay `readonly`. If we need an internal version of the interface where we set the field and change it, I'd say let's copy a helper type like this one to use internally, but keep exposing a `readonly` property.
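For illustration, a minimal sketch of such a helper type (the helper and field names here are hypothetical, not the actual internal ones):

```typescript
// Hypothetical sketch: a mapped type that strips `readonly` so internal code
// can populate fields, while the public interface keeps exposing them readonly.
type Mutable<T> = { -readonly [K in keyof T]: T[K] };

// Field names are assumptions for this sketch.
interface TokenUsage {
	readonly inputTokens: number;
	readonly outputTokens: number;
}

// Internal code builds the object up mutably...
const usage: Mutable<TokenUsage> = { inputTokens: 0, outputTokens: 0 };
usage.inputTokens = 42;

// ...and still hands it out through the readonly interface.
const exposed: TokenUsage = usage;
```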
Updated in commit 68361c5
Seems like it got missed in that commit; the latest on GitHub still doesn't have it.
Double checked and it should be in this time.
```typescript
const sf = new SchemaFactory("TestApp");
class TestAppSchema extends sf.object("TestAppSchema", {
	title: sf.string,
	tasks: sf.array(
		sf.object("Task", {
			title: sf.string,
			description: sf.string,
		}),
	),
}) {}
```
I think a recent change by Craig is already merged, which made all the schemas available as static members of `SchemaFactory`, so we don't need to instantiate it anymore and can just do `SchemaFactory.string`, etc.
I think I'd like to see an important refactor here. Having several tests to validate different parts of the response to the same API call feels kind of weird. I think a single `it` test that goes through the `debugLog` entry by entry and validates what is expected of things would make more sense. It'd basically be telling a story of what we expect to see, step by step. We can break the validation for each "step" into a helper function for readability, but all this really feels like a single test to me.

Also, the `it()` invocation(s) should not need such high timeouts if they're only working on the local data already put in `debugLog` by the `before`.
I have updated the timeouts on the individual `it()` statements and added some clarifying comments in commit 68361c5.

I originally had this as one big test but worried that it would not be as clear to other developers what broke. With the split tests, it's clear from the test failures alone which part of the debug log eventing is not working as expected. If it's all one test, the whole test fails and you have to investigate to see what part went wrong and what's being tested in that segment of the test.

If you feel strongly I can change this.
Each call to `assert.deepStrictEqual()` can provide a failure message in case that specific one fails (example), and that ends up in the test result. I think that's enough to identify the failure point, so I would still push for having a single test.
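For example (a sketch; the event shape is hypothetical):

```typescript
import { deepStrictEqual } from "node:assert";

// Hypothetical log entry; the third argument is the failure message that
// shows up in the test result, pinpointing which assertion broke.
const entry = { eventName: "FINAL_REVIEW_COMPLETED", status: "success" };

deepStrictEqual(
	entry,
	{ eventName: "FINAL_REVIEW_COMPLETED", status: "success" },
	"Final review completion event did not match the expected shape",
);
```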
I have updated the tests to go back to a single large `it()` with some new messages for each assertion.
```typescript
const OPENAI_API_KEY = ""; // DON'T COMMIT THIS

describe.skip("Debug Log Works as expected", () => {
```
Let's leave a comment about why this is disabled (it's an integration test, not a unit test, and we still don't have a great place for those).

That said, I think we should be able to just mock the OpenAI client with Sinon stubs (or similar), so we can make this an actual unit test that runs in milliseconds.
Yeah, maybe we should just go that route. The only issue is that I imagine the stub/mock would be pretty complex when you consider what parameters the OpenAI client should receive versus when to return particular expected tree edits. I've been thinking of holding off on this until we abstract the LLM client.
A simple mock, tailored for a given test, could just return hardcoded responses in a sequence regardless of its inputs.
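A minimal sketch of that idea (names are hypothetical; a Sinon stub configured with `onCall()` could serve the same purpose):

```typescript
// Returns canned responses in order, ignoring its inputs entirely.
function createSequenceStub<T>(responses: readonly T[]): (...args: unknown[]) => T {
	let callIndex = 0;
	return () => {
		if (callIndex >= responses.length) {
			throw new Error("stub received more calls than it has responses for");
		}
		const response = responses[callIndex];
		callIndex += 1;
		return response;
	};
}

// Stand-in for the OpenAI client's completion call in a unit test:
const fakeLlmCall = createSequenceStub([
	"<planning prompt response>",
	"<tree edit response>",
	"<final review response>",
]);
```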
Good point. I have created a separate task to update all these tests to use mocks. Since we are updating the LLM client to the new abstracted design, it'll be more efficient for us to do the mock refactor after the abstracted design is implemented.
…o debugLogRefactor
Co-authored-by: Alex Villarreal <[email protected]>
…ramework into debugLogRefactor
…only for AiCollabErrorResponse, small new usage of SchemaFactory static members in tests and removes individual timeouts from it() tests in deubgLog.spec.ts
… to have one large single test. Updates readme.md debug event names
🔗 Found some broken links! 💔 Run a link check locally to find them. See linkcheck output
It looks good overall now. I left a few things that I think we should do, but could be follow ups.
```typescript
/**
 * An edit generated by an LLM that can be applied to a given SharedTree.
 * @remarks TODO: We need a better solution here because don't want to expose the internal TreeEdit type here, but we need to be able to type it.
```
Since it's more for us than for the consumers.
```diff
- * @remarks TODO: We need a better solution here because don't want to expose the internal TreeEdit type here, but we need to be able to type it.
+ * @privateremarks TODO: We need a better solution here because don't want to expose the internal TreeEdit type here, but we need to be able to type it.
```
### Event flows

1. `CORE_EVENT_LOOP`: All events with this `eventFlowName` are used to mark the start and end of the life cycle of a single execution of the ai-collab function.
   - Events:
     1. `CoreEventLoopStarted`: Events with the `eventName` `CORE_EVENT_LOOP_STARTED`. This event marks the start of the ai-collab function execution life cycle. There will be exactly 1 of these events per ai-collab function execution.
I'm thinking all this documentation should actually go in the TSDoc for the corresponding event. Something like "This event will be emitted exactly once per ai-collab function execution" makes a lot of sense there, more than in the README. It would also make some of the docs redundant (e.g. docs for "this event has eventName `X`" are probably not necessary when one is looking at the interface definition already). Keeping the list of events in each flow here is ok, and maybe we can link to the source file where all the events live for those who want more details. I'd particularly advocate for this to remove the easy-to-miss "**IMPORTANT**: If you change this file make sure the root README.md file is updated to reflect the changes." in the source file.
```typescript
	TReviewResponse = TIsLlmResponseValid extends true ? "yes" | "no" : undefined,
> extends EventFlowDebugEvent {
	eventName: "FINAL_REVIEW_COMPLETED";
	eventFlowName: "FINAL_REVIEW";
```
Nit: I think we use `EventFlowDebugNames.FINAL_REVIEW` (or the corresponding value) for `eventFlowName` across all events?
Draft of refactored debug log for ai-collab. See debugLog.spec.ts for example.
The following diagram shows the sequence of expected debug event emissions

Here is a snippet from the README.md which explains the `DebugLog`:
### Debug Events

This package allows users to consume `DebugEvent`s that can be very helpful in understanding what's going on internally and debugging potential issues. Users can consume these events by providing a callback to the ai-collab function's `debugEventLogHandler` parameter.

There are two types of debug events: `DebugEvent`, which is the core interface used to describe ALL debug events, and `EventFlowDebugEvent`, which is for more specific debug events that mark a progress point in a specific logic flow within a single ai-collab function call.

There are a few primary event flow names:

- `CORE_EVENT_LOOP`: All events with this `eventFlowName` are used to mark the start and end of the life cycle of a single execution of the ai-collab function.
  - `CoreEventLoopStartedDebugEvent`: Events with the `eventName` `CORE_EVENT_LOOP_STARTED`. This event marks the start of the ai-collab function execution life cycle. There will be exactly 1 of these events per ai-collab function execution.
  - `CoreEventLoopCompletedDebugEvent`: Events with the `eventName` `CORE_EVENT_LOOP_COMPLETED`. This event marks the end of the ai-collab function execution life cycle. There will be exactly 1 of these events per ai-collab function execution.
- `GENERATE_PLANNING_PROMPT`: All events with this `eventFlowName` are used to mark the start, end, and outcome of the LLM generating the planning prompt used to assist the LLM in planning how it will edit the SharedTree based on the user ask.
  - `PlanningPromptStartedDebugEvent`: Events with the `eventName` `GENERATE_PLANNING_PROMPT_STARTED`. This event marks the start of the logic flow for generating the planning prompt. There will be exactly 1 of these events per ai-collab function execution.
    - `DebugEvent`s triggered:
      - `LlmApiCallDebugEvent`: In order to generate the planning prompt, a call to the LLM is necessary. This `DebugEvent` captures the request and its raw result from said API call.
  - `PlanningPromptCompletedDebugEvent`: Events with the `eventName` `GENERATE_PLANNING_PROMPT_COMPLETED`. This event marks the end and outcome of the LLM generating the planning prompt. There will be exactly 1 of these events per ai-collab function execution.
- `GENERATE_TREE_EDIT`: All events with this `eventFlowName` are used to mark the start, end, and outcome of the LLM generating a single TreeEdit that will be applied to the tree. It is expected that the LLM will generate multiple of these events when it must generate multiple tree edits to satisfy the user request.
  - `GenerateTreeEditStartedDebugEvent`: Events with the `eventName` `GENERATE_TREE_EDIT_STARTED`. This event marks the start of the logic flow for generating a single tree edit.
    - `DebugEvent`s triggered:
      - `LlmApiCallDebugEvent`: In order to generate a tree edit, a call to the LLM is necessary. This `DebugEvent` captures the request and its raw result from said API call.
  - `GenerateTreeEditCompletedDebugEvent`: Events with the `eventName` `GENERATE_TREE_EDIT_COMPLETED`. This event marks the end and outcome of the LLM generating a single tree edit. Note that if the LLM returns `null` as its edit at this step, it is signaling that it thinks no more edits are necessary.
- `FINAL_REVIEW`: All events with this `eventFlowName` are used to mark the start, end, and outcome of requesting the LLM to review its work and determine whether the user's ask was accomplished or more edits are needed.
  - `FinalReviewStartedDebugEvent`: Events with the `eventName` `FINAL_REVIEW_STARTED`. This event marks the start of the logic flow for requesting that the LLM complete a final review of the edits it has created and decide whether the user's ask was accomplished or more edits are needed. If the LLM thinks more edits are needed, the `GENERATE_TREE_EDIT` flow will start again.
    - `DebugEvent`s triggered:
      - `LlmApiCallDebugEvent`: In order to conduct the final review, a call to the LLM is necessary. This `DebugEvent` captures the request and its raw result from said API call.
  - `FinalReviewCompletedDebugEvent`: Events with the `eventName` `FINAL_REVIEW_COMPLETED`. This event marks the end and outcome of the logic flow for requesting that the LLM complete a final review of the edits it has created.

#### Using Trace IDs

Debug events in ai-collab have two different types of trace IDs:

- `traceId`: This field exists on all debug events and can be used to correlate all debug events that happened in a single execution. Sorting the events by timestamp will show the proper chronological order of the events. Note that the events should already be emitted in chronological order.
- `eventFlowTraceId`: This field exists on all `EventFlowDebugEvent`s and can be used to co
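As a rough sketch of consuming these events, the following buffers events via a handler callback (the handler and event shapes below are assumptions based on the README, not the package's exact API):

```typescript
// Hypothetical minimal DebugEvent shape; the real interface carries more fields.
interface DebugEvent {
	eventName: string;
	traceId: string;
	timestamp: number;
}

// A debugEventLogHandler that simply buffers events for later inspection.
const debugLog: DebugEvent[] = [];
const debugEventLogHandler = (event: DebugEvent): void => {
	debugLog.push(event);
};

// The real aiCollab(...) call would receive this handler; here we simulate emissions.
debugEventLogHandler({ eventName: "CORE_EVENT_LOOP_STARTED", traceId: "t1", timestamp: 1 });
debugEventLogHandler({ eventName: "CORE_EVENT_LOOP_COMPLETED", traceId: "t1", timestamp: 2 });

// Correlate one execution's events via traceId and order them by timestamp.
const runEvents = debugLog
	.filter((e) => e.traceId === "t1")
	.sort((a, b) => a.timestamp - b.timestamp);
```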