refactored debug log for ai-collab #23565
base: main
Conversation
… a few other small adjustments
…from a single initial api call
packages/framework/ai-collab/src/explicit-strategy/debugEvents.ts
…ug log tests since they are integ tests
```diff
 /**
  * {@inheritDoc TokenUsage}
  */
-readonly tokensUsed: TokenUsage;
+tokensUsed: TokenUsage;
```
This feels like it should stay `readonly`. If we need an internal version of the interface where we set the field and change it, I'd say let's copy a helper type like this one to use internally, but keep exposing a `readonly` property.
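For illustration, a minimal sketch of such a helper type (the helper and field names here are hypothetical, not the actual internal ones):

```typescript
// Hypothetical sketch: a mapped type that strips `readonly` so internal code
// can populate fields, while the public interface keeps exposing them readonly.
type Mutable<T> = { -readonly [K in keyof T]: T[K] };

// Field names are assumptions for this sketch.
interface TokenUsage {
	readonly inputTokens: number;
	readonly outputTokens: number;
}

// Internal code builds the object up mutably...
const usage: Mutable<TokenUsage> = { inputTokens: 0, outputTokens: 0 };
usage.inputTokens = 42;

// ...and still hands it out through the readonly interface.
const exposed: TokenUsage = usage;
```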
Updated in commit 68361c5
Seems like it got missed in that commit; the latest on GitHub still doesn't have it.
Double checked and it should be in this time.
```typescript
const sf = new SchemaFactory("TestApp");
class TestAppSchema extends sf.object("TestAppSchema", {
	title: sf.string,
	tasks: sf.array(
		sf.object("Task", {
			title: sf.string,
			description: sf.string,
		}),
	),
}) {}
```
I think a recent change by Craig is already merged, which made all the schemas available as static members of `SchemaFactory`, so we don't need to instantiate it anymore and can just do `SchemaFactory.string`, etc.
I think I'd like to see an important refactor here. Having several tests to validate different parts of the response to the same API call feels kind of weird. I think a single `it` test that goes through the `debugLog` entry by entry and validates what is expected of things would make more sense. It'd basically be telling a story of what we expect to see, step by step. We can break the validation for each "step" into a helper function for readability, but all this really feels like a single test to me.

Also, the `it()` invocation(s) should not need such high timeouts if they're only working on the local data already put in `debugLog` by the `before`.
I have updated the timeouts on the individual `it()` statements and added some clarifying comments in commit 68361c5.

I originally had this as one big test but worried that it would not be as clear to other developers what broke. With the split tests, it's clear from the test failures alone which part of the debug log eventing is not working as expected. If it's all one test, the whole test fails and you have to investigate to see what part went wrong and what's being tested in that segment of the test.

If you feel strongly I can change this.
Each call to `assert.deepStrictEqual()` can provide a failure message in case that specific one fails (example), and that ends up in the test result. I think that's enough to identify the failure point, so I would still push for having a single test.
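For example (a sketch; the event shape is hypothetical):

```typescript
import { deepStrictEqual } from "node:assert";

// Hypothetical log entry; the third argument is the failure message that
// shows up in the test result, pinpointing which assertion broke.
const entry = { eventName: "FINAL_REVIEW_COMPLETED", status: "success" };

deepStrictEqual(
	entry,
	{ eventName: "FINAL_REVIEW_COMPLETED", status: "success" },
	"Final review completion event did not match the expected shape",
);
```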
I have updated the tests to go back to a single large `it()` with some new messages for each assertion.
```typescript
const OPENAI_API_KEY = ""; // DON'T COMMIT THIS

describe.skip("Debug Log Works as expected", () => {
```
Let's leave a comment about why this is disabled (it's an integration test, not a unit test, and we still don't have a great place for those).

That said, I think we should be able to just mock the OpenAI client with Sinon stubs (or similar), so we can make this an actual unit test that runs in milliseconds.
Yeah, maybe we should just go that route. The only issue is that I imagine the stub/mock would be pretty complex when you consider what parameters the OpenAI client should receive versus when to return particular expected tree edits. I've been thinking of holding off on this until we abstract the LLM client.
A simple mock, tailored for a given test, could just return hardcoded responses in a sequence regardless of its inputs.
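A minimal sketch of that idea (names are hypothetical; a Sinon stub configured with `onCall()` could serve the same purpose):

```typescript
// Returns canned responses in order, ignoring its inputs entirely.
function createSequenceStub<T>(responses: readonly T[]): (...args: unknown[]) => T {
	let callIndex = 0;
	return () => {
		if (callIndex >= responses.length) {
			throw new Error("stub received more calls than it has responses for");
		}
		const response = responses[callIndex];
		callIndex += 1;
		return response;
	};
}

// Stand-in for the OpenAI client's completion call in a unit test:
const fakeLlmCall = createSequenceStub([
	"<planning prompt response>",
	"<tree edit response>",
	"<final review response>",
]);
```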
Good point. I have created a separate task to update all these tests to use mocks. Since we are updating the LLM client to the new abstracted design, it'll be more efficient for us to do the mock refactor after the abstracted design is implemented.
…o debugLogRefactor
Co-authored-by: Alex Villarreal <[email protected]>
…ramework into debugLogRefactor
…only for AiCollabErrorResponse, small new usage of SchemaFactory static members in tests and removes individual timeouts from it() tests in deubgLog.spec.ts
… to have one large single test. Updates readme.md debug event names
🔗 Found some broken links! 💔 Run a link check locally to find them. See linkcheck output
It looks good overall now. I left a few things that I think we should do, but could be follow ups.
```typescript
/**
 * An edit generated by an LLM that can be applied to a given SharedTree.
 * @remarks TODO: We need a better solution here because don't want to expose the internal TreeEdit type here, but we need to be able to type it.
```
Since it's more for us than for the consumers.
```diff
- * @remarks TODO: We need a better solution here because don't want to expose the internal TreeEdit type here, but we need to be able to type it.
+ * @privateremarks TODO: We need a better solution here because don't want to expose the internal TreeEdit type here, but we need to be able to type it.
```
### Event flows

1. `CORE_EVENT_LOOP`: All events with this `eventFlowName` are used to mark the start and end of the life cycle of a single execution of the ai-collab function.
   - Events:
     1. `CoreEventLoopStarted`: Events with the `eventName` `CORE_EVENT_LOOP_STARTED`. This event marks the start of the ai-collab function execution life cycle. There will be exactly 1 of these events per ai-collab function execution.
I'm thinking all this documentation should actually go in the TSDoc for the corresponding event. Something like "This event will be emitted exactly once per ai-collab function execution" makes a lot of sense there, more than in the README. It would also make some of the docs redundant (e.g. docs for "this event has eventName `X`" are probably not necessary when one is looking at the interface definition already). Keeping the list of events in each flow here is ok, and maybe we can link to the source file where all the events live for those who want more details. I'd particularly advocate for this to remove the easy-to-miss "**IMPORTANT**: If you change this file make sure the root README.md file is updated to reflect the changes." in the source file.
```typescript
	TReviewResponse = TIsLlmResponseValid extends true ? "yes" | "no" : undefined,
> extends EventFlowDebugEvent {
	eventName: "FINAL_REVIEW_COMPLETED";
	eventFlowName: "FINAL_REVIEW";
```
Nit: I think we use `EventFlowDebugNames.FINAL_REVIEW` (or the corresponding value) for `eventFlowName` across all events?
Draft of refactored debug log for ai-collab. See debugLog.spec.ts for example.
The following diagram shows the sequence of expected debug event emissions

Here is a snippet from the README.md which explains the `DebugLog`:
### Debug Events

This package allows users to consume `DebugEvent`s that can be very helpful in understanding what's going on internally and debugging potential issues. Users can consume these events by providing a callback to the ai-collab function's `debugEventLogHandler` parameter.

There are two types of debug events: `DebugEvent`, which is the core interface used to describe ALL debug events, and `EventFlowDebugEvent`, which is for more specific debug events that mark a progress point in a specific logic flow within a single ai-collab function call.

There are a few primary event flow names:

- `CORE_EVENT_LOOP`: All events with this `eventFlowName` are used to mark the start and end of the life cycle of a single execution of the ai-collab function.
  - `CoreEventLoopStartedDebugEvent`: Events with the `eventName` `CORE_EVENT_LOOP_STARTED`. This event marks the start of the ai-collab function execution life cycle. There will be exactly 1 of these events per ai-collab function execution.
  - `CoreEventLoopCompletedDebugEvent`: Events with the `eventName` `CORE_EVENT_LOOP_COMPLETED`. This event marks the end of the ai-collab function execution life cycle. There will be exactly 1 of these events per ai-collab function execution.
- `GENERATE_PLANNING_PROMPT`: All events with this `eventFlowName` are used to mark the start, end, and outcome of the LLM generating the planning prompt used to assist the LLM in planning how it will edit the SharedTree based on the user ask.
  - `PlanningPromptStartedDebugEvent`: Events with the `eventName` `GENERATE_PLANNING_PROMPT_STARTED`. This event marks the start of the logic flow for generating the planning prompt. There will be exactly 1 of these events per ai-collab function execution.
    - `DebugEvent`s triggered:
      - `LlmApiCallDebugEvent`: In order to generate the planning prompt, a call to the LLM is necessary. This `DebugEvent` captures the request and its raw result from said API call.
  - `PlanningPromptCompletedDebugEvent`: Events with the `eventName` `GENERATE_PLANNING_PROMPT_COMPLETED`. This event marks the end and outcome of the LLM generating the planning prompt. There will be exactly 1 of these events per ai-collab function execution.
- `GENERATE_TREE_EDIT`: All events with this `eventFlowName` are used to mark the start, end, and outcome of the LLM generating a single TreeEdit that will be applied to the tree. It is expected that the LLM will generate multiple of these events when it must generate multiple tree edits to satisfy the user request.
  - `GenerateTreeEditStartedDebugEvent`: Events with the `eventName` `GENERATE_TREE_EDIT_STARTED`. This event marks the start of the logic flow for generating a single tree edit.
    - `DebugEvent`s triggered:
      - `LlmApiCallDebugEvent`: In order to generate a tree edit, a call to the LLM is necessary. This `DebugEvent` captures the request and its raw result from said API call.
  - `GenerateTreeEditCompletedDebugEvent`: Events with the `eventName` `GENERATE_TREE_EDIT_COMPLETED`. This event marks the end and outcome of the LLM generating a single tree edit. Note that if the LLM returns `null` as its edit at this step, it is signaling that it thinks no more edits are necessary.
- `FINAL_REVIEW`: All events with this `eventFlowName` are used to mark the start, end, and outcome of requesting the LLM to review its work and determine whether the user's ask was accomplished or more edits are needed.
  - `FinalReviewStartedDebugEvent`: Events with the `eventName` `FINAL_REVIEW_STARTED`. This event marks the start of the logic flow for requesting that the LLM complete a final review of the edits it has created and decide whether the user's ask was accomplished or more edits are needed. If the LLM thinks more edits are needed, the `GENERATE_TREE_EDIT` flow will start again.
    - `DebugEvent`s triggered:
      - `LlmApiCallDebugEvent`: In order to conduct the final review, a call to the LLM is necessary. This `DebugEvent` captures the request and its raw result from said API call.
  - `FinalReviewCompletedDebugEvent`: Events with the `eventName` `FINAL_REVIEW_COMPLETED`. This event marks the end and outcome of the logic flow for requesting that the LLM complete a final review of the edits it has created.

#### Using Trace IDs

Debug events in ai-collab have two different types of trace IDs:

- `traceId`: This field exists on all debug events and can be used to correlate all debug events that happened in a single execution. Sorting the events by timestamp will show the proper chronological order of the events. Note that the events should already be emitted in chronological order.
- `eventFlowTraceId`: This field exists on all `EventFlowDebugEvent`s and can be used to co
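As a rough sketch of consuming these events, the following buffers events via a handler callback (the handler and event shapes below are assumptions based on the README, not the package's exact API):

```typescript
// Hypothetical minimal DebugEvent shape; the real interface carries more fields.
interface DebugEvent {
	eventName: string;
	traceId: string;
	timestamp: number;
}

// A debugEventLogHandler that simply buffers events for later inspection.
const debugLog: DebugEvent[] = [];
const debugEventLogHandler = (event: DebugEvent): void => {
	debugLog.push(event);
};

// The real aiCollab(...) call would receive this handler; here we simulate emissions.
debugEventLogHandler({ eventName: "CORE_EVENT_LOOP_STARTED", traceId: "t1", timestamp: 1 });
debugEventLogHandler({ eventName: "CORE_EVENT_LOOP_COMPLETED", traceId: "t1", timestamp: 2 });

// Correlate one execution's events via traceId and order them by timestamp.
const runEvents = debugLog
	.filter((e) => e.traceId === "t1")
	.sort((a, b) => a.timestamp - b.timestamp);
```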