-
Notifications
You must be signed in to change notification settings - Fork 1
Add collaborative session statistics #103
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: trunk
Are you sure you want to change the base?
Conversation
| } | ||
| } | ||
|
|
||
| if ( ! awareness?.doc ) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be checked right away instead?
| * activity, responds to user and document changes, handles inactivity, logs | ||
| * session data. | ||
| */ | ||
| export class SessionActivityController { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick: Feel like calling it a manager is better imo rather than a controller. A controller to me implies rest endpoints that'll be actively called.
Unless there's work planned in a follow up PR to add that in? I assumed the follow up work was to add it on the PHP side.
| public static initialize( awareness: Awareness ): void { | ||
| if ( SessionActivityController.__instance ) { | ||
| if ( SessionActivityController.__instance.awareness?.doc !== awareness.doc ) { | ||
| SessionActivityController.destroySingletonInstance(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What would an example be of this being called? Awareness Manager is initialized once per client, and then errors subsequently.
This feels like a singleton anti-pattern, if we are re-creating the entire singleton instance when the awareness doc changes? Might be a naming thing, but I wouldn't expect a singleton to be destroyed like this if we are using this pattern unless absolutely necessary?
| () => this.handleInactivityTimeout() | ||
| ); | ||
|
|
||
| this.subscribeToUserChanges(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at this, I wonder if the class could be re-done such that:
- Instantiate it within the constructor of the awareness manager, so their lifecycles match. This means a single clientID at a time would have this class instantiated.
- Within the existing subscriptions, call in the methods on this class that require those subscriptions. That way, there's one single grouped awareness subscription for the awareness changes to be transmitted to. That was a performance optimization that had been mentioned on some forum posts (don't have the link on hand but I remember running into that as an improvement as more clients join).
|
|
||
| try { | ||
| const { getActiveUsers } = select( awarenessStore ); | ||
| const userId = getActiveUsers().get( clientId )?.userInfo?.id ?? null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can also look for the isMe property, and that'll do the trick.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added an isMe validation check to the user lookup. I kept the direct clientID map access for O(1) performance (this runs on every local document change), but now we validate with isMe before proceeding.
| * @param logger The Logger instance to use for logging | ||
| */ | ||
| constructor( private awareness: Awareness, private logger: Logger ) { | ||
| this.sessionMap = this.awareness.doc.getMap( SESSION_STATS_ORIGIN ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would check awareness?.doc before accessing the getMap on it out of safety 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
awareness.doc is technically guaranteed by the Awareness type definition (it's a required constructor parameter and non-optional property). Although I'm unsure where this could go wrong, I've added a defensive check.
From what I can see, we generally use awareness.doc across the codebase with the only exception being convertRelativePositionToAbsolutePosition() which can be called externally.
| /** | ||
| * Returns the current post ID. | ||
| */ | ||
| private getCurrentPostId(): PostId { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we wrap an existing selector offered by Gutenberg?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This wrapper is used in exportStats(), allowing partial success. If getCurrentPostId() fails, we still log the session with postId: null rather than losing all telemetry data. Without it, a selector error would trigger the outer catch and discard the entire export.
| * Manages multi-user session statistics using Yjs Awareness for shared state. | ||
| */ | ||
| export class SessionStats { | ||
| private sessionMap: Y.Map< unknown >; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is different than how we are using other maps that we are setting on the doc.
Usually, what we do this within a function rather than at the class level:
- Use observe or observeDeep to monitor for changes, if that's relevant
- Use transact to make changes to a map on the doc itself
- Store the doc in memory like in Manager, and access what's needed on that
Could this way have any issues related to state?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we could have an issue here. doc.getMap() is idempotent and returns a live reference, not a snapshot. Storing it should be safe and avoids repeated lookups. To put this in more words:
doc.getMap('key')returns the same instance every time- The map reference remains valid for the lifetime of the doc
- Changes to the map are reflected immediately in the stored reference
- The map is the actual shared state object, not a snapshot
Let me know if you think I could be missing something.
| * | ||
| * @returns Array of unique connected user IDs | ||
| */ | ||
| export function getConnectedUserIds(): number[] { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be worth adding methods that you think are useful, on the awareness store itself so they could be re-used elsewhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the context of this PR, I've tried to touch "outside files" the least amount possible. If any of these methods become useful for other parts of the code, we could definitely move them where it makes more sense. I don't know if this holds true currently.
src/awareness-manager.ts
Outdated
| } | ||
|
|
||
| AwarenessManager.__instance = new AwarenessManager( awareness, await getCurrentUserInfo() ); | ||
| SessionActivityController.initialize( awareness ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels like an odd place to initialize it imo. I feel like it should be in the awareness manager constructor in some way? This is assuming the other comment I made about changing how SessionActivityController is initialized.
|
@ingeniumed, thank you for your review. I've addressed some of your feedback in 3505352 and left comments in the remaining items. Let's keep the discussion going where needed, and let me know if the changes made look good to you or if you have more feedback. Thanks! |
|
@acicovic This PR is pretty big!
Could you explain why storing the stats in a Yjs doc is useful / necessary? If it's to aggregate stats among peers, could we instead get these answers using aggregation functions when querying the "snapshot-style" events? |
Yeah, I'm sorry about this.
If you have any particular concern about this implementation, let me know so I can look into it. The Yjs doc enables us to log one accurate, fully-aggregated event per session.
Potential issues with snapshot events + aggregation:
|
…ion expiration handling
|
Hi, I've pushed some additional commits to this PR. I'm not planning any additional changes, so this is ready for review. |
| // Wait for initial sync before initializing awareness. | ||
| await new Promise< void >( resolve => { | ||
| provider.once( 'sync', ( isSynced: boolean ) => { | ||
| if ( isSynced ) { | ||
| resolve(); | ||
| } | ||
| } ); | ||
| } ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need to await this sync event firing?
This feels like a potential bug. What if the sync event doesn't fire?
I don't know what we are trying to measure, but I can't help thinking that this client-side approach is not a good conceptual match for tracking session length. Wouldn't it be much simpler to dispatch these events from the websocket server, which can easily track connected users and "inactivity" based on received messages? I am very wary of storing statistics in the Yjs doc, since this can only grow and become more complex. In addition, the dependence between the maps risks an infinite loop. |
| } | ||
|
|
||
| if ( this.sessionStatsObserver ) { | ||
| const sessionStatsMap = this.awareness.doc.getMap( SESSION_STATS_ORIGIN ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't an origin
| // Maps that should not contain any content changes. | ||
| const excludedMaps = new Set( [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels likely to lead to future bugs
I agree that this sounds like a simpler approach. The advantage of the approach proposed in this PR is that since we're hooking into the RTC session itself, it gives us the ability to measure much more, which is something that might come up.
I'll be looking to understand the infinite loop possibility. About the first part, are there any viable alternatives you could suggest? For example, could we have another document? I don't see this working well with REST API endpoints writing the stats to the WP database. Also the stats data we're storing is both very small and simple, though I'm not familiar with the Yjs implications. |
I've mentioned the value of working backwards from the questions we need to answer, so that we can define what needs to be measured. Potentially over-engineering to maximize flexibility can lead to unwanted complexity.
I would suggest that telemetry events be shipped by the websocket server, which has access to the clients and document, and controls the lifecycle of the session. I haven't verified this but it should also have access to the user info via the awareness document.
Not sure I understand this. |
I was alluding to writing the stats into the database instead of the Yjs doc since you raised that concern. |
Summary
This PR implements telemetry infrastructure to gather and log statistics for multi-user real-time collaborative editing sessions.
Key Changes
New Telemetry Infrastructure:
TelemetryLoggerbase class providing a standard interface for logging telemetry data to multiple destinations (Logger, Pendo)src/telemetry/for extensibilitySession Statistics Tracking:
SessionStatsclass to manage multi-user session data using Yjs shared stateSessionStatsTelemetryLogger) that gathers multi-user session statisticsSession Activity Management:
SessionActivityManagerto coordinate session lifecycle eventsUserInactivityTimerwith 30-minute inactivity timeout (no document changes) to detect idle sessions and trigger session loggingStatistics Collected:
Integration:
SessionActivityManagerinitialization intoAwarenessManagerlifecyclesession-statsorigin mapTechnical Details
Data Integrity and Safeguards:
Testing
Summary of all tested scenarios:
Future Work
Pendo integration for telemetry logging will be implemented in a follow-up PR. This will involve creating a WordPress REST API endpoint that receives the telemetry data and forwards it to the VIP Telemetry class.