Skip to content

Conversation

@acicovic
Copy link
Contributor

@acicovic acicovic commented Nov 5, 2025

Summary

This PR implements telemetry infrastructure to gather and log statistics for multi-user real-time collaborative editing sessions.

Key Changes

New Telemetry Infrastructure:

  • Implements TelemetryLogger base class providing a standard interface for logging telemetry data to multiple destinations (Logger, Pendo)
  • Implements modular telemetry system under src/telemetry/ for extensibility

Session Statistics Tracking:

  • Implements SessionStats class to manage multi-user session data using Yjs shared state
  • Implements session-specific logger (SessionStatsTelemetryLogger) that gathers multi-user session statistics
  • Sessions are only tracked when 2+ users are present, and logged when 1 user remains

Session Activity Management:

  • Implements SessionActivityManager to coordinate session lifecycle events
  • Monitors user joins/leaves and transitions between single/multi-user states
  • Detects document changes and marks locally active users
  • Implements UserInactivityTimer with 30-minute inactivity timeout (no document changes) to detect idle sessions and trigger session logging
  • Uses debouncing and throttling to avoid redundant updates

Statistics Collected:

  • Post ID
  • Count of maximum users during the session
  • Count of active users (performed edits) vs. inactive users (view-only)
  • Session duration and activity timestamps

Integration:

  • Integrates SessionActivityManager initialization into AwarenessManager lifecycle
  • Statistics stored in Yjs document under session-stats origin map
  • Provides utilities for checking user connections and exporting session stats

Technical Details

Data Integrity and Safeguards:

  • Transaction-based updates to prevent race conditions
  • Graceful degradation when telemetry data cannot be prepared
  • Prevents duplicate initialization and handles reconnection edge cases
  • Prevents race conditions when logging stats, with checks on user activity and session state
  • Uses grace periods and validation to avoid duplicate or incomplete logs during quick user connect/disconnect cycles
  • Uses leader election pattern to prevent duplicate logging from multiple clients
  • Elects a fallback leader when the current leader disconnects

Testing

  • Basic functionality tested across the latest version of Brave, Vivaldi, Safari and Firefox.
  • To test this, simulate a multi-user session locally. You will see output in the console when 2+ users join (session initialization) and when only 1 user remains in the session (final session stats).

Summary of all tested scenarios:

  • Single user sessions (no logging occurs)
  • Leader logs when non-leaders leave
  • Non-leader takes over and logs when leader leaves
  • Inactivity timeout triggers logging with correct stats
  • Activity resumption starts new session
  • Multiple inactivity/activity cycles log correctly
  • Multiple tabs per user handled correctly
  • Computer sleep/wake cycles
  • Short disconnects and reconnections
  • Browser back/forward navigation

Future Work

Pendo integration for telemetry logging will be implemented in a follow-up PR. This will involve creating a WordPress REST API endpoint that receives the telemetry data and forwards it to the VIP Telemetry class.

@acicovic acicovic self-assigned this Nov 5, 2025
@acicovic acicovic marked this pull request as ready for review November 5, 2025 19:06
@acicovic acicovic requested a review from a team as a code owner November 5, 2025 19:06
@acicovic acicovic requested review from chriszarate and removed request for a team November 5, 2025 19:06
}
}

if ( ! awareness?.doc ) {
Copy link
Contributor

@ingeniumed ingeniumed Nov 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be checked right away instead?

* activity, responds to user and document changes, handles inactivity, logs
* session data.
*/
export class SessionActivityController {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick: Feel like calling it a manager is better imo rather than a controller. A controller to me implies rest endpoints that'll be actively called.

Unless there's work planned in a follow up PR to add that in? I assumed the follow up work was to add it on the PHP side.

public static initialize( awareness: Awareness ): void {
if ( SessionActivityController.__instance ) {
if ( SessionActivityController.__instance.awareness?.doc !== awareness.doc ) {
SessionActivityController.destroySingletonInstance();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would an example be of this being called? Awareness Manager is initialized once per client, and then errors subsequently.

This feels like a singleton anti-pattern, if we are re-creating the entire singleton instance when the awareness doc changes? Might be a naming thing, but I wouldn't expect a singleton to be destroyed like this if we are using this pattern unless absolutely necessary?

() => this.handleInactivityTimeout()
);

this.subscribeToUserChanges();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at this, I wonder if the class could be re-done such that:

  • Instantiate it within the constructor of the awareness manager, so their lifecycles match. This means a single clientID at a time would have this class instantiated.
  • Within the existing subscriptions, call in the methods on this class that require those subscriptions. That way, there's one single grouped awareness subscription for the awareness changes to be transmitted to. That was a performance optimization that had been mentioned on some forum posts (don't have the link on hand but I remember running into that as an improvement as more clients join).


try {
const { getActiveUsers } = select( awarenessStore );
const userId = getActiveUsers().get( clientId )?.userInfo?.id ?? null;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can also look for the isMe property, and that'll do the trick.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added an isMe validation check to the user lookup. I kept the direct clientID map access for O(1) performance (this runs on every local document change), but now we validate with isMe before proceeding.

* @param logger The Logger instance to use for logging
*/
constructor( private awareness: Awareness, private logger: Logger ) {
this.sessionMap = this.awareness.doc.getMap( SESSION_STATS_ORIGIN );
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would check awareness?.doc before accessing the getMap on it out of safety 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

awareness.doc is technically guaranteed by the Awareness type definition (it's a required constructor parameter and non-optional property). Although I'm unsure where this could go wrong, I've added a defensive check.

From what I can see, we generally use awareness.doc across the codebase with the only exception being convertRelativePositionToAbsolutePosition() which can be called externally.

/**
* Returns the current post ID.
*/
private getCurrentPostId(): PostId {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we wrap an existing selector offered by Gutenberg?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This wrapper is used in exportStats(), allowing partial success. If getCurrentPostId() fails, we still log the session with postId: null rather than losing all telemetry data. Without it, a selector error would trigger the outer catch and discard the entire export.

* Manages multi-user session statistics using Yjs Awareness for shared state.
*/
export class SessionStats {
private sessionMap: Y.Map< unknown >;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is different than how we are using other maps that we are setting on the doc.

Usually, what we do this within a function rather than at the class level:

  • Use observe or observeDeep to monitor for changes, if that's relevant
  • Use transact to make changes to a map on the doc itself
  • Store the doc in memory like in Manager, and access what's needed on that

Could this way have any issues related to state?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we could have an issue here. doc.getMap() is idempotent and returns a live reference, not a snapshot. Storing it should be safe and avoids repeated lookups. To put this in more words:

  • doc.getMap('key') returns the same instance every time
  • The map reference remains valid for the lifetime of the doc
  • Changes to the map are reflected immediately in the stored reference
  • The map is the actual shared state object, not a snapshot

Let me know if you think I could be missing something.

*
* @returns Array of unique connected user IDs
*/
export function getConnectedUserIds(): number[] {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be worth adding methods that you think are useful, on the awareness store itself so they could be re-used elsewhere?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the context of this PR, I've tried to touch "outside files" the least amount possible. If any of these methods become useful for other parts of the code, we could definitely move them where it makes more sense. I don't know if this holds true currently.

}

AwarenessManager.__instance = new AwarenessManager( awareness, await getCurrentUserInfo() );
SessionActivityController.initialize( awareness );
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels like an odd place to initialize it imo. I feel like it should be in the awareness manager constructor in some way? This is assuming the other comment I made about changing how SessionActivityController is initialized.

@acicovic
Copy link
Contributor Author

acicovic commented Nov 6, 2025

@ingeniumed, thank you for your review.

I've addressed some of your feedback in 3505352 and left comments in the remaining items. Let's keep the discussion going where needed, and let me know if the changes made look good to you or if you have more feedback.

Thanks!

@chriszarate
Copy link
Member

@acicovic This PR is pretty big!

Statistics stored in Yjs document under session-stats origin map

Could you explain why storing the stats in a Yjs doc is useful / necessary? If it's to aggregate stats among peers, could we instead get these answers using aggregation functions when querying the "snapshot-style" events?

@acicovic
Copy link
Contributor Author

acicovic commented Nov 7, 2025

@acicovic This PR is pretty big!

Yeah, I'm sorry about this.

Could you explain why storing the stats in a Yjs doc is useful / necessary?

If you have any particular concern about this implementation, let me know so I can look into it.

The Yjs doc enables us to log one accurate, fully-aggregated event per session.

  • Shared session state: All clients need to track the same user list (allUserIds, activeUserIds) and timing data (sessionTimeStart, sessionTimeLastActivity) as users join/leave and make edits. Without this shared state, each client would only see its own partial view.
  • Leader election: The sessionInitializerClientId stored in the Yjs doc enables leader election so exactly one client logs the final aggregated stats when the session ends. Otherwise, multiple clients would each log conflicting "session ended" events.
  • Consistent inactivity detection: All clients check the shared sessionTimeLastActivity to detect the 30-minute timeout. Without shared state, clients would have different timeout values and log prematurely.
  • Edge case handling: Clients joining mid-session need to see who was there before them. The shared Yjs doc makes this possible.

Potential issues with snapshot events + aggregation:

  • No way to coordinate which client should log the final event
  • Race conditions from multiple clients sending conflicting session-end events
  • Missing data (users who left before you joined)
  • Reconciliation logic required

@acicovic
Copy link
Contributor Author

acicovic commented Dec 1, 2025

Hi, I've pushed some additional commits to this PR. I'm not planning any additional changes, so this is ready for review.

Comment on lines +233 to +240
// Wait for initial sync before initializing awareness.
await new Promise< void >( resolve => {
provider.once( 'sync', ( isSynced: boolean ) => {
if ( isSynced ) {
resolve();
}
} );
} );
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to await this sync event firing?

This feels like a potential bug. What if the sync event doesn't fire?

@chriszarate
Copy link
Member

Potential issues with snapshot events + aggregation:

  • No way to coordinate which client should log the final event
  • Race conditions from multiple clients sending conflicting session-end events
  • Missing data (users who left before you joined)
  • Reconciliation logic required

I don't know what we are trying to measure, but I can't help thinking that this client-side approach is not a good conceptual match for tracking session length. Wouldn't it be much simpler to dispatch these events from the websocket server, which can easily track connected users and "inactivity" based on received messages?

I am very wary of storing statistics in the Yjs doc, since this can only grow and become more complex. In addition, the dependence between the maps risks an infinite loop.

}

if ( this.sessionStatsObserver ) {
const sessionStatsMap = this.awareness.doc.getMap( SESSION_STATS_ORIGIN );
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't an origin

Comment on lines +292 to +293
// Maps that should not contain any content changes.
const excludedMaps = new Set( [
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels likely to lead to future bugs

@acicovic
Copy link
Contributor Author

acicovic commented Dec 2, 2025

Wouldn't it be much simpler to dispatch these events from the websocket server, which can easily track connected users and "inactivity" based on received messages?

I agree that this sounds like a simpler approach. The advantage of the approach proposed in this PR is that since we're hooking into the RTC session itself, it gives us the ability to measure much more, which is something that might come up.

I am very wary of storing statistics in the Yjs doc, since this can only grow and become more complex. In addition, the dependence between the maps risks an infinite loop.

I'll be looking to understand the infinite loop possibility. About the first part, are there any viable alternatives you could suggest? For example, could we have another document? I don't see this working well with REST API endpoints writing the stats to the WP database. Also the stats data we're storing is both very small and simple, though I'm not familiar with the Yjs implications.

@chriszarate
Copy link
Member

it gives us the ability to measure much more, which is something that might come up

I've mentioned the value of working backwards from the questions we need to answer, so that we can define what needs to be measured. Potentially over-engineering to maximize flexibility can lead to unwanted complexity.

About the first part, are there any viable alternatives you could suggest?

I would suggest that telemetry events be shipped by the websocket server, which has access to the clients and document, and controls the lifecycle of the session. I haven't verified this but it should also have access to the user info via the awareness document.

I don't see this working well with REST API endpoints writing the stats to the WP database.

Not sure I understand this.

@acicovic
Copy link
Contributor Author

acicovic commented Dec 2, 2025

I don't see this working well with REST API endpoints writing the stats to the WP database.
Not sure I understand this.

I was alluding to writing the stats into the database instead of the Yjs doc since you raised that concern.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants