feat(telemetry): add device ID logging COMPASS-8443 #2411

Merged
merged 17 commits into from
May 1, 2025

Conversation

gagik
Contributor

@gagik gagik commented Mar 19, 2025

To standardize identification across DevTools and Atlas CLI, this introduces a native-machine-id dependency. It uses the same underlying system calls to determine the device ID as https://github.com/denisbrodbeck/machineid, the library the Atlas CLI uses for its device ID.

Both libraries use hashing to protect the machine-specific information.
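
As a rough illustration of the hashing step, here is a minimal sketch assuming an HMAC-SHA256 scheme keyed by the raw machine ID over an application-specific string. The exact scheme each library uses should be checked against its source; `hashMachineId`, the sample ID, and the `example-app` string are hypothetical and not taken from this PR.

```typescript
import { createHmac } from 'node:crypto';

/**
 * Hypothetical helper: derives a stable, non-reversible identifier from a raw
 * machine ID. Assumption: HMAC-SHA256 keyed with the raw ID, over an app name.
 */
function hashMachineId(rawMachineId: string, appId: string): string {
  return createHmac('sha256', rawMachineId).update(appId).digest('hex');
}

// Placeholder values for illustration only.
const hashed = hashMachineId('0123ABCD-EXAMPLE', 'example-app');
console.log(hashed.length); // 64 (SHA-256 digest as hex characters)
```

The same inputs always produce the same output, which is what would make the ID comparable across tools, provided both sides agree on the raw ID format and the hashed string.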

We could use anonymousId as deviceId, as done in the Atlas CLI, but the concerns are: a) this transition may break many existing user associations we have, and b) if for whatever reason the device ID cannot be determined on a given OS, we'd lose anonymousId altogether. Therefore this instead adds a new identity field for Segment.

We're using our own native-machine-id library for this as opposed to node-machine-id as:

  • The node library isn't actively maintained.
  • It depends on spawning child processes that call OS-specific functions, which can be troublesome.
  • In niche OS environments that may not provide the system calls it relies on, its behavior may be unpredictable.

To Do:

  • Verify the hashed device ID is the same as the Atlas Device ID.

@gagik gagik requested a review from addaleax March 19, 2025 13:17
@gagik gagik force-pushed the gagik/add-device-id branch from f83cffc to 09eca29 Compare March 24, 2025 11:41
@gagik gagik force-pushed the gagik/add-device-id branch 4 times, most recently from 0a9e5a0 to 95d93a5 Compare March 25, 2025 09:41
@gagik
Contributor Author

gagik commented Mar 25, 2025

@addaleax I ended up sticking with device ID for reasons I mentioned in Alternative Considerations in the PR description, mainly easier adaptability from existing Atlas CLI data.

I also re-organized the logic so now we have 2 parallel buffers, one for bus events in general and one for telemetry events in particular, which hold events until the device ID is resolved. Let me know if you have any thoughts about either of those.
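
For readers following along, the buffering idea can be sketched roughly as below. This is a simplified, single-buffer illustration and not the code in this PR; `PendingTelemetryBuffer`, `TelemetryEvent`, and the `send` callback are hypothetical names.

```typescript
type TelemetryEvent = { name: string; properties?: Record<string, unknown> };
type IdentifiedEvent = TelemetryEvent & { deviceId: string };

/** Hypothetical buffer: holds events until the device ID promise settles. */
class PendingTelemetryBuffer {
  private pending: TelemetryEvent[] = [];
  private deviceId: string | undefined;
  private readonly send: (event: IdentifiedEvent) => void;

  constructor(
    deviceIdPromise: Promise<string>,
    send: (event: IdentifiedEvent) => void
  ) {
    this.send = send;
    void deviceIdPromise
      .catch(() => 'unknown') // fall back rather than dropping events
      .then((id) => {
        this.deviceId = id;
        // Replay everything queued so far, in arrival order.
        for (const event of this.pending.splice(0)) {
          this.send({ ...event, deviceId: id });
        }
      });
  }

  /** Queues the event while the ID is unresolved, otherwise sends it directly. */
  track(event: TelemetryEvent): void {
    if (this.deviceId === undefined) {
      this.pending.push(event);
    } else {
      this.send({ ...event, deviceId: this.deviceId });
    }
  }
}
```

Events tracked before resolution are replayed in order once the ID is known; later events go straight through.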

try {
  this.deviceId ??= await Promise.race([
    getDeviceId(),
    new Promise<string>((resolve) => {
Contributor Author

@gagik gagik Mar 25, 2025

This may not be necessary; I am just not sure whether there's an issue with leaving getDeviceId() "running" while we're flushing the events. Does it get killed? My guess is that this doesn't matter, but I don't want to end up delaying the shell exit because of it.

Contributor

I think it makes sense. Like you said probably not strictly necessary, but nice to have. Personally I'd leave a comment in this code with this reasoning for our future selves, but consider that a nit.
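
To make the reasoning concrete, the race pattern discussed here could look roughly like the sketch below. It assumes the second promise resolves with a fallback when an abort signal fires (the excerpt above is truncated, so that part is a guess); `resolveOrAbort` and its parameters are hypothetical names. Note that `Promise.race` only stops waiting for the lookup; it does not cancel it.

```typescript
/**
 * Hypothetical helper: resolves with the looked-up ID, or with `fallback` as
 * soon as `signal` aborts, whichever happens first.
 */
async function resolveOrAbort(
  lookup: () => Promise<string>,
  signal: AbortSignal,
  fallback = 'unknown'
): Promise<string> {
  if (signal.aborted) return fallback;

  let resolveAborted: (value: string) => void = () => {};
  const aborted = new Promise<string>((resolve) => {
    resolveAborted = resolve;
  });
  const onAbort = () => resolveAborted(fallback);
  signal.addEventListener('abort', onAbort, { once: true });

  try {
    // A failed lookup is treated the same way as an aborted one.
    return await Promise.race([lookup().catch(() => fallback), aborted]);
  } finally {
    // Avoid leaving a listener behind once the race has settled.
    signal.removeEventListener('abort', onAbort);
  }
}
```

A caller that is about to flush and exit could abort the signal first, so the shell never waits on a slow lookup.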

@@ -1206,11 +1207,14 @@ export class CliRepl implements MongoshIOProvider {
   * @param code The user-provided exit code, if any.
   */
  async exit(code?: number): Promise<never> {
    this.loggingAndTelemetry?.flush();
Contributor Author

I think we should be doing this in any case, even before this change, to make sure all telemetry (including the error afterwards?) gets reported.
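
A generic flush-before-exit helper might look like the sketch below. It assumes `flush()` may be asynchronous and that flush failures should never block the exit; `Flushable`, `flushThenExit`, and `exitFn` are hypothetical names rather than mongosh APIs.

```typescript
interface Flushable {
  flush(): void | Promise<void>;
}

/**
 * Hypothetical exit helper: flushes pending telemetry (if any) before handing
 * control to `exitFn`, and never lets a flush failure block the exit itself.
 */
async function flushThenExit(
  telemetry: Flushable | undefined,
  code: number,
  exitFn: (code: number) => void
): Promise<void> {
  try {
    await telemetry?.flush();
  } catch {
    // Telemetry is best-effort; ignore failures on the way out.
  } finally {
    exitFn(code);
  }
}
```

Passing the exit function in keeps the helper testable without terminating the process.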

@gagik gagik force-pushed the gagik/add-device-id branch 2 times, most recently from 26df86b to 3dda787 Compare March 25, 2025 12:44
@gagik gagik requested a review from addaleax March 26, 2025 09:39
@gagik gagik force-pushed the gagik/add-device-id branch from 3dda787 to 7484681 Compare March 26, 2025 10:04
Collaborator

@addaleax addaleax left a comment

So ... I don't want to slow down work that I know we want to just get done¹, but I do still feel pretty uneasy about using node-machine-id. The fact that we're spawning multiple extra child processes on each startup of mongosh just feels fairly wrong, from a performance² and security perspective, even if we don't see an immediate critical issue with it.

Can we try to align even more closely with the Atlas CLI and use the same approach, which is to essentially perform the lookups in native/compiled code? I know that that comes with a bit of overhead, but it's far from infeasible (we've done that for other smaller things as well, like https://github.com/mongodb-js/glibc-version) and I'm happy to help with getting that off the ground.

¹ Telemetry is still turned off for mongosh, so it's probably not super time-sensitive?
² Startup performance has been a big pain point in the past for us and even though the perf tests in CI seem okay here, this change feels a bit like we'd be pushing it 😞

@gagik
Contributor Author

gagik commented Mar 26, 2025

@addaleax Sounds good, I can look into ways we could put this into the larger plan and see what we could come up with. Sorry, it might have been worthwhile to have more discussion about this earlier. I just figured after talking with Atlas CLI folk that we'd likely end up in some form of machine ID-powered setup anyway, unless we'd like to push them to adopt the alternative instead. And the potential of having to deal with directory permissions or whatnot seemed like a good enough argument against the shared directory idea.

But yeah, about node-machine-id and performance concerns, that makes a lot of sense. The native lookup does sound worthwhile and hopefully not much effort from our end (honestly surprised this is something that doesn't exist already). I'll follow up about that and about the expected timeline.

@addaleax
Collaborator

> But yeah about node-machine-id and performance concerns, that makes a lot of sense. The native lookup does sound worthwhile and hopefully not much effort from our end (honestly surprised this is something that doesn't exist already). I'll follow-up about that and regarding expected timeline.

Yeah, I'm also happy to support this in any way I can. Overall I'd expect it to be somewhat straightforward to put together.

@gagik gagik marked this pull request as draft April 14, 2025 10:07
@gagik gagik changed the title from "feat(telemetry): add device ID logging COMPASS-8443" to "WIP - feat(telemetry): add device ID logging COMPASS-8443" Apr 14, 2025
@gagik gagik force-pushed the gagik/add-device-id branch 7 times, most recently from e839252 to 3c99a11 Compare April 22, 2025 09:20
@gagik gagik changed the title from "WIP - feat(telemetry): add device ID logging COMPASS-8443" to "feat(telemetry): add device ID logging COMPASS-8443" Apr 28, 2025
@gagik gagik marked this pull request as ready for review April 28, 2025 08:48
@gagik gagik requested a review from lerouxb April 29, 2025 13:18
// to match it exactly with the denisbrodbeck/machineid library that Atlas CLI uses.
const originalId: string = (
  await require('native-machine-id').getMachineId({ raw: true })
)?.toUpperCase();
Collaborator

(and when you do update, I'd recommend wrapping this in a try/catch, since native addons will not always work in all environments)
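
One possible shape for that guard is sketched below. The loader is injected so that both a load-time failure (the addon cannot be required) and a call-time failure are caught; `getRawMachineIdSafely`, `MachineIdGetter`, and `loadGetter` are hypothetical names, and the `{ raw: true }` option simply mirrors the excerpt above.

```typescript
type MachineIdGetter = (options: { raw: boolean }) => Promise<string | undefined>;

/**
 * Hypothetical wrapper: `loadGetter` stands in for requiring the native addon,
 * which may throw at load time on unsupported platforms.
 */
async function getRawMachineIdSafely(
  loadGetter: () => MachineIdGetter
): Promise<string | undefined> {
  try {
    const getMachineId = loadGetter();
    const id = await getMachineId({ raw: true });
    return id?.toUpperCase();
  } catch {
    // Native addons will not work in every environment; report "no ID" instead.
    return undefined;
  }
}
```

In the context of the excerpt, the loader would be something like `() => require('native-machine-id').getMachineId`, and the caller would map `undefined` to whatever fallback value it reports.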

@gagik gagik requested a review from addaleax April 30, 2025 14:28
/**
 * @returns A hashed, unique identifier for the running device or `"unknown"` if not known.
 */
export async function getDeviceId({
Contributor Author

This could go inside the class, but it seems clearer to keep it independent of the class, so that we could even expose it on its own if we have some need for that in the future.

@gagik gagik force-pushed the gagik/add-device-id branch 2 times, most recently from f49ea47 to c904f56 Compare May 1, 2025 08:17
@gagik gagik force-pushed the gagik/add-device-id branch from 937e057 to 5a4c8e3 Compare May 1, 2025 10:42
@gagik
Contributor Author

gagik commented May 1, 2025

Going ahead, as the remaining failing tests are failing on main regardless.

@gagik gagik merged commit 477df76 into main May 1, 2025
128 of 135 checks passed
@gagik gagik deleted the gagik/add-device-id branch May 1, 2025 13:01
3 participants