@jaredsnyder
Contributor

The pocket interactions dataset is huge and makes experiments fragile. This replaces it with the much smaller newtab_visits dataset and updates the exposure signal accordingly.

@jaredsnyder jaredsnyder requested a review from land-edi November 18, 2025 16:49
@github-actions

✅ Jetstream Validation is complete. Check the CI logs for this step for Query SQL and data processing estimates.

Collaborator

@mikewilli mikewilli left a comment

Please add [ci rerun-skip] to the beginning of the PR title before merging so that analysis is not rerun.

I left a comment on the data source definition about the client ID, but the affected experiment here uses the profile_group_id as its analysis unit (you can find this in the Experimenter API; it has been the default on desktop for the past ~6 months). This means that you don't have to worry about the new table not having a legacy ID. If this data source were used in other, older experiments, you would want to consider this, but since it's only used for this one experiment it's not a problem.

client_id_column = "legacy_telemetry_client_id"
description = "Visits-level table or Newtab Homepage Data"
friendly_name = "Newtab Visits Daily"
client_id_column = "client_id"
Collaborator

"client_id" is the default value for client_id_column (the default values are derived from this enum), so you don't need to specify it if you don't want to. If you would rather be explicit, feel free to leave it in.

However, you may want to specify that this is to be used as the Glean client ID like so:

glean_client_id_column = "client_id"

This is separate from the client_id_column and was added as a convenience for validating analysis before migrating from legacy to glean identifiers as the default. This parameter is null by default.
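
As a rough sketch of how the two suggestions above could look together in the data source definition: the section name below is hypothetical (the real key and the table expression are not shown in this thread), and only the fields quoted in this review are included.

```toml
# Hypothetical section name; the actual key and the table expression
# for this data source are not shown in this thread.
[data_sources.newtab_visits]
friendly_name = "Newtab Visits Daily"
description = "Visits-level table or Newtab Homepage Data"

# client_id_column is left out here because "client_id" is its default value.

# Optional, null by default: names the column that holds the Glean client ID.
glean_client_id_column = "client_id"
```

Leaving client_id_column out keeps the definition short; keeping it in is equally valid if you prefer the explicit form.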

Contributor Author

Okay, I think it can be removed. Pushed a fix.

@jaredsnyder jaredsnyder changed the title from "replace pocket interactions dataset with newtab visits and update exposure_signal" to "[ci rerun-skip] replace pocket interactions dataset with newtab visits and update exposure_signal" Nov 18, 2025
Contributor

@land-edi land-edi left a comment

Let's :shipit:

@github-actions

✅ Jetstream Validation is complete. Check the CI logs for this step for Query SQL and data processing estimates.

@jaredsnyder jaredsnyder added this pull request to the merge queue Nov 18, 2025
Merged via the queue into main with commit d2c5b6f Nov 18, 2025
8 checks passed
@jaredsnyder jaredsnyder deleted the remove_newtab_visits_pocket_interactions branch November 18, 2025 17:43
4 participants