
Gather feedback on possible reasons for the add-on failing to trigger a payload send event #26

Open
motin opened this issue May 2, 2018 · 19 comments


motin commented May 2, 2018

Background:
We are seeing considerably fewer pings ending up in our study's parquet table than the Normandy recipe's DAU estimates would suggest. We are now exploring possible sources of this discrepancy.

This issue is created to gather feedback on possible reasons for the add-on failing to trigger a payload send event.

After Normandy installs the add-on, the startup method in bootstrap.js is the first code to run.

Possible reasons gathered so far:
a. The add-on throws fatal errors in the lines before the eligibility check, where pioneer-utils is initialized (errors of a kind that surfaced during neither development nor QA)
b. The add-on's double-checking of Normandy's eligibility targets is not in sync with the Normandy recipe used: (normandy.addons['[email protected]'].isActive && normandy.version >= 59.0)
c. Normandy installs do not set the installation reason to install or upgrade (https://github.com/motin/esper-pioneer-shield-study/blob/1.0.4/addon/bootstrap.js#L62; see the sketch below)
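For reference, a minimal sketch of the gate that reason (c) refers to, using the standard bootstrapped-extension reason codes; the exact code at the linked line may differ:

// Reason codes the add-on manager passes to startup()
// (standard bootstrapped-extension constants).
const APP_STARTUP = 1;
const ADDON_INSTALL = 5;
const ADDON_UPGRADE = 7;

function startup(addonData, reason) {
  // If Normandy ever starts the add-on with a different reason
  // (e.g. APP_STARTUP after a browser restart), this gate would
  // silently skip the study without sending any ping.
  if (reason !== ADDON_INSTALL && reason !== ADDON_UPGRADE) {
    return;
  }
  // ...proceed with the eligibility check and study start.
}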

@pdehaan, @rhelmer, @rehandalal: can you spot any other potential reasons, stemming from the add-on implementation, pioneer-utils usage, etc.?

mlopatka commented May 2, 2018

FYI: @jtg567 @sunahsuh

@rehandalal

There's nothing that immediately stands out.

mlopatka commented May 2, 2018

@gregglind @raymak In a discussion with @jtg567, we thought that since we are not getting any endStudy pings at all, either the add-on is hard-failing or targeting is not happening correctly. Does either of you have any insight into what may be going wrong with the Normandy/targeting portion of this?

mlopatka commented May 2, 2018

Adding @dzeber to the party!

pdehaan commented May 2, 2018

I can't explain any whys, but it may be worth adding a few try..catch blocks around any async+await blocks, which may help us figure out where this is happening (a sketch follows the list below).

Looking at $ git grep "async" addon/, I only see 7 potential candidates:

  1. addon/bootstrap.js: async startup(addonData, reason) {...
  2. addon/lib/Pioneer.jsm: async startup(addonData) {...
  3. addon/lib/StudyTelemetryCollector.jsm: async start() {...
  4. addon/lib/StudyTelemetryCollector.jsm: async telemetry(schemaName, schemaVersion, payload) {...
  5. addon/lib/StudyTelemetryCollector.jsm: async collectAndSendTelemetry() {...
  6. addon/lib/StudyTelemetryCollector.jsm: static async collectPlacesDbBasedAttributes() {...
  7. addon/lib/StudyTelemetryCollector.jsm: static async queryPlacesDbTelemetry() {...
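For illustration, a minimal sketch of that wrapping applied to candidate 1; the constructor arguments and the sendErrorPing helper are assumptions, not actual pioneer-utils API:

// addon/bootstrap.js (sketch): make failures in startup() visible.
async function startup(addonData, reason) {
  try {
    await Pioneer.startup(addonData);
    await new StudyTelemetryCollector(Pioneer.utils).start();
  } catch (err) {
    // Hypothetical helper: report the error so failures in the wild
    // show up in telemetry instead of being swallowed.
    await sendErrorPing({ message: err.message, stack: err.stack });
  }
}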

rhelmer commented May 2, 2018

Hm, one thing I noticed is that https://github.com/motin/esper-pioneer-shield-study/blob/1.0.4/addon/bootstrap.js#L57 is async but there's no await... since this is doing an early return in startup() could it be that the extension gets torn down before Pioneer.utils.endStudy has a chance to run?
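In sketch form, with the eligibility flag and reason string as placeholders (the actual code at the linked line may differ):

// Early-return path in startup(), sketch:
if (!eligible) {
  // Without this await, startup() returns immediately and the
  // extension sandbox may be torn down before the ping is sent.
  await Pioneer.utils.endStudy("ineligible");
  return;
}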

rhelmer commented May 2, 2018

FWIW I took a look at how extension tear-down works: each extension runs in its own Cu.Sandbox, and when the extension is shut down, it is removed from the list of active extensions and the sandbox is eventually GC'd.

Honestly not sure if this confirms my theory or not :) Should be pretty easy to test locally, although if GC is involved then it may not be consistent and should be tested over multiple runs.

motin commented May 3, 2018

@rhelmer Good catch! Issue reported against pioneer-utils. This could definitely affect the sending of endStudy pings.

motin commented May 3, 2018

Chatted with Rehan, who also believes that (b) is definitely not the issue and (c) seems highly unlikely. The only way to understand (a) better is to do as @pdehaan suggests: more try/catch. Sending event pings for caught errors will give us more insight into errors thrown in the wild (i.e. not detected during QA/development).

motin commented May 3, 2018

Action plan for a re-launched esper add-on (a sketch of the expiry/end logic follows the list):

  • Send an install event directly upon study add-on installation so that we can confirm how many installed add-ons are able to send pioneer event telemetry (EVENTS.INSTALL) (Depends on Add EVENTS.INSTALLED pioneer-utils#41)
  • Expire the study if no payload was submitted within 30 min (EVENTS.EXPIRED)
  • End the study after the ordinary payload is submitted (EVENTS.ENDED_POSITIVE)
  • Await endStudy() so that endStudy events get sent more reliably (depends on Async/await for endStudy and submitEventPing pioneer-utils#40)
  • Add more try/catch blocks and let caught errors end the study (EVENTS.ENDED_NEGATIVE)
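A minimal sketch of the expiry and positive-end items above, assuming setTimeout from Timer.jsm and that endStudy() accepts an event name; the exact pioneer-utils calls may differ:

const { setTimeout, clearTimeout } =
  Cu.import("resource://gre/modules/Timer.jsm", {});

// Expire the study if no payload has gone out within 30 minutes.
const EXPIRY_MS = 30 * 60 * 1000;
const expiryTimer = setTimeout(() => {
  Pioneer.utils.endStudy(EVENTS.EXPIRED);
}, EXPIRY_MS);

// Called once the ordinary payload has been submitted successfully.
async function onPayloadSubmitted() {
  clearTimeout(expiryTimer);
  await Pioneer.utils.endStudy(EVENTS.ENDED_POSITIVE);
}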

motin commented May 3, 2018

@mlopatka Looks good enough?

mlopatka commented May 3, 2018

@motin R+ on the action plan. Let's aim to relaunch (again) in 2 weeks (assuming another pass through QA).

dzeber commented May 3, 2018

I've been approaching from the opposite direction, looking into Telemetry data for Pioneers who did and didn't install ESPER, and I've got a lead on the Normandy targeting issue.

It looks like ESPER is only getting installed when a profile is on a major release ("59.0") but not when on a dot release ("59.0.3").

  • Pioneers who were exclusively on dot releases during the launch week did not get ESPER. This accounts for the vast majority of Pioneer profiles, who were on a combination of 59.0.2 and 59.0.3.
  • ESPER is only installed on profiles which had at least one subsession on a major version (e.g. 59.0) during the week (even if they updated to a dot release later on).

Evidence is here, scrolling down to cells 49, 52 and 56.

Another issue is that a majority of ESPER enrollees are on prerelease (~90%), and it got installed while they were on version 60.0 (cells 64, 73).

  • The majority of Pioneer profiles (~98%) are on release.
  • However, ESPER appears to have been mainly shipped to the 2% that are on prerelease. This may just be a result of prerelease being more likely to be on a major version number (60.0) than release, and not a separate issue.
  • Nonetheless, this is not supposed to happen given the eligibility double-checking.

dzeber commented May 3, 2018

Also, it looks like the number of ESPER pings received in the Pioneer pipeline is very close to the number of profiles reporting the ESPER add-on installed in standard Telemetry.

This suggests that there is not a significant problem with the add-on sending pings beyond the usual failures. That said, I think the additional instrumentation and validation recommended by @motin is still a great idea.

mlopatka commented May 4, 2018

@jtg567, based on the findings listed above:

It looks like ESPER is only getting installed when a profile is on a major release ("59.0") but not when on a dot release ("59.0.3").

Is there any way to tell if the delivery of ESPER exclusively to major releases and not dot releases could be a Normandy targeting issue?

jtg567 commented May 4, 2018

@mythmon

mythmon commented May 4, 2018

I checked the filter expression for the study. It is this:

(
    normandy.addons['[email protected]'].isActive &&
    normandy.version >= 59.0
)

At first glance, it looks fine. But there is a subtle and very important problem. normandy.version is a string, and it is being compared to a float. This can sometimes be OK. Filter expressions use a lot of JS semantics, including comparison semantics. So in a filter expression "59.0" >= 59.0 will try to coerce the types together, which results in 59.0 >= 59.0, which matches as desired.

Unfortunately these semantics only go so far. On a dot release, normandy.version would have a value like "59.0.2". When comparing "59.0.2" >= 59.0, the left side will again be converted to a number to make the comparison. Unfortunately, in JS Number("59.0.2") is NaN. All comparisons with NaN result in false, so the filter fails to match.

The fix is to change the filter expression to use normandy.version >= "59.0" (note that the right-hand value is now a string). Since both sides are now strings, they will be compared lexicographically, and the match will behave as expected.
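The behavior is easy to confirm in a JS console:

"59.0"   >= 59.0;   // true:  "59.0" coerces to the number 59
"59.0.2" >= 59.0;   // false: Number("59.0.2") is NaN, and NaN comparisons are always false
"59.0.2" >= "59.0"; // true:  both strings, compared lexicographically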

jtg567 commented May 4, 2018

I revised the recipe on Normandy. Not sure why I count more clients with the ESPER add-on than @dzeber, though; I'm not yet sure the recipe was the only issue.

dzeber commented May 8, 2018

@jtg567 FWIW it looks like you're counting subsessions (main_summary rows) rather than client IDs.
