@kirianguiller sent the following surprising snippet:
That is, using the noisy quiz model and a successful quiz 100 halflives after last seen,
q0=0 ➜ expected behavior: the halflife jumps from 1 to 27
q0=1e-2 ➜ halflife barely changes?
Why?
Is this a numerical problem?
No, I don't think so. I double-checked with Stan, and it agrees with the above.
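Before trusting either implementation, the non-update is cheap to verify by direct quadrature. Below is a minimal sketch; the prior p0 ~ Beta(3, 3) with halflife t0 = 1, the quiz time tnow = 100, and q1 = 1 are illustrative assumptions, and the likelihood q1*p + q0*(1-p) is the noisy-quiz marginal derived below.

```python
import numpy as np

# Sanity-check the update by direct quadrature. The Beta(3, 3) prior with
# halflife t0 = 1, tnow = 100 halflives, and q1 = 1 are illustrative assumptions.
alpha, beta, tnow = 3.0, 3.0, 100.0

p0 = np.linspace(1e-6, 1 - 1e-6, 200_001)        # grid over the prior's support
prior = p0**(alpha - 1) * (1 - p0)**(beta - 1)   # unnormalized Beta(3, 3) pdf
p = p0**tnow                                     # recall probability at quiz time

def posterior_mean_p0(q0, q1=1.0):
    """Posterior mean of p0 given a successful noisy quiz z = 1, whose
    marginal likelihood (latent true result summed out) is q1*p + q0*(1-p)."""
    post = prior * (q1 * p + q0 * (1 - p))
    return (p0 * post).sum() / post.sum()        # grid spacing cancels in the ratio

print(posterior_mean_p0(q0=0.0))   # ≈0.97: strong update toward p0 = 1
print(posterior_mean_p0(q0=1e-2))  # ≈0.50: barely moves from the prior mean of 0.5
```

With q0=0 the posterior piles up near p0 = 1, but with q0=1e-2 it stays essentially at the prior: the same qualitative behavior as the Ebisu and Stan numbers below.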
Here is the Stan model file:
```stan
// ebisu.stan
data {
  real<lower=0> t0;
  real<lower=0> alpha;
  real<lower=0> beta;
  int<lower=0, upper=1> z;
  real<lower=0, upper=1> q1;
  real<lower=0, upper=1> q0;
  real<lower=0> t;
  real<lower=0> t2;
}
parameters {
  real<lower=0, upper=1> p0;
  // We WANT this:
  // `int<lower=0, upper=1> x;`
  // But we can't have it: https://mc-stan.org/docs/2_28/stan-users-guide/change-point.html
  // So we marginalize over x.
}
transformed parameters {
  real<lower=0, upper=1> p = pow(p0, t / t0);  // Precall at t
  real<lower=0, upper=1> p2 = pow(p, t2 / t);  // Precall at t2
}
model {
  p0 ~ beta(alpha, beta);  // Precall at t0
  // Again, we WANT the following:
  // `x ~ bernoulli(p);`
  // `z ~ bernoulli(x ? q1 : q0);`
  // But we can't, so we had to marginalize:
  target += log_mix(p, bernoulli_lpmf(z | q1), bernoulli_lpmf(z | q0));
  // log_mix is VERY handy: https://mc-stan.org/docs/2_28/functions-reference/composed-functions.html
}
```
This is the Ebisu model, except that we have to marginalize out x, the "true" Bernoulli quiz result, because Stan, while very awesome, simply can't handle discrete parameters 😭. Thankfully the marginalization is quite straightforward:
```
P(z | p) = sum([P(z, x | p) for x in [0, 1]])
         = P(z | x=1) * P(x=1 | p) + P(z | x=0) * P(x=0 | p)
         = Bernoulli(z; q1) * p + Bernoulli(z; q0) * (1-p)
```
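The marginalization is mechanical enough to check by brute-force enumeration (plain Python, nothing Ebisu-specific):

```python
def bernoulli_pmf(z, p):
    """PMF of a Bernoulli(p) variable evaluated at z in {0, 1}."""
    return p if z == 1 else 1 - p

def p_obs(z, p, q0, q1):
    # Sum P(z | x) * P(x | p) over the latent true result x: this is exactly
    # what Stan's log_mix does, just in log space.
    return sum(bernoulli_pmf(z, q1 if x == 1 else q0) * bernoulli_pmf(x, p)
               for x in (0, 1))

# Agrees with the closed form q1*p + q0*(1-p) for z=1:
assert abs(p_obs(1, 0.3, q0=0.01, q1=0.99) - (0.99 * 0.3 + 0.01 * 0.7)) < 1e-12
```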
With this model, we can double-check the analytical results we got from Ebisu:
q0=0:
- Ebisu: new (alpha, beta) = 3.0874602456940186, 3.087460245694014
- Stan: 3.083029695444059, 3.085366775525092

q0=1e-2:
- Ebisu: 2.8931573238863244, 2.893157323886327
- Stan: 2.8794053385199345, 2.8665345558604955
This is close enough that I have confidence in Ebisu. It's possible Stan is underflowing, overflowing, or somehow losing precision, but it's unlikely to be losing precision in the same way as Ebisu, which computes the posterior via an entirely different approach.
What's happening?
Checking the behavior of the updated model's halflife as we vary tnow (quiz time), using Ebisu:
For tnow just above 1, the q0=0, q0=1e-2, and q0=1e-3 curves are all very similar, but they soon deviate: while the q0=0 case keeps rising linearly, the q0!=0 curves peak and then drop asymptotically to 1.0.
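The tnow sweep is easy to reproduce without Ebisu internals. Below is a hedged quadrature sketch, assuming an illustrative prior p0 ~ Beta(3, 3) with halflife t0 = 1 and a successful quiz (z = 1) with q1 = 1, and defining the updated halflife as the time h at which the posterior-predicted recall E[p0^h] falls to 0.5 (Ebisu's actual update moment-matches a Beta, so its numbers will differ slightly):

```python
import numpy as np

# Quadrature grid and unnormalized Beta(3, 3) prior over p0 (recall at t0 = 1).
# These parameters are illustrative assumptions, not pulled from Ebisu itself.
p0 = np.linspace(1e-6, 1 - 1e-6, 200_001)
prior = p0**2 * (1 - p0)**2

def updated_halflife(tnow, q0, q1=1.0):
    """Time h at which E_posterior[p0**h] = 0.5, after a successful noisy
    quiz (z = 1) at time tnow, with marginal likelihood q1*p + q0*(1-p)."""
    p = p0**tnow
    post = prior * (q1 * p + q0 * (1 - p))
    post = post / post.sum()

    def recall(h):
        return (post * p0**h).sum()

    lo, hi = 1e-3, 1e3  # recall() is decreasing in h; bisect for recall(h) = 0.5
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if recall(mid) > 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

for tnow in (2.0, 5.0, 100.0):
    print(tnow, updated_halflife(tnow, 0.0), updated_halflife(tnow, 1e-2))
# With these assumed parameters, q0=0 keeps growing with tnow (to roughly 27 at
# tnow=100, consistent with the jump quoted above), while q0=1e-2 peaks at
# moderate tnow and falls back toward the original halflife of 1.
```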
Hypothesis: this happens because, at tnow much greater than the initial halflife, we believe so strongly that the quiz will fail that any doubt about the true quiz result is magnified, and we get a non-update.
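One way to quantify this: for z = 1, the marginal likelihood is q0 + (q1 - q0)*p. Once the predicted recall p sinks far below q0, that likelihood flattens to the constant q0, so the observation carries almost no information about p and the posterior stays put. A tiny illustration (assuming q1 = 1, q0 = 1e-2):

```python
q0, q1 = 1e-2, 1.0
for p in (1e-1, 1e-3, 1e-6):
    # Marginal likelihood of observing z = 1 given recall probability p:
    # as p -> 0 this flattens to the constant q0, i.e. an uninformative quiz.
    print(p, q0 + (q1 - q0) * p)
```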
As the plot above shows, choosing q0=1e-3 instead of 1e-2 delays the peak in updated halflife to greater tnow. For some applications, this may be sufficient. Nonetheless, this does point to surprising behavior of the algorithm, and unfortunately means we might have to think hard about our choice of q0.