Decrease memory footprint of computing FMR metric #251
From @dapladoc:

> […]

From me:

> […]
I don't like this idea. It is a very hard-to-control approach, and I think it could have very high variance. Here is a small simulation of approximating fnmr@fmr on random subsets of the negative distances:

```python
import numpy as np
import torch
import matplotlib.pyplot as plt
from tqdm import tqdm

from oml.functional.metrics import calc_fnmr_at_fmr  # import path assumed

# Synthetic positive (genuine) and negative (impostor) distance distributions.
pos_dist = torch.from_numpy(np.random.gamma(shape=1, scale=1, size=1000))
neg_dist = torch.from_numpy(np.random.gamma(shape=6, scale=1, size=1_000_000))
fmr_vals = (0.1, 0.5, 1, 5)

# Reference value computed on all negative distances.
fnmr_true = calc_fnmr_at_fmr(pos_dist, neg_dist, fmr_vals).numpy()

# Approximation: recompute the metric on random ~10% subsets of the negatives.
fnmr_approx = []
for i in tqdm(range(10000)):
    _neg_dist = neg_dist[torch.rand(neg_dist.shape) > 0.9]
    fnmr_approx.append(calc_fnmr_at_fmr(pos_dist, _neg_dist, fmr_vals).tolist())
fnmr_approx = np.array(fnmr_approx)

fig, ax = plt.subplots(len(fnmr_true))
for i in range(len(fnmr_true)):
    ax[i].hist(fnmr_approx[:, i], bins=100)
    ax[i].axvline(fnmr_true[i], color='r')
    ax[i].set_title(f'fmr_val = {fmr_vals[i]}')
    print(f'Std(fnmr@fmr({fmr_vals[i]})) = {fnmr_approx[:, i].std():.4e}')
    print(f'Bias(fnmr@fmr({fmr_vals[i]})) = {np.mean(np.abs(fnmr_approx[:, i] - fnmr_true[i])):.4e}')
fig.set_size_inches(6.4, 4.8 * 3)
```

And here is its output:
Here are histograms of the approximated fnmr@fmr values for each fmr_val, with the true value marked in red. Usually we are interested in […]
The graphs above seem okay to me. I consider fnmr@fmr only as an additional (auxiliary) metric, so it's not super important whether 30% or 32% of positive distances are smaller than 10% of the negative ones.
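For reference on what fnmr@fmr measures here: the threshold is taken as the FMR-quantile of the negative (impostor) distances, and FNMR is the fraction of positive (genuine) distances rejected at that threshold. Below is a minimal sketch under that standard definition; it is not necessarily OML's exact implementation, and it assumes `fmr_vals` are percentages, as in the snippet above.

```python
from typing import Tuple

import torch
from torch import Tensor


def fnmr_at_fmr_sketch(pos_dist: Tensor, neg_dist: Tensor, fmr_vals: Tuple[float, ...]) -> Tensor:
    # Distance thresholds at which the false match rate on negatives equals fmr_vals (given in %).
    q = torch.tensor([v / 100 for v in fmr_vals], dtype=neg_dist.dtype)
    thresholds = torch.quantile(neg_dist, q)
    # FNMR: share of positive distances that fall above each threshold, in %.
    return (pos_dist[None, :] >= thresholds[:, None]).double().mean(dim=1) * 100
```

On the `pos_dist` / `neg_dist` from the snippet above this should give values close to `fnmr_true` there, up to quantile interpolation details.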
If we want to work on this problem systematically, we have to measure the following first (for example, on the SOP dataset): […]

Then it would be clear what should be optimized and how.

P.S. IMHO, we can postpone this problem for now (#250 is enough) and focus on tasks with higher priority. In the end, it's probably not optimal to optimize the FMR metric without (at least) having examples from the verification domain.
It really depends on the domain. For security-related tasks (e.g. biometrics), 99.9% and 99.99% are noticeably different results.
I agree with you. More thorough memory measurements should be done in order to find possible ways to solve this issue.
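As a rough starting point for such measurements, one option on Unix is to watch how the process's peak RSS grows around the metric call. A sketch only: the sizes are illustrative rather than SOP's real pair counts, and the import path for `calc_fnmr_at_fmr` is assumed.

```python
import resource  # Unix-only

import numpy as np
import torch

from oml.functional.metrics import calc_fnmr_at_fmr  # import path assumed

# Illustrative sizes only; for SOP the number of negative pairs is far larger.
pos_dist = torch.from_numpy(np.random.gamma(shape=1, scale=1, size=100_000))
neg_dist = torch.from_numpy(np.random.gamma(shape=6, scale=1, size=10_000_000))

peak_before = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss  # KiB on Linux, bytes on macOS
fnmr = calc_fnmr_at_fmr(pos_dist, neg_dist, (0.1, 0.5, 1, 5))
peak_after = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

# ru_maxrss is the lifetime peak, so the delta only shows by how much this call raised it.
print(f"peak RSS raised by ~{(peak_after - peak_before) / 1024:.1f} MiB (Linux units)")
```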
Optimized it a bit along with the OML 3.0 release (removed unneeded conversions):
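The actual diff isn't shown above; as a hypothetical illustration of the kind of change meant by removing conversions (not OML's actual code), keeping the computation in torch avoids the extra full-size copies that a numpy round-trip can create:

```python
import numpy as np
import torch
from torch import Tensor


def thresholds_via_numpy(neg_dist: Tensor, fmr_vals) -> Tensor:
    # Round-tripping through numpy: astype materialises a full extra copy of neg_dist.
    neg_np = neg_dist.numpy().astype(np.float64)
    thr = np.percentile(neg_np, list(fmr_vals))  # fmr_vals are percentages
    return torch.from_numpy(thr)


def thresholds_in_torch(neg_dist: Tensor, fmr_vals) -> Tensor:
    # Staying in torch: no numpy round-trip, no extra copy of the big tensor.
    q = torch.tensor([v / 100 for v in fmr_vals], dtype=neg_dist.dtype)
    return torch.quantile(neg_dist, q)
```

Both return the same thresholds up to interpolation details; the second simply keeps peak memory closer to a single copy of the negative distances.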
Initial discussion is in #250