Online estimation of covariance matrix #749
DanielWicz started this conversation in Ideas
-
Hello :) We have the logic to compute an online covariance in stats.Cov. For example:

import collections
import itertools
import numpy as np
from river import stats

np.random.seed(144_000)
X = np.random.random(size=(1000, 5))

# One running covariance estimator per pair of features (i, j), with i < j
cov = collections.defaultdict(stats.Cov)

for batch in np.split(X, 5):
    for x in batch:
        for i, j in itertools.combinations(range(len(x)), 2):
            cov[i, j].update(x[i], x[j])

print(cov)
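You can pretty-print this with pandas:

import pandas as pd

print((
    pd.DataFrame(
        [
            {'i': i, 'j': j, 'cov': c.get()}
            for (i, j), c in cov.items()
        ] +
        # Mirror each pair so the matrix is symmetric; the diagonal stays
        # NaN because only pairs with i < j are tracked above
        [
            {'i': i, 'j': j, 'cov': c.get()}
            for (j, i), c in cov.items()
        ]
    )
    # Recent pandas versions require keyword arguments here
    .pivot(index='i', columns='j', values='cov')
))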
There's certainly a way to do this with NumPy using mini-batch formulas, but that's not part of River yet. The above will work fine if you're not working with big data.
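For the batch case, here is a minimal sketch of what a mini-batch update could look like in plain NumPy. It is not part of River: the `BatchCov` class and its attribute names are hypothetical, made up for illustration. It merges each batch's mean and scatter matrix into running totals using the pairwise update formulas of Chan, Golub and LeVeque, so after all batches the result should agree with `np.cov` on the concatenated data up to floating-point error.

```python
import numpy as np


class BatchCov:
    """Running covariance matrix, updated one mini-batch at a time.

    Hypothetical helper for illustration; not a River API.
    """

    def __init__(self, n_features, ddof=1):
        self.n = 0                                     # samples seen so far
        self.mean = np.zeros(n_features)               # running mean vector
        self.M2 = np.zeros((n_features, n_features))   # running scatter matrix
        self.ddof = ddof                               # 1 gives the unbiased estimate

    def update(self, batch):
        batch = np.atleast_2d(np.asarray(batch, dtype=float))
        m = batch.shape[0]

        # Statistics of the incoming batch on its own
        batch_mean = batch.mean(axis=0)
        centered = batch - batch_mean
        batch_M2 = centered.T @ centered

        # Merge batch statistics into the running statistics
        delta = batch_mean - self.mean
        total = self.n + m
        self.M2 += batch_M2 + np.outer(delta, delta) * self.n * m / total
        self.mean += delta * m / total
        self.n = total
        return self

    def get(self):
        return self.M2 / (self.n - self.ddof)


rng = np.random.default_rng(42)
X = rng.random(size=(1000, 5))

cov = BatchCov(n_features=5)
for batch in np.split(X, 5):
    cov.update(batch)

# Largest absolute difference from the one-shot estimate on all the data
print(np.abs(cov.get() - np.cov(X, rowvar=False)).max())
```

Only the sample count, the mean vector and one d x d matrix are kept between batches, so memory does not grow with the number of batches. The estimate only approaches the population covariance if the batches, taken together, cover the population.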
-
The problem: I'm getting data in batches. At every timestep $i$ I get data as $B_i = \{x_1, x_2, \ldots, x_m\}$, where $m$ is the batch size.
No single batch covers the whole population, only part of it, so a covariance matrix built from one batch is highly biased. The variance from batch to batch is also high (but not very high).
The goal is to make an online, batch-wise estimate of the covariance matrix: at every iteration $i$ I estimate the covariance matrix, and at iteration $i+1$ I update it. After $i$ iterations, the algorithm should converge to a covariance matrix that represents the unbiased covariance matrix of the whole population.
Do you guys have an already-implemented algorithm for this?
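To make the update step concrete, one standard way to write it (a sketch, not something taken from River) is the pairwise merge below. The symbols $n_i$, $\mu_i$ and $S_i$ are notation introduced here for the number of samples, the mean vector and the scatter matrix accumulated over batches $1, \ldots, i$; $\bar{x}_B$ and $S_B$ are the mean and scatter matrix of the new batch $B_{i+1}$ on its own.

```latex
\begin{aligned}
n_{i+1} &= n_i + m \\
\delta &= \bar{x}_B - \mu_i \\
\mu_{i+1} &= \mu_i + \frac{m}{n_{i+1}}\,\delta \\
S_{i+1} &= S_i + S_B + \frac{n_i\, m}{n_{i+1}}\,\delta\,\delta^{\top} \\
\hat{\Sigma}_{i+1} &= \frac{S_{i+1}}{n_{i+1} - 1}
\end{aligned}
```

Here $S_B = \sum_{x \in B_{i+1}} (x - \bar{x}_B)(x - \bar{x}_B)^{\top}$. With this update, $\hat{\Sigma}_{i+1}$ equals the unbiased sample covariance of all samples seen so far, so no single batch needs to be representative by itself.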