Skip to content
/ pymc Public

v4 refactor for GP module #5055

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 36 commits into from
Nov 21, 2021
Merged
Changes from all commits
Commits
Show all changes
36 commits
Select commit Hold shift + click to select a range
984a1d0
np.int -> int, fix np DepricationWarning
bwengals Oct 5, 2021
bd916ae
remove shape arg from non-kron implementations, TODO for is_observed …
bwengals Oct 6, 2021
490311f
np.int -> int in gp/util.py
bwengals Oct 6, 2021
a64975d
force all mean_func, cov_func args to GP constructors to be required …
bwengals Oct 7, 2021
77c4392
fix predictt functions, rename to _predict_at. because theano -> aesara
bwengals Oct 20, 2021
d1e09fe
fix TP tests, force mean_func, cov_func to be req kwarg
bwengals Oct 27, 2021
31150c3
fix TP reparameterization to sample studentt instead of chi2/norm
bwengals Oct 27, 2021
03c2a13
change naming shape->size where appropriate
bwengals Oct 27, 2021
4bcb448
add deprecation warning for is_observed
bwengals Oct 28, 2021
167cb4e
add jitter arg for covs headed for cholesky decomps, previously fixed…
bwengals Oct 28, 2021
e18b8c4
clean up trivial aesara.function usage to .eval() instead
bwengals Oct 28, 2021
212fb1e
fix gp.util.replace_with_values to handle case with no symbolic value…
bwengals Oct 28, 2021
ad4a1ca
jitter=0 for conditonals/predicts, fix replace_with_values calls
bwengals Oct 28, 2021
a4a4f99
fix more tests
bwengals Oct 28, 2021
9b26ca1
black stuff
bwengals Oct 28, 2021
6072565
Merge branch 'main' into gpv4update
bwengals Oct 28, 2021
76b6c16
small fixes to get kronlatent and kronmarginal to pass
bwengals Oct 28, 2021
90ae450
remove leftover prints
bwengals Oct 28, 2021
df1fe57
dep warning -> future warning
bwengals Oct 30, 2021
dceabf8
roll back mkl and mkl-service version
bwengals Nov 3, 2021
4692407
fix precommit
bwengals Nov 3, 2021
cc16bd5
remove old DeprecationWarning
bwengals Nov 3, 2021
062a9ca
merge from main
bwengals Nov 3, 2021
9763b98
improve tests
bwengals Nov 3, 2021
23fc5c7
fix pre-commit issue
bwengals Nov 4, 2021
158529f
fix precommit on cov.py
bwengals Nov 4, 2021
08a7652
fix comment
bwengals Nov 8, 2021
196e702
dont force blas version in windows dev enviornment (roll back changes)
bwengals Nov 8, 2021
63e5d16
Merge branch 'main' into gpv4update
bwengals Nov 8, 2021
d02e320
update release notes
bwengals Nov 9, 2021
8fee214
add removed ... line from release notes
bwengals Nov 9, 2021
d5474a7
add link to PR
bwengals Nov 13, 2021
5d6e94c
Merge branch 'pymc-devs:main' into gpv4update
bwengals Nov 20, 2021
5348575
remove is_observed usage from TestMarginalVsLatent
bwengals Nov 20, 2021
c2b850d
remove is_observed usage from TestMarginalVsMarginalSparse
bwengals Nov 20, 2021
461d26b
Update RELEASE-NOTES.md
twiecki Nov 21, 2021
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
9 changes: 8 additions & 1 deletion RELEASE-NOTES.md
Original file line number Diff line number Diff line change
@@ -45,9 +45,16 @@ All of the above apply to:
- Changes to the BART implementation:
- A BART variable can be combined with other random variables. The `inv_link` argument has been removed (see [4914](https://github.com/pymc-devs/pymc3/pull/4914)).
- Moved BART to its own module (see [5058](https://github.com/pymc-devs/pymc3/pull/5058)).
- Changes to the Gaussian Process (GP) submodule (see [5055](https://github.com/pymc-devs/pymc/pull/5055)):
- For all implementations, `gp.Latent`, `gp.Marginal` etc., `cov_func` and `mean_func` are required kwargs.
- In Windows test conda environment the `mkl` version is fixed to verison 2020.4, and `mkl-service` is fixed to `2.3.0`. This was required for `gp.MarginalKron` to function properly.
- `gp.MvStudentT` uses rotated samples from `StudentT` directly now, instead of sampling from `pm.Chi2` and then from `pm.Normal`.
- The "jitter" parameter, or the diagonal noise term added to Gram matrices such that the Cholesky is numerically stable, is now exposed to the user instead of hard-coded. See the function `gp.util.stabilize`.
- The `is_observed` arguement for `gp.Marginal*` implementations has been deprecated.
- In the gp.utils file, the `kmeans_inducing_points` function now passes through `kmeans_kwargs` to scipy's k-means function.
- The function `replace_with_values` function has been added to `gp.utils`.
- ...


### Expected breaks
+ New API was already available in `v3`.
+ Old API had deprecation warnings since at least `3.11.0` (2021-01).
3 changes: 2 additions & 1 deletion conda-envs/windows-environment-test-py38.yml
Original file line number Diff line number Diff line change
@@ -12,7 +12,8 @@ dependencies:
- fastprogress>=0.2.0
- h5py>=2.7
- libpython
- mkl-service
- mkl==2020.4
- mkl-service==2.3.0
- m2w64-toolchain
- numpy>=1.15.0
- pandas
6 changes: 4 additions & 2 deletions pymc/gp/cov.py
Original file line number Diff line number Diff line change
@@ -64,7 +64,7 @@ def __init__(self, input_dim, active_dims=None):
if active_dims is None:
self.active_dims = np.arange(input_dim)
else:
self.active_dims = np.asarray(active_dims, np.int)
self.active_dims = np.asarray(active_dims, int)

def __call__(self, X, Xs=None, diag=False):
r"""
@@ -152,7 +152,9 @@ def __array_wrap__(self, result):
elif isinstance(result[0][0], Prod):
return result[0][0].factor_list[0] * A
else:
raise RuntimeError
raise TypeError(
f"Unknown Covariance combination type {result[0][0]}. Known types are `Add` or `Prod`."
)


class Combination(Covariance):
270 changes: 154 additions & 116 deletions pymc/gp/gp.py

Large diffs are not rendered by default.

81 changes: 72 additions & 9 deletions pymc/gp/util.py
Original file line number Diff line number Diff line change
@@ -17,6 +17,7 @@
import aesara.tensor as at
import numpy as np

from aesara.compile import SharedVariable
from aesara.tensor.slinalg import ( # noqa: W0611; pylint: disable=unused-import
cholesky,
solve,
@@ -30,10 +31,66 @@
from aesara.tensor.var import TensorConstant
from scipy.cluster.vq import kmeans

from pymc.aesaraf import compile_rv_inplace, walk_model

def infer_shape(X, n_points=None):
# Avoid circular dependency when importing modelcontext
from pymc.distributions.distribution import NoDistribution

assert NoDistribution # keep both pylint and black happy
from pymc.model import modelcontext

JITTER_DEFAULT = 1e-6


def replace_with_values(vars_needed, replacements=None, model=None):
R"""
Maybe attempt to infer the shape of a Gaussian process input matrix.
Replace random variable nodes in the graph with values given by the replacements dict.
Uses untransformed versions of the inputs, performs some basic input validation.
Parameters
----------
vars_needed: list of TensorVariables
A list of variable outputs
replacements: dict with string keys, numeric values
The variable name and values to be replaced in the model graph.
model: Model
A PyMC model object
"""
model = modelcontext(model)

inputs, input_names = [], []
for rv in walk_model(vars_needed, walk_past_rvs=True):
if rv in model.named_vars.values() and not isinstance(rv, SharedVariable):
inputs.append(rv)
input_names.append(rv.name)

# Then it's deterministic, no inputs are required, can eval and return
if len(inputs) == 0:
return tuple(v.eval() for v in vars_needed)

fn = compile_rv_inplace(
inputs,
vars_needed,
allow_input_downcast=True,
accept_inplace=True,
on_unused_input="ignore",
)

# Remove unneeded inputs
replacements = {name: val for name, val in replacements.items() if name in input_names}
missing = set(input_names) - set(replacements.keys())

# Error if more inputs are needed
if len(missing) > 0:
missing_str = ", ".join(missing)
raise ValueError(f"Values for {missing_str} must be included in `replacements`.")

return fn(**replacements)


def infer_size(X, n_points=None):
R"""
Maybe attempt to infer the size, or N, of a Gaussian process input matrix.
If a specific shape cannot be inferred, for instance if X is symbolic, then an
error is raised.
@@ -48,31 +105,31 @@ def infer_shape(X, n_points=None):
"""
if n_points is None:
try:
n_points = np.int(X.shape[0])
n_points = int(X.shape[0])
except TypeError:
raise TypeError("Cannot infer 'shape', provide as an argument")
return n_points


def stabilize(K, c=1e-6):
def stabilize(K, jitter=JITTER_DEFAULT):
R"""
Adds small diagonal to a covariance matrix.
Often the matrices calculated from covariance functions, `K = cov_func(X)`
do not appear numerically to be positive semi-definite. Adding a small
correction, `c`, to the diagonal is usually enough to fix this.
correction, `jitter`, to the diagonal is usually enough to fix this.
Parameters
----------
K: array-like
A square covariance or kernel matrix.
c: float
jitter: float
A small constant.
"""
return K + c * at.identity_like(K)
return K + jitter * at.identity_like(K)


def kmeans_inducing_points(n_inducing, X):
def kmeans_inducing_points(n_inducing, X, **kmeans_kwargs):
R"""
Use the K-means algorithm to initialize the locations `X` for the inducing
points `fu`.
@@ -83,6 +140,8 @@ def kmeans_inducing_points(n_inducing, X):
The number of inducing points (or k, the number of clusters)
X: array-like
Gaussian process input matrix.
**kmeans_kwargs:
Extra keyword arguments that are passed to `scipy.cluster.vq.kmeans`
"""
# first whiten X
if isinstance(X, TensorConstant):
@@ -100,7 +159,11 @@ def kmeans_inducing_points(n_inducing, X):
# if std of a column is very small (zero), don't normalize that column
scaling[scaling <= 1e-6] = 1.0
Xw = X / scaling
Xu, distortion = kmeans(Xw, n_inducing)

if "k_or_guess" in kmeans_kwargs:
warn.UserWarning("Use `n_inducing` to set the `k_or_guess` parameter instead.")

Xu, distortion = kmeans(Xw, k_or_guess=n_inducing, **kmeans_kwargs)
return Xu * scaling


458 changes: 258 additions & 200 deletions pymc/tests/test_gp.py

Large diffs are not rendered by default.