Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Passing pd.Series to gp.Marginal().marginal_likelihood gives cryptic error #5053

Closed
michaelosthege opened this issue Oct 7, 2021 · 5 comments

Comments

@michaelosthege
Copy link
Member

Description of your problem

The y argument to Latent.marginal_likelihood is just forwarded into self.y.

Later in the workflow, when attempting to make conditional observations this leads to a traceback that does not immediately make it obvious what the problem was.

Converting the y to a numpy array or pm.Data container solves the problem.

Traceback from my debugging
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-169-dfcf8011ab79> in <module>
      1 givens = gp._get_given_vals(None)
----> 2 gp._build_conditional(
      3     X_denses["D0"],
      4     False,
      5     False,

~\AppData\Local\Continuum\miniconda3\envs\CARenv\lib\site-packages\pymc3\gp\gp.py in _build_conditional(self, Xnew, pred_noise, diag, X, y, noise, cov_total, mean_total)
    472         Kxs = self.cov_func(X, Xnew)
    473         Knx = noise(X)
--> 474         rxx = y - mean_total(X)
    475         L = cholesky(stabilize(Kxx) + Knx)
    476         A = solve_lower(L, Kxs)

~\AppData\Local\Continuum\miniconda3\envs\CARenv\lib\site-packages\pandas\core\ops\common.py in new_method(self, other)
     67         other = item_from_zerodim(other)
     68 
---> 69         return method(self, other)
     70 
     71     return new_method

~\AppData\Local\Continuum\miniconda3\envs\CARenv\lib\site-packages\pandas\core\arraylike.py in __sub__(self, other)
     98     @unpack_zerodim_and_defer("__sub__")
     99     def __sub__(self, other):
--> 100         return self._arith_method(other, operator.sub)
    101 
    102     @unpack_zerodim_and_defer("__rsub__")

~\AppData\Local\Continuum\miniconda3\envs\CARenv\lib\site-packages\pandas\core\series.py in _arith_method(self, other, op)
   5525             result = ops.arithmetic_op(lvalues, rvalues, op)
   5526 
-> 5527         return self._construct_result(result, name=res_name)
   5528 
   5529 

~\AppData\Local\Continuum\miniconda3\envs\CARenv\lib\site-packages\pandas\core\series.py in _construct_result(self, result, name)
   2942         # We do not pass dtype to ensure that the Series constructor
   2943         #  does inference in the case where `result` has object-dtype.
-> 2944         out = self._constructor(result, index=self.index)
   2945         out = out.__finalize__(self)
   2946 

~\AppData\Local\Continuum\miniconda3\envs\CARenv\lib\site-packages\pandas\core\series.py in __init__(self, data, index, dtype, name, copy, fastpath)
    421                 pass
    422             else:
--> 423                 data = com.maybe_iterable_to_list(data)
    424 
    425             if index is None:

~\AppData\Local\Continuum\miniconda3\envs\CARenv\lib\site-packages\pandas\core\common.py in maybe_iterable_to_list(obj)
    286     """
    287     if isinstance(obj, abc.Iterable) and not isinstance(obj, abc.Sized):
--> 288         return list(obj)
    289     obj = cast(Collection, obj)
    290     return obj

~\AppData\Local\Continuum\miniconda3\envs\CARenv\lib\site-packages\aesara\tensor\var.py in __iter__(self)
    637     def __iter__(self):
    638         try:
--> 639             for i in range(aet.basic.get_vector_length(self)):
    640                 yield self[i]
    641         except TypeError:

~\AppData\Local\Continuum\miniconda3\envs\CARenv\lib\site-packages\aesara\tensor\basic.py in get_vector_length(v)
   2832             raise ValueError(f"Length of {v} cannot be determined")
   2833 
-> 2834     raise ValueError(f"Length of {v} cannot be determined")
   2835 
   2836 

ValueError: Length of Elemwise{sub,no_inplace}.0 cannot be determined

Possible solutions

  • Raise an error unless y is a numpy array or tensor
  • Automatically convert it

Versions and main components

  • PyMC3 Version: main
@michaelosthege michaelosthege changed the title Passing pd.Series to gp.Latent().marginal_likelihood gives cryptic error Passing pd.Series to gp.Marginal().marginal_likelihood gives cryptic error Oct 7, 2021
@heaven00
Copy link

heaven00 commented Oct 10, 2021

I would like to take a stab at it.

Possible solutions

Raise an error unless y is a numpy array or tensor
Automatically convert it

If we want the interface and usage experience to be similar to scikit, I would suggest going with Automatically converting it since scikit works quite nicely with pandas as an interface and also just to keep Marginal and MarginalSparse implementation same the change should go on both?

I was thinking of a basic check like

if isinstance(y, pd.Series):
    self.y = y.to_numpy()

pandas reference https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.to_numpy.html#pandas.Series.to_numpy

P.S. in the description you are still referring to Latent, I assume you mean Marginal given the heading change also Latent not having the y parameter.

The y argument to Latent.marginal_likelihood is just forwarded into self.y.

@michaelosthege
Copy link
Member Author

Hi @heaven00
Yes that sounds good.
Like you said it would be good to check the other classes (and kwargs) too.

@heaven00
Copy link

curious to know why would this check be required on kwargs too? because I wasn't thinking about checking kwargs at all, since they are just passed down directly to other objects like the pm.KroneckerNormal

Also, when you mention other classes how big is the scope of the list of classes?

if its a lot we can add an exception within the conditioned_vars decorator that seems to be doing a getter and setter logic and just append the conditional logic to check the instance type there itself.

  def make_setter(name):
      def setter(self, val):
          if isinstance(val, pd.Series):
               val = val.to_numpy()
          setattr(self, name, val)

code: https://github.com/pymc-devs/pymc/blob/main/pymc/gp/util.py#L126

I did a quick check and the conditioned_vars is used within gp.py alone.

@michaelosthege
Copy link
Member Author

I mean the parameters from the method signatures in general, so also x, Xnew. The GP module is generally lacking type hints in the signatures.
You're right that we don't need to check the kwargs.

Be sure to check out #5055. In order to avoid git conflicts it might be best to do a PR into that branch, or to keep the PR as focused as possible.

@michaelosthege
Copy link
Member Author

I just confirmed that this problem no longer persists on main.

Probably due to #5920.

with pm.Model() as pmodel:
    x = np.arange(10)
    y = np.sin(x)
    gp = pm.gp.Marginal(
        cov_func=pm.gp.cov.ExpQuad(1, ls=pm.HalfNormal("ls"))
    )
    gp.marginal_likelihood(
        "L",
        X=x[:, None],
        y=pd.Series(y),
        noise=1
    )

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants