Nowcast models need to do some extra steps to REMOVE the nan values #52

Open
tommylees112 opened this issue Jul 2, 2019 · 2 comments
Open

Comments

@tommylees112
Copy link
Contributor

For the final month and the final feature (VHI), the values are ALL NaN (in the DataLoader these are encoded as -inf). This is because the feature (VHI) is hidden from the model in the nowcast experiment, so we will have to do some extra preprocessing at this step to avoid errors when fitting the models.

x.shape
(35134, 12, 5)

x[:,-1,-1]
array([-inf, -inf, -inf, ..., -inf, -inf, -inf])
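The situation above can be reproduced with a toy array. This is a minimal sketch, not the repo's code: the array shape matches the `(samples, timesteps, features)` layout shown, and the check simply asks whether the final-month / final-feature slice carries any finite values at all.

```python
import numpy as np

# Toy stand-in for the (n_samples, n_timesteps, n_features) x array above;
# names here are illustrative, not taken from the repository.
x = np.ones((4, 12, 5))
x[:, -1, -1] = -np.inf  # the hidden VHI feature in the final month

# True when every value in the final-month / final-feature slice
# is non-finite, i.e. the slice carries no information
all_masked = not np.isfinite(x[:, -1, -1]).any()

print(all_masked)  # True
```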

So for the linear model:

# ... scripts/models.py
experiment = 'nowcast'
predictor = LinearRegression(data_path, experiment=experiment)
predictor.train()

# ... src/models/regression.py
self.model.partial_fit(batch_x, batch_y.ravel())

Creates the error:

ValueError                                Traceback (most recent call last)
<ipython-input-23-5bc95e9d1f4d> in <module>
----> 1 regression(experiment='nowcast')

<ipython-input-15-acf066373942> in regression(experiment)
      8
      9     predictor = LinearRegression(data_path, experiment=experiment)
---> 10     predictor.train()
     11     predictor.evaluate(save_preds=True)
     12

~/ml_drought/src/models/regression.py in train(self, num_epochs, early_stopping, batch_size)
     61                     batch_x = batch_x.reshape(batch_x.shape[0],
     62                                               batch_x.shape[1] * batch_x.shape[2])
---> 63                     self.model.partial_fit(batch_x, batch_y.ravel())
     64
     65                     train_pred_y = self.model.predict(batch_x)

~/miniconda3/envs/esowc-drought/lib/python3.7/site-packages/sklearn/linear_model/stochastic_gradient.py in partial_fit(self, X, y, sample_weight)
   1151                                  learning_rate=self.learning_rate, max_iter=1,
   1152                                  sample_weight=sample_weight, coef_init=None,
-> 1153                                  intercept_init=None)
   1154
   1155     def _fit(self, X, y, alpha, C, loss, learning_rate, coef_init=None,

~/miniconda3/envs/esowc-drought/lib/python3.7/site-packages/sklearn/linear_model/stochastic_gradient.py in _partial_fit(self, X, y, alpha, C, loss, learning_rate, max_iter, sample_weight, coef_init, intercept_init)
   1097                      max_iter, sample_weight, coef_init, intercept_init):
   1098         X, y = check_X_y(X, y, "csr", copy=False, order='C', dtype=np.float64,
-> 1099                          accept_large_sparse=False)
   1100         y = y.astype(np.float64, copy=False)
   1101

~/miniconda3/envs/esowc-drought/lib/python3.7/site-packages/sklearn/utils/validation.py in check_X_y(X, y, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, multi_output, ensure_min_samples, ensure_min_features, y_numeric, warn_on_dtype, estimator)
    717                     ensure_min_features=ensure_min_features,
    718                     warn_on_dtype=warn_on_dtype,
--> 719                     estimator=estimator)
    720     if multi_output:
    721         y = check_array(y, 'csr', force_all_finite=True, ensure_2d=False,

~/miniconda3/envs/esowc-drought/lib/python3.7/site-packages/sklearn/utils/validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
    540         if force_all_finite:
    541             _assert_all_finite(array,
--> 542                                allow_nan=force_all_finite == 'allow-nan')
    543
    544     if ensure_min_samples > 0:

~/miniconda3/envs/esowc-drought/lib/python3.7/site-packages/sklearn/utils/validation.py in _assert_all_finite(X, allow_nan)
     54                 not allow_nan and not np.isfinite(X).all()):
     55             type_err = 'infinity' if allow_nan else 'NaN, infinity'
---> 56             raise ValueError(msg_err.format(type_err, X.dtype))
     57     # for object dtype data, we only check for NaNs (GH-13254)
     58     elif X.dtype == np.dtype('object') and not allow_nan:

ValueError: Input contains NaN, infinity or a value too large for dtype('float64').
@tommylees112 (Contributor, Author)

Maybe we can have some sort of 'is this slice all nan/inf' check and then just drop that coordinate from the axes:

So we would select x[:, :-1, :-1] from our x values if the final index along the 2nd and 3rd axes were all inf
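The proposed check could be sketched like this. A hedged sketch only: the function name is hypothetical and the shape convention `(samples, timesteps, features)` is assumed from the example at the top of the issue.

```python
import numpy as np

def drop_all_inf_slice(x):
    """If the final-timestep / final-feature slice is entirely non-finite,
    drop the last index along both trailing axes, as proposed above.
    Hypothetical helper, not part of the repo's API."""
    if not np.isfinite(x[:, -1, -1]).any():
        return x[:, :-1, :-1]
    return x

x = np.ones((6, 12, 5))
x[:, -1, -1] = -np.inf  # all-masked VHI in the final month
print(drop_all_inf_slice(x).shape)  # (6, 11, 4)
```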

@tommylees112 (Contributor, Author)

I'm sure this is a relatively easy one-liner in the src/models/data.py file? Maybe I'm wrong. Where else are we checking for NaNs and dropping them?
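An alternative to dropping coordinates would be to mask out offending samples per batch before `partial_fit`. A sketch under stated assumptions: the helper name and variable names are invented for illustration, and the real loader in src/models/data.py may structure batches differently.

```python
import numpy as np

def mask_non_finite(batch_x, batch_y):
    """Drop any samples containing NaN/inf before fitting.
    Hypothetical helper, not the repo's implementation."""
    flat = batch_x.reshape(batch_x.shape[0], -1)
    keep = np.isfinite(flat).all(axis=1) & np.isfinite(batch_y).all(axis=1)
    return batch_x[keep], batch_y[keep]

batch_x = np.ones((4, 2, 3))
batch_y = np.ones((4, 1))
batch_x[0, -1, -1] = -np.inf  # corrupt one sample
xs, ys = mask_non_finite(batch_x, batch_y)
print(xs.shape, ys.shape)  # (3, 2, 3) (3, 1)
```

Note this silently discards whole samples, so dropping the known all-masked coordinate (as suggested above) may be preferable when the pattern of missingness is fixed.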
