
Trouble saving models #147

Closed
mbw314 opened this issue Jul 23, 2019 · 2 comments
mbw314 commented Jul 23, 2019

I'm trying to export a model trained on a dataset with the following features:
#users: 700K
#user features: IDs only
#items: 30K
#item features: IDs only
#interactions: 56M

The model uses the default architecture except for the loss function (WMRB), and was trained for only 5 epochs.

I've tried to export the model in two ways (both attempts are sketched below):

  1. Using TensorRec.save_model. The problem here is that the serialized graph is apparently enormous, since I get this error: InvalidArgumentError: Cannot serialize protocol buffer of type tensorflow.GraphDef as the serialized size (11399300521 bytes) would be larger than the limit (2147483647 bytes)
  2. Pickling the model directly. The cause of this failure is less clear to me, but a Google search turns up many similar issues. The error is TypeError: can't pickle _thread.RLock objects
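
For reference, a minimal sketch of both attempts (the save_model argument here is an assumption about its signature):

```python
import pickle
from tensorrec import TensorRec

model = TensorRec()  # stand-in for the trained model described above

# Attempt 1: fails once the serialized GraphDef exceeds the 2 GB protobuf limit.
model.save_model('/tmp/tensorrec_model')  # directory argument assumed for illustration

# Attempt 2: fails because the model holds unpicklable objects
# (the TensorFlow session/graph machinery contains a _thread.RLock).
with open('/tmp/tensorrec_model.pkl', 'wb') as f:
    pickle.dump(model, f)
```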

Any advice on how to get around these issues would be greatly appreciated!

mbw314 commented Jul 25, 2019

To add a bit more context, finding a way to use TensorRec.save_model would be great, but I'm ultimately interested in obtaining a TensorFlow SavedModel representation of the model to use for scoring (i.e., the equivalent of calling predict).

One option is calling the tf.saved_model.simple_save function. Similar to #142, this requires one to specify the input and output nodes of the graph to be saved. A comment there points to placeholders in TensorRec._build_tf_graph for training hyper-parameters and user/item feature iterators for input, and to the predict function for output (presumably self.tf_prediction).

As mentioned in another comment to #142, the hyper-parameter placeholders seem to be irrelevant inputs for prediction, but I'm not sure how to specify the input nodes for user and item features. E.g., using self.tf_user_feature_iterator for user input fails: AttributeError: 'Iterator' object has no attribute 'dtype'.
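
For concreteness, here is a minimal, self-contained sketch of a simple_save export, with a toy dot-product graph standing in for TensorRec's prediction subgraph (all tensor names are illustrative; for the real model the inputs would be its feature placeholders and the output would be self.tf_prediction):

```python
import tensorflow as tf  # TF 1.x API

# Toy stand-in for the prediction subgraph: user/item representations in,
# dot-product scores out.
user_repr = tf.placeholder(tf.float32, shape=[None, 10], name='user_repr')
item_repr = tf.placeholder(tf.float32, shape=[None, 10], name='item_repr')
predictions = tf.matmul(user_repr, item_repr, transpose_b=True, name='predictions')

with tf.Session() as sess:
    tf.saved_model.simple_save(
        sess,
        export_dir='/tmp/toy_savedmodel',
        inputs={'user_repr': user_repr, 'item_repr': item_repr},
        outputs={'predictions': predictions},
    )
```

The exported directory can then be inspected with saved_model_cli show --dir /tmp/toy_savedmodel --all.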

Does anyone know how to do this or anything similar?

mbw314 commented Aug 6, 2019

A quick follow-up:

The huge model size seems to be a result of how TensorFlow handles numpy data; see https://www.tensorflow.org/guide/datasets#consuming_numpy_arrays

I had been using training data in sparse matrix format, which gets converted to numpy arrays here:

def create_tensorrec_dataset_from_sparse_matrix(sparse_matrix):
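
To illustrate the failure mode (an illustrative sketch, not TensorRec's actual code): building a Dataset directly from a large in-memory array embeds the array as a constant in the GraphDef, so the serialized graph grows with the data:

```python
import numpy as np
import tensorflow as tf  # TF 1.x API

# Dense interaction data (scaled down here; the real dataset had 56M rows).
interactions = np.random.rand(1_000_000, 2).astype(np.float32)

# from_tensor_slices embeds `interactions` as a tf.constant in the graph,
# so the GraphDef itself carries the data -- at full scale this is what
# pushes the serialized size past the 2 GB protobuf limit.
dataset = tf.data.Dataset.from_tensor_slices(interactions)
```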

One way around this is to save the training data in TFRecord format first (sketched below). One caveat: it is easy to run out of memory during training unless batching is enabled, and since the user_batch_size parameter only applies to sparse matrix input, I had to handle batching directly, which was not difficult.
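
A minimal sketch of that workaround, assuming interaction triples of (user ID, item ID, value); the feature names and batch size are illustrative:

```python
import numpy as np
import tensorflow as tf  # TF 1.x API

def write_interactions(path, users, items, values):
    """Serialize (user, item, value) interaction triples to a TFRecord file."""
    with tf.io.TFRecordWriter(path) as writer:
        for u, i, v in zip(users, items, values):
            example = tf.train.Example(features=tf.train.Features(feature={
                'user': tf.train.Feature(int64_list=tf.train.Int64List(value=[int(u)])),
                'item': tf.train.Feature(int64_list=tf.train.Int64List(value=[int(i)])),
                'value': tf.train.Feature(float_list=tf.train.FloatList(value=[float(v)])),
            }))
            writer.write(example.SerializeToString())

def parse_interaction(record):
    """Parse one serialized Example back into a feature dict."""
    spec = {
        'user': tf.io.FixedLenFeature([], tf.int64),
        'item': tf.io.FixedLenFeature([], tf.int64),
        'value': tf.io.FixedLenFeature([], tf.float32),
    }
    return tf.io.parse_single_example(record, spec)

write_interactions('interactions.tfrecord',
                   users=np.array([0, 1, 2]),
                   items=np.array([10, 20, 30]),
                   values=np.array([1.0, 0.0, 1.0]))

# Reading from the file keeps the data out of the GraphDef, and batching
# keeps memory bounded during training.
dataset = (tf.data.TFRecordDataset('interactions.tfrecord')
           .map(parse_interaction)
           .batch(10000))
```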

As for obtaining a SavedModel, the method described in a comment on #138 worked fine.

@mbw314 mbw314 closed this as completed Aug 6, 2019