Commit 47f9598

David Cavazos authored and committed
Fixed links to source files
1 parent ca79560 commit 47f9598

11 files changed, 124 additions and 45 deletions

molecules/README.md

Lines changed: 87 additions & 5 deletions
@@ -148,7 +148,7 @@ For small datasets it will be faster to run locally.
 python trainer/task.py
 
 # To get the path of the trained model
-EXPORT_DIR=/tmp/cloudml-samples/molecules/model/export
+EXPORT_DIR=/tmp/cloudml-samples/molecules/model/export/final
 MODEL_DIR=$(ls -d -1 $EXPORT_DIR/* | sort -r | head -n 1)
 ```
 
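The `ls -d -1 $EXPORT_DIR/* | sort -r | head -n 1` line in this hunk selects the entry under `EXPORT_DIR` whose name sorts last, which is the newest export when the subdirectories are named by timestamp. A minimal Python sketch of the same selection for a local path follows. `latest_export_dir` is a hypothetical helper that is not part of this commit, and it assumes the export names are equal-length numeric timestamps so that string order matches age.

```python
import os


def latest_export_dir(export_dir):
    """Return the subdirectory of export_dir whose name sorts last.

    Assumption: subdirectories are named with equal-length numeric
    timestamps, so the lexicographically largest name is the newest export.
    """
    candidates = [
        name for name in os.listdir(export_dir)
        if os.path.isdir(os.path.join(export_dir, name))
    ]
    if not candidates:
        raise ValueError('no exported models found in ' + export_dir)
    return os.path.join(export_dir, max(candidates))
```

Unlike the shell one-liner, this sketch skips plain files and fails loudly when nothing has been exported yet. It only covers local paths; a `gs://` path would still need `gsutil ls` or a storage client.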
@@ -168,11 +168,19 @@ gcloud ml-engine jobs submit training $JOB \
 --work-dir $WORK_DIR
 
 # To get the path of the trained model
-EXPORT_DIR=$WORK_DIR/model/export
+EXPORT_DIR=$WORK_DIR/model/export/final
 MODEL_DIR=$(gsutil ls -d $EXPORT_DIR/* | sort -r | head -n 1)
 ```
 
-## Batch Predictions
+To visualize the training job, we can use [TensorBoard](https://www.tensorflow.org/guide/summaries_and_tensorboard).
+```bash
+tensorboard --logdir $WORK_DIR/model
+```
+
+You can access the results at `localhost:6006`.
+
+## Predictions
+### Option 1: Batch Predictions
 Source code: [`predict.py`](predict.py)
 
 Batch predictions are optimized for throughput rather than latency. These work best if there's a large amount of predictions to make and you can wait for all of them to finish before having the results.
@@ -204,7 +212,7 @@ python predict.py \
 --outputs-dir $WORK_DIR/predictions
 ```
 
-## Streaming Predictions
+### Option 2: Streaming Predictions
 Source code: [`predict.py`](predict.py)
 
 Streaming predictions are optimized for latency rather than throughput. These work best if you are sending sporadic predictions, but want to get the results as soon as possible.
@@ -252,7 +260,7 @@ Now that we have the prediction service running, we want to run a publisher to s
 For convenience, we provided a sample [`publisher.py`](publisher.py) and [`subscriber.py`](subscriber.py) to show how to implement one.
 
 These will have to be run as different processes concurrently, so you'll need to have a different terminal running each command.
-> NOTE: remember to activate the virtualenv on each terminal.
+> NOTE: remember to activate the `virtualenv` on each terminal.
 
 We'll first run the subscriber, which will listen for prediction results and log them.
 ```bash
@@ -272,3 +280,77 @@ python publisher.py \
 ```
 
 Once the publisher starts parsing and publishing molecules, we'll start seeing predictions from the subscriber.
+
+### Option 3: Cloud ML Engine Predictions
+If you have a different way to extract the features (in this case the atom counts) that is not through our existing preprocessing pipeline for SDF files, it might be easier to build a JSON file with one request per line and make the predictions on Cloud ML Engine.
+
+We've included the [`sample-requests.json`](sample-requests.json) file with an example of how these requests look like. Here are the contents of the file:
+```json
+{"TotalC": 9, "TotalH": 17, "TotalO": 4, "TotalN": 1}
+{"TotalC": 9, "TotalH": 18, "TotalO": 4, "TotalN": 1}
+{"TotalC": 7, "TotalH": 8, "TotalO": 4, "TotalN": 0}
+{"TotalC": 3, "TotalH": 9, "TotalO": 1, "TotalN": 1}
+```
+
+Before creating the model in Cloud ML Engine, it is a good idea to test our model's predictions locally:
+```bash
+# First we have to get the exported model's directory
+EXPORT_DIR=$WORK_DIR/model/export/final
+if [[ $EXPORT_DIR == gs://* ]]; then
+# If it's a GCS path, use gsutil
+MODEL_DIR=$(gsutil ls -d $EXPORT_DIR/* | sort -r | head -n 1)
+else
+# If it's a local path, use ls
+MODEL_DIR=$(ls -d -1 $EXPORT_DIR/* | sort -r | head -n 1)
+fi
+
+# To do the local predictions
+gcloud ml-engine local predict \
+--model-dir $MODEL_DIR \
+--json-instances sample-requests.json
+```
+
+For reference, these are the *real* energy values for the `sample-requests.json` file:
+```
+PREDICTIONS
+[37.801]
+[44.1107]
+[19.4085]
+[-0.1086]
+```
+
+Once we are happy with our results, we can now upload our model into Cloud ML Engine for online predictions.
+```bash
+# We want the model to reside on GCS and get its path
+EXPORT_DIR=$WORK_DIR/model/export/final
+if [[ $EXPORT_DIR == gs://* ]]; then
+# If it's a GCS path, use gsutil
+MODEL_DIR=$(gsutil ls -d $EXPORT_DIR/* | sort -r | head -n 1)
+else
+# If it's a local path, first upload it to GCS
+LOCAL_MODEL_DIR=$(ls -d -1 $EXPORT_DIR/* | sort -r | head -n 1)
+MODEL_DIR=$BUCKET/cloudml-samples/molecules/model
+gsutil -m cp -r $LOCAL_MODEL_DIR $MODEL_DIR
+fi
+
+# Now create the model and a version in Cloud ML Engine and set it as default
+MODEL=molecules
+REGION=$(gcloud config get-value compute/region)
+gcloud ml-engine models create $MODEL \
+--regions $REGION
+
+VERSION="${MODEL}_$(date +%Y%m%d_%H%M%S)"
+gcloud ml-engine versions create $VERSION \
+--model $MODEL \
+--origin $MODEL_DIR \
+--runtime-version 1.8
+
+gcloud ml-engine versions set-default $VERSION \
+--model $MODEL
+
+# Finally, we can request predictions via gcloud ml-engine
+gcloud ml-engine predict \
+--model $MODEL \
+--version $VERSION \
+--json-instances sample-requests.json
+```
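
The last README block names each uploaded version with `VERSION="${MODEL}_$(date +%Y%m%d_%H%M%S)"`, appending a timestamp to the model name so that repeated uploads do not collide. A rough Python equivalent is sketched below for anyone scripting the upload outside of bash. `make_version_name` is a hypothetical helper, not something this commit adds.

```python
from datetime import datetime


def make_version_name(model, now=None):
    """Build '<model>_<YYYYmmdd>_<HHMMSS>', mirroring the shell snippet."""
    if now is None:
        # Local time, which is also what `date` prints by default.
        now = datetime.now()
    return '{}_{}'.format(model, now.strftime('%Y%m%d_%H%M%S'))
```

As with the shell version, names are only unique to the second, so two uploads started within the same second would produce the same name.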

molecules/data-extractor.py

Lines changed: 2 additions & 0 deletions
@@ -9,6 +9,8 @@
 # License for the specific language governing permissions and limitations under
 # the License.
 
+# This tool downloads SDF files from an FTP source.
+
 import StringIO
 import argparse
 import ftplib

molecules/predict.py

Lines changed: 3 additions & 1 deletion
@@ -11,6 +11,8 @@
 # License for the specific language governing permissions and limitations under
 # the License.
 
+# This tool does either batch or streaming predictions on a trained model.
+
 from __future__ import print_function
 
 import argparse
@@ -93,7 +95,7 @@ def run(model_dir, feature_extraction, sink, beam_options=None):
 | 'Feature extraction' >> feature_extraction
 | 'Predict' >> beam.ParDo(Predict(model_dir, 'ID'))
 | 'Format as JSON' >> beam.Map(lambda result: json.dumps(result))
-| 'Write to sink' >> sink)
+| 'Write predictions' >> sink)
 
 
 if __name__ == '__main__':

molecules/preprocess.py

Lines changed: 2 additions & 0 deletions
@@ -11,6 +11,8 @@
 # License for the specific language governing permissions and limitations under
 # the License.
 
+# This tool preprocesses and extracts features from SDF files.
+
 import argparse
 import dill as pickle
 import os

molecules/pubchem/pipeline.py

Lines changed: 1 addition & 1 deletion
@@ -157,7 +157,7 @@ def expand(self, p):
 # Return the preprocessing pipeline. In this case we're reading the PubChem
 # files, but the source could be any Apache Beam source.
 return (p
-| 'Read from source' >> self.source
+| 'Read raw molecules' >> self.source
 | 'Format molecule' >> beam.ParDo(FormatMolecule())
 | 'Count atoms' >> beam.ParDo(CountAtoms())
 )

molecules/publisher.py

Lines changed: 2 additions & 0 deletions
@@ -11,6 +11,8 @@
 # License for the specific language governing permissions and limitations under
 # the License.
 
+# This is a sample publisher for the streaming predictions service.
+
 import argparse
 import os
 import sys

molecules/run-cloud

Lines changed: 1 addition & 21 deletions
@@ -83,31 +83,11 @@ run gcloud ml-engine jobs submit training $JOB \
 echo ''
 
 # Get the model path
-EXPORT_DIR=$WORK_DIR/model/export
+EXPORT_DIR=$WORK_DIR/model/export/final
 MODEL_DIR=$(gsutil ls -d $EXPORT_DIR/* | sort -r | head -n 1)
 echo "Model: $MODEL_DIR"
 echo ''
 
-# Create a model in Google Cloud ML Engine if it doesn't exist
-MODEL=molecules
-if [[ -z $(gcloud ml-engine models list | awk '{print $1}' | grep "^$MODEL$") ]]; then
-echo '>> Creating model'
-run gcloud ml-engine models create $MODEL
-echo ''
-fi
-
-# Create a model version
-VERSION=$JOB
-echo '>> Creating version'
-run gcloud ml-engine versions create $VERSION \
---model $MODEL \
---origin $MODEL_DIR \
---runtime-version $RUNTIME
-
-run gcloud ml-engine versions set-default $VERSION \
---model $MODEL
-echo ''
-
 # Make batch predictions on SDF files
 echo '>> Batch prediction'
 run python predict.py \

molecules/run-local

Lines changed: 1 addition & 1 deletion
@@ -49,7 +49,7 @@ run python trainer/task.py \
 echo ''
 
 # Get the model path
-EXPORT_DIR=$WORK_DIR/model/export
+EXPORT_DIR=$WORK_DIR/model/export/final
 if [[ $EXPORT_DIR == gs://* ]]; then
 MODEL_DIR=$(gsutil ls -d $EXPORT_DIR/* | sort -r | head -n 1)
 else

molecules/sample-requests.json

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
+{"TotalC": 9, "TotalH": 17, "TotalO": 4, "TotalN": 1}
+{"TotalC": 9, "TotalH": 18, "TotalO": 4, "TotalN": 1}
+{"TotalC": 7, "TotalH": 8, "TotalO": 4, "TotalN": 0}
+{"TotalC": 3, "TotalH": 9, "TotalO": 1, "TotalN": 1}
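
Each line of `sample-requests.json` is a standalone JSON object, one prediction request per line, rather than a single JSON array. For atom counts that come from somewhere other than the SDF preprocessing pipeline, a file in that layout could be produced along these lines. This is a minimal sketch: `to_json_lines` and `write_requests` are hypothetical helpers, and the instances are copied from the file added in this commit.

```python
import json

# Atom counts copied from sample-requests.json in this commit.
requests = [
    {"TotalC": 9, "TotalH": 17, "TotalO": 4, "TotalN": 1},
    {"TotalC": 9, "TotalH": 18, "TotalO": 4, "TotalN": 1},
    {"TotalC": 7, "TotalH": 8, "TotalO": 4, "TotalN": 0},
    {"TotalC": 3, "TotalH": 9, "TotalO": 1, "TotalN": 1},
]


def to_json_lines(instances):
    """Serialize each instance as one JSON object per line."""
    return ''.join(json.dumps(instance) + '\n' for instance in instances)


def write_requests(path, instances):
    """Write the instances to path in the one-request-per-line layout."""
    with open(path, 'w') as f:
        f.write(to_json_lines(instances))
```

Calling `write_requests('my-requests.json', requests)` would give a file that can be passed to `--json-instances` in place of `sample-requests.json`; the file name here is only an example.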

molecules/subscriber.py

Lines changed: 2 additions & 0 deletions
@@ -11,6 +11,8 @@
 # License for the specific language governing permissions and limitations under
 # the License.
 
+# This is a sample subscriber for the streaming predictions service.
+
 import argparse
 import json
 import logging

molecules/trainer/task.py

Lines changed: 19 additions & 16 deletions
@@ -11,6 +11,8 @@
 # License for the specific language governing permissions and limitations under
 # the License.
 
+# This tool trains an ML model on preprocessed data.
+
 import argparse
 import dill as pickle
 import multiprocessing as mp
@@ -42,7 +44,7 @@ def decode(elem):
 if mode == tf.estimator.ModeKeys.TRAIN:
 if shuffle:
 dataset = dataset.apply(tf.contrib.data.shuffle_and_repeat(
-batch_size * 8))
+batch_size * 8))
 else:
 dataset = dataset.cache()
 dataset = dataset.repeat()
@@ -152,21 +154,22 @@ def train_and_evaluate(
 tft_output = tft.TFTransformOutput(work_dir)
 feature_spec = tft_output.transformed_feature_spec()
 
-# Train the model
-train_input_fn = make_train_input_fn(
-feature_spec, labels, train_files_pattern, batch_size)
-estimator.train(input_fn=train_input_fn, max_steps=train_max_steps)
-
-# Evaluate the model
-eval_input_fn = make_eval_input_fn(
-feature_spec, labels, eval_files_pattern, batch_size)
-estimator.evaluate(input_fn=eval_input_fn, steps=None)
-
-# Export the model
-export_dir = os.path.join(model_dir, 'export')
-serving_input_fn = make_serving_input_fn(
-tft_output, input_feature_spec, labels)
-estimator.export_savedmodel(export_dir, serving_input_fn)
+# Create the training and evaluation specifications
+train_spec = tf.estimator.TrainSpec(
+input_fn=make_train_input_fn(
+feature_spec, labels, train_files_pattern, batch_size),
+max_steps=train_max_steps)
+
+exporter = tf.estimator.FinalExporter(
+'final', make_serving_input_fn(tft_output, input_feature_spec, labels))
+
+eval_spec = tf.estimator.EvalSpec(
+input_fn=make_eval_input_fn(
+feature_spec, labels, eval_files_pattern, batch_size),
+exporters=[exporter])
+
+# Train and evaluate the model
+tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)
 
 
 if __name__ == '__main__':
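
The switch to `tf.estimator.FinalExporter('final', ...)` in this file appears to be what moves the exported model from `model/export` to `model/export/final` in the README and the run scripts: `tf.estimator.train_and_evaluate` writes each exporter's output under `<model_dir>/export/<exporter name>/<timestamp>`. The sketch below shows only that path arithmetic and does not need TensorFlow. `exporter_base_dir` is a hypothetical helper, not code from this commit.

```python
import posixpath


def exporter_base_dir(model_dir, exporter_name):
    """Directory under which an exporter's timestamped SavedModels land.

    posixpath is used so that '/'-separated gs:// paths join the same
    way as local paths.
    """
    return posixpath.join(model_dir, 'export', exporter_name)
```

With the local defaults from the README, `exporter_base_dir('/tmp/cloudml-samples/molecules/model', 'final')` gives `/tmp/cloudml-samples/molecules/model/export/final`, which matches the new `EXPORT_DIR` value.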
