# Model Evaluation Design

This document describes the design of the model evaluation task for ElasticDL.

## Minimum Viable Product

### Definitions

- `Model evaluation`: Computing metrics to judge the performance of the trained
model.
- `Evaluation worker`: The worker responsible for performing the model
evaluation task.
- `Multiprocessing`: Executing tasks in multiple processes in parallel on the
same pod.

### Requirements

- There's only one evaluation worker, and it does not use multiprocessing.
- The master pod is responsible for creating the evaluation worker.
- The evaluation worker is created by the master pod together with the workers
for training.
- Evaluation starts after a specified warm-up period and then repeats on a
given time interval (see the scheduling sketch after this list). We need to
expose the following parameters to users:
  - `start_delay_secs`: Start evaluating after waiting for this many seconds.
  - `throttle_secs`: Do not re-evaluate unless the last evaluation was started
at least this many seconds ago.
- The evaluation worker fetches the latest model from the master pod.
- The model can be evaluated for a specified number of steps (batches of
evaluation samples). If `None`, evaluation continues until it reaches the end
of the input.
- Model evaluation metrics can be defined by users together with the model
definition.
- The computed evaluation metrics can be reported back to the master through
an RPC call.

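To make the timing behavior concrete, here is a minimal sketch of how
`start_delay_secs` and `throttle_secs` could drive the evaluation schedule.
The `evaluate_once` callable is a hypothetical stand-in for one full
evaluation pass, not part of the actual API:

```python
import time


def evaluation_loop(start_delay_secs, throttle_secs, evaluate_once):
    """Run evaluations after a warm-up delay, at most once per throttle window.

    `evaluate_once` is a hypothetical callable that fetches the latest model
    and computes metrics over the evaluation data.
    """
    # Warm-up period before the first evaluation.
    time.sleep(start_delay_secs)
    while True:
        started_at = time.time()
        evaluate_once()
        # Do not start the next evaluation unless the last one was *started*
        # at least `throttle_secs` ago.
        elapsed = time.time() - started_at
        if elapsed < throttle_secs:
            time.sleep(throttle_secs - elapsed)
```
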
### Implementation Plan

- Implement `MasterServicer.ReportEvaluationMetrics()` and additional proto
definitions such as `ReportEvaluationMetricsReply` and
`ReportEvaluationMetricsRequest` (a hypothetical sketch of these messages
follows this list).
- Extend `Worker` to support the following:
  - `distributed_evaluate()` that contains the main logic for model evaluation
(see the sketch after this list).
  - `report_task_result()` that reports the evaluation task result (e.g. task
id and error message) back to the master through an RPC call.
  - `report_evaluation_metrics()` that reports the computed evaluation metrics
(e.g. accuracy, precision, recall) back to the master through an RPC call.
- Add a main CLI entry point to `Worker.distributed_evaluate()` that will be
used in `WorkerManager`.
- Extend `WorkerManager` to support the following:
  - Instantiate a separate evaluation task queue from the evaluation data
directory.
  - Start an evaluation worker from the evaluation task queue.
  - Update `master.main()` to support the model evaluation task when the user
requests it.

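For illustration only, the new RPC and its messages might look like the
following sketch; the field names and types are assumptions, not the final
proto definitions:

```proto
// A hypothetical sketch of the evaluation-metrics RPC, not the final proto.
syntax = "proto3";

message ReportEvaluationMetricsRequest {
  // The version of the model the metrics were computed against.
  int32 model_version = 1;
  // Maps a metric name (e.g. "accuracy") to its computed value.
  map<string, float> evaluation_metrics = 2;
}

message ReportEvaluationMetricsReply {
  // Whether the master accepted the metrics; it may, for example, reject
  // metrics computed against a stale model version.
  bool accepted = 1;
}

service Master {
  rpc ReportEvaluationMetrics(ReportEvaluationMetricsRequest)
      returns (ReportEvaluationMetricsReply);
}
```
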
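The worker-side evaluation could then follow a loop along these lines. This is
a minimal sketch under assumed names (`model`, `eval_metrics`, `batches`,
`report_fn`), not the actual ElasticDL API:

```python
def distributed_evaluate(model, eval_metrics, batches, report_fn, steps=None):
    """Hypothetical core of the worker's model evaluation logic.

    model: a callable (Keras-style) model already synced with the master.
    eval_metrics: dict mapping a metric name to a `tf.keras.metrics.Metric`.
    batches: iterable of (features, labels) evaluation batches.
    report_fn: callable that sends the metrics dict to the master via RPC.
    steps: number of batches to evaluate, or None to consume all input.
    """
    for num_batches, (features, labels) in enumerate(batches, start=1):
        predictions = model(features, training=False)
        for metric in eval_metrics.values():
            metric.update_state(labels, predictions)
        if steps is not None and num_batches >= steps:
            break
    # Report the computed metrics (e.g. accuracy, precision) back to the
    # master, e.g. through `report_evaluation_metrics()`.
    report_fn({name: m.result().numpy() for name, m in eval_metrics.items()})
```
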
## Future Development

A list of potential features we may want for model evaluation in the future:

- `num_parallel_processes`: The number of child processes used to run
evaluation on each individual evaluation worker.
- `sample_weights`: Optional NumPy array of weights for the test samples, used
for weighting the loss function.

## References

Some of the ideas are borrowed from existing solutions listed below:

- [`tf.keras.models.Model.evaluate()`](https://www.tensorflow.org/api_docs/python/tf/keras/models/Model#evaluate)
- [`tf.keras.metrics`](https://www.tensorflow.org/api_docs/python/tf/keras/metrics)
- [`tf.estimator.EvalSpec`](https://www.tensorflow.org/api_docs/python/tf/estimator/EvalSpec)
- [`tf.estimator.Estimator.evaluate()`](https://www.tensorflow.org/api_docs/python/tf/estimator/Estimator#evaluate)
- [`tf.estimator.train_and_evaluate()`](https://www.tensorflow.org/api_docs/python/tf/estimator/train_and_evaluate)