
Commit b0a7089

Merge branch 'master' into dcgan_fashiongen_example

2 parents 676d550 + 65cd16b

File tree

176 files changed: +5273, -3005 lines changed


benchmarks/README.md

Lines changed: 4 additions & 4 deletions
@@ -273,16 +273,16 @@ python benchmark-ab.py --url https://torchserve.pytorch.org/mar_files/mnist.mar
 python benchmark-ab.py --url https://torchserve.pytorch.org/mar_files/mnist.mar --content_type application/png --config_properties config.properties --inference_model_url explanations/benchmark --input ../examples/image_classifier/mnist/test_data/0.png
 ```
 
-* KUBEFLOW SERVING PREDICTIONS
+* KSERVE SERVING PREDICTIONS
 
 ```
-python benchmark-ab.py --url https://torchserve.pytorch.org/mar_files/mnist.mar --content_type application/json --config_properties config_kf.properties --inference_model_url v1/models/benchmark:predict --input ../kubernetes/kfserving/kf_request_json/mnist.json
+python benchmark-ab.py --url https://torchserve.pytorch.org/mar_files/mnist.mar --content_type application/json --config_properties config_kf.properties --inference_model_url v1/models/benchmark:predict --input ../kubernetes/kserve/kf_request_json/mnist.json
 ```
 
-* KUBEFLOW SERVING EXPLANATIONS
+* KSERVE SERVING EXPLANATIONS
 
 ```
-python benchmark-ab.py --url https://torchserve.pytorch.org/mar_files/mnist.mar --content_type application/json --config_properties config_kf.properties --inference_model_url v1/models/benchmark:explain --input ../kubernetes/kfserving/kf_request_json/mnist.json
+python benchmark-ab.py --url https://torchserve.pytorch.org/mar_files/mnist.mar --content_type application/json --config_properties config_kf.properties --inference_model_url v1/models/benchmark:explain --input ../kubernetes/kserve/kf_request_json/mnist.json
 ```
 
 * TORCHSERVE SERVING PREDICTIONS WITH DOCKER
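For reference, the `mnist.json` payload used by the two KSERVE commands above follows the KServe v1 inference protocol. A minimal sketch of its shape (the base64 image bytes are elided here; see `kubernetes/kserve/kf_request_json/mnist.json` for the real file):

```
{
  "instances": [
    {
      "data": "<base64-encoded image bytes>"
    }
  ]
}
```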

test/benchmark/README.md renamed to benchmarks/automated/README.md

Lines changed: 38 additions & 36 deletions
@@ -10,17 +10,17 @@ Check out a sample vgg11 model config at the path: `tests/suite/vgg11.yaml`
 -- [AmazonEC2ContainerRegistryFullAccess](https://console.aws.amazon.com/iam/home#policies/arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryFullAccess) <br>
 -- [AmazonEC2FullAccess](https://console.aws.amazon.com/iam/home#policies/arn:aws:iam::aws:policy/AmazonEC2FullAccess) <br>
 -- [AmazonS3FullAccess](https://console.aws.amazon.com/iam/home#policies/arn:aws:iam::aws:policy/AmazonS3FullAccess) <br>
--- [AmazonIAMFullAccess](https://console.aws.amazon.com/iam/home#policies/arn:aws:iam::aws:policy/AmazonIAMFullAccess)
+-- [IAMFullAccess](https://console.aws.amazon.com/iam/home#policies/arn:aws:iam::aws:policy/IAMFullAccess)
 <br> (or at the least `iam:PassRole`).
 
 * [Create](https://docs.aws.amazon.com/cli/latest/reference/ecr/create-repository.html) an ECR repository with the name “torchserve-benchmark” in the us-west-2 region, e.g.
 ```
 aws ecr create-repository --repository-name torchserve-benchmark --region us-west-2
 ```
-If you'd like to use your own repo, edit the __init__.py under `serve/test/benchmark/tests/utils`
+If you'd like to use your own repo, edit the `config.yaml` file at `serve/benchmarks/automated/tests/suite/benchmark/config.yaml`
 * Ensure you have the [docker](https://docs.docker.com/get-docker/) client set up on your system (osx/ec2)
-* Adjust the following global variables to your preference in the file `serve/test/benchmark/tests/utils/__init__.py` <br>
--- IAM_INSTANCE_PROFILE :this role is attached to all ec2 instances created as part of the benchmarking process. Create this as described [here](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/iam-roles-for-amazon-ec2.html#create-iam-role). Default role name is 'EC2Admin'.<br>
+* Adjust the following global variables to your preference in the file `serve/benchmarks/automated/tests/suite/benchmark/config.yaml` <br>
+-- iam_instance_profile: this role is attached to all ec2 instances created as part of the benchmarking process. Create it as described [here](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/iam-roles-for-amazon-ec2.html#create-iam-role). The default role name is 'EC2Admin'.<br>
 Use the following commands to create a new role if you don't have one you can use.
 1. Create the trust policy file `ec2-admin-trust-policy.json` and add the following content:
 ```
@@ -50,16 +50,12 @@ aws iam attach-role-policy --policy-arn arn:aws:iam::aws:policy/AmazonEC2FullAcc
 aws iam attach-role-policy --policy-arn arn:aws:iam::aws:policy/AmazonS3FullAccess --role-name EC2Admin
 aws iam attach-role-policy --policy-arn arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryFullAccess --role-name EC2Admin
 ```
--- S3_BUCKET_BENCHMARK_ARTIFACTS :all temporary benchmarking artifacts including server logs will be stored in this bucket: <br>
+-- s3_bucket_benchmark_artifacts: all temporary benchmarking artifacts, including server logs, will be stored in this bucket. Note that this bucket must be in the same account, or the credentials being used must have read and write access to it. <br>
 Use the following command to create a new S3 bucket if you don't have one you can use.
 ```
-aws s3api create-bucket --bucket <torchserve-benchmark> --region us-west-2
-```
--- DEFAULT_DOCKER_DEV_ECR_REPO :docker image used for benchmarking will be pushed to this repo <br>
-Use the following command to create a new ECR repo if you don't have one you can use.
-```
-aws ecr create-repository --bucket torchserve-benchmark --region us-west-2
+aws s3api create-bucket --bucket <torchserve-benchmark> --region us-west-2 --create-bucket-configuration LocationConstraint=us-west-2
 ```
+-- default_docker_dev_ecr_repo: the docker image used for benchmarking will be pushed to this repo <br>
 * If you're running this setup on an EC2 instance, please ensure that the instance's security group settings allow inbound ssh on port 22. Refer to the [docs](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/security-group-rules.html).
 
 *The following steps assume that the current working directory is serve/.*
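Putting the settings above together, a minimal sketch of the relevant `config.yaml` entries (the role and repo names are the defaults mentioned above; the bucket name is a placeholder, and the shipped file may contain additional keys):

```
iam_instance_profile: EC2Admin
s3_bucket_benchmark_artifacts: s3://<torchserve-benchmark>
default_docker_dev_ecr_repo: torchserve-benchmark
```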
@@ -80,20 +76,23 @@ pip install -r test/benchmark/requirements.txt
 ```
 aws sts get-caller-identity
 ```
-4. For each of the test files under `test/benchmark/tests/`, e.g., test_vgg11.py, set the list of instance types you want to test on:
+4. The automation scripts use the ts-config from the following location: `benchmarks/config.properties`. Make changes to this file in the current local folder to use it across all runs.
+5. The simplest way to run a benchmark is to spin up the ec2 instance type of your choice (it must be a DLAMI) and run the benchmark with `--local-execution`; this will run through the models located in `benchmarks/automated/tests/suite/` and execute benchmarks against them on the current instance.
+Start the benchmark run as follows (run this in a pseudo-shell such as tmux or screen, as this is a long-running script):
 ```
-INSTANCE_TYPES_TO_TEST = ["p3.8xlarge"]
+python benchmarks/automated/run_benchmark.py --local-execution
 ```
-5. The automation scripts uses the ts-config from the following location: `benchmarks/config.properties`. Make changes to this file in the current local folder to use this across all the runs.
-6. Finally, start the benchmark run as follows (run this a pseudo shell such as tmux or screen, as this is a long-running script):
+6. Another method is to execute the above command from your desktop terminal, **without** the argument `--local-execution`. This will cause the instance types mentioned in the `<model>.yaml` files to be spun up. For each of the model config files under `benchmarks/automated/tests/suite/`, e.g., vgg11.yaml, set the list of instance types you want to test on:
 ```
-python test/benchmark/run_benchmark.py
+instance_types:
+  - c4.4xlarge
+  - p3.8xlarge
 ```
-7. To start test for a particual model, modify the `pytest_args` list in run_benchmark.py to include `["-k", "vgg11"]`, if that particular model is vgg11
-8. For generating benchmarking report, modify the argument to function `generate_comprehensive_report()` to point to the s3 bucket uri for the benchmark run. Run the script as:
+Start the benchmark run as follows:
 ```
-python report.py
+python benchmarks/automated/run_benchmark.py
 ```
+
 The final benchmark report will be available in markdown format as `report.md` in the `serve/` folder.
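Step 4 above points at `benchmarks/config.properties`, which is TorchServe's standard server configuration file. An illustrative fragment with common TorchServe settings (not necessarily the shipped contents of that file):

```
inference_address=http://0.0.0.0:8080
management_address=http://0.0.0.0:8081
metrics_address=http://0.0.0.0:8082
```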
 
 **Example report for vgg11 model**
@@ -143,36 +142,39 @@ The final benchmark report will be available in markdown format as `report.md` i
 
 
 ## Features of the automation:
-1. To save time by *not* creating new instances for every benchmark run for local testing, use the '--do-not-terminate' flag. This will automatically create a file called 'instances.yaml' and write instance-related data into the file so that it may be re-used next time.
-```
-python test/benchmark/run_benchmark.py --do-not-terminate
-```
-
-2. To re-use an instance already recorded in `instances.yaml`, use the '--use-instances' flag:
-```
-python test/benchmark/run_benchmark.py --use-instances <full_path_to>/instances.yaml --do-no-terminate
-```
-`Note: Use --do-not-termninate flag to keep re-using the instances, else, it will be terminated`.
 
 3. To run a test containing a specific string, use the `--run-only` flag. Note that the argument is 'string matched', i.e., if the test name contains the supplied argument as a substring, the test will run.
 ```
 # To run mnist test
-python test/benchmark/run_benchmark.py --run-only mnist
+python benchmarks/automated/run_benchmark.py --run-only mnist
 
 # To run fastrcnn test
-python test/benchmark/run_benchmark.py --run-only fastrcnn
+python benchmarks/automated/run_benchmark.py --run-only fastrcnn
 
-# To run bert_neuron and bert
-python test/benchmark/run_benchmark.py --run-only bert
+# To run bert_neuron and bert_cpu
+python benchmarks/automated/run_benchmark.py --run-only bert_cpu
 
 # To run vgg11 test
-python test/benchmark/run_benchmark.py --run-only vgg11
+python benchmarks/automated/run_benchmark.py --run-only vgg11
 
 # To run vgg16 test
-python test/benchmark/run_benchmark.py --run-only vgg16
+python benchmarks/automated/run_benchmark.py --run-only vgg16
+
+# To run multiple:
+python benchmarks/automated/run_benchmark.py --run-only vgg11 vgg16 bert_cpu
 ```
 
 4. You can benchmark a specific branch of the torchserve github repo by specifying the flag `--use-torchserve-branch`, e.g.,
 ```
-python test/benchmark/run_benchmark.py --use-torchserve-branch issue_1115
+python benchmarks/automated/run_benchmark.py --use-torchserve-branch issue_1115
+```
+
+5. Once the docker image is built, you may choose to not have it re-built by passing the argument `--skip-docker-build`, e.g.,
+```
+python benchmarks/automated/run_benchmark.py --skip-docker-build
+```
+
+6. If you do not wish to benchmark on the different instance types specified in the model config `*.yaml` files, you may pass the argument `--local-execution`. In this case, the instance types specified in the model config `*.yaml` files are *ignored*, and all the model benchmarks are performed *sequentially* on the current instance.
+```
+python benchmarks/automated/run_benchmark.py --local-execution
 ```
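Note: as implemented in `run_benchmark.py` (added in this commit, shown below), the script exits early unless `--local-instance-type` accompanies `--local-execution`, so a complete local run typically looks like the following (the instance type here is only an example):

```
python benchmarks/automated/run_benchmark.py --local-execution --local-instance-type c4.4xlarge --run-only vgg11
```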

benchmarks/automated/run_benchmark.py

Lines changed: 220 additions & 0 deletions
@@ -0,0 +1,220 @@
import argparse
import os
import random
import sys
import logging
import re
import uuid

import boto3
import pytest

from invoke import run
from invoke.context import Context

from tests.utils.report import Report
from tests.utils import (
    S3_BUCKET_BENCHMARK_ARTIFACTS,
    DEFAULT_REGION,
    DEFAULT_DOCKER_DEV_ECR_REPO,
    YamlHandler,
    DockerImageHandler,
)

LOGGER = logging.getLogger(__name__)
LOGGER.setLevel(logging.DEBUG)
LOGGER.addHandler(logging.StreamHandler(sys.stdout))


def build_docker_container(torchserve_branch="master", push_image=True, use_local_serve_folder=False):
    LOGGER.info("Setting up docker image to be used")

    docker_dev_image_config_path = os.path.join(
        os.getcwd(), "benchmarks", "automated", "tests", "suite", "docker", "docker.yaml"
    )

    if use_local_serve_folder:
        LOGGER.info("*** Using the local 'serve' folder closure when creating the container image.")

        # Stage a copy of the local 'serve' folder into the docker build context via /tmp
        local_serve_folder = os.getcwd()
        tmp_local_serve_folder = os.path.join("/tmp", "serve")
        serve_folder_in_docker_context = os.path.join(os.getcwd(), "docker", "serve")

        run(f"mkdir -p {tmp_local_serve_folder}")
        run(f"mkdir -p {serve_folder_in_docker_context}")

        run(f"rsync -av --progress {local_serve_folder}/ {tmp_local_serve_folder}/")
        run(f"rsync -av --progress {tmp_local_serve_folder}/ {serve_folder_in_docker_context}/")

        run(f"rm -rf {tmp_local_serve_folder}")

    docker_config = YamlHandler.load_yaml(docker_dev_image_config_path)
    YamlHandler.validate_docker_yaml(docker_config)

    account_id = run("aws sts get-caller-identity --query Account --output text").stdout.strip()

    for processor, config in docker_config.items():
        docker_tag = None
        cuda_version = None
        dockerhub_image = None
        for config_key, config_value in config.items():
            if processor == "gpu" and config_key == "cuda_version":
                cuda_version = config_value
            if config_key == "docker_tag":
                docker_tag = config_value
            if config_key == "dockerhub_image":
                dockerhub_image = config_value

        dockerImageHandler = DockerImageHandler(docker_tag, cuda_version, torchserve_branch)

        if not dockerhub_image:
            dockerImageHandler.build_image(use_local_serve_folder=use_local_serve_folder)
        else:
            # Image is pulled by process_docker_config in __init__.py
            LOGGER.info("*** Note: dockerhub_image specified in docker.yaml. This container image will be used for the benchmark.")
            dockerImageHandler.pull_docker_image(dockerhub_image, docker_tag=docker_tag)

        if push_image:
            dockerImageHandler.push_docker_image_to_ecr(
                account_id, DEFAULT_REGION, f"{DEFAULT_DOCKER_DEV_ECR_REPO}:{docker_tag}"
            )
        else:
            LOGGER.warning("Docker image will not be pushed to the ECR repo in local execution.")


def main():

    parser = argparse.ArgumentParser()

    parser.add_argument(
        "--use-instances",
        action="store",
        help="Supply a .yaml file with test_name, instance_id, and key_filename to re-use already-running instances",
    )
    parser.add_argument(
        "--do-not-terminate",
        action="store_true",
        default=False,
        help="Use with caution: does not terminate instances, instead saves the list to a file in order to re-use",
    )

    parser.add_argument(
        "--run-only", nargs="+", default=None, help="Runs the tests that contain the supplied keyword as a substring"
    )

    parser.add_argument(
        "--use-torchserve-branch",
        default="master",
        help="Specify a specific torchserve branch to build a container to benchmark on; uses 'master' by default",
    )

    parser.add_argument(
        "--use-local-serve-folder",
        action="store_true",
        default=False,
        help="Specify this option if you'd like to build a container image out of your current 'serve' folder",
    )

    parser.add_argument(
        "--skip-docker-build",
        action="store_true",
        default=False,
        help="Use if you already have a docker image built and available locally and have specified it in docker.yaml",
    )

    parser.add_argument(
        "--local-execution",
        action="store_true",
        default=False,
        help="Specify when you want to execute benchmarks on the current instance. Note: this will execute the model benchmarks sequentially, and will ignore instances specified in the model config *.yaml files.",
    )

    parser.add_argument(
        "--local-instance-type",
        default=None,
        help="Specify the ec2 instance type of the current instance on which the benchmark executes; must be a valid ec2 instance type",
    )

    arguments = parser.parse_args()

    # --local-execution and --local-instance-type must be used together
    if arguments.local_instance_type and not arguments.local_execution:
        LOGGER.error("--local-instance-type may only be used with --local-execution")
        sys.exit(1)

    if arguments.local_execution and not arguments.local_instance_type:
        LOGGER.error("--local-instance-type must be specified when using --local-execution")
        sys.exit(1)

    do_not_terminate_string = "" if not arguments.do_not_terminate else "--do-not-terminate"
    local_execution_string = "" if not arguments.local_execution else "--local-execution"
    use_instances_arg_list = ["--use-instances", f"{arguments.use_instances}"] if arguments.use_instances else []
    run_only_test = arguments.run_only

    # Join the --run-only keywords into a single pytest -k expression, e.g. "-k vgg11 or bert_cpu"
    if run_only_test:
        LOGGER.info(f"run_only_test:{run_only_test}")
        LOGGER.info(f"run_only_test type:{type(run_only_test)}")
        run_only_string_list = " or ".join(run_only_test)
        run_only_string = f"-k {run_only_string_list}"
        LOGGER.info(f"Note: running only the tests that have the name '{run_only_string_list}'.")
    else:
        run_only_string = ""

    # Local execution runs tests sequentially; remote execution fans out across 4 workers
    if arguments.local_execution:
        number_of_threads_string = ""
        local_instance_type_list = ["--local-instance-type", arguments.local_instance_type]
    else:
        number_of_threads_string = "-n=4"
        local_instance_type_list = []

    torchserve_branch = arguments.use_torchserve_branch
    use_local_serve_folder = arguments.use_local_serve_folder

    # Build docker containers as specified in docker.yaml
    if not arguments.skip_docker_build:
        push_image = not arguments.local_execution
        build_docker_container(torchserve_branch=torchserve_branch, push_image=push_image, use_local_serve_folder=use_local_serve_folder)
    else:
        LOGGER.warning("Skipping docker build.")

    # Run this script from the root directory 'serve'; it changes directory below as required
    os.chdir(os.path.join(os.getcwd(), "benchmarks", "automated"))

    execution_id = f"ts-benchmark-run-{str(uuid.uuid4())}"

    test_path = os.path.join(os.getcwd(), "tests")
    LOGGER.info(f"Running tests from directory: {test_path}")

    pytest_args = [
        "-s",
        run_only_string,
        "-rA",
        test_path,
        number_of_threads_string,
        "--disable-warnings",
        "-v",
        "--execution-id",
        execution_id,
        do_not_terminate_string,
        local_execution_string,
    ] + local_instance_type_list + use_instances_arg_list

    LOGGER.info("Running pytest")

    pytest.main(pytest_args)

    # Download the benchmark artifacts for this run and generate the report
    s3_results_uri = f"{S3_BUCKET_BENCHMARK_ARTIFACTS}/{execution_id}"

    report = Report()
    report.download_benchmark_results_from_s3(s3_results_uri)
    report.generate_comprehensive_report()


if __name__ == "__main__":
    main()
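As implied by the parsing loop in `build_docker_container()`, the `docker.yaml` it consumes is keyed by processor, with a `docker_tag`, an optional `dockerhub_image`, and, for gpu, a `cuda_version`. A hypothetical sketch (the tags and version are placeholders, not the shipped file):

```
cpu:
  docker_tag: torchserve-benchmark-cpu
  dockerhub_image: null
gpu:
  docker_tag: torchserve-benchmark-gpu
  cuda_version: cu102
  dockerhub_image: null
```

When `dockerhub_image` is set, the script pulls that image instead of building one; otherwise it builds the image from the requested torchserve branch (or from the local `serve` folder when `--use-local-serve-folder` is passed) and, unless running locally, pushes it to the ECR repo.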
