Commit 5a198e9

[libshortfin] Add simple invocation test. (#170)
This test is not particularly inspired (and the API needs to be simplified) but it represents the first full system test in the repo. In order to run the test, it is downloading a mobilenet onnx file from the zoo, upgrading it, and compiling. In the future, I'd like to switch this to a simpler model like MNIST for basic functionality, but I had some issues getting that to work via ONNX import and punted. While a bit inefficient (it will fetch on each pytest run), this will keep things held together until we can do something more comprehensive.

Note that my experience here prompted me to file iree-org/iree#18289, as this is way too much code and sharp edges to compile from ONNX (but it does work).

Verifies numerics against a silly test image.

Includes some fixes:

* Reworked the system detect marker so that we only run system specific tests (like amdgpu) on opt-in via a `--system amdgpu` pytest arg. This refinement was prompted by an ASAN violation in the HIP runtime code which was tripping me up when enabled by default. Filed here: iree-org/iree#18449
* Fixed a bug revealed when writing the test where an exception thrown from main could trigger a use-after-free because we were clearing workers when shutting down (vs at destruction) when all objects owned at the system level need to have a lifetime no less than the system.
1 parent a038133 commit 5a198e9

13 files changed: +193 -14 lines changed

.github/workflows/ci_linux_x64-libshortfin.yml

+2 -1
@@ -85,6 +85,7 @@ jobs:
         # TODO: Switch to `pip install -r requirements.txt -e libshortfin/`.
         run: |
           pip install -r ${{ env.LIBSHORTFIN_DIR }}/requirements-tests.txt
+          pip install -r ${{ env.LIBSHORTFIN_DIR }}/requirements-iree-compiler.txt
           pip freeze
 
       - name: Build libshortfin (full)
@@ -107,7 +108,7 @@ jobs:
           cd ${{ env.LIBSHORTFIN_DIR }}/build
           ctest --timeout 30 --output-on-failure
           cd ${{ env.LIBSHORTFIN_DIR }}
-          pytest -s -v -m "not requires_amd_gpu"
+          pytest -s
 
       - name: Build libshortfin (host-only)
         run: |

.github/workflows/ci_linux_x64_asan-libshortfin.yml

+8 -3
@@ -76,8 +76,10 @@ jobs:
     needs: [setup-python-asan]
     runs-on: ubuntu-24.04
     env:
-      # TODO(#151): Don't ignore ODR violations
-      ASAN_OPTIONS: detect_odr_violation=0
+      # We can't count on being leak free in general (i.e. pip, etc) so disable
+      # leak checker by default. Here we suppress any ASAN features needed to
+      # pass the build. Test configuration is done specially just for that step.
+      ASAN_OPTIONS: detect_leaks=0,detect_odr_violation=0
       LSAN_OPTIONS: suppressions=${{ github.workspace }}/libshortfin/build_tools/python_lsan_suppressions.txt
     steps:
       - name: Install dependencies
@@ -170,7 +172,10 @@ jobs:
 
       - name: Run pytest
         if: ${{ !cancelled() }}
+        env:
+          # TODO(#151): Don't ignore ODR violations
+          ASAN_OPTIONS: detect_odr_violation=0
         run: |
           eval "$(pyenv init -)"
           cd ${{ env.LIBSHORTFIN_DIR }}
-          pytest -m "not requires_amd_gpu" -s
+          pytest -s

libshortfin/README.md

+14
@@ -61,6 +61,20 @@ does mean that the C++ core of the library must always be built with the
 Python bindings to test the most behavior. Given the target of the project,
 this is not considered to be a significant issue.
 
+### Python tests
+
+Run platform independent tests only:
+
+```
+pytest tests/
+```
+
+Run tests including for a specific platform:
+
+```
+pytest tests/ --system amdgpu
+```
+
 # Production Library Building
 
 In order to build a production library, additional build steps are typically

libshortfin/build_tools/python_lsan_suppressions.txt

@@ -1,2 +1,8 @@
 leak:PyUnicode_New
+leak:_PyUnicodeWriter_PrepareInternal
 leak:_PyUnicodeWriter_Finish
+leak:numpy
+leak:_mlir_libs
+leak:google/_upb
+leak:import_find_and_load
+leak:ufunc

libshortfin/pyproject.toml

-3
@@ -12,9 +12,6 @@ addopts = [
     "-ra",
     "--import-mode=importlib",
 ]
-markers = [
-    "requires_amd_gpu: tests that require and AMD GPU (deselect with '-m \"not requires_amd_gpu\"')",
-]
 testpaths = [
     "tests",
 ]

libshortfin/requirements-tests.txt

+1
@@ -1,4 +1,5 @@
 pytest
 requests
 fastapi
+onnx
 uvicorn

libshortfin/src/shortfin/local/system.cc

+7 -6
@@ -61,31 +61,28 @@ System::~System() {
 
 void System::Shutdown() {
   // Stop workers.
-  std::vector<std::unique_ptr<Worker>> local_workers;
   {
     iree::slim_mutex_lock_guard guard(lock_);
     if (!initialized_ || shutdown_) return;
     shutdown_ = true;
-    workers_by_name_.clear();
-    local_workers.swap(workers_);
   }
 
   // Worker drain and shutdown.
-  for (auto &worker : local_workers) {
+  for (auto &worker : workers_) {
     worker->Kill();
   }
-  for (auto &worker : local_workers) {
+  for (auto &worker : workers_) {
     if (worker->options().owned_thread) {
       worker->WaitForShutdown();
     }
   }
   blocking_executor_.Kill();
-  local_workers.clear();
 }
 
 std::shared_ptr<Scope> System::CreateScope(Worker &worker,
                                            std::span<Device *const> devices) {
   iree::slim_mutex_lock_guard guard(lock_);
+  AssertRunning();
   return std::make_shared<Scope>(shared_ptr(), worker, devices);
 }
 
@@ -102,6 +99,7 @@ void System::InitializeNodes(int node_count) {
 
 Queue &System::CreateQueue(Queue::Options options) {
   iree::slim_mutex_lock_guard guard(lock_);
+  AssertRunning();
   if (queues_by_name_.count(options.name) != 0) {
     throw std::invalid_argument(fmt::format(
         "Cannot create queue with duplicate name '{}'", options.name));
@@ -140,6 +138,7 @@ Worker &System::CreateWorker(Worker::Options options) {
   Worker *unowned_worker;
   {
     iree::slim_mutex_lock_guard guard(lock_);
+    AssertRunning();
     if (options.name == std::string_view("__init__")) {
       throw std::invalid_argument(
           "Cannot create worker '__init__' (reserved name)");
@@ -161,6 +160,7 @@ Worker &System::CreateWorker(Worker::Options options) {
 
 Worker &System::init_worker() {
   iree::slim_mutex_lock_guard guard(lock_);
+  AssertRunning();
   auto found_it = workers_by_name_.find("__init__");
   if (found_it != workers_by_name_.end()) {
     return *found_it->second;
@@ -207,6 +207,7 @@ void System::FinishInitialization() {
 
 int64_t System::AllocateProcess(detail::BaseProcess *p) {
   iree::slim_mutex_lock_guard guard(lock_);
+  AssertRunning();
   int pid = next_pid_++;
   processes_by_pid_[pid] = p;
   return pid;

libshortfin/src/shortfin/local/system.h

+7
@@ -148,6 +148,13 @@ class SHORTFIN_API System : public std::enable_shared_from_this<System> {
           "initialization");
     }
   }
+  void AssertRunning() {
+    if (!initialized_ || shutdown_) {
+      throw std::logic_error(
+          "System manipulation methods can only be called when initialized and "
+          "not shutdown");
+    }
+  }
 
   // Allocates a process in the process table and returns its new pid.
   // This is done on process construction. Note that it acquires the
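
The two C++ changes above work together: `Shutdown()` now stops workers without releasing them, and `AssertRunning()` rejects mutation once the system is shut down. A rough Python analogy of that pattern is sketched below. `MiniSystem` and all of its members are hypothetical names for illustration, not part of shortfin, and Python's garbage collector means the use-after-free itself cannot occur here; the sketch only shows the ownership and guard logic.

```python
import threading


class MiniSystem:
    """Toy stand-in for the C++ System; every name here is hypothetical."""

    def __init__(self):
        self._lock = threading.Lock()
        self._initialized = True
        self._shutdown = False
        self._workers = []  # owned for the whole lifetime of the system

    def _assert_running(self):
        # Mirrors AssertRunning(): reject mutation before init or after shutdown.
        if not self._initialized or self._shutdown:
            raise RuntimeError(
                "System manipulation methods can only be called when "
                "initialized and not shutdown"
            )

    def create_worker(self, name):
        with self._lock:
            self._assert_running()
            self._workers.append(name)
            return name

    def shutdown(self):
        with self._lock:
            if not self._initialized or self._shutdown:
                return  # second call is a no-op
            self._shutdown = True
        # Workers would be stopped here, but the list is deliberately not
        # cleared: anything still referring to a worker stays valid until
        # the system object itself goes away.

    @property
    def workers(self):
        return list(self._workers)
```

After `shutdown()`, the worker list is still intact and a further `create_worker()` call raises instead of mutating a stopped system.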

libshortfin/tests/__init__.py

Whitespace-only changes.

libshortfin/tests/amdgpu_system_test.py

+1 -1
@@ -7,7 +7,7 @@
 import pytest
 
 
-@pytest.mark.requires_amd_gpu
+@pytest.mark.system("amdgpu")
 def test_create_amd_gpu_system():
     from _shortfin import lib as sfl
 
1313

libshortfin/tests/conftest.py

+34
@@ -0,0 +1,34 @@
+# Copyright 2024 Advanced Micro Devices, Inc.
+#
+# Licensed under the Apache License v2.0 with LLVM Exceptions.
+# See https://llvm.org/LICENSE.txt for license information.
+# SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+
+import pytest
+
+
+def pytest_addoption(parser):
+    parser.addoption(
+        "--system",
+        action="store",
+        metavar="NAME",
+        nargs="*",
+        help="Enable tests for system name ('amdgpu', ...)",
+    )
+
+
+def pytest_configure(config):
+    config.addinivalue_line(
+        "markers", "system(name): mark test to run only on a named system"
+    )
+
+
+def pytest_runtest_setup(item):
+    required_system_names = [mark.args[0] for mark in item.iter_markers("system")]
+    if required_system_names:
+        available_system_names = item.config.getoption("--system") or []
+        if not all(name in available_system_names for name in required_system_names):
+            pytest.skip(
+                f"test requires system in {required_system_names!r} but has "
+                f"{available_system_names!r} (set with --system arg)"
+            )
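
The skip decision in `pytest_runtest_setup` reduces to a small set check: a test runs only if every system named by its `system(...)` markers was passed via `--system`. The standalone sketch below isolates that check so it can be read without pytest. The helper name `missing_systems` is hypothetical and does not appear in the commit.

```python
def missing_systems(required, available):
    """Return the required system names that were not enabled.

    `required` plays the role of the names collected from system(...)
    markers; `available` plays the role of the --system option value,
    which is None when the flag is not given.
    """
    enabled = set(available or [])
    return [name for name in required if name not in enabled]


# An unmarked test has nothing missing, so it always runs.
unmarked = missing_systems([], None)

# A test marked system("amdgpu") is skipped unless --system amdgpu is given.
without_flag = missing_systems(["amdgpu"], None)
with_flag = missing_systems(["amdgpu"], ["amdgpu"])
```

A non-empty result corresponds to the `pytest.skip(...)` branch in the hook above.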
+55
@@ -0,0 +1,55 @@
+# Copyright 2024 Advanced Micro Devices, Inc.
+#
+# Licensed under the Apache License v2.0 with LLVM Exceptions.
+# See https://llvm.org/LICENSE.txt for license information.
+# SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+
+import pytest
+import urllib.request
+
+
+def upgrade_onnx(original_path, converted_path):
+    import onnx
+
+    original_model = onnx.load_model(original_path)
+    converted_model = onnx.version_converter.convert_version(original_model, 17)
+    onnx.save(converted_model, converted_path)
+
+
+@pytest.fixture(scope="session")
+def mobilenet_onnx_path(tmp_path_factory):
+    try:
+        import onnx
+    except ModuleNotFoundError:
+        raise pytest.skip("onnx python package not available")
+    print("Downloading mobilenet.onnx")
+    parent_dir = tmp_path_factory.mktemp("mobilenet_onnx")
+    orig_onnx_path = parent_dir / "mobilenet_orig.onnx"
+    urllib.request.urlretrieve(
+        "https://github.com/onnx/models/raw/main/validated/vision/classification/mobilenet/model/mobilenetv2-12.onnx",
+        orig_onnx_path,
+    )
+    upgraded_onnx_path = parent_dir / "mobilenet.onnx"
+    upgrade_onnx(orig_onnx_path, upgraded_onnx_path)
+    return upgraded_onnx_path
+
+
+@pytest.fixture(scope="session")
+def mobilenet_compiled_cpu_path(mobilenet_onnx_path):
+    try:
+        import iree.compiler.tools as tools
+        import iree.compiler.tools.import_onnx.__main__ as import_onnx
+    except ModuleNotFoundError:
+        raise pytest.skip("iree.compiler packages not available")
+    print("Compiling mobilenet")
+    mlir_path = mobilenet_onnx_path.parent / "mobilenet.mlir"
+    vmfb_path = mobilenet_onnx_path.parent / "mobilenet_cpu.vmfb"
+    args = import_onnx.parse_arguments(["-o", str(mlir_path), str(mobilenet_onnx_path)])
+    import_onnx.main(args)
+    tools.compile_file(
+        str(mlir_path),
+        output_file=str(vmfb_path),
+        target_backends=["llvm-cpu"],
+        input_type="onnx",
+    )
+    return vmfb_path
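
The commit message notes that the model is fetched on each pytest run because the fixture downloads into a fresh temporary directory. One possible way to avoid that, which this commit does not do, is to download into a persistent directory only when the file is missing. The sketch below assumes a hypothetical helper `fetch_once` and a caller-supplied `fetch` callable (for example `urllib.request.urlretrieve`); the demo uses a fake fetcher and a placeholder URL so it needs no network access.

```python
import tempfile
from pathlib import Path


def fetch_once(url, dest, fetch):
    """Download `url` to `dest` unless `dest` already exists; return `dest`.

    `fetch` is any callable taking (url, destination_path).
    """
    dest = Path(dest)
    if not dest.exists():
        dest.parent.mkdir(parents=True, exist_ok=True)
        fetch(url, dest)
    return dest


# Demo with a fake fetcher that records calls instead of hitting the network.
calls = []


def fake_fetch(url, dest):
    calls.append(url)
    Path(dest).write_bytes(b"model bytes")


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "cache" / "mobilenet_orig.onnx"
    fetch_once("https://example.invalid/model.onnx", target, fake_fetch)
    fetch_once("https://example.invalid/model.onnx", target, fake_fetch)
    fetched_count = len(calls)  # the second call finds the file and skips
```

A real cache would also need to decide where the directory lives and how stale files are invalidated; neither is addressed here.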
@@ -0,0 +1,58 @@
+# Copyright 2024 Advanced Micro Devices, Inc.
+#
+# Licensed under the Apache License v2.0 with LLVM Exceptions.
+# See https://llvm.org/LICENSE.txt for license information.
+# SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+
+import array
+import functools
+import pytest
+
+import shortfin as sf
+import shortfin.array as sfnp
+
+
+@pytest.fixture
+def lsys():
+    sc = sf.host.CPUSystemBuilder()
+    lsys = sc.create_system()
+    yield lsys
+    lsys.shutdown()
+
+
+@pytest.fixture
+def scope(lsys):
+    return lsys.create_scope()
+
+
+@pytest.fixture
+def device(scope):
+    return scope.device(0)
+
+
+def test_invoke_mobilenet(lsys, scope, mobilenet_compiled_cpu_path):
+    device = scope.device(0)
+    dummy_data = array.array(
+        "f", ([0.2] * (224 * 224)) + ([0.4] * (224 * 224)) + ([-0.2] * (224 * 224))
+    )
+    program_module = lsys.load_module(mobilenet_compiled_cpu_path)
+    program = sf.Program([program_module], scope=scope)
+    main_function = program["module.torch-jit-export"]
+
+    async def main():
+        device_input = sfnp.device_array(device, [1, 3, 224, 224], sfnp.float32)
+        staging_input = device_input.for_transfer()
+        staging_input.storage.data = dummy_data
+        device_input.copy_from(staging_input)
+        (device_output,) = await main_function(device_input)
+        host_output = device_output.for_transfer()
+        host_output.copy_from(device_output)
+        await device
+        flat_output = array.array("f")
+        flat_output.frombytes(host_output.storage.data)
+        absmean = functools.reduce(
+            lambda x, y: x + abs(y) / len(flat_output), flat_output, 0.0
+        )
+        assert absmean == pytest.approx(5.01964943873882)
+
+    lsys.run(main())
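
The numeric check in the test decodes the output bytes into a float array and reduces it to a single mean absolute value. The sketch below reproduces just that arithmetic on a tiny hand-made array, using only the standard library. The sample values are made up for illustration and have nothing to do with the mobilenet output.

```python
import array
import functools

values = array.array("f", [1.0, -2.0, 3.0, -4.0])

# Round trip through raw bytes, as the test does with host_output.storage.data.
raw = values.tobytes()
decoded = array.array("f")
decoded.frombytes(raw)

# Same reduction as the test: accumulate |y| / n one element at a time.
absmean_reduce = functools.reduce(
    lambda acc, y: acc + abs(y) / len(decoded), decoded, 0.0
)

# Equivalent closed form: sum of absolute values divided by the count.
absmean_direct = sum(abs(y) for y in decoded) / len(decoded)
```

Both forms give 2.5 for these inputs. On real model output the two can differ in the last few bits because of floating-point summation order, which is one reason the test compares with `pytest.approx` rather than exact equality.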
