
Commit e779b47

Blog post "OpenReg..." (#1948)
* Blog post "OpenReg..."

Signed-off-by: Chris Abraham <[email protected]>

* update publish date

Signed-off-by: Chris Abraham <[email protected]>

---------

Signed-off-by: Chris Abraham <[email protected]>
1 parent c1b82ce commit e779b47


2 files changed: +129 -0 lines changed


_posts/2025-03-21-openreg.md

+129
@@ -0,0 +1,129 @@
---
layout: blog_detail
title: "OpenReg: A Self-Contained PyTorch Out-of-Tree Backend Implementation Using \"PrivateUse1\" Mechanism"
author: Zhenbin Lin (Huawei)
---

OpenReg is a self-contained demonstration of a PyTorch out-of-tree backend implementation utilizing the core framework's "PrivateUse1" mechanism. This implementation serves two primary purposes:

1. Reference Implementation: Provides a practical template for third-party device vendors integrating with PyTorch through PrivateUse1.
2. CI Testing Infrastructure: Enables device-agnostic testing in continuous integration pipelines.

## Usage

### Module Installation

```bash
cd {project}/test/cpp_extensions/open_registration_extension
python setup.py install
```

### Use Case

```python
import torch
import pytorch_openreg  # importing the extension makes the "openreg" device available

if __name__ == "__main__":
    # Allocate a 1x2 tensor of ones on the current OpenReg device and print it
    print(torch.ones(1, 2, device='openreg'))
```

## Architectural Overview

### Process Management

OpenReg implements virtual device isolation by spawning N independent subprocesses, each maintaining dedicated request/response queues for inter-process communication. The parent process driver encapsulates device operations into command packets that are:

1. Dispatched to target devices via request queues
2. Processed asynchronously, with results returned through response queues

![Parent-Subprocess Communication Flow](/assets/images/openreg.png){:style="width:100%;"}

Figure: Parent-Subprocess Communication Flow

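The request/response flow can be sketched with Python's standard `multiprocessing` module. This is a minimal sketch, not OpenReg's actual code: `Driver`, `executor_loop`, and the command names `malloc`, `free`, and `allocated_bytes` are hypothetical, and it assumes a POSIX platform where the `fork` start method is available. For simplicity the driver here blocks on each reply.

```python
import multiprocessing as mp


def executor_loop(req_q, resp_q):
    """Runs inside one subprocess; each subprocess plays one virtual device."""
    memory = {}  # handle -> bytearray; lives only in this process's address space
    next_handle = 1
    while True:
        cmd, args = req_q.get()  # block until the driver sends a command packet
        if cmd == "exit":
            break
        try:
            if cmd == "malloc":
                memory[next_handle] = bytearray(args[0])
                result, next_handle = next_handle, next_handle + 1
            elif cmd == "free":
                del memory[args[0]]
                result = None
            elif cmd == "allocated_bytes":
                result = sum(len(buf) for buf in memory.values())
            else:
                raise RuntimeError(f"unknown command: {cmd!r}")
        except Exception as exc:  # ship errors back instead of killing the device
            result = exc
        resp_q.put(result)  # reply on this device's response queue


class Driver:
    """Hypothetical parent-side driver: one request/response queue pair per device."""

    def __init__(self, num_devices):
        ctx = mp.get_context("fork")  # assumption: POSIX platform with fork()
        self.devices = []
        for _ in range(num_devices):
            req_q, resp_q = ctx.Queue(), ctx.Queue()
            proc = ctx.Process(target=executor_loop, args=(req_q, resp_q), daemon=True)
            proc.start()
            self.devices.append((proc, req_q, resp_q))

    def execute(self, device, cmd, *args):
        _, req_q, resp_q = self.devices[device]
        req_q.put((cmd, args))  # 1. dispatch the command packet
        result = resp_q.get()   # 2. wait for the device's reply
        if isinstance(result, Exception):
            raise result
        return result

    def close(self):
        for proc, req_q, _ in self.devices:
            req_q.put(("exit", ()))
            proc.join()
```

Calling `execute(0, "malloc", 16)` and `execute(1, "malloc", 32)` on a two-device `Driver` both return handle `1`, because each subprocess keeps its own allocation table: the isolation comes for free from separate address spaces.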
### Memory Management

Device memory allocations occur within individual subprocesses to ensure:

1. Strict memory isolation between devices
2. Realistic simulation of physical device constraints

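Because each allocator lives inside its own executor process, a virtual device could also enforce a fixed capacity, which is one way to simulate physical memory limits. The sketch below is illustrative only: `DeviceMemorySketch` and its `capacity_bytes` limit are hypothetical and are not taken from OpenReg's `DeviceAllocator`.

```python
class DeviceMemorySketch:
    """Hypothetical allocator owned by a single virtual device (one subprocess)."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.blocks = {}          # fake device pointer -> bytearray
        self._next_ptr = 0x1000   # arbitrary non-zero base "address"

    def malloc(self, nbytes):
        available = self.capacity - self.used
        if nbytes > available:
            # Behave like real hardware: a full device refuses the allocation.
            raise MemoryError(f"device out of memory: requested {nbytes} B, {available} B free")
        ptr = self._next_ptr
        self._next_ptr += max(nbytes, 1)  # keep pointers unique, even for 0-byte blocks
        self.blocks[ptr] = bytearray(nbytes)
        self.used += nbytes
        return ptr

    def free(self, ptr):
        self.used -= len(self.blocks.pop(ptr))

    def write(self, ptr, data):
        block = self.blocks[ptr]
        if len(data) > len(block):
            raise ValueError("write past the end of the allocation")
        block[: len(data)] = data

    def read(self, ptr, nbytes):
        return bytes(self.blocks[ptr][:nbytes])
```

The parent process only ever sees the integer "pointers"; the bytes themselves never leave the device's subprocess.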
## Component Breakdown

### _aten_impl.py

This module has two responsibilities:

1. Hook Registration:
   * Uses `_IMPL_REGISTRY` to bind C++ backend hooks (e.g., `getDevice`, `getStream`) to device driver implementations.
2. Fallback Mechanism:
   * Defines a new `torch.library.Library` that registers a fallback, which is called whenever a PrivateUse1 backend kernel is invoked. The fallback contains the logic to handle all kinds of native functions: it computes the output metadata, allocates the outputs, and calls into the device daemon only to perform the computation.

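Both responsibilities can be illustrated in plain Python. This is a minimal sketch under stated assumptions: only the name `_IMPL_REGISTRY` comes from the text; `register`, `impl_factory`, the toy device state, and the `fallback` signature are hypothetical, and the metadata, allocation, and execution steps are injected as callables so the sketch runs without PyTorch.

```python
from typing import Callable, Dict

# Registry mapping hook names (as the C++ side asks for them) to Python
# implementations provided by the device driver.
_IMPL_REGISTRY: Dict[str, Callable] = {}


def register(name: str):
    """Decorator that records a Python function as the implementation of a hook."""
    def decorator(fn: Callable) -> Callable:
        _IMPL_REGISTRY[name] = fn
        return fn
    return decorator


def impl_factory(name: str) -> Callable:
    """Hypothetical lookup the C++ hooks would go through; fails loudly if missing."""
    try:
        return _IMPL_REGISTRY[name]
    except KeyError:
        raise NotImplementedError(f"no implementation registered for {name!r}") from None


_device_state = {"current": 0, "count": 2}  # toy driver state: 2 virtual devices


@register("deviceCount")
def device_count() -> int:
    return _device_state["count"]


@register("getDevice")
def get_device() -> int:
    return _device_state["current"]


@register("setDevice")
def set_device(index: int) -> None:
    if not 0 <= index < _device_state["count"]:
        raise ValueError(f"invalid device index {index}")
    _device_state["current"] = index


def fallback(op_name, args, compute_meta, allocate, run_op):
    """Hypothetical shape of the PrivateUse1 fallback described above."""
    out_meta = compute_meta(op_name, args)  # 1. output metadata only, no data touched
    out = allocate(out_meta)                # 2. allocate the output on the device
    run_op(op_name, args, out)              # 3. only the computation goes to the daemon
    return out
```

Keeping metadata computation and allocation in the parent means the daemon only needs a single "run this op into this buffer" command, whatever the native function.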
### _device_daemon.py

Core Subsystems:

1. **Allocators**:
   * `HostAllocator`: Manages pinned memory in the parent process
   * `DeviceAllocator`: Handles device memory with tensor reconstruction capabilities
2. **Driver (Parent Process)**:
   * Maintains device context (active device/streams)
   * Implements device control operations:
     * `setDevice`/`getDevice`
     * `deviceCount`
     * `exchangeStream`
   * Orchestrates command execution through queue-based IPC
3. **Executor (Subprocess)**:
   * Processes command types:
     * Memory operations (`malloc`/`free`)
     * Tensor computations (`run_op`)
     * Data transfers (`send_data`/`recv_data`)
     * Stream/event management (mostly no-ops, because CPU execution is synchronous)

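The executor's job reduces to a dispatch table from command names to handlers. In this minimal single-process sketch, `ExecutorSketch`, the Python-list "device memory", and the command names `record_event` and `synchronize_stream` are hypothetical; `malloc`, `free`, `run_op`, `send_data`, and `recv_data` mirror the command types listed above, but their signatures here are assumptions.

```python
class ExecutorSketch:
    """Hypothetical subprocess-side executor: one handler per command type."""

    def __init__(self):
        self.memory = {}      # handle -> list of floats, standing in for device memory
        self.next_handle = 1
        self.handlers = {
            "malloc": self.malloc,
            "free": self.free,
            "send_data": self.send_data,
            "recv_data": self.recv_data,
            "run_op": self.run_op,
            # Stream/event commands: nothing is ever pending, so they do nothing.
            "record_event": self.noop,
            "synchronize_stream": self.noop,
        }

    def execute(self, cmd, *args):
        return self.handlers[cmd](*args)

    def malloc(self, numel):
        handle, self.next_handle = self.next_handle, self.next_handle + 1
        self.memory[handle] = [0.0] * numel
        return handle

    def free(self, handle):
        del self.memory[handle]

    def send_data(self, handle, values):  # host -> device copy
        buf = self.memory[handle]
        if len(values) != len(buf):
            raise ValueError("size mismatch")
        buf[:] = values

    def recv_data(self, handle):  # device -> host copy
        return list(self.memory[handle])

    def run_op(self, op, in_handles, out_handle):
        # The "kernel" runs synchronously on the CPU before this call returns.
        fn = {"add": lambda x, y: x + y, "mul": lambda x, y: x * y}[op]
        a, b = (self.memory[h] for h in in_handles)
        self.memory[out_handle][:] = [fn(x, y) for x, y in zip(a, b)]

    def noop(self, *args):
        return None
```

In a real deployment `execute` would be driven by packets read from the device's request queue, with each return value written back to its response queue.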
### _meta_parser.py

Key Features:

* Implements serialization utilities for cross-process object transfer
* `OpenRegTensorMeta` class encapsulates complete tensor metadata for:
  * Output tensor reconstruction
  * Device-side computation preparation

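To see why shape, stride, and storage offset are enough to rebuild a tensor on the device side, consider this sketch. `TensorMetaSketch` is a hypothetical stand-in for `OpenRegTensorMeta` (its fields are assumptions, not the real class), and it serializes to JSON purely to stay self-contained; a `multiprocessing` queue would pickle objects instead.

```python
import itertools
import json
from dataclasses import asdict, dataclass
from typing import List, Tuple


def contiguous_strides(shape: Tuple[int, ...]) -> Tuple[int, ...]:
    """Row-major strides (in elements) for a freshly allocated output."""
    strides = []
    acc = 1
    for dim in reversed(shape):
        strides.append(acc)
        acc *= dim
    return tuple(reversed(strides))


@dataclass(frozen=True)
class TensorMetaSketch:
    """Hypothetical metadata record; enough to rebuild a view on the device side."""
    data_ptr: int            # handle of the backing device allocation
    storage_offset: int      # in elements
    shape: Tuple[int, ...]
    stride: Tuple[int, ...]
    dtype: str               # e.g. "float32"; a string keeps this sketch torch-free


def serialize(meta: TensorMetaSketch) -> str:
    return json.dumps(asdict(meta))


def deserialize(text: str) -> TensorMetaSketch:
    fields = json.loads(text)
    fields["shape"] = tuple(fields["shape"])    # JSON turns tuples into lists
    fields["stride"] = tuple(fields["stride"])
    return TensorMetaSketch(**fields)


def gather(meta: TensorMetaSketch, storage: List[float]) -> List[float]:
    """Reconstruct a tensor's elements, in row-major order, from flat storage."""
    values = []
    for index in itertools.product(*(range(n) for n in meta.shape)):
        offset = meta.storage_offset + sum(i * s for i, s in zip(index, meta.stride))
        values.append(storage[offset])
    return values
```

For example, a `(4, 3)` view with strides `(1, 4)` over the 12-element storage of a contiguous `(3, 4)` tensor is its transpose, and `gather` reads it without copying the underlying storage.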
## Design Considerations

### Execution Characteristics

* **Synchronous Computation**: CPU operator execution necessitates synchronous processing
* **Stream/Event Semantics**: Implemented as no-ops due to the synchronous execution model
* **Memory Isolation**: Strict per-device memory boundaries enforced through subprocess allocation

This architecture enables realistic simulation of device integration while maintaining PyTorch compatibility through standard backend interfaces.
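Why streams and events collapse to no-ops is easiest to see in an event type for a synchronous backend. This is a hypothetical sketch (`SyncEventSketch` is not part of OpenReg) that loosely follows the `record`/`query`/`synchronize`/`elapsed_time` methods of `torch.cuda.Event`; keeping a host timestamp in `record` is an extra assumption so that timing still works.

```python
import time


class SyncEventSketch:
    """Hypothetical event type for a backend whose ops finish before returning."""

    def __init__(self):
        self.timestamp = None

    def record(self):
        # Every previously issued op has already completed, so recording only
        # needs a host timestamp (kept so elapsed_time can still work).
        self.timestamp = time.perf_counter()

    def query(self):
        # No work can be pending, so the event is always complete.
        return True

    def synchronize(self):
        # No-op: there is nothing queued to wait for.
        return None

    def elapsed_time(self, end):
        # Milliseconds between two recorded events.
        if self.timestamp is None or end.timestamp is None:
            raise RuntimeError("both events must be recorded first")
        return (end.timestamp - self.timestamp) * 1000.0
```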

assets/images/openreg.png

92.8 KB
