pytorch
diff --git a/assets/images/openreg.png b/assets/images/openreg.png
Binary file, 92.8 KB
diff --git a/_posts/2025-03-21-openreg.md b/_posts/2025-03-21-openreg.md
Lines changed: 129 additions & 0 deletions
@@ -0,0 +1,129 @@
+---
+layout: blog_detail
+title: "OpenReg: A Self-Contained PyTorch Out-of-Tree Backend Implementation Using \"PrivateUse1\" Mechanism"
+author: Zhenbin Lin (Huawei)
+---
+
+OpenReg is a self-contained demonstration of a PyTorch out-of-tree backend implementation utilizing the core framework's "PrivateUse1" mechanism. This implementation serves two primary purposes:
+
+1. Reference Implementation: Provides a practical template for third-party device vendors integrating with PyTorch through PrivateUse1.
+2. CI Testing Infrastructure: Enables device-agnostic testing capabilities for continuous integration pipelines.
+
+
+## Usage
+
+
+### Module Installation
+
+
+```
+cd {project}/test/cpp_extensions/open_registration_extension
+python setup.py install
+```
+
+
+
+### Use Case
+
+
+```
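+import torch
+import pytorch_openreg  # noqa: F401  # imported for its side effect: registers the "openreg" device
+
+if __name__ == "__main__":
+    # Create a 1x2 tensor of ones directly on the simulated device.
+    print(torch.ones(1, 2, device="openreg"))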
+```
+
+
+
+## Architectural Overview
+
+
+### Process Management
+
+OpenReg isolates its virtual devices by spawning N independent subprocesses, one per device, each with its own request and response queues for inter-process communication. The driver in the parent process wraps device operations in command packets that are:
+
+
+
+1. Dispatched to target devices via request queues
+2. Processed asynchronously with results returned through response queues
+
+
+![Parent-Subprocess Communication Flow](/assets/images/openreg.png){:style="width:100%;"}
+
+Figure: Parent-Subprocess Communication Flow
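The request/response flow described above can be sketched with the standard library alone. This is a minimal illustration, not OpenReg's actual code: `device_worker`, `Driver`, `run_demo`, the `(command, args)` packet layout, and the integer handles are all hypothetical names and choices made for the example. For simplicity the sketch waits for each reply before sending the next command.

```python
import multiprocessing as mp


def device_worker(requests, responses):
    """Serve (command, args) packets from `requests` until "stop" arrives."""
    memory = {}      # this worker's private "device memory": handle -> bytearray
    next_handle = 1  # hypothetical handle numbering, local to this worker
    while True:
        cmd, args = requests.get()
        if cmd == "stop":
            responses.put(("ok", None))
            return
        if cmd == "malloc":
            (size,) = args
            memory[next_handle] = bytearray(size)
            responses.put(("ok", next_handle))
            next_handle += 1
        elif cmd == "free":
            (handle,) = args
            memory.pop(handle, None)
            responses.put(("ok", None))
        else:
            responses.put(("error", f"unknown command: {cmd}"))


class Driver:
    """Parent-side helper: sends one packet and waits for the matching reply."""

    def __init__(self, channels):
        self.channels = channels  # list of (request_queue, response_queue)

    def exec(self, device, cmd, *args):
        requests, responses = self.channels[device]
        requests.put((cmd, args))
        status, value = responses.get()
        if status != "ok":
            raise RuntimeError(value)
        return value


def run_demo(num_devices=2):
    """Spawn one worker process per virtual device and allocate on each."""
    channels, workers = [], []
    for _ in range(num_devices):
        req, resp = mp.Queue(), mp.Queue()
        proc = mp.Process(target=device_worker, args=(req, resp), daemon=True)
        proc.start()
        channels.append((req, resp))
        workers.append(proc)
    driver = Driver(channels)
    handles = [driver.exec(d, "malloc", 16) for d in range(num_devices)]
    for d in range(num_devices):
        driver.exec(d, "stop")
    for proc in workers:
        proc.join()
    return handles
```

Calling `run_demo()` from under an `if __name__ == "__main__":` guard starts the workers and returns one handle per device. Because every worker keeps its own `memory` dictionary, an allocation made in one worker is never visible to another.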
+
+
+### Memory Management
+
+Device memory allocations occur within individual subprocesses to ensure:
+
+
+
+1. Strict memory isolation between devices
+2. Realistic simulation of physical device constraints
+
+
+## Component Breakdown
+
+
+### _aten_impl.py
+
+This module has two responsibilities:
+
+
+
+1. Hook Registration:
+    * Uses `_IMPL_REGISTRY` to bind C++ backend hooks (e.g., `getDevice`, `getStream`) to device driver implementations
+2. Fallback Mechanism:
+    * Defines a new `torch.library.Library` that registers a fallback, which is invoked whenever a backend kernel for PrivateUse1 is called. The fallback contains the logic to handle all kinds of native functions: it computes the output metadata, allocates the output, and calls into the device daemon only to perform the computation
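The hook-registration idea can be illustrated without PyTorch. The sketch below assumes a registry that maps hook names to Python callables, each of which forwards to a driver. `RecordingDriver` and `impl_factory` are hypothetical names for this example, and the real module's structure may differ.

```python
_IMPL_REGISTRY = {}  # hook name -> Python callable


class RecordingDriver:
    """Hypothetical stand-in for the device driver: records and answers calls."""

    def __init__(self):
        self.calls = []
        self.current_device = 0

    def exec(self, name, *args):
        self.calls.append((name, args))
        if name == "getDevice":
            return self.current_device
        if name == "setDevice":
            (self.current_device,) = args
            return None
        raise NotImplementedError(name)


driver = RecordingDriver()


def impl_factory(name):
    """Return the callable bound to `name`, creating a forwarder on first use."""
    if name not in _IMPL_REGISTRY:
        def forward(*args):
            # Every hook simply forwards its arguments to the driver.
            return driver.exec(name, *args)
        _IMPL_REGISTRY[name] = forward
    return _IMPL_REGISTRY[name]
```

With this in place, `impl_factory("setDevice")(1)` followed by `impl_factory("getDevice")()` goes through the driver both times, and repeated lookups of the same hook name return the same callable.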
+
+
+### _device_daemon.py
+
+Core Subsystems:
+
+
+
+1. **Allocators**:
+    * `HostAllocator`: Manages pinned memory in the parent process
+    * `DeviceAllocator`: Handles device memory with tensor reconstruction capabilities
+2. **Driver (Parent Process)**:
+    * Maintains device context (active device/streams)
+    * Implements device control operations:
+        * setDevice/getDevice
+        * deviceCount
+        * exchangeStream
+    * Orchestrates command execution through queue-based IPC
+3. **Executor (Subprocess)**:
+    * Processes command types:
+        * Memory operations (`malloc`/`free`)
+        * Tensor computations (`run_op`)
+        * Data transfers (`send_data`/`recv_data`)
+        * Stream/event management (mostly no-ops, because CPU execution is synchronous)
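As a rough illustration of the device context the driver maintains, the following sketch tracks the active device and one current stream per device. `DriverContext` and its fields are hypothetical, and stream ids are plain integers here. The one behavior borrowed from PyTorch's device guard contract is that `exchangeStream` installs a new current stream and returns the previous one.

```python
class DriverContext:
    """Hypothetical parent-side state: active device and per-device stream."""

    def __init__(self, num_devices):
        if num_devices < 1:
            raise ValueError("at least one device is required")
        self.num_devices = num_devices
        self.current_device = 0
        # Stream id 0 plays the role of the default stream on every device.
        self.current_streams = [0] * num_devices

    def deviceCount(self):
        return self.num_devices

    def getDevice(self):
        return self.current_device

    def setDevice(self, index):
        if not 0 <= index < self.num_devices:
            raise ValueError(f"invalid device index {index}")
        self.current_device = index

    def exchangeStream(self, device, stream_id):
        # Install the new current stream and hand back the one it replaced.
        previous = self.current_streams[device]
        self.current_streams[device] = stream_id
        return previous
```

Returning the previous stream lets a caller restore it later, which is what a scoped stream guard needs.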
+
+
+### _meta_parser.py
+
+Key Features:
+
+
+
+* Implements serialization utilities for cross-process object transfer
+* The `OpenRegTensorMeta` class encapsulates complete tensor metadata for:
+    * Output tensor reconstruction
+    * Device-side computation preparation
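To make the metadata idea concrete, here is a small PyTorch-free sketch of sending a tensor description, rather than tensor data, across a process boundary. Everything in it is an assumption for illustration: `TensorMeta` is a simplified analogue of `OpenRegTensorMeta`, `DeviceTensor` is a stand-in for a real tensor, and the field set and the use of `pickle` are choices made for the example.

```python
import pickle
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class TensorMeta:
    """Plain-data description of a tensor; it carries no tensor data."""
    data_ptr: int
    shape: tuple
    stride: tuple
    storage_offset: int
    dtype: str


class DeviceTensor:
    """Hypothetical stand-in for a tensor whose storage lives in a worker."""

    def __init__(self, data_ptr, shape, dtype="float32"):
        self.data_ptr = data_ptr
        self.shape = tuple(shape)
        self.dtype = dtype


def describe(tensor):
    """Build metadata for a contiguous (row-major) tensor."""
    stride, step = [], 1
    for dim in reversed(tensor.shape):
        stride.append(step)  # the last dimension gets stride 1
        step *= dim
    return TensorMeta(tensor.data_ptr, tensor.shape,
                      tuple(reversed(stride)), 0, tensor.dtype)


def pack(args):
    """Serialize arguments, replacing each tensor by its tagged metadata."""
    tagged = []
    for arg in args:
        if isinstance(arg, DeviceTensor):
            tagged.append(("tensor", asdict(describe(arg))))
        else:
            tagged.append(("value", arg))
    return pickle.dumps(tagged)


def unpack(payload):
    """Rebuild the argument list; tensors come back as TensorMeta objects."""
    return [TensorMeta(**body) if tag == "tensor" else body
            for tag, body in pickle.loads(payload)]
```

The receiving side gets enough information (pointer, shape, strides, offset, dtype) to rebuild a view over memory it already owns, so only a few bytes of metadata cross the queue.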
+
+
+## Design Considerations
+
+
+### Execution Characteristics
+
+
+
+* **Synchronous Computation**: Operators execute on the CPU, so each computation is processed synchronously
+* **Stream/Event Semantics**: Implemented as no-ops because of the synchronous execution model
+* **Memory Isolation**: Strict per-device memory boundaries enforced through subprocess allocation
+
+This architecture enables realistic simulation of device integration while maintaining PyTorch compatibility through standard backend interfaces.