Instructions to Add Eager Operations #1173
jmalone-tt started this conversation in General
Adding Eager Mode Operations
The following document gives an overview of the current state of the eager-mode architecture in the PyTorch TT-NN compiler. It also provides a primer on the steps needed to add support for more operations.
Prerequisites
This guide assumes you have already followed the setup steps for the PT2.0 project. If not, please follow along with the steps at the bottom of the project's README.
Current Architecture
The code for eager mode is mostly located under the `torch_ttnn/cpp_extension` directory. The cpp_extension subsystem consists of several key components that work together to provide native TT-NN integration:

Device Integration
The `ttnn_module` provides the primary interface for converting between PyTorch and TT-NN devices in test_cpp_extension_functionality.py:24. This module exposes functions like `as_torch_device()` to wrap TT-NN devices as PyTorch devices and `get_ttnn_tensor()` to extract underlying TT-NN tensors from PyTorch tensors (see test_cpp_extension_functionality.py:39).
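As a rough illustration of this interface (a sketch only; the exact import path for `ttnn_module` is an assumption):

```python
# Sketch of the device interface; the ttnn_module import path is assumed.
import torch
import ttnn
from torch_ttnn.cpp_extension import ttnn_module  # assumed location

ttnn_dev = ttnn.open_device(device_id=0)
torch_dev = ttnn_module.as_torch_device(ttnn_dev)  # wrap the TT-NN device as a torch device

x = torch.ones(32, 32, dtype=torch.bfloat16).to(torch_dev)
raw = ttnn_module.get_ttnn_tensor(x)               # extract the underlying ttnn.Tensor

ttnn.close_device(ttnn_dev)
```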
Tensor Copy Operations

The core tensor data movement is handled by `copy.cpp`, which implements bidirectional copying between CPU and TT-NN device tensors (copy.cpp:13). The system supports multiple data types, including BFLOAT16, UINT32, and INT32, with proper type conversions (copy.cpp:32-49).

For CPU -> TT-NN transfers, the system creates TT-NN tensors with host storage and then transfers them to the device (copy.cpp:38-48). For TT-NN -> CPU transfers, it extracts data vectors from TT-NN tensors and copies them to CPU memory (copy.cpp:84-86).
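A minimal sketch of exercising these copy paths from Python, reusing the wrapped device from the previous sketch:

```python
# Round-trip a tensor through the TT-NN device; the copy should be lossless.
import torch

cpu = torch.rand(8, 8, dtype=torch.bfloat16)
dev = cpu.to(torch_dev)   # CPU -> TT-NN: host-storage tensor, then device transfer
back = dev.cpu()          # TT-NN -> CPU: data vector extracted and copied back
torch.testing.assert_close(back, cpu)

idx = torch.arange(16, dtype=torch.int32).to(torch_dev)  # integer dtypes are handled as well
```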
TtnnTensorImpl Integration

The subsystem uses a custom tensor implementation (`TtnnTensorImpl`) that wraps TT-NN tensors within PyTorch's tensor interface (copy.cpp:21-22). This allows PyTorch operations to work directly with TT-NN tensors while maintaining proper device guards and memory management (copy.cpp:20).

Build System and Dependencies
The cpp_extension includes a complete build system with `tt-metal` as a third-party dependency (run-cpp-native-tests.yaml:58). The build process involves:

- installing the required system dependencies (run-cpp-native-tests.yaml:54-59)
- caching dependencies, `tt-metal` builds, and ccache data (run-cpp-native-tests.yaml:98-133)
- running the `build_cpp_extension.sh` script, which compiles the native extensions with proper Python library suffixes (run-cpp-native-tests.yaml:92-116)

Testing Framework
The subsystem includes comprehensive tests that validate the native integration functionality (test_cpp_extension_functionality.py:21-58). Tests cover operations such as `abs()` (test_cpp_extension_functionality.py:44-46), along with additional cases at test_cpp_extension_functionality.py:87-89 and test_cpp_extension_functionality.py:107.

The testing infrastructure runs as part of the CI/CD workflows and is specifically triggered by changes to the cpp_extension directory (run-cpp-native-tests.yaml:8-13).

Integration with Main Backend
The cpp_extension works in conjunction with the main torch_ttnn backend through the `native_integration` option (conftest.py:276-283). When native integration is enabled, models and inputs are moved directly to TT-NN devices using the cpp_extension's device interface, bypassing some of the higher-level tensor conversion layers (conftest.py:278-281).

Usage
To use eager mode, the following steps can be taken:
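For example (a minimal end-to-end sketch, assuming the `ttnn_module` interface described above; `abs` is one of the operations exercised by the existing tests):

```python
# End-to-end eager-mode example; the ttnn_module import path is assumed.
import torch
import ttnn
from torch_ttnn.cpp_extension import ttnn_module  # assumed location

ttnn_dev = ttnn.open_device(device_id=0)
device = ttnn_module.as_torch_device(ttnn_dev)

x = torch.rand(32, 32, dtype=torch.bfloat16).to(device)
y = torch.abs(x)   # dispatched eagerly to the native TT-NN kernel
print(y.cpu())

ttnn.close_device(ttnn_dev)
```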
Adding New Operations
Based on the above description of the project's current state, you may notice that several changes are required to add a new operation. Broadly, the following steps are needed (update as appropriate):
1. Implement the operation in `torch_ttnn/cpp_extension/ttnn_cpp_extension/src/ops/<additional_folder_if_needed>/<filename.cpp>`, with the corresponding function declarations in the matching `.hpp` file. You can reference the existing operations for examples.
2. Register the new operation in `open_registration_extension.cpp`.
3. Add tests under `tests/cpp_extension`! (A sketch of such a test follows this list.)
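For step 3, a new test might look like the following (a sketch in the style of the existing tests; the `device` fixture name and the op are illustrative):

```python
# Hypothetical test for a newly added op; the fixture name is an assumption.
import torch

def test_neg(device):  # 'device' assumed to yield the wrapped TT-NN torch device
    x = torch.rand(32, 32, dtype=torch.bfloat16)
    result = torch.neg(x.to(device)).cpu()
    torch.testing.assert_close(result, torch.neg(x))
```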
Additionally, there are some miscellaneous improvements for the future that do not map cleanly to adding a new operation, including:

- Revisiting the layout conversion performed by the data movement pass (`AddDataMovePass`). It may be worth choosing the layout based on dtype (e.g. assume `uint32` tensors will be used for indexing, so convert them as `RowMajor`; assume `bfloat16` tensors will be used for calculations, so convert them as `TileLayout`); a possible heuristic is sketched below.
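A sketch of that heuristic (the TT-NN layout constants exist; where exactly this hook would live in `AddDataMovePass` is an assumption):

```python
# Dtype-based layout selection for the data movement pass (sketch).
import ttnn

def layout_for(dtype):
    # uint32 tensors are assumed to hold indices -> keep them row-major;
    # bfloat16 (and other compute) tensors -> convert to tile layout.
    if dtype == ttnn.uint32:
        return ttnn.ROW_MAJOR_LAYOUT
    return ttnn.TILE_LAYOUT
```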
Testing Loop

Here are some useful commands to use during op creation to speed up development:
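For instance (the script name comes from the CI workflow referenced above; the exact path and pytest flags are assumptions):

```bash
# Rebuild only the native extension after editing the C++ sources
./torch_ttnn/cpp_extension/build_cpp_extension.sh   # path assumed

# Re-run just the cpp_extension tests, filtered to the op under development
python -m pytest tests/cpp_extension -k abs -x
```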