Instructions to Add Eager Operations #1173
jmalone-tt started this conversation in General
Adding Eager Mode Operations
The following document gives an overview of the current state of the eager-mode architecture in the PyTorch TT-NN compiler. It also provides a primer on the steps needed to add support for more operations.
Prerequisites
This guide assumes you have already followed the setup steps for the PT2.0 project. If not, please follow along with the steps at the bottom of the project's README.
Current Architecture
The code for eager mode is mostly located under the `torch_ttnn/cpp_extension` directory. The cpp_extension subsystem consists of several key components that work together to provide native TT-NN integration:

Device Integration
The `ttnn_module` provides the primary interface for converting between PyTorch and TT-NN devices in test_cpp_extension_functionality.py:24. This module exposes functions like `as_torch_device()` to wrap TT-NN devices as PyTorch devices and `get_ttnn_tensor()` to extract underlying TT-NN tensors from PyTorch tensors (see test_cpp_extension_functionality.py:39).
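As a rough illustration of this interface (a sketch only; the exact import path for `ttnn_module` is an assumption):

```python
# Sketch of the device interface; the ttnn_module import path is assumed.
import torch
import ttnn
from torch_ttnn.cpp_extension import ttnn_module  # assumed location

ttnn_dev = ttnn.open_device(device_id=0)
torch_dev = ttnn_module.as_torch_device(ttnn_dev)  # wrap the TT-NN device as a torch device

x = torch.ones(32, 32, dtype=torch.bfloat16).to(torch_dev)
raw = ttnn_module.get_ttnn_tensor(x)               # extract the underlying ttnn.Tensor

ttnn.close_device(ttnn_dev)
```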
Tensor Copy Operations

The core tensor data movement is handled by `copy.cpp`, which implements bidirectional copying between CPU and TT-NN device tensors (copy.cpp:13). The system supports multiple data types, including BFLOAT16, UINT32, and INT32, with proper type conversions (copy.cpp:32-49).

For CPU -> TT-NN transfers, the system creates TT-NN tensors with host storage and then transfers them to the device (copy.cpp:38-48). For TT-NN -> CPU transfers, it extracts data vectors from TT-NN tensors and copies them to CPU memory (copy.cpp:84-86).
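A minimal sketch of exercising these copy paths from Python, reusing the wrapped device from the previous sketch:

```python
# Round-trip a tensor through the TT-NN device; the copy should be lossless.
import torch

cpu = torch.rand(8, 8, dtype=torch.bfloat16)
dev = cpu.to(torch_dev)   # CPU -> TT-NN: host-storage tensor, then device transfer
back = dev.cpu()          # TT-NN -> CPU: data vector extracted and copied back
torch.testing.assert_close(back, cpu)

idx = torch.arange(16, dtype=torch.int32).to(torch_dev)  # integer dtypes are handled as well
```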
TtnnTensorImpl Integration

The subsystem uses a custom tensor implementation (`TtnnTensorImpl`) that wraps TT-NN tensors within PyTorch's tensor interface (copy.cpp:21-22). This allows PyTorch operations to work directly with TT-NN tensors while maintaining proper device guards and memory management (copy.cpp:20).

Build System and Dependencies
The cpp_extension includes a complete build system with `tt-metal` as a third-party dependency (run-cpp-native-tests.yaml:58). The build process involves:

- installing the required system dependencies (run-cpp-native-tests.yaml:54-59)
- caching dependencies, `tt-metal` builds, and ccache data (run-cpp-native-tests.yaml:98-133)
- running the `build_cpp_extension.sh` script, which compiles the native extensions with proper Python library suffixes (run-cpp-native-tests.yaml:92-116)

Testing Framework
The subsystem includes comprehensive tests that validate the native integration functionality (test_cpp_extension_functionality.py:21-58). Tests cover operations such as `abs()` (test_cpp_extension_functionality.py:44-46), along with additional cases at test_cpp_extension_functionality.py:87-89 and test_cpp_extension_functionality.py:107.

The testing infrastructure runs as part of the CI/CD workflows and is specifically triggered by changes to the cpp_extension directory (run-cpp-native-tests.yaml:8-13).

Integration with Main Backend
The cpp_extension works in conjunction with the main torch_ttnn backend through the `native_integration` option (conftest.py:276-283). When native integration is enabled, models and inputs are moved directly to TT-NN devices using the cpp_extension's device interface, bypassing some of the higher-level tensor conversion layers (conftest.py:278-281).

Usage
To use eager mode, the following steps can be taken:
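For example (a minimal end-to-end sketch, assuming the `ttnn_module` interface described above; `abs` is one of the operations exercised by the existing tests):

```python
# End-to-end eager-mode example; the ttnn_module import path is assumed.
import torch
import ttnn
from torch_ttnn.cpp_extension import ttnn_module  # assumed location

ttnn_dev = ttnn.open_device(device_id=0)
device = ttnn_module.as_torch_device(ttnn_dev)

x = torch.rand(32, 32, dtype=torch.bfloat16).to(device)
y = torch.abs(x)   # dispatched eagerly to the native TT-NN kernel
print(y.cpu())

ttnn.close_device(ttnn_dev)
```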
Adding New Operations
Based on the above description of the project's current state, you may notice that several changes are required to add a new operation. Broadly, the following steps are needed (update as appropriate):
1. Implement the operation in `torch_ttnn/cpp_extension/ttnn_cpp_extension/src/ops/<additional_folder_if_needed>/<filename.cpp>`, with the corresponding function declarations in the matching `.hpp` file. You can reference the existing operations for examples.
2. Register the new operation in `open_registration_extension.cpp`.
3. Add tests under `tests/cpp_extension`! (A sketch of such a test follows this list.)
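For step 3, a new test might look like the following (a sketch in the style of the existing tests; the `device` fixture name and the op are illustrative):

```python
# Hypothetical test for a newly added op; the fixture name is an assumption.
import torch

def test_neg(device):  # 'device' assumed to yield the wrapped TT-NN torch device
    x = torch.rand(32, 32, dtype=torch.bfloat16)
    result = torch.neg(x.to(device)).cpu()
    torch.testing.assert_close(result, torch.neg(x))
```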
Additionally, there are some miscellaneous improvements for the future that do not map cleanly to adding a new operation, including:

- Revisiting the layout conversion performed by the data movement pass (`AddDataMovePass`). It may be worth choosing the layout based on dtype (e.g. assume `uint32` tensors will be used for indexing, so convert them as `RowMajor`; assume `bfloat16` tensors will be used for calculations, so convert them as `TileLayout`); a possible heuristic is sketched below.
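A sketch of that heuristic (the TT-NN layout constants exist; where exactly this hook would live in `AddDataMovePass` is an assumption):

```python
# Dtype-based layout selection for the data movement pass (sketch).
import ttnn

def layout_for(dtype):
    # uint32 tensors are assumed to hold indices -> keep them row-major;
    # bfloat16 (and other compute) tensors -> convert to tile layout.
    if dtype == ttnn.uint32:
        return ttnn.ROW_MAJOR_LAYOUT
    return ttnn.TILE_LAYOUT
```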
Testing Loop

Here are some useful commands to use during op creation to speed up development:
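For instance (the script name comes from the CI workflow referenced above; the exact path and pytest flags are assumptions):

```bash
# Rebuild only the native extension after editing the C++ sources
./torch_ttnn/cpp_extension/build_cpp_extension.sh   # path assumed

# Re-run just the cpp_extension tests, filtered to the op under development
python -m pytest tests/cpp_extension -k abs -x
```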