A high-performance C++ machine learning framework with lazy evaluation, similar to MLX, designed for fast dispatch times and efficient computation graphs.
TT Lazy is a CPU math backend paired with a lazy tensor evaluation framework: operations are recorded into a graph, and the graph is optimized before any computation runs, similar to MLX but with a focus on fast dispatch times.
┌────────────────────────────────────────────────────────┐
│                        FRONTEND                        │
│ User Operations: matmul(a,b), relu(x), reduce_sum(y)   │
│ • Fast dispatch (just graph building)                  │
│ • No computation, only graph construction              │
└────────────────────────────────────────────────────────┘
                            │
                            ▼
┌────────────────────────────────────────────────────────┐
│                         GRAPH                          │
│ Computation Graph: Nodes + Dependencies                │
│ • Lazy tensors store graph references                  │
│ • Operations stored as nodes with arguments            │
│ • No actual computation yet                            │
└────────────────────────────────────────────────────────┘
                            │
                            ▼
┌────────────────────────────────────────────────────────┐
│                          TAPE                          │
│ Linear Execution Plan + Optimization                   │
│ • Dead code elimination                                │
│ • Operation fusion (future)                            │
│ • Memory optimization (future)                         │
│ • Operation handlers (bridge to math)                  │
└────────────────────────────────────────────────────────┘
                            │
                            ▼
┌────────────────────────────────────────────────────────┐
│                          MATH                          │
│ CPU Math Functions: Actual Computation                 │
│ • Element-wise operations (ReLU, Sigmoid)              │
│ • Matrix operations (MatMul, Transpose)                │
│ • Reduction operations (Sum, Mean)                     │
│ • Memory-efficient implementations                     │
└────────────────────────────────────────────────────────┘
- tt_lazy_core: Basic graph infrastructure (Tensor, Node, Context, MemoryManager)
- tt_lazy_operations: Frontend operations that build computation graphs (Split, MatMul, Reduce, ReLU)
- tt_math_lib: CPU math functions for actual computation (immediate evaluation)
- tt_lazy_tape: Tape-based execution system with operation handlers (lowering/bridge layer)
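These layers map directly onto user code. A minimal sketch annotating which library handles each step (it only uses the calls shown in the examples below; header names follow the includes/ layout and are otherwise an assumption):
#include "Tensor.hpp"
#include "operations.hpp"

// tt_lazy_operations (frontend): each call just records a node - fast dispatch, no math.
Tensor a({2, 3});
Tensor b({3, 4});
a.fill(1.0f);
b.fill(2.0f);
Tensor c = relu(matmul(a, b));   // tt_lazy_core (graph): c only references the matmul/relu nodes

// tt_lazy_tape + tt_math_lib: accessing the data lowers the graph to a linear plan,
// dispatches the operation handlers, and runs the CPU math kernels.
float* p = c.data_ptr();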
#include "Tensor.hpp"
#include "operations.hpp"
// Create tensors
Tensor a({2, 3});
Tensor b({3, 4});
a.fill(1.0f);
b.fill(2.0f);
// Build lazy computation graph (no computation yet!)
Tensor c = matmul(a, b); // Lazy operation
Tensor d = relu(c); // Lazy operation
Tensor e = reduce_sum(d); // Lazy operation
// Automatic evaluation when accessing data
float* result = e.data_ptr(); // Graph evaluated automatically!
std::vector<float> data = e.to_vector(); // Also triggers evaluation

// Multiple element-wise operations that get fused
Tensor x({1000, 1000});
x.fill(1.0f);
Tensor y = relu(x); // Element-wise
Tensor z = add(y, y); // Element-wise
Tensor w = multiply(z, z); // Element-wise
// TT Lazy optimizes: relu + add + multiply → single fused kernel
float* optimized_result = w.data_ptr(); // Fused execution!

import tt_lazy
import numpy as np
# Create tensors
a = tt_lazy.tensor([2, 3], data=np.ones((2, 3), dtype=np.float32))
b = tt_lazy.tensor([3, 4], data=np.ones((3, 4), dtype=np.float32))
# Build lazy computation graph
c = tt_lazy.matmul(a, b) # No computation yet
d = tt_lazy.relu(c) # Still no computation
e = tt_lazy.reduce_sum(d) # Still lazy
# Automatic evaluation when converting to numpy
result_np = e.to_numpy() # Graph evaluated automatically!

// Print the computation graph structure
Tensor result = some_computation();
result.print_graph(std::cout);
// Manual evaluation when needed
result.eval(); // Explicit evaluation (optional)

- C++17 or later
- CMake 3.16+
- Conan 2.0+ (for dependency management)
- Boost 1.84.0+ (container library)
- Google Test 1.14.0+ (for testing)
- pybind11 2.12.0+ (for Python bindings)
- Install Conan:
pip install conan

- Ensure Conan is in your PATH:
export PATH="$HOME/.local/bin:$PATH" # For pipx installations

TT Lazy uses Ninja as the default build system for fast parallel builds and CMake presets for streamlined configuration.
# Clone the repository
git clone <repository-url>
cd tt_lazy
# Build with Conan (recommended)
chmod +x build_with_conan.sh
./build_with_conan.sh

Release Build (Production):
./build_with_conan.sh # Full build with dependencies

Debug Build:
./build_debug.sh # Debug build with all debug symbols

CMake Presets (Advanced):
# Using CMake presets directly
cmake --preset conan-release # Configure
cmake --build --preset conan-release # Build
ctest --preset conan-release # Test
# Available presets:
cmake --list-presets # See all available presets

The build script will:
- Install all dependencies via Conan
- Configure CMake with Ninja generator
- Build all libraries and tests with parallel compilation
- Run the complete test suite
- Generate compile commands for IDE support
Manual build (without the script):
# Install dependencies
conan install . --build=missing
# Configure with preset
cmake --preset conan-release
# Build with Ninja
cmake --build --preset conan-release
# Run tests
ctest --preset conan-release

Installing Ninja:
macOS:
brew install ninja

Ubuntu/Debian:
sudo apt install ninja-build

Windows:
choco install ninja # Using Chocolatey
# OR download from: https://ninja-build.org/

C++ tests:
cd build
ctest --output-on-failure

Python tests:
cd tests/python
python3 run_tests.py

Supported operations:
- MatMul: Matrix multiplication with optional transposition
- ReLU: Rectified Linear Unit activation
- Reduce: Sum, mean, max, min along specified dimensions
- Split: Split tensor along a dimension
- Add/Multiply: Element-wise operations
- Transpose: Transpose tensor dimensions
Operations support configurable arguments:
// Matrix multiplication with transposition
Tensor result = matmul(a, b, true, false); // transpose_a=true, transpose_b=false
// Reduce with specific dimensions
Tensor sum = reduce_sum(input, {0, 2}, true); // dims={0,2}, keepdim=true
// ReLU in-place
Tensor activated = relu(input, true); // inplace=true

Adding a new operation requires implementing three layers: Frontend, Math, and Handler (plus a Python binding if the operation should be exposed to Python).
File: includes/operations/operations.hpp and frontend/operations.cpp
// 1. Define operation arguments
DEFINE_OP_ARGS(Sigmoid,
bool inplace = false;
);
// 2. Declare frontend function
Tensor sigmoid(const Tensor& input, bool inplace = false);
// 3. Implement frontend function (builds graph)
Tensor sigmoid(const Tensor& input, bool inplace) {
SigmoidArgs args;
args.inplace = inplace;
SmallVector<Tensor, 2> inputs{input};
NodeId node_id = Context::instance().create_node(inputs, std::move(args));
// Output has same shape as input
std::vector<uint32_t> shape(input.shape(), input.shape() + input.rank());
uint32_t shape_array[4] = {1, 1, 1, 1};
for (size_t i = 0; i < shape.size(); ++i) {
shape_array[i] = shape[i];
}
return Tensor(node_id, 0, {shape_array[0], shape_array[1], shape_array[2], shape_array[3]});
}

File: math/math_operations.hpp and math/eltwise.cpp (or a new file)
// 1. Declare in math_operations.hpp
namespace math {
Tensor sigmoid(const Tensor& input);
}
// 2. Implement in math/eltwise.cpp
#include <cmath>  // for std::exp

namespace math {
Tensor sigmoid(const Tensor& input) {
// Ensure input is materialized
if (!input.is_materialized()) {
throw std::runtime_error("Math functions require materialized tensors");
}
// Create output tensor with same shape
Tensor output(input.shape(), input.shape() + input.rank());
// Perform actual computation
const float* input_data = input.const_data_ptr();
float* output_data = output.data_ptr();
size_t num_elements = input.total_elements();
for (size_t i = 0; i < num_elements; ++i) {
output_data[i] = 1.0f / (1.0f + std::exp(-input_data[i]));
}
return output;
}
}

File: tape/OperationHandlers.cpp
// 1. Implement handler function
void handle_sigmoid(TapeOperation& op, TapeExecutor& executor) {
// Collect input tensors
std::vector<std::shared_ptr<Tensor>> input_tensors;
// Add lazy input tensors
for (NodeId node_id : op.input_nodes) {
auto tensor = executor.get_result(node_id);
if (!tensor) {
throw std::runtime_error("Missing lazy input tensor for sigmoid operation");
}
input_tensors.push_back(tensor);
}
// Add constant input tensors
for (const auto& const_tensor : op.constant_inputs) {
input_tensors.push_back(std::make_shared<Tensor>(const_tensor));
}
if (input_tensors.size() != 1) {
throw std::runtime_error("Sigmoid operation requires exactly 1 input");
}
// Call math function
auto result = std::make_shared<Tensor>(math::sigmoid(*input_tensors[0]));
executor.set_result(op.node_id, result);
op.result = result;
}
// 2. Register handler in register_all_operations()
void register_all_operations(TapeExecutor& executor) {
executor.register_operation(SplitArgs::type_id(), handle_split);
executor.register_operation(MatMulArgs::type_id(), handle_matmul);
executor.register_operation(ReduceArgs::type_id(), handle_reduce);
executor.register_operation(ReLUArgs::type_id(), handle_relu);
executor.register_operation(SigmoidArgs::type_id(), handle_sigmoid); // Add this line
}

File: bindings/operations.cpp
// Add Python binding
m.def("sigmoid", &sigmoid, "Apply sigmoid activation",
py::arg("input"), py::arg("inplace") = false);// Usage in C++
Tensor x({2, 3});
x.fill(0.5f);
Tensor y = sigmoid(x); // Lazy operation - builds graph
float* data = y.data_ptr(); // Automatic evaluation!
// Usage in Python
import tt_lazy
x = tt_lazy.tensor([2, 3], data=[[0.5, 1.0, -1.0], [2.0, -0.5, 0.0]])
y = tt_lazy.sigmoid(x) # Lazy operation
result = y.to_numpy()  # Automatic evaluation!

Complete flow:
- Frontend: sigmoid(x) creates a graph node with SigmoidArgs
- Lazy: Operation stored in the graph, no computation yet
- Materialization: Tape executor processes the graph
- Handler: handle_sigmoid() bridges the graph operation to the math function
- Math: math::sigmoid() performs the actual CPU computation
- Result: Materialized tensor returned to the user
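To verify the new operation end to end, a small unit test can exercise the full frontend → tape → math path. A minimal sketch using the Google Test dependency listed above and the Tensor API shown in the examples (the test name and tolerance are illustrative, not part of the existing suite):
#include <gtest/gtest.h>
#include "Tensor.hpp"
#include "operations.hpp"

TEST(SigmoidOp, MatchesClosedForm) {
    Tensor x({2, 3});
    x.fill(0.5f);

    Tensor y = sigmoid(x);                    // lazy: only builds the graph
    std::vector<float> out = y.to_vector();   // triggers tape execution + math::sigmoid

    // sigmoid(0.5) = 1 / (1 + e^(-0.5)) ≈ 0.6225
    for (float v : out) {
        EXPECT_NEAR(v, 0.622459f, 1e-5f);
    }
}
Placing a test like this under tests/cpp/ should let the regular ctest run pick it up, assuming the test CMake target lists or globs new sources.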
tt_lazy/
├── core/                  # Core source files
│   ├── Tensor.cpp         # Tensor implementation
│   ├── Node.cpp           # Graph node implementation
│   ├── Context.cpp        # Global context
│   └── MemoryManager.cpp  # Memory management
├── includes/              # Header files
│   ├── Tensor.hpp         # Tensor interface
│   ├── Node.hpp           # Node interface
│   ├── Context.hpp        # Context interface
│   ├── operations/        # Operation definitions
│   └── tape/              # Tape system headers
├── frontend/              # Graph-based operations
├── math/                  # Immediate computation operations
├── tape/                  # Tape execution system
├── bindings/              # Python bindings
├── tests/                 # Test suite
│   ├── cpp/               # C++ unit tests
│   └── python/            # Python integration tests
├── build/                 # Build artifacts
├── CMakeLists.txt         # CMake configuration
├── conanfile.py           # Conan dependencies
└── build_with_conan.sh    # Build script
- Fork the repository
- Create a feature branch
- Add tests for new functionality
- Ensure all tests pass
- Submit a pull request
MIT License - see LICENSE file for details.