Determine indirectly defined constant for clamp op #2268

Open · mmanzoorTT wants to merge 1 commit into main from mmanzoor/clamp-reshape-input

Conversation

mmanzoorTT (Contributor)

Ticket

closes #2185

Problem description

stablehlo.clamp is lowered to either 1) a ttir.clamp op or 2) a ttir.maximum op followed by a ttir.minimum op.
However, tt-metal does not support implicit broadcast for the maximum and minimum ops, so these two
ops fail if the operand shapes do not match and broadcasting is required.

What's changed

  1. Broadcast the min/max value(s) if the shape of min/max does not match the input operand.
  2. If a constant op is used indirectly in the stablehlo.clamp op (e.g. the constant is converted/reshaped and then used in the clamp op), determine the constant value of min/max and use it in the ttir.clamp conversion (see the sketch below).
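
As a hypothetical illustration of case 2 (the values, shapes, and the reshape step below are assumed for the example, following the pattern discussed later in this thread), the input might look like:

// The clamp bounds are constants that only reach stablehlo.clamp indirectly, through a reshape.
%min = stablehlo.constant dense<0.0> : tensor<1xf32>
%max = stablehlo.constant dense<6.0> : tensor<1xf32>
%min_s = stablehlo.reshape %min : (tensor<1xf32>) -> tensor<f32>
%max_s = stablehlo.reshape %max : (tensor<1xf32>) -> tensor<f32>
%result = stablehlo.clamp %min_s, %arg0, %max_s : (tensor<f32>, tensor<32x128xf32>, tensor<f32>) -> tensor<32x128xf32>

The conversion walks back through the reshapes to the defining constants, recovers 0.0 and 6.0, and emits ttir.clamp with scalar attributes:

%0 = tensor.empty() : tensor<32x128xf32>
%1 = "ttir.clamp"(%arg0, %0) <{max = 6.000000e+00 : f32, min = 0.000000e+00 : f32}> : (tensor<32x128xf32>, tensor<32x128xf32>) -> tensor<32x128xf32>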

Checklist

  • New/Existing tests provide coverage for changes

@mmanzoorTT force-pushed the mmanzoor/clamp-reshape-input branch 4 times, most recently from 62fd183 to ac19198 on February 27, 2025 12:21
A constant can be converted/reshaped/broadcast and then used in the stablehlo.clamp
op. Determine the base constant value for min/max and use it in the ttir.clamp op.
@mmanzoorTT force-pushed the mmanzoor/clamp-reshape-input branch from ac19198 to 5ebaa5e on February 27, 2025 12:23
@mrakitaTT (Contributor) left a comment

Based on ttnn docs and source code it seems that ttnn.clamp op supports providing either scalars or tensors as min and max operands.

Here is the breakdown in third party dialects:

  • StableHLO supports providing either scalars or tensors as min and max operands
  • ONNX and TOSA support only scalars as min and max operands
  • linalg doesn't seem to have a clamp/clip equivalent
  • torch-mlir has separate ops: aten.clamp which supports scalar operands, and aten.clamp.Tensor which supports tensor operands

Based on all of this, it seems like the most general way to support the clamp/clip op in TTIR would be to do the same as StableHLO does and support providing either scalars or tensors as min and max operands; that way, all third-party dialects' equivalents will be able to lower to it without any additional logic. That means removing all this decomposition logic from the StableHLO->TTIR conversion, updating the TTIR and TTNN ops, and updating the runtime function for the ttnn.clamp op.
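
For illustration only, the two interfaces being compared could look roughly like the following (the op names, generic forms, and types here are assumptions for the sketch, not the actual definitions):

// Attribute-based form: scalar bounds carried as op attributes (the current ttir.clamp shape).
%empty_a = tensor.empty() : tensor<32x128xf32>
%a = "ttir.clamp"(%arg0, %empty_a) <{max = 1.000000e+00 : f32, min = 0.000000e+00 : f32}> : (tensor<32x128xf32>, tensor<32x128xf32>) -> tensor<32x128xf32>

// Tensor-based form: bounds passed as tensor operands (a hypothetical generalized op).
%empty_b = tensor.empty() : tensor<32x128xf32>
%b = "ttir.clamp_tensor"(%arg0, %min, %max, %empty_b) : (tensor<32x128xf32>, tensor<f32>, tensor<f32>, tensor<32x128xf32>) -> tensor<32x128xf32>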

What do others think? @sdjordjevicTT @nsmithtt @AleksKnezevic

@nsmithtt (Contributor) commented Mar 5, 2025

> it seems like the most general way to support the clamp/clip op in TTIR would be to do the same as StableHLO does and support providing either scalars or tensors as min and max operands; that way, all third-party dialects' equivalents will be able to lower to it without any additional logic. That means removing all this decomposition logic from the StableHLO->TTIR conversion, updating the TTIR and TTNN ops, and updating the runtime function for the ttnn.clamp op.

Yes this seems like a better approach, we probably didn't have ttnn clamp or something when we initially brought the op up.

@mmanzoorTT (Contributor, Author) commented Mar 5, 2025

When I initially brought up the clamp op, tt-metal only supported clamp with scalar min/max values.

StableHLO does not support scalars as min/max; instead it uses zero-rank tensors for scalars. We can use the same approach as StableHLO, as Marko suggested. However, we will still need decomposition (creating zero-rank tensors and broadcasting them) for scalar min/max, as tt-metal does not support implicit broadcast of the operands for the clamp op. We could determine whether to use the scalar or tensor interface, but I think it is better to always use the tensor-based interface to keep the runtime simple.
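
As a minimal sketch of that decomposition (the explicit broadcast step is elided and the op forms are simplified for illustration; %min_b and %max_b stand for the zero-rank bounds after they have been explicitly broadcast to the input shape):

// Apply the broadcast bounds with ttir.maximum / ttir.minimum, which do not broadcast implicitly.
%e0 = tensor.empty() : tensor<32x128xf32>
%lo = "ttir.maximum"(%arg0, %min_b, %e0) : (tensor<32x128xf32>, tensor<32x128xf32>, tensor<32x128xf32>) -> tensor<32x128xf32>
%e1 = tensor.empty() : tensor<32x128xf32>
%clamped = "ttir.minimum"(%lo, %max_b, %e1) : (tensor<32x128xf32>, tensor<32x128xf32>, tensor<32x128xf32>) -> tensor<32x128xf32>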

Other possible solutions could be:

  1. ttir.clamp and ttnn.clamp could have two fields for the minimum attribute (one for a scalar min value and one for a tensor min value), with the maximum attribute updated similarly. The runtime uses the appropriate interface.
  2. The last option is to use two variations of the clamp op: one for scalar min/max and another for tensor min/max. The runtime uses the appropriate interface based on the op variation.

The StableHLO-based approach seems more suitable to me.

@mmanzoorTT (Contributor, Author)

I'd like to request that we unblock this PR and merge it, as it will unblock model bring-up with tt-torch. One of the models is failing due to this clamp op problem.

I'll implement the generalized approach, and we will have to sync with the tt-forge frontend as well so that changes in both repos get merged simultaneously.

@nsmithtt (Contributor) commented Mar 6, 2025

> I'll implement the generalized approach, and we will have to sync with the tt-forge frontend as well so that changes in both repos get merged simultaneously.

Can you file / link an issue? Otherwise this sounds like a good approach to me, @mrakitaTT thoughts?

@mrakitaTT (Contributor)

> StableHLO does not support scalars as min/max; instead it uses zero-rank tensors for scalars.

Oh I see, I missed that. Yeah, we can't easily map the zero-rank tensor arguments to the ttnn.clamp with constant value attributes, because we would need to read the value from the tensor at compile time (or complicate the runtime). We can then have two clamp ops, similar to torch-mlir: one that we currently have (with constant value attributes) and another with tensor arguments. That's the 2nd solution that you listed, @mmanzoorTT.

As for the mapping from stablehlo to the one with constant value attributes, I believe that it will always be a situation like this (maybe some reshape/broadcast ops in between, or arith.constant instead of stablehlo.constant):

%min_arg = stablehlo.constant dense<0> : tensor<f32>
%max_arg = stablehlo.constant dense<1> : tensor<f32>
%result = stablehlo.clamp %min_arg, %arg0, %max_arg : (tensor<f32>, tensor<32x128xf32>, tensor<f32>) -> tensor<32x128xf32>

which would get converted into:

%min_arg = ttir.constant dense<0> : tensor<f32>
%max_arg = ttir.constant dense<1> : tensor<f32>
%0 = tensor.empty() : tensor<32x128xf32>
%1 = "ttir.clamp_tensor"(%arg0, %min_arg, %max_arg, %0) : (tensor<32x128xf32>, tensor<f32>, tensor<f32>, tensor<32x128xf32>) -> tensor<32x128xf32>

for which we can create a fusion pass and convert it into this when the argument tensors have rank 0:

%0 = tensor.empty() : tensor<32x128xf32>
%1 = "ttir.clamp"(%arg0, %0) <{max = 1.000000e+00 : f32, min = 0.000000e+00 : f32}> : (tensor<32x128xf32>, tensor<32x128xf32>) -> tensor<32x128xf32>

Another tool we can use in situations like this is the StableHLO interpreter; they listed const-folding as one of the use cases here.

> Can you file / link an issue? Otherwise this sounds like a good approach to me, @mrakitaTT thoughts?

Yeah, let's file the issue, @mmanzoorTT. I will review this PR now to unblock, and then we can come back to refactor this.

    ttmlir::utils::replaceOpWithNewDPSOp<ttir::MinimumOp>(
-       rewriter, srcOp, outputType, maximumOp.getResult(0), adaptor.getMax());
+       rewriter, srcOp, outputType, maximumOp.getResult(0), max);

    return success();
  }

private:
  std::optional<float> getConstantValue(Value value) const {
@mrakitaTT (Contributor) commented Mar 6, 2025

Oh, you actually already have the logic here to extract the constant values to use. I would then avoid delaying the proper solution for later and just do it right now, since it doesn't require much more work (adding two separate clamp ops to ttir and ttnn, mapping here to the one with constant value attributes, and removing this decomposition logic). One model being blocked in tt-torch doesn't seem like that big of a priority to me; please let me know if you disagree.

Labels: None yet
Projects: None yet
Successfully merging this pull request may close these issues: stablehlo.clamp conversion bug
4 participants