
ONNX model bringup #344

Open
AleksKnezevic opened this issue Feb 21, 2025 · 2 comments

@AleksKnezevic
Contributor

AleksKnezevic commented Feb 21, 2025

We need to bring up the following ONNX models

The goal is op-by-op flow in torch first, then in onnx once @ddilbazTT's onnx flow is ready.

If the original source is a github repo (as opposed to huggingface), please try to extract the minimum necessary repro code and add it to the model folder as a model_implementation.py file (in pytorch), then convert the model to onnx and run it. If the model is published under a licence we don't currently have in our LICENCE file, add it.

Don't add the onnx file to CI; do the conversion at runtime.
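For example, a test can build the PyTorch module from its model_implementation.py and convert it to onnx on the fly, so no .onnx artifact is ever committed. A minimal sketch, assuming a placeholder Model class and input shape (neither is part of any existing model folder):

```python
import tempfile

import onnx
import torch

# Hypothetical import: the minimal repro extracted into the model folder.
from model_implementation import Model


def load_onnx_model():
    model = Model().eval()
    sample_input = torch.randn(1, 3, 224, 224)  # placeholder input shape

    # Convert at runtime rather than committing the .onnx file to the repo/CI.
    with tempfile.NamedTemporaryFile(suffix=".onnx") as f:
        torch.onnx.export(model, (sample_input,), f.name)
        onnx_model = onnx.load(f.name)

    onnx.checker.check_model(onnx_model)
    return onnx_model, sample_input
```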

If you cannot extract the model cleanly into a single (or several) files, let's discuss.

@LPanosTT
Contributor

LPanosTT commented Feb 27, 2025

MobilenetV2 in pytorch is working as of today on main.

When exporting the model to onnx and running that compile flow, the parameters become inlined as constants in the graph. TTIR has no way to consume this, and I'm not quite sure how to handle it.

If we did bring up functionality for this through tt-mlir, at runtime the parameter data would end up being pushed to device on every inference call, since it would have to come from host, being embedded as a constant op in the graph. Ideally, once consteval is working, these constants would just be hoisted into the consteval function and passed as graph inputs to the main program. However, that seems to be a while away.
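To make the problem concrete, the weights end up in the exported protobuf as graph initializers (and sometimes explicit Constant nodes) rather than as graph inputs, which is what the compile flow would have to consume. A quick inspection sketch (the file name is a placeholder):

```python
import onnx

# Placeholder path; any model exported via torch.onnx.export will do.
model = onnx.load("mobilenet_v2.onnx")
graph = model.graph

initializer_names = {init.name for init in graph.initializer}
input_names = {inp.name for inp in graph.input}
constant_nodes = [n for n in graph.node if n.op_type == "Constant"]

# With default export settings the weights live in graph.initializer,
# not in graph.input, i.e. they are baked into the graph as constants.
print(f"{len(initializer_names)} initializers (inlined parameters)")
print(f"{len(input_names - initializer_names)} true graph inputs")
print(f"{len(constant_nodes)} explicit Constant nodes")
```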

@LPanosTT
Contributor

LPanosTT commented Feb 27, 2025

Much of the following information does not pertain to the ONNX flow. However, when originally attempting to bring up SwinV2 in PyTorch, I discovered a number of issues:

  • In some cases in-place aten ops may remain in the graph (see the first sketch after this list). This is an issue because:

    • We use import_stateless_graph, a function provided by torch-mlir, to convert the FX graph to TorchFX IR. In-place ops are not stateless, so this IR is invalid.
      • Furthermore, the IR is not verified after it is built, so lowering in-place ops can sometimes produce ops with invalid return types, which aren't caught and cause issues further down the line.
      • I've added a small change which walks the initial TorchFX graph and calls each op's verify method to do this for us: Call op::verify() methods after importing the inital graph #374
  • The use of import_stateless_graph may become an issue in the future as well. It seems that they intend to deprecate this function, according to this comment.

    • The intended main entry point is import_program instead.
      • This entry point does seem to handle in-place cases just fine and can generate a valid graph even if the underlying FX graph contains in-place ops.
      • This entry point does not accept a torch FX graph, but rather a torch ExportedProgram.
      • However, using this entry point has its issues:
        • Since it only takes an ExportedProgram, we cannot use our FX graph alone.
          • torch.compile will only give us an FX graph to begin with.
          • Constructing an ExportedProgram directly with the provided FX graph may prove to be very difficult as we'd need to generate an ExportGraphSignature which contains the input information (i.e. user input, param, const).
          • As of right now, we can only call torch.export.export on the FX graph at the very beginning unless we make some changes to the pass pipeline. Attempting to do so afterward causes the export to fail due to some fake-tensor context mismatch - not sure what that's about yet.
            • When we do this, the stack traces that come with the original FX graph are lost.
          • I've tried manually compiling without torch.compile, just using torch-mlir's example flow: creating an ExportedProgram from the original nn.Module, using import_program to generate the initial TorchFX IR, and then compiling to Stablehlo and so on from there (see the second sketch after this list). There is an issue that occurs here which we face with onnx models as well:
            • The parameters are inlined as constants in the graph, which we cannot lower at the moment and which has its own issues even if we do. See this comment for a more detailed description of said issues: ONNX model bringup #344 (comment)
  • More information about ExportedProgram

    • An ExportedProgram holds a torch FX GraphModule in addition to more information.
    • An ExportedProgram has useful information like which tensors are user inputs, which are constants, and which are parameters.
    • An ExportedProgram is generated by calling torch.export.export(model, inputs)
    • ExportedProgram has a helper named run_decompositions which can take the decomposition map as we currently have it, and run decompositions.
    • ExportedProgram automatically folds constants but not parameters, and I'm not sure how/if we can make it do so.
      • I suppose when we get a functional consteval in tt-mlir this wouldn't matter.
    • Stack traces are not included with the underlying FX graph; we may be able to get them by utilizing torch.export.export's preserve_module_call_signature kwarg.
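As a rough illustration of the first bullet above, in-place ops can be flagged in the FX graph that torch.compile hands to a backend, before any import happens. This is only a heuristic sketch of my own (not existing tt-torch code): it relies on aten schemas being marked mutable and on the trailing-underscore naming convention.

```python
import torch


def find_inplace_nodes(gm: torch.fx.GraphModule):
    """Heuristically collect FX nodes that mutate one of their inputs."""
    flagged = []
    for node in gm.graph.nodes:
        if node.op != "call_function":
            continue
        schema = getattr(node.target, "_schema", None)
        if schema is not None:
            mutable = schema.is_mutable  # aten OpOverload case
        else:
            name = getattr(node.target, "__name__", str(node.target))
            mutable = name.endswith("_")  # e.g. torch.relu_
        if mutable:
            flagged.append(node)
    return flagged


def inspecting_backend(gm, example_inputs):
    bad = find_inplace_nodes(gm)
    if bad:
        # These are the nodes import_stateless_graph would mishandle.
        print("in-place ops left in graph:", [str(n.target) for n in bad])
    return gm.forward  # run eagerly; this backend only inspects the graph


model = torch.nn.Sequential(torch.nn.Linear(8, 8), torch.nn.ReLU(inplace=True))
compiled = torch.compile(model, backend=inspecting_backend)
compiled(torch.randn(2, 8))
```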
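And to make the ExportedProgram points concrete, here is a rough sketch of the torch-mlir example flow mentioned above: export, inspect the graph signature (user input vs. parameter vs. constant), run decompositions, then import with import_program. The torch_mlir module paths are my assumption and may differ between versions; the model itself is a placeholder.

```python
import torch
from torch.export import export

# Assumed torch-mlir entry points; exact module paths may vary by version.
from torch_mlir import ir
from torch_mlir.dialects import torch as torch_d
from torch_mlir.extras.fx_importer import FxImporter


class TinyModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(16, 4)

    def forward(self, x):
        return torch.relu(self.linear(x))


model = TinyModel().eval()
example_input = torch.randn(2, 16)

# 1. ExportedProgram wraps an FX GraphModule plus a graph signature.
ep = export(model, (example_input,))

# 2. The signature records which inputs are user inputs vs. parameters/constants.
for spec in ep.graph_signature.input_specs:
    print(spec.kind, spec.arg.name)

# 3. Decompositions run on the ExportedProgram itself
#    (a custom decomposition table can be passed in).
ep = ep.run_decompositions()

# 4. Import via import_program, which handles in-place ops,
#    unlike import_stateless_graph.
context = ir.Context()
torch_d.register_dialect(context)
importer = FxImporter(context=context)
importer.import_program(ep)
print(importer.module)
```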
