NXP Backend: Add eIQ Neutron Backend #10196


Open
wants to merge 2 commits into main

Conversation

robert-kalmar
Contributor

@robert-kalmar robert-kalmar commented Apr 15, 2025

Summary

Initial implementation for the NXP eIQ Neutron Backend for Neutron-N3-64 (i.MX RT700)

Test plan

Functionality is tested by Python unit tests:

pytest -c /dev/null backends/nxp/tests

cc @digantdesai @JakeStevens @skywall


pytorch-bot bot commented Apr 15, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/10196

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure

As of commit b102be2 with merge base 4559a61:

NEW FAILURE - The following job has failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Apr 15, 2025
@robert-kalmar
Contributor Author

@pytorchbot label "module: nxp" , label "release notes: nxp"


pytorch-bot bot commented Apr 15, 2025

Didn't find following labels among repository labels: ,,label

@pytorch-bot pytorch-bot bot added module: nxp Issues related to NXP Neutron NPU delegation and code under backends/nxp/ release notes: nxp Changes to the NXP Neutron backend delegate labels Apr 15, 2025
def convert(self, tflite_model: bytes, target: str, neutron_converter_flavor: str) -> bytes:
    # Neutron converter crashes if we provide invalid target -> verify.
    if target not in self._supported_target_names:
        raise RuntimeError(f"Target '{target}' is not supported by NeutronConverterManager.")
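The guard above can be exercised in isolation. A minimal runnable sketch, using a hypothetical stand-in class (the real NeutronConverterManager lives in backends/nxp and wraps the pip-installed neutron_converter package):

```python
class NeutronConverterManagerSketch:
    """Illustrative stand-in for the converter manager's target check."""

    _supported_target_names = ("imxrt700",)

    def convert(self, tflite_model: bytes, target: str) -> bytes:
        # The Neutron converter crashes on an invalid target, so verify
        # first and fail with a clear error instead.
        if target not in self._supported_target_names:
            raise RuntimeError(
                f"Target '{target}' is not supported by NeutronConverterManager."
            )
        # The real implementation would invoke the Neutron Converter here;
        # this sketch just passes the model through.
        return tflite_model
```

A caller would then see a clear RuntimeError for an unknown target rather than a converter crash.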
Contributor Author

@digantdesai it requires installing the Neutron Converter from eiq.nxp.com/repository:

pip install --index-url https://eiq.nxp.com/repository neutron_converter_SDK_25_03

Do you prefer adding a requirements.txt to the /backends/nxp directory, or rather following Arm's approach with a setup.sh script?

Contributor

I guess we can start with setup.sh; I need to discuss with the team whether we want to do optional pip installs or whether setup.sh is the right approach.

@digantdesai
Contributor

wow this is a huge PR! I will try to go through this bit by bit and leave comments, expect a few days before we can finish reviewing this. Thanks though :)

@robert-kalmar
Contributor Author

wow this is a huge PR! I will try to go through this bit by bit and leave comments, expect a few days before we can finish reviewing this. Thanks though :)

Unfortunately, it was hard to break it down, as most of it is the infrastructure for the conversion from the Edge Dialect to a format suitable for the Neutron Converter (a LiteRT/tflite flatbuffer).
But let me give you some insight into the code, which will hopefully help you with the review:

  • backends/nxp/backend/neutron_partitioner.py and backends/nxp/backend/nxp_backend.py are the main files for the eIQ Neutron Backend
  • backends/nxp/backend/neutron_node_extraction.py contains the logic to extract the microcode for the Neutron NPU from the Neutron Converter invocation
  • backends/nxp/backend/ir/lib/tflite is automatically generated from the LiteRT flatbuffers schema
  • backends/nxp/backend/ir/converter/builder is the LiteRT flatbuffer generation logic
  • backends/nxp/backend/ir/converter/node_conversion/ contains the conversion logic from the Edge Dialect to LiteRT
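The flow through these files can be sketched schematically. Everything below is an illustrative stand-in, not the real ExecuTorch Partitioner/BackendDetails APIs:

```python
from dataclasses import dataclass

@dataclass
class Subgraph:
    ops: list  # edge-dialect ops tagged for Neutron (stand-in for a partition)

def partition(edge_ops: list, supported: set) -> list:
    """NeutronPartitioner role: group runs of supported ops into subgraphs."""
    subgraphs, current = [], []
    for op in edge_ops:
        if op in supported:
            current.append(op)
        elif current:
            subgraphs.append(Subgraph(current))
            current = []
    if current:
        subgraphs.append(Subgraph(current))
    return subgraphs

def preprocess(subgraph: Subgraph) -> bytes:
    """NeutronBackend role: convert one subgraph to tflite, run the Neutron
    Converter, and extract the NPU microcode (here just a placeholder)."""
    return ("microcode:" + ",".join(subgraph.ops)).encode()

# One payload per partition; unsupported ops (softmax here) stay on the CPU.
subs = partition(["conv", "relu", "softmax", "linear"], {"conv", "relu", "linear"})
payloads = [preprocess(s) for s in subs]
```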

@JakeStevens
Contributor

@robert-kalmar please make sure to run the linter and re-submit

@robert-kalmar robert-kalmar force-pushed the upstream/release-mcux-25.03-full/neutron-backend_2_upstream branch from d58bec5 to 32253f6 Compare April 22, 2025 13:48
@JakeStevens
Contributor

@robert-kalmar please make sure to run the linter and re-submit

Still failing after latest push.

@@ -0,0 +1,250 @@
# Copyright 2024 NXP
Contributor

@digantdesai is this functionality not provided elsewhere in ET already?


# We need to create a custom model verifier with max_pool2d added as an exception.
# Otherwise, we get a violation that this op is not part of the ATen Core ops.
edge_program._verifiers = [EXIREdgeDialectVerifier(
Contributor

Interesting. For my own understanding, why is this the case? It seems like max_pool is in ATen, judging from the op's path, torch.ops.aten.
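The exception-list idea from the quoted code can be sketched generically. DialectVerifier and CORE_OPS below are illustrative stand-ins, not the real EXIREdgeDialectVerifier:

```python
# Stand-in core op set; the real verifier checks against the ATen Core ops.
CORE_OPS = {"aten.conv2d", "aten.relu", "aten.linear"}

class DialectVerifier:
    """Verifier that accepts an extra allowlist on top of the core op set."""

    def __init__(self, extra_allowed_ops=()):
        self.allowed = CORE_OPS | set(extra_allowed_ops)

    def check(self, ops):
        for op in ops:
            if op not in self.allowed:
                raise RuntimeError(f"Op '{op}' is not part of the core op set.")

# With max_pool2d registered as an exception, verification passes.
verifier = DialectVerifier(extra_allowed_ops=["aten.max_pool2d"])
verifier.check(["aten.conv2d", "aten.max_pool2d"])
```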

model = Model.GetRootAs(flatbuffer, 0)
assert model.SubgraphsLength() == 1, f'The model has `{model.SubgraphsLength()}` SubGraphs instead of `1`.'

sub_graph = model.Subgraphs(0)
Contributor

What happens with graph breaks? Can there only be one subgraph for Neutron?

Contributor

at least add an assert?

Contributor Author

@robert-kalmar robert-kalmar Apr 24, 2025

Here is the workflow:
The Neutron Converter currently supports the tflite format as input and returns tflite as output. That is, the returned tflite flatbuffer in general contains a combination of Neutron Nodes (the parts of the original compute graph to be computed on Neutron) and leftover tflite ops, which will be computed on the CPU.

In the case of ExecuTorch's NeutronBackend, we obviously do not have any TFLite runtime, so the partitioning logic is in the NeutronPartitioner. The NeutronPartitioner already identifies the subgraphs that are supported on Neutron, so the NeutronBackend invoking the Neutron Converter shall generate microcode for the whole of such a subgraph. That is, no tflite ops shall stay in the tflite graph passed to the Neutron Converter. If that happens, it is a RuntimeError, as we do not convert the leftover tflite ops back to ExecuTorch ops. The Partitioner shall ensure this does not happen.
In short, the return value from the Neutron Converter shall contain exactly 1 op/node, which is to be the "NeutronNode". This is checked on lines #44 and #47, and that the node is a NeutronNode on line #76.

There can be multiple subgraphs for Neutron, but this is controlled by ExecuTorch, which for every identified partition creates a subgraph (an ExportedProgram) and calls the NeutronBackend.preprocess() function, which converts this subgraph and responds with the payload (microcode) for the Neutron NPU. We convert 1 subgraph at a time.
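The "exactly one NeutronNode" invariant can be sketched as follows; the function and op names are stand-ins, since the real check parses the tflite flatbuffer returned by the Neutron Converter:

```python
def extract_neutron_node(ops: list) -> str:
    """Stand-in for the post-conversion check: the converted graph must
    consist of a single custom op, the NeutronGraph node. Any leftover
    tflite op means the partitioner admitted an unsupported op."""
    if len(ops) != 1:
        raise RuntimeError(f"Expected 1 op after conversion, got {len(ops)}.")
    if ops[0] != "NeutronGraph":
        raise RuntimeError(f"Expected a NeutronGraph node, got '{ops[0]}'.")
    return ops[0]
```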

Contributor

@digantdesai digantdesai left a comment

part 1 / n

setup.py Outdated
@@ -144,6 +145,10 @@ def openvino(cls) -> bool:
def xnnpack(cls) -> bool:
return cls._is_cmake_arg_enabled("EXECUTORCH_BUILD_XNNPACK", default=True)

@classmethod
def neutron(cls) -> bool:
return cls._is_cmake_arg_enabled("EXECUTORCH_BUILD_NEUTRON", default=True)
Contributor

Suggested change
return cls._is_cmake_arg_enabled("EXECUTORCH_BUILD_NEUTRON", default=True)
return cls._is_cmake_arg_enabled("EXECUTORCH_BUILD_NEUTRON", default=False)

Contributor

We can change this in the future; let's start conservatively so as not to disrupt existing workflows.

@@ -53,7 +53,7 @@ def clean():


# Please keep this in sync with `ShouldBuild.pybindings` in setup.py.
VALID_PYBINDS = ["coreml", "mps", "xnnpack", "training", "openvino"]
VALID_PYBINDS = ["coreml", "mps", "xnnpack", "neutron", "training", "openvino"]
Contributor

This broadly implies that we can run a .PTE with Neutron delegates in it on a dev machine, e.g. x86.

@@ -0,0 +1,297 @@
# Copyright 2024 NXP
Contributor

filename nit
s/nxp_backend.py/neutron_backend.py

Contributor Author

If you don't mind, let's do the file renaming as we push all our changes from the internal repository.

extract_artifacts_from_neutron_node,
NeutronNodeArtifacts,
)
from executorch.backends.xnnpack._passes import RemoveGetItemPass, XNNPACKPassManager
Contributor

Fork the pass manager?

Contributor

Move the RemoveGetItemPass to backends/transforms?

Contributor Author

  • Introduced NeutronPassManager.
  • The RemoveGetItemPass was moved to backends/transforms in a separate commit for better visibility.

Contributor

@digantdesai Is there some specific reason PassManager is built on top of GraphModule and not ExportedProgram? I see that basically every backend has some custom manager that accepts ExportedProgram as the input type and also produces ExportedProgram as the output of the run/__call__ function. I guess it is done that way because wrapping the modified graph_module back into an ExportedProgram seems to be the responsibility of the PassManager itself and not of the PassManager call site.

Contributor

PassManager is an FX concept, while ExportedProgram is an export concept. FX has more users than export. The ExportedProgram deals with the GraphSignature etc., which is not in scope for FX. So XNNPACKPassManager is that wrapper. EdgeProgramManager.transform is kind of similar, as it is also built on top of _transform.
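The wrapper pattern described here can be sketched with stand-in types (the real XNNPACKPassManager wraps torch.fx passes and re-wraps the result into an ExportedProgram):

```python
class GraphModule:          # stand-in for torch.fx.GraphModule
    def __init__(self, nodes):
        self.nodes = nodes

class ExportedProgram:      # stand-in for torch.export.ExportedProgram
    def __init__(self, gm):
        self.graph_module = gm

def remove_noops(gm: GraphModule) -> GraphModule:
    """A trivial stand-in FX pass operating only on the graph module."""
    gm.nodes = [n for n in gm.nodes if n != "noop"]
    return gm

class BackendPassManager:
    """Accepts and returns an ExportedProgram; the passes only see FX."""

    def __init__(self, passes):
        self.passes = passes

    def transform(self, ep: ExportedProgram) -> ExportedProgram:
        gm = ep.graph_module
        for p in self.passes:        # passes operate on the graph module
            gm = p(gm)
        return ExportedProgram(gm)   # re-wrapping is the manager's job

ep = BackendPassManager([remove_noops]).transform(
    ExportedProgram(GraphModule(["conv", "noop", "relu"]))
)
```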

Args:
config: Neutron accelerator configuration, e.g. "imxrt700"
extra_flags: Extra flags for the Neutron compiler
operators_not_to_delegate: List of operators that should not be delegated
Contributor

incomplete args docs

qdq_clusterer.tag_qdq_clusters(nodes)

graph_module.recompile()
target = self.delegation_spec[1][2].value
Contributor

can we use CompileSpec.key : str to make this more readable?
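A sketch of the suggestion, with CompileSpec as a stand-in two-field pair mirroring ExecuTorch's key/value compile spec:

```python
from dataclasses import dataclass

@dataclass
class CompileSpec:           # stand-in for executorch's CompileSpec
    key: str
    value: bytes

def get_spec(specs, key: str) -> bytes:
    """Look a compile spec up by key instead of by positional index."""
    for spec in specs:
        if spec.key == key:
            return spec.value
    raise KeyError(key)

specs = [CompileSpec("target", b"imxrt700"), CompileSpec("extra_flags", b"")]
# Readable lookup, compared with indexing like delegation_spec[1][2].value.
target = get_spec(specs, "target").decode()
```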

graph_module = exported_program.graph_module
nodes = list(graph_module.graph.nodes)

qdq_clusterer = QDQClusterRecognizer()
Contributor

nit

Suggested change
qdq_clusterer = QDQClusterRecognizer()
qdq_cluster_recognizer = QDQClusterRecognizer()

QUANTIZE_OPERATORS = [
exir_ops.edge.quantized_decomposed.quantize_per_channel.default,
exir_ops.edge.quantized_decomposed.quantize_per_tensor.default,
exir_ops.edge.quantized_decomposed.quantize_per_tensor.tensor,
Contributor

This is for dynamic (at runtime) quantization of the activation for conv, linear etc. Do we support that?

Contributor Author


Oh, we do not.
For our education, how is this defined? I see the op definitions in kernels/quantized/quantized.yaml. There are out variants and tensor variants. In executorch.exir.dialects._ops there are the variants default and tensor. Is there some definition of the difference between "out"/"default" and "tensor"? For quantize ops it starts to make sense - static quantization vs. dynamic quantization. But is there any general rule of thumb?

Contributor

I guess the best way to check for dynamic quant is the presence of a choose_qparams node, which dynamically populates the scale and zero point. I am not sure if we are being consistent with the .tensor variant, TBH.
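That heuristic can be sketched as follows; the node names are illustrative strings, not real FX nodes:

```python
def is_dynamically_quantized(node_targets: list) -> bool:
    """A graph is dynamically quantized if it contains a choose_qparams
    node, which computes the scale/zero-point at runtime."""
    return any("choose_qparams" in t for t in node_targets)

# Static quantization: scale/zp are baked in at export time.
static_graph = ["quantize_per_tensor.default", "conv2d", "dequantize_per_tensor.default"]
# Dynamic quantization: qparams are chosen from the activation at runtime.
dynamic_graph = ["choose_qparams.tensor", "quantize_per_tensor.tensor", "linear"]
```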


opcode.BuiltinCode() == BuiltinOperator.CUSTOM
and opcode.CustomCode() == b"NeutronGraph"
):
# Found the NeutronNode.
Contributor

I guess this is the only thing in the tflite flatbuffer of relevance?

robert-kalmar and others added 2 commits April 25, 2025 12:53
Co-authored-by: Lukas Sztefek <[email protected]>
Co-authored-by: Martin Pavella <[email protected]>
Co-authored-by: Jiri Ocenasek <[email protected]>
Co-authored-by: Roman Janik <[email protected]>
Co-authored-by: Simon Strycek <[email protected]>
Multiple backends are using this pass - Arm, Neutron, XNNPACK.
@robert-kalmar robert-kalmar force-pushed the upstream/release-mcux-25.03-full/neutron-backend_2_upstream branch from 32253f6 to b102be2 Compare April 25, 2025 10:55
@robert-kalmar
Contributor Author

robert-kalmar commented Apr 25, 2025

Linting errors should be fixed now.

@JakeStevens
Contributor

I think you need just one more run of the linter; there is a misplaced import.
