NXP Backend: Add eIQ Neutron Backend #10196
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/10196
Note: Links to docs will display an error until the docs builds have been completed.
❌ 1 New Failure. As of commit b102be2 with merge base 4559a61, the following job has failed:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@pytorchbot label "module: nxp" , label "release notes: nxp"
Didn't find following labels among repository labels: ,,label
def convert(self, tflite_model: bytes, target: str, neutron_converter_flavor: str) -> bytes:
    # Neutron converter crashes if we provide invalid target -> verify.
    if target not in self._supported_target_names:
        raise RuntimeError(f"Target '{target}' is not supported by NeutronConverterManager.")
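For context, a hedged usage sketch of this manager. The target "imxrt700" appears later in this PR as an example config, and the flavor string mirrors the pip package discussed below; the construction details are assumptions, not the PR's actual call site.

# Hypothetical call site; NeutronConverterManager's constructor arguments are
# not shown in this excerpt, so a no-arg construction is assumed.
manager = NeutronConverterManager()
neutron_model = manager.convert(
    tflite_model=tflite_bytes,  # tflite flatbuffer produced by the NXP conversion infra
    target="imxrt700",  # must be in _supported_target_names, else RuntimeError
    neutron_converter_flavor="SDK_25_03",  # matches the neutron_converter_SDK_25_03 package
)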
@digantdesai it requires installing the Neutron Converter from eiq.nxp.com/repository:
pip install --index-url https://eiq.nxp.com/repository neutron_converter_SDK_25_03
Do you prefer to add a requirements.txt to the /backends/nxp directory, or rather follow Arm's approach with a setup.sh script?
I guess we can start with setup.sh. I need to discuss with the team whether we want to do optional pip installs or whether setup.sh is the right approach.
wow this is a huge PR! I will try to go through this bit by bit and leave comments, expect a few days before we can finish reviewing this. Thanks though :)
Unfortunately, it was hard to break it down, as most of it is the infrastructure for conversion from Edge Dialect to a format suitable for the Neutron Converter (LiteRT/tflite flatbuffer).
@robert-kalmar please make sure to run the linter and re-submit
Force-pushed from d58bec5 to 32253f6
Still failing after latest push.
@@ -0,0 +1,250 @@
# Copyright 2024 NXP
@digantdesai is this functionality not provided elsewhere in ET already?
# We need to create custom model verifier with max_pool2d added as exception.
# Otherwise, we get violation that this op is not part of ATen Core ops.
edge_program._verifiers = [EXIREdgeDialectVerifier(
Interesting. For my own understanding, why is this the case? Seems like max_pool is in ATen, judging from the op path torch.ops.aten.
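For reference, a hedged sketch of how the truncated verifier construction above might be completed. The exception_list keyword is an assumption inferred from the comment in the snippet, not a confirmed EXIREdgeDialectVerifier signature.

import torch
from executorch.exir.verification.verifier import EXIREdgeDialectVerifier

# Assumed kwargs: exception_list to whitelist max_pool2d, and class_only because
# _verifiers holds verifier classes rather than instances.
edge_program._verifiers = [
    EXIREdgeDialectVerifier(
        exception_list=[torch.ops.aten.max_pool2d.default],
        class_only=True,
    )
]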
model = Model.GetRootAs(flatbuffer, 0)
assert model.SubgraphsLength() == 1, f'The model has `{model.SubgraphsLength()}` SubGraphs instead of `1`.'

sub_graph = model.Subgraphs(0)
What happens with graph breaks? Can there only be one subgraph for Neutron?
at least add an assert?
Here is the workflow:
The Neutron Converter currently supports the tflite format as input and returns tflite as output. The returned tflite flatbuffer in general contains a combination of Neutron Nodes (the parts of the original compute graph to be computed on Neutron) and leftover tflite ops, which would be computed on the CPU.
In the case of ExecuTorch's NeutronBackend, we obviously do not have a TFLite runtime, so the partitioning logic lives in the NeutronPartitioner. The NeutronPartitioner already identifies the subgraphs that are supported on Neutron, so the NeutronBackend, when invoking the Neutron Converter, shall generate microcode for the whole subgraph. That is, no tflite ops shall remain in the tflite graph passed to the Neutron Converter. If any do, that is a RuntimeError, as we do not convert leftover tflite ops back to executorch ops; the Partitioner shall ensure that does not happen.
In short: the return value from the Neutron Converter shall contain exactly one op/node, the "NeutronNode". This is checked on lines #44 and #47, and the node is checked to be a NeutronNode on line #76.
There can be multiple subgraphs for Neutron, but this is controlled by ExecuTorch, which for every identified partition creates a subgraph (ExportedProgram) and calls NeutronBackend::preprocess(), which converts that subgraph and responds with the payload (microcode) for the Neutron NPU. We convert one subgraph at a time.
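A minimal sketch of that post-conversion check, built only from the flatbuffer accessors quoted elsewhere in this PR; the helper name and the pip-style tflite import paths are assumptions.

from tflite.Model import Model
from tflite.BuiltinOperator import BuiltinOperator

def verify_neutron_output(flatbuffer: bytes) -> None:
    model = Model.GetRootAs(flatbuffer, 0)
    # The partitioner hands over one fully delegable subgraph at a time.
    assert model.SubgraphsLength() == 1
    sub_graph = model.Subgraphs(0)
    # Exactly one op may remain: the NeutronNode. Leftover tflite ops mean the
    # partitioner admitted something the Neutron Converter could not map.
    if sub_graph.OperatorsLength() != 1:
        raise RuntimeError("Leftover tflite ops after Neutron conversion.")
    opcode = model.OperatorCodes(sub_graph.Operators(0).OpcodeIndex())
    if not (opcode.BuiltinCode() == BuiltinOperator.CUSTOM
            and opcode.CustomCode() == b"NeutronGraph"):
        raise RuntimeError("The single remaining op is not a NeutronNode.")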
part 1 / n
setup.py (outdated)
@@ -144,6 +145,10 @@ def openvino(cls) -> bool:
    def xnnpack(cls) -> bool:
        return cls._is_cmake_arg_enabled("EXECUTORCH_BUILD_XNNPACK", default=True)

+   @classmethod
+   def neutron(cls) -> bool:
+       return cls._is_cmake_arg_enabled("EXECUTORCH_BUILD_NEUTRON", default=True)
Suggested change:
- return cls._is_cmake_arg_enabled("EXECUTORCH_BUILD_NEUTRON", default=True)
+ return cls._is_cmake_arg_enabled("EXECUTORCH_BUILD_NEUTRON", default=False)
We can change this in the future; let's start conservatively so as not to disrupt existing workflows.
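For illustration, a minimal sketch of such a gate with the conservative default; the CMAKE_ARGS parsing below is a simplification assumed for this example, not the repository's actual helper.

import os

def _is_cmake_arg_enabled(var: str, default: bool) -> bool:
    # e.g. CMAKE_ARGS="-DEXECUTORCH_BUILD_NEUTRON=ON" opts in explicitly;
    # with default=False, an absent flag leaves existing workflows untouched.
    for token in os.environ.get("CMAKE_ARGS", "").split():
        if token == f"-D{var}=ON":
            return True
        if token == f"-D{var}=OFF":
            return False
    return default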
✅
install_executorch.py (outdated)
@@ -53,7 +53,7 @@ def clean():

# Please keep this insync with `ShouldBuild.pybindings` in setup.py.
- VALID_PYBINDS = ["coreml", "mps", "xnnpack", "training", "openvino"]
+ VALID_PYBINDS = ["coreml", "mps", "xnnpack", "neutron", "training", "openvino"]
This broadly implies we can run the .PTE with Neutron delegates in it on a dev machine like x86.
✅
@@ -0,0 +1,297 @@
# Copyright 2024 NXP
filename nit
s/nxp_backend.py/neutron_backend.py
If you don't mind, let's do the file renaming as we push all our changes from the internal repository.
backends/nxp/nxp_backend.py (outdated)
    extract_artifacts_from_neutron_node,
    NeutronNodeArtifacts,
)
from executorch.backends.xnnpack._passes import RemoveGetItemPass, XNNPACKPassManager
Fork the pass manager?
Move the RemoveGetItemPass to backends/transforms?
✅
- Introduced NeutronPassManager.
- The RemoveGetItemPass was moved to backends/transforms in a separate commit for better visibility.
@digantdesai Is there some specific reason PassManager is built on top of GraphModule and not ExportedProgram? I see basically every backend has some custom manager that accepts ExportedProgram as the input type and also produces ExportedProgram as the output of its run/__call__ function. I guess it is done that way because wrapping the modified graph_module back into an ExportedProgram seems to be the responsibility of the PassManager itself and not the PassManager call site.
PassManager is an FX concept, while ExportedProgram is an export concept. FX has more users than export. ExportedProgram deals with GraphSignature etc., which is not in scope for FX. So XNNPACKPassManager is that wrapper. EdgeProgramManager.transform is kind of similar; it is also built on top of _transform.
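A minimal sketch of that wrapper pattern, assuming the backend pass manager delegates re-wrapping to ExportedProgram._transform as described above; the class name is illustrative, not NeutronPassManager's actual code.

from torch.export import ExportedProgram

class BackendPassManagerSketch:
    """Accepts and returns an ExportedProgram, running FX-level passes in between."""

    def __init__(self, exported_program: ExportedProgram, passes):
        self._ep = exported_program
        self._passes = passes  # callables taking a GraphModule, returning a PassResult

    def transform(self) -> ExportedProgram:
        # _transform runs the passes over the graph_module and re-wraps the
        # result into a new ExportedProgram, keeping the GraphSignature valid.
        return self._ep._transform(*self._passes)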
Args:
    config: Neutron accelerator configuration, e.g. "imxrt700"
    extra_flags: Extra flags for the Neutron compiler
    operators_not_to_delegate: List of operators that should not be delegated
incomplete args docs
✅
qdq_clusterer.tag_qdq_clusters(nodes)

graph_module.recompile()
target = self.delegation_spec[1][2].value
can we use CompileSpec.key : str to make this more readable?
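A hedged sketch of that suggestion, assuming delegation_spec[1] is the CompileSpec list; the helper name and the "target" key are hypothetical.

from typing import List
from executorch.exir.backend.compile_spec_schema import CompileSpec

def compile_spec_value(specs: List[CompileSpec], key: str) -> bytes:
    # Look the entry up by its key instead of a hard-coded position.
    for spec in specs:
        if spec.key == key:
            return spec.value
    raise KeyError(f"CompileSpec '{key}' not found.")

# target = compile_spec_value(self.delegation_spec[1], "target")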
graph_module = exported_program.graph_module
nodes = list(graph_module.graph.nodes)

qdq_clusterer = QDQClusterRecognizer()
nit
Suggested change:
- qdq_clusterer = QDQClusterRecognizer()
+ qdq_cluster_recognizer = QDQClusterRecognizer()
backends/nxp/neutron_partitioner.py (outdated)
QUANTIZE_OPERATORS = [
    exir_ops.edge.quantized_decomposed.quantize_per_channel.default,
    exir_ops.edge.quantized_decomposed.quantize_per_tensor.default,
    exir_ops.edge.quantized_decomposed.quantize_per_tensor.tensor,
This is for dynamic (at runtime) quantization of the activation for conv, linear etc. Do we support that?
✅
Ouh, we do not.
For our education, how is this defined? I see the op definitions in kernels/quantized/quantized.yaml. There are out variants and tensor variants. In executorch.exir.dialects._ops there are variants default and tensor. Is there some definition of the difference between "out"/"default" and "tensor"? For the quantize ops it starts to make sense (static quantization vs. dynamic quantization), but is there any general rule of thumb?
I guess the best way to check for dynamic quant is the presence of a choose_qparams node to dynamically populate scale and zp. I am not sure if we are being consistent with the .tensor variant, TBH.
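A minimal sketch of that check, using the quantized_decomposed namespace referenced elsewhere in this PR; the helper name is hypothetical.

from executorch.exir.dialects._ops import ops as exir_ops

def has_dynamic_quant(nodes) -> bool:
    # A choose_qparams node computes scale/zero-point at runtime, which marks
    # the surrounding cluster as dynamically quantized.
    dynamic_markers = {
        exir_ops.edge.quantized_decomposed.choose_qparams.tensor,
    }
    return any(
        node.op == "call_function" and node.target in dynamic_markers
        for node in nodes
    )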
opcode.BuiltinCode() == BuiltinOperator.CUSTOM
and opcode.CustomCode() == b"NeutronGraph"
):
    # Found the NeutronNode.
I guess this is the only thing in the tflite flatbuffer of relevance?
Co-authored-by: Lukas Sztefek <[email protected]>
Co-authored-by: Martin Pavella <[email protected]>
Co-authored-by: Jiri Ocenasek <[email protected]>
Co-authored-by: Roman Janik <[email protected]>
Co-authored-by: Simon Strycek <[email protected]>
Multiple backends are using this pass - Arm, Neutron, XNNPACK.
Force-pushed from 32253f6 to b102be2
Linting errors should be fixed now.
I think you need just one more run of the linter; misplaced import.
Summary
Initial implementation for the NXP eIQ Neutron Backend for Neutron-N3-64 (i.MX RT700)
Test plan
Functionality tested by Python unit tests:
cc @digantdesai @JakeStevens @skywall